onnxruntime/tools/ci_build
Suffian Khan e6de0eb813
Add nightly pipeline for MI100 to run convergence and batch size test similar to V100. (#6611)
* Partial updating of ROCM reduction code.

* Update reduction_all.cu

* Add reduce template parameters.

* miopen common

* Reuse CUDA's reduction_functions.cc

* Reduction ops.

* Update remaining reduction ops to use MIOpen. The double datatype is not supported, so disable those typed kernels.

* Disable a couple more unsupported tests.

* Code formatting.

* Delete ROCM-specific reduction code that is identical to CUDA reduction code.

* Fix scratch buffer early free.

* Fix merge conflict.

* first attempt nightly amd ci pipeline

* try fix bad yaml file

* try again with corrected model directory

* add convergence test as well

* update reference loss for amd mi100

* include mi100 test results csv

* update the mi100 convergence test reference values

* update batch sizes for mi100 32g

* fix gpu sku for run_convergence_test.py

* undo unrelated changes to master

* pr comments

* pr comment

Co-authored-by: Jesse Benson <jesseb@microsoft.com>
2021-02-12 13:22:06 -08:00
github Add nightly pipeline for MI100 to run convergence and batch size test similar to V100. (#6611) 2021-02-12 13:22:06 -08:00
__init__.py Add validation of op registrations (#5817) 2020-11-17 10:44:09 -08:00
amd_hipify.py Enable more ROCM ops that are sharing CUDA code. Some are needed for Turing NLG models. 2021-02-06 14:40:34 -08:00
build.py Don't update the excluded ops/types unless args.update is true. Updating the exclusion info triggers rebuilding of all kernels using type reduction. (#6604) 2021-02-09 07:15:31 +10:00
clean_docker_image_cache.py Fix clean_docker_image_cache.py detection of image pushes. (#6151) 2020-12-16 17:25:22 -08:00
coverage.py Add support for running Android emulator from build.py on Windows. (#6317) 2021-01-13 19:21:49 -08:00
exclude_unused_ops_and_types.py Support disabling a typed kernel registration that uses the output type (#6530) 2021-02-03 14:22:32 +10:00
gen_def.py Reduce IOS shared library size by symbol file. (#5171) 2020-09-14 23:59:41 -07:00
get_docker_image.py Update get_docker_image.py to enable use without image cache container registry. (#6177) 2020-12-18 19:01:02 -08:00
logger.py Cache build docker images in container registry. (#5811) 2020-11-17 17:02:24 -08:00
op_registration_utils.py Support disabling training kernels as part of a reduced build (#6557) 2021-02-09 09:51:31 -08:00
op_registration_validator.py Support disabling a typed kernel registration that uses the output type (#6530) 2021-02-03 14:22:32 +10:00