* Partial updating of ROCM reduction code.
* Update reduction_all.cu
* Add reduce template parameters.
* miopen common
* Reuse CUDA's reduction_functions.cc
* Reduction ops.
* Update remaining reduction ops to use MIOpen. double datatype is not supported, so disable those typed kernels.
* Disable a couple more unsupported tests.
* Code formatting.
* Delete ROCM-specific reduction code that is identical to CUDA reduction code.
* Fix scratch buffer early free.
* Fix merge conflict.
* first attempt nightly amd ci pipeline
* try fix bad yaml file
* try again with corrected model directory
* add convergence test as well
* update reference loss for amd mi100
* include mi100 test results csv
* update the mi100 convergence test reference values
* update batch sizes for mi100 32g
* fix gpu sku for run_convergence_test.py
* undo unrelated changes to master
* pr comments
* pr comment
Co-authored-by: Jesse Benson <jesseb@microsoft.com>
This PR adds infrastructure to automatically cache docker images used in CI builds in a container registry.
Currently, build images are pulled from a container registry for some builds and built every time for others. The container registry requires maintenance to keep the images up to date and building images every time wastes build agent resources.
With this change, a given build image can be looked up in a cache container registry and if present, pulled, and otherwise, built and pushed. The uniqueness of a build image is determined by a hash digest of the dockerfile, docker build context directory, and certain "docker build" options. This digest is part of the image tag in the cache container repository.
The cache container registry will need to be cleaned up periodically. This is not automated yet.
* Create an Azure Pipeline to merge cpp and python e2e pipelines into one. Still keep cpp 2e2 pipeline until this new pipeline is stable.
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
replace number matching with relaxed comparison in frontend tests
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Update MaxBatchSize and include recompute mode
* Minor fix for frontend test
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* gpt2 training perf
* gpt2 training perf
* debug
* debug
* debug
* fix bug
* minor
* on comments
* dynamic sql
* fix build
* minor
* linked hash
* on comments
* minor
* mem
* minor
Co-authored-by: Ethan Tao <ettao@microsoft.com>