This PR adds infrastructure to automatically cache docker images used in CI builds in a container registry.
Currently, build images are pulled from a container registry for some builds and built every time for others. The container registry requires maintenance to keep the images up to date and building images every time wastes build agent resources.
With this change, a given build image is looked up in a cache container registry: if present, it is pulled; otherwise, it is built and pushed. The uniqueness of a build image is determined by a hash digest of the Dockerfile, the docker build context directory, and certain "docker build" options. This digest is part of the image tag in the cache container repository.
The cache container registry will need to be cleaned up periodically. This is not automated yet.
Transitions from the ORT-only DML NuGet (hosted on the onnxruntime_public feed) to the new unified DirectML NuGet (Microsoft.AI.DirectML) on nuget.org. In addition, the Microsoft.AI.MachineLearning (WinML) and Microsoft.ML.OnnxRuntime.DirectML packages now take a dependency on the Microsoft.AI.DirectML package. This means we can remove the extra copy of DML binaries in these packages since they will be installed by the DML package.
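Taking a dependency on the unified package amounts to a nuspec dependency entry roughly like the following; this is an illustrative fragment, and the version number and target framework group are assumptions, not the actual packaging change:

```xml
<dependencies>
  <group targetFramework="native">
    <!-- DML binaries come from this package instead of being bundled -->
    <dependency id="Microsoft.AI.DirectML" version="1.4.0" />
  </group>
</dependencies>
```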
* Add validation of operator registrations to the reduction script
- the script has all the logic to process the registrations, and there's a CI that uses it
Fix some operator registrations
* Fix CUDA PRelu registration
* Refactor to split out kernel registration file parsing and use in the exclude ops script and an op registration validation script.
Run op validation in minimal build CI
* Fix PEP8 error and some comments
* Add copy sparse model in minimal CI
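The split between registration-file parsing and validation described above could look roughly like this. This sketch assumes registrations use ORT's `ONNX_OPERATOR_KERNEL_EX` / `ONNX_OPERATOR_VERSIONED_KERNEL_EX` macro style; the parsing and the specific validation rules here are illustrative, not the actual script.

```python
import re

# Matches registrations of the form:
#   ONNX_OPERATOR_KERNEL_EX(PRelu, kOnnxDomain, 9, kCudaExecutionProvider, ...)
#   ONNX_OPERATOR_VERSIONED_KERNEL_EX(PRelu, kOnnxDomain, 7, 8, kCudaExecutionProvider, ...)
KERNEL_RE = re.compile(
    r"ONNX_OPERATOR(?:_VERSIONED)?_KERNEL_EX\(\s*"
    r"(?P<op>\w+)\s*,\s*(?P<domain>\w+)\s*,\s*(?P<start>\d+)\s*(?:,\s*(?P<end>\d+))?"
)


def parse_registrations(source):
    """Extract (op, domain, start_version, end_version_or_None) tuples."""
    regs = []
    for m in KERNEL_RE.finditer(source):
        end = int(m.group("end")) if m.group("end") else None
        regs.append((m.group("op"), m.group("domain"), int(m.group("start")), end))
    return regs


def validate(regs):
    """Flag inverted version ranges and overlapping ranges for the same op."""
    errors = []
    by_key = {}
    for op, domain, start, end in regs:
        if end is not None and end < start:
            errors.append(f"{domain}:{op} has end version {end} < start {start}")
        # Treat an open-ended registration as extending to a large sentinel.
        by_key.setdefault((op, domain), []).append((start, end if end is not None else 1 << 30))
    for (op, domain), ranges in by_key.items():
        ranges.sort()
        for (s1, e1), (s2, e2) in zip(ranges, ranges[1:]):
            if s2 <= e1:
                errors.append(f"{domain}:{op} has overlapping version ranges")
    return errors
```

Sharing the parser between the exclude-ops script and the validation script keeps the two from drifting apart, which is the point of the refactor above.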
* Add squeeze 13 support
* fix small typo
* Add ut for squeeze in NNAPI
* Fix some issues in the UT and code
* Modify based on the master change
* Fix build break
* Create an Azure Pipeline to merge cpp and python e2e pipelines into one. Still keep the cpp e2e pipeline until this new pipeline is stable.
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add YAML file for pipeline
* Modify typo
* Add working directory
* Modify and test
* Modify and test
* Modify and test
* Modify and test
* Modify
* Modify
* Modify
* Modify
* Make sure to copy all the result files
* Add clean up
* Modify
* Modify agent pool name
* Upload only specific artifacts
* Modify
* Integrate a CI pipeline for running TRT perf, and add the "large amount of models" set to the perf model targets
* Fix bug
* Fix bug
* Read the information about previously known failing models and skip them during benchmark/validation
* Modify the script file for CI
* Replace print with logger.info
* Fix bug
* Fix bug
* Refine the code
* Modify the script so that it can capture a segmentation fault while running ORT
* Fix bug
* fix bug
* fix bug
* Add debug info
* fix bug
* Refine perf code
* Refine the code
* fix bug
* Code refactoring
* change many-models path
* remove metadata after validation/benchmark are done
* Update README.md
* Fix bug so that metadata doesn't hold stale value
* Remove hardcode and update README
* Add arguments to the script to make it run correctly
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Fix bug so that metadata doesn't hold stale value
* Fix a small bug in finding the test dataset directory for FP16 test data, and clean up some output information
* use -i random for perf test of TRT changes
Co-authored-by: Olivia Jain <oljain@microsoft.com>
* create new nuget packaging pipeline without openmp
* rename package
* update image name
* rename package name
* rename managed package
* reset project attribute
* merge master
* set package name
* set NoOpenMP as cpu build
* shorten line length
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
The ROCm EP is designed and implemented on top of the AMD GPU software stack, ROCm. Details about ROCm: https://rocmdocs.amd.com/en/latest/
The ROCm EP builds on the following components:
1. AMD GPU programming language: HIP
2. AMD GPU HIP language runtime: amdhip64
3. BLAS: rocBLAS, hipBLAS
4. DNN: MIOpen
5. Collective Communication library: RCCL
6. cub: hipCub
7. …
Current status:
BERT-L and GPT2 training can be run on AMD GPUs with data parallelism.
Next:
1. Make more GPU code sharable between the ROCm EP and the CUDA EP, since the HIP language and HIP runtime API are very close to CUDA.
2. Continue improving the implementation.
3. Continue GPU kernel optimization.
4. Support model parallelism on ROCm EP.
The ROCm kernels have been removed from this commit and will come in a separate PR. Since the original PR was too big (~180 files), it was suggested to split it into two parts: one with the ROCm kernels, and one with everything else.
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: sabreshao <sabre.shao@amd.com>
Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
replace number matching with relaxed comparison in frontend tests
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
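A relaxed comparison amounts to per-element tolerance checks instead of exact number matching. Libraries like `numpy.testing.assert_allclose` are the usual route; a dependency-free sketch with illustrative default tolerances:

```python
import math


def assert_allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Compare sequences of floats with relative/absolute tolerances
    instead of exact equality."""
    assert len(actual) == len(expected), "length mismatch"
    for i, (a, e) in enumerate(zip(actual, expected)):
        assert math.isclose(a, e, rel_tol=rtol, abs_tol=atol), \
            f"mismatch at index {i}: {a} vs {e}"
```

This keeps frontend tests stable across minor floating-point differences between platforms and kernels.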
* Cmake changes for 2021.1
* added new ov version 2020.1 for faster rcnn
* Added missing defs
* equal op modified
* changes to incorporate faster rcnn
* backend util.cc
* hddl_plugin_config.hpp is deprecated; use hddl_config.hpp instead
* changing myriad precision bool to i32
* gather is not enabled for gpu
* conv2D and pooltest auto_pad attribute should not be null
* negative indices are not valid for scatter op in myriad
* non max suppression op only supported in faster rcnn mode
* maxpool indices output is not supported
* Cleaned redundant code in backends
* Added ifdefs for HDDL config
* cast output dimensions check
* topk operator: the k input seems resolved only for myriad, as it is throwing issues for mask rcnn. Need to verify
* we are limiting the subgraph size to 3 here
* taking care of review comments
* Fixed minor bugs
* Modified Slice op checks
* Added NonZero, Upsample
* Removed TopK if it's in the middle of a subgraph
* incorporated upsample conditions too
* Dockerfile changes for 2021.1 release
* dockerfile aptkey update
* Minor fixes
* ceil condition added again
* Fixed few gpu models
* Disabled LSTM and yolov3 in ModelTests
* python softmax cross entropy tests and negative log likelihood
* Update Build.md
Updated for openvino 2021.1
* Update OpenVINO-ExecutionProvider.md
update openvino execution provider for 2021.1
* Update README.md
updated new openvino version
* Update Dockerfile.openvino
added environment variable for DEBIAN Frontend
* Fixed myriad models
* Fixed gather condition
* Fixed mask rcnn model on myriad
* Modified Gather condition
* set default target of MCR dockerfile to MYRIAD_FP16
* Fixed tinyolov3 on CPU
* Update OpenVINO-ExecutionProvider.md
update openvino execution provider documentation
* Update Dockerfile.openvino
Removed environment variable
* Update OpenVINO-ExecutionProvider.md
update image manipulation networks supported
* Update onnx_backend_test_series_filters.jsonc
removed test_upsample_nearest from cpu test cases
* New InternalCI changes for 2021.1
* Full protobuf removed for OpenVINO
* Protobuf added
* Updated with apt installation for openvino
* Revert the testing changes
* Reverted testing changes
* File permissions are changed back to original
* Deleted openvino installation and cmake change
* Optimized Dockerfile
Removed unnecessary cmake installation, numpy
* Added missing ifdefs
* delete array fix
* backend_utils.cc output_shape
* Revert "set default target of MCR dockerfile to MYRIAD_FP16"
This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d.
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>
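Two of the Dockerfile-related bullets above (the MYRIAD_FP16 default target and the DEBIAN frontend environment variable) could look like this in a Dockerfile; the build-arg name `DEVICE` is an assumption for illustration:

```dockerfile
# Build-time device target, overridable with: docker build --build-arg DEVICE=CPU_FP32 ...
ARG DEVICE=MYRIAD_FP16
# Keep apt-get from prompting interactively during the image build
ENV DEBIAN_FRONTEND=noninteractive
```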
* Add TensorBoard; remove torch.distributed.launch because ORT NCCL depends on MPI. Use MPI to launch parallel training.
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
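An MPI launch replacing `torch.distributed.launch` might look like the following; the script name, rank count, and flag are placeholders, not the actual training command:

```shell
# Launch 4 data-parallel training workers under MPI instead of torch.distributed.launch
mpirun -n 4 python train_bert.py --use_tensorboard
```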
* add the ios ci build.
* no dependency on mac ci pipeline.
* fix the command line.
* keep sync
* automatically retrieve sdpath
* fix the case errors and warnings
* fix the vlog switch issue.
* add parallel flag for build.
* update the display name of the pipeline.
- Update docker image release build to use build commit.
- Use valid default in component governance detection step.
- Use smaller docker build context.
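A common way to get a smaller docker build context is a `.dockerignore` file next to the Dockerfile; the entries below are illustrative, not this PR's actual list:

```
# .dockerignore (illustrative): keep build output and VCS data out of the context
.git
build/
*.tar.gz
```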
* Nuget store packaging
* Move DNNL workaround to EP
* Fix warning as error
* Disable store tests
* Skip store tests
* msbuild target
* Cross compile protoc in Store
* Disable DML in store
* Move store builds to CPU queue
* Copy uap10 to final nuget
* Fix pep8 error
* Remove extra dml copies
* Fix argparse
* pep8
* Forward IsStoreBuild
* Apply is_store_build to duplicate generate_nuspec
* runtimes
* Refactor uap10
* Store .NET
* uap
* PR feedback
* cancel nightly build on pyop
* setup win cuda11 pipeline
* add debug build
* test base gpu settings
* setup pipelines to test cuda 10.2 and 11
* rename linux docker images
* rename docker image tag and add clean up job
* fix typo in cuda 11 config
* set cuda11 env
* update linux cuda 11 pipeline
* reset docker image name
* disable uninitialized warning from linux build
* change the way to silence uninitialized warning
* add flags to linux gpu pipeline
* switch docker image for linux cuda 10.2
* switch linux cuda 10.2 image
* test cuda11 with devtoolset-8
* try latest built images
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
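The "silence uninitialized warning" bullets above likely amount to a compiler-flag change; an illustrative CMake fragment under that assumption, not the pipeline's actual code:

```cmake
# GCC's -Wmaybe-uninitialized can misfire on CUDA 11 headers; suppress
# only that diagnostic rather than relaxing the warning level overall.
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
  add_compile_options(-Wno-maybe-uninitialized)
endif()
```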