This PR adds infrastructure to automatically cache Docker images used in CI builds in a container registry.
Currently, build images are pulled from a container registry for some builds and rebuilt from scratch every time for others. The container registry requires maintenance to keep the images up to date, and rebuilding images on every run wastes build agent resources.
With this change, a build image is first looked up in a cache container registry: if present it is pulled; otherwise it is built and pushed. The uniqueness of a build image is determined by a hash digest of the Dockerfile, the docker build context directory, and certain "docker build" options. This digest is part of the image tag in the cache container repository.
The cache container registry will need to be cleaned up periodically. This is not automated yet.
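The digest scheme described above can be sketched as follows. This is a minimal illustration, not the actual CI implementation: the function name, the choice of SHA-256, and the tag format are all assumptions.

```python
import hashlib
import os
import tempfile

def compute_build_digest(dockerfile_path, context_dir, build_options):
    """Hash the Dockerfile, every file in the build context (walked in a
    deterministic order), and the relevant "docker build" options."""
    h = hashlib.sha256()
    with open(dockerfile_path, "rb") as f:
        h.update(f.read())
    for root, dirs, files in os.walk(context_dir):
        dirs.sort()  # make the walk order deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            h.update(os.path.relpath(path, context_dir).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    for opt in sorted(build_options):
        h.update(opt.encode())
    return h.hexdigest()

# Demo on a throwaway context; the "build-image:" tag prefix is a guess.
ctx = tempfile.mkdtemp()
with open(os.path.join(ctx, "Dockerfile"), "w") as f:
    f.write("FROM ubuntu:20.04\n")
digest = compute_build_digest(
    os.path.join(ctx, "Dockerfile"), ctx, ["--build-arg", "FOO=bar"])
image_tag = "build-image:" + digest[:16]
```

Any change to the Dockerfile, the context files, or the build options produces a different digest, so the cache lookup misses and the image is rebuilt and pushed.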
* Add parallelization for Resize bilinear mode.
* Add parallelization for the Resize op.
* Use TrySimpleParallelFor instead of TryParallelFor.
TryParallelFor has an unaddressed issue with its cost model.
* Address PR comments.
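The ORT threadpool APIs above are C++; as a language-neutral illustration of the partitioning idea, one parallel task per output row of a bilinear resize looks roughly like this. The function, the plain-list tensor layout, and the half-pixel coordinate transform are assumptions for the sketch, not ORT's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def resize_bilinear(src, out_h, out_w):
    """src: a 2-D image as a list of rows of floats. The outermost loop
    (over output rows) is what a SimpleParallelFor would partition."""
    in_h, in_w = len(src), len(src[0])
    scale_h = in_h / out_h
    scale_w = in_w / out_w

    def compute_row(oy):
        # Half-pixel source coordinate for this output row.
        fy = (oy + 0.5) * scale_h - 0.5
        y0 = min(max(int(fy), 0), in_h - 1)
        y1 = min(y0 + 1, in_h - 1)
        wy = min(max(fy - y0, 0.0), 1.0)
        row = []
        for ox in range(out_w):
            fx = (ox + 0.5) * scale_w - 0.5
            x0 = min(max(int(fx), 0), in_w - 1)
            x1 = min(x0 + 1, in_w - 1)
            wx = min(max(fx - x0, 0.0), 1.0)
            top = src[y0][x0] * (1.0 - wx) + src[y0][x1] * wx
            bot = src[y1][x0] * (1.0 - wx) + src[y1][x1] * wx
            row.append(top * (1.0 - wy) + bot * wy)
        return row

    # Each row is independent, so rows can be computed concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(compute_row, range(out_h)))
```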
Transitions from the ORT-only DML NuGet (hosted on the onnxruntime_public feed) to the new unified DirectML NuGet (Microsoft.AI.DirectML) on nuget.org. In addition, the Microsoft.AI.MachineLearning (WinML) and Microsoft.ML.OnnxRuntime.DirectML packages now take a dependency on the Microsoft.AI.DirectML package. This means we can remove the extra copy of DML binaries in these packages since they will be installed by the DML package.
* Add validation of operator registrations to the reduction script
- the script has all the logic to process the registrations, and there's a CI that uses it
Fix some operator registrations
* Fix CUDA PRelu registration
* Refactor to split out kernel registration file parsing and use in the exclude ops script and an op registration validation script.
Run op validation in minimal build CI
* Fix PEP8 error and some comments
* Add copy sparse model in minimal CI
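The registration-parsing idea can be sketched like this. The single-line macro form below is a simplified stand-in; the real ORT registration macros span multiple lines and take more arguments.

```python
import re

# Hypothetical, simplified registration line. The real macros in ORT's
# kernel registration files are more elaborate.
REGISTRATION = re.compile(
    r"ONNX_OPERATOR_KERNEL_CLASS_NAME"
    r"\(\s*(\w+)\s*,\s*(\w+)\s*,\s*(\d+)\s*,\s*(\w+)\s*\)"
)

def parse_registrations(text):
    """Extract (provider, domain, opset, op) tuples from source text."""
    ops = []
    for provider, domain, opset, op in REGISTRATION.findall(text):
        ops.append((provider, domain, int(opset), op))
    return ops

source = """
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 13, Identity);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 13, Squeeze);
"""
registrations = parse_registrations(source)
```

Once the registrations are parsed into structured tuples, both the exclude-ops script and a validation script can consume them, which is the refactoring the bullet describes.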
* Add squeeze 13 support
* fix small typo
* Add ut for squeeze in NNAPI
* Fix some issues in the UT and code
* Modify based on the master change
* Fix build break
* Merged PR 5253310: Fix 0-sized dimension broadcasting
Tensors that contain 0-sized dimensions were being broadcast to higher ranks, which made it impossible to remove them from the graph. 0-sized dimensions represent empty tensors, so an operator that needs to broadcast one shouldn't call into DML at all.
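To see why, consider NumPy-style shape broadcasting in pure Python: a 0-sized dimension broadcasts like any other size, and any result shape containing it has zero elements, so there is no work to dispatch. The helper name is made up for illustration.

```python
import math
from itertools import zip_longest

def broadcast_shape(a, b):
    """NumPy-style broadcasting of two shapes. A 0-sized dim matches
    another 0 or broadcasts against a 1, just like any other size."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"incompatible dims {x} and {y}")
    return tuple(reversed(out))

shape = broadcast_shape((0, 3), (2, 1, 3))
num_elements = math.prod(shape)  # the broadcast result is still empty
```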
* Merged PR 5334334: Fix asserts and failure in GraphKernelHelper.cpp
This extends a workaround needed to match node inputs with Tensors to the EP code handling constant input upload.
This was causing issues in a couple of models, including EfficientDet, although that model still fails due to this bug:
https://microsoft.visualstudio.com/OS/_workitems/edit/29970551
Related work items: #29706035
* Merged PR 5344477: Disable GPU timeouts in DML EP command queue creation
GPU timeouts were already disabled in command queues created by WinML, but not in the ones created by the DML EP within the ORT API.
* Merged PR 5380534: BatchNormalization failure in autopilot - fix output size
New validation [here](https://microsoft.visualstudio.com/DefaultCollection/WindowsAI/_git/WindowsAI/pullrequest/5354070?_a=files&path=%2Fdml%2FSharedValidation%2FDmlBatchNormalizationOperatorValidator.h) causes some BatchNorm cases to fail (e.g. OnnxConformanceTestsTaef::BatchNormalization (BatchNormalization_2x2x2)). I'm unsure how long this bug has existed, but based on Nick's investigation the affected cases apparently still worked anyway.
Related work items: #27678610
* Merged PR 5386132: Update 8D BatchNorm
Update 8D BatchNorm
Related work items: #27678610
* Merged PR 5390213: Tile allow 0 in repeats
0 is a valid value in Tile's "repeats" parameter. The CPU kernel handles it fine; so should the DML EP.
Related work items: #29970551
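Per the ONNX Tile spec, output dimension i is input_shape[i] * repeats[i], so a repeat of 0 simply yields an empty tensor rather than an error. A sketch (the helper name is made up):

```python
import math

def tile_output_shape(input_shape, repeats):
    """ONNX Tile shape rule: output dim i = input_shape[i] * repeats[i].
    A repeat of 0 is legal and yields an empty tensor."""
    if len(repeats) != len(input_shape):
        raise ValueError("repeats must have one entry per input dimension")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return tuple(d * r for d, r in zip(input_shape, repeats))

shape = tile_output_shape((2, 3), (2, 0))  # second dim tiled zero times
```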
Co-authored-by: Justin Stoecker
Co-authored-by: Jeff Bloomfield
Co-authored-by: Patrice Vignola
Co-authored-by: Nick Feeney
* switch to work PC
* back with iterable of buffers
* add raw api tests
* tensorization
* last test
* all tests pass!
* small cleanup
* whitespace
* newline
* whitespace
* refactor common code into DisjointBufferHelpers
* remove unused file
* warning
* skip gpu tests when hardware not available
* Add error condition when createreference is invoked
* add null check to createreference
* uncomment out check
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* add case for cpu custom op on gpu
* format doc
* restrict GPU custom op on Linux GPU CI only
* separate the .cu file into an independent project
* fix typo
* include cuda_add lib
* move lib def
* add file header
Co-authored-by: RandySheriffH <rashuai@microsoft.com>
* Make NNAPI EP build on non-Android platforms
* minor updates
* Address CR comments
* Fix build issue using Windows, address CR comments
* Fix linux build warnings
* Fix for test failure
* Fix for test failure
* Fix model_tests failure
The kernel declaration of Identity needs to be updated in the ROCm EP, since the ROCm EP shares the Identity implementation with the CUDA EP, where it was changed for opset 13 support.
* ng_supported_ops
* Remove ng_supported_ops
* Revert "Remove ng_supported_ops"
This reverts commit 3c27385b2d88c6e8cf7ac4e8c290a367ad5d0bd8.
* Revert "ng_supported_ops"
This reverts commit 650721ae2913b79739521d58838298e031abdac1.
cmake changes to ensure that debug builds on Windows link to debug builds of OpenVINO
and do not result in a bad-allocation error
Co-authored-by: sfatimar <sahar.fatima@intel.com>
* gradient builder for opset13
* code clean.
* resolve comments
* stop grad for axes input
* add split to stop grad list.
Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
This commit adds shape inference support for the following ops:
SoftmaxCrossEntropy
SoftmaxCrossEntropyLossGrad
SoftmaxCrossEntropyGrad
LayerNormalizationGrad
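As a rough illustration of what this shape inference does (heavily simplified; the real implementation operates on ONNX type/shape protos, and the reduction semantics below are an assumption): the gradient with respect to a tensor always has that tensor's shape, and the loss output is a scalar under "mean"/"sum" reduction.

```python
def grad_shape(forward_tensor_shape):
    """A gradient w.r.t. a tensor has that tensor's shape, so *Grad ops
    (e.g. SoftmaxCrossEntropyGrad, LayerNormalizationGrad) can infer
    their output shape by forwarding the corresponding input's shape."""
    return tuple(forward_tensor_shape)

def loss_output_shape(logits_shape, reduction="mean"):
    """Assumed reduction semantics: "mean"/"sum" produce a scalar loss;
    "none" keeps the batch dims (logits shape minus the class dim 1)."""
    if reduction in ("mean", "sum"):
        return ()
    return logits_shape[:1] + logits_shape[2:]

d_logits = grad_shape((32, 10))       # same shape as the logits
loss = loss_output_shape((32, 10))    # scalar under "mean" reduction
```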
* add profile caching to improve engine caching feature
* Add comments
* fix typo
* add decryption for engine caching
* Update tensorrt_execution_provider.cc
* update onnx-tensorrt submodule
* set opt profile to max value of the range
* add hash to engine/profile name
* Add calibration based INT8 quantization
* add an option to enable both FP16 and INT8
* Update tensorrt_execution_provider.cc
* add env variable to specify calibration file name
* clean up code
* Add comments and update TRT document
* enable tensorrt basic test and add EngineCachingTest
* clean up
* update environment variable in the test
* clean up
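The "add hash to engine/profile name" change can be illustrated with a sketch like the following; the filename pattern and the inputs being hashed are assumptions for illustration, not the EP's actual scheme.

```python
import hashlib
import os

def engine_cache_path(cache_dir, model_bytes, fp16, int8):
    """Key the serialized engine on a hash of the model plus the build
    flags, so a changed model or changed precision never reuses a
    stale cached engine."""
    h = hashlib.sha256(model_bytes)
    h.update(f"fp16={fp16};int8={int8}".encode())
    return os.path.join(cache_dir, f"trt_engine_{h.hexdigest()[:16]}.engine")

# Same model, different precision flags -> different cache entries.
path_fp16 = engine_cache_path("/tmp/trt_cache", b"<serialized model>", True, False)
path_int8 = engine_cache_path("/tmp/trt_cache", b"<serialized model>", True, True)
```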
* Introduce PassThrough op to wait for all gradients to be ready before the weight update
* Compute gradient norm for fp32 runs
* Update FE UT expected value
* Respect enable_grad_norm_clip
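Gradient-norm computation and clipping (what enable_grad_norm_clip gates) follow the usual global-norm rule, sketched here over plain Python lists rather than ORT tensors.

```python
import math

def global_grad_norm(grads):
    """L2 norm over all gradient tensors (here, flat lists of floats)."""
    return math.sqrt(sum(g * g for grad in grads for g in grad))

def clip_grads(grads, max_norm):
    """Scale every gradient by max_norm / total_norm when the global
    norm exceeds max_norm; otherwise leave the gradients untouched."""
    total = global_grad_norm(grads)
    if total > max_norm:
        scale = max_norm / total
        grads = [[g * scale for g in grad] for grad in grads]
    return grads

# Global norm of [[3.0], [4.0]] is 5.0, so clipping to 1.0 rescales both.
clipped = clip_grads([[3.0], [4.0]], 1.0)
```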
* Large model export and run ORT Python support
* Megatron change
refine a bit
workaround self attention issue
use partitioned name for weights when megatron model parallel is enabled
Fix Megatron Transformer issue (caused by the renaming)
Add UTs for T5 model parallel
Fix megatron seed issue
fix log a bit
checkpointing changes + rebase
Unintended reshape transform change
t5 layer norm changes
add t5 layer norm kernel
use template for t5 layer norm
template definition changes
no build error
add CPU cuda kernel
first unit test
other forward unit tests
add T5LayerNormGrad
Add c++ transform and test for T5 LN
minor fix
BART MLP Megatron transform
Add concat slice transform + test
Cosmetic improvements in concat slice transform
Constant folding bug fix + megatron attention transform for BART
Undo unnecessary changes
* Cleanup
* Remove unnecessary changes
* Cleanup megatron
* Windows build
* Add self attention test graph
* Correcting transforms + cleanup
* review comments
* review comments
* fix build and test failures
* Fix CI
* fix windows CI
Co-authored-by: Peng Wang <pengwa@microsoft.com>
Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
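For context on the "t5 layer norm" commits above: T5 uses a simplified layer norm that scales by the root-mean-square of the input, with no mean subtraction and no bias. A minimal sketch assuming that definition (the function name and eps value are illustrative):

```python
import math

def t5_layer_norm(x, weight, eps=1e-6):
    """T5-style layer norm: divide by the RMS of x and apply a learned
    scale; unlike standard LayerNorm there is no mean subtraction and
    no bias term."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

# An input that already has unit RMS is (almost) unchanged.
out = t5_layer_norm([1.0, -1.0, 1.0, -1.0], [1.0] * 4)
```

The absence of the mean/bias terms is what makes it worth a dedicated kernel and gradient (T5LayerNormGrad) instead of reusing the standard LayerNorm path.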
* Enabling Multi Device support for UEP
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Minor fix added
* Added a simple fix to determine the OpenVINO version for the Arm build as well
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Move GetCapability independent of ModelBuilder
* minor code style fix
* Move ort_enforce for same number of op_builders and op_support_checkers
* minor code fix
* cpu send/recv
* clean up send/recv
* remove unused code
* assert and nccl option for mnist
* add a build option to enable CPU-only builds. Without it, NCCL is always enabled, which breaks the build on machines that only have a CPU
* Add USE_MPI distinct from USE_NCCL/USE_HOROVOD
* fix
* fix
* exclude cpu send/recv for machines without mpi
Co-authored-by: Tim Harris <tiharr@microsoft.com>