* implement cuda provider
* define profiler common
* call start after register
* add memcpy event
* add cuda correlation
* format code
* add cupti to test path
* switch to CUpti_ActivityKernel3
* reset cupti path
* fix test case
* fix trt pipeline
* add namespace
* format code
* exclude training from testing
* remove mutex
* Remove USE_TENSORRT macro and disable TRT EP at runtime if not supported
* handle unused parameters
* Disable some testcases
* only include opset13 for testing and add a keyword filter set
* rename variable
* add back code that was accidentally commented out in the previous commit
* Adjust model test filter for opset14
* Added code to support SoftmaxGrad
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Bringing back the opset filters for softmax that I had removed.
This will fix the test failures from the onnx repo.
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Now that DML has int64 support directly, register the related operators for uint64/int64 (rather than the hack in the ORT DML EP with doubled strides).
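
For context, the "doubled strides" workaround can be illustrated with a small NumPy sketch. This assumes the hack reinterpreted a 64-bit tensor as 32-bit words and doubled the element stride so that only the low word of each value was read; that reading is not confirmed by this change description, and the helper name `low_words_via_doubled_strides` is hypothetical, not an ORT or DML API.

```python
import sys

import numpy as np


def low_words_via_doubled_strides(values: np.ndarray) -> np.ndarray:
    """Return the low 32-bit word of each element of a 1-D int64 array.

    Hypothetical illustration only: the buffer is viewed as uint32 words
    and walked with a stride of two words (8 bytes), so the high word of
    every 64-bit element is skipped.
    """
    if values.dtype != np.int64 or values.ndim != 1:
        raise ValueError("expected a 1-D int64 array")
    contiguous = np.ascontiguousarray(values)
    words = contiguous.view(np.uint32)  # two 32-bit words per element
    # The low word comes first on little-endian hosts, second on big-endian.
    start = 0 if sys.byteorder == "little" else 1
    return words[start::2]  # step of 2 words == doubled stride


data = np.array([1, 2, 2**32 + 5], dtype=np.int64)
low = low_words_via_doubled_strides(data)
```

The last element shows why this was only a workaround: any value that does not fit in 32 bits loses its high word, which native int64 support avoids.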
## Remaining work
- Not implemented in DML: CumSum, Range, MaxPool/MaxUnpool, TopK, ReduceProd/Sum/SumSquare/L1
- Implemented in DML but need DML EP kernel work: Clip, Pad, Neg, Range, ConstantOfShape
```
te.exe OnnxConformanceTests.dll
Summary: Total=4454, Passed=4147, Failed=0, Blocked=0, Not Run=0, Skipped=307
```
Corresponding PR: https://microsoft.visualstudio.com/WindowsAI/_git/WindowsAI/pullrequest/6486426
Related work items: #28761231, #33883294
* make work for both rocm 4.2 and rocm 4.3.1
* fix rocm 4.3.1 docker image reference
* fix CUDA_VERSION to ROCM_VERSION
* fix ReduceConsts conflict def
* add ifdef to miopen_common.h as well
* remove trailing whitespace
* 2021.4.1 Docker and CI changes
* OV version change
* Removing the ImageScaler op from the ops list
Reverting this change, which was added in the last
PR. ImageScaler is now deprecated, so it is removed
from the supported list. The op also causes
a performance regression in FP16 models.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Re-writing the help message for num_of_threads
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
* try to run inside 4.3.1 container
* no \ in container run command
* remove networking options
* try adding video and render groups
* add job to build docker image
* try without 1st stage
* change alpha, beta to float
* try adding service connection
* retain huggingface directory
* static video and render gid
* use runtime expression for variables
* install torch-ort
* pin sacrebleu==1.5.1
* update curves for rocm 4.3.1
* try again
* disable determinism and only check the tail of the loss curve, with a much larger threshold of 0.05
* disable RoBERTa due to high run variability on ROCm 4.3.1
* put reduction unit tests back in
* Globally enable ms-experimental ops
* change ms_experimental to cover *all* ms_experimental ops. Some experimental ops, such as the audio ops, will still be enabled globally without this flag.
* add cmath
* add cmath to signal_defs.cc
* move audio back into experimental, verify on mac
* remove experimental from mac builds
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>