Updated MLOperatorAuthorPrivate.h to remove the forward declaration `enum DML_TENSOR_DATA_TYPE;`, which triggered warning "C4471: 'DML_TENSOR_DATA_TYPE': a forward declaration of an unscoped enumeration must have an underlying type"
Updated OperatorUtility to avoid compiler errors C2672 and C2783.
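For reference, a minimal standalone sketch of the C++ rule behind C4471. The enum and function names are hypothetical stand-ins, not the real DirectML types: an unscoped enum may only be forward-declared (an "opaque" declaration) when it names a fixed underlying type. Removing the declaration, as done here, is one fix; giving it an underlying type is the other, but only works if the definition repeats the same type.

```cpp
#include <cstdint>

// Hypothetical enum standing in for a type such as DML_TENSOR_DATA_TYPE.
// `enum ExampleDataType;` on its own is non-standard (MSVC warns C4471);
// naming the underlying type makes this a valid opaque declaration.
enum ExampleDataType : std::uint32_t;

// With a fixed underlying type the enum is already a complete type,
// so it can be passed by value and converted before its enumerators exist.
std::uint32_t ToRaw(ExampleDataType t) { return static_cast<std::uint32_t>(t); }

// The full definition must repeat the same underlying type.
enum ExampleDataType : std::uint32_t {
    EXAMPLE_DATA_TYPE_UNKNOWN = 0,
    EXAMPLE_DATA_TYPE_FLOAT32 = 1,
};
```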
- Error C2672: 'TryMapStringToIndex': no matching overloaded function found
- Error C2783: 'std::optional<_Ty> Dml::TryMapStringToIndex(std::string_view,gsl::span<const Dml::NameAndIndex>)': could not deduce template argument for 'T'. note: see declaration of 'Dml::TryMapStringToIndex'. 'TryMapStringToIndex': function declaration must be available as none of the arguments depend on a template parameter
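As background on C2783, a simplified sketch (all names are hypothetical, and the real `Dml::TryMapStringToIndex` signature differs) of why `T` cannot be deduced: it appears only in the return type, so no argument constrains it. Naming the template argument explicitly at the call site is one common fix; the actual OperatorUtility change may have been structured differently.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical name/index pair, loosely modeled on Dml::NameAndIndex.
struct NameAndIndexSketch {
    std::string_view name;
    unsigned int index;
};

// T appears only in the return type, so a call such as
// TryMapStringToIndexSketch(name, table, count) cannot deduce it
// (MSVC reports C2783 and then C2672).
template <typename T>
std::optional<T> TryMapStringToIndexSketch(std::string_view name,
                                           const NameAndIndexSketch* table,
                                           std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (table[i].name == name) {
            return static_cast<T>(table[i].index);
        }
    }
    return std::nullopt;  // Name not found in the table.
}

// Hypothetical enum used only for this illustration.
enum class PaddingModeSketch { Constant = 0, Reflect = 1, Edge = 2 };

std::optional<PaddingModeSketch> ParsePaddingModeSketch(std::string_view mode) {
    static constexpr NameAndIndexSketch kTable[] = {
        {"constant", 0}, {"reflect", 1}, {"edge", 2}};
    // Fix: supply the template argument explicitly.
    return TryMapStringToIndexSketch<PaddingModeSketch>(
        mode, kTable, sizeof(kTable) / sizeof(kTable[0]));
}
```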
Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.
* libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in the Python package.
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* Make DmlEp Clang compatible for EPIC
* Fix build issues that occurred when engine/lotus points to the latest ORT on GitHub
* Fix more build errors
* Fixed one build issue and removed temporary changes for Clang
* Addressed comments on the PR.
* [WIP] - DmlEp ORT NO Exception
* Made DmlEp compatible with ORT_NO_EXCEPTION
* Fixed typo
* Addressed comments on the PR, mostly nit styling and using the appropriate HRESULT error codes
* Added dependency on ErrorHandling.h
* Addressed comment on the PR
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
* Update MLOperatorAuthorImpl to remove warning C4495
Update MLOperatorAuthorImpl to remove warning C4495: nonstandard extension '__super' used: replace with explicit base class name
* Update DmlOperatorRecurrentNeuralNetwork to avoid warning C4495
Update DmlOperatorRecurrentNeuralNetwork to avoid warning C4495: nonstandard extension '__super' used: replace with explicit base class name
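A minimal sketch of the kind of change C4495 asks for, using hypothetical class names rather than the actual MLOperatorAuthorImpl or RNN operator classes: `__super` is an MSVC-only keyword, and standard C++ calls the base implementation through the explicit base class name.

```cpp
#include <string>

// Hypothetical base class standing in for the real operator base types.
class OperatorBaseSketch {
public:
    virtual ~OperatorBaseSketch() = default;
    virtual std::string Describe() const { return "base"; }
};

// Hypothetical derived class standing in for the RNN operator.
class RnnOperatorSketch : public OperatorBaseSketch {
public:
    std::string Describe() const override {
        // Before (MSVC-only, warns C4495): return __super::Describe() + "+rnn";
        // After: qualify the call with the explicit base class name.
        return OperatorBaseSketch::Describe() + "+rnn";
    }
};
```

The explicit qualification also keeps the call non-virtual, exactly as `__super::` did, so behavior is unchanged.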
* Add source for conv_grad
* Add sources for ROCm EP.
* Transliterate sources for conv_grad for ROCm EP.
* Add conv_grad to ROCm EP
Add conv_grad to ROCm execution provider.
* Update ROCm EP ConvGrad
Update ConvGrad for the ROCm EP to match changes in other EPs and fix a build issue.
* Make DmlEp Clang compatible for EPIC
* Fix build issues that occurred when engine/lotus points to the latest ORT on GitHub
* Fix more build errors
* Fixed one build issue and removed temporary changes for Clang
* Addressed comments on the PR.
* Style fixes
* Fix unreachable code
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
* Changes to fuse embed layer for GPT-2, kernel changes pending
* verified that the add output matches the regular add
* Test added for the additional EmbedLayerNormalization output, working on CUDA
* Test passing on CPU
* updated convert_to_onnx tool to check parity correctly
* removed some debugs
* a couple of TODOs left in optimizer.py
* removed changes to optimizer.py
* fixing build
* fixing build
* updated order of initialization
* added a test case for float16
* updating the docs
* updating tests failing due to embed layer fusion
* update unit tests
* updating CUDA documentation in OperatorKernels.md
* addressing comments
* OperatorKernels.md updated with CUDA
* adding TODO to qembed_layer
* minor edit
* updated docs
* addressing comments
* adding position ids to embed layer gpt2
* updating fused gpt2 model
* added extra test
* remove comments
* addressing comments
* contrib_defs.cc updated
* all tests passing
* fixing a typo
* minor edit
* trigger build
* QEmbedLayerNorm CheckInputs updated
* fixing build error
* fixing build error
* fixing build error