* consume ONNX 1.12.1 to prevent vulnerability issue while loading external tensors
* update ONNX 1.12.1
* test updated PR
* use official rel-1.12.1 commit
* Warn on node EP fallback from preferred provider
* Clarify with comment
* Update to ORT's weird coding style for ragged parameter wrap
* Android build error: unused parameter ‘providers’
* Update logic to be more robust
* Updates from Pranav's feedback about messaging to rerun with verbose and respecting explicit vs implicit EP addition. Also merge from main.
* brace style patch up
* Update with feedback from Pranav and Scott McKay
* Restore node_placement_set after realizing it only applies when is_verbose is true
* Fix build warning on Android
* Renamed to node_placement_provider_set per Pranav's suggestion
1. Move the Linux ARM64 part of python packaging pipeline to a real ARM64 machine pool
2. Refactor the Linux CPU build jobs of python packaging pipeline to two parts: build and test. The test part will be exempted from Cyber EO compliance requirements as it won't affect the final bits we publish. This refactoring is to reduce dependencies in the build part. For example, this PR remove pytorch from the build dependencies.
3. Combine DML nuget packaging pipeline with "Zip-Nuget-Java-Nodejs Packaging Pipeline" as they all produce ORT nuget packages. Also, publish DML nuget packages and ORT GPU nuget packages to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly feed.
* Changes for AIX compilation to get CPU of running thread. hz is internal variable in AIX, hence changing to hz1 in window_functions.cc so that all OS shall work
Co-authored-by: madurais <root@telesto10.in.ibm.com>
Co-authored-by: tvkai <vamshikrishna@in.ibm.com>
* Template datatype for SoftmaxWithRawMaskSmallKernel in ROCm EP
* Remove valid_items usage from SoftmaxWithRawMaskSmallKernel for ROCm EP
The kernel already masks off invalid items and this gives a much
faster implementation in hipCUB.
* Update accumulator type in ROCm EP for SoftmaxWithRawMaskSmallKernel
Hard code accumulator to fp32 for hipCUB in indicated kernel.
* Reset casting to old behavior
* Document steps to optimize SoftMax kernel on ROCm EP
Usage of the hipCUB valid_items interface on reduction operations
has a significant performance impact. Masking all thread data to
avoid need to use the valid_items interface to hipCUB.
* Fix bug in pybind get_all_operator_schema due to premature reference dropping
* Add updated operator kernels markdown table
* Update build.py to include documentation generation for DML operators too
* Update GPU pipeline to include DML in the build to so operators can be generated.
* Use a separate pipeline stage, feedback from Changming and Scott
* Appease annoying Python linter
* Add onnxruntime_BUILD_UNIT_TESTS=OFF and remove stale --use_dml in cuda stage
* Use CUDA callback to release deferred-release buffers
Polishment
* Minor improvements.
1. Reorder a if-else so that frequent cases are checked first.
2. More documents.
* Fix tests.
Previously, in CUDAExecutionProvider::OnRunStart, we call
GetPerThreadContext in
auto& current_deferred_release_event = GetPerThreadContext().GetCurrentDeferredReleaseEvent();
so that a CUDAExecutionProvider always owns an active PerThreadContext
and the ReleasePerThreadContext in CUDAExecutionProvider::OnRunEnd
is always valid. However, this isn't true after we drop event-
based deferred-release code, so we need to check if
CUDAExecutionProvider really owns PerThreadContext than call
ReleasePerThreadContext if yes.
* Follow up for AMD GPU and improve CUDA part's return value.
Currently, CUDA hardware is not available to be leveraged by build
during `docker build`. because of that, CUDA capable hardware would not
have CUDA support
This PR adds an env varf ONNXRUNTIME_FORCE_CUDA in which it allows CUDA
extensions to be compiled even when CUDA support is not detected.
* drop nuphar code and configs
* refactor test case
* format python
* remove nuphar from training test
* remove commented nuphar logics
* restore llvm setting
* drop nuphar ci
* fix compile err
* fix compile err
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind
* Add overloaded constructor for KVP, UT still in progress
* Fix class member access in pybind, fix unit test
* Resolve linter warnings
* Improve formatting
* Simplify UT
* Fix linter formatting
Co-authored-by: Peter Mcaughan <petermca@microsoft.com>
Fix a few obvious issues:
(1) bert_perf_test.py create session without provider in line 65.
(2) compare_bert_results.py miss a parameter in create_session in line 37
(3) onnx_exporter.py returns value mismatch in lines 667, 690.
(4) remove some imports not used in the scripts.
(5) fusion_utils need not print "Removed 0 cast nodes" or "Removed 0 Identity nodes"...
(6) update requirements for numpy version since gpt2 parity tool use equal_nan in numpy v1.19+