* Use CUDA callback to release deferred-release buffers
Polishment
* Minor improvements.
1. Reorder a if-else so that frequent cases are checked first.
2. More documents.
* Fix tests.
Previously, in CUDAExecutionProvider::OnRunStart, we call
GetPerThreadContext in
auto& current_deferred_release_event = GetPerThreadContext().GetCurrentDeferredReleaseEvent();
so that a CUDAExecutionProvider always owns an active PerThreadContext
and the ReleasePerThreadContext in CUDAExecutionProvider::OnRunEnd
is always valid. However, this isn't true after we drop event-
based deferred-release code, so we need to check if
CUDAExecutionProvider really owns PerThreadContext than call
ReleasePerThreadContext if yes.
* Follow up for AMD GPU and improve CUDA part's return value.
Currently, CUDA hardware is not available to be leveraged by build
during `docker build`. because of that, CUDA capable hardware would not
have CUDA support
This PR adds an env varf ONNXRUNTIME_FORCE_CUDA in which it allows CUDA
extensions to be compiled even when CUDA support is not detected.
* drop nuphar code and configs
* refactor test case
* format python
* remove nuphar from training test
* remove commented nuphar logics
* restore llvm setting
* drop nuphar ci
* fix compile err
* fix compile err
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind
* Add overloaded constructor for KVP, UT still in progress
* Fix class member access in pybind, fix unit test
* Resolve linter warnings
* Improve formatting
* Simplify UT
* Fix linter formatting
Co-authored-by: Peter Mcaughan <petermca@microsoft.com>
Fix a few obvious issues:
(1) bert_perf_test.py create session without provider in line 65.
(2) compare_bert_results.py miss a parameter in create_session in line 37
(3) onnx_exporter.py returns value mismatch in lines 667, 690.
(4) remove some imports not used in the scripts.
(5) fusion_utils need not print "Removed 0 cast nodes" or "Removed 0 Identity nodes"...
(6) update requirements for numpy version since gpt2 parity tool use equal_nan in numpy v1.19+
* Adding Split Fusion
* Make changes to comments
* Format files and change typo
* Format files and change typo
* Format files and change typo
* Format files and change typo
* Format file
* Format files
* Format files
* Format files
* Format files
* Update stale.yml
Change the number of days of inactivity before an issue becomes stale from 60 to 5 and the number of days of inactivity before a stale issue is closed from 7 to 5. Update the exempt labels based on the redefined set of GH labels.
* Implement stale.yml feedback.
**Description**: Create codeql.yml to replace LGTM
**Motivation and Context**
LGTM.com is shutting down and moving to github code scanning. This PR enables github code scanning.
cpp and c# support will be added in a separate pr.
**Description**: Remove reference to the deprecated variable in `torch.onnx.symbolic_helper` pytorch/pytorch#81953
- Removed unused imports
- Changed BANNED_AUTOGRAD_FUNCTION_NAMES to a frozenset
**Motivation and Context**
The cast_pytorch_to_onnx variable is deprecated and removed in `torch.onnx.symbolic_helper`. Since there is still a need for converting scalar types to onnx type, I copied the mapping to `_CAST_PYTORCH_TO_ONNX` in the module.
* upgrade emsdk to 3.1.19
* fix build break
* ignore '-Wunused-but-set-variable' in eigen
* add malloc and free in exported functions
* EXPORTED_FUNCTIONS