* Specify list/map capacity when initializing where possible
- This really depends on the use case, but in some cases the array/map resizing can be slightly costly, there is effectively no downside setting the initial capacity for a collection if we know for sure its final size
* Supply list/map capacity when initializing where possible
- This really depends on the use case, but in some cases the array/map resizing can be slightly costly, there is effectively no downside setting the initial capacity for a collection if we know for sure its final size
- Introduce an extra utility to help creating maps with expected capacity
* Move utility function to OrtUtil and drop MapUtil, also add Java doc to method
* Move test to the right class
Java side parts for configuring CUDA and TensorRT.
Adding tests for CUDA and TensorRT. Refactoring library loading logic as provider options need to have their shared library loaded before they can be constructed.
* Add android package build settings for full build
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* squashed commit for standalone tvm execution provider
* critical fix for correct python build with stvm ep
* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG
* updates and fixes
* update parsing of stvm provider options
* add support of external data for onnx model
* add conditional dump of subgraphs
* remove unused code
* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API
* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)
* add fp16
* add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options
* fix license text in header. fix log format
* small fixes
* fix issues from flake8
* remove model proto construction from GetCapability
* reserve memory for vector of DLTensors
* add simple tutorial for STVM EP
* STVM docs
* jroesch/tvm -> apache/tvm
* remove dead code, unneccessary logs and comments
* fix in readme
* improve tutorial notebook
* tvm update
* update STVM_EP.md
* fix default value
* update STVM_EP.md
* some TODOs for the future development
* shorten long lines
* add hyperlink to STVM_EP.md
* fix Linux CI error
* fix error in csharp test
Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* re-hipify all rocm EP sources
* fix all other files affected by re-hipify
* add cuda_provider_factory.h to amd_hipify.py
* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration
* Fix ReduceConsts template specialization introduced in #9101.
Fixes the error when building for ROCm 4.3.1:
error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)
* fix flake8 error in amd_hipify.py
* speed up hipify with concurrent.futures
* flake8 fix in amd_hipify.py
* First iteration of making cuda a shared provider.
Separated out shared OpKernel change, so doing this to merge with that change.
* More cuda shared library refactoring
* More cuda shared library refactoring
* More build options tested, converted the training ops over.
* Fix merge breaks
* Fix submodules
* Fix submodules
* Fix submodules
* Fix python
* Fix compile errors
* Duplicate symbol fix
* Test fix for ROCM provider
* Another ROCM test workaround
* ROCM Build Test
* ROCM build fix
* ROCM
* ROCM
* ROCM
* ROCM
* ROCM
* ROCM test
* Reduce header dependencies
* Remove redundant namespace
* Test fix for linux
* Fix linux build
* Fix Eigen build error
* Fix unused parameter warning
* Test link error
* Another linker test
* Linker test
* Linker test
* Another test
* Another build test
* Fix linux link error
* Build test
* Fix control flow ops to use common base class with core code
* Remove extra qualifiers
* Fix template syntax for linux
* Fix cuda memory leak
* Fix pybind
* Test disabling cast
* Cleanup
* Restore cuda in test
* Remove more header dependencies
* Test not adding cuda provider to session
* Make GetProviderInfo_CUDA throw
* No-op cuda provider creation
* Fix some setup issues
* Fix memory cleanup on unload
* Diagnostics
* Don't unload library
* Add diagnostics
* Fix deleting registry at right time.
* Test disabling profiler
* Fix merge break
* Revert profiler change
* Move unloading of shared providers into Environment
* Free more global allocations before library unloads
* Add more diagnostics
* Move unloading back to the OrtEnv as there are multiple Environments created during a session.
Remove some library dependencies for tests.
* Fix more cmake files
* ERROR -> WARNING
* Fix python shutdown
* Test not using dml in pipeline
* Change python version and disable dml
* Update python version
* Test adding unload method for shared providers
* Disable DLL test
* Python test
* Revert "Python test"
This reverts commit c7ec2cfe98.
* Revert "Disable DLL test"
This reverts commit e901cb93aa.
* Revert "Test adding unload method for shared providers"
This reverts commit c427b78799.
* Point to RyanWinGPU
* Revert python version
* Fix id_to_allocator_map
* Another python exit test
* Remove extra debug messages
Try a more clean python shutdown through DllMain
* Revert DllMain idea, it didn't work
* Merge conflicts
* Fix merge with master issues.
* Comments
* Undo edit to file
* Cleanup + new training ops
* Revert yml changes
* Fix another merge error
* ROCM fix
* ROCM fix v2
* Put back Linux hack, it is necessary
* Stupid fixes
* Fix submodule out of sync
* ROCM fix 3
* ROCM 4
* Test java fix
* Fix typos
* Java test on my VM
* Fix build error
* Spotless fix
* Leave temp file around to load properly
* Fix cleanup on exit
* Fix break
* Java comments
* Remove LongformerAttentionBase workaround
* Spotless fix
* Switch yml back to regular build pool
* Revert "Switch yml back to regular build pool"
This reverts commit be35fc2a5a.
* Code review feedback
* Fix errors due to merge
* Spotless fix
* Fix minimal build
* Java fix for non cuda case
* Java fix for CPU build
* Fix Nuphar?
* Fix nuphar 2
* Fix formatting
* Revert "Remove LongformerAttentionBase workaround"
This reverts commit 648679b370.
* Training fix
* Another java fix
* Formatting
* Formatting
* For orttraining
* Last orttraining build fix...
* training fixes
* Fix test provider error
* Missing pass command
* Removed in wrong spot
* Python typo
* Python typos
* Python crash on exit, possibly due to unloading of libraries.
* Remove test_execution_provider from training build
Only enable python atexit on windows
Remove assert on provider library exit
* Still can't unload providers in python, alas.
* Disable Nvtx temporarily
* MPI Kernels for Training
* MPI Kernels part 2
* Patch through INcclService
* Oops, wrong CMakeLists
* Missing namespace
* Fix missing ()
* Move INcclService::GetInstance around to link nicer
* Missing }
* Missing MPI libraries for Cuda
* Add extra GetType functions used by MPI
* Missing Nccl library
* Remove LOGS statements as a test
* Add in a couple more missing GetType methods
* Update comments
* Missed a logging reference in mpi_context.h
* Convert aten_op to shared (due to marge with master)
* Test moving DistributedRunContext instance into shared provider layer
(with purpose error to verify it's being built properly)
* Test passed, now with fix
* Missing static
* Oops, scope DistributedRunContext to just NCCL
* Merge related issues and code review feedback.
* Merge error
* Bump to rel-1.9.1 (#7684)
* Formatting
* Code review feedback for Java build on non Windows
* Remove cupti library dependency from core library
* Test Java pipeline fix
* Linux build fix
* Revert "Linux build fix"
This reverts commit a73a811516.
* Revert "Remove cupti library dependency from core library"
This reverts commit 6a889ee8bf.
* Packaging pipeline fixes to copy cuda shared provider for tensorrt & standard packages
* Add cuda to Tensorrt nuget package
* onnxruntime_common still has a cuda header dependency
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
* test
* [gwang] make cmake compile work
* [gwang] enble build apks
* some build update
* add simple sigmoid test android project and cmake
* add build.py
* refine and remove unused import lib
* address CR comments
* remove unnecessary files
* add README.md
* minor update
* remove
* minor change
* fix ci failure and minor update
* fix typo in project folder
* remove
* remove and minor update
* refine
* minor fix
* fix
* fix typo
* add gradle spotlessApply task to fix CI failure
* fix
* enable spotlessApply in build gradle
* revert some changes
* minor fix
* run spotless apply for format
* address CR comments and fix CI version and format
* refine
* Refine
* address comments
* refine
* refine
* modify
* reformat
* resolve version conflicts
* minor update
* minor update
* address comments
* minor update
Co-authored-by: Guoyu Wang <wanggy@outlook.com>
Add providers for CoreML, ROCM, NNAPI, ArmNN
Adding the structs for OrtCUDAProviderOptions and OrtOpenVINOProviderOptions
Updating NNAPI flags.
Adding the new CoreML flag.
Adding hooks to the build system to tell Java about the new providers.
* Updates for Gradle 7.
* Adding support for OrtThreadingOptions into the Java API.
* Fixing a typo in the JNI code.
* Adding a test for the environment's thread pool.
* Fix cuda test, add comment to failure.
* Updating build.gradle
* Adding Java support for getAvailableProviders, addFreeDimensionOverrideByName, disablePerSessionThreads and getProfilingStartTimeNs.
* Fixing copyright years, running spotless and adding javadoc and an accessor to OrtProvider.
* Renaming OrtSession.getProfilingStartTimeInNs.
* Removing ngraph as it's been deprecated.
* Rearranging checks in onnxruntime_mlas.cmake to pickup Apple Silicon.
On an M1 Macbook Pro clang reports:
$ clang -dumpmachine
arm64-apple-darwin20.1.0
So the regex check needs to look for "arm64" first, as otherwise it
matches 32-bit ARM and you get NEON compilation failures.
* Adding Java side library loading support for Apple Silicon (and other aarch64 architectures).
* Adding Qgemm fix from @tracysh
* Fixes the java packaging on Windows.
* Missed a check in the java platform detector.
* Remove nGraph Execution Provider
Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice
**Deprecation Notice**
| | |
| --- | --- |
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |
Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.
Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.
* Remove nGraph Licence info from ThirdPartyNotices.txt
* Use simple Test.Run() for tests without EP exclusions
To be consistent with rest of test code.
* Remove nGraph EP functions from Java code
* Add options for nnapi ep
* Add nnapi flags test
* add comments
* Add flag comments
* Make the flags bitset const
* Fix build break
* Add stub changes to java and c# api
* Fix java related build break
* Fix java build break
* Switch to bit flags instead of bitset
Fixed static IS_ANDROID detection
final static IS_ANDROID is causing an Error Unsupport arch:aarch64, so removed IS_ANDROID & replaced with IS_ANDROID with isAndroid().