This fixes #1034: Can't Create Model Sessions on Different GPU
The root cause of the bug is that the CUDA execution provider uses thread_local storage to hold its per-thread context and allocator, so two CUDA execution providers running on the same thread conflict with each other. The fix adds a std::unordered_map to differentiate EPs in the same thread.
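
The idea behind the fix can be sketched as follows. This is a minimal illustration, not the actual onnxruntime code: `PerThreadContext` and `ExecutionProvider` are hypothetical stand-ins, and keying the map by the provider's address is an assumption made for the example.

```cpp
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for the per-thread state (handles, allocator, ...).
struct PerThreadContext {
  explicit PerThreadContext(int device) : device_id(device) {}
  int device_id;
};

// Hypothetical stand-in for an execution provider bound to one device.
class ExecutionProvider {
 public:
  explicit ExecutionProvider(int device_id) : device_id_(device_id) {}

  // Returns this provider's context for the calling thread,
  // creating it on first use.
  PerThreadContext& GetPerThreadContext() const {
    // One map per thread, keyed by provider instance (an assumption of this
    // sketch), so two providers on the same thread get separate slots
    // instead of sharing a single thread_local value.
    thread_local std::unordered_map<const ExecutionProvider*,
                                    std::unique_ptr<PerThreadContext>> contexts;
    auto& slot = contexts[this];
    if (!slot) {
      slot.reset(new PerThreadContext(device_id_));
    }
    return *slot;
  }

 private:
  int device_id_;
};
```

With a single thread_local slot, a second provider on the same thread would reuse the first provider's context. In this sketch each provider gets its own entry. A real implementation would also need to drop the entry when a provider is destroyed, which the sketch leaves out.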
* Update MaxPool & AveragePool to support opset 10
* fix build issue
* still use cudnn for MaxPool if dilation is not set or is the default of 1.
* fix build issue
* add version filter to failed tests
* exclude test from backend
* exclude shrink from opset 9
* fix compile err
* exclude certain version of constant shape
* enable flatten test
* fix compile err
* comment mvn test
* disable constantofshape test in x86
* disable x86 test
* get model version from imported opset
* test linux x86 case
* disable nonzero opset 10
* make mutex const
* test filter by commit id
* adjust substr offset
* Limit test platform
* remove change impacting TFModelInfo.h
* refactoring
* test x86 pipeline with filter
* add comment
* restrict version extraction on non-win
* add tag
* exclude case from backend test
* remove dup
* make script runnable
* hard code absolute path
* refactor log
* fix x86 compile err
* sync with latest tensorrt
* switch to regex
* fix cpu pipeline err
* test filter
* disable nonzero from all versions
Not needed any more because we don't build the date library.
And Sheil says: "It’s a little bit intrusive for callers to be forced onto cpp14 just because they are consuming onnxruntime."
* refactor the EP code.
* remove unnecessary lock.
* fix the comment to state that KernelRegistryManager is not thread safe.
* clarify that the APIs to add custom ops in InferenceSession are not thread safe.
* CUDA opset9: Update Cast/MatMul version, add Erf
* Address CR
* More fixes on node placement logic
* Fix typo
* Update CUDA ops Gemm and BatchNormalization to be registered in opset10
* Remove Relu if followed by Clip. Update Clip 'min' if necessary.
Add unit test.
* Rename to match behaviour a little better.
* Update to match latest RewriteRule interface
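
The Relu/Clip rewrite above relies on Relu followed by Clip being equivalent to a single Clip whose 'min' is raised to at least 0. A minimal sketch of that equivalence, using scalar helpers and a hypothetical `ClipAttrs` struct rather than the actual graph transformer code:

```cpp
#include <algorithm>

// Hypothetical holder for Clip's 'min' and 'max' values.
struct ClipAttrs {
  float min;
  float max;
};

// Scalar reference semantics of the two ops.
inline float Relu(float x) { return std::max(x, 0.0f); }
inline float Clip(float x, const ClipAttrs& a) {
  return std::min(std::max(x, a.min), a.max);
}

// Attributes of the single Clip that replaces Relu -> Clip.
// 'min' only changes when it was below 0; 'max' is untouched.
inline ClipAttrs FuseReluIntoClip(const ClipAttrs& clip) {
  return ClipAttrs{std::max(clip.min, 0.0f), clip.max};
}
```

For example, Relu followed by Clip(min=-1, max=6) becomes Clip(min=0, max=6), while Relu followed by Clip(min=1, max=6) keeps min=1, since the Relu has no effect there.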
* Add version and latest commit id to ORT Server
* Update cmake
* Change build id to build number
* Use target_compile_definitions instead of add_definitions
More C++ API improvements and cleanup
Add templates to tensor creation
Add run method that allows preallocated outputs
Simplify CreateTensor<T> to multiply by sizeof(T)
Convert io_types code
Optimize away vector copies in Session::Run
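
The CreateTensor<T> change can be pictured roughly as below. This is only a sketch under assumptions: `RawTensor` and `CreateTensorRaw` are hypothetical names standing in for an untyped, byte-oriented creation call, and are not the real C++ API.

```cpp
#include <cstddef>

// Hypothetical untyped tensor view; not the onnxruntime type.
struct RawTensor {
  void* data;
  std::size_t byte_count;
};

// Hypothetical untyped factory that expects the buffer length in bytes.
inline RawTensor CreateTensorRaw(void* data, std::size_t byte_count) {
  return RawTensor{data, byte_count};
}

// Typed wrapper: the caller passes an element count and the byte length
// is derived here by multiplying by sizeof(T).
template <typename T>
RawTensor CreateTensor(T* data, std::size_t element_count) {
  return CreateTensorRaw(data, element_count * sizeof(T));
}
```

Keeping the sizeof(T) multiplication inside the template means callers deal only in element counts and cannot pass a mismatched byte length.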
* Improve TensorRT GetCapability Accuracy
* Update onnxruntime_providers.cmake
* made changes based on feedback
* update unit tests for TensorRT
* update onnx-tensorrt submodule to v5.0 branch
* remove unnecessary comments
* convert int32 to int64 at inferencing output
* add more data types in compute
* change returns in compute
* use StatusCode as return in compute
* CUDA CPU/GPU sync optimization
Even though the CUDA device is capable of handling certain ops, it may be better to leave them on the CPU, especially for dynamic shape computations starting from Shape.
* Fix TensorRT test crash when fused graph may have null node in topological sort
* As we consistently use non-const reference for modifiable arguments that cannot be null, update the conventions to reflect that.
Add a note on qualifying 'auto' to make the intent clearer and make accidental copies easier to notice.
* Address PR comment by adding a statement around disabling copy/assignment/move for new classes until needed.
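
The three conventions touched above can be illustrated with a short sketch. All names here (`AppendSuffix`, `TotalLength`, `ExampleCache`) are hypothetical and exist only for the example:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Modifiable argument that cannot be null: take a non-const reference.
inline void AppendSuffix(std::vector<std::string>& names,
                         const std::string& suffix) {
  // 'auto&' shows the elements are modified in place; a plain 'auto'
  // here would silently modify a copy of each string.
  for (auto& name : names) {
    name += suffix;
  }
}

inline std::size_t TotalLength(const std::vector<std::string>& names) {
  std::size_t total = 0;
  // 'const auto&' shows read-only access without copying.
  for (const auto& name : names) {
    total += name.size();
  }
  return total;
}

// New class: copy, assignment and move stay disabled until needed.
class ExampleCache {
 public:
  ExampleCache() = default;
  ExampleCache(const ExampleCache&) = delete;
  ExampleCache& operator=(const ExampleCache&) = delete;
  ExampleCache(ExampleCache&&) = delete;
  ExampleCache& operator=(ExampleCache&&) = delete;
};

static_assert(!std::is_copy_constructible<ExampleCache>::value,
              "copying is disabled");
static_assert(!std::is_move_constructible<ExampleCache>::value,
              "moving is disabled");
```

Spelling out `auto&` or `const auto&` makes an unintended copy stand out in review, and the deleted special members turn accidental copies of a new class into compile errors.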