This fixes #1034: Can't Create Model Sessions on Different GPU
The root cause of the bug is that the CUDA execution provider (EP) uses thread_local storage for its per-thread context and allocator. When two CUDA EPs run on the same thread, they share that single thread_local slot and conflict. The fix adds a std::unordered_map so that different EPs on the same thread each get their own per-thread state.
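A minimal sketch of the idea, assuming a simplified EP: the thread_local storage becomes a per-thread map keyed by the EP instance, so two EPs used from the same thread no longer share one slot. `CudaLikeEP`, `PerThreadContext` and `GetPerThreadContext` are hypothetical names, not the onnxruntime classes.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for the per-thread state a CUDA EP keeps
// (handles, allocator, ...). Here it only records the device id.
struct PerThreadContext {
  int device_id;
  explicit PerThreadContext(int id) : device_id(id) {}
};

// Hypothetical, simplified execution provider.
class CudaLikeEP {
 public:
  explicit CudaLikeEP(int device_id) : device_id_(device_id) {
    assert(device_id >= 0 && "device ids are non-negative");
  }

  // Returns this EP's context for the calling thread, creating it on first use.
  PerThreadContext& GetPerThreadContext() const {
    // One map per thread (a function-local thread_local is shared by all
    // instances on that thread). Keying by EP instance means two EPs used
    // from the same thread get separate entries instead of one shared slot.
    thread_local std::unordered_map<const CudaLikeEP*,
                                    std::unique_ptr<PerThreadContext>> contexts;
    auto& slot = contexts[this];
    if (!slot) slot = std::make_unique<PerThreadContext>(device_id_);
    return *slot;
  }

 private:
  int device_id_;
};
```

Keying by `this` is the simplest way to tell EPs apart. A real implementation also has to drop an EP's entries when the EP is destroyed; otherwise a new EP allocated at the same address could pick up a stale context. This sketch does not handle that.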
* Update MaxPool & AveragePool to support opset 10
* fix build issue
* still use cuDNN for MaxPool if dilations are not set or are all the default value of 1.
* fix build issue
* add version filter to failed tests
* exclude test from backend
* exclude shrink from opset 9
* fix compile err
* exclude certain versions of ConstantOfShape
* enable flatten test
* fix compile err
* comment out MVN test
* disable constantofshape test in x86
* disable x86 test
* get model version from imported opset
* test linux x86 case
* disable nonzero opset 10
* make mutex const
* test filter by commit id
* adjust substr offset
* Limit test platform
* remove change impacting TFModelInfo.h
* refactoring
* refactoring
* test x86 pipeline with filter
* add comment
* restrict version extraction on non-win
* restrict version extraction on non-win
* add tag
* exclude case from backend test
* remove dup
* remove dup
* make script runnable
* hard code absolute path
* refactor log
* fix x86 compile err
* fix x86 compile err
* fix x86 compile err
* sync with latest TensorRT
* switch to regex
* fix cpu pipeline err
* test filter
* disable nonzero from all versions
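The MaxPool rule in the list above (keep the cuDNN path when dilations are unset or all equal to 1) can be sketched as a small predicate. The function name and the use of an empty vector to mean "attribute not set" are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical predicate: may MaxPool stay on the cuDNN path?
// An empty vector stands for "dilations attribute not set"; std::all_of
// returns true for an empty range, so that case also keeps cuDNN.
bool CanUseCudnnMaxPool(const std::vector<int64_t>& dilations) {
  return std::all_of(dilations.begin(), dilations.end(), [](int64_t d) {
    assert(d > 0 && "dilations must be positive");
    return d == 1;
  });
}
```

Any non-default dilation makes the predicate false, at which point a non-cuDNN kernel (not shown) would be used instead.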
Not needed any more, because we don't build the date library.
Also, as Sheil says: "It’s a little bit intrusive for callers to be forced onto cpp14 just because they are consuming onnxruntime."
* refactor the EP code.
* remove unnecessary lock.
* fix the comment to state that KernelRegistryManager is not thread-safe.
* clarify that the InferenceSession APIs for adding custom ops are not thread-safe.
* CUDA opset9: Update Cast/MatMul versions, add Erf
* Address CR
* More fixes on node placement logic
* Fix typo
* Update CUDA ops Gemm and BatchNormalization to be registered in opset10
* Remove Relu if followed by Clip. Raise Clip's 'min' to 0 if it is negative.
Add unit test.
* Rename to match behaviour a little better.
* Update to match latest RewriteRule interface
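Why dropping the Relu is safe: with Relu(x) = max(x, 0) and Clip(x) = min(max(x, min), max), we get Clip(Relu(x)) = min(max(x, max(min, 0)), max), which is the same Clip with its lower bound raised to max(min, 0). The sketch below shows that update; `ClipParams` and `FoldReluIntoClip` are hypothetical, and it assumes min/max are plain float attributes as in pre-opset-11 Clip.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical holder for a Clip node's bounds (opset < 11 style,
// where min/max are attributes rather than inputs).
struct ClipParams {
  float min;
  float max;
};

// Clip(Relu(x), min, max) == Clip(x, max(min, 0), max) because
// max(max(x, 0), min) == max(x, max(min, 0)). So the Relu can be removed
// once Clip's lower bound is raised to at least 0.
ClipParams FoldReluIntoClip(ClipParams clip) {
  clip.min = std::max(clip.min, 0.0f);
  return clip;
}

float Relu(float x) { return std::max(x, 0.0f); }

float Clip(float x, const ClipParams& p) {
  return std::min(std::max(x, p.min), p.max);
}
```

The identity holds for any bounds, so a Clip whose 'min' is already non-negative is left unchanged and only the Relu is removed.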
* Add version and latest commit id to ORT Server
* Update cmake
* Change build id to build number
* Use target_compile_definitions instead of add_definitions
More C++ API improvements and cleanup
Add templates to tensor creation
Add run method that allows preallocated outputs
Simplify CreateTensor<T> to multiply by sizeof(T)
Convert io_types code
Optimize away vector copies in Session::Run
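A rough sketch of what "CreateTensor<T> multiplies by sizeof(T)" buys the caller: they pass an element count and the byte size is derived from the element type, so the two cannot disagree. `TensorView`, `ElementCount` and this `CreateTensor` are hypothetical stand-ins, not the onnxruntime C++ API signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal tensor view over caller-owned memory.
struct TensorView {
  void* data;
  std::size_t byte_size;
  std::vector<int64_t> shape;
};

// Product of the dimensions; dynamic (negative) dims are not handled here.
inline std::size_t ElementCount(const std::vector<int64_t>& shape) {
  std::size_t count = 1;
  for (int64_t dim : shape) {
    assert(dim >= 0 && "dynamic or negative dims are not handled");
    count *= static_cast<std::size_t>(dim);
  }
  return count;
}

// The caller passes an element count; the byte size comes from sizeof(T).
template <typename T>
TensorView CreateTensor(T* data, std::size_t element_count,
                        std::vector<int64_t> shape) {
  assert(element_count == ElementCount(shape));
  return TensorView{data, element_count * sizeof(T), std::move(shape)};
}
```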
* Improve TensorRT GetCapability Accuracy
* Update onnxruntime_providers.cmake
* made changes based on feedback
* update unit tests for TensorRT
* update onnx-tensorrt submodule to v5.0 branch
* remove unnecessary comments
* convert int32 to int64 for inference outputs
* add more data types in compute
* change returns in compute
* use StatusCode as return in compute
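The int32-to-int64 output conversion can be sketched as an element-wise widening copy, assuming, as the commit note implies, that the TensorRT engine hands back int32 data for outputs the ONNX model declares as int64. `WidenInt32ToInt64` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy `count` int32 values from the engine's staging buffer into the
// int64 output buffer. Widening is lossless: every int32 fits in int64.
void WidenInt32ToInt64(const int32_t* src, int64_t* dst, std::size_t count) {
  assert((src != nullptr && dst != nullptr) || count == 0);
  for (std::size_t i = 0; i < count; ++i) {
    dst[i] = static_cast<int64_t>(src[i]);
  }
}
```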
* CUDA CPU/GPU sync optimization
Even when the CUDA device can handle certain ops, it may be better to leave them on the CPU, especially dynamic-shape computations that start from a Shape node, to avoid extra CPU/GPU copies and synchronization.
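One way to picture this optimization: walk forward from each Shape node and keep cheap shape-manipulation consumers on the CPU as long as all of their inputs are already CPU-resident. This is a hypothetical simplification, not ORT's placement code; `Node`, `CpuPreferredNodes` and the op list in `kShapeOps` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical minimal graph node (not the onnxruntime Graph API).
struct Node {
  std::string op_type;
  std::vector<std::size_t> inputs;     // indices of producer nodes
  std::vector<std::size_t> consumers;  // indices of consumer nodes
};

// Seed with every Shape node (it only reads metadata), then propagate
// forward to consumers that are cheap shape ops and whose inputs are all
// already CPU-preferred, so no new device copy is introduced.
std::unordered_set<std::size_t> CpuPreferredNodes(const std::vector<Node>& g) {
  static const std::unordered_set<std::string> kShapeOps = {
      "Shape", "Gather", "Unsqueeze", "Squeeze", "Concat", "Slice", "Cast"};
  std::unordered_set<std::size_t> cpu;
  std::queue<std::size_t> frontier;
  for (std::size_t i = 0; i < g.size(); ++i) {
    if (g[i].op_type == "Shape") {
      cpu.insert(i);
      frontier.push(i);
    }
  }
  while (!frontier.empty()) {
    std::size_t n = frontier.front();
    frontier.pop();
    for (std::size_t c : g[n].consumers) {
      assert(c < g.size());
      if (cpu.count(c) || !kShapeOps.count(g[c].op_type)) continue;
      bool all_inputs_on_cpu = true;
      for (std::size_t in : g[c].inputs) all_inputs_on_cpu &= cpu.count(in) > 0;
      if (all_inputs_on_cpu) {  // re-checked each time a producer is marked
        cpu.insert(c);
        frontier.push(c);
      }
    }
  }
  return cpu;
}
```

Requiring all inputs to be on the CPU is what keeps this from adding copies: a Gather whose data input lives on the GPU stays on the GPU. Initializers (constant inputs) are not modelled; a fuller version would treat them as CPU-resident.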
* Fix TensorRT test crash when the fused graph may contain a null node in its topological sort
* We consistently use non-const references for modifiable arguments that cannot be null, so update the conventions to reflect that.
Add a note on qualifying 'auto' (e.g. const auto&) to make the intent clearer and accidental copies easier to notice.
* Address PR comment by adding a statement about disabling copy/assignment/move for new classes until needed.
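The conventions above, illustrated in a short C++ snippet: a non-null modifiable argument is passed by non-const reference, 'auto' is qualified in a range-for to avoid accidental copies, and a new class starts with copy and move operations deleted. `AppendGreeting`, `NameRegistry` and `TotalLength` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Modifiable argument that must not be null: non-const reference, not a
// pointer, so "null" is simply not a representable state.
void AppendGreeting(std::vector<std::string>& out, const std::string& name) {
  assert(!name.empty());
  out.push_back("hello " + name);
}

// New class: copy/assignment/move disabled until a real need appears.
class NameRegistry {
 public:
  NameRegistry() = default;
  NameRegistry(const NameRegistry&) = delete;
  NameRegistry& operator=(const NameRegistry&) = delete;
  NameRegistry(NameRegistry&&) = delete;
  NameRegistry& operator=(NameRegistry&&) = delete;

  std::vector<std::string>& Names() { return names_; }

 private:
  std::vector<std::string> names_;
};

static_assert(!std::is_copy_constructible<NameRegistry>::value, "no copies");
static_assert(!std::is_move_constructible<NameRegistry>::value, "no moves");

// 'const auto&' reads each element without copying; a plain 'auto' here
// would silently copy every string.
std::size_t TotalLength(const std::vector<std::string>& items) {
  std::size_t total = 0;
  for (const auto& item : items) total += item.size();
  return total;
}
```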
* Update onnx_backend_test_series.py to allow specifying a single test to run.
Python unittest filtering applies to a test method, not a test name, so it can't be used directly.
* Clarify help message.
First, we don't need this line of code.
Second, it may change the compiler path unintentionally. For example, you want to use gcc from /usr/lib/ccache/gcc, but CMake picks it up from /usr/bin instead.
According to the schema below, the Loop inputs M and cond can be either a scalar (rank 0) or a 1-d tensor (rank 1), and depending on the actual rank, subsequent nodes such as Gather may produce different outputs. Therefore the ORT Loop operator cannot hard-code these inputs to be rank 1. This PR also includes some fixes for test failures caused by the updated Conv shape inference in ONNX (onnx/onnx#1988).
    .Input(
        0,
        "M",
        "A maximum trip-count for the loop specified at runtime. Optional."
        " Pass empty string to skip.",
        "I",
        OpSchema::Optional)
    .Input(
        1,
        "cond",
        "A boolean termination condition. Optional. Pass empty string to skip.",
        "B",
        OpSchema::Optional)
    ...
    .TypeConstraint(
        "I",
        {"tensor(int64)"},
        "tensor of int64, which should be a scalar.")
    .TypeConstraint(
        "B",
        {"tensor(bool)"},
        "tensor of bool, which should be a scalar.")
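A sketch of reading M or cond without hard-coding rank 1: accept either a rank-0 scalar or a rank-1 tensor holding exactly one element, and reject anything else. `SimpleTensor` and `ReadScalarInput` are hypothetical; ORT's Loop kernel uses its own tensor types.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical tensor stand-in: a shape plus a flat buffer.
template <typename T>
struct SimpleTensor {
  std::vector<int64_t> shape;  // {} for a scalar (rank 0), {1} for 1-d
  std::vector<T> data;
};

// Return the single value of a Loop 'M' or 'cond' input, accepting both a
// rank-0 scalar and a rank-1 tensor with exactly one element.
template <typename T>
T ReadScalarInput(const SimpleTensor<T>& t) {
  const bool is_scalar = t.shape.empty();
  const bool is_single_element_1d = t.shape.size() == 1 && t.shape[0] == 1;
  if (!(is_scalar || is_single_element_1d) || t.data.size() != 1) {
    throw std::invalid_argument("expected a scalar or a 1-element 1-d tensor");
  }
  return t.data[0];
}
```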
* add x86 legs to ci
* minor update
* update platform from x86 to Win32
* remove --use_mklml from x86 build
* add win x64 mklml pipeline
* remove pybind and use_tvm from win x86
* Add build pipelines for --disable_contrib_ops (mac, lnx, win)
* remove --gen_doc generation for x86
* set environment variables during build to disablecontribops=on
* update to match aiinfra pipelines
* update test data url
* update mac pipeline test data
* remove gen_doc from win x64 leg
* update model files for nocontribops
* reset win-ci-pipeline.yml
* remove confidential models