This PR introduces a rewrite rule that replaces a Shape node with an initializer when the shape of the input is statically known through shape inference.
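A minimal sketch of the condition this rule depends on, assuming a hypothetical representation in which an inferred dimension is either a non-negative value or -1 when it is symbolic or unknown. `InferredShape` and `TryFoldShape` are illustrative names, not part of the onnxruntime API:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical: one entry per dimension, -1 meaning "not statically known".
using InferredShape = std::vector<int64_t>;

// Fills `initializer` with the data a Shape node would produce and returns
// true, but only if every dimension is known. On failure `initializer` is
// left untouched and the Shape node has to stay in the graph.
bool TryFoldShape(const InferredShape& dims, std::vector<int64_t>& initializer) {
  for (int64_t d : dims) {
    if (d < 0) return false;  // symbolic or unknown dimension: cannot fold
  }
  initializer = dims;  // a 1-D int64 tensor with one element per dimension
  return true;
}
```

The check runs before the copy so a partially known shape never leaks into the output.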
* ConstantOfShape CUDA implementation
* Improve the fallback logic so that in a Shape -> ... -> ConstantOfShape chain, ConstantOfShape no longer falls back to the CPU provider
* move shared code to cpu implementation
* do the fill based on sizeof(data_type)
* update method access level
This fixes #1034: Can't Create Model Sessions on Different GPU
The root cause of the bug is that the CUDA execution provider uses thread_local storage for its per-thread context and allocator, so two CUDA execution providers running on the same thread conflict with each other. The fix adds a std::unordered_map to distinguish EPs within the same thread.
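A minimal sketch of the idea behind the fix, assuming the map is keyed by the provider instance. `PerThreadContext` and `ExecutionProvider` are hypothetical stand-ins here, not the real onnxruntime classes, and the actual change may be structured differently:

```cpp
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for the per-thread state an EP keeps
// (handles, allocator, and so on).
struct PerThreadContext {
  explicit PerThreadContext(int device) : device_id(device) {}
  int device_id;
};

// Hypothetical execution provider, used only for illustration.
class ExecutionProvider {
 public:
  explicit ExecutionProvider(int device_id) : device_id_(device_id) {}

  // Returns this provider's context for the calling thread, creating it on
  // first use.
  PerThreadContext& GetPerThreadContext() const {
    // One map per thread, keyed by provider instance. A single thread_local
    // slot would be shared by every provider running on the thread.
    thread_local std::unordered_map<const ExecutionProvider*,
                                    std::unique_ptr<PerThreadContext>> contexts;
    std::unique_ptr<PerThreadContext>& slot = contexts[this];
    if (!slot) slot.reset(new PerThreadContext(device_id_));
    return *slot;
  }

 private:
  int device_id_;
};
```

This sketch never erases entries, so a real implementation would also need to drop a provider's entry when the provider is destroyed; otherwise a later provider allocated at the same address could pick up a stale context.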
* Update MaxPool & AveragePool to support opset 10
* fix build issue
* still use cuDNN for MaxPool if dilations are not set or are all the default of 1.
* fix build issue
* add version filter to failed tests
* exclude test from backend
* exclude shrink from opset 9
* fix compile err
* exclude certain version of constant shape
* enable flatten test
* fix compile err
* comment mvn test
* disable constantofshape test in x86
* disable x86 test
* get model version from imported opset
* test linux x86 case
* disable nonzero opset 10
* make mutex const
* test filter by commit id
* adjust substr offset
* Limit test platform
* remove change impacting TFModelInfo.h
* refactoring
* refactoring
* test x86 pipeline with filter
* add comment
* restrict version extraction on non-win
* restrict version extraction on non-win
* add tag
* exclude case from backend test
* remove dup
* remove dup
* make script runnable
* hard code absolute path
* refactor log
* fix x86 compile err
* fix x86 compile err
* fix x86 compile err
* sync with latest tensorrt
* switch to regex
* fix cpu pipeline err
* test filter
* disable nonzero from all versions
Not needed any more, because we don't build the date library.
And Sheil says: "It’s a little bit intrusive for callers to be forced onto cpp14 just because they are consuming onnxruntime."
* refactor the EP code.
* remove unnecessary lock.
* fix the comment to state that KernelRegistryManager is not thread safe.
* clarify that the APIs for adding custom ops in InferenceSession are not thread safe.
* CUDA opset9: Update Cast/MatMul version, add Erf
* Address CR
* More fixes on node placement logic
* Fix typo
* Update CUDA ops Gemm and BatchNormalization to be registered in opset10
* Remove Relu if followed by Clip. Update Clip 'min' if necessary.
Add unit test.
* Rename to match behaviour a little better.
* Update to match latest RewriteRule interface
* Add version and latest commit id to ORT Server
* Update cmake
* Change build id to build number
* Use target_compile_definitions instead of add_definitions
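
The Relu/Clip rewrite listed above relies on Clip(Relu(x), min, max) being equal to Clip(x, max(min, 0), max). The following is a small sketch of that attribute update, with scalar reference versions of both ops to check the identity. `ClipAttrs` and `FuseReluIntoClip` are hypothetical names for illustration and do not come from the onnxruntime sources:

```cpp
#include <algorithm>

// Hypothetical holder for Clip's 'min' and 'max' attributes.
struct ClipAttrs {
  float min;
  float max;
};

// Attribute update applied when a Relu feeding a Clip is removed:
// 'min' is raised to 0 only if it was below 0, otherwise it is kept.
inline ClipAttrs FuseReluIntoClip(ClipAttrs clip) {
  clip.min = std::max(clip.min, 0.0f);
  return clip;
}

// Scalar reference implementations used to compare both graphs.
inline float Relu(float x) { return std::max(x, 0.0f); }

inline float Clip(float x, const ClipAttrs& a) {
  return std::min(std::max(x, a.min), a.max);
}
```

When 'min' is already non-negative the Relu is redundant and can be dropped without touching the attributes, which is why the update is only needed in some cases.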