* Adding versioned dlls to tar/zip packages
* fix syntax error
* fix version name of dylib
* minor fix in the target
* update pattern for versioned dylib files
* initial commit
* More changes
* More changes
* Adding stuff back to the targets xml
* More changes v3
* More changes v4
* More changes v5
* More changes v6
* More changes v7
* More changes v9
* Disable CSharp tests for now
* More changes
* Revert file to same status
* Update props file for x86
* Change to usage of TargetArchitecture instead of PlatformTarget
* Update targets.xml
* Minor formatting nit fix
* Update based on PR comments
Root cause:
The crash is caused by a null threadpool in the op. The op is inside a subgraph. The threadpool is set on the session_state, but it is not passed on to the session_state owned by the subgraph.
Fix:
Pass the threadpool to the session_state owned by the subgraph when creating it in CreateSubgraphSessionState.
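
A minimal sketch of the fix described above. The classes here are simplified, hypothetical stand-ins (only the name CreateSubgraphSessionState comes from the commit message); the real types carry far more state. The point is only that the parent forwards its threadpool pointer when it builds the subgraph's state, so ops inside the subgraph no longer see a null pool.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in; the real thread pool class is not reproduced here.
struct ThreadPool {};

// Simplified stand-in for a session state that may own a nested subgraph state.
class SessionState {
 public:
  void SetThreadPool(ThreadPool* tp) { thread_pool_ = tp; }
  ThreadPool* GetThreadPool() const { return thread_pool_; }

  // Creates the state for a nested subgraph and forwards the parent's pool.
  SessionState& CreateSubgraphSessionState() {
    subgraph_state_ = std::make_unique<SessionState>();
    // The step that was missing: without it the subgraph's pool stays null.
    subgraph_state_->SetThreadPool(thread_pool_);
    return *subgraph_state_;
  }

 private:
  ThreadPool* thread_pool_ = nullptr;  // not owned
  std::unique_ptr<SessionState> subgraph_state_;
};
```

Forwarding at creation time keeps the subgraph state consistent with its parent without requiring each op to look the pool up elsewhere.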
* Update ONNX to 70c9026ca11b0af0050f8186bea6cab94636947f to pick up the ReverseSequence op.
Copy ReverseSequence from contrib ops to ONNX (keep contrib op in this commit), and update to use int64_t for sequence_lens input.
* Copy ReverseSequence from contrib to ONNX and update to use int64_t for sequence_lens.
Maintain contrib op in this commit.
* Remove contrib op as it was temporary and only used internally.
* Remove contrib op schema defs.
* Cleanup contrib_defs.cc
* support non-tensor types
* support non-tensor types.
* support non-tensor types.
* fix compilation issues
* fix compilation issues
* Build without mkldnn for release packages. We'll default to MLAS.
* Remove OSes/architectures that we don't build on and have no CI for.
* initial checkin for dilations and ceil support
* add unit tests for ceil_mode
* Update to use versioned_kernel for AveragePool
* update mkldnn/poolc..
* Add versioning to cuda_execution_providers.cc
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* Folding PR comments
* Folded PR comments
* removed copy of dilations from contrib ops
* support non-tensor types
* support non-tensor types.
* support non-tensor types.
* fix compilation issues
* fix compilation issues
* Build without mkldnn for release packages. We'll default to MLAS.
* Modify roialign to conform with the new onnx spec and take it out from contrib ops.
Memory pattern doesn't work with the parallel executor by design. Enabling memory pattern with the parallel executor logs a warning and degrades performance.
Add back the option to enable/disable memory pattern.
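
As an illustration only, the interaction between the two settings could be modelled like this. The struct and field names are assumptions made for the sketch, not a statement of the actual options API; the helper simply encodes the rule that memory pattern is only useful with the sequential executor.

```cpp
#include <cassert>

// Hypothetical option names, chosen for this sketch.
struct SessionOptions {
  bool enable_sequential_execution = true;  // false selects the parallel executor
  bool enable_mem_pattern = true;           // the option this change adds back
};

// Memory pattern is only effective when execution is sequential,
// so callers using the parallel executor should turn it off.
inline bool UseMemoryPattern(const SessionOptions& options) {
  return options.enable_mem_pattern && options.enable_sequential_execution;
}
```

A caller that selects the parallel executor would set `enable_mem_pattern = false` to avoid the warning and the slowdown.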
* move files
* move files
* Remove NonMaxSuppression from Contrib op, move it to Onnx domain, opset 10
* move NMS out of namespace contrib
* update data type in UT
* update to latest onnx
* whitelist the node test for Mod, which is not implemented yet
* Fix warning in tensor_type_and_shape.cc
tensor_type_and_shape.cc:139:18: error: ‘out’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
* fix warnings
Change MLAS to be able to build standalone without onnxruntime header dependencies. This is enabled when building with MLAS_NO_ONNXRUNTIME_THREADPOOL defined.
mlas.h had been changed to include the ThreadPool header, but this header now just has a forward reference for the class. The header was also doing a "using onnxruntime::concurrency"; that has been removed and the external mlas.h users fixed up as needed.
As before, if ThreadPool==nullptr, then MLAS uses OpenMP or falls back to a single threaded implementation. The build option to use the Win32 system thread pool has been removed as onnxruntime can't hit that path and I don't use that option for standalone tests anymore.
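
The fallback order described above can be sketched roughly as follows. This is not the MLAS source: `ThreadPool` here is a hypothetical stand-in with a serial `ParallelFor`, and `RunWork` is an invented name. It only shows the dispatch decision: use the pool if one is supplied, otherwise OpenMP when compiled in, otherwise a plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a thread pool; it runs serially and counts dispatches.
struct ThreadPool {
  int dispatch_count = 0;
  void ParallelFor(std::ptrdiff_t n, const std::function<void(std::ptrdiff_t)>& fn) {
    ++dispatch_count;
    for (std::ptrdiff_t i = 0; i < n; ++i) fn(i);
  }
};

// Runs work(i) for i in [0, n), choosing the backend as described above.
// work must be safe to call concurrently for distinct indices.
inline void RunWork(ThreadPool* pool, std::ptrdiff_t n,
                    const std::function<void(std::ptrdiff_t)>& work) {
  if (pool != nullptr) {
    pool->ParallelFor(n, work);  // caller supplied a pool
    return;
  }
#if defined(_OPENMP)
#pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < n; ++i) work(i);  // OpenMP build
#else
  for (std::ptrdiff_t i = 0; i < n; ++i) work(i);  // single-threaded fallback
#endif
}
```

Because the pool is passed as a pointer, a header only needs a forward declaration of the class to declare functions like `RunWork`.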
* Disable tests for certain models (Cherry pick from 0.3.1)
* Disable more tests
* More tests
* even more tests
* Fix gpu builds
* Disable L2 transformers
* Env variable to disable contrib ops for csharp tests
Introduce a quick pre-filtering of rules based on the node op types they are targeting.
The goal is to avoid evaluating all rules for all nodes. Instead, for each node, we will only be evaluating the rules associated with its op type.
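
One possible shape for this pre-filtering is an index from op type to the rules that target it. The types and the rule names used below are hypothetical and exist only for the sketch; the actual rule and node classes are not shown in this log.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal node: only the op type matters for filtering.
struct Node {
  std::string op_type;
};

// Hypothetical rule description: a name plus the op types it targets.
struct RewriteRule {
  std::string name;
  std::vector<std::string> target_op_types;
};

class RuleIndex {
 public:
  // Files the rule under every op type it declares as a target.
  void Register(const RewriteRule& rule) {
    for (const auto& op : rule.target_op_types) {
      rules_by_op_[op].push_back(rule);
    }
  }

  // Returns only the rules associated with the node's op type.
  const std::vector<RewriteRule>& RulesFor(const Node& node) const {
    static const std::vector<RewriteRule> kNone;
    auto it = rules_by_op_.find(node.op_type);
    return it == rules_by_op_.end() ? kNone : it->second;
  }

 private:
  std::unordered_map<std::string, std::vector<RewriteRule>> rules_by_op_;
};
```

With such an index, the per-node cost is one hash lookup plus the rules for that op type, rather than a pass over every registered rule.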