* Adjust ngraph cmake files to onnx 1.5.0
* Enable LSTM reverse direction mode in nGraph EP
* Enable full support for the Split op in nGraph EP
* Revert "Disable the unsigned input Shrink op tests for nGraph until the next update"
This reverts commit 257b42a55bdd98f804d4846868542b8e3aeb4b4e.
* Enable Gather and remove unused subgraph attribute
* Remove the unused param from AppendClusterToSubGraph
* Fix for the incorrect onnx opset version
* Use the r0.26 release branch before the tag is created
* Enable the QuantizeLinear and DequantizeLinear ops for NGEP
* Use the v0.26.0-rc.2 tag in ngraph.cmake
* Add skip for modes other than the default in the Pad operator
* Reenable negative axis tests for ngraph
* Use temporary ngraph version
* Use branch name instead of SHA for temporary ngraph branch
* Use ngraph v0.26.0-rc.4
* Remove patch for missing symbol in MKLDNN
* Use MKLDNN 1.0 in ngraph
* Exclude the Pad op for opsets greater than 10
* Disable quantizelinear and dequantizelinear tests for ONNX 1.5.0
* Fix the onnx-headers related compilation errors
* ONNX libs linking fix
* Use a tag for ngraph and support more Pad modes
* Use the v0.26.0 release tag for nGraph
* Update ngraph to RC8; add the bigobj flag for Windows builds
* Fix the MKLDNN constexpr error on Windows
* Introduce execution mode for clarity and extensibility; change the Python APIs accordingly; replace the DisableSequentialExecution API with EnableParallelExecution for clarity.
* Fix cuda build
* Modify the test slightly
* Make C and C# APIs consistent with Python.
* Add ability to get symbolic dimension info for graph inputs and outputs.
WIP to get initial feedback.
* Fix Linux build error.
Update C# API and add unit test
* Clarify the two different ways Tensor shape and type info is created: one from concrete values, and one from a type proto where symbolic dimensions may exist. This allows the symbolic dimensions to default to empty strings if they are not provided.
Fix an issue where TRT did not work on devices other than device ID 0, because the allocation planner needs to get the default allocator to allocate memory for graph input data. (#2094)
* Fix bug with delayed allocation of If and Scan outputs.
If the subgraph is producing output on a non-CPU device the delayed allocation was incorrectly providing a CPU allocated tensor.
Check for the required location, and update 'fetches' instead if a device copy is needed.
The utils::ExecuteGraph logic will handle the device copy in this case.
Add support for output seq(tensor) in python binding and test framework. Implement SequenceConstruct, SequenceEmpty, SequenceInsert and SequenceErase ops. (#2040)
* throw exception using dmlc::LogMessageFatal
On Windows, ORT_THROW couldn't be caught if the exception was thrown from
a jitted function. Let's call dmlc::LogMessageFatal instead.
* address CR
use LOG(FATAL)
* Add Embed Layer Normalization and Skip Layer Normalization ops for BERT optimization.
* add float16 test for skiplayernorm
* Add test for EmbedLayerNormalization op
* fix cpu build error
* fix build warning
* update HasCudaEnvironment function
* handle cuda error
* Mention OrtCreateSessionFromArray in C API doc
* fix seq of tensors
* changes on 9/30
* All tests passing
* Add SequenceAt op
* Fix shared_lib non_tensor_types test
* Address some PR comments
* Address PR comments
* Add support in python bindings to accept seq(tensor)
* Change data type from vector<Tensor> to TensorSeq
* Change data type from vector<Tensor> to TensorSeq
* Added some documentation
* Added missing test model
* Fix Linux build
* Fix Mac build
* Fix Mac build
* Add Attention op for multi head self attention in BERT
* Add test cases
* Move op from kOnnxDomain to kMSDomain.
Limit the test to run on the CUDA provider only.
* fix test
* Add float16 test
* fix cpu build error
* handle cuda error
* get the last CUDA error on failure