* GPT2 Gelu Fusion & Test
* change header path
* Refine code & add missing test onnx file
* Fix builds & refine float/double/fp16 compare.
* Fix builds
* Add Bias Check and UTs
* Fix build and UTs
* Fuse with second formula & test
* minor change
* disable FastGelu to see whether the builds can pass
* Verify where is wrong
* disable for debugging
* Revert "disable for debugging"
This reverts commit 535c0817fb36fb95a75773a7f00c8b969dd5362c.
* Revert "Verify where is wrong"
This reverts commit ffc43ec1d136636ba2cee30df49f563a75e84676.
* temporarily disable the transformer for inference
* Enable FastGeluFusion and fix segmentation fault when running the bertsquad10.onnx test
* Add more unit tests covering Gelu subgraphs that use graph inputs/outputs
(cherry picked from commit 0739ab985240c6d9acdb8f0afd40c5fb316166af)
* Move Bias Fusion into BiasGelu.cc
Co-authored-by: Changming Sun <chasun@microsoft.com>
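For reference, the two Gelu forms these fusions deal with can be sketched as below. This assumes the fused FastGelu node computes the tanh-based approximation used by GPT-2-style models and that the exact Gelu uses erf; the function names are hypothetical and this is not the kernel code.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact Gelu: x * Phi(x), written with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of Gelu (the form assumed here for FastGelu)."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def bias_gelu_tanh(x: float, bias: float) -> float:
    """Bias added to the input before the activation, as a bias-fused Gelu would do."""
    return gelu_tanh(x + bias)
```

The two forms stay close over typical activation ranges, which is why a looser float/fp16 comparison tolerance is needed in the tests than for bit-exact ops.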
Add support to fuse ReorderOutput+Transpose(NHWC). Converting from NCHWc to NHWC tensors is a trivial copy of data and avoids the cost of a transpose node.
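Why the NCHWc to NHWC conversion is only a copy: in both layouts the channels inside one block sit next to each other for a given pixel. The sketch below assumes an NCHWc layout of [N][ceil(C/block)][H][W][block] with channels padded up to a multiple of `block`; the function name is hypothetical and Python lists stand in for tensor buffers.

```python
def nchwc_to_nhwc(src, n, c, h, w, block):
    """Convert a flat NCHWc buffer into a flat NHWC buffer.

    Assumed NCHWc layout: [N][ceil(C/block)][H][W][block]; padded channels
    in the last block are skipped.
    """
    assert block > 0
    c_blocks = (c + block - 1) // block
    dst = [0] * (n * h * w * c)
    for ni in range(n):
        for cb in range(c_blocks):
            # The last block may hold fewer than `block` real channels.
            valid = min(block, c - cb * block)
            for hi in range(h):
                for wi in range(w):
                    s = (((ni * c_blocks + cb) * h + hi) * w + wi) * block
                    d = ((ni * h + hi) * w + wi) * c + cb * block
                    # Each run of `valid` channels is contiguous in both layouts,
                    # so this is a plain slice copy (a memcpy in native code).
                    dst[d:d + valid] = src[s:s + valid]
    return dst
```

For example, with C=3 and block=2 the buffer `[0, 10, 1, 11, 20, -1, 21, -1]` (where -1 marks padding) becomes `[0, 10, 20, 1, 11, 21]`, with no element-wise transpose involved.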
This fixes a customer reported issue where the NCHWc optimizer was dropping graph outputs when an edge was used as both a graph output and an input to another NCHWc node.
* Optimization for Bert and DistilBert model exported by keras2onnx
* Add model_type parameter for models from different export tools (pytorch, tf2onnx, keras2onnx).
* Split LayerNormalization and SkipLayerNormalization fusions
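The two fusions target ops that differ only in the residual add. Assuming SkipLayerNormalization computes LayerNormalization(input + skip + optional bias), a minimal 1-D sketch looks like this; the function names and the epsilon default are illustrative assumptions, not the contrib op definitions.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-12):
    """LayerNormalization over a 1-D list: normalize, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]


def skip_layer_norm(x, skip, gamma, beta, bias=None, eps=1e-12):
    """Residual add (plus optional bias) followed by LayerNormalization."""
    if bias is None:
        bias = [0.0] * len(x)
    summed = [a + s + b for a, s, b in zip(x, skip, bias)]
    return layer_norm(summed, gamma, beta, eps)
```

Keeping the fusions separate lets a LayerNormalization pattern be fused even when no preceding Add is available to absorb into the skip variant.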
Optimize the implementation of Math::Im2col that is currently used for ConvInteger/QLinearConv. Also, avoid Im2col for pointwise convolutions in ConvInteger.
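Why pointwise convolutions can skip Im2col: for a 1x1 kernel with unit stride and no padding, the column matrix is just the NCHW input viewed as a C x (H*W) matrix, so the GEMM can read the input directly. The naive reference below illustrates this; it is not the optimized Math::Im2col, and both function names are hypothetical.

```python
def im2col(x, c, h, w, kh, kw, stride=1, pad=0):
    """Naive im2col for one NCHW image stored as nested lists x[c][h][w].

    Returns a (C*kh*kw) x (out_h*out_w) matrix as a list of rows,
    using zero padding outside the image.
    """
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    cols = []
    for ci in range(c):
        for ki in range(kh):
            for kj in range(kw):
                row = []
                for oy in range(out_h):
                    for ox in range(out_w):
                        iy = oy * stride + ki - pad
                        ix = ox * stride + kj - pad
                        inside = 0 <= iy < h and 0 <= ix < w
                        row.append(x[ci][iy][ix] if inside else 0)
                cols.append(row)
    return cols


def is_pointwise(kh, kw, stride, pad):
    """1x1 kernel, unit stride, no padding: im2col would only copy the input."""
    return kh == 1 and kw == 1 and stride == 1 and pad == 0
```

With `x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]`, `im2col(x, 2, 2, 2, 1, 1)` yields each channel flattened, which is exactly the input memory in NCHW order.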
* merge training kernels to master
* revert two files
* merge training kernels to master
* Avoid unnecessary copies of ModelProto
* Comment nit
* Nit
* Comment refactoring
* Fix build break
* Fix a few more instances where copies take place
* update onnx-tensorrt submodule to trt7 branch
* add fp16 option for TRT7
* switch to master branch of onnx tensorrt
* update submodule
* update to TensorRT7.0.0.11
* update to onnx-tensorrt for TensorRT7.0
* switch to private branch due to issues in master branch
* remove trt_onnxify
* disable warnings c4804 for TensorRT parser
* disable warnings c4702 for TensorRT parser
* add back sanity check of shape tensor input in the parser
* disable some warnings for TensorRT7
* change fp16 threshold for TensorRT
* update onnx-tensorrt parser
* fix cycle issue in faster-rcnn and add cycle detection in GetCapability
* Update TensorRT container to v20.01
* Update TensorRT image name
* Update linux-multi-gpu-tensorrt-ci-pipeline.yml
* Update linux-gpu-tensorrt-ci-pipeline.yml
* disable rnn tests for TensorRT
* disable some unit tests for TensorRT
* update onnx-tensorrt submodule
* update build scripts for TensorRT
* format the code
* Update TensorRT-ExecutionProvider.md
* Update BUILD.md
* Update tensorrt_execution_provider.h
* Update tensorrt_execution_provider.cc
* Update win-gpu-tensorrt-ci-pipeline.yml
* use GetEnvironmentVar function to get environment variables and switch to Win-GPU-2019 agent pool for Windows CI build
* change tensorrt path
* fix win ci build issue
* update code based on the reviews
* fix build issue
* roll back to cuda10.0
* add RemoveCycleTest for TensorRT
* fix windows ci build issues
* fix ci build issues
* fix file permission
* fix out of range issue for max_workspace_size_env
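On the cycle-detection bullet above: one way a cycle can appear is when two nodes claimed by TensorRT are fused into a single node while a node outside the subgraph both consumes from and feeds into it. A standard check is Kahn's algorithm on the partitioned graph, sketched below; this is a generic illustration with a hypothetical function name, not the GetCapability code.

```python
from collections import deque


def has_cycle(num_nodes, edges):
    """Return True if the directed graph contains a cycle (Kahn's algorithm).

    `edges` is an iterable of (src, dst) pairs over node ids 0..num_nodes-1.
    """
    indegree = [0] * num_nodes
    succ = [[] for _ in range(num_nodes)]
    for s, d in edges:
        succ[s].append(d)
        indegree[d] += 1
    # Start from nodes with no incoming edges and peel the graph layer by layer.
    ready = deque(i for i in range(num_nodes) if indegree[i] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for nxt in succ[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Nodes never released lie on a cycle or downstream of one.
    return visited < num_nodes
```

For instance, `0 -> 1 -> 2` is acyclic, but fusing nodes 0 and 2 into one node F gives `F -> 1 -> F`, which `has_cycle(2, [(0, 1), (1, 0)])` reports as a cycle.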
Provide alternative std::mutex implementation on Windows. OrtMutex is no longer an alias of std::mutex.
We do this because:
1. The new implementation is faster and much simpler.
2. Static constructors are considered harmful, and we should avoid them as much as possible.
* Enable ARM64 release builds
* Add ARM release
* Skip C# dll signing in ARM
* Copy ARM binaries to Nuget
* Restore nuget packages before ARM packaging
* wip
* Use host protoc at C# build
* Set ProtocDirectory on cross-compiled builds
* wip
* Fix typo
* Add unit test.
Add an option --use_onnxruntime to use onnxruntime to optimize PyTorch models.
Update layer norm and gelu for the TensorFlow 2.1 Keras BERT model.
Add logging and use f-strings.
Add extra checking for reshape fusion in TensorFlow models.
Allow saving the output model as JSON for testing purposes.
update match parent path utility function to return index
* remove unused function.