* Updated TPN
* Update batch_norm_op_test.cc
* Update ThirdPartyNotices.txt
* Update ThirdPartyNotices.txt
* Update readme with package links
* Update README.md
* Update README.md
* Update README.md
* Merged Ryan and TPN changes into single PR
* minor fix
* added mkldnn to GPU pipeline. Required by C# library as it is the default execution provider
Rewrite rule that eliminates slice operators when they are redundant (i.e., when they preserve the whole input).
Re-implementation of the identity elimination rule using the latest Graph API.
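A minimal sketch of the redundancy check behind the slice elimination rule, assuming the input shape is fully known and that Slice follows the usual semantics of clamping `starts` and `ends` into each dimension's range. The function name `slice_is_redundant` and its signature are hypothetical and are not taken from the actual rewrite rule.

```python
def _clamp(index, dim):
    """Resolve a possibly negative index and clamp it into [0, dim]."""
    if index < 0:
        index += dim
    return max(0, min(index, dim))


def slice_is_redundant(input_shape, starts, ends, axes=None):
    """Return True when the slice keeps every element of the input.

    Hypothetical helper: assumes every sliced dimension is a known integer.
    """
    if axes is None:
        # Without explicit axes, starts/ends apply to the leading dimensions.
        axes = list(range(len(starts)))
    for start, end, axis in zip(starts, ends, axes):
        dim = input_shape[axis]
        # The slice preserves this axis only if it spans [0, dim).
        if _clamp(start, dim) != 0 or _clamp(end, dim) != dim:
            return False
    return True
```

A slice for which this returns True produces its input unchanged, so the node can be removed and its consumers rewired to the slice's input.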
* Docker Container for CPU Version (Ubuntu 16.04, Python3 Bindings, compatible with Windows-Docker)
* Nvidia-Docker Container for GPU Version (Ubuntu 16.04, CUDA, CUDNN, Python3 Bindings)
* README with Docker quickstart instructions (i.e. docker pull .../onnxruntime:cpu, docker run -it ...)
* Include plans to publish public images (with ONNX Runtime 0.2) on README
The outer GEMM loop repeatedly calls the inner GEMM kernel with a row count (the M parameter to GEMM), and the inner kernel decides how many rows it will actually handle. The FMA3 kernel previously handled only row counts of 1, 3, and 6 to keep code size down. To be competitive, however, the FMA3 kernel needs to handle any row count from 1 to 6.
One example model was issuing a GEMM with M=11, which had been broken up into 6, 3, 1, 1 but can now be handled as 6, 5.
The kernels have been templatized MASM style to avoid the cut/paste code from the original implementation. The Linux variants will be updated after doing some additional work on the MASM variants first.
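The row splitting described above can be illustrated with a small sketch. It assumes the inner kernel greedily takes the largest supported row count that fits, which matches the M=11 example but is not confirmed to be the exact policy of the assembly kernels. The name `partition_rows` is hypothetical.

```python
def partition_rows(m, supported_counts):
    """Greedily split m rows into chunks the inner kernel can handle.

    Assumes supported_counts contains 1, so some chunk always fits.
    """
    chunks = []
    remaining = m
    while remaining > 0:
        # Take the largest supported row count that does not overshoot.
        step = max(c for c in supported_counts if c <= remaining)
        chunks.append(step)
        remaining -= step
    return chunks


print(partition_rows(11, {1, 3, 6}))    # [6, 3, 1, 1]
print(partition_rows(11, range(1, 7)))  # [6, 5]
```

With any row count from 1 to 6 supported, the same M=11 call needs two inner kernel invocations instead of four.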
* add file for csharp test using CUDA docker image
* add gpu csharp end to end test scripts
* uncomment data download
* minor change to kick off ci build
* small change to kick off build
* Add windows GPU test runner script
* Add the ability to use a custom allocator for fetches.
Allows control flow nodes to forward the allocation to the control flow op and avoid an unnecessary copy when the subgraph output has a symbolic dimension.
Update Scan and If to use custom allocators when applicable.
* Remove unnecessary forward declaration
* Fix Mac build warnings
* Add ability to initialize InferenceSession with a model that is already loaded.
* Cleanup some unnecessary namespace qualifications and some long lines.
* Remove InferenceSession::Initialize(std::shared_ptr<Model>&)
* Remove unit test for init from existing Model instance.
* passed the OnnxRuntimeBuildDirectory to the docker
* removed the requirement for the docker host to set the env var
* set the env var to the path where the build dir is mounted in the container
* Copy mkldnn to output folder for linux. Nuget doesn't resolve dll dependency correctly within a package
* Modify to copy all dlls to output folder
* update rpath for shared library
* Simplified linker flags for RPATH
* Removing copying of dlls to output folder, since setting RPATH works fine now
* Improve VerifyKernelDef() performance when op has many inputs/outputs/type constraints.
* Added two modes for resolving type binding.
* Updated TypeBindingResolver to avoid heap allocation.
* Tweaked TypeBindingResolver for performance.
* Handle negative axes for reduce ops
* negative axes are not handled in shape inference if input shape
is not known at that time.
* nit: use HandleNegativeAxis in provider/common.h
* fixed typo in runtest.sh
* some fixes
* some fixes
* some fixes in the runtest.sh
* added test data url
* fixes on the dotnet test scripts
* fix on prior mistake regarding installation of apt-transport-https
* added verbosity in the test run for easy debugging
* updated comment in the runtest.sh
* Advance ONNX commit, move Ngram files under ONNX and rename to TfIdfVectorizer
* Rename Ngram to TfIdfVectorizer and redeclare in ONNX domain
* Restore tfidfvectorizer tests
* Remove ML definition.
* Update ONNX version to pick up Scan spec change that adds scan_output_axes.
Add logic to transpose an output
- write to temporary buffer when executing subgraph
- transpose temporary buffer into Scan output when execution completes
Add unit tests
* Update to ONNX dbf3581835e3a05716e10587511d7ab3b2cdc386 to pick up inferencing bugfix.
Update test to match.
* Disable some tests for opset 9 operators that haven't been implemented yet.
* matmul add fusion
* add shape check on Gemm input C
* work around the issue with RemoveNode
* update the version support
* If MatMul has shape [K] * [K, N], update it to [1, K] * [K, N], so that it can work for Gemm
* Fuse Gemm+Activation into FusedGemm
* test
* revert the change that fused MatMul with shape [K]*[K, N] into Gemm as shape [1, K]*[K, N]; this may cause a runtime failure, as we can't change the input data shape.
* revert the change that changed the MatMul shape from [K]*[K, N] to [1, K]*[K, N]. It enabled fusing MatMul + Add into Gemm, but the data is not aware of the change, so the data shape is still [K]*[K, N], which causes a runtime issue.
* 1. Fix build issue for CUDA
2. Update Gemm so that we can fuse Matmul [K] * [K, N] + Add [1, N] into Gemm with shape [1,K] * [K, N] + [1, N]
* Fix build issue
* Fuse the activation node even if it connects to the output
* resolve the merge conflicts
* Add test model for Gemm+Activation fusion
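As a rough illustration of why MatMul [K] * [K, N] followed by Add [1, N] can be expressed as a single Gemm on [1, K] * [K, N] + [1, N], the pure-Python sketch below compares the two computations on a tiny example. The helpers `matmul` and `gemm` are hypothetical stand-ins written for this note, not the runtime's kernels, and the sketch assumes alpha = beta = 1 with no transposes.

```python
def matmul(a, b):
    """2-D matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gemm(a, b, c, alpha=1.0, beta=1.0):
    """Y = alpha * (A @ B) + beta * C, with C of shape [1, N] broadcast over rows."""
    product = matmul(a, b)
    return [[alpha * p + beta * c[0][j] for j, p in enumerate(row)]
            for row in product]


vec = [1.0, 2.0]                                # shape [K], K = 2
weights = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]    # shape [K, N], N = 3
bias = [[10.0, 20.0, 30.0]]                     # shape [1, N]

# Unfused path: MatMul treats the 1-D input as [1, K] and yields [N];
# Add then broadcasts that against the [1, N] bias, giving [1, N].
matmul_out = matmul([vec], weights)[0]
unfused = [[m + b for m, b in zip(matmul_out, bias[0])]]

# Fused path: a single Gemm on the input viewed as [1, K].
fused = gemm([vec], weights, bias)

assert fused == unfused
```

The sketch only checks numerical equivalence. The reverted commits above show that the fusion must not alter the shape recorded for the input data itself.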