* Adding CPU implementation of BroadcastGradientArgs op
Modify to take shape as input instead of tensor
Cleanup
Correct schema
Corrected kernel, added tests, addressed review comments.
Initial change, to add ReduceSumTraining cpu op
cpu support
Initial changes to gradient builder
Non-empty reduction case passing.
Added exception and test for invalid broadcast, addressed review comments.
Initial change, to add ReduceSumTraining cpu op
cpu support
cuda support + more UTs
Address review comments + UT
no op support for {} axes with new attr - noop_with_empty_axes
Add noop attribute to ReduceSumTraining use
Add testing for no-shape graph, modify AddSub grad builder, logging.
MulGrad support
Div support
Expand support
Gemm support
MatMul grad change
Transpose Grad change
BiasGeluGrad change.
Fixes after squash
* Remove logging, add specific exception for shape inference error
* fix build
* Review comments
* Review comments
* Fix windows build
Co-authored-by: Ethan Tao <ettao@microsoft.com>
* Adjust indentation of statement; without this fix, GCC 7.5 errors
out with:
"this ‘if’ clause does not guard this statement, but the
latter is misleadingly indented as if it were guarded by the ‘if’"
* Add braces around the if-statement for improved clarity.
Co-authored-by: Alberto Magni <alberto.magni@microsoft.com>
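
The GCC diagnostic quoted above comes from `-Wmisleading-indentation`, which becomes an error when warnings are treated as errors. A minimal sketch of the pattern and of the braces fix follows; `ClampNegative` is a hypothetical helper written for illustration and is not the code touched by this change.

```cpp
#include <cassert>

// Hypothetical helper (an assumption, not code from this change): clamps a
// negative value to zero and counts how many values were clamped.
//
// The form below is what GCC flags as misleadingly indented, because only
// the first statement is guarded by the `if`:
//
//   if (value < 0)
//     value = 0;
//     ++clamped;  // runs unconditionally, despite the indentation
//
// Adding braces makes the guarded region explicit.
int ClampNegative(int value, int& clamped) {
  if (value < 0) {
    value = 0;
    ++clamped;  // guarded by the if, as the indentation suggests
  }
  return value;
}
```

With the braces in place, calling `ClampNegative(3, count)` leaves `count` untouched, which is the behaviour the indentation implied.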
* Working changes for ConcatTraining op
* Refactor to move changes to orttraining
* Fix segfault
* Support negative axis for shape inferencing
* fix build
Co-authored-by: Ethan Tao <ettao@microsoft.com>
(1) Add bert-base-cased and gpt2 benchmark results on V100
(2) Update list of supported models.
(3) Add comments to gpt2_helper.
(4) Use IO Binding in test parity by default.
* add LRN/Grouped Conv Support, minor changes
* better pool ops sdk version requirement
* reduce string comparison for gemm/matmul ops
* fix nnapi fall back to cpu for softmax
* addressed review comments, correct a small error in the code
* improve calibration tool
* modify calibration interface name
* modify calibration interface name
* refine calibrate and calibrate_user
* refine and add type info
* refine and add type info
* add e2e user example file
* remove unnecessary files
* remove test images no longer needed
* update readme document
Co-authored-by: t-yguo <t-yguo@microsoft.com>
* Updated pushed CPU and CUDA tags.
* Add TensorRT, fix typo.
* Add OpenVINO tags. Remove 2020.2 installation instructions for VAD-M.
* Revert instruction changes for VAD-M and update 2020.2 to 2020.3
* Optimize CreateEnv by not creating the logging manager instance if env instance has already been created.
* Move creation of logging mgr inside if block
* Add BN to ArmNN EP
* Add Concat to ArmNN EP
* ACL logging improvements
* ArmNN logging improvements
* Fallback to CPU for 9x9 convolution in ACL EP
* Fallback to CPU for 9x9 convolution in ArmNN EP
* Enable python support for ACL and ArmNN EPs when compiled with BSP toolchain
* Removed the matmul operator
* Fix conv infer shape function
* Fix provider_names list for armnn
Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
* Expose load tensor proto from protobuf file function
* Add comment
* Remove use of fstream and use parsefromzerocopystream
* Close file descriptor after finish parsing it
* Close input stream too
* Set Close on delete only, no need to close file descriptor
* Revert "Set Close on delete only, no need to close file descriptor"
This reverts commit 5ba6e3c31b.
* Revert "Close input stream too"
This reverts commit 4564776733.
* Revert "Close file descriptor after finish parsing it"
This reverts commit 846e550c4f.
* Revert "Remove use of fstream and use parsefromzerocopystream"
This reverts commit 25a3117183.
* Add python API for specifying CUDA device id
* Provide a session-based Python API for specifying the
device id
* Including the header pybind11/stl.h automatically enables conversion
between C++ containers (such as std::vector and std::map) and Python
data structures (such as list and dict).
https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html#
Therefore, refactor the code to take better advantage of this.
* Make struct CudaDeviceOptions the default CUDA device options
* Implement sess.set_providers(list_of_providers, list_of_provider_option_dicts)
while staying consistent with the existing sess.set_providers(list_of_provider)
* Add cuda provider option default setting
* Add support for setting the CUDA options cuda_mem_limit and arena_extend_strategy.
Also resolved the merge conflict in session.py
* Use python ctypes to call cuda library to help python unittest
* Refine the code with reviewer's suggestions
* Add the capability of getting an execution provider's configuration
- Now that an execution provider's configuration can be set, it makes
sense to also be able to get it.
* Modify the code with reviewer's suggestions.
* Use stoull() or stoul() depending on the 32/64-bit architecture.
* Rewrite the testcases for testing setting CUDA device id
Note: We need to make sure every ORT process runs on one CUDA device
at a time.
* Make sure the old session object is destroyed by the Python gc before the new
session object is created
* Move testcases to original onnxruntime_test_python.py
* Fix bugs to pass CI build
* Make it pass CI build (cont.)
* Make it pass CI build (cont.)
* Adding CPU implementation of BroadcastGradientArgs op
* Modify to take shape as input instead of tensor
* Cleanup
* Correct schema
* Corrected kernel, added tests, addressed review comments.
* Added exception and test for invalid broadcast, addressed review comments.
* Fix mac build error.
* Initial change, to add ReduceSumTraining cpu op
* cpu support
* cuda support + more UTs
* Address review comments + UT
* no op support for {} axes with new attr - noop_with_empty_axes
* Address review comments
* fix build
* Address review comments
Co-authored-by: aishwarya bhandare <aibhanda@microsoft.com>
Co-authored-by: Ethan Tao <ettao@microsoft.com>
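
For readers unfamiliar with the BroadcastGradientArgs items above, here is a rough sketch of the idea: given the shapes of two broadcast operands, compute the output axes that each operand's gradient must be summed over, and raise an exception when the shapes cannot be broadcast. The function name, type aliases, and the use of `std::invalid_argument` are assumptions for illustration; this is not the kernel from the change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

using Shape = std::vector<int64_t>;
using Axes = std::vector<int64_t>;

// Hypothetical sketch: returns (axes for a's gradient, axes for b's
// gradient), expressed as axes of the broadcast output shape.
std::pair<Axes, Axes> BroadcastGradientAxes(const Shape& a, const Shape& b) {
  const int64_t rank_a = static_cast<int64_t>(a.size());
  const int64_t rank_b = static_cast<int64_t>(b.size());
  const int64_t rank = std::max(rank_a, rank_b);
  Axes axes_a, axes_b;
  for (int64_t i = 0; i < rank; ++i) {
    // Shapes are right-aligned; dimensions missing on the left count as 1.
    const int64_t ia = i - (rank - rank_a);
    const int64_t ib = i - (rank - rank_b);
    const int64_t da = ia >= 0 ? a[static_cast<std::size_t>(ia)] : 1;
    const int64_t db = ib >= 0 ? b[static_cast<std::size_t>(ib)] : 1;
    if (da != db && da != 1 && db != 1) {
      throw std::invalid_argument("shapes cannot be broadcast together");
    }
    // An operand is reduced over an axis it was broadcast along: either
    // the axis is absent from its shape, or its size there is 1 while the
    // other operand's size is larger.
    if (ia < 0 || (da == 1 && db != 1)) axes_a.push_back(i);
    if (ib < 0 || (db == 1 && da != 1)) axes_b.push_back(i);
  }
  return {axes_a, axes_b};
}
```

For shapes `{2, 3, 4}` and `{3, 1}`, the first operand needs no reduction and the second is reduced over axes 0 and 2. The empty axes list for the first operand is presumably the case the `noop_with_empty_axes` attribute targets, since a reduction that treats empty axes as "reduce everything" would give the wrong gradient there.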