* Working changes for ConcatTraining op
* Refactor to move changes to orttraining
* Fix segfault
* Support negative axis for shape inference
* fix build
Co-authored-by: Ethan Tao <ettao@microsoft.com>
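
The negative-axis support noted above can be illustrated with a minimal Python sketch. The helper names `normalize_axis` and `concat_output_shape` are hypothetical and are not taken from the actual change; they only show the usual convention of mapping an axis in [-rank, rank) to a non-negative index before inferring the shape of a concatenation.

```python
def normalize_axis(axis, rank):
    """Map an axis in [-rank, rank) to its non-negative equivalent."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


def concat_output_shape(shapes, axis):
    """Infer the output shape of concatenating inputs with the given shapes."""
    if not shapes:
        raise ValueError("at least one input shape is required")
    rank = len(shapes[0])
    axis = normalize_axis(axis, rank)
    out = list(shapes[0])
    for shape in shapes[1:]:
        if len(shape) != rank:
            raise ValueError("all inputs must have the same rank")
        for i, (a, b) in enumerate(zip(out, shape)):
            # Every dimension except the concat axis must match exactly.
            if i != axis and a != b:
                raise ValueError(f"dimension {i} differs: {a} vs {b}")
        out[axis] += shape[axis]
    return out
```

For instance, `concat_output_shape([[2, 3], [2, 5]], -1)` treats `-1` as the last axis and adds the sizes along it.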
(1) Add bert-base-cased and gpt2 benchmark results on V100
(2) Update list of supported models.
(3) Add comments to gpt2_helper.
(4) Use IO Binding in test parity by default.
* add LRN/Grouped Conv Support, minor changes
* better pool ops sdk version requirement
* reduce string comparison for gemm/matmul ops
* fix nnapi fall back to cpu for softmax
* addressed review comments, correct a small error in the code
* improve calibration tool
* modify calibration interface name
* refine calibrate and calibrate_user
* refine and add type info
* add e2e user example file
* remove unnecessary files
* remove test images no longer needed
* update readme document
Co-authored-by: t-yguo <t-yguo@microsoft.com>
* Updated pushed CPU and CUDA tags.
* Add TensorRT, fix typo.
* Add OpenVINO tags. Remove 2020.2 installation instructions for VAD-M.
* Revert instruction changes for VAD-M and update 2020.2 to 2020.3
* Optimize CreateEnv by not creating the logging manager instance if env instance has already been created.
* Move creation of logging mgr inside if block
* Add BN to ArmNN EP
* Add Concat to ArmNN EP
* ACL logging improvements
* ArmNN logging improvements
* Fallback to CPU for 9x9 convolution in ACL EP
* Fallback to CPU for 9x9 convolution in ArmNN EP
* Enable python support for ACL and ArmNN EPs when compiled with BSP toolchain
* Removed the matmul operator
* Fix conv infer shape function
* Fix provider_names list for armnn
Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
* Expose load tensor proto from protobuf file function
* Add comment
* Remove use of fstream and use parsefromzerocopystream
* Close file descriptor after finish parsing it
* Close input stream too
* Set Close on delete only, no need to close file descriptor
* Revert "Set Close on delete only, no need to close file descriptor"
This reverts commit 5ba6e3c31b.
* Revert "Close input stream too"
This reverts commit 4564776733.
* Revert "Close file descriptor after finish parsing it"
This reverts commit 846e550c4f.
* Revert "Remove use of fstream and use parsefromzerocopystream"
This reverts commit 25a3117183.
* Add python API for specifying CUDA device id
* Provide a session-based Python API for specifying the
device id
* When the header file pybind11/stl.h is included, conversion between C++
containers (e.g. std::vector, std::map) and Python list and dict data
structures is enabled automatically.
https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html#
Therefore, refactor the code to take better advantage of this.
* Make struct CudaDeviceOptions as default cuda device options
* Implement sess.set_providers(list_of_providers, list_of_provider_option_dicts)
while staying consistent with the existing sess.set_providers(list_of_provider)
* Add cuda provider option default setting
* Add support for setting the CUDA options cuda_mem_limit and arena_extend_strategy.
Also resolved the merge conflict on session.py
* Use python ctypes to call cuda library to help python unittest
* Refine the code with reviewer's suggestions
* Add the capability of getting execution provider's configuration
- Now that the capability to set an execution provider's configuration
exists, it makes sense to add the capability of getting the EP's configuration.
* Modify the code with reviewer's suggestions.
* Choose between stoull() and stoul() depending on the 32/64-bit architecture.
* Rewrite the testcases for testing setting CUDA device id
Note: We need to make sure every ORT process runs on one CUDA device
at a time.
* Make sure the old session object is destroyed by Python gc before a new
session object is created
* Move testcases to original onnxruntime_test_python.py
* Fix bugs to pass CI build
* Make it pass CI build (cont.)
* Adding CPU implementation of BroadcastGradientArgs op
* Modify to take shape as input instead of tensor
* Cleanup
* Correct schema
* Corrected kernel, added tests, addressed review comments.
* Added exception and test for invalid broadcast, addressed review comments.
* Fix mac build error.
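
As a rough illustration of what an op like BroadcastGradientArgs computes, here is a minimal pure-Python sketch, assuming NumPy-style broadcasting. Given the shapes of two broadcast operands, it returns the axes along which each operand's gradient would need to be summed. The function name, the ascending ordering of the returned axes, and the treatment of equal dimensions are assumptions for illustration and may differ from the actual kernel.

```python
def broadcast_gradient_args(a_shape, b_shape):
    """Return (a_axes, b_axes): the axes to reduce for each operand's gradient."""
    rank = max(len(a_shape), len(b_shape))
    a_pad = rank - len(a_shape)  # number of leading axes missing from a
    b_pad = rank - len(b_shape)  # number of leading axes missing from b
    a = [1] * a_pad + list(a_shape)
    b = [1] * b_pad + list(b_shape)

    a_axes, b_axes = [], []
    for axis in range(rank):
        if axis < a_pad:
            # a has no such axis, so its gradient is summed over it.
            a_axes.append(axis)
        elif axis < b_pad:
            b_axes.append(axis)
        elif a[axis] != b[axis]:
            if a[axis] == 1:
                a_axes.append(axis)
            elif b[axis] == 1:
                b_axes.append(axis)
            else:
                # Mirrors the "invalid broadcast" case: neither side is 1.
                raise ValueError(
                    f"shapes {list(a_shape)} and {list(b_shape)} "
                    f"are not broadcastable at axis {axis}"
                )
    return a_axes, b_axes
```

With shapes `[2, 3, 4]` and `[3, 1]`, the first operand needs no reduction, while the second is reduced over the missing leading axis and the broadcast trailing axis.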
* Initial change, to add ReduceSumTraining cpu op
* cpu support
* cuda support + more UTs
* on comments + UT
* no-op support for empty ({}) axes with new attribute noop_with_empty_axes
* on comments
* fix build
* on comments
Co-authored-by: aishwarya bhandare <aibhanda@microsoft.com>
Co-authored-by: Ethan Tao <ettao@microsoft.com>
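
The effect of the `noop_with_empty_axes` attribute can be sketched in pure Python on nested lists. This is an illustrative model only, assuming the semantics that empty axes reduce over all dimensions by default and leave the input unchanged when the attribute is set. The helper names are hypothetical, inputs are assumed to be non-empty rectangular nested lists, and `keepdims` handling is omitted.

```python
from functools import reduce


def _rank(x):
    """Nesting depth of a non-empty rectangular nested list."""
    rank = 0
    while isinstance(x, list):
        rank += 1
        x = x[0]
    return rank


def _add(u, v):
    """Element-wise sum of two equally shaped nested lists (or scalars)."""
    if isinstance(u, list):
        return [_add(p, q) for p, q in zip(u, v)]
    return u + v


def _sum_axis(x, axis):
    """Sum a nested list over a single axis, dropping that axis."""
    if axis == 0:
        return reduce(_add, x)
    return [_sum_axis(sub, axis - 1) for sub in x]


def reduce_sum(data, axes=(), noop_with_empty_axes=False):
    rank = _rank(data)
    if not axes:
        if noop_with_empty_axes:
            return data  # empty axes: pass the input through unchanged
        axes = range(rank)  # empty axes: reduce over every dimension
    # Reduce the highest axis first so the remaining indices stay valid.
    normalized = sorted({a + rank if a < 0 else a for a in axes}, reverse=True)
    out = data
    for axis in normalized:
        out = _sum_axis(out, axis)
    return out
```

Calling `reduce_sum(x)` collapses `x` to a scalar, while `reduce_sum(x, noop_with_empty_axes=True)` returns `x` as-is.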
* Revert "Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266)"
Previously it failed because it used too much memory.
Now we only run dnnl EP with opset12 models in unit tests, to reduce peak memory usage.
* Deprecate TrainableDropout.
* Add Dropout(12) back into Megatron transformer.
* Remove TrainableDropout from front-end test models.
* Update baseline for front-end tests after converting test models to opset-12.
* Revise pipeline schedule to consider communication ops
* Add test
* Fix warning
* inline some short functions
* Fix warnings
* Rename a class
* Add comment for test
* op renamed to task
* Fix NVTX wrapper's bug
* concat
* add path_utils
* address feedback
* use string in test
* convert wstring to string on Windows
* address feedback
* fix comment