* disable materialize grads
* gradient builder bugfix
* fix ut
* fix ut
* resolve comments and bugfix
* add more assert
* disable forward compare for now
* priority-based exec order
* disable 1 failing test
* fix UT
* more comments
Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add more asserts on forward outputs
* Found one more failing case
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
In the previous shared providers there aren't many OpKernel classes, and the existing Provider_OpKernel wrapper was fine. With the opposibility of making Cuda a shared provider, having this need to be changed per OpKernel adds a lot of complexity.
It was fairly straightforward to make OpKernel work with shared providers with minimal changes.
In this change, the ONNX_OPERATOR_* macros can also be shared with the shared providers.
Adds support for required types to the op kernel type control infrastructure. Required types are always enabled.
Added int64 as a required type for certain ops.
* update Attention operator spec to support pruned model
* update Attention and QAttention cpu & cuda kernel
* Fix invalid embed layer norm fusion test models.
* Added required_grad attribute to YieldOp
* Chagened YieldOp attribute to hold the indices of the required gradient outputs from the count, and removed the code reordering the outputs.
* Changed backward_output_grad_names to a map from backward output gradient name to the corresponding output index.
* Handle case where bias_name is already quantized
If bias is shared between multiple nodes and we've already quantized it, just return the quantized name from the map
* Remove qType attribute from QuantizedValue and QuantizedInitializer
These are unused (and were incorrectly set in the case of int8 quantization)
* Add Reshape op to quantizer
* Add test for Reshape quant
* Fixed issue in python cmake to update wheel package
* Fixes python cmake issue for OV EP
Added post build step for libonnxruntime_providers_openvino
that copies the updated libonnxruntime_providers_openvino.so file
to /onnxruntime/capi directory every time this target is rebuilt.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Removed post_build step from onnxruntime_python.cmake
Now that we have added the post build step to copy
onnxruntime_providers_openvino.so and providers_shared.so
to /onnxruntime/capi directory in onnxruntime_providers.cmake file.
so removing the duplication of the same from here.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixed python cmake issue for OpenVINO-EP
->Fixed issue for both Linux and windows
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
* Introduce OrtTasks to replace EventPool
* return run_id to frontend
* pass run_id to backward
* OrtTasks support multiple bg_events
* make message_queue a member of orttask
* Replace MessageQueue with std::promise
* Move status_promise into Task
* Move terminate flag into Task
* Reenable previously disabled UTs
* Add unit tests
* Replace condition variables with std::promise
* Move to CreateBackgroundTask in the main thread
* return status and output in forward_future
* use throw for terminating background thread
* cleanup tasks at destructor
* reenable test_mixed_nnmodule_ortmodules_training
* add mutex for ORTTasks functions
* add mutex for bg_threads
* delay tests before start
* add ut for multi-task common backbone
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* update benchmark for transformers 4.* and ORT 1.7
* Fix gpt2 onnx conversion for transformers 4.3.*. Add a check of transformer version >= 3.1.
* remove code related to openmp
* update pretrain model list: keep representitive models only