The previous shared providers had few OpKernel classes, so the existing Provider_OpKernel wrapper was fine. With the possibility of making CUDA a shared provider, having to change this per OpKernel adds a lot of complexity.
It was fairly straightforward to make OpKernel work with shared providers with minimal changes.
In this change, the ONNX_OPERATOR_* macros can also be shared with the shared providers.
Adds support for required types to the op kernel type control infrastructure. Required types are always enabled.
Added int64 as a required type for certain ops.
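A minimal sketch of the idea behind required types, written as plain set logic rather than the actual C++ type control infrastructure. The function name `enabled_types` and the rule for combining the sets are assumptions made for illustration; the source only states that required types are always enabled.

```python
def enabled_types(supported, required, allowed=None):
    """Return the set of types an op kernel would be built with.

    Hypothetical helper, not part of the real infrastructure.
    supported: types the kernel implementation can handle.
    required:  types that must stay enabled regardless of reduction.
    allowed:   optional globally allowed types; None means no reduction.
    """
    supported = set(supported)
    required = set(required)
    if allowed is None:
        return supported | required
    # Assumption: type reduction intersects with the allowed set,
    # and required types are added back unconditionally.
    return (supported & set(allowed)) | required
```

With `supported={"float", "int32", "int64"}`, `required={"int64"}` and `allowed={"float"}`, the result keeps `int64` even though it is not in the allowed set.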
* update Attention operator spec to support pruned model
* update Attention and QAttention cpu & cuda kernel
* Fix invalid embed layer norm fusion test models.
* Added required_grad attribute to YieldOp
* Changed YieldOp attribute to hold the indices of the required gradient outputs instead of the count, and removed the code that reordered the outputs.
* Changed backward_output_grad_names to a map from backward output gradient name to the corresponding output index.
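A rough illustration of the name-to-index map described above, assuming the YieldOp attribute is a list of output indices. The function name and the `_grad` suffix are hypothetical and are not taken from the actual code.

```python
def build_backward_output_grad_names(output_names, required_grad_indices):
    """Map each required gradient name to the index of its forward output.

    output_names:          forward output names, in their original order.
    required_grad_indices: indices of the outputs that need a gradient.
    The "_grad" suffix is an assumption used only for this sketch.
    """
    return {
        output_names[index] + "_grad": index
        for index in required_grad_indices
    }
```

Because the map stores the original index, the outputs do not have to be reordered so that the gradient-requiring ones come first.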
* Handle case where bias_name is already quantized
If bias is shared between multiple nodes and we've already quantized it, just return the quantized name from the map
* Remove qType attribute from QuantizedValue and QuantizedInitializer
These are unused (and were incorrectly set in the case of int8 quantization)
* Add Reshape op to quantizer
* Add test for Reshape quant
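The shared-bias case above amounts to a lookup before quantizing. The following is a sketch only: the class, the `_quantized` suffix and the call counter are hypothetical, and the real quantization work is replaced by a placeholder.

```python
class BiasQuantizerSketch:
    """Hypothetical stand-in for the quantizer's bias handling."""

    def __init__(self):
        # Maps an original bias name to its quantized name.
        self.quantized_value_map = {}
        # Counts how often the (placeholder) quantization actually runs.
        self.quantize_calls = 0

    def quantize_bias(self, bias_name):
        # A bias shared between nodes may already have been quantized:
        # return the stored name instead of quantizing it a second time.
        if bias_name in self.quantized_value_map:
            return self.quantized_value_map[bias_name]

        self.quantize_calls += 1
        quantized_name = bias_name + "_quantized"  # assumed naming scheme
        self.quantized_value_map[bias_name] = quantized_name
        return quantized_name
```

Calling `quantize_bias("conv_bias")` twice returns the same name and runs the quantization step once.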
* Fixed issue in python cmake to update wheel package
* Fixes python cmake issue for OV EP
Added post build step for libonnxruntime_providers_openvino
that copies the updated libonnxruntime_providers_openvino.so file
to /onnxruntime/capi directory every time this target is rebuilt.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Removed post_build step from onnxruntime_python.cmake
The post build step that copies
onnxruntime_providers_openvino.so and providers_shared.so
to the /onnxruntime/capi directory now lives in onnxruntime_providers.cmake,
so remove the duplicate from here.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixed python cmake issue for OpenVINO-EP
-> Fixed issue for both Linux and Windows
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
* Introduce OrtTasks to replace EventPool
* return run_id to frontend
* pass run_id to backward
* OrtTasks support multiple bg_events
* make message_queue a member of orttask
* Replace MessageQueue with std::promise
* Move status_promise into Task
* Move terminate flag into Task
* Reenable previously disabled UTs
* Add unit tests
* Replace condition variables with std::promise
* Move to CreateBackgroundTask in the main thread
* return status and output in forward_future
* use throw for terminating background thread
* cleanup tasks at destructor
* reenable test_mixed_nnmodule_ortmodules_training
* add mutex for ORTTasks functions
* add mutex for bg_threads
* delay tests before start
* add ut for multi-task common backbone
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
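The promise-based handoff between the main thread and a background task can be sketched as follows. This is not the OrtTasks implementation: Python's `concurrent.futures.Future` stands in for `std::promise`/`std::future`, and `TaskSketch` and `start_task` are hypothetical names.

```python
import threading
from concurrent.futures import Future


class TaskSketch:
    """Hypothetical per-task state; each Future is a one-shot channel."""

    def __init__(self):
        self.forward_future = Future()  # (status, output) from the worker
        self.backward_input = Future()  # gradient supplied by the main thread
        self.result = Future()          # final backward result or error


def start_task(forward, backward):
    """Run forward, wait for the gradient, then run backward in a thread."""
    task = TaskSketch()

    def worker():
        try:
            task.forward_future.set_result(("OK", forward()))
            # Blocks until the main thread supplies a gradient, or raises
            # if the main thread set an exception to terminate the task.
            grad = task.backward_input.result()
            task.result.set_result(backward(grad))
        except Exception as exc:
            # Make sure nobody waits forever on a future that was never set.
            if not task.forward_future.done():
                task.forward_future.set_exception(exc)
            task.result.set_exception(exc)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return task, thread
```

The main thread reads `task.forward_future.result()`, later calls `task.backward_input.set_result(grad)`, and can terminate a waiting task by calling `task.backward_input.set_exception(...)` instead, which mirrors the use of an exception to stop the background thread.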
* update benchmark for transformers 4.* and ORT 1.7
* Fix gpt2 onnx conversion for transformers 4.3.*. Add a check that the transformers version is >= 3.1.
* remove code related to openmp
* update pretrain model list: keep representative models only
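A minimum-version check like the one mentioned for transformers could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, the version string is passed in rather than read from the installed package, and pre-release suffixes are simply ignored.

```python
import re


def version_tuple(version):
    """Turn "4.3.0.dev0" into (4, 3, 0); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_supported(version, minimum="3.1"):
    """Return True if `version` is at least `minimum` (numeric parts only)."""
    return version_tuple(version) >= version_tuple(minimum)
```

Comparing tuples of integers avoids the string-comparison trap where "3.10" would sort before "3.2".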
Add providers for CoreML, ROCM, NNAPI, ArmNN
Adding the structs for OrtCUDAProviderOptions and OrtOpenVINOProviderOptions
Updating NNAPI flags.
Adding the new CoreML flag.
Adding hooks to the build system to tell Java about the new providers.
* remove tests to speed up CI
* add back _into_data_parallelism tests to see how long the CI test takes
* remove unnecessary save calls
* add back data_parallelism_full_precision_bart_path
* add data_parallelism_full_precision_path
* remove data parallelism tests
Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* If unit tests are manually excluded via `--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF` (e.g. when testing changes to binary size and you want to keep the build time as short as possible), it should still be possible to create the Python bindings.
Update CMakeLists.txt to decouple the inclusion of onnxruntime_python.cmake from unit tests being enabled.
Update onnxruntime_python.cmake so it works when unit tests are disabled. Also skip copying of test py files when unit tests are disabled.
* Update torchtext usage for pytorch transformer sample
* Temporarily disable tests to unblock repo (failures are being worked on already)
* Update loss numbers for ORTTrainer UTs