* Introduce OrtTasks to replace EventPool
* return run_id to frontend
* pass run_id to backward
* OrtTasks support multiple bg_events
* make message_queue a member of OrtTasks
* Replace MessageQueue with std::promise
* Move status_promise into Task
* Move terminate flag into Task
* Reenable previously disabled UTs
* Add unit tests
* Replace condition variables with std::promise
* Move to CreateBackgroundTask in the main thread
* return status and output in forward_future
* use throw for terminating background thread
* cleanup tasks at destructor
* reenable test_mixed_nnmodule_ortmodules_training
* add mutex for OrtTasks functions
* add mutex for bg_threads
* delay tests before start
* add ut for multi-task common backbone
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
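
The std::promise changes listed above can be illustrated with a minimal sketch. This is not the OrtTasks code: `RunTask`, the promise names, and the integer payloads are hypothetical stand-ins, and signalling termination through `set_exception` is an assumption about how "use throw for terminating background thread" might be realized.

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for one forward/backward round trip.
// Returns the "backward" result, or -1 if the worker was terminated.
int RunTask(bool terminate) {
  std::promise<int> forward_promise;         // worker -> main: forward output
  std::promise<int> backward_input_promise;  // main -> worker: backward input
  std::future<int> forward_future = forward_promise.get_future();
  std::future<int> backward_input = backward_input_promise.get_future();
  int backward_result = -1;

  std::thread worker([&] {
    // "Forward" is done; fulfilling the promise wakes the main thread.
    forward_promise.set_value(21 * 2);
    try {
      // Blocks until the main thread supplies a value or an exception.
      backward_result = backward_input.get() + 1;
    } catch (const std::runtime_error&) {
      // Termination requested: leave the thread without running "backward".
    }
  });

  // One-shot wait; no mutex/condition-variable/flag triple is needed.
  int forward_output = forward_future.get();
  if (terminate) {
    backward_input_promise.set_exception(
        std::make_exception_ptr(std::runtime_error("terminate")));
  } else {
    backward_input_promise.set_value(forward_output);
  }
  worker.join();  // backward_result is safe to read after join
  return backward_result;
}
```

Each promise is fulfilled exactly once, which fits a hand-off that happens once per run; a condition variable would also need a guarded flag to avoid lost or spurious wakeups.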
* Update torchtext usage for pytorch transformer sample
* Temporarily disable tests to unblock repo (failures are being worked on already)
* Update loss numbers for ORTTrainer UTs
* Support keyword arguments for ORTModule.
* Add backward workaround to the test.
* Specify test name directly without -k.
* Handle unused inputs removed by ONNX exporter.
* Enable external CUDA allocator in ORTModule.
* Fix assert after unification of allocators.
* Update no grad memory test.
* update comments.
* fix provider options array when not sharing allocator.
* Fix OpenVINO EP Windows build
The OpenVINO EP build is broken on Windows. The issue
is that wchar_t is UTF-16 on Windows, while on other platforms
such as Linux and macOS wchar_t is UTF-32,
so a wide Unicode string has to be explicitly converted
to a UTF-8 string on Windows.
This commit fixes the issue.
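
As an illustration of the conversion described above, here is a minimal sketch of a wide-to-UTF-8 helper. `WideToUtf8` is a hypothetical name, not the function used in the OpenVINO EP fix; it assumes wchar_t holds UTF-16 code units when it is 2 bytes wide and UTF-32 code points otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8 on any platform.
std::string WideToUtf8(const std::wstring& wide) {
  std::string out;
  for (std::size_t i = 0; i < wide.size(); ++i) {
    std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);
    // With 16-bit wchar_t (Windows), merge a surrogate pair into one code point.
    if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
        i + 1 < wide.size()) {
      std::uint32_t low = static_cast<std::uint32_t>(wide[i + 1]);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        ++i;
      }
    }
    // Lone surrogates and out-of-range values become U+FFFD.
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) cp = 0xFFFD;
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

The manual encoder avoids std::wstring_convert, which is deprecated since C++17; on Windows, the Win32 WideCharToMultiByte API with CP_UTF8 is a common alternative.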
* Add support for custom ops library to the ORT model conversion script
Simplify model conversion now that we read ops from the ORT format model.
Enable custom ops in the python bindings if custom ops are turned on in a minimal build.
* Add test of model conversion involving custom ops.
* Integrate memory improvements from NVIDIA
* compute max_global_num before buffer allocation
* update conversion script to support transformers 4.0
* update benchmark script for creating dummy inputs for different batch_size
* Use a wrapper of cuda event to avoid memory leak
* rename pipelines
* resync and rename
* resync master
* rename package id
* remove OrtPackageId which is for nuget
Co-authored-by: Randy Shuai <rashuai@microsoft.com>