* IsInf ReduceSum transform
* Revert unnecessary changes, add isinf_only and isnan_only attr
* add tests, review comments
* Disable test for non-cuda
* Move IsAllFinite from training to contrib op
* review comments
* Review comment, formatting
* Enable test for ROCm EP
* Add DropoutGrad function body
* Fix documentation and add test cases
* Fix template specialization
* Check expansion for float16 and bfloat16
* ConvGrad CUDA impl
* Set up the test case for Deberta Conv1D
* Add fp16 test
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Avoid passing zero bias to Gemm in gradients
The bias argument to Gemm is optional and defaults to zero. Therefore we do not need to generate zero initializers and pass them to that argument.
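The equivalence this relies on can be checked with a small NumPy sketch. The `gemm` helper below is a hypothetical reference function written for illustration, not the ONNX Runtime kernel; it only assumes the Gemm definition `alpha * (A @ B) + beta * C` with `C` optional.

```python
import numpy as np


def gemm(a, b, c=None, alpha=1.0, beta=1.0):
    """Hypothetical reference Gemm: alpha * (a @ b) + beta * c, with c optional."""
    y = alpha * (a @ b)
    if c is not None:
        # The bias is only added when it is actually supplied.
        y = y + beta * c
    return y


rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3)).astype(np.float32)
b = rng.standard_normal((3, 4)).astype(np.float32)
zero_bias = np.zeros((4,), dtype=np.float32)

# Passing an explicit zero bias and omitting the bias give the same result,
# so the zero initializer adds nothing but graph size.
with_zero_bias = gemm(a, b, zero_bias)
without_bias = gemm(a, b)
```

Omitting the input also spares the gradient graph one initializer per Gemm node.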
* Remove unused declaration.
* Enabled rocm support for graph transformations
* Support for external Hip allocator
* Added const_cast to reinterpret_cast to fix compiler issue
* Another crack at fixing the compile error
* More compilation fixes
* Added compilation flags to load_inline extension
* Added ROCM, ROCM_PINNED constants
* Changes to address PR comments
* Changed gpu identifier from ROCM to CUDA
* Added HIP compilation flag for torch inline functions
* Fixed a typo in header allocator string formatting
* Fix for runtime error with external_cuda_allocator
* Removed cuda/rocm specific code paths for allocators
* More name changes to generic gpu from rocm/cuda
* Removed duplicate allocator creation
* Rename cuda_external_ config options as gpu_external_
* Rename hip_mem_limit to gpu_mem_limit
* Rename cuda_mem_limit to gpu_mem_limit
With this change, differentiating CUDA EP and ROCm EP is not needed in training script when mem_limit option needs to be set.
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
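As a rough illustration of the rename, a training script that still carries the old keys could fold them into the new one. `normalize_provider_options` is a hypothetical helper and not part of ONNX Runtime; only the three option names come from the commits above.

```python
# Old, provider-specific option names that the commits above replace.
LEGACY_MEM_LIMIT_KEYS = ("cuda_mem_limit", "hip_mem_limit")


def normalize_provider_options(options):
    """Return a copy of `options` with legacy mem-limit keys renamed.

    Hypothetical helper for illustration only. An explicit
    `gpu_mem_limit` entry takes precedence over any legacy key.
    """
    normalized = dict(options)
    for legacy_key in LEGACY_MEM_LIMIT_KEYS:
        if legacy_key in normalized:
            value = normalized.pop(legacy_key)
            normalized.setdefault("gpu_mem_limit", value)
    return normalized


# The same call works regardless of which EP the old script targeted.
from_cuda = normalize_provider_options({"cuda_mem_limit": 2 * 1024**3})
from_rocm = normalize_provider_options({"hip_mem_limit": 2 * 1024**3})
```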
* Allow specific optimizers to be disabled.
- this replaces the ability to specify just the optimizers to run, which was never used and so is not needed
Allow the disabled list to be specified via the python bindings
- expected usage is internal, so this is passed via kwargs to avoid polluting the documentation with options no user is likely to need
Update the ORT format model conversion script to disable NCHWc transformer when level is 'all'
- currently there aren't any known use cases where we'd want the NCHWc transformations to run, as they create a device-specific model and aren't used on ARM
- the ORT format model is not expected to be generated on the target device (e.g. generate on Windows/Linux/macOS to deploy to Android/iOS), so there's a good chance we'd generate a useless/invalid model
- default to 'all' as ARM and MLAS prefer NHWC and the NHWC transformer runs at that level
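The disable-list behaviour described above can be sketched as a simple filter. Both function names and the optimizer name strings are assumptions made for illustration; the actual bindings and conversion script may differ.

```python
def select_optimizers(available, disabled=()):
    """Return the optimizers to run: everything available minus the disabled ones."""
    disabled_set = set(disabled)
    return [name for name in available if name not in disabled_set]


def optimizers_to_disable(level):
    """Hypothetical rule mirroring the conversion script: skip NCHWc at level 'all'."""
    # "NchwcTransformer" is an assumed name for the NCHWc transformer.
    return ["NchwcTransformer"] if level == "all" else []


# Illustrative optimizer names only.
available = ["ConstantFolding", "NchwcTransformer", "NhwcTransformer"]
enabled = select_optimizers(available, optimizers_to_disable("all"))
```

A disable list keeps the default "run everything at this level" behaviour and only needs the exceptions to be named.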
* Add matching changes to optimizer generation in training code
* Add function body to SoftmaxGrad schema
* Add type context and cleanup
* Add test case with symbolic dimensions
* Add opset specification to function
* handle opset dependence
* Exclude from minimal build
* Promote BiasDropout from orttraining to onnxruntime
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Liqun/ort module perf1 (#6806)
Add MySQL script to log perf data
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Resolve HTTP Error 503: Service Unavailable for MNIST dataset (#6989)
* Reduce logging for ORTModule for the end user (#6982)
* Support none types in forward output (#7001)
* Missed test case for none type output (#7014)
* save iobinding to ctx
* save run_options to ctx
* remove debug tests
* PR comments and clean up
* add RunStateInfo
* remove whitespace edits
* PR comments
* remove test changes
* fix test failure
* Fix unit test test_nesting_forward_backward_calls
Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>