Commit graph

582 commits

Author SHA1 Message Date
Ryan Hill
80cae23393 Merge with master 2021-04-14 19:07:25 -07:00
Jesse Benson
be79575c6a Use built-in reduce_sum() for simple reduction cases, specifically reduce all to a scalar. 2021-04-14 08:55:35 -07:00
ashbhandare
6ceee5d131
IsInf ReduceSum transform (#7188)
* IsInf ReduceSum transform

* Revert unnecessary changes, add isinf_only and isnan_only attr

* add tests, review comments

* Disable test for non-cuda

* Move IsAllFinite from training to contrib op

* review comments

* Review comment, formatting

* Enable test for ROCm EP
2021-04-13 16:05:21 -07:00
G. Ramalingam
f8a36dd6b3
Add DropoutGrad function body (#7310)
* Add DropoutGrad function body

* Add DropoutGrad function body

* Fix documentation and add test cases

* Fix template specialization

* Check expansion for float16 and bfloat16
2021-04-13 14:31:53 -07:00
harshithapv
a5d3a52d1a
Add Tile grad (#7289)
* tile grad

* fixed bugs

* added tile grad test

* bug fix

* Added tests. Addressed comments

* added optimization recommended and addressed comments

* fixed comment
2021-04-13 12:54:45 -07:00
Ryan Hill
20644043e5 Fix merge breaks 2021-04-12 17:08:11 -07:00
Ryan Hill
57591f5b27 Merge with master 2021-04-12 16:51:35 -07:00
Ryan Hill
a841d17d06 More build options tested, converted the training ops over. 2021-04-12 14:02:08 -07:00
Weixing Zhang
75c0192e4f
enable more unit tests for ROCM EP (#7307) 2021-04-09 15:15:13 -07:00
baijumeswani
b221a4fd86
Better error message when ORTModule used with torch.DataParallel (#7287)
* Better error message when ORTModule used with torch.DataParallel
2021-04-09 10:07:22 -07:00
Weixing Zhang
c22963c23d
Polish Lamb Kernel (#7299) 2021-04-09 09:55:57 -07:00
Weixing Zhang
8ad5007f8f
Polish Adam kernel (#7294)
* Polish Adam kernel
2021-04-09 01:11:09 -07:00
Thiago Crepaldi
7b4362c21a
Add support to dynamic positional/keyword input for ORTModule (#7189) 2021-04-08 12:46:21 -07:00
ytaous
e14b291ce7
Enable symbolic shape inference in ORTModule (#7282)
Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-08 09:47:09 -07:00
baijumeswani
d272c8434d
Suppress tracer warnings from onnx export in ORTModule (#7221)
* Suppress tracer warnings from onnx export in ORTModule
2021-04-08 03:41:38 -07:00
Sherlock
aa2c465143
Restrict ConvGrad to __CUDA_ARCH__>=700 (#7278)
* Restrict ConvGrad to __CUDA_ARCH__>=700

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-07 20:10:29 -07:00
Vincent Wang
beb299e17d
ConvGrad CUDA Kernel Bugfix (#7273)
* bugfix

* add ut
2021-04-08 08:22:18 +08:00
baijumeswani
844361bc67
Support eval mode and torch.no_grad context in ORTModule and restructure ortmodule.py (#7162) 2021-04-07 09:29:54 -07:00
Sherlock
4bc17ca04e
CUDA ConvGrad Kernel (#7227)
* ConvGrad CUDA impl

* Set up the test case for Deberta Conv1D

* Add fp16 test

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-06 22:09:06 -07:00
Derek Murray
25e261f196
Avoid passing zero bias to Gemm in gradients (#7244)
* Avoid passing zero bias to Gemm in gradients

The bias argument to Gemm is optional and defaults to zero. Therefore we do not need to generate zero initializers and pass them to that argument.

* Remove unused declaration.
2021-04-06 16:49:34 -07:00
ashbhandare
2aa89989c4
Not-where fusion (#7182)
* Not-where fusion

* Change to rewrite rule

* Add to inference transforms

* Support multiple where consumers

* review comments
2021-04-06 16:12:26 -07:00
raviskolli
5d759e182b
Allocate external Rocm allocator via PyBind (#7148)
* Enabled rocm support for graph transformations

* Support for external Hip allocator

* Added const_cast to reinterpret_cast to fix compiler issue

* Another crack at fixing the compile error

* More compilation fixes

* Added compilation flags to load_inline extension

* Added ROCM, ROCM_PINNED constants

* Changes to address PR comments

* Changed gpu identifier from ROCM to CUDA

* Added HIP compilation flag for torch inline functions

* Fixed a typo in header allocator string formatting

* Fix for runtime error with external_cuda_allocator

* Removed cuda/rocm specific code paths for allocators

* More name changes to generic gpu from rocm/cuda

* Removed duplicate allocator creation

* Rename cuda_external_ config options as gpu_external_

* Rename hip_mem_limit to gpu_mem_limit

* Rename cuda_mem_limit to gpu_mem_limit
2021-04-06 15:23:51 -07:00
G. Ramalingam
a9ff4c29e5
Add function body to GeluGrad schema (#7190)
* Add GeluGrad function definition

* complete gelugrad function definition

* add opset to function definition
2021-04-06 12:40:59 -07:00
ashari4
56b22c1c6b
Fix assert that the tensor's device type is 'cpu' #7248 2021-04-06 09:08:32 -07:00
Pranav Prakash
3b16afc0db
Make dW optional for convgrad (#7083) 2021-04-05 17:05:20 -07:00
Suffian Khan
9f14af9809
Add BERT-L perf regression test on MI100 and re-enable batch size test (#7240)
* restore bs test and add perf test

* update perf number and fix path to results
2021-04-05 15:51:52 -07:00
ashbhandare
2b8513539e
Div mul fusion (#7183)
* Div mul fusion

* Change to rewrite rule

* Add to inference transformers
2021-04-05 09:35:30 -07:00
Weixing Zhang
74ee24cf7f
rename cuda_mem_limit and hip_mem_limit to gpu_mem_limit for both CUDA EP and ROCm EP (#7226)
With this change, differentiating CUDA EP and ROCm EP is not needed in training script when mem_limit option needs to be set.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-04-05 09:04:04 -07:00
baijumeswani
68b12a6179
Support for saving and loading pytorch compatible state dictionaries (#7220)
* Override methods on torch.nn.Module to get direct access to the methods on the original module.
2021-04-05 03:40:41 -07:00
Weixing Zhang
59b57d8322
HSA_NO_SCRATCH_RECLAIM and RCCL_ALLTOALL_KERNEL_DISABLE are not needed for ROCm 4.1 (#7224)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-04-02 18:19:11 -07:00
Weixing Zhang
ef88dc912c
enable more unit tests for ROCM EP (#7222) 2021-04-02 15:57:08 -07:00
Sherlock
a98c2ebb8c
Enable saving optimized models in OrtModule (#7214)
* Enable saving optimized models in OrtModule

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-02 12:37:05 -07:00
Weixing Zhang
a3f17c8b0d
update lamb and GatherGrad kernel for ROCm EP (#7184)
With ROCm4.1, the CUDA implementation of Lamb and GatherGrad can be
utilized for ROCm EP.
2021-04-02 09:02:49 -07:00
Edward Chen
0ebeaf529d
Check kernel def hashes (#7120)
Add unit test for verifying kernel def hashes.
Add way to add new types to kernel definition without changing hash.
2021-04-01 17:42:58 -07:00
ashbhandare
15c67ddbf0
Make output 1 of ConcatTraining Optional and place on CPU (#7199)
* Optional input 1 on CPU ConcatTraining

* Rename output_1
2021-04-01 16:05:17 -07:00
Tang, Cheng
07201bac7a
expose session option and provider options (#7112)
* expose session option and provider options

* merge provider_names and provider_options

* integrate into orttrainer options

* fix doc string

* fix a typo

* Update orttraining/orttraining/python/training/orttrainer.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update orttraining/orttraining/python/training/orttrainer.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update orttraining/orttraining/python/training/orttrainer_options.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* fix the usage of provider_options

* Update orttraining/orttraining/python/training/orttrainer.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update orttraining/orttraining/python/training/orttrainer.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* update expected result in tests

* fix default provider options

* minor update to trigger rebuild

* minor update to trigger rebuild

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-03-30 09:49:45 -07:00
Scott McKay
9297527b7a
Enable NHWC transformer when generating ORT format model (#7126)
* Allow specific optimizers to be disabled.
  - replace unused ability to specify just the optimizers to run
    - never used so not needed
Allow the disabled list to be specified via the python bindings
  - expected usage is internal, so using kwargs for that so as not to pollute the documentation with stuff no user is likely to need
Update the ORT format model conversion script to disable NCHWc transformer when level is 'all'
  - currently there aren't any known use cases where we'd want the NCHWc transformations to run as they create a device specific model and aren't used on ARM
    - the ORT format model is not expected to be generated on the target device (e.g. generate on Windows/Linux/macOS to deploy to Android/iOS), so there's a good chance we'd generate a useless/invalid model
  - default to 'all' as ARM and MLAS prefer NHWC and the NHWC transformer runs at that level
* Add matching changes to optimizer generation in training code
2021-03-29 18:39:48 +10:00
Jeff Daily
65ce5f07b3
add Dockerfile.rocm4.1.pytorch (#7152) 2021-03-26 21:40:10 -07:00
Sherlock
ab86634c36
Address comments from ORTModule master merge (#7101)
* Address ortmodule merge master comments

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-26 16:26:42 -07:00
Thiago Crepaldi
a01f15198c
Add support for large models (#7113)
* Add support for large models

* Handle models with registered buffers
2021-03-26 14:08:46 -07:00
KeDengMS
c9b29fbd06
Disable MatmulTransposeFusion for CPU EP (#7135)
It causes a convergence issue in BERT on CPU
2021-03-25 17:16:58 -07:00
G. Ramalingam
cc0e7bee76
Add function-body to SoftmaxGrad (#6988)
* Add function body to SoftmaxGrad schema

* Add type context and cleanup

* Add test case with symbolic dimensions

* Add opset specification to function

* handle opset dependence

* Exclude from minimal build
2021-03-25 11:34:06 -07:00
Vincent Wang
fda0470683
Add New AllocKind for YieldOp Outputs, Run YieldOp with InferenceSession in UT (#7125)
* new allockind, add ut

* change macro

* fix win build

* rename alloc kind

* fix mem leak
2021-03-25 15:18:51 +08:00
Sherlock
1c8d874412
Promote BiasDropout from orttraining to onnxruntime (#7116)
* Promote BiasDropout from orttraining to onnxruntime

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-24 20:42:42 -07:00
jingyanwangms
cd67f12add
Move IOBinding and RunOptions to ctx (#7028)
* Liqun/ort module perf1 (#6806)

add mysql script to log perf data
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Resolve HTTP Error 503: Service Unavailable for MNIST dataset (#6989)

* Reduce logging for ORTModule for the end user (#6982)

* Support none types in forward output (#7001)

* Missed test case for none type output (#7014)

* save iobinding to ctx

* save run_options to ctx

* remove debug tests

* PR comments and clean up

* add RunStateInfo

* remove whitespace edits

* PR comments

* remove test changes

* fix test failure

* Fix unit test test_nesting_forward_backward_calls

Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-24 17:51:00 -07:00
harshithapv
540eac253e
Deepspeed pipeline parallel and fairscale sharded optimizer test samples with ORTModule (#7078)
* adding samples for Deepspeed pipeline parallel and fairscale sharded optimizer with ortmodule

* fixed typo in args

* addressed Thiago's comments

* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_deepspeed_pipeline_parallel.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-03-24 09:43:05 -07:00
Suffian Khan
5cb8934459
update Dockerfile for workaround for issue in RCCL for rocm4.0 (#7108) 2021-03-23 13:36:04 -07:00
Suffian Khan
c0994fdfbb
Update ORTTrainer to permit Rocm and permit export of opset 13 (#7059)
* update orttrainer to permit rocm and allow export for opset 13

* wrap rocm check in try-except block
2021-03-23 11:09:48 -07:00
baijumeswani
c3310efdcd
Support for models having partially non trainable parameters (#7058)
* Support for models having partially non trainable parameters
2021-03-23 09:41:16 -07:00
Sherlock
5ec0e71542
ORTModule support non-differentiable module output (#7048)
* Handle non-differentiable module output

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-22 15:46:11 -07:00