3947 commits

Author SHA1 Message Date
Vincent Wang
f6a8d2aa5f split graphs info 2020-12-15 09:03:08 -08:00
Vincent Wang
cfd57c0136 fix input order, and input grad. 2020-12-15 09:03:08 -08:00
Vincent Wang
e759da178d bugfix for graph inputs and outputs. 2020-12-15 09:03:08 -08:00
Thiago Crepaldi
b7564d0732 Refactor after Vincent work on splitting on backend 2020-12-15 09:03:08 -08:00
Vincent Wang
6d8fde8324 sample code change. 2020-12-15 09:03:08 -08:00
Vincent Wang
934feb0c99 gradient graph split in backend. 2020-12-15 09:03:08 -08:00
Thiago Crepaldi
ea5871ac15 Change DropoutGrad.input[1].input_type and del logits_grad from backward graph 2020-12-15 09:03:08 -08:00
Thiago Crepaldi
f1dc6e4007 Refactor BERT classifier fine tune for better debugging 2020-12-15 09:03:08 -08:00
Thiago Crepaldi
d4917f2d65 Hard-code input types for DropoutGrad on BERT 2020-12-15 09:03:08 -08:00
Thiago Crepaldi
3b267d1d60 Add BERT classifier example 2020-12-15 09:03:08 -08:00
Thiago Crepaldi
30042b6e0e Update InferenceSession usage to match master 2020-12-15 09:03:08 -08:00
Thiago Crepaldi
8b0ade0e83 Integrate automatic graph split into ORTModule 2020-12-15 09:03:08 -08:00
Vincent Wang
c36c8e14a7 refactor 2020-12-15 09:03:08 -08:00
Vincent Wang
26e6d6d004 module transformer 2020-12-15 09:03:08 -08:00
Thiago Crepaldi
3524fb04e8 Add working example for MNIST (MVP) 2020-12-15 09:03:08 -08:00
Thiago Crepaldi
f1b5c25b2d Improve example to display grads before and after optim step 2020-12-15 09:03:07 -08:00
Thiago Crepaldi
f06cafdebd Fix path on test script 2020-12-15 09:03:07 -08:00
Thiago Crepaldi
56ca4ab05b Add flag to allow pytorch-only or ORT flexible api runs 2020-12-15 09:03:07 -08:00
Thiago Crepaldi
d4449d86b9 Add script to run Flexible API MVP PoC 2020-12-15 09:03:07 -08:00
Thiago Crepaldi
e71e08851a Basic plumbing for backward pass. Not fully working 2020-12-15 09:03:07 -08:00
Thiago Crepaldi
77cefcd6c2 Perform forward pass using training graph with intermediate outputs 2020-12-15 09:03:07 -08:00
Thiago Crepaldi
11b69f141e Forward pass using InferenceSession on exported ONNX
Although forward pass works, this has the limitation of not working for
backward pass due to the lack of intermediate tensors needed for
gradient.

Next step is to export a training graph and split it manually
2020-12-15 09:03:07 -08:00
Jesse Benson
a8d549e181 Minor changes to AMD element-wise kernels to converge with CUDA element-wise kernels. 2020-12-15 08:46:36 -08:00
Pranav Sharma
a9548283d0
Don't mark issues that are marked as enhancement as stale (#6134) 2020-12-14 18:57:40 -08:00
Edward Chen
9810b9e02b
Reduce amount of compiled CUDA device code (#6118)
Move CudaKernel from cuda_common.h to a new separate header, cuda_kernel.h. Update include sites to use cuda_kernel.h instead if they need CudaKernel. Inclusions of cuda_common.h are now more lightweight.

Make corresponding changes for ROCM execution provider code.

Other minor cleanup.
2020-12-14 15:27:40 -08:00
Sheil Kumar
a6a23db130
Enable C# .NET5 for WinML (#6120)
* build for .net5

* only reference cswinrt for .net5

* remove netstandard2.0 references

* upgrade language version

* net5

* remove extra comment closure

* add targetframework

* set target framework

* remove net*

* pep8 errors

* make test project build with .net windows SDK projection

* disable c# builds for non-x64 builds

* fix pep8 errors

* disable for store build

* fix tests

* remove cswinrt and sdk references from package

* bump cswinrt down to 1.0.1

* fix bin path

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-12-14 15:05:15 -08:00
Sherlock
eb5c1f0fcc
Unify activation and initializer alignment value (#6109)
* Unify activation and initializer alignment value

* Fix VerifyInputTensorsAllocatedContiguously
2020-12-14 13:13:41 -08:00
liqunfu
cde723a136
Liqun/move nightly pl to linux multi gpu v100 (#6024)
* move e2e nightly pipeline to Azure DevOps
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-14 12:43:41 -08:00
baijumeswani
dd2e5a1a05
state_dict and load_state_dict for ORTTrainer (#6095)
* add functions state_dict and load_state_dict to ORTTrainer

* unit tests for state_dict and load_state_dict for ORTTrainer
2020-12-14 11:55:52 -08:00
dependabot[bot]
d4dddd99d9 Bump ini from 1.3.5 to 1.3.8 in /nodejs
Bumps [ini](https://github.com/isaacs/ini) from 1.3.5 to 1.3.8.
- [Release notes](https://github.com/isaacs/ini/releases)
- [Commits](https://github.com/isaacs/ini/compare/v1.3.5...v1.3.8)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-12 13:06:43 -08:00
Hariharan Seshadri
c755ca0b71
Honor auto_pad attribute in ConvTranspose (#4271) 2020-12-11 22:30:17 -08:00
Suffian Khan
6cb5d3ac09
Fix multi-tensor LAMB reduction to be deterministic (#6028)
* define ordering of reduction across blocks

* save state

* remove debug code

* remove debug code

* review comments

* significant correction for reduction only over blocks on same tensor

* addressing comments

* update rocm/lamb.cc to build as well

* remove times 2048*size in multitensor test until threshold error in rocm resolved

* convert tuple => struct as per recommendation

* update comment

* apply perfect forwarding for launch_multitensor to permit passing ref rather than pointer

* remove excess template arguments from rocm lamb.cc launch_multitensor as well

* fixes for AMD build

* pr comments

* run formatter from vscode

* formatter on cuda files
2020-12-11 13:13:05 -08:00
Edward Chen
c8ac34d6a5
Fix DEBUG_NODE_INPUTS_OUTPUTS test by putting it in a separate process, clean up unused test_main.cc files. (#5949)
Move the DEBUG_NODE_INPUTS_OUTPUTS test into its own process. The implementation uses static variables which do not interact well with other tests.
Clean up old test_main.cc files which are no longer used.
2020-12-11 11:36:58 -08:00
Sherlock
a53f4dd379
Introduce VariadicAlias, remove hardcoded alias limits (#6106)
* Introduce VariadicAlias, remove hardcoded alias limits

* Include optional-lite in winml build

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-11 10:47:08 -08:00
Jesse Benson
38c49c2483 Make ROCM and CUDA reduction_all code more similar. 2020-12-11 09:35:07 -08:00
Ryan Lai
1eb146f561
Implement conversion from ORT String to WinML Tensor String (#6097)
* Implement conversion from ort string to winml string

* NIT:comment
2020-12-10 17:47:50 -08:00
Ryan Lai
8bcb5fd119
Add skip test reason for onnx model zoo models and tier 2 models (#6081) 2020-12-10 14:41:17 -08:00
Ryan Lai
753af576c4
If building inbox, hook up winrt_activation_handler for WinML Tests (#6074)
* If building inbox, hook up winrt_activation_handler with what is already defined in dllload.cpp

* Add base.h header

* Missed custom ops test
2020-12-10 14:41:01 -08:00
Du Li
e945b5fcf6
adding fp16 support for topk cuda kernel (#6082)
* adding fp16 support for topk.

* disable fp16 tests for cpu ep

Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-10 11:04:19 -08:00
Vincent Wang
7ddeafdfcc
Add ReduceL2Grad and ClipGrad (#5970)
* ReduceL2Grad and ClipGrad.

* fix win build and amd ci pipeline

* resolve comments.

Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>
2020-12-10 11:03:26 +08:00
RandySheriffH
404982ded5
Enable varied input type for custom op (#6066)
* allow custom op taking varied types

* refactor test case

* add test model

* refactor test case

* enable copy elision

* update test case

* fix issue in ToString function
2020-12-09 15:10:42 -08:00
Jesse Benson
cc47cfcb31 Update AMD transpose to match CUDA transpose. 2020-12-09 11:00:18 -08:00
Edward Chen
abdbb5fc84
Reduction kernel optimization (#6088)
Optimize reduction kernel code by moving loads from global memory before computation.
Add CMake option to build CUDA code with --generate-line-info option.
2020-12-09 10:20:23 -08:00
Sergii Dymchenko
9e26e59a37
Deprecate opsets <12 for training. (#6027) 2020-12-09 00:15:27 -08:00
Weixing Zhang
d95fc5e849
clean un-used code. (#6059)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-12-08 23:15:30 -08:00
Weixing Zhang
2705115732
add dockerfile for ROCm3.10 and update BUILD.md for ROCm EP (#5821)
* add HSA_NO_SCRATCH_RECLAIM=1 to dockerfile

It is to work around an issue in AMD compiler which generates poor GPU ISA when the type of kernel parameter is a structure and “pass-by-value” is used

* update BUILD.md

* add dockerfile for rocm3.10
2020-12-08 23:14:56 -08:00
ashbhandare
b1a75d0e98
Enable passing initial optimizer state while creating training session (#5869)
* Support to pass initial optimizer states to optimizer graph builder

* Changes for passing init optim state to training session config

* Pass optimizer state through cpp and python frontend

* Cleanup

* Review comments

* Fix windows and mac CI

* Review comments

* review comments

* Review comments

* Frontend review changes

* Fix CI
2020-12-08 21:20:51 -05:00
Sherlock
7a43fa0028
Fix AllReduce kernel for contiguous buffer (#6064) 2020-12-08 15:55:13 -08:00
Edward Chen
e357486707
Fix build definition template typo, add logging (#6065)
Fix a typo in tools/ci_build/github/azure-pipelines/templates/get-docker-image-steps.yml.
Add logging to tools/ci_build/get_docker_image.py for easier debugging.
2020-12-08 15:16:50 -08:00
baijumeswani
523d187193
save data to and load data from an hdf5 file for checkpointing (#5975)
* save python dictionary to hdf5 representation and load an hdf5 file into a python dictionary

* unit tests for saving data to and loading data from hdf5 file
2020-12-08 11:40:57 -08:00