1055 commits

Author SHA1 Message Date
pengwa
24eab921be
Enable PythonOp for --enable_training_torch_interop build (#12539)
* enable PythonOp by default when --enable_training_torch_interop is enabled during build

* clean up

* fix

* fix comment

* fix

* fix tests

* fix fallback test

* pylint format

* refine based on comments
2022-08-12 00:49:30 +08:00
Baiju Meswani
3e78f3cf1f
Add win-ci pipeline for on-device training (#12513) 2022-08-10 14:45:39 -07:00
msftlincoln
0d9a02e647
Eager Mode - Support Concatenation via aten::cat.out (#12527)
* support concatenation via aten::cat.out

* wrap dims

* rename vars in tests, test wrapped dims
2022-08-09 17:16:18 -04:00
Adam Louly
2681648f5b
Load checkpoint in cpp (#12352)
* Load checkpoint in cpp

* removed unused imports

* throw error on invalid name and change function name

* inplace model assignment, change name and other comments resolved

* name change on import

* Added unit test, resolved comments

* remove unused imports

* resolved comments

* refactoring to reduce memory allocation

* resolved extra comments

* changed file hierarchy and force-added onnx model

* solved order of function argument

* used gtest macros on test cases

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-09 12:30:50 -07:00
Vincent Wang
2bed0d4abb
[CUDA] SoftmaxCrossEntropy Kernels Refactor (#12482)
* sce refactor

* refactor

* remove unnecessary memset
2022-08-09 16:48:44 +08:00
pengwa
a2dc3e9eac
Improve the compilation speed when compiling for multiple architectures. (#12490)
* improve the compilation speed when compiling for multiple architectures.

* formatting

* fix

* use 0 by default

* fix comments
2022-08-09 11:52:26 +08:00
Vincent Wang
e85e31ee80
Update ORTModule Default Opset Version to 15 (#12419)
* update ortmodule opset to 15

* update torch version

* fix ut

* fix ut

* rollback

* rollback for orttrainer
2022-08-05 16:55:04 +08:00
Baiju Meswani
a7d6290774
CUDA kernel for ClipGradNorm for TensorSeq gradients (#12412) 2022-08-04 22:28:28 -07:00
LironKesem
d452462b5e
Lironkesem/unsqueeze_and_squeeze (#12421) 2022-08-04 15:12:34 -04:00
Baiju Meswani
7f58bd7236
Perform graph transformations during offline tooling (#12422) 2022-08-03 11:27:12 -07:00
Vincent Wang
99d2a63e1a
Set Fixed Seed For SoftmaxCrossEntropyLoss Related UTs (#12432)
add seed
2022-08-03 13:29:30 +08:00
smrkatte
54d5e86981
Add cast before copy for dissimilar scalar type (#12391)
* Add proper cast/copy callflow for ORT and non-ORT devices
2022-08-02 18:32:58 -07:00
LironKesem
315e006532
adding a comment on nll_loss_forward.output that can not be implemented (#12406)
adding a comment on nll_loss_forward.output that can not be implemented
2022-08-01 19:12:35 -04:00
msftlincoln
62922f4c3c
Eager Mode generator: add comments, rename functions (#12385)
* eager generator: add comments, rename functions

* lint
2022-08-01 15:52:47 -04:00
pengwa
6d1eb9509e
Refine gradient accumulation (on device training) (#12363)
* a

(cherry picked from commit 43909cdd6e3daf30a82d584292286806d1172a0b)

* optimize inplace accumulator a bit

* fix inputs

* revert logging

* minor fix

* tune perf and resolve comments

* typo

* fix

* fix tests

* move threshold to constexpr.
2022-07-30 10:24:01 +08:00
msftlincoln
9559d25da9
ORT Eager Mode Generator - make smaller functions (#12371)
These changes resulted in no change to the generated outputs ort_aten.g.cpp and ort_customops.g.cpp.
2022-07-29 10:12:34 -04:00
pengwa
6514069749
Make memory profiler work with multiple session runs. (#12317)
* make memory profiler work with multiple session runs.

(cherry picked from commit 5b636b4dd6fe91b75c063696dc73eda33ec36c8d)

* minor fix

* fix build

* fix Windows build

* 1. fix cpplint issues;
2. give a unique file name to each session's profiler result.
2022-07-29 18:36:31 +08:00
msftlincoln
9cf6912bba
Fix ORT Eager Mode to work with Pytorch 1.12 (#12323) 2022-07-27 16:24:46 -04:00
Wil Brady
1163294699
Fixing up some python warnings. (#12319) 2022-07-27 07:24:37 -04:00
Adam Louly
f3dcbf539a
Checkpoint load inference (#12168)
* LoadCheckPoint to tensor cpp functions (draft)

* Load Checkpoint into inference model

* fix python lint

* fix python lint

* Fixing lint and some unused imports

* added assert for zero weights model, resolved other issues

* resolved issues

* Solved issues

* changed variable names for get_models

* fix mismatched parameter names

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-07-26 11:08:50 -05:00
Wil Brady
de57daaab0
Eager mode: binary ops more complete behavior and testing. (#12293)
* Remove hand written add_.Tensor as it can now be generated.

* Generate .out for tensor version of basic math ops. Add.out testing added too.

* Remove sin tests as they are covered by parameterized tests. Also, moved all parameterized tests to the end in their own section.

* Add binary ops tests for tensors. Scalar tests are calling the aten .out which is for tensor.

* Add support for scalar input to add, div, mul, and sub.
2022-07-26 09:14:57 -04:00
Vincent Wang
c40f73ae0c
Remove aten::binary_cross_entropy_with_logits from ATen Fallback (#12301) 2022-07-26 07:29:56 +08:00
Dmitri Smirnov
3bf614fd47
Eliminate memory allocations per recent profiling (#12225)
* Alloc begin

FeedsFetches refactoring
Refactor Tensor class
Fix buffer deletor
Remove new/delete deleted
Adjust alloc move
Fix up xnnpack provider
Clarifying the comment on Create()
2022-07-25 14:14:38 -07:00
Baiju Meswani
ddb45e9126
On device training CI pipeline (#11987) 2022-07-25 10:07:17 -07:00
Jameson Miller
8d0e86dec8
Apply project formatting rules to ort_aten.cpp (#12294)
* Apply project formatting rules to ort_aten.cpp

Formatting applied by formatting the file in VS Code.

This file is under active development and the inconsistent formatting
was causing friction due to:

  1. cpplint job on Pipeline was flagging a lot of style issues,
     resulting in a lot of noisy annotations.

  2. local edits would result in changes that are not part of the core change.

While there are other files in this part of the source tree with
inconsistent formatting, this file was causing the most friction. We can
come back and address the other files later, which would be a much
larger change.

* Apply consistent pattern for invoker.Invoke(...)
2022-07-25 07:26:35 -04:00
Vincent Wang
0fa3aeb65c
[CUDA] Add Strided Tensor Support for Expand->GatherElements for Training (#11976)
* strided tensor for expand and gather_elements

* bugfix

* simplify CoalesceDimensions

* resolve comments

* resolve more comments.
2022-07-25 16:05:26 +08:00
pengwa
75bda9f267
CPU AdamW implementation (#11978)
* cpu adamwoptimizer implementation

* unit tests for cpu kernel pass

* refine based on comments

* parallelize the weights loop in PrepareForCompute.

* fix wrong test data path

* fix kernel hash

* fix rocm ci pipeline
2022-07-25 09:43:52 +08:00
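
Note on the AdamW commit above (75bda9f267): the message names the optimizer but does not spell out the update rule. As a rough illustration only, the textbook AdamW step with decoupled weight decay can be sketched in plain Python as below. The function name `adamw_step`, its argument layout, and the default hyperparameters are assumptions made for this sketch; the actual ORT CPU kernel may order or scale the terms differently.

```python
import math
from typing import List


def adamw_step(params: List[float], grads: List[float],
               m: List[float], v: List[float], step: int,
               lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
               eps: float = 1e-8, weight_decay: float = 0.01) -> None:
    """Apply one in-place AdamW update (hypothetical helper, not ORT's API).

    `m` and `v` hold the running first and second moments; `step` starts at 1.
    """
    bias1 = 1.0 - beta1 ** step  # bias correction for the first moment
    bias2 = 1.0 - beta2 ** step  # bias correction for the second moment
    for i, g in enumerate(grads):
        # Decoupled weight decay: shrink the weight directly, not via the gradient.
        params[i] *= 1.0 - lr * weight_decay
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        m_hat = m[i] / bias1
        v_hat = v[i] / bias2
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
```

Each weight is updated independently of the others, which is what makes a per-weight loop like this a natural candidate for parallelization.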
Juan Paez
4f57da78cf
OrtModule fix pytorch version comparison (#12280)
* fix torch version comparison

* remove patchfile

Co-authored-by: Juan Paez <juanpaez@microsoft.com>
2022-07-22 09:11:28 -07:00
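
Note on the version-comparison commit above (4f57da78cf): the message does not say what was wrong with the comparison. One common pitfall with PyTorch version strings, shown here purely as an illustration, is comparing them as text, where "1.9.0" sorts after "1.12.0". A minimal sketch that compares numeric tuples instead follows; the helper name `version_tuple` is hypothetical and is not taken from the ORTModule source.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading 'major.minor[.patch]' of a version string into ints.

    Suffixes such as '+cu113' or '.dev2022' are ignored (an assumption of this sketch).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


# String comparison gets the order wrong; tuple comparison does not.
wrong_order = "1.9.0" < "1.12.0"
right_order = version_tuple("1.9.0") < version_tuple("1.12.0")
```

Where the `packaging` library is available, `packaging.version.Version` does the same job more thoroughly.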
pengwa
feabafe58b
Fix memory consumption discrepancy (#12266)
* release cached cuda memory after temp model_copy run

* op schema change only: remove PythonOp forward output from PythonOpGrad inputs.

* always export model using torch.no_grad

* 1. update PythonOp's "input_requires_grads" attribute according to ORT gradient graph.
2. remove PythonOp's "output_tensor_requires_grads" attribute because in torch.no_grad mode, the exported value is not correct.
3. [related to 2] remove PythonOpGrad's "input_tensor_requires_grads" because it comes from the corresponding PythonOp's "output_tensor_requires_grads".

* fix uts

* refine based on wschin's comments && fix pylint

* fix comments

* fix unused variable
2022-07-22 16:55:50 +08:00
Ashwini Khade
ceb76429db
Merge pull request #12056 from microsoft/bmeswani/merge-training_dev/on_device_poc
Merge On-Device-Training Offline Tooling and C/C++ APIs
2022-07-21 15:09:48 -07:00
Wil Brady
45c0be8a25
Modify generator for eager to use all inputs for determining promote type. (#12268)
* Sort supported types order so we get a consistently generated order of types.
* Fix promote type to include all the input types and not just the first one.
2022-07-21 17:21:10 -04:00
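
Note on the promote-type commit above (45c0be8a25): the fix described is to fold every input type into the promoted type rather than looking only at the first input. The idea can be sketched as a reduction over all inputs. The ranking table below is a deliberately simplified toy ordering and the function names are hypothetical; neither is the generator's code. In PyTorch itself the pairwise step is `torch.promote_types`.

```python
from functools import reduce
from typing import Sequence

# Toy ordering for illustration only; the real promotion rules are richer.
_RANK = {"bool": 0, "int32": 1, "int64": 2, "float16": 3, "float32": 4, "float64": 5}


def promote_two(a: str, b: str) -> str:
    """Return the higher-ranked of two type names."""
    return a if _RANK[a] >= _RANK[b] else b


def promote_all(types: Sequence[str]) -> str:
    """Fold the pairwise promotion over every input type, not just the first."""
    return reduce(promote_two, types)
```

With inputs `["int32", "float32", "int64"]`, looking only at the first input would pick `int32`, while folding over all of them picks `float32`.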
Baiju Meswani
cbf08c7a7b
Make GetTrainingApi as a part of the OrtApis, add Training API documentation and address other pull request review comments 2022-07-21 18:11:48 +00:00
LironKesem
7dc45bc311
Implementing aten::gt.Scalar_out and aten::lt.Scalar_out (#12181)
* Implementing aten::gt.Scalar_out and aten::lt.Scalar_out

* modified the code according to code review
2022-07-21 10:36:43 -04:00
msftlincoln
424120d0fa
cpplint & Eager mode: refactor and add comments to empty_* functions, general lint cleanup in ort_aten (#12238)
* empty* comments and code reuse

* lint

* more cpplint

* add cpplint settings

* test empty
2022-07-20 11:47:57 -04:00
Vincent Wang
72c689a502
[CUDA] Use dim3.z to Handle Large Input For GatherGrad (#12250)
* use dim3.z to handle large input size

* less blocks
2022-07-20 18:42:52 +08:00
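
Note on the GatherGrad commit above (72c689a502): CUDA caps gridDim.y and gridDim.z at 65535 blocks each, so a launch that needs more blocks than that along one axis has to spread them over a second axis. The arithmetic can be sketched as below. This is an assumption about the general technique, not a transcription of the kernel; the name `split_blocks_yz` and the choice to spill from y into z are hypothetical.

```python
from typing import Tuple

MAX_GRID_YZ = 65535  # CUDA limit for gridDim.y and gridDim.z


def split_blocks_yz(num_blocks: int) -> Tuple[int, int]:
    """Split `num_blocks` into (grid_y, grid_z), each within the CUDA limit.

    grid_y * grid_z may exceed num_blocks, so a kernel using this layout
    still needs a bounds check on the flattened block index.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be positive")
    grid_z = -(-num_blocks // MAX_GRID_YZ)  # ceiling division
    if grid_z > MAX_GRID_YZ:
        raise ValueError("too many blocks for a y/z split")
    grid_y = -(-num_blocks // grid_z)       # ceiling division
    return grid_y, grid_z
```

Inside a kernel launched this way, the flattened block index would be recovered as `blockIdx.z * gridDim.y + blockIdx.y`.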
pengwa
ebfd81e67e
Fix BiasGeluGrad bug (#12200)
* use 3D grid to avoid the upper limit of grid dimension

* enrich tests

* Revert "use 3D grid to avoid the upper limit of grid dimension"

This reverts commit 2d5badf2fe8cd985f3f29ee2cb18fff13d07c2ab.

* change to a fix: switch the 1st and 2nd dim
2022-07-20 17:59:29 +08:00
Vincent Wang
3cdc6d7775
[ORTModule] Bugfix of torch.chunk's Custom Symbolic when chunks==1 (#12249)
handle custom chunk with chunks==1
2022-07-20 17:00:41 +08:00
Juan Paez
9b6ef17c5f
Eager opgen support for in-place operations with variadic args (#12125)
* use torch library binding frontend for tensorlist

* fix test

* allow in-place modification of variadic args

* fix lint issues

* update ORT eager readme

Co-authored-by: Juan Paez <juanpaez@microsoft.com>
2022-07-19 21:01:00 -07:00
Jameson Miller
975bb56e8c
Eager mode - argmax_out: set output tensor (#12233)
This change updates the implementation of the argmax_out operator to 1)
set the output tensor correctly and 2) remove the unnecessary use of a
temporary tensor to store intermediate result of onnx ArgMax operation.

Previously, the argmax_out operator did not correctly update the out
tensor - it replaced the OrtValue instead of the memory backing the
OrtValue. To properly update the output tensor, we need to calculate
the expected shape of the out tensor.

We add the helper function calculate_reduction_shape to calculate the
shape of the reduced tensor from the input tensor, dimension to reduce,
and option to keep the reduced dimension or not. This is based on the
utility functions in aten/src/ATen/native/ReduceOpsUtils.h in the
PyTorch repository, but is tailored to be a bit more specific to our
current needs.

Notes:

We considered just directly leveraging PyTorch's utility functions (e.g.
get_reduction_shape) to calculate the shape of the reduced tensor from
aten/src/ATen/native/ReduceOpsUtils.h in the PyTorch repository, but
including this header file resulted in warnings around unused functions
that we need to handle. As we only need a limited functionality at the
moment, we instead implemented our own utility function to calculate the
reduction shape for our specific current needs. If we need a utility
function to more generally calculate the reduction shape, we could
consider switching to leveraging the utility methods in PyTorch.
2022-07-19 14:37:03 -04:00
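
Note on the argmax_out commit above (975bb56e8c): the message describes a helper, calculate_reduction_shape, that derives the reduced shape from the input shape, the dimension to reduce, and the keep-dimension option. The real helper lives in the C++ eager-mode sources; the Python version below is only a sketch of the logic as described, and its signature is an assumption. It does not handle 0-d tensors or the case where no dimension is given.

```python
from typing import List, Sequence


def calculate_reduction_shape(shape: Sequence[int], dim: int, keepdim: bool) -> List[int]:
    """Return the shape left after reducing `shape` along `dim`."""
    ndim = len(shape)
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} is out of range for a {ndim}-d shape")
    dim %= ndim  # wrap negative dims, so -1 means the last axis
    result = list(shape)
    if keepdim:
        result[dim] = 1  # the reduced axis is kept with length 1
    else:
        del result[dim]  # the reduced axis is dropped
    return result
```

Knowing this shape up front is what lets the out tensor be resized and written in place instead of being replaced.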
Wil Brady
4235ebc161
Add eager mode support for mm.out (matrix multiplication). (#12214)
* Add eager mode support for mm.out (matrix multiplication).

* Fallback to cpu when mm requirements not met so cpu can print error message.
2022-07-19 07:28:48 -04:00
Michael Melesse
bb5bd08545
[ROCM] Navi21 fixes pr (#11368)
* add scripts

* update docker scripts

* update build script

* create run script

* add test script

* add log 3 flags

* use the right build function

* build navi

* add clean script

* add pytorch like soln

* only build gfx 1030

* use HOST side var

* ignore logs

* update scripts

* GPU_WARP_SIZE_HOST

* update scripts

* remove scripts/amd

* match main

* add GPU_WARP_SIZE_HOST on cuda side

* match main

* correct gfx1030

* remove print

* move gfx add to rocm5.0

* remove inline

* make constexpr on cuda side
2022-07-18 22:26:57 -07:00
Vincent Wang
173bcdbc71
[CUDA] Split/Concat Kernel Optimization (#12175)
* split concat optimization

* bugfix

* fix ut

* deprecate LooseVersion
2022-07-19 08:10:46 +08:00
msftlincoln
52095fb042
Fix line spacing/break issue, extend existing tests (#12191)
* fix line length

* extend test cases

* lint
2022-07-15 19:32:34 -04:00
msftlincoln
a2dc6d32fc
OnnxRuntime Eager: Implement log_softmax with ONNX Ops (#12190)
* share CHECK_STATUS

* log_softmax
2022-07-15 15:03:08 -04:00
msftlincoln
9bca8405aa
bitwise_and ONNX support (#12189)
* bitwise_and ONNX support

* whitespace lint
2022-07-15 12:59:56 -04:00
Wil Brady
89bf6c9b5d
Simple eager training models (#12180)
* Simple NN using ort, and added or modified ort op support.
2022-07-15 09:18:00 -04:00
msftlincoln
fafb24142f
add comment to explain local scalar dense (#12179)
* add comment to explain local scalar dense

* spacing
2022-07-15 09:03:43 -04:00
Wil Brady
9ebef91a6f
Update eager Readme.md (#12170) 2022-07-14 06:05:50 -04:00
PeixuanZuo
7b53b223b8
[UPDATE] update AMD CI pipeline to Rocm5.2 with torch1.11 (#12162)
* [UPDATE] update ci to rocm5.2 + torch1.11

* [Revert] disable ort module test

* [DELETE] delete Rocm5.1.1 ci test result

* [UPDATE] update the comments
2022-07-14 16:38:16 +08:00
Vincent Wang
a7eb9fe3ac
Remove Apex Dependency For Deepspeed FP16_Optimizer (#12077)
* remove apex dependency

* fix amd build
2022-07-14 11:15:53 +08:00