Commit graph

132 commits

Author SHA1 Message Date
suffiank
7f5339505e
Discover trainable parameters using reverse DFS from loss node (#4116)
Discover trainable parameters using reverse DFS from loss node, omitting recursion along untrainable inputs.

Co-authored-by: suffian khan <sukha@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: suffian khan <sukha@microsoft.com>
2020-06-08 14:16:10 -07:00
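The reverse DFS named in the commit above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the graph encoding (a dict mapping each value name to the names of its inputs) and the names `discover_trainable`, `producers`, `initializers`, and `untrainable` are assumptions made for the example.

```python
from typing import Dict, List, Set


def discover_trainable(
    producers: Dict[str, List[str]],
    initializers: Set[str],
    untrainable: Set[str],
    loss: str,
) -> Set[str]:
    """Walk backward from `loss` and collect the initializers that feed it."""
    found: Set[str] = set()
    visited: Set[str] = set()
    stack = [loss]
    while stack:
        name = stack.pop()
        if name in visited or name in untrainable:
            # Do not recurse along untrainable inputs.
            continue
        visited.add(name)
        if name in initializers:
            found.add(name)
            continue
        # Follow the edges in reverse: from a value to the inputs that produce it.
        stack.extend(producers.get(name, []))
    return found


# Hypothetical toy graph: value name -> names of the inputs it is computed from.
graph = {
    "loss": ["logits", "labels"],
    "logits": ["hidden", "w2"],
    "hidden": ["x", "w1"],
    "unused_out": ["w3"],
}
weights = {"w1", "w2", "w3"}
params = discover_trainable(graph, weights, {"labels", "x"}, "loss")
print(sorted(params))
```

In this toy graph `w3` is left out because it never reaches the loss node, and marking `hidden` as untrainable would also drop `w1`, since the walk stops at that input.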
Sergii Dymchenko
653417ae4b
Fix scaler->scalar typo. (#4142) 2020-06-08 13:02:12 -07:00
Dmitri Smirnov
4e1dac67cd
Address memory leak and improve memory handling (#4124)
Fix memory leak when a Python list passed as a feed.
  Create a custom allocator that can take ownership of python
  arrays that are created inside pybind.
  Allow direct memory use if continuous array is a copy because
  we now can take ownership of it by the allocator.
2020-06-08 09:29:46 -07:00
liqunfu
ffed43e9b8
handle loss and name marching wrappers (#4066)
* handle loss and name marching wrappers

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-06-05 23:34:26 -07:00
Bowen Bao
1e5307d458
Bug fix for parameter names of models not using wrapper (#4061)
* bug fix for models not using wrapper

* add test case for no wrapper case

* update test case to use internal learning rate

* fix bug with frozen weight update
2020-06-05 12:03:38 -07:00
Thiago Crepaldi
81101c9efd
Fix DropoutGrad op (#4052)
Dropout op was recently changed to accept a new input named
'training_mode', which is passed in to DropoutGrad automatically.

This PR updates the DropoutGrad schema to accommodate the new input.
Tests were also updated to reflect the API change.

Co-authored-by: Thiago Crepaldi <thiag.crepaldi@microsoft.com>
2020-06-04 15:00:02 -07:00
liqunfu
905c535626
still need to make the test stable. Lower the acc number a bit to make the test pass for now (#4117)
Co-authored-by: liqun fu <liqun@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-06-02 21:37:48 -07:00
ashbhandare
f18a99b245
Exclude non-trainable torch buffers from trainable weights (#4099)
* Initial changes

* Removed redundant fix

* Revert unintended formatting change.

* Add unit test
2020-06-02 14:05:44 -07:00
edgchen1
ba74914c5a
Remove evaluation output from training e2e test baseline data. (#4092) 2020-06-01 15:06:21 -07:00
ytaous
72d508b7a0
New perf metric - e2e throughput (#4085)
* new metric

* on comments

* tab to spaces

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-01 12:11:34 -07:00
Tixxx
6404aba5ae
Orttraining rc1 master merge (#4080)
* fixed seg fault when using concrete shape
disable gradient as output

* fix evaluation hang issue for multiple gpu run

* Remove dead code, ORTModel and improve docstrings (#3814)

* Refine ORTTrainer docstring descriptions (#3907)
2020-05-29 12:28:12 -07:00
Wei-Sheng Chin
e951b29a0b
Fix a macro and memory regression (#4068)
onnxruntime_training_bert can run the following command again.

./onnxruntime_training_bert --model_name=bert-large-uncased_L_24_H_1024_A_16_V_30528_S_512_Dp_0.1_optimized_layer_norm --num_train_steps=16 --train_batch_size=52 --mode=train --train_data_dir=/bert_data/128/books_wiki_en_corpus/train --test_data_dir=/bert_data/128/books_wiki_en_corpus/test --gradient_accumulation_steps=16 --optimizer=Lamb --learning_rate=3e-3 --max_seq_length=128 --max_predictions_per_seq=20 --warmup_ratio=0.2843 --warmup_mode=Poly --display_loss_steps=100  --use_mixed_precision=True --allreduce_in_fp16 --use_nccl
2020-05-29 09:24:40 -07:00
edgchen1
38d76cc904
Clean up training E2E test (#4078)
Update training E2E build to not go through CTest and call test scripts directly.
2020-05-29 09:20:47 -07:00
pengwa
6d03470587
Add e2e measurement for training (#4049)
* add e2e measurement
2020-05-29 10:08:29 +08:00
liqunfu
6665d5e2bc
Liqun/a transformer example (#3845)
Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-05-27 15:21:35 -07:00
Xueyun Zhu
633008b5ef
Add pipeline online partition logic for pipeline (#3996)
* online partition

* fix when multiple consumer nodes are in cut info

* fix windows build

* address feedback

* adding test

* feedback

* address feedback

* add parser for cut edge

* windows build
2020-05-26 17:44:09 -07:00
Wei-Sheng Chin
24eda3df33
Create Utils for Adding Range and Marker (#4013)
In this PR, we
  1. create some APIs for creating NVTX objects
  2. apply those APIs in pipeline-related operators and sequential executor.
As a result, we can explicitly see how a pipeline schedule is run by GPUs in 
Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's
limited support.
2020-05-24 22:55:24 -07:00
Bowen Bao
0a5395bb78
Remove 'model_.' prefix from onnx model initializers in training (#3881)
* Remove 'model_.' prefix for onnx model initializers in training

* fix test case remove redundant device test

* rename

* Fix state_dict/load_state_dict with frozen_weight

* nit

* Add monkey patch for pt opset 10

* remove pt patch in CI

* nit: newline
2020-05-20 10:06:31 -07:00
ytaous
fb4efafc8e
GPT-2 training perf scripts (#3974)
* gpt2 training perf

* gpt2 training perf

* debug

* debug

* debug

* fix bug

* minor

* on comments

* dynamic sql

* fix build

* minor

* linked hash

* on comments

* minor

* mem

* minor

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-19 10:21:40 -07:00
Faith Xu
b8a255e1b5
Doc Updates for Build (#3976)
* Initial update of readme

* Readme updates

* Review of consolidated README (#3930)

* Proposed updates for readme (#3953)

I found some of the information was duplicated within the doc, so attempted to streamline

* Fix links

* More updates

- fix build instructions
- nodejs doc reorganization
- roadmap update
- version fixes

* Update ORT Server build instructions

* More doc cleanup

* fix python dev notes name

* Update nodejs and some links

* sync eigen version back to master

* Minor fixes

* add nodejs to sample table of contents

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* address PR feedback

* address PR feedback

* nodejs build instruction

* Update Java instructions to include gradle

* Roadmap refresh

Reformat some data, fix link, minor rewording

* Clarify Visual C++ runtime req

Co-authored-by: Nat Kershaw (MSFT) <nakersha@microsoft.com>
Co-authored-by: Prasanth Pulavarthi <prasantp@microsoft.com>
Co-authored-by: manashgoswami <magoswam@microsoft.com>
2020-05-18 20:08:36 -07:00
M. Zeeshan Siddiqui
44731e88bb
Add comments for zero valued normalization factor in SoftmaxCrossEntropyLossGrad CUDA kernel. (#3972) 2020-05-18 09:08:09 -07:00
Wei-Sheng Chin
0d11649bb3
Address comments from #3823 and polish code (#3964)
* Address comments from #3823 and polish code

* One line
2020-05-17 14:08:33 -07:00
M. Zeeshan Siddiqui
a296b16719
Prevent divide by zero in CUDA implementation of SoftmaxCrossEntropyLossGrad. (#3962) 2020-05-16 00:33:25 -07:00
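A minimal sketch of the kind of guard this commit title describes, written in plain Python rather than CUDA. It assumes a mean reduction in which the normalization factor is the number of non-ignored labels, so the factor is zero when every label is ignored. The function name, the `ignore_index` convention, and the fallback divisor of 1 are assumptions for illustration, not the kernel's actual code.

```python
import math
from typing import List


def softmax_ce_grad_mean(
    logits: List[List[float]], labels: List[int], ignore_index: int = -1
) -> List[List[float]]:
    """Gradient of mean softmax cross-entropy w.r.t. logits, skipping ignored rows."""
    count = sum(1 for y in labels if y != ignore_index)
    # Guard: if every label is ignored the normalization factor would be 0.
    # Dividing by 1 instead is safe because every gradient row is zero anyway.
    norm = count if count > 0 else 1
    grads = []
    for row, y in zip(logits, labels):
        if y == ignore_index:
            grads.append([0.0] * len(row))
            continue
        peak = max(row)  # subtract the max before exponentiating for stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        grads.append(
            [(e / total - (1.0 if j == y else 0.0)) / norm for j, e in enumerate(exps)]
        )
    return grads


all_ignored = softmax_ce_grad_mean([[1.0, 2.0], [3.0, 4.0]], [-1, -1])
one_sample = softmax_ce_grad_mean([[0.0, 0.0]], [0])
```

Without the guard, the all-ignored case would raise `ZeroDivisionError` in Python; in a GPU kernel the same division would typically produce NaN or Inf values instead.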
Wei-Sheng Chin
33208c9f6b
Modify Pipeline Facilities to Fix PipeDream Deadlock (#3823)
* Prepare utils for adding Wait's and Record's

* Have a running PipeDream

* Add comments

* Polish comments

* Clean code

* Fix test

* Polish names

* Polish names

* Remove debug headers

* Fix a shape inference bug (not related to pipeline code)

* Fix a warning

* Address some comments

* Address comments

* Only touch consumers of outputs when re-wire edges
2020-05-15 18:27:19 -07:00
ytaous
bc441b7e5c
Add cpu/mem usage for perf metrics (#3947)
* add cpu/mem usage

* on comments

* on comments

* renaming

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-15 12:29:40 -07:00
ytaous
93eb9bcfde
Add yaml/perf scripts for new perf test pipeline (#3909)
* yaml/perf scripts for new pipeline

* yaml/perf scripts for new pipeline

* remove unused imports

* testing some comments change

* testing some comments change

* testing jdbc

* testing jdbc

* testing jdbc

* exclude pwd from jdbc properties

* exclude pwd from jdbc properties

* namedtuple

* on comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-13 14:15:17 -07:00
Bowen Bao
0f82b42fed
Ensure pt model is set to cpu in ort_trainer (#3867)
* Ensure pt model is set to cpu in ort_trainer

* add note comment
2020-05-12 13:32:27 -07:00
Thiago Crepaldi
70abb120b3
Remove ORTModel from frontend API (#3825)
* Resolve conflict

* Address review
2020-05-11 18:20:33 -07:00
M. Zeeshan Siddiqui
c46a9e8d65
Add numerical stability to SoftmaxGrad test inputs. (#3857)
* Increase the tolerance for SoftmaxGrad CPU-GPU compare tests.

* Increase the tolerance for SoftmaxGrad CPU-GPU compare tests.

* Add 1e-2 to Y for numerical stability.

* build break.

* comments.

* PR feedback.

* PR feedback.
2020-05-11 17:59:24 -07:00
ytaous
96030fdcbc
dashboard integration - output training perf metrics as json (#3809)
* dashboard integration - first phase

* change a field

* perf scripts

* addressing PR comments

* address comments and fix build

* minor

* make GetConfigFromData() const

* more update for comments

* addressing comments

* more on addressing comments

* minor

* fix build

* add condition check

* more on comments

* return status

* remove batch size

* on comments

* rename pkg path

* rename pkg path

* additional comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-10 10:29:38 -07:00
M. Zeeshan Siddiqui
eb33d5eda9
Do not register Dropout(12) as training ONLY kernel. (#3859)
* Do not register Dropout(12) as training ONLY kernel.

* Move Dropout forward implementation in inference project.

* fix inference build test failures.

* remove fp16 test since its support is absent on CPU.

* build break.
2020-05-09 21:38:17 -07:00
Vincent Wang
3c24841569
Fold Shape Node During Constant Folding (#3748)
* Fold Shape node in constant folding.

* bugfix

* Fix test failure.

* Bugfix for C++ frontend.

* Bugfix for C++ frontend.

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-05-09 20:15:03 +08:00
ashbhandare
424a00bf04
Fix enabling gradient as output for easy mode. (#3866) 2020-05-07 15:07:14 -07:00
Wei-Sheng Chin
0aeb383273
Support Pipeline in Training Runner (#3770) 2020-05-06 21:03:36 -07:00
Xueyun Zhu
0e59668c1b
add support for symbolic broadcast for Add/Sub/Mul (#3743)
* add support for symbolic broadcast

* fix comment

* address feedback
2020-05-06 10:40:57 -07:00
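As a rough illustration of what symbolic broadcast means for elementwise ops such as Add/Sub/Mul, the sketch below infers an output shape when some dimensions are known integers and others are symbolic names. The representation (strings for symbolic dimensions) and the function `broadcast_shapes` are assumptions for this example and are not taken from the PR.

```python
from typing import List, Union

Dim = Union[int, str]  # int = known size, str = symbolic name such as "batch"


def broadcast_shapes(a: List[Dim], b: List[Dim]) -> List[Dim]:
    """Infer the broadcast output shape of two possibly symbolic shapes."""
    # Right-align the shapes by padding the shorter one with leading 1s.
    rank = max(len(a), len(b))
    a = [1] * (rank - len(a)) + list(a)
    b = [1] * (rank - len(b)) + list(b)
    out: List[Dim] = []
    for da, db in zip(a, b):
        if da == db:
            out.append(da)  # same size or same symbol
        elif da == 1:
            out.append(db)  # a is stretched along this axis
        elif db == 1:
            out.append(da)  # b is stretched along this axis
        else:
            # Two different symbols, or a symbol against a size other than 1,
            # cannot be proven compatible in this conservative sketch.
            raise ValueError(f"cannot broadcast {da!r} with {db!r}")
    return out


shape = broadcast_shapes(["batch", 1, 128], ["seq", 128])
print(shape)
```

The sketch is deliberately conservative: it rejects any pair of dimensions it cannot prove equal, which a real shape-inference pass may instead defer to run time.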
Bowen Bao
f7ff5a7aa1
Fix state_dict and save_as_onnx for training (#3774) 2020-05-05 11:47:46 -07:00
Changming Sun
bd78364411
Parallel all the activations ops (#3722)
1. Parallelize all the activation ops.
2. Parallelize the performance-critical path of the LRN op, which makes the ONNX model zoo googlenet model run 60% faster (latency reduced from 21ms to 13ms).
3. Make the Gemm-Activation fusion support all the activation ops. Before this change, it only supported LeakyRelu/Relu/Sigmoid/Tanh.
4. Delete onnxruntime/test/framework/op_kernel_test.cc because the file is almost empty.
5. Remove the logging in KernelRegistry::TryFindKernel and return a Status with an error message instead.
2020-05-05 01:18:17 -07:00
M. Zeeshan Siddiqui
a24c71af40
Update Dropout(12) forward kernel with training_mode input. (#3805)
* Update Dropout(12) forward and backward kernel with training_mode input.

* Revert deleted assert.

* clean up.

* PR feedback.
2020-05-04 20:05:42 -07:00
M. Zeeshan Siddiqui
6f95cdfa68
Use new cost based threadpool abstractions in CPU gradient operators. (#3807)
* Use ThreadPool abstractions instead of OpenMP.

* PR feedback.
2020-05-04 15:23:10 -07:00
Sherlock
2f8a2364c3
Fix loss function builder (#3801) 2020-05-04 10:41:15 -07:00
M. Zeeshan Siddiqui
4f9f6aedea
CUDA/CPU test for NegativeLogLikelihoodLoss(12) function based loss operator. (#3793) 2020-05-01 21:36:29 -07:00
Xueyun Zhu
e8e95110d3
add pipeline to distributed context config (#3789)
* add pipeline to distributed context

* white space
2020-05-01 13:49:51 -07:00
edgchen1
047975e404
Address flaky test ReduceApiTest.Sum. (#3716)
Increase test comparison tolerance. Add output of random seed value for easier debugging later. Unify RandomValueGenerator::Uniform() to consistently use [min, max) interval.
2020-05-01 09:18:26 -07:00
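The two ideas in this commit message, a half-open [min, max) sampling interval and printing the random seed for later debugging, can be sketched in Python as below. This is not the C++ `RandomValueGenerator` from the repository; `uniform_half_open` and the seed value are hypothetical.

```python
import random


def uniform_half_open(rng: random.Random, lo: float, hi: float) -> float:
    """Return a float in [lo, hi); hi itself is never returned."""
    if not lo < hi:
        raise ValueError("require lo < hi")
    while True:
        value = lo + (hi - lo) * rng.random()
        # Floating-point rounding can land exactly on hi; redraw if it does.
        if value < hi:
            return value


seed = 12345  # printed so that a failing run can be reproduced later
print("random seed:", seed)
rng = random.Random(seed)
samples = [uniform_half_open(rng, -1.0, 1.0) for _ in range(1000)]
```

Re-creating `random.Random` with the printed seed reproduces the same sequence of samples, which is what makes a flaky failure debuggable after the fact.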
pengwa
98b97be635
collect the last few iteration latencies for throughput calculation (#3766) 2020-05-01 13:24:17 +08:00
liqunfu
af3988198c
Liqun/e2e transformer test (#3540)
* initial change to transformer.py

* prepare e2e transformer tests

* refactor transformer tests

* put test python files in a flat folder

* fix typo pip install transform(s)

* python 3.6

* python version to 3.6 in install_ubuntu.sh

* remove argparser

* to use opset ver 12

* workaround loss_scale naming patch in case of loss_fn_

* assign self.loss_fn_ so it can be checked

* skip a few un-needed post-process steps

* fix loss_scale_input_name, clean up post process steps

* skip non-frontend tests

* move cpu/cuda related files to corresponding cpu/cuda folder (#3668)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* type cast for ratio is not necessary for dropout (#3682)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* thrustallocator is not needed since cub is used directly for gather now. (#3683)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* GatherND-12 Implementation (#3645)

* Renamed, UT passing

* Move GatherND CUDA Kernel into onnxruntime

* Merge GatherNDOpTest

* Refactor Test code

* Merge CPU Kernel Impl

* Handle Negative Indices, Fix UT

* Improve CUDA kernel to handle negative index

* Minor Fixes

* Preserve GatherND-1 Cuda kernel

* Fix Mac build

* fix UT

* Fix Build

* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>

* update with reviewers' comments

* testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference

* fix merge mistakes

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
2020-04-30 12:26:38 -07:00
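The note in this commit about testBertTrainingGradientAccumulation failing on differences around e-06 when no rtol is used can be illustrated with a small comparison helper. The helper name, the tolerance values, and the sample numbers are assumptions for the example; the actual test in the repository may use a different comparison utility.

```python
import math
from typing import Sequence


def all_close(
    actual: Sequence[float],
    expected: Sequence[float],
    rtol: float = 1e-5,
    atol: float = 1e-8,
) -> bool:
    """True if every pair of values agrees within a relative or absolute tolerance."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=rtol, abs_tol=atol)
        for a, e in zip(actual, expected)
    )


expected_loss = [10.123456, 9.876543]
actual_loss = [10.123458, 9.876541]  # differs by about 2e-06 per element

exact_match = actual_loss == expected_loss
tolerant_match = all_close(actual_loss, expected_loss, rtol=1e-5)
```

An exact comparison rejects these values, while a relative tolerance of 1e-5 accepts them, so run-to-run floating-point noise of this size no longer fails the test.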
M. Zeeshan Siddiqui
b9a5ed1fe2
Add SoftmaxCrossEntropyLoss to mixed-precision-transformer. (#3760) 2020-04-30 02:48:21 -07:00
pengwa
0531acccc5
Refine GatherND CPU/CUDA Kernels & Add UTs (#3688)
* Refactor GatherND CPU Kernel (Renaming & Simplify)

* Add batch_dim=1 or 2, negative slices tests

* Rename gather_nd_gard_impl.cu

* Use dispatcher to refactor CUDA GatherND/GatherNDGrad

* Change GatherNDBase::CommonComputeKernel --> GatherNDBase::PrepareCompute

* Use HasCudaEnvironment instead of __CUDA_ARCH__ for some double type tests
2020-04-30 10:17:54 +08:00
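For readers unfamiliar with the negative-index handling mentioned in the GatherND commits, here is a small pure-Python sketch for the batch_dims=0 case on nested lists. It only illustrates the indexing rule (a negative index counts back from the end of its axis); it is not the CPU or CUDA kernel, and the function name is hypothetical.

```python
from typing import Any, List, Sequence


def gather_nd(data: Any, indices: Sequence[Sequence[int]]) -> List[Any]:
    """For each index tuple, return the element or slice of nested-list `data`."""
    result = []
    for index in indices:
        item = data
        for i in index:
            size = len(item)
            if not -size <= i < size:
                raise IndexError(f"index {i} out of range for axis of size {size}")
            if i < 0:
                i += size  # normalize a negative index to its non-negative form
            item = item[i]
        result.append(item)
    return result


data = [[0, 1], [2, 3]]
positive = gather_nd(data, [[0, 0], [1, 1]])
negative = gather_nd(data, [[-1, -1], [-2, 1]])
row_slice = gather_nd(data, [[-1]])
```

An index tuple shorter than the rank of `data` selects a whole slice, which is why `[[-1]]` returns the last row rather than a single element.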
ashbhandare
58f53966d3
Add Distributed Checkpointing support (#3639)
* Change naming of moments to Moment_x_<weight_name>

* Add checkpointing code and zero checkpoint aggregation

* Correct aggregation for LAMB, cleanup

* Add simple checkpointing test

* Add test for zero checkpoint aggregation

* Fix tests

* fix test

* Review changes

* Fix test after review comment fix

* Fix API, test

* Fix test after API change

* Decouple save load from ORTTrainer

* Add flag to not break checkpointing with ORTModel

Co-authored-by: aishwarya bhandare <aibhanda@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-04-29 14:52:21 -07:00
suffiank
ea0e2d1dde
fix warning treated as error due to ignoring return status (#3739)
Co-authored-by: suffian khan <sukha@microsoft.com>
2020-04-29 02:38:53 -07:00
Tixxx
0638565fe0
Fix evaluation issues (#3538)
* allow switching between eval and training modes dynamically

Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>
2020-04-28 21:03:37 -07:00