20 commits

Author SHA1 Message Date
Sherlock
60dbd8a1e5
Update maximum batch size for UT; Include recompute modes (#5444)
* Update MaxBatchSize and include recompute mode
* Minor fix for frontend test

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-12 14:50:43 -07:00
Sherlock
37445d1198
Update Bert Perf Script (#5339)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-30 14:30:20 -07:00
Vincent Wang
7fb194d03d
Update convergence baseline for ci_test. (#4465)
Co-authored-by: Vincent Wang <weicwang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-07-09 15:29:36 +08:00
ytaous
4380b8ba68
adjust bs size (#4375)
Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-30 10:29:48 -07:00
edgchen1
63bf587623
Use azcopy to download test data (#4221)
Use azcopy from download_e2e_test_data.py, add helper function for downloading azcopy.
Update download_test_data.py to use helper function.
2020-06-16 10:14:34 -07:00
ytaous
5d28efd434
opset12 code cleanup (#4242)
* opset12 code cleanup

* opset12 code cleanup

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-15 19:45:35 -07:00
ytaous
e0334f177c
Opset12 upgrade for existing models used by perf/e2e pipelines (#4238)
* opset12 support

* opset12 support

* on comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-15 14:26:53 -07:00
edgchen1
ba74914c5a
Remove evaluation output from training e2e test baseline data. (#4092)
2020-06-01 15:06:21 -07:00
ytaous
72d508b7a0
New perf metric - e2e throughput (#4085)
* new metric

* on comments

* tab to spaces

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-01 12:11:34 -07:00
edgchen1
38d76cc904
Clean up training E2E test (#4078)
Update training E2E build to not go through CTest and call test scripts directly.
2020-05-29 09:20:47 -07:00
ytaous
fb4efafc8e
GPT-2 training perf scripts (#3974)
* gpt2 training perf

* gpt2 training perf

* debug

* debug

* debug

* fix bug

* minor

* on comments

* dynamic sql

* fix build

* minor

* linked hash

* on comments

* minor

* mem

* minor

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-19 10:21:40 -07:00
ytaous
93eb9bcfde
Add yaml/perf scripts for new perf test pipeline (#3909)
* yaml/perf scripts for new pipeline

* yaml/perf scripts for new pipeline

* remove unused imports

* testing some comments change

* testing some comments change

* testing jdbc

* testing jdbc

* testing jdbc

* exclude pwd from jdbc properties

* exclude pwd from jdbc properties

* namedtuple

* on comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-13 14:15:17 -07:00
Xueyun Zhu
f1ba9aaf34
Add pipeline transformer for wait/record node (#3513)
* pipeline transformer

* clean up

* address feedback

* add record/wait for first stage and updated split script

* address feedback

* make recv/send signal as initializer

* merge

* address feedback

* unify input and initializer

* address feedback and bug fix

* minor fix

* windows build

* fix
2020-04-22 23:28:01 -07:00
pengwa
2c7c45076b
MaxBatchSize E2E Test (#3454)
* max batch size e2e test

* update test data snapshot
2020-04-15 09:50:44 +08:00
Thiago Crepaldi
15e32b44fd
Merge pull request #3383
Merge from master into ort_training
2020-04-06 19:05:01 -07:00
Edward Chen
95707d22a5
Disable gradient clipping for E2E test.
2020-04-06 23:07:28 +00:00
Xueyun Zhu
efc8bd738f
add pipeline graph split script (#3275)
* pipeline graph cut

* add element type

* add input wait event and shape info

* shape inference

* support multiple cuts

* format script

* address feedback

* address feedback
2020-03-31 19:30:18 -07:00
edgchen1
d9f628cb1d
Remove orttraining/tools/scripts/profile directory. (#3268)
2020-03-19 14:13:05 -07:00
Jesse Benson
3a7539e071
Update bert-base convergence values
2020-03-13 23:03:34 -07:00
Edward Chen
e542cfd0e0
Introduce training changes.
2020-03-11 14:39:03 -07:00