Commit graph

2237 commits

Author SHA1 Message Date
suffiank
0e12d05cd2
fixes for ort_trainer.py to resume from checkpoint (#3510)
* fixes for ort_trainer.py to resume from checkpoint

* define self.state_dict_ during init

* add comment of explanation

* add unit test for restore from checkpoint

* fix file not found

Co-authored-by: suffian khan <sukha@microsoft.com>
2020-04-22 16:33:58 -07:00
Weixing Zhang
e4fc83252d
Refactoring code related to WARP_SIZE. (#3623)
1. Centralize its definition in common.cuh.
2. Rename it to GPU_WARP_SIZE which can be extended to AMD GPU later.
3. Centralize warp shuffle functions.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-04-22 15:19:06 -07:00
edgchen1
bb9b0ba5b3
Merge pull request #3607 from microsoft/edgchen1/merge_from_master
Merge from master to ort_training
2020-04-22 13:22:32 -07:00
Wei-Sheng Chin
ab70625b29
Add Lamb shape inference (#3634) 2020-04-22 11:32:28 -07:00
Edward Chen
8df5076d96 Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master 2020-04-22 17:16:00 +00:00
Edward Chen
8d09cefafc Merge remote-tracking branch 'origin/ort_training' into edgchen1/merge_from_master 2020-04-22 16:56:15 +00:00
edgchen1
b518cb2a7a
Clean up OPTIONAL name conflict workarounds in ort_training. (#3622)
* Clean up OPTIONAL name conflict workarounds.

* Clean up unnecessary header file onnx_protobuf.h

Co-authored-by: Sherlock Huang
2020-04-22 09:07:55 -07:00
Vincent Wang
d3a2ac5c5c
Eliminate Useless Cast during Transformer. (#3606)
* Remove Useless Cast during Transformer.

* Resolve comments.

* Check if graph can remove the node.

Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-04-22 16:36:46 +08:00
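
The idea behind this change can be illustrated with a toy example. The sketch below is not the onnxruntime transformer; it assumes a hypothetical graph representation (a topologically sorted list of dicts, a `value_types` map, and a set of graph output names) and treats a Cast as useless when its input already has the target type. The check against graph outputs is one plausible reading of "Check if graph can remove the node."

```python
def remove_useless_casts(nodes, value_types, graph_outputs):
    """Drop Cast nodes whose input already has the requested type.

    nodes: topologically sorted list of dicts with keys
           "op", "inputs", "output" and, for Cast, "to" (hypothetical layout).
    value_types: maps a value name to its element type.
    graph_outputs: names that must keep existing as graph outputs.
    """
    rename = {}  # removed Cast output -> the value that replaces it
    kept = []
    for node in nodes:
        # Rewire inputs that pointed at an already removed Cast.
        inputs = [rename.get(name, name) for name in node["inputs"]]
        is_useless = (
            node["op"] == "Cast"
            and value_types.get(inputs[0]) == node["to"]
            # Keep the node if removing it would drop a graph output name.
            and node["output"] not in graph_outputs
        )
        if is_useless:
            rename[node["output"]] = inputs[0]
        else:
            kept.append({**node, "inputs": inputs})
    return kept
```

With a float input `x`, a `Cast(x) -> y` to float followed by `Relu(y) -> z` collapses to `Relu(x) -> z`.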
Tianlei Wu
d69bc31309
Refine BERT optimization script options (#3618)
* Remove parameters like --gpu_only --sequence_length. Update bert GPU notebook accordingly.
* Remove input_int32 and float16 parameters from constructors of BertOnnxModel class and other classes derived from it. 
* Update gpt2 benchmark. Add comments in gpt2 notebook to indicate work in progress. Clear notebook output before official 1.3.0 release is ready.
2020-04-21 21:28:06 -07:00
Scott McKay
b4508dbdc6
Improve TopK performance. (#3612)
* Update TopK implementation.
  - add faster heap
  - special case k=1
  - update selector for when to use heap and when to use nth_element based on performance testing
  - parallelize if enough work to do
  - reduce templatized code
  - add some extra unit tests.

Perf tested vs. master. Average speedup is 3.75x using this combination of input sizes:

```
    batches = [10, 25, 50]
    batch_size = [8, 16, 32, 64, 128, 256, 512, 1024, 2048]
    k = [1, 2, 4, 6, 8, 16, 24, 32, 48, 64, 128]
```

For larger batches (e.g. 50x2048) the speedup is over 20x.
2020-04-22 10:05:13 +10:00
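
The selection strategy described in this commit (a special case for k=1, a heap for small k, and a partition-style selection otherwise) can be sketched roughly as follows. This is an illustration only, not the onnxruntime kernel: the function name, the `k * 4 < n` threshold, and the tie-breaking rule are assumptions, and a full sort stands in for `nth_element`, which the Python standard library does not provide.

```python
import heapq


def top_k(values, k):
    """Return the k largest (value, index) pairs, largest first.

    Ties are broken in favour of the lower index (an assumption).
    """
    if k <= 0:
        return []
    indexed = [(v, i) for i, v in enumerate(values)]
    if k == 1:
        # Special case: a single linear scan is enough.
        return [max(indexed, key=lambda p: (p[0], -p[1]))]
    if k * 4 < len(values):  # hypothetical threshold, not from the commit
        # Small k relative to n: heap-based selection.
        return heapq.nsmallest(k, indexed, key=lambda p: (-p[0], p[1]))
    # Larger k: full sort as a stand-in for nth_element plus a partial sort.
    return sorted(indexed, key=lambda p: (-p[0], p[1]))[:k]
```

In the real implementation the choice between the two paths was tuned from performance testing; the threshold here only marks where such a selector would sit.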
edgchen1
5492d02c4e
Remove Windows CUDA 9 build definition and helper scripts. (#3615) 2020-04-21 15:22:27 -07:00
Sherlock
d66d5bb86a
Update Optimizer Domain and Opset (#3602)
* Update Domain and Opset for SGD

* Update Adam Domain and Opset

* Update Lamb Domain and Opset
2020-04-21 15:06:02 -07:00
Edward Chen
47f1758fdc Add --skip_onnx_tests to orttraining Windows builds. 2020-04-21 21:50:35 +00:00
Edward Chen
297ab43b0c Add --enable_onnx_tests to Windows builds to allow set up of test data directory. 2020-04-21 20:34:55 +00:00
Edward Chen
2e4b9b1d0e Disable CudaKernelTest.SoftmaxCrossEntropyLoss_LargeSizeTensor because it's flaky. 2020-04-21 20:30:45 +00:00
Edward Chen
28a0c863b1 Revert "Convert Gelu to use TryParallelFor (#3599)"
This reverts commit 2579a72a88.
2020-04-21 18:45:20 +00:00
Edward Chen
d50c3e7a71 Fix GraphTransformationTests tests. 2020-04-21 18:43:49 +00:00
Pranav Sharma
9636da3951
Threadpool related changes. (#3564)
Threadpool related changes.

Don't create ORT threadpool if openmp is enabled (except for inter op threadpool).
Created a new static function ThreadPool::NumThreads to account for openmp settings and null threadpool ptr.
Log a warning when using SetIntraOpNumThreads when openmp is enabled.
Added a document for ORT devs.
Fix LSTM to use the new threadpool abstractions.
Rename GetNumCpuCores to GetThreadAffinityMasks and move it to the Env class.

Co-authored-by: Tracy Sharpe <tracysh@microsoft.com>
2020-04-21 09:57:39 -07:00
Adam Pocock
3dd3f84116
[Java] Adding model metadata support (#3573)
* java - adding deployment information to build.gradle.

* java - adding support for model metadata.
2020-04-21 02:28:15 -07:00
George Wu
1c37d5e6ec
debug option for dumping tensorrt subgraphs. (#3604) 2020-04-21 11:55:30 +08:00
Edward Chen
87fad09c7b Fix merge issue. 2020-04-21 03:44:44 +00:00
Edward Chen
daa14b64e3 Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master 2020-04-21 03:31:32 +00:00
edgchen1
ead00f97f3
Sync onnx_backend_test_series.py disabled tests (#3603)
Make the set of disabled tests consistent between ort_training and master. Fix some regex patterns.
2020-04-20 18:00:53 -07:00
pengwa
e233e6ba45
Refactor - ScatterElements (#3559)
Refactor ScatterElements using MLTypeCallDispatcherRet.
2020-04-21 08:58:42 +08:00
Changming Sun
2579a72a88
Convert Gelu to use TryParallelFor (#3599) 2020-04-20 17:32:39 -07:00
Changming Sun
911d125323 Remove openmp from gpu build 2020-04-20 17:13:54 -07:00
liqunfu
781e1c36be
Add front-end MNIST test (#3231)
* add frontend MNIST test

* to use torch nightly with torchvision

* remove incorrect comment per reviewer's comment

* experiment torchvision import failure

* experiment install_deps.sh

* more experiment install_deps.sh

* experiment install_deps.sh with --upgrade

* Experiment with install_deps.sh.

* Experiment with install_ubuntu.sh.

* Use Ubuntu 18.04 and Python 3.6 for CI.

* Update cmake version for CI.

* Install MPI on Ubuntu 18.04 for CI.

* Increase tolerance for MNIST test.

* Go back to Ubuntu 16.04 for CI, fix installing from deadsnakes ppa.

* Clean-up.

* Update ort_trainer.py from ort_training.

* Get default Ubuntu Python ver back to 3.5.

* Add underscore to opset_version parameter name in ORTTrainer constructor.

* Move loss/model wrap before the call for sample output.

* Update expected values for MNIST test.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
2020-04-20 11:19:31 -07:00
edgchen1
f180b71f27
Support ONNX test version parsing from path on Windows in onnx_test_runner. (#3588) 2020-04-20 10:02:51 -07:00
Sheil Kumar
31b6629e99
Fork WinML IDL Guids (#3591)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-04-20 09:17:07 -07:00
Prabhat
381fee47ab
Added support to build onnxruntime with ACL (#3586)
* Added support to build onnxruntime with ACL

* Added ACL build instructions
2020-04-20 13:35:28 +05:30
Changming Sun
75426a3091 Fix build break 2020-04-19 18:32:46 -07:00
Zhang Lei
422266c445
Support conv transpose 1D in cuda provider. (#3300)
* Support conv transpose 1D in cuda provider.

* Clear some old comments. Enable conv_transpose_1d onnx test for cuda.
2020-04-19 22:07:34 +08:00
Scott McKay
7d5348f87e
Add ability to batch device copy for graph inputs and outputs. (#3580)
* Add ability to batch device copy for graph inputs and outputs.
2020-04-19 17:51:07 +10:00
Prabhat
ea62b3435a
Clean up build.py code (#3466) 2020-04-18 20:48:30 -07:00
Maxim Kalinin
fcf0f6ee9f
Generalize reshape fusion (#3554)
* Generalize reshape fusion

* Allow arbitrary number of Concat arguments
* Apply fusion even when an output of an internal node is used elsewhere
* Fix a bug when an internal node's output is the subgraph output
* Simplify code
2020-04-18 20:47:23 -07:00
Tiago Koji Castro Shibata
14e387aa1a
Fix WinML namespace build break (#3583)
* Add missing winrt namespace

* Conditional compilation of dxcore code

* Fix TAEF macros
2020-04-18 20:46:01 -07:00
Sherlock
56b223bc60
Implement OneHot CUDA Kernels (#3390)
* Implement OneHot CUDA Kernels

* Support fp16

* Use HandleNegativeAxis

* Make MLFloat16 test GPU only
2020-04-18 17:41:39 -07:00
Hariharan Seshadri
1599562016 Fix BatchNorm CUDA kernel definition 2020-04-18 17:21:29 -07:00
Zhang Lei
c365822808
Refactor calibrate.py. Add QLinearAdd and QLinearMul support. Fix bugs loading JPGs that are not strictly RGB, and typos in the load_batch call. (#3542) 2020-04-18 17:10:55 -07:00
Dmitri Smirnov
db9566f70d
Implement Inverse(12) for CPU and CUDA (#3485) 2020-04-18 17:10:21 -07:00
Dmitri Smirnov
38a18023c7
Fix some overly common warnings. (#3578)
Some pointless and noisy warnings are either fixed or disabled.
2020-04-18 17:05:05 -07:00
Changming Sun
d68245853e
Disable downloading test data on Linux (#3581) 2020-04-18 15:54:58 -07:00
Sergii Dymchenko
3e884b4b6b
Fix some typos. (#3582)
* Fix some typos.

* Fix a typo.
2020-04-18 14:18:05 -07:00
suryasidd
6fe688c732
Disabled failed maxpool test on GPU (#3549) 2020-04-18 13:49:42 -07:00
edgchen1
52cfc98ec4
Merge pull request #3557 from microsoft/havenka/master-merge
Merge from master
2020-04-18 09:40:32 -07:00
edgchen1
811bd67872
Clean up docs. (#3579)
* Fix orttraining/README.md formatting.

* Delete ORT_TRAINING_BUILDS.md.

* Fix typo.
2020-04-17 22:13:11 -07:00
ytaous
ca1bbff5d4
subgraph type override handling and unit test (#3560)
* unit test for subgraph type override

* unit test - re-wire input properly to subgraph

* update args

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-04-17 19:33:34 -07:00
Tianlei Wu
7f46f347db
Add GPT2 Attention Fusion in optimization script (#3488)
* Add Attention fusion for GPT2
* Support distilgpt2 in benchmark_gpt2.py
* Add options to disable Attention/SkipLayerNormalization/EmbedLayerNormalization/BiasGelu fusions
* Add logging at the beginning of each fusion
* Update notebooks: Add Gpt2OnnxModel.py to list of script files.
* Add test for gpt2 model optimization
* Add optional parameters (--input_ids --segment_ids --input_mask) for graph inputs
* Fuse BiasGelu
* Handle model that does not have segment_ids input.
* Allow fusing embed layer without mask
2020-04-17 16:23:53 -07:00
Tianlei Wu
5d3b217039
Update Attention operator for GPT2 (#3474)
Add unidirectional mask for Attention operator.
Update mask_index to mask broadcast from B->BxS->BxNxSxS to B->BxSxS->BxNxSxS.
2020-04-17 16:20:40 -07:00
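
The mask broadcast B->BxSxS->BxNxSxS and the unidirectional mask can be illustrated in plain Python. This sketch rests on assumptions not stated in the commit: `mask_index` is taken to hold one valid sequence length per batch entry, and "unidirectional" is read as a causal mask in which position i may only attend to positions j <= i. The function name and the 0/1 encoding are hypothetical.

```python
def build_attention_mask(mask_index, seq_len, num_heads, unidirectional=False):
    """Expand per-batch lengths (B) to a 0/1 mask of shape BxNxSxS."""
    per_batch = []
    for valid_len in mask_index:  # iterate over B
        rows = []
        for i in range(seq_len):  # query position
            row = []
            for j in range(seq_len):  # key position
                visible = j < valid_len  # padding positions are hidden
                if unidirectional:
                    visible = visible and j <= i  # no attention to the future
                row.append(1 if visible else 0)
            rows.append(row)
        per_batch.append(rows)  # shape so far: BxSxS
    # Every head shares the same SxS mask: BxSxS -> BxNxSxS.
    return [[rows for _ in range(num_heads)] for rows in per_batch]
```

Going through BxSxS rather than BxS is what leaves room for a per-query mask such as the causal one; a BxS mask can only hide key positions for all queries at once.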
edgchen1
2cb8cb816f
Disable or update flaky tests, improve test random seed accessibility. (#3495)
- Add output of test random seed
- Allow setting of test random seed with environment variable
- Disable / relax tolerance for flaky tests
2020-04-17 15:57:32 -07:00