Commit graph

2434 commits

Author SHA1 Message Date
Scott McKay
42cf971ca2
Add a couple of utility scripts to tools/python (#3621)
* Add a helper script to more easily create a test directory for use with onnx_test_runner or onnxruntime_perf_test.
Add example script that can be used as a base for performance testing a model with a variety of input sizes.
Add __init__.py so files in this directory can be imported in other scripts.

* Fix some flake8 warnings.
Add example of specifying attribute for op.

* Add ability for test dir creation to fill in all missing input data with random values.
Add example of using test dir creation this way
2020-05-02 17:35:43 +10:00
edgchen1
440f361363
Remove orttraining-linux-gpu-inference-only-ci-pipeline.yml. (#3788) 2020-05-02 00:35:08 -07:00
Sheil Kumar
43a828f0a2
Add tests for WinRT Projection Raw ABI consumption (#3718)
Add tests for WinRT Projection Raw ABI consumption
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-05-02 00:33:17 -07:00
Tianlei Wu
3fab8ebfe9
(MaximKalininMS) Fix Reshape Fusion and Crash in Reshape (#3777)
* Fix a crash in Reshape
Reshape doesn't handle 0 input dimension properly, which leads to a
division by zero

* Fix reshape fusion
https://github.com/microsoft/onnxruntime/pull/3554 introduced a bug:
initializers can now come before Shape->Gather->Unsqueeze chains; if
those initializers have more than 1 element, expected dimensions in the
chains are now incorrect.

Authored-by: Max Kalinin <makalini@microsoft.com>
2020-05-02 00:20:00 -07:00
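The Reshape crash described in commit 3fab8ebfe9 above (a 0 input dimension leading to a division by zero) can be illustrated with a minimal sketch. This is not the onnxruntime implementation: `infer_reshape` is a hypothetical helper, and raising an error when the known dimensions multiply to zero is an assumption about one reasonable way to guard the division. It follows the ONNX Reshape convention that -1 marks one inferred dimension and 0 copies the corresponding input dimension.

```python
from math import prod


def infer_reshape(input_shape, requested_shape):
    """Resolve 0 and -1 entries of requested_shape against input_shape.

    Hypothetical helper for illustration only.
    """
    input_size = prod(input_shape)
    resolved = list(requested_shape)
    unknown_index = None
    known_size = 1

    for i, dim in enumerate(resolved):
        if dim == -1:
            if unknown_index is not None:
                raise ValueError("at most one dimension may be -1")
            unknown_index = i
            continue
        if dim == 0:
            # ONNX convention: 0 copies the input dimension at this position.
            if i >= len(input_shape):
                raise ValueError("no input dimension to copy for 0 entry")
            dim = input_shape[i]
            resolved[i] = dim
        known_size *= dim

    if unknown_index is not None:
        # Guard the division: a zero-sized known part cannot determine -1.
        if known_size == 0:
            raise ValueError("cannot infer -1 when other dimensions multiply to 0")
        if input_size % known_size != 0:
            raise ValueError("input size is not divisible by the known dimensions")
        resolved[unknown_index] = input_size // known_size
    elif known_size != input_size:
        raise ValueError("requested shape does not match input size")

    return resolved
```

Without the `known_size == 0` check, a call such as `infer_reshape([0, 3], [0, -1])` would reach the integer division with a zero divisor.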
Scott McKay
15eca74d15
Make ThreadPool::PartitionWork a bit more user friendly. Update a few places to use PartitionWork. (#3795) 2020-05-02 17:09:55 +10:00
Pranav Sharma
2b8d9ef0fd
Refactor scatter/gather ops to use the new cost based threadpool abstractions. (#3776)
* Update Scatter and Gather ops by replacing pragma omp invocations with the new threadpool abstractions.

* Use forward declarations

* PR comments
2020-05-02 17:09:31 +10:00
M. Zeeshan Siddiqui
4f9f6aedea
CUDA/CPU test for NegativeLogLikelihoodLoss(12) function based loss operator. (#3793) 2020-05-01 21:36:29 -07:00
Sheil Kumar
b1c4d6ff4e
bump dml version (#3792)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-05-01 18:16:00 -07:00
David Brownell
4b8fad214a
Initial checkin (#3791) 2020-05-01 14:58:49 -07:00
Scott McKay
11b819054b
Fix tree ensemble threading bug (#3778)
* Fix first instance where the calculation for TryBatchParallelFor was incorrect. Other usages need to be validated.

* Fix some other usages of the threadpool.
2020-05-02 07:50:35 +10:00
Scott McKay
2fc3984e70
Add test that C is unidirectionally broadcast-able before fusing the MatMul with Add. (#3780)
Addresses #3764
2020-05-02 07:36:21 +10:00
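The check named in commit 2fc3984e70 above (C must be unidirectionally broadcastable before MatMul and Add are fused) can be sketched as follows. This is a minimal illustration of the ONNX unidirectional broadcasting rule, not the code from the commit; the function name `is_unidirectional_broadcastable` is hypothetical.

```python
def is_unidirectional_broadcastable(from_shape, to_shape):
    """Return True if from_shape can be broadcast to to_shape without changing to_shape.

    Shapes are aligned from the trailing dimension; each dimension of
    from_shape must be 1 or equal to the matching dimension of to_shape.
    """
    if len(from_shape) > len(to_shape):
        return False
    for src, dst in zip(reversed(from_shape), reversed(to_shape)):
        if src != 1 and src != dst:
            return False
    return True
```

For a MatMul output of shape (M, N), a bias of shape (N,), (1, N), or (M, 1) passes this check, while a shape with more dimensions than the output does not.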
Xueyun Zhu
e8e95110d3
add pipeline to distributed context config (#3789)
* add pipeline to distributed context

* white space
2020-05-01 13:49:51 -07:00
M. Zeeshan Siddiqui
517bff9675
Function expansion support and Update ONNX to 1.7 release candidate 1. (#3782)
* Function expansion support, Update ONNX to 1.7 release candidate 1.

* Re-enable disabled tests.
2020-05-01 10:35:16 -07:00
edgchen1
047975e404
Address flaky test ReduceApiTest.Sum. (#3716)
Increase test comparison tolerance. Add output of random seed value for easier debugging later. Unify RandomValueGenerator::Uniform() to consistently use [min, max) interval.
2020-05-01 09:18:26 -07:00
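Two ideas from commit 047975e404 above, printing the random seed for later debugging and drawing values consistently from the half-open interval [min, max), can be sketched in Python. This is an illustration under stated assumptions, not the C++ `RandomValueGenerator` from the repository; `uniform_values` and its parameters are hypothetical names.

```python
import random


def uniform_values(count, low, high, seed=None):
    """Draw `count` floats from the half-open interval [low, high).

    The seed is printed so that a failing run can be reproduced.
    """
    if not low < high:
        raise ValueError("low must be strictly less than high")
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)
    print(f"random seed: {seed}")

    rng = random.Random(seed)
    values = []
    while len(values) < count:
        candidate = low + (high - low) * rng.random()
        # Floating-point rounding can land exactly on `high`; resample if so.
        if candidate < high:
            values.append(candidate)
    return values
```

Passing the printed seed back in via `seed=` reproduces the same sequence of values.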
Changming Sun
edd5855fb7
Remove eigen device from thread pool 2020-05-01 02:21:57 -07:00
George Wu
dcb1a21552
fix python package linux gpu failure (#3786)
* pin base image for manylinux2010_gpu

* pin base image for Dockerfile.manylinux2010
2020-05-01 17:04:59 +08:00
stevenlix
99ec93ea42
Apply onnx-tensorrt bug fixes (#3785)
* merge latest onnx-tensorrt parser

* differentiate kernel names between graph and subgraph

* merge more TRT parser bug fixes

* merge more onnx-tensorrt bug fixes

* fix merge issue

Co-authored-by: stevenlix <stevenlix>
2020-05-01 16:51:48 +08:00
Pranav Sharma
e42e0d4787
Update documentation + Update mlas threading lib to use the new TrySimpleParallelFor. (#3779) 2020-05-01 00:23:06 -07:00
pengwa
29234458af
disable cublasHgemm for training (#3769)
* disable cublasHgemm for training
2020-05-01 13:57:37 +08:00
pengwa
98b97be635
collect the last few iteration latency for throughput calculation (#3766) 2020-05-01 13:24:17 +08:00
Ori Levari
ad63e2593d
avoid using LocalFree on FormatMessageA buffer (#3772)
* avoid using LocalFree for FormatMessageA buffer because it is only supported on Windows 10

Co-authored-by: Ori Levari <orlevari@microsoft.com>
2020-04-30 20:51:09 -07:00
Dmitri Smirnov
f68a326bd9
Implement Pow(12) for cpu and cuda (#3727)
* Implement Pow(12) cpu and cuda.
2020-04-30 15:29:39 -07:00
Pranav Sharma
027a364922
Remove usage of openmp in reverse seq impl. (#3754) 2020-04-30 14:44:25 -07:00
Changming Sun
62c730a8df
Revert softmax kernel implementation (#3753)
Revert softmax kernel implementation to the previous version (commit 4d26f2ce86).

For
1. Reducing the binary size. Currently the single kernel is 500KB.
2. Removing dependency on Eigen::ThreadPoolDevice
2020-04-30 14:38:41 -07:00
RandySheriffH
86eaa71ec6
sync threads before calling next cub function (#3758)
Co-authored-by: RandySheriffH <rashuai@microsoft.com>
2020-04-30 14:16:46 -07:00
liqunfu
af3988198c
Liqun/e2e transformer test (#3540)
* initial change to transformer.py

* prepare e2e transformer tests

* refactor transformer tests

* put test python files in a flat folder

* fix typo pip install transform(s)

* python 3.6

* python version to 3.6 in install_ubuntu.sh

* remove argparser

* to use opset ver 12

* workaround loss_scale naming patch in case of loss_fn_

* assign self.loss_fn_ so it can be checked

* skip a few un-needed post-process steps

* fix loss_scale_input_name, clean up post process steps

* skip non-frontend tests

* move cpu/cuda related files to corresponding cpu/cuda folder (#3668)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* type cast for ratio is not necessary for dropout (#3682)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* thrustallocator is not needed since cub is used directly for gather now. (#3683)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* GatherND-12 Implementation (#3645)

* Renamed, UT passing

* Move GatherND CUDA Kernel into onnxruntime

* Merge GatherNDOpTest

* Refactor Test code

* Merge CPU Kernel Impl

* Handle Negative Indices, Fix UT

* Improve CUDA kernel to handle negative index

* Minor Fixes

* Preserve GatherND-1 Cuda kernel

* Fix Mac build

* fix UT

* Fix Build

* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>

* update with reviewers' comments

* testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference

* fix merge mistakes

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
2020-04-30 12:26:38 -07:00
pengwa
177c1357f4
Use cublasHgemm "back" for fp16 computation with Volta GPU (#3765)
* Use cublasHgemm for fp16 computation with Volta GPU
2020-05-01 00:36:07 +08:00
Scott McKay
3421ec1110
Add Threadpool::TrySimpleParallelFor (#3759)
* Add TrySimpleParallelFor so that there's a path with OpenMP awareness for SimpleParallelFor. Makes it consistent with [Try]BatchParallelFor and [Try]ParallelFor.
Update TopK to check for the number of threads better, and to use TrySimpleParallelFor.

* Update doco to mention TrySimpleParallelFor
2020-04-30 20:03:33 +10:00
M. Zeeshan Siddiqui
b9a5ed1fe2
Add SoftmaxCrossEntropyLoss to mixed-precision-transformer. (#3760) 2020-04-30 02:48:21 -07:00
Scott McKay
9f72752397
Fix 'Install ONNX' CI failure (#3761)
* Disable flaky test temporarily

* turn off pip upgrade warning

Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com>
Co-authored-by: Zeeshan Siddiqui <mzs@microsoft.com>
2020-04-30 18:18:58 +10:00
pengwa
0531acccc5
Refine GatherND CPU/CUDA Kernels & Add UTs (#3688)
* Refactor GatherND CPU Kernel (Renaming & Simplify)

* Add batch_dim=1 or 2, negative slices tests

* Rename gather_nd_gard_impl.cu

* Use dispatcher to refactor CUDA GatherND/GatherNDGrad

* Change GatherNDBase::CommonComputeKernel --> GatherNDBase::PrepareCompute

* Use HasCudaEnvironment instead of __CUDA_ARCH__ for some double type tests
2020-04-30 10:17:54 +08:00
ashbhandare
58f53966d3
Add Distributed Checkpointing support (#3639)
* Change naming of moments to Moment_x_<weight_name>

* Add checkpointing code and zero checkpoint aggregation

* Correct aggregation for LAMB, cleanup

* Add simple checkpointing test

* Add test for zero checkpoint aggregation

* Fix tests

* fix test

* Review changes

* Fix test after review comment fix

* Fix API, test

* Fix test after API change

* Decouple save load from ORTTrainer

* Add flag to not break checkpointing with ORTModel

Co-authored-by: aishwarya bhandare <aibhanda@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-04-29 14:52:21 -07:00
David Brownell
7296e06dd5
Properly creating arguments to pass to setup.py (#3744) 2020-04-29 09:47:51 -07:00
suffiank
ea0e2d1dde
fix warning treated as error due to ignoring return status (#3739)
Co-authored-by: suffian khan <sukha@microsoft.com>
2020-04-29 02:38:53 -07:00
suryasidd
e529464a12
Limit the number of models run on OpenVINO (#3742)
* Removed NMS from supported list
2020-04-29 02:23:09 -07:00
Changming Sun
7ff06056bd
Fix the test coverage pipeline (#3710) 2020-04-28 21:21:19 -07:00
Tixxx
0638565fe0
Fix evaluation issues (#3538)
* allow switching between eval and training modes dynamically

Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>
2020-04-28 21:03:37 -07:00
M. Zeeshan Siddiqui
939589c265
Fix flaky test and avoid divide by zero in SoftmaxCrossEntropyLoss-CPU. (#3734)
* Fix flaky test and avoid divide by zero in SoftmaxCrossEntropyLoss-CPU.

* fix gather test?

* PR feedback.
2020-04-28 19:35:14 -07:00
Pranav Sharma
bad90d7a53
Fix a perf regression by providing a better estimate for the cost in LSTM's TryParallelFor call. 2020-04-28 19:25:20 -07:00
gwang-msft
12d7c2f6e4
iOS cross build on MacOS (#3699)
* Enable iOS cross build on MacOS (step#1)

* Changed parallel option

* fixed style issues

* Enable ios arm64 crossbuild on MacOS

* Enable ios arm64 crossbuild on MacOS

* Enable parallel build for xcode

* Fix arm64 function not 4-byte aligned warning

* Rename onnxruntime_ios.cmake to onnxruntime_ios.toolchain.cmake

* change build.py to use the new ios toolchain file name
2020-04-28 17:09:31 -07:00
Scott McKay
29c12c0f07
Handle dim with value of zero in ConvTranspose (#3728)
* Handle dim with value of zero in ConvTranspose
* Update CUDA implementation and disable zero dim test for some EPs that don't support that yet.
2020-04-29 09:58:36 +10:00
Jeff Bloomfield
9a4d1c7720
Merge pull request #3708 from microsoft/jeffbloo/MergeDmlDev
Merge DML Execution Provider updates
2020-04-28 15:19:51 -07:00
Sheil Kumar
f1a948fd62
Enable telemetry on windows zip packages (#3738)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-04-28 14:07:11 -07:00
Ori Levari
78fde2c4cb
add downlevel test artifact to windowsai-nuget build (#3711) 2020-04-28 10:05:32 -07:00
S. Manohar Karlapalem
f7cf703d10
[OpenVINO-EP] Optimize MCR Docker image size (#3732)
* updated dockerfile.openvino

* Group all RUN commands and add a 'cd WORKDIR' between each

* Update doc with installer and build info

Highlight usage of Online installer package.
Specify --rm option during docker build to avoid caching layer.

Co-authored-by: avidiyal <akhila.vidiyala@intel.com>
2020-04-29 00:08:15 +08:00
edgchen1
1356215bd0
Fix build issues in the Python Packaging pipelines. (#3725) 2020-04-28 08:41:37 -07:00
edgchen1
1bcfd49918
Merge pull request #3731 from microsoft/ettao/ort-2-master
Merge from ort_training to master
2020-04-28 07:56:05 -07:00
George Wu
6b3b4fe43e
remove warning message (#3730) 2020-04-28 03:02:34 -07:00
Jeff Bloomfield
1a11ba8a7e
Merge remote-tracking branch 'upstream/master' into jeffbloo/MergeDmlDev 2020-04-28 00:45:22 -07:00
Tianlei Wu
f487cc0b28
Fix Reshape Fusion with graph inputs (#3729)
Use NodeArg to check root input; Add a check on constant initializer
2020-04-28 00:03:16 -07:00