Commit graph

2409 commits

Author SHA1 Message Date
liqunfu
af3988198c
Liqun/e2e transformer test (#3540)
* initial change to transformer.py

* prepare e2e transformer tests

* refactor transformer tests

* put test python files in a flat folder

* fix typo pip install transform(s)

* python 3.6

* python version to 3.6 in install_ubuntu.sh

* remove argparser

* to use opset ver 12

* workaround loss_scale naming patch in case of loss_fn_

* assign self.loss_fn_ so it can be checked

* skip a few un-needed post-process steps

* fix loss_scale_input_name, clean up post process steps

* skip non-frontend tests

* move cpu/cuda related files to corresponding cpu/cuda folder (#3668)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* type cast for ratio is not necessary for dropout (#3682)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* thrustallocator is not needed since cub is used directly for gather now. (#3683)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* GatherND-12 Implementation (#3645)

* Renamed, UT passing

* Move GatherND CUDA Kernel into onnxruntime

* Merge GatherNDOpTest

* Refactor Test code

* Merge CPU Kernel Impl

* Handle Negative Indices, Fix UT

* Improve CUDA kernel to handle negative index

* Minor Fixes

* Preserve GatherND-1 Cuda kernel

* Fix Mac build

* fix UT

* Fix Build

* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>

* update with reviewers' comments

* testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference

* fix merge mistakes

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
2020-04-30 12:26:38 -07:00
pengwa
177c1357f4
Use cublasHgemm "back" for fp16 computation with Volta GPU (#3765)
* Use cublasHgemm for fp16 computation with Volta GPU
2020-05-01 00:36:07 +08:00
Scott McKay
3421ec1110
Add Threadpool::TrySimpleParallelFor (#3759)
* Add TrySimpleParallelFor so that there's a path with OpenMP awareness for SimpleParallelFor. Makes it consistent with [Try]BatchParallelFor and [Try]ParallelFor.
Update TopK to check for the number of threads better, and to use TrySimpleParallelFor.

* Update doco to mention TrySimpleParallelFor
2020-04-30 20:03:33 +10:00
M. Zeeshan Siddiqui
b9a5ed1fe2
Add SoftmaxCrossEntropyLoss to mixed-precision-transformer. (#3760) 2020-04-30 02:48:21 -07:00
Scott McKay
9f72752397
Fix 'Install ONNX' CI failure (#3761)
* Disable flaky test temporarily

* turn off pip upgrade warning

Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com>
Co-authored-by: Zeeshan Siddiqui <mzs@microsoft.com>
2020-04-30 18:18:58 +10:00
pengwa
0531acccc5
Refine GatherND CPU/CUDA Kernels & Add UTs (#3688)
* Refactor GatherND CPU Kernel (Renaming & Simplify)

* Add batch_dim=1 or 2, negative slices tests

* Rename gather_nd_gard_impl.cu

* Use dispatcher to refactor CUDA GatherND/GatherNDGrad

* Change GatherNDBase::CommonComputeKernel --> GatherNDBase::PrepareCompute

* Use HasCudaEnvironment instead of __CUDA_ARCH__ for some double type tests
2020-04-30 10:17:54 +08:00
ashbhandare
58f53966d3
Add Distributed Checkpointing support (#3639)
* Change naming of moments to Moment_x_<weight_name>

* Add checkpointing code and zero checkpoint aggregation

* Correct aggregation for LAMB, cleanup

* Add simple checkpointing test

* Add test for zero checkpoint aggregation

* Fix tests

* fix test

* Review changes

* Fix test after review comment fix

* Fix API, test

* Fix test after API change

* Decouple save load from ORTTrainer

* Add flag to not break checkpointing with ORTModel

Co-authored-by: aishwarya bhandare <aibhanda@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-04-29 14:52:21 -07:00
David Brownell
7296e06dd5
Properly creating arguments to pass to setup.py (#3744) 2020-04-29 09:47:51 -07:00
suffiank
ea0e2d1dde
fix warning treated as error due to ignoring return status (#3739)
Co-authored-by: suffian khan <sukha@microsoft.com>
2020-04-29 02:38:53 -07:00
suryasidd
e529464a12
Limit the number of models run on OpenVINO (#3742)
* Removed NMS from supported list
2020-04-29 02:23:09 -07:00
Changming Sun
7ff06056bd
Fix the test coverage pipeline (#3710) 2020-04-28 21:21:19 -07:00
Tixxx
0638565fe0
Fix evaluation issues (#3538)
* allow switching between eval and training modes dynamically

Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>
2020-04-28 21:03:37 -07:00
M. Zeeshan Siddiqui
939589c265
Fix flaky test and avoid divide by zero in SoftmaxCrossEntropyLoss-CPU. (#3734)
* Fix flaky test and avoid divide by zero in SoftmaxCrossEntropyLoss-CPU.

* fix gather test?

* PR feedback.
2020-04-28 19:35:14 -07:00
Pranav Sharma
bad90d7a53 Fix a perf regression by providing a better estimate for the cost in LSTM's TryParallelFor call. 2020-04-28 19:25:20 -07:00
gwang-msft
12d7c2f6e4
iOS cross build on MacOS (#3699)
* Enable iOS cross build on MacOS (step#1)

* Changed parallel option

* fixed style issues

* Enable ios arm64 crossbuild on MacOS

* Enable ios arm64 crossbuild on MacOS

* Enable parallel build for xcode

* Fix arm64 function not 4-byte aligned warning

* Rename onnxruntime_ios.cmake to onnxruntime_ios.toolchain.cmake

* change build.py to use the new ios toolchain file name
2020-04-28 17:09:31 -07:00
Scott McKay
29c12c0f07
Handle dim with value of zero in ConvTranspose (#3728)
* Handle dim with value of zero in ConvTranspose
* Update CUDA implementation and disable zero dim test for some EPs that don't support that yet.
2020-04-29 09:58:36 +10:00
Jeff Bloomfield
9a4d1c7720
Merge pull request #3708 from microsoft/jeffbloo/MergeDmlDev
Merge DML Execution Provider updates
2020-04-28 15:19:51 -07:00
Sheil Kumar
f1a948fd62
Enable telemetry on windows zip packages (#3738)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-04-28 14:07:11 -07:00
Ori Levari
78fde2c4cb
add downlevel test artifact to windowsai-nuget build (#3711) 2020-04-28 10:05:32 -07:00
S. Manohar Karlapalem
f7cf703d10
[OpenVINO-EP] Optimize MCR Docker image size (#3732)
* updated dockerfile.openvino

* Group all RUN commands and add a 'cd WORKDIR' between each

* Update doc with installer and build info

Highlight usage of Online installer package.
Specify --rm option during docker build to avoid caching layer.

Co-authored-by: avidiyal <akhila.vidiyala@intel.com>
2020-04-29 00:08:15 +08:00
edgchen1
1356215bd0
Fix build issues in the Python Packaging pipelines. (#3725) 2020-04-28 08:41:37 -07:00
edgchen1
1bcfd49918
Merge pull request #3731 from microsoft/ettao/ort-2-master
Merge from ort_training to master
2020-04-28 07:56:05 -07:00
George Wu
6b3b4fe43e
remove warning message (#3730) 2020-04-28 03:02:34 -07:00
Jeff Bloomfield
1a11ba8a7e Merge remote-tracking branch 'upstream/master' into jeffbloo/MergeDmlDev 2020-04-28 00:45:22 -07:00
Tianlei Wu
f487cc0b28
Fix Reshape Fusion with graph inputs (#3729)
Use NodeArg to check root input; Add a check on constant initializer
2020-04-28 00:03:16 -07:00
ytaous
75c24a5fac
Revert "Merge from ort_training to master (#3719)" (#3726)
This reverts commit b990ba0059.
2020-04-27 20:42:43 -07:00
ytaous
b990ba0059
Merge from ort_training to master (#3719)
* move cpu/cuda related files to corresponding cpu/cuda folder (#3668)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* type cast for ratio is not necessary for dropout (#3682)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* thrustallocator is not needed since cub is used directly for gather now. (#3683)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* GatherND-12 Implementation (#3645)

* Renamed, UT passing

* Move GatherND CUDA Kernel into onnxruntime

* Merge GatherNDOpTest

* Refactor Test code

* Merge CPU Kernel Impl

* Handle Negative Indices, Fix UT

* Improve CUDA kernel to handle negative index

* Minor Fixes

* Preserve GatherND-1 Cuda kernel

* Fix Mac build

* fix UT

* Fix Build

* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>

* Set gradient as output only for easy mode (#3694)

* Support GPU Event Operators (#3653)

* Add GPU event operators to support in-place updates in
gradient accumulator and optimizer for modifying the tensors
passing through those event operators.

* Address comment and polish code

* Merge shared code between CPU and GPU kernels

* Move event test to a new file

* Address comments

* Update onnxruntime/core/providers/cuda/gpu_data_transfer.cc

* fix path of cpu_featurizers_kernels.cc and cpu_featurizers_kernels.h

Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-04-27 16:45:21 -07:00
Dmitri Smirnov
4f887b465a
Uncomment celu test. (#3717) 2020-04-27 14:24:54 -07:00
Wei-Sheng Chin
7627e6bcc2
Improve node and node argument name generation (#3649) 2020-04-27 13:57:24 -07:00
Jeff Bloomfield
407492472f Fix build warnings and address PR comment 2020-04-27 13:21:45 -07:00
Weixing Zhang
d03c552992 fix path of cpu_featurizers_kernels.cc and cpu_featurizers_kernels.h 2020-04-27 19:39:42 +00:00
Sherlock
635bc9cd04
Fix graph transformers to support opset 12 ops (#3715) 2020-04-27 11:53:45 -07:00
Ethan Tao
0516e7d22e Merge branch 'ort_public_ort_training' into ettao/ort-2-master 2020-04-27 18:17:17 +00:00
Prabhat
d901640817
Call optimised version of depthwise ConvLayer (#3664)
* Call optimised version of depthwise ConvLayer

* Update if statements
2020-04-27 17:41:33 +05:30
George Wu
c23b484275 add missing deps in Dockerfile.openvino 2020-04-26 22:02:48 -07:00
Jeff Bloomfield
1621a1fef1 Merge remote-tracking branch 'upstream/master' into jeffbloo/MergeDmlDev 2020-04-26 17:20:21 -07:00
Jeff Bloomfield
e51c6c0b3b Fix build warning in DmlOperatorResize.cpp and ReadbackHeap.cpp 2020-04-26 17:20:15 -07:00
Changming Sun
805ffc01e5
Temp remove --enable_wcos --use_winml from CI build (#3707)
The flags "--enable_wcos --use_winml" don't work with the latest VC++ and CMake; it is unclear which of the two caused the failure. Remove them to get the pipelines working first.
Will add them back before 1.3 release.
2020-04-26 16:10:25 -07:00
Jeff Bloomfield
735caecfe1 Copy disabled ONNX backend tests from WindowsAI 2020-04-26 14:47:41 -07:00
David Brownell
a917023f94
Support for country-specific holidays in the DateTimeTransformer (#3701)
* Support for country-specific holidays in the DateTimeTransformer

Updates the DateTimeTransformer featurizer to support holidays, where holiday information is read from country-specific json files.

* Addressed build breaks

* Enhanced Windows strategies for scenarios when tests run from root dir

* Skipping test for nuget installations
2020-04-26 11:12:26 -07:00
Jeff Bloomfield
02e8d10f3a Fix AdapterSessionTest 2020-04-25 20:49:51 -07:00
Tracy Sharpe
bf1caba2b2
Port MLAS to Power architecture (#3703)
Updates to MLAS to support building for the Power architecture.
2020-04-25 19:31:55 -07:00
Jeff Bloomfield
f1c19f8495 merge master 2020-04-25 19:04:58 -07:00
Jeff Bloomfield
99a0bdf271 Upgrade nuget version in dml.cmake 2020-04-25 18:48:32 -07:00
Jeff Bloomfield
8cc161aec6 Remove problematic change for dxcore.lib 2020-04-25 18:48:07 -07:00
Jeff Bloomfield
c49cc0c937 Increase DML nuget version to 0.0.2 2020-04-25 16:28:19 -07:00
edgchen1
e22d97ba56
Merge pull request #3643 from microsoft/ort_training_for_merge_to_master
Introduce ORT training implementation
2020-04-25 07:15:22 -07:00
Sheil Kumar
a475f2824d
Create the Nuget WindowsAI Pipeline (#3684)
* add windowsai.yml for new Microsoft.AI.MachineLearning nuget

* temporarily add windowsai.yml to gpu.yml

* pass in build arch

* remove install onnx task

* no dml for arm or arm64

* refactor nuget pipeline defs

* update package creation

* pass in build and sources path

* missing hyphens

* copy license file

* fix parameter variable

* disable arm builds for now

* remove commented script block

* download pipeline artifact name update

* set working dir

* Add bundling nuget script

* path combine

* null path

* combine needs parentheses

* binplace microsoft.* dlls in new nuget package

* update artifact name

* move merged nuget to artifacts directory

* move to merged subfolder in artifacts staging dir

* forward slash to back

* enable arm

* vcvarsall needs x64 vars setup

* Run Tests

* fix tests

* move global variables

* update yml to not have global variable in template

* removed parameters

* fixes

* Add build arch as an env variable

* ne not neq

* %Var% for batch script

* dont pass argument for x64

* disable arm tests

* skip csharp/cxx tests for microsoft nuget package

* remove test-win as it tests only c# cxx and capi

* test build for store apps

* dont build for store

* tools/nuget/generate_nuspec_for_native_nuget.py

* remove args.

* add new props and targets for microsoft.ai

* make windowsai props/targets static

* add dependency

* dont ship dot net props

* Remove c# from windowsai nuget

* copy license file

* native packages must have win10 as the platform, not win

* cuda header in wrong if branch

* no dml for arm builds

* only build dml for x64/ x86

* User/sheilk/props update (#3616)

* prelim store work

* props

* Fix desktop nuget props/targets

* clean up targets and make store apps work

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* update windowsai.yml with latest

* remove extra dloadhelpers

* Add abi headers to abi dir, and reference native includes

* update windowsai.yml

* minor update

* remove parameters

* add doesrp param

* hard code esrp to true

* add directml for x86/x64

* revert gpu yml changes

* add store builds

* add store builds

* add checks again in old way

* dup job names for store and desktop builds

* move all of the runtime binaries to win10 folder

* only set safeseh on x86

* disable the store builds for now... missing msvcprt.lib

* copy paste deletion...

* switch back to win- (#3646)

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* use stahlworks

* & not supported in ado

* add cuda to cpu nuget(???) and EnableDelayedExpansion to enable x86 dml package

* revert nocontribops

* add underscore...

* extra win/win10 change

* merged nuget... still not being bundled...

* files in merged directory

* missing parens causing dml to be included in cpu package

* more diagnostic info

* switch dir to get-childitem

* wait for compression to complete

* add winml_adapter to mklml and gpu packages

* enable_wcos

* add mklml binaries

* props and targets missing from mklml

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-04-24 20:20:04 -07:00
ytaous
1c484ce33f
fix test (#3700)
Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-04-24 18:09:46 -07:00
Wei-Sheng Chin
72b38f0a8b
Support GPU Event Operators (#3653)
* Add GPU event operators to support in-place updates in
gradient accumulator and optimizer for modifying the tensors
passing through those event operators.

* Address comment and polish code

* Merge shared code between CPU and GPU kernels

* Move event test to a new file

* Address comments

* Update onnxruntime/core/providers/cuda/gpu_data_transfer.cc
2020-04-24 17:43:04 -07:00