Commit graph

3953 commits

Author SHA1 Message Date
Edward Chen
cd3a5acca0
Update get_docker_image.py to enable use without image cache container registry. (#6177)
2020-12-18 19:01:02 -08:00
Derek Murray
11b0a5401e
Fix typo in BERT pretraining script (#6175)
A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail.
2020-12-18 16:38:14 -08:00
Guoyu Wang
bbb52e9274
[NNAPI EP] Enable per-channel quantization for QLinearConv (#6155)
* Enable qlinearconv per-channel quantization

* Fix the android CI test failure

* Add Android Version Check for Per-Channel Quant

* Address PR comments

* Fix some minor issues

* Add verification of per-channel zero points

* Make the error tolerance configurable
2020-12-18 16:13:22 -08:00
baijumeswani
39aedbc97f
aggregate model states only for the case when mixed precision was true (#6176) 2020-12-18 14:09:32 -08:00
Pranav Sharma
86493e6d0c
Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172) 2020-12-18 02:00:42 -08:00
Sergii Dymchenko
824ef9a1de
Don't try to bind unused inputs in the Training frontend (#6166) 2020-12-17 21:41:28 -08:00
baijumeswani
adc2071043
save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136)
* save_checkpoint and load_checkpoint implementations

* checkpoint aggregation logic

* unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints
2020-12-17 21:01:36 -08:00
Guoyu Wang
c339bb2da9
Remove ignored build warnings for pybind on Mac (#6165) 2020-12-17 19:54:28 -08:00
Yufeng Li
98d8a3e335
Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)" (#6169)
This reverts commit f2dcba7afe.
2020-12-17 19:53:50 -08:00
Du Li
34725ae520
Bugfix for topk cuda kernel (#6164)
* fix the issue that std::numeric_limits cannot handle half type

* adding a test

Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-17 17:59:37 -08:00
Jay Rodge
dec703b62d
Update TensorRT-ExecutionProvider.md (#6161) 2020-12-17 17:10:40 -08:00
Tixxx
32c67c2944
Deprecating Horovod and refactored Adasum computations (#5468)
deprecated horovod submodule
refactored adasum logic to be ort-native
added tests for native kernel and e2e tests
2020-12-17 16:21:33 -08:00
Pranav Sharma
efa1b0d864
Minor fix to satisfy c++14 (#6162) 2020-12-17 13:53:24 -08:00
Juliana Franco
36c03b32e9
Using a map of ops to stages as input to the partition function. (#5940)
* New partition algorithm running before AD

* Convert cut_group_info into device map. Work in progress -- works for bert-tiny with pp=2

* Removing code for partition of bwd graphs

* Remove old code

* Adding some verification code

* Handle Shared Initializer

* Renaming rank with stage

* Added first unit test

* new test

* redundant check

* undo change in bert

* Moved cut-based partition to testing utils file

Co-authored-by: xzhu1900
Co-authored-by: wschin

* New conversion function and tests

* minor

* remove test that is not needed

* improve GetDeviceAssignment and PR comments

* minor changes

* PR comments

* improving documentation and variable naming

* add documentation

* Variable naming and docs

* more doc improvements

* more doc improvements

* missing static cast

* Fix test file for windows

* Fix test file for windows

* Fix test file for windows

* stage id is not the same as rank id

* PR comments

* PR comments

* More comments

* More comments
2020-12-17 09:03:33 -08:00
Tracy Sharpe
503b61d897
MLAS: add NEON version of int8 depthwise convolution (#6152) 2020-12-16 18:39:10 -08:00
Edward Chen
0fa04bdc50
Fix clean_docker_image_cache.py detection of image pushes. (#6151)
Image pushes were being ignored because the expected HTTP status code was wrong: for pushes it is 201, not 200.
2020-12-16 17:25:22 -08:00
Changming Sun
344a2a8ee5
Revert "work around of the build break in mac (#6069)" (#6150)
This reverts commit 3cae28699b.
2020-12-16 14:41:18 -08:00
Scott McKay
7250562271
Fix edge case in BFCArena where allocation failures could lead to an infinite loop. (#6145)
#4656
2020-12-17 07:52:31 +10:00
ashbhandare
82690486c1
Partition initial optimizer state for Zero-1 (#6093)
* Initial changes

* Working changes

* Working changes

* Cleanup

* fix windows CI

* Review comments

* review comments
2020-12-16 15:27:42 -05:00
Derek Murray
8fd085801a
Add gradient registration for Abs. (#6139) 2020-12-16 08:32:10 -08:00
stevenlix
aa49e476b0
Fix TensorRT kernel conflict issue for subgraphs of control flow operators (#6115)
* add static subgraph kernel index

* change kernel naming to avoid conflicts
2020-12-16 00:04:53 -08:00
Yateng Hong
0978d2bfbe
Fix CUDA test hang: (#6138)
- Add a condition check in `CUDAAllocatorTest` to ensure a CUDA device is present.
2020-12-16 16:32:56 +10:00
Guoyu Wang
b648bf641f
nnapi add min max support (#6117) 2020-12-15 22:31:28 -08:00
George Nash
939cc9b410
Enable running the mnist_training sample without cuda (#6085)
Signed-off-by: George Nash <george.nash@intel.com>
2020-12-15 17:06:54 -08:00
Ryan Hill
ac62cf8058
Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6108)
* Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers
* Change Provider_IExecutionProviderFactory to be the core version.
2020-12-15 16:45:53 -08:00
Cecilia Liu
980a93c164
Model Fusion For Bart (#6105)
Fusion fix for Bart models
2020-12-15 14:30:15 -08:00
George Wu
297c824807
remove dnnl_dll_path from post build copy (#6142) 2020-12-15 13:47:39 -08:00
Edward Chen
64709b1335
Deprecate Python global configuration functions [Part 1] (#5923)
Enable options to be set via execution provider (EP)-specific options and log a deprecation warning from the current global configuration functions.
2020-12-15 11:32:43 -08:00
Jesse Benson
a8d549e181
Minor changes to AMD element-wise kernels to converge with CUDA element-wise kernels. 2020-12-15 08:46:36 -08:00
Pranav Sharma
a9548283d0
Don't mark issues that are marked as enhancement as stale (#6134) 2020-12-14 18:57:40 -08:00
Edward Chen
9810b9e02b
Reduce amount of compiled CUDA device code (#6118)
Move CudaKernel from cuda_common.h to a new separate header, cuda_kernel.h. Update include sites to use cuda_kernel.h instead if they need CudaKernel. Inclusions of cuda_common.h are now more lightweight.

Make corresponding changes for ROCM execution provider code.

Other minor cleanup.
2020-12-14 15:27:40 -08:00
Sheil Kumar
a6a23db130
Enable C# .NET5 for WinML (#6120)
* build for .net5

* only reference cswinrt for .net5

* remove netstandard2.0 references

* upgrade language version

* net5

* remove extra comment closure

* add targetframework

* set target framework

* remove net*

* pep8 errors

* make test project build with .net windows SDK projection

* disable c# builds for non-x64 builds

* fix pep8 errors

* disable for store build

* fix tests

* remove cswinrt and sdk references from package

* bump cswinrt down to 1.0.1

* fix bin path

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-12-14 15:05:15 -08:00
Sherlock
eb5c1f0fcc
Unify activation and initializer alignment value (#6109)
* Unify activation and initializer alignment value

* Fix VerifyInputTensorsAllocatedContiguously
2020-12-14 13:13:41 -08:00
liqunfu
cde723a136
Liqun/move nightly pl to linux multi gpu v100 (#6024)
* move e2e nightly pipeline to Azure DevOps
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-14 12:43:41 -08:00
baijumeswani
dd2e5a1a05
state_dict and load_state_dict for ORTTrainer (#6095)
* add functions state_dict and load_state_dict to ORTTrainer

* unit tests for state_dict and load_state_dict for ORTTrainer
2020-12-14 11:55:52 -08:00
dependabot[bot]
d4dddd99d9
Bump ini from 1.3.5 to 1.3.8 in /nodejs
Bumps [ini](https://github.com/isaacs/ini) from 1.3.5 to 1.3.8.
- [Release notes](https://github.com/isaacs/ini/releases)
- [Commits](https://github.com/isaacs/ini/compare/v1.3.5...v1.3.8)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-12 13:06:43 -08:00
Hariharan Seshadri
c755ca0b71
Honor auto_pad attribute in ConvTranspose (#4271) 2020-12-11 22:30:17 -08:00
Suffian Khan
6cb5d3ac09
Fix multi-tensor LAMB reduction to be deterministic (#6028)
* define ordering of reduction across blocks

* save state

* remove debug code

* remove debug code

* review comments

* significant correction for reduction only over blocks on same tensor

* addressing comments

* update rocm/lamb.cc to build as well

* remove times 2048*size in multitensor test until the threshold error in ROCm is resolved

* convert tuple => struct as per recommendation

* update comment

* apply perfect forwarding for launch_multitensor to permit passing ref rather than pointer

* remove excess template arguments from rocm lamb.cc launch_multitensor as well

* fixes for AMD build

* pr comments

* run formatter from vscode

* formatter on cuda files
2020-12-11 13:13:05 -08:00
Edward Chen
c8ac34d6a5
Fix DEBUG_NODE_INPUTS_OUTPUTS test by putting it in a separate process, clean up unused test_main.cc files. (#5949)
Move the DEBUG_NODE_INPUTS_OUTPUTS test into its own process. The implementation uses static variables which do not interact well with other tests.
Clean up old test_main.cc files which are no longer used.
2020-12-11 11:36:58 -08:00
Sherlock
a53f4dd379
Introduce VariadicAlias, remove hardcoded alias limits (#6106)
* Introduce VariadicAlias, remove hardcoded alias limits

* Include optional-lite in winml build

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-11 10:47:08 -08:00
Jesse Benson
38c49c2483
Make ROCM and CUDA reduction_all code more similar. 2020-12-11 09:35:07 -08:00
Ryan Lai
1eb146f561
Implement conversion from ORT String to WinML Tensor String (#6097)
* Implement conversion from ort string to winml string

* NIT:comment
2020-12-10 17:47:50 -08:00
Ryan Lai
8bcb5fd119
Add skip test reason for onnx model zoo models and tier 2 models (#6081) 2020-12-10 14:41:17 -08:00
Ryan Lai
753af576c4
If building inbox, hook up winrt_activation_handler for WinML Tests (#6074)
* If building inbox, hook up winrt_activation_handler with what is already defined in dllload.cpp

* Add base.h header

* Missed custom ops test
2020-12-10 14:41:01 -08:00
Du Li
e945b5fcf6
adding fp16 support for topk cuda kernel (#6082)
* adding fp16 support for topk.

* disable fp16 tests for cpu ep

Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-10 11:04:19 -08:00
Vincent Wang
7ddeafdfcc
Add ReduceL2Grad and ClipGrad (#5970)
* ReduceL2Grad and ClipGrad.

* fix win build and amd ci pipeline

* resolve comments.

Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>
2020-12-10 11:03:26 +08:00
RandySheriffH
404982ded5
Enable varied input type for custom op (#6066)
* allow custom op taking varied types

* refactor test case

* add test model

* refactor test case

* enable copy elision

* update test case

* fix issue in ToString function
2020-12-09 15:10:42 -08:00
Jesse Benson
cc47cfcb31
Update AMD transpose to match CUDA transpose. 2020-12-09 11:00:18 -08:00
Edward Chen
abdbb5fc84
Reduction kernel optimization (#6088)
Optimize reduction kernel code by moving loads from global memory before computation.
Add CMake option to build CUDA code with --generate-line-info option.
2020-12-09 10:20:23 -08:00
Sergii Dymchenko
9e26e59a37
Deprecate opsets <12 for training. (#6027) 2020-12-09 00:15:27 -08:00