3959 commits

Author SHA1 Message Date
Suffian Khan
67ac6ae4e0
Tune fast Gelu to use exp(x) instead of tanh(x) on Rocm platform (#6174)
* tune fast gelu to use exp(x) instead of tanh(x) on rocm

* update to use expression 2/(1+exp(-2x))-1 for stability
2020-12-21 16:25:21 -08:00
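The identity behind the commit above is tanh(x) = 2/(1+exp(-2x)) - 1. A minimal sketch, assuming the common tanh-based GELU approximation with constants sqrt(2/pi) and 0.044715 (the actual ROCm kernel is not shown in this log, and all function names here are hypothetical), checks that the exp form reproduces the tanh form and saturates cleanly at both extremes:

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
GELU_COEFF = 0.044715  # cubic coefficient of the common tanh-based GELU approximation


def tanh_via_exp(x: float) -> float:
    """Evaluate tanh(x) as 2 / (1 + exp(-2x)) - 1."""
    try:
        e = math.exp(-2.0 * x)
    except OverflowError:
        # Python raises here, whereas IEEE hardware would return +inf,
        # and 2 / inf - 1 == -1. Mirror that saturation explicitly.
        return -1.0
    return 2.0 / (1.0 + e) - 1.0


def _inner(x: float) -> float:
    # Argument passed to tanh in the approximation.
    return SQRT_2_OVER_PI * (x + GELU_COEFF * x ** 3)


def fast_gelu_tanh(x: float) -> float:
    return 0.5 * x * (1.0 + math.tanh(_inner(x)))


def fast_gelu_exp(x: float) -> float:
    return 0.5 * x * (1.0 + tanh_via_exp(_inner(x)))


if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 0.5, 3.0):
        print(x, fast_gelu_tanh(x), fast_gelu_exp(x))
```

One plausible reading of "for stability" is that this form needs only a single exp and never produces inf/inf: for large positive x the exp underflows to 0 and the result is 1, and for large negative x the denominator grows without bound and the result is -1.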
Weixing Zhang
53307a5f2e
improve perf for softmax (#6128)
* improve perf for both gathergrad and softmax

* revert the change in gathergrad and will be done in another PR.

* address comments from code review.
2020-12-21 14:15:54 -08:00
S. Manohar Karlapalem
ea9cfa554a
Add usage details of unified MCR container image (#6182)
Going forward, a single unified docker image will be published in
MCR. The hardware accelerator target choice will have to be made
in the application using OpenVINO EP's runtime config options.
2020-12-21 11:48:54 -08:00
satyajandhyala
201d0dbb1a
Android coverage dashboard (#6163)
* Write the report to a file.

* Post code coverage to the Dashboard database.
2020-12-21 10:34:01 -08:00
jingyanwangms
f874260b9e
Backend APIs for checkpointing (#5803)
* Add backend API GetOptimizerState and GetModelState

* add GetPartitionInfoMap
2020-12-21 08:21:29 -08:00
Scott McKay
2da8060f34
Helper for compiling EP to generate deterministic unique ids for use in MetaDef names (#6156)
* Create a helper for generating unique ids that can be used by an EP that creates compiled nodes and needs ids to be deterministic for a model when used in multiple sessions.

Added to IExecutionProvider as this can potentially be used by all compiling EPs and is more robust than a simplistic counter (although EP implementer is free to choose either approach).

* Restructure the helper so it can be called across the EP bridge.
Add ability to call id generation helper from EP bridge
  - convert DNNL EP to use helper to validate
Address issue where a new Model may be loaded into the same address as a previous one.
  - hash the bytes in the Graph instance (1728 bytes currently) to use as the key to the full hash for the model
Add lock around id generation to ensure no issues if multiple sessions partition graphs at exactly the same time.
  - Extremely unlikely but would be hard to debug and the locking cost is not an issue as it's only incurred during graph partitioning and not execution.
2020-12-21 12:17:58 +10:00
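The idea described in the commit above (ids derived from a hash of the model rather than from a plain global counter, with a lock around generation) can be sketched as follows. This is an illustration only: `ModelIdGenerator` and `generate` are hypothetical names, SHA-256 over raw bytes stands in for whatever hashing the real helper uses, and the C++ helper on IExecutionProvider is not reproduced here.

```python
import hashlib
import threading
from typing import Dict, Tuple


class ModelIdGenerator:
    """Hypothetical generator of (model_hash, counter) ids for compiled nodes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next_id: Dict[str, int] = {}  # model hash -> next counter value

    def generate(self, model_bytes: bytes) -> Tuple[str, int]:
        # Keying on a content hash instead of an object address avoids
        # confusing a new model that happens to load at a reused address.
        model_hash = hashlib.sha256(model_bytes).hexdigest()[:16]
        # The lock keeps the read-increment-write atomic if several
        # sessions partition graphs at the same time.
        with self._lock:
            counter = self._next_id.get(model_hash, 0)
            self._next_id[model_hash] = counter + 1
        return model_hash, counter


if __name__ == "__main__":
    gen = ModelIdGenerator()
    print(gen.generate(b"model-a"))
    print(gen.generate(b"model-a"))
    print(gen.generate(b"model-b"))
```

In this sketch a fresh generator, standing in for a new session, reproduces the same sequence of ids for the same model bytes, which is the determinism property the commit message asks for.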
Edward Chen
cd3a5acca0
Update get_docker_image.py to enable use without image cache container registry. (#6177)
Update get_docker_image.py to enable use without image cache container registry.
2020-12-18 19:01:02 -08:00
Derek Murray
11b0a5401e
Fix typo in BERT pretraining script (#6175)
A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail.
2020-12-18 16:38:14 -08:00
Guoyu Wang
bbb52e9274
[NNAPI EP] Enable per-channel quantization for QlinearConv (#6155)
* Enable qlinearconv per-channel quantization

* Fix the android CI test failure

* Add Android Version Check for Per-Channel Quant

* Address PR comments

* Fix some minor issues

* Add verification of per-channel zero points

* Make the error tolerance configurable
2020-12-18 16:13:22 -08:00
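For readers unfamiliar with the term, per-channel quantization gives each output channel its own scale instead of sharing one scale across the whole weight tensor. The sketch below is a simplified illustration, assuming symmetric int8 quantization with a zero point of 0; the function names are hypothetical and this is not the NNAPI EP code.

```python
from typing import List


def quantize_dequantize(values: List[float], scale: float) -> List[float]:
    """Round to int8 with the given scale (zero point 0), then map back to float."""
    out = []
    for v in values:
        q = max(-127, min(127, round(v / scale)))
        out.append(q * scale)
    return out


def max_abs(values: List[float]) -> float:
    return max(abs(v) for v in values)


def max_error(original: List[float], restored: List[float]) -> float:
    return max(abs(a - b) for a, b in zip(original, restored))


# Two output channels with very different magnitudes.
weights = [[0.01, -0.02, 0.015], [1.0, -2.0, 1.5]]

# Per-tensor: one scale derived from the largest magnitude anywhere.
tensor_scale = max_abs([v for ch in weights for v in ch]) / 127.0
per_tensor = [quantize_dequantize(ch, tensor_scale) for ch in weights]

# Per-channel: each channel gets a scale from its own largest magnitude.
per_channel = [quantize_dequantize(ch, max_abs(ch) / 127.0) for ch in weights]

if __name__ == "__main__":
    print("channel 0 error, per-tensor :", max_error(weights[0], per_tensor[0]))
    print("channel 0 error, per-channel:", max_error(weights[0], per_channel[0]))
```

With a shared scale the small-magnitude channel collapses onto only a few integer levels, while its own scale lets it use the full int8 range.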
baijumeswani
39aedbc97f
aggregate model states only for the case when mixed precision was true (#6176)
2020-12-18 14:09:32 -08:00
Pranav Sharma
86493e6d0c
Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172)
2020-12-18 02:00:42 -08:00
Sergii Dymchenko
824ef9a1de
Don't try to bind unused inputs in the Training frontend (#6166)
2020-12-17 21:41:28 -08:00
baijumeswani
adc2071043
save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136)
* save_checkpoint and load_checkpoint implementations

* checkpoint aggregation logic

* unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints
2020-12-17 21:01:36 -08:00
Guoyu Wang
c339bb2da9
Remove ignored build warnings for pybind on Mac (#6165)
2020-12-17 19:54:28 -08:00
Yufeng Li
98d8a3e335
Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)" (#6169)
This reverts commit f2dcba7afe.
2020-12-17 19:53:50 -08:00
Du Li
34725ae520
Bugfix for topk cuda kernel (#6164)
* fix the issue that std::numeric_limits cannot handle half type

* adding a test

Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-17 17:59:37 -08:00
Jay Rodge
dec703b62d
Update TensorRT-ExecutionProvider.md (#6161)
2020-12-17 17:10:40 -08:00
Tixxx
32c67c2944
Deprecating Horovod and refactored Adasum computations (#5468)
deprecated horovod submodule
refactored adasum logic to be ort-native
added tests for native kernel and e2e tests
2020-12-17 16:21:33 -08:00
Pranav Sharma
efa1b0d864
Minor fix to satisfy c++14 (#6162)
2020-12-17 13:53:24 -08:00
Juliana Franco
36c03b32e9
Using a map of ops to stages as input of partition function. (#5940)
* New partition algorithm running before AD

* Convert cut_group_info into device map. Work in progress -- works for bert-tiny with pp=2

* Removing code for partition of bwd graphs

* Remove old code

* Adding some verification code

* Handle Shared Initializer

* Renaming rank with stage

* Added first unit test

* new test

* redundant check

* undo change in bert

* Moved cut-based partition to testing utils file

Co-authored-by: xzhu1900
Co-authored-by: wschin

* New conversion function and tests

* minor

* remove test that is not needed2

* improve GetDeviceAssignment and PR comments

* minor changes

* PR comments

* improving documentation and variable naming

* add documentation

* Variable naming and docs

* more doc improvements

* more doc improvements

* missing static cast

* Fix test file for windows

* Fix test file for windows

* Fix test file for windows

* stage id is not the same as rank id

* PR comments

* PR comments

* More comments

* More comments
2020-12-17 09:03:33 -08:00
Tracy Sharpe
503b61d897
MLAS: add NEON version of int8 depthwise convolution (#6152)
2020-12-16 18:39:10 -08:00
Edward Chen
0fa04bdc50
Fix clean_docker_image_cache.py detection of image pushes. (#6151)
Fix clean_docker_image_cache.py detection of image pushes. They were being ignored because the expected HTTP status code was wrong. For pushes, it's 201 instead of 200.
2020-12-16 17:25:22 -08:00
Changming Sun
344a2a8ee5
Revert "work around of the build break in mac (#6069)" (#6150)
This reverts commit 3cae28699b.
2020-12-16 14:41:18 -08:00
Scott McKay
7250562271
Fix edge case in BFCArena where allocation failures could lead to an infinite loop. (#6145)
#4656
2020-12-17 07:52:31 +10:00
ashbhandare
82690486c1
Partition initial optimizer state for Zero-1 (#6093)
* Initial changes

* Working changes

* Working changes

* Cleanup

* fix windows CI

* Review comments

* review comments
2020-12-16 15:27:42 -05:00
Derek Murray
8fd085801a
Add gradient registration for Abs. (#6139)
2020-12-16 08:32:10 -08:00
stevenlix
aa49e476b0
Fix TensorRT kernel conflict issue for subgraphs of control flow operators (#6115)
* add static subgraph kernel index

* change kernel naming to avoid conflicts
2020-12-16 00:04:53 -08:00
Yateng Hong
0978d2bfbe
Fix CUDA test hang: (#6138)
- Make condition check in `CUDAAllocatorTest` to ensure CUDA device is present.
2020-12-16 16:32:56 +10:00
Guoyu Wang
b648bf641f
nnapi add min max support (#6117)
2020-12-15 22:31:28 -08:00
George Nash
939cc9b410
Enable running the mnist_training sample without cuda (#6085)
Signed-off-by: George Nash <george.nash@intel.com>
2020-12-15 17:06:54 -08:00
Ryan Hill
ac62cf8058
Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6108)
* Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers
* Change Provider_IExecutionProviderFactory to be the core version.
2020-12-15 16:45:53 -08:00
Cecilia Liu
980a93c164
Model Fusion For Bart (#6105)
Fusion fix for Bart models
2020-12-15 14:30:15 -08:00
George Wu
297c824807
remove dnnl_dll_path from post build copy (#6142)
2020-12-15 13:47:39 -08:00
Edward Chen
64709b1335
Deprecate Python global configuration functions [Part 1] (#5923)
Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions.
2020-12-15 11:32:43 -08:00
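The pattern this commit describes (keep the old global function working, log a deprecation warning, and route the setting to provider-specific options) might look roughly like the sketch below. Every name here, including `set_provider_option`, `set_global_device_id`, and `ExampleExecutionProvider`, is hypothetical and chosen for illustration; none of them is taken from the onnxruntime Python API.

```python
import warnings
from typing import Any, Dict

# Hypothetical store of per-provider options.
provider_options: Dict[str, Dict[str, Any]] = {}


def set_provider_option(provider: str, key: str, value: Any) -> None:
    """Preferred path: options are scoped to a single execution provider."""
    provider_options.setdefault(provider, {})[key] = value


def set_global_device_id(device_id: int) -> None:
    """Deprecated path: still works, but warns and forwards to the new API."""
    warnings.warn(
        "set_global_device_id is deprecated; use set_provider_option instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    set_provider_option("ExampleExecutionProvider", "device_id", device_id)


if __name__ == "__main__":
    warnings.simplefilter("always")
    set_global_device_id(1)
    print(provider_options)
```

Forwarding from the old function to the new one keeps a single source of truth for the setting while callers migrate.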
Jesse Benson
a8d549e181
Minor changes to AMD element-wise kernels to converge with CUDA element-wise kernels.
2020-12-15 08:46:36 -08:00
Pranav Sharma
a9548283d0
Don't mark issues that are marked as enhancement as stale (#6134)
2020-12-14 18:57:40 -08:00
Edward Chen
9810b9e02b
Reduce amount of compiled CUDA device code (#6118)
Move CudaKernel from cuda_common.h to a new separate header, cuda_kernel.h. Update include sites to use cuda_kernel.h instead if they need CudaKernel. Inclusions of cuda_common.h are now more lightweight.

Make corresponding changes for ROCM execution provider code.

Other minor cleanup.
2020-12-14 15:27:40 -08:00
Sheil Kumar
a6a23db130
Enable C# .NET5 for WinML (#6120)
* build for .net5

* only reference cswinrt for .net5

* remove netstandard2.0 references

* upgrade language version

* net5

* remove extra comment closure

* add targetframework

* set target framework

* remove net*

* pep8 errors

* make test project build with .net windows SDK projection

* disable c# builds for non-x64 builds

* fix pep8 errors

* disable for store build

* fix tests

* remove cswinrt and sdk references from package

* bump cswinrt down to 1.0.1

* fix bin path

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-12-14 15:05:15 -08:00
Sherlock
eb5c1f0fcc
Unify activation and initializer alignment value (#6109)
* Unify activation and initializer alignment value

* Fix VerifyInputTensorsAllocatedContiguously
2020-12-14 13:13:41 -08:00
liqunfu
cde723a136
Liqun/move nightly pl to linux multi gpu v100 (#6024)
* move e2e nightly pipeline to azure devop
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-14 12:43:41 -08:00
baijumeswani
dd2e5a1a05
state_dict and load_state_dict for ORTTrainer (#6095)
* add functions state_dict and load_state_dict to ORTTrainer

* unit tests for state_dict and load_state_dict for ORTTrainer
2020-12-14 11:55:52 -08:00
dependabot[bot]
d4dddd99d9
Bump ini from 1.3.5 to 1.3.8 in /nodejs
Bumps [ini](https://github.com/isaacs/ini) from 1.3.5 to 1.3.8.
- [Release notes](https://github.com/isaacs/ini/releases)
- [Commits](https://github.com/isaacs/ini/compare/v1.3.5...v1.3.8)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-12 13:06:43 -08:00
Hariharan Seshadri
c755ca0b71
Honor auto_pad attribute in ConvTranspose (#4271)
2020-12-11 22:30:17 -08:00
Suffian Khan
6cb5d3ac09
Fix multi-tensor LAMB reduction to be deterministic (#6028)
* define ordering of reduction across blocks

* save state

* remove debug code

* remove debug code

* review comments

* significant correction for reduction only over blocks on same tensor

* addressing comments

* update rocm/lamb.cc to build as well

* remove times 2048*size in multitensor test until threshold error in rocm resolved

* convert tuple => struct as per recommendation

* update comment

* apply perfect forwarding for launch_multitensor to permit passing ref rather than pointer

* remove excess template arguments from rocm lamb.cc launch_multitensor as well

* fixes for AMD build

* pr comments

* run formatter from vscode

* formatter on cuda files
2020-12-11 13:13:05 -08:00
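Why a defined ordering matters for the reduction in the commit above: floating-point addition is not associative, so summing per-block partial results in whatever order they arrive can change the low-order bits from run to run. The sketch below illustrates the general principle in plain Python, with hypothetical function names; it is not the CUDA/ROCm LAMB kernel.

```python
from typing import List, Tuple

Partial = Tuple[int, float]  # (block index, partial sum from that block)


def reduce_in_arrival_order(partials: List[Partial]) -> float:
    """Accumulate in whatever order the blocks happened to finish."""
    total = 0.0
    for _, value in partials:
        total += value
    return total


def reduce_in_block_order(partials: List[Partial]) -> float:
    """Accumulate in ascending block index, independent of arrival order."""
    total = 0.0
    for _, value in sorted(partials, key=lambda p: p[0]):
        total += value
    return total


# The same three partial sums, arriving in two different orders.
run_a = [(0, 0.1), (1, 0.2), (2, 0.3)]
run_b = [(2, 0.3), (1, 0.2), (0, 0.1)]

if __name__ == "__main__":
    print(reduce_in_arrival_order(run_a), reduce_in_arrival_order(run_b))
    print(reduce_in_block_order(run_a), reduce_in_block_order(run_b))
```

Fixing the accumulation order does not make the sum more accurate; it only guarantees that repeated runs produce bit-identical results.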
Edward Chen
c8ac34d6a5
Fix DEBUG_NODE_INPUTS_OUTPUTS test by putting it in a separate process, clean up unused test_main.cc files. (#5949)
Move the DEBUG_NODE_INPUTS_OUTPUTS test into its own process. The implementation uses static variables which do not interact well with other tests.
Clean up old test_main.cc files which are no longer used.
2020-12-11 11:36:58 -08:00
Sherlock
a53f4dd379
Introduce VariadicAlias, remove hardcoded alias limits (#6106)
* Introduce VariadicAlias, remove hardcoded alias limits

* Include optional-lite in winml build

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-11 10:47:08 -08:00
Jesse Benson
38c49c2483
Make ROCM and CUDA reduction_all code more similar.
2020-12-11 09:35:07 -08:00
Ryan Lai
1eb146f561
Implement conversion from ORT String to WinML Tensor String (#6097)
* Implement conversion from ort string to winml string

* NIT:comment
2020-12-10 17:47:50 -08:00
Ryan Lai
8bcb5fd119
Add skip test reason for onnx model zoo models and tier 2 models (#6081)
2020-12-10 14:41:17 -08:00
Ryan Lai
753af576c4
If building inbox, hook up winrt_activation_handler for WinML Tests (#6074)
* If building inbox, hook up winrt_activation_handler with what is already defined in dllload.cpp

* Add base.h header

* Missed custom ops test
2020-12-10 14:41:01 -08:00