Commit graph

3433 commits

Author SHA1 Message Date
Yufeng Li
61ba5b501a
Fix bug in the back to back quantization of matmul and conv (#5264)
* fix bug in the back to back quantization of matmul and conv

* fix bug in back to back gather
2020-09-23 08:47:20 -07:00
George Wu
b5a6a8e847
remove implicit linking of tensorrt and dnnl ep shared libs (#5262)
* remove trt and dnnl from link command

* add comment
2020-09-23 05:47:18 -07:00
Dwayne Robinson
6ea66b43db
ORT DirectML EP for Iron release, ONNX 1.5 (part 2) (#5263)
* Merged PR 5195856: Fix broken cases of zero size tensors in Cast/Reduce

MaskRCNN failed when `Cast` tried to execute `Xor` on an empty tensor (a zero in its dimensions). This is perfectly legal and should be treated as a nop.

Ultimately DML itself should treat this case as a nop, just like how C's `memcpy` treats 0 count as a nop, but I'm just addressing it in ORT now, as enabling it in DML would impact more operators to be consistent (probably should incrementally add a flag to tensor validation so operators can be opted in gradually).

Corresponding WindowsAI PR: https://microsoft.visualstudio.com/WindowsAI/_git/WindowsAI/pullrequest/5195850

Related work items: #27469839, #28761382

* Merged PR 5201369: Remove copy of initializers added in DMLXP refactor

When used in ORT, a common method shouldn't copy and return initializer data

Related work items: #29514403

Co-authored-by: Justin Stoecker <justoeck@microsoft.com>
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
2020-09-23 01:56:19 -07:00
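The zero-size handling described in merged PR 5195856 above can be sketched roughly as follows. This is a minimal Python illustration, not the DirectML EP code: `element_count` and `run_elementwise` are hypothetical names, and the early return simply mirrors the way `memcpy` with a count of 0 does nothing.

```python
from math import prod


def element_count(shape):
    """Number of elements in a tensor of the given shape (1 for a scalar)."""
    return prod(shape)


def run_elementwise(op, data, shape):
    """Apply `op` to a flat list of values (hypothetical stand-in for a kernel dispatch)."""
    if len(data) != element_count(shape):
        raise ValueError("data length does not match shape")
    # Any zero dimension means there is nothing to compute: treat the call as a nop.
    if element_count(shape) == 0:
        return []
    return [op(x) for x in data]
```

With this guard, a call such as `run_elementwise(lambda x: x + 1, [], (0, 3))` returns an empty result instead of reaching the operator at all.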
Hariharan Seshadri
75d994f194
Handle zero norm values in LpNormalization CPU kernel (#5251) 2020-09-22 22:01:09 -07:00
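The commit above does not say how zero norms are handled, so the following is only a sketch under a stated assumption: a slice whose Lp norm is zero is mapped to zeros rather than divided by zero. `lp_normalize` is a hypothetical helper working on a single 1-D slice, not the ORT kernel.

```python
def lp_normalize(values, p=2):
    """Normalize a 1-D list by its Lp norm; ONNX LpNormalization supports p = 1 or 2."""
    if p == 1:
        norm = sum(abs(v) for v in values)
    elif p == 2:
        norm = sum(v * v for v in values) ** 0.5
    else:
        raise ValueError("p must be 1 or 2")
    # Assumption: an all-zero slice yields zeros instead of a division by zero.
    if norm == 0:
        return [0.0 for _ in values]
    return [v / norm for v in values]
```

Without the guard, an all-zero slice would raise `ZeroDivisionError` in Python, and a floating-point kernel would produce NaN values.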
Adam Pocock
d26c71f55c
[java] Fixing the buffer semantics. (#5223)
* [java] Fixing the buffer semantics.
* Renaming bufferCapacity to bufferRemaining.
* Adding a cast to char* so the pointer arithmetic works on Windows.
2020-09-22 21:29:01 -07:00
Scott McKay
c52561d044
Rework broadcasting setup to decrease binary size. (#5227)
* Rework broadcasting setup to decrease binary size. Push all the type-specific code down and separate out the broadcasting/parallelization.

Reductions:
element_wise_ops: 521.0 KB -> 268.8 KB
where: 25.8 KB -> 17.3 KB
qlinear_binary_op: 28.1 KB -> 12.8 KB
2020-09-23 14:15:40 +10:00
Changming Sun
43faf9e388
Disable a few tests that run too long(1 hour) in debug mode (#5257) 2020-09-22 21:06:24 -07:00
Tianlei Wu
3bbce69185
bump version to 1.5.1 (#5258) 2020-09-22 20:57:34 -07:00
Jeff Bloomfield
59e69bf35b
Handle missing initializers in allocation planner to fix crashes with DML provider (#5244)
* Fix memory planning bug with DML EP

* Address PR comments

* Fix typo
2020-09-22 19:37:07 -07:00
Ye Wang
898531f502
Fix reshape fusion crash (#5252)
* fix reshape fusion crash

* handling start_node statelessly

* fix
2020-09-22 15:04:13 -07:00
Guoyu Wang
e30530d9ea
Add java API for AddSessionConfigEntry (#5241)
* Add session option config entry API for java

* Java format

* Add extra test verification

* Address PR comments

* Update comments

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-09-22 14:51:39 -07:00
KeDengMS
8dceebda0e
[Training/Python] Add option to enable symbolic shape inference (#5107)
This change adds symbolic shape inference to ORT training, which helps static memory planning for models like BART.
2020-09-22 10:49:07 -07:00
edgchen1
14f250a4d0
Update BUILD.md training dependency info. (#5240)
Update training dependency versions based on Dockerfile.training.
2020-09-22 10:36:04 -07:00
Guoyu Wang
d957dbebea
Fix possible ios build break after update to Xcode 12 (#5246)
* Fix possible ios build break after update to Xcode 12

* Address comments
2020-09-22 07:42:54 -07:00
suffian khan
417929b049 jobs timeout .. 2020-09-21 21:51:59 -07:00
suffian khan
a6eb90472c try fix error on code coverage ci build 2020-09-21 21:51:59 -07:00
Sherlock
1478643215
Place Shape's output in CPU memory (#5245)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-21 20:21:59 -07:00
Sherlock
038192bdb2
Place shape related compute nodes in CPU (#4940)
* Place shape related nodes in CPU
* visit candidates by topological order
* Make CPU node placement a utility function
* skip placing on CPU if the data type is float16 or bfloat16
2020-09-21 17:10:39 -07:00
Changming Sun
0cb09374c6
Update BUILD.md for CUDA versions (#5239) 2020-09-21 15:28:53 -07:00
George Wu
3147bc00c3
update TensorRT docs (#5238)
* doc updates TensorRT

* update

* update

* fix warning

* newline

* format
2020-09-21 15:24:20 -07:00
Xueyun Zhu
55e4b5d302
add pipeline distributed training test (#5222)
* add pipeline distributed training test

* fix max line length error in windows build

* function header indent

* fix

* fix flake8 error
2020-09-21 14:35:01 -07:00
liqunfu
84c222126c
Deprecate testMNISTTrainingAndTestingOpset10 (#4927)
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-21 14:17:08 -07:00
Pranav Sharma
974b9bfc09
Allow sharing of initializers between sessions. (#5092)
* Allow sharing of initializers between sessions.

* Allow sharing of initializers between sessions (2).

* Add test for C#

* Add test for C#; address PR comments

* Address PR comments
Moved AddInitializer logic to internal session options
Added tests for owned buffer
Clarified documentation
Fix bug where memory info and not device was getting compared

* Fix test

* Fix training build

* Add ver 5 end marker and ver 6 starter, add scenario and usage examples.
2020-09-21 14:09:37 -07:00
Scott McKay
e0719a1073
Revert to using release SafeInt repo now that it supports a build with exceptions disabled. (#5233) 2020-09-22 06:29:28 +10:00
edgchen1
e9671e93f0
Fix TransposeScaleMatMul and MatMulScaleFusion issues (#5230)
- Rename TransposeScaleMatMul back to TransposeMatMul for backwards compatibility
- Fix MatMulScaleFusion issues:
  - Add check for supported execution providers
  - Add check for supported MatMul input types
2020-09-21 12:34:01 -07:00
Ye Wang
65740deb10
Fix a bug in EmbedLayerNorm fusion (#5150)
* fix embedlayernorm bug

* review comments

* interim checkin

* review comments

* Fix core dump in MacOS

* remove unnecessary lines

* update document

* Update graph_utils.cc

* Update onnx_exporter.py

* resolve comments
2020-09-21 12:26:14 -07:00
stevenlix
aefb2cc49b
Create profile for all dynamic shape input tensors (#5229) 2020-09-20 05:55:21 -07:00
Tiago Koji Castro Shibata
cd663d58f5
Fix WinML warnings (#5228) 2020-09-19 12:41:42 -07:00
Guoyu Wang
78a29aebbc
[ORT Mobile] ORT Minimal E2E CI (#5200)
* Modify the ort minimal CI to ort minimal e2e ci
2020-09-19 18:43:22 +10:00
Dmitri Smirnov
8ee4e8226e
Preserve relative order of the results and the tests. (#5225) 2020-09-19 00:45:44 -07:00
Weixing Zhang
b49f6a5e2c
using GPU_WARP_SIZE to make kernel portable between AMD and Nvidia GPU (#5173) 2020-09-18 14:56:16 -07:00
Suffian Khan
84589c7e05
Fuse softmax(a + b) in case of simple broadcast (#4937)
* bias softmax kernel

* bias softmax kernel

* remove debug comments

* remove debug comment

* windows build doesn't handle unary minus on unsigned type

* int64 => int treated as error

* only support cuda

* add bias softmax fusion tests

* PR comments

* more PR comments

* use MLTypeCallDispatcher

* break function into pieces

* add loop unroll and add to list for inference as well

* use std::min and move operator==

* revert std::min (doesn't work in ci pipeline) and fix int to size_t error

* pr comments

* fixes for windows ci

* fix for windows ci

* pr comments on consistency

* p_model_

* fix formatting and add anonymous namespace

Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-18 14:15:55 -07:00
Tang, Cheng
e0b49844e9
Provide option to let layernorm stash mean/var as fp32 or bfloat16 (#5215)
* add option to set layernorm stash type

* bug fix

* fix merge error

* fix win build error
2020-09-18 13:42:01 -07:00
Dmitri Smirnov
a90ab12589
Refactor onnx_test_runner (#5169)
Refactor onnx_test_runner for better object ownership, code readability and maintainability.
2020-09-18 13:19:35 -07:00
Ryan Hill
13318ab0d4
Remove invalid install line (#5219) 2020-09-18 11:58:40 -07:00
Shucai Xiao
a632dd2d3b
Amdmigraphx improvements (#5158)
* code backup

* remove unnecessary log info

* code backup

* code backup

* merge changes from master branch

* code backup

* code backup

* merge changes from master branch

* code backup

* code backup for constant folding enhancement

* code backup

* include more scenarios for constant folding

* code backup

* remove unnecessary code

* remove unnecessary log information

* fix an error in comments

* update algorithm to do graph partition

* code backup

* remove unnecessary log information

* remove an unused function

* remove unnecessary changes
2020-09-18 11:56:50 -07:00
Weixing Zhang
f91248e0cc
remove curand_generator_ related code since it is not used. (#5220) 2020-09-18 11:50:35 -07:00
KeDengMS
ce3b67e0cd
[Python] Move symbolic_shape_infer from nuphar to tools (#5162)
* [Python] Move symbolic shape inference from nuphar to tools

* Fix PEP8 ERROR
2020-09-18 09:31:06 -07:00
RRRachelllll555
f7c1e51810
Remove shape inference and fix save large model(>2g) issue (#5210)
* remove shape inference and fix save large model problem

* remove unnecessary import

* refine code and add external format for quantize_qat

* remove initializers in tensors_to_calibrate

* small refine

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-18 08:46:31 -07:00
Scott McKay
c46a480306
Update conversion script and process to simplify creating ORT format models and a minimal build (#5217)
* Update conversion script and process to simplify creating ORT format models and a minimal build.
2020-09-18 18:49:54 +10:00
George Wu
1b61dfaf69
fix _WIN32 (#5218) 2020-09-18 00:23:17 -07:00
Pranav Prakash
f5df96256c
Fix order of returned values in quantize_weight_per_channel (#5205)
Must match returned order of `quantize_inputs`
2020-09-17 17:57:46 -07:00
liqunfu
f37e1292a1
--shm-size=1024m to fix nccl shared memory issue (#5214)
* --shm-size=256m to fix nccl shared memory issue

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-17 17:21:47 -07:00
Guoyu Wang
8156e0dd10
[ORT Mobile] Some updates to iOS/Android build settings (#5184)
* Update android CI and build settings

* add build_java to arm64 also

* Add ios signing param

* fix a small build warning

* address pr comments
2020-09-17 15:53:14 -07:00
Tracy Sharpe
8698157112
NCHWc optimizer fixes for quantized models (#5203)
This updates the NCHWc transformer to not interfere with quantized convolution models, based on observations from internal models. The tensor type for MaxPool must be float. The input to GlobalAveragePool/GlobalMaxPool must be in NCHWc format.
2020-09-17 09:52:21 -07:00
Pranav Sharma
d535894297
Add API to allow configuration of the global thread pools. (#5199) 2020-09-17 09:19:18 -07:00
Suffian Khan
e01e0b2e40
Fix softmax_warp_backward math when is_log_softmax = True and register LogSoftmax CUDA kernel (#5160)
* register logsoftmax cuda kernel; fix logsoftmaxgrad cuda kernel; fix tests to invoke dispatch_softmax_*

* forgot to remove axis check

* add tests all axis

Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-17 07:15:25 -07:00
S. Manohar Karlapalem
584638e5d3
Corrects doc typos and formatting (#5201) 2020-09-17 01:25:19 -07:00
Zhang Lei
cd0386b649
MaxPool versioning in quantization tools. (#5194)
MaxPool versioning in quantization tools.
2020-09-16 22:52:24 -07:00
Ryan Hill
b11c106346
Remove almost all of the reinterpret_casts from the provider shared API (#5190) 2020-09-16 17:00:15 -07:00