11997 commits

Author SHA1 Message Date
Derek Murray
3e48ffd21c
Move AutoPadType to common.h (#4474)
Extracting some common code related to "AutoPadType" from the cpu execution provider into "common.h".

Motivation and Context
* Sharing code with authors of other execution providers that need the same functionality.
* I didn't modify the code in shared_library or dnnl EP to avoid changing their dependency structure, so there is still a redundant copy of the AutoPadType code in there.
2020-07-10 16:40:32 -07:00
Tianlei Wu
e96a829e84
Handle multiple embed nodes in transformer optimizer (#4471)
Handle model with multiple embed nodes:
* update embed layer norm fusion in onnxruntime
* Fix temp model path in optimizer
* Add unit test for model with multiple embed nodes.
* Add unit test for gpt2 fusion with past state and mask
* Add unit test for change input to int32
2020-07-10 15:28:27 -07:00
Ashwini Khade
6a9a9a35be
fix crashes caused by test runner (#4475)
* Fix crashes in test runner

* plus some fixes

* changes per review
2020-07-10 14:04:22 -07:00
Hariharan Seshadri
26ebcfab88
Fix Nuget GPU pipeline (#4462) 2020-07-10 14:02:28 -07:00
gwang-msft
9b4c54bcef
Enable onnxruntime_test_all for NNAPI EP (#4476) 2020-07-10 13:34:44 -07:00
edgchen1
6c7da5e9d3
Optimize CUDA Sum op kernel and refactor CUDA elementwise variadic input op kernels (#4418)
For the special case where all variadic inputs of a kernel are the same shape (i.e. no broadcasting is required) and there are few enough of them, we perform the entire computation in a single kernel. The general implementation (which was previously used for this special case) handles broadcasting by repeatedly invoking a binary kernel on successive inputs.
2020-07-10 10:20:23 -07:00
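Note on #4418: the two code paths described in that commit message can be illustrated with a small host-side sketch. This is plain Python over flat lists, not the CUDA kernel itself. The function names, the `max_fused_inputs` threshold, and the restriction of broadcasting to length-1 inputs are all assumptions made for illustration.

```python
from functools import reduce


def binary_add(a, b):
    """Add two flat lists, broadcasting a length-1 operand (simplified broadcasting)."""
    n = max(len(a), len(b))
    a = a * n if len(a) == 1 else a
    b = b * n if len(b) == 1 else b
    return [x + y for x, y in zip(a, b)]


def sum_pairwise(inputs):
    """General path: repeatedly apply the binary op to successive inputs."""
    return reduce(binary_add, inputs)


def sum_fused(inputs):
    """Special case: every input has the same length, so each output
    element is produced in a single pass over all inputs."""
    n = len(inputs[0])
    return [sum(t[i] for t in inputs) for i in range(n)]


def variadic_sum(inputs, max_fused_inputs=8):
    # max_fused_inputs is a hypothetical limit standing in for
    # "few enough of them" in the commit message.
    same_shape = all(len(t) == len(inputs[0]) for t in inputs)
    if same_shape and len(inputs) <= max_fused_inputs:
        return sum_fused(inputs)
    return sum_pairwise(inputs)
```

The fused path reads every input once per output element, while the pairwise path makes one full pass per additional input and materializes an intermediate result each time.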
Prabhat
04586fc09d
Fix segmentation fault caused by invalid tensor type (#4467)
* Fix segmentation fault caused by invalid tensor type

* Addressed review comment
2020-07-10 11:23:12 +01:00
Zhang Lei
ccbf49e59f
Fix avx2 load 32 bytes buffer overrun. (#4455)
* Fix avx2 load 32 bytes buffer overrun.

* Fix qladd buffer overrun for sse2 code.

* Fix QLinearAdd buffer overrun for arm.

* Add mlas test for qladd to cover overrun and more.

* Change API to save binary space. Add more test in mlas to cover different zeropoints.
2020-07-09 15:54:31 -07:00
Yufeng Li
d4db83858b
Only quantize gather with initializer (#4469) 2020-07-09 13:33:43 -07:00
Yulong Wang
bec18eb3f4
[Node.js binding] support CentOS 7 in CI (#4447) 2020-07-09 00:59:50 -07:00
Josh Bradley
ca5af9d622
Add modern C++ standards for Ort::Value (#4367)
* add modern standards to function arguments

* code cleanup

* fix code formatting

* add element access convenience function

* change template type name to match rest of code

* remove new At() convenience function

* add better documentation message
2020-07-09 00:35:41 -07:00
Vincent Wang
7fb194d03d
Update convergence baseline for ci_test. (#4465)
Co-authored-by: Vincent Wang <weicwang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-07-09 15:29:36 +08:00
Josh Bradley
3effac2990
Experimental C++ API examples (#4358)
* Add examples

* fix build instructions for linux users

* fix header include

* update documentation
2020-07-08 23:17:50 -07:00
Yufeng Li
5dc7339be6
Add quantization tool to python package (#4458)
* Add quantization tool to python package
2020-07-08 21:42:53 -07:00
edgchen1
0ca4f7eb30
Update Git submodule cgmanifests. (#4461) 2020-07-08 19:24:03 -07:00
George Wu
f24d8e4587
fix build break from PR#2850 api change (#4451) 2020-07-08 17:02:12 -07:00
Tianlei Wu
cb5c4292b8
GPT-2 Attention Fusion without input mask (#4456)
* Allow input mask to be optional
* Add test for model without input mask and past state.
2020-07-08 15:59:57 -07:00
Wei-Sheng Chin
5222b2c6c0
Remove code which is not thread-safe. (#4454)
Because of async access to the memory logger when using the parallel executor,
ORT sometimes crashes.
2020-07-08 14:27:56 -07:00
Tianlei Wu
05757b4c3c
Transformer benchmark: add option to use raw attention mask (#4446)
* Update benchmark and optimizer to add an option to use raw attention mask
* Remove temporary model in optimizer
2020-07-08 12:34:41 -07:00
Tixxx
b156ae4448
Support training_mode flag in eval (#4324)
* add training_mode feed for evaluation to support opset12
2020-07-08 10:38:54 -07:00
Negin Raoof
71aec2adcb
Custom op export test template (#4383)
* Adding pytorch custom op export tests to CI

* Test clean build

* Fix export for intended failure

* update export script

* Build onnxruntime
2020-07-08 10:14:56 -07:00
Du Li
063156d98d
IOBinding docs (#4432)
* Adding IOBinding Python docs.

* Adding IOBinding Python docs.

* Addressing PR comments.
2020-07-08 03:48:22 -07:00
Hariharan Seshadri
6d6b6b54a5
Support binding a graph output to a specific device via the Python binding (#4439) 2020-07-07 21:09:37 -07:00
Tracy Sharpe
aa06d308a6
Build new AVX file with /ARCH:AVX (#4442)
Build new file with /ARCH:AVX on Windows to ensure correct vzeroupper behavior.
2020-07-07 12:00:12 -07:00
Tiago Koji Castro Shibata
e62686c36e
Remove use of RTTI in CUDA provider (#4444) 2020-07-07 11:38:09 -07:00
Sheil Kumar
fdb4a3a2e8
Add cppwinrt and cswinrt tests in windowsai nuget pipeline (#4381)
* build e2e cppwinrt tests

* add use nuget task

* make all references to package version prop/target-ified

* remove dupe props/targets reference

* work around project.assets.json error by deleting it

* powershell test invocation

* switch to batch script

* print debug info

* update x86->x64

* stdio.h

* pushd/popd

* add csharp tests

* package.config -> packages.config

* typo

* x86 -> anycpu

* debug is default

* add test path

* update csproj as well

* debug

* really replace all package versions

* debug output

* really use [PackageVersion]

* sleep instead of converting async operation to task and waiting

* dont close software bitmap

* switch to powershell script

* remove binding check

* continue on failure

* continue on error action

* continueOnError and errorActionPreference

* tabbing

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-07-07 09:36:42 -07:00
Yufeng Li
612f52c975
add bias for DynamicQuantizeMatmul (#4440) 2020-07-06 22:31:29 -07:00
Pranav Sharma
1f1384f8a9
Update dependency introduced by fuzzing change. (#4438) 2020-07-06 21:56:40 -07:00
Tianlei Wu
eabf6dc9ee
Add Fusion for GPT Attention with both past state and attention mask (#4437)
Add Fusion for GPT Attention with past state and attention mask
2020-07-06 19:37:37 -07:00
gwang-msft
7baf374939
Change the input to NNAPI EP ModelBuilder from ModelProto to GraphViewer (#4389)
* init version to use graph instead of model_proto for IsOpSupported

* move add to modelbuilder to use graph node

* move the rest of model_builder to use graph instead of modelproto

* remove redundant code

* Clear some redundant code

* merge master and some minor style changes

* move the check for whether an initializer is external to the individual op instead of the whole graph

* Addressed comments

* Change GetType and GetShape to log warning info internally to simplify the caller, remove some redundant onnxruntime namespace qualifiers

* add squeeze op support, some more code style clean up

* fix a bug where duplicate output can be added to a subgraph, some other minor logging changes
2020-07-06 18:44:04 -07:00
EronsJ
632b2896f3
Onnxruntime fuzzing (#4341)
* Add protobuf mutator library as a git submodule

* Added files and instructions to build the protobuf mutator library in CMake

* Added fuzzing flag to build system and added fuzzing dependency library. To run fuzzing test use the flags --fuzz_testing --build_shared_lib --use_full_protobuf --cmake_generator 'Visual Studio 16 2019'

* Added src files and build instructions for the main fuzzing engine

* Removed Random number generation test from inside the engine

* Added license header to files

* Removed all pep8 violations introduced by this change and other E501 violations
2020-07-06 16:34:34 -07:00
Cecilia Liu
ec35a1b514
Remove unused initializer in graph after embed fusion (#4436) 2020-07-06 16:04:02 -07:00
Tracy Sharpe
3ef449816c
MLAS: support prepacking APIs for quantized GEMM (#4433)
Add support for prepacking matrix B for use in the quantized GEMMs.
2020-07-06 15:20:10 -07:00
Ashwini Khade
dd73e8c016
add function initialization back to graph resolve (#4434) 2020-07-06 15:17:27 -07:00
liqunfu
0fdb1e9f60
Liqun/roberta (#4408)
Add GLUE RoBERTa example, fix unused initializer issue at backend. BERT GLUE expected output updated due to graph changes between June 29 and July 1.
2020-07-06 09:19:30 -07:00
Christian Goll
3588484336
use system libnsync (#4377)
* use system libnsync
2020-07-06 07:53:22 -07:00
ISS Build Account
a44eb7dd08
Merge remote-tracking branch 'upstream/master' into DmlDev 2020-07-06 12:32:14 +00:00
KeDengMS
77cf51b13c
Fix symbolic_shape_infer for Resize with roi (#4426)
Should only apply roi when coordinate_transformation_mode == tf_crop_and_resize
2020-07-05 23:37:36 -07:00
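Note on #4426: the rule stated in that commit message can be sketched as a per-axis output-size computation. The formula below follows the ONNX Resize operator description as I understand it for the scales-based case. The function name and default arguments are hypothetical, and this is not the code from symbolic_shape_infer.

```python
import math


def resize_output_dim(input_dim, scale, roi_start=0.0, roi_end=1.0,
                      coordinate_transformation_mode="half_pixel"):
    """Output size of one axis for Resize when `scales` is given.

    roi is expressed as normalized [start, end] coordinates and is only
    taken into account in tf_crop_and_resize mode.
    """
    if coordinate_transformation_mode == "tf_crop_and_resize":
        return math.floor(input_dim * (roi_end - roi_start) * scale)
    # Any other mode ignores roi, even if one was supplied.
    return math.floor(input_dim * scale)
```

For example, an axis of size 10 with scale 2.0 and roi [0.25, 0.75] yields 10 in tf_crop_and_resize mode, but 20 in any other mode because roi is ignored there.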
jornt-xilinx
0d4a65eede
Fix Vitis-AI EP for memory info into IAllocator move (#4404) 2020-07-05 09:00:26 +10:00
pengwa
8bcdefc9c1
Optimize GatherND (#4097)
* Optimize GatherND
* Refine the code, fix a few comments
2020-07-03 19:42:32 +08:00
Weixing Zhang
bd11ab6816
Optimize LayernormGrad (#4156)
* Draft for LayerNorm Optimization

* Modify LayernormGrad kernel based on new backward graph.

* keep two LayernormGrad implementations.

One is implemented based on input X, mean. The other is based on output Y, scale, bias. The first one is enabled by default. The second one can be enabled by --use_invertible_layernorm_grad

* expose use_invertible_layernorm_grad to frontend.

* add fp16 tests.

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-07-02 22:09:30 -07:00
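Note on #4156: the commit message mentions one gradient implementation based on input X and mean, and another based on output Y, scale and bias. A plausible reading is that both recover the same normalized activations, which the sketch below checks numerically in plain Python. The function names and `eps` value are assumptions. The use of an inverse standard deviation alongside the mean is also assumed, and the second path needs every scale element to be non-zero.

```python
import math


def layer_norm(x, scale, bias, eps=1e-5):
    """Forward layer normalization over a flat list of floats."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    y = [(v - mean) * inv_std * s + b for v, s, b in zip(x, scale, bias)]
    return y, mean, inv_std


def x_hat_from_input(x, mean, inv_std):
    """Normalized values recomputed from the saved input and statistics."""
    return [(v - mean) * inv_std for v in x]


def x_hat_from_output(y, scale, bias):
    """Normalized values recovered by inverting the affine step of the output."""
    return [(v - b) / s for v, s, b in zip(y, scale, bias)]
```

Recovering the normalized values from Y means the input X does not have to be kept for the backward pass, which is presumably the motivation for the second variant.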
Weixing Zhang
33e06be4ac
optimize transpose CUDA kernel (#4233)
* optimize transpose

* optimize for the case when the tensor is 3D and the permutation is done in the last two dimensions.

BERT-L throughput is improved ~1.4% from transpose optimization

* fix UT MegatronSelfAttentionPartitionCorrectnessTest

* polish code.

* add test and change tile size to 16x16 for better perf.

* fix UT

* fix test of mask_rcnn

* address code review comments.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-07-02 22:05:32 -07:00
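Note on #4233: the special case named in that commit message (a 3D tensor whose last two dimensions are swapped, processed in 16x16 tiles) can be sketched on the host in plain Python. The function name and nested-list layout are assumptions, and the input is assumed to be non-empty. A CUDA kernel would commonly stage each tile through shared memory, though the commit text does not say how the kernel does it.

```python
def transpose_last_two(tensor, tile=16):
    """Transpose the last two dimensions of a [batch][rows][cols] nested list,
    visiting the matrix in tile x tile blocks."""
    batch = len(tensor)
    rows = len(tensor[0])
    cols = len(tensor[0][0])
    out = [[[None] * rows for _ in range(cols)] for _ in range(batch)]
    for b in range(batch):
        for r0 in range(0, rows, tile):
            for c0 in range(0, cols, tile):
                # Copy one tile; min() clips tiles that overhang the edges.
                for r in range(r0, min(r0 + tile, rows)):
                    for c in range(c0, min(c0 + tile, cols)):
                        out[b][c][r] = tensor[b][r][c]
    return out
```

Working tile by tile keeps both the reads and the writes confined to a small block at a time, which is the usual reason for tiling a transpose.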
edgchen1
dba22b17b4
Update BiasGeluGradDxKernel and tests. (#4400)
For BiasGeluGradDxKernel:
- Implement optimization to first load from global memory into registers as suggested by Weixing.
- Support larger bias sizes which were previously limited by the number of threads per block.
- Address flaky unit test by increasing the error tolerance to the default value.
2020-07-02 18:55:44 -07:00
Tracy Sharpe
93d4964727
Use single OpKernel for u8u8 and u8s8 types (#4414)
Combine kernels for u8u8 and u8s8 types.
2020-07-02 18:23:58 -07:00
Pranav Sharma
4df8a1e240
Use the file size while reading onnx models. Ensure models are loaded using APIs in model.h for consistency. (#4399)
* Use the file size while reading onnx models. Ensure models are loaded using APIs in model.h for consistency.

* Refactor existing GetFileLength in posix.cc and address PR comments.

* Fix linux build - signed/unsigned conversion
2020-07-02 17:30:53 -07:00
Scott McKay
d22f6fddf7
Add ability to specify just the device when using IOBinding for an output (#4386)
* Add ability to specify just the device when using IOBinding for an output. This enables keeping an output on a different device (e.g. GPU) when it has a dynamic size that is not known ahead of graph execution.
2020-07-03 09:26:47 +10:00
Vincent Wang
28e4c0edf5
Keep loss_scale and Whole Loss Subgraph in FP32 during Mixed Precision Training (#4268)
* Keep loss subgraph as FP32 when mixed-p training.

* Fix case where there is no white-list loss op.

* Get nodes from loss_scale instead of whitelist.

* rename const variables.

Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-07-03 06:54:56 +08:00
suffiank
7a05b3ca87
Increase python packaging pipeline timeout (#4412)
* increase python packaging pipeline from 90 to 110 min

* change timeout to Linux GPU and do 120 min to match Win GPU
2020-07-02 15:38:39 -07:00
Yufeng Li
67a7d93b49
Fuse MatMulInteger and scale followed (#4350)
* Fuse MatMulInteger and scale followed

* Add bias
2020-07-02 13:08:21 -07:00
Tiago Koji Castro Shibata
10c25416bb
Remove use of RTTI in CUDA provider (#4410) 2020-07-02 12:44:17 -07:00