3604 commits

Author SHA1 Message Date
Sherlock
694a4d6413
Add more logging for GradientBuilder (#5556)
* Add more logging for GradientBuilder

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-26 15:15:52 -07:00
edgchen1
68fe722691
GatherGrad optimization (#5524)
The existing implementation of the GatherGrad CUDA kernel does not do work in a very parallel manner for certain inputs which can lead to poor performance.

The computation essentially involves multiple summations. The values are gathered from the input and the sums are scattered to the output.

Previously, each sum was computed by a single thread. If there is an instance of a summation of a large number of values, it can significantly impact the overall kernel execution time.

The updated version has an alternate implementation which splits the sums into partial sums which get accumulated together later. This allows for more parallelism. A significant downside is that the alternate implementation requires CPU and GPU synchronization because intermediate GPU results are required by the CPU computation. The original implementation outperformed the alternate for certain inputs (e.g., where the maximum number of values in a sum was not large), so the updated version chooses between them based on the input. The input analysis has some overhead.

The implementation was adapted from PyTorch (b186831c08/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cu).
2020-10-26 12:53:53 -07:00
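The partial-sum idea described in the GatherGrad optimization commit (#5524) above can be illustrated with a small pure-Python sketch. This is not the CUDA kernel from the commit: the function names, the `segment_size` parameter, and the axis-0 gather layout are assumptions made for illustration only.

```python
from collections import defaultdict


def gather_grad_direct(indices, grad_out, num_rows):
    """One running sum per target row (mirrors the 'one thread per sum' idea)."""
    width = len(grad_out[0]) if grad_out else 0
    grad_in = [[0.0] * width for _ in range(num_rows)]
    for idx, row in zip(indices, grad_out):
        for j, value in enumerate(row):
            grad_in[idx][j] += value
    return grad_in


def gather_grad_partial_sums(indices, grad_out, num_rows, segment_size=4):
    """Split each per-row sum into bounded partial sums, then combine them."""
    width = len(grad_out[0]) if grad_out else 0

    # Group the positions of grad_out by the row they accumulate into.
    groups = defaultdict(list)
    for pos, idx in enumerate(indices):
        groups[idx].append(pos)

    # Phase 1: every segment holds at most segment_size values and yields one
    # partial sum. Segments are independent, so they could run in parallel.
    partials = []
    for idx, positions in groups.items():
        for start in range(0, len(positions), segment_size):
            segment = positions[start:start + segment_size]
            partial = [sum(grad_out[p][j] for p in segment) for j in range(width)]
            partials.append((idx, partial))

    # Phase 2: accumulate the partial sums into the output rows.
    grad_in = [[0.0] * width for _ in range(num_rows)]
    for idx, partial in partials:
        for j in range(width):
            grad_in[idx][j] += partial[j]
    return grad_in
```

Both functions return the same gradient. The second bounds the work per unit to `segment_size` values, which is the property that helps when one index is gathered many times.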
Sergii Dymchenko
8224718f8f
Enable CommonSubexpressionElimination in training. (#5504)
* Add test for CommonSubexpressionElimination in training.

* Enable CommonSubexpressionElimination in training.

* Add CommonSubexpressionEliminationApplyOnce for training.
2020-10-26 11:25:15 -07:00
Hariharan Seshadri
44773c60e3
Add a CUDA based IOBinding test (#5572)
2020-10-26 10:57:36 -07:00
Xavier Dupré
f4cee22b9b
Handle -inf in ReduceSumLogExp, fix regression introduced in PR #5370 (#5583)
* Handle -inf in ReduceSumLogExp operator
* Update reduction_ops_test.cc
* Remove a case which has a different behaviour CPU/GPU
2020-10-26 09:58:02 +01:00
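The -inf handling mentioned in commit #5583 above presumably concerns the usual max-shifted log-sum-exp reduction, where an all -inf input would otherwise evaluate (-inf) - (-inf) and produce NaN. That reading is an assumption, since the commit message does not give details. The sketch below shows the guard in plain Python; `log_sum_exp` is an illustrative name, not an ONNX Runtime function.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list of floats."""
    m = max(values)
    if math.isinf(m):
        # All inputs are -inf (result -inf) or some input is +inf (result +inf).
        # Returning early avoids computing inf - inf, which would give NaN.
        return m
    # Shift by the maximum so that the largest exponent is exp(0) == 1.
    return m + math.log(sum(math.exp(v - m) for v in values))
```

A -inf entry mixed with finite values needs no special case: `math.exp(-inf)` is `0.0`, so it simply contributes nothing to the sum.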
Tracy Sharpe
502f67ba58
MLAS: implement u8x8 GEMM for aarch32 (#5580)
2020-10-25 23:05:12 -07:00
Andrew McDowell
b2da700e4d
Allow Upper case letters in RHS of einsum equations. (#5569)
Co-authored-by: Andrew McDowell <andrew@neva-labs.com>
2020-10-25 18:11:12 -07:00
Ye Wang
51af108af5
Support older version of slice in reshape fusion (#5574)
* support older version of slice in reshape fusion

* fix

* review partial comments

* add test

* add gen file
2020-10-24 14:48:18 -07:00
Du Li
860cb22260
Bug fix for C API (#5520)
* remove if_def from C api

* Fix CI issues.

* revert change for symbols.txt
2020-10-24 13:37:58 -07:00
Pranav Sharma
3f3b202e36
Optimize GatherElements further, add threshold for parallelizing Scaler. (#5579)
* Optimize GatherElements more.

* Optimize GatherElements further, add threshold for parallelizing Scaler.

* Add basic tests to exercise the parallel path
2020-10-24 12:38:31 -07:00
Guoyu Wang
3f06286154
Add Flatten support for NNAPI (#5545)
* Add flatten support for NNAPI, correct some typos in NNAPI code files

* Address review comments

* Update CanSkipReshape

* Add test to verify that NNAPI is actually running for a supported model

* Add reshape/flatten tests for NNAPI

* Add one extra verbose log for skipping reshape

* Fix Android CI failure

* Correct test file name to fix Android CI failure
2020-10-22 18:15:53 -07:00
ytaous
7da5949279
NVTX label change (#5562)
* label change

* more info on label

Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-22 10:34:20 -07:00
Andrews548
20bc83400b
ACL/ArmNN update (#5515)
* Build ACL and ArmNN with custom library path

* Define import to tensor as a separate function for maintenance and readability

* Enabled optimized depthwise convolution for ACL v20.02

* Check operation status for ACL and ArmNN Execution Providers

* Enabled fused operation for convolution-activation

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-10-22 09:29:44 -07:00
Ryan Lai
98538580c8
give more tolerance to DirectML runs (#5564)
2020-10-21 23:14:51 -07:00
Tianlei Wu
1f304fbee7
Attention with past and no unidirectional mask (#5557)
* Update fusions to support shared node, and mask of all ones
2020-10-21 20:12:02 -07:00
ashbhandare
0a9b83a313
Add zero test (#5476)
2020-10-21 17:12:00 -07:00
Scott McKay
6d35be215f
Add --skip_tests to example command line as the included ops are being reduced. (#5554)
2020-10-22 08:55:42 +10:00
RandySheriffH
d220c9f950
Resolve crash in MatMul optimization (#5551)
* check pointer before referencing

* add test case

* switch to ASSERT_EQ
2020-10-21 13:18:19 -07:00
Changming Sun
5802fe1699
Remove MKLML build config (#5559)
Remove MKLML build config
2020-10-21 13:11:25 -07:00
Ryan Hill
82c7a9756e
Fix shared provider unload crash (#5553)
2020-10-21 13:01:21 -07:00
Hariharan Seshadri
4291c57322
[C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481)
2020-10-21 10:32:13 -07:00
Ashwini Khade
df22611026
Update ONNX commit (#5487)
* update ONNX

* update onnx + register kernels for reduction ops

* bug fix kernel reg

* update cgmanifests

* revert unsqueeze op 13 registration

* filter ops which are not implemented yet

* filter some tests

* update onnx commit to include conv transpose bug fix

* update docker images

* undo not required test changes

* fix test failures
2020-10-21 07:22:20 -07:00
Vincent Wang
b48f596a91
GatherElementsGrad CPU Kernel and TopKGrad CPU/CUDA Kernel (#5511)
* TopKGrad CPU kernel

* use Scatter for GatherElementsGrad and TopKGrad.

* rollback convgrad change.

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-10-21 09:29:29 +08:00
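Commit #5511 above says Scatter is used for GatherElementsGrad and TopKGrad. The relationship can be sketched in plain Python for the 2-D, axis-1 case: the gradient of a gather is a scatter of the incoming gradient back to the gathered positions, accumulating where an index repeats. The function names and the fixed axis are assumptions for illustration; this is not the ONNX Runtime kernel.

```python
def gather_elements_axis1(data, indices):
    """out[i][j] = data[i][indices[i][j]] for 2-D inputs."""
    return [[data[i][k] for k in row] for i, row in enumerate(indices)]


def gather_elements_grad_axis1(grad_out, indices, num_cols):
    """Scatter-add grad_out back to the positions the forward pass read from."""
    grad_in = [[0.0] * num_cols for _ in indices]
    for i, row in enumerate(indices):
        for j, k in enumerate(row):
            # Repeated indices accumulate, because the forward pass read
            # data[i][k] once for every occurrence of k in the row.
            grad_in[i][k] += grad_out[i][j]
    return grad_in
```

For TopK the indices within a row are distinct, so the accumulation reduces to a plain scatter.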
Yufeng Li
6c2162e97a
Fix quantization of Conv1D with bias (#5491)
* Fix reshape for Conv with bias
2020-10-20 15:27:26 -07:00
Pranav Sharma
1038f9cc8b
Optimize GatherElements and Scaler. (#5543)
* Optimize GatherElements and Scaler.

* Address PR comments

* Fix build
2020-10-20 10:36:20 -07:00
edgchen1
2f4fc83231
Add NVTX profiling range around kernel computation. (#5542)
2020-10-20 09:58:58 -07:00
Tracy Sharpe
45483dcf1f
Add QLinearConv for activations=u8, weights=s8 (#5510)
2020-10-20 08:45:13 -07:00
Changming Sun
280cdf31f5
Revert "Fix shared provider unload crash (#5523)" (#5547)
This reverts commit 610676293e because the Linux DNNL pipeline is failing.
2020-10-20 08:01:28 -07:00
Xavier Dupré
66c8a441e0
Improves ReduceSum performance by removing transposition. (#5370)
* Improves ReduceSum performance
* Add min, max, L1, L2, logsum, sumsquare
* remove all reduce implementation including transpose
2020-10-20 10:36:31 +02:00
Scott McKay
682898ae2b
Add #include for std::tolower. Fixes VS2017 build error. (#5544)
2020-10-20 18:00:57 +10:00
ytaous
67968441e0
GatherND - add Cuda support for int64 on opset 12 (#5531)
* support for int64

* per comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-10-19 21:48:00 -07:00
Ryan Hill
610676293e
Fix shared provider unload crash (#5523)
* Change shared providers so that they are shutdown before shared library unload
* Move UnloadSharedProviders declaration into a shared header to avoid bugs.
2020-10-19 18:08:38 -07:00
Hariharan Seshadri
4b29423656
Re-enable custom op shared library test for debug builds (#5475)
2020-10-19 17:14:31 -07:00
Juliana Franco
0298b9734e
Save in EndTraining only if in last rank (#5500)
* Only save partition of graph with loss (during EndTraining)

* fix comments

Co-authored-by: Juliana <jufranc@microsoft.com>
2020-10-19 14:16:48 -07:00
Scott McKay
a3d2bc36be
Fix script name in doco (#5530)
2020-10-20 06:42:53 +10:00
Thien Bui
6ad70d7371
[Doc] ONNX_Runtime_Server_Usage fix proto uri (#5345)
The predict proto should be `../server/protobuf/prediction_service.proto` instead of `../onnxruntime/server/protobuf/prediction_service.proto`
2020-10-19 13:30:58 -07:00
Olivia Jain
1e4b259d28
Updating EP docs with Onnxruntime API calls (#5503)
* updating examples with current api calls

* Fixing capitalization in api calls, adding RKNPU update

* Correcting nuphar and rknpu ep api calls

* Include creating session in readme
2020-10-19 12:21:21 -07:00
Derek Murray
0b59004666
Add fallback function implementation for DivGrad (#5518)
* Add fallback function implementation for DivGrad.

* Add shape inference for DivGrad.

* Add missing argument.

Co-authored-by: Derek Murray <demurra@microsoft.com>
2020-10-19 10:47:47 -07:00
Tracy Sharpe
a355281b99
Add alternate IsSupportedOptypeVersionAndDomain signature (#5529)
Add a variant of graph_utils::IsSupportedOptypeVersionAndDomain that takes const char* instead of std::string.
2020-10-18 18:14:06 -07:00
KeDengMS
e1a54c4090
Symbolic shape inference: fix a bug in shape merge (#5519)
* Symbolic shape inference: fix a bug in shape merge

OpType Where:
input0: ['mt_src_tokens_batch', 1, 1, 'mt_src_tokens_len']
input1: []
input2: ['mt_prev_output_tokens_batch', 12, 'mt_prev_output_tokens_len', 'floor(mt_src_tokens_batch*mt_src_tokens_len/mt_prev_output_tokens_batch)'] 1
output: [None, 12, 'mt_prev_output_tokens_len', None]

* Undo unintended TRT change
2020-10-16 17:54:57 -07:00
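The Where example listed in commit #5519 above involves merging broadcast shapes whose dimensions may be symbolic. A minimal sketch of one possible merge rule follows, under these assumptions: shapes are right-aligned, a dimension of 1 broadcasts, and two different non-1 dimensions cannot be reconciled and become `None`. The function name and the rule are illustrative and are not taken from the symbolic shape inference script itself.

```python
def broadcast_merge(*shapes):
    """Merge right-aligned shapes; dims may be ints, symbolic strings, or None."""
    rank = max((len(s) for s in shapes), default=0)
    # Left-pad shorter shapes with 1 so that all shapes have the same rank.
    padded = [[1] * (rank - len(s)) + list(s) for s in shapes]

    merged = []
    for dims in zip(*padded):
        # Collect the distinct dims that are not the broadcastable value 1.
        candidates = []
        for d in dims:
            if d != 1 and d not in candidates:
                candidates.append(d)
        if not candidates:
            merged.append(1)
        elif len(candidates) == 1:
            merged.append(candidates[0])
        else:
            # Conflicting symbols (or sizes) cannot be resolved here.
            merged.append(None)
    return merged
```

Applied to the three input shapes listed in the commit message, this rule yields `[None, 12, 'mt_prev_output_tokens_len', None]`. A real implementation would likely raise an error when two different integer sizes meet, rather than returning `None`.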
Sergii Dymchenko
eda9fd566e
Update tar-stream and prebuild-install versions (#5479)
* Update tar-stream and prebuild-install versions

Update the versions because of Component Governance alerts.

* Update package-lock.json
2020-10-16 12:18:49 -07:00
Scott McKay
ad94a1dd6d
Add opset 13 registrations for Identity, IsNaN, NonZero, GatherND and Pad (#5513)
2020-10-16 09:39:03 -07:00
Ryan Lai
f207f0bf5e
Add WinML Model testing (#5417)
* Model test start with float

* Clean up code and add environment variable detection

* Move into namespace

* PR comments

* Fix linker errors in latest merge to master and also fix warning

* add skipping model test mechanism

* Return std::string instead of writing to buffer

* Address case where env variable is larger than max_path

* use const static string for test reason

* Disable x86 tests and don't build if ort memory checker is enabled

* Add comment

* Add additional failing x86 tests and ifdef for checking for x86 build

* PR comments
2020-10-15 19:04:12 -07:00
Guoyu Wang
b991ee4c69
Cleanup NNAPI code (#5505)
* Cleanup NNAPI code

* Check return of GetNCHWInput
2020-10-15 17:40:10 -07:00
Derek Murray
6f65e2ad2c
Mark the dX and dB outputs of ConvGrad as OpSchema::Optional. (#5462)
* Mark the dB output of ConvGrad as OpSchema::Optional.

* Also mark dX as optional

Co-authored-by: Derek Murray <demurra@microsoft.com>
2020-10-15 16:54:17 -07:00
Derek Murray
64f6d856e4
Add FlattenGrad and test. (#5461)
Co-authored-by: Derek Murray <demurra@microsoft.com>
2020-10-15 16:11:57 -07:00
Derek Murray
88f6523baf
Add type inference for BroadcastGradientArgs (#5501)
* Add type inference for BroadcastGradientArgs

This change enables the ONNX shape and type inference to work on a function body containing a BroadcastGradientArgs op. Without this change, the dummy inference function is used, and no types are inferred for the output here:

531e6dd459/onnx/shape_inference/implementation.cc (L467-L469)

* Handle optional outputs.
2020-10-15 16:11:24 -07:00
Scott McKay
7da7e07909
Cleanup some test infrastructure (#5484)
* Create a shared version of the InferenceSession wrapper class and update relevant tests to use it.
Include domain in the ops counting helper so it's more general and we don't need to duplicate it in the nchwc tests. Update tests to include domain in key being checked.

* Fix some training tests

* Fix prefixing of contrib op names in test
2020-10-16 06:44:01 +10:00
Sunghoon
645d978589
Sunghcho/denormals (#5391)
* Add session option and global thread pool option to set denormal as zero.

* Revert unnecessary changes.

* Add cpuinfo submodule

* Add more comments

* Remove cpuinfo submodule dependency and check only SSE3 support for FTZ and DAZ, inspired by TensorFlow

* Preserve API order in C api

* Clean up and utilize SSE3 detection logic from existing cpuid_info.h

* Keep the same order with header file

* Fix build issue with Linux pipeline, which has old g++ compiler

* Fix broken build on Linux and remove a duplicated unit test

* Remove reformatting at eigen thread pool

* Remove flatbuffers which is not intentionally added

* Revert "Remove flatbuffers which is not intentionally added"

This reverts commit 9f509a9aaaa3c7832d88854c82fd26b234770b7f.

* Remove flatbuffers which is not intentionally added

* Resolve comments
  - Put details on APIs
  - Add a log for ftz/daz initialization
  - Add clang
  - Fix typo

* Remove unnecessary header include

* Resolve comments
2020-10-15 12:47:42 -07:00
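Commit #5391 above adds options to treat denormal numbers as zero. The numeric effect can be emulated in plain Python on double-precision values, as sketched below. The option in the commit works through the CPU's FTZ/DAZ modes according to its message; this sketch does not touch any hardware mode, and `is_denormal` and `flush_to_zero` are hypothetical helper names.

```python
import math
import sys

# Smallest positive normal double; anything nonzero below it is denormal.
SMALLEST_NORMAL = sys.float_info.min


def is_denormal(x):
    """True for nonzero values too small to be represented as normal doubles."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def flush_to_zero(x):
    """Replace a denormal with a zero of the same sign; pass others through."""
    return math.copysign(0.0, x) if is_denormal(x) else x
```

For example, `flush_to_zero(5e-324)` gives `0.0`, while `flush_to_zero(sys.float_info.min)` is returned unchanged because it is the smallest normal value.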
Guoyu Wang
915d475353
Android CI update (#5474)
* Update Android CI

* update comments
2020-10-14 16:56:50 -07:00