Commit graph

4185 commits

Author SHA1 Message Date
Yufeng Li
56e4e47f66
Quantize model with QDQ format (#6541)
* implement qdq format in quant tool
* refactor code
2021-02-08 18:46:07 -08:00
Scott McKay
c02ae61cab
Make kernel hash stable in type reduced build (#6603)
* Add infrastructure so that a kernel definition has the full list of supported types and a list of types enabled in this build. We need to use the full list when calculating the kernel hash so that the hash value in an ORT format model is stable across builds with and without type reduction enabled.
2021-02-09 12:00:08 +10:00
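A minimal sketch of the idea in the commit above, assuming a kernel definition carries both a full supported-type list and a per-build enabled-type list. The names KernelDef, stable_hash, and unstable_hash are hypothetical and are not ONNX Runtime's actual API; the hashing scheme is likewise only illustrative.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class KernelDef:
    """Hypothetical kernel definition with two type lists."""
    op_name: str
    since_version: int
    supported_types: tuple  # every type the kernel could support
    enabled_types: tuple    # types actually compiled into this build


def _digest(parts):
    # Illustrative hash only; the real hash function is not shown in the source.
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


def stable_hash(kernel):
    # Uses the full supported-type list, so type reduction cannot change it.
    return _digest([kernel.op_name, str(kernel.since_version),
                    *sorted(kernel.supported_types)])


def unstable_hash(kernel):
    # Uses the enabled-type list, so the value varies from build to build.
    return _digest([kernel.op_name, str(kernel.since_version),
                    *sorted(kernel.enabled_types)])


all_types = ("float", "int32", "int64")
full_build = KernelDef("Gather", 11, all_types, all_types)
reduced_build = KernelDef("Gather", 11, all_types, ("float",))

print(stable_hash(full_build) == stable_hash(reduced_build))
print(unstable_hash(full_build) == unstable_hash(reduced_build))
```

Hashing only the full list means a model saved by one build can still look up its kernels in a build where type reduction removed some types.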
Cian Hayes
16eed68a1e
Fix layer_norm.cc on x86 (#6556)
* Fix LayerNormGrad on x86

* PR feedback
2021-02-08 17:36:14 -08:00
Scott McKay
13d7db9a98
Don't update the excluded ops/types unless args.update is true. Updating the exclusion info triggers rebuilding of all kernels using type reduction. (#6604) 2021-02-09 07:15:31 +10:00
Scott McKay
0b1e21c638
Fix bug with ORT format serialization of tensor attributes. (#6602)
The model path needs to come from the Node not the potentially nullptr subgraph.
2021-02-09 07:15:21 +10:00
Pranav Sharma
67ef6b1aa6
[Multi-GPU inferencing] Add new API to get/set device id. Set correct device id in cuda allocator. (#6592) 2021-02-08 08:59:18 -08:00
nietras
1dd920fa7c
Fix TensorRT unnecessary file cache operations (#6601)
* Fix TensorRT unnecessary file cache operations

* fix compile
2021-02-07 20:09:30 -08:00
Edward Chen
19c130f561
Reduce CastMLFloat16ThroughFloat size (Scott's suggested changes), fix unused function warning. (#6597) 2021-02-08 07:20:53 +10:00
Scott McKay
190b90a682
Fix some coding conventions issues (#6583)
Fix some coding conventions issues
Use #define for types that Cast supports
2021-02-08 07:11:26 +10:00
Weixing Zhang
c86c21e002
Generate error when an explicit stream argument is not provided in the <<<...>>> kernel launch syntax (#6599)
* Generate error when an explicit stream argument is not provided in the <<<...>>> kernel launch syntax

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-02-06 15:54:29 -08:00
Jesse Benson
d18aa45b46 Enable more ROCM ops that are sharing CUDA code. Some are needed for Turing NLG models. 2021-02-06 14:40:34 -08:00
Adam Pocock
dbe31361bc Fix build.gradle so it always targets Java 8 class files. 2021-02-05 22:26:17 -08:00
Ryan Lai
b57a7f4de3 Delay load dxcore in winml model tests 2021-02-05 21:08:11 -08:00
George Nash
b50b0a89aa Fix build failure when building with --build_wheel on Windows
This resolves issue #6536

Signed-off-by: George Nash <george.nash@intel.com>
2021-02-05 18:59:01 -08:00
Nat Kershaw (MSFT)
af9dfa7a4d
Remove docs that have been migrated to https://onnxruntime.ai/docs (#6225) 2021-02-05 18:09:27 -08:00
Dmitri Smirnov
dda5a62072
Fix updated Doxygen errors. (#6588) 2021-02-05 18:07:03 -08:00
Chun-Wei Chen
115e16b37b
ort_test_utils: skip creating input if it is an initializer (#6544) 2021-02-05 17:34:08 -08:00
Scott McKay
ccfd90291b
Remove condition from ORT_RETURN_IF[_NOT] macro output. (#6563)
Remove condition from ORT_RETURN_IF[_NOT] macro output as repeating the condition doesn't add much value compared to the explicit error message, and the error message includes the file and line anyway so it's easy enough to find the condition if needed.
Update the few places where the macros were used without an explicit error message to provide an explicit error message.

Saves 12.5KB in a minimal MinSizeRel build with all DNN ops, 16KB in full release build.
2021-02-05 17:33:29 -08:00
Changming Sun
b5bd14fc9f
Update GPU packaging pipelines to cuda11 and fix the other build break issues (#6585)
Update gpu packaging pipelines to CUDA11

In the next release we will use CUDA 11. And our CUDA 11 build suddenly became broken because recently CentOS 7 posted an update of glibc. The version of glibc was changed from 2.17-317.el7 to 2.17-322.el7_9. But the newer one isn't compatible with CUDA 11. We have to downgrade it.
2021-02-05 16:58:37 -08:00
Ye Wang
82229c8e61
Support no bias in layernorm and skiplayernorm op (#6554)
* add noBias attribute in layernorm

* skip bias in skiplayernorm

* fix

* fix cuda tests

* add tests

* fix windows build

* fix win build issue

* review comments
2021-02-05 16:48:22 -08:00
Weixing Zhang
299ace0759
Support to allow user to specify compute stream per session (#3723)
* Support to allow user to specify compute stream per session

Create computation cuda stream explicitly rather than use default legacy stream or per-thread default stream.

remove some redundant cudaStreamSynchronize

fix gpt2 model test failures

don't use default stream in nccl either.

add stream synchronization in OnRunEnd()

using cub::DeviceScan::InclusiveSum which can be called with stream specified.

fix topK failure due to latest rebase

fix tensorrt

support user specified stream

add user_stream support in tensorrt EP

use same stream for both tensorrt and CUDA EP.

fix ScatterND

specify stream for adasum and p2p kernels.

fix loop

fix CApiTest.custom_op_handler

fix CApiTest.varied_input_custom_op_handler

change for cudaMemcpyFromSymbol

improve provider options for user specified compute stream

* add changes for ROCM EP

* fix GatherGrad UT for ROCM EP

* clean code and fix NonMaxSuppression

* use default stream for ROCM now

* fix CApiTest.custom_op_handler:OrtFormatCustomOpTests.ConvertOnnxModelToOrt

* fix tensorrt ut: CApiTest.io_binding_cuda

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-02-05 15:48:18 -08:00
sfatimar
973c3917a6
OpenVino add build_shared_lib flag in the build command (#6560)
* Dockerfile changes to add build_shared_lib
2021_1 indentation changes

* csharp shared library

Co-authored-by: sfatimar <sahar.fatima@intel/com>
2021-02-05 12:18:02 -08:00
Guoyu Wang
68193e28de
Let execution fall back to CPU EP if Compile of a partition on current EP fails (#6580)
* Let execution fall back to CPU EP if compile of a partition fails

* Removed debugging logs

* Addressed CR comments
2021-02-05 12:14:55 -08:00
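The fallback behaviour described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ONNX Runtime implementation: CompileError, assign_partitions, flaky_compile, and the "ExampleEP" name are all hypothetical.

```python
class CompileError(Exception):
    """Hypothetical error raised when an EP cannot compile a partition."""


def assign_partitions(partitions, compile_fn, ep_name,
                      fallback_ep="CPUExecutionProvider"):
    """Try to compile each partition on ep_name; on failure use fallback_ep."""
    assignment = {}
    for partition in partitions:
        try:
            compile_fn(partition)
            assignment[partition] = ep_name
        except CompileError:
            # Compilation failed, so this partition runs on the fallback EP
            # instead of failing the whole session.
            assignment[partition] = fallback_ep
    return assignment


def flaky_compile(partition):
    # Hypothetical compile step that rejects one partition.
    if partition == "conv_block":
        raise CompileError(f"cannot compile {partition}")


result = assign_partitions(["matmul_block", "conv_block"],
                           flaky_compile, "ExampleEP")
print(result)
```

Only the partition that fails to compile is moved; partitions that compile successfully stay on the requested execution provider.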
Chun-Wei Chen
f2ce3aae13
add set_model_dir and update ONNX (#6119) 2021-02-05 09:30:49 -08:00
Edward Chen
3b376da37c
Enable type reduction for Gather CPU kernel. (#6579)
* Enable type reduction in Gather.
2021-02-05 17:22:22 +10:00
Scott McKay
c5d2538314
Add more kernels that have typed registrations to the operators we track type usage for. (#6565) 2021-02-05 15:10:54 +10:00
Hariharan Seshadri
f14c621c10
Tile perf enhancements - continued (#6561) 2021-02-04 20:14:27 -08:00
Scott McKay
c49d1dbc4b
Add type reduction support to Slice and Transpose (#6547)
* Add type reduction support to Slice and Transpose
2021-02-05 11:08:23 +10:00
Yulong Wang
89627a8178
[Node.js binding] support NPM v7+ (#6559) 2021-02-04 17:07:06 -08:00
Xavier Dupré
615acf156c
remove keras example from python documentation (#6574) 2021-02-05 01:10:11 +01:00
Prasanth Pulavarthi
4e61e254ec
Update link in readme (#6537) 2021-02-04 15:28:39 -08:00
Jesse Benson
d914e29fe1 Reuse reduction_functions.cu 2021-02-04 15:00:05 -08:00
Jesse Benson
3c44184963 Pick up changes from:
https://github.com/microsoft/onnxruntime/pull/6490
2021-02-04 15:00:05 -08:00
Jesse Benson
a9e4d70b50 Fix merge conflict. 2021-02-04 15:00:05 -08:00
Jesse Benson
76fcebd0a4 Fix scratch buffer early free. 2021-02-04 15:00:05 -08:00
Jesse Benson
86ac11af1a Delete ROCM-specific reduction code that is identical to CUDA reduction code. 2021-02-04 15:00:05 -08:00
Jesse Benson
5d8792705b Code formatting. 2021-02-04 15:00:05 -08:00
Jesse Benson
21a47ec8d9 Disable a couple more unsupported tests. 2021-02-04 15:00:05 -08:00
Jesse Benson
0b147702af Update remaining reduction ops to use MIOpen. double datatype is not supported, so disable those typed kernels. 2021-02-04 15:00:05 -08:00
Jesse Benson
a28ddb85b6 Reduction ops. 2021-02-04 15:00:05 -08:00
Jesse Benson
196132925e Reuse CUDA's reduction_functions.cc 2021-02-04 15:00:05 -08:00
Jesse Benson
4c1db50df5 miopen common 2021-02-04 15:00:05 -08:00
Jesse Benson
554184bcc4 Add reduce template parameters. 2021-02-04 15:00:05 -08:00
Jesse Benson
c4b6559be9 Update reduction_all.cu 2021-02-04 15:00:05 -08:00
Jesse Benson
5fc377f21e Partial updating of ROCM reduction code. 2021-02-04 15:00:05 -08:00
Edward Chen
318b82ca7e
Cast Op performance fix. (#6509)
Update CPU Cast implementation to fix performance regressions.
Update Cast unit tests for more coverage.
2021-02-04 14:52:37 -08:00
Edward Chen
2ef792ae6e
Don't resolve symlink in resolve_executable_path(). (#6540) 2021-02-04 12:32:03 -08:00
Changming Sun
aa31ba5774
Merge CPU packaging pipelines (#6480)
1. Merge Nuget CPU pipeline, Java CPU pipeline, C-API pipeline into a single one.
2. Enable compile warnings for cuda files(*.cu) on Windows.
3. Enable static code analysis for the Windows builds in these jobs. For example, this is our first time scanning the JNI code.
4. Fix some warnings in the training code.
5. Enable code sign for Java. Previously we forgot it.
6. Update TPN.txt to remove Jemalloc.
2021-02-04 08:38:56 -08:00
Guoyu Wang
0d35f0e2c0
[CoreML EP] Add support of Conv operator (#6510)
* [CoreML EP] Add support of Conv operator

* Ignore a corner case setting empty padding

* Add handle autopadding

* Addressed CR comments
2021-02-04 00:30:10 -08:00
Guoyu Wang
6cf54ff296
Switch Android CI java build to JDK 11 (#6552)
* switch to jdk11

* fix java

* Update
2021-02-03 17:49:23 -08:00