Commit graph

4179 commits

Author SHA1 Message Date
nietras
1dd920fa7c
Fix TensorRT unnecessary file cache operations (#6601)
* Fix TensorRT unnecessary file cache operations

* fix compile
2021-02-07 20:09:30 -08:00
Edward Chen
19c130f561
Reduce CastMLFloat16ThroughFloat size (Scott's suggested changes), fix unused function warning. (#6597) 2021-02-08 07:20:53 +10:00
Scott McKay
190b90a682
Fix some coding conventions issues (#6583)
Fix some coding conventions issues
Use #define for types that Cast supports
2021-02-08 07:11:26 +10:00
Weixing Zhang
c86c21e002
Generate error when an explicit stream argument is not provided in the <<<...>>> kernel launch syntax (#6599)
* Generate error when an explicit stream argument is not provided in the <<<...>>> kernel launch syntax

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-02-06 15:54:29 -08:00
Jesse Benson
d18aa45b46 Enable more ROCM ops that are sharing CUDA code. Some are needed for Turing NLG models. 2021-02-06 14:40:34 -08:00
Adam Pocock
dbe31361bc Fix build.gradle so it always targets Java 8 class files. 2021-02-05 22:26:17 -08:00
Ryan Lai
b57a7f4de3 Delay load dxcore in winml model tests 2021-02-05 21:08:11 -08:00
George Nash
b50b0a89aa Fix build failure when building with --build_wheel on Windows
This resolves issue #6536

Signed-off-by: George Nash <george.nash@intel.com>
2021-02-05 18:59:01 -08:00
Nat Kershaw (MSFT)
af9dfa7a4d
Remove docs that have been migrated to https://onnxruntime.ai/docs (#6225) 2021-02-05 18:09:27 -08:00
Dmitri Smirnov
dda5a62072
Fix updated Doxygen errors. (#6588) 2021-02-05 18:07:03 -08:00
Chun-Wei Chen
115e16b37b
ort_test_utils: skip creating input if it is an initializer (#6544) 2021-02-05 17:34:08 -08:00
Scott McKay
ccfd90291b
Remove condition from ORT_RETURN_IF[_NOT] macro output. (#6563)
Remove condition from ORT_RETURN_IF[_NOT] macro output as repeating the condition doesn't add much value compared to the explicit error message, and the error message includes the file and line anyway so it's easy enough to find the condition if needed.
Update the few places where the macros were used without an explicit error message to provide an explicit error message.

Saves 12.5KB in a minimal MinSizeRel build with all DNN ops, 16KB in full release build.
2021-02-05 17:33:29 -08:00
Changming Sun
b5bd14fc9f
Update GPU packaging pipelines to cuda11 and fix the other build break issues (#6585)
Update gpu packaging pipelines to CUDA11

The next release will use CUDA 11. Our CUDA 11 build recently broke because CentOS 7 published a glibc update that changed the version from 2.17-317.el7 to 2.17-322.el7_9, and the newer version isn't compatible with CUDA 11, so we have to downgrade it.
2021-02-05 16:58:37 -08:00
Ye Wang
82229c8e61
Support no bias in layernorm and skiplayernorm op (#6554)
* add noBias attribute in layernorm

* skip bias in skiplayernorm

* fix

* fix cuda tests

* add tests

* fix windows build

* fix win build issue

* review comments
2021-02-05 16:48:22 -08:00
Weixing Zhang
299ace0759
Support to allow user to specify compute stream per session (#3723)
* Support to allow user to specify compute stream per session

Create computation cuda stream explicitly rather than use default legacy stream or per-thread default stream.

remove some redundant cudaStreamSynchronize

fix gpt2 model test failures

don't use default stream in nccl either.

add stream synchronization in OnRunEnd()

using cub::DeviceScan::InclusiveSum which can be called with stream specified.

fix topK failure due to latest rebase

fix tensorrt

support user specified stream

add user_stream support in tensorrt EP

use same stream for both TensorRT and CUDA EP.

fix ScatterND

specify stream for adasum and p2p kernels.

fix loop

fix CApiTest.custom_op_handler

fix CApiTest.varied_input_custom_op_handler

change for cudaMemcpyFromSymbol

improve provider options for user specified compute stream

* add changes for ROCM EP

* fix GatherGrad UT for ROCM EP

* clean code and fix NonMaxSuppression

* use default stream for ROCM now

* fix CApiTest.custom_op_handler:OrtFormatCustomOpTests.ConvertOnnxModelToOrt

* fix tensorrt ut: CApiTest.io_binding_cuda

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-02-05 15:48:18 -08:00
sfatimar
973c3917a6
OpenVino add build_shared_lib flag in the build command (#6560)
* Dockerfile changes to add build_shared_lib
2021_1 indentation changes

* csharp shared library

Co-authored-by: sfatimar <sahar.fatima@intel/com>
2021-02-05 12:18:02 -08:00
Guoyu Wang
68193e28de
Let execution fall back to CPU EP if Compile of a partition on current EP fails (#6580)
* Let execution fall back to CPU EP if compile of a partition fails

* Removed debugging logs

* Addressed CR comments
2021-02-05 12:14:55 -08:00
Chun-Wei Chen
f2ce3aae13
add set_model_dir and update ONNX (#6119) 2021-02-05 09:30:49 -08:00
Edward Chen
3b376da37c
Enable type reduction for Gather CPU kernel. (#6579)
* Enable type reduction in Gather.
2021-02-05 17:22:22 +10:00
Scott McKay
c5d2538314
Add more kernels that have typed registrations to the operators we track type usage for. (#6565) 2021-02-05 15:10:54 +10:00
Hariharan Seshadri
f14c621c10
Tile perf enhancements - continued (#6561) 2021-02-04 20:14:27 -08:00
Scott McKay
c49d1dbc4b
Add type reduction support to Slice and Transpose (#6547)
* Add type reduction support to Slice and Transpose
2021-02-05 11:08:23 +10:00
Yulong Wang
89627a8178
[Node.js binding] support NPM v7+ (#6559) 2021-02-04 17:07:06 -08:00
Xavier Dupré
615acf156c
remove keras example from python documentation (#6574) 2021-02-05 01:10:11 +01:00
Prasanth Pulavarthi
4e61e254ec
Update link in readme (#6537) 2021-02-04 15:28:39 -08:00
Jesse Benson
d914e29fe1 Reuse reduction_functions.cu 2021-02-04 15:00:05 -08:00
Jesse Benson
3c44184963 Pick up changes from:
https://github.com/microsoft/onnxruntime/pull/6490
2021-02-04 15:00:05 -08:00
Jesse Benson
a9e4d70b50 Fix merge conflict. 2021-02-04 15:00:05 -08:00
Jesse Benson
76fcebd0a4 Fix scratch buffer early free. 2021-02-04 15:00:05 -08:00
Jesse Benson
86ac11af1a Delete ROCM-specific reduction code that is identical to CUDA reduction code. 2021-02-04 15:00:05 -08:00
Jesse Benson
5d8792705b Code formatting. 2021-02-04 15:00:05 -08:00
Jesse Benson
21a47ec8d9 Disable a couple more unsupported tests. 2021-02-04 15:00:05 -08:00
Jesse Benson
0b147702af Update remaining reduction ops to use MIOpen. double datatype is not supported, so disable those typed kernels. 2021-02-04 15:00:05 -08:00
Jesse Benson
a28ddb85b6 Reduction ops. 2021-02-04 15:00:05 -08:00
Jesse Benson
196132925e Reuse CUDA's reduction_functions.cc 2021-02-04 15:00:05 -08:00
Jesse Benson
4c1db50df5 miopen common 2021-02-04 15:00:05 -08:00
Jesse Benson
554184bcc4 Add reduce template parameters. 2021-02-04 15:00:05 -08:00
Jesse Benson
c4b6559be9 Update reduction_all.cu 2021-02-04 15:00:05 -08:00
Jesse Benson
5fc377f21e Partial updating of ROCM reduction code. 2021-02-04 15:00:05 -08:00
Edward Chen
318b82ca7e
Cast Op performance fix. (#6509)
Update CPU Cast implementation to fix performance regressions.
Update Cast unit tests for more coverage.
2021-02-04 14:52:37 -08:00
Edward Chen
2ef792ae6e
Don't resolve symlink in resolve_executable_path(). (#6540) 2021-02-04 12:32:03 -08:00
Changming Sun
aa31ba5774
Merge CPU packaging pipelines (#6480)
1. Merge Nuget CPU pipeline, Java CPU pipeline, C-API pipeline into a single one.
2. Enable compile warnings for CUDA files (*.cu) on Windows.
3. Enable static code analysis for the Windows builds in these jobs. For example, this is our first time scanning the JNI code.
4. Fix some warnings in the training code.
5. Enable code signing for Java. Previously we forgot it.
6. Update TPN.txt to remove Jemalloc.
2021-02-04 08:38:56 -08:00
Guoyu Wang
0d35f0e2c0
[CoreML EP] Add support of Conv operator (#6510)
* [CoreML EP] Add support of Conv operator

* Ignore a corner case setting empty padding

* Add handle autopadding

* Addressed CR comments
2021-02-04 00:30:10 -08:00
Guoyu Wang
6cf54ff296
Switch Android CI java build to JDK 11 (#6552)
* switch to jdk11

* fix java

* Update
2021-02-03 17:49:23 -08:00
Ryan Lai
c7feb48083
Don't send out Runtime error telemetry when can't create LearningModelDevice on machine without hardware adapters (#6535)
* Checkpoint 1

* Remove global logruntime error telemetry. This isn't necessary and doesn't contain relevant information

* Make macro simpler

Co-authored-by: Ryan Lai <ryalai96@gamil.com>
2021-02-03 14:27:29 -08:00
Guoyu Wang
464dbef143
[NNAPI EP] add uint8 support for Transpose/Concat/Maxpool, add support of QLinearSigmoid (#6534)
* Init change

* Add QlinearSigmoid support

* Update tests

* Add resize int8 support

* Add version check for resize linear uint8 and add scale/zero point check for concat uint8

* Address CR comments

* minor fix and add test for uint8 handling

* Address CR comments

* Fixed an existing bug

* Fix the new UT break, due to different rounding of 0.5 in device and emulator
2021-02-03 13:45:49 -08:00
Scott McKay
6cb8f8c812
Support disabling a typed kernel registration that uses the output type (#6530)
* Update infrastructure to support disabling a typed kernel registration that uses output 0 for the type (vs. the normal use case of input 0).
2021-02-03 14:22:32 +10:00
Scott McKay
8d53ef69e5
Add type reduction support to Min, Max and Pow (#6519)
* Add type reduction support to Min, Max and Pow
Update the C++ type reduction infrastructure to allow specifying an opset for the supported types list, as those can change across opset versions.
Minor updates to the type usage tracking script
* Add 'all opsets' macros and constant
2021-02-03 06:51:35 +10:00
Thiago Crepaldi
fbb24b57d0
Update code owners for pytorch frontend team (#6329) 2021-02-02 11:09:10 -08:00
ashbhandare
85434273ff
Fix CUDA Reduction kernel for ArgMax/ArgMin for when reduction dim=1 (#6490)
* Fix for when reduction dim=1

* Disable test for AMD GPUs

* Specify Async
2021-02-02 09:50:16 -08:00