Commit graph

6887 commits

Author SHA1 Message Date
Chi Lo
eb41bfb7b5
Fix graph viewer to proto (#11862)
* Add test for case where main const initializer in subgraph

* update test to use trt ep

* add initializer when converting from graph viewer to proto

* add comments

* add comments

* add comments

* only add initializer that is outer scope value

* make including outer scope value optional

* modify python format

* modify python format

* modify python format

* Remove test

* remove redundant argument
2022-06-19 19:28:18 -07:00
sumitsays
52f2b3bf89
[DML EP] Remove suffix removal adhoc logic for fusedNodeArgNames (#11879)
* DML EP: Remove suffix removal hack for fusedNodeArgName

* Acknowledged PR comments

* Removed reference from gsl::span

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-06-17 17:04:16 -07:00
sfatimar
f97bd38c4f
UEP 4.1 release (#11834)
* Add pypi build changes to latest Master

* Add ORT training part of OV build

* Disabling SqueezeOpTest.BadAxes

* Add ONNXruntime branch ARG to Docker build

* Changes to include file details versions

* Commit File Version Updates

* Change naming for linux build

* Add fix for pylint format errors

* Fix pylint warnings.

* Fix pylint errors - stage 2

Signed-off-by: Preetha Veeramalai <preetha.veeramalai@intel.com>

* Fix pylint errors - stage 3

* Fix pylint format - stage4

Signed-off-by: Preetha Veeramalai <preetha.veeramalai@intel.com>

* Commit for Wheel Release >0.35.1

Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com>
Co-authored-by: nmaajidk <n.maajid.khan@intel.com>
2022-06-17 14:49:04 -07:00
Edward Chen
a93fe7824a
Update EP compile API deprecation warning message. (#11808)
Minor wording update to warning message to clarify that the function style Compile API is deprecated now and will be removed soon.
Also updated some code comments.
2022-06-17 12:49:24 -07:00
Yi Zhang
f70201c801
Make sure the command works in both centos and ubuntu. (#11894)
make one bash condition compatible with POSIX
2022-06-17 12:19:22 -07:00
Rachel Guo
1494120423
[NNAPI EP] Unsqueeze op support (#11864)
* wip

* save unsqueeze support

* minor update

* remove unnecessary line

* address pr comments

* add comments
2022-06-17 12:07:18 -07:00
Yi-Hong Lyu
4ac72e305c
NHWC Resize optimization (#11825)
The optimization consists of:

* Use int32_t instead of int64_t
* Use different code path for tf_crop_and_resize or other
  coordinate_transformation_mode to avoid redundant conditions
* Loop-invariant code motion of offset, coefficient and extrapolation_value
  check
* Use fixed point to avoid floating-point computation

Besides, it always transforms NCHW Resize to NHWC because it has higher perf in
the NHWC variant when the input X is 4D int8/uint8 tensor and the mode is
linear on ARM.

It improves DeepLab V3 with int8 quantization by 26%~27% on big core and 37% on
LITTLE core on AArch64. It also improves DeepLab V3 with uint8 quantization by
24%~25% on big core and 34% on LITTLE core on AArch64.

Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
2022-06-17 11:00:36 -07:00
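The "fixed point" item in the commit above can be illustrated with a minimal sketch of integer-only linear interpolation between two uint8 values. This is not the kernel from the PR: the function names and the 10-bit fraction width are assumptions chosen for illustration.

```python
# Hypothetical sketch: blend two uint8 pixels with integer-only arithmetic.
# FRACTION_BITS = 10 is an assumed precision, not a value taken from the PR.
FRACTION_BITS = 10
ONE = 1 << FRACTION_BITS  # the value 1.0 in fixed point


def to_fixed(weight: float) -> int:
    """Convert a weight in [0, 1] to fixed point once, outside the hot loop."""
    return int(round(weight * ONE))


def lerp_fixed(a: int, b: int, w_fixed: int) -> int:
    """Blend a and b using only integer ops; w_fixed is the weight of b."""
    acc = a * (ONE - w_fixed) + b * w_fixed
    # Adding half of ONE before the shift rounds to nearest instead of truncating.
    return (acc + (ONE >> 1)) >> FRACTION_BITS


def lerp_float(a: int, b: int, w: float) -> float:
    """Floating-point reference for comparison."""
    return a * (1.0 - w) + b * w
```

Converting the weight once and reusing it is also a small instance of the loop-invariant code motion the commit mentions: the per-pixel work reduces to two multiplies, an add, and a shift.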
Edward Chen
adcf7e66c8
[NNAPI EP] Pad Op (#11860)
Add basic support for Pad Op in NNAPI EP.
2022-06-17 10:05:31 -07:00
Adrian Lizarraga
ad4abbd75e
[EP-Perf-Dashboard] Add support for TensorRT 8.4 to EP Perf Dashboard (#11876)
Co-authored-by: George Wu <jywu@microsoft.com>
2022-06-17 09:16:51 -07:00
Yi Zhang
8bb0062873
add manylinux_2_27 CPU wheel (#11886)
* add manylinux_2_27

* minor refactor

* change base image

* minor refactor

* add tests

* fix condition
2022-06-17 19:38:38 +08:00
Yi Zhang
d2cbae3a04
Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888)
Revert "Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)"

This reverts commit 2ecba6fd25.
2022-06-17 17:07:21 +08:00
stevenlix
bd65acd08d
Share execution context memory between TensorRT subgraphs (#11859)
* share trt context memory

* update parser to 8.4-EA

* update parser to 8.4-GA

* add context memory sharing enable option

* update parser to 8.2-GA

* fix format issue

* reverse orders

* fix format

* fix format

* fix issues
2022-06-16 22:42:40 -07:00
Changming Sun
10478a09ca
Revert "add manylinux_2_27 wheel (#11832)"
This reverts commit bbace23d0c.
2022-06-16 18:28:12 -07:00
Dmitri Smirnov
2ecba6fd25
Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)
Refactor ExecutionFrame and SessionState for better data locality and less memory allocations.
2022-06-16 16:50:48 -07:00
Dwayne Robinson
3d99f16e98
Merge pull request #11827 from microsoft/user/dwayner/DmlEp1.9
Integrate WindowsAI feature branch with DML EP features and DML 1.9
2022-06-16 13:04:00 -07:00
George Wu
df5ee6aa4e
[TensorRT EP] support TensorRT 8.4 (#11866)
* update trt 8.4ga

* trt 8.4 linux ci pipeline

* fix cmake

* placeholder_builder

* trt 8.4 windows pipeline

* gpu package pipeline

* trt 8.4.1.5 , packaging pipeline updates

* python packaging

* ctest timeout

* python packaging test

* bump timeout

* python format

* format

* revert

* newline

* enable trt python tests

* typo

* python format

* disable on windows
2022-06-16 07:46:40 -07:00
Dwayne Robinson
fe7b8b80ae
Revert BatchNormalization change for now, falling back to CPU on mixed types until a more advanced solution is written
2022-06-15 21:49:18 -07:00
Dwayne Robinson
babd6e3fcd
Update DirectML preview package with unmangled names
2022-06-15 18:16:58 -07:00
Maxiwell S. Garcia
3f8c9146d5
ppc64le: specialize generic 'mlas' functions to use VSX instructions (#11845)
2022-06-15 16:49:38 -07:00
Scott McKay
d64f23fec0
EP factory creation cleanup and enhancements. (#11798)
* Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places.
Convert append EP for SNPE to be generic, and also use for XNNPACK.
Add XNNPACK to C# API

* Don't need stub for MIGraphX as it's using provider bridge.

* Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries.

* Only use EPs that require the layout transform if the opset is supported by the layout transformer.

* Update wasm registration of xnnpack.
2022-06-16 07:01:41 +10:00
Rachel Guo
1a1c360a80
[NNAPI EP] Add Gather op support (#11824)
* initial gather support nnapi

* update

* minor update

* address pr comments

* add int32 indices test case for nnapi

* remove nnapi ep limitation for added UT

* add link for memcpy type punning usage
2022-06-15 09:44:07 -07:00
Vincent Wang
02457ec30a
[CUDA] GatherElements[Grad]/ScatterElements Bugfix and Perf Improve (#11374)
* gather elements bugfix and perf improve

* fix win build

* fix ut on some eps

* ut change

* resolve comments

* resolve comments

* fix win build
2022-06-15 16:29:17 +08:00
Xavier Dupré
a805a49363
Move OrtValueVector from onnxruntime-training to onnxruntime (#11176)
* Move OrtValueVector from onnxruntime-training to onnxruntime

* disable dlpack on onnxruntime

* disable dlpack

* dlpack

* opaque included in any cc file of the python binding

* fix type issue

* fix incomplete name

* remove len()

* remove unused parameter

* black

* black

* black

* remove unused import

* add unit test to check the output type

* black

* lint

* lint

* lint

* fix method name

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* check return type of C API

* lint

* lint

* fix missing ;

* fix type issue

* fix merge issue

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2022-06-15 09:36:28 +02:00
Dwayne Robinson
ff8b173286
Typo in DirectML.Debug.dll
2022-06-15 00:18:40 -07:00
Dwayne Robinson
508c76a246
Add missing DirectML.Debug.dll
2022-06-15 00:16:10 -07:00
Dwayne Robinson
e3ec30efb6
Add missing GELU to ApiHelpers.h
2022-06-14 23:28:15 -07:00
Dwayne Robinson
4c1a410d54
Unmangle DML preview package filenames
2022-06-14 23:12:58 -07:00
Yi Zhang
bbace23d0c
add manylinux_2_27 wheel (#11832)
* add manylinux_2_27
2022-06-15 10:26:51 +08:00
Changming Sun
51ed27cf22
Delete win-gpu-cuda-10-2-pipeline.yml (#11847)
2022-06-14 18:34:56 -07:00
daquexian
3cbbf9dcae
Fix wasm static lib in sub-project (#11671)
* wasm_static_lib_global

Signed-off-by: daquexian <daquexian566@gmail.com>

* make wasm static lib global

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix the property

Signed-off-by: daquexian <daquexian566@gmail.com>

* add code missing after merge

Signed-off-by: daquexian <daquexian566@gmail.com>
2022-06-14 15:18:11 -07:00
Gary Miguel
e8b0d24071
Support per-test tolerances for ONNX tests (#11775)
Prior to this every test shared the same tolerances. This meant
that if an ONNX test failed due to a small but acceptable difference in
output, the only alternative was to disable the test entirely.

In op set 17, the DFT operator is being added. Without this change, the
tests for that operator fail because the output is off by about 5e-5.
It's better to keep test coverage for this new op rather than disable
the test entirely.

Also prior to this change, the global tolerances were not shared between
C++, JavaScript, and Python tests. Now they are.

Also fix various minor issues raised by linters.

Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
2022-06-14 15:12:23 -07:00
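The per-test tolerance idea described in the commit above can be sketched as a lookup that falls back to shared defaults. This is a minimal illustration only: the table name, test name, and tolerance values are assumptions, not the ones used by the ONNX Runtime test runners.

```python
import math

# Assumed defaults and override table, for illustration only.
DEFAULT_ATOL = 1e-5
DEFAULT_RTOL = 1e-3
PER_TEST_OVERRIDES = {
    # Hypothetical entry: loosen only the absolute tolerance for one test.
    "test_dft": {"atol": 1e-4},
}


def tolerances_for(test_name):
    """Return (atol, rtol) for a test, using defaults where no override exists."""
    override = PER_TEST_OVERRIDES.get(test_name, {})
    return override.get("atol", DEFAULT_ATOL), override.get("rtol", DEFAULT_RTOL)


def outputs_match(test_name, expected, actual):
    """Compare two flat sequences of floats using the tolerances for this test."""
    atol, rtol = tolerances_for(test_name)
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rtol, abs_tol=atol)
        for e, a in zip(expected, actual)
    )
```

With these assumed values, an output that is off by about 5e-5 passes for the test with an override and still fails for every other test, so coverage is kept without loosening the global defaults.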
Chen Fu
d936751aad
QlinearConv threading adjustments (#11228)
* Reserve the first core for the main thread

Currently in "auto affinity" mode the worker threads are affinized to cores 0..(N-1), leaving the very last core for the main thread. This patch preserves core #0 for the main thread, and affinizes the worker threads to cores 1..N.

* Avoid unneeded spin_pause in thread pool's worker threads

Remove unneeded PAUSE instruction (0.1-0.2 usec latency) after a worker thread finds a task to execute.

* MLAS/x86: optimize QLinearConv on hybrid CPUs

Existing 4x task granularity for task partitioning on hybrid CPUs is not
sufficient to compensate for the difference in VNNI instruction throughput
between performance and efficient cores. This patch...

  * Increases granularity for QLinearConv by 2x, to have 2x more tasks
    with 2x smaller output count

  * Limits QLinearConv task count from above, to avoid output count per
    task getting smaller than kernel's capability

  * Removes hardcoded task count for QLinearConv as it limited scaling on
    16+ core CPUs

* Addressing comments

* combining x86 ARM branches in qlinearconv threaded job partition

* revert first core assignment

Co-authored-by: Saurabh <saurabh.tangri@intel.com>
Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-06-14 14:42:12 -07:00
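The "auto affinity" change described in the first bullet of the commit above amounts to shifting the worker core range by one. A minimal sketch, assuming N worker threads on a machine with N+1 cores (the function names are hypothetical); note that the commit's last bullet, "revert first core assignment", suggests this part was rolled back before merge.

```python
def worker_affinity_before(num_workers):
    """Workers pinned to cores 0..N-1; the main thread is left with core N."""
    return {"main": num_workers, "workers": list(range(num_workers))}


def worker_affinity_after(num_workers):
    """Workers pinned to cores 1..N; core 0 is reserved for the main thread."""
    return {"main": 0, "workers": list(range(1, num_workers + 1))}
```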
Yufeng Li
80d8c4c7ff
add data type check before quantizing (#11840)
2022-06-14 14:22:34 -07:00
Yufeng Li
607afbe1c0
fix valgrind warnings:Conditional jump or move depends on uninitialis… (#11822)
* fix valgrind warnings:Conditional jump or move depends on uninitialised value(s)
2022-06-14 14:02:15 -07:00
Gary Miguel
52f6db19da
Python backend: use packaging.version to parse ONNX version (#11800)
Unlike the previous code, this handles version strings like "1.12.0rc3".

Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
2022-06-14 10:17:35 -07:00
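To show why a release-candidate string such as "1.12.0rc3" needs a real version parser, here is a small sketch using `packaging.version.Version`. The naive parser is a hypothetical stand-in for hand-rolled parsing in general; it is not a copy of the code that the commit above replaced.

```python
from packaging.version import Version


def naive_release_tuple(version_string):
    """Hypothetical hand-rolled parser: split on dots and cast each part to int."""
    return tuple(int(part) for part in version_string.split("."))


def release_tuple(version_string):
    """Parse with packaging.version, which understands PEP 440 suffixes like rc3."""
    return Version(version_string).release
```

`naive_release_tuple("1.12.0rc3")` raises `ValueError` because `"0rc3"` is not an integer, while `release_tuple` returns the numeric release segment and `Version` additionally orders the release candidate before the final 1.12.0.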
zhangyaobit
f6d2b629a0
Add kernel explorer (#11779)
* Add kernel explorer, a tool to help develop, test, profile, and tune GPU kernels.

* clean up with some formatting issues

* rename MACRO

* macro renaming

* improve cmake code

* fix python lint errors

* fix python lint errors

* fix python lint errors

* delete white space suggested by lint
2022-06-13 20:11:25 -07:00
Scott McKay
6bf6bac1fd
Add patching of xnnpack CMakeLists.txt to allow building with Emscripten. (#11829)
2022-06-14 09:31:17 +10:00
Chun-Wei Chen
63c483a998
1.12.0 is the right TBD instead of released 1.11.0 (#11817)
2022-06-13 14:27:59 -07:00
Adrian Lizarraga
aef53e2b0d
Support uploading EP perf data to a configurable database. (#11819)
2022-06-13 14:06:50 -07:00
Changming Sun
a93ebd2503
Move tvm pipeline to Github Actions (#11721)
2022-06-13 11:38:44 -07:00
Wil Brady
b0e027c661
Add aten::_softmax to eager ops. (#11820)
2022-06-13 13:05:26 -04:00
Hector Li
7582644f57
cmake changes for SNPE EP (#11821)
* move code used to find the SNPE libs to a separate cmake file

* Roll back the change for libc++_shared, it's the one from SNPE SDK, otherwise it will cause uncaught exception of type std::bad_cast because of conflict
2022-06-13 08:15:37 -07:00
Dwayne Robinson
04dd6639de
And appease the time wasting formatting tool now -_-...
2022-06-11 19:17:20 -07:00
Dwayne Robinson
2bc487a816
Appease flaky flake tool
2022-06-11 19:15:19 -07:00
Dwayne Robinson
50e0a193c8
Merge branch 'master' into user/dwayner/DmlEp1.9
2022-06-11 19:01:51 -07:00
Dwayne Robinson
76024b8a6a
Update DirectML.dll to 1.9.0 Preview
2022-06-11 18:51:32 -07:00
Maxiwell S. Garcia
0869f4f4ea
ppc64le: optimizing the MlasRequantizeOutput() with VSX (#11659)
2022-06-10 16:04:52 -07:00
Tianlei Wu
def78a1b81
Support T5 in BeamSearch operator (#11450)
(1) Support T5 in BeamSearch operator, and add both CPU and CUDA implementation.
(2) Change BeamSearch op: rename encoder_decoder_init attribute to encoder, and add decoder_start_token_id attribute
(3) Update convert_to_onnx for T5 to use int32 instead of int64 inputs as default.
(4) Add more tests in best_beam_search.py
(5) fix ORT_ENFORCE of hypothesis_buffer_offset_
(6) Improve ONNX conversion:
   (a) Change encoder some dynamic axes to fixed dim value
   (b) add --separate_encoder_and_decoder_init
   (c) correct name t5-3B => t5-3b, t5-11B => t5-11b
   (d) Add --use_int32_inputs in convert t5 to onnx
   (e) Allow t5 beam search conversion in one step
2022-06-10 15:06:57 -07:00
Dwayne Robinson
c1b5f34362
DML EP BatchNormalization-15 (#11814)
* Add external helper DirectMLX.h
* Add BatchNormalization-15 using DMLX to achieve casting if types are different
* Shape helper and some reformatting
* Additional linting issues
2022-06-10 15:04:48 -07:00
Tianlei Wu
768b9cfb60
Fix GetDirNameFromFilePath to support forward slash in windows (#11793)
2022-06-10 14:37:30 -07:00
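The forward-slash fix named in the last commit above can be sketched as treating both separators as valid when locating the directory part of a path. This is a Python illustration of the idea, not the C++ implementation; the function name and the empty-string result for a bare file name are assumptions.

```python
def dir_name_from_file_path(path: str) -> str:
    """Return everything before the last '/' or '\\' in path."""
    # rfind returns -1 when a separator is absent, so max() picks whichever
    # separator occurs last. Root-level paths are not special-cased here.
    cut = max(path.rfind("/"), path.rfind("\\"))
    # Assumption: a bare file name has no directory part, so return "".
    return path[:cut] if cut >= 0 else ""
```

Searching for only the backslash would return the wrong directory for a Windows path written as `C:/models/net.onnx`; checking both separators handles mixed paths as well.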