7863 commits

Author SHA1 Message Date
Yufeng Li
607afbe1c0
fix valgrind warnings: Conditional jump or move depends on uninitialis… (#11822)
* fix valgrind warnings: Conditional jump or move depends on uninitialised value(s)
2022-06-14 14:02:15 -07:00
Gary Miguel
52f6db19da
Python backend: use packaging.version to parse ONNX version (#11800)
Unlike the previous code, this handles version strings like "1.12.0rc3".

Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
2022-06-14 10:17:35 -07:00
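
A minimal sketch of the behaviour described in #11800, assuming the third-party `packaging` library is installed. `naive_parse` is a hypothetical stand-in for hand-rolled version parsing and is not the code that the commit replaced; `parse_onnx_version` is likewise an illustrative name.

```python
from packaging.version import Version


def parse_onnx_version(version_string):
    """Parse a PEP 440 version string into a comparable Version object."""
    return Version(version_string)


def naive_parse(version_string):
    """Hypothetical hand-rolled parser: fails on suffixes such as 'rc3'."""
    return tuple(int(part) for part in version_string.split("."))


rc = parse_onnx_version("1.12.0rc3")
print(rc.release)                # numeric part of the version: (1, 12, 0)
print(rc.is_prerelease)          # True, because of the 'rc3' suffix
print(rc < Version("1.12.0"))    # True: a release candidate sorts before the final release

try:
    naive_parse("1.12.0rc3")
except ValueError as err:
    # int("0rc3") is not a valid integer, so the naive approach raises
    print("naive parsing failed:", err)
```

Comparing `Version` objects rather than raw strings or integer tuples keeps pre-release ordering correct without special-casing suffixes.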
zhangyaobit
f6d2b629a0
Add kernel explorer (#11779)
* Add kernel explorer, a tool to help develop, test, profile, and tune GPU kernels.

* clean up with some formatting issues

* rename MACRO

* macro renaming

* improve cmake code

* fix python lint errors

* fix python lint errors

* fix python lint errors

* delete white space suggested by lint
2022-06-13 20:11:25 -07:00
Scott McKay
6bf6bac1fd
Add patching of xnnpack CMakeLists.txt to allow building with Emscripten. (#11829) 2022-06-14 09:31:17 +10:00
Chun-Wei Chen
63c483a998
1.12.0 is the right TBD instead of released 1.11.0 (#11817) 2022-06-13 14:27:59 -07:00
Adrian Lizarraga
aef53e2b0d
Support uploading EP perf data to a configurable database. (#11819) 2022-06-13 14:06:50 -07:00
Changming Sun
a93ebd2503
Move tvm pipeline to Github Actions (#11721) 2022-06-13 11:38:44 -07:00
Wil Brady
b0e027c661
Add aten::_softmax to eager ops. (#11820) 2022-06-13 13:05:26 -04:00
Hector Li
7582644f57
cmake changes for SNPE EP (#11821)
* move code used to find the SNPE libs to a separate cmake file

* Roll back the change for libc++_shared. It is the one from the SNPE SDK; otherwise a conflict causes an uncaught exception of type std::bad_cast
2022-06-13 08:15:37 -07:00
Dwayne Robinson
04dd6639de And appease the time wasting formatting tool now -_-... 2022-06-11 19:17:20 -07:00
Dwayne Robinson
2bc487a816 Appease flaky flake tool 2022-06-11 19:15:19 -07:00
Dwayne Robinson
50e0a193c8 Merge branch 'master' into user/dwayner/DmlEp1.9 2022-06-11 19:01:51 -07:00
Dwayne Robinson
76024b8a6a Update DirectML.dll to 1.9.0 Preview 2022-06-11 18:51:32 -07:00
Maxiwell S. Garcia
0869f4f4ea
ppc64le: optimizing the MlasRequantizeOutput() with VSX (#11659) 2022-06-10 16:04:52 -07:00
pengwa
fb88efbe18
End to end run pass (on device training) (#11694)
* lr_scheduler implementation

(cherry picked from commit d9c2552b3a3b2ff38ee0a14770257aa1169f6fa9)

* refactor Module/Optimizer constructor.

* add intermediate API layer bridging public interfaces with internal ones.

* synthetic data loader

* make end to end run pass

* avoid many session input copy (CPU to GPU)
some clean up

* NVTX for runner

* minor fix after sync

* revert to let Module/Optimizer handle session creation.

* fix tests & test file folder consolidation

* refine based on comments & fix cpplint

* typos
2022-06-10 15:25:44 -07:00
Tianlei Wu
def78a1b81
Support T5 in BeamSearch operator (#11450)
(1) Support T5 in BeamSearch operator, and add both CPU and CUDA implementation.
(2) Change BeamSearch op: rename encoder_decoder_init attribute to encoder, and add decoder_start_token_id attribute
(3) Update convert_to_onnx for T5 to use int32 instead of int64 inputs as default.
(4) Add more tests in best_beam_search.py
(5) fix ORT_ENFORCE of hypothesis_buffer_offset_
(6) Improve ONNX conversion:
   (a) Change some encoder dynamic axes to fixed dim values
   (b) add --separate_encoder_and_decoder_init
   (c) correct name t5-3B => t5-3b, t5-11B => t5-11b
   (d) Add --use_int32_inputs in convert t5 to onnx
   (e) Allow t5 beam search conversion in one step
2022-06-10 15:06:57 -07:00
Dwayne Robinson
c1b5f34362
DML EP BatchNormalization-15 (#11814)
* Add external helper DirectMLX.h
* Add BatchNormalization-15 using DMLX to achieve casting if types are different
* Shape helper and some reformatting
* Additional linting issues
2022-06-10 15:04:48 -07:00
Tianlei Wu
768b9cfb60
Fix GetDirNameFromFilePath to support forward slash in windows (#11793) 2022-06-10 14:37:30 -07:00
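
The idea behind #11793 can be sketched in Python as follows. This is not the C++ implementation of `GetDirNameFromFilePath`; `dir_name_from_file_path` is a hypothetical helper, and returning `"."` for a path without any separator is an assumption made for the example.

```python
def dir_name_from_file_path(path):
    """Return the directory part of path, accepting both '/' and '\\'.

    Hypothetical helper for illustration only.
    """
    # Find the right-most separator of either kind.
    last_sep = max(path.rfind("/"), path.rfind("\\"))
    if last_sep < 0:
        # Assumption: a bare file name lives in the current directory.
        return "."
    if last_sep == 0:
        # The file sits directly under the root; keep the root separator.
        return path[0]
    return path[:last_sep]


print(dir_name_from_file_path("C:\\models\\net.onnx"))
print(dir_name_from_file_path("C:/models/net.onnx"))
print(dir_name_from_file_path("C:\\models/sub/net.onnx"))
```

Searching for both separators at once avoids treating a forward-slash path on Windows as a bare file name.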
Baiju Meswani
a61c38e4f4
Add ability to author float initializers (#11752) 2022-06-10 11:21:14 -07:00
Jeff Daily
5562b47f06
missing #include <thrust/count.h> in non_max_suppression_impl.cu (#11730)
Otherwise, depending on cuda or hip thrust versions, transitive header inclusions miss thrust::count_if.
2022-06-10 10:45:28 -07:00
Guenther Schmuelling
d4ea59654c
make xnnpack build for ort-web (#11745)
* make xnnpack build for ort-web

* make ci happy
2022-06-10 08:47:57 -07:00
Vincent Wang
f745eb1d3f
fix gradient ut (#11797) 2022-06-10 12:14:19 +08:00
Vincent Wang
5ecfaef042
ATen Fallback for Inference (#11597)
* aten op for inference

* fix build error

* move some code to training only

* remove domain from operator name

* move aten_op_executor ext out from ortmodule

* add pipeline

* add exec mode

* fix script

* fix ut script

* fix test pipeline

* failure test

* rollback

* bugfix

* resolve comments

* enable aten for python build only

* fix win build

* use target_compile_definitions

* support io binding

* turn off aten by default

* fix ut

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: zhijxu <zhijxu@microsoft.com>
2022-06-09 16:07:30 +08:00
Scott McKay
927bac0f86
Rework allocator sharing to work for multiple devices. (#11700)
* Rework allocator sharing to work for multiple devices.
* Update SessionState to not use allocator name in matching for consistency with IExecutionProvider. The name doesn't have any clear meaning (e.g. we use the same name for the per-thread allocator in the CUDA EP as the shared allocator there and in the TRT EP).
  * NOTE: this means we will have one allocator per OrtMemType+OrtDevice. 
* Reverse order when doing allocator setup in SessionState. This will result in the CPU and CUDA EPs allocators being preferred (they are the most configurable), and also means the per-thread CUDA allocator for default GPU memory will be used even when TRT is enabled. 
  * NOTE: Combined with the change to remove the allocator name from the key this will mean that if CUDA and TRT or ROCM and MIGraphX are both enabled the CUDA/ROCM per-thread allocator will be used to allocate GPU memory.  
* Use InsertAllocator instead of TryInsertAllocator. Each EP should be registered once, and we should only enter RegisterAllocator once, so the 'try' should not be required and would indicate an unexpected setup was involved. i.e. better to fail and figure out if we need to support that setup.
* Add some clarifying comments around how replace allocator works.
* Add unit testing for setup where EP has local allocator that may get out of sync with values in the IExecutionProvider base class.
* Fix invalid check of whether data is on CPU to use device info instead of allocator name.
2022-06-09 17:38:38 +10:00
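
The keying and strict-insert behaviour described in #11700 can be sketched roughly as below. This is a Python illustration only, not the onnxruntime C++ code: `AllocatorKey`, `AllocatorRegistry`, and the string values used for memory and device types are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocatorKey:
    """Hypothetical key: memory type plus device, with no allocator name."""
    mem_type: str
    device_type: str
    device_id: int


class AllocatorRegistry:
    """Hypothetical registry holding one allocator per AllocatorKey."""

    def __init__(self):
        self._allocators = {}

    def insert(self, key, allocator):
        # Strict insert: a duplicate key signals an unexpected setup,
        # so fail loudly instead of silently keeping the first entry.
        if key in self._allocators:
            raise ValueError(f"allocator already registered for {key}")
        self._allocators[key] = allocator

    def get(self, key):
        return self._allocators.get(key)


registry = AllocatorRegistry()
gpu0 = AllocatorKey("default", "GPU", 0)
gpu1 = AllocatorKey("default", "GPU", 1)
registry.insert(gpu0, "per-thread allocator for GPU 0")
registry.insert(gpu1, "per-thread allocator for GPU 1")  # a second device gets its own entry

try:
    registry.insert(gpu0, "another allocator for GPU 0")
except ValueError as err:
    print("rejected:", err)
```

Because the name is not part of the key, two allocators for the same memory type and device cannot coexist in this sketch, which mirrors the "one allocator per OrtMemType+OrtDevice" note in the commit message.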
Dwayne Robinson
5e54611427
DML EP add Trilu-14 and Resize-13 nearest mode and others (#11782)
* Add Trilu-14 kernel
* Support Resize with rounding direction for round_prefer_ceil/round_prefer_floor
* Add batch normalization query and RNN query
* Appease CPPLINT.cfg per https://raw.githubusercontent.com/google/styleguide/gh-pages/cpplint/cpplint.py to reduce the noise
2022-06-08 19:08:00 -07:00
Dwayne Robinson
0f0b640b4b
Reformat build.py for WindowsAI branch (#11794) 2022-06-08 18:05:11 -07:00
Alex Fuller
8156b9370c
[Abseil] Adding URL_HASH so that an existing archive can be used from disk (#11690) 2022-06-08 17:12:59 -07:00
pengwa
540935aace
lr scheduler implementation (on device training) (#11714)
* lr_scheduler implementations

* rename test_runner to test_trainer.

* add unit tests

* address comments
2022-06-09 08:04:30 +08:00
Justin Chu
913100885b
Remove the redundant black check in CI (#11790)
We have two black checks in CI for different scopes (PR, full repo). Now that the repo level black check is required, we can remove the PR level check.
2022-06-08 16:58:43 -07:00
Gary Miguel
79db92f8fe
clang-format signal_defs.cc (#11767) 2022-06-08 15:45:40 -07:00
sumitsays
f5fe4f253c
Registered Softmax/Hardmax/LogSoftmax-13 as Versioned Operator (#11787)
* Added Softmax/Hardmax/LogSoftmax-13

* Removed redundant method specifier

* Registered softmax/hardmax/logsoftmax as versioned operator

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-06-08 11:38:14 -07:00
dependabot[bot]
750cb42f87
Bump protobufjs from 6.10.2 to 6.11.3 in /js/node (#11722)
Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 6.10.2 to 6.11.3.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/v6.11.3/CHANGELOG.md)
- [Commits](https://github.com/protobufjs/protobuf.js/compare/v6.10.2...v6.11.3)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-08 11:17:56 -07:00
dependabot[bot]
bc4c771078
Bump protobufjs from 6.10.2 to 6.11.3 in /js/web (#11723)
Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 6.10.2 to 6.11.3.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/v6.11.3/CHANGELOG.md)
- [Commits](https://github.com/protobufjs/protobuf.js/compare/v6.10.2...v6.11.3)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-08 11:17:30 -07:00
Changming Sun
eeeb249a27
Update onnxruntime_providers.cmake to remove the reference of "onnxruntime_tvm_dependencies" (#11780) 2022-06-08 09:06:00 -07:00
Alexey Gladyshev
331c387f4a
[TVM EP][DOC] Documentation update for TVM EP due to the addition of precompiled model support. (#11743)
* update description of TVM EP options in docs

* update sample notebook

* update TVM EP documentation

* add link to description of options

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-06-08 14:56:01 +02:00
Yi Zhang
7f8d0ba824
Update comments in Android workflow (#11311)
* keep comments change only
2022-06-08 15:25:21 +08:00
Yufeng Li
f6f457aa57
not remove relu/clip for symmetric activation (#11696)
* not remove relu/clip for symmetric activation
2022-06-07 18:02:31 -07:00
sumitsays
aa3a825816
Added Softmax/Hardmax/LogSoftmax-13 (#11772)
* Added Softmax/Hardmax/LogSoftmax-13

* Removed redundant method specifier

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-06-07 14:31:55 -07:00
GPhilo
40f4304c7d
[Fix #11447] Use correct type for tensor shape vectors (#11448)
* [Fix] Use correct type for tensor shape vectors

* Replacing std::vector with absl::InlinedVector

* Remove explicit use of absl:: namespace;
Add back explicit size in constructors.

* Remove explicit size for InlinedVector
2022-06-07 09:06:32 -07:00
Yi Zhang
b4f1e769c0
Add Mac Silicon/M1 Wheel (#11591) 2022-06-07 08:58:20 -07:00
Yulong Wang
40d2c98e4d [js/web] fix ORT Web dependency version mismatch 2022-06-06 23:41:40 -07:00
leqiao-1
8fb38e8a54
fix cmake warning (#11742) 2022-06-07 09:37:16 +08:00
dependabot[bot]
9e33bfd29b
Bump simple-plist from 1.3.0 to 1.3.1 in /js/react_native/e2e (#11712)
Bumps [simple-plist](https://github.com/wollardj/simple-plist) from 1.3.0 to 1.3.1.
- [Release notes](https://github.com/wollardj/simple-plist/releases)
- [Commits](https://github.com/wollardj/simple-plist/compare/v1.3.0...v1.3.1)

---
updated-dependencies:
- dependency-name: simple-plist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-06 16:54:36 -07:00
PeixuanZuo
908e19dc16
[FIX] using torch.version.cuda/hip to ensure build ORTModule Torch C++ CUDA extension for docker build (#11675)
* [FIX] cpp ext

* Update orttraining/orttraining/python/training/ortmodule/torch_cpp_extensions/install.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* [FIX] fix python format

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2022-06-07 07:51:26 +08:00
George Nash
981d45d8d5
Add binary comparators to the OneDNN (dnnl) execution provider (#11641)
* Added Bool output support by using u8 datatype

Signed-off-by: George Nash <george.nash@intel.com>

* Add Equal, Greater, GreaterOrEqual, Less, and LessOrEqual Operators

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Erick Munoz Alvarado <erick.munoz.alvarado@intel.com>
2022-06-06 09:15:42 -07:00
Valery Chernov
4296968f20
[TVM EP] update set input method for VirtualMachine (#11674)
* update TVM

* get alignment constant from TVM

* update TVM_VM_SetInputs to upstream with TVM API

* fix CI issue: update TVM EP dependencies

* add sudo

* revert changes needed to install missing package

* add package for TVM EP CI

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-06-04 09:31:01 +02:00
Changming Sun
d5e34acb82
Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651) 2022-06-03 20:00:54 -07:00
Changming Sun
3c1dd9514d
Revert "fixed point based requantization on arm64 (#11540)" (#11732)
This reverts commit 1f2c926 because it makes our packaging pipeline crash.

Error message:

[ RUN ] QLinearConvTest.Conv3D_S8S8_Depthwise
Test #1: onnxruntime_test_all ...................Subprocess killed***Exception: 838.24 sec

We haven't successfully reproduced the bug on real ARM64 hardware. So far we have only seen it with qemu. Investigation is ongoing.
2022-06-03 19:12:25 -07:00
Scott McKay
ef64b2ee52
Fix clash between QDQ propagation and TransposeOptimizer (#11636)
* Initial changes with comments on potential unit test changes.

* Update tests to disable TransposeOptimizer as that's simpler.
Add some extra comments.
Cleanup.

* Update comments in TransformGraph

* Add regression test.
Add limitation that transpose optimizer will ignore assigned nodes that do not match the context EP if that is set.

* Fix test. I removed a trailing Transpose after initial validation to simplify but that changed things so that the transpose optimizer didn't kick in, and the DQ -> Transpose -> Q was actually converted to a single Transpose by the CPU EP QDQ handling. Same end result in most builds so the subtle difference wasn't noticed, but in a build without contrib ops the CPU EP QDQ handling is disabled so the end result was different.

Update the test to re-instate the trailing Transpose so transpose optimizer alters the graph as desired.

* Don't run level 1 optimizers after partitioning as they don't guarantee to handle EP assignment for new nodes they create.
2022-06-03 16:16:35 -07:00
Hector Li
95a16c1ffe
Snpe ep (#11665)
* Initiate Ort SNPE EP
* fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master
* correct the source path for libonnxruntime.so while building for Android package
* add AdditionalDependencies for arm64
* On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given.
* fix build failure if snpe is not enabled
* update doc for contrib op
* separate out snpe ep settings to onnxruntime_snpe_provider.cmake
* renaming according to review comments
* update according to review comments
2022-06-03 14:10:02 -07:00
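
The line-ending requirement noted in #11665 (a patch file with CR-LF line endings on Windows) can be met with a small conversion step. The following is a minimal sketch, not part of the onnxruntime build scripts; `to_crlf` is a hypothetical helper name.

```python
def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CR-LF without doubling existing CR-LF."""
    # Normalise any existing CR-LF to LF first so the function is idempotent.
    normalized = data.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", b"\r\n")


sample = b"--- a/CMakeLists.txt\n+++ b/CMakeLists.txt\r\n@@ -1 +1 @@\n"
converted = to_crlf(sample)
print(converted)
print(to_crlf(converted) == converted)  # running it twice changes nothing
```

Working on bytes rather than text avoids any implicit newline translation when the patch file is read or written.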