Commit graph

1209 commits

Author SHA1 Message Date
Yulong Wang
054464dce2
fix XNNPACK on WebAssembly SIMD (#13161)
### Description

fix XNNPACK on WebAssembly SIMD.

The flag "-msimd128" needs to be applied to every source file when compiling
for WASM SIMD. Currently only some of the source files are compiled with
this flag, so we get inconsistent results for
`sizeof(xnn_f32_minmax_params)` because the type definition includes an
`#ifdef` for `__wasm_simd128__`. The inconsistency causes garbage data to
be written to a stack variable and eventually causes the crash.

XNNPACK libraries are C libraries, so the build flags need to be applied not
only to `CMAKE_CXX_FLAGS` but also to `CMAKE_C_FLAGS`.
2022-09-30 16:34:15 -07:00
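The size mismatch described in #13161 can be illustrated with a small sketch. The two layouts below are hypothetical stand-ins, not the real `xnn_f32_minmax_params` definition; they only show how a type whose fields depend on `__wasm_simd128__` ends up with two different sizes when some files are compiled with the flag and others are not.

```python
import ctypes


# Hypothetical layout seen by files compiled WITHOUT -msimd128.
class ParamsScalar(ctypes.Structure):
    _fields_ = [("min", ctypes.c_float), ("max", ctypes.c_float)]


# Hypothetical layout seen by files compiled WITH -msimd128
# (each value is widened; the exact widths are an assumption).
class ParamsSimd(ctypes.Structure):
    _fields_ = [("min", ctypes.c_float * 2), ("max", ctypes.c_float * 2)]


# The caller reserves space using one definition...
caller_size = ctypes.sizeof(ParamsScalar)
# ...while the callee initializes the struct using the other.
callee_size = ctypes.sizeof(ParamsSimd)

# Bytes written past the end of the caller's stack variable.
overflow = callee_size - caller_size
print(f"caller reserves {caller_size} bytes, callee writes {callee_size} bytes, "
      f"{overflow} bytes land outside the variable")
```

Compiling every C and C++ source file with the same flags removes the second layout, so both sides agree on the size.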
George Nash
b76a65c784
Upgrade the oneDNN ep to use oneDNNv2.7 (#13175)
### Description
This updates the oneDNN library used by oneDNN ep from version 2.6 to
version 2.7



### Motivation and Context
This brings in the many improvements incorporated into the oneDNN
library to the oneDNN execution provider.

Signed-off-by: George Nash <george.nash@intel.com>
2022-09-30 12:29:17 -07:00
cloudhan
c93cb8f949
Revert "Enable ROCm to use tunable GEMM" (#13160)
Reverts microsoft/onnxruntime#12853 due to CI pipeline problem.
2022-09-30 14:01:16 +08:00
cloudhan
32c2c4b480
Change ROCm to use tunable GEMM (#12853)
Change ROCm to use tunable GEMM. It is not enabled in this PR. When enabled, this will drastically improve GEMM performance for some shape and dtype configurations, which will benefit overall performance for BERT inference and, hopefully, training.
2022-09-28 16:21:54 +08:00
Rachel Guo
9a44a69653
Refactor NNAPI EP OpBuilder/OpSupportChecker structure (#13065)
### Description

As title

- Split long OpBuilder and OpSupportChecker files into individual
operator files.

- Add OpBuilder/SupportChecker registry factories.

- Combine the functionality of op_builder and op_support_checker into one
op_builder.
### Motivation and Context

The NNAPI OpBuilder was split into OpBuilder (for EP::Compile) and
OpSupportChecker (for EP::GetCapability).
At the time it was a reasonable choice, but OpBuilder and OpSupportChecker
share some logic and have to use additional helpers.

Clean up now to merge the NNAPI OpBuilder/OpSupportChecker into a single
OpBuilder (similar to what the CoreML EP has).
2022-09-27 17:12:09 -07:00
Changming Sun
b25437ec41
Upgrade protobuf version (#13100)
Upgrade protobuf version from 3.18.1 to 3.18.3 to address CVE-2022-1941
2022-09-26 21:30:28 -07:00
RandySheriffH
77a066c700
Drop nuphar from java API (#13107)
Drop nuphar from:

- java API
- tvm.cmake
- run_build.sh
2022-09-26 17:06:08 -07:00
RandySheriffH
a83a9ed6b0
Remove miscellaneous nuphar configs (#13070)
Remove a handful of nuphar related configurations after deprecation.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-09-26 13:41:28 -07:00
Dale Phurrough
2ae33b3613
fix CuDNN lib path for Windows (#12974)
Fixes microsoft/onnxruntime#12969

### Motivation and Context

Build is broken, can't find cudnn.lib with nvidia official install of
cuDNN

Alternative method is to use `IF(EXISTS
${onnxruntime_CUDNN_HOME}/lib/x64/cudnn.lib)` to test for legacy
location and only add the legacy dir to the path, else add the current
official `lib/` dir.
2022-09-26 13:23:38 -07:00
Changming Sun
eafd67b8fd
Update CUDA version to 11.6 and refactor python packaging pipeline (#13002)
1. Update CUDA version from 11.4 to 11.6.
2. Update Manylinux version
3. Upgrade GCC version from 10 to 11 for most x86_64 pipelines. CentOS 7 ARM64 doesn't have GCC 11 yet.
4. Refactor python packaging pipeline:
    a. Split the Linux GPU build job into two parts, build and test, so that the build part doesn't need to use a GPU machine.
    b. Make the Linux GPU build job and Linux CPU build job more similar: share the same bash script and yaml file.
5. Temporarily disable Attention_Mask1D_Fp16_B2_FusedNoPadding because it is causing one of our packaging pipelines to fail. I have created an ADO task for this.
2022-09-23 00:29:27 -07:00
cloudhan
a24b41d92e
Move all TunableOp related facilities to EP level directory (#12857)
Some Ops in the EP directory, rather than the contrib_ops directory, will
require TunableOp. We will also need to add EP level session tuning
options for it. So move all of that code at once.

Also remove duplicated utility functions.
2022-09-23 11:10:19 +08:00
wangxiyuan
952c99304a
Add CANN EP (#12416)
**Description**: This PR adds Ascend CANN execution provider support.

**Motivation and Context**
- Why is this change required? What problem does it solve?
As described in the linked issue, CANN is the API layer for Ascend
processors. Adding a CANN EP allows users to run ONNX models on Ascend
hardware via onnxruntime.
  The detailed changes:
  1. Added CANN EP framework.
  2. Added the basic operators to support ResNet and VGG models.
  3. Added C/C++ and Python API support.
- If it fixes an open issue, please link to the issue here.
   https://github.com/microsoft/onnxruntime/issues/11477

Author: 
lijiawei <lijiawei19@huawei.com>
wangxiyuan <wangxiyuan1007@gmail.com>

Co-authored-by: FFrog <ljw1101.vip@gmail.com>
2022-09-22 14:53:40 -07:00
sfatimar
cccbe90764
Openvino ep 2022.2 v4.2 (#13023)
These changes align the OV 2022.2 release with ORT. Changes:
CPU FP16 support, dGPU support, RHEL Dockerfile, Ubuntu 20 Dockerfile

**Motivation and Context**
- This change is required to ensure ORT-OpenVINO Execution Provider is
aligned with latest changes.

Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: shamaksx <shamax.kshirsagar@intel.com>
Co-authored-by: pratiksha <pratikshax.bapusaheb.vanse@intel.com>
Co-authored-by: pratiksha <mohsinx.mohammad@intel.com>
Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: nmaajidk <n.maajid.khan@intel.com>
Co-authored-by: Mateusz Tabaka <mateusz.tabaka@intel.com>
Co-authored-by: intel <intel@iotgecsp-nuc04.iind.intel.com>
2022-09-22 12:31:40 -07:00
Adam Louly
268bfe2a5d
python training api bindings (#12610)
**Description**: Python API bindings for on-device training.
**Motivation and Context**
- This PR contains api bindings so python users can perform a whole
training loop.

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2022-09-16 09:38:24 -07:00
sumitsays
363c695dad
Update DML 1.9.0 to 1.9.1 (#12966)
Update DML to 1.9.1

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2022-09-15 10:54:22 -07:00
cloudhan
10f9a69707
Use CMake EXCLUDE_FROM_ALL for composable kernels to avoid building of conv related kernels (#12855) 2022-09-14 22:11:31 -07:00
Chun-Wei Chen
d819b56fba
Consume ONNX 1.12.1 to prevent vulnerability issue while loading external file (#12915)
* consume ONNX 1.12.1 to prevent vulnerability issue while loading external tensors

* update ONNX 1.12.1

* test updated PR

* use official rel-1.12.1 commit
2022-09-14 21:10:24 -07:00
Scott McKay
022d9e2d0c
Get files for XNNPACK wasm build from BUILD.bazel. (#12892)
Get files for wasm build from BUILD.bazel.
2022-09-09 12:38:57 -07:00
pallavides
6ebb7b91eb
Re-apply fix for mkl issue for eager mode (#12881)
* reapply fix for mkl issue for eager mode
* add comment, update link libs
2022-09-08 12:29:24 -07:00
RandySheriffH
d3b684cd9e
Drop nuphar (#11555)
* drop nuphar code and configs

* refactor test case

* format python

* remove nuphar from training test

* remove commented nuphar logics

* restore llvm setting

* drop nuphar ci

* fix compile err

* fix compile err

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-09-07 15:11:18 -07:00
Hariharan Seshadri
ad69aac491
Introduce ordered quantization ops for the CUDA EP [1/n] (#12582)
Initial core small set for the ordered quantization ops for cuda EP.
2022-09-07 11:58:15 -07:00
Guenther Schmuelling
f856be162e
fix xnnpack wasm build (#12845) 2022-09-06 19:20:07 -07:00
Jan Tilly
437409c343
Add DONT_VECTORIZE flag to cmake (#12169)
Add DONT_VECTORIZE flag.
2022-09-07 12:14:14 +10:00
Yulong Wang
726251609a
increase max memory to 4G for wasm (#12798) 2022-09-06 17:07:13 -07:00
Xavier Dupré
54360c88d2
Disable two warnings raised by tensorboard on Visual Studio (#12773) 2022-09-06 20:42:52 +02:00
Baiju Meswani
295bd26980
Remove orttraining-distributed CI pipeline (#12738) 2022-09-02 14:34:26 -07:00
Changming Sun
ca5af24765
Update Sdl.ruleset to remove C26812 from the rules (#12695) 2022-09-01 20:05:20 -07:00
Sheil Kumar
e3b501125d
DFT on DirectML (#12710)
* DFT on DirectML

* feedback

* fix misc build issues

* fixes

* fix constant cpu inputs and optional tensors for external operators

* disable dft tests on 'pure' dml
2022-09-01 08:31:14 -07:00
Yulong Wang
82a28cc2c3
upgrade emsdk to 3.1.19 (#12690)
* upgrade emsdk to 3.1.19

* fix build break

* ignore '-Wunused-but-set-variable' in eigen

* add malloc and free in exported functions

* EXPORTED_FUNCTIONS
2022-08-30 13:42:45 -07:00
Yi Zhang
27304d9082
gcc should not be less than 7 (#12771) 2022-08-29 23:49:29 +08:00
mwootton
817dc94345
Add first pass of rocm kernel profiler (#10911)
* Add first pass of rocm kernel profiler

* Clean up rocm_profiler. Format args. Demangle kernel names.
Add Api EventRecords

* Remove debug output

* Temporarily disable profiling unit test 'api record check' for cupti

* Fix compile error for non-gpu builds

* Use common file for demangle and pid/tid.  Namespace ThreadUtil.  Fix gpu buffer clearing.

* Merge demangle into profiler_common

* Merge demangle into profiler_common part 2

* Style cleanup

* Resolve linking issues via ProviderHost interface

* Demangle cuda kernel names

* Clean up comments

* Fix formatting

* Fix anal retentive formatting
2022-08-26 19:38:03 -07:00
cloudhan
46c074a6c8
Update composable kernel and enable experimental inter wave scheduling (#12626)
Update ck to latest master and enable interwave scheduling
2022-08-25 22:19:41 -07:00
Changming Sun
7927d525a7
Remove CUDNN path from CI build scripts (#12671) 2022-08-24 18:21:50 -07:00
Yi Zhang
de3d772995
Check GCC version (#12680)
* check gcc version
2022-08-24 12:10:08 +08:00
Wei-Sheng Chin
dc486d146b
Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460)
* Make ORT as Pytorch JIT backend

LORT likely doesn't work with aten fallback so we only test LORT in its own CI.

* Revert changes to enable external CUDA allocator. Will add it later.

Revert "Revert changes to enable external CUDA allocator. Will add it later."

This reverts commit d5487f2e193014c805505afae8fb577c53667658.

Fix external allocator

* Relax tolerance and remove commented code

* Print more information in CI

* Fix pointer

* Address comments.
1. Reuse ORT-eager mode's environment.
2. Remove unused ctor.

* Use Pytorch master branch as all PRs are merged

Fix

* Refine based on cpplint feedbacks

* Revert changes to allow custom CUDA allocator in public APIs

* Use torch.testing.assert_close

* Use unittest framework

* Switch docker repo

* Rename *.cpp to *.cc

* Address comments

* Add comment

* Use same pipeline file for eager and lort pipelines

* Address comments

* Add yaml comment

* Fix cmake files

* Address comments

* Rename flags, remove printing code, remove dead comment
2022-08-22 09:40:40 -07:00
Yulong Wang
bfdd191eec
[wasm] use same export name for SIMD/NOSIMD build (#12545) 2022-08-19 18:17:50 -07:00
yf711
9d10badc55
Add build option to link TensorRT prebuilt parser (#12602)
* Add build option to link prebuilt TensorRT parser

* Test without the build option to link prebuilt TRTParser

* Minor: update name of build option

* Minor: update name of build option
2022-08-16 14:09:58 -07:00
Dmitri Smirnov
616677104a
ONNX Protobuf natvis with some google::protobuf (#12580)
ONNX Protobuf natvis with some google::protobuf structures
  Add leading underscore to local Intrinsic
2022-08-15 09:59:07 -07:00
Xinya Zhang
eb827bd3e5
[ROCm] NGramRepeatBlock, LongformerAttention and DecoderAttention Ops (#11971)
* [ROCm] enable NGramRepeatBlock Op

* [ROCm] Enable testing ROCm in NGramRepeatBlockTest.NGramSize_3

Also link onnxruntime_test_all with amdhip64 when USE_ROCM=1

* [ROCm] add LongformerAttention Op

* [ROCm] Enable LongformerAttentionTest

* [ROCm] Add DecoderAttention Op

* Enable DecoderAttention Test for ROCm.

* [ROCM] Updates according to reviews
2022-08-11 19:32:08 -07:00
Changming Sun
ac7538b909
Remove CUDA 10.2 support (#12541) 2022-08-10 22:46:41 -07:00
Cheng
819c36701f
[xnnpack] basic QDQ operators support (#11912)
* basic ops for mobilenet,qconv,qsoftmax,qavgpool

update Xnnpack to latest

unit test

* NodeUnit: use outputedge to replace output-node

* qdq model e2e test

* use inlinedvector to replace vector

* conv bias check

* tensorshape helpers

* Refactor xnn_op minmax

* Qlinearsoftmax schema update

* Remove qlinearsoftmax registration

Co-authored-by: Jicheng Wen <jicwen@microsoft.com>
2022-08-11 10:12:51 +08:00
Dwayne Robinson
eb90b52a75
DML EP fix training build error (#12461)
Fix onnxruntime_training.cmake missing linkage issue
2022-08-05 16:01:25 -07:00
cloudhan
f39354d7cb
Add composable kernel GEMM baseline for kernel explorer (#12364)
* Split GemmBase RocBlasGemm

* Add composable kernel GEMM baseline

* Make linter happy

* Address review comment

* Update bert cases with batchsize

* Adjust includes to fix IWYU lint

* Only builds and links used ck kernels to improve building time

* Remove warmup run on SelectImpl

* Add comment to utility function

* Mute cpplint

* Make RocBlasGemm<T>::SelectImpl semantically correct

* Add reduced basic test cases for ck gemm

* More robust gemm testing

* Fix warnings

* Fix grammar
2022-08-04 17:32:20 -07:00
Dmitri Smirnov
a4ef0e7f7b
Remove dynamic allocation for ThreadPool ParallelSection (#12429)
Use InlinedVector in a TP
Store per thread parallel section in std::optional and avoid memory allocation
2022-08-04 09:46:16 -07:00
Dmitri Smirnov
eebaf5f270
Adjust and fix abseil-cpp debugging visualization (#12415)
Move abseil-cpp.natvis file, add it to PDB, adjust visualization
2022-08-02 15:08:17 -07:00
Edward Chen
f77ab4fea6
Manually add optimization flag for Android Release builds. (#12390)
With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration.
More details here: https://github.com/android/ndk/issues/1740

Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21.

This change is a workaround to manually add `-O3` for "Release" Android builds.
2022-08-01 12:49:03 -07:00
George Wu
6bb807ef74
add cuda compute 8.7 to Cmakelists.txt to support Nvidia Orin devices (#12377)
* add cuda arch 8.7 to cmakelists.txt to support Nvidia Orin devices

* add cuda version >= 11 check for orin support
2022-08-01 09:45:58 -07:00
Valery Chernov
1a4868e5c4
[TVM EP] Hot fix of build on Windows of TVM EP with ipp-crypto (#12381)
fix of build on Windows with ipp-crypto. cmake warnings fix

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-07-31 14:36:54 +02:00
Valery Chernov
e2423bb55c
[TVM EP] Build on Windows with ipp-crypto support (#12336)
* update TVM EP docs for ipp-crypto build conditions

* add ipp-crypto by ExternalProject_Add

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-07-28 15:40:19 +02:00
msftlincoln
9cf6912bba
Fix ORT Eager Mode to work with Pytorch 1.12 (#12323) 2022-07-27 16:24:46 -04:00
Ashwini Khade
ceb76429db
Merge pull request #12056 from microsoft/bmeswani/merge-training_dev/on_device_poc
Merge On-Device-Training Offline Tooling and C/C++ APIs
2022-07-21 15:09:48 -07:00
Baiju Meswani
cbf08c7a7b Make GetTrainingApi as a part of the OrtApis, add Training API documentation and address other pull request review comments 2022-07-21 18:11:48 +00:00
cloudhan
a0074ba9bc
Add baseline gemm for kernel explorer (#12050)
Use rocblasGemmHelper gemm wrapper from ORT and profile for bert param size only.
2022-07-20 13:49:26 +08:00
Michael Melesse
bb5bd08545
[ROCM] Navi21 fixes pr (#11368)
* add scripts

* update docker scripts

* update build script

* create run script

* add test script

* add log 3 flags

* use the right build function

* build navi

* add clean script

* add pytorch like soln

* only build gfx 1030

* use HOST side var

* ignore logs

* update scripts

* GPU_WARP_SIZE_HOST

* update scripts

* remove scripts/amd

* match main

* add GPU_WARP_SIZE_HOST on cuda side

* match main

* correct gfx1030

* remove print

* move gfx add to rocm5.0

* remove inline

* make constexpr on cuda side
2022-07-18 22:26:57 -07:00
Valery Chernov
3b0aaa9e0e
[TVM EP] support build on Windows (#11851)
* add description of build ORT+TVM EP on Windows

* fix cmake error related to symlink creation on Windows

* add llvm config path to build flags for correct build on Windows

* update TVM_EP.md for llvm_config build arg

* fix warnings skipping during build on Windows

* fix using string or wstring for model path to correct build on Windows (MSVC error)

* fix error in custom logger for correct build on Windows

* implement glob algorithm for Windows

* additional build fixes

* update TVM with export of VM symbols for dll

* description of nasm issue and workaround

* update TVM with export of Executable from VM symbols for dll

* description of installation of ipp-crypto dependencies on Windows

* cmake key for ipp-crypto build

* fix wstring for TVMso EP

* fix ipp-crypto build

* cmake key onnxruntime_TVM_USE_HASH switches off the full hash functionality, not just specific methods

* fix absolute path to compiled lib

* update TVM_EP.md, fix lint warnings

* update TVM_EP.md

* small fixes after review

* switch on handshake functionality for Linux workflow

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-07-13 10:48:42 +02:00
cloudhan
785f74979b
Rework cmake for kernel_explorer (#12079)
Improve CMake for deep integration with ORT, so that we can easily hook ORT functions for microbenchmarking purposes.
2022-07-13 15:43:32 +08:00
Dwayne Robinson
32a8751dc4
DML EP Update to DML 1.9 (#12090)
* Update to DML 1.9

* Appease obnoxious Python formatting tool
2022-07-05 16:30:54 -07:00
Baiju Meswani
1aa27e127c Resolve build conflicts with master 2022-07-05 19:53:54 +00:00
Wenbing Li
479e71a7a8
enable the extensions custom build for java and android (#11823) 2022-07-05 10:34:14 -07:00
Baiju Meswani
a457ddc41d Merge branch 'master' of https://github.com/microsoft/onnxruntime into bmeswani/merge_pr 2022-06-30 21:53:07 +00:00
ashbhandare
0ce14c7068
Fix windows cpu build VS2022 (#12032)
Fix windows cpu build VS2022
2022-06-29 15:45:00 -07:00
Baiju Meswani
6e8edfff0c
Separate training apis from shared core apis (#12027) 2022-06-29 14:12:29 -07:00
Valery Chernov
8ba8146650
[TVM] handshake mechanism for support of TVMso EP (#11437)
* infrastructure for handshake mechanism was implemented. sha256 was selected as first hash algorithm

* check hash during compile in TVMso EP

* add IPP-CRYPTO to external dependencies for TVM EP

* made checkHash method constant

* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict

* implemented SHA-256 calculation using ipp-crypto library

* fix dependency for ipp-crypto

* add provider options for hash check

* update documentation for added provider options

* add hash check condition

* fix docs

* fix lint

* fix ORT_THROW

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-06-29 14:57:18 +02:00
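The hash check from #11437 can be sketched roughly as follows. This is only an illustration of the idea (compare the SHA-256 of a compiled artifact against an expected value before using it); the actual TVMso EP computes the digest with the ipp-crypto library, and the function names and file name below are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_hash(path, expected_hex):
    """Hypothetical handshake step: accept the file only if its hash matches."""
    return sha256_of_file(path) == expected_hex.lower()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a precompiled model library; the name is illustrative.
    lib = Path(tmp) / "compiled_model.so"
    lib.write_bytes(b"pretend this is a compiled model")
    expected = sha256_of_file(lib)

    hash_ok = check_hash(lib, expected)

    # Any change to the file makes the check fail.
    lib.write_bytes(b"tampered contents")
    hash_after_change = check_hash(lib, expected)

print(hash_ok, hash_after_change)
```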
Edward Chen
f045994389
[NNAPI EP] Update NNAPI headers (#11954)
Update the NNAPI headers to a more recent version (copied from TF Lite v2.9.1).
2022-06-27 18:54:06 -07:00
Baiju Meswani
d25cf4df26 Merge branch 'master' into training_dev/on_device_poc 2022-06-24 20:18:19 +00:00
Preetha Veeramalai
f54476a42f
Dll version fix ovep4.1 (#11953)
* Setting default version values for ovep dlls as well

* Update backend_manager.cc

Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: mohsin <mohsinx.mohammad@intel.com>
2022-06-22 11:09:36 -07:00
Gary Miguel
4bf22e2a40
Update ONNX to 1.12 (#11924)
Follow-ups that need to happen after this and before the next ORT release:
* Support SequenceMap with https://github.com/microsoft/onnxruntime/pull/11731
* Support signal ops with https://github.com/microsoft/onnxruntime/pull/11778

Follow-ups that need to happen after this but don't necessarily need to happen before the release:
* Implement LayerNormalization kernel for opset version 17: https://github.com/microsoft/onnxruntime/issues/11916

Fixes #11640
2022-06-21 17:19:52 -07:00
Dwayne Robinson
64f95d400a
Update DML 1.9 Nuget package to fix WindowsAI nuget pipeline build issue (#11934) 2022-06-21 15:55:51 -07:00
sfatimar
f97bd38c4f
UEP 4.1 release (#11834)
* Add pypi build changes to latest Master

* Add ORT training part of OV build

* Disabling SqueezeOpTest.BadAxes

* Add ONNXruntime branch ARG to Docker build

* Changes to include file details versions

* Commit File Version Updates

* Change naming for linux build

* Add fix for pylint format errors

* Fix pylint warnings.

* Fix pylint errors - stage 2

Signed-off-by: Preetha Veeramalai <preetha.veeramalai@intel.com>

* Fix pylint errors - stage 3

* Fix pylint format - stage4

Signed-off-by: Preetha Veeramalai <preetha.veeramalai@intel.com>

* Commit for Wheel Release >0.35.1

Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com>
Co-authored-by: nmaajidk <n.maajid.khan@intel.com>
2022-06-17 14:49:04 -07:00
Dwayne Robinson
3d99f16e98
Merge pull request #11827 from microsoft/user/dwayner/DmlEp1.9
Integrate WindowsAI feature branch with DML EP features and DML 1.9
2022-06-16 13:04:00 -07:00
George Wu
df5ee6aa4e
[TensorRT EP] support TensorRT 8.4 (#11866)
* update trt 8.4ga

* trt 8.4 linux ci pipeline

* fix cmake

* placeholder_builder

* trt 8.4 windows pipeline

* gpu package pipeline

* trt 8.4.1.5 , packaging pipeline updates

* python packaging

* ctest timeout

* python packaging test

* bump timeout

* python format

* format

* revert

* newline

* enable trt python tests

* typo

* python format

* disable on windows
2022-06-16 07:46:40 -07:00
Dwayne Robinson
babd6e3fcd Update DirectML preview package with unmangled names 2022-06-15 18:16:58 -07:00
Scott McKay
d64f23fec0
EP factory creation cleanup and enhancements. (#11798)
* Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places.
Convert append EP for SNPE to be generic, and also use for XNNPACK.
Add XNNPACK to C# API

* Don't need stub for MIGraphX as it's using provider bridge.

* Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries.

* Only use EPs that require the layout transform if the opset is supported by the layout transformer.

* Update wasm registration of xnnpack.
2022-06-16 07:01:41 +10:00
Ashwini Khade
f63e28c92f
C API version 0.001 (#11758)
* C API version 0.001

* fix linker issues

* fixes for save checkpoint api

* plus fixes based on tests

* plus test_runner and other changes

* Plus cosmetic updates

* remove unnecessary headers

* plus some updates

* plus more changes

Co-authored-by: Ashwini Khade <askhade@microsoft.com@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-06-15 11:13:35 -07:00
Dwayne Robinson
ff8b173286 Typo in DirectML.Debug.dll 2022-06-15 00:18:40 -07:00
Dwayne Robinson
508c76a246 Add missing DirectML.Debug.dll 2022-06-15 00:16:10 -07:00
Dwayne Robinson
4c1a410d54 Unmangle DML preview package filenames 2022-06-14 23:12:58 -07:00
daquexian
3cbbf9dcae
Fix wasm static lib in sub-project (#11671)
* wasm_static_lib_global

Signed-off-by: daquexian <daquexian566@gmail.com>

* make wasm static lib global

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix the property

Signed-off-by: daquexian <daquexian566@gmail.com>

* add code missing after merge

Signed-off-by: daquexian <daquexian566@gmail.com>
2022-06-14 15:18:11 -07:00
Gary Miguel
e8b0d24071
Support per-test tolerances for ONNX tests (#11775)
Prior to this every test shared the same tolerances. This meant
that if an ONNX test failed due to a small but acceptable difference in
output, the only alternative was to disable the test entirely.

In op set 17, the DFT operator is being added. Without this change, the
tests for that operator fail because the output is off by about 5e-5.
It's better to keep test coverage for this new op rather than disable
the test entirely.

Also prior to this change, the global tolerances were not shared between
C++, JavaScript, and Python tests. Now they are.

Also fix various minor issues raised by linters.

Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
2022-06-14 15:12:23 -07:00
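The per-test tolerance idea from #11775 can be sketched as a default tolerance plus per-test overrides. The tolerance values, the test names, and the helper functions here are assumptions made for illustration; they are not taken from the ONNX Runtime test configuration.

```python
import math

# Hypothetical global defaults shared by every test.
DEFAULT_TOLERANCES = {"atol": 1e-5, "rtol": 1e-3}

# Hypothetical overrides: only the tests listed here deviate from the defaults.
PER_TEST_OVERRIDES = {"test_dft": {"atol": 1e-4}}


def tolerances_for(test_name):
    """Merge the defaults with any override registered for this test."""
    merged = dict(DEFAULT_TOLERANCES)
    merged.update(PER_TEST_OVERRIDES.get(test_name, {}))
    return merged


def outputs_match(test_name, expected, actual):
    """Compare two equal-length sequences using the test's tolerances."""
    tol = tolerances_for(test_name)
    return all(
        math.isclose(e, a, rel_tol=tol["rtol"], abs_tol=tol["atol"])
        for e, a in zip(expected, actual)
    )


# An error of about 5e-5 fails under the defaults but passes with the override.
print(outputs_match("test_other", [0.0], [5e-5]))
print(outputs_match("test_dft", [0.0], [5e-5]))
```

Keeping the overrides in one table lets a slightly noisy test stay enabled without loosening the tolerance for every other test.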
Scott McKay
6bf6bac1fd
Add patching of xnnpack CMakeLists.txt to allow building with Emscripten. (#11829) 2022-06-14 09:31:17 +10:00
Hector Li
7582644f57
cmake changes for SNPE EP (#11821)
* move code used to find the SNPE libs to a separate cmake file

* Roll back the change for libc++_shared, it's the one from SNPE SDK, otherwise it will cause uncaught exception of type std::bad_cast because of conflict
2022-06-13 08:15:37 -07:00
Dwayne Robinson
50e0a193c8 Merge branch 'master' into user/dwayner/DmlEp1.9 2022-06-11 19:01:51 -07:00
Dwayne Robinson
76024b8a6a Update DirectML.dll to 1.9.0 Preview 2022-06-11 18:51:32 -07:00
pengwa
fb88efbe18
End to end run pass (on device training) (#11694)
* lr_scheduler implementation

(cherry picked from commit d9c2552b3a3b2ff38ee0a14770257aa1169f6fa9)

* refactor Module/Optimizer constructor.

* add intermediate API layer bridging public interfaces with internal ones.

* synthetic data loader

* make end to end run pass

* avoid many session input copy (CPU to GPU)
some clean up

* NVTX for runner

* minor fix after sync

* revert to let Module/Optimizer handle session creation.

* fix tests & test file folder consolidation

* refine based on comments & fix cpplint

* typos
2022-06-10 15:25:44 -07:00
Guenther Schmuelling
d4ea59654c
make xnnpack build for ort-web (#11745)
* make xnnpack build for ort-web

* make ci happy
2022-06-10 08:47:57 -07:00
Vincent Wang
5ecfaef042
ATen Fallback for Inference (#11597)
* aten op for inference

* fix build error

* move some code to training only

* remove domain from operator name

* move aten_op_executor ext out from ortmodule

* add pipeline

* add exec mode

* fix script

* fix ut script

* fix test pipeline

* failure test

* rollback

* bugfix

* resolve comments

* enable aten for python build only

* fix win build

* use target_compile_definitions

* support io binding

* turn off aten by default

* fix ut

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: zhijxu <zhijxu@microsoft.com>
2022-06-09 16:07:30 +08:00
Alex Fuller
8156b9370c
[Abseil] Adding URL_HASH so that an existing archive can be used from disk (#11690) 2022-06-08 17:12:59 -07:00
pengwa
540935aace
lr scheduler implementation (on device training) (#11714)
* lr_scheduler implementations

* rename test_runner to test_trainer.

* add unit tests

* address comments
2022-06-09 08:04:30 +08:00
Changming Sun
eeeb249a27
Update onnxruntime_providers.cmake to remove the reference of "onnxruntime_tvm_dependencies" (#11780) 2022-06-08 09:06:00 -07:00
Valery Chernov
4296968f20
[TVM EP] update set input method for VirtualMachine (#11674)
* update TVM

* get alignment constant from TVM

* update TVM_VM_SetInputs to upstream with TVM API

* fix CI issue: update TVM EP dependencies

* add sudo

* revert changes needed to install missing package

* add package for TVM EP CI

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-06-04 09:31:01 +02:00
Hector Li
95a16c1ffe
Snpe ep (#11665)
* Initiate Ort SNPE EP
* fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master
* correct the source path for libonnxruntime.so while building for android package
* add AdditionalDependencies for arm64
* On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given.
* fix build failure if snpe is not enabled
* update doc for contrib op
* separate out snpe ep settings to onnxruntime_snpe_provider.cmake
* renaming according to review comments
* update according to review comments
2022-06-03 14:10:02 -07:00
Scott McKay
4445dd6bc1
XNNPACK EP (#11445)
* Implement XNNPACK support via an EP.
  * Layout transform uses the GraphPartitioner infrastructure.
  * Node fusion is supported.
  * Conv and MaxPool implementations were ported from Changming's PR.
  * Added optional mutex in InferenceSession::Run as we only want to allow sequential calls if xnnpack is enabled
2022-06-03 20:22:34 +10:00
ashbhandare
1c316d0e39
Parameter,Module and Optimizer changes (#11494)
* Module step

* On device training offline composition

* Working grad accumulation with test for TrainStep

* Temp changes

* Revert "On device training offline composition"

This reverts commit ec3da68247.

* cleanup

* Implement eval step

* Use new graphs and checkpoints

* Optimizer test, changes

* review comments

* review comments

* review comments

Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2022-05-31 09:20:47 -07:00
pengwa@microsoft.com
e1c63cb06a Merge branch 'master' of https://github.com/microsoft/onnxruntime into training_dev/on_device_poc 2022-05-28 01:54:17 +00:00
Scott McKay
4fabc400de
Fix CUDA 11.6 build error on Windows (#11578)
* Avoid windows header that defines 'small'
2022-05-28 08:04:46 +10:00
Jeff Bloomfield
a7fa735286 Merge remote-tracking branch 'origin/master' into WindowsAI 2022-05-27 12:53:54 -07:00
Baiju Meswani
3a22a866a1
On device training offline tooling (#11520) 2022-05-24 18:21:39 -07:00
Yi Zhang
a3f05da338
Revert "[TVM EP] update set input to remove excess copying inside TVM (#11247)" (#11504)
This reverts commit 5ae461ec0a.
2022-05-13 02:27:36 +08:00
Tianlei Wu
ece1274ffa
revert safeint version (#11500) 2022-05-12 11:24:43 -07:00
Tianlei Wu
f5473596fa
Change longformer default kernel (#11470)
* change default to compact memory kernel
* Remove a cuda stream synchronize that is not needed
* Update longformer benchmark tool
2022-05-11 10:54:59 -07:00