Commit graph

1959 commits

Author SHA1 Message Date
RandySheriffH
d35361bf9d
Fix python pipeline for AzureEP without using root (#16023)
Fix python pipeline for AzureEP without using root, this is for 1.15.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-22 16:38:47 -07:00
Changming Sun
0204594f90
Cleanup WASM cmake code (#15996)
### Description
Remove the "onnxruntime_BUILD_WEBASSEMBLY" cmake option. Use `if
(CMAKE_SYSTEM_NAME STREQUAL "Emscripten")` instead. It makes some code
look more nature.
For example,

```cmake
if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR onnxruntime_BUILD_WEBASSEMBLY)
```
becomes
```cmake
if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR CMAKE_SYSTEM_NAME STREQUAL "Emscripten")
```
2023-05-20 18:07:39 -07:00
Hector Li
4324d2173b
[QNN EP] Enable Qnn context cache to save model initialization time (#15815)
### Description
Enable Qnn Context cache feature to save model initialization time
Provider options:
qnn_context_cache_enable|1 to enable the cache feature
qnn_context_cache_path to set the cache path. It is set to model_file.onnx.bin by default.

### Motivation and Context
Model initialization time takes long because the cost of conversion from Onnx model to Qnn model. Qnn have feature to serialize the Qnn context to file, then next time user can load it from the cache context and execute the graph to save the cost.

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
2023-05-19 10:52:17 -07:00
RandySheriffH
4dfb89b3ad
Implement mutex-free spin lock for task queue (#14834)
Implemented "lock-free" spinlock to save CPU usage on context switching.
The change has been tested on queene service of Ads team, the lock-free
version of ort (40 threads) saves CPU usage on gen8 (128 logical
processors on 8 numa nodes) windows by nearly half, from 65% to 35%.

For 32 cores, the curve is flat:

Anubis, 32 vCPU, windows, hugging face models,
95 percentile E2E latency in ms:

model | mutex(ms) | mutex-free
--- | --- | ---
 alvert_base_v2 | 34.21 | 34.09
 bert_large_uncased | 116.27| 117.84
 bart_base | 72.06 | 71.99
 distilgpt2 | 25.43 | 25.02
 vit_base_patch16_224 | 37.33 | 37.76

Anubis, 32 vCPU win, Linux, 1st party models,
95 percentile E2E latency in ms:

model | mutex(ms) | mutex-free
--- | --- | ---
deepthink_v2 | 24.35 | 22.95
bing_feeds |  36.96 | 36.48
deep_writes |  14.46 | 14.32
keypoints |  9.34 | 7.69
model11 |  1.71 | 1.66
model12 |  1.82 | 1.44
model2 |  4.21 | 3.95
model6 |  1.08 | 1.05
agiencoder |  0.99 | 0.93
geminet_transformer |  5.32 | 5.24

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-19 10:12:10 -07:00
Patrice Vignola
310b22aa0c
[DML EP] Update DirectML version to 1.12.0 (#16011) 2023-05-18 19:37:12 -07:00
PeixuanZuo
d78bbf5ef2
[ROCm] remove ROCm5.2.3, ROCm5.3, ROCm5.4 from pipeline (#16004)
remove ROCm5.2.3, ROCm5.3, ROCm5.4 from pipeline.
2023-05-19 10:29:01 +08:00
Edward Chen
6d46007028
Add explicit 'set +x' before printing a vso[] command to avoid output getting parsed again with a trailing quote. (#15986)
Here's the motivating issue:
https://github.com/microsoft/azure-pipelines-tasks/issues/10331

Noticed some problems in other repos so also updating usages in ORT.

We may be fine now without it, but this change adds some safeguard against future additions of 'set -x' for debugging.
2023-05-17 19:30:28 -07:00
Changming Sun
d98763473a
Change CUDA pipelines to download CUDA SDK in every build job (#15915)
### Description
Change CUDA pipelines to download CUDA SDK in every build job


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-05-17 17:31:51 -07:00
cloudhan
856afa49dd
[C#] Add missing rocm csharp api (#15540) 2023-05-18 08:15:19 +08:00
Yi Zhang
6d43d51eb0
[Fix] No test result report while not using ctest (#15976)
### Description
1. Set gtest output while ctest is set to empty.
2. onnx_src in _deps shouldn't be removed because
onnx_test_pytorch_converted and onnx_test_pytorch_converted need to read
data from onnx/backend/test/data/..

### Motivation and Context
Test result report is important to find the flaky tests.

### To do
Tests are not inconsistent.
If ctest_path is empty, onnx_test_pytorch_converted and
onnx_test_pytorch_converted will not be executed, if it's not,
onnxruntime_mlas_test will not be executed.


270c09a37f/tools/ci_build/build.py (L1743-L1753)
2023-05-17 08:31:16 -07:00
Jian Chen
2881d849d4
Update Win-CPU-2021 to onnxruntime-Win-CPU-2022 (#15967)
### Description
After this PR there are following pool need to be updated.

old|new|note
---|---|---
onnxruntime-Win2019-GPU-dml-A10|tbd|
onnxruntime-Win2019-GPU-T4|onnxruntime-Win2022-GPU-T4|
onnxruntime-Win2019-GPU-training-T4|onnxruntime-Win2022-GPU-T4|ame as
the above because we do not have many T4 GPUs
onnxruntime-tensorrt8-winbuild-T4|tbd|
aiinfra-dml-winbuild|tbd|
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-05-17 08:29:27 -07:00
kailums
f62f722c70
integrate triton into ort (#15862)
### Description
In some scenarios, the triton written kernels are more performant than
CK or other handwritten kernels, so we implement a framework that
onnxruntime can use these triton written kernels.

This PR is to integrate triton into ort, so that ort can use kernels
that written and compiled by triton.

The main change focus on two part:
1. a build part to compile triton written kernel and combine these
kernels into libonnxruntime_providers_rocm.so
2. a loader and launcher in c++, for loading and launch triton written
kernels.

#### Build

To compile triton written kernel, add a script
`tools/ci_build/compile_triton.py`. This script will dynamic load all
kernel files, compile them, and generate `triton_kernel_infos.a` and
`triton_kernel_infos.h`.

`triton_kernel_infos.a` contains all compiled kernel instructions, this
file will be combined into libonnxruntime_providers_rocm.so, using
--whole-archive flag.

`triton_kernel_infos.h` defines a const array that contains all the
metadata for each compiled kernel. These metadata will be used for load
and launch. So this header file is included by 'triton_kernel.cu' which
defines load and launch functions.

Add a build flag in build.py and CMakeList.txt, when building rocm
provider, it will call triton_kernel build command, and generate all
necessary files.

#### C++ Load and Launch

On c++ part, we implement load and launch functions in triton_kernel.cu
and triton_kernel.h.

These two files located in `providers/cuda`, and when compiling rocm,
they will be hipified. so this part supports both cuda and rocm. But
currently we only call triton kernel in rocm.

We also implement a softmax triton op for example. Because there will
generate many kernels for different input shape of softmax, we use
TunableOp to select the best one.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-05-17 09:35:28 +08:00
Jian Chen
780442b9f6
Change windows machine pools to use VS2022
 (#15806)
### Description
<!-- Describe your changes. -->



Old pool | New pool | Notes
-- | -- | --
onnxruntime-Win-CPU-2019 | onnxruntime-Win-CPU-2022 |  
onnxruntime-Win2019-CPU-training | onnxruntime-Win2022-CPU-training-AMD
|  
onnxruntime-Win2019-CPU-training-AMD |
onnxruntime-Win2022-CPU-training-AMD | Same as the above
onnxruntime-Win2019-GPU-dml-A10 | Need be created | You need to create a
new image for it first
onnxruntime-Win2019-GPU-T4 | onnxruntime-Win2022-GPU-T4 |  
onnxruntime-Win2019-GPU-training-T4 | onnxruntime-Win2022-GPU-T4 | Same
as the above because we do not have many T4 GPUs
onnxruntime-tensorrt8-winbuild-T4| TBD|TBD
Win-CPU-2021|onnxruntime-Win-CPU-2022| will do it in next PR
Win-CPU-2019|onnxruntime-Win2022-Intel-CPU'| Intel CPU needed for
win-ci-pipeline.yml -> `stage: x64_release_dnnl`

<br class="Apple-interchange-newline">

### Motivation and Context
With vs2022 we can take the advantage of 64bit compiler. It also with
better c++20 support
2023-05-16 10:34:34 -07:00
RandySheriffH
7faad53632
Set default option for package name and build arg options (#15958)
Set default value for parameters in nuget-zip pipeline, and only apply
the configurations when they are not "NONE".

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-16 09:07:38 -07:00
cloudhan
dc383ed4ce
Basic CSharp packaging support for ROCm EP (#15535)
This PR mainly fixes building errors when trying to build nupkg for ROCm EP.
It also slighly improve the packaging logic so that devlopers can
produce the nupkg on linux natively.
2023-05-16 07:27:38 +08:00
yf711
825d691617
Unify cuda & trt version on few CIs (#15943)
### Description
The cuda & trt version of some CIs didn't sync with the majority. 
Unifying cuda version as 11.8 and trt version as 8.6 on these CIs
2023-05-15 09:54:30 -07:00
Rachel Guo
18133ddadb
[doc] add LeakyRelu to coreml supported ops (#15944)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-05-15 09:46:30 -07:00
Adrian Lizarraga
5542e70dd1
[QNN EP] Update default QNN SDK version to 2.10 for QNN NuGet pipeline (#15899)
### Description
Updates the default QNN SDK version to 2.10 for the QNN NuGet pipeline.

### Motivation and Context
Ensures that the daily QNN NuGet pipeline builds ORT using the latest
QNN SDK by default.
2023-05-15 09:17:42 -07:00
PeixuanZuo
af6cb2af87
[ROCm] update ROCm/MIGraphX CI to ROCm5.5 (#15905)
update ROCm/MIGraphX CI to ROC5.5.

TODO:
two PR to fix failure on
orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py
-
test_gradient_correctness_minmax/test_gradient_correctness_argmax_unfold/test_gradient_correctness_argmax_diagonal
(https://github.com/microsoft/onnxruntime/pull/15903)
- test_ortmodule_attribute_name_collision_warning
(https://github.com/microsoft/onnxruntime/pull/15884)
2023-05-15 10:28:15 +08:00
Yi Zhang
b20d5e85d5
Update Cuda to 11.8 in 2 Linux GPU workflows. (#15925)
### Description
use template variable for cuda version


### Motivation and Context
2023-05-14 12:51:25 +08:00
RandySheriffH
7c4e8267e7
Implement openAI endpoint invoker for nuget (#15797)
Implement openAI audio endpoint, and enable nuget packaging.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-11 22:04:02 -07:00
Yi Zhang
0e7ae13e74
Run Linux GPU tests in docker container (#15872)
### Description
Run  Linux GPU tests in docker container

### Motivation and Context
2023-05-12 06:29:22 +08:00
Jian Chen
1a73d61829
Update eigen to 3.4 and remove the eigen from git submodule (#15875)
### Description
Update eigen to 3.4 and remove the eigen from git submodule

### Motivation and Context
We need to have eigen 3.4 for c++20
2023-05-11 11:56:59 -07:00
Changming Sun
7c58d013aa
Remove Ubuntu 18.04 usages (#15781)
### Description
Remove Ubuntu 18.04 usages because it will be EOL this month.

### Motivation and Context
2023-05-11 11:44:00 -07:00
Yulong Wang
756cf3a76f
increase web CI timeout (#15876)
### Description
The CI is extremely slow on downloading source code (~1MB/sec) so the
web CI went timeout. This is blocking the PR/checks.

Increase the timeout temporarily.
2023-05-11 11:17:46 -07:00
liqun Fu
ac9ae9f7c5
update onnx release 1.14 for docker files (#15680)
### Description
this is for ort 1.15 release to work with onnx 1.14
It shall be merged after onnx 1.14 release and before ort 1.15 release.


### Motivation and Context

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2023-05-10 13:15:56 -07:00
Nat Kershaw (MSFT)
36c9ae0f58
Fix release version suffix for RC builds (#15865) 2023-05-09 23:06:08 -07:00
Sumit Agarwal
b473e3f3c6
[DML EP] Update DirectML version to 1.11.0 (#15858)
### Description
- Update DML version to 1.11.0
- Disable Gemm+Softmax fusion



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-05-09 12:48:15 -07:00
Jian Chen
34cb293c6b
Remove unused ADO YML pipeline template (#15857)
### Description
Remove unused ADO YML pipeline template



### Motivation and Context
Clean up and reduce our codebase.
2023-05-09 09:15:04 -07:00
Wanming Lin
00b1e79e04
Support WebNN EP (#15698)
**Description**: 

This PR intends to enable WebNN EP in ONNX Runtime Web. It translates
the ONNX nodes by [WebNN
API](https://webmachinelearning.github.io/webnn/), which is implemented
in C++ and uses Emscripten [Embind
API](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html#).
Temporarily using preferred layout **NHWC** for WebNN graph partitions
since the restriction in WebNN XNNPack backend implementation and the
ongoing
[discussion](https://github.com/webmachinelearning/webnn/issues/324) in
WebNN spec that whether WebNN should support both 'NHWC' and 'NCHW'
layouts. No WebNN native EP, only for Web.

**Motivation and Context**:
Allow ONNXRuntime Web developers to access WebNN API to benefit from
hardware acceleration.

**WebNN API Implementation Status in Chromium**:
- Tracked in Chromium issue:
[#1273291](https://bugs.chromium.org/p/chromium/issues/detail?id=1273291)
- **CPU device**: based on XNNPack backend, and had been available on
Chrome Canary M112 behind "#enable-experimental-web-platform-features"
flag for Windows and Linux platforms. Further implementation for more
ops is ongoing.
- **GPU device**: based on DML, implementation is ongoing.

**Open**:
- GitHub CI: WebNN currently is only available on Chrome Canary/Dev with
XNNPack backend for Linux and Windows. This is an open to reviewers to
help identify which GitHub CI should involved the WebNN EP and guide me
to enable it. Thanks!
2023-05-08 21:25:10 -07:00
Yulong Wang
0457fd0b40
upgrade emsdk to 3.1.37 (#15817)
### Description
upgrade emsdk to 3.1.37

WIP branch to debug the mystery memory issue in web assembly
multi-thread build.
2023-05-08 16:49:47 -07:00
Yi Zhang
045c623415
Make Nuget workflow easy to debug (#15808)
### Description
Fix the bug in #15693 



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-05-08 20:53:08 +08:00
Nat Kershaw (MSFT)
5e9b42326c
Fix packaging pipeline for nightly builds (#15839) 2023-05-07 20:42:38 -07:00
PeixuanZuo
41457885e0
[ROCm] add rocm5.5 to python package pipeline (#15820)
add rocm5.5 to python packaging pipeline.

https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=306082&view=results

TODO:
Remove version 5.2.3, 5.3.2 and 5.4 in the next PR.
2023-05-06 10:21:15 +08:00
Nat Kershaw (MSFT)
ed31e4b737
Add nuget release version suffix to support publishing rcs to nuget.org (#15791) 2023-05-05 18:18:24 -07:00
Adrian Lizarraga
45f5c27632
[QNN EP] Update default QNN SDK to version 2.10.0 (#15818)
### Description
- Updates the default QNN SDK for CI pipelines to version 2.10.0.
- Disables convolution op tests that run on the QNN CPU backend due to a
potential bug with QNN SDK 2.10.0.

### Motivation and Context
Allows us to test the latest QNN SDK in default CI pipeline runs.
2023-05-05 13:01:21 -07:00
Guenther Schmuelling
5a43828b3d
update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (#15688)
needed to get tokenizers/decode for whisper

---------

Co-authored-by: Shalva Mist <shalvamist@microsoft.com>
2023-05-05 09:48:07 -07:00
Scott McKay
d1b2b35cd2
Various fixes to the CSharp setup (#15782)
### Description
<!-- Describe your changes. -->
Various fixes to the CSharp setup
- fix warnings
- fix invalid tests
- update test sdk nuget package
  - enables testing on linux
  - fixes issue with some unit tests not running in CI
- run unit tests in linux pipeline using dotnet

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Unit tests weren't breaking in CIs for both Windows and Linux builds and
should have been.
2023-05-05 14:27:30 +10:00
Yulong Wang
4712009f8a
[js/web] add target ort.webgpu.min.js (#15780)
### Description
add target ort.webgpu.min.js

WebGPU is experimental feature, so I don't want to put webgpu into the
ort.min.js file. This change adds 2 ways for users to access ort-web
with webgpu:
- using script tag: by URL
`https://cdn.jsdelivr.net/npm/onnxruntime-web@1.15.0/dist/ort.webgpu.min.js`
( this URL is not ready yet )
- using `import()`: use `import { Tensor, InferenceSession } from
'onnxruntime-web/webgpu';` - 'onnxruntime-web/webgpu' instead of
'onnxruntime-web'
2023-05-04 10:05:39 -07:00
Yulong Wang
33d1372729
[wasm] revert emsdk to v3.1.19 (#15793)
### Description
latest emsdk generated multi-thread version sometimes crash with unknown
reason ( error: memory access out of bounds ).

we don't want to break existing ort-web users, so revert emsdk back to
3.1.19 (same to what ort v1.14.0 uses)
2023-05-04 01:15:01 -07:00
Baiju Meswani
e464588a0e
Avoid generating training documentation during packaging (#15795) 2023-05-03 19:09:07 -07:00
Changming Sun
1fb2f2605b
Update VERSION_NUMBER (#15773)
### Description

1. Update VERSION_NUMBER for preparing the upcoming release. This PR's
commit will not be included in the 1.15 release branch
2. Delete package/rpm/onnxruntime.spec since it was not used in past
years.

### Motivation and Context
Preparing the release.

Fixed
[AB#15311](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15311)
2023-05-03 15:07:34 -07:00
Baiju Meswani
ba7b83ff3c
Remove onnxruntime_PYBIND_EXPORT_OPSCHEMA definition from onnxruntime (#15776) 2023-05-03 13:08:35 -07:00
Changming Sun
d53324d4a7
Update cmake version in a few places (#15775)
### Description
They were missed in #15707 , because they are not in common places for Dockerfiles.

Though this commit updated tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile, it won't automatically take effect. The image needs to be manually generated and pushed to a place, and before doing that our CMakeLists.txt also needs to be tweaked a little bit.
2023-05-02 22:56:28 -07:00
Yulong Wang
ef1f17f3dc
[wasm/JSEP] add threaded build to artifacts (#15777)
### Description
This is the first part to create a webassembly artifacts for ort-web
webgpu EP (wasm build).

there will be following steps to consume the artifacts in web build
2023-05-02 17:53:44 -07:00
Baiju Meswani
2d519d21af
Python documentation for onnxruntime-training (#15765) 2023-05-02 16:58:16 -07:00
Jian Chen
abdd4f518a
Update TRT Windows Cuda 11.6 to 11.8 (#15746)
### Description
Update TRT Windows cuda 11.6 to 11.8



### Motivation and Context
We are adapting newer version of cuda systemwide.
2023-05-02 12:23:13 -07:00
Changming Sun
328cabb194
Download protoc from Github Release instead of Nuget (#15731)
### Description
Download protoc from Github Release instead of Nuget to avoid having
dependency on nuget.exe on Linux

### Motivation and Context
To avoid having dependency on nuget.exe on Linux. Many users' build
environment do not have nuget or dotnet.
2023-05-02 12:18:59 -07:00
Changming Sun
5352f6d9b0
Make "--cuda_version" build arg optional (#15758)
### Description
This change will allow us building CUDA EP without installing CUDA SDK
on Windows.

### Motivation and Context
Nvidia's CUDA installer comes with a VS extension. In the past, we
require installing the extension. It is a little bit inconvenient since:
1. Visual Studio must be installed before CUDA SDK. CUDA's installer
will not install the extension if your machine doesn't have Visual
Studio.
2. We need to install CUDA SDK on our build machines, instead of just
downloading it and using it.

After this change, we will not need to install CUDA SDK on our build
machines. So it will be easier to add a support for a different CUDA
version.

Also, fix two PreFast warnings.
2023-05-01 18:00:47 -07:00
Ashwini Khade
0ffae8073b
Creating Nuget and Android packages for Training (#15712)
### Description
This PR creates Nuget and Android for Training. 


### Motivation and Context
These packages are intended to be released in ORT 1.15 to enable
On-Device Training Scenarios.

## Packaging Story for Learning On The Edge Release
### Nuget Packages:
1. New Native package -> **Microsoft.ML.OnnxRuntime.Training** (Native
package will contain binaries for: win-x86, win-x64, win-arm, win-arm64,
linux-x64, linux-arm64, android)
2. C# bindings will be added to existing package ->
**Microsoft.ML.OnnxRuntime.Managed**

### Android Package published to Maven:
1. New package for training (full build) ->
**onnxruntime-training-android-full-aar**

### Python Package published to PyPi:
1. Python bindings and offline tooling will be added to the existing ort
training package -> **onnxruntime-training**
2023-05-01 12:59:56 -07:00