### Description
<!-- Describe your changes. -->
Various fixes to the CSharp setup
- fix warnings
- fix invalid tests
- update test sdk nuget package
- enables testing on linux
- fixes issue with some unit tests not running in CI
- run unit tests in linux pipeline using dotnet
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Unit tests weren't breaking in CIs for both Windows and Linux builds and
should have been.
### Description
Fix two errors that is only encountered on windows
### Motivation and Context
For onnxruntime::VitisAIProviderFactoryCreator::Create, it would cause
the compile error.
For if (it == provider_options_map.end()), it would cause an error but
execute as normal
Co-authored-by: Zhang <yueqingz@amd.com>
Bumps [engine.io](https://github.com/socketio/engine.io) from 6.4.1 to
6.4.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/socketio/engine.io/releases">engine.io's
releases</a>.</em></p>
<blockquote>
<h2>6.4.2</h2>
<p>⚠️ This release contains an important security fix
⚠️</p>
<p>A malicious client could send a specially crafted HTTP request,
triggering an uncaught exception and killing the Node.js process:</p>
<pre><code>TypeError: Cannot read properties of undefined (reading
'handlesUpgrades')
at Server.onWebSocket (build/server.js:515:67)
</code></pre>
<p>Please upgrade as soon as possible.</p>
<h3>Bug Fixes</h3>
<ul>
<li>include error handling for Express middlewares (<a
href="https://redirect.github.com/socketio/engine.io/issues/674">#674</a>)
(<a
href="93957828be">9395782</a>)</li>
<li>prevent crash when provided with an invalid query param (<a
href="fc480b4f30">fc480b4</a>)</li>
<li><strong>typings:</strong> make clientsCount public (<a
href="https://redirect.github.com/socketio/engine.io/issues/675">#675</a>)
(<a
href="bd6d4713b0">bd6d471</a>)</li>
<li><strong>uws:</strong> prevent crash when using with middlewares (<a
href="8b22162903">8b22162</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/tyilo"><code>@tyilo</code></a> and <a
href="https://github.com/cieldeville"><code>@cieldeville</code></a> for
helping!</p>
<h4>Links</h4>
<ul>
<li>Diff: <a
href="https://github.com/socketio/engine.io/compare/6.4.1...6.4.2">https://github.com/socketio/engine.io/compare/6.4.1...6.4.2</a></li>
<li>Client release: -</li>
<li>ws version: <a
href="https://github.com/websockets/ws/releases/tag/8.11.0">~8.11.0</a>
(no change)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/socketio/engine.io/blob/main/CHANGELOG.md">engine.io's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/socketio/engine.io/compare/6.4.1...6.4.2">6.4.2</a>
(2023-05-02)</h2>
<p>⚠️ This release contains an important security fix
⚠️</p>
<p>A malicious client could send a specially crafted HTTP request,
triggering an uncaught exception and killing the Node.js process:</p>
<pre><code>TypeError: Cannot read properties of undefined (reading
'handlesUpgrades')
at Server.onWebSocket (build/server.js:515:67)
</code></pre>
<p>Please upgrade as soon as possible.</p>
<h3>Bug Fixes</h3>
<ul>
<li>include error handling for Express middlewares (<a
href="https://redirect.github.com/socketio/engine.io/issues/674">#674</a>)
(<a
href="93957828be">9395782</a>)</li>
<li>prevent crash when provided with an invalid query param (<a
href="fc480b4f30">fc480b4</a>)</li>
<li><strong>typings:</strong> make clientsCount public (<a
href="https://redirect.github.com/socketio/engine.io/issues/675">#675</a>)
(<a
href="bd6d4713b0">bd6d471</a>)</li>
<li><strong>uws:</strong> prevent crash when using with middlewares (<a
href="8b22162903">8b22162</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/tyilo"><code>@tyilo</code></a> and <a
href="https://github.com/cieldeville"><code>@cieldeville</code></a> for
helping!</p>
<h3>Dependencies</h3>
<ul>
<li><a
href="https://github.com/websockets/ws/releases/tag/8.11.0"><code>ws@~8.11.0</code></a>
(no change)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="95e215387c"><code>95e2153</code></a>
chore(release): 6.4.2</li>
<li><a
href="fc480b4f30"><code>fc480b4</code></a>
fix: prevent crash when provided with an invalid query param</li>
<li><a
href="0141951185"><code>0141951</code></a>
refactor(types): ensure compatibility with Express middlewares</li>
<li><a
href="8b22162903"><code>8b22162</code></a>
fix(uws): prevent crash when using with middlewares</li>
<li><a
href="93957828be"><code>9395782</code></a>
fix: include error handling for Express middlewares (<a
href="https://redirect.github.com/socketio/engine.io/issues/674">#674</a>)</li>
<li><a
href="911d0e3575"><code>911d0e3</code></a>
refactor: return HTTP 400 upon invalid request overlap</li>
<li><a
href="bd6d4713b0"><code>bd6d471</code></a>
fix(typings): make clientsCount public (<a
href="https://redirect.github.com/socketio/engine.io/issues/675">#675</a>)</li>
<li>See full diff in <a
href="https://github.com/socketio/engine.io/compare/6.4.1...6.4.2">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
add target ort.webgpu.min.js
WebGPU is experimental feature, so I don't want to put webgpu into the
ort.min.js file. This change adds 2 ways for users to access ort-web
with webgpu:
- using script tag: by URL
`https://cdn.jsdelivr.net/npm/onnxruntime-web@1.15.0/dist/ort.webgpu.min.js`
( this URL is not ready yet )
- using `import()`: use `import { Tensor, InferenceSession } from
'onnxruntime-web/webgpu';` - 'onnxruntime-web/webgpu' instead of
'onnxruntime-web'
Implement a set of new APIs for lightweight custom ops registration, to
save efforts from schema-composing.
A few highlights:
- Support build-time type inference;
- Support function-as-op for "stateless" ops;
- Support structure-as-op for "stateful" ops;
- Support varied input/output forms such as span, scalar, and tensors,
either optional or non-optional.
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
### Description
latest emsdk generated multi-thread version sometimes crash with unknown
reason ( error: memory access out of bounds ).
we don't want to break existing ort-web users, so revert emsdk back to
3.1.19 (same to what ort v1.14.0 uses)
### Description
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act.
Specifically tested on on Resnet50v1
### Motivation and Context
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act.
Specifically tested on on Resnet50v1
### Description
1. Update VERSION_NUMBER for preparing the upcoming release. This PR's
commit will not be included in the 1.15 release branch
2. Delete package/rpm/onnxruntime.spec since it was not used in past
years.
### Motivation and Context
Preparing the release.
Fixed
[AB#15311](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15311)
### Description
They were missed in #15707 , because they are not in common places for Dockerfiles.
Though this commit updated tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile, it won't automatically take effect. The image needs to be manually generated and pushed to a place, and before doing that our CMakeLists.txt also needs to be tweaked a little bit.
### Description
This is the first part to create a webassembly artifacts for ort-web
webgpu EP (wasm build).
there will be following steps to consume the artifacts in web build
### Description
Download protoc from Github Release instead of Nuget to avoid having
dependency on nuget.exe on Linux
### Motivation and Context
To avoid having dependency on nuget.exe on Linux. Many users' build
environment do not have nuget or dotnet.
### Description
Parallelize fp16 pooling operators
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Currently compiling with clang + cuda leads to:
```
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/make_string.h:33:6: error: call to function 'operator<<' that is neither visible in the template definition nor found by argument-dependent lookup
ss << t;
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<gsl::span<const long, 18446744073709551615>>' requested here
MakeStringImpl(ss, args...);
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/make_string.h:46:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, gsl::span<const long, 18446744073709551615>>' requested here
MakeStringImpl(ss, args...);
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/make_string.h:93:18: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, gsl::span<const long, 18446744073709551615>>' requested here
return detail::MakeStringImpl(detail::if_char_array_make_ptr_t<Args const&>(args)...);
^
/code/build/_deps/onnxruntime-src/onnxruntime/contrib_ops/cuda/quantization/qordered_ops/qordered_qdq.cc:73:12: note: in instantiation of function template specialization 'onnxruntime::MakeString<char[39], gsl::span<const long, 18446744073709551615>>' requested here
return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Shape not meet clean tile requirement!", dims);
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/common.h:188:48: note: expanded from macro 'ORT_MAKE_STATUS'
::onnxruntime::MakeString(__VA_ARGS__))
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/framework/tensor_shape.h:201:15: note: 'operator<<' should be declared prior to the call site or in namespace 'gsl'
std::ostream& operator<<(std::ostream& out, const TensorShape& shape);
^
1 error generated.
```
### Description
Adds an option to load local state dictionary for whisper model export.
### Motivation and Context
This is useful to demonstrate workflow of using ORT Training to get
model weights, downloading said weights onto a local gpu-enabled device,
exporting the custom model using `convert_to_onnx.py`, and then nicely
feeding the .onnx file into ORT InferenceSession.
### Description
This change will allow us building CUDA EP without installing CUDA SDK
on Windows.
### Motivation and Context
Nvidia's CUDA installer comes with a VS extension. In the past, we
require installing the extension. It is a little bit inconvenient since:
1. Visual Studio must be installed before CUDA SDK. CUDA's installer
will not install the extension if your machine doesn't have Visual
Studio.
2. We need to install CUDA SDK on our build machines, instead of just
downloading it and using it.
After this change, we will not need to install CUDA SDK on our build
machines. So it will be easier to add a support for a different CUDA
version.
Also, fix two PreFast warnings.
### Description
This PR creates Nuget and Android for Training.
### Motivation and Context
These packages are intended to be released in ORT 1.15 to enable
On-Device Training Scenarios.
## Packaging Story for Learning On The Edge Release
### Nuget Packages:
1. New Native package -> **Microsoft.ML.OnnxRuntime.Training** (Native
package will contain binaries for: win-x86, win-x64, win-arm, win-arm64,
linux-x64, linux-arm64, android)
2. C# bindings will be added to existing package ->
**Microsoft.ML.OnnxRuntime.Managed**
### Android Package published to Maven:
1. New package for training (full build) ->
**onnxruntime-training-android-full-aar**
### Python Package published to PyPi:
1. Python bindings and offline tooling will be added to the existing ort
training package -> **onnxruntime-training**
### Description
While building ORT for DML EP with `dml_EXTERNAL_PROJECT` flag, 2
variables (`DML_SHARED_LIB`, `DML_PACKAGE_DIR`) value is not set
properly.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
ExecutionProvider API refactor - replace OrtMemoryInfo with OrtDevice
### Motivation and Context
Currently “Location” is represented as ORTMemoryInfo, which is OrtDevice
+ OrtMemType, while OrtDevice is represent as DeviceType + DeviceId +
MemType. As we can see there is some unnecessary hierarchy, the proposal
is to make it a clear definition that to use OrtDevice as an abstraction
for Location
---------
Co-authored-by: Lei Cao <leca@microsoft.com>
### Description
Enabled the use of per channel Bias and Mean normalization when converting an image <--> tensor.
Added a few bug fixes and updates to the relevant E2E tests.
---------
Co-authored-by: shalvamist <shalva.mist@microsoft.com>
Symbol visibility from DllImport is inconsistent across platforms resulting in the symbol not necessarily being visible to ORT native code that tries to look it up by name.
Best solution is to use DllImport to load the library and to call the registration function directly. That requires the native SessionOptions handle and OrtApiBase struct. We could either make those public, or provide a helper where the user passes in a delegate from their DllImport. Can add when needed.
Implement a set of new APIs for lightweight custom ops registration, to
save efforts on schema-composing.
A few highlights:
1. Support build-time type inference;
2. Support function-as-op for "stateless" ops;
3. Support structure-as-op for "stateful" ops;
4. Support varied input/output forms such as span, scalar, and tensors,
either optional or non-optional.
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
### Description
Augment nhwc graph optimizer to accommodate fp16 operators.
### Motivation and Context
With new fp16 conv operator added. This operator prefers NHWC data
layout. We need to augment existing graph optimizers to better utilize
the new operator.
### Description
Originally VitisAI EP only works with old version of VitisAI release.
### Motivation and Context
Update VitisAI EP so that it works together with the current VitisiAI
3.5 and further version of VitisAI. We try our best to make it forward
compatible.
---------
Co-authored-by: Wang Chunye <chunywan@xilinx.com>
Co-authored-by: mingyue <mingyue@amd.com>
Co-authored-by: mingyueliuh <131847423+mingyueliuh@users.noreply.github.com>
Co-authored-by: liumingyue <mingyue@xilinx.com>
Co-authored-by: moore-ch <129165652+moore-ch@users.noreply.github.com>
Co-authored-by: shoucair <shoucai.ren@amd.com>
Co-authored-by: zz002 <zhenze.wang@amd.com>
Co-authored-by: BoarQing <yuz75@Pitt.edu>
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
This addresses a performance regression in some INT8 models with the
DirectML EP by defaulting OrtSessionOptionsDisableQuantQDQ to 1 when the
EP is registered.
This regression occured due to the introduction of the QDQ propagation
transformer, which is based on this session option. That transformer
maximizes the number of nodes which are executed as quantized by
logically propagating quantize operators upstream and dequantize
operators downstream. However, it does this simply by inserting QDQ
pairs, with an expectation that something will recognize sequences of
DQ->Op->Q. This logic and related L2 transformers are not currently
enabled for the DirectML EP.
This change also removes a noisy warning when the session option for
memory pattern is overriden as the DirectML EP is registered.
### Description
Walkaround of https://github.com/microsoft/onnxruntime/issues/15521.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Previous behavior of TRT EP to set TRT optimization profiles for dynamic
shape input is based on input tensor values. Users can't explicitly
specify the profiles.
This PR makes users capable of specifying min/max/opt profiles through
newly added three provider options:
`trt_profile_min_shapes`, `trt_profile_max_shapes` and
`trt_profile_opt_shapes`
with the format of "input1:dim1xdim2...,input2:dim3xdim4...".
(Note: It's similar to --minShapes, --maxShapes and --optShapes of
trtexec command-line
[flags](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec-flags))
For example, if you are using onnxruntime_perf_test, you can try this:
`./onnxruntime_perf_test -e tensorrt -r 1 -i
"trt_profile_min_shapes|imgs:1x3x384x288
trt_profile_max_shapes|imgs:32x3x384x288
trt_profile_opt_shapes|imgs:16x3x384x288" your_model_path`
If the engine cache is enabled, you still need to provide these three
explicit provider options in order to use this feature. ORT TRT will
compare the min/max/opt profile shape with the ones saved in .profile
file to decide whether to rebuild the engine.
Constraints to use these provider options: (1) Need to specify
min/max/opt profile shapes for all the dynamic shape input
This feature is also requested by other users:
https://github.com/microsoft/onnxruntime/issues/13851
### Description
<!-- Describe your changes. -->
Disable new test that is failing on linux. Not required for this
release. Will fix in the next week.
Marshal.Prelink can be used on Windows to make the symbol available but
Linux appears to work differently.
Also need to update the pre-checkin tests so this is tested early as
it's only failing in the E2E tests run in the packaging pipeline.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix packaging pipeline error.
### Description
This PR updates the documentation for using the Whisper custom export
scripts via the wheel.
### Motivation and Context
The path should say
`onnxruntime.transformers.models.whisper.convert_to_onnx` instead of
`onnxruntime.transformers.models.convert_to_onnx`.
Dynamic shapes was not working with serialized model so we are switching
to compile model method
### Motivation and Context
Dynamic shapes was not working with serialized model
- If it fixes an open issue, please link to the issue here. -->
Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
### Description
Adds support for the LRN operator to QNN EP.
### Motivation and Context
Enables basic models like googlenet and alexnet to run entirely on QNN
EP.