### Description
Bring the DML fix to 1.17.3 to resolve this issue:
https://github.com/microsoft/onnxruntime/issues/20180
### Motivation and Context
---------
Co-authored-by: cao lei <jslhcl@gmail.com>
Co-authored-by: Lei Cao <leca@microsoft.com>
### Description
### Motivation and Context
---------
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Jian Chen <cjian@microsoft.com>
Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
### Description
This PR is a preview of cherry-picks for ort-web to `rel-1.17.3` based
on `rel-1.17.2`.
<details>
<summary>Changes of ort-web to cherry-pick</summary>
The following commits are from the main branch.
`o` stands for pick, and `x` stands for skip.
```
o 2e0a388c36 [js/webgpu] Add HardSigmoid support (#19215)
o d226e40856 [js/webgpu] set query type in onRunStart (#19202)
o 61610ff986 [js/webgpu] Add FusedConv clip test case (#18900)
o a33b5bd1fa [JS/WebGPU] Added Uniforms to SkipLayerNorm. (#18788)
o 591f90c0b9 [js/webgpu] Fix issue of timestamp query (#19258)
o 7252c6e747 [WebNN EP] Support WebNN async API with Asyncify (#19145)
o 5b06505073 [js/webgpu] Fix Tanh explosion (#19201)
o 656ca66186 [js/webgpu] Support uniforms for conv, conv transpose, conv grouped (#18753)
o a3f0e2422b [js/webgpu] Support f16 uniform (#19098)
o 9e69606360 fix f16 for attention, enable slice and flatten for more types (#19262)
o 624b4e2063 [js/webgpu] Remove enableShapesUniforms (#19279)
o 90883a366a [js/webgpu] Add hardSigmoid activation for fusedConv (#19233)
o 85cef0af8c [js/webgpu] Support capture and replay for jsep (#18989)
o d73131cf0f [js/webgpu] Use DataType as uniform cpu type (#19281)
o dd1f6ccc45 [js/webgpu] resolve codescan alert (#19343)
o 3a2ab1963a [js/webgpu] Refactor createTensorShapeVariables (#18883)
o efc17e79de [js/webgpu] Fix the undefined push error (#19366)
x 50806a7dd5 [js/web] support external data in npm test (#19377)
o ccbe264a39 [js/webgpu] Add LeakyRelu activation for fusedConv (#19369)
o 5ff27ef02a [js/webgpu] support customop FastGelu (#19392)
x 03be65e064 [js/web] fix types exports in package.json (#19458)
o 06269a3952 [js/webgpu] allow uint8 tensors for webgpu (#19545)
o dfeda9019c [JS/WebGPU] Add MatMulNBits (#19446)
o 1b48054e1b [js/webgpu] Create Split indices helpers by rank, not by shape (#19554)
o 3fe2c137ee [js] small fix to workaround formatter (#19400)
x 70567a4b3a [js/web] use ApiTensor insteadof onnxjs Tensor in TensorResultValidator (#19358)
o 6e04e36e3f [js/common] upgrade tsc in common from 4.9.5 to 5.2.2 (#19317)
o 58f4921686 [js] changes to allow Float16Array if any polyfill is available (#19305)
o 57d6819212 [js/web] Fix fused-conv is not included in npm test (#19581)
o ebd220b073 Misspelling in README.md (#19433)
o 38c3432393 Bump ip from 1.1.8 to 1.1.9 in /js/react_native (#19582)
o fe82fccf1a [js/webgpu] Fix Conv2DTransposeMatMul f16 compilation failure (#19596)
o 76a2a487a1 Bump ip from 1.1.8 to 1.1.9 in /js/react_native/e2e (#19583)
o 29b1106033 [node] Switch to setImmediate to avoid starving the Node.js event loop (#19610)
o ae3d73c981 [JS/WebGPU] Fix Split and Where to handle corner cases. (#19613)
o aec2389ad0 [js/webgpu] allows a ProgramInfo's RunData to use zero sized output (#19614)
o bb43a0f133 [js/webgpu] minor fixes to make tinyllama work (#19564)
o 0edb035808 [js/web] fix suite test list for zero sized tensor (#19638)
o 3cb81cdde2 [js/common] move 'env.wasm.trace' to 'env.trace' (#19617)
o e30618d055 [js/webgpu] use Headless for webgpu test by default (#19702)
o f06164ef8b [js/web] transfer input buffer back to caller thread (#19677)
x a788514027 [js/web] dump debug logs for karma for diagnose purpose (#19785)
o 24b72d2613 [JS/WebGPU] Preserve zero size input tensor dims. (#19737)
o 4538d31a8b [js/webgpu] expose a few properties in WebGPU API (#19857)
o 53de2d8cb0 [js/webgpu] Enable GroupedConvVectorize path (#19791)
o ed250b88c3 [JS/WebGPU] Optimize MatMulNBits (#19852)
x e771a763c3 [js/test] align web test runner flags with ort.env (#19790)
o 79e50aeef3 [js/web] rewrite backend resolve to allow multiple EPs (#19735)
o acb0df2280 Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932)
o b29849a287 [js/common] fix typedoc warnings (#19933)
o afdab62f53 Bump follow-redirects from 1.15.4 to 1.15.6 in /js/web (#19949)
o 28ad6c3955 Bump follow-redirects from 1.15.4 to 1.15.6 in /js/node (#19951)
o 7e0d424934 accumulate in fp32 for Reduce* (#19868)
o 4c6a6a37f7 [js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387)
o 01c7aaf6aa [js/webgpu] allow setting env.webgpu.adapter (#19940)
o c45cff60cf [js/webgpu] fix maxpool / fp16 (#19981)
```
</details>
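The command list in the next block follows mechanically from this one: every `o` line becomes a `git cherry-pick` of its hash, and `x` lines are dropped. A minimal JavaScript sketch of that mapping; `toCherryPickCommands` is a hypothetical helper, not part of the repository's release tooling, and it does not reproduce the hand-written conflict note.

```js
// Hypothetical helper: derive `git cherry-pick` commands from the
// "o <sha> <title>" / "x <sha> <title>" list format used above.
function toCherryPickCommands(listText) {
  return listText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('o ')) // `o` = pick, `x` = skip
    .map((line) => `git cherry-pick ${line.split(/\s+/)[1]}`); // 2nd field is the short SHA
}

const sample = [
  'o 2e0a388c36 [js/webgpu] Add HardSigmoid support (#19215)',
  'x 50806a7dd5 [js/web] support external data in npm test (#19377)',
  'o ccbe264a39 [js/webgpu] Add LeakyRelu activation for fusedConv (#19369)',
].join('\n');

console.log(toCherryPickCommands(sample).join('\n'));
// git cherry-pick 2e0a388c36
// git cherry-pick ccbe264a39
```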
<details>
<summary>Cherry-pick commandlines</summary>
```sh
git cherry-pick 2e0a388c36
git cherry-pick d226e40856
git cherry-pick 61610ff986
git cherry-pick a33b5bd1fa
git cherry-pick 591f90c0b9
git cherry-pick 7252c6e747
git cherry-pick 5b06505073
git cherry-pick 656ca66186
git cherry-pick a3f0e2422b
git cherry-pick 9e69606360
git cherry-pick 624b4e2063
git cherry-pick 90883a366a
git cherry-pick 85cef0af8c #<<<<< Note: conflicts
git cherry-pick d73131cf0f
git cherry-pick dd1f6ccc45
git cherry-pick 3a2ab1963a
git cherry-pick efc17e79de
git cherry-pick ccbe264a39
git cherry-pick 5ff27ef02a
git cherry-pick 06269a3952
git cherry-pick dfeda9019c
git cherry-pick 1b48054e1b
git cherry-pick 3fe2c137ee
git cherry-pick 6e04e36e3f
git cherry-pick 58f4921686
git cherry-pick 57d6819212
git cherry-pick ebd220b073
git cherry-pick 38c3432393
git cherry-pick fe82fccf1a
git cherry-pick 76a2a487a1
git cherry-pick 29b1106033
git cherry-pick ae3d73c981
git cherry-pick aec2389ad0
git cherry-pick bb43a0f133
git cherry-pick 0edb035808
git cherry-pick 3cb81cdde2
git cherry-pick e30618d055
git cherry-pick f06164ef8b
git cherry-pick 24b72d2613
git cherry-pick 4538d31a8b
git cherry-pick 53de2d8cb0
git cherry-pick ed250b88c3
git cherry-pick 79e50aeef3
git cherry-pick acb0df2280
git cherry-pick b29849a287
git cherry-pick afdab62f53
git cherry-pick 28ad6c3955
git cherry-pick 7e0d424934
git cherry-pick 4c6a6a37f7
git cherry-pick 01c7aaf6aa
git cherry-pick c45cff60cf
```
</details>
<details>
<summary>Cherry-pick conflicts</summary>
- 85cef0af8c (#18989)
This change enables the graph capture feature for JSEP, and it landed
after the ROCM EP enabled its graph capture feature. However, the ROCM EP
graph capture feature was not cherry-picked into rel-1.17.2.
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jiajia Qin <jiajia.qin@intel.com>
Co-authored-by: Xu Xing <xing.xu@intel.com>
Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com>
Co-authored-by: Yang Gu <yang.gu@intel.com>
Co-authored-by: Wanming Lin <wanming.lin@intel.com>
Co-authored-by: Jiajie Hu <jiajie.hu@intel.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Matttttt <18152455+martholomew@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Segev Finer <segev208@gmail.com>
Co-authored-by: Belem Zhang <belem.zhang@intel.com>
### Description
Web PRs are not included yet.
### Motivation and Context
---------
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Maximilian Müller <44298237+gedoensmax@users.noreply.github.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Your Name <your@email.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: enximi <70036307+enximi@users.noreply.github.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Markus Tavenrath <mtavenrath@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
### Description
The release branch `rel-1.17.3` was created from `rel-1.17.2` last
week. However, a newer code change (#19897) has since been merged into
`rel-1.17.2`. The branch `rel-1.17.3` is protected, so no push or delete
can be performed on it.
This PR cherry-picks the commit 633c22f based on 6bc6adc to make sure
the base of `rel-1.17.3` matches `rel-1.17.2`.
@snnn @pranavsharma This operation ensures that the code base contains the
same code, but the git history will not be exactly the same. If you want it
to be exactly the same, I need your help to do a git rebase, or to delete and
recreate the branch.
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: George Wu <jywu@microsoft.com>
### Description
As titled. Follow-up PR for the 1.17.2 source code release.
### Motivation and Context
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Changming Sun <chasun@microsoft.com>
### Description
As titled.
### Motivation and Context
Co-authored-by: Markus Tavenrath <mtavenrath@users.noreply.github.com>
### Description
Update the patch release version.
### Motivation and Context
---------
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
### Description
Cherry-pick Final Round
### Motivation and Context
---------
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
### Description
[ORT 1.17.0 Release] Cherry pick 1st round
PR authors, please take a look and let me know if you have any
questions about the changes, or approve accordingly.
### Motivation and Context
---------
Co-authored-by: wejoncy <wejoncy@163.com>
Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Heflin Stephen Raj <heflinstephen03@gmail.com>
Co-authored-by: Yifan Li <109183385+yf711@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
### Description
### Motivation and Context
### Description
Update DML version to 1.13.1
### Motivation and Context
### Description
Fix build error when ENABLE_NPU_ADAPTER_ENUMERATION is defined
### Motivation and Context
### Description
The Java `TensorInfo` object, which describes the shape of a tensor
(including a model's input and output placeholders), couldn't show any
symbolic/named dimensions in that tensor. This information is now stored
in Java strings on construction and included in `toString()`.
### Motivation and Context
Setting symbolic dimensions required external information in Java,
because the names were not discoverable from within the API.
### Description
### Motivation and Context
The Linux_GPU_x64 job in the pipeline has been canceled due to timeouts
since 0112.
### Description
The `split` input of the Split op is `int64_t`. Using the correct type
resolves a type mismatch build error on Windows when CoreML is enabled
(for debugging the partitioning code).
### Motivation and Context
Fix build error
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
The `OnnxValue` and `OrtProviderOptions` implementations now check whether
they have been closed before accessing the native pointer, and also
before `close` is called.
### Motivation and Context
Previously they could be closed twice, which crashed the JVM with a SIGSEGV. Fixes #19125.
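The fix is an instance of the guarded, idempotent close pattern. A minimal sketch in JavaScript, for consistency with the other examples in these notes; `NativeHandle` and its `release` callback are hypothetical and do not mirror the actual Java classes, whose exact behavior on a repeated close may differ.

```js
// Hypothetical wrapper around a native pointer that guards against
// use-after-close and double-close (the cause of the JVM crash above).
class NativeHandle {
  #ptr;
  #release;
  #closed = false;

  constructor(ptr, release) {
    this.#ptr = ptr;
    this.#release = release; // frees the native resource
  }

  get ptr() {
    // Check the flag before touching the native pointer.
    if (this.#closed) throw new Error('handle has already been closed');
    return this.#ptr;
  }

  close() {
    // Check the flag before releasing; in this sketch a second close() is a no-op.
    if (this.#closed) return;
    this.#closed = true;
    this.#release(this.#ptr);
  }
}

const handle = new NativeHandle(0x1234, (p) => console.log(`released ${p}`));
handle.close(); // prints "released 4660"
handle.close(); // ignored
```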
### Description
This way, we will not need to constantly update the Windows images, and we
gain more flexibility to choose the CUDA version in the future.
### Description
Set default flags for nvcc, and do not set those flags for the ROCM EP.
### Motivation and Context
1. To meet a BinSkim requirement for CUDA EP.
https://github.com/microsoft/binskim/blob/main/docs/BinSkimRules.md#rule-BA2024EnableSpectreMitigations
2. The ROCM EP's pipeline has been broken since PR #19073. Unit tests failed
to load the EP with the following error message:
`Failed to load library libonnxruntime_providers_rocm.so with error:
/build/Release/libonnxruntime_providers_rocm.so: undefined symbol:
vtable for onnxruntime::InsertMaxPoolOutput`
This PR is a hot fix to bring the pipeline back. So far I don't know why
the error happened. The symbol `InsertMaxPoolOutput` is in
onnxruntime_optimizers, and I don't see any EP code that references it directly.
### Description
Disable ccache for all the jobs in the Windows CPU CI pipeline.
Before disabling it, the build emitted this warning:
"MSIL .netmodule or module compiled with /GL found; restarting link with
/LTCG; add /LTCG to the link command line to improve linker performance"
After disabling it, the warning is gone and the build doesn't use /GL or
/LTCG.
The cache itself should not cause this difference.
### Motivation and Context
We submit kernels in batches (a fixed size of 16, except for the last
batch) for better performance. However, timestamp query support is at the
pass level, so the previous implementation disabled batch execution in
profiling mode. In fact, a batch can contain multiple passes, so batch
execution does not need to be disabled; this is the first enhancement of
this PR.
Furthermore, WebGPU has an extension that supports timestamp queries inside
passes, but it isn't supported on all platforms (e.g., Windows supports it,
while macOS doesn't). It is expected to cost less than the multiple-passes
solution, so this PR also introduces this support when it is available.
This PR also refactors some of the implementation related to kernelInfo and
tries to unify the related kernel names.
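The first enhancement can be pictured as a planning step: batching is kept in profiling mode, and only the pass layout inside each batch changes. A minimal JavaScript sketch under stated assumptions; `planSubmissions`, its options, and the rule that a non-profiling batch records into a single pass are illustrative, not the actual JSEP implementation.

```js
// Hypothetical planner: group kernels into submission batches of at most
// `batchSize` (16 in the description above; the last batch may be smaller).
// Assumption: outside pass-level profiling, a batch records into one pass.
function planSubmissions(kernels, { batchSize = 16, profiling = false, inPassTimestamps = false } = {}) {
  const passLevelProfiling = profiling && !inPassTimestamps;
  let timestampMode = 'none';
  if (profiling) timestampMode = inPassTimestamps ? 'inside-passes' : 'at-pass-boundaries';

  const batches = [];
  for (let i = 0; i < kernels.length; i += batchSize) {
    const batch = kernels.slice(i, i + batchSize);
    batches.push({
      // Pass-level timestamps: give each kernel its own pass, but keep the batch.
      passes: passLevelProfiling ? batch.map((k) => [k]) : [batch],
      timestampMode,
    });
  }
  return batches;
}

const kernels = Array.from({ length: 40 }, (_, i) => `kernel${i}`);
const plan = planSubmissions(kernels, { profiling: true });
console.log(plan.length, plan[0].passes.length, plan[2].passes.length); // 3 16 8
```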
### Description
Enable external data loading for ort-web.
### Why
The ORT external data design depends heavily on the file system,
especially on synchronous file I/O APIs, which are not available on web
platforms. Extra code is needed to make external data work on the
web.
### How
Because there is no file system on the web, one way to support external
data there is to use pre-loaded data. Assuming the model file a.onnx
includes initializers that are linked to ./b.bin, we require users to
pass the full list of data files when creating the session. The user code
will look like this:
```js
const mySess = await ort.InferenceSession.create('./path/model/a.onnx', {
  // session options
  externalData: [
    {
      // relative or absolute path/URL of the file,
      // or a pre-loaded Uint8Array containing the data of the external data file
      data: './path/data/b.bin',
      // the relative path of the external data. Should match initializers' "location" value defined in the model file
      path: './b.bin'
    },
    // add one { data, path } entry per external data file
  ]
});
```
Currently, this feature only works when the JSEP build is enabled.
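With several data files, the `externalData` list can be assembled from a location-to-source map. A minimal sketch; `buildExternalData` is a hypothetical helper, not an onnxruntime-web API, `./c.bin` is a hypothetical second file, and only the `{ data, path }` entry shape and the `externalData` session option come from the example above.

```js
// Hypothetical helper: turn { "<location in model>": <URL string | Uint8Array> }
// into the `externalData` array passed to `ort.InferenceSession.create`.
function buildExternalData(files) {
  return Object.entries(files).map(([path, data]) => {
    if (typeof data !== 'string' && !(data instanceof Uint8Array)) {
      throw new TypeError(`external data for ${path} must be a URL string or a Uint8Array`);
    }
    return { data, path };
  });
}

const externalData = buildExternalData({
  './b.bin': './path/data/b.bin',          // fetched from this path/URL
  './c.bin': new Uint8Array([1, 2, 3, 4]), // already loaded into memory
});
console.log(externalData.map((e) => e.path)); // [ './b.bin', './c.bin' ]
// Then, in the session options:
// await ort.InferenceSession.create('./path/model/a.onnx', { externalData });
```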
### Description
### Motivation and Context
### Description
Add a new option `trt_engine_cache_prefix` to customize the TRT EP engine
cache prefix.
For example:
- If the user specifies `trt_engine_cache_prefix|FRCNN trt_engine_cache_enable|true`
when running the FRCNN model, the cache will be saved/loaded as
`FRCNN_2068723788287043730_*_sm80.engine`. The engine profile follows the
same pattern.
- If this option is omitted, the engine will be saved/loaded as
`TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_2068723788287043730_*_*_sm80.engine`,
which is the default.
### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/16708
---------
Co-authored-by: Chi Lo <Chi.Lo@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
When JSEP calls into JavaScript with an index into HEAP8 or HEAP32, the
index is negative when the heap is larger than 2GB; even if we pass it as
uint32_t, it remains negative. So in JavaScript we use `>>> 0` to make it
unsigned.
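The effect is easy to reproduce in plain JavaScript without an Emscripten runtime. A minimal illustration; the address `0x80000010` is an arbitrary value just above 2GB, and `HEAP32` indexing is shown as a pointer shifted right by 2 (4-byte elements).

```js
// An address just above 2GB, reinterpreted as a signed 32-bit integer,
// which is how it can arrive on the JavaScript side.
const signedPtr = 0x80000010 | 0;
console.log(signedPtr); // -2147483632

// `>>> 0` reinterprets the same 32 bits as an unsigned integer.
const ptr = signedPtr >>> 0;
console.log(ptr); // 2147483664

// HEAP32 has 4-byte elements: a signed shift keeps a bogus negative index,
// while an unsigned shift yields the intended element index.
console.log(signedPtr >> 2); // -536870908
console.log(ptr >>> 2);      // 536870916
```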
### Description
- Updates `get_qnn_qdq_config()` to use new scale/zp np.array data
types.
- Adds missing unit test to help prevent future regression.
### Motivation and Context
https://github.com/microsoft/onnxruntime/pull/18043 changed the usage of
`extra_options["TensorQuantizationOverrides"]`. We need to update its
use in quantization/execution_providers/qnn/quant_config.py
### Description
Pass through the ConfigOptions from the session via OpKernelInfo so that
kernel behavior can be configured.
Initial usage would be to optionally enable a fast path for ARM64 bfloat16 GEMM (see #17031).
Other usages could include selecting the exact implementations of the activation functions for RNN operators instead of the default approximations (e.g. use [sigmoid_exact instead of sigmoid](2d6e2e243d/onnxruntime/core/providers/cpu/rnn/rnn_helpers.h (L379-L382)))
OpKernelInfo already passes through information from the session state, so adding a ConfigOptions member
is the simpler update. It is also a more natural fit, given that OpKernelInfo provides state/info to the kernel.
### Motivation and Context
### Description
When the DOM API is not available, use OffscreenCanvas.
### Motivation and Context
In some environments, such as a service worker or web worker, the DOM API
is not available, so we can use the OffscreenCanvas API in place of
`document.createElement('canvas')`.
Most of the APIs of OffscreenCanvas and HTMLCanvasElement are the same,
except that `toDataURL` is missing.
This fixes issue #19032.
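The fallback can be expressed as a small feature check. A minimal sketch; `createCanvas` is a hypothetical helper, not the ort-web code. Where HTMLCanvasElement has `toDataURL()`, OffscreenCanvas offers `convertToBlob()`, so callers that need a data URL must go through a Blob.

```js
// Hypothetical helper: use a DOM canvas when `document` exists, otherwise fall
// back to OffscreenCanvas (e.g. inside a web worker or service worker).
function createCanvas(width, height) {
  if (typeof document !== 'undefined' && typeof document.createElement === 'function') {
    const canvas = document.createElement('canvas');
    canvas.width = width;
    canvas.height = height;
    return canvas;
  }
  if (typeof OffscreenCanvas !== 'undefined') {
    return new OffscreenCanvas(width, height);
  }
  throw new Error('Neither the DOM nor OffscreenCanvas is available');
}

// Both canvas types support getContext('2d') for drawing pixel data:
// const ctx = createCanvas(224, 224).getContext('2d');
```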
Resize for fp16 has two issues: scales are always f32, and roi can be f32
or f16.
- scales: this is fixed.
- roi: this is fixed for the case where roi is not passed as an optional
f16 input. Fixing that case requires a much larger change, and I did not
want to risk it shortly before a release. For all practical purposes,
passing roi as an f16 input should be rare, and we can fix it in the
near future.
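Handling both roi element types comes down to decoding the raw 16-bit patterns of an f16 tensor. A minimal CPU-side sketch; `halfToFloat` and `readRoi` are hypothetical names, the Uint16Array input layout is an assumption, and this is not the actual Resize implementation.

```js
// Decode one IEEE 754 binary16 value (given as its raw 16-bit pattern).
function halfToFloat(h) {
  const sign = h & 0x8000 ? -1 : 1;
  const exponent = (h >> 10) & 0x1f;
  const fraction = h & 0x3ff;
  if (exponent === 0) return sign * fraction * 2 ** -24; // zero or subnormal
  if (exponent === 0x1f) return fraction ? NaN : sign * Infinity;
  return sign * (1 + fraction / 1024) * 2 ** (exponent - 15); // normal number
}

// Hypothetical reader: normalize roi data of either element type to JS numbers.
// f16 data is assumed to arrive as a Uint16Array of raw bit patterns.
function readRoi(data, type) {
  if (type === 'float32') return Array.from(data);
  if (type === 'float16') return Array.from(data, (bits) => halfToFloat(bits));
  throw new Error(`unsupported roi type: ${type}`);
}

console.log(readRoi(new Uint16Array([0x0000, 0x3800, 0x3c00, 0xc000]), 'float16')); // [ 0, 0.5, 1, -2 ]
```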
### Description
Introduce AppendExecutionProvider_OpenVINO_V2 API and support for OV
2023.3.
### Context
- The API is added to make it easier for customers to use the official
published Microsoft onnxruntime libraries with OVEP libraries.
- Add support for OpenVINO 2023.3 official release.
- Extend operator coverage
- GH fixes
---------
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
### Description
Implements LabelEncoder as per `ai.onnx.ml` opset 4 for the upcoming
ONNX 1.15 release. ~~This currently depends on a new ONNX release
candidate and so is marked as draft in the meantime.~~
### Motivation and Context
Closes https://github.com/microsoft/onnxruntime/issues/17602
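For readers unfamiliar with the operator, LabelEncoder maps each input element through a key-to-value table and emits a default for keys that are not in the table. A minimal JavaScript sketch of that semantics only; `labelEncode` is hypothetical, and in the actual operator the keys, values, and default come from node attributes rather than function arguments.

```js
// Map each input element through a keys -> values table; elements that are
// not keys produce `defaultValue`.
function labelEncode(input, keys, values, defaultValue) {
  if (keys.length !== values.length) {
    throw new Error('keys and values must have the same length');
  }
  const table = new Map(keys.map((key, i) => [key, values[i]]));
  return input.map((x) => (table.has(x) ? table.get(x) : defaultValue));
}

console.log(labelEncode(['cat', 'dog', 'bird'], ['cat', 'dog'], [0, 1], -1)); // [ 0, 1, -1 ]
```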