### Description
Adds support for specifying mixed precision QDQ models via tensor
quantization overrides.
### Motivation and Context
This PR implements an approach for supported "mixed precision" models.
The following figure demonstrates an example mixed precision model as
defined in this PR.

A mixed precision QDQ model consists of regions with different
activation/weight quantization data types. The boundary between regions
converts between activation quantization data types (e.g., uint8 to
uint16) using a DQ to Q sequence.
The ability to specify regions with different quantization data types
enables exploring the tradeoffs between accuracy and latency. A higher
integer precision may improve accuracy at the expense of latency, so
selectively promoting certain regions to a higher precision can aid in
achieving a desirable balance in key metrics.
#### Current support
By default, the ORT quantizer supports specifying default activation and
weight quantization data types for the entire model. A recent PR added
support for specifying basic quantization overrides at the tensor level
via the `extra_options["TensorQuantOverrides"]` configuration:
```
TensorQuantOverrides = dictionary :
Default is {}. Set tensor quantization overrides. The key is a tensor name and the value is a
list of dictionaries. For per-tensor quantization, the list contains a single dictionary. For
per-channel quantization, the list contains a dictionary for each channel in the tensor.
Each dictionary contains optional overrides with the following keys and values.
'quant_type' = QuantType : The tensor's quantization data type.
'scale' = Float : The scale value to use. Must also specify `zero_point` if set.
'zero_point' = Int : The zero-point value to use. Must also specify `scale` is set.
'symmetric' = Bool : If the tensor should use symmetric quantization. Invalid if also
set `scale` or `zero_point`.
'reduce_range' = Bool : If the quantization range should be reduced. Invalid if also
set `scale` or `zero_point`.
'rmax' = Float : Override the maximum real tensor value in calibration data.
Invalid if also set `scale` or `zero_point`.
'rmin' = Float : Override the minimum real tensor value in calibration data.
Invalid if also set `scale` or `zero_point`.
```
The tensor-level overrides are currently used to override the
quantization type for weights/initializers or to set specific
scale/zero-point values for a tensor (e.g., QNN requires Sigmoid to use
a specific scale/zero-point at its output).
However, these overrides are not typically used to override activation
quantization types due in large part to operator data type constraints.
Consider, for example, that all inputs and outputs to an Add operator
must be of the same data type. Consequently, using tensor-level
overrides to promote the Add’s output to 16-bits would force the inputs
to also be overridden to 16-bit. In turn, this would have a cascading
effect on potentially the entire graph. The solution implemented by this
PR is to allow the specification of tensor boundaries where the
activation quantization data type changes.
#### The approach
The following figure shows a model with a region that has been promoted
to 16-bit from the default 8-bit activation type.

Note the following observations:
- Op2’s output is consumed by Op4, Op7, and Op8. Op4 consumes the
converted u16 type, while Op7 and Op8 consume the original u8 type.
- Op3’s output is converted from u8 to u16. Op5 consumes the converted
u16 type.
- Op4’s output is just u16 (not converted).
- Op5’s output is converted from u16 to u8. Op6 consumes the u8 type.
The approach implemented by this PR uses the tensor-level quantization
overrides to specify a tensor’s quantization type at both the producer
and consumer ends. **The following shows the overrides necessary to
create this mixed precision QDQ model.**
```python3
overrides = {
“Op2_out”: [{“quant_type”: QUInt8, “convert”: {“quant_type”: QUInt16, “recv_nodes”: {“Op4”}}}],
“Op3_out”: [{“quant_type”: QUInt8, “convert”: {“quant_type”: QUInt16, “recv_nodes”: {“Op5”}}}],
“Op4_out”: [{“quant_type”: QUInt16}],
“Op5_out”: [{“quant_type”: QUInt16, “convert”: {“quant_type”: QUInt8, “recv_nodes”: {“Op6”}}}]
}
```
### Description
Fix a bug in WASM's GEMM. The bug was found when running
"ConvAddActivationFusionTests.ConvGemmDirect" unit test in a wasm build
with address sanitizer enabled. When CountK=25, CountN=1, lda=25, ldc=1,
the function I am modifying triggered a read out of bound error.
The bug fix was provided by @fs-eire.
### Description
Add `ModelProto` support as an input to transformers `optimize_model`
API.
### Motivation and Context
Currently, the `optimize_model` API only accepts a model path as the
input model. However, for large models, saving and loading from disk can
be time-consuming. By adding `ModelProto` as an input option to the
`optimize_model` API, significant time can be saved.
### Description
<!-- Describe your changes. -->
Initialize Symbol engine as needed with no duplicate calls.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Currently absel library may call SymInitialize more than once
when shared libraries are involved. However, this can only be
called only once per process. Our debug_alloc also may call it
when enabled. This change enables intialization to proceed
only when needed with no duplicate effort.
### Description
This PR removes early stopping from the end-to-end LLaMA-2 benchmark
script.
### Motivation and Context
This allows models to always generate the requested number of new
tokens.
### Description
<!-- Describe your changes. -->
Removing const_cast as it might lead to unknown behavior. Specifying
DLMangedTensor as a const doesn't seem to be necessary and I have tested
this by running torch_ort.configure. Not sure what other tests which
needs to be done. Background can be found in this
[PR](https://github.com/microsoft/onnxruntime/pull/19982)
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
This PR adds a benchmarking script to measure end-to-end performance and
saves the results in a CSV file.
### Motivation and Context
With this PR, end-to-end performance can be easily measured for many
large-language models such as LLaMA-2. The performance numbers for
LLaMA-2 are located
[here](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/models/llama).
### Description
Add NPU to list of device supported.
Added changes for Support to OV 2024.0
Nuget packages removes packaging of OpenVINO DLL
Bug Fixes with Python API
Reverted Dockerfiles not being maintained.
### Motivation and Context
NPU Device has been introduced by Intel in latest client systems
OpenVINO 2024.0 release is out.
---------
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Ubuntu <ubuntu@ubuntu-118727.iind.intel.com>
Co-authored-by: hmamidix <hemax.sowjanya.mamidi@intel.com>
Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
### Description
1. Move building on CPU machine.
2. Optimize the pipeline
3. Since there isn't official ONNX package for python 12, the python 12
test stage uses the packages built with ONNX source in build stage.
### Motivation and Context
1. Resolve the random hang in compilation
4. Save a lot of GPU resources.
---------
### Description
To test this feature, run
```bat
python cmake\deps_update_and_upload.py --root-path mirror
```
Then run build.py as usual.
The zip files will be cached local. To avoid being downloaded again and
again.
### Description
<!-- Describe your changes. -->
### Motivation and Context
downloading deps is not needed in test stage
remove it to reduce random downloading errors
### Description
<!-- Describe your changes. -->
the crash caused by the neural_speed turns out to be a very corn case.
Turn it on by default.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
MAUI on macOS uses mac-catalyst which requires a different native
binary.
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
### Description
Fixed all CUDA NHWC Pooling operations which were broken and enabled the
NHWC CUDA pooling tests. Disabled all pooling tests which are not
supported by the CUDA EP.
### Motivation and Context
Ensure parity between CUDA NHWC / NCHW and work towards 100% tests
enabled for the CUDA EP / CUDA NHWC EP.
---------
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
### Description
Support MatMul 1D inputs by combining Reshape and ReduceMean.
### Motivation and Context
ONNX MatMul can support 1D inputs, which is disabled in
`IsOpSupportedImpl`.
### Description
<!-- Describe your changes. -->
1.Support Tensor Parallelism in ShardedMoE.
2.Make necessary code changes to support Mixtral MoE.
3.Fix a bug related to using IOBinding in test script.
4.Fix the input size limitation
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Add Const Cast for DLManagedTensor as PyTorch has changed it's
[code](https://github.com/pytorch/pytorch/pull/121102) which creates
incompatibility.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix the below error while configuring ORT-training with nightly PyTorch
```
aten_op_executor.cc:60:40: error: invalid conversion from ‘const DLManagedTensor*’ to ‘DLManagedTensor*’ [-fpermissive]
60 | at::Tensor tensor = at::fromDLPack(dlpack);
| ^~~~~~
| |
| const DLManagedTensor*
```
### Description
Improve the precision of tests.
Changes include:
(1) Update checkers.cc to use consistent default tolerance.
(2) Allow different default tolerances for different providers at
runtime (Previously, threshold of a test is decided during compiling).
(3) Explicitly set absolute and relative error tolerances for tests that
failed to pass new default threshold.
#### Default Thresholds Change
Note that the formula of testing is `abs(expected - value) < absolute +
relative * expected`
Default test thresholds when both absolute and relative tolerance are
not set:
type | provider | absolute (before) | absolute (after) | relative
(before) | relative (after)
-- | -- | -- | -- | -- | --
double | CPU | 0.001 | 0.00001 | 0 | 0.00001
double | CUDA | 0.005 | 0.00001 | 0 | 0.00001
double | TRT | 0.005 | 0.00001 | 0 | 0.00001
double | ROCM | 0.005 | 0.00001 | 0 | 0.00001
double | DML | 0.005 | 0.00001 | 0 | 0.00001
| | | | |
float | CPU | 0.0001 | 0.00001 | 0 | 0.0001
float | CUDA | 0.005 | 0.00001 | 0 | 0.0001
float | TRT | 0.005 | 0.00001 | 0 | 0.0001
float | ROCM | 0.005 | 0.00001 | 0 | 0.0001
float | DML | 0.005 | 0.00001 | 0 | 0.0001
float | Training* | 0.005 | 0.001 | 0 | 0.0001
| | | | |
half | CPU | 0.001 | 0.0025 | 0 | 0.001
half | CUDA | 0.005 | 0.0025 | 0 | 0.001
half | TRT | 0.005 | 0.0025 | 0 | 0.001
half | ROCM | 0.005 | 0.0025 | 0 | 0.001
half | DML | 0.02 | 0.005 | 0 | 0.001
half | Training* | 0.005 | 0.005 | 0 | 0.001
| | | | |
bfloat16 | CPU | 0.0001 | 0.02 | 0 | 0.01
bfloat16 | CUDA | 0.0001 | 0.02 | 0.05 | 0.01
bfloat16 | TRT | 0.0001 | 0.02 | 0.05 | 0.01
bfloat16 | ROCM | 0.0001 | 0.02 | 0.05 | 0.01
bfloat16 | DML | 0.0001 | 0.02 | 0.05 | 0.01
bfloat16 | Training* | 0.0001 | 0.02 | 0.05 | 0.01
*Training mean a build flag ENABLE_TRAINING_CORE is defined. The
provider can be any one.
#### Threshold for provider
Previously, the threshold might change according to build flags:
```
#if defined(USE_CUDA) || defined(USE_ROCM) || defined(USE_DML)
constexpr float threshold = 0.005f;
#else
constexpr float threshold = 0.0001f;
#endif
```
For a cpu only build, the threshold is 0.0001. For a cuda build, the
threshold for CPU provider (some tests in cuda build actually run with
CPU provider) is changed to 0.005.
After this change, the threshold only depends on data type and provider
used in the test. It will not change by build flags for non-training
builds.
Default thresholds for training might be different from inference
(please refer to the above table). There are a few factors there:
Training has gradient outputs; TF32 is not disabled in training; Some
training tests has iterations, and error might accumulate. How to set
different thresholds based on these factors could be a future task.
Currently, the nhwc_transformer_test.cc compilation unit defines
explicit FP16 versions of `ModelTestBuilder::MakeInput<MLFloat16>` and
`ModelTestBuilder::MakeInitializer<MLFloat16>` outside of the
ModelTestBuilder class's header file.
These explicit template instantiations cause linker errors when other
compilation units also instantiate these functions due to duplicate
definitions. Additionally, the versions defined in
nhwc_transformer_test.cc do not really conform to the expected behavior
in the original ModelTestBuilder class, which is to make random
input/initializer values. Instead, the versions in
nhwc_transformer_test.cc create a range of values.
The solution is to edit nhwc_transformer_test.cc to use stand-alone
static functions that do not change the ModelTestBuilder class.
**Note**: This linker error cannot currently be replicated in our CIs
because it requires a QNN-HTP-enabled Windows ARM64 environment with
`MLAS_F16VEC_INTRINSICS_SUPPORTED` defined. I can replicate on a local
build. The linker error/conflict happens with with this new FP16 QNN
test:
d4c8bc359e/onnxruntime/test/providers/qnn/clip_op_test.cc (L186)
### Description
The docker image name was fixed, but the docker argument was different
in different job.
It would trigger rebuilding the docker image almost every time!!!
### Description
<!-- Describe your changes. -->
- [x] Pad operator has introduced a new input called "axes" which
specifies which axis to pad. But it defaults to input_rank if axes is
not provided which was the behavior before the opset upgrade.
- [x] ReduceMean
- [x] ReduceL2
- [x] ReduceLogSumExp
- [x] ReduceSum
- Reduction ops all had the axes attribute switched to an input and a
new attribute called "noop_with_empty_axes" was added to define what to
do when axes is not specified.
- [x] Resize has had two new attributes introduced: antialias and
keep_aspect_ratio_policy. From Operators.md I've gathered:
"Antialiasing is achieved by stretching the resampling filter by a
factor max(1, 1 / scale), which means that when downsampling, more input
pixels contribute to an output pixel."
keep_aspect_ratio_policy "describes how to interpret the `sizes` input
with regard to keeping the original aspect ratio of the input." there
are a couple enum-type options that specify different policies and what
to do in each case.
- NOTE: Baiju already included opset18 tests in
https://github.com/microsoft/onnxruntime/pull/17772
- [x] ScatterElements/ScatterND has had a new attribute introduced
called "reduction." This specifies the type of reduction to apply: none
(default), add, mul, max, min.
- [x] Split introduced a new attribute called "num_outputs" which
specifies how many outputs to split the input tensor into. This is in
contrast to the previous, default behavior of specifying a "split" input
which defines the size of each resultant tensor of the output.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Operator or model test result shall not depend on whether
NVIDIA_TF32_OVERRIDE environment variable is set or not. This make test results more deterministic.
### Description
<!-- Describe your changes. -->
This change addresses the following issues with the current CustomOP
Output Type inference
- The function does not take into account optional inputs. When input is
absent the inference is silently aborted, and no output type is inferred
(P1 customer issue)
- Inferring output type based on the input type for multi-kernel custom
ops is done based on the latest in sequence kernel definition. There is
not an attempt made to match the kernel based on the input type.
- Inference is aborted when variadic inputs/outputs are detected when
the generated input/output names fail to obtain type constraints. This
is not immediately clear from the code, because custom op schema is not
available within the inference function.
- No error reporting.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Most of CustomOPs lack their own type and shape inference function as it
was recently introduced. For that reason, it is important to fix this.
This change is inspired by a customer issue.
This is a follow up on:
- https://github.com/microsoft/onnxruntime/pull/15184
- https://github.com/cbourjau/ort-custom-op/pull/11
- https://github.com/microsoft/onnxruntime-extensions/issues/451
Bumps
[follow-redirects](https://github.com/follow-redirects/follow-redirects)
from 1.15.4 to 1.15.6.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="35a517c586"><code>35a517c</code></a>
Release version 1.15.6 of the npm package.</li>
<li><a
href="c4f847f851"><code>c4f847f</code></a>
Drop Proxy-Authorization across hosts.</li>
<li><a
href="8526b4a1b2"><code>8526b4a</code></a>
Use GitHub for disclosure.</li>
<li><a
href="b1677ce001"><code>b1677ce</code></a>
Release version 1.15.5 of the npm package.</li>
<li><a
href="d8914f7982"><code>d8914f7</code></a>
Preserve fragment in responseUrl.</li>
<li>See full diff in <a
href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps
[follow-redirects](https://github.com/follow-redirects/follow-redirects)
from 1.15.4 to 1.15.6.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="35a517c586"><code>35a517c</code></a>
Release version 1.15.6 of the npm package.</li>
<li><a
href="c4f847f851"><code>c4f847f</code></a>
Drop Proxy-Authorization across hosts.</li>
<li><a
href="8526b4a1b2"><code>8526b4a</code></a>
Use GitHub for disclosure.</li>
<li><a
href="b1677ce001"><code>b1677ce</code></a>
Release version 1.15.5 of the npm package.</li>
<li><a
href="d8914f7982"><code>d8914f7</code></a>
Preserve fragment in responseUrl.</li>
<li>See full diff in <a
href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps
[follow-redirects](https://github.com/follow-redirects/follow-redirects)
from 1.15.4 to 1.15.6.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="35a517c586"><code>35a517c</code></a>
Release version 1.15.6 of the npm package.</li>
<li><a
href="c4f847f851"><code>c4f847f</code></a>
Drop Proxy-Authorization across hosts.</li>
<li><a
href="8526b4a1b2"><code>8526b4a</code></a>
Use GitHub for disclosure.</li>
<li><a
href="b1677ce001"><code>b1677ce</code></a>
Release version 1.15.5 of the npm package.</li>
<li><a
href="d8914f7982"><code>d8914f7</code></a>
Preserve fragment in responseUrl.</li>
<li>See full diff in <a
href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
<!-- Describe your changes. -->
Add `cann_dependencies`
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
The previous [PR](https://github.com/microsoft/onnxruntime/pull/17365)
avioded using patchelf but lost `cann_dependencies`, This PR adds
`cann_dependencies` to avoid require cann libraries when repairing
wheel.
### Description
Fix a few warnings in typedoc (for generating JS API):
```
[warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used.
[warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation.
[warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation.
[warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation.
[warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation.
[warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device.
```
Changes highlighted:
- Merge `CoreMlExecutionProviderOption` and
`CoreMLExecutionProviderOption`. They expose 2 set of different options
for React-native and ORT nodejs binding. This should be fixed in future.
- Fix a few inconsistency of names between JSDoc and parameters
- Fix broken type links
- Exclude trace functions
### Description
Fix#19931 broken Get Started link
HTTP 404 for "Get Started" link in "ONNX Runtime JavaScript API" page
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime
supports Windows 10 and above, only."
### Description
Include Windows 11 in the version check. Now, you will not see the
warning “Unsupported Windows version (11). ONNX Runtime supports Windows
10 and above, only.”
### Motivation and Context
Warning on Windows 11: Only supports systems above Windows 10, which is
somewhat strange.
### Description
This PR rewrite the backend resolve logic to support specifying multiple
EPs.
#### Backend
The first version of ONNX Runtime Web actually carried some existing
code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes
the "backend" concept. The original "backend" in ONNX.js is designed in
a way assuming there is only one backend from user's backend hint list
will be used. For example, in ONNX.js, if user specify a backend hint as
`['webgl', 'wasm']`, ONNX.js will first try to use WebGL backend - if it
loads successfully (the browser supports webgl), then "webgl" backend
will be used and "wasm" will be ignored; otherwise, "webgl" will be
ignored and try to load "wasm" backend.
In short: only one backend will be used when initializing a session.
#### Execution Provider
Execution Provider, or EP, in ONNX Runtime is a different concept. One
of the differences is that users are allow to specify multiple EPs, and
if one does not support a particular kernel, it can fallback to other
EP. This is a very common case when using a GPU EP in ONNX Runtime.
#### Current Status: Backend v.s. EP
Because of the history reasons mentioned above, the current status is
quite confusing. There are **real backend**s, which means it's different
implementation in code; and there are **backend hint**s, which are used
as string names for backend hint; and there are **EP**s of the ONNX
Runtime concepts.
currently there are only 2 **backend**s in our code base: The "onnxjs
backend", and the "wasm backend". The "onnxjs backend" currently only
powers backend hint "webgl", which go into the old onnx.js code path.
All other backend hints including "wasm", "cpu"(alias to wasm), "webgpu"
and "webnn" are all powered by "wasm backend".
And because ORT Web treat "backend" as an internal concept and want to
align with ONNX Runtime, so those names of backend hints are becoming EP
names.
The following table shows today's status:
| Execution Provider Name (public) / Backend Hint (internal) | Backend |
EP in ORT
| -------- | ------- | ------- |
| "wasm"/"cpu" | WasmBackend | CPU EP
| "webgl" | OnnxjsBackend | \* technically not an EP
| "webgpu" | WasmBackend | JSEP
| "webnn" | WasmBackend | WebNN EP
#### Problem
While the API allows to specify multiple EPs, the backend resolving only
allows one backend. This causes issues when user specify multiple EP
names in session options, the backend resolve behavior and EP
registration behavior is inconsistent. Specifically, in this issue:
https://github.com/microsoft/onnxruntime/issues/15796#issuecomment-1925363908:
EP list `['webgpu', 'wasm']` on a browser without WebGPU support
resolves to 'wasm' backend, but the full EP list is passed in session
options, so JSEP is still enabled, causing the runtime error.
#### Solution
Since we still need WebGL backend, we cannot totally remove the backend
register/resolve system. In this PR I made the following changes:
- initialize every backend from the EP list, instead of only do that for
the first successful one.
- for the first resolved backend, filter all EP using the exact same
backend. Remove all EPs not using this backend from session options
- for every explicitly specified EP, if it's removed, show a warning
message in console