### Description
Update DML version to 1.13.1
### Description
Fix build error when ENABLE_NPU_ADAPTER_ENUMERATION is defined
### Description
The Java `TensorInfo` object, which describes a tensor's shape as well as
the input and output placeholders for a model, couldn't show any
symbolic/named dimensions in that tensor. This information is now
stored in Java strings on construction and included in the `toString` output.
### Motivation and Context
Setting symbolic dimensions required external information in Java; the
names were not discoverable from within the API.
### Description
### Motivation and Context
The Linux_GPU_x64 job in the pipeline has been canceled due to timeouts since
0112.
### Description
The `split` input of the Split op is int64_t. Fixing that resolves a
type mismatch build error on Windows when CoreML is enabled (for
debugging the partitioning code).
### Motivation and Context
Fix build error
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
The `OnnxValue` and `OrtProviderOptions` implementations now check to
see if they've been closed before accessing the native pointer, and also
before close is called.
### Motivation and Context
Previously they could be closed twice, which caused a SIGSEGV in the JVM. Fixes #19125.
### Description
This way, we will not need to update the Windows images constantly, and
we gain more flexibility to choose the CUDA version in the future.
### Description
Set default flags for nvcc, and do not set the flags for the ROCM EP.
### Motivation and Context
1. To meet a BinSkim requirement for CUDA EP.
https://github.com/microsoft/binskim/blob/main/docs/BinSkimRules.md#rule-BA2024EnableSpectreMitigations
2. The ROCM EP's pipeline has been broken since PR #19073. Unit tests failed
to load the EP with the following error message:
`Failed to load library libonnxruntime_providers_rocm.so with error:
/build/Release/libonnxruntime_providers_rocm.so: undefined symbol:
vtable for onnxruntime::InsertMaxPoolOutput`

This PR is a hot fix to bring the pipeline back. So far I don't know why
the error happened. The symbol "InsertMaxPoolOutput" is in
onnxruntime_optimizers, and I don't see any EP code that references it directly.
### Description
Disable ccache for all the jobs in the Windows CPU CI pipeline.
Before disabling it, the build emitted this warning:
"MSIL .netmodule or module compiled with /GL found; restarting link with
/LTCG; add /LTCG to the link command line to improve linker performance"
After disabling it, the warning is gone and the build doesn't use /GL or
/LTCG.
The cache itself should not cause this difference.
### Motivation and Context
We submit kernels in batches (a fixed size of 16 is used, except for the
last batch) for better performance. However, timestamp query support is
at the pass level, so the previous implementation disabled batch execution
in profiling mode. In fact, a batch can contain multiple passes, so batch
execution does not have to be disabled; this is the first enhancement of
this PR.
Furthermore, WebGPU has an extension that supports timestamp queries inside
passes, although not all platforms support it (e.g., Windows supports it,
while macOS doesn't). This is expected to cost less than the multiple-pass
solution, so this PR also introduces this support when available.
This PR also refactors some of the implementation related to kernelInfo and
tries to unify the related kernel names.
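The batching rule described above can be sketched as follows. This is only an illustration under stated assumptions: `planBatches` and `MAX_DISPATCHES_PER_BATCH` are hypothetical names, not identifiers taken from the ort-web source, and the sketch only computes how many kernels land in each batch rather than issuing real WebGPU commands.

```js
// Hypothetical constant mirroring the fixed batch size mentioned above.
const MAX_DISPATCHES_PER_BATCH = 16;

// Returns the number of kernels submitted in each batch, in order.
function planBatches(kernelCount, batchSize = MAX_DISPATCHES_PER_BATCH) {
  if (!Number.isInteger(kernelCount) || kernelCount < 0) {
    throw new RangeError('kernelCount must be a non-negative integer');
  }
  const batches = [];
  for (let remaining = kernelCount; remaining > 0; remaining -= batchSize) {
    // Every batch is full except possibly the last one.
    batches.push(Math.min(remaining, batchSize));
  }
  return batches;
}

// 35 kernels are split into three batches of 16, 16 and 3.
console.log(planBatches(35));
```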
### Description
Enable external data loading for ort-web.
### Why
The ORT external data design depends heavily on the file system,
especially synchronous file I/O APIs. Those are not available on web
platforms, so we need extra code to make external data work on the
web.
### How
Because there is no file system on the web, the web implementation
supports external data by using pre-loaded data. Assuming the model file
a.onnx includes initializers that are linked to ./b.bin, we require users to
pass a full data file list when creating the session. The user code will
look like this:
```js
const mySess = await ort.InferenceSession.create('./path/model/a.onnx', {
  // session options
  externalData: [
    {
      // relative or absolute path/URL of the file,
      // or a pre-loaded Uint8Array containing the data of the external data file
      data: './path/data/b.bin',
      // the relative path of the external data. Should match initializers' "location" value defined in the model file
      path: './b.bin'
    },
    // { } more entries if there are multiple external data files
  ]
});
```
Currently, this feature only works with JSEP build enabled.
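As the comment in the example above notes, `data` may also be a pre-loaded `Uint8Array`. A minimal sketch of building such options, assuming the application already holds the bytes; `buildExternalDataOptions` is a hypothetical helper written for illustration and is not part of the ort-web API:

```js
// Hypothetical helper: wraps pre-loaded bytes into the `externalData` option
// shape shown above. It does not call into onnxruntime-web itself.
function buildExternalDataOptions(bytes, location) {
  if (!(bytes instanceof Uint8Array)) {
    throw new TypeError('bytes must be a Uint8Array');
  }
  return {
    externalData: [
      {
        data: bytes,    // pre-loaded content of the external data file
        path: location, // should match the initializers' "location" value
      },
    ],
  };
}

// Placeholder bytes stand in for the real content of b.bin.
const sessionOptions = buildExternalDataOptions(new Uint8Array([1, 2, 3, 4]), './b.bin');
console.log(sessionOptions.externalData[0].path); // ./b.bin
```

The resulting object would then be passed as the second argument to `ort.InferenceSession.create`, as in the example above.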
### Description
Add a new option `trt_engine_cache_prefix` to customize the TRT EP engine cache
prefix.
For example:
- If the user specifies `trt_engine_cache_prefix|FRCNN
trt_engine_cache_enable|true` when running the FRCNN model, the cache will be
saved/loaded as `FRCNN_2068723788287043730_*_sm80.engine`. The engine profile
follows the same pattern.
- If this option is omitted, the engine will be saved/loaded as
`TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_2068723788287043730_*_*_sm80.engine`
by default.
### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/16708
---------
Co-authored-by: Chi Lo <Chi.Lo@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
When JSEP calls JavaScript with an index into HEAP8 or HEAP32, the index is
negative when the heap is above 2GB; even if we pass it as uint32_t, it
remains negative. So in JavaScript, use `>>> 0` to make it unsigned.
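A small sketch of the conversion, runnable in any JavaScript engine. The helper name `toUnsignedOffset` and the sample address are assumptions chosen for illustration:

```js
// Hypothetical helper: reinterpret a signed 32-bit value as unsigned.
function toUnsignedOffset(ptr) {
  return ptr >>> 0;
}

// 0x90000000 lies above 2GB; `| 0` mimics how it arrives as a signed int32.
const signedPtr = 0x90000000 | 0;
console.log(signedPtr < 0);               // true
console.log(toUnsignedOffset(signedPtr)); // 2415919104
console.log(toUnsignedOffset(1024));      // 1024 (small offsets are unchanged)
```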
### Description
- Updates `get_qnn_qdq_config()` to use new scale/zp np.array data
types.
- Adds missing unit test to help prevent future regression.
### Motivation and Context
https://github.com/microsoft/onnxruntime/pull/18043 changed the usage of
`extra_options["TensorQuantizationOverrides"]`. We need to update its
use in `quantization/execution_providers/qnn/quant_config.py`.
### Description
Pass through the ConfigOptions from the session via OpKernelInfo so that
kernel behavior can be configured.
Initial usage would be to optionally enable a fast path for ARM64 bfloat16 GEMM - see #17031
Other usages could be things like selecting the exact implementations of the activation functions for RNN operators instead of the default approximations (e.g. use [sigmoid_exact instead of sigmoid](2d6e2e243d/onnxruntime/core/providers/cpu/rnn/rnn_helpers.h (L379-L382)))
OpKernelInfo is already passing through things from the session state, and adding a new ConfigOptions member
is the simpler update. It's also a more natural fit given it's providing state/info to the kernel.
### Description
Use OffscreenCanvas when the DOM API is not available.
### Motivation and Context
In some environments, such as a service worker or web worker, the DOM API is
not available, so we can use the OffscreenCanvas API to replace
`document.createElement('canvas')`.
Most of the APIs of OffscreenCanvas and HTMLCanvasElement are the same,
except that `toDataURL` is missing.
This fixes issue #19032.
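The fallback described above can be sketched with plain feature detection. This is a minimal illustration, not the code from the PR: `createCanvas` and its `scope` parameter are hypothetical, and `scope` exists only so the sketch can be exercised outside a browser.

```js
// Hypothetical helper: prefer the DOM when present, otherwise fall back to
// OffscreenCanvas (e.g. inside a service worker or web worker).
function createCanvas(width, height, scope = globalThis) {
  if (typeof scope.document !== 'undefined') {
    const canvas = scope.document.createElement('canvas');
    canvas.width = width;
    canvas.height = height;
    return canvas;
  }
  if (typeof scope.OffscreenCanvas !== 'undefined') {
    return new scope.OffscreenCanvas(width, height);
  }
  throw new Error('Neither the DOM API nor OffscreenCanvas is available');
}

// Stand-in for a worker global scope, used only to exercise the fallback here.
const fakeWorkerScope = {
  OffscreenCanvas: class {
    constructor(width, height) {
      this.width = width;
      this.height = height;
    }
  },
};
const workerCanvas = createCanvas(4, 2, fakeWorkerScope);
console.log(workerCanvas.width, workerCanvas.height); // 4 2
```

In a browser or worker, calling `createCanvas(width, height)` without the third argument would use the real global scope.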
Resize for fp16 has 2 issues: scales are always f32, and roi can be f32
or f16.
- scales: this is fixed.
- roi: this is fixed for the case where roi is not passed as an optional input
with f16. Fixing the remaining case requires a much larger change, and I did
not want to risk that shortly before a release. For all practical purposes,
passing roi as an input with f16 should be rare, and we can fix it in the
near future.
### Description
Introduce AppendExecutionProvider_OpenVINO_V2 API and support for OV
2023.3.
### Context
- The API is added to facilitate customers in using published official
Microsoft onnxruntime libraries with OVEP libraries.
- Add support for OpenVINO 2023.3 official release.
- Extend operator coverage
- GH fixes
---------
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
### Description
Implements LabelEncoder as per `ai.onnx.ml` opset 4 for the upcoming
ONNX 1.15 release. ~~This currently depends on a new ONNX release
candidate and so is marked as draft in the meantime.~~
### Motivation and Context
Closes https://github.com/microsoft/onnxruntime/issues/17602
Update the WebNN test list in suite-test-list.jsonc so that all test cases
pass with the WebNN CPU backend on Chrome Stable (although some cases
may fall back to the CPU EP).
Enable int64 support for WebNN in unit tests.
### Description
In some environments the test code has undefined behavior. To demonstrate it, save the following code as
test.cpp:
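```c++
#include <iostream>
#include <stdio.h>

int main() {
  char buf[1024];
  // "%ls" expects a wide string (wchar_t*), but a narrow string literal is
  // passed on purpose to trigger the undefined behavior.
  int ret = snprintf(buf, sizeof(buf), "%ls", "abc");
  if (ret < 0) {
    std::cout << ret << std::endl;
  } else {
    std::cout << "OK: ret=" << ret << std::endl;
  }
  return 0;
}
```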
Then compile it as
```
g++ -DNDEBUG -std=gnu++17 test.cpp -o /tmp/t
```
Or
```
g++ -O2 -DNDEBUG -std=gnu++17 test.cpp -o /tmp/t
```
The first command compiles without optimization; the second one turns
optimization on. The two binaries produce different outputs.
When optimization is enabled, the output might be:
```
OK: ret=-1
```
There is no way to explain why execution takes the `else` branch when ret
is -1. It might be a bug in a specific version of GCC. However, at this
moment we cannot change the version. It was found in GCC version 8.5.0
20210514 (Red Hat 8.5.0-18), which is provided by UBI8. RHEL9 doesn't have
the problem. snprintf is a GCC builtin function, so the problem is
not related to glibc.
### Description
Whenever a QuantizeLinear or DequantizeLinear node is created, the type of
the weights before quantization must be known in order to create the scale
with the expected type. Another option would be to add many CastLike
operators, but that would push the burden onto the onnxruntime optimizer.
The PR tries to avoid changing the signature. To do so, it modifies the
scale computation to store the result in a numpy array rather than a
Python float. The numpy array must be of the same type as the weights
to quantize.
The PR adds many `assert` statements to check that the type of the scale is
not a Python type or a float64. They were added to make sure all the code
follows the same logic. These lines were kept for the first review.
DequantizeLinear and QuantizeLinear cannot be tested with onnx==1.15, which
lacks PR https://github.com/onnx/onnx/pull/5709 (fixing shape
inference) and PR https://github.com/onnx/onnx/pull/5473 (supporting
QLinearMatMul with float 16). That explains why some tests are
disabled with float 16.
### Motivation and Context
The current quantization tool assumes every weight is float 32. For
large models such as LLAMA, the weights are usually float 16. The
quantization tool needs to be able to quantize such weights.
### Description
1. Add two build jobs for enabling Address Sanitizer in CI. One for
Windows CPU, One for Linux CPU.
2. Set default compiler flags/linker flags in build.py for normal
Windows/Linux/MacOS build. This can help control compiler flags in a
more centralized way.
3. All Windows binaries in our official packages will be built with
"/PROFILE" flag. Symbols of onnxruntime.dll can be found at [Microsoft
public symbol
server](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols).
Limitations:
1. On Linux, Address Sanitizer ignores RPATH settings in ELF binaries.
Therefore, once Address Sanitizer is enabled, we need to manually set
LD_LIBRARY_PATH properly before running tests; otherwise
libonnxruntime.so may not be able to find custom ops and shared EPs.
2. On Linux, we also need to set LD_PRELOAD before running some tests (if
the main executable, like python, is not built with Address Sanitizer).
On Windows we do not need to.
3. On Windows, before running python tests we should manually copy the
Address Sanitizer DLL to the onnxruntime/capi directory, because python
3.8 and above enables "Safe DLL Search Mode", which doesn't use the
information provided by the PATH environment variable.
4. On Linux, Address Sanitizer found a lot of memory leaks in our
python binding code. Therefore, right now we cannot enable Address
Sanitizer when building ONNX Runtime with the python binding.
5. Address Sanitizer itself uses a lot of memory address space and
delays memory deallocations, which can easily cause OOM issues in 32-bit
applications. For this reason we cannot run all the tests in
onnxruntime_test_all in 32-bit mode with Address Sanitizer. However, we
can still run individual tests that way; we just cannot run all of them
in one process.
### Motivation and Context
To catch memory issues.
### Description
Set pythonInterpreter in set-python-manylinux-variables-step.yml to fix
a build error:
```
Starting: Set Python manylinux variables
==============================================================================
Task : Python script
Description : Run a Python file or inline script
Version : 0.231.1
Author : Microsoft Corporation
Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/python-script
==============================================================================
##[error]Parameter 'toolPath' cannot be null or empty.
Finishing: Set Python manylinux variables
```
The error was because today I deleted a bunch of software from the VM
image. The task might fail if no Python versions are found in
$(Agent.ToolsDirectory).
### Description
Remove the references to CreateFileMapping2 because the function is
mainly for system services. To use the function, we need to link to one
of the four [Windows umbrella
libraries](https://learn.microsoft.com/en-us/windows/win32/apiindex/windows-umbrella-libraries).
It's tricky because a custom build might want to use any of the four. So
I cannot just choose one and add that one to our CMakeLists.txt.
Given it's so complicated and the code is not actually used now, I will
remove it. It is not used because it requires NTDDI_VERSION >=
NTDDI_WIN10_RS5 but in our top level CMakeLists.txt we set the version
to the first Windows 10 release which is lower than RS5.
### Description
Add four quantization ops: MatMulInteger, ConvInteger, DynamicQuantizeLinear
and DequantizeLinear.
Add data types TensorProto_DataType_INT8 and TensorProto_DataType_UINT8.
### Motivation and Context
Support quantized models.
### Description
Change `A / sqrt(B)` to `A * inverseSqrt(B)` in BatchNormalization,
InstanceNormalization, LayerNormalization and SkipLayerNormalization.
### Motivation and Context
For the same reason that the `inverseSqrt` built-in exists in the
WebGPU spec.
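The rewrite can be illustrated numerically. The sketch below uses plain JavaScript rather than WGSL, so `inverseSqrt` here is a hypothetical stand-in built on `Math.sqrt`, and `normalize` is a simplified illustration rather than the actual shader code.

```js
// Stand-in for the WGSL built-in; JavaScript has no native inverse sqrt.
const inverseSqrt = (x) => 1 / Math.sqrt(x);

// Normalizes values as (x - mean) * inverseSqrt(variance + epsilon).
function normalize(values, epsilon = 1e-5) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  const invStd = inverseSqrt(variance + epsilon); // computed once, reused per element
  return values.map((v) => (v - mean) * invStd);
}

// Reference result using the original `A / sqrt(B)` form
// (mean is 2.5 and variance is 1.25 for this input).
const divided = [1, 2, 3, 4].map((v) => (v - 2.5) / Math.sqrt(1.25 + 1e-5));
const multiplied = normalize([1, 2, 3, 4]);

// Both forms agree up to floating-point rounding.
console.log(multiplied.every((v, i) => Math.abs(v - divided[i]) < 1e-12)); // true
```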