WindowsAI build failing due to the deprecated .NET 5 SDK missing from the build image.
.NET 5 was deprecated last year, and the build machine images have recently
been updated to no longer include this SDK.
Unblock the failing builds by force-installing the .NET 5 SDK as part of the
build pipeline.
### Description
XNNPack: allow users to choose whether to enable the CPU memory arena.
Right now it is hardcoded to true and is not affected by the on/off
switch in SessionOptions. We should make it work.
### Motivation and Context
Since we have such a switch in SessionOptions, it should work as expected.
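A minimal sketch of how this switch is exercised from the Python API (assuming a build that includes the XNNPack EP; `model.onnx` is a placeholder):
```
import onnxruntime as ort

so = ort.SessionOptions()
so.enable_cpu_mem_arena = False  # with this change, the XNNPack EP should honor this

sess = ort.InferenceSession(
    "model.onnx",
    sess_options=so,
    providers=["XnnpackExecutionProvider", "CPUExecutionProvider"],
)
```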
Split `IsTunableOpEnabled` semantics into **enable tunable op for using**
and **enable tunable op for tuning**.
Both remain disabled in general for safety purposes. But if
- the session is created with an ONNX model that has tuning results embedded, and
- the embedded tuning results are set to the EP without an error `Status`,
then we automatically enable the using; tuning remains disabled.
The planned options will be
- `tunable_op_enable`: The top-level switch of `TunableOp`; indicates whether we will run into `TunableOp`-related logic at all. **NOTE:** most of our impls have a bottom impl that acts as a fallback and is set as the default. In this case, we still call into the `TunableOp`, but no kernel selection, kernel tuning, or caching is involved. This reduces our maintenance burden of a duplicate code path.
- `tunable_op_tuning_enable`: The secondary switch of `TunableOp`; indicates whether we will run into the tuning-related logic of `TunableOp`.
Then for the possible future options:
- `tunable_op_tuning_max_iteration`: blahblah
- `tunable_op_tuning_max_duration_ms`: blahblah
- `tunable_op_flash_attention_enable`: blahblah, for example only, we will not have this.
The developer-oriented environment variables exist for developers' convenience when inspecting the performance impact of tuning. So there are only `ORT_ROCM_TUNABLE_OP_ENABLE` and `ORT_ROCM_TUNABLE_OP_TUNING_ENABLE` to take fine-grained control of the combinations.
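As a hedged sketch of how the two planned switches could be set from the Python API once they land (option names taken from the plan above; `model.onnx` is a placeholder):
```
import onnxruntime as ort

rocm_options = {
    "tunable_op_enable": "1",         # top-level switch: enter TunableOp logic
    "tunable_op_tuning_enable": "1",  # secondary switch: also allow tuning + caching
}
sess = ort.InferenceSession(
    "model.onnx",
    providers=[("ROCMExecutionProvider", rocm_options)],
)
```
Per the semantics above, setting only `tunable_op_enable` would use previously stored tuning results without triggering new tuning runs.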
### Description
Upgrade the remaining Python versions to 3.11 and remove 3.7.
### Description
Allows the creation of zero-length tensors via the buffer path (the
array path with zero-length arrays still throws, as the validation logic
that checks the array is not ragged would require more intrusive revision), and
allows the `tensor.getValue()` method to return a Java multidimensional
array with a zero dimension. Also adds a test for the creation and
extraction behaviour.
### Motivation and Context
The Python interface can return zero-length tensors (e.g. if object
detection doesn't find any objects), and before this PR, calling
`tensor.getValue()` in Java threw an exception with a confusing error message.
Fixes #7270 and #15107.
### Description
Add clog back to onnxruntime_EXTERNAL_LIBRARIES.
### Motivation and Context
Fix iOS packaging pipeline build failure.
### Description
Register the Resize op in the NHWC schema for the QNN EP.
### Motivation and Context
Resize is identified as a layout-sensitive op for the QNN EP, so it needs
to be registered in the NHWC schema.
Change the default behavior to link against the nvonnxparser library
(onnx-tensorrt parser) that is included with the TensorRT package.
Previously, the default behavior was to build and statically link
against the OSS onnx-tensorrt parser.
Historically, we wanted to incorporate the latest commits/fixes from the
OSS parser.
These days the OSS parser is not significantly different from the
included parser library so there is less reason to build against it by
default.
The major benefit of linking against the parser shared library shipped with
TensorRT is that it becomes much easier to build/link against a minor
version update of TensorRT: ONNX Runtime can be updated to a new TensorRT
minor version by simply replacing the TensorRT libraries with the newer
version, because the parser is no longer statically linked into onnxruntime.
Added `--use_tensorrt_oss_parser` to build.py to support the previous
default behavior (build and statically link the OSS parser).
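For example, a build restoring the previous default behavior might be invoked as follows (paths are illustrative, and other required flags such as the CUDA ones are omitted):
```
python tools/ci_build/build.py --config Release --build_dir build \
    --use_tensorrt --tensorrt_home /usr/local/TensorRT \
    --use_tensorrt_oss_parser
```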
Add support for ViT optimization in optimizer.py
As the ViT architecture follows BERT rather closely, we can easily reuse
the BERT fusions for ViT. The only difference is that ViT does not have an
attention mask, which means there is no Add node in the Q/K paths.
Make the necessary changes in onnx_exporter.py to be able to cover the
optimizations with tests.
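A minimal sketch of driving the fusions from Python (the model path and the head/hidden sizes are illustrative, and `model_type="vit"` is assumed to be the value this change registers):
```
from onnxruntime.transformers import optimizer

# Reuse the BERT fusion passes for a ViT model; 12 heads / 768 hidden
# are the ViT-Base values, used here for illustration only.
opt_model = optimizer.optimize_model(
    "vit-base.onnx",
    model_type="vit",
    num_heads=12,
    hidden_size=768,
)
opt_model.save_model_to_file("vit-base-opt.onnx")
```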
### Description
Update the Python package pipeline to support 3.11.
### Description
Adjust various code paths to allow the Whisper model to function with the
BeamSearch op.
Approach: add a new kModelType enum value in IGenerationParameters as
follows:
#### Old: 0 = GPT2, 1 = T5
#### New: 0 = GPT2, 1 = T5, 2 = Whisper
When the user assigns this attribute the value 2, various shape and type
checks are changed to accommodate Whisper inputs.
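As a hedged sketch of selecting the new mode when building a graph (attribute and input names follow the `com.microsoft.BeamSearch` contrib op; the required subgraph attributes such as the decoder are omitted for brevity):
```
from onnx import helper

# model_type: 0 = GPT-2, 1 = T5, 2 = Whisper (per the enum above)
beam_search = helper.make_node(
    "BeamSearch",
    inputs=["input_features", "max_length", "min_length", "num_beams",
            "num_return_sequences", "length_penalty", "repetition_penalty"],
    outputs=["sequences"],
    domain="com.microsoft",
    model_type=2,  # routes the shape/type checks down the Whisper path
)
```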
### Motivation and Context
BeamSearch is currently designed to function with BERT-based models whose
inputs are vocab tokens, and it needs changes to function with Whisper
inputs (3-D float values processed from audio data).
---------
Co-authored-by: Peter McAughan <petermca@microsoft.com>
### Description
In the merge branch, a run only reads the cache generated in the main
build. As a result, runs in the merge branch will not upload a new cache
except for the first time.
### Motivation and Context
1. Reduce the cache storage.
If there are some big changes, devs should trigger the specific builds
manually in https://dev.azure.com/onnxruntime/onnxruntime/_build. A manually
triggered run still reads its own branch's cache.
### Description
Add scripts to export Whisper model to ONNX and integrate the ORT
BeamSearch op with the resulting graphs.
Example command to execute this script:
python convert_to_onnx.py -m openai/whisper-large --output whisper -e
---------
Co-authored-by: Peter McAughan <petermca@microsoft.com>
Fix prefast warnings:
(1) Arithmetic overflow: using operator '*' on a 4-byte value and then
casting the result to an 8-byte value. Cast the value to the wider type
before calling operator '*' to avoid overflow (io.2).
(2) Dereferencing NULL pointer 'key'.
### Description
Rework some external targets to ease building with
`-DFETCHCONTENT_FULLY_DISCONNECTED=ON`
This will allow package managers to more easily provide an onnxruntime
package by reducing the amount of patching needed downstream at each
version.
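For instance, a package manager could configure the build with all downloads disabled and each dependency's sources supplied locally (the paths and the dependency name are illustrative; `FETCHCONTENT_SOURCE_DIR_<NAME>` is the standard CMake override):
```
cmake -S cmake -B build \
    -DFETCHCONTENT_FULLY_DISCONNECTED=ON \
    -DFETCHCONTENT_SOURCE_DIR_FLATBUFFERS=/path/to/flatbuffers
```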
### Motivation and Context
Availability of onnxruntime in some C++ package managers
https://github.com/microsoft/onnxruntime/issues/7150
https://github.com/conan-io/conan-center-index/issues/16699
https://github.com/microsoft/vcpkg/issues/20548
My initial intent is to get this into Conan, but the PR would most likely
be useful (though not tested) for vcpkg as well (and maybe others).
I tried to include only a first batch of patches that are not too specific
(i.e. not specific to Conan).
The first commit reworks `flatbuffers` and just extends what @snnn did
in https://github.com/microsoft/onnxruntime/pull/13991.
The second commit reworks `pytorch_cpuinfo`.
The third commit reworks `google_nsync`.
Temporarily remove the Azure build check to unblock PR(s).
We need to investigate the sudden build failure and then re-enable it.
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Ensure that Loop operators run on CPU.
Fix memcpy for sequence tensors so that empty sequences (such as when
SequenceEmpty runs on DirectML) can be copied back to CPU.
MSVC and gcc are both not good at optimizing select(), even in trivial
usage outside of ORT.
gcc seems to do better with -ffast-math (not used by ORT), but /fp:fast
does nothing for MSVC.
This PR delivers a 33% speedup on the same model (360 us -> 270 us on
Windows; 205 us -> 153 us on Linux; measured on different systems).
TODO: examine Elu and other similar activation functions for their use of
`Eigen::select` and fix them.
Co-authored-by: @fpribeiro
### Description
Add a tool to convert a fused BERT-like model to packing mode.
### Description
Follow-up to https://github.com/microsoft/onnxruntime/pull/14881.
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
The ROCm Python packaging pipeline failed because of the manylinux version
and manylinux.patch update.
1. Fix the duplicate `epel-release` installation issue. The ROCm pipeline
installs it at the beginning of the Dockerfile in order to install the ROCm
libs; remove the duplicate installation from install-runtime-packages.sh.
```
/var/tmp/yum-root-sMRl36/epel-release-latest-7.noarch.rpm: does not update installed package.
Error: Nothing to do
```
2. Add Python 3.10 to fix the error below.
```
+ /opt/python/cp310-cp310/bin/python -m venv /opt/_internal/tools
build_scripts/finalize.sh: line 40: /opt/python/cp310-cp310/bin/python: No such file or directory
```
3. Add Python 3.10 to the ROCm pipeline.
Pipeline link:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=294776&view=results
### Description
Remove fp16 support from the Apple build.
### Motivation and Context
FP16 support on ARM64 is only available from armv8.2-a onward, so the clang
compiler needs the compilation flag `-march=armv8.2-a+fp16`.
Unfortunately, our current universal build does not support hardware-specific
compilation flags on C++ source files, as they would cause trouble when
compiling against more than one hardware target. Until we figure out how to
remove this limitation, we had to disable fp16 support for Apple systems.
### Description
Add a required graph transformer that duplicates DQ nodes to ensure that QDQ
node units have unique DQ nodes. This condition is necessary for QDQ
node unit processing.
### Motivation and Context
There is an existing Python utility that does this:
c7ced7a5e9/tools/python/util/qdq_helpers/qdq_model_utils.py (L77)
This PR implements it as a graph transformer so that it is integrated into
ORT and does not require a separate step to update the model. There are
also tests to ensure that its effects are not undone by the basic-level
graph optimizations.
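A minimal Python sketch of the duplication idea, following the cited utility (names and structure are illustrative, not the transformer's actual code):
```
import onnx

def duplicate_shared_dq_nodes(model: onnx.ModelProto) -> None:
    """Give each extra consumer of a multi-consumer DequantizeLinear node its own copy."""
    graph = model.graph
    new_nodes = []
    for node in graph.node:
        if node.op_type != "DequantizeLinear":
            continue
        consumers = [n for n in graph.node if node.output[0] in n.input]
        # The first consumer keeps the original DQ; every other consumer
        # is rewired to a fresh clone with a unique output name.
        for i, consumer in enumerate(consumers[1:], start=1):
            dup = onnx.NodeProto()
            dup.CopyFrom(node)
            dup.name = f"{node.name}_dup{i}"
            dup.output[0] = f"{node.output[0]}_dup{i}"
            idx = list(consumer.input).index(node.output[0])
            consumer.input[idx] = dup.output[0]
            new_nodes.append(dup)
    # NOTE: appending at the end breaks topological order; a real pass
    # would insert each clone next to the original node instead.
    graph.node.extend(new_nodes)
```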