### Description
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act.
Specifically tested on on Resnet50v1
### Motivation and Context
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act.
Specifically tested on on Resnet50v1
### Description
1. Update VERSION_NUMBER for preparing the upcoming release. This PR's
commit will not be included in the 1.15 release branch
2. Delete package/rpm/onnxruntime.spec since it was not used in past
years.
### Motivation and Context
Preparing the release.
Fixed
[AB#15311](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15311)
### Description
They were missed in #15707 , because they are not in common places for Dockerfiles.
Though this commit updated tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile, it won't automatically take effect. The image needs to be manually generated and pushed to a place, and before doing that our CMakeLists.txt also needs to be tweaked a little bit.
### Description
This is the first part to create a webassembly artifacts for ort-web
webgpu EP (wasm build).
there will be following steps to consume the artifacts in web build
### Description
Download protoc from Github Release instead of Nuget to avoid having
dependency on nuget.exe on Linux
### Motivation and Context
To avoid having dependency on nuget.exe on Linux. Many users' build
environment do not have nuget or dotnet.
### Description
Parallelize fp16 pooling operators
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Currently compiling with clang + cuda leads to:
```
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/make_string.h:33:6: error: call to function 'operator<<' that is neither visible in the template definition nor found by argument-dependent lookup
ss << t;
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<gsl::span<const long, 18446744073709551615>>' requested here
MakeStringImpl(ss, args...);
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/make_string.h:46:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, gsl::span<const long, 18446744073709551615>>' requested here
MakeStringImpl(ss, args...);
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/make_string.h:93:18: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, gsl::span<const long, 18446744073709551615>>' requested here
return detail::MakeStringImpl(detail::if_char_array_make_ptr_t<Args const&>(args)...);
^
/code/build/_deps/onnxruntime-src/onnxruntime/contrib_ops/cuda/quantization/qordered_ops/qordered_qdq.cc:73:12: note: in instantiation of function template specialization 'onnxruntime::MakeString<char[39], gsl::span<const long, 18446744073709551615>>' requested here
return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Shape not meet clean tile requirement!", dims);
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/common/common.h:188:48: note: expanded from macro 'ORT_MAKE_STATUS'
::onnxruntime::MakeString(__VA_ARGS__))
^
/code/build/_deps/onnxruntime-src/include/onnxruntime/core/framework/tensor_shape.h:201:15: note: 'operator<<' should be declared prior to the call site or in namespace 'gsl'
std::ostream& operator<<(std::ostream& out, const TensorShape& shape);
^
1 error generated.
```
### Description
Adds an option to load local state dictionary for whisper model export.
### Motivation and Context
This is useful to demonstrate workflow of using ORT Training to get
model weights, downloading said weights onto a local gpu-enabled device,
exporting the custom model using `convert_to_onnx.py`, and then nicely
feeding the .onnx file into ORT InferenceSession.
### Description
This change will allow us building CUDA EP without installing CUDA SDK
on Windows.
### Motivation and Context
Nvidia's CUDA installer comes with a VS extension. In the past, we
require installing the extension. It is a little bit inconvenient since:
1. Visual Studio must be installed before CUDA SDK. CUDA's installer
will not install the extension if your machine doesn't have Visual
Studio.
2. We need to install CUDA SDK on our build machines, instead of just
downloading it and using it.
After this change, we will not need to install CUDA SDK on our build
machines. So it will be easier to add a support for a different CUDA
version.
Also, fix two PreFast warnings.
### Description
This PR creates Nuget and Android for Training.
### Motivation and Context
These packages are intended to be released in ORT 1.15 to enable
On-Device Training Scenarios.
## Packaging Story for Learning On The Edge Release
### Nuget Packages:
1. New Native package -> **Microsoft.ML.OnnxRuntime.Training** (Native
package will contain binaries for: win-x86, win-x64, win-arm, win-arm64,
linux-x64, linux-arm64, android)
2. C# bindings will be added to existing package ->
**Microsoft.ML.OnnxRuntime.Managed**
### Android Package published to Maven:
1. New package for training (full build) ->
**onnxruntime-training-android-full-aar**
### Python Package published to PyPi:
1. Python bindings and offline tooling will be added to the existing ort
training package -> **onnxruntime-training**
### Description
While building ORT for DML EP with `dml_EXTERNAL_PROJECT` flag, 2
variables (`DML_SHARED_LIB`, `DML_PACKAGE_DIR`) value is not set
properly.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
ExecutionProvider API refactor - replace OrtMemoryInfo with OrtDevice
### Motivation and Context
Currently “Location” is represented as ORTMemoryInfo, which is OrtDevice
+ OrtMemType, while OrtDevice is represent as DeviceType + DeviceId +
MemType. As we can see there is some unnecessary hierarchy, the proposal
is to make it a clear definition that to use OrtDevice as an abstraction
for Location
---------
Co-authored-by: Lei Cao <leca@microsoft.com>
### Description
Enabled the use of per channel Bias and Mean normalization when converting an image <--> tensor.
Added a few bug fixes and updates to the relevant E2E tests.
---------
Co-authored-by: shalvamist <shalva.mist@microsoft.com>
Symbol visibility from DllImport is inconsistent across platforms resulting in the symbol not necessarily being visible to ORT native code that tries to look it up by name.
Best solution is to use DllImport to load the library and to call the registration function directly. That requires the native SessionOptions handle and OrtApiBase struct. We could either make those public, or provide a helper where the user passes in a delegate from their DllImport. Can add when needed.
Implement a set of new APIs for lightweight custom ops registration, to
save efforts on schema-composing.
A few highlights:
1. Support build-time type inference;
2. Support function-as-op for "stateless" ops;
3. Support structure-as-op for "stateful" ops;
4. Support varied input/output forms such as span, scalar, and tensors,
either optional or non-optional.
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
### Description
Augment nhwc graph optimizer to accommodate fp16 operators.
### Motivation and Context
With new fp16 conv operator added. This operator prefers NHWC data
layout. We need to augment existing graph optimizers to better utilize
the new operator.
### Description
Originally VitisAI EP only works with old version of VitisAI release.
### Motivation and Context
Update VitisAI EP so that it works together with the current VitisiAI
3.5 and further version of VitisAI. We try our best to make it forward
compatible.
---------
Co-authored-by: Wang Chunye <chunywan@xilinx.com>
Co-authored-by: mingyue <mingyue@amd.com>
Co-authored-by: mingyueliuh <131847423+mingyueliuh@users.noreply.github.com>
Co-authored-by: liumingyue <mingyue@xilinx.com>
Co-authored-by: moore-ch <129165652+moore-ch@users.noreply.github.com>
Co-authored-by: shoucair <shoucai.ren@amd.com>
Co-authored-by: zz002 <zhenze.wang@amd.com>
Co-authored-by: BoarQing <yuz75@Pitt.edu>
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
This addresses a performance regression in some INT8 models with the
DirectML EP by defaulting OrtSessionOptionsDisableQuantQDQ to 1 when the
EP is registered.
This regression occured due to the introduction of the QDQ propagation
transformer, which is based on this session option. That transformer
maximizes the number of nodes which are executed as quantized by
logically propagating quantize operators upstream and dequantize
operators downstream. However, it does this simply by inserting QDQ
pairs, with an expectation that something will recognize sequences of
DQ->Op->Q. This logic and related L2 transformers are not currently
enabled for the DirectML EP.
This change also removes a noisy warning when the session option for
memory pattern is overriden as the DirectML EP is registered.
### Description
Walkaround of https://github.com/microsoft/onnxruntime/issues/15521.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Previous behavior of TRT EP to set TRT optimization profiles for dynamic
shape input is based on input tensor values. Users can't explicitly
specify the profiles.
This PR makes users capable of specifying min/max/opt profiles through
newly added three provider options:
`trt_profile_min_shapes`, `trt_profile_max_shapes` and
`trt_profile_opt_shapes`
with the format of "input1:dim1xdim2...,input2:dim3xdim4...".
(Note: It's similar to --minShapes, --maxShapes and --optShapes of
trtexec command-line
[flags](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec-flags))
For example, if you are using onnxruntime_perf_test, you can try this:
`./onnxruntime_perf_test -e tensorrt -r 1 -i
"trt_profile_min_shapes|imgs:1x3x384x288
trt_profile_max_shapes|imgs:32x3x384x288
trt_profile_opt_shapes|imgs:16x3x384x288" your_model_path`
If the engine cache is enabled, you still need to provide these three
explicit provider options in order to use this feature. ORT TRT will
compare the min/max/opt profile shape with the ones saved in .profile
file to decide whether to rebuild the engine.
Constraints to use these provider options: (1) Need to specify
min/max/opt profile shapes for all the dynamic shape input
This feature is also requested by other users:
https://github.com/microsoft/onnxruntime/issues/13851
### Description
<!-- Describe your changes. -->
Disable new test that is failing on linux. Not required for this
release. Will fix in the next week.
Marshal.Prelink can be used on Windows to make the symbol available but
Linux appears to work differently.
Also need to update the pre-checkin tests so this is tested early as
it's only failing in the E2E tests run in the packaging pipeline.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix packaging pipeline error.
### Description
This PR updates the documentation for using the Whisper custom export
scripts via the wheel.
### Motivation and Context
The path should say
`onnxruntime.transformers.models.whisper.convert_to_onnx` instead of
`onnxruntime.transformers.models.convert_to_onnx`.
Dynamic shapes was not working with serialized model so we are switching
to compile model method
### Motivation and Context
Dynamic shapes was not working with serialized model
- If it fixes an open issue, please link to the issue here. -->
Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
### Description
Adds support for the LRN operator to QNN EP.
### Motivation and Context
Enables basic models like googlenet and alexnet to run entirely on QNN
EP.
### Description
<!-- Describe your changes. -->
js/react_native package dependency change to manage ort-extensions for
react-native app.
Enable optional inclusion of ort-ext aar/ ort-ext pods for react-native
extensions apps when specifiy `ortExtensionsEnabled` in user's
package.json file
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
### Description
<!-- Describe your changes. -->
-Add support for loading model from buffer on iOS
-Update OnnxruntimeModuleTest to use updated loadModelFromBuffer
Based on #12676
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Issue: #12500
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
### Description
This PR adds `onnxruntime.transformers.models.whisper` to the wheel.
### Usage
There is a README.md document that shows sample commands. The following
command will show how to use the custom Whisper export script in more
detail.
```
$ python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx --help
```
### Motivation and Context
This fixes an issue with adding the Whisper custom export scripts to the
wheel. The Whisper folder now appears in the wheel.

### Description
Cast optimizer may convert a fp16 node to fp32. This used to be safe as
all fp16 kernels has fp32 implementation. As this assumption is no
longer true, we need to check the validity of the operation
### Motivation and Context
Main work here is to introduce an API to check whether a kernel is
registered. Currently we don't have a way to do that without an operator
node. This needs to be augmented. We need to query whether a kernel is
registered by its property only, so that we can judge whether it is safe
to construct a node long before we actually do so.