### Description
Fixed the issue of finding nodes with empty name for vitis ai.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
It is required because we encountered this error when testing newly
created models.
### Description
<!-- Describe your changes. -->
Simplify Shrink.
Replace Eigen code with the one that does not require fp16 conversion in
Sign.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
argmax and argmin are similar to reduce. Eventually we need to add
optimized flavors of the shader.
softmax is optimized but only works on the last axis for now which
should be the common use case.
todo: enable more ut for argmax/argmin
### Description
<!-- Describe your changes. -->
Support more data types for vitis ai.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
It is required because the models we are testing now have uint8 data
type. To solve this once for all, we changed the code to support generic
data type.
### Description
Added Resize NHWC domain kernel registration.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Last week I fixed error #16484 found when trying to build onnxruntime
with the icpx compiler. Another thing I found out is that icpx uses
-ffast-math flag by default. You can check it by running the compiler
with -v flag like following:
```bash
# Setup the environment
. /opt/intel/oneapi/setvars.sh
# Compile any file to see all the implicit flags
icpx -v main.cpp
```
This leads to a bunch of warnings during the build like:
```bash
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/cpu/tensor/upsample_op_test.cc:5:
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/provider_test_utils.h:6:
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/checkers.h:10:
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68:
In file included from /mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/Core:172:
/mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/src/Core/MathFunctions.h:1019:12: warning: comparison with NaN always evaluates to false in fast floating point modes [-Wtautological-constant-compare]
return isnan EIGEN_NOT_A_MACRO (x);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
```
And some tests are failing as well, usually with infinities involved. To
list a few:
```bash
# ...
1: [ FAILED ] IsInfTest.test_isinf_float
1: [ FAILED ] IsInfTest.test_isinf_double
1: [ FAILED ] IsInfTest.test_isinf_positive_float
1: [ FAILED ] IsInfTest.test_isinf_positive_double
1: [ FAILED ] IsInfTest.test_isinf_negative_float
1: [ FAILED ] IsInfTest.test_isinf_negative_double
1: [ FAILED ] IsNaNOpTest.IsNaNFloat
1: [ FAILED ] IsNaNOpTest.IsNaNDouble
# ...
```
This PR adds a quick global check for the IntelLLVM compiler, as in the
way its name is reported by CMake and then, depending on the compiler
driver, sets either MSVC-like or GCC-like switch to disable fast-maths.
Probably a bit cleaner solution would be to use
```target_compile_options(${TARGET} PRIVATE MEOW)``` instead of a
global-wide ```set(CMAKE_CXX_FLAGS MEOW)```, but then we'd be required
to add it to all the individual targets and execution providers and this
will lead to a lot of code duplication.
### Description
1. rename OrtValue.FillStringTensorElement to StringTensorSetElementAt .
To the API user I think we're conceptually setting the string at an
offset in the tensor with is roughly equivalent to `List<string> list
... list[index] = "value"`.
2. While working on new inference examples, I noticed that I am still
inclined to use `DenseTensor` for N-D indexing. Added `GetStrides()` and
`GetIndex()` from strides for long dims, so the user can obtain strides
and translate N-D indices into a flat index to operate directly on the
native `OrtValue` buffers. Expose these functions to the user.
3. Make sure we generate docs for C# public static functions.
In additions to `onnxruntime_test_all`, `onnxruntime_shared_lib_test`
and `onnxruntime_customopregistration_test` should
also add "-Wno-deprecated-declarations" flag to ignore compiler warning
### Save optimized pre_grad graph once it's ready
`graph_builder.build()` did two things for training: 1. optimized
forward graph, e.g. pre_grad graph optimization. 2. build gradient
graph.
Originally after `graph_builder.build()` completed, pre_graph graph is
saved. While if pre_grad graph optimization completed, but fail during
gradient graph build, we still cannot get pre_grad graph to investigate.
This PR made the change once pre_grad graph is ready, we save it (if
save_model is enabled) in C++ backend.
BTW, reset minimal supported opset to 1, because with minimal supported
opset 7 will ignore all ops that have last since version less than 7.
e.g. GlobalLpPool, it only has two opset versions: 1, 2.
Padding value in ONNX Pad can be negative, which indicates remove pixel.
WebNN EP can not support such operation, so it needs to use slice to
handle this case.
### Description
There are currently multiple failures that blocking the CI pipelines so
this PR has all of the fixes in order to make sure it passes the CI.
Otherwise a single fix will still fail the CI.
includes:
#16960#16958
Please help to make sure this PR get merged once CI passed.
@snnn @carzh @guschmue
Fixed:
[AB#18118](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18118)
---------
Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
Add lsb-release package to android custom build
### Motivation and Context
To fix a build issue:
/workspace/onnxruntime/tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/install_protobuf.sh:
line 27: lsb_release: command not found
### Description
- enable unit test for js/common in CI
- add debug config in js/.vscode/launch.json
- enable source map for js/common/test for debugging purposes; add
source map files to ignore list
- ignore js/common/test folder for npm packaging
Being able to leverage I/O binding for DML and registering `If` for the
DML EP allows us to avoid copying the past/present key/values back and
forth between the CPU and the GPU after every token.
This gives us a 25% performance increase for Dolly V2 with 128 tokens on
an RTX 4090.
### Description
Set `canvas` dimensions to the `ImageBitmap` dimensions, thus fixing a
malformed Tensor creation.
### Motivation and Context
According to the [HTMLCanvasElement.drawImage()
spec](https://html.spec.whatwg.org/multipage/canvas.html#drawing-images):
> When the destination rectangle is outside the destination image (the
output bitmap), the pixels that land outside the output bitmap are
discarded, as if the destination was an infinite canvas whose rendering
was clipped to the dimensions of the output bitmap.
meaning that `ImageBitmap` pixels exceeding the canvas dimensions will
be discarded. Since no canvas dimensions are set for
`Tensor.fromImage(ImageBitmap)` if-case, the default 300x150px canvas
dimensions are used leading to the creation of malformed Tensors where
all the exceeding pixels are discarded and equal to `0, 0, 0, 0` during
the subsequent `pixels2DContext.getImageData()` call.
### Description
<!-- Describe your changes. -->
This PR moves checking profilingMode to each run instead of the
initialization stage. In this way, users can start/stop profiling at any
time. Otherwise, profiling only take effects at the very beginning and
can't be stopped.
### Description
This PR adds support for saving model optimizations after loading a
model that contains external data into an `InferenceSession`.
### Motivation and Context
This PR is a follow-up to a [previous
PR](https://github.com/microsoft/onnxruntime/pull/16716) for saving a
model optimized by an `InferenceSession`.
### Description
1. As a follow-up of #16761, this PR allows build ORT on iOS/Android
without the need to explicitly specify a protoc path. #16761 is for
WASM. This one is for iOS/Android
2. Update the MacOS/Linux build scripts that build/install protobuf from
source. Make them be more flexible. Add the support for
RedHatEnterprise(ubi), which will needed for upgrading the base image
from centos:7 to ubi:8.
3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile :
the docker file's base image has preinstalled protobuf in /usr/local, we
should uninstall them to avoid conflicts.
### Description
The `%AGENT_TEMPDIRECTORY%\v11.8` is created in azcopy step.
So, the set env step should be after the azcopy step.
### Motivation and Context
Correct the previous logic
Unify the step since multiple jobs are using it.
### Description
Implemented Resize operator support in JSEP
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve graph transformer DoubleQDQPairsRemover
### Description
Improve DoubleQDQPairsRemover to not reset the scale & zero point if
existing value are same on the target DQ & Q nodes.
### Motivation and Context
Fix a bug that DoubleQDQPairsRemover reset the scale value while
removing unnecessary DQ & Q nodes.
### Description
Added Gelu operator to JSEP
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Python Package Pipeline failed since there is exception raised in
test_smooth_quant (from #16288):
```
File "/home/cloudtest/.local/lib/python3.8/site-packages/onnxruntime/quantization/quantize.py", line 384, in quantize_static
importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant")
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/__init__.py", line 24, in <module>
from .contrib import *
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/__init__.py", line 19, in <module>
from .strategy import *
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/__init__.py", line 26, in <module>
__import__(basename(f)[:-3], globals(), locals(), level=1)
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/sigopt.py", line 22, in <module>
from neural_compressor.strategy.strategy import strategy_registry, TuneStrategy
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/__init__.py", line 20, in <module>
from .strategy import STRATEGIES
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/strategy.py", line 41, in <module>
from ..algorithm import AlgorithmScheduler, ALGORITHMS
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/__init__.py", line 20, in <module>
from .algorithm import ALGORITHMS, Algorithm, AlgorithmScheduler, algorithm_registry
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/algorithm.py", line 21, in <module>
from neural_compressor.utils.create_obj_from_config import get_algorithm
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/utils/create_obj_from_config.py", line 20, in <module>
from neural_compressor.metric import METRICS
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/__init__.py", line 30, in <module>
__import__(basename(f)[:-3], globals(), locals(), level=1)
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/coco_tools.py", line 54, in <module>
from pycocotools import coco
File "/usr/local/lib/python3.8/dist-packages/pycocotools/coco.py", line 52, in <module>
from . import mask as maskUtils
File "/usr/local/lib/python3.8/dist-packages/pycocotools/mask.py", line 3, in <module>
import pycocotools._mask as _mask
File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```
The cause is pycocotools package uses "oldest-supported-numpy", which
might cause older version numpy in build pycocotools:
9e9164f979/PythonAPI/pyproject.toml (L4)
Related issue: https://github.com/cocodataset/cocoapi/issues/248
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Fixes the issue with IRFFT output dimension calculation as described in
#13236
### Motivation and Context
Please refer to #13236 for detailed description.
Specifically, [this code](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/contrib_ops/cuda/math/fft_ops.cc#L103) computes the output dimension as:
```
out_dim = in_dim * 2 - 1
```
while it should be this instead:
```
out_dim = 2 * (in_dim - 1)
```
(assuming the original signal has even number of samples, of course).
For example, if the original signal has 4 samples, then the round trip should look something like:
```
4 -> (one-sided RFFT) -> 3 (complex) -> (one-sided IRFFT) -> 4
```
with the current code the output will be a signal with 5 points.
---------
Co-authored-by: Alexey Kamenev <akamenev@nvidia.com>
Co-authored-by: Nick Geneva <nicholasgeneva@gmail.com>
### Description
<!-- Describe your changes. -->
Split stages for CPU and CPU+NNAPI builds as CodeQL is enabled at the
stage level.
We run it for CPU+NNAPI as that covers all the Android code.
We don't want to run it for both as duplicate issues would be created
for a problem in code included in both builds.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Remove VS 2019 code.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Allow creating ConstantScalarNode for double type
Allow create ConstantScalarNode for double type. Looks double type is
not respected when creating constant. So fix it.
```
onnxruntime::python::addObjectMethodsForTraining(pybind11::module&, onnxruntime::python::ExecutionProviderRegistrationFn)::<lambda(onnxruntime::training::OrtModuleGraphBuilder*, const onnxruntime::training::TrainingGraphTransformerConfiguration&)> [ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) of Optype (Sub) bound to different types (tensor(double) and tensor(float) in node (/_original_module/_original_model/gpt_neox/layers.0/input_layernorm/Pow_Grad/Sub_1).
```
### Description
These yaml files and docker files are not used by any pipeline. If I
were wrong, feel free to submit a PR to get the wrongly deleted file
back from git history (git keeps everything forever).
### Description
Add some safety check for conv op.
It is to validate if the attributes coming from a conv op are in a valid
range. (shouldn't be too large or too small).
### Description
Added Flatten operator support to JSEP.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Check the axis attribute for LayerNorm for HTP
### Description
Add code to check the axis value for LayerNorm for HTP explicitly to
make sure only the last dimension is allowed.
### Description
<!-- Describe your changes. -->
This PR adds support to cache the exported training/evaluation ONNX
model in `ORTModule`. On future runs, instead of exporting the model
again, we can pick up the model from a location on disc and run
`ORTModule` training/evaluation.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
ORT Training DRI Contribution
---------
Co-authored-by: root <root@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: pengwa <pengwa@microsoft.com>