Add `--user` option to pip install command.
Error:
```
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/usr/local/bin/f2py'
Consider using the `--user` option or check the permissions.
```
See #19877.
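A minimal sketch of the idea behind this change, assuming a hypothetical helper `pip_install_cmd` that is not part of the repository: when the target directory is not writable, `--user` is appended so pip installs into the per-user location instead of failing with `[Errno 13]`. The actual change may simply add the flag unconditionally.

```python
import os
import sys


def pip_install_cmd(packages, site_dir, python=sys.executable):
    """Build a pip command, adding --user when site_dir is not writable.

    Hypothetical helper for illustration only.
    """
    cmd = [python, "-m", "pip", "install"]
    if not os.access(site_dir, os.W_OK):
        # Avoids "Permission denied" errors on system-wide locations.
        cmd.append("--user")
    return cmd + list(packages)
```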
### Description
Update to the ONNX 1.16.0 branch according to
https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md
ONNX 1.16.0 release notes:
https://github.com/onnx/onnx/releases/tag/v1.16.0
#### Updated ops for CPU EP:
- DequantizeLinear(21)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block dequantization support
- QuantizeLinear(21)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block quantization support
- Cast(21)
  - Missing int4 and uint4 support
- CastLike(21)
  - Missing int4 and uint4 support
- ConstantOfShape(21)
  - Missing int4 and uint4 support
- Identity(21)
  - Missing int4 and uint4 support
- If(21)
  - Missing int4 and uint4 support
- Loop(21)
  - Missing int4 and uint4 support
- Reshape(21)
  - Missing int4 and uint4 support
- Scan(21)
  - Missing int4 and uint4 support
- Shape(21)
  - Missing int4 and uint4 support
- Size(21)
  - Missing int4 and uint4 support
- Flatten(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Pad(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Squeeze(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Transpose(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Unsqueeze(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
#### Unimplemented opset 21 features/ops
- int4 and uint4 data type
- QLinearMatMul(21)
- GroupNormalization(21)
- ai.onnx.ml.TreeEnsemble(5)
### Disabled tests
#### ORT Training
orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py
- test_ort_custom_ops: Potential shape inference bug for custom ops
#### Python quantization unit tests
test/onnx/python/quantization (shape inference bug)
- test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16
- test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16
- test_op_gemm.py: test_quantize_qop_gemm_s8s8
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3
- test_op_matmul.py: test_quantize_matmul_u8u8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution
- test_op_relu.py: test_quantize_qop_relu_s8s8
#### ONNX tests
- test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a
maxpool output size bug and added this test. Enable this test when [ORT
PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged.
Refer to original [ONNX PR](https://github.com/onnx/onnx/pull/5741).
- test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op
ai.onnx.ml.TreeEnsemble
- test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same
- test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same
- test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same
- test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4
yet
- test_cast_INT4_to_INT8_cpu: same
- test_cast_UINT4_to_FLOAT_cpu: same
- test_cast_UINT4_to_UINT8_cpu: same
- test_cast_INT4_to_FLOAT_cuda
- test_cast_INT4_to_INT8_cuda
- test_cast_UINT4_to_FLOAT_cuda
- test_cast_UINT4_to_UINT8_cuda
- test_constantofshape_float_ones_cuda: ConstantOfShape(21) not
implemented for cuda
- test_constantofshape_int_shape_zero_cuda: same
- test_constantofshape_int_zeros_cuda: same
- test_flatten_axis0_cuda: Flatten(21) not implemented for cuda
- test_flatten_axis1_cuda: same
- test_flatten_axis2_cuda: same
- test_flatten_axis3_cuda: same
- test_flatten_default_axis_cuda: same
- test_flatten_negative_axis1_cuda: same
- test_flatten_negative_axis2_cuda: same
- test_flatten_negative_axis3_cuda: same
- test_flatten_negative_axis4_cuda: same
- test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not
implemented in ORT yet
- test_qlinearmatmul_2D_int8_float32_cpu: same
- test_qlinearmatmul_2D_uint8_float16_cpu: same
- test_qlinearmatmul_2D_uint8_float32_cpu: same
- test_qlinearmatmul_3D_int8_float16_cpu: same
- test_qlinearmatmul_3D_int8_float32_cpu: same
- test_qlinearmatmul_3D_uint8_float16_cpu: same
- test_qlinearmatmul_3D_uint8_float32_cpu: same
- test_qlinearmatmul_2D_int8_float16_cuda: same
- test_qlinearmatmul_2D_int8_float32_cuda: same
- test_qlinearmatmul_2D_uint8_float16_cuda: same
- test_qlinearmatmul_2D_uint8_float32_cuda: same
- test_qlinearmatmul_3D_int8_float16_cuda: same
- test_qlinearmatmul_3D_int8_float32_cuda: same
- test_qlinearmatmul_3D_uint8_float16_cuda: same
- test_qlinearmatmul_3D_uint8_float32_cuda: same
- test_size_cuda: Size(21) not implemented for cuda
- test_size_example_cuda: same
- test_dequantizelinear_blocked: Missing implementation for block
dequant for DequantizeLinear(21)
- test_quantizelinear_blocked_asymmetric: Missing implementation for
block quant for QuantizeLinear(21)
- test_quantizelinear_blocked_symmetric: Missing implementation for
block quant for QuantizeLinear(21)
---------
Signed-off-by: liqunfu <liqun.fu@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>
### Description
Make the compilation work on the Azure CPU agent by reducing the parallel
count.
### Motivation and Context
The OOM issue mentioned in #20244 was caused by the low
memory/parallel_count ratio.
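As a rough illustration of why lowering the parallel count helps, the sketch below caps the number of compile jobs so that the estimated peak memory fits in RAM. The helper name and the per-job memory figure are assumptions for illustration, not values taken from the pipeline.

```python
import os


def safe_parallel_jobs(total_mem_gb, mem_per_job_gb, cpu_count=None):
    """Cap parallel compile jobs so the estimated peak memory fits in RAM."""
    cpus = cpu_count or os.cpu_count() or 1
    by_memory = int(total_mem_gb // mem_per_job_gb)
    # Always allow at least one job, even on very small machines.
    return max(1, min(cpus, by_memory))


# Assumed, illustrative figures: 64 vCPUs, 256 GB RAM, 8 GB peak per job.
print(safe_parallel_jobs(256, 8, cpu_count=64))  # 32
```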
### Description
The training CUDA 12.2 packaging pipeline
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary
has consistently run out of memory since PR #19910.
I tried other CPU agents, for example D64as_v5 (256G memory) and
D32as_v4 (128G memory and 256G SSD temp storage), and they still ran out of
memory.
But it works on T4, though T4 only has 4 vCPUs, 28G memory and 180G temp
storage, and it takes much more time.
### Motivation and Context
Restore CUDA 12.2 training packaging pipeline first.
More time is needed to investigate the root cause
### Other Clues.
These 2 compilation steps each take nearly 6 minutes with CUDA 12.2 on T4,
and they run out of memory on a CPU machine. @ajindal1
cuda12.2 on T4
```
2024-03-14T05:39:08.7726865Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-14T05:45:01.3223393Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
2024-03-14T05:46:07.9218003Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim96_fp16_sm80.cu.o
2024-03-14T05:52:59.2387051Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu.o
```
But they could be finished in about one minute with Cuda 11.8 on CPU
```
cuda11.8 on CPU
2024-04-09T11:34:35.0849836Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-04-09T11:35:53.6648154Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
cuda11.8 on GPU
2024-03-13T12:16:33.4102477Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-13T12:19:58.8268272Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
```
### Description
Update QNN python packages to use QNN SDK version 2.19.2.
### Motivation and Context
Our CI builds already use QNN SDK version 2.19.2. We should make sure
the ort-nightly-qnn python packages are also built with the same QNN SDK
version.
### Description
All Windows CUDA packaging jobs now run well, except the [Python-CUDA-Packaging
pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1299&_a=summary).
A comparison shows that enable_lto isn't added in that pipeline, which might be
one root cause of the random hang.
### Description
Enable NPUs supporting DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML and
D3D_FEATURE_LEVEL_1_0_GENERIC with DML EP. This also begins ingesting DX
headers through the DirectX-Headers repo.
Note that this includes an update to cgamanifest.json for onnx-tensorrt,
which is triggered during re-generation due to prior changes to
deps.txt.
### Description
Bump spotless and the Gradle wrapper to 6.25.0 and 8.6 respectively to
allow compiling ORT on Java 21. The build still targets Java 8.
I'm not sure if there will be CI changes necessary to use this PR,
specifically for the Gradle version as I don't know if that is cached
somewhere earlier in the CI build process.
The new Gradle version adds a warning that using `--source` and
`--target` to select the Java language version is obsolete, which is
annoying. We can fix it if we decide to only allow building on newer
versions of Java, while still supporting running on Java 8.
### Motivation and Context
Java 21 is the latest LTS release of Java and ORT should be able to
build on it.
### Description
---------
Co-authored-by: Yi Zhang <your@email.com>
### Description
Refactor win-ci.yml to solve the random hang issue in more GPU workflows,
and move the building of the nuget-zip and Python CUDA 12 packages to a CPU
machine.
---------
Co-authored-by: Yi Zhang <your@email.com>
### Description
Address build issues and source code discrepancies.
Fix cuda_test_provider gtest argument stack corruption.
### Motivation and Context
`OpTester` class that is widely used for kernel testing is not
suitable for testing internal classes for EPs that are built as shared
objects.
Currently, CUDA EP tests run only on Linux.
We want to enable testing and developments on Windows,
and create a usable pattern for testing of other EPs internals.
Alternatives considered:
Abstracting EP unit tests into separate test executable such as
`onnxruntime_test_all`.
This alternative was rejected as it would create a lot more changes in
the established patterns,
and potentially interfere with CUDA functionality with more complex
source code maintenance.
### Description
In #20073, I pinned the onnx version to unblock the whole PR CI.
In fact, we can use the onnx that is installed when building from source,
whose version is controlled by deps.txt.
For historical reasons, the DML stage installed onnx from PyPI. Now
onnx can be installed there the same way as in the other stages.
Add an option to skip installing onnx in win-ci-prebuild-step.
### Description
Make Windows GPU Packaging stage in Python Packaging pipeline run on CPU
machine as well
### Test Link
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=430961&view=results
### Description
Update Web CI to use data dir under Agent.TempDirectory
This change fixes the random failure caused by unstable access to karma
temp directory (which is under AppData\Local\Temp) on CI pipeline
### Description
- Add NPU to the list of supported devices.
- Add support for OpenVINO (OV) 2024.0.
- NuGet packages no longer include the OpenVINO DLL.
- Fix bugs in the Python API.
- Revert Dockerfiles that are not being maintained.
### Motivation and Context
NPU Device has been introduced by Intel in latest client systems
OpenVINO 2024.0 release is out.
---------
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Ubuntu <ubuntu@ubuntu-118727.iind.intel.com>
Co-authored-by: hmamidix <hemax.sowjanya.mamidi@intel.com>
Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
### Description
1. Move building to a CPU machine.
2. Optimize the pipeline.
3. Since there is no official ONNX package for Python 3.12, the Python 3.12
test stage uses the packages built from ONNX source in the build stage.
### Motivation and Context
1. Resolve the random hang in compilation.
2. Save a lot of GPU resources.
---------
### Description
### Motivation and Context
Downloading deps is not needed in the test stage.
Remove it to reduce random download errors.
### Description
The crash caused by neural_speed turns out to be a very rare corner case.
Turn it on by default.
### Description
### Motivation and Context
MAUI on macOS uses mac-catalyst which requires a different native
binary.
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
### Description
The docker image name was fixed, but the docker arguments differed
between jobs.
This triggered a rebuild of the docker image almost every time.
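One way to avoid this kind of cache thrashing is to derive the image name from the build arguments, so jobs with different arguments use different names. The following is only a sketch of that idea; `image_name`, the base name, and the argument keys are placeholders, not the names used in the pipeline.

```python
import hashlib


def image_name(base, build_args):
    """Derive a docker image name that changes only when build args change.

    `base` and the argument names are placeholders for illustration.
    """
    # Sort so that argument order does not affect the name.
    canonical = "\n".join(f"{k}={v}" for k, v in sorted(build_args.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{base}-{digest}"
```

Two jobs that pass the same arguments get the same name and can share the cached image, while a job with different arguments gets its own.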
### Description
Fix a few warnings in typedoc (for generating JS API):
```
[warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used.
[warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation.
[warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation.
[warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation.
[warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation.
[warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device.
```
Changes highlighted:
- Merge `CoreMlExecutionProviderOption` and
`CoreMLExecutionProviderOption`. They expose 2 different sets of options
for React Native and the ORT Node.js binding. This should be fixed in the future.
- Fix a few naming inconsistencies between JSDoc and parameters
- Fix broken type links
- Exclude trace functions
### Description
* Add concurrency test to EP Perf CI panel (impl. by onnx_test_runner)
* Model: FasterRCNN-10 model within CI image
* `-c` param configurable via CI panel when kicking off CI tasks
* Auto-replicate test input/outputs according to `-c` param
* By default, the model test will be executed in 100 iterations (~2min
added to T4 CI task load overall)
### Motivation and Context
To monitor potential concurrency issues of ORT-TRT
### Description
Modifications to support 2GB+ checkpoint & Upgrading Flatbuffers
### Motivation and Context
This PR includes changes that will make ort handle 2GB+ checkpoints.
To do that we need to upgrade flatbuffers to 23.5.9 -
https://github.com/google/flatbuffers/pull/7945
- Modified the commitHash and the hash for the new version
- Removed the patch for rust generator's unused variable warning as it
is no longer producing this - [Check it out
here](d121e09d89/src/idl_gen_rust.cpp)
- Updated the VerifyField calls with alignment values that were
introduced in the new version.
---------
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
### Description
1. Use stages to organize the pipeline and split building from testing.
2. Move compilation to a CPU machine.
3. The test stage can leverage existing artifacts.
4. Check the wheel size and emit a warning if it is above 300M.
5. The docker image name did not change even when the arguments changed, which
caused the docker image to always be rebuilt. Deriving the docker image
name from the arguments saves docker build time.
Pipeline duration reduced by 60% (2 hours -> 50 minutes)
Compilation time reduced by 75% (1.5 hours -> 20 minutes)
GPU time reduced by 87% (8 hours -> 1 hour)
For debugging, the GPU time could be reduced by more than 95%, because we
can choose to run only one test stage and skip building.
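The wheel size check in item 4 could look roughly like the sketch below. The function name is hypothetical, and reading "300M" as 300 MiB is an assumption; the check warns rather than fails, matching the description.

```python
import os
import warnings

# "300M" from the description, interpreted here as MiB (assumption).
MAX_WHEEL_BYTES = 300 * 1024 * 1024


def check_wheel_size(path, limit=MAX_WHEEL_BYTES):
    """Warn (without failing) when the file at `path` exceeds `limit` bytes."""
    size = os.path.getsize(path)
    if size > limit:
        warnings.warn(f"{path} is {size} bytes, above the {limit}-byte limit")
        return False
    return True
```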
### Motivation and Context
Make the pipeline efficient.
Optimized
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=424177&view=results
Current
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=422393&view=results
---------
### Description
Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download
TRT binaries in every build. Now all the other build jobs are already
doing this. This is the only one left.
Similar to #19909
### Motivation and Context
As a follow up of #19118
### Description
The `npm test` flags are difficult to memorize because they
differ from the `ort.env` flags. This change aligns those flags
with the ORT JS API, e.g. `--wasm-enable-proxy` became `--wasm.proxy`.
Old flags are marked as deprecated, except `-x` (kept as a shortcut for
`--wasm.numThreads`).
### Description
Check that the onnx node tests and model tests were actually executed.
### Motivation and Context
The onnx node test data and model data are mounted in one directory,
and onnxruntime_test_all searches that directory and loads the data.
If the directory does not exist, or if something changes in onnxruntime_test_all,
those tests may not be executed.
For example, all the onnx node test data is 32M. It is hard for us to notice
such a regression.
So I added a simple check to ensure those tests are executed.
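A guard of this kind can be sketched as follows. This is not the check added in the PR, which may inspect the test runner's output instead; it only verifies the precondition that the mounted data directory exists and holds a minimum number of model files. Both function names and the `*.onnx` pattern are assumptions.

```python
from pathlib import Path


def count_test_models(data_root):
    """Count ONNX model files under data_root (0 if the directory is missing)."""
    root = Path(data_root)
    if not root.is_dir():
        return 0
    return sum(1 for _ in root.rglob("*.onnx"))


def assert_tests_present(data_root, minimum):
    """Fail loudly instead of silently running zero tests."""
    found = count_test_models(data_root)
    if found < minimum:
        raise RuntimeError(
            f"expected at least {minimum} test models under {data_root}, found {found}"
        )
    return found
```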
---------
Co-authored-by: Yi Zhang <your@email.com>
### Description
### Motivation and Context
--extra-index-url is not allowed by injected Secure Supply Chain Step in
packaging pipelines.
```
> Starting Multifeed Python Security Analysis:
##[warning]tools/ci_build/github/azure-pipelines/bigmodels-ci-pipeline.yml - Found "extra-index-url". (https://aka.ms/cfs/pypi)
```
And those 2 packages can be installed from PyPI as well now.
Co-authored-by: Yi Zhang <your@email.com>
Fix some linker errors that come up when integrating the onnxruntime-training-c pod into another Xcode project. The problematic configuration is a minimal build with training APIs enabled.
- training_op_defs.o had some unresolved references to ONNX functions. It should not be included at all in a minimal build.
- tree_ensemble_helper.o also had unresolved references to ONNX ParseData. The containing function is unused in a minimal build.
Added a test to cover this configuration.
### Description
As a follow up of #19788, remove more remaining Windows ARM32 build
jobs.
### Motivation and Context
Our nuget packaging pipeline is failing because it could not find an
artifact for Win ARM32.
```
##[error]Artifact onnxruntime-training-win-arm was not found for build 421397.
```
Deprecation of Win ARM32 was announced by Windows team in January 2023.
We should follow it.
### Description
* Update name of existing dockerfiles and add support to test latest
TensorRT EA binary located in the image
* Add cuda 12.3/cuDNN 9/TensorRT 8.6 dockerfile
* Add detail to CI prompts and configs
Instruction to test latest TRT via BIN:
1. Select `BIN` in TensorRT Version
2. In Variables, update related tarCudaVersion, **clear**
tarCudnnVersion (not required in latest TRT tar binary) , and path to
binary.
### Description
* Add tag to distinguish if TRT `builtin` or `oss` parser is being used
* `oss` tag will be inserted with onnx-tensorrt commit id, to indicate
which version oss parser is
### Validate
DB entry before/after this PR
(during test, `builtin` or `oss_{commit_id}` tag was inserted in the
database entries):
### Motivation and Context
To distinguish perf results using builtin/oss parser in the database,
this parser tag is needed.
In future, results using different parsers will be listed in different
Perf Dashboard pages.
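The tag format described above (`builtin` or `oss_{commit_id}`) can be sketched as a small helper. The function name and its parameters are hypothetical; only the two tag shapes come from the description.

```python
def parser_tag(use_oss_parser, commit_id=None):
    """Return the parser tag stored with each perf result.

    Hypothetical helper: `builtin` for the built-in TRT parser,
    `oss_<commit id>` for the onnx-tensorrt OSS parser.
    """
    if not use_oss_parser:
        return "builtin"
    if not commit_id:
        raise ValueError("an onnx-tensorrt commit id is required for the oss parser")
    return f"oss_{commit_id}"
```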
### Description
Address warnings so all the ORT projects build with /W4 on Windows.
Mainly
- unused parameters
- variables shadowing other ones
### Motivation and Context
#19588 started on this.
### Description
Change the WebGPU CI pipeline to use a preinstalled Chrome. Hopefully this
improves stability. Currently, the Chrome obtained from puppeteer often fails
to start.
### Description
It is a "Bash" task that requires running bash on Windows. Most Windows
operating systems do not have Bash installed. Given this task is only
for debugging purposes, we can remove it for now.
### Motivation and Context
I am making this change because I am regenerating the VM image in a
different manner, and the new image does not contain bash. Once this PR
is in, I can switch the images.
### Description
* Publish the artifacts as late as possible
  * once published the artifacts are immutable, and any retry will fail if
    they exist
  * if any step fails after publishing the stage cannot be retried
* use powershell to cleanup
  * DeleteFiles is taking >30 mins and causing the stage to timeout
  * powershell took < 1s
### Motivation and Context
Make pipeline more robust
### Description
Use UseMultiToolTask and limit the number of cl.exe instances running.
MultiToolTask info:
https://devblogs.microsoft.com/cppblog/improved-parallelism-in-msbuild/
Info on why limiting CL_MPCount can help:
https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows
The current CIs have 4 cores (both physical and logical). Hardcoded the
GPU build in win-ci.yml to use CL_MPCount of 2 as that seems to work
fine. Can adjust if needed to base it on the actual number of cores or
to use build.py to build.
Caveat: I've run about 16 builds and haven't seen a slow build yet, but
as the root cause of the slow builds isn't really known this isn't
guaranteed to be a fix.
### Motivation and Context
Try and prevent super slow GPU builds by reducing number of tasks
potentially running in parallel.
### Description
The RN CI fails intermittently with the error "app seems to idle".
Enable the most verbose logging level (steps to dump
device.log from the detox folder/artifacts can be added if necessary) to at least get
more information.
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
### Description
Xcode UI tests seem to be flaky:
https://github.com/orgs/community/discussions/68807
Add a couple of retries if we get a "Timed out while loading
Accessibility." error which is transient.
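The retry logic can be sketched as below: retry only when the output contains the known transient message, and give up after a fixed number of attempts. This is an illustration rather than the CI script itself; `run_with_retries` and its `runner` parameter are hypothetical, and three attempts corresponds to "a couple of retries".

```python
import subprocess

# Transient error message quoted in the description above.
TRANSIENT = "Timed out while loading Accessibility."


def run_with_retries(cmd, max_attempts=3, runner=subprocess.run):
    """Run cmd, retrying only when it fails with the transient error."""
    attempt = 0
    while True:
        attempt += 1
        result = runner(cmd, capture_output=True, text=True)
        output = (result.stdout or "") + (result.stderr or "")
        transient = result.returncode != 0 and TRANSIENT in output
        # Stop on success, on any other failure, or when attempts run out.
        if not transient or attempt >= max_attempts:
            return result
```

Failures that do not contain the transient message are returned immediately, so genuine test failures are not masked by retries.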
### Description
Add Whisper Conversion and E2E into Big Models pipeline
---------
Co-authored-by: Your Name <your@email.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
### Description
1. Check GPU status in docker.
2. Use stages so that the test stage can leverage existing build
artifacts.
### Motivation and Context
To investigate the root cause of the random exception
`CUDA failure 100: no CUDA-capable device is detected`
### Description
build.py sets a few parallelization parameters when building. Using
msbuild directly lacks those.
7a5860e490/tools/ci_build/build.py (L1665-L1669)
Changed to use build.py. If there's a concern with that we _could_ set
the parameters in the yaml, but that will be uglier due to duplicating
logic in multiple places.
### Description
Updates the default QNN SDK version to 2.19.2.240210.
### Motivation and Context
Build and test the latest version of QNN SDK in our pipelines.
### Description
Some test thresholds that previously worked on the T4 GPU do not work
anymore. The reason is that the current pipeline uses A10, where TF32 is enabled by
default.
Disable TF32 in the Linux GPU CI pipeline during testing to avoid such random
test failures.
### Motivation and Context
Linux Test has random failure at tests:
ProviderOptionsTest > testCUDAOptions() FAILED
org.opentest4j.AssertionFailedError: array contents differ at index [446], expected: <0.0419757> but was: <0.041948937>
at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)
org.opentest4j.AssertionFailedError: array contents differ at index [6], expected: <0.0225981> but was: <0.022587791>
at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676)
at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)
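A quick arithmetic check, using only the expected/actual values from the two failures in this log, shows that the observed differences are of the size one would expect from TF32 rounding (10 explicit mantissa bits) rather than from FP32 (23 bits). This does not prove TF32 is the cause; it only shows the magnitudes are compatible with that explanation.

```python
# Values copied from the two assertion failures in the log.
pairs = [(0.0419757, 0.041948937), (0.0225981, 0.022587791)]

# Epsilon-style bounds: TF32 keeps 10 explicit mantissa bits, FP32 keeps 23.
tf32_eps = 2.0 ** -10
fp32_eps = 2.0 ** -23

rel_errors = [abs(expected - actual) / abs(expected) for expected, actual in pairs]

for rel in rel_errors:
    # Both errors land between the FP32 and TF32 bounds.
    print(f"relative error {rel:.1e}; FP32 eps {fp32_eps:.1e}; TF32 eps {tf32_eps:.1e}")
```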
### Description
ROCm CI pipeline issue.
```
Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.91 MiB, post-processed: Unknown size, total: 17.41 MiB) to /home/onnxruntimedev/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20...
main()
File "/stage/huggingface-transformers/examples/pytorch/language-modeling/run_mlm.py", line 242, in main
datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir)
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/load.py", line 856, in load_dataset
builder_instance.download_and_prepare(
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 583, in download_and_prepare
self._download_and_prepare(
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 639, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
File "/home/onnxruntimedev/.cache/huggingface/modules/datasets_modules/datasets/wikitext/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20/wikitext.py", line 138, in _split_generators
data_file = dl_manager.download_and_extract(self.config.data_url)
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 289, in download_and_extract
return self.extract(self.download(url_or_urls))
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 197, in download
downloaded_path_or_paths = map_nested(
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 195, in map_nested
return function(data_struct)
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 220, in _download
return cached_path(url_or_filename, download_config=download_config)
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 281, in cached_path
output_path = get_from_cache(
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 634, in get_from_cache
raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip
```
### Motivation and Context
Update the `datasets` pipeline to latest version `2.17.0`.
### Description
This PR upgrades ORTModule's default opset from 15 to 17. Opset 17 is
the final opset supported by torchscript exporter
(https://github.com/pytorch/pytorch/pull/107829)
### Motivation and Context
Engineering excellence contribution for ORT Training DRI.
---------
Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
Adds a job to the python packaging pipeline that builds x64 python
wheels for QNN EP.
### Motivation and Context
Necessary to create a cached QNN model on Windows x64, which is done by
creating a properly configured onnxruntime session with QNN EP.
### Description
This pull request includes a small change to the
`Dockerfile.manylinux2_28_cuda` file in the
`tools/ci_build/github/linux/docker` directory. The change corrects the
`PREPEND_PATH` argument from `/usr/local/cuda/binet` to
`/usr/local/cuda/bin`, ensuring the correct path to CUDA binaries is
set.
This pull request includes modifications to the `c-api-cpu.yml` Azure
Pipelines configuration file. The changes mainly revolve around the
Node.js packaging stage and the handling of Node.js artifacts. The most
significant changes include renaming the Node.js packaging stage, adding
a new dependency to the stage, changing artifact names, adding a new
script to list Node.js artifacts, and updating the source folder for
copying NuGet binaries.
Changes in Node.js packaging:
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L503-R508):
Renamed the Node.js packaging stage from `Nodejs_Packaging_CPU` to
`Nodejs_Packaging` and added `Windows_CI_GPU_DML_Dev` as a new
dependency to the stage.
Changes in handling of Node.js artifacts:
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L568-R569):
Changed the artifact name from `drop-onnxruntime-nodejs-win-x64` to
`drop-onnxruntime-nodejs-win-x64-dml` in the task to download pipeline
artifacts for Windows x64.
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59R595-R598):
Added a new script to list Node.js artifacts from the directory
`$(Build.BinariesDirectory)/nodejs-artifacts/win32/x64/`.
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L635-R640):
Updated the source folder from
`$(Build.BinariesDirectory)\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib`
to `$(Build.BinariesDirectory)\nodejs-artifacts\win32\x64` in the task
to copy NuGet binaries to the directory
`$(Build.SourcesDirectory)\js\node\bin\napi-v3\win32\x64`.
---------
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
### Description
1. Make parity_check use a local model to avoid needing an HF token.
2. Deleting the model did not work because the code tried to `del` an
object defined outside the function's scope, which caused an
out-of-memory error on A10.
3. In fact, 16G of GPU memory (one T4) is enough. However, the conversion
process is always killed on T4, while it works on A10/24G.
Standard_NC4as_T4_v3 has 28G of CPU memory.
Standard_NV36ads_A10_v5 has 440G of memory.
It appears that the model conversion needs a very large amount of memory.
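The scoping problem in item 2 can be illustrated with a minimal sketch. `Model`, `release_inside`, and `demo` are hypothetical names, not code from this repository. The point is only that `del` on a function-local name removes that name and leaves the caller's reference, and therefore the memory, alive. The sketch relies on CPython's reference counting.

```python
import gc
import weakref


class Model:
    """Hypothetical stand-in for a large model object."""


def release_inside(model):
    # Removes only the local name; the caller still holds a reference.
    del model
    gc.collect()


def demo():
    model = Model()
    ref = weakref.ref(model)

    release_inside(model)
    alive_after_callee_del = ref() is not None

    # Dropping the reference in the owning scope actually frees the object.
    del model
    gc.collect()
    alive_after_owner_del = ref() is not None

    return alive_after_callee_del, alive_after_owner_del


print(demo())
```

Releasing memory therefore has to happen in the scope that owns the last reference.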
### Motivation and Context
Last time, I came across some issues in convert_to_onnx.py, so I used the
ONNX model in https://github.com/microsoft/Llama-2-Onnx for testing.
Now these issues have been fixed, so I use the ONNX model generated by this
repo, and the CI can cover the model conversion.
Pin the pytest version to 7.4.4; a higher version causes the error
`from onnxruntime.capi import onnxruntime_validation
ModuleNotFoundError: No module named 'onnxruntime.capi'`
### Description
<!-- Describe your changes. -->
Setup usage of coremltools via dependencies instead of copying files.
Pull in some changes from
https://github.com/microsoft/onnxruntime/pull/19347 in preparation for
supporting ML Program and enabling building the ML Model on all
platforms to make development and testing of CoreML EP code easier.
- Update to coremltools 7.1
- Add patch for changes required for cross platform build of ML Program
related code
- Generate coreml proto files on all platforms
- mainly to test these changes work everywhere, as the proto files will
be used on all platforms when #19347 is checked in
- rename onnxruntime_coreml_proto target to coreml_proto as it contains
purely coreml protobuf code with no ORT related changes
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve setup.
### Description
1. Save the model to the pipeline cache
2. Lower the similarity bar to 97
3. Publish the generated image so that we can check it when the test fails
### Motivation and Context
Reduce model downloads
### Description
<!-- Describe your changes. -->
Updates to only include ios archs framework in artifacts included in
Nuget Package.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Related issue:
https://github.com/microsoft/onnxruntime/issues/19295#issuecomment-1914143256
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
In PR #19073 I misunderstood the value of "--parallel". Instead of
testing whether args.parallel is None, I should test the value returned
by the number_of_parallel_jobs function.
If build.py was invoked without --parallel, then args.parallel equals
1, because that is the default value. In that case we should not add
"/MP". However, the current code adds it, because `if args.parallel` is
evaluated as `if 1`, which is True.
If build.py was invoked with --parallel but without a number, then
args.parallel equals 0, because the job count is unspecified. In that
case we should add "/MP". However, the current code does not add it,
because `if args.parallel` is evaluated as `if 0`, which is False.
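The truthiness problem described above can be reproduced with a small sketch. It assumes `--parallel` is declared with `nargs="?"`, a constant of 0, and a default of 1, as the description implies. `number_of_parallel_jobs` here is a simplified stand-in for the real function, and the `> 1` condition in `add_mp_fixed` is an assumption about what the corrected check looks like.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # Assumed declaration: omitted -> 1, bare "--parallel" -> 0, "--parallel N" -> N.
    parser.add_argument("--parallel", nargs="?", const=0, default=1, type=int)
    return parser


def number_of_parallel_jobs(args):
    # Simplified stand-in: 0 means "use every available CPU".
    if args.parallel == 0:
        return os.cpu_count() or 1
    return args.parallel


def add_mp_buggy(args):
    # Tests the raw flag value: 1 is truthy, 0 is falsy, which is backwards.
    return bool(args.parallel)


def add_mp_fixed(args):
    # Tests the resolved job count instead of the raw flag value.
    return number_of_parallel_jobs(args) > 1


parser = build_parser()
for argv in ([], ["--parallel"], ["--parallel", "4"]):
    args = parser.parse_args(argv)
    print(argv, "buggy:", add_mp_buggy(args), "fixed:", add_mp_fixed(args))
```

On a multi-core machine the two checks disagree for both the empty command line and the bare `--parallel` case.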
This also adds a new build flag: use_binskim_compliant_compile_flags, which is intended to be only used in ONNX Runtime team's build pipelines for compliance reasons.
### Motivation and Context
### Description
1. Add visual parity test based on openai clip model
2. Add trigger rules
### Motivation and Context
1. check generated image is expected
2. reduce unnecessary triggers
### Description
Fix two issues:
(1) We can only use single quotes inside `bash -c "..."`. The current
pipeline job stopped at `python3 demo_txt2img.py astronaut` and skipped the
following commands. In this change, we remove the remaining commands to
get the same effect (otherwise, the pipeline runtime might be 2 hours
instead of 15 minutes).
(2) Fix a typo of Stable.
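The quoting issue in (1) can be sketched with Python's `shlex`, which follows POSIX-style quoting rules closely enough for this illustration. The command lines, the `demo.py` script name, and the prompt text are placeholders, not the pipeline's actual script. A nested double quote ends the `-c` string early, so everything after it never reaches bash as part of the command.

```python
import shlex

# Hypothetical command lines; "demo.py" and the prompt are placeholders.
nested_double = 'bash -c "echo start; python3 demo.py "astronaut riding a horse"; echo done"'
nested_single = "bash -c \"echo start; python3 demo.py 'astronaut riding a horse'; echo done\""


def command_string(line):
    """Return the string that bash would receive as the -c argument."""
    argv = shlex.split(line)
    return argv[argv.index("-c") + 1]


# With nested double quotes the command string is cut off after "astronaut".
print(command_string(nested_double))
# With single quotes the whole command sequence stays in one string.
print(command_string(nested_single))
```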
### Description
Update abseil to a release tag and register neural_speed to CG.
### Motivation and Context
Currently we are using a non-released version of abseil. Using a tag is better.
### Description
1. Update Linux GPU machine from T4 to A10, sm=8.6
2. update the tolerance
### Motivation and Context
1. Free more T4 and test with higher compute capability.
2. ORT enables TF32 in GEMM for A10/A100. TF32 causes precision loss
and makes this test fail:
```
2024-01-19T13:27:18.8302842Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-19T13:27:25.8438153Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:25.8438641Z Expected equality of these values:
2024-01-19T13:27:25.8438841Z COMPARE_RESULT::SUCCESS
2024-01-19T13:27:25.8439276Z Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:25.8439464Z ret.first
2024-01-19T13:27:25.8445514Z Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:25.8445962Z expected 0.145984 (3e157cc1), got 0.975133 (3f79a24b), diff: 0.829149, tol=0.0114598 idx=375. 20 of 388 differ
2024-01-19T13:27:25.8446198Z
2024-01-19T13:27:25.8555736Z [ FAILED ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12, where GetParam() = "cuda_../models/zoo/opset12/SSD/ssd-12.onnx" (7025 ms)
2024-01-19T13:27:25.8556077Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_YOLOv312_yolov312
2024-01-19T13:27:29.3174318Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:29.3175144Z Expected equality of these values:
2024-01-19T13:27:29.3175389Z COMPARE_RESULT::SUCCESS
2024-01-19T13:27:29.3175812Z Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:29.3176080Z ret.first
2024-01-19T13:27:29.3176322Z Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:29.3178431Z expected 4.34958 (408b2fb8), got 4.51324 (40906c80), diff: 0.16367, tol=0.0534958 idx=9929. 22 of 42588 differ
```
3. Some other tests, such as SSD, throw a different exception, so they are skipped:
```
2024-01-22T09:07:40.8446910Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-22T09:07:51.5587571Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:358: Failure
2024-01-22T09:07:51.5588512Z Expected equality of these values:
2024-01-22T09:07:51.5588870Z COMPARE_RESULT::SUCCESS
2024-01-22T09:07:51.5589467Z Which is: 4-byte object <00-00 00-00>
2024-01-22T09:07:51.5589953Z ret.first
2024-01-22T09:07:51.5590462Z Which is: 4-byte object <01-00 00-00>
2024-01-22T09:07:51.5590841Z expected 1, got 63
```
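The `tol` values printed in the TF32 failure log under item 2 are consistent with a combined tolerance of the form `atol + rtol * |expected|` with both terms set to 0.01. This is inferred from the printed numbers only, not taken from the test source, and `check` is a hypothetical helper name.

```python
def check(expected, actual, atol=1e-2, rtol=1e-2):
    """Compare two values using an absolute-plus-relative tolerance."""
    tol = atol + rtol * abs(expected)
    diff = abs(expected - actual)
    return diff <= tol, diff, tol


# (expected, actual) pairs copied from the SSD and YOLOv3 failures in the log.
for expected, actual in [(0.145984, 0.975133), (4.34958, 4.51324)]:
    ok, diff, tol = check(expected, actual)
    print(f"expected={expected} actual={actual} diff={diff:.6f} tol={tol:.7f} pass={ok}")
```

Both pairs fall outside the computed tolerance, which matches the reported failures.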
### Description
Adds a job to create a nightly python package for ORT/QNN on Windows
ARM64.
Must build onnxruntime-qnn with python 3.11 and numpy 1.25.
**Note: pipeline run may take up to 3 hrs**
### Motivation and Context
Make it possible to get a nightly python package with the latest updates
to QNN EP.
Issue #19161
### Description
<!-- Describe your changes. -->
### Motivation and Context
Linux_GPU_x64 job in the pipeline has been canceled due to timeout since
0112.
### Description
This way, we will not need to update the windows images constantly and
allow more flexibility to choose the cuda version in the future.
### Description
Disable ccache for all the jobs in the Windows CPU CI pipeline.
Before disabling it, the build has a warning that:
"MSIL .netmodule or module compiled with /GL found; restarting link with
/LTCG; add /LTCG to the link command line to improve linker performance"
After disabling it, the warning is gone and the build doesn't use /GL or
/LTCG.
Cache itself should not cause this difference.
### Motivation and Context
### Description
1. Add two build jobs for enabling Address Sanitizer in CI. One for
Windows CPU, One for Linux CPU.
2. Set default compiler flags/linker flags in build.py for normal
Windows/Linux/MacOS build. This can help control compiler flags in a
more centralized way.
3. All Windows binaries in our official packages will be built with
"/PROFILE" flag. Symbols of onnxruntime.dll can be found at [Microsoft
public symbol
server](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols).
Limitations:
1. On Linux, Address Sanitizer ignores RPATH settings in ELF binaries.
Therefore, once Address Sanitizer is enabled, we need to manually set
LD_LIBRARY_PATH before running tests; otherwise libonnxruntime.so may
not be able to find custom ops and shared EPs.
2. On Linux we also need to set LD_PRELOAD before running some tests (if
the main executable, like python, is not built with Address Sanitizer).
On Windows we do not need to.
3. On Windows, before running python tests we should manually copy the
Address Sanitizer DLL to the onnxruntime/capi directory, because python
3.8 and above has enabled "Safe DLL Search Mode", which does not use the
information provided by the PATH env.
4. On Linux, Address Sanitizer found a lot of memory leaks in our
python binding code. Therefore, right now we cannot enable Address
Sanitizer when building ONNX Runtime with python binding.
5. Address Sanitizer itself uses a lot of memory address space and
delays memory deallocations, which can easily cause OOM issues in 32-bit
applications. For this reason we cannot run all the tests in
onnxruntime_test_all in 32-bit mode with Address Sanitizer. However, we
can still run individual tests that way. We just cannot run all of them
in one process.
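The Linux environment setup described in the first two limitations could look roughly like the sketch below. `asan_test_env` is a hypothetical helper, and the build directory and sanitizer runtime path are placeholders that depend on the toolchain; this is not how the CI scripts are actually written.

```python
import os


def asan_test_env(build_dir, asan_runtime, base_env=None):
    """Return a copy of the environment prepared for running tests under ASan.

    build_dir:    directory containing libonnxruntime.so and shared EPs (placeholder).
    asan_runtime: path to the sanitizer runtime library (placeholder).
    """
    env = dict(os.environ if base_env is None else base_env)

    # RPATH is ignored under ASan, so library lookup must go through LD_LIBRARY_PATH.
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = build_dir + (os.pathsep + existing if existing else "")

    # Needed when the main executable (e.g. python) is not built with ASan.
    env["LD_PRELOAD"] = asan_runtime
    return env


env = asan_test_env("/path/to/build", "/path/to/libasan.so", base_env={"LD_LIBRARY_PATH": "/opt/lib"})
print(env["LD_LIBRARY_PATH"], env["LD_PRELOAD"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a test binary.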
### Motivation and Context
To catch memory issues.
### Description
Set pythonInterpreter in set-python-manylinux-variables-step.yml to fix
a build error:
```
Starting: Set Python manylinux variables
==============================================================================
Task : Python script
Description : Run a Python file or inline script
Version : 0.231.1
Author : Microsoft Corporation
Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/python-script
==============================================================================
##[error]Parameter 'toolPath' cannot be null or empty.
Finishing: Set Python manylinux variables
```
The error was because today I deleted a bunch of software from the VM
image. The task might fail if no Python versions are found in
$(Agent.ToolsDirectory).
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
### Description
Adding python3.12 support to ORT
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
1. Remove Windows ARM32 from nuget packaging pipelines
2. Add missing component-governance-component-detection-steps.yml to
some build jobs.
### Motivation and Context
Stop supporting Windows ARM32 to align with [Windows's support
policy](https://learn.microsoft.com/en-us/windows/arm/arm32-to-arm64).
Users who need this feature still can build the DLLs from source.
However, later on we will remove that support too.
### Description
- Removes `--disable_ml_ops` build flag
- Automatically detects ORT version from VERSION file via
`templates/set-version-number-variables-step.yml`. We will no longer
need to create a commit to update ORT versions.
### Motivation and Context
- A new unit test caused failures in the QNN Nuget pipeline because it
did not enable ml ops.
- Automate ORT version specification
### Description
Change all macOS python packages to use universal2, to reduce the number
of packages we have.
### Motivation and Context
According to [wikipedia](https://en.wikipedia.org/wiki/MacOS_Big_Sur),
macOS 11 is the first macOS version that supports universal 2. And it is
the min macOS version we support. So we no longer need to maintain
separate binaries for different CPU archs.
### Description
- Add mutex to protect QNN API calls for executing a graph and
extracting the corresponding profile data.
- Ensures QNN EP's execute function does not store unnecessary state
(i.e., input and output buffer pointers do not need to be stored as
class members.)
### Motivation and Context
Allow calling `session.Run()` from multiple threads when using QNN EP.
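The locking pattern described above can be sketched in a language-neutral way as follows. `FakeBackend` and `Session` are hypothetical stand-ins, not the QNN EP classes: the backend call is serialized by a mutex, while input and output buffers stay local to each call instead of being stored as members.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeBackend:
    """Stand-in for a backend API that is not safe to call concurrently."""

    def __init__(self):
        self.active = 0
        self.max_active = 0

    def execute(self, inputs):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        result = [x * 2 for x in inputs]
        self.active -= 1
        return result


class Session:
    def __init__(self, backend):
        self._backend = backend
        self._lock = threading.Lock()

    def run(self, inputs):
        # Inputs and outputs are local to this call; only the backend call is serialized.
        with self._lock:
            return self._backend.execute(inputs)


session = Session(FakeBackend())
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(session.run, [[i, i + 1] for i in range(8)]))
print(results[:2], session._backend.max_active)
```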
### Description
1. Update download-artifacts to flex-downloadartifacts to make it easier
to debug.
2. Move the native files into Gpu.Windows and Gpu-linux packages.
Onnxruntime-Gpu has a dependency on them.
3. Update the package validation as well.
4. Add 2 stages to run E2E tests for GPU.Windows and GPU.Linux.

### Motivation and Context
The size of the single Onnxruntime.Gpu package has already exceeded the
Nuget size limit.
We split the package into smaller packages so that they can be
published.
For compatibility, the user can install or upgrade Onnxruntime.Gpu,
which will install Gpu.Windows and Gpu.Linux automatically.
The user can also install only Gpu.Windows or Gpu.Linux directly.
### Test Link
1. In ORT_NIGHTLY
2. Install the preview version in nuget-int. (nuget source:
https://apiint.nugettest.org/v3/index.json)
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Move QNN EP provider options to session options
### Description
We need to use session options to support multi-partition for the context cache feature. To smooth the transition, move the provider options to session options first.
This is the first step for PR https://github.com/microsoft/onnxruntime/pull/18865
### Description
<!-- Describe your changes. -->
Add LeakyRelu to the list as support was added a while ago.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
1. Add a CodeSign validation task before the binaries are published, to
make sure all DLL files are signed.
2. Auto-trigger the CUDA 12 pipeline's publishing job.
### Description
Fixes a failure in the ortmodule nightly pipeline.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Change Nuget packaging pipeline's build TRT job to download CUDA SDK
on-the-fly, so that we do not need to put a CUDA SDK in the build
machine's image.
### Description
Update absl and googletest to their latest version to include some cmake
changes:
1. A googletest's cmake change that will allow using external absl and
re2.
2. Nullability enhancements that will allow our clang-based static
analysis to detect many kinds of null pointer errors.
### Motivation and Context
To fix a C4744 link warning in our Windows pipelines.
```
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\usage.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<int>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
```
### Description
The ONNX model zoo changed its directory structure, so some of our
pipelines are failing. To prevent this from happening again, we should
read the test data from a cache on local disk instead of downloading it
remotely every time.
### Description
<!-- Describe your changes. -->
Add macos build for objc pod.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Follow up pr for #18550
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
### Description
The warning is:
```
C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.1812949Z with
2023-12-08T20:58:48.2144272Z [
2023-12-08T20:58:48.2145285Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.2801935Z ]
2023-12-08T20:58:48.2804047Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(82,8): message : while compiling class template member function 'void onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()(const onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const onnxruntime::SparseTensor &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2806197Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(302,27): message : see the first reference to 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()' in 'onnxruntime::utils::mltype_dispatcher_internal::CallableDispatchableHelper::Invoke' (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2871783Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(438,100): message : see reference to class template instantiation 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>' being compiled (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2893010Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(414,5): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::InvokeWithLeadingTemplateArgs<Fn,onnxruntime::TypeList<>,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2894476Z with
2023-12-08T20:58:48.2911521Z [
2023-12-08T20:58:48.2912457Z Fn=onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,
2023-12-08T20:58:48.3067840Z T=onnxruntime::SparseTensor
2023-12-08T20:58:48.3068863Z ] (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc)
2023-12-08T20:58:48.3195854Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,11): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke<onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3197946Z with
2023-12-08T20:58:48.3198565Z [
2023-12-08T20:58:48.3199093Z T=onnxruntime::SparseTensor
2023-12-08T20:58:48.3905678Z ]
2023-12-08T20:58:48.3907275Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,36): message : see the first reference to 'onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke' in 'onnxruntime::contrib::SparseToDenseMatMul::Compute' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3910999Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.3912734Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3913414Z with
2023-12-08T20:58:48.3913660Z [
2023-12-08T20:58:48.3914001Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.3914499Z ]
2023-12-08T20:58:48.3914743Z qlinear_concat.cc
2023-12-08T20:58:48.3917082Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.3918624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5534583Z with
2023-12-08T20:58:48.5541266Z [
2023-12-08T20:58:48.5542401Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5544914Z ]
2023-12-08T20:58:48.5548670Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5552099Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5553712Z with
2023-12-08T20:58:48.5555569Z [
2023-12-08T20:58:48.5556779Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5558707Z ]
2023-12-08T20:58:48.5561428Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5565624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5566354Z with
2023-12-08T20:58:48.5568185Z [
2023-12-08T20:58:48.5569305Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5571339Z ]
2023-12-08T20:58:48.5574864Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5577866Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5578562Z with
2023-12-08T20:58:48.5580399Z [
2023-12-08T20:58:48.5581503Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5583465Z ]
2023-12-08T20:58:48.5587661Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5590705Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5591396Z with
2023-12-08T20:58:48.5593220Z [
2023-12-08T20:58:48.5593693Z Derived=Eigen::Map<const Eigen::SparseMatrix<int64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5595955Z ]
```
And the warning in #18195
### Motivation and Context
AB#22894
---------
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
### Description
Move the NuGet nightly package publishing job to a separate pipeline.
Before this change, it ran at the end of the 'Zip-Nuget-Java-Nodejs
Packaging Pipeline'. This PR moves it to a separate pipeline so that we
can manually trigger this step for any branch (e.g. release branches).
### Description
Updating transformers package in test pipeline to fix a security
vulnerability.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Update absl and gtest to fix an ARM64EC build error
### Motivation and Context
We need to get an important fix into ORT.
The fix is:
8028a87c96
### Description
reuse EO pool in NPM pipeline.
### Motivation and Context
build_web_debug failed in onnxruntime-Win-CPU-2022 but it works in EO
pool.
Reuse EO pool to make the pipeline work now.
When I'm free, I'll try upgrading the chrome in the custom image.
### Description
<!-- Describe your changes. -->
As title.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
yolo-v8 model missing operator support.
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
- Update QNN CI Pipelines to use QNN SDK version 2.17.0
- **Print warning if unit test requires adjusted tolerance to pass**
- **Temporarily disable unloading QnnCpu.dll for windows x64 due to
crash when calling FreeLibrary**
- Enable fixed HTP tests
- QnnHTPBackendTests.LayerNorm1D_LastAxis_DynamicScale
- QnnHTPBackendTests.GlobalMaxPool_LargeInput2_u8
- QnnHTPBackendTests.ReduceSumS8Opset13_Rank5
- QnnHTPBackendTests.ReduceSumU8Opset13_Rank5_LastAxis
- QnnHTPBackendTests.WhereLargeDataBroadcastU8
- QnnHTPBackendTests.WhereLargeDataBroadcastTransformedU8
- Enabled fixed CPU tests
- QnnCPUBackendTests.Resize_DownSample_Linear_AlignCorners_scales
- Increased tolerance for HTP tests that are less accurate on QNN SDK
2.17.0
- QnnHTPBackendTests.AveragePool_CountIncludePad_HTP_u8
- QnnHTPBackendTests.AveragePool_AutopadSameUpper_HTP_u8
- QnnHTPBackendTests.AveragePool_AutopadSameLower_HTP_u8
- QnnHTPBackendTests.ConvU8U8S32_bias_dynamic_input
- QnnHTPBackendTests.ConvU8U8S32_bias_initializer
- QnnHTPBackendTests.ConvU8U8S32_large_input1_padding_bias_initializer
- QnnHTPBackendTests.LRNSize3
- QnnHTPBackendTests.LRNSize5
- QnnHTPBackendTests.MaxPool_Large_Input_HTP_u8
- QnnHTPBackendTests.MaxPool_LargeInput_1Pads
- QnnHTPBackendTests.Resize_DownSample_Linear_HalfPixel
- QnnHTPBackendTests.ResizeU8_2xLinearPytorchHalfPixel
- QnnHTPBackendTests.ResizeU8_2xLinearHalfPixel
- QnnHTPBackendTests.ResizeU8_2xLinearAlignCorners
- QnnHTPBackendTests.ResizeU8_2xLinearAsymmetric
- Disabled ONNX model tests
- averagepool_2d_ceil: Accuracy issues **only on Windows x64
QnnCpu.dll**
- Disabled QDQ model tests (onnx_test_runner)
- facedetection_op8_qdq: Accuracy issues
- Disabled CPU EP tests (these use QnnCpu.dll)
- ActivationOpTest.Relu: QNN SDK 2.17 Relu treats inf as FLT_MAX
- GemmOpTypedTests/0.TestGemmBroadcast: Inaccuracy when weight is
initializer and bias is not
- MathOpTest.MatMulFloatType "test padding and broadcast B > A":
Inaccuracy (**only linux**)
- Fix Gemm translation bugs in QNN EP:
- Do not skip processing of inputs that need to be transposed.
### Motivation and Context
- Allow testing with newest QNN SDK version
- Take advantage of improvements to enable new models.
### Description
Make the code more consistent. Currently some TRT pipelines download TRT
binaries on-the-fly, while other TRT pipelines use a preinstalled
version. This PR makes them the same.
### Description
Remove the development ID and force code signing to be not required in the test macOS
target.
### Motivation and Context
Fix the failure in the iOS_Full_xcframework stage of the
Zip-Nuget-Java-NodeJS packaging pipeline.
---------
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
### Description
1. Add a new stage to download Java tools from https://oss.sonatype.org
and publish them to a pipeline artifact.
2. Remove the downloads in other jobs; they get the Java tools from the pipeline
artifact.
3. Consolidate the final_java_testing stages.
### Motivation and Context
Reduce downloads to avoid connection errors like the one below.
```
--2023-11-28 07:16:31-- https://oss.sonatype.org/service/local/repositories/releases/content/org/junit/platform/junit-platform-console-standalone/1.6.2/junit-platform-console-standalone-1.6.2.jar
Resolving oss.sonatype.org (oss.sonatype.org)... 3.227.40.198, 3.229.50.23
Connecting to oss.sonatype.org (oss.sonatype.org)|3.227.40.198|:443... connected.
HTTP request sent, awaiting response... 502 Bad Gateway
2023-11-28 07:16:32 ERROR 502: Bad Gateway.
```
### Description
Currently, the `drop-nuget` artifact only contains protoc.exe which is
also part of the `drop-extra` artifact.
### Description
As title.
1. Add macOS build as an optionally enabled arch for the pod, with changes to the
existing build_ios_framework/assemble_c_pod scripts.
2. Enable the macOS build arch in the iOS packaging pipeline (currently for
variants other than Mobile) and check that the output artifacts are correct.
3. Write a macOS test target scheme in the test app and integrate it into the iOS
packaging CI testing pipeline.
Currently the changes only apply to the onnxruntime-c pod, as the original
request was from ORT SPM, which consumes only the onnxruntime-c pod as
the binary target. TODO: could look into adding the macOS platform to the objc
pod as well.
### Motivation and Context
Enable macOS platform support in CocoaPods, and also potentially produce a
binary target for enabling the macOS platform in SPM as well.
Replace https://github.com/microsoft/onnxruntime/pull/18334
---------
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
Update Azure-Pipelines-EO-Windows2022-aiinfra to
onnxruntime-win-CPU-2022 in Nuget_Package_CPU.
To make debugging easier, use flex-downloadPipelineArtifact.
### Motivation and Context
Azure-Pipelines-EO-Windows2022-aiinfra is using the 1ES window-latest image.
The pipeline might fail because of an unexpected upgrade.
Verified:
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=384425&view=results
### P.S.
I think we should replace all Azure-Pipelines-EO-Windows2022-aiinfra.
### Description
As title.
### Motivation and Context
Added for yolov8 model missing operator support.
https://github.com/microsoft/onnxruntime/issues/17654
Now the model support info looks like:
_CoreMLExecutionProvider::GetCapability, number of partitions supported
by CoreML: 3 number of nodes in the graph: 233 number of nodes supported
by CoreML: 230_
(only 3 Concat ops are unsupported, because 3D input shapes are not currently
supported in the CoreML EP Concat).
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
Build ORT-training packaging pipeline for CUDA 12.2
### Motivation and Context
This will help any customer using CUDA 12, who would then not need to build
ORT-training from source.
Test run:
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=382993&view=logs&s=130be951-c2f3-5601-5709-434b5e50ddb0
### Motivation and Context
A recent PyTorch change broke the DORT CI, and [a
patch](https://github.com/pytorch/pytorch/pull/113697) has been merged
into PyTorch main. To update DORT's CI, we made a dummy change in
this PR.
### Description
It causes our "NPM Packaging Pipeline" to fail.
### Description
Always run emsdk_env.sh before build.py, even when ccache is disabled.
This is a follow-up to #18434. That PR didn't handle the case when
ccache was disabled.
This also sets the Path variable for the downloaded libraries.
### Description
1. Introduce MoE CUDA op to ORT based on FT implementation.
2. Upgrade cutlass to 3.1.0 to avoid some build failures on Windows.
Remove patch file for cutlass 3.0.0.
3. The sharded MoE implementation will come in another PR.

Limitation: `__CUDA_ARCH__ >= 700`
### Description
Add CI steps to log info for investigating test failures.
Currently Web CI is marked as 'optional'. This change adds a script
to dump debug info for investigating the random test failures.
### Description
Make all build_wasm tasks (NPM packaging and post merge) run on Linux.
Enable the WebGPU test in the NPM packaging pipeline too.
### Motivation and Context
Even on Windows, build_wasm is running in cygwin.
So, it could save a lot of time to run it on Linux.
### Description
Set DML package name correctly so the build doesn't try and include mobile targets.
### Motivation and Context
Fix packaging pipeline.
### Description
Fix bad delegates.
Add a script to detect mismatches, and run it in CI and when creating the NuGet
package.
Ignore whitespace when looking at the diff to the .cs file, as
clang-format ran.
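
The PR does not say how whitespace is ignored, so the following is only a minimal Python sketch of one way to compare two versions of a file while disregarding formatter-only changes. The function names and the delegate snippets are hypothetical and are not taken from the repository. Note that this approach also ignores whitespace inside string literals.

```python
def strip_whitespace(text: str) -> str:
    """Remove every whitespace character, including line breaks."""
    return "".join(text.split())


def differs_ignoring_whitespace(old: str, new: str) -> bool:
    """Return True only if the texts differ in something other than whitespace."""
    return strip_whitespace(old) != strip_whitespace(new)


# Hypothetical snippets: `after` is a re-wrapped, re-indented copy of `before`,
# while `changed` alters a parameter type.
before = "public delegate IntPtr DCreate(IntPtr env, out IntPtr result);"
after = "public delegate IntPtr DCreate(\n    IntPtr env,\n    out IntPtr result);"
changed = "public delegate IntPtr DCreate(IntPtr env, out long result);"

print(differs_ignoring_whitespace(before, after))    # False
print(differs_ignoring_whitespace(before, changed))  # True
```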
### Motivation and Context
#18363
### Description
Only one of "--cuda_version" and "--cuda_home" is needed. If both are
specified, the first one takes precedence. Since we now download
CUDA SDKs on-the-fly, the machines do not need a
preinstalled CUDA SDK and therefore will not have the VS-CUDA integration
extension, so the "--cuda_version" flag will not work. This PR
deletes such usages.
Related PR: #15915
### Description
This PR fixes the TypeScript type check.
Previously, when I used esbuild to replace webpack (#17745), the TypeScript
type check was disabled. This let a few TypeScript type errors get checked
into the code base. This PR fixes the following:
- Use "Node16" as default "module" value in tsconfig.json, because in
TypeScript v5, `(module == "ES2015" && moduleResolution == "Node16")` is
an invalid combination.
- Set `noUnusedParameters` to true by default. The web project overrides it to
false because a lot of code needs to be updated (a follow-up PR will
do this)
- set correct project file for 'web/lib/**/*.ts' for ESLint (otherwise
WebGPU types are not populated correctly)
- fix type error in file js/web/lib/wasm/jsep/webgpu/program-manager.ts
- upgrade "@webgpu/types" to latest to fix type error in file
js/web/lib/wasm/jsep/backend-webgpu.ts
- add package script "prebuild" for web to run tsc type check
- add type check in CI yml file
### Description
1. Add a build validation for Linux ARM64/ARM32 cross-compile to catch
issues listed in #18195 .
2. Revert eigen's commit id back to what we had before.
### Motivation and Context
To catch cross-compile issues.
Added a TODO item for fixing the compile warnings in Linux ARM32 build: AB#21639
### Description
Add the pool definition in 2 stages even though the pool is a Microsoft-hosted
pool.
### Motivation and Context
Recently, in the NuGet pipeline, clicking "Stages to Run" always pops up:
```
Encountered error(s) while parsing pipeline YAML:
Could not find a pool with ID 5206. The pool does not exist or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz.
Could not find a pool with ID 5206. The pool does not exist or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz.
```
1. Now we use a released version of ONNX, so we can directly download a
prebuilt package from pypi.org. We do not need to build one from source.
2. Update protobuf python package's version to match the C/C++ version
we are using.
3. Update the tensorboard python package because the current one is
incompatible with the newer protobuf version.
### Description
Add CI changes for #18287
Install onnx explicitly to pass windows GPU+dml stage.
### Motivation and Context
'eigen-3.4' was referring to a branch, not to a tag. There is now an
Eigen 3.4.1 on that branch, and thus the hash has changed.
See
https://github.com/microsoft/onnxruntime/issues/18286#issuecomment-1793683416
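
To illustrate why a branch reference is fragile, here is a minimal Python sketch, assuming a dependency is pinned by the SHA-1 digest of its downloaded archive. The function name and the byte strings are hypothetical stand-ins for real archives. Once a new commit lands on the branch, the archive contents change and the pinned digest no longer matches.

```python
import hashlib


def archive_matches_pin(archive_bytes: bytes, pinned_sha1: str) -> bool:
    """Return True if the archive's SHA-1 digest equals the pinned value."""
    actual = hashlib.sha1(archive_bytes).hexdigest()
    return actual == pinned_sha1.strip().lower()


# Hypothetical contents standing in for two snapshots of the same branch.
old_snapshot = b"eigen sources as of the pinned commit"
new_snapshot = b"eigen sources after a new commit landed on the branch"
pinned = hashlib.sha1(old_snapshot).hexdigest()

print(archive_matches_pin(old_snapshot, pinned))  # True
print(archive_matches_pin(new_snapshot, pinned))  # False
```

Pinning to a tag or a full commit ID avoids this, because neither moves when the branch advances.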
### Description
Update the C# nuget build infrastructure to make building a test nuget
package more user friendly and to simplify
- Remove usage of dotnet and msbuild in CIs
- was temporary requirement until .net 6 MAUI was added to the released
Visual Studio
- remove SelectedTargets property and its usage
- Add property for excluding mobile targets
- generally we exclude based on the nuget package name
- can now specify `/p:IncludeMobileTargets=false` on the command line to
force exclusion
- support building test package using build.py `--build_nuget` better
- limit inclusion of xamarin targets as building with them requires a
lot more infrastructure
- use msbuild directly if xamarin targets are included. use dotnet
otherwise.
- remove quoting of property values as it doesn't appear to be necessary
and breaks when msbuild is being used
- add infrastructure to be able to pack the nuget package on linux with
`dotnet pack`
- `nuget pack` is not user friendly as-per comments in changes
- requires stub csproj to provide the nuspec path
- Remove netstandard1.0 targets from nuspec
- we removed support from the actual bindings previously
- Remove usage of nuget-staging directory when creating nuget package on
linux
- the nuspec file element has a fully qualified path for a source file
so there is no obvious benefit to copying to a staging directory prior
to packing
### Motivation and Context
Address issues with 1P users trying to create test nuget packages
locally.
Long overdue cleanup of CI complexity.
### Description
Update XNNPACK to latest version
- adds fp16 kernels and various other improvements
- requires pthreadpool update as well
Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API
- 'setup' is split into 'reshape' and 'setup'
- some ops use a workspace buffer
- copied workspace allocation from XNNPACK unit test code
- some suffixes changed
Added wrapper for XNNPACK caches to base XNNPACK EP kernel
- simplifies usage
- XNNPACK split out the code and weights caches, but the code cache
isn't currently usable via the public API
- we could use the internal types if we think it's required for
performance reasons. non-trivial though as we'd need to propagate ifdef
values from the XNNPACK build up to the ORT build.
- using XNNPACK internals would also mean we would not be able to
support using a pre-built XNNPACK package
- not an issue currently
Fixed opset registration for internal NHWC domain
- was not being tied to the ONNX version, so nodes inserted by layout
transformation had the incorrect opset
- a number of other places needed updating once this issue was fixed
Remove support for NCHW Resize from XNNPACK EP so it's NHWC only
- we only supported NCHW for fp32
- doing so adds complexity in multiple places (XNNPACK EP kernel
implementation, layout transformation and transpose optimization)
- unclear if that complexity provides any benefit. can add back if
required by production scenario
### Motivation and Context
We're looking at enabling fp16 support for CoreML and NNAPI. If we do
that we need a good fallback story if the CPU EP will be used. The
XNNPACK fp16 kernels will hopefully provide that.
NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That
can be done as required in separate PRs and should be relatively simple
to do.
### Description
Retry 3 times at most if the web test fails.
### Motivation and Context
Web GPU tests are not stable.
This link shows that these ort-web tests are all among the top 10
failing tasks:
https://dev.azure.com/onnxruntime/onnxruntime/_pipeline/analytics/stageawareoutcome?definitionId=161&contextType=build
They generally pass when rerun manually,
so enable automatic reruns.
These test steps are short, so retrying won't take
long.
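
The retry semantics can be sketched in Python as follows. This is an illustration only, not the pipeline change itself: the helper name, the `delay_seconds` parameter, and the reading of "retry 3 times at most" as one initial attempt plus up to three retries are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(step: Callable[[], T], max_retries: int = 3,
                     delay_seconds: float = 0.0) -> T:
    """Run `step`; if it raises, retry up to `max_retries` more times."""
    failures = 0
    while True:
        try:
            return step()
        except Exception:
            failures += 1
            if failures > max_retries:
                raise  # out of retries: surface the last failure
            time.sleep(delay_seconds)
```

A step that fails intermittently succeeds without manual intervention, while a step that always fails still fails after four attempts in total.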
### Description
Disable ccache for DML. This change is similar to #18104. Now the DML
build job is having the same timeout issue. I don't know why. But
disabling ccache probably would help.