- Update ROCm and MIGraphX CI to ROCm5.7
- Simplify test exculde file. Some tests will output `registered
execution providers ROCMExecutionProvider were unable to run the model.`
if they cannot run.
- Add `enable_training` build argument for MIGraphX pipeline.
Python package pipeline fails due to "tokenizers" compilation. Since
"tokenizers" is a dep of "transformers", we update its version and hope
a new solution had been there.
```
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
--> tokenizers-lib/src/models/bpe/trainer.rs:517:47
```
- we will publish the onnxruntime-training-rocm package on ADO feeds.
The onnxruntime-training package will solely be for cuda.
- Add new pipeline for onnxruntime-training-rocm ADO feeds
https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1278. Only
package with latest rocm version is publish to ADO.
### Description
Improve the QNN context binary cache feature to reduce the memory
overhead and initialization time overhead.
Instead of dumping a Qnn context binary file with metadata as header, we
dump a Onnx format file with metadata inside Onnx node.
### Motivation and Context
reduce the memory overhead and initialization time overhead
Two major modifications of this PR:
1. Refactor OrtTensorRTProviderOptions initialization and make it easy
to add new field.
2. Make Python API capable of using TensorRT plugins by adding new
Python binding api `register_tensorrt_plugins_as_custom_ops`. (It needs
to register ep's custom op domain before model load. For C++ API, it's
slightly different, when calling
SessionOptionsAppendExecutionProvider_TensorRT_XX, it appends cutom op
domain to session option. Later ORT can register custom op domain from
session option before model loading)
Bump ruff version and remove pylint from the linter list. Fix any new
error detected by ruff.
### Motivation and Context
Ruff covers many of the pylint rules. Since pylint is not enabled in
this repo and runs slow, we remove it from the linters
### Description
<!-- Describe your changes. -->
Move the swift files to ORT SPM repo now:
https://github.com/microsoft/onnxruntime-swift-package-manager
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
This PR introduces
- New data structure to represent kernel-level (aka node-level or
op-level) tensor sharding informaiton. I consider it as the
fundamentaion of ONNX distribtued inference.
- Building blocks for distribtued kernels implementation especially
stateless implementation for communication ops.
- Implementation of DistributedMatMul and its tests.
Code structure:
- sharding.h/.cc: Function to shard and reshard tensors (calling into
NCCL).
- sharding_spec.h/.cc: Representation of how a tensor is sharded.
- distributed_matmul.h/.cc: Implementation of tensor parallel MatMul.
Inputs and outputs are sharded across devices.
- onnxruntime_test_distributed.py: distributed operator tests.
Example of specifying sharding information
```python
@onnxscript.script()
def matmul_rs_sr_rr(tensor_x: FLOAT, tensor_w: FLOAT) -> FLOAT:
# Run MatMul by sharding x along column axis and w along row axis on
# 2 GPUs.
return MICROSOFT_OPSET.DistributedMatMul(
tensor_x,
tensor_w,
device_mesh_shape=[2],
device_mesh_elements=[0, 1],
input_shard_specs=["RS[0]", "S[0]R"],
output_shard_specs=["RR"],
)
onnx_model = matmul_rs_sr_rr.to_model_proto(
input_types=[FLOAT[2, "s"], FLOAT["s", 2]],
output_types=[FLOAT[2, 2]],
)
```
In this example, the device mesh can be visualized as 1-D tensor, `[0,
1]`. The 2nd axis of `tensor_x` is sharded across `[0, 1]` (i.e., the
0-axis of the device mesh). Similarly, the 1st axis of `tensor_w` is
sharded across `[0, 1]` as well.
C++ classes to represent tensor sharding (copied from sharding_spec.h):
```cpp
class DeviceMesh {
public:
// [Device Mesh and Tensor Sharding for Tensor Parallel]
// Device mesh is a tensor of device indices.
// A tensor can then be partitioned along specific mesh axes.
//
// Assume we have 4 GPUs indexed by 0, 1, 2, and 3.
// Let's consider some examples.
// 1. 1D device mesh [0, 1, 2, 3]. In this case,
// device_mesh_shape is [4] and device_mesh_elements
// is [0, 1, 2, 3].
// If we want to shard a 2-D tensor along its axis 1, the
// corresponding sharding spec is a string "RS[0]".
// 2. 2D device mesh [[0, 1], [2, 3]]. In this case,
// device_mesh_shape is [2, 2] and device_mesh_elements
// is [0, 1, 2, 3].
// If we want to shard a 2-D tensor's
// rows along mesh axis 1 and
// columns along mesh axis 0, the
// corresponding sharding spec is a string "S[1]S[0]".
// If that 2-D tensor's value is np.array([[5, 6], [7, 8]]),
// GPU 0/1/2/3 owns 5/7/6/8. Below is a visualization the sharding
// proccess.
// - Start with a 2-D device mesh [[0, 1], [2, 3]] and
// a 2-D tensor [[5, 6], [7, 8]]
// - GPU: [[0, 1], [2, 3]], Tensor: [[5, 6], [7, 8]]
// - Split GPU mesh along axis 1 and tensor along
// axis 0 for "S[1]" in "S[1]S[0]"
// - GPU: [[0], [2]], Tensor: [[5, 6]]
// GPU: [[1], [3]], Tensor: [[7, 8]]
// - Split GPU mesh along axis 0 and tensor along
// axis 1 for "S[0]" in "S[1]S[0]"
// - GPU: [[0]], Tensor: [[5]]
// - GPU: [[2]], Tensor: [[6]]
// - GPU: [[1]], Tensor: [[7]]
// - GPU: [[3]], Tensor: [[8]]
// Actual shape of device mesh represented by `device_mesh_elements`.
std::vector<int64_t> device_mesh_shape;
// Flattened device mesh.
std::vector<int64_t> device_mesh_elements;
};
class AxisPartitionSpec {
// [Device Mesh and Tensor Sharding for Tensor Parallel]
// This class is the in-memory representation of
// 1. if a tensor is sharded or not (aka replica), and
// 2. which tensor axis is shard by which device mesh axis.
// Let's consider sharding 2-D tensor along column axis on
// device mesh [0, 1] as an example.
// The required sharding spec RS[0] can be represented by
// - AxisPartitionSpec(Condition::Replica, -1)
// - AxisPartitionSpec(Condition::Shard, 0)
public:
// Status of a tensor axis.
// A tensor axis can be either sharded or replicated
// along a device mesh axis.
enum class Condition { Replica,
Shard };
// This field tells if a tensor axis is sharded or not.
Condition cond;
// If a tensor axis is sharded, this field tells which device
// mesh axis to distribute the shards along.
// If a tensor axis is not sharded, this field is ignored.
int device_mesh_axis;
// A helper to construct a replica spec for a tensor axis.
static AxisPartitionSpec CreateReplica() {
return AxisPartitionSpec(Condition::Replica, -1);
}
// A helper to construct a sharding spec for a tensor axis.
// This tensor axis is sharded along `device_mesh_axis` in device mesh.
static AxisPartitionSpec CreateShard(int device_mesh_axis) {
return AxisPartitionSpec(Condition::Shard, device_mesh_axis);
}
};
class TensorPartitionSpec {
// [Device Mesh and Tensor Sharding for Tensor Parallel]
// TensorPartitionSpec holds a collection of AxisPartitionSpec and an
// associated DeviceMesh. It is responsible for determining how a tensor
// should be partitioned across a device mesh.
//
// Example 1: RS[0]
// In this scenario, `axis_specs` would contain two `AxisPartitionSpec` objects.
// - The first object is a Replica, denoting that the first axis of the tensor is
// not sharded but is instead replicated.
// - The second object is a Shard along the 0-th axis of the device mesh. It denotes
// that the second axis of the tensor is sharded along the first axis of the
// device mesh.
//
// Example 2: S[0]RR
// In this scenario, `axis_specs` would contain three `AxisPartitionSpec` objects.
// - The first object is a Shard along the 0-th axis of the device mesh, indicating
// that the first axis of the tensor is sharded along the first axis of the
// device mesh.
// - The second and third objects are Replicas, indicating that the second and third
// axes of the tensor are not sharded but are instead replicated.
public:
// axis_specs[i]: AxisPartitionSpec for tensor axis i. For a 2-D tensor,
// axis_specs[0] is for row axis and axis_specs[1] is for
// column axis. axis_specs[i].device_mesh_axis = j means that
// tensor axis i is sharded along device mesh axis j.
std::vector<AxisPartitionSpec> axis_specs;
// device_mesh: DeviceMesh for sharding the associated tensor.
// Read [Device Mesh and Tensor Sharding for Tensor Parallel] in DeviceMesh's comment.
DeviceMesh device_mesh;
};
```
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
### Description
this is for ORT 1.17.0 - make ORT to use ONNX release 1.15.0 branch. Eventually will update to the release tag once ONNX 1.15.0 is released
### Motivation and Context
Prepare for ORT 1.17.0 release. People can start work on new and updated ONNX ops in ORT.
---------
Signed-off-by: Liqun Fu <liqfu@microsoft.com>
1. Upgrade nodejs from 16.x to 18.x for Windows pipelines
2. Avoid using Azure DevOps "NodeTool" on Linux. The tool installs
nodejs from internet or local disk cache. But we already moved all Linux
tests to docker. So we do not need the installer anymore.
3. Remove some other unused code.
### Description
1. Remove 'dnf update' from docker build scripts, because it upgrades TRT
packages from CUDA 11.x to CUDA 12.x.
To reproduce it, you can run the following commands in a CentOS CUDA
11.x docker image such as nvidia/cuda:11.8.0-cudnn8-devel-ubi8.
```
export v=8.6.1.6-1.cuda11.8
dnf install -y libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v} libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v}
dnf update -y
```
The last command will generate the following outputs:
```
========================================================================================================================
Package Architecture Version Repository Size
========================================================================================================================
Upgrading:
libnvinfer-devel x86_64 8.6.1.6-1.cuda12.0 cuda 542 M
libnvinfer-headers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 118 k
libnvinfer-headers-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 14 k
libnvinfer-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 13 M
libnvinfer-plugin8 x86_64 8.6.1.6-1.cuda12.0 cuda 13 M
libnvinfer-vc-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 107 k
libnvinfer-vc-plugin8 x86_64 8.6.1.6-1.cuda12.0 cuda 251 k
libnvinfer8 x86_64 8.6.1.6-1.cuda12.0 cuda 543 M
libnvonnxparsers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 467 k
libnvonnxparsers8 x86_64 8.6.1.6-1.cuda12.0 cuda 757 k
libnvparsers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 2.0 M
libnvparsers8 x86_64 8.6.1.6-1.cuda12.0 cuda 854 k
Installing dependencies:
cuda-toolkit-12-0-config-common noarch 12.0.146-1 cuda 7.7 k
cuda-toolkit-12-config-common noarch 12.2.140-1 cuda 7.9 k
libcublas-12-0 x86_64 12.0.2.224-1 cuda 361 M
libcublas-devel-12-0 x86_64 12.0.2.224-1 cuda 397 M
Transaction Summary
========================================================================================================================
```
As you can see from the output, they are CUDA 12 packages.
The problem can also be solved by lock the packages' versions by using
"dnf versionlock" command right after installing the CUDA/TRT packages.
However, going forward, to get the better reproducibility, I suggest
manually fix dnf package versions in the installation scripts like we do
for TRT now.
```bash
v="8.6.1.6-1.cuda11.8" &&\
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo &&\
yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}\
libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v}
```
When we have a need to upgrade a package due to security alert or some
other reasons, we manually change the version string instead of relying
on "dnf update". Though this approach increases efforts, it can make our
pipeines more stable.
2. Move python test to docker
### Motivation and Context
Right now the nightly gpu package mixes using CUDA 11.x and CUDA 12.x
and the result package is totally not usable(crashes every time)
### Description
Include onnxruntime_float16.h in the package.
### Motivation and Context
This was missed in the recently released 1.16 pkgs (except Nuget).
manylinux build is used for nightly packaging generation and it's hard
to capture issue in time when related files change. This PR add
manylinux build in CI.
### Description
1. use standard win build template
2. enable compiler cache
### Motivation and Context
Make win build task easy to maintain and accelerate the pipeline.
### Description
PR 15470 updated some C/C++ dependencies. The change caused ROCM EP's
nightly build to fail. see issue
https://github.com/ROCm-Developer-Tools/HIP/issues/2082 for a
background. So, the root cause is HIP compiler has a special requirement
that HIP's include dirs must be used before the operating system's
include folder: /usr/include. HIP adds "-isystem" in front of
"/usr/include". gcc or clang will search the folders added with "-I"
first, then the "-isystem" folder. It works fine as long as we do not
add "-I/usr/include" to the compile commands for *.cu files. It would be wrong if
we already have installed an open source library to /usr and want to use the
prebuilt library from there instead of the current build dir.
### Motivation and Context
### Description
supplement of #17417
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
The name of nightly ACPT image has been updated to
`ptebic.azurecr.io/internal/aifx/acpt/nightly-ubuntu-cuda-torch-dev`
As the previous image alias had `cu118`, `torch210dev` or `py38`, any
version update will break the training nightly pipeline
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Using constant image alias to avoid pipeline failure.
### Description
Delete all Prefast tasks because the new VS 17.7 version crashes every
time when we run the task on our CI build servers. However, we cannot
reproduce it locally. And this problem blocks us installing security
patches to our CI build machines.
Will use [CodeQL](https://codeql.github.com/) instead.
### Motivation and Context
Address some security alerts.
The old provisioning profile no longer works. Switched to a temporary one that we can use before a new one is available. The temporary one has a different name.
### Description
Updates the version of QNN SDK used by CI Pipelines. Enables some tests
fixed by 2.14.1, but still need to look into Resize in a separate PR.
### Motivation and Context
Test latest version of QNN SDK.
### Description
Update the Web CI pipelines:
- remove parameter 'WebTemplate': Since we start to support webgpu, the
linux-web-ci.yml is no longer working and it is already out-of-date.
remove this file and parameter so that we always use win-web-ci.yml
- change flag `RunWebGpuTests` into 2 flags, for release and debug.
Currently for CI we only run webgpu tests on release build. But we want
to have the capability to run webgpu tests on debug build as well.
After this PR is merged, next step is to enable both Debug and Release
webgpu tests in PostMerge pipeline.
### Description
[Successful pipeline
run](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1123141&view=results)
Added flag to build the training artifacts & updated the
pull-wasm-artifacts script to pull the training artifacts as well.
Bundled into this PR are minor formatting fixes + naming fixes.
### Motivation and Context
[This PR](https://github.com/microsoft/onnxruntime/pull/16521) extended
the WASM API wrapper to build training WASM artifacts as well.
The ORT training WASM artifacts are required to support ORT training web
bindings.
### Description
The yaml file changes made in #16050 do not really work. Currently the
pipeline is failing with error:
```
Error: Not found SourceFolder: C:\a\_work\5\b\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib
```
So, I will revert the yaml changes first to bring the pipeline back.
Some people are waiting for our nightly packages.
Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=351104&view=results
### Motivation and Context
### Description
install dotnet 6.0 in the docker image.
move C# build and test into docker.
### Motivation and Context
### Note
The Unit tests and Symbolic shape infer's migration will be in another
PR.
### Description
1. Update docker files and their build instructions.
ARM64 and x86_64 can use the same docker file.
2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8
AB#18990
### Description
<!-- Describe your changes. -->
As title.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Now we have multiple data types that we want to disable for minimal
build and to reduce binary size. may be worth adding an argument in the
build script for specifying that.
Also for fp16 type stuff, it may be too restrict to disable that for all
minimal build.
---------
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
### Description
Add the compiler cache in linux GPU tensorRT CI.
Save about 30 minutes in the GPU machine. (52 minutes -> 24 minutes)
PS.
There're only white-space differences in the dockerfile.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Get the latest gcc 12 by default
---------
Co-authored-by: Changming Sun <chasun@microsoft.com>
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
* Integrate `trt_multi_gpu` test stage in ORT post merge CI (Win-2xA10
vm)
* Deprecate Linux MultiGPU TRT CI (This vm will be deprecated soon)
* Add multi gpu support to existing C# test cases
* Deprecate unfunctional flag `--enable_multi_device_tests`
### Motivation and Context
* Two contexts of replacing Linux MultiGPU TRT CI:
* Flag `--enable_multi_device_tests` is not functional, which cannot
detect issues like #17036
* The Linux-2xM60 VM of this CI pool is about to be deprecated 9/6/23.
Need to enable this test in other dualGPU vm pool.
### Description
Add single test step in Window GPU Reduced Ops workflow
### Motivation and Context
The old workflow's building and testing were running in one command.
In PR #17263, the test step was removed by mistake.
So, readd it.
How to consolidate the test step is in consideration.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Unify some pre-build common steps.
### Motivation and Context
In the long run, other devs should only focus on build option and test
commands.
It would reduce mistakes and maintenance cost to use common template
steps.
There will be more PRs to achieve the goal.
### Description
1. Add a CUDA 12.x pipeline
2. Improve install_third_party_deps.ps1: avoid using Start-process.
Directly call the command instead.
### Motivation and Context
Since our official packages and all CI pipelines still use CUDA 11.x, we need extra pipelines to validate our source code level compatibility with CUDA 12.x. BTW for sure the prebuilt binaries in our release page are not compatible with CUDA 12.x. Do not report bugs for that.
AB#15152
### Description
1. Fix python packaging test pipeline. There was an error in
tools/ci_build/github/linux/run_python_tests.sh that it installed a
released version of onnxruntime python package from pypi.org to run the
test. Supposedly it should pick one from the current build.
2. Refactor the pipeline to allow choosing cmake build type from the web
UI when manually trigger a build. Now this feature is for Linux only.
Because I don't want to change too much when we are about to cut a
release branch. After that I will expand it to all platforms. This
feature is useful for debugging pipeline issues, also, we may consider
having a nightly pipeline to run all tests in Debug mode which may catch
extra bugs because in debug mode we can enforce range check.
Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=342674&view=results
### Motivation and Context
Currently the pipeline has a crash error.
AB#18580
### Description
Updates NuGet packaging pipelines to use the correct license name.
### Motivation and Context
The license name changed. See https://github.com/microsoft/onnxruntime/pull/17170
The QNN_Windows_Nuget and Zip-Nuget-* pipelines will not run without this update.
### Description
Move DML build job's Prefast task to a CPU machine pool which has larger
memory. The current one runs out of memory in every run.
### Motivation and Context
To fix the broken python packaging pipeline.
- 'js/web'
- 'js/node'
- 'onnxruntime/core/providers/js'
is updated
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Adds continuous integration and pull-requestion validation triggers
directly to the yaml file for the Windows x64 QNN CI Pipeline.
### Motivation and Context
There have been various unit tests failures that break the
QNN_Windows_Nuget pipeline, which builds QNN EP for Windows x64. This PR
ensures that QNN EP is built and tested on a Windows x64 image for every
pull request.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
The onnxruntime-CI-nightly-ort-pipeline encounters occasional failures
due to synchronization discrepancies between the ACPT nightly image and
the repository. We are addressing this by executing tests using the
commit ID associated with the ort build within the ACPT image.
---------
Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
1. Clean up cmake files. Remove some unused code
2. Remove the "Semmle" task from
tools/ci_build/github/azure-pipelines/templates/win-ci.yml. Semmle is
deprecated and replaced by CodeQL.
### Description
- Disables Resize tests that use nearest mode on QNN CPU.
- Fixes indentation problems on yaml for win x64 qnn pipeline.
### Motivation and Context
The QNN windows Nuget pipeline does not run due to failing unit tests on
Windows x64. These tests should not be enabled until we determine the
rounding behavior of QNN's ResizeNearestNeighbor operator.
ROCm python package pipeline failed because this
PR(https://github.com/microsoft/onnxruntime/pull/16325) changed onnx
version to a commit and we need to build onnx from source. Low protobuf
version will cause build errors.
This PR remove `cmake ` and `protobuf ` from Dockerfile, these two will
install by `install_os_deps.sh`.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
The correct name should be onnxruntimecpubuildpython
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
### Description
Some pipelines are failing. It is because PR #16325 set ONNX version to
`rel-1.14.1` . It is a branch name, not a commit or tag name. It means
whenever the branch got a new commit, we will auto pick it and use it.
### Description
This change upgrade emsdk to 3.1.44.
Because backend is upgraded to LLVM 16, so need to fix a lot of build
failures caused by "-Wshorten-64-to-32".
most of the build failures comes from generated `onnx.pb.h`, and this
can be fixed by including "core/graph/onnx_protobuf.h", which detects
and ignore shorten-64-to-32 warnings.
### Description
enable webgpu in browser unit test.
The CI pipeline uses Edge v113+ which enables WebGPU.
===
**UPDATE on 08/07/2023:**
- add flags to Edge browser launch commandline so that Edge on CI agents
can initialize WebGPU correctly.
- ONLY enable webgpu on web release build. Other pipelines are using
flag `-b=wasm,webgl,xnnpack` to specify the other 3 backends explicitly.
- disable "Resize" related test failures. Once they are fixed the tests
can be re-enabled.
---------
Co-authored-by: Satya Jandhyala <satya.k.jandhyala@gmail.com>
Add script to get iOS simulator device info so we don't need to use hardcoded specifiers which may or may not refer to a valid simulator device.
Add use-xcode-version step to a packaging pipeline so it uses a consistent version of Xcode.
### Description
1. Add valgrind to existing ep_perf CI MemTest and parse ORT-TRT memLeak
details
1. General Valgrind logs and logs related to ORT-TRT will be parsed in
[CI
artifacts](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=334122&view=artifacts&pathAsName=false&type=publishedArtifacts)
1. Logic:
1. Run valgrind with `onnxruntime-perf-test -e tensorrt` and export log
to `valgrind.log`
2. Identify if any `definitely lost` memleak happened
1. For log paragraphs which show `definitely lost`, parse if they have
keyword `TensorrtExecutionProvider`.
2. If so, extract these details to `ort_trt_memleak_detail.log`, and
return `build failure` to EP Perf CI
3. Fix existing addressSanitizer and sync the squeezenet testcase with
latest update from
[ort-inference-example](https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/squeezenet/main.cpp)
1. Updates in short: Upgrade main.cpp to be using
OrtTensorRTProviderOptionsV2
4. Reorder the 7-min-MemTest to be ahead of 9-hr-model-tests, and enable
MemTest by default
### Description
This change allows Web CI to do some check as the first step, so that if
there are errors it won't launch the task to build web assembly, which
is heavy.
Checks includes:
- "npm ci" in /js, /js/common and /js/web. this implicitly include:
- typescript compiler in /js
- typescript compiler in /js/common
- webpack build in /js/common
- typescript compiler in /js/web
- ESLint on typescripts
- clang-format formatter (.js, .ts, .cc, .h, .mm)
- Prettier formatter (.json, .jsonc, .md)
---------
Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
1. rename OrtValue.FillStringTensorElement to StringTensorSetElementAt .
To the API user I think we're conceptually setting the string at an
offset in the tensor with is roughly equivalent to `List<string> list
... list[index] = "value"`.
2. While working on new inference examples, I noticed that I am still
inclined to use `DenseTensor` for N-D indexing. Added `GetStrides()` and
`GetIndex()` from strides for long dims, so the user can obtain strides
and translate N-D indices into a flat index to operate directly on the
native `OrtValue` buffers. Expose these functions to the user.
3. Make sure we generate docs for C# public static functions.
### Description
There are currently multiple failures that blocking the CI pipelines so
this PR has all of the fixes in order to make sure it passes the CI.
Otherwise a single fix will still fail the CI.
includes:
#16960#16958
Please help to make sure this PR get merged once CI passed.
@snnn @carzh @guschmue
Fixed:
[AB#18118](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18118)
---------
Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
- enable unit test for js/common in CI
- add debug config in js/.vscode/launch.json
- enable source map for js/common/test for debugging purposes; add
source map files to ignore list
- ignore js/common/test folder for npm packaging
### Description
1. As a follow-up of #16761, this PR allows build ORT on iOS/Android
without the need to explicitly specify a protoc path. #16761 is for
WASM. This one is for iOS/Android
2. Update the MacOS/Linux build scripts that build/install protobuf from
source. Make them be more flexible. Add the support for
RedHatEnterprise(ubi), which will needed for upgrading the base image
from centos:7 to ubi:8.
3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile :
the docker file's base image has preinstalled protobuf in /usr/local, we
should uninstall them to avoid conflicts.
### Description
The `%AGENT_TEMPDIRECTORY%\v11.8` is created in azcopy step.
So, the set env step should be after the azcopy step.
### Motivation and Context
Correct the previous logic
Unify the step since multiple jobs are using it.
### Description
<!-- Describe your changes. -->
Split stages for CPU and CPU+NNAPI builds as CodeQL is enabled at the
stage level.
We run it for CPU+NNAPI as that covers all the Android code.
We don't want to run it for both as duplicate issues would be created
for a problem in code included in both builds.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Remove VS 2019 code.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
These yaml files and docker files are not used by any pipeline. If I
were wrong, feel free to submit a PR to get the wrongly deleted file
back from git history (git keeps everything forever).
### Description
### Motivation and Context
It's also used to upgrade visual studio to VS2022.
onnxruntime-gpu-winbuild-T4 and onnxruntime-gpu-tensorrt8-winbuild-t4
are using the image based on one dev branch and VS2019
To avoid breaking the current CIs, we move jobs running on
onnxruntime-gpu-winbuild-T4/onnxruntime-gpu-tensorrt8-winbuild-t4 to
onnxruntime-Win2022-GPU-T4.
### Description
Support SmoothQuant for ORT static quantization via intel neural
compressor
> Note:
Please use neural-compressor==2.2 to try SmoothQuant function.
### Motivation and Context
For large language models (LLMs) with gigantic parameters, the
systematic outliers make quantification of activations difficult. As a
training free post-training quantization (PTQ) solution, SmoothQuant
offline migrates this difficulty from activations to weights with a
mathematically equivalent transformation. Integrating SmoothQuant into
ORT quantization can benefit the accuracy of INT8 LLMs.
---------
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
### Description
Disable two PERF* rules in ruff to allow better readability. Rational
commented inline. This change also removes the unused noqa directives
because of the rule change.
### Motivation and Context
Readability
Update upload_pod_archive_and_update_podspec.sh to take a pod archive path glob pattern. The actual pod archive path has a version suffix which changes.
### Description
1. use the pool with VS2022
2. upgrade System.Memory to 4.5.5
### Motivation and Context
Solve the build error while using VS2022:
`[Failure] Msbuild failed when processing the file
'D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\Microsoft.ML.OnnxRuntime.csproj'
with message: Method not found: 'System.ReadOnlySpan`1<Char>
Microsoft.IO.Path.GetFileName(System.ReadOnlySpan`1<Char>)'`
Ref:
https://stackoverflow.com/questions/73399777/azure-build-failing-due-to-method-not-found-system-readonlyspan1char-micros
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* __->__ #16789
Bump ruff to 0.0.278 and fix new lint errors. I added noqa to all
existing RUF012 errors which requires mutable class variables to be
annotated with `ClassVar`, as well as all PERF issues.
Signed-off-by: Justin Chu <justinchu@microsoft.com>
This pull request contains a few changes:
1. Adds support for string ort values.
2. Fixes the training minimal build (that was broken with #16601) by
putting custom op registration behind #ifdefs
3. Fixes the iOS pod package generation (that was again broken with
#16601) by explicitly providing paths to be copied during pod creation.
### Description
- Updates the default QNN SDK to 2.12 for CI pipelines
- Adds a disabled InstanceNormalization test for regression on QNN SDK
2.12
- Cleans up logs for unsupported ops.
### Motivation and Context
Test with the latest QNN SDK.
### Description
This PR is includes changes in the documentation of _readmeOV.rst_ file
and also the changes in the dockerfile which enables to build ORT with
latest OpenVINO 2023.0.0
### Motivation and Context
Modified the dockerfile to incorporate the latest version of OpenVINO
(2023.0.0) for building Onnxruntime.
The changes in the PR aim to improve the overall user experience by
providing accurate and up-to-date documentation while leveraging latest
OpenVINO 2023.0.0
### Description
<!-- Describe your changes. -->
Delete second reference to onnxruntime_api_tests_without_env in the code
coverage commands. One was removed in #16373 and the duplicate wasn't
noticed.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix pipeline.
This PR mainly optimize ROCm CI test to reduce time and CPU utilization.
- use smaller batch size on strided_batched_gemm/batched_gemm test
- disable cpu training test
- fix test_e2e_padding_elimination Occasional failures on ROCm.
### Description
Use pipeline cache instead of reading data from the image.
### Motivation and Context
1. To reduce the browser dependency of custom image.
2. The onnx node test data is less than 30M and the cache download time
is very short.
### Description
<!-- Describe your changes. -->
Add ort_value.h to session_options.h so OrtValue is defined.
Update a unit test binary to add required include paths. Adding
ort_value.h pulls in more data type headers.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
#16193
- Move ROCm build step on CPU only machine
- Add the performance data of the huggingface bert-large model on the
MI200
- At the beginning of the test step, check the agent's GPU usage and
kill the threads occupying the GPU, which may be left over from previous
tasks that exited abnormally.
- Use different docker images during the build and test steps. The
difference is the `uid` and `user` when build docker image and create
docker container.
- Update some documentation comments.
- Use onnxruntime_training.h as the umbrella header so training API docs are included in generated docs.
- Fix static analysis build.
### Description
Enable support for building iOS packages/CocoaPods with training API
- Add `Training` Package variant and config files in current iOS
packaging utilities to enable creation of training packages
### Motivation and Context
This PR introduces new `Training` variant in
`build_and_assemble_ios_pods.py` script which allows creating pods for
iOS with training API enabled.
The sample script to build training pods:
```
python3 tools/ci_build/github/apple/build_and_assemble_ios_pods.py --variant Training \
--build-settings-file tools/ci_build/github/apple/default_full_ios_training_framework_build_settings.json \
-b=-- path_to_protoc_exe=<path/to/protoc>
```
Note: build settings file should have `--enable_training` as a build
parameter.
Simply adding training packaging increases the duration of the Azure
pipeline for packaging by 70 minutes. To address this issue, we need to
parallelize pod creation. In order not to further strain the pipeline,
the changes for training packaging will be added in another PR, which
optimizes the packaging pipeline.
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
The ONNX exporter in DORT have been moved to PyTorch as a formal
feature. We therefore switch to consume the exporter from PyTorch
instead of maintaining two duplicates.
### Description
<!-- Describe your changes. -->
Set emulator logging to verbose to see if it helps with intermittent
React Native CI failures when emulator crashes at startup
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
As title.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Works with local onnxruntime-c pod in js/rn/e2e test.
The image for the onnxruntime-CI-nightly-ort-pipeline is too old.
The ort package in the image is older than latest test code in nightly
ci. This causes the nightly ci failed.
Some CI jobs may interrupted unexpectedly and didn't execute umount data
step. The data left in host device will cause `device or resource busy`
and make subsequent CI jobs fail.
Move the mount data step into docker container, the host machine will
not be occupied when CI jobs exit incorrectly.
### Description
Revert docker base image to
nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04@sha256:b754c43fe9d62e88862d168c4ab9282618a376dbc54871467870366cacfa456e
### Motivation and Context
The default img env of nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 has
minor upgrade, which make Linux MultiGPU TensorRT CI (NV12 instance with
Maxwell GPU) fail on three CApiTestGlobalThreadPoolsWithProvider
tests (these three tests have higher error which are above the tolerance)
That minor upgrade includes cudnn 8.7.0->8.9.0, which might be a factor
that make maxwell GPU generator higher error. CIs with T4 GPU are not
affected.
### Description
<!-- Describe your changes. -->
As title.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Set onnxruntime-c local pod path environment variable for react native
e2e tests on react-native-ci.yml
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Previously the E2E test project is not properly consuming a local built
onnxruntime-c version pod.
https://github.com/microsoft/onnxruntime/pull/16411#issuecomment-1598512816
### Description
1. Keep symlink in the package.
2. keep the artifact package format
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
The build pipeline runs on Azure NV12 machines that will be deprecated
soon because the SKU is too old. So this PR will move the pipeline to a
Windows machine with two A10 GPUs.
1. Enable xnnpack test
2. Change TSA database name from onnxruntime_master to onnxruntime_main.
This is a leftover of renaming the "master" branch to "main"
3. Add two static analysis jobs for WinML and DML
4. Rename the machine pool "aiinfra-dml-winbuild" to
"onnxruntime-Win2019-GPU-dml-A10", so that the internal and public ADO
instances use the same machine pool name.
5. Move Windows GPU CI build pipeline from "onnxruntime-Win2022-GPU-T4"
to "onnxruntime-Win2022-GPU-A10" machine pool, because we do not have
enough T4 GPUs.
### Description
<!-- Describe your changes. -->
Publish E2E test logs on build failure too.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Get more information about intermittent test failures.
### Description
A few QDQ tests failed on XNNPACK EP.
The reason should be the range of input_data doesn't fit for scale and
zero_point.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Add an API for users to get version of current package. example usage:
```js
import { env } from 'onnxruntime-node';
console.log(env.versions.node); // output "1.16.0"
```
```js
import { env } from 'onnxruntime-web';
console.log(env.versions.web); // output "1.16.0"
console.log(env.versions.common); // output "1.16.0"
console.log(env.versions.node); // output "undefined"
```
#16156
### Description
1. Updated Mac package workflow for easily debugging.
2. Changed Archive type from tgz to zip since zip is supported by ESRP.
3. .../dylib.dSYM/Contents/Resources/DWARF/libonnxruntime.1.16.0.dylib
is a debug symbol file, so it couldn't be signed.
### Motivation and Context
It‘s required from VS code.
Mac binaries in nuget should be signed
### Description
Implement Objective-C binding for `ORTCheckPoint`. Additionally,
- Modify `onnxruntime_objectivec.cmake` to only include training header
and sources when training flag is enabled
- Enable objective-c binding for `orttraining-mac-ci-pipeline`
### Motivation and Context
This PR is part of implementing Objective-C bindings for training API.
It implements objective-c binding for ORTCheckPoint class. The
objective-C API closely resembles the C++ API.
**Note**: The test for saving checkpoint is skipped as it requires use
of training session. It will be added when the objective-c binding for
`ORTTrainingSession` is added.
- Fix flatbuffers flatc warning, unused-but-set-variable.
- Address `-Wshorten-64-to-32` warnings (fix in our code, allow in dependencies' code).
- Update CI builds to use Xcode 14.3.
- Update minimum iOS version to 12.0.
- Update Mac hosted agents to MacOS 13 where possible.
MIGraphX CI
- Change docker container user name to `onnxruntimedev`
ROCm CI
- Build docker image every job instead of using prebuild image.
- Every job create a container with only one GPU with command `docker
run -it --device=/dev/kfd --device=/dev/dri/renderDxxx`
- Remove tests that are unstable or use outdated interfaces.
- Enable training ortmodule test.
### Description
1. Avoid taking dependency on dl.fedoraproject.org
The website is not very stable. Our build pipelines often fail to fetch
packages from there.
2. Update manylinux to the latest version
### Description
1. Add a Memory Profiling build job
2. Remove no absl build job since the feature will be removed
3. Simplify post-merge-jobs.yml by unifying the pool names
### Motivation and Context
To catch build errors in #16124
### Description
The PR implements FloatE4M3FN, FloatE5M2, FloatE4MEFNUZ, FloatE5M2FNUZ
as described in PR https://github.com/onnx/onnx/pull/4805. It uses CUDA
API to cast float/half to float8 if CUDA>=11.8, a custom implementation
if CUDA<11.8.
* It implements, Cast, QuantizeLinear, DequantizeLinear for all types on
CPU, only for types FloatE4M3FN, FloatE5M2 on CUDA.
* It extends the supported types for control flow operator, Shape,
Reshape, Identity, If, Loop, Scan, Reshape
* It implements Equal(19).
* Cast, QuantizeLinear, DequantizeLinear operators now support a
parameter `saturate` only valid for float 8 types. It is true by
default. In that case, any value out of range is converted into the
maximum float 8 value. If false, it is infinite.
* QuantizeLinear, DequantizeLinear now supports multiple scales on CUDA
(and ROCm by extension), scale = 1D tensor with one scale per channel
### Motivation and Context
Supports latest onnx version.
Fixes
[AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395)
---------
Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
1. Cherry-pick #16054 back to the main branch
2. Replace onnxruntime-gpu-winbuild-t4 with onnxruntime-Win2022-GPU-T4.
The later one has VS2022.
---------
Co-authored-by: Patrice Vignola <vignola.patrice@gmail.com>
### Description
This change is a follow-up to #15327. It adds Unary operators (Sqrt,
Reciprocal) and Reduce operators (ReduceSum, ReduceMean). I've tried to
follow existing patterns in the code :-)
### Motivation and Context
This reduces fragmentation across EPs when using CoreML on macOS,
thereby speeding up execution.
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
Enable Qnn Context cache feature to save model initialization time
Provider options:
qnn_context_cache_enable|1 to enable the cache feature
qnn_context_cache_path to set the cache path. It is set to model_file.onnx.bin by default.
### Motivation and Context
Model initialization time takes long because the cost of conversion from Onnx model to Qnn model. Qnn have feature to serialize the Qnn context to file, then next time user can load it from the cache context and execute the graph to save the cost.
---------
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Here's the motivating issue:
https://github.com/microsoft/azure-pipelines-tasks/issues/10331
Noticed some problems in other repos so also updating usages in ORT.
We may be fine now without it, but this change adds some safeguard against future additions of 'set -x' for debugging.
### Description
Change CUDA pipelines to download CUDA SDK in every build job
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
1. Set gtest output while ctest is set to empty.
2. onnx_src in _deps shouldn't be removed because
onnx_test_pytorch_converted and onnx_test_pytorch_converted need to read
data from onnx/backend/test/data/..
### Motivation and Context
Test result report is important to find the flaky tests.
### To do
Tests are not inconsistent.
If ctest_path is empty, onnx_test_pytorch_converted and
onnx_test_pytorch_converted will not be executed, if it's not,
onnxruntime_mlas_test will not be executed.
270c09a37f/tools/ci_build/build.py (L1743-L1753)
### Description
After this PR there are following pool need to be updated.
old|new|note
---|---|---
onnxruntime-Win2019-GPU-dml-A10|tbd|
onnxruntime-Win2019-GPU-T4|onnxruntime-Win2022-GPU-T4|
onnxruntime-Win2019-GPU-training-T4|onnxruntime-Win2022-GPU-T4|ame as
the above because we do not have many T4 GPUs
onnxruntime-tensorrt8-winbuild-T4|tbd|
aiinfra-dml-winbuild|tbd|
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Old pool | New pool | Notes
-- | -- | --
onnxruntime-Win-CPU-2019 | onnxruntime-Win-CPU-2022 |
onnxruntime-Win2019-CPU-training | onnxruntime-Win2022-CPU-training-AMD
|
onnxruntime-Win2019-CPU-training-AMD |
onnxruntime-Win2022-CPU-training-AMD | Same as the above
onnxruntime-Win2019-GPU-dml-A10 | Need be created | You need to create a
new image for it first
onnxruntime-Win2019-GPU-T4 | onnxruntime-Win2022-GPU-T4 |
onnxruntime-Win2019-GPU-training-T4 | onnxruntime-Win2022-GPU-T4 | Same
as the above because we do not have many T4 GPUs
onnxruntime-tensorrt8-winbuild-T4| TBD|TBD
Win-CPU-2021|onnxruntime-Win-CPU-2022| will do it in next PR
Win-CPU-2019|onnxruntime-Win2022-Intel-CPU'| Intel CPU needed for
win-ci-pipeline.yml -> `stage: x64_release_dnnl`
<br class="Apple-interchange-newline">
### Motivation and Context
With vs2022 we can take the advantage of 64bit compiler. It also with
better c++20 support
Set default value for parameters in nuget-zip pipeline, and only apply
the configurations when they are not "NONE".
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Updates the default QNN SDK version to 2.10 for the QNN NuGet pipeline.
### Motivation and Context
Ensures that the daily QNN NuGet pipeline builds ORT using the latest
QNN SDK by default.
update ROCm/MIGraphX CI to ROC5.5.
TODO:
two PR to fix failure on
orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py
-
test_gradient_correctness_minmax/test_gradient_correctness_argmax_unfold/test_gradient_correctness_argmax_diagonal
(https://github.com/microsoft/onnxruntime/pull/15903)
- test_ortmodule_attribute_name_collision_warning
(https://github.com/microsoft/onnxruntime/pull/15884)
### Description
The CI is extremely slow on downloading source code (~1MB/sec) so the
web CI went timeout. This is blocking the PR/checks.
Increase the timeout temporarily.
### Description
this is for ort 1.15 release to work with onnx 1.14
It shall be merged after onnx 1.14 release and before ort 1.15 release.
### Motivation and Context
---------
Signed-off-by: Liqun Fu <liqfu@microsoft.com>
### Description
Fix the bug in #15693
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->