Commit graph

2137 commits

Author SHA1 Message Date
Jian Chen
96228c86a0
Adding Job names to jobs without a name (#20961)
### Description
Adding Job names to jobs without a name

### Motivation and Context
This way we will know which job fails CG scan.
2024-06-06 19:09:21 -07:00
Adrian Lizarraga
b5eb9e8a8a
[QNN EP] Update to QNN SDK 2.22 (#20628)
### Description
- Updates pipelines to use QNN SDK 2.22 by default.
- Linux QNN pipeline now uses an Ubuntu 22.04 image (required by QNN
SDK)
- Android QNN pipeline still uses the current Ubuntu 20.04 image. Will
update in a separate PR.
- Disables QDQ LayerNorm test that triggers QNN's graph finalization
error on QNN 2.22
- Increases accuracy tolerance for various HTP tests so that they pass
on Windows arm64.



### Motivation and Context
Test QNN EP with latest QNN SDK version by default.

---------

Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>
2024-06-05 18:25:23 -07:00
Jian Chen
5faeaf6437
Remove failOnStderr from Gradle cmakeCheck (#20919)
### Description
Remove failOnStderr from Gradle cmakeCheck



### Motivation and Context
Gradle is still using the deprecated API.
2024-06-04 13:54:49 -07:00
liqun Fu
51bc53580d
Update to onnx 1.16.1 (#20702) 2024-06-04 11:06:28 -07:00
Changming Sun
3dd6fcc089
Upgrade min ios version to 13.0 (#20773)
To align with Office and other MS products.
Office's support policy is:
"Office for iPad and iPhone is supported on the two most recent versions
of iOS and iPadOS. When a new version of iOS or iPadOS is released, the
Office Operating System requirement becomes the two most recent
versions: the new version of iOS or iPadOS and the previous version."
(from https://products.office.com/office-system-requirements)

The latest iOS version is 17. So they support both 17 and 16. Here I set
our min iOS version to 13 so that it will be a superset of what Office
supports.

This change would allow us to use C++17's std::filesystem feature in the
core framework. The modifications were generated by running
```bash
 find . -type f -exec sed -i "s/apple_deploy_target[ =]12.0/apple_deploy_target=13.0/g"  {} \;
```
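
For readers less familiar with `sed`, the substitution above can be sketched in JavaScript. This is only an illustration: `bumpDeployTarget` and the sample lines are made up, and the pattern is copied from the command as written (the unescaped `.` matches any single character, in `sed` as well as here).

```javascript
// Same pattern as the sed expression above. The `.` is unescaped there too,
// so it matches any single character, not only a literal dot.
const pattern = /apple_deploy_target[ =]12.0/g;

// Hypothetical helper: applies the substitution to a block of text.
function bumpDeployTarget(text) {
  return text.replace(pattern, 'apple_deploy_target=13.0');
}

// Made-up input lines, for illustration only.
const sample = [
  '--apple_deploy_target=12.0',
  '--apple_deploy_target 12.0',
  '--apple_deploy_target=13.0',
].join('\n');

console.log(bumpDeployTarget(sample));
```

Note that the space-separated form is rewritten to the `=` form, because the replacement text always uses `=`.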

Cannot use 15.0 because otherwise iOS packaging would fail with:

```
/Users/runner/work/1/b/apple_framework/intermediates/iphoneos_arm64/Release/_deps/coremltools-src/mlmodel/src/MILBlob/Util/Span.hpp:288:9: error: cannot use 'throw' with exceptions disabled
        MILVerifyIsTrue(index < Size(), std::range_error, "index out of bounds");
```

The Google OSS libraries we use only officially support iOS 15+.
2024-06-04 10:15:20 -07:00
Yi Zhang
c5087b9b58
Improve stable diffusion image parity test stability (#20904)
### Description
1. Add one image to the whitelist; if that image is hit, the pipeline
status is set to warning.
2. Adjust the image parity test tolerance.



### Motivation and Context
improve pipeline stability
2024-06-04 10:19:32 +08:00
Jian Chen
456ab09d17
Component Governance fix round 5 (#20905)
…over the case where there is only single repo checked out

### Description
Add $(Build.SourcesDirectory)/cmake/external/onnx/third_party to
cover the case where only a single repo is checked out.



### Motivation and Context
Fix CG issue
https://aiinfra.visualstudio.com/Lotus/_componentGovernance/97926/alert/8862110?typeId=16576846
2024-06-03 14:22:22 -07:00
Jian Chen
ae8df4db8f
Split java's gradle build and test (#20817)
### Description

This PR allows `./gradlew cmakeCheck` to fail in the
Windows_Packaging_(CUDA|TensorRT) job. That way, the job still generates
all the necessary jar and pom files for later stages to consume, while
`./gradlew cmakeCheck` is run again in the
Windows_Packaging_(CUDA|TensorRT)_Testing stage.


### Motivation and Context
Reduce the time of All java packaging stages by 30+ min.
2024-06-03 14:08:45 -07:00
Changming Sun
d13cabf7f9
Upgrade GCC and remove the dependency on GCC8's experimental std::filesystem implementation (#20893)
### Description
This PR upgrades CUDA 11 build pipelines' GCC version from 8 to 11. 

### Motivation and Context

GCC8 has an experimental std::filesystem implementation which is not ABI
compatible with the formal one in later GCC releases. It didn't cause
trouble for us; however, the ONNX community has run into this issue often.
For example, https://github.com/onnx/onnx/issues/6047 . So this PR
increases the minimum supported GCC version from 8 to 9, and removes the
references to GCC's "stdc++fs" library. Please note we compile our code
on RHEL8 and RHEL8's libstdc++ doesn't have the fs library, which means
the binaries in ONNX Runtime's official packages are always statically
linked to the fs library. It is just a matter of which version of the
library, an experimental one or a more mature one. And it is an
implementation detail that is not visible from outside. Anyway, a newer
GCC is better. It will give us the chance to use many C++20 features.

#### Why were we using GCC 8?
It is because all our Linux packages were built on RHEL8 or its
equivalents. The default GCC version in RHEL8 is 8. RHEL also provides
additional GCC versions from RH devtoolset. UBI8 is the abbreviation of
Red Hat Universal Base Image 8, which is the containerized RHEL8. UBI8
is free, which means it doesn't require a subscription (while RHEL does).
The only devtoolset that UBI8 provides is GCC 12, which is too new to
be used with CUDA 11.8. And our CUDA 11.8's build env is a docker
image from Nvidia that is based on UBI8.
#### How the problem is solved
Almalinux is an alternative to RHEL. Almalinux 8 provides GCC 11. And
the CUDA 11.8 docker image from Nvidia is open source, which means we
can rebuild the image based on Almalinux 8 to get GCC 11. I've done
this, but I cannot republish the new image due to various complicated
license restrictions. Therefore I put them at an internal location in
onnxruntimebuildcache.azurecr.io.
2024-06-03 10:14:08 -07:00
Jian Chen
217b66fd85
Update py-publishing pipeline to use the resource from packaging pipeline (#20888)
### Motivation and Context
To allow the nightly release to be triggered automatically.
2024-06-01 16:10:02 -07:00
Changming Sun
67bc9438d7
Update training packaging pipeline's docker files (#20853)
### Description
Similar to #20786 . The last PR was able to update all pipelines and all
docker files. This is a follow-up to that PR.

### Motivation and Context
1. To extract the common part as a reusable build infra among different
ONNX Runtime projects.
2. Avoid hitting docker hub's limit: 429 Too Many Requests - Server
message: toomanyrequests: You have reached your pull rate limit. You may
increase the limit by authenticating and upgrading:
https://www.docker.com/increase-rate-limit
2024-05-30 23:48:42 -07:00
Edward Chen
a508130456
Address React Native pipeline component detection timeout (#20871)
mac-react-native-ci-pipeline.yml:
- We don't need to run component detection for PR builds so just disable it there.

npm-packaging-pipeline.yml:
- Manually added component detection task was being added twice - removed one.
- Increased timeout of stage where component detection is run since the existing timeout was close for some builds.
2024-05-30 16:37:03 -07:00
Changming Sun
65ef270e06
Update Aten pipeline's docker file to use UBI8 (#20856)
### Description
Now it uses CentOS 7 which is EOL. This PR updates it to UBI8.

### Motivation and Context
To deprecate CentOS 7.
2024-05-30 07:38:15 -07:00
Jian Chen
228713f635
adding publishing stage to publish java CUDA 12 pkg to ado (#20834) 2024-05-29 16:24:23 -07:00
Vincent Wang
e77f238dc6
Update Torch Version to Fix ATen CPU Pipeline Failure (#20845)
Update Torch Version to Fix ATen CPU Pipeline Failure.
2024-05-29 16:04:18 +08:00
Edward Chen
535e9d7114
Update package_release_tasks.py (#20835)
1. Move azcopy environment variables out of script and into an Azure DevOps variable group. Move towards consolidating the managed identity client ID definition in one place.
2. Disable azcopy overwrite. We don't want to accidentally change the files for a released package.
2024-05-28 17:50:25 -07:00
Adrian Lizarraga
e78b18a2fb
Increase ComponentDetection timeout for React Native CI (#20800)
### Description
Runs of the React Native CI are timing out during ComponentDetection
after 8 minutes. This increases the timeout value.



### Motivation and Context
Runs of the React Native CI are timing out during ComponentDetection.
2024-05-28 08:36:38 -07:00
Jian Chen
b1b8cb05dc
Adding java build and packaging stage to cuda-packaging-pipeline.yml (#20812)
### Description
Adding java build/packaging stage to `cuda-packaging-pipeline.yml`



### Motivation and Context
This way we can enable publishing the Java Cuda 12 along with Nuget CUDA
12
2024-05-27 07:59:19 -07:00
Changming Sun
439ed92b96
Remove TVM EP's pipeline (#20813)
### Description
Temporarily remove TVM EP's pipeline until someone helps us upgrade TVM
to a newer version which is compatible with the latest ONNX.

### Motivation and Context
The ONNX version that TVM EP uses has a known security vulnerability. We
cannot continue using it in our hosted build environment. This change is temporary
2024-05-25 20:42:41 -07:00
Jian Chen
fe24006425
Fix Nuget Cuda pipeline package pipeline (#20741)
### Description
This PR adds protoc.exe to the Nuget Cuda pipeline, which also
allows it to build Java for various CUDA versions.

2024-05-24 09:15:57 -07:00
Changming Sun
535a030b1e
Remove manylinux build scripts from python packaging pipeline (#20786)
### Description
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
https://github.com/onnx/onnx/issues/6047 and
https://github.com/microsoft/onnxruntime-genai/issues/257 .

### Motivation and Context
To extract the common part as a reusable build infra among different
ONNX Runtime projects.
2024-05-24 08:18:22 -07:00
Jian Chen
884acd4598
Fix Nuget-Cuda publish pipeline (#20794)
### Description
Previously all feeds were set to nightly; the official release feed ID
was not set.


2024-05-23 18:27:46 -07:00
Changming Sun
b522df0ae4
Update RE2 to the latest (#20775)
Update RE2 to the latest.

To keep the components up to date.
2024-05-23 14:30:15 -07:00
Yi Zhang
fa8670fe5b
Add a test image for stable diffusion (#20780) 2024-05-23 08:50:23 -07:00
Jian Chen
d4fe4b5b51
Replace ubuntu-latest with onnxruntime-Ubuntu2204-AMD-CPU (#20736)
2024-05-22 13:36:02 -07:00
Jian Chen
0a10a3003a
component-governance fix round 4 (#20754)
2024-05-22 11:05:24 -07:00
Jian Chen
372974e5d6
Using CPU pool to build Linux GPU C API Package (#20648)
2024-05-20 15:25:14 -07:00
Jian Chen
ddafbf2224
Component Governance fix round 3 (#20689)
2024-05-20 13:39:09 -07:00
Jian Chen
11df22b59b
Reenabling Nuget Cuda Packaging Pipeline (#20688)
2024-05-20 10:37:15 -07:00
Edward Chen
fefae0cd04
Add Mac CI GitHub Actions workflow (#20717)
Add a new GitHub Actions workflow, `.github/workflows/mac.yml`. It contains these jobs:
- ARM64 MacOS CI build.
- Objective-C static analysis build. This was moved over from another Azure DevOps pipeline to make it more visible.
2024-05-20 10:27:03 -07:00
Yulong Wang
036fcd93d4
[js/web] optimize module export and deployment (#20165)
### Description

This PR makes a number of optimizations to onnxruntime-web's module export
and deployment.

See each section below for more details.

#### Preview

> [onnxruntime-web@1.19.0-esmtest.20240513-a16cd2bd21](https://www.npmjs.com/package/onnxruntime-web/v/1.19.0-esmtest.20240513-a16cd2bd21)

> ~~onnxruntime-web@1.19.0-esmtest.20240430-c7edbcc63d~~

> ~~onnxruntime-web@1.18.0-esmtest.20240428-624c681c83~~

> ~~onnxruntime-web@1.18.0-esmtest.20240411-1abb64e894~~

<details>
<summary><h4>Breaking changes</h4></summary>

There is no code change required, but there are a few differences
regarding **code import**, **flags**, **bundler config** and
**deployment steps**.

#### Importing:

The import table has changed. See the following for details.

<details>
<summary><h5>Current import table:</h5></summary>

| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm | Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ |  | ✔️ | ✔️ |  |
| `ort.all` | `onnxruntime-web/experimental` | ✔️ | ✔️ | ✔️ | ✔️ |  |
| `ort.node` | `onnxruntime-web` |  |  | ✔️ |  |  |
| `ort.training` | `onnxruntime-web/training` |  |  | ✔️ | ✔️<sup>\[1]</sup> | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` |  |  | ✔️ | ✔️ |  |
| `ort.wasm-core` | `onnxruntime-web/wasm-core` |  |  | ✔️ |  |  |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ |  |  | ✔️<sup>\[2]</sup> |  |
| `ort.webgpu` | `onnxruntime-web/webgpu` |  | ✔️ | ✔️ | ✔️ |  |

* [1] didn't test. may not actually work.
* [2] not working. this is a mistake in build config.

</details>

<details>
<summary><h5>Proposed update:</h5></summary>

| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm | Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ |  | ✔️ | ✔️ |  |
| `ort.all` | ~~`onnxruntime-web/experimental`~~<br/>`onnxruntime-web/all` | ✔️ | ✔️ | ✔️ | ✔️ |  |
| `ort.node` | `onnxruntime-web` |  |  | ✔️ |  |  |
| `ort.training` | `onnxruntime-web/training` |  |  | ✔️ | ✔️ | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` |  |  | ✔️ | ✔️ |  |
| ~~`ort.wasm-core`~~ | ~~`onnxruntime-web/wasm-core`~~ |  |  | ~~✔️~~ |  |  |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ |  |  | ~~✔️~~ |  |
| `ort.webgpu` | `onnxruntime-web/webgpu` |  | ✔️ | ✔️ | ✔️ |  |

</details>

#### Flags:

The following flags are deprecated:
- `env.wasm.simd` (boolean): will be ignored. SIMD is always enabled in
build.

The following flags changed their type:
- `env.wasm.wasmPaths`: When using this flag as a string ( for the URL
prefix ), nothing is changed. When using this flag as an object ( for
per-file path override ), the type changed:
  ```diff
  -  export interface Old_WasmFilePaths{
  -    'ort-wasm.wasm'?: string;
  -    'ort-wasm-threaded.wasm'?: string;
  -    'ort-wasm-simd.wasm'?: string;
  -    'ort-training-wasm-simd.wasm'?: string;
  -    'ort-wasm-simd-threaded.wasm'?: string;
  -  };
  +  export interface New_WasmFilePaths {
  +    /**
  +     * Specify the override path for the main .wasm file.
  +     *
  +     * This path should be an absolute path.
  +     *
  +     * If not modified, the filename of the .wasm file is:
  +     * - `ort-wasm-simd-threaded.wasm` for default build
  +     * - `ort-wasm-simd-threaded.jsep.wasm` for JSEP build (with WebGPU and WebNN)
  +     * - `ort-training-wasm-simd-threaded.wasm` for training build
  +     */
  +    wasm?: URL|string;
  +    /**
  +     * Specify the override path for the main .mjs file.
  +     *
  +     * This path should be an absolute path.
  +     *
  +     * If not modified, the filename of the .mjs file is:
  +     * - `ort-wasm-simd-threaded.mjs` for default build
  +     * - `ort-wasm-simd-threaded.jsep.mjs` for JSEP build (with WebGPU and WebNN)
  +     * - `ort-training-wasm-simd-threaded.mjs` for training build
  +     */
  +    mjs?: URL|string;
  +  }
  ```
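
A minimal sketch of how calling code might branch on the two forms of `env.wasm.wasmPaths` described above. `describeWasmPaths` is a hypothetical helper, not part of onnxruntime-web, and the URLs in the usage lines are placeholders. Only the `wasm`/`mjs` keys and the default file names come from the interface shown in the diff.

```javascript
// Default file names for the default build, as listed in the interface comments.
const DEFAULTS = {
  wasm: 'ort-wasm-simd-threaded.wasm',
  mjs: 'ort-wasm-simd-threaded.mjs',
};

// Hypothetical helper: normalizes either form into explicit wasm/mjs locations.
function describeWasmPaths(wasmPaths) {
  if (typeof wasmPaths === 'string') {
    // String form: a URL prefix. This form is unchanged by the PR.
    return { wasm: wasmPaths + DEFAULTS.wasm, mjs: wasmPaths + DEFAULTS.mjs };
  }
  // Object form: per-file overrides using the new `wasm` / `mjs` keys.
  // An omitted key is reported as undefined here, meaning "no override".
  return {
    wasm: wasmPaths.wasm === undefined ? undefined : String(wasmPaths.wasm),
    mjs: wasmPaths.mjs === undefined ? undefined : String(wasmPaths.mjs),
  };
}

// Placeholder URLs, for illustration only.
console.log(describeWasmPaths('https://example.com/dist/'));
console.log(describeWasmPaths({ wasm: new URL('https://example.com/custom.wasm') }));
```

The old per-file keys such as `'ort-wasm-simd.wasm'` have no counterpart in the new interface, so code that used them would need to switch to `wasm` and `mjs`.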

#### Bundler compatibility:

Config changes are needed for bundlers. See the usage examples in
/js/web/test/e2e/ for webpack, parcel and rollup.

#### Deployment:

- If consuming from a CDN, there is no breaking change.
- If consuming from a local server, all `ort-*.wasm` and `ort-*.mjs`
files (6 files in total) in the dist folder need to be copied. (Previously
only the `ort-*.wasm` files needed to be copied.)
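
The count of 6 files can be checked against the artifact names listed in this PR's distribution file manifest. The sketch below filters a directory listing. `filesToDeploy` is a hypothetical helper and the listing is a hand-written sample, not the output of a real build.

```javascript
// Selects the WebAssembly artifacts that must be served alongside the bundle:
// every `ort-*.wasm` and `ort-*.mjs` file in the dist folder.
function filesToDeploy(distListing) {
  return distListing.filter((name) => /^ort-.*\.(wasm|mjs)$/.test(name));
}

// Sample listing: the artifact names from the distribution manifest, plus two
// entry-point files that start with `ort.` and therefore should not match.
const dist = [
  'ort.min.js',
  'ort.min.mjs',
  'ort-wasm-simd-threaded.mjs',
  'ort-wasm-simd-threaded.wasm',
  'ort-training-wasm-simd-threaded.mjs',
  'ort-training-wasm-simd-threaded.wasm',
  'ort-wasm-simd-threaded.jsep.mjs',
  'ort-wasm-simd-threaded.jsep.wasm',
];

console.log(filesToDeploy(dist).length);
```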

</details>
<details>
<summary><h4>Problems</h4></summary>

There are a few problems with the current module export and deployment:

- Script URL cannot be correctly inferred when imported as ESM.
- Workers are forcefully encoded using Blob URL, which prevents
onnxruntime-web from working in CSP environments and in Node.js when
using the proxy or multi-threading feature.
- Generated JS code (by Emscripten) is encoded using
`function.toString()`, which is unstable and error-prone.
- Running with a different Emscripten build always requires the build
step, which makes it difficult to swap artifacts in development/debug.
</details>
<details>
<summary><h4>Goals</h4></summary>

- Full ESM support
- Support various ways to import, including:
  - import from HTML's `<script>` tag (IIFE format, exporting to global variable `ort`)
    ```html
    <script src="https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.js"></script>
    ```
  - import from source code inside `<script type="module">` tag (ESM)
    ```html
    <script type="module">
      import * as ort from "https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.mjs";

      // using 'ort'
    </script>
    ```
  - import in a CommonJS project (CJS format, resolve from package.json "exports" field)
    ```js
    // myProject/main.js
    const ort = require('onnxruntime-web');
    ```
  - import in an ESM project (ESM format, resolve from package.json "exports" field)
    ```js
    // myProject/main.js (or main.mjs)
    import * as ort from 'onnxruntime-web';
    ```
- Support popular bundlers when importing onnxruntime-web into a CJS/ESM project.
  - webpack (esm requires extra post-process step)
  - rollup
  - parcel (esm requires extra post-process step)
  - More bundlers **TBD**
- Multi-threading support for Node.js

NOTE: keeping single JavaScript file (the all-in-one bundle) is no
longer a goal. This is because technically there is a conflict with the
other requirements.
</details>

<details>
<summary><h4>Important Design Decisions</h4></summary>

- Drop support of single JavaScript output.
  - The current onnxruntime-web distribution uses a single JavaScript file to include all code. While there are a few benefits, it also creates the problems mentioned above. Since ESM is being used more and more widely, and browsers are imposing stricter security checks and requirements, the old Blob based solution is going to be replaced.
  - To meet the requirements, specifically CSP environment support, we have to offer a non-Blob based solution. Therefore, we have to distribute multiple files and drop the single file solution.

- Do not run parser/postprocess on Emscripten generated JavaScript.
  - Emscripten is evolving quickly, so we should depend only on what is in its documentation rather than on particular implementation details. (For example, currently we patch its code to deal with a special variable `_scriptDir`.)
  - Keeping the generated files as-is also helps to:
    - reduce the size of ort.min.js
    - make it easier to replace build artifacts when in development/debug

- Drop support for non-SIMD and non-MultiThread. This helps to reduce the number of artifacts in distribution.
  - (fixed-sized) SIMD is supported in any mainstream JS environment.
  - Multi-threading as a WebAssembly feature is supported in any mainstream JS environment. In some environments the feature is guarded by a cross-origin policy, but it can still work as long as no worker is created.

- Use ESM output for Emscripten generated JavaScript.
  - There are 2 ways to dynamically import classic (UMD) modules and neither of them is recommended:
    - dynamically creating a `<script>` tag. This changes the HTML structure and has quite a lot of compatibility issues.
    - using `fetch()` and `eval()`. However, `eval` is strongly discouraged because of its significant performance hit.
  - Importing ESM is easy: just use the `import()` call. Considering ESM is widely supported in modern browsers and Node.js, this is the better option.

- Add Blob based solution as a fallback for cross-origin workers.
  - There are still many use cases that import onnxruntime-web from a CDN. In this usage, make it possible to create a worker by using `fetch()`+`Blob` to create a same-origin Blob URL.

</details>

<details>
<summary><h4>Distribution File Manifest</h4></summary>

The distribution folder contains the following files:

- WebAssembly artifacts. These files are the result of compiling the
ONNX Runtime C++ code to WebAssembly by Emscripten.

  | File Name | Build Flags |
  |------|-----|
  | ort-wasm-simd-threaded.mjs <br/> ort-wasm-simd-threaded.wasm | `--enable_wasm_simd` <br/> `--enable_wasm_threads` |
  | ort-training-wasm-simd-threaded.mjs <br/> ort-training-wasm-simd-threaded.wasm | `--enable_training_apis` <br/> `--enable_wasm_simd` <br/> `--enable_wasm_threads` |
  | ort-wasm-simd-threaded.jsep.mjs <br/> ort-wasm-simd-threaded.jsep.wasm | `--enable_wasm_simd` <br/> `--enable_wasm_threads` <br/> `--use_jsep` <br/> `--use_webnn` |

- onnxruntime-web JavaScript artifacts. These files are generated by
ESBuild as the entry point for onnxruntime-web.

  There are multiple build targets for different use cases:
  | Target Name | Path for "import" or "require" | Description |
  |------|-----|-----|
  | `ort` | `onnxruntime-web` | The default target. |
  | `ort.all` | `onnxruntime-web/all` | The target including webgl. |
  | `ort.node` | `onnxruntime-web` | The default target for Node.js. |
  | `ort.training` | `onnxruntime-web/training` | The target including training APIs |
  | `ort.wasm` | `onnxruntime-web/wasm` | The target including only WebAssembly (CPU) EP |
  | `ort.webgl` | `onnxruntime-web/webgl` | The target including only WebGL EP |


  For each target, there are multiple files generated:
  | File Name | Description |
  |------|-----|
  | [target].js | The entry point for the target. IIFE and CommonJS format. |
  | [target].mjs | The entry point for the target. ESM format. |
  | [target].min.js <br/> [target].min.js.map | The entry point for the target. Minimized with sourcemap. IIFE and CommonJS format. |
  | [target].min.mjs <br/> [target].min.mjs.map | The entry point for the target. Minimized with sourcemap. ESM format. |
  | [target].proxy.mjs | (if applicable) The proxy ESM module for the target. |
  | [target].proxy.min.mjs <br/> [target].proxy.min.mjs.map | (if applicable) The proxy ESM module for the target. Minimized with sourcemap. |
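
As a rough illustration of the naming scheme in the table above, the sketch below expands a target name into its generated file names. `filesForTarget` is a hypothetical helper, and whether a given target ships proxy files is passed in by the caller as an assumption, since the table only says "if applicable".

```javascript
// Builds the list of generated file names for one build target,
// following the "[target].*" patterns in the table above.
function filesForTarget(target, hasProxy) {
  const files = [
    `${target}.js`,           // IIFE and CommonJS format
    `${target}.mjs`,          // ESM format
    `${target}.min.js`,       // minimized, IIFE and CommonJS format
    `${target}.min.js.map`,
    `${target}.min.mjs`,      // minimized, ESM format
    `${target}.min.mjs.map`,
  ];
  if (hasProxy) {
    // Proxy ESM module, only where applicable (caller-supplied assumption).
    files.push(
      `${target}.proxy.mjs`,
      `${target}.proxy.min.mjs`,
      `${target}.proxy.min.mjs.map`
    );
  }
  return files;
}

console.log(filesForTarget('ort.webgl', false));
console.log(filesForTarget('ort', true));
```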

</details>

<details>
<summary><h4>Dynamic Import Explained</h4></summary>

- Local Served | No Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + import()--> [ort-wasm-simd-threaded.mjs]
                    |
                    + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                    |
                    + new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
                                        |
                                        + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```
- Local Served | Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + import()--> [ort.proxy.min.mjs]
                    |
                    + new Worker()--> [ort.proxy.min.mjs (worker)]
                                        |
                                        + import()--> [ort-wasm-simd-threaded.mjs]
                                                        |
                                                        + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                                                        |
                                                        + new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
                                                                            |
                                                                            + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```
- Cross Origin | No Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + fetch('ort-wasm-simd-threaded.mjs')
        |
        + URL.createObjectURL(res.blob())
        |
        + import()--> [blob:... (ort-wasm-simd-threaded)]
                        |
                        + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                        |
                        + new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
                                            |
                                            + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```

- Cross Origin | Proxy
  ```
  [Bundle or ort.min.js]
    |
    + fetch('ort.proxy.min.mjs')
        |
        + URL.createObjectURL(res.blob())
        |
        + import()--> [blob:... (ort.proxy)]
                        |
                        + new Worker()--> [blob:... (ort.proxy) (worker)]
                                            |
                                            + fetch('ort-wasm-simd-threaded.mjs')
                                                |
                                                + URL.createObjectURL(res.blob())
                                                |
                                                + import()--> [blob:... (ort-wasm-simd-threaded)]
                                                                |
                                                                + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                                                                |
                                                                + new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
                                                                                    |
                                                                                    + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```
</details>
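
The difference between the "Local Served" and "Cross Origin" flows above comes down to an origin check. The sketch below shows that decision only, under stated assumptions: `planModuleLoad` is a hypothetical function, the origins are placeholders, and the real loader in onnxruntime-web may make the decision differently. It returns the step names instead of performing them, so it can run outside a browser.

```javascript
// Returns the sequence of steps used to load a module script, depending on
// whether the script is served from the same origin as the page.
function planModuleLoad(scriptUrl, pageOrigin) {
  // Relative URLs resolve against the page origin and are same-origin.
  const sameOrigin = new URL(scriptUrl, pageOrigin).origin === pageOrigin;
  if (sameOrigin) {
    // Local served: import the .mjs file directly.
    return ['import()'];
  }
  // Cross origin: fetch the script, wrap it in a same-origin Blob URL,
  // then import that Blob URL.
  return ['fetch()', 'URL.createObjectURL(blob)', 'import(blob URL)'];
}

// Placeholder origins, for illustration only.
console.log(planModuleLoad('/dist/ort-wasm-simd-threaded.mjs', 'https://app.example'));
console.log(planModuleLoad('https://cdn.example/dist/ort-wasm-simd-threaded.mjs', 'https://app.example'));
```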
2024-05-20 09:51:16 -07:00
Edward Chen
e81c8676e3
MatMulNBits + Add fusion (#20587)
- Add MatMulNBits Bias input
- Add graph transformer to fuse MatMulNBits + Add
2024-05-16 11:00:59 -07:00
Yifan Li
47a178b518
[EP Perf] Fix on EP Perf (#20683)
### Description
* Partially revert [previous
change](https://github.com/microsoft/onnxruntime/pull/19804), and
   * Redo concurrency_test_result parser outside of post.py
* Add support of syncing memtest result to db


### Motivation and Context
To fix the error when CI is running on two model groups.
- When running on two model groups, the [previous
change](https://github.com/microsoft/onnxruntime/pull/19804) wrongly
navigates two levels up in the directory after running one model group,
while one level is needed. After that, the script can't find another
model group.
- Running on one model group can't repro the issue
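
The directory bug described above can be illustrated with a small path sketch. The layout (`/perf/models/<group>`), the group names, and the `up` helper are all hypothetical. The actual script and paths are not shown in this commit message.

```javascript
// Returns the path reached by going `levels` directories up from `dir`.
function up(dir, levels) {
  const parts = dir.split('/').filter(Boolean);
  return '/' + parts.slice(0, Math.max(0, parts.length - levels)).join('/');
}

// Hypothetical layout: each model group lives under /perf/models/<group>.
const afterFirstGroup = '/perf/models/group_a';

// Going two levels up lands above the folder that holds the groups,
// so the second group cannot be found from there.
const buggy = up(afterFirstGroup, 2);
// Going one level up returns to the folder that holds the groups.
const fixed = up(afterFirstGroup, 1);

console.log(buggy + '/group_b');
console.log(fixed + '/group_b');
```

With a single model group the script never needs to find a second group, which matches the note that the issue does not reproduce in that case.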
2024-05-15 21:38:52 -07:00
Jian Chen
d1e66f0446
Increase NPM ComponentDetection.Timeout: 1200 (#20681)
2024-05-15 13:41:59 -07:00
Jian Chen
87ed1e3e3f
Component governance fix round 2 (#20679) 2024-05-14 17:15:15 -07:00
Edward Chen
113aa2992f
Update React Native CI (#20673)
- Move iOS package build to separate job so it can run in parallel with Android AAR build and be decoupled from the test stage. The test stage fails sometimes (not infrequently) and may need to be re-run.
- Update stop iOS simulator step so it doesn't fail if the start step doesn't run.
2024-05-14 14:10:56 -07:00
Jian Chen
83a871f890
Fix critical and High issues from Component Governance (#20611)
2024-05-14 09:17:23 -07:00
Hector Li
0e11d0c4f8
Enable Qnn nuget nightly (#20662)
### Description
Enable Qnn nuget nightly
2024-05-13 21:28:43 -07:00
Yi Zhang
c131ea89e1
Nuget Publish pipelines should be triggered by rel-* automatically too. (#20652)
### Description
And
Set allowPackageConflicts = True
`#allowPackageConflicts: false # boolean. Optional. Use when command =
push && nuGetFeedType = internal. Allow duplicates to be skipped.
Default: false.`

https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/nuget-command-v2?view=azure-pipelines

If publishing partially fails, we don't need to rerun the whole package
generation workflow.
2024-05-13 13:18:16 -07:00
Edward Chen
90d49ccb9a
Allow path pattern to be specified in package_release_tasks.py. (#20650)
Do more in the Python helper script so the Bash code in the release definition can be simplified.
2024-05-13 09:16:04 -07:00
Jian Chen
4fe565a62a
Java CUDA 12 support (#20583)
### Description

- This PR combines all CUDA 12 stages into the Zip-nuget-... pipeline.
- It also enables CUDA 12 support.



2024-05-10 14:16:22 -07:00
George Wu
a0c4bd4da7
[qnn ep] sign onnxruntime.dll/pyd for qnn packages (#20634)
sign only onnxruntime.dll and onnxruntime_pybind11_state.pyd in
packages.
2024-05-09 20:45:44 -07:00
Yi Zhang
5a18818e1d
Migrate training storage from SAS to managed identity (#20618)
### Description
orttrainingtestdatascus only stores the MNIST data, which is only 64 MB,
in Azure Files.
To meet security requirements and reduce maintenance cost, move the test
data to lotusscus and store it in Azure Blob storage.
2024-05-09 15:44:29 -07:00
Jian Chen
d1cbb3e076
The time for nuget pkg should be consistent (#20522)
This pull request primarily involves changes to the build scripts in the
`tools/ci_build/github/azure-pipelines` directory. The changes add build
date and time information to the build process. This is achieved by
introducing two new parameters, `BuildDate` and `BuildTime`, and
incorporating them into the `msbuildArguments` in multiple locations.
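
The idea of taking the date and time once from the pipeline's start time, and reusing them everywhere, can be sketched as follows. The `yyyyMMdd`/`HHmm` formats, the `buildStamp` helper, the `Flavor` property, and the sample timestamp are assumptions for illustration. Only the parameter names appear in this PR description.

```javascript
// Derives date and time strings once from a single start timestamp so that
// every package built in the same run carries the same values.
// The yyyyMMdd / HHmm formats are assumptions, not taken from the pipeline.
function buildStamp(startTime) {
  const iso = startTime.toISOString(); // e.g. 2024-05-09T18:35:45.000Z
  return {
    BuildDate: iso.slice(0, 10).replace(/-/g, ''),
    BuildTime: iso.slice(11, 16).replace(':', ''),
  };
}

// Sample start time (UTC), for illustration only.
const stamp = buildStamp(new Date(Date.UTC(2024, 4, 9, 18, 35, 45)));

// Reuse the same stamp for every msbuild invocation instead of reading the
// clock at each step, which could straddle a minute or day boundary.
// `Flavor` is a hypothetical property used to show two separate invocations.
const msbuildArgs = ['cpu', 'gpu'].map(
  (flavor) =>
    `/p:Flavor=${flavor} /p:CurrentDate=${stamp.BuildDate} /p:CurrentTime=${stamp.BuildTime}`
);

console.log(msbuildArgs.join('\n'));
```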

Addition of new parameters:

* `tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`:
Added `BuildDate` and `BuildTime` parameters using the pipeline's start
time.

Incorporation of new parameters in `msbuildArguments`:

* `tools/ci_build/github/azure-pipelines/c-api-noopenmp-packaging-pipelines.yml`:
Added `CurrentDate` and `CurrentTime` parameters to `msbuildArguments`
in multiple locations.
* `tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`:
Incorporated the `CurrentDate` and `CurrentTime` parameters into
`msbuildArguments`.
2024-05-09 11:35:45 -07:00
Edward Chen
a0db2187ee
Update CocoaPods package release script. (#20608)
- Update method for uploading to Azure storage to use managed identity.
- Allow helper script tasks to be split across different calls.
- Rewrite helper script in Python.

Motivation:
Recently the Azure storage account configuration was changed and now the old way of uploading to it no longer works.
2024-05-08 16:17:26 -07:00
Changming Sun
08b637350a
Remove an extra space in azure_scale_set_vm_mount_test_data.sh (#20584) 2024-05-08 09:46:50 -07:00
Scott McKay
8d09baf49f
Clarify when protobuf dependency builds protoc (#20542)
### Description
<!-- Describe your changes. -->
Currently, figuring out whether the protobuf dependency is building protoc
is a little obtuse and inconsistent:
* in some places we directly set protobuf_BUILD_PROTOC_BINARIES to OFF
to indicate the protobuf dependency is not building protoc
  * e.g. macOS/iOS/visionOS builds
* for a user provided protoc path we don't set
protobuf_BUILD_PROTOC_BINARIES, and inside protobuf_function.cmake that
determines if `protobuf::protoc` is added as a dependency or not
*
0dda8b0c44/cmake/external/protobuf_function.cmake (L40-L45)

To be more consistent/explicit, set protobuf_BUILD_PROTOC_BINARIES to
OFF when ONNX_CUSTOM_PROTOC_EXECUTABLE set and valid.

Remove outdated script that built an external protoc binary which was
used in later builds. The build setup will fetch a pre-built protoc so
there's no need for this additional build.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Make it easier to figure out if protoc is coming from the protobuf
dependency.
2024-05-08 08:30:11 +10:00
aciddelgado
4e27841bdb
fix gqa cpu nan bug (#20521)
### Description
There was a bug with GQA on CPU where, in the token case with batch_size >
1 and past_present_share_buffer off, the output would occasionally
contain NaNs. This PR fixes that. It also updates documentation and
fixes position ID generation for rotary in CUDA in the prompt case.



### Motivation and Context
This PR fixes the GQA CPU bug, updates the documentation, and
makes seqlens_k irrelevant for the prompt case, which helps prevent
user error.
2024-05-07 15:19:26 -07:00
Adrian Lizarraga
0dda8b0c44
[QNN EP] Update QNN SDK to 2.21 (#20534)
### Description
- Updates QNN pipelines to use QNN SDK 2.21
- Downloads QNN SDK from Azure storage to avoid having to rebuild images
when a new version is released.


### Motivation and Context
Test with the latest QNN SDK.
2024-05-01 20:17:35 -07:00
Scott McKay
f9febc4f35
Remove usage of 'required reason' iOS API from protobuf (#20529)
### Description
<!-- Describe your changes. -->

Using certain APIs is about to require a [privacy
manifest](https://developer.apple.com/documentation/bundleresources/privacy_manifest_files/describing_use_of_required_reason_api)
to be added to a package.

Our version of protobuf uses `mach_absolute_time`. Patch as per
https://github.com/protocolbuffers/protobuf/pull/15662/ to remove usage.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Usage of API will require a privacy manifest for an iOS app to be
accepted as of 5/1/2024
#20519
2024-05-02 08:21:08 +10:00
Yifan Li
29417762f7
[TensorRT EP] support TensorRT 10-GA (#20506)
### Description
<!-- Describe your changes. -->
This branch is based on rel-1.18.0 and supports TensorRT 10-GA.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-05-01 11:10:53 -07:00
Hector Li
755aaea9a6
Qnn nuget update (#20527)
### Description
Update Qnn nuget package to include Qnn libs and license file
2024-04-30 22:12:53 -07:00
Yi Zhang
91baeb8495
Reduce downloads to NodeJS to mitigate random connection exception. (#20518)
### Description
There was a connection exception during the docker build in the packaging pipeline:
```
48.26 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
456.0 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=453140&view=logs&j=f9f5b320-fa10-56c4-debe-61ea69c74793&t=1656e225-defa-5b12-8935-2a0a93e76a67&s=3c85d903-a183-5028-775e-d63999fcc9ae

In fact, the docker image shouldn't have been rebuilt this time.

After checking the code: the docker image tag in Linux_C_API_Packaging_GPU_x64,
onnxruntimecuda${{ variables.CUDA_VERSION_MAJOR }}build, was the same as
the image tag of Linux-gpu-ci-pipeline, but their docker files are
different.

So this changes the Linux GPU pipeline's image tag to keep the packaging
pipeline's docker image from being overwritten unexpectedly.
2024-05-01 09:04:56 +08:00
Rachel Guo
8c31f27dd1
Catalyst nuget package .NET changes only (#20424)
### Description
<!-- Describe your changes. -->

https://github.com/microsoft/onnxruntime/pull/20418

Add back Catalyst changes only for now.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2024-04-29 15:39:48 -07:00
Scott McKay
923b0ef323
Run fuzz testing before the CG task cleans up the build directory (#20500)
### Description
<!-- Describe your changes. -->
Update order of steps


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix CI
2024-04-29 16:02:53 +10:00
Rachel Guo
ff505b9f44
Follow up fix for #20472 (#20484)
### Description
<!-- Describe your changes. -->

Error: 

**Artifact name input: e2e_test_logs_1364625_$(Date:yyyyMMddHHmmss)
##[error]Artifact name is not valid:
e2e_test_logs_1364625_$(Date:yyyyMMddHHmmss). It cannot contain '\', /',
"', ':', '<', '>', '|', '*', and '?'**

Date not correctly showing up in the artifact name. Use predefined
pipeline variable BuildNumber instead which also serves similarly as a
timestamp.
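
The failure can be reproduced with a small check. This is only a sketch: the character set is copied from the error message above, while the helper name `invalid_chars` and the second sample name are made up for illustration.

```python
# Characters rejected in artifact names, per the error message above.
FORBIDDEN = set('\\/"' + ":<>|*?")


def invalid_chars(artifact_name: str) -> list[str]:
    """Return the forbidden characters found in the name, in sorted order."""
    return sorted(set(artifact_name) & FORBIDDEN)


# The unexpanded macro still contains ':' so the name is rejected.
print(invalid_chars("e2e_test_logs_1364625_$(Date:yyyyMMddHHmmss)"))
# A plain build-number style suffix (hypothetical value) contains none of them.
print(invalid_chars("e2e_test_logs_1364625_20240427.3"))
```

Because the `$(Date:...)` macro was not expanded in this context, its literal `:` ended up in the artifact name.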

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

RN CI failure

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2024-04-27 13:42:24 +10:00
Rachel Guo
88904b9220
Add unique identifier to e2e_test_logs artifacts in react-native-ci.yml (#20472)
### Description
<!-- Describe your changes. -->

As title.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-26 22:20:10 +10:00
Scott McKay
aa27dadd1c
Use download.onnxruntime.ai in podspec (#20474)
### Description
<!-- Describe your changes. -->
Update to more generic url


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-26 20:28:54 +10:00
Yi Zhang
464f199b95
Extend mac package jobs time out limit (#20459) 2024-04-25 10:13:13 -07:00
Yi Zhang
e5947f5729
Two improvements in pipelines (#20449)
### Description
1. Update the image name so that the docker image isn't overwritten.
There was a mistake: variables.CUDA_VERSION_MAJOR is always empty

14fcf0a52d/tools/ci_build/github/azure-pipelines/stages/nuget-linux-cuda-packaging-stage.yml (L120)
2. Set one artifact name as a variable to make the job rerunnable



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-25 10:15:40 +08:00
Scott McKay
a46bab6364
Update podspec url to use AFD hostname (#20452)
Update to use AFD url when generating podspec
2024-04-24 09:37:24 -07:00
Rachel Guo
14fcf0a52d
Support visionos build (#20365)
### Description
<!-- Describe your changes. -->

This PR supports a build of onnxruntime.xcframework for xros/xrsimulator
for visionos via the build command of

`python3 tools/ci_build/github/apple/build_apple_framework.py --config
Release/Debug
tools/ci_build/github/apple/default_vision_os_framework_build_settings.json`.

Officially including visionos in the ios cocoapods package and testing it in
CI would require separate work: upgrading the Xcode version and
upgrading the macOS CI agent to macos-13-arm64 or higher.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

visionos support:
https://github.com/microsoft/onnxruntime/discussions/19313

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2024-04-23 18:15:07 -07:00
Yulong Wang
5055dc0aa8
[js/web] add diagnose log for chrome (#20439)
### Description

Add logs to further diagnose the pipeline issue.
2024-04-23 17:18:54 -07:00
Edward Chen
76461c8f4d
Increase timeout for iOS packaging pipeline jobs. (#20434) 2024-04-23 11:55:55 -07:00
Yi Zhang
7ebc653f04
Revert "Nuget .NET changes for Mac Catalyst (#19923)" (#20418)
This reverts commit f396748ed6.

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-23 15:08:12 +08:00
Adrian Lizarraga
e6a677f6b7
[QNN EP] Download QNN SDK from azure blob in packaging pipelines (#20359)
### Description
- Updates Windows QNN Nuget and Python packaging pipelines to download
QNN SDK from blob storage.
- Makes the QNN SDK version configurable when launching the python
packaging pipeline.



### Motivation and Context
Removes the need to rebuild images to update QNN SDK. Only applies to
Windows pipelines. Linux pipelines still get the SDK from disk.
2024-04-22 22:32:55 -07:00
Yi Zhang
197b3f1d90
Enable Whisper Test with OMP_FFMPEG (#20402)
### Description
Install OMP_FFMPEG in the docker image and re-add the Whisper test.
Download OMP_FFMPEG from a restricted-access Azure blob.
2024-04-22 10:55:56 -07:00
Yulong Wang
a457c1df80
upgrade emsdk to 3.1.57 (#20295)
### Description
upgrade emsdk to 3.1.57
2024-04-19 23:05:18 -07:00
Rachel Guo
f396748ed6
Nuget .NET changes for Mac Catalyst (#19923)
### Description
<!-- Describe your changes. -->

Add Nuget package changes for adding new 'net6.0-maccatalyst' platform.

The output ORT Nuget package was manually tested and verified in a .NET
MAUI app setup.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2024-04-19 14:20:03 -07:00
sfatimar
4d1963c2a2
OpenVINO EP Rel 1.18 Changes (#20337)
### Description
These changes include
Support to OpenVINO 2024.1 
Import PreCompiled Blobs with EPContext Blob 
Separate Device/Precision as input
Deprecate CPU_FP32 , GPU_FP32 terminology , introduce CPU, GPU 
AUTO GPU, CPU will only create GPU Blob and not CPU Blob. 



### Motivation and Context
- OpenVINO 2024.1 will be out soon
- Import Precompiled Blob can greatly reduce FEIL/FIL Time. 
- Separating Device/Precision will make the input cleaner
-

---------

Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
2024-04-19 00:31:38 -07:00
Yulong Wang
3577a4bd02
[Node.js binding] Allow installation to download CUDA binaries via script (#20364)
### Description
Currently we try to include all prebuilt binaries into the NPM packages.
This was working until we added libonnxruntime_providers_cuda.so
(>400MB) into the NPM package. The NPM registry refuses to accept new
package publication because the file is too large.

To make the new NPM package work, we have to remove the large file
from the package, and add a new script on package installation. This
script will try to dynamically install onnxruntime CUDA dynamic library
for Linux/x64.
2024-04-18 13:44:42 -07:00
Yi Zhang
4d2b98155f
More fixes for random connection exceptions in Mac Build. (#20328)
### Description
Supplement to #20322



### Motivation and Context
Fixes random connection exceptions in 
Mac build in Python Packaging Pipeline

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443617&view=logs&j=5849a411-e258-5ce5-39bd-7b65d44961a0&t=ccb871c8-76d9-5e80-55b0-4279efd5567f
and IOS full xcframework

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443458&view=logs&j=370fd1a2-3dec-5916-4d2c-8aae58c72d28&t=686352ba-ee61-5ad4-8739-e8abd07372a4&s=e9aa87c8-a9ad-51f7-3b12-045ecc319776
2024-04-17 08:37:56 +08:00
Yi Zhang
caf692e626
[Fix] Random connection exceptions in MacOS_C_API_Packaging_CPU stage (#20322)
### Description
Add download_deps to reduce downloading from 3rd party websites.


### Motivation and Context
Fix frequent random exception like
```
CMake Error at abseil_cpp-subbuild/abseil_cpp-populate-prefix/src/abseil_cpp-populate-stamp/download-abseil_cpp-populate.cmake:162 (message):
  Each download failed!

    error: downloading 'https://github.com/abseil/abseil-cpp/archive/refs/tags/20240116.0.zip' failed
          status_code: 35
          status_string: "SSL connect error"
          log:
          --- LOG BEGIN ---
            Trying 20.29.134.23:443...

  Connected to github.com (20.29.134.23) port 443

  ALPN: curl offers h2,http/1.1

  (304) (OUT), TLS handshake, Client hello (1):

  [315 bytes data]

   CAfile: /etc/ssl/cert.pem
   CApath: none

  Recv failure: Operation timed out

  LibreSSL/3.3.6: error:02FFF03C:system library:func(4095):Operation timed
  out

  Closing connection
```

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443278&view=logs&j=006a7a04-d43b-5fe1-df02-ecafb79c4d6e&t=110edd38-9f3b-50cf-b328-8ed0f915e5c1

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-04-16 13:28:18 +08:00
Edward Chen
287ecea2f1
Fix binary size check build publish step. (#20298)
Add `--user` option to pip install command.

Error:
```
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/usr/local/bin/f2py'
Consider using the `--user` option or check the permissions.
```

See #19877.
2024-04-15 10:15:42 -07:00
liqun Fu
cd7112f800
Integration with ONNX 1.16.0 (#19745)
### Description
update with ONNX 1.16.0 branch according to
https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md

ONNX 1.16.0 release notes:
https://github.com/onnx/onnx/releases/tag/v1.16.0

#### Updated ops for CPU EP:
- DequantizeLinear(21)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block dequantization support
- QuantizeLinear(21)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block quantization support
- Cast(21)
  - Missing int4 and uint4 support
- CastLike(21)
  - Missing int4 and uint4 support
- ConstantOfShape(21)
  - Missing int4 and uint4 support
- Identity(21)
  - Missing int4 and uint4 support
- If(21)
  - Missing int4 and uint4 support
- Loop(21)
  - Missing int4 and uint4 support
- Reshape(21)
  - Missing int4 and uint4 support
- Scan(21)
  - Missing int4 and uint4 support
- Shape(21)
  - Missing int4 and uint4 support
- Size(21)
  - Missing int4 and uint4 support
- Flatten(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support
- Pad(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support
- Squeeze(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support
- Transpose(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support
- Unsqueeze(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support

#### Unimplemented opset 21 features/ops
- int4 and uint4 data type
- QLinearMatMul(21)
- GroupNormalization(21)
- ai.onnx.ml.TreeEnsemble(5)

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

### Disabled tests
#### ORT Training

orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py
- test_ort_custom_ops: Potential shape inference bug for custom ops

#### Python quantization unit tests
test/onnx/python/quantization (shape inference bug)
- test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16
- test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16
- test_op_gemm.py: test_quantize_qop_gemm_s8s8
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same
 - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3
- test_op_matmul.py: test_quantize_matmul_u8u8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution
- test_op_relu.py: test_quantize_qop_relu_s8s8

#### ONNX tests
- test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a
maxpool output size bug and added this test. Enable this test when [ORT
PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged.
Refer to original [ONNX PR](https://github.com/onnx/onnx/pull/5741).
- test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op
ai.onnx.ml.TreeEnsemble
- test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same
- test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same
- test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same
- test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4
yet
- test_cast_INT4_to_INT8_cpu: same
- test_cast_UINT4_to_FLOAT_cpu: same
- test_cast_UINT4_to_UINT8_cpu: same
- test_cast_INT4_to_FLOAT_cuda
- test_cast_INT4_to_INT8_cuda
- test_cast_UINT4_to_FLOAT_cuda
- test_cast_UINT4_to_UINT8_cuda
- test_constantofshape_float_ones_cuda: ConstantOfShape(21) not
implemented for cuda
- test_constantofshape_int_shape_zero_cuda: same
- test_constantofshape_int_zeros_cuda: same
- test_flatten_axis0_cuda: Flatten(21) not implemented for cuda
- test_flatten_axis1_cuda: same
- test_flatten_axis2_cuda: same
- test_flatten_axis3_cuda: same
- test_flatten_default_axis_cuda: same
- test_flatten_negative_axis1_cuda: same
- test_flatten_negative_axis2_cuda: same
- test_flatten_negative_axis3_cuda: same
- test_flatten_negative_axis4_cuda: same
- test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not
implemented in ORT yet
- test_qlinearmatmul_2D_int8_float32_cpu: same
- test_qlinearmatmul_2D_uint8_float16_cpu: same
- test_qlinearmatmul_2D_uint8_float32_cpu: same
- test_qlinearmatmul_3D_int8_float16_cpu: same
- test_qlinearmatmul_3D_int8_float32_cpu: same
- test_qlinearmatmul_3D_uint8_float16_cpu: same
- test_qlinearmatmul_3D_uint8_float32_cpu: same
- test_qlinearmatmul_2D_int8_float16_cuda: same
- test_qlinearmatmul_2D_int8_float32_cuda: same
- test_qlinearmatmul_2D_uint8_float16_cuda: same
- test_qlinearmatmul_2D_uint8_float32_cuda: same
- test_qlinearmatmul_3D_int8_float16_cuda: same
- test_qlinearmatmul_3D_int8_float32_cuda: same
- test_qlinearmatmul_3D_uint8_float16_cuda: same
- test_qlinearmatmul_3D_uint8_float32_cuda: same
- test_size_cuda: Size(21) not implemented for cuda
- test_size_example_cuda: same
- test_dequantizelinear_blocked: Missing implementation for block
dequant for DequantizeLinear(21)
- test_quantizelinear_blocked_asymmetric: Missing implementation for
block quant for QuantizeLinear(21)
- test_quantizelinear_blocked_symmetric: Missing implementation for
block quant for QuantizeLinear(21)

---------

Signed-off-by: liqunfu <liqun.fu@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>
2024-04-12 09:46:49 -07:00
Yifan Li
9577fe454d
[EP Perf] Customize onnx-tensorrt commit id when init CI tasks (#20175)
### Description
<!-- Describe your changes. -->
Customize commit id of onnx-tensorrt in EP Perf CI variables when
testing OSS parsers in different versions

### To Verify

![image](https://github.com/microsoft/onnxruntime/assets/109183385/9dc650d8-377d-4223-8951-f0849b1fe984)

After assigning `onnxTensorrtCommitId` in EP Perf CI Variables, 
CI would prompt during the step of **[Build latest ORT Image with
TensorRT OSS
parser](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396450)**:
```
Updated deps.txt with new commit id a43ce67187bab219520fd80f21af8bbd4354bc8c and hash 572535aefef477050f86744dfab1fef840198035
```
And CI would [overwrite the line of onnx_tensorrt in
deps.txt](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396451)
which was assigned as:
```
onnx_tensorrt;a43ce67187.zip;572535aefef477050f86744dfab1fef840198035

```
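
The rewrite of that line can be sketched as below. This is an illustration under stated assumptions, not the CI script itself: the function name `update_deps_line` is hypothetical, the entry layout `name;archive;hash` is read off the line shown above, and the hash is assumed to be SHA-1 because it is 40 hex characters long.

```python
import hashlib


def update_deps_line(deps_text: str, name: str, archive: str, archive_bytes: bytes) -> str:
    """Replace the `name;archive;hash` entry for `name`, hashing the archive bytes."""
    # Assumption: the third field is the SHA-1 of the downloaded zip.
    sha1 = hashlib.sha1(archive_bytes).hexdigest()
    new_line = f"{name};{archive};{sha1}"
    lines = [
        # Only the entry whose first field matches `name` is rewritten.
        new_line if line.split(";", 1)[0] == name else line
        for line in deps_text.splitlines()
    ]
    return "\n".join(lines) + "\n"


# Sample input with placeholder entries; the bytes stand in for a real zip.
deps = "abseil_cpp;abseil.zip;aaa\nonnx_tensorrt;old.zip;bbb\n"
updated = update_deps_line(deps, "onnx_tensorrt", "a43ce67187.zip", b"example archive bytes")
print(updated)
```

Computing the hash from the downloaded bytes is what removes the manual step mentioned in the motivation below.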


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
To save time of modifying deps.txt and manually calculating zip hash
2024-04-10 09:46:05 -07:00
Yi Zhang
0acde1157a
Set parallel count to avoid OOM in training GPU packaging pipeline (#20255)
### Description
Make the compilation work on the Azure CPU agent by reducing the parallel
count.



### Motivation and Context
The OOM issue mentioned in #20244 was caused by the low
memory/parallel_count.
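
One way to reason about the memory/parallel_count trade-off is to cap the job count by available memory. The sketch below is an assumption-laden illustration: the helper `safe_parallel_jobs` and the per-job memory figures are hypothetical and are not values used by the pipeline.

```python
def safe_parallel_jobs(total_memory_gb: float, memory_per_job_gb: float, cpu_count: int) -> int:
    """Cap the parallel job count so that jobs * per-job memory fits in RAM."""
    # How many jobs fit in memory if each one peaks at `memory_per_job_gb`.
    by_memory = int(total_memory_gb // memory_per_job_gb)
    # Never exceed the CPU count, and always allow at least one job.
    return max(1, min(cpu_count, by_memory))


# Hypothetical figures: 64 cores, 256 GB RAM, 8 GB peak per compile job.
print(safe_parallel_jobs(256, 8, 64))
# A smaller machine with 4 cores and 28 GB RAM.
print(safe_parallel_jobs(28, 8, 4))
```

With many cores and memory-hungry CUDA translation units, the memory bound rather than the core count becomes the limiting factor.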
2024-04-10 14:05:53 +08:00
Yi Zhang
14d7872ce9
Reuse T4 for Cuda12.2 training packaging pipeline. (#20244)
### Description
The training CUDA 12.2 packaging pipeline
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary
has always run out of memory since PR #19910.
I tried other CPU agents, for example D64as_v5 (256G memory) and
D32as_v4 (128G memory and 256G SSD temp storage), which still run out of
memory, as in the image below.

![image](https://github.com/microsoft/onnxruntime/assets/16190118/5acde9ef-674f-4b6d-a1b3-b54647645083)


But it works on T4, though T4 only has 4 vCPUs, 28G memory and 180G temp
storage, and it takes much more time.

### Motivation and Context
Restore CUDA 12.2 training packaging pipeline first.
More time is needed to investigate the root cause


### Other Clues.
These 2 compilation steps take nearly 6 minutes with CUDA 12.2 on T4,
and the build runs out of memory on a CPU machine. @ajindal1 
cuda12.2 on T4
```
2024-03-14T05:39:08.7726865Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-14T05:45:01.3223393Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o

2024-03-14T05:46:07.9218003Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim96_fp16_sm80.cu.o
2024-03-14T05:52:59.2387051Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu.o

```

But they could be finished in about one minute with Cuda 11.8 on CPU
```
cuda11.8 on CPU
2024-04-09T11:34:35.0849836Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-04-09T11:35:53.6648154Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o

cuda11.8 on GPU
2024-03-13T12:16:33.4102477Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-13T12:19:58.8268272Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
```
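
The step durations quoted above can be checked from the log timestamps. This is a minimal sketch: `seconds_between` is a hypothetical helper, it ignores fractional seconds, and it treats the gap between two consecutive "Building" lines as the time spent on the first file, which is only an approximation for a parallel build.

```python
from datetime import datetime


def seconds_between(start_stamp: str, end_stamp: str) -> int:
    """Whole seconds between two log timestamps, ignoring fractional seconds."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    # Keep only `YYYY-MM-DDTHH:MM:SS`; the fractional part and `Z` are dropped.
    start = datetime.strptime(start_stamp[:19], fmt)
    end = datetime.strptime(end_stamp[:19], fmt)
    return int((end - start).total_seconds())


# CUDA 12.2 on T4: first pair of log lines above.
print(seconds_between("2024-03-14T05:39:08.7726865Z", "2024-03-14T05:45:01.3223393Z"))  # 353
# CUDA 11.8 on CPU.
print(seconds_between("2024-04-09T11:34:35.0849836Z", "2024-04-09T11:35:53.6648154Z"))  # 78
```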
2024-04-10 09:21:40 +08:00
Adrian Lizarraga
05d97e8d18
Update QNN python packages to use QNN SDK version 2.19.2 (#20213)
### Description
Update QNN python packages to use QNN SDK version 2.19.2.



### Motivation and Context
Our CI builds already use QNN SDK version 2.19.2. We should make sure
the ort-nightly-qnn python packages are also built with the same QNN SDK
version.
2024-04-05 17:15:25 -07:00
Yi Zhang
23a5d0a305
Extend time out in Windows GPU packaging jobs (#20207)
### Description
Extend Windows GPU Packaging job building time out to 6 hours, and test
stage to 3 hours.



### Motivation and Context
There are still a few timeout issues after refactoring. The probability
is about 20% in
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=84.
I found the build can finish in 4 hours when it becomes slow,
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=434340&view=logs&j=0c6ee496-b38e-55a9-3699-12934156e90f,
although in most cases it only takes about 30 minutes.
This is unlike before, when the build couldn't be completed at all.
So, in this PR, I extend the timeout to 6 hours.

One interesting thing: if one Windows GPU job becomes slow, all
other Windows GPU jobs in the same run become slow too.
So I suspect it has something to do with ADO or virtualization. That is,
it's not completely random.
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=841
2024-04-06 08:03:42 +08:00
Yi Zhang
4ea54b82f9
[Fix] Upload training CUDA daily wheel (#20183)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-03 13:18:26 +08:00
Yi Zhang
523ef04240
enable lto in Python-CUDA-Packaging Pipeline (#20164)
### Description
Except for the [Python-CUDA-Packaging
pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1299&_a=summary),
all Windows CUDA packaging jobs are now running well.
After comparison, enable_lto isn't set in that pipeline, which might be
one root cause of the random hang.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-01 15:42:28 +08:00
Jeff Bloomfield
2f31560430
Enable generic feature level devices in DML EP (#20114)
### Description
Enable NPUs supporting DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML and
D3D_FEATURE_LEVEL_1_0_GENERIC with DML EP. This also begins ingesting DX
headers through the DirectX-Headers repo.

Note that this includes an update to cgamanifest.json for onnx-tensorrt
which is triggered during re-generation due to a prior change to
deps.txt.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-03-29 14:37:30 -07:00
Adam Pocock
2f82400b13
[java] Java 21 build support (#19876)
### Description
Bump spotless and the Gradle wrapper to 6.25.0 and 8.6 respectively to
allow compiling ORT on Java 21. The build still targets Java 8.

I'm not sure if there will be CI changes necessary to use this PR,
specifically for the Gradle version as I don't know if that is cached
somewhere earlier in the CI build process.

The new Gradle version adds a warning that using `--source` and
`--target` to select the Java language version is obsolete which is
annoying, we can fix it if we decide to only allow building on newer
versions of Java, while still supporting running on Java 8.

### Motivation and Context
Java 21 is the latest LTS release of Java and ORT should be able to
build on it.
2024-03-28 15:51:22 -07:00
Yi Zhang
f7b52d2e3e
[Fix] Only copy java files when build_java is True (#20121)
### Description


### Motivation and Context
Fix error in Nuget-CUDA-Packaging-Pipeline
2024-03-28 14:06:28 -07:00
Yi Zhang
c5d7310f1b
Remove TSA upload in testing stage (#20115)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-03-28 13:15:03 +08:00
Yi Zhang
8f069f81c4
Split more windows GPU workflow into 2 stages, building and testing, to make them more stable (#20080)
### Description
Refactor win-ci.yml to solve the random hang issue in more GPU workflows,
and move the building of nuget-zip packages and Python CUDA 12 packages to a CPU
machine.

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-03-28 12:55:44 +08:00
Dmitri Smirnov
b95fd4e644
Enable CUDA EP unit testing on Windows (#20039)
### Description
Address build issues and source code discrepancies.
Fix cuda_test_provider gtest argument stack corruption.

### Motivation and Context
`OpTester` class that is widely used for kernel testing is not
suitable for testing internal classes for EPs that are built as shared
objects.
Currently, CUDA EP tests run only on Linux.
We want to enable testing and developments on Windows,
and create a usable pattern for testing of other EPs internals.

Alternatives considered: 
Abstracting EP unit tests into separate test executable such as
`onnxruntime_test_all`.
This alternative was rejected as it would create a lot more changes in
the established patterns,
and potentially interfere with CUDA functionality with more complex
source code maintenance.
2024-03-27 13:32:36 -07:00
Yi Zhang
ab2eaedfaa
Install ONNX by building source code in Windows DML stage (#20079)
### Description
In #20073, I pinned the onnx version to unblock the whole PR CI.
In fact, we could use the onnx that is installed by building from source,
so that the onnx version is controlled by deps.txt.
For historical reasons, the DML stage installed onnx from PyPI. Now
onnx can be installed as in the other stages.

Add an option to skip installing onnx in win-ci-prebuild-step.
2024-03-27 12:29:34 -07:00
Yi Zhang
4df9d16f98
[Fix] TSAUpload task must be in building stage (#20098)
### Description
In #20085, TSAUpload was in testing stage so main branch failed.
2024-03-27 12:20:57 -07:00
Yulong Wang
47903e701a
fix condition in web CI YAML (#20095)
### Description
fix condition in web CI YAML
2024-03-27 10:35:43 -07:00
Yi Zhang
0561b9576e
Fix and Refactor Python Packaging Pipeline (#20085)
### Description
Make Windows GPU Packaging stage in Python Packaging pipeline run on CPU
machine as well



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

### Test Link

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=430961&view=results
2024-03-27 12:17:22 +08:00
Yulong Wang
0313dd1f65
Update Web CI to use data dir under Agent.TempDirectory (#20074)
### Description
Update Web CI to use data dir under Agent.TempDirectory

This change fixes the random failure caused by unstable access to karma
temp directory (which is under AppData\Local\Temp) on CI pipeline
2024-03-26 13:16:59 -07:00
Baiju Meswani
40efbd6c37
Fix training and macos ci pipelines (#20034) 2024-03-26 12:20:11 -07:00
sfatimar
eab35c20fc
Ort openvino npu 1.17 master (#19966)
### Description
Add NPU to the list of supported devices.
Added changes for Support to OV 2024.0
Nuget packages removes packaging of OpenVINO DLL 
Bug Fixes with Python API 
Reverted Dockerfiles not being maintained. 



### Motivation and Context
NPU Device has been introduced by Intel in latest client systems
OpenVINO 2024.0 release is out.

---------

Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Ubuntu <ubuntu@ubuntu-118727.iind.intel.com>
Co-authored-by: hmamidix <hemax.sowjanya.mamidi@intel.com>
Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
2024-03-21 18:44:00 -07:00
Yi Zhang
cd6d3aea45
Refactor Python CUDA packaging pipeline to fix random hangs in building (#19989)
### Description
1. Move building to a CPU machine.
2. Optimize the pipeline
3. Since there is no official ONNX package for Python 3.12 yet, the
Python 3.12 test stage uses the package built from ONNX source in the
build stage.


### Motivation and Context
1. Resolve the random hang in compilation
2. Save a lot of GPU resources.

---------
2024-03-22 09:16:00 +08:00
Yi Zhang
30a0d80925
Fix exception in Publish unit test results step (#20007)
### Description
Test result files are all in the RelWithDebInfo\RelWithDebInfo directory.
It's not necessary to stat the _deps directory.

### Motivation and Context
Recently this exception has occurred many times in the zip-nuget pipeline.
`##[error]Error: Failed find: EPERM: operation not permitted, stat
'D:\a\_work\1\b\RelWithDebInfo\_deps\flatbuffers-src\java\src\test\java\DictionaryLookup'`

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=426981&view=logs&j=75fc0348-fe99-522b-3acb-90fd80ac5271&t=5d4ebcc1-bcde-574d-6f4e-8abd0f04ae4b
2024-03-22 06:53:59 +08:00
Yi Zhang
175f149b30
Remove downloading deps in CUDA package test stage (#19993)
### Description



### Motivation and Context
Downloading deps is not needed in the test stage.
Remove it to reduce random download errors.
2024-03-21 10:01:03 +08:00
Yufeng Li
15219e2e71
turn on neural_speed by default (#19627)
### Description
The crash caused by neural_speed turns out to be a rare corner case.
Turn it on by default.


### Motivation and Context
2024-03-20 12:49:58 -07:00
Rachel Guo
6b305f95e0
Support xcframework for mac catalyst builds. (#19534)
### Description



### Motivation and Context

MAUI on macOS uses mac-catalyst which requires a different native
binary.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-03-20 10:55:19 -07:00
Yi Zhang
8adbc09314
[Fix] Error Python Packaging Pipeline (Training CPU) (#19992)
### Description
fix the error caused by
https://github.com/microsoft/onnxruntime/pull/19973
2024-03-20 09:02:50 -07:00
mindest
3dfe4a5e6d
[ROCm] Remove MPI dependency and collectives to use NCCL (#19830)
### Description
* Remove MPI dependency to use NCCL AllReduce, etc.
* Exclude unsupported collectives in hipify
2024-03-19 17:35:18 -07:00
Hariharan Seshadri
cd6ec50b50
Switch a portion of CI/packaging jobs to MacOS12 (#19908) 2024-03-19 14:54:58 -07:00
Yi Zhang
d4c8bc359e
Fix Training CPU docker image name to avoid unnecessary rebuilding (#19973)
### Description
The docker image name was fixed, but the docker arguments differed
between jobs.
This triggered a rebuild of the docker image almost every time.
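
One way to avoid this kind of rebuild is to derive the image name from the build arguments, so that jobs with different arguments never share a name. A minimal Python sketch, assuming a hypothetical `image_name` helper and made-up argument names (neither is taken from the pipeline):

```python
import hashlib


def image_name(base: str, build_args: dict[str, str]) -> str:
    """Return an image name that changes whenever the build args change.

    Hypothetical helper for illustration; not part of the ORT pipelines.
    """
    # Sort the items so the same arguments always give the same digest,
    # regardless of the order in which they were supplied.
    canonical = "\n".join(f"{k}={v}" for k, v in sorted(build_args.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{base}-{digest}"


if __name__ == "__main__":
    # The argument names below are placeholders.
    print(image_name("training-cpu", {"PYTHON_VERSION": "3.9", "OPSET": "17"}))
```

With this scheme, two jobs reuse a cached image only when their arguments match exactly.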
2024-03-19 09:33:24 -07:00
Yulong Wang
b29849a287
[js/common] fix typedoc warnings (#19933)
### Description
Fix a few warnings in typedoc (for generating JS API):
```
[warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used.
[warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation.
[warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation.
[warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation.
[warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation.
[warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device.
```

Changes highlighted:
- Merge `CoreMlExecutionProviderOption` and
`CoreMLExecutionProviderOption`. They expose 2 different sets of options
for React Native and the ORT Node.js binding. This should be fixed in the future.
- Fix a few naming inconsistencies between JSDoc and parameters
- Fix broken type links
- Exclude trace functions
2024-03-15 19:01:50 -07:00
Yifan Li
0b2a75b274
[EP Perf] Add concurrency test (#19804)
### Description
* Add concurrency test to EP Perf CI panel (impl. by onnx_test_runner)
  * Model: FasterRCNN-10 model within CI image
  * `-c` param configurable via CI panel when kicking off CI tasks
  * Auto-replicate test input/outputs according to `-c` param
* By default, the model test will be executed in 100 iterations (~2min
added to T4 CI task load overall)

### Motivation and Context
To monitor potential concurrency issues of ORT-TRT
2024-03-15 07:41:21 -07:00
Justin Chu
bcf47d3546
Update install_deps_lort.sh to fix onnxscript installation (#19922)
Install onnxscript correctly with `pip install`. Dev dependencies are
not required.

### Motivation and Context

Fix build breaks.
2024-03-14 17:05:50 -07:00
Adam Louly
32558134a9
[On-Device-Training] Upgrade Flatbuffers to Support 2GB+ Checkpoints. (#19770)
### Description
Modifications to support 2GB+ checkpoint & Upgrading Flatbuffers


### Motivation and Context
This PR includes changes that will make ort handle 2GB+ checkpoints.
To do that we need to upgrade flatbuffers to 23.5.9 -
https://github.com/google/flatbuffers/pull/7945

- Modified the commitHash and the hash for the new version
- Removed the patch for rust generator's unused variable warning as it
is no longer producing this - [Check it out
here](d121e09d89/src/idl_gen_rust.cpp)
- Updated the VerifyField calls with alignment values that were
introduced in the new version.

---------

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2024-03-14 16:36:24 -07:00
Yi Zhang
87a9f77c56
Refactor Python Packaging Pipeline (Training Cuda 11.8) (#19910)
### Description
1. Use stages to organize the pipeline and split building and testing
2. Move compilation to a CPU machine
3. The test stage can leverage existing artifacts
4. Check the wheel size; a warning is given if the size is above 300M
5. The docker image name didn't change even when the argument changed,
which caused the docker image to always be rebuilt. Updating the docker
image name according to the argument saves docker build time.
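
The wheel size check in item 4 could look roughly like the following Python sketch. The helper name `wheel_size_warning` and the reading of "300M" as 300 MiB are assumptions; the actual pipeline step is not shown in this description.

```python
import os

# Assumption: "300M" is interpreted as 300 MiB.
SIZE_LIMIT_BYTES = 300 * 1024 * 1024


def wheel_size_warning(path, limit=SIZE_LIMIT_BYTES):
    """Return a warning message if the file at `path` is larger than `limit`.

    Returns None when the file is within the limit. Hypothetical helper,
    not the pipeline's own check.
    """
    size = os.path.getsize(path)
    if size > limit:
        name = os.path.basename(path)
        return f"{name} is {size} bytes, above the limit of {limit} bytes"
    return None
```

Returning a message instead of raising keeps the check non-fatal, which matches the "gives warning" behavior described above.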

Pipeline duration reduced by 60% (2 hours -> 50 minutes)
Compilation time reduced by 75% (1.5 hours -> 20 minutes)
GPU time reduced by 87% (8 hours -> 1 hour)
For debugging, the GPU time can be reduced by more than 95%, because we
can choose to run only one test stage and skip building.

### Motivation and Context
Make the pipeline efficient.
Optimized

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=424177&view=results
Current

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=422393&view=results

---------
2024-03-15 06:47:41 +08:00
Changming Sun
8b766bd24e
Change nuget pipeline's "Windows_Packaging_combined_GPU" job to download TRT binaries in every build (#19919)
### Description
Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download
TRT binaries in every build. Now all the other build jobs are already
doing this. This is the only one left.

Similar to #19909

### Motivation and Context

As a follow up of #19118
2024-03-14 15:07:56 -07:00
Changming Sun
ea4a5eea18
Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download TRT binaries in every build (#19909)
### Description
Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download
TRT binaries in every build. Now all the other build jobs are already
doing this. This is the only one left.


### Motivation and Context

As a follow up of #19118
2024-03-14 07:55:00 -07:00
Yulong Wang
e771a763c3
[js/test] align web test runner flags with ort.env (#19790)
### Description
The `npm test` flags are difficult to memorize because they differ
from the `ort.env` flags. This change aligns those flags with the ORT
JS API, e.g. `--wasm-enable-proxy` became `--wasm.proxy`.

Old flags are marked as deprecated, except `-x` (a shortcut for
`--wasm.numThreads`).
2024-03-13 12:00:36 -07:00
Yi Zhang
d5d9dbd51d
reuse T4 on Linux GPU (#19879)
### Description

### Motivation and Context
Linux GPU test on A10 isn't very stable
2024-03-13 10:41:36 -07:00
Hariharan Seshadri
ed306b4f97
Fix Android CI pipeline (#19877) 2024-03-13 10:09:43 -07:00
Justin Chu
faea42af95
Bump ruff to 0.3.2 and black to 24 (#19878)
### Motivation and Context

Routine updates
2024-03-13 10:00:32 -07:00
Yi Zhang
9e0a0f0f32
Check whether required tests are executed. (#19884)
### Description
Check that the onnx node tests and model tests were actually run.

### Motivation and Context
The onnx node test data and model data are mounted in one directory,
and onnxruntime_test_all searches that directory and loads the data.
If the directory does not exist, or something changes in
onnxruntime_test_all, those tests may not be executed.
For example, all the onnx node test data is 32M, so it is hard for us
to notice such a regression.
So I added a simple check to ensure those tests are executed.

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-03-13 09:59:57 -07:00
Yi Zhang
7313aa4efe
Remove --extra-index-url (#19885)
### Description



### Motivation and Context
--extra-index-url is not allowed by the Secure Supply Chain step that
is injected into packaging pipelines.
```
> Starting Multifeed Python Security Analysis:
##[warning]tools/ci_build/github/azure-pipelines/bigmodels-ci-pipeline.yml - Found "extra-index-url". (https://aka.ms/cfs/pypi)
```
And those 2 packages can be installed from PyPI as well now.

Co-authored-by: Yi Zhang <your@email.com>
2024-03-13 09:45:22 -07:00
Edward Chen
860eb762c2
[Apple framework] Fix minimal build with training enabled. (#19858)
Fix some linker errors that come up when integrating the onnxruntime-training-c pod into another Xcode project. The problematic configuration is a minimal build with training APIs enabled.
- training_op_defs.o had some unresolved references to ONNX functions. It should not be included at all in a minimal build.
- tree_ensemble_helper.o also had unresolved references to ONNX ParseData. The containing function is unused in a minimal build.

Added a test to cover this configuration.
2024-03-12 11:33:30 -07:00
Yi Zhang
d4fa4f0276
Remove FFmpeg to meet compliance (#19859) 2024-03-12 09:06:59 -07:00
Changming Sun
5479124834
Remove remaining Windows ARM32 build jobs (#19840)
### Description
As a follow up of #19788, remove more remaining Windows ARM32 build
jobs.


### Motivation and Context
Our nuget packaging pipeline is failing because it could not find an
artifact for Win ARM32.
```
##[error]Artifact onnxruntime-training-win-arm was not found for build 421397.
```

Deprecation of Win ARM32 was announced by Windows team in January 2023.
We should follow it.
2024-03-11 11:25:11 +08:00
Yifan Li
069d2d6f54
[EP Perf] Update EP Perf dockerfiles with cuda12/cudnn9 (#19781)
### Description
* Update name of existing dockerfiles and add support to test latest
TensorRT EA binary located in the image
* Add cuda 12.3/cuDNN 9/TensorRT 8.6 dockerfile
* Add detail to CI prompts and configs

Instruction to test latest TRT via BIN:
1. Select `BIN` in TensorRT Version
2. In Variables, update related tarCudaVersion, **clear**
tarCudnnVersion (not required in latest TRT tar binary) , and path to
binary.
2024-03-08 13:58:22 -08:00
Yifan Li
3170a48e60
[EP Perf] Add tag to indicate which TRT parser is using (#19784)
### Description
* Add a tag to distinguish whether the TRT `builtin` or `oss` parser is being used
* The `oss` tag includes the onnx-tensorrt commit id, to indicate
which version of the oss parser is used

### Validate
DB entry before/after this PR 
(during test, `builtin` or `oss_{commit_id}` tag was inserted in the
database entries):

### Motivation and Context
To distinguish perf results using builtin/oss parser in the database,
this parser tag is needed.
In future, results using different parsers will be listed in different
Perf Dashboard pages.
2024-03-08 10:24:36 -08:00
Ashwini Khade
e93a860819
Remove arm build for training (#19788)
We no longer support Win arm 32 so removing the associated build and
packaging job.
2024-03-05 21:54:48 -08:00
Scott McKay
db59cec82f
Don't reduce warning level for CUDA build on Windows (#19663)
### Description
Address warnings so all the ORT projects build with /W4 on Windows.

Mainly 
- unused parameters
- variables shadowing other ones

### Motivation and Context
#19588 started on this.
2024-03-06 15:03:55 +10:00
Yulong Wang
a788514027
[js/web] dump debug logs for karma for diagnose purpose (#19785)
### Description
Dump karma debug logs for diagnostic purposes.

This is for debugging the Chrome launch failure in CI and is
considered temporary.
2024-03-05 18:27:26 -08:00
Yi Zhang
9460597b21
Update copying API header files (#19736)
### Description
Make the Linux logic consistent with Windows


### Motivation and Context
onnxruntime_lite_custom_op.h is in the Windows zip package but not in
the Linux zip package

acbfc29f27/tools/ci_build/github/azure-pipelines/templates/c-api-artifacts-package-and-publish-steps-windows.yml (L67)

Co-authored-by: Your Name <your@email.com>
2024-03-02 11:33:47 +08:00
Edward Chen
5672cdebdf
Update google benchmark to 1.8.3. (#19734)
Update google benchmark to 1.8.3.
Update deps_update_and_upload.py script to make it easier to use.
2024-03-01 11:01:58 -08:00
Changming Sun
ed550b5fe5
Change webgpu CI pipeline to use a preinstalled chrome (#19729)
### Description
Change the webgpu CI pipeline to use a preinstalled Chrome. Hopefully
this improves stability: the Chrome obtained from puppeteer often fails
to start.
2024-02-29 20:36:29 -08:00
Changming Sun
250779474d
Change "onnxruntime-Linux-CPU-For-Android-CI" machine pool to "onnxruntime-Ubuntu2204-AMD-CPU" (#19698)
### Description
The original one reports "out of disk space", which needs to be
investigated.
2024-02-28 19:36:26 -08:00
Changming Sun
a93c31e3c9
Update dml-vs-2022.yml (#19687)
### Description
Fix a build error in "Zip-Nuget-Java-Nodejs Packaging Pipeline" which
deletes files too early.
2024-02-28 12:03:17 -08:00
Changming Sun
7a147fc6f7
Remove a bash task from webgpu CI pipeline (#19682)
### Description
It is a "Bash" task that requires running bash on Windows. Most Windows
operating systems do not have Bash installed. Given that this task is only
for debugging purposes, we can remove it for now.


### Motivation and Context
I am making this change because I am regenerating the VM image in a
different manner, and the new image does not contain bash. Once this PR
is in, I can switch the images.
2024-02-28 18:20:53 +08:00
Yi Zhang
f95c0773a1
Add share memory Flag in docker (#19672)
### Description



### Motivation and Context
Ref:
https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#setincshmem

Co-authored-by: Your Name <your@email.com>
2024-02-28 10:40:40 +08:00
Scott McKay
1c468a03b9
Improve Nuget-CUDA-Packaging-Pipeline (#19668)
### Description
* Publish the artifacts as late as possible
  * once published the artifacts are immutable, and any retry will fail if
    they exist
  * if any step fails after publishing the stage cannot be retried
* use powershell to cleanup
  * DeleteFiles is taking >30 mins and causing the stage to timeout
  * powershell took < 1s

### Motivation and Context
Make pipeline more robust
2024-02-27 09:27:43 -08:00
Scott McKay
580ee20dfc
Tweak Windows build parallelization settings (#19664)
### Description
Use UseMultiToolTask and limit the number of cl.exe instances running. 

MultiToolTask info:
https://devblogs.microsoft.com/cppblog/improved-parallelism-in-msbuild/

Info on why limiting CL_MPCount can help:
https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows

The current CIs have 4 cores (both physical and logical). Hardcoded the
GPU build in win-ci.yml to use CL_MPCount of 2 as that seems to work
fine. Can adjust if needed to base it on the actual number of cores or
to use build.py to build.
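
As a rough illustration of the settings named above, the following Python sketch assembles an MSBuild command line with `UseMultiToolTask` enabled and `CL_MPCount` capped. The helper name `msbuild_parallel_args` and the solution file name are placeholders, and how win-ci.yml actually passes these properties is not shown in this description.

```python
def msbuild_parallel_args(solution: str, cl_mp_count: int = 2) -> list[str]:
    """Build an msbuild argument list that limits cl.exe parallelism.

    Hypothetical helper; the default of 2 mirrors the CL_MPCount value
    mentioned above for the 4-core CI machines.
    """
    if cl_mp_count < 1:
        raise ValueError("cl_mp_count must be at least 1")
    return [
        "msbuild",
        solution,
        "/p:UseMultiToolTask=true",       # schedule compiles via MultiToolTask
        f"/p:CL_MPCount={cl_mp_count}",   # cap concurrent cl.exe instances
    ]


if __name__ == "__main__":
    # "onnxruntime.sln" is a placeholder solution name.
    print(" ".join(msbuild_parallel_args("onnxruntime.sln")))
```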

Caveat: I've run about 16 builds and haven't seen a slow build yet, but
as the root cause of the slow builds isn't really known this isn't
guaranteed to be a fix.

### Motivation and Context
Try and prevent super slow GPU builds by reducing number of tasks
potentially running in parallel.
2024-02-27 08:56:16 -08:00
Yi Zhang
3b46ab6439
Re-add testing removed by mistake. (#19647) 2024-02-27 08:46:29 -08:00
Rachel Guo
5bb58a10e7
Enable the most verbose logging level in detox E2E React Native CI (#19659)
### Description
<!-- Describe your changes. -->

The RN CI has intermittent failure error with "app seems to idle".
enable the most verbose logging level (and can add steps to dump
device.log from the detox folder/artifacts if necessary) to at least get
more information.

### Motivation and Context

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2024-02-26 20:00:14 -08:00
Scott McKay
8bd943be39
Retry flaky XCode iOS UI tests if we get a known error (#19639)
### Description
Xcode UI tests seem to be flaky:
https://github.com/orgs/community/discussions/68807
Add a couple of retries if we get a "Timed out while loading
Accessibility." error which is transient.
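
The retry logic can be sketched in Python as follows. This is only an illustration under assumptions: the helper name `run_with_retries`, the default of two retries, and the injectable `runner` parameter are hypothetical and not taken from the actual change.

```python
import subprocess

# Error text quoted in the description above.
KNOWN_TRANSIENT_ERROR = "Timed out while loading Accessibility."


def run_with_retries(cmd, max_retries=2, runner=subprocess.run):
    """Run `cmd`, retrying only when it fails with the known transient error.

    Any other failure is returned immediately so real test failures are
    not hidden. `runner` is injectable to make the helper easy to test.
    """
    result = None
    for _ in range(max_retries + 1):
        result = runner(cmd, capture_output=True, text=True)
        output = (result.stdout or "") + (result.stderr or "")
        transient = result.returncode != 0 and KNOWN_TRANSIENT_ERROR in output
        if not transient:
            return result
    # Still hitting the transient error after all retries.
    return result
```

Matching on the exact error string keeps the retry narrow: unrelated failures still fail the job on the first attempt.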


### Motivation and Context
2024-02-27 09:31:32 +10:00
Yi Zhang
0fcc6fb760
Add Whisper model in CI (#19604)
### Description
 Add Whisper Conversion and E2E into Big Models pipeline



### Motivation and Context

---------

Co-authored-by: Your Name <your@email.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
2024-02-25 14:04:22 +08:00
Yi Zhang
c980149c85
Add log for random exception in Linux GPU Test Stage. (#19569)
### Description
1. check GPU status in docker
2. use stages so that the test stage can leverage existing build
artifacts


### Motivation and Context
To investigate the root cause of the random exception
`CUDA failure 100: no CUDA-capable device is detected`
2024-02-24 13:00:53 -08:00
Scott McKay
45e20bf781
Use build.py to build in py-win-gpu.yml so parallelization parameters are set (#19578)
### Description
build.py sets a few parallelization parameters when building. Using
msbuild directly lacks those.


7a5860e490/tools/ci_build/build.py (L1665-L1669)

Changed to use build.py. If there's a concern with that we _could_ set
the parameters in the yaml, but that will be uglier due to duplicating
logic in multiple places.


### Motivation and Context
2024-02-21 10:38:37 +08:00
PeixuanZuo
f3e3b531fe
Update build directory clean up stage for python package pipeline (#19553)
Fix to make clean up stage take effect.

If the `SourceFolder` is empty, the task deletes files from the root
folder of the repository as though
[$(Build.SourcesDirectory)](https://learn.microsoft.com/en-us/azure/devops/pipelines/build/variables)
was specified.
2024-02-20 10:31:39 +08:00
Adrian Lizarraga
4874a41008
[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546)
### Description
Updates the default QNN SDK version to 2.19.2.240210.

### Motivation and Context
Build and test the latest version of QNN SDK in our pipelines.
2024-02-16 16:59:43 -08:00
Tianlei Wu
1dce5e1732
Disable TF32 in Linux_Test stage of Linux GPU CI Pipeline (#19541)
### Description
Some test thresholds that previously worked on the T4 GPU no longer
work. The reason is that the current pipeline uses A10, where TF32 is
enabled by default.

Disable TF32 in the Linux GPU CI Pipeline during testing to avoid such
random test failures.
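
To see why the thresholds stop holding, recall that TF32 keeps only 10 explicit mantissa bits instead of the 23 in float32. The Python sketch below approximates that loss by truncating a float32 value; truncation is an assumption used as a stand-in for the hardware's rounding, and the helper name `truncate_to_tf32` is hypothetical.

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Approximate TF32 precision by zeroing the low 13 mantissa bits.

    float32 has 23 explicit mantissa bits and TF32 keeps 10, so 13 bits
    are dropped. Real hardware may round rather than truncate.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    x = 0.0419757  # expected value taken from the failure log
    y = truncate_to_tf32(x)
    print(x, y, abs(x - y) / x)
```

A single truncated value can be off by a relative error of up to about 2^-10 (roughly 1e-3), which is the same order of magnitude as the mismatches reported in the failing assertions.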

### Motivation and Context
Linux Test has random failure at tests:

ProviderOptionsTest > testCUDAOptions() FAILED
org.opentest4j.AssertionFailedError: array contents differ at index [446], expected: <0.0419757> but was: <0.041948937>
at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)

org.opentest4j.AssertionFailedError: array contents differ at index [6], expected: <0.0225981> but was: <0.022587791>
at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676)
at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)
2024-02-16 14:41:11 -08:00
rui-ren
d63c664ca0
fix rocm ci pipeline (#19525)
### Description

ROCm CI pipeline issue.
```
Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.91 MiB, post-processed: Unknown size, total: 17.41 MiB) to /home/onnxruntimedev/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20...
    main()
  File "/stage/huggingface-transformers/examples/pytorch/language-modeling/run_mlm.py", line 242, in main
    datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/load.py", line 856, in load_dataset
    builder_instance.download_and_prepare(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 583, in download_and_prepare
    self._download_and_prepare(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 639, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
  File "/home/onnxruntimedev/.cache/huggingface/modules/datasets_modules/datasets/wikitext/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20/wikitext.py", line 138, in _split_generators
    data_file = dl_manager.download_and_extract(self.config.data_url)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 289, in download_and_extract
    return self.extract(self.download(url_or_urls))
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 197, in download
    downloaded_path_or_paths = map_nested(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 195, in map_nested
    return function(data_struct)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 220, in _download
    return cached_path(url_or_filename, download_config=download_config)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 281, in cached_path
    output_path = get_from_cache(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 634, in get_from_cache
    raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip

```


### Motivation and Context
Update the `datasets` package used by the pipeline to the latest version, `2.17.0`.
2024-02-15 00:02:08 -08:00
Prathik Rao
3b03b2e046
Upgrade default ORTModule opset from 15 to 17 (#19315)
### Description

This PR upgrades ORTModule's default opset from 15 to 17. Opset 17 is
the final opset supported by torchscript exporter
(https://github.com/pytorch/pytorch/pull/107829)

### Motivation and Context

Engineering excellence contribution for ORT Training DRI.

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2024-02-14 11:19:33 -08:00
Yifan Li
5c7e6b2e2a
[EP Perf] Add CI option to enable TRT-OSS parser (#19448)
### Description
* Introducing CI option to enable TRT-OSS parser, during ep perf
testing:

![image](https://github.com/microsoft/onnxruntime/assets/109183385/a9ba6393-6b94-4b8f-8ca4-ba7bc7954504)

By default, open-sourced onnx-tensorrt parser listed under
[cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt#L39-L40)
will be used if enabling this option.


### To verify this option and check the difference during ORT image build:
If this option is enabled:
<img width="649" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/109183385/3b778583-451e-4617-ba8c-c064442e60fd">

If this option is not enabled (by default):
<img width="683" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/109183385/cd8383ba-eff4-4536-94ab-a1424bb858ab">

* update default usage of cmake/trt version to the latest

### Motivation and Context
Make it easier to test the oss parser and find potential gaps between
the tensorrt builtin and oss parsers.

Scheduled runs with the oss parser will be set up after this PR gets merged
2024-02-12 23:04:08 -08:00
Adrian Lizarraga
4dfba53bfb
[QNN EP] Build x64 python wheel for QNN EP (#19499)
### Description
Adds a job to the python packaging pipeline that builds x64 python
wheels for QNN EP.



### Motivation and Context
Necessary to create a cached QNN model on Windows x64, which is done by
creating a properly configured onnxruntime session with QNN EP.
2024-02-12 20:54:04 -08:00
Baiju Meswani
c831031ad5
Remove cuda gencode 90 to reduce onnxruntime-training package size (#19486) 2024-02-12 09:24:36 -08:00
Justin Chu
3d2ddf96e3
Bump ruff linter to 0.2.1 (#19471)
### Motivation and Context

Include new lint rules
2024-02-08 16:08:27 -08:00
Jian Chen
75f06319d6
Change binet to bin (#19424)
### Description
This pull request includes a small change to the
`Dockerfile.manylinux2_28_cuda` file in the
`tools/ci_build/github/linux/docker` directory. The change corrects the
`PREPEND_PATH` argument from `/usr/local/cuda/binet` to
`/usr/local/cuda/bin`, ensuring the correct path to CUDA binaries is
set.
2024-02-07 09:51:02 -08:00
Edward Chen
df5c6718bd
Remove iOS simulator max runtime version limit. (#19396) 2024-02-06 14:54:06 -08:00
Yulong Wang
a4cfdc1c28
update comments for nodejs binding artifact preparation. (#19425)
### Description
document update as a following-up for #19274
2024-02-05 22:58:35 -08:00
Jian Chen
06a84c8a0d
Enable DML on Windows and CUDA on Linux for Node.js binding (#19274)
This pull request includes modifications to the `c-api-cpu.yml` Azure
Pipelines configuration file. The changes mainly revolve around the
Node.js packaging stage and the handling of Node.js artifacts. The most
significant changes include renaming the Node.js packaging stage, adding
a new dependency to the stage, changing artifact names, adding a new
script to list Node.js artifacts, and updating the source folder for
copying NuGet binaries.

Changes in Node.js packaging:

*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L503-R508):
Renamed the Node.js packaging stage from `Nodejs_Packaging_CPU` to
`Nodejs_Packaging` and added `Windows_CI_GPU_DML_Dev` as a new
dependency to the stage.

Changes in handling of Node.js artifacts:

*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L568-R569):
Changed the artifact name from `drop-onnxruntime-nodejs-win-x64` to
`drop-onnxruntime-nodejs-win-x64-dml` in the task to download pipeline
artifacts for Windows x64.
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59R595-R598):
Added a new script to list Node.js artifacts from the directory
`$(Build.BinariesDirectory)/nodejs-artifacts/win32/x64/`.
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L635-R640):
Updated the source folder from
`$(Build.BinariesDirectory)\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib`
to `$(Build.BinariesDirectory)\nodejs-artifacts\win32\x64` in the task
to copy NuGet binaries to the directory
`$(Build.SourcesDirectory)\js\node\bin\napi-v3\win32\x64`.

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-02-05 14:33:58 -08:00
Yi Zhang
435e19953e
Fix llama.covert_onnx to make it runnable in CI (#19372)
### Description
1. Make parity_check use the local model to avoid needing an HF token.
2. Deleting the model didn't work because it tried to delete an object
defined outside the function scope.
     So it caused out-of-memory errors on A10.
3. In fact, 16G of GPU memory (one T4) is enough. But the conversion
process is always killed on T4, while it works on A10/24G.
     Standard_NC4as_T4_v3 has 28G CPU memory
     Standard_NV36ads_A10_v5 has 440G memory.
     It appears that the model conversion needs a very large amount of memory.

### Motivation and Context
Last time, I came across some issues in convert_to_onnx.py, so I used the
ONNX model from https://github.com/microsoft/Llama-2-Onnx for testing.
These issues can now be fixed, so I use the ONNX model generated by this
repo, and the CI can cover the model conversion.
2024-02-05 07:26:24 +08:00
PeixuanZuo
0cba56e0a0
[ROCm] Fix CI pipeline by fixing pytest version (#19407)
Pin pytest to version 7.4.4; higher versions cause the error:

`from onnxruntime.capi import onnxruntime_validation 
ModuleNotFoundError: No module named 'onnxruntime.capi'`
2024-02-04 16:37:36 +08:00
Scott McKay
debd1cab10
Add coremltools 7.1 as a dependency (#19389)
### Description
Setup usage of coremltools via dependencies instead of copying files. 
Pull in some changes from
https://github.com/microsoft/onnxruntime/pull/19347 in preparation for
supporting ML Program and enabling building the ML Model on all
platforms to make development and testing of CoreML EP code easier.

- Update to coremltools 7.1 
- Add patch for changes required for cross platform build of ML Program
related code
- Generate coreml proto files on all platforms
- mainly to test these changes work everywhere, as the proto files will
be used on all platforms when #19347 is checked in
- rename onnxruntime_coreml_proto target to coreml_proto as it contains
purely coreml protobuf code with no ORT related changes

### Motivation and Context
Improve setup.
2024-02-03 09:42:21 +10:00
Yi Zhang
e74f141338
Save stablediffusion and open-clip in pipeline cache (#19314)
### Description
1. Save the models to the pipeline cache.
2. Lower the similarity bar to 97.
3. Publish the generated image so that we can check it when the test fails.
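
As a rough illustration of how a similarity bar such as 97 could gate an image check, the sketch below compares two embedding vectors by cosine similarity scaled to 0-100. The function names, the 0-100 scale, and the toy embeddings are assumptions made for illustration; the pipeline's actual scoring code is not shown in this commit message.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def passes_similarity_bar(generated, reference, bar=97.0):
    """Return True when the scaled similarity reaches the bar.

    Assumption: the bar is expressed on a 0-100 scale.
    """
    return 100.0 * cosine_similarity(generated, reference) >= bar


# Toy embeddings (hypothetical); real ones would come from an image model.
reference = [1.0, 0.0, 0.0]
close_match = [0.99, 0.05, 0.0]
unrelated = [0.0, 1.0, 0.0]
print(passes_similarity_bar(close_match, reference))
print(passes_similarity_bar(unrelated, reference))
```

Lowering the bar trades sensitivity for stability: small numeric differences between runs are tolerated, while a clearly different image still fails.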


### Motivation and Context
Reduce model downloads
2024-01-31 09:39:27 +08:00
Rachel Guo
3e17ca3dab
Fix iOS artifacts issue in Microsoft.ML.OnnxRuntime Nuget Package (#19311)
### Description

Updates to only include ios archs framework in artifacts included in
Nuget Package.


### Motivation and Context

Related issue:
https://github.com/microsoft/onnxruntime/issues/19295#issuecomment-1914143256

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-01-30 08:44:20 -08:00
Changming Sun
e91d91ae4f
Fix a build issue: /MP was not enabled correctly (#19190)
### Description

In PR #19073 I misunderstood the value of "--parallel". Instead of
testing whether args.parallel is None, I should test the value returned
by the number_of_parallel_jobs function.

If build.py is invoked without --parallel, then args.parallel equals 1,
because that is the default value. In that case we should not add "/MP".
However, the current code adds it, because `if args.parallel` evaluates
to `if 1`, which is True.
If build.py is invoked with --parallel but without an additional number,
then args.parallel equals 0, because the value is unspecified. In that
case we should add "/MP". However, the current code does not add it,
because `if args.parallel` evaluates to `if 0`, which is False.
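
The truthiness pitfall described above can be reproduced with a small sketch. The parser definition (default of 1, bare flag mapping to 0), the body of `number_of_parallel_jobs`, and the "more than one job" condition are assumptions inferred from the description, not a copy of build.py.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # No flag -> default (1); bare "--parallel" -> const (0); "--parallel N" -> N.
    parser.add_argument("--parallel", nargs="?", const=0, default=1, type=int)
    return parser


def number_of_parallel_jobs(args):
    # Assumption: 0 means "use every available CPU".
    return (os.cpu_count() or 1) if args.parallel == 0 else args.parallel


def add_mp_buggy(args):
    # Tests the raw argument: 1 (no flag) is truthy, 0 (bare flag) is falsy.
    return bool(args.parallel)


def add_mp_fixed(args):
    # Tests the resolved job count instead (assumed condition: more than one job).
    return number_of_parallel_jobs(args) > 1


parser = build_parser()
for argv in ([], ["--parallel"], ["--parallel", "4"]):
    args = parser.parse_args(argv)
    print(argv, args.parallel, add_mp_buggy(args), add_mp_fixed(args))
```

The buggy check gives exactly the inverted answers for the first two invocations, which matches the behavior the description reports.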

This also adds a new build flag: use_binskim_compliant_compile_flags, which is intended to be only used in ONNX Runtime team's build pipelines for compliance reasons. 

### Motivation and Context
2024-01-29 12:45:38 -08:00
Yi Zhang
e96a038f01
Add VP test in Stable diffusion pipeline (#19300)
### Description
1. Add visual parity test based on openai clip model
2. Add trigger rules

### Motivation and Context
1. check generated image is expected
2. reduce unnecessary triggers
2024-01-29 09:33:58 -08:00
Tianlei Wu
358650d441
Fix BigModel stable diffusion pipeline (#19277)
### Description
Fix two issues:
(1) We can only use single quotes inside `bash -c "..."`. The current
pipeline job stopped at `python3 demo_txt2img.py astronaut` and skipped the
following commands. In this change, we remove the remaining commands to
get the same effect (otherwise, the pipeline runtime might be 2 hours
instead of 15 minutes).
(2) Fix a typo in the word Stable.
2024-01-25 17:19:04 -08:00
Changming Sun
bc54ad3f03
Update abseil to a release tag and register neural_speed (#19255)
### Description
Update abseil to a release tag and register neural_speed to CG.


### Motivation and Context
We are currently using a non-released version of abseil. Using a tag is better.
2024-01-24 14:37:39 -08:00
Yi Zhang
d7aebf9ea8
Move Nuget Test from T4 to A10 to reduce release duration (#19253)
### Description



### Motivation and Context
Running the release process is painful and tedious because some GPU jobs
have to wait a very long time.

![image](https://github.com/microsoft/onnxruntime/assets/16190118/1c5c981e-68d4-4678-9758-443fbf362802)

![image](https://github.com/microsoft/onnxruntime/assets/16190118/ba0d79ba-1554-4c7a-93dd-6ea8144c9295)

![image](https://github.com/microsoft/onnxruntime/assets/16190118/36cab833-71c1-4ff5-bca5-f4caa9aee0c9)
On the one hand, we could free some T4 machines from the PR process, since
some jobs no longer use T4; on the other hand, we can continue to
move more jobs' agents from T4 to A10.

In the future, T4 will mainly be used for scenarios where large GPU
memory is needed, for multiple GPU cards, or for some special cases.


Test runs:

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=401786&view=logs&j=8048494c-e6eb-5e47-5e87-ff0aa863325d

cc @YUNQIUGUO @snnn
2024-01-24 14:15:07 +08:00
Yi Zhang
54871a2773
Replace T4 to A10 in Linux GPU workflow (#19205)
### Description
1. Update Linux GPU  machine from T4 to A10, sm=8.6
2. update the tolerance 

### Motivation and Context
1. Free more T4 and test with higher compute capability.
2. ORT enables TF32 in GEMM for A10/100. TF32 causes precision loss
and makes this test fail:
```
2024-01-19T13:27:18.8302842Z [ RUN      ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-19T13:27:25.8438153Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:25.8438641Z Expected equality of these values:
2024-01-19T13:27:25.8438841Z   COMPARE_RESULT::SUCCESS
2024-01-19T13:27:25.8439276Z     Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:25.8439464Z   ret.first
2024-01-19T13:27:25.8445514Z     Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:25.8445962Z expected 0.145984 (3e157cc1), got 0.975133 (3f79a24b), diff: 0.829149, tol=0.0114598 idx=375. 20 of 388 differ
2024-01-19T13:27:25.8446198Z 
2024-01-19T13:27:25.8555736Z [  FAILED  ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12, where GetParam() = "cuda_../models/zoo/opset12/SSD/ssd-12.onnx" (7025 ms)
2024-01-19T13:27:25.8556077Z [ RUN      ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_YOLOv312_yolov312
2024-01-19T13:27:29.3174318Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:29.3175144Z Expected equality of these values:
2024-01-19T13:27:29.3175389Z   COMPARE_RESULT::SUCCESS
2024-01-19T13:27:29.3175812Z     Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:29.3176080Z   ret.first
2024-01-19T13:27:29.3176322Z     Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:29.3178431Z expected 4.34958 (408b2fb8), got 4.51324 (40906c80), diff: 0.16367, tol=0.0534958 idx=9929. 22 of 42588 differ

```
3. Some other tests, such as SSD, throw other exceptions, so skip them:
```
2024-01-22T09:07:40.8446910Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-22T09:07:51.5587571Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:358: Failure
2024-01-22T09:07:51.5588512Z Expected equality of these values:
2024-01-22T09:07:51.5588870Z   COMPARE_RESULT::SUCCESS
2024-01-22T09:07:51.5589467Z     Which is: 4-byte object <00-00 00-00>
2024-01-22T09:07:51.5589953Z   ret.first
2024-01-22T09:07:51.5590462Z     Which is: 4-byte object <01-00 00-00>
2024-01-22T09:07:51.5590841Z expected 1, got 63
```
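
The `tol` values printed in the first log are consistent with `tol = 0.01 + 0.01 * |expected|` (0.0114598 for 0.145984 and 0.0534958 for 4.34958). The sketch below reproduces that check and, separately, approximates TF32 by truncating a float32 to a 10-bit mantissa. The two constants are inferred from the log lines rather than read from the test source, and the truncation is only a rough model of what the hardware does.

```python
import struct

ATOL = 0.01  # assumed absolute tolerance, inferred from the log
RTOL = 0.01  # assumed relative tolerance, inferred from the log


def tolerance(expected):
    return ATOL + RTOL * abs(expected)


def within_tolerance(expected, actual):
    return abs(expected - actual) <= tolerance(expected)


def to_tf32(x):
    """Approximate TF32 by clearing the low 13 of float32's 23 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Values taken from the two failing comparisons in the log.
for expected, actual in ((0.145984, 0.975133), (4.34958, 4.51324)):
    print(expected, actual, round(tolerance(expected), 7),
          within_tolerance(expected, actual))

# A single TF32 rounding step loses the bits below 2**-10 of the leading digit.
print(to_tf32(1.0 + 2.0 ** -11), to_tf32(1.0 + 2.0 ** -10))
```

Per-element rounding error of this size accumulates across a large GEMM, which is one plausible reason a tolerance tuned on T4 needs to be relaxed on A10.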
2024-01-23 10:49:24 -08:00
Adrian Lizarraga
37d14d7896
[QNN EP] Create Windows ARM64 nightly python package (#19128)
### Description
Adds a job to create a nightly python package for ORT/QNN on Windows
ARM64.
Must build onnxruntime-qnn with python 3.11 and numpy 1.25.

**Note: pipeline run may take up to 3 hrs**

### Motivation and Context
Make it possible to get a nightly python package with the latest updates
to QNN EP.
Issue #19161
2024-01-22 18:14:41 -08:00
Yifan Li
e283cdb218
Fix Fuzz Testing CI (#19228)
### Description
Add BuildArch

To verify:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=400952&view=logs&j=5b022bb4-70a7-5401-8766-a8a7802c7150&t=291e85c7-5547-590b-50de-4e01fcd4eba3&l=14

### Motivation and Context
2024-01-22 15:44:57 -08:00
Yi Zhang
780acda7b4
Add Big models pipeline (#19222)
### Description
Two models are added to CI.
The Stable Diffusion model stage is based on
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md

LLama2 FP16 is based on https://github.com/microsoft/Llama-2-Onnx.
12G of GPU memory is not enough, so I chose T4 to run it.

### Motivation and Context
Add regular E2E tests for big models.
They will be triggered in the main build; that is, they run after a PR is
merged.

More models will be added later.

### Test Runs ###

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1275191&view=results
2024-01-22 14:02:56 -08:00
Edward Chen
c8ce83967e
Download protoc for all Apple host builds, remove protoc build from iOS packaging pipeline. (#19209) 2024-01-19 15:30:09 -08:00
Adrian Lizarraga
28a16c223c
[QNN EP] Update QNN pipelines to use QNN SDK 2.18 by default (#19129)
### Description
Update QNN pipelines to use QNN SDK 2.18 by default



### Motivation and Context
Test with the latest version of QNN SDK by default.
2024-01-18 14:59:23 -08:00
Yi Zhang
dc1fed7268
[Fix] Dual Cuda version isn't supported as expected in Linux Gpu pipeline (#19192)
### Description


### Motivation and Context
Dual CUDA versions are not supported as expected.

cuda 12 link

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1272235&view=logs&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91
2024-01-18 13:26:26 -08:00
Guenther Schmuelling
dd2177c5d7
enable webnn in ci build (#19163)
### Description



### Motivation and Context
2024-01-18 13:11:47 -08:00
Jian Chen
9da3e36138
Fix buildJava from Zip-Nuget-Java-Nodejs Packaging Pipeline (#19187)
### Description



### Motivation and Context
2024-01-17 17:20:42 -08:00
Changming Sun
81d363045b
Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117)
### Description
Upgrade Ubuntu machine pool from 20.04 to 22.04
2024-01-16 17:25:18 -08:00
Changming Sun
e2e488d6f8
Revert "iOS packaging pipeline stability" (#19135)
Reverts microsoft/onnxruntime#19097 because it broke the Android CI
pipeline.
2024-01-16 09:18:35 -08:00
Jian Chen
c92f72ebeb
Merge Linux Nuget GPU pipeline with zip-nuget (#19120)
### Description



### Motivation and Context
2024-01-16 08:59:03 -08:00
pengwa
1150b1f81e
ORTModule memory improvement (#18924)
## Dependency

https://github.com/microsoft/onnxruntime/pull/19007

## ORTModule memory efficient gradient management

Previously I tried to solve the coarse-grained gradient
accumulation/update problem in ORTModule with
https://github.com/microsoft/onnxruntime/pull/8979, but that
solution was not fully validated with DDP or when there are user
hooks on gradient accumulation for torch parameters.

This PR addresses the problem with an approach similar to PR 8979,
i.e., it triggers gradient accumulation once ORT has computed the grad, but
instead of using an AccumulateGrad op, it uses an ONNX operator,
PythonOp, which internally calls param.backward(grad) and thereby
handles all related hooks correctly.


## Design

Check the details from


https://microsoftapc-my.sharepoint.com/:p:/g/personal/pengwa_microsoft_com/EaaBq4EzsFhOmsDEXCG7Ba4Bb9bwd0O2sFV_JXJ4jBLYLA?e=7Sz2g8&nav=eyJzSWQiOjI3MSwiY0lkIjozMjE4NzI1NDIzfQ

## Convergence Validation:


![image](https://github.com/microsoft/onnxruntime/assets/10530022/ccf3a213-e815-4b23-b759-165033b2d9fe)

Differences are mostly around 0.000x, sometimes 0.00x, which may come from
the different order in which gradients are applied before and after this
change (on deepspeed zero stage 2).


## TODO

Consolidate the logic with Stage3's similar logic.
2024-01-16 08:57:37 +08:00
Yi Zhang
922a2f00e3
Extend timeout in Nuget-CUDA-Packaging-Pipeline (#19138)
### Description



### Motivation and Context
The Linux_GPU_x64 job in the pipeline has been canceled due to timeouts
since 0112.
2024-01-15 14:37:22 +08:00
Jian Chen
c3ce9df80c
Disabling python3.12 on training python packaging pipelines (#19123) 2024-01-14 14:51:00 -08:00
Jian Chen
76797127d6
Always download cuda and trt libraries from Azure blob (#19118)
### Description
This way, we will not need to update the windows images constantly and
allow more flexibility to choose the cuda version in the future.
2024-01-14 11:37:26 -08:00
Yulong Wang
f917dde717
[web] remove xnnpack from web backends (#19116)
### Description
XNNPACK is already disabled in web assembly build. This change removes
the xnnpack backend registration in JS.
2024-01-13 23:04:02 -08:00
Edward Chen
e1e45901e2
iOS packaging pipeline stability (#19097)
- Remove protoc build step which sometimes times out. Download protoc instead.
- Use macOS-12 image in the set variables stage. It seems more stable.
2024-01-13 19:27:44 -08:00
Changming Sun
5558912d7b
Disable ccache in Windows CPU CI pipeline (#19131)
### Description
Disable ccache for all the jobs in the Windows CPU CI pipeline.
Before disabling it, the build had this warning:

"MSIL .netmodule or module compiled with /GL found; restarting link with
/LTCG; add /LTCG to the link command line to improve linker performance"

After disabling it, the warning is gone and the build doesn't use /GL or
/LTCG.

Cache itself should not cause this difference. 

### Motivation and Context
2024-01-13 18:40:43 -08:00
Adrian Lizarraga
65893ef382
Add --parallel to QNN EP NuGet pipeline build command (#19126)
### Description
Add --parallel to QNN EP NuGet pipeline build command

### Motivation and Context
Improve build times for pipeline.
2024-01-13 02:38:40 -08:00
Jian Chen
78e796bb27
Fixing issue where unzipped package from 'onnxruntime-win-x64-gpu' was also uploaded. (#19096)
### Description
Fix an issue where the unzipped package from 'onnxruntime-win-x64-gpu' was
also uploaded.


For example,
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=396440&view=artifacts&pathAsName=false&type=publishedArtifacts
2024-01-12 22:30:43 -08:00
Jian Chen
e5eacc6d11
Fix cuda-packaging-pipeline.yml (#19115)
### Description



### Motivation and Context
2024-01-12 19:09:25 -08:00
Guenther Schmuelling
96dbac6e4b
update to emsdk-3.1.51 (#18844) 2024-01-12 16:04:33 -08:00
Caroline Zhu
4dbaa73738
[js/web/training] added end-to-end tests (#18700)
## Summary
* following inference's [set-up for end-to-end
tests](https://github.com/microsoft/onnxruntime/tree/main/js/web/test/e2e),
created an end-to-end test runner for training
* this test runner copies testdata from the [trainingapi
folder](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/testdata/training_api)
* then runs two tests (training session with evalModel & optimizer
model, and training session with the minimum options), and tests if the
ORT-web training package encompasses inference
  * these tests check 
    * createTrainingSession
    * runTrainStep
    * runOptimizerStep if applicable
    * the parameters methods (getParametersSize, loadParametersBuffer, and
    getContiguousParameters)

## TL;DR
*
[`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c)
is responsible for setting up and running the end to end tests
*
[`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8)
contains the test function definitions (`testInferenceFunction`,
`testTrainingFunctionMin`, `testTrainingFunctionAll`)

## Flow
* entrypoint: user runs the following command in the terminal: `npm run
test:training:e2e`
*
[`js/web/package.json`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-79275844e75c3c410bb3a71c7f59b2b633e5a3e975c804ffc47220025084da28)
was modified to include an npm script that will run `run.js` which will
run the end to end tests
*
[`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c)
is responsible for
  * detecting and installing local tarball packages of ORT-web
  * copying training data to the `js/web/training/e2e/data` folder
  * starting two Karma processes. Karma is a test runner framework that
  simulates testing in the browser.
    * In this case, the tests happen in Chrome. We can configure the tests
    to run in Edge and other browsers in the future.
    * one of these karma processes is self-hosted, meaning it pulls the
    ORT-web package from local
    * the other karma process is not self-hosted, meaning it pulls the
    ORT-web package from another source. In this case, we start an http
    server that serves the ORT-web binaries.
*
[`js/web/test/training/e2e/simple-http-server.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-f798ab485f3ec26c299fe5b2923574c9e4b090200ba20d490bbf6c183286993c)
is responsible for starting the HTTP server and serving the ORT binary
files. This code is almost identical to the corresponding code in the
inference E2E tests.
*
[`js/web/test/training/e2e/karma.conf.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-436cfe8f670c768a04895bd4a1874a5e033f85e0e2d84941c62ff1f7c30a9f28)
Karma configuration file that specifies what happens when a karma
process is started. The config specifies Mocha as the testing framework,
which will go through all the loaded files and run any tests that exist
*
[`js/web/test/training/e2e/browser-test-wasm.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-13b6155e106dddc7b531ef671186e69b2aadb8a0f4b2f3001db0991567d78221)
File that contains the tests that Mocha will pick up on and run.
* The test functions (such as testInferenceFunction and
testTrainingFunctionAll) are defined in
[`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8).

## Notes
* I followed the [tests for training
core](b023de0bfc/orttraining/orttraining/test/training_api/core/training_api_tests.cc)
where they randomly generated input for the training session
* E2E tests are triggered by running `npm run test:training:e2e` --
suggestions for alternative script names are appreciated!!!

## Motivation and Context
- adding training bindings for web
2024-01-12 13:33:33 -08:00
Changming Sun
55b046e97e
Remove enable_mac_silicon settings (#19108)
### Description
Remove enable_mac_silicon settings from two packaging pipelines.

### Motivation and Context
Now we build universal2 packages instead.
2024-01-12 11:01:39 -08:00
Changming Sun
0e8d4c3d21
Enable Address Sanitizer in CI (#19073)
### Description
1. Add two build jobs for enabling Address Sanitizer in CI. One for
Windows CPU, One for Linux CPU.
2. Set default compiler flags/linker flags in build.py for normal
Windows/Linux/MacOS build. This can help control compiler flags in a
more centralized way.
3. All Windows binaries in our official packages will be built with
"/PROFILE" flag. Symbols of onnxruntime.dll can be found at [Microsoft
public symbol
server](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols).

Limitations:
1. On Linux, Address Sanitizer ignores RPATH settings in ELF binaries.
Therefore, once Address Sanitizer is enabled, we need to manually set
LD_LIBRARY_PATH properly before running tests; otherwise
libonnxruntime.so may not be able to find custom ops and shared EPs.
2. On Linux we also need to set LD_PRELOAD before running some tests (if
the main executable, like python, is not built with Address Sanitizer).
On Windows we do not need to.
3. On Windows, before running python tests we should manually copy the
Address Sanitizer DLL to the onnxruntime/capi directory, because python
3.8 and above has enabled "Safe DLL Search Mode" that wouldn't use the
information provided by the PATH env.
4. On Linux, Address Sanitizer found a lot of memory leaks from our
python binding code. Therefore right now we cannot enable Address
Sanitizer when building ONNX Runtime with python binding.
5. Address Sanitizer itself uses a lot of memory address space and
delays memory deallocations, which can easily cause OOM issues in 32-bit
applications. For this reason we cannot run all the tests in
onnxruntime_test_all in 32-bit mode with Address Sanitizer. However, we
can still run individual tests that way; we just cannot run all of them
in one process.
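
The Linux environment setup described in the limitations can be sketched as a helper that builds the environment for a test subprocess. The helper name and every path below are hypothetical; the actual location of the Address Sanitizer runtime depends on the compiler and distribution.

```python
import os


def asan_test_env(lib_dirs, asan_runtime=None, base_env=None):
    """Return a copy of the environment prepared for an ASan-enabled test run."""
    env = dict(os.environ if base_env is None else base_env)

    # RPATH is not honored, so list the library directories explicitly.
    parts = list(lib_dirs)
    if env.get("LD_LIBRARY_PATH"):
        parts.append(env["LD_LIBRARY_PATH"])
    env["LD_LIBRARY_PATH"] = ":".join(parts)

    # Only needed when the main executable (e.g. python) lacks ASan itself.
    if asan_runtime:
        env["LD_PRELOAD"] = asan_runtime
    return env


env = asan_test_env(
    ["/path/to/build/Debug"],                    # hypothetical build output dir
    asan_runtime="/path/to/libasan.so",          # hypothetical runtime location
    base_env={"LD_LIBRARY_PATH": "/opt/lib"},
)
print(env["LD_LIBRARY_PATH"])
print(env["LD_PRELOAD"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching a test binary, which keeps the change scoped to that one process.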

### Motivation and Context
To catch memory issues.
2024-01-12 07:24:40 -08:00
Changming Sun
285606108a
Set pythonInterpreter in set-python-manylinux-variables-step.yml (#19105)
### Description
Set pythonInterpreter in set-python-manylinux-variables-step.yml. To fix
a build error:

```
Starting: Set Python manylinux variables
==============================================================================
Task         : Python script
Description  : Run a Python file or inline script
Version      : 0.231.1
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/python-script
==============================================================================
##[error]Parameter 'toolPath' cannot be null or empty.
Finishing: Set Python manylinux variables
```
The error was because today I deleted a bunch of software from the VM
image. The task might fail if no Python versions are found in
$(Agent.ToolsDirectory).
2024-01-12 07:22:02 -08:00
Jian Chen
53497702a6
Fix Nuget CUDA Packaging pipeline (#19054)
### Description



### Motivation and Context

---------

Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
2024-01-11 11:59:21 -08:00
Jian Chen
2eb3db6bf0
Adding python3.12 support to ORT (#18814)
### Description
Adding python3.12 support to ORT



### Motivation and Context
2024-01-11 08:34:28 -08:00
Baiju Meswani
730df1bfa2
Increase MacOS pipeline timeout (#19072) 2024-01-09 18:35:21 -08:00
Ashwini Khade
897a4163d7
Update transformer version for training CIs (#19046)
### Description
Updating version to resolve security vulnerability.
2024-01-09 12:00:34 -08:00
Changming Sun
ab897a4a40
Remove Windows ARM32 from nuget packaging pipelines (#19049)
### Description
1. Remove Windows ARM32 from nuget  packaging pipelines

2. Add missing component-governance-component-detection-steps.yml to
some build jobs.

### Motivation and Context
Stop supporting Windows ARM32 to align with [Windows's support
policy](https://learn.microsoft.com/en-us/windows/arm/arm32-to-arm64).
Users who need this feature still can build the DLLs from source.
However, later on we will remove that support too.
2024-01-09 07:45:03 -08:00
Adrian Lizarraga
52e5601449
[QNN Nuget Pipeline] Build with ML ops and detect ORT version (#19024)
### Description
- Removes `--disable_ml_ops` build flag 
- Automatically detects ORT version from VERSION file via
`templates/set-version-number-variables-step.yml`. We will no longer
need to create a commit to update ORT versions.

### Motivation and Context
- A new unit test caused failures in the QNN Nuget pipeline because it
did not enable ml ops.
- Automate ORT version specification
2024-01-08 12:44:12 -08:00
Yi Zhang
e8ac97c8d8
Move Windows GPU training job to A10 (#19041)
### Description
1. Update sm to 86

### Motivation and Context
We have more A10 quota than T4, and Nvidia AXX could be partitioned
2024-01-08 09:19:58 -08:00
PeixuanZuo
efdcefcf8c
[ROCm] fix security warning (#19017)
fix security warning
2024-01-05 10:05:34 -08:00
Changming Sun
e155c66b4a
Change all macOS python packages to use universal2 (#19013)
### Description
Change all macOS python packages to use universal2, to reduce the number
of packages we have.

### Motivation and Context
According to [wikipedia](https://en.wikipedia.org/wiki/MacOS_Big_Sur),
macOS 11 is the first macOS version that supports universal 2. And it is
the min macOS version we support. So we no longer need to maintain
separate binaries for different CPU archs.
2024-01-04 17:44:49 -08:00
Adrian Lizarraga
02b1ff5fa2
[QNN EP] Support multithreaded inference of a single session (#18981)
### Description
- Add mutex to protect QNN API calls for executing a graph and
extracting the corresponding profile data.
- Ensures QNN EP's execute function does not store unnecessary state
(i.e., input and output buffer pointers do not need to be stored as
class members.)

### Motivation and Context
Allow calling `session.Run()` from multiple threads when using QNN EP.
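
The pattern described above, a mutex around the non-thread-safe backend call plus per-call buffers kept in locals, can be sketched generically. The class and method names are hypothetical stand-ins and do not correspond to the QNN EP's C++ code.

```python
import threading


class SerializedBackend:
    """Hypothetical stand-in for an execution provider with a non-thread-safe backend."""

    def __init__(self):
        self._lock = threading.Lock()
        self.executions = 0  # only touched while holding the lock

    def _execute(self, inputs):
        # Pretend backend call; must not run concurrently.
        self.executions += 1
        return [2 * x for x in inputs]

    def run(self, inputs):
        # Per-call buffers live in locals, not on self, so calls share no state.
        local_inputs = list(inputs)
        with self._lock:
            return self._execute(local_inputs)


def run_concurrently(backend, n_threads=8, calls_per_thread=100):
    results = []
    results_lock = threading.Lock()

    def worker(tid):
        for i in range(calls_per_thread):
            out = backend.run([tid, i])
            with results_lock:
                results.append((tid, i, out))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Holding the lock only around the backend call keeps input preparation concurrent while still serializing the part that cannot be shared.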
2024-01-04 13:32:48 -08:00
PeixuanZuo
7a454acd61
[ROCm] Update CI/Packaging pipeline to ROCm6.0 (#18985)
Update CI/Packaging pipeline to ROCm6.0
2024-01-03 17:25:15 +08:00
Yi Zhang
c97e3f4821
[Fix] exception in Fuzz Test pipeline (#18984)
### Description


### Motivation and Context
The file path is not correct.
2024-01-03 14:53:31 +08:00
Yifan Li
3993d43048
[EP Perf] Fix missing Azure cli & use onnx zoo model inside image (#18917)
### Description
* Fix [missing Azure CLI
issue](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392612&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b&l=12).
* Now, if CI fails to run `az --version`, it automatically reinstalls the
Azure CLI dependency
* Use the existing ONNX zoo model inside the image during memtesting
   * to avoid test failures while the ONNX model zoo is being restructured
* Display more detailed valgrind info when memtesting
* Clear invalid dep of existing AddressSanitizer test case
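
The "probe, then reinstall on failure" behavior from the first bullet can be sketched as below. The helper name is hypothetical, and the install command is deliberately left as a parameter because the commit message does not say how the CLI is reinstalled.

```python
import subprocess


def ensure_tool(check_cmd, install_cmd, run=subprocess.run):
    """Run check_cmd; if it fails, run install_cmd and check again.

    Returns "present" or "reinstalled"; raises RuntimeError otherwise.
    """

    def succeeds(cmd):
        try:
            return run(cmd, capture_output=True).returncode == 0
        except FileNotFoundError:
            # The executable is not on PATH at all.
            return False

    if succeeds(check_cmd):
        return "present"
    if not succeeds(install_cmd):
        raise RuntimeError("install command failed: %r" % (install_cmd,))
    if not succeeds(check_cmd):
        raise RuntimeError("tool still unavailable after reinstall")
    return "reinstalled"
```

A call would look like `ensure_tool(["az", "--version"], install_cmd)`, where `install_cmd` is whatever installation command the image uses. Accepting `run` as a parameter lets the logic be exercised without touching the machine.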


### Validate
* Before the fix, Azure CLI is missing:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392994&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b&l=10
* After the fix:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392619&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b
2024-01-01 17:14:39 -08:00
Yi Zhang
3f03c12986
Split Onnxruntime Nuget GPU package (#18819)
### Description
1. Update download-artifacts to flex-downloadartifacts to make it easier
to debug.
2. Move the native files into Gpu.Windows and Gpu-linux packages.
Onnxruntime-Gpu depends on them.
3. update the package validation as well
4. Add 2 stages to run E2E test for GPU.Windows and GPU.Linux
   for example:
   

![image](https://github.com/microsoft/onnxruntime/assets/16190118/35c6730b-8080-4f52-a17c-b9c61f41b6bb)



### Motivation and Context
The single Onnxruntime.Gpu package size has already exceeded the Nuget size
limit.
We split the package into smaller packages so that they can be
published.

For compatibility, the user can install or upgrade Onnxruntime.Gpu,
which will install Gpu.Windows and Gpu.Linux automatically.
The user can also install only Gpu.Windows or Gpu.Linux directly.

### Test Link
1. In ORT_NIGHTLY

2. Install the preview version in nuget-int. (nuget source:
https://apiint.nugettest.org/v3/index.json)

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-12-22 16:57:16 +08:00
Changming Sun
3d8f229d39
Add ARM64EC build jobs (#18870)
### Description
Add ARM64EC build jobs in post merge pipeline to validate if our code is
compatible with Windows ARM64EC.
2023-12-21 16:31:38 -08:00
Yifan Li
54e471a054
[EP Perf] Display percentage of cuda/trt ops in cuda/trt ep on EP Perf Dashboard (#18868)
### Description
Display percentage of cuda/trt ops in cuda/trt ep on EP Perf Dashboard:

![image](https://github.com/microsoft/onnxruntime/assets/109183385/bafba098-1338-46fa-b10a-ca19eff2a746)

Check
[here](https://msit.powerbi.com/groups/d1ae6355-afd0-4c40-b78e-676a86cab1e2/reports/82101bbb-dad2-4f24-9ddf-a37f0d41509a/ReportSectionda402bdf6824e505a614?experience=power-bi)
to preview on ep perf dashboard


### Motivation and Context
- brief overview of op metrics towards various models
- easy to identify models which haven't reached 100% ops on cuda/trt ep.
2023-12-20 22:11:47 -08:00
Hector Li
8931854528
Move some QNN EP provider options to session options (#18877)
Move QNN EP provider options to session options

### Description
Session options are needed to support multi-partition for the context cache feature. To smooth the transition, move the provider options to session options first.

This is the first step for PR
https://github.com/microsoft/onnxruntime/pull/18865
2023-12-20 00:13:38 -08:00
Scott McKay
666fcbde4d
Add LeakyRelu to list of NNAPI operators (#18880)
### Description
Add LeakyRelu to the list as support was added a while ago. 


### Motivation and Context
2023-12-20 14:44:31 +10:00
Changming Sun
535a2403dd
Update Nuget publishing jobs (#18851)
### Description
1. Add a CodeSign validation task before the binaries are published, to
make sure all DLL files are signed.
2. Auto-trigger the CUDA 12 pipeline's publishing job.
2023-12-19 16:54:46 -08:00
Ashwini Khade
4dff154f51
Fix nightly pipeline failure (#18867)
### Description
Fixes a failure in the ortmodule nightly pipeline. 



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-12-19 09:18:00 -08:00
Jian Chen
6d7519ede8
Adding new pipeline for python cuda testing (#18718)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-12-18 18:13:03 -08:00
Changming Sun
ad476d5a1f
Change Nuget packaging pipeline's build TRT job to download CUDA SDK on-the-fly (#18847)
### Description
Change Nuget packaging pipeline's build TRT job to download CUDA SDK
on-the-fly, so that we do not need to put a CUDA SDK in the build
machine's image.
2023-12-15 17:44:02 -08:00
Changming Sun
fc9ecb59db
Add Windows ARM build jobs to post merge pipeline (#18832)
### Description
Add Windows ARM build jobs to post merge pipeline to validate that our code is
still compatible with these build settings.
2023-12-15 08:47:52 -08:00
Changming Sun
cbad4fe49b
Update absl and googletest (#18827)
### Description
Update absl and googletest to their latest versions to include some cmake
changes:
1. A googletest cmake change that allows using external absl and
re2.
2. Nullability enhancements that allow our clang-based static
analysis to detect many kinds of null pointer errors.



### Motivation and Context
To fix a C4744 link warning in our Windows pipelines.
```
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\usage.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<int>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
```
2023-12-14 16:15:07 -08:00
Changming Sun
b129f425fc
Fix test model URL issue (#18823)
### Description
ONNX model zoo changed its directory structure, so some of our pipelines are
failing. To prevent this from happening again, we should read the
test data from a cache on local disk instead of downloading it remotely
every time.
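
A cache-first lookup of this kind might look like the sketch below. It is not the change made in this commit: `get_test_data`, the `download` callback, and the file name are hypothetical, and the real pipelines may simply read from a pre-populated disk.

```python
import tempfile
from pathlib import Path


def get_test_data(name, cache_dir, download):
    """Return the local path of a test file, downloading only on a cache miss."""
    cache_dir = Path(cache_dir)
    cached = cache_dir / name
    if cached.exists():
        return cached  # cache hit: no network access needed
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so a failed download
    # never leaves a partial file in the cache.
    partial = cached.with_name(cached.name + ".part")
    download(name, partial)
    partial.replace(cached)
    return cached


# Demonstration with a stand-in downloader that records each call.
calls = []


def fake_download(name, dest):
    calls.append(name)
    Path(dest).write_bytes(b"model bytes")


with tempfile.TemporaryDirectory() as tmp:
    first = get_test_data("example_model.onnx", tmp, fake_download)
    second = get_test_data("example_model.onnx", tmp, fake_download)
    download_count = len(calls)
    same_path = first == second
```

The second lookup is served from disk, so a change in the remote directory layout only matters the first time a file is fetched.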
2023-12-14 13:06:08 -08:00
Changming Sun
95193cb440
Set NDK version in Linux CPU Minimal Build E2E CI Pipeline (#18810)
### Description
To upgrade the clang version in preparation for PR #17031.
2023-12-14 08:08:41 -08:00
Rachel Guo
f3fa045681
Enable MacOS build in ORT Objc Pod (#18786)
### Description
<!-- Describe your changes. -->

Add macos build for objc pod. 


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Follow up pr for #18550

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2023-12-13 13:50:42 -08:00
Changming Sun
17eaf9b053
Fix a build warning in SparseTensor code for 32-bit build configs (#18766)
### Description
The warning is:

```

                C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.1812949Z                  with
2023-12-08T20:58:48.2144272Z                  [
2023-12-08T20:58:48.2145285Z                      Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.2801935Z                  ]
2023-12-08T20:58:48.2804047Z        C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(82,8): message : while compiling class template member function 'void onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()(const onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const onnxruntime::SparseTensor &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2806197Z        C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(302,27): message : see the first reference to 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()' in 'onnxruntime::utils::mltype_dispatcher_internal::CallableDispatchableHelper::Invoke' (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2871783Z        C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(438,100): message : see reference to class template instantiation 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>' being compiled (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2893010Z        C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(414,5): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::InvokeWithLeadingTemplateArgs<Fn,onnxruntime::TypeList<>,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2894476Z                  with
2023-12-08T20:58:48.2911521Z                  [
2023-12-08T20:58:48.2912457Z                      Fn=onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,
2023-12-08T20:58:48.3067840Z                      T=onnxruntime::SparseTensor
2023-12-08T20:58:48.3068863Z                  ] (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc)
2023-12-08T20:58:48.3195854Z        C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,11): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke<onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3197946Z                  with
2023-12-08T20:58:48.3198565Z                  [
2023-12-08T20:58:48.3199093Z                      T=onnxruntime::SparseTensor
2023-12-08T20:58:48.3905678Z                  ]
2023-12-08T20:58:48.3907275Z        C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,36): message : see the first reference to 'onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke' in 'onnxruntime::contrib::SparseToDenseMatMul::Compute' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3910999Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.3912734Z    182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3913414Z                  with
2023-12-08T20:58:48.3913660Z                  [
2023-12-08T20:58:48.3914001Z                      Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.3914499Z                  ]
2023-12-08T20:58:48.3914743Z          qlinear_concat.cc
2023-12-08T20:58:48.3917082Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.3918624Z    182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5534583Z                  with
2023-12-08T20:58:48.5541266Z                  [
2023-12-08T20:58:48.5542401Z                      Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5544914Z                  ]
2023-12-08T20:58:48.5548670Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5552099Z    182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5553712Z                  with
2023-12-08T20:58:48.5555569Z                  [
2023-12-08T20:58:48.5556779Z                      Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5558707Z                  ]
2023-12-08T20:58:48.5561428Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5565624Z    182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5566354Z                  with
2023-12-08T20:58:48.5568185Z                  [
2023-12-08T20:58:48.5569305Z                      Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5571339Z                  ]
2023-12-08T20:58:48.5574864Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5577866Z    182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5578562Z                  with
2023-12-08T20:58:48.5580399Z                  [
2023-12-08T20:58:48.5581503Z                      Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5583465Z                  ]
2023-12-08T20:58:48.5587661Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5590705Z    182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5591396Z                  with
2023-12-08T20:58:48.5593220Z                  [
2023-12-08T20:58:48.5593693Z                      Derived=Eigen::Map<const Eigen::SparseMatrix<int64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5595955Z                  ]

```
And the warning in #18195
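
For context on why C4244 matters here: on 32-bit build configs `Eigen::Index` is typically a 32-bit signed type, so passing an `__int64` value narrows it. The sketch below only illustrates the hazard in Python and is not the fix applied in this commit; `truncate_to_int32` and `narrow_to_int32` are hypothetical helper names.

```python
import ctypes

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def truncate_to_int32(value):
    """Mimic an unchecked 64-bit to 32-bit signed conversion (ctypes does no overflow check)."""
    return ctypes.c_int32(value).value


def narrow_to_int32(value):
    """Checked narrowing: raise instead of silently changing the value."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in a 32-bit signed index")
    return value


small = truncate_to_int32(123)       # fits, value is preserved
wrapped = truncate_to_int32(2**31)   # does not fit, wraps to a negative number
```

A checked conversion makes the out-of-range case an explicit error rather than a silently wrong index.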



### Motivation and Context
AB#22894

---------

Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
2023-12-13 11:11:13 -08:00
Changming Sun
44054e7508
Move NuGet nightly package publishing job to a separated pipeline (#18801)
### Description
Move NuGet nightly package publishing job to a separated pipeline.
Before this change, it runs at the end of 'Zip-Nuget-Java-Nodejs
Packaging Pipeline'. This PR moves it to a separate pipeline so that we
can manually trigger this step for any branch (e.g. release branches).
2023-12-13 11:10:50 -08:00
Jian Chen
ce1fed6ddf
Adding a new pipeline for publishing to Python Cuda 12 packages. (#18712)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-12-11 14:17:46 -08:00
Jian Chen
bfa5eb4591
Adding a new pipeline for publishing cuda 12 nuget packages (#18713)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-12-11 13:07:05 -08:00
Ashwini Khade
16df8377d3
Update transformers package to fix the security issue (#18730)
### Description
Updating transformers package in test pipeline to fix a security
vulnerability.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-12-11 09:15:23 -08:00
cloudhan
de32baeeef
[ROCm] Add GemmFloat8 (#18488) 2023-12-11 11:37:29 +08:00
Changming Sun
bf33919afb
Update absl and gtest to fix an ARM64EC build error (#18735)
### Description
Update absl and gtest to fix an ARM64EC build error


### Motivation and Context
We need to get an important fix into ORT.
The fix is:

8028a87c96
2023-12-07 15:55:17 -08:00
Yi Zhang
a045be335b
use EO pool for windows web_cpu stage (#18737)
### Description
reuse EO pool in NPM pipeline.


### Motivation and Context
build_web_debug failed in onnxruntime-Win-CPU-2022 but it works in EO
pool.
Reuse EO pool to make the pipeline work now.
When I'm free, I'll try upgrading the chrome in the custom image.
2023-12-07 10:10:00 -08:00
Rachel Guo
7762f3f7c5
[NNAPI EP] Add NNAPI Split (#18702)
### Description
<!-- Describe your changes. -->

As title.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

yolo-v8 model missing operator support.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-12-06 15:11:15 -08:00
Adrian Lizarraga
559bd52252
[QNN EP] Update QNN SDK to version 2.17.0 (#18684)
### Description
- Update QNN CI Pipelines to use QNN SDK version 2.17.0
- **Print warning if unit test requires adjusted tolerance to pass**
- **Temporarily disable unloading QnnCpu.dll for windows x64 due to
crash when calling FreeLibrary**
- Enable fixed HTP tests
  - QnnHTPBackendTests.LayerNorm1D_LastAxis_DynamicScale
  - QnnHTPBackendTests.GlobalMaxPool_LargeInput2_u8
  - QnnHTPBackendTests.ReduceSumS8Opset13_Rank5
  - QnnHTPBackendTests.ReduceSumU8Opset13_Rank5_LastAxis
  - QnnHTPBackendTests.WhereLargeDataBroadcastU8
  - QnnHTPBackendTests.WhereLargeDataBroadcastTransformedU8
- Enabled fixed CPU tests
  - QnnCPUBackendTests.Resize_DownSample_Linear_AlignCorners_scales
- Increased tolerance for HTP tests that are less accurate on QNN SDK
2.17.0
  - QnnHTPBackendTests.AveragePool_CountIncludePad_HTP_u8
  - QnnHTPBackendTests.AveragePool_AutopadSameUpper_HTP_u8
  - QnnHTPBackendTests.AveragePool_AutopadSameLower_HTP_u8
  - QnnHTPBackendTests.ConvU8U8S32_bias_dynamic_input
  - QnnHTPBackendTests.ConvU8U8S32_bias_initializer
  - QnnHTPBackendTests.ConvU8U8S32_large_input1_padding_bias_initializer
  - QnnHTPBackendTests.LRNSize3
  - QnnHTPBackendTests.LRNSize5
  - QnnHTPBackendTests.MaxPool_Large_Input_HTP_u8
  - QnnHTPBackendTests.MaxPool_LargeInput_1Pads
  - QnnHTPBackendTests.Resize_DownSample_Linear_HalfPixel
  - QnnHTPBackendTests.ResizeU8_2xLinearPytorchHalfPixel
  - QnnHTPBackendTests.ResizeU8_2xLinearHalfPixel
  - QnnHTPBackendTests.ResizeU8_2xLinearAlignCorners
  - QnnHTPBackendTests.ResizeU8_2xLinearAsymmetric
- Disabled ONNX model tests
  - averagepool_2d_ceil: Accuracy issues **only on Windows x64
    QnnCpu.dll**
- Disabled QDQ model tests (onnx_test_runner)
  - facedetection_op8_qdq: Accuracy issues
- Disabled CPU EP tests (these use QnnCpu.dll)
  - ActivationOpTest.Relu: QNN SDK 2.17 Relu treats inf as FLT_MAX
  - GemmOpTypedTests/0.TestGemmBroadcast: Inaccuracy when weight is
    initializer and bias is not
  - MathOpTest.MatMulFloatType "test padding and broadcast B > A":
    Inaccuracy (**only linux**)
- Fix Gemm translation bugs in QNN EP:
  - Do not skip processing of inputs that need to be transposed.
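
The `ActivationOpTest.Relu` entry above notes that QNN SDK 2.17 Relu treats inf as FLT_MAX. The sketch below models that difference to show why a test expecting an exact match would fail. It is an illustration under that stated behavior, not QNN's implementation; `relu_reference` and `relu_saturating` are hypothetical names.

```python
import math
import struct

# Largest finite float32 value (C's FLT_MAX), decoded from its IEEE 754 bit pattern.
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]


def relu_reference(x):
    """Plain Relu: positive infinity passes through unchanged."""
    return max(x, 0.0)


def relu_saturating(x):
    """Relu that saturates at FLT_MAX, modelling the behavior described above."""
    return min(max(x, 0.0), FLT_MAX)


expected = relu_reference(math.inf)   # inf
actual = relu_saturating(math.inf)    # FLT_MAX, a finite value
outputs_match = expected == actual
```

For finite inputs the two functions agree; they differ only when the input is positive infinity.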

### Motivation and Context
- Allow testing with newest QNN SDK version
- Take advantage of improvements to enable new models.
2023-12-06 11:05:41 -08:00
Changming Sun
eaaf27015e
Remove EnvSetupScript parameter from win-ci.yml (#18662)
### Description
Make the code more consistent. Currently some TRT pipelines download TRT
binaries on-the-fly, while other TRT pipelines use a preinstalled
version. This PR makes them the same.
2023-12-01 15:30:16 -08:00
Rachel Guo
9c45fe4957
Fix macos xcframework test stage codesign info (#18649)
### Description
<!-- Describe your changes. -->

Remove the development ID and set code signing to not required in the test macOS
target.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix a failure in the iOS_Full_xcframwork stage in the
Zip-Nuget-Java-NodeJS packaging pipeline.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2023-12-01 14:47:46 -08:00
Jian Chen
d69842226b
Update the template files to correct stage to fix the python cuda 12 packaging pipeline (#18651) 2023-12-01 07:57:46 -08:00
Yi Zhang
efee9abdb7
Reduce downloads in Nuget-Java pipeline to reduce connection exception (#18635)
### Description
1. Add a new stage to download java tools from https://oss.sonatype.org
and publish them to pipeline artifact
2. Remove downloads in other jobs, they get the java tools from pipeline
artifact
3. Consolidate final_java_testing stages.


### Motivation and Context
Reduce downloads to reduce connection errors like the one below.

```
--2023-11-28 07:16:31--  https://oss.sonatype.org/service/local/repositories/releases/content/org/junit/platform/junit-platform-console-standalone/1.6.2/junit-platform-console-standalone-1.6.2.jar
Resolving oss.sonatype.org (oss.sonatype.org)... 3.227.40.198, 3.229.50.23
Connecting to oss.sonatype.org (oss.sonatype.org)|3.227.40.198|:443... connected.
HTTP request sent, awaiting response... 502 Bad Gateway
2023-11-28 07:16:32 ERROR 502: Bad Gateway.
```
2023-12-01 07:44:44 +08:00
Changming Sun
1b5675ff0f
Update post-merge-jobs.yml: increase timeout value for the Ios job (#18602) 2023-11-30 08:07:13 -08:00
Yi Zhang
68209307da
Replace all Azure-Pipelines-EO-Windows2022-aiinfrat to Onnxruntime-Win-CPU-2022 (#18614)
### Description
Replace all Azure-Pipelines-EO-Windows2022-aiinfrat to
Onnxruntime-Win-CPU-2022


### Motivation and Context
Reduce the maintenance cost
2023-11-29 10:32:42 -08:00
Edward Chen
14a343441d
Fix Objective-C static analysis build (#18606)
- Patch abseil to fix a compile error about not finding `cxxabi.h`.
- Fix some static analysis warnings.
2023-11-28 17:14:20 -08:00
Jian Chen
a49f31b670
Remove drop-nuget artifact from all pipelines (#18592)
### Description
Currently, the `drop-nuget` artifact only contains protoc.exe which is
also part of the `drop-extra` artifact.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-11-28 13:23:01 -08:00
Mike Guo
e24733cfe9
fix the Olive CI pipeline failure on Windows (#18464)
Fix the https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1046
failure for Windows
2023-11-28 11:42:39 -08:00
Rachel Guo
288b80d363
Add MacOS build to ORT C Pod (#18550)
### Description
<!-- Describe your changes. -->

As title.

1. Add macos build as an optionally enabled arch for pod and changes to
existing build_ios_framework/assemble_c_pod scripts.
2. Enable macos build arch in ios packaging pipeline (currently for
variants other than Mobile) and check the output artifacts are correct.
3. Write MacOS Test Target scheme in the test app and integrate into ios
packaging CI testing pipeline.

Currently the changes only apply to the onnxruntime-c pod, as the original
request was from ORT SPM, which consumes only the onnxruntime-c pod as
the binary target. TODO: could look into adding macos platform to objc
pod as well.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enable macos platform support in cocoapods, and potentially produce a
binary target for enabling macos platform in SPM as well.

Replace https://github.com/microsoft/onnxruntime/pull/18334

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-11-28 10:11:53 -08:00
Yi Zhang
a6d8726407
Update ADO windows image to custom image (#18598)
### Description
Update Azure-Pipelines-EO-Windows2022-aiinfra to
onnxruntime-win-CPU-2022 in Nuget_Package_CPU.
To make the debugging easier, use flex-downloadPipelineArtifact

### Motivation and Context
Azure-Pipelines-EO-Windows2022-aiinfra uses the 1ES window-latest image.
The pipeline might fail after an unexpected upgrade.
Verified:
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=384425&view=results

### P.S.
I think we should replace all Azure-Pipelines-EO-Windows2022-aiinfra.
2023-11-28 09:04:25 -08:00
Jian Chen
3ea27c2925
Create a new Nuget Package pipeline for CUDA 12 (#18135) 2023-11-28 09:03:46 -08:00
Rachel Guo
62f00ad8e7
[CoreML] Add Softmax and Split op support (#18358)
### Description
<!-- Describe your changes. -->

As title.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Added for yolov8 model missing operator support.
https://github.com/microsoft/onnxruntime/issues/17654

Now the model support info looks like:
 
_CoreMLExecutionProvider::GetCapability, number of partitions supported
by CoreML: 3 number of nodes in the graph: 233 number of nodes supported
by CoreML: 230_

(only 3 Concat ops are still unsupported, because 3D input shapes are not
currently supported in CoreML EP Concat).
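
The figures in the support message can be turned into a coverage percentage as sketched below. The pattern assumes the message wording quoted in this commit; the exact log format in a real run may differ, and `parse_capability` is a hypothetical helper.

```python
import re

# Message text as quoted in the commit description.
LOG_LINE = (
    "CoreMLExecutionProvider::GetCapability, number of partitions supported "
    "by CoreML: 3 number of nodes in the graph: 233 number of nodes supported "
    "by CoreML: 230"
)

PATTERN = re.compile(
    r"number of partitions supported\s+by CoreML: (?P<partitions>\d+)\s+"
    r"number of nodes in the graph: (?P<total>\d+)\s+"
    r"number of nodes supported\s+by CoreML: (?P<supported>\d+)"
)


def parse_capability(line):
    """Extract partition and node counts from a GetCapability summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a GetCapability summary line")
    return {key: int(value) for key, value in match.groupdict().items()}


stats = parse_capability(LOG_LINE)
unsupported = stats["total"] - stats["supported"]
coverage = round(100.0 * stats["supported"] / stats["total"], 1)
```

Here 230 of 233 nodes are supported, which leaves the 3 unsupported nodes mentioned in the commit.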

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-11-23 14:26:57 -08:00
cloudhan
6f3c1f9dc9
[ROCm] Update ck for GemmFloat8 (#18487) 2023-11-23 12:06:19 +08:00
Yulong Wang
d455b0f8fd
[js/web] use Chrome in CI for npm tests (#18522)
### Description
Use Chrome in CI for npm tests. Previously we used Edge, but it
sometimes crashed for reasons not yet identified.
2023-11-21 18:03:57 -08:00
Abhishek Jindal
680a526e73
Training packaging pipeline for cuda12 (#18524)
### Description
<!-- Describe your changes. -->
Build ORT-training packaging pipeline for CUDA 12.2


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This will help any customer using CUDA 12, who would then not need to build
ORT-training from source.

Test run:
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=382993&view=logs&s=130be951-c2f3-5601-5709-434b5e50ddb0
2023-11-21 13:19:21 -08:00
Jian Chen
1dd9bf5340
Remove setup_env_azure.bat (#18482)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-11-20 09:58:15 -08:00
Jian Chen
d97fc1824f
Create a new Python Package pipeline for CUDA 12 (#18348)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-11-20 09:48:28 -08:00
Wei-Sheng Chin
3bcc137eb4
Tiny change to trigger the update of DORT's CI image (#18507)
Recent PyTorch breaks DORT CI and [a
patch](https://github.com/pytorch/pytorch/pull/113697) has been merged
into PyTorch main. In order to update DORT's CI, we made a dummy change in
this PR.
2023-11-19 22:09:11 -08:00
Changming Sun
9364c05170
Update web-ci.yml: remove depth=1 (#18500)
### Description
It causes our "NPM Packaging Pipeline" to fail.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-11-17 22:49:03 -08:00
Changming Sun
41f9379f3c
Update NDK version to 26.1.10909125 (#18493)
### Description
Similar to #17852


### Motivation and Context
To avoid downloading NDK
2023-11-17 14:14:01 -08:00
Changming Sun
5eb5056c61
Always run emsdk_env.sh before build.py, even when ccache is disabled (#18477)
### Description
Always run emsdk_env.sh before build.py, even when ccache is disabled

This is a follow up to #18434. That PR didn't handle the case when
ccache was disabled.
2023-11-16 21:37:29 -08:00
Jian Chen
05526b354b
Adding new yaml file for downloading cuda, and trt from azure blob (#18443)
This also sets the Path variable for the downloaded libraries.

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-11-14 19:47:39 -08:00