onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-21 02:18:09 +00:00

Author	SHA1	Message	Date
Edward Chen	535e9d7114	Update package_release_tasks.py (#20835 ) 1. Move azcopy environment variables out of script and into an Azure DevOps variable group. Move towards consolidating the managed identity client ID definition in one place. 2. Disable azcopy overwrite. We don't want to accidentally change the files for a released package.	2024-05-28 17:50:25 -07:00
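The following Python sketch illustrates both points. The managed identity settings stay outside the script: they are simply expected in the environment, as a variable group would provide them. Overwriting is disabled. `build_azcopy_command` is hypothetical and is not the actual package_release_tasks.py code. It also assumes that the variable group supplies azcopy's `AZCOPY_AUTO_LOGIN_TYPE` and `AZCOPY_MSI_CLIENT_ID` auto-login variables. `--overwrite=false` is azcopy's own flag.

```python
from typing import List, Mapping

# Variables that azcopy itself reads when auto-logging in with a managed
# identity (assumption: the pipeline's variable group provides them).
REQUIRED_ENV = ("AZCOPY_AUTO_LOGIN_TYPE", "AZCOPY_MSI_CLIENT_ID")


def build_azcopy_command(source: str, destination: str, env: Mapping[str, str]) -> List[str]:
    """Build an `azcopy copy` command that never replaces existing blobs.

    Hypothetical helper, not the real package_release_tasks.py code.
    """
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if missing:
        raise RuntimeError(f"must be set by the pipeline, not the script: {', '.join(missing)}")
    # --overwrite=false skips destination files that already exist, so files
    # belonging to an already released package cannot be changed by accident.
    return ["azcopy", "copy", source, destination, "--overwrite=false"]
```

The script only checks that the variables are present and never defines them. This keeps the client ID in one place, the variable group.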
Adrian Lizarraga	e78b18a2fb	Increase ComponentDetection timeout for React Native CI (#20800 ) ### Description Runs of the React Native CI are timing out during ComponentDetection after 8 minutes. This increases the timeout value. ### Motivation and Context Runs of the React Native CI are timing out during ComponentDetection.	2024-05-28 08:36:38 -07:00
Jian Chen	b1b8cb05dc	Adding java build and packaging stage to cuda-packaging-pipeline.yml (#20812 ) ### Description Adds a Java build/packaging stage to `cuda-packaging-pipeline.yml` ### Motivation and Context This way we can publish the Java CUDA 12 package along with the NuGet CUDA 12 package.	2024-05-27 07:59:19 -07:00
Changming Sun	439ed92b96	Remove TVM EP's pipeline (#20813 ) ### Description Temporarily remove TVM EP's pipeline until someone helps us upgrade TVM to a newer version which is compatible with the latest ONNX. ### Motivation and Context The ONNX version that TVM EP uses has a known security vulnerability. We cannot continue using it in our hosted build environment. This change is temporary	2024-05-25 20:42:41 -07:00
Jian Chen	fe24006425	Fix Nuget Cuda pipeline package pipeline (#20741 ) ### Description This PR adds protoc.exe to fix the NuGet CUDA pipeline, which also allows it to build Java for various CUDA versions.	2024-05-24 09:15:57 -07:00
Changming Sun	535a030b1e	Remove manylinux build scripts from python packaging pipeline (#20786 ) ### Description Use a common set of prebuilt manylinux base images to build the packages, to avoid building the manylinux part again and again. The base images can be used in GenAI and other projects too. This PR also updates the GCC version for inference python CUDA11/CUDA12 builds from 8 to 11. Later on I will update all other CUDA pipelines to use GCC 11, to avoid the issue described in https://github.com/onnx/onnx/issues/6047 and https://github.com/microsoft/onnxruntime-genai/issues/257 . ### Motivation and Context To extract the common part as a reusable build infra among different ONNX Runtime projects.	2024-05-24 08:18:22 -07:00
Jian Chen	884acd4598	Fix Nuget-Cuda publish pipeline (#20794 ) ### Description Previously, all feeds were set to nightly; the official release feed ID was not set.	2024-05-23 18:27:46 -07:00
Changming Sun	b522df0ae4	Update RE2 to the latest (#20775 ) Update RE2 to the latest. To keep the components up to date.	2024-05-23 14:30:15 -07:00
Yulong Wang	0996d6e19e	[tools] update pipeline list for run_CIs_for_external_pr.py (#20776 ) ### Description add required pipeline "Linux Android Emulator QNN CI Pipeline"	2024-05-23 10:38:42 -07:00
Yi Zhang	fa8670fe5b	Add a test image for stable diffusion (#20780 )	2024-05-23 08:50:23 -07:00
Jian Chen	d4fe4b5b51	Replace ubuntu-latest with onnxruntime-Ubuntu2204-AMD-CPU (#20736 )	2024-05-22 13:36:02 -07:00
Jian Chen	0a10a3003a	component-governance fix round 4 (#20754 )	2024-05-22 11:05:24 -07:00
Jian Chen	372974e5d6	Using CPU pool to build Linux GPU C API Package (#20648 )	2024-05-20 15:25:14 -07:00
Jian Chen	ddafbf2224	Component Governance fix round 3 (#20689 )	2024-05-20 13:39:09 -07:00
Jian Chen	11df22b59b	Reenabling Nuget Cuda Packaging Pipeline (#20688 )	2024-05-20 10:37:15 -07:00
Edward Chen	fefae0cd04	Add Mac CI GitHub Actions workflow (#20717 ) Add a new GitHub Actions workflow, `.github/workflows/mac.yml`. It contains these jobs: - ARM64 MacOS CI build. - Objective-C static analysis build. This was moved over from another Azure DevOps pipeline to make it more visible.	2024-05-20 10:27:03 -07:00
Yulong Wang	036fcd93d4	[js/web] optimize module export and deployment (#20165 ) ### Description This PR makes a number of optimizations to onnxruntime-web's module export and deployment. See each section below for more details. #### Preview > [onnxruntime-web@1.19.0-esmtest.20240513-a16cd2bd21](https://www.npmjs.com/package/onnxruntime-web/v/1.19.0-esmtest.20240513-a16cd2bd21) > ~~onnxruntime-web@1.19.0-esmtest.20240430-c7edbcc63d~~ > ~~onnxruntime-web@1.18.0-esmtest.20240428-624c681c83~~ > ~~onnxruntime-web@1.18.0-esmtest.20240411-1abb64e894~~ <details> <summary><h4>Breaking changes</h4></summary> No code changes are required, but there are a few differences regarding code import, flags, bundler config and deployment steps. #### Importing: The import table has changed. See the following for details. <details> <summary><h5>Current import table:</h5></summary> \| Target Name \| Path for "import" or "require" \| WebGL \| JSEP \| wasm \| Proxy \| Training \| \|------\|-----\|-----\|-----\|-----\|-----\|-----\| \| `ort` (default) \| `onnxruntime-web` \| ✔️ \| ❌ \| ✔️ \| ✔️ \| ❌ \| \| `ort.all` \| `onnxruntime-web/experimental` \| ✔️ \| ✔️ \| ✔️ \| ✔️ \| ❌ \| \| `ort.node` \| `onnxruntime-web` \| ❌ \| ❌ \| ✔️ \| ❌ \| ❌ \| \| `ort.training` \| `onnxruntime-web/training` \| ❌ \| ❌ \| ✔️ \| ✔️<sup>\[1]</sup> \| ✔️ \| \| `ort.wasm` \| `onnxruntime-web/wasm` \| ❌ \| ❌ \| ✔️ \| ✔️ \| ❌ \| \| `ort.wasm-core` \| `onnxruntime-web/wasm-core` \| ❌ \| ❌ \| ✔️ \| ❌ \| ❌ \| \| `ort.webgl` \| `onnxruntime-web/webgl` \| ✔️ \| ❌ \| ❌ \| ✔️<sup>\[2]</sup> \| ❌ \| \| `ort.webgpu` \| `onnxruntime-web/webgpu` \| ❌ \| ✔️ \| ✔️ \| ✔️ \| ❌ \| * [1] Not tested; may not actually work. * [2] Not working; this is a mistake in the build config. 
</details> <details> <summary><h5>Proposed update:</h5></summary> \| Target Name \| Path for "import" or "require" \| WebGL \| JSEP \| wasm \| Proxy \| Training \| \|------\|-----\|-----\|-----\|-----\|-----\|-----\| \| `ort` (default) \| `onnxruntime-web` \| ✔️ \| ❌ \| ✔️ \| ✔️ \| ❌ \| \| `ort.all` \| ~~`onnxruntime-web/experimental`~~<br/>`onnxruntime-web/all` \| ✔️ \| ✔️ \| ✔️ \| ✔️ \| ❌ \| \| `ort.node` \| `onnxruntime-web` \| ❌ \| ❌ \| ✔️ \| ❌ \| ❌ \| \| `ort.training` \| `onnxruntime-web/training` \| ❌ \| ❌ \| ✔️ \| ✔️ \| ✔️ \| \| `ort.wasm` \| `onnxruntime-web/wasm` \| ❌ \| ❌ \| ✔️ \| ✔️ \| ❌ \| \| ~~`ort.wasm-core`~~ \| ~~`onnxruntime-web/wasm-core`~~ \| ~~❌~~ \| ~~❌~~ \| ~~✔️~~ \| ~~❌~~ \| ~~❌~~ \| \| `ort.webgl` \| `onnxruntime-web/webgl` \| ✔️ \| ❌ \| ❌ \| ~~✔️~~ ❌ \| ❌ \| \| `ort.webgpu` \| `onnxruntime-web/webgpu` \| ❌ \| ✔️ \| ✔️ \| ✔️ \| ❌ \| </details> #### Flags: The following flags are deprecated: - `env.wasm.simd` (boolean): will be ignored. SIMD is always enabled in the build. The following flags have changed their type: - `env.wasm.wasmPaths`: When using this flag as a string (for the URL prefix), nothing changes. When using this flag as an object (for per-file path override), the type has changed: ```diff - export interface Old_WasmFilePaths{ - 'ort-wasm.wasm'?: string; - 'ort-wasm-threaded.wasm'?: string; - 'ort-wasm-simd.wasm'?: string; - 'ort-training-wasm-simd.wasm'?: string; - 'ort-wasm-simd-threaded.wasm'?: string; - }; + export interface New_WasmFilePaths { + /** + * Specify the override path for the main .wasm file. + * + * This path should be an absolute path. + * + * If not modified, the filename of the .wasm file is: + * - `ort-wasm-simd-threaded.wasm` for default build + * - `ort-wasm-simd-threaded.jsep.wasm` for JSEP build (with WebGPU and WebNN) + * - `ort-training-wasm-simd-threaded.wasm` for training build + */ + wasm?: URL|string; + /** + * Specify the override path for the main .mjs file. 
+ * + * This path should be an absolute path. + * + * If not modified, the filename of the .mjs file is: + * - `ort-wasm-simd-threaded.mjs` for default build + * - `ort-wasm-simd-threaded.jsep.mjs` for JSEP build (with WebGPU and WebNN) + * - `ort-training-wasm-simd-threaded.mjs` for training build + */ + mjs?: URL|string; + } ``` #### Bundler compatibility: Config changes are needed for bundlers. See the usage examples in /js/web/test/e2e/ for Webpack, parcel and rollup. #### Deployment: - If consuming from a CDN, there is no breaking change. - If consuming from a local server, you need to copy all `ort-.wasm` and `ort-.mjs` files (6 files in total) from the dist folder. (Previously, only the `ort-.wasm` files needed to be copied.) </details> <details> <summary><h4>Problems</h4></summary> There are a few problems with the current module export and deployment: - The script URL cannot be correctly inferred when imported as ESM. - Workers are forcibly encoded using Blob URLs, which prevents onnxruntime-web from working in CSP environments and in Node.js when using the proxy or multi-threading features. - Generated JS code (by Emscripten) is encoded using `function.toString()`, which is unstable and error-prone. - Running with a different Emscripten build always requires the build step, making it difficult to swap artifacts during development/debugging. </details> <details> <summary><h4>Goals</h4></summary> - Full ESM support - Support various ways to import. 
Including: - import from HTML's `<script>` tag (IIFE format, exporting to global variable `ort`) ```html <script src="https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.js"></script> ``` - import from source code inside `<script type="module">` tag (ESM) ```html <script type="module"> import * as ort from "https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.mjs"; // using 'ort' </script> ``` - import in a CommonJS project (CJS format, resolve from package.json "exports" field) ```js // myProject/main.js const ort = require('onnxruntime-web'); ``` - import in an ESM project (ESM format, resolve from package.json "exports" field) ```js // myProject/main.js (or main.mjs) import * as ort from 'onnxruntime-web'; ``` - Support popular bundlers when importing onnxruntime-web into a CJS/ESM project. - webpack (esm requires extra post-process step) - rollup - parcel (esm requires extra post-process step) - More bundlers TBD - Multi-threading support for Node.js NOTE: keeping a single JavaScript file (the all-in-one bundle) is no longer a goal, because it technically conflicts with the other requirements. </details> <details> <summary><h4>Important Design Decisions</h4></summary> - Drop support for single JavaScript output. - The current onnxruntime-web distribution uses a single JavaScript file to include all code. While this has a few benefits, it also creates the problems mentioned above. Since ESM is used more and more widely, and browsers enforce increasingly strict security checks and requirements, the old Blob-based solution is going to be replaced. - To meet the requirements, specifically CSP environment support, we have to offer a non-Blob-based solution. Therefore, we have to distribute multiple files and drop the single-file solution. - Do not run parser/postprocess on Emscripten-generated JavaScript. 
- Emscripten is evolving quickly, so we should depend only on what's in its documentation rather than on specific implementation details (for example, we currently patch its code to deal with a special variable `_scriptDir`). - Keeping the generated files as-is also helps to: - reduce the size of ort.min.js - make it easier to replace build artifacts during development/debugging - Drop support for non-SIMD and non-multi-threaded builds. This helps reduce the number of artifacts in the distribution. - (Fixed-width) SIMD is supported in every mainstream JS environment. - Multi-threading as a WebAssembly feature is supported in every mainstream JS environment. In some environments the feature is guarded by cross-origin policy, but it can still work as long as no worker is created. - Use ESM output for Emscripten-generated JavaScript. - There are 2 ways to dynamically import classic (UMD) modules, and neither is recommended: - dynamically creating a `<script>` tag. This changes the HTML structure and has quite a lot of compatibility issues. - using `fetch()` and `eval()`. However, `eval` is strongly discouraged because of its large performance hit. - Importing ESM is easy: just use the `import()` call. Since ESM is widely supported in modern browsers and Node.js, this is the better option. - Add a Blob-based solution as a fallback for cross-origin workers. - Importing onnxruntime-web from a CDN is still a common use case. In this case, workers can be created by using `fetch()`+`Blob` to create a same-origin Blob URL. </details> <details> <summary><h4>Distribution File Manifest</h4></summary> The distribution folder contains the following files: - WebAssembly artifacts. These files are the result of compiling the ONNX Runtime C++ code to WebAssembly by Emscripten. 
\| File Name \| Build Flags \| \|------\|-----\| \| ort-wasm-simd-threaded.mjs <br/> ort-wasm-simd-threaded.wasm \| `--enable_wasm_simd` <br/> `--enable_wasm_threads` \| \| ort-training-wasm-simd-threaded.mjs <br/> ort-training-wasm-simd-threaded.wasm \| `--enable_training_apis` <br/> `--enable_wasm_simd` <br/> `--enable_wasm_threads` \| \| ort-wasm-simd-threaded.jsep.mjs <br/> ort-wasm-simd-threaded.jsep.wasm \| `--enable_wasm_simd` <br/> `--enable_wasm_threads` <br/> `--use_jsep` <br/> `--use_webnn` \| - onnxruntime-web JavaScript artifacts. These files are generated by ESBuild as the entry point for onnxruntime-web. There are multiple build targets for different use cases: \| Target Name \| Path for "import" or "require" \| Description \| \|------\|-----\|-----\| \| `ort` \| `onnxruntime-web` \| The default target. \| \| `ort.all` \| `onnxruntime-web/all` \| The target including webgl. \| \| `ort.node` \| `onnxruntime-web` \| The default target for Node.js. \| \| `ort.training` \| `onnxruntime-web/training` \| The target including training APIs \| \| `ort.wasm` \| `onnxruntime-web/wasm` \| The target including only WebAssembly (CPU) EP \| \| `ort.webgl` \| `onnxruntime-web/webgl` \| The target including only WebGL EP \| For each target, there are multiple files generated: \| File Name \| Description \| \|------\|-----\| \| [target].js \| The entry point for the target. IIFE and CommonJS format. \| \| [target].mjs \| The entry point for the target. ESM format. \| \| [target].min.js <br/> [target].min.js.map \| The entry point for the target. Minimized with sourcemap. IIFE and CommonJS format. \| \| [target].min.mjs <br/> [target].min.mjs.map \| The entry point for the target. Minimized with sourcemap. ESM format. \| \| [target].proxy.mjs \| (if applicable) The proxy ESM module for the target. \| \| [target].proxy.min.mjs <br/> [target].proxy.min.mjs.map \| (if applicable) The proxy ESM module for the target. Minimized with sourcemap. 
\| </details> <details> <summary><h4>Dynamic Import Explained</h4></summary> - Local Served \| No Proxy: ``` [Bundle or ort.min.js] \| + import()--> [ort-wasm-simd-threaded.mjs] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] \| + new Worker()--> [ort-wasm-simd-threaded.mjs (worker)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] ``` - Local Served \| Proxy: ``` [Bundle or ort.min.js] \| + import()--> [ort.proxy.min.mjs] \| + new Worker()--> [ort.proxy.min.mjs (worker)] \| + import()--> [ort-wasm-simd-threaded.mjs] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] \| + new Worker()--> [ort-wasm-simd-threaded.mjs (worker)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] ``` - Cross Origin \| No Proxy: ``` [Bundle or ort.min.js] \| + fetch('ort-wasm-simd-threaded.mjs') \| + URL.createObjectURL(res.blob()) \| + import()--> [blob:... (ort-wasm-simd-threaded)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] \| + new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] ``` - Cross Origin \| Proxy ``` [Bundle or ort.min.js] \| + fetch('ort.proxy.min.mjs') \| + URL.createObjectURL(res.blob()) \| + import()--> [blob:... (ort.proxy)] \| + new Worker()--> [blob:... (ort.proxy) (worker)] \| + fetch('ort-wasm-simd-threaded.mjs') \| + URL.createObjectURL(res.blob()) \| + import()--> [blob:... (ort-wasm-simd-threaded)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] \| + new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] ``` </details>	2024-05-20 09:51:16 -07:00
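In the local-server deployment case above, the copy step can be scripted. The Python sketch below copies exactly the six WebAssembly artifacts named in the Distribution File Manifest (three .mjs/.wasm pairs) from a dist folder into a web root. The function name and directory layout are assumptions for illustration. This helper is not part of onnxruntime-web.

```python
import shutil
from pathlib import Path
from typing import List

# The six Emscripten artifacts listed in the Distribution File Manifest:
# one .mjs loader plus its .wasm binary for each of the three builds.
WASM_ARTIFACTS = [
    f"{stem}.{ext}"
    for stem in (
        "ort-wasm-simd-threaded",           # default build
        "ort-training-wasm-simd-threaded",  # training build
        "ort-wasm-simd-threaded.jsep",      # JSEP build (WebGPU and WebNN)
    )
    for ext in ("mjs", "wasm")
]


def deploy_wasm_artifacts(dist_dir: Path, web_root: Path) -> List[Path]:
    """Copy every .mjs/.wasm pair into `web_root` (hypothetical helper).

    Fails before copying anything if one of the six files is missing, so a
    half-deployed set of artifacts is never left behind.
    """
    missing = [name for name in WASM_ARTIFACTS if not (dist_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {dist_dir}: {', '.join(missing)}")
    web_root.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(dist_dir / name, web_root / name)) for name in WASM_ARTIFACTS]
```

Each .mjs file is deployed alongside its .wasm file. The diagrams above show that the entry point loads the .mjs with `import()` (and workers load it again) before the .wasm is instantiated.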
Edward Chen	e81c8676e3	MatMulNBits + Add fusion (#20587 ) - Add MatMulNBits Bias input - Add graph transformer to fuse MatMulNBits + Add	2024-05-16 11:00:59 -07:00
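The fusion can be pictured on a toy graph. Suppose an Add consumes the output of a MatMulNBits node, its other operand is a constant bias, and the Add is the only consumer. In that case the Add is removed and the bias becomes an extra MatMulNBits input. This is a simplified sketch, not the ONNX Runtime graph transformer. The `Node` class and the list-based graph are hypothetical. Bias is assumed to be input index 5, after A, B, scales, zero_points and g_idx, with empty strings standing in for omitted optional inputs as in ONNX. Shape checks and graph outputs are ignored.

```python
from dataclasses import dataclass
from typing import List, Set

BIAS_INPUT_INDEX = 5  # assumption: A, B, scales, zero_points, g_idx, bias


@dataclass
class Node:
    op_type: str
    inputs: List[str]
    outputs: List[str]


def fuse_matmulnbits_add(nodes: List[Node], initializers: Set[str]) -> List[Node]:
    """Fold Add(MatMulNBits(...), bias) into a MatMulNBits with a bias input."""
    result = list(nodes)
    for add in [n for n in nodes if n.op_type == "Add"]:
        for i, name in enumerate(add.inputs):
            producer = next((n for n in result if name in n.outputs), None)
            if producer is None or producer.op_type != "MatMulNBits":
                continue
            bias = add.inputs[1 - i]
            consumers = [n for n in result if name in n.inputs]
            # Only fuse a constant bias, and only if the Add is the sole consumer.
            if bias not in initializers or len(consumers) != 1:
                continue
            if len(producer.inputs) > BIAS_INPUT_INDEX:
                continue  # the node already has a bias input
            # Pad skipped optional inputs with "" and append the bias.
            producer.inputs += [""] * (BIAS_INPUT_INDEX - len(producer.inputs)) + [bias]
            producer.outputs = list(add.outputs)  # fused node now produces Add's output
            result.remove(add)
            break
    return result
```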
Yifan Li	47a178b518	[EP Perf] Fix on EP Perf (#20683 ) ### Description * Partially revert [previous change](https://github.com/microsoft/onnxruntime/pull/19804), and * Redo the concurrency_test_result parser outside of post.py * Add support for syncing memtest results to the DB ### Motivation and Context Fixes the error when CI runs on two model groups. - When running on two model groups, the [previous change](https://github.com/microsoft/onnxruntime/pull/19804) wrongly navigates two directory levels up after running one model group, when only one level is needed. As a result, the script can't find the other model group. - Running on a single model group does not reproduce the issue.	2024-05-15 21:38:52 -07:00
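The bug above is a common pitfall of changing the working directory inside a loop. This minimal Python sketch shows a safer pattern. Every model group is resolved against a fixed base directory, and the previous working directory is restored after each group, so the number of levels to walk back up never matters. `run_model_groups` and its `run` callback are hypothetical and are not taken from post.py or the EP Perf scripts.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, List


def run_model_groups(base_dir: Path, groups: Iterable[str],
                     run: Callable[[Path], None]) -> List[Path]:
    """Run each model group from its own directory, then restore the cwd."""
    base_dir = base_dir.resolve()  # absolute, so later chdir calls can't skew it
    visited = []
    for group in groups:
        group_dir = base_dir / group
        previous = Path.cwd()
        os.chdir(group_dir)
        try:
            run(group_dir)
        finally:
            os.chdir(previous)  # always return to where this iteration started
        visited.append(group_dir)
    return visited
```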
Jian Chen	d1e66f0446	Increase NPM ComponentDetection.Timeout: 1200 (#20681 )	2024-05-15 13:41:59 -07:00
Jian Chen	87ed1e3e3f	Component governance fix round 2 (#20679 )	2024-05-14 17:15:15 -07:00
Edward Chen	113aa2992f	Update React Native CI (#20673 ) - Move iOS package build to separate job so it can run in parallel with Android AAR build and be decoupled from the test stage. The test stage fails sometimes (not infrequently) and may need to be re-run. - Update stop iOS simulator step so it doesn't fail if the start step doesn't run.	2024-05-14 14:10:56 -07:00
Jian Chen	83a871f890	Fix critical and High issues from Component Governance (#20611 )	2024-05-14 09:17:23 -07:00
Hector Li	0e11d0c4f8	Enable Qnn nuget nightly (#20662 ) ### Description Enable Qnn nuget nightly	2024-05-13 21:28:43 -07:00
Yi Zhang	c131ea89e1	Nuget Publish pipelines should be triggered by rel-* automatically too. (#20652 ) ### Description Also set allowPackageConflicts = True. `#allowPackageConflicts: false # boolean. Optional. Use when command = push && nuGetFeedType = internal. Allow duplicates to be skipped. Default: false.` https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/nuget-command-v2?view=azure-pipelines If publishing partially fails, we don't need to rerun the whole package generation workflow.	2024-05-13 13:18:16 -07:00
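Allowing duplicates to be skipped is what makes the publish step safe to rerun. The Python sketch below shows this with an idempotent push: packages whose (id, version) are already in the feed are skipped rather than treated as errors, so a rerun only pushes what failed before. The `publish` function and the in-memory feed are hypothetical stand-ins for the NuGet task and an internal feed.

```python
from typing import Iterable, List, Set, Tuple

Package = Tuple[str, str]  # (package id, version)


def publish(packages: Iterable[Package], feed: Set[Package]) -> Tuple[List[Package], List[Package]]:
    """Push packages to `feed`, skipping ones that already exist.

    Returns (pushed, skipped). Without duplicate skipping, pushing an
    existing package would instead make the push fail.
    """
    pushed, skipped = [], []
    for pkg in packages:
        if pkg in feed:
            skipped.append(pkg)  # already published by an earlier, partially failed run
        else:
            feed.add(pkg)
            pushed.append(pkg)
    return pushed, skipped
```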
Edward Chen	90d49ccb9a	Allow path pattern to be specified in package_release_tasks.py. (#20650 ) Do more in the Python helper script so the Bash code in the release definition can be simplified.	2024-05-13 09:16:04 -07:00
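A path pattern of this kind can be handled with Python's standard `fnmatch` module. This sketch is hypothetical: the function name, its signature and the example patterns are not taken from package_release_tasks.py.

```python
import fnmatch
from pathlib import Path
from typing import List


def find_release_files(root: Path, pattern: str) -> List[Path]:
    """Return files under `root` whose POSIX-style relative path matches `pattern`.

    fnmatchcase is used so matching is case-sensitive on every OS. Note that
    fnmatch's `*` also matches `/`, so "*.zip" matches nested files too.
    """
    return [
        path
        for path in sorted(root.rglob("*"))
        if path.is_file() and fnmatch.fnmatchcase(path.relative_to(root).as_posix(), pattern)
    ]
```

With this approach, the Bash code in the release definition only has to pass a pattern string, for example `pods/*.zip`.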
Jian Chen	4fe565a62a	Java CUDA 12 support (#20583 ) ### Description - This PR combines all CUDA 12 stages into the Zip-nuget-... pipeline. - It also enables CUDA 12 support.	2024-05-10 14:16:22 -07:00
George Wu	a0c4bd4da7	[qnn ep] sign onnxruntime.dll/pyd for qnn packages (#20634 ) sign only onnxruntime.dll and onnxruntime_pybind11_state.pyd in packages.	2024-05-09 20:45:44 -07:00
Yi Zhang	5a18818e1d	Migrate training storage from SAS to managed identity (#20618 ) ### Description orttrainingtestdatascus only stores MNIST, which is only 64M, in Azure Files. To meet security requirements and reduce maintenance cost, move the test data to lotusscus and save it in Azure Blob Storage.	2024-05-09 15:44:29 -07:00
Jian Chen	d1cbb3e076	The time for nuget pkg should be consistent (#20522 ) This pull request primarily involves changes to the build scripts in the `tools/ci_build/github/azure-pipelines` directory. The changes add build date and time information to the build process. This is achieved by introducing two new parameters, `BuildDate` and `BuildTime`, and incorporating them into the `msbuildArguments` in multiple locations. Addition of new parameters: * `tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`: Added `BuildDate` and `BuildTime` parameters using the pipeline's start time. Incorporation of new parameters in `msbuildArguments`: * `tools/ci_build/github/azure-pipelines/c-api-noopenmp-packaging-pipelines.yml`: Added `CurrentDate` and `CurrentTime` parameters to `msbuildArguments` in multiple locations. * `tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`: Incorporated the `CurrentDate` and `CurrentTime` parameters into `msbuildArguments`.	2024-05-09 11:35:45 -07:00
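Deriving both values from the pipeline's start time gives every job the same date and time, regardless of when each job actually runs. The minimal Python sketch below computes both strings once and renders them as MSBuild `/p:` properties. The helper name and the `yyyyMMdd`/`HHmmss`-style formats are assumptions for illustration, because the pipeline's actual formats are not shown here.

```python
from datetime import datetime, timezone
from typing import List


def msbuild_time_args(pipeline_start: datetime) -> List[str]:
    """Render CurrentDate/CurrentTime MSBuild properties from one timestamp.

    The date/time formats here are illustrative assumptions.
    """
    start = pipeline_start.astimezone(timezone.utc)  # normalize to UTC
    return [
        f"/p:CurrentDate={start:%Y%m%d}",
        f"/p:CurrentTime={start:%H%M%S}",
    ]
```

Each job that passes the same `pipeline_start` receives identical arguments, so all NuGet packages share one timestamp.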
Edward Chen	a0db2187ee	Update CocoaPods package release script. (#20608 ) - Update method for uploading to Azure storage to use managed identity. - Allow helper script tasks to be split across different calls. - Rewrite helper script in Python. Motivation: Recently the Azure storage account configuration was changed and now the old way of uploading to it no longer works.	2024-05-08 16:17:26 -07:00
Changming Sun	08b637350a	Remove an extra space in azure_scale_set_vm_mount_test_data.sh (#20584 )	2024-05-08 09:46:50 -07:00
Scott McKay	8d09baf49f	Clarify when protobuf dependency builds protoc (#20542 ) ### Description Currently, figuring out whether the protobuf dependency is building protoc is a little obtuse and inconsistent: * in some places we directly set protobuf_BUILD_PROTOC_BINARIES to OFF to indicate the protobuf dependency is not building protoc * e.g. macOS/iOS/visionOS builds * for a user-provided protoc path we don't set protobuf_BUILD_PROTOC_BINARIES, and inside protobuf_function.cmake that determines whether `protobuf::protoc` is added as a dependency * `0dda8b0c44/cmake/external/protobuf_function.cmake (L40-L45)` To be more consistent/explicit, set protobuf_BUILD_PROTOC_BINARIES to OFF when ONNX_CUSTOM_PROTOC_EXECUTABLE is set and valid. Remove the outdated script that built an external protoc binary, which was used in later builds. The build setup will fetch a pre-built protoc, so there's no need for this additional build. ### Motivation and Context Make it easier to figure out whether protoc is coming from the protobuf dependency.	2024-05-08 08:30:11 +10:00
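The rule above reduces to a small decision. The hedged Python sketch below expresses it the way a build script might assemble CMake arguments. The helper name is hypothetical, and "valid" is approximated by a file-existence check. `protobuf_BUILD_PROTOC_BINARIES` and `ONNX_CUSTOM_PROTOC_EXECUTABLE` are the variables named in the commit.

```python
from pathlib import Path
from typing import List, Optional


def protoc_cmake_args(custom_protoc: Optional[str]) -> List[str]:
    """Return CMake -D flags describing where protoc comes from.

    If a usable custom protoc is supplied, the protobuf dependency is told
    explicitly not to build its own protoc; otherwise nothing is overridden.
    """
    if custom_protoc and Path(custom_protoc).is_file():  # assumption: "valid" == file exists
        return [
            f"-DONNX_CUSTOM_PROTOC_EXECUTABLE={custom_protoc}",
            "-Dprotobuf_BUILD_PROTOC_BINARIES=OFF",
        ]
    return []
```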
aciddelgado	4e27841bdb	fix gqa cpu nan bug (#20521 ) ### Description There was a bug with GQA on CPU where, in the token case with batch_size > 1 and past_present_share_buffer off, the output would occasionally contain NaNs. This PR fixes that. It also updates the documentation and fixes position ID generation for rotary embedding in CUDA in the prompt case. ### Motivation and Context This PR solves the GQA CPU bug, updates the documentation, and makes seqlens_k irrelevant for the prompt case, which helps prevent user error.	2024-05-07 15:19:26 -07:00
Chi Lo	c86476a636	[TensorRT] adapt for TRT lib name change after TRT 10 GA (update) (#20550 ) https://github.com/microsoft/onnxruntime/pull/20445 The nvonnxparser library name still needs the major version appended when building the OSS parser.	2024-05-06 15:00:13 -07:00
Adrian Lizarraga	0dda8b0c44	[QNN EP] Update QNN SDK to 2.21 (#20534 ) ### Description - Updates QNN pipelines to use QNN SDK 2.21 - Downloads QNN SDK from Azure storage to avoid having to rebuild images when a new version is released. ### Motivation and Context Test with the latest QNN SDK.	2024-05-01 20:17:35 -07:00
Scott McKay	f9febc4f35	Remove usage of 'required reason' iOS API from protobuf (#20529 ) ### Description Using certain APIs is about to require a [privacy manifest](https://developer.apple.com/documentation/bundleresources/privacy_manifest_files/describing_use_of_required_reason_api) to be added to a package. Our version of protobuf uses `mach_absolute_time`. Patch as per https://github.com/protocolbuffers/protobuf/pull/15662/ to remove usage. ### Motivation and Context Usage of this API will require a privacy manifest for an iOS app to be accepted as of 5/1/2024 #20519	2024-05-02 08:21:08 +10:00
Yifan Li	29417762f7	[TensorRT EP] support TensorRT 10-GA (#20506 ) ### Description This branch is based on rel-1.18.0 and supports TensorRT 10-GA.	2024-05-01 11:10:53 -07:00
Hector Li	755aaea9a6	Qnn nuget update (#20527 ) ### Description Update Qnn nuget package to include Qnn libs and license file	2024-04-30 22:12:53 -07:00
Yi Zhang	91baeb8495	Reduce downloads to NodeJS to mitigate random connection exception. (#20518 ) ### Description There was a connection exception in the Docker build in the packaging pipeline ``` 48.26 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail 456.0 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2) ``` https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=453140&view=logs&j=f9f5b320-fa10-56c4-debe-61ea69c74793&t=1656e225-defa-5b12-8935-2a0a93e76a67&s=3c85d903-a183-5028-775e-d63999fcc9ae In fact, the Docker image shouldn't have been rebuilt this time. Checking the code: the Docker image tag in Linux_C_API_Packaging_GPU_x64 of onnxruntimecuda${{ variables.CUDA_VERSION_MAJOR }}build was the same as the image tag of Linux-gpu-ci-pipeline, but their Dockerfiles are different. So this changes the Linux GPU pipeline's image tag to prevent the packaging pipeline's Docker image from being overwritten unexpectedly.	2024-05-01 09:04:56 +08:00
Rachel Guo	8c31f27dd1	Catalyst nuget package .NET changes only (#20424 ) ### Description https://github.com/microsoft/onnxruntime/pull/20418 Add back Catalyst changes only for now. Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2024-04-29 15:39:48 -07:00
Scott McKay	923b0ef323	Run fuzz testing before the CG task cleans up the build directory (#20500 ) ### Description Update order of steps ### Motivation and Context Fix CI	2024-04-29 16:02:53 +10:00
Rachel Guo	ff505b9f44	Follow up fix for #20472 (#20484 ) ### Description Error: *Artifact name input: e2e_test_logs_1364625_$(Date:yyyyMMddHHmmss) ##[error]Artifact name is not valid: e2e_test_logs_1364625_$(Date:yyyyMMddHHmmss). It cannot contain '\', /', "', ':', '<', '>', '\|', '', and '?'** The date does not show up correctly in the artifact name. Use the predefined pipeline variable BuildNumber instead, which serves a similar purpose as a timestamp. ### Motivation and Context RN CI failure --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2024-04-27 13:42:24 +10:00
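A quick pre-check would catch this kind of failure before the upload step runs. The Python sketch below rejects the characters that are legible in the error message above. It builds the name from a build number instead of a `$(Date:...)` macro. The function names and the naming scheme are hypothetical. The character set includes only what the message visibly lists.

```python
from typing import Union

# Characters legible in the Azure Pipelines error message quoted above.
FORBIDDEN = set('\\/":<>|?')


def validate_artifact_name(name: str) -> str:
    """Raise ValueError if `name` contains a forbidden character."""
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        raise ValueError(f"invalid artifact name {name!r}: contains {''.join(bad)}")
    return name


def e2e_log_artifact_name(job_id: Union[int, str], build_number: str) -> str:
    """Hypothetical naming scheme: make the name unique via the build number."""
    return validate_artifact_name(f"e2e_test_logs_{job_id}_{build_number}")
```

An unexpanded macro such as `$(Date:yyyyMMddHHmmss)` contains `:`, so the check rejects it immediately.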
Rachel Guo	88904b9220	Add unique identifier to e2e_test_logs artifacts in react-native-ci.yml (#20472 ) ### Description As title.	2024-04-26 22:20:10 +10:00
Scott McKay	aa27dadd1c	Use download.onnxruntime.ai in podspec (#20474 ) ### Description Update to more generic url	2024-04-26 20:28:54 +10:00
Yi Zhang	464f199b95	Extend mac package jobs time out limit (#20459 )	2024-04-25 10:13:13 -07:00
Yi Zhang	e5947f5729	Two improvements in pipelines (#20449 ) ### Description 1. Update the image name so that the Docker image isn't overwritten. There was a mistake: variables.CUDA_VERSION_MAJOR is always empty `14fcf0a52d/tools/ci_build/github/azure-pipelines/stages/nuget-linux-cuda-packaging-stage.yml (L120)` 2. Set one artifact name as a variable to make the job rerunnable.	2024-04-25 10:15:40 +08:00
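The empty-variable mistake is easy to reproduce. If the CUDA major version expands to an empty string, every CUDA variant collapses onto the same image name, and the images overwrite each other. The hypothetical Python guard below fails fast instead. Its default prefix and suffix mirror the `onnxruntimecuda${{ variables.CUDA_VERSION_MAJOR }}build` pattern quoted in this log.

```python
def cuda_image_name(cuda_version_major: str, prefix: str = "onnxruntimecuda",
                    suffix: str = "build") -> str:
    """Build a per-CUDA-version image name, refusing an empty version."""
    version = cuda_version_major.strip()
    if not version.isdigit():
        # An empty value would yield "onnxruntimecudabuild" for every CUDA version.
        raise ValueError(f"CUDA major version must be numeric, got {cuda_version_major!r}")
    return f"{prefix}{version}{suffix}"
```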
Scott McKay	a46bab6364	Update podspec url to use AFD hostname (#20452 ) Update to use AFD url when generating podspec	2024-04-24 09:37:24 -07:00
Rachel Guo	14fcf0a52d	Support visionos build (#20365 ) ### Description This PR supports building onnxruntime.xcframework for xros/xrsimulator (visionOS) via the build command `python3 tools/ci_build/github/apple/build_apple_framework.py --config Release/Debug tools/ci_build/github/apple/default_vision_os_framework_build_settings.json`. Officially including visionOS in the iOS CocoaPods package and testing it in CI would require separate work to upgrade the Xcode version and to upgrade the macOS CI agent to macos-13-arm64 or higher. ### Motivation and Context visionos support: https://github.com/microsoft/onnxruntime/discussions/19313 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2024-04-23 18:15:07 -07:00
Yulong Wang	5055dc0aa8	[js/web] add diagnose log for chrome (#20439 ) ### Description Add logs to further diagnose the pipeline issue.	2024-04-23 17:18:54 -07:00
Edward Chen	76461c8f4d	Increase timeout for iOS packaging pipeline jobs. (#20434 )	2024-04-23 11:55:55 -07:00
Yi Zhang	7ebc653f04	Revert "Nuget .NET changes for Mac Catalyst (#19923 )" (#20418 ) This reverts commit `f396748ed6`.	2024-04-23 15:08:12 +08:00
Adrian Lizarraga	e6a677f6b7	[QNN EP] Download QNN SDK from azure blob in packaging pipelines (#20359 ) ### Description - Updates Windows QNN Nuget and Python packaging pipelines to download QNN SDK from blob storage. - Makes the QNN SDK version configurable when launching the python packaging pipeline. ### Motivation and Context Removes the need to rebuild images to update QNN SDK. Only applies to Windows pipelines. Linux pipelines still get the SDK from disk.	2024-04-22 22:32:55 -07:00
Yi Zhang	197b3f1d90	Enable Whisper Test with OMP_FFMPEG (#20402 ) ### Description Install OMP_FFMPEG in the Docker image and re-add the Whisper test. OMP_FFMPEG is downloaded from an Azure blob with restricted access.	2024-04-22 10:55:56 -07:00
Yulong Wang	a457c1df80	upgrade emsdk to 3.1.57 (#20295 ) ### Description upgrade emsdk to 3.1.57	2024-04-19 23:05:18 -07:00
Rachel Guo	f396748ed6	Nuget .NET changes for Mac Catalyst (#19923 ) ### Description Add NuGet package changes for the new 'net6.0-maccatalyst' platform. The output ORT NuGet package was manually tested and verified in a .NET MAUI app setup. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Yi Zhang <zhanyi@microsoft.com> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2024-04-19 14:20:03 -07:00
sfatimar	4d1963c2a2	OpenVINO EP Rel 1.18 Changes (#20337 ) ### Description These changes include: - Support for OpenVINO 2024.1 - Import of precompiled blobs with EPContext blob - Separate device/precision as input - Deprecate CPU_FP32, GPU_FP32 terminology; introduce CPU, GPU - AUTO GPU, CPU will only create a GPU blob and not a CPU blob. ### Motivation and Context - OpenVINO 2024.1 will be out soon. - Importing precompiled blobs can greatly reduce FEIL/FIL time. - Separating device/precision will make the input cleaner. --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2024-04-19 00:31:38 -07:00
Patrice Vignola	12569626cb	Update DML to 1.14.1 (#20380 )	2024-04-18 22:43:41 -07:00
Chi Lo	a747a00cd3	[TensorRT EP] Use protobuf with debug build on Windows (#20378 ) TRT EP implicitly uses oss_parser in debug builds on Windows; therefore, it should use protobuf rather than protobuf-lite.	2024-04-18 19:39:08 -07:00
Patrice Vignola	745b426c60	[DML] Update DML to 1.14 (#20304 ) I am prefiring this change to pre-run the non-dml checks, and also to give folks the time to review it before DML gets released. When DML 1.14 officially releases, we'll only need to run the DML pipeline to automatically pick up the nuget package. This should save us some valuable time. Note that DML 1.14 is the release needed for ORT 1.17.4, and DML 1.15 will come soon after.	2024-04-18 16:22:57 -07:00
Yulong Wang	3577a4bd02	[Node.js binding] Allow installation to download CUDA binaries via script (#20364 ) ### Description Currently we try to include all prebuilt binaries in the NPM packages. This worked until we added libonnxruntime_providers_cuda.so (>400MB) to the NPM package. The NPM registry refuses to accept the new package because the file is too large. To make the new NPM package work, we have to remove the large file from the package and add a new script that runs on package installation. This script tries to dynamically install the onnxruntime CUDA dynamic library for Linux/x64.	2024-04-18 13:44:42 -07:00
Patrice Vignola	76434907fb	[DML EP] Add graph capture (#20257 ) This adds a new "Graph Capture" option to the DML EP, similar to the CUDA graph functionality. Here's how graph capture works: - A user can enable graph capture in the session options by setting `ep.dml.enable_graph_capture` to `true`. - When they want to capture a run, they set `gpu_graph_id` in their `RunOptions` to a number greater than 0 (0 is reserved for internal use according to the CUDA graph documentation). - Then, when they start the inference, the graph is captured and stored in the DML EP for future use. - When they execute the run a second time with the same ID, the `ReplayGraph` function in the DML EP is called instead of executing the kernels, resulting in very low overhead and avoiding kernel recompilation. This feature can give performance on par with, or even better than, specifying the static dimensions at session creation time, while being much more flexible.	2024-04-18 10:15:00 -07:00
Yi Zhang	4d2b98155f	More fixes on random connection exceptions in Mac Build. (#20328 ) ### Description Supplement to #20322 ### Motivation and Context Fixes random connection exceptions in the Mac build in the Python Packaging Pipeline https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443617&view=logs&j=5849a411-e258-5ce5-39bd-7b65d44961a0&t=ccb871c8-76d9-5e80-55b0-4279efd5567f and in the iOS full xcframework build https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443458&view=logs&j=370fd1a2-3dec-5916-4d2c-8aae58c72d28&t=686352ba-ee61-5ad4-8739-e8abd07372a4&s=e9aa87c8-a9ad-51f7-3b12-045ecc319776	2024-04-17 08:37:56 +08:00
dependabot[bot]	7354f3cdd8	Bump transformers from 4.36.0 to 4.38.0 in /tools/ci_build (#20272 ) Bumps [transformers](https://github.com/huggingface/transformers) from 4.36.0 to 4.38.0. Release notes: v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM. Commits viewable in [compare view](https://github.com/huggingface/transformers/compare/v4.36.0...v4.38.0). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-16 14:21:12 -07:00
Yi Zhang	caf692e626	[Fix] Random connection exceptions in MacOS_C_API_Packaging_CPU stage (#20322 ) ### Description Add download_deps to reduce downloading from third-party websites. ### Motivation and Context Fix frequent random exceptions like ``` CMake Error at abseil_cpp-subbuild/abseil_cpp-populate-prefix/src/abseil_cpp-populate-stamp/download-abseil_cpp-populate.cmake:162 (message): Each download failed! error: downloading 'https://github.com/abseil/abseil-cpp/archive/refs/tags/20240116.0.zip' failed status_code: 35 status_string: "SSL connect error" log: --- LOG BEGIN --- Trying 20.29.134.23:443... Connected to github.com (20.29.134.23) port 443 ALPN: curl offers h2,http/1.1 (304) (OUT), TLS handshake, Client hello (1): [315 bytes data] CAfile: /etc/ssl/cert.pem CApath: none Recv failure: Operation timed out LibreSSL/3.3.6: error:02FFF03C:system library:func(4095):Operation timed out Closing connection ``` https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443278&view=logs&j=006a7a04-d43b-5fe1-df02-ecafb79c4d6e&t=110edd38-9f3b-50cf-b328-8ed0f915e5c1 --------- Co-authored-by: Yi Zhang <your@email.com>	2024-04-16 13:28:18 +08:00
Edward Chen	287ecea2f1	Fix binary size check build publish step. (#20298 ) Add `--user` option to pip install command. Error: ``` ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/usr/local/bin/f2py' Consider using the `--user` option or check the permissions. ``` See #19877.	2024-04-15 10:15:42 -07:00
liqun Fu	cd7112f800	Integration with ONNX 1.16.0 (#19745 ) ### Description Update to the ONNX 1.16.0 branch according to https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md ONNX 1.16.0 release notes: https://github.com/onnx/onnx/releases/tag/v1.16.0 #### Updated ops for CPU EP: - DequantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block dequantization support - QuantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block quantization support - Cast(21) - Missing int4 and uint4 support - CastLike(21) - Missing int4 and uint4 support - ConstantOfShape(21) - Missing int4 and uint4 support - Identity(21) - Missing int4 and uint4 support - If(21) - Missing int4 and uint4 support - Loop(21) - Missing int4 and uint4 support - Reshape(21) - Missing int4 and uint4 support - Scan(21) - Missing int4 and uint4 support - Shape(21) - Missing int4 and uint4 support - Size(21) - Missing int4 and uint4 support - Flatten(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Pad(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Squeeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Transpose(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Unsqueeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support #### Unimplemented opset 21 features/ops - int4 and uint4 data types - QLinearMatMul(21) - GroupNormalization(21) - ai.onnx.ml.TreeEnsemble(5)
### Disabled tests #### ORT Training orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py - test_ort_custom_ops: Potential shape inference bug for custom ops #### Python quantization unit tests test/onnx/python/quantization (shape inference bug) - test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16 - test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16 - test_op_gemm.py: test_quantize_qop_gemm_s8s8 - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3 - test_op_matmul.py: test_quantize_matmul_u8u8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy - test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile - test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution - test_op_relu.py: test_quantize_qop_relu_s8s8 #### ONNX tests - test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a maxpool output size bug and added this test. Enable this test when [ORT PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged. Refer to the original [ONNX PR](https://github.com/onnx/onnx/pull/5741).
- test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op ai.onnx.ml.TreeEnsemble - test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same - test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same - test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same - test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4 yet - test_cast_INT4_to_INT8_cpu: same - test_cast_UINT4_to_FLOAT_cpu: same - test_cast_UINT4_to_UINT8_cpu: same - test_cast_INT4_to_FLOAT_cuda - test_cast_INT4_to_INT8_cuda - test_cast_UINT4_to_FLOAT_cuda - test_cast_UINT4_to_UINT8_cuda - test_constantofshape_float_ones_cuda: ConstantOfShape(21) not implemented for cuda - test_constantofshape_int_shape_zero_cuda: same - test_constantofshape_int_zeros_cuda: same - test_flatten_axis0_cuda: Flatten(21) not implemented for cuda - test_flatten_axis1_cuda: same - test_flatten_axis2_cuda: same - test_flatten_axis3_cuda: same - test_flatten_default_axis_cuda: same - test_flatten_negative_axis1_cuda: same - test_flatten_negative_axis2_cuda: same - test_flatten_negative_axis3_cuda: same - test_flatten_negative_axis4_cuda: same - test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not implemented in ORT yet - test_qlinearmatmul_2D_int8_float32_cpu: same - test_qlinearmatmul_2D_uint8_float16_cpu: same - test_qlinearmatmul_2D_uint8_float32_cpu: same - test_qlinearmatmul_3D_int8_float16_cpu: same - test_qlinearmatmul_3D_int8_float32_cpu: same - test_qlinearmatmul_3D_uint8_float16_cpu: same - test_qlinearmatmul_3D_uint8_float32_cpu: same - test_qlinearmatmul_2D_int8_float16_cuda: same - test_qlinearmatmul_2D_int8_float32_cuda: same - test_qlinearmatmul_2D_uint8_float16_cuda: same - test_qlinearmatmul_2D_uint8_float32_cuda: same - test_qlinearmatmul_3D_int8_float16_cuda: same - test_qlinearmatmul_3D_int8_float32_cuda: same - test_qlinearmatmul_3D_uint8_float16_cuda: same - test_qlinearmatmul_3D_uint8_float32_cuda: same - test_size_cuda: Size(21) not implemented for cuda - 
test_size_example_cuda: same - test_dequantizelinear_blocked: Missing implementation for block dequant for DequantizeLinear(21) - test_quantizelinear_blocked_asymmetric: Missing implementation for block quant for QuantizeLinear(21) - test_quantizelinear_blocked_symmetric: Missing implementation for block quant for QuantizeLinear(21) --------- Signed-off-by: liqunfu <liqun.fu@microsoft.com> Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>	2024-04-12 09:46:49 -07:00
Yifan Li	9577fe454d	[EP Perf] Customize onnx-tensorrt commit id when init CI tasks (#20175 ) ### Description Allow customizing the onnx-tensorrt commit ID through the EP Perf CI variables when testing different versions of the OSS parser. ### To Verify ![image](https://github.com/microsoft/onnxruntime/assets/109183385/9dc650d8-377d-4223-8951-f0849b1fe984) After `onnxTensorrtCommitId` is assigned in the EP Perf CI variables, CI prints the following during the step [Build latest ORT Image with TensorRT OSS parser](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396450): ``` Updated deps.txt with new commit id a43ce67187bab219520fd80f21af8bbd4354bc8c and hash 572535aefef477050f86744dfab1fef840198035 ``` CI then [overwrites the onnx_tensorrt line in deps.txt](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396451) with: ``` onnx_tensorrt;`a43ce67187`.zip;572535aefef477050f86744dfab1fef840198035 ``` ### Motivation and Context To save the time spent modifying deps.txt and manually calculating the zip hash.	2024-04-10 09:46:05 -07:00
Yi Zhang	0acde1157a	Set parallel count to avoid OOM in training GPU packaging pipeline (#20255 ) ### Description Make the compilation work on the Azure CPU agent by reducing the parallel count. ### Motivation and Context The OOM issue mentioned in #20244 was caused by the low memory/parallel_count ratio.	2024-04-10 14:05:53 +08:00
Yi Zhang	14d7872ce9	Reuse T4 for Cuda12.2 training packaging pipeline. (#20244 ) ### Description The training CUDA 12.2 packaging pipeline https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary has consistently run out of memory since PR #19910. I tried other CPU agents, for example D64as_v5 (256 GB memory) and D32as_v4 (128 GB memory and 256 GB SSD temp storage), which still ran out of memory, as shown in the image below ![image](https://github.com/microsoft/onnxruntime/assets/16190118/5acde9ef-674f-4b6d-a1b3-b54647645083) But it works on T4, even though T4 only has 4 vCPUs, 28 GB memory, and 180 GB temp storage, although it takes much more time. ### Motivation and Context Restore the CUDA 12.2 training packaging pipeline first. More time is needed to investigate the root cause. ### Other Clues These two compilation steps take nearly 6 minutes with CUDA 12.2 on T4, and they run out of memory on the CPU machine. @ajindal1 cuda12.2 on T4 ``` 2024-03-14T05:39:08.7726865Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o 2024-03-14T05:45:01.3223393Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o 2024-03-14T05:46:07.9218003Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim96_fp16_sm80.cu.o 2024-03-14T05:52:59.2387051Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu.o ``` But they finish in about one minute with CUDA 11.8 on CPU: ``` cuda11.8 on CPU 2024-04-09T11:34:35.0849836Z [ 90%] Building CUDA object 
CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o 2024-04-09T11:35:53.6648154Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o cuda11.8 on GPU 2024-03-13T12:16:33.4102477Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o 2024-03-13T12:19:58.8268272Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o ```	2024-04-10 09:21:40 +08:00
Adrian Lizarraga	05d97e8d18	Update QNN python packages to use QNN SDK version 2.19.2 (#20213 ) ### Description Update QNN python packages to use QNN SDK version 2.19.2. ### Motivation and Context Our CI builds already use QNN SDK version 2.19.2. We should make sure the ort-nightly-qnn python packages are also built with the same QNN SDK version.	2024-04-05 17:15:25 -07:00
Yi Zhang	23a5d0a305	Extend time out in Windows GPU packaging jobs (#20207 ) ### Description Extend the Windows GPU packaging job build timeout to 6 hours, and the test stage timeout to 3 hours. ### Motivation and Context There are still a few timeout issues after the refactoring; they occur in about 20% of runs in https://dev.azure.com/aiinfra/Lotus/_build?definitionId=84. I found that the build can finish within 4 hours when it becomes slow (https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=434340&view=logs&j=0c6ee496-b38e-55a9-3699-12934156e90f), although in most cases it only takes about 30 minutes. This differs from before, when the build could not complete at all. So, in this PR, I extend the timeout to 6 hours. Interestingly, if one Windows GPU job becomes slow, all other Windows GPU jobs in the same run become slow too, so I suspect it has something to do with ADO or virtualization. That is, it's not completely random. https://dev.azure.com/aiinfra/Lotus/_build?definitionId=841	2024-04-06 08:03:42 +08:00
Thomas Boby	254bdbb19d	OneDNN/dnnl: Fix filepath after dnnl move (#20086 ) ### Description This adjusts the path used in the nuget script for dnnl to the new location of the file. There isn't a CI pipeline for this as far as I can tell, and I can't easily confirm this change works on master, so please check. ### Motivation and Context It is currently not possible to build onednn nuget packages. It's possible that the correct action would be to move the file rather than fix this path, but I'm not familiar enough with the repository layout. --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2024-04-04 21:24:49 -07:00
Yi Zhang	4ea54b82f9	[Fix] Upload training CUDA daily wheel (#20183 )	2024-04-03 13:18:26 +08:00
Yi Zhang	523ef04240	enable lto in Python-CUDA-Packaging Pipeline (#20164 ) ### Description Except for the [Python-CUDA-Packaging pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1299&_a=summary), all Windows CUDA packaging jobs are now running well. A comparison showed that enable_lto isn't set in that pipeline, which might be one root cause of the random hang.	2024-04-01 15:42:28 +08:00
Jeff Bloomfield	2f31560430	Enable generic feature level devices in DML EP (#20114 ) ### Description Enable NPUs supporting DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML and D3D_FEATURE_LEVEL_1_0_GENERIC with DML EP. This also begins ingesting DX headers through the DirectX-Headers repo. Note that this includes an update to cgmanifest.json for onnx-tensorrt, which is triggered during regeneration due to a prior change to deps.txt.	2024-03-29 14:37:30 -07:00
Adam Pocock	2f82400b13	[java] Java 21 build support (#19876 ) ### Description Bump spotless and the Gradle wrapper to 6.25.0 and 8.6 respectively to allow compiling ORT on Java 21. The build still targets Java 8. I'm not sure if there will be CI changes necessary to use this PR, specifically for the Gradle version, as I don't know if that is cached somewhere earlier in the CI build process. The new Gradle version adds a warning that using `--source` and `--target` to select the Java language version is obsolete, which is annoying; we can fix it if we decide to only allow building on newer versions of Java, while still supporting running on Java 8. ### Motivation and Context Java 21 is the latest LTS release of Java, and ORT should be able to build on it.	2024-03-28 15:51:22 -07:00
Yi Zhang	f7b52d2e3e	[Fix] Only copy java files when build_java is True (#20121 ) ### Motivation and Context Fix error in Nuget-CUDA-Packaging-Pipeline	2024-03-28 14:06:28 -07:00
Yi Zhang	2a38168f0b	increase cl mpcount since compilation is moved to CPU machine (#20116 ) ### Description The CPU machine has 16 cores, so we can increase the parallel count. Comparing two runs: 1. https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=432328&view=results 2. https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=432331&view=results The compilation took about 25 minutes with a parallel count of 15, compared with 41 minutes with a parallel count of 3. Co-authored-by: Yi Zhang <your@email.com>	2024-03-28 13:30:33 +08:00
Yi Zhang	c5d7310f1b	Remove TSA upload in testing stage (#20115 ) --------- Co-authored-by: Yi Zhang <your@email.com>	2024-03-28 13:15:03 +08:00
Yi Zhang	8f069f81c4	Split more windows GPU workflow into 2 stages, building and testing, to make them more stable (#20080 ) ### Description Refactor win-ci.yml to solve the random hang issue in more GPU workflows, and move building of the nuget-zip packages and the Python CUDA 12 packages to a CPU machine. --------- Co-authored-by: Yi Zhang <your@email.com>	2024-03-28 12:55:44 +08:00
Dmitri Smirnov	b95fd4e644	Enable CUDA EP unit testing on Windows (#20039 ) ### Description Address build issues and source code discrepancies. Fix cuda_test_provider gtest argument stack corruption. ### Motivation and Context The `OpTester` class that is widely used for kernel testing is not suitable for testing internal classes of EPs that are built as shared objects. Currently, CUDA EP tests run only on Linux. We want to enable testing and development on Windows, and create a usable pattern for testing the internals of other EPs. Alternatives considered: Abstracting EP unit tests into a separate test executable such as `onnxruntime_test_all`. This alternative was rejected because it would require many more changes to the established patterns and could interfere with CUDA functionality, while making source code maintenance more complex.	2024-03-27 13:32:36 -07:00
Yi Zhang	ab2eaedfaa	Install ONNX by building source code in Windows DML stage (#20079 ) ### Description In #20073, I pinned the ONNX version to unblock the whole PR CI. In fact, we can use the ONNX installed by building from source, whose version is controlled by deps.txt. For historical reasons, the DML stage installed ONNX from PyPI. Now ONNX can be installed the same way as in the other stages. Add an option to skip installing ONNX in win-ci-prebuild-step.	2024-03-27 12:29:34 -07:00
Yi Zhang	4df9d16f98	[Fix] TSAUpload task must be in building stage (#20098 ) ### Description In #20085, TSAUpload was in the testing stage, so the main branch failed.	2024-03-27 12:20:57 -07:00
Yulong Wang	47903e701a	fix condition in web CI YAML (#20095 ) ### Description fix condition in web CI YAML	2024-03-27 10:35:43 -07:00
Yi Zhang	0561b9576e	Fix and Refactor Python Packaging Pipeline (#20085 ) ### Description Make the Windows GPU Packaging stage in the Python Packaging pipeline run on a CPU machine as well. ### Test Link https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=430961&view=results	2024-03-27 12:17:22 +08:00
Yulong Wang	0313dd1f65	Update Web CI to use data dir under Agent.TempDirectory (#20074 ) ### Description Update Web CI to use a data dir under Agent.TempDirectory. This change fixes the random failure on the CI pipeline caused by unstable access to the karma temp directory (which is under AppData\Local\Temp).	2024-03-26 13:16:59 -07:00
Baiju Meswani	40efbd6c37	Fix training and macos ci pipelines (#20034 )	2024-03-26 12:20:11 -07:00
Yi Zhang	0906c57c9e	Pin Onnx Version (#20073 ) ### Description 1. The change in build.py fixes the DML exception (https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=10&_a=summary). 2. The change in requirements.txt fixes an exception in the Python packaging pipeline: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=430433&view=results --------- Co-authored-by: Yi Zhang <your@email.com>	2024-03-26 17:59:46 +08:00
sfatimar	eab35c20fc	Ort openvino npu 1.17 master (#19966 ) ### Description Added NPU to the list of supported devices. Added support for OV 2024.0. Removed packaging of the OpenVINO DLL from NuGet packages. Bug fixes in the Python API. Reverted Dockerfiles that are not being maintained. ### Motivation and Context Intel has introduced the NPU device in its latest client systems, and the OpenVINO 2024.0 release is out. --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: Ubuntu <ubuntu@ubuntu-118727.iind.intel.com> Co-authored-by: hmamidix <hemax.sowjanya.mamidi@intel.com> Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com> Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>	2024-03-21 18:44:00 -07:00
Yi Zhang	cd6d3aea45	Refactor Python CUDA packaging pipeline to fix random hangs in building (#19989 ) ### Description 1. Move building to a CPU machine. 2. Optimize the pipeline. 3. Since there isn't an official ONNX package for Python 3.12, the Python 3.12 test stage uses the packages built from ONNX source in the build stage. ### Motivation and Context 1. Resolve the random hang in compilation. 2. Save a lot of GPU resources.	2024-03-22 09:16:00 +08:00
Yi Zhang	30a0d80925	Fix exception in Publish unit test results step (#20007 ) ### Description Test result files are all in the RelWithDebInfo\RelWithDebInfo directory. It's not necessary to stat the _deps directory. ### Motivation and Context Recently, this exception has occurred many times in the zip-nuget pipeline. `##[error]Error: Failed find: EPERM: operation not permitted, stat 'D:\a\_work\1\b\RelWithDebInfo\_deps\flatbuffers-src\java\src\test\java\DictionaryLookup'` https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=426981&view=logs&j=75fc0348-fe99-522b-3acb-90fd80ac5271&t=5d4ebcc1-bcde-574d-6f4e-8abd0f04ae4b	2024-03-22 06:53:59 +08:00
Yi Zhang	175f149b30	Remove downloading deps in CUDA package test stage (#19993 ) ### Motivation and Context Downloading deps is not needed in the test stage. Remove it to reduce random download errors.	2024-03-21 10:01:03 +08:00
Yufeng Li	15219e2e71	turn on neural_speed by default (#19627 ) ### Description The crash caused by neural_speed turns out to be a very corner case. Turn it on by default.	2024-03-20 12:49:58 -07:00
Rachel Guo	6b305f95e0	Support xcframework for mac catalyst builds. (#19534 ) ### Motivation and Context MAUI on macOS uses Mac Catalyst, which requires a different native binary. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-03-20 10:55:19 -07:00
Yi Zhang	8adbc09314	[Fix] Error Python Packaging Pipeline (Training CPU) (#19992 ) ### Description fix the error caused by https://github.com/microsoft/onnxruntime/pull/19973	2024-03-20 09:02:50 -07:00
mindest	3dfe4a5e6d	[ROCm] Remove MPI dependency and collectives to use NCCL (#19830 ) ### Description * Remove MPI dependency to use NCCL AllReduce, etc. * Exclude unsupported collectives in hipify	2024-03-19 17:35:18 -07:00
Hariharan Seshadri	cd6ec50b50	Switch a portion of CI/packaging jobs to MacOS12 (#19908 )	2024-03-19 14:54:58 -07:00
Yi Zhang	d4c8bc359e	Fix Training CPU docker image name to avoid unnecessary rebuilding (#19973 ) ### Description The docker image name was fixed, but the docker build arguments differed between jobs. This triggered rebuilding of the docker image almost every time.	2024-03-19 09:33:24 -07:00
Yulong Wang	b29849a287	[js/common] fix typedoc warnings (#19933 ) ### Description Fix a few warnings in typedoc (for generating JS API): ``` [warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used. [warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation. [warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation. [warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation. [warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation. [warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device. ``` Changes highlighted: - Merge `CoreMlExecutionProviderOption` and `CoreMLExecutionProviderOption`. They expose 2 different sets of options for React Native and the ORT Node.js binding. This should be fixed in the future. - Fix a few name inconsistencies between JSDoc and parameters - Fix broken type links - Exclude trace functions	2024-03-15 19:01:50 -07:00
Yifan Li	0b2a75b274	[EP Perf] Add concurrency test (#19804 ) ### Description * Add concurrency test to EP Perf CI panel (implemented with onnx_test_runner) * Model: FasterRCNN-10 model within CI image * `-c` param configurable via CI panel when kicking off CI tasks * Auto-replicate test inputs/outputs according to `-c` param * By default, the model test will be executed for 100 iterations (~2 min added to the overall T4 CI task load) ### Motivation and Context To monitor potential concurrency issues of ORT-TRT	2024-03-15 07:41:21 -07:00
Justin Chu	bcf47d3546	Update install_deps_lort.sh to fix onnxscript installation (#19922 ) Install onnxscript correctly with `pip install`. Dev dependencies are not required. ### Motivation and Context Fix build breaks.	2024-03-14 17:05:50 -07:00
Adam Louly	32558134a9	[On-Device-Training] Upgrade Flatbuffers to Support 2GB+ Checkpoints. (#19770 ) ### Description Modifications to support 2GB+ checkpoints and upgrade Flatbuffers ### Motivation and Context This PR includes changes that will make ORT handle 2GB+ checkpoints. To do that, we need to upgrade flatbuffers to 23.5.9 - https://github.com/google/flatbuffers/pull/7945 - Modified the commitHash and the hash for the new version - Removed the patch for the rust generator's unused variable warning, as the warning is no longer produced - [Check it out here](`d121e09d89/src/idl_gen_rust.cpp`) - Updated the VerifyField calls with alignment values that were introduced in the new version. --------- Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2024-03-14 16:36:24 -07:00
Yi Zhang	87a9f77c56	Refactor Python Packaging Pipeline (Training Cuda 11.8) (#19910 ) ### Description 1. Use stages to organize the pipeline and split building and testing. 2. Move compilation to a CPU machine. 3. The test stage can leverage existing artifacts. 4. Check the wheel size and give a warning if it is above 300M. 5. The docker image name wasn't changed even when the arguments changed, which caused the docker image to always be rebuilt. Updating the docker image name according to the arguments saves docker build time. Pipeline duration reduced by 60% (2 hours -> 50 minutes). Compilation time reduced by 75% (1.5 hours -> 20 minutes). GPU time reduced by 87% (8 hours -> 1 hour). For debugging, GPU time can be reduced by more than 95%, because we can choose to run only one test stage and skip building. ### Motivation and Context Make the pipeline efficient. Optimized https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=424177&view=results Current https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=422393&view=results	2024-03-15 06:47:41 +08:00
Changming Sun	8b766bd24e	Change nuget pipeline's "Windows_Packaging_combined_GPU" job to download TRT binaries in every build (#19919 ) ### Description Change nuget pipeline's "Windows_Packaging_combined_GPU" job to download TRT binaries in every build. Now all the other build jobs are already doing this. This is the only one left. Similar to #19909 ### Motivation and Context As a follow up of #19118	2024-03-14 15:07:56 -07:00
Changming Sun	ea4a5eea18	Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download TRT binaries in every build (#19909 ) ### Description Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download TRT binaries in every build. Now all the other build jobs are already doing this. This is the only one left. ### Motivation and Context As a follow up of #19118	2024-03-14 07:55:00 -07:00
Yulong Wang	e771a763c3	[js/test] align web test runner flags with ort.env (#19790 ) ### Description The `npm test` flags are difficult to memorize because they differ from the `ort.env` flags. This change aligns those flags with the ORT JS API, e.g. `--wasm-enable-proxy` became `--wasm.proxy`. Old flags are marked as deprecated except `-x` (as a shortcut for `--wasm.numThreads`).	2024-03-13 12:00:36 -07:00
Yi Zhang	d5d9dbd51d	reuse T4 on Linux GPU (#19879 ) ### Motivation and Context Linux GPU tests on A10 aren't very stable.	2024-03-13 10:41:36 -07:00
Hariharan Seshadri	ed306b4f97	Fix Android CI pipeline (#19877 )	2024-03-13 10:09:43 -07:00
Justin Chu	faea42af95	Bump ruff to 0.3.2 and black to 24 (#19878 ) ### Motivation and Context Routine updates	2024-03-13 10:00:32 -07:00
Yi Zhang	9e0a0f0f32	Check whether required tests are executed. (#19884 ) ### Description Check that the ONNX node tests and model tests actually ran. ### Motivation and Context ONNX node test data and model data are mounted in one directory, and onnxruntime_test_all searches that directory and loads the data. If the directory doesn't exist, or something changes in onnxruntime_test_all, those tests may not be executed. For example, all ONNX node test data is 32M. It is hard for us to notice such a regression, so I added a simple check to ensure those tests are executed. --------- Co-authored-by: Yi Zhang <your@email.com>	2024-03-13 09:59:57 -07:00
Yi Zhang	7313aa4efe	Remove --extra-index-url (#19885 ) ### Motivation and Context --extra-index-url is not allowed by the injected Secure Supply Chain step in packaging pipelines. ``` > Starting Multifeed Python Security Analysis: ##[warning]tools/ci_build/github/azure-pipelines/bigmodels-ci-pipeline.yml - Found "extra-index-url". (https://aka.ms/cfs/pypi) ``` Also, those 2 packages can now be installed from PyPI. Co-authored-by: Yi Zhang <your@email.com>	2024-03-13 09:45:22 -07:00
Edward Chen	860eb762c2	[Apple framework] Fix minimal build with training enabled. (#19858 ) Fix some linker errors that come up when integrating the onnxruntime-training-c pod into another Xcode project. The problematic configuration is a minimal build with training APIs enabled. - training_op_defs.o had some unresolved references to ONNX functions. It should not be included at all in a minimal build. - tree_ensemble_helper.o also had unresolved references to ONNX ParseData. The containing function is unused in a minimal build. Added a test to cover this configuration.	2024-03-12 11:33:30 -07:00
Yi Zhang	d4fa4f0276	Remove FFmpeg to meet compliance (#19859 )	2024-03-12 09:06:59 -07:00
Changming Sun	5479124834	Remove remaining Windows ARM32 build jobs (#19840 ) ### Description As a follow up of #19788, remove more remaining Windows ARM32 build jobs. ### Motivation and Context Our nuget packaging pipeline is failing because it could not find an artifact for Win ARM32. ``` ##[error]Artifact onnxruntime-training-win-arm was not found for build 421397. ``` Deprecation of Win ARM32 was announced by Windows team in January 2023. We should follow it.	2024-03-11 11:25:11 +08:00
Yifan Li	069d2d6f54	[EP Perf] Update EP Perf dockerfiles with cuda12/cudnn9 (#19781 ) ### Description * Update the names of existing dockerfiles and add support for testing the latest TensorRT EA binary located in the image * Add cuda 12.3/cuDNN 9/TensorRT 8.6 dockerfile * Add detail to CI prompts and configs Instructions to test the latest TRT via BIN: 1. Select `BIN` in TensorRT Version 2. In Variables, update the related tarCudaVersion, clear tarCudnnVersion (not required in the latest TRT tar binary), and update the path to the binary.	2024-03-08 13:58:22 -08:00
Yifan Li	3170a48e60	[EP Perf] Add tag to indicate which TRT parser is using (#19784 ) ### Description * Add a tag to distinguish whether the TRT `builtin` or `oss` parser is being used * The `oss` tag will be inserted with the onnx-tensorrt commit id, to indicate which version of the oss parser is used ### Validate DB entry before/after this PR (during test, `builtin` or `oss_{commit_id}` tag was inserted in the database entries): ### Motivation and Context This parser tag is needed to distinguish perf results in the database that use the builtin or the oss parser. In the future, results using different parsers will be listed on different Perf Dashboard pages.	2024-03-08 10:24:36 -08:00
Scott McKay	01c376a0b9	Update script to run CIs for a branch. (#19797 ) ### Description - Support multiple include/exclude values. - e.g. can now run with `-i MacOS -i iOS` to run CIs for both Apple platforms. - Default to the current branch if run from a directory in the repo. - Make lazier usage possible ### Motivation and Context Improve tools. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-03-08 17:52:47 +10:00
Ashwini Khade	e93a860819	Remove arm build for training (#19788 ) We no longer support Win ARM32, so we are removing the associated build and packaging job.	2024-03-05 21:54:48 -08:00
Scott McKay	db59cec82f	Don't reduce warning level for CUDA build on Windows (#19663 ) ### Description Address warnings so all the ORT projects build with /W4 on Windows. Mainly - unused parameters - variables shadowing other ones ### Motivation and Context #19588 started on this.	2024-03-06 15:03:55 +10:00
Yulong Wang	a788514027	[js/web] dump debug logs for karma for diagnose purpose (#19785 ) ### Description Dump debug logs for karma for diagnostic purposes. This is for debugging the CI issue of Chrome launch failures and is considered temporary.	2024-03-05 18:27:26 -08:00
Yi Zhang	9460597b21	Update copying API header files (#19736 ) ### Description Make the Linux logic consistent with Windows. ### Motivation and Context onnxruntime_lite_custom_op.h is in the Windows zip package but not in the Linux zip package `acbfc29f27/tools/ci_build/github/azure-pipelines/templates/c-api-artifacts-package-and-publish-steps-windows.yml (L67)` Co-authored-by: Your Name <your@email.com>	2024-03-02 11:33:47 +08:00
Edward Chen	5672cdebdf	Update google benchmark to 1.8.3. (#19734 ) Update google benchmark to 1.8.3. Update deps_update_and_upload.py script to make it easier to use.	2024-03-01 11:01:58 -08:00
Changming Sun	ed550b5fe5	Change webgpu CI pipeline to use a preinstalled chrome (#19729 ) ### Description Change webgpu CI pipeline to use a preinstalled chrome. Hopefully it will increase stability. Currently, the Chrome obtained from Puppeteer often fails to start.	2024-02-29 20:36:29 -08:00
Changming Sun	250779474d	Change "onnxruntime-Linux-CPU-For-Android-CI" machine pool to "onnxruntime-Ubuntu2204-AMD-CPU" (#19698 ) ### Description The original one reports "out of disk space", which needs to be investigated.	2024-02-28 19:36:26 -08:00
Changming Sun	a93c31e3c9	Update dml-vs-2022.yml (#19687 ) ### Description Fix a build error in "Zip-Nuget-Java-Nodejs Packaging Pipeline" which deletes files too early.	2024-02-28 12:03:17 -08:00
Changming Sun	7a147fc6f7	Remove a bash task from webgpu CI pipeline (#19682 ) ### Description It is a "Bash" task that requires running bash on Windows. Most Windows operating systems do not have Bash installed. Given that this task is only for debugging purposes, we can remove it for now. ### Motivation and Context I am making this change because I am regenerating the VM image in a different manner, and the new image does not contain bash. Once this PR is in, I can switch the images.	2024-02-28 18:20:53 +08:00
Yi Zhang	f95c0773a1	Add shared memory Flag in docker (#19672 ) ### Motivation and Context Ref: https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#setincshmem Co-authored-by: Your Name <your@email.com>	2024-02-28 10:40:40 +08:00
Scott McKay	1c468a03b9	Improve Nuget-CUDA-Packaging-Pipeline (#19668 ) ### Description * Publish the artifacts as late as possible * once published, the artifacts are immutable, and any retry will fail if they exist * if any step fails after publishing, the stage cannot be retried * use powershell to clean up * DeleteFiles is taking >30 mins and causing the stage to time out * powershell took < 1s ### Motivation and Context Make pipeline more robust	2024-02-27 09:27:43 -08:00
Scott McKay	580ee20dfc	Tweak Windows build parallelization settings (#19664 ) ### Description Use UseMultiToolTask and limit the number of cl.exe instances running. MultiToolTask info: https://devblogs.microsoft.com/cppblog/improved-parallelism-in-msbuild/ Info on why limiting CL_MPCount can help: https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows The current CIs have 4 cores (both physical and logical). Hardcoded the GPU build in win-ci.yml to use CL_MPCount of 2 as that seems to work fine. Can adjust if needed to base it on the actual number of cores or to use build.py to build. Caveat: I've run about 16 builds and haven't seen a slow build yet, but as the root cause of the slow builds isn't really known, this isn't guaranteed to be a fix. ### Motivation and Context Try to prevent super slow GPU builds by reducing the number of tasks potentially running in parallel.	2024-02-27 08:56:16 -08:00
Yi Zhang	3b46ab6439	Re-add testing removed by mistake. (#19647 )	2024-02-27 08:46:29 -08:00
Rachel Guo	5bb58a10e7	Enable the most verbose logging level in detox E2E React Native CI (#19659 ) ### Description The RN CI has an intermittent failure with the error "app seems to idle". Enable the most verbose logging level to at least get more information (steps to dump device.log from the detox folder/artifacts can be added if necessary). --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2024-02-26 20:00:14 -08:00
Changming Sun	18c8fab1ae	Fix a bug in build.py (#19652 ) ### Description Fix a bug in build.py that accidentally disabled C# tests for most builds when "--build_nuget" is specified. ### Motivation and Context The bug was introduced in PR #8892 .	2024-02-26 15:58:09 -08:00
Scott McKay	8bd943be39	Retry flaky XCode iOS UI tests if we get a known error (#19639 ) ### Description Xcode UI tests seem to be flaky: https://github.com/orgs/community/discussions/68807 Add a couple of retries if we get a "Timed out while loading Accessibility." error, which is transient.	2024-02-27 09:31:32 +10:00
Yi Zhang	0fcc6fb760	Add Whisper model in CI (#19604 ) ### Description Add Whisper conversion and E2E into the Big Models pipeline --------- Co-authored-by: Your Name <your@email.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>	2024-02-25 14:04:22 +08:00
Yi Zhang	c980149c85	Add log for random exception in Linux GPU Test Stage. (#19569 ) ### Description 1. Check GPU status in docker. 2. Use stages so the test stage can leverage existing build artifacts. ### Motivation and Context To investigate the root cause of the random exception `CUDA failure 100: no CUDA-capable device is detected`	2024-02-24 13:00:53 -08:00
Scott McKay	c12a20bef9	Add helper to run CIs for a branch using `az pipelines`. (#16843 ) ### Description Add helper to run CIs for a branch using `az pipelines`. This can be used to easily kick off multiple CIs for a branch prior to creating a PR. Update run_CIs_for_external_pr.py so the CI list can be shared. Request json output from `gh pr view` so the current state is more easily parsed.	2024-02-24 14:06:30 +10:00
PeixuanZuo	6226c5f62f	[ROCm] Add SkipGroupNorm for ROCm EP (#19303 ) Add SkipGroupNorm for ROCm EP. --------- Co-authored-by: Peixuan Zuo <peixuanzuo@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2024-02-21 11:08:48 +08:00
Scott McKay	45e20bf781	Use build.py to build in py-win-gpu.yml so parallelization parameters are set (#19578 ) ### Description build.py sets a few parallelization parameters when building. Using msbuild directly lacks those. `7a5860e490/tools/ci_build/build.py (L1665-L1669)` Changed to use build.py. If there's a concern with that, we _could_ set the parameters in the yaml, but that will be uglier due to duplicating logic in multiple places.	2024-02-21 10:38:37 +08:00
Yulong Wang	97ff17c2cb	update script of run CI for external PRs to add "Big Models" (#19576 ) ### Description update script of run CI for external PRs to add "Big Models"	2024-02-20 17:02:11 -08:00
Scott McKay	ec9c8cbdc9	Use xcode parallel build flags to speed up iOS CI that is timing out (#19570 ) ### Description Provide specific xcodebuild flags instead of depending on cmake to do the right thing. This built in just over an hour with a ccache miss. Previous CIs with a ccache miss were timing out after 150 minutes.	2024-02-21 07:40:35 +10:00
PeixuanZuo	f3e3b531fe	Update build directory clean up stage for python package pipeline (#19553 ) Fix to make the clean up stage take effect. If `SourceFolder` is empty, the task deletes files from the root folder of the repository as though [$(Build.SourcesDirectory)](https://learn.microsoft.com/en-us/azure/devops/pipelines/build/variables) was specified.	2024-02-20 10:31:39 +08:00
Adrian Lizarraga	4874a41008	[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546 ) ### Description Updates the default QNN SDK version to 2.19.2.240210. ### Motivation and Context Build and test the latest version of QNN SDK in our pipelines.	2024-02-16 16:59:43 -08:00
Tianlei Wu	1dce5e1732	Disable TF32 in Linux_Test stage of Linux GPU CI Pipeline (#19541 ) ### Description Some test thresholds that previously worked on T4 GPUs no longer work, because the current pipeline uses A10 GPUs and TF32 is enabled by default. Disable TF32 during testing in the Linux GPU CI Pipeline to avoid such random test failures. ### Motivation and Context Linux Test has random failures in these tests: ProviderOptionsTest > testCUDAOptions() FAILED org.opentest4j.AssertionFailedError: array contents differ at index [446], expected: <0.0419757> but was: <0.041948937> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119) at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360) at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99) at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43) org.opentest4j.AssertionFailedError: array contents differ at index [6], expected: <0.0225981> but was: <0.022587791> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119) at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360) at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676) at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)	2024-02-16 14:41:11 -08:00
rui-ren	d63c664ca0	fix rocm ci pipeline (#19525 ) ### Description ROCm CI pipeline issue: ``` Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.91 MiB, post-processed: Unknown size, total: 17.41 MiB) to /home/onnxruntimedev/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20... main() File "/stage/huggingface-transformers/examples/pytorch/language-modeling/run_mlm.py", line 242, in main datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/load.py", line 856, in load_dataset builder_instance.download_and_prepare( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 583, in download_and_prepare self._download_and_prepare( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 639, in _download_and_prepare split_generators = self._split_generators(dl_manager, **split_generators_kwargs) File "/home/onnxruntimedev/.cache/huggingface/modules/datasets_modules/datasets/wikitext/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20/wikitext.py", line 138, in _split_generators data_file = dl_manager.download_and_extract(self.config.data_url) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 289, in download_and_extract return self.extract(self.download(url_or_urls)) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 197, in download downloaded_path_or_paths = map_nested( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 195, in map_nested return function(data_struct) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 220, in _download return cached_path(url_or_filename, download_config=download_config) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 281, in cached_path output_path = get_from_cache( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 634, in get_from_cache raise ConnectionError("Couldn't reach {}".format(url)) ConnectionError: Couldn't reach https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip ``` ### Motivation and Context Update the `datasets` package to the latest version, `2.17.0`.	2024-02-15 00:02:08 -08:00
Changming Sun	660f39aca5	Perf improvement for Intel MTL CPUs (#19524 ) ### Description See the comments inside the changed files for more detailed information. The files onnxruntime/core/platform/windows/hardware_core_enumerator.cc and onnxruntime/core/platform/windows/hardware_core_enumerator.h were copied from the WinML source folder in this repo, with minor coding style changes. I had an offline discussion with Sheil. We agree that, given the lack of a future-proof solution, we may check in this temporary fix first and rework it later. I will have a meeting with @ivberg to discuss the issue in depth and seek a long-term solution. Thanks for offering help, @ivberg! ### Motivation and Context With this change, we will see about a 2x perf improvement on some Intel CPUs.	2024-02-14 18:35:56 -08:00
Prathik Rao	3b03b2e046	Upgrade default ORTModule opset from 15 to 17 (#19315 ) ### Description This PR upgrades ORTModule's default opset from 15 to 17. Opset 17 is the final opset supported by the TorchScript exporter (https://github.com/pytorch/pytorch/pull/107829). ### Motivation and Context Engineering excellence contribution for ORT Training DRI. --------- Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2024-02-14 11:19:33 -08:00
Yifan Li	5c7e6b2e2a	[EP Perf] Add CI option to enable TRT-OSS parser (#19448 ) ### Description * Introduce a CI option to enable the TRT-OSS parser during EP perf testing: ![image](https://github.com/microsoft/onnxruntime/assets/109183385/a9ba6393-6b94-4b8f-8ca4-ba7bc7954504) If this option is enabled, the open-source onnx-tensorrt parser listed under [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt#L39-L40) will be used by default. ### To verify this option and check the difference during ORT image build: If this option is enabled: <img width="649" alt="image" src="https://github.com/microsoft/onnxruntime/assets/109183385/3b778583-451e-4617-ba8c-c064442e60fd"> If this option is not enabled (by default): <img width="683" alt="image" src="https://github.com/microsoft/onnxruntime/assets/109183385/cd8383ba-eff4-4536-94ab-a1424bb858ab"> * Update the default cmake/TRT versions to the latest ### Motivation and Context Make it easier to test the oss parser and find potential gaps between the TensorRT builtin and oss parsers. Scheduled runs with the oss parser will be set up after this PR is merged.	2024-02-12 23:04:08 -08:00
George Wu	5e70c6b3a6	allow protobuf lite build for TRT EP (#19498 ) Allow protobuf-lite builds with the TensorRT EP as long as it is built with the TRT built-in parser and not the OSS parser. The TRT built-in parser statically links protobuf, so there are no conflicts with protobuf-lite.	2024-02-12 22:53:04 -08:00
Adrian Lizarraga	4dfba53bfb	[QNN EP] Build x64 python wheel for QNN EP (#19499 ) ### Description Adds a job to the python packaging pipeline that builds x64 python wheels for QNN EP. ### Motivation and Context Necessary to create a cached QNN model on Windows x64, which is done by creating a properly configured onnxruntime session with QNN EP.	2024-02-12 20:54:04 -08:00
Baiju Meswani	c831031ad5	Remove cuda gencode 90 to reduce onnxruntime-training package size (#19486 )	2024-02-12 09:24:36 -08:00
Justin Chu	3d2ddf96e3	Bump ruff linter to 0.2.1 (#19471 ) ### Motivation and Context Include new lint rules	2024-02-08 16:08:27 -08:00
Scott McKay	3b1b18347c	Check for invalid combination of python + minimal build in build.py (#19463 ) ### Description Python bindings aren't supported in a minimal build. Add a check in build.py so the user gets a better error message. ### Motivation and Context #19422	2024-02-08 09:08:41 -08:00
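A minimal sketch of this kind of early argument check, written in Python like build.py. The flag names `--build_wheel` and `--minimal_build`, the `BuildError` type, and the `validate_args` helper are all illustrative assumptions, not the actual build.py code.

```python
import argparse


class BuildError(Exception):
    """Hypothetical error type raised for invalid build option combinations."""


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Hypothetical flags; build.py's real options may be named or typed differently.
    parser.add_argument("--build_wheel", action="store_true")
    parser.add_argument("--minimal_build", action="store_true")
    return parser.parse_args(argv)


def validate_args(args):
    # Reject the unsupported combination up front with an explicit message.
    if args.minimal_build and args.build_wheel:
        raise BuildError("Python bindings are not supported in a minimal build.")


if __name__ == "__main__":
    validate_args(parse_args(["--minimal_build"]))  # accepted
    try:
        validate_args(parse_args(["--minimal_build", "--build_wheel"]))
    except BuildError as err:
        print(f"error: {err}")
```

Validating during argument parsing surfaces the conflict before any CMake step runs. Without it, the user would only see a confusing build failure later.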
Jian Chen	75f06319d6	Change binet to bin (#19424 ) ### Description This pull request includes a small change to the `Dockerfile.manylinux2_28_cuda` file in the `tools/ci_build/github/linux/docker` directory. The change corrects the `PREPEND_PATH` argument from `/usr/local/cuda/binet` to `/usr/local/cuda/bin`, ensuring the correct path to CUDA binaries is set.	2024-02-07 09:51:02 -08:00
Edward Chen	df5c6718bd	Remove iOS simulator max runtime version limit. (#19396 )	2024-02-06 14:54:06 -08:00
Tianlei Wu	c4b49fb7bf	[CUDA] remove CUBLAS_TENSOR_OP_MATH mode (#19431 ) This pull request replaces `CUBLAS_TENSOR_OP_MATH` with `CUBLAS_DEFAULT_MATH`. The changes affect several files, including test cases and a Python script for the AMD hipify process. ### Motivation and Context CUBLAS_TENSOR_OP_MATH mode is deprecated: https://docs.nvidia.com/cuda/cublas/index.html#cublasmath-t On CUDA versions prior to 11, users had to set the math mode to CUBLAS_TENSOR_OP_MATH manually to use tensor cores for FP16. On CUDA 11 and CUDA 12, this is no longer required. Since the latest ORT only supports CUDA >= 11, it is safe to remove CUBLAS_TENSOR_OP_MATH from our code base.	2024-02-06 12:48:39 -08:00
Yulong Wang	a4cfdc1c28	update comments for nodejs binding artifact preparation. (#19425 ) ### Description Documentation update as a follow-up to #19274	2024-02-05 22:58:35 -08:00
Jian Chen	06a84c8a0d	Enable DML on Windows and CUDA on Linux for Node.js binding (#19274 ) This pull request includes modifications to the `c-api-cpu.yml` Azure Pipelines configuration file. The changes mainly revolve around the Node.js packaging stage and the handling of Node.js artifacts. The most significant changes include renaming the Node.js packaging stage, adding a new dependency to the stage, changing artifact names, adding a new script to list Node.js artifacts, and updating the source folder for copying NuGet binaries. Changes in Node.js packaging: * `tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`: Renamed the Node.js packaging stage from `Nodejs_Packaging_CPU` to `Nodejs_Packaging` and added `Windows_CI_GPU_DML_Dev` as a new dependency to the stage. Changes in handling of Node.js artifacts: * `tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`: Changed the artifact name from `drop-onnxruntime-nodejs-win-x64` to `drop-onnxruntime-nodejs-win-x64-dml` in the task to download pipeline artifacts for Windows x64. * `tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`: Added a new script to list Node.js artifacts from the directory `$(Build.BinariesDirectory)/nodejs-artifacts/win32/x64/`. * `tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`: Updated the source folder from `$(Build.BinariesDirectory)\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib` to `$(Build.BinariesDirectory)\nodejs-artifacts\win32\x64` in the task to copy NuGet binaries to the directory `$(Build.SourcesDirectory)\js\node\bin\napi-v3\win32\x64`. 
--------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-02-05 14:33:58 -08:00
Yi Zhang	435e19953e	Fix llama.convert_onnx to make it runnable in CI (#19372 ) ### Description 1. Make parity_check use a local model to avoid needing an HF token. 2. Deleting the model did not work because it tried to `del` an object defined outside the function scope, which caused out-of-memory errors on A10. 3. In fact, 16 GB of GPU memory (one T4) is enough, but the conversion process is always killed on T4, while it works on A10 (24 GB). Standard_NC4as_T4_v3 has 28 GB of CPU memory, while Standard_NV36ads_A10_v5 has 440 GB. The model conversion appears to need a very large amount of memory. ### Motivation and Context Previously, I ran into some issues in convert_to_onnx.py, so I used the ONNX model from https://github.com/microsoft/Llama-2-Onnx for testing. Now that these issues are fixed, I use the ONNX model generated by this repo, so the CI covers the model conversion.	2024-02-05 07:26:24 +08:00
PeixuanZuo	0cba56e0a0	[ROCm] Fix CI pipeline by fixing pytest version (#19407 ) Pin pytest to 7.4.4; higher versions cause the error `from onnxruntime.capi import onnxruntime_validation ModuleNotFoundError: No module named 'onnxruntime.capi'`	2024-02-04 16:37:36 +08:00
Scott McKay	debd1cab10	Add coremltools 7.1 as a dependency (#19389 ) ### Description Set up usage of coremltools via dependencies instead of copying files. Pull in some changes from https://github.com/microsoft/onnxruntime/pull/19347 in preparation for supporting ML Program and enabling building the ML Model on all platforms to make development and testing of CoreML EP code easier. - Update to coremltools 7.1 - Add patch for changes required for cross platform build of ML Program related code - Generate coreml proto files on all platforms - mainly to test these changes work everywhere, as the proto files will be used on all platforms when #19347 is checked in - rename onnxruntime_coreml_proto target to coreml_proto as it contains purely coreml protobuf code with no ORT related changes ### Motivation and Context Improve setup.	2024-02-03 09:42:21 +10:00
Yi Zhang	e74f141338	Save stablediffusion and open-clip in pipeline cache (#19314 ) ### Description 1. Save the model to the pipeline cache. 2. Lower the similarity bar to 97. 3. Publish the generated image so we can inspect it when the test fails. ### Motivation and Context Reduce model downloads	2024-01-31 09:39:27 +08:00
Rachel Guo	3e17ca3dab	Fix iOS artifacts issue in Microsoft.ML.OnnxRuntime Nuget Package (#19311 ) ### Description Only include the iOS-architecture frameworks in the artifacts that go into the NuGet package. ### Motivation and Context Related issue: https://github.com/microsoft/onnxruntime/issues/19295#issuecomment-1914143256 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-01-30 08:44:20 -08:00
Changming Sun	e91d91ae4f	Fix a build issue: /MP was not enabled correctly (#19190 ) ### Description In PR #19073 I misunderstood the value of "--parallel". Instead of testing whether args.parallel is None, I should test the value returned by the number_of_parallel_jobs function. If build.py is invoked without --parallel, args.parallel equals 1 because that is the default value, so we should not add "/MP". However, the current code adds it, because `if args.parallel` evaluates to `if 1`, which is True. If build.py is invoked with --parallel but without a number, args.parallel equals 0 because the job count is unspecified, so we should add "/MP". However, the current code does not add it, because `if args.parallel` evaluates to `if 0`, which is False. This also adds a new build flag, use_binskim_compliant_compile_flags, which is intended to be used only in the ONNX Runtime team's build pipelines for compliance reasons. ### Motivation and Context	2024-01-29 12:45:38 -08:00
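The truthiness bug in this entry can be reproduced with a small Python sketch. The `--parallel` option is modeled with argparse so that it behaves as the entry describes: no flag gives 1, a bare `--parallel` gives 0, and `--parallel N` gives N. That argparse definition is an assumption, and `number_of_parallel_jobs` is only a sketch of the function the entry names. `should_add_mp` and `should_add_mp_buggy` are hypothetical helpers.

```python
import argparse
import os


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Assumed option shape: no flag -> 1 (default), bare "--parallel" -> 0 (const),
    # "--parallel N" -> N.
    parser.add_argument("--parallel", nargs="?", const=0, default=1, type=int)
    return parser.parse_args(argv)


def number_of_parallel_jobs(args):
    # 0 means "not specified": use one job per logical CPU.
    return (os.cpu_count() or 1) if args.parallel == 0 else args.parallel


def should_add_mp_buggy(args):
    # Truthiness test: True for the default 1, False for a bare --parallel (0).
    return bool(args.parallel)


def should_add_mp(args):
    # Fixed test: add /MP only when more than one job will actually run.
    return number_of_parallel_jobs(args) > 1


if __name__ == "__main__":
    for argv in ([], ["--parallel"], ["--parallel", "4"]):
        args = parse_args(argv)
        print(argv, should_add_mp_buggy(args), should_add_mp(args))
```

Testing the effective job count, rather than the raw option value, keeps the two special meanings of the option in one place.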
Yi Zhang	e96a038f01	Add VP test in Stable diffusion pipeline (#19300 ) ### Description 1. Add a visual parity test based on the OpenAI CLIP model. 2. Add trigger rules. ### Motivation and Context 1. Check that the generated image is as expected. 2. Reduce unnecessary triggers.	2024-01-29 09:33:58 -08:00
Tianlei Wu	358650d441	Fix BigModel stable diffusion pipeline (#19277 ) ### Description Fix two issues: (1) Only single quotes can be used inside `bash -c "..."`. The current pipeline job stopped at `python3 demo_txt2img.py astronaut` and skipped the following commands. In this change, we remove the remaining commands to get the same effect (otherwise, the pipeline runtime might be 2 hours instead of 15 minutes). (2) Fix a typo in "Stable".	2024-01-25 17:19:04 -08:00
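The quoting failure in item (1) can be reproduced with Python's `shlex.split`, which approximates POSIX shell word splitting. The command strings below are hypothetical. The prompt text and the trailing `echo done` are made up for illustration and are not the pipeline's actual step.

```python
import shlex

# Hypothetical pipeline command; the prompt and the trailing "echo done" are made up.
broken = 'bash -c "python3 demo_txt2img.py "astronaut riding a horse on mars" && echo done"'
fixed = "bash -c \"python3 demo_txt2img.py 'astronaut riding a horse on mars' && echo done\""

broken_argv = shlex.split(broken)
fixed_argv = shlex.split(fixed)

# `bash -c` runs only the word right after -c; any further words become $0, $1, ...
print(broken_argv[2])   # python3 demo_txt2img.py astronaut
print(broken_argv[3:])  # ['riding', 'a', 'horse', 'on', 'mars && echo done']
print(fixed_argv[2])    # python3 demo_txt2img.py 'astronaut riding a horse on mars' && echo done
```

In the broken form, the inner double quote closes the `-c` string early. The script therefore ends at `astronaut`, and everything after it is passed as positional parameters rather than executed. This matches the truncated run described in the entry.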
Phoebe Chen	4477f57ee3	Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238 ) ### Description This pull request introduces the necessary changes to enable RISC-V 64-bit cross-compiling support for the ONNX Runtime on Linux. The RISC-V architecture has gained popularity as an open standard instruction set architecture, and this contribution aims to extend ONNX Runtime's compatibility to include RISC-V, thereby broadening the reach of ONNX models to a wider range of devices. ### Motivation and Context RISC-V is a free and open-source instruction set architecture (ISA) based on established RISC principles. It is provided under open licenses without fees. Due to its extensibility and freedom in both software and hardware, RISC-V is poised for widespread adoption in the future, especially in applications related to AI, parallel computing, and data centers. ### Example Build Command ``` ./build.sh --parallel --config Debug --rv64 --riscv_toolchain_root=/path/to/toolchain/root --skip_tests ``` ### Documentation Updates Relevant sections of the documentation will be updated to reflect the newly supported RISC-V 64-bit cross-compilation feature. https://github.com/microsoft/onnxruntime/pull/19239 --------- Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>	2024-01-24 16:27:05 -08:00
Changming Sun	bc54ad3f03	Update abseil to a release tag and register neural_speed (#19255 ) ### Description Update abseil to a release tag and register neural_speed to CG. ### Motivation and Context Currently we are using an unreleased version of abseil. Using a release tag is better.	2024-01-24 14:37:39 -08:00
Yi Zhang	d7aebf9ea8	Move Nuget Test from T4 to A10 to reduce release duration (#19253 ) ### Motivation and Context Running the release process is painful because some GPU jobs have to wait a long time. ![image](https://github.com/microsoft/onnxruntime/assets/16190118/1c5c981e-68d4-4678-9758-443fbf362802) ![image](https://github.com/microsoft/onnxruntime/assets/16190118/ba0d79ba-1554-4c7a-93dd-6ea8144c9295) ![image](https://github.com/microsoft/onnxruntime/assets/16190118/36cab833-71c1-4ff5-bca5-f4caa9aee0c9) On the one hand, we could move some T4 machines out of the PR process, since some jobs no longer use T4; on the other hand, we can continue to switch some jobs' agents from T4 to A4. In the future, T4 will mainly be used for scenarios that need large GPU memory, multiple GPU cards, or other special cases. Test runs: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=401786&view=logs&j=8048494c-e6eb-5e47-5e87-ff0aa863325d cc @YUNQIUGUO @snnn	2024-01-24 14:15:07 +08:00
aciddelgado	cbb29d80ff	GQA Rotary and Packed QKV with Flash (#18906 ) ### Description These changes add rotary embedding and packed qkv input to gqa. As of now, the changes are only supported with Flash-Attention (SM >= 80) but should soon be supported with Memory Efficient Attention as well. ### Motivation and Context With the fusion of rotary embedding into this Attention op, we hope to observe some perf gain. The packed QKV should also provide some perf gain in the context of certain models, like Llama2, that would benefit from running ops on the fused QKV matrix, rather than the separate Q, K, and V. --------- Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>	2024-01-23 16:34:26 -08:00
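To make "packed QKV" concrete, here is a minimal Python sketch of how one packed hidden vector could be split. It assumes Q, K and V are concatenated along the last dimension in that order, with `num_heads` query heads and `kv_num_heads` key/value heads (grouped-query attention). The function names are hypothetical. Check the exact input layout GroupQueryAttention expects against the contrib-op documentation.

```python
def packed_qkv_slices(num_heads, kv_num_heads, head_size):
    """Return (start, end) ranges of Q, K and V along the packed hidden dimension.

    Assumes a [Q | K | V] layout in which Q has num_heads heads and K and V each
    have kv_num_heads heads, as in grouped-query attention.
    """
    q_width = num_heads * head_size
    kv_width = kv_num_heads * head_size
    q = (0, q_width)
    k = (q_width, q_width + kv_width)
    v = (q_width + kv_width, q_width + 2 * kv_width)
    return q, k, v


def split_packed_row(row, num_heads, kv_num_heads, head_size):
    """Split one packed hidden vector (a list) into its Q, K and V parts."""
    q, k, v = packed_qkv_slices(num_heads, kv_num_heads, head_size)
    if len(row) != v[1]:
        raise ValueError(f"expected packed width {v[1]}, got {len(row)}")
    return row[q[0]:q[1]], row[k[0]:k[1]], row[v[0]:v[1]]


if __name__ == "__main__":
    # 4 query heads sharing 2 KV heads, head_size 2: packed width 8 + 4 + 4 = 16.
    q, k, v = split_packed_row(list(range(16)), num_heads=4, kv_num_heads=2, head_size=2)
    print(q, k, v)
```

Because K and V have fewer heads than Q under GQA, the three slices are not equal thirds of the packed dimension. This is why the split depends on both head counts.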
Yi Zhang	54871a2773	Replace T4 to A10 in Linux GPU workflow (#19205 ) ### Description 1. Update the Linux GPU machine from T4 to A10 (sm=8.6). 2. Update the tolerance. ### Motivation and Context 1. Free more T4 machines and test with a higher compute capability. 2. ORT enables TF32 in GEMM for A10/A100. TF32 causes precision loss, which makes this test fail: ``` 2024-01-19T13:27:18.8302842Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12 2024-01-19T13:27:25.8438153Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure 2024-01-19T13:27:25.8438641Z Expected equality of these values: 2024-01-19T13:27:25.8438841Z COMPARE_RESULT::SUCCESS 2024-01-19T13:27:25.8439276Z Which is: 4-byte object <00-00 00-00> 2024-01-19T13:27:25.8439464Z ret.first 2024-01-19T13:27:25.8445514Z Which is: 4-byte object <01-00 00-00> 2024-01-19T13:27:25.8445962Z expected 0.145984 (3e157cc1), got 0.975133 (3f79a24b), diff: 0.829149, tol=0.0114598 idx=375. 20 of 388 differ 2024-01-19T13:27:25.8446198Z 2024-01-19T13:27:25.8555736Z [ FAILED ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12, where GetParam() = "cuda_../models/zoo/opset12/SSD/ssd-12.onnx" (7025 ms) 2024-01-19T13:27:25.8556077Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_YOLOv312_yolov312 2024-01-19T13:27:29.3174318Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure 2024-01-19T13:27:29.3175144Z Expected equality of these values: 2024-01-19T13:27:29.3175389Z COMPARE_RESULT::SUCCESS 2024-01-19T13:27:29.3175812Z Which is: 4-byte object <00-00 00-00> 2024-01-19T13:27:29.3176080Z ret.first 2024-01-19T13:27:29.3176322Z Which is: 4-byte object <01-00 00-00> 2024-01-19T13:27:29.3178431Z expected 4.34958 (408b2fb8), got 4.51324 (40906c80), diff: 0.16367, tol=0.0534958 idx=9929. 22 of 42588 differ ``` 3. 
Some other tests, like SSD, throw other exceptions, so skip them: ``` 2024-01-22T09:07:40.8446910Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12 2024-01-22T09:07:51.5587571Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:358: Failure 2024-01-22T09:07:51.5588512Z Expected equality of these values: 2024-01-22T09:07:51.5588870Z COMPARE_RESULT::SUCCESS 2024-01-22T09:07:51.5589467Z Which is: 4-byte object <00-00 00-00> 2024-01-22T09:07:51.5589953Z ret.first 2024-01-22T09:07:51.5590462Z Which is: 4-byte object <01-00 00-00> 2024-01-22T09:07:51.5590841Z expected 1, got 63 ```	2024-01-23 10:49:24 -08:00
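The two `tol=` values in the failing log lines of this entry fit an absolute-plus-relative tolerance: 0.01 + 0.01 × 0.145984 = 0.0114598 and 0.01 + 0.01 × 4.34958 = 0.0534958. The Python sketch below reproduces that arithmetic. Note that the formula and `atol = rtol = 0.01` are inferred from those two data points, not taken from model_tests.cc, and the helper names are hypothetical.

```python
def tolerance(expected, atol=0.01, rtol=0.01):
    # Assumed form: absolute plus relative tolerance. atol = rtol = 0.01 is inferred
    # from the two failing log lines in this entry, not taken from model_tests.cc.
    return atol + rtol * abs(expected)


def within_tolerance(expected, actual, atol=0.01, rtol=0.01):
    return abs(actual - expected) <= tolerance(expected, atol, rtol)


if __name__ == "__main__":
    # (expected, actual) pairs copied from the log.
    for expected, actual in [(0.145984, 0.975133), (4.34958, 4.51324)]:
        print(f"tol={tolerance(expected):.7f} ok={within_tolerance(expected, actual)}")
```

Under this reading, the SSD mismatch (diff 0.829149) is far outside any small tolerance change. The YOLOv3 mismatch (diff about 0.164) is about three times the allowed 0.0535.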
Adrian Lizarraga	37d14d7896	[QNN EP] Create Windows ARM64 nightly python package (#19128 ) ### Description Adds a job to create a nightly python package for ORT/QNN on Windows ARM64. Must build onnxruntime-qnn with python 3.11 and numpy 1.25. Note: pipeline run may take up to 3 hrs ### Motivation and Context Make it possible to get a nightly python package with the latest updates to QNN EP. Issue #19161	2024-01-22 18:14:41 -08:00
Yifan Li	e283cdb218	Fix Fuzz Testing CI (#19228 ) ### Description Add BuildArch. To verify: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=400952&view=logs&j=5b022bb4-70a7-5401-8766-a8a7802c7150&t=291e85c7-5547-590b-50de-4e01fcd4eba3&l=14	2024-01-22 15:44:57 -08:00
Yi Zhang	780acda7b4	Add Big models pipeline (#19222 ) ### Description Two models are added to CI. The Stable Diffusion model stage is based on https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md, and the LLama2 FP16 stage is based on https://github.com/microsoft/Llama-2-Onnx. 12 GB of GPU memory is not enough, so I chose T4 to run it. ### Motivation and Context Add regular E2E tests for big models. They are triggered in the main build, that is, they run after a PR is merged. More models will be added later. ### Test Runs https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1275191&view=results	2024-01-22 14:02:56 -08:00
Edward Chen	c8ce83967e	Download protoc for all Apple host builds, remove protoc build from iOS packaging pipeline. (#19209 )	2024-01-19 15:30:09 -08:00
Adrian Lizarraga	28a16c223c	[QNN EP] Update QNN pipelines to use QNN SDK 2.18 by default (#19129 ) ### Description Update QNN pipelines to use QNN SDK 2.18 by default ### Motivation and Context Test with the latest version of QNN SDK by default.	2024-01-18 14:59:23 -08:00
Yi Zhang	dc1fed7268	[Fix] Dual Cuda version isn't supported as expected in Linux Gpu pipeline (#19192 ) ### Motivation and Context The dual CUDA version setup (CUDA 12) was not supported as expected. Link: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1272235&view=logs&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91	2024-01-18 13:26:26 -08:00
Guenther Schmuelling	dd2177c5d7	enable webnn in ci build (#19163 )	2024-01-18 13:11:47 -08:00
Jian Chen	9da3e36138	Fix buildJava from Zip-Nuget-Java-Nodejs Packaging Pipeline (#19187 )	2024-01-17 17:20:42 -08:00
Wanming Lin	07d3aed3aa	[WebNN EP] Fixed build issue with disable_rtti (#19173 ) Previously, building the WebNN EP with --disable_rtti threw an unboundTypeError, because unbound type names are illegal with RTTI disabled in the Embind API. We fix it by adding the -DEMSCRIPTEN_HAS_UNBOUND_TYPE_NAMES=0 flag.	2024-01-16 21:35:13 -08:00
Changming Sun	81d363045b	Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117 ) ### Description Upgrade Ubuntu machine pool from 20.04 to 22.04	2024-01-16 17:25:18 -08:00
Jian Chen	8e272b9cac	Update build.py to remove unused functions and update python to 3.8 (#19164 )	2024-01-16 13:53:15 -08:00
Changming Sun	e2e488d6f8	Revert "iOS packaging pipeline stability" (#19135 ) Reverts microsoft/onnxruntime#19097 because it broke the Android CI pipeline.	2024-01-16 09:18:35 -08:00
Jian Chen	c92f72ebeb	Merge Linux Nuget GPU pipeline with zip-nuget (#19120 )	2024-01-16 08:59:03 -08:00
Jeff Bloomfield	8d4369b77e	Update DirectML nuget version to 1.13.1 (#19122 ) ### Description Update DML version to 1.13.1	2024-01-15 19:04:41 -08:00
pengwa	1150b1f81e	ORTModule memory improvement (#18924 ) ## Dependency https://github.com/microsoft/onnxruntime/pull/19007 ## ORTModule memory efficient gradient management Previously I tried to solve the coarse-grained gradient accumulation/update problem in ORTModule with https://github.com/microsoft/onnxruntime/pull/8979, but that solution was not fully validated with DDP or with user hooks on the gradient accumulation of torch parameters. This PR addresses the problem with an approach similar to PR 8979, i.e., it triggers gradient accumulation as soon as ORT has computed the gradient. However, instead of an AccumulateGrad op, it now uses the ONNX operator PythonOp, which internally calls param.backward(grad), so all related hooks are handled correctly. ## Design See the details at https://microsoftapc-my.sharepoint.com/:p:/g/personal/pengwa_microsoft_com/EaaBq4EzsFhOmsDEXCG7Ba4Bb9bwd0O2sFV_JXJ4jBLYLA?e=7Sz2g8&nav=eyJzSWQiOjI3MSwiY0lkIjozMjE4NzI1NDIzfQ ## Convergence Validation: ![image](https://github.com/microsoft/onnxruntime/assets/10530022/ccf3a213-e815-4b23-b759-165033b2d9fe) Differences are mostly on the order of 0.000x, sometimes 0.00x, which may come from the different order in which gradients are applied before and after this change (on DeepSpeed ZeRO stage 2). ## TODO Consolidate the logic with Stage3's similar logic.	2024-01-16 08:57:37 +08:00
Yi Zhang	922a2f00e3	Extend timeout in Nuget-CUDA-Packaging-Pipeline (#19138 ) ### Motivation and Context The Linux_GPU_x64 job in the pipeline has been canceled due to timeouts since January 12.	2024-01-15 14:37:22 +08:00
Jian Chen	c3ce9df80c	Disabling python3.12 on training python packaging pipelines (#19123 )	2024-01-14 14:51:00 -08:00
Jian Chen	76797127d6	Always download cuda and trt libraries from Azure blob (#19118 ) ### Description This way, we do not need to update the Windows images constantly, and we have more flexibility to choose the CUDA version in the future.	2024-01-14 11:37:26 -08:00
Changming Sun	bb4011b2b1	Set default flags nvcc and do not set default compile flags for ROCM EP (#19124 ) ### Description Set default flags for nvcc, and do not set the flags for the ROCm EP. ### Motivation and Context 1. To meet a BinSkim requirement for the CUDA EP: https://github.com/microsoft/binskim/blob/main/docs/BinSkimRules.md#rule-BA2024EnableSpectreMitigations 2. The ROCm EP's pipeline has been broken since PR #19073. Unit tests failed to load the EP with the following error message: Failed to load library libonnxruntime_providers_rocm.so with error: /build/Release/libonnxruntime_providers_rocm.so: undefined symbol: vtable for onnxruntime::InsertMaxPoolOutput . This PR is a hot fix to bring the pipeline back. So far I don't know why the error happened. The symbol "InsertMaxPoolOutput" is in onnxruntime_optimizers, and I don't see any EP code that references it directly.	2024-01-14 11:36:49 -08:00
Yulong Wang	f917dde717	[web] remove xnnpack from web backends (#19116 ) ### Description XNNPACK is already disabled in web assembly build. This change removes the xnnpack backend registration in JS.	2024-01-13 23:04:02 -08:00
Edward Chen	e1e45901e2	iOS packaging pipeline stability (#19097 ) - Remove protoc build step which sometimes times out. Download protoc instead. - Use macOS-12 image in the set variables stage. It seems more stable.	2024-01-13 19:27:44 -08:00
Changming Sun	5558912d7b	Disable ccache in Windows CPU CI pipeline (#19131 ) ### Description Disable ccache for all the jobs in the Windows CPU CI pipeline. Before disabling it, the build had the warning: "MSIL .netmodule or module compiled with /GL found; restarting link with /LTCG; add /LTCG to the link command line to improve linker performance" After disabling it, the warning is gone and the build doesn't use /GL or /LTCG. The cache itself should not cause this difference. ### Motivation and Context	2024-01-13 18:40:43 -08:00
Adrian Lizarraga	65893ef382	Add --parallel to QNN EP NuGet pipeline build command (#19126 ) ### Description Add --parallel to QNN EP NuGet pipeline build command ### Motivation and Context Improve build times for pipeline.	2024-01-13 02:38:40 -08:00
Jian Chen	78e796bb27	Fixing issue where unzip package from 'onnxruntime-win-x64-gpu' was also uploaded. (#19096 ) ### Description Fix an issue where the unzipped package from 'onnxruntime-win-x64-gpu' was also uploaded. For example, https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=396440&view=artifacts&pathAsName=false&type=publishedArtifacts	2024-01-12 22:30:43 -08:00
Jian Chen	e5eacc6d11	Fix cuda-packaging-pipeline.yml (#19115 )	2024-01-12 19:09:25 -08:00
Guenther Schmuelling	96dbac6e4b	update to emsdk-3.1.51 (#18844 )	2024-01-12 16:04:33 -08:00
Caroline Zhu	4dbaa73738	[js/web/training] added end-to-end tests (#18700 ) ## Summary * following inference's [set-up for end-to-end tests](https://github.com/microsoft/onnxruntime/tree/main/js/web/test/e2e), created an end-to-end test runner for training * this test runner copies testdata from the [trainingapi folder](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/testdata/training_api) * then runs two tests (training session with evalModel & optimizer model, and training session with the minimum options), and tests if the ORT-web training package encompasses inference * these tests check * createTrainingSession * runTrainStep * runOptimizerStep if applicable * the parameters methods (getParametersSize, loadParametersBuffer, and getContiguousParameters) ## TL;DR * [`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c) is responsible for setting up and running the end to end tests * [`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8) contains the test function definitions (`testInferenceFunction`, `testTrainingFunctionMin`, `testTrainingFunctionAll`) ## Flow * entrypoint: user runs the following command in the terminal: `npm run test:training:e2e` * [`js/web/package.json`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-79275844e75c3c410bb3a71c7f59b2b633e5a3e975c804ffc47220025084da28) was modified to include an npm script that will run `run.js` which will run the end to end tests * 
[`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c) is responsible for * detecting and installing local tarball packages of ORT-web * copying training data to the `js/web/training/e2e/data` folder * starting two Karma processes. Karma is a test runner framework that simulates testing in the browser. * In this case, the tests happen in Chrome. We can configure the tests to run in Edge and other browsers in the future. * one of these karma processes is self-hosted, meaning it pulls the ORT-web package from the local build * the other karma process is not self-hosted, meaning it pulls the ORT-web package from another source. In this case, we start an http server that serves the ORT-web binaries. * [`js/web/test/training/e2e/simple-http-server.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-f798ab485f3ec26c299fe5b2923574c9e4b090200ba20d490bbf6c183286993c) is responsible for starting the HTTP server and serving the ORT binary files. This code is almost identical to the same code in the inference E2E tests. * [`js/web/test/training/e2e/karma.conf.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-436cfe8f670c768a04895bd4a1874a5e033f85e0e2d84941c62ff1f7c30a9f28) Karma configuration file that specifies what happens when a karma process is started. The config specifies Mocha as the testing framework, which will go through all the loaded files and run any tests that exist * [`js/web/test/training/e2e/browser-test-wasm.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-13b6155e106dddc7b531ef671186e69b2aadb8a0f4b2f3001db0991567d78221) File that contains the tests that Mocha will pick up on and run. 
* The test functions (such as testInference and testTrainingFunctionAll) are defined in [`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8). ## Notes * I followed the [tests for training core](`b023de0bfc/orttraining/orttraining/test/training_api/core/training_api_tests.cc`) where they randomly generated input for the training session * E2E tests are triggered by running `npm run test:training:e2e` -- suggestions for alternative script names are appreciated!!! ## Motivation and Context - adding training bindings for web	2024-01-12 13:33:33 -08:00
Changming Sun	55b046e97e	Remove enable_mac_silicon settings (#19108 ) ### Description Remove enable_mac_silicon settings from two packaging pipelines. ### Motivation and Context Now we build universal2 packages instead.	2024-01-12 11:01:39 -08:00
Changming Sun	0e8d4c3d21	Enable Address Sanitizer in CI (#19073 ) ### Description 1. Add two build jobs for enabling Address Sanitizer in CI: one for Windows CPU, one for Linux CPU. 2. Set default compiler/linker flags in build.py for normal Windows/Linux/macOS builds. This helps control compiler flags in a more centralized way. 3. All Windows binaries in our official packages will be built with the "/PROFILE" flag. Symbols of onnxruntime.dll can be found at the [Microsoft public symbol server](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols). Limitations: 1. On Linux, Address Sanitizer ignores RPATH settings in ELF binaries. Therefore, once Address Sanitizer is enabled, we need to set LD_LIBRARY_PATH manually before running tests; otherwise libonnxruntime.so may not be able to find custom ops and shared EPs. 2. On Linux we also need to set LD_PRELOAD before running some tests (if the main executable, like python, is not built with Address Sanitizer). On Windows we do not need to. 3. On Windows, before running python tests, we should manually copy the Address Sanitizer DLL to the onnxruntime/capi directory, because Python 3.8 and above enable "Safe DLL Search Mode", which doesn't use the information provided by the PATH env. 4. On Linux, Address Sanitizer found a lot of memory leaks in our python binding code. Therefore, right now we cannot enable Address Sanitizer when building ONNX Runtime with the python binding. 5. Address Sanitizer itself uses a lot of memory address space and delays memory deallocations, which can easily cause OOM issues in 32-bit applications. For this reason, we cannot run all the tests in onnxruntime_test_all in 32-bit mode with Address Sanitizer. However, we can still run individual tests this way; we just cannot run all of them in one process. ### Motivation and Context To catch memory issues.	2024-01-12 07:24:40 -08:00
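The two Linux workarounds in this entry can be sketched as a small Python helper that builds the environment for a test subprocess. LD_LIBRARY_PATH is needed because RPATH is ignored, and LD_PRELOAD is needed when the host executable is not instrumented. The helper name `asan_test_env` and all paths are hypothetical; this is not the code the CI jobs use. On GCC toolchains, `gcc -print-file-name=libasan.so` is one way to locate the runtime library.

```python
import os


def asan_test_env(base_env, library_dirs, libasan_path=None):
    """Return a copy of base_env for running tests under AddressSanitizer on Linux.

    ASan ignores RPATH, so the directories that hold libonnxruntime.so, shared EPs and
    custom-op libraries are prepended to LD_LIBRARY_PATH explicitly. If the host
    executable (for example python) is not built with ASan, the ASan runtime is
    preloaded via LD_PRELOAD.
    """
    env = dict(base_env)
    paths = list(library_dirs)
    if env.get("LD_LIBRARY_PATH"):
        paths.append(env["LD_LIBRARY_PATH"])
    env["LD_LIBRARY_PATH"] = ":".join(paths)
    if libasan_path:
        existing = env.get("LD_PRELOAD")
        env["LD_PRELOAD"] = f"{libasan_path}:{existing}" if existing else libasan_path
    return env


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    env = asan_test_env(os.environ, ["/build/Release"], "/usr/lib/x86_64-linux-gnu/libasan.so")
    print(env["LD_LIBRARY_PATH"], env["LD_PRELOAD"])
```

The helper returns a new mapping instead of mutating `os.environ`. This keeps the ASan settings scoped to the test process, for example by passing the mapping as `env=` to `subprocess.run`.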
Changming Sun	285606108a	Set pythonInterpreter in set-python-manylinux-variables-step.yml (#19105 ) ### Description Set pythonInterpreter in set-python-manylinux-variables-step.yml. To fix a build error: ``` Starting: Set Python manylinux variables ============================================================================== Task : Python script Description : Run a Python file or inline script Version : 0.231.1 Author : Microsoft Corporation Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/python-script ============================================================================== ##[error]Parameter 'toolPath' cannot be null or empty. Finishing: Set Python manylinux variables ``` The error was because today I deleted a bunch of software from the VM image. The task might fail if no Python versions are found in $(Agent.ToolsDirectory).	2024-01-12 07:22:02 -08:00
Jian Chen	53497702a6	Fix Nuget CUDA Packaging pipeline (#19054 ) --------- Co-authored-by: Yi Zhang <zhanyi@microsoft.com>	2024-01-11 11:59:21 -08:00
Jian Chen	2eb3db6bf0	Adding python3.12 support to ORT (#18814 ) ### Description Adding python3.12 support to ORT	2024-01-11 08:34:28 -08:00
Baiju Meswani	730df1bfa2	Increase MacOS pipeline timeout (#19072 )	2024-01-09 18:35:21 -08:00
Milos Puzovic	37ac9d391c	Enable Arm Compute Library 23.08 (#17672 ) ### Description This PR enables onnxruntime to build with the most recent release of Arm Compute Library. ### Motivation and Context Previously, the latest version of Arm Compute Library that onnxruntime could build with was 20.02, which is more than three years old.	2024-01-09 14:10:25 -08:00
Ashwini Khade	897a4163d7	Update transformers version for training CIs (#19046 ) ### Description Updating the version to resolve a security vulnerability.	2024-01-09 12:00:34 -08:00
Changming Sun	ab897a4a40	Remove Windows ARM32 from nuget packaging pipelines (#19049 ) ### Description 1. Remove Windows ARM32 from nuget packaging pipelines. 2. Add the missing component-governance-component-detection-steps.yml to some build jobs. ### Motivation and Context Stop supporting Windows ARM32 to align with [Windows's support policy](https://learn.microsoft.com/en-us/windows/arm/arm32-to-arm64). Users who need this feature can still build the DLLs from source. However, we will remove that support later as well.	2024-01-09 07:45:03 -08:00
Adrian Lizarraga	52e5601449	[QNN Nuget Pipeline] Build with ML ops and detect ORT version (#19024 ) ### Description - Removes `--disable_ml_ops` build flag - Automatically detects ORT version from VERSION file via `templates/set-version-number-variables-step.yml`. We will no longer need to create a commit to update ORT versions. ### Motivation and Context - A new unit test caused failures in the QNN Nuget pipeline because it did not enable ml ops. - Automate ORT version specification	2024-01-08 12:44:12 -08:00
Yi Zhang	e8ac97c8d8	Move Windows GPU training job to A10 (#19041 ) ### Description 1. Update sm to 86 ### Motivation and Context We have more A10 quota than T4 quota, and NVIDIA AXX GPUs can be partitioned.	2024-01-08 09:19:58 -08:00
PeixuanZuo	efdcefcf8c	[ROCm] fix security warning (#19017 ) fix security warning	2024-01-05 10:05:34 -08:00
Changming Sun	e155c66b4a	Change all macOS python packages to use universal2 (#19013 ) ### Description Change all macOS python packages to use universal2, to reduce the number of packages we have. ### Motivation and Context According to [wikipedia](https://en.wikipedia.org/wiki/MacOS_Big_Sur), macOS 11 is the first macOS version that supports universal2, and it is the minimum macOS version we support. So we no longer need to maintain separate binaries for different CPU archs.	2024-01-04 17:44:49 -08:00
Jeff Bloomfield	55a669409a	Merge pull request #18983 from microsoft/WindowsAI Merge WindowsAI to main	2024-01-04 17:21:19 -08:00
Adrian Lizarraga	02b1ff5fa2	[QNN EP] Support multithreaded inference of a single session (#18981 ) ### Description - Add mutex to protect QNN API calls for executing a graph and extracting the corresponding profile data. - Ensures QNN EP's execute function does not store unnecessary state (i.e., input and output buffer pointers do not need to be stored as class members.) ### Motivation and Context Allow calling `session.Run()` from multiple threads when using QNN EP.	2024-01-04 13:32:48 -08:00
raoanag	56fcea94e3	Enable QDQ quantization for DML EP (#18367 ) ### Description This enables QDQ transforms with the DML EP	2024-01-03 16:13:23 -08:00
Jeff Bloomfield	c3d96a7b35	Update DML version to 1.13.0 (#18978 ) Update DML nuget version to 1.13.0	2024-01-03 16:09:55 -08:00
PeixuanZuo	7a454acd61	[ROCm] Update CI/Packaging pipeline to ROCm6.0 (#18985 ) Update CI/Packaging pipeline to ROCm6.0	2024-01-03 17:25:15 +08:00
Yi Zhang	c97e3f4821	[Fix] exception in Fuzz Test pipeline (#18984 ) ### Motivation and Context The file path was incorrect.	2024-01-03 14:53:31 +08:00
Yifan Li	3993d43048	[EP Perf] Fix missing Azure cli & use onnx zoo model inside image (#18917 ) ### Description * Fix [missing Azure CLI issue](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392612&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b&l=12). * Now, if CI fails to run `az --version`, it automatically reinstalls the Azure CLI dependency. * Use the existing ONNX zoo model inside the image during memtesting * to avoid test failures while the ONNX model zoo is being restructured * Display more detailed valgrind info during memtesting * Remove an invalid dependency from the existing AddressSanitizer test case ### Validate * Before the fix, Azure CLI is missing: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392994&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b&l=10 * After the fix: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392619&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b	2024-01-01 17:14:39 -08:00
Yi Zhang	3f03c12986	Split Onnxruntime Nuget GPU package (#18819 ) ### Description 1. Update download-artifacts to flex-downloadartifacts to make it easier to debug. 2. Move the native files into the Gpu.Windows and Gpu.Linux packages. Onnxruntime.Gpu depends on them. 3. Update the package validation as well. 4. Add 2 stages to run E2E tests for GPU.Windows and GPU.Linux, for example: ![image](https://github.com/microsoft/onnxruntime/assets/16190118/35c6730b-8080-4f52-a17c-b9c61f41b6bb) ### Motivation and Context The single Onnxruntime.Gpu package has already exceeded the NuGet size limit. We split it into smaller packages so that they can be published. For compatibility, users can install or upgrade Onnxruntime.Gpu, which installs Gpu.Windows and Gpu.Linux automatically. Users can also install just Gpu.Windows or Gpu.Linux directly. ### Test Link 1. In ORT_NIGHTLY 2. Install the preview version in nuget-int. (nuget source: https://apiint.nugettest.org/v3/index.json) --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-12-22 16:57:16 +08:00
Changming Sun	3d8f229d39	Add ARM64EC build jobs (#18870 ) ### Description Add ARM64EC build jobs in post merge pipeline to validate if our code is compatible with Windows ARM64EC.	2023-12-21 16:31:38 -08:00
dependabot[bot]	914bc409b0	Bump transformers from 4.30.0 to 4.36.0 in /tools/ci_build (#18895 ) Bumps [transformers](https://github.com/huggingface/transformers) from 4.30.0 to 4.36.0. Commits are viewable in the [compare view](https://github.com/huggingface/transformers/compare/v4.30.0...v4.36.0). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-12-21 00:44:36 -08:00
Yifan Li	54e471a054	[EP Perf] Display percentage of cuda/trt ops in cuda/trt ep on EP Perf Dashboard (#18868 ) ### Description Display the percentage of CUDA/TRT ops in the CUDA/TRT EP on the EP Perf Dashboard: ![image](https://github.com/microsoft/onnxruntime/assets/109183385/bafba098-1338-46fa-b10a-ca19eff2a746) Check [here](https://msit.powerbi.com/groups/d1ae6355-afd0-4c40-b78e-676a86cab1e2/reports/82101bbb-dad2-4f24-9ddf-a37f0d41509a/ReportSectionda402bdf6824e505a614?experience=power-bi) to preview it on the EP perf dashboard. ### Motivation and Context - A brief overview of op metrics across various models - Makes it easy to identify models that haven't reached 100% of ops on the CUDA/TRT EP.	2023-12-20 22:11:47 -08:00
Hector Li	8931854528	Move some QNN EP provider options to session options (#18877 ) Move QNN EP provider options to session options ### Description We need to use session options to support multi-partition for the context cache feature. To smooth the transition, move the provider options to session options first. This is the first step toward PR https://github.com/microsoft/onnxruntime/pull/18865	2023-12-20 00:13:38 -08:00
Scott McKay	666fcbde4d	Add LeakyRelu to list of NNAPI operators (#18880 ) ### Description Add LeakyRelu to the list as support was added a while ago.	2023-12-20 14:44:31 +10:00
Changming Sun	535a2403dd	Update Nuget publishing jobs (#18851 ) ### Description 1. Add a CodeSign validation task before the binaries are published, to make sure all DLL files are signed. 2. Auto-trigger the CUDA 12 pipeline's publishing job.	2023-12-19 16:54:46 -08:00
Ashwini Khade	4dff154f51	Fix nightly pipeline failure (#18867 ) ### Description Fixes a failure in the ortmodule nightly pipeline.	2023-12-19 09:18:00 -08:00
Jian Chen	6d7519ede8	Adding new pipeline for python cuda testing (#18718 )	2023-12-18 18:13:03 -08:00
Changming Sun	ad476d5a1f	Change Nuget packaging pipeline's build TRT job to download CUDA SDK on-the-fly (#18847 ) ### Description Change Nuget packaging pipeline's build TRT job to download CUDA SDK on-the-fly, so that we do not need to put a CUDA SDK in the build machine's image.	2023-12-15 17:44:02 -08:00
Changming Sun	fc9ecb59db	Add Windows ARM build jobs to post merge pipeline (#18832 ) ### Description Add Windows ARM build jobs to the post-merge pipeline to validate that our code is still compatible with these build settings.	2023-12-15 08:47:52 -08:00
Changming Sun	cbad4fe49b	Update absl and googletest (#18827 ) ### Description Update absl and googletest to their latest versions to include some CMake changes: 1. A googletest CMake change that allows using external absl and re2. 2. Nullability enhancements that will allow our clang-based static analysis to detect many kinds of null pointer errors. ### Motivation and Context To fix a C4744 link warning in our Windows pipelines. ``` LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\usage.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 
'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<int>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] ```	2023-12-14 16:15:07 -08:00
Changming Sun	b129f425fc	Fix test model URL issue (#18823 ) ### Description The ONNX model zoo changed its directory structure, so some of our pipelines were failing. To prevent this from happening again, we should read the test data from a local disk cache instead of downloading it remotely every time.	2023-12-14 13:06:08 -08:00
Changming Sun	95193cb440	Set NDK version in Linux CPU Minimal Build E2E CI Pipeline (#18810 ) ### Description To upgrade the clang version in preparation for PR #17031 .	2023-12-14 08:08:41 -08:00
Rachel Guo	f3fa045681	Enable MacOS build in ORT Objc Pod (#18786 ) ### Description Add a macOS build for the Objective-C pod. ### Motivation and Context Follow-up PR for #18550 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-12-13 13:50:42 -08:00
Changming Sun	17eaf9b053	Fix a build warning in SparseTensor code for 32-bit build configs (#18766 ) ### Description The warning is: ``` C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.1812949Z with 2023-12-08T20:58:48.2144272Z [ 2023-12-08T20:58:48.2145285Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.2801935Z ] 2023-12-08T20:58:48.2804047Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(82,8): message : while compiling class template member function 'void onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()(const onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const onnxruntime::SparseTensor &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2806197Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(302,27): message : see the first reference to 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()' in 'onnxruntime::utils::mltype_dispatcher_internal::CallableDispatchableHelper::Invoke' (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2871783Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(438,100): message : see reference to class template instantiation 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>' being compiled (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 
2023-12-08T20:58:48.2893010Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(414,5): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::InvokeWithLeadingTemplateArgs<Fn,onnxruntime::TypeList<>,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2894476Z with 2023-12-08T20:58:48.2911521Z [ 2023-12-08T20:58:48.2912457Z Fn=onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr, 2023-12-08T20:58:48.3067840Z T=onnxruntime::SparseTensor 2023-12-08T20:58:48.3068863Z ] (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) 2023-12-08T20:58:48.3195854Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,11): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke<onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3197946Z with 2023-12-08T20:58:48.3198565Z [ 2023-12-08T20:58:48.3199093Z T=onnxruntime::SparseTensor 2023-12-08T20:58:48.3905678Z ] 2023-12-08T20:58:48.3907275Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,36): message : see the first reference to 
'onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke' in 'onnxruntime::contrib::SparseToDenseMatMul::Compute' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3910999Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.3912734Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3913414Z with 2023-12-08T20:58:48.3913660Z [ 2023-12-08T20:58:48.3914001Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.3914499Z ] 2023-12-08T20:58:48.3914743Z qlinear_concat.cc 2023-12-08T20:58:48.3917082Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.3918624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5534583Z with 2023-12-08T20:58:48.5541266Z [ 2023-12-08T20:58:48.5542401Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5544914Z ] 2023-12-08T20:58:48.5548670Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5552099Z 
182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5553712Z with 2023-12-08T20:58:48.5555569Z [ 2023-12-08T20:58:48.5556779Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5558707Z ] 2023-12-08T20:58:48.5561428Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5565624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5566354Z with 2023-12-08T20:58:48.5568185Z [ 2023-12-08T20:58:48.5569305Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5571339Z ] 2023-12-08T20:58:48.5574864Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5577866Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5578562Z with 2023-12-08T20:58:48.5580399Z [ 2023-12-08T20:58:48.5581503Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5583465Z ] 2023-12-08T20:58:48.5587661Z 
##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5590705Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5591396Z with 2023-12-08T20:58:48.5593220Z [ 2023-12-08T20:58:48.5593693Z Derived=Eigen::Map<const Eigen::SparseMatrix<int64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5595955Z ] ``` And the warning in #18195 ### Motivation and Context AB#22894 --------- Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>	2023-12-13 11:11:13 -08:00
Changming Sun	44054e7508	Move NuGet nightly package publishing job to a separate pipeline (#18801 ) ### Description Move the NuGet nightly package publishing job to a separate pipeline. Before this change, it ran at the end of the 'Zip-Nuget-Java-Nodejs Packaging Pipeline'. This PR moves it to a separate pipeline so that we can manually trigger this step for any branch (e.g. release branches).	2023-12-13 11:10:50 -08:00
Jian Chen	ce1fed6ddf	Adding a new pipeline for publishing Python CUDA 12 packages (#18712 )	2023-12-11 14:17:46 -08:00
Jian Chen	bfa5eb4591	Adding a new pipeline for publishing CUDA 12 NuGet packages (#18713 )	2023-12-11 13:07:05 -08:00
Ashwini Khade	16df8377d3	Update transformers package to fix the security issue (#18730 ) ### Description Update the transformers package in the test pipeline to fix a security vulnerability.	2023-12-11 09:15:23 -08:00
cloudhan	de32baeeef	[ROCm] Add GemmFloat8 (#18488 )	2023-12-11 11:37:29 +08:00
Changming Sun	bf33919afb	Update absl and gtest to fix an ARM64EC build error (#18735 ) ### Description Update absl and gtest to fix an ARM64EC build error ### Motivation and Context We need to get an important fix into ORT. The fix is: `8028a87c96`	2023-12-07 15:55:17 -08:00
Yi Zhang	a045be335b	use EO pool for windows web_cpu stage (#18737 ) ### Description Reuse the EO pool in the NPM pipeline. ### Motivation and Context build_web_debug fails in onnxruntime-Win-CPU-2022 but works in the EO pool. Reuse the EO pool to make the pipeline work for now. When I'm free, I'll try upgrading Chrome in the custom image.	2023-12-07 10:10:00 -08:00
moyo1997	9479ba525b	Build onnxruntime.dll as arm64x (#18633 ) Build onnxruntime.dll as arm64x Added a .cmake file to generate a link repro of the onnxruntime.dll during the arm64 build. This provides a directory containing all the arm64 objs, def file and libs to link against when it is time to build the arm64x onnxruntime.dll during the arm64ec build, by passing the /machine:arm64x flag to the linker along with the arm64 artifacts. If other DLLs need to be built as arm64x, setting the ARM64X_TARGETS variable in the top-level CMakeLists.txt to include those targets is all that is needed. Added build_arm64x.bat as a wrapper for the multiple (arm64, then arm64ec) cmake calls needed to build as arm64x. AB#22533	2023-12-06 16:49:00 -08:00
Rachel Guo	7762f3f7c5	[NNAPI EP] Add NNAPI Split (#18702 ) ### Description As title. ### Motivation and Context Adds operator support that was missing for the yolo-v8 model. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-12-06 15:11:15 -08:00
Adrian Lizarraga	559bd52252	[QNN EP] Update QNN SDK to version 2.17.0 (#18684 ) ### Description - Update QNN CI Pipelines to use QNN SDK version 2.17.0 - Print warning if unit test requires adjusted tolerance to pass - Temporarily disable unloading QnnCpu.dll for windows x64 due to crash when calling FreeLibrary - Enable fixed HTP tests - QnnHTPBackendTests.LayerNorm1D_LastAxis_DynamicScale - QnnHTPBackendTests.GlobalMaxPool_LargeInput2_u8 - QnnHTPBackendTests.ReduceSumS8Opset13_Rank5 - QnnHTPBackendTests.ReduceSumU8Opset13_Rank5_LastAxis - QnnHTPBackendTests.WhereLargeDataBroadcastU8 - QnnHTPBackendTests.WhereLargeDataBroadcastTransformedU8 - Enabled fixed CPU tests - QnnCPUBackendTests.Resize_DownSample_Linear_AlignCorners_scales - Increased tolerance for HTP tests that are less accurate on QNN SDK 2.17.0 - QnnHTPBackendTests.AveragePool_CountIncludePad_HTP_u8 - QnnHTPBackendTests.AveragePool_AutopadSameUpper_HTP_u8 - QnnHTPBackendTests.AveragePool_AutopadSameLower_HTP_u8 - QnnHTPBackendTests.ConvU8U8S32_bias_dynamic_input - QnnHTPBackendTests.ConvU8U8S32_bias_initializer - QnnHTPBackendTests.ConvU8U8S32_large_input1_padding_bias_initializer - QnnHTPBackendTests.LRNSize3 - QnnHTPBackendTests.LRNSize5 - QnnHTPBackendTests.MaxPool_Large_Input_HTP_u8 - QnnHTPBackendTests.MaxPool_LargeInput_1Pads - QnnHTPBackendTests.Resize_DownSample_Linear_HalfPixel - QnnHTPBackendTests.ResizeU8_2xLinearPytorchHalfPixel - QnnHTPBackendTests.ResizeU8_2xLinearHalfPixel - QnnHTPBackendTests.ResizeU8_2xLinearAlignCorners - QnnHTPBackendTests.ResizeU8_2xLinearAsymmetric - Disabled ONNX model tests - averagepool_2d_ceil: Accuracy issues only on Windows x64 QnnCpu.dll - Disabled QDQ model tests (onnx_test_runner) - facedetection_op8_qdq: Accuracy issues - Disabled CPU EP tests (these use QnnCpu.dll) - ActivationOpTest.Relu: QNN SDK 2.17 Relu treats inf as FLT_MAX - GemmOpTypedTests/0.TestGemmBroadcast: Inaccuracy when weight is initializer and bias is not - MathOpTest.MatMulFloatType "test padding and broadcast B > A": Inaccuracy (only linux) - Fix Gemm translation bugs in QNN EP: - Do not skip processing of inputs that need to be transposed. ### Motivation and Context - Allow testing with newest QNN SDK version - Take advantage of improvements to enable new models.	2023-12-06 11:05:41 -08:00
Changming Sun	eaaf27015e	Remove EnvSetupScript parameter from win-ci.yml (#18662 ) ### Description To make the code more consistent. Now some TRT pipelines download TRT binaries on-the-fly, while other TRT pipelines use a preinstalled version. This PR makes them the same.	2023-12-01 15:30:16 -08:00
Rachel Guo	9c45fe4957	Fix macos xcframework test stage codesign info (#18649 ) ### Description Remove the development ID and force code signing to be not required in the test macOS target. ### Motivation and Context Fix the failure in the iOS_Full_xcframwork stage of the Zip-Nuget-Java-NodeJS packaging pipeline. --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-12-01 14:47:46 -08:00
snadampal	05a9c95764	[DNNL] add Arm Compute Library (ACL) backend for dnnl execution provider (#15847 ) ### Description Add ACL as the DNNL runtime option for aarch64 platforms. Update makefile and the python wheel build script. ### Motivation and Context This is to enable the optimized ACL gemm kernels for dnnl execution provider on aarch64 platform.	2023-12-01 09:16:44 -08:00
Jian Chen	d69842226b	Update the template files to correct stage to fix the python cuda 12 packaging pipeline (#18651 )	2023-12-01 07:57:46 -08:00
Yi Zhang	efee9abdb7	Reduce downloads in Nuget-Java pipeline to reduce connection exception (#18635 ) ### Description 1. Add a new stage to download java tools from https://oss.sonatype.org and publish them to a pipeline artifact. 2. Remove downloads in other jobs; they get the java tools from the pipeline artifact instead. 3. Consolidate final_java_testing stages. ### Motivation and Context Reduce downloads to reduce connection errors like the one below. ``` --2023-11-28 07:16:31-- https://oss.sonatype.org/service/local/repositories/releases/content/org/junit/platform/junit-platform-console-standalone/1.6.2/junit-platform-console-standalone-1.6.2.jar Resolving oss.sonatype.org (oss.sonatype.org)... 3.227.40.198, 3.229.50.23 Connecting to oss.sonatype.org (oss.sonatype.org)|3.227.40.198|:443... connected. HTTP request sent, awaiting response... 502 Bad Gateway 2023-11-28 07:16:32 ERROR 502: Bad Gateway. ```	2023-12-01 07:44:44 +08:00
Changming Sun	1b5675ff0f	Update post-merge-jobs.yml: increase timeout value for the Ios job (#18602 )	2023-11-30 08:07:13 -08:00
George Wu	5c67a00d8e	Revert "remove full protobuf requirement for tensorrt ep" (#18626 ) Reverts microsoft/onnxruntime#18413 there's a timing issue here. we eventually want to get this change merged in but we need to update OSS onnx-tensorrt first.	2023-11-29 22:27:51 -08:00
Yi Zhang	68209307da	Replace all Azure-Pipelines-EO-Windows2022-aiinfrat to Onnxruntime-Win-CPU-2022 (#18614 ) ### Description Replace all uses of Azure-Pipelines-EO-Windows2022-aiinfra with Onnxruntime-Win-CPU-2022. ### Motivation and Context Reduce the maintenance cost	2023-11-29 10:32:42 -08:00
Edward Chen	14a343441d	Fix Objective-C static analysis build (#18606 ) - Patch abseil to fix a compile error about not finding `cxxabi.h`. - Fix some static analysis warnings.	2023-11-28 17:14:20 -08:00
Jian Chen	a49f31b670	Remove drop-nuget artifact from all pipelines (#18592 ) ### Description Currently, the `drop-nuget` artifact only contains protoc.exe which is also part of the `drop-extra` artifact.	2023-11-28 13:23:01 -08:00
Mike Guo	e24733cfe9	fix the Olive CI pipeline failure on Windows (#18464 ) Fix the https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1046 failure for Windows	2023-11-28 11:42:39 -08:00
Rachel Guo	288b80d363	Add MacOS build to ORT C Pod (#18550 ) ### Description As title. 1. Add the macOS build as an optionally enabled arch for the pod, with changes to the existing build_ios_framework/assemble_c_pod scripts. 2. Enable the macOS build arch in the iOS packaging pipeline (currently for variants other than Mobile) and check that the output artifacts are correct. 3. Write a MacOS Test Target scheme in the test app and integrate it into the iOS packaging CI testing pipeline. Currently the changes only apply to the onnxruntime-c pod, as the original request was from ORT SPM, which consumes only the onnxruntime-c pod as the binary target. TODO: could look into adding the macOS platform to the objc pod as well. ### Motivation and Context Enable macOS platform support in CocoaPods, and potentially also produce a binary target for enabling the macOS platform in SPM. Replace https://github.com/microsoft/onnxruntime/pull/18334 --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-11-28 10:11:53 -08:00
Yi Zhang	a6d8726407	Update ADO windows image to custom image (#18598 ) ### Description Update Azure-Pipelines-EO-Windows2022-aiinfra to onnxruntime-win-CPU-2022 in Nuget_Package_CPU. To make debugging easier, use flex-downloadPipelineArtifact. ### Motivation and Context Azure-Pipelines-EO-Windows2022-aiinfra uses the 1ES window-latest image, so the pipeline might fail after an unexpected image upgrade. Verified: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=384425&view=results ### P.S. I think we should replace all Azure-Pipelines-EO-Windows2022-aiinfra.	2023-11-28 09:04:25 -08:00
Jian Chen	3ea27c2925	Create a new Nuget Package pipeline for CUDA 12 (#18135 )	2023-11-28 09:03:46 -08:00
Ted Themistokleous	7b2aefa856	undo hipify of __half to rocblas_half (#18573 ) Fixes build issue seen with newer ROCm releases Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2023-11-24 18:04:23 +08:00
Rachel Guo	62f00ad8e7	[CoreML] Add Softmax and Split op support (#18358 ) ### Description As title. ### Motivation and Context Added to provide operator support that was missing for the yolov8 model. https://github.com/microsoft/onnxruntime/issues/17654 Now the model support info looks like: _CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 3 number of nodes in the graph: 233 number of nodes supported by CoreML: 230_ (only 3 Concat ops remain unsupported, because CoreML EP Concat does not currently support 3D input shapes). --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-11-23 14:26:57 -08:00
cloudhan	6f3c1f9dc9	[ROCm] Update ck for GemmFloat8 (#18487 )	2023-11-23 12:06:19 +08:00
Yulong Wang	d455b0f8fd	[js/web] use Chrome in CI for npm tests (#18522 ) ### Description Use Chrome in CI for npm tests. Previously we used Edge; however, it sometimes crashes for reasons not yet identified.	2023-11-21 18:03:57 -08:00
Abhishek Jindal	680a526e73	Training packaging pipeline for cuda12 (#18524 ) ### Description Build ORT-training packaging pipeline for CUDA 12.2. ### Motivation and Context This helps customers using CUDA 12, who will no longer need to build ORT-training from source. Test run: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=382993&view=logs&s=130be951-c2f3-5601-5709-434b5e50ddb0	2023-11-21 13:19:21 -08:00
Xavier Dupré	29a409acaa	Add missing flags DISABLE_FLOAT8_TYPES in GemmFloat8 custom operator for CUDA < 11.8 (#18162 ) ### Description PR #16051 introduced the GemmFloat8 operator, but the DISABLE_FLOAT8_TYPES flag was missing in a couple of places. This PR addresses that issue, which allows compilation with CUDA < 11.8.	2023-11-21 14:37:48 +01:00
Jian Chen	1dd9bf5340	Remove setup_env_azure.bat (#18482 )	2023-11-20 09:58:15 -08:00
Jian Chen	d97fc1824f	Create a new Python Package pipeline for CUDA 12 (#18348 )	2023-11-20 09:48:28 -08:00
Wei-Sheng Chin	3bcc137eb4	Tiny change to trigger the update of DORT's CI image (#18507 ) A recent PyTorch change broke DORT CI, and [a patch](https://github.com/pytorch/pytorch/pull/113697) has been merged into PyTorch main. In order to update DORT's CI, we made a dummy change in this PR.	2023-11-19 22:09:11 -08:00
Changming Sun	9364c05170	Update web-ci.yml: remove depth=1 (#18500 ) ### Description It causes our "NPM Packaging Pipeline" to fail.	2023-11-17 22:49:03 -08:00
Changming Sun	41f9379f3c	Update NDK version to 26.1.10909125 (#18493 ) ### Description Similar to #17852 ### Motivation and Context To avoid downloading NDK	2023-11-17 14:14:01 -08:00
Changming Sun	5eb5056c61	Always run emsdk_env.sh before build.py, even when ccache is disabled (#18477 ) ### Description Always run emsdk_env.sh before build.py, even when ccache is disabled This is a follow up to #18434. That PR didn't handle the case when ccache was disabled.	2023-11-16 21:37:29 -08:00
George Wu	d73073d491	remove full protobuf requirement for tensorrt ep (#18413 ) tensorrt can work with protobuf lite.	2023-11-16 20:44:27 -08:00
Scott McKay	e7a524fea9	Update to allow large models to be checked for mobile support. (#18357 ) ### Description Update usability checker and related infrastructure to support checking models > 2GB. - Add ability to set flag to keep initializers as external data - we optimize the model as part of the checking so need to write out a new copy. - Handle issue with ONNX shape inferencing silently failing - use API that supports large models but requires writing the model to a new file - automate cleanup of that copy of the model ### Motivation and Context Allow analysis of LLMs to determine gaps for mobile usage. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-11-17 07:20:16 +10:00
Jian Chen	05526b354b	Adding new yaml file for downloading cuda, and trt from azure blob (#18443 ) This also sets the Path variable for the downloaded libraries.	2023-11-14 19:47:39 -08:00
Ye Wang	f9af94009b	onboard MoE (#18279 ) ### Description 1. Introduce MoE CUDA op to ORT based on FT implementation. 2. Upgrade cutlass to 3.1.0 to avoid some build failures on Windows. Remove patch file for cutlass 3.0.0. 3. Sharded MoE implementation will come in another PR. Limitation: __CUDA_ARCH__ >= 700	2023-11-14 16:48:51 -08:00
Changming Sun	27d068569a	Remove Node.js tool installer task from web ci pipeline (#18434 ) EMSDK already includes Node.js. We will use that one to be more consistent (the CI build pipeline will be less dependent on the VM image).	2023-11-14 13:16:01 -08:00
Yulong Wang	d22b1af5da	[js/web] add CI steps to log info for test failure investigating (#18418 ) ### Description Add CI steps to log info for investigating test failures. Currently Web CI is marked as 'optional'. This change adds a script to dump debug info for investigating the random test failures.	2023-11-14 11:40:58 -08:00
Changming Sun	a09099f2dd	Remove XNNPack from web pipelines (#18419 ) ### Description Remove XNNPack from web pipelines for now	2023-11-13 22:43:53 -08:00
Yi Zhang	0b16185223	build wasm with linux (#18106 ) ### Description Make all build_wasm tasks (NPM packaging and post-merge) run on Linux. Enable the web gpu test in the npm package pipeline too. ### Motivation and Context Even on Windows, build_wasm runs in cygwin, so running it on Linux can save a lot of time.	2023-11-14 14:42:11 +08:00
Scott McKay	897c1c1f05	Set DML package name correctly in CI (#18405 ) ### Description Set DML package name correctly so the build doesn't try to include mobile targets. ### Motivation and Context Fix packaging pipeline.	2023-11-14 14:01:59 +10:00
Scott McKay	8ff41aea09	Fix 4 more bad delegates missing the attribute that cause iOS AOT errors at runtime (#18390 ) ### Description Fix bad delegates. Add script to detect mismatch, and run in CI and when creating nuget package. Ignore whitespace when looking at the diff to the .cs file as clang-format ran. ### Motivation and Context #18363	2023-11-14 14:00:21 +10:00
PeixuanZuo	37d8bed53d	[ROCm] add migraphx into onnxruntime-training-rocm package (#18339 )	2023-11-14 11:54:22 +08:00
PeixuanZuo	a62a500ae1	[ROCm] Update CK version (#17628 ) update ck version	2023-11-13 15:43:38 -08:00
Changming Sun	c3b5479056	Remove extra CUDA version flag (#18397 ) ### Description Only one of "--cuda_version" and "--cuda_home" is needed. If both are specified, the first one takes precedence. Since we now download CUDA SDKs on-the-fly, the machines no longer need a preinstalled CUDA SDK and therefore will not have the VS-CUDA integration extension, so the "--cuda_version" flag will not work. This PR deletes such usages. Related PR: #15915	2023-11-13 15:11:42 -08:00
Yulong Wang	6b0c97b43f	[js/web] fix typescript type check (#18343 ) ### Description This PR fixes the TypeScript type check. Previously, when I used esbuild to replace webpack (#17745), TypeScript type checking was disabled. This allowed a few TypeScript type errors to be checked into the code base. This PR fixes the following: - Use "Node16" as default "module" value in tsconfig.json, because in TypeScript v5, `(module == "ES2015" && moduleResolution == "Node16")` is an invalid combination. - Set `noUnusedParameters` to true as default. In web, override it to false because multiple files need to be updated (a follow-up PR will do this) - set correct project file for 'web/lib/*/.ts' for ESLint (otherwise WebGPU types are not populated correctly) - fix type error in file js/web/lib/wasm/jsep/webgpu/program-manager.ts - upgrade "@webgpu/types" to latest to fix type error in file js/web/lib/wasm/jsep/backend-webgpu.ts - add package script "prebuild" for web to run tsc type check - add type check in CI yml file	2023-11-10 16:03:38 -08:00
Changming Sun	2d23b4e117	Update min macos version (#18251 )	2023-11-10 11:08:17 -08:00
RandySheriffH	59262dfc63	Add cuda context headers to zip (#18330 ) Expose cuda context headers for cuda custom ops. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-11-09 14:53:58 -08:00
Scott McKay	885bf3561d	Add tool to fix lines > 120 chars. (#18293 ) ### Description Helper to run clang-format on lines that are > 120 chars. We disable clang-format enforcing 120 chars by default because its formatting can negatively impact readability. If a developer has not manually kept a line within the 120-char limit, this tool will fix it. It will leave all other lines alone to honor the formatting the developer chose. ### Motivation and Context Help developers fix lint errors. The preferred approach is to use a vertical ruler/guideline in your editor when actually writing the code.	2023-11-09 10:12:57 +10:00
Justin Chu	c250540722	Bump linter versions (#18341 ) Bump linter versions and run format.	2023-11-08 13:04:40 -08:00
Changming Sun	812532592e	Add a build validation for Linux ARM64 cross-compile (#18200 ) ### Description 1. Add a build validation for Linux ARM64/ARM32 cross-compile to catch issues listed in #18195 . 2. Revert eigen's commit id back to what we had before. ### Motivation and Context To catch cross-compile issues. Added a TODO item for fixing the compile warnings in Linux ARM32 build: AB#21639	2023-11-08 13:03:18 -08:00
Yulong Wang	d117a8010f	fix typo (node)->(browser) in linux-wasm-ci.yml (#18309 ) ### Description fix display name `'Build and test (node) (simd + threads)'` to `'Build and test (browser) (simd + threads)'`	2023-11-07 17:07:40 -08:00
Yi Zhang	9868a71373	[Fix] Stages to Run couldn't be selected (#18310 ) ### Description Add the pool definition in 2 stages, even though the pool is a Microsoft-hosted pool. ### Motivation and Context Recently, in the Nuget pipeline, when we click Stages to Run ![image](https://github.com/microsoft/onnxruntime/assets/16190118/45af295e-fa75-402a-a7de-803c6a2ab7cd) it always pops up ``` Encountered error(s) while parsing pipeline YAML: Could not find a pool with ID 5206. The pool does not exist or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. Could not find a pool with ID 5206. The pool does not exist or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. ```	2023-11-07 17:52:47 +08:00
Changming Sun	398ef677ba	Update protobuf python package's version (#18203 ) 1. Now we use a released version of ONNX, so we can directly download a prebuilt package from pypi.org. We do not need to build one from source. 2. Update protobuf python package's version to match the C/C++ version we are using. 3. Update the tensorboard python package because the current one is incompatible with the newer protobuf version.	2023-11-06 09:22:54 -08:00
Yi Zhang	b7b8b5b2ce	Fix Eigen-3.4.0 URL and hash (#18290 ) ### Description Add CI changes for #18287. Install onnx explicitly to pass the windows GPU+dml stage. ### Motivation and Context 'eigen-3.4' was referring to a branch, not to a tag. There is now an Eigen 3.4.1 on that branch, and thus the hash has changed. See https://github.com/microsoft/onnxruntime/issues/18286#issuecomment-1793683416	2023-11-06 09:19:51 -08:00
Scott McKay	c352e9b1f9	Rework/cleanup the C# build infrastructure for nuget packages. (#18127 ) ### Description Update the C# nuget build infrastructure to make building a test nuget package more user friendly and to simplify - Remove usage of dotnet and msbuild in CIs - was temporary requirement until .net 6 MAUI was added to the released Visual Studio - remove SelectedTargets property and its usage - Add property for excluding mobile targets - generally we exclude based on the nuget package name - can now specify `/p:IncludeMobileTargets=false` on the command line to force exclusion - support building test package using build.py `--build_nuget` better - limit inclusion of xamarin targets as building with them requires a lot more infrastructure - use msbuild directly if xamarin targets are included. use dotnet otherwise. - remove quoting of property values as it doesn't appear to be necessary and breaks when msbuild is being used - add infrastructure to be able to pack the nuget package on linux with `dotnet pack` - `nuget pack` is not user friendly as-per comments in changes - requires stub csproj to provide the nuspec path - Remove netstandard1.0 targets from nuspec - we removed support from the actual bindings previously - Remove usage of nuget-staging directory when creating nuget package on linux - the nuspec file element has a fully qualified path for a source file so there is no obvious benefit to copying to a staging directory prior to packing ### Motivation and Context Address issues with 1P users trying to create test nuget packages locally. Long overdue cleanup of CI complexity.	2023-11-03 09:05:17 -07:00
Scott McKay	4f2096be38	Update XNNPACK to latest version (#18038 ) ### Description Update XNNPACK to latest version - adds fp16 kernels and various other improvements - requires pthreadpool update as well Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API - 'setup' is split into 'reshape' and 'setup' - some ops use a workspace buffer - copied workspace allocation from XNNPACK unit test code - some suffixes changed Added wrapper for XNNPACK caches to base XNNPACK EP kernel - simplifies usage - XNNPACK split out the code and weights caches, but the code cache isn't currently usable via the public API - we could use the internal types if we think it's required for performance reasons. non-trivial though as we'd need to propagate ifdef values from the XNNPACK build up to the ORT build. - using XNNPACK internals would also mean we would not be able to support using a prebuilt XNNPACK package - not an issue currently Fixed opset registration for internal NHWC domain - was not being tied to the ONNX version, so nodes inserted by layout transformation had the incorrect opset - a number of other places needed updating once this issue was fixed Remove support for NCHW Resize from XNNPACK EP so it's NHWC only - we only supported NCHW for fp32, - doing so adds complexity in multiple places (XNNPACK EP kernel implementation, layout transformation and transpose optimization) - unclear if that complexity provides any benefit. can add back if required by production scenario ### Motivation and Context We're looking at enabling fp16 support for CoreML and NNAPI. If we do that we need a good fallback story if the CPU EP will be used. The XNNPACK fp16 kernels will hopefully provide that. NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That can be done as required in separate EPs and should be relatively simple to do.	2023-11-03 09:04:28 -07:00
Yi Zhang	9f5a6856fe	Rerun the flaky ort-web tests automatically (#18187 ) ### Description Retry at most 3 times if the web test fails. ### Motivation and Context Web GPU tests are not stable. This link shows that these ort-web tests are all among the top 10 failing tasks: https://dev.azure.com/onnxruntime/onnxruntime/_pipeline/analytics/stageawareoutcome?definitionId=161&contextType=build. Generally, they pass when rerun manually, so enable them to rerun automatically. These test steps are short, so retrying won't take too long.	2023-11-03 16:34:56 +08:00
Changming Sun	d8d79521ca	Disable ccache for DML (#18230 ) ### Description Disable ccache for DML. This change is similar to #18104. Now the DML build job is having the same timeout issue. I don't know why. But disabling ccache probably would help.	2023-11-02 16:00:55 -07:00
Preetha Veeramalai	d87216bcb1	Openvino ep ort 23.1 (#17911 ) ### Description Integration to OpenVINO 2023.1 ### Motivation and Context - Alignment with latest OpenVINO Version. - Device name change from VPUX to NPU and Remove from supported list until official public support is available. --------- Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: Saurabh Kale <saurabh1.kale@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com>	2023-11-01 08:39:39 -07:00
Scott McKay	62c7894ffe	Add mobile CIs to list run by script for external PRs. (#18094 ) ### Description Add the mobile CIs to the list so we check that external PRs don't break them. ### Motivation and Context A recent external PR was found to break iOS CI after check-in.	2023-11-01 09:25:48 +10:00
liqun Fu	20f2dd8b6b	use onnx rel-1.15.0, update cgman, cmake/external and requirement hash (#18177 )	2023-10-31 14:58:21 -07:00


2740 commits