onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-01 03:45:06 +00:00

Author	SHA1	Message	Date
Yulong Wang	bad00a3657	Add dependency dawn into deps.txt (#21910 ) ### Description Add dependency dawn into deps.txt. This is a preparation for introducing WebGPU EP.	2024-09-02 04:24:28 -07:00
Kyle	b1ae43cbcb	Add Files Signature Validation after Signed by ESRP (#21949 ) ### Description Validate file signatures after signing by ESRP. ### Motivation and Context - Add validation after the ESRP process. - Make sure the files matching the target pattern/suffix are signed successfully by ESRP. - If a signature is not valid, the following stages fail.	2024-09-02 17:16:59 +08:00
Yi Zhang	60b07623a2	Add a reminder in set-trigger-rules script (#21929 ) ### Description After editing set-trigger-rules.py, we must run the file. ### Motivation and Context The script evidently wasn't run, because some file names are incorrect.	2024-08-30 12:18:10 -07:00
mindest	bfa4da4f65	Add Linux ROCm CI Pipeline (#21798 ) ### Description * Add new ROCm CI pipeline (`Linux ROCm CI Pipeline`) focusing on inference. * Resolve test errors; disable flaky tests. based on test PR #21614.	2024-08-30 14:50:32 +08:00
dependabot[bot]	4ac1558498	Bump torch from 1.13.1+cpu to 2.2.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/torch_eager_cpu (#21919 ) Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1+cpu to 2.2.0.	2024-08-29 21:57:24 -07:00
Yi Zhang	be76e1e1b8	Add dependent stages in nuget packaging pipelines (#21886 ) ### Description Since the stage needs to download drop-extra, it should declare the stages it depends on.	2024-08-29 11:34:10 +08:00
Jian Chen	e95277484e	Adding $(Build.SourcesDirectory)s to the ignoreDirectories (#21878 )	2024-08-27 19:56:48 -07:00
George Wu	23f3912334	support both qnn x64 and arm64ec stages in py packaging pipeline (#21880 ) both arm64ec and x64 packages are needed. x64 is needed for offline context binary generation and arm64ec is needed for interop with python packages that don't have prebuilt arm64 packages and only have x64.	2024-08-27 15:07:30 -07:00
Caroline Zhu	b7f09d4c27	Increase timeout for orttraining-linux-gpu pipeline (#21844 ) ### Description Increase timeout to 160 minutes ### Motivation and Context - Recent runs of orttraining-linux-gpu pipeline have been timing out	2024-08-27 11:47:12 -07:00
Jian Chen	7f851f4e61	Removing docker_base_image parameter and variables (#21864 ) ### Description Remove the `docker_base_image` parameter and variables from the CUDA packaging pipeline. ### Motivation and Context The docker image is hard-coded in `onnxruntime/tools/ci_build/github/linux/docker/inference/x86_64/default/cuda12/Dockerfile` and `onnxruntime/tools/ci_build/github/linux/docker/inference/x86_64/default/cuda11/Dockerfile`, so this parameter and these variables are no longer needed.	2024-08-27 10:36:17 -07:00
Yi Zhang	2877de73e1	sign native dll with correct cert (#21854 ) ### Description Fixed #21775 ### Motivation and Context The dlls should be signed with Keycode CP-230012. The default is the test code sign.	2024-08-26 16:46:19 +08:00
Caroline Zhu	983c4d57a4	Fix typo for react native pipeline (#21845 ) ### Description fix typo ### Motivation and Context [RN pipeline failing](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=188&_a=summary) since #21578 with this error: ![image](https://github.com/user-attachments/assets/75e5b968-572f-42cc-9816-7940de464cfa)	2024-08-26 12:05:11 +10:00
Guenther Schmuelling	ba7baae994	Revert "Upgrade emsdk from 3.1.59 to 3.1.62" (#21817 ) Reverts microsoft/onnxruntime#21421 Users are seeing chrome memory grow to 16GB before it crashes: https://github.com/microsoft/onnxruntime/issues/21810 Revert for now so we have time to debug.	2024-08-22 11:21:00 -07:00
Jian Chen	6c1a3f85a6	Do not allow clearing Android logs if the emulator is not running (#21578 ) ### Description Do not allow clearing Android logs if the emulator is not running ### Motivation and Context Previously, the "Clearing Android logs" step hung until the pipeline timed out if one of the previous steps had failed.	2024-08-22 10:18:01 -07:00
Yi Zhang	12f426c63f	update size limit check of training GPU wheel (#21762 ) ### Motivation and Context The training wheel size limit should be 400M	2024-08-21 09:30:05 +08:00
Tianlei Wu	7c93d5ded1	Upgrade pytorch_lightning to 2.3.3 to fix orttraining_amd_gpu_ci_pipeline (#21789 ) ### Description Upgrade pytorch_lightning to fix orttraining_amd_gpu_ci_pipeline ``` #24 1.838 WARNING: Ignoring version 1.6.0 of pytorch_lightning since it has invalid metadata: #24 1.838 Requested pytorch_lightning==1.6.0 from `cee67f4849/pytorch_lightning-1.6.0-py3-none-any.whl` has invalid metadata: .* suffix can only be used with `==` or `!=` operators #24 1.838 torch (>=1.8.*) #24 1.838 ~~~~~~^ #24 1.838 Please use pip<24.1 if you need to use this version. #24 1.838 ERROR: Ignored the following versions that require a different python version: 1.14.0 Requires-Python >=3.10; 1.14.0rc1 Requires-Python >=3.10; 1.14.0rc2 Requires-Python >=3.10; 2.1.0 Requires-Python >=3.10; 2.1.0rc1 Requires-Python >=3.10 #24 1.838 ERROR: Could not find a version that satisfies the requirement pytorch_lightning==1.6.0 (from versions: 0.0.2, 0.2, 0.2.2, 0.2.3, 0.2.4, 0.2.4.1, 0.2.5, 0.2.5.1, 0.2.5.2, 0.2.6, 0.3, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.4.1, 0.3.5, 0.3.6, 0.3.6.1, 0.3.6.3, 0.3.6.4, 0.3.6.5, 0.3.6.6, 0.3.6.7, 0.3.6.8, 0.3.6.9, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.5, 0.4.6, 0.4.7, 0.4.8, 0.4.9, 0.5.0, 0.5.1, 0.5.1.2, 0.5.1.3, 0.5.2, 0.5.2.1, 0.5.3, 0.5.3.1, 0.5.3.2, 0.5.3.3, 0.6.0, 0.7.1, 0.7.3, 0.7.5, 0.7.6, 0.8.1, 0.8.3, 0.8.4, 0.8.5, 0.9.0, 0.10.0, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8, 1.2.0rc0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.2.6, 1.2.7, 1.2.8, 1.2.9, 1.2.10, 1.3.0rc1, 1.3.0rc2, 1.3.0rc3, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.3.6, 1.3.7, 1.3.7.post0, 1.3.8, 1.4.0rc0, 1.4.0rc1, 1.4.0rc2, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.7, 1.4.8, 1.4.9, 1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.5.5, 1.5.6, 1.5.7, 1.5.8, 1.5.9, 1.5.10, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.7.0rc0, 1.7.0rc1, 1.7.0, 
1.7.1, 1.7.2, 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.8.0rc0, 1.8.0rc1, 1.8.0rc2, 1.8.0, 1.8.0.post1, 1.8.1, 1.8.2, 1.8.3, 1.8.3.post0, 1.8.3.post1, 1.8.3.post2, 1.8.4, 1.8.4.post0, 1.8.5, 1.8.5.post0, 1.8.6, 1.9.0rc0, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.9.4, 1.9.5, 2.0.0rc0, 2.0.0, 2.0.1, 2.0.1.post0, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6, 2.0.7, 2.0.8, 2.0.9, 2.0.9.post0, 2.1.0rc0, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.2.0rc0, 2.2.0, 2.2.0.post0, 2.2.1, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0) #24 1.838 ERROR: No matching distribution found for pytorch_lightning==1.6.0 ```	2024-08-19 12:58:22 -07:00
jingyanwangms	c018ba43ef	[Running CI] [TensorRT EP] support TensorRT 10.3-GA (#21742 ) ### Description - TensorRT 10.2.0.19 -> 10.3.0.26	2024-08-18 13:26:41 -07:00
Edward Chen	63e8849992	build_aar_package.py - Check that executable is present before trying to copy it. (#21730 ) Check that executable is present before trying to copy it. Accommodate builds where we skip building the test executables.	2024-08-16 11:21:09 -07:00
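The check described in the commit above can be sketched as follows. This is a minimal illustration, not the code from build_aar_package.py: the helper name `copy_if_present` and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_if_present(src: Path, dst_dir: Path) -> bool:
    """Copy src into dst_dir only if src exists; return True if a copy happened."""
    if not src.is_file():
        # The build skipped this executable, so there is nothing to copy.
        return False
    dst_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst_dir / src.name)
    return True


# Usage with hypothetical file names in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp) / "build"
    out_dir = Path(tmp) / "out"
    build_dir.mkdir()
    (build_dir / "example_test_exe").write_text("stub")

    copied_present = copy_if_present(build_dir / "example_test_exe", out_dir)
    copied_missing = copy_if_present(build_dir / "not_built_exe", out_dir)
    copied_file_exists = (out_dir / "example_test_exe").is_file()
```

Returning a flag instead of raising lets the caller decide whether a missing executable is an error or an expected result of skipping the test build.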
Yi Zhang	8a59b4dc4b	Move Python Training CUDA 12.2 pipeline to another pool. (#21745 ) ### Motivation and Context The [Python Training CUDA 12.2 pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary) has always been cancelled by the remote provider since Aug 2nd, while other workflows on the same pool don't have this issue. It looks like something odd is going on in Azure DevOps. Using another pool works, even though its SKU is smaller than the old one's. ### Verification https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary	2024-08-15 17:31:56 +08:00
Satya Kumar Jandhyala	6d8de1f7b8	Upgrade emsdk from 3.1.59 to 3.1.62 (#21421 ) ### Description Upgrade EM SDK to 3.1.62. ### Motivation and Context The changes are required to clear wasm64 errors.	2024-08-14 12:38:52 -07:00
Prathik Rao	e32e3575d8	pin pytorch lightning version for training CI (#21731 ) ### Description Pins the pytorch-lightning package to version 2.3.3, since version >=2.4.0 requires torch > 2.1.0, which is not compatible with cu118. ### Motivation and Context ORT 1.19 Release Preparation	2024-08-13 20:04:56 -07:00
Yi Zhang	6db3d63add	move the A100 stage to main build (#21722 ) ### Motivation and Context As of today, we can't get enough A100 agent time to finish the jobs. This PR makes the A100 job run only on the main branch, to unblock other PRs in case the shortage isn't resolved soon.	2024-08-13 22:48:58 +08:00
George Wu	a8462ffb61	enable qnn python arm64ec packaging (#21575 ) create the x64 qnn python package as arm64ec so it can be published publicly.	2024-08-12 22:43:17 -07:00
Yulong Wang	6ae7e02d34	Web CI: make multi-browser test job optional (#21669 ) ### Description This job is a little bit unstable. Make it optional to avoid blocking other PRs before we revise it.	2024-08-09 23:53:26 -07:00
Scott McKay	410ae94e9e	Use zipped xcframework in nuget package (#21663 ) ### Description The xcframework now uses symlinks to have the correct structure according to Apple requirements. Symlinks are not supported by nuget on Windows. In order to work around that we can store a zip of the xcframeworks in the nuget package. ### Motivation and Context Fix nuget packaging build break	2024-08-09 17:38:18 +10:00
Tianlei Wu	a46e49b439	Unblock migraphx and linux GPU training ci pipelines (#21662 ) ### Description * Fix the migraphx build error caused by https://github.com/microsoft/onnxruntime/pull/21598: add conditional compilation around the code block that depends on ROCm >= 6.2. Note that the pipeline uses ROCm 6.0. Unblock the orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed and orttraining-amd-gpu-ci-pipeline pipelines: * Disable a model test in the linux GPU training ci pipelines that fails because of https://github.com/microsoft/onnxruntime/pull/19470: sometimes the cudnn frontend throws an exception saying that the cudnn graph does not support a Conv node of the keras_lotus_resnet3D model on a V100 GPU. Note that the same test does not throw an exception in other GPU pipelines. The failure might be related to cudnn 8.9 and the V100 GPU used in the pipeline (Ampere GPUs and cuDNN 9.x do not have the issue). The actual fix requires fallback logic, which will take time to implement, so we temporarily disable the test in the training pipelines. * Force install torch for cuda 11.8. (The docker image has torch 2.4.0 for cuda 12.1 to build the torch extension, which is not compatible with cuda 11.8.) Note that this is a temporary workaround. A more elegant fix is to make sure the right torch version is installed in the docker build step, which might require updating install_python_deps.sh and the corresponding requirements.txt. * Skip test_gradient_correctness_conv1d since it causes a segmentation fault. The root cause needs more investigation (maybe due to the cudnn frontend as well). * Skip test_aten_attention since it causes an assert failure. The root cause needs more investigation (maybe due to the torch version). * Skip orttraining_ortmodule_distributed_tests.py since it fails with an error that the compiler for the torch extension does not support c++17. One possible fix is to set the following compile argument inside setup.py of the extension fused_adam: extra_compile_args['cxx'] = ['-std=c++17']. However, due to the urgency of unblocking the pipelines, just disable the test for now. * Skip test_softmax_bf16_large. For some reason, torch.cuda.is_bf16_supported() returns True on V100 with torch 2.3.1, so the test was run in CI, but V100 does not support bf16 natively. * Fix typo of deterministic	2024-08-08 19:44:15 -07:00
Scott McKay	d616025884	Match changes in gh-pages PR (#21628 ) ### Description Update to match #21627 and make the info for Split consistent. As a Split that doesn't split anything is a no-op it doesn't seem meaningful to call that limitation out in the docs.	2024-08-08 10:29:15 +10:00
Adrian Lizarraga	0acefc7988	[QNN EP] Update QNN SDK to 2.25 (#21623 ) ### Description - Update pipelines to use QNN SDK 2.25 by default - Update ifdef condition to apply workaround for QNN LayerNorm validation bug to QNN SDK 2.25 (as well as 2.24) ### Motivation and Context Use the latest QNN SDK	2024-08-06 09:08:48 -07:00
Yi Zhang	0d1da41ca8	Fix docker image layer caching to avoid redundant docker building and transient connection exceptions. (#21612 ) ### Description Improve the docker commands so that docker image layer caching works. This makes docker builds faster and more stable. For now, the A100 pool's system disk is too small to use the docker cache. We no longer use the pipeline cache for docker images, and some legacy code is removed. ### Motivation and Context We often see this exception: ``` 64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail 286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2) ``` The onnxruntime pipelines have been sending too many requests to download Node.js during docker builds, which is currently the major cause of pipeline failures. In fact, docker image layer caching has never worked: we can always see the scripts still running. ``` #9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts #9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8) #9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8) #9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory #9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']' #9 0.236 +++ tr -dc 0-9. #9 0.236 +++ cut -d . -f1 #9 0.238 ++ os_major_version=8 .... #9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail #9 60.59 + return 0 ... ``` This PR improves the docker command so that image layer caching works, so CI no longer sends so many redundant requests to download Node.js. ``` #9 [2/5] ADD scripts /tmp/scripts #9 CACHED #10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts #10 CACHED #11 [4/5] RUN adduser --uid 1000 onnxruntimedev #11 CACHED #12 [5/5] WORKDIR /home/onnxruntimedev #12 CACHED ``` ### Reference https://docs.docker.com/build/drivers/ --------- Co-authored-by: Yi Zhang <your@email.com>	2024-08-06 21:37:09 +08:00
Edward Chen	a5ce65d87a	Clean up some mobile package related files and their usages. (#21606 ) The mobile packages have been removed.	2024-08-05 16:38:20 -07:00
Scott McKay	bcc01ac123	Updates to apple packaging (#21611 ) ### Description Add ability to test packaging without rebuilding every time. Add ability to comment out some platforms/architectures without the scripts to assemble the c/obj-c packages breaking. Update a couple of commands to preserve symlinks. ### Motivation and Context Make debugging packaging issues faster. Creates correct package for mac-catalyst and doesn't require setting symlinks via bash script.	2024-08-06 08:50:56 +10:00
vraspar	88c811b638	Restructure MacOS framework package to fix malformed Framework errors (#21536 ) ### Description Refactor framework directory structure for MacOS packages ### Motivation and Context Apple started enforcing specific [framework structure](https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPFrameworks/Concepts/FrameworkAnatomy.html) for MacOS packages. We need to change how we package for MacOS to follow the guidelines Fixes following issue: [Malformed Framework](https://github.com/microsoft/onnxruntime-swift-package-manager/issues/19 )	2024-08-04 12:47:16 -07:00
Julius Tischbein	1391354265	Adding CUDNN Frontend and use for CUDA NN Convolution (#19470 ) ### Description Added the cuDNN frontend and used it for NHWC convolutions, optionally fusing the activation. #### Backward compatible - Existing models that contain FusedConv can still run. - If ORT is built with cuDNN 8, the cuDNN frontend is not built into the binary, and the old kernels (using cuDNN backend APIs) are used. #### Major Changes - For cuDNN 9, the cuDNN frontend fuses convolution and bias when the provider option `fuse_conv_bias=1` is set. - Remove the FusedConv fusion from the graph transformer for the CUDA provider, so FusedConv will no longer be added to the graph for the CUDA EP. - Update the cmake files for the cuDNN settings. The build searches for the cuDNN installation in the following order: * environment variable `CUDNN_PATH` * `onnxruntime_CUDNN_HOME` cmake extra define. If a build starts from build.py/build.sh, the user can pass it through the `--cudnn_home` parameter, or through the environment variable `CUDNN_HOME` if `--cudnn_home` is not used. * cudnn python package installation directory, like python3.xx/site-packages/nvidia/cudnn * CUDA installation path #### Potential Issues - If ORT is built with cuDNN 8, FusedConv fusion is no longer done automatically, so some models might see a performance regression. Users who still want the FusedConv operator for performance reasons have several workarounds: use an older version of onnxruntime, or use an older version of ORT to save the optimized ONNX model and then run it with the latest version of ORT. We believe the majority of users will have moved to cuDNN 9 by the 1.20 release (cuDNN 9 will have been the default in ORT and PyTorch for 3 months by then), so the impact is small. - cuDNN graph uses TF32 by default, and the user cannot disable TF32 through the use_tf32 CUDA provider option. A user who encounters accuracy issues (for example in testing) has to set the environment variable `NVIDIA_TF32_OVERRIDE=0` to disable TF32. The use_tf32 documentation needs to be updated later. #### Follow ups This is one of the PRs that aim to enable NHWC convolution in the CUDA EP by default if the device supports it. Further changes will follow to make that possible: (1) Enable `prefer_nhwc` by default for devices with sm >= 70. (2) Change the default to `fuse_conv_bias=1` after more testing. (3) Add other NHWC operators (like Resize or UpSample). ### Motivation and Context The new cuDNN Frontend library provides the functionality to fuse operations and provides new heuristics for kernel selection. Here it fuses the convolution with the pointwise bias operation. On the [NVIDIA ResNet50](https://pytorch.org/hub/nvidia_deeplearningexamples_resnet50/) we get a performance boost from 49.1144 ms to 42.4643 ms per inference on a 2560x1440 input (`onnxruntime_perf_test -e cuda -I -q -r 100-d 1 -i 'prefer_nhwc\|1' resnet50.onnx`). --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Maximilian Mueller <maximilianm@nvidia.com>	2024-08-02 15:16:42 -07:00
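The cuDNN search order listed in the commit above can be sketched in Python as a first-match lookup. This is only an illustration of the stated order. The real logic lives in the cmake files and build.py, and the function name `resolve_cudnn_home`, its parameters, and the assumption that a candidate is skipped when its directory does not exist are all hypothetical.

```python
import os
from typing import Callable, Mapping, Optional, Sequence


def resolve_cudnn_home(
    env: Mapping[str, str],
    cudnn_home_arg: Optional[str] = None,
    site_packages: Sequence[str] = (),
    cuda_home: Optional[str] = None,
    exists: Callable[[str], bool] = os.path.isdir,
) -> Optional[str]:
    """Return the first existing candidate, following the order in the commit message."""
    candidates = []
    # 1. Environment variable CUDNN_PATH.
    if env.get("CUDNN_PATH"):
        candidates.append(env["CUDNN_PATH"])
    # 2. --cudnn_home if given, otherwise environment variable CUDNN_HOME.
    explicit = cudnn_home_arg or env.get("CUDNN_HOME")
    if explicit:
        candidates.append(explicit)
    # 3. cudnn python package directory, e.g. .../site-packages/nvidia/cudnn.
    for sp in site_packages:
        candidates.append(os.path.join(sp, "nvidia", "cudnn"))
    # 4. CUDA installation path.
    if cuda_home:
        candidates.append(cuda_home)
    # Assumption: a candidate whose directory is missing is skipped.
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None


# Usage with made-up paths; `exists` is stubbed so no real directories are needed.
present = {"/opt/cudnn", "/usr/local/cuda"}
found = resolve_cudnn_home(
    env={"CUDNN_HOME": "/opt/cudnn"},
    site_packages=["/usr/lib/python3/site-packages"],
    cuda_home="/usr/local/cuda",
    exists=lambda p: p in present,
)
fallback = resolve_cudnn_home(
    env={}, cuda_home="/usr/local/cuda", exists=lambda p: p in present
)
```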
dependabot[bot]	3b73ef2bf7	Bump torch from 1.13.1 to 2.2.0 in /tools/ci_build/github/windows/eager (#21505 ) Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1 to 2.2.0. Release notes, sourced from [torch's releases](https://github.com/pytorch/pytorch/releases): PyTorch 2.2: FlashAttention-v2, AOTInductor. Sections: Highlights; Backwards Incompatible Changes; Deprecations; New Features; Improvements; Bug fixes; Performance; Documentation. Highlights: We are excited to announce the release of PyTorch® 2.2! PyTorch 2.2 offers ~2x performance improvements to `scaled_dot_product_attention` via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. This release also includes improved torch.compile support for Optimizers, a number of new inductor optimizations, and a new logging mechanism called TORCH_LOGS. **Please note that we are [deprecating macOS x86 support](https://redirect.github.com/pytorch/pytorch/issues/114602), and PyTorch 2.2.x will be the last version that supports macOS x64.** Along with 2.2, we are also releasing a series of updates to the PyTorch domain libraries. More details can be found in the library updates blog. This release is composed of 3,628 commits and 521 contributors since PyTorch 2.1. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.2. More information about how to get started with the PyTorch 2-series can be found at our [Getting Started](https://pytorch.org/get-started/pytorch-2.0/) page. Summary: - `scaled_dot_product_attention` (SDPA) now supports FlashAttention-2, yielding around 2x speedups compared to previous versions. - PyTorch 2.2 introduces a new ahead-of-time extension of TorchInductor called AOTInductor, designed to compile and deploy PyTorch programs for non-python server-side. - `torch.distributed` supports a new abstraction for initializing and representing ProcessGroups called device_mesh. - PyTorch 2.2 ships a standardized, configurable logging mechanism called TORCH_LOGS. - A number of torch.compile improvements are included in PyTorch 2.2, including improved support for compiling Optimizers and improved TorchInductor fusion and layout optimizations. - Please note that we are deprecating macOS x86 support, and PyTorch 2.2.x will be the last version that supports macOS x64. - `torch.ao.quantization` now offers a prototype `torch.export` based flow ... (truncated) Commits: - `8ac9b20` Run docker release build on final tag (#117131) (#117182) - `2490352` Fix cuInit test on Windows (#117095) - `3a44bb7` [CI] Test that cuInit is not called during import (#117043) - `1c8ba38` [CI] Use jemalloc for CUDA builds (#116900) (#116988) - `96d2ddb` Store user model to simplify ONNXProgram.{adapt_torch_*,__call__} APIs (#1152... - `738b4a5` Update ONNX's IO Adapter to support FakeTensor with ExportedProgram (#114407)... - `4cf10bf` [Cherry-pick] [Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval ... - `7e97e4b` [AARCH64] Fall back to GEMM if mkldnn_matmul fails (#115936) (#116666) - `1a3e3c7` [CUDA] baddmm should fall back to addmm for batch=1 (#114992) (#116518) - `ab7505f` Fix broken PyYAML 6.0 on MacOS x86 (#115956) (#116551) - Additional commits viewable in [compare view](https://github.com/pytorch/pytorch/compare/v1.13.1...v2.2.0) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-01 04:28:43 -07:00
vraspar	07d3be5b0e	CoreML: Add ML Program Split Op (#21456 ) ### Description Add support for Split Op ### Motivation and Context Address operator gaps in high priority model. --------- Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-07-30 14:04:47 +10:00
Yifan Li	5d78b9a17b	[TensorRT EP] Update TRT OSS Parser to 10.2 (#21552 ) ### Description Update TRT OSS Parser to [latest 10.2-GA branch](`f161f95883`)	2024-07-29 17:27:38 -07:00
Jian Chen	79537d0523	Remove tools/ci_build/github/android/run_nnapi_code_coverage.sh (#21371 ) ### Description Remove tools/ci_build/github/android/run_nnapi_code_coverage.sh ### Motivation and Context This file is no longer needed	2024-07-29 10:00:52 -07:00
Jian Chen	bc3713206d	Update QNN pipeline pool (#21482 ) ### Description Update QNN pipeline pool ### Motivation and Context Make all our pipelines use the latest NDK version	2024-07-29 10:00:21 -07:00
Yi Zhang	05cef469e8	Move on-device training packages publish step (#21539 ) ### Description Since the on-device training CPU packaging is now a separate pipeline, its nuget package publishing step must be moved as well. ### Motivation and Context Fixes the exception in Nuget Publishing Packaging Pipeline caused by #21485	2024-07-29 09:59:46 -07:00
Jian Chen	7e23212de9	Delete tools/ci_build/github/azure-pipelines/win-gpu-ci-pipeline.yml (#21529 ) ### Description Delete tools/ci_build/github/azure-pipelines/win-gpu-ci-pipeline.yml ### Motivation and Context This CI pipeline has been divided into 4 different pipelines.	2024-07-27 15:58:12 -07:00
maggie1059	10b4a3b90b	Fix conda failure for onnxruntime-directml (#21526 ) The change in #21005 works for directly building wheels with `build.py`, but ort-nightly-directml wheels, as well as the 1.18.1 release of the onnxruntime-directml python wheel, still do not work with conda since they're built from the `py-win-gpu.yml` pipeline, which uses `install_third_party_deps.ps1` to set compile flags.	2024-07-26 22:26:38 -07:00
Scott McKay	5af423c7c0	Set version and other info in the C# dll (#21517 ) ### Description Set version and other info in the Microsoft.ML.OnnxRuntime C# dll by setting GenerateAssemblyInfo to true and passing in ORT version in the CI. Minor re-org of the order of properties so related things are grouped a little better. ### Motivation and Context #21475	2024-07-27 13:22:57 +10:00
Jian Chen	7db7c4e5c8	Separating all GPU stages into different Pipelines (#21521 ) ### Description Separating all GPU stages into different Pipelines	2024-07-26 14:54:45 -07:00
Prathik Rao	278f0f5cd2	disables qnn in ort training cpu pipeline (#21510 ) ### Description `enable_windows_arm64_qnn` and `enable_windows_x64_qnn` are true by default but unnecessary for training. This change explicitly sets these parameters to false for training pipeline. ### Motivation and Context ORT 1.19 Release Preparation	2024-07-26 17:23:35 +08:00
Scott McKay	b0e1f7f798	CoreML: Aggregated changes to add all required ops for priority model (#21472 ) ### Description Add these changes to one PR to simplify checkin - Add Concat (#21423) - Add DepthToSpace (#21426) - Add LeakyRelu (#21453) - Add test scripts (#21427) - Add ability to set coreml flags from python (#21434) Other changes - updated partitioning utils to support dropping constant initializers from a ComputeCapability's inputs. - noticed that the list of inputs to the coreml model was unexpectedly long due to this - we copy constant initializers to a CoreML model so don't need the originals, and if they remain as inputs ORT can't free them as they appear to be in use.	2024-07-26 08:29:33 +10:00
Scott McKay	3cdf4b917b	Fix Android CI Pipeline code coverage failure (#21504 ) ### Description Current failure is due to a version mismatch. Use llvm-cov from the Android NDK instead of the system gcov so that the version is correct. Also comment out publishing to the Azure dashboard to simplify the setup. The CI prints out the stats for review by developers. ### Motivation and Context Fix CI pipeline	2024-07-26 07:36:23 +10:00
Changming Sun	4167b68abf	Split ondevice training cpu packaging pipeline to a separated pipeline (#21485 ) ### Description Right now our "Zip-Nuget-Java-Nodejs Packaging Pipeline" is too big. The on-device training part is independent of the rest, so it can be split out. Our NPM packaging pipeline then no longer depends on the training parts. ### Motivation and Context Similar to #21235. This PR also fixes a problem: the "NuGet_Test_Linux_Training_CPU" job downloads artifacts from "onnxruntime-linux-x64" to get the custom-op shared libs, but the job did not declare that it depends on "Linux_C_API_Packaging_CPU_x64", which produces the artifact. Such problems can be hard to find when a pipeline grows big.	2024-07-25 10:58:34 -07:00
Yifan Li	ebcb7075eb	Set CUDA12 as default in GPU packages (#21438 ) ### Description * Swap cuda version 11.8/12.2 in GPU CIs * Set CUDA12 as default version in yamls of publishing nuget/python/java GPU packages * Suppress warnings as errors of flash_api.cc during ort win-build	2024-07-25 10:17:16 -07:00
Adrian Lizarraga	eb9b377306	[QNN EP] Update to QNN SDK 2.24.0 (#21463 ) ### Description - Update pipelines to use QNN SDK 2.24 by default - Update QNN_Nuget_Windows pipeline to build csharp solution without mobile projects (fixes errors). - Implement workaround for QNN 2.24 validation bug for LayerNorm ops without an explicit bias input. - Enable Relu unit test, which now passes because Relu is no longer fused into QuantizeLinear for QNN EP. - Fix bug where a negative quantization axis is not properly normalized for per-channel int4 conv. ### Motivation and Context Update QNN SDK.	2024-07-24 10:17:12 -07:00
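For the last fix in the commit above, normalizing a negative axis usually means mapping it into the range [0, rank). The sketch below shows that standard convention in Python. It is not the QNN EP code, which is C++, and the function name `normalize_axis` is hypothetical.

```python
def normalize_axis(axis: int, rank: int) -> int:
    """Map an axis in [-rank, rank) to its non-negative equivalent in [0, rank)."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    # A negative axis counts from the end: -1 is the last dimension.
    return axis + rank if axis < 0 else axis


# For a rank-4 tensor, axis -1 refers to dimension 3.
last_dim = normalize_axis(-1, 4)
```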
Changming Sun	b04adcc381	Update copy_strip_binary.sh: use "make install" instead (#21464 ) ### Description Before this change, copy_strip_binary.sh manually copied each file from ONNX Runtime's build folder to an artifact folder, which is hard to get right when shared libraries involve symbolic links. This PR changes the packaging pipelines to run "make install" first, before packaging the shared libs. ### Motivation and Context Recently, because of feature request #21281, we changed libonnxruntime.so's SONAME. Now every package that contains this shared library must also contain libonnxruntime.so.1, so the packaging scripts need to include this file. Instead of manually constructing the symlink layout, using `make install` is much easier and makes things more consistent, because it is a standard way of making packages. Breaking change: after this change, the inference tarballs published to our GitHub release pages will not contain the ORT training headers.	2024-07-24 10:02:00 -07:00


2045 commits