onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-16 21:00:14 +00:00

Author	SHA1	Message	Date
Jian Chen	05526b354b	Adding new yaml file for downloading cuda, and trt from azure blob (#18443 ) This also sets the Path variable for the downloaded libraries.	2023-11-14 19:47:39 -08:00
Ye Wang	f9af94009b	onboard MoE (#18279 ) ### Description 1. Introduce MoE CUDA op to ORT based on FT implementation. 2. Upgrade cutlass to 3.1.0 to avoid some build failures on Windows. Remove patch file for cutlass 3.0.0. 3. Sharded MoE implementation will come with another PR. Limitation: __CUDA_ARCH__ >= 700	2023-11-14 16:48:51 -08:00
Changming Sun	27d068569a	Remove Node.js tool installer task from web ci pipeline (#18434 ) EMSDK already has a nodejs. We will use that one to be more consistent(the CI build pipeline would be less dependent on the VM image).	2023-11-14 13:16:01 -08:00
Yulong Wang	d22b1af5da	[js/web] add CI steps to log info for test failure investigating (#18418 ) ### Description add CI steps to log info for test failure investigating. Currently Web CI is marked as 'optional'. This change adds some script to dump debug info for investigating the random test failure	2023-11-14 11:40:58 -08:00
Changming Sun	a09099f2dd	Remove XNNPack from web pipelines (#18419 ) ### Description Remove XNNPack from web pipelines for now	2023-11-13 22:43:53 -08:00
Yi Zhang	0b16185223	build wasm with linux (#18106 ) ### Description Make all build_wasm tasks (NPM packaging and post merge)run on Linux. Enable web gpu test in npm package pipeline too. ### Motivation and Context Even on Windows, build_wasm is running in cygwin. So, it could save a lot of time to run it on Linux.	2023-11-14 14:42:11 +08:00
Scott McKay	897c1c1f05	Set DML package name correctly in CI (#18405 ) ### Description Set DML package name correctly so the build doesn't try and include mobile targets. ### Motivation and Context Fix packaging pipeline.	2023-11-14 14:01:59 +10:00
Scott McKay	8ff41aea09	Fix 4 more bad delegates missing the attribute that cause iOS AOT errors at runtime (#18390 ) ### Description Fix bad delegates. Add script to detect mismatch, and run in CI and when creating nuget package. Ignore whitespace when looking at the diff to the .cs file as clang-format ran. ### Motivation and Context #18363	2023-11-14 14:00:21 +10:00
PeixuanZuo	37d8bed53d	[ROCm] add migraphx into onnxruntime-training-rocm package (#18339 )	2023-11-14 11:54:22 +08:00
PeixuanZuo	a62a500ae1	[ROCm] Update CK version (#17628 ) update ck version	2023-11-13 15:43:38 -08:00
Changming Sun	c3b5479056	Remove extra CUDA version flag (#18397 ) ### Description Only one of "--cuda_version" and "--cuda_home" is needed. If they were both specified, the first one will take precedence. Since we download cuda SDKs on-the-fly now, the machines will not need to have a preinstalled CUDA SDK therefore will not have VS-CUDA integration extension. Therefore the "--cuda_version" flag will not work. This PR deletes such usages. Related PR: #15915	2023-11-13 15:11:42 -08:00
Yulong Wang	6b0c97b43f	[js/web] fix typescript type check (#18343 ) ### Description This PR fixes the TypeScript type check. Previously, when I used esbuild to replace webpack (#17745), the TypeScript type check was disabled. This caused a few TypeScript type errors to be checked into the code base. This PR fixes the following: - Use "Node16" as default "module" value in tsconfig.json, because in TypeScript v5, `(module == "ES2015" && moduleResolution == "Node16")` is an invalid combination. - Set `noUnusedParameters` to true as default. In web, override it to false because multiple pieces of code need to be updated (a follow-up PR will do this) - set correct project file for 'web/lib/*/.ts' for ESLint (otherwise WebGPU types are not populated correctly) - fix type error in file js/web/lib/wasm/jsep/webgpu/program-manager.ts - upgrade "@webgpu/types" to latest to fix type error in file js/web/lib/wasm/jsep/backend-webgpu.ts - add package script "prebuild" for web to run tsc type check - add type check in CI yml file	2023-11-10 16:03:38 -08:00
Changming Sun	2d23b4e117	Update min macos version (#18251 )	2023-11-10 11:08:17 -08:00
RandySheriffH	59262dfc63	Add cuda context headers to zip (#18330 ) Expose cuda context headers for cuda custom ops. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-11-09 14:53:58 -08:00
Changming Sun	812532592e	Add a build validation for Linux ARM64 cross-compile (#18200 ) ### Description 1. Add a build validation for Linux ARM64/ARM32 cross-compile to catch issues listed in #18195 . 2. Revert eigen's commit id back to what we had before. ### Motivation and Context To catch cross-compile issues. Added a TODO item for fixing the compile warnings in Linux ARM32 build: AB#21639	2023-11-08 13:03:18 -08:00
Yulong Wang	d117a8010f	fix typo (node)->(browser) in linux-wasm-ci.yml (#18309 ) ### Description fix display name `'Build and test (node) (simd + threads)'` to `'Build and test (browser) (simd + threads)'`	2023-11-07 17:07:40 -08:00
Yi Zhang	9868a71373	[Fix] Stages to Run couldn't be selected (#18310 ) ### Description Add the pool definition in 2 stages even though the pool is a Microsoft-Hosted Pool. ### Motivation and Context Recently, in Nuget pipeline, when we click the Stages to Run ![image](https://github.com/microsoft/onnxruntime/assets/16190118/45af295e-fa75-402a-a7de-803c6a2ab7cd) It always pops up ``` Encountered error(s) while parsing pipeline YAML: Could not find a pool with ID 5206. The pool does not exist or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. Could not find a pool with ID 5206. The pool does not exist or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. ```	2023-11-07 17:52:47 +08:00
Changming Sun	398ef677ba	Update protobuf python package's version (#18203 ) 1. Now we use a released version of ONNX, so we can directly download a prebuilt package from pypi.org. We do not need to build one from source. 2. Update protobuf python package's version to match the C/C++ version we are using. 3. Update the tensorboard python package because the current one is incompatible with the newer protobuf version.	2023-11-06 09:22:54 -08:00
Yi Zhang	b7b8b5b2ce	Fix Eigen-3.4.0 URL and hash (#18290 ) ### Description Add CI changes for #18287 Install onnx explicitly to pass windows GPU+dml stage. ### Motivation and Context 'eigen-3.4' was referring to a branch, not to a tag. There is now an Eigen 3.4.1 on that branch, and thus the hash has changed. See https://github.com/microsoft/onnxruntime/issues/18286#issuecomment-1793683416	2023-11-06 09:19:51 -08:00
Scott McKay	c352e9b1f9	Rework/cleanup the C# build infrastructure for nuget packages. (#18127 ) ### Description Update the C# nuget build infrastructure to make building a test nuget package more user friendly and to simplify - Remove usage of dotnet and msbuild in CIs - was temporary requirement until .net 6 MAUI was added to the released Visual Studio - remove SelectedTargets property and its usage - Add property for excluding mobile targets - generally we exclude based on the nuget package name - can now specify `/p:IncludeMobileTargets=false` on the command line to force exclusion - support building test package using build.py `--build_nuget` better - limit inclusion of xamarin targets as building with them requires a lot more infrastructure - use msbuild directly if xamarin targets are included. use dotnet otherwise. - remove quoting of property values as it doesn't appear to be necessary and breaks when msbuild is being used - add infrastructure to be able to pack the nuget package on linux with `dotnet pack` - `nuget pack` is not user friendly as-per comments in changes - requires stub csproj to provide the nuspec path - Remove netstandard1.0 targets from nuspec - we removed support from the actual bindings previously - Remove usage of nuget-staging directory when creating nuget package on linux - the nuspec file element has a fully qualified path for a source file so there is no obvious benefit to copying to a staging directory prior to packing ### Motivation and Context Address issues with 1P users trying to create test nuget packages locally. Long overdue cleanup of CI complexity.	2023-11-03 09:05:17 -07:00
Scott McKay	4f2096be38	Update XNNPACK to latest version (#18038 ) ### Description Update XNNPACK to latest version - adds fp16 kernels and various other improvements - requires pthreadpool update as well Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API - 'setup' is split into 'reshape' and 'setup' - some ops use a workspace buffer - copied workspace allocation from XNNPACK unit test code - some suffixes changed Added wrapper for XNNPACK caches to base XNNPACK EP kernel - simplifies usage - XNNPACK split out the code and weights caches, but the code cache isn't currently usable via the public API - we could use the internal types if we think it's required for performance reasons. non-trivial though as we'd need to propagate ifdef values from the XNNPACK build up to the ORT build. - using XNNPACK internals would also mean we would not be able to support using a pre-built XNNPACK package - not an issue currently Fixed opset registration for internal NHWC domain - was not being tied to the ONNX version, so nodes inserted by layout transformation had the incorrect opset - a number of other places needed updating once this issue was fixed Remove support for NCHW Resize from XNNPACK EP so it's NHWC only - we only supported NCHW for fp32, - doing so adds complexity in multiple places (XNNPACK EP kernel implementation, layout transformation and transpose optimization) - unclear if that complexity provides any benefit. can add back if required by production scenario ### Motivation and Context We're looking at enabling fp16 support for CoreML and NNAPI. If we do that we need a good fallback story if the CPU EP will be used. The XNNPACK fp16 kernels will hopefully provide that. NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That can be done as required in separate EPs and should be relatively simple to do.	2023-11-03 09:04:28 -07:00
Yi Zhang	9f5a6856fe	Rerun the flaky ort-web tests automatically (#18187 ) ### Description Retry 3 times at most if the web test fails. ### Motivation and Context Web GPU tests are not stable. From this link, we could find these ort-web tests are all in top 10 failing tasks. https://dev.azure.com/onnxruntime/onnxruntime/_pipeline/analytics/stageawareoutcome?definitionId=161&contextType=build. Generally, it could pass by manually rerunning it. So, enable it to rerun automatically. These test steps duration isn't long. So, it won't take too long to retry.	2023-11-03 16:34:56 +08:00
Changming Sun	d8d79521ca	Disable ccache for DML (#18230 ) ### Description Disable ccache for DML. This change is similar to #18104. Now the DML build job is having the same timeout issue. I don't know why. But disabling ccache probably would help.	2023-11-02 16:00:55 -07:00
liqun Fu	20f2dd8b6b	use onnx rel-1.15.0, update cgman, cmake/external and requirement hash (#18177 )	2023-10-31 14:58:21 -07:00
Jian Chen	29e40987e3	Update batch file to set PATH for Cuda with TRT (#18182 ) ### Description Update batch file to set PATH for Cuda with TRT	2023-10-31 10:22:40 -07:00
Jian Chen	8a574b874c	Update setup_env_cuda.bat (#18176 )	2023-10-30 21:28:02 -07:00
Yi Zhang	436056dcd7	Revert "Disable dml stage in windows GPU pipeline temporarily. (#18034 )" (#18150 ) This reverts commit `99b8dcaae2`. ### Motivation and Context Restore the dml stage in windows GPU pipeline. Agent issue is solved by adding Feature.DisableGpuDriver in pool properties.	2023-10-30 15:41:07 +08:00
Xavier Dupré	c10b83eb68	Update python cryptography version to 41.0.4 (#18056 ) ### Description Version 41.0.0 currently used has vulnerabilities. ### Motivation and Context See [Vulnerable OpenSSL included in cryptography wheels](https://github.com/advisories/GHSA-v8gr-m533-ghj9)	2023-10-27 12:06:38 +02:00
Jian Chen	7c18c60bc2	Change cuda image for tensorRT to the one with cudnn8 (#18102 )	2023-10-26 16:28:57 -07:00
Ashwini Khade	f2e19a8ccf	Updates to training pipelines to reduce CI time (#18116 ) ### Description Motivation for this PR is reducing CI test time by removing unnecessary tests from the pipelines. Following changes are for reducing test time in pipelines: - Skip CPU model tests in GPU builds. Training CIs run these tests as a sanity check. There is no direct training code being tested in these pipelines, furthermore, CPU tests are being run in CPU pipelines so no need to run them again in GPU builds and block the GPU VM. This change reduces testing time by 20-25 mins in all training GPU pipelines. - Delete debug package building pipeline for linux training packages. This was required by compiler team at some point but there have been 0 downloads of these packages.	2023-10-26 14:58:57 -07:00
Chi Lo	455a9ce614	[TensorRT EP] Use latest onnx-tensorrt parser (#18067 ) Use latest onnx-tensorrt to fix compile error. Please see the issue https://github.com/microsoft/onnxruntime/issues/18029	2023-10-26 13:55:12 -07:00
Jian Chen	b023de0bfc	Redo #18044 Install CUDA 12.2 on Windows (#18093 )	2023-10-26 10:12:46 -07:00
Changming Sun	0f72739b6d	Disable ccache for WinML build (#18104 ) ### Description It seems this would resolve the timeout issue.	2023-10-26 19:03:01 +08:00
Jian Chen	76e275baf4	Merge Cuda docker files into a single one (#18020 )	2023-10-24 15:17:36 -07:00
Changming Sun	6ec45f2ba5	Merge aiinfra-linux-ARM64-CPU-2019 and onnxruntime-linux-ARM64-CPU-2019 (#18069 ) ### Description Merge aiinfra-linux-ARM64-CPU-2019 and onnxruntime-linux-ARM64-CPU-2019 machines to a single one to ease management.	2023-10-24 13:04:08 -07:00
Changming Sun	abb329179a	Update win-wasm-ci.yml: increase the timeout value (#18023 )	2023-10-24 10:50:12 -07:00
Jian Chen	e63ccd3cbb	Install CUDA 12.2 on Windows (#18044 )	2023-10-24 10:47:23 -07:00
liqun Fu	020824ed50	Update ONNX to 1.15.0rc1 (#17914 )	2023-10-20 15:08:25 -07:00
Yi Zhang	99b8dcaae2	Disable dml stage in windows GPU pipeline temporarily. (#18034 )	2023-10-20 08:41:40 -07:00
Jian Chen	cbb0e0f83c	Create a new Dockerfile for cuda 12 and trt 8.6.1.6-1.cuda12.0 (#18000 )	2023-10-18 14:46:02 -07:00
Changming Sun	57c8736596	Move a nodejs test to a different machine pool (#17970 ) ### Description This is a temp fix for the failing "Zip-Nuget-Java-Nodejs Packaging Pipeline". The pipeline is failing because I removed NodeJS from the build machine pool's image, to reduce the number of dependencies we need to maintain in VMs. So this PR will temporarily move the test to a different machine pool to get the test passed. Then I will move the test to docker. Docker images are relatively easier to update and maintain. We now run almost all Linux tests in docker, except for this one. Moving it to docker is needed for enabling GPU support in nodejs, because none of our Linux VMs have CUDA.	2023-10-17 09:30:14 -07:00
Hariharan Seshadri	9356986730	Fix AMD builds and enable testing NHWC CUDA ops in one GPU CI (#17972 ) ### Description This PR: (1) Fixes AMD builds after #17200 broke them (Need to remember to run AMD builds while trying to merge external CUDA PRs next time) (2) Turn on the NHWC CUDA feature in the Linux GPU CI. The extra time spent in building a few more files and running a few more tests will not be much. Test Linux GPU CI run : https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1170770 ### Motivation and Context Keep the NHWC CUDA ops tested (https://github.com/microsoft/onnxruntime/pull/17200) and guard against regressions	2023-10-17 09:23:52 -07:00
Yulong Wang	f7341e8103	enable training for win-wasm-ci.yml (#17954 ) ### Description Fixes NPM Packaging pipeline. Training was enabled for linux-wasm-ci.yml but not enabled for win-wasm-ci.yml. The web CI uses linux-wasm-ci.yml; the NPM packaging pipeline uses win-wasm-ci.yml.	2023-10-16 16:07:20 +08:00
Scott McKay	ae211999dd	Attempt to make the usage of the Android emulator in CIs more robust (#17903 ) ### Description Android emulator usage updates: - Change approach to detecting boot has completed - use `-delay-adb` and a simple command (`ls`) with `wait-for-device` as the first step - this ensures enough startup has occurred for adb to be responsive - use secondary loop on the python side to check for sys.boot_completed to be set - doing the check on the python side provides more feedback and seems to work well - make the 'stop' logic more precise by using psutil - add internal timeout of 20 mins for emulator startup - waiting for the CI jobs overall timeout is way too long - value is hardcoded for now (most CIs startup in under 10 mins) but could be made configurable if needed CI updates: - add template for using the Android emulator - update CIs to use template - reorder React Native CI - minimize the time the Android emulator or iOS simulator is running by moving some build steps around - don't run both at the same time - unnecessary and potentially adds significant memory pressure to the machine - fix QNN Android emulator CI as much as possible - now everything works apart from running onnx_test_runner with the QNN EP ### Motivation and Context Fix inconsistent detection of the emulator boot completing. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-10-15 08:42:36 +10:00
PeixuanZuo	0c5b1598d3	[ROCm] Add ROCm Debug wheels to private ADO Feeds (#17887 ) Add ROCm Debug wheels to private ADO Feeds	2023-10-13 10:28:10 +08:00
Changming Sun	3f3ece4a39	Update NDK to 26.0.10792818 (#17852 ) ### Description Update NDK to 26.0.10792818 which is included in every macOS build machine so that we do not need to download a different version every time in every build. ### Motivation and Context Downloading NDK on-the-fly is a main contributor of Android related build failures.	2023-10-12 14:08:43 -07:00
Yi Zhang	9d07ca3621	Move compliance check before publishing pipeline artifact (#17857 ) ### Motivation and Context Compliance check would fail randomly but the stage couldn't be rerun if the pipeline artifacts are already published. There's the error like `Artifact xxxx already exists`. We had to restart the whole pipeline if there's a random error in compliance check.	2023-10-12 15:48:04 +08:00
Yulong Wang	25bbd8d4eb	[js/web] allow gpu IO binding tests to fail temporarily (#17892 ) ### Description allow gpu IO binding tests to fail temporarily. when the root cause is still in investigation, use `continueOnError: true` to allow the test to fail without blocking PRs.	2023-10-11 21:21:21 -07:00
Changming Sun	138ccecd22	Change how "NPM packaging pipeline" downloads packages from another pipeline (#17838 ) ### Description "NPM packaging pipeline" needs to download an artifact from "Zip-Nuget-Java-Nodejs Packaging Pipeline". It has been a long-time issue that the two pipelines often use different commit ids. This change declares 'Zip-Nuget-Java-Nodejs Packaging Pipeline' as a resource, so that "NPM packaging pipeline" will always fetch from the pipeline run that triggers this NPM pipeline. Their official document says: "When you define a resource trigger, if its pipeline resource is from the same repo as the current pipeline, triggering follows the same branch and commit on which the event is raised."	2023-10-11 21:07:27 -07:00
Scott McKay	046939b0c1	Include CoreML in mac os python packages (#17844 ) ### Description Include CoreML EP in python package. I've added to the base package as CoreML comes from the OS so there are no additional libraries to distribute. Updated the CPU-based provider list to add the AzureEP, which is also included in the base package, to fix some test failures. Without this the infrastructure thinks a device copy implementation is required between AzureEP and CoreML nodes, which is not the case as the AzureEP is CPU based. ### Motivation and Context #16989	2023-10-10 11:44:32 +10:00

1688 commits