onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-18 21:21:17 +00:00

Author	SHA1	Message	Date
Ashwini Khade	e93a860819	Remove arm build for training (#19788 ) We no longer support Win arm 32 so removing the associated build and packaging job.	2024-03-05 21:54:48 -08:00
Scott McKay	db59cec82f	Don't reduce warning level for CUDA build on Windows (#19663 ) ### Description Address warnings so all the ORT projects build with /W4 on Windows. Mainly - unused parameters - variables shadowing other ones ### Motivation and Context #19588 started on this.	2024-03-06 15:03:55 +10:00
Yulong Wang	a788514027	[js/web] dump debug logs for karma for diagnose purpose (#19785 ) ### Description Dump debug logs for karma for diagnostic purposes. This is for debugging the CI issue of Chrome launch failure and is considered temporary.	2024-03-05 18:27:26 -08:00
Yi Zhang	9460597b21	Update copying API header files (#19736 ) ### Description Make the Linux logic consistent with Windows ### Motivation and Context onnxruntime_lite_custom_op.h is in the Windows zip package but not in the Linux zip package `acbfc29f27/tools/ci_build/github/azure-pipelines/templates/c-api-artifacts-package-and-publish-steps-windows.yml (L67)` Co-authored-by: Your Name <your@email.com>	2024-03-02 11:33:47 +08:00
Edward Chen	5672cdebdf	Update google benchmark to 1.8.3. (#19734 ) Update google benchmark to 1.8.3. Update deps_update_and_upload.py script to make it easier to use.	2024-03-01 11:01:58 -08:00
Changming Sun	ed550b5fe5	Change webgpu CI pipeline to use a preinstalled chrome (#19729 ) ### Description Change webgpu CI pipeline to use a preinstalled chrome. Hopefully it can increase the stability. Currently the chrome obtained from puppeteer often fails to start.	2024-02-29 20:36:29 -08:00
Changming Sun	250779474d	Change "onnxruntime-Linux-CPU-For-Android-CI" machine pool to "onnxruntime-Ubuntu2204-AMD-CPU" (#19698 ) ### Description The original one reports "out of disk space", which needs to be investigated.	2024-02-28 19:36:26 -08:00
Changming Sun	a93c31e3c9	Update dml-vs-2022.yml (#19687 ) ### Description Fix a build error in "Zip-Nuget-Java-Nodejs Packaging Pipeline" which deletes files too early.	2024-02-28 12:03:17 -08:00
Changming Sun	7a147fc6f7	Remove a bash task from webgpu CI pipeline (#19682 ) ### Description It is a "Bash" task that requires running bash on Windows. Most Windows operating systems do not have Bash installed. Given this task is only for debugging purposes, we can remove it for now. ### Motivation and Context I am making this change because I am regenerating the VM image in a different manner, and the new image does not contain bash. Once this PR is in, I can switch the images.	2024-02-28 18:20:53 +08:00
Yi Zhang	f95c0773a1	Add share memory Flag in docker (#19672 ) ### Description ### Motivation and Context Ref: https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#setincshmem Co-authored-by: Your Name <your@email.com>	2024-02-28 10:40:40 +08:00
Scott McKay	1c468a03b9	Improve Nuget-CUDA-Packaging-Pipeline (#19668 ) ### Description * Publish the artifacts as late as possible * once published the artifacts are immutable, and any retry will fail if they exist * if any step fails after publishing the stage cannot be retried * use powershell to cleanup * DeleteFiles is taking >30 mins and causing the stage to timeout * powershell took < 1s ### Motivation and Context Make pipeline more robust	2024-02-27 09:27:43 -08:00
Scott McKay	580ee20dfc	Tweak Windows build parallelization settings (#19664 ) ### Description Use UseMultiToolTask and limit the number of cl.exe instances running. MultiToolTask info: https://devblogs.microsoft.com/cppblog/improved-parallelism-in-msbuild/ Info on why limiting CL_MPCount can help: https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows The current CIs have 4 cores (both physical and logical). Hardcoded the GPU build in win-ci.yml to use CL_MPCount of 2 as that seems to work fine. Can adjust if needed to base it on the actual number of cores or to use build.py to build. Caveat: I've run about 16 builds and haven't seen a slow build yet, but as the root cause of the slow builds isn't really known this isn't guaranteed to be a fix. ### Motivation and Context Try and prevent super slow GPU builds by reducing number of tasks potentially running in parallel.	2024-02-27 08:56:16 -08:00
Yi Zhang	3b46ab6439	Re-add testing removed by mistake. (#19647 )	2024-02-27 08:46:29 -08:00
Rachel Guo	5bb58a10e7	Enable the most verbose logging level in detox E2E React Native CI (#19659 ) ### Description The RN CI fails intermittently with the error "app seems to idle". Enable the most verbose logging level (and add steps to dump device.log from the detox folder/artifacts if necessary) to at least get more information. ### Motivation and Context --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2024-02-26 20:00:14 -08:00
Scott McKay	8bd943be39	Retry flaky XCode iOS UI tests if we get a known error (#19639 ) ### Description Xcode UI tests seem to be flaky: https://github.com/orgs/community/discussions/68807 Add a couple of retries if we get a "Timed out while loading Accessibility." error which is transient. ### Motivation and Context	2024-02-27 09:31:32 +10:00
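The retry behaviour described in #19639 can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual script: `run_with_retries`, `max_retries`, and the `run` callable are hypothetical names, and only the error string is taken from the commit message.

```python
from typing import Callable, Tuple

# Error text quoted in the commit message; treated here as transient.
KNOWN_TRANSIENT_ERROR = "Timed out while loading Accessibility."


def run_with_retries(
    run: Callable[[], Tuple[int, str]],
    max_retries: int = 2,
) -> Tuple[int, str, int]:
    """Run a test command, retrying only on the known transient error.

    `run` is a hypothetical callable returning (exit_code, combined_output).
    Returns (exit_code, output, attempts_made).
    """
    attempts = 0
    while True:
        attempts += 1
        exit_code, output = run()
        if exit_code == 0:
            return exit_code, output, attempts
        # Any other failure is reported immediately so real bugs are not hidden.
        if KNOWN_TRANSIENT_ERROR not in output:
            return exit_code, output, attempts
        # Give up once the retry budget is spent.
        if attempts > max_retries:
            return exit_code, output, attempts
```

Matching on a specific error string, rather than retrying every failure, keeps genuine test failures visible on the first attempt.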
Yi Zhang	0fcc6fb760	Add Whisper model in CI (#19604 ) ### Description Add Whisper Conversion and E2E into Big Models pipeline ### Motivation and Context --------- Co-authored-by: Your Name <your@email.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>	2024-02-25 14:04:22 +08:00
Yi Zhang	c980149c85	Add log for random exception in Linux GPU Test Stage. (#19569 ) ### Description 1. check GPU status in docker 2. use stages so that the test stage can leverage existing build artifacts ### Motivation and Context To investigate the root cause of the random exception `CUDA failure 100: no CUDA-capable device is detected`	2024-02-24 13:00:53 -08:00
Scott McKay	45e20bf781	Use build.py to build in py-win-gpu.yml so parallelization parameters are set (#19578 ) ### Description build.py sets a few parallelization parameters when building. Using msbuild directly lacks those. `7a5860e490/tools/ci_build/build.py (L1665-L1669)` Changed to use build.py. If there's a concern with that we _could_ set the parameters in the yaml, but that will be uglier due to duplicating logic in multiple places. ### Motivation and Context	2024-02-21 10:38:37 +08:00
PeixuanZuo	f3e3b531fe	Update build directory clean up stage for python package pipeline (#19553 ) Fix to make the clean up stage take effect. If the `SourceFolder` is empty, the task deletes files from the root folder of the repository as though [$(Build.SourcesDirectory)](https://learn.microsoft.com/en-us/azure/devops/pipelines/build/variables) was specified.	2024-02-20 10:31:39 +08:00
Adrian Lizarraga	4874a41008	[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546 ) ### Description Updates the default QNN SDK version to 2.19.2.240210. ### Motivation and Context Build and test the latest version of QNN SDK in our pipelines.	2024-02-16 16:59:43 -08:00
Tianlei Wu	1dce5e1732	Disable TF32 in Linux_Test stage of Linux GPU CI Pipeline (#19541 ) ### Description Some test thresholds that previously worked on the T4 GPU do not work anymore. The reason is that the current pipeline uses A10, and TF32 is enabled by default. Disable TF32 in Linux GPU CI Pipeline in testing to avoid such random test failures. ### Motivation and Context Linux Test has random failures in tests: ProviderOptionsTest > testCUDAOptions() FAILED org.opentest4j.AssertionFailedError: array contents differ at index [446], expected: <0.0419757> but was: <0.041948937> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119) at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360) at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99) at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43) org.opentest4j.AssertionFailedError: array contents differ at index [6], expected: <0.0225981> but was: <0.022587791> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290) at 
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119) at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360) at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676) at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)	2024-02-16 14:41:11 -08:00
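To see why a threshold tuned without TF32 can fail once TF32 is active, the Python sketch below emulates TF32's reduced mantissa. It assumes TF32 keeps 10 explicit mantissa bits of a float32 and models the conversion by truncation (real hardware rounding may differ). `to_tf32` and `relative_diff` are illustrative helpers, not ONNX Runtime APIs, and the two sample values come from the first assertion failure quoted in #19541.

```python
import struct


def to_tf32(x: float) -> float:
    """Truncate a float32 value to 10 explicit mantissa bits (assumed TF32 layout)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the low 13 of float32's 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def relative_diff(expected: float, actual: float) -> float:
    """Relative difference with respect to the expected value."""
    return abs(expected - actual) / abs(expected)


# Values from the first failure in the commit message.
EXPECTED = 0.0419757
ACTUAL = 0.041948937

# Spacing of representable values relative to magnitude, for comparison.
FP32_STEP = 2.0 ** -23
TF32_STEP = 2.0 ** -10

if __name__ == "__main__":
    print("observed relative diff:", relative_diff(EXPECTED, ACTUAL))
    print("TF32 relative step:    ", TF32_STEP)
    print("FP32 relative step:    ", FP32_STEP)
    print("one truncation error:  ", relative_diff(EXPECTED, to_tf32(EXPECTED)))
```

The observed relative difference is roughly 6e-4, the same order of magnitude as the TF32 step (about 1e-3) and several thousand times the float32 step, which fits a threshold that holds in full float32 but not with TF32.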
rui-ren	d63c664ca0	fix rocm ci pipeline (#19525 ) ### Description <!-- Describe your changes. --> ROCm CI pipeline issue. ``` Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.91 MiB, post-processed: Unknown size, total: 17.41 MiB) to /home/onnxruntimedev/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20... main() File "/stage/huggingface-transformers/examples/pytorch/language-modeling/run_mlm.py", line 242, in main datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/load.py", line 856, in load_dataset builder_instance.download_and_prepare( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 583, in download_and_prepare self._download_and_prepare( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 639, in _download_and_prepare split_generators = self._split_generators(dl_manager, **split_generators_kwargs) File "/home/onnxruntimedev/.cache/huggingface/modules/datasets_modules/datasets/wikitext/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20/wikitext.py", line 138, in _split_generators data_file = dl_manager.download_and_extract(self.config.data_url) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 289, in download_and_extract return self.extract(self.download(url_or_urls)) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 197, in download downloaded_path_or_paths = map_nested( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 195, in map_nested return function(data_struct) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 
220, in _download return cached_path(url_or_filename, download_config=download_config) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 281, in cached_path output_path = get_from_cache( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 634, in get_from_cache raise ConnectionError("Couldn't reach {}".format(url)) ConnectionError: Couldn't reach https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip ``` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Update the `datasets` pipeline to latest version `2.17.0`.	2024-02-15 00:02:08 -08:00
Prathik Rao	3b03b2e046	Upgrade default ORTModule opset from 15 to 17 (#19315 ) ### Description This PR upgrades ORTModule's default opset from 15 to 17. Opset 17 is the final opset supported by torchscript exporter (https://github.com/pytorch/pytorch/pull/107829) ### Motivation and Context Engineering excellence contribution for ORT Training DRI. --------- Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2024-02-14 11:19:33 -08:00
Yifan Li	5c7e6b2e2a	[EP Perf] Add CI option to enable TRT-OSS parser (#19448 ) ### Description <!-- Describe your changes. --> * Introducing CI option to enable TRT-OSS parser, during ep perf testing: ![image](https://github.com/microsoft/onnxruntime/assets/109183385/a9ba6393-6b94-4b8f-8ca4-ba7bc7954504) By default, open-sourced onnx-tensorrt parser listed under [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt#L39-L40) will be used if enabling this option. ### To verify this option and check the difference during ORT image build: If this option is enabled: <img width="649" alt="image" src="https://github.com/microsoft/onnxruntime/assets/109183385/3b778583-451e-4617-ba8c-c064442e60fd"> If this option is not enabled (by default): <img width="683" alt="image" src="https://github.com/microsoft/onnxruntime/assets/109183385/cd8383ba-eff4-4536-94ab-a1424bb858ab"> * update default usage of cmake/trt version to the latest ### Motivation and Context Make it easier to test oss parser and find potential gap between tensorrt builtin/oss parser. Schedule runs with oss parser will be set after this PR gets merged	2024-02-12 23:04:08 -08:00
Adrian Lizarraga	4dfba53bfb	[QNN EP] Build x64 python wheel for QNN EP (#19499 ) ### Description Adds a job to the python packaging pipeline that builds x64 python wheels for QNN EP. ### Motivation and Context Necessary to create a cached QNN model on Windows x64, which is done by creating a properly configured onnxruntime session with QNN EP.	2024-02-12 20:54:04 -08:00
Baiju Meswani	c831031ad5	Remove cuda gencode 90 to reduce onnxruntime-training package size (#19486 )	2024-02-12 09:24:36 -08:00
Justin Chu	3d2ddf96e3	Bump ruff linter to 0.2.1 (#19471 ) ### Motivation and Context Include new lint rules	2024-02-08 16:08:27 -08:00
Jian Chen	75f06319d6	Change binet to bin (#19424 ) ### Description This pull request includes a small change to the `Dockerfile.manylinux2_28_cuda` file in the `tools/ci_build/github/linux/docker` directory. The change corrects the `PREPEND_PATH` argument from `/usr/local/cuda/binet` to `/usr/local/cuda/bin`, ensuring the correct path to CUDA binaries is set.	2024-02-07 09:51:02 -08:00
Edward Chen	df5c6718bd	Remove iOS simulator max runtime version limit. (#19396 )	2024-02-06 14:54:06 -08:00
Yulong Wang	a4cfdc1c28	update comments for nodejs binding artifact preparation. (#19425 ) ### Description document update as a following-up for #19274	2024-02-05 22:58:35 -08:00
Jian Chen	06a84c8a0d	Enable DML on Windows and CUDA on Linux for Node.js binding (#19274 ) This pull request includes modifications to the `c-api-cpu.yml` Azure Pipelines configuration file. The changes mainly revolve around the Node.js packaging stage and the handling of Node.js artifacts. The most significant changes include renaming the Node.js packaging stage, adding a new dependency to the stage, changing artifact names, adding a new script to list Node.js artifacts, and updating the source folder for copying NuGet binaries. Changes in Node.js packaging: * [`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L503-R508): Renamed the Node.js packaging stage from `Nodejs_Packaging_CPU` to `Nodejs_Packaging` and added `Windows_CI_GPU_DML_Dev` as a new dependency to the stage. Changes in handling of Node.js artifacts: * [`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L568-R569): Changed the artifact name from `drop-onnxruntime-nodejs-win-x64` to `drop-onnxruntime-nodejs-win-x64-dml` in the task to download pipeline artifacts for Windows x64. * [`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59R595-R598): Added a new script to list Node.js artifacts from the directory `$(Build.BinariesDirectory)/nodejs-artifacts/win32/x64/`. * [`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L635-R640): Updated the source folder from `$(Build.BinariesDirectory)\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib` to `$(Build.BinariesDirectory)\nodejs-artifacts\win32\x64` in the task to copy NuGet binaries to the directory `$(Build.SourcesDirectory)\js\node\bin\napi-v3\win32\x64`. 
--------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-02-05 14:33:58 -08:00
Yi Zhang	435e19953e	Fix llama.covert_onnx to make it runnable in CI (#19372 ) ### Description 1. Make parity_check use a local model to avoid using an HF token. 2. Deleting the model with `del` didn't work because it tried to delete an object defined outside the function scope, which caused out-of-memory on A10. 3. In fact, 16G GPU memory (one T4) is enough, but the conversion process is always killed on T4 while it works on A10/24G. Standard_NC4as_T4_v3 has 28G CPU memory; Standard_NV36ads_A10_v5 has 440G memory. It looks like the model conversion needs a very large amount of memory. ### Motivation and Context Last time, I came across some issues in convert_to_onnx.py, so I used the onnx model in https://github.com/microsoft/Llama-2-Onnx for testing. Now these issues can be fixed, so I use the onnx model generated by this repo and the CI can cover the model conversion.	2024-02-05 07:26:24 +08:00
PeixuanZuo	0cba56e0a0	[ROCm] Fix CI pipeline by fixing pytest version (#19407 ) Pin pytest to version 7.4.4; higher versions cause the error `from onnxruntime.capi import onnxruntime_validation ModuleNotFoundError: No module named 'onnxruntime.capi'`	2024-02-04 16:37:36 +08:00
Scott McKay	debd1cab10	Add coremltools 7.1 as a dependency (#19389 ) ### Description Setup usage of coremltools via dependencies instead of copying files. Pull in some changes from https://github.com/microsoft/onnxruntime/pull/19347 in preparation for supporting ML Program and enabling building the ML Model on all platforms to make development and testing of CoreML EP code easier. - Update to coremltools 7.1 - Add patch for changes required for cross platform build of ML Program related code - Generate coreml proto files on all platforms - mainly to test these changes work everywhere, as the proto files will be used on all platforms when #19347 is checked in - rename onnxruntime_coreml_proto target to coreml_proto as it contains purely coreml protobuf code with no ORT related changes ### Motivation and Context Improve setup.	2024-02-03 09:42:21 +10:00
Yi Zhang	e74f141338	Save stablediffusion and open-clip in pipeline cache (#19314 ) ### Description 1. save the model to pipeline cache 2. lower the similarity bar to 97 3. publish the generated image so that we can check it when the test fails ### Motivation and Context Reduce model downloads	2024-01-31 09:39:27 +08:00
Rachel Guo	3e17ca3dab	Fix iOS artifacts issue in Microsoft.ML.OnnxRuntime Nuget Package (#19311 ) ### Description Updates to only include ios archs framework in artifacts included in Nuget Package. ### Motivation and Context Related issue: https://github.com/microsoft/onnxruntime/issues/19295#issuecomment-1914143256 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-01-30 08:44:20 -08:00
Changming Sun	e91d91ae4f	Fix a build issue: /MP was not enabled correctly (#19190 ) ### Description In PR #19073 I misunderstood the value of "--parallel". Instead of testing whether args.parallel is None, I should test the value returned by the number_of_parallel_jobs function. If build.py is invoked without --parallel, args.parallel equals 1 because that is the default value. In that case we should not add "/MP". However, the current code adds it, because `if args.parallel` evaluates to `if 1`, which is True. If build.py is invoked with --parallel but without an explicit number, args.parallel equals 0 because the job count is unspecified. In that case we should add "/MP". However, the current code does not add it, because `if args.parallel` evaluates to `if 0`, which is False. This also adds a new build flag: use_binskim_compliant_compile_flags, which is intended to be used only in the ONNX Runtime team's build pipelines for compliance reasons. ### Motivation and Context	2024-01-29 12:45:38 -08:00
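The truthiness bug described in #19190 can be reproduced with a small Python sketch. The `--parallel` declaration below is an assumption chosen to match the values in the commit message (1 when the flag is omitted, 0 when the job count is left unspecified). `number_of_parallel_jobs` is a hypothetical stand-in for the function of that name in build.py, and the `> 1` condition is a guess at the corrected check, not the actual patch.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumed declaration, matching the values in the commit message:
    #   flag omitted       -> 1 (default)
    #   bare `--parallel`  -> 0 (job count left unspecified)
    #   `--parallel N`     -> N
    parser.add_argument("--parallel", nargs="?", const=0, default=1, type=int)
    return parser


def number_of_parallel_jobs(args: argparse.Namespace) -> int:
    """Hypothetical stand-in: resolve 0 ("unspecified") to the CPU count."""
    return (os.cpu_count() or 1) if args.parallel == 0 else args.parallel


def should_add_mp_buggy(args: argparse.Namespace) -> bool:
    # Tests the raw option: 1 is truthy (adds /MP), 0 is falsy (omits /MP).
    return bool(args.parallel)


def should_add_mp_fixed(args: argparse.Namespace) -> bool:
    # Tests the resolved job count instead (assumed form of the fix).
    return number_of_parallel_jobs(args) > 1


if __name__ == "__main__":
    for argv in ([], ["--parallel"], ["--parallel", "4"]):
        args = build_parser().parse_args(argv)
        print(argv, should_add_mp_buggy(args), should_add_mp_fixed(args))
```

The buggy check gives exactly the inverted answers for the first two invocations, which is the behaviour the commit message describes.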
Yi Zhang	e96a038f01	Add VP test in Stable diffusion pipeline (#19300 ) ### Description 1. Add visual parity test based on openai clip model 2. Add trigger rules ### Motivation and Context 1. check generated image is expected 2. reduce unnecessary triggers	2024-01-29 09:33:58 -08:00
Tianlei Wu	358650d441	Fix BigModel stable diffusion pipeline (#19277 ) ### Description Fix two issues: (1) We can only use single quotes inside `bash -c "..."`. The current pipeline job stopped at `python3 demo_txt2img.py astronaut` and skipped the following commands. In this change, we remove the remaining commands to get the same effect (otherwise, the pipeline runtime might be 2 hours instead of 15 minutes). (2) Fix a typo of Stable.	2024-01-25 17:19:04 -08:00
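The quoting issue in point (1) of #19277 can be illustrated with Python's `shlex.split`, which tokenizes a command line using POSIX shell-like rules. The command strings below are hypothetical (the real pipeline command is not shown in the commit message); the sketch only shows how a nested double quote ends the outer `bash -c "..."` string early.

```python
import shlex

# Hypothetical command with double quotes nested inside the outer double quotes.
BROKEN = 'bash -c "python3 demo.py "astronaut on mars"; echo done"'

# Same command with single quotes for the inner string.
FIXED = "bash -c \"python3 demo.py 'astronaut on mars'; echo done\""

broken_argv = shlex.split(BROKEN)
fixed_argv = shlex.split(FIXED)

if __name__ == "__main__":
    # In the broken form the script passed to `-c` ends after the first word of
    # the prompt; the remaining words become positional arguments, not commands.
    print("broken script:", broken_argv[2])
    print("broken extras:", broken_argv[3:])
    print("fixed script: ", fixed_argv[2])
```

In the broken form, `bash -c` receives only `python3 demo.py astronaut` as its script, so anything after the prompt never runs as a command.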
Changming Sun	bc54ad3f03	Update abseil to a release tag and register neural_speed (#19255 ) ### Description Update abseil to a release tag and register neural_speed to CG. ### Motivation and Context Now we are using a non-released version of abseil. Using a tag is better.	2024-01-24 14:37:39 -08:00
Yi Zhang	d7aebf9ea8	Move Nuget Test from T4 to A10 to reduce release duration (#19253 ) ### Description ### Motivation and Context Running the release process is very painful and boring because some GPU jobs have to wait a long time. ![image](https://github.com/microsoft/onnxruntime/assets/16190118/1c5c981e-68d4-4678-9758-443fbf362802) ![image](https://github.com/microsoft/onnxruntime/assets/16190118/ba0d79ba-1554-4c7a-93dd-6ea8144c9295) ![image](https://github.com/microsoft/onnxruntime/assets/16190118/36cab833-71c1-4ff5-bca5-f4caa9aee0c9) On the one hand, we could move some T4 from the PR process since some jobs are not using T4 any more, and on the other hand, we can continue to change some jobs' agents from T4 to A10 too. In the future, T4 will mainly be used for scenarios where big GPU memory is needed, multiple GPU cards, or some special cases. Test runs: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=401786&view=logs&j=8048494c-e6eb-5e47-5e87-ff0aa863325d cc @YUNQIUGUO @snnn	2024-01-24 14:15:07 +08:00
Yi Zhang	54871a2773	Replace T4 to A10 in Linux GPU workflow (#19205 ) ### Description 1. Update Linux GPU machine from T4 to A10, sm=8.6 2. update the tolerance ### Motivation and Context 1. Free more T4 and test with higher compute capability. 2. ORT enables TF32 in GEMM for A10/100. TF32 will cause precision loss and fail this test ``` 2024-01-19T13:27:18.8302842Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12 2024-01-19T13:27:25.8438153Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure 2024-01-19T13:27:25.8438641Z Expected equality of these values: 2024-01-19T13:27:25.8438841Z COMPARE_RESULT::SUCCESS 2024-01-19T13:27:25.8439276Z Which is: 4-byte object <00-00 00-00> 2024-01-19T13:27:25.8439464Z ret.first 2024-01-19T13:27:25.8445514Z Which is: 4-byte object <01-00 00-00> 2024-01-19T13:27:25.8445962Z expected 0.145984 (3e157cc1), got 0.975133 (3f79a24b), diff: 0.829149, tol=0.0114598 idx=375. 20 of 388 differ 2024-01-19T13:27:25.8446198Z 2024-01-19T13:27:25.8555736Z [ FAILED ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12, where GetParam() = "cuda_../models/zoo/opset12/SSD/ssd-12.onnx" (7025 ms) 2024-01-19T13:27:25.8556077Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_YOLOv312_yolov312 2024-01-19T13:27:29.3174318Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure 2024-01-19T13:27:29.3175144Z Expected equality of these values: 2024-01-19T13:27:29.3175389Z COMPARE_RESULT::SUCCESS 2024-01-19T13:27:29.3175812Z Which is: 4-byte object <00-00 00-00> 2024-01-19T13:27:29.3176080Z ret.first 2024-01-19T13:27:29.3176322Z Which is: 4-byte object <01-00 00-00> 2024-01-19T13:27:29.3178431Z expected 4.34958 (408b2fb8), got 4.51324 (40906c80), diff: 0.16367, tol=0.0534958 idx=9929. 22 of 42588 differ ``` 3. 
Some other tests like SSD throw other exceptions, so skip them ''' 2024-01-22T09:07:40.8446910Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12 2024-01-22T09:07:51.5587571Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:358: Failure 2024-01-22T09:07:51.5588512Z Expected equality of these values: 2024-01-22T09:07:51.5588870Z COMPARE_RESULT::SUCCESS 2024-01-22T09:07:51.5589467Z Which is: 4-byte object <00-00 00-00> 2024-01-22T09:07:51.5589953Z ret.first 2024-01-22T09:07:51.5590462Z Which is: 4-byte object <01-00 00-00> 2024-01-22T09:07:51.5590841Z expected 1, got 63 '''	2024-01-23 10:49:24 -08:00
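The `tol` values logged in #19205 are consistent with a combined tolerance of the form `tol = atol + rtol * |expected|` with both constants equal to 1e-2. That formula is inferred from the logged numbers, not taken from `model_tests.cc`, so the Python sketch below should be read as an assumption. `tolerance` and `within_tolerance` are illustrative names.

```python
def tolerance(expected: float, atol: float = 1e-2, rtol: float = 1e-2) -> float:
    """Combined absolute and relative tolerance (assumed form, inferred from the logs)."""
    return atol + rtol * abs(expected)


def within_tolerance(
    expected: float, actual: float, atol: float = 1e-2, rtol: float = 1e-2
) -> bool:
    """True when the absolute difference does not exceed the combined tolerance."""
    return abs(expected - actual) <= tolerance(expected, atol, rtol)


# (expected, got, logged tol) triples copied from the first log in the commit message.
SAMPLES = [
    (0.145984, 0.975133, 0.0114598),  # SSD ssd-12
    (4.34958, 4.51324, 0.0534958),    # YOLOv3-12
]

if __name__ == "__main__":
    for expected, got, logged_tol in SAMPLES:
        print(expected, got, tolerance(expected), logged_tol,
              within_tolerance(expected, got))
```

Both samples exceed the inferred tolerance by a wide margin, which matches the test failures shown in the log.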
Adrian Lizarraga	37d14d7896	[QNN EP] Create Windows ARM64 nightly python package (#19128 ) ### Description Adds a job to create a nightly python package for ORT/QNN on Windows ARM64. Must build onnxruntime-qnn with python 3.11 and numpy 1.25. Note: pipeline run may take up to 3 hrs ### Motivation and Context Make it possible to get a nightly python package with the latest updates to QNN EP. Issue #19161	2024-01-22 18:14:41 -08:00
Yifan Li	e283cdb218	Fix Fuzz Testing CI (#19228 ) ### Description Add BuildArch. To verify: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=400952&view=logs&j=5b022bb4-70a7-5401-8766-a8a7802c7150&t=291e85c7-5547-590b-50de-4e01fcd4eba3&l=14 ### Motivation and Context	2024-01-22 15:44:57 -08:00
Yi Zhang	780acda7b4	Add Big models pipeline (#19222 ) ### Description 2 models are added in CI. The Stable Diffusion model stage is based on https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md Llama2 FP16 is based on https://github.com/microsoft/Llama-2-Onnx. 12G GPU memory is not enough, so I chose T4 to run it. ### Motivation and Context Add regular E2E tests for big models. They will be triggered in the main build, that is, they will run after a PR is merged. More models will be added later. ### Test Runs ### https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1275191&view=results	2024-01-22 14:02:56 -08:00
Edward Chen	c8ce83967e	Download protoc for all Apple host builds, remove protoc build from iOS packaging pipeline. (#19209 )	2024-01-19 15:30:09 -08:00
Adrian Lizarraga	28a16c223c	[QNN EP] Update QNN pipelines to use QNN SDK 2.18 by default (#19129 ) ### Description Update QNN pipelines to use QNN SDK 2.18 by default ### Motivation and Context Test with the latest version of QNN SDK by default.	2024-01-18 14:59:23 -08:00
Yi Zhang	dc1fed7268	[Fix] Dual Cuda version isn't supported as expected in Linux Gpu pipeline (#19192 ) ### Description ### Motivation and Context Dual CUDA versions are not supported as expected. CUDA 12 link: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1272235&view=logs&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91	2024-01-18 13:26:26 -08:00
Guenther Schmuelling	dd2177c5d7	enable webnn in ci build (#19163 ) ### Description ### Motivation and Context	2024-01-18 13:11:47 -08:00
Jian Chen	9da3e36138	Fix buildJava from Zip-Nuget-Java-Nodejs Packaging Pipeline (#19187 ) ### Description ### Motivation and Context	2024-01-17 17:20:42 -08:00
Changming Sun	81d363045b	Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117 ) ### Description Upgrade Ubuntu machine pool from 20.04 to 22.04	2024-01-16 17:25:18 -08:00
Changming Sun	e2e488d6f8	Revert "iOS packaging pipeline stability" (#19135 ) Reverts microsoft/onnxruntime#19097 because it broke the Android CI pipeline.	2024-01-16 09:18:35 -08:00
Jian Chen	c92f72ebeb	Merge Linux Nuget GPU pipeline with zip-nuget (#19120 ) ### Description ### Motivation and Context	2024-01-16 08:59:03 -08:00
pengwa	1150b1f81e	ORTModule memory improvement (#18924 ) ## Dependency https://github.com/microsoft/onnxruntime/pull/19007 ## ORTModule memory efficient gradient management Previously I tried to solve the coarse-grained gradient accumulation/update problem in ORTModule with https://github.com/microsoft/onnxruntime/pull/8979, but that resolution is not fully validated with DDP or when there are user hooks on the gradient accumulation on torch parameters. This PR addresses the problem with an approach similar to PR 8979, namely triggering gradient accumulation once ORT has computed the grad, but instead of using an AccumulateGrad op, this time it uses an ONNX operator PythonOp, which internally calls param.backward(grad) and so helps handle all related hooks correctly. ## Design Check the details from https://microsoftapc-my.sharepoint.com/:p:/g/personal/pengwa_microsoft_com/EaaBq4EzsFhOmsDEXCG7Ba4Bb9bwd0O2sFV_JXJ4jBLYLA?e=7Sz2g8&nav=eyJzSWQiOjI3MSwiY0lkIjozMjE4NzI1NDIzfQ ## Convergence Validation: ![image](https://github.com/microsoft/onnxruntime/assets/10530022/ccf3a213-e815-4b23-b759-165033b2d9fe) Differences are mostly on the order of 0.000x, sometimes 0.00x, which may come from the different order in which gradient application happens before and after this change (on DeepSpeed ZeRO stage 2). ## TODO Consolidate the logic with Stage3's similar logic.	2024-01-16 08:57:37 +08:00
Yi Zhang	922a2f00e3	Extend timeout in Nuget-CUDA-Packaging-Pipeline (#19138 ) ### Motivation and Context The Linux_GPU_x64 job in the pipeline has been canceled due to timeout since 0112.	2024-01-15 14:37:22 +08:00
Jian Chen	c3ce9df80c	Disabling python3.12 on training python packaging pipelines (#19123 )	2024-01-14 14:51:00 -08:00
Jian Chen	76797127d6	Always download cuda and trt libraries from Azure blob (#19118 ) ### Description This way, we will not need to update the Windows images constantly, and it gives us more flexibility to choose the cuda version in the future.	2024-01-14 11:37:26 -08:00
Yulong Wang	f917dde717	[web] remove xnnpack from web backends (#19116 ) ### Description XNNPACK is already disabled in web assembly build. This change removes the xnnpack backend registration in JS.	2024-01-13 23:04:02 -08:00
Edward Chen	e1e45901e2	iOS packaging pipeline stability (#19097 ) - Remove protoc build step which sometimes times out. Download protoc instead. - Use macOS-12 image in the set variables stage. It seems more stable.	2024-01-13 19:27:44 -08:00
Changming Sun	5558912d7b	Disable ccache in Windows CPU CI pipeline (#19131 ) ### Description Disable ccache for all the jobs in the Windows CPU CI pipeline. Before disabling it, the build has a warning that: "MSIL .netmodule or module compiled with /GL found; restarting link with /LTCG; add /LTCG to the link command line to improve linker performance" After disabling it, the warning is gone and the build doesn't use /GL or /LTCG. The cache itself should not cause this difference.	2024-01-13 18:40:43 -08:00
Adrian Lizarraga	65893ef382	Add --parallel to QNN EP NuGet pipeline build command (#19126 ) ### Description Add --parallel to QNN EP NuGet pipeline build command ### Motivation and Context Improve build times for pipeline.	2024-01-13 02:38:40 -08:00
Jian Chen	78e796bb27	Fixing issue where the unzipped package from 'onnxruntime-win-x64-gpu' was also uploaded. (#19096 ) ### Description Fixing issue where the unzipped package from 'onnxruntime-win-x64-gpu' was also uploaded. For example, https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=396440&view=artifacts&pathAsName=false&type=publishedArtifacts	2024-01-12 22:30:43 -08:00
Jian Chen	e5eacc6d11	Fix cuda-packaging-pipeline.yml (#19115 )	2024-01-12 19:09:25 -08:00
Guenther Schmuelling	96dbac6e4b	update to emsdk-3.1.51 (#18844 )	2024-01-12 16:04:33 -08:00
Caroline Zhu	4dbaa73738	[js/web/training] added end-to-end tests (#18700 ) ## Summary * following inference's [set-up for end-to-end tests](https://github.com/microsoft/onnxruntime/tree/main/js/web/test/e2e), created an end-to-end test runner for training * this test runner copies testdata from the [trainingapi folder](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/testdata/training_api) * then runs two tests (training session with evalModel & optimizer model, and training session with the minimum options), and tests if the ORT-web training package encompasses inference * these tests check * createTrainingSession * runTrainStep * runOptimizerStep if applicable * the parameters methods (getParametersSize, loadParametersBuffer, and getContiguousParameters) ## TL;DR * [`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c) is responsible for setting up and running the end to end tests * [`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8) contains the test function definitions (`testInferenceFunction`, `testTrainingFunctionMin`, `testTrainingFunctionAll`) ## Flow * entrypoint: user runs the following command in the terminal: `npm run test:training:e2e` * [`js/web/package.json`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-79275844e75c3c410bb3a71c7f59b2b633e5a3e975c804ffc47220025084da28) was modified to include an npm script that will run `run.js` which will run the end to end tests * 
[`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c) is responsible for * detecting and installing local tarball packages of ORT-web * copying training data to the `js/web/training/e2e/data` folder * starting two Karma processes. Karma is a test runner framework that simulates testing in the browser. * In this case, the tests happen in Chrome. We can configure the tests to run in Edge and other browsers in the future. * one of these karma processes is self-hosted, meaning it pulls the ORT-web package from local * the other karma process is not self-hosted, meaning it pulls the ORT-web package from another source. In this case, we start an http server that serves the ORT-web binaries. * [`js/web/test/training/e2e/simple-http-server.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-f798ab485f3ec26c299fe5b2923574c9e4b090200ba20d490bbf6c183286993c) is responsible for starting the HTTP server and serving the ORT binary files. This code is almost identical to the corresponding code in the inference E2E tests. * [`js/web/test/training/e2e/karma.conf.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-436cfe8f670c768a04895bd4a1874a5e033f85e0e2d84941c62ff1f7c30a9f28) is the Karma configuration file that specifies what happens when a karma process is started. The config specifies Mocha as the testing framework, which will go through all the loaded files and run any tests that exist * [`js/web/test/training/e2e/browser-test-wasm.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-13b6155e106dddc7b531ef671186e69b2aadb8a0f4b2f3001db0991567d78221) is the file that contains the tests that Mocha will pick up and run. 
* The test functions (such as testInference and testTrainingFunctionAll) are defined in [`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8). ## Notes * I followed the [tests for training core](`b023de0bfc/orttraining/orttraining/test/training_api/core/training_api_tests.cc`) where they randomly generated input for the training session * E2E tests are triggered by running `npm run test:training:e2e` -- suggestions for alternative script names are appreciated!!! ## Motivation and Context - adding training bindings for web	2024-01-12 13:33:33 -08:00
Changming Sun	55b046e97e	Remove enable_mac_silicon settings (#19108 ) ### Description Remove enable_mac_silicon settings from two packaging pipelines. ### Motivation and Context Now we build universal2 packages instead.	2024-01-12 11:01:39 -08:00
Changming Sun	0e8d4c3d21	Enable Address Sanitizer in CI (#19073 ) ### Description 1. Add two build jobs for enabling Address Sanitizer in CI: one for Windows CPU, one for Linux CPU. 2. Set default compiler flags/linker flags in build.py for normal Windows/Linux/MacOS builds. This can help control compiler flags in a more centralized way. 3. All Windows binaries in our official packages will be built with the "/PROFILE" flag. Symbols of onnxruntime.dll can be found at [Microsoft public symbol server](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols). Limitations: 1. On Linux, Address Sanitizer ignores RPATH settings in ELF binaries. Therefore, once Address Sanitizer is enabled, we need to manually set LD_LIBRARY_PATH properly before running tests; otherwise libonnxruntime.so may not be able to find custom ops and shared EPs. 2. On Linux we also need to set LD_PRELOAD before running some tests (if the main executable, like python, is not built with Address Sanitizer). On Windows we do not need to. 3. On Windows, before running python tests we should manually copy the Address Sanitizer DLL to the onnxruntime/capi directory, because python 3.8 and above has enabled "Safe DLL Search Mode", which does not use the information provided by the PATH env. 4. On Linux, Address Sanitizer found a lot of memory leaks in our python binding code. Therefore, right now we cannot enable Address Sanitizer when building ONNX Runtime with the python binding. 5. Address Sanitizer itself uses a lot of memory address space and delays memory deallocations, which can easily cause OOM issues in 32-bit applications. For this reason we cannot run all the tests in onnxruntime_test_all in 32-bit mode with Address Sanitizer in one process. However, we can still run individual tests that way. ### Motivation and Context To catch memory issues.	2024-01-12 07:24:40 -08:00
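The two Linux limitations in the commit above (Address Sanitizer ignoring RPATH, and the need for LD_PRELOAD when the host executable is not instrumented) amount to preparing an environment before launching tests. A minimal sketch of that preparation follows; the helper name `asan_test_env` and every path passed to it are hypothetical and are not taken from the ONNX Runtime build scripts.

```python
import os


def asan_test_env(lib_dir, asan_runtime=None, base_env=None):
    """Return a copy of the environment adjusted for an ASan test run on Linux.

    lib_dir      -- directory holding libonnxruntime.so, custom ops and shared EPs
                    (hypothetical; prepended to LD_LIBRARY_PATH since RPATH is ignored)
    asan_runtime -- path to the ASan runtime library to preload, only needed when
                    the main executable (e.g. python) is not built with ASan
    base_env     -- environment to start from; defaults to os.environ
    """
    env = dict(os.environ if base_env is None else base_env)

    # Prepend the library directory, keeping whatever was already configured.
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")

    # Preload the sanitizer runtime only when the caller asks for it.
    if asan_runtime:
        env["LD_PRELOAD"] = asan_runtime
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting a test executable; the original environment is left untouched.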
Changming Sun	285606108a	Set pythonInterpreter in set-python-manylinux-variables-step.yml (#19105 ) ### Description Set pythonInterpreter in set-python-manylinux-variables-step.yml. To fix a build error: ``` Starting: Set Python manylinux variables ============================================================================== Task : Python script Description : Run a Python file or inline script Version : 0.231.1 Author : Microsoft Corporation Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/python-script ============================================================================== ##[error]Parameter 'toolPath' cannot be null or empty. Finishing: Set Python manylinux variables ``` The error was because today I deleted a bunch of software from the VM image. The task might fail if no Python versions are found in $(Agent.ToolsDirectory).	2024-01-12 07:22:02 -08:00
Jian Chen	53497702a6	Fix Nuget CUDA Packaging pipeline (#19054 ) --------- Co-authored-by: Yi Zhang <zhanyi@microsoft.com>	2024-01-11 11:59:21 -08:00
Jian Chen	2eb3db6bf0	Adding python3.12 support to ORT (#18814 ) ### Description Adding python3.12 support to ORT	2024-01-11 08:34:28 -08:00
Baiju Meswani	730df1bfa2	Increase MacOS pipeline timeout (#19072 )	2024-01-09 18:35:21 -08:00
Ashwini Khade	897a4163d7	Update transformer version for training CIs (#19046 ) ### Description Updating version to resolve security vulnerability.	2024-01-09 12:00:34 -08:00
Changming Sun	ab897a4a40	Remove Windows ARM32 from nuget packaging pipelines (#19049 ) ### Description 1. Remove Windows ARM32 from nuget packaging pipelines 2. Add missing component-governance-component-detection-steps.yml to some build jobs. ### Motivation and Context Stop supporting Windows ARM32 to align with [Windows's support policy](https://learn.microsoft.com/en-us/windows/arm/arm32-to-arm64). Users who need this feature still can build the DLLs from source. However, later on we will remove that support too.	2024-01-09 07:45:03 -08:00
Adrian Lizarraga	52e5601449	[QNN Nuget Pipeline] Build with ML ops and detect ORT version (#19024 ) ### Description - Removes `--disable_ml_ops` build flag - Automatically detects ORT version from VERSION file via `templates/set-version-number-variables-step.yml`. We will no longer need to create a commit to update ORT versions. ### Motivation and Context - A new unit test caused failures in the QNN Nuget pipeline because it did not enable ml ops. - Automate ORT version specification	2024-01-08 12:44:12 -08:00
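The commit above replaces a hand-edited version number with one detected from a VERSION file. The idea can be sketched as below; this is not the contents of `templates/set-version-number-variables-step.yml`, and the function name `read_version` as well as the assumption that the file holds a single `major.minor.patch` line are illustrative only.

```python
from pathlib import Path


def read_version(version_file):
    """Read a version string from a file and split it into numeric parts.

    Assumes (hypothetically) that the file contains one line of the form
    major.minor.patch; anything else raises ValueError.
    """
    text = Path(version_file).read_text(encoding="utf-8").strip()
    parts = text.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected version string: {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return text, (major, minor, patch)
```

A pipeline step could call this once and export the result as variables, so a version bump only requires changing the file.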
Yi Zhang	e8ac97c8d8	Move Windows GPU training job to A10 (#19041 ) ### Description 1. Update sm to 86 ### Motivation and Context We have more A10 quota than T4 and Nvidia AXX could be partitioned	2024-01-08 09:19:58 -08:00
PeixuanZuo	efdcefcf8c	[ROCm] fix security warning (#19017 ) fix security warning	2024-01-05 10:05:34 -08:00
Changming Sun	e155c66b4a	Change all macOS python packages to use universal2 (#19013 ) ### Description Change all macOS python packages to use universal2, to reduce the number of packages we have. ### Motivation and Context According to [wikipedia](https://en.wikipedia.org/wiki/MacOS_Big_Sur), macOS 11 is the first macOS version that supports universal 2. And it is the min macOS version we support. So we no longer need to maintain separate binaries for different CPU archs.	2024-01-04 17:44:49 -08:00
Adrian Lizarraga	02b1ff5fa2	[QNN EP] Support multithreaded inference of a single session (#18981 ) ### Description - Add mutex to protect QNN API calls for executing a graph and extracting the corresponding profile data. - Ensures QNN EP's execute function does not store unnecessary state (i.e., input and output buffer pointers do not need to be stored as class members.) ### Motivation and Context Allow calling `session.Run()` from multiple threads when using QNN EP.	2024-01-04 13:32:48 -08:00
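The pattern described in the commit above (a mutex around the graph-execution call, with per-call buffers kept out of shared state) can be illustrated independently of QNN. The sketch below is not the QNN EP code; `GraphRunner` and `execute_fn` are hypothetical names standing in for a backend call that is not thread-safe.

```python
import threading


class GraphRunner:
    """Serializes calls into a backend function that is not thread-safe."""

    def __init__(self, execute_fn):
        self._execute_fn = execute_fn   # hypothetical non-thread-safe backend call
        self._lock = threading.Lock()   # protects every call into the backend

    def run(self, inputs):
        # inputs and outputs stay local to this call; nothing per-call is
        # stored on self, so concurrent callers cannot overwrite each other.
        with self._lock:
            return self._execute_fn(inputs)
```

With this shape, several threads can share one `GraphRunner` and call `run` concurrently; the calls into the backend are executed one at a time.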
PeixuanZuo	7a454acd61	[ROCm] Update CI/Packaging pipeline to ROCm6.0 (#18985 ) Update CI/Packaging pipeline to ROCm6.0	2024-01-03 17:25:15 +08:00
Yi Zhang	c97e3f4821	[Fix] exception in Fuzz Test pipeline (#18984 ) ### Motivation and Context The file path is not correct.	2024-01-03 14:53:31 +08:00
Yifan Li	3993d43048	[EP Perf] Fix missing Azure cli & use onnx zoo model inside image (#18917 ) ### Description * Fix [missing Azure CLI issue](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392612&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b&l=12). * Now, once CI fails to run `az --version`, it would auto-reinstall the azure cli dependency * Use existing onnx zoo model inside image during memtesting * to avoid test failure when onnx model zoo is restructuring * Display more detail info of valgrind when memtesting * Clear invalid dep of existing AddressSanitizer test case ### Validate * Before the fix, Azure CLI is missing: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392994&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b&l=10 * After the fix: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392619&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b	2024-01-01 17:14:39 -08:00
Yi Zhang	3f03c12986	Split Onnxruntime Nuget GPU package (#18819 ) ### Description 1. Update donwload-artifacts to flex-downloadartifacts to make it easier to debug. 2. Move the native files into Gpu.Windows and Gpu-linux packages. Onnxruntime-Gpu has a dependency on them. 3. Update the package validation as well. 4. Add 2 stages to run E2E tests for GPU.Windows and GPU.Linux, for example: ![image](https://github.com/microsoft/onnxruntime/assets/16190118/35c6730b-8080-4f52-a17c-b9c61f41b6bb) ### Motivation and Context The single Onnxruntime.Gpu package has already exceeded the Nuget size limit. We split the package into smaller packages so that they can be published. For compatibility, the user can install or upgrade Onnxruntime.Gpu, which will install Gpu.Windows and Gpu.Linux automatically. The user can also install just Gpu.Windows or Gpu.Linux directly. ### Test Link 1. In ORT_NIGHTLY 2. Install the preview version in nuget-int. (nuget source: https://apiint.nugettest.org/v3/index.json) --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-12-22 16:57:16 +08:00
Changming Sun	3d8f229d39	Add ARM64EC build jobs (#18870 ) ### Description Add ARM64EC build jobs in post merge pipeline to validate if our code is compatible with Windows ARM64EC.	2023-12-21 16:31:38 -08:00
Yifan Li	54e471a054	[EP Perf] Display percentage of cuda/trt ops in cuda/trt ep on EP Perf Dashboard (#18868 ) ### Description Display percentage of cuda/trt ops in cuda/trt ep on EP Perf Dashboard: ![image](https://github.com/microsoft/onnxruntime/assets/109183385/bafba098-1338-46fa-b10a-ca19eff2a746) Check [here](https://msit.powerbi.com/groups/d1ae6355-afd0-4c40-b78e-676a86cab1e2/reports/82101bbb-dad2-4f24-9ddf-a37f0d41509a/ReportSectionda402bdf6824e505a614?experience=power-bi) to preview on ep perf dashboard ### Motivation and Context - brief overview of op metrics towards various models - easy to identify models which haven't reached 100% ops on cuda/trt ep.	2023-12-20 22:11:47 -08:00
Hector Li	8931854528	Move some QNN EP provider options to session options (#18877 ) Move QNN EP provider options to session options ### Description We need to use session options to support multi-partition for the context cache feature. To smooth the transition, move the provider options to session options first. This is the first step for PR https://github.com/microsoft/onnxruntime/pull/18865	2023-12-20 00:13:38 -08:00
Scott McKay	666fcbde4d	Add LeakyRelu to list of NNAPI operators (#18880 ) ### Description Add LeakyRelu to the list as support was added a while ago.	2023-12-20 14:44:31 +10:00
Changming Sun	535a2403dd	Update Nuget publishing jobs (#18851 ) ### Description 1. Add a CodeSign validation task before the binaries are published, to make sure all DLL files are signed. 2. Auto-trigger the CUDA 12 pipeline's publishing job.	2023-12-19 16:54:46 -08:00
Ashwini Khade	4dff154f51	Fix nightly pipeline failure (#18867 ) ### Description Fixes a failure in the ortmodule nightly pipeline.	2023-12-19 09:18:00 -08:00
Jian Chen	6d7519ede8	Adding new pipeline for python cuda testing (#18718 )	2023-12-18 18:13:03 -08:00
Changming Sun	ad476d5a1f	Change Nuget packaging pipeline's build TRT job to download CUDA SDK on-the-fly (#18847 ) ### Description Change Nuget packaging pipeline's build TRT job to download CUDA SDK on-the-fly, so that we do not need to put a CUDA SDK in the build machine's image.	2023-12-15 17:44:02 -08:00
Changming Sun	fc9ecb59db	Add Windows ARM build jobs to post merge pipeline (#18832 ) ### Description Add Windows ARM build jobs to post merge pipeline to validate that our code is still compatible with these build settings.	2023-12-15 08:47:52 -08:00
Changming Sun	cbad4fe49b	Update absl and googletest (#18827 ) ### Description Update absl and googletest to their latest version to include some cmake changes: 1. A googletest's cmake change that will allow using external absl and re2. 2. Nullability enhancements that will allow our clang-based static analysis detecting many kinds of null pointer errors. ### Motivation and Context To fix a C4744 link warning in our Windows pipelines. ``` LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\usage.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 
'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<int>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] ```	2023-12-14 16:15:07 -08:00
Changming Sun	b129f425fc	Fix test model URL issue (#18823 ) ### Description ONNX model zoo changed their dir structure, so some of our pipelines are failing. To prevent this from happening again, we had better read the test data from a cache on local disk instead of downloading it remotely every time.	2023-12-14 13:06:08 -08:00
Changming Sun	95193cb440	Set NDK version in Linux CPU Minimal Build E2E CI Pipeline (#18810 ) ### Description To upgrade the clang version in preparation for PR #17031 .	2023-12-14 08:08:41 -08:00
Rachel Guo	f3fa045681	Enable MacOS build in ORT Objc Pod (#18786 ) ### Description Add macos build for objc pod. ### Motivation and Context Follow up pr for #18550 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-12-13 13:50:42 -08:00
Changming Sun	17eaf9b053	Fix a build warning in SparseTensor code for 32-bit build configs (#18766 ) ### Description The warning is: ``` C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.1812949Z with 2023-12-08T20:58:48.2144272Z [ 2023-12-08T20:58:48.2145285Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.2801935Z ] 2023-12-08T20:58:48.2804047Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(82,8): message : while compiling class template member function 'void onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()(const onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const onnxruntime::SparseTensor &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2806197Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(302,27): message : see the first reference to 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()' in 'onnxruntime::utils::mltype_dispatcher_internal::CallableDispatchableHelper::Invoke' (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2871783Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(438,100): message : see reference to class template instantiation 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>' being compiled (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 
2023-12-08T20:58:48.2893010Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(414,5): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::InvokeWithLeadingTemplateArgs<Fn,onnxruntime::TypeList<>,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2894476Z with 2023-12-08T20:58:48.2911521Z [ 2023-12-08T20:58:48.2912457Z Fn=onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr, 2023-12-08T20:58:48.3067840Z T=onnxruntime::SparseTensor 2023-12-08T20:58:48.3068863Z ] (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) 2023-12-08T20:58:48.3195854Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,11): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke<onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3197946Z with 2023-12-08T20:58:48.3198565Z [ 2023-12-08T20:58:48.3199093Z T=onnxruntime::SparseTensor 2023-12-08T20:58:48.3905678Z ] 2023-12-08T20:58:48.3907275Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,36): message : see the first reference to 
'onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke' in 'onnxruntime::contrib::SparseToDenseMatMul::Compute' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3910999Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.3912734Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3913414Z with 2023-12-08T20:58:48.3913660Z [ 2023-12-08T20:58:48.3914001Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.3914499Z ] 2023-12-08T20:58:48.3914743Z qlinear_concat.cc 2023-12-08T20:58:48.3917082Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.3918624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5534583Z with 2023-12-08T20:58:48.5541266Z [ 2023-12-08T20:58:48.5542401Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5544914Z ] 2023-12-08T20:58:48.5548670Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5552099Z 
182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5553712Z with 2023-12-08T20:58:48.5555569Z [ 2023-12-08T20:58:48.5556779Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5558707Z ] 2023-12-08T20:58:48.5561428Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5565624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5566354Z with 2023-12-08T20:58:48.5568185Z [ 2023-12-08T20:58:48.5569305Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5571339Z ] 2023-12-08T20:58:48.5574864Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5577866Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5578562Z with 2023-12-08T20:58:48.5580399Z [ 2023-12-08T20:58:48.5581503Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5583465Z ] 2023-12-08T20:58:48.5587661Z 
##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5590705Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5591396Z with 2023-12-08T20:58:48.5593220Z [ 2023-12-08T20:58:48.5593693Z Derived=Eigen::Map<const Eigen::SparseMatrix<int64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5595955Z ] ``` And the warning in #18195 ### Motivation and Context AB#22894 --------- Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>	2023-12-13 11:11:13 -08:00
Changming Sun	44054e7508	Move NuGet nightly package publishing job to a separated pipeline (#18801 ) ### Description Move NuGet nightly package publishing job to a separated pipeline. Before this change, it runs at the end of 'Zip-Nuget-Java-Nodejs Packaging Pipeline'. This PR moves it to a separate pipeline so that we can manually trigger this step for any branch(e.g. release branches).	2023-12-13 11:10:50 -08:00
Jian Chen	ce1fed6ddf	Adding a new pipeline for publishing to Python Cuda 12 packages. (#18712 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-11 14:17:46 -08:00
Jian Chen	bfa5eb4591	Adding a new pipeline for publishing cuda 12 nuget packages (#18713 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-11 13:07:05 -08:00
Ashwini Khade	16df8377d3	Update transformers package to fix the security issue (#18730 ) ### Description Updating transformers package in test pipeline to fix a security vulnerability. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-11 09:15:23 -08:00
cloudhan	de32baeeef	[ROCm] Add GemmFloat8 (#18488 )	2023-12-11 11:37:29 +08:00
Changming Sun	bf33919afb	Update absl and gtest to fix an ARM64EC build error (#18735 ) ### Description Update absl and gtest to fix an ARM64EC build error ### Motivation and Context We need to get an important fix into ORT. The fix is: `8028a87c96`	2023-12-07 15:55:17 -08:00
Yi Zhang	a045be335b	use EO pool for windows web_cpu stage (#18737 ) ### Description reuse EO pool in NPM pipeline. ### Motivation and Context build_web_debug failed in onnxruntime-Win-CPU-2022 but it works in EO pool. Reuse EO pool to make the pipeline work now. When I'm free, I'll try upgrading the chrome in the custom image.	2023-12-07 10:10:00 -08:00
Rachel Guo	7762f3f7c5	[NNAPI EP] Add NNAPI Split (#18702 ) ### Description <!-- Describe your changes. --> As title. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> yolo-v8 model missing operator support. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-12-06 15:11:15 -08:00
Adrian Lizarraga	559bd52252	[QNN EP] Update QNN SDK to version 2.17.0 (#18684 ) ### Description - Update QNN CI Pipelines to use QNN SDK version 2.17.0 - Print warning if unit test requires adjusted tolerance to pass - Temporarily disable unloading QnnCpu.dll for windows x64 due to crash when calling FreeLibrary - Enable fixed HTP tests - QnnHTPBackendTests.LayerNorm1D_LastAxis_DynamicScale - QnnHTPBackendTests.GlobalMaxPool_LargeInput2_u8 - QnnHTPBackendTests.ReduceSumS8Opset13_Rank5 - QnnHTPBackendTests.ReduceSumU8Opset13_Rank5_LastAxis - QnnHTPBackendTests.WhereLargeDataBroadcastU8 - QnnHTPBackendTests.WhereLargeDataBroadcastTransformedU8 - Enabled fixed CPU tests - QnnCPUBackendTests.Resize_DownSample_Linear_AlignCorners_scales - Increased tolerance for HTP tests that are less accurate on QNN SDK 2.17.0 - QnnHTPBackendTests.AveragePool_CountIncludePad_HTP_u8 - QnnHTPBackendTests.AveragePool_AutopadSameUpper_HTP_u8 - QnnHTPBackendTests.AveragePool_AutopadSameLower_HTP_u8 - QnnHTPBackendTests.ConvU8U8S32_bias_dynamic_input - QnnHTPBackendTests.ConvU8U8S32_bias_initializer - QnnHTPBackendTests.ConvU8U8S32_large_input1_padding_bias_initializer - QnnHTPBackendTests.LRNSize3 - QnnHTPBackendTests.LRNSize5 - QnnHTPBackendTests.MaxPool_Large_Input_HTP_u8 - QnnHTPBackendTests.MaxPool_LargeInput_1Pads - QnnHTPBackendTests.Resize_DownSample_Linear_HalfPixel - QnnHTPBackendTests.ResizeU8_2xLinearPytorchHalfPixel - QnnHTPBackendTests.ResizeU8_2xLinearHalfPixel - QnnHTPBackendTests.ResizeU8_2xLinearAlignCorners - QnnHTPBackendTests.ResizeU8_2xLinearAsymmetric - Disabled ONNX model tests - averagepool_2d_ceil: Accuracy issues only on Windows x64 QnnCpu.dll - Disabled QDQ model tests (onnx_test_runner) - facedetection_op8_qdq: Accuracy issues - Disabled CPU EP tests (these use QnnCpu.dll) - ActivationOpTest.Relu: QNN SDK 2.17 Relu treats inf as FLT_MAX - GemmOpTypedTests/0.TestGemmBroadcast: Inaccuracy when weight is initializer and bias is not - MathOpTest.MatMulFloatType 
"test padding and broadcast B > A": Inaccuracy (only linux) - Fix Gemm translation bugs in QNN EP: - Do not skip processing of inputs that need to be transposed. ### Motivation and Context - Allow testing with newest QNN SDK version - Take advantage of improvements to enable new models.	2023-12-06 11:05:41 -08:00
Changming Sun	eaaf27015e	Remove EnvSetupScript parameter from win-ci.yml (#18662 ) ### Description To make the code more consistent. Now some TRT pipelines download TRT binaries on-the-fly, while other TRT pipelines use a preinstalled version. This PR makes them the same.	2023-12-01 15:30:16 -08:00
Rachel Guo	9c45fe4957	Fix macos xcframework test stage codesign info (#18649 ) ### Description <!-- Describe your changes. --> Remove the development ID and set code signing to not required in the test macOS target. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix a failure in the iOS_Full_xcframwork stage of the Zip-Nuget-Java-NodeJS packaging pipeline. --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-12-01 14:47:46 -08:00
Jian Chen	d69842226b	Update the template files to correct stage to fix the python cuda 12 packaging pipeline (#18651 )	2023-12-01 07:57:46 -08:00
Yi Zhang	efee9abdb7	Reduce downloads in Nuget-Java pipeline to reduce connection exception (#18635 ) ### Description 1. Add a new stage to download java tools from https://oss.sonatype.org and publish them to pipeline artifact 2. Remove downloads in other jobs, they get the java tools from pipeline artifact 3. consolidate final_java_testing stages. ### Motivation and Context Reduce downloads to reduce the connection error like below. ``` --2023-11-28 07:16:31-- https://oss.sonatype.org/service/local/repositories/releases/content/org/junit/platform/junit-platform-console-standalone/1.6.2/junit-platform-console-standalone-1.6.2.jar Resolving oss.sonatype.org (oss.sonatype.org)... 3.227.40.198, 3.229.50.23 Connecting to oss.sonatype.org (oss.sonatype.org)\|3.227.40.198\|:443... connected. HTTP request sent, awaiting response... 502 Bad Gateway 2023-11-28 07:16:32 ERROR 502: Bad Gateway. ```	2023-12-01 07:44:44 +08:00
Changming Sun	1b5675ff0f	Update post-merge-jobs.yml: increase timeout value for the Ios job (#18602 )	2023-11-30 08:07:13 -08:00
Yi Zhang	68209307da	Replace all Azure-Pipelines-EO-Windows2022-aiinfrat to Onnxruntime-Win-CPU-2022 (#18614 ) ### Description Replace all Azure-Pipelines-EO-Windows2022-aiinfrat to Onnxruntime-Win-CPU-2022 ### Motivation and Context Reduce the maintenance cost	2023-11-29 10:32:42 -08:00
Edward Chen	14a343441d	Fix Objective-C static analysis build (#18606 ) - Patch abseil to fix a compile error about not finding `cxxabi.h`. - Fix some static analysis warnings.	2023-11-28 17:14:20 -08:00
Jian Chen	a49f31b670	Remove drop-nuget artifact from all pipelines (#18592 ) ### Description Currently, the `drop-nuget` artifact only contains protoc.exe which is also part of the `drop-extra` artifact. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-11-28 13:23:01 -08:00
Mike Guo	e24733cfe9	fix the Olive CI pipeline failure on Windows (#18464 ) Fix the https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1046 failure for Windows	2023-11-28 11:42:39 -08:00
Rachel Guo	288b80d363	Add MacOS build to ORT C Pod (#18550 ) ### Description <!-- Describe your changes. --> As title. 1. Add macos build as an optionally enabled arch for pod and changes to existing build_ios_framework/assemble_c_pod scripts. 2. Enable macos build arch in ios packaging pipeline (currently for variants other than Mobile) and check the output artifacts are correct. 3. Write MacOS Test Target scheme in the test app and integrate into ios packaging CI testing pipeline. Currently the changes only apply to the onnxruntime-c pod, as the original request was from ORT SPM, which consumes only the onnxruntime-c pod as the binary target. TODO: could look into adding macos platform to objc pod as well. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Enable macos platform support in cocoapods, and potentially produce a binary target for enabling the macos platform in SPM as well. Replace https://github.com/microsoft/onnxruntime/pull/18334 --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-11-28 10:11:53 -08:00
Yi Zhang	a6d8726407	Update ADO windows image to custom image (#18598 ) ### Description Update Azure-Pipelines-EO-Windows2022-aiinfra to onnxruntime-win-CPU-2022 in Nuget_Package_CPU. To make debugging easier, use flex-downloadPipelineArtifact ### Motivation and Context Azure-Pipelines-EO-Windows2022-aiinfra uses the 1ES window-latest image, so the pipeline might fail after an unexpected upgrade. Verified: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=384425&view=results ### P.S. I think we should replace all Azure-Pipelines-EO-Windows2022-aiinfra.	2023-11-28 09:04:25 -08:00
Jian Chen	3ea27c2925	Create a new Nuget Package pipeline for CUDA 12 (#18135 )	2023-11-28 09:03:46 -08:00
Rachel Guo	62f00ad8e7	[CoreML] Add Softmax and Split op support (#18358 ) ### Description <!-- Describe your changes. --> As title. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Added for yolov8 model missing operator support. https://github.com/microsoft/onnxruntime/issues/17654 Now the model support info looks like: _CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 3 number of nodes in the graph: 233 number of nodes supported by CoreML: 230_ (only missing 3 concat op support due to input 3d shape is not currently support in CoreML EP Concat). --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-11-23 14:26:57 -08:00
cloudhan	6f3c1f9dc9	[ROCm] Update ck for GemmFloat8 (#18487 )	2023-11-23 12:06:19 +08:00
Yulong Wang	d455b0f8fd	[js/web] use Chrome in CI for npm tests (#18522 ) ### Description use Chrome in CI for npm tests. Previously we use Edge, however it sometimes crashes with reasons not yet identified.	2023-11-21 18:03:57 -08:00
Abhishek Jindal	680a526e73	Training packaging pipeline for cuda12 (#18524 ) ### Description <!-- Describe your changes. --> Build ORT-training packaging pipeline for CUDA 12.2 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This will help any customer using CUDA 12 and would not need to build ORT-training from source Test run: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=382993&view=logs&s=130be951-c2f3-5601-5709-434b5e50ddb0	2023-11-21 13:19:21 -08:00
Jian Chen	1dd9bf5340	Remove setup_env_azure.bat (#18482 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-11-20 09:58:15 -08:00
Jian Chen	d97fc1824f	Create a new Python Package pipeline for CUDA 12 (#18348 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-11-20 09:48:28 -08:00
Wei-Sheng Chin	3bcc137eb4	Tiny change to trigger the update of DORT's CI image (#18507 ) A recent PyTorch change broke DORT CI, and [a patch](https://github.com/pytorch/pytorch/pull/113697) has been merged into PyTorch main. In order to update DORT's CI, we made a dummy change in this PR.	2023-11-19 22:09:11 -08:00
Changming Sun	9364c05170	Update web-ci.yml: remove depth=1 (#18500 ) ### Description It causes our "NPM Packaging Pipeline" to fail. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-11-17 22:49:03 -08:00
Changming Sun	41f9379f3c	Update NDK version to 26.1.10909125 (#18493 ) ### Description Similar to #17852 ### Motivation and Context To avoid downloading NDK	2023-11-17 14:14:01 -08:00
Changming Sun	5eb5056c61	Always run emsdk_env.sh before build.py, even when ccache is disabled (#18477 ) ### Description Always run emsdk_env.sh before build.py, even when ccache is disabled This is a follow up to #18434. That PR didn't handle the case when ccache was disabled.	2023-11-16 21:37:29 -08:00
Jian Chen	05526b354b	Adding new yaml file for downloading cuda, and trt from azure blob (#18443 ) This also set the Path variable for the downloaded libraries. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-11-14 19:47:39 -08:00
Ye Wang	f9af94009b	onboard MoE (#18279 ) ### Description <!-- Describe your changes. --> 1. Introduce MoE CUDA op to ORT based on FT implementation. 2. Upgrade cutlass to 3.1.0 to avoid some build failures on Windows. Remove patch file for cutlass 3.0.0. 3. Sharded MoE implementation will come with another PR limitation: __CUDA_ARCH__ >= 700 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-11-14 16:48:51 -08:00
Changming Sun	27d068569a	Remove Node.js tool installer task from web ci pipeline (#18434 ) EMSDK already has a nodejs. We will use that one to be more consistent(the CI build pipeline would be less dependent on the VM image).	2023-11-14 13:16:01 -08:00
Yulong Wang	d22b1af5da	[js/web] add CI steps to log info for test failure investigating (#18418 ) ### Description add CI steps to log info for test failure investigating. Currently Web CI is marked as 'optional'. This change adds some script to dump debug info for investigating the random test failure	2023-11-14 11:40:58 -08:00
Changming Sun	a09099f2dd	Remove XNNPack from web pipelines (#18419 ) ### Description Remove XNNPack from web pipelines for now	2023-11-13 22:43:53 -08:00
Yi Zhang	0b16185223	build wasm with linux (#18106 ) ### Description Make all build_wasm tasks (NPM packaging and post merge)run on Linux. Enable web gpu test in npm package pipeline too. ### Motivation and Context Even on Windows, build_wasm is running in cygwin. So, it could save a lot of time to run it on Linux.	2023-11-14 14:42:11 +08:00
Scott McKay	897c1c1f05	Set DML package name correctly in CI (#18405 ) ### Description <!-- Describe your changes. --> Set DML package name correctly so the build doesn't try and include mobile targets. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix packaging pipeline.	2023-11-14 14:01:59 +10:00
Scott McKay	8ff41aea09	Fix 4 more bad delegates missing the attribute that cause iOS AOT errors at runtime (#18390 ) ### Description <!-- Describe your changes. --> Fix bad delegates. Add script to detect mismatch, and run in CI and when creating nuget package. Ignore whitespace when looking at the diff to the .cs file as clang-format ran. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> #18363	2023-11-14 14:00:21 +10:00
PeixuanZuo	37d8bed53d	[ROCm] add migraphx into onnxruntime-training-rocm package (#18339 )	2023-11-14 11:54:22 +08:00
PeixuanZuo	a62a500ae1	[ROCm] Update CK version (#17628 ) update ck version	2023-11-13 15:43:38 -08:00
Changming Sun	c3b5479056	Remove extra CUDA version flag (#18397 ) ### Description Only one of "--cuda_version" and "--cuda_home" is needed. If they were both specified, the first one will take precedence. Since we download cuda SDKs on-the-fly now, the machines will not need to have a preinstalled CUDA SDK therefore will not have VS-CUDA integration extension. Therefore the "--cuda_version" flag will not work. This PR deletes such usages. Related PR: #15915	2023-11-13 15:11:42 -08:00
Yulong Wang	6b0c97b43f	[js/web] fix typescript type check (#18343 ) ### Description This PR fixes the TypeScript type check. Previously, when I used esbuild to replace webpack (#17745), the TypeScript type check was disabled. This caused a few TypeScript type errors to be checked into the code base. This PR fixes the following: - Use "Node16" as the default "module" value in tsconfig.json, because in TypeScript v5, `(module == "ES2015" && moduleResolution == "Node16")` is an invalid combination. - Set `noUnusedParameters` to true by default. In web, override it to false because multiple files need to be updated (a follow-up PR will do this) - set correct project file for 'web/lib/*/.ts' for ESLint (otherwise WebGPU types are not populated correctly) - fix type error in file js/web/lib/wasm/jsep/webgpu/program-manager.ts - upgrade "@webgpu/types" to latest to fix type error in file js/web/lib/wasm/jsep/backend-webgpu.ts - add package script "prebuild" for web to run tsc type check - add type check in CI yml file	2023-11-10 16:03:38 -08:00
Changming Sun	2d23b4e117	Update min macos version (#18251 )	2023-11-10 11:08:17 -08:00
RandySheriffH	59262dfc63	Add cuda context headers to zip (#18330 ) Expose cuda context headers for cuda custom ops. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-11-09 14:53:58 -08:00
Changming Sun	812532592e	Add a build validation for Linux ARM64 cross-compile (#18200 ) ### Description 1. Add a build validation for Linux ARM64/ARM32 cross-compile to catch issues listed in #18195 . 2. Revert eigen's commit id back to what we had before. ### Motivation and Context To catch cross-compile issues. Added a TODO item for fixing the compile warnings in Linux ARM32 build: AB#21639	2023-11-08 13:03:18 -08:00
Yulong Wang	d117a8010f	fix typo (node)->(browser) in linux-wasm-ci.yml (#18309 ) ### Description fix display name `'Build and test (node) (simd + threads)'` to `'Build and test (browser) (simd + threads)'`	2023-11-07 17:07:40 -08:00
Yi Zhang	9868a71373	[Fix] Stages to Run couldn't be selected (#18310 ) ### Description Add the pool definition in 2 stages even though the pool is a Microsoft-Hosted Pool. ### Motivation and Context Recently, in the Nuget pipeline, when we click Stages to Run ![image](https://github.com/microsoft/onnxruntime/assets/16190118/45af295e-fa75-402a-a7de-803c6a2ab7cd) it always pops up ``` Encountered error(s) while parsing pipeline YAML: Could not find a pool with ID 5206. The pool does not exist or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. Could not find a pool with ID 5206. The pool does not exist or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. ```	2023-11-07 17:52:47 +08:00
Changming Sun	398ef677ba	Update protobuf python package's version (#18203 ) 1. Now we use a released version of ONNX, so we can directly download a prebuilt package from pypi.org. We do not need to build one from source. 2. Update protobuf python package's version to match the C/C++ version we are using. 3. Update the tensorboard python package because the current one is incompatible with the newer protobuf version.	2023-11-06 09:22:54 -08:00
Yi Zhang	b7b8b5b2ce	Fix Eigen-3.4.0 URL and hash (#18290 ) ### Description Add CI changes for #18287 Install onnx explicitly to pass windows GPU+dml stage. ### Motivation and Context 'eigen-3.4' was referring to a branch, not to a tag. There is now an Eigen 3.4.1 on that branch, and thus the hash has changed. See https://github.com/microsoft/onnxruntime/issues/18286#issuecomment-1793683416	2023-11-06 09:19:51 -08:00
Scott McKay	c352e9b1f9	Rework/cleanup the C# build infrastructure for nuget packages. (#18127 ) ### Description Update the C# nuget build infrastructure to make building a test nuget package more user friendly and to simplify - Remove usage of dotnet and msbuild in CIs - was temporary requirement until .net 6 MAUI was added to the released Visual Studio - remove SelectedTargets property and its usage - Add property for excluding mobile targets - generally we exclude based on the nuget package name - can now specify `/p:IncludeMobileTargets=false` on the command line to force exclusion - support building test package using build.py `--build_nuget` better - limit inclusion of xamarin targets as building with them requires a lot more infrastructure - use msbuild directly if xamarin targets are included. use dotnet otherwise. - remove quoting of property values as it doesn't appear to be necessary and breaks when msbuild is being used - add infrastructure to be able to pack the nuget package on linux with `dotnet pack` - `nuget pack` is not user friendly as-per comments in changes - requires stub csproj to provide the nuspec path - Remove netstandard1.0 targets from nuspec - we removed support from the actual bindings previously - Remove usage of nuget-staging directory when creating nuget package on linux - the nuspec file element has a fully qualified path for a source file so there is no obvious benefit to copying to a staging directory prior to packing ### Motivation and Context Address issues with 1P users trying to create test nuget packages locally. Long overdue cleanup of CI complexity.	2023-11-03 09:05:17 -07:00
Scott McKay	4f2096be38	Update XNNPACK to latest version (#18038 ) ### Description <!-- Describe your changes. --> Update XNNPACK to latest version - adds fp16 kernels and various other improvements - requires pthreadpool update as well Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API - 'setup' is split into 'reshape' and 'setup' - some ops use a workspace buffer - copied workspace allocation from XNNPACK unit test code - some suffixes changed Added wrapper for XNNPACK caches to base XNNPACK EP kernel - simplifies usage - XNNPACK split out the code and weights caches, but the code cache isn't currently usable via the public API - we could use the internal types if we think it's required for performance reasons. non-trivial though as we'd need to propagate ifdef values from the XNNPACK build up to the ORT build. - using XNNPACK internals would also mean we would not be able to support using a pre-build XNNPACK package - not an issue currently Fixed opset registration for internal NHWC domain - was not being tied to the ONNX version, so nodes inserted by layout transformation had the incorrect opset - a number of other places needed updating once this issue was fixed Remove support for NCHW Resize from XNNPACK EP so it's NHWC only - we only supported NCHW for fp32, - doing so adds complexity in multiple places (XNNPACK EP kernel implementation, layout transformation and transpose optimization) - unclear if that complexity provides any benefit. can add back if required by production scenario ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> We're looking at enabling fp16 support for CoreML and NNAPI. If we do that we need a good fallback story if the CPU EP will be used. The XNNPACK fp16 kernels will hopefully provide that. NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. 
That can be done as required in separate EPs and should be relatively simple to do.	2023-11-03 09:04:28 -07:00
Yi Zhang	9f5a6856fe	Rerun the flaky ort-web tests automatically (#18187 ) ### Description Retry 3 times at most if the web test fails. ### Motivation and Context Web GPU tests are not stable. From this link, we could find these ort-web tests are all in top 10 failing tasks. https://dev.azure.com/onnxruntime/onnxruntime/_pipeline/analytics/stageawareoutcome?definitionId=161&contextType=build. Generally, it could pass by manually rerunning it. So, enable it to rerun automatically. These test steps duration isn't long. So, it won't take too long to retry.	2023-11-03 16:34:56 +08:00
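The retry policy described in #18187 can be sketched in Python as follows. This is only an illustration, not the pipeline's actual configuration: `run_with_retries` and `run_once` are hypothetical names, and the sketch assumes "retry 3 times at most" means one initial attempt plus up to three retries.

```python
from typing import Callable


def run_with_retries(run_once: Callable[[], bool], max_retries: int = 3) -> int:
    """Run a flaky step, retrying on failure.

    run_once is a hypothetical callable that returns True when the step passes.
    Returns the number of attempts used (1 means it passed the first time).
    Raises RuntimeError when the initial attempt and all retries fail.
    """
    total_attempts = 1 + max_retries  # assumption: initial run plus retries
    for attempt in range(1, total_attempts + 1):
        if run_once():
            return attempt
    raise RuntimeError(f"step failed after {total_attempts} attempts")
```

Because the test steps are short, the worst case (four failed attempts under this assumption) adds little to the overall job time, which matches the motivation given in the commit message.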
Changming Sun	d8d79521ca	Disable ccache for DML (#18230 ) ### Description Disable ccache for DML. This change is similar to #18104. Now the DML build job is having the same timeout issue. I don't know why. But disabling ccache probably would help.	2023-11-02 16:00:55 -07:00
liqun Fu	20f2dd8b6b	use onnx rel-1.15.0, update cgman, cmake/external and requirement hash (#18177 )	2023-10-31 14:58:21 -07:00
Jian Chen	29e40987e3	Update batch file to set PATH for Cuda with TRT (#18182 ) ### Description Update batch file to set PATH for Cuda with TRT ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-10-31 10:22:40 -07:00
Jian Chen	8a574b874c	Update setup_env_cuda.bat (#18176 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-10-30 21:28:02 -07:00
Yi Zhang	436056dcd7	Revert "Disable dml stage in windows GPU pipeline temporarily. (#18034 )" (#18150 ) This reverts commit `99b8dcaae2`. ### Description <!-- Describe your changes. --> ### Motivation and Context Restore the dml stage in windows GPU pipeline. Agent issue is solved by adding Feature.DisableGpuDriver in pool properties.	2023-10-30 15:41:07 +08:00
Xavier Dupré	c10b83eb68	Update python cryptography version to 41.0.4 (#18056 ) ### Description Version 41.0.0 currently used has vulnerabilities. ### Motivation and Context See [Vulnerable OpenSSL included in cryptography wheels](https://github.com/advisories/GHSA-v8gr-m533-ghj9)	2023-10-27 12:06:38 +02:00
Jian Chen	7c18c60bc2	Change cuda image for tensorRT to the one with cudnn8 (#18102 ) ### Description copilot:summary ### Motivation and Context copliot::walkthrough	2023-10-26 16:28:57 -07:00
Ashwini Khade	f2e19a8ccf	Updates to training pipelines to reduce CI time (#18116 ) ### Description Motivation for this PR is reducing CI test time by removing unnecessary tests from the pipelines. Following changes are for reducing test time in pipelines: - Skip CPU model tests in GPU builds. Training CIs run these tests as a sanity check. There is no direct training code being tested in these pipelines, furthermore, CPU tests are being run in CPU pipelines so no need to run them again in GPU builds and block the GPU VM. This change reduces testing time by 20-25 mins in all training GPU pipelines. - Delete debug package building pipeline for linux training packages. This was required by compiler team at some point but there have been 0 downloads of these packages. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-10-26 14:58:57 -07:00
Chi Lo	455a9ce614	[TensorRT EP] Use latest onnx-tensorrt parser (#18067 ) Use latest onnx-tensorrt to fix compile error. Please see the issue https://github.com/microsoft/onnxruntime/issues/18029	2023-10-26 13:55:12 -07:00
Jian Chen	b023de0bfc	Redo #18044 Install CUDA 12.2 on Windows (#18093 )	2023-10-26 10:12:46 -07:00
Changming Sun	0f72739b6d	Disable ccache for WinML build (#18104 ) ### Description It seems this would resolve the timeout issue. ### Motivation and Context	2023-10-26 19:03:01 +08:00
Jian Chen	76e275baf4	Merge Cuda docker files into a single one (#18020 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-10-24 15:17:36 -07:00
Changming Sun	6ec45f2ba5	Merge aiinfra-linux-ARM64-CPU-2019 and onnxruntime-linux-ARM64-CPU-2019 (#18069 ) ### Description Merge aiinfra-linux-ARM64-CPU-2019 and onnxruntime-linux-ARM64-CPU-2019 machines to a single one to ease management.	2023-10-24 13:04:08 -07:00
Changming Sun	abb329179a	Update win-wasm-ci.yml: increase the timeout value (#18023 )	2023-10-24 10:50:12 -07:00
Jian Chen	e63ccd3cbb	Install CUDA 12.2 on Windows (#18044 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-10-24 10:47:23 -07:00
liqun Fu	020824ed50	Update ONNX to 1.15.0rc1 (#17914 )	2023-10-20 15:08:25 -07:00
Yi Zhang	99b8dcaae2	Disable dml stage in windows GPU pipeline temporarily. (#18034 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-10-20 08:41:40 -07:00
Jian Chen	cbb0e0f83c	Create a new Dockerfile for cuda 12 and trt 8.6.1.6-1.cuda12.0 (#18000 )	2023-10-18 14:46:02 -07:00
Changming Sun	57c8736596	Move a nodejs test to a different machine pool (#17970 ) ### Description This is a temporary fix for the failing "Zip-Nuget-Java-Nodejs Packaging Pipeline". The pipeline is failing because I removed NodeJS from the build machine pool's image, to reduce the number of dependencies we need to maintain in VMs. So this PR temporarily moves the test to a different machine pool to get the test passing. Then I will move the test to docker. Docker images are relatively easier to update and maintain. We now run almost all Linux tests in docker, except for this one. Moving it to docker is needed for enabling GPU support in nodejs, because none of our Linux VMs have CUDA. ### Motivation and Context	2023-10-17 09:30:14 -07:00
Hariharan Seshadri	9356986730	Fix AMD builds and enable testing NHWC CUDA ops in one GPU CI (#17972 ) ### Description This PR: (1) Fixes AMD builds after #17200 broke them (Need to remember to run AMD builds while trying to merge external CUDA PRs next time) (2) Turn on the NHWC CUDA feature in the Linux GPU CI. The extra time spent in building a few more files and running a few more tests will not be much. Test Linux GPU CI run : https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1170770 ### Motivation and Context Keep the NHWC CUDA ops tested (https://github.com/microsoft/onnxruntime/pull/17200) and guard against regressions	2023-10-17 09:23:52 -07:00
Yulong Wang	f7341e8103	enable training for win-wasm-ci.yml (#17954 ) ### Description Fixes NPM Packaging pipeline. Training was enabled for linux-wasm-ci.yml but not enabled for win-wasm-ci.yml. the web CI uses linux-wasm-ci.yml NPM packaging pipeline uses win-wasm-ci.yml	2023-10-16 16:07:20 +08:00
Scott McKay	ae211999dd	Attempt to make the usage of the Android emulator in CIs more robust (#17903 ) ### Description <!-- Describe your changes. --> Android emulator usage updates: - Change approach to detecting boot has completed - use `-delay-adb` and a simple command (`ls`) with `wait-for-device` as the first step - this ensures enough startup has occurred for adb to be responsive - use secondary loop on the python side to check for sys.boot_completed to be set - doing the check on the python side provides more feedback and seems to work well - make the 'stop' logic more precise by using psutil - add internal timeout of 20 mins for emulator startup - waiting for the CI jobs overall timeout is way too long - value is hardcoded for now (most CIs startup in under 10 mins) but could be made configurable if needed CI updates: - add template for using the Android emulator - update CIs to use template - reorder React Native CI - minimize the time the Android emulator or iOS simulator is running by moving some build steps around - don't run both at the same time - unnecessary and potentially adds significant memory pressure to the machine - fix QNN Android emulator CI as much as possible - now everything works apart from running onnx_test_runner with the QNN EP ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix inconsistent detection of the emulator boot completing. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-10-15 08:42:36 +10:00
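The boot-detection approach described in #17903 (a Python-side loop that polls `sys.boot_completed`, bounded by an internal 20-minute timeout) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code from the PR: `wait_for_boot` and `adb_boot_completed` are hypothetical names, the 5-second poll interval is an assumption, and the probe assumes `adb` is on `PATH` and that the emulator was already started with `-delay-adb` and `adb wait-for-device` has returned.

```python
import subprocess
import time
from typing import Callable


def adb_boot_completed() -> bool:
    """Hypothetical probe: True once the sys.boot_completed property reads '1'."""
    result = subprocess.run(
        ["adb", "shell", "getprop", "sys.boot_completed"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "1"


def wait_for_boot(
    is_boot_completed: Callable[[], bool] = adb_boot_completed,
    timeout_s: float = 20 * 60,    # the commit message mentions a 20 minute limit
    poll_interval_s: float = 5.0,  # assumption: not stated in the source
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll the probe until it succeeds and return the elapsed seconds.

    Raises TimeoutError if the probe has not succeeded once timeout_s has elapsed,
    so a stuck emulator fails fast instead of waiting for the overall CI job timeout.
    """
    start = clock()
    while True:
        if is_boot_completed():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"emulator did not finish booting within {timeout_s} s")
        sleep(poll_interval_s)
```

Passing the probe, clock, and sleep functions as parameters is a design choice of this sketch: it keeps the polling logic testable without a real emulator.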
PeixuanZuo	0c5b1598d3	[ROCm] Add ROCm Debug wheels to private ADO Feeds (#17887 ) Add ROCm Debug wheels to private ADO Feeds	2023-10-13 10:28:10 +08:00
Changming Sun	3f3ece4a39	Update NDK to 26.0.10792818 (#17852 ) ### Description Update NDK to 26.0.10792818 which is included in every macOS build machine so that we do not need to download a different version every time in every build. ### Motivation and Context Downloading NDK on-the-fly is a main contributor of Android related build failures.	2023-10-12 14:08:43 -07:00
Yi Zhang	9d07ca3621	Move compliance check before publishing pipeline artifact (#17857 ) ### Description <!-- Describe your changes. --> ### Motivation and Context Compliance check would fail randomly but the stage couldn't be rerun if the pipeline artifacts are already published. There's the error like `Artifact xxxx already exists`. We had to restart the whole pipeline if there's a random error in compliance check.	2023-10-12 15:48:04 +08:00
Yulong Wang	25bbd8d4eb	[js/web] allow gpu IO binding tests to fail temporarily (#17892 ) ### Description allow gpu IO binding tests to fail temporarily. when the root cause is still in investigation, use `continueOnError: true` to allow the test to fail without blocking PRs.	2023-10-11 21:21:21 -07:00
Changming Sun	138ccecd22	Change how "NPM packaging pipeline" downloads packages from another pipeline (#17838 ) ### Description "NPM packaging pipeline" needs to download an artifact from "Zip-Nuget-Java-Nodejs Packaging Pipeline". It has been a long-standing issue that the two pipelines often use different commit ids. This change declares 'Zip-Nuget-Java-Nodejs Packaging Pipeline' as a resource, so that "NPM packaging pipeline" will always fetch from the pipeline run that triggers this NPM pipeline. The official documentation says: "When you define a resource trigger, if its pipeline resource is from the same repo as the current pipeline, triggering follows the same branch and commit on which the event is raised."	2023-10-11 21:07:27 -07:00
Scott McKay	046939b0c1	Include CoreML in mac os python packages (#17844 ) ### Description <!-- Describe your changes. --> Include CoreML EP in python package. I've added to the base package as CoreML comes from the OS so there are no additional libraries to distribute. Updated the CPU-based provider list to add the AzureEP, which is also included in the base package, to fix some test failures. Without this the infrastructure thinks a device copy implementation is required between AzureEP and CoreML nodes, which is not the case as the AzureEP is CPU based. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> #16989	2023-10-10 11:44:32 +10:00
PeixuanZuo	2ef6ee674c	[ROCm] Update ROCm and MIGraphX CI to ROCm5.7 (#17834 ) - Update ROCm and MIGraphX CI to ROCm5.7 - Simplify test exclude file. Some tests will output `registered execution providers ROCMExecutionProvider were unable to run the model.` if they cannot run. - Add `enable_training` build argument for MIGraphX pipeline.	2023-10-09 10:29:11 +08:00
Wei-Sheng Chin	b5a103ae16	Upgrade transformers to fix CI (#17823 ) Python package pipeline fails due to "tokenizers" compilation. Since "tokenizers" is a dep of "transformers", we update its version and hope a new solution had been there. ``` error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell` --> tokenizers-lib/src/models/bpe/trainer.rs:517:47 ```	2023-10-07 09:51:24 -07:00
PeixuanZuo	37f4f27da0	[ROCm] ONNX Runtime training rocm package for ADO (#17683 ) - We will publish the onnxruntime-training-rocm package on ADO feeds. The onnxruntime-training package will be solely for CUDA. - Add a new pipeline for onnxruntime-training-rocm ADO feeds https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1278. Only the package with the latest ROCm version is published to ADO.	2023-10-07 10:45:35 +08:00
Hector Li	385fab5bae	[QNN EP] Qnn cache improvement (#17757 ) ### Description Improve the QNN context binary cache feature to reduce the memory overhead and initialization time overhead. Instead of dumping a Qnn context binary file with metadata as header, we dump a Onnx format file with metadata inside Onnx node. ### Motivation and Context reduce the memory overhead and initialization time overhead	2023-10-06 15:56:33 -07:00
Chi Lo	569876fb16	[TensorRT EP] Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field (#17617 ) Two major modifications of this PR: 1. Refactor OrtTensorRTProviderOptions initialization and make it easy to add new fields. 2. Make the Python API capable of using TensorRT plugins by adding a new Python binding API, `register_tensorrt_plugins_as_custom_ops`. (It needs to register the EP's custom op domain before the model is loaded. For the C++ API it is slightly different: calling SessionOptionsAppendExecutionProvider_TensorRT_XX appends the custom op domain to the session options, and ORT can later register the custom op domain from the session options before loading the model.)	2023-10-06 14:12:20 -07:00
Justin Chu	be7541ef4a	[Linter] Bump ruff and remove pylint (#17797 ) Bump ruff version and remove pylint from the linter list. Fix any new error detected by ruff. ### Motivation and Context Ruff covers many of the pylint rules. Since pylint is not enabled in this repo and runs slow, we remove it from the linters	2023-10-05 21:07:33 -07:00
Rachel Guo	5be79e2e29	Remove swift files on ORT main repo (#17799 ) ### Description <!-- Describe your changes. --> Move the swift files to ORT SPM repo now: https://github.com/microsoft/onnxruntime-swift-package-manager ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-10-05 15:27:15 -07:00
Wei-Sheng Chin	faef9c32fa	ONNX-Native Tensor Parallel: Using Distributed MatMul as Example (#17695 ) This PR introduces - New data structure to represent kernel-level (aka node-level or op-level) tensor sharding information. I consider it the foundation of ONNX distributed inference. - Building blocks for distributed kernel implementations, especially stateless implementations of communication ops. - Implementation of DistributedMatMul and its tests. Code structure: - sharding.h/.cc: Function to shard and reshard tensors (calling into NCCL). - sharding_spec.h/.cc: Representation of how a tensor is sharded. - distributed_matmul.h/.cc: Implementation of tensor parallel MatMul. Inputs and outputs are sharded across devices. - onnxruntime_test_distributed.py: distributed operator tests. Example of specifying sharding information ```python @onnxscript.script() def matmul_rs_sr_rr(tensor_x: FLOAT, tensor_w: FLOAT) -> FLOAT: # Run MatMul by sharding x along column axis and w along row axis on # 2 GPUs. return MICROSOFT_OPSET.DistributedMatMul( tensor_x, tensor_w, device_mesh_shape=[2], device_mesh_elements=[0, 1], input_shard_specs=["RS[0]", "S[0]R"], output_shard_specs=["RR"], ) onnx_model = matmul_rs_sr_rr.to_model_proto( input_types=[FLOAT[2, "s"], FLOAT["s", 2]], output_types=[FLOAT[2, 2]], ) ``` In this example, the device mesh can be visualized as a 1-D tensor, `[0, 1]`. The 2nd axis of `tensor_x` is sharded across `[0, 1]` (i.e., the 0-axis of the device mesh). Similarly, the 1st axis of `tensor_w` is sharded across `[0, 1]` as well. C++ classes to represent tensor sharding (copied from sharding_spec.h): ```cpp class DeviceMesh { public: // [Device Mesh and Tensor Sharding for Tensor Parallel] // Device mesh is a tensor of device indices. // A tensor can then be partitioned along specific mesh axes. // // Assume we have 4 GPUs indexed by 0, 1, 2, and 3. // Let's consider some examples. // 1. 1D device mesh [0, 1, 2, 3]. 
In this case, // device_mesh_shape is [4] and device_mesh_elements // is [0, 1, 2, 3]. // If we want to shard a 2-D tensor along its axis 1, the // corresponding sharding spec is a string "RS[0]". // 2. 2D device mesh [[0, 1], [2, 3]]. In this case, // device_mesh_shape is [2, 2] and device_mesh_elements // is [0, 1, 2, 3]. // If we want to shard a 2-D tensor's // rows along mesh axis 1 and // columns along mesh axis 0, the // corresponding sharding spec is a string "S[1]S[0]". // If that 2-D tensor's value is np.array([[5, 6], [7, 8]]), // GPU 0/1/2/3 owns 5/7/6/8. Below is a visualization of the sharding // process. // - Start with a 2-D device mesh [[0, 1], [2, 3]] and // a 2-D tensor [[5, 6], [7, 8]] // - GPU: [[0, 1], [2, 3]], Tensor: [[5, 6], [7, 8]] // - Split GPU mesh along axis 1 and tensor along // axis 0 for "S[1]" in "S[1]S[0]" // - GPU: [[0], [2]], Tensor: [[5, 6]] // GPU: [[1], [3]], Tensor: [[7, 8]] // - Split GPU mesh along axis 0 and tensor along // axis 1 for "S[0]" in "S[1]S[0]" // - GPU: [[0]], Tensor: [[5]] // - GPU: [[2]], Tensor: [[6]] // - GPU: [[1]], Tensor: [[7]] // - GPU: [[3]], Tensor: [[8]] // Actual shape of device mesh represented by `device_mesh_elements`. std::vector<int64_t> device_mesh_shape; // Flattened device mesh. std::vector<int64_t> device_mesh_elements; }; class AxisPartitionSpec { // [Device Mesh and Tensor Sharding for Tensor Parallel] // This class is the in-memory representation of // 1. if a tensor is sharded or not (aka replica), and // 2. which tensor axis is sharded by which device mesh axis. // Let's consider sharding a 2-D tensor along its column axis on // device mesh [0, 1] as an example. // The required sharding spec RS[0] can be represented by // - AxisPartitionSpec(Condition::Replica, -1) // - AxisPartitionSpec(Condition::Shard, 0) public: // Status of a tensor axis. // A tensor axis can be either sharded or replicated // along a device mesh axis. 
enum class Condition { Replica, Shard }; // This field tells if a tensor axis is sharded or not. Condition cond; // If a tensor axis is sharded, this field tells which device // mesh axis to distribute the shards along. // If a tensor axis is not sharded, this field is ignored. int device_mesh_axis; // A helper to construct a replica spec for a tensor axis. static AxisPartitionSpec CreateReplica() { return AxisPartitionSpec(Condition::Replica, -1); } // A helper to construct a sharding spec for a tensor axis. // This tensor axis is sharded along `device_mesh_axis` in device mesh. static AxisPartitionSpec CreateShard(int device_mesh_axis) { return AxisPartitionSpec(Condition::Shard, device_mesh_axis); } }; class TensorPartitionSpec { // [Device Mesh and Tensor Sharding for Tensor Parallel] // TensorPartitionSpec holds a collection of AxisPartitionSpec and an // associated DeviceMesh. It is responsible for determining how a tensor // should be partitioned across a device mesh. // // Example 1: RS[0] // In this scenario, `axis_specs` would contain two `AxisPartitionSpec` objects. // - The first object is a Replica, denoting that the first axis of the tensor is // not sharded but is instead replicated. // - The second object is a Shard along the 0-th axis of the device mesh. It denotes // that the second axis of the tensor is sharded along the first axis of the // device mesh. // // Example 2: S[0]RR // In this scenario, `axis_specs` would contain three `AxisPartitionSpec` objects. // - The first object is a Shard along the 0-th axis of the device mesh, indicating // that the first axis of the tensor is sharded along the first axis of the // device mesh. // - The second and third objects are Replicas, indicating that the second and third // axes of the tensor are not sharded but are instead replicated. public: // axis_specs[i]: AxisPartitionSpec for tensor axis i. For a 2-D tensor, // axis_specs[0] is for row axis and axis_specs[1] is for // column axis. 
axis_specs[i].device_mesh_axis = j means that // tensor axis i is sharded along device mesh axis j. std::vector<AxisPartitionSpec> axis_specs; // device_mesh: DeviceMesh for sharding the associated tensor. // Read [Device Mesh and Tensor Sharding for Tensor Parallel] in DeviceMesh's comment. DeviceMesh device_mesh; }; ```	2023-10-05 14:22:25 -07:00
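The sharding-spec strings and device-mesh layout described in the commit above can be illustrated with a small pure-Python sketch. It is not the C++ implementation from sharding_spec.h: the helper names `parse_shard_spec`, `mesh_coords`, and `local_slices` are hypothetical, the mesh is assumed to be flattened in row-major order, and every sharded tensor axis is assumed to divide evenly across its mesh axis.

```python
import re
from typing import List, Optional, Tuple

_TOKEN = re.compile(r"R|S\[(\d+)\]")


def parse_shard_spec(spec: str) -> List[Optional[int]]:
    """Parse e.g. "RS[0]" into [None, 0]: None = replicated, int = mesh axis."""
    if not re.fullmatch(r"(?:R|S\[\d+\])+", spec):
        raise ValueError(f"invalid sharding spec: {spec!r}")
    return [None if m.group(1) is None else int(m.group(1))
            for m in _TOKEN.finditer(spec)]


def mesh_coords(device: int, mesh_shape: List[int],
                mesh_elements: List[int]) -> List[int]:
    """Coordinates of `device` in the mesh (row-major flattening assumed)."""
    pos = mesh_elements.index(device)
    coords = []
    for dim in reversed(mesh_shape):
        coords.append(pos % dim)
        pos //= dim
    return coords[::-1]


def local_slices(tensor_shape: List[int], spec: str, device: int,
                 mesh_shape: List[int],
                 mesh_elements: List[int]) -> List[Tuple[int, int]]:
    """(start, stop) range per tensor axis owned by `device`."""
    coords = mesh_coords(device, mesh_shape, mesh_elements)
    slices = []
    for dim, mesh_axis in zip(tensor_shape, parse_shard_spec(spec)):
        if mesh_axis is None:
            slices.append((0, dim))  # replicated: the whole axis
        else:
            chunk = dim // mesh_shape[mesh_axis]  # even split assumed
            start = coords[mesh_axis] * chunk
            slices.append((start, start + chunk))
    return slices


def owned_values(tensor, spec, mesh_shape, mesh_elements):
    """Map each device to its 2-D shard of `tensor` (nested lists)."""
    shape = [len(tensor), len(tensor[0])]
    result = {}
    for device in mesh_elements:
        (r0, r1), (c0, c1) = local_slices(shape, spec, device,
                                          mesh_shape, mesh_elements)
        result[device] = [row[c0:c1] for row in tensor[r0:r1]]
    return result
```

Calling `owned_values([[5, 6], [7, 8]], "S[1]S[0]", [2, 2], [0, 1, 2, 3])` yields `{0: [[5]], 1: [[7]], 2: [[6]], 3: [[8]]}`, which agrees with the "GPU 0/1/2/3 owns 5/7/6/8" example in the `DeviceMesh` comment.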
Edward Chen	b6bef0f063	Add test for iOS dynamic framework (#17790 ) Add test to cover iOS dynamic framework usage.	2023-10-05 11:18:51 -07:00
Yulong Wang	561aca97cf	[js/webgpu] support IO binding (#17480 ) <del> This PR is based on a few prerequisite PRs, listed below: - #17465 - #17469 - #17470 - #17472 - #17473 - #17484 Please review the current change by only looking at commit e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later. </del> ### Description This PR introduces WebGPU IO binding. This new feature allows onnxruntime-web users to use tensors created from GPU as model input/output so that model inference can run without unnecessary data copies between CPU and GPU for model input/output. ### Examples An E2E demo/example is being worked on. The following is a simple demo with code snippets. Let's first look at how it is done today: ```js // STEP.1 - create an inference session: const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] }); // STEP.2 - create model input: (supposing myImageCpuData is a Float32Array) const feeds = { 'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3]) }; // STEP.3 - run model const myResults = await mySession.run(feeds); // STEP.4 - get output data const myData = myResults['output_image:0'].data; // Float32Array ``` #### for inputs (GPU tensor): Now, with IO binding, you can create a tensor from a GPU buffer, and feed it to the model: ```js // new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data) const feeds = { 'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] }) }; ``` ### for outputs (pre-allocated GPU tensor) you can also do that for output, if you know the output shape: ```js // new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object) const fetches = { 'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] }) }; // new STEP.3 - run model with 
pre-allocated output (fetches) const myResults = await mySession.run(feeds, fetches); ``` ### for outputs (specify location) if you do not know the output shape, you can specify the output location when creating the session: ```js // new STEP.1 - create an inference session with an option "preferredOutputLocation": const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'], preferredOutputLocation: "gpu-buffer" }); ``` if the model has multiple outputs, you can specify them separately: ```js // new STEP.1 - create an inference session with an option "preferredOutputLocation": const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'], preferredOutputLocation: { "output_image:0": "gpu-buffer" } }); ``` now you don't need to prepare the `fetches` object and onnxruntime-web will prepare output data at the specified location. #### read data when you get the output tensor, you can: ```js // get the gpu buffer object: const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer // get the CPU data asynchronously const cpuData = await myOutputTensor.getData(); // get the CPU data asynchronously and release the underlying GPU resources const cpuData = await myOutputTensor.getData(true); // dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called. myOutputTensor.dispose(); ``` #### resource management JavaScript has GC so you don't need to worry about managing JavaScript objects. But there are 2 types of resources that are not managed by GC: - GPU buffers that are used in tensors - Underlying ORT native resources To simplify, most of the unmanaged resources are handled inside ORT web. But there are a few resources that users need to manage: - All external GPU resources, including GPU buffers inside all tensors created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. Users should manage those GPU buffers themselves. 
- When a session is created with `preferredOutputLocation` == "gpu-buffer" specified in session options, and the corresponding output is not pre-allocated, the user needs to call the output tensor's `dispose()` or `getData(true)` to manually release the underlying GPU buffers. - ORT internal errors (including providing a pre-allocated output tensor with the wrong type/dims) will invalidate the whole wasm memory and are not recoverable. An exception is thrown in this situation.	2023-09-29 11:24:42 -07:00
Changming Sun	caf98128c1	Update linux-wasm-ci.yml: remove the ln command (#17735 ) ### Description /usr/local/bin can only be modified by root. This command seems unnecessary	2023-09-28 21:43:29 -07:00
Changming Sun	276e8733bd	Update onnx python package and setuptools (#17709 ) ### Description A follow-up for #17125	2023-09-27 07:54:48 -07:00
liqun Fu	2be4dc6d04	ONNX 1.15 integration (#17125 ) ### Description this is for ORT 1.17.0 - make ORT to use ONNX release 1.15.0 branch. Eventually will update to the release tag once ONNX 1.15.0 is released ### Motivation and Context Prepare for ORT 1.17.0 release. People can start work on new and updated ONNX ops in ORT. --------- Signed-off-by: Liqun Fu <liqfu@microsoft.com>	2023-09-26 14:44:48 -07:00
Changming Sun	a942bbf489	Update nodejs to 18.x (#17657 ) 1. Upgrade nodejs from 16.x to 18.x for Windows pipelines 2. Avoid using Azure DevOps "NodeTool" on Linux. The tool installs nodejs from internet or local disk cache. But we already moved all Linux tests to docker. So we do not need the installer anymore. 3. Remove some other unused code.	2023-09-25 14:12:11 -07:00
PeixuanZuo	216214b7d3	[ROCm] Remove ROCm5.4.2, ROCm 5.5 and add ROCm5.7 to python package pipeline (#17668 ) - Remove ROCm5.4.2, ROCm 5.5 and add ROCm5.7 to python package pipeline - Remove redundant arg	2023-09-25 10:35:28 +08:00
PeixuanZuo	5b9cd91a9c	[ROCm] fix CI (#17648 ) fix CI, follow #17621	2023-09-21 07:37:50 -07:00
Changming Sun	57dfd15d7b	Remove dnf update from docker build scripts (#17551 ) ### Description 1. Remove 'dnf update' from docker build scripts, because it upgrades TRT packages from CUDA 11.x to CUDA 12.x. To reproduce it, you can run the following commands in a CentOS CUDA 11.x docker image such as nvidia/cuda:11.8.0-cudnn8-devel-ubi8. ``` export v=8.6.1.6-1.cuda11.8 dnf install -y libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v} libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v} dnf update -y ``` The last command will generate the following outputs: ``` ======================================================================================================================== Package Architecture Version Repository Size ======================================================================================================================== Upgrading: libnvinfer-devel x86_64 8.6.1.6-1.cuda12.0 cuda 542 M libnvinfer-headers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 118 k libnvinfer-headers-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 14 k libnvinfer-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 13 M libnvinfer-plugin8 x86_64 8.6.1.6-1.cuda12.0 cuda 13 M libnvinfer-vc-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 107 k libnvinfer-vc-plugin8 x86_64 8.6.1.6-1.cuda12.0 cuda 251 k libnvinfer8 x86_64 8.6.1.6-1.cuda12.0 cuda 543 M libnvonnxparsers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 467 k libnvonnxparsers8 x86_64 8.6.1.6-1.cuda12.0 cuda 757 k libnvparsers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 2.0 M libnvparsers8 x86_64 8.6.1.6-1.cuda12.0 cuda 854 k Installing dependencies: cuda-toolkit-12-0-config-common noarch 12.0.146-1 cuda 7.7 k cuda-toolkit-12-config-common noarch 12.2.140-1 cuda 7.9 k libcublas-12-0 x86_64 12.0.2.224-1 cuda 361 M libcublas-devel-12-0 x86_64 12.0.2.224-1 
cuda 397 M Transaction Summary ======================================================================================================================== ``` As you can see from the output, they are CUDA 12 packages. The problem can also be solved by locking the packages' versions with the "dnf versionlock" command right after installing the CUDA/TRT packages. However, going forward, for better reproducibility, I suggest manually pinning dnf package versions in the installation scripts, as we do for TRT now. ```bash v="8.6.1.6-1.cuda11.8" &&\ yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo &&\ yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}\ libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v} ``` When we need to upgrade a package because of a security alert or for some other reason, we manually change the version string instead of relying on "dnf update". Though this approach takes more effort, it can make our pipelines more stable. 2. Move python test to docker ### Motivation and Context Right now the nightly gpu package mixes CUDA 11.x and CUDA 12.x, and the resulting package is unusable (it crashes every time).	2023-09-21 07:33:29 -07:00
Pranav Sharma	038c76378f	Include onnxruntime_float16.h in the package. (#17637 ) ### Description Include onnxruntime_float16.h in the package. ### Motivation and Context This was missed in the recently released 1.16 pkgs (except Nuget).	2023-09-21 00:08:10 -07:00
PeixuanZuo	1f991f27f1	[ROCm] add manylinux build test for ROCm CI (#17621 ) manylinux build is used for nightly packaging generation and it's hard to capture issue in time when related files change. This PR add manylinux build in CI.	2023-09-21 10:45:16 +08:00
Changming Sun	dd561f2015	Upgrade sympy (#17639 ) AB#17015	2023-09-20 18:44:23 -07:00
Yulong Wang	d522cc7cc4	Update npm-packaging-pipeline.yml to always use artifacts from main branch (#17604 ) ### Description Update npm-packaging-pipeline.yml to always use artifacts from main branch	2023-09-19 14:42:08 -07:00
Wei-Sheng Chin	068300d97e	Pin beartype version (#17599 ) PyTorch doesn't like the latest beartype: https://github.com/pytorch/pytorch/pull/109510	2023-09-18 19:31:04 -07:00
Yi Zhang	7116e66c4b	Improve Win QNNEP pipeline (#17586 ) ### Description 1. use standard win build template 2. enable compiler cache ### Motivation and Context Make win build task easy to maintain and accelerate the pipeline.	2023-09-19 07:36:17 +08:00
Yi Zhang	377f959c69	Run Final_Jar_Testing_Linux_GPU in docker (#17533 ) ### Description 1. Create a package test image based on [RedHat UBI](https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image) 2. Install TensorRT 8.6.1.6 in RedHat. (Ref. https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#maclearn-net-repo-install-rpm) 3. Run Final_Jar_Testing_Linux_GPU in docker (base image: nvidia/cuda:11.8.0-cudnn8-devel-ubi8) ### Motivation and Context [AB#18470](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18470) ### Verification https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=354004&view=logs&j=8939b564-1402-57b5-92dc-510eba75e069&t=8939b564-1402-57b5-92dc-510eba75e069	2023-09-15 08:35:55 -07:00
Yulong Wang	7af2f68ef3	[js/web] add a test flag to customize chromium flags (#17545 ) ### Description add a test flag to customize chromium flags. Usage: npm test -- <other flags> --chromium-flags=<...>	2023-09-14 10:05:31 -07:00
Changming Sun	5d3786206b	Fix ROCM's nightly build (#17518 ) ### Description PR 15470 updated some C/C++ dependencies. The change caused ROCM EP's nightly build to fail. see issue https://github.com/ROCm-Developer-Tools/HIP/issues/2082 for a background. So, the root cause is HIP compiler has a special requirement that HIP's include dirs must be used before the operating system's include folder: /usr/include. HIP adds "-isystem" in front of "/usr/include". gcc or clang will search the folders added with "-I" first, then the "-isystem" folder. It works fine as long as we do not add "-I/usr/include" to the compile commands for *.cu files. It would be wrong if we already have installed an open source library to /usr and want to use the prebuilt library from there instead of the current build dir. ### Motivation and Context	2023-09-13 08:50:14 -07:00
Yi Zhang	c0a4fe777f	Move Linux python test into docker (#17479 ) ### Description supplement of #17417 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-13 15:21:28 +08:00
rui-ren	b52127d22d	update acpt image for the training ci nightly (#17521 ) ### Description <!-- Describe your changes. --> The name of nightly ACPT image has been updated to `ptebic.azurecr.io/internal/aifx/acpt/nightly-ubuntu-cuda-torch-dev` As the previous image alias had `cu118`, `torch210dev` or `py38`, any version update will break the training nightly pipeline ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Using constant image alias to avoid pipeline failure.	2023-09-12 22:32:20 -07:00
Changming Sun	9b755dce9f	Delete all Prefast tasks (#17522 ) ### Description Delete all Prefast tasks because the new VS 17.7 version crashes every time when we run the task on our CI build servers. However, we cannot reproduce it locally. And this problem blocks us installing security patches to our CI build machines. Will use [CodeQL](https://codeql.github.com/) instead. ### Motivation and Context Address some security alerts.	2023-09-12 17:40:49 -07:00
Edward Chen	cf672c5887	Use name of temporary provisioning profile. (#17459 ) The old provisioning profile no longer works. Switched to a temporary one that we can use before a new one is available. The temporary one has a different name.	2023-09-12 10:56:35 -07:00
Adrian Lizarraga	f20e475e67	[QNN EP] Update QNN SDK to version 2.14.1 (#17467 ) ### Description Updates the version of QNN SDK used by CI Pipelines. Enables some tests fixed by 2.14.1, but still need to look into Resize in a separate PR. ### Motivation and Context Test latest version of QNN SDK.	2023-09-11 21:07:50 -07:00
Yulong Wang	850baced33	[web] a few updates to web pipeline (#17485 ) ### Description Update the Web CI pipelines: - remove parameter 'WebTemplate': Since we start to support webgpu, the linux-web-ci.yml is no longer working and it is already out-of-date. remove this file and parameter so that we always use win-web-ci.yml - change flag `RunWebGpuTests` into 2 flags, for release and debug. Currently for CI we only run webgpu tests on release build. But we want to have the capability to run webgpu tests on debug build as well. After this PR is merged, next step is to enable both Debug and Release webgpu tests in PostMerge pipeline.	2023-09-11 11:43:42 -07:00
Caroline Zhu	dcc93909b4	Add training WASM generation to Web CI pipeline (#17319 ) ### Description [Successful pipeline run](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1123141&view=results) Added flag to build the training artifacts & updated the pull-wasm-artifacts script to pull the training artifacts as well. Bundled into this PR are minor formatting fixes + naming fixes. ### Motivation and Context [This PR](https://github.com/microsoft/onnxruntime/pull/16521) extended the WASM API wrapper to build training WASM artifacts as well. The ORT training WASM artifacts are required to support ORT training web bindings.	2023-09-08 15:49:47 -07:00
Changming Sun	bc84f52633	Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470 ) ### Description Update C/C++ dependencies abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint to newer versions per request of @ mayeut. He created the following PRs to update the deps: https://github.com/microsoft/onnxruntime/pull/15432 https://github.com/microsoft/onnxruntime/pull/15434 https://github.com/microsoft/onnxruntime/pull/15435 https://github.com/microsoft/onnxruntime/pull/15436 https://github.com/microsoft/onnxruntime/pull/15437 However, our build system needs to fetch the dependencies from an internal mirror that only Microsoft employees have write access to. So I closed his PRs and created this one. This PR also updates abseil to a newer version. This is to prepare for upgrading re2.	2023-09-08 13:35:04 -07:00
Ashwini Khade	c5dbd5c919	Updates to training pipelines (#17292 )	2023-09-08 11:57:12 -07:00
Yi Zhang	ae74a517b6	Run Nuget_Test_Linux_GPU in container (#17452 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> ### Verification https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=351542&view=results	2023-09-08 13:41:20 +08:00
Yi Zhang	0a3eb60b01	Fix Bug: Step failed but not exited with error (#17442 ) ### Description Add "set -ex" in the script. ### Motivation and Context Build failed but it still passed. https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1132003&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=39e3f98f-7fe5-578c-20bd-5ae5a4590bda	2023-09-07 14:33:31 +08:00
Changming Sun	b38fb0da06	Revert the yaml file changes in "Nodejs_Packaging_CPU" build job (#17441 ) ### Description The yaml file changes made in #16050 do not really work. Currently the pipeline is failing with error: ``` Error: Not found SourceFolder: C:\a\_work\5\b\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib ``` So, I will revert the yaml changes first to bring the pipeline back. Some people are waiting for our nightly packages. Test run: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=351104&view=results ### Motivation and Context	2023-09-06 20:20:55 -07:00
Yi Zhang	ede339f304	Move dotnet build and test into docker in Linux CPU CI (#17417 ) ### Description install dotnet 6.0 in the docker image. move C# build and test into docker. ### Motivation and Context ### Note The Unit tests and Symbolic shape infer's migration will be in another PR.	2023-09-07 09:28:16 +08:00
Edward Chen	a3a1237270	Disable xcpretty filtering of xcodebuild output in iOS packaging pipeline. (#17429 )	2023-09-06 09:04:17 -07:00
Changming Sun	c6b0d185b4	Update cmake to 3.27 and upgrade Linux CUDA docker files from CentOS7 to UBI8 (#16856 ) ### Description 1. Update docker files and their build instructions. ARM64 and x86_64 can use the same docker file. 2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8 AB#18990	2023-09-05 18:12:10 -07:00
aciddelgado	44101e8771	Flash Attention v2 MHA (#17227 )
### Description
Integrate Flash Attention v2 into the PackedMultiHeadAttention, MultiHeadAttention and Attention operators.
Flash Attention v2 source code is from https://github.com/Dao-AILab/flash-attention/tree/main/csrc/flash_attn/src. We made some changes to remove the dependency on Torch, then removed the backward and bfloat16-related code.
Add a benchmark script (see benchmark_mha.sh) to compare different attention kernels for the MultiHeadAttention operator.
Current limitations for Flash Attention in the PackedMultiHeadAttention, MultiHeadAttention and Attention operators:
* Relative Position Bias is not supported
* Different hidden size for Q and V is not supported
* Only float16 is supported
* Padding/attention mask is not supported
* For MultiHeadAttention, when there is past or present input, bias must be provided to activate flash attention
* For Attention, past or present inputs will deactivate flash attention
* Causal is not supported
Some limitations (like attention mask and causal) might be removed later.
Currently, Flash Attention v2 only works on Linux. For Windows, we will enable it later with Cutlass 3.2.
Two environment variables can be used for testing purposes:
(1) `ORT_DISABLE_FLASH_ATTENTION` to disable flash attention. Default value is 0 (enable). Set it to "1" to disable it.
(2) `ORT_MIN_SEQ_LEN_FLASH_ATTENTION_PACKED_QKV`. Default value is "513", which means that we only enable flash attention when sequence length is larger than 512 for packed QKV format. Set it to "0" if you want to use flash attention v2 whenever possible.
### Speedup
The following results are from a Standard_ND96amsr_A100_v4 VM (A100-SXM4-80GB GPU) using benchmark_mha.sh. The metric is TFLOPs per second for the MultiHeadAttention operator.
There are 3 input formats:
* `Q,K,V` means separated inputs query, key and value of BxSxNH
* `Q,KV` means packed KV, where key is 5D: BxSxNx2xH
* `QKV` means packed QKV, where query is 5D: BxSxNx3xH
Note that flash attention cannot use the packed QKV format, so an extra Transpose is needed. We found that the TensorRT kernel is faster for sequence length <= 512 for packed QKV. The reason might be that no transpose is needed for the TensorRT kernel in this format.
We also noticed that the TensorRT kernel is faster for a stable diffusion 512x512 image (see seq_len=4096, heads=8, head_dim=40 below), while flash attention v2 is faster for a 1024x1024 image (see seq_len=16384, heads=8, head_dim=40 below).
| input format | batch size | sequence length | heads | head dim | flash_v2 (TFLOPs/s) | TensorRT (TFLOPs/s) | Memory Efficient Attention (TFLOPs/s) |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Q,K,V | 32 | 512 | 64 | 32 | 78.1 | 60.0 | 39.3 |
| Q,K,V | 32 | 512 | 128 | 16 | 46.8 | 44.1 | 21.7 |
| Q,K,V | 16 | 1024 | 64 | 32 | 99.0 | 72.8 | 44.3 |
| Q,K,V | 16 | 1024 | 128 | 16 | 54.7 | 49.2 | 23.4 |
| Q,K,V | 8 | 2048 | 64 | 32 | 113.8 | 81.2 | 47.8 |
| Q,K,V | 8 | 2048 | 128 | 16 | 59.7 | 51.9 | 24.7 |
| Q,K,V | 4 | 4096 | 64 | 32 | 122.5 | 85.6 | 49.7 |
| Q,K,V | 4 | 4096 | 128 | 16 | 62.5 | 53.3 | 25.3 |
| Q,K,V | 2 | 8192 | 64 | 32 | 127.4 | 87.5 | 50.7 |
| Q,K,V | 2 | 8192 | 128 | 16 | 64.0 | 54.2 | 25.6 |
| Q,K,V | 1 | 16384 | 64 | 32 | 129.5 | 91.0 | 51.2 |
| Q,K,V | 1 | 16384 | 128 | 16 | 64.7 | 54.5 | 25.8 |
| Q,K,V | 1 | 4096 | 8 | 40 | 51.0 | 43.6 | 36.8 |
| Q,K,V | 1 | 4096 | 8 | 80 | 97.7 | 77.0 | 55.5 |
| Q,K,V | 1 | 4096 | 8 | 160 | 120.0 | 39.7 | 57.8 |
| Q,K,V | 4 | 4096 | 8 | 40 | 89.0 | 84.4 | 49.2 |
| Q,K,V | 4 | 4096 | 8 | 80 | 133.0 | 92.2 | 63.2 |
| Q,K,V | 4 | 4096 | 8 | 160 | 164.8 | 42.7 | 63.8 |
| Q,K,V | 1 | 16384 | 8 | 40 | 96.9 | 91.3 | 52.1 |
| Q,K,V | 1 | 16384 | 8 | 80 | 142.9 | 101.5 | 65.6 |
| Q,K,V | 1 | 16384 | 8 | 160 | 177.4 | 44.2 | 65.7 |
| Q,K,V | 128 | 128 | 12 | 64 | 29.0 | 26.9 | 25.7 |
| Q,K,V | 64 | 128 | 12 | 64 | 23.1 | 10.8 | 21.3 |
| Q,K,V | 128 | 384 | 12 | 64 | 83.5 | 60.8 | 55.7 |
| Q,K,V | 64 | 384 | 12 | 64 | 72.6 | 40.5 | 52.8 |
| Q,K,V | 128 | 512 | 12 | 64 | 98.9 | 77.9 | 62.1 |
| Q,K,V | 64 | 512 | 12 | 64 | 94.7 | 75.6 | 60.4 |
| Q,KV | 32 | 512 | 64 | 32 | 85.9 | 41.1 | 41.1 |
| Q,KV | 32 | 512 | 128 | 16 | 47.1 | 21.6 | 21.6 |
| Q,KV | 16 | 1024 | 64 | 32 | 104.4 | 45.8 | 45.8 |
| Q,KV | 16 | 1024 | 128 | 16 | 54.7 | 23.6 | 23.6 |
| Q,KV | 8 | 2048 | 64 | 32 | 116.8 | 48.5 | 48.5 |
| Q,KV | 8 | 2048 | 128 | 16 | 59.8 | 24.7 | 24.7 |
| Q,KV | 4 | 4096 | 64 | 32 | 124.2 | 50.1 | 50.1 |
| Q,KV | 4 | 4096 | 128 | 16 | 62.6 | 25.3 | 25.3 |
| Q,KV | 2 | 8192 | 64 | 32 | 128.5 | 50.8 | 50.9 |
| Q,KV | 2 | 8192 | 128 | 16 | 64.1 | 25.6 | 25.6 |
| Q,KV | 1 | 16384 | 64 | 32 | 129.4 | 51.2 | 51.2 |
| Q,KV | 1 | 16384 | 128 | 16 | 64.8 | 25.8 | 25.8 |
| Q,KV | 1 | 4096 | 8 | 40 | 67.5 | 37.7 | 37.5 |
| Q,KV | 1 | 4096 | 8 | 80 | 101.3 | 56.7 | 56.6 |
| Q,KV | 1 | 4096 | 8 | 160 | 124.0 | 58.6 | 58.6 |
| Q,KV | 4 | 4096 | 8 | 40 | 90.8 | 49.8 | 49.8 |
| Q,KV | 4 | 4096 | 8 | 80 | 135.6 | 63.8 | 63.8 |
| Q,KV | 4 | 4096 | 8 | 160 | 166.3 | 64.5 | 64.5 |
| Q,KV | 1 | 16384 | 8 | 40 | 97.5 | 52.3 | 52.3 |
| Q,KV | 1 | 16384 | 8 | 80 | 143.5 | 65.9 | 65.8 |
| Q,KV | 1 | 16384 | 8 | 160 | 178.4 | 65.9 | 65.8 |
| Q,KV | 128 | 128 | 12 | 64 | 26.8 | 48.1 | 30.9 |
| Q,KV | 64 | 128 | 12 | 64 | 28.0 | 38.9 | 25.0 |
| Q,KV | 128 | 384 | 12 | 64 | 97.7 | 61.1 | 61.0 |
| Q,KV | 64 | 384 | 12 | 64 | 89.5 | 57.8 | 57.9 |
| Q,KV | 128 | 512 | 12 | 64 | 111.9 | 66.7 | 66.9 |
| Q,KV | 64 | 512 | 12 | 64 | 107.2 | 64.9 | 64.8 |
| QKV | 32 | 512 | 64 | 32 | 77.2 | 84.7 | 39.3 |
| QKV | 32 | 512 | 128 | 16 | 43.4 | 53.1 | 20.9 |
| QKV | 16 | 1024 | 64 | 32 | 98.8 | 87.4 | 44.6 |
| QKV | 16 | 1024 | 128 | 16 | 52.0 | 54.1 | 23.2 |
| QKV | 8 | 2048 | 64 | 32 | 113.1 | 89.0 | 47.9 |
| QKV | 8 | 2048 | 128 | 16 | 58.2 | 54.6 | 24.5 |
| QKV | 4 | 4096 | 64 | 32 | 120.6 | 89.7 | 49.7 |
| QKV | 4 | 4096 | 128 | 16 | 61.7 | 54.6 | 25.2 |
| QKV | 2 | 8192 | 64 | 32 | 125.9 | 89.5 | 50.7 |
| QKV | 2 | 8192 | 128 | 16 | 63.6 | 54.8 | 25.5 |
| QKV | 1 | 16384 | 64 | 32 | 128.5 | 92.0 | 51.2 |
| QKV | 1 | 16384 | 128 | 16 | 64.6 | 54.8 | 25.7 |
| QKV | 1 | 4096 | 8 | 40 | 60.2 | 69.8 | 38.1 |
| QKV | 1 | 4096 | 8 | 80 | 101.6 | 75.2 | 56.7 |
| QKV | 1 | 4096 | 8 | 160 | 130.2 | 41.2 | 58.4 |
| QKV | 4 | 4096 | 8 | 40 | 90.6 | 91.0 | 49.5 |
| QKV | 4 | 4096 | 8 | 80 | 133.6 | 98.1 | 62.8 |
| QKV | 4 | 4096 | 8 | 160 | 165.3 | 43.7 | 63.9 |
| QKV | 1 | 16384 | 8 | 40 | 97.2 | 92.8 | 52.1 |
| QKV | 1 | 16384 | 8 | 80 | 143.0 | 103.1 | 65.6 |
| QKV | 1 | 16384 | 8 | 160 | 177.6 | 44.5 | 65.7 |
| QKV | 128 | 128 | 12 | 64 | 31.1 | 65.9 | 27.6 |
| QKV | 64 | 128 | 12 | 64 | 26.1 | 49.8 | 23.5 |
| QKV | 128 | 384 | 12 | 64 | 84.6 | 88.5 | 56.1 |
| QKV | 64 | 384 | 12 | 64 | 79.1 | 80.3 | 53.5 |
| QKV | 128 | 512 | 12 | 64 | 97.3 | 114.2 | 62.2 |
| QKV | 64 | 512 | 12 | 64 | 95.9 | 110.7 | 60.6 |
| QKV | 4 | 2048 | 32 | 128 | 125.26 | 44.72 | 78.15 |
| QKV | 4 | 4096 | 32 | 128 | 141.62 | 46.29 | 85.84 |
| QKV | 8 | 2048 | 32 | 128 | 127.40 | 45.49 | 78.75 |
| QKV | 8 | 4096 | 32 | 128 | 144.24 | 46.60 | 86.95 |
### Known Issues
NVCC uses a large amount of memory while compiling the flash attention CUDA kernel. A Linux build with CUDA might fail when the machine has limited memory and a large number of CPUs. The workaround is to use a build machine with more memory, or to use an argument like `--nvcc_threads 1` to limit nvcc threads in the build.
### Motivation and Context
Increases speed and efficiency of MHA or Packed MHA.
--------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: tlwu@microsoft.com <tlwu@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>	2023-08-31 13:52:21 -07:00
Rachel Guo	b54619509f	Refine build script for adding disable selected data types option (#17284 ) ### Description As title. ### Motivation and Context We now have multiple data types that we want to disable for minimal builds to reduce binary size, so it may be worth adding a build script argument for specifying them. Also, for fp16, disabling it for all minimal builds may be too restrictive. --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-08-31 13:32:55 -07:00
Yi Zhang	507a40e1e9	Add compiler cache in Linux GPU TensorRT CI. (#17348 ) ### Description Add the compiler cache to the Linux GPU TensorRT CI. This saves about 30 minutes on the GPU machine (52 minutes -> 24 minutes). PS: there are only whitespace differences in the Dockerfile.	2023-08-31 08:13:26 +08:00
Jian Chen	081c0692a4	Update to nodejs version from 16 to 18.17.1 (#17351 ) ### Description Update the Node.js version from 16 to 18.17.1. ### Motivation and Context Node.js 16 will reach EOL in September 2023.	2023-08-30 12:41:48 -07:00
Changming Sun	71da0824f3	Upgrade binskim and fix an error in nuget packaging pipeline (#17340 ) ### Description Upgrade binskim and fix an error in nuget packaging pipeline.	2023-08-30 07:52:06 -07:00
Jian Chen	922629aad8	Upgrade CentOS7 to AlmaLinux8 (#16907 ) ### Motivation and Context Get the latest gcc 12 by default --------- Co-authored-by: Changming Sun <chasun@microsoft.com>	2023-08-29 21:05:36 -07:00
Yi Zhang	d4a61ac71f	PR triggers generated by code (#17247 ) ### Description 1. Refactor the trigger rules generation. 2. Skip all doc changes in PR pipelines. ### Motivation and Context Generate all trigger rules by running set-trigger-rules.py to reduce inconsistencies. It is easy to make mistakes when copying and pasting manually. For example, these 2 excludes are different. Why? `4e6cec4d09/tools/ci_build/github/azure-pipelines/linux-ci-pipeline.yml (L16-L18)` `4e6cec4d09/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml (L27-L29)` ### Note All changes in workflow yamls are generated by code. Please review skip-js.yml, skip-docs.yml and set-trigger-rules.py. @fs-eire, please double check the filter rules in skip-js.yml and the skipped workflows `7023c2edff/tools/ci_build/set-trigger-rules.py (L14-L41)`	2023-08-30 05:57:03 +08:00
Yi Zhang	0e9e9b2a67	Fix one exception in post merge (#17327 )	2023-08-29 19:24:50 +08:00
cloudhan	bf8b1681f9	Build nuget pkg for ROCm (#16791 ) Add nuget pkg building and publishing for ROCm EP --------- Co-authored-by: Yi Zhang <zhanyi@microsoft.com>	2023-08-28 13:35:08 +08:00
Yifan Li	808215366d	Fix Multi GPU TensorRT tests (#17269 ) ### Description * Integrate the `trt_multi_gpu` test stage in ORT post merge CI (Win-2xA10 VM) * Deprecate the Linux MultiGPU TRT CI (this VM will be deprecated soon) * Add multi-GPU support to existing C# test cases * Deprecate the non-functional flag `--enable_multi_device_tests` ### Motivation and Context * Two reasons for replacing the Linux MultiGPU TRT CI: * The flag `--enable_multi_device_tests` is not functional, so it cannot detect issues like #17036 * The Linux-2xM60 VM of this CI pool is about to be deprecated on 9/6/23, so this test needs to be enabled in another dual-GPU VM pool.	2023-08-25 20:30:45 -07:00
Arthur Islamov	c262879214	Added DML and CUDA provider support in onnxruntime-node (#16050 ) ### Description I've added changes to support CUDA and DML (only on Windows, on other platforms it will throw an error) ### Motivation and Context It fixes this feature request https://github.com/microsoft/onnxruntime/issues/14127 which is tracked here https://github.com/microsoft/onnxruntime/issues/14529 I was working on StableDiffusion implementation for node.js and it is very slow on CPU, so GPU support is essential. Here is a working demo with a patched and precompiled version https://github.com/dakenf/stable-diffusion-nodejs ---------	2023-08-25 16:57:06 -07:00
Yi Zhang	9cd33e07b4	Readd Tests in Windows GPU Reduced Ops workflow (#17294 ) ### Description Add a single test step to the Windows GPU Reduced Ops workflow. ### Motivation and Context The old workflow ran building and testing in one command. In PR #17263, the test step was removed by mistake, so this change re-adds it. How to consolidate the test step is still under consideration.	2023-08-25 15:56:59 +08:00
Yi Zhang	756eda2cc4	Windows CI build steps template (#17263 ) ### Description 1. New Windows CI build steps template. 2. Remove useless variables. ### Motivation and Context 1. Make it easier to apply build cache to all Windows CIs. 2. Other teams' devs only need to take care of build options. ### Comparison Before: `9f21f694cf/tools/ci_build/github/azure-pipelines/win-gpu-tensorrt-ci-pipeline.yml (L19-L82)` After: `b4c1f2261b/tools/ci_build/github/azure-pipelines/win-gpu-tensorrt-ci-pipeline.yml (L35-L54)`	2023-08-25 05:58:49 +08:00
Jian Chen	33415b9da4	Removing 10.14 suffix from osx nuget package (#17277 )	2023-08-24 08:51:54 -07:00
cloudhan	87bef1f3f2	Move composable_kernel to deps.txt (#17245 )	2023-08-23 17:39:16 -07:00
Yi Zhang	61a79436e2	Common pre-build steps of Windows CI (#16970 ) ### Description Unify some pre-build common steps. ### Motivation and Context In the long run, other devs should only focus on build option and test commands. It would reduce mistakes and maintenance cost to use common template steps. There will be more PRs to achieve the goal.	2023-08-22 18:09:55 +08:00
cloudhan	4e6cec4d09	Update ck and enable test (#16383 ) Apply the fix in https://github.com/ROCmSoftwarePlatform/composable_kernel/issues/728 Introduce more kernel instances and allow the introduction of streamk and splitk.	2023-08-22 11:08:55 +08:00
Baiju Meswani	aae9a52e8b	Avoid pushing cpu package to https://download.onnxruntime.ai/ (#17238 )	2023-08-21 15:47:07 -07:00
Changming Sun	e2b6827a59	Add a CUDA 12.x pipeline and improve install_third_party_deps.ps1 (#17231 ) ### Description 1. Add a CUDA 12.x pipeline 2. Improve install_third_party_deps.ps1: avoid using Start-Process and call the command directly instead. ### Motivation and Context Since our official packages and all CI pipelines still use CUDA 11.x, we need extra pipelines to validate our source-code-level compatibility with CUDA 12.x. Note that the prebuilt binaries on our release page are not compatible with CUDA 12.x; please do not report bugs for that. AB#15152	2023-08-21 13:04:36 -07:00
Chi Lo	9445539e2c	Update dependency for deps.txt (#17220 ) https://github.com/microsoft/onnxruntime/pull/17059 updates deps.txt and we also need to update cgmanifest.json and upload the files to Azure DevOps https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=342803&view=results for testing	2023-08-19 00:43:25 -07:00
Edward Chen	d6cd41cfc1	[CoreML EP] Add Shape, Gather, and Slice ops (#17153 ) Add CoreML EP shape related ops: - Shape - Gather - Slice Add support for int64/int32 inputs in CoreML EP.	2023-08-18 22:34:34 -07:00
Yulong Wang	3426954525	disable browser stack tests (#17224 ) ### Description disable browser stack tests	2023-08-18 17:14:12 -07:00
Changming Sun	6db72165eb	Fix python packaging test pipeline (#17204 ) ### Description 1. Fix the python packaging test pipeline. There was an error in tools/ci_build/github/linux/run_python_tests.sh: it installed a released version of the onnxruntime python package from pypi.org to run the test, when it should have picked the one from the current build. 2. Refactor the pipeline to allow choosing the cmake build type from the web UI when manually triggering a build. For now this feature is Linux only, because I don't want to change too much when we are about to cut a release branch. After that I will expand it to all platforms. This feature is useful for debugging pipeline issues. We may also consider having a nightly pipeline that runs all tests in Debug mode, which may catch extra bugs because in debug mode we can enforce range checks. Test run: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=342674&view=results ### Motivation and Context Currently the pipeline has a crash error. AB#18580	2023-08-18 14:51:26 -07:00
Adrian Lizarraga	6ee4be724b	Update LICENSE name in NuGet packaging pipelines (#17183 ) ### Description Updates NuGet packaging pipelines to use the correct license name. ### Motivation and Context The license name changed. See https://github.com/microsoft/onnxruntime/pull/17170 The QNN_Windows_Nuget and Zip-Nuget-* pipelines will not run without this update.	2023-08-17 22:22:19 -07:00
Changming Sun	0cccbcc47b	Move DML build job's Prefast task to a CPU machine pool (#17192 ) ### Description Move DML build job's Prefast task to a CPU machine pool which has larger memory. The current one runs out of memory in every run. ### Motivation and Context To fix the broken python packaging pipeline.	2023-08-17 13:16:29 -07:00
Jian Chen	e0022d061f	Set web-ci-pipeline.yml only triggered when related fields are updated (#17148 ) - 'js/web' - 'js/node' - 'onnxruntime/core/providers/js' is updated	2023-08-17 12:55:35 -07:00
Adrian Lizarraga	96b1ff610b	Add CI and PR validation triggers to QNN Windows x64 Pipeline yaml (#17178 ) ### Description Adds continuous integration and pull-request validation triggers directly to the yaml file for the Windows x64 QNN CI Pipeline. ### Motivation and Context There have been various unit test failures that break the QNN_Windows_Nuget pipeline, which builds QNN EP for Windows x64. This PR ensures that QNN EP is built and tested on a Windows x64 image for every pull request.	2023-08-16 11:44:54 -07:00
Jian Chen	8998b6811d	Fix NPM Packaging Pipeline (#17182 )	2023-08-15 22:56:38 -07:00
Adam Louly	c647e3e8ab	Run nightly pipeline tests from the commit id. (#17162 ) ### Description The onnxruntime-CI-nightly-ort-pipeline encounters occasional failures due to synchronization discrepancies between the ACPT nightly image and the repository. We are addressing this by executing tests using the commit ID associated with the ort build within the ACPT image. --------- Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-08-15 12:07:38 -07:00
Changming Sun	8e203efc69	Cleanup cmake file (#17154 ) ### Description 1. Clean up cmake files. Remove some unused code 2. Remove the "Semmle" task from tools/ci_build/github/azure-pipelines/templates/win-ci.yml. Semmle is deprecated and replaced by CodeQL.	2023-08-15 10:51:33 -07:00
Changming Sun	2a22325005	Explicitly set JDK version when building ORT java package (#17147 ) ### Description Explicitly set JDK version when building ORT java package. This is to fix an internal build error.	2023-08-15 10:36:05 -07:00
Adrian Lizarraga	b734db1924	[QNN EP] Fix CI build on Windows x64 pipelines (#17152 ) ### Description - Disables Resize tests that use nearest mode on QNN CPU. - Fixes indentation problems on yaml for win x64 qnn pipeline. ### Motivation and Context The QNN windows Nuget pipeline does not run due to failing unit tests on Windows x64. These tests should not be enabled until we determine the rounding behavior of QNN's ResizeNearestNeighbor operator.	2023-08-14 21:03:14 -07:00

2015 commits