onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-17 01:44:45 +00:00

Author	SHA1	Message	Date
Tianlei Wu	a2b0a69dcc	Update MultiHeadAttention benchmark to test CPU (#20972 ) ### Description The MultiHeadAttention benchmark script currently supports only the CUDA provider. This updates the script to support testing the CPU operator and plotting GPU latency. ### Motivation and Context Benchmark for the upcoming CPU flash attention.	2024-06-12 13:04:25 -07:00
Changming Sun	99f0fe3fae	Fix a few issues in "Zip-Nuget-Java-Nodejs Packaging Pipeline" (#21014 ) ### Description Fix a few issues in the Windows TRT job in "Zip-Nuget-Java-Nodejs Packaging Pipeline": 1. It is a Windows job. It should not use bash (which is usually not available on Windows). 2. When it sets ADO vars, it missed a semicolon. Here is the doc on how to set ADO vars via scripts: https://learn.microsoft.com/en-us/azure/devops/pipelines/process/set-variables-scripts?view=azure-devops&tabs=bash You can see it needs a semicolon. Without the semicolon, the vars will have an extra quotation mark in their values.	2024-06-12 09:44:24 -07:00
Baiju Meswani	94aa21c3dd	Define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR (#21005 ) https://github.com/microsoft/STL/pull/3824 introduces constexpr mutex. An older version of msvcp140.dll will lead to ```A dynamic link library (DLL) initialization routine failed```. This error can be encountered if using conda Python since conda packages msvc dlls and these are older right now. This PR disables the constexpr mutex so that ort package can work with older msvc dlls. Thanks @snnn for the discovery.	2024-06-11 22:23:28 -07:00
Jing Fang	9be30348b9	[CPU EP] Add blocked quantization to QuantizeLinear op kernel (#20977 ) ### Description Add blocked quantization to QuantizeLinear op kernel. If the quantize axis is not the last axis, block the tensor using 1x128 blocks. Blocks are dispatched to multiple threads for concurrent processing. Currently only scalar instructions are supported. If the quantize axis is the last axis, block the tensor using 1 x quant_block_size blocks. Blocks are dispatched to multiple threads for concurrent processing. If the output type is an int type, call the MLAS kernel to use SIMD instructions in each block. #### Benchmark data 20 core 2GHz CPU, RelWithDebInfo config, 196 x 4096 tensor, quantize float to int4x2 Quantize before last axis: * single thread, scalar instruction: 31380900 ns * 8 threads, scalar instruction: 5098620 ns Quantize last axis: * single thread, scalar instruction: 27927900 ns * 8 threads, SIMD instruction: 102261 ns More threads, SIMD instructions, and larger block sizes help. ### Motivation and Context ONNX added blocked quantization to QuantizeLinear in opset 21.	2024-06-11 20:25:28 -07:00
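A minimal pure-Python sketch of what blocked quantization along the last axis computes. It assumes the ONNX QuantizeLinear formula `y = saturate(round(x / scale) + zero_point)` with round-half-to-even, and a signed int4 output range of [-8, 7]. The function name and the list-of-lists layout are illustrative assumptions; the kernel in this commit is C++ with threading and MLAS SIMD paths.

```python
from typing import List


def quantize_blocked_last_axis(
    x: List[List[float]],
    scales: List[List[float]],
    zero_points: List[List[int]],
    block_size: int,
    qmin: int = -8,  # signed int4 lower bound (assumed output type)
    qmax: int = 7,   # signed int4 upper bound
) -> List[List[int]]:
    """Hypothetical helper: blocked QuantizeLinear along the last axis of a 2-D tensor.

    Element x[r][c] shares scales[r][c // block_size] and
    zero_points[r][c // block_size] with the rest of its block.
    """
    result = []
    for r, row in enumerate(x):
        q_row = []
        for c, value in enumerate(row):
            b = c // block_size  # block index along the quantized axis
            # Python's round() rounds half to even, as QuantizeLinear does.
            q = round(value / scales[r][b]) + zero_points[r][b]
            q_row.append(max(qmin, min(qmax, q)))  # saturate to the output range
        result.append(q_row)
    return result


# One row of 4 elements split into two blocks of size 2; 10.0 saturates to 7.
print(quantize_blocked_last_axis([[0.5, 1.0, -2.0, 10.0]], [[0.5, 1.0]], [[0, 0]], block_size=2))
# [[1, 2, -2, 7]]
```

Because every block is independent, blocks can be handed to different threads, which matches the multithreaded speedups reported in the benchmark above.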
Yi Zhang	17d5dc503f	Upgrade ESRP signing task from v2 to v5 (#20995 )	2024-06-12 08:31:53 +08:00
cloudhan	67c8befd1d	test: refactor flash_attn tests to use parameterized (#20913 ) Use `parameterized` to decompose the huge test case. This makes it possible to add ROCm support. --------- Co-authored-by: Guangyun Han <guangyunhan@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>	2024-06-11 15:57:20 -07:00
Tianlei Wu	b3fc9b5a0e	[CUDA] upgrade cutlass to 3.5.0 (#20940 ) ### Description Upgrade cutlass to 3.5 to fix build errors using CUDA 12.4 or 12.5 in Windows - [x] Upgrade cutlass to 3.5.0. - [x] Fix flash attention build error with latest cutlass header files and APIs. This fix is provided by @wangyems. - [x] Update efficient attention to use new cutlass fmha interface. - [x] Patch cutlass to fix `hrsqrt` not found error for sm < 53. - [x] Disable TF32 Staged Accumulation to fix blkq4_fp16_gemm_sm80_test build error for cuda 11.8 to 12.3. - [x] Disable TRT 10 deprecation warnings. The following are not included in this PR: * Replacing the deprecated APIs in the TRT provider. * Fixing the blkq4_fp16_gemm_sm80_test build error for cuda 12.4 or 12.5. This test is not built by default unless you add `--cmake_extra_defines onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON` to the build command. To integrate into rel-1.18.1: either bring in other changes (like onnx 1.16.1), or generate a manifest and upload a new ONNX Runtime Build Time Deps artifact based on rel-1.18.1. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/19891 https://github.com/microsoft/onnxruntime/issues/20924 https://github.com/microsoft/onnxruntime/issues/20953	2024-06-11 13:32:15 -07:00
Yulong Wang	dd805ff77d	[js/web] ESM: use the bundled target as default export (#20991 ) ### Description ESM: use the bundled target as default export In this change, the default import of the following entries: ``` import from 'onnxruntime-web'; import from 'onnxruntime-web/all'; import from 'onnxruntime-web/webgpu'; ``` will use the "bundled" version, which has no dynamic import. This change should only apply to ESM on web.	2024-06-11 11:14:55 -07:00
Jian Chen	05032e5e5f	Updating cudnn from 8 to 9 on existing cuda 12 docker image (#20925 ) ### Description Adding support for cudnn 9 ### Motivation and Context Keep the existing CUDA 12.2 with NVIDIA driver 535	2024-06-11 09:37:16 -07:00
Wanming Lin	043ef5c95f	[WebNN EP] Support latest WebNN softmax op (#20827 ) Latest WebNN softmax supports N-D input and axis parameter.	2024-06-11 08:27:14 -07:00
Changming Sun	ae4a2e6b3f	Publish Build Symbols for DML nightly nuget package (#20988 ) ### Description Publish Build Symbols for DML nightly nuget package.	2024-06-10 17:53:22 -07:00
Changming Sun	dc545d366d	Publish debug symbols for Windows python packages (#20973 ) ### Description 1. Publish debug symbols for Windows python packages. This PR will publish them to ADO. Later on I will also replicate them to Microsoft Symbol Server. 2. Build the packages in Release mode instead of RelWithDebInfo, to be consistent with the other platforms(Linux/macOS/...) ### Motivation and Context To help debug things. Sometimes we found an issue, but we couldn't debug it because we didn't have symbols, and once we rebuilt the package locally the issue was gone. This change would be helpful for such scenarios. Build log: https://aiinfra.visualstudio.com/Lotus/_build?definitionId=841	2024-06-10 12:33:49 -07:00
Changming Sun	92ae60b01f	Revert a cmake change in protobuf_cmake.patch (#20964 ) Avoid patching external projects unless absolutely necessary #20875	2024-06-10 11:20:33 -07:00
Hector Li	007d106b73	Disable inference on CPU if CPU fallback is disabled (#20976 ) ### Description Don't allow model inference on CPU (Ort CPU EP or QNN EP CPU backend) if CPU fallback is disabled.	2024-06-10 09:27:43 -07:00
Hector Li	3c6d409937	Enable Hardsigmoid for QNN EP using direct SDK support (#20956 ) ### Description Enable Hardsigmoid for QNN EP using the QNN SDK's direct support instead of decomposing it into its constituent ops, so that quantized models are supported.	2024-06-10 09:16:25 -07:00
Edward Chen	855c1cffc9	Update comment in cpuid_info.cc (#20974 ) Update comments to indicate that we don't need to set CPUIDInfo::is_armv8_narrow_ld_ on Apple platforms.	2024-06-10 08:52:38 -05:00
wejoncy	bd61ae530b	relax seq len checking in rotary_emb (#20778 ) ### Description The sequence-length check is overly strict for packed batch input. There are two cases for a batch of input_ids. - padded seqs, where all inputs have equal length: ``` |----******| |------------| |--------| |-*********| ``` - packed seqs, where input_ids have different lengths: `|----|---------|----|-|` The max_seq_length comes either from graph_inputs or from the position_ids. In most cases, the rotary_cache is cached in the model with length max_seq_length and shared among all layers. --------- Co-authored-by: kailums <kalu@microsoft.com>	2024-06-08 18:39:06 +08:00
Edward Chen	981893c318	Remove deprecated "mobile" packages (#20941 ) # Description This PR removes the building of the ORT "mobile" packages and much of the associated infrastructure which is no longer needed. Not removed yet - tools/ci_build/github/android/mobile_package.required_operators.config and the helper scripts that depend on it. # Motivation and Context The mobile packages were deprecated in 1.18. Users should use the full packages (Android - onnxruntime-android, iOS - onnxruntime-c/onnxruntime-objc) instead or do a custom build.	2024-06-07 16:20:32 -05:00
Changming Sun	a53f692832	Update c-api-noopenmp-packaging-pipelines.yml: remove CUDA version parameter (#20955 ) ### Description Update c-api-noopenmp-packaging-pipelines.yml: remove CUDA version parameter to reduce confusion. This pipeline is for generating CUDA 11 packages only, not CUDA 12. ### Motivation and Context In the last release we accidentally published CUDA 12 (instead of CUDA 11) packages to nuget.org. We also tried to publish CUDA 12 packages to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly. Luckily it didn't go through because a package with the same version number already existed there. Every time someone runs this pipeline with CUDA version set to 12, the built packages will be published to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly. And GenAI team's build pipelines are based on the nightly packages. So sometimes GenAI team builds their packages with CUDA 12 and sometimes with CUDA 11, which is very random. Therefore, please limit the use of pipeline parameters. Most Azure DevOps yml files are template files. They should use parameters. But the top level yml files should be more careful about that.	2024-06-07 11:19:59 -07:00
Jian Chen	d32adb26f2	Refactor deprecated gradle syntax (#20922 ) To replace the deprecated API. This should be verified with the `Gradle cmakeCheck` step of the `Windows_Packaging_CPU_x64_default` stage in the Zip-Nuget-... pipeline.	2024-06-07 11:08:52 -07:00
ivberg	74028e4bdc	Fully dynamic ETW controlled logging for ORT and QNN logs (#20537 ) ### Description Windows - Fully dynamic ETW controlled logging for ORT and QNN logs The logging support is documented here - https://onnxruntime.ai/docs/performance/tune-performance/logging_tracing.html#tracing---windows - https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html#tracelogging-etw-windows-profiling Also add support for logging ORT SessionCreation on ETW CaptureState ### Motivation and Context The previous ETW support only worked if you enabled ETW before the session started. There can commonly be long-lived AI inference processes that need to be traced & debugged. This enables logging fully on the fly. Without this support a dev would have to end up killing a process or stopping a service in order to get tracing. We had to do this for a recent issue with QNN, and it was a bit painful to get the logs and it ruined the repro. ### Testing I tested with the following cases - Leaving default ORT run - Enabling ETW prior to start and leaving running for entire session + inferences, then stopping - Starting ORT session + inf, then enabling and stopping ETW - Start ORT session /w long running Inferences - wpr -start [ort.wprp](`e6228575e4/ort.wprp (L4)`) -start [etw_provider.wprp](`e6228575e4/onnxruntime/test/platform/windows/logging/etw_provider.wprp`) - Wait a few seconds - wpr -stop ort.etl - Inferences are still running - Verify ONNXRuntimeLogEvent provider events are present and new SessionCreation_CaptureState event under Microsoft.ML.ONNXRuntime provider Related: #18882 #19428	2024-06-06 21:11:14 -07:00
Changming Sun	f8b5c2805e	Update abseil-cpp.cmake: add version check (#20962 ) Some dev environments come with a preinstalled abseil. For example, conda users often do that. If the preinstalled abseil version is incompatible with what we have in cmake/deps.txt, it could result in a hard-to-understand build error. This PR adds a version check to improve that.	2024-06-06 19:42:31 -07:00
Jian Chen	96228c86a0	Adding Job names to jobs without a name (#20961 ) ### Description Adding Job names to jobs without a name ### Motivation and Context This way we will know which job fails CG scan.	2024-06-06 19:09:21 -07:00
Adrian Lizarraga	128bfc0665	[MLAS] Use C-style casting for power vector instructions (#20957 ) ### Description Uses C-style casting for Power vector instructions in `MlasQuantizeLinearInt4Kernel`. ### Motivation and Context Vector commands (e.g., vec_xst) need C-style casting to support various compiler versions. ONNX Runtime CI pipelines do not build with all compiler versions. The recent INT4 PR broke the powerpc build for certain compiler versions because it uses C++-style `static_cast<>`. See: https://github.com/microsoft/onnxruntime/pull/20362#discussion_r1630106164 Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>	2024-06-06 15:11:59 -07:00
Hector Li	05889b33ef	Support loading from model with multiple QNN context binary (#20930 ) ### Description Support loading from a model with multiple QNN context binaries ### Motivation and Context A context binary model generated by QNN EP has only a single QNN context. Because of the QNN PD memory limitation, a large model (>3.5GB) has to be split into 2 smaller models, and a context binary model is then generated from each. Users can load the smaller models with context binaries, but this requires 2 Ort sessions. Users want to glue the split models into 1 (with multiple EPContext nodes) so that they can use 1 Ort session to do the work. QNN EP had a limitation that it only supported loading from a single QNN context binary. This PR removes that limitation to unblock this user scenario. --------- Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>	2024-06-06 14:44:57 -07:00
Wanming Lin	52874f628a	[WebNN EP] Remove some constraints for CPU backend (#20900 ) Following constraints have been supported by WebNN TFLite backend: - Concat: supports up to 4 inputs - Matmul: supports broadcasting - Resize: supports nearest mode - Split: supports up to 4 outputs	2024-06-06 08:22:41 -07:00
Wanming Lin	da1f8f9274	[WebNN EP] TFLite backend only supports limit ranges for Clip (#20863 )	2024-06-06 08:22:18 -07:00
Guenther Schmuelling	c749bd997a	webgpu quickgelu (#20939 )	2024-06-06 08:21:33 -07:00
Chester Liu	5b87544aab	Add conditional check in Get/Set current GPU device id (#20932 ) ### Description Add conditional check in Get/Set current GPU device id ### Motivation and Context Currently with ROCm build, calling `GetCurrentGpuDeviceId` will still try to find CUDA libraries and log the following error message: ```text [E:onnxruntime:, provider_bridge_ort.cc:1836 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1511 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libonnxruntime_providers_cuda.so: cannot open shared object file: No such file or directory ``` This is unnecessary and confusing.	2024-06-06 17:10:14 +08:00
Scott McKay	3ecf48e3b5	Add support for Trilu<bool>. (#20917 ) ### Description Trilu<bool> is used by phi-3 when exported with torch.onnx.export.	2024-06-06 15:21:34 +10:00
Chester Liu	eb2ec66716	Initialize device_id in cuda_call & rocm_call (#20933 ) ### Description Initialize `device_id` with `-1` in `cuda_call` and `rocm_call`. ### Motivation and Context From PyTorch code: `bb2de3b101/c10/cuda/CUDAFunctions.cpp (L217-L324)` If `cudaGetDevice` or `hipGetDevice` fails, an uninitialized `int` produces a random number that changes on each run: ```text [with ERRTYPE = hipError_t; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] HIP failure 101: invalid device ordinal ; GPU=32741 ; hostname=e6724be2a31a ; file=/onnxruntime_src/onnxruntime/core/providers/rocm/rocm_common.h ; line=66 ; expr=hipGetDeviceProperties(&deviceProp, 0); ``` Notice the `GPU` value above. Using `-1` clearly indicates such a failure and avoids confusion.	2024-06-06 11:19:09 +08:00
Adrian Lizarraga	b5eb9e8a8a	[QNN EP] Update to QNN SDK 2.22 (#20628 ) ### Description - Updates pipelines to use QNN SDK 2.22 by default. - Linux QNN pipeline now uses an Ubuntu 22.04 image (required by QNN SDK) - Android QNN pipeline still uses the current Ubuntu 20.04 image. Will update in a separate PR. - Disables QDQ LayerNorm test that triggers QNN's graph finalization error on QNN 2.22 - Increases accuracy tolerance for various HTP tests so that they pass on Windows arm64. ### Motivation and Context Test QNN EP with latest QNN SDK version by default. --------- Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>	2024-06-05 18:25:23 -07:00
Adrian Lizarraga	df28c7d73b	[Quant tool] Improve performance of int4 weight quantization (#20935 ) ### Description - Uses our own quantization functions instead of the ONNX reference implementation of QuantizeLinear when quantizing weights to int4. - Uses a custom function that packs bytes into 4-bit elements. ### Motivation and Context Running the quantization tool to create QDQ models with int4 weights could take up to 7x longer. This PR uses our own quantization and byte packing utilities to improve performance. #### Measurements Model with ~5M parameters to quantize to int4. - Current implementation: 84.5s - Only replace ONNX QuantizeLinear implementation: 50.3s (1.68x speedup) - This PR (replace onnx Q impl, custom packing func): 13.5s (6.26x speedup) --------- Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>	2024-06-05 16:48:40 -07:00
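For readers unfamiliar with int4 storage, here is a minimal sketch of packing signed 4-bit values two per byte. It assumes the ONNX INT4 layout, in which the first element goes in the low nibble and the second in the high nibble, and an odd trailing element is padded with a zero nibble. The helpers `pack_int4` and `unpack_int4` are hypothetical names, not the quantization tool's actual functions.

```python
from typing import List


def pack_int4(values: List[int]) -> bytes:
    """Hypothetical helper: pack signed int4 values (-8..7) two per byte, low nibble first."""
    packed = bytearray()
    for i in range(0, len(values), 2):
        pair = values[i:i + 2]
        for v in pair:
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in int4")
        low = pair[0] & 0x0F                              # two's-complement nibble
        high = (pair[1] & 0x0F) if len(pair) == 2 else 0  # zero-pad an odd tail
        packed.append(low | (high << 4))
    return bytes(packed)


def unpack_int4(data: bytes, count: int) -> List[int]:
    """Hypothetical inverse of pack_int4: recover `count` signed values."""
    out = []
    for i in range(count):
        nibble = (data[i // 2] >> (4 * (i % 2))) & 0x0F
        out.append(nibble - 16 if nibble >= 8 else nibble)  # sign-extend
    return out


packed = pack_int4([1, -1, 7])
print(packed.hex())            # f107
print(unpack_int4(packed, 3))  # [1, -1, 7]
```

A per-element Python loop like this is only for illustration; a speedup such as the one measured above comes from doing the quantization and packing with vectorized array operations rather than element by element.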
Chip Kerchner	4cb23b020c	Improvements to the INT8 GEMM portion of the code for Power (#20595 ) These are changes to improve the GEMM portion of the code for Power. There are 3 main code changes: 1) Change a function to a template parameter so that operations that add/sub zero are eliminated at compile time, and reuse a vector that holds the mask instead of rebuilding it each time. 2) Process 16 columns at a time in MlasGemmQuantCopyPackB8x8; this should reduce potential page faults by a factor of 4 and also be faster. 3) Unroll MlasQgemmStoreVectorMMA and vectorize other variables.	2024-06-05 14:24:22 -07:00
Yufeng Li	63c13a4811	fix integer overflow in Attention (#20921 ) ### Description The offset used in Attention has data type int, which can overflow for large sequence lengths.	2024-06-05 10:19:26 -07:00
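To see why an `int` offset can overflow, consider a hypothetical attention-score buffer laid out as [batch, num_heads, seq_len, seq_len]. The layout and the helper names are illustrative assumptions, not the actual Attention kernel. The sketch computes the element offset exactly with Python's arbitrary-precision ints, then shows the value a 32-bit signed `int` would hold after two's-complement wraparound.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reinterpret an arbitrary-precision int as a 32-bit two's-complement int."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def score_offset(batch: int, head: int, num_heads: int, seq_len: int) -> int:
    """Hypothetical offset of the (batch, head) slice in a [B, N, S, S] score buffer."""
    return (batch * num_heads + head) * seq_len * seq_len


offset = score_offset(batch=1, head=0, num_heads=32, seq_len=8192)
print(offset, offset > INT32_MAX)  # 2147483648 True
print(wrap_int32(offset))          # -2147483648: what a 32-bit int would hold
```

Already at the second batch entry with 32 heads and an 8192-token sequence, the offset reaches 2^31 and no longer fits in a 32-bit signed integer. Doing the offset arithmetic in a 64-bit integer type avoids the wraparound.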
Yueqing Zhang	b374ddd704	[VitisAI] add new api for models (#20899 ) ### Description Add new APIs. ### Motivation and Context This change is required to satisfy a Microsoft requirement. --------- Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>	2024-06-04 22:48:04 -07:00
Jing Fang	3ecb012337	[CPU EP] Add blocked quantization to DequantizeLinear op kernel (#20901 ) ### Description Added blocked quantization to DequantizeLinear op kernel. All existing [input types and output types](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftdequantizelinear) are supported. All axes are supported. The implementation in the PR is naive - single thread and scalar instructions. Multi-threading and vector instructions are planned in the future based on the needs. ### Motivation and Context onnx introduced blocked quantization in opset 21 for [DequantizeLinear](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftdequantizelinear). This PR adds the spec support in onnx runtime.	2024-06-04 14:44:40 -07:00
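The computation behind the naive kernel described above can be sketched for a 2-D tensor. This assumes the ONNX DequantizeLinear definition `y = (x - zero_point) * scale`, where each block of `block_size` consecutive elements along `axis` shares one scale and zero point. The function is a hypothetical single-threaded, scalar illustration in Python, not the ORT C++ kernel.

```python
from typing import List


def dequantize_blocked_2d(
    q: List[List[int]],
    scales: List[List[float]],
    zero_points: List[List[int]],
    axis: int,
    block_size: int,
) -> List[List[float]]:
    """Hypothetical blocked DequantizeLinear for a 2-D tensor.

    For axis=0, scales has shape [ceil(rows / block_size), cols];
    for axis=1, scales has shape [rows, ceil(cols / block_size)].
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1 for a 2-D tensor")
    rows, cols = len(q), len(q[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Map the element to the scale/zero-point entry of its block.
            sr, sc = (r // block_size, c) if axis == 0 else (r, c // block_size)
            out[r][c] = (q[r][c] - zero_points[sr][sc]) * scales[sr][sc]
    return out


# 3 rows blocked along axis 0 with block_size=2 (the last block is partial).
print(dequantize_blocked_2d(
    [[2, 4], [6, 8], [1, 1]],
    scales=[[0.5, 1.0], [2.0, 3.0]],
    zero_points=[[0, 0], [1, 1]],
    axis=0,
    block_size=2,
))  # [[1.0, 4.0], [3.0, 8.0], [0.0, 0.0]]
```

Supporting "all axes" amounts to changing which index is divided by `block_size` when looking up the scale, as the `axis` branch above shows for the 2-D case.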
Jian Chen	5faeaf6437	Remove failOnStderr from Gradle cmakeCheck (#20919 ) ### Description Remove failOnStderr from Gradle cmakeCheck ### Motivation and Context Gradle is still using the deprecated API	2024-06-04 13:54:49 -07:00
Tianlei Wu	6dfdef7782	update stable diffusion demo requirements (#20914 ) ### Description Update docker and package version for stable diffusion demo. ### Motivation and Context Update onnx to 1.16 for security	2024-06-04 12:08:04 -07:00
liqun Fu	51bc53580d	Update to onnx 1.16.1 (#20702 )	2024-06-04 11:06:28 -07:00
Changming Sun	3dd6fcc089	Upgrade min ios version to 13.0 (#20773 ) To align with Office and other MS products. Office's support policy is: "Office for iPad and iPhone is supported on the two most recent versions of iOS and iPadOS. When a new version of iOS or iPadOS is released, the Office Operating System requirement becomes the two most recent versions: the new version of iOS or iPadOS and the previous version." (from https://products.office.com/office-system-requirements) The latest iOS version is 17. So they support both 17 and 16. Here I set our min iOS version to 13 so that it will be a superset of what Office supports. This change would allow us to use C++17's std::filesystem feature in the core framework. The modifications were generated by running ```bash find . -type f -exec sed -i "s/apple_deploy_target[ =]12.0/apple_deploy_target=13.0/g" {} \; ``` We cannot use 15.0 because iOS packaging would otherwise fail with: ``` /Users/runner/work/1/b/apple_framework/intermediates/iphoneos_arm64/Release/_deps/coremltools-src/mlmodel/src/MILBlob/Util/Span.hpp:288:9: error: cannot use 'throw' with exceptions disabled MILVerifyIsTrue(index < Size(), std::range_error, "index out of bounds"); ``` The Google OSS libraries we use only officially support iOS 15+.	2024-06-04 10:15:20 -07:00
Yi Zhang	c5087b9b58	Improve stable diffusion image parity test stability (#20904 ) ### Description 1. Add one image to the whitelist; if that image is hit, the pipeline status is set to warning. 2. Adjust the image parity test tolerance. ### Motivation and Context Improve pipeline stability	2024-06-04 10:19:32 +08:00
zhijiang	3c561c8b26	fix bug (#20694 ) When the number of elements in a tensor is larger than 2^32, use cuda_long as the dtype of the offset.	2024-06-04 09:22:10 +08:00
Caroline Zhu	94ce1209f9	Bug fix for gather fusion with on-device training (#20891 ) ### Description Update the initializer that's added in GatherSliceToSplitFusion to use the GenerateNodeArgName function, rather than the GenerateNodeName function. GenerateNodeName goes through all the nodes in the graph to see if the given name is already used and generates a unique one if it has been used. GenerateNodeArgName iterates through all the node args in the graph to see if the given name is already used. ### Motivation and Context * on-device training goes through a generate artifacts step, where optimizations are applied, then, when the training artifact is loaded, additional optimizations are applied. In the first round of optimizations, a "splits" initializer is added for phi-3. With the second round of optimizations, another "splits" initializer with different dimensions and data is added. Since we call GenerateNodeName func, the first splits initializer isn't found, causing a type error where it claims the shape of splits does not match the TensorProto shape.	2024-06-03 14:41:39 -07:00
Jian Chen	456ab09d17	Component Governance fix round 5 (#20905 ) ### Description Add $(Build.SourcesDirectory)/cmake/external/onnx/third_party to cover the case where only a single repo is checked out ### Motivation and Context Fix CG issue https://aiinfra.visualstudio.com/Lotus/_componentGovernance/97926/alert/8862110?typeId=16576846	2024-06-03 14:22:22 -07:00
Wanming Lin	9c6481fa2d	[WebNN EP] Enable ArgMax and ArgMin for CPU backend (#20865 ) The WebNN TFLite backend supports ArgMax and ArgMin, but only when 'select_last_index' is 0.	2024-06-03 14:12:11 -07:00
Wanming Lin	c128132dd8	[WebNN EP] TFLite backend only supports Elu with default alpha (#20862 )	2024-06-03 14:10:22 -07:00
Jian Chen	ae8df4db8f	Split java's gradle build and test (#20817 ) ### Description This PR allows `./gradlew cmakeCheck` to fail in the Windows_Packaging_(CUDA|TensorRT) job. This way, the job still generates all the jar and pom files needed by later stages, while `./gradlew cmakeCheck` is run again in the Windows_Packaging_(CUDA|TensorRT)_Testing stage. ### Motivation and Context Reduce the time of all Java packaging stages by 30+ min.	2024-06-03 14:08:45 -07:00
Yulong Wang	ab9f153746	[js/web] allow build target for non dynamic import (#20898 ) ### Description This PR allows building ORT web to `ort{.all|.webgpu}.bundle.min.mjs`, which does not have any dynamic import. This makes it possible to use ORT web via static import in a service worker. Fixes #20876	2024-06-03 12:33:37 -07:00
Changming Sun	d13cabf7f9	Upgrade GCC and remove the dependency on GCC8's experimental std::filesystem implementation (#20893 ) ### Description This PR upgrades CUDA 11 build pipelines' GCC version from 8 to 11. ### Motivation and Context GCC8 has an experimental std::filesystem implementation which is not ABI compatible with the formal one in later GCC releases. It didn't cause trouble for us; however, the ONNX community has run into this issue a lot. For example, https://github.com/onnx/onnx/issues/6047 . So this PR increases the minimum supported GCC version from 8 to 9, and removes the references to GCC's "stdc++fs" library. Please note we compile our code on RHEL8 and RHEL8's libstdc++ doesn't have the fs library, which means the binaries in ONNX Runtime's official packages always statically link to the fs library. It is just a matter of which version of the library, an experimental one or a more mature one. And it is an implementation detail that is not visible from outside. Anyway, a newer GCC is better. It will give us the chance to use many C++20 features. #### Why were we using GCC 8? It is because all our Linux packages were built on RHEL8 or its equivalents. The default GCC version in RHEL8 is 8. RHEL also provides additional GCC versions from RH devtoolset. UBI8 is the abbreviation of Red Hat Universal Base Image 8, which is the containerized RHEL8. UBI8 is free, which means it doesn't require a subscription (while RHEL does). The only devtoolset that UBI8 provides is GCC 12, which is too new to be used with CUDA 11.8. And our CUDA 11.8's build env is a docker image from Nvidia that is based on UBI8. #### How the problem is solved Almalinux is an alternative to RHEL. Almalinux 8 provides GCC 11. And the CUDA 11.8 docker image from Nvidia is open source, which means we can rebuild the image based on Almalinux 8 to get GCC 11. I've done this, but I cannot republish the new image due to various complicated license restrictions. Therefore, I put them at an internal location in onnxruntimebuildcache.azurecr.io.	2024-06-03 10:14:08 -07:00


11190 commits