onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
Chester Liu	6794dfd941	[QNN EP] Improve QNN error reporting using the error message (#21458 ) ### Description Massively improve the QNN error reporting by invoking `QnnError_getMessage` and returning the error message. ### Motivation and Context Example error message before this change: ```text QNN SetupBackend failed Failed to create device. Error: 14001 ``` After: ```text QNN SetupBackend failed Failed to create device. Error: QNN_DEVICE_ERROR_INVALID_CONFIG: Invalid config values ```	2024-07-23 22:41:09 -07:00
Wanming Lin	0274008b6b	[WebNN EP] ConvTranspose should calculate the pads or output shape (#21292 ) This PR adds the missing pads and output shape calculation for ConvTranspose. Per ONNX spec: - If the output shape is explicitly provided, compute the pads. - Otherwise compute the output shape, as well as the pads if the auto_pad attribute is SAME_UPPER/SAME_LOWER.	2024-07-23 18:51:49 -07:00
Scott McKay	1df9aa2f08	CoreML: Add GridSample ML Program support (#21431 ) ### Description Add GridSample ML Program support. One combination of inputs has diffs between the PyTorch-generated unit test data and CoreML. Disabling until needed, as investigation may take a while. ### Motivation and Context High-priority models.	2024-07-24 11:04:48 +10:00
mingyueliuh	86cedc6832	[Fix] C++ API SetOutputShape for register custom op. (#21366 ) ### Description Bug fix for the SetOutputShape method in custom op shape inference. ### Motivation and Context - Bug a: An obvious bug that will cause all dimensions to be 1. https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_cxx_inline.h#L2014 integer_dims.push_back(dim.IsInt()); -> integer_dims.push_back(dim.AsInt()); - Bug b: Vector out-of-range error; the op's input may be a scalar, in which case the shape is empty. https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_cxx_inline.h#L1985 --------- Co-authored-by: mingyue <mingyue@amd.com>	2024-07-23 16:51:00 -07:00
George Wu	c65afcea55	fix python qnn pipelines issues (#21462 ) build_py_params wasn't plumbed through for python qnn pipelines. incorporate fixes for deprecated numpy version option from https://github.com/microsoft/onnxruntime/pull/21459	2024-07-23 15:54:44 -07:00
Tianlei Wu	2b7e2a5bd0	[CUDA] Fix cuda provider fallback inconsistency (#21425 ) * Fix fallback setting (cuda still falls back to cuda). * Fix cuda provider fallback inconsistent with/without CUDA_PATH environment variable. * Add cuda and cudnn major version requirement in error message. Example result in Windows: ``` >>> import onnxruntime >>> ort_session = onnxruntime.InferenceSession("model.onnx", providers=['CUDAExecutionProvider', 'CPUExecutionProvider']) 2024-07-19 17:43:44.2260019 [E:onnxruntime:Default, provider_bridge_ort.cc:1972 onnxruntime::TryGetProviderInfo_CUDA] D:\onnxruntime\onnxruntime\core\session\provider_bridge_ort.cc:1636 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\.conda\envs\py310\lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll" 2024-07-19 17:43:44.2312351 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:970 onnxruntime::python::CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12., and the latest MSVC runtime. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported. 
>>> ort_session <onnxruntime.capi.onnxruntime_inference_collection.InferenceSession object at 0x0000016BB2DF7D60> >>> ort_session.get_providers() ['CPUExecutionProvider'] ``` Example result in Linux: ``` >>> import onnxruntime >>> ort_session = onnxruntime.InferenceSession("resnet50-v2-7.onnx", providers=['CUDAExecutionProvider', 'CPUExecutionProvider']) 2024-07-20 20:33:26.486974543 [E:onnxruntime:Default, provider_bridge_ort.cc:1972 TryGetProviderInfo_CUDA] /work/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1636 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.12: cannot open shared object file: No such file or directory 2024-07-20 20:33:26.487034646 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:961 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9. and CUDA 12.*. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported. >>> ort_session.get_providers() ['CPUExecutionProvider'] ``` ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/21424	2024-07-23 11:58:04 -07:00
Changming Sun	7af39c6955	Update nodejs's cmake file to fix a file copy issue (#21390 ) Commit `e5f18ba2c1` caused some nightly pipelines to fail. This PR fixes it. The failure occurred because I recently changed our Linux library's SONAME. At runtime onnxruntime_binding depends on libonnxruntime.so.1 instead of libonnxruntime.so.1.19.0 (with the full version number). Therefore we need to keep the libonnxruntime.so.1 symlink. The packaging script tools/ci_build/github/js/pack-npm-packages.ps1 still needs to be updated. I will address it in another PR.	2024-07-23 11:03:55 -07:00
Changming Sun	f70215d4e6	Update C++ dependencies (#21410 ) 1. Update google benchmark from 1.8.3 to 1.8.5 2. Update google test from commit in main branch to tag 1.15.0 3. Update pybind11 from 2.12.0 to 2.13.1 4. Update pytorch cpuinfo to include the support for Arm Neoverse V2, Cortex X4, A720 and A520. 5. Update re2 from 2024-05-01 to 2024-07-02 6. Update cmake to 3.30.1 7. Update Linux docker images 8. Fix a warning in test/perftest/ort_test_session.cc:826:37: error: implicit conversion loses integer precision: 'streamoff' (aka 'long long') to 'const std::streamsize' (aka 'const long') [-Werror,-Wshorten-64-to-32]	2024-07-23 10:00:36 -07:00
Scott McKay	0f1f3b7705	CoreML: ML Program Slice (#21433 ) ### Description Add support for Slice ### Motivation and Context High priority models.	2024-07-23 20:21:55 +10:00
Sheil Kumar	dd010edb37	Update DirectML from 1.14.1 to 1.15.0 (#21323 ) Update DirectML from 1.14.1 to 1.15.0 --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2024-07-22 16:59:03 -07:00
Prathik Rao	11ad299451	Adds ATen fallback for scaled_dot_product_attention (#21107 ) ### Description Introduces an ATen fallback for `torch.nn.functional.scaled_dot_product_attention`. This operator was introduced in torch 2.0 and, since then, has had many updates including the implementation of memory efficient attention for V100 machines. The current torchscript exporter exports a subgraph for attention which does not provide the same memory savings that PyTorch's memory efficient attention kernel provides. Allowing fallback to PyTorch ATen op for attention helps mitigate memory spike issues for models leveraging memory efficient attention. ### Motivation and Context Memory issues arose when integrating ONNX Runtime Training with AML Stable Diffusion. --------- Co-authored-by: root <prathikrao@microsoft.com>	2024-07-22 16:37:04 -07:00
mindest	5b9369e93c	Fix typos according to reviewdog report. (#21335 ) ### Description Fix typos based on reviewdog report but with some exceptions/corrections.	2024-07-22 13:37:32 -07:00
Jian Chen	4e75605eec	Replace inline pip install with pip install from requirements.txt (#21106 ) ### Description Replace inline pip install with pip install from requirements.txt ### Motivation and Context so that CG can recognize ### Dependency - [x] https://github.com/microsoft/onnxruntime/pull/21085	2024-07-22 12:39:10 -07:00
Wanming Lin	17e9ea6235	[WebNN EP] Add outputDataType option for the ArgMax/ArgMin ops (#21385 ) ### Description The WebNN spec introduces a new option, `outputDataType`, for the `argMax` and `argMin` ops. Its default value is `int32`; we should explicitly set it to `int64` for the WebNN EP. Spec CR: "Add outputDataType to argmin/argmax" https://github.com/webmachinelearning/webnn/pull/730	2024-07-22 11:56:09 -07:00
Tianlei Wu	a6c5e2cd20	[CUDA] FusedMHARunnerFP16v2 thread-safe (#21420 ) ### Description - [x] Rewrite FusedMHARunnerFP16v2 to make it thread-safe. - [x] Add multi-threading tests Previously, the kernel parameters params is stored as a member of mha runner, which means that different threads might change the params at the same time and impacts the other threads. For example, if batch_size and seq_len was changed by another thread to larger values in setup(...), buffer overrun might happen in run(...) because a kernel could read/write memory out of range of allocated buffers. In new implementation, I change the api and remove mutable member variables to make it thread safe. Below is summary of change: Before: ``` class FusedMHARunnerFP16v2::mhaImpl { void setup(int seq_len, int batch_size) { // change scalar params } void run(input, output) { // change params for input and output pointers // launch kernel using params } Fused_multihead_attention_params_v2 params; // mutable, not thread-safe } ``` After: ``` class FusedMHARunnerFP16v2::FmhaImpl { void setup(int seq_len, int batch_size, Fused_multihead_attention_params_v2& params) { // change params } void run(params, input, output) { // change params with input and output pointers // launch kernel using params } } ``` ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/18854 https://github.com/microsoft/onnxruntime/issues/21413	2024-07-22 10:41:08 -07:00
Jing Fang	11bf309736	add transform part of the dq matmul tool chain (#21374 ) ### Description This is a partial change from [fajin/qdqmatmulnbitstoolchain](https://github.com/microsoft/onnxruntime/pull/21180). The original PR is blocked by Web CI failures. MatMulNBits is a heavily optimized matmul operation. Currently a MatMul can be converted to MatMulNBits to speed up the model inference. However, MatMulNBits is an ORT-only op. To make the graph compatible with ONNX ops and utilize MatMulNBits at the same time, we introduce Q/DQ support for MatMulNBits. To convert MatMul ops in a model to MatMulNBits: 1. Use matmul_4bits_quantizer.py to convert MatMul to DQ + MatMul using QDQ mode. 2. In the ORT session, DQ + MatMul is fused to MatMulNBits. #### Note MatMulNBits assumes the B weight is uint4. When no zp is provided, zp defaults to 8, which is different from DQ: DQ defaults zp to 0 when no zp is provided, and DQ supports int4. Therefore some conversions are introduced during the DQ + MatMul --> MatMulNBits step. #### Perf Using the QDQ format increases the model initialization time and memory consumption. With the current implementation, model init time increased from ~4s to ~9s, and memory consumption increased from ~2.8GB to ~4.8GB. The memory increase is due to: 1. In the optimizer, after transposing the B weight, an in-memory tensor proto is created using protobuf's arena. 2. In the finalize step, when saving initializers and prepacking, the ORT arena is used to create buffers for initializers. The memory allocated by arenas cannot be fully deallocated. If ORT arena memory allocation is disabled, the memory consumption of both the QDQ format and the original format is ~2.2GB. The time increase is mainly due to multiple memory copies, but can be further optimized. ### Motivation and Context Please see description for details.	2024-07-19 22:55:15 -07:00
Maximilian Müller	5bec52203d	[TensorRT] Enable refitting an embedded engine when provided as byte stream (#21357 ) ### Description This allows refitting an engine using an ONNX file not available on disk. This is important for encrypted ONNX files on disk.	2024-07-19 21:11:04 -07:00
Scott McKay	34cd2e8ed8	Add CoreML ML Program Resize (#21370 ) ### Description Add CoreML ML Program Resize - refactor existing logic to try and simplify and share between NeuralNetwork and MLProgram checks - add handling for some new attributes - antialias and axes - should have been done when setting the CoreML EP max opset to 21 ### Motivation and Context Support priority models	2024-07-20 09:35:05 +10:00
Tianlei Wu	6ffaaebb60	[CUDA] Attention kernel provider option (#21344 ) ### Description * Add a cuda provider option `sdpa_kernel` to choose which attention kernel to run for testing purpose. * Allow dump which attention kernel is used per node. * Reserve a flag for cudnn flash attention which will be added soon. #### CUDA provider option sdpa_kernel Instead of setting environment variable, we also support setting it in provider option. Note that the setting is global per session. That could help performance testing of each kernel. #### Attention Kernel Debug Info Set an environment variable `ORT_ENABLE_ATTENTION_KERNEL_DEBUG_INFO=1`, and ORT will print sdpa kernel used in each node: For example ``` ORT_ENABLE_ATTENTION_KERNEL_DEBUG_INFO=1 ./onnxruntime_test_all --gtest_filter=MultiHeadAttentionTest* ``` It will show debug information of kernel used in testing: ``` [ RUN ] MultiHeadAttentionTest.SelfAttention_Batch2_HeadSize32_NoBias_NoMask_PackedQKV AttentionKernelOptions: FLASH_ATTENTION=0 EFFICIENT_ATTENTION=0 TRT_FUSED_ATTENTION=1 CUDNN_FLASH_ATTENTION=0 TRT_FLASH_ATTENTION=1 TRT_CROSS_ATTENTION=0 TRT_CAUSAL_ATTENTION=0 MATH=1 Operator=MultiHeadAttention Node=node1 DataType=fp16 TRT_FUSED_ATTENTION=1 AttentionKernelOptions: FLASH_ATTENTION=0 EFFICIENT_ATTENTION=1 TRT_FUSED_ATTENTION=0 CUDNN_FLASH_ATTENTION=0 TRT_FLASH_ATTENTION=0 TRT_CROSS_ATTENTION=0 TRT_CAUSAL_ATTENTION=0 MATH=1 Operator=MultiHeadAttention Node=node1 DataType=fp16 EFFICIENT_ATTENTION=1 ``` In this test case, the debug info shows that one session uses trt fused attention and another session use efficient attention.	2024-07-19 13:58:54 -07:00
Yulong Wang	01df8c787d	[js/web] fix vulnerable version of dependencies (#21412 ) ### Description ``` # npm audit report socket.io 3.0.0 - 4.6.2 Severity: high socket.io has an unhandled 'error' event - https://github.com/advisories/GHSA-25hc-qcg6-38wj Depends on vulnerable versions of engine.io fix available via `npm audit fix` node_modules/socket.io ws 8.0.0 - 8.17.0 Severity: high ws affected by a DoS when handling a request with many HTTP headers - https://github.com/advisories/GHSA-3h5v-q93c-6h6q fix available via `npm audit fix` node_modules/ws engine.io 0.7.8 - 0.7.9 \|\| 6.0.0 - 6.5.4 Depends on vulnerable versions of ws node_modules/engine.io socket.io-adapter 2.5.2 - 2.5.4 Depends on vulnerable versions of ws node_modules/socket.io-adapter 4 high severity vulnerabilities ```	2024-07-19 11:11:30 -07:00
Adrian Lizarraga	22d4d82f3c	Move ReluQuantFusion to Level2 for CPU EP only (#21329 ) ### Description Moves the `Relu -> QuantizeLinear` fusion to Level2 optimizations for CPU EP only. ### Motivation and Context See the related PR for motivation and context: https://github.com/microsoft/onnxruntime/pull/20627	2024-07-19 08:36:47 -07:00
glen-amd	cc4049af83	Enabled more VitisAI backend compilers (#21411 ) ### Description Enabled more VitisAI backend compilers	2024-07-19 08:34:03 -07:00
Changming Sun	9140d9b1ff	Update azure-kusto-data and azure-kusto-ingest (#21409 ) A vulnerability has been found in the Kusto SDK. We need to update it to latest to address a security alert.	2024-07-18 14:26:26 -07:00
Edward Chen	05fc0c60ca	[MLAS][AArch64] SQNBitGemm CompInt8 - Use 4x2 tiles (#21380 ) Update SQNBitGemm ARM NEON kernel to compute 4x2 tile of output. Note: Also tried 2x4 and 4x4 tiles but observed the best microbenchmark results with 4x2 tiles.	2024-07-18 13:37:29 -07:00
Frank Dong	92f66de702	remove llama 70b (#21396 ) Remove the llama 70b model for security reasons. We need to add shard code in HF to enable model sharding for llama-70b. This code is not merged into the main branch, as HF forks want a more general solution instead of sharding for a specific model. The shared code is kept here: https://github.com/frank-dong-ms/transformers/tree/frdong/shard_llama We kept the llama-70b related code here for internal use: https://github.com/frank-dong-ms/onnxruntime/tree/frdong/llama_70b	2024-07-18 12:12:10 -07:00
Yifan Li	bb76ead96c	[TensorRT EP] support TensorRT 10.2-GA (#21395 ) ### Description * promote trt version to 10.2.0.19 * EP_Perf CI: clean config of legacy TRT<8.6, promote test env to trt10.2-cu118/cu125 * skip two tests as Float8/BF16 are supported by TRT>10.0 but TRT CIs are not hardware-compatible on these: ``` 1: [ FAILED ] 2 tests, listed below: 1: [ FAILED ] IsInfTest.test_isinf_bfloat16 1: [ FAILED ] IsInfTest.test_Float8E4M3FN ``` ### Motivation and Context	2024-07-18 12:11:52 -07:00
kailums	1b38c05544	change ci docker image to rocm6.1 (#21296 ) ### Description There is a bug in kernels running on rocm6.0, so change the CI docker image to rocm6.1. For the torch installed in the docker image, change to the rocm repo when it is not version 6.0. ### Motivation and Context	2024-07-18 14:50:01 +08:00
Ranjit Ranjan	6c7562b097	Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133 ) ### Description Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. ### Motivation and Context Changes in this PR contain: 1. Enablement code for building onnxruntime on the AIX operating system. 2. While testing the build on AIX, we found issues related to the big-endian platform. More details about a few of those issues can be found in [Big endian issue: Graph Transformation Attention Fusion tests are failing #12921](https://github.com/microsoft/onnxruntime/issues/12921) Below is the list of files and a description of each change. 1. cmake/CMakeLists.txt [BUILDING on AIX issue] check for "IBMClang" is added for handling -Wno-unused-parameter 2. cmake/external/onnxruntime_external_deps.cmake [BUILDING on AIX issue] Enabling gtest_disable_pthreads for AIX 3. cmake/onnxruntime.cmake [BUILDING on AIX issue] o Blocking codes for AIX which generates generated_source.c and further requires some symbol files. o Putting NO AIX check for non-supported linker flags like --Xlinker o iconv linking 4. cmake/onnxruntime_framework.cmake [BUILDING on AIX issue] Putting NO AIX check for -Wl,-rpath='$ORIGIN' 5. cmake/onnxruntime_mlas.cmake [BUILDING on AIX issue] POWER10 related macro/function definition. 6. cmake/onnxruntime_providers_cpu.cmake [BUILDING on AIX issue] Putting NO AIX check for non-supported linker flags like --Xlinker 7. cmake/onnxruntime_unittests.cmake [BUILDING on AIX issue] o Putting NO AIX check for non-supported linker flags like --Xlinker o Adding required libraries for AIX linker under applications like onnxruntime_shared_lib_test, onnxruntime_logging_apis etc 8. cmake/patches/flatbuffers/flatbuffers.patch [BUILDING on AIX issue] Handling of TypeCode in include/flatbuffers/flatbuffers.h under AIX + clang 9. 
onnxruntime/contrib_ops/cpu/murmur_hash3.cc [Big endian issue] Byte-conversion handling in compute() and getblock() routines 10. onnxruntime/contrib_ops/cpu/quantization/matmul_nbits_impl.cc [Big endian issue] Handling of test failures. Byte swapping for quant_value. 11. onnxruntime/core/framework/tensorprotoutils.cc [Big endian issue] Implementation of SetRawDataInTensorProto, ConvertRawDataInTensorProto. o SetRawDataInTensorProto: Wrapper for set_raw_data(). Calls ConvertRawDataInTensorProto() on big-endian systems o ConvertRawDataInTensorProto: function used mainly on big-endian systems for byte-swapping of tensor raw_data 12. onnxruntime/core/framework/tensorprotoutils.h [Big endian issue] Declaration of SetRawDataInTensorProto, ConvertRawDataInTensorProto 13. onnxruntime/core/graph/graph.cc [Big endian issue] o Call ConvertRawDataInTensorProto for SPARSE_TENSOR type o Call ConvertRawDataInTensorProto for SaveToOrtFormat 14. onnxruntime/core/mlas/lib/platform.cpp [BUILDING on AIX issue] POWER10 released enablement for AIX 15. onnxruntime/core/mlas/lib/power/qgemm_kernel_power10.cpp [BUILDING on AIX issue] Handling of __vector under AIX+clang 16. onnxruntime/core/mlas/lib/qgemm.h [BUILDING on AIX issue] Adding _AIX flag 17. onnxruntime/core/mlas/lib/qlmul.cpp [BUILDING on AIX issue] Handling of __vector under AIX+clang 18. onnxruntime/core/optimizer/attention_fusion.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 19. onnxruntime/core/optimizer/compute_optimizer/shared_utils.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 20. onnxruntime/core/optimizer/constant_folding.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 21. onnxruntime/core/optimizer/embed_layer_norm_fusion.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 22. 
onnxruntime/core/optimizer/nchwc_transformer.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 23. onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 24. onnxruntime/core/optimizer/qdq_transformer/qdq_s8_to_u8.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 25. onnxruntime/core/optimizer/qdq_transformer/s8_to_u8.h [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 26. onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_actions.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 27. onnxruntime/core/optimizer/reshape_fusion.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 28. onnxruntime/core/optimizer/stft_decomposition.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 29. onnxruntime/core/optimizer/transpose_optimization/ort_optimizer_api_impl.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 30. onnxruntime/core/platform/path_lib.h [BUILDING on AIX issue] Moving to normal function call, instead of template 31. onnxruntime/core/platform/posix/env.cc [BUILDING on AIX issue]Blocking syscall.h in AIX 32. onnxruntime/core/session/inference_session.cc [Big endian issue] Removing ORT_RETURN_IF_NOT, FLATBUFFERS_LITTLEENDIAN 33. onnxruntime/test/flatbuffers/flatbuffer_utils_test.cc [Big endian issue] Call ConvertRawDataInTensorProto in CreateInitializer and ExternalWriteReadWithLoadInitializers 34. onnxruntime/test/framework/sparse_kernels_test.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 35. onnxruntime/test/framework/tensorutils_test.cc [Big endian issue] Helper method ConvertEndianessForVector and call this from required place. 36. 
onnxruntime/test/framework/test_tensor_loader.cc o. [BUILDING on AIX issue] Handling of getcwd for AIX o. [Big endian issue] Bytes Swapping in run_external_data_test 37. onnxruntime/test/onnx/main.cc [Big endian issue] including <thread> for AIX 38. onnxruntime/test/onnx/tensorprotoutils.cc [Big endian issue] Bytes swapping in UnpackTensorWithRawData 39. onnxruntime/test/optimizer/graph_transform_test.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 40. onnxruntime/test/optimizer/graph_transform_test_builder.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 41. onnxruntime/test/optimizer/graph_transform_test_builder.h [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 42. onnxruntime/test/optimizer/initializer_test.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 43. onnxruntime/test/optimizer/nchwc_optimizer_test.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 44. onnxruntime/test/providers/base_tester.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data 45. onnxruntime/test/providers/cpu/generator/random_test.cc [BUILDING on AIX issue] Adding AIX check in MultinomialGoodCase --------- Co-authored-by: Vamshikrishna Thatikonda <vamshikrishna@in.ibm.com>	2024-07-17 12:37:06 -07:00
Tianlei Wu	0f4c39ec47	[ROCM] adjust test_flash_attn_rocm test tolerance (#21379 ) The test_flash_attn_rocm.py from https://github.com/microsoft/onnxruntime/pull/21032 failed frequently. For example, I saw two failed jobs today: E Max absolute difference: 0.002167 E Max absolute difference: 0.002686 Adjust the abs threshold from 0.002 to 0.005, and use default relative tolerance rtol=0.001.	2024-07-17 07:35:12 -07:00
vraspar	fa287042ca	Add ML Program support for transpose op (#21364 ) ### Description Add support for transpose op ### Motivation and Context Enable support for Autodesk model	2024-07-16 16:34:58 -07:00
Tianlei Wu	760a31c848	Exclude blkq4_fp16_gemm_sm80_test in cuda 12.5 build (#21373 ) There are build errors when building with CUDA 12.5 and `--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON`. Temporarily exclude blkq4_fp16_gemm_sm80_test to unblock the CUDA 12.5 build.	2024-07-16 15:58:11 -07:00
Yueqing Zhang	dcc04367b7	[VitisAI] fix graph save (#21293 ) ### Description Revert the wrong change in https://github.com/microsoft/onnxruntime/pull/20920 ### Motivation and Context It would save the data at a wrong position	2024-07-16 13:48:29 -07:00
Jing Fang	5df4ddd1c3	matmul 4bit tool chain support qdq (#21362 ) ### Description This is a partial change ported from fajin/qdqmatmulnbitstoolchain. That branch has issues resolving the web CI. MatMulNBits is a heavily optimized matmul operation. Currently a MatMul can be converted to MatMulNBits to speed up the model inference. However, MatMulNBits is an ORT-only op. To make the graph compatible with ONNX ops and utilize MatMulNBits at the same time, we introduce Q/DQ support for MatMulNBits. To convert MatMul ops in a model to MatMulNBits: 1. Use matmul_4bits_quantizer.py to convert MatMul to DQ + MatMul using QDQ mode. 2. In the ORT session, DQ + MatMul is fused to MatMulNBits. #### Note MatMulNBits assumes the B weight is uint4. When no zp is provided, zp defaults to 8, which is different from DQ: DQ defaults zp to 0 when no zp is provided, and DQ supports int4. Therefore some conversions are introduced during the DQ + MatMul --> MatMulNBits step. #### Perf Using the QDQ format increases the model initialization time and memory consumption. With the current implementation, model init time increased from ~4s to ~9s, and memory consumption increased from ~2.8GB to ~4.8GB. The memory increase is due to: 1. In the optimizer, after transposing the B weight, an in-memory tensor proto is created using protobuf's arena. 2. In the finalize step, when saving initializers and prepacking, the ORT arena is used to create buffers for initializers. The memory allocated by arenas cannot be fully deallocated. If ORT arena memory allocation is disabled, the memory consumption of both the QDQ format and the original format is ~2.2GB. The time increase is mainly due to multiple memory copies, but can be further optimized. ### Motivation and Context Please see description for details.	2024-07-16 10:34:19 -07:00
Changming Sun	8568a67673	Fix a build error when CUDA is enabled and onnxruntime_DISABLE_CONTRIB_OPS is ON (#21285 ) Resolve #21204 To reproduce the issue, build the code with ``` python3 tools/ci_build/build.py --build_dir /tmp/build13 --config Debug --skip_submodule_sync --build_shared_lib --parallel --use_binskim_compliant_compile_flags --build_csharp --enable_onnx_tests --update --build --build_wheel --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda --cmake_extra_defines onnxruntime_DISABLE_CONTRIB_OPS=ON onnxruntime_BUILD_UNIT_TESTS=OFF --skip_tests ``` Then run the following python script: ```python #!/usr/bin/python3 import onnxruntime as ort providers = [("CUDAExecutionProvider")] ort_sess = ort.InferenceSession('/data/onnx/opset17/test_gemm_default_no_bias/model.onnx', providers=providers) ``` Without this fix, you will see an error: Failed to load library libonnxruntime_providers_cuda.so with error: /tmp/build18/Debug/onnxruntime/capi/libonnxruntime_providers_cuda.so: undefined symbol: _ZN11onnxruntime4cuda21BuildKernelCreateInfoINS0_57kCudaExecutionProvider_GridSample_kOnnxDomain_ver16_floatEEENS_16KernelCreateInfoEv	2024-07-16 10:05:33 -07:00
vraspar	218301403d	Add ML Program support for basic activation ops (#21326 ) ### Description Add support for: - Sigmoid - Relu - Tanh ### Motivation and Context Enable support for Autodesk model	2024-07-15 22:30:20 -07:00
George Wu	4005d12ed4	add vitisai ep build stage to Windows CPU Pipeline (#21361 ) We need to prevent VitisAI EP build breaks, so add a stage in the Windows CPU CI Pipeline to build the Vitis AI EP on Windows. There are no external dependencies for builds. Tests have to be disabled, though, as the EP has external SW/HW dependencies. This will at least allow us to prevent build breaks, which have happened on multiple occasions recently. Tested at https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1432346&view=results and it seems to run fine.	2024-07-15 19:34:08 -07:00
Adrian Lizarraga	cf565e955d	Revert "Fix ETW Sink Initialize unproperly locking" (#21360 ) Reverts microsoft/onnxruntime#21226 Causes any onnxruntime app to hang on Windows ARM64. Our pipelines do not have the same ETW environment, so we couldn't catch it. ![image](https://github.com/user-attachments/assets/80edbf7d-be50-4cb0-a016-f390b81dc798) The call to TraceLoggingRegisterEx() recursively calls back into LazyInitialize(): LazyInitialize() -> TraceLoggingRegisterEx() -> ORT_TL_EtwEnableCallback() -> Instance() -> LazyInitialize() The original code got out of the recursive loop by checking the `initialized_` flag.	2024-07-15 17:56:08 -07:00
Jing Fang	50170c697e	[Optimizer] DQ + MatMul to MatMulNBits support: kernel changes (#21342 ) ### Description This is a partial change ported from fajin/qdqmatmulnbitstoolchain. That branch has issues resolving the web CI. MatMulNBits is a heavily optimized matmul operation. Currently a MatMul can be converted to MatMulNBits to speed up the model inference. However, MatMulNBits is an ORT-only op. To make the graph compatible with ONNX ops and utilize MatMulNBits at the same time, we introduce Q/DQ support for MatMulNBits. To convert MatMul ops in a model to MatMulNBits: 1. Use matmul_4bits_quantizer.py to convert MatMul to DQ + MatMul using QDQ mode. 2. In the ORT session, DQ + MatMul is fused to MatMulNBits. #### Note MatMulNBits assumes the B weight is uint4. When no zp is provided, zp defaults to 8, which is different from DQ: DQ defaults zp to 0 when no zp is provided, and DQ supports int4. Therefore some conversions are introduced during the DQ + MatMul --> MatMulNBits step. #### Perf Using the QDQ format increases the model initialization time and memory consumption. With the current implementation, model init time increased from ~4s to ~9s, and memory consumption increased from ~2.8GB to ~4.8GB. The memory increase is due to: 1. In the optimizer, after transposing the B weight, an in-memory tensor proto is created using protobuf's arena. 2. In the finalize step, when saving initializers and prepacking, the ORT arena is used to create buffers for initializers. The memory allocated by arenas cannot be fully deallocated. If ORT arena memory allocation is disabled, the memory consumption of both the QDQ format and the original format is ~2.2GB. The time increase is mainly due to multiple memory copies, but can be further optimized. ### Motivation and Context Please see description for details.	2024-07-15 15:25:40 -07:00
Jian Chen	c03e6fff4c	Combining android build and test step into one job (#21340 ) ### Description Combining android build and test step into one job ### Motivation and Context Reduce runtime by removing additional machine allocation, and artifact uploading and downloading. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-07-15 14:44:03 -07:00
Yifan Li	db9ee35963	[TensorRT EP] c4996 suppression to build with trt10.2ga on Windows (#21358 ) ### Description Suppress C4996 deprecated API warnings (treated as errors) as a workaround to build ORT with TRT10.2GA on Windows. ### Motivation and Context Four APIs used by the core code of the TRT EP were recently declared deprecated. Temporarily suppress the deprecated API warnings until these APIs are updated.	2024-07-15 14:30:02 -07:00
Changming Sun	e5f18ba2c1	Change libonnxruntime.so's SONAME: remove the minor and patch version. (#21339 ) ### Description Resolve #21281 and #10589 . 1. Change libonnxruntime.so's SONAME: remove the minor and patch version. By default, when creating an ELF shared object, the linker sets the file's internal DT_SONAME field to the specified name, which is the file name plus SOVERSION. For example, the file name for our library is libonnxruntime.so, and by default SOVERSION is the lib's VERSION number, which is something like 1.18.0. So the DT_SONAME field in libonnxruntime.so is something like libonnxruntime.so.1.18.0. You can use the readelf tool to examine it. ``` readelf -d libonnxruntime.so | grep SONAME 0x000000000000000e (SONAME) Library soname: [libonnxruntime.so.1.18.0] ``` When an executable is linked with a shared object that has a DT_SONAME field, then when the executable is run the dynamic linker will attempt to load the shared object specified by the DT_SONAME field rather than using the file name (which is libonnxruntime.so) given to the linker. After this change, the SONAME is shortened to "libonnxruntime.so.1". 2. Set default version strings for Windows DLLs, to resolve #10589	2024-07-15 14:21:34 -07:00
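The naming rule described in the commit above can be sketched as a small string computation. This only illustrates how the SONAME is formed from the file name and a version string; the helper `soname` and its `major_only` flag are hypothetical and are not part of the ORT build scripts.

```python
def soname(file_name, version, major_only):
    """Build an ELF SONAME from a library file name and a version string.

    With major_only=False the full version is appended (the old behavior);
    with major_only=True only the major component is kept (the new behavior).
    """
    suffix = version.split(".")[0] if major_only else version
    return f"{file_name}.{suffix}"


print(soname("libonnxruntime.so", "1.18.0", major_only=False))
print(soname("libonnxruntime.so", "1.18.0", major_only=True))
```

Because the dynamic linker looks up the SONAME recorded at link time, keeping only the major version in it means an executable linked against one 1.x release would still resolve the library after a minor or patch upgrade, provided the usual versioned symlink is installed.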
Edward Chen	9c2b85ad58	Fix Android build on Windows (#21304 ) - Pass a list of files instead of path separator-delimited string to project.files(). See this issue: https://github.com/gradle/gradle/issues/19817 - Check for host (instead of target) being Windows when using fallback patch program.	2024-07-15 12:29:02 -07:00
Changming Sun	dfaf18928a	Fix a path problem in onnxruntime_perf_test (#21341 ) ### Description Resolve #21267 . onnxruntime_perf_test does not work properly if the input model path url is just a single filename without any path separator. For example, ``` ./onnxruntime_perf_test -t 10 model.onnx ``` The problem was introduced in #19196 by me.	2024-07-15 10:47:02 -07:00
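A bare file name such as `model.onnx` has an empty directory component, which is an easy case to mishandle when a tool splits a model path. The sketch below illustrates that general pitfall in Python; it is an assumption about the class of bug, not a description of the actual onnxruntime_perf_test fix, and `split_model_path` is a hypothetical helper.

```python
import os


def split_model_path(model_path):
    """Split a model path into (directory, file name).

    A path without any separator yields an empty directory from
    os.path.split(), so fall back to the current directory.
    """
    directory, file_name = os.path.split(model_path)
    if not directory:
        directory = os.curdir
    return directory, file_name


print(split_model_path("model.onnx"))
print(split_model_path("models/model.onnx"))
```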
glen-amd	281ed8c12d	VitisAI EP Context Model (#20926 ) # Why so many commits - Runtime debugging - which is necessary - Three different approaches to EP context model - as a result testing back and forth - Windows compatibility issues - this development has been done on Linux for convenience # "Open" (?) questions - Full offloading to a specific EP - Dumping EP context models by EPs vs [by ONNXRT](`e2abba18ea/onnxruntime/core/framework/graph_partitioner.cc (L725)`) - [Node name to pick nodes](`e2abba18ea/onnxruntime/core/framework/graph_partitioner.cc (L654)`) # VitisAI EP made three variant implementations that have respective pros and cons (and of course we can combine them) ## Serialize and cache the list of compute capabilities and the original ONNX model itself ## In `ComputeCapability()`, serialize and cache the backend compilation cache and the related necessary cache info such as cache dir and cache key ## In `Compile()`, serialize and cache the backend compilation cache and the related necessary cache info such as cache dir and cache key # EP context model creation - Precondition Session option configuration `kOrtSessionOptionEpContextEnable` (aka "ep.context_enable") is enabled. - Approach 1 - Steps 1. EP creates an ONNX model whose main graph has EP context nodes (i.e., node type is "EPContext"). 2. EP implements/overrides `IExecutionProvider::GetEpContextNodes()` method. 3. ONNXRT core creates an EP context model and saves/dumps it. - `CreateEpContextModel()` in the file "graph_partitioner.cc" - In `get_ep_context_node()`, `Node::Name()` is used to check whether a node is an EP context node. This means that EP model creation can only happen in `IExecutionProvider::Compile()`. - The workaround is (1) not implementing `IExecutionProvider::GetEpContextNodes()` and (2) dumping the EP context model by the EP itself. 4. Optionally, the EP can also dump the EP context model it created by itself. 
- Examples - `QNNExecutionProvider` - `VitisAIExecutionProvider` - Approach 2 - Steps 1. EP creates an ONNX model whose main graph has EP context nodes (i.e., node type is "EPContext"). 2. EP does NOT implement `IExecutionProvider::GetEpContextNodes()` at all. 3. EP dumps the EP context model it created. - Examples - `TensorrtExecutionProvider` - UPDATES - TRT EP is switching to leveraging `IExecutionProvider::GetEpContextNodes()` - `OpenVINOExecutionProvider` (?) # What to cache in EP context nodes - Non Compilation based EPs - Examples - `VitisAIExecutionProvider` - Characteristics - Heavy lifting work happens in `IExecutionProvider::GetCapability()`. - Preconditions - `IExecutionProvider::GetCapability()` is only called once by ONNXRT. - Cache content - Serialization of a list of `ComputeCapability` - Not EP-specific - Serialized using `onnx::FunctionProto` - EP-specific cache - Compilation based EPs - Examples - `QNNExecutionProvider` - `TensorrtExecutionProvider` - `MIGraphXExecutionProvider` - `OpenVINOExecutionProvider` - Cache content - EP-specific cache # Requirements - Offline / AOT compilation of ONNX models with EP context cache - Compile somewhere, run everywhere - Pseudo code with brief explanation ``` GenerateCache(original_onnx_file, cache_onnx_file) model_buffer = load(original_onnx_file) --> Load the original ONNX model file model_buffer = decrypt(model_buffer) session_options = { kOrtSessionOptionEpContextEnable: true, kOrtSessionOptionEpContextFilePath: temp_file } --> Set necessary configs Ort::CreateSessionFromArray(model_buffer, session_options) --> The new ONNX model with EP context is created and dumped into the user specified file "temp_file" temp_buffer = encrypt(temp_file) write(temp_buffer, cache_onnx_file) --> Write the encrypted content of "temp_file" into the "cache_onnx_file" file InitializeInferenceSession(cache_onnx_file) model_buffer = load(cache_onnx_file) --> Load the ONNX model with EP context from the file generated in the 
previous step model_buffer = decrypt(model_buffer) session_options = { } Ort::CreateSessionFromArray(model_buffer, session_options) --> Create and initialize a session with the EP context model ``` - Python code with comments - EP context model creation ```python import onnxruntime as onnxrt # Session options for creating an ONNX model with EP context cache. sess_opts = onnxrt.SessionOptions() # Verbose. sess_opts.log_severity_level = 0 # This is REQUIRED. sess_opts.add_session_config_entry("ep.context_enable", "1") # This is OPTIONAL. # Either an absolute path (preferred for now) or a relative path (WIP) is okay. # sess_opts.add_session_config_entry("ep.context_file_path", "/some/path/to/original_model_ctx.onnx") # This is OPTIONAL. sess_opts.add_session_config_entry("ep.context_embed_mode", "1") orig_model_location = "/some/path/to/original_model.onnx" sess = onnxrt.InferenceSession(orig_model_location, sess_opts, providers=["VitisAIExecutionProvider"], provider_options=[]) ``` - Inference run with an EP context model ```python import onnxruntime as onnxrt # Session options for running inference with an EP context model. sess_opts = onnxrt.SessionOptions() # Default EP context model path. # ep_ctx_model_location = "/some/path/to/original_model.onnx_ctx.onnx" # User configured EP context model path. ep_ctx_model_location = "/some/path/to/original_model_ctx.onnx" sess = onnxrt.InferenceSession(ep_ctx_model_location, sess_opts, providers=["VitisAIExecutionProvider"], provider_options=[]) model_inputs = {} run_opts = onnxrt.RunOptions() # Verbose. run_opts.log_severity_level = 1 sess.run(None, model_inputs, run_opts) ``` --------- Co-authored-by: Glen Cao <glen@Glens-MacBook-Air.local>	2024-07-12 21:22:58 -07:00
Xu Xing	92a8407b39	[js/webgpu] Remove unnecessary initialization of var (#21312 ) This var is already initialized to 0 by tint, so there is no need for an extra loop to do it again: ``` float tint_symbol_52[1][4] = (float[1][4])0; { for(int tint_symbol_53 = 0; (tint_symbol_53 < 1); tint_symbol_53 = (tint_symbol_53 + 1)) { { for(int tint_symbol_54 = 0; (tint_symbol_54 < 4); tint_symbol_54 = (tint_symbol_54 + 1)) { tint_symbol_52[min(uint(tint_symbol_53), 0u)][min(uint(tint_symbol_54), 3u)] = 0.0f; } } } } ```	2024-07-12 12:34:34 -07:00
Yi Zhang	f2ebd1cd6b	[Fix] Exception in iosDynamicFramework Post-Merge workflow (#21262 ) ### Description The exception was caused by `3dd6fcc089`. I added skip_macos_test because there is a new exception in https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1425579&view=logs&j=c90c5af3-67d5-5936-5a62-71c93ebfca65&t=01038f35-8e78-5801-1aa1-d9647bb65858 ``` 2024-07-05T14:41:09.3864740Z mkdir -p /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Products/Debug/macos_package_testUITests.xctest/Contents/Frameworks 2024-07-05T14:41:09.3933430Z mkdir: /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Products/Debug/macos_package_testUITests.xctest: Operation not permitted 2024-07-05T14:41:09.3996760Z /var/folders/0f/b0mzpg5d31z074x3z5lzkdxc0000gn/T/tmp97ycvwq5/apple_package_test/Pods/Target Support Files/Pods-macos_package_testUITests/Pods-macos_package_testUITests-frameworks.sh: line 7: realpath: command not found 2024-07-05T14:41:09.4003170Z :18: error: Unexpected failure 2024-07-05T14:41:11.1323470Z error: Sandbox: mkdir(72212) deny(1) file-write-create /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Products/Debug/macos_package_testUITests.xctest (in target 'macos_package_testUITests' from project 'apple_package_test') 2024-07-05T14:41:11.1325620Z 2024-07-05T14:41:11.8731110Z 2024-07-05T14:41:11.8733040Z Test session results, code coverage, and logs: 2024-07-05T14:41:11.8734820Z /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Logs/Test/Test-macos_package_test-2024.07.05_14-40-38-+0000.xcresult 2024-07-05T14:41:11.8735530Z 2024-07-05T14:41:11.8906210Z Testing failed: 2024-07-05T14:41:11.8911060Z Sandbox: mkdir(72212) deny(1) file-write-create 
/Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Products/Debug/macos_package_testUITests.xctest 2024-07-05T14:41:11.8912570Z Unexpected failure 2024-07-05T14:41:11.8913690Z Testing cancelled because the build failed. 2024-07-05T14:41:11.8914380Z 2024-07-05T14:41:11.8914970Z TEST FAILED 2024-07-05T14:41:11.8915480Z 2024-07-05T14:41:11.8915780Z 2024-07-05T14:41:11.8916750Z The following build commands failed: 2024-07-05T14:41:11.8919280Z PhaseScriptExecution [CP]\ Embed\ Pods\ Frameworks /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Intermediates.noindex/apple_package_test.build/Debug/macos_package_testUITests.build/Script-059136A7770CA5376C30F2FD.sh (in target 'macos_package_testUITests' from project 'apple_package_test') 2024-07-05T14:41:11.8922180Z (1 failure) ``` I also found that the macOS test is skipped in `9ef28f092f/tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml (L119-L127)` as well, so this may be a known issue.	2024-07-12 09:24:12 -07:00
Ted Themistokleous	4ac4cd2668	Migraphx ep windows build (#21284 ) ### Description Repeat of #21084 with removal of policy CMP0144 to suppress warnings which uses CMake 3.27.0. ### Motivation and Context Already approved PR: https://github.com/microsoft/onnxruntime/pull/21084 Removed the added policy from CMake 3.27.0.	2024-07-11 21:21:38 -07:00
mingyueliuh	42b7cedb06	[VitisAI] custom op support multiple outputs (#21280 ) ### Description The implementation inside the EP requires registering some custom ops that are only used in the model compilation phase. Currently only a single output is supported. ### Motivation and Context New requirements call for multiple outputs, so the shape inference of the EP custom op needs to be extended to support multiple outputs. --------- Co-authored-by: liumingyue <mingyue@xilinx.com> Co-authored-by: mingyue <mingyue@amd.com>	2024-07-11 16:04:18 -07:00
Qingnan Duan	80b56feb41	Implement FlashAttention for CPU (#20805 ) ### Description Implement [FlashAttention](https://arxiv.org/pdf/2205.14135) and [FlashAttention-2](https://arxiv.org/pdf/2307.08691) for MultiHeadAttention on CPU. ### Motivation and Context Accelerate the execution of MultiHeadAttention. Current performance: 10ms vs 16ms (com.microsoft.MultiHeadAttention) on my Linux machine and 10ms vs 38ms (com.microsoft.MultiHeadAttention) on my Windows machine. May need further optimizations. --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Qingnan Duan <qiduan@microsoft.com>	2024-07-11 14:19:59 -07:00
Edward Chen	33e7c7f6ec	Enable Android CI build stages to run in parallel. (#21314 ) Enable Android CI build stages to run in parallel to possibly reduce total build time.	2024-07-11 10:09:09 -07:00


11373 commits