onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-29 03:30:52 +00:00

Author	SHA1	Message	Date
Adrian Lizarraga	514b4699b4	[QNN EP] Apply workaround for Conv validation bug when bias input is implicit (#21764 ) ### Description - Adds a dummy bias of all zeros when translating a Conv without an explicit bias input. This is a workaround for a QNN validation issue that fails when the optional bias input is not provided. - Corrects logic for unpacking of non-zero int4 zero-points. Bug does not impact models because we currently only support int4 zero-points equal to 0 (symmetric quant). But this would become an issue in the future if/when QNN supports non-zero int4 zero-points (so good to fix now). ### Motivation and Context Support Conv operators without a bias input on QNN EP with the latest QNN SDK.	2024-08-22 10:38:03 -07:00
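Here is a minimal sketch of the int4 packing convention involved. It assumes the ONNX layout: two 4-bit values per byte, with the first element in the low nibble. `pack_int4` and `unpack_int4` are hypothetical helpers, not QNN EP code, and the EP's actual fix may be structured differently. What the sketch shows is the sign-extension step, which only matters once zero-points can be non-zero and negative.

```python
def pack_int4(values):
    """Pack signed int4 values (-8..7) two per byte, first element in the low nibble."""
    packed = bytearray((len(values) + 1) // 2)
    for i, v in enumerate(values):
        if not -8 <= v <= 7:
            raise ValueError(f"{v} is out of int4 range")
        # & 0x0F keeps the two's-complement low nibble of v.
        packed[i // 2] |= (v & 0x0F) << (4 * (i % 2))
    return bytes(packed)


def unpack_int4(packed, count):
    """Unpack `count` signed int4 values from packed bytes."""
    values = []
    for i in range(count):
        nibble = (packed[i // 2] >> (4 * (i % 2))) & 0x0F
        # Sign-extend: nibbles 8..15 encode -8..-1. Skipping this step is harmless
        # for a zero-point of 0 but corrupts any negative value.
        values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


zero_points = [0, 3, -2, 7, -8]
packed = pack_int4(zero_points)
print(packed.hex())                            # 307e08
print(unpack_int4(packed, len(zero_points)))   # [0, 3, -2, 7, -8]
```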
Chen Feiyue	ff3e8b02c3	[VSINPU] Update vsinpu patches (#21402 ) ### Description - Update patches for accuracy modifications and local result recording	2024-08-21 23:58:56 -07:00
Yueqing Zhang	3ff8ca29e5	[VitisAI] remove wrong error msg, required by Microsoft (#21715 ) ### Description Remove legacy code and an incorrect error message. ### Motivation and Context Microsoft requested removal of this unwanted error message; the change is required for the 8.15 release. Co-authored-by: Yueqing Zhang <yueqingz@amd.com>	2024-08-21 21:10:28 -07:00
Tianlei Wu	25d7a4fa08	[CUDA] Update benchmark_mha.py to capture debug info to identify sdpa kernel (#21804 ) Use debug info to identify the SDPA kernel that is actually used, and show it in the output of benchmark_mha.py. This updated benchmark script was used to get the benchmark results in https://github.com/microsoft/onnxruntime/pull/21629. (1) Change the output format of the debug info to lines like SdpaKernel=* (2) Add a step to capture stdout from the onnxruntime session, and use a regular expression to parse SdpaKernel=* from the captured text. Other minor changes: (1) Set different default repeats during benchmark: 100 for CPU and 10000 for CUDA. (2) Fix PrintTensorByDims used in the console dumper: if it is not enabled, do not dump the tensor. (3) Update some comments ### Motivation and Context Sometimes a fallback is used for an sdpa_kernel. That can confuse users unless the benchmark shows exactly which kernel was used.	2024-08-21 17:30:16 -07:00
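The parsing step can be sketched with Python's `re` module. The helper name and the sample kernel names are hypothetical; only the `SdpaKernel=*` line format comes from the change above. One caveat: native code prints the debug lines, so `contextlib.redirect_stdout` will likely not capture them. A file-descriptor-level redirect (for example with `os.dup2`) is needed to collect the text first.

```python
import re
from collections import Counter

# Matches debug lines of the form "SdpaKernel=<name>".
SDPA_KERNEL_PATTERN = re.compile(r"SdpaKernel=(\w+)")


def parse_sdpa_kernels(captured_stdout: str) -> Counter:
    """Count how often each SDPA kernel name appears in captured debug output."""
    return Counter(SDPA_KERNEL_PATTERN.findall(captured_stdout))


# Hypothetical captured text; real kernel names and surrounding output may differ.
sample = "run 1\nSdpaKernel=FLASH_ATTENTION\nrun 2\nSdpaKernel=FLASH_ATTENTION\nSdpaKernel=MATH\n"
print(parse_sdpa_kernels(sample).most_common(1))  # [('FLASH_ATTENTION', 2)]
```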
Tianlei Wu	44a3923ba5	run sparse attention test sequentially (#21808 ) ### Description For unknown reasons, running SparseAttention tests in parallel causes random failures in the CI pipeline, possibly because memory runs out when too many tests run in parallel. This change runs those tests sequentially.	2024-08-21 17:24:58 -07:00
Jake Mathern	c0b68e77af	Fix warnings (#21809 ) ### Description Minor changes to resolve some warnings in ORT ### Motivation and Context Binskim for WindowsAI (which consumes ORT) treats warnings as errors, and has hit these warnings. As a security requirement, warnings like "signed/unsigned mismatch" must be resolved.	2024-08-21 14:23:37 -07:00
Edward Chen	fb9ce18e88	Add K=0 check to MatMul<float>::Compute() specialization. (#21803 ) Add K=0 check to `MatMul<float>::Compute()` specialization. Add unit test to cover both primary template and float specialization.	2024-08-21 09:15:58 -07:00
Ted Themistokleous	0e827c27fb	[MIGraphX EP] Add support for MIGraphX Exhaustive tune flag (#46 ) (#21599 ) ### Description Set the exhaustive tune flag through the MIGraphX API and make it a session option in Onnxruntime. ### Motivation and Context Allow users to use MIGraphX exhaustive tuning with Onnxruntime inference. This goes hand in hand with save/load once a model has been compiled and a tuning has been found. Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2024-08-21 07:32:12 -07:00
Ted Themistokleous	26a499323f	[MIGraphX EP Support] Update migx scripts (#21806 ) ### Description No code changes to the EP, only changes to the scripts that invoke MIGraphX EP: - In one case, explicitly set MIGraphX EP when running gpt2 testing - In the other, turn off optimizations like TensorRT and let MIGraphX handle graph optimizations ### Motivation and Context MIGraphX has moved away from using rocBLAS. Without this change, some cases used in CI will fail, because optimizations will attempt to use rocBLAS kernels instead of using MIGraphX EP directly.	2024-08-21 07:22:42 -07:00
Ted Themistokleous	ed155ad46a	[MIGraphX EP] Ensure we support all inputs for MatMulInteger and ConvInteger. (#21680 ) … to int8 for now Allow for models with biases/full input and only check for int8 support in EP ### Description Allows all inputs of MatMulInteger and ConvInteger to be supported for prequantized models ### Motivation and Context Fixes issues when using prequantized models that contain weight biases Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>	2024-08-21 07:19:20 -07:00
Patrice Vignola	de6ebcbb54	[DML] Add int4 QDQ (#21592 )	2024-08-20 23:44:58 -07:00
Adrian Lizarraga	6fbb0ae81a	[TransposeOptimizer] Fix axis for QuantizeLinear inserted after DQ (per-channel) -> Unsqueeze (#21793 ) ### Description - Fix computation of axis for `QuantizeLinear` inserted after the sequence `DQ (per-channel) -> Unsqueeze`. Example: - Original: `DQ (axis = 0) -> Unsqueeze (axes = [0, 1, 2]) -> Op` - After QDQ fix-up: `DQ (axis = 0) -> Unsqueeze (axes = [0, 1, 2]) -> Q (axis = 3) -> DQ (axis = 3) -> Op` - Before this PR, the axis for the inserted Q/DQ ops was not correctly set to 3 (it was left as 0). - Fix normalization of negative axis values for `QuantizeLinear` inserted after the sequence `DQ (per-channel) -> Transpose` - Existing code added the wrong rank value to normalize the DQ axis. ### Motivation and Context Fix errors in the handling of per-channel DQ in code that fixes QDQ NodeUnits.	2024-08-20 16:26:02 -07:00
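The index arithmetic in the example above can be sketched as follows. Both helpers are hypothetical illustrations, not the transpose optimizer's code. `axis_after_unsqueeze` finds where a per-channel axis lands after `Unsqueeze`, and `normalize_axis` shows that a negative axis must be normalized with the rank of the tensor it actually indexes.

```python
def axis_after_unsqueeze(axis: int, input_rank: int, unsqueeze_axes: list[int]) -> int:
    """Return where input dimension `axis` lands in the output of Unsqueeze(axes=unsqueeze_axes)."""
    output_rank = input_rank + len(unsqueeze_axes)
    if axis < 0:
        axis += input_rank  # a DQ axis refers to the Unsqueeze *input*
    inserted = {a + output_rank if a < 0 else a for a in unsqueeze_axes}
    # Output positions not created by Unsqueeze hold the original dims, in order.
    kept_positions = [p for p in range(output_rank) if p not in inserted]
    return kept_positions[axis]


def normalize_axis(axis: int, rank: int) -> int:
    """Map a possibly negative axis into [0, rank) using the rank of the tensor it indexes."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


# The example above, assuming a 1-D per-channel DQ output: Q/DQ must use axis 3.
print(axis_after_unsqueeze(0, 1, [0, 1, 2]))  # 3
```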
Tianlei Wu	fbc3927231	[CUDA] cuDNN Flash Attention (#21629 ) ### Description - [x] Add cuDNN flash attention using cudnn frontend, and enable it in the MultiHeadAttention operator. - [x] Support attention mask. - [x] Support attention bias. - [x] Update tests and benchmark script. The cuDNN SDPA is disabled by default. Enabling it requires the following: (1) cuDNN 9.3 or newer must be installed. (2) Set the environment variable `ORT_ENABLE_CUDNN_FLASH_ATTENTION=1` or set the `sdpa_kernel=8` CUDA provider option. (3) It only works on devices with compute capability >= 8.0. Note that some combinations of parameters might be rejected due to limited support for certain head dimensions or sequence lengths. Future work: (1) FP8 and BF16 APIs. Currently, only the FP16 API is exposed. (2) Add an API to support ragged batching (padding removed in inputs). (3) Support other input formats (like QKV_BS3NH). (4) Currently, q is converted to BSNH, and k/v are converted to either BSNH or BNSH format. Experiments may show whether converting q to BNSH could be better in some cases. ### Example Benchmark Results on H100 The following tests use the FP16 MultiHeadAttention operator without attention mask or attention bias. 
#### Test Setting 1

| batch_size | sequence_length | past_sequence_length | num_heads | head_size |
| --- | --- | --- | --- | --- |
| 16 | 256 | 0 | 32 | 128 |

| format | average_latency | tflops | kernel |
| --- | --- | --- | --- |
| Q,K,V (BNSH) | 0.000075 | 229.5 | torch:flash |
| Q,K,V (BNSH) | 0.000119 | 144.8 | torch:efficient |
| Q,K,V (BNSH) | 0.000224 | 76.5 | torch:math |
| Q,K,V (BSNH) | 0.000075 | 227.8 | ort:cudnn |
| Q,K,V (BSNH) | 0.000094 | 182.8 | ort:flash |
| Q,K,V (BSNH) | 0.000138 | 124.7 | ort:efficient |
| Q,K,V (BSNH) | 0.000438 | 39.3 | ort:math |
| Q,KV | 0.000129 | 133.0 | ort:cudnn |
| Q,KV | 0.000151 | 114.1 | ort:flash |
| Q,KV | 0.000194 | 88.5 | ort:efficient |
| QKV | 0.000154 | 111.8 | ort:cudnn |
| QKV | 0.000175 | 98.0 | ort:flash |
| QKV | 0.000217 | 79.0 | ort:efficient |

#### Test Setting 2

| batch_size | sequence_length | past_sequence_length | num_heads | head_size |
| --- | --- | --- | --- | --- |
| 16 | 512 | 0 | 16 | 64 |

| format | average_latency | tflops | kernel |
| --- | --- | --- | --- |
| Q,K,V (BNSH) | 0.000069 | 249.2 | torch:flash |
| Q,K,V (BNSH) | 0.000141 | 121.7 | torch:efficient |
| Q,K,V (BNSH) | 0.000294 | 58.5 | torch:math |
| Q,K,V (BSNH) | 0.000077 | 221.7 | ort:cudnn |
| Q,K,V (BSNH) | 0.000087 | 196.6 | ort:flash |
| Q,K,V (BSNH) | 0.000163 | 105.6 | ort:efficient |
| Q,K,V (BSNH) | 0.000651 | 26.4 | ort:math |
| Q,KV | 0.000103 | 167.1 | ort:cudnn |
| Q,KV | 0.000117 | 146.3 | ort:flash |
| Q,KV | 0.000192 | 89.6 | ort:efficient |
| QKV | 0.000113 | 151.5 | ort:cudnn |
| QKV | 0.000128 | 134.7 | ort:flash |
| QKV | 0.000201 | 85.3 | ort:efficient |	2024-08-20 08:50:22 -07:00
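The tflops column is consistent with counting 4 * B * N * S * T * H floating-point operations per call, with average_latency in seconds. That count covers two matrix products, Q*K^T and probabilities times V, each costing 2 * B * N * S * T * H, where T = S + past_sequence_length. The sketch below reproduces the column under that assumption. It is not code taken from benchmark_mha.py, and `attention_tflops` is a hypothetical name.

```python
def attention_tflops(batch_size, num_heads, sequence_length, total_sequence_length,
                     head_size, latency_seconds):
    """Throughput of non-causal attention, counting Q*K^T and probs*V as 2*B*N*S*T*H FLOPs each."""
    flops = 4 * batch_size * num_heads * sequence_length * total_sequence_length * head_size
    return flops / latency_seconds / 1e12


# Test Setting 1: B=16, N=32, S=T=256 (past_sequence_length=0), H=128, latency 0.000075 s.
print(round(attention_tflops(16, 32, 256, 256, 128, 0.000075), 1))  # 229.1
```

The table lists 229.5 and 227.8 for the two rows displayed as 0.000075 s. Those small differences are consistent with the latencies being rounded for display.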
Adrian Lizarraga	a22cc078b4	[QNN EP] Add support for GatherElements (#15966 ) ### Description - Adds support for the GatherElements operator to QNN EP. - Adds GatherElements to QDQ quantizer tool. ### Motivation and Context Enable more models to run on QNN EP.	2024-08-19 14:33:40 -07:00
Jing Fang	64674c50de	Added a tool to quantize Gather to GatherBlockQuantized (#21697 ) ### Description Added code in MatMul4BitsQuantizer to quantize Gather to GatherBlockQuantized. Only Gather with constant data is quantized. Since the quantized data is int4, the quantized model is force-upgraded to ONNX opset 21. The implementation relies purely on numpy; if optimization is needed, C++ kernels can be added later. Only the default RTN algorithm is supported, since GatherBlockQuantized requires zero points to have the same type as the quantized data. ### Motivation and Context Support quantizing Gather to int4 in the Web scenario.	2024-08-19 10:25:36 -07:00
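Round-to-nearest (RTN) blockwise quantization can be sketched in numpy as below. This is an illustration under assumptions, not MatMul4BitsQuantizer's code. It quantizes a 2-D table along its last axis into asymmetric 4-bit codes, with one scale and zero point per block. For clarity it keeps one code per `uint8`, whereas a real int4 tensor packs two per byte. The function names are hypothetical.

```python
import numpy as np


def rtn_quantize_blockwise(data: np.ndarray, block_size: int, bits: int = 4):
    """Asymmetric round-to-nearest quantization of a 2-D table in blocks along the last axis."""
    qmax = 2**bits - 1
    rows, cols = data.shape
    if cols % block_size != 0:
        raise ValueError("this sketch assumes cols is a multiple of block_size")
    blocks = data.reshape(rows, cols // block_size, block_size)
    # Widen each block's range to include 0 so that 0.0 is exactly representable.
    bmin = np.minimum(blocks.min(axis=-1), 0.0)
    bmax = np.maximum(blocks.max(axis=-1), 0.0)
    scale = (bmax - bmin) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # all-zero block: any nonzero scale works
    zero_point = np.clip(np.round(-bmin / scale), 0, qmax)
    codes = np.clip(np.round(blocks / scale[..., None]) + zero_point[..., None], 0, qmax)
    return (codes.astype(np.uint8).reshape(rows, cols),
            scale.astype(np.float32),
            zero_point.astype(np.uint8))


def dequantize_blockwise(codes, scale, zero_point, block_size):
    """Inverse mapping: (code - zero_point) * scale, block by block."""
    rows, cols = codes.shape
    blocks = codes.reshape(rows, cols // block_size, block_size).astype(np.float32)
    return ((blocks - zero_point[..., None]) * scale[..., None]).reshape(rows, cols)


rng = np.random.default_rng(0)
table = rng.standard_normal((4, 64)).astype(np.float32)
codes, scale, zero_point = rtn_quantize_blockwise(table, block_size=32)
restored = dequantize_blockwise(codes, scale, zero_point, block_size=32)
print(codes.max() <= 15, np.abs(restored - table).max())
```

Because each value is rounded once, the reconstruction error stays within half a quantization step of its block.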
Wanming Lin	7ae0b4ce64	[WebNN EP] Support Erf and Trilu for CPU backend (#21768 )	2024-08-19 07:56:16 -07:00
jingyanwangms	c018ba43ef	[Running CI] [TensorRT EP] support TensorRT 10.3-GA (#21742 ) ### Description - TensorRT 10.2.0.19 -> 10.3.0.26	2024-08-18 13:26:41 -07:00
Tianlei Wu	d79e3c5791	Extend Attention Bias Broadcast Support (#21710 ) ### Description Previously, MultiHeadAttention supported relative position bias of shape [1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention supported [1, N, S, T]. This change extends support to [1, N, S, T], [B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for CUDA and CPU EPs. - [x] Rename the "relative position bias" input to "attention bias", because it can also be used for other types of bias, like ALiBi (Attention with Linear Biases) or an attention mask. - [x] Update the unfused kernel to support broadcasting the 2nd dimension of attention bias. - [x] Update efficient attention to support broadcasting the 2nd dimension of attention bias. - [x] Update operators (MultiHeadAttention, DecoderMaskedMultiHeadAttention, Attention, PackedAttention, PackedMultiHeadAttention) to support broadcast attention bias on CUDA and CPU EPs. - [x] Update ROCm, DML and WebGPU naming to be consistent. (Note that those EPs do not support broadcasting attention_bias for now.) - [x] Add attention bias tests for MultiHeadAttention. - [x] Update operator documents - [x] Update benchmark script Other changes: * Fix some checks in multihead-attention.ts * Add helper functions to dump tensors given dimensions.	2024-08-16 15:40:04 -07:00
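The four accepted bias shapes map directly onto NumPy broadcasting when the bias is added to the [B, N, S, T] attention scores before softmax. The sketch below uses a hypothetical helper, not ORT's kernel. It validates the bias shape and then lets NumPy broadcast a size-1 batch or head dimension.

```python
import numpy as np


def add_attention_bias(scores: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Add a [B|1, N|1, S, T] attention bias to [B, N, S, T] attention scores."""
    batch, heads, seq, total_seq = scores.shape
    b, n, s, t = bias.shape
    if b not in (1, batch) or n not in (1, heads) or (s, t) != (seq, total_seq):
        raise ValueError(f"unsupported attention bias shape {bias.shape} for scores {scores.shape}")
    # NumPy broadcasting repeats a size-1 batch or head dimension.
    return scores + bias


scores = np.zeros((2, 4, 3, 5), dtype=np.float32)
for shape in [(1, 4, 3, 5), (2, 4, 3, 5), (2, 1, 3, 5), (1, 1, 3, 5)]:
    print(shape, add_attention_bias(scores, np.ones(shape, dtype=np.float32)).shape)
```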
Emmanuel	a4bec3d374	Enabled Dynamo exporter (#21713 ) ### Description This PR modifies the run_dynamo_export function so that it mirrors the behavior of run_torchscript_merged_export rather than run_torchscript_separate_export. Additionally, I adjusted the main function so that run_dynamo is correctly invoked. ### Motivation and Context The main motivation for this change is to enable successful export of LLaMA-2 and LLaMA-3 models to ONNX using the Dynamo exporter. Previously, the exporter saved two copies of the weights, which is inefficient. The modified approach saves only one copy of the weights, and the model can support both scenarios. These changes improve the exporter's compatibility with LLaMA models (and, by extension, other models) and optimize the export process.	2024-08-16 10:45:22 -07:00
Wanming Lin	b2d603abda	[WebNN EP] Remove workaround for scalar (#21704 ) Chromium now supports scalars with dims = {}, so remove the legacy workaround for supporting scalars.	2024-08-15 22:59:51 -07:00
Dmitri Smirnov	754dba2674	Change to std::fill (#21759 ) ### Description Replace `memset(0)` with `std::fill(T{})`. This would ensure that all the types are initialized in a portable way. ### Motivation and Context Some platforms exhibit intermittent failures with NaN results. Follow up to: https://github.com/microsoft/onnxruntime/pull/21525 Cc: @ranjitshs	2024-08-15 16:16:54 -07:00
Guenther Schmuelling	d82f15d0e3	add Gelu opset-20 to webgpu (#21725 ) https://github.com/microsoft/onnxruntime/issues/21618	2024-08-14 09:45:05 -07:00
Frank Dong	a0708a0d96	avoid redundant memory allocation for external initializers (#21682 ) ### Description Avoid redundant memory allocation for external initializers. We use mmap for external initializers later, so there is no point in allocating memory in advance and then releasing it. ### Motivation and Context In the current implementation, we: 1. Allocate memory (of the size of the current initializer) for the initializer first: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/session_state_utils.cc#L131 2. For an external initializer, point the initializer to the mmapped object in memory and release the previously allocated tensor: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/session_state_utils.cc#L89 For large models, we keep allocating and releasing memory for external initializers, which is unnecessary. For the phi silica model, this change reduces transient memory usage from 4,566MB to 2,724MB. 
Because the redundant memory is released quickly once the external initializers are mmapped, this change has little impact on peak memory usage.	2024-08-13 23:13:49 -07:00
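ONNX external data entries record a file location, an offset and a length. That means an initializer can be served straight from a memory map, with no prior allocation. The numpy sketch below illustrates that idea only. `map_external_initializer` is a hypothetical helper, and ORT's C++ loading path differs.

```python
import os
import tempfile

import numpy as np


def map_external_initializer(path, offset, shape, dtype=np.float32):
    """Return a read-only memory-mapped view of a tensor stored at `offset` in `path`.

    Nothing is allocated up front; the OS pages data in from the file on access.
    """
    return np.memmap(path, dtype=dtype, mode="r", offset=offset, shape=shape)


# Hypothetical external-data file: 16 bytes of other data, then a 2x3 float32 tensor.
weights = np.arange(6, dtype=np.float32).reshape(2, 3)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.data")
    with open(path, "wb") as f:
        f.write(b"\0" * 16)
        f.write(weights.tobytes())
    view = map_external_initializer(path, offset=16, shape=(2, 3))
    print(view[1, 2])  # 5.0
    del view  # drop the mapping before the temporary file is deleted
```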
Xu Xing	7172aff1cf	[js/webgpu] Fix max pool shape end with 0 (#21698 ) Bug: https://github.com/microsoft/onnxruntime/issues/21386	2024-08-13 20:59:24 -07:00
Dmitri Smirnov	c2911bbb1c	[CUDA] Special case for K==0 in CUDA MatMul (#21525 ) ### Description This change addresses the case where we multiply two matrices whose inner dimension is 0. numpy and Eigen (which our CPU EP implementation uses) handle this case correctly and output an [M, N] matrix filled with zeros. ### Motivation and Context This is required to support the GenAI empty-input LoRA implementation. Addresses: https://github.com/microsoft/onnxruntime/issues/21483	2024-08-13 11:27:05 -07:00
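You can check numpy's behavior for K = 0 directly, and a reference matmul needs only a small special case to match it. `matmul_with_empty_k` is a hypothetical illustration, not the CPU or CUDA kernel.

```python
import numpy as np


def matmul_with_empty_k(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Matrix product that special-cases an empty inner dimension (K == 0)."""
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError(f"inner dimensions differ: {k} vs {k_b}")
    if k == 0:
        # Each output element is a sum over zero products, so the result is all zeros.
        return np.zeros((m, n), dtype=np.result_type(a, b))
    return a @ b


a = np.empty((3, 0), dtype=np.float32)
b = np.empty((0, 4), dtype=np.float32)
print((a @ b).shape, (a @ b).sum())                      # (3, 4) 0.0
print(np.array_equal(matmul_with_empty_k(a, b), a @ b))  # True
```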
liqun Fu	3439429717	Fix neural-speed ci failure (#21694 ) ### Description fix https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1461029&view=logs&j=3565c00d-48fa-5c65-7ab9-a05e12e29ed0&t=e43fe03a-689e-5dc5-9ad5-9f116eba3e9d&l=6341 Signed-off-by: Liqun Fu <liqfu@microsoft.com>	2024-08-13 10:48:25 -07:00
jingyanwangms	154084efaa	Security Fuzz Test Fixes (#21608 ) ### Description Fix address sanitizer and memory access bugs 1, 4, 5, 7 and 8 found in security fuzz testing.	2024-08-11 03:28:41 -07:00
Chi Lo	2abebb2a47	[TensorRT EP] No workspace size limit to TRT memory pool (#21643 ) We saw some models fail to run due to OOM, which could be fixed by increasing trt_max_workspace_size. This PR removes the size limit by default (allowing up to the maximum device memory), which aligns with trtexec.	2024-08-09 17:30:51 -07:00
Caroline Zhu	eeef0c8aca	Enable exporting for inference when loading from buffer without behavior changes (#21601 ) ### Description Added the eval model buffer as an optional field in Module, so that you can export for inference using the eval model stored as a buffer. ### Motivation and Context - Resolves #21152 - The previous solution (PR #21422) produced an eval model that was specific to the EPs used for training, because of unavoidable runtime optimizations that changed the graph stored with the eval session.	2024-08-09 16:59:50 -07:00
Krishna Bindumadhavan	37be90c9c8	[Quant tool]: Improve symmetric quantization range update for Relu/Clip (#21573 ) ### Description This PR improves the range calculation for inputs to Relu/Clip nodes in the symmetric quantization case. ### Motivation and Context In the common scenario of conv followed by relu in the symmetric quantization config, the tensors corresponding to the input and output of relu could be assigned different scales. This may introduce noise due to multiple re-quantizations, and it makes it difficult to fuse conv-relu nodes for hardware accelerators that support fused conv-relu. It is more efficient to assign the output range of relu as the input range of relu (the output range of the upstream op) wherever possible. Before this change, this adjustment was only made for the asymmetric quantization case. When the upstream op has multiple consumers, this assumption could be incorrect, so the ranges are not adjusted in that case.	2024-08-09 14:48:09 -07:00
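The range adjustment can be sketched with symmetric int8 scales. The names below (`TensorRange`, `symmetric_scale`, `align_relu_input_range`) are hypothetical. The sketch only shows why copying the Relu output range to its input lets both tensors share one scale; the quantization tool's actual implementation differs.

```python
from dataclasses import dataclass


@dataclass
class TensorRange:
    rmin: float
    rmax: float


def symmetric_scale(r: TensorRange, qmax: int = 127) -> float:
    """Scale for symmetric int8 quantization: the largest magnitude maps to qmax."""
    return max(abs(r.rmin), abs(r.rmax)) / qmax


def align_relu_input_range(ranges, relu_input, relu_output, producer_consumer_count):
    """Reuse the Relu output range for its input when the producer feeds only this Relu."""
    if producer_consumer_count == 1:
        out = ranges[relu_output]
        ranges[relu_input] = TensorRange(out.rmin, out.rmax)
    return ranges


# Hypothetical calibration results for Conv -> Relu.
ranges = {"conv_out": TensorRange(-8.0, 5.0), "relu_out": TensorRange(0.0, 5.0)}
print(symmetric_scale(ranges["conv_out"]) == symmetric_scale(ranges["relu_out"]))  # False
align_relu_input_range(ranges, "conv_out", "relu_out", producer_consumer_count=1)
print(symmetric_scale(ranges["conv_out"]) == symmetric_scale(ranges["relu_out"]))  # True
```

Clipping the Conv output's negative tail loses nothing here, because Relu maps every negative value to 0 anyway. With a second consumer of `conv_out`, that argument no longer holds, which is why the helper leaves the range alone in that case.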
Adrian Lizarraga	390f0fd8ce	[QNN Quant tool] Fix validation of per-channel overrides for models with external data (#21656 ) ### Description Fixes validation of per-channel quantization overrides by no longer loading the external weights unnecessarily. ### Motivation and Context `get_qnn_qdq_config()` explicitly loads models without external data (i.e., `onnx.load_model(load_external_data=False)`). Afterwards, `get_qnn_qdq_config()` calls `tensor_proto_to_array()`, which expects the external weights to be stored in the current working directory. If the external weights are stored in a different directory, we get a crash. Loading the actual weight values is unnecessary because only the weight shape is needed. This PR removes the unnecessary call to `tensor_proto_to_array()`.	2024-08-09 14:46:52 -07:00
Satya Kumar Jandhyala	51b2044120	[JS/WebGPU] Add DequantizeLinear operator (#21642 ) ### Description Added the DequantizeLinear operator for JSEP.	2024-08-09 14:44:19 -07:00
Yifan Li	906ae77eea	[TensorRT EP] Add null_ptr check to avoid crash when running a session that previously failed to generate trt_engine (#21621 ) ### Description Add a null_ptr check to avoid a crash when running a session that previously failed to generate trt_engine. ### Motivation and Context Reported and verified by https://github.com/microsoft/onnxruntime/issues/21567	2024-08-09 14:09:22 -07:00
saurabh	88788474b9	fix handling of multiple QuantizeLinear nodes (#21675 ) ### Description This fix addresses the handling of multiple QuantizeLinear (Q) nodes as outputs of the target node in OVEP. Previously, the stripping logic only supported a single Q node, leading to incorrect stripping of additional Q nodes. ### Motivation and Context The OVEP stripping logic was limited to handling a single Q node as an output of the target node. As a result, additional Q nodes were stripped even though the stripping rules indicated they should be retained. With this fix, OVEP handles multiple Q nodes according to the specified stripping rules, so the fate of each Q node is correctly determined. Co-authored-by: sfatimar <sahar.fatima@intel.com>	2024-08-09 14:04:05 -07:00
Jing Fang	53a66f4e02	When quantizing 4-bit MatMul, force upgrade onnx domain opset to 21 (#21693 ) ### Description When MatMul was quantized to DQ + MatMul using the 4-bit QDQ tool chain, the domain opsets were previously left unchanged. Now, when MatMul is quantized to DQ + MatMul in QDQ format, the onnx domain is force-upgraded to opset 21. ### Motivation and Context QDQ format uses DQ with int4 and blocked quantization, which requires DQ with opset >= 21.	2024-08-09 13:50:12 -07:00
duanshengliu	c6a73defb8	Fix wrong per-tensor quantized weight type for matmul (#21347 ) ### Description Fix the wrong per-tensor quantized weight type for matmul. ### Motivation and Context Fixes the bug described in https://github.com/microsoft/onnxruntime/issues/21346	2024-08-09 13:36:25 -07:00
Jing Fang	f30581ed2c	[CPU EP] Add block quantized Gather contrib op (#21630 ) ### Description Add a gather that supports block-quantized input data. ### Motivation and Context To support Web inference scenario with quantized vocabulary embeddings.	2024-08-09 12:15:11 -07:00
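A block-quantized gather has a useful property: gathering first and then dequantizing only the selected rows gives the same result as dequantizing the whole table and gathering afterwards. The numpy sketch below checks that property for blocks along the last axis. It assumes one 4-bit code per `uint8` and uses hypothetical function names. It is not the contrib op's kernel, and it does not model the op's exact attribute set.

```python
import numpy as np


def dequantize_rows(codes, scale, zero_point, block_size):
    """Dequantize [rows, cols] codes quantized in blocks of `block_size` along the last axis."""
    rows, cols = codes.shape
    blocks = codes.reshape(rows, cols // block_size, block_size).astype(np.float32)
    return ((blocks - zero_point[..., None]) * scale[..., None]).reshape(rows, cols)


def gather_block_quantized(codes, scale, zero_point, indices, block_size):
    """Gather rows of a block-quantized table, dequantizing only the selected rows."""
    idx = np.asarray(indices)
    flat = idx.reshape(-1)
    rows = dequantize_rows(codes[flat], scale[flat], zero_point[flat], block_size)
    return rows.reshape(*idx.shape, codes.shape[1])


rng = np.random.default_rng(0)
vocab, hidden, block = 8, 16, 8
codes = rng.integers(0, 16, size=(vocab, hidden), dtype=np.uint8)  # 4-bit codes, one per byte
scale = rng.random((vocab, hidden // block), dtype=np.float32)
zero_point = np.full((vocab, hidden // block), 8, dtype=np.uint8)
token_ids = np.array([[1, 5], [7, 1]])
embeddings = gather_block_quantized(codes, scale, zero_point, token_ids, block)
print(embeddings.shape)  # (2, 2, 16)
```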
Sumit Agarwal	702b2e28e0	Fuse Pad even if Cast is present in-between (#21640 ) ### Description This change enhances the existing Pad fusion to fuse Pad even if a Cast operator is present between Pad and Conv/MaxPool/AveragePool. The Cast is kept as it is.
<pre>
/*
 * Before Fusion:
 *     Pad
 *      |
 *     Cast (Optional)
 *      |
 *     Conv/MaxPool/AveragePool
 *
 * After Fusion:
 *     Cast (Optional)
 *      |
 *     Conv/MaxPool/AveragePool
 */
</pre>	2024-08-09 06:52:59 -07:00
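The pads arithmetic behind this kind of fusion can be sketched for the Conv case under stated assumptions. The Pad must use constant mode with a zero pad value (that check is not shown), and the helper name is hypothetical rather than ORT's optimizer code. In that setting, keeping the Cast in place is safe because casting the zero padding value still yields zero.

```python
def fuse_pad_into_conv_pads(pad_pads, conv_pads):
    """Fold an ONNX Pad `pads` vector into Conv-style spatial `pads`, or return None.

    pad_pads:  [x1_begin, ..., xn_begin, x1_end, ..., xn_end] over all axes (N, C, spatial...).
    conv_pads: [x1_begin, ..., xk_begin, x1_end, ..., xk_end] over the k spatial axes only.
    """
    rank = len(pad_pads) // 2
    begins, ends = pad_pads[:rank], pad_pads[rank:]
    # Negative pads crop, which Conv padding cannot express.
    if any(p < 0 for p in pad_pads):
        return None
    # Padding the batch (N) or channel (C) axis cannot be folded into Conv padding either.
    if any(begins[:2]) or any(ends[:2]):
        return None
    spatial = rank - 2
    if len(conv_pads) != 2 * spatial:
        raise ValueError("conv_pads must hold a begin and an end value per spatial axis")
    new_begins = [p + c for p, c in zip(begins[2:], conv_pads[:spatial])]
    new_ends = [p + c for p, c in zip(ends[2:], conv_pads[spatial:])]
    return new_begins + new_ends


# NCHW Pad adding 1 row and 2 columns on each side, folded into a Conv with no padding.
print(fuse_pad_into_conv_pads([0, 0, 1, 2, 0, 0, 1, 2], [0, 0, 0, 0]))  # [1, 2, 1, 2]
```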
Yulong Wang	f4ec85259a	[js/web] allow relative path matching (#21657 ) ### Description This change allows an external data path like `a.data` to match `./a.data`.	2024-08-09 03:13:40 -07:00
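The matching rule can be illustrated by comparing normalized paths. This Python sketch is a hypothetical analogue of the idea, not the TypeScript code in js/web.

```python
import posixpath


def same_external_data_path(requested: str, provided: str) -> bool:
    """Treat 'a.data' and './a.data' (and other equivalent relative spellings) as the same file."""
    return posixpath.normpath(requested) == posixpath.normpath(provided)


print(same_external_data_path("a.data", "./a.data"))  # True
print(same_external_data_path("a.data", "b.data"))    # False
```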
Tianlei Wu	9334d4e362	[CUDA] Fix MHA mask (#21655 ) ### Description Fix a check of mask type introduced by me in a recent commit. Add tests.	2024-08-09 01:31:00 -07:00
Tianlei Wu	a46e49b439	Unblock migraphx and linux GPU training ci pipelines (#21662 ) ### Description * Fix the migraphx build error caused by https://github.com/microsoft/onnxruntime/pull/21598: add conditional compilation around a code block that depends on ROCm >= 6.2. Note that the pipeline uses ROCm 6.0. Unblock the orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed and orttraining-amd-gpu-ci-pipeline pipelines: * Disable a model test in the linux GPU training ci pipelines that broke after https://github.com/microsoft/onnxruntime/pull/19470: sometimes cudnn frontend throws an exception saying that the cudnn graph does not support a Conv node of the keras_lotus_resnet3D model on a V100 GPU. Note that the same test does not throw an exception in other GPU pipelines. The failure might be related to cudnn 8.9 and the V100 GPU used in the pipeline (Ampere GPUs and cuDNN 9.x do not have the issue). The actual fix requires fallback logic, which will take time to implement, so we temporarily disable the test in training pipelines. * Force install torch for cuda 11.8. (The docker image has torch 2.4.0 for cuda 12.1 to build torch extensions, which is not compatible with cuda 11.8.) Note that this is a temporary workaround. A more elegant fix is to ensure the right torch version in the docker build step, which might require updating install_python_deps.sh and the corresponding requirements.txt. * Skip test_gradient_correctness_conv1d since it causes a segmentation fault. The root cause needs more investigation (it may also be due to cudnn frontend). * Skip test_aten_attention since it causes an assert failure. The root cause needs more investigation (it may be due to the torch version). * Skip orttraining_ortmodule_distributed_tests.py since it fails with an error saying the compiler for the torch extension does not support c++17. One possible fix is to set the following compile argument inside setup.py of the fused_adam extension: extra_compile_args['cxx'] = ['-std=c++17']. However, due to the urgency of unblocking the pipelines, we just disable the test for now. 
* Skip test_softmax_bf16_large. For some reason, torch.cuda.is_bf16_supported() returns True on V100 with torch 2.3.1, so the test ran in CI, but V100 does not support bf16 natively. * Fix typo of deterministic	2024-08-08 19:44:15 -07:00
Xiang Zhang	c93b92a43f	fix wrong check for tree ensemble regressor (#21595 ) Add a missing ORT_ENFORCE check; without it, an out-of-bounds access caused a heap buffer overflow.	2024-08-07 16:27:18 -07:00
Yi Zhang	621b16f478	Pin transformer and optimum version (#21650 ) ### Motivation and Context To fix whisper test failure	2024-08-07 17:47:15 +08:00
duanshengliu	b95aa0563f	Improve speed in combining per-channel data (#21563 ) ### Description Improve speed when combining `per-channel` data by using a single `np.concatenate` instead of multiple `np.concatenate` calls within a for loop. ### Motivation and Context Fixes the issue https://github.com/microsoft/onnxruntime/issues/21562 Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>	2024-08-06 16:23:20 -07:00
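A small sketch makes the difference between the two approaches easy to see. The helper names are hypothetical, and the quantization tool's actual code may differ. Growing an array with one `np.concatenate` per channel copies the accumulated data on every iteration, so the total copying grows quadratically with the number of channels. A single call copies each element once.

```python
import numpy as np


def combine_per_channel_loop(chunks):
    """Slow pattern: one np.concatenate per channel re-copies everything gathered so far."""
    combined = chunks[0]
    for chunk in chunks[1:]:
        combined = np.concatenate((combined, chunk))
    return combined


def combine_per_channel(chunks):
    """Single np.concatenate: each element is copied exactly once."""
    return np.concatenate(chunks)


chunks = [np.full(3, channel, dtype=np.float32) for channel in range(4)]
print(np.array_equal(combine_per_channel(chunks), combine_per_channel_loop(chunks)))  # True
```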
Adrian Lizarraga	0acefc7988	[QNN EP] Update QNN SDK to 2.25 (#21623 ) ### Description - Update pipelines to use QNN SDK 2.25 by default - Update ifdef condition to apply workaround for QNN LayerNorm validation bug to QNN SDK 2.25 (as well as 2.24) ### Motivation and Context Use the latest QNN SDK	2024-08-06 09:08:48 -07:00
liqun Fu	f6f9657fb6	Fix typos so to call correct vnni functions under vnni condition (#21625 ) ### Description Fix 2 typos in mlas avx 4bit gemm implementation to call correct vnni functions under vnni condition ### Motivation and Context needed for 1.19.0 release Signed-off-by: liqunfu <liqun.fu@microsoft.com>	2024-08-05 20:52:26 -07:00
Prathik Rao	134f47743e	bumps up version in main from 1.19 -> 1.20 (#21588 ) Bump up version in main from 1.19.0 to 1.20.0 since the release branch has been cut.	2024-08-05 15:46:04 -07:00
Po-Wei (Vincent)	2653226ed0	Fail tests gracefully for the minimal cuda build (#21391 ) ### Description Several tests result in segfaults during the minimal cuda build. Although test failures are expected due to the limitations of the minimal CUDA EP, failing gracefully would be much preferred. ### Motivation and Context To reproduce: 1. Build ORT with:
```bash
./build.sh --build_shared_lib --use_full_protobuf --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --tensorrt_home /TensorRT-10.0.1.6 --parallel --skip_tests --skip_submodule_sync --allow_running_as_root --use_tensorrt --cmake_extra_defines onnxruntime_CUDA_MINIMAL=1
```
2. Run `onnxruntime_test_all`
```bash
...
[----------] 1 test from AllocationPlannerTest
[ RUN      ] AllocationPlannerTest.ReusedInputCrossDifferentStreams
Segmentation fault (core dumped)
```	2024-08-02 18:27:36 -07:00
Wanming Lin	8c641d7182	[WebNN EP] Support Dropout op (#21586 ) ### Description WebNN only supports test (inference) mode, so inputs and attributes that only matter in training mode are ignored, and WebNN's identity op is used to implement the Dropout op directly.	2024-08-02 16:25:04 -07:00
Ted Themistokleous	45b7c41ef0	[MIGraphX EP] Set External Data Path (#21598 ) ### Description Changes to set the external data path for model weight files, plus additional fixes to ensure this compiles against the latest v1.19 Onnxruntime. ### Motivation and Context The motivation for this change set is that larger models (like stable diffusion) use separate weight files. Co-authored-by: Jeff Daily <jeff.daily@amd.com> Co-authored-by: Artur Wojcik <artur.wojcik@amd.com> Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>	2024-08-02 16:19:04 -07:00

6778 commits