onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-15 18:23:41 +00:00

Author	SHA1	Message	Date
Satya Kumar Jandhyala	b33216be4c	[JS/WebGPU] Improve MatMulNBits perf (#19974) ### Description Improve performance using shared memory.	2024-04-12 11:03:05 -07:00
Changming Sun	794d39a977	LLVM16 compat changes (#20294) The change is similar to #15672 and #11667: it makes the code compatible with CUDA 12 and LLVM16 on Mariner2.	2024-04-12 10:16:12 -07:00
liqun Fu	cd7112f800	Integration with ONNX 1.16.0 (#19745) ### Description Update to the ONNX 1.16.0 branch following https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md ONNX 1.16.0 release notes: https://github.com/onnx/onnx/releases/tag/v1.16.0 #### Updated ops for CPU EP: - DequantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block dequantization support - QuantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block quantization support - Cast(21) - Missing int4 and uint4 support - CastLike(21) - Missing int4 and uint4 support - ConstantOfShape(21) - Missing int4 and uint4 support - Identity(21) - Missing int4 and uint4 support - If(21) - Missing int4 and uint4 support - Loop(21) - Missing int4 and uint4 support - Reshape(21) - Missing int4 and uint4 support - Scan(21) - Missing int4 and uint4 support - Shape(21) - Missing int4 and uint4 support - Size(21) - Missing int4 and uint4 support - Flatten(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Pad(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Squeeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Transpose(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Unsqueeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support #### Unimplemented opset 21 features/ops - int4 and uint4 data type - QLinearMatMul(21) - GroupNormalization(21) - ai.onnx.ml.TreeEnsemble(5)
### Disabled tests #### ORT Training orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py - test_ort_custom_ops: Potential shape inference bug for custom ops #### Python quantization unit tests test/onnx/python/quantization (shape inference bug) - test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16 - test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16 - test_op_gemm.py: test_quantize_qop_gemm_s8s8 - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3 - test_op_matmul.py: test_quantize_matmul_u8u8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy - test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile - test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution - test_op_relu.py: test_quantize_qop_relu_s8s8 #### ONNX tests - test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a maxpool output size bug and added this test. Enable this test when [ORT PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged. Refer to the original [ONNX PR](https://github.com/onnx/onnx/pull/5741). 
- test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op ai.onnx.ml.TreeEnsemble - test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same - test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same - test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same - test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4 yet - test_cast_INT4_to_INT8_cpu: same - test_cast_UINT4_to_FLOAT_cpu: same - test_cast_UINT4_to_UINT8_cpu: same - test_cast_INT4_to_FLOAT_cuda - test_cast_INT4_to_INT8_cuda - test_cast_UINT4_to_FLOAT_cuda - test_cast_UINT4_to_UINT8_cuda - test_constantofshape_float_ones_cuda: ConstantOfShape(21) not implemented for cuda - test_constantofshape_int_shape_zero_cuda: same - test_constantofshape_int_zeros_cuda: same - test_flatten_axis0_cuda: Flatten(21) not implemented for cuda - test_flatten_axis1_cuda: same - test_flatten_axis2_cuda: same - test_flatten_axis3_cuda: same - test_flatten_default_axis_cuda: same - test_flatten_negative_axis1_cuda: same - test_flatten_negative_axis2_cuda: same - test_flatten_negative_axis3_cuda: same - test_flatten_negative_axis4_cuda: same - test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not implemented in ORT yet - test_qlinearmatmul_2D_int8_float32_cpu: same - test_qlinearmatmul_2D_uint8_float16_cpu: same - test_qlinearmatmul_2D_uint8_float32_cpu: same - test_qlinearmatmul_3D_int8_float16_cpu: same - test_qlinearmatmul_3D_int8_float32_cpu: same - test_qlinearmatmul_3D_uint8_float16_cpu: same - test_qlinearmatmul_3D_uint8_float32_cpu: same - test_qlinearmatmul_2D_int8_float16_cuda: same - test_qlinearmatmul_2D_int8_float32_cuda: same - test_qlinearmatmul_2D_uint8_float16_cuda: same - test_qlinearmatmul_2D_uint8_float32_cuda: same - test_qlinearmatmul_3D_int8_float16_cuda: same - test_qlinearmatmul_3D_int8_float32_cuda: same - test_qlinearmatmul_3D_uint8_float16_cuda: same - test_qlinearmatmul_3D_uint8_float32_cuda: same - test_size_cuda: Size(21) not implemented for cuda - 
test_size_example_cuda: same - test_dequantizelinear_blocked: Missing implementation for block dequant for DequantizeLinear(21) - test_quantizelinear_blocked_asymmetric: Missing implementation for block quant for QuantizeLinear(21) - test_quantizelinear_blocked_symmetric: Missing implementation for block quant for QuantizeLinear(21) --------- Signed-off-by: liqunfu <liqun.fu@microsoft.com> Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>	2024-04-12 09:46:49 -07:00
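The DequantizeLinear(21) entries above add int16/uint16 inputs but leave block dequantization unimplemented. Both cases follow `y = (x - zero_point) * scale`. This is a minimal pure-Python sketch for a 1-D tensor. It assumes the opset 21 blocked layout, in which each scale/zero-point entry covers `block_size` consecutive elements along the quantized axis. `dequantize_linear` is a hypothetical helper, not an ORT API.

```python
from typing import List, Optional, Sequence


def dequantize_linear(
    x: Sequence[int],
    scale: Sequence[float],
    zero_point: Optional[Sequence[int]] = None,
    block_size: int = 0,
) -> List[float]:
    """Dequantize a 1-D tensor element-wise: y[i] = (x[i] - zero_point[b]) * scale[b].

    block_size == 0 means per-tensor quantization (one scale, b is always 0);
    otherwise element i belongs to block b = i // block_size.
    """
    if zero_point is None:
        zero_point = [0] * len(scale)  # the zero point defaults to 0
    if block_size == 0 and len(scale) != 1:
        raise ValueError("per-tensor mode expects exactly one scale")
    out = []
    for i, q in enumerate(x):
        b = 0 if block_size == 0 else i // block_size
        out.append((q - zero_point[b]) * scale[b])
    return out
```

For example, `dequantize_linear([-32768, 0, 32767], [0.5])` spans the full int16 range. `dequantize_linear([10, 12, 100, 104], [0.1, 2.0], [10, 100], block_size=2)` gives each pair of elements its own scale and zero point.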
Patrice Vignola	a0d5067341	[DML EP] Move operators to feature level 6.4 (#20290) Those operators won't be in the next version of DML, but they will come in the version right after.	2024-04-12 00:02:27 -07:00
Adrian Lizarraga	327fb1fde3	[QNN EP] Use QNN's ResizeBilinear operator for specific configs of ONNX Resize (#20292) ### Description Uses QNN's ResizeBilinear operator for ONNX Resize with: - input rank: 4 - mode: linear - coordinate transformation mode: half_pixel, align_corners, or asymmetric #### Mapping matrix of ONNX Resize w/ "linear" mode on HTP backend. Table entries correspond to the QNN operator used for the given configuration (Resize = QNN Resize op, RBL = QNN ResizeBilinear op, X = Unsupported).

| coordinate_transformation_mode | input_rank < 3 | input_rank = 3 | input_rank = 4 | input_rank = 5 | input_rank > 5 |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| half_pixel | X | Resize | RBL | Resize | X |
| pytorch_half_pixel | X | Resize | Resize | Resize | X |
| align_corners | X | Resize | RBL | Resize | X |
| asymmetric | X | Resize | RBL | Resize | X |

### Motivation and Context QNN's ResizeBilinear operator seems to perform better (lower latency) than QNN's Resize operator for certain configurations.	2024-04-11 22:38:55 -07:00
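The mapping matrix for "linear" Resize on HTP can be written as a small lookup. This is a hypothetical Python helper for illustration only, not the QNN EP's implementation. It encodes just the four coordinate transformation modes the table covers and rejects any other mode.

```python
from typing import Optional

# Modes for which rank-4 inputs map to QNN ResizeBilinear (per the table).
_RBL_MODES = {"half_pixel", "align_corners", "asymmetric"}
_KNOWN_MODES = _RBL_MODES | {"pytorch_half_pixel"}


def select_qnn_resize_op(coordinate_transformation_mode: str, input_rank: int) -> Optional[str]:
    """Return "ResizeBilinear", "Resize", or None (unsupported) for mode="linear"."""
    if coordinate_transformation_mode not in _KNOWN_MODES:
        raise ValueError(f"mode not covered by the table: {coordinate_transformation_mode}")
    if input_rank < 3 or input_rank > 5:
        return None  # the "X" columns
    if input_rank == 4 and coordinate_transformation_mode in _RBL_MODES:
        return "ResizeBilinear"  # the "RBL" cells
    return "Resize"
```

Only rank-4 inputs with one of the three listed modes take the ResizeBilinear path. Every other supported configuration falls back to QNN Resize.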
dependabot[bot]	9ca1afa25c	Bump protobufjs from 7.2.4 to 7.2.5 in /js/web (#20270) Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.2.4 to 7.2.5. Release notes (sourced from [protobufjs's releases](https://github.com/protobufjs/protobuf.js/releases)): protobufjs v7.2.5, [7.2.5](https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.5) (2023-08-21). Bug Fixes: - crash in comment parsing ([#1890](https://redirect.github.com/protobufjs/protobuf.js/issues/1890)) (eaf9f0a) - deprecation warning for new Buffer ([#1905](https://redirect.github.com/protobufjs/protobuf.js/issues/1905)) (e93286e) - possible infinite loop when parsing option ([#1923](https://redirect.github.com/protobufjs/protobuf.js/issues/1923)) (f2a8620). Changelog (sourced from [protobufjs's changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md)): same entries as the release notes. Commits: - 4436cc7 chore: release master ([#1925](https://redirect.github.com/protobufjs/protobuf.js/issues/1925)) - e93286e fix: deprecation warning for new Buffer ([#1905](https://redirect.github.com/protobufjs/protobuf.js/issues/1905)) - eaf9f0a fix: crash in comment parsing ([#1890](https://redirect.github.com/protobufjs/protobuf.js/issues/1890)) - f2a8620 fix: possible infinite loop when parsing option ([#1923](https://redirect.github.com/protobufjs/protobuf.js/issues/1923)) - See full diff in [compare view](https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.5). 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-11 22:07:08 -07:00
Wanming Lin	667d2eb8e6	[WebNN EP] Support Gelu op (#20240)	2024-04-11 19:55:44 -07:00
Patrice Vignola	12042a9387	[DML] Add FastGelu (#20066) Although DML doesn't have a "fast" GELU approximation operator, its standard GELU operator is still faster than composing FastGelu from separate elementwise operators.	2024-04-11 14:40:28 -07:00
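ORT's FastGelu is the tanh approximation of GELU, and this commit maps it to DML's exact GELU. A quick Python check shows how small the numerical difference between the two is. It assumes the standard tanh formula and ignores FastGelu's optional bias input. The helper names are illustrative.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation used by "fast" GELU variants.
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def max_abs_diff(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    # Sample the interval uniformly and report the worst-case gap.
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)


print(f"max |exact - tanh approx| on [-6, 6]: {max_abs_diff():.2e}")
```

The gap is small enough that using the exact operator in place of the approximation should not change results meaningfully for typical activations.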
Yulong Wang	50bd4571ac	[js/web] support SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20277) ### Description Support operators `SimplifiedLayerNorm` and `SkipSimplifiedLayerNorm` for the WebGPU backend.	2024-04-11 14:08:50 -07:00
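SimplifiedLayerNorm is commonly described as RMSNorm: unlike LayerNorm, it skips mean subtraction and the bias term. The Skip variant first adds a residual (skip) input. This is a minimal pure-Python sketch for a single row, assuming that definition and an optional bias on the skip path. It is not the WebGPU kernel, and the function names are hypothetical. Returning the pre-normalization sum as well is also an assumption, modeled on skip-layernorm ops that expose it as an optional output.

```python
import math
from typing import List, Optional, Sequence, Tuple


def simplified_layer_norm(x: Sequence[float], gamma: Sequence[float], eps: float = 1e-5) -> List[float]:
    # RMS normalization over the last axis: no mean subtraction, no beta.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gamma)]


def skip_simplified_layer_norm(
    x: Sequence[float],
    skip: Sequence[float],
    gamma: Sequence[float],
    bias: Optional[Sequence[float]] = None,
    eps: float = 1e-5,
) -> Tuple[List[float], List[float]]:
    # Residual add first (plus optional bias), then RMS-normalize the sum.
    summed = [a + b for a, b in zip(x, skip)]
    if bias is not None:
        summed = [s + c for s, c in zip(summed, bias)]
    return simplified_layer_norm(summed, gamma, eps), summed
```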
Maximilian Müller	2d0e1df80a	TRT detailed log and strong typed networks (#19695) ### Description @chilo-ms it seems sensible to me to forward the detailed log argument to the TRT logger itself. Also, when no precision downcast is wanted, this ensures that TRT 9+ actually sticks to the ONNX precision.	2024-04-11 13:40:13 -07:00
Jeff Bloomfield	f7e2faf961	Re-enable MatMul QDQ fusions with the DML EP (#20248) This re-enables MatMul QDQ fusions with the DML EP now that bugs in related DML kernels previously encountered in the pipeline are expected to be addressed.	2024-04-11 10:43:20 -07:00
Wanming Lin	ee603ee326	[WebNN EP] Fixed WebNN constant operand is detached issue (#20229) Wasm allows the memory to grow, and growing it reallocates the underlying array buffers. The WebNN EP passed a wasm memory view directly to a WebNN constant, so the JS side could later see that constant as a detached buffer. The fix is to create a copy of the data for each WebNN constant.	2024-04-10 20:30:03 -07:00
Yufeng Li	e6ca360695	fix build break in kernel_explorer (#20235)	2024-04-10 15:01:44 -07:00
Chi Lo	47072509c7	[TensorRT EP] Fix updating provider options from session options (#20246) The issue occurs when the user specifies a path for "ep.context_file_path" in session options. `context_cache_path` is a local variable, so it is destroyed when `UpdateOrtTensorRTProviderOptionsV2FromSessionOptionsConfigs()` returns. Later, `onnxruntime::TensorrtProviderFactoryCreator::Create(&new_tensorrt_options)` accesses freed memory, because the option stores the pointer returned by `context_cache_path.c_str()`. Inlining `UpdateOrtTensorRTProviderOptionsV2FromSessionOptionsConfigs()` fixes this issue.	2024-04-10 12:48:37 -07:00
MasayoshiTsutsui	6a9d8a9030	[js/webgpu] implement DepthToSpace operator in webgpu (#19948) ### Description This PR supports the [DepthToSpace](https://onnx.ai/onnx/operators/onnx__DepthToSpace.html#depthtospace) operator in the webgpu backend. ### Test We followed the steps described on [this page](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce) to build, ran the following command, and confirmed that the existing Model and Op tests pass (these test cases were probably written earlier for the WebGL backend).
```
~/onnxruntime/js/web> % npm test -- suite0 -b=webgpu --wasm-number-threads=1 --debug
```
##### NOTE The main branch already fails 5 tests for the resize_upsample_sizes_nearest operator. Since I didn't touch that code, those test cases still fail in my branch as well. Should I post an issue for this? ### Motivation and Context Although the DepthToSpace operator plays a crucial role in super-resolution, it was not supported in the webgpu backend.	2024-04-10 12:13:46 -07:00
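The ONNX spec defines DepthToSpace with two channel orderings: DCR (the default) and CRD. This is a pure-Python reference sketch that computes each output element by index arithmetic. It is not the WebGPU shader from this PR, and `depth_to_space` is a hypothetical helper operating on nested lists.

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]  # nested lists in NCHW order


def depth_to_space(x: Tensor4D, blocksize: int, mode: str = "DCR") -> Tensor4D:
    """Rearrange an [N, C, H, W] tensor into [N, C // bs**2, H * bs, W * bs]."""
    if mode not in ("DCR", "CRD"):
        raise ValueError(f"unknown mode: {mode}")
    bs = blocksize
    n, c = len(x), len(x[0])
    h, w = len(x[0][0]), len(x[0][0][0])
    if c % (bs * bs) != 0:
        raise ValueError("C must be divisible by blocksize**2")
    c_out = c // (bs * bs)
    y = [[[[0.0] * (w * bs) for _ in range(h * bs)] for _ in range(c_out)] for _ in range(n)]
    for b in range(n):
        for co in range(c_out):
            for i in range(h * bs):
                for j in range(w * bs):
                    di, dj = i % bs, j % bs  # position inside the bs x bs output block
                    if mode == "DCR":
                        # depth-column-row: input channels are laid out as (di, dj, co)
                        src_c = (di * bs + dj) * c_out + co
                    else:
                        # column-row-depth: input channels are laid out as (co, di, dj)
                        src_c = co * bs * bs + di * bs + dj
                    y[b][co][i][j] = x[b][src_c][i // bs][j // bs]
    return y
```

With 8 input channels and `blocksize=2`, DCR interleaves the two output channels across the block offsets, whereas CRD fills each output channel from 4 consecutive input channels.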
Dmitri Smirnov	89a96bdc34	Reduce heap contention in Tokenizer (#20196) ### Description Re-use vector buffers to prevent frequent reallocations. ### Motivation and Context Reduce process heap contention. ![image](https://github.com/microsoft/onnxruntime/assets/11303988/f0b78062-3d86-45b7-87fd-e0696b170cf8)	2024-04-10 12:12:17 -07:00
Yifan Li	9577fe454d	[EP Perf] Customize onnx-tensorrt commit id when init CI tasks (#20175) ### Description Allow customizing the onnx-tensorrt commit id through the EP Perf CI variables when testing OSS parsers of different versions. ### To Verify ![image](https://github.com/microsoft/onnxruntime/assets/109183385/9dc650d8-377d-4223-8951-f0849b1fe984) After `onnxTensorrtCommitId` is assigned in the EP Perf CI variables, CI prints the following during the [Build latest ORT Image with TensorRT OSS parser](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396450) step:
```
Updated deps.txt with new commit id a43ce67187bab219520fd80f21af8bbd4354bc8c and hash 572535aefef477050f86744dfab1fef840198035
```
CI then [overwrites the onnx_tensorrt line in deps.txt](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396451) with:
```
onnx_tensorrt;`a43ce67187`.zip;572535aefef477050f86744dfab1fef840198035
```
### Motivation and Context This saves the time spent modifying deps.txt and manually calculating the zip hash.	2024-04-10 09:46:05 -07:00
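The CI step rewrites one `deps.txt` entry of the form `name;url;hash`. This is a hypothetical Python sketch of that update. It assumes the archive has already been downloaded and that the third field is the archive's SHA-1 (the 40-hex-digit value above is consistent with that). `update_deps_entry` and `sha1_of_bytes` are illustrative names, not the CI's actual script.

```python
import hashlib
from typing import List


def sha1_of_bytes(data: bytes) -> str:
    # Assumed hash algorithm: SHA-1, matching the 40-hex-digit values in deps.txt.
    return hashlib.sha1(data).hexdigest()


def update_deps_entry(lines: List[str], name: str, new_url: str, new_hash: str) -> List[str]:
    """Return a copy of deps.txt lines with the `name;url;hash` entry replaced."""
    updated = []
    found = False
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) == 3 and fields[0] == name:
            updated.append(f"{name};{new_url};{new_hash}")
            found = True
        else:
            updated.append(line)  # comments and other dependencies pass through
    if not found:
        raise KeyError(f"no deps.txt entry named {name!r}")
    return updated
```

Raising on a missing entry, instead of silently appending one, keeps a typo in the dependency name from producing a build that quietly uses the old commit.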
guyang3532	471e969e2f	Check padding density by input of embedding module (#19821) ### Description The PaddingElimination optimization is enabled when the density of embedding padding is less than 90%, so we need to check that density to decide whether to enable the optimization. Before this PR, we only checked the graph inputs, correlating one of them with the embedding node by walking the graph back from the embedding node to a graph input. This does not generalize well, because there may be complicated patterns between the graph input and the embedding node. This PR instead checks the padding density on the direct input of the embedding module, rather than on a graph input, at the first graph execution when exporting the ONNX graph. If the density is < 90%, a flag PythonOp is inserted after the embedding node:
```
Embedding
    |
PythonOp (func_name:_FlagPaddingElimination)   (inserted if density < 90%)
    |
Following graph
```
When PaddingElimination is invoked, it checks whether the flag PythonOp (func_name:_FlagPaddingElimination) follows the Embedding node. If so, it removes the flag and performs the padding elimination optimization.	2024-04-10 18:45:51 +08:00
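This is a hypothetical sketch of the density check in plain Python. It assumes "density" means the percentage of non-padding tokens in the embedding module's direct input (the token ids), and that `padding_idx` identifies the padding token. Neither function is ORT's actual implementation.

```python
from typing import Sequence


def padding_density(input_ids: Sequence[Sequence[int]], padding_idx: int) -> float:
    """Percentage of non-padding tokens in a [batch, seq_len] embedding input."""
    total = sum(len(row) for row in input_ids)
    if total == 0:
        return 100.0  # treat an empty batch as fully dense
    valid = sum(1 for row in input_ids for tok in row if tok != padding_idx)
    return 100.0 * valid / total


def should_flag_padding_elimination(
    input_ids: Sequence[Sequence[int]], padding_idx: int, threshold: float = 90.0
) -> bool:
    # Insert the _FlagPaddingElimination marker only for sufficiently sparse batches.
    return padding_density(input_ids, padding_idx) < threshold
```

For example, the batch `[[5, 6, 7, 0], [8, 0, 0, 0]]` with `padding_idx=0` has a density of 50%, so it would be flagged. A batch without padding has a density of 100% and would not be flagged.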
Yi Zhang	0acde1157a	Set parallel count to avoid OOM in training GPU packaging pipeline (#20255) ### Description Make the compilation work on the Azure CPU agent by reducing the parallel count. ### Motivation and Context The OOM issue mentioned in #20244 was caused by too little memory per parallel job (low memory / parallel_count).	2024-04-10 14:05:53 +08:00
pengwa	280b2634c5	Prompt layer-wise recompute when applicable (#20126) ### Prompt layer-wise when applicable When we find that the checkpoint function is used, give users an explicit prompt on export failure to enable layer-wise memory optimization. - Using the checkpoint function is a strong indicator that the model is too large to fit in GPU memory. - If we don't override the checkpoint function here, ONNX export will most likely fail: 1. With older PyTorch versions, we simply throw an exception when handling the gradient checkpoint feature. 2. With newer PyTorch versions, the export fails. - Neither failure tells users explicitly "HOW" to mitigate it. This PR adds that guidance. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/c0476748-5818-4cc8-b2d6-88c7580fe4da)	2024-04-10 11:50:28 +08:00
Yi Zhang	14d7872ce9	Reuse T4 for Cuda12.2 training packaging pipeline. (#20244) ### Description The training CUDA 12.2 packaging pipeline https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary has been running out of memory ever since PR #19910. I tried other CPU agents, for example D64as_v5 (256G memory) and D32as_v4 (128G memory and 256G SSD temp storage), but they still run out of memory, as shown in the image below. ![image](https://github.com/microsoft/onnxruntime/assets/16190118/5acde9ef-674f-4b6d-a1b3-b54647645083) It does work on T4, although T4 only has 4 vCPUs, 28G memory and 180G temp storage, and it takes much more time. ### Motivation and Context Restore the CUDA 12.2 training packaging pipeline first. More time is needed to investigate the root cause. ### Other Clues. With CUDA 12.2 on T4, these 2 compilation steps take nearly 6 minutes, and on a CPU machine the build runs out of memory. @ajindal1 cuda12.2 on T4:
```
2024-03-14T05:39:08.7726865Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-14T05:45:01.3223393Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
2024-03-14T05:46:07.9218003Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim96_fp16_sm80.cu.o
2024-03-14T05:52:59.2387051Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu.o
```
With CUDA 11.8, however, they finish in about one minute on CPU:
```
cuda11.8 on CPU
2024-04-09T11:34:35.0849836Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-04-09T11:35:53.6648154Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
cuda11.8 on GPU
2024-03-13T12:16:33.4102477Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-13T12:19:58.8268272Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
```
	2024-04-10 09:21:40 +08:00
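The "nearly 6 minutes" versus "about one minute" comparison comes from the gaps between consecutive log timestamps. This small Python sketch reproduces that arithmetic for the CUDA 12.2 on T4 excerpt. `step_gaps` is a hypothetical helper: it truncates timestamps to whole seconds, and in a parallel build a gap only approximates how long the earlier object took to compile.

```python
from datetime import datetime
from typing import List, Tuple


def step_gaps(log_lines: List[str]) -> List[Tuple[str, int]]:
    """Seconds between consecutive "Building CUDA object" lines.

    Each gap is attributed to the object file named on the earlier line.
    """
    parsed = []
    for line in log_lines:
        stamp, _, rest = line.partition(" ")
        t = datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S")  # drop fractional seconds
        obj = rest.split()[-1].rsplit("/", 1)[-1]  # object file name only
        parsed.append((t, obj))
    return [
        (obj, int((t_next - t).total_seconds()))
        for (t, obj), (t_next, _) in zip(parsed, parsed[1:])
    ]


PREFIX = ("[ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/"
          "onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/")
cuda122_on_t4 = [
    "2024-03-14T05:39:08.7726865Z " + PREFIX + "flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o",
    "2024-03-14T05:45:01.3223393Z " + PREFIX + "flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o",
    "2024-03-14T05:46:07.9218003Z " + PREFIX + "flash_attention/flash_fwd_split_hdim96_fp16_sm80.cu.o",
    "2024-03-14T05:52:59.2387051Z " + PREFIX + "group_query_attention_impl.cu.o",
]
for obj, secs in step_gaps(cuda122_on_t4):
    print(f"{obj}: {secs} s")
```

For the CUDA 12.2/T4 excerpt the gaps are 353 s, 66 s and 412 s. The same arithmetic on the CUDA 11.8 excerpts gives 78 s (CPU) and 205 s (GPU) for the hdim32 step.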
Dmitri Smirnov	7d8dea9f10	Reduce Heap contention in StringNormalizer (#20182) ### Description Re-use pre-computed and pre-allocated buffers for UNICODE conversions. Make sure we do not introduce unnecessary intermediate `std::string` instances. Create a Utf8Generic converter for use on non-Windows platforms. ### Motivation and Context This reduces heap contention for a P1 customer. ![image](https://github.com/microsoft/onnxruntime/assets/11303988/fd39fb01-7361-47d2-8f83-69dbc3bbc65c)	2024-04-09 16:10:31 -07:00
pengwa	81005e2c92	Optimize constant sharing perf (#20143) ### Optimize constant sharing perf by avoiding renaming the first time we detect a constant pattern. Currently, every time ConstantSharing runs, we create a new NodeArg with a unique name for each initializer whose pattern has not been seen yet. Later, if other initializers share the same pattern, they are replaced by that NodeArg. The problem is that even when there are no real constant-sharing cases, we still modify the graph for every initializer, which is unnecessary.	2024-04-09 12:04:36 +08:00
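The change amounts to renaming only when a pattern is seen for the second time. This hypothetical Python sketch assumes each initializer is summarized by a hashable "pattern" (for example value, dtype and shape). It is not ORT's ConstantSharing code.

```python
from typing import Dict, Hashable


def plan_constant_sharing(initializers: Dict[str, Hashable]) -> Dict[str, str]:
    """Map each duplicate initializer name to the first name seen with the same pattern.

    The first initializer of a pattern keeps its own name, so a graph with no
    duplicates produces an empty plan and needs no modification.
    """
    first_by_pattern: Dict[Hashable, str] = {}
    replacements: Dict[str, str] = {}
    for name, pattern in initializers.items():  # dicts preserve insertion order
        canonical = first_by_pattern.setdefault(pattern, name)
        if canonical != name:
            replacements[name] = canonical
    return replacements
```

An empty result means the pass can leave the graph untouched, which is the saving the commit describes.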
danyue	07b5377f7c	Add INT16 and UINT16 compatibility for relu_quantizelinear (#20187) ### Description The relu_quantizelinear transformer has a problem that causes wrong results. This PR fixes it. ### Motivation and Context The transformer does not handle the case where Q's zero point is tensor(int16) or tensor(uint16), so an error occurs when that happens. How to verify:
```python
import numpy as np
import onnx
import onnxruntime as ort

model_name = 'relu_quantize_testcase.onnx'
input_name = 'generator/conv2d_input/conv2d/Conv2D:0'
ort_input0 = np.random.rand(1, 64, 64, 128).astype(np.float32)

# infer with GraphOptimizationLevel=0 (all graph optimizations disabled)
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
ort_session = ort.InferenceSession(model_name, sess_options=so, providers=["CPUExecutionProvider"])
outputs = [x.name for x in ort_session.get_outputs()]
ort_outs_mod = ort_session.run(outputs, {input_name: ort_input0})
del ort_session

# infer with the default GraphOptimizationLevel
model_orig = onnx.load(model_name)
ort_session_orig = ort.InferenceSession(model_orig.SerializeToString(), providers=["CPUExecutionProvider"])
outputs_orig = [x.name for x in ort_session_orig.get_outputs()]
ort_outs_orig = ort_session_orig.run(outputs_orig, {input_name: ort_input0})

# diff: expected to be close to 0 when the optimized graph is correct
print(np.linalg.norm(ort_outs_mod[0].astype(np.float32) - ort_outs_orig[0].astype(np.float32)))
del ort_session_orig
```
[relu_quantize_testcase.zip](https://github.com/microsoft/onnxruntime/files/14848160/relu_quantize_testcase.zip) --------- Co-authored-by: genmingz <genming.zhong@amd.com>	2024-04-08 19:41:43 -07:00
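The zero point's type matters because a Relu in front of QuantizeLinear is redundant only when the zero point equals the minimum of the quantized type. In that case every negative input already saturates to that minimum, which represents the real value 0. This plain-Python sketch assumes that is the condition the relu_quantizelinear transformer checks. It shows a per-type range table that includes int16 and uint16. The names are hypothetical.

```python
from typing import Dict, Tuple

# (qmin, qmax) for the zero-point types the transformer needs to handle.
QRANGE: Dict[str, Tuple[int, int]] = {
    "int8": (-128, 127),
    "uint8": (0, 255),
    "int16": (-32768, 32767),
    "uint16": (0, 65535),
}


def quantize(x: float, scale: float, zero_point: int, qtype: str) -> int:
    # QuantizeLinear: saturate(round(x / scale) + zero_point); round() is half-to-even.
    qmin, qmax = QRANGE[qtype]
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def relu_is_redundant(zero_point: int, qtype: str) -> bool:
    # Negative inputs already saturate to qmin, i.e. the real value 0.
    return zero_point == QRANGE[qtype][0]
```

If a lookup like this has no entry for int16 or uint16, the redundancy check cannot be evaluated for those zero-point types.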
pengwa	41acd8c543	Support more ops for recompute (#20234) ### Support more ops for recompute To cover the Mistral model and support padding elimination ops.	2024-04-09 09:24:48 +08:00
Adam Louly	22a61a3cf5	Fix Mixtral Parity test to keep it consistent with Transformers. (#20210) ### Description I recently opened a PR in the HF transformers repo to fix an issue in the indexing part: https://github.com/huggingface/transformers/issues/29857. The ONNX exporter was failing because of the tolist() conversion, so we had to remove it. The same code is also part of our codebase, so this PR keeps the two consistent.	2024-04-08 13:04:12 -07:00
wejoncy	908a76d675	fix "4bit quantization scales and zeropoint tensor shape" (#19986)	2024-04-08 10:15:28 -07:00
Jiajie Hu	23d3afd4fe	[js/webgpu] Implement com.microsoft.RotaryEmbedding (#20209) ### Description https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftrotaryembedding ### Motivation and Context As per customer request, this helps Phi-2 and Gemma.	2024-04-08 09:11:26 -07:00
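Rotary embedding rotates pairs of elements in each head vector by a position-dependent angle. This is a generic pure-Python sketch of the technique, not the WebGPU kernel or the exact contrib-op semantics. It assumes a non-interleaved (half-split) layout and the common base of 10000, and it computes the angles directly instead of reading precomputed cos/sin caches.

```python
import math
from typing import List, Sequence


def rotary_embedding(x: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Apply rotary position embedding to one head vector (non-interleaved layout).

    Element i in the first half is paired with element i + d/2 and rotated by
    the angle position * base**(-2i/d).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head size must be even")
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * cos_t - x2 * sin_t
        out[i + half] = x1 * sin_t + x2 * cos_t
    return out
```

Because each pair is rotated rigidly, the vector norm is preserved. The dot product of a rotated query and key also depends only on the difference between their positions, which is the property that makes RoPE a relative position encoding.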
cloudhan	e19c778934	Improve KE for command-line and programmatically tuning dispatch (#18778)	2024-04-08 11:08:59 +08:00
Ye Wang	cc3faba616	Support seq_len > 64K in rotary embedding cuda kernel (#20204)	2024-04-05 19:52:55 -07:00
Francesco	6af02ae06a	Remove non-existing function call (#19416) This function call is confusing, since it calls a function that is not defined. The function was correctly replaced (`compute_range()` became `compute_data()`), but the call was reintroduced in a later PR. ### Description Problem as described in [this issue](https://github.com/microsoft/onnxruntime/issues/18893). The examples contain several calls to compute_range() from calibrate.py, and calibrate.py itself also still contains one. The problem is that it was [replaced here](https://github.com/microsoft/onnxruntime/pull/16550/files#diff-75e84436a983e17527f8b5bc585087e7ad75b3b515c2101c2a82dcaecca490de) from `compute_range()` to `compute_data() -> TensorsData` and then mistakenly [added back as a call here](https://github.com/microsoft/onnxruntime/pull/17029/files#diff-75e84436a983e17527f8b5bc585087e7ad75b3b515c2101c2a82dcaecca490de). ### Motivation and Context In this PR I suggest removing the confusing call `self.calibrate_range()` in calibrate.py. Once it is removed and packaged, the examples in the onnx-runtime-examples repository will need to be adapted, since they already do not work. The calls to `compute_range()` in the examples are linked in [this issue](https://github.com/microsoft/onnxruntime/issues/18893).	2024-04-05 19:48:48 -07:00
Adrian Lizarraga	05d97e8d18	Update QNN python packages to use QNN SDK version 2.19.2 (#20213) ### Description Update QNN python packages to use QNN SDK version 2.19.2. ### Motivation and Context Our CI builds already use QNN SDK version 2.19.2. We should make sure the ort-nightly-qnn python packages are also built with the same QNN SDK version.	2024-04-05 17:15:25 -07:00
Yi Zhang	23a5d0a305	Extend time out in Windows GPU packaging jobs (#20207) ### Description Extend the Windows GPU packaging job's build timeout to 6 hours and the test stage timeout to 3 hours. ### Motivation and Context There are still a few timeout failures after the refactoring, with a probability of about 20% in https://dev.azure.com/aiinfra/Lotus/_build?definitionId=84. I found that when the build becomes slow, it can still finish within 4 hours (https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=434340&view=logs&j=0c6ee496-b38e-55a9-3699-12934156e90f), although in most cases it only takes about 30 minutes. Unlike before, when the build could not complete at all, it now does finish. So in this PR, I extend the timeout to 6 hours. Interestingly, if one Windows GPU job becomes slow, all other Windows GPU jobs in the same run become slow too, so I suspect it has something to do with ADO or the virtualization. That is, it's not completely random. https://dev.azure.com/aiinfra/Lotus/_build?definitionId=841	2024-04-06 08:03:42 +08:00
Andrew Grigorev	a6611409cc	Fix HalideIR title in third party notices reference (#20190)	2024-04-05 11:12:43 -07:00
dependabot[bot]	2a323eb670	Bump Sixlabors.ImageSharp from 2.1.1 to 2.1.7 in /csharp/sample/Microsoft.ML.OnnxRuntime.ResNet50v2Sample (#19805) Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.1 to 2.1.7. Release notes (sourced from [Sixlabors.ImageSharp's releases](https://github.com/SixLabors/ImageSharp/releases)): v2.1.7 What's Changed: - [release/2.1] Disallow allocation attempts of unrepresentable sizes by @antonfirsov in [SixLabors/ImageSharp#2553](https://redirect.github.com/SixLabors/ImageSharp/pull/2553) - [release/2.1] Tiff decoding robustness improvements ([#2550](https://redirect.github.com/SixLabors/ImageSharp/issues/2550)) by @antonfirsov in [SixLabors/ImageSharp#2554](https://redirect.github.com/SixLabors/ImageSharp/pull/2554) - [release/2.1] PBM decoder robustness improvements and BufferedReadStream observability by @antonfirsov in [SixLabors/ImageSharp#2555](https://redirect.github.com/SixLabors/ImageSharp/pull/2555) - Backport 2681 by @JimBobSquarePants in [SixLabors/ImageSharp#2688](https://redirect.github.com/SixLabors/ImageSharp/pull/2688) Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.6...v2.1.7 v2.1.6 What's Changed: - Backport - Handle EOF in Jpeg bit reader when data is bad to prevent DOS attack. by @JimBobSquarePants in [SixLabors/ImageSharp#2524](https://redirect.github.com/SixLabors/ImageSharp/pull/2524) Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.5...v2.1.6 v2.1.5 What's Changed: - Backport [#2501](https://redirect.github.com/SixLabors/ImageSharp/issues/2501) by @JimBobSquarePants in [SixLabors/ImageSharp#2509](https://redirect.github.com/SixLabors/ImageSharp/pull/2509) Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.4...v2.1.5 v2.1.4 What's Changed: - Backport WebP fix to 2.1 by @antonfirsov in [SixLabors/ImageSharp#2420](https://redirect.github.com/SixLabors/ImageSharp/pull/2420) Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.3...v2.1.4 v2.1.3 What's Changed: - V2 Backport: 2133, 2154 by @JimBobSquarePants in [SixLabors/ImageSharp#2157](https://redirect.github.com/SixLabors/ImageSharp/pull/2157) Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.2...v2.1.3 v2.1.2 What's Changed: - Backport - Issue 2123 by @JimBobSquarePants in [SixLabors/ImageSharp#2126](https://redirect.github.com/SixLabors/ImageSharp/pull/2126) Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.2 <details> <summary>Commits</summary> - fa7d712 Merge pull request [#2688](https://redirect.github.com/SixLabors/ImageSharp/issues/2688) from SixLabors/js/backport-2681 - 36b3533 Use correct property to disable upstream warnings. - 94bb761 Update ImageSharp.csproj - 3ea2574 Update PngDecoderCore.cs - e74a55f [release/2.1] PBM decoder robustness improvements and BufferedReadStream obse... - 749b1c0 [release/2.1] Tiff decoding robustness improvements ([#2550](https://redirect.github.com/SixLabors/ImageSharp/issues/2550)) ([#2554](https://redirect.github.com/SixLabors/ImageSharp/issues/2554)) - 3064b78 Merge pull request [#2553](https://redirect.github.com/SixLabors/ImageSharp/issues/2553) from SixLabors/backport/2.1.x/2545 - f36ec12 Disallow allocation attempts of unrepresentable sizes - 688e242 Merge pull request [#2524](https://redirect.github.com/SixLabors/ImageSharp/issues/2524) from SixLabors/js/backport-fix-jpeg-dos - 0f17a8b Handle EOF in Jpeg bit reader when data is bad to prevent DOS attack. - Additional commits viewable in [compare view](https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.7) 
</details> <br /> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-05 11:11:52 -07:00
Hector Li	1ccb164c12	Improve the script to add Q, DQ nodes around EPContext node (#20107 ) Improve the script that adds Q and DQ nodes around the EPContext node so that the wrapper model uses float data for its inputs and outputs. Users don't need to quantize or dequantize the data in their application.	2024-04-05 10:12:01 -07:00
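As a minimal sketch of what the Q/DQ wrapper computes, assuming per-tensor uint8 quantization with a single scale and zero point per side (the real script takes its scales, zero points and element types from the model, and the function names here are hypothetical), the float-in/float-out arithmetic looks like this in plain Python:

```python
from typing import Callable, List


def quantize_linear(values: List[float], scale: float, zero_point: int) -> List[int]:
    """uint8 QuantizeLinear: saturate(round(x / scale) + zero_point) to [0, 255].

    Python's round() on floats rounds half to even, which matches the
    rounding mode ONNX QuantizeLinear specifies.
    """
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize_linear(values: List[int], scale: float, zero_point: int) -> List[float]:
    """DequantizeLinear: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in values]


def wrapped_model(float_inputs: List[float],
                  quantized_core: Callable[[List[int]], List[int]],
                  in_scale: float, in_zp: int,
                  out_scale: float, out_zp: int) -> List[float]:
    """Hypothetical wrapper: Q -> quantized core (e.g. an EPContext node) -> DQ."""
    q_in = quantize_linear(float_inputs, in_scale, in_zp)
    q_out = quantized_core(q_in)
    return dequantize_linear(q_out, out_scale, out_zp)
```

With an identity core, `scale=0.5` and `zero_point=128`, the inputs `[0.0, 1.0, -1.0, 100.0]` come back as `[0.0, 1.0, -1.0, 63.5]`: the last value saturates at 255 before dequantization, which is the precision trade-off the caller no longer has to handle by hand.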
Guenther Schmuelling	c529e05e38	fix ConvTranspose 1D (#20194 )	2024-04-05 10:05:32 -07:00
dependabot[bot]	4f2d454211	Bump Sixlabors.ImageSharp from 2.1.1 to 2.1.7 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (#19806 ) Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.1 to 2.1.7. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/SixLabors/ImageSharp/releases">Sixlabors.ImageSharp's releases</a>.</em></p> <blockquote> <h2>v2.1.7</h2> <h2>What's Changed</h2> <ul> <li>[release/2.1] Disallow allocation attempts of unrepresentable sizes by <a href="https://github.com/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2553">SixLabors/ImageSharp#2553</a></li> <li>[release/2.1] Tiff decoding robustness improvements (<a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2550">#2550</a>) by <a href="https://github.com/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2554">SixLabors/ImageSharp#2554</a></li> <li>[release/2.1] PBM decoder robustness improvements and BufferedReadStream observability by <a href="https://github.com/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2555">SixLabors/ImageSharp#2555</a></li> <li>Backport 2681 by <a href="https://github.com/JimBobSquarePants"><code>@JimBobSquarePants</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2688">SixLabors/ImageSharp#2688</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.6...v2.1.7">https://github.com/SixLabors/ImageSharp/compare/v2.1.6...v2.1.7</a></p> <h2>v2.1.6</h2> <h2>What's Changed</h2> <ul> <li>Backport - Handle EOF in Jpeg bit reader when data is bad to prevent DOS attack. 
by <a href="https://github.com/JimBobSquarePants"><code>@JimBobSquarePants</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2524">SixLabors/ImageSharp#2524</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.5...v2.1.6">https://github.com/SixLabors/ImageSharp/compare/v2.1.5...v2.1.6</a></p> <h2>v2.1.5</h2> <h2>What's Changed</h2> <ul> <li>Backport <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2501">#2501</a> by <a href="https://github.com/JimBobSquarePants"><code>@JimBobSquarePants</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2509">SixLabors/ImageSharp#2509</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.4...v2.1.5">https://github.com/SixLabors/ImageSharp/compare/v2.1.4...v2.1.5</a></p> <h2>v2.1.4</h2> <h2>What's Changed</h2> <ul> <li>Backport WebP fix to 2.1 by <a href="https://github.com/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2420">SixLabors/ImageSharp#2420</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.3...v2.1.4">https://github.com/SixLabors/ImageSharp/compare/v2.1.3...v2.1.4</a></p> <h2>v2.1.3</h2> <h2>What's Changed</h2> <ul> <li>V2 Backport: 2133, 2154 by <a href="https://github.com/JimBobSquarePants"><code>@JimBobSquarePants</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2157">SixLabors/ImageSharp#2157</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.2...v2.1.3">https://github.com/SixLabors/ImageSharp/compare/v2.1.2...v2.1.3</a></p> <h2>v2.1.2</h2> <h2>What's Changed</h2> <ul> <li>Backport - Issue 2123 by <a href="https://github.com/JimBobSquarePants"><code>@JimBobSquarePants</code></a> in <a 
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2126">SixLabors/ImageSharp#2126</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.2">https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.2</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`fa7d712702`"><code>fa7d712</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2688">#2688</a> from SixLabors/js/backport-2681</li> <li><a href="`36b3533cc3`"><code>36b3533</code></a> Use correct property to disable upstream warnings.</li> <li><a href="`94bb7615a1`"><code>94bb761</code></a> Update ImageSharp.csproj</li> <li><a href="`3ea2574726`"><code>3ea2574</code></a> Update PngDecoderCore.cs</li> <li><a href="`e74a55fbfd`"><code>e74a55f</code></a> [release/2.1] PBM decoder robustness improvements and BufferedReadStream obse...</li> <li><a href="`749b1c04d7`"><code>749b1c0</code></a> [release/2.1] Tiff decoding robustness improvements (<a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2550">#2550</a>) (<a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2554">#2554</a>)</li> <li><a href="`3064b78927`"><code>3064b78</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2553">#2553</a> from SixLabors/backport/2.1.x/2545</li> <li><a href="`f36ec12695`"><code>f36ec12</code></a> Disallow allocation attempts of unrepresentable sizes </li> <li><a href="`688e242a84`"><code>688e242</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2524">#2524</a> from SixLabors/js/backport-fix-jpeg-dos</li> <li><a href="`0f17a8be9c`"><code>0f17a8b</code></a> Handle EOF in Jpeg bit reader when data is bad to prevent DOS attack.</li> <li>Additional commits viewable in <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.7">compare view</a></li> </ul> 
</details> <br /> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-05 08:32:18 -07:00
Edward Chen	2b3071119a	Add onnxruntime/test/run_benchmark.py helper script. (#19234 ) ### Description Add the onnxruntime/test/run_benchmark.py helper script, which repeats benchmark runs until a target coefficient of variation is reached. It works with [Google Benchmark](https://github.com/google/benchmark) programs like `onnxruntime_mlas_benchmark`. ### Motivation and Context Benchmark results sometimes vary from run to run. This script automates the repeated runs needed to get results that are stable enough.	2024-04-05 07:02:01 -07:00
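The stopping rule can be sketched as follows, assuming each run yields one positive timing and the coefficient of variation is the sample standard deviation divided by the mean. This is not the script's actual code; `run_once`, `min_runs` and `max_runs` are hypothetical names, and the real script drives Google Benchmark binaries rather than a Python callable:

```python
import statistics
from typing import Callable, List


def run_until_stable(run_once: Callable[[], float], target_cv: float,
                     min_runs: int = 3, max_runs: int = 50) -> List[float]:
    """Repeat run_once() until the coefficient of variation (stdev / mean) of
    the collected timings is <= target_cv, or until max_runs is reached.

    Timings are assumed to be positive, so the mean is never zero.
    """
    if min_runs < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    samples: List[float] = []
    while len(samples) < max_runs:
        samples.append(run_once())
        if len(samples) >= min_runs:
            cv = statistics.stdev(samples) / statistics.mean(samples)
            if cv <= target_cv:
                break
    return samples
```

A `max_runs` cap matters here: a genuinely noisy benchmark may never reach the target, and the loop should still terminate with whatever samples it has.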
Hans	6abfb6b928	[js/rn] Support load external data (#20090 ) Support loading external data by passing a local model path.	2024-04-05 05:55:03 -07:00
Scott McKay	f61cca1b8f	NNAPI: Improve MatMul diagnostic output (#19721 ) ### Description Reorder the checks so that we don't get two messages for a single node. Currently, the batched MatMul 'not supported' message appears for 2D input, which is valid, and this can be confusing. Change the order so we only check whether batched MatMul can be used when the input ranks are > 3, as that is one of its requirements. `c311d1faf5/onnxruntime/core/providers/nnapi/nnapi_builtin/builders/op_builder_helpers.cc (L257-L264)`	2024-04-04 21:58:39 -07:00
Thomas Boby	254bdbb19d	OneDNN/dnnl: Fix filepath after dnnl move (#20086 ) ### Description This adjusts the path used in the NuGet script for dnnl to the file's new location. As far as I can tell there is no CI pipeline for this, and I can't easily confirm that this change works on master, so please check. ### Motivation and Context It is currently not possible to build oneDNN NuGet packages. The correct action might be to move the file rather than fix this path, but I'm not familiar enough with the repository layout. --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2024-04-04 21:24:49 -07:00
Yi Zhang	4ea54b82f9	[Fix] Upload training CUDA daily wheel (#20183 )	2024-04-03 13:18:26 +08:00
Andrew Fantino	7303a90f49	Fix build errors from date/date.h C++20 compatibility (#20139 ) ### Description For C++ standards >= 20, use `std::chrono::operator<<` in place of `date::operator<<` to fix ambiguous operator compile error. ### Motivation and Context The external dependency HowardHinnant/date has a conflict with std::chrono for >=C++20. Solves #20137	2024-04-02 22:10:25 -07:00
Yi Zhang	dae77e6014	Support building Windows CUDA with Ninja (#20176 ) ### How to run it locally 1. conda install ninja 2. "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64 3. python.exe {ort_repo}\tools\ci_build\build.py --config RelWithDebInfo --build_dir {ort_repo}\build_cuda --skip_submodule_sync --build_csharp --update --parallel --cmake_generator "Ninja" --build_shared_lib --enable_onnx_tests --enable_pybind --build_java --build_nodejs --use_cuda "--cuda_home=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" --enable_cuda_profiling --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=60 4. cd build_cuda\RelWithDebInfo 5. cmake --build . -j16 ### Motivation and Context In the packaging pipelines, we intermittently hit an issue where building with CUDA on Windows takes too long, although moving the build to a CPU machine has reduced it considerably. We plan to build with Ninja instead of MSBuild in the packaging pipelines so that nvcc can run in parallel. This PR is the first step: supporting it locally.	2024-04-03 11:19:31 +08:00
Yulong Wang	fa1917b81b	[js/webgpu] add validation to workgroup size (#20110 ) ### Description add validation to workgroup size in `shaderHelper.mainStart()`.	2024-04-02 19:29:20 -07:00
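The kind of check this adds can be sketched as follows, assuming the default compute limits from the WebGPU specification. The actual validation lives in TypeScript inside `shaderHelper.mainStart()` and may read the device's real limits instead; the function name and limits dictionary below are illustrative only:

```python
from typing import Dict, Tuple

# Default limits from the WebGPU specification. A device created with higher
# requiredLimits may allow larger values.
DEFAULT_LIMITS: Dict[str, int] = {
    "maxComputeWorkgroupSizeX": 256,
    "maxComputeWorkgroupSizeY": 256,
    "maxComputeWorkgroupSizeZ": 64,
    "maxComputeInvocationsPerWorkgroup": 256,
}


def validate_workgroup_size(size: Tuple[int, int, int],
                            limits: Dict[str, int] = DEFAULT_LIMITS) -> None:
    """Raise ValueError if @workgroup_size(x, y, z) exceeds the given limits."""
    # Each dimension must be positive and within its per-axis limit.
    for axis, dim in zip("XYZ", size):
        if dim < 1:
            raise ValueError(f"workgroup size {axis} must be >= 1, got {dim}")
        limit = limits[f"maxComputeWorkgroupSize{axis}"]
        if dim > limit:
            raise ValueError(f"workgroup size {axis}={dim} exceeds limit {limit}")
    # The product of the dimensions is limited separately.
    x, y, z = size
    total = x * y * z
    if total > limits["maxComputeInvocationsPerWorkgroup"]:
        raise ValueError(
            f"{total} invocations per workgroup exceeds limit "
            f"{limits['maxComputeInvocationsPerWorkgroup']}"
        )
```

Note that `(16, 16, 2)` passes every per-axis check but still fails, because its 512 invocations exceed the per-workgroup total.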
Shubham Bhokare	be831e1ba3	Export of Openai Whisper with batched prompts (#19854 ) Adds an example that demonstrates exporting the OpenAI Whisper implementation with batch_size > 1 and a prompt for each audio snippet. It also handles prompts of different lengths. For example, if the prompt ids are [p1_id_1, p1_id_2] and [p2_id_1], the final decoder_input_ids look like this after padding: `[prev_token, p1_id_1, p1_id_2, start_token, lang_token, transcribe_token] [prev_token, p2_id_1, PAD_TOKEN, start_token, lang_token, transcribe_token]` --------- Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>	2024-04-02 17:01:48 -07:00
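The padding layout from the example can be sketched as follows, assuming each row is `prev_token`, then the prompt right-padded to the longest prompt in the batch, then the start, language and transcribe tokens. The token ids used in the usage note are placeholders, not Whisper's real special-token ids, and the function name is hypothetical:

```python
from typing import List, Sequence


def build_decoder_input_ids(prompts: Sequence[Sequence[int]], prev_token: int,
                            pad_token: int, start_token: int, lang_token: int,
                            transcribe_token: int) -> List[List[int]]:
    """Lay out one decoder_input_ids row per prompt, padding shorter prompts
    so that every row in the batch has the same length."""
    max_len = max((len(p) for p in prompts), default=0)
    batch: List[List[int]] = []
    for prompt in prompts:
        padding = [pad_token] * (max_len - len(prompt))
        batch.append([prev_token, *prompt, *padding,
                      start_token, lang_token, transcribe_token])
    return batch
```

With placeholder ids `prev_token=1`, `pad_token=0`, `start_token=2`, `lang_token=3` and `transcribe_token=4`, the prompts `[[11, 12], [21]]` produce `[[1, 11, 12, 2, 3, 4], [1, 21, 0, 2, 3, 4]]`, mirroring the commit's example. Padding between the prompt and the start token keeps the start, language and transcribe tokens at the same position in every row.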
Rachel Guo	19793de1b3	#19921 [Dup] LLC Core count calculations updated (#20171 ) ### Description See #19921. This addresses one review comment (https://github.com/microsoft/onnxruntime/pull/19921#discussion_r1543398640); because #19921 comes from an external branch, a separate pull request was needed for it. --------- Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Jian Chen <cjian@microsoft.com>	2024-04-02 16:53:47 -07:00
Dmitri Smirnov	12e2538065	Add new SessionOptions config entry to disable specific transformers and rules (#20135 ) ### Motivation and Context Certain transformers slow down session loading while providing no runtime performance benefit. This allows clients to exclude them.	2024-04-02 16:33:05 -07:00
Chi Lo	e916929371	[TensorRT EP] Address compiler warnings on Windows (#20134 ) A previous [PR](https://github.com/microsoft/onnxruntime/pull/19663) changed the MSVC compiler warning level from set_msvc_c_cpp_compiler_warning_level(3) to set_msvc_c_cpp_compiler_warning_level(4) when using the CUDA EP (this also applies to the TRT EP). Some warnings in the TRT EP code still needed to be addressed.	2024-04-02 10:39:46 -07:00

10884 commits