onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-04 04:07:22 +00:00

Author	SHA1	Message	Date
Edward Chen	76461c8f4d	Increase timeout for iOS packaging pipeline jobs. (#20434 )	2024-04-23 11:55:55 -07:00
Guenther Schmuelling	b8e6684313	more conservative gpu-buffer cache algo (#20312 ) tuned based on 80 models to keep performance impact minimal	2024-04-23 09:07:04 -07:00
guyang3532	ffb9c8d598	fix embedding sparsity log bug of -1% density (#20420 ) ### Description When embedding sparsity was not validly checked, the log printed a wrong "-1% density" message. This PR fixes it.	2024-04-23 20:37:50 +08:00
Scott McKay	ed6f1adcb8	Fix overflow causing test failure on x86 (#20425 ) ### Description Fix comparison that was not updated when the threshold was converted to bytes. ### Motivation and Context Fix CI failure	2024-04-23 21:33:59 +10:00
Maximilian Müller	5eae33fc6b	[CUDA EP] RNN check if tf32 is allowed (#20338 ) Respect the use_tf32 flag.	2024-04-23 00:19:09 -07:00
Yi Zhang	7ebc653f04	Revert "Nuget .NET changes for Mac Catalyst (#19923 )" (#20418 ) This reverts commit `f396748ed6`.	2024-04-23 15:08:12 +08:00
Adrian Lizarraga	e6a677f6b7	[QNN EP] Download QNN SDK from azure blob in packaging pipelines (#20359 ) ### Description - Updates Windows QNN Nuget and Python packaging pipelines to download QNN SDK from blob storage. - Makes the QNN SDK version configurable when launching the python packaging pipeline. ### Motivation and Context Removes the need to rebuild images to update QNN SDK. Only applies to Windows pipelines. Linux pipelines still get the SDK from disk.	2024-04-22 22:32:55 -07:00
aciddelgado	94c69f55d4	GQA 4 CPU (#20299 ) ### Description Support GQA operator on CPU with FP32. ### Motivation and Context Right now, models generated for CPU and GPU must be different. GQA CPU allows these models to be the same.	2024-04-22 19:57:05 -07:00
Scott McKay	c47a6ce70b	XNNPACK: Support 1D input for Conv and ConvTranspose (#20349 ) ### Description Support 1D input to XNNPACK Conv and ConvTranspose by using a fake height of 1 to convert it to 2D input. ### Motivation and Context Enable speech models with 1D input to use XNNPACK. There is no CPU EP quantized ConvTranspose, so this fills that gap.	2024-04-23 11:50:31 +10:00
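The fake-height idea in the XNNPACK entry above can be illustrated with a minimal pure-Python sketch. It is not the XNNPACK or ONNX Runtime code; the function names (`conv1d`, `conv2d`, `conv1d_via_2d`) are hypothetical, and it assumes stride 1, no padding, and a single channel.

```python
def conv1d(x, w):
    """Valid (no padding, stride 1) 1D cross-correlation over a list of floats."""
    n, k = len(x), len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(n - k + 1)]


def conv2d(x, w):
    """Valid 2D cross-correlation; x and w are lists of rows."""
    h, width = len(x), len(x[0])
    kh, kw = len(w), len(w[0])
    return [
        [
            sum(x[r + a][c + b] * w[a][b] for a in range(kh) for b in range(kw))
            for c in range(width - kw + 1)
        ]
        for r in range(h - kh + 1)
    ]


def conv1d_via_2d(x, w):
    """Run a 1D convolution through the 2D routine by faking a height of 1."""
    # Wrap the signal and the kernel as single-row 2D inputs, then unwrap the
    # single output row.
    return conv2d([x], [w])[0]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]
direct = conv1d(signal, kernel)
wrapped = conv1d_via_2d(signal, kernel)
```

Both paths give the same values here, which is why a 2D-only kernel can serve 1D inputs once the extra dimension is added and removed around the call.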
Edward Chen	3270a002fa	Fix handling of nodes that get assigned to kMSInternalNHWCDomain when loading an ORT format model. (#20379 ) Fix handling of nodes that get assigned to kMSInternalNHWCDomain when loading an ORT format model. The ORT format model doesn't contain information about kMSInternalNHWCDomain since it is set during layout transformation. Fall back to known domains instead.	2024-04-22 18:34:01 -07:00
Preetha Veeramalai	c7de4de501	OVEP Bug fix 1.18 (#20408 ) ### Description Contains critical bug fix ### Motivation and Context This PR handles the bug fix wrt OV caching and blob generation. This also handles the precision for AUTO plugin. --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2024-04-22 18:31:05 -07:00
pengwa	a7787a0bad	Introduce memory efficient topological sort (#20258 ) ### Introduce memory efficient topo sort (for training) ~~and laze initialize Priority-Based and Memory-Efficient topo sort. Because in most cases, they are not needed, so we free the overheads of GraphViewer construction for most use cases.~~	2024-04-23 08:00:23 +08:00
Scott McKay	9372e9a0a3	Support >2GB of Tensor data in training checkpoint (#20077 ) ### Description Add ability to store initializer data in an external file. Update training checkpoint code to use external file if data > ~2GB. I don't see a way for the flatbuffers 64-bit offsets to be used, as they don't support storing 'table' types with 64-bit offsets (and our Tensor is a 'table' type not a simple struct). `0cfb7eb80b/tests/64bit/test_64bit.fbs (L38-L39)` Allowing a Tensor to have its raw_data in an external file should hopefully work with the least friction. As it's an extra field it's backwards compatible. Please feel free to suggest alternative approaches. Side note: the diffs in the generated *.fbs.h files are unexpectedly large. Maybe they weren't re-generated when the new flatbuffers version was checked in. I updated by running: `python .\compile_schema.py -f <build output dir>\_deps\flatbuffers-build\Debug\flatc.exe` from onnxruntime\core\flatbuffers\schema which I thought was the correct way but maybe that's out of date. I think you can ignore all the diffs in the generated files and just worry about the changes to the .fbs files in onnxruntime/core/flatbuffers/schema. Basically start at the bottom of the files changed and work up as all the 'real' diffs are there. --------- Co-authored-by: carzh <wolfivyaura@gmail.com>	2024-04-22 15:17:43 -07:00
Yulong Wang	4385602386	[js/web] fix test runner with optional input/output (#20399 ) ### Description Fix test runner with optional input/output. This change fixes the OP test runner (.jsonc format test) with optional input(s) and/or output(s). This fix reveals a problem in dealing with optional outputs: > Take SkipSimplifiedLayerNorm as an example: > > if in the ONNX model, the node's outputs are: [ 'output_0', '' ] instead of [ 'output_0' ], the current implementation will fail. The difference is, in the first case, context.outputCount == 2, and then the TypeScript implementation will try to create a tensor for output[1]. It will eventually call into the C++ function (OpKernelContext::Output), and the output.DataRaw() will be nullptr. The WebGPU backend will fail because it cannot deal with a TensorView with data == 0. > This problem may need to be fixed or worked around in a separate PR. This PR does not fix this problem. Failed test cases are modified to work - please note this PR does not break those test cases as they never worked.	2024-04-22 12:53:10 -07:00
aamajumder	d0e33d2078	[DML EP] Register opset 20 operators (#20092 ) ### Description This PR registers the following opset 20 operators to the DML EP: -IsNaN-20 -IsInf-20 -ReduceMax-20	2024-04-22 12:01:59 -07:00
Yi Zhang	197b3f1d90	Enable Whisper Test with OMP_FFMPEG (#20402 ) ### Description Install OMP_FFMPEG in the Docker image and re-add the Whisper test. Download OMP_FFMPEG from a restricted-access Azure blob.	2024-04-22 10:55:56 -07:00
Yulong Wang	a457c1df80	upgrade emsdk to 3.1.57 (#20295 ) ### Description upgrade emsdk to 3.1.57	2024-04-19 23:05:18 -07:00
Adrian Lizarraga	77b7619a3d	[QNN EP] Support float16 BatchNormalization on the HTP backend (#20391 ) ### Description - Adds support for float16 BatchNormalization to the HTP backend. - Fixes float32 support for BatchNormalization on the HTP backend when `enable_htp_fp16_precision` is enabled. ### Motivation and Context Support more models on the QNN HTP backend.	2024-04-19 21:49:39 -07:00
Patrice Vignola	8fbb8a149f	[DML EP] Add MatMulNBits (#20308 )	2024-04-19 15:05:37 -07:00
Hector Li	55e0aaeeef	fix android build issue (#20389 ) fix android build issue	2024-04-19 14:21:34 -07:00
Rachel Guo	f396748ed6	Nuget .NET changes for Mac Catalyst (#19923 ) ### Description Add Nuget package changes for adding new 'net6.0-maccatalyst' platform. The output ORT Nuget package was manually tested and verified in a .NET MAUI app setup. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Yi Zhang <zhanyi@microsoft.com> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2024-04-19 14:20:03 -07:00
Guenther Schmuelling	497a627a69	fix fp16 for skiplayernorm (#20381 )	2024-04-19 12:12:02 -07:00
Dmitri Smirnov	42b700d463	Eliminate stray vector and the contention it creates (#20377 ) ### Description Unused vector allocating large memory chunk within a concurrent routine creates heap contention and is eliminated. ### Motivation and Context This partially addresses https://github.com/microsoft/onnxruntime/issues/20373.	2024-04-19 10:27:42 -07:00
Patrice Vignola	4d98f06f93	[DML EP] Add GroupQueryAttention (#20327 )	2024-04-19 10:25:29 -07:00
Wanming Lin	7c80c39f74	[WebNN EP] WebNN CPU backend only supports up to 4 Split outputs (#20350 )	2024-04-19 08:31:22 -07:00
sfatimar	4d1963c2a2	OpenVINO EP Rel 1.18 Changes (#20337 ) ### Description These changes include: - Support for OpenVINO 2024.1 - Import PreCompiled Blobs with EPContext Blob - Separate Device/Precision as input - Deprecate CPU_FP32, GPU_FP32 terminology; introduce CPU, GPU - AUTO GPU, CPU will only create GPU Blob and not CPU Blob. ### Motivation and Context - OpenVINO 2024.1 will be out soon - Import Precompiled Blob can greatly reduce FEIL/FIL Time. - Separating Device/Precision will make the input cleaner --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2024-04-19 00:31:38 -07:00
Yueqing Zhang	9001c69b84	[VitisAI] Add Version Check. Requested by Microsoft (#20347 ) ### Description Add a version to onnxruntime_providers_vitisai.dll so that onnxruntime_vitisai_ep.dll can check whether the version is compatible. To make sure the old onnxruntime_vitisai_ep.dll still works, we offset the API struct by the version field. ### Motivation and Context This is a direct request from Microsoft. The following is the problem we are trying to solve: How would you describe the dependency between (a) onnxruntime_vitisai_ep.dll and (b) onnxruntime_providers_vitisai.dll? E.g. for each version of (a) there is a minimum required version of (b), or for each version of (b) there is a minimum required version of (a). Please note that in practice we won't be able to use the exact version of ORT/EP that you tested against (because we might need to update ORT for other reasons), but we might be able to accommodate some version constraints that you specify. As we approach shipping, we'll lock the version of ORT/EP to allow for stabilization and more detailed testing (and work with you if it needs to be updated).	2024-04-18 23:05:44 -07:00
Patrice Vignola	12569626cb	Update DML to 1.14.1 (#20380 )	2024-04-18 22:43:41 -07:00
Patrice Vignola	b8c90beef2	[DML EP] Add SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20326 )	2024-04-18 22:17:31 -07:00
Chi Lo	a747a00cd3	[TensorRT EP] Use protobuf with debug build on Windows (#20378 ) TRT EP implicitly uses oss_parser with debug build on Windows, therefore it should use protobuf rather than protobuf-lite.	2024-04-18 19:39:08 -07:00
Patrice Vignola	745b426c60	[DML] Update DML to 1.14 (#20304 ) I am prefiring this change to pre-run the non-dml checks, and also to give folks the time to review it before DML gets released. When DML 1.14 officially releases, we'll only need to run the DML pipeline to automatically pick up the nuget package. This should save us some valuable time. Note that DML 1.14 is the release needed for ORT 1.17.4, and DML 1.15 will come soon after.	2024-04-18 16:22:57 -07:00
Adrian Lizarraga	e4c0cb2b9a	[Quant tool] Do not default to contrib Q/DQ ops for 16-bit (#20376 ) ### Description Updates the QDQ quantizer to use ONNX Q/DQ ops for 16-bit quantization if opset >= 21. ### Motivation and Context The QDQ quantizer previously set the 'com.microsoft' domain on inserted Q/DQ ops when the model needed 16-bit support. ONNX 1.16.0 added int16/uint16 support to the QuantizeLinear and DequantizeLinear operators, so we can change the default behavior.	2024-04-18 15:26:07 -07:00
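The domain choice described in the Quant tool entry above can be sketched roughly as follows. This is an illustrative reading of the commit message, not the quantizer's actual code: the function name `qdq_op_domain` and its parameters are hypothetical, and the real tool may consider further conditions.

```python
ONNX_DOMAIN = ""               # the empty string denotes the default ONNX operator domain
MS_DOMAIN = "com.microsoft"    # contrib-op domain named in the commit message


def qdq_op_domain(opset: int, uses_16bit: bool) -> str:
    """Pick the domain for inserted Q/DQ ops (hypothetical helper).

    Assumption taken from the commit message: 16-bit quantization can use
    the standard ONNX Q/DQ ops once the model opset is at least 21;
    below that, the contrib domain is still needed for 16-bit types.
    """
    if uses_16bit and opset < 21:
        return MS_DOMAIN
    return ONNX_DOMAIN


domains = {
    (opset, bits16): qdq_op_domain(opset, bits16)
    for opset in (19, 21)
    for bits16 in (False, True)
}
```

Only the combination of 16-bit quantization and an opset below 21 falls back to the contrib domain in this sketch.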
Chi Lo	a8f74e3ec7	[TensorRT EP] TensorRT 10 support (#20167 ) This PR adds support for the INT64 tensor type for TRT 10. It is compatible with both TRT 8.6 and TRT 10, meaning users can build ORT TRT against either. The timeline between TRT 10 GA and the ORT 1.18 release is very tight (we don't have enough time to get our CIs installed with TRT 10 GA libraries and run the build/tests), and Nvidia's new Triton release (whose timeline is also very close to TRT 10 GA) wants to integrate the TRT EP with TRT 10. Therefore, our approach is to get this PR into ORT 1.18 first, so everything is fully tested with TRT 8.6 CIs, and users can still manually build ORT 1.18 against TRT 10, as in the Triton case. As for testing TRT 10, once TRT 10 GA is released, we will have another branch that includes the changes in this PR plus whatever other changes are needed, and update our CIs with TRT 10.	2024-04-18 14:03:04 -07:00
Yulong Wang	3577a4bd02	[Node.js binding] Allow installation to download CUDA binaries via script (#20364 ) ### Description Currently we try to include all prebuilt binaries in the NPM packages. This was working until we added libonnxruntime_providers_cuda.so (>400MB) into the NPM package. The NPM registry refuses to accept the new package publication because the file is too large. To make the new NPM package work, we have to remove the large file from the package and add a new script that runs on package installation. This script will try to dynamically install the onnxruntime CUDA dynamic library for Linux/x64.	2024-04-18 13:44:42 -07:00
Guenther Schmuelling	7b017cf9f8	fix web ci: csum tests need fp64 which is not supported on webgpu (#20374 )	2024-04-18 12:30:26 -07:00
Adam Louly	ee74fb6908	Introducing ORTPipelineModule - DeepSpeed Parallel Pipeline Support. (#20287 ) ### Description Introducing a new class ORTPipelineModule to handle wrapping layers in DeepSpeed pipeline parallel. ### Motivation and Context To support pipeline parallelism on ORTModule. This PR will include an initial support of deepspeed Pipeline parallelism. - [x] Support Pipeline parallel where layers are nn Modules in Sequential. - [ ] Support LayerSpec and TiedLayerSpec - [ ] Enable partitioning to accept List - [ ] Full-GPU Graph Consolidation - [ ] Subgraph Merging for Inference	2024-04-18 11:30:15 -07:00
Sumit Agarwal	f664f91298	[DML EP] Expose NPU macro via build command (#20306 ) ### Description This fixes the following things: - Expose the `ENABLE_NPU_ADAPTER_ENUMERATION` macro via the build command, so that a user can enable NPU support for the DML EP seamlessly. - Add the keyword `_dmlEp_` as part of the node name, which is useful for debugging purposes.	2024-04-18 11:23:13 -07:00
Patrice Vignola	76434907fb	[DML EP] Add graph capture (#20257 ) This adds a new "Graph Capture" option to the DML ep, similar to the cuda graph functionality. Here's how graph capture works: - A user can enable graph capture in the session options by setting `ep.dml.enable_graph_capture` to `true` - When they want to capture a run, they set `gpu_graph_id` in their `RunOptions` to a number bigger than 0 (0 is reserved for internal use according to the cuda graph documentation). - Then, when they start the inference, the graph will be captured and stored in the DML EP for future use - When they execute the run for a second time with the same id, the `ReplayGraph` function in the DML EP will be called instead of executing the kernels, resulting in very low overhead and avoiding kernel recompilation. This feature can give up-to-par or even better performance than specifying the static dimensions at session creation time, but is also much more flexible.	2024-04-18 10:15:00 -07:00
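The capture-then-replay flow described in the graph capture entry above can be mimicked with a small pure-Python model. It does not use the DML EP or the ONNX Runtime API; the class `GraphCaptureCache` and its counters are hypothetical stand-ins that only mirror the rules in the commit message (capture is opt-in, ids must be greater than 0, a second run with the same id replays).

```python
class GraphCaptureCache:
    """Toy model of capture/replay keyed by a graph id (hypothetical)."""

    def __init__(self, enabled):
        self.enabled = enabled      # stands in for ep.dml.enable_graph_capture
        self._captured = {}         # graph id -> recorded replay callable
        self.kernel_launches = 0    # kernels executed one by one
        self.replays = 0            # whole-graph replays

    def run(self, kernels, x, gpu_graph_id=0):
        # Id 0 is treated as "no capture", matching the reserved value.
        capturing = self.enabled and gpu_graph_id > 0
        if capturing and gpu_graph_id in self._captured:
            self.replays += 1
            return self._captured[gpu_graph_id](x)

        # First run (or capture disabled): execute every kernel individually.
        for kernel in kernels:
            self.kernel_launches += 1
            x = kernel(x)

        if capturing:
            steps = list(kernels)

            def replay(value, _steps=steps):
                # Replays the recorded sequence without per-kernel bookkeeping.
                for step in _steps:
                    value = step(value)
                return value

            self._captured[gpu_graph_id] = replay
        return x


cache = GraphCaptureCache(enabled=True)
kernels = [lambda v: v + 1, lambda v: v * 2]
first = cache.run(kernels, 3, gpu_graph_id=1)    # captured on this run
second = cache.run(kernels, 5, gpu_graph_id=1)   # replayed from the cache
```

In this model the second call never touches the per-kernel path, which is the property the commit message credits for the low overhead.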
Vincent Wang	c47f446f25	Support BFloat16 for Triton Codegen (#20353 ) The previous implementation used numpy arrays and numpy data types to store constant values and data types, which do not support BFloat16 natively. This PR switches to torch tensors, which support BFloat16.	2024-04-18 17:15:11 +08:00
Wanming Lin	da86f6f408	[WebNN EP] Add operators support table (#20253 )	2024-04-17 21:19:46 -07:00
Hector Li	5daeb5e0b0	enable model with external data to be loaded from memory buffer (#19089 ) ### Description Background: A user saves a large model with initializer data in an external file, e.g. onnx.save_model(onnx_model, "path/to/save/the/model.onnx", save_as_external_data=True, all_tensors_to_one_file=True, location="filename", size_threshold=1024). In that case, Ort loads the model, gets the external initializer information (external file name, offset, length), uses the model path to find the external file, and locates the tensor data via the offset and length. But this won't work if the user loads the model from memory, since Ort loses track of the model path. This PR adds an API/session option to let the user provide a table with the external initializer file name as the key, and the pointer to the loaded external file in memory and the buffer length as the value, so that 1. the user can load the model from a memory buffer with external initializers in a memory buffer too, 2. the initializers can be shared across sessions, for different EPs, 3. the user can load the file in any way they want, e.g. mmap. Internally, 1. at session creation time, Ort goes through the external initializers in the graph and gets the file name, offset, and data length of each external initializer from the TensorProto. 2. With the file name, Ort gets the in-memory file buffer and buffer length from the table the user provided. 3. Ort locates the tensor buffer within the user-provided in-memory file buffer using the offset and data length (from the TensorProto). 4. Ort creates the Tensor and replaces the existing Tensor in the graph. ### Motivation and Context https://github.com/onnx/onnx/blob/main/docs/ExternalData.md For a model with external data, the TensorProto may have initializer data in a separate file. The external file location is set via the file path relative to the model path. The API that loads a model from a memory buffer loses track of the model path, which causes an error if the model has external data. By adding a session option to set the external data buffer, Ort can find the external data correctly if the model is loaded from a memory buffer.	2024-04-17 19:01:01 -07:00
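The lookup that the entry above describes (file name to in-memory buffer, then offset and length to tensor bytes) can be sketched in plain Python. This is not the ONNX Runtime implementation or its API; `resolve_external_tensor`, the table layout, and the file name `weights.bin` are hypothetical, chosen only to mirror the steps in the commit message.

```python
import struct


def resolve_external_tensor(buffers, file_name, offset, length):
    """Return a zero-copy view of one tensor's raw bytes (hypothetical helper).

    buffers maps an external file name to (in-memory data, data length),
    mirroring the user-provided table described in the commit message.
    """
    data, data_len = buffers[file_name]
    if offset < 0 or length < 0 or offset + length > data_len:
        raise ValueError("tensor range lies outside the external data buffer")
    # memoryview slicing avoids copying, so several consumers can share it.
    return memoryview(data)[offset:offset + length]


# Pretend this is an external data file already loaded into memory:
# a 4-element float32 tensor followed by a 2-element float32 tensor.
blob = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0) + struct.pack("<2f", 5.0, 6.0)
table = {"weights.bin": (blob, len(blob))}

# Offset and length would come from the TensorProto external data fields.
view = resolve_external_tensor(table, "weights.bin", offset=16, length=8)
values = struct.unpack("<2f", view)
```

Because the table is keyed by file name rather than by path, the same in-memory buffer can back initializers for several sessions, which matches the sharing goal listed in the description.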
Hector Li	301240433c	Log error caused by NPU SSR. (#20356 ) ### Description Log error caused by NPU SSR.	2024-04-17 18:27:45 -07:00
Edward Chen	ccaa4d1db2	[MLAS][AArch64] SQNBitGemm M>1 CompFp32 kernel optimization (#20319 ) Add ARM NEON intrinsics implementation for `Q4BitBlkDequantBForSgemm_CompFp32`.	2024-04-17 17:50:26 -07:00
Patrice Vignola	6bd6d879a3	[DML EP] Improve python API perf (#20331 ) This change improves the python API perf in 2 ways: 1. Remove unnecessary CPU syncs by sharing a queue between the python EPs and the allocator. 2. Add an opt-in CPU spinning sync to reduce overhead in applications that run a lot of inferences per second.	2024-04-17 17:33:37 -07:00
Guenther Schmuelling	a8a77ddfdc	fix csum and enable ut (#20355 )	2024-04-17 15:01:06 -07:00
dependabot[bot]	4c3fc26255	Bump Sixlabors.ImageSharp from 2.1.7 to 2.1.8 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (#20314 ) Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.7 to 2.1.8. Release notes (sourced from [Sixlabors.ImageSharp's releases](https://github.com/SixLabors/ImageSharp/releases)), v2.1.8, What's Changed: - V2 - Limit Read Palette Indices by @JimBobSquarePants in SixLabors/ImageSharp#2719 - V2 - Clear Pixel Buffers on Decode. by @JimBobSquarePants in SixLabors/ImageSharp#2717 - V2 - Limit all memory allocations in the MemoryAllocator layer by @JimBobSquarePants in SixLabors/ImageSharp#2715 Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.7...v2.1.8 Commits: - `f21d641` Merge pull request #2715 from SixLabors/backport/v2-memlimit - `8f0b4d3` Merge pull request #2717 from SixLabors/backport/v2-clear-buffers - `cf9496d` test allocation limits - `3d298db` Adapt BmpDecoder_ThrowsException_Issue2696 for V2 - `a78ce27` Merge pull request #2719 from SixLabors/backport/v2-check-palette-indices - `e620914` Clamp read palette indices. - `c122185` Clear pixel buffers on decode. - `5c6ec5d` Limit all allocations - See full diff in [compare view](https://github.com/SixLabors/ImageSharp/compare/v2.1.7...v2.1.8) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-17 14:47:44 -07:00
Adrian Lizarraga	eae7b705ac	[Quant tool] Fix quantized bias's scale dtype to properly handle fp16 bias inputs (#20340 ) ### Description - Fix quantization tool bug that did not correctly set a quantized bias's scale data type to fp16 if the original bias was fp16. - Enabled fp16 ConvTranspose quantization unit tests that were disabled. ### Motivation and Context Python quantization tests for fp16 ConvTranspose were originally disabled due to a shape inference bug. It turns out that we also have a bug in our quantizer that does not properly handle fp16 bias inputs. Fixing the bug allows us to re-enable these tests with the latest version of ONNX.	2024-04-17 10:24:28 -07:00
Adrian Lizarraga	0a1902525f	Add patch for ONNX 1.16.0 shape inference bug (#20316 ) ### Description - Adds a patch that fixes a shape inference bug that caused a segfault: https://github.com/onnx/onnx/pull/6080 - Fix documentation describing why QLinearMatMul tests are currently being skipped. ### Motivation and Context The [PR for integrating with ONNX 1.16.0](https://github.com/microsoft/onnxruntime/pull/19745) disabled various python quantization tests due to a shape inference bug. This PR applies the ONNX fix as a patch. We still can't enable the tests because some of our CIs pip install onnx-1.16.0, which doesn't include the fix.	2024-04-17 10:23:22 -07:00
Hector Li	bb1972264b	Enable provider option to let user provide the profiling file path (#20285 ) Enable a provider option to let the user provide the profiling file path. Separate out the profiling level for ETW, in case ETW is switched partway, e.g. enabled when Ort creates the QNN profiling and then disabled when Ort logs the profiling events, or vice versa. Enhance the logic to decide the profiling level.	2024-04-17 09:42:40 -07:00
Scott McKay	8143b0d798	Add helper to get errno and error message (#20324 ) ### Description Add platform aware helper to fetch errno message string. ### Motivation and Context For usage in #20077 --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-04-17 21:17:36 +10:00


10952 commits