onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-21 21:52:11 +00:00

Author	SHA1	Message	Date
Edward Chen	f1be92faf0	Patch fp16 to fix Xcode 16 builds with XNNPACK EP targeting x86_64. (#22294 )	2024-10-03 14:17:15 -07:00
Yi Zhang	bbb54985a8	Add MaxPool FP16 in XnnPack EP (#22258 ) ### Description Add support for FP16 kernels in the XnnPack execution provider for MaxPool operations. Fixes: [AB#50332](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/50332) ### Motivation and Context The major purpose of this pull request is to add some common vars/functions and setup a consistent style for adding FP16 kernels in XnnPack EP. ---------	2024-10-03 18:28:58 +08:00
Caroline Zhu	c73e6afa6c	Migrate Android Java E2E tests from App Center to Browserstack (#22117 ) ### Description - removed installing AppCenter + pipeline step that runs AppCenter Espresso tests - added script for running AppCenter tests ### Motivation and Context App Center is getting deprecated in the next year + we have upcoming Android work that depends on working E2E testing. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-10-02 15:04:58 -07:00
Dmitri Smirnov	224f0651d0	[C#] Expose Multi-Lora support in C# (#22281 ) ### Description ### Motivation and Context https://github.com/microsoft/onnxruntime/pull/22046	2024-10-02 10:00:43 -07:00
goldsteinn	4e15b229a0	ThreadPool: Spend less time busy waiting. (#21545 ) The purpose of the patch is primarily to save power, but it also has nice perf benefits (mostly from allowing the system to better distribute power to cores doing meaningful work). Changes are twofold: 1) Decrease WorkerLoop spin count dramatically ~10^6 -> ~10^4. The reality is after ~10^4 spins, if there hasn't been any new work added it's unlikely any new work is imminent so sleep to preserve power. This aligns more closely with upstream EigenV3. 2) Use exponential backoff for waiting on memory. This saves a bit more power, and importantly increases the time between iterations in WorkerLoop to help accommodate the dramatically lowered spin counts. Since the tuning for both the iteration counts / backoff counts are dramatically different for hybrid/non-hybrid systems, this patch templates the affected functions and dynamically chooses based on `CPUIDInfo::IsHybrid()`. This seemed like the "lightest weight" way of getting the change in, although it's likely we could incur less dynamic overhead if we added the template argument to the entirety of `ThreadPoolTempl`. Measured performance on an [Intel Meteor Lake CPU](https://www.intel.com/content/www/us/en/products/sku/237329/intel-core-ultra-7-processor-165u-12m-cache-up-to-4-90-ghz/specifications.html) across a range of models. Below are the results of 3 runs with each metric being the value-before-patch / value-after-patch (so for something like inference time, lower is better). 
<div align="center"> <table> <tr> <th>Session creation time cost</th> <td>0.7179</td> </tr> <tr> <th>First inference time cost</th> <td>0.7156</td> </tr> <tr> <th>Total inference time cost</th> <td>1.0146</td> </tr> <tr> <th>Total inference requests</th> <td>0.8874</td> </tr> <tr> <th>Average inference time cost</th> <td>0.8800</td> </tr> <tr> <th>Total inference run time</th> <td>1.0146</td> </tr> <tr> <th>Number of inferences per second</th> <td>0.8955</td> </tr> <tr> <th>Avg CPU usage</th> <td>0.9462</td> </tr> <tr> <th>Peak working set size</th> <td>0.9922</td> </tr> <tr> <th>Runs</th> <td>1.1552</td> </tr> <tr> <th>Min Latency</th> <td>0.7283</td> </tr> <tr> <th>Max Latency</th> <td>0.9258</td> </tr> <tr> <th>P50 Latency</th> <td>0.9534</td> </tr> <tr> <th>P90 Latency</th> <td>0.9639</td> </tr> <tr> <th>P95 Latency</th> <td>0.9659</td> </tr> <tr> <th>P99 Latency</th> <td>0.9640</td> </tr> </table> </div> So the net result is a 1.16x improvement in throughput and between 1.08-1.37x improvement in latency.	2024-10-01 17:25:02 -07:00
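The closing summary of the commit above can be reproduced from the table with a little arithmetic. The sketch below assumes that the throughput figure is the `Runs` ratio taken as-is and that the latency range is the reciprocal of the `Min Latency` and `Max Latency` ratios; that reading is an inference from the numbers, not something the commit message states. The helper name `speedup_from_ratio` is hypothetical.

```python
# Ratios copied from the table above (selected rows only).
ratios = {
    "Runs": 1.1552,
    "Min Latency": 0.7283,
    "Max Latency": 0.9258,
}


def speedup_from_ratio(ratio: float) -> float:
    """Turn a 'lower is better' ratio into an 'x times faster' factor.

    Assumption: the improvement factor is simply the reciprocal of the ratio.
    """
    return 1.0 / ratio


# Throughput: assumed to be the Runs ratio, read directly.
throughput_gain = ratios["Runs"]

# Latency: reciprocal of the reported ratios.
best_latency_gain = speedup_from_ratio(ratios["Min Latency"])
worst_latency_gain = speedup_from_ratio(ratios["Max Latency"])

print(f"throughput: {throughput_gain:.2f}x")
print(f"latency: {worst_latency_gain:.2f}x to {best_latency_gain:.2f}x")
```

Under these assumptions the percentile rows (P50 to P99) would give smaller factors than the quoted range, so the range appears to describe the minimum and maximum latency rows only.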
Adam Pocock	14d1bfc34b	[java] Multi-LoRA support (#22280 ) ### Description Java parts of Multi-LoRA support - #22046. ### Motivation and Context API equivalence with Python & C#. --------- Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>	2024-10-01 13:54:37 -07:00
Dmitri Smirnov	1fc2b94644	Address Android warning error (#22285 ) ### Description ### Motivation and Context Build issue https://github.com/microsoft/onnxruntime/pull/22046#issuecomment-2386414899	2024-10-01 13:52:25 -07:00
Edward Chen	c24e55b1f1	[Java] Add API for appending QNN EP (#22208 ) - Add Java API for appending QNN EP - Update Java unit test setup - Fix issues with setting system properties for tests - Unify Windows/non-Windows setup to simplify	2024-10-01 10:18:04 -07:00
Tianlei Wu	e2b9ccc44a	Update SAM2 benchmark for testing torch compile modes and profiling (#22279 ) This pull request introduces several enhancements to the benchmarking process for the SAM2 model, including: (1) Add profiling capabilities. (2) test torch compile modes (none will disable compile and fallback to eager mode) (3) Update README for setting up the environment. ### Documentation Updates: * README.md: Updated instructions to create separate conda environments for GPU and CPU benchmarking, and detailed the parameters and outputs of the benchmark script. ### Benchmark Script Enhancements: * benchmark_sam2.py: Added optional parameters for enabling NVTX and PyTorch profiling, and adjusted the initialization and execution flow to incorporate these profiling options. These changes enhance the flexibility and functionality of the benchmarking process, making it easier to profile and benchmark the SAM2 model on different hardware configurations.	2024-10-01 09:51:12 -07:00
Yufeng Li	96e9c99dce	remove neural-speed (#22236 ) ### Description NS is not developed anymore and ORT doesn't use it for int4 inference either. Remove it to clean up the code ### Motivation and Context	2024-10-01 09:50:44 -07:00
kunal-vaishnavi	50bda44a70	Fix equation in MatMulNBits op spec (#22253 ) ### Description This PR fixes an equation in the MatMulNBits op spec. The old formula is stated as ``` [CeilDiv((N * n_blocks_per_col + 1) * bits, 8)] ``` but it should be stated as ``` [N * CeilDiv(n_blocks_per_col * bits, 8)] ``` or as ``` [N * FloorDiv((n_blocks_per_col + 1) * bits, 8)] ``` ### Motivation and Context For models such as ChatGLM where the column size is odd, the division math can be off. For example: ![image_360](https://github.com/user-attachments/assets/a5035bec-4dad-46af-9cb1-24a881eb70a0) With the old equation, the projections are calculated as follows. ``` # Down projection B = 4,096 x 107 x 64 zero_points = 221,184 N = 4,096 n_blocks_per_col = 107 4,096 * CeilDiv((107 + 1) * 4, 8) = 4,096 * CeilDiv(108 * 4, 8) = 4,096 * 54 = 221,184 # Up projection B = 13,696 x 32 x 64 zero_points = 219,136 N = 13,696 n_blocks_per_col = 32 13,696 * CeilDiv((32 + 1) * 4, 8) = 13,696 * CeilDiv(33 * 4, 8) = 13,696 * 17 = 232,832 ``` With the new equation, the projections are calculated as follows. ``` # Down projection B = 4,096 x 107 x 64 zero_points = 221,184 N = 4,096 n_blocks_per_col = 107 4,096 * CeilDiv(107 * 4, 8) = 4,096 * 54 = 221,184 # Up projection B = 13,696 x 32 x 64 zero_points= 219,136 N = 13,696 n_blocks_per_col = 32 13,696 * CeilDiv(32 * 4, 8) = 13,696 * 16 = 219,136 ```	2024-10-01 09:31:56 -07:00
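The arithmetic in the commit above can be checked with a short script. This is a minimal sketch: the function names are hypothetical, and the "old" variant follows the form used in the commit's worked example, `N * CeilDiv((n_blocks_per_col + 1) * bits, 8)`, rather than the bracketed formula quoted first.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up (for positive b)."""
    return -(-a // b)


def zero_points_old(n: int, n_blocks_per_col: int, bits: int) -> int:
    # Form evaluated in the commit's "old equation" worked example.
    return n * ceil_div((n_blocks_per_col + 1) * bits, 8)


def zero_points_new(n: int, n_blocks_per_col: int, bits: int) -> int:
    # Corrected form: N * CeilDiv(n_blocks_per_col * bits, 8).
    return n * ceil_div(n_blocks_per_col * bits, 8)


def zero_points_floor(n: int, n_blocks_per_col: int, bits: int) -> int:
    # Alternative form: N * FloorDiv((n_blocks_per_col + 1) * bits, 8).
    return n * (((n_blocks_per_col + 1) * bits) // 8)


# (N, n_blocks_per_col, expected zero_points size) from the commit message.
cases = {
    "down projection": (4096, 107, 221_184),
    "up projection": (13_696, 32, 219_136),
}

for name, (n, blocks, expected) in cases.items():
    old = zero_points_old(n, blocks, 4)
    new = zero_points_new(n, blocks, 4)
    alt = zero_points_floor(n, blocks, 4)
    print(f"{name}: old={old} new={new} floor={alt} expected={expected}")
```

For these two 4-bit cases the `CeilDiv` and `FloorDiv` forms agree with each other and with the expected sizes, while the old form overshoots on the up projection. The script does not show that the two corrected forms agree for other bit widths.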
Mauricio A Rovira Galvez	ffca096b5a	Fixes a crash on macOS 15 when using CoreML. (#22277 ) ### Description In macOS 15, apps running with CoreML will crash with an error message like this one: ``` Terminating app due to uncaught exception 'NSGenericException', reason: 'Failed to set compute_device_types_mask E5RT: Cannot provide zero compute device types. (1)' ``` This can be easily seen when building ONNXRuntime from source and running the unit tests. The fix was suggested in [this bug report](https://forums.developer.apple.com/forums/thread/757040). I've ported the change to ONNXRuntime and verified that: * The issue is resolved in macOS 15 (all unit tests pass). * The behaviour is unchanged in macOS 14. ### Motivation and Context This fixes #22275 allowing apps using ONNXRuntime with CoreML to work normally.	2024-10-01 16:06:03 +10:00
Scott McKay	ee7081b828	Fix syntax for some CoreML ML Program supported operator entries (#22268 ) ### Description Fix syntax so usability checker works as expected. ### Motivation and Context	2024-10-01 15:49:43 +10:00
Yang Gu	9e5153b688	[js/webgpu] Manage model download with a specific unittest option (#22214 ) Currently in debug mode, unit test will always download models to local file system, which is a bit annoying. This PR fixes this by adding a specific option to enable model download.	2024-09-30 18:27:43 -07:00
Yang Gu	c75f4a09b7	[js/webgpu] Remove the limitation on axis in softmax (#22231 ) In current implementation, axis in softmax has to be the last, which is an obvious limitation. This PR removes this limitation and will fix issues #20710 and #22176.	2024-09-30 18:27:11 -07:00
Dmitri Smirnov	d9de054eb5	Multi-Lora support (#22046 ) ### Description ### Motivation and Context	2024-09-30 15:59:07 -07:00
Jian Chen	40bcb7664d	Revert "Jar Maven Signing - GnuPG and sha256" (#22273 ) Reverts microsoft/onnxruntime#22217	2024-09-30 15:07:59 -07:00
Jian Chen	ebcf2fcd16	Replace gradle/wrapper-validation-action with gradle/actions/wrapper-validation-action (#22224 ) ### Description Replace gradle/wrapper-validation-action with gradle/actions/wrapper-validation-action ### Motivation and Context This is recommended by https://github.com/gradle/wrapper-validation-action. This job uses deprecated functionality from the 'gradle/wrapper-validation-action' action.	2024-09-30 14:29:16 -07:00
Ranjit Ranjan	812075731c	[AIX] Build fix for using system installed protobuf/onnx (#22272 ) ### Description To fix the build issues for AIX OS while using system installed protobuf/onnx. ### Motivation and Context Code changes in this PR contain: 1. Fix for the compilation issue below. ``` collect2: fatal error: library liblibprotobuf-lite not found compilation terminated. ``` 2. Adding the onnx library to the dependency list for test applications.	2024-09-30 12:36:21 -07:00
Yi Zhang	d069475a63	Make A100 jobs in PR checks again (#22261 ) ### Description If the UseA100 variable is 1, the A100 jobs run in PR checks. Fixes [AB#50333](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/50333) ### Motivation and Context We want more big models that need to run on A100 to be tested in PR checks, but Azure may sometimes decommission A100 agents without notice, which would block merging PRs. This PR is an improvement on the current workaround of making those jobs run only on the main branch. Once we find that the A100 agents have all been decommissioned by Azure, we can change the UseA100 variable to 0 to disable the A100 jobs in PR checks.	2024-09-30 08:29:30 -07:00
wejoncy	2cfe1f031d	[CoreML MLProgram] Support Float16 (1/N) (#22068 ) ### Description Support Float16 for CoreML MLProgram EP. Operations: "Add", "Mul", "Sub", "Div", "Pow", "Sqrt", "Reciprocal", "Sigmoid", "Tanh", "Relu", "LeakyRelu", "Concat", "GridSample", "GlobalAveragePool", "Clip", "DepthToSpace", "Resize", "Slice", "Conv", "ConvTranspose", "GlobalMaxPool", "Gemm", "MatMul", "AveragePool", "MaxPool", "Reshape", "Split", "Transpose" ### Motivation and Context --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-09-30 17:56:47 +08:00
Yang Gu	434f0fa536	[js/webgpu] Fix the crash issue in unsqueeze (#22264 ) Because axes in unsqueeze is allowed to be a scalar, its shape cannot always be accessed like a vector. This PR fixes issue #22031 so that the original model can run well.	2024-09-30 02:28:16 -07:00
Yulong Wang	1bda91fc57	[js/webgpu] fix external buffer registration (#22254 ) ### Description Fixes a failure that occurs when GPU inputs are shuffled between iterations.	2024-09-28 10:36:40 -07:00
Enrico Galli	52a8c1cae8	[WebNN EP] Enable IO Bindings with MLTensor (#21301 ) ### Description Enables using the MLTensor to pass data between models. ### Motivation and Context Using MLTensor instead of ArrayBuffers reduces the number of copies between the CPU and devices as well as the renderer and GPU process in Chromium.	2024-09-27 17:24:21 -07:00
Patrice Vignola	ebda23be16	[DML EP] Fix Clip clamping (#22251 ) ### Description ### Motivation and Context	2024-09-27 16:24:37 -07:00
shiyi	1e3cd86d80	[WebNN EP] Support LSTM op (#20293 )	2024-09-27 14:23:08 -07:00
liqun Fu	f410e7c4cf	Fix mlas bench crash (#22248 ) Fix mlas bench crash --------- Signed-off-by: Liqun Fu <liqfu@microsoft.com>	2024-09-27 13:50:42 -07:00
Sumit Agarwal	529835cc46	[DML EP] Update DML to 1.15.2 (#22247 ) ### Description Update DML binary to the current latest redist version [1.15.2](https://www.nuget.org/packages/Microsoft.AI.DirectML/1.15.2).	2024-09-27 13:20:29 -07:00
Patrice Vignola	20be51525b	Support if node with sequence outputs (#22234 ) `If` nodes can have sequence outputs. Those nodes are mapped to the DML EP to be able to keep the outputs on the GPU, but they actually execute on the CPU by selecting either the `then` subgraph or the `else` subgraph.	2024-09-27 12:40:01 -07:00
Patrice Vignola	14ba2fb83c	[DML EP] Add intermediate tensor dumping for DML (#22246 ) ### Description ### Motivation and Context	2024-09-27 12:39:45 -07:00
Hector Li	6e3163faa5	Update code regarding some QNN bug fixes (#22222 ) ### Description Update code regarding some QNN bug fixes: 1. QnnProfile_ExtendedEventData_t.version is not initialized in Qnn 2. Failed to finalize the graph for HardSigmoid with FP16 precision	2024-09-27 09:51:47 -07:00
Kyle	b81e76b9a6	Jar Maven Signing - GnuPG and sha256 (#22217 ) ### Description Jar maven signing: - GnuPG - sha256. Jar packages artifacts: - onnxruntime-android-full-aar - onnxruntime-java - onnxruntime-java-gpu ### Motivation and Context Previously, signing was done manually. Goal: make it automatic.	2024-09-27 17:50:06 +08:00
Tianlei Wu	ff8a48ef3b	Update SAM2 benchmark script and doc (#22238 ) (1) Fix a bug of parameters order. (2) Update benchmark script: * download test image if not exist * combine multiple csv files into one file, and remove duplicated lines (3) Add a section for benchmark in README.md	2024-09-26 20:57:03 -07:00
Scott McKay	3846f84218	Increase React Native E2E (#22230 ) ### Description Increase the detox setup timeout to 4 minutes. The iOS RN E2E tests are taking around 2 minutes to set up, causing flakiness. ### Motivation and Context Improve RN CI pass rate	2024-09-27 08:59:36 +10:00
Tianlei Wu	2deab75d39	Add numeric_limits for float8 types (#22228 ) Add std::numeric_limits for float8 data types to provide a consistent way to access limits of those types. Reference: * https://onnx.ai/onnx/technical/float8.html	2024-09-26 14:42:36 -07:00
Jing Fang	1942e40e05	[ARM64] MatMulNBits: use neon intrinsics to convert between fp16 and fp32 (#22195 ) ### Description For fp16 Atype, the fallback operation is to convert the data to fp32 and calculate. Added a neon intrinsics version to speed up the conversion. Store address alignment and loop unrolling have insignificant impact on latency, so they are omitted.
| Benchmark | Time | CPU |
|---|---|---|
| M_ConvertF16ToF32/baseline/real_time | 1076961 ns | 1083398 ns |
| M_ConvertF16ToF32/aligned:0/real_time | 46785 ns | 46516 ns |
| M_ConvertF16ToF32/aligned:1/real_time | 46631 ns | 46391 ns |
| M_ConvertF16ToF32_unroll2/aligned:0/real_time | 44074 ns | 44392 ns |
| M_ConvertF16ToF32_unroll2/aligned:1/real_time | 44726 ns | 45226 ns |
| M_ConvertF32ToF16/baseline/real_time | 520109 ns | 527329 ns |
| M_ConvertF32ToF16/aligned:0/real_time | 73610 ns | 74015 ns |
| M_ConvertF32ToF16/aligned:1/real_time | 71557 ns | 71525 ns |
| M_ConvertF32ToF16_unroll2/aligned:0/real_time | 64227 ns | 63374 ns |
| M_ConvertF32ToF16_unroll2/aligned:1/real_time | 67428 ns | 67989 ns |
### Motivation and Context Speed up the fallback implementation of Fp16 MatMulNBits	2024-09-26 13:55:40 -07:00
jingyanwangms	d0b0ecfdb9	[Running CI] Update TensorRT to 10.4 (#22049 ) ### Description TensorRT 10.4 is GA now, update to 10.4 ### Motivation and Context	2024-09-26 11:10:52 -07:00
Tianlei Wu	7880342e5e	Add numeric_limits for MLFloat16 and BFloat16 (#22197 ) ### Description * Add std::numeric_limits for MLFloat16 and BFloat16. * Update some comments in csharp ORTFloat16.shared.cs. * Add unit tests (including Clip) Note that the canonical NaN is not consistent between C++ and C#. C# uses negative quiet NaN as canonical NaN, while C++ uses positive quiet NaN. The choice of CSharp Float16.NaN is to be consistent with System.Half.NaN. FP16 data returned from CUDA might have 0x7FFF as NaN; FP16 data from the CPU provider might have 0x7E00 as NaN. In any case, there is no consistent canonical NaN in ORT right now. Because all these NaNs are aligned with the IEEE spec, there should not be an issue downstream. ### Motivation and Context std::numeric_limits is used in the codebase but not defined for MLFloat16 and BFloat16. This causes bugs like https://github.com/microsoft/onnxruntime/issues/21957 introduced by https://github.com/microsoft/onnxruntime/pull/21493.	2024-09-25 17:10:05 -07:00
liqun Fu	72b0979e8a	Fix a wrong assignment that was causing mlas benchmark to crash (#22221 ) ### Description ### Motivation and Context Signed-off-by: Liqun Fu <liqfu@microsoft.com>	2024-09-25 15:53:28 -07:00
saurabh	4d6019fa02	OVEP: Tensor caching fix (#22218 ) ### Description 1. Change the emplace to []. This does make a difference: emplace only creates a new entry if it doesn't already exist in the map. 2. Change the logic of the caching lookup to key off of input/output names instead of ORT raw pointers. 3. Change OV tensor creation for CPU-allocated input/output ORT tensors. The CPU-allocated input/output tensor path was re-allocating OV tensors based on the ORT input/output tensors, so we'd get 2 copies: ORT input/output tensor -> OV tensor (OVEP) -> NPU Tensor (NPU plugin). --------- Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>	2024-09-25 14:58:04 -07:00
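The NaN remark in the commit above can be illustrated with Python's `struct` module, which decodes IEEE 754 binary16 through the `e` format code. In this sketch, `0x7E00` and `0x7FFF` are the patterns named in the commit message, while `0xFE00` is an assumed example of a negative quiet NaN (the sign-bit-set counterpart of `0x7E00`). The helper names are hypothetical.

```python
import math
import struct


def half_from_bits(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def is_fp16_nan(bits: int) -> bool:
    """NaN means: exponent bits all ones and a non-zero mantissa."""
    return (bits & 0x7C00) == 0x7C00 and (bits & 0x03FF) != 0


patterns = {
    "0x7E00 (positive quiet NaN)": 0x7E00,
    "0x7FFF (pattern mentioned for CUDA)": 0x7FFF,
    "0xFE00 (assumed negative quiet NaN)": 0xFE00,
    "0x7C00 (+infinity, for contrast)": 0x7C00,
}

for label, bits in patterns.items():
    value = half_from_bits(bits)
    print(f"{label}: isnan={math.isnan(value)} bit test={is_fp16_nan(bits)}")
```

All three NaN patterns decode to NaN even though their bits differ, which is why code that compares raw bit patterns would see them as different while code that uses an `isnan`-style check would not.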
Hector Li	50d9612bc0	change shared_ptr to unique_ptr to make the ownership clear (#22209 ) ### Description change shared_ptr to unique_ptr to make the ownership clear.	2024-09-25 12:58:46 -07:00
Claude	3494f80e83	Check if HTMLCanvasElement exists (i.e. we are not running in a webworker) (#22153 ) This fixes #22152 ### Description Tensor.fromImage fails in a webworker context, because HTMLCanvasElement does not exist: > HTMLCanvasElement is not defined ### Motivation and Context This fixes #22152 --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-09-25 11:52:52 -07:00
Adrian Lizarraga	a47254eaef	Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer (#22172 ) ### Description Updates the TransposeOptimizer to also remove empty (DQ -> Q) sequences that occur at a graph output. An empty DQ->Q sequence results from a Transpose being optimized out. Consider the following example model: ![image](https://github.com/user-attachments/assets/4e7bc4eb-ea8a-463b-9672-c4ec5ef779b2) The TransposeOptimizer removes the final Transpose and leaves an empty DQ->Q->output_0 sequence. This PR ensures that the final DQ->Q is also removed. ### Motivation and Context Models with quantized output can run on QNN EP. The inference latency of a customer model is impacted by the unnecessary DQ->Q sequence at the output. --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-09-24 21:02:17 -07:00
Caroline Zhu	ee6a91533c	Add BrowserStack mention to project ReadMe (#22207 ) ### Description Condition for [BrowserStack support for open-source projects](https://www.browserstack.com/open-source) ### Motivation and Context - Considering using BrowserStack for our end-to-end tests for iOS and Android	2024-09-24 17:14:14 -07:00
Adrian Lizarraga	7811839265	[QNN EP] Always fuse (DQ->Q) to a QNN Convert operator (#22205 ) ### Description Previously, we only fused (DQ -> Q) into a QNN Convert if the quantization types differed (e.g., converting uint8 to uint16). This PR always fuses DQ -> Q regardless of the quantization type because a single QNN Convert op is faster than two separate ops. Example fusions: - [CURRENTLY SUPPORTED] Convert uint8 to uint16: - `uint8 -> DQ -> Q -> uint16` becomes `uint8 -> Convert -> uint16` - [CURRENTLY SUPPORTED] Convert uint16 to uint8: - `uint16 -> DQ -> Q -> uint8` becomes `uint16 -> Convert -> uint8` - [NEW] Convert uint8 (zp0, scale0) to uint8 (zp1, scale1): - `uint8(zp0/scale0) -> DQ -> Q -> uint8(zp1/scale1)` becomes `uint8(zp0/scale0) -> Convert -> uint8(zp1/scale1)` - [NEW] Convert uint16 (zp0, scale0) to uint16 (zp1, scale1): - `uint16(zp0/scale0) -> DQ -> Q -> uint16(zp1/scale1)` becomes `uint16(zp0/scale0) -> Convert -> uint16(zp1/scale1)` ### Motivation and Context The Transpose optimizer will normally remove empty DQ->Q sequences if the quantization params are equal. However, for cases in which the quantization params are not equal, QNN EP should convert DQ->Q to a single QNN Convert op for performance. This affects a customer model.	2024-09-24 15:51:32 -07:00
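To see why a DQ -> Q pair with different quantization parameters is not a no-op, the sketch below works through the arithmetic using the ONNX definitions of DequantizeLinear (`(q - zero_point) * scale`) and QuantizeLinear (`saturate(round(x / scale) + zero_point)`). It is only an illustration of the math: the function names are hypothetical, the single-step `convert` is an assumed equivalent formulation, and nothing here describes how the QNN Convert operator is implemented.

```python
def dequantize(q: int, zero_point: int, scale: float) -> float:
    """DequantizeLinear: map a quantized integer back to a real value."""
    return (q - zero_point) * scale


def quantize(x: float, zero_point: int, scale: float, qmin: int, qmax: int) -> int:
    """QuantizeLinear: round half to even, add zero point, then saturate."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dq_then_q(q, zp0, scale0, zp1, scale1, qmin, qmax):
    """Two separate ops: DQ with (zp0, scale0), then Q with (zp1, scale1)."""
    return quantize(dequantize(q, zp0, scale0), zp1, scale1, qmin, qmax)


def convert(q, zp0, scale0, zp1, scale1, qmin, qmax):
    """Assumed single-step requantization with a combined scale factor."""
    return max(qmin, min(qmax, round((q - zp0) * (scale0 / scale1)) + zp1))


# uint8 (zp0, scale0) -> uint8 (zp1, scale1)
print(dq_then_q(148, 128, 0.5, 0, 0.25, 0, 255))
print(convert(148, 128, 0.5, 0, 0.25, 0, 255))

# uint8 -> uint16
print(dq_then_q(200, 128, 0.5, 32768, 2 ** -9, 0, 65535))
print(convert(200, 128, 0.5, 32768, 2 ** -9, 0, 65535))
```

The scales here are powers of two so both paths are exact. With arbitrary scales, the two-step and single-step results can differ by one unit because of floating-point rounding.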
amarin16	eb2506d77a	Add MLFloat16 support for LayerNormalization, SkipLayerNormalization (#22063 ) Add `MLFloat16` support for: - `LayerNormalization` - `SimplifiedLayerNormalization` - `SkipLayerNormalization` - `SkipSimplifiedLayerNormalization` There are existing `LayerNormTest` unit tests that cover the `MLFloat16` functionality for `LayerNormalization` once `MLFloat16` is registered (for example [`LayerNormTest.LayerNorm_Scale_Float16Input`](`91c916f9c6/onnxruntime/test/contrib_ops/layer_norm_op_test.cc (L112)`)). Similarly, there are unit tests such as [`SkipLayerNormTest.SkipLayerNormBatch1_Float16`](`91c916f9c6/onnxruntime/test/contrib_ops/skiplayernorm_op_test.cc (L255)`) that cover MLFloat16 inputs for `SkipLayerNormalization`.	2024-09-24 15:06:27 -07:00
chenduan-amd	61996332ad	[VitisAI] support run_options in vitisai EP end (#22029 ) ### Description add OnRunStart() method for Vitis AI execution provider ### Motivation and Context To dynamically obtain some runtime parameters during execution, use run_options within the Vitis AI execution provider (EP).	2024-09-24 14:37:05 -07:00
George Wu	7727b4b909	[TensorRT EP] update gen_trt_engine_wrapper_onnx_model.py script (#22184 ) Update the script, which was using the deprecated num_bindings, to use num_io_tensors. Tested on an engine dumped by trtexec, loading the engine with the onnxruntime-gpu 1.19.2 Python package.	2024-09-24 14:34:05 -07:00
Ye Wang	6cc06ad069	GQA MLFloat16 cpu (#22102 ) ### Description ### Motivation and Context --------- Co-authored-by: Your Name <you@example.com>	2024-09-24 09:51:59 -07:00
Hector Li	5fa4505d1b	Set enable_htp_fp16_precision default to true (#22186 ) ### Description Set enable_htp_fp16_precision default to true for HTP backend.	2024-09-24 09:37:53 -07:00

11997 commits