onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-25 02:50:42 +00:00

Author	SHA1	Message	Date
Xavier Dupré	407c1ab2e2	Fix conversion of TensorData, TensorsData to json (#22166 ) ### Description Fix write_calibration_table to support TensorData, TensorsData	2024-10-06 19:13:03 -07:00
Scott McKay	280c013d67	Fix warning when building on Windows (#22327 ) ### Description Specify type to fix warning	2024-10-06 15:51:58 +10:00
jingyanwangms	5036e63d58	Increase TensorRT tolerance from default 1e-5 to 1e-3 after TRT 10.4 (#22321 ) ### Description Increase TensorRT tolerance from default 1e-5 to 1e-3 after TRT 10.4	2024-10-05 18:06:06 +10:00
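To illustrate what loosening a tolerance from 1e-5 to 1e-3 means for an output comparison, here is a minimal sketch using Python's `math.isclose`. The values are made up for illustration, and ORT's own test harness may combine absolute and relative tolerances differently; the sketch only shows the effect of the threshold.

```python
import math

expected = 1.0
actual = 1.0004  # hypothetical output that differs in the fourth decimal place

# A relative tolerance of 1e-5 rejects this result...
strict_ok = math.isclose(actual, expected, rel_tol=1e-5)
# ...while a relative tolerance of 1e-3 accepts it.
relaxed_ok = math.isclose(actual, expected, rel_tol=1e-3)

print(strict_ok, relaxed_ok)
```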
mingmingtasd	004bd36f3d	[WebNN EP] Support Tile operator (#22148 ) PTAL, thanks! @Honry , @fdwr thanks!	2024-10-05 00:56:55 -07:00
chenduan-amd	98a75900ef	[VitisAI]Add interface to get configurations from Sessionoption (#22096 ) ### Description Add interface to get config_options from onnxruntime. ### Motivation and Context To support configuring session_options after the EP is appended, the EP side needs to be able to get the configurations.	2024-10-05 00:12:44 -07:00
Wanming Lin	39c8b3759f	[JS/WebGPU] Fixed bugs in inputs validation of Resize (#21955 ) - 'scales' and 'sizes' may be empty tensors; make sure each is a non-empty 1D tensor - If 'scales' or 'sizes' is present, make sure its length is non-zero	2024-10-04 18:29:53 -07:00
Tianlei Wu	b5ef85555a	Support onnx data types (bfloat16, float8) in python I/O binding APIs (#22306 ) ### Description (1) Support onnx data types in python APIs: * IOBinding.bind_input * IOBinding.bind_output * ortvalue_from_shape_and_type (2) Add unit tests, which serve as an example of running BFloat16 or Float8 models in Python. Other minor changes: (3) Replace deprecated NP_TYPE_TO_TENSOR_TYPE by helper API. (4) Rename ortvalue_from_numpy_with_onnxtype to ortvalue_from_numpy_with_onnx_type. The integer of onnx element type can be found in (https://onnx.ai/onnx/api/mapping.html). Note that FLOAT4E2M1 is not supported yet. ### Motivation and Context The current python API does not support Bfloat16 and float8 (FLOAT8E4M3FN, FLOAT8E4M3FNUZ, FLOAT8E5M2, FLOAT8E5M2FNUZ) types, or other new data types like INT4, UInt4 etc. This removes the limitation. https://github.com/microsoft/onnxruntime/issues/13001 https://github.com/microsoft/onnxruntime/issues/20481 https://github.com/microsoft/onnxruntime/issues/20578	2024-10-04 17:29:15 -07:00
Dmitri Smirnov	96a1ce1c04	[C#] Address Packaging pipeline failure (#22307 ) ### Description Add new test data copy to 2 more test projects.	2024-10-04 17:28:09 -07:00
Dmitri Smirnov	9f3676bc31	Address leftover comments for Lora support (#22322 ) ### Description Address comments ### Motivation and Context Re: https://github.com/microsoft/onnxruntime/pull/22046	2024-10-04 16:43:26 -07:00
Dmitri Smirnov	0645ad19a4	[PyBind] Expose enable_mem_arena property for SessionOptions (#22323 ) ### Description Expose enable_mem_arena property for SessionOptions ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/22271	2024-10-04 16:43:15 -07:00
Changming Sun	715b74d61a	Re-enable codesign for maven packages (#22308 ) ### Description PR #22217 was reverted. This PR re-enables it. ### Motivation and Context	2024-10-04 14:30:17 -07:00
Tianlei Wu	f3f33bfa05	Upgrade cutlass to 3.5.1 and cudnn frontend to 1.7.0 (#22316 ) ### Description Upgrade cutlass to 3.5.1 Upgrade cudnn_frontend to 1.7.0	2024-10-04 11:48:50 -07:00
Changming Sun	f25f3868a7	Auto regenerate LORA's fbs files (#22313 ) ### Description A left-over of PR #22046 ### Motivation and Context Right now our VCPKG pipelines are broken.	2024-10-04 10:01:19 -07:00
Edward Chen	1df215e9bb	Update arena creation check in Environment::CreateAndRegisterAllocator() to check for 32-bit builds instead of non-x86_64 builds. (#22304 )	2024-10-04 09:03:16 -07:00
jingyanwangms	bb0c1f0a05	Update cuda version in release pipeline (#22305 ) ### Description With TensorRT 10.4 update, the name of TensorRT windows package changed	2024-10-03 22:28:28 -07:00
Ranjit Ranjan	d0ddfa9b9e	[AIX] build fix for using system install protobuf/onnx (#22302 ) ### Description Fix a merge issue that occurred in https://github.com/microsoft/onnxruntime/pull/22272 ### Motivation and Context To build onnxruntime using system installed protobuf/onnx.	2024-10-03 19:29:42 -07:00
Jing Fang	a80bf8d158	Reduce matmulnbits UT time (#22303 ) ### Description Flatten MatMulNbits UT and reduce unnecessary loops. ### Motivation and Context Reduce matmulnbits UT time	2024-10-03 16:24:56 -07:00
Edward Chen	f1be92faf0	Patch fp16 to fix Xcode 16 builds with XNNPACK EP targeting x86_64. (#22294 )	2024-10-03 14:17:15 -07:00
Yi Zhang	bbb54985a8	Add MaxPool FP16 in XnnPack EP (#22258 ) ### Description Add support for FP16 kernels in the XnnPack execution provider for MaxPool operations. Fixes: [AB#50332](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/50332) ### Motivation and Context The major purpose of this pull request is to add some common vars/functions and setup a consistent style for adding FP16 kernels in XnnPack EP. ---------	2024-10-03 18:28:58 +08:00
Caroline Zhu	c73e6afa6c	Migrate Android Java E2E tests from App Center to Browserstack (#22117 ) ### Description - removed installing AppCenter + pipeline step that runs AppCenter Espresso tests - added script for running AppCenter tests ### Motivation and Context App Center is getting deprecated in the next year + we have upcoming Android work that depends on working E2E testing. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-10-02 15:04:58 -07:00
Dmitri Smirnov	224f0651d0	[C#] Expose Multi-Lora support in C# (#22281 ) ### Description ### Motivation and Context https://github.com/microsoft/onnxruntime/pull/22046	2024-10-02 10:00:43 -07:00
goldsteinn	4e15b229a0	ThreadPool: Spend less time busy waiting. (#21545 ) The purpose of the patch is primarily to save power, but it also has nice perf benefits (mostly from allowing the system to better distribute power to cores doing meaningful work). Changes are twofold: 1) Decrease WorkerLoop spin count dramatically ~10^6 -> ~10^4. The reality is after ~10^4 spins, if there hasn't been any new work added it's unlikely any new work is imminent, so sleep to preserve power. This aligns more closely with upstream EigenV3. 2) Use exponential backoff for waiting on memory. This saves a bit more power and, importantly, increases the time between iterations in WorkerLoop to help accommodate the dramatically lower spin counts. Since the tuning for both the iteration counts / backoff counts is dramatically different for hybrid/non-hybrid systems, this patch templates the affected functions and dynamically chooses based on `CPUIDInfo::IsHybrid()`. This seemed like the "lightest weight" way of getting the change in, although it's likely we could incur less dynamic overhead if we added the template argument to the entirety of `ThreadPoolTempl`. Measured performance on an [Intel Meteor Lake CPU](https://www.intel.com/content/www/us/en/products/sku/237329/intel-core-ultra-7-processor-165u-12m-cache-up-to-4-90-ghz/specifications.html) across a range of models. Below are the results of 3 runs with each metric being the value-after-patch / value-before-patch (so for something like inference time, lower is better).
<div align="center"> <table> <tr> <th>Session creation time cost</th> <td>0.7179</td> </tr> <tr> <th>First inference time cost</th> <td>0.7156</td> </tr> <tr> <th>Total inference time cost</th> <td>1.0146</td> </tr> <tr> <th>Total inference requests</th> <td>0.8874</td> </tr> <tr> <th>Average inference time cost</th> <td>0.8800</td> </tr> <tr> <th>Total inference run time</th> <td>1.0146</td> </tr> <tr> <th>Number of inferences per second</th> <td>0.8955</td> </tr> <tr> <th>Avg CPU usage</th> <td>0.9462</td> </tr> <tr> <th>Peak working set size</th> <td>0.9922</td> </tr> <tr> <th>Runs</th> <td>1.1552</td> </tr> <tr> <th>Min Latency</th> <td>0.7283</td> </tr> <tr> <th>Max Latency</th> <td>0.9258</td> </tr> <tr> <th>P50 Latency</th> <td>0.9534</td> </tr> <tr> <th>P90 Latency</th> <td>0.9639</td> </tr> <tr> <th>P95 Latency</th> <td>0.9659</td> </tr> <tr> <th>P99 Latency</th> <td>0.9640</td> </tr> </table> </div> So the net result is a 1.16x improvement in throughput and between 1.08-1.37x improvement in latency.	2024-10-01 17:25:02 -07:00
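The two changes described in this commit (a bounded spin phase, with exponentially growing pauses between polls, followed by sleeping) can be sketched roughly as below. This is a simplified Python illustration under stated assumptions, not the C++ `WorkerLoop` code in ONNX Runtime: `wait_for_work`, `spin_count`, and `max_pause_iters` are hypothetical names, and the inner busy loop merely stands in for a CPU pause instruction.

```python
import threading


def wait_for_work(work_ready: threading.Event,
                  spin_count: int = 10_000,
                  max_pause_iters: int = 64) -> str:
    """Poll for work a bounded number of times, then block until it arrives.

    Returns "spin" if work showed up during the spin phase and "blocked"
    if the caller had to go to sleep first.
    """
    pause_iters = 1
    for _ in range(spin_count):
        if work_ready.is_set():
            return "spin"
        # Stand-in for a hardware pause: idle a little longer after each miss.
        for _ in range(pause_iters):
            pass
        pause_iters = min(pause_iters * 2, max_pause_iters)  # exponential backoff
    # No work after the spin budget: stop burning power and sleep until signalled.
    work_ready.wait()
    return "blocked"


# Usage: work that is already available is picked up in the spin phase.
ready = threading.Event()
ready.set()
print(wait_for_work(ready))
```

A smaller `spin_count` trades a little wake-up latency for less time spent busy waiting, which is the trade-off the commit message describes.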
Adam Pocock	14d1bfc34b	[java] Multi-LoRA support (#22280 ) ### Description Java parts of Multi-LoRA support - #22046. ### Motivation and Context API equivalence with Python & C#. --------- Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>	2024-10-01 13:54:37 -07:00
Dmitri Smirnov	1fc2b94644	Address Android warning error (#22285 ) ### Motivation and Context Build issue https://github.com/microsoft/onnxruntime/pull/22046#issuecomment-2386414899	2024-10-01 13:52:25 -07:00
Edward Chen	c24e55b1f1	[Java] Add API for appending QNN EP (#22208 ) - Add Java API for appending QNN EP - Update Java unit test setup - Fix issues with setting system properties for tests - Unify Windows/non-Windows setup to simplify	2024-10-01 10:18:04 -07:00
Tianlei Wu	e2b9ccc44a	Update SAM2 benchmark for testing torch compile modes and profiling (#22279 ) This pull request introduces several enhancements to the benchmarking process for the SAM2 model, including: (1) Add profiling capabilities. (2) test torch compile modes (none will disable compile and fallback to eager mode) (3) Update README for setting up the environment. ### Documentation Updates: * README.md: Updated instructions to create separate conda environments for GPU and CPU benchmarking, and detailed the parameters and outputs of the benchmark script. ### Benchmark Script Enhancements: * benchmark_sam2.py: Added optional parameters for enabling NVTX and PyTorch profiling, and adjusted the initialization and execution flow to incorporate these profiling options. These changes enhance the flexibility and functionality of the benchmarking process, making it easier to profile and benchmark the SAM2 model on different hardware configurations.	2024-10-01 09:51:12 -07:00
Yufeng Li	96e9c99dce	remove neural-speed (#22236 ) ### Description NS is not developed anymore and ORT doesn't use it for int4 inference either. Remove it to clean up the code	2024-10-01 09:50:44 -07:00
kunal-vaishnavi	50bda44a70	Fix equation in MatMulNBits op spec (#22253 ) ### Description This PR fixes an equation in the MatMulNBits op spec. The old formula is stated as ``` [CeilDiv((N * n_blocks_per_col + 1) * bits, 8)] ``` but it should be stated as ``` [N * CeilDiv(n_blocks_per_col * bits, 8)] ``` or as ``` [N * FloorDiv((n_blocks_per_col + 1) * bits, 8)] ``` ### Motivation and Context For models such as ChatGLM where the column size is odd, the division math can be off. For example: ![image_360](https://github.com/user-attachments/assets/a5035bec-4dad-46af-9cb1-24a881eb70a0) With the old equation, the projections are calculated as follows. ``` # Down projection B = 4,096 x 107 x 64 zero_points = 221,184 N = 4,096 n_blocks_per_col = 107 4,096 * CeilDiv((107 + 1) * 4, 8) = 4,096 * CeilDiv(108 * 4, 8) = 4,096 * 54 = 221,184 # Up projection B = 13,696 x 32 x 64 zero_points = 219,136 N = 13,696 n_blocks_per_col = 32 13,696 * CeilDiv((32 + 1) * 4, 8) = 13,696 * CeilDiv(33 * 4, 8) = 13,696 * 17 = 232,832 ``` With the new equation, the projections are calculated as follows. ``` # Down projection B = 4,096 x 107 x 64 zero_points = 221,184 N = 4,096 n_blocks_per_col = 107 4,096 * CeilDiv(107 * 4, 8) = 4,096 * 54 = 221,184 # Up projection B = 13,696 x 32 x 64 zero_points= 219,136 N = 13,696 n_blocks_per_col = 32 13,696 * CeilDiv(32 * 4, 8) = 13,696 * 16 = 219,136 ```	2024-10-01 09:31:56 -07:00
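The worked arithmetic in this commit message can be checked with a short Python sketch. The helper names are hypothetical, and the "old" function follows the per-column form used in the worked example (`N * CeilDiv((n_blocks_per_col + 1) * bits, 8)`) rather than the formula exactly as quoted from the old spec.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, for non-negative a and positive b."""
    return -(-a // b)


def zero_points_size_old(n: int, n_blocks_per_col: int, bits: int) -> int:
    # Form used in the "old equation" worked example.
    return n * ceil_div((n_blocks_per_col + 1) * bits, 8)


def zero_points_size_new(n: int, n_blocks_per_col: int, bits: int) -> int:
    # Corrected form: N * CeilDiv(n_blocks_per_col * bits, 8).
    return n * ceil_div(n_blocks_per_col * bits, 8)


# (N, n_blocks_per_col) for the two projections in the example, with bits = 4.
for name, n, blocks in (("down", 4096, 107), ("up", 13696, 32)):
    print(name, zero_points_size_old(n, blocks, 4), zero_points_size_new(n, blocks, 4))
```

Both forms agree for the down projection (221,184); only the new form gives the expected 219,136 for the up projection, where the old one gives 232,832.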
Mauricio A Rovira Galvez	ffca096b5a	Fixes a crash on macOS 15 when using CoreML. (#22277 ) ### Description In macOS 15, apps running with CoreML will crash with an error message like this one: ``` Terminating app due to uncaught exception 'NSGenericException', reason: 'Failed to set compute_device_types_mask E5RT: Cannot provide zero compute device types. (1)' ``` This can be easily seen when building ONNXRuntime from source and running the unit tests. The fix was suggested in [this bug report](https://forums.developer.apple.com/forums/thread/757040). I've ported the change to ONNXRuntime and verified that: * The issue is resolved in macOS 15 (all unit tests pass). * The behaviour is unchanged in macOS 14. ### Motivation and Context This fixes #22275 allowing apps using ONNXRuntime with CoreML to work normally.	2024-10-01 16:06:03 +10:00
Scott McKay	ee7081b828	Fix syntax for some CoreML ML Program supported operator entries (#22268 ) ### Description Fix syntax so usability checker works as expected.	2024-10-01 15:49:43 +10:00
Yang Gu	9e5153b688	[js/webgpu] Manage model download with a specific unittest option (#22214 ) Currently in debug mode, unit test will always download models to local file system, which is a bit annoying. This PR fixes this by adding a specific option to enable model download.	2024-09-30 18:27:43 -07:00
Yang Gu	c75f4a09b7	[js/webgpu] Remove the limitation on axis in softmax (#22231 ) In current implementation, axis in softmax has to be the last, which is an obvious limitation. This PR removes this limitation and will fix issues #20710 and #22176.	2024-09-30 18:27:11 -07:00
Dmitri Smirnov	d9de054eb5	Multi-Lora support (#22046 )	2024-09-30 15:59:07 -07:00
Jian Chen	40bcb7664d	Revert "Jar Maven Signing - GnuPG and sha256" (#22273 ) Reverts microsoft/onnxruntime#22217	2024-09-30 15:07:59 -07:00
Jian Chen	ebcf2fcd16	Replace gradle/wrapper-validation-action with gradle/actions/wrapper-validation-action (#22224 ) ### Description Replace gradle/wrapper-validation-action with gradle/actions/wrapper-validation-action ### Motivation and Context This is recommended by https://github.com/gradle/wrapper-validation-action. This job uses deprecated functionality from the 'gradle/wrapper-validation-action' action.	2024-09-30 14:29:16 -07:00
Ranjit Ranjan	812075731c	[AIX] Build fix for using system installed protobuf/onnx (#22272 ) ### Description To fix the build issues for AIX OS while using system installed protobuf/onnx. ### Motivation and Context Code changes in this PR contain: 1. Fix for below compilation issue. ``` collect2: fatal error: library liblibprotobuf-lite not found compilation terminated. ``` 2. Adding onnx library into dependency list for test applications.	2024-09-30 12:36:21 -07:00
Yi Zhang	d069475a63	Make A100 jobs in PR checks again (#22261 ) ### Description If the variable is 1, the jobs run on A100 in PR checks. Fixes [AB#50333](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/50333) ### Motivation and Context We want more big models that need A100 to be tested in PR checks, but Azure may sometimes decommission A100 agents without notification, which blocks merging PRs. This PR improves on the current workaround, which makes those jobs run only on the main branch. Once we find that the A100 agents have all been decommissioned by Azure, we can change the UseA100 variable to 0 to disable the A100 jobs in PR checks	2024-09-30 08:29:30 -07:00
wejoncy	2cfe1f031d	[CoreML MLProgram] Support Float16 (1/N) (#22068 ) ### Description Support Float16 for CoreML MLProgram EP. Operations: "Add", "Mul", "Sub", "Div", "Pow", "Sqrt", "Reciprocal", "Sigmoid", "Tanh", "Relu", "LeakyRelu", "Concat", "GridSample", "GlobalAveragePool", "Clip", "DepthToSpace", "Resize", "Slice", "Conv", "ConvTranspose", "GlobalMaxPool", "Gemm", "MatMul", "AveragePool", "MaxPool", "Reshape", "Split", "Transpose" --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-09-30 17:56:47 +08:00
Yang Gu	434f0fa536	[js/webgpu] Fix the crash issue in unsqueeze (#22264 ) Because axes in unsqueeze is allowed to be a scalar, its shape cannot always be accessed like a vector. This PR fixes issue #22031 so that the original model can run well.	2024-09-30 02:28:16 -07:00
Yulong Wang	1bda91fc57	[js/webgpu] fix external buffer registration (#22254 ) ### Description Fixes a failure that occurs when GPU inputs are shuffled between iterations.	2024-09-28 10:36:40 -07:00
Enrico Galli	52a8c1cae8	[WebNN EP] Enable IO Bindings with MLTensor (#21301 ) ### Description Enables using the MLTensor to pass data between models. ### Motivation and Context Using MLTensor instead of ArrayBuffers reduces the number of copies between the CPU and devices as well as the renderer and GPU process in Chromium.	2024-09-27 17:24:21 -07:00
Patrice Vignola	ebda23be16	[DML EP] Fix Clip clamping (#22251 )	2024-09-27 16:24:37 -07:00
shiyi	1e3cd86d80	[WebNN EP] Support LSTM op (#20293 )	2024-09-27 14:23:08 -07:00
liqun Fu	f410e7c4cf	Fix mlas bench crash (#22248 ) Fix mlas bench crash --------- Signed-off-by: Liqun Fu <liqfu@microsoft.com>	2024-09-27 13:50:42 -07:00
Sumit Agarwal	529835cc46	[DML EP] Update DML to 1.15.2 (#22247 ) ### Description Update DML binary to the current latest redist version [1.15.2](https://www.nuget.org/packages/Microsoft.AI.DirectML/1.15.2).	2024-09-27 13:20:29 -07:00
Patrice Vignola	20be51525b	Support if node with sequence outputs (#22234 ) `If` nodes can have sequence outputs. Those nodes are mapped to the DML EP to be able to keep the outputs on the GPU, but they actually execute on the CPU by selecting either the `then` subgraph or the `else` subgraph.	2024-09-27 12:40:01 -07:00
Patrice Vignola	14ba2fb83c	[DML EP] Add intermediate tensor dumping for DML (#22246 )	2024-09-27 12:39:45 -07:00
Hector Li	6e3163faa5	Update code regarding some QNN bug fixes (#22222 ) ### Description Update code regarding some QNN bug fixes: 1. QnnProfile_ExtendedEventData_t.version is not initialized in Qnn 2. Failed to finalize the graph for HardSigmoid with FP16 precision	2024-09-27 09:51:47 -07:00
Kyle	b81e76b9a6	Jar Maven Signing - GnuPG and sha256 (#22217 ) ### Description Jar maven signing: - GnuPG - sha256. Jar packages artifacts: - onnxruntime-android-full-aar - onnxruntime-java - onnxruntime-java-gpu ### Motivation and Context Previously, the packages were signed manually. Goal: sign them automatically.	2024-09-27 17:50:06 +08:00
Tianlei Wu	ff8a48ef3b	Update SAM2 benchmark script and doc (#22238 ) (1) Fix a bug of parameters order. (2) Update benchmark script: * download test image if not exist * combine multiple csv files into one file, and remove duplicated lines (3) Add a section for benchmark in README.md	2024-09-26 20:57:03 -07:00
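The "combine multiple csv files into one file, and remove duplicated lines" step mentioned in this commit could look roughly like the sketch below. It is an illustration using only the standard library, not the code in the benchmark script: `combine_csv_files` is a hypothetical helper, and it assumes all input files share the same header row.

```python
import csv


def combine_csv_files(input_paths, output_path):
    """Merge CSV files that share a header into one file, dropping duplicate rows.

    Returns the number of data rows written (header excluded).
    """
    header = None
    seen = set()
    rows = []
    for path in input_paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path} has a different header")
            for row in reader:
                key = tuple(row)
                if key not in seen:  # keep only the first occurrence of each row
                    seen.add(key)
                    rows.append(row)
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Rows are kept in first-seen order, so the combined file stays stable when the same inputs are merged again.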

11764 commits