onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-14 18:12:05 +00:00

Author	SHA1	Message	Date
sheetalarkadam	dd2ea8469e	Add qnn android package (#22296 ) ### Description Pre built QNN Android package ### Future Work 1. Setting up CI with Browserstack- onnxruntime_tests and Android test 2. ESRP Release to Maven	2024-10-10 10:37:22 -07:00
Kevin Chen	709368ea14	Remove deprecation warnings for TensorRT 10.5 builds (#22374 ) ### Description In TensorRT 10.5, the APIs `platformHasFastFp16` and `platformHasFastInt8` have been deprecated. Ignore these deprecation warnings. Signed-off-by: Kevin Chen <kevinch@nvidia.com>	2024-10-10 08:49:10 -07:00
Tianlei Wu	2584d02c5e	Fix SAM2 benchmark script on cuda graph (#22377 ) ### Description Update segment anything 2 benchmark script: (1) Fix cuda graph in benchmark. Make sure --use_cuda_graph takes effect and random_inputs() generates inputs matching the dtype of the model. (2) Add a parameter to enable profiling. (3) Use latest cuda 12.6.2 and cudnn 9.5. (4) Update README.md. ### Motivation and Context Previously, --use_cuda_graph did not take effect. This fixes the benchmark.	2024-10-10 08:46:44 -07:00
Luis E. P.	1bc546af61	Add SetEpDynamicOptions and remove workload_type from run/session options (#22282 ) ### Description Add SetEpDynamicOptions and Remove workload_type from run/session options. ### Motivation and Context Added SetEpDynamicOptions as a dynamic way of changing EP settings even in the middle of a Run Using workload_type run/session options to set Efficient/Default mode for workloads does not cover all the scenarios and can lead to priority inversions. Working on a new API to support setting Efficient/Default mode for workloads. --------- Co-authored-by: Luis E. Pena <luispena@microsoft.com>	2024-10-09 22:54:22 -07:00
Changming Sun	2bef89c171	Upgrade absl to the latest released version (#22365 ) ### Description Resolve #21976 . ABSL generally does not have forward/backward compatibility. Our code is only compatible with one fixed LTS version. So it's important to fix the version number there when using find_package to detect an installed version.	2024-10-09 20:21:40 -07:00
Changming Sun	dcf1e0c3b0	Re-enable CUDA 12 python package test pipeline (#22370 ) ### Description It runs after "Python-CUDA-Packaging-Pipeline" that runs on a CPU machine that skipped all tests. This testing pipeline is for doing the tests.	2024-10-09 20:21:27 -07:00
Yi Zhang	25b1c38e87	Add conv fp16 kernel in xnnpack EP (#22301 ) ### Description Add FP16 kernels of Conv and ConvTranspose [AB#50186](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/50186)	2024-10-10 08:48:09 +08:00
Tianlei Wu	2b0ea6c834	Fix security warning: use sha256 instead of md5 (#22373 ) ### Description Fix S360 warning: Use of unapproved hash algorithm or API MD5.	2024-10-09 17:24:46 -07:00
Tianlei Wu	5cc2529379	Fix EmbedLayerNormalization fusion related to LayerNormalization optional input (#22371 ) This fixes a bug found by libfuzzer: LayerNormalization third input (beta) is optional. The following code has potential out of bound access if the input is not available: ``` NodeArg* beta = layer_norm_node.MutableInputDefs()[2]; ``` This adds a check to ensure the third input exists before fusion. [AB#49036](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/49036)	2024-10-09 17:24:21 -07:00
Tianlei Wu	071b607807	[CUDA] Add CUDA_VERSION and CUDNN_VERSION etc. arguments to Dockerfile.cuda (#22351 ) ### Description * Add a few arguments CUDA_VERSION, CUDNN_VERSION, OS, GIT_COMMIT, GIT_BRANCH and ONNXRUNTIME_VERSION to the Dockerfile.cuda to allow for more flexibility in the build process. * Update README.md to include the new arguments and their usage. * Output labels to image so that it is easy to inspect the image. Available CUDA versions for ubuntu 24.04 can be found [here](https://hub.docker.com/r/nvidia/cuda/tags), and available CUDNN versions can be found [here](https://pypi.org/project/nvidia-cudnn-cu12/#history). Example command line to build docker image: ``` docker build -t onnxruntime-cuda --build-arg CUDA_VERSION=12.6.1 \ --build-arg CUDNN_VERSION=9.5.0.50 \ --build-arg GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD) \ --build-arg GIT_COMMIT=$(git rev-parse HEAD) \ --build-arg ONNXRUNTIME_VERSION=$(cat ../VERSION_NUMBER) \ -f Dockerfile.cuda .. ``` Example labels from `docker inspect onnxruntime-cuda`: ``` "Labels": { "CUDA_VERSION": "12.6.1", "CUDNN_VERSION": "9.5.0.50", "maintainer": "Changming Sun <chasun@microsoft.com>", "onnxruntime_git_branch": "main", "onnxruntime_git_commit": "bc84958dcef5c6017ae58085f55b669efd74f4a5", "onnxruntime_version": "1.20.0", "org.opencontainers.image.ref.name": "ubuntu", "org.opencontainers.image.version": "24.04" } ``` ### Motivation and Context https://github.com/microsoft/onnxruntime/pull/22339 has hard-coded the cuda and cudnn versions. User might want to choose specified cuda and cudnn version during building docker image.	2024-10-09 12:06:33 -07:00
Hector Li	3b00024b55	Fix the QNN nuget package issue (#22358 ) Fix the QNN nuget package issue ### Description Inside the package, folder name \runtimes\win-arm64\ was changed to \runtimes\win-ARM64\, which breaks lib copy settings in Microsoft.ML.OnnxRuntime.QNN.props. ### Motivation and Context Fix issue: https://github.com/microsoft/onnxruntime/issues/21692	2024-10-09 08:41:23 -07:00
Changming Sun	9ee963110e	Update manylinux version (#22355 ) ### Description Update the commit from 59600894a2c1c18290944b83e989bfe618975230 to 1887322ed36d522409a6b805d4e7942cf76a8e40 ### Motivation and Context The new one has python 3.13. AB#50959	2024-10-08 23:11:11 -07:00
Pranav Sharma	c415991c16	Revert "ThreadPool: Spend less time busy waiting. (#21545 )" (#22350 ) This reverts commit `4e15b229a0`. Reason: We are seeing an increase in the number of deadlocks after this PR. We have a release coming up next week and do not have enough time to investigate the root cause, hence reverting this PR temporarily. Moreover, this is causing an increase in the binary size. ### Description We are seeing an [increase in the number of deadlocks](https://github.com/microsoft/onnxruntime/pull/22315#issuecomment-2394821893) after this PR. We have a release coming up next week and do not have enough time to investigate the root cause, hence reverting this PR temporarily. ### Motivation and Context See above.	2024-10-08 17:50:26 -07:00
Yulong Wang	c5d28cac4d	Initial WebGPU EP checkin (#22318 ) ### Description This change introduces the WebGPU EP into ONNX Runtime. To keep the PR as simple as possible, it excludes the following: - C API changes for WebGPU EP - actual implementation of WebGPU EP. Currently in this PR, WebGPU is a stub implementation that does not register any kernel. - Python IO Binding update - Node.js IO Binding update This PR now contains only 43 file changes (while the working branch contains 130+), which should make it easier to review. There will be separate PRs for each item mentioned above. Current working branch: #21904	2024-10-08 16:10:46 -07:00
Edward Chen	bc84958dce	Remove duplicate fp32_bits definition. (#22340 ) Remove duplicate fp32_bits definition in onnxruntime/core/mlas/lib/cast.cpp	2024-10-08 11:57:00 -07:00
Tianlei Wu	8595e56d8e	[CUDA] Update Docker to use Ubuntu 24.04, cuda 12.6, cudnn 9.4 and python 3.12 (#22339 ) ### Description Serve as example to build and run onnxruntime-gpu with latest software stack. To build docker image: ``` git clone https://github.com/microsoft/onnxruntime cd onnxruntime/dockerfiles docker build -t onnxruntime-cuda -f Dockerfile.cuda .. ``` To launch the docker image built from previous step (and mount the code directory to run a unit test below): ``` cd .. docker run --rm -it --gpus all -v $PWD:/code onnxruntime-cuda /bin/bash ``` Then run the following in docker image to verify that the cuda provider is good: ``` python /code/onnxruntime/test/python/onnxruntime_test_python_cudagraph.py ``` ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/22335	2024-10-08 09:54:46 -07:00
Changming Sun	d98340968e	Stop publishing python 3.8/3.9 packages (#22343 ) ### Description 1. Stop publishing python 3.8/3.9 packages, to align with numpy. 2. Add a trigger for CUDA12's python test pipeline.	2024-10-08 09:50:05 -07:00
aciddelgado	cc0193cd42	Fix Memory Issue GQA CPU Rotary (#22290 ) ### Description In GQA there was a memory issue which was best described by @edgchen1 [here](https://github.com/microsoft/onnxruntime/issues/22252#issuecomment-2384559255) > here's the problematic code: > > `d9de054eb5/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc (L149-L157)` > > annotated: > > ```c++ > if (packed_qkv) { > // Q is an OrtValue declared in the enclosing scope. > OrtValue RotaryQKV; > Tensor::InitOrtValue(element_type, TensorShape({batch_size, num_heads_ + 2 * kv_num_heads_, sequence_length, head_size}), allocator, RotaryQKV); > // Save pointer to Q's data in q_input. > q_input = Q.Get<Tensor>().Data<T>(); > k_input = q_input + num_heads_ * sequence_length * head_size; > q_rotary = RotaryQKV.GetMutable<Tensor>()->MutableData<T>(); > k_rotary = q_rotary + num_heads_ * sequence_length * head_size; > // Overwrite Q with RotaryQKV (OrtValues contain shared_ptr to contained value). > // Now, q_input is pointing to freed memory. > Q = RotaryQKV; > } > ``` > > later on, when we use `q_input`, there is a read access violation. > > `d9de054eb5/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc (L170-L172)` > > this problem showed up when CPU allocator sharing between sessions was enabled. in that case, the CPU allocator's arena was disabled. I suspect that the default usage of the arena hid this issue. > > though I debugged into the first branch, this appears to be a problem in both branches: > > `d9de054eb5/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc (L149-L168)` ### Motivation and Context Fixes a crucial bug. The issue was found here https://github.com/microsoft/onnxruntime/issues/22252	2024-10-08 09:22:05 -07:00
stsokolo	efb8703a25	[MIGraphX EP Support]Add rocm to transformers/benchmark.py script (#22299 ) ### Description Add ROCm EP option to benchmark.py script when using int8 quantization. ### Motivation and Context Without this change benchmarks with int8 quantization cannot be run with ROCm execution provider.	2024-10-07 21:57:18 -07:00
Xavier Dupré	407c1ab2e2	Fix conversion of TensorData, TensorsData to json (#22166 ) ### Description Fix write_calibration_table to support TensorData, TensorsData	2024-10-06 19:13:03 -07:00
Scott McKay	280c013d67	Fix warning when building on Windows (#22327 ) ### Description Specify type to fix warning	2024-10-06 15:51:58 +10:00
jingyanwangms	5036e63d58	Increase TensorRT tolerance from default 1e-5 to 1e-3 after TRT 10.4 (#22321 ) ### Description Increase TensorRT tolerance from default 1e-5 to 1e-3 after TRT 10.4	2024-10-05 18:06:06 +10:00
mingmingtasd	004bd36f3d	[WebNN EP] Support Tile operator (#22148 ) PTAL, thanks! @Honry , @fdwr thanks!	2024-10-05 00:56:55 -07:00
chenduan-amd	98a75900ef	[VitisAI]Add interface to get configurations from Sessionoption (#22096 ) ### Description Add interface to get config_options from onnxruntime. ### Motivation and Context To support configuring session_options after the EP is appended, the EP side needs a way to get the configurations.	2024-10-05 00:12:44 -07:00
Wanming Lin	39c8b3759f	[JS/WebGPU] Fixed bugs in inputs validation of Resize (#21955 ) - 'scales' and 'sizes' may be empty tensors; make sure each is a non-empty 1D tensor - If 'scales' or 'sizes' is present, make sure its length is non-zero	2024-10-04 18:29:53 -07:00
Tianlei Wu	b5ef85555a	Support onnx data types (bfloat16, float8) in python I/O binding APIs (#22306 ) ### Description (1) Support onnx data types in python APIs: * IOBinding.bind_input * IOBinding.bind_output * ortvalue_from_shape_and_type (2) Add unit tests, which serves an example of running BFloat16 or Float8 models in Python. Other minor changes: (3) replace deprecated NP_TYPE_TO_TENSOR_TYPE by helper API. (4) Rename ortvalue_from_numpy_with_onnxtype to ortvalue_from_numpy_with_onnx_type. The integer of onnx element type can be found in (https://onnx.ai/onnx/api/mapping.html). Note that FLOAT4E2M1 is not supported yet. ### Motivation and Context Current python API does not support Bfloat16 and float8 (FLOAT8E4M3FN, FLOAT8E4M3FNUZ, FLOAT8E5M2, FLOAT8E5M2FNUZ) types, and other new data types like INT4, UInt4 etc. This removes the limitation. https://github.com/microsoft/onnxruntime/issues/13001 https://github.com/microsoft/onnxruntime/issues/20481 https://github.com/microsoft/onnxruntime/issues/20578	2024-10-04 17:29:15 -07:00
Dmitri Smirnov	96a1ce1c04	[C#] Address Packaging pipeline failure (#22307 ) ### Description Add new test data copy to 2 more test projects.	2024-10-04 17:28:09 -07:00
Dmitri Smirnov	9f3676bc31	Address leftover comments for Lora support (#22322 ) ### Description Address comments ### Motivation and Context Re: https://github.com/microsoft/onnxruntime/pull/22046	2024-10-04 16:43:26 -07:00
Dmitri Smirnov	0645ad19a4	[PyBind] Expose enable_mem_arena property for SessionOptions (#22323 ) ### Description Expose enable_mem_arena property for SessionOptions ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/22271	2024-10-04 16:43:15 -07:00
Changming Sun	715b74d61a	Re-enable codesign for maven packages (#22308 ) ### Description PR #22217 was reverted. This PR re-enables it. ### Motivation and Context	2024-10-04 14:30:17 -07:00
Tianlei Wu	f3f33bfa05	Upgrade cutlass to 3.5.1 and cudnn frontend to 1.7.0 (#22316 ) ### Description Upgrade cutlass to 3.5.1 Upgrade cudnn_frontend to 1.7.0	2024-10-04 11:48:50 -07:00
Changming Sun	f25f3868a7	Auto regenerate LORA's fbs files (#22313 ) ### Description A left-over of PR #22046 ### Motivation and Context Right now our VCPKG pipelines are broken.	2024-10-04 10:01:19 -07:00
Edward Chen	1df215e9bb	Update arena creation check in Environment::CreateAndRegisterAllocator() to check for 32-bit builds instead of non-x86_64 builds. (#22304 )	2024-10-04 09:03:16 -07:00
jingyanwangms	bb0c1f0a05	Update cuda version in release pipeline (#22305 ) ### Description With TensorRT 10.4 update, the name of TensorRT windows package changed	2024-10-03 22:28:28 -07:00
Ranjit Ranjan	d0ddfa9b9e	[AIX] build fix for using system install protobuf/onnx (#22302 ) ### Description Fixing merge issue occurred in https://github.com/microsoft/onnxruntime/pull/22272 ### Motivation and Context To build onnxruntime using system installed protobuf/onnx.	2024-10-03 19:29:42 -07:00
Jing Fang	a80bf8d158	Reduce matmulnbits UT time (#22303 ) ### Description Flatten MatMulNbits UT and reduce unnecessary loops. ### Motivation and Context Reduce matmulnbits UT time	2024-10-03 16:24:56 -07:00
Edward Chen	f1be92faf0	Patch fp16 to fix Xcode 16 builds with XNNPACK EP targeting x86_64. (#22294 )	2024-10-03 14:17:15 -07:00
Yi Zhang	bbb54985a8	Add MaxPool FP16 in XnnPack EP (#22258 ) ### Description Add support for FP16 kernels in the XnnPack execution provider for MaxPool operations. Fixes: [AB#50332](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/50332) ### Motivation and Context The major purpose of this pull request is to add some common vars/functions and set up a consistent style for adding FP16 kernels in XnnPack EP.	2024-10-03 18:28:58 +08:00
Caroline Zhu	c73e6afa6c	Migrate Android Java E2E tests from App Center to Browserstack (#22117 ) ### Description - removed installing AppCenter + pipeline step that runs AppCenter Espresso tests - added script for running AppCenter tests ### Motivation and Context App Center is getting deprecated in the next year + we have upcoming Android work that depends on working E2E testing. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-10-02 15:04:58 -07:00
Dmitri Smirnov	224f0651d0	[C#] Expose Multi-Lora support in C# (#22281 ) ### Description ### Motivation and Context https://github.com/microsoft/onnxruntime/pull/22046	2024-10-02 10:00:43 -07:00
goldsteinn	4e15b229a0	ThreadPool: Spend less time busy waiting. (#21545 ) The purpose of the patch is primarily to save power, but it also has nice perf benefits (mostly from allowing the system to better distribute power to cores doing meaningful work). Changes are twofold: 1) Decrease WorkerLoop spin count dramatically ~10^6 -> ~10^4. The reality is after ~10^4 spins, if there hasn't been any new work added it's unlikely any new work is imminent so sleep to preserve power. This aligns more closely with upstream EigenV3. 2) Use exponential backoff for waiting on memory. This saves a bit more power, and importantly increases the time between iterations in WorkerLoop to help accommodate the dramatically lowered spin counts. Since the tuning for both the iteration counts / backoff counts are dramatically different for hybrid/non-hybrid systems, this patch templates the affected functions and dynamically chooses based on `CPUIDInfo::IsHybrid()`. This seemed like the "lightest weight" way of getting the change in, although it's likely we could incur less dynamic overhead if we added the template argument to the entirety of `ThreadPoolTempl`. Measured performance on an [Intel Meteor Lake CPU](https://www.intel.com/content/www/us/en/products/sku/237329/intel-core-ultra-7-processor-165u-12m-cache-up-to-4-90-ghz/specifications.html) across a range of models. Below are the result of 3 runs with each metric being the value-before-patch / value-after-patch (so for something like inference time, lower is better). 
<div align="center"> <table> <tr> <th>Session creation time cost</th> <td>0.7179</td> </tr> <tr> <th>First inference time cost</th> <td>0.7156</td> </tr> <tr> <th>Total inference time cost</th> <td>1.0146</td> </tr> <tr> <th>Total inference requests</th> <td>0.8874</td> </tr> <tr> <th>Average inference time cost</th> <td>0.8800</td> </tr> <tr> <th>Total inference run time</th> <td>1.0146</td> </tr> <tr> <th>Number of inferences per second</th> <td>0.8955</td> </tr> <tr> <th>Avg CPU usage</th> <td>0.9462</td> </tr> <tr> <th>Peak working set size</th> <td>0.9922</td> </tr> <tr> <th>Runs</th> <td>1.1552</td> </tr> <tr> <th>Min Latency</th> <td>0.7283</td> </tr> <tr> <th>Max Latency</th> <td>0.9258</td> </tr> <tr> <th>P50 Latency</th> <td>0.9534</td> </tr> <tr> <th>P90 Latency</th> <td>0.9639</td> </tr> <tr> <th>P95 Latency</th> <td>0.9659</td> </tr> <tr> <th>P99 Latency</th> <td>0.9640</td> </tr> </table> </div> So the net result is a 1.16x improvement in throughput and between 1.08-1.37x improvement in latency.	2024-10-01 17:25:02 -07:00
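The two changes described in the ThreadPool commit above (a bounded spin count, and exponential backoff between polls) can be illustrated with a minimal Python sketch. This is not the ONNX Runtime implementation, which is C++ inside `ThreadPoolTempl`; the function name `spin_then_sleep`, the parameters `spin_limit` and `max_pause`, and the busy loop standing in for a CPU pause instruction are all hypothetical choices made for illustration.

```python
def spin_then_sleep(has_work, spin_limit=10_000, max_pause=64):
    """Poll has_work() at most spin_limit times, pausing longer after each miss.

    Returns True as soon as work is seen, or False once the spin budget is
    exhausted, at which point a real worker would block (sleep) instead.
    All names and default values here are illustrative assumptions.
    """
    pause = 1
    for _ in range(spin_limit):
        if has_work():
            return True
        # Stand-in for a CPU pause/yield instruction repeated `pause` times.
        for _ in range(pause):
            pass
        # Exponential backoff: double the pause, capped at max_pause.
        pause = min(pause * 2, max_pause)
    return False


if __name__ == "__main__":
    # No work ever arrives: the worker gives up spinning and would go to sleep.
    print(spin_then_sleep(lambda: False, spin_limit=100))
```

A smaller `spin_limit` trades a little wake-up latency for less time spent burning power while idle, which is the trade-off the commit message describes.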
Adam Pocock	14d1bfc34b	[java] Multi-LoRA support (#22280 ) ### Description Java parts of Multi-LoRA support - #22046. ### Motivation and Context API equivalence with Python & C#. --------- Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>	2024-10-01 13:54:37 -07:00
Dmitri Smirnov	1fc2b94644	Address Android warning error (#22285 ) ### Description ### Motivation and Context Build issue https://github.com/microsoft/onnxruntime/pull/22046#issuecomment-2386414899	2024-10-01 13:52:25 -07:00
Edward Chen	c24e55b1f1	[Java] Add API for appending QNN EP (#22208 ) - Add Java API for appending QNN EP - Update Java unit test setup - Fix issues with setting system properties for tests - Unify Windows/non-Windows setup to simplify	2024-10-01 10:18:04 -07:00
Tianlei Wu	e2b9ccc44a	Update SAM2 benchmark for testing torch compile modes and profiling (#22279 ) This pull request introduces several enhancements to the benchmarking process for the SAM2 model, including: (1) Add profiling capabilities. (2) test torch compile modes (none will disable compile and fallback to eager mode) (3) Update README for setting up the environment. ### Documentation Updates: * README.md: Updated instructions to create separate conda environments for GPU and CPU benchmarking, and detailed the parameters and outputs of the benchmark script. ### Benchmark Script Enhancements: * benchmark_sam2.py: Added optional parameters for enabling NVTX and PyTorch profiling, and adjusted the initialization and execution flow to incorporate these profiling options. These changes enhance the flexibility and functionality of the benchmarking process, making it easier to profile and benchmark the SAM2 model on different hardware configurations.	2024-10-01 09:51:12 -07:00
Yufeng Li	96e9c99dce	remove neural-speed (#22236 ) ### Description NS is not developed anymore and ORT doesn't use it for int4 inference either. Remove it to clean up the code.	2024-10-01 09:50:44 -07:00
kunal-vaishnavi	50bda44a70	Fix equation in MatMulNBits op spec (#22253 ) ### Description This PR fixes an equation in the MatMulNBits op spec. The old formula is stated as ``` [CeilDiv((N * n_blocks_per_col + 1) * bits, 8)] ``` but it should be stated as ``` [N * CeilDiv(n_blocks_per_col * bits, 8)] ``` or as ``` [N * FloorDiv((n_blocks_per_col + 1) * bits, 8)] ``` ### Motivation and Context For models such as ChatGLM where the column size is odd, the division math can be off. For example: ![image_360](https://github.com/user-attachments/assets/a5035bec-4dad-46af-9cb1-24a881eb70a0) With the old equation, the projections are calculated as follows. ``` # Down projection B = 4,096 x 107 x 64 zero_points = 221,184 N = 4,096 n_blocks_per_col = 107 4,096 * CeilDiv((107 + 1) * 4, 8) = 4,096 * CeilDiv(108 * 4, 8) = 4,096 * 54 = 221,184 # Up projection B = 13,696 x 32 x 64 zero_points = 219,136 N = 13,696 n_blocks_per_col = 32 13,696 * CeilDiv((32 + 1) * 4, 8) = 13,696 * CeilDiv(33 * 4, 8) = 13,696 * 17 = 232,832 ``` With the new equation, the projections are calculated as follows. ``` # Down projection B = 4,096 x 107 x 64 zero_points = 221,184 N = 4,096 n_blocks_per_col = 107 4,096 * CeilDiv(107 * 4, 8) = 4,096 * 54 = 221,184 # Up projection B = 13,696 x 32 x 64 zero_points= 219,136 N = 13,696 n_blocks_per_col = 32 13,696 * CeilDiv(32 * 4, 8) = 13,696 * 16 = 219,136 ```	2024-10-01 09:31:56 -07:00
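The arithmetic in the MatMulNBits commit above can be checked with a short Python sketch. It uses the per-column form that the worked examples use, `N * CeilDiv(...)`; the helper names `ceil_div`, `zero_points_size_old`, and `zero_points_size_new` are hypothetical and are not part of the op spec.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def zero_points_size_old(n: int, n_blocks_per_col: int, bits: int) -> int:
    # Old form as used in the worked examples: N * CeilDiv((blocks + 1) * bits, 8)
    return n * ceil_div((n_blocks_per_col + 1) * bits, 8)


def zero_points_size_new(n: int, n_blocks_per_col: int, bits: int) -> int:
    # Corrected form: N * CeilDiv(blocks * bits, 8)
    return n * ceil_div(n_blocks_per_col * bits, 8)


if __name__ == "__main__":
    # (name, N, n_blocks_per_col) taken from the commit message; bits = 4.
    for name, n, blocks in [("down", 4096, 107), ("up", 13696, 32)]:
        print(name, zero_points_size_old(n, blocks, 4), zero_points_size_new(n, blocks, 4))
```

Both forms agree when `n_blocks_per_col` is odd (the down projection), because the extra block only fills the half-used byte. They differ when it is even (the up projection), where the old form adds a whole byte per column.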
Mauricio A Rovira Galvez	ffca096b5a	Fixes a crash on macOS 15 when using CoreML. (#22277 ) ### Description In macOS 15, apps running with CoreML will crash with an error message like this one: ``` Terminating app due to uncaught exception 'NSGenericException', reason: 'Failed to set compute_device_types_mask E5RT: Cannot provide zero compute device types. (1)' ``` This can be easily seen when building ONNXRuntime from source and running the unit tests. The fix was suggested in [this bug report](https://forums.developer.apple.com/forums/thread/757040). I've ported the change to ONNXRuntime and verified that: * The issue is resolved in macOS 15 (all unit tests pass). * The behaviour is unchanged in macOS 14. ### Motivation and Context This fixes #22275 allowing apps using ONNXRuntime with CoreML to work normally.	2024-10-01 16:06:03 +10:00
Scott McKay	ee7081b828	Fix syntax for some CoreML ML Program supported operator entries (#22268 ) ### Description Fix syntax so usability checker works as expected.	2024-10-01 15:49:43 +10:00
Yang Gu	9e5153b688	[js/webgpu] Manage model download with a specific unittest option (#22214 ) Currently in debug mode, unit test will always download models to local file system, which is a bit annoying. This PR fixes this by adding a specific option to enable model download.	2024-09-30 18:27:43 -07:00


11783 commits