onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-17 01:44:45 +00:00

Author	SHA1	Message	Date
Scott McKay	04ff0ceeed	Merge	2025-01-18 11:24:02 +10:00
Scott McKay	e915f01b41	Merge	2025-01-18 11:22:58 +10:00
Scott McKay	a52e268613	Fix 2 more x86 issues	2025-01-18 09:40:27 +10:00
Scott McKay	e84eb00af1	Fix x86 error	2025-01-17 19:10:43 +10:00
Scott McKay	5db0b520c4	Fix x86 build	2025-01-16 20:59:40 +10:00
Scott McKay	453f13a2b5	Address PR comments Add unit tests	2025-01-16 19:43:46 +10:00
Scott McKay	45d5906358	Merge	2025-01-14 10:21:06 +10:00
Scott McKay	0e145e0d0b	Tweak comment	2025-01-08 07:45:52 +10:00
Scott McKay	0dcf0864d3	Update test to use 128 bytes for initializer so it can be allocated externally.	2025-01-07 18:54:38 +10:00
Scott McKay	d360f76626	Merge	2025-01-07 16:14:18 +10:00
Scott McKay	347bd7a3f2	Take ownership of node attributes for consistency Updates comments for clarity. Copy external data into initializer when saving model for debugging.	2025-01-07 16:11:35 +10:00
Changming Sun	704523c2d8	[build] Be compatible with the latest protobuf (#23260 ) Resolve #21308	2025-01-06 13:10:43 -08:00
Changming Sun	c6cbda3257	Update Python-Cuda-Publishing-Pipeline (#23253 ) ### Description 1. Currently Python-Cuda-Publishing-Pipeline only publishes Linux wheels, not Windows wheels. It is because recently we refactored the upstream pipeline("Python-CUDA-Packaging-Pipeline") to use 1ES PT. This PR fixed the issue 2. tools/ci_build/github/azure-pipelines/stages/py-win-gpu-stage.yml no longer includes component-governance-component-detection-steps.yml , because 1ES PT already inserted such a thing 3. Delete tools/ci_build/github/windows/eager/requirements.txt because it is no longer used. ### Motivation and Context The "Python-CUDA-Packaging-Pipeline" is for CUDA 12. "Python CUDA ALT Packaging Pipeline" is for CUDA 11. The two pipelines are very similar, except the CUDA versions are different. Each of them has three parts: build, test, publish. "Python-CUDA-Packaging-Pipeline" is the first part: build. "Python CUDA12 Package Test Pipeline" is the second part. "Python-Cuda-Publishing-Pipeline" is the third part that publishes the packages to an internal ADO feed.	2025-01-06 11:50:58 -08:00
Yulong Wang	c53c9caf17	[js] update mocha to v11.0.1 (#23254 ) ### Description Update `mocha` to v11.0.1 and `fs-extra` to v11.2.0 ``` # npm audit report nanoid <3.3.8 Severity: moderate Predictable results in nanoid generation when given non-integer values - https://github.com/advisories/GHSA-mwcw-c2x4-8c55 fix available via `npm audit fix` node_modules/nanoid mocha 8.2.0 - 10.2.0 Depends on vulnerable versions of nanoid node_modules/mocha 2 moderate severity vulnerabilities ```	2025-01-05 22:29:02 -08:00
Yulong Wang	21b4d2ac9f	fix pipeline build-perf-test-binaries (#23255 )	2025-01-05 22:28:41 -08:00
Wu, Junze	2a16ad0215	[js/node] add proxy agent support for onnxruntime-node install script (#23232 ) ### Description Add proxy agent to fetch request ### Motivation and Context Fixes #23231 --------- Signed-off-by: Junze Wu <junze.wu@intel.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2025-01-04 20:27:55 -08:00
Changming Sun	b7ef81a034	Move Linux GPU CI pipeline to A10 (#23235 ) Move Linux GPU CI pipeline to A10 machines which are more advanced. Retire onnxruntime-Linux-GPU-T4 machine pool. Disable run_lean_attention test because the new machines do not have enough shared memory. ``` skip loading trt attention kernel fmha_mhca_fp16_128_256_sm86_kernel because no enough shared memory [E:onnxruntime:, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention_0' Status Message: CUDA error cudaErrorInvalidValue:invalid argument ```	2025-01-04 19:11:37 -08:00
Jiajia Qin	4247153bb2	[webgpu] Add kernel type to profile info (#23167 ) ### Description This PR makes it convenient to post-process the generated JSON file when profiling is enabled. The kernel type can be used to aggregate the overall time of kernels of the same type.	2025-01-03 14:28:48 -08:00
Yulong Wang	5c2e60c5af	[js/node] update install script to allow use proxy (#23242 ) ### Description Use `https.get` instead of `fetch` in ORT Nodejs binding package install script. ### Motivation and Context According to discussions in #23232, the package `global-agent` cannot work with `fetch` API. To make it work with the proxy agent, this PR replaces the `fetch` API with `https.get` in the install script.	2025-01-03 14:27:15 -08:00
Changming Sun	5d692b0136	Merge web machine pools (#23243 ) ### Description The Web CI pipeline uses three different Windows machine pools: 1. onnxruntime-Win2022-webgpu-A10 2. onnxruntime-Win2022-VS2022-webgpu-A10 3. onnxruntime-Win-CPU-2022-web This PR merges them together to reduce ongoing maintenance cost.	2025-01-03 13:53:17 -08:00
Yueqing Zhang	aedb49beb4	[VitisAI] change all support tensor type from ir 9 to ir 10 (#23204 ) ### Description Changed all supported tensor types from IR 9 to IR 10. ### Motivation and Context - See issue https://github.com/microsoft/onnxruntime/issues/23205 Co-authored-by: Yueqing Zhang <yueqingz@amd.com>	2025-01-02 06:45:21 -08:00
Yifan Li	bc91f5c72e	[TensorRT EP] Fix to build ORT on legacy TRT8.5 (#23215 ) ### Description For legacy Jetson users on JetPack 5.x, the latest TRT version is 8.5. Add version checks around newer TRT features to fix the build on JetPack 5.x (cuda11.8+gcc11 are required).	2025-01-01 19:24:24 -08:00
xhcao	a3833a5e79	[js/webgpu] validate transpose perm if specified (#23197 )	2025-01-01 15:58:54 -08:00
Dmitry Deshevoy	0b87bccca8	[CUDA] Make cubins const (#23225 ) ### Description Make arrays with cubin data const. ### Motivation and Context Non-const arrays are put into the .data section which might cause excessive memory usage in some scenarios. Making cubin arrays const allows them to be put into the .rodata section.	2024-12-31 16:20:21 -08:00
Changming Sun	afd3e81c94	Remove PostBuildCleanup (#23233 ) Remove PostBuildCleanup tasks since it is deprecated. It is to address a warning in our pipelines: "Task 'Post Build Cleanup' version 3 (PostBuildCleanup@3) is dependent on a Node version (6) that is end-of-life. Contact the extension owner for an updated version of the task. Task maintainers should review Node upgrade guidance: https://aka.ms/node-runner-guidance" Now the cleanup is controlled in another place: https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema/workspace?view=azure-pipelines The code change was generated by the following Linux command: ```bash find . -name \*.yml -exec sed -i '/PostBuildCleanup/,+2d' {} \; ```	2024-12-31 13:12:33 -08:00
Jean-Michaël Celerier	2116fd1999	Update onnxruntime_c_api.h to work with MinGW (#23169 ) The SAL2 macros are not always available there ### Description Make SAL2 macros only available on MSVC. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/1175	2024-12-31 11:05:10 -08:00
Changming Sun	69bb53db85	Enable delay loading hooker for python packages (#23227 ) ### Description Enable delay loading hooker for python packages	2024-12-31 10:12:31 -08:00
wejoncy	86870114eb	[CoreML] support coreml model cache (#23065 ) ### Description Refactor compute plan profiling. Support caching the CoreML model to speed up session initialization. This is only supported through a user-provided entry, and the user is responsible for managing the cache. With the cache, session initialization time can be reduced by 50% or more: yolo11.onnx: 0.6s before, 0.1s after; yolo11-fp16.onnx: 1.8s before, 0.1s after. --------- Co-authored-by: wejoncy <wejoncy@.com> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-12-31 09:29:41 +08:00
Scott McKay	6fb01c19a7	Remove temp debug code	2024-12-30 12:01:41 +10:00
Scott McKay	019edc9264	Fix minimal build. Fix some more old 'graph api' naming	2024-12-30 11:15:03 +10:00
Scott McKay	6f2a5c3c46	More debug info	2024-12-30 10:37:06 +10:00
Scott McKay	0dc1e6ee61	Add Constant test	2024-12-30 09:42:34 +10:00
Wanming Lin	2d05c4bcd9	[WebNN] Support SkipSimplifiedLayerNormalization op (#23151 ) The algorithm of `SkipSimplifiedLayerNormalization` is quite similar to that of `SimplifiedLayerNormalization`; the only difference is that `SkipSimplifiedLayerNormalization` provides an additional output used for calculating the sum of the input, skip and bias (if it exists). Also fixes a bug in `SimplifiedLayerNormalization` by adding the bias if it exists.	2024-12-24 12:44:14 -08:00
liqun Fu	a9a881cc98	Integrate onnx 1.17.0 (#21897 ) ### Description For the ORT 1.21.0 release. Create the following related issues to track tests skipped due to updated ONNX operators in the ONNX 1.17.0 release: https://github.com/microsoft/onnxruntime/issues/23162 https://github.com/microsoft/onnxruntime/issues/23164 https://github.com/microsoft/onnxruntime/issues/23163 https://github.com/microsoft/onnxruntime/issues/23161 --------- Signed-off-by: Liqun Fu <liqfu@microsoft.com> Signed-off-by: Liqun Fu <liqun.fu@microsoft.com> Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Yifan Li <109183385+yf711@users.noreply.github.com> Co-authored-by: yf711 <yifanl@microsoft.com>	2024-12-24 09:02:02 -08:00
Scott McKay	002b6cc238	Fix some more builds/tests	2024-12-24 17:12:46 +10:00
Scott McKay	275f762b3d	Last linux build error fixes.	2024-12-24 08:43:30 +10:00
Scott McKay	5e85fce91e	Add missed changed.	2024-12-24 07:52:29 +10:00
Scott McKay	d8ef92b4ce	Remove unused function to fix build error. Fix some long lines.	2024-12-24 07:24:17 +10:00
Adrian Lizarraga	81cd6eacd0	[QNN EP] Fix multithread sync bug in ETW callback (#23156 ) ### Description Fixes crash in QNN dlls when an ETW callback tries to change the QNN log level. This is caused by a function that does not lock a mutex before modifying the QNN log level. ### Motivation and Context An ETW callback into QNN EP leads to a crash within QNN SDK dlls. It happens approximately 1 out of 3 full QNN unit tests runs. The cause is a multithreading synchronization bug in QNN EP. We're not always locking a mutex when ETW calls QNN EP to notify of ETW config change. There are two branches in the QNN EP callback function that try to update the QNN log handle. One branch correctly locks a mutex, but other does not lock it at all. This causes crashes within QNN dlls. - Does not lock mutex: [onnxruntime/onnxruntime/core/providers/qnn/qnn_execution_provider.cc at main · microsoft/onnxruntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/qnn_execution_provider.cc#L426) - Locks mutex: [onnxruntime/onnxruntime/core/providers/qnn/qnn_execution_provider.cc at main · microsoft/onnxruntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/qnn_execution_provider.cc#L442) The fix is to lock the mutex in both paths.	2024-12-23 10:02:04 -08:00
Scott McKay	41fb824df9	Fix some linux build errors.	2024-12-23 21:49:37 +10:00
Scott McKay	351d12df9e	Improve consistency. Update some comments.	2024-12-23 19:31:21 +10:00
Scott McKay	dece8b8e6a	Model Builder API - Create new model - Augment existing model	2024-12-23 18:39:14 +10:00
amancini-N	c6ba7edd83	Enable pointer-generator T5 models in BeamSearch (#23134 ) ### Description Introduces a new optional input (encoder_input_ids) in the decoder graph of the T5 implementation for BeamSearch. This allows the use of pointer-generator networks in the decoder graph. ### Motivation and Context - Fixes #23123	2024-12-22 21:30:49 -08:00
Yueqing Zhang	ebdbbb7531	[VitisAI] Int4 support (#22850 ) ### Description 1. Add support for throwing an error when the hardware is not supported for VitisAI. 2. Add support for unloading the VitisAI EP. 3. Add API for Win25. ### Motivation and Context This is a requirement for Win25.	2024-12-20 22:03:27 -08:00
Yulong Wang	6806174096	fix webgpu delay load test (#23157 ) ### Description This change fixes the WebGPU delay load test. <details> <summary>Fix UB in macro</summary> The following C++ code outputs `2, 1` in MSVC, while it outputs `1, 1` in GCC: ```c++ #include <iostream> #define A 1 #define B 1 #define ENABLE defined(A) && defined(B) #if ENABLE int x = 1; #else int x = 2; #endif #if defined(A) && defined(B) int y = 1; #else int y = 2; #endif int main() { std::cout << x << ", " << y << "\n"; } ``` Clang reports `macro expansion producing 'defined' has undefined behavior [-Wexpansion-to-defined]`. </details> <details> <summary>Fix condition of build option onnxruntime_ENABLE_DELAY_LOADING_WIN_DLLS</summary> Delay load is explicitly disabled when python binding is being built. modifies the condition. </details>	2024-12-20 13:37:12 -08:00
Changming Sun	fcc34da5e9	Fix a tiny problem in winml.cmake (#23173 ) ### Description CMake's [target_link_libraries](https://cmake.org/cmake/help/latest/command/target_link_libraries.html#id2) function accepts plain library name(like `re2`) or target name(like `re2::re2`) or some other kinds of names. "plain library names" are old-fashioned, for compatibility only. We should use target names. ### Motivation and Context To make vcpkg work with winml build. See #23158	2024-12-20 11:48:43 -08:00
Dmitri Smirnov	00b262dbb4	Implement pre-packed blobs serialization on disk and their memory mapping on load (#23069 ) ### Description Pre-packing is a feature that allows kernels to re-arrange weights data to gain performance at inference time. Currently, pre-packed blobs are shared when cross-session weight sharing is enabled and only for those weights that are marked as shared by the user. Otherwise, data resides on the heap and the kernels own the data, which may be duplicated. This change enables pre-packed data to be stored on disk alongside the external initializers. The pre-packed blobs are memory mapped and are loaded into either the X-session shared container or a new container that shares pre-packed blobs within the session. With the new approach, pre-packed blobs are always owned by the shared container using the existing pre-pack mechanism for sharing. When X-session sharing is enabled, the external container owns the data. A separate container owned by a root `SessionState` owns and shares the data when X-session sharing is not enabled. To facilitate this new approach, we introduce a new container that works in two modes. When an optimized model is being saved and pre-packed weights saving is enabled, the new container will record pre-packed blobs and serialize them to disk using the existing `ToGraphProtoWithExternalInitializers` function. To externalize the pre-packed weights, we introduce a new session option `kOrtSessionOptionsSavePrePackedConstantInitializers`. Note that pre-packing should be enabled (default) for this to work. The `ToGraphProtoWithExternalInitializers` function is modified to recurse into subgraphs to make sure we properly account for local initializer names. In the second mode, the container would simply hold the pre-packed weights memory-mapped from disk and share them with the kernels. ### Motivation and Context Reduce memory usage by pre-packed initializers and externalize them.	2024-12-20 10:49:08 -08:00
xhcao	29bccad96d	[webgpu] fix compiling error (#23139 )	2024-12-20 09:05:23 -08:00
mingyue	4aca8f33df	[Bug Fix] Missing CustomOp SchemaRegister when generator EPContext ONNX model (#23091 ) ### Description Enhancements to EPContext Operations: 1. Introduced support for the bfloat16 data type in EPContext operations. 2. Bug Fix: Missing Custom OP Schema Registration when generator EPContext ONNX model --------- Co-authored-by: mingyue <mingyue@xilinx.com> Co-authored-by: Hector Li <hecli@microsoft.com>	2024-12-19 16:47:13 -08:00
Jiajia Qin	7c782f6741	[webgpu] Always use tile matmulnbits for block_size = 32 (#23140 ) ### Description After the optimization of prefill time with #23102, it seems that always using the tile matmulnbits with block_size = 32 can bring better performance for the phi3 model, even on a discrete GPU. Phi3 goes from 32.82 tokens/sec to 42.64 tokens/sec in easy mode on my NV RTX 2000 GPU.	2024-12-19 16:22:53 -08:00
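
Note on commit afd3e81c94 (Remove PostBuildCleanup): the sed expression `/PostBuildCleanup/,+2d` in that message deletes every line containing `PostBuildCleanup` together with the two lines that follow it. The Python sketch below mimics that behaviour on an in-memory string so the effect is easy to inspect. The function name `drop_match_and_following` and the sample YAML are illustrative assumptions and are not taken from the repository.

```python
def drop_match_and_following(text, needle, extra=2):
    """Drop each line containing `needle` plus the next `extra` lines.

    Rough equivalent of GNU sed's '/needle/,+2d' for a literal needle
    (hypothetical helper, for illustration only).
    """
    kept = []
    skip = 0  # number of lines still to drop after a match
    for line in text.splitlines():
        if skip > 0:
            # Still inside the deleted range; like sed, do not re-check the pattern here.
            skip -= 1
            continue
        if needle in line:
            # Start of a deleted range: drop this line and the next `extra` lines.
            skip = extra
            continue
        kept.append(line)
    return "\n".join(kept)


# Illustrative pipeline fragment (assumed shape, not copied from the repo).
sample = """steps:
- script: build.sh
- task: PostBuildCleanup@3
  displayName: 'Clean Agent Directories'
  condition: always()
- script: test.sh"""

cleaned = drop_match_and_following(sample, "PostBuildCleanup")
print(cleaned)
```

Because the range is a fixed line count, this approach only works when every matching task block is exactly three lines long, which is worth checking before running the real command over a tree of `.yml` files.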


12166 commits