onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-21 02:18:09 +00:00

Author	SHA1	Message	Date
Adrian Lizarraga	cf565e955d	Revert "Fix ETW Sink Initialize unproperly locking" (#21360 ) Reverts microsoft/onnxruntime#21226 Causes any onnxruntime app to hang on Windows ARM64. Our pipelines do not have the same ETW environment, so we couldn't catch it. ![image](https://github.com/user-attachments/assets/80edbf7d-be50-4cb0-a016-f390b81dc798) The call to TraceLoggingRegisterEx() recursively calls back into LazyInitialize(): LazyInitialize() -> TraceLoggingRegisterEx() -> ORT_TL_EtwEnableCallback() -> Instance() -> LazyInitialize() The original code got out of the recursive loop by checking the `initialized_` flag.	2024-07-15 17:56:08 -07:00
Jing Fang	50170c697e	[Optimizer] DQ + MatMul to MatMulNBits support: kernel changes (#21342 ) Description: ### Description This is a partial change ported from fajin/qdqmatmulnbitstoolchain. That branch has issues resolving the web CI. MatMulNBits is a heavily optimized matmul operation. Currently a MatMul can be converted to MatMulNBits to speed up model inference. However, MatMulNBits is an ORT-only op. To make the graph compatible with ONNX ops and utilize MatMulNBits at the same time, we introduce Q/DQ support for MatMulNBits. To convert MatMul ops in a model to MatMulNBits: 1. Use matmul_4bits_quantizer.py to convert MatMul to DQ + MatMul using QDQ mode. 2. In the ORT session, DQ + MatMul is fused to MatMulNBits. #### Note MatMulNBits assumes the B weight is uint4. When no zp is provided, zp defaults to 8, which is different from DQ. DQ defaults zp to 0 when no zp is provided, and DQ supports int4. Therefore some conversions are introduced during the DQ + MatMul --> MatMulNBits step. #### Perf Using QDQ format will increase the model initialization time and memory consumption. With the current implementation, model init time increased from ~4s to ~9s, and memory consumption increased from ~2.8GB to ~4.8GB. The memory increase is due to 1. in the optimizer, after transposing the B weight, an in-memory tensor proto is created using protobuf's arena. 2. in the finalize step, when saving initializers and prepacking, the ORT arena is used to create buffers for initializers. The memory allocated by arenas cannot be fully deallocated. If ORT arena memory allocation is disabled, the memory consumption of both QDQ format and original format is ~2.2GB. The time increase is mainly due to multiple memory copies, but can be further optimized. ### Motivation and Context Please see description for details.	2024-07-15 15:25:40 -07:00
Jian Chen	c03e6fff4c	Combining android build and test step into one job (#21340 ) ### Description Combining android build and test step into one job ### Motivation and Context Reduce runtime by removing additional machine allocation, and artifact uploading and downloading. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-07-15 14:44:03 -07:00
Yifan Li	db9ee35963	[TensorRT EP] c4996 suppression to build with trt10.2ga on Windows (#21358 ) ### Description Suppress the C4996 deprecated-API warning (treated as an error) as a workaround to build ORT with TRT10.2GA on Windows. ### Motivation and Context Four APIs that are used by the core code of TRT EP were recently declared deprecated. Temporarily suppress deprecated-API warnings until these APIs are updated.	2024-07-15 14:30:02 -07:00
Changming Sun	e5f18ba2c1	Change libonnxruntime.so's SONAME: remove the minor and patch version. (#21339 ) ### Description Resolve #21281 and #10589. 1. Change libonnxruntime.so's SONAME: remove the minor and patch version. By default, when creating an ELF shared object, the linker sets the file's internal DT_SONAME field to the specified name, which is the file name plus SOVERSION. For example, the file name for our library is libonnxruntime.so. And by default SOVERSION is the lib's VERSION number, which is something like 1.19.0. So the DT_SONAME field in libonnxruntime.so is something like libonnxruntime.so.1.18.0. You can use the readelf tool to examine it. ``` readelf -d libonnxruntime.so | grep SONAME 0x000000000000000e (SONAME) Library soname: [libonnxruntime.so.1.18.0] ``` When an executable is linked with a shared object that has a DT_SONAME field, then when the executable is run the dynamic linker will attempt to load the shared object specified by the DT_SONAME field rather than using the file name (which is libonnxruntime.so) given to the linker. After this change, the SONAME will be shortened to "libonnxruntime.so.1" instead. 2. Set default version strings for Windows DLLs, to resolve #10589	2024-07-15 14:21:34 -07:00
Edward Chen	9c2b85ad58	Fix Android build on Windows (#21304 ) - Pass a list of files instead of path separator-delimited string to project.files(). See this issue: https://github.com/gradle/gradle/issues/19817 - Check for host (instead of target) being Windows when using fallback patch program.	2024-07-15 12:29:02 -07:00
Changming Sun	dfaf18928a	Fix a path problem in onnxruntime_perf_test (#21341 ) ### Description Resolve #21267 . onnxruntime_perf_test does not work properly if the input model path url is just a single filename without any path separator. For example, ``` ./onnxruntime_perf_test -t 10 model.onnx ``` The problem was introduced in #19196 by me.	2024-07-15 10:47:02 -07:00
glen-amd	281ed8c12d	VitisAI EP Context Model (#20926 ) # Why so many commits - Runtime debugging - which is necessary - Three different approaches to EP context model - as a result testing back and forth - Windows compatibility issues - this development has been done on Linux for convenience # "Open" (?) questions - Full offloading to a specific EP - Dumping EP context models by EPs vs [by ONNXRT](`e2abba18ea/onnxruntime/core/framework/graph_partitioner.cc (L725)`) - [Node name to pick nodes](`e2abba18ea/onnxruntime/core/framework/graph_partitioner.cc (L654)`) # VitisAI EP made three variant implementations that have respective pros and cons (and of course we can combine them) ## Serialize and cache the list of compute capabilities and the original ONNX model itself ## In `ComputeCapability()`, serialize and cache the backend compilation cache and the related necessary cache info such as cache dir and cache key ## In `Compile()`, serialize and cache the backend compilation cache and the related necessary cache info such as cache dir and cache key # EP context model creation - Precondition Session option configuration `kOrtSessionOptionEpContextEnable` (aka "ep.context_enable") is enabled. - Approach 1 - Steps 1. EP creates an ONNX model whose main graph has EP context nodes (i.e., node type is "EPContext"). 2. EP implements/overrides `IExecutionProvider::GetEpContextNodes()` method. 3. ONNXRT core creates an EP context model and saves/dumps it. - `CreateEpContextModel()` in the file "graph_partitioner.cc" - In `get_ep_context_node()`, `Node::Name()` is used to check whether a node is an EP context node. This means that EP context model creation can only happen in `IExecutionProvider::Compile()`. - The workaround is (1) not implementing `IExecutionProvider::GetEpContextNodes()` and (2) dumping the EP context model by EP itself. 4. Optionally, EP can also dump the EP context model it created by itself. 
- Examples - `QNNExecutionProvider` - `VitisAIExecutionProvider` - Approach 2 - Steps 1. EP creates an ONNX model whose main graph has EP context nodes (i.e., node type is "EPContext"). 2. EP does NOT implement `IExecutionProvider::GetEpContextNodes()` at all. 3. EP dumps the EP context model it created. - Examples - `TensorrtExecutionProvider` - UPDATES - TRT EP is switching to leveraging `IExecutionProvider::GetEpContextNodes()` - `OpenVINOExecutionProvider` (?) # What to cache in EP context nodes - Non Compilation based EPs - Examples - `VitisAIExecutionProvider` - Characteristics - Heavy lifting work happens in `IExecutionProvider::GetCapability()`. - Preconditions - `IExecutionProvider::GetCapability()` is only called once by ONNXRT. - Cache content - Serialization of a list of `ComputeCapability` - Not EP-specific - Serialized using `onnx::FunctionProto` - EP-specific cache - Compilation based EPs - Examples - `QNNExecutionProvider` - `TensorrtExecutionProvider` - `MIGraphXExecutionProvider` - `OpenVINOExecutionProvider` - Cache content - EP-specific cache # Requirements - Offline / AOT compilation of ONNX models with EP context cache - Compile somewhere, run everywhere - Pseudo code with brief explanation ``` GenerateCache(original_onnx_file, cache_onnx_file) model_buffer = load(original_onnx_file) --> Load the original ONNX model file model_buffer = decrypt(model_buffer) session_options = { kOrtSessionOptionEpContextEnable: true, kOrtSessionOptionEpContextFilePath: temp_file } --> Set necessary configs Ort::CreateSessionFromArray(model_buffer, session_options) --> The new ONNX model with EP context is created and dumped into the user specified file "temp_file" temp_buffer = encrypt(temp_file) write(temp_buffer, cache_onnx_file) --> Write the encrypted content of "temp_file" into the "cache_onnx_file" file InitializeInferenceSession(cache_onnx_file) model_buffer = load(cache_onnx_file) --> Load the ONNX model with EP context from the file generated in the 
previous step model_buffer = decrypt(model_buffer) session_options = { } Ort::CreateSessionFromArray(model_buffer, session_options) --> Create and initialize a session with the EP context model ``` - Python code with comments - EP context model creation ```python import onnxruntime as onnxrt # Session options for creating an ONNX model with EP context cache. sess_opts = onnxrt.SessionOptions() # Verbose. sess_opts.log_severity_level = 0 # This is REQUIRED. sess_opts.add_session_config_entry("ep.context_enable", "1") # This is OPTIONAL. # Either an absolute path (preferred for now) or a relative path (WIP) is okay. # sess_opts.add_session_config_entry("ep.context_file_path", "/some/path/to/original_model_ctx.onnx") # This is OPTIONAL. sess_opts.add_session_config_entry("ep.context_embed_mode", "1") orig_model_location = "/some/path/to/original_model.onnx" sess = onnxrt.InferenceSession(orig_model_location, sess_opts, providers=["VitisAIExecutionProvider"], provider_options=[]) ``` - Inference run with an EP context model ```python import onnxruntime as onnxrt # Session options for running inference with an EP context model. sess_opts = onnxrt.SessionOptions() # Default EP context model path. # ep_ctx_model_location = "/some/path/to/original_model.onnx_ctx.onnx" # User configured EP context model path. ep_ctx_model_location = "/some/path/to/original_model_ctx.onnx" sess = onnxrt.InferenceSession(ep_ctx_model_location, sess_opts, providers=["VitisAIExecutionProvider"], provider_options=[]) model_inputs = {} run_opts = onnxrt.RunOptions() # Verbose. run_opts.log_severity_level = 1 sess.run(None, model_inputs, run_opts) ``` --------- Co-authored-by: Glen Cao <glen@Glens-MacBook-Air.local>	2024-07-12 21:22:58 -07:00
Xu Xing	92a8407b39	[js/webgpu] Remove unnecessary initialization of var (#21312 ) This var has already been initialized to 0 in tint, so there is no need for an extra loop to do it again: ``` float tint_symbol_52[1][4] = (float[1][4])0; { for(int tint_symbol_53 = 0; (tint_symbol_53 < 1); tint_symbol_53 = (tint_symbol_53 + 1)) { { for(int tint_symbol_54 = 0; (tint_symbol_54 < 4); tint_symbol_54 = (tint_symbol_54 + 1)) { tint_symbol_52[min(uint(tint_symbol_53), 0u)][min(uint(tint_symbol_54), 3u)] = 0.0f; } } } } ```	2024-07-12 12:34:34 -07:00
Yi Zhang	f2ebd1cd6b	[Fix] Exception in iosDynamicFramework Post-Merge workflow (#21262 ) ### Description The exception was caused by `3dd6fcc089`. skip_macos_test was added because there is a new exception in https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1425579&view=logs&j=c90c5af3-67d5-5936-5a62-71c93ebfca65&t=01038f35-8e78-5801-1aa1-d9647bb65858 ``` 2024-07-05T14:41:09.3864740Z mkdir -p /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Products/Debug/macos_package_testUITests.xctest/Contents/Frameworks 2024-07-05T14:41:09.3933430Z mkdir: /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Products/Debug/macos_package_testUITests.xctest: Operation not permitted 2024-07-05T14:41:09.3996760Z /var/folders/0f/b0mzpg5d31z074x3z5lzkdxc0000gn/T/tmp97ycvwq5/apple_package_test/Pods/Target Support Files/Pods-macos_package_testUITests/Pods-macos_package_testUITests-frameworks.sh: line 7: realpath: command not found 2024-07-05T14:41:09.4003170Z :18: error: Unexpected failure 2024-07-05T14:41:11.1323470Z error: Sandbox: mkdir(72212) deny(1) file-write-create /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Products/Debug/macos_package_testUITests.xctest (in target 'macos_package_testUITests' from project 'apple_package_test') 2024-07-05T14:41:11.1325620Z 2024-07-05T14:41:11.8731110Z 2024-07-05T14:41:11.8733040Z Test session results, code coverage, and logs: 2024-07-05T14:41:11.8734820Z /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Logs/Test/Test-macos_package_test-2024.07.05_14-40-38-+0000.xcresult 2024-07-05T14:41:11.8735530Z 2024-07-05T14:41:11.8906210Z Testing failed: 2024-07-05T14:41:11.8911060Z Sandbox: mkdir(72212) deny(1) file-write-create 
/Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Products/Debug/macos_package_testUITests.xctest 2024-07-05T14:41:11.8912570Z Unexpected failure 2024-07-05T14:41:11.8913690Z Testing cancelled because the build failed. 2024-07-05T14:41:11.8914380Z 2024-07-05T14:41:11.8914970Z TEST FAILED 2024-07-05T14:41:11.8915480Z 2024-07-05T14:41:11.8915780Z 2024-07-05T14:41:11.8916750Z The following build commands failed: 2024-07-05T14:41:11.8919280Z PhaseScriptExecution [CP]\ Embed\ Pods\ Frameworks /Users/runner/Library/Developer/Xcode/DerivedData/apple_package_test-akksnidsbpojopfdqrclgsoqqerv/Build/Intermediates.noindex/apple_package_test.build/Debug/macos_package_testUITests.build/Script-059136A7770CA5376C30F2FD.sh (in target 'macos_package_testUITests' from project 'apple_package_test') 2024-07-05T14:41:11.8922180Z (1 failure) ``` The macOS test is also skipped in `9ef28f092f/tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml (L119-L127)`. It may be a known issue.	2024-07-12 09:24:12 -07:00
Ted Themistokleous	4ac4cd2668	Migraphx ep windows build (#21284 ) ### Description Repeat of #21084 with removal of policy CMP0144 to suppress warnings which uses CMake 3.27.0. ### Motivation and Context Already approved PR: https://github.com/microsoft/onnxruntime/pull/21084 Removed the added policy from CMake 3.27.0.	2024-07-11 21:21:38 -07:00
mingyueliuh	42b7cedb06	[VitisAI] custom op support multiple outputs (#21280 ) ### Description The implementation inside the EP requires registering some custom ops which are only used in the model compilation phase. Currently only a single output is supported. ### Motivation and Context New requirements call for multiple outputs, so the shape inference of the EP custom op needs to be extended to support multiple outputs. --------- Co-authored-by: liumingyue <mingyue@xilinx.com> Co-authored-by: mingyue <mingyue@amd.com>	2024-07-11 16:04:18 -07:00
Qingnan Duan	80b56feb41	Implement FlashAttention for CPU (#20805 ) ### Description Implement [FlashAttention](https://arxiv.org/pdf/2205.14135) and [FlashAttention-2](https://arxiv.org/pdf/2307.08691) for MultiHeadAttention on CPU. ### Motivation and Context Accelerate the execution of MultiHeadAttention. Current performance: 10ms vs 16ms (com.microsoft.MultiHeadAttention) on my Linux machine and 10ms vs 38ms (com.microsoft.MultiHeadAttention) on my Windows machine. May need further optimizations. --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Qingnan Duan <qiduan@microsoft.com>	2024-07-11 14:19:59 -07:00
Edward Chen	33e7c7f6ec	Enable Android CI build stages to run in parallel. (#21314 ) Enable Android CI build stages to run in parallel to possibly reduce total build time.	2024-07-11 10:09:09 -07:00
Yi Zhang	41ea47be1e	Move QNN nuget package stages out of the big Nuget packaging pipeline. (#21306 ) ### Description 1. remove QNN stages from the big packaging pipeline 2. Add publish nightly package in the current [QNN Nuget pipeline](https://dev.azure.com/aiinfra/Lotus/_builddefinitionId=1234]) ### Motivation and Context Reduce the complexity of the big Nuget packaging pipelines. --------- Co-authored-by: Yi Zhang <your@email.com>	2024-07-11 09:07:23 -07:00
pengwa	88336ffa92	Fix typos - 1st Wave (#21278 ) ### Description Many typos are reported by the reviewdog [Optional Lint] actions (example: https://github.com/microsoft/onnxruntime/actions/runs/9864564489/job/27239732367). This PR fixes some of them. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-07-11 13:35:08 +08:00
pengwa	0a1178add9	Fix lint C++ actions (#21303 ) ### Description `83e0c6b96e` is the last commit for which the Lint C++ actions pass. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/96bf005e-5815-46d0-ac17-c6094200957c) `4a7eaff1d9` is the first commit for which they fail. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/72a9271e-7b4b-40f8-83a5-f28b82c5e726) Reviewdog/action-cpplint@master has changed since that day. https://github.com/reviewdog/action-cpplint/pull/42/files makes action-cpplint start using reviewdog release https://github.com/reviewdog/reviewdog/releases/tag/v0.19.0. Optional Lint also failed with many typos, which is probably related to the same change. Let's fix that in separate PRs.	2024-07-11 09:46:41 +08:00
Changming Sun	fe6ef404b5	Enable LTO for Android build (#21243 ) ### Description Enable LTO for Android build, which can reduce binary size by 6%.	2024-07-10 18:44:17 -07:00
Sheil Kumar	28af544278	[DirectML] Broadcast NC-dims for Tensors A&B in DynamicQuantizeMatMul (#21298 ) ### Description [DirectML] Broadcast NC-dims for Tensors A&B in DynamicQuantizeMatMul The DynamicQuantizeMatMul allows input tensors in NCHW format, and DirectML requires that input tensors share the same batch and channel dimensions. Tensors A and B should be broadcast (if possible) to the corresponding output NC dims. ### Motivation and Context Certain models which use DynamicQuantizeMatMul hit a crash when the NC dims are intended to be broadcast. --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2024-07-10 17:35:47 -07:00
Edward Chen	20cd3394fc	[MLAS] AArch64 SQNBitGemm CompInt8 initial multi-row implementation (#21193 ) Update AArch64 SQNBitGemm CompInt8 kernels to process matrix in tiles. E.g., computing the output in 2x2 tiles allows us to compute four elements of the output with one read of two rows of A and two columns of B. Also moved some code around as it was getting big for a single file.	2024-07-10 15:39:26 -07:00
Changming Sun	8749fa381e	Update absl (#21300 ) ### Description Our macOS pipelines are failing because of a build error in absl. However, the bug fix we need is not available in the latest ABSL release. Here is the issue: https://github.com/abseil/abseil-cpp/pull/1536 And here is the fix: `779a3565ac` GTest uses ABSL. But this ABSL target also depends on GTest. So, it is a circular dependency. We should be able to avoid that by not building tests for ABSL. However, the version we are using has a problem with that: it has a CMake target that still depends on GTest even when testing is disabled. It's strange that we suddenly hit this problem and that it only happens on macOS.	2024-07-10 11:14:15 -07:00
Adrian Lizarraga	5753f8da8c	[QNN EP] Initial INT4 support (#21171 ) ### Description - Adds support for int4 quantized weights (per-tensor and per-channel) on QNN EP - Adds a test script that creates an INT4 QDQ model with a Conv - Adds unit tests demonstrating accuracy issues. ### Motivation and Context This is the next step in being able to run models that use 4-bit quantized weights on QNN EP.	2024-07-10 10:03:53 -07:00
Pavan Goyal	1b82d835d8	[Fix] InterOpNumThreads Session Option for ONNX ReactNative Package (#21263 ) ### Description This PR resolves a bug related to setting the interOpNumThreads session option when creating an ORTSession. Currently, when the interOpNumThreads option is passed from React Native, the native module incorrectly sets intraOpNumThreads instead of interOpNumThreads. ### Motivation and Context Because of this bug, users of the ONNX React Native package may believe that they are setting interOpNumThreads correctly when they are not, so this change is required. Refer to the code snippet below for details <img width="634" alt="Screenshot 2024-07-05 at 9 28 58 PM" src="https://github.com/microsoft/onnxruntime/assets/88655321/70a8f216-553a-4f4c-9481-e6871f0e37e6">	2024-07-10 07:00:18 -07:00
Ștefan Talpalaru	1b19045afa	[build] allow MPI on Unix when NCCL is disabled (#21175 ) ### Description CMake logic fixed to allow enabling MPI while NCCL is disabled. ### Motivation and Context MPI is also used on the CPU backend, not only with CUDA, so it makes sense to decouple it properly from NCCL (which is for dealing with multiple Nvidia GPUs).	2024-07-09 21:21:40 -07:00
Hann Wang	d28c26a919	[ROCm] fix: obtain AMD GPU memory info through rocm_smi library (#21190 ) ### Description Previously ROCMExecutionProvider uses `hipMemGetInfo` to obtain the sizes of total memory and available memory. However, this API has been broken since ROCm 5.7. In this PR, we use `rocm_smi` library instead of `hipMemGetInfo`. ### Motivation and Context `hipMemGetInfo` API has been broken since ROCm 5.7 and inference with ROCMExecutionProvider will lead to following errors: ``` HIP failure 1: invalid argument ; GPU=0 ; hostname=4cc4900475fe ; file=/onnxruntime/onnxruntime/core/providers/rocm/rocm_execution_provider.cc ; line=229 ; expr=hipMemGetInfo(&free, &total); ``` MIOpen has a brute-force fix for this (`911e671895/src/hip/handlehip.cpp (L72)`). Instead of hard-coding available memory to 16GB, I suppose we could obtain memory info through `rocm_smi` library as in this PR.	2024-07-09 20:35:26 -07:00
Chen Feiyue	fffd430091	[VSINPU]Code improvement && Slice/Dropout OP support (#21217 ) ### Description - Refactor code to meet the line length limit and guard against missing warnings - Add slice/dropout op support - Move vsinpu ep's cmake settings from onnxruntime_providers.cmake to a separate file - Modify APIs with param onnxruntime::Path because this type was replaced by std::filesystem::path in #20920	2024-07-09 20:14:46 -07:00
Maximilian Müller	cc0de0d526	[Build] Propagate build option for CUDA minimal to TRT (#20695 ) ### Description Extend the CUDA minimal option to the TRT provider, since with TRT 10 linking to cuDNN is no longer required. In addition, with the new engine dump feature it is possible to embed an engine into an ONNX model and not ship a builder lib. This has roughly the same deserialization time/session setup time as using TRT standalone. ### Motivation and Context ``` exe_builder_lib\onnxruntime_perf_test.exe -I -e tensorrt -r 5 -i 'trt_engine_cache_enable|1 trt_timing_cache_enable|1 trt_dump_ep_context_model|1 trt_weightless_engine_enable|1' model.onnx exe_no_builder_lib\onnxruntime_perf_test.exe -I -e tensorrt -r 5 -i 'trt_engine_cache_enable|1 trt_timing_cache_enable|1 trt_dump_ep_context_model|1 trt_weightless_engine_enable|1' model_ctx.onnx ```	2024-07-09 14:40:04 -07:00
Edward Chen	307b34a820	[NNAPI EP] Track skipped initializer usage (#21286 ) Track skipped initializer usage in NNAPI EP to account for usage by other nodes.	2024-07-09 13:43:22 -07:00
Xiang Zhang	1ab162fbca	Fix ETW Sink Initialize unproperly locking (#21226 ) ### Description The ETW trace logger is falsely reported as registered because initialized_ is set to true before the registration is done, causing a crash in the Lenovo camera application. [Bug 42610244](https://microsoft.visualstudio.com/OS/_workitems/edit/42610244): [Watson Failure] caused by SVCHOSTGROUP_Camera_INVALID_POINTER_READ_c0000005_onnxruntime.dll!onnxruntime::logging::Logger::Log	2024-07-09 10:55:41 -07:00
Jian Chen	d1c19e79ea	Update OpenVino CI Ubuntu to 22.04 (#21127 ) ### Description [Update OpenVino CI Ubuntu to 22.04](`312fab5b3f`) ### Motivation and Context Ubuntu 22.04 is needed for linux C++20	2024-07-09 09:56:44 -07:00
Wanming Lin	eeb8fc0931	[WebNN EP] Release WebNN MLGraphBuilder after Compile to free memory (#21200 ) This would help release the constants bound by the MLGraphBuilder.	2024-07-09 08:49:58 -07:00
Changming Sun	2c53b4a534	Remove core/common/gsl.h (#20894 ) ### Description It might be easier if we just directly include the original gsl headers. "core/common/gsl.h" is an indirection that doesn't provide extra help.	2024-07-08 18:09:39 -07:00
Enrico Galli	4c3c809bdb	[js/webnn] Enable user-supplied MLContext (#20600 ) ### Description This PR enables the API added in #20816 as well as moving context creation to JS. ### Motivation and Context In order to enable I/O Binding with the upcoming [MLBuffer](https://github.com/webmachinelearning/webnn/issues/542) API in the WebNN specification, we need to share the same `MLContext` across multiple sessions. This is because `MLBuffer`s are restricted to the `MLContext` where they were created. This PR enables developers to use the same `MLContext` across multiple sessions.	2024-07-08 10:19:39 -07:00
Wanming Lin	cd516a1677	[WebNN EP] Remove constraint for conv ops on CPU backend (#21237 ) Currently the WebNN TFLite backend allows the filter of conv2d/convTranspose2d to be an input. Remove the constraint and perform the necessary transpose/reshape operations for the filter input.	2024-07-08 10:14:43 -07:00
zz002	4a7eaff1d9	[vitisai] Fix build failure introduced by #20920 (#21247 ) ### Description Fix build failure introduced by #20920	2024-07-08 05:44:30 -07:00
Jing Fang	83e0c6b96e	Add MatMulNBits shape infer to SymbolicShapeInference (#21246 ) ### Description Support MatMulNBits shape infer in SymbolicShapeInference MatMulNBits's B input is rank-2, so implicit merge does not apply. ### Motivation and Context [Issue with performing shape inference using symbolic_shape_infer.py with Phi-3 ONNX Models · Issue #21194 · microsoft/onnxruntime (github.com)](https://github.com/microsoft/onnxruntime/issues/21194)	2024-07-05 16:24:57 -07:00
KnightYao	9ef28f092f	[Fix Bug] Fp8Fp8 Run Error (#20911 ) Fix a run error in fp8fp8 that occurs when input A is e5m2 and input B is e4m3.	2024-07-05 17:11:59 +02:00
pengwa	3f6b7430d6	Use cuda memset async (#21216 )	2024-07-05 17:27:45 +08:00
Baiju Meswani	0bbd061a54	Exclude azure ep from gen_def.cc (#21250 ) Addresses python packaging pipeline failure.	2024-07-04 10:50:27 -07:00
Changming Sun	07c429191e	Delete path.h (#21211 ) ### Description Delete path.h and replace all occurrences of onnxruntime::Path with std::filesystem::path. Previously we couldn't use C++17's std::filesystem because it was not supported in iOS 12 (which was released in 2018). Now we have dropped support for iOS 12. ### Motivation and Context To simplify code. For example, if an EP wants to use the Path class, now it can directly use it without going through a wrapper. And the standard implementation can handle various path types better. (We didn't give much consideration to UNC paths, "/" as a path separator on Windows, etc.)	2024-07-04 15:54:13 +08:00
kailums	40d4b2ec75	exclude split3inner kernel on rocm ep (#21238 ) ### Description There is an issue when using the split3inner kernel on rocm-6.0.3, so exclude this code from the ROCm EP.	2024-07-04 14:32:28 +08:00
Tianlei Wu	7d9b12a2e3	[CPU] SparseAttention op (#21110 ) Add SparseAttention cpu implementation. - [x] Refactoring GQAAttentionBase - [x] Add SparseAttention implementation - [x] Add test cases This is unfused version. Flash attention version will be added later.	2024-07-03 21:51:57 -07:00
Yi Zhang	30b6e82e7d	Make ROCm packaging stages to a single workflow (#21235 ) ### Description Move the current ROCm packaging stages into a single workflow. This reduces the chance that one failed stage prevents all nightly packages from being generated. ### Motivation and Context Our plan is to reduce the complexity of the current zip-nuget pipeline to improve the stability and performance of nightly package generation. The ROCm packaging stages have no dependencies on other packaging jobs and are the most time-consuming route. After this change, the duration of the most used CPU/CUDA/Mobile packaging workflow can be reduced roughly from 3h20m to 2h30m.	2024-07-04 11:07:04 +08:00
cloudhan	f39ee14b46	Add GQA support for ROCm (#21032 )	2024-07-03 14:55:31 +08:00
pengwa	4932e04053	ORTModule GraphTransitionManager (#19007 ) ### Problem Currently, the codebase contains some logics pertaining to model re-export checks and graph_builder reinitialization checks. Ideally, these operations should function akin to a state machine. However, upon inspecting the implementation, it becomes apparent that certain states are checked or set in various scattered locations. This fragmentation makes it challenging to comprehend when a re-export or re-initialization will be triggered. For optimal clarity and maintainability, it is advisable to consolidate these states into a cohesive component, rather than dispersing them within the current graph execution manager. Furthermore, the process of model exports and post-export processing for stage 3 support or memory-efficient gradient management introduces considerable complexity. To enhance the codebase's structure, it would be beneficial to extract these intricate functionalities into a dedicated component, divorcing them from the current graph execution manager. As part of the effort to improve the codebase, it's essential to address inconsistencies in handling input/output flatten/unflatten operations. Currently, there are several functions performing these operations recursively, each with slightly different implementations. This inconsistency leads to varying support for input/output data types and structures in different parts of the code. To rectify this, the proposed pull request simplifies these operations into a set of primitive functions, ensuring uniformity. This not only streamlines the code but also facilitates the maintenance of consistency when introducing bug fixes or supporting new data types. One thing to mention here: input output handling is deeply bound to the graph transition mentioned above, so it is difficult to make this change separately. 
While acknowledging the complexity of this logic, it is reassuring that the codebase benefits from an extensive suite of unit tests that cover all possible branches. Despite the intricacies, ensuring the passage of all tests has been a time-intensive but necessary aspect of this development effort. ### Design Introduce `GraphTransitionManager` and put all model export and post-export processing logic in it. 1. Re-export check 2. Do export 3. Re-post-export process check 4. Do post-export process 5. Return `PostExportProcessedModelInfo`, which contains all the information we need, to pass to ORT to build the gradient graph (currently we do the same for training and evaluating; ideally we should not do it for evaluating, but let's keep this behavior as it is now and make the change later).
```
# Input names for the pre-gradient-build graph.
# This may be different with the one in ExportedGraph since we may modify the graph inputs as needed
# for example when memory efficient gradient management is enabled.
self.onnx_graph_input_names: list[str] = onnx_graph_input_names

# A subset of onnx_graph_input_names.
# Input names that require gradients for the pre-gradient-build graph.
self.onnx_graph_input_names_require_grad: list[str] = onnx_graph_input_names_require_grad

# Create symbolic names for each dimension of the graph input (e.g. onnx_graph_input_names).
# The key is the input name, the value is a dict of {dim_index: symbolic_dim_name}
# e.g. {"input1": {0: "input1_dim0", 1: "input1_dim1"}, "input2": {0: "input2_dim0"}}
self.onnx_graph_input_dynamic_axes_map: dict[str, dict[int, str]] = onnx_graph_input_dynamic_axes_map

self.buffer_for_ort_runs: dict[str, torch.Tensor] = OrderedDict()

self.onnx_graph_input_names_user_defined = (
    onnx_graph_input_names_user_defined  # The ONNX graph input names excluding the parameters, buffers.
)

# The ONNX graph input names excluding the parameters, buffers.
self.onnx_graph_input_names_require_grad_user_defined = onnx_graph_input_names_require_grad_user_defined

self._post_export_processed_model: onnx.ModelProto | None = post_export_processed_model

# A function to access the input data from the args and kwargs.
# If it is not None, the length is same as onnx_graph_input_names.
# For i-th input name, we can use the i-th function to get the input data from args and kwargs.
self.data_accessor: list[callable] | None = data_accessor

# Used for unflattening the outputs from the ORT forward run.
self.module_forward_output_schema: ORTModelInputOutputSchemaType | None = module_forward_output_schema
```
The `GraphTransitionManager` instance is a property of `GraphExecutionManager` (e.g. `TrainingManager` or `InferenceManager`). 1. Use `self._graph_transition_manager.use_cache_or_reconstruct_post_processed_model(inputs, kwargs)` to check whether the PyTorch module needs a re-export or re-post-export-process. 2. Use `self._graph_transition_manager._post_export_processed_model_info.construct_inputs` to construct the list of inputs used for ORT runs. 3. Use `self._graph_transition_manager._post_export_processed_model_info.restore_outputs(user_outputs)` to restore the outputs in the original PyTorch output structure.	2024-07-03 10:53:31 +08:00
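The five design steps in the commit above can be illustrated with a minimal Python sketch. It is not the ORTModule implementation: `TransitionSketch`, `ProcessedModelInfo`, `fake_export`, `fake_post_process`, the `stage3` config key, and the type-name "schema" are all hypothetical stand-ins, chosen only to show how the re-export and re-post-process checks can live in one component with cached state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ProcessedModelInfo:
    """Hypothetical stand-in for PostExportProcessedModelInfo."""
    exported_model: str
    processed_model: str


@dataclass
class TransitionSketch:
    """Toy component that owns every re-export / re-post-process decision."""
    export_fn: Callable[[Any], str]
    post_process_fn: Callable[[str, dict], str]
    _schema: Optional[tuple] = None
    _exported: Optional[str] = None
    _config: Optional[dict] = None
    _info: Optional[ProcessedModelInfo] = None

    def use_cache_or_reconstruct(self, inputs, config):
        # Assumption: the "schema" is just the tuple of input type names.
        schema = tuple(type(x).__name__ for x in inputs)
        # Step 1: re-export check.
        need_export = self._exported is None or schema != self._schema
        if need_export:
            # Step 2: do export.
            self._exported = self.export_fn(inputs)
            self._schema = schema
        # Step 3: re-post-export-process check.
        need_post = need_export or config != self._config
        if need_post:
            # Step 4: do post-export processing.
            processed = self.post_process_fn(self._exported, config)
            self._config = dict(config)
            self._info = ProcessedModelInfo(self._exported, processed)
        # Step 5: return whether anything was rebuilt, plus the cached info.
        return need_post, self._info


export_calls = []


def fake_export(inputs):
    export_calls.append(len(inputs))
    return f"exported({len(inputs)} inputs)"


def fake_post_process(model, config):
    return f"{model}+stage3={config['stage3']}"


manager = TransitionSketch(fake_export, fake_post_process)
rebuilt_first, info = manager.use_cache_or_reconstruct([1.0, 2.0], {"stage3": False})
rebuilt_again, _ = manager.use_cache_or_reconstruct([3.0, 4.0], {"stage3": False})
rebuilt_config, info2 = manager.use_cache_or_reconstruct([3.0, 4.0], {"stage3": True})
```

In this sketch the second call reuses the cache because the input schema and config are unchanged, while the third call reruns only the post-export step because the config changed.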
Baiju Meswani	116398c1a4	onnxruntime shared lib inside python package (#21223 )	2024-07-02 15:37:50 -07:00
Tianlei Wu	7df97f1987	Add debugging helper to dump string, vector and thread id (#21224 ) ### Description Add macros to help print data to the console for debugging purposes. Example usage:
```
int input_id;
vector<int> some_vector;
DUMP_CPU_TENSOR_INIT()
DUMP_CPU_TENSOR("some vector", some_vector);
DUMP_STRING("input_id=", input_id);
```
- To enable dumping the thread id, set the environment variable `ORT_DUMP_THREAD_ID=0`. - Users can disable dumping with the environment variable `ORT_ENABLE_CPU_DUMP=0`.	2024-07-02 11:24:04 -07:00
Yifan Li	7be1d4aad3	[TensorRT EP] Update TRT10.0 deprecated api (#20989 ) ### Description Note: * This PR removes the C4996 suppression in tensorrt_execution_provider.cc only (according to NVIDIA, files that include nvinfer.h need the C4996 suppression when /Zc:__cplusplus is enabled in the ORT Windows build). * A follow-up PR will be raised to update deprecated TRT Plugin API usage. Deprecated APIs updated in this PR:
| Deprecated API | Update |
| --- | --- |
| [kCUBLAS](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/namespacenvinfer1.html#a9e1d81e5a8bfeb38b86e22a66d5f836a) | / |
| [kCUBLAS_LT](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/namespacenvinfer1.html#a9e1d81e5a8bfeb38b86e22a66d5f836a) | / |
| [kCUDNN](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/namespacenvinfer1.html#a9e1d81e5a8bfeb38b86e22a66d5f836a) | / |
| [reallocateOutput](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1v__1__0_1_1_i_output_allocator.html#acae6441d4029584cc1c6550917518691) | Superseded by [reallocateOutputAsync](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1v__1__0_1_1_i_output_allocator.html#aa40eeb891c1dfe4c1bbf1eabe8c705ab) with cudaStream_t argument |
| [createExecutionContextWithoutDeviceMemory](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_cuda_engine.html#adc86bcc42b098204997396ef2b1093fb) | Superseded by [createExecutionContext()](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_cuda_engine.html#a35de29aa6134165a5b14a537e6d99e82) with parameter. Check [ExecutionContextAllocationStrategy::kUSER_MANAGED](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/namespacenvinfer1.html#ac6251a050df629edfc0ce037fa366503) for more detail |

### Motivation and Context TRT deprecated API list: https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/deprecated.html	2024-07-01 22:55:20 -07:00
Yi Zhang	beb2496748	Templatize publishing nuget package (#21199 ) ### Description This is a prerequisite step for reducing the complexity of the current zip-nuget pipeline. Some packaging tasks can be cut from the most complex nuget pipeline and published more easily.	2024-07-02 09:24:19 +08:00
Scott McKay	8c2689877f	CoreML: Disable 1D ML Program matmul due to bug in coreml (#21186 ) ### Description Disable using CoreML ML Program for a matmul where one of the inputs is 1D as the CoreML implementation appears to be broken. See https://github.com/apple/coremltools/issues/2263 Add some debugging notes. ### Motivation and Context Fix failing test on macos-14.	2024-06-29 12:19:51 -07:00
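The rule described in the CoreML commit above amounts to a shape check on the MatMul inputs. The following Python sketch only illustrates that rule; the function name `ml_program_matmul_supported` is hypothetical, and the actual check lives in the CoreML execution provider's C++ code, which is not shown in this log.

```python
def ml_program_matmul_supported(a_shape, b_shape):
    """Hypothetical support check: reject MatMul when either input is 1D."""
    return len(a_shape) != 1 and len(b_shape) != 1


# A 2D x 2D matmul stays eligible; any 1D operand is rejected in this sketch.
both_2d = ml_program_matmul_supported((2, 3), (3, 4))
left_1d = ml_program_matmul_supported((3,), (3, 4))
right_1d = ml_program_matmul_supported((2, 3), (3,))
```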


11337 commits