onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-12 00:59:23 +00:00

Author	SHA1	Message	Date
maggie1059	dfd4bce36e	Use compute queues by default in DML EP (#20438) ### Description We originally used compute queues only for compute-only devices; this change makes DX12 devices use compute queues by default as well. ### Motivation and Context There have been issues with TDRs occurring with the current default queues, which do not occur on compute queues.	2024-04-24 10:44:16 -07:00
Xavier Dupré	f78215adad	Fix quantization tools for issue #19529 (#19591) ### Description Fixes issue #19529: the code was using a loop variable outside the loop.	2024-04-24 19:16:27 +02:00
Scott McKay	a46bab6364	Update podspec URL to use AFD hostname (#20452) Use the AFD URL when generating the podspec.	2024-04-24 09:37:24 -07:00
Satya Kumar Jandhyala	ae78cdb5d7	[JS/WebGPU] MultiheadAttention bugfix (#20447) ### Description Fixed the condition for concatenating pastKey with key and pastValue with value, and fixed an index error. Added new test cases.	2024-04-24 08:43:14 -07:00
Guenther Schmuelling	33d5ea39b3	[js/webgpu] fixes for fp16 attention (#20440)	2024-04-24 08:01:28 -07:00
Xavier Dupré	80213a9e66	Add implementation for ScatterND (#19540) ### Description onnxruntime falls back to the CPU for ScatterND for opsets after 13. This extends the implementation to the higher opsets.	2024-04-24 14:08:50 +02:00
Rachel Guo	14fcf0a52d	Support visionOS build (#20365) ### Description This PR supports building onnxruntime.xcframework for xros/xrsimulator (visionOS) with the build command `python3 tools/ci_build/github/apple/build_apple_framework.py --config Release/Debug tools/ci_build/github/apple/default_vision_os_framework_build_settings.json`. Officially including visionOS in the iOS CocoaPods package and testing it in CI would require separate work to upgrade the Xcode version and upgrade the macOS CI agent to macos-13-arm64 or higher. ### Motivation and Context visionOS support: https://github.com/microsoft/onnxruntime/discussions/19313 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2024-04-23 18:15:07 -07:00
Adam Louly	4ce7bbf6f1	Add LayerSpec Support to ORTPipelineModule (#20410) ### Description DeepSpeed's pipeline parallel implementation has a class (LayerSpec) that instantiates a layer only after it has been moved to the device and assigned to a stage. This approach helps reduce peak memory usage. This PR adds ORT support for wrapping this LayerSpec.	2024-04-23 17:57:08 -07:00
Yulong Wang	5055dc0aa8	[js/web] add diagnostic log for Chrome (#20439) ### Description Add logs to further diagnose the pipeline issue.	2024-04-23 17:18:54 -07:00
Maximilian Müller	b4e50758c0	Fix shape conv fuse opt (#20282) Fixes: - Multiple Convs feeding into an Add+Relu were fused even though their intermediate outputs are needed. ![image](https://github.com/microsoft/onnxruntime/assets/44298237/0c85a30c-5f41-4e62-ae2e-f41eada6c2c3) - Also fixes an issue in merging Shape initializers used as input, which occurs when the input initializer is shared across multiple nodes but not all of those nodes are Shape nodes.	2024-04-23 16:19:57 -07:00
Yulong Wang	8f53957bcf	[js/web] add "browser" field to support Parcel v2 (#20422) ### Description As described in the latest discussion in #19915, Parcel v2 without the [new resolver](https://parceljs.org/blog/v2-9-0/#new-resolver) does not work correctly with onnxruntime-web. Some users still use Parcel with the default resolver, so this adds the deprecated "browser" field back for backward compatibility. This PR also corrects the "main" field, which is used by the old resolver for Node.js.	2024-04-23 13:10:11 -07:00
Yulong Wang	13bda11583	[Node.js binding] Fix install script (#20416) ### Description Fix a few bugs in the install script of the onnxruntime-node package. This change is integrated from branch `rel-1.17.3` (#20397).	2024-04-23 13:01:16 -07:00
Satya Kumar Jandhyala	d42ac7f0c6	[JS/WebGPU] Multihead attention improvements (#20286) ### Description Enabled more use cases.	2024-04-23 12:39:49 -07:00
Edward Chen	76461c8f4d	Increase timeout for iOS packaging pipeline jobs. (#20434)	2024-04-23 11:55:55 -07:00
Guenther Schmuelling	b8e6684313	more conservative gpu-buffer cache algo (#20312) Tuned on 80 models to keep the performance impact minimal.	2024-04-23 09:07:04 -07:00
guyang3532	ffb9c8d598	fix embedding sparsity log bug of -1% density (#20420) ### Description When the embedding sparsity had not been checked as valid, the log printed incorrect information ("-1% density"); this PR fixes it.	2024-04-23 20:37:50 +08:00
Scott McKay	ed6f1adcb8	Fix overflow causing test failure on x86 (#20425) ### Description Fix a comparison that was not updated when the threshold was converted to bytes. ### Motivation and Context Fix CI failure.	2024-04-23 21:33:59 +10:00
Maximilian Müller	5eae33fc6b	[CUDA EP] RNN check if tf32 is allowed (#20338) Respect the use_tf32 flag.	2024-04-23 00:19:09 -07:00
Yi Zhang	7ebc653f04	Revert "Nuget .NET changes for Mac Catalyst (#19923)" (#20418) This reverts commit `f396748ed6`.	2024-04-23 15:08:12 +08:00
Adrian Lizarraga	e6a677f6b7	[QNN EP] Download QNN SDK from azure blob in packaging pipelines (#20359) ### Description - Updates the Windows QNN NuGet and Python packaging pipelines to download the QNN SDK from blob storage. - Makes the QNN SDK version configurable when launching the Python packaging pipeline. ### Motivation and Context Removes the need to rebuild images to update the QNN SDK. Only applies to Windows pipelines; Linux pipelines still get the SDK from disk.	2024-04-22 22:32:55 -07:00
aciddelgado	94c69f55d4	GQA 4 CPU (#20299) ### Description Support the GQA (GroupQueryAttention) operator on CPU with FP32. ### Motivation and Context Currently, models generated for CPU and GPU must be different. GQA on CPU allows these models to be the same.	2024-04-22 19:57:05 -07:00
Scott McKay	c47a6ce70b	XNNPACK: Support 1D input for Conv and ConvTranspose (#20349) ### Description Support 1D input to XNNPACK Conv and ConvTranspose by faking a height of 1 to convert it to 2D input. ### Motivation and Context Enables a speech model with 1D input to use XNNPACK. The CPU EP has no quantized ConvTranspose, so this fills that gap.	2024-04-23 11:50:31 +10:00
Edward Chen	3270a002fa	Fix handling of nodes that get assigned to kMSInternalNHWCDomain when loading an ORT format model. (#20379) The ORT format model doesn't contain information about kMSInternalNHWCDomain since it is set during layout transformation. Fall back to known domains instead.	2024-04-22 18:34:01 -07:00
Preetha Veeramalai	c7de4de501	OVEP Bug fix 1.18 (#20408) ### Description Contains a critical bug fix. ### Motivation and Context This PR fixes a bug related to OV caching and blob generation. It also handles the precision for the AUTO plugin. --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2024-04-22 18:31:05 -07:00
pengwa	a7787a0bad	Introduce memory efficient topological sort (#20258) ### Introduce memory efficient topo sort (for training) ~~and lazily initialize the Priority-Based and Memory-Efficient topo sorts. Since they are not needed in most cases, this avoids the overhead of GraphViewer construction for most use cases.~~	2024-04-23 08:00:23 +08:00
Scott McKay	9372e9a0a3	Support >2GB of Tensor data in training checkpoint (#20077) ### Description Add the ability to store initializer data in an external file. Update the training checkpoint code to use an external file if the data exceeds ~2GB. I don't see a way to use the flatbuffers 64-bit offsets, as they don't support storing 'table' types with 64-bit offsets (and our Tensor is a 'table' type, not a simple struct). `0cfb7eb80b/tests/64bit/test_64bit.fbs (L38-L39)` Allowing a Tensor to have its raw_data in an external file should hopefully work with the least friction. As it's an extra field, it's backwards compatible. Please feel free to suggest alternative approaches. Side note: the diffs in the generated *.fbs.h files are unexpectedly large. Maybe they weren't regenerated when the new flatbuffers version was checked in. I updated them by running `python .\compile_schema.py -f <build output dir>\_deps\flatbuffers-build\Debug\flatc.exe` from onnxruntime\core\flatbuffers\schema, which I thought was the correct way, but maybe that's out of date. You can ignore all the diffs in the generated files and focus on the changes to the .fbs files in onnxruntime/core/flatbuffers/schema. Basically, start at the bottom of the changed files and work up, as all the 'real' diffs are there. --------- Co-authored-by: carzh <wolfivyaura@gmail.com>	2024-04-22 15:17:43 -07:00
Yulong Wang	4385602386	[js/web] fix test runner with optional input/output (#20399) ### Description This change fixes the OP test runner (.jsonc format tests) for optional input(s) and/or output(s). The fix reveals a problem in dealing with optional outputs: > Take SkipSimplifiedLayerNorm as an example: > > if the node's outputs in the ONNX model are [ 'output_0', '' ] instead of [ 'output_0' ], the current implementation fails. The difference is that in the first case context.outputCount == 2, so the TypeScript implementation tries to create a tensor for output[1]. This eventually calls the C++ function OpKernelContext::Output, and output.DataRaw() is nullptr. The WebGPU backend fails because it cannot handle a TensorView with data == 0. > This problem may need to be fixed or worked around in a separate PR; this PR does not fix it. The failing test cases are modified to work. Note that this PR does not break those test cases, as they never worked.	2024-04-22 12:53:10 -07:00
aamajumder	d0e33d2078	[DML EP] Register opset 20 operators (#20092) ### Description This PR registers the following opset 20 operators with the DML EP: - IsNaN-20 - IsInf-20 - ReduceMax-20	2024-04-22 12:01:59 -07:00
Yi Zhang	197b3f1d90	Enable Whisper Test with OMP_FFMPEG (#20402) ### Description Install OMP_FFMPEG in the Docker image and re-add the Whisper test. OMP_FFMPEG is downloaded from an Azure blob with restricted access.	2024-04-22 10:55:56 -07:00
Yulong Wang	a457c1df80	upgrade emsdk to 3.1.57 (#20295) ### Description upgrade emsdk to 3.1.57	2024-04-19 23:05:18 -07:00
Adrian Lizarraga	77b7619a3d	[QNN EP] Support float16 BatchNormalization on the HTP backend (#20391) ### Description - Adds support for float16 BatchNormalization to the HTP backend. - Fixes float32 support for BatchNormalization on the HTP backend when `enable_htp_fp16_precision` is enabled. ### Motivation and Context Support more models on the QNN HTP backend.	2024-04-19 21:49:39 -07:00
Patrice Vignola	8fbb8a149f	[DML EP] Add MatMulNBits (#20308)	2024-04-19 15:05:37 -07:00
Hector Li	55e0aaeeef	fix android build issue (#20389)	2024-04-19 14:21:34 -07:00
Rachel Guo	f396748ed6	Nuget .NET changes for Mac Catalyst (#19923) ### Description Add NuGet package changes for the new 'net6.0-maccatalyst' platform. The output ORT NuGet package was manually tested and verified in a .NET MAUI app setup. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Yi Zhang <zhanyi@microsoft.com> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2024-04-19 14:20:03 -07:00
Guenther Schmuelling	497a627a69	fix fp16 for skiplayernorm (#20381)	2024-04-19 12:12:02 -07:00
Dmitri Smirnov	42b700d463	Eliminate stray vector and the contention it creates (#20377) ### Description An unused vector that allocated a large memory chunk within a concurrent routine was creating heap contention; it has been eliminated. ### Motivation and Context This partially addresses https://github.com/microsoft/onnxruntime/issues/20373.	2024-04-19 10:27:42 -07:00
Patrice Vignola	4d98f06f93	[DML EP] Add GroupQueryAttention (#20327)	2024-04-19 10:25:29 -07:00
Wanming Lin	7c80c39f74	[WebNN EP] WebNN CPU backend only supports up to 4 Split outputs (#20350)	2024-04-19 08:31:22 -07:00
sfatimar	4d1963c2a2	OpenVINO EP Rel 1.18 Changes (#20337) ### Description These changes include: - Support for OpenVINO 2024.1 - Import of precompiled blobs with EPContext blob - Separate device/precision as input - Deprecation of the CPU_FP32 and GPU_FP32 terminology in favor of CPU and GPU - AUTO GPU, CPU will only create a GPU blob and not a CPU blob. ### Motivation and Context - OpenVINO 2024.1 will be out soon. - Importing a precompiled blob can greatly reduce FEIL/FIL time. - Separating device/precision makes the input cleaner. --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2024-04-19 00:31:38 -07:00
Yueqing Zhang	9001c69b84	[VitisAI] Add Version Check. Requested by Microsoft (#20347) ### Description Add a version to onnxruntime_providers_vitisai.dll so that onnxruntime_vitisai_ep.dll can check whether the version is compatible. To make sure the old onnxruntime_vitisai_ep.dll still works, we offset the API struct by the version field. ### Motivation and Context This is a direct request from Microsoft. The following is the problem we are trying to solve: How would you describe the dependency between (a) onnxruntime_vitisai_ep.dll and (b) onnxruntime_providers_vitisai.dll? E.g. for each version of (a) there is a minimum required version of (b), or for each version of (b) there is a minimum required version of (a). Please note that in practice we won't be able to use the exact version of ORT/EP that you tested against (because we might need to update ORT for other reasons), but we might be able to accommodate some version constraints that you specify. As we approach shipping, we'll lock the version of ORT/EP to allow for stabilization and more detailed testing (and work with you if it needs to be updated).	2024-04-18 23:05:44 -07:00
Patrice Vignola	12569626cb	Update DML to 1.14.1 (#20380)	2024-04-18 22:43:41 -07:00
Patrice Vignola	b8c90beef2	[DML EP] Add SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20326)	2024-04-18 22:17:31 -07:00
Chi Lo	a747a00cd3	[TensorRT EP] Use protobuf with debug build on Windows (#20378) TRT EP implicitly uses oss_parser with debug builds on Windows, so it should use protobuf rather than protobuf-lite.	2024-04-18 19:39:08 -07:00
Patrice Vignola	745b426c60	[DML] Update DML to 1.14 (#20304) I am pre-firing this change to pre-run the non-DML checks and to give folks time to review it before DML is released. When DML 1.14 officially releases, we'll only need to run the DML pipeline to automatically pick up the NuGet package. This should save us some valuable time. Note that DML 1.14 is the release needed for ORT 1.17.4, and DML 1.15 will come soon after.	2024-04-18 16:22:57 -07:00
Adrian Lizarraga	e4c0cb2b9a	[Quant tool] Do not default to contrib Q/DQ ops for 16-bit (#20376) ### Description Updates the QDQ quantizer to use ONNX Q/DQ ops for 16-bit quantization if opset >= 21. ### Motivation and Context The QDQ quantizer previously set the 'com.microsoft' domain on inserted Q/DQ ops when the model needed 16-bit support. ONNX 1.16.0 added int16/uint16 support to the QuantizeLinear and DequantizeLinear operators, so we can change the default behavior.	2024-04-18 15:26:07 -07:00
Chi Lo	a8f74e3ec7	[TensorRT EP] TensorRT 10 support (#20167) This PR adds support for the INT64 tensor type for TRT 10. It is compatible with both TRT 8.6 and TRT 10, meaning users can build ORT TRT against either TRT 8.6 or TRT 10. The gap between the TRT 10 GA and the ORT 1.18 release is very tight (we don't have enough time to install the TRT 10 GA libraries on our CIs and run the builds/tests), and NVIDIA's new Triton release (whose timeline is also very close to TRT 10 GA) wants to integrate TRT EP with TRT 10. Therefore, our approach is to get this PR into ORT 1.18 first, so that everything is fully tested with the TRT 8.6 CIs, while users can still manually build ORT 1.18 against TRT 10, as in the Triton case. As for testing TRT 10, once TRT 10 GA is released, we will create another branch that includes the changes in this PR plus any other changes needed, and update our CIs to TRT 10.	2024-04-18 14:03:04 -07:00
Yulong Wang	3577a4bd02	[Node.js binding] Allow installation to download CUDA binaries via script (#20364) ### Description Currently we try to include all prebuilt binaries in the NPM packages. This worked until we added libonnxruntime_providers_cuda.so (>400MB) to the NPM package: the NPM registry refuses to accept the new package because the file is too large. To make the new NPM package work, we have to remove the large file from the package and add a new script that runs on package installation. This script tries to dynamically install the onnxruntime CUDA dynamic library for Linux/x64.	2024-04-18 13:44:42 -07:00
Guenther Schmuelling	7b017cf9f8	fix web ci: csum tests need fp64, which is not supported on webgpu (#20374)	2024-04-18 12:30:26 -07:00
Adam Louly	ee74fb6908	Introducing ORTPipelineModule - DeepSpeed Pipeline Parallel Support. (#20287) ### Description Introduces a new class, ORTPipelineModule, to handle wrapping layers for DeepSpeed pipeline parallelism. ### Motivation and Context To support pipeline parallelism in ORTModule. This PR includes initial support for DeepSpeed pipeline parallelism. - [x] Support pipeline parallelism where layers are nn.Modules in a Sequential. - [ ] Support LayerSpec and TiedLayerSpec - [ ] Enable partitioning to accept List - [ ] Full-GPU Graph Consolidation - [ ] Subgraph Merging for Inference	2024-04-18 11:30:15 -07:00
Sumit Agarwal	f664f91298	[DML EP] Expose NPU macro via build command (#20306) ### Description This change: - Exposes the `ENABLE_NPU_ADAPTER_ENUMERATION` macro via the build command, so that users can seamlessly enable NPU support for the DML EP. - Adds the keyword `_dmlEp_` to the node name, which is useful for debugging.	2024-04-18 11:23:13 -07:00

10965 commits