onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Author	SHA1	Message	Date
Wanming Lin	7f0aac0d8a	Revert "[WebNN EP] Rename op logicalNot to not" (#18997 ) Reverts microsoft/onnxruntime#18936 WebNN spec is discussing using the `logicalNot` name at https://github.com/webmachinelearning/webnn/issues/496, and the Chromium implementation has suspended the renaming change. For consistent, we should keep using `logicalNot` in WebNN EP util it is finalized.	2024-01-05 08:15:50 -08:00
Changming Sun	e155c66b4a	Change all macOS python packages to use universal2 (#19013 ) ### Description Change all macOS python packages to use universal2, to reduce the number of packages we have. ### Motivation and Context According to [wikipedia](https://en.wikipedia.org/wiki/MacOS_Big_Sur), macOS 11 is the first macOS version that supports universal 2. And it is the min macOS version we support. So we no longer need to maintain separate binaries for different CPU archs.	2024-01-04 17:44:49 -08:00
liqun Fu	e10a8ae31f	reduce max/min 20 (#17805 ) ### Description reducemax/min have been updated in onnx(20). implement it in ort ### Motivation and Context this is for ort1.17.0 release --------- Signed-off-by: Liqun Fu <liqfu@microsoft.com>	2024-01-04 17:41:01 -08:00
Jeff Bloomfield	55a669409a	Merge pull request #18983 from microsoft/WindowsAI Merge WindowsAI to main	2024-01-04 17:21:19 -08:00
Adrian Lizarraga	02b1ff5fa2	[QNN EP] Support multithreaded inference of a single session (#18981 ) ### Description - Add mutex to protect QNN API calls for executing a graph and extracting the corresponding profile data. - Ensures QNN EP's execute function does not store unnecessary state (i.e., input and output buffer pointers do not need to be stored as class members.) ### Motivation and Context Allow calling `session.Run()` from multiple threads when using QNN EP.	2024-01-04 13:32:48 -08:00
Wei-Sheng Chin	658e30eb33	Remove DORT since it's in PyTorch main now (#18996 ) Main code are removed and tests are modified to use DORT directly from PyTorch.	2024-01-04 12:59:47 -08:00
Xavier Dupré	889b1ef2d1	Fix schema type constraint for custom operators (#17497 ) ### Description onnxruntime may raise an error "type inference failed" but when a custom operator sets IsHomogeneous to false in its schema. This change make sure that TypeInferenceFunction and schema type constraints are aligned to prevent that from happening. --------- Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>	2024-01-04 20:27:46 +01:00
Jeff Bloomfield	7401b6661d	Update OperatorKernels.md	2024-01-04 11:27:03 -08:00
Changming Sun	011b562b51	Update c# dependencies (#18995 ) ### Description Update c# dependencies	2024-01-04 10:41:28 -08:00
Jeff Bloomfield	8ea3e68192	Update ContribOperators.md	2024-01-04 10:10:46 -08:00
Yulong Wang	b18abaaa2c	[js/web] wait for threadpool initialization (#18952 ) ### Description a replacement of #18683. try to resolve #18689. By specifying "-s PTHREAD_POOL_SIZE" flag in emscripten, it forces the threadpool to initialize before the webassembly instance is available.	2024-01-04 08:06:55 -08:00
xhcao	867b9d8f04	[js/webgpu] Fix f16 errors for ConvTranspose2D (#18986 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-04 08:06:01 -08:00
Atanas Dimitrov	4e2d88b75f	Remove useless `NodeProto` serializations (#18791 ) ## Description This pull request aims to enhance the efficiency of the inference session creation by eliminating unnecessary `Node::ToProto` invocations. The current codebase presents opportunities for optimization, particularly in the removal of superfluous `Node::ToProto` calls, along with their subsequent `~NodeProto` invocations. ## Motivation and Context The optimization focus of this pull request is on addressing low-hanging fruit in the inference session creation process. By strategically removing undesired `Node::ToProto` calls, we aim to streamline the codebase and enhance the overall performance. The flame graphs illustrate the notable improvements achieved by reducing the percentage of `Node::ToProto` calls, thereby optimizing the execution flow. ### Code Snippet ```cpp TEST(InferenceSessionTests, Bench) { // Initialize logging manager auto logging_manager = std::make_unique<logging::LoggingManager>( std::unique_ptr<ISink>(new CLogSink()), logging::Severity::kVERBOSE, false, LoggingManager::InstanceType::Temporal); // Create environment std::unique_ptr<Environment> env; auto st = Environment::Create(std::move(logging_manager), env); ASSERT_TRUE(st.IsOK()); // Configure session options SessionOptions so; so.execution_mode = ExecutionMode::ORT_SEQUENTIAL; so.graph_optimization_level = TransformerLevel::Level2; so.intra_op_param.thread_pool_size = 1; // Initialize and load the InferenceSession InferenceSessionTestGlobalThreadPools session1{so, *env}; ASSERT_STATUS_OK(session1.Load("big.onnx")); ASSERT_STATUS_OK(session1.Initialize()); } ``` ### `big.onnx` model creation ```python import onnx import numpy as np from spox import argument, build, Tensor, Var from spox.opset.ai.onnx import v17 as op from spox.opset.ai.onnx.ml.v3 import label_encoder a = argument(Tensor(np.int64, ('N',))) c = a for x in range(1000): c = op.mul(c, op.const(np.ones(10000, dtype=np.int64))) for x in range(3000): all_strings = list("random_string" + str(i) for i in range(100)) all_ints = list(range(len(all_strings))) c = label_encoder( c, keys_int64s=all_ints, values_strings=all_strings ) c = label_encoder(c, keys_strings=all_strings, values_int64s=all_ints) model: onnx.ModelProto = build(inputs={'a': a}, outputs={'c': c}) onnx.save(model, "big.onnx") ``` Testing in `Release` with `perf` yields: Before: 3.3% spent in `Node::ToProto` After: 1.6% spent in `Node::ToProto` --------- Co-authored-by: Atanas Dimitrov <atanasdimitrov@Atanass-MacBook-Pro.local>	2024-01-04 17:38:28 +10:00
Jeff Bloomfield	f4ad940ff3	Disable MatMul QDQ selector on DML EP until MatMulIntegerToFloat is re-enabled	2024-01-03 18:37:14 -08:00
Steven Roussey	d5628f52df	link to docs incorrect for js/web/node (#18960 ) ### Description link to docs incorrect for js/web/node ### Motivation and Context Trying to build myself and not yet succeeding.	2024-01-03 17:30:24 -08:00
JJ	5fade70b50	Update README.md (#18963 ) Fixed a small spelling error. ### Description <!-- Describe your changes. --> Small spelling error fix. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? It is documentation for the product, and it misspells the word documentation. This reflects on your product and the quality of the work.	2024-01-03 17:26:25 -08:00
Scott McKay	8e9188e265	Add SessionOptions use_deterministic_compute to the C and C++ APIs. (#18944 ) ### Description <!-- Describe your changes. --> SessionOptions use_deterministic_compute can be set via the python API. User request to enable setting via C API. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> #17416	2024-01-04 11:12:48 +10:00
Jeff Bloomfield	70a6f816af	Port attention query fix from `b2768bbf23`	2024-01-03 16:22:54 -08:00
raoanag	56fcea94e3	Enable QDQ quantization for DML EP (#18367 ) ### Description This enables QDQ transforms with the DML EP	2024-01-03 16:13:23 -08:00
Jeff Bloomfield	ee60e3af6c	Limit size of constant nodes creates by DML EP following deduplicatio… (#18915 ) ### Description This limits the size of constant data nodes which the DML EP creates in the DML graph following de-duplication of 1D quantization tensors. In the process it reduces a check for the maximum size of the constant node. This is merged from: https://github.com/microsoft/onnxruntime/pull/18494 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-03 16:13:22 -08:00
tbqh	70d3f682a7	De-duplicate 1D scale and zero point tensors to scalars in DML kernels (#18862 ) ### Description Cleanup and rebase from [this PR](https://github.com/microsoft/onnxruntime/pull/18629) ### Motivation and Context --------- Co-authored-by: Christian Larson <chrilaMSFT@users.noreply.github.com> Co-authored-by: Christian Larson <28911437+chrilaMSFT@users.noreply.github.com> Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com> Co-authored-by: Anagha Rao <anagrao@microsoft.com>	2024-01-03 16:13:19 -08:00
Jeff Bloomfield	bdaeebd6ff	Fix bug in DML EP ExecuteCommandList fast path and simplify design (#18866 ) ### Description This addresses a bug in a fast path that was added for submission of re-used command lists of fused graph kernels in the DML EP, addressing a D3D debug layer error. ### Motivation and Context The fast path in DmlCommandRecorder::ExecuteCommandList enabled a current non-reused command list, if empty, to be used for commands following submission of the fused command list. The fix ensures the associated command allocator is only re-used after the next fence value is completed, which is higher due to submission of the other command list. The command recorder design was intended to support batching of provided command list execution, however it submits command lists immedately as an implementation detail to maximize CPU/GPU parallelism. If that heuristic was removed, it would expose additional issues in this same fast path. Because of this and complexity and inefficiency of the old batching mechanism, I also removed this.	2024-01-03 16:13:15 -08:00
Sheil Kumar	b2f81c8725	Hide Col2Im registration behind DML_TARGET_VERSION 6300 (#18829 ) Hide Col2Im registration behind DML_TARGET_VERSION 6300 Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2024-01-03 16:13:15 -08:00
Jake Mathern	d2f7a5b128	Cherry pick fix constant pow (#18785 ) ### Description Cherry pick https://github.com/microsoft/onnxruntime/pull/18784	2024-01-03 16:13:14 -08:00
Sheil Kumar	107d7492b9	[DirectML EP] Add DML EP registration for Col2Im (#17786 ) ### Description [DirectML EP] Add DML EP registration for Col2Im operator ### Motivation and Context Add Col2Im support for opset 18. This operator is implemented as the DirectML Fold operator. --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2024-01-03 16:13:14 -08:00
Christian Larson	c1ec3c3f93	User/chrila/fix dml dx12 warning (#18746 ) Update resource creation flag to avoid D3D12 WARNING ### Description Update the DML DX12 allocator to use D3D12_RESOUCE_STATE_COMMON to avoid DX12 Warning messages. ### Motivation and Context When directML is created with debug layer there are warnings when resources are created by ORT. --------- Co-authored-by: Christian Larson <28911437+chrilaMSFT@users.noreply.github.com>	2024-01-03 16:13:13 -08:00
Xiang Zhang	623d957607	register resize with uint8/int8 support (#18647 ) ### Description 1. Expand input datatype support for Resize with uint8/int8. 2. Update the logic to compute output shape of Resize Op, roiRange is got rid of to align with how tests compute the output shape to go around the size asserting in MLOperatorAuthorImpl.cpp `m_inputDimensions[i] * roiRange * scale` -> `m_inputDimensions[i] * scale` 3. disable 4 tests because of the result mismatch. The results of DML with float32 and uint8/int8 match each other, so it should be problem of resize implementation, which is out the scope of this PR. `ResizeOpTest.NhwcResizeOpLinearDownSampleTest_tf_crop_and_resize_without_extrapolation_uint8 ResizeOpTest.NhwcResizeOpLinearDownSampleTest_tf_crop_and_resize_without_extrapolation_int8 ResizeOpTest.NhwcResizeOpLinearDownSampleTest_4DBilinear_pytorch_half_pixel_uint8 ResizeOpTest.NhwcResizeOpLinearDownSampleTest_4DBilinear_pytorch_half_pixel_int8`	2024-01-03 16:13:13 -08:00
Sheil Kumar	e8209ce2b0	CP `7fd1ce95a4` (#18560 ) CP `7fd1ce95a4` for onnxruntime_perf_test changes. Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2024-01-03 16:13:11 -08:00
raoanag	7f9e6c42c2	readd npu enumeration (#18437 ) (#18518 ) [Cherry pick Reviewed] Re-add changes which were merged out... --------- ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Sheil Kumar <smk2007@gmail.com> Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2024-01-03 16:12:20 -08:00
raoanag	613fdce12e	Create ring buffer for re-used command lists (#18368 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>	2024-01-03 16:12:19 -08:00
raoanag	5c283340c3	Filter activation fusions on MCDM (#18371 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>	2024-01-03 16:12:19 -08:00
raoanag	a1000a0a3c	Enable GEMM activation fusions on MCDM (#18372 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>	2024-01-03 16:12:17 -08:00
raoanag	531e875fb5	Avoid command list reset in common case of re-used command list execution (#18370 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>	2024-01-03 16:12:17 -08:00
raoanag	d5f3aae3fd	Utilize DML constant input graph node (#18267 ) ### Description This PR also includes, `8b0a55e7cc` DML constant pow operator `7520974970` Enable custom heaps based on query- ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>	2024-01-03 16:12:15 -08:00
raoanag	dcfff10f57	Enable QLinearAveragePooling DML EP (#17384 ) (#18240 ) [Cherry Pick Reviewed] DML EP Implementation for [QLinearAveragePool](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QLinearAveragePool) ``` Note: Google Test filter = QLinearPool* [==========] Running 72 tests from 2 test suites. [----------] Global test environment set-up. [----------] 36 tests from QLinearGlobalAveragePool [ RUN ] QLinearGlobalAveragePool.Nhwc_1x1x32x32 [ OK ] QLinearGlobalAveragePool.Nhwc_1x1x32x32 (410 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x32x32x1 [ OK ] QLinearGlobalAveragePool.Nchw_1x32x32x1 (641 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_1x256x8x8 [ OK ] QLinearGlobalAveragePool.Nhwc_1x256x8x8 (156 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x8x8x256 [ OK ] QLinearGlobalAveragePool.Nchw_1x8x8x256 (134 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_1x255x7x7 [ OK ] QLinearGlobalAveragePool.Nhwc_1x255x7x7 (160 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x7x7x255 [ OK ] QLinearGlobalAveragePool.Nchw_1x7x7x255 (145 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_1x255x8x8 [ OK ] QLinearGlobalAveragePool.Nhwc_1x255x8x8 (148 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x8x8x255 [ OK ] QLinearGlobalAveragePool.Nchw_1x8x8x255 (129 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_1x256x7x7 [ OK ] QLinearGlobalAveragePool.Nhwc_1x256x7x7 (134 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x7x7x256 [ OK ] QLinearGlobalAveragePool.Nchw_1x7x7x256 (131 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_3x256x8x8 [ OK ] QLinearGlobalAveragePool.Nhwc_3x256x8x8 (159 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_3x8x8x256 [ OK ] QLinearGlobalAveragePool.Nchw_3x8x8x256 (168 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_3x255x7x7 [ OK ] QLinearGlobalAveragePool.Nhwc_3x255x7x7 (139 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_3x7x7x255 [ OK ] QLinearGlobalAveragePool.Nchw_3x7x7x255 (170 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_3x255x8x8 [ OK ] QLinearGlobalAveragePool.Nhwc_3x255x8x8 (155 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_3x8x8x255 [ OK ] QLinearGlobalAveragePool.Nchw_3x8x8x255 (156 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_3x256x7x7 [ OK ] QLinearGlobalAveragePool.Nhwc_3x256x7x7 (133 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_3x7x7x256 [ OK ] QLinearGlobalAveragePool.Nchw_3x7x7x256 (149 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_1x1x32x32_S8 [ OK ] QLinearGlobalAveragePool.Nhwc_1x1x32x32_S8 (131 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x32x32x1_S8 [ OK ] QLinearGlobalAveragePool.Nchw_1x32x32x1_S8 (127 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_1x256x8x8_S8 [ OK ] QLinearGlobalAveragePool.Nhwc_1x256x8x8_S8 (153 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x8x8x256_S8 [ OK ] QLinearGlobalAveragePool.Nchw_1x8x8x256_S8 (129 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_1x255x7x7_S8 [ OK ] QLinearGlobalAveragePool.Nhwc_1x255x7x7_S8 (133 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x7x7x255_S8 [ OK ] QLinearGlobalAveragePool.Nchw_1x7x7x255_S8 (135 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_1x255x8x8_S8 [ OK ] QLinearGlobalAveragePool.Nhwc_1x255x8x8_S8 (129 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x8x8x255_S8 [ OK ] QLinearGlobalAveragePool.Nchw_1x8x8x255_S8 (152 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_1x256x7x7_S8 [ OK ] QLinearGlobalAveragePool.Nhwc_1x256x7x7_S8 (140 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_1x7x7x256_S8 [ OK ] QLinearGlobalAveragePool.Nchw_1x7x7x256_S8 (133 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_3x256x8x8_S8 [ OK ] QLinearGlobalAveragePool.Nhwc_3x256x8x8_S8 (135 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_3x8x8x256_S8 [ OK ] QLinearGlobalAveragePool.Nchw_3x8x8x256_S8 (147 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_3x255x7x7_S8 [ OK ] QLinearGlobalAveragePool.Nhwc_3x255x7x7_S8 (156 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_3x7x7x255_S8 [ OK ] QLinearGlobalAveragePool.Nchw_3x7x7x255_S8 (155 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_3x255x8x8_S8 [ OK ] QLinearGlobalAveragePool.Nhwc_3x255x8x8_S8 (138 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_3x8x8x255_S8 [ OK ] QLinearGlobalAveragePool.Nchw_3x8x8x255_S8 (155 ms) [ RUN ] QLinearGlobalAveragePool.Nhwc_3x256x7x7_S8 [ OK ] QLinearGlobalAveragePool.Nhwc_3x256x7x7_S8 (144 ms) [ RUN ] QLinearGlobalAveragePool.Nchw_3x7x7x256_S8 [ OK ] QLinearGlobalAveragePool.Nchw_3x7x7x256_S8 (139 ms) [----------] 36 tests from QLinearGlobalAveragePool (5968 ms total) [----------] 36 tests from QLinearPoolTest [ RUN ] QLinearPoolTest.AveragePool1D_ExcludePadPixel [ OK ] QLinearPoolTest.AveragePool1D_ExcludePadPixel (480 ms) [ RUN ] QLinearPoolTest.AveragePool1D_IncludePadPixel [ OK ] QLinearPoolTest.AveragePool1D_IncludePadPixel (481 ms) [ RUN ] QLinearPoolTest.AveragePool2D_ExcludePadPixel [ OK ] QLinearPoolTest.AveragePool2D_ExcludePadPixel (512 ms) [ RUN ] QLinearPoolTest.AveragePool2D_IncludePadPixel [ OK ] QLinearPoolTest.AveragePool2D_IncludePadPixel (455 ms) [ RUN ] QLinearPoolTest.AveragePool2D_MultiChannel [ OK ] QLinearPoolTest.AveragePool2D_MultiChannel (463 ms) [ RUN ] QLinearPoolTest.AveragePool3D_ExcludePadPixel [ OK ] QLinearPoolTest.AveragePool3D_ExcludePadPixel (448 ms) [ RUN ] QLinearPoolTest.AveragePool3D_IncludePadPixel [ OK ] QLinearPoolTest.AveragePool3D_IncludePadPixel (458 ms) [ RUN ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_nhwc [ OK ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_nhwc (171 ms) [ RUN ] QLinearPoolTest.AveragePool1D_IncludePadPixel_nhwc [ OK ] QLinearPoolTest.AveragePool1D_IncludePadPixel_nhwc (169 ms) [ RUN ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_nhwc [ OK ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_nhwc (152 ms) [ RUN ] QLinearPoolTest.AveragePool2D_IncludePadPixel_nhwc [ OK ] QLinearPoolTest.AveragePool2D_IncludePadPixel_nhwc (660 ms) [ RUN ] QLinearPoolTest.AveragePool2D_MultiChannel_nhwc [ OK ] QLinearPoolTest.AveragePool2D_MultiChannel_nhwc (150 ms) [ RUN ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_nhwc [ OK ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_nhwc (145 ms) [ RUN ] QLinearPoolTest.AveragePool3D_IncludePadPixel_nhwc [ OK ] QLinearPoolTest.AveragePool3D_IncludePadPixel_nhwc (146 ms) [ RUN ] QLinearPoolTest.AveragePool2D_BigImage [ OK ] QLinearPoolTest.AveragePool2D_BigImage (505 ms) [ RUN ] QLinearPoolTest.AveragePool2D_BigImage_nhwc [ OK ] QLinearPoolTest.AveragePool2D_BigImage_nhwc (161 ms) [ RUN ] QLinearPoolTest.AveragePool2D_Global [ OK ] QLinearPoolTest.AveragePool2D_Global (481 ms) [ RUN ] QLinearPoolTest.AveragePool2D_Global_nhwc [ OK ] QLinearPoolTest.AveragePool2D_Global_nhwc (152 ms) [ RUN ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_S8 [ OK ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_S8 (461 ms) [ RUN ] QLinearPoolTest.AveragePool1D_IncludePadPixel_S8 [ OK ] QLinearPoolTest.AveragePool1D_IncludePadPixel_S8 (448 ms) [ RUN ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_S8 [ OK ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_S8 (471 ms) [ RUN ] QLinearPoolTest.AveragePool2D_IncludePadPixel_S8 [ OK ] QLinearPoolTest.AveragePool2D_IncludePadPixel_S8 (473 ms) [ RUN ] QLinearPoolTest.AveragePool2D_MultiChannel_S8 [ OK ] QLinearPoolTest.AveragePool2D_MultiChannel_S8 (1507 ms) [ RUN ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_S8 [ OK ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_S8 (477 ms) [ RUN ] QLinearPoolTest.AveragePool3D_IncludePadPixel_S8 [ OK ] QLinearPoolTest.AveragePool3D_IncludePadPixel_S8 (493 ms) [ RUN ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_nhwc_S8 [ OK ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_nhwc_S8 (158 ms) [ RUN ] QLinearPoolTest.AveragePool1D_IncludePadPixel_nhwc_S8 [ OK ] QLinearPoolTest.AveragePool1D_IncludePadPixel_nhwc_S8 (146 ms) [ RUN ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_nhwc_S8 [ OK ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_nhwc_S8 (146 ms) [ RUN ] QLinearPoolTest.AveragePool2D_IncludePadPixel_nhwc_S8 [ OK ] QLinearPoolTest.AveragePool2D_IncludePadPixel_nhwc_S8 (158 ms) [ RUN ] QLinearPoolTest.AveragePool2D_MultiChannel_nhwc_S8 [ OK ] QLinearPoolTest.AveragePool2D_MultiChannel_nhwc_S8 (157 ms) [ RUN ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_nhwc_S8 [ OK ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_nhwc_S8 (145 ms) [ RUN ] QLinearPoolTest.AveragePool3D_IncludePadPixel_nhwc_S8 [ OK ] QLinearPoolTest.AveragePool3D_IncludePadPixel_nhwc_S8 (147 ms) [ RUN ] QLinearPoolTest.AveragePool2D_BigImage_S8 [ OK ] QLinearPoolTest.AveragePool2D_BigImage_S8 (537 ms) [ RUN ] QLinearPoolTest.AveragePool2D_BigImage_nhwc_S8 [ OK ] QLinearPoolTest.AveragePool2D_BigImage_nhwc_S8 (173 ms) [ RUN ] QLinearPoolTest.AveragePool2D_Global_S8 [ OK ] QLinearPoolTest.AveragePool2D_Global_S8 (457 ms) [ RUN ] QLinearPoolTest.AveragePool2D_Global_nhwc_S8 [ OK ] QLinearPoolTest.AveragePool2D_Global_nhwc_S8 (150 ms) [----------] 36 tests from QLinearPoolTest (12914 ms total) [----------] Global test environment tear-down [==========] 72 tests from 2 test suites ran. (18885 ms total) [ PASSED ] 72 tests. memleakdbg: ----- No memory leaks detected ----- ``` ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-03 16:11:46 -08:00
raoanag	cb7f28a16a	Register Resize for INT8 and UINT8 (#18252 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Adrian Tsai <adtsai@microsoft.com>	2024-01-03 16:11:33 -08:00
raoanag	9ff5e3b7b0	Add QLinearConcat for DML EP (#16971 ) (#18268 ) ### Description [Cherry Pick Reviewed] ``` [ OK ] QLinearConcatS8.ExpectFail_WrongZeroPointType_1 (372 ms) [ RUN ] QLinearConcatS8.InputOne_Dynamic [ OK ] QLinearConcatS8.InputOne_Dynamic (255 ms) [ RUN ] QLinearConcatS8.InputOne_Const [ OK ] QLinearConcatS8.InputOne_Const (255 ms) [----------] 11 tests from QLinearConcatS8 (3385 ms total) [----------] Global test environment tear-down [==========] 21 tests from 3 test suites ran. (9355 ms total) [ PASSED ] 21 tests. ``` [#16971](https://github.com/microsoft/onnxruntime/pull/16971) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Xiang Zhang <xianz@microsoft.com>	2024-01-03 16:11:31 -08:00
Xiang Zhang	9bbe425d7f	Register LPpool18 and AvgPool 19 (#16880 )	2024-01-03 16:10:49 -08:00
Jeff Bloomfield	c3d96a7b35	Update DML version to 1.13.0 (#18978 ) Update DML nuget version to 1.13.0	2024-01-03 16:09:55 -08:00
Jiajie Hu	3b8b9147fa	[js/webgpu] Mitigate floating point accuracy issue in Resize (#18956 ) ### Description The patch fixes a floating point accuracy issue in Resize by preferring integer indices and integer arithmetic where possible. ### Motivation and Context Model test `test_resize_upsample_sizes_nearest_floor_align_corners` was observed to be failing on certain platforms. The root cause is the inaccurate floating point evaluation of 21 / 7 (2.999... vs 3), which results in the wrong input element to be indexed (floor(2.999...) vs floor(3)).	2024-01-03 14:15:26 -08:00
Yang Gu	c5f3952b68	[js/webgpu] Introduce trace support (#18928 ) This is to leverage console.timeStamp to add a single marker to browsers' (only Chromium and Firefox support it) performance tool. With this support, we can dump both CPU and GPU timestamps, and use post-processing tool to clearly understand the calibrated timeline. A demo tool can be found at https://github.com/webatintel/ort-test, and more detailed info can be found at https://docs.google.com/document/d/1TuVxjE8jnELBXdhI4QGFgMnUqQn6Q53QA9y4a_dH688/edit.	2024-01-03 10:13:17 -08:00
PeixuanZuo	7a454acd61	[ROCm] Update CI/Packaging pipeline to ROCm6.0 (#18985 ) Update CI/Packaing pipeline to ROCm6.0	2024-01-03 17:25:15 +08:00
Yi Zhang	c97e3f4821	[Fix] exception in Fuzz Test pipeline (#18984 ) ### Description <!-- Describe your changes. --> ### Motivation and Context The file path is not correct.	2024-01-03 14:53:31 +08:00
Ye Wang	1c2dca95d8	pass rotary embedding to attention op (#18846 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-02 20:38:33 -08:00
Scott McKay	df740d7d15	Throw if unique_ptr or array allocation fails due to SafeInt overflow (#18941 ) ### Description <!-- Describe your changes. --> If we fail to calculate the buffer size (due to overflow) we currently return a nullptr. This is inconsistent as an actual memory allocation failure throws. An overflow would typically be due to bad input so an exception makes more sense given that. Change to throw so code using MakeUniquePtr* and AllocArray* doesn't need to check for nullptr. Add some extra info to the log message to help debugging. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Should help with #18905 by avoiding the invalid attempted usage of a nullptr from the allocation. Extra info _might_ help with figuring out where the overflow is coming from which is the real issue.	2024-01-03 07:57:51 +10:00
Yifan Li	3993d43048	[EP Perf] Fix missing Azure cli & use onnx zoo model inside image (#18917 ) ### Description * Fix [missing Azure CLI issue](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392612&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b&l=12). * Now, once CI fails to run `az --version`, it would auto-reinstall the azure cli dependency * Use existing onnx zoo model inside image during memtesting * to avoid test failure when onnx model zoo is restructuring * Display more detail info of valgrind when memtesting * Clear invalid dep of existing AddressSanitizer test case ### Validate * Before the fix, Azure CLI is missing: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392994&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b&l=10 * After the fix: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=392619&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=d0fed32c-7043-5439-8bf2-dd69d21beb5b	2024-01-01 17:14:39 -08:00
dependabot[bot]	81cbdb10a9	Bump actions/upload-artifact from 3 to 4 (#18920 ) Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3 to 4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's releases</a>.</em></p> <blockquote> <h2>v4.0.0</h2> <h2>What's Changed</h2> <p>The release of upload-artifact@v4 and download-artifact@v4 are major changes to the backend architecture of Artifacts. They have numerous performance and behavioral improvements.</p> <p>For more information, see the <a href="https://github.com/actions/toolkit/tree/main/packages/artifact"><code>@actions/artifact</code></a> documentation.</p> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/vmjoseph"><code>@vmjoseph</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/464">actions/upload-artifact#464</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v3...v4.0.0">https://github.com/actions/upload-artifact/compare/v3...v4.0.0</a></p> <h2>v3.1.3</h2> <h2>What's Changed</h2> <ul> <li>chore(github): remove trailing whitespaces by <a href="https://github.com/ljmf00"><code>@ljmf00</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/313">actions/upload-artifact#313</a></li> <li>Bump <code>@actions/artifact</code> version to v1.1.2 by <a href="https://github.com/bethanyj28"><code>@bethanyj28</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/436">actions/upload-artifact#436</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v3...v3.1.3">https://github.com/actions/upload-artifact/compare/v3...v3.1.3</a></p> <h2>v3.1.2</h2> <ul> <li>Update all <code>@actions/*</code> NPM packages to their latest versions- <a href="https://redirect.github.com/actions/upload-artifact/issues/374">#374</a></li> <li>Update all dev dependencies to their most recent versions - <a href="https://redirect.github.com/actions/upload-artifact/issues/375">#375</a></li> </ul> <h2>v3.1.1</h2> <ul> <li>Update actions/core package to latest version to remove <code>set-output</code> deprecation warning <a href="https://redirect.github.com/actions/upload-artifact/issues/351">#351</a></li> </ul> <h2>v3.1.0</h2> <h2>What's Changed</h2> <ul> <li>Bump <code>@actions/artifact</code> to v1.1.0 (<a href="https://redirect.github.com/actions/upload-artifact/pull/327">actions/upload-artifact#327</a>) <ul> <li>Adds checksum headers on artifact upload (<a href="https://redirect.github.com/actions/toolkit/pull/1095">actions/toolkit#1095</a>) (<a href="https://redirect.github.com/actions/toolkit/pull/1063">actions/toolkit#1063</a>)</li> </ul> </li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`c7d193f32e`"><code>c7d193f</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/466">#466</a> from actions/v4-beta</li> <li><a href="`13131bb095`"><code>13131bb</code></a> licensed cache</li> <li><a href="`4a6c273b98`"><code>4a6c273</code></a> Merge branch 'main' into v4-beta</li> <li><a href="`f391bb91a3`"><code>f391bb9</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/465">#465</a> from actions/robherley/v4-documentation</li> <li><a href="`9653d03c4b`"><code>9653d03</code></a> Apply suggestions from code review</li> <li><a href="`875b630764`"><code>875b630</code></a> add limitations section</li> <li><a href="`ecb21463e9`"><code>ecb2146</code></a> add compression example</li> <li><a href="`5e7604f84a`"><code>5e7604f</code></a> trim some repeated info</li> <li><a href="`d6437d0758`"><code>d6437d0</code></a> naming</li> <li><a href="`1b56155703`"><code>1b56155</code></a> s/v4-beta/v4/g</li> <li>Additional commits viewable in <a href="https://github.com/actions/upload-artifact/compare/v3...v4">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/upload-artifact&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-12-31 21:10:47 -08:00
satyajandhyala	780fc3611b	[JS/Web] Sajandhy/webgpu resize scales rank check (#18954 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-29 09:23:27 -08:00
Wanming Lin	96d1f3203a	[WebNN EP] Decompose Concat with input number > 4 for CPU backend (#18930 ) WebNN XNNPack backend only supports the concat with inputs number <= 4, decomposing the Concat with inputs number > 4 into multiple WebNN concat ops.	2023-12-28 17:31:56 -08:00
Wanming Lin	a3626b67b3	[WebNN EP] Rename op logicalNot to not (#18936 ) WebNN latest spec uses the name 'not'.	2023-12-28 17:31:37 -08:00

1 2 3 4 5 ...

10272 commits