### Description
This PR enables onnxruntime to build with the most recent release of Arm
Compute Library
### Motivation and Context
The latest version of Arm Compute Library that onnxruntime can build against is 20.02, which is more than three years old.
### Description
Add heterogeneous support so that this check is skipped for TRT plugins, which can have inputs with different tensor types.
### Description
- Support more test cases for the WebNN EP in suite-test-list.jsonc
- Add a DISABLE_WEBNN flag in build.ts in preparation for the WebNN EP release
- Add a test option, '--webnn-device-type', in test-runner-args-cli.ts to support running WebNN with the 'gpu' deviceType
- Use Chrome Stable as the default browser for WebNN testing to work around a CI limitation
### Description
Add support code for the loongarch64 platform in sqnbitgemm.
```
100% tests passed, 0 tests failed out of 7
Total Test time (real) = 116.99 sec
2023-12-11 10:43:21,287 build [INFO] - Build complete
```
### Description
1. Remove Windows ARM32 from nuget packaging pipelines
2. Add missing component-governance-component-detection-steps.yml to
some build jobs.
### Motivation and Context
Stop supporting Windows ARM32 to align with [Windows' support policy](https://learn.microsoft.com/en-us/windows/arm/arm32-to-arm64). Users who need this feature can still build the DLLs from source; however, we will remove that support later as well.
### Description
Use batchNormalization, layerNormalization, and instanceNormalization instead of meanVarianceNormalization to implement the normalization ops, since meanVarianceNormalization has been deleted from the spec.
Also remove groupNormalization.
### Description
This addresses a 32-bit build error affecting the packaging pipeline.
### Description
- Removes the `--disable_ml_ops` build flag
- Automatically detects the ORT version from the VERSION file via `templates/set-version-number-variables-step.yml`, so we no longer need to create a commit to update ORT versions.
### Motivation and Context
- A new unit test caused failures in the QNN Nuget pipeline because that pipeline did not enable ML ops.
- Automate ORT version specification.
### Description
The FusedConv operator for the ROCm EP could fail to compile the fused operation, in which case it should not attempt to use the failed fusion plan. In addition, the hash for the miopenConvolutionDescriptor_t on newer ROCm versions was failing to use all components of the descriptor.
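For illustration, a sketch of hashing every component of a descriptor, in the spirit of the fix; the `ConvDescriptorView` fields and the `HashCombine` helper are hypothetical stand-ins, not MIOpen's actual miopenConvolutionDescriptor_t layout:
```cpp
#include <cstddef>
#include <functional>
#include <initializer_list>

// Generic hash-combine, similar in spirit to boost::hash_combine.
inline void HashCombine(std::size_t& seed, std::size_t value) {
  seed ^= value + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
}

struct ConvDescriptorView {  // hypothetical stand-in for the descriptor
  int pad_h, pad_w;
  int stride_h, stride_w;
  int dilation_h, dilation_w;
  int group_count;  // a newer component that must not be skipped
};

std::size_t HashConvDescriptor(const ConvDescriptorView& d) {
  std::size_t seed = 0;
  // Every component participates; skipping one (e.g. group_count) lets
  // distinct descriptors collide, which is the bug described above.
  for (int v : {d.pad_h, d.pad_w, d.stride_h, d.stride_w,
                d.dilation_h, d.dilation_w, d.group_count}) {
    HashCombine(seed, std::hash<int>{}(v));
  }
  return seed;
}
```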
### Description
Also update the op test suite.
### Motivation and Context
Previously, the *total* size in the case `Expand - last dim is not divisible by 4` was a multiple of 4 even though the *last dimension* was not (for example, a shape of [4, 3] has a last dimension of 3 but 12 elements in total), so the bug was never caught.
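As a sketch of the shape arithmetic (these exact shapes are illustrative, not taken from the test suite):
```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

std::int64_t TotalSize(const std::vector<std::int64_t>& shape) {
  return std::accumulate(shape.begin(), shape.end(), std::int64_t{1},
                         std::multiplies<std::int64_t>());
}

int main() {
  // Old-style case: last dim 3 is not divisible by 4, but the total (12)
  // is, so the vectorized path could still be taken without failing.
  const std::vector<std::int64_t> old_case{4, 3};
  // Fixed-style case: neither the last dim (3) nor the total (9) is a
  // multiple of 4, so the non-vectorized path is genuinely exercised.
  const std::vector<std::int64_t> new_case{3, 3};
  std::cout << "old: total=" << TotalSize(old_case)
            << ", last=" << old_case.back() << "\n"
            << "new: total=" << TotalSize(new_case)
            << ", last=" << new_case.back() << "\n";
}
```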
### Description
Change all macOS python packages to use universal2, to reduce the number
of packages we have.
### Motivation and Context
According to [Wikipedia](https://en.wikipedia.org/wiki/MacOS_Big_Sur), macOS 11 is the first macOS version that supports universal2, and it is the minimum macOS version we support, so we no longer need to maintain separate binaries for different CPU architectures.
### Description
ReduceMax/ReduceMin have been updated in ONNX opset 20; implement the updated ops in ORT.
### Motivation and Context
This is for the ORT 1.17.0 release.
---------
Signed-off-by: Liqun Fu <liqfu@microsoft.com>
### Description
- Add mutex to protect QNN API calls for executing a graph and
extracting the corresponding profile data.
- Ensures QNN EP's execute function does not store unnecessary state (i.e., input and output buffer pointers do not need to be stored as class members).
### Motivation and Context
Allow calling `session.Run()` from multiple threads when using QNN EP.
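A minimal sketch of the resulting pattern, with hypothetical names rather than the actual QNN EP types: one mutex serializes graph execution and profile extraction, and buffer pointers stay local to the call.
```cpp
#include <mutex>
#include <vector>

class QnnGraphRunner {
 public:
  bool Execute(const std::vector<const void*>& inputs,
               const std::vector<void*>& outputs) {
    // Holding the lock across both steps keeps concurrent session.Run()
    // calls from interleaving execution with profile extraction.
    std::lock_guard<std::mutex> guard(qnn_api_mutex_);
    const bool ok = RunGraph(inputs, outputs);
    if (ok) {
      ExtractProfileData();
    }
    return ok;
  }

 private:
  // Placeholders standing in for the underlying QNN API calls.
  bool RunGraph(const std::vector<const void*>&,
                const std::vector<void*>&) { return true; }
  void ExtractProfileData() {}
  std::mutex qnn_api_mutex_;
};
```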
### Description
onnxruntime may raise a "type inference failed" error when a custom operator sets IsHomogeneous to false in its schema. This change makes sure that the TypeInferenceFunction and the schema's type constraints are aligned to prevent that from happening.
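As an illustration of the invariant, here is a sketch using the ONNX OpSchema API (`MyCustomOp` and its types are hypothetical, and the custom-operator registration itself differs): the inference function must accept every type the constraint declares rather than assuming a single fixed type.
```cpp
#include "onnx/defs/schema.h"
#include "onnx/defs/shape_inference.h"

using namespace ONNX_NAMESPACE;

ONNX_OPERATOR_SCHEMA(MyCustomOp)
    .Input(0, "X", "input tensor", "T")
    .Output(0, "Y", "output tensor", "T")
    // The constraint lists every element type the op accepts...
    .TypeConstraint("T", {"tensor(float)", "tensor(double)"},
                    "Allowed input/output types.")
    // ...and the inference function agrees with it by propagating
    // whichever of those types the input actually has.
    .TypeAndShapeInferenceFunction([](InferenceContext& ctx) {
      propagateElemTypeFromInputToOutput(ctx, 0, 0);
    });
```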
---------
Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
### Description
A replacement of #18683; tries to resolve #18689.
Specifying the "-s PTHREAD_POOL_SIZE" flag in Emscripten forces the thread pool to initialize before the WebAssembly instance is available.
## Description
This pull request improves the efficiency of inference session creation by eliminating unnecessary `Node::ToProto` invocations, along with their corresponding `~NodeProto` destructor calls.
## Motivation and Context
This pull request targets low-hanging fruit in the inference session creation path. Removing the redundant `Node::ToProto` calls streamlines the code and improves overall performance; the flame graphs show a notable drop in the share of time spent in `Node::ToProto`.
### Code Snippet
```cpp
TEST(InferenceSessionTests, Bench) {
  // Initialize logging manager
  auto logging_manager = std::make_unique<logging::LoggingManager>(
      std::unique_ptr<ISink>(new CLogSink()), logging::Severity::kVERBOSE, false,
      LoggingManager::InstanceType::Temporal);

  // Create environment
  std::unique_ptr<Environment> env;
  auto st = Environment::Create(std::move(logging_manager), env);
  ASSERT_TRUE(st.IsOK());

  // Configure session options
  SessionOptions so;
  so.execution_mode = ExecutionMode::ORT_SEQUENTIAL;
  so.graph_optimization_level = TransformerLevel::Level2;
  so.intra_op_param.thread_pool_size = 1;

  // Initialize and load the InferenceSession
  InferenceSessionTestGlobalThreadPools session1{so, *env};
  ASSERT_STATUS_OK(session1.Load("big.onnx"));
  ASSERT_STATUS_OK(session1.Initialize());
}
```
### `big.onnx` model creation
```python
import onnx
import numpy as np
from spox import argument, build, Tensor, Var
from spox.opset.ai.onnx import v17 as op
from spox.opset.ai.onnx.ml.v3 import label_encoder

a = argument(Tensor(np.int64, ('N',)))
c = a
for x in range(1000):
    c = op.mul(c, op.const(np.ones(10000, dtype=np.int64)))
for x in range(3000):
    all_strings = list("random_string" + str(i) for i in range(100))
    all_ints = list(range(len(all_strings)))
    c = label_encoder(
        c,
        keys_int64s=all_ints,
        values_strings=all_strings
    )
    c = label_encoder(c, keys_strings=all_strings, values_int64s=all_ints)

model: onnx.ModelProto = build(inputs={'a': a}, outputs={'c': c})
onnx.save(model, "big.onnx")
```
Testing in `Release` with `perf` yields:
Before: 3.3% spent in `Node::ToProto`
After: 1.6% spent in `Node::ToProto`
---------
Co-authored-by: Atanas Dimitrov <atanasdimitrov@Atanass-MacBook-Pro.local>
Fixed a small spelling error.
### Description
Small spelling error fix.
### Motivation and Context
It is documentation for the product, and it misspells the word
documentation. This reflects on your product and the quality of the
work.
### Description
SessionOptions.use_deterministic_compute can be set via the Python API. Users requested enabling this setting via the C API as well.
### Motivation and Context
#17416
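A sketch of what the C/C++-side usage could look like via a session config entry; the config key string below is an assumption inferred from the Python option name, so check session_options_config_keys.h for the exact key.
```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::SessionOptions so;
  // Python equivalent: sess_options.use_deterministic_compute = True
  // The key string is an assumption, not confirmed against the header.
  so.AddConfigEntry("session.use_deterministic_compute", "1");
  return 0;
}
```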
### Description
This limits the size of the constant data nodes that the DML EP creates in the DML graph following de-duplication of 1D quantization tensors. In the process, it reduces the check for the maximum size of the constant node.
This is merged from: https://github.com/microsoft/onnxruntime/pull/18494
### Description
Cleanup and rebase from [this
PR](https://github.com/microsoft/onnxruntime/pull/18629)
---------
Co-authored-by: Christian Larson <chrilaMSFT@users.noreply.github.com>
Co-authored-by: Christian Larson <28911437+chrilaMSFT@users.noreply.github.com>
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
Co-authored-by: Anagha Rao <anagrao@microsoft.com>
### Description
This addresses a bug, surfaced as a D3D debug layer error, in a fast path that was added for submission of re-used command lists of fused graph kernels in the DML EP.
### Motivation and Context
The fast path in DmlCommandRecorder::ExecuteCommandList enabled a current non-reused command list, if empty, to be used for commands following submission of the fused command list. The fix ensures the associated command allocator is only re-used after the next fence value is completed, which is higher due to submission of the other command list.
The command recorder design was intended to support batching of provided command list executions; however, it submits command lists immediately as an implementation detail to maximize CPU/GPU parallelism. If that heuristic were removed, it would expose additional issues in this same fast path. Because of this, and because of the complexity and inefficiency of the old batching mechanism, I removed the batching as well.
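A minimal sketch of the allocator-reuse rule the fix enforces, using hypothetical types rather than the DML EP's actual classes: an allocator retired after a submission may only be reset once the GPU's completed fence value has reached the fence value that was current after that submission.
```cpp
#include <cstdint>
#include <queue>
#include <utility>

struct CommandAllocator { void Reset() { /* reclaim memory */ } };

class AllocatorRing {
 public:
  // Called right after a command list using `alloc` is submitted;
  // `fence_after_submit` is the queue's next fence value at that point.
  void Retire(CommandAllocator* alloc, std::uint64_t fence_after_submit) {
    retired_.push({fence_after_submit, alloc});
  }

  // Returns an allocator that is provably idle, or nullptr if none is.
  CommandAllocator* TryReuse(std::uint64_t completed_fence_value) {
    if (retired_.empty() || retired_.front().first > completed_fence_value) {
      return nullptr;  // the GPU may still be reading every retired allocator
    }
    CommandAllocator* alloc = retired_.front().second;
    retired_.pop();
    alloc->Reset();
    return alloc;
  }

 private:
  std::queue<std::pair<std::uint64_t, CommandAllocator*>> retired_;
};
```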
### Description
[DirectML EP] Add DML EP registration for Col2Im operator
### Motivation and Context
Add Col2Im support for opset 18.
This operator is implemented as the DirectML Fold operator.
---------
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>