Commit graph

10297 commits

Author SHA1 Message Date
Xu Xing
76dfe5347c
[js/webgpu] Support uniforms for instance-norm (#18929)
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
2024-01-09 14:56:00 -08:00
Milos Puzovic
37ac9d391c
Enable Arm Compute Library 23.08 (#17672)
### Description

This PR enables onnxruntime to build with the most recent release of Arm
Compute Library

### Motivation and Context

The latest version of Arm Compute Library that onnxruntime can build with is
20.02, which is more than 3 years old.
2024-01-09 14:10:25 -08:00
Changming Sun
a2afd92093
Format TS code (#19066)
### Description
Format code
2024-01-09 13:41:10 -08:00
Ashwini Khade
897a4163d7
Update transformer version for training CIs (#19046)
### Description
Updating version to resolve security vulnerability.
2024-01-09 12:00:34 -08:00
Yifan Li
574c7caf3a
[TensorRT EP] Clear constraint of TRT plugin with different input type (#19044)
### Description
Add heterogeneous support to skip this check for TRT plugins that have
different input tensor types.
2024-01-09 10:29:06 -08:00
zesongw
ad6dd0a597
[WebNN] Enable npm unit tests (#18486)
### Description
- Support more test cases for WebNN EP in suite-test-list.jsonc
- Add a DISABLE_WEBNN flag in build.ts in preparation for the WebNN EP release
- Add a test option, '--webnn-device-type', in test-runner-args-cli.ts to
support running the WebNN 'gpu' deviceType
- Use Chrome Stable as the default browser for WebNN testing to work around a
CI limitation.
2024-01-09 10:10:57 -08:00
Xu Xing
557ac74c05
[js/webgpu] Support gemm uniforms (#19056)
2024-01-09 09:57:06 -08:00
Xu Xing
42ba2aed54
[js/webgpu] Support pad uniforms (#19057)
2024-01-09 09:34:56 -08:00
Xu Xing
eb92681bfb
[js/webgpu] Support range uniforms (#19055) 2024-01-09 09:33:57 -08:00
junchao-loongson
c1367ae553
Sqnbitgemm: add loongarch64 code path (#18775)
### Description

Add support code for loongarch64 platform in sqnbitgemm

```
100% tests passed, 0 tests failed out of 7

Total Test time (real) = 116.99 sec
2023-12-11 10:43:21,287 build [INFO] - Build complete

```
2024-01-09 09:20:45 -08:00
Xu Xing
dee6a5b371
[js/webgpu] Support uniforms for attention and multihead attention (#18903) 2024-01-09 07:46:30 -08:00
Changming Sun
ab897a4a40
Remove Windows ARM32 from nuget packaging pipelines (#19049)
### Description
1. Remove Windows ARM32 from nuget  packaging pipelines

2. Add missing component-governance-component-detection-steps.yml to
some build jobs.

### Motivation and Context
Stop supporting Windows ARM32 to align with [Windows's support
policy](https://learn.microsoft.com/en-us/windows/arm/arm32-to-arm64).
Users who need this feature can still build the DLLs from source. However,
we will remove that support later as well.
2024-01-09 07:45:03 -08:00
pengwa
7cb8b20db2
Remove mem consuming test case to unblock running ci on lower-end gpu (#19059)
2024-01-09 20:05:34 +08:00
zesongw
eb35896ede
[WebNN EP] Update WebNN normalization ops (#18817)
Use batchNormalization, layerNormalization and instanceNormalization
instead of meanVarianceNormalization to implement normalization Ops. The
spec of meanVarianceNormalization has been deleted.
Remove groupNormalization.
2024-01-08 22:02:44 -08:00
Changming Sun
68c29ece23
In a Linux or Android build, check if the compiler supports bfloat16 and float16 (#18813)
### Description
Restrict the clang version because an upcoming change requires clang >= 16,
which will mainly affect the Android build.
2024-01-08 19:46:33 -08:00
Xu Xing
8f024b7394
[js/webgpu] Support uniforms for layer-norm (#18755) 2024-01-08 18:16:25 -08:00
Guenther Schmuelling
a8bb1df331
[js/webgpu] fix heap access > 2GB (#19010) 2024-01-08 17:58:38 -08:00
Jeff Bloomfield
975a315cd7
Fix x86 build error in GraphDescBuilder.cpp affecting packaging pipeline (#19045)
### Description
This addresses a 32-bit build error affecting the packaging pipeline.
2024-01-08 17:49:19 -08:00
zesongw
99a8400e90
[WebNN EP] Fall back resize nearest mode for WebNN CPU backend (#19039)
WebNN CPU backend only supports linear mode. Fall back for this case.
2024-01-08 17:16:52 -08:00
Adrian Lizarraga
52e5601449
[QNN Nuget Pipeline] Build with ML ops and detect ORT version (#19024)
### Description
- Removes `--disable_ml_ops` build flag 
- Automatically detects ORT version from VERSION file via
`templates/set-version-number-variables-step.yml`. We will no longer
need to create a commit to update ORT versions.

### Motivation and Context
- A new unit test caused failures in the QNN Nuget pipeline because it
did not enable ml ops.
- Automate ORT version specification
2024-01-08 12:44:12 -08:00
Yi Zhang
e8ac97c8d8
Move Windows GPU training job to A10 (#19041)
### Description
1. Update sm to 86

### Motivation and Context
We have more A10 quota than T4, and Nvidia AXX GPUs can be partitioned.
2024-01-08 09:19:58 -08:00
Jeff Daily
db3c076081
[ROCm] do not use failed miopen fusion compile (#19012)
The FusedConv operator for the ROCm EP could fail to compile the fused
operation, in which case it should not attempt to use the failed fusion
plan. In addition, the hash for the miopenConvolutionDescriptor_t for
newer ROCm versions was failing to use all components of the descriptor.
2024-01-08 19:06:45 +08:00
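The fix above boils down to remembering failed fusion-plan compiles so a failed plan is neither retried nor used. A toy sketch of that pattern in plain Python (not the actual ORT/MIOpen code; all names here are invented for illustration):

```python
# Toy sketch: cache fusion-plan compile results, including failures,
# so a plan that failed to compile is never used or recompiled.
class FusionCache:
    def __init__(self, compile_fn):
        self._compile = compile_fn   # returns a plan object, or raises
        self._plans = {}             # key -> plan, or None for "compile failed"

    def get(self, key):
        if key not in self._plans:
            try:
                self._plans[key] = self._compile(key)
            except RuntimeError:
                self._plans[key] = None   # remember the failure
        return self._plans[key]           # None => caller must use unfused path

def run_conv(cache, key, fused_fn, unfused_fn):
    plan = cache.get(key)
    return fused_fn(plan) if plan is not None else unfused_fn()
```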
Edward Chen
4190c29d22
Add MatMulNBits accuracy_level parameter to quantization utilities. (#19015)
Allow MatMulNBits `accuracy_level` attribute (added in #17669) to be set to a particular value when the model is quantized.
2024-01-05 14:51:07 -08:00
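For context, MatMulNBits consumes block-wise quantized weights, while `accuracy_level` selects the compute precision used at inference time. A minimal sketch of block-wise 4-bit asymmetric quantization in plain Python (illustrative only, not the ORT quantization utility API; function names are invented):

```python
# Illustrative sketch of quantizing one block of weights to unsigned
# 4-bit integers plus a per-block scale and zero point.
def quantize_block_4bit(block):
    lo, hi = min(block), max(block)
    scale = (hi - lo) / 15.0 or 1.0          # 4 bits -> 16 levels
    zero_point = round(-lo / scale)
    q = [max(0, min(15, round(v / scale) + zero_point)) for v in block]
    return q, scale, zero_point

def dequantize_block(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]
```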
PeixuanZuo
efdcefcf8c
[ROCm] fix security warning (#19017)
fix security warning
2024-01-05 10:05:34 -08:00
Jiajie Hu
447a3a7c70
[js/webgpu] Fix Expand/Gather when input type is bool (#18999)
### Description
Also update the op test suite.

### Motivation and Context
Previously the *total* size in case `Expand - last dim is not divisible
by 4` was a multiple of 4, even though the *last dimension* was not, so
the bug has never been caught.
2024-01-05 08:16:15 -08:00
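The test gap described above comes from vec4 packing: a WebGPU kernel packs four elements per lane along the last dimension, so whether the padded path is exercised depends on the last dimension, not the total element count. A small sketch of the distinction (plain Python; the helper name is invented):

```python
def components_for_last_dim(shape):
    """Number of 4-element lanes needed per row when packing the last dim into vec4."""
    last = shape[-1]
    return (last + 3) // 4   # ceil division; padding needed when last % 4 != 0

# A total size divisible by 4 does NOT imply the last dim is:
shape = [4, 6]               # 24 elements in total, but last dim 6 % 4 != 0
```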
Wanming Lin
7f0aac0d8a
Revert "[WebNN EP] Rename op logicalNot to not" (#18997)
Reverts microsoft/onnxruntime#18936

WebNN spec is discussing using the `logicalNot` name at
https://github.com/webmachinelearning/webnn/issues/496, and the Chromium
implementation has suspended the renaming change. For consistency, we
should keep using `logicalNot` in the WebNN EP until it is finalized.
2024-01-05 08:15:50 -08:00
Changming Sun
e155c66b4a
Change all macOS python packages to use universal2 (#19013)
### Description
Change all macOS python packages to use universal2, to reduce the number
of packages we have.

### Motivation and Context
According to [wikipedia](https://en.wikipedia.org/wiki/MacOS_Big_Sur),
macOS 11 is the first macOS version that supports universal2, and it is
the minimum macOS version we support, so we no longer need to maintain
separate binaries for different CPU architectures.
2024-01-04 17:44:49 -08:00
liqun Fu
e10a8ae31f
reduce max/min 20 (#17805)
### Description
ReduceMax/ReduceMin have been updated in ONNX opset 20; implement them in ORT.



### Motivation and Context
This is for the ORT 1.17.0 release.

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2024-01-04 17:41:01 -08:00
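As a reminder of the semantics being implemented, ReduceMax takes the maximum along the given axis, optionally keeping the reduced dimension. A minimal 2D sketch in plain Python (the real implementation covers all ranks, types, and the opset-20 additions):

```python
# Minimal sketch of ReduceMax on a 2D input.
def reduce_max_2d(data, axis, keepdims=True):
    if axis == 0:
        out = [max(col) for col in zip(*data)]   # reduce down the columns
        return [out] if keepdims else out
    out = [max(row) for row in data]             # reduce along each row
    return [[v] for v in out] if keepdims else out
```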
Jeff Bloomfield
55a669409a
Merge pull request #18983 from microsoft/WindowsAI
Merge WindowsAI to main
2024-01-04 17:21:19 -08:00
Adrian Lizarraga
02b1ff5fa2
[QNN EP] Support multithreaded inference of a single session (#18981)
### Description
- Add mutex to protect QNN API calls for executing a graph and
extracting the corresponding profile data.
- Ensures QNN EP's execute function does not store unnecessary state
(i.e., input and output buffer pointers do not need to be stored as
class members.)

### Motivation and Context
Allow calling `session.Run()` from multiple threads when using QNN EP.
2024-01-04 13:32:48 -08:00
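The concurrency pattern here is a mutex serializing calls into a non-thread-safe execute API, so that `session.Run()` can safely be called from multiple threads. A toy Python sketch (not the QNN or ORT API; the class is invented for illustration):

```python
import threading

# Toy model: many threads share one "session"; a lock serializes
# entry into the underlying execute call that is not thread-safe.
class ToySession:
    def __init__(self):
        self._lock = threading.Lock()
        self._in_execute = 0
        self.max_concurrent = 0   # records whether entry was ever unserialized

    def run(self, x):
        with self._lock:          # mirrors the mutex added around QNN execute
            self._in_execute += 1
            self.max_concurrent = max(self.max_concurrent, self._in_execute)
            result = x * 2        # stand-in for graph execution
            self._in_execute -= 1
        return result
```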
Wei-Sheng Chin
658e30eb33
Remove DORT since it's in PyTorch main now (#18996)
The main code is removed, and tests are modified to use DORT directly from
PyTorch.
2024-01-04 12:59:47 -08:00
Xavier Dupré
889b1ef2d1
Fix schema type constraint for custom operators (#17497)
### Description
onnxruntime may raise a "type inference failed" error when a custom
operator sets IsHomogeneous to false in its schema. This change makes
sure that the TypeInferenceFunction and schema type constraints are
aligned to prevent that from happening.

---------

Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2024-01-04 20:27:46 +01:00
Jeff Bloomfield
7401b6661d Update OperatorKernels.md 2024-01-04 11:27:03 -08:00
Changming Sun
011b562b51
Update c# dependencies (#18995)
### Description
Update c# dependencies
2024-01-04 10:41:28 -08:00
Jeff Bloomfield
8ea3e68192 Update ContribOperators.md 2024-01-04 10:10:46 -08:00
Yulong Wang
b18abaaa2c
[js/web] wait for threadpool initialization (#18952)
### Description

A replacement for #18683; tries to resolve #18689.

By specifying "-s PTHREAD_POOL_SIZE" flag in emscripten, it forces the
threadpool to initialize before the webassembly instance is available.
2024-01-04 08:06:55 -08:00
xhcao
867b9d8f04
[js/webgpu] Fix f16 errors for ConvTranspose2D (#18986)
2024-01-04 08:06:01 -08:00
Atanas Dimitrov
4e2d88b75f
Remove useless NodeProto serializations (#18791)
## Description
This pull request aims to enhance the efficiency of the inference
session creation by eliminating unnecessary `Node::ToProto` invocations.
The current codebase presents opportunities for optimization,
particularly in the removal of superfluous `Node::ToProto` calls, along
with their subsequent `~NodeProto` invocations.

## Motivation and Context
The optimization focus of this pull request is on addressing low-hanging
fruit in the inference session creation process. By strategically
removing undesired `Node::ToProto` calls, we aim to streamline the
codebase and enhance the overall performance. The flame graphs
illustrate the notable improvements achieved by reducing the percentage
of `Node::ToProto` calls, thereby optimizing the execution flow.

### Code Snippet
```cpp
TEST(InferenceSessionTests, Bench) {
  // Initialize logging manager
  auto logging_manager = std::make_unique<logging::LoggingManager>(
      std::unique_ptr<ISink>(new CLogSink()), logging::Severity::kVERBOSE, false,
      LoggingManager::InstanceType::Temporal);

  // Create environment
  std::unique_ptr<Environment> env;
  auto st = Environment::Create(std::move(logging_manager), env);
  ASSERT_TRUE(st.IsOK());

  // Configure session options
  SessionOptions so;
  so.execution_mode = ExecutionMode::ORT_SEQUENTIAL;
  so.graph_optimization_level = TransformerLevel::Level2;
  so.intra_op_param.thread_pool_size = 1;

  // Initialize and load the InferenceSession
  InferenceSessionTestGlobalThreadPools session1{so, *env};
  ASSERT_STATUS_OK(session1.Load("big.onnx"));
  ASSERT_STATUS_OK(session1.Initialize());
}
```

### `big.onnx` model creation
```python
import onnx
import numpy as np
from spox import argument, build, Tensor, Var
from spox.opset.ai.onnx import v17 as op
from spox.opset.ai.onnx.ml.v3 import label_encoder

a = argument(Tensor(np.int64, ('N',)))
c = a

for x in range(1000):
    c = op.mul(c, op.const(np.ones(10000, dtype=np.int64)))

for x in range(3000):
    all_strings = list("random_string" + str(i) for i in range(100))
    all_ints = list(range(len(all_strings)))
    c = label_encoder(
        c,
        keys_int64s=all_ints,
        values_strings=all_strings
    )
    c = label_encoder(c, keys_strings=all_strings, values_int64s=all_ints)

model: onnx.ModelProto = build(inputs={'a': a}, outputs={'c': c})
onnx.save(model, "big.onnx")
```

Testing in `Release` with `perf` yields:
Before: 3.3% spent in `Node::ToProto`
After: 1.6% spent in `Node::ToProto`

---------

Co-authored-by: Atanas Dimitrov <atanasdimitrov@Atanass-MacBook-Pro.local>
2024-01-04 17:38:28 +10:00
Jeff Bloomfield
f4ad940ff3 Disable MatMul QDQ selector on DML EP until MatMulIntegerToFloat is re-enabled 2024-01-03 18:37:14 -08:00
Steven Roussey
d5628f52df
link to docs incorrect for js/web/node (#18960)
### Description
link to docs incorrect for js/web/node



### Motivation and Context
Trying to build myself and not yet succeeding.
2024-01-03 17:30:24 -08:00
JJ
5fade70b50
Update README.md (#18963)
Fixed a small spelling error.

### Description
Small spelling error fix.


### Motivation and Context
It is documentation for the product, and it misspells the word
documentation. This reflects on your product and the quality of the
work.
2024-01-03 17:26:25 -08:00
Scott McKay
8e9188e265
Add SessionOptions use_deterministic_compute to the C and C++ APIs. (#18944)
### Description
SessionOptions use_deterministic_compute can be set via the python API.
User request to enable setting via C API.

### Motivation and Context
#17416
2024-01-04 11:12:48 +10:00
Jeff Bloomfield
70a6f816af Port attention query fix from b2768bbf23 2024-01-03 16:22:54 -08:00
raoanag
56fcea94e3 Enable QDQ quantization for DML EP (#18367)
### Description
This enables QDQ transforms with the DML EP
2024-01-03 16:13:23 -08:00
Jeff Bloomfield
ee60e3af6c Limit size of constant nodes creates by DML EP following deduplicatio… (#18915)
### Description
This limits the size of constant data nodes which the DML EP creates in
the DML graph following de-duplication of 1D quantization tensors. In
the process it reduces a check for the maximum size of the constant
node.

This is merged from: https://github.com/microsoft/onnxruntime/pull/18494

2024-01-03 16:13:22 -08:00
tbqh
70d3f682a7 De-duplicate 1D scale and zero point tensors to scalars in DML kernels (#18862)
### Description
Cleanup and rebase from [this
PR](https://github.com/microsoft/onnxruntime/pull/18629)



---------

Co-authored-by: Christian Larson <chrilaMSFT@users.noreply.github.com>
Co-authored-by: Christian Larson <28911437+chrilaMSFT@users.noreply.github.com>
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
Co-authored-by: Anagha Rao <anagrao@microsoft.com>
2024-01-03 16:13:19 -08:00
Jeff Bloomfield
bdaeebd6ff Fix bug in DML EP ExecuteCommandList fast path and simplify design (#18866)
### Description
This addresses a bug in a fast path that was added for submission of
re-used command lists of fused graph kernels in the DML EP, addressing a
D3D debug layer error.

### Motivation and Context
The fast path in DmlCommandRecorder::ExecuteCommandList enabled a
current non-reused command list, if empty, to be used for commands
following submission of the fused command list. The fix ensures the
associated command allocator is only re-used after the next fence value
is completed, which is higher due to submission of the other command
list.

The command recorder design was intended to support batching of provided
command list execution; however, it submits command lists immediately as
an implementation detail to maximize CPU/GPU parallelism. If that
heuristic were removed, it would expose additional issues in this same
fast path. Because of this, and because of the complexity and inefficiency
of the old batching mechanism, I also removed the batching.
2024-01-03 16:13:15 -08:00
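The fix hinges on fence values: a command allocator may only be reused once the GPU has completed the fence value recorded when the allocator's work was submitted. A toy model of that rule in plain Python (not D3D12; all names invented for illustration):

```python
# Toy model: allocators become reusable only after their recorded
# fence value has been signaled as complete.
class AllocatorPool:
    def __init__(self):
        self.completed_fence = 0
        self._pending = []        # (fence value when reusable, allocator)

    def submit(self, allocator, fence_value):
        # allocator becomes reusable only after fence_value completes
        self._pending.append((fence_value, allocator))

    def signal(self, fence_value):
        self.completed_fence = max(self.completed_fence, fence_value)

    def try_reuse(self):
        for i, (fence, alloc) in enumerate(self._pending):
            if fence <= self.completed_fence:
                del self._pending[i]
                return alloc
        return None               # nothing safe to reuse yet
```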
Sheil Kumar
b2f81c8725 Hide Col2Im registration behind DML_TARGET_VERSION 6300 (#18829)
Hide Col2Im registration behind DML_TARGET_VERSION 6300

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2024-01-03 16:13:15 -08:00
Jake Mathern
d2f7a5b128 Cherry pick fix constant pow (#18785)
### Description
Cherry pick https://github.com/microsoft/onnxruntime/pull/18784
2024-01-03 16:13:14 -08:00
Sheil Kumar
107d7492b9 [DirectML EP] Add DML EP registration for Col2Im (#17786)
### Description
[DirectML EP] Add DML EP registration for Col2Im operator

### Motivation and Context
Add Col2Im support for opset 18.
This operator is implemented as the DirectML Fold operator.

---------

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2024-01-03 16:13:14 -08:00