Commit graph

11261 commits

Author SHA1 Message Date
zhijiang
269d9b094f
Zhijxu/fix softmax cudnn bf16 (#21045)
If seq > 2048, ORT falls back to the cuDNN version, which throws an
exception when dtype is bf16. This PR fixes that.
2024-06-24 16:07:39 +08:00
Yi Zhang
5b5ce0bfb0
Add UsePython Task in Nuget Publish workflow (#21144)
### Description
Otherwise it would fail in 

b95982e588/tools/ci_build/github/azure-pipelines/publish-nuget.yml (L78-L81)



### Motivation and Context
The Windows CPU image has been migrated to a managed image.


### Verification Link
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1313
2024-06-24 13:36:13 +08:00
Dmitri Smirnov
b95982e588
Fix 2D detection bug (#21128)
### Description
Should compare two leading dims for 1.f

### Motivation and Context
Vulnerability scanner
2024-06-21 13:58:21 -07:00
Dwayne Robinson
ac21626725
DML EP EinSum make more generic to avoid EP fallback (#21114)
### Problem
Newer models using more novel equations (e.g. `bhwc,hkc->bhwk` in
Segment Anything's encoder or `bqc,bchw->bqhw`) cause fallback from DML
to CPU, yielding performance issues. The EP had some pattern matching to
map more common equations to existing DML operators, but the number of
permutations was prohibitive and could not catch them all.

### Solution
So, ditch the static mapping, and instead handle any 1-input or 2-input
cases via remapped strides and a mini-graph of elementwise
multiplication & sum reduction (as if DML had a
`DML_OPERATOR_DOT_PRODUCT` that took `axes`). A subset of mappings still
exist for performance (GEMM, pure reduction, transpose...), but they are
identified generally rather than via a pattern table. Also...

- Diagonals are supported now (e.g. iji->i).
- Removes any remaining DML-specific EinSum `GTEST_SKIP` statements.
- Handles any cases up to 8 unique labels (DML dimension limit is 8D).
- \>= 3 inputs and arbitrary size inputs via ellipsis are not handled,
but we have yet to come across a model.
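
The "elementwise multiplication & sum reduction" idea above can be sketched in plain Python. This is only an illustration of the math, not the DML EP implementation: `naive_einsum`, `shape_of` and `get` are hypothetical helper names, operands are nested lists, and ellipsis is not handled.

```python
from itertools import product


def shape_of(t):
    """Return the shape of a rectangular nested list (scalars have shape [])."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return shape


def get(t, idx):
    """Index a nested list with a sequence of integer indices."""
    for i in idx:
        t = t[i]
    return t


def naive_einsum(equation, *operands):
    """Evaluate an explicit einsum equation as multiply + sum-reduce."""
    lhs, out_labels = equation.split("->")
    in_labels = lhs.split(",")

    # Every label gets one size; a repeated label (diagonal) must agree.
    sizes = {}
    for labels, op in zip(in_labels, operands):
        for label, dim in zip(labels, shape_of(op)):
            if sizes.setdefault(label, dim) != dim:
                raise ValueError(f"size mismatch for label {label!r}")

    # Labels missing from the output are summed over.
    reduced = [label for label in sizes if label not in out_labels]

    def element(out_idx):
        env = dict(zip(out_labels, out_idx))
        total = 0
        for red_idx in product(*(range(sizes[label]) for label in reduced)):
            env.update(zip(reduced, red_idx))
            term = 1
            for labels, op in zip(in_labels, operands):
                term *= get(op, [env[label] for label in labels])
            total += term
        return total

    def build(prefix, labels):
        if not labels:
            return element(prefix)
        return [build(prefix + [i], labels[1:]) for i in range(sizes[labels[0]])]

    return build([], out_labels)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
matmul = naive_einsum("ij,jk->ik", a, b)  # matrix multiplication
diagonal = naive_einsum("ii->i", a)       # diagonal extraction
```

Because every equation goes through the same multiply-and-reduce loop, no pattern table is needed; a real backend would realize the same thing with strides rather than Python loops.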
2024-06-21 11:46:16 -07:00
Caroline Zhu
6236707c64
Enable >2GB models + allow model paths to be passed for generate_artifacts API (#20958)
### Description
Alternative design from #20942 

Allow users to pass in a model path for the generate_artifacts API. 

### Motivation and Context
- ONNX API calls such as the onnx checker + shape inference fail when
given a model > 2GB, but work if a path to a model >2GB is passed in.
2024-06-21 09:55:26 -07:00
RuomeiMS
7cf9263ee7
Add changes for strided calibration (#20949)
Context and motivation:
When quantizing large transformer models, we ran into OOM issues as the
number of calibration samples went up. To resolve this, this PR adds
support for reading quantization data in chunks, calculating ranges for
intermediate tensors per chunk, then accumulating the results into the
final ranges.
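
A minimal sketch of the chunk-then-accumulate idea, assuming a simple min/max calibrator (the PR may use a different range method). The function names and the "sample is a dict of tensor name to list of floats" layout are hypothetical, chosen only for illustration.

```python
def iter_chunks(samples, chunk_size):
    """Yield consecutive slices of at most chunk_size samples."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def merge_ranges(total, partial):
    """Fold per-chunk (min, max) ranges into the running totals."""
    for name, (lo, hi) in partial.items():
        if name in total:
            old_lo, old_hi = total[name]
            total[name] = (min(lo, old_lo), max(hi, old_hi))
        else:
            total[name] = (lo, hi)
    return total


def chunk_ranges(chunk):
    """Compute (min, max) per tensor name for one chunk of samples."""
    ranges = {}
    for sample in chunk:
        per_sample = {name: (min(v), max(v)) for name, v in sample.items()}
        merge_ranges(ranges, per_sample)
    return ranges


def calibrate_in_chunks(samples, chunk_size):
    """Only one chunk's intermediate data is processed at a time."""
    total = {}
    for chunk in iter_chunks(samples, chunk_size):
        merge_ranges(total, chunk_ranges(chunk))
    return total


samples = [{"t": [1.0, -2.0]}, {"t": [3.0, 0.5]}, {"t": [-5.0, 2.0]}]
final_ranges = calibrate_in_chunks(samples, chunk_size=2)
```

Min and max are associative, so merging per-chunk ranges gives the same result as a single pass over all samples while keeping peak memory bounded by the chunk size.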
2024-06-21 08:23:23 -07:00
Changming Sun
f5625b8858
Revert "[MIGraphX EP] enable compilation and execution on Windows (21084)" (#21132)
### Description

This reverts commit 1d7bf56947 because it
broke the AMD GPU CI pipeline. Sorry, when I reviewed the PR I forgot to
run the AMD GPU CI pipeline.

We will revert the PR first, then ask the author to fix the issue.
2024-06-21 01:01:07 -07:00
Yi Zhang
69d522f4e9
[Fix] use cmdline in Final Jar Testing Stage for new managed Windows Image (#21130)
### Description
There is no bash command in the managed Windows image.
Use a CmdLine step instead.



### Verified Link

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=491902&view=logs&j=f1f8e11e-a9fa-53e5-cd29-3ba2c1988550
2024-06-21 12:41:06 +08:00
Jake Mathern
b9eb1dc21e
Update protobuf_cmake.patch to allow extra disablements configurable by projects that build ORT (#20875)
### Description
Update protobuf_cmake.patch to allow extra disablements. ORT repo
already patches protobuf to not disable the warning 4996.


### Motivation and Context

To meet SDL requirements, Microsoft repos have to fail the build if
there is a warning 4996.
Binskim also gives errors if warning 4996 is disabled.
We can suppress the Binskim issues, but we need a way to disable the
warnings for the minimal set of code that has them.
Right now, WindowsAI disables 4996 for the entirety of ORT, but it should
only be disabled for protobuf.
2024-06-20 16:28:15 -07:00
Ted Themistokleous
1d7bf56947
[MIGraphX EP] enable compilation and execution on Windows (#36) (#21084) 2024-06-20 16:21:11 -07:00
Changming Sun
efcaa835b1
Update generate_nuspec_for_native_nuget.py for training (#21112)
### Description
Similar to #21096 , but this one is for ORT training nuget package.
2024-06-20 16:13:31 -07:00
Yi-Hong Lyu
00c713088d
Adopt QDQFinalCleanupTransformer for Q->DQs/DQ->Qs cases (#21018)
### Description



### Motivation and Context
2024-06-20 11:21:32 -07:00
Wanming Lin
0c80cd2157
[WebNN EP] Update Prelu restriction for CPU backend (#20878) 2024-06-20 11:04:01 -07:00
ivberg
55f7f9d7a9
Fix Crash When Enabling and Disabling ETW with Old Callbacks (#21086)
### Description
We received a crash report that occurs under certain conditions when ETW
is enabled and disabled continuously.
This change allows ETW callbacks to be deregistered in the class destructor.
Related to #20537

### Motivation and Context
Fixes crash

### Callstack
We see it crash in
[0x0]
onnxruntime!<lambda_967a738fca8512372f170fcaf2d094d4>::operator()+0x34
0x12941ff570 0x7ffa994f0a04

[0x1] onnxruntime!std::_Func_class<void,_GUID const *,unsigned
long,unsigned char,unsigned __int64,unsigned
__int64,_EVENT_FILTER_DESCRIPTOR *,void *>::operator()+0x54 0x12941ff7b0
0x7ffa994f0d64

[0x2]
onnxruntime!onnxruntime::logging::EtwRegistrationManager::InvokeCallbacks+0xcc
0x12941ff7b0 0x7ffa994f0d64

[0x3]
onnxruntime!onnxruntime::logging::EtwRegistrationManager::ORT_TL_EtwEnableCallback+0x94
0x12941ff860 0x7ffa98d19628
 

and it seems to us that the `this` pointer captured in
etwRegistrationManager.RegisterInternalCallback(
      [&etwRegistrationManager, this](
...
is no longer valid when the callback is called.
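
The register-then-deregister pattern behind the fix can be sketched as follows. This is a Python analogue, not the ORT C++ code: `CallbackRegistry` and `Session` are hypothetical names, and `close()` stands in for the C++ destructor (in Python the registry's reference would keep the object alive rather than crash, but a stale callback would still fire).

```python
class CallbackRegistry:
    """Holds callbacks keyed by a token so each one can be removed later."""

    def __init__(self):
        self._callbacks = {}
        self._next_token = 0

    def register(self, fn):
        token = self._next_token
        self._next_token += 1
        self._callbacks[token] = fn
        return token

    def unregister(self, token):
        # Removing an unknown token is a no-op, so teardown is idempotent.
        self._callbacks.pop(token, None)

    def invoke(self, *args):
        # Copy first so callbacks may unregister themselves while running.
        for fn in list(self._callbacks.values()):
            fn(*args)


class Session:
    """Registers a callback bound to itself and removes it on teardown."""

    def __init__(self, registry):
        self._registry = registry
        self.events = []
        self._token = registry.register(self._on_event)

    def _on_event(self, enabled):
        self.events.append(enabled)

    def close(self):
        if self._token is not None:
            self._registry.unregister(self._token)
            self._token = None


registry = CallbackRegistry()
session = Session(registry)
registry.invoke(True)   # delivered to the live session
session.close()
registry.invoke(False)  # no longer delivered after teardown
```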
2024-06-20 06:45:45 -07:00
Changming Sun
bd3a9ee99d
Add UsePythonVersion (#21109)
### Description
The machine has multiple python installations and none of them is in
PATH. Therefore we should explicitly set python version via this task to
avoid having surprises.

### Motivation and Context
Similar to #21095
2024-06-19 20:47:21 -07:00
Changming Sun
27f3ac78d4
Delete RoslynAnalyzers (#21104)
### Description
Delete RoslynAnalyzers. Use CodeQL instead.


### Motivation and Context
We already have CodeQL, which is modern and also covers C# code. The
RoslynAnalyzers one is not in our pull request pipelines. The
"RoslynAnalyzers@2" task is outdated and needs to be upgraded. I will
delete it for now since we already have CodeQL.
2024-06-19 20:11:15 -07:00
Chi Lo
e737547862
Add support for INT64 types in TensorRT constant layer calibration (#21101)
This PR is a duplicate of
https://github.com/microsoft/onnxruntime/pull/21041.
It was created in case the original one can't be updated in time for the
patch release.
2024-06-19 20:36:26 -05:00
Jing Fang
6817b013b9
[MLAS] add q4 quantize and transpose kernel to support MatMulNBits QDQ fuse (#21054)
### Description

1. added kernel to quantize matmul B tensor to q4, and store in the same
shape as original tensor. scales and zero points are calculated as well.
scales and zero points have the same shape.
2. added kernel to transpose q4 B tensor to B tensor in MatMulNBits.
Scales and zero points are transposed as well.

#### Benchmark
<1024 x 4096 input, 64 quant block, 8 threads>: 
 - quantize: 23035923 ns
 - transpose: 718635 ns

<1024 x 4095 input, 64 quant block, 8 threads>: 
 - quantize: 26759319 ns
 - transpose: 1279064 ns

### Motivation and Context
The MatMulNBits tool chain currently only supports converting a MatMul op
directly to a MatMulNBits op. MatMulNBits is not an ONNX standard op.
Therefore, we need the tool chain to support converting MatMul to Q/DQ
format, and later, in the transform step, converting DQ + MatMul to
MatMulNBits. The tensors stored in DQ are the quantized constants and
will be stored in the MatMulNBits op.
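
For readers unfamiliar with blockwise 4-bit quantization, here is a rough sketch of the general technique: one scale and one zero point per block of values. It is a generic asymmetric scheme written for illustration, not the MLAS kernel; all function names are hypothetical, and nibble packing and the transpose step are omitted.

```python
def quantize_block_q4(block):
    """Quantize one block of floats to integers in [0, 15]."""
    # Extend the range to include zero so that 0.0 is representable.
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    scale = (hi - lo) / 15.0 if hi > lo else 1.0
    zero_point = min(15, max(0, round(-lo / scale)))
    q = [min(15, max(0, round(x / scale) + zero_point)) for x in block]
    return q, scale, zero_point


def dequantize_block_q4(q, scale, zero_point):
    """Map 4-bit integers back to approximate float values."""
    return [(v - zero_point) * scale for v in q]


def quantize_q4(values, block_size=64):
    """Quantize a flat list block by block; the last block may be partial."""
    blocks = []
    for start in range(0, len(values), block_size):
        blocks.append(quantize_block_q4(values[start:start + block_size]))
    return blocks


weights = [-1.0, 0.0, 2.0, 0.25, -0.5]
blocks = quantize_q4(weights, block_size=2)
restored = [x for q, s, zp in blocks for x in dequantize_block_q4(q, s, zp)]
```

The 1024 x 4095 benchmark above is a case where the inner dimension is not a multiple of the block size; in this sketch such inputs simply end with a shorter final block.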
2024-06-19 17:15:45 -07:00
Jian Chen
8448f31d90
change is_pod to is_trivial (#21071)
### Description
change is_pod to is_trivial



### Motivation and Context
This is commonly needed for both the Linux and Windows C++20 upgrades.
is_trivial was introduced back in C++11.
2024-06-19 16:23:47 -07:00
Changming Sun
be423747b1
Delete pyop (#21094)
### Description
Remove the "--enable_language_interop_ops" build flag, because the code
is incompatible with the latest numpy, and the build flag is not used
anywhere except a macOS CI pipeline. It does not seem to have a ship
plan.


### Motivation and Context
The build error was:
```
onnxruntime/core/language_interop_ops/pyop/pyop.cc:122:85: error: no member named 'elsize' in '_PyArray_Descr'
                                  static_cast<int64_t>(PyArray_DescrFromType(type)->elsize),
                                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~  ^
```
2024-06-19 16:21:33 -07:00
Clément Péron
8ab8e649a7
tools: build: fix typo (#21052)
### Description
Typo in the python build script
2024-06-19 16:14:58 -07:00
Changming Sun
8b9656717b
Fix a perm issue in Windows Static Analysis pipeline (#21100)
### Description
Due to a security setting change, we now need to explicitly set the
permissions. I forgot to do that when bringing the old change back.

### Motivation and Context
Currently the pipeline cannot publish scanning results to GitHub.
2024-06-19 14:44:39 -07:00
Adrian Lizarraga
3ae5df1d18
[QNN EP] Update QNN SDK to 2.23.0 (#21008)
### Description
- Updates CI pipelines to use QNN SDK 2.23.0 by default.
- QNN SDK adds support for int64 Cast. This allows QNN EP to support
ONNX ArgMax/ArgMin/TopK operators that generate an int64 graph output.

Example translation of ArgMax:
- **ONNX**:    input --> ArgMax --> output (int64)
- **QNN**: input --> ArgMax --> Cast (int32 to int64) --> output (int64)

### Motivation and Context
Update onnxruntime to use the latest QNN SDK.
2024-06-19 12:37:42 -07:00
Jian Chen
6a0d64e65c
Component Gov round 7 (#21051)
### Description
ignoreDirectories does not recursively include sub folders like we
thought it would. We need to add additional sub folders.



### Motivation and Context
Fix CG :
1.
https://aiinfra.visualstudio.com/Lotus/_componentGovernance/218239/alert/11474679?typeId=25427568
2.
https://aiinfra.visualstudio.com/Lotus/_componentGovernance/218239/alert/11475140?typeId=25421034&pipelinesTrackingFilter=0
2024-06-19 11:07:02 -07:00
Tianlei Wu
769d379c63
Refactor MultiHeadAttention cpu op (#21055)
Refactoring of MultiHeadAttention op
- [x] Add some checking for cross attention of pass_past_in_kv to make
sure there is no kv cache and bias.
- [x] Update interface of PackVIntoRotaryQKV so that it can be used by
SparseAttention later.
- [x] Add test cases

### Motivation and Context
To prepare the pull request for SparseAttention cpu op.
2024-06-19 10:23:26 -07:00
Xu Xing
c3076721f3
[js/webgpu] Support conv3d naive (#20706)
### Description



### Motivation and Context
2024-06-19 10:13:50 -07:00
Tianlei Wu
01279d8896
[ROCM] Exclude flash attention from hipify (#21091)
Exclude flash attention sub-directory from hipify.
2024-06-19 08:59:10 -07:00
Scott McKay
6e742c426e
Update nuget package generation script entries for .net8 MAUI (#21096)
### Description
Remove xamarin related entries.
Update MAUI entries to net8
Remove macos entries (not required by MAUI)

### Motivation and Context
Updates missed from #21062
2024-06-19 21:10:22 +08:00
Yi Zhang
cc3168bcbb
Add UsePython task in Nuget_Packaging_CPU stage (#21095)
### Description
supplement of https://github.com/microsoft/onnxruntime/pull/21062



### Motivation and Context
2024-06-19 21:09:37 +08:00
Peishen Yan
50b49642d5
[WebNN EP] Update triangular_op_builder.cc (#20994)
As a follow-up of https://github.com/microsoft/onnxruntime/pull/20730
2024-06-19 03:28:34 -07:00
Wanming Lin
40879a2623
[WebNN EP] Enable Cast op for WebNN CPU backend (#20864)
WebNN TFLite backend supports `cast` op but doesn't support casting to
`uint64` data type.
2024-06-19 01:51:19 -07:00
Wanming Lin
35c430a95a
[WebNN EP] Enable several ops for WebNN CPU backend (#20847)
The WebNN CPU implementation has been migrated from XNNPack to TFLite,
which supports more ops. As a first step, turn on the subset of `cpu`
supported ops that only need a change from `false` to `true`.
2024-06-19 01:45:31 -07:00
Scott McKay
5fc60f36f2
Update to the net8 MAUI targets. Remove Xamarin. (#21062)
### Description
Xamarin is EOL so remove support.
The MAUI targets are EOL and need updating.
https://dotnet.microsoft.com/en-us/platform/support/policy/maui

Other cleanups:
- netcoreapp3.1 is EOL
- the net6 macos target was added in the mistaken belief that it was for
MAUI mac support, but that is actually via the mac-catalyst target which
we recently added support for.
- some CIs were still using the old build setup of splitting pre-net6
targets. The ORT C# bindings csproj was updated last year and the
`PreNet6` and `SelectedTargets` properties no longer exist as they were
replaced by the simpler `IncludeMobileTargets` property.

### Motivation and Context
Remove EOL components.
#21058
2024-06-19 16:20:58 +10:00
Jian Chen
1ad2c0a4b2
fix Window_CI in Github Action (#21070)
### Description
fix Window_CI in Github Action
2024-06-18 23:14:08 -07:00
cloudhan
ddd4ce3cb7
[ROCm] Update ck to use ck_tile (#21030) 2024-06-19 14:06:10 +08:00
Yi Zhang
5a0e5237f5
Fix onebranch exception in code signing (#21088)
### Description
Fix regression caused by
https://github.com/microsoft/onnxruntime/pull/20995



### Motivation and Context
2024-06-19 12:07:17 +08:00
Yulong Wang
5e81fa8aec
[js] fix vulnerability CVE-2024-4068: upgrade braces to 3.0.3 (#21078)
### Description

Upgrade `braces` to 3.0.3

[CVE-2024-4068](https://github.com/advisories/GHSA-grv7-fg5c-xmjg)

```
# npm audit report

braces  <3.0.3
Severity: high
Uncontrolled resource consumption in braces - https://github.com/advisories/GHSA-grv7-fg5c-xmjg
fix available via `npm audit fix`
node_modules/braces

1 high severity vulnerability
```
2024-06-18 16:02:08 -07:00
Changming Sun
ffb8e8eb0e
Update build.py: add a comment (#20993)
### Description
Update build.py: add a comment


### Motivation and Context
See the comment.
2024-06-18 13:52:34 -07:00
Yulong Wang
631a2c16be
[js/web] skip default locateFile() when dynamic import is disabled (#21073)
### Description
skip default `locateFile()` when dynamic import is disabled. This allows
the file to work with bundlers to load WebAssembly file correctly if
`env.wasm.wasmPaths` is not set.
2024-06-18 12:21:45 -07:00
Changming Sun
b75b2fcdcb
Add MSVC static analyzer back (#21056)
### Description
Add MSVC static analyzer back. Previously it had a stability issue. It was deleted in #17522 .

### Motivation and Context
2024-06-18 12:10:11 -07:00
Yang Gu
1473d66a00
[js/webgpu] Prefer adapter.info to adapter.requestAdapterInfo (#21065)
WebGPU is deprecating async adapter.requestAdapterInfo, and replacing it
with sync adapter.info.
Spec change: https://github.com/gpuweb/gpuweb/pull/4662
2024-06-18 12:02:38 -07:00
Ted Themistokleous
dadd0c451a
[MIGraphX EP] Fix MIGraphX mixed precision run input parameters (#20982)
See #20643

### Description

Changes the order in which we perform quantization to better support
mixed precision, and fixes a bug where input parameters for int8
quantization were not being handled correctly.

We now perform int8 quantization first on a full precision input model,
before quantizing the model to fp16 for the remaining ops that aren't
quantized. The former order caused us to use a low precision input,
which could cause larger values than intended to be inserted into the
model when int8 quantization is performed. The symptom of this was a
failure during the quantization steps.

Similarly, input parameters were left uninitialized, resulting in a
similar failure during int8 quantization.

GPU faults were intermittent but present, as using uninitialized memory
created undefined behavior when we started testing more complex models
during mixed precision.

### Motivation and Context

In some cases we've seen random data and/or invalid values entering
compiled onnx graphs. This is due to input parameters to the MIGraphX
graph not being set correctly when mixed precision (int8 + fp16) is used,
and to the ordering of quantization steps causing a lower precision model
to be used to perform int8 quantization. In most cases the failure is
silent/intermittent. In some cases we've observed GPU faults due to out
of bounds values being set.

This change is required because when an input parameter to the MIGraphX
graph is initialized to a large random value and the next operator uses
it for indexing, we get undefined behavior and a GPU fault.
2024-06-18 11:18:13 +08:00
Yi Zhang
809cb26ace
Use A100 for LLama2 model test (#21068)
### Description




### Motivation and Context
2024-06-18 11:04:02 +08:00
Changming Sun
9ef4f1b789
Update pybind11 (#21072)
### Description
Upgrade pybind11 to the latest as suggested by @gnought in #21063

### Motivation and Context
Recently numpy released a new version, which caused compatibility issue
between the latest numpy version and the latest ONNX Runtime version.
2024-06-17 19:50:57 -07:00
Scott McKay
159fe9d4f3
Update to mobile model usability checker (#19843)
### Description

- Add check for CoreML MLProgram supported ops
- Only check usability with ORT Mobile package if requested
  - this package will be deprecated so info is a) of minimal value and b)
    can be confusing.
- Output more things at INFO level
  - a lot of meaningful info was only output at DEBUG level. The default
    INFO level is more useful
  - dump full partition info at DEBUG level
- Check subgraphs fully
  - CoreML can handle a subgraph
  - TBD if we want to add support for adding a subgraph to the parent
    graph for Loop and If nodes
    - most likely will be required for simple If nodes to be performant
- Check 5D CoreML limitation

### Motivation and Context
Improve helper tools

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-06-18 07:50:33 +10:00
Nikolai Svakhin
7b3fff650a
Updated build script for CUDA case (#20987)
### Description

In the CUDA case, use the cuda_home variable to set CMake's CUDA compiler
to the correct version of NVCC.

Otherwise, an NVCC from the current PATH would be picked up, which could
be from a different version of CUDA.
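
As a rough sketch of the idea (not the actual build.py change), the compiler path can be derived from the CUDA home directory and passed to CMake through its standard `CMAKE_CUDA_COMPILER` variable. The helper name `cuda_compiler_cmake_arg` and the example path are hypothetical.

```python
import os


def cuda_compiler_cmake_arg(cuda_home, is_windows=False):
    """Build the CMake definition that pins nvcc to the given CUDA install."""
    nvcc_name = "nvcc.exe" if is_windows else "nvcc"
    nvcc_path = os.path.join(cuda_home, "bin", nvcc_name)
    return f"-DCMAKE_CUDA_COMPILER={nvcc_path}"


# Hypothetical location of a CUDA toolkit unpacked outside the default install.
cmake_arg = cuda_compiler_cmake_arg("/opt/cuda-12.5")
```

Pinning the compiler explicitly means the toolkit selected on the command line wins over whatever nvcc happens to be first in PATH.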


### Motivation and Context

In my case, the main CUDA installation was version 11.8.

I wanted to build against 12.5, so I downloaded and unpacked it into a
separate directory and passed it as a `--cuda-home` parameter. However,
the ONNX builder was still picking the NVCC compiler from 11.8.

This would fix the issue
https://github.com/microsoft/onnxruntime/issues/20928


cc @gedoensmax
2024-06-17 14:41:43 -07:00
Adrian Lizarraga
a6c18ae9df
[QNN EP] Add quantization axis checks for Conv/ConvTranspose/Q/DQ ops (#21016)
### Description
Updates QNN EP to reject Conv/ConvTranspose/Q/DQ ops with unsupported
quantization axis values.



### Motivation and Context
Allows these unsupported operators to be handled by the CPU EP.

Fixes errors like the following:

> Node 'ConvTranspose' OpType:ConvTranspose with
domain:com.ms.internal.nhwc was inserted using the NHWC format as
requested by QNNExecutionProvider, but was not selected by that EP. This
means the graph is now invalid as there will not be an EP able to run
the node. This could be a bug in layout transformer, or in the
GetCapability implementation of the EP.

---------

Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>
2024-06-17 09:46:14 -07:00
Xavier Dupré
c501c6ffaf
Rename a misspelled filename in the documentation (#21066)
### Description
Rename a file in the documentation
2024-06-17 18:18:41 +02:00
Wanming Lin
bbb6dbf6d2
[WebNN EP] Update data type constraints for Reduction ops (#20912)
WebNN Spec adds missing 64-bit integers support for `reduceL1`,
`reduceSum`, `reduceSumSquare` and `reduceProduct` ops at this
[PR](https://github.com/webmachinelearning/webnn/pull/695), which has
already been implemented in Chromium. Update corresponding data type
constraints in WebNN EP.

Besides, WebNN CPU backend currently doesn't support `uint64` and
`uint32` for these ops.
2024-06-17 08:46:18 -07:00
Frank Dong
8aa2667ae6
add bf16 for Tile CUDA executor (#20854)
### Description
add bf16 for Tile CUDA executor



### Motivation and Context
required change to support phimm model for ORT training
2024-06-17 05:52:13 -07:00