Commit graph

11272 commits

Author SHA1 Message Date
mindest
eecc11afc7
[ROCm] Disable ck_tile in Debug build (#21178)
### Description
tmp fix: disable ck_tile for Debug build.



### Motivation and Context
Release build works fine for ck_tile, while Debug build fails.
<details>
<summary> Typical error log to revisit
</summary>

```
[880/1797] Building HIP object CMakeFiles/onnxruntime_composable_kernel_fmha.dir/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vc_psddv.cpp.o
FAILED: CMakeFiles/onnxruntime_composable_kernel_fmha.dir/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vc_psddv.cpp.o 
/opt/rocm/llvm/bin/clang++ -DEIGEN_MPL2_ONLY -DENABLE_ROCM_PROFILING -DENABLE_STRIDED_TENSORS -DENABLE_TRAINING -DENABLE_TRAINING_APIS -DENABLE_TRAINING_CORE -DENABLE_TRAINING_OPS -DENABLE_TRAINING_TORCH_INTEROP -DMIOPEN_VERSION=30100 -DORT_ENABLE_STREAM -DROCM_VERSION=60100 -DUSE_ROCM=1 -D_GNU_SOURCE -D__HIP_ROCclr__=1 -D__bf16__ -D__fp16__ -D__fp32__ -I/build/Debug/_deps/utf8_range-src -I/ws/onnxruntime/include/onnxruntime -I/ws/onnxruntime/include/onnxruntime/core/session -I/ws/onnxruntime/orttraining/orttraining/training_api/include -I/build/Debug/_deps/composable_kernel-src/example/ck_tile/01_fmha -I/build/Debug/_deps/composable_kernel-src/include -I/build/Debug/_deps/composable_kernel-build/include -I/build/Debug/_deps/composable_kernel-src/library/include -isystem /opt/rocm-6.1.0/include -g -O -std=gnu++17 --offload-arch=gfx90a -fPIC -x hip -mllvm=-amdgpu-early-inline-all=true -mllvm=-amdgpu-function-calls=false -MD -MT CMakeFiles/onnxruntime_composable_kernel_fmha.dir/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vc_psddv.cpp.o -MF CMakeFiles/onnxruntime_composable_kernel_fmha.dir/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vc_psddv.cpp.o.d -o CMakeFiles/onnxruntime_composable_kernel_fmha.dir/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vc_psddv.cpp.o -x hip -c /build/Debug/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vc_psddv.cpp
In file included from /build/Debug/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vc_psddv.cpp:5:
In file included from /build/Debug/_deps/composable_kernel-src/example/ck_tile/01_fmha/fmha_fwd.hpp:6:
In file included from /build/Debug/_deps/composable_kernel-src/include/ck_tile/core.hpp:11:
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
   27 |     asm volatile("s_add_u32 m0, %0, m0" : : "n"(v) : "memory");
      |                  ^
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
/build/Debug/_deps/composable_kernel-src/include/ck_tile/core/arch/utility.hpp:27:18: error: constraint 'n' expects an integer constant expression
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx90a.
...
```
</details>
2024-06-27 12:04:17 +08:00
Scott McKay
887a818aa7
Check for unit test log severity override earlier (#21177)
### Description
<!-- Describe your changes. -->
Setting the log level after environment creation is too late in some
cases.

If the DML EP is enabled, it will create a composite sink with the
original logger using the creation time log severity, as well as
additional ETW sink. As it saves the current severity levels for each
sink inside the composite sink that prevents being able to get verbose
log output to stdout even if you set that at the session level.

I don't know enough about the setup that combines ETW with the original
sink to say whether we should also be updating the severity of
individual sinks in the combined sink, so this change is limited to
making the unit tests behave in the expected manner when the default log
severity is set in the background and not directly controlled.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Make it possible to get verbose output to stdout when the DML EP is
enabled.
2024-06-27 12:51:13 +10:00
Vincent Wang
3c0b407709
Rollback 19832, Remove shape_input_merge Fusion (#21179)
The PR caused Big Models pipeline failure for running Llama2. After the
rollback, the pipeline is back to normal.
2024-06-26 10:00:45 -07:00
Scott McKay
337cc56d6f
Convert scalars to 1D to satisfy ML Program requirements. (#21159)
### Description
<!-- Describe your changes. -->
Convert scalars to 1D to satisfy ML Program requirements.


https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1418617&view=logs&j=f7cc61a9-cc70-56e7-b06c-4668ca17e426&t=16d281b5-1bfd-5309-f274-36d0dffd9cb1&l=27167

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fixes test failure in #17361
2024-06-26 09:54:36 -07:00
mindest
e2abba18ea
Skip softmax BF16 test for ROCm (#21162)
### Description

Skip softmax BF16 test for ROCm, because BFloat16 is unsupported by
MIOpen, and `torch.cuda.is_available()` also returns `True` for ROCm.
2024-06-26 11:15:50 +08:00
Wanming Lin
41ad83fb00
[WebNN EP] Support rest Reduction ops for TFLite backend (#21135)
- reduceLogSum, reduceLogSumExp and reduceSumSquare have been landed in
https://chromium-review.googlesource.com/c/chromium/src/+/5575815
- reduceL1 and reduceL2 have been landed in
https://chromium-review.googlesource.com/c/chromium/src/+/5606091
2024-06-25 18:30:55 -07:00
Wanming Lin
4743803944
[WebNN EP] Support more Normalization ops for TFLite backend (#21151)
Following Normalization ops have been supported in Chromium for TFLite
backend:
- batchNormalization:
https://chromium-review.googlesource.com/c/chromium/src/+/5532745
- layerNormalization:
https://chromium-review.googlesource.com/c/chromium/src/+/5573326
- instanceNormalization:
https://chromium-review.googlesource.com/c/chromium/src/+/5532750
2024-06-24 19:04:23 -07:00
Jian Chen
f81c0ec32a
Remove warning suppression from Java Packaging pipeline. (#21010)
### Description
Remove warning suppression from Java Packaging pipeline.


### Motivation and Context
We want the CI step not to produce warning.
2024-06-24 16:46:21 -07:00
mindest
adaf0e8116
[Fix] USE_NCCL -> ORT_USE_NCCL (#21136)
### Description
Correct the macro used when NCCL enabled.
2024-06-24 11:33:17 -07:00
Wanming Lin
3a917e49fb
[WebNN EP] Support 4 more ops for TFLite backend (#21134)
Recently WebNN TFLite backend supports gelu, expand, softsign,
reciprocal.
2024-06-24 09:52:12 -07:00
aciddelgado
ebd0368bb0
Make Flash Attention work on Windows (#21015)
### Description
Previously, Flash Attention only worked on Linux systems. This PR will
make it work and enable it to be built and run on Windows.

Limitations of Flash Attention in Windows: Requires CUDA 12.

### Motivation and Context
This will significantly increase the performance of Windows-based LLM's
with hardware sm>=80.

To illustrate the improvement of Flash Attention over Memory Efficient
Attention, here are some average benchmark numbers for the GQA operator,
run with configurations based on several recent models (Llama, Mixtral,
Phi-3). The benchmarks were obtained on RTX4090 GPU using the test
script located at
(onnxruntime/test/python/transformers/benchmark_gqa_windows.py).

* Clarifying Note: These benchmarks are just for the GQA operator, not
the entire model.

### Memory Efficient Attention Kernel Benchmarks:
| Model Name | Max Sequence Length | Inference Interval (ms) |
Throughput (samples/second) |

|----------------------------------------|---------------------|-------------------------|-----------------------------|
| Llama3-8B (Average Prompt) | 8192 | 0.19790525 | 13105.63425 |
| Llama3-8B (Average Token) | 8192 | 0.207775538 | 12025.10172 |
| Llama3-70B (Average Prompt) | 8192 | 0.216049167 | 11563.31185 |
| Llama3-70B (Average Token) | 8192 | 0.209730731 | 12284.38149 |
| Mixtral-8x22B-v0.1 (Average Prompt) | 32768 | 0.371928785 |
7031.440056 |
| Mixtral-8x22B-v0.1 (Average Token) | 32768 | 0.2996659 | 7607.947159 |
| Phi-3-mini-128k (Average Prompt) | 131072 | 0.183195867 | 15542.0852 |
| Phi-3-mini-128k (Average Token) | 131072 | 0.198215688 | 12874.53494 |
| Phi-3-small-128k (Average Prompt) | 65536 | 2.9884929 | 2332.584142 |
| Phi-3-small-128k (Average Token) | 65536 | 0.845072406 | 2877.85822 |
| Phi-3-medium-128K (Average Prompt) | 32768 | 0.324974429 | 8094.909517
|
| Phi-3-medium-128K (Average Token) | 32768 | 0.263662567 | 8978.463687
|

### Flash Attention Kernel Benchmarks:
| Model Name | Max Sequence Length | Inference Interval (ms) |
Throughput (samples/second) |

|--------------------------------------|---------------------|-------------------------|-----------------------------|
| Llama3-8B (Average Prompt) | 8192 | 0.163566292 | 16213.69057 |
| Llama3-8B (Average Token) | 8192 | 0.161643692 | 16196.14715 |
| Llama3-70B (Average Prompt) | 8192 | 0.160510375 | 17448.67753 |
| Llama3-70B (Average Token) | 8192 | 0.169427308 | 14702.62043 |
| Mixtral-8x22B-v0.1 (Average Prompt) | 32768 | 0.164121964 |
15618.51301 |
| Mixtral-8x22B-v0.1 (Average Token) | 32768 | 0.1715865 | 14524.32273 |
| Phi-3-mini-128k (Average Prompt) | 131072 | 0.167527167 | 14576.725 |
| Phi-3-mini-128k (Average Token) | 131072 | 0.175940594 | 15762.051 |
| Phi-3-small-128k (Average Prompt) | 65536 | 0.162719733 | 17824.494 |
| Phi-3-small-128k (Average Token) | 65536 | 0.14977525 | 16749.19858 |
| Phi-3-medium-128K (Average Prompt) | 32768 | 0.156490786 | 17679.2513
|
| Phi-3-medium-128K (Average Token) | 32768 | 0.165333833 | 14932.26079
|

Flash Attention is consistently faster for every configuration we
benchmarked, with improvements in our trials ranging from ~20% to ~650%.

In addition to these improvements in performance, Flash Attention has
better memory usage. For example, Memory Efficient Attention cannot
handle a max sequence length higher than 32,768, but Flash Attention can
handle max sequence lengths at least as high as 131,072.

---------

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2024-06-24 09:43:49 -07:00
zhijiang
269d9b094f
Zhijxu/fix softmax cudnn bf16 (#21045)
if seq >2048, ort will fallback to cudnn version, while when dtype is
bf16, ort will throw exception, this PR trying to fix it.
2024-06-24 16:07:39 +08:00
Yi Zhang
5b5ce0bfb0
Add UsePython Task in Nuget Publish workflow (#21144)
### Description
Otherwise it would fail in 

b95982e588/tools/ci_build/github/azure-pipelines/publish-nuget.yml (L78-L81)



### Motivation and Context
The Windows CPU image is migrated  to managed image


### Verification Link
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1313
2024-06-24 13:36:13 +08:00
Dmitri Smirnov
b95982e588
Fix 2D detection bug (#21128)
### Description
Should compare two leading dims for 1.f

### Motivation and Context
Vulnerability scanner
2024-06-21 13:58:21 -07:00
Dwayne Robinson
ac21626725
DML EP EinSum make more generic to avoid EP fallback (#21114)
### Problem
Newer models using more novel equations (e.g. `bhwc,hkc->bhwk` in
Segment Anything's encoder or `bqc,bchw->bqhw`) cause fallback from DML
to CPU, yielding performance issues. The EP had some pattern matching to
map more common equations to existing DML operators, but the number of
permutations was prohibitive and could not catch them all.

### Solution
So, ditch the static mapping, and instead handle any 1-input or 2-input
cases via remapped strides and a mini-graph of elementwise
multiplication & sum reduction (as if DML had a
`DML_OPERATOR_DOT_PRODUCT` that took `axes`). A subset of mappings still
exist for performance (GEMM, pure reduction, transpose...), but they are
identified generally rather than via a pattern table. Also...

- Diagonals are supported now (e.g. iji->i).
- Removes any remaining DML-specific EinSum `GTEST_SKIP` statements.
- Handles any cases up to 8 unique labels (DML dimension limit is 8D).
- \>= 3 inputs and arbitrary size inputs via ellipsis are not handled,
but we have yet to come across a model.
2024-06-21 11:46:16 -07:00
Caroline Zhu
6236707c64
Enable >2GB models + allow model paths to be passed for generate_artifacts API (#20958)
### Description
Alternative design from #20942 

Allow users to pass in a model path for the generate_artifacts API. 

### Motivation and Context
- ONNX API calls such as the onnx checker + shape inference fail when
given a model > 2GB, but work if a path to a model >2GB is passed in.
2024-06-21 09:55:26 -07:00
RuomeiMS
7cf9263ee7
Add changes for strided calibration (#20949)
Context and motivation:
When quantizing large transformer models, we faced OOM issue when the
number of calibration samples goes up. To resolve this, in the PR we
want to add support for reading quantization data in chunck, calculating
ranges for intermediate tensors, then accumulating results for the final
ranges.
2024-06-21 08:23:23 -07:00
Changming Sun
f5625b8858
Revert "[MIGraphX EP] enable compilation and execution on Windows (21084)" (#21132)
### Description

This reverts commit 1d7bf56947 because it
broken the AMD GPU CI pipeline. Sorry when I reviewed the PR I forgot to
run the AMD GPU CI pipeline.

Will revert the PR first then ask the author to fix the issue.
2024-06-21 01:01:07 -07:00
Yi Zhang
69d522f4e9
[Fix] use cmdline in Final Jar Testing Stage for new managed Windows Image (#21130)
### Description
No bash command in Managed Windows image.
Use CmdlLine step instead.



### Verified Link

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=491902&view=logs&j=f1f8e11e-a9fa-53e5-cd29-3ba2c1988550
2024-06-21 12:41:06 +08:00
Jake Mathern
b9eb1dc21e
Update protobuf_cmake.patch to allow extra disablements configurable by projects that build ORT (#20875)
### Description
Update protobuf_cmake.patch to allow extra disablements. ORT repo
already patches protobuf to not disable the warning 4996.


### Motivation and Context

To meet SDL requirements, Microsoft repos have to fail build if there is
warning 4996
Binskim also gives errors if warning 4996 is disabled.
We can suppress the Binskim issues, but we need a way to disable the
warnings for the minimal set of code that has them.
Right now, WindowsAI disables 4996 for entirety of ORT, but it should
only be disabled for protobuf.
2024-06-20 16:28:15 -07:00
Ted Themistokleous
1d7bf56947
[MIGraphX EP] enable compilation and execution on Windows (#36) (#21084) 2024-06-20 16:21:11 -07:00
Changming Sun
efcaa835b1
Update generate_nuspec_for_native_nuget.py for training (#21112)
### Description
Similar to #21096 , but this one is for ORT training nuget package.
2024-06-20 16:13:31 -07:00
Yi-Hong Lyu
00c713088d
Adpot QDQFinalCleanupTransformer for Q->DQs/DQ->Qs cases (#21018)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-06-20 11:21:32 -07:00
Wanming Lin
0c80cd2157
[WebNN EP] Update Prelu restriction for CPU backend (#20878) 2024-06-20 11:04:01 -07:00
ivberg
55f7f9d7a9
Fix Crash When Enabling and Disabling ETW with Old Callbacks (#21086)
### Description
Under certain conditions with enabling & disabling ETW continuously, we
got a crash report.
Allows ETW callbacks to be de-registered upon class destructor.
Related to #20537

### Motivation and Context
Fixes crash

### Callstack
We see it crash in
[0x0]
onnxruntime!<lambda_967a738fca8512372f170fcaf2d094d4>::operator()+0x34
0x12941ff570 0x7ffa994f0a04

[0x1] onnxruntime!std::_Func_class<void,_GUID const *,unsigned
long,unsigned char,unsigned __int64,unsigned
__int64,_EVENT_FILTER_DESCRIPTOR *,void *>::operator()+0x54 0x12941ff7b0
0x7ffa994f0d64

[0x2]
onnxruntime!onnxruntime::logging::EtwRegistrationManager::InvokeCallbacks+0xcc
0x12941ff7b0 0x7ffa994f0d64

[0x3]
onnxruntime!onnxruntime::logging::EtwRegistrationManager::ORT_TL_EtwEnableCallback+0x94
0x12941ff860 0x7ffa98d19628
 

and seems to us that the this pointer captured in 
etwRegistrationManager.RegisterInternalCallback(
      [&etwRegistrationManager, this](
...
is no longer valid when the callback is called.
2024-06-20 06:45:45 -07:00
Changming Sun
bd3a9ee99d
Add UsePythonVersion (#21109)
### Description
The machine has multiple python installations and none of them is in
PATH. Therefore we should explicitly set python version via this task to
avoid having surprises.

### Motivation and Context
Similar to #21095
2024-06-19 20:47:21 -07:00
Changming Sun
27f3ac78d4
Delete RoslynAnalyzers (#21104)
### Description
Delete RoslynAnalyzers. Use CodeQL instead.


### Motivation and Context
Now we already have CodeQL which is modern and also covers C# code. The
RoslynAnalyzers one is not in our pull request pipelines. The
"RoslynAnalyzers@2" task is outdated and needs be upgraded. I will
delete it for now since we already have CodeQL.
2024-06-19 20:11:15 -07:00
Chi Lo
e737547862
Add support for INT64 types in TensorRT constant layer calibration (#21101)
This PR is a duplicate of the
https://github.com/microsoft/onnxruntime/pull/21041
Create this PR in case the original one can't be updated for patch
release timeline.
2024-06-19 20:36:26 -05:00
Jing Fang
6817b013b9
[MLAS] add q4 quantize and transpose kernel to support MatMulNBits QDQ fuse (#21054)
### Description

1. added kernel to quantize matmul B tensor to q4, and store in the same
shape as original tensor. scales and zero points are calculated as well.
scales and zero points have the same shape.
2. added kernel to transpose q4 B tensor to B tensor in MatMulNBits.
Scales and zero points are transposed as well.

####
Benchmark
<1024 x 4096 input, 64 quant block, 8 threads>: 
 - quantize: 23035923 ns
 - transpose: 718635 ns

<1024 x 4095 input, 64 quant block, 8 threads>: 
 - quantize: 26759319 ns
 - transpose: 1279064 ns

### Motivation and Context
The MatMulNbits tool chain current only supports converting a MatMul op
direct to MatMulNBits op. MatMulNbits op is not an ONNX standard op.
Therefore, we need the tool chain to support converting MatMul to Q/DQ
format, and later in the transform step converts DQ + MatMul to
MatMulNBits. The tensors stored in DQ are the quantized constants and
will be stored in the MatMulNBits.
2024-06-19 17:15:45 -07:00
Jian Chen
8448f31d90
change is_pod tp is_trivial (#21071)
### Description
change is_pod tp is_trivial



### Motivation and Context
This is commonnly needed for both linux and win c++20 upgrade.
is_trivial was introduced backed in C++11
2024-06-19 16:23:47 -07:00
Changming Sun
be423747b1
Delete pyop (#21094)
### Description
Remove the "--enable_language_interop_ops" build flag, because the code
is incompatible with the latest numpy, and the build flag is not used
anywhere except a macOS CI pipeline. It does not seem to have a ship
plan.


### Motivation and Context
The build error was:
```
onnxruntime/core/language_interop_ops/pyop/pyop.cc:122:85: error: no member named 'elsize' in '_PyArray_Descr'
                                  static_cast<int64_t>(PyArray_DescrFromType(type)->elsize),
                                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~  ^
```
2024-06-19 16:21:33 -07:00
Clément Péron
8ab8e649a7
tools: build: fix typo (#21052)
### Description
Typo in the python build script
2024-06-19 16:14:58 -07:00
Changming Sun
8b9656717b
Fix a perm issue in Windows Static Analysis pipeline (#21100)
### Description
Due to a security setting change, now we need to explicitly set the
permissions. I forgot doing that when bringing the old change back.

### Motivation and Context
Now the pipeline cannot publish scanning result to Github
2024-06-19 14:44:39 -07:00
Adrian Lizarraga
3ae5df1d18
[QNN EP] Update QNN SDK to 2.23.0 (#21008)
### Description
- Updates CI pipelines to use QNN SDK 2.23.0 by default.
- QNN SDK adds support for int64 Cast. This allows QNN EP to support
ONNX ArgMax/ArgMin/TopK operators that generate an int64 graph output.

Example translation of ArgMax:
- **ONNX**:    input --> ArgMax --> output (int64)
- **QNN**: input --> ArgMax --> Cast (int32 to int64) --> output (int64)

### Motivation and Context
Update onnxruntime to use the latest QNN SDK.
2024-06-19 12:37:42 -07:00
Jian Chen
6a0d64e65c
Component Gov round 7 (#21051)
### Description
ignoreDirectories does not recursively include sub folders like we
thought it would. We need to add additional sub folders.



### Motivation and Context
Fix CG :
1.
https://aiinfra.visualstudio.com/Lotus/_componentGovernance/218239/alert/11474679?typeId=25427568
2.
https://aiinfra.visualstudio.com/Lotus/_componentGovernance/218239/alert/11475140?typeId=25421034&pipelinesTrackingFilter=0
2024-06-19 11:07:02 -07:00
Tianlei Wu
769d379c63
Refactor MultiHeadAttention cpu op (#21055)
Refactoring of MultiHeadAttention op
- [x] Add some checking for cross attention of pass_past_in_kv to make
sure there is no kv cache and bias.
- [x] Update interface of PackVIntoRotaryQKV so that it can be used by
SparseAttention later.
- [x] Add test cases

### Motivation and Context
To prepare the pull request for SparseAttention cpu op.
2024-06-19 10:23:26 -07:00
Xu Xing
c3076721f3
[js/webgpu] Support conv3d naive (#20706)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-06-19 10:13:50 -07:00
Tianlei Wu
01279d8896
[ROCM] Exclude flash attention from hipify (#21091)
Exclude flash attention sub-directory from hipify.
2024-06-19 08:59:10 -07:00
Scott McKay
6e742c426e
Update nuget package generation script entries for .net8 MAUI (#21096)
### Description
<!-- Describe your changes. -->
Remove xamarin related entries.
Update MAUI entries to net8
Remove macos entries (not required by MAUI)

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Updates missed from #21062
2024-06-19 21:10:22 +08:00
Yi Zhang
cc3168bcbb
Add UsePython task in Nuget_Packaging_CPU stage (#21095)
### Description
supplement of https://github.com/microsoft/onnxruntime/pull/21062



### Motivation and Context
2024-06-19 21:09:37 +08:00
Peishen Yan
50b49642d5
[WebNN EP] Update triangular_op_builder.cc (#20994)
As a follow-up of https://github.com/microsoft/onnxruntime/pull/20730
2024-06-19 03:28:34 -07:00
Wanming Lin
40879a2623
[WebNN EP] Enable Cast op for WebNN CPU backend (#20864)
WebNN TFLite backend supports `cast` op but doesn't support casting to
`uint64` data type.
2024-06-19 01:51:19 -07:00
Wanming Lin
35c430a95a
[WebNN EP] Enable several ops for WebNN CPU backend (#20847)
WebNN CPU implementation has been migrated from XNNPack to TFLite which
supports more ops. Turn on partial `cpu` supported ops which just need
the change from `false` to `true` firstly.
2024-06-19 01:45:31 -07:00
Scott McKay
5fc60f36f2
Update to the net8 MAUI targets. Remove Xamarin. (#21062)
### Description
<!-- Describe your changes. -->
Xamarin is EOL so remove support.
The MAUI targets are EOL and need updating.
https://dotnet.microsoft.com/en-us/platform/support/policy/maui

Other cleanups:
- netcoreapp3.1 is EOL
- the net6 macos target was added in the mistaken belief that was for
MAUI mac support, but that is actually via the mac-catalyst target which
we recently added support for.
- some CIs that were using the old build setup of splitting pre-net6
targets. The ORT C# bindings csproj was updated last year and the
`PreNet6` and `SelectedTargets` properties no longer exist as they were
replaced by the simpler `IncludeMobileTargets` property.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Remove EOL components.
#21058
2024-06-19 16:20:58 +10:00
Jian Chen
1ad2c0a4b2
fix Window_CI in Github Action (#21070)
### Description
fix Window_CI in Github Action
2024-06-18 23:14:08 -07:00
cloudhan
ddd4ce3cb7
[ROCm] Update ck to use ck_tile (#21030) 2024-06-19 14:06:10 +08:00
Yi Zhang
5a0e5237f5
Fix onebranch exception in code signing (#21088)
### Description
Fix regression caused by
https://github.com/microsoft/onnxruntime/pull/20995



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-06-19 12:07:17 +08:00
Yulong Wang
5e81fa8aec
[js] fix vulnerability CVE-2024-4068: upgrade braces to 3.0.3 (#21078)
### Description

Upgrade `braces` to 3.0.3

[CVE-2024-4068](https://github.com/advisories/GHSA-grv7-fg5c-xmjg)

```
# npm audit report

braces  <3.0.3
Severity: high
Uncontrolled resource consumption in braces - https://github.com/advisories/GHSA-grv7-fg5c-xmjg
fix available via `npm audit fix`
node_modules/braces

1 high severity vulnerability
```
2024-06-18 16:02:08 -07:00
Changming Sun
ffb8e8eb0e
Update build.py: add a comment (#20993)
### Description
Update build.py: add a comment


### Motivation and Context
See the comment.
2024-06-18 13:52:34 -07:00
Yulong Wang
631a2c16be
[js/web] skip default locateFile() when dynamic import is disabled (#21073)
### Description
skip default `locateFile()` when dynamic import is disabled. This allows
the file to work with bundlers to load WebAssembly file correctly if
`env.wasm.wasmPaths` is not set.
2024-06-18 12:21:45 -07:00