Commit graph

8343 commits

Author SHA1 Message Date
Hariharan Seshadri
ed7ab1660d
[CUDA] Add option to use DecoderMaskedMultiheadAttention in BeamSearch (#14990) 2023-03-15 17:16:32 -07:00
Yufeng Li
da084b0fc1
check axis range for LayerNorm (#14845)
### Description
Add a check on the axis to make sure it is within the valid range.
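
A minimal sketch of this kind of range check, not the code from this commit. It assumes the usual ONNX convention that an axis for a tensor of rank `r` must lie in `[-r, r-1]`; the helper name `validate_axis` is hypothetical.

```python
def validate_axis(axis: int, rank: int) -> int:
    """Return the non-negative form of `axis`, or raise if it is out of range.

    Hypothetical helper: assumes the valid range is [-rank, rank - 1].
    """
    if rank < 1:
        raise ValueError(f"rank must be at least 1, got {rank}")
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range [{-rank}, {rank - 1}]")
    # Negative axes count from the last dimension.
    return axis + rank if axis < 0 else axis
```

Rejecting the axis before it is used to index the shape avoids reading past the end of the dimension list.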


2023-03-15 14:44:59 -07:00
Changming Sun
5213546e62
Change how to find npm (#15001) 2023-03-15 11:10:10 -07:00
wejoncy
32533dd1c2 fix 2023-03-15 13:23:56 +08:00
JiCheng
cc15ceef4e Update onnxruntime/core/providers/nnapi/nnapi_builtin/builders/model_builder.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
6bdb03281a clean comments 2023-03-15 13:23:56 +08:00
wejoncy
762ea2402e fix 2023-03-15 13:23:56 +08:00
JiCheng
8db28d9139 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
8d00961321 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
c10462b5f5 ORT_UNUSED_PARAMETER 2023-03-15 13:23:56 +08:00
wejoncy
92fabf57ea comments 2023-03-15 13:23:56 +08:00
JiCheng
cd3173d531 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
dad772ef09 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
8aeed1e87d amend 2023-03-15 13:23:56 +08:00
wejoncy
5fe61c53a3 comments 2023-03-15 13:23:56 +08:00
JiCheng
d236085845 Update onnxruntime/core/providers/nnapi/nnapi_builtin/model.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
d490908836 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
a8ed956fa7 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
de5e58c077 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
d70b5b38e2 amend 2023-03-15 13:23:56 +08:00
wejoncy
fdc970a40d rename constant var 2023-03-15 13:23:56 +08:00
wejoncy
18015f0f55 use span 2023-03-15 13:23:56 +08:00
JiCheng
4ca84ac303 Update onnxruntime/core/providers/nnapi/nnapi_builtin/model.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
28a4cfeeef refactor 2023-03-15 13:23:56 +08:00
wejoncy
adfd38edb9 Fail early when getting device 2023-03-15 13:23:56 +08:00
wejoncy
82ae138143 address comments 2023-03-15 13:23:56 +08:00
JiCheng
36aac0036b Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
760a2b99d0 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
7c6fc31b65 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
adf990f30f more docs 2023-03-15 13:23:56 +08:00
wejoncy
028c2372fa remove disable_cpu_soft temporarily 2023-03-15 13:23:56 +08:00
wejoncy
30151323da move init_device to ctor 2023-03-15 13:23:56 +08:00
JiCheng
d135bb7c0c Update nnapi_execution_provider.cc 2023-03-15 13:23:56 +08:00
JiCheng
781e72f663 Update nnapi_api_helper.cc 2023-03-15 13:23:56 +08:00
JiCheng
8383a54f9d Update include/onnxruntime/core/providers/nnapi/nnapi_provider_factory.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
3873a55bd3 [NNAPI] fix feature_level query 2023-03-15 13:23:56 +08:00
Jian Chen
6891ab5bac
fix_macos (#15018)
### Description
This fixes the macOS packaging build on the universal2 arch.


2023-03-14 21:54:44 -07:00
Tianlei Wu
bdfdebfca7
Fix ReduceSum in attention fusion (#15047)
Fix https://github.com/microsoft/onnxruntime/issues/14959.
ReduceSum-13 moves axes from an attribute to a node input.
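
A rough sketch of what the change in ReduceSum-13 means for code that reads the axes, not the fusion code from this commit. The plain-dict node representation and the function name `reduce_sum_axes` are hypothetical, chosen to keep the example self-contained.

```python
from typing import Optional, Sequence


def reduce_sum_axes(
    opset: int,
    attributes: dict,
    inputs: Sequence[str],
    initializers: dict,
) -> Optional[list]:
    """Return the constant axes of a ReduceSum node, or None if unavailable.

    Hypothetical representation: `attributes` maps attribute names to values,
    `inputs` lists input tensor names, and `initializers` maps constant
    tensor names to their values.
    """
    if opset >= 13:
        # From opset 13 on, axes is the optional second input.
        if len(inputs) > 1 and inputs[1]:
            values = initializers.get(inputs[1])
            return None if values is None else list(values)
        return None
    # Before opset 13, axes is an attribute.
    axes = attributes.get("axes")
    return None if axes is None else list(axes)
```

In this sketch `None` covers both "axes not given" and "axes not constant"; a real fusion would likely need to distinguish the two cases.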
2023-03-14 20:34:17 -07:00
PeixuanZuo
c70838cbbb
[ROCm] add Conv, NhwcConv benchmark to microbench (#15017)
Add Conv, NhwcConv benchmark to microbench.

Related PR: https://github.com/microsoft/onnxruntime/pull/14982,
https://github.com/microsoft/onnxruntime/pull/14980
2023-03-15 11:07:17 +08:00
Yi Zhang
f096f6167b
Remove python37 and cuda37 packages in orttraing (#15041)
### Description
Supplement to #14874 and #14887.

### Motivation and Context


N.B.
I'm not sure whether the Python matrix for ROCm (python3.7-3.9) is expected. @faxu
@snnn

(https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/azure-pipelines/orttraining-py-packaging-pipeline-rocm.yml)
2023-03-15 08:54:15 +08:00
Yi-Hong Lyu
a8680ff188
Add float16 Sigmoid support (#14910) 2023-03-14 17:12:49 -07:00
Tianlei Wu
0bb7390d11
fix prefast warnings (#15033)
Fix prefast warnings
2023-03-14 13:51:51 -07:00
Rachel Guo
db4e664f7c
Re-enable react native e2e android unit test for CI and upgrade targetSDK level for test project (#14989)
### Description

Re-enable the React Native e2e Android unit test in the React Native CI,
since the recent change to specify `default` instead of `google-apis` in
the Android emulator CI tests has given fairly stable results so far.

Upgrade the targetSDKversion of the Gradle test project in
react-native/android to meet the minimum target API level requirement for
Google Play apps.


https://support.google.com/googleplay/android-developer/answer/11926878?hl=en

### Motivation and Context

React Native CI issue.
2023-03-14 13:35:38 -07:00
Alex Kogan
8b09702b88
Enable parallel computation in Clip ops (#14925)
### Description
This PR speeds up Clip operations by replacing their sequential
implementation with a parallelized one. The parallelization is achieved
by dividing the input data into chunks of size N and using a thread pool
to process the chunks in parallel. The chunk size N is set to 16K based
on a performance evaluation on input tensors of 10^i elements for i in
[1 .. 6].
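
The chunking scheme described above can be sketched as follows. This is an illustration in Python, not the C++ kernel from the PR: the names `clip_chunk` and `parallel_clip` are hypothetical, and "16K" is read here as 16384 elements, which is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: "16K" in the description is taken to mean 16 * 1024 elements.
CHUNK_SIZE = 16 * 1024


def clip_chunk(values, lo, hi):
    """Clamp every element of one chunk to the range [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def parallel_clip(values, lo, hi, chunk_size=CHUNK_SIZE, max_workers=4):
    """Clip `values` by splitting them into chunks processed on a thread pool."""
    if len(values) <= chunk_size:
        # Short inputs fit in a single chunk, so skip the pool entirely.
        return clip_chunk(values, lo, hi)
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    out = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, so output order is preserved.
        for part in pool.map(lambda chunk: clip_chunk(chunk, lo, hi), chunks):
            out.extend(part)
    return out
```

Because of Python's global interpreter lock this sketch only shows the structure of the approach; it is not expected to reproduce the speedup reported for the native implementation.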


### Motivation and Context
The Clip operation is frequently executed in image processing models.
Its implementation can be easily parallelized and therefore sped up when
executed on a multi-core machine. On long inputs (>= 100K elements) this
PR achieves speedup of over 2x. On shorter inputs, this PR does not
introduce any substantial performance change.
2023-03-14 09:41:44 -07:00
PeixuanZuo
2ff7f3e93a
[ROCm] support optimized Stable Diffusion model (#14980)
Add BiasSplitGelu/BiasAdd/GroupNorm/NhwcConv operators for the ROCm EP.

1. BiasSplitGelu and BiasAdd operators can be automatically hipified
from the CUDA EP.
2. GroupNorm was hipified from the CUDA EP and modified to build.
3. NhwcConv is similar to NhwcConv in the CUDA EP, but the MIOpen and
cuDNN APIs differ. MIOpen's `miopenConvolutionForwardbias` and
`miopenOpTensor` do not support NHWC layout yet, so BinaryElementwise is
used in place of miopenConvolutionForwardbias for the NHWC layout.
2023-03-14 23:15:37 +08:00
PeixuanZuo
ff2850029b
[ROCm] refact SkipLayernorm long if-elseif statements (#14795)
Refactor the long if-elseif statements in SkipLayernorm.
2023-03-14 23:04:55 +08:00
Ye Wang
0fa00429d5
[T5 optimization] script fusions and fixes (#14967)
### Description

1. Added a script for the T5 encoder self-attention and T5 decoder
self/cross-attention fusions.
2. Added a simplified layernorm fusion for the --external_data_format
scenario (otherwise relying on the ORT optimizer).
3. Added rel_pos_bias shape inference code and modified the attention/mha
shape inference script.
4. Reworked graph_topologic_sort() because the current implementation
does not function correctly. Also added an option to topo-sort the
graph deterministically so that tests pass.
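
A minimal sketch of one way to make a topological sort deterministic, not the reworked graph_topologic_sort() from this PR. It uses Kahn's algorithm with a min-heap so that ties between ready nodes are always broken by name; the function name and the (producer, consumer) edge format are assumptions.

```python
import heapq


def deterministic_topological_sort(nodes, edges):
    """Return the nodes in a topological order that is stable across runs.

    `nodes` is an iterable of node names and `edges` is an iterable of
    (producer, consumer) pairs. Both formats are hypothetical.
    """
    indegree = {name: 0 for name in nodes}
    consumers = {name: [] for name in indegree}
    for src, dst in edges:
        consumers[src].append(dst)
        indegree[dst] += 1

    # The heap always yields the smallest ready name, independent of input order.
    ready = [name for name, degree in indegree.items() if degree == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for nxt in consumers[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order
```

With a plain queue or set, the order of independent nodes depends on how the graph happened to be stored, which is the kind of variation that makes graph-comparison tests flaky.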

Note:
1. The t5-beamsearch export code is slightly modified. Specifically,
encoder_hidden_states (ehs) is no longer an input to the T5 decoder, since
ehs is not actually used in the graph execution.
2. Recent PRs do not add optimizations to T5 on CPU.
3. The fp32 models (encoder and decoder) for t5-small, t5-base and
t5-large reach a parity of e-5, and the corresponding beam search
models generate the same results as PyTorch.
4. The fp16 (mixed-precision) models, however, reach a parity of around
3e-2, and some have a maximum diff slightly over 3e-2. The beam search
models still generate the same results as PyTorch (based on limited input
data).
5. The mt-5 model has a parity issue at the moment, even before any
optimization. Will investigate later.


---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-03-13 23:35:56 -07:00
Christian Veenhuis
59dfcfdce7
Fix typos in sources: operater, tranform, neccessary, trainig (#14907)
### Description
While browsing the sources I found several typos here and there.
I collected them into a single PR and fixed them.
Namely, these typos are: operater, tranform, neccessary, trainig.
After the fix, none of them is found anymore:

$ git grep "operater"
$ git grep "tranform"
$ git grep "neccessary"
$ git grep "trainig"
$ 

### Motivation and Context
Since some of the typos are in example notebooks and markdown files,
users can see them.
2023-03-13 22:45:04 -07:00
Ye Wang
538d64891a
[t5 optimization] kernel changes to t5 (#14928)
### Description

1. support optional bias in Attention op (used in T5 encoder)
2. support broadcasting rel_pos_bias in attention_softmax.h
3. add scale in MHA op's attributes
4. support past_key/past_value and present_key/present_value in MHA
5. UT and parity tests are added
6. fix an issue: https://github.com/microsoft/onnxruntime/issues/14920

note: the fusions will be in another PR since mt5 needs to be tested and
an issue from github will be investigated.

Future work:
1. support shared buffer for past/present
2. enable trt kernels when possible and investigate (trt/cutlass) kernels
with rel_pos_bias
3. support KV/QKV packing with past/present


---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-03-13 14:29:16 -07:00
Dmitri Smirnov
b34e570ad0
Enable LeakyRelu latest and refactor fast_gelu_fusion to enable the script (#15003)
### Description
Enable the latest LeakyRelu, since the latest version differs only in
type support.
Refactor `fast_gelu_fusion` to enable the script, because our script is
unable to check whether any of the optimizers are outdated and no longer
in effect.

### Motivation and Context
We do not want to lose performance.

The next step is to file improvement issues if any are required.
2023-03-13 14:20:11 -07:00