Commit graph

8339 commits

Author SHA1 Message Date
JiCheng
cc15ceef4e Update onnxruntime/core/providers/nnapi/nnapi_builtin/builders/model_builder.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
6bdb03281a clean comments 2023-03-15 13:23:56 +08:00
wejoncy
762ea2402e fix 2023-03-15 13:23:56 +08:00
JiCheng
8db28d9139 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
8d00961321 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
c10462b5f5 ORT_UNUSED_PARAMETER 2023-03-15 13:23:56 +08:00
wejoncy
92fabf57ea comments 2023-03-15 13:23:56 +08:00
JiCheng
cd3173d531 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
dad772ef09 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
8aeed1e87d amend 2023-03-15 13:23:56 +08:00
wejoncy
5fe61c53a3 comments 2023-03-15 13:23:56 +08:00
JiCheng
d236085845 Update onnxruntime/core/providers/nnapi/nnapi_builtin/model.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
d490908836 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
a8ed956fa7 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
de5e58c077 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
d70b5b38e2 amend 2023-03-15 13:23:56 +08:00
wejoncy
fdc970a40d rename constant var 2023-03-15 13:23:56 +08:00
wejoncy
18015f0f55 use span 2023-03-15 13:23:56 +08:00
JiCheng
4ca84ac303 Update onnxruntime/core/providers/nnapi/nnapi_builtin/model.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
28a4cfeeef refactor 2023-03-15 13:23:56 +08:00
wejoncy
adfd38edb9 Fail early when getting device 2023-03-15 13:23:56 +08:00
wejoncy
82ae138143 address comments 2023-03-15 13:23:56 +08:00
JiCheng
36aac0036b Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
760a2b99d0 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
JiCheng
7c6fc31b65 Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
adf990f30f more docs 2023-03-15 13:23:56 +08:00
wejoncy
028c2372fa remove disable_cpu_soft temporarily 2023-03-15 13:23:56 +08:00
wejoncy
30151323da move init_device to ctor 2023-03-15 13:23:56 +08:00
JiCheng
d135bb7c0c Update nnapi_execution_provider.cc 2023-03-15 13:23:56 +08:00
JiCheng
781e72f663 Update nnapi_api_helper.cc 2023-03-15 13:23:56 +08:00
JiCheng
8383a54f9d Update include/onnxruntime/core/providers/nnapi/nnapi_provider_factory.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-15 13:23:56 +08:00
wejoncy
3873a55bd3 [NNAPI] fix feature_level query 2023-03-15 13:23:56 +08:00
Jian Chen
6891ab5bac
fix_macos (#15018)
### Description
This fixes the macOS packaging build on the universal2 architecture.


### Motivation and Context
2023-03-14 21:54:44 -07:00
Tianlei Wu
bdfdebfca7
Fix ReduceSum in attention fusion (#15047)
Fix https://github.com/microsoft/onnxruntime/issues/14959.
ReduceSum-13 moves axes from an attribute to a node input.
2023-03-14 20:34:17 -07:00
PeixuanZuo
c70838cbbb
[ROCm] add Conv, NhwcConv benchmark to microbench (#15017)
Add Conv, NhwcConv benchmark to microbench.

Related PR: https://github.com/microsoft/onnxruntime/pull/14982,
https://github.com/microsoft/onnxruntime/pull/14980
2023-03-15 11:07:17 +08:00
Yi Zhang
f096f6167b
Remove python37 and cuda37 packages in orttraing (#15041)
### Description
Supplement to #14874 and #14887.

### Motivation and Context


N.B.
I'm not sure whether the Python matrix for ROCm (python3.7-3.9) is expected. @faxu
@snnn

(https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/azure-pipelines/orttraining-py-packaging-pipeline-rocm.yml)
2023-03-15 08:54:15 +08:00
Yi-Hong Lyu
a8680ff188
Add float16 Sigmoid support (#14910) 2023-03-14 17:12:49 -07:00
Tianlei Wu
0bb7390d11
fix prefast warnings (#15033)
Fix prefast warnings
2023-03-14 13:51:51 -07:00
Rachel Guo
db4e664f7c
Re-enable react native e2e android unit test for CI and upgrade targetSDK level for test project (#14989)
### Description

Re-enable the React Native e2e Android unit test in the React Native CI,
because the recent change to specify `default` instead of `google-apis`
in the Android emulator CI tests has given fairly stable results so far.

Upgrade the targetSDKversion for the gradle test project in
react-native/android to meet the minimum target API level requirement
for Google Play apps.


https://support.google.com/googleplay/android-developer/answer/11926878?hl=en

### Motivation and Context

React Native CI issue.
2023-03-14 13:35:38 -07:00
Alex Kogan
8b09702b88
Enable parallel computation in Clip ops (#14925)
### Description
This PR speeds up Clip operations by replacing their sequential
implementation with a parallelized one. The parallelization is achieved
by dividing the input data into chunks of size N and using a thread pool
to process the chunks in parallel. The chunk size N is set to 16K based
on performance evaluation on input tensors of 10^i elements for i in
[1..6].
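
The chunking scheme described above can be sketched roughly as follows. This is a minimal illustration in Python, not the PR's C++ implementation: the names `parallel_clip`, `clip_chunk`, and `max_workers` are hypothetical, and only the 16K chunk size comes from the description. Because of Python's global interpreter lock, this pure-Python version shows the partitioning logic rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# 16K elements per chunk, the value quoted in the PR description.
CHUNK_SIZE = 16 * 1024


def clip_chunk(data, out, start, end, lo, hi):
    """Clamp data[start:end] into [lo, hi], writing results into out."""
    for i in range(start, end):
        out[i] = min(max(data[i], lo), hi)


def parallel_clip(data, lo, hi, chunk_size=CHUNK_SIZE, max_workers=4):
    """Clip every element of data to [lo, hi], one task per chunk.

    Hypothetical helper: each chunk writes to a disjoint slice of the
    output list, so the tasks need no locking.
    """
    n = len(data)
    out = [0] * n
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(clip_chunk, data, out, start, min(start + chunk_size, n), lo, hi)
            for start in range(0, n, chunk_size)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return out
```

Inputs shorter than one chunk produce a single task, which is consistent with the description's remark that short inputs see no substantial change.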


### Motivation and Context
The Clip operation is frequently executed in image processing models.
Its implementation can be easily parallelized and therefore sped up when
executed on a multi-core machine. On long inputs (>= 100K elements) this
PR achieves speedup of over 2x. On shorter inputs, this PR does not
introduce any substantial performance change.
2023-03-14 09:41:44 -07:00
PeixuanZuo
2ff7f3e93a
[ROCm] support optimized Stable Diffusion model (#14980)
Add BiasSplitGelu/BiasAdd/GroupNorm/NhwcConv operator for ROCm EP.

1. BiasSplitGelu and BiasAdd operators can be automatically hipified
from CUDA EP.
2. GroupNorm was hipified from CUDA EP and modified to build.
3. NhwcConv is similar to NhwcConv in CUDA EP, but the MIOpen API and
cuDNN API are different. `miopenConvolutionForwardbias` and
`miopenOpTensor` of MIOpen do not support NHWC layout yet, so
BinaryElementwise replaces miopenConvolutionForwardbias (NHWC layout).
2023-03-14 23:15:37 +08:00
PeixuanZuo
ff2850029b
[ROCm] refact SkipLayernorm long if-elseif statements (#14795)
Refactor the long if-elseif statements in SkipLayernorm.
2023-03-14 23:04:55 +08:00
Ye Wang
0fa00429d5
[T5 optimization] script fusions and fixes (#14967)
### Description

1. added script for t5 encoder self attention and t5 decoder self/cross
attention fusions.
2. added simplified layernorm fusion for the --external_data_format
scenario (otherwise relying on the ORT optimizer).
3. added rel_pos_bias shape inference code, modified attention/mha shape
inference script.
4. reworked graph_topologic_sort() because the current implementation
is not functioning correctly. Also added an option to topo-sort the
graph in a deterministic way to let tests pass.
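
One common way to make a topological sort deterministic, as item 4 mentions, is to always pick the smallest ready node. The sketch below uses Kahn's algorithm with a min-heap. It is an illustration under that assumption and not the reworked graph_topologic_sort() itself; the function name, the string node names, and the edge-list input are all hypothetical.

```python
import heapq
from collections import defaultdict


def deterministic_topological_sort(nodes, edges):
    """Return nodes in topological order, breaking ties by node name.

    nodes: iterable of unique, comparable node names.
    edges: iterable of (src, dst) pairs meaning src must precede dst.
    """
    nodes = list(nodes)
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Min-heap of nodes whose inputs are all satisfied; popping the
    # smallest name makes the order independent of the input order.
    ready = [n for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order
```

With this tie-breaking rule, two graphs that contain the same nodes and edges listed in different orders sort identically, which is the property that keeps test comparisons stable.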

note:
1. the t5-beamsearch export code is slightly modified. specifically,
encoder_hidden_states(ehs) is no longer an input to the t5 decoder since
the ehs is not actually used in the graph execution.
2. recent PRs do not add optimizations to t5 on cpu. 
3. the fp32 model(encoder and decoder) for t5-small, t5-base and
t5-large can get a parity of e-5 and the corresponding beam search
models generate same results as pytorch.
4. fp16 (mixed-precision) models, however, get a parity around 3e-2 and
some have a maximum diff a bit over 3e-2. But the beam search models
still generate the same results as pytorch (based on limited input data).
5. mt-5 model has a parity issue at the moment, even before any
optimization. will investigate later.

### Motivation and Context

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-03-13 23:35:56 -07:00
Christian Veenhuis
59dfcfdce7
Fix typos in sources: operater, tranform, neccessary, trainig (#14907)
### Description
While browsing the sources I found several typos here and there.
I collected them to a single PR and fixed them.
Namely these typos are: operater, tranform, neccessary, trainig.
After fixing none of them was found anymore:

$ git grep "operater"
$ git grep "tranform"
$ git grep "neccessary"
$ git grep "trainig"
$ 

### Motivation and Context
Since some of the typos are in example notebooks and markdown files,
users can see them.
2023-03-13 22:45:04 -07:00
Ye Wang
538d64891a
[t5 optimization] kernel changes to t5 (#14928)
### Description

1. support optional bias in Attention op (used in T5 encoder)
2. support broadcasting rel_pos_bias in attention_softmax.h
3. add scale in MHA op's attributes
4. support past_key/past_value and present_key/present_value in MHA
5. UT and parity tests are added
6. fix an issue: https://github.com/microsoft/onnxruntime/issues/14920

note: the fusions will be in another PR since mt5 needs to be tested and
an issue from github will be investigated.

Future work:
1. support shared buffer for past/present
2. enable trt kernels when possible and investigate (trt/cutlass) kernels
with rel_pos_bias
3. support KV/QKV packing with past/present

### Motivation and Context

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-03-13 14:29:16 -07:00
Dmitri Smirnov
b34e570ad0
Enable LeakyRelu latest and refactor fast_gelu_fusion to enable the script (#15003)
### Description
Enable the latest LeakyRelu, since the latest version differs only in
type support.
Refactor `fast_gelu_fusion` to enable the script, because our script is
unable to check if any of the optimizers are outdated and no longer in
effect.

### Motivation and Context
We do not want to lose performance.

The next step is to file improvement issues if any are required.
2023-03-13 14:20:11 -07:00
Nat Kershaw (MSFT)
a5d814008c
Fix API docs deploy so that a PR is not required (#15011)
Fixes this
[issue](https://github.com/microsoft/onnxruntime/actions/runs/4387534694/jobs/7682945415#step:12:534)
and removes the extra PR step in the workflow.

Also logs the commit of the main branch that the docs were generated
from to a file called version.txt at the root of the API docs tree.

Tested for Java API docs and results staged here:
https://natke.github.io/onnxruntime/docs/api/java/index.html

If approved, I can migrate all of the other API docs generation
workflows to use this scheme.
2023-03-13 09:36:08 -07:00
pengwa
44dda08b51
Renaming files (#15015)
### Renaming files for compute optimizer

### Motivation and Context

A follow up for https://github.com/microsoft/onnxruntime/pull/14832
2023-03-13 17:07:59 +08:00
PeixuanZuo
c55f347689
[ROCm] change miopen_conv_use_max_workspace=true (#14982)
Set miopen_conv_use_max_workspace=true to get the best algorithm during
the `miopenFindConvolutionForwardAlgorithm` process.
2023-03-13 16:19:23 +08:00
pengwa
448e989df8
Op slicing upstream refactor (#14832)
### Slice op upstream refactor

A refactoring of https://github.com/microsoft/onnxruntime/pull/13672.

### Motivation and Context

There is a similar optimization opportunity in upstreaming other
operators to reduce compute flops, so this PR refactors the existing
code base to make it easier to support other ops.

The changes in this PR are mainly about renaming and moving.
- Move common logic (from compute_optimizer.h/cc) into
upstream_transformer_base.h/cc and shared_utils.h/cc.
   - Upstream common logic is moved into
upstream_transformer_base.h/cc.
   - Shared utilities are moved to shared_utils.h/cc.
- After the move, compute_optimizer.h/cc mainly holds the upstreaming
gather implementation (inheriting upstream_transformer_base.h/cc).
Ideally it should be renamed, but for easier review this time, I keep
its name.
2023-03-13 08:19:32 +08:00