**Description**: Describe your changes.
This allow us quickly launch a microbench session by, for example:
`python skip_layer_norm_test.py 8 128 128 float32 `
Change ROCm to use tunable GEMM. It is not enabled in this PR. This will drastically improve GEMM performance in some shapes and dtypes configuration. This will benefit the overall performance for BERT inference and hopefully, training, when enabled.
### Description
<!-- Describe your changes. -->
As title
-Split long OpBuilder and OpSupportChecker files into individual
operator files.
-Add OpBuilder/SupportChecker registry factories.
-Combine the functionality of op_builder and op_support_checker into one
op_builder.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
The NNAPI OPBuilder was splitted into OPBuilder (For EP::Compile) and
OPSupportChecker (for EP::GetCapability)
At the time it was reasonable choice, but OPBuilder/OPSupportChecker
share some logic and has to use addition helper.
Clean up now to make NNAPI OPBuilder/OPSupportChecker into single
OPBuilder (similar to what CoreML EP has)
### Description
<!-- Describe your changes. -->
Update React Native documentation to reflect change to use full ORT. Fix
broken links.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
ORT v1.13 uses the full ORT package. Instructions for performing a
custom build did not cover this.
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
**Description**: Describe your changes.
add SkipLayerNorm vectorize regular case
1. when hidden size <= 1024, SkipLayerNormTunable op can use both small
case and regular case
2. when hidden size > 1024, SkipLayerNormTunable op can only use regular
case.
**Motivation and Context**
- Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here.
This PR is to fix https://github.com/microsoft/onnxruntime/issues/12930
and https://github.com/microsoft/onnxruntime/issues/12579.
In detail:
- For CPU EP, since current impl of SimplifiedLayerNormalization doesn't
support input and scale having different data types, so if the sub-graph
contains Cast Op, the sub-graph will not fused, this guarantee that both
inputs and output data type will be same
- For CUDA EP, add (fp16, float) support to (T,V) type constraints all
combinations of fp16 and float can be supported in the impl
With the fix, the original model can be run with
SimplifiedLayerNormalization, which also helps to improve the perf.
Fix issue that all nodes inputs are added as sub-graph inputs event the input does not exist.
Solution:
Skip the placeholder inputs while adding node inputs as sub-graph inputs. E.g Onnx node test test_resize_upsample_scales_linear, 2nd input roi is empty.
Fixesmicrosoft/onnxruntime#12969
### Motivation and Context
Build is broken, can't find cudnn.lib with nvidia official install of
cuDNN
Alternative method is to use `IF(EXISTS
${onnxruntime_CUDNN_HOME}/lib/x64/cudnn.lib)` to test for legacy
location and only add the legacy dir to the path, else add the current
official `lib/` dir.
### Description
Add CUDA kernel.
Support double in CPU kernel and only write Mean and InvStdDev values if the optional outputs exist.
### Motivation and Context
Complete opset 17 support for LayoutNormalization
Attention, Quantize/Dequantize etc.
Update QOrderedMatmul's schema, updated unittest.
Verified test data for QOrdered Attention.
Co-authored-by: Zhang Lei <phill.zhang@gmail.com>
Co-authored-by: Lei Zhang <zhalei@microsoft.com>
1. Update CUDA version from 11.4 to 11.6.
2. Update Manylinux version
3. Upgrade GCC version from 10 to 11 for most x86_64 pipelines. CentOS 7 ARM64 doesn't have GCC 11 yet.
4. Refactor python packaging pipeline:
a. Split Linux GPU build job to two parts, build and test, so that the
build part doesn't need to use a GPU machine
b. Make the Linux GPU build job and Linux CPU build job more similar: share the same bash script and yaml file.
5. Temporarily disable Attention_Mask1D_Fp16_B2_FusedNoPadding because it is causing one of our packaging pipeline to fail. I have created an ADO task for this.
### Description ###
Add opset17 node test data
### Motivation and Context ###
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Some Ops in EP directory instead of contrib_ops directory will
require TunableOp. We will also need to add EP level session tuning
options for it. So move those code all at once.
Also remove duplicated utility functions.
**Description**: LayerNormalization is now part of the ONNX spec as of
opset 17.
We had a LayerNormalization contrib op, which (incorrectly) was
registered in the ONNX domain. Use that implementation for the ONNX
operator.
Update skip_layer_norm_fusion.cc. There are other optimizers that use
LayerNormalization that need updates as well.
**Motivation and Context**
#12916
**Description**: This PR adds Ascend CANN execution provider support.
**Motivation and Context**
- Why is this change required? What problem does it solve?
As the info shown in the issue. CANN is the API layer for Ascend
processor. Add CANN EP can allow user run onnx model on Ascend hardware
via onnxruntime
The detail change:
1. Added CANN EP framework.
2. Added the basic operators to support ResNet and VGG model.
3. Added C/C++、Python API support
- If it fixes an open issue, please link to the issue here.
https://github.com/microsoft/onnxruntime/issues/11477
Author:
lijiawei <lijiawei19@huawei.com>
wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: FFrog <ljw1101.vip@gmail.com>