Commit graph

8519 commits

Author SHA1 Message Date
Changming Sun
c8524d2dab
Refactor web-ci pipeline and delete eager mode CI pipeline (#15416)
### Description
1. Move it to a separate pool that uses the same image as [the public
hosted
pool](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml).
Also, create a beta pool that contains the next version of the hosted
pool's image, and add jobs to our post-merge pipeline to test whether the
next image version will break our CI. This way we will usually have at
least one week to prepare.

2. Change the CMake generator used in our pipelines from "Ninja" to
"MinGW Makefiles", because the latest version of CMake doesn't work with
the latest version of Ninja. People who prefer Ninja can still use it
for local builds by passing "--cmake_generator ninja" to
[build.py](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py).

3. Delete the eager mode CI pipeline.


### Motivation and Context
I need to update the software on our CI build machines, which requires
resolving this incompatibility. In more detail, the build error I hit
was:

em++: error:
CMakeFilesonnxruntime_mlas_test.dirC_a_work1sonnxruntimetestmlasunittesttest_activation.cpp.o:
No such file or directory
("CMakeFilesonnxruntime_mlas_test.dirC_a_work1sonnxruntimetestmlasunittesttest_activation.cpp.o"
was expected to be an input file, based on the commandline arguments
provided)

After this PR we will deprecate Python 3.7 support: the eager mode CI
pipeline is the last one that still uses Python 3.7. Then we can rework
PR #10953, made by [fs-eire](https://github.com/fs-eire) last year.

Fixed
[AB#14435](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14435)
2023-04-10 10:41:04 -07:00
Hector Li
9ef11f1c6a
[QNN EP] Qnn batchnorm Op support (#15222)
### Description
Support the BatchNorm op in QNN EP.
Add Node Unit group support for the BatchNorm and Exp ops.

### Motivation and Context
Enable more models.
2023-04-10 10:36:57 -07:00
Yi Zhang
0ea965c541
clear cache stat. after building (#15439)
### Description
Run `ccache -z` after every build.


### Motivation and Context
The uploaded cache shouldn't include this build's cache statistics.
2023-04-10 13:56:55 +08:00
stevenlix
6d126f8996
Add FP16 support for Whisper model (#15427)
Currently, ORT can only run inference for the FP32 Whisper model. This PR
adds FP16 support.
2023-04-08 21:36:10 -07:00
Ye Wang
34f22daf25
Support T5 Beam Search with DecoderMaskedMHA (#15386)
### Description
tl;dr:
Speedup (average latency before / after):
t5-small: 37.8%
t5-base: 24.5%


Benchmark on V100

Before:
T5-small
ORT {'test_times': 1, 'latency_variance': '0.00',
'latency_90_percentile': '104.74', 'latency_95_percentile': '104.74',
'latency_99_percentile': '104.74', 'average_latency_ms': '104.74',
'QPS': '19.10', 'parity': True}
T5-base
ORT {'test_times': 1, 'latency_variance': '0.00',
'latency_90_percentile': '200.93', 'latency_95_percentile': '200.93',
'latency_99_percentile': '200.93', 'average_latency_ms': '200.93',
'QPS': '9.95', 'parity': True}



After:
T5-small
ORT {'test_times': 1, 'latency_variance': '0.00',
'latency_90_percentile': '76.01', 'latency_95_percentile': '76.01',
'latency_99_percentile': '76.01', 'average_latency_ms': '76.01', 'QPS':
'26.31', 'parity': True}
T5-base
ORT {'test_times': 1, 'latency_variance': '0.00',
'latency_90_percentile': '161.40', 'latency_95_percentile': '161.40',
'latency_99_percentile': '161.40', 'average_latency_ms': '161.40',
'QPS': '12.39', 'parity': True}
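The headline percentages can be reproduced from the `average_latency_ms` values above. A minimal sketch (the dictionary and function names are ours, not from the benchmark script): the quoted figures match the before/after latency ratio, i.e. a speedup, while the share of latency actually removed is smaller.

```python
# Average latencies (ms) copied from the benchmark output above.
BENCHMARKS = {
    "t5-small": (104.74, 76.01),
    "t5-base": (200.93, 161.40),
}


def speedup_pct(before_ms: float, after_ms: float) -> float:
    """How much faster the new version is: before / after - 1, in percent."""
    return (before_ms / after_ms - 1.0) * 100.0


def latency_reduction_pct(before_ms: float, after_ms: float) -> float:
    """Share of the original latency that was removed: 1 - after / before, in percent."""
    return (1.0 - after_ms / before_ms) * 100.0


if __name__ == "__main__":
    for model, (before, after) in BENCHMARKS.items():
        print(f"{model}: speedup {speedup_pct(before, after):.1f}%, "
              f"latency reduction {latency_reduction_pct(before, after):.1f}%")
```

For t5-small this gives a 37.8% speedup but a 27.4% latency reduction; for t5-base, 24.5% and 19.7%.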


### Motivation and Context

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-04-08 12:50:18 -07:00
Hariharan Seshadri
f77c8f4863
Fix Npm packaging pipeline (#15425)
### Description
It seems like https://github.com/microsoft/onnxruntime/pull/15329
re-worked some jobs in `react-native-ci.yml` into stages. When this
template is used from within `npm-packaging-pipeline.yml`, there is a
problem: a stage ends up containing multiple stages as jobs. Per my
understanding, Azure DevOps does not accept this. So this PR re-works
part of `npm-packaging-pipeline.yml` to accommodate the changes in
https://github.com/microsoft/onnxruntime/pull/15329

### Motivation and Context
Fix NPM packaging pipeline
Validating test run with fix:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=297391&view=results
2023-04-07 22:13:39 -07:00
Ryan Hill
56beac4b5b
VIT model handling in the Benchmark.sh file (#15045)
### Description
Adds the ViT model type to the benchmark.
Also adds the Swin (v1) model type.

### Motivation and Context
Image models are important, and we should verify that they work as
expected, with the performance we expect.
2023-04-07 20:17:29 -07:00
Pranav Prakash
3c5d02a9ce
Implement BatchNormGradient kernel for CPU EP (#7622)
**Description**: Register an implementation for BatchNormInternal and
add a CPU kernel for BatchNormGradient. This is the third in a series of
PRs to implement BN training on CPU (first was #6946, second was #7539).

**Motivation and Context**
Support training networks with BatchNorm (e.g. convnets). Also note that
there exists a CUDA kernel for BN (forward training & backward), but
it's currently disabled due to flaky failures; someone more familiar
with those parts can register the implementation for BNInternal on CUDA
(the gradient kernel doesn't have to change).

---------

Co-authored-by: Simon Zirui Guo <simonguozirui@berkeley.edu>
Co-authored-by: mindest <linminuser@gmail.com>
Co-authored-by: mindest <30493312+mindest@users.noreply.github.com>
2023-04-08 09:20:26 +08:00
Rui Ren
5e2f46df2b
update deepspeed version 0.8.3 (#15415)
### Description
Update the supported DeepSpeed version to 0.8.3, the latest version.


### Motivation and Context
This will fix the error of `Skip modifying optimizer because of
unsupported DeepSpeed version`

Co-authored-by: ruiren <ruiren@microsoft.com>
2023-04-07 17:59:50 -07:00
Edward Chen
666aff56a4
Add workflow to update Objective-C docs. (#15413)
Add workflow to update Objective-C API docs. Remove the Objective-C API doc generation step from the packaging pipeline.

There are similar workflows for automatically updating other language API docs. This change enables this for Objective-C too.
2023-04-07 15:00:15 -07:00
Edward Chen
8db86f2c52
Use fixed version of Android NDK in binary size checks pipeline. (#15422)
Ensure that we build with a known version of NDK and are not surprised when the default version on the build machine changes.

A similar change was made for other Android build pipelines previously, but this one was missed.
2023-04-07 14:53:54 -07:00
Yateng Hong
9bb4e4bef4
Fix masm flags (#15417)
### Description
Fix the onnxruntime_mlas build failure with CMake 3.26. Updated the CMake
generator expression to make sure certain compiler flags apply only to
the C/CXX compilers.

### Motivation and Context
CMake changed the behavior of ASM_MASM in version 3.26. See
https://gitlab.kitware.com/cmake/cmake/-/issues/24639.

This also fixes #15101.
2023-04-07 10:20:03 -07:00
Adrian Lizarraga
c294040bac
[QNN EP] Support AveragePool operator (#15419)
### Description
Adds support for the AveragePool operator to QNN EP.

### Motivation and Context
This is needed to enable more models to run with QNN EP.
2023-04-07 10:09:55 -07:00
Edward Chen
139f3df4d2
Update binary size checks pipeline to use stages for separate checks. (#15408)
Allow running of any single check instead of all of them.
2023-04-07 09:55:40 -07:00
Chen Fu
8dce83a818
Fuse 'Add' operator into FP16 Conv (#15213)
### Description
Add 'Add' functionality to the FP16 Conv operator. It takes a tensor that
has the same shape as the output tensor and adds it to the result
tensor.
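To show what "fusing Add into Conv" means, here is a minimal sketch in plain Python. It assumes a float 1-D "valid" convolution instead of the FP16 2-D kernel, and the function name is hypothetical: the addend has the output's shape and is accumulated in the same pass, so no separate Add kernel has to re-read the output.

```python
from typing import List, Optional, Sequence


def conv1d_fused_add(x: Sequence[float], w: Sequence[float], bias: float = 0.0,
                     add: Optional[Sequence[float]] = None) -> List[float]:
    """1-D 'valid' convolution (cross-correlation, as in ONNX Conv) with an
    optional tensor of the output's shape added in the same pass."""
    out_len = len(x) - len(w) + 1
    if add is not None and len(add) != out_len:
        raise ValueError("the fused addend must have the same shape as the output")
    out = []
    for i in range(out_len):
        acc = bias + sum(x[i + k] * w[k] for k in range(len(w)))
        if add is not None:
            acc += add[i]  # fused Add: no second pass over the output
        out.append(acc)
    return out
```

The result equals `Conv` followed by an elementwise `Add`; a residual connection (for example, the shortcut added to a conv output in ResNet-style blocks) is the typical pattern this targets.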


### Motivation and Context
Needed to run ResNet-50.
2023-04-07 09:51:03 -07:00
Hector Li
bb21031cbb
[QNN EP]Fix issue in LeakyRelu Opbuilder for HTP backend. (#15356)
### Description
Fix issue in LeakyRelu Opbuilder for HTP backend.
QNN Prelu (ONNX LeakyRelu) requires the alpha data as its 2nd input,
while ONNX sets it as an attribute. The HTP backend requires inputs to be
quantized, so setting the 2nd input as float32 data caused QNN op
validation to fail.
Fix:
Set the 2nd input as a quantized input for the HTP backend: calculate
the quantization parameters and quantize the alpha data into uint8.
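The commit does not spell out how the quantization parameters are computed. The sketch below assumes the common asymmetric min/max uint8 scheme, with the range widened to include 0; QNN EP's actual computation may differ, and the function names are hypothetical. The example uses 0.01, ONNX LeakyRelu's default alpha.

```python
from typing import Tuple


def quantize_scalar_uint8(value: float) -> Tuple[int, float, int]:
    """Asymmetric uint8 quantization of a single constant.

    Returns (quantized_value, scale, zero_point). The float range is widened
    to include 0.0, as is conventional for asymmetric quantization.
    """
    rmin = min(0.0, value)
    rmax = max(0.0, value)
    if rmax == rmin:  # value == 0.0: any positive scale represents it exactly
        return 0, 1.0, 0
    scale = (rmax - rmin) / 255.0
    zero_point = int(round(-rmin / scale))
    q = int(round(value / scale)) + zero_point
    q = max(0, min(255, q))  # clamp to the uint8 range
    return q, scale, zero_point


def dequantize(q: int, scale: float, zero_point: int) -> float:
    return (q - zero_point) * scale


q, scale, zp = quantize_scalar_uint8(0.01)
print(q, zp, dequantize(q, scale, zp))
```

With a single positive value the range is [0, alpha], so alpha maps to 255 with zero point 0 and dequantizes back to alpha up to float rounding.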

### Motivation and Context
Unblock models that run LeakyRelu on the Qualcomm HTP backend.
2023-04-07 09:15:07 -07:00
pengwa
16f5909f2d
Introduce shrunken gather operator (#15396)
### Introduce shrunken gather operator

The existing Gather operator schema doesn't guarantee that the output
element count will be smaller than the input element count. In fact, the
output element count can be greater than, equal to, or less than the
input element count.

For cases where we know for sure that the output element count MUST be
<= the input element count, we will upstream those Gather operators to
reduce compute FLOPs.

So this PR introduces a ShrunkenGather operator, which explicitly
guarantees that the output element count will be no larger than the
input element count. The operator adds an additional restriction on its
inputs but still reuses the existing Gather implementations, plus an
input check at runtime.

This is a requirement for subsequent optimization (Draft PR:
https://github.com/microsoft/onnxruntime/pull/15401) we will do for
label sparsity and embedding sparsity.
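Whether a Gather shrinks its data follows from the ONNX Gather shape rule: the output shape is `data.shape[:axis] + indices.shape + data.shape[axis+1:]`. A minimal sketch of that rule and the ShrunkenGather condition (function names are hypothetical; ORT's actual runtime check may be implemented differently):

```python
from math import prod
from typing import Sequence, Tuple


def gather_output_shape(data_shape: Sequence[int], indices_shape: Sequence[int],
                        axis: int = 0) -> Tuple[int, ...]:
    """ONNX Gather output shape: data.shape[:axis] + indices.shape + data.shape[axis+1:]."""
    axis = axis % len(data_shape)  # normalize a negative axis
    return tuple(data_shape[:axis]) + tuple(indices_shape) + tuple(data_shape[axis + 1:])


def is_shrunken_gather(data_shape: Sequence[int], indices_shape: Sequence[int],
                       axis: int = 0) -> bool:
    """True if the Gather output has no more elements than its data input."""
    out = gather_output_shape(data_shape, indices_shape, axis)
    return prod(out) <= prod(data_shape)
```

For data of shape (4, 8) on axis 0, indices of shape (2,) give (2, 8) with 16 <= 32 elements, while indices of shape (3, 5) give (3, 5, 8) with 120 > 32, which is exactly the "output can be larger" case the plain Gather schema allows.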
2023-04-07 15:12:58 +08:00
Adrian Lizarraga
d31dd5935a
[QNN EP] Support Resize's pytorch_half_pixel coordinate transformation mode on HTP (#15390)
### Description
- Now uses QNN's Resize operator for quantized models
- Still uses QNN's ResizeBilinear or ResizeNearestNeighbor for
non-quantized models.

### Motivation and Context
This update is necessary to support more models on QNN HTP backend.
Specifically, we need to support Resize's `pytorch_half_pixel`
coordinate transformation mode on HTP.
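Per the ONNX Resize specification, `pytorch_half_pixel` differs from `half_pixel` only when an output axis has length 1. A minimal sketch of both coordinate mappings (function names are ours; this mirrors the spec formulas, not QNN's implementation):

```python
def pytorch_half_pixel(x_resized: float, scale: float, length_resized: int) -> float:
    """Map an output coordinate back to the input axis (ONNX Resize, pytorch_half_pixel)."""
    if length_resized > 1:
        return (x_resized + 0.5) / scale - 0.5
    return 0.0  # a length-1 output axis always samples input coordinate 0


def half_pixel(x_resized: float, scale: float) -> float:
    """The same mapping under half_pixel, which has no length_resized == 1 special case."""
    return (x_resized + 0.5) / scale - 0.5
```

For a 2x upscale, output pixels 0 and 1 map to input coordinates -0.25 and 0.25 in both modes; when resizing down to a single output element with scale 0.5, `half_pixel` gives 0.5 while `pytorch_half_pixel` gives 0.0.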
2023-04-06 23:56:33 -07:00
Hector Li
03dd4e6da3
[QNN EP]fix bug in DlError (#15412)
### Description
Fix a bug in DlError: a nullptr returned from DlError() caused a crash.
2023-04-06 20:01:08 -07:00
Changming Sun
df11c85955
Download protoc.exe from nuget when cross-compiling (#15395)
### Description
1. The protoc package on nuget.org contains binaries for
Windows_x86/Windows_x64/Linux_x86/Linux_x64/MacOS_x64, which cover
most use cases. Though it doesn't have binaries for ARM64, they are only
needed when we cross-compile for Intel CPUs on ARM CPUs, which is rare.
When you have such a need, you can always build protoc from source
yourself and pass it to build.py via "--path_to_protoc_exe". You can do
the same if you have security concerns and don't want to use prebuilt
binaries from outside.

2. Remove GoogleTestAdapter-related code. That part of the code is no
longer maintained.

### Motivation and Context
As a follow-up to PR #15190.
2023-04-06 17:06:59 -07:00
Yuriy Chernyshov
65579021ee
Remove UTF-8 BOM (#15026) 2023-04-06 16:09:17 -07:00
Aditya Goel
e5617617fc
Float to float label encoder (#15400) 2023-04-06 16:05:36 -07:00
Hector Li
276c0a00e4
Reuse QDQConv for ConvTranspose to generate the QDQ model (#15385)
### Description
Reuse QDQConv for ConvTranspose to generate the QDQ model

### Motivation and Context
Generate the correct QDQ model
2023-04-06 15:07:44 -07:00
petermcaughan
2bd8e4a130
Petermca/whisper dedup (#15365)
### Description
Apply `get_shared_initializers()` to the encoder and decoder subgraphs
of Whisper before chaining and exporting the full, final model.


### Motivation and Context
The Whisper export process has some overlap between the encoder and
decoder subgraphs due to the format of the BeamSearch contrib op.
Consequently, there is some shared model data that is duplicated in the
final exported product, which can result in a file size increase of
~40%. This PR takes the methods in `convert_generation.py` and applies
them during the whisper export process.
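The size overhead comes from the same weights being stored once per subgraph. A simplified sketch of the sharing idea, assuming initializers are modeled as name-to-bytes dictionaries (a hypothetical representation): this is not `get_shared_initializers()` itself, which works on ONNX graphs and may differ in details such as how references are renamed.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical simplified model: each subgraph maps initializer name -> raw bytes.
Initializers = Dict[str, bytes]


def share_initializers(encoder: Initializers,
                       decoder: Initializers) -> Tuple[Initializers, Initializers, Initializers]:
    """Move initializers with identical content in both subgraphs into one shared set.

    Returns (shared, encoder_only, decoder_only). Shared tensors keep the
    encoder's name; a real implementation must also rewrite decoder references.
    """
    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    decoder_by_digest = {digest(data): name for name, data in decoder.items()}
    shared: Initializers = {}
    encoder_only: Initializers = {}
    matched_decoder_names = set()
    for name, data in encoder.items():
        match = decoder_by_digest.get(digest(data))
        if match is not None and decoder[match] == data:  # confirm, don't trust the hash alone
            shared[name] = data
            matched_decoder_names.add(match)
        else:
            encoder_only[name] = data
    decoder_only = {n: d for n, d in decoder.items() if n not in matched_decoder_names}
    return shared, encoder_only, decoder_only
```

A tensor present in both subgraphs is then stored once in the parent graph instead of twice, which is where the file-size saving comes from.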

---------

Co-authored-by: Peter McAughan <petermca@microsoft.com>
2023-04-06 13:27:05 -07:00
Dmitri Smirnov
dc1845a9c8
Update mimalloc dependancy to the latest release (2.1.1) for Windows build. (#15382)
### Description
Update mimalloc dependency.

### Motivation and Context
The latest release contains important fixes, including fixes for memory
leaks, and is used by customers.
2023-04-06 13:07:00 -07:00
petermcaughan
d0cca91cfb
Fix token_id values for whisper export (#15362)
### Description
The current ONNX export of Whisper utilizes hard-coded values for
token_ids when configuring the BeamSearch node. This PR removes these
literals and instead takes these values straight from the WhisperConfig.



### Motivation and Context
Hard-coding these values can cause some parity issues when comparing to
default PyTorch behavior - this change to take from WhisperConfig
resolves these.

Co-authored-by: Peter McAughan <petermca@microsoft.com>
2023-04-06 11:01:21 -07:00
Deokhwan Kim
55495cc809
Do not apply QuickGeluFusion if an intermediate tensor is a graph output (#15109) 2023-04-06 10:17:06 -07:00
Stephan Gocht
026fb3ca1e
Fix compilation error when CUDNN_HOME is defined. (#15348) 2023-04-06 08:56:20 -07:00
Sheil Kumar
0fbbb6a43e
WindowsAI build failing due to deprecated .NET5 SDK missing in build image (#15383)

.NET5 was deprecated last year, and the build machine images have
recently been updated to no longer include this SDK.
Unblock the failing builds by force-installing the .NET5 SDK as part of
the build pipeline.
2023-04-06 08:51:07 -07:00
Changming Sun
a5b4d2a8a7
XNNPack: allow users to choose whether enable CPU MEM arena or not (#15392)
### Description
XNNPack: allow users to choose whether to enable the CPU memory arena.
Right now it is hardcoded to true and is not affected by the on/off
switch in SessionOptions. We should make it work.

### Motivation and Context
Since SessionOptions has such a switch, it should work as expected.
2023-04-06 15:43:13 +08:00
Hariharan Seshadri
ca68ab6126
Support decoder masked self attention for greedy sampling (#15319) 2023-04-05 23:08:43 -07:00
cloudhan
71a4e7eb97
Automatically enable tunable op usage for production models (#15156)
Split `IsTunbaleOpEnable` semantics into **enable tunable op for using**
and **enable tunable op for tuning**.

Both remain disabled by default for safety. However, if
- the session is created from an ONNX model with embedded tuning results, and
- the embedded tuning results are set on the EP without an error `Status`,

then we automatically enable using TunableOp, while tuning remains disabled.

The planned options will be
- `tunable_op_enable`: The top-level switch of `TunableOp`; indicates whether we will run into `TunableOp`-related logic. **NOTE:** most of our impls have a bottom impl that acts as a fallback and is set as the default. In this case, we still call into the `TunableOp`, but no kernel selection, kernel tuning, or caching is involved. This reduces our maintenance burden of a duplicate code path.
- `tunable_op_tuning_enable`: The secondary switch of `TunableOp`; indicates whether we will run into the tuning-related logic of `TunableOp`.
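One way to read how the two switches and the auto-enable rule combine, as a hypothetical Python sketch: the option names come from the list above, but the classes, functions, and control flow are illustrative assumptions, not the ROCm EP's C++ implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TunableOpOptions:
    tunable_op_enable: bool = False         # top-level switch
    tunable_op_tuning_enable: bool = False  # secondary switch, only meaningful if enabled


def effective_options(user: TunableOpOptions, embedded_results_applied: bool) -> TunableOpOptions:
    """Auto-enable *using* TunableOp when embedded tuning results were applied
    without error; tuning itself stays as the user configured it."""
    enable = user.tunable_op_enable or embedded_results_applied
    return TunableOpOptions(enable, enable and user.tunable_op_tuning_enable)


def select_kernel(opts: TunableOpOptions, op_key: str, results: Dict[str, str],
                  tune: Callable[[], str], default: str = "default") -> str:
    """Pick a kernel id for one op signature (hypothetical helper)."""
    if not opts.tunable_op_enable:
        return default                # fallback impl: no selection, tuning, or caching
    if op_key in results:
        return results[op_key]        # reuse a known tuning result
    if opts.tunable_op_tuning_enable:
        results[op_key] = tune()      # tune once and cache the winner
        return results[op_key]
    return default
```

Under this reading, a model shipped with embedded results runs the tuned kernels out of the box, but never pays the cost of tuning unless the user also turns on `tunable_op_tuning_enable`.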

Then for the possible future options:
- `tunable_op_tuning_max_iteration`: blahblah
- `tunable_op_tuning_max_duration_ms`: blahblah
- `tunable_op_flash_attention_enable`: blahblah, for example only, we will not have this.

Developer-oriented environment variables exist for developers' convenience, to inspect the performance impact of tuning. So there are only `ORT_ROCM_TUNABLE_OP_ENABLE` and `ORT_ROCM_TUNABLE_OP_TUNING_ENABLE`, which give fine-grained control over the combinations.
2023-04-06 13:52:47 +08:00
Jian Chen
2e52de265a
Upgrade remainding python to 3.11 removing 3.7 (#15321)
### Description
Upgrade the remaining Python versions to 3.11, removing 3.7.


### Motivation and Context
2023-04-05 21:43:51 -07:00
Thuy Dao
6e1e808ec8
fix error unqualified call to 'std::move' (#15347) 2023-04-05 20:40:30 -07:00
Yi Zhang
962d8d2b19
Add compilation cache in react native CI (#15329)
### Description
1. Replace jobs with stages for easier debugging and maintenance.
2. Add a compilation cache to accelerate the workflow.
3. Split building protobuf and building the main code into 2 tasks.



### Motivation and Context
Reduces compilation time by about one hour.
test run:

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=943695&view=logs&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3&t=8b360243-7783-51da-8079-2304089d3d1d
2023-04-06 10:39:14 +08:00
Aditya Goel
a7d321e9dc
String to string label encoder (#15379) 2023-04-05 14:04:34 -07:00
Leso_KN
ea6b32fea8
Fix: Add def main() in onnxruntime_test.py (#15208) 2023-04-05 12:31:39 -07:00
Adam Pocock
ef11032c89
[java] Allows the creation and extraction of zero length tensors (#15116)
### Description
Allows the creation of zero length tensors via the buffer path (the
array path with zero length arrays still throws as the validation logic
to check it's not ragged would require more intrusive revision), and
allows the `tensor.getValue()` method to return a Java multidimensional
array with a zero dimension. Also added a test for the creation and
extraction behaviour.

### Motivation and Context
The Python interface can return zero length tensors (e.g. if object
detection doesn't find any objects), but before this PR, calling
`tensor.getValue()` in Java threw an exception with a confusing error
message. Fixes #7270 & #15107.
2023-04-05 10:49:59 -07:00
Patrice Vignola
9191e04259
[DML EP] Add QuickGelu (#15220) 2023-04-05 10:49:34 -07:00
Justin Chu
a96e19abc4
Add type annotations to onnxruntime_inference_collection.py (#15364)
### Description

Add type annotations to `onnxruntime_inference_collection.py`



### Motivation and Context

Fixes #15334
2023-04-05 10:32:49 -07:00
Chen Fu
764e489a00
Adding FP16 Global Average Pool operator (#15324)
### Description
Adding FP16 Global Average Pool operator


### Motivation and Context

Support FP16 CPU inference.
2023-04-05 09:38:02 -07:00
Aditya Goel
a4e9a48345
Reduce operators support for int64 type (#15358) 2023-04-05 09:19:43 -07:00
Edward Chen
9f5aa8e021
Add clog back to onnxruntime_EXTERNAL_LIBRARIES. (#15363)
### Description

Add clog back to onnxruntime_EXTERNAL_LIBRARIES.

### Motivation and Context

Fix iOS packaging pipeline build failure.
2023-04-05 09:11:19 -07:00
Hector Li
a0d8dbe28d
Register Resize op into nhwc schema for Qnn EP (#15373)
### Description
Register Resize op into nhwc schema for Qnn EP.

### Motivation and Context
The Resize op is identified as a layout-sensitive op for QNN EP, so it
needs to be registered in the NHWC schema.
2023-04-05 08:41:16 -07:00
George Wu
4db10c93d1
[TensorRT EP] make --use_tensorrt_builtin_parser the default behavior in build.py (#15320)
Change the default behavior to link against the nvonnxparser library
(onnx-tensorrt parser) that is included with the TensorRT package.
Previously, the default behavior was to build and statically link
against the OSS onnx-tensorrt parser.
Historically, we wanted to incorporate the latest commits/fixes from the
OSS parser.
These days the OSS parser is not significantly different from the
included parser library, so there is less reason to build against it by
default.
The major benefit of linking with the parser shared library from the
TensorRT package is that it's much easier to build/link against a minor
version update of TensorRT. OnnxRuntime can also be updated to a new
TensorRT minor version by simply replacing the TensorRT libraries with
the newer version (because the parser is no longer statically linked
into onnxruntime).

Added --use_tensorrt_oss_parser to build.py to support the previous
default behavior. (build + static link OSS parser)
2023-04-05 07:53:29 -07:00
pengwa
fe0db63dee
Upstream reshape of merging batch/sequence (#15023)
### Upstream reshape of merging batch/sequence

For a Reshape node that fulfills the following requirements:
- input data rank = 3
- the target shape input is a constant initializer, and the untouched dim value MUST be a
constant value.
- Reshape merges the first two dimensions (batch and sequence), so output data rank = 2.
We upstream such a Reshape node to make it run as early as possible.
Doing this will allow us to upstream other operators (Gather) that are
blocked by this kind of Reshape node.

Currently, we have not enabled it in graph_transformer_utils, since the
combined upstream Gather changes are not ready yet.

Before:

![image](https://user-images.githubusercontent.com/10530022/224698252-f9705082-9710-4385-95ec-f1ccf50dc0e3.png)

After:

![image](https://user-images.githubusercontent.com/10530022/224698381-7e124d0d-ba47-4f35-8e37-6015014cd1c4.png)
2023-04-05 18:51:07 +08:00
Baiju Meswani
6b755debbc
Miscellaneous updates to training artifact generation (#15315) 2023-04-04 20:09:51 -07:00
Nhat Nguyen
198994d01d
Register PytorchAtenDomain in RegisterOrtOpSchemas (#14567) 2023-04-04 17:34:13 -07:00
Hariharan Seshadri
5294cd0c55
Print value errors in ort.InferenceSession to user (#15360) 2023-04-04 16:01:24 -07:00
Anton Korablin
207c57219a
Add support for full ViT optimization (#15289)
Add support for ViT optimization in optimizer.py.
As the ViT architecture follows BERT rather closely, we can easily reuse
BERT fusions for ViT. The only difference is that ViT does not have an
attention mask, which means there is no Add node in the QK paths.
Make the necessary changes in onnx_exporter.py so that the optimizations
can be covered by tests.
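The only structural difference the fusion has to tolerate is the missing mask. A minimal single-head sketch in plain Python (hypothetical helper names; it shows the math, not ORT's fusion code or the `Attention` contrib op): BERT-style models add an additive mask to the scaled QK scores before Softmax, which is the Add node on the QK path, while ViT skips that step.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs(q: Matrix, k: Matrix, mask: Optional[Matrix] = None) -> Matrix:
    """softmax(Q K^T / sqrt(d) [+ mask]) for a single head."""
    d = len(q[0])
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
              for q_row in q]
    if mask is not None:  # BERT: the Add node on the QK path; ViT: no mask, no Add
        scores = [[s + m for s, m in zip(s_row, m_row)] for s_row, m_row in zip(scores, mask)]
    return [softmax(row) for row in scores]
```

An all-zero mask gives the same result as no mask, which is why the BERT fusions carry over once the pattern matcher treats the Add as optional.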
2023-04-04 14:05:24 -07:00