Commit graph

8477 commits

Author SHA1 Message Date
Edward Chen
9f5aa8e021
Add clog back to onnxruntime_EXTERNAL_LIBRARIES. (#15363)
### Description

Add clog back to onnxruntime_EXTERNAL_LIBRARIES.

### Motivation and Context

Fix iOS packaging pipeline build failure.
2023-04-05 09:11:19 -07:00
Hector Li
a0d8dbe28d
Register Resize op into nhwc schema for Qnn EP (#15373)
### Description
Register Resize op into nhwc schema for Qnn EP.

### Motivation and Context
The Resize op is identified as a layout-sensitive op for the QNN EP, so it needs to be
registered in the NHWC schema.
2023-04-05 08:41:16 -07:00
George Wu
4db10c93d1
[TensorRT EP] make --use_tensorrt_builtin_parser the default behavior in build.py (#15320)
Change the default behavior to link against the nvonnxparser library
(onnx-tensorrt parser) that is included with the TensorRT package.
Previously, the default behavior was to build and statically link
against the OSS onnx-tensorrt parser.
Historically, we wanted to incorporate the latest commits/fixes from OSS
parser.
These days the OSS parser is not significantly different from the
included parser library so there is less reason to build against it by
default.
The major benefit of linking against the parser shared library shipped
with TensorRT is that it is much easier to build and link against a
minor version update of TensorRT: ONNX Runtime can pick up a new
TensorRT minor version simply by swapping in the newer TensorRT
libraries, because the parser is no longer statically linked into
onnxruntime.

Added --use_tensorrt_oss_parser to build.py to restore the previous
default behavior (build and statically link the OSS parser).
2023-04-05 07:53:29 -07:00
pengwa
fe0db63dee
Upstream reshape of merging batch/sequence (#15023)
### Upstream reshape of merging batch/sequence

For a Reshape node that fulfills the following requirements:
- input data rank = 3
- the input shape is a constant initializer, and the untouched dim value MUST be a
constant value.
- the Reshape merges the first two dimensions, so output data rank = 2.

We upstream it to make it run as early as possible. Doing this will
allow us to upstream other operators (e.g. Gather) that are blocked by
that kind of Reshape node.

Currently, this is not enabled in graph_transformer_utils, since the
combined upstream-gather changes are not ready yet.

Before:


![image](https://user-images.githubusercontent.com/10530022/224698252-f9705082-9710-4385-95ec-f1ccf50dc0e3.png)


After:


![image](https://user-images.githubusercontent.com/10530022/224698381-7e124d0d-ba47-4f35-8e37-6015014cd1c4.png)
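The Reshape pattern being upstreamed can be sketched in plain Python (an illustrative sketch using nested lists, not the transformer implementation):

```python
# Illustrative sketch (not the actual transformer code): fold a rank-3
# [batch, sequence, hidden] input into rank-2 [batch * sequence, hidden],
# i.e. merge the first two dimensions.
def merge_batch_sequence(data):
    return [row for batch in data for row in batch]

# 2 batches x 3 sequence positions x 4 hidden features
x = [[[b * 100 + s * 10 + h for h in range(4)] for s in range(3)] for b in range(2)]
y = merge_batch_sequence(x)
print(len(y), len(y[0]))  # 6 4
```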
2023-04-05 18:51:07 +08:00
Baiju Meswani
6b755debbc
Miscellaneous updates to training artifact generation (#15315) 2023-04-04 20:09:51 -07:00
Nhat Nguyen
198994d01d
Register PytorchAtenDomain in RegisterOrtOpSchemas (#14567) 2023-04-04 17:34:13 -07:00
Hariharan Seshadri
5294cd0c55
Print value errors in ort.InferenceSession to user (#15360) 2023-04-04 16:01:24 -07:00
Anton Korablin
207c57219a
Add support for full ViT optimization (#15289)
Add support for ViT optimization in optimizer.py.
Since the ViT architecture follows BERT rather closely, we can easily
reuse the BERT fusions for ViT. The only difference is that ViT has no
attention mask, which means there is no Add node in the qk paths.
Also make the necessary changes in onnx_exporter.py so the
optimizations are covered by a test.
2023-04-04 14:05:24 -07:00
Aditya Goel
1c1d386561
Adds int32_t and uint32_t clip kernels (#15306) 2023-04-04 13:44:50 -07:00
Hariharan Seshadri
adb3d5dcb9
Allow constant folding nodes that have missing optional inputs (#15344) 2023-04-04 11:55:37 -07:00
Severin Simmler
4400e80452
Allow Path objects for deserialization of ONNX models (#15307) 2023-04-04 11:38:00 -07:00
Jian Chen
af28754e6f
Update python package pipeline to support 3.11 (#15311)
### Description
Update python package pipeline to support 3.11

### Motivation and Context
2023-04-04 10:55:32 -07:00
Ye Wang
0412bffbb4
fix build bug when enabling DEBUG_GENERATION (#15338) 2023-04-04 09:44:07 -07:00
petermcaughan
1251964f96
Petermca/beamsearch whisper (#15339)
### Description
Adjust various code paths to allow Whisper model to function with
BeamSearch op.

Approach: Add a new kModelType enum value in IGenerationParameters as
so:
#### Old: 0 = GPT2, 1 = T5
#### New: 0 = GPT2, 1 = T5, 2 = Whisper

When the user assigns this attribute value to 2, various shape and type
checks are changed to accommodate Whisper inputs.
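The type-based dispatch can be sketched as follows (a hypothetical Python illustration; the real shape checks live in the C++ BeamSearch implementation, and the helper name and exact ranks here are assumptions):

```python
# Hypothetical mirror of the kModelType values described above.
GPT2, T5, WHISPER = 0, 1, 2

def expected_input_rank(model_type):
    """Token-id inputs are 2-D [batch, sequence]; Whisper instead feeds
    3-D float audio features, so its shape check expects rank 3."""
    return 3 if model_type == WHISPER else 2

print(expected_input_rank(T5), expected_input_rank(WHISPER))  # 2 3
```

When the attribute is set to 2, each such check takes the Whisper branch.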


### Motivation and Context
BeamSearch is currently designed to function with BERT-based models with
inputs as vocab tokens, and needs changes to function with Whisper
inputs (3-D float values processed from audio data).

---------

Co-authored-by: Peter McAughan <petermca@microsoft.com>
2023-04-04 09:09:10 -07:00
Yi Zhang
b54ca9a041
Read the cache in main build if it's a (Intermediate)merge branch. (#15330)
### Description
In a merge branch, a run only reads the cache generated in the main
build. As a result, runs in a merge branch will not upload a new cache
after the first one.

### Motivation and Context
1. Reduce the cache storage.
If there are big changes, devs should trigger the specific builds
manually in https://dev.azure.com/onnxruntime/onnxruntime/_build. Such
a run still reads its own branch's cache.
2023-04-04 20:21:05 +08:00
pengwa
5baf5f506b
log level control + fix typos (#15302)
### log level control + fix typos
2023-04-04 20:19:13 +08:00
petermcaughan
f30e2d4387
Whisper Export (#15247)
### Description
Add scripts to export Whisper model to ONNX and integrate the ORT
BeamSearch op with the resulting graphs.

Example command to execute this script:

python convert_to_onnx.py -m openai/whisper-large --output whisper -e

---------

Co-authored-by: Peter McAughan <petermca@microsoft.com>
2023-04-04 05:01:04 -07:00
Tianlei Wu
3cf3fa0467
Fix prefast warnings (#15340)
Fix prefast warnings: 
(1) Arithmetic overflow: Using operator '*' on a 4 byte value and then
casting the result to a 8 byte value. Cast the value to the wider type
before calling operator '*' to avoid overflow (io.2).
(2) Dereferencing NULL pointer 'key'.
2023-04-03 22:29:13 -07:00
Ye Wang
dec11afb83
Fix a prefast warning (#15343)
### Motivation and Context

https://aiinfra.visualstudio.com/ONNX%20Runtime/_workitems/edit/14272/?triage=true
2023-04-03 18:25:25 -07:00
Hector Li
44027797b0
[QNN EP] Gather support int64 indices input (#15317)
### Description
Support int64 indices input for the Gather op.

### Motivation and Context
Support more scenarios.
2023-04-03 17:51:42 -07:00
Matthieu Darbois
85bb13345d
Rework some external targets to ease building with -DFETCHCONTENT_FULLY_DISCONNECTED=ON (#15323)
### Description
Rework some external targets to ease building with
`-DFETCHCONTENT_FULLY_DISCONNECTED=ON`
This will allow package managers to more easily provide an onnxruntime
package by reducing the amount of patching needed downstream at each
version.

### Motivation and Context
Availability of onnxruntime in some C++ package managers
https://github.com/microsoft/onnxruntime/issues/7150
https://github.com/conan-io/conan-center-index/issues/16699
https://github.com/microsoft/vcpkg/issues/20548

My initial intent is to get this in conan but the PR would most likely
be useful (though not tested) to vcpkg as well (and maybe others).
I tried to get only a first batch of not too specific patches (i.e. not
specific to conan).

The first commit reworks `flatbuffers` and just extends what @snnn did
in https://github.com/microsoft/onnxruntime/pull/13991
The second commit reworks `pytorch_cpuinfo`
The third commit reworks `google_nsync`
2023-04-03 17:45:12 -07:00
RandySheriffH
e4aae94f20
Remove azure build to unblock PRs (#15336)
Temporarily remove the Azure build check to unblock PR(s).
We need to investigate the sudden build failure and re-enable it.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-04-03 12:47:14 -07:00
Ye Wang
fbfe92f66a
DecoderMaskedMultiHeadAttention enhancement (#15292) 2023-04-02 21:53:03 -07:00
Sheil Kumar
7ccdf9ad8c
User/sheilk/sequence fix (#15239)
Ensure that Loop operators run on CPU.
Fix memcpy for Sequence Tensors, so that empty sequences (like when
SequenceEmpty runs on DirectML) can be copied back to CPU.
2023-03-31 12:57:25 -07:00
Dmitri Smirnov
c06ab5e353
Optimize use of Eigen::DenseBase::select() for PRelu (#15287)
MSVC and gcc are both bad at optimizing select(), even in trivial
usage outside of ORT.
gcc seems to do better with -ffast-math (not used by ORT), but /fp:fast
does nothing for MSVC.
This PR delivers a 33% speedup on the same model (360 us -> 270 us on
Windows; 205 us -> 153 us on Linux; measured on different systems).

TODO: Examine and fix Elu and other similar activation functions for the
use of `Eigen::select`
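For reference, the elementwise math that `select()` expresses for PRelu, sketched in Python (an illustration of the operator's semantics, not the Eigen code):

```python
# Elementwise PRelu semantics: x for x > 0, slope * x otherwise.
def prelu(xs, slope):
    return [x if x > 0 else slope * x for x in xs]

print(prelu([-2.0, 0.0, 3.0], 0.1))  # [-0.2, 0.0, 3.0]
```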

  Co-authored-by: @fpribeiro

2023-03-31 11:20:07 -07:00
shalvamist
fff75a301c
ORT_Web - JS graph parsing update (#15185)
### Description
Simplified the JS graph parsing logic, fixing the bug reported in
GitHub issue #15006.
2023-03-31 09:26:55 -07:00
Yufeng Li
c68044cc4b
fix prefast warning for GenerationCudaDeviceHelper::ProcessLogits (#15163) 2023-03-31 08:50:53 -07:00
Yufeng Li
c08d6b42e8
Add tool to support packing mode for BERT model (#15283)
### Description
Add a tool to convert a fused BERT-like model to packing mode.


2023-03-31 08:46:47 -07:00
cloudhan
027e231a83
Report unsupport reason during tuning (#15246) 2023-03-31 16:54:11 +08:00
JiCheng
60cc082f0a
[NNAPI] Minor fix (#15052)
### Description

Followed by https://github.com/microsoft/onnxruntime/pull/14881



---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-31 15:13:57 +08:00
PeixuanZuo
d80859f63d
[ROCm] fix python packaging pipeline and add python10 (#15282)
The ROCm python packaging pipeline failed because of the manylinux
version and manylinux.patch updates.
1. Fix the duplicate `epel-release` installation issue: the ROCm
pipeline installs it at the beginning of the dockerfile to install the
ROCm libs, so remove the duplicate installation in
install-runtime-packages.sh.
```
/var/tmp/yum-root-sMRl36/epel-release-latest-7.noarch.rpm: does not update installed package.
Error: Nothing to do
```
2. Add Python 3.10 to fix the error below.
```
+ /opt/python/cp310-cp310/bin/python -m venv /opt/_internal/tools
build_scripts/finalize.sh: line 40: /opt/python/cp310-cp310/bin/python: No such file or directory
```
3. Add Python 3.10 to the ROCm pipeline.

pipeline link:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=294776&view=results
2023-03-31 10:25:21 +08:00
Baiju Meswani
e870089ca8
Refining the offline tooling for training artifact generation (#15212) 2023-03-30 18:05:51 -07:00
Pranav Sharma
818b94b4ea
Add owners for public facing API files (#15288)
### Description
Add owners for public facing API files

### Motivation and Context
Tighter control on the APIs
2023-03-30 17:16:15 -07:00
Chen Fu
605c2f4b89
Remove fp16 support from apple (#15270)
### Description

Removing fp16 support from apple build


### Motivation and Context
FP16 support on ARM64 is only available from armv8.2-a on, so the clang
compiler needs the compilation flag `-march=armv8.2-a+fp16`.
Unfortunately, our current universal build does not support
hardware-specific compilation flags on cpp source files, as they would
cause trouble when compiling against more than one hardware target.
Until we figure out how to remove this limitation, we had to disable
fp16 support for Apple systems.
2023-03-30 16:44:26 -07:00
Guenther Schmuelling
4645726d74
fix for webgl lrn (#15236)
Fix an issue that resulted in wrong results for LRN on WebGL.
2023-03-30 16:16:57 -07:00
Edward Chen
9f942e1a3e
Graph transformer to ensure unique DQ nodes for QDQ node units (#15145)
### Description

Add required graph transformer to duplicate DQ nodes to ensure that QDQ
node units have unique DQ nodes. This condition is necessary for QDQ
node unit processing.

### Motivation and Context

There is an existing Python utility that does this: 

c7ced7a5e9/tools/python/util/qdq_helpers/qdq_model_utils.py (L77)

This PR implements it as a graph transformer so it is integrated into
ORT and does not require a separate step to update the model. There are
also tests to ensure that its effects are not undone by basic level
graph optimizations.
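The duplication rule can be sketched in Python on a toy graph (a hypothetical illustration of the idea, not the actual ORT graph transformer or graph API):

```python
# Toy-graph sketch of the duplication rule: a DequantizeLinear (DQ) node
# whose output feeds more than one consumer is cloned, one copy per
# consumer, so every QDQ node unit gets its own unique DQ.
def duplicate_shared_dq(nodes):
    result = []
    for node in nodes:
        if node["op"] == "DequantizeLinear" and len(node["consumers"]) > 1:
            for i, consumer in enumerate(node["consumers"]):
                clone = dict(node, out=f"{node['out']}_dup{i}", consumers=[consumer])
                result.append(clone)
        else:
            result.append(node)
    return result

graph = [{"op": "DequantizeLinear", "out": "x", "consumers": ["Conv_0", "Conv_1"]}]
print([n["out"] for n in duplicate_shared_dq(graph)])  # ['x_dup0', 'x_dup1']
```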
2023-03-31 08:39:43 +10:00
Xavier Dupré
786f8b98f7
Add a page in the documentation for every operator in onnxruntime (#14340) 2023-03-30 14:39:16 -07:00
yf711
dc61d3b5b6
Fix symbolic shape inference script on precision loss issue (#15215)
### Description
When calculating a symbolic shape like `mul(get_int_val(values=[1024,
0.5]))`, the current script calls `get_int_val()` to get the values,
which turns them into `[1024, 0]`.
Thus the result of `mul(values)` -> `mul([1024, 0])` = 0, but the
expected shape size is 512.

Fix: for math binary operations like `mul()` and `div()`,
don't convert the input shapes into integers if any precision loss
could happen;
keep the input shape as float, finish the operation, then cast the
final result to an integer and output the shape.

Test cases are added:
1. mul(1024, 0.5) => 512 (before this fix, the output would be 0, as
float 0.5 would be converted to int 0)
2. div(768, 1.5) => 512 (before this fix, the output would be 768, as
float 1.5 would be converted to int 1)
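The fix can be sketched in Python (an illustrative sketch, not the actual symbolic_shape_infer.py code):

```python
# Sketch of the precision-loss fix: keep operands as floats through the
# math, cast the result to int only at the end.
def shape_mul_buggy(values):
    ints = [int(v) for v in values]  # 0.5 -> 0: precision lost up front
    out = 1
    for v in ints:
        out *= v
    return out

def shape_mul_fixed(values):
    out = 1.0
    for v in values:
        out *= v          # stay in float while computing...
    return int(out)       # ...and cast the final result once

print(shape_mul_buggy([1024, 0.5]), shape_mul_fixed([1024, 0.5]))  # 0 512
```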

2023-03-30 12:15:27 -07:00
cao lei
c2dad6893b
use cudaStreamNonBlocking flag (#15258)
### Description
This PR uses the cudaStreamNonBlocking flag when creating the CUDA
stream, meaning the created stream will run concurrently with the
default stream, with no implicit synchronization against it.



### Motivation and Context
This change is needed to address a performance concern.
2023-03-30 11:43:50 -07:00
Changming Sun
75f6861cb8
Skip DNNL's opset18 tests (#15275)
### Description
The DNNL EP doesn't support opset 18 yet, so skip those tests so that
we can still test the other EPs.

The models mentioned above are ONNX node tests that live in
github.com/onnx/onnx
2023-03-30 09:58:11 -07:00
Scott McKay
6d464748ba
Make internal nhwc schema registrations complete (#15278)
### Description
Add all the ONNX layout sensitive ops from opset 11 on. 
Make list in transpose optimizer consistent.

### Motivation and Context
When we run L1 optimizers after layout transform in a full build it
needs a schema for any layout sensitive ops that get converted to the
internal domain. Previously we did not run L1, so we got away without
having schemas unless the EP used a static kernel for the nhwc version
of the op.
2023-03-30 08:55:14 -07:00
Yi Zhang
c5f5e3ec5e
Improve 2 cache tasks in one pipeline yaml (#15267)
### Description
1. Make the 2 cache tasks in one pipeline really work.
2. Each build step has its own CCACHE_DIR environment variable instead
of job variables.
3. The external Protobuf compilation cache only updates with deps.txt;
it doesn't generate a new cache on every commit.


### Motivation and Context
The simple workflow is as below
```
--------build with ccache-------             
         |                       
        cache                    
         |                       
      {CCACHE_DIR}-----cache stat.
```

```
-------Cache@2------
           |
    download cache           
           |                         
          {path}--------upload cache
```

1. {XXX} means an environment variable or task input.
2. {CCACHE_DIR} must be consistent with {path}: ccache produces caches
in {CCACHE_DIR}, and Cache@2 downloads the cache into {path}, then tars
{path} and uploads it.
3. The Protobuf cache only changes with deps.txt, which reduces the
storage size.
4. As a next step, we may split the compilation into 2 steps, one for
external dependencies and another for ORT.
2023-03-30 23:22:11 +08:00
Yi Zhang
aab3c15585
Add Compliation Cache in CoreML pipeline (#15259)
### Description
1. Move the cache task definition into a template.
2. In debug mode the compiler mtime differs between machines, so change
CCACHE_COMPILERCHECK to content.


### Motivation and Context
1. Accelerate the CoreML pipeline.
Test run:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=938040&view=logs&j=1ac7588f-a5bd-5ff7-4a8a-a34869d50220
With the cache, the run finishes in 12 minutes; without it, it takes
about 1 hour.
2. Make the cache function easy to use and maintain.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-30 23:18:52 +08:00
Yulong Wang
2928fda490
[web] disable browser test temporarily (#15280)
### Description
This PR disables browser test temporarily. The test randomly fails and
we are investigating the issue. Disable the test to unblock others.
2023-03-30 08:15:36 -07:00
Changming Sun
15f7dca9fb
Update protobuf to 3.21.x (#15245)
### Description

Fixed
[AB#10092](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/10092),
[AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753),
[AB#11759](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11759)

### Motivation and Context
The version we use has a security issue in its Java package, though we
don't actually use that version's protobuf Java package.
2023-03-29 14:08:18 -07:00
Changming Sun
5d1dbfb432
Update ONNX test data (#15256)
Change the test data version from 1.13.0 to 1.13.1, which will include some bug fixes.
2023-03-29 13:13:11 -07:00
Changming Sun
4a0b86eba6
Update the post-merge pipeline (#14965)
### Description
1.  Remove Linux jobs for ORT-Extension combined build
2.  Add a macOS build job for ORT-Extension combined build
3. Adjust the yaml file so that it can support two different ADO
instances.


### Motivation and Context
To test our code better: it will enable us to run such tests for every
commit in the main branch, making it easier to figure out which change
caused a build break.

See
[AB#13435](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13435)
2023-03-29 13:12:07 -07:00
Changming Sun
fb1f03fdff
Increase the timeout value of win-wasm-ci.yml (#15257) 2023-03-29 13:11:51 -07:00
FFFrog
ecb89ed752
[CANN] Multi-stream execution support for CANN EP. (#14058)
### Description
**Multi-stream** execution support for **CANN EP**.

### Motivation and Context
**CANN EP** is currently **unavailable** due to the introduction of a
new mechanism for multi-stream execution
[#13495](https://github.com/microsoft/onnxruntime/pull/13495), the
deletion of the Fence-based synchronization mechanism, and the failure
to update the relevant logic of **CANN EP** synchronously.

This PR is to fix it.
2023-03-29 11:57:22 -07:00
Adrian Lizarraga
febc69e1b2
[QNN EP] Support Cast in HTP backend (#15234)
### Description
Adds support for the Cast operator to the QNN HTP backend.



### Motivation and Context
Enable more models to run on QNN HTP backend.
2023-03-29 11:01:34 -07:00