10986 commits

Author SHA1 Message Date
Sumit Agarwal
f4f49535a4
[ORT 1.18.2] Cherry Pick Pad Optimizations + Update DML to 1.15.1 (#21670)
### Description
This change cherry-picks 2 Pad fusion optimizations:
https://github.com/microsoft/onnxruntime/pull/21640 and
https://github.com/microsoft/onnxruntime/pull/21556.

It also cherry-picks 2 extra changes to unblock pipeline and
dependency failures: https://github.com/microsoft/onnxruntime/pull/21300
and https://github.com/microsoft/onnxruntime/pull/21662 (without the
tests that are part of the 1.18.1 payload).

Also uploaded a new version of
[onnxruntime_build_dependencies:1.0.177](https://dev.azure.com/onnxruntime/onnxruntime/_artifacts/feed/onnxruntime/UPack/onnxruntime_build_dependencies/overview/1.0.177)
and updated `download-deps.yml` to match.

It also updates the DML binary to 1.15.1.




---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2024-08-12 07:02:00 -07:00
Adrian Lizarraga
387127404e
[ORT 1.18.1 Release] Update ORT numpy dependency to >=1.21.6,<2.0 (#21141)
### Description
Updates the version of numpy required by onnxruntime to >=1.21.6,<2.0



### Motivation and Context
Numpy released version 2.0. The onnxruntime 1.18.1 release uses
numpy < 2.0, so the requirement files need to install only versions
from 1.21.6 (inclusive) up to, but not including, 2.0.
2024-06-24 11:24:55 -07:00
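As a rough illustration of the range introduced in 387127404e, the sketch below checks whether a version string falls inside `>=1.21.6,<2.0`. The helpers `parse_version` and `numpy_version_allowed` are hypothetical names, not part of onnxruntime, and the parser is a simplification that only reads leading numeric components; real requirement files rely on the installer's own specifier handling.

```python
def parse_version(text):
    """Turn '1.26.4' into (1, 26, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def numpy_version_allowed(version, lower="1.21.6", upper="2.0"):
    """Return True when lower <= version < upper (lower inclusive, upper exclusive)."""
    parsed = parse_version(version)
    return parse_version(lower) <= parsed < parse_version(upper)


if __name__ == "__main__":
    for candidate in ["1.21.5", "1.21.6", "1.26.4", "2.0.0"]:
        print(candidate, numpy_version_allowed(candidate))
```

Tuple comparison keeps the lower bound inclusive and the upper bound exclusive, which matches the wording of the commit message.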
Yifan Li
d0aee204af
[ORT 1.18.1 Release] Cherry pick 3rd round (#21129)
### Description
Adding critical TensorRT EP support



---------

Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: wejoncy <wejoncy@163.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Yi Zhang <your@email.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: cao lei <jslhcl@gmail.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: inisis <46103969+inisis@users.noreply.github.com>
Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: mo-ja <60505697+mo-ja@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Sumit Agarwal <sumitagarwal330@gmail.com>
Co-authored-by: Atanas Dimitrov <70822030+neNasko1@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Dhruv Matani <dhruvbird@gmail.com>
Co-authored-by: Dhruv Matani <dhruv.matani@grammarly.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Xiaoyu <85524621+xiaoyu-work@users.noreply.github.com>
Co-authored-by: Xu Xing <xing.xu@intel.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Jian Chen <cjian@microsoft.com>
Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Andrew Fantino <15876180+afantino951@users.noreply.github.com>
Co-authored-by: Thomas Boby <thomas@boby.uk>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Michal Guzek <mguzek@nvidia.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2024-06-24 10:02:38 -07:00
Yifan Li
8bfcf14b42
[ORT 1.18.1 Release] update 1.18.1 patch release version (#21143)
2024-06-24 09:20:31 -07:00
Yifan Li
25ab935664
[ORT 1.18.1 Release] Cherry pick 2nd round (#21111)
### Description
[#21062](https://github.com/microsoft/onnxruntime/pull/21062),
[#21096](https://github.com/microsoft/onnxruntime/pull/21096) to fix
Xamarin,

[#21095](https://github.com/microsoft/onnxruntime/pull/21095) and
[#21109](https://github.com/microsoft/onnxruntime/pull/21109) to fix
python on NuGet_Packaging stages

[#21104](https://github.com/microsoft/onnxruntime/pull/21104) to remove
failing roslynanalyzer on NuGet_Packaging stages


---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
2024-06-20 17:01:57 -07:00
Yifan Li
91fb865058
[ORT 1.18.1 Release] Cherry pick 1st round (#21105)

---------

Co-authored-by: Jian Chen <cjian@microsoft.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Your Name <you@example.com>
2024-06-19 22:10:58 -05:00
Yi-Hong Lyu
45737400a2
[ORT 1.18.0 Release] Cherry pick 3rd/Final round (#20677)
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Jian Chen <cjian@microsoft.com>
2024-05-15 00:14:29 -07:00
Yi-Hong Lyu
ed349b9d9d
Mark end of version 17 and 18 C API (#20671)
Additionally, these versions are safeguarded by a `static_assert`.
2024-05-14 02:26:15 -07:00
Yi-Hong Lyu
d72b476723
[ORT 1.18.0 Release] Cherry pick 2nd round (#20620) 2024-05-10 01:23:14 -07:00
Yi-Hong Lyu
65f3fbf137
[ORT 1.18.0 Release] Cherry pick 1st round (#20585) 2024-05-08 08:42:07 -07:00
Yi-Hong Lyu
204f1f59b9
Run fuzz testing before the CG task cleans up the build directory (#20500) (#20516)
### Description
Update order of steps


### Motivation and Context
Fix CI

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-04-30 02:51:13 -07:00
Satya Kumar Jandhyala
21b3cbc3af
[WIP][JS/WebGPU] Inputs Key and Value could be 4-dims. (#20470)
### Description
The Key and Value inputs can be 4-dimensional.


2024-04-25 13:33:46 -07:00
Edward Chen
2c19db0af1
Put x64 specific benchmark code into ifdefs. (#20456) 2024-04-25 12:33:12 -07:00
Frank Dong
227c4419fc
add bf16 support for few ops (#20385)
### Description
Add bf16 support for the following ops:
- ConstantOfShape
- Exp
- Erf
- convolution
- PythonOp



### Motivation and Context
The phimm model runs in bf16, so ORT needs bf16 support for the ops
above to work with phimm in bf16.
2024-04-25 11:28:34 -07:00
Yi Zhang
464f199b95
Extend mac package jobs time out limit (#20459) 2024-04-25 10:13:13 -07:00
Yi-Hong Lyu
edffa2a180
Optimize MlasComputeSoftmax with prefetch (#20393)
The prefetch instruction (`_mm_prefetch`) is used to anticipate memory
accesses by prefetching the next row of the input buffer. This
optimization is designed to reduce the impact of memory latency, thereby
enhancing the performance of the MlasComputeSoftmax function. As a
result, the worst-case performance of the OCR model has improved by
approximately 50ms, which equates to a 3% improvement.
2024-04-25 08:28:59 -07:00
Chi Lo
a077330c3e
[TensorRT] adapt for TRT lib name change after TRT 10 GA (#20445)
From TensorRT 10 GA onwards, the TensorRT libraries have the major
version appended to their names on Windows, for example, nvinfer_10.dll,
nvinfer_plugin_10.dll, nvonnxparser_10.dll ...

Change the CMake file accordingly.
2024-04-24 21:46:54 -07:00
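The naming rule described in a077330c3e can be sketched as follows. This is only an illustration in Python, not the CMake change itself: the function name `windows_tensorrt_dll_names` is hypothetical, and the unsuffixed names returned for versions before 10 are an assumption inferred from the commit message.

```python
# Base library names taken from the examples in the commit message.
TENSORRT_BASE_NAMES = ["nvinfer", "nvinfer_plugin", "nvonnxparser"]


def windows_tensorrt_dll_names(major_version):
    """Return the expected Windows DLL names for a TensorRT major version.

    Assumption: versions before 10 use unsuffixed names; from 10 onwards
    the major version is appended, e.g. nvinfer_10.dll.
    """
    if major_version >= 10:
        return [f"{name}_{major_version}.dll" for name in TENSORRT_BASE_NAMES]
    return [f"{name}.dll" for name in TENSORRT_BASE_NAMES]


if __name__ == "__main__":
    print(windows_tensorrt_dll_names(10))
    print(windows_tensorrt_dll_names(8))
```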
Yi Zhang
e5947f5729
Two improvements in pipelines (#20449)
### Description
1. Update the image name to fix the Docker image overwrite problem.
There was a mistake: `variables.CUDA_VERSION_MAJOR` was always empty.

14fcf0a52d/tools/ci_build/github/azure-pipelines/stages/nuget-linux-cuda-packaging-stage.yml (L120)
2. Set one artifact name as a variable to make the job rerunnable.



2024-04-25 10:15:40 +08:00
Xavier Dupré
218b6b0a73
Fix missing argument when calling _get_quantize_input_nodes (#20245)
### Description
The current code is calling one method with a missing argument.



### Motivation and Context
It breaks Olive's unittests.

---------

Co-authored-by: Xavier Dupré <xavier.dupre@gmail.com>
2024-04-25 00:46:48 +02:00
Yulong Wang
a5182a2ef3
[js/web] update test condition for '--force-localhost' (#20450)
### Description

Fixes the NPM packaging pipeline failure.
2024-04-24 12:14:03 -07:00
Edward Chen
9cc5badc49
Fix Objective-C static analysis warnings. (#20417)
Replace most usages of [NSString stringWithUTF8String:] with checked helper function. The issue is that the former can return nil.
2024-04-24 11:48:29 -07:00
maggie1059
dfd4bce36e
Use compute queues by default in DML EP (#20438)
### Description
We originally used compute queues only for compute-only devices; this
change sets the default for DX12 devices to use compute queues as well.



### Motivation and Context
There have been issues with TDRs occurring when using the current
default queues; these do not occur on compute queues.
2024-04-24 10:44:16 -07:00
Xavier Dupré
f78215adad
Fix quantization tools for issue #19529 (#19591)
### Description
Fix issue #19529: the code was using a loop variable outside its loop.
2024-04-24 19:16:27 +02:00
Scott McKay
a46bab6364
Update podspec url to use AFD hostname (#20452)
Use the AFD URL when generating the podspec.
2024-04-24 09:37:24 -07:00
Satya Kumar Jandhyala
ae78cdb5d7
[JS/WebGPU] MultiheadAttention bugfix (#20447)
### Description
Fixed the concatenation condition for pastkey/key and pastvalue/value,
and fixed an index error. Added new test cases.



2024-04-24 08:43:14 -07:00
Guenther Schmuelling
33d5ea39b3
[js/webgpu] fixes for fp16 attention (#20440) 2024-04-24 08:01:28 -07:00
Xavier Dupré
80213a9e66
Add implementation for ScatterND (#19540)
### Description
onnxruntime falls back to CPU for ScatterND after opset 13. This extends
the implementation to higher opsets.
2024-04-24 14:08:50 +02:00
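For readers unfamiliar with the operator touched by 80213a9e66, here is a minimal reference sketch of ScatterND semantics on nested Python lists. It is not the onnxruntime kernel: the function `scatter_nd` is hypothetical, it assumes `indices` is a flat list of index tuples (rank 2), and it covers only the default behaviour with no reduction.

```python
import copy


def scatter_nd(data, indices, updates):
    """Copy `data`, then write updates[i] at the position named by indices[i].

    Each index may address a single element or a whole sub-list (slice).
    """
    output = copy.deepcopy(data)
    for index, update in zip(indices, updates):
        target = output
        # Walk down to the container that holds the addressed position.
        for i in index[:-1]:
            target = target[i]
        target[index[-1]] = copy.deepcopy(update)
    return output


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    indices = [[4], [3], [1], [7]]
    updates = [9, 10, 11, 12]
    print(scatter_nd(data, indices, updates))  # [1, 11, 3, 10, 9, 6, 7, 12]
```

The input is copied first, so `data` is left untouched and only the addressed positions differ in the result.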
Rachel Guo
14fcf0a52d
Support visionos build (#20365)
### Description

This PR supports a build of onnxruntime.xcframework for xros/xrsimulator
for visionos via the build command of

`python3 tools/ci_build/github/apple/build_apple_framework.py --config
Release/Debug
tools/ci_build/github/apple/default_vision_os_framework_build_settings.json`.

Officially including visionos in the iOS CocoaPods package and testing it
in CI would require separate work: upgrading the Xcode version and
upgrading the macOS CI agent to macos-13-arm64 or higher.

### Motivation and Context

visionos support:
https://github.com/microsoft/onnxruntime/discussions/19313

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2024-04-23 18:15:07 -07:00
Adam Louly
4ce7bbf6f1
Add LayerSpec Support to ORTPipelineModule (#20410)
### Description
DeepSpeed's pipeline-parallel implementation has a class, LayerSpec, that
delays instantiating an object until it has been assigned to a stage and
moved to the device.

This approach helps reduce peak memory usage.

In this PR, we're adding support to ORT for wrapping this LayerSpec.
2024-04-23 17:57:08 -07:00
Yulong Wang
5055dc0aa8
[js/web] add diagnose log for chrome (#20439)
### Description

Add logs to further diagnose the pipeline issue.
2024-04-23 17:18:54 -07:00
Maximilian Müller
b4e50758c0
Fix shape conv fuse opt (#20282)
Fix:
- Multiple Convs feeding into an Add+Relu were fused even though the intermediate outputs are needed

![image](https://github.com/microsoft/onnxruntime/assets/44298237/0c85a30c-5f41-4e62-ae2e-f41eada6c2c3)
- Also fixes an issue with Shape Initializers Merge as input, that
occurs when the input initializer is the same across multiple nodes but
not all nodes are Shape nodes.
2024-04-23 16:19:57 -07:00
Yulong Wang
8f53957bcf
[js/web] add "browser" field to support parcel v2 (#20422)
### Description

As described in the latest discussion in #19915, parcel v2 without the
[new resolver](https://parceljs.org/blog/v2-9-0/#new-resolver) will not
work correctly with onnxruntime-web. Some users still use parcel with
the default resolver, so this PR adds the deprecated "browser" field
back for backward compatibility. It also corrects the "main" field,
which is used by the old resolver for Node.js.
2024-04-23 13:10:11 -07:00
Yulong Wang
13bda11583
[Node.js binding] Fix install script (#20416)
### Description
Fix a few bugs in the install script of the onnxruntime-node package.

This change is integrated from branch `rel-1.17.3` (#20397)
2024-04-23 13:01:16 -07:00
Satya Kumar Jandhyala
d42ac7f0c6
[JS/WebGPU] Multihead attention improvements (#20286)
### Description
Enabled more use cases.



2024-04-23 12:39:49 -07:00
Edward Chen
76461c8f4d
Increase timeout for iOS packaging pipeline jobs. (#20434) 2024-04-23 11:55:55 -07:00
Guenther Schmuelling
b8e6684313
more conservative gpu-buffer cache algo (#20312)
tuned based on 80 models to keep performance impact minimal
2024-04-23 09:07:04 -07:00
guyang3532
ffb9c8d598
fix embedding sparsity log bug of -1% density (#20420)
### Description
When no valid embedding sparsity has been checked, the log prints the
incorrect message "-1% density". This PR fixes it.
2024-04-23 20:37:50 +08:00
Scott McKay
ed6f1adcb8
Fix overflow causing test failure on x86 (#20425)
### Description
Fix comparison that was not updated when the threshold was converted to
bytes.


### Motivation and Context
Fix CI failure
2024-04-23 21:33:59 +10:00
Maximilian Müller
5eae33fc6b
[CUDA EP] RNN check if tf32 is allowed (#20338)
Respect the use_tf32 flag.
2024-04-23 00:19:09 -07:00
Yi Zhang
7ebc653f04
Revert "Nuget .NET changes for Mac Catalyst (#19923)" (#20418)
This reverts commit f396748ed6.

2024-04-23 15:08:12 +08:00
Adrian Lizarraga
e6a677f6b7
[QNN EP] Download QNN SDK from azure blob in packaging pipelines (#20359)
### Description
- Updates Windows QNN Nuget and Python packaging pipelines to download
QNN SDK from blob storage.
- Makes the QNN SDK version configurable when launching the python
packaging pipeline.



### Motivation and Context
Removes the need to rebuild images to update QNN SDK. Only applies to
Windows pipelines. Linux pipelines still get the SDK from disk.
2024-04-22 22:32:55 -07:00
aciddelgado
94c69f55d4
GQA 4 CPU (#20299)
### Description
Support GQA operator on CPU with FP32.



### Motivation and Context
Right now, models generated for CPU and GPU must be different. GQA CPU
allows these models to be the same.
2024-04-22 19:57:05 -07:00
Scott McKay
c47a6ce70b
XNNPACK: Support 1D input for Conv and ConvTranspose (#20349)
### Description
Support 1D input to XNNPACK Conv and ConvTranspose by using a fake
height of 1 to convert it to 2D input.

### Motivation and Context
Enable speech model with 1D input to use XNNPACK. There is no CPU EP
quantized ConvTranspose, so this fills that gap.
2024-04-23 11:50:31 +10:00
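The idea behind c47a6ce70b, treating a 1D input as a 2D input with a height of 1, can be sketched in plain Python. This is not the XNNPACK code path: the functions `conv1d`, `conv2d` and `conv1d_via_conv2d` are hypothetical, and the sketch assumes a single channel, stride 1 and no padding.

```python
def conv1d(signal, kernel):
    """Direct 1D convolution (cross-correlation form), stride 1, no padding."""
    out_len = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + k] * kernel[k] for k in range(len(kernel)))
        for i in range(out_len)
    ]


def conv2d(image, kernel):
    """Direct 2D convolution (cross-correlation form), stride 1, no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_h)
                for j in range(k_w)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv1d_via_conv2d(signal, kernel):
    """Wrap both operands with a fake height of 1, run 2D, then unwrap."""
    return conv2d([signal], [kernel])[0]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5]
    kernel = [1, 0, -1]
    print(conv1d(signal, kernel))             # [-2, -2, -2]
    print(conv1d_via_conv2d(signal, kernel))  # [-2, -2, -2]
```

Because the added dimension has size 1 in both the input and the kernel, the 2D routine performs exactly the same multiply-accumulate steps as the 1D one.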
Edward Chen
3270a002fa
Fix handling of nodes that get assigned to kMSInternalNHWCDomain when loading an ORT format model. (#20379)
Fix handling of nodes that get assigned to kMSInternalNHWCDomain when loading an ORT format model. The ORT format model doesn't contain information about kMSInternalNHWCDomain since it is set during layout transformation. Fall back to known domains instead.
2024-04-22 18:34:01 -07:00
Preetha Veeramalai
c7de4de501
OVEP Bug fix 1.18 (#20408)
### Description
Contains critical bug fix 



### Motivation and Context
This PR handles the bug fix wrt OV caching and blob generation.
This also handles the precision for AUTO plugin.

---------

Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
2024-04-22 18:31:05 -07:00
pengwa
a7787a0bad
Introduce memory efficient topological sort (#20258)
### Introduce memory efficient topo sort (for training)

~~and laze initialize Priority-Based and Memory-Efficient topo sort.
Because in most cases, they are not needed, so we free the overheads of
GraphViewer construction for most use cases.~~

2024-04-23 08:00:23 +08:00
Scott McKay
9372e9a0a3
Support >2GB of Tensor data in training checkpoint (#20077)
### Description
Add ability to store initializer data in an external file.
Update training checkpoint code to use external file if data > ~2GB.

I don't see a way for the flatbuffers 64-bit offsets to be used, as they
don't support storing 'table' types with 64-bit offsets (and our Tensor
is a 'table' type not a simple struct).


0cfb7eb80b/tests/64bit/test_64bit.fbs (L38-L39)

Allowing a Tensor to have its raw_data in an external file should
hopefully work with the least friction. As it's an extra field it's
backwards compatible.

Please feel free to suggest alternative approaches. 

Side note: the diffs in the generated *.fbs.h files are unexpectedly
large. Maybe they weren't re-generated when the new flatbuffers version
was checked in. I updated by running:
`python .\compile_schema.py -f <build output
dir>\_deps\flatbuffers-build\Debug\flatc.exe`
from onnxruntime\core\flatbuffers\schema which I thought was the correct
way but maybe that's out of date.

I think you can ignore all the diffs in the generated files and just
worry about the changes to the .fbs files in
onnxruntime/core/flatbuffers/schema. Basically start at the bottom of
the files changed and work up as all the 'real' diffs are there.


---------

Co-authored-by: carzh <wolfivyaura@gmail.com>
2024-04-22 15:17:43 -07:00
Yulong Wang
4385602386
[js/web] fix test runner with optional input/output (#20399)
### Description
fix test runner with optional input/output.

This change fixes the OP test runner (.jsonc format test) with optional
input(s) and/or output(s).

This fix reveals a problem with handling optional outputs:

> Take SkipSimplifiedLayerNorm as an example:
>
> If the node's outputs in the ONNX model are [ 'output_0', '' ]
instead of [ 'output_0' ], the current implementation will fail. The
difference is that, in the first case, context.outputCount == 2, so the
TypeScript implementation will try to create a tensor for output[1]. It
will eventually call the C++ function (OpKernelContext::Output), and
output.DataRaw() will be nullptr. The WebGPU backend will fail because it
cannot deal with a TensorView with data == 0.
>

This problem may need to be fixed or worked around in a separate PR; this
PR does not fix it. Failing test cases are modified to work. Note that
this PR does not break those test cases, as they never worked.
2024-04-22 12:53:10 -07:00
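The optional-output situation described in 4385602386 can be illustrated with a small sketch. In ONNX, an empty string in a node's output list marks an optional output that is not produced. The helper `requested_output_indices` below is hypothetical and is not the fix (the commit explicitly leaves the problem open); it only shows how the declared output count and the set of outputs that actually need a tensor can differ.

```python
def requested_output_indices(output_names):
    """Return the positions of outputs that are actually requested.

    An empty name marks an omitted optional output, so it is skipped.
    """
    return [i for i, name in enumerate(output_names) if name]


if __name__ == "__main__":
    outputs = ["output_0", ""]
    print(len(outputs))                       # declared slots: 2
    print(requested_output_indices(outputs))  # slots needing a tensor: [0]
```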
aamajumder
d0e33d2078
[DML EP] Register opset 20 operators (#20092)
### Description
This PR registers the following opset 20 operators to the DML EP:
- IsNaN-20
- IsInf-20
- ReduceMax-20

2024-04-22 12:01:59 -07:00
Yi Zhang
197b3f1d90
Enable Whisper Test with OMP_FFMPEG (#20402)
### Description
Install OMP_FFMPEG in Docker and re-add the Whisper test.
Download OMP_FFMPEG from a restricted-access Azure blob.
2024-04-22 10:55:56 -07:00