Commit graph

10557 commits

Author SHA1 Message Date
satyajandhyala
dfeda9019c
[JS/WebGPU] Add MatMulNBits (#19446)
### Description
Add MatMulNBits to support MatMul using 4-bit quantized weights



2024-02-17 09:19:17 -08:00
Yulong Wang
06269a3952
[js/webgpu] allow uint8 tensors for webgpu (#19545)
### Description
allow uint8 tensors for webgpu
2024-02-16 18:28:27 -08:00
Adrian Lizarraga
4874a41008
[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546)
### Description
Updates the default QNN SDK version to 2.19.2.240210.

### Motivation and Context
Build and test the latest version of QNN SDK in our pipelines.
2024-02-16 16:59:43 -08:00
kunal-vaishnavi
44d8ad93b2
Whisper Timestamps and Temperature (#19509)
### Description
This PR updates exporting and running the Whisper model with beam search
by adding the following.

- Adds temperature as a graph input to the exported model
- Fixes the token ids by adding them as attributes to
`WhisperBeamSearch`
- Fixes the timestamps test cases so they pass now
- Fixes a bug with invoking `torch.onnx.export`
- Cleans up the Whisper scripts and groups the arguments in
`convert_to_onnx.py`
- Adds a `requirements.txt` file to specify package dependencies
- Adds `whisper-large-v3` to list of pretrained models
- Fixes a bug with missing cross-attention KV cache inputs in the
decoder subgraph

### Motivation and Context

- This is a follow-up to [this
PR](https://github.com/microsoft/onnxruntime/pull/19188).
- The incorrect token ids in the timestamps processor were first noticed
during [this PR
review](https://github.com/microsoft/onnxruntime/pull/17500#discussion_r1333520007).
When they were originally added in [this
PR](https://github.com/microsoft/onnxruntime/pull/15853), the offsets
were previously constant across the Whisper model sizes. When comparing
the new `whisper-large-v3` variant, the English-only variants (e.g.
`whisper-tiny.en`), and the original variants (e.g. `whisper-tiny`),
both the values and the offsets differ. Therefore, it is easier to set
the token ids as attributes to `WhisperBeamSearch` when exporting to
ensure the right values are used in the timestamps processor.
- The Hugging Face API for returning timestamps and the expected outputs
from the PyTorch model have both changed.
- The fix for `torch.onnx.export` is a follow-up to [this PR
review](https://github.com/microsoft/onnxruntime/pull/17179#issuecomment-1683001470).
- The argument grouping is a follow-up to [this PR
review](https://github.com/microsoft/onnxruntime/pull/17500#discussion_r1333521721).
- Specific package versions are needed to run the Whisper scripts and
the `requirements.txt` file ensures that these versions are installed.
- The `whisper-large-v3` variant is released and should be in the list
of official pretrained models.
- After the changes from [this
PR](https://github.com/microsoft/onnxruntime/pull/17316), the exported
model is not loading in an ORT inference session because the
cross-attention KV cache inputs are missing in the decoder subgraph.
2024-02-16 15:21:43 -08:00
Tianlei Wu
1dce5e1732
Disable TF32 in Linux_Test stage of Linux GPU CI Pipeline (#19541)
### Description
Some test thresholds that previously worked on T4 GPUs no longer work.
The reason is that the current pipeline uses A10 GPUs, where TF32 is
enabled by default.

Disable TF32 in the test stage of the Linux GPU CI Pipeline to avoid
such random test failures.
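
To illustrate why TF32 can push results past a tight threshold, here is a minimal sketch that emulates TF32 precision in plain Python. TF32 keeps 10 explicit mantissa bits instead of float32's 23. The helper names are hypothetical, and truncation is used as a simplification; the rounding the hardware applies may differ.

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Keep the sign, 8 exponent bits and top 10 mantissa bits of a float32.

    Assumption: truncation stands in for the hardware rounding mode.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # zero the 13 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def relative_error(x: float) -> float:
    """Relative error introduced by truncating the float32 value of x to TF32."""
    as_float32 = struct.unpack("<f", struct.pack("<f", x))[0]
    return abs(truncate_to_tf32(x) - as_float32) / abs(as_float32)


# Per-operand error is bounded by 2**-10 (about 1e-3), far above float32's 2**-23.
print(relative_error(0.0419757))
```

Errors of this size in matrix-multiply inputs are enough to move an output by more than a threshold tuned for full float32 arithmetic.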

### Motivation and Context
The Linux Test stage fails randomly in these tests:

ProviderOptionsTest > testCUDAOptions() FAILED
org.opentest4j.AssertionFailedError: array contents differ at index
[446], expected: <0.0419757> but was: <0.041948937>
at
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at
app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at
app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at
app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
at
app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)
 
org.opentest4j.AssertionFailedError: array contents differ at index [6],
expected: <0.0225981> but was: <0.022587791>
at
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at
app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at
app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676)
at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)
2024-02-16 14:41:11 -08:00
Adrian Lizarraga
b84712151c
QNN EP: Fuse DQ -> Q sequences into a QNN Convert op (#19511)
### Description
Fuses DQ -> Q sequences into a QNN Convert operator if:
- Converting from one qtype to another. Ex: Dequantize(uint8 to float)
-> Quantize(float to uint16)
- The DQ and Q operators are not part of another node unit (i.e.,
standalone)
- The Q operator is the only consumer for the DQ operator.
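
The three conditions above can be sketched as a small eligibility check. This is an illustration only: the `Node` class and its fields are hypothetical stand-ins, not the QNN EP's actual data structures, and "converting from one qtype to another" is assumed to mean that the DQ input type differs from the Q output type.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a graph node."""
    op_type: str
    in_dtype: str
    out_dtype: str
    consumers: list = field(default_factory=list)
    in_node_unit: bool = False  # True if already grouped with another node


def can_fuse_dq_q(dq: Node, q: Node) -> bool:
    """Return True if the DQ -> Q pair meets all three conditions."""
    if dq.op_type != "DequantizeLinear" or q.op_type != "QuantizeLinear":
        return False
    converts_qtype = dq.in_dtype != q.out_dtype
    standalone = not dq.in_node_unit and not q.in_node_unit
    sole_consumer = len(dq.consumers) == 1 and dq.consumers[0] is q
    return converts_qtype and standalone and sole_consumer


q = Node("QuantizeLinear", in_dtype="float", out_dtype="uint16")
dq = Node("DequantizeLinear", in_dtype="uint8", out_dtype="float", consumers=[q])
print(can_fuse_dq_q(dq, q))
```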



### Motivation and Context
Allows faster execution of QDQ models with mixed activation types by
leveraging the QNN Convert operator, which converts between quantization
types. For certain models, this results in inference latency speed-ups
of up to 2x (depends on the number of DQ -> Q sequences).

#### Example for Add node unit with 16-bit I/O:

Original:
```
u8 ----> DQ ---> Q ---u16--> Add ---u16-->
                              ^
                              |
u16 --------------------------+
```

After fusing DQ -> Q:
```
u8 ----> Convert ---u16--> Add ---u16-->
                            ^
                            |
u16 ------------------------+
```
2024-02-16 14:36:05 -08:00
Sheil Kumar
ef0b71308c
Optimize KahnsTopologicalSort and PriorityNodeCompare (#19475)
**Description**
1) During session initialization, KahnsTopologicalSort is a major cause
of perf degradation.
The main cause of the slowdown is that the topological sort must keep
track of the nodes to visit in order and reorder them by priority (as
determined by a comparator). The existing implementation uses a
std::priority_queue backed by a std::vector container. However,
vectors are poorly suited to insertion and reordering. A linked list is
a better fit for this operation, but linked lists like std::list cannot
be used as the container for std::priority_queue, because
std::priority_queue requires random access, which linked lists do not
provide. For this simple implementation, we can instead use a std::list
under the hood and perform insertions manually, using std::upper_bound
to find the insertion point. This drastically reduces the time taken by
the method, which currently causes numerous recopies and a lot of
movement inside the list of graph nodes to visit.

2) In the comparator, I hide forward and backward attribute checking
behind the #ifdef ENABLE_TRAINING macro, as I believe it should only be
valid in the training scenario.

3) In the NoopElimination transformer, I avoid creating an Initializer
(which unpacks TensorProto data) for every node and only create
initializers when Add/Sub/Mul/Div op nodes are detected.
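
The ordering logic described in item 1 can be sketched in Python. This is an illustration of Kahn's algorithm with a priority-ordered ready list, not the C++ code from this PR: the function and argument names are hypothetical, and `bisect.insort` (which inserts after equal elements, like std::upper_bound) stands in for the manual insertion. A Python list is array-backed, so the sketch shows the ordering only, not the performance characteristics of std::list.

```python
from bisect import insort


def kahn_topological_sort(edges, priority):
    """Topologically sort a DAG, visiting ready nodes in priority order.

    edges: dict mapping node -> list of successor nodes.
    priority: dict mapping node -> sortable key (lower is visited first).
    """
    in_degree = {node: 0 for node in edges}
    for successors in edges.values():
        for succ in successors:
            in_degree[succ] = in_degree.get(succ, 0) + 1

    # Ready list kept sorted by (priority, node) at all times.
    ready = sorted((priority[n], n) for n, d in in_degree.items() if d == 0)
    order = []
    while ready:
        _, node = ready.pop(0)
        order.append(node)
        for succ in edges.get(node, ()):
            in_degree[succ] -= 1
            if in_degree[succ] == 0:
                # Insert at the upper-bound position to keep the list sorted.
                insort(ready, (priority[succ], succ))

    if len(order) != len(in_degree):
        raise ValueError("graph has a cycle")
    return order


edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
priority = {"a": 0, "b": 2, "c": 1, "d": 0}
print(kahn_topological_sort(edges, priority))
```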

**Motivation and Context**
Session creation time of many models is quite slow.

---------

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2024-02-16 05:34:55 -08:00
Tianlei Wu
4bfa69def8
Speed Up DecoderMaskedSelfAttentionTest (#19531)
### Description
The unit tests take 19 minutes to run (in a debug build) because they
cover too many combinations. I reduced the number of combinations while
keeping good test coverage. After the change, the tests finish in 51
seconds.

Before:
[----------] 2 tests from DecoderMaskedSelfAttentionTest
[ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp32
[       OK ] DecoderMaskedSelfAttentionTest.Test_fp32 (394086 ms)
[ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp16
[       OK ] DecoderMaskedSelfAttentionTest.Test_fp16 (747035 ms)
[----------] 2 tests from DecoderMaskedSelfAttentionTest (1141122 ms total)

After:
[----------] 2 tests from DecoderMaskedSelfAttentionTest
[ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp32
[       OK ] DecoderMaskedSelfAttentionTest.Test_fp32 (21057 ms)
[ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp16
[       OK ] DecoderMaskedSelfAttentionTest.Test_fp16 (30653 ms)
[----------] 2 tests from DecoderMaskedSelfAttentionTest (51710 ms total)


### Motivation and Context
Reduce test time, and improve build pipeline efficiency.
2024-02-15 20:22:36 -08:00
sophies927
d0061d6fb1
Update stale.yml to use old version as a bug fix (#19532)
### Description
Changed the actions/stale version back to v8 from v9.



### Motivation and Context
There is a well-documented issue w/ the new actions/stale version
(v9.0.0) that causes the following error: "Error delete _state: [403]
Resource not accessible by integration". See
https://github.com/actions/stale/issues/1133 for more context.

This issue is preventing the stale bot from labeling stale issues since
the version was updated b/c the action can no longer access the cache
and cannot apply labels to all issues due to GH API rate limiting.

There are two potential fixes if we continue to use the new version: (1)
run the action on all PRs/issues to avoid using the cache or (2) give
write access to the endpoints listed in
https://docs.github.com/en/rest/authentication/permissions-required-for-fine-grained-personal-access-tokens?apiVersion=2022-11-28#repository-permissions-for-actions.
Neither of these options is preferable, so I am going to wait until the
bug is fixed.

Note: The old version (v8.0.0) uses Node 16, which will be deprecated in
Spring 2024, instead of Node 20, so we should keep an eye on [this
issue](https://github.com/actions/stale/issues/1133) to see when they
make the fix and we can switch back to the new version.
2024-02-15 17:03:11 -08:00
rui-ren
d63c664ca0
fix rocm ci pipeline (#19525)
### Description

ROCm CI pipeline issue.
```
Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.91 MiB, post-processed: Unknown size, total: 17.41 MiB) to /home/onnxruntimedev/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20...
    main()
  File "/stage/huggingface-transformers/examples/pytorch/language-modeling/run_mlm.py", line 242, in main
    datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/load.py", line 856, in load_dataset
    builder_instance.download_and_prepare(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 583, in download_and_prepare
    self._download_and_prepare(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 639, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
  File "/home/onnxruntimedev/.cache/huggingface/modules/datasets_modules/datasets/wikitext/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20/wikitext.py", line 138, in _split_generators
    data_file = dl_manager.download_and_extract(self.config.data_url)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 289, in download_and_extract
    return self.extract(self.download(url_or_urls))
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 197, in download
    downloaded_path_or_paths = map_nested(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 195, in map_nested
    return function(data_struct)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 220, in _download
    return cached_path(url_or_filename, download_config=download_config)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 281, in cached_path
    output_path = get_from_cache(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 634, in get_from_cache
    raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip

```


### Motivation and Context
Update the `datasets` package used in the pipeline to the latest
version, `2.17.0`.
2024-02-15 00:02:08 -08:00
Changming Sun
660f39aca5
Perf improvement for Intel MTL CPUs (#19524)
### Description
See the comments inside of the changed files for more detailed
information.

The file onnxruntime/core/platform/windows/hardware_core_enumerator.cc
and onnxruntime/core/platform/windows/hardware_core_enumerator.h were
copied from WinML source folder in this repo, with minor coding style
changes.

I had an offline discussion with Sheil. We agreed that, given the lack
of a future-proof solution, we can check in this temporary fix first and
rework it later. I will meet with @ivberg to discuss the issue in depth
and look for a long-term solution. Thanks for offering help, @ivberg!

### Motivation and Context
With this change, we will see about 2x perf improvement on some Intel
CPUs.
2024-02-14 18:35:56 -08:00
jingyanwangms
775c774f4b
Add BF16 to Sqrt (#19363)
### Description
Sqrt does not have BF16 support yet. This PR adds it.



2024-02-14 18:07:51 -08:00
rui-ren
a67e692546
add GatherSliceToSplitFusion and Unittest (#19218)
### Multi Query Attention Optimization

In multi-query attention, the fused QKV tensor is split using indexing:
```
batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads + 2, self.head_dim)
return fused_qkv[..., :-2, :], fused_qkv[..., [-2], :], fused_qkv[..., [-1], :]
```
which can be optimized to:
```
batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads + 2, self.head_dim)
(query, key, value) = fused_qkv.split([self.num_heads, 1, 1], dim=2)
return query, key, value
```
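
As a minimal sketch of why the two forms agree, the head axis can be modeled with a plain Python list. The `split_sizes` helper is hypothetical and only mimics what a tensor `split` with explicit section sizes does along one axis. Indexing with a one-element list such as `[-2]` keeps that axis with size 1, which is modeled here by the slice `[-2:-1]`.

```python
def split_sizes(items, sizes):
    """Split a sequence into consecutive chunks of the given sizes."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(items[start:start + size])
        start += size
    return chunks


num_heads = 4
# One entry per head: num_heads query heads, then one key head and one value head.
heads = [f"h{i}" for i in range(num_heads + 2)]

# Original form: one slice plus two single-element selections.
sliced = (heads[:-2], heads[-2:-1], heads[-1:])

# Optimized form: a single split with sizes [num_heads, 1, 1].
query, key, value = split_sizes(heads, [num_heads, 1, 1])

print((query, key, value) == sliced)
```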

This optimization can be validated with Nsight profiling and perf
benchmarking.
   
<img width="545" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/15321482/cefcd061-4a01-4aaf-a008-8e265f7f63e9">

As such, this PR fuses the `Gather/Gather/Slice` ops into a single
`Split` kernel.

### Optimization Target

Because 2 `Gather` kernels and 1 `Slice` kernel are time-consuming in
backward propagation, it is more efficient to use 1 `Split` kernel.


###  Example

- Before Fusion
<img width="419" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/15321482/17410319-57ea-4176-afd4-1efdcd3fdbae">
 
- After Fusion
<img width="424" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/15321482/f1ee1582-96d4-45f4-8778-49d1f3fd370a">

### Perf Gain
After the optimization, there is a **~7%** perf gain.

> The `Transpose` kernel can be fused too. However, after testing
Transpose op fusion on the Falcon model, there was no perf gain, so no
follow-up PR will be created.

---------

Co-authored-by: ruiren <ruiren@microsoft.com>
2024-02-14 15:07:56 -08:00
Scott McKay
4e5119760d
Add initial support for CoreML ML Program to the CoreML EP. (#19347)
### Description
Adds infrastructure to create an ML Package containing the Model using
ML Program. Updated coremltools files to v7.1 to bring in new protobuf
definitions along with the tools to write the weight.bin file and create
an ML Package correctly.

Enables building a CoreML Model on all platforms which means all the
operator builder code can be debugged anywhere. Execution of the
generated CoreML model is obviously limited to Apple platforms.

The Conv operator builder has been updated to be able to generate an ML
Program Operation.


### Motivation and Context
NeuralNetwork is no longer being developed and ML Program is the
replacement going forward.
2024-02-15 08:46:03 +10:00
Baiju Meswani
944d8f8513
Update the default std flag used during torch extensions compilation (#19516)
2024-02-14 12:49:34 -08:00
Prathik Rao
3b03b2e046
Upgrade default ORTModule opset from 15 to 17 (#19315)
### Description

This PR upgrades ORTModule's default opset from 15 to 17. Opset 17 is
the final opset supported by torchscript exporter
(https://github.com/pytorch/pytorch/pull/107829)

### Motivation and Context

Engineering excellence contribution for ORT Training DRI.

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2024-02-14 11:19:33 -08:00
Sheil Kumar
1508c2ee39
Restrict L2 Cache Core check to Intel devices (#19483)
### Description
Limit SoC core detection via 2 level cache core logic to Intel and
Hybrid processors.

### Motivation and Context
The following code was added to add support for a new class of CPU cores
present in Intel’s next generation Intel Core Ultra mobile processors.
This code is essential to avoid placing threads on low performing SoC
cores that don’t have L3 cache. SoC cores are meant to specialize in
system bringup and help improve responsiveness and power usage, in other
words they are not meant to run compute heavy AI workloads. In order to
avoid broad exposure of this logic, it is currently designed to be
restricted to Intel platforms that have hybrid enabled.

---------

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2024-02-14 10:31:03 -08:00
Tianlei Wu
fbff99a432
Change Jave Test Threshold (#19508)
### Description
Increase the threshold to 1e-5 to avoid test failures in CUDA when the
difference is slightly larger than 1e-6.
This may be because TF32 is used in those CUDA tests.
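
As a quick check of the numbers, the following sketch applies both thresholds to the expected and actual values from the failure log in this commit message. It assumes the test compares elements with an absolute tolerance; the `within` helper is hypothetical.

```python
# Values taken from the ProviderOptionsTest failure at index [103].
expected = 0.0102678
actual = 0.010266338


def within(expected: float, actual: float, tolerance: float) -> bool:
    """Absolute-tolerance comparison, assumed to match the test's check."""
    return abs(expected - actual) <= tolerance


print(abs(expected - actual))           # slightly above 1e-6
print(within(expected, actual, 1e-6))   # fails with the old threshold
print(within(expected, actual, 1e-5))   # passes with the new threshold
```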

### Motivation and Context


https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1291322&view=logs&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91&t=28e21ca6-87a4-5e1e-0441-72b5e8326f2d

ProviderOptionsTest > testCUDAOptions() FAILED
org.opentest4j.AssertionFailedError: array contents differ at index
[103], expected: <0.0102678> but was: <0.010266338>
at
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at
app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at
app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at
app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
at
app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)
        

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1293200&view=logs&jobId=f2f63060-d9d6-52d0-adee-b97db5a9ab91&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91&t=28e21ca6-87a4-5e1e-0441-72b5e8326f2d
        
InferenceTest > testCUDA() FAILED
org.opentest4j.AssertionFailedError: array contents differ at index
[103], expected: <0.0102678> but was: <0.010266337>
at
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at
app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at
app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676)
at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)
2024-02-14 10:08:46 -08:00
Ye Wang
f53d2c2465
Phi2 script fixes (#19500)
### Description

This PR is intended to support Phi2 passes in Olive. 
Merge it before https://github.com/microsoft/Olive/pull/938

2024-02-14 10:08:11 -08:00
Prathik Rao
544407038d
SimplifiedLayerNormalization Fusion BFloat16 support for Llama-v2 on A100 (#18898)
### Description

Adds bfloat16 as a supported dtype for SimplifiedLayerNormFusion which
will provide speedup for Llama-v2 on A100 using bfloat16 numerical
format.

_layernorm_optimized_training.onnx exported in bfloat16 vs. float16:_

![image](https://github.com/microsoft/onnxruntime/assets/31260940/8c0a5f0f-5fcb-4637-bcd9-f34272ec0284)

### Repro Instructions

```python
from torch import nn
from onnxruntime.training.ortmodule import ORTModule, DebugOptions, LogLevel
import torch

dtype = torch.bfloat16
# dtype = torch.float16

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(784, 10, dtype=dtype)
        self.layernorm = nn.LayerNorm([784], dtype=dtype)

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        x = self.layernorm(x)
        x = self.fc(x)

        return x

model = Net()
model = ORTModule(model, DebugOptions(save_onnx=True, onnx_prefix='layernorm', log_level=LogLevel.INFO))
model.to("cuda")

images = torch.randn((8, 28, 28), dtype=dtype).to("cuda")
output = model(images)
```

### Motivation and Context

ONNX Runtime integration with Llama-v2 family of LLMs.

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2024-02-14 10:05:16 -08:00
dependabot[bot]
18f76bd25d
Bump gradle/wrapper-validation-action from 1 to 2 (#19412)
Bumps
[gradle/wrapper-validation-action](https://github.com/gradle/wrapper-validation-action)
from 1 to 2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/gradle/wrapper-validation-action/releases">gradle/wrapper-validation-action's
releases</a>.</em></p>
<blockquote>
<h2>v2.0.0</h2>
<h2>What's Changed</h2>
<p>The version of the Node.js runtime was updated to 20, and the
majority of dependencies were updated to the latest versions.
From now on, the <code>wrapper-validation-action</code> will require a
Node.js 20 runtime environment.</p>
<p>There are no functional changes in this release.
This release is tagged with the <code>v2</code> version label.</p>
<ul>
<li>[NEW] Update Node.js runtime to version 20 (<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/170">#170</a>)</li>
</ul>
<h2>v2.0.0-rc.1</h2>
<p>This is a release candidate for <code>v2.0.0</code>. It is also
available under the <code>v2</code> version label.</p>
<h2>What's Changed</h2>
<p>The version of the Node.js runtime was updated to 20, and the
majority of dependencies were updated to the latest versions.
From now on, the <code>wrapper-validation-action</code> will require a
Node.js 20 runtime environment.</p>
<p>There are no functional changes in this release.</p>
<ul>
<li>[NEW] Update Node.js runtime to version 20 (<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/170">#170</a>)</li>
</ul>
<h2>v1.1.0</h2>
<p>The action now adds the path of the failed wrapper Jar as a
<code>failed-wrapper</code> Step output parameter.
This makes the value available for reporting in later Steps/Jobs.</p>
<h2>v1.0.6</h2>
<h1>Gradle Wrapper Validation</h1>
<ul>
<li>Security vulnerability: <a
href="959bfac6da">Bump
json5 from 1.0.1 to 1.0.2</a></li>
<li>Security vulnerability: <a
href="ffa46e5c87">Bump
qs from 6.10.1 to 6.11.0</a></li>
</ul>
<h2>v1.0.5</h2>
<h1>Gradle Wrapper Validation</h1>
<ul>
<li>Update dependencies for Node 16 (<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/53">#53</a>)</li>
<li>Update dependencies with security vulnerabilities (<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/67">#67</a>)</li>
<li>Update various other dependencies (<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/45">#45</a>,
<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/47">#47</a>,
<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/48">#48</a>,
<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/54">#54</a>)</li>
</ul>
<h2>v1.0.4</h2>
<h1>Gradle Wrapper Validation</h1>
<ul>
<li>Retry connections to the server on failure (<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/39">#39</a>)</li>
<li>Update dependencies (<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/38">#38</a>,
<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/37">#37</a>,
<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/36">#36</a>,
<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/34">#34</a>,
<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/31">#31</a>,
<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/30">#30</a>,
<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/29">#29</a>)</li>
</ul>
<h2>v1.0.3</h2>
<h1>Gradle Wrapper Validation</h1>
<p>Update <code>minimist</code> version to  <code>1.2.5</code></p>
<h2>v1.0.2</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="27152f6fa0"><code>27152f6</code></a>
Update to Node 20 (<a
href="https://redirect.github.com/gradle/wrapper-validation-action/issues/170">#170</a>)</li>
<li><a
href="d8758a98d1"><code>d8758a9</code></a>
Build output</li>
<li><a
href="e916071cca"><code>e916071</code></a>
Update NPM dependencies</li>
<li><a
href="d9359e465a"><code>d9359e4</code></a>
Add asdf config file</li>
<li><a
href="77d43de170"><code>77d43de</code></a>
Update upload-artifact version</li>
<li><a
href="2f8436d9bb"><code>2f8436d</code></a>
Use setup-node@v4 instead of pinning to a revision</li>
<li><a
href="bfa0fe410a"><code>bfa0fe4</code></a>
Consistently use npm cache for workflows</li>
<li><a
href="8be8473276"><code>8be8473</code></a>
Update workflows and action to NodeJS 20</li>
<li><a
href="c8fad9e3f8"><code>c8fad9e</code></a>
Bump <code>@​babel/traverse</code> from 7.14.7 to 7.23.2</li>
<li><a
href="342dbebe72"><code>342dbeb</code></a>
Update README to use <code>actions/checkout@v4</code></li>
<li>See full diff in <a
href="https://github.com/gradle/wrapper-validation-action/compare/v1...v2">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-13 15:59:24 -08:00
dependabot[bot]
f048fb5b14
Bump nuget/setup-nuget from 1 to 2 (#19411)
Bumps [nuget/setup-nuget](https://github.com/nuget/setup-nuget) from 1
to 2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/nuget/setup-nuget/releases">nuget/setup-nuget's
releases</a>.</em></p>
<blockquote>
<h2>v2.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>build(deps): bump semver from 7.3.8 to 7.5.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/49">NuGet/setup-nuget#49</a></li>
<li>build(deps-dev): bump word-wrap from 1.2.3 to 1.2.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/51">NuGet/setup-nuget#51</a></li>
<li>build(deps-dev): bump <code>@​babel/traverse</code> from 7.23.0 to
7.23.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/57">NuGet/setup-nuget#57</a></li>
<li>Update to use Node.js 20 by <a
href="https://github.com/frederikprijck"><code>@​frederikprijck</code></a>
in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/59">NuGet/setup-nuget#59</a></li>
<li>build(deps-dev): bump prettier from 2.8.7 to 3.0.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/60">NuGet/setup-nuget#60</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.18.0 to
20.8.9 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/62">NuGet/setup-nuget#62</a></li>
<li>build(deps-dev): bump <code>@​vercel/ncc</code> from 0.36.1 to
0.38.1 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/61">NuGet/setup-nuget#61</a></li>
<li>build(deps-dev): bump eslint-plugin-jest from 27.4.0 to 27.6.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/64">NuGet/setup-nuget#64</a></li>
<li>build(deps-dev): bump nock from 13.3.3 to 13.3.6 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/63">NuGet/setup-nuget#63</a></li>
<li>build(deps-dev): bump eslint from 8.50.0 to 8.52.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/65">NuGet/setup-nuget#65</a></li>
<li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
5.62.0 to 6.9.1 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/70">NuGet/setup-nuget#70</a></li>
<li>build(deps-dev): bump eslint-plugin-github from 4.10.0 to 4.10.1 by
<a href="https://github.com/dependabot"><code>@​dependabot</code></a> in
<a
href="https://redirect.github.com/NuGet/setup-nuget/pull/68">NuGet/setup-nuget#68</a></li>
<li>build(deps-dev): bump <code>@​types/jest</code> from 29.5.5 to
29.5.7 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/69">NuGet/setup-nuget#69</a></li>
<li>build(deps-dev): bump eslint from 8.52.0 to 8.53.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/73">NuGet/setup-nuget#73</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 20.8.9 to
20.8.10 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/71">NuGet/setup-nuget#71</a></li>
<li>build(deps-dev): bump nock from 13.3.6 to 13.3.8 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/72">NuGet/setup-nuget#72</a></li>
<li>build(deps-dev): bump prettier from 3.0.3 to 3.1.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/74">NuGet/setup-nuget#74</a></li>
<li>build(deps-dev): bump <code>@​types/jest</code> from 29.5.7 to
29.5.8 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/76">NuGet/setup-nuget#76</a></li>
<li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
6.9.1 to 6.10.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/77">NuGet/setup-nuget#77</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 20.8.10 to
20.9.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/75">NuGet/setup-nuget#75</a></li>
<li>build(deps-dev): bump eslint from 8.53.0 to 8.54.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/80">NuGet/setup-nuget#80</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 20.9.0 to
20.9.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/79">NuGet/setup-nuget#79</a></li>
<li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
6.10.0 to 6.12.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/81">NuGet/setup-nuget#81</a></li>
<li>build(deps-dev): bump <code>@​types/jest</code> from 29.5.8 to
29.5.10 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/83">NuGet/setup-nuget#83</a></li>
<li>build(deps-dev): bump typescript from 5.2.2 to 5.3.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/82">NuGet/setup-nuget#82</a></li>
<li>build(deps-dev): bump nock from 13.3.8 to 13.4.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/88">NuGet/setup-nuget#88</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 20.9.2 to
20.10.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/86">NuGet/setup-nuget#86</a></li>
<li>build(deps-dev): bump eslint from 8.54.0 to 8.55.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/85">NuGet/setup-nuget#85</a></li>
<li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
6.12.0 to 6.13.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/89">NuGet/setup-nuget#89</a></li>
<li>build(deps-dev): bump <code>@​types/jest</code> from 29.5.10 to
29.5.11 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/93">NuGet/setup-nuget#93</a></li>
<li>build(deps-dev): bump prettier from 3.1.0 to 3.1.1 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/91">NuGet/setup-nuget#91</a></li>
<li>build(deps-dev): bump typescript from 5.3.2 to 5.3.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/92">NuGet/setup-nuget#92</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 20.10.3 to
20.10.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/90">NuGet/setup-nuget#90</a></li>
<li>build(deps-dev): bump eslint from 8.55.0 to 8.56.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/94">NuGet/setup-nuget#94</a></li>
<li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
6.13.2 to 6.19.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/107">NuGet/setup-nuget#107</a></li>
<li>build(deps-dev): bump eslint-plugin-jest from 27.6.0 to 27.6.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/106">NuGet/setup-nuget#106</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 20.10.4 to
20.11.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/110">NuGet/setup-nuget#110</a></li>
<li>build(deps-dev): bump prettier from 3.1.1 to 3.2.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/109">NuGet/setup-nuget#109</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 20.11.5 to
20.11.10 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/116">NuGet/setup-nuget#116</a></li>
<li>build(deps-dev): bump nock from 13.4.0 to 13.5.1 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/115">NuGet/setup-nuget#115</a></li>
<li>build(deps-dev): bump ts-jest from 29.1.1 to 29.1.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/113">NuGet/setup-nuget#113</a></li>
<li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
6.19.0 to 6.20.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/117">NuGet/setup-nuget#117</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/frederikprijck"><code>@​frederikprijck</code></a>
made their first contribution in <a
href="https://redirect.github.com/NuGet/setup-nuget/pull/59">NuGet/setup-nuget#59</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/NuGet/setup-nuget/compare/v1.2.0...v1.3.0">https://github.com/NuGet/setup-nuget/compare/v1.2.0...v1.3.0</a></p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="a21f25cd39"><code>a21f25c</code></a>
Update dist for release (<a
href="https://redirect.github.com/nuget/setup-nuget/issues/118">#118</a>)</li>
<li><a
href="5166d73a43"><code>5166d73</code></a>
build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
6.19.0 to 6.20.0 (<a
href="https://redirect.github.com/nuget/setup-nuget/issues/117">#117</a>)</li>
<li><a
href="b915545882"><code>b915545</code></a>
build(deps-dev): bump ts-jest from 29.1.1 to 29.1.2 (<a
href="https://redirect.github.com/nuget/setup-nuget/issues/113">#113</a>)</li>
<li><a
href="00081d4dbe"><code>00081d4</code></a>
build(deps-dev): bump nock from 13.4.0 to 13.5.1 (<a
href="https://redirect.github.com/nuget/setup-nuget/issues/115">#115</a>)</li>
<li><a
href="e44f8a5711"><code>e44f8a5</code></a>
build(deps-dev): bump <code>@​types/node</code> from 20.11.5 to 20.11.10
(<a
href="https://redirect.github.com/nuget/setup-nuget/issues/116">#116</a>)</li>
<li><a
href="f685ada866"><code>f685ada</code></a>
build(deps-dev): bump prettier from 3.1.1 to 3.2.4 (<a
href="https://redirect.github.com/nuget/setup-nuget/issues/109">#109</a>)</li>
<li><a
href="aee2c690f4"><code>aee2c69</code></a>
build(deps-dev): bump <code>@​types/node</code> from 20.10.4 to 20.11.5
(<a
href="https://redirect.github.com/nuget/setup-nuget/issues/110">#110</a>)</li>
<li><a
href="2bd1cef324"><code>2bd1cef</code></a>
build(deps-dev): bump eslint-plugin-jest from 27.6.0 to 27.6.3 (<a
href="https://redirect.github.com/nuget/setup-nuget/issues/106">#106</a>)</li>
<li><a
href="c5ed90cfc8"><code>c5ed90c</code></a>
build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
6.13.2 to 6.19.0 (<a
href="https://redirect.github.com/nuget/setup-nuget/issues/107">#107</a>)</li>
<li><a
href="34040aa462"><code>34040aa</code></a>
build(deps-dev): bump eslint from 8.55.0 to 8.56.0 (<a
href="https://redirect.github.com/nuget/setup-nuget/issues/94">#94</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/nuget/setup-nuget/compare/v1...v2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=nuget/setup-nuget&package-manager=github_actions&previous-version=1&new-version=2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-13 15:59:15 -08:00
fxmarty
1e10cdb2b9
Fix subgraph quantization regression in onnxruntime 1.17 (#19421)
As per title, fixes
https://github.com/microsoft/onnxruntime/issues/19418

ONNX Runtime 1.17 broke the quantization of ONNX models with subgraphs
where initializers are placed on the top-level graph, while different
subgraphs use the same initializer.
2024-02-13 15:49:19 -08:00
Yifan Li
5c7e6b2e2a
[EP Perf] Add CI option to enable TRT-OSS parser (#19448)
### Description
<!-- Describe your changes. -->
* Introduce a CI option to enable the TRT-OSS parser during EP perf
testing:

![image](https://github.com/microsoft/onnxruntime/assets/109183385/a9ba6393-6b94-4b8f-8ca4-ba7bc7954504)

When this option is enabled, the open-source onnx-tensorrt parser listed under
[cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt#L39-L40)
is used by default.


### To verify this option and check the difference during ORT image build:
If this option is enabled:
<img width="649" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/109183385/3b778583-451e-4617-ba8c-c064442e60fd">

If this option is not enabled (by default):
<img width="683" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/109183385/cd8383ba-eff4-4536-94ab-a1424bb858ab">

* update default usage of cmake/trt version to the latest

### Motivation and Context
Make it easier to test the OSS parser and find potential gaps between
the TensorRT built-in and OSS parsers.

Scheduled runs with the OSS parser will be set up after this PR is merged.
2024-02-12 23:04:08 -08:00
George Wu
5e70c6b3a6
allow protobuf lite build for TRT EP (#19498)
allow protobuf-lite builds with TensorRT EP as long as it's built with
the trt built-in parser and not the oss-parser.
This is because trt built-in parser statically links protobuf so there
aren't any conflicts for protobuf-lite.
2024-02-12 22:53:04 -08:00
Adrian Lizarraga
4dfba53bfb
[QNN EP] Build x64 python wheel for QNN EP (#19499)
### Description
Adds a job to the python packaging pipeline that builds x64 python
wheels for QNN EP.



### Motivation and Context
Necessary to create a cached QNN model on Windows x64, which is done by
creating a properly configured onnxruntime session with QNN EP.
2024-02-12 20:54:04 -08:00
Patrice Vignola
61e07a46e1
[DML EP] Support split hidden size for RotaryEmbedding (#18852)
RotaryEmbedding now supports the `[batchSize, numHeads, sequenceLength,
headSize]` format for its input, which is used in Mistral.
2024-02-12 19:36:08 -08:00
Hector Li
a622710fe1
Add option to skip session run in perf_test tool (#19501)
Add an option to exit after session creation so that users can measure session creation time and gauge the impact of any initialization optimizations.
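
The same kind of measurement can be approximated from Python. The sketch below is not the perf_test tool itself; `time_session_creation` is a hypothetical helper that times any callable which builds a session.

```python
import time


def time_session_creation(create_session):
    """Call `create_session` once and return (session, elapsed_seconds)."""
    start = time.perf_counter()
    session = create_session()
    elapsed = time.perf_counter() - start
    return session, elapsed


# Usage with ONNX Runtime (assumes the package is installed and that
# "model.onnx" is a placeholder for a real model file):
#   import onnxruntime as ort
#   _, seconds = time_session_creation(lambda: ort.InferenceSession("model.onnx"))
#   print(f"session creation took {seconds:.3f} s")
```

Only session creation is timed here, and no inference is run, which matches the intent described above.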
2024-02-12 19:11:40 -08:00
snadampal
7fa6f4fca4
add arm64 bfloat16 fastmath mode option for transformers benchmarking script (#19294)
Add arm64 bfloat16 fastmath mode option for transformers benchmarking script.

### Motivation and Context
onnxruntime now supports bfloat16 fastmath gemm kernels for arm64 platforms with bfloat16 instruction support. This PR updates benchmark scripts to test that mode.
2024-02-12 15:20:36 -08:00
Preetha Veeramalai
90e2e8561f
Ovep 1.17.1 (#19482)
### Description
Fix bugs affecting API backward compatibility.
Update to pass the ONNX model path, rather than the serialized ONNX model,
to the OV compile_model API.
2024-02-12 12:31:08 -08:00
Changming Sun
9cb97ee507
Disable CPU EP's allocator's arena when address sanitizer is enabled (#19485)
### Description
Disable CPU EP's allocator's arena when address sanitizer is enabled,
because it masks problems. For example, the code in
onnxruntime/test/quantization/quantization_test.cc has a memory leak:
it allocates a buffer but never frees it, yet most memory leak
check tools cannot detect this because the buffer came from an arena and
the arena itself is eventually freed.
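
A toy model of why the arena hides the leak is sketched below. It is purely illustrative: `TrackedHeap` and `Arena` are hypothetical stand-ins for the system allocator (as seen by a leak checker) and the arena, not ONNX Runtime classes.

```python
class TrackedHeap:
    """Toy stand-in for malloc/free as observed by a leak checker."""

    def __init__(self):
        self.live = set()   # blocks allocated and not yet freed
        self._next_id = 0

    def malloc(self):
        self._next_id += 1
        self.live.add(self._next_id)
        return self._next_id

    def free(self, block):
        self.live.remove(block)


class Arena:
    """Toy arena: takes one block from the heap and hands out buffers from it."""

    def __init__(self, heap):
        self.heap = heap
        self.block = heap.malloc()
        self.outstanding = 0  # buffers handed out and never returned

    def alloc(self):
        self.outstanding += 1

    def destroy(self):
        # Freeing the backing block also discards every outstanding buffer.
        self.heap.free(self.block)


def leaked_blocks(use_arena):
    """Run the same buggy code path and report what the leak checker sees."""
    heap = TrackedHeap()
    if use_arena:
        arena = Arena(heap)
        arena.alloc()     # bug: this buffer is never returned
        arena.destroy()   # arena teardown frees the backing block
    else:
        heap.malloc()     # same bug, allocated straight from the heap
    return len(heap.live)


print(leaked_blocks(use_arena=True))   # 0: the leak is hidden
print(leaked_blocks(use_arena=False))  # 1: the leak is visible
```

With the arena disabled, every buffer comes directly from the heap, so the checker can report the unreleased one.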

### Motivation and Context
Provide better memory leak check coverage.
2024-02-12 09:39:49 -08:00
Baiju Meswani
c831031ad5
Remove cuda gencode 90 to reduce onnxruntime-training package size (#19486)
2024-02-12 09:24:36 -08:00
Fangrui Song
d00adb7989
Align bins_space_ storage (#17552)
Otherwise, `new (BinFromIndex(b)) Bin(this, bin_size);` in bfc_arena.cc
would cause a -fsanitize=alignment (part of -fsanitize=undefined)
failure like

runtime error: constructor call on misaligned address 0xXXX for type
'Bin', which requires 8 byte alignment
2024-02-11 19:18:26 -08:00
Patrice Vignola
1182b5509b
Disable streams for the DML EP (#19481)
There's currently a bug in the allocation planner: when buffers are reused
and more than one stream is used, it is possible (although
rare) for a buffer that is still in use to reach a reference count of 0.
Since DML doesn't benefit from multiple streams, disabling them is
the safest option for now.

This is a high priority issue that we need to fix for 1.17.1 since it
breaks stable diffusion. Identifying the perfect fix and fixing the
underlying issue would be too risky for a patch release, especially
given the limited time that we have.

https://github.com/microsoft/onnxruntime/issues/19480
2024-02-10 00:34:34 -08:00
Dmitri Smirnov
0e984ef0d1
Small fixes in Resize CPU antialias (#19476)
### Description
Add a comment, pass NCHW = true when setting upsample_antialias

### Motivation and Context
Small bugs.
2024-02-09 15:27:04 -08:00
Shubham Bhokare
90cf03767d
Support ONNX export of OpenAi Whisper model (#17316)
Build from source and run the command below

Example: converting whisper-base

`python -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-base --model_impl openai -e -o -w --chain_model --output ./demo`
2024-02-09 15:26:39 -05:00
Changming Sun
1007d8f3d1
Revert "Revert NeuralSpeed code for x64 MatMulNBits (#19382)" (#19474)
This reverts commit 0d10c7f3c1.
2024-02-09 09:24:54 -08:00
Justin Chu
3d2ddf96e3
Bump ruff linter to 0.2.1 (#19471)
### Motivation and Context

Include new lint rules
2024-02-08 16:08:27 -08:00
Yulong Wang
03be65e064
[js/web] fix types exports in package.json (#19458)
### Description

Since TypeScript v4.7, types need to be specified inside the "exports" field when
it is present. This PR adds "types" just before each "default" (which
the spec requires to be the last item).
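
As a rough illustration of the ordering described above, the Python sketch below rewrites one `exports` entry so that `"types"` sits immediately before `"default"`. The helper name and the file paths are hypothetical placeholders, not the actual contents of the package's `package.json`.

```python
import json


def add_types_before_default(conditions, types_path):
    """Return a copy of one `exports` entry with "types" just before "default".

    If the entry has no "default" key, it is returned without a "types" key.
    """
    result = {}
    for key, value in conditions.items():
        if key == "types":
            continue  # drop any existing entry so it is re-added in place
        if key == "default":
            result["types"] = types_path
        result[key] = value
    return result


# Hypothetical entry; the paths are placeholders.
entry = {"import": "./dist/index.mjs", "default": "./dist/index.js"}
print(json.dumps(add_types_before_default(entry, "./types.d.ts"), indent=2))
```

Python dictionaries preserve insertion order, so the printed JSON keeps `"default"` as the last key.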

Fixes #19403.
2024-02-08 15:56:48 -08:00
ivberg
148f54c6ea
Add capturestate / rundown ETW support logging for session and provider options (#19397)
### Description
Add capturestate / rundown ETW support logging for session and provider
options.

### Motivation and Context
Follow-up to #16259 and #18882

This is very useful when you have longer running ONNX sessions which
will be the case for a lot of AI workloads. That means ETW tracing may
start minutes or hours after a process & session has been established.
When a trace is captured, you want to know the state of ONNX at
that time. For ONNX, that state is the session and config options, which
are now logged so that they show up in the trace.

Tested with xperf and ORT 
xperf -start ort -on 3a26b1ff-7484-7484-7484-15261f42614d
xperf -capturestate ort 3a26b1ff-7484-7484-7484-15261f42614d <--- Run
this after session has been up for some time
xperf -stop ort -d .\ort.etl  <- Trace will now also have rundown events

These events also show up if you use WPR
[CaptureStateOnSave](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/capturestateonsave)
2024-02-08 11:28:05 -08:00
Scott McKay
3b1b18347c
Check for invalid combination of python + minimal build in build.py (#19463)
### Description
<!-- Describe your changes. -->
Python bindings aren't supported in a minimal build. Check in build.py
so user gets a better error message.
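
A minimal sketch of this kind of early check is shown below. It is not the code from build.py: the flag names `--minimal_build` and `--enable_pybind` are assumed to mirror the real script's options, and the error text is made up for illustration.

```python
import argparse


def parse_build_args(argv):
    """Parse a small subset of build flags and reject an unsupported combination."""
    parser = argparse.ArgumentParser()
    # Assumed flag names; the real build.py defines many more options.
    parser.add_argument("--minimal_build", nargs="*", default=None)
    parser.add_argument("--enable_pybind", action="store_true")
    args = parser.parse_args(argv)

    # Fail fast with a clear message instead of a confusing error later in the build.
    if args.minimal_build is not None and args.enable_pybind:
        raise ValueError("Python bindings are not supported in a minimal build.")
    return args


parse_build_args(["--enable_pybind"])  # accepted
try:
    parse_build_args(["--minimal_build", "--enable_pybind"])
except ValueError as err:
    print(f"error: {err}")
```

Validating right after argument parsing means the user sees the problem before any compilation starts.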



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
#19422
2024-02-08 09:08:41 -08:00
Ye Wang
19952c5b35
Add script to convert phi2 to ort-vllm compatible (#19429)
### Description
<!-- Describe your changes. -->

1. Add an option to export an ONNX model compatible with ort_vllm. This makes sure
that the ONNX model relies only on paged attention from vLLM. It is intended for
internal use, so it is not mentioned in the readme.
2. Add details on ORT
installation (https://github.com/microsoft/onnxruntime/pull/19338#discussion_r1476906190)


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: wejoncy <wejoncy@163.com>
2024-02-07 17:03:06 -08:00
luoyu-intel
0d10c7f3c1
Revert NeuralSpeed code for x64 MatMulNBits (#19382)
### Description
<!-- Describe your changes. -->
Revert PR#19016 https://github.com/microsoft/onnxruntime/pull/19016
Revert PR#17669 https://github.com/microsoft/onnxruntime/pull/17669
2024-02-07 13:04:37 -08:00
Jian Chen
75f06319d6
Change binet to bin (#19424)
### Description
This pull request includes a small change to the
`Dockerfile.manylinux2_28_cuda` file in the
`tools/ci_build/github/linux/docker` directory. The change corrects the
`PREPEND_PATH` argument from `/usr/local/cuda/binet` to
`/usr/local/cuda/bin`, ensuring the correct path to CUDA binaries is
set.
2024-02-07 09:51:02 -08:00
Scott McKay
36d223676b
Use GraphViewer.IsConstantInitializer in NNAPI EP. (#19401)
### Description
<!-- Describe your changes. -->
An overridable initializer should not have a fixed value included in an
NNAPI model as it could be changed at runtime. The current check doesn't
include validating that the initializer is constant.

I was updating GetClipMinMax as part of adding CoreML EP ML Program
support, and in order to make both CoreML and NNAPI do the more correct
thing of using IsConstantInitializer this set of changes was required.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Make NNAPI and CoreML EPs more correct.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-02-07 14:01:51 +10:00
Patrice Vignola
302d4be7d9
[DML EP] Fix external data unpacking (#19415)
### Description
This change
55a669409a
didn't take external data into account when unpacking initializers, and
therefore crashes when trying to unpack them.
2024-02-06 17:10:55 -08:00
Maximilian Müller
91b2e660fe
[Build] fix: missing nvcc flags when compiling with unittests (#19308)
When configured using the following CMake options, CLion is not able to
configure because it checks with `nvcc ... --dryrun tmp.cu`:
```
cmake -G Ninja -Donnxruntime_USE_TENSORRT="ON" -Donnxruntime_USE_CUDA="ON" -Donnxruntime_USE_CUDA_NHWC_OPS="ON" -DCMAKE_CUDA_ARCHITECTURES="native" -Donnxruntime_NVCC_THREADS=1 -Donnxruntime_ENABLE_NVTX_PROFILE="ON" -Donnxruntime_USE_TENSORRT_BUILTIN_PARSER="ON" -DCMAKE_CUDA_COMPILER_LAUNCHER="ccache" -Donnxruntime_BUILD_UNIT_TESTS="ON" -Donnxruntime_USE_TRITON_KERNEL=OFF -Donnxruntime_USE_FLASH_ATTENTION=OFF
```
Without building the unit tests everything works fine. I believe my
changes only follow the logic that is actually desired. If
`NVCC_HAS_STRICT_ALIASING` is set to false it should not be possible to
add this as a CUDA flag. The same is true for `HAS_NOERROR`, as seen in
`adjust_global_compile_flags.cmake`.
2024-02-06 17:01:26 -08:00
Edward Chen
df5c6718bd
Remove iOS simulator max runtime version limit. (#19396)
2024-02-06 14:54:06 -08:00
Tianlei Wu
bedf0eee73
[CUDA] Add use_tf32 provider option (for FP32 GEMM) (#19357)
[TF32](https://blogs.nvidia.com/blog/tensorfloat-32-precision-format/)
can help boost performance on GPUs with SM >= 80. Sometimes users observe accuracy loss, or need to disable TF32 for testing
purposes. To disable TF32, it is also possible to set the environment
variable `NVIDIA_TF32_OVERRIDE = 0`. However, sometimes we do not want to
use an environment variable, to avoid impacting other applications, or we want
finer control (like one session using TF32, and another session
not). This provider option helps in those cases.

Here we add a provider option `use_tf32`. When `use_tf32 = 0`, we will
disable TF32 for float MatMul/GEMM in cublas. It applies to MatMulNBits,
Attention, LongformerAttention, PackedAttention,
PackedMultiHeadAttention operators when float GEMM is used internally in
the operator. Note that it does not affect other data types; for example, fp8
GEMM could still use TF32 in accumulation.

Previously, cublasGemmStridedBatchedHelper did not use TF32 in
inference. Here we enable TF32 by default, so we might observe a speedup
for FP32 transformer models on SM >= 80.

There is another PR that enables the option for cuDNN Conv later.
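
A minimal sketch of how the option might be passed from Python follows, assuming an ONNX Runtime build that includes this change. `cuda_providers` and `make_session` are hypothetical helper names; the option key `use_tf32` comes from the description above, and the value is passed as a string.

```python
def cuda_providers(use_tf32):
    """Build a `providers` list that sets the CUDA EP `use_tf32` option."""
    cuda_options = {"use_tf32": "1" if use_tf32 else "0"}
    # Fall back to the CPU EP for nodes the CUDA EP does not handle.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


def make_session(model_path, use_tf32):
    """Create a session with per-session TF32 control (needs a CUDA-enabled build)."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependencies

    return ort.InferenceSession(model_path, providers=cuda_providers(use_tf32))


print(cuda_providers(use_tf32=False))
```

Because the option is set per session, one session can keep TF32 enabled while another disables it, which the `NVIDIA_TF32_OVERRIDE` environment variable cannot express.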

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

https://github.com/microsoft/onnxruntime/issues/15407
https://github.com/microsoft/onnxruntime/issues/19288
2024-02-06 13:31:33 -08:00
Tianlei Wu
c4b49fb7bf
[CUDA] remove CUBLAS_TENSOR_OP_MATH mode (#19431)
This pull request replaces `CUBLAS_TENSOR_OP_MATH` with
`CUBLAS_DEFAULT_MATH`. The changes affect several files, including test
cases and a Python script for AMD hipify process.

### Motivation and Context

CUBLAS_TENSOR_OP_MATH mode is deprecated:
https://docs.nvidia.com/cuda/cublas/index.html#cublasmath-t

On CUDA versions prior to 11, users are required to set the math mode to
CUBLAS_TENSOR_OP_MATH manually to be able to use tensor cores for FP16.
On CUDA 11 and CUDA 12, this is no longer required. Since the latest ORT
only supports CUDA >= 11, it is safe to remove CUBLAS_TENSOR_OP_MATH
from our code base.
2024-02-06 12:48:39 -08:00