Commit graph

8854 commits

Author SHA1 Message Date
Dmitri Smirnov
684e900e96
Remove NETSTANDARD1.1 moniker and NETSTD1.1 specific code (#16018)
### Description

Remove NETSTANDARD1.1 moniker and NETSTD1.1 specific code. We no longer
target this platform.

### Motivation and Context
The NETSTANDARD1.1 target constrains development and the modern
libraries we would like to use in the code, while it is apparently no
longer required by customers.
2023-05-22 17:33:46 -07:00
RandySheriffH
d35361bf9d
Fix python pipeline for AzureEP without using root (#16023)
Fix python pipeline for AzureEP without using root, this is for 1.15.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-22 16:38:47 -07:00
satyajandhyala
22a578c06c
Use node name to uniquify the subgraph nodes. (#15855)
### Description
Use the unique name of the function node to uniquify the subgraph
node names.


### Motivation and Context
Prevent duplicate node names in the graph.

https://github.com/microsoft/onnxruntime/issues/15849
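
The renaming idea described above can be illustrated with a small sketch. The helper name `uniquify_subgraph_names` and the `<function node name>_<node name>` scheme are assumptions made for illustration; the actual naming used by the PR may differ.

```python
def uniquify_subgraph_names(function_node_name, subgraph_node_names):
    """Prefix each subgraph node name with the (unique) function node name.

    Hypothetical helper: the real naming scheme in ORT may differ.
    """
    return [f"{function_node_name}_{name}" for name in subgraph_node_names]


# Two instances of the same function body would otherwise both
# contribute nodes named "add" and "mul" to the graph.
body = ["add", "mul"]
first = uniquify_subgraph_names("fn_node_0", body)
second = uniquify_subgraph_names("fn_node_1", body)
print(first, second)
```

Because the function node names are unique in the graph, the prefixed names from different instances cannot collide.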

---------

Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com>
2023-05-22 16:15:14 -07:00
zhijiang
4dc4470cc7
Fix fusion for two LayerNorm sharing same input but with different weights (#15919)
In gpt_j_residual (https://arxiv.org/pdf/2204.06745.pdf), two LN
nodes share the same input. ORT runs the CSE graph optimization
before LN fusion, which modifies the LN graph pattern and thus makes
LN fusion fail.


![image](https://github.com/microsoft/onnxruntime/assets/10530022/40990fd6-796f-4edf-be0b-3203e8503678)
2023-05-22 08:26:36 +08:00
zhijiang
5607a7151a
Introduce register-efficient warp-wise Softmax (#15266)
Improve softmax forward when the number of elements to apply softmax
over is in (1024, 2048].

Several optimizations are done in the PR:
1. Originally ORT calls softmax_block_forward when the shape is 1500,
which takes 5.53 ms. ORT has another implementation,
softmax_warp_forward, which needs only 4.74 ms, so I modified
the function selection logic to call the faster version.
2. softmax_warp_forward uses registers to cache the input in fp32
mode. This consumes many registers when the number of elements is large
and makes warp occupancy quite low, also compiler can do some of its
optimizations, so the PR implements another version of
softmax_warp_forward that uses shared memory instead of registers to
cache the input. Also, when the for loop in the function has many
iterations, disabling loop unrolling makes the kernel even
faster.

The perf table comparing softmax_warp_forward1 (the original version) and
softmax_warp_forward2:

![image](https://user-images.githubusercontent.com/43435212/228491963-cf87e3b3-e69e-454c-bab6-7e62a25bf76b.png)


In the OpenAI Whisper case, the kernel is about 1.8x faster (5.53 ms vs.
3.03 ms, an 82% gain; softmax_block_forward vs softmax_warp_forward2).
2023-05-22 08:26:03 +08:00
Changming Sun
0204594f90
Cleanup WASM cmake code (#15996)
### Description
Remove the "onnxruntime_BUILD_WEBASSEMBLY" cmake option. Use `if
(CMAKE_SYSTEM_NAME STREQUAL "Emscripten")` instead. It makes some code
look more natural.
For example,

```cmake
if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR onnxruntime_BUILD_WEBASSEMBLY)
```
becomes
```cmake
if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR CMAKE_SYSTEM_NAME STREQUAL "Emscripten")
```
2023-05-20 18:07:39 -07:00
Yulong Wang
e9e6bedf37
[js/webgpu] generate operator table for webgpu (#15954)
### Description
[js/webgpu] generate operator table for webgpu
2023-05-20 12:20:41 -07:00
Yulong Wang
18f17c555d
[js/webgpu] fix buffer size when download (#15990)
### Description
Fix the buffer size when downloading. The buffer size should always be
padded to a multiple of 4.

resolved issue described in #15796
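
The padding rule stated in this description can be sketched in a few lines. This is a minimal Python illustration rather than the TypeScript used by the WebGPU backend, and `padded_size` is a hypothetical helper name, not a function from the PR.

```python
def padded_size(size: int, alignment: int = 4) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    # Adding alignment - 1 and clearing the low bits rounds up.
    return (size + alignment - 1) & ~(alignment - 1)


for n in (0, 1, 4, 5, 1501):
    print(n, "->", padded_size(n))
```

The bit-mask form only works for power-of-two alignments, which is sufficient for the multiple-of-4 case described here.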

>
![Image](https://user-images.githubusercontent.com/26504141/239093785-9417dffc-6f00-47b2-956d-402b43bdb0a9.png)
2023-05-20 00:21:18 -07:00
Patrice Vignola
85cacf315b
[DML EP] Add MultiHeadAttention and fix Attention (#15727) 2023-05-19 15:07:14 -07:00
Yulong Wang
dc06c255b4
fix transpose optimizer on GPU EP (#15988)
### Description
Because of #15618, the default allocator changed to the device allocator,
which is on the GPU instead of the CPU. The transpose optimizer expects to
read data from initializers, so a CPU allocator is required here.

This change fixes the transpose optimizer on the GPU EP.

Fixes the issue referred to in #15869, #15796
2023-05-19 14:33:45 -07:00
Hector Li
4324d2173b
[QNN EP] Enable Qnn context cache to save model initialization time (#15815)
### Description
Enable Qnn Context cache feature to save model initialization time
Provider options:
qnn_context_cache_enable|1 to enable the cache feature
qnn_context_cache_path to set the cache path. It is set to model_file.onnx.bin by default.
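
As a rough sketch of how these two options might be passed from Python: the option keys come from the description above, while the helper name `qnn_cache_options` and the use of "0" to disable the cache are assumptions made for illustration.

```python
from typing import Dict, Optional


def qnn_cache_options(enable: bool = True,
                      cache_path: Optional[str] = None) -> Dict[str, str]:
    """Build the context-cache entries of a QNN EP provider-options dict.

    Hypothetical helper; only the two option keys come from the PR text.
    """
    # Provider option values are passed as strings. "1" enables the cache;
    # using "0" to disable it is an assumption.
    options = {"qnn_context_cache_enable": "1" if enable else "0"}
    if cache_path is not None:
        # When omitted, the EP uses its default path (model_file.onnx.bin).
        options["qnn_context_cache_path"] = cache_path
    return options


print(qnn_cache_options(cache_path="model.onnx.bin"))

# Possible usage, assuming an onnxruntime build that includes the QNN EP
# (other QNN options, such as the backend library path, are omitted here):
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "model.onnx",
#       providers=["QNNExecutionProvider"],
#       provider_options=[qnn_cache_options(cache_path="model.onnx.bin")],
#   )
```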

### Motivation and Context
Model initialization takes a long time because of the cost of converting the ONNX model to a QNN model. QNN has a feature to serialize the QNN context to a file, so that next time the user can load it from the cached context and execute the graph, saving that cost.

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
2023-05-19 10:52:17 -07:00
RandySheriffH
4dfb89b3ad
Implement mutex-free spin lock for task queue (#14834)
Implemented a "lock-free" spinlock to reduce the CPU usage spent on context
switching. The change has been tested on the queene service of the Ads team:
the lock-free version of ORT (40 threads) cuts CPU usage on gen8 (128 logical
processors on 8 NUMA nodes) Windows by nearly half, from 65% to 35%.

For 32 cores, the curve is flat:

Anubis, 32 vCPU, windows, hugging face models,
95 percentile E2E latency in ms:

model | mutex (ms) | mutex-free (ms)
--- | --- | ---
alvert_base_v2 | 34.21 | 34.09
bert_large_uncased | 116.27 | 117.84
bart_base | 72.06 | 71.99
distilgpt2 | 25.43 | 25.02
vit_base_patch16_224 | 37.33 | 37.76

Anubis, 32 vCPU win, Linux, 1st party models,
95 percentile E2E latency in ms:

model | mutex (ms) | mutex-free (ms)
--- | --- | ---
deepthink_v2 | 24.35 | 22.95
bing_feeds | 36.96 | 36.48
deep_writes | 14.46 | 14.32
keypoints | 9.34 | 7.69
model11 | 1.71 | 1.66
model12 | 1.82 | 1.44
model2 | 4.21 | 3.95
model6 | 1.08 | 1.05
agiencoder | 0.99 | 0.93
geminet_transformer | 5.32 | 5.24

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-19 10:12:10 -07:00
cloudhan
0b0a359520
[CAPI] CAPI impl refactor (#15974)
1. Better options string building
2. avoid potential `new` `delete`
2023-05-19 11:40:56 +08:00
Patrice Vignola
310b22aa0c
[DML EP] Update DirectML version to 1.12.0 (#16011) 2023-05-18 19:37:12 -07:00
PeixuanZuo
d78bbf5ef2
[ROCm] remove ROCm5.2.3, ROCm5.3, ROCm5.4 from pipeline (#16004)
remove ROCm5.2.3, ROCm5.3, ROCm5.4 from pipeline.
2023-05-19 10:29:01 +08:00
George Wu
a74fdeb7fc
fix unused var warning in contrib_ops/cuda/bert/attention.cc (#16010)
fix https://github.com/microsoft/onnxruntime/issues/16000
2023-05-18 17:42:08 -07:00
Zhang Lei
0f8e66d905
optimization for whisper model with decoder masked multihead attention (#15827)
* graph tools update
* cuda kernel update
* operator spec update and implementation update
* greedy search bug fix for a wrong assumption about cross/self attention
input length
* avoid use of the "" name in value info when loading a graph, which
historically appears in many models
2023-05-18 15:38:31 -07:00
Changming Sun
be6c0bb53c
Update cgmanifests/generated/cgmanifest.json to fix a syntax error (#15997)
### Description
In PR #15797, the author manually edited the
cgmanifests/generated/cgmanifest.json file and made an error that makes the file ill-formed.

2023-05-18 15:03:06 -07:00
Yufeng Li
0fed00c04d
fix topo sort in quantization tool (#16003)
### Description
Do not set up the dependent node list for an empty ('') input.
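
To make the rule concrete, here is a minimal sketch of a topological sort that skips empty input names. It is not the quantization tool's code: the `(name, inputs, outputs)` tuple layout and the `topo_sort` helper are assumptions made for illustration.

```python
from collections import deque


def topo_sort(nodes):
    """Return node names in topological order.

    nodes: list of (name, inputs, outputs) tuples (hypothetical layout).
    """
    producer = {}
    for name, _, outputs in nodes:
        for out in outputs:
            if out:  # an empty name marks an unused optional output
                producer[out] = name

    dependents = {name: [] for name, _, _ in nodes}
    indegree = {name: 0 for name, _, _ in nodes}
    for name, inputs, _ in nodes:
        for inp in inputs:
            if inp == "":
                continue  # omitted optional input: no dependency edge
            src = producer.get(inp)
            if src is not None:  # graph inputs/initializers have no producer
                dependents[src].append(name)
                indegree[name] += 1

    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        cur = ready.popleft()
        order.append(cur)
        for nxt in dependents[cur]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order


# "clip" leaves its optional second input empty.
graph = [("clip", ["y", "", "max"], ["z"]), ("relu", ["x"], ["y"])]
print(topo_sort(graph))
```

Without the empty-name check, every node with an omitted optional input could be linked to whichever node happened to be registered under the name `''`, creating false dependencies.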


2023-05-18 13:43:52 -07:00
Jian Chen
ea7b2deffd
Removing C4090 warning suppression (#15994)
### Description
Remove the C4090 warning suppression now that the Windows pipelines have adopted VS2022.


2023-05-18 10:08:05 -07:00
Ashwini Khade
0c815a95b7
android package fix (#15999)
### Description
This PR adds the training headers to the training android packages.


### Motivation and Context
Training headers need to be added as part of the training android
packages. However, because of a typo in the cmake file, these headers were
not being added. This PR fixes the issue.
2023-05-18 09:21:03 -07:00
Changming Sun
842b1a3472
Revert a change in #15797: restore the correct version of emsdk (#15995)
### Description
Revert a change in #15797: restore the correct version of emsdk


### Motivation and Context
Without this change, building on Windows fails with:
```
2023-05-17 19:41:30,093 build [INFO] - Activating emsdk...
2023-05-17 19:41:30,093 util.run [INFO] - Running subprocess in 'C:\src\onnxruntime2\cmake\external\emsdk'
  'C:\src\onnxruntime2\cmake\external\emsdk\emsdk.bat' activate 3.1.37
error: tool or SDK not found: '3.1.37'
```
2023-05-18 07:41:38 -07:00
Edward Chen
648bedf91a
[CoreML EP] Minor changes to allow CoreML EP to handle more nodes and models. (#15993)
### Description

Minor changes to allow CoreML EP to handle more nodes and models.
- Remove graph input dynamic shape check from
coreml::GetSupportedNodes(). Each node input is still checked.
- Add check for optional input in coreml::IsInputSupported(). If an
input does not exist it should not be considered unsupported.

### Motivation and Context

Some CoreML EP checks seem too strict now.
2023-05-18 16:24:30 +10:00
cloudhan
5a8b892bdc
[C#] Address the concern of append EP throw (#15973) 2023-05-18 11:53:54 +08:00
Edward Chen
6d46007028
Add explicit 'set +x' before printing a vso[] command to avoid output getting parsed again with a trailing quote. (#15986)
Here's the motivating issue:
https://github.com/microsoft/azure-pipelines-tasks/issues/10331

Noticed some problems in other repos so also updating usages in ORT.

We may be fine now without it, but this change adds some safeguard against future additions of 'set -x' for debugging.
2023-05-17 19:30:28 -07:00
Changming Sun
d98763473a
Change CUDA pipelines to download CUDA SDK in every build job (#15915)
### Description
Change CUDA pipelines to download CUDA SDK in every build job


2023-05-17 17:31:51 -07:00
cloudhan
856afa49dd
[C#] Add missing rocm csharp api (#15540) 2023-05-18 08:15:19 +08:00
Linnea May
0d6416c0e9
DML EP Bitwise operators opset 18 (#15892)
### Description
Add DML registration for the bitwise and, or, xor, and not operators added in opset 18.


---------

Co-authored-by: Linnea May <linneamay@microsoft.com>
2023-05-17 13:27:49 -07:00
Vrajang Parikh
5abaca9d69
add maybe unused attribute to vars only used for logging (#15970)
### Description
Add maybe_unused attribute to variables that are only used for logging



### Motivation and Context
Building ORT with training using Xcode 14.3 causes a
`-Wunused-but-set-variable` error because some variables are created and
used exclusively for debug logging. Adding maybe_unused suppresses
warnings on unused variables when logging is disabled and fixes the
local build.
2023-05-17 10:24:13 -07:00
Yi Zhang
6d43d51eb0
[Fix] No test result report while not using ctest (#15976)
### Description
1. Set the gtest output when ctest is set to empty.
2. onnx_src in _deps shouldn't be removed because
onnx_test_pytorch_converted and onnx_test_pytorch_converted need to read
data from onnx/backend/test/data/..

### Motivation and Context
Test result report is important to find the flaky tests.

### To do
The set of tests that run is not consistent.
If ctest_path is empty, onnx_test_pytorch_converted and
onnx_test_pytorch_converted are not executed; if it is not empty,
onnxruntime_mlas_test is not executed.


270c09a37f/tools/ci_build/build.py (L1743-L1753)
2023-05-17 08:31:16 -07:00
Jian Chen
2881d849d4
Update Win-CPU-2021 to onnxruntime-Win-CPU-2022 (#15967)
### Description
After this PR, the following pools still need to be updated.

old|new|note
---|---|---
onnxruntime-Win2019-GPU-dml-A10|tbd|
onnxruntime-Win2019-GPU-T4|onnxruntime-Win2022-GPU-T4|
onnxruntime-Win2019-GPU-training-T4|onnxruntime-Win2022-GPU-T4|Same as the above because we do not have many T4 GPUs
onnxruntime-tensorrt8-winbuild-T4|tbd|
aiinfra-dml-winbuild|tbd|
2023-05-17 08:29:27 -07:00
Yulong Wang
084d0d0d2d
Update github issue template for 'web': add EP (#15955)
### Description
Update github issue template for 'web': add EP
2023-05-16 23:50:33 -07:00
Patrice Vignola
0ff915eba8
[DML EP] Add frequent upload heap flushing (#15960)
This reduces peak nonlocal memory consumption when uploading large
weights for big models (e.g. LLMs), while at the same time trying to
keep the GPU as busy as possible. This change could be more
sophisticated, but at this stage it is the most minimal and least risky
change required to support LLMs.
2023-05-16 22:35:38 -07:00
stevenlix
270c09a37f
Add timestamp logits processor for whisper (#15853)
Enable timestamp estimation and logits processing for Whisper model.
2023-05-16 21:40:00 -07:00
kailums
f62f722c70
integrate triton into ort (#15862)
### Description
In some scenarios, Triton-written kernels are more performant than
CK or other handwritten kernels, so we implement a framework that lets
onnxruntime use these Triton-written kernels.

This PR integrates Triton into ORT so that ORT can use kernels
written and compiled with Triton.

The main changes focus on two parts:
1. A build part that compiles Triton-written kernels and combines these
kernels into libonnxruntime_providers_rocm.so.
2. A loader and launcher in C++ for loading and launching Triton-written
kernels.

#### Build

To compile Triton-written kernels, add a script
`tools/ci_build/compile_triton.py`. This script dynamically loads all
kernel files, compiles them, and generates `triton_kernel_infos.a` and
`triton_kernel_infos.h`.

`triton_kernel_infos.a` contains all compiled kernel instructions. This
file is combined into libonnxruntime_providers_rocm.so using the
--whole-archive flag.

`triton_kernel_infos.h` defines a const array that contains the
metadata for each compiled kernel. This metadata is used for load
and launch, so the header file is included by 'triton_kernel.cu', which
defines the load and launch functions.

Add a build flag in build.py and CMakeLists.txt. When building the ROCm
provider, it calls the triton_kernel build command and generates all
necessary files.

#### C++ Load and Launch

On the C++ side, we implement load and launch functions in triton_kernel.cu
and triton_kernel.h.

These two files are located in `providers/cuda`, and when compiling for ROCm
they are hipified, so this part supports both CUDA and ROCm. Currently,
however, we only call Triton kernels in ROCm.

We also implement a softmax Triton op as an example. Because many
kernels are generated for the different input shapes of softmax, we use
TunableOp to select the best one.


2023-05-17 09:35:28 +08:00
Sheil Kumar
a7ad859e3a
DML EP Register Split18 (#15931)
Register Split18 for DirectML

Split13 was previously implemented. Split18 adds a new attribute called
"num_outputs" that is mutually exclusive with the "split" input.

The "num_outputs" attribute splits the tensor evenly (and handles
uneven splits). To implement it, the DML split tensor just needs to be
overridden in the presence of the num_outputs attribute.
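
As an illustration of what splitting by "num_outputs" means, the sketch below computes the per-output sizes along the split axis. It assumes the ONNX Split-18 convention that each chunk has size ceil(dim / num_outputs) and the last chunk is smaller when the dimension does not divide evenly; `split_sizes` is a hypothetical helper, not the DML EP code.

```python
from typing import List


def split_sizes(dim: int, num_outputs: int) -> List[int]:
    """Sizes of each output along the split axis (assumed Split-18 rule)."""
    if num_outputs <= 0:
        raise ValueError("num_outputs must be positive")
    chunk = -(-dim // num_outputs)  # ceiling division
    # Each output takes a full chunk until the dimension runs out.
    return [max(0, min(chunk, dim - i * chunk)) for i in range(num_outputs)]


print(split_sizes(6, 3))  # even split
print(split_sizes(7, 3))  # uneven split: the last output is smaller
```

The resulting list plays the same role as an explicit "split" input, which is why overriding the split sizes is enough to support the new attribute.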

---------

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2023-05-16 11:58:19 -07:00
Yulong Wang
04ea561fc8
[js/webgpu] throw error when WebGPU=ON and SIMD=OFF (#15924)
### Description
throw error when WebGPU=ON and SIMD=OFF
2023-05-16 11:05:56 -07:00
Jian Chen
780442b9f6
Change windows machine pools to use VS2022 (#15806)
### Description

Old pool | New pool | Notes
--- | --- | ---
onnxruntime-Win-CPU-2019 | onnxruntime-Win-CPU-2022 |
onnxruntime-Win2019-CPU-training | onnxruntime-Win2022-CPU-training-AMD |
onnxruntime-Win2019-CPU-training-AMD | onnxruntime-Win2022-CPU-training-AMD | Same as the above
onnxruntime-Win2019-GPU-dml-A10 | Needs to be created | You need to create a new image for it first
onnxruntime-Win2019-GPU-T4 | onnxruntime-Win2022-GPU-T4 |
onnxruntime-Win2019-GPU-training-T4 | onnxruntime-Win2022-GPU-T4 | Same as the above because we do not have many T4 GPUs
onnxruntime-tensorrt8-winbuild-T4 | TBD | TBD
Win-CPU-2021 | onnxruntime-Win-CPU-2022 | Will do it in the next PR
Win-CPU-2019 | onnxruntime-Win2022-Intel-CPU | Intel CPU needed for win-ci-pipeline.yml -> `stage: x64_release_dnnl`

### Motivation and Context
With VS2022 we can take advantage of the 64-bit compiler. It also comes with
better C++20 support.
2023-05-16 10:34:34 -07:00
RandySheriffH
7faad53632
Set default option for package name and build arg options (#15958)
Set default value for parameters in nuget-zip pipeline, and only apply
the configurations when they are not "NONE".

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-16 09:07:38 -07:00
Akash
1079df6aaa
Update StableDiffusion path after cloning repo (#15948)
### Description
Correct path to SD files in README



### Motivation and Context
Small typo in path
2023-05-16 08:39:27 -07:00
Baiju Meswani
6b7181d31d
Add C# API documentation for training (and some other changes) (#15935) 2023-05-16 03:15:24 -07:00
Prathik Rao
a0ccb95f3c
add option to load pretrained weights for T5 model (#15951)
### Description

Adds option to pass in pretrained weights file during T5 inference onnx
export. Mimics the changes made to whisper:
https://github.com/microsoft/onnxruntime/pull/15759

### Motivation and Context

Required for ONNX Runtime demo being presented at BUILD.
2023-05-15 22:52:35 -07:00
PeixuanZuo
e96f10d27b
[ROCm] reduce batch size to fix CI error (#15714)
The ROCm CI batch size test occasionally fails. Try reducing the batch size
to fix it.

error log:
Non-zero status code returned while running FusedMatMul node.
Name:'MatMul_2914_Grad/FusedMatMul_0' Status Message: HIP error
hipErrorNotFound:named symbol not found
Non-zero status code returned while running Gemm node.
Name:'MatMul_2891_Grad/Gemm_5' Status Message: HIP error
hipErrorNotFound:named symbol not found
2023-05-16 13:10:02 +08:00
Aung T Naing
bc5018a4e1
[QNN EP] test coverage for MaxPool (#15904)
### Description
Added MaxPool tests to show the issues with MaxPool and also provide
test coverage

The following tests are currently Failing:
 ./onnxruntime_test_all --gtest_filter=*.TestMaxPool*

[  FAILED  ] 5 tests, listed below:
[  FAILED  ] QnnCPUBackendTests.TestMaxPool_Ceil
[  FAILED  ] QnnCPUBackendTests.TestMaxPool_Large_Input2_Ceil
[  FAILED  ] QnnHTPBackendTests.TestMaxPool_Large_Input_HTP_u8
[  FAILED  ] QnnHTPBackendTests.TestMaxPool_Large_Input2_HTP_u8
[  FAILED  ] QnnHTPBackendTests.TestMaxPool_Large_Input2_Ceil_HTP_u8


### Motivation and Context
Provide test coverage for MaxPool and debug model related issues.
2023-05-15 21:35:50 -07:00
Yulong Wang
22a9a1a630
[js/webgpu] only register webgpu backend when it's available (#15922)
### Description
only register webgpu backend when it's available
2023-05-15 18:09:31 -07:00
cloudhan
dc383ed4ce
Basic CSharp packaging support for ROCm EP (#15535)
This PR mainly fixes build errors when trying to build the nupkg for the ROCm EP.
It also slightly improves the packaging logic so that developers can
produce the nupkg on Linux natively.
2023-05-16 07:27:38 +08:00
Yulong Wang
204111a79e
[js/webgpu] support proxy for webgpu (#15851)
### Description
[js/webgpu] support proxy for webgpu. fixes #15832
2023-05-15 16:23:13 -07:00
Yulong Wang
f3b8130d1a
[js/web] support npm run pull:wasm [buildID] (#15877)
### Description
support `npm run pull:wasm [buildID]`

remove `npm run pull:wasm:debug` as it can be simply replaced with `npm
run pull:wasm debug`.
2023-05-15 16:19:34 -07:00
Jian Chen
00c1da5e0a
Fixing NhwcFusedConv fp16 (#15950)
### Description

This should produce a fused Resnet50.fp16.onnx.

2023-05-15 15:34:41 -07:00
kunal-vaishnavi
5b663d6797
Whisper Multitask and Multilingual (#15936)
### Description
This PR enables Whisper's multitask format and allows a user to use
Whisper for multiple tasks (e.g. transcription, translation) and for
multilingual purposes (e.g. English, Spanish). This PR also removes
`attention_mask` as a required input for Whisper with beam search.

### Usage
Here is an example of how you can use Whisper for English transcription.
```python
import numpy as np
import onnxruntime as ort

from datasets import load_dataset
from transformers import AutoConfig, AutoProcessor

model = "openai/whisper-tiny"
config = AutoConfig.from_pretrained(model)
processor = AutoProcessor.from_pretrained(model)

forced_decoder_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")
# forced_decoder_ids is of the format [(1, 50259), (2, 50359), (3, 50363)] and needs to be 
# of the format [50258, 50259, 50359, 50363] where 50258 is the start token id
forced_decoder_ids = [config.decoder_start_token_id] + list(map(lambda token: token[1], forced_decoder_ids))

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_features = processor(ds[0]["audio"]["array"], return_tensors="np").input_features

inputs = {
  "input_features": np.float32(input_features),
  "max_length": np.array([26], dtype=np.int32),
  "min_length": np.array([1], dtype=np.int32),
  "num_beams": np.array([2], dtype=np.int32),
  "num_return_sequences": np.array([1], dtype=np.int32),
  "length_penalty": np.array([1.0], dtype=np.float32),
  "repetition_penalty": np.array([1.0], dtype=np.float32),
  "decoder_input_ids": np.array([forced_decoder_ids], dtype=np.int32),
}
sess = ort.InferenceSession("whisper-tiny_beamsearch.onnx", providers=["CPUExecutionProvider"])
outputs = sess.run(None, inputs)

# Print tokens and decoded output
print(outputs[0][0][0])
print(processor.decode(outputs[0][0][0]))
```

If you don't want to provide specific decoder input ids or you want
Whisper to predict the output language and task, you can set
`forced_decoder_ids = [config.decoder_start_token_id]` instead.
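
The two ways of building the decoder input ids can be shown in isolation, without loading any model. The token ids below are the ones quoted in the comments of the usage example; `to_decoder_input_ids` is a hypothetical helper name introduced only for this sketch.

```python
def to_decoder_input_ids(start_id, prompt_ids=()):
    """Flatten [(position, token_id), ...] and prepend the start token id."""
    return [start_id] + [token_id for _, token_id in prompt_ids]


start_id = 50258  # start token id from the usage example above
prompt_ids = [(1, 50259), (2, 50359), (3, 50363)]

# Explicit language and task: start token followed by the prompt ids.
forced = to_decoder_input_ids(start_id, prompt_ids)
# Let Whisper predict language and task: start token only.
auto = to_decoder_input_ids(start_id)

print(forced)
print(auto)
```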

### Motivation and Context

As seen in the figure below from the [OpenAI Whisper
paper](https://cdn.openai.com/papers/whisper.pdf), Whisper can be used
for multiple tasks and languages.

![Screenshot 2023-05-12
165215](https://github.com/microsoft/onnxruntime/assets/115581922/49335e39-a79c-4f78-92e9-89b034405f65)
2023-05-15 14:36:33 -07:00