onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-23 19:32:23 +00:00

kunal-vaishnavi 5b663d6797 Whisper Multitask and Multilingual (#15936)

### Description

This PR enables Whisper's multitask format and allows a user to use Whisper for multiple tasks (e.g. transcription, translation) and for multilingual purposes (e.g. English, Spanish). This PR also removes `attention_mask` as a required input for Whisper with beam search.

### Usage

Here is an example of how you can use Whisper for English transcription.

```
import numpy as np
import onnxruntime as ort

from datasets import load_dataset
from transformers import AutoConfig, AutoProcessor

model = "openai/whisper-tiny"
config = AutoConfig.from_pretrained(model)
processor = AutoProcessor.from_pretrained(model)

forced_decoder_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")
# forced_decoder_ids is of the format [(1, 50259), (2, 50359), (3, 50363)] and needs to be
# of the format [50258, 50259, 50359, 50363] where 50258 is the start token id
forced_decoder_ids = [config.decoder_start_token_id] + list(map(lambda token: token[1], forced_decoder_ids))

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_features = processor(ds[0]["audio"]["array"], return_tensors="np").input_features

inputs = {
    "input_features": np.float32(input_features),
    "max_length": np.array([26], dtype=np.int32),
    "min_length": np.array([1], dtype=np.int32),
    "num_beams": np.array([2], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
    "decoder_input_ids": np.array([forced_decoder_ids], dtype=np.int32),
}

sess = ort.InferenceSession("whisper-tiny_beamsearch.onnx", providers=["CPUExecutionProvider"])
outputs = sess.run(None, inputs)

# Print tokens and decoded output
print(outputs[0][0][0])
print(processor.decode(outputs[0][0][0]))
```

If you don't want to provide specific decoder input ids or you want Whisper to predict the output language and task, you can set `forced_decoder_ids = [config.decoder_start_token_id]` instead.

### Motivation and Context

As seen in the figure below from the [OpenAI Whisper paper](https://cdn.openai.com/papers/whisper.pdf), Whisper can be used for multiple tasks and languages.

![Screenshot 2023-05-12 165215](https://github.com/microsoft/onnxruntime/assets/115581922/49335e39-a79c-4f78-92e9-89b034405f65)

2023-05-15 14:36:33 -07:00
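The prompt conversion described in the usage example's comments can be isolated into a small standalone sketch. The helper name `to_decoder_input_ids` is hypothetical and not part of the PR, and sorting by position is an added safeguard that the original one-liner does not perform. The token ids are the ones quoted in the commit message's comment.

```python
def to_decoder_input_ids(start_token_id, forced_decoder_ids):
    """Flatten (position, token_id) pairs into a prompt led by the start token.

    Hypothetical helper for illustration; the PR does this inline with map().
    """
    # Sort by position so the result does not depend on the input order
    # (an assumption added here, not something the original code does).
    ordered = sorted(forced_decoder_ids, key=lambda pair: pair[0])
    return [start_token_id] + [token_id for _, token_id in ordered]


# Values taken from the comment in the usage example.
prompt = to_decoder_input_ids(50258, [(1, 50259), (2, 50359), (3, 50363)])
print(prompt)

# With no forced ids, only the start token remains, which corresponds to
# letting Whisper predict the language and task itself.
auto_prompt = to_decoder_input_ids(50258, [])
print(auto_prompt)
```

In the usage example, the resulting list is then wrapped as `np.array([prompt], dtype=np.int32)` to form the `decoder_input_ids` input.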
..
c_cxx	Training Documentation (#15612)	2023-04-25 11:44:12 -07:00
execution_providers/images	Remove docs that have been migrated to https://onnxruntime.ai/docs (#6225)	2021-02-05 18:09:27 -08:00
images	API Documentation (#8948)	2021-09-09 22:04:51 -07:00
python	Update VERSION_NUMBER (#15773)	2023-05-03 15:07:34 -07:00
ABI_Dev_Notes.md	skip windows GPU check if changes only in doc (#13248)	2022-10-11 13:51:44 +08:00
Android_testing.md	Removed BUILD.md from master as source now lives in gh-pages (#6709)	2021-02-19 11:34:21 -08:00
C_API_Guidelines.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
cmake_guideline.md	fix some typo in docs (#13212)	2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md	Enable RUFF as a formatter (#15699)	2023-04-26 14:04:07 -07:00
ContribOperators.md	Whisper Multitask and Multilingual (#15936)	2023-05-15 14:36:33 -07:00
FAQ.md	Fix typo enviroment => environment (#13195)	2022-10-03 17:02:26 -07:00
How_To_Update_ONNX_Dev_Notes.md	Remove exclusions for ONNX model tests that now pass. (#14337)	2023-01-24 08:04:27 +10:00
Memory_Optimizer.md	Add guidelines for ORTModule (#13553)	2022-11-04 19:42:10 +08:00
Model_Test.md
NotesOnThreading.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
ONNX_Runtime_Server_Usage.md	Update docs/ONNX_Runtime_Server_Usage.md (#7818)	2021-05-26 16:17:20 -07:00
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md	Fix broken and outdated links in documentation (#14092)	2023-02-23 10:48:04 -08:00
OperatorKernels.md	Whisper Multitask and Multilingual (#15936)	2023-05-15 14:36:33 -07:00
ORT_Format_Update_in_1.13.md	Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413)	2023-01-25 08:23:12 -08:00
ORTMobilePackageOperatorTypeSupport.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
ORTModule_Convergence_Notes.md	log level control + fix typos (#15302)	2023-04-04 20:19:13 +08:00
ORTModule_Training_Guidelines.md	Add support for cuda 11.8 and python 3.11 for training (#15548)	2023-04-20 12:56:45 -07:00
PR_Guidelines.md	Add guidelines for writing a good PR. (#3830)	2020-05-05 16:28:21 -07:00
Privacy.md	[C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481)	2020-10-21 10:32:13 -07:00
Python_Dev_Notes.md	Changes related to the release binaries requiring Visual C++ 2019 runtime (#3871)	2020-05-12 17:07:06 -07:00
Reduced_Operator_Kernel_build.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
ReleaseManagement.md	Updated TPN for OpenMPI and cleanup (#3932)	2020-05-14 11:42:44 -07:00
Roadmap.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
Server.md	Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172)	2020-12-18 02:00:42 -08:00
TVM_EP.md	Update python 3.11 and remove 3.7 for Linux (#15214)	2023-03-27 14:46:30 -07:00
Versioning.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
WinML_principles.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00