onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
Patrice Vignola	85cacf315b	[DML EP] Add MultiHeadAttention and fix Attention (#15727 )	2023-05-19 15:07:14 -07:00
Patrice Vignola	310b22aa0c	[DML EP] Update DirectML version to 1.12.0 (#16011 )	2023-05-18 19:37:12 -07:00
Zhang Lei	0f8e66d905	optimization for whisper model with decoder masked multihead attention (#15827 ) * graph tools update * cuda kernel update * operator spec update and implementation update * greed search bug fix on wrong assumption for cross/self attention input length * avoid use of "" name in value info when loading graph which historically in many model	2023-05-18 15:38:31 -07:00
Linnea May	0d6416c0e9	DML EP Bitwise operators opset 18 (#15892 ) ### Description <!-- Describe your changes. --> Add dml registration for bitwise and, or, xor and not added in opset 18. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Linnea May <linneamay@microsoft.com>	2023-05-17 13:27:49 -07:00
stevenlix	270c09a37f	Add timestamp logits processor for whisper (#15853 ) Enable timestamp estimation and logits processing for Whisper model.	2023-05-16 21:40:00 -07:00
kailums	f62f722c70	integrate triton into ort (#15862 ) ### Description In some scenarios, the triton written kernels are more performant than CK or other handwritten kernels, so we implement a framework that onnxruntime can use these triton written kernels. This PR is to integrate triton into ort, so that ort can use kernels that written and compiled by triton. The main change focus on two part: 1. a build part to compile triton written kernel and combine these kernels into libonnxruntime_providers_rocm.so 2. a loader and launcher in c++, for loading and launch triton written kernels. #### Build To compile triton written kernel, add a script `tools/ci_build/compile_triton.py`. This script will dynamic load all kernel files, compile them, and generate `triton_kernel_infos.a` and `triton_kernel_infos.h`. `triton_kernel_infos.a` contains all compiled kernel instructions, this file will be combined into libonnxruntime_providers_rocm.so, using --whole-archive flag. `triton_kernel_infos.h` defines a const array that contains all the metadata for each compiled kernel. These metadata will be used for load and launch. So this header file is included by 'triton_kernel.cu' which defines load and launch functions. Add a build flag in build.py and CMakeList.txt, when building rocm provider, it will call triton_kernel build command, and generate all necessary files. #### C++ Load and Launch On c++ part, we implement load and launch functions in triton_kernel.cu and triton_kernel.h. These two files located in `providers/cuda`, and when compiling rocm, they will be hipified. so this part supports both cuda and rocm. But currently we only call triton kernel in rocm. We also implement a softmax triton op for example. Because there will generate many kernels for different input shape of softmax, we use TunableOp to select the best one. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-05-17 09:35:28 +08:00
Sheil Kumar	a7ad859e3a	DML EP Register Split18 (#15931 ) Register Split18 for DirectML Split13 was previously implemented. Split18 adds a new attribute called "num_outputs" that must be used mutually exclusively with the "split" input. The "num_outputs" attribute wil split the tensor evenly (and handles odd uneven splits). To implement, the DML split tensor just needs to be overridden in the presence of the num_output attribute. --------- Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2023-05-16 11:58:19 -07:00
Baiju Meswani	6b7181d31d	Add C# API documentation for training (and some other changes) (#15935 )	2023-05-16 03:15:24 -07:00
kunal-vaishnavi	5b663d6797	Whisper Multitask and Multilingual (#15936 ) ### Description This PR enables Whisper's multitask format and allows a user to use Whisper for multiple tasks (e.g. transcription, translation) and for multilingual purposes (e.g. English, Spanish). This PR also removes `attention_mask` as a required input for Whisper with beam search. ### Usage Here is an example of how you can use Whisper for English transcription. ``` import numpy as np import onnxruntime as ort from datasets import load_dataset from transformers import AutoConfig, AutoProcessor model = "openai/whisper-tiny" config = AutoConfig.from_pretrained(model) processor = AutoProcessor.from_pretrained(model) forced_decoder_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe") # forced_decoder_ids is of the format [(1, 50259), (2, 50359), (3, 50363)] and needs to be # of the format [50258, 50259, 50359, 50363] where 50258 is the start token id forced_decoder_ids = [config.decoder_start_token_id] + list(map(lambda token: token[1], forced_decoder_ids)) ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation") input_features = processor(ds[0]["audio"]["array"], return_tensors="np").input_features inputs = { "input_features": np.float32(input_features), "max_length": np.array([26], dtype=np.int32), "min_length": np.array([1], dtype=np.int32), "num_beams": np.array([2], dtype=np.int32), "num_return_sequences": np.array([1], dtype=np.int32), "length_penalty": np.array([1.0], dtype=np.float32), "repetition_penalty": np.array([1.0], dtype=np.float32), "decoder_input_ids": np.array([forced_decoder_ids], dtype=np.int32), } sess = ort.InferenceSession("whisper-tiny_beamsearch.onnx", providers=["CPUExecutionProvider"]) outputs = sess.run(None, inputs) # Print tokens and decoded output print(outputs[0][0][0]) print(processor.decode(outputs[0][0][0])) ``` If you don't want to provide specific decoder input ids or you want Whisper to predict the output language and task, you can set `forced_decoder_ids = [config.decoder_start_token_id]` instead. ### Motivation and Context As seen in the figure below from the [OpenAI Whisper paper](https://cdn.openai.com/papers/whisper.pdf), Whisper can be used for multiple tasks and languages. ![Screenshot 2023-05-12 165215](https://github.com/microsoft/onnxruntime/assets/115581922/49335e39-a79c-4f78-92e9-89b034405f65)	2023-05-15 14:36:33 -07:00
Ye Wang	3418ca28a8	pack qkv in t5 decoder (#15801 ) ### Description <!-- Describe your changes. --> V100, b_4_s_128, max_output_len=64, beam=4 before: t5_small: 101.28ms t5_base: 200.07ms after: t5_small: 87.65ms t5_base: 174.44ms ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-05-15 13:45:39 -07:00
liqun Fu	a8d9b29cd2	support AveragePool19 and Pad19 (#15597 )	2023-05-15 10:46:24 -07:00
Sheil Kumar	fa16e2e0f3	Register CPU OptionalGetElement, OptionalHasElement on DirectML (#15926 ) Register CPU OptionalGetElement, OptionalHasElement on DirectML Graphs with OptionalGetElement and OptionalHasElement should work in a DML graph without extra memcpy operation on and off the GPU. CopyCpuTensor is swapped with DataTransferManager.CopyTensor() to make the CPU operator usable by other providers. --------- Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2023-05-15 09:53:35 -07:00
Linnea May	95a4607dcf	User/linneamay/roi align 16 (#15812 ) ### Description <!-- Describe your changes. --> Add registration for DML RoiAlign-16 and tests for new coordinate_transform_mode attribute. PR [7354](https://github.com/microsoft/onnxruntime/pull/7354) is still open to fix the CPU EP version, which is why there are skipped tests right now. That will be completed separately so that, for now, we can officially support opset16 with the next release. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Linnea May <linneamay@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2023-05-09 21:56:41 -07:00
Sumit Agarwal	b473e3f3c6	[DML EP] Update DirectML version to 1.11.0 (#15858 ) ### Description - Update DML version to 1.11.0 - Disable Gemm+Softmax fusion ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-05-09 12:48:15 -07:00
Sheil Kumar	2b7f26af7c	Add GridSample implementation to DirectML (#15788 ) Add GridSample implementation to DirectML EP. Temporary add HLSL shader in the DirectML EP to handle GridSample until officially added to DirectML.	2023-05-05 15:59:33 -07:00
Changming Sun	1fb2f2605b	Update VERSION_NUMBER (#15773 ) ### Description 1. Update VERSION_NUMBER for preparing the upcoming release. This PR's commit will not be included in the 1.15 release branch 2. Delete package/rpm/onnxruntime.spec since it was not used in past years. ### Motivation and Context Preparing the release. Fixed [AB#15311](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15311)	2023-05-03 15:07:34 -07:00
Baiju Meswani	2d519d21af	Python documentation for onnxruntime-training (#15765 )	2023-05-02 16:58:16 -07:00
Nat Kershaw (MSFT)	9219615471	Fix python AP docs generation (#15760 ) Docs are failing on the operator generation step. Remove this temporarily so that we can publish.	2023-05-01 18:31:59 -07:00
liqun Fu	62fc6ed5a8	[Feature Request] Support Resize opset 19 (#15633 )	2023-05-01 10:49:17 -07:00
Linnea May	2c3697be00	User/linneamay/reduce 18 (#15701 ) ### Description <!-- Describe your changes. --> Add registration for DML reduce functions in opset 18. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Linnea May <linneamay@microsoft.com>	2023-04-27 20:32:11 -07:00
kunal-vaishnavi	39d6d7050d	Change EmbedLayerNormalization mask index output to optional (#15526 ) ### Description This PR changes an EmbedLayerNormalization node's mask index output to be an optional output if a mask input is not provided. ### Motivation and Context The documentation for EmbedLayerNormalization states ``` The last input mask is optional. If mask is provided, mask index (that is position of first 0 in mask, or number of words) will be calculated. ``` However, if the mask input is not provided, the mask index output is still calculated and required.	2023-04-27 16:32:42 -07:00
Justin Chu	76ddc92fbd	Enable RUFF as a formatter (#15699 ) ### Description RUFF can now format since lintrunner-adapters v0.8. Removed the RUFF-FIX linter. ### Motivation and Context Better engineering	2023-04-26 14:04:07 -07:00
sfatimar	ebaafac3f5	Openvino ep ort 5.0 (#15626 ) ### Description The PR adds VPU support to OpenVINO Execution Provider Bug fixes for GPU, CPU. Changes to OpenVINO Backend in Serialized Model API for faster First Inference Latency. Deprecation to HDDL-VADM and MYRIAD, removed code Support OpenVINO 2023.0 Dynamic Shapes Support for iGPU ### Motivation and Context - VPU is an upcoming hardware that can provide AI Acceleration for Client Systems through OpenVINO - If it fixes an open issue, please link to the issue here. --> --------- Signed-off-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2023-04-25 20:59:42 -07:00
Baiju Meswani	5885abfb35	Training Documentation (#15612 )	2023-04-25 11:44:12 -07:00
Baiju Meswani	11b0a18de6	Add support for cuda 11.8 and python 3.11 for training (#15548 )	2023-04-20 12:56:45 -07:00
kunal-vaishnavi	901c2bc384	Whisper Model Optimization (#15473 ) ### Description This PR contains fusion-level and kernel-level optimizations for [OpenAI's Whisper](https://github.com/openai/whisper). Some of the added optimizations include: - Pruning of duplicate/unnecessary inputs and outputs - Fusion support for Whisper models with or without these inputs/outputs (e.g. with these inputs/outputs if exporting with an older official Optimum version, without these inputs/outputs if exporting with Optimum from source) - Attention fusions - For Whisper's encoder and decoder - Modified symbolic shape inference for present output when no past input exists (for decoder) - Multi-head attention fusions - For Whisper's decoder and decoder with past - Packed MatMul for the 3 MatMuls excluded in multi-head attention fusion - Attention kernel changes - CPU: - Different Q and KV sequence lengths - Parallel memset for large sequence lengths - Convert broadcast add after MatMul of Q and K (add_qk) to element-wise add - Separate present key-value output into present key and present value (for multi-head attention spec) - CUDA: - Use memory efficient attention compute kernel with present state (for decoder) - Multi-head attention kernel changes - CPU: - Introduction of multi-head attention CPU kernel (previously did not exist) - Use AddBiasReshape instead of AddBiasTranspose when sequence length = 1 (for decoder with past) - Different Q, K, V input shapes - Pass past key and past value directly as key and value - CUDA: - Use memory efficient attention compute kernel with past and/or present state (for decoder with past) ### Usage To use the optimizations, run the ORT transformer optimizer script as follows: ``` $ cd onnxruntime/onnxruntime/python/tools/transformers/ $ python3 optimizer.py --input <filename>.onnx --output <filename>.onnx --model_type bart --num_heads <number of attention heads, depends on the size of the whisper model used> --hidden_size <attention hidden size, depends on the size of the whisper model used> --use_external_data_format --use_multi_head_attention ``` Once optimized, here's an example of how to run Whisper with [Hugging Face's Optimum](https://github.com/huggingface/optimum): ``` from transformers.onnx.utils import get_preprocessor from optimum.onnxruntime import ORTModelForSpeechSeq2Seq from optimum.pipelines import pipeline as ort_pipeline import whisper # Installed from OpenAI's repo - setup instructions at https://github.com/openai/whisper/ directory = './whisper_opt' # Where the optimized ONNX models are located model_name = 'openai/whisper-tiny' device = 'cpu' # Get pipeline processor = get_preprocessor(model_name) model = ORTModelForSpeechSeq2Seq.from_pretrained( directory, use_io_binding=(device == 'cuda'), provider='CPUExecutionProvider', ).to(device) pipe = ort_pipeline( "automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor, device=(-1 if device == 'cpu' else 0), ) # Load audio file and run pipeline audio = whisper.load_audio('tests/jfk.flac') audio = whisper.pad_or_trim(audio) outputs = pipe([audio]) print(outputs) ``` Note: In order to use these changes with Optimum, it is recommended to use Optimum from source to have the following changes: - https://github.com/huggingface/optimum/pull/872 - https://github.com/huggingface/optimum/pull/920 ### Motivation and Context This PR helps the following issues: - https://github.com/microsoft/onnxruntime/issues/15100 - https://github.com/microsoft/onnxruntime/issues/15235 - https://github.com/huggingface/optimum/issues/869 (work in progress) This PR can be used with the other currently merged Whisper PRs: - https://github.com/microsoft/onnxruntime/pull/15247 - https://github.com/microsoft/onnxruntime/pull/15339 - https://github.com/microsoft/onnxruntime/pull/15362 - https://github.com/microsoft/onnxruntime/pull/15365 - https://github.com/microsoft/onnxruntime/pull/15427 This PR uses changes from the following merged PRs: - https://github.com/microsoft/onnxruntime/pull/14198 - https://github.com/microsoft/onnxruntime/pull/14146 - https://github.com/microsoft/onnxruntime/pull/14201 - https://github.com/microsoft/onnxruntime/pull/14928 (this introduced the new multi-head attention spec)	2023-04-18 17:13:54 -07:00
liqun Fu	919d8f2660	update with onnx main (#14929 )	2023-04-18 08:42:51 -07:00
Justin Chu	a36caba073	Bump ruff in CI (#15533 ) ### Description Bump ruff version in CI and fixed new lint errors. - This change enables the flake8-implicit-str-concat rules which helps detect unintended string concatenations: https://beta.ruff.rs/docs/rules/#flake8-implicit-str-concat-isc - Update gitignore to include common python files that we want to exclude. ### Motivation and Context Code quality	2023-04-17 10:11:44 -07:00
pengwa	516c8e95fa	Optimize SCE loss compute (#15401 ) ### Optimize SCE loss compute Compute optimization based on label data sparsity: - Insert ShrunkenGather before SCELoss node, to filter out invalid labels for compute. - Support ShrunkenGather upstream. - Added test for the above. - Added flag to enable label sparsity optimization with env var, by default disabled now. Will enable after comprehensive benchmarking later. - Extract common logic into test_optimizer_utils.h/cc from core/optimizer/compute_optimzier_test.cc, then the common functions can be shared by both core/optimizer/compute_optimzier_test.cc and orttraining/core/optimizer/compute_optimzier_test.cc - Extract common logic into shared_utils.h/cc: `GetONNXOpSetVersion` and `Create1DInitializerFromVector` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-04-13 13:02:12 +08:00
Patrice Vignola	3be5bfe363	[DML EP] Add MatMul + SoftMax fusion (#15240 )	2023-04-11 08:31:04 -07:00
Patrice Vignola	7c927bb95c	[DML EP] Add BiasSplitGelu (#15197 )	2023-04-11 08:30:37 -07:00
Patrice Vignola	c5b6ee1a99	[DML EP] Add NhwcConv (#15194 )	2023-04-10 23:16:09 -07:00
Patrice Vignola	4a676b011a	[DML EP] Add BiasAdd (#15211 )	2023-04-10 14:46:33 -07:00
stevenlix	6d126f8996	Add FP16 support for Whisper model (#15427 ) Current ORT can only run inference for Whisper FP32 model. This PR adds FP16 support.	2023-04-08 21:36:10 -07:00
Chen Fu	8dce83a818	Fuse 'Add' operator into FP16 Conv (#15213 ) ### Description Adding 'Add' functionality to FP16 Conv operator. It takes a tensor that has the same shape of the output tensor, and add it to the result tensor. ### Motivation and Context Needed to run Resnet 50	2023-04-07 09:51:03 -07:00
Patrice Vignola	9191e04259	[DML EP] Add QuickGelu (#15220 )	2023-04-05 10:49:34 -07:00
Aditya Goel	a4e9a48345	Reduce operators support for int64 type (#15358 )	2023-04-05 09:19:43 -07:00
Aditya Goel	1c1d386561	Adds int32_t and uint32_t clip kernels (#15306 )	2023-04-04 13:44:50 -07:00
petermcaughan	1251964f96	Petermca/beamsearch whisper (#15339 ) ### Description Adjust various code paths to allow Whisper model to function with BeamSearch op. Approach: Add a new kModelType enum value in IGenerationParameters as so: #### Old: 0 = GPT2, 1 = T5 #### New: 0 = GPT2, 1 = T5, 2 = Whisper When the user assigns this attribute value to 2, various shape and type checks are changed to accommodate Whisper inputs. ### Motivation and Context BeamSearch is currently designed to function with BERT-based models with inputs as vocab tokens, and needs changes to function with Whisper inputs (3-D float values processed from audio data). --------- Co-authored-by: Peter McAughan <petermca@microsoft.com>	2023-04-04 09:09:10 -07:00
pengwa	5baf5f506b	log level control + fix typos (#15302 ) ### log level control + fix typos	2023-04-04 20:19:13 +08:00
Ye Wang	fbfe92f66a	DecoderMaskedMultiHeadAttention enhancement (#15292 )	2023-04-02 21:53:03 -07:00
Yufeng Li	c08d6b42e8	Add tool to support packing mode for BERT model (#15283 ) ### Description <!-- Describe your changes. --> Add a tool to convert fused BERT like model to packing mode ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-03-31 08:46:47 -07:00
Xavier Dupré	786f8b98f7	Add a page in the documentation for every operator in onnxruntime (#14340 )	2023-03-30 14:39:16 -07:00
Jian Chen	792d411135	Update python 3.11 and remove 3.7 for Linux (#15214 ) ### Description Update python 3.11 and remove 3.7 ### Motivation and Context Update python 3.11 and remove 3.7 --------- Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>	2023-03-27 14:46:30 -07:00
Patrice Vignola	67a6022c03	[DML EP] Add GroupNorm (#15189 ) Comparison between the different normalization operations: ![](https://user-images.githubusercontent.com/1041752/106491728-73d40680-64b7-11eb-8769-3f758996e959.png)	2023-03-27 12:52:53 -07:00
Justin Chu	d834ec895a	Adopt linrtunner as the linting tool - take 2 (#15085 ) ### Description `lintrunner` is a linter runner successfully used by pytorch, onnx and onnx-script. It provides a uniform experience running linters locally and in CI. It supports all major dev systems: Windows, Linux and MacOs. The checks are enforced by the `Python format` workflow. This PR adopts `lintrunner` to onnxruntime and fixed ~2000 flake8 errors in Python code. `lintrunner` now runs all required python lints including `ruff`(replacing `flake8`), `black` and `isort`. Future lints like `clang-format` can be added. Most errors are auto-fixed by `ruff` and the fixes should be considered robust. Lints that are more complicated to fix are applied `# noqa` for now and should be fixed in follow up PRs. ### Notable changes 1. This PR removed some suboptimal patterns: - `not xxx in` -> `xxx not in` membership checks - bare excepts (`except:` -> `except Exception`) - unused imports The follow up PR will remove: - `import *` - mutable values as default in function definitions (`def func(a=[])`) - more unused imports - unused local variables 2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than flake8 and is more robust. We are using it successfully in onnx and onnx-script. It also supports auto-fixing many flake8 errors. 3. Removed the legacy flake8 ci flow and updated docs. 4. The added workflow supports SARIF code scanning reports on github, example snapshot: ![image](https://user-images.githubusercontent.com/11205048/212598953-d60ce8a9-f242-4fa8-8674-8696b704604a.png) 5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Unified linting experience in CI and local. Replacing https://github.com/microsoft/onnxruntime/pull/14306 --------- Signed-off-by: Justin Chu <justinchu@microsoft.com>	2023-03-24 15:29:03 -07:00
Nat Kershaw (MSFT)	28f64066de	Auto deploy API docs (#15088 )	2023-03-23 15:08:49 -07:00
Ye Wang	44ba23e0f5	Rename DecoderMaskedMHA to DecoderMaskedSelfAttn (#15166 ) ### Description <!-- Describe your changes. --> As synced offline, rename this op and will create another op for mha that supports both self and cross attention. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-03-23 12:31:38 -07:00
Ye Wang	2ee822d483	Extend memory efficient attention coverage in Attention/MHA cuda op (#15064 ) ### Description <!-- Describe your changes. --> 1. upgrade cutlass to 3.0 that containing attn_bias support. 2. extend Attention/MHA to use memory efficient attention when rel_pos_bias with [1, num_head, s, s] and 1d mask with [2 batch_size + 1] are present. new mask format introduction: MASK_1D_KEY_SEQ_LEN_START, [3 * batch_size + 2] with [key_len[0], ..., key_len[batch_size - 1], query_start[0], ..., query_start[batch_size - 1], query_end[batch_size - 1], key_start[0], ..., key_start[batch_size - 1], key_end[batch_size - 1]] e.g 2D mask with [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to this 1D mask is [3, 5, 0, 6, 12, 0, 6, 12] ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> It potentially benefits tnlrv6 and t5(encoder) --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net> Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com> Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-03-23 11:05:17 -07:00
Hariharan Seshadri	7033346605	Support mask_filter_value attribute in DecoderMaskedMultiheadAttention (#15158 )	2023-03-23 11:00:09 -07:00

1 2 3 4 5 ...

551 commits