onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

petermcaughan febd5facce Change head_size parameter dependent on qkv_hidden_size (#12933)	2022-10-11 00:25:47 -07:00

Description: Add qkv_hidden_size support in CUDA Attention Layer implementation.

Changes include:
- Modify UT to test GPU and CPU implementation
- Add overload for CUDA kernel `AddBiasTransposeQKV` to support scenario where V_HIDDEN_SIZE != QK_HIDDEN_SIZE
- Update variable names from `head_size` to `qkv_head_sizes[0]` or `qkv_head_sizes[2]`
- Modify function definitions to allow communication of `qkv_hidden_sizes` or `qkv_head_sizes`

Note that this feature is not supported in ROCm EP or quantized attention right now.

Motivation and Context
- Why is this change required? What problem does it solve?
  The current CUDA implementation of attention layer doesn't support the parameter qkv_hidden_size added in the CPU implementation in PR [8039](https://github.com/microsoft/onnxruntime/pull/8039)
- If it fixes an open issue, please link to the issue here.

Co-authored-by: Peter Mcaughan <petermca@microsoft.com>
cpu	Decouple use_sequence_as_input_ids from has_hidden_states (#13130)	2022-09-29 22:45:52 -07:00
cuda	Change head_size parameter dependent on qkv_hidden_size (#12933)	2022-10-11 00:25:47 -07:00
rocm	Change head_size parameter dependent on qkv_hidden_size (#12933)	2022-10-11 00:25:47 -07:00