onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-14 20:48:00 +00:00

History

Tianlei Wu ba22d7879a [CUDA/ROCm] Conditionally support ArgMax and ArgMin for opset 12 and above (#22713 ) ### Description Based on https://github.com/microsoft/onnxruntime/pull/9700, and extend it to ArgMin as well. This pull request introduces several enhancements and fixes related to the `ArgMax` and `ArgMin` operators in the CUDA execution provider. The changes ensure proper handling of these operators across different versions and improve kernel registration and fallback mechanisms. Key changes include: #### Enhancements to `ArgMax` and `ArgMin` Operators: * Added new kernel class registrations for `ArgMax` and `ArgMin` for different data types and versions in `onnxruntime/core/providers/cuda/cuda_execution_provider.cc`. [[1]](diffhunk://#diff-57ba769b54dce57acd89df47140ede5f29ea670d61176096076701912d573285R966-R972) [[2]](diffhunk://#diff-57ba769b54dce57acd89df47140ede5f29ea670d61176096076701912d573285R1209-R1215) [[3]](diffhunk://#diff-57ba769b54dce57acd89df47140ede5f29ea670d61176096076701912d573285R1657-R1659) [[4]](diffhunk://#diff-57ba769b54dce57acd89df47140ede5f29ea670d61176096076701912d573285L1825-L1827) [[5]](diffhunk://#diff-57ba769b54dce57acd89df47140ede5f29ea670d61176096076701912d573285R1933-R1939) [[6]](diffhunk://#diff-57ba769b54dce57acd89df47140ede5f29ea670d61176096076701912d573285R2174-R2180) * Introduced `ArgMaxOrArgMinNeedFallbackToCPU` function to handle fallback to CPU when the `select_last_index` attribute is set to 1, as CUDA does not support this attribute. [[1]](diffhunk://#diff-57ba769b54dce57acd89df47140ede5f29ea670d61176096076701912d573285R2597-R2622) [[2]](diffhunk://#diff-57ba769b54dce57acd89df47140ede5f29ea670d61176096076701912d573285R2672-R2674) #### Macro and Kernel Registration Improvements: * Replaced `REGISTER_KERNEL_UNTIL_VERSIONED_TYPED` with `REGISTER_KERNEL_VERSIONED_RANGE_TYPED` and `REGISTER_KERNEL_VERSIONED_SINCE_TYPED` macros for better version handling. [[1]](diffhunk://#diff-ee5316fc3898058f70e942d9a84de36be4c7da09f144633a2504236430d5d033L19-R29) [[2]](diffhunk://#diff-ee5316fc3898058f70e942d9a84de36be4c7da09f144633a2504236430d5d033L40-R46) * Updated kernel registration for `ArgMax` and `ArgMin` to use the new macros, ensuring proper version handling and support for different data types. #### Safety Checks: * Added safety checks in the `ArgMax` and `ArgMin` classes to ensure `select_last_index` is not set to 1, as it is not supported on CUDA. [[1]](diffhunk://#diff-8ab09fef1f4a12cbf3b3432e509f8f1ef561e83c72778a0e047780060aeef6efL91-R99) [[2]](diffhunk://#diff-8ab09fef1f4a12cbf3b3432e509f8f1ef561e83c72778a0e047780060aeef6efL101-R117) #### Testing Enhancements: * Added new tests for `ArgMax` and `ArgMin` operators to verify behavior when `select_last_index` is set to 0, ensuring compatibility with both CPU and CUDA execution providers. [[1]](diffhunk://#diff-77affe1b70d1a9d38c2485f7c6b16ef2b6b541ed94dd727bc9b286f068f1481aR3340-R3360) [[2]](diffhunk://#diff-77affe1b70d1a9d38c2485f7c6b16ef2b6b541ed94dd727bc9b286f068f1481aR3679-R3699) ### Motivation and Context Improve CUDA kernel coverage for stable diffusion model and hence improve its performance on CUDA		2024-11-06 09:54:32 -08:00
..
c_cxx	Remove extraneous javascript includes (#17558 )	2023-09-14 20:43:24 -07:00
execution_providers/images
images
python	[Doc] Add I/O binding example using onnx data type in python API summary (#22695 )	2024-11-02 12:51:37 -07:00
ABI_Dev_Notes.md	Fix a typo in ABI_Dev_Notes.md (#17832 )	2023-10-09 07:51:34 -07:00
Android_testing.md
C_API_Guidelines.md
cmake_guideline.md
Coding_Conventions_and_Standards.md	Update lintrunner requirements (#22185 )	2024-09-23 18:27:16 -07:00
ContribOperators.md	DecoderMaskedMultiHeadAttention CPU kernel. (#22292 )	2024-10-12 13:43:00 -07:00
FAQ.md	[Technical docs] Fixed a couple of old links in `FAQ.md` (#17415 )	2023-09-26 13:38:24 -07:00
How_To_Update_ONNX_Dev_Notes.md	Update Dockerfile.cuda (#21042 )	2024-06-13 23:50:03 -07:00
Memory_Optimizer.md	Flash attention recompute (#20603 )	2024-05-21 13:38:19 +08:00
Model_Test.md	Update docs/Model_Test.md (#11466 )	2024-05-15 11:33:11 -07:00
NotesOnThreading.md
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md	Remove the extensions submodule (#17097 )	2023-08-14 10:16:33 -07:00
OperatorKernels.md	[CUDA/ROCm] Conditionally support ArgMax and ArgMin for opset 12 and above (#22713 )	2024-11-06 09:54:32 -08:00
ORT_Format_Update_in_1.13.md	Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413 )	2023-01-25 08:23:12 -08:00
ORT_Use_Triton_Kernel.md	Rename a mispelled filename in the documentation (#21066 )	2024-06-17 18:18:41 +02:00
ORTModule_Convergence_Notes.md	Fix and enable few ORTModule Unit Tests (#19847 )	2024-03-12 10:49:19 +08:00
ORTModule_ModuleWithLoss_Wrapper.md	add steps to write modulewithloss wrapper (#16486 )	2023-07-11 09:07:35 +08:00
ORTModule_PythonOp_Notes.md	Add document for PythonOp (#17888 )	2023-10-12 08:36:22 +08:00
ORTModule_Training_Guidelines.md	Adds ATen fallback for scaled_dot_product_attention (#21107 )	2024-07-22 16:37:04 -07:00
PR_Guidelines.md
Privacy.md
Reduced_Operator_Kernel_build.md
ReleaseManagement.md
Roadmap.md
Server.md
TVM_EP.md	Fix: update hyperlinks to the Jupyter notebooks (#16145 )	2023-08-21 09:53:05 -07:00
Versioning.md
WinML_principles.md