onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-27 20:02:15 +00:00


Tianlei Wu 3afb38cfb7 [CUDA] Add use_tf32 cuda provider option (for FP32 Conv) (#19426)	2024-02-21 12:46:16 -08:00

Follow up of https://github.com/microsoft/onnxruntime/pull/19357 to apply the use_tf32 option on fp32 cuDNN convolution. When use_tf32 = 0, we will disable TF32 in cuDNN convolution for FP32 inputs.

https://docs.nvidia.com/deeplearning/cudnn/api/cudnn-graph-library.html#cudnnmathtype-t

CUDNN_FMA_MATH
- Restricted to only kernels that use FMA instructions.
- On pre-NVIDIA A100 GPU devices, CUDNN_DEFAULT_MATH and CUDNN_FMA_MATH have the same behavior: Tensor Core kernels will not be selected.
- With NVIDIA Ampere architecture and CUDA toolkit 11, CUDNN_DEFAULT_MATH permits TF32 Tensor Core operation and CUDNN_FMA_MATH does not.
- The TF32 behavior for CUDNN_DEFAULT_MATH and the other Tensor Core math types can be explicitly disabled by the environment variable NVIDIA_TF32_OVERRIDE=0.
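
The option described in the commit message above could be passed from Python roughly as follows. This is a minimal sketch, not the project's own example: the helper name `cuda_provider_config` and the file name `model.onnx` are hypothetical, and it assumes an onnxruntime GPU build recent enough to include the `use_tf32` CUDA provider option.

```python
def cuda_provider_config(use_tf32: bool) -> list:
    """Build a providers list for onnxruntime.InferenceSession.

    Hypothetical helper: only the option name "use_tf32" comes from the
    commit message; everything else here is illustrative.
    """
    # Provider option values are passed as strings: "0" disables TF32, "1" enables it.
    cuda_options = {"use_tf32": "1" if use_tf32 else "0"}
    # List the CPU provider as a fallback for when the CUDA provider is unavailable.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


# Request full FP32 precision (no TF32) for the CUDA execution provider.
providers = cuda_provider_config(use_tf32=False)

# Usage, assuming onnxruntime-gpu is installed and "model.onnx" exists
# (both are assumptions, so the lines stay commented out):
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)
```

Keeping the option in a small helper makes it easy to run the same model with TF32 on and off and compare accuracy and speed.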
contrib_ops	[ROCm] Add SkipGroupNorm for ROCm EP (#19303)	2024-02-21 11:08:48 +08:00
core	[CUDA] Add use_tf32 cuda provider option (for FP32 Conv) (#19426)	2024-02-21 12:46:16 -08:00
python	Changed command line argparse to process '--symmetric [True|False]'. (#19577)	2024-02-20 21:18:54 -08:00
test	[ROCm] Add SkipGroupNorm for ROCm EP (#19303)	2024-02-21 11:08:48 +08:00
tool/etw
wasm	[js/webgpu] Support capture and replay for jsep (#18989)	2024-01-30 18:28:03 -08:00
__init__.py	[ORT 1.17.0 release] Bump up version to 1.18.0 (#19170)	2024-01-17 11:18:32 -08:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings