onnxruntime/orttraining
Tianlei Wu 3afb38cfb7
[CUDA] Add use_tf32 cuda provider option (for FP32 Conv) (#19426)
Follow-up to https://github.com/microsoft/onnxruntime/pull/19357 that applies the use_tf32 option to FP32 cuDNN convolution.

When use_tf32 = 0, TF32 is disabled in cuDNN convolution for FP32 inputs.
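
For illustration, here is a minimal sketch of how the option might be passed from the Python API. The helper name `cuda_providers` and the model path `model.onnx` are hypothetical. The session lines are left commented out because they need an onnxruntime GPU build that includes the `use_tf32` option.

```python
def cuda_providers(use_tf32: bool) -> list:
    """Build a providers list for onnxruntime.InferenceSession.

    Hypothetical helper: only the "use_tf32" option name comes from this change.
    Provider option values are passed as strings here.
    """
    cuda_options = {"use_tf32": "1" if use_tf32 else "0"}
    # Fall back to the CPU provider when CUDA is unavailable.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


# Request FP32 convolution without TF32.
providers = cuda_providers(use_tf32=False)

# Assumed usage, with a placeholder model path:
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)
```

Putting the option in the providers list keeps the TF32 choice per session, rather than process-wide as with an environment variable.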

https://docs.nvidia.com/deeplearning/cudnn/api/cudnn-graph-library.html#cudnnmathtype-t
**CUDNN_FMA_MATH**
- Restricted to only kernels that use FMA instructions.
- On pre-NVIDIA A100 GPU devices, CUDNN_DEFAULT_MATH and CUDNN_FMA_MATH
have the same behavior: Tensor Core kernels will not be selected.
- With NVIDIA Ampere architecture and CUDA toolkit 11,
CUDNN_DEFAULT_MATH permits TF32 Tensor Core operation and CUDNN_FMA_MATH
does not.
- The TF32 behavior for CUDNN_DEFAULT_MATH and the other Tensor Core
math types can be explicitly disabled by the environment variable
NVIDIA_TF32_OVERRIDE=0.
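
The rules quoted above can be summarized as a small decision function. This is only a sketch of the documented behavior for the two math types named here, not cuDNN or onnxruntime code. The function name, the string encoding of the math types, and the use of compute capability major version 8 to stand in for "NVIDIA Ampere architecture" are assumptions.

```python
import os


def tf32_permitted(math_type: str, sm_major: int, env=None) -> bool:
    """Return True if the quoted rules allow TF32 Tensor Core kernels.

    math_type: "CUDNN_DEFAULT_MATH" or "CUDNN_FMA_MATH" (other types not modeled).
    sm_major:  compute capability major version (8 assumed for A100 / Ampere).
    env:       mapping of environment variables; defaults to os.environ.
    """
    if math_type not in ("CUDNN_DEFAULT_MATH", "CUDNN_FMA_MATH"):
        raise ValueError(f"math type not modeled in this sketch: {math_type}")
    env = os.environ if env is None else env
    if sm_major < 8:
        # Pre-A100: both math types behave the same, no Tensor Core kernels.
        return False
    if math_type == "CUDNN_FMA_MATH":
        # Restricted to kernels that use FMA instructions.
        return False
    # CUDNN_DEFAULT_MATH permits TF32 unless explicitly overridden.
    return env.get("NVIDIA_TF32_OVERRIDE") != "0"
```

Under this reading, setting use_tf32 = 0 corresponds to choosing CUDNN_FMA_MATH for FP32 convolution, which rules out TF32 regardless of the environment variable.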
2024-02-21 12:46:16 -08:00
orttraining [CUDA] Add use_tf32 cuda provider option (for FP32 Conv) (#19426) 2024-02-21 12:46:16 -08:00
tools Bump ruff linter to 0.2.1 (#19471) 2024-02-08 16:08:27 -08:00