onnxruntime/orttraining
Edward Chen 9810b9e02b
Reduce amount of compiled CUDA device code (#6118)
Move CudaKernel from cuda_common.h to a new separate header, cuda_kernel.h. Update include sites to use cuda_kernel.h instead if they need CudaKernel. Inclusions of cuda_common.h are now more lightweight.

Make corresponding changes for ROCM execution provider code.

Other minor cleanup.
2020-12-14 15:27:40 -08:00
orttraining Reduce amount of compiled CUDA device code (#6118) 2020-12-14 15:27:40 -08:00
pytorch_frontend_examples Fix mnist example (#4926) 2020-08-26 15:28:39 -07:00
tools add dockerfile for ROCm3.10 and update BUILD.md for ROCm EP (#5821) 2020-12-08 23:14:56 -08:00