onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
Ryan Lai	c4429952aa	Revert "Merged PR 5490021: Merge latest github master into dmldev branch Last RI was on Monday : `0afbdfd81c`"	2020-12-11 01:15:04 +00:00
Ryan Lai	054fb4d3f6	Merged PR 5490021: Merge latest github master into dmldev branch Last RI was on Monday : `0afbdfd81c`	2020-12-11 01:13:32 +00:00
Jesse Benson	bd96f60888	Use CUDA's IsAllFinite kernel for ROCm	2020-11-30 09:24:22 -08:00
Tianlei Wu	31a6be3d67	Add Longformer Attention CUDA Op (#5932). Limitation: global tokens must be at the beginning of the sequence.	2020-11-25 13:52:10 -08:00
Suffian Khan	4d603e83d7	Remove attention_past.cu and attention_transpose.cu from hipify to fix AMD build (#5921) * remove attention_transpose.cu and attention_past.cu from hipify * remove print line * remove trailing whitespace for flake test * fix whitespace one more time	2020-11-24 20:49:06 -05:00
Weixing Zhang	bb1af718b5	Fix build failures due to recent change (`858040fa`) in CUDA EP (#5736). Some of the reduction-kernel code was changed in `858040fa`, which caused failures in the ROCm build because the ROCm EP shares some code with the CUDA EP. This PR is a quick fix: it stops sharing two files for now to unblock enabling CI on the ROCm EP. A follow-up PR will adopt the changes from `858040fa` for the ROCm EP.	2020-11-09 08:41:30 -08:00
Weixing Zhang	fff85a6a35	Add GPU kernels for ROCm EP (#5655) * Add kernels for AMD GPU. This PR is mostly about GPU kernels for the ROCm EP. Because the GPU programming languages (CUDA and HIP) and their math library calls are similar, one principle of the ROCm EP design is to share CUDA kernels with ROCm as much as possible. The script amd_hipify.py therefore converts CUDA kernels to ROCm HIP kernels automatically during the compilation phase. However, for reasons such as performance issues and syntax differences, some converted kernels need manual intervention; these kernels are checked into the repo for now. To avoid manual intervention, the plan is to refactor the CUDA kernels to make them as portable as possible between the CUDA EP and the ROCm EP. Please refer to the "HIP Porting Guide" for details. * As with LAMB, multi-tensor-apply needs to be disabled for IsAllFiniteOp and ReduceAllL2, because the current AMD GPU compiler has a performance issue with kernel parameters that are structures passed by value. * Use hipMemsetAsync and add checks on HIP calls. * Move the generated files to the build folder. Co-authored-by: Jesse Benson <jesseb@microsoft.com>	2020-11-06 16:11:06 -08:00

7 commits
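Commit fff85a6a35 describes amd_hipify.py converting CUDA kernels to HIP automatically at build time. The sketch below illustrates only the general idea of such a conversion: a whole-identifier rename of CUDA runtime names to their HIP equivalents. It is not the contents of amd_hipify.py; the names `CUDA_TO_HIP` and `hipify_source` are hypothetical, and the mapping table is a tiny assumed subset of what real hipify tooling uses. As the commit notes, purely mechanical renaming cannot handle kernels that need manual intervention for performance or syntax reasons.

```python
import re

# Hypothetical, abbreviated CUDA -> HIP rename table (assumption: a small
# subset of well-known one-to-one equivalents, not amd_hipify.py's table).
CUDA_TO_HIP = {
    "cudaMemsetAsync": "hipMemsetAsync",
    "cudaMemcpyAsync": "hipMemcpyAsync",
    "cudaStream_t": "hipStream_t",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

# Match whole names only: the lookbehind/lookahead reject matches embedded
# in a longer identifier. Longer keys are tried first for robustness.
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])("
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")(?![A-Za-z0-9_])"
)


def hipify_source(cuda_src: str) -> str:
    """Return cuda_src with known CUDA runtime names replaced by HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], cuda_src)


if __name__ == "__main__":
    src = "#include <cuda_runtime.h>\ncudaStream_t s; cudaMemsetAsync(p, 0, n, s);"
    print(hipify_source(src))
```

Running it rewrites the include to `<hip/hip_runtime.h>` and the stream and memset calls to `hipStream_t` and `hipMemsetAsync`, while leaving names such as `mycudaSuccess` untouched because of the identifier-boundary checks.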