onnxruntime

saymrwulf/onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-04 23:59:56 +00:00

Commit graph

Author SHA1 Message Date

Weixing Zhang

2705115732

add dockerfile for ROCm 3.10 and update BUILD.md for ROCm EP (#5821)

* add HSA_NO_SCRATCH_RECLAIM=1 to dockerfile

This works around an issue in the AMD compiler, which generates poor GPU ISA when a kernel parameter is a structure and “pass-by-value” is used.

* update BUILD.md

* add dockerfile for ROCm 3.10

2020-12-08 23:14:56 -08:00
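
The commit above sets HSA_NO_SCRATCH_RECLAIM=1 in the dockerfile. Outside a container, the same variable can be set for a single child process. The following is a minimal sketch under that assumption; the helper names rocm_env and run_with_workaround are hypothetical and are not part of onnxruntime or ROCm.

```python
import os
import subprocess


def rocm_env(base=None):
    """Return a copy of an environment mapping with the workaround variable set.

    `base` defaults to the current process environment; the input is not mutated.
    """
    env = dict(os.environ if base is None else base)
    # Variable name and value are taken from the commit message above.
    env["HSA_NO_SCRATCH_RECLAIM"] = "1"
    return env


def run_with_workaround(cmd):
    """Run `cmd` (a list of arguments) with the workaround variable exported."""
    return subprocess.run(cmd, env=rocm_env(), check=True)
```

A training script would then be launched with something like run_with_workaround(["python", "train.py"]), where train.py is a placeholder for the actual entry point.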

Weixing Zhang

fc614ad050

revert the code change which was based on b4869926

The change b4869926, which removed the per-thread allocator, causes a seg fault in distributed training.

In addition, add a dockerfile for ROCm 3.9.

2020-11-15 00:24:32 -08:00

Weixing Zhang

aec4cb489e

ROCm EP for AMD GPU (#5480)

The ROCm EP is designed and implemented on top of ROCm, AMD's GPU software stack. Details about ROCm are available at https://rocmdocs.amd.com/en/latest/

The ROCm EP builds on the following components:
1. AMD GPU programming language: HIP
2. AMD GPU HIP language runtime: amdhip64
3. BLAS: rocBLAS, hipBLAS
4. DNN: MIOpen
5. Collective communication library: RCCL
6. cub: hipCUB
7. …

Current status:
BERT-L and GPT2 training can be run on AMD GPUs with data parallelism.

Next:
1. Make more GPU code shareable between the ROCm EP and the CUDA EP, since the HIP language and HIP runtime API are very close to CUDA.
2. Continue improving the implementation.
3. Continue GPU kernel optimization.
4. Support model parallelism on ROCm EP.
……
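
To illustrate how close the HIP runtime API is to CUDA (item 1 above), here is a small sketch that renames a handful of CUDA runtime identifiers to their HIP counterparts. It is only an illustration of the naming correspondence: the mapping is a tiny subset, the function to_hip is hypothetical, and this is not how the ROCm EP or any official conversion tool is implemented.

```python
import re

# A small, illustrative subset of CUDA runtime names and their HIP equivalents.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaStream_t": "hipStream_t",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# Match whole identifiers only, trying longer names first.
_NAMES = sorted(CUDA_TO_HIP, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\b")


def to_hip(source: str) -> str:
    """Rename the known CUDA identifiers in `source` to their HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)
```

For example, to_hip("cudaMalloc(&p, n);") yields "hipMalloc(&p, n);". Identifiers that merely contain a CUDA name, such as my_cudaMalloc_wrapper, are left unchanged because the pattern only matches whole words.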

The ROCm kernels have been removed from this commit and will be submitted in a separate PR. Because the original PR was too big (~180 files), it was suggested to split it into two parts: one with the ROCm kernels and one with the non-kernel changes.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: sabreshao <sabre.shao@amd.com>
Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

2020-10-29 17:13:04 -07:00

3 commits