Commit graph

11 commits

Author SHA1 Message Date
Weixing Zhang
aec4cb489e
ROCm EP for AMD GPU (#5480)
The ROCm EP is designed and implemented on top of ROCm, AMD's GPU software stack. Details about ROCm: https://rocmdocs.amd.com/en/latest/

The ROCm EP builds on the following components:
1. AMD GPU programming language: HIP
2. AMD GPU HIP language runtime: amdhip64
3. BLAS: rocBLAS, hipBLAS
4. DNN: MIOpen
5. Collective communication library: RCCL
6. cub: hipCUB
7. …

Current status:
BERT-L and GPT2 training can be run on AMD GPUs with data parallelism.

Next:
1. Share more GPU code between the ROCm EP and the CUDA EP, since the HIP language and HIP runtime API are very close to CUDA.
2. Continue improving the implementation.
3. Continue GPU kernel optimization.
4. Support model parallelism on ROCm EP.
……

The ROCm kernels have been removed from this commit and will be in a separate PR. Because the original PR was too big (~180 files), it was suggested that the PR be split into two parts: the ROCm kernels and the non-ROCm-kernel changes.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: sabreshao <sabre.shao@amd.com>
Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-10-29 17:13:04 -07:00
Guoyu Wang
3a3f26f38e
Move ort flatbuffers helper functions and value info r/w functions into a separate lib (#5276)
* Move fbs include from header to cc

* add initial cmake for flatbuffers

* Move most flatbuffers util to ort_flatbuffers

* move code around

* fix

* move test/perf runner to use flatbuffer directly instead of model

* minor update

* Fix build break

* Clean up includes and forward decl

* Fix training CI build breaks

* Addressed PR comment, replaced some includes with forward decls

* Remove ORT_MUST_USE_RESULT temporarily
2020-09-25 05:36:29 -07:00
gwang-msft
7ca8388dc9
[ORT Mobile] file format schema and file I/O code (#4973)
* ort mobile file format schema and [de]serializing code
2020-09-01 11:51:31 +10:00
Changming Sun
26546f81fe
Remove the private ONNX protobuf definition file (#4878)
2020-08-24 12:40:33 -07:00
Weixing Zhang
b4b1c6440a
Enable ORT with CUDA 11 toolkit (#4168)
* ORT on CUDA 11

1. Separate HOROVOD and MPI
2. Separate NCCL from HOROVOD in CMakeLists.txt
3. Remove dependency on external cub
4. cudnnSetRNNDescriptor is changed in cuDNN 8.0

* polish the code about MPI/NCCL in CMakeLists.txt and build.py

* check CUDA version

* ${MPI_INCLUDE_DIRS} should be PUBLIC

* sm30, sm50 are deprecated in CUDA 11 Toolkit

* update change based on code review feedback.

* add sm_52

* improve MPI/NCCL build path

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-06-15 08:47:03 -07:00
edgchen1
a715d55bcc
Training Python package fixes (#4063)
- Add support for ENABLE_LANGUAGE_INTEROP_OPS in training build which is enabled for nightly builds
- Fix passing of environment variables to `sudo docker run` in build definitions
- Fix setup.py package naming logic
2020-06-01 09:30:56 -07:00
ytaous
bc441b7e5c
Add cpu/mem usage for perf metrics (#3947)
* add cpu/mem usage

* on comments

* on comments

* renaming

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-15 12:29:40 -07:00
ytaous
96030fdcbc
dashboard integration - output training perf metrics as json (#3809)
* dashboard integration - first phase

* change a field

* perf scripts

* addressing PR comments

* address comments and fix build

* minor

* make GetConfigFromData() const

* more update for comments

* addressing comments

* more on addressing comments

* minor

* fix build

* add condition check

* more on comments

* return status

* remove batch size

* on comments

* rename pkg path

* rename pkg path

* additional comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-10 10:29:38 -07:00
edgchen1
0ec90f7019
Put safeint_interface include directory into onnxruntime_common interface include directories to simplify usage by other targets. (#3546)
2020-04-16 10:34:32 -07:00
ytaous
f73008483a
safeint for region bytes in bfc arena and code clean up (#3447)
* PR comments

* remove build issue workaround

* SafeInt for region bytes

* fix build

* fix build

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-04-08 13:54:42 -07:00
Edward Chen
e542cfd0e0
Introduce training changes.
2020-03-11 14:39:03 -07:00