onnxruntime/cmake
Weixing Zhang fff85a6a35
Add GPU kernels for ROCm EP (#5655)
* Add kernels for AMD GPU.

This PR mostly adds GPU kernels for the ROCm EP. Because CUDA and HIP are similar GPU programming languages with similar math library calls, one design principle of the ROCm EP is to share CUDA kernels with ROCm as much as possible. To that end, the script amd_hipify.py was created to convert CUDA kernels to ROCm HIP kernels automatically during the compilation phase. However, for reasons such as perf issues and syntax differences, some converted kernels need manual intervention. These kernels are checked into the repo for now. To avoid manual intervention, the plan is to refactor the CUDA kernels to make them as portable as possible between the CUDA EP and the ROCm EP.

Please refer to "HIP Porting Guide" for details.
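
To illustrate the kind of source-to-source conversion described above, here is a minimal sketch of text-based CUDA-to-HIP renaming. It is not the actual amd_hipify.py: the mapping table, the function name `hipify_source`, and the sample input are assumptions made for illustration, and only a few well-known CUDA/HIP runtime name pairs are covered.

```python
import re

# Hypothetical, deliberately tiny mapping for illustration only.
# It is NOT the table used by amd_hipify.py.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMemsetAsync": "hipMemsetAsync",
    "cudaStream_t": "hipStream_t",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# Try longer names first so a short key can never shadow a longer one.
_PATTERN = re.compile(
    "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
)


def hipify_source(text: str) -> str:
    """Replace each known CUDA identifier in `text` with its HIP counterpart."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], text)


example = (
    "#include <cuda_runtime.h>\n"
    "cudaError_t err = cudaMemsetAsync(p, 0, n, stream);\n"
)
converted = hipify_source(example)
print(converted)
```

Plain renaming like this only covers the cases where the CUDA and HIP APIs line up one-to-one, which is why some converted kernels still need manual changes as noted above.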

* Like LAMB, multi-tensor-apply needs to be disabled for IsAllFiniteOp and ReduceAllL2, because the current AMD GPU compiler has a perf issue with kernel parameters that are structures passed by value.

* Use hipMemsetAsync and add checks on HIP calls.

* Move the generated files to the build folder.

Co-authored-by: Jesse Benson <jesseb@microsoft.com>
2020-11-06 16:11:06 -08:00
external Upgrade optional implementation to https://github.com/martinmoene/optional-lite. (#5563) 2020-11-03 15:27:47 -08:00
horovod Address PR comments and clean up. (#3536) 2020-04-15 15:51:52 -07:00
patches OpenVINO EP v2.0 (#3585) 2020-04-24 04:06:02 -07:00
tensorboard Introduce training changes. 2020-03-11 14:39:03 -07:00
CMakeLists.txt Modify logic to determine OV Version (#5701) 2020-11-05 15:12:02 -08:00
CMakeSettings.json Fork the WinML APIs into the Microsoft namespace (#3503) 2020-04-17 06:18:54 -07:00
codeconv.runsettings CMake changes (#2961) 2020-02-03 19:33:14 -08:00
ConfigureVisualStudioCodeAnalysis.props
EnableVisualStudioCodeAnalysis.props
flake8.cmake Older flake8 versions report false positives and don't handle the same things in the config file. (#3983) 2020-05-20 07:29:22 +10:00
onnxruntime.cmake ROCm EP for AMD GPU (#5480) 2020-10-29 17:13:04 -07:00
onnxruntime_codegen.cmake [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
onnxruntime_common.cmake Upgrade optional implementation to https://github.com/martinmoene/optional-lite. (#5563) 2020-11-03 15:27:47 -08:00
onnxruntime_config.h.in Thread pool changes (#3153) 2020-03-30 12:18:40 -07:00
onnxruntime_csharp.cmake Add amd migraphx execution provider to onnx runtime (#2929) 2020-05-27 04:24:59 +08:00
onnxruntime_flatbuffers.cmake fix build break (#5306) 2020-09-28 00:10:48 -07:00
onnxruntime_framework.cmake Ryanunderhill/backout 5014 (#5167) 2020-09-14 22:48:00 -07:00
onnxruntime_fuzz_test.cmake Onnxruntime fuzzing (#4341) 2020-07-06 16:34:34 -07:00
onnxruntime_graph.cmake Replace MPI Send and Recv with NCCL Send and Recv (#5054) 2020-09-09 09:39:56 -07:00
onnxruntime_ios.toolchain.cmake Add iOS test pipeline and a sample app. (#5298) 2020-09-29 13:53:11 -07:00
onnxruntime_java.cmake [Android NNAPI EP] Remove dependency on external JD/DNNLibrary (#4576) 2020-07-22 14:08:12 -07:00
onnxruntime_java_unittests.cmake [java] Adds a CUDA test (#3956) 2020-05-18 12:05:51 -07:00
onnxruntime_language_interop_ops.cmake [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
onnxruntime_mlas.cmake MLAS: Add support for AVXVNNI (#5592) 2020-10-26 16:27:48 -07:00
onnxruntime_nodejs.cmake build: split nodejs binding build and test to avoid timeout issue (#4188) 2020-06-10 19:16:32 -07:00
onnxruntime_nuphar_extern.cmake Weba/merge ngemm (#2021) 2019-10-05 12:09:22 -07:00
onnxruntime_optimizer.cmake [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
onnxruntime_providers.cmake Add GPU kernels for ROCm EP (#5655) 2020-11-06 16:11:06 -08:00
onnxruntime_pyop.cmake [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
onnxruntime_python.cmake ROCm EP for AMD GPU (#5480) 2020-10-29 17:13:04 -07:00
onnxruntime_session.cmake [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
onnxruntime_training.cmake ROCm EP for AMD GPU (#5480) 2020-10-29 17:13:04 -07:00
onnxruntime_unittests.cmake Revert "Custom Op on GPU (#5620)" 2020-10-30 21:23:51 -07:00
onnxruntime_util.cmake Create Utils for Adding Range and Marker (#4013) 2020-05-24 22:55:24 -07:00
precompiled_header.cmake Merge windowsai (winml layering) into master (#2956) 2020-02-04 17:12:19 -08:00
protobuf_function.cmake Last major set of ORT format model changes (#5056) 2020-09-05 07:59:01 +10:00
set_winapi_family_desktop.h Fix WCOS/Win32 linking bugs (#3126) 2020-03-19 08:52:40 -07:00
store_toolchain.cmake Use onecore umbrella lib in onecore builds (#5182) 2020-09-16 10:46:27 -07:00
target_delayload.cmake Use onecore umbrella lib in onecore builds (#5182) 2020-09-16 10:46:27 -07:00
wcos_rules_override.cmake Use onecore umbrella lib in onecore builds (#5182) 2020-09-16 10:46:27 -07:00
wil.cmake Merge windowsai (winml layering) into master (#2956) 2020-02-04 17:12:19 -08:00
winml.cmake Store/containerized apps support (#4651) 2020-09-09 14:36:35 -07:00
winml_cppwinrt.cmake Add Experimental WinRT API IDL as placeholder for adding new winrt features (#4736) 2020-08-12 12:45:19 -07:00
winml_sdk_helpers.cmake Merge windowsai (winml layering) into master (#2956) 2020-02-04 17:12:19 -08:00
winml_unittests.cmake Add WinML Model testing (#5417) 2020-10-15 19:04:12 -07:00