onnxruntime/cmake
Tianlei Wu 8b4517218b
Remove USE_CUTLASS flag (#19271)
### Description
Since CUTLASS can be built with CUDA 11.4 (the minimum CUDA version for the onnxruntime CUDA build), there is no need for a flag to disable it.

Changes:
(1) Revert https://github.com/microsoft/onnxruntime/pull/18761.
(2) Remove the condition for building CUTLASS.
(3) Fix a few build errors and warnings found while testing the CUDA 11.4 build.

Note that SM 89 and 90 (including fp8) require CUDA 11.8 or later.
Flash attention and CUTLASS fused multi-head attention will not be built
for CUDA < 11.6. If you want to support the latest GPUs, build with
CUDA 11.8 or above.
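The version thresholds in the note above can be summarized as a small lookup. The following is a minimal sketch, not part of the onnxruntime build scripts. The function names `version_ge` and `cuda_build_expectations` are hypothetical. The version comparison assumes a `sort` that supports `-V` (GNU coreutils).

```bash
#!/usr/bin/env bash
# Sketch: report what the note above says is built for a given CUDA version.
# Function names are hypothetical; thresholds come from the PR text
# (11.4 minimum, 11.6 for fused attention, 11.8 for SM 89/90 and fp8).

# Succeed if version $1 >= version $2, using version-aware sort (sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print one expectation per line for CUDA version $1; fail below 11.4.
cuda_build_expectations() {
  local v="$1"
  if ! version_ge "$v" 11.4; then
    echo "unsupported: CUDA $v is below the 11.4 minimum"
    return 1
  fi
  # CUTLASS is no longer behind a flag, so it is always built.
  echo "cutlass: built"
  if version_ge "$v" 11.6; then
    echo "flash/cutlass fused MHA: built"
  else
    echo "flash/cutlass fused MHA: skipped"
  fi
  if version_ge "$v" 11.8; then
    echo "SM 89/90 and fp8: available"
  else
    echo "SM 89/90 and fp8: unavailable"
  fi
}
```

Using `sort -V` rather than string comparison keeps versions such as `11.10` ordered correctly relative to `11.8`.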

This change should be included in 1.17.0; otherwise, the release branch
might fail to build with CUDA 11.4.

Tests:
(1) Build with flash attention and efficient attention off: **passed**
(2) Build with CUDA 11.4: **passed**

Example build command used on Ubuntu 20.04:
```bash
# Point the build at the CUDA 11.4 toolkit and cuDNN.
export CUDA_HOME=/usr/local/cuda-11.4
export CUDNN_HOME=/usr/lib/x86_64-linux-gnu/
export CUDACXX=/usr/local/cuda-11.4/bin/nvcc

# Target SM 80 only and disable float8 types: SM 89/90 and fp8 need CUDA 11.8 or later.
sh build.sh --config Release --build_shared_lib --parallel --use_cuda --cuda_version 11.4 \
            --cuda_home "$CUDA_HOME" --cudnn_home "$CUDNN_HOME" --build_wheel --skip_tests \
            --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80 \
            --disable_types float8
```
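The two version-dependent arguments in that command could be derived from the installed toolkit instead of being hard-coded. The following is a minimal sketch under stated assumptions. The helper names are hypothetical. The `80;89;90` architecture list for CUDA 11.8+ is an illustrative choice, not one taken from this PR. It relies on the `release X.Y` line that `nvcc --version` prints.

```bash
#!/usr/bin/env bash
# Sketch: derive CUDA-version-dependent build.sh arguments from `nvcc --version`.
# Helper names and the SM 80/89/90 list for CUDA 11.8+ are assumptions.

# Read `nvcc --version` text on stdin and print the "release X.Y" number, e.g. 11.4.
cuda_release_from_nvcc() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print extra build.sh arguments for CUDA version $1, one argument per line.
cuda_extra_build_args() {
  local ver="$1" major minor
  major="${ver%%.*}"
  minor="${ver#*.}"
  minor="${minor%%.*}"   # tolerate X.Y.Z input
  if [ "$major" -gt 11 ] || { [ "$major" -eq 11 ] && [ "$minor" -ge 8 ]; }; then
    # CUDA 11.8+: SM 89/90 (and fp8) can be compiled, so float8 stays enabled.
    printf '%s\n' --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=80;89;90'
  else
    # Older toolkits: SM 80 only and float8 disabled, as in the command above.
    printf '%s\n' --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80 --disable_types float8
  fi
}
```

Because the arguments are printed one per line, they can be loaded into a bash array and expanded into the `build.sh` invocation. Load them with `mapfile -t cuda_args < <(cuda_extra_build_args "$(nvcc --version | cuda_release_from_nvcc)")` and append `"${cuda_args[@]}"` to the command. This keeps the semicolon-separated architecture list as a single argument.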

2024-01-25 16:57:58 -08:00

| Name | Last commit message | Last commit date |
| --- | --- | --- |
| external | Remove USE_CUTLASS flag (#19271) | 2024-01-25 16:57:58 -08:00 |
| patches | Update absl and gtest to fix an ARM64EC build error (#18735) | 2023-12-07 15:55:17 -08:00 |
| tensorboard | | |
| adjust_global_compile_flags.cmake | [WebNN EP] Fixed build issue with disable_rtti (#19173) | 2024-01-16 21:35:13 -08:00 |
| arm64x.cmake | Build onnxruntime.dll as arm64x (#18633) | 2023-12-06 16:49:00 -08:00 |
| CMakeLists.txt | Remove USE_CUTLASS flag (#19271) | 2024-01-25 16:57:58 -08:00 |
| CMakeSettings.json | | |
| codeconv.runsettings | | |
| deps.txt | Update abseil to a release tag and register neural_speed (#19255) | 2024-01-24 14:37:39 -08:00 |
| deps_update_and_upload.py | [Linter] Bump ruff and remove pylint (#17797) | 2023-10-05 21:07:33 -07:00 |
| EnableVisualStudioCodeAnalysis.props | | |
| gdk_toolchain.cmake | | |
| Info.plist.in | | |
| libonnxruntime.pc.cmake.in | | |
| linux_arm32_crosscompile_toolchain.cmake | Add a build validation for Linux ARM64 cross-compile (#18200) | 2023-11-08 13:03:18 -08:00 |
| linux_arm64_crosscompile_toolchain.cmake | Add a build validation for Linux ARM64 cross-compile (#18200) | 2023-11-08 13:03:18 -08:00 |
| nuget_helpers.cmake | | |
| onnxruntime.cmake | Add MacOS build to ORT C Pod (#18550) | 2023-11-28 10:11:53 -08:00 |
| onnxruntime_codegen_tvm.cmake | | |
| onnxruntime_common.cmake | Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238) | 2024-01-24 16:27:05 -08:00 |
| onnxruntime_compile_triton_kernel.cmake | [ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196) | 2023-07-11 13:55:30 +08:00 |
| onnxruntime_config.h.in | Enabling c++ 20 in MacOS build (#16187) | 2023-09-26 11:27:02 -07:00 |
| onnxruntime_csharp.cmake | | |
| onnxruntime_flatbuffers.cmake | Rework some external targets to ease building with -DFETCHCONTENT_FULLY_DISCONNECTED=ON (#15323) | 2023-04-03 17:45:12 -07:00 |
| onnxruntime_framework.cmake | [C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) | 2023-07-14 10:46:52 -07:00 |
| onnxruntime_framework.natvis | [C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) | 2023-07-14 10:46:52 -07:00 |
| onnxruntime_fuzz_test.cmake | Fix fuzz test (#14385) | 2023-01-22 22:17:43 -08:00 |
| onnxruntime_graph.cmake | Pre-link when creating static library for apple framework (#18241) | 2023-11-03 23:38:29 +10:00 |
| onnxruntime_ios.toolchain.cmake | | |
| onnxruntime_java.cmake | Update build option for training in java to enable_training_api (#15638) | 2023-04-24 11:53:08 -07:00 |
| onnxruntime_java_unittests.cmake | Update build option for training in java to enable_training_api (#15638) | 2023-04-24 11:53:08 -07:00 |
| onnxruntime_kernel_explorer.cmake | [ROCm] TunableOp: Update rocBLAS get_solutions API (since ROCm5.6) (#16657) | 2023-07-13 11:20:26 +08:00 |
| onnxruntime_language_interop_ops.cmake | | |
| onnxruntime_mlas.cmake | [aarch64] Add Sbgemm kernel to accelerate fp32 tensor matmul with bfloat16 (#17031) | 2024-01-22 14:43:06 -08:00 |
| onnxruntime_nodejs.cmake | Added DML and CUDA provider support in onnxruntime-node (#16050) | 2023-08-25 16:57:06 -07:00 |
| onnxruntime_objectivec.cmake | Objective C Training API: TrainingSession (#16374) | 2023-06-28 09:13:56 -07:00 |
| onnxruntime_opschema_lib.cmake | | |
| onnxruntime_optimizer.cmake | [ROCm] Fix hipify error: fast_divmod.h: No such file or directory (#19060) | 2024-01-10 14:49:19 +08:00 |
| onnxruntime_providers.cmake | Add API for NPU Device Selection in the DML EP (#17612) | 2023-10-11 14:53:00 -07:00 |
| onnxruntime_providers_acl.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_armnn.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_azure.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_cann.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_coreml.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_cpu.cmake | Update x64 template kernel library for 'sqnbitgemm' (#19016) | 2024-01-18 13:16:34 -08:00 |
| onnxruntime_providers_cuda.cmake | [TensorRT EP] Enable a minimal CUDA EP compilation without kernels (#19052) | 2024-01-17 11:33:34 -08:00 |
| onnxruntime_providers_dml.cmake | Delay load dxcore.dll in addition to ext-ms-win-dxcore-l1-1-0.dll (#18913) | 2023-12-26 12:33:42 -08:00 |
| onnxruntime_providers_dnnl.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_js.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_migraphx.cmake | CUDA EP vs ROCM EP hipify audit (#17776) | 2023-10-13 10:13:53 +08:00 |
| onnxruntime_providers_nnapi.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_openvino.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_qnn.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_rknpu.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_rocm.cmake | CUDA EP vs ROCM EP hipify audit (#17776) | 2023-10-13 10:13:53 +08:00 |
| onnxruntime_providers_tensorrt.cmake | [TensorRT EP] Properly set CUDA_INCLUDE_DIR for onnx-tensorrt (#18274) | 2023-11-03 20:04:10 -07:00 |
| onnxruntime_providers_tvm.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_vitisai.cmake | [VitisAI] 1. api compatbile 2. dynamic load onnx (#18470) | 2023-12-14 14:43:41 -08:00 |
| onnxruntime_providers_webnn.cmake | Split onnxruntime_providers.cmake to multiple (#17853) | 2023-10-09 20:33:44 -07:00 |
| onnxruntime_providers_xnnpack.cmake | Update XNNPACK to latest version (#18038) | 2023-11-03 09:04:28 -07:00 |
| onnxruntime_pyop.cmake | | |
| onnxruntime_python.cmake | Remove DORT since it's in PyTorch main now (#18996) | 2024-01-04 12:59:47 -08:00 |
| onnxruntime_rocm_hipify.cmake | [Cuda] Refactor GroupNorm (#19146) | 2024-01-25 22:28:47 +08:00 |
| onnxruntime_session.cmake | added support for cmake "find_package" (#8919) | 2023-06-19 22:20:31 -07:00 |
| onnxruntime_snpe_provider.cmake | | |
| onnxruntime_training.cmake | Triton Codegen for ORTModule (#15831) | 2023-07-13 18:17:58 +08:00 |
| onnxruntime_unittests.cmake | [ROCm] enable hipGraph (#18382) | 2024-01-23 11:17:04 +08:00 |
| onnxruntime_util.cmake | | |
| onnxruntime_webassembly.cmake | [WebNN EP] Fixed build issue with disable_rtti (#19173) | 2024-01-16 21:35:13 -08:00 |
| precompiled_header.cmake | | |
| riscv64.toolchain.cmake | Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238) | 2024-01-24 16:27:05 -08:00 |
| Sdl.ruleset | Add a Github workflow for Prefast (#15763) | 2023-05-03 11:42:51 -07:00 |
| set_winapi_family_desktop.h | | |
| target_delayload.cmake | | |
| uwp_stubs.h | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| wcos_rules_override.cmake | | |
| winml.cmake | Update winml to use #cores - #soc cores by Default as the number of intraopthreads (#18384) | 2023-11-28 09:26:48 -08:00 |
| winml_cppwinrt.cmake | | |
| winml_sdk_helpers.cmake | | |
| winml_unittests.cmake | Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) | 2023-09-08 13:35:04 -07:00 |