onnxruntime/cmake/external
pengwa 37f743680a
Fix build when flash attention and memory efficient attention are disabled (#18761)
### Fix build when flash attention and memory efficient attention are disabled

In a customer environment with CUDA older than 11.6, both flash
attention and memory efficient attention are turned OFF according to
e8f33b54ba/cmake/CMakeLists.txt (L701).
As a result, the condition check at
e8f33b54ba/cmake/external/cutlass.cmake (L1)
returns false, and no cutlass library is built. The configure output contains:

```
Turn off flash attention since CUDA compiler version < 11.6
```
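To make the configure-time chain above concrete, here is a minimal Python model of it. This is a sketch under stated assumptions, not the actual CMake logic: the class and function names are hypothetical, it assumes the check near CMakeLists.txt L701 forces both attention options OFF when the CUDA compiler is older than 11.6, and it assumes the pre-fix condition at the top of cutlass.cmake fetches cutlass only when at least one of the two options is ON.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Effective attention options after the configure-time checks (hypothetical model)."""
    use_flash_attention: bool
    use_memory_efficient_attention: bool


def resolve_attention(cuda_version: tuple,
                      use_flash_attention: bool = True,
                      use_memory_efficient_attention: bool = True) -> AttentionConfig:
    """Model of the CUDA version check near CMakeLists.txt L701.

    Assumption: a CUDA compiler older than 11.6 forces BOTH options OFF,
    regardless of what the user requested. cuda_version is a (major, minor) tuple.
    """
    if cuda_version < (11, 6):
        print("Turn off flash attention since CUDA compiler version < 11.6")
        return AttentionConfig(False, False)
    return AttentionConfig(use_flash_attention, use_memory_efficient_attention)


def cutlass_is_fetched(cfg: AttentionConfig) -> bool:
    """Model of the pre-fix condition at the top of cutlass.cmake.

    Assumption: cutlass is fetched only if at least one attention option is ON.
    """
    return cfg.use_flash_attention or cfg.use_memory_efficient_attention


cfg = resolve_attention((11, 4))  # prints the "Turn off flash attention ..." status line
print(cutlass_is_fetched(cfg))    # False: the cutlass headers are never made available
```

With CUDA 11.4 the model prints the same status line shown above and reports that cutlass is not fetched, which is the state the ft_moe kernels then trip over.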

However, the kernels in
https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/moe/ft_moe
depend on cutlass to build, so the build fails with errors like this:

```
[ 77%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu.o
In file included from /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu:17:
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_template.h:23:10: fatal error: cutlass/array.h: No such file or directory
   23 | #include "cutlass/array.h"
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal   : Could not open input file /tmp/tmpxft_00044da3_00000000-11_moe_gemm_kernels_fp16_fp16.compute_60.cpp1.ii
make[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:6290: CMakeFiles/onnxruntime_providers_cuda.dir/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:2210: CMakeFiles/onnxruntime_providers_cuda.dir/all] Error 2
make: *** [Makefile:166: all] Error 2
Traceback (most recent call last):
  File "/tmp/onnxruntime/tools/ci_build/build.py", line 2746, in <module>
    sys.exit(main())
  File "/tmp/onnxruntime/tools/ci_build/build.py", line 2639, in main
    build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
  File "/tmp/onnxruntime/tools/ci_build/build.py", line 1527, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/tmp/onnxruntime/tools/ci_build/build.py", line 824, in run_subprocess
    return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
  File "/tmp/onnxruntime/tools/python/util/run.py", line 49, in run
    completed_process = subprocess.run(
  File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
```
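The root cause in the log is that `cutlass/array.h` is not on the include path. One quick way to confirm this in a local build tree is a small Python check. It assumes cutlass is pulled in via CMake `FetchContent` under the name `cutlass` and that `FETCHCONTENT_BASE_DIR` keeps its default of `<build dir>/_deps`; the script and its function name are hypothetical.

```python
import sys
from pathlib import Path


def cutlass_header_present(build_dir: Path) -> bool:
    """Return True if cutlass/array.h exists in the FetchContent source tree.

    Assumptions: cutlass is declared with FetchContent under the name "cutlass",
    and FETCHCONTENT_BASE_DIR is left at its default (<build_dir>/_deps), so the
    sources land in <build_dir>/_deps/cutlass-src.
    """
    header = build_dir / "_deps" / "cutlass-src" / "include" / "cutlass" / "array.h"
    return header.is_file()


if __name__ == "__main__":
    # Usage: python check_cutlass.py <CMake build directory>
    build = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    status = "found" if cutlass_header_present(build) else "missing"
    print(f"cutlass/array.h {status} under {build}")
```

If the header is reported missing after a configure step in which the attention options ended up OFF, the failure above is expected.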


### Motivation and Context

To summarize, the Linux CUDA build fails in two cases:
1. The CUDA version is older than 11.6.
2. The user explicitly disables both flash attention and memory efficient attention with
`onnxruntime_USE_FLASH_ATTENTION` and `onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION`.
2023-12-26 08:57:58 +08:00
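The two failure cases can also be enumerated with a small, self-contained Python model. It rests on assumptions that are not read from the real build files: CUDA older than 11.6 forces both attention options OFF, cutlass was previously fetched only when either option is ON, and the ft_moe kernels always need the cutlass headers. The `cutlass_always_fetched` parameter is a hypothetical stand-in for a fix that makes cutlass available to every CUDA build; the actual change in #18761 may be structured differently.

```python
from itertools import product


def build_fails(cuda_version, use_flash, use_mea, cutlass_always_fetched=False):
    """Hypothetical model: does the CUDA build hit 'cutlass/array.h: No such file'?

    Assumptions (not read from the real CMake files):
    - A CUDA compiler older than 11.6 forces both attention options OFF.
    - Before the fix, cutlass is fetched only if either attention option is ON.
    - The ft_moe kernels are always compiled and always include cutlass headers.
    """
    if cuda_version < (11, 6):
        use_flash = use_mea = False
    cutlass_available = cutlass_always_fetched or use_flash or use_mea
    # The MoE kernels need cutlass unconditionally, so missing cutlass means failure.
    return not cutlass_available


for version, flash, mea in product([(11, 4), (11, 8)], [True, False], [True, False]):
    before = build_fails(version, flash, mea)
    after = build_fails(version, flash, mea, cutlass_always_fetched=True)
    print(f"CUDA {version[0]}.{version[1]}  flash={flash!s:5}  mea={mea!s:5}  "
          f"fails before fix: {before!s:5}  after: {after}")
```

Before the fix, every CUDA 11.4 row fails, and on CUDA 11.8 only the row with both options OFF fails, matching the two cases listed above. With `cutlass_always_fetched=True`, no row fails.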
| Name | Last commit | Date |
|---|---|---|
| emsdk@a896e3d066 | [wasm] upgrade emsdk to 3.1.44 (#17069) | 2023-08-10 16:08:36 -07:00 |
| git.Win32.2.41.03.patch | Fix ability to use patch on Windows CI machines (#18356) | 2023-11-11 07:32:14 +10:00 |
| libprotobuf-mutator@7a2ed51a6b | | |
| onnx@b86cc54efc | use onnx rel-1.15.0, update cgman, cmake/external and requirement hash (#18177) | 2023-10-31 14:58:21 -07:00 |
| abseil-cpp.cmake | Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) | 2023-09-08 13:35:04 -07:00 |
| abseil-cpp.natvis | Create edges with arg positons correctly accounting for non-existing args (#18462) | 2023-11-20 14:49:09 -08:00 |
| composable_kernel.cmake | [ROCm] Update CK version (#17628) | 2023-11-13 15:43:38 -08:00 |
| cutlass.cmake | Fix build when flash attention and memory efficient attention are disabled (#18761) | 2023-12-26 08:57:58 +08:00 |
| dml.cmake | Bump DirectML version from 1.12.0 to 1.12.1 (#17225) | 2023-08-20 09:55:38 -07:00 |
| dnnl.cmake | [DNNL] add Arm Compute Library (ACL) backend for dnnl execution provider (#15847) | 2023-12-01 09:16:44 -08:00 |
| eigen.cmake | Fix ability to use patch on Windows CI machines (#18356) | 2023-11-11 07:32:14 +10:00 |
| extensions.cmake | Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) | 2023-09-08 13:35:04 -07:00 |
| find_snpe.cmake | Improve dependency management (#13523) | 2022-12-01 09:51:59 -08:00 |
| FindNumPy.cmake | | |
| helper_functions.cmake | Improve cache hit rate in windows build (#15538) | 2023-04-18 09:31:35 -07:00 |
| ipp-crypto.cmake | [TVM EP] Hot fix of build on Windows of TVM EP with ipp-crypto (#12381) | 2022-07-31 14:36:54 +02:00 |
| mimalloc.cmake | Improve dependency management (#13523) | 2022-12-01 09:51:59 -08:00 |
| onnx_minimal.cmake | Fix some build issues on MacOS with Xcode 14.3. (#15878) | 2023-06-07 12:07:11 -07:00 |
| onnx_protobuf.natvis | Fix visualization issues with Attribute/Tensor protos (#17188) | 2023-08-16 13:56:51 -07:00 |
| onnxruntime_external_deps.cmake | FIX: Our cmake script didn't check googletest's hash (#18826) | 2023-12-15 08:48:15 -08:00 |
| protobuf_function.cmake | Fix some build issues on MacOS with Xcode 14.3. (#15878) | 2023-06-07 12:07:11 -07:00 |
| pybind11.cmake | Improve dependency management (#13523) | 2022-12-01 09:51:59 -08:00 |
| pyxir.cmake | | |
| tvm.cmake | [TVM EP] Support zero copying TVM EP output tensor to ONNX Runtime output tensor (#12593) | 2023-02-08 10:02:20 -08:00 |
| wil.cmake | Rework WIL dependency retrieval/usage (#17130) | 2023-08-15 09:11:46 -07:00 |
| xnnpack.cmake | Update XNNPACK to latest version (#18038) | 2023-11-03 09:04:28 -07:00 |