Mirror of https://github.com/saymrwulf/onnxruntime.git, synced 2026-05-22 22:01:08 +00:00
### Description
Add memory-efficient attention from CUTLASS.

TODO (in the next pull request):
(1) Run performance tests on different GPUs, then add a sequence-length threshold so that it is activated only for long sequences.
(2) Merge the changes from https://github.com/NVIDIA/cutlass/pull/773 once it is in cutlass master.

| Name |
|---|
| eigen@d10b27fe37 |
| emsdk@c220895fd1 |
| libprotobuf-mutator@7a2ed51a6b |
| onnx@9b7bca2a72 |
| onnxruntime-extensions@81e7799c69 |
| protobuf@a902b39270 |
| abseil-cpp.cmake |
| abseil-cpp.natvis |
| composable_kernel.cmake |
| cutlass.cmake |
| dml.cmake |
| dnnl.cmake |
| eigen.cmake |
| extensions.cmake |
| find_snpe.cmake |
| FindNumPy.cmake |
| helper_functions.cmake |
| ipp-crypto.cmake |
| mimalloc.cmake |
| onnx_minimal.cmake |
| onnx_protobuf.natvis |
| onnxruntime_external_deps.cmake |
| protobuf_function.cmake |
| pybind11.cmake |
| pyxir.cmake |
| triton.cmake |
| tvm.cmake |
| wil.cmake |
| xnnpack.cmake |