pytorch/caffe2
Xinya Zhang 12116aee68 Add Flash Attention support on ROCM (#121561)
This patch addresses the major limitations in our previous [PR #115981](https://github.com/pytorch/pytorch/pull/115981) through the new dedicated repository [AOTriton](https://github.com/ROCm/aotriton)

- [x] Only supports MI200 series GPUs (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
    * MI300X is now supported. More architectures will be added once Triton supports them.
- [x] Only supports power-of-two sequence lengths.
    * It now supports arbitrary sequence lengths.
- [ ] No support for varlen APIs.
    * The varlen APIs will be supported in a future release of AOTriton.
- [x] Only supports head dimensions 16, 32, 64, and 128.
    * It now supports arbitrary head dimensions <= 256.
- [x] Performance is still being optimized.
    * The kernel is selected according to autotuning information from Triton.
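The checklist above says eligibility depends on the GPU architecture string (`gcnArchName`) and the head dimension (now any value <= 256). Below is a minimal, hypothetical sketch of how such a check could be written. It is not PyTorch's or AOTriton's actual dispatch logic, and every name in it (`parse_gcn_arch_name`, `head_dim_supported`, `flash_attention_eligible`) is illustrative. The original restriction compared the full string `gfx90a:sramecc+:xnack-`. This sketch compares only the base architecture before the first `:`, which is an assumption. The caller also supplies the set of supported base names, so no list of architectures is hard-coded here.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: none of these names are part of PyTorch or AOTriton.
MAX_HEAD_DIM = 256  # "arbitrary head dimension <= 256" per the PR notes


@dataclass
class GcnArch:
    base: str                                     # e.g. "gfx90a"
    features: dict = field(default_factory=dict)  # e.g. {"sramecc": True, "xnack": False}


def parse_gcn_arch_name(name: str) -> GcnArch:
    """Split a string like 'gfx90a:sramecc+:xnack-' into a base arch and feature flags.

    Flags ending in '+' map to True and flags ending in '-' map to False;
    flags without either suffix are ignored in this sketch.
    """
    base, *flags = name.split(":")
    features = {}
    for flag in flags:
        if flag.endswith(("+", "-")):
            features[flag[:-1]] = flag.endswith("+")
    return GcnArch(base, features)


def head_dim_supported(head_dim: int) -> bool:
    """True if head_dim is positive and at most MAX_HEAD_DIM."""
    return 0 < head_dim <= MAX_HEAD_DIM


def flash_attention_eligible(arch_name: str, head_dim: int, supported_bases: set) -> bool:
    """Assumed rule: the base arch is in supported_bases and head_dim is in range."""
    return parse_gcn_arch_name(arch_name).base in supported_bases and head_dim_supported(head_dim)
```

For example, `parse_gcn_arch_name("gfx90a:sramecc+:xnack-")` yields base `"gfx90a"` with features `{"sramecc": True, "xnack": False}`, and `flash_attention_eligible("gfx90a:sramecc+:xnack-", 96, {"gfx90a"})` is `True`. Matching on the base name rather than the full string keeps the check independent of the `sramecc`/`xnack` feature settings, which may or may not be what a real implementation wants.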

Other improvements from AOTriton include:
* More flexible tensor storage layouts
* A more flexible API

This is a more extensive fix for #112997.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121561
Approved by: https://github.com/huydhn
2024-03-28 00:27:38 +00:00
contrib [codemod] Remove unused variables in caffe2/caffe2/contrib/fakelowp/spatial_batch_norm_fp16_fake_op.h (#120178) 2024-03-19 22:36:38 +00:00
core [cuDNN] Cleanup cuDNN < 8.1 ifdefs (#120862) 2024-03-07 01:46:25 +00:00
cuda_rtc
db
distributed
experiments [codemod] Remove unused variables in caffe2/caffe2/experiments/operators/tt_pad_op.h (#120177) 2024-03-19 23:36:52 +00:00
ideep
image
mobile
mpi
observers
onnx [codemod][highrisk] Fix shadowed variable in caffe2/caffe2/onnx/onnx_exporter.cc (#117996) 2024-01-22 22:57:06 +00:00
operators [codemod] Remove unused variables in caffe2/caffe2/operators/softmax_op_cudnn.cc (#121995) 2024-03-19 22:35:58 +00:00
opt [codemod] Remove unused variables in caffe2/caffe2/opt/nql/graphmatcher.cc (#118116) 2024-03-19 22:45:43 +00:00
perfkernels [caffe2] Add an avx512 implementation of adagrad_update (#113289) 2024-02-15 01:45:30 +00:00
predictor
proto
python Move doc links to point to main (#121823) 2024-03-15 19:49:37 +00:00
quantization
queue
serialize Expose recordSize in ChunkRecordIterator (#120239) 2024-02-21 04:33:03 +00:00
sgd
share
test
transforms
utils [codemod][lowrisk] Fix deprecated use of 0/NULL (#120740) 2024-02-28 20:13:13 +00:00
video [codemod] Remove unused variables in caffe2/caffe2/video/video_decoder.cc (#122151) 2024-03-19 22:34:17 +00:00
.clang-format
__init__.py
BUILD_MODE.bzl
CMakeLists.txt Add Flash Attention support on ROCM (#121561) 2024-03-28 00:27:38 +00:00
README.md
release-notes.md
requirements.txt
unexported_symbols.lds
VERSION_NUMBER
version_script.lds

Caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.

Questions and Feedback

Please use GitHub issues (https://github.com/pytorch/pytorch/issues) to ask questions, report bugs, and request new features.

Further Resources on Caffe2.ai