mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Xinya Zhang 12116aee68 Add Flash Attention support on ROCM (#121561)

This patch addresses the major limitations in our previous [PR #115981](https://github.com/pytorch/pytorch/pull/115981) through the new dedicated repository [AOTriton](https://github.com/ROCm/aotriton)

- [x] Only supports MI200 series GPU (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
    * MI300X is supported. More architectures will be added once Triton support them.
- [x] Only supports power of two sequence lengths.
    * Now it support arbitrary sequence length
- [ ] No support for varlen APIs.
    * varlen API will be supported in future release of AOTriton
- [x] Only support head dimension 16,32,64,128.
    * Now it support arbitrary head dimension <= 256
- [x] Performance is still being optimized.
    * Kernel is selected according to autotune information from Triton.

Other improvements from AOTriton include

* Allow more flexible Tensor storage layout
* More flexible API

This is a more extensive fix to #112997

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121561
Approved by: https://github.com/huydhn

2024-03-28 00:27:38 +00:00
..
contrib	[codemod] Remove unused variables in caffe2/caffe2/contrib/fakelowp/spatial_batch_norm_fp16_fake_op.h (#120178)	2024-03-19 22:36:38 +00:00
core	[cuDNN] Cleanup cuDNN < 8.1 ifdefs (#120862)	2024-03-07 01:46:25 +00:00
cuda_rtc
db
distributed
experiments	[codemod] Remove unused variables in caffe2/caffe2/experiments/operators/tt_pad_op.h (#120177)	2024-03-19 23:36:52 +00:00
ideep
image
mobile
mpi
observers
onnx	[codemod][highrisk] Fix shadowed variable in caffe2/caffe2/onnx/onnx_exporter.cc (#117996)	2024-01-22 22:57:06 +00:00
operators	[codemod] Remove unused variables in caffe2/caffe2/operators/softmax_op_cudnn.cc (#121995)	2024-03-19 22:35:58 +00:00
opt	[codemod] Remove unused variables in caffe2/caffe2/opt/nql/graphmatcher.cc (#118116)	2024-03-19 22:45:43 +00:00
perfkernels	[caffe2] Add an avx512 implementation of adagrad_update (#113289)	2024-02-15 01:45:30 +00:00
predictor
proto
python	Move doc links to point to main (#121823)	2024-03-15 19:49:37 +00:00
quantization
queue
serialize	Expose recordSize in ChunkRecordIterator (#120239)	2024-02-21 04:33:03 +00:00
sgd
share
test
transforms
utils	[codemod][lowrisk] Fix deprecated use of 0/NULL (#120740)	2024-02-28 20:13:13 +00:00
video	[codemod] Remove unused variables in caffe2/caffe2/video/video_decoder.cc (#122151)	2024-03-19 22:34:17 +00:00
.clang-format
__init__.py
BUILD_MODE.bzl
CMakeLists.txt	Add Flash Attention support on ROCM (#121561)	2024-03-28 00:27:38 +00:00
README.md
release-notes.md
requirements.txt
unexported_symbols.lds
VERSION_NUMBER
version_script.lds

README.md

Caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.

Questions and Feedback

Please use GitHub issues (https://github.com/pytorch/pytorch/issues) to ask questions, report bugs, and request new features.

Further Resources on Caffe2.ai