mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-14 20:57:59 +00:00
## Description
Our current CP implementation doesn't support efficient attention when `compute_log_sumexp=False`. `compute_log_sumexp` is `False` only when `requires_grad=False`, and since PP's [shape inference](
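The coupling between `compute_log_sumexp` and the backward pass can be sketched as follows. This is a minimal pure-Python illustration of the idea, not PyTorch's actual kernel; the names `attention_row` and `log_sum_exp` are hypothetical:

```python
import math

def log_sum_exp(xs):
    # Numerically stable log(sum(exp(x_i))): shift by the max first.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def attention_row(scores, values, compute_log_sumexp=False):
    """Softmax-weighted sum for one query row (hypothetical helper).

    When compute_log_sumexp is True (i.e. a backward pass will follow),
    the log-sum-exp of the scores is returned alongside the output so a
    backward kernel could recompute the softmax probabilities cheaply.
    With requires_grad=False there is no backward pass, so the kernel
    skips materializing it -- the situation described above.
    """
    lse = log_sum_exp(scores)
    probs = [math.exp(s - lse) for s in scores]
    out = sum(p * v for p, v in zip(probs, values))
    return (out, lse) if compute_log_sumexp else (out, None)
```

With uniform scores, `attention_row([0.0, 0.0], [1.0, 3.0])` averages the values to `2.0` and returns no log-sum-exp; passing `compute_log_sumexp=True` additionally returns `log(2)`.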
| File |
|---|
| __init__.py |
| _attention.py |
| _func_map.py |
| _register_sharding.py |
| _tp_transform.py |