Mirror of https://github.com/saymrwulf/transformers.git, synced 2026-05-14 20:58:08 +00:00
fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask (see the sketch below)

* format code using black and ruff
* skip computing mask if attention_mask=None
* add tests for load balancing loss Mixtral-Moe
* fix assert loss is different in mixtral_test
* fix pad_leng
* use assertNotAlmostEqual and print to debug
* remove print for debug
* minor updates
* reduce rtol and atol
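The commit message above describes folding the attention mask into Mixtral's MoE load-balancing (auxiliary) loss so that padded positions no longer skew the routing statistics. Below is a minimal sketch of that idea, not the actual transformers implementation: the function name `masked_load_balancing_loss`, the flattened-token shapes, and the Switch-Transformer-style scaling are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def masked_load_balancing_loss(gate_logits, num_experts, top_k, attention_mask=None):
    # Illustrative sketch, not the library function.
    # gate_logits: (batch * seq_len, num_experts) router logits, one row per token.
    # attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding, or None.
    routing_weights = F.softmax(gate_logits, dim=-1)                  # (tokens, num_experts)
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)  # (tokens, top_k)
    expert_mask = F.one_hot(selected_experts, num_experts).float()    # (tokens, top_k, num_experts)

    if attention_mask is None:
        # No padding information: plain means over all token positions.
        tokens_per_expert = expert_mask.mean(dim=0)                   # (top_k, num_experts)
        router_prob_per_expert = routing_weights.mean(dim=0)          # (num_experts,)
    else:
        # Flatten the padding mask to line up with the flattened token axis and
        # exclude padded positions from both routing statistics.
        flat_mask = attention_mask.reshape(-1).float()                # (tokens,)
        denom = flat_mask.sum().clamp(min=1.0)
        tokens_per_expert = (expert_mask * flat_mask[:, None, None]).sum(dim=0) / denom
        router_prob_per_expert = (routing_weights * flat_mask[:, None]).sum(dim=0) / denom

    # Auxiliary loss: dot product of the per-expert load fraction and the mean
    # router probability, scaled by the number of experts.
    return (tokens_per_expert * router_prob_per_expert.unsqueeze(0)).sum() * num_experts
```

With `attention_mask=None` the sketch falls back to plain means over every position, which mirrors the "skip computing mask if attention_mask=None" bullet above; with a mask, padded tokens are dropped from both the load fraction and the mean routing probability.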
| Name |
|---|
| .. |
| __init__.py |
| test_modeling_mixtral.py |
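The `test_modeling_mixtral.py` file listed above is where the new load-balancing-loss test lives. A hedged sketch of the kind of check the commit message describes ("assert loss is different", `assertNotAlmostEqual`), written against the illustrative `masked_load_balancing_loss` sketch above rather than the real library function:

```python
import unittest

import torch


class LoadBalancingLossMaskTest(unittest.TestCase):
    def test_loss_changes_with_padding(self):
        # Illustrative test sketch; shapes and seed are arbitrary assumptions.
        torch.manual_seed(0)
        batch, seq_len, num_experts, top_k = 2, 8, 8, 2
        gate_logits = torch.randn(batch * seq_len, num_experts)

        # Zero out the second half of every sequence, as if it were right-padding.
        attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
        attention_mask[:, seq_len // 2:] = 0

        unmasked = masked_load_balancing_loss(gate_logits, num_experts, top_k)
        masked = masked_load_balancing_loss(gate_logits, num_experts, top_k, attention_mask)

        # Ignoring padded positions should move the loss, so the two values
        # are expected to differ.
        self.assertNotAlmostEqual(float(unmasked), float(masked), places=5)


if __name__ == "__main__":
    unittest.main()
```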