transformers/tests
pglorio 33cb1f7b61
Add Zamba2 (#34517)
* First commit

* Finish model implementation

* First commit

* Finish model implementation

* Register zamba2

* generated modeling and configuration

* generated modeling and configuration

* added hybrid cache

* fix attention_mask in mamba

* dropped unused loras

* fix flash2

* config docstrings

* fix config and fwd pass

* make fixup fixes

* text_modeling_zamba2

* small fixes

* make fixup fixes

* Fix modular model converter

* added inheritances in modular, renamed zamba cache

* modular rebase

* new modular conversion

* fix generated modeling file

* fixed import for Zamba2RMSNormGated

* modular file cleanup

* make fixup and model tests

* dropped inheritance for Zamba2PreTrainedModel

* make fixup and unit tests

* Add inheritance of rope from GemmaRotaryEmbedding

* moved rope to model init

* drop del self.self_attn and del self.feed_forward

* fix tests

* renamed lora -> adapter

* rewrote adapter implementation

* fixed tests

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Dropped adapter in-place sum

* removed rope from attention init

* updated rope

* created get_layers method

* make fixup fix

* make fixup fixes

* make fixup fixes

* update to new attention standard

* update to new attention standard

* make fixup fixes

* minor fixes

* cache_position

* removed cache_position position_ids use_cache

* remove config from modular

* removed config from modular (2)

* import apply_rotary_pos_emb from llama

* fixed rope_kwargs

* Instantiate cache in Zamba2Model

* fix cache

* fix @slow decorator

* small fix in modular file

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* several minor fixes

* inherit mamba2decoder fwd and drop position_ids in mamba

* removed docstrings from modular

* reinstate zamba2 attention decoder fwd

* use regex for tied keys

* Revert "use regex for tied keys"

This reverts commit 9007a522b1f831df6d516a281c0d3fdd20a118f5.

* use regex for tied keys

* add cpu to slow forward tests

* dropped config.use_shared_mlp_adapter

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* re-convert from modular

---------

Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-27 10:51:23 +01:00
agents use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
bettertransformer use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
deepspeed use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
extended
fixtures
fsdp [tests] make cuda-only tests device-agnostic (#35607) 2025-01-13 14:48:39 +01:00
generation Add Zamba2 (#34517) 2025-01-27 10:51:23 +01:00
models Add Zamba2 (#34517) 2025-01-27 10:51:23 +01:00
optimization
peft_integration use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
pipelines Fix test_pipelines_video_classification that was always failing (#35842) 2025-01-23 19:22:32 +01:00
quantization use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
repo_utils Fix modular edge case + modular sorting order (#35562) 2025-01-09 17:17:52 +01:00
sagemaker
tokenization tokenizer train from iterator without pre_tokenizers (#35396) 2025-01-09 15:34:43 +01:00
tp Simplify Tensor Parallel implementation with PyTorch TP (#34184) 2024-11-18 19:51:49 +01:00
trainer use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
utils use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
__init__.py
test_backbone_common.py
test_configuration_common.py
test_feature_extraction_common.py
test_image_processing_common.py use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
test_image_transforms.py
test_modeling_common.py use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
test_modeling_flax_common.py 🚨All attention refactor🚨 (#35235) 2024-12-18 16:53:39 +01:00
test_modeling_tf_common.py 🚨All attention refactor🚨 (#35235) 2024-12-18 16:53:39 +01:00
test_pipeline_mixin.py
test_processing_common.py VLMs: major clean up 🧼 (#34502) 2025-01-08 10:35:23 +01:00
test_sequence_feature_extraction_common.py
test_tokenization_common.py [tokenizers] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer (#35593) 2025-01-09 17:46:50 +01:00