transformers/tests
Niklas Muennighoff ecd61c6286
Add OLMoE (#32406)
* Add OLMoE

* Add OLMoE

* Updates

* Make norm optional; add keys

* Add output

* Add

* Fix dtype

* Fix eos config

* Update

* Add OLMoE

* Fix OLMoE path

* Format

* Format

* Rmv copy statement

* Rmv copy statement

* Format

* Add copies

* Cp rotary

* Fix aming

* Fix naming

* Update RoPE integration; num_logits_to_keep; Add copy statements

* Add eps to config

* Format

* Add aux loss

* Adapt router_aux_loss_coef

* Update md

* Adapt

* adapt tests
2024-09-03 18:43:12 +02:00
agents Add duckduckgo search tool (#32882) 2024-09-02 09:56:20 +02:00
benchmark
bettertransformer
deepspeed Revert PR 32299, flag users when Zero-3 was missed (#32851) 2024-08-16 12:35:41 -04:00
extended Skip tests properly (#31308) 2024-06-26 21:59:08 +01:00
fixtures
fsdp 🚨🚨🚨 Update min version of accelerate to 0.26.0 (#32627) 2024-08-20 11:42:36 +02:00
generation Generate: fix assistant in different device (#33257) 2024-09-02 14:37:49 +01:00
models Add OLMoE (#32406) 2024-09-03 18:43:12 +02:00
optimization fix: Fixed the 1st argument name in classmethods (#31907) 2024-07-11 12:11:50 +01:00
peft_integration
pipelines Add assistant prefill for chat templates and TextGenerationPipeline (#33198) 2024-09-02 13:23:47 +01:00
quantization 🚨 Support dequantization for most GGML types (#32625) 2024-09-03 12:58:14 +02:00
repo_utils Refactor CI: more explicit (#30674) 2024-08-30 18:17:25 +02:00
sagemaker Fixed log messages that are resulting in TypeError due to too many arguments (#32017) 2024-07-17 10:56:44 +01:00
tokenization #32184 save total_vocab_size (#32240) 2024-08-05 09:22:48 +02:00
trainer Only disallow DeepSpeed Zero-3 for auto bs finder (#31731) 2024-09-03 09:16:28 -04:00
utils Add a static cache that offloads to the CPU or other device (#32161) 2024-08-29 11:51:09 +02:00
__init__.py
test_backbone_common.py
test_configuration_common.py Refactor: Removed un-necessary object base class (#32230) 2024-07-26 10:33:02 +02:00
test_feature_extraction_common.py
test_image_processing_common.py Update kwargs validation for preprocess with decorator (#32024) 2024-08-06 11:33:05 +01:00
test_image_transforms.py
test_modeling_common.py Test: add higher atol in test_forward_with_num_logits_to_keep (#33093) 2024-08-26 15:23:30 +01:00
test_modeling_flax_common.py
test_modeling_tf_common.py
test_pipeline_mixin.py fix: Fixed raising TypeError instead of ValueError for invalid type (#32111) 2024-07-22 17:46:17 +01:00
test_processing_common.py Modify ProcessorTesterMixin for better generalization (#32637) 2024-08-13 11:48:53 -04:00
test_sequence_feature_extraction_common.py
test_tokenization_common.py Add assistant prefill for chat templates and TextGenerationPipeline (#33198) 2024-09-02 13:23:47 +01:00