transformers

mirror of https://github.com/saymrwulf/transformers.git synced 2026-05-14 20:58:08 +00:00

Author	SHA1	Message	Date
Arthur Zucker	298b3f1930	v4.48.3	2025-02-07 10:32:49 +01:00
Arthur Zucker	d28f0207d5	GPTNeoX needs kwargs	2025-02-07 10:14:53 +01:00
Zach Mueller	3d6e55c7e7	Fix model kwargs (#35875 ) * Save state * Make a failing test * Better test * mpt -> done, many more to go * Rm extranious * Bamba * Bert * big_bird * biogpt * bloom * codegen * ctrl * data2vec * dbrx * Through up to Dbrx * electra * ernie * falcon * Fuyu/persimmon * Include noop kwargs to base models * Rebase * Skip musigen * Refactor/skip mllama * Revert makefile * Rm file * Fix PT failing, need to modify rest of loss funcs to not resize * Propagate some * Continue * More * More options * Mostly fixed * Proved that it's the same * Bloom is good * Make ability to override loss func possible * Fixup * Clean * Fix xglm * Quality tests * Skip OCR2 * Make specific loss for xglm * Make order the same/line up 1:1 * xglm * Skip fx output loss bloom model * Didn't pass in pad_token_id * Fix quality	2025-02-06 21:09:51 +01:00
Raushan Turganbay	093bebcdd9	Paligemma: fix generation with Gemma2 (#36044 ) * fix paligemma * nit * use `kwargs` in models that can load any LM * update changes to only affect Paligenma	2025-02-06 14:37:47 +01:00
Cyril Vallez	97a6cf9072	Fix device in rope module when using dynamic updates (#35608 ) fix rope device	2025-02-06 14:27:04 +01:00
Matt	11e31ec24f	Add future import for Py < 3.10 (#35666 ) * Add future import for Py < 3.10 * make fixup * Same issue in convert_olmo2_weights_to_hf.py	2025-02-05 11:36:39 +01:00
Cyril Vallez	b673c16cad	Fix mask slicing for models with HybridCache (#35681 ) * correctly slice * check mask * Update modular_gemma2.py * fix * add tests * fix typo * finally fix mask slicing * Finally correctly slice in all cases!! * add test for all attention functions * small fix in tests * trick around dynamo tracing issue * last update * more robust * kwargs propagation * make it explicit for checkpointing * apply modular	2025-01-30 18:54:13 +01:00
Yih-Dar	aa3e590100	Update `squad_convert_example_to_features` to work with numpy v2 (#35955 ) * Fix * Fix * Fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-30 18:48:25 +01:00
Arthur Zucker	f3fad5755a	v4.48.2	2025-01-30 18:34:40 +01:00
Ilyas Moutawwakil	e5f88ae076	Fix is_causal being a tensor (#35791 ) * fix is_causal being a tensor * convert in sdpa attention only when jit tracing	2025-01-30 09:24:54 +01:00
Raushan Turganbay	163c8bbdc9	Fix: loading DBRX back from saved path (#35728 ) * fix dtype as dict for some models + add test * add comment in tests	2025-01-30 09:24:51 +01:00
Marc Sun	b17abf9519	Fix NoneType type as it requires py>=3.10 (#35843 ) fix type	2025-01-30 09:23:48 +01:00
Tyler Michael Smith	f7b6047a4e	Restore is_torch_greater_or_equal_than for backward compatibility (#35734 ) * Restore is_torch_greater_or_equal_than for backward compatibility Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * review comments Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> --------- Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-01-30 09:23:06 +01:00
Arthur Zucker	2e752ead46	revert my changes	2025-01-20 17:05:34 +01:00
Arthur Zucker	785b5cf444	v4.48.1	2025-01-20 16:20:06 +01:00
eustlb	3b09464364	Patch moonshine (#35731 ) * udpate expected logits for T4 runners * update doc * correct order of the args for better readability * remove generate wrap * convert modular	2025-01-20 16:19:50 +01:00
kang sheng	b00807fac2	Fix condition when GA loss bug fix is not performed (#35651 ) * fix condition when GA loss bug fix is not performed * max loss diff is 2.29 * fix typo * add an extra validation that loss should not vary too much	2025-01-20 16:12:49 +01:00
Arthur	612bfd0801	[`Phi`] bias should be True (#35650 ) bias should be True	2025-01-20 16:12:09 +01:00
Raushan Turganbay	6bc0fbcfa7	[WIP] Emu3: add model (#33770 ) * model can convert to HF and be loaded back * nit * works in single batch generation but hallucinates * use the image tokens * add image generation * now it works * add tests * update * add modulare but it doesn't work for porting docstring :( * skip some tests * add slow tests * modular removed the import? * guess this works * update * update * fix copies * fix test * fix copies * update * docs * fix tests * last fix tests? * pls * repo consistency * more style * style * remove file * address comments * tiny bits * update after the new modular * fix tests * add one more cond in check attributes * decompose down/up/mid blocks * allow static cache generation in VLMs * nit * fix copies * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix VAE upsampling * Update src/transformers/models/emu3/modular_emu3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments * state overwritten stuff explicitly * fix copies * add the flag for flex attn --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-10 12:30:23 +01:00
Cyril Vallez	59e28c30fa	Fix flex_attention in training mode (#35605 ) * fix flex * add test * style	2025-01-10 11:50:12 +01:00
Arthur Zucker	7cf6230e25	push a fix for now	2025-01-10 11:34:08 +01:00
Arthur Zucker	d6f446ffa7	when filtering we can't use the convert script as we removed them	2025-01-10 11:29:31 +01:00
Arthur Zucker	8ce1e9578a	[test-all]	2025-01-10 11:20:41 +01:00
eustlb	af2d7caff3	Add Moonshine (#34784 ) * config draft * full encoder forward * full decoder forward * fix sdpa and FA2 * fix sdpa and FA2 * moonshine model * moonshine model forward * fix attention with past_key_values * add MoonshineForConditionalGeneration * fix cache handling and causality for cross attention * no causal attention mask for the encoder * model addition (imports etc) * small nit * nits * Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py Co-authored-by: Joshua Lochner <admin@xenova.com> * add rope_theta * nits * model doc * Update src/transformers/models/auto/configuration_auto.py Co-authored-by: Joshua Lochner <admin@xenova.com> * imports * add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES * updates modular * make * make fix-copies * ruff check examples fix * fix check_modular_conversion * nit * nits * nits * copied from -> imports * imports fix * integrate attention refacto * modular edge case * remove encoder * convolutions params in config * run modular_model_converter * make * Update docs/source/en/model_doc/moonshine.md Co-authored-by: Joshua Lochner <admin@xenova.com> * MoonshineModelTest * correct typo * make style * integration tests * make * modular convert * name conversion update (up_proj -> fc1 etc) * update config * update MLP * update attention * update encoder layer * update decoder layer * update convolutions parameters * update encoder * remove INPUTS_DOCSTRING * update decoder * update conditional generation * update pretrained model * imports * modular converted * update doc * fix * typo * update doc * update license * update init * split config in file * two classes for MLP * attention from GLM * from GlmRotaryEmbedding * split MLP * apply arthur's review suggestions * apply arthur's review suggestions * apply arthur's review suggestions * auto feature extractor * convert modular * fix + make * convert modular * make * unsplit config * use correct checkpoint * wrap generate * update tests * typos * make * typo * update doc --------- Co-authored-by: Joshua Lochner <admin@xenova.com>	2025-01-10 11:03:36 +01:00
Tom Aarsen	42b8e7916b	ModernBert: reuse GemmaRotaryEmbedding via modular + Integration tests (#35459 ) * Introduce 5 integration tests for the 4 model classes + torch export * ModernBert: reuse GemmaRotaryEmbedding via modular * Revert #35589, keep rope_kwargs; rely on them in modular_modernbert * Revert "Revert #35589, keep rope_kwargs; rely on them in modular_modernbert" This reverts commit 11b44b9ee83e199cbfb7c5ba2d11f7a7fdbba2d3. * Don't set rope_kwargs; override 'self.rope_init_fn' call instead	2025-01-10 10:27:39 +01:00
Arthur Zucker	e39c9f7a78	v4.48-release	2025-01-10 10:12:04 +01:00
Zach Mueller	8de7b1ba8d	Add flex_attn to diffllama (#35601 ) Add sdpa to diffllama	2025-01-09 20:49:11 +01:00
Benjamin Warner	1e3ddcb2d0	ModernBERT bug fixes (#35404 ) * bug fixes * organize imports * wrap cpu warning in reference_compile * Avoid needing repad_logits_with_grad, always repad with grads when training I'm not 100% that the conditional with "or labels is None" makes sense though - not sure what the intention is there. Perhaps we can remove that? * Revert "Avoid needing repad_logits_with_grad, always repad with grads when training" This reverts commit cedcb4e89bcea199a1135a0933e71f534b656239. * Fix grammar: keep -> keeps * Propagate grammar fix with modular_model_converter --------- Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>	2025-01-09 20:15:38 +01:00
Arthur	e97d7a5be5	add `_supports_flex_attn = True` for models that do support it (#35598 ) * add `_supports_flex_attn = True` * fix repo consistency	2025-01-09 20:03:33 +01:00
胡译文	c9c682d19c	[doc] deepspeed universal checkpoint (#35015 ) * universal checkpoint * Update docs/source/en/deepspeed.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/deepspeed.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/deepspeed.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-09 09:50:51 -08:00
Cyril Vallez	3a4ae6eace	Refactor/fix Cohere2 (#35594 ) * refactor/fix cohere2 * add kwargs * tests * remove func and import it	2025-01-09 17:54:57 +01:00
Tom Aarsen	32e0db8a69	[`tokenizers`] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer (#35593 ) * Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer in PreTrainedTokenizerFast, rather than relying on subclasses to take care of this. * Simplify setting self.add_prefix_space, ensure pre_tok exists * Wrap in try-except to catch 'Custom PreTokenizer cannot be serialized' `862d1a346a/bindings/python/src/pre_tokenizers.rs (L672)` produces the Exception. They're triggered by the roformer tests, as the RoFormerTokenizerFast uses a custom PreTokenizer. * Propagate add_prefix_space in T5TokenizerFast to superclass	2025-01-09 17:46:50 +01:00
Cyril Vallez	46276f9a7f	Fix modular edge case + modular sorting order (#35562 ) * look-ahead negation * re add examples by default * Fix the bug in topological sort * Update create_dependency_mapping.py * start adding test * finalize test * more tests * style * style	2025-01-09 17:17:52 +01:00
Amit Luhar	d3fe9fa3fe	PR for Issue #22694 : Fixed Training Evaluation table display for VSCode (#35557 )	2025-01-09 15:05:47 +00:00
Pablo Montalvo	395b114bd1	Small fix rope kwargs (#35589 ) * don't know why this keeps popping up? * remove unused rope_kwargs	2025-01-09 15:40:36 +01:00
Yih-Dar	82dd6c14bb	Fix flaky `SwitchTransformersModelTest::test_training_gradient` (#35587 ) * fix * Update tests/models/switch_transformers/test_modeling_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-09 15:36:22 +01:00
Arthur	eb4579cf43	`tokenizer` train from iterator without pre_tokenizers (#35396 ) * fix if else issues * add a test * fix the test * style	2025-01-09 15:34:43 +01:00
Mehant Kammakomati	320512df46	feat: add TP plan for granite (#35573 ) Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>	2025-01-09 15:25:55 +01:00
Saif Rehman Nasir	633da1b10e	[Idefics3] Move image features to same device as input embeds (#35100 ) * [Idefics3] Move image features to same device as input embeds * Update src/transformers/models/idefics3/modeling_idefics3.py * make style --------- Co-authored-by: Saif Rehman Nasir <shyshin@github.com> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> Co-authored-by: Raushan Turganbay <raushan@huggingface.co>	2025-01-09 14:25:36 +01:00
Jack Morris	832c6191ed	Add inputs_embeds param to ModernBertModel (#35373 ) * update modular_modernbert -- add inputs_embeds param to ModernBertModel * Fix implementation issues; extend to other classes; docstring First of all, the inputs_embeds shouldn't fully replace `self.embeddings(input_ids)`, because this call also does layer normalization and dropout. So, now both input_ids and inputs_embeds is passed to the ModernBertEmbeddings, much like how BertEmbeddings is implemented. I also added `inputs_embeds` to the docstring, and propagated the changes to the other model classes. I also introduced an error if input_ids and input_embeds are both or neither provided. Lastly, I fixed an issue with device being based solely on input_ids with attention_mask. * Propagate inputs_embeds to ModernBertForMaskedLM correctly Also reintroduce inputs_embeds test --------- Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>	2025-01-09 14:17:26 +01:00
Yih-Dar	1b2f942af7	Fix flaky `test_batching_equivalence` (#35564 ) * yes! * oh no!!! * oh no!!! * style * oh no!!! * oh no!!! * oh no!!! * oh no!!! --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-09 14:00:08 +01:00
Chander G	4adc415b6d	Setup loss_type in config at model init time (#34616 ) * setup loss_type in config at model init time ensures no additional graph break introduced when torch.compile'ed fixes #34615 Signed-off-by: ChanderG <mail@chandergovind.org> * lookup loss mapping at init time instead of manual setup Signed-off-by: ChanderG <mail@chandergovind.org> * remove redundant lookup at loss_function time * overwride losstype at init time --------- Signed-off-by: ChanderG <mail@chandergovind.org> Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>	2025-01-09 13:32:21 +01:00
Cyril Vallez	c8ab6ce6ce	Re-add missing __all__ for Cohere and Phi3 (#35578 ) re-add missing __all__	2025-01-09 11:29:31 +01:00
Merve Noyan	487c31a21f	Minor fix in video text 2 text docs (#35546 ) minor fix in docs	2025-01-09 11:20:36 +01:00
Cyril Vallez	965a2fb320	More model refactoring! (#35359 ) * cohere * style * phi3 * style * small fix * small fix * phi3 longrope * oups * Update rope (only for phi3 still) * Update test_modeling_rope_utils.py * Update modeling_phi3.py * fix * fix copies * style * Fix copied from bad renaming	2025-01-09 11:09:09 +01:00
Raushan Turganbay	137965ca7d	Don't show warning for `inv_freq` buffers (#35255 ) dont show warning	2025-01-09 10:46:01 +01:00
Arthur	8cad65a698	Fix multi-gpu loss (#35395 ) push to device	2025-01-09 10:14:31 +01:00
Arthur	2e2f8015c0	update code owners (#35576 ) update	2025-01-09 09:55:41 +01:00
Ahmed Almaghz	a6256ec098	[i18n-ar] Translated file: `docs/source/ar/tasks/multiple_choice.md` into Arabic (#35199 ) * إضافة الترجمة العربية: multiple_choice.md * Update multiple_choice.md * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update _toctree.yml * Add files via upload * Update _toctree.yml --------- Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>	2025-01-08 14:17:58 -08:00
nhamanasu	b32938aeee	Fix all output_dir in test_trainer.py to use tmp_dir (#35266 ) * update codecarbon * replace directly-specified-test-dirs with tmp_dir * pass tmp_dir to all get_regression_trainer * test_trainer.py: Use tmp_dir consistently for all output_dir arguments * fix some with...as tmp_dir blocks * reflect the comments to improve test_trainer.py * refresh .gitignore	2025-01-08 19:44:39 +01:00

17772 commits