* Save state
* Make a failing test
* Better test
* mpt -> done, many more to go
* Rm extraneous
* Bamba
* Bert
* big_bird
* biogpt
* bloom
* codegen
* ctrl
* data2vec
* dbrx
* Done with models up through dbrx
* electra
* ernie
* falcon
* Fuyu/persimmon
* Include no-op kwargs in base models
* Rebase
* Skip musicgen
* Refactor/skip mllama
* Revert makefile
* Rm file
* Fix PT failing; still need to modify the rest of the loss funcs to not resize
* Propagate some
* Continue
* More
* More options
* Mostly fixed
* Proved that it's the same
* Bloom is good
* Make ability to override loss func possible
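For context, a minimal sketch of what such an override could look like. Treating `model.loss_function` as an assignable attribute with a keyword `(logits, labels, vocab_size, **kwargs)` signature is an assumption based on the description above, not a documented API:

```python
# Hedged sketch: swap a custom loss into a loaded model. The assignable
# `model.loss_function` hook and its keyword signature are assumptions here.
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

def shifted_cross_entropy(logits, labels, vocab_size, **kwargs):
    # Standard next-token objective: token t predicts token t + 1.
    logits = logits[..., :-1, :].contiguous().view(-1, vocab_size)
    labels = labels[..., 1:].contiguous().view(-1)
    return F.cross_entropy(logits, labels, ignore_index=-100)

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.loss_function = shifted_cross_entropy  # override the default loss
```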
* Fixup
* Clean
* Fix xglm
* Quality tests
* Skip OCR2
* Make specific loss for xglm
* Make order the same/line up 1:1
* xglm
* Skip fx output loss for the bloom model
* Didn't pass in pad_token_id
* Fix quality
* correctly slice
* check mask
* Update modular_gemma2.py
* fix
* add tests
* fix typo
* finally fix mask slicing
* Finally correctly slice in all cases!!
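To make the slicing fix concrete, here is a generic illustration of the invariant: the sliced mask's rows follow the current query positions and its columns cover exactly the keys in the cache. Names are generic stand-ins, not the actual gemma2 code:

```python
# Illustrative only: slice a precomputed (max_len x max_len) causal mask down
# to the current step; names are generic, not the actual gemma2 code.
import torch

max_len = 8
full_mask = torch.ones(max_len, max_len).tril().bool()  # True = attend

def slice_mask(full_mask, cache_position, kv_length):
    # Rows follow the current query positions, so the same code covers
    # prefill, one-token decoding, and multi-token chunks; columns cover
    # every key slot currently in the cache.
    return full_mask[cache_position, :kv_length]

# Decoding one token at position 5 against a cache holding 6 entries:
mask = slice_mask(full_mask, torch.tensor([5]), kv_length=6)
assert mask.shape == (1, 6) and mask.all()
```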
* add test for all attention functions
* small fix in tests
* work around dynamo tracing issue
* last update
* more robust
* kwargs propagation
* make it explicit for checkpointing
* apply modular
* Restore is_torch_greater_or_equal_than for backward compatibility
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
* review comments
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
---------
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
* model can be converted to HF format and loaded back
* nit
* works in single-batch generation but hallucinates
* use the image tokens
* add image generation
* now it works
* add tests
* update
* add modular but it doesn't work for porting docstrings :(
* skip some tests
* add slow tests
* modular removed the import?
* guess this works
* update
* update
* fix copies
* fix test
* fix copies
* update
* docs
* fix tests
* last fix tests?
* pls
* repo consistency
* more style
* style
* remove file
* address comments
* tiny bits
* update after the new modular
* fix tests
* add one more cond in check attributes
* decompose down/up/mid blocks
* allow static cache generation in VLMs
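A sketch of what static-cache generation with a VLM looks like from the user side; the checkpoint and processor names are placeholders, but `cache_implementation="static"` is the existing `generate()` flag this change enables for vision-language models:

```python
# Placeholder checkpoint name; any supported VLM would do.
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("some/vlm-checkpoint")
model = AutoModelForVision2Seq.from_pretrained("some/vlm-checkpoint")

image = Image.open("example.jpg")
inputs = processor(images=image, text="Describe the image.", return_tensors="pt")

# cache_implementation="static" preallocates the KV cache at a fixed size,
# which is what makes torch.compile-friendly decoding possible.
out = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```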
* nit
* fix copies
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix VAE upsampling
* Update src/transformers/models/emu3/modular_emu3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments
* state overwritten attributes explicitly
* fix copies
* add the flag for flex attn
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Introduce 5 integration tests for the 4 model classes + torch export
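As a rough picture of the torch export test, a sketch along these lines; `strict=False` and the export call working end-to-end for this model are assumptions, not taken from the actual tests:

```python
# Sketch of exporting ModernBERT with torch.export; strict=False is an
# assumption about what HF models need, not copied from the real tests.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "answerdotai/ModernBERT-base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

inputs = tok("Paris is the [MASK] of France.", return_tensors="pt")
exported = torch.export.export(model, args=(), kwargs=dict(inputs), strict=False)
logits = exported.module()(**inputs).logits  # run the exported program
```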
* ModernBert: reuse GemmaRotaryEmbedding via modular
* Revert #35589, keep rope_kwargs; rely on them in modular_modernbert
* Revert "Revert #35589, keep rope_kwargs; rely on them in modular_modernbert"
This reverts commit 11b44b9ee83e199cbfb7c5ba2d11f7a7fdbba2d3.
* Don't set rope_kwargs; override 'self.rope_init_fn' call instead
* bug fixes
* organize imports
* wrap cpu warning in reference_compile
* Avoid needing repad_logits_with_grad, always repad with grads when training
I'm not 100% sure that the conditional with "or labels is None" makes sense, though; I'm not sure what the intention is there. Perhaps we can remove it?
* Revert "Avoid needing repad_logits_with_grad, always repad with grads when training"
This reverts commit cedcb4e89bcea199a1135a0933e71f534b656239.
* Fix grammar: keep -> keeps
* Propagate grammar fix with modular_model_converter
---------
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
* universal checkpoint
* Update docs/source/en/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer
in PreTrainedTokenizerFast, rather than relying on subclasses to take care of this.
* Simplify setting self.add_prefix_space, ensure pre_tok exists
* Wrap in try-except to catch 'Custom PreTokenizer cannot be serialized'
862d1a346a/bindings/python/src/pre_tokenizers.rs (L672) produces the exception. It's triggered by the roformer tests, as RoFormerTokenizerFast uses a custom PreTokenizer.
* Propagate add_prefix_space in T5TokenizerFast to superclass
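The mechanism, sketched (modeled on the pattern GPT-2's fast tokenizer already used; `tok` stands for any PreTrainedTokenizerFast instance):

```python
# Sketch of propagating add_prefix_space to the backend pre-tokenizer; the
# real change lives in PreTrainedTokenizerFast.__init__.
import json
from tokenizers import pre_tokenizers

def set_add_prefix_space(tok, add_prefix_space: bool):
    try:
        # Serialization is the only way to reach the Rust pre-tokenizer's
        # state; custom Python pre-tokenizers (e.g. RoFormer's) raise
        # "Custom PreTokenizer cannot be serialized" here, hence try/except.
        state = json.loads(tok.backend_tokenizer.pre_tokenizer.__getstate__())
    except Exception:
        return
    if "add_prefix_space" in state and state["add_prefix_space"] != add_prefix_space:
        pre_tok_class = getattr(pre_tokenizers, state.pop("type"))
        state["add_prefix_space"] = add_prefix_space
        tok.backend_tokenizer.pre_tokenizer = pre_tok_class(**state)
```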
* look-ahead negation
* re-add examples by default
* Fix the bug in topological sort
* Update create_dependency_mapping.py
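For reference, the ordering property being fixed, as a generic Kahn-style sort over a dependency mapping (illustrative only; not the actual create_dependency_mapping.py implementation):

```python
# Generic Kahn-style topological sort over {module: modules it depends on};
# illustrative, not the actual create_dependency_mapping.py code.
from collections import deque

def topological_sort(deps):
    indegree = {n: sum(p in deps for p in parents) for n, parents in deps.items()}
    dependents = {n: [] for n in deps}
    for n, parents in deps.items():
        for p in parents:
            if p in dependents:
                dependents[p].append(n)
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for child in dependents[n]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(deps):
        raise ValueError("cycle detected in dependency mapping")
    return order  # every module appears after all of its dependencies

# topological_sort({"a": [], "b": ["a"], "c": ["a", "b"]}) -> ["a", "b", "c"]
```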
* start adding test
* finalize test
* more tests
* style
* style
* update modular_modernbert -- add inputs_embeds param to ModernBertModel
* Fix implementation issues; extend to other classes; docstring
First of all, inputs_embeds shouldn't fully replace `self.embeddings(input_ids)`, because that call also applies layer normalization and dropout. So now both input_ids and inputs_embeds are passed to ModernBertEmbeddings, much like how BertEmbeddings is implemented.
I also added `inputs_embeds` to the docstring and propagated the changes to the other model classes.
I also introduced an error that is raised if input_ids and inputs_embeds are both provided, or neither is.
Lastly, I fixed an issue where the device was based solely on input_ids when used together with attention_mask.
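In condensed form, the behavior described above looks roughly like this (BertEmbeddings-style; not the literal ModernBert code):

```python
# Condensed sketch of the inputs_embeds handling described above;
# not the literal ModernBert code.
import torch.nn as nn

class Embeddings(nn.Module):
    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, hidden)
        self.norm = nn.LayerNorm(hidden)
        self.drop = nn.Dropout(0.1)

    def forward(self, input_ids=None, inputs_embeds=None):
        # inputs_embeds replaces only the lookup, never the norm + dropout.
        if inputs_embeds is None:
            inputs_embeds = self.tok_embeddings(input_ids)
        return self.drop(self.norm(inputs_embeds))

def validate_and_get_device(input_ids=None, inputs_embeds=None):
    # Exactly one of the two must be provided...
    if (input_ids is None) == (inputs_embeds is None):
        raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
    # ...and the device must follow whichever one actually was.
    return input_ids.device if input_ids is not None else inputs_embeds.device
```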
* Propagate inputs_embeds to ModernBertForMaskedLM correctly
Also reintroduce inputs_embeds test
---------
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
* set up loss_type in config at model init time
ensures no additional graph break is introduced when the model is torch.compile'd
fixes #34615
Signed-off-by: ChanderG <mail@chandergovind.org>
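Schematically, the change moves the lookup out of the hot path (stand-in names, not the actual transformers internals):

```python
# Schematic: bind the loss function once at __init__ from config.loss_type so
# forward() holds no mapping lookup for torch.compile to graph-break on.
# LOSS_MAPPING and TinyModel are stand-ins, not transformers internals.
import torch.nn.functional as F

def causal_lm_loss(logits, labels):
    return F.cross_entropy(
        logits[..., :-1, :].flatten(0, 1), labels[..., 1:].flatten(), ignore_index=-100
    )

LOSS_MAPPING = {"ForCausalLM": causal_lm_loss}

class TinyModel:
    def __init__(self, config):
        # Before: this lookup happened inside forward(), breaking the graph.
        loss_type = getattr(config, "loss_type", None) or "ForCausalLM"
        self.loss_function = LOSS_MAPPING[loss_type]

    def forward(self, logits, labels=None):
        loss = self.loss_function(logits, labels) if labels is not None else None
        return loss, logits
```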
* look up the loss mapping at init time instead of manual setup
Signed-off-by: ChanderG <mail@chandergovind.org>
* remove redundant lookup at loss_function time
* override loss_type at init time
---------
Signed-off-by: ChanderG <mail@chandergovind.org>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* Add the Arabic translation: multiple_choice.md
* Update multiple_choice.md
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
* Add files via upload
* Update _toctree.yml
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* update codecarbon
* replace directly-specified-test-dirs with tmp_dir
* pass tmp_dir to all get_regression_trainer
* test_trainer.py: Use tmp_dir consistently for all output_dir arguments
* fix some with...as tmp_dir blocks
* reflect the comments to improve test_trainer.py
* refresh .gitignore