transformers/tests/generation
Matt 0d84901cb7
Terminator strings for generate() (#28932)
* stash commit (will discard all of this)

* stash commit

* First commit - needs a lot of testing!

* Add a test

* Fix imports and make the tests actually test something

* Tests pass!

* Rearrange test

* Add comments (but it's still a bit confusing)

* Stop storing the tokenizer

* Comment fixup

* Fix for input_ids with a single sequence

* Update tests to test single sequences

* make fixup

* Fix incorrect use of isin()

* Expand tests to catch more cases

* Expand tests to catch more cases

* make fixup

* Fix length calculation and update tests

* Handle Ġ as a space replacement too

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add optimizations from Joao's suggestion

* Remove TODO

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* make fixup

* Rename some variables and remove some debugging clauses for clarity

* Add tests for the sub-methods

* Clarify one test slightly

* Add stop_strings to GenerationConfig

* generate() supports stop_string arg, asks for tokenizer if not provided

* make fixup

* Cleanup code and rename variables for clarity

* Update tokenizer error

* Update tokenizer passing, handle generation on GPU

* Slightly more explanation cleanup

* More comment cleanup

* Factor out the token cleanup so it's more obvious what we're doing, and we can change it later

* Careful with that cleanup!

* Cleanup + optimizations to _get_matching_positions

* More minor performance tweaks

* Implement caching and eliminate some expensive ops (startup time: 200ms -> 9ms)

* Remove the pin_memory call

* Parallelize across all stop strings!

* Quick fix for tensor devices

* Update embeddings test for the new format

* Fix test imports

* Manual patching for BERT-like tokenizers

* Return a bool vector instead of a single True/False

* Better comment

* Better comment

* Add tests from @zucchini-nlp

* Amy's list creation nit

* tok_list -> token_list

* Push a big expanded docstring (should we put it somewhere else?)

* Expand docstrings

* Docstring fixups

* Rebase

* make fixup

* Make a properly general method for figuring out token strings

* Fix naming throughout the functions

* Move cache, refactor, fix tests

* Add comment

* Remove finished TODO

* Remove finished TODO

* make fixup

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update and shorten docstring

* Update tests to be shorter/clearer and test specific cases

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-22 14:13:04 +01:00
__init__.py | [Test refactor 1/5] Per-folder tests reorganization (#15725) | 2022-02-23 15:46:28 -05:00
test_beam_constraints.py | Generate: move generation_*.py src files into generation/*.py (#20096) | 2022-11-09 15:34:08 +00:00
test_beam_search.py | Time to Say Goodbye, torch 1.7 and 1.8 (#22291) | 2023-03-21 19:22:01 +01:00
test_configuration_utils.py | Generate: get generation mode from the generation config instance 🧼 (#29441) | 2024-03-06 11:18:35 +00:00
test_flax_logits_process.py | Adding FlaxNoRepeatNGramLogitsProcessor (#29677) | 2024-04-02 11:39:33 +02:00
test_flax_utils.py | Add support for beam search's num_return_sequencs flag in flax (#23082) | 2023-05-03 10:50:34 -04:00
test_framework_agnostic.py | Revert workaround for TF safetensors loading (#30128) | 2024-04-09 11:04:18 +01:00
test_logits_process.py | Change in-place operations to out-of-place in LogitsProcessors (#29680) | 2024-03-21 16:37:33 +00:00
test_stopping_criteria.py | Terminator strings for generate() (#28932) | 2024-04-22 14:13:04 +01:00
test_streamers.py | Update all references to canonical models (#29001) | 2024-02-16 08:16:58 +01:00
test_tf_logits_process.py | Better TF docstring types (#23477) | 2023-05-24 13:52:52 +01:00
test_tf_utils.py | Revert workaround for TF safetensors loading (#30128) | 2024-04-09 11:04:18 +01:00
test_utils.py | Terminator strings for generate() (#28932) | 2024-04-22 14:13:04 +01:00
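
The commit message for #28932 describes stopping generation on terminator strings and returning a bool vector (one flag per sequence) instead of a single True/False. The following is only a conceptual sketch of that idea on already-decoded text. The helper name `ends_with_stop_string` and the sample batch are hypothetical, and this is not the code in stopping_criteria.py, which per the commit notes works with token strings and caching rather than decoded text.

```python
from typing import List, Sequence


def ends_with_stop_string(texts: Sequence[str], stop_strings: Sequence[str]) -> List[bool]:
    """Return one flag per sequence: True if that text ends with any stop string.

    Hypothetical helper for illustration only; it checks decoded text,
    not token IDs.
    """
    # str.endswith accepts a tuple of candidate suffixes.
    stops = tuple(stop_strings)
    return [text.endswith(stops) for text in texts]


# A made-up batch of three partially generated sequences.
batch = ["The answer is 42.\n\n", "The answer is", "stop here END"]
done = ends_with_stop_string(batch, ["\n\n", "END"])
print(done)  # [True, False, True]
```

A per-sequence flag lets a batched loop mark individual sequences as finished while the rest keep generating, which a single True/False for the whole batch cannot express.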