transformers/tests/generation
Matt 4563ba2c6f
Fix StopStringCriteria to handle tokens above len(tokenizer) (#35797)
* Fix StopStringCriteria to handle tokens above len(tokenizer)

This fixes #35244 by clipping token IDs to be within the tokenizer's vocabulary size before performing the embedding lookup. This prevents index errors when model.config.vocab_size > len(tokenizer).

The fix:
1. Adds a clamp operation to ensure token IDs are within bounds
2. Adds a test case to verify the behavior
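
A minimal sketch of the clamping idea described above, in plain Python rather than the library's tensor code. The names `clamp_token_ids` and `embedding_table` are hypothetical and do not come from the commit. Mapping out-of-range IDs to the last row is an assumption made for illustration; the actual change may treat them differently.

```python
def clamp_token_ids(token_ids, table_size):
    """Clamp each ID into [0, table_size - 1] so it is a valid table index."""
    return [min(max(tid, 0), table_size - 1) for tid in token_ids]


# Hypothetical lookup table with one row per tokenizer entry (4 entries here).
embedding_table = [[0.0], [1.0], [2.0], [3.0]]

# IDs 7 and 100 exceed the table, as can happen when
# model.config.vocab_size > len(tokenizer).
token_ids = [1, 3, 7, 100]

# Without clamping, embedding_table[7] would raise IndexError.
safe_ids = clamp_token_ids(token_ids, len(embedding_table))
rows = [embedding_table[i] for i in safe_ids]
```

In this sketch every out-of-range ID lands on the final row, so the lookup never indexes past the table.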

* Use self.stop_strings instead of stop_strings

* Handle clipping correctly

* make fixup

* Update test to the new embedding vecs

* Use much bigger values in the mismatch test

* Typo fix

* Slight simplification

---------

Co-authored-by: openhands <openhands@all-hands.dev>
2025-02-06 16:53:28 +00:00
..
__init__.py
test_beam_constraints.py
test_beam_search.py
test_candidate_generator.py Refactoring AssistedCandidateGenerator for Improved Modularity and Reusability (#35009) 2024-12-12 15:47:05 +01:00
test_configuration_utils.py [generate] can instantiate GenerationConfig(cache_implementation="static") (#35679) 2025-01-16 17:04:54 +00:00
test_flax_logits_process.py
test_flax_utils.py Fix CI (#34458) 2024-10-29 08:26:04 +01:00
test_framework_agnostic.py Generation: fix handling of special tokens (#31254) 2024-06-06 15:21:32 +05:00
test_fsdp.py Default synced_gpus to True when using FullyShardedDataParallel (#33483) 2024-10-10 14:09:04 -04:00
test_logits_process.py Use torch.testing.assert_close instead to get more details about errors in CI (#35659) 2025-01-24 16:55:28 +01:00
test_stopping_criteria.py Fix StopStringCriteria to handle tokens above len(tokenizer) (#35797) 2025-02-06 16:53:28 +00:00
test_streamers.py Implement AsyncTextIteratorStreamer for asynchronous streaming (#34931) 2024-12-20 12:08:12 +01:00
test_tf_logits_process.py fix: multilingual model converted to tflite gets wrong token (#32079) 2024-08-27 11:44:09 +02:00
test_tf_utils.py Revert workaround for TF safetensors loading (#30128) 2024-04-09 11:04:18 +01:00
test_utils.py Iterative generation using Input embeds and past_key_values (#35890) 2025-02-06 11:06:05 +01:00