Mirror of https://github.com/saymrwulf/transformers.git
Synced 2026-05-14 20:58:08 +00:00
* stash commit (will discard all of this)
* stash commit
* First commit - needs a lot of testing!
* Add a test
* Fix imports and make the tests actually test something
* Tests pass!
* Rearrange test
* Add comments (but it's still a bit confusing)
* Stop storing the tokenizer
* Comment fixup
* Fix for input_ids with a single sequence
* Update tests to test single sequences
* make fixup
* Fix incorrect use of isin()
* Expand tests to catch more cases
* Expand tests to catch more cases
* make fixup
* Fix length calculation and update tests
* Handle Ġ as a space replacement too
* Update src/transformers/generation/stopping_criteria.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Add optimizations from Joao's suggestion
* Remove TODO
* Update src/transformers/generation/stopping_criteria.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_stopping_criteria.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* make fixup
* Rename some variables and remove some debugging clauses for clarity
* Add tests for the sub-methods
* Clarify one test slightly
* Add stop_strings to GenerationConfig
* generate() supports stop_string arg, asks for tokenizer if not provided
* make fixup
* Cleanup code and rename variables for clarity
* Update tokenizer error
* Update tokenizer passing, handle generation on GPU
* Slightly more explanation cleanup
* More comment cleanup
* Factor out the token cleanup so it's more obvious what we're doing, and we can change it later
* Careful with that cleanup!
* Cleanup + optimizations to _get_matching_positions
* More minor performance tweaks
* Implement caching and eliminate some expensive ops (startup time: 200ms -> 9ms)
* Remove the pin_memory call
* Parallelize across all stop strings!
* Quick fix for tensor devices
* Update embeddings test for the new format
* Fix test imports
* Manual patching for BERT-like tokenizers
* Return a bool vector instead of a single True/False
* Better comment
* Better comment
* Add tests from @zucchini-nlp
* Amy's list creation nit
* tok_list -> token_list
* Push a big expanded docstring (should we put it somewhere else?)
* Expand docstrings
* Docstring fixups
* Rebase
* make fixup
* Make a properly general method for figuring out token strings
* Fix naming throughout the functions
* Move cache, refactor, fix tests
* Add comment
* Remove finished TODO
* Remove finished TODO
* make fixup
* Update src/transformers/generation/stopping_criteria.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update and shorten docstring
* Update tests to be shorter/clearer and test specific cases

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

||
|---|---|---|
| benchmark | ||
| bettertransformer | ||
| deepspeed | ||
| extended | ||
| fixtures | ||
| fsdp | ||
| generation | ||
| models | ||
| optimization | ||
| peft_integration | ||
| pipelines | ||
| quantization | ||
| repo_utils | ||
| sagemaker | ||
| tokenization | ||
| tools | ||
| trainer | ||
| utils | ||
| __init__.py | ||
| test_backbone_common.py | ||
| test_cache_utils.py | ||
| test_configuration_common.py | ||
| test_configuration_utils.py | ||
| test_feature_extraction_common.py | ||
| test_feature_extraction_utils.py | ||
| test_image_processing_common.py | ||
| test_image_processing_utils.py | ||
| test_image_transforms.py | ||
| test_modeling_common.py | ||
| test_modeling_flax_common.py | ||
| test_modeling_flax_utils.py | ||
| test_modeling_tf_common.py | ||
| test_modeling_tf_utils.py | ||
| test_modeling_utils.py | ||
| test_pipeline_mixin.py | ||
| test_processing_common.py | ||
| test_sequence_feature_extraction_common.py | ||
| test_tokenization_common.py | ||
| test_tokenization_utils.py | ||