Mirror of https://github.com/saymrwulf/transformers.git (synced 2026-05-15 21:01:19 +00:00)
fix: Apostrophe splitting in the BasicTokenizer for CLIPTokenizer

* account for apostrophe at start of new word
* remove _run_split_on_punc, use re.findall instead
* remove debugging, make style and quality
* use pattern and punc splitting, repo-consistency will fail
* remove commented out debugging
* adds bool args to BasicTokenizer, remove pattern
* do_split_on_punc default True
* clean stray comments and line breaks
* rebase, repo-consistency
* update to just do punctuation split
* add unicode normalizing back
* remove redundant line

| File |
|---|
| __init__.py |
| test_modeling_bert.py |
| test_modeling_flax_bert.py |
| test_modeling_tf_bert.py |
| test_tokenization_bert.py |
| test_tokenization_bert_tf.py |