Processor: prevent duplicated tokens

When using text-only LLMs, the chat template is expected to take care of
adding the required special tokens, such as bos. Hence, tokenization
must not include special tokens.

The same contract should be honored for multimodal processors.
This commit is contained in:
Pedro Cuenca 2025-02-06 10:41:05 +01:00
parent b5f327f350
commit c4cbed8081

View file

@ -1246,6 +1246,7 @@ class ProcessorMixin(PushToHubMixin):
text=prompt,
images=images if images else None,
videos=videos if videos else None,
add_special_tokens=False,
**kwargs,
)
if return_dict: