Switch from PreTrainedTokenizerFast to GPT2TokenizerFast and add eos_token & bos_token
#15 opened by loubnabnl (HF Staff)
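PreTrainedTokenizerFast returns token_type_ids by default, but santacoder was not trained on them, so passing the tokenizer output straight to the model, as in model(tokenizer(text)), can result in unexpected behavior in some cases. This PR switches to GPT2TokenizerFast instead and adds the eos_token and bos_token.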
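For code that still loads an older revision with PreTrainedTokenizerFast, one possible workaround is to drop token_type_ids before calling the model. The sketch below is only an illustration and not part of this PR: the helper name strip_token_type_ids is hypothetical, and the dictionary is a hand-written stand-in for tokenizer output with made-up ids, so it runs without downloading anything.

```python
from typing import Any, Dict


def strip_token_type_ids(encoding: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `encoding` without the `token_type_ids` entry.

    Hypothetical helper for illustration; the input is left unmodified.
    """
    return {key: value for key, value in encoding.items() if key != "token_type_ids"}


# Hand-written stand-in for what a tokenizer that emits token_type_ids
# might return. The ids are made up (assumption, not real santacoder ids).
encoding = {
    "input_ids": [[10, 20, 30]],
    "token_type_ids": [[0, 0, 0]],
    "attention_mask": [[1, 1, 1]],
}

# Keep only the entries the model was trained with.
model_inputs = strip_token_type_ids(encoding)
print(sorted(model_inputs))
```

With a real tokenizer, passing return_token_type_ids=False in the tokenizer call should have a similar effect. After this PR, neither step should be needed, since GPT2TokenizerFast does not return token_type_ids by default.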
loubnabnl changed pull request status to merged