update with #s of sentences/tokens (#6546)

Jim Regan 2020-08-17 21:48:05 +01:00 committed by GitHub
parent 63144701ed
commit fb7330b30e

@@ -15,6 +15,8 @@ tags:
* Newscrawl 300k portion of the [Leipzig Corpora](https://wortschatz.uni-leipzig.de/en/download/irish)
* Private news corpus crawled with [Corpus Crawler](https://github.com/google/corpuscrawler)
(2125804 sentences, 47419062 tokens, as reckoned by wc)
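
The counts above are described only as "reckoned by wc". As a rough illustration of what that kind of count measures, here is a minimal sketch that mimics `wc -l` and `wc -w` in Python. It assumes the corpus is stored one sentence per line and that tokens are whitespace-separated, neither of which the source states; `wc_counts` and the sample text are hypothetical names and data used only for this example.

```python
import io

def wc_counts(stream):
    """Return (lines, words), roughly as `wc -l` and `wc -w` would report them.

    Assumption: one sentence per line, tokens separated by whitespace.
    """
    lines = 0
    words = 0
    for line in stream:
        # wc -l counts newline characters, so a final line with no
        # trailing newline is not counted as a line.
        if line.endswith("\n"):
            lines += 1
        # wc -w counts whitespace-delimited runs of characters.
        words += len(line.split())
    return lines, words

# Hypothetical two-sentence sample, not taken from the actual corpus.
sample = io.StringIO("Tá sé go maith.\nDia duit, a chara.\n")
print(wc_counts(sample))
```

Counts obtained this way treat punctuation attached to a word as part of the token, so they will generally differ from the number of subword tokens the model's tokenizer produces.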
```
from transformers import pipeline
fill_mask = pipeline("fill-mask", model="jimregan/BERTreach", tokenizer="jimregan/BERTreach")