---
language:
- en
- ar
datasets:
- gigaword
- oscar
- wikipedia
---
## GigaBERT-v3
GigaBERT-v3 is a customized bilingual BERT for English and Arabic. It was pre-trained on a large-scale corpus (Gigaword + OSCAR + Wikipedia) of roughly 10B tokens and shows state-of-the-art zero-shot transfer performance from English to Arabic on information extraction (IE) tasks. More details can be found in the following paper:
```bibtex
@inproceedings{lan2020gigabert,
  author    = {Lan, Wuwei and Chen, Yang and Xu, Wei and Ritter, Alan},
  title     = {GigaBERT: Zero-shot Transfer Learning from English to Arabic},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2020}
}
```
## Usage
```python
from transformers import BertTokenizer, BertForTokenClassification

# Load the GigaBERT-v3 tokenizer and model from the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained("lanwuwei/GigaBERT-v3-Arabic-and-English", do_lower_case=True)
model = BertForTokenClassification.from_pretrained("lanwuwei/GigaBERT-v3-Arabic-and-English")
```
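For illustration, here is a minimal sketch of a forward pass once the model is loaded. The Arabic example sentence and the `num_labels` value are hypothetical placeholders (not from the original card), and the token-classification head is randomly initialized until you fine-tune it:

```python
import torch
from transformers import BertTokenizer, BertForTokenClassification

model_name = "lanwuwei/GigaBERT-v3-Arabic-and-English"
tokenizer = BertTokenizer.from_pretrained(model_name, do_lower_case=True)
# num_labels=9 is a hypothetical label-set size (e.g., a CoNLL-style NER scheme);
# the classification head is randomly initialized until fine-tuned.
model = BertForTokenClassification.from_pretrained(model_name, num_labels=9)

# English and Arabic input work the same way.
inputs = tokenizer("باراك أوباما ولد في هاواي.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, num_labels)

# Pick the highest-scoring label id for each token.
predicted_label_ids = logits.argmax(dim=-1)
print(predicted_label_ids)
```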
More code examples can be found [here](https://github.com/lanwuwei/GigaBERT).