Honor contributors to models (#11329)
* Honor contributors to models
* Fix typo
* Address review comments
* Add more authors
This commit is contained in:
parent aad95c7cde
commit 74712e22f3
57 changed files with 121 additions and 55 deletions
@@ -43,7 +43,8 @@ Tips:
similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
number of (repeating) layers.

-The original code can be found `here <https://github.com/google-research/ALBERT>`__.
+This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
+<https://github.com/google-research/ALBERT>`__.

AlbertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -35,7 +35,8 @@ According to the abstract,
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.

-The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
+This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The Authors' code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.


Examples
@@ -16,7 +16,7 @@ BARThez
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model`
+The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
<https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis on 23 Oct,
2020.
@@ -35,7 +35,8 @@ summarization dataset, OrangeSum, that we release with this paper. We also conti
pretrained multilingual BART on BARThez's corpus, and we show that the resulting model, which we call mBARTHez,
provides a significant boost over vanilla BARThez, and is on par with or outperforms CamemBERT and FlauBERT.*

-The Authors' code can be found `here <https://github.com/moussaKam/BARThez>`__.
+This model was contributed by `moussakam <https://huggingface.co/moussakam>`__. The Authors' code can be found `here
+<https://github.com/moussaKam/BARThez>`__.


Examples
@@ -42,7 +42,8 @@ Tips:
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.

-The original code can be found `here <https://github.com/google-research/bert>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/google-research/bert>`__.

BertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -71,6 +71,8 @@ Tips:
- This implementation is the same as BERT, except for tokenization method. Refer to the :doc:`documentation of BERT
<bert>` for more usage examples.

+This model was contributed by `cl-tohoku <https://huggingface.co/cl-tohoku>`__.
+
BertJapaneseTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -79,7 +79,8 @@ Tips:
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
Therefore, no EOS token should be added to the end of the input.

-The original code can be found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
+found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.

BertGenerationConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -54,8 +54,8 @@ Example of use:
>>> # from transformers import TFAutoModel
>>> # bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")


-The original code can be found `here <https://github.com/VinAIResearch/BERTweet>`__.
+This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here
+<https://github.com/VinAIResearch/BERTweet>`__.

BertweetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -50,7 +50,8 @@ Tips:
- Current implementation supports only **ITC**.
- Current implementation doesn't support **num_random_blocks = 0**

-The original code can be found `here <https://github.com/google-research/bigbird>`__.
+This model was contributed by `vasudevgupta <https://huggingface.co/vasudevgupta>`__. The original code can be found
+`here <https://github.com/google-research/bigbird>`__.

BigBirdConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -36,7 +36,8 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*

-The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
+This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The authors' code can be found `here
+<https://github.com/facebookresearch/ParlAI>`__ .


Implementation Notes
@@ -39,7 +39,8 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*

-The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The authors' code can be
+found `here <https://github.com/facebookresearch/ParlAI>`__ .

BlenderbotSmallConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,4 +43,5 @@ Tips:
that is sadly not open-sourced yet. It would be very useful for the community, if someone tries to implement the
algorithm to make BORT fine-tuning work.

-The original code can be found `here <https://github.com/alexa/bort/>`__.
+This model was contributed by `stefan-it <https://huggingface.co/stefan-it>`__. The original code can be found `here
+<https://github.com/alexa/bort/>`__.
@@ -37,7 +37,8 @@ Tips:
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
as well as the information relative to the inputs and outputs.

-The original code can be found `here <https://camembert-model.fr/>`__.
+This model was contributed by `camembert <https://huggingface.co/camembert>`__. The original code can be found `here
+<https://camembert-model.fr/>`__.

CamembertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -34,8 +34,10 @@ ConvBERT significantly outperforms BERT and its variants in various downstream t
fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while
using less than 1/4 training cost. Code and pre-trained models will be released.*

-ConvBERT training tips are similar to those of BERT. The original implementation can be found here:
-https://github.com/yitu-opensource/ConvBert
+ConvBERT training tips are similar to those of BERT.
+
+This model was contributed by `abhishek <https://huggingface.co/abhishek>`__. The original implementation can be found
+here: https://github.com/yitu-opensource/ConvBert

ConvBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -33,7 +33,8 @@ language model, which could facilitate several downstream Chinese NLP tasks, suc
cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many
NLP tasks in the settings of few-shot (even zero-shot) learning.*

-The original implementation can be found here: https://github.com/TsinghuaAI/CPM-Generate
+This model was contributed by `canwenxu <https://huggingface.co/canwenxu>`__. The original implementation can be found
+here: https://github.com/TsinghuaAI/CPM-Generate

Note: We only have a tokenizer here, since the model architecture is the same as GPT-2.
@@ -46,7 +46,8 @@ Tips:
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
this argument.

-The original code can be found `here <https://github.com/salesforce/ctrl>`__.
+This model was contributed by `keskarnitishr <https://huggingface.co/keskarnitishr>`__. The original code can be found
+`here <https://github.com/salesforce/ctrl>`__.


CTRLConfig
@@ -38,7 +38,8 @@ the training data performs consistently better on a wide range of NLP tasks, ach
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*


-The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
+This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
+<https://github.com/microsoft/DeBERTa>`__.


DebertaConfig
@@ -58,7 +58,8 @@ New in v2:
- **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improves the
performance of downstream tasks.

-The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
+This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
+<https://github.com/microsoft/DeBERTa>`__.


DebertaV2Config
@@ -73,6 +73,8 @@ Tips:
`facebook/deit-base-patch16-384`. Note that one should use :class:`~transformers.DeiTFeatureExtractor` in order to
prepare images for the model.

+This model was contributed by `nielsr <https://huggingface.co/nielsr>`__.
+

DeiTConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -44,7 +44,7 @@ Tips:
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
necessary though, just let us know if you need this option.

-The original code can be found `here
+This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
@@ -30,7 +30,8 @@ our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% ab
retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.*

-The original code can be found `here <https://github.com/facebookresearch/DPR>`__.
+This model was contributed by `lhoestq <https://huggingface.co/lhoestq>`__. The original code can be found `here
+<https://github.com/facebookresearch/DPR>`__.


DPRConfig
@@ -54,7 +54,8 @@ Tips:
:class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
doesn't exist in the generator).

-The original code can be found `here <https://github.com/google-research/electra>`__.
+This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
+<https://github.com/google-research/electra>`__.


ElectraConfig
@@ -35,7 +35,8 @@ time they outperform other pretraining approaches. Different versions of FlauBER
protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research
community for further reproducible experiments in French NLP.*

-The original code can be found `here <https://github.com/getalp/Flaubert>`__.
+This model was contributed by `formiel <https://huggingface.co/formiel>`__. The original code can be found `here
+<https://github.com/getalp/Flaubert>`__.


FlaubertConfig
@@ -34,7 +34,8 @@ data, then decode using noisy channel model reranking. Our submissions are ranke
human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
This system improves upon our WMT'18 submission by 4.5 BLEU points.*

-The original code can be found here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
+This model was contributed by `stas <https://huggingface.co/stas>`__. The original code can be found here
+<https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.

Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -49,7 +49,8 @@ Tips:
:class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
:class:`~transformers.FunnelForMultipleChoice`.

-The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`__.
+This model was contributed by `sgugger <https://huggingface.co/sgugger>`__. The original code can be found `here
+<https://github.com/laiguokun/Funnel-Transformer>`__.


FunnelConfig
@@ -45,7 +45,8 @@ Tips:
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by Hugging Face
showcasing the generative capabilities of several models. GPT is one of them.

-The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/openai/finetune-transformer-lm>`__.

Note:
@@ -45,7 +45,8 @@ Tips:
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.

-The original code can be found `here <https://openai.com/blog/better-language-models/>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://openai.com/blog/better-language-models/>`__.


GPT2Config
@@ -23,6 +23,8 @@ Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2 like c
The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of
256 tokens.

+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
+
Generation
_______________________________________________________________________________________________________________________
@@ -56,7 +56,9 @@ Examples of use:
>>> model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")


-The original code can be found `here <https://github.com/allegro/HerBERT>`__.
+This model was contributed by `rmroczkowski <https://huggingface.co/rmroczkowski>`__. The original code can be found
+`here <https://github.com/allegro/HerBERT>`__.


HerbertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -36,8 +36,9 @@ the full-precision baseline. Furthermore, our preliminary implementation of I-BE
INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
been open-sourced.*

+This model was contributed by `kssteven <https://huggingface.co/kssteven>`__. The original code can be found `here
+<https://github.com/kssteven418/I-BERT>`__.

-The original code can be found `here <https://github.com/kssteven418/I-BERT>`__.

IBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -80,7 +80,8 @@ occurs. Those can be obtained using the Python Image Library (PIL) library for e
<https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb>`__.
It includes an inference part, which shows how to use Google's Tesseract on a new document.

-The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
+This model was contributed by `liminghao1630 <https://huggingface.co/liminghao1630>`__. The original code can be found
+`here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.


LayoutLMConfig
@@ -53,6 +53,8 @@ Tips:
- A notebook showing how to fine-tune LED, can be accessed `here
<https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing>`__.

+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
+

LEDConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -40,7 +40,8 @@ Tips:
token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or
:obj:`</s>`).

-The Authors' code can be found `here <https://github.com/allenai/longformer>`__.
+This model was contributed by `beltagy <https://huggingface.co/beltagy>`__. The Authors' code can be found `here
+<https://github.com/allenai/longformer>`__.

Longformer Self Attention
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -52,7 +52,8 @@ Tips:
contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
both self attention outputs are disregarded.

-The original code can be found `here <https://github.com/airsplay/lxmert>`__.
+This model was contributed by `eltoto1219 <https://huggingface.co/eltoto1219>`__. The original code can be found `here
+<https://github.com/airsplay/lxmert>`__.


LxmertConfig
@@ -34,6 +34,8 @@ to create high quality models. Our focus on non-English-Centric models brings ga
translating between non-English directions while performing competitively to the best single systems of WMT. We
open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.*

+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
+

Training and Generation
_______________________________________________________________________________________________________________________
@@ -37,6 +37,7 @@ Implementation Notes
- the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
:obj:`<s/>`),
- Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``.
+- This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__.

Naming
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -29,7 +29,8 @@ corpora in many languages using the BART objective. mBART is one of the first me
sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
on the encoder, decoder, or reconstructing parts of the text.

-The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The Authors' code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__

Training of MBart
_______________________________________________________________________________________________________________________
@@ -77,9 +77,10 @@ The following commands allow you to do the conversion. We assume that the folder

python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip

-The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
-and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
-approach using "tensor parallel" and "pipeline parallel" techniques.
+This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
+<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
+Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
+"pipeline parallel" techniques.

MegatronBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -64,7 +64,8 @@ The following command allows you to do the conversion. We assume that the folder

python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py megatron_gpt2_345m_v0_0.zip

-The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
-and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
-approach using "tensor parallel" and "pipeline parallel" techniques.
+This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
+<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
+Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
+"pipeline parallel" techniques.
@@ -44,7 +44,8 @@ Tips:
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. Models trained
with a causal language modeling (CLM) objective are better in that regard.

-The original code can be found `here <https://github.com/google-research/mobilebert>`__.
+This model was contributed by `vshampor <https://huggingface.co/vshampor>`__. The original code can be found `here
+<https://github.com/google-research/mobilebert>`__.

MobileBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -28,7 +28,8 @@ multilingual variant of T5 that was pre-trained on a new Common Crawl-based data
the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
benchmarks. All of the code and model checkpoints*

-The original code can be found `here <https://github.com/google-research/multilingual-t5>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
+found `here <https://github.com/google-research/multilingual-t5>`__.

MT5Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -31,7 +31,8 @@ According to the abstract,
extractive summary.
- Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human eval.

-The Authors' code can be found `here <https://github.com/google-research/pegasus>`__.
+This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The Authors' code can be found `here
+<https://github.com/google-research/pegasus>`__.


Checkpoints
@@ -50,7 +50,7 @@ Example of use:
>>> # phobert = TFAutoModel.from_pretrained("vinai/phobert-base")


-The original code can be found `here <https://github.com/VinAIResearch/PhoBERT>`__.
+This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here <https://github.com/VinAIResearch/PhoBERT>`__.

PhobertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,6 +43,7 @@ outperforming parametric seq2seq models and task-specific retrieve-and-extract a
tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art
parametric-only seq2seq baseline.*

+This model was contributed by `ola13 <https://huggingface.co/ola13>`__.


RagConfig
@@ -32,7 +32,8 @@ layers instead of the standard residuals, which allows storing activations only
N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models
while being much more memory-efficient and much faster on long sequences.*

-The Authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The Authors' code can be
+found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`__.

Axial Positional Encodings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -20,8 +20,8 @@ The RetriBERT model was proposed in the blog post `Explain Anything Like I'm Fiv
Question Answering <https://yjernite.github.io/lfqa.html>`__. RetriBERT is a small model that uses either a single or
pair of BERT encoders with lower-dimension projection for dense semantic indexing of text.

-Code to train and use the model can be found `here
-<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
+This model was contributed by `yjernite <https://huggingface.co/yjernite>`__. Code to train and use the model can be
+found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.


RetriBertConfig
@@ -44,7 +44,8 @@ Tips:
separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`</s>`)
- :doc:`CamemBERT <camembert>` is a wrapper around RoBERTa. Refer to this page for usage examples.

-The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.
+This model was contributed by `julien-c <https://huggingface.co/julien-c>`__. The original code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.


RobertaConfig
@@ -25,7 +25,8 @@ transcripts/translations autoregressively. Speech2Text has been fine-tuned on se
`LibriSpeech <http://www.openslr.org/12>`__, `CoVoST 2 <https://github.com/facebookresearch/covost>`__, `MuST-C
<https://ict.fbk.eu/must-c/>`__.

-The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text>`__.
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The original code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text>`__.


Inference
@@ -47,6 +47,9 @@ Tips:
- For best results when finetuning on sequence classification tasks, it is recommended to start with the
`squeezebert/squeezebert-mnli-headless` checkpoint.

+This model was contributed by `forresti <https://huggingface.co/forresti>`__.
+

SqueezeBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -48,7 +48,8 @@ Tips:
layers to the decoder and auto-regressively generates the decoder output. - T5 uses relative scalar embeddings.
Encoder input padding can be done on the left and on the right.

-The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/google-research/text-to-text-transfer-transformer>`__.

Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -49,7 +49,8 @@ entailment (a binary classification task). For more details, see their follow-up
intermediate pre-training <https://www.aclweb.org/anthology/2020.findings-emnlp.27/>`__ by Julian Martin Eisenschlos,
Syrine Krichene and Thomas Müller.

-The original code can be found `here <https://github.com/google-research/tapas>`__.
+This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code can be found `here
+<https://github.com/google-research/tapas>`__.

Tips:
@@ -41,7 +41,8 @@ Tips:
original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left.
- Transformer-XL is one of the few models that has no sequence length limit.

-The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/kimiyoung/transformer-xl>`__.


TransfoXLConfig
@@ -67,7 +67,8 @@ Tips:
improvement of 2% to training from scratch, but still 4% behind supervised pre-training.


-The original code (written in JAX) can be found `here <https://github.com/google-research/vision_transformer>`__.
+This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code (written in JAX) can be
+found `here <https://github.com/google-research/vision_transformer>`__.

Note that we converted the weights from Ross Wightman's `timm library
<https://github.com/rwightman/pytorch-image-models>`__, who already converted the weights from JAX to PyTorch. Credits
@@ -36,6 +36,8 @@ Tips:
- Wav2Vec2 model was trained using connectionist temporal classification (CTC) so the model output has to be decoded
using :class:`~transformers.Wav2Vec2CTCTokenizer`.

+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
+

Wav2Vec2Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -42,7 +42,8 @@ Tips:
- XLM has multilingual checkpoints which leverage a specific :obj:`lang` parameter. Check out the :doc:`multi-lingual
<../multilingual>` page for more information.

-The original code can be found `here <https://github.com/facebookresearch/XLM/>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/facebookresearch/XLM/>`__.


XLMConfig
@@ -44,7 +44,8 @@ Tips:
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
as well as the information relative to the inputs and outputs.

-The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`__.
+This model was contributed by `stefan-it <https://huggingface.co/stefan-it>`__. The original code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`__.


XLMRobertaConfig
@@ -44,7 +44,8 @@ Tips:
`examples/text-generation/run_generation.py`)
- XLNet is one of the few models that has no sequence length limit.

-The original code can be found `here <https://github.com/zihangdai/xlnet/>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/zihangdai/xlnet/>`__.


XLNetConfig
@@ -27,6 +27,10 @@ Tips:

<INSERT TIPS ABOUT MODEL HERE>

+This model was contributed by `<INSERT YOUR HF USERNAME HERE>
+<https://huggingface.co/<INSERT YOUR HF USERNAME HERE>>`__. The original code can be found `here
+<<INSERT LINK TO GITHUB REPO HERE>>`__.
+
{{cookiecutter.camelcase_modelname}}Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~