From 74712e22f3332655457be71aa36a495d5f5af633 Mon Sep 17 00:00:00 2001
From: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Date: Wed, 21 Apr 2021 09:47:27 -0400
Subject: [PATCH] Honor contributors to models (#11329)

* Honor contributors to models

* Fix typo

* Address review comments

* Add more authors
---
 docs/source/model_doc/albert.rst | 3 ++-
 docs/source/model_doc/bart.rst | 3 ++-
 docs/source/model_doc/barthez.rst | 5 +++--
 docs/source/model_doc/bert.rst | 3 ++-
 docs/source/model_doc/bert_japanese.rst | 2 ++
 docs/source/model_doc/bertgeneration.rst | 3 ++-
 docs/source/model_doc/bertweet.rst | 4 ++--
 docs/source/model_doc/bigbird.rst | 3 ++-
 docs/source/model_doc/blenderbot.rst | 3 ++-
 docs/source/model_doc/blenderbot_small.rst | 3 ++-
 docs/source/model_doc/bort.rst | 3 ++-
 docs/source/model_doc/camembert.rst | 3 ++-
 docs/source/model_doc/convbert.rst | 6 ++++--
 docs/source/model_doc/cpm.rst | 3 ++-
 docs/source/model_doc/ctrl.rst | 3 ++-
 docs/source/model_doc/deberta.rst | 3 ++-
 docs/source/model_doc/deberta_v2.rst | 3 ++-
 docs/source/model_doc/deit.rst | 2 ++
 docs/source/model_doc/distilbert.rst | 2 +-
 docs/source/model_doc/dpr.rst | 3 ++-
 docs/source/model_doc/electra.rst | 3 ++-
 docs/source/model_doc/flaubert.rst | 3 ++-
 docs/source/model_doc/fsmt.rst | 3 ++-
 docs/source/model_doc/funnel.rst | 3 ++-
 docs/source/model_doc/gpt.rst | 3 ++-
 docs/source/model_doc/gpt2.rst | 3 ++-
 docs/source/model_doc/gpt_neo.rst | 2 ++
 docs/source/model_doc/herbert.rst | 4 +++-
 docs/source/model_doc/ibert.rst | 3 ++-
 docs/source/model_doc/layoutlm.rst | 3 ++-
 docs/source/model_doc/led.rst | 2 ++
 docs/source/model_doc/longformer.rst | 3 ++-
 docs/source/model_doc/lxmert.rst | 3 ++-
 docs/source/model_doc/m2m_100.rst | 2 ++
 docs/source/model_doc/marian.rst | 1 +
 docs/source/model_doc/mbart.rst | 3 ++-
 docs/source/model_doc/megatron_bert.rst | 7 ++++---
 docs/source/model_doc/megatron_gpt2.rst | 7 ++++---
 docs/source/model_doc/mobilebert.rst | 3 ++-
 docs/source/model_doc/mt5.rst | 3 ++-
 docs/source/model_doc/pegasus.rst | 3 ++-
 docs/source/model_doc/phobert.rst | 2 +-
 docs/source/model_doc/rag.rst | 1 +
 docs/source/model_doc/reformer.rst | 3 ++-
 docs/source/model_doc/retribert.rst | 4 ++--
 docs/source/model_doc/roberta.rst | 3 ++-
 docs/source/model_doc/speech_to_text.rst | 3 ++-
 docs/source/model_doc/squeezebert.rst | 3 +++
 docs/source/model_doc/t5.rst | 3 ++-
 docs/source/model_doc/tapas.rst | 3 ++-
 docs/source/model_doc/transformerxl.rst | 3 ++-
 docs/source/model_doc/vit.rst | 3 ++-
 docs/source/model_doc/wav2vec2.rst | 2 ++
 docs/source/model_doc/xlm.rst | 3 ++-
 docs/source/model_doc/xlmroberta.rst | 3 ++-
 docs/source/model_doc/xlnet.rst | 3 ++-
 .../{{cookiecutter.lowercase_modelname}}.rst | 4 ++++
 57 files changed, 121 insertions(+), 55 deletions(-)

diff --git a/docs/source/model_doc/albert.rst b/docs/source/model_doc/albert.rst
index 256695df9..c4b4eac02 100644
--- a/docs/source/model_doc/albert.rst
+++ b/docs/source/model_doc/albert.rst
@@ -43,7 +43,8 @@ Tips:
   similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
   number of (repeating) layers.

-The original code can be found `here `__.
+This model was contributed by `lysandre `__. The original code can be found `here
+`__.
AlbertConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/bart.rst b/docs/source/model_doc/bart.rst index 3e754f67c..0c2ccda20 100644 --- a/docs/source/model_doc/bart.rst +++ b/docs/source/model_doc/bart.rst @@ -35,7 +35,8 @@ According to the abstract, state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. -The Authors' code can be found `here `__. +This model was contributed by `sshleifer `__. The Authors' code can be found `here +`__. Examples diff --git a/docs/source/model_doc/barthez.rst b/docs/source/model_doc/barthez.rst index 3b360e30f..5188d666c 100644 --- a/docs/source/model_doc/barthez.rst +++ b/docs/source/model_doc/barthez.rst @@ -16,7 +16,7 @@ BARThez Overview ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model` +The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model `__ by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis on 23 Oct, 2020. @@ -35,7 +35,8 @@ summarization dataset, OrangeSum, that we release with this paper. We also conti pretrained multilingual BART on BARThez's corpus, and we show that the resulting model, which we call mBARTHez, provides a significant boost over vanilla BARThez, and is on par with or outperforms CamemBERT and FlauBERT.* -The Authors' code can be found `here `__. +This model was contributed by `moussakam `__. The Authors' code can be found `here +`__. Examples diff --git a/docs/source/model_doc/bert.rst b/docs/source/model_doc/bert.rst index 658006f54..497f04638 100644 --- a/docs/source/model_doc/bert.rst +++ b/docs/source/model_doc/bert.rst @@ -42,7 +42,8 @@ Tips: - BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. -The original code can be found `here `__. +This model was contributed by `thomwolf `__. The original code can be found `here +`__. BertConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/bert_japanese.rst b/docs/source/model_doc/bert_japanese.rst index 586d26ed6..f9c37dec4 100644 --- a/docs/source/model_doc/bert_japanese.rst +++ b/docs/source/model_doc/bert_japanese.rst @@ -71,6 +71,8 @@ Tips: - This implementation is the same as BERT, except for tokenization method. Refer to the :doc:`documentation of BERT ` for more usage examples. +This model was contributed by `cl-tohoku `__. + BertJapaneseTokenizer ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/bertgeneration.rst b/docs/source/model_doc/bertgeneration.rst index 686b1b838..f9e34cf76 100644 --- a/docs/source/model_doc/bertgeneration.rst +++ b/docs/source/model_doc/bertgeneration.rst @@ -79,7 +79,8 @@ Tips: - For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input. Therefore, no EOS token should be added to the end of the input. -The original code can be found `here `__. +This model was contributed by `patrickvonplaten `__. 
The original code can be +found `here `__. BertGenerationConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/bertweet.rst b/docs/source/model_doc/bertweet.rst index 215746fca..6a66c3202 100644 --- a/docs/source/model_doc/bertweet.rst +++ b/docs/source/model_doc/bertweet.rst @@ -54,8 +54,8 @@ Example of use: >>> # from transformers import TFAutoModel >>> # bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base") - -The original code can be found `here `__. +This model was contributed by `dqnguyen `__. The original code can be found `here +`__. BertweetTokenizer ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/bigbird.rst b/docs/source/model_doc/bigbird.rst index b3c2c5d2a..300bfe68c 100644 --- a/docs/source/model_doc/bigbird.rst +++ b/docs/source/model_doc/bigbird.rst @@ -50,7 +50,8 @@ Tips: - Current implementation supports only **ITC**. - Current implementation doesn't support **num_random_blocks = 0** -The original code can be found `here `__. +This model was contributed by `vasudevgupta `__. The original code can be found +`here `__. BigBirdConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/blenderbot.rst b/docs/source/model_doc/blenderbot.rst index 4a13199d6..fbed715cb 100644 --- a/docs/source/model_doc/blenderbot.rst +++ b/docs/source/model_doc/blenderbot.rst @@ -36,7 +36,8 @@ and code publicly available. Human evaluations show our best models are superior dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.* -The authors' code can be found `here `__ . +This model was contributed by `sshleifer `__. The authors' code can be found `here +`__ . Implementation Notes diff --git a/docs/source/model_doc/blenderbot_small.rst b/docs/source/model_doc/blenderbot_small.rst index 9eb2a5c0e..4d2a5339c 100644 --- a/docs/source/model_doc/blenderbot_small.rst +++ b/docs/source/model_doc/blenderbot_small.rst @@ -39,7 +39,8 @@ and code publicly available. Human evaluations show our best models are superior dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.* -The authors' code can be found `here `__ . +This model was contributed by `patrickvonplaten `__. The authors' code can be +found `here `__ . BlenderbotSmallConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/bort.rst b/docs/source/model_doc/bort.rst index 14b5df79c..ec6e57166 100644 --- a/docs/source/model_doc/bort.rst +++ b/docs/source/model_doc/bort.rst @@ -43,4 +43,5 @@ Tips: that is sadly not open-sourced yet. It would be very useful for the community, if someone tries to implement the algorithm to make BORT fine-tuning work. -The original code can be found `here `__. +This model was contributed by `stefan-it `__. The original code can be found `here +`__. 
diff --git a/docs/source/model_doc/camembert.rst b/docs/source/model_doc/camembert.rst index c8f7d7998..7654d0037 100644 --- a/docs/source/model_doc/camembert.rst +++ b/docs/source/model_doc/camembert.rst @@ -37,7 +37,8 @@ Tips: - This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa ` for usage examples as well as the information relative to the inputs and outputs. -The original code can be found `here `__. +This model was contributed by `camembert `__. The original code can be found `here +`__. CamembertConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/convbert.rst b/docs/source/model_doc/convbert.rst index 69f747335..133a44dad 100644 --- a/docs/source/model_doc/convbert.rst +++ b/docs/source/model_doc/convbert.rst @@ -34,8 +34,10 @@ ConvBERT significantly outperforms BERT and its variants in various downstream t fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while using less than 1/4 training cost. Code and pre-trained models will be released.* -ConvBERT training tips are similar to those of BERT. The original implementation can be found here: -https://github.com/yitu-opensource/ConvBert +ConvBERT training tips are similar to those of BERT. + +This model was contributed by `abhishek `__. The original implementation can be found +here: https://github.com/yitu-opensource/ConvBert ConvBertConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/cpm.rst b/docs/source/model_doc/cpm.rst index e1380f4a9..e12d215e9 100644 --- a/docs/source/model_doc/cpm.rst +++ b/docs/source/model_doc/cpm.rst @@ -33,7 +33,8 @@ language model, which could facilitate several downstream Chinese NLP tasks, suc cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many NLP tasks in the settings of few-shot (even zero-shot) learning.* -The original implementation can be found here: https://github.com/TsinghuaAI/CPM-Generate +This model was contributed by `canwenxu `__. The original implementation can be found +here: https://github.com/TsinghuaAI/CPM-Generate Note: We only have a tokenizer here, since the model architecture is the same as GPT-2. diff --git a/docs/source/model_doc/ctrl.rst b/docs/source/model_doc/ctrl.rst index 94b7a61ca..aa426b32f 100644 --- a/docs/source/model_doc/ctrl.rst +++ b/docs/source/model_doc/ctrl.rst @@ -46,7 +46,8 @@ Tips: `reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of this argument. -The original code can be found `here `__. +This model was contributed by `keskarnitishr `__. The original code can be found +`here `__. CTRLConfig diff --git a/docs/source/model_doc/deberta.rst b/docs/source/model_doc/deberta.rst index 027e5f916..37e0d4a37 100644 --- a/docs/source/model_doc/deberta.rst +++ b/docs/source/model_doc/deberta.rst @@ -38,7 +38,8 @@ the training data performs consistently better on a wide range of NLP tasks, ach pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.* -The original code can be found `here `__. +This model was contributed by `DeBERTa `__. The original code can be found `here +`__. 
DebertaConfig diff --git a/docs/source/model_doc/deberta_v2.rst b/docs/source/model_doc/deberta_v2.rst index 45eadb4d4..9075129a7 100644 --- a/docs/source/model_doc/deberta_v2.rst +++ b/docs/source/model_doc/deberta_v2.rst @@ -58,7 +58,8 @@ New in v2: - **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improves the performance of downstream tasks. -The original code can be found `here `__. +This model was contributed by `DeBERTa `__. The original code can be found `here +`__. DebertaV2Config diff --git a/docs/source/model_doc/deit.rst b/docs/source/model_doc/deit.rst index add47b591..edf164434 100644 --- a/docs/source/model_doc/deit.rst +++ b/docs/source/model_doc/deit.rst @@ -73,6 +73,8 @@ Tips: `facebook/deit-base-patch16-384`. Note that one should use :class:`~transformers.DeiTFeatureExtractor` in order to prepare images for the model. +This model was contributed by `nielsr `__. + DeiTConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/distilbert.rst b/docs/source/model_doc/distilbert.rst index 06d1f5a6d..b67287ca9 100644 --- a/docs/source/model_doc/distilbert.rst +++ b/docs/source/model_doc/distilbert.rst @@ -44,7 +44,7 @@ Tips: - DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if necessary though, just let us know if you need this option. -The original code can be found `here +This model was contributed by `victorsanh `__. The original code can be found `here `__. diff --git a/docs/source/model_doc/dpr.rst b/docs/source/model_doc/dpr.rst index 285450839..005faf8cf 100644 --- a/docs/source/model_doc/dpr.rst +++ b/docs/source/model_doc/dpr.rst @@ -30,7 +30,8 @@ our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% ab retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks.* -The original code can be found `here `__. +This model was contributed by `lhoestq `__. The original code can be found `here +`__. DPRConfig diff --git a/docs/source/model_doc/electra.rst b/docs/source/model_doc/electra.rst index e2f450f98..a332b1fd8 100644 --- a/docs/source/model_doc/electra.rst +++ b/docs/source/model_doc/electra.rst @@ -54,7 +54,8 @@ Tips: :class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it doesn't exist in the generator). -The original code can be found `here `__. +This model was contributed by `lysandre `__. The original code can be found `here +`__. ElectraConfig diff --git a/docs/source/model_doc/flaubert.rst b/docs/source/model_doc/flaubert.rst index 3d2d21d5d..734e01ce9 100644 --- a/docs/source/model_doc/flaubert.rst +++ b/docs/source/model_doc/flaubert.rst @@ -35,7 +35,8 @@ time they outperform other pretraining approaches. Different versions of FlauBER protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research community for further reproducible experiments in French NLP.* -The original code can be found `here `__. +This model was contributed by `formiel `__. The original code can be found `here +`__. 
FlaubertConfig diff --git a/docs/source/model_doc/fsmt.rst b/docs/source/model_doc/fsmt.rst index c60909f88..61323d76c 100644 --- a/docs/source/model_doc/fsmt.rst +++ b/docs/source/model_doc/fsmt.rst @@ -34,7 +34,8 @@ data, then decode using noisy channel model reranking. Our submissions are ranke human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations. This system improves upon our WMT'18 submission by 4.5 BLEU points.* -The original code can be found here __. +This model was contributed by `stas `__. The original code can be found here +__. Implementation Notes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/funnel.rst b/docs/source/model_doc/funnel.rst index c9a9f4c87..e473bbec6 100644 --- a/docs/source/model_doc/funnel.rst +++ b/docs/source/model_doc/funnel.rst @@ -49,7 +49,8 @@ Tips: :class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and :class:`~transformers.FunnelForMultipleChoice`. -The original code can be found `here `__. +This model was contributed by `sgugger `__. The original code can be found `here +`__. FunnelConfig diff --git a/docs/source/model_doc/gpt.rst b/docs/source/model_doc/gpt.rst index 8b72fdd69..29706592c 100644 --- a/docs/source/model_doc/gpt.rst +++ b/docs/source/model_doc/gpt.rst @@ -45,7 +45,8 @@ Tips: `Write With Transformer `__ is a webapp created and hosted by Hugging Face showcasing the generative capabilities of several models. GPT is one of them. -The original code can be found `here `__. +This model was contributed by `thomwolf `__. The original code can be found `here +`__. Note: diff --git a/docs/source/model_doc/gpt2.rst b/docs/source/model_doc/gpt2.rst index c74b963f9..1f4ae099b 100644 --- a/docs/source/model_doc/gpt2.rst +++ b/docs/source/model_doc/gpt2.rst @@ -45,7 +45,8 @@ Tips: Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`. -The original code can be found `here `__. +This model was contributed by `thomwolf `__. The original code can be found `here +`__. GPT2Config diff --git a/docs/source/model_doc/gpt_neo.rst b/docs/source/model_doc/gpt_neo.rst index 3a164ee87..2c235cd48 100644 --- a/docs/source/model_doc/gpt_neo.rst +++ b/docs/source/model_doc/gpt_neo.rst @@ -23,6 +23,8 @@ Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2 like c The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of 256 tokens. +This model was contributed by `valhalla `__. + Generation _______________________________________________________________________________________________________________________ diff --git a/docs/source/model_doc/herbert.rst b/docs/source/model_doc/herbert.rst index 8f237a21c..a931566d0 100644 --- a/docs/source/model_doc/herbert.rst +++ b/docs/source/model_doc/herbert.rst @@ -56,7 +56,9 @@ Examples of use: >>> model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1") -The original code can be found `here `__. +This model was contributed by `rmroczkowski `__. The original code can be found +`here `__. 
+ HerbertTokenizer ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/ibert.rst b/docs/source/model_doc/ibert.rst index 1fd3d369b..e3c8428d0 100644 --- a/docs/source/model_doc/ibert.rst +++ b/docs/source/model_doc/ibert.rst @@ -36,8 +36,9 @@ the full-precision baseline. Furthermore, our preliminary implementation of I-BE INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has been open-sourced.* +This model was contributed by `kssteven `__. The original code can be found `here +`__. -The original code can be found `here `__. IBertConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/layoutlm.rst b/docs/source/model_doc/layoutlm.rst index 6c537f236..81ff49cd5 100644 --- a/docs/source/model_doc/layoutlm.rst +++ b/docs/source/model_doc/layoutlm.rst @@ -80,7 +80,8 @@ occurs. Those can be obtained using the Python Image Library (PIL) library for e `__. It includes an inference part, which shows how to use Google's Tesseract on a new document. -The original code can be found `here `_. +This model was contributed by `liminghao1630 `__. The original code can be found +`here `_. LayoutLMConfig diff --git a/docs/source/model_doc/led.rst b/docs/source/model_doc/led.rst index 83a938616..2e05163d3 100644 --- a/docs/source/model_doc/led.rst +++ b/docs/source/model_doc/led.rst @@ -53,6 +53,8 @@ Tips: - A notebook showing how to fine-tune LED, can be accessed `here `__. +This model was contributed by `patrickvonplaten `__. + LEDConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/longformer.rst b/docs/source/model_doc/longformer.rst index e9c5b5054..d6fc3e030 100644 --- a/docs/source/model_doc/longformer.rst +++ b/docs/source/model_doc/longformer.rst @@ -40,7 +40,8 @@ Tips: token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:``). -The Authors' code can be found `here `__. +This model was contributed by `beltagy `__. The Authors' code can be found `here +`__. Longformer Self Attention ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/lxmert.rst b/docs/source/model_doc/lxmert.rst index 6b43f2788..4c5fe3b0a 100644 --- a/docs/source/model_doc/lxmert.rst +++ b/docs/source/model_doc/lxmert.rst @@ -52,7 +52,8 @@ Tips: contains self-attention for each respective modality and cross-attention, only the cross attention is returned and both self attention outputs are disregarded. -The original code can be found `here `__. +This model was contributed by `eltoto1219 `__. The original code can be found `here +`__. LxmertConfig diff --git a/docs/source/model_doc/m2m_100.rst b/docs/source/model_doc/m2m_100.rst index 757e198c2..76cc7094b 100644 --- a/docs/source/model_doc/m2m_100.rst +++ b/docs/source/model_doc/m2m_100.rst @@ -34,6 +34,8 @@ to create high quality models. Our focus on non-English-Centric models brings ga translating between non-English directions while performing competitively to the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.* +This model was contributed by `valhalla `__. 
+ Training and Generation _______________________________________________________________________________________________________________________ diff --git a/docs/source/model_doc/marian.rst b/docs/source/model_doc/marian.rst index 51018a4f7..c88e9e5ae 100644 --- a/docs/source/model_doc/marian.rst +++ b/docs/source/model_doc/marian.rst @@ -37,6 +37,7 @@ Implementation Notes - the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses :obj:``), - Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``. +- This model was contributed by `sshleifer `__. Naming ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/mbart.rst b/docs/source/model_doc/mbart.rst index 05631ab0c..a94cd385b 100644 --- a/docs/source/model_doc/mbart.rst +++ b/docs/source/model_doc/mbart.rst @@ -29,7 +29,8 @@ corpora in many languages using the BART objective. mBART is one of the first me sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. -The Authors' code can be found `here `__ +This model was contributed by `valhalla `__. The Authors' code can be found `here +`__ Training of MBart _______________________________________________________________________________________________________________________ diff --git a/docs/source/model_doc/megatron_bert.rst b/docs/source/model_doc/megatron_bert.rst index 7e6262981..89e690734 100644 --- a/docs/source/model_doc/megatron_bert.rst +++ b/docs/source/model_doc/megatron_bert.rst @@ -77,9 +77,10 @@ The following commands allow you to do the conversion. We assume that the folder python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip -The original code can be found `here `__. That repository contains a multi-GPU -and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel -approach using "tensor parallel" and "pipeline parallel" techniques. +This model was contributed by `jdemouth `__. The original code can be found `here +`__. That repository contains a multi-GPU and multi-node implementation of the +Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and +"pipeline parallel" techniques. MegatronBertConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/megatron_gpt2.rst b/docs/source/model_doc/megatron_gpt2.rst index 67ec7227f..4ec7e1b30 100644 --- a/docs/source/model_doc/megatron_gpt2.rst +++ b/docs/source/model_doc/megatron_gpt2.rst @@ -64,7 +64,8 @@ The following command allows you to do the conversion. We assume that the folder python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py megatron_gpt2_345m_v0_0.zip -The original code can be found `here `__. That repository contains a multi-GPU -and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel -approach using "tensor parallel" and "pipeline parallel" techniques. +This model was contributed by `jdemouth `__. The original code can be found `here +`__. That repository contains a multi-GPU and multi-node implementation of the +Megatron Language models. 
In particular, it contains a hybrid model parallel approach using "tensor parallel" and +"pipeline parallel" techniques. diff --git a/docs/source/model_doc/mobilebert.rst b/docs/source/model_doc/mobilebert.rst index feb203e45..9166e382c 100644 --- a/docs/source/model_doc/mobilebert.rst +++ b/docs/source/model_doc/mobilebert.rst @@ -44,7 +44,8 @@ Tips: efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. Models trained with a causal language modeling (CLM) objective are better in that regard. -The original code can be found `here `__. +This model was contributed by `vshampor `__. The original code can be found `here +`__. MobileBertConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/mt5.rst b/docs/source/model_doc/mt5.rst index f6c7af74c..b287d9578 100644 --- a/docs/source/model_doc/mt5.rst +++ b/docs/source/model_doc/mt5.rst @@ -28,7 +28,8 @@ multilingual variant of T5 that was pre-trained on a new Common Crawl-based data the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints* -The original code can be found `here `__. +This model was contributed by `patrickvonplaten `__. The original code can be +found `here `__. MT5Config ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/pegasus.rst b/docs/source/model_doc/pegasus.rst index 9294d293e..0b180f375 100644 --- a/docs/source/model_doc/pegasus.rst +++ b/docs/source/model_doc/pegasus.rst @@ -31,7 +31,8 @@ According to the abstract, extractive summary. - Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human eval. -The Authors' code can be found `here `__. +This model was contributed by `sshleifer `__. The Authors' code can be found `here +`__. Checkpoints diff --git a/docs/source/model_doc/phobert.rst b/docs/source/model_doc/phobert.rst index 1d4958286..bb35a460e 100644 --- a/docs/source/model_doc/phobert.rst +++ b/docs/source/model_doc/phobert.rst @@ -50,7 +50,7 @@ Example of use: >>> # phobert = TFAutoModel.from_pretrained("vinai/phobert-base") -The original code can be found `here `__. + This model was contributed by `dqnguyen `__. The original code can be found `here `__. PhobertTokenizer ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/rag.rst b/docs/source/model_doc/rag.rst index 796b06e73..62acc18e8 100644 --- a/docs/source/model_doc/rag.rst +++ b/docs/source/model_doc/rag.rst @@ -43,6 +43,7 @@ outperforming parametric seq2seq models and task-specific retrieve-and-extract a tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.* +This model was contributed by `ola13 `__. RagConfig diff --git a/docs/source/model_doc/reformer.rst b/docs/source/model_doc/reformer.rst index 9fa45076b..ea48ce536 100644 --- a/docs/source/model_doc/reformer.rst +++ b/docs/source/model_doc/reformer.rst @@ -32,7 +32,8 @@ layers instead of the standard residuals, which allows storing activations only N times, where N is the number of layers. 
The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.* -The Authors' code can be found `here `__. +This model was contributed by `patrickvonplaten `__. The Authors' code can be +found `here `__. Axial Positional Encodings ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/retribert.rst b/docs/source/model_doc/retribert.rst index dbc73eb94..833d19db7 100644 --- a/docs/source/model_doc/retribert.rst +++ b/docs/source/model_doc/retribert.rst @@ -20,8 +20,8 @@ The RetriBERT model was proposed in the blog post `Explain Anything Like I'm Fiv Question Answering `__. RetriBERT is a small model that uses either a single or pair of BERT encoders with lower-dimension projection for dense semantic indexing of text. -Code to train and use the model can be found `here -`__. +This model was contributed by `yjernite `__. Code to train and use the model can be +found `here `__. RetriBertConfig diff --git a/docs/source/model_doc/roberta.rst b/docs/source/model_doc/roberta.rst index b9409a1ee..82ce71178 100644 --- a/docs/source/model_doc/roberta.rst +++ b/docs/source/model_doc/roberta.rst @@ -44,7 +44,8 @@ Tips: separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:``) - :doc:`CamemBERT ` is a wrapper around RoBERTa. Refer to this page for usage examples. -The original code can be found `here `_. +This model was contributed by `julien-c `__. The original code can be found `here +`_. RobertaConfig diff --git a/docs/source/model_doc/speech_to_text.rst b/docs/source/model_doc/speech_to_text.rst index 04b1bbfae..b8de71d66 100644 --- a/docs/source/model_doc/speech_to_text.rst +++ b/docs/source/model_doc/speech_to_text.rst @@ -25,7 +25,8 @@ transcripts/translations autoregressively. Speech2Text has been fine-tuned on se `LibriSpeech `__, `CoVoST 2 `__, `MuST-C `__. -The original code can be found `here `__. +This model was contributed by `valhalla `__. The original code can be found `here +`__. Inference diff --git a/docs/source/model_doc/squeezebert.rst b/docs/source/model_doc/squeezebert.rst index ea2e202a4..9f70cd655 100644 --- a/docs/source/model_doc/squeezebert.rst +++ b/docs/source/model_doc/squeezebert.rst @@ -47,6 +47,9 @@ Tips: - For best results when finetuning on sequence classification tasks, it is recommended to start with the `squeezebert/squeezebert-mnli-headless` checkpoint. +This model was contributed by `forresti `__. + + SqueezeBertConfig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/t5.rst b/docs/source/model_doc/t5.rst index b400401eb..fe8d2c405 100644 --- a/docs/source/model_doc/t5.rst +++ b/docs/source/model_doc/t5.rst @@ -48,7 +48,8 @@ Tips: layers to the decoder and auto-regressively generates the decoder output. - T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right. -The original code can be found `here `__. +This model was contributed by `thomwolf `__. The original code can be found `here +`__. 
Training ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/tapas.rst b/docs/source/model_doc/tapas.rst index b50352a61..d1cea3226 100644 --- a/docs/source/model_doc/tapas.rst +++ b/docs/source/model_doc/tapas.rst @@ -49,7 +49,8 @@ entailment (a binary classification task). For more details, see their follow-up intermediate pre-training `__ by Julian Martin Eisenschlos, Syrine Krichene and Thomas Müller. -The original code can be found `here `__. +This model was contributed by `nielsr `__. The original code can be found `here +`__. Tips: diff --git a/docs/source/model_doc/transformerxl.rst b/docs/source/model_doc/transformerxl.rst index 6fcc7073d..df4ebecbf 100644 --- a/docs/source/model_doc/transformerxl.rst +++ b/docs/source/model_doc/transformerxl.rst @@ -41,7 +41,8 @@ Tips: original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left. - Transformer-XL is one of the few models that has no sequence length limit. -The original code can be found `here `__. +This model was contributed by `thomwolf `__. The original code can be found `here +`__. TransfoXLConfig diff --git a/docs/source/model_doc/vit.rst b/docs/source/model_doc/vit.rst index b747a490d..a010a7119 100644 --- a/docs/source/model_doc/vit.rst +++ b/docs/source/model_doc/vit.rst @@ -67,7 +67,8 @@ Tips: improvement of 2% to training from scratch, but still 4% behind supervised pre-training. -The original code (written in JAX) can be found `here `__. +This model was contributed by `nielsr `__. The original code (written in JAX) can be +found `here `__. Note that we converted the weights from Ross Wightman's `timm library `__, who already converted the weights from JAX to PyTorch. Credits diff --git a/docs/source/model_doc/wav2vec2.rst b/docs/source/model_doc/wav2vec2.rst index 63b851afb..cd0b6e0cc 100644 --- a/docs/source/model_doc/wav2vec2.rst +++ b/docs/source/model_doc/wav2vec2.rst @@ -36,6 +36,8 @@ Tips: - Wav2Vec2 model was trained using connectionist temporal classification (CTC) so the model output has to be decoded using :class:`~transformers.Wav2Vec2CTCTokenizer`. +This model was contributed by `patrickvonplaten `__. + Wav2Vec2Config ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/model_doc/xlm.rst b/docs/source/model_doc/xlm.rst index 4841198e1..5a837714c 100644 --- a/docs/source/model_doc/xlm.rst +++ b/docs/source/model_doc/xlm.rst @@ -42,7 +42,8 @@ Tips: - XLM has multilingual checkpoints which leverage a specific :obj:`lang` parameter. Check out the :doc:`multi-lingual <../multilingual>` page for more information. -The original code can be found `here `__. +This model was contributed by `thomwolf `__. The original code can be found `here +`__. XLMConfig diff --git a/docs/source/model_doc/xlmroberta.rst b/docs/source/model_doc/xlmroberta.rst index c95954a20..c24bbf7f5 100644 --- a/docs/source/model_doc/xlmroberta.rst +++ b/docs/source/model_doc/xlmroberta.rst @@ -44,7 +44,8 @@ Tips: - This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa ` for usage examples as well as the information relative to the inputs and outputs. -The original code can be found `here `__. +This model was contributed by `stefan-it `__. The original code can be found `here +`__. 
XLMRobertaConfig diff --git a/docs/source/model_doc/xlnet.rst b/docs/source/model_doc/xlnet.rst index bdf8dbeb8..02c557e45 100644 --- a/docs/source/model_doc/xlnet.rst +++ b/docs/source/model_doc/xlnet.rst @@ -44,7 +44,8 @@ Tips: `examples/text-generation/run_generation.py`) - XLNet is one of the few models that has no sequence length limit. -The original code can be found `here `__. +This model was contributed by `thomwolf `__. The original code can be found `here +`__. XLNetConfig diff --git a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/{{cookiecutter.lowercase_modelname}}.rst b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/{{cookiecutter.lowercase_modelname}}.rst index 7510fe44e..7a0573e0b 100644 --- a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/{{cookiecutter.lowercase_modelname}}.rst +++ b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/{{cookiecutter.lowercase_modelname}}.rst @@ -27,6 +27,10 @@ Tips: +This model was contributed by ` +>`__. The original code can be found `here +<>`__. + {{cookiecutter.camelcase_modelname}}Config ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
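
The attribution sentences added throughout this patch use the reStructuredText anonymous-hyperlink form: the link
target sits in angle brackets inside the backticks, and the trailing double underscore marks the reference as
anonymous. A minimal sketch of the pattern, with a placeholder username, profile URL, and repository URL rather than
values taken from any particular file in this patch::

    This model was contributed by `username <https://huggingface.co/username>`__. The original code can be found
    `here <https://github.com/organization/repository>`__.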