From f012c00adafcf45ca217afb8f192f7e71d51dbe4 Mon Sep 17 00:00:00 2001
From: Mishig Davaadorj
Date: Mon, 10 Jan 2022 08:06:14 -0700
Subject: [PATCH] Model summary horizontal banners (#15058)

---
 docs/source/model_summary.mdx | 54 +++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/docs/source/model_summary.mdx b/docs/source/model_summary.mdx
index 2966771cc..c6a99b895 100644
--- a/docs/source/model_summary.mdx
+++ b/docs/source/model_summary.mdx
@@ -63,12 +63,14 @@ that at each position, the model can only look at the tokens before the attentio

 ### Original GPT

+Models Doc
+
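Each hunk in this patch inserts a two-line banner under a model heading: one line of badge markup whose visible labels are "Models" and "Doc", followed by a blank separator line. The badge markup itself is not preserved verbatim in this excerpt; as a minimal sketch, assuming shields.io badge images and the usual Hub URL patterns (the exact hrefs, badge styles, and any wrapper markup in the commit may differ), a banner for the Original GPT section could look like:

```html
<!-- Hypothetical sketch of one banner; URLs and badge sources are assumptions, not the commit's exact markup. -->
<a href="https://huggingface.co/models?filter=openai-gpt">
  <img alt="Models" src="https://img.shields.io/badge/Models-openai--gpt-blueviolet">
</a>
<a href="https://huggingface.co/docs/transformers/model_doc/openai-gpt">
  <img alt="Doc" src="https://img.shields.io/badge/Doc-openai--gpt-blueviolet">
</a>
```

The same pattern repeats for every model section in the summary, with only the model identifier changing.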
 [Improving Language Understanding by Generative Pre-Training](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf), Alec Radford et al.
@@ -79,12 +81,14 @@ classification.

 ### GPT-2

+Models Doc
+
 [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), Alec Radford et al.
@@ -97,12 +101,14 @@ classification.

 ### CTRL

+Models Doc
+
 [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858), Nitish Shirish Keskar et al.
@@ -115,12 +121,14 @@ The library provides a version of the model for language modeling only.

 ### Transformer-XL

+Models Doc
+
 [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860), Zihang Dai et al.
@@ -143,12 +151,14 @@ The library provides a version of the model for language modeling only.

 ### Reformer

+Models Doc
+
 [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451), Nikita Kitaev et al.
@@ -178,12 +188,14 @@ The library provides a version of the model for language modeling only.

 ### XLNet

+Models Doc
+
 [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237), Zhilin Yang et al.
@@ -210,12 +222,14 @@ corrupted versions.

 ### BERT

+Models Doc
+
 [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805), Jacob Devlin et al.
@@ -236,12 +250,14 @@ token classification, sentence classification, multiple choice classification an

 ### ALBERT

+Models Doc
+
 [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), Zhenzhong Lan et al.
@@ -262,12 +278,14 @@ classification, multiple choice classification and question answering.

 ### RoBERTa

+Models Doc
+
 [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692), Yinhan Liu et al.
@@ -284,12 +302,14 @@ classification, multiple choice classification and question answering.

 ### DistilBERT

+Models Doc
+
 [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108), Victor Sanh et al.
@@ -306,12 +326,14 @@ and question answering.

 ### ConvBERT

+Models Doc
+
 [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496), Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
@@ -333,12 +355,14 @@ and question answering.

 ### XLM

+Models Doc
+
 [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291), Guillaume Lample and Alexis Conneau
@@ -364,12 +388,14 @@ question answering.

 ### XLM-RoBERTa

+Models Doc
+
 [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116), Alexis Conneau et al.
@@ -383,12 +409,14 @@ classification, multiple choice classification and question answering.

 ### FlauBERT

+Models Doc
+
 [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372), Hang Le et al.
@@ -398,12 +426,14 @@ The library provides a version of the model for language modeling and sentence c

 ### ELECTRA

+Models Doc
+
 [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555), Kevin Clark et al.
@@ -419,12 +449,14 @@ classification.

 ### Funnel Transformer

+Models Doc
+
 [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236), Zihang Dai et al.
@@ -449,12 +481,14 @@ classification, multiple choice classification and question answering.

 ### Longformer

+Models Doc
+
 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150), Iz Beltagy et al.
@@ -485,12 +519,14 @@ As mentioned before, these models keep both the encoder and the decoder of the o

 ### BART

+Models Doc
+
 [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461), Mike Lewis et al.
@@ -508,12 +544,14 @@ The library provides a version of this model for conditional generation and sequ

 ### Pegasus

+Models Doc
+
 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf), Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.
@@ -535,12 +573,14 @@ The library provides a version of this model for conditional generation, which s

 ### MarianMT

+Models Doc
+
 [Marian: Fast Neural Machine Translation in C++](https://arxiv.org/abs/1804.00344), Marcin Junczys-Dowmunt et al.
@@ -551,12 +591,14 @@ The library provides a version of this model for conditional generation.

 ### T5

+Models Doc
+
 [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683), Colin Raffel et al.
@@ -580,12 +622,14 @@ The library provides a version of this model for conditional generation.

 ### MT5

+Models Doc
+
 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934), Linting Xue et al.
@@ -598,12 +642,14 @@ The library provides a version of this model for conditional generation.

 ### MBart

+Models Doc
+
 [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
@@ -624,12 +670,14 @@ finetuning.

 ### ProphetNet

+Models Doc
+
 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training,](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
@@ -646,12 +694,14 @@ summarization.

 ### XLM-ProphetNet

+Models Doc
+
 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training,](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
@@ -696,12 +746,14 @@ Some models use documents retrieval during (pre)training and inference for open-

 ### DPR

+Models Doc
+
 [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906), Vladimir Karpukhin et al.
@@ -722,12 +774,14 @@ then it calls the reader with the question and the retrieved documents to get th

 ### RAG

+Models Doc
+
 [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401), Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau