BERT
----------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~

The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It is a bidirectional transformer
pre-trained using a combination of the masked language modeling and next sentence prediction objectives
on a large corpus comprising the Toronto Book Corpus and Wikipedia.
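
The masked language modeling objective can be sketched in a few lines. The snippet below is a minimal illustration,
not the library's or the original authors' implementation. It assumes the corruption scheme described in the paper
(about 15% of tokens are selected; of those, 80% become ``[MASK]``, 10% become a random token and 10% are left
unchanged), and it selects tokens independently rather than sampling an exact 15%. The ids for ``[MASK]``, ``[PAD]``,
``[CLS]`` and ``[SEP]`` and the vocabulary size are assumptions taken from the ``bert-base-uncased`` vocabulary, and
``mask_tokens`` is a hypothetical helper name.

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 103                # assumption: id of [MASK] in bert-base-uncased
VOCAB_SIZE = 30522           # assumption: size of the bert-base-uncased vocabulary
SPECIAL_IDS = {0, 101, 102}  # assumption: ids of [PAD], [CLS] and [SEP]


def mask_tokens(
    input_ids: List[int],
    rng: random.Random,
    mlm_probability: float = 0.15,
) -> Tuple[List[int], List[Optional[int]]]:
    """Return (corrupted_ids, labels); a label is None where nothing is predicted."""
    corrupted = list(input_ids)
    labels: List[Optional[int]] = [None] * len(input_ids)
    for i, token_id in enumerate(input_ids):
        # Special tokens are never selected; other tokens are selected
        # independently with probability `mlm_probability`.
        if token_id in SPECIAL_IDS or rng.random() >= mlm_probability:
            continue
        labels[i] = token_id  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_ID                    # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random id
        # remaining 10%: keep the original token unchanged
    return corrupted, labels


# Illustrative input: [CLS] followed by three ordinary token ids and [SEP].
ids = [101, 7592, 2088, 999, 102]
corrupted, labels = mask_tokens(ids, random.Random(0))
```

Positions whose label is ``None`` do not contribute to the masked language modeling loss in this sketch; the random
replacement is drawn from the whole vocabulary for simplicity, so it may occasionally pick a special token.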

The abstract from the paper is the following:

*We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations
from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result,
the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models
for a wide range of tasks, such as question answering and language inference, without substantial task-specific
architecture modifications.*

*BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI
accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute
improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).*

Tips:

- BERT is a model with absolute position embeddings, so it's usually advised to pad the inputs on
  the right rather than the left.
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
  efficient at predicting masked tokens and at natural language understanding (NLU) in general, but is not optimal for
  text generation.
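
The padding tip above can be illustrated with a small, library-independent sketch. It pads a batch of token id lists
on either side and reports which absolute positions the real tokens end up in. The helper names (``pad_batch``,
``real_token_positions``) are hypothetical, and the pad id ``0`` and the example token ids are assumptions taken from
the ``bert-base-uncased`` vocabulary.

```python
from typing import Dict, List

PAD_ID = 0  # assumption: id of [PAD] in bert-base-uncased


def pad_batch(
    batch: List[List[int]],
    pad_id: int = PAD_ID,
    side: str = "right",
) -> Dict[str, List[List[int]]]:
    """Pad every sequence to the length of the longest one in the batch."""
    if side not in ("right", "left"):
        raise ValueError("side must be 'right' or 'left'")
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        padding = [pad_id] * (max_len - len(seq))
        ignored = [0] * len(padding)  # mask value 0: position is padding
        attended = [1] * len(seq)     # mask value 1: position holds a real token
        if side == "right":
            input_ids.append(seq + padding)
            attention_mask.append(attended + ignored)
        else:
            input_ids.append(padding + seq)
            attention_mask.append(ignored + attended)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def real_token_positions(mask_row: List[int]) -> List[int]:
    """Absolute (0-based) positions occupied by non-padding tokens."""
    return [pos for pos, keep in enumerate(mask_row) if keep == 1]


# Two illustrative sequences of different lengths.
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
right = pad_batch(batch, side="right")
left = pad_batch(batch, side="left")

# Positions of the real tokens of the shorter sequence under each scheme.
right_positions = real_token_positions(right["attention_mask"][0])
left_positions = real_token_positions(left["attention_mask"][0])
```

With right padding the real tokens of the shorter sequence keep positions starting at 0, whereas left padding shifts
them to later positions, so the same text would receive different absolute position embeddings depending on the
batch it is padded in.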

The original code can be found `here <https://github.com/google-research/bert>`_.

BertConfig
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertConfig
    :members:


BertTokenizer
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertTokenizer
    :members: build_inputs_with_special_tokens, get_special_tokens_mask,
        create_token_type_ids_from_sequences, save_vocabulary


BertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertTokenizerFast
    :members:


BERT-specific outputs
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_bert.BertForPreTrainingOutput
    :members:

.. autoclass:: transformers.modeling_tf_bert.TFBertForPreTrainingOutput
    :members:


BertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertModel
    :members:


BertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForPreTraining
    :members:


BertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForMaskedLM
    :members:


BertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForNextSentencePrediction
    :members:


BertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForSequenceClassification
    :members:


BertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForMultipleChoice
    :members:


BertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForTokenClassification
    :members:


BertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForQuestionAnswering
    :members:


TFBertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertModel
    :members:


TFBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForPreTraining
    :members:


TFBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForMaskedLM
    :members:


TFBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForNextSentencePrediction
    :members:


TFBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForSequenceClassification
    :members:


TFBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForMultipleChoice
    :members:


TFBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForTokenClassification
    :members:


TFBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForQuestionAnswering
    :members: