Fix en documentation typos (#21799)
* fix wrong url
* typos in english documentation
This commit is contained in:
parent a36983653e
commit ba2a5f13f7
17 changed files with 19 additions and 19 deletions
@@ -363,7 +363,7 @@ for more information.
### Develop on Windows
-On Windows (unless you're working in [Windows Subsytem for Linux](https://learn.microsoft.com/en-us/windows/wsl/) or WSL), you need to configure git to transform Windows `CRLF` line endings to Linux `LF` line endings:
+On Windows (unless you're working in [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/) or WSL), you need to configure git to transform Windows `CRLF` line endings to Linux `LF` line endings:

```bash
git config core.autocrlf input
```
@@ -18,7 +18,7 @@ This page regroups resources around 🤗 Transformers developed by the community
| [Fine-tune T5 for Classification and Multiple Choice](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) | How to fine-tune T5 for classification and multiple choice tasks using a text-to-text format with PyTorch Lightning | [Suraj Patil](https://github.com/patil-suraj) | [](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) |
| [Fine-tune DialoGPT on New Datasets and Languages](https://github.com/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) | How to fine-tune the DialoGPT model on a new dataset for open-dialog conversational chatbots | [Nathan Cooper](https://github.com/ncoop57) | [](https://colab.research.google.com/github/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) |
| [Long Sequence Modeling with Reformer](https://github.com/patrickvonplaten/notebooks/blob/master/PyTorch_Reformer.ipynb) | How to train on sequences as long as 500,000 tokens with Reformer | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/PyTorch_Reformer.ipynb) |
-| [Fine-tune BART for Summarization](https://github.com/ohmeow/ohmeow_website/blob/master/_notebooks/2020-05-23-text-generation-with-blurr.ipynb) | How to fine-tune BART for summarization with fastai using blurr | [Wayde Gilliam](https://ohmeow.com/) | [](https://colab.research.google.com/github/ohmeow/ohmeow_website/blob/master/_notebooks/2020-05-23-text-generation-with-blurr.ipynb) |
+| [Fine-tune BART for Summarization](https://github.com/ohmeow/ohmeow_website/blob/master/posts/2021-05-25-mbart-sequence-classification-with-blurr.ipynb) | How to fine-tune BART for summarization with fastai using blurr | [Wayde Gilliam](https://ohmeow.com/) | [](https://colab.research.google.com/github/ohmeow/ohmeow_website/blob/master/posts/2021-05-25-mbart-sequence-classification-with-blurr.ipynb) |
| [Fine-tune a pre-trained Transformer on anyone's tweets](https://colab.research.google.com/github/borisdayma/huggingtweets/blob/master/huggingtweets-demo.ipynb) | How to generate tweets in the style of your favorite Twitter account by fine-tuning a GPT-2 model | [Boris Dayma](https://github.com/borisdayma) | [](https://colab.research.google.com/github/borisdayma/huggingtweets/blob/master/huggingtweets-demo.ipynb) |
| [Optimize 🤗 Hugging Face models with Weights & Biases](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/huggingface/Optimize_Hugging_Face_models_with_Weights_%26_Biases.ipynb) | A complete tutorial showcasing W&B integration with Hugging Face | [Boris Dayma](https://github.com/borisdayma) | [](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/huggingface/Optimize_Hugging_Face_models_with_Weights_%26_Biases.ipynb) |
| [Pretrain Longformer](https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb) | How to build a "long" version of existing pretrained models | [Iz Beltagy](https://beltagy.net) | [](https://colab.research.google.com/github/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb) |
@@ -809,7 +809,7 @@ Stage 0 is disabling all types of sharding and just using DeepSpeed as DDP. You

```
}
```
-Ths will essentially disable ZeRO without you needing to change anything else.
+This will essentially disable ZeRO without you needing to change anything else.
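For illustration only, not a config taken from this guide: a stage-0 setup of the kind described above can be sketched as a plain Python dict (the guide itself uses a JSON file), with every form of sharding left off so DeepSpeed behaves like standard DDP.

```python
# Minimal sketch of a ZeRO stage-0 DeepSpeed config (assumed shape, not the guide's exact file).
# Stage 0 turns off optimizer-state, gradient, and parameter sharding.
ds_config = {
    "zero_optimization": {
        "stage": 0,
    },
}
```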
#### ZeRO-1 Config
@@ -103,9 +103,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
<PipelineTag pipeline="summarization"/>
- A blog post on [Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker](https://huggingface.co/blog/sagemaker-distributed-training-seq2seq).
-- A notebook on how to [finetune BART for summarization with fastai using blurr](https://colab.research.google.com/github/ohmeow/ohmeow_website/blob/master/_notebooks/2020-05-23-text-generation-with-blurr.ipynb). 🌎
+- A notebook on how to [finetune BART for summarization with fastai using blurr](https://colab.research.google.com/github/ohmeow/ohmeow_website/blob/master/posts/2021-05-25-mbart-sequence-classification-with-blurr.ipynb). 🌎
- A notebook on how to [finetune BART for summarization in two languages with Trainer class](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/fine_tune_bart_summarization_two_langs.ipynb). 🌎
-- [`BartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) and [noteboook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb).
+- [`BartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb).
- [`TFBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb).
- [`FlaxBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/summarization).
- [Summarization](https://huggingface.co/course/chapter7/5?fw=pt#summarization) chapter of the 🤗 Hugging Face course.
@@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License.
The Jukebox model was proposed in [Jukebox: A generative model for music](https://arxiv.org/pdf/2005.00341.pdf)
by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford,
-Ilya Sutskever. It introduces a generative music model which can produce minute long samples that can be conditionned on
+Ilya Sutskever. It introduces a generative music model which can produce minute long samples that can be conditioned on
an artist, genres and lyrics.
The abstract from the paper is the following:
@@ -21,7 +21,7 @@ performance, similar to [LayoutLM](layoutlm).
The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains
state-of-the-art results on 2 important benchmarks:
-- [WebSRC](https://x-lance.github.io/WebSRC/), a dataset for Web-Based Structual Reading Comprehension (a bit like SQuAD but for web pages)
+- [WebSRC](https://x-lance.github.io/WebSRC/), a dataset for Web-Based Structural Reading Comprehension (a bit like SQuAD but for web pages)
- [SWDE](https://www.researchgate.net/publication/221299838_From_one_tree_to_a_forest_a_unified_solution_for_structured_web_data_extraction), a dataset
for information extraction from web pages (basically named-entity recogntion on web pages)
@@ -16,8 +16,8 @@ specific language governing permissions and limitations under the License.
The SwitchTransformers model was proposed in [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer.
-The Switch Transformer model uses a sparse T5 encoder-decoder architecure, where the MLP are replaced by a Mixture of Experts (MoE). A routing mechanism (top 1 in this case) associates each token to one of the expert, where each expert is a dense MLP. While switch transformers have a lot more weights than their equivalent dense models, the sparsity allows better scaling and better finetuning performance at scale.
-During a forward pass, only a fraction of the weights are used. The routing mecanism allows the model to select relevant weights on the fly which increases the model capacity without increasing the number of operations.
+The Switch Transformer model uses a sparse T5 encoder-decoder architecture, where the MLP are replaced by a Mixture of Experts (MoE). A routing mechanism (top 1 in this case) associates each token to one of the expert, where each expert is a dense MLP. While switch transformers have a lot more weights than their equivalent dense models, the sparsity allows better scaling and better finetuning performance at scale.
+During a forward pass, only a fraction of the weights are used. The routing mechanism allows the model to select relevant weights on the fly which increases the model capacity without increasing the number of operations.
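As a rough, illustrative sketch of the top-1 routing idea described above (a toy example, not the actual SwitchTransformers implementation; all names here are made up), a router scores each token and only the single highest-scoring expert MLP runs on it:

```python
import torch
import torch.nn as nn


class ToyTop1MoE(nn.Module):
    """Toy switch-style MoE layer: each token is routed to exactly one expert MLP."""

    def __init__(self, hidden_size=64, ffn_size=256, num_experts=4):
        super().__init__()
        self.router = nn.Linear(hidden_size, num_experts)
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(hidden_size, ffn_size),
                    nn.ReLU(),
                    nn.Linear(ffn_size, hidden_size),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, hidden_states):  # hidden_states: (num_tokens, hidden_size)
        probs = self.router(hidden_states).softmax(dim=-1)
        top_prob, top_expert = probs.max(dim=-1)  # top-1 routing decision per token
        output = torch.zeros_like(hidden_states)
        for idx, expert in enumerate(self.experts):
            mask = top_expert == idx
            if mask.any():
                # only the selected expert's weights are used for these tokens
                output[mask] = expert(hidden_states[mask]) * top_prob[mask].unsqueeze(-1)
        return output


moe = ToyTop1MoE()
out = moe(torch.randn(10, 64))  # 10 tokens, each dispatched to one of 4 experts
```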
The abstract from the paper is the following:
@@ -329,7 +329,7 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
- A notebook to [Finetune T5-base-dutch to perform Dutch abstractive summarization on a TPU](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/T5/Fine_tuning_Dutch_T5_base_on_CNN_Daily_Mail_for_summarization_(on_TPU_using_HuggingFace_Accelerate).ipynb).
- A notebook for how to [finetune T5 for summarization in PyTorch and track experiments with WandB](https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_summarization_wandb.ipynb#scrollTo=OKRpFvYhBauC). 🌎
- A blog post on [Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker](https://huggingface.co/blog/sagemaker-distributed-training-seq2seq).
-- [`T5ForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) and [noteboook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb).
+- [`T5ForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb).
- [`TFT5ForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb).
- [`FlaxT5ForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/summarization).
- [Summarization](https://huggingface.co/course/chapter7/5?fw=pt#summarization) chapter of the 🤗 Hugging Face course.
@@ -104,7 +104,7 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
🚀 Deploy
-- A blog post on how to [Deploy Serveless XLM RoBERTa on AWS Lambda](https://www.philschmid.de/multilingual-serverless-xlm-roberta-with-huggingface).
+- A blog post on how to [Deploy Serverless XLM RoBERTa on AWS Lambda](https://www.philschmid.de/multilingual-serverless-xlm-roberta-with-huggingface).
## XLMRobertaConfig
@@ -168,7 +168,7 @@ Apply the `group_texts` function over the entire dataset:

```
>>> lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)
```
-Now create a batch of examples using [`DataCollatorForLanguageModeling`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
+Now create a batch of examples using [`DataCollatorForLanguageModeling`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
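A sketch of what that collation step can look like (the checkpoint name is only illustrative, and the pad-token line assumes a GPT-2-style tokenizer that defines none):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")  # illustrative checkpoint
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style tokenizers have no pad token by default
# mlm=False builds causal-LM labels; each batch is padded only to its own longest sequence
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```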
@@ -119,7 +119,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [

```
tokenized_swag = swag.map(preprocess_function, batched=True)
```
-🤗 Transformers doesn't have a data collator for multiple choice, so you'll need to adapt the [`DataCollatorWithPadding`] to create a batch of examples. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
+🤗 Transformers doesn't have a data collator for multiple choice, so you'll need to adapt the [`DataCollatorWithPadding`] to create a batch of examples. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
`DataCollatorForMultipleChoice` flattens all the model inputs, applies padding, and then unflattens the results:
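A condensed sketch of that flatten → pad → unflatten pattern (the task guide defines its own, fuller `DataCollatorForMultipleChoice`; the feature keys below are assumptions about the preprocessed examples):

```python
from dataclasses import dataclass

import torch
from transformers import PreTrainedTokenizerBase


@dataclass
class ToyMultipleChoiceCollator:
    tokenizer: PreTrainedTokenizerBase  # an already-loaded tokenizer

    def __call__(self, features):
        # each feature is assumed to hold `num_choices` candidate encodings plus a "label" key
        labels = [feature.pop("label") for feature in features]
        num_choices = len(features[0]["input_ids"])

        # flatten: one entry per (example, choice) pair so everything can be padded together
        flat = [{k: v[i] for k, v in feature.items()} for feature in features for i in range(num_choices)]
        batch = self.tokenizer.pad(flat, padding=True, return_tensors="pt")

        # unflatten back to (batch_size, num_choices, seq_len) and re-attach the labels
        batch = {k: v.view(len(features), num_choices, -1) for k, v in batch.items()}
        batch["labels"] = torch.tensor(labels, dtype=torch.int64)
        return batch
```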
@@ -54,7 +54,7 @@ We encourage you to login to your Hugging Face account so you can upload and sha
## Load SQuAD dataset
-Start by loading a smaller subset of the SQuAD dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everythings works before spending more time training on the full dataset.
+Start by loading a smaller subset of the SQuAD dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
```py
>>> from datasets import load_dataset
```
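As an illustration of that first step (the 5,000-example slice is an arbitrary choice, not a value prescribed by the guide):

```python
from datasets import load_dataset

# load only the first 5,000 training examples of SQuAD for quick experimentation
squad = load_dataset("squad", split="train[:5000]")
```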
@@ -50,7 +50,7 @@ We encourage you to log in to your Hugging Face account so you can upload and sh
## Load SceneParse150 dataset
-Start by loading a smaller subset of the SceneParse150 dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everythings works before spending more time training on the full dataset.
+Start by loading a smaller subset of the SceneParse150 dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
```py
>>> from datasets import load_dataset
```
@@ -97,7 +97,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [

```
tokenized_imdb = imdb.map(preprocess_function, batched=True)
```
-Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
+Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
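A sketch of that collation step (the checkpoint name is only illustrative):

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # illustrative checkpoint
# pads each batch to the length of its longest example instead of a global maximum
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```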
@@ -118,7 +118,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [

```
>>> tokenized_billsum = billsum.map(preprocess_function, batched=True)
```
-Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
+Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
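A sketch of the seq2seq variant of that step, which pads the labels as well as the inputs (the checkpoint name is only illustrative):

```python
from transformers import AutoTokenizer, DataCollatorForSeq2Seq

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # illustrative checkpoint
# dynamically pads both the inputs and the labels to the longest sequence in each batch
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer)
```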
@@ -156,7 +156,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [

```
>>> tokenized_wnut = wnut.map(tokenize_and_align_labels, batched=True)
```
-Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
+Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
@@ -113,7 +113,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [

```
>>> tokenized_books = books.map(preprocess_function, batched=True)
```
-Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
+Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>