transformers/docs/source
Eduardo Gonzalez Ponferrada df5a4094a6
Add Data2Vec (#15507)
* Add data2vec model cloned from roberta

* Add checkpoint conversion script

* Fix copies

* Update docs

* Add checkpoint conversion script

* Remove fairseq data2vec_text script and fix format

* Add comment on where to get data2vec_text.py

* Remove mock implementation cheat.py and fix style

* Fix copies

* Remove TF and Flax classes from init

* Add back copy from fairseq data2vec_text.py and fix style

* Update model name in docs/source/index.mdx to be CamelCase

* Revert model name in table to lower-case to get check_table test to pass

* Update src/transformers/models/data2vec/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update documentation

* Copy-paste Data2VecConfig from BertConfig

* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency

* Update config special tokens to match RoBERTa

* Split multiple assertions and add individual error messages

* Rename Data2VecModel to Data2VecForTextModel

* Add Data2Vec to _toctree.yml

* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings

* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).

* finish audio model

* finish audio file

* Update names and fix style, quality and repo consistency

* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files.

* add inputs to logits to data2vec

* correct audio models

* correct config auto

* correct tok auto

* Update utils/tests_fetcher.py

* delete unnecessary files

* delete unnecessary files

* further renaming

* make all tests pass

* finish

* remove useless test file

* Update tests/test_modeling_common.py

* Update utils/check_repo.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec_text.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix copies

* Update docs

* Remove fairseq data2vec_text script and fix format

* Add comment on where to get data2vec_text.py

* Remove mock implementation cheat.py and fix style

* Fix copies

* Remove TF and Flax classes from init

* Add back copy from fairseq data2vec_text.py and fix style

* Update model name in docs/source/index.mdx to be CamelCase

* Revert model name in table to lower-case to get check_table test to pass

* Update documentation

* Update src/transformers/models/data2vec/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Copy-paste Data2VecConfig from BertConfig

* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency

* Update config special tokens to match RoBERTa

* Split multiple assertions and add individual error messages

* Rename Data2VecModel to Data2VecForTextModel

* Add Data2Vec to _toctree.yml

* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings

* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).

* finish audio model

* finish audio file

* add inputs to logits to data2vec

* Update names and fix style, quality and repo consistency

* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files.

* correct audio models

* correct config auto

* correct tok auto

* delete unnecessary files

* delete unnecessary files

* Update utils/tests_fetcher.py

* further renaming

* make all tests pass

* finish

* remove useless test file

* Update tests/test_modeling_common.py

* Update utils/check_repo.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec_text.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Move data2vec tests to new structure

* Fix test imports for text tests

* Remove fairseq files

* Change paper link to arxiv

* Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation.

* Update text model checkpoint to be facebook/data2vec-text-base

* Add 'Copy from' statements and update paper links and docs

* fix copy from statements

* improve copied from

* correct more copied from statements

* finish copied from stuff

* make style

* add model to README

* add to master

Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-01 11:09:20 +01:00
internal TF generate refactor - Greedy Search (#15562) 2022-02-15 17:54:43 +01:00
main_classes Adding ZeroShotImageClassificationPipeline (#12119) 2022-02-23 09:41:42 +01:00
model_doc Add Data2Vec (#15507) 2022-03-01 11:09:20 +01:00
tasks 🧼 NLP task guides (#15731) 2022-02-23 13:58:33 -06:00
_config.py Prevent style_doc from tampering the config file 2021-12-10 15:31:43 -05:00
_toctree.yml Add Data2Vec (#15507) 2022-03-01 11:09:20 +01:00
accelerate.mdx Fix code format for Accelerate doc (#15335) 2022-01-27 13:49:04 -06:00
add_new_model.mdx added link to our writing-doc document (#15756) 2022-02-22 09:57:28 +01:00
add_new_pipeline.mdx Doc styler examples (#14953) 2021-12-27 19:07:46 -05:00
autoclass_tutorial.mdx Update tutorial docs (#15165) 2022-02-01 18:31:35 -06:00
benchmarks.mdx [doc] normalize HF Transformers string (#15023) 2022-01-10 08:44:33 -08:00
bertology.mdx Convert rst files (#14888) 2021-12-22 16:14:35 -05:00
community.mdx add t5 ner finetuning (#15432) 2022-01-31 17:03:06 +01:00
contributing.md
converting_tensorflow_models.mdx Convert rst files (#14888) 2021-12-22 16:14:35 -05:00
create_a_model.mdx Create a custom model guide (#15489) 2022-02-07 12:34:56 -06:00
custom_datasets.mdx Added missing code in exemplary notebook - custom datasets fine-tuning (#15300) 2022-01-25 17:26:17 -05:00
custom_models.mdx [doc] custom_models: mention security features of the Hub (#15768) 2022-02-23 11:40:06 -05:00
debugging.mdx add a network debug script and document it (#15652) 2022-02-15 08:48:00 -08:00
examples.md
fast_tokenizers.mdx Convert rst files (#14888) 2021-12-22 16:14:35 -05:00
glossary.mdx Doc styler examples (#14953) 2021-12-27 19:07:46 -05:00
index.mdx Add Data2Vec (#15507) 2022-03-01 11:09:20 +01:00
installation.mdx Get started docs (#15098) 2022-01-28 19:01:37 -06:00
migration.mdx Doc styler examples (#14953) 2021-12-27 19:07:46 -05:00
model_sharing.mdx Update model share tutorial (#15288) 2022-01-28 18:49:26 -06:00
model_summary.mdx Add "open in hf spaces" gradio button issue #73 (#15106) 2022-01-14 10:12:30 -05:00
multilingual.mdx Doc styler examples (#14953) 2021-12-27 19:07:46 -05:00
notebooks.md
parallelism.mdx [deepspeed docs] Megatron-Deepspeed info (#15488) 2022-02-04 11:15:13 -08:00
performance.mdx add model scaling section (#15119) 2022-02-09 15:27:30 +01:00
perplexity.mdx Doc styler examples (#14953) 2021-12-27 19:07:46 -05:00
philosophy.mdx Convert rst files (#14888) 2021-12-22 16:14:35 -05:00
pipeline_tutorial.mdx Update tutorial docs (#15165) 2022-02-01 18:31:35 -06:00
pr_checks.mdx [docs] fix wrong file name in pr_check (#15380) 2022-01-28 07:52:01 -05:00
preprocessing.mdx Update tutorial docs (#15165) 2022-02-01 18:31:35 -06:00
quicktour.mdx Re-enable doctests for the quicktour (#15828) 2022-02-25 17:46:38 +01:00
sagemaker.mdx Convert rst files (#14888) 2021-12-22 16:14:35 -05:00
serialization.mdx Add Data2Vec (#15507) 2022-03-01 11:09:20 +01:00
task_summary.mdx Re-enable doctests for the quicktour (#15828) 2022-02-25 17:46:38 +01:00
testing.mdx [doc] normalize HF Transformers string (#15023) 2022-01-10 08:44:33 -08:00
tokenizer_summary.mdx Fix grammar in tokenizer_summary (#15614) 2022-02-11 16:51:30 -05:00
training.mdx Update fine-tune docs (#15259) 2022-02-01 18:28:12 -06:00
troubleshooting.mdx Convert rst files (#14888) 2021-12-22 16:14:35 -05:00