transformers/tests
Matthijs Hollemans e4bacf6614
[WIP] add SpeechT5 model (#18922)
* make SpeechT5 model by copying Wav2Vec2

* add paper to docs

* whoops added docs in wrong file

* remove SpeechT5Tokenizer + put CTC back in the name

* remove deprecated class

* remove unused docstring

* delete SpeechT5FeatureExtractor, use Wav2Vec2FeatureExtractor instead

* remove classes we don't need right now

* initial stab at speech encoder prenet

* add more speech encoder prenet stuff

* improve SpeechEncoderPrenet

* add encoder (not finished yet)

* add relative position bias to self-attention

* add encoder CTC layers

* fix formatting

* add decoder from BART, doesn't work yet

* make it work with generate loop

* wrap the encoder into a speech encoder class

* wrap the decoder in a text decoder class

* changed my mind

* changed my mind again ;-)

* load decoder weights, make it work

* add weights for text decoder postnet

* add SpeechT5ForCTC model that uses only the encoder

* clean up EncoderLayer and DecoderLayer

* implement _init_weights in SpeechT5PreTrainedModel

* cleanup config + Encoder and Decoder

* add head + cross attention masks

* improve doc comments

* fixup

* more cleanup

* more fixup

* TextDecoderPrenet works now, thanks Kendall

* add CTC loss

* add placeholders for other pre/postnets

* add type annotation

* fix freeze_feature_encoder

* set padding tokens to 0 in decoder attention mask

* encoder attention mask downsampling

* remove features_pen calculation

* disable the padding tokens thing again

* fixup

* more fixup

* code review fixes

* rename encoder/decoder wrapper classes

* allow checkpoints to be loaded into SpeechT5Model

* put encoder into wrapper for CTC model

* clean up conversion script

* add encoder for TTS model

* add speech decoder prenet

* add speech decoder post-net

* attempt to reconstruct the generation loop

* add speech generation loop

* clean up generate_speech

* small tweaks

* fix forward pass

* enable always dropout on speech decoder prenet

* sort declaration

* rename models

* fixup

* fix copies

* more fixup

* make consistency checker happy

* add Seq2SeqSpectrogramOutput class

* doc comments

* quick note about loss and labels

* add HiFi-GAN implementation (from Speech2Speech PR)

* rename file

* add vocoder to TTS model

* improve vocoder

* working on tokenizer

* more better tokenizer

* add CTC tokenizer

* fix decode and batch_decode in CTC tokenizer

* fix processor

* two processors and feature extractors

* use SpeechT5WaveformFeatureExtractor instead of Wav2Vec2

* cleanup

* more cleanup

* even more fixup

* notebooks

* fix log-mel spectrograms

* support reduction factor

* fixup

* shift spectrograms to right to create decoder inputs

* return correct labels

* add labels for stop token prediction

* fix doc comments

* fixup

* remove SpeechT5ForPreTraining

* more fixup

* update copyright headers

* add usage examples

* add SpeechT5ProcessorForCTC

* fixup

* push unofficial checkpoints to hub

* initial version of tokenizer unit tests

* add slow test

* fix failing tests

* tests for CTC tokenizer

* finish CTC tokenizer tests

* processor tests

* initial test for feature extractors

* tests for spectrogram feature extractor

* fixup

* more fixup

* add decorators

* require speech for tests

* modeling tests

* more tests for ASR model

* fix imports

* add fake tests for the other models

* fixup

* remove jupyter notebooks

* add missing SpeechT5Model tests

* add missing tests for SpeechT5ForCTC

* add missing tests for SpeechT5ForTextToSpeech

* sort tests by name

* fix Hi-Fi GAN tests

* fixup

* add speech-to-speech model

* refactor duplicate speech generation code

* add processor for SpeechToSpeech model

* add usage example

* add tests for speech-to-speech model

* fixup

* enable gradient checkpointing for SpeechT5FeatureEncoder

* code review

* push_to_hub now takes repo_id

* improve doc comments for HiFi-GAN config

* add missing test

* add integration tests

* make number of layers in speech decoder prenet configurable

* rename variable

* rename variables

* add auto classes for TTS and S2S

* REMOVE CTC!!!

* S2S processor does not support save/load_pretrained

* fixup

* these models are now in an auto mapping

* fix doc links

* rename HiFiGAN to HifiGan, remove separate config file

* REMOVE auto classes

* there can be only one

* fixup

* replace assert

* reformat

* feature extractor can process input and target at same time

* update checkpoint names

* fix commit hash
2023-02-03 12:43:46 -05:00
benchmark
deepspeed [examples/deepspeed] fix renamed api (#21283) 2023-01-24 09:54:33 -08:00
extended [bnb optim] fixing test (#21030) 2023-01-12 08:52:54 -08:00
fixtures [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
generation 🚨🚨 Generate: standardize beam search behavior across frameworks (#21368) 2023-02-03 10:24:02 +00:00
mixed_int8 [bnb] Fine-tuning HF 8-bit models (#21290) 2023-02-02 16:39:23 +01:00
models [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
onnx Add Onnx Config for PoolFormer (#20868) 2022-12-23 01:30:57 -05:00
optimization
pipelines Fix some pipeline tests (#21401) 2023-02-02 19:03:31 +01:00
repo_utils
sagemaker
tokenization
trainer Add AWS Neuron torchrun support (#20806) 2023-01-18 11:21:19 -05:00
utils Add the GeLU activation from pytorch with the tanh approximation (#21345) 2023-02-02 09:33:04 -05:00
__init__.py
test_configuration_common.py
test_feature_extraction_common.py Add test_image_processing_common.py (#20785) 2023-01-23 13:48:30 +00:00
test_image_processing_common.py Add test_image_processing_common.py (#20785) 2023-01-23 13:48:30 +00:00
test_image_transforms.py Move convert_to_rgb to image_transforms module (#20784) 2022-12-15 18:47:04 +00:00
test_modeling_common.py Add variant to transformers (#21332) 2023-02-01 09:21:52 +01:00
test_modeling_flax_common.py Generate: save generation config with the models' .save_pretrained() (#21264) 2023-01-23 16:21:44 +00:00
test_modeling_tf_common.py Generate: fix TF XLA tests on models with max_position_embeddings or max_target_positions (#21389) 2023-01-31 15:49:34 +00:00
test_sequence_feature_extraction_common.py
test_tokenization_common.py