Mirror of https://github.com/saymrwulf/transformers.git, synced 2026-05-14 20:58:08 +00:00.
* make SpeechT5 model by copying Wav2Vec2
* add paper to docs
* whoops added docs in wrong file
* remove SpeechT5Tokenizer + put CTC back in the name
* remove deprecated class
* remove unused docstring
* delete SpeechT5FeatureExtractor, use Wav2Vec2FeatureExtractor instead
* remove classes we don't need right now
* initial stab at speech encoder prenet
* add more speech encoder prenet stuff
* improve SpeechEncoderPrenet
* add encoder (not finished yet)
* add relative position bias to self-attention
* add encoder CTC layers
* fix formatting
* add decoder from BART, doesn't work yet
* make it work with generate loop
* wrap the encoder into a speech encoder class
* wrap the decoder in a text decoder class
* changed my mind
* changed my mind again ;-)
* load decoder weights, make it work
* add weights for text decoder postnet
* add SpeechT5ForCTC model that uses only the encoder
* clean up EncoderLayer and DecoderLayer
* implement _init_weights in SpeechT5PreTrainedModel
* cleanup config + Encoder and Decoder
* add head + cross attention masks
* improve doc comments
* fixup
* more cleanup
* more fixup
* TextDecoderPrenet works now, thanks Kendall
* add CTC loss
* add placeholders for other pre/postnets
* add type annotation
* fix freeze_feature_encoder
* set padding tokens to 0 in decoder attention mask
* encoder attention mask downsampling
* remove features_pen calculation
* disable the padding tokens thing again
* fixup
* more fixup
* code review fixes
* rename encoder/decoder wrapper classes
* allow checkpoints to be loaded into SpeechT5Model
* put encoder into wrapper for CTC model
* clean up conversion script
* add encoder for TTS model
* add speech decoder prenet
* add speech decoder post-net
* attempt to reconstruct the generation loop
* add speech generation loop
* clean up generate_speech
* small tweaks
* fix forward pass
* enable always dropout on speech decoder prenet
* sort declaration
* rename models
* fixup
* fix copies
* more fixup
* make consistency checker happy
* add Seq2SeqSpectrogramOutput class
* doc comments
* quick note about loss and labels
* add HiFi-GAN implementation (from Speech2Speech PR)
* rename file
* add vocoder to TTS model
* improve vocoder
* working on tokenizer
* more better tokenizer
* add CTC tokenizer
* fix decode and batch_code in CTC tokenizer
* fix processor
* two processors and feature extractors
* use SpeechT5WaveformFeatureExtractor instead of Wav2Vec2
* cleanup
* more cleanup
* even more fixup
* notebooks
* fix log-mel spectrograms
* support reduction factor
* fixup
* shift spectrograms to right to create decoder inputs
* return correct labels
* add labels for stop token prediction
* fix doc comments
* fixup
* remove SpeechT5ForPreTraining
* more fixup
* update copyright headers
* add usage examples
* add SpeechT5ProcessorForCTC
* fixup
* push unofficial checkpoints to hub
* initial version of tokenizer unit tests
* add slow test
* fix failing tests
* tests for CTC tokenizer
* finish CTC tokenizer tests
* processor tests
* initial test for feature extractors
* tests for spectrogram feature extractor
* fixup
* more fixup
* add decorators
* require speech for tests
* modeling tests
* more tests for ASR model
* fix imports
* add fake tests for the other models
* fixup
* remove jupyter notebooks
* add missing SpeechT5Model tests
* add missing tests for SpeechT5ForCTC
* add missing tests for SpeechT5ForTextToSpeech
* sort tests by name
* fix Hi-Fi GAN tests
* fixup
* add speech-to-speech model
* refactor duplicate speech generation code
* add processor for SpeechToSpeech model
* add usage example
* add tests for speech-to-speech model
* fixup
* enable gradient checkpointing for SpeechT5FeatureEncoder
* code review
* push_to_hub now takes repo_id
* improve doc comments for HiFi-GAN config
* add missing test
* add integration tests
* make number of layers in speech decoder prenet configurable
* rename variable
* rename variables
* add auto classes for TTS and S2S
* REMOVE CTC!!!
* S2S processor does not support save/load_pretrained
* fixup
* these models are now in an auto mapping
* fix doc links
* rename HiFiGAN to HifiGan, remove separate config file
* REMOVE auto classes
* there can be only one
* fixup
* replace assert
* reformat
* feature extractor can process input and target at same time
* update checkpoint names
* fix commit hash

| File |
|---|
| albert.mdx | ||
| altclip.mdx | ||
| audio-spectrogram-transformer.mdx | ||
| auto.mdx | ||
| bart.mdx | ||
| barthez.mdx | ||
| bartpho.mdx | ||
| beit.mdx | ||
| bert-generation.mdx | ||
| bert-japanese.mdx | ||
| bert.mdx | ||
| bertweet.mdx | ||
| big_bird.mdx | ||
| bigbird_pegasus.mdx | ||
| biogpt.mdx | ||
| bit.mdx | ||
| blenderbot-small.mdx | ||
| blenderbot.mdx | ||
| blip.mdx | ||
| bloom.mdx | ||
| bort.mdx | ||
| bridgetower.mdx | ||
| byt5.mdx | ||
| camembert.mdx | ||
| canine.mdx | ||
| chinese_clip.mdx | ||
| clip.mdx | ||
| clipseg.mdx | ||
| codegen.mdx | ||
| conditional_detr.mdx | ||
| convbert.mdx | ||
| convnext.mdx | ||
| cpm.mdx | ||
| ctrl.mdx | ||
| cvt.mdx | ||
| data2vec.mdx | ||
| deberta-v2.mdx | ||
| deberta.mdx | ||
| decision_transformer.mdx | ||
| deformable_detr.mdx | ||
| deit.mdx | ||
| deta.mdx | ||
| detr.mdx | ||
| dialogpt.mdx | ||
| dinat.mdx | ||
| distilbert.mdx | ||
| dit.mdx | ||
| donut.mdx | ||
| dpr.mdx | ||
| dpt.mdx | ||
| efficientformer.mdx | ||
| electra.mdx | ||
| encoder-decoder.mdx | ||
| ernie.mdx | ||
| esm.mdx | ||
| flan-t5.mdx | ||
| flaubert.mdx | ||
| flava.mdx | ||
| fnet.mdx | ||
| fsmt.mdx | ||
| funnel.mdx | ||
| git.mdx | ||
| glpn.mdx | ||
| gpt-sw3.mdx | ||
| gpt2.mdx | ||
| gpt_neo.mdx | ||
| gpt_neox.mdx | ||
| gpt_neox_japanese.mdx | ||
| gptj.mdx | ||
| graphormer.mdx | ||
| groupvit.mdx | ||
| herbert.mdx | ||
| hubert.mdx | ||
| ibert.mdx | ||
| imagegpt.mdx | ||
| jukebox.mdx | ||
| layoutlm.mdx | ||
| layoutlmv2.mdx | ||
| layoutlmv3.mdx | ||
| layoutxlm.mdx | ||
| led.mdx | ||
| levit.mdx | ||
| lilt.mdx | ||
| longformer.mdx | ||
| longt5.mdx | ||
| luke.mdx | ||
| lxmert.mdx | ||
| m2m_100.mdx | ||
| marian.mdx | ||
| markuplm.mdx | ||
| mask2former.mdx | ||
| maskformer.mdx | ||
| mbart.mdx | ||
| mctct.mdx | ||
| megatron-bert.mdx | ||
| megatron_gpt2.mdx | ||
| mluke.mdx | ||
| mobilebert.mdx | ||
| mobilenet_v1.mdx | ||
| mobilenet_v2.mdx | ||
| mobilevit.mdx | ||
| mpnet.mdx | ||
| mt5.mdx | ||
| mvp.mdx | ||
| nat.mdx | ||
| nezha.mdx | ||
| nllb.mdx | ||
| nystromformer.mdx | ||
| oneformer.mdx | ||
| openai-gpt.mdx | ||
| opt.mdx | ||
| owlvit.mdx | ||
| pegasus.mdx | ||
| pegasus_x.mdx | ||
| perceiver.mdx | ||
| phobert.mdx | ||
| plbart.mdx | ||
| poolformer.mdx | ||
| prophetnet.mdx | ||
| qdqbert.mdx | ||
| rag.mdx | ||
| realm.mdx | ||
| reformer.mdx | ||
| regnet.mdx | ||
| rembert.mdx | ||
| resnet.mdx | ||
| retribert.mdx | ||
| roberta-prelayernorm.mdx | ||
| roberta.mdx | ||
| roc_bert.mdx | ||
| roformer.mdx | ||
| segformer.mdx | ||
| sew-d.mdx | ||
| sew.mdx | ||
| speech-encoder-decoder.mdx | ||
| speech_to_text.mdx | ||
| speech_to_text_2.mdx | ||
| speecht5.mdx | ||
| splinter.mdx | ||
| squeezebert.mdx | ||
| swin.mdx | ||
| swin2sr.mdx | ||
| swinv2.mdx | ||
| switch_transformers.mdx | ||
| t5.mdx | ||
| t5v1.1.mdx | ||
| table-transformer.mdx | ||
| tapas.mdx | ||
| tapex.mdx | ||
| time_series_transformer.mdx | ||
| timesformer.mdx | ||
| trajectory_transformer.mdx | ||
| transfo-xl.mdx | ||
| trocr.mdx | ||
| ul2.mdx | ||
| unispeech-sat.mdx | ||
| unispeech.mdx | ||
| upernet.mdx | ||
| van.mdx | ||
| videomae.mdx | ||
| vilt.mdx | ||
| vision-encoder-decoder.mdx | ||
| vision-text-dual-encoder.mdx | ||
| visual_bert.mdx | ||
| vit.mdx | ||
| vit_hybrid.mdx | ||
| vit_mae.mdx | ||
| vit_msn.mdx | ||
| wav2vec2-conformer.mdx | ||
| wav2vec2.mdx | ||
| wav2vec2_phoneme.mdx | ||
| wavlm.mdx | ||
| whisper.mdx | ||
| xclip.mdx | ||
| xglm.mdx | ||
| xlm-prophetnet.mdx | ||
| xlm-roberta-xl.mdx | ||
| xlm-roberta.mdx | ||
| xlm.mdx | ||
| xlnet.mdx | ||
| xls_r.mdx | ||
| xlsr_wav2vec2.mdx | ||
| yolos.mdx | ||
| yoso.mdx | ||