transformers

mirror of https://github.com/saymrwulf/transformers.git synced 2026-05-15 21:01:19 +00:00

Joel Lamy-Poirier e0921c6b53 Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)	2023-04-10 10:57:21 +02:00

* Add model with cli tool
* Remove unwanted stuff
* Add new code
* Remove inference runner
* Style
* Fix checks
* Test updates
* make fixup
* fix docs
* fix doc
* fix test
* hopefully fix pipeline tests
* refactor
* fix CIs
* add comment
* rename to `GPTBigCodeForCausalLM`
* correct readme
* make fixup + docs
* make fixup
* fixes
* fixes
* Remove pruning
* Remove import
* Doc updates
* More pruning removal
* Combine copies
* Single MQA implementation, remove kv cache pre-allocation and padding
* Update doc
* Revert refactor to match gpt2 style
* Merge back key and value caches, fix some type hints
* Update doc
* Fix position ids pith padding (PR 21080)
* Add conversion script temporarily
* Update conversion script
* Remove checkpoint conversion
* New model
* Fix MQA test
* Fix copies
* try fix tests
* FIX TEST!!
* remove `DoubleHeadsModel`
* add MQA tests
* add slow tests
* clean up
* add CPU checker
* final fixes
* fixes
  - fix GPU issue
  - fixed slow tests
  - skip disk offload
* fix final issue
* Simplify and comment baddbmm fix
* Remove unnecessary code
* Transpose tweaks
* Use beta=1 on cpu, improve tests

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
internal	Generate: `TextIteratorStreamer` (streamer for gradio) (#22501)	2023-04-03 15:04:37 +01:00
main_classes	Generate: basic token streaming (#22449)	2023-03-30 12:00:12 +01:00
model_doc	Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)	2023-04-10 10:57:21 +02:00
tasks	Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)	2023-04-10 10:57:21 +02:00
_config.py
_toctree.yml	Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)	2023-04-10 10:57:21 +02:00
accelerate.mdx
add_new_model.mdx	🚨🚨🚨 Enforce single model initialization (#21431)	2023-02-09 15:46:26 -05:00
add_new_pipeline.mdx
add_tensorflow_model.mdx
attention.mdx	Refactor model summary (#21408)	2023-02-15 10:35:14 -08:00
autoclass_tutorial.mdx
benchmarks.mdx
bertology.mdx	update: bertology paper (#22012)	2023-03-08 07:54:30 -05:00
big_models.mdx
community.mdx	Fix en documentation typos (#21799)	2023-02-27 08:36:36 +01:00
contributing.md
converting_tensorflow_models.mdx
create_a_model.mdx	Documentation code sample fixes (#21302)	2023-01-25 11:33:39 -05:00
custom_models.mdx
debugging.mdx
fast_tokenizers.mdx
generation_strategies.mdx	Generate: add API warning to streamers (#22659)	2023-04-07 14:15:20 -04:00
glossary.mdx	docs: New terms and updates to glossary (#21982)	2023-03-13 19:09:37 -04:00
hpo_train.mdx
index.mdx	Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)	2023-04-10 10:57:21 +02:00
installation.mdx	Can't install tf2 on M1 Chip by default (#22046)	2023-03-09 07:44:58 -05:00
migration.mdx
model_sharing.mdx	Fix `PushToHubCallback` import in Share a model docs (#21457)	2023-02-06 09:26:22 -05:00
model_summary.mdx	Refactor model summary (#21408)	2023-02-15 10:35:14 -08:00
multilingual.mdx
notebooks.md
pad_truncation.mdx	Example of pad_to_multiple_of for padding and truncation guide & docstring update (#22278)	2023-03-20 14:18:55 -04:00
perf_hardware.mdx
perf_infer_cpu.mdx
perf_infer_gpu_many.mdx
perf_infer_gpu_one.mdx	[`Doc`] Fix int8 docs (#21487)	2023-02-07 15:09:27 +01:00
perf_infer_special.mdx
perf_train_cpu.mdx	Add perf numbers for perf_train_cpu (#20974)	2023-02-06 09:20:43 -05:00
perf_train_cpu_many.mdx
perf_train_gpu_many.mdx
perf_train_gpu_one.mdx
perf_train_special.mdx
perf_train_tpu.mdx
perf_train_tpu_tf.mdx	Typos/fixes to link syntax (#21450)	2023-02-07 15:19:19 +00:00
performance.mdx
perplexity.mdx	Fix bug in perplexity guide calculations and update perplexity numbers. Fixes #22348 (#22411)	2023-03-28 09:09:17 -04:00
philosophy.mdx
pipeline_tutorial.mdx	Update 2 doctest expected values for torch 2.0.0 (#22148)	2023-03-14 09:13:16 +00:00
pipeline_webserver.mdx	Update quality tooling for formatting (#21480)	2023-02-06 18:10:56 -05:00
pr_checks.mdx	Cleanup quality (#21493)	2023-02-07 12:27:31 -05:00
preprocessing.mdx	Updates to computer vision section of the Preprocess doc (#21181)	2023-01-19 08:43:36 -05:00
quicktour.mdx	Fix 2 quicktour file doctest (#21742)	2023-02-23 09:41:28 +01:00
run_scripts.mdx
sagemaker.mdx
serialization.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
task_summary.mdx	Remove trailing 'extractive' word from en documentation (#21594)	2023-02-13 10:09:00 -05:00
tasks_explained.mdx	Update task summary (#21067)	2023-02-02 11:41:27 -08:00
testing.mdx	[`tests`] add `accelerate` marker (#21743)	2023-02-27 12:33:34 +01:00
tf_xla.mdx	Rewrite a couple of lines in the TF XLA doc (#21177)	2023-01-18 17:53:05 +00:00
tokenizer_summary.mdx
torchscript.mdx
training.mdx	Fix code example in training tutorial (#21201)	2023-01-20 07:38:15 -08:00
troubleshooting.mdx	Removed BLIP mention from the troubleshooting guide (#21872)	2023-03-01 08:26:25 -05:00