transformers

mirror of https://github.com/saymrwulf/transformers.git synced 2026-05-14 20:58:08 +00:00

Stella Biderman c02cd95c56 GPT-J-6B (#13022)

* Test GPTJ implementation
* Fixed conflicts
* Update __init__.py
* Update __init__.py
* change GPT_J to GPTJ
* fix missing imports and typos
* use einops for now (need to change to torch ops later)
* Use torch ops instead of einsum
* remove einops deps
* Update configuration_auto.py
* Added GPT J
* Update gptj.rst
* Update __init__.py
* Update test_modeling_gptj.py
* Added GPT J
* Changed configs to match GPT2 instead of GPT Neo
* Removed non-existent sequence model
* Update configuration_auto.py
* Update configuration_auto.py
* Update configuration_auto.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Progress on updating configs to agree with GPT2
* Update modeling_gptj.py
* num_layers -> n_layer
* layer_norm_eps -> layer_norm_epsilon
* attention_layers -> num_hidden_layers
* Update modeling_gptj.py
* attention_pdrop -> attn_pdrop
* hidden_act -> activation_function
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* fix layernorm and lm_head size delete attn_type
* Update docs/source/model_doc/gptj.rst Co-authored-by: Suraj Patil <surajp815@gmail.com>
* removed claim that GPT J uses local attention
* Removed GPTJForSequenceClassification
* Update src/transformers/models/gptj/configuration_gptj.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Removed unsupported boilerplate
* Update tests/test_modeling_gptj.py Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update tests/test_modeling_gptj.py Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update tests/test_modeling_gptj.py Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update tests/test_modeling_gptj.py Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update __init__.py
* Update configuration_gptj.py
* Update modeling_gptj.py
* Corrected indentation
* Remove stray backslash
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Update docs to match
* Remove tf loading
* Remove config.jax
* Remove stray `else:` statement
* Remove references to `load_tf_weights_in_gptj`
* Adapt tests to match output from GPT-J 6B
* Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Default `activation_function` to `gelu_new`
  - Specify the approximate formulation of GELU to ensure parity with the default setting of `jax.nn.gelu()`
* Fix part of the config documentation
* Revert "Update configuration_auto.py" This reverts commit e9860e9c043b6ebf57a0e705044e9ec9ba2263bb.
* Revert "Update configuration_auto.py" This reverts commit cfaaae4c4dc70f1fbe9abd60fc8bd0b863b8c011.
* Revert "Update configuration_auto.py" This reverts commit 687788954fd0cfbc567fa1202d56a4ff9271944f.
* Revert "Update configuration_auto.py" This reverts commit 194d024ea87d4fcef0dcb08e57f52c47511a9fc6.
* Hyphenate GPT-J
* Undid sorting of the models alphabetically
* Reverting previous commit
* fix style and quality issues
* Update docs/source/model_doc/gptj.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Replaced GPTJ-specific code with generic code
* Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Made the code always use rotary positional encodings
* Update index.rst
* Fix documentation
* Combine attention classes
  - Condense all attention operations into `GPTJAttention`
  - Replicate GPT-2 and improve code clarity by renaming `GPTJAttention.attn_pdrop` and `GPTJAttention.resid_pdrop` to `GPTJAttention.attn_dropout` and `GPTJAttention.resid_dropout`
* Removed `config.rotary_dim` from tests
* Update test_modeling_gptj.py
* Update test_modeling_gptj.py
* Fix formatting
* Removed depreciated argument `layer_id` to `GPTJAttention`
* Update modeling_gptj.py
* Update modeling_gptj.py
* Fix code quality
* Restore model functionality
* Save `lm_head.weight` in checkpoints
* Fix crashes when loading with reduced precision
* refactor self._attn(...)` and rename layer weights"
* make sure logits are in fp32 for sampling
* improve docs
* Add `GPTJForCausalLM` to `TextGenerationPipeline` whitelist
* Added GPT-J to the README
* Fix doc/readme consistency
* Add rough parallelization support
  - Remove unused imports and variables
  - Clean up docstrings
  - Port experimental parallelization code from GPT-2 into GPT-J
* Clean up loose ends
* Fix index.rst

Co-authored-by: kurumuz <kurumuz1@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Eric Hallahan <eric@hallahans.name>
Co-authored-by: Leo Gao <54557097+leogao2@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: your_github_username <your_github_email>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

2021-08-31 17:53:02 +02:00
..
_static	Docs for v4.10.0	2021-08-31 16:02:31 +02:00
imgs	[doc] DP/PP/TP/etc parallelism (#12524)	2021-07-09 17:39:09 -07:00
internal	Fix doc building error	2021-08-12 05:49:02 -04:00
main_classes	TF/Numpy variants for all DataCollator classes (#13105)	2021-08-31 13:06:48 +01:00
model_doc	GPT-J-6B (#13022)	2021-08-31 17:53:02 +02:00
add_new_model.rst	consistent nn. and nn.functional: part 5 docs (#12161)	2021-06-14 13:34:32 -07:00
benchmarks.rst	[Docs] fixed broken link (#12205)	2021-06-16 15:14:53 -04:00
bertology.rst
community.md	docs: add HuggingArtists to community notebooks (#13050)	2021-08-10 09:36:44 +02:00
conf.py	Docs for v4.10.0	2021-08-31 16:02:31 +02:00
contributing.md
converting_tensorflow_models.rst	Examples reorg (#11350)	2021-04-21 11:11:20 -04:00
custom_datasets.rst
debugging.rst	[debug] DebugUnderflowOverflow doesn't work with DP (#12816)	2021-07-21 09:36:02 -07:00
examples.md
fast_tokenizers.rst
favicon.ico
glossary.rst	Add video links to the documentation (#12162)	2021-06-15 06:37:37 -04:00
index.rst	GPT-J-6B (#13022)	2021-08-31 17:53:02 +02:00
installation.md	Add mention of the huggingface_hub methods for offline mode (#12320)	2021-06-23 09:45:30 -04:00
migration.md	consistent nn. and nn.functional: part 5 docs (#12161)	2021-06-14 13:34:32 -07:00
model_sharing.rst	Add video links to the documentation (#12162)	2021-06-15 06:37:37 -04:00
model_summary.rst	Add video links to the documentation (#12162)	2021-06-15 06:37:37 -04:00
multilingual.rst	Examples reorg (#11350)	2021-04-21 11:11:20 -04:00
notebooks.md
parallelism.md	docs: fix minor typo (#13289)	2021-08-31 06:49:05 -04:00
performance.md	[doc] performance: batch sizes (#12725)	2021-07-15 09:39:34 -07:00
perplexity.rst	Create perplexity.rst (#13004)	2021-08-05 02:56:13 -04:00
philosophy.rst
preprocessing.rst	doc mismatch fixed (#13345)	2021-08-31 06:28:37 -04:00
pretrained_models.rst
quicktour.rst	Doctests job (#13088)	2021-08-12 03:42:25 -04:00
sagemaker.md	remove documentation (#12657)	2021-07-12 18:02:51 +02:00
serialization.rst	Add to ONNX docs (#13048)	2021-08-09 09:51:49 -04:00
task_summary.rst	Doctests job (#13088)	2021-08-12 03:42:25 -04:00
testing.rst	[doc] testing: how to trigger a self-push workflow (#12724)	2021-07-15 16:18:56 -07:00
tokenizer_summary.rst	Add video links to the documentation (#12162)	2021-06-15 06:37:37 -04:00
training.rst	fix: typo spelling grammar (#13212)	2021-08-30 08:09:14 -04:00
troubleshooting.md