transformers

mirror of https://github.com/saymrwulf/transformers.git synced 2026-05-14 20:58:08 +00:00

Author	SHA1	Message	Date
saymrwulf	44d88ddb40	Experimenting with FPGA constraints (source: Trimberger 'Three Ages of FPGAs') (transformers)	2015-06-03 21:18:00 +02:00
saymrwulf	7cdefe560c	Minor doc updates: linking to article on quantization (transformers)	2015-05-13 21:01:00 +02:00
saymrwulf	4905a76bd8	Refactor for clarity, might break a few tests though (transformers)	2015-04-10 19:03:00 +02:00
saymrwulf	ac85e61aa3	Experimenting with FPGA constraints (source: Trimberger 'Three Ages of FPGAs') (transformers)	2015-04-10 01:36:00 +02:00
saymrwulf	f19fa1d2a9	Experimenting with FPGA constraints (source: Trimberger 'Three Ages of FPGAs') (transformers)	2015-03-24 19:55:00 +01:00
saymrwulf	d08a72dbed	Implementing approach from a new paper read last night (transformers)	2015-03-08 07:24:00 +01:00
saymrwulf	10fd04a35a	Implementing approach from a new paper read last night (transformers)	2015-03-08 07:36:00 +01:00
saymrwulf	c11ad745d8	Late-night bugfix on financial RL environment (transformers)	2015-03-08 21:34:00 +01:00
saymrwulf	30fd6f216b	Minor doc updates: linking to article on quantization (transformers)	2015-02-28 23:27:00 +01:00
saymrwulf	d788f040e2	Refactor for clarity, might break a few tests though (transformers)	2014-12-23 18:04:00 +01:00
saymrwulf	86203b05e8	Refactor for clarity, might break a few tests though (transformers)	2014-12-23 03:18:00 +01:00
saymrwulf	117438e542	Testing bigger LLM config, referencing 'Attention Is All You Need' (transformers)	2014-12-23 21:37:00 +01:00
saymrwulf	0e37b10204	Minor doc updates: linking to article on quantization (transformers)	2014-12-23 21:25:00 +01:00
saymrwulf	2a2900d955	Testing bigger LLM config, referencing 'Attention Is All You Need' (transformers)	2014-11-04 04:16:00 +01:00
saymrwulf	96ea40ed26	Testing bigger LLM config, referencing 'Attention Is All You Need' (transformers)	2014-11-04 17:39:00 +01:00
saymrwulf	34e9691bdd	Experimenting with FPGA constraints (source: Trimberger 'Three Ages of FPGAs') (transformers)	2014-08-31 03:08:00 +02:00
saymrwulf	62060bbe66	Minor doc updates: linking to article on quantization (transformers)	2014-08-30 00:27:00 +02:00
saymrwulf	97058aebff	Minor doc updates: linking to article on quantization (transformers)	2014-07-29 19:47:00 +02:00
saymrwulf	da45732b94	Late-night bugfix on financial RL environment (transformers)	2014-07-29 00:28:00 +02:00
saymrwulf	21f75cd340	Late-night bugfix on financial RL environment (transformers)	2014-07-29 06:46:00 +02:00
saymrwulf	e24b9b63b9	Minor doc updates: linking to article on quantization (transformers)	2014-07-13 03:41:00 +02:00
saymrwulf	d36c1bffe6	Implementing approach from a new paper read last night (transformers)	2014-06-26 03:28:00 +02:00
saymrwulf	801d6c73a5	Late-night bugfix on financial RL environment (transformers)	2014-06-26 07:42:00 +02:00
saymrwulf	8c60ec8cc3	Refactor for clarity, might break a few tests though (transformers)	2014-06-16 06:40:00 +02:00
saymrwulf	ca3027713e	Quick fix, referencing a known issue from the official repo (transformers)	2014-06-16 01:09:00 +02:00
saymrwulf	c0fa1a63f2	Experimenting with FPGA constraints (source: Trimberger 'Three Ages of FPGAs') (transformers)	2014-05-15 08:36:00 +02:00
saymrwulf	5cede1d5b5	Late-night bugfix on financial RL environment (transformers)	2014-05-08 18:23:00 +02:00
saymrwulf	5782c85d65	Implementing approach from a new paper read last night (transformers)	2014-04-05 04:02:00 +02:00
saymrwulf	3319dfaa62	Quick fix, referencing a known issue from the official repo (transformers)	2014-01-10 01:42:00 +01:00
saymrwulf	7e1a4f6832	Late-night bugfix on financial RL environment (transformers)	2014-01-10 02:43:00 +01:00
jiqing-feng	0baf003915	Refactor OPT model (#36101 ) * remove cross attention Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * remove is_decoder Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix pkv Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-02-10 14:27:16 +01:00
Yoni Gozlan	924f1c717a	Remove Multi-threaded image conversion for fast image processors (#36105 ) remove multithreaded image conversion Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-02-10 07:59:34 -05:00
Yih-Dar	3897f2caf8	Enable pytest live log and show warning logs on GitHub Actions CI runs (#35912 ) * fix * remove * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-02-10 13:36:20 +01:00
Jingze Shi	48a309d0d2	Support constant lr with cooldown (#35453 ) * Add support for constant learning rate with cooldown * Add support for constant learning rate with cooldown * Add support for constant learning rate with cooldown * Add support for constant learning rate with cooldown * Add support for constant learning rate with cooldown * Add support for constant learning rate with cooldown * Add support for constant learning rate with cooldown * Add more warmup and cooldown methods to 'get_wsc_schedule' * Add more warmup and cooldown methods to 'get_wsc_schedule' * Add more warmup and cooldown methods to 'get_wsc_schedule' * Add more warmup and cooldown methods to 'get_wsc_schedule' * Add more warmup and decay methods to 'get_wsd_schedule' * support num_training_steps and num_stable_steps for get_wsd_schedule * support num_training_steps and num_stable_steps for get_wsd_schedule * get wsd scheduler before the `num_training_steps` decision * fix code_quality * Update stable branch logic * fix code_quality * Move stable stage decide to `get_wsd_schedule` * Update docstring of `get_wsd_schedule` * Update `num_train_steps` to optional * Update `num_train_steps` to optional * Update docstring of `get_wsd_schedule` * Update src/transformers/optimization.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-02-10 13:21:55 +01:00
Armaghan Shakir	9a6be63fdb	Add Apple's Depth-Pro for depth estimation (#34583 ) * implement config and model building blocks * refactor model architechture * update model outputs * update init param to include use_fov_model * update param name in config * fix hidden_states and attentions outputs for fov * sort config * complete minor todos * update patching * update config for encoder * fix config * use correct defaults in config * update merge for compatibility with different image size * restructure encoder for custom configuration * make fov model compatible with custom config * replace word "decoder" with "fusion" * weight conversion script * fix fov squeeze * update conversion script (without test) * upload ruff image processing * create fast image processing * use torch interpolation for image processing * complete post_process_depth_estimation * config: fix imports and sort args * apply inference in weight conversion * use mllama script instead for weight conversion * clean weight conversion script * add depth-pro status in other files * fill docstring in config * formatting * more formatting * formatting with ruff * formatting with style * fix copied classes * add examples; update weight convert script * fix using check_table.py and isort * fix config docstring * add depth pro to sdpa docs * undo unintentional changes in configuration_gemma.py * minor fixes * test image processing * fixes and tests * more fixes * use output states from image_encoder instead * Revert "use output states from image_encoder instead" This reverts commit 2408ec54e4f27d2abbecdb8374e58f34d91d8e96. 
* make embeddings dynamic * reshape output hidden states and attentions as part of computation graph * fix ruff formating * fix docstring failure * use num_fov_head_layers in tests * update doc * check consistency with config * ruff formatting * update test case * fix ruff formatting * add tests for fov * use interpolation in postprocess * run and fix slow tests locally * use scaled_images_features for image and fov encoder * return fused_hidden_states in fusion stage * fix example * fix ruff * fix copyright license for all files * add __all__ for each file * minor fixes - fix download spell - add push_to_hub option - fix Optional type hinting - apply single loop for DepthProImageProcessor.preprocess * return list in post_process_depth_estimation * minor fixes - capitalize start of docstring - use ignore copy - fix examples - move docstring templates and custom output classes to top - remove "-> None" typehinting from __init__ - type hinting for forward passes - fix docstrings for custom output classes * fix "ruff check" * update upsample and projection * major changes: (image size and merge optimization) - add support for images of any size - optimize merge operation - remove image_size from config - use full names instead of B, C, H, W - remove interpolation from fusion stage - add interpolation after merge - move validations to config - update integration test - add type hints for functions * fix push_to_hub option in weights conversion * remove image_size in weights conversion * major changes in the architecture - remove all DepthProViT modules and support different backbones using the AutoModel API - set default use_fov_model to False - validate parameters in configuration - update interpolate function: use "nearest" for faster computation - update reshape_feature function: remove all special tokens, possible from different backbones - update merge function: use padding from config instead of merge_out_size - remove patch_to_batch and batch_to_patch 
conversions for now - calculate out_size dynamically in the encoder - leave head_mask calculation to the backbone - fix bugs with merge - add more comments - update tests * placeholder for unused config attributes * improve docs amid review * minor change in docs * further optimize merge * fix formatting * remove unused patch/batch convertion functions * use original F.interpolate * improve function naming * minor chages - use torch_int instead of int - use proper for newly initialized tensors - use user provided return_dict for patch_encoder - use if-else block instead in self.use_fov_model * rearchitect upsample block for improved modularity * update upsample keys in weight conversion * improve padding in merge_patches * use double-loop for merge * update comments * create feature_extractor, reduce some forward code * introduce config.use_mask_token in dinov2 * minor fixes * minor fixes for onnx * update __init__ to latest format * remove DepthProConfig.to_dict() * major changes in backbone * update config in weight conversion * formatting * converted model is fp32 * improve naming and docs for feature_extractor->reconstruct_feature_maps * minor fixes; amid review * create intermediate vars in func call * use torch.testing.assert_close * use ModuleList instead of Sequential and ModuleDict * update docs * include fov in integraiton tests * update docs * improve initialization of convolution layers * fix unused fov keys * update tests * ruff format * fix test, amid kaimming initialization * add depthpro to toctree * add residual layer to _no_split_modules * architecture rework * Update src/transformers/models/depth_pro/image_processing_depth_pro.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/depth_pro/image_processing_depth_pro_fast.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * update docs * improve merge_patches * use flatten with fov_output * ruff formatting * update resources section in docs Co-authored-by: 
Pavel Iakubovskii <qubvel@gmail.com> * fix typo "final_kernal_size" Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix output typehint for DepthProDepthEstimator Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * residual operation in 2 steps Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * use image_size instead of global patch_size in interpolation * replace all Sequential with ModuleList * update fov * update heads * fix and update conversion script for heads * ruff formatting * remove float32 conversion * use "Fov" instead of "FOV" in class names * use "Fov" instead of "FOV" in config docs * remove prune_heads * update fusion stage * use device in examples * update processor * ruff fixes * add do_rescale in image_processor_dict * skip test: test_fast_is_faster_than_slow * ruff formatting * DepthProImageProcessorFast in other files * revert antialias removal * add antialias in BaseImageProcessorFast * Revert "revert antialias removal" This reverts commit 5caa0bd8f9f7463b98410c04e6cfe8fef3adee18. * Revert "add antialias in BaseImageProcessorFast" This reverts commit 3ae1134780ae236872985523d9c0a444eabcc179. * update processor for grouping and antialias * try test_fast_is_faster_than_slow without "skip" or "flanky" * update checkpoint * update checkpoint * use @is_flanky for processor test * update checkpoint to "apple/DepthPro-hf" --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-02-10 11:32:45 +00:00
Raushan Turganbay	c399921965	Paligemma: revert #36084 (#36113 ) * revert * type check	2025-02-10 12:04:24 +01:00
Raushan Turganbay	eebd2c972c	Chat template: update for processor (#35953 ) * update * we need batched nested input to always process correctly * update a bit * fix copies	2025-02-10 09:52:19 +01:00
Raushan Turganbay	5bd7694781	Processors: allow tuples of images when checking (#36084 ) allow tuples of images	2025-02-10 09:35:13 +01:00
Kyle Sayers	3a3b06ace4	fix MllamaVisionAttention typehint (#35975 ) * fix MllamaVisionAttention typehint Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Update src/transformers/models/mllama/modeling_mllama.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * fix suggestion Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2025-02-10 09:17:10 +01:00
Fanli Lin	6b55046213	[docs] fix not-working example code in `perf_infer_gpu_one.md` (#36087 ) * bug fix * update memory limit	2025-02-07 12:42:22 -08:00
Fanli Lin	14ca7f1452	[docs] fix typo (#36080 ) typo fix	2025-02-07 12:42:09 -08:00
Fanli Lin	c361b1e3d9	[docs] fix model checkpoint name (#36075 ) update model name	2025-02-07 12:41:52 -08:00
Zach Mueller	ba29a439ad	Fix OS err (#36094 ) * Try via local_main_process first * try 2	2025-02-07 09:57:43 -05:00
Matt	a18b7fdd9e	Move audio top_k tests to the right file and add slow decorator (#36072 ) * Move audio top_k tests to the right file and add slow decorator because we load a real model * empty commit to trigger tests	2025-02-07 14:32:30 +00:00
DeepWave	014047e1c8	Fix bug in apply_rotary_pos_emb_flashatt: in Qwen2-5-VL (#36065 )	2025-02-07 10:43:45 +01:00
Jade Choghari	006d9249ec	Adding RT-DETRv2 for object detection (#34773 ) * cookiecutter add rtdetrv2 * make modular working * working modelgit add . * working modelgit add . * finalize moduar inheritence * finalize moduar inheritence * Update src/transformers/models/rtdetrv2/modular_rtdetrv2.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * update modular and add rename * remove output ckpt * define loss_kwargs * fix CamelCase naming * fix naming + files * fix modular and convert file * additional changes * fix modular * fix import error (switch to lazy) * fix autobackbone * make style * add * update testing * fix loss * remove old folder * fix testing for v2 * update docstring * fix docstring * add resnetv2 (with modular bug to fix) * remove resnetv2 backbone * fix changes * small fixes * remove rtdetrv2resnetconfig * add rtdetrv2 name to convert * make style * Update docs/source/en/model_doc/rt_detr_v2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix modular typo after review * add reviewed changes * add final review changes * Update docs/source/en/model_doc/rt_detr_v2.md Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/rt_detr_v2/__init__.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/rt_detr_v2/convert_rt_detr_v2_weights_to_hf.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * add review changes * remove rtdetrv2 resnet * removing this weird project change * change ckpt name from jadechoghari to author * implement review and update testing * update naming and remove wrong ckpt * name * make fix-copies * Fix RT-DETR loss * Add resources, fix name * Fix repo in 
docs * Fix table name --------- Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: qubvel <qubvel@gmail.com>	2025-02-06 19:28:45 +00:00
Fanli Lin	6246c03260	[docs] fix outdated example code in `trainer.md` (#36066 ) fix bugs	2025-02-06 10:54:22 -08:00
Matt	4563ba2c6f	Fix StopStringCriteria to handle tokens above len(tokenizer) (#35797 ) * Fix StopStringCriteria to handle tokens above len(tokenizer) This fixes #35244 by clipping token IDs to be within the tokenizer's vocabulary size before performing the embedding lookup. This prevents index errors when model.config.vocab_size > len(tokenizer). The fix: 1. Adds a clamp operation to ensure token IDs are within bounds 2. Adds a test case to verify the behavior * Use self.stop_strings instead of stop_strings * Handle clipping correctly * make fixup * Update test to the new embedding vecs * Use much bigger values in the mismatch test * Typo fix * Slight simplification --------- Co-authored-by: openhands <openhands@all-hands.dev>	2025-02-06 16:53:28 +00:00
Zach Mueller	28f73bc307	Fix model kwargs (#35875 ) * Save state * Make a failing test * Better test * mpt -> done, many more to go * Rm extranious * Bamba * Bert * big_bird * biogpt * bloom * codegen * ctrl * data2vec * dbrx * Through up to Dbrx * electra * ernie * falcon * Fuyu/persimmon * Include noop kwargs to base models * Rebase * Skip musigen * Refactor/skip mllama * Revert makefile * Rm file * Fix PT failing, need to modify rest of loss funcs to not resize * Propagate some * Continue * More * More options * Mostly fixed * Proved that it's the same * Bloom is good * Make ability to override loss func possible * Fixup * Clean * Fix xglm * Quality tests * Skip OCR2 * Make specific loss for xglm * Make order the same/line up 1:1 * xglm * Skip fx output loss bloom model * Didn't pass in pad_token_id * Fix quality	2025-02-06 11:35:25 -05:00
湛露先生	1590c66430	Fix words typos in ggml test. (#36060 ) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-02-06 15:32:40 +00:00

18012 commits