pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00


haozhe.zhu 1c3fe84033 [optim] add fused_adagrad support for CPU device (#124905)	2024-05-13 01:16:20 +00:00

Support fused_sgd_kernel support for CPU.

## Bench result:
32 core/sockets ICX
Test Scripts:
https://gist.github.com/zhuhaozhe/79e842e0a6e25d6d7fa1e4598807272c
https://gist.github.com/zhuhaozhe/b4c6998a509dcea1796dd05b3005c969
```
Tensor Size: 262144, Num Tensor 4, Num Threads: 1
_single_tensor_adagrad time: 0.2500 seconds
_fused_adagrad time: 0.0933 seconds
Tensor Size: 4194304, Num Tensor 32, Num Threads: 32
_single_tensor_adagrad time: 2.8819 seconds
_fused_adagrad time: 1.7591 seconds
```

## Test Plan:
```
python test_optim.py -k test_fused_matches_forloop
python test_optim.py -k test_fused_large_tensor
python test_optim.py -k test_can_load_older_state_dict
python test_optim.py -k test_grad_scaling_autocast_fused_optimizers
python test_torch.py -k test_grad_scaling_autocast_fused
python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step
```

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124905
Approved by: https://github.com/jgong5, https://github.com/janeyx99
codegen
data
distributed	Fix DDP no_sync when find_unused_parameters is True (#124193)	2024-05-09 17:33:33 +00:00
generated
opinfo	Test foreach functions with all dtypes except qints (#125527)	2024-05-10 18:56:37 +00:00
optests
test_module
__init__.py
autocast_test_lists.py
autograd_function_db.py
check_kernel_launches.py
common_cuda.py
common_device_type.py	[Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 1) (#122866)	2024-05-09 00:51:35 +00:00
common_dist_composable.py
common_distributed.py	[c10d] Reduce test time by reusing ProcessGroup (#125648)	2024-05-08 22:33:40 +00:00
common_dtype.py
common_fsdp.py	[FSDP2] Added HSDP grad acc tests and some minor changes (#125479)	2024-05-03 23:44:05 +00:00
common_jit.py
common_methods_invocations.py	Test foreach functions with all dtypes except qints (#125527)	2024-05-10 18:56:37 +00:00
common_mkldnn.py
common_modules.py	Skip test_memory_format_nn_BatchNorm2d in inductor (#125970)	2024-05-11 04:11:18 +00:00
common_nn.py
common_optimizers.py	[optim] add fused_adagrad support for CPU device (#124905)	2024-05-13 01:16:20 +00:00
common_pruning.py
common_quantization.py	[MPS] And naive quantized intmm and `.gputrace` capture hooks (#125163)	2024-05-03 15:20:39 +00:00
common_quantized.py
common_subclass.py
common_utils.py	Remove Caffe2 python (#125143)	2024-05-10 21:15:43 +00:00
composite_compliance.py
custom_op_db.py	Fix torch.library.register_fake's module reporting (#125037)	2024-04-26 20:53:33 +00:00
dist_utils.py
dynamo_test_failures.py
hop_db.py	[dynamo] support torchbind object input (#124978)	2024-05-07 03:02:00 +00:00
hypothesis_utils.py
inductor_utils.py	[Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 1) (#122866)	2024-05-09 00:51:35 +00:00
jit_metaprogramming_utils.py
jit_utils.py
logging_tensor.py	[BE]: Update ruff to v0.4.4 (#125031)	2024-05-12 20:02:37 +00:00
logging_utils.py	[inductor] Minor fixes to various tests before enabling fx graph caching in OSS by default (#125258)	2024-05-01 02:34:01 +00:00
quantization_torch_package_models.py
static_module.py
torchbind_impls.py	[dynamo] support torchbind object input (#124978)	2024-05-07 03:02:00 +00:00
triton_utils.py
two_tensor.py