pytorch/torch/testing/_internal
haozhe.zhu 1c3fe84033 [optim] add fused_adagrad support for CPU device (#124905)
Add fused Adagrad kernel support for CPU.

## Benchmark results:
Hardware: ICX, 32 cores per socket
Test Scripts:
https://gist.github.com/zhuhaozhe/79e842e0a6e25d6d7fa1e4598807272c
https://gist.github.com/zhuhaozhe/b4c6998a509dcea1796dd05b3005c969
```
Tensor Size: 262144, Num Tensor 4, Num Threads: 1
_single_tensor_adagrad time: 0.2500 seconds
_fused_adagrad time: 0.0933 seconds
Tensor Size: 4194304, Num Tensor 32, Num Threads: 32
_single_tensor_adagrad time: 2.8819 seconds
_fused_adagrad time: 1.7591 seconds
```
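To make the relative gain easier to read, the timings above can be turned into speedup ratios. The following is a minimal sketch, not part of the PR: it assumes the output format shown above, and the helper name `speedups` is hypothetical.

```python
import re

# Benchmark output copied verbatim from the results above.
BENCH_OUTPUT = """\
Tensor Size: 262144, Num Tensor 4, Num Threads: 1
_single_tensor_adagrad time: 0.2500 seconds
_fused_adagrad time: 0.0933 seconds
Tensor Size: 4194304, Num Tensor 32, Num Threads: 32
_single_tensor_adagrad time: 2.8819 seconds
_fused_adagrad time: 1.7591 seconds
"""


def speedups(text):
    """Return (config_line, single_tensor_time / fused_time) per configuration."""
    results = []
    config, timings = None, {}
    for line in text.splitlines():
        if line.startswith("Tensor Size"):
            # A new configuration starts; reset the collected timings.
            config, timings = line, {}
            continue
        match = re.match(r"(_\w+) time: ([\d.]+) seconds", line)
        if match:
            timings[match.group(1)] = float(match.group(2))
        if "_single_tensor_adagrad" in timings and "_fused_adagrad" in timings:
            ratio = timings["_single_tensor_adagrad"] / timings["_fused_adagrad"]
            results.append((config, ratio))
            timings = {}
    return results


for config, ratio in speedups(BENCH_OUTPUT):
    print(f"{config} -> {ratio:.2f}x")
```

On these numbers the fused path is roughly 2.7x faster in the single-threaded case and roughly 1.6x faster in the 32-thread case.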
## Test Plan:
```
python test_optim.py -k test_fused_matches_forloop
python test_optim.py -k test_fused_large_tensor
python test_optim.py -k test_can_load_older_state_dict
python test_optim.py -k test_grad_scaling_autocast_fused_optimizers
python test_torch.py -k test_grad_scaling_autocast_fused
python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step
```
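The idea behind a check like `test_fused_matches_forloop` can be illustrated without PyTorch. The sketch below is an assumption-laden toy, not the kernel or the test from this PR: it implements a basic Adagrad step (no `lr_decay`, no `weight_decay`) on plain Python lists, once as several passes per tensor and once as a single fused pass, and compares the results. All function names are hypothetical.

```python
import math


def adagrad_forloop(params, grads, state_sums, lr=0.01, eps=1e-10):
    """One Adagrad step using a separate pass per elementary operation."""
    for p, g, s in zip(params, grads, state_sums):
        squared = [gi * gi for gi in g]              # pass 1: grad^2
        for i in range(len(s)):                      # pass 2: accumulate
            s[i] += squared[i]
        std = [math.sqrt(si) + eps for si in s]      # pass 3: sqrt + eps
        for i in range(len(p)):                      # pass 4: update params
            p[i] -= lr * g[i] / std[i]


def adagrad_fused(params, grads, state_sums, lr=0.01, eps=1e-10):
    """The same step, with all operations fused into one pass per element."""
    for p, g, s in zip(params, grads, state_sums):
        for i in range(len(p)):
            s[i] += g[i] * g[i]
            p[i] -= lr * g[i] / (math.sqrt(s[i]) + eps)


def fused_matches_forloop(params, grads, steps=3):
    """Run both variants on independent copies and compare every element."""
    p_a = [list(p) for p in params]
    p_b = [list(p) for p in params]
    s_a = [[0.0] * len(p) for p in params]
    s_b = [[0.0] * len(p) for p in params]
    for _ in range(steps):
        adagrad_forloop(p_a, grads, s_a)
        adagrad_fused(p_b, grads, s_b)
    return all(
        math.isclose(x, y, rel_tol=1e-12)
        for a, b in zip(p_a, p_b)
        for x, y in zip(a, b)
    )


print(fused_matches_forloop([[1.0, -2.0], [0.5]], [[0.1, 0.3], [-0.2]]))
```

A real fused kernel gets its speed from doing this single pass in vectorized native code; the toy only shows why the two formulations are expected to agree numerically.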

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124905
Approved by: https://github.com/jgong5, https://github.com/janeyx99
2024-05-13 01:16:20 +00:00
..
codegen
data
distributed Fix DDP no_sync when find_unused_parameters is True (#124193) 2024-05-09 17:33:33 +00:00
generated
opinfo Test foreach functions with all dtypes except qints (#125527) 2024-05-10 18:56:37 +00:00
optests
test_module
__init__.py
autocast_test_lists.py
autograd_function_db.py
check_kernel_launches.py
common_cuda.py
common_device_type.py [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 1) (#122866) 2024-05-09 00:51:35 +00:00
common_dist_composable.py
common_distributed.py [c10d] Reduce test time by reusing ProcessGroup (#125648) 2024-05-08 22:33:40 +00:00
common_dtype.py
common_fsdp.py [FSDP2] Added HSDP grad acc tests and some minor changes (#125479) 2024-05-03 23:44:05 +00:00
common_jit.py
common_methods_invocations.py Test foreach functions with all dtypes except qints (#125527) 2024-05-10 18:56:37 +00:00
common_mkldnn.py
common_modules.py Skip test_memory_format_nn_BatchNorm2d in inductor (#125970) 2024-05-11 04:11:18 +00:00
common_nn.py
common_optimizers.py [optim] add fused_adagrad support for CPU device (#124905) 2024-05-13 01:16:20 +00:00
common_pruning.py
common_quantization.py [MPS] Add naive quantized intmm and .gputrace capture hooks (#125163) 2024-05-03 15:20:39 +00:00
common_quantized.py
common_subclass.py
common_utils.py Remove Caffe2 python (#125143) 2024-05-10 21:15:43 +00:00
composite_compliance.py
custom_op_db.py Fix torch.library.register_fake's module reporting (#125037) 2024-04-26 20:53:33 +00:00
dist_utils.py
dynamo_test_failures.py
hop_db.py [dynamo] support torchbind object input (#124978) 2024-05-07 03:02:00 +00:00
hypothesis_utils.py
inductor_utils.py [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 1) (#122866) 2024-05-09 00:51:35 +00:00
jit_metaprogramming_utils.py
jit_utils.py
logging_tensor.py [BE]: Update ruff to v0.4.4 (#125031) 2024-05-12 20:02:37 +00:00
logging_utils.py [inductor] Minor fixes to various tests before enabling fx graph caching in OSS by default (#125258) 2024-05-01 02:34:01 +00:00
quantization_torch_package_models.py
static_module.py
torchbind_impls.py [dynamo] support torchbind object input (#124978) 2024-05-07 03:02:00 +00:00
triton_utils.py
two_tensor.py