pytorch/test
Anant Gulati b379a28a95 Generalization of distributed test cases for non-CUDA devices (#138216)
# Motivation
This PR is an extension of #131758. As described there, these changes aim to make distributed unit tests (UTs) more accessible to users of all device types.

It demonstrates a few changes discussed by @kwen2501 and @jgong5 in the discussion for #131758 (https://github.com/pytorch/pytorch/pull/131758#discussion_r1762422784).

This PR contains two types of changes. The first is to the common distributed folder, where we have added a new class derived from MultiProcessTestCase that abstracts out process group creation and deletion, along with other functionality, for a given device.

New generalized tests can be added by deriving from this base class.
The PR also includes other miscellaneous changes for Gaudi support.
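
As a minimal sketch of the pattern described above (not the class added by this PR), a base test class can own the process group lifecycle so that derived tests only declare their device type. Everything here is hypothetical: the names `DeviceAgnosticTestBase`, `FakeProcessGroup`, `create_pg`, and `destroy_pg`, and the device-to-backend mapping, are assumptions for illustration. A stand-in object replaces the real process group so the example runs with the standard library only.

```python
import unittest

# Assumed device-to-backend mapping, for illustration only.
BACKEND_FOR_DEVICE = {"cpu": "gloo", "cuda": "nccl", "hpu": "hccl"}


class FakeProcessGroup:
    """Stand-in for a real process group (hypothetical, no torch needed)."""

    def __init__(self, backend, rank, world_size):
        self.backend = backend
        self.rank = rank
        self.world_size = world_size
        self.destroyed = False


class DeviceAgnosticTestBase(unittest.TestCase):
    """Hypothetical base class that hides per-device setup and teardown."""

    device_type = "cpu"  # subclasses override this
    world_size = 2
    rank = 0

    def create_pg(self):
        # Pick the backend from the device type instead of hard-coding it.
        backend = BACKEND_FOR_DEVICE[self.device_type]
        pg = FakeProcessGroup(backend, self.rank, self.world_size)
        # Register teardown so individual tests never destroy the group.
        self.addCleanup(self.destroy_pg, pg)
        return pg

    def destroy_pg(self, pg):
        pg.destroyed = True


class HpuExampleTest(DeviceAgnosticTestBase):
    device_type = "hpu"
    seen = []  # records groups so the example can inspect them afterwards

    def test_backend_follows_device(self):
        pg = self.create_pg()
        HpuExampleTest.seen.append(pg)
        self.assertEqual(pg.backend, "hccl")
        self.assertFalse(pg.destroyed)


def run_example():
    # Run the derived test case and return the unittest result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HpuExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Registering teardown through `addCleanup` keeps the derived test body free of device-specific setup and cleanup code, which is the kind of abstraction the paragraph above describes.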

The second change is to test_functional_api.py, a test file in common distributed. This file is a proof of concept (POC) for how this new class can be used to write more device-agnostic distributed test cases.

The following changes have been made to test_functional_api.py:
- Functionality has been added to test on non-CUDA devices, using Intel HPU as an example
- Multiple setup steps previously required by MultiProcessTestCase have been abstracted out
- Miscellaneous adaptations allow general calls to accelerators, adding test skips instead of explicitly skipping for multiple GPUs
- skipIfHpu flags have been added to skip a few multithreaded test cases that are not yet supported on HPUs

NOTE: test_functional_api.py contains tests that require multithreading functions which are not yet supported on HPUs. These are skipped on HPU using the skipHPU decorator.
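
A device-conditional skip of the kind mentioned in the note can be sketched as follows. This is an illustrative stand-in, not the decorator used in PyTorch: `skip_on`, `CURRENT_DEVICE`, and the test names are hypothetical, and the current device is hard-coded here rather than detected.

```python
import functools
import unittest

# Hypothetical: a real suite would detect the available accelerator.
CURRENT_DEVICE = "hpu"


def skip_on(device_type, reason):
    """Skip the decorated test when running on `device_type` (illustrative)."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Checked at call time so the device can be changed per run.
            if CURRENT_DEVICE == device_type:
                raise unittest.SkipTest(reason)
            return fn(*args, **kwargs)

        return wrapper

    return decorator


class SkipExampleTest(unittest.TestCase):
    @skip_on("hpu", "multithreaded path not yet supported on HPU")
    def test_multithreaded(self):
        self.assertTrue(True)

    def test_runs_everywhere(self):
        self.assertTrue(True)


def run_skip_example():
    # Run both tests and return the unittest result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With `CURRENT_DEVICE` set to `"hpu"`, the multithreaded test is reported as skipped while the other test still runs, so unsupported paths are visible in the report instead of silently omitted.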

I will raise a separate PR to improve the usability of these decorators in a device-agnostic setting, in the manner suggested by @kwen2501 in a comment on this PR.

This PR is a cleaned-up version of a previous PR (#136988), which I closed due to human error. I have also addressed some of the comments made by @kwen2501 on that PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138216
Approved by: https://github.com/kwen2501, https://github.com/guangyey
2024-11-18 09:38:00 +00:00
ao/sparsity
autograd Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
backends/xeon
benchmark_utils
bottleneck_test
cpp [8/N] Don't skip ASAN on some tests (#140081) 2024-11-09 01:00:13 +00:00
cpp_api_parity
cpp_extensions OpenReg: Support autograd (#140662) 2024-11-14 23:47:56 +00:00
custom_backend
custom_operator
distributed Generalization of distributed test cases for non-CUDA devices (#138216) 2024-11-18 09:38:00 +00:00
distributions Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
dynamo [Dynamo] Replace torch._dynamo.optimize() with torch.compile() [5/N] (#140663) 2024-11-18 04:11:56 +00:00
dynamo_expected_failures Revert "[dynamo] Fix constant propagation in builtins and UserClasses (#131354)" 2024-11-01 00:13:20 +00:00
dynamo_skips config: Add env_name_default and env_name_force to Config (#138956) 2024-11-06 21:20:42 +00:00
edge Set RUNPATH so installed tests can find the required shared libraries (#136627) 2024-10-25 09:38:08 +00:00
error_messages
expect Add ScalarList overload to _foreach_lerp (#134482) 2024-11-12 19:03:41 +00:00
export Ignore eager profiling code in training IR (#140826) 2024-11-16 20:31:17 +00:00
forward_backward_compatibility
functorch Fix softmax_backward_data cpu implementation error when argument output is noncontinguous (#139740) 2024-11-15 19:53:20 +00:00
fx Revert "Refactor FxGraphDrawer to use HTML-like labels (#137726)" 2024-11-04 17:44:44 +00:00
higher_order_ops Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
inductor [inductor] Support fixed triton configs defined at compile time (#140217) 2024-11-17 16:10:37 +00:00
jit
jit_hooks
lazy
mobile
nn [BE] Replace skipIfMPS with expectedFailureMPS (#139940) 2024-11-15 03:48:37 +00:00
onnx [ONNX] Improve the conversion of from dynamic axes to shapes (#140488) 2024-11-15 04:26:45 +00:00
optim
package
profiler [profiler][UT] instantiate profiler UTs for devices and enable UTs for xpu profiler (#134316) 2024-11-05 05:46:13 +00:00
quantization [Quant][Onednn] add linear_dynamic_fp16 ops (#140376) 2024-11-14 05:19:18 +00:00
scripts
test_img
torch_np Update test_multiarray.py to support numpy 2.0+ (#138461) 2024-10-28 04:30:50 +00:00
typing
xpu
_test_bazel.py
allowlist_for_publicAPI.json
conftest.py
create_dummy_torchscript_model.py
delete.py
hi.py
HowToWriteTestsUsingFileCheck.md
linear.py
load_torchscript_model.py Flip default on weights_only (#137602) 2024-11-04 18:30:29 +00:00
minioptest_failures_dict.json
mkl_verbose.py
mkldnn_verbose.py
pytest_shard_custom.py
run_doctests.sh
run_test.py Remove most rockset references (#139922) 2024-11-12 21:17:43 +00:00
simulate_nccl_errors.py
slow_tests.json Update slow tests (#139051) 2024-11-04 11:49:06 +00:00
test_ao_sparsity.py
test_autocast.py [MPS] Update error message for supported autocast type (#139192) 2024-10-30 16:48:29 +00:00
test_autograd.py [Codemod] skipIfMps->skipIfMPS (#140562) 2024-11-13 19:45:08 +00:00
test_autograd_fallback.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_autoload.py
test_binary_ufuncs.py
test_bundled_images.py
test_bundled_inputs.py
test_ci_sanity_check_fail.py
test_comparison_utils.py
test_compile_benchmark_util.py
test_complex.py
test_content_store.py
test_cpp_api_parity.py
test_cpp_extensions_aot.py
test_cpp_extensions_jit.py Avoid file encoding issues when loading cpp extensions (#138565) 2024-10-28 14:06:34 +00:00
test_cpp_extensions_mtia_backend.py
test_cpp_extensions_open_device_registration.py Support dlpack for privateuse1 (#135331) 2024-11-13 13:13:14 +00:00
test_cpp_extensions_stream_and_event.py
test_cuda.py Support tensor betas in Adam and AdamW (#134171) 2024-11-15 21:55:55 +00:00
test_cuda_expandable_segments.py
test_cuda_multigpu.py
test_cuda_nvml_based_avail.py
test_cuda_primary_ctx.py
test_cuda_sanitizer.py [BE]: Add better optional typing (#138426) 2024-10-27 14:19:00 +00:00
test_cuda_trace.py
test_custom_ops.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_dataloader.py [BE][Ez]: Update ruff to 0.7.4 (#140806) 2024-11-15 17:04:32 +00:00
test_datapipe.py
test_decomp.py [7/N] Don't skip ASAN on some tests (#139675) 2024-11-05 14:01:01 +00:00
test_deploy.py
test_determination.py
test_dispatch.py
test_dlpack.py Use DLPack for creating tensors out of custom classes, when available. (#138697) 2024-10-26 01:27:05 +00:00
test_dynamic_shapes.py Revert "[dynamo] add SymNode bitwise and/or (#138777)" 2024-11-14 21:52:40 +00:00
test_expanded_weights.py
test_fake_tensor.py Fix split decomp returning self (#140065) 2024-11-13 01:58:02 +00:00
test_file_check.py
test_flop_counter.py FlopCounterMode: Decompose ops for inference mode (#138508) 2024-11-09 03:13:53 +00:00
test_foreach.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_function_schema.py
test_functional_autograd_benchmark.py
test_functional_optim.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_functionalization.py
test_functionalization_of_rng_ops.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_futures.py
test_fx.py Revert "Refactor FxGraphDrawer to use HTML-like labels (#137726)" 2024-11-04 17:44:44 +00:00
test_fx_experimental.py Revert "[dynamo] add SymNode bitwise and/or (#138777)" 2024-11-14 21:52:40 +00:00
test_fx_passes.py
test_fx_reinplace_pass.py
test_hub.py
test_import_stats.py
test_indexing.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_itt.py
test_jit.py Add support for parsing torch.Generator in JIT (#140489) 2024-11-13 23:06:57 +00:00
test_jit_autocast.py
test_jit_disabled.py
test_jit_fuser.py
test_jit_fuser_legacy.py
test_jit_fuser_te.py
test_jit_legacy.py
test_jit_llga_fuser.py [Dynamo] Replace torch._dynamo.optimize() with torch.compile() [2/N] (#140238) 2024-11-13 05:13:39 +00:00
test_jit_profiling.py
test_jit_simple.py
test_jit_string.py
test_jiterator.py
test_kernel_launch_checks.py
test_legacy_vmap.py
test_license.py
test_linalg.py [ROCm] TunableOp fix for batched MM with views. (#140673) 2024-11-14 20:22:12 +00:00
test_logging.py
test_masked.py
test_maskedtensor.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_matmul_cuda.py
test_meta.py Fix triangular_solve meta function out parameter names. (#140186) 2024-11-12 19:04:34 +00:00
test_metal.py
test_mkl_verbose.py
test_mkldnn.py
test_mkldnn_fusion.py
test_mkldnn_verbose.py
test_mobile_optimizer.py
test_model_dump.py
test_model_exports_to_core_aten.py
test_module_tracker.py
test_modules.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_monitor.py
test_mps.py [BE] Replace skipIfMPS with expectedFailureMPS (#139940) 2024-11-15 03:48:37 +00:00
test_multiprocessing.py
test_multiprocessing_spawn.py
test_namedtensor.py
test_namedtuple_return_api.py
test_native_functions.py
test_native_mha.py
test_nestedtensor.py Revert "Allow NJT by default for weights_only torch.load (#140304)" 2024-11-13 15:24:00 +00:00
test_nn.py Revert "Add NHWC support for group normalization (#126635)" 2024-11-15 23:38:15 +00:00
test_nnapi.py
test_numba_integration.py
test_numpy_interop.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_openmp.py
test_ops.py triangular_solve: fix meta function output argument dtype check. (#140286) 2024-11-14 15:25:14 +00:00
test_ops_fwd_gradients.py
test_ops_gradients.py
test_ops_jit.py
test_optim.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_out_dtype_op.py
test_overrides.py
test_package.py
test_per_overload_api.py
test_prims.py
test_proxy_tensor.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_pruning_op.py
test_public_bindings.py
test_python_dispatch.py [torchgen] Improve schema parsing with regex for numeric ranges (#140210) 2024-11-14 23:28:27 +00:00
test_pytree.py
test_quantization.py
test_reductions.py use more elements per thread for narrow dtypes (#139449) 2024-11-14 22:50:16 +00:00
test_scatter_gather_ops.py Fix the check for can_use_expanded_index_path (#140351) 2024-11-15 05:52:23 +00:00
test_schema_check.py
test_segment_reductions.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_serialization.py Fix get_unsafe_globals_in_checkpoint to account for user allowed globals per docstring (#140738) 2024-11-15 22:47:35 +00:00
test_set_default_mobile_cpu_allocator.py
test_shape_ops.py Add size param check of unfold (#139965) 2024-11-09 17:12:53 +00:00
test_show_pickle.py
test_sort_and_select.py Support torch.bool in torch.sort + CUDA (#139409) 2024-11-06 00:02:54 +00:00
test_sparse.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_sparse_csr.py
test_sparse_semi_structured.py
test_spectral_ops.py
test_stateless.py
test_static_runtime.py
test_subclass.py
test_sympy_utils.py Revert "[dynamo] add SymNode bitwise and/or (#138777)" 2024-11-14 21:52:40 +00:00
test_tensor_creation_ops.py fix test_float_to_int_conversion_nonfinite for NumPy 2 (#138131) 2024-11-14 04:19:19 +00:00
test_tensorboard.py
test_tensorexpr.py
test_tensorexpr_pybind.py
test_testing.py More flexible test parametrization with @reparametrize (#138369) 2024-10-29 22:14:38 +00:00
test_throughput_benchmark.py
test_torch.py [Codemod] skipIfMps->skipIfMPS (#140562) 2024-11-13 19:45:08 +00:00
test_transformers.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_type_hints.py
test_type_info.py
test_type_promotion.py
test_typing.py
test_unary_ufuncs.py use more elements per thread for narrow dtypes (#139449) 2024-11-14 22:50:16 +00:00
test_utils.py
test_utils_config_module.py Add type annotations to Configs (#139833) 2024-11-07 03:49:09 +00:00
test_view_ops.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_vulkan.py
test_weak.py
test_xnnpack_integration.py
test_xpu.py Revert "Enable XPUEvent elapsed_time function (#134666)" (#140872) 2024-11-18 02:58:05 +00:00