pytorch/test
Shangdi Yu a47bb4a393 Fix autocast for non-strict export (#137495)
Summary:

Add testing for autocast and set_grad nodes for export_for_training. In export_for_training, we do not wrap the autocast and set_grad nodes into a HOP, but we should still have the set_grad_enabled/autocast nodes.

Add support for autocast in non-strict export. Previously, `_enter_autocast` and `_exit_autocast` nodes did not show up in the export graph when we used `strict=False`.

- In autocast's enter and exit functions, we dispatch to `PreDispatchTorchFunctionMode.__torch_function__`.
 If `PreDispatchTorchFunctionMode` is in our function_mode_stack, the call stack looks like the one below. This is mostly the same call stack as in strict mode, except that strict mode enters [here](https://www.internalfb.com/code/fbsource/[0d4f1135cacdb26c6e01d5dce1ce52a15d61ee48]/xplat/caffe2/torch/_dynamo/variables/ctx_manager.py?lines=806).
```
- torch.amp.autocast.__enter__()'s torch.overrides.handle_torch_function
- torch.fx.experimental.proxy_tensor.TorchFunctionMetadataMode.__torch_function__
- torch.amp._enter_autocast()'s torch.overrides.handle_torch_function
- PreDispatchTorchFunctionMode.__torch_function__
```
- In `PreDispatchTorchFunctionMode.__torch_function__`, we create the autocast nodes.
- To match the strict mode behavior, we make the input node of each `_exit_autocast` node the corresponding `_enter_autocast` node. This requires us to maintain a stack in `PreDispatchTorchFunctionMode`.
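
The enter/exit pairing described in the last bullet can be sketched in plain Python. This is only a toy illustration of the stack idea, not the code from the PR: `Node`, `ToyPreDispatchMode`, `enter_autocast`, and `exit_autocast` are hypothetical names invented here, and no real `torch.fx` graph is built.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical; not torch.fx.Node)."""
    op: str
    args: Tuple["Node", ...] = ()


@dataclass
class ToyPreDispatchMode:
    """Hypothetical recorder that mimics only the enter/exit pairing."""
    graph: List[Node] = field(default_factory=list)
    enter_stack: List[Node] = field(default_factory=list)

    def enter_autocast(self) -> Node:
        # Record the enter node and remember it until the matching exit.
        node = Node("_enter_autocast")
        self.graph.append(node)
        self.enter_stack.append(node)
        return node

    def exit_autocast(self) -> Node:
        # The most recent unmatched enter node becomes the exit node's input.
        enter_node = self.enter_stack.pop()
        node = Node("_exit_autocast", args=(enter_node,))
        self.graph.append(node)
        return node


# Two nested autocast regions: the inner exit pairs with the inner enter.
mode = ToyPreDispatchMode()
outer = mode.enter_autocast()
inner = mode.enter_autocast()
inner_exit = mode.exit_autocast()
outer_exit = mode.exit_autocast()
```

A stack (rather than a single "last enter" slot) is what keeps nested autocast regions correctly paired, since exits arrive in the reverse order of enters.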

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export  -- -r  test_export_with_autocast
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export  -- -r  test_export_with_set_grad
```

Differential Revision: D64016023

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137495
Approved by: https://github.com/bdhirsh
2024-10-16 17:39:00 +00:00
..
ao/sparsity
autograd
backends/xeon
benchmark_utils c10::optional -> std::optional in PyTorch (#137333) 2024-10-11 00:16:10 +00:00
bottleneck_test
cpp [c10d] Add unit test for CUDAEventCache to ensure caching is working (#138059) 2024-10-16 17:34:57 +00:00
cpp_api_parity
cpp_extensions Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)" 2024-10-15 17:19:16 +00:00
custom_backend
custom_operator Use C10_UNUSED instead of (void)X (#137239) 2024-10-15 14:32:59 +00:00
distributed Upgrade distributed test to g4dn instances (T4 GPUs) (#137161) 2024-10-16 16:42:57 +00:00
distributions
dynamo Revert "[compiled autograd] Compiled autograd configs in TLS (#137821)" 2024-10-16 16:38:29 +00:00
dynamo_expected_failures Fix autograd function calls without context arg (#137809) 2024-10-15 01:25:47 +00:00
dynamo_skips
edge Add option to disable operator profiling (#136838) 2024-10-04 22:56:00 +00:00
error_messages
expect Add decomposition for permute_copy (#130944) 2024-10-15 13:51:20 +00:00
export Fix autocast for non-strict export (#137495) 2024-10-16 17:39:00 +00:00
forward_backward_compatibility Clean up op BC check list (#137634) 2024-10-10 04:29:21 +00:00
functorch Add decomposition for permute_copy (#130944) 2024-10-15 13:51:20 +00:00
fx
higher_order_ops
inductor Revert "[compiled autograd] Compiled autograd configs in TLS (#137821)" 2024-10-16 16:38:29 +00:00
jit
jit_hooks
lazy
mobile
nn Remove dtype check on meta device (#136774) 2024-10-12 05:45:21 +00:00
onnx Revert "[ONNX] Remove deprecated export_to_pretty_string (#137790)" 2024-10-15 17:40:06 +00:00
optim Autoupdate min_lrs for ReduceLROnPlateau if possible, fixes #104361 (#137637) 2024-10-10 01:23:30 +00:00
package
profiler test_execution_trace.py: Use instantiate_device_type_tests to run GPU tests on HPU as well (#133975) 2024-10-16 07:53:06 +00:00
quantization Skip doc test internally (#137813) 2024-10-14 21:29:15 +00:00
scripts
test_img
torch_np Fix torch_np/test_basic for NumPy 2 (#137814) 2024-10-15 16:40:28 +00:00
typing
xpu
_test_bazel.py
allowlist_for_publicAPI.json
conftest.py
create_dummy_torchscript_model.py
delete.py
hi.py
HowToWriteTestsUsingFileCheck.md
linear.py
load_torchscript_model.py
minioptest_failures_dict.json
mkl_verbose.py
mkldnn_verbose.py
pytest_shard_custom.py
run_doctests.sh
run_test.py Upgrade distributed test to g4dn instances (T4 GPUs) (#137161) 2024-10-16 16:42:57 +00:00
simulate_nccl_errors.py
slow_tests.json
test_ao_sparsity.py
test_autocast.py
test_autograd.py Determine autograd engine ready queue based on InputMetadata instead of InputBuffer (#135633) 2024-10-04 23:59:46 +00:00
test_autograd_fallback.py
test_autoload.py
test_binary_ufuncs.py Fix test_binary_ufuncs.py for NumPy 2 (#137937) 2024-10-15 17:04:24 +00:00
test_bundled_images.py
test_bundled_inputs.py
test_ci_sanity_check_fail.py
test_comparison_utils.py
test_compile_benchmark_util.py
test_complex.py
test_content_store.py
test_cpp_api_parity.py
test_cpp_extensions_aot.py
test_cpp_extensions_jit.py Unify cpp_extension build directory removal (#136059) 2024-10-03 06:22:11 +00:00
test_cpp_extensions_mtia_backend.py Unify cpp_extension build directory removal (#136059) 2024-10-03 06:22:11 +00:00
test_cpp_extensions_open_device_registration.py Remove dependency on numpy for serialization for XLA/open registration devices without numpy (#137444) 2024-10-09 19:35:55 +00:00
test_cpp_extensions_stream_and_event.py Unify cpp_extension build directory removal (#136059) 2024-10-03 06:22:11 +00:00
test_cuda.py [ROCm] Add AMDSMI support for UUID input (#129741) 2024-10-15 15:56:30 +00:00
test_cuda_expandable_segments.py
test_cuda_multigpu.py
test_cuda_nvml_based_avail.py
test_cuda_primary_ctx.py
test_cuda_sanitizer.py
test_cuda_trace.py
test_custom_ops.py Fix torch.library.register_vmap (#137306) 2024-10-04 03:46:35 +00:00
test_dataloader.py [Inductor UT] Generalize device-bias code introduced from #134874 and (#136596) 2024-09-26 02:56:59 +00:00
test_datapipe.py
test_decomp.py
test_deploy.py
test_determination.py
test_dispatch.py
test_dlpack.py
test_dynamic_shapes.py Simplify find_localzeros (#133325) 2024-10-10 00:52:50 +00:00
test_expanded_weights.py
test_fake_tensor.py [fake_tensor][cache] Supports ops with tuple of output tensors (#137935) 2024-10-15 22:15:07 +00:00
test_file_check.py
test_flop_counter.py
test_foreach.py
test_function_schema.py
test_functional_autograd_benchmark.py
test_functional_optim.py
test_functionalization.py
test_functionalization_of_rng_ops.py
test_futures.py
test_fx.py
test_fx_experimental.py
test_fx_passes.py
test_fx_reinplace_pass.py
test_hub.py
test_import_stats.py
test_indexing.py
test_itt.py
test_jit.py
test_jit_autocast.py Using device-agnostic autocast api (#136613) 2024-09-27 07:16:24 +00:00
test_jit_disabled.py
test_jit_fuser.py
test_jit_fuser_legacy.py
test_jit_fuser_te.py
test_jit_legacy.py
test_jit_llga_fuser.py Using device-agnostic autocast api (#136613) 2024-09-27 07:16:24 +00:00
test_jit_profiling.py
test_jit_simple.py
test_jit_string.py
test_jiterator.py
test_kernel_launch_checks.py
test_legacy_vmap.py
test_license.py
test_linalg.py [ROCm] TunableOp more unit test follow-up - Part 2 (#134517) 2024-10-16 01:49:47 +00:00
test_logging.py
test_masked.py
test_maskedtensor.py Fix memory leak on masked Tensor (#137890) 2024-10-15 18:37:55 +00:00
test_matmul_cuda.py [FP8][CUDA] Fix stale expected error message (#136581) 2024-09-26 20:57:38 +00:00
test_meta.py add meta for _segment_reduce_backward (#137442) 2024-10-08 18:40:06 +00:00
test_metal.py
test_mkl_verbose.py
test_mkldnn.py
test_mkldnn_fusion.py
test_mkldnn_verbose.py
test_mobile_optimizer.py
test_model_dump.py
test_model_exports_to_core_aten.py
test_module_tracker.py
test_modules.py
test_monitor.py
test_mps.py Add decomposition for permute_copy (#130944) 2024-10-15 13:51:20 +00:00
test_multiprocessing.py
test_multiprocessing_spawn.py multiprocessing.spawn: allow a grace period when shutdown (#131278) 2024-10-07 12:37:34 +00:00
test_namedtensor.py
test_namedtuple_return_api.py
test_native_functions.py
test_native_mha.py
test_nestedtensor.py Fix autograd.Function + NJT when an output grad is None (#136875) 2024-10-14 19:31:50 +00:00
test_nn.py Enable additional tests for MPS CI runs (#134356) 2024-10-04 21:52:38 +00:00
test_nnapi.py
test_numba_integration.py
test_numpy_interop.py Fix dtype test for NumPy 2 (#137532) 2024-10-10 18:12:25 +00:00
test_openmp.py
test_ops.py Ensure noncontiguous tensor creation tests offsetting (#136396) 2024-10-02 00:40:43 +00:00
test_ops_fwd_gradients.py
test_ops_gradients.py
test_ops_jit.py
test_optim.py Add Support for Tracking Parameter Names (named_parameters) in Optimizer State Dict (#134107) 2024-10-14 19:24:44 +00:00
test_out_dtype_op.py
test_overrides.py Revert "[Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669)" 2024-10-15 23:22:58 +00:00
test_package.py
test_per_overload_api.py
test_prims.py
test_proxy_tensor.py Fix bug in functional tensor decomp (#136600) 2024-09-25 17:37:50 +00:00
test_pruning_op.py
test_public_bindings.py Revert "[Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669)" 2024-10-15 23:22:58 +00:00
test_python_dispatch.py Fix wrapper subclass reentrant dispatch + TorchDispatchMode (#136566) 2024-09-26 14:06:51 +00:00
test_pytree.py
test_quantization.py
test_reductions.py Ensure noncontiguous tensor creation tests offsetting (#136396) 2024-10-02 00:40:43 +00:00
test_scatter_gather_ops.py
test_schema_check.py
test_segment_reductions.py
test_serialization.py Revert "Expose option to disable CRC-32 computation during torch.save (#137735)" 2024-10-16 17:03:06 +00:00
test_set_default_mobile_cpu_allocator.py
test_shape_ops.py
test_show_pickle.py
test_sort_and_select.py
test_sparse.py Fix bmm_sparse_cuda illegal memory access (#131977) 2024-10-07 22:47:34 +00:00
test_sparse_csr.py [ROCm] Enable test_triton* in test_sparse_csr suite (#137712) 2024-10-15 15:41:21 +00:00
test_sparse_semi_structured.py Add Triton CPU as an Inductor backend (#133408) 2024-09-30 20:24:52 +00:00
test_spectral_ops.py
test_stateless.py
test_static_runtime.py
test_subclass.py
test_sympy_utils.py
test_tensor_creation_ops.py [redo] Fp8 support for item() with cuda, index_select, and fill_ cpu (#137341) 2024-10-07 00:58:51 +00:00
test_tensorboard.py
test_tensorexpr.py
test_tensorexpr_pybind.py
test_testing.py
test_throughput_benchmark.py
test_torch.py Fixes NumPy 2 test failures in test_torch.py (#137740) 2024-10-12 02:40:17 +00:00
test_transformers.py [CUDA][SDPA] Fix TF32 handling and bump threshold for multiheadattention test (#137752) 2024-10-12 03:05:21 +00:00
test_type_hints.py
test_type_info.py
test_type_promotion.py
test_typing.py
test_unary_ufuncs.py
test_utils.py
test_utils_internal.py
test_view_ops.py Enable additional tests for MPS CI runs (#134356) 2024-10-04 21:52:38 +00:00
test_vulkan.py
test_weak.py
test_xnnpack_integration.py
test_xpu.py Make device-specific event inherits from torch.Event (#134845) 2024-10-01 06:28:41 +00:00