pytorch/test
Sheng Fu 2ea4b56ec8 Record min/max of integral tensor in ET (#143088)
Summary:
In et-replay, random data is used to run the operators. However, this does not work well for ops that use indices to access a tensor. For example, embedding ops use indices to look up rows in the embedding table. If random data is used for these index inputs, et-replay usually runs into invalid memory accesses.

To fix this, ET provides an environment variable, "ENABLE_PYTORCH_EXECUTION_TRACE_INTEGRAL_TENSOR_RANGE". If it is set, ET captures the min/max values of each flattened integral tensor. et_replay then uses the recorded min/max to generate random tensors within that range, which fixes the invalid memory access issue.
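A minimal, library-free sketch of the idea, assuming flattened integral tensors are modeled as plain Python lists and that the recorded range is inclusive at both ends. `record_integral_range`, `replay_random_integral` and `embedding_lookup` are hypothetical names for illustration only; they are not ET or et_replay functions, and the sketch does not reproduce how ET reads the environment variable.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical helpers illustrating the mechanism; NOT the real ET / et_replay code.


def record_integral_range(flat_values: List[int]) -> Optional[Tuple[int, int]]:
    """Trace side: capture (min, max) of a flattened integral tensor.
    Returns None for an empty tensor, which has no range."""
    if not flat_values:
        return None
    return min(flat_values), max(flat_values)


def replay_random_integral(
    numel: int, value_range: Tuple[int, int], rng: random.Random
) -> List[int]:
    """Replay side: draw `numel` random integers in the recorded [min, max] range."""
    lo, hi = value_range
    return [rng.randint(lo, hi) for _ in range(numel)]  # randint is inclusive


def embedding_lookup(table: List[List[float]], indices: List[int]) -> List[List[float]]:
    """Toy embedding lookup that bounds-checks explicitly; a real kernel
    might instead read invalid memory when an index is out of range."""
    for idx in indices:
        if not 0 <= idx < len(table):
            raise IndexError(f"index {idx} out of range for table of size {len(table)}")
    return [table[idx] for idx in indices]


if __name__ == "__main__":
    rng = random.Random(0)
    table = [[float(i)] * 4 for i in range(10)]  # 10-row embedding table
    traced_indices = [3, 7, 0, 9, 2]  # indices observed during the traced run

    value_range = record_integral_range(traced_indices)
    print("recorded range:", value_range)

    # Unconstrained random indices can fall outside the table ...
    unconstrained = [rng.randint(-1000, 1000) for _ in traced_indices]
    try:
        embedding_lookup(table, unconstrained)
    except IndexError as err:
        print("unconstrained replay failed:", err)

    # ... while indices sampled within the recorded range stay valid.
    replayed = replay_random_integral(len(traced_indices), value_range, rng)
    embedding_lookup(table, replayed)
    print("range-constrained replay ok:", replayed)
```

Because the recorded max is the largest index actually seen during tracing, sampling in [min, max] keeps replayed indices inside the table without et-replay needing to know the table size.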

Test Plan: buck2 run mode/opt caffe2/test:test_profiler_cuda -- profiler.test_execution_trace.TestExecutionTraceCUDA.test_execution_trace_record_integral_tensor_range_cuda

Differential Revision: D66666931

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143088
Approved by: https://github.com/sanrise
2024-12-18 08:20:35 +00:00
ao/sparsity Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
autograd Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
backends/xeon
benchmark_utils
bottleneck_test Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
cpp [14/N] Fix extra warnings brought by clang-tidy-17 (#141644) 2024-12-13 06:22:13 +00:00
cpp_api_parity
cpp_extensions Enable CPP/CUDAExtension with py_limited_api for python agnosticism (#138088) 2024-12-11 18:22:55 +00:00
custom_backend
custom_operator Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
distributed [fr] recognize all_reduce_barrier as a valid op (#143354) 2024-12-17 21:09:18 +00:00
distributions Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
dynamo Support garbage collection after pt2 compilation (#143364) 2024-12-18 07:25:11 +00:00
dynamo_expected_failures [dynamo] Fix VariableBuilder._wrap on frozenset and enforce invariants on ConstantVariable (#141504) 2024-11-27 21:58:35 +00:00
dynamo_skips config: Add env_name_default and env_name_force to Config (#138956) 2024-11-06 21:20:42 +00:00
edge
error_messages
expect [BE] Add type annotation to eliminate_dead_code (#142251) 2024-12-10 17:09:21 +00:00
export fix checking non-trivial input constraints (#143442) 2024-12-18 07:29:08 +00:00
forward_backward_compatibility [aotd] capture rrelu_with_noise noise mutation in compile (#141867) 2024-12-04 12:18:58 +00:00
functorch Fix NJT backward tests (#143072) 2024-12-12 18:06:23 +00:00
fx Enhance "from_node" node meta to track source recursively (#142066) 2024-12-09 23:39:15 +00:00
higher_order_ops Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
inductor [provenance_tracking] Dump inductor_triton_kernel_to_post_grad_nodes.json info in debug_trace (#143055) 2024-12-18 06:51:50 +00:00
jit No actual change, just remove variable contain Tensors from global scope (#143225) 2024-12-17 16:14:25 +00:00
jit_hooks
lazy
mobile
nn Fix unused Python variables in test/nn (#143396) 2024-12-18 03:30:54 +00:00
onnx [ONNX] Remove special handling of torchvision.ops imports in onnx export (#141569) 2024-11-28 18:05:40 +00:00
optim
package [ci, 3.13] skip failing torch.package dynamo-wrapped test (#141886) 2024-12-05 00:33:26 +00:00
profiler Record min/max of integral tensor in ET (#143088) 2024-12-18 08:20:35 +00:00
quantization [BC-Breaking]Remove capture_pre_autograd_graph references in quantization (#139505) 2024-12-13 22:26:22 +00:00
scripts
test_img
torch_np [ci, 3.13] fix/skip failing numpy 2.0+ dynamo-wrapped tests (#141950) 2024-12-05 00:33:26 +00:00
typing
xpu
_test_bazel.py
allowlist_for_publicAPI.json Refactor ShapeGuardPrinter for future C++ addition (#140968) 2024-11-27 20:09:58 +00:00
conftest.py
create_dummy_torchscript_model.py
delete.py
hi.py
HowToWriteTestsUsingFileCheck.md
linear.py
load_torchscript_model.py Flip default on weights_only (#137602) 2024-11-04 18:30:29 +00:00
minioptest_failures_dict.json
mkl_verbose.py
mkldnn_verbose.py
pytest_shard_custom.py
run_doctests.sh
run_test.py skip test dynamo for aot_dispatch tests on ci (#142185) 2024-12-11 18:46:58 +00:00
simulate_nccl_errors.py
slow_tests.json Update slow tests (#143278) 2024-12-16 12:40:40 +00:00
test_accelerator.py [RELAND] Add UTs for accelerator device-agnostic runtime APIs (#133572) 2024-12-16 02:18:41 +00:00
test_ao_sparsity.py
test_autocast.py [MPS] Add support for bf16 autocast (#139390) 2024-11-20 19:52:28 +00:00
test_autograd.py Run only listed tests on s390x (#140265) 2024-11-20 22:53:09 +00:00
test_autograd_fallback.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_autoload.py
test_binary_ufuncs.py Fix torch.lerp RuntimeError when weight is CPU scalar while input & end are CUDA tensor (#141820) 2024-12-09 18:14:54 +00:00
test_bundled_images.py
test_bundled_inputs.py
test_ci_sanity_check_fail.py
test_comparison_utils.py [export] Add device and dtype fields to assert_tensor_metadata (#141071) 2024-11-22 20:54:55 +00:00
test_compile_benchmark_util.py
test_complex.py
test_content_store.py
test_cpp_api_parity.py No actual change, just remove variable contain Tensors from global scope (#143225) 2024-12-17 16:14:25 +00:00
test_cpp_extensions_aot.py Enable CPP/CUDAExtension with py_limited_api for python agnosticism (#138088) 2024-12-11 18:22:55 +00:00
test_cpp_extensions_jit.py
test_cpp_extensions_mtia_backend.py
test_cpp_extensions_open_device_registration.py Allow user to manually pass module.name associated with global in {add}_safe_global (#142153) 2024-12-06 18:56:39 +00:00
test_cpp_extensions_stream_and_event.py
test_cuda.py Revert "[AMD] Turn on TF32 for aten::mm (#139869)" 2024-12-16 16:46:48 +00:00
test_cuda_expandable_segments.py
test_cuda_multigpu.py
test_cuda_nvml_based_avail.py
test_cuda_primary_ctx.py
test_cuda_sanitizer.py
test_cuda_trace.py
test_custom_ops.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_dataloader.py Fixes in-order test flakiness (#142389) 2024-12-10 04:19:20 +00:00
test_datapipe.py
test_decomp.py [aotd] capture rrelu_with_noise noise mutation in compile (#141867) 2024-12-04 12:18:58 +00:00
test_deploy.py
test_determination.py
test_dispatch.py
test_dlpack.py
test_dynamic_shapes.py Try to simplify FloorDiv axioms implications when needed during evaluations. (#141267) 2024-11-28 15:35:35 +00:00
test_expanded_weights.py No actual change, just remove variable contain Tensors from global scope (#143225) 2024-12-17 16:14:25 +00:00
test_fake_tensor.py Allow Fakified subclass to have different device for inner and outer tensor (#141839) 2024-12-03 00:09:41 +00:00
test_file_check.py
test_flop_counter.py FlopCounterMode: Decompose ops for inference mode (#138508) 2024-11-25 16:53:10 +00:00
test_foreach.py pow: fix meta function output argument dtype check. (#140287) 2024-11-20 13:28:47 +00:00
test_function_schema.py
test_functional_autograd_benchmark.py
test_functional_optim.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_functionalization.py
test_functionalization_of_rng_ops.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_futures.py
test_fx.py Revert "Refactor FxGraphDrawer to use HTML-like labels (#137726)" 2024-11-04 17:44:44 +00:00
test_fx_experimental.py No actual change, just remove variable contain Tensors from global scope (#143225) 2024-12-17 16:14:25 +00:00
test_fx_passes.py
test_fx_reinplace_pass.py
test_hub.py
test_import_stats.py
test_indexing.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_itt.py
test_jit.py No actual change, just remove variable contain Tensors from global scope (#143225) 2024-12-17 16:14:25 +00:00
test_jit_autocast.py
test_jit_disabled.py
test_jit_fuser.py
test_jit_fuser_legacy.py
test_jit_fuser_te.py
test_jit_legacy.py
test_jit_llga_fuser.py [Dynamo] Replace torch._dynamo.optimize() with torch.compile() [2/N] (#140238) 2024-11-13 05:13:39 +00:00
test_jit_profiling.py
test_jit_simple.py
test_jit_string.py
test_jiterator.py
test_kernel_launch_checks.py
test_legacy_vmap.py
test_license.py
test_linalg.py [ROCm] Fix TunableOp UTs: Rotating Buffer (#143172) 2024-12-14 06:18:11 +00:00
test_logging.py
test_masked.py
test_maskedtensor.py Correctly specify size of sparse_csr tensors in maskedtensor binary ops (#134335) 2024-12-03 02:55:57 +00:00
test_matmul_cuda.py [Matmul][CUDA][FP8] Skip rowwise scaling tests on non-sm90 (#141596) 2024-12-10 23:16:19 +00:00
test_meta.py pow: fix meta function output argument dtype check. (#140287) 2024-11-20 13:28:47 +00:00
test_metal.py
test_mkl_verbose.py
test_mkldnn.py
test_mkldnn_fusion.py
test_mkldnn_verbose.py
test_mobile_optimizer.py
test_model_exports_to_core_aten.py
test_module_tracker.py [ci, 3.13] skip failing module tracker dynamo-wrapped test (#141887) 2024-12-05 00:33:26 +00:00
test_modules.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_monitor.py [ci, 3.13] update tensorboard version for 3.13 to fix broken tests (#141572) 2024-12-05 00:24:07 +00:00
test_mps.py Extend bmm tiling to work up to 2^32 elem in any single output dim (#143095) 2024-12-17 16:03:46 +00:00
test_multiprocessing.py
test_multiprocessing_spawn.py
test_namedtensor.py
test_namedtuple_return_api.py
test_native_functions.py
test_native_mha.py [ROCm] Update to AOTriton 0.8b (#140172) 2024-12-06 21:45:18 +00:00
test_nestedtensor.py Fix NJT backward tests (#143072) 2024-12-12 18:06:23 +00:00
test_nn.py No actual change, just remove variable contain Tensors from global scope (#143225) 2024-12-17 16:14:25 +00:00
test_nnapi.py
test_numba_integration.py
test_numpy_interop.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_openmp.py
test_ops.py [2/N] Enable UBSAN tests (#141740) 2024-12-03 20:52:26 +00:00
test_ops_fwd_gradients.py
test_ops_gradients.py Run only listed tests on s390x (#140265) 2024-11-20 22:53:09 +00:00
test_ops_jit.py
test_optim.py Deprecate torch._utils.is_compiling() (#127690) 2024-12-08 22:55:36 +00:00
test_out_dtype_op.py
test_overrides.py c10::string_view -> std::string_view in more places (#142517) 2024-12-12 19:45:59 +00:00
test_package.py
test_per_overload_api.py
test_prims.py
test_proxy_tensor.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_pruning_op.py
test_public_bindings.py [BE] Remove Model Dump utility (#141540) 2024-11-27 22:52:55 +00:00
test_python_dispatch.py Fix fallthrough behaviour when Meta in TLS include set (#141581) 2024-12-09 20:32:44 +00:00
test_pytree.py [dynamo][pytree][2/N] make CXX pytree traceable: tree_flatten / tree_unflatten / tree_structure (#137398) 2024-12-12 18:05:25 +00:00
test_quantization.py
test_reductions.py use more elements per thread for narrow dtypes (#139449) 2024-11-14 22:50:16 +00:00
test_scatter_gather_ops.py Fix the check for can_use_expanded_index_path (#140351) 2024-11-15 05:52:23 +00:00
test_schema_check.py
test_segment_reductions.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_serialization.py Prevent torch.jit.load path in torch.load when weights_only=True (#143326) 2024-12-18 00:17:41 +00:00
test_set_default_mobile_cpu_allocator.py
test_shape_ops.py Add size param check of unfold (#139965) 2024-11-09 17:12:53 +00:00
test_show_pickle.py
test_sort_and_select.py Support torch.bool in torch.sort + CUDA (#139409) 2024-11-06 00:02:54 +00:00
test_sparse.py sparse_broadcast_to: less memory footprint, fewer kernel launches (#142364) 2024-12-11 16:09:09 +00:00
test_sparse_csr.py
test_sparse_semi_structured.py Enable CI on SM89 (#140305) 2024-12-03 04:49:46 +00:00
test_spectral_ops.py
test_stateless.py
test_static_runtime.py
test_subclass.py
test_sympy_utils.py [dynamo] add SymNode bitwise and/or (#138777) 2024-11-22 23:36:16 +00:00
test_tensor_creation_ops.py fix test_float_to_int_conversion_nonfinite for NumPy 2 (#138131) 2024-11-14 04:19:19 +00:00
test_tensorboard.py [ci, 3.13] update tensorboard version for 3.13 to fix broken tests (#141572) 2024-12-05 00:24:07 +00:00
test_tensorexpr.py
test_tensorexpr_pybind.py
test_testing.py [ci, 3.13] update test_testing.py usage of locals() for 3.13 (#141577) 2024-12-05 00:24:14 +00:00
test_throughput_benchmark.py
test_torch.py Add support for CPU scalar in addcmul (#143264) 2024-12-18 04:43:29 +00:00
test_transformers.py Revert "[ROCm] CK Flash Attention Backend (#138947)" 2024-12-17 16:46:57 +00:00
test_type_hints.py
test_type_info.py
test_type_promotion.py
test_typing.py
test_unary_ufuncs.py Implements nonzero_static on cuda (#141838) 2024-12-11 06:44:48 +00:00
test_utils.py
test_utils_config_module.py Add config alias (#142088) 2024-12-16 18:51:17 +00:00
test_utils_filelock.py filelock: Make waitcounter variant to use (#139816) 2024-12-12 01:18:34 +00:00
test_view_ops.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_vulkan.py
test_weak.py
test_xnnpack_integration.py
test_xpu.py [RELAND] Add UTs for accelerator device-agnostic runtime APIs (#133572) 2024-12-16 02:18:41 +00:00