Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-15 21:00:47 +00:00
Fixes #120242

The example from the issue now results in the graph

```python
def forward(self, arg0_1, arg1_1):
    sin = torch.ops.aten.sin.default(arg0_1);  arg0_1 = None
    copy_ = torch.ops.aten.copy_.default(arg1_1, sin);  arg1_1 = sin = None
    return (copy_,)
```

and the corresponding inductor kernel eliminates the intermediate buffer completely:

```python
def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    assert_size_stride(arg0_1, (5, ), (1, ))
    assert_size_stride(arg1_1, (5, ), (1, ))
    with torch.cuda._DeviceGuard(0):
        torch.cuda.set_device(0)
        # Source Nodes: [sin], Original ATen: [aten.sin]
        stream0 = get_raw_stream(0)
        triton_poi_fused_sin_0.run(arg0_1, arg1_1, 5, grid=grid(5), stream=stream0)
        del arg0_1
    return (arg1_1, )
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120514
Approved by: https://github.com/ezyang, https://github.com/oulgen, https://github.com/lezcano
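The optimization described above can be illustrated in plain Python, without torch: instead of materializing `sin(src)` into a temporary buffer and then copying it into the destination, the fused kernel writes the result straight into the destination. Both helper functions below are hypothetical stand-ins for the pre- and post-fix behavior, not the actual inductor code.

```python
import math

def sin_with_intermediate(src, dst):
    # Pre-fix behavior: an intermediate buffer holds sin(src),
    # then its contents are copied into dst (an extra allocation).
    tmp = [math.sin(x) for x in src]  # intermediate buffer
    dst[:] = tmp
    return dst

def sin_fused(src, dst):
    # Post-fix behavior: sin(src) is written directly into dst,
    # eliminating the intermediate buffer entirely.
    for i, x in enumerate(src):
        dst[i] = math.sin(x)
    return dst

src = [0.0, 1.0, 2.0, 3.0, 4.0]
out_a = [0.0] * 5
out_b = [0.0] * 5
# Both paths produce identical results; only the allocation differs.
assert sin_with_intermediate(src, out_a) == sin_fused(src, out_b)
```

The equivalence check mirrors what the PR guarantees: the rewrite changes memory traffic, not semantics.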
| Name |
|---|
| attn_ft.py |
| attn_positional.py |
| common_utils.py |
| discover_coverage.py |
| functorch_additional_op_db.py |
| test_aotdispatch.py |
| test_control_flow.py |
| test_dims.py |
| test_eager_transforms.py |
| test_logging.py |
| test_memory_efficient_fusion.py |
| test_minifier.py |
| test_ops.py |
| test_parsing.py |
| test_rearrange.py |
| test_vmap.py |
| test_vmap_registrations.py |
| xfail_suggester.py |