pytorch/torchgen
Masaki Kozuki 49f6849f58 Fix codegen logic for foreach derivatives (#95263)
follow-up https://github.com/pytorch/pytorch/pull/93901.

Unexpected numerical mismatches observed in some foreach functions' backward results appeared to be caused by the wrong order of `IndexRangeGenerator::range` calls.
This PR orders `args_with_derivatives` to match (or closely follow) the order of `foreach_native_function.func.arguments.flat_non_out`.

---

What the current master generates for `_foreach_mul.List`:
```cpp
variable_list ForeachMulBackward0List::apply(variable_list&& grads) {
  std::lock_guard<std::mutex> lock(mutex_);
  TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE);
  TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE);
  IndexRangeGenerator gen;
  auto other_ix = gen.range(other_size_);
  auto self_ix = gen.range(self_size_);
  variable_list grad_inputs(gen.size());
  auto other = unpack_list(other_);
  auto self = unpack_list(self_);
  if (task_should_compute_output({ other_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type()));
    }
    copy_range(grad_inputs, other_ix, grad_result);
  }
  if (task_should_compute_output({ self_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type()));
    }
    copy_range(grad_inputs, self_ix, grad_result);
  }
  return grad_inputs;
}
```

With this PR, the generated backward is:
```cpp
variable_list ForeachMulBackward0List::apply(variable_list&& grads) {
  std::lock_guard<std::mutex> lock(mutex_);
  TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE);
  TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE);
  IndexRangeGenerator gen;
  auto self_ix = gen.range(self_size_);                                         <----- diff
  auto other_ix = gen.range(other_size_);                                       <----- diff
  variable_list grad_inputs(gen.size());
  auto self = unpack_list(self_);
  auto other = unpack_list(other_);
  if (task_should_compute_output({ other_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type()));
    }
    copy_range(grad_inputs, other_ix, grad_result);
  }
  if (task_should_compute_output({ self_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type()));
    }
    copy_range(grad_inputs, self_ix, grad_result);
  }
  return grad_inputs;
}
```

The change fixes the order of the `self_ix` and `other_ix` range calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95263
Approved by: https://github.com/soulitzer
2023-03-04 20:03:54 +00:00