load(
    ":ufunc_defs.bzl",
    "aten_ufunc_generated_cpu_kernel_sources",
    "aten_ufunc_generated_cpu_sources",
    "aten_ufunc_generated_cuda_sources",
)

def define_targets(rules):
    rules.cc_library(
        name = "caffe2_core_macros",
        hdrs = [":caffe2_core_macros_h"],
    )

    rules.cmake_configure_file(
        name = "caffe2_core_macros_h",
        src = "caffe2/core/macros.h.in",
        out = "caffe2/core/macros.h",
        definitions = [
            "CAFFE2_BUILD_SHARED_LIBS",
            "CAFFE2_PERF_WITH_AVX",
            "CAFFE2_PERF_WITH_AVX2",
            "CAFFE2_USE_EXCEPTION_PTR",
            "CAFFE2_USE_CUDNN",
            "USE_MKLDNN",
            "CAFFE2_USE_ITT",
            "USE_ROCM_KERNEL_ASSERT",
            "EIGEN_MPL2_ONLY",
        ],
    )

    rules.cc_library(
        name = "caffe2_serialize",
        srcs = [
            "caffe2/serialize/file_adapter.cc",
            "caffe2/serialize/inline_container.cc",
            "caffe2/serialize/istream_adapter.cc",
            "caffe2/serialize/read_adapter_interface.cc",
        ],
        copts = ["-fexceptions", "-DFBCODE_CAFFE2"],
        tags = [
            "-fbcode",
            "supermodule:android/default/pytorch",
            "supermodule:ios/default/public.pytorch",
            "xplat",
        ],
        visibility = ["//visibility:public"],
        deps = [
            ":caffe2_headers",
            "//c10",
            "//third_party/miniz-3.0.2:miniz",
            "@com_github_glog//:glog",
        ],
    )

    #
    # ATen generated code
    #
    # You need to keep this in sync with the files written out
    # by gen.py (in the cmake build system, we track generated files
    # via generated_cpp.txt and generated_cpp.txt-cuda).
    #
    # Sure would be nice to use gen.py to create this list dynamically
    # instead of hardcoding, no? Well, we can't, as discussed in this
    # thread:
    # https://fb.facebook.com/groups/askbuck/permalink/1924258337622772/

    gen_aten_srcs = [
        "aten/src/ATen/native/native_functions.yaml",
        "aten/src/ATen/native/tags.yaml",
    ] + rules.glob(["aten/src/ATen/templates/*"])

    gen_aten_cmd = " ".join([
        "$(execpath //torchgen:gen)",
        "--install_dir=$(RULEDIR)",
        "--source-path aten/src/ATen",
        "--aoti_install_dir=$(RULEDIR)/torch/csrc/inductor/aoti_torch/generated",
    ] + (["--static_dispatch_backend CPU"] if rules.is_cpu_static_dispatch_build() else []))
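    # For illustration only (not used by the build): after Bazel expands
    # $(execpath) and $(RULEDIR), the command above resolves to roughly
    #
    #   <torchgen/gen binary> --install_dir=<rule output dir> \
    #       --source-path aten/src/ATen \
    #       --aoti_install_dir=<rule output dir>/torch/csrc/inductor/aoti_torch/generated
    #
    # with "--static_dispatch_backend CPU" appended for CPU static dispatch
    # builds. The exact binary and directory paths are substituted by Bazel
    # at build time.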
    gen_aten_outs_cuda = (
        GENERATED_H_CUDA + GENERATED_CPP_CUDA + GENERATED_AOTI_CUDA_CPP +
        aten_ufunc_generated_cuda_sources()
    )

    gen_aten_outs = (
        GENERATED_H + GENERATED_H_CORE +
        GENERATED_CPP + GENERATED_CPP_CORE +
        GENERATED_AOTI_CPP +
        aten_ufunc_generated_cpu_sources() +
        aten_ufunc_generated_cpu_kernel_sources() + [
            "Declarations.yaml",
        ] + gen_aten_outs_cuda
    )

    rules.genrule(
        name = "gen_aten",
        srcs = gen_aten_srcs,
        outs = gen_aten_outs,
        cmd = gen_aten_cmd,
        tools = ["//torchgen:gen"],
    )

    rules.genrule(
        name = "gen_aten_hip",
        srcs = gen_aten_srcs,
        outs = gen_aten_outs_cuda,
        cmd = gen_aten_cmd + " --rocm",
        features = ["-create_bazel_outputs"],
        tags = ["-bazel"],
        tools = ["//torchgen:gen"],
    )

    rules.genrule(
        name = "generate-code",
        srcs = [
            ":DispatchKeyNativeFunctions.cpp",
            ":DispatchKeyNativeFunctions.h",
            ":LazyIr.h",
            ":LazyNonNativeIr.h",
            ":RegisterDispatchDefinitions.ini",
            ":RegisterDispatchKey.cpp",
            ":native_functions.yaml",
            ":shape_inference.h",
            ":tags.yaml",
            ":ts_native_functions.cpp",
            ":ts_native_functions.yaml",
        ],
        outs = GENERATED_AUTOGRAD_CPP + GENERATED_AUTOGRAD_PYTHON + GENERATED_TESTING_PY,
        cmd = "$(execpath //tools/setup_helpers:generate_code) " +
              "--gen-dir=$(RULEDIR) " +
              "--native-functions-path $(location :native_functions.yaml) " +
              "--tags-path=$(location :tags.yaml) " +
              "--gen_lazy_ts_backend",
        tools = ["//tools/setup_helpers:generate_code"],
    )

    rules.cc_library(
        name = "generated-autograd-headers",
        hdrs = [":{}".format(h) for h in _GENERATED_AUTOGRAD_CPP_HEADERS + _GENERATED_AUTOGRAD_PYTHON_HEADERS],
        visibility = ["//visibility:public"],
    )

    rules.genrule(
        name = "version_h",
        srcs = [
            ":torch/csrc/api/include/torch/version.h.in",
            ":version.txt",
        ],
        outs = ["torch/csrc/api/include/torch/version.h"],
        cmd = "$(execpath //tools/setup_helpers:gen_version_header) " +
              "--template-path $(location :torch/csrc/api/include/torch/version.h.in) " +
              "--version-path $(location :version.txt) --output-path $@ ",
        tools = ["//tools/setup_helpers:gen_version_header"],
    )
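    # For illustration only (hypothetical expansion): this rule runs the
    # gen_version_header script over version.h.in, substituting the version
    # string read from version.txt, and writes the result to
    # torch/csrc/api/include/torch/version.h ($@ is Bazel's placeholder for
    # the rule's single output file).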
#
# ATen generated code
#
# You need to keep this in sync with the files written out
# by gen.py (in the cmake build system, we track generated files
# via generated_cpp.txt and generated_cpp.txt-cuda).
#
# Sure would be nice to use gen.py to create this list dynamically
# instead of hardcoding, no? Well, we can't, as discussed in this
# thread:
# https://fb.facebook.com/groups/askbuck/permalink/1924258337622772/

GENERATED_H = [
    "Functions.h",
    "NativeFunctions.h",
    "NativeMetaFunctions.h",
    "FunctionalInverses.h",
    "RedispatchFunctions.h",
    "RegistrationDeclarations.h",
    "VmapGeneratedPlumbing.h",
]

GENERATED_H_CORE = [
    "Operators.h",
    # CPUFunctions.h (and likely similar headers) need to be part of core because
    # of the static dispatch build: TensorBody.h directly includes CPUFunctions.h.
    # The distinction looks pretty arbitrary though; maybe we can kill core
    # and merge the two?
    "CPUFunctions.h",
    "CPUFunctions_inl.h",
    "CompositeExplicitAutogradFunctions.h",
    "CompositeExplicitAutogradFunctions_inl.h",
    "CompositeExplicitAutogradNonFunctionalFunctions.h",
    "CompositeExplicitAutogradNonFunctionalFunctions_inl.h",
    "CompositeImplicitAutogradFunctions.h",
    "CompositeImplicitAutogradFunctions_inl.h",
    "CompositeImplicitAutogradNestedTensorFunctions.h",
    "CompositeImplicitAutogradNestedTensorFunctions_inl.h",
    "MetaFunctions.h",
    "MetaFunctions_inl.h",
    "core/TensorBody.h",
    "MethodOperators.h",
    "core/aten_interned_strings.h",
    "core/enum_tag.h",
]

GENERATED_H_CUDA = [
    "CUDAFunctions.h",
    "CUDAFunctions_inl.h",
]

GENERATED_CPP_CUDA = [
    "RegisterCUDA_0.cpp",
    "RegisterNestedTensorCUDA_0.cpp",
    "RegisterSparseCUDA_0.cpp",
    "RegisterSparseCsrCUDA_0.cpp",
    "RegisterQuantizedCUDA_0.cpp",
]

GENERATED_CPP = [
    "Functions.cpp",
    "RegisterBackendSelect.cpp",
    "RegisterCPU_0.cpp",
    "RegisterCPU_1.cpp",
    "RegisterCPU_2.cpp",
    "RegisterCPU_3.cpp",
    "RegisterQuantizedCPU_0.cpp",
    "RegisterNestedTensorCPU_0.cpp",
    "RegisterSparseCPU_0.cpp",
    "RegisterSparseCsrCPU_0.cpp",
    "RegisterMkldnnCPU_0.cpp",
    "RegisterCompositeImplicitAutograd_0.cpp",
    "RegisterCompositeImplicitAutogradNestedTensor_0.cpp",
    "RegisterZeroTensor_0.cpp",
    "RegisterMeta_0.cpp",
    "RegisterQuantizedMeta_0.cpp",
    "RegisterNestedTensorMeta_0.cpp",
    "RegisterSparseMeta_0.cpp",
    "RegisterCompositeExplicitAutograd_0.cpp",
    "RegisterCompositeExplicitAutogradNonFunctional_0.cpp",
    "CompositeViewCopyKernels.cpp",
    "RegisterSchema.cpp",
    "RegisterFunctionalization_0.cpp",
    "RegisterFunctionalization_1.cpp",
    "RegisterFunctionalization_2.cpp",
    "RegisterFunctionalization_3.cpp",
]

GENERATED_CPP_CORE = [
    "Operators_0.cpp",
    "Operators_1.cpp",
    "Operators_2.cpp",
    "Operators_3.cpp",
    "Operators_4.cpp",
    "core/ATenOpList.cpp",
    "core/TensorMethods.cpp",
]

# These lists are temporarily living in and exported from the shared
# structure so that an internal build that lives under a different
# root can access them. These could technically live in a separate
# file in the same directory but that would require extra work to
# ensure that file is synced to both Meta internal repositories and
# GitHub. This problem will go away when the targets downstream of
# generate-code that use these lists are moved into the shared
# structure as well.

_GENERATED_AUTOGRAD_PYTHON_HEADERS = [
    "torch/csrc/autograd/generated/python_functions.h",
    "torch/csrc/autograd/generated/python_return_types.h",
]

_GENERATED_AUTOGRAD_CPP_HEADERS = [
    "torch/csrc/autograd/generated/Functions.h",
    "torch/csrc/autograd/generated/VariableType.h",
    "torch/csrc/autograd/generated/ViewFuncs.h",
    "torch/csrc/autograd/generated/variable_factories.h",
]

GENERATED_TESTING_PY = [
    "torch/testing/_internal/generated/annotated_fn_args.py",
]

GENERATED_LAZY_H = [
    "torch/csrc/lazy/generated/LazyIr.h",
    "torch/csrc/lazy/generated/LazyNonNativeIr.h",
    "torch/csrc/lazy/generated/LazyNativeFunctions.h",
]

_GENERATED_AUTOGRAD_PYTHON_CPP = [
    "torch/csrc/autograd/generated/python_functions_0.cpp",
    "torch/csrc/autograd/generated/python_functions_1.cpp",
    "torch/csrc/autograd/generated/python_functions_2.cpp",
    "torch/csrc/autograd/generated/python_functions_3.cpp",
    "torch/csrc/autograd/generated/python_functions_4.cpp",
    "torch/csrc/autograd/generated/python_nn_functions.cpp",
    "torch/csrc/autograd/generated/python_nested_functions.cpp",
    "torch/csrc/autograd/generated/python_fft_functions.cpp",
    "torch/csrc/autograd/generated/python_linalg_functions.cpp",
    "torch/csrc/autograd/generated/python_return_types.cpp",
    "torch/csrc/autograd/generated/python_enum_tag.cpp",
    "torch/csrc/autograd/generated/python_sparse_functions.cpp",
    "torch/csrc/autograd/generated/python_special_functions.cpp",
    "torch/csrc/autograd/generated/python_torch_functions_0.cpp",
    "torch/csrc/autograd/generated/python_torch_functions_1.cpp",
    "torch/csrc/autograd/generated/python_torch_functions_2.cpp",
    "torch/csrc/autograd/generated/python_variable_methods.cpp",
]

GENERATED_AUTOGRAD_PYTHON = _GENERATED_AUTOGRAD_PYTHON_HEADERS + _GENERATED_AUTOGRAD_PYTHON_CPP

GENERATED_AUTOGRAD_CPP = [
    "torch/csrc/autograd/generated/Functions.cpp",
    "torch/csrc/autograd/generated/VariableType_0.cpp",
    "torch/csrc/autograd/generated/VariableType_1.cpp",
    "torch/csrc/autograd/generated/VariableType_2.cpp",
    "torch/csrc/autograd/generated/VariableType_3.cpp",
    "torch/csrc/autograd/generated/VariableType_4.cpp",
    "torch/csrc/autograd/generated/ViewFuncs.cpp",
    "torch/csrc/autograd/generated/TraceType_0.cpp",
    "torch/csrc/autograd/generated/TraceType_1.cpp",
    "torch/csrc/autograd/generated/TraceType_2.cpp",
    "torch/csrc/autograd/generated/TraceType_3.cpp",
    "torch/csrc/autograd/generated/TraceType_4.cpp",
    "torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp",
    "torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp",
    "torch/csrc/lazy/generated/LazyNativeFunctions.cpp",
    "torch/csrc/lazy/generated/RegisterAutogradLazy.cpp",
    "torch/csrc/lazy/generated/RegisterLazy.cpp",
] + _GENERATED_AUTOGRAD_CPP_HEADERS + GENERATED_LAZY_H

GENERATED_AOTI_CPP = [
    "torch/csrc/inductor/aoti_torch/generated/c_shim_cpu.cpp",
]

GENERATED_AOTI_CUDA_CPP = [
    "torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp",
]