Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-14 20:57:59 +00:00
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53037

As remarked in #52277, it is easy to give an (inefficient, due to extra redispatches) DefaultBackend implementation of `foo` and `foo_` in terms of `foo_out`. This patch enables code generation for DefaultBackend in these cases by default for all structured kernels. You can see the payoff in the MSNPU extension: it only has to register a kernel for `add.out`, and it gets `add` and `add_` kernels automatically.

The actual code changes are very modest:

- When DefaultBackend, call the dispatched (not direct `native::`) functions to allocate tensors, change the device guard, etc.
- Don't call `impl()` for DefaultBackend (as it doesn't exist); instead, directly generate a call to `at::foo_out` to do the actual work.
- Do NOT generate a DefaultBackend implementation for `foo_out`. Actually, there is a case to be made for this being a good idea with more infra; see comments inside.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D26731225

Pulled By: ezyang

fbshipit-source-id: 939da7cb69f694722ec293e5e42e74a755dd0985
| Name |
|---|
| no_python_abi_suffix_test |
| self_compiler_include_dirs_test |
| torch_test_cpp_extension |
| cpp_c10d_extension.cpp |
| cpp_c10d_extension.hpp |
| cpp_frontend_extension.cpp |
| cuda_extension.cpp |
| cuda_extension.cu |
| cuda_extension_kernel.cu |
| cuda_extension_kernel2.cu |
| cudnn_extension.cpp |
| doubler.h |
| extension.cpp |
| jit_extension.cpp |
| jit_extension2.cpp |
| msnpu_extension.cpp |
| rng_extension.cpp |
| setup.py |
| torch_library.cu |