pytorch/tools
Edward Yang 2ab497012f Add at::cpu namespace of functions for structured kernels (#49505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49505

I have a problem: static runtime needs a way to bypass
dispatch and call into kernels directly.  Previously, it used
native:: bindings to do this, but those bindings no longer exist
for structured kernels!  Enter at::cpu: a namespace of functions
with signatures exactly compatible with at::, which assume all of
their arguments are CPU and non-autograd!  The header looks like this:

```
namespace at {
namespace cpu {

CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & upsample_nearest1d_out(Tensor & out, const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d(const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor & upsample_nearest1d_backward_out(Tensor & grad_input, const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d_backward(const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);

}}
```
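What these bindings buy can be sketched with a toy model (pure illustration in Python, not ATen code): a dispatched entry point pays a per-call table lookup, while an at::cpu-style binding is a direct alias for the device-specific kernel.

```python
# Toy model of dispatch vs. direct binding -- illustration only, not ATen.
def add_cpu_kernel(a, b):
    # Stand-in for the structured CPU kernel.
    return a + b

DISPATCH_TABLE = {"cpu": add_cpu_kernel}

def dispatched_add(device, a, b):
    # Normal path: look up the kernel for the device on every call
    # (the real dispatcher also handles autograd, device checks, etc.).
    return DISPATCH_TABLE[device](a, b)

# at::cpu-style path: the caller guarantees CPU, non-autograd inputs,
# so it binds straight to the kernel and skips the lookup.
cpu_add = add_cpu_kernel
```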

This slows down static runtime, because these bindings are not the
"allow resize of nonzero tensor" variants (unlike the ones I had manually
written).  We can restore that behavior by teaching the codegen to emit
it, but I haven't done so yet since it's marginally more complicated.

In principle, non-structured kernels could get this treatment too.
But, like an evil mastermind, I'm withholding it from this patch, as an extra
carrot to get people to migrate to structured muahahahaha.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25616105

Pulled By: ezyang

fbshipit-source-id: 84955ae09d0b373ca1ed05e0e4e0074a18d1a0b5
2021-01-22 13:11:59 -08:00
| Name | Last commit | Date |
|---|---|---|
| amd_build | [ROCm] rename HIP_HCC_FLAGS to HIP_CLANG_FLAGS (#50917) | 2021-01-22 07:24:05 -08:00 |
| autograd | Padding: support complex dtypes (#50594) | 2021-01-22 11:57:42 -08:00 |
| clang_format_hash | [tools] Update clang-format linux hash (#50520) | 2021-01-13 20:50:56 -08:00 |
| code_analyzer | kill multinomial_alias_setup/draw (#50489) | 2021-01-19 00:23:58 -08:00 |
| code_coverage | | |
| codegen | Add at::cpu namespace of functions for structured kernels (#49505) | 2021-01-22 13:11:59 -08:00 |
| config | | |
| docker | | |
| fast_nvcc | initial commit to enable fast_nvcc (#49773) | 2021-01-19 14:50:54 -08:00 |
| jit | Remove generated_unboxing_wrappers and setManuallyBoxedKernel (#49251) | 2021-01-06 14:22:50 -08:00 |
| pyi | Revert D25958987: [pytorch][PR] Add type annotations to torch.overrides | 2021-01-20 08:59:44 -08:00 |
| rules | | |
| setup_helpers | [pytorch] clean up unused util srcs under tools/autograd (#50611) | 2021-01-18 23:54:02 -08:00 |
| shared | Introducing TORCH_CUDA_CPP_API and TORCH_CUDA_CU_API to the code (#50627) | 2021-01-21 19:09:11 -08:00 |
| __init__.py | | |
| build_libtorch.py | | |
| build_pytorch_libs.py | Cleanup unused code for Python < 3.6 (#47822) | 2020-11-13 21:37:01 -08:00 |
| build_variables.bzl | Refactor build targets for torch::deploy (#50288) | 2021-01-22 09:16:32 -08:00 |
| clang_format_all.py | | |
| clang_format_ci.sh | | |
| clang_format_utils.py | | |
| clang_tidy.py | | |
| download_mnist.py | | |
| flake8_hook.py | | |
| generate_torch_version.py | tools: Move sha check to else statement (#50773) | 2021-01-20 09:34:43 -08:00 |
| generated_dirs.txt | | |
| git-clang-format | | |
| git-pre-commit | | |
| git_add_generated_dirs.sh | | |
| git_reset_generated_dirs.sh | | |
| nightly.py | Bugfix nightly checkout tool to work on Windows (#49274) | 2021-01-06 16:14:51 -08:00 |
| pytorch.version | | |
| README.md | Bring fast_nvcc.py to PyTorch OSS (#48934) | 2020-12-11 08:17:21 -08:00 |
| update_disabled_tests.sh | | |

This folder contains a number of scripts which are used as part of the PyTorch build process. This directory also doubles as a Python module hierarchy (thus the __init__.py).

Overview

Modern infrastructure:

  • autograd - Code generation for autograd. This includes definitions of all our derivatives.
  • jit - Code generation for JIT
  • shared - Generic infrastructure that scripts in tools may find useful.
    • module_loader.py - Makes it easier to import arbitrary Python files in a script, without having to add them to the PYTHONPATH first.
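The kind of helper module_loader.py provides can be sketched with importlib (a minimal version; the real helper's interface may differ):

```python
import importlib.util

def import_file(name, path):
    # Load a Python file as a module without adding its directory to
    # sys.path / PYTHONPATH.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```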

Legacy infrastructure (we should kill this):

  • cwrap - Implementation of legacy code generation for THNN/THCUNN. This is used by nnwrap.

Build system pieces:

  • setup_helpers - Helper code for searching for third-party dependencies on the user system.
  • build_pytorch_libs.py - Cross-platform script that builds all of the constituent libraries of PyTorch, but not the PyTorch Python extension itself.
  • build_libtorch.py - Script for building libtorch, a standalone C++ library without Python support. This build script is tested in CI.
  • fast_nvcc - Mostly-transparent wrapper over nvcc that parallelizes compilation when used to build CUDA files for multiple architectures at once.
    • fast_nvcc.py - Python script, entrypoint to the fast nvcc wrapper.
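The core idea behind fast_nvcc can be sketched in a few lines (a toy; `compile_for_arch` is a hypothetical stand-in, and fast_nvcc.py itself does considerably more):

```python
from concurrent.futures import ThreadPoolExecutor

def compile_for_arch(arch):
    # Stand-in for the per-architecture nvcc sub-command; in reality this
    # would shell out to the compiler for one target architecture.
    return f"kernel.sm_{arch}.o"

def fast_compile(archs):
    # Per-architecture compilations are independent of one another, so run
    # them concurrently instead of letting them serialize.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(compile_for_arch, archs))
```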

Developer tools which you might find useful:

Important if you want to run on AMD GPU:

  • amd_build - HIPify scripts, for transpiling CUDA into AMD HIP. Right now, PyTorch and Caffe2 share logic for how to do this transpilation, but have separate entry-points for transpiling either PyTorch or Caffe2 code.
    • build_amd.py - Top-level entry point for HIPifying our codebase.
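At its heart, HIPify is source-to-source rewriting of CUDA identifiers into their HIP equivalents; a minimal sketch of that idea (the mapping below is a tiny illustrative subset of the real tables):

```python
import re

# A few CUDA -> HIP renames; the actual hipify tables cover the full
# runtime API plus library calls.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

_PATTERN = re.compile("|".join(re.escape(k) for k in CUDA_TO_HIP))

def hipify(source):
    # Replace every known CUDA identifier with its HIP counterpart.
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)
```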

Tools which are only situationally useful: