Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583

`Scalar` takes 32 bytes because `c10::complex<double>` requires 16-byte alignment. Passing `Scalar` by reference shows about a 1% improvement in instruction count. All the changes in this commit are codemodded except for the following four files (which generate signatures):

```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```

# Codemod

## Main Step

For the codemod part, here are the main commands used:

```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```

As you can tell, this codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fix-point, since one method signature might contain multiple `Scalar` parameters (see the sketch after these notes for a worked illustration). In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea; I reverted those changes manually later (see https://github.com/pytorch/pytorch/pull/53479 as a reference).

## Pre-Step

Prior to applying the main commands, some `Scalar` parameters are spelled `at::Scalar` or `c10::Scalar`, so I codemodded some of them in advance. Here is an incomplete list:

```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```

## Fixup

There are a couple of post-codemod fixups. For example, `const Scalar` gets codemodded into `const const Scalar&`, and `at::Scalar` gets codemodded into `at::const Scalar&` (if the pre-step is not done comprehensively). Here is an incomplete list:

```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```

## Supplementary

`.cu` and `.mm` files also need to be codemodded, for example:

```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```

Function pointers are not covered by the main commands. Here is an incomplete list:

```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'

# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```

Some corner cases need to be fixed manually.
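The following is a rough Python re-creation of the main rewrite rule above, only to illustrate why it has to be applied until a fix-point and where the `const const Scalar` artifact mentioned in the Fixup section comes from. The pattern and replacement are copied from the commands above; the sample signatures and the `codemod_to_fixpoint` helper are hypothetical, and `fastmod`'s actual regex engine may differ from Python's `re` in edge cases.

```python
import re

# Main rewrite rule from the codemod, expressed with Python's `re` module.
PATTERN = re.compile(r'([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)')
REPLACEMENT = r'\1const Scalar& \2'

def codemod_to_fixpoint(line: str) -> str:
    """Apply the rewrite repeatedly until the line stops changing."""
    while True:
        new_line = PATTERN.sub(REPLACEMENT, line)
        if new_line == line:
            return line
        line = new_line

# A single pass only rewrites the last `Scalar` parameter; iterating to a
# fix-point converts both of them.
print(codemod_to_fixpoint("Tensor add(const Tensor& self, Scalar other, Scalar alpha);"))
# Tensor add(const Tensor& self, const Scalar& other, const Scalar& alpha);

# A parameter that was already `const Scalar` picks up a duplicate `const`,
# which is what the 'const const Scalar' -> 'const Scalar' fixup repairs.
print(codemod_to_fixpoint("Tensor& fill_(Tensor& self, const Scalar value);"))
# Tensor& fill_(Tensor& self, const const Scalar& value);
```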
ghstack-source-id: 123970306
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D26904445
fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
This folder contains a number of scripts which are used as part of the PyTorch build process. This directory also doubles as a Python module hierarchy (thus the `__init__.py`).
## Overview
Modern infrastructure:
- autograd - Code generation for autograd. This includes definitions of all our derivatives.
- jit - Code generation for JIT
- shared - Generic infrastructure that scripts in tools may find useful.
  - module_loader.py - Makes it easier to import arbitrary Python files in a script, without having to add them to the PYTHONPATH first (see the sketch below).
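As context for the kind of helper described above, here is a minimal sketch of importing a Python file by path without modifying PYTHONPATH, using only the standard `importlib` machinery. The `import_file` name and the usage line are hypothetical; the real `module_loader.py` API may differ.

```python
import importlib.util

def import_file(name, path):
    """Load the file at `path` as a module named `name`, without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Hypothetical usage: load a generator script directly from its file path.
# gen = import_file("gen_autograd", "tools/autograd/gen_autograd.py")
```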
Legacy infrastructure (we should kill this):
- cwrap - Implementation of legacy code generation for THNN/THCUNN. This is used by nnwrap.
Build system pieces:
- setup_helpers - Helper code for searching for third-party dependencies on the user system.
- build_pytorch_libs.py - Cross-platform script that builds all of the constituent libraries of PyTorch, but not the PyTorch Python extension itself.
- build_libtorch.py - Script for building libtorch, a standalone C++ library without Python support. This build script is tested in CI.
- fast_nvcc - Mostly-transparent wrapper over nvcc that parallelizes compilation when used to build CUDA files for multiple architectures at once.
  - fast_nvcc.py - Python script, entrypoint to the fast nvcc wrapper.
Developer tools which you might find useful:
- clang_tidy.py - Script for running clang-tidy on only the lines of code you changed.
- git_add_generated_dirs.sh and git_reset_generated_dirs.sh - Use these to force-add generated files to your Git index, so that you can conveniently run diffs on them when working on code generation. (See also generated_dirs.txt, which specifies the list of directories with generated files.)
- test_history.py - Query S3 to display history of a single test across multiple jobs over time.
Important if you want to run on AMD GPU:
- amd_build - HIPify scripts, for transpiling CUDA into AMD HIP. Right now, PyTorch and Caffe2 share logic for how to do this transpilation, but have separate entry-points for transpiling either PyTorch or Caffe2 code.
- build_amd.py - Top-level entry point for HIPifying our codebase.
Tools which are only situationally useful:
- docker - Dockerfile for running (but not developing) PyTorch, using the official conda binary distribution. Context: https://github.com/pytorch/pytorch/issues/1619
- download_mnist.py - Download the MNIST dataset; this is necessary if you want to run the C++ API tests.
- run-clang-tidy-in-ci.sh - Responsible for checking that C++ code is clang-tidy clean in CI on Travis