pytorch/test/test_cpp_extensions_jit.py


# Owner(s): ["module: cpp-extensions"]
import os
import shutil
import sys
import unittest
import warnings
import re
import tempfile
import subprocess
import glob
import torch.testing._internal.common_utils as common
import torch
import torch.backends.cudnn
import torch.utils.cpp_extension
from torch.utils.cpp_extension import CUDA_HOME, ROCM_HOME
from torch.testing._internal.common_utils import gradcheck
TEST_CUDA = torch.cuda.is_available() and CUDA_HOME is not None
TEST_CUDNN = False
TEST_ROCM = torch.cuda.is_available() and torch.version.hip is not None and ROCM_HOME is not None
if TEST_CUDA and torch.version.cuda is not None:  # skip the CUDNN test for ROCm
    CUDNN_HEADER_EXISTS = os.path.isfile(os.path.join(CUDA_HOME, "include/cudnn.h"))
    TEST_CUDNN = (
        TEST_CUDA and CUDNN_HEADER_EXISTS and torch.backends.cudnn.is_available()
    )
IS_WINDOWS = sys.platform == "win32"


def remove_build_path():
    if sys.platform == "win32":
        print("Not wiping extensions build folder because Windows")
        return
    default_build_root = torch.utils.cpp_extension.get_default_build_root()
    if os.path.exists(default_build_root):
        shutil.rmtree(default_build_root)


class TestCppExtensionJIT(common.TestCase):
    """Tests just-in-time cpp extensions.

    Don't confuse this with the PyTorch JIT (aka TorchScript).
    """

    def setUp(self):
        super().setUp()
        # cpp extensions use relative paths. Those paths are relative to
        # this file, so we'll change the working directory temporarily
        self.old_working_dir = os.getcwd()
        os.chdir(os.path.dirname(os.path.abspath(__file__)))

    def tearDown(self):
        super().tearDown()
        # restore the working directory (see setUp)
        os.chdir(self.old_working_dir)

    @classmethod
    def setUpClass(cls):
        remove_build_path()

    @classmethod
    def tearDownClass(cls):
        remove_build_path()

    def test_jit_compile_extension(self):
        module = torch.utils.cpp_extension.load(
            name="jit_extension",
            sources=[
                "cpp_extensions/jit_extension.cpp",
                "cpp_extensions/jit_extension2.cpp",
            ],
            extra_include_paths=["cpp_extensions"],
            extra_cflags=["-g"],
            verbose=True,
        )
        x = torch.randn(4, 4)
        y = torch.randn(4, 4)
        z = module.tanh_add(x, y)
        self.assertEqual(z, x.tanh() + y.tanh())

        # Check that we can call a method that is not defined in the main C++ file.
        z = module.exp_add(x, y)
        self.assertEqual(z, x.exp() + y.exp())

        # Check that we can use this JIT-compiled class.
        doubler = module.Doubler(2, 2)
        self.assertIsNone(doubler.get().grad)
        self.assertEqual(doubler.get().sum(), 4)
        self.assertEqual(doubler.forward().sum(), 8)

    @unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
    def test_jit_cuda_extension(self):
        # NOTE: The name of the extension must equal the name of the module.
        module = torch.utils.cpp_extension.load(
            name="torch_test_cuda_extension",
            sources=[
                "cpp_extensions/cuda_extension.cpp",
                "cpp_extensions/cuda_extension.cu",
            ],
            extra_cuda_cflags=["-O2"],
            verbose=True,
            keep_intermediates=False,
        )
        x = torch.zeros(100, device="cuda", dtype=torch.float32)
        y = torch.zeros(100, device="cuda", dtype=torch.float32)
        z = module.sigmoid_add(x, y).cpu()
        # 2 * sigmoid(0) = 2 * 0.5 = 1
        self.assertEqual(z, torch.ones_like(z))

    def _run_jit_cuda_archflags(self, flags, expected):
        # Compile an extension with given `flags`
        def _check_cuobjdump_output(expected_values, is_ptx=False):
            elf_or_ptx = '--list-ptx' if is_ptx else '--list-elf'
            lib_ext = '.pyd' if IS_WINDOWS else '.so'
            # Note: the extension name may include _v1, _v2, so first find the exact filename
            ext_filename = glob.glob(os.path.join(temp_dir,
                                                  'cudaext_archflag*' + lib_ext))[0]
            command = ['cuobjdump', elf_or_ptx, ext_filename]
            p = subprocess.Popen(command,
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
            output, err = p.communicate()
            output = output.decode("ascii")
            err = err.decode("ascii")
            if not p.returncode == 0 or not err == '':
                raise AssertionError("Flags: {}\nReturncode: {}\nStderr: {}\n"
                                     "Output: {} ".format(flags, p.returncode,
                                                          err, output))

            actual_arches = sorted(re.findall(r'sm_\d\d', output))
            expected_arches = sorted(['sm_' + xx for xx in expected_values])
            self.assertEqual(actual_arches, expected_arches,
                             msg="Flags: {}, Actual: {}, Expected: {}\n"
                                 "Stderr: {}\nOutput: {}".format(
                                     flags, actual_arches, expected_arches,
                                     err, output))

        temp_dir = tempfile.mkdtemp()
        old_envvar = os.environ.get('TORCH_CUDA_ARCH_LIST', None)
        try:
            os.environ['TORCH_CUDA_ARCH_LIST'] = flags
            torch.utils.cpp_extension.load(
                name="cudaext_archflags",
                sources=[
                    "cpp_extensions/cuda_extension.cpp",
                    "cpp_extensions/cuda_extension.cu",
                ],
                extra_cuda_cflags=["-O2"],
                verbose=True,
                build_directory=temp_dir,
            )
            # Expected output for --list-elf:
            #   ELF file 1: cudaext_archflags.1.sm_61.cubin
            #   ELF file 2: cudaext_archflags.2.sm_52.cubin
            _check_cuobjdump_output(expected[0])
            if expected[1] is not None:
                # Expected output for --list-ptx:
                #   PTX file 1: cudaext_archflags.1.sm_61.ptx
                _check_cuobjdump_output(expected[1], is_ptx=True)
        finally:
            if IS_WINDOWS:
                print("Not wiping extensions build folder because Windows")
            else:
                shutil.rmtree(temp_dir)
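            # Restore the original TORCH_CUDA_ARCH_LIST so later tests see the caller's environment.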
            if old_envvar is None:
                os.environ.pop('TORCH_CUDA_ARCH_LIST')
            else:
                os.environ['TORCH_CUDA_ARCH_LIST'] = old_envvar

    @unittest.skipIf(not TEST_CUDA, "CUDA not found")
    @unittest.skipIf(TEST_ROCM, "disabled on rocm")
    def test_jit_cuda_archflags(self):
        # Test a number of combinations:
        #   - the default for the machine we're testing on
        #   - Separators, can be ';' (most common) or ' '
        #   - Architecture names
        #   - With/without '+PTX'
        n = torch.cuda.device_count()
        capabilities = {torch.cuda.get_device_capability(i) for i in range(n)}
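        # With an empty TORCH_CUDA_ARCH_LIST, the extension is built for the capability of
        # every visible GPU, so the expected ELF list below covers each distinct capability.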
        # each expected value is a length-2 tuple: (list of ELF arches, list of PTX arches)
        # note: there should not be more than one PTX value
        archflags = {
            '': (['{}{}'.format(capability[0], capability[1]) for capability in capabilities], None),
            "Maxwell+Tegra;6.1": (['53', '61'], None),
            "Pascal 3.5": (['35', '60', '61'], None),
            "Volta": (['70'], ['70']),
        }
        if int(torch.version.cuda.split('.')[0]) >= 10:
            # CUDA 9 only supports compute capability <= 7.2
            archflags["7.5+PTX"] = (['75'], ['75'])
            archflags["5.0;6.0+PTX;7.0;7.5"] = (['50', '60', '70', '75'], ['60'])

        for flags, expected in archflags.items():
            self._run_jit_cuda_archflags(flags, expected)

    @unittest.skipIf(not TEST_CUDNN, "CuDNN not found")
    def test_jit_cudnn_extension(self):
        # implementation of CuDNN ReLU
        if IS_WINDOWS:
            extra_ldflags = ["cudnn.lib"]
        else:
            extra_ldflags = ["-lcudnn"]
        module = torch.utils.cpp_extension.load(
            name="torch_test_cudnn_extension",
            sources=["cpp_extensions/cudnn_extension.cpp"],
            extra_ldflags=extra_ldflags,
            verbose=True,
            with_cuda=True,
        )
        x = torch.randn(100, device="cuda", dtype=torch.float32)
        y = torch.zeros(100, device="cuda", dtype=torch.float32)
        module.cudnn_relu(x, y)  # y = relu(x)
        self.assertEqual(torch.nn.functional.relu(x), y)
        with self.assertRaisesRegex(RuntimeError, "same size"):
            y_incorrect = torch.zeros(20, device="cuda", dtype=torch.float32)
            module.cudnn_relu(x, y_incorrect)

    def test_inline_jit_compile_extension_with_functions_as_list(self):
        cpp_source = """
        torch::Tensor tanh_add(torch::Tensor x, torch::Tensor y) {
          return x.tanh() + y.tanh();
        }
        """
        module = torch.utils.cpp_extension.load_inline(
            name="inline_jit_extension_with_functions_list",
            cpp_sources=cpp_source,
            functions="tanh_add",
            verbose=True,
        )
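        # The generated docstring's third line is the per-function docstring, which
        # defaults to the function name when `functions` is given as a string or list.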
        self.assertEqual(module.tanh_add.__doc__.split("\n")[2], "tanh_add")

        x = torch.randn(4, 4)
        y = torch.randn(4, 4)
        z = module.tanh_add(x, y)
        self.assertEqual(z, x.tanh() + y.tanh())

    def test_inline_jit_compile_extension_with_functions_as_dict(self):
        cpp_source = """
        torch::Tensor tanh_add(torch::Tensor x, torch::Tensor y) {
          return x.tanh() + y.tanh();
        }
        """
        module = torch.utils.cpp_extension.load_inline(
            name="inline_jit_extension_with_functions_dict",
            cpp_sources=cpp_source,
            functions={"tanh_add": "Tanh and then sum :D"},
            verbose=True,
        )
        self.assertEqual(module.tanh_add.__doc__.split("\n")[2], "Tanh and then sum :D")

    def test_inline_jit_compile_extension_multiple_sources_and_no_functions(self):
        cpp_source1 = """
        torch::Tensor sin_add(torch::Tensor x, torch::Tensor y) {
          return x.sin() + y.sin();
        }
        """

        cpp_source2 = """
        #include <torch/extension.h>
        torch::Tensor sin_add(torch::Tensor x, torch::Tensor y);
        PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
          m.def("sin_add", &sin_add, "sin(x) + sin(y)");
        }
        """
        module = torch.utils.cpp_extension.load_inline(
            name="inline_jit_extension",
            cpp_sources=[cpp_source1, cpp_source2],
            verbose=True,
        )
        x = torch.randn(4, 4)
        y = torch.randn(4, 4)
        z = module.sin_add(x, y)
        self.assertEqual(z, x.sin() + y.sin())

    @unittest.skip("Temporarily disabled")
    @unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
    def test_inline_jit_compile_extension_cuda(self):
        cuda_source = """
        __global__ void cos_add_kernel(
            const float* __restrict__ x,
            const float* __restrict__ y,
            float* __restrict__ output,
            const int size) {
          const auto index = blockIdx.x * blockDim.x + threadIdx.x;
          if (index < size) {
            output[index] = __cosf(x[index]) + __cosf(y[index]);
          }
        }

        torch::Tensor cos_add(torch::Tensor x, torch::Tensor y) {
          auto output = torch::zeros_like(x);
          const int threads = 1024;
          const int blocks = (output.numel() + threads - 1) / threads;
          cos_add_kernel<<<blocks, threads>>>(x.data<float>(), y.data<float>(), output.data<float>(), output.numel());
          return output;
        }
        """

        # Here, the C++ source need only declare the function signature.
        cpp_source = "torch::Tensor cos_add(torch::Tensor x, torch::Tensor y);"

        module = torch.utils.cpp_extension.load_inline(
            name="inline_jit_extension_cuda",
            cpp_sources=cpp_source,
            cuda_sources=cuda_source,
            functions=["cos_add"],
            verbose=True,
        )
        self.assertEqual(module.cos_add.__doc__.split("\n")[2], "cos_add")
        x = torch.randn(4, 4, device="cuda", dtype=torch.float32)
        y = torch.randn(4, 4, device="cuda", dtype=torch.float32)
        z = module.cos_add(x, y)
        self.assertEqual(z, x.cos() + y.cos())

    @unittest.skip("Temporarily disabled")
    @unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
    def test_inline_jit_compile_custom_op_cuda(self):
        cuda_source = """
        __global__ void cos_add_kernel(
            const float* __restrict__ x,
            const float* __restrict__ y,
            float* __restrict__ output,
            const int size) {
          const auto index = blockIdx.x * blockDim.x + threadIdx.x;
          if (index < size) {
            output[index] = __cosf(x[index]) + __cosf(y[index]);
          }
        }

        torch::Tensor cos_add(torch::Tensor x, torch::Tensor y) {
          auto output = torch::zeros_like(x);
          const int threads = 1024;
          const int blocks = (output.numel() + threads - 1) / threads;
          cos_add_kernel<<<blocks, threads>>>(x.data_ptr<float>(), y.data_ptr<float>(), output.data_ptr<float>(), output.numel());
          return output;
        }
        """

        # Here, the C++ source need only declare the function signature.
        cpp_source = """
        #include <torch/library.h>
        torch::Tensor cos_add(torch::Tensor x, torch::Tensor y);

        TORCH_LIBRARY(inline_jit_extension_custom_op_cuda, m) {
          m.def("cos_add", cos_add);
        }
        """
torch.utils.cpp_extension.load_inline(
name="inline_jit_extension_custom_op_cuda",
cpp_sources=cpp_source,
cuda_sources=cuda_source,
verbose=True,
is_python_module=False,
)
x = torch.randn(4, 4, device="cuda", dtype=torch.float32)
y = torch.randn(4, 4, device="cuda", dtype=torch.float32)
z = torch.ops.inline_jit_extension_custom_op_cuda.cos_add(x, y)
self.assertEqual(z, x.cos() + y.cos())
def test_inline_jit_compile_extension_throws_when_functions_is_bad(self):
with self.assertRaises(ValueError):
torch.utils.cpp_extension.load_inline(
name="invalid_jit_extension", cpp_sources="", functions=5
)
def test_lenient_flag_handling_in_jit_extensions(self):
cpp_source = """
torch::Tensor tanh_add(torch::Tensor x, torch::Tensor y) {
return x.tanh() + y.tanh();
}
"""
module = torch.utils.cpp_extension.load_inline(
name="lenient_flag_handling_extension",
cpp_sources=cpp_source,
functions="tanh_add",
extra_cflags=["-g\n\n", "-O0 -Wall"],
extra_include_paths=[" cpp_extensions\n"],
verbose=True,
)
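# The stray newlines and embedded spaces in the flags above are intentional:
# load_inline is expected to strip and split such arguments rather than pass
# them to the compiler verbatim.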
x = torch.zeros(100, dtype=torch.float32)
y = torch.zeros(100, dtype=torch.float32)
z = module.tanh_add(x, y).cpu()
self.assertEqual(z, x.tanh() + y.tanh())
@unittest.skip("Temporarily disabled")
@unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
def test_half_support(self):
"""
Checks for an issue with operator< ambiguity for half when certain
THC headers are included.
See https://github.com/pytorch/pytorch/pull/10301#issuecomment-416773333
for the corresponding issue.
"""
cuda_source = """
template<typename T, typename U>
__global__ void half_test_kernel(const T* input, U* output) {
if (input[0] < input[1] || input[0] >= input[1]) {
output[0] = 123;
}
}
torch::Tensor half_test(torch::Tensor input) {
auto output = torch::empty(1, input.options().dtype(torch::kFloat));
AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.scalar_type(), "half_test", [&] {
half_test_kernel<scalar_t><<<1, 1>>>(
input.data<scalar_t>(),
output.data<float>());
});
return output;
}
"""
module = torch.utils.cpp_extension.load_inline(
name="half_test_extension",
cpp_sources="torch::Tensor half_test(torch::Tensor input);",
cuda_sources=cuda_source,
functions=["half_test"],
verbose=True,
)
x = torch.randn(3, device="cuda", dtype=torch.half)
result = module.half_test(x)
self.assertEqual(result[0], 123)
def test_reload_jit_extension(self):
def compile(code):
return torch.utils.cpp_extension.load_inline(
name="reloaded_jit_extension",
cpp_sources=code,
functions="f",
verbose=True,
)
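# load_inline hashes the sources and build flags; when the hash changes, the
# extension is rebuilt under a versioned "_v<N>" name, so redefining f() below
# produces a freshly compiled module instead of the cached one.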
module = compile("int f() { return 123; }")
self.assertEqual(module.f(), 123)
module = compile("int f() { return 456; }")
self.assertEqual(module.f(), 456)
module = compile("int f() { return 456; }")
self.assertEqual(module.f(), 456)
module = compile("int f() { return 789; }")
self.assertEqual(module.f(), 789)
def test_cpp_frontend_module_has_same_output_as_python(self, dtype=torch.double):
extension = torch.utils.cpp_extension.load(
name="cpp_frontend_extension",
sources="cpp_extensions/cpp_frontend_extension.cpp",
verbose=True,
)
input = torch.randn(2, 5, dtype=dtype)
cpp_linear = extension.Net(5, 2)
cpp_linear.to(dtype)
python_linear = torch.nn.Linear(5, 2).to(dtype)
# First make sure they have the same parameters
cpp_parameters = dict(cpp_linear.named_parameters())
with torch.no_grad():
python_linear.weight.copy_(cpp_parameters["fc.weight"])
python_linear.bias.copy_(cpp_parameters["fc.bias"])
cpp_output = cpp_linear.forward(input)
python_output = python_linear(input)
self.assertEqual(cpp_output, python_output)
cpp_output.sum().backward()
python_output.sum().backward()
for p in cpp_linear.parameters():
self.assertFalse(p.grad is None)
self.assertEqual(cpp_parameters["fc.weight"].grad, python_linear.weight.grad)
self.assertEqual(cpp_parameters["fc.bias"].grad, python_linear.bias.grad)
def test_cpp_frontend_module_python_inter_op(self):
extension = torch.utils.cpp_extension.load(
name="cpp_frontend_extension",
sources="cpp_extensions/cpp_frontend_extension.cpp",
verbose=True,
)
# Create a torch.nn.Module which uses the C++ module as a submodule.
class M(torch.nn.Module):
def __init__(self):
super(M, self).__init__()
self.x = torch.nn.Parameter(torch.tensor(1.0))
self.net = extension.Net(3, 5)
def forward(self, input):
return self.net.forward(input) + self.x
net = extension.Net(5, 2)
net.double()
net.to(torch.get_default_dtype())
self.assertEqual(str(net), "Net")
# Further embed the torch.nn.Module into a Sequential, and also add the
# C++ module as an element of the Sequential.
sequential = torch.nn.Sequential(M(), torch.nn.Tanh(), net, torch.nn.Sigmoid())
input = torch.randn(2, 3)
# Try calling the module!
output = sequential.forward(input)
# The call operator is bound to forward too.
self.assertEqual(output, sequential(input))
self.assertEqual(list(output.shape), [2, 2])
# Make changes to the module hierarchy.
old_dtype = torch.get_default_dtype()
sequential.to(torch.float64)
sequential.to(torch.float32)
sequential.to(old_dtype)
self.assertEqual(sequential[2].parameters()[0].dtype, old_dtype)
# Make sure we can access these methods recursively.
self.assertEqual(len(list(sequential.parameters())), len(net.parameters()) * 2 + 1)
self.assertEqual(len(list(sequential.named_parameters())), len(net.named_parameters()) * 2 + 1)
self.assertEqual(len(list(sequential.buffers())), len(net.buffers()) * 2)
self.assertEqual(len(list(sequential.modules())), 8)
# Test clone()
net2 = net.clone()
self.assertEqual(len(net.parameters()), len(net2.parameters()))
self.assertEqual(len(net.buffers()), len(net2.buffers()))
self.assertEqual(len(net.modules()), len(net2.modules()))
# Try differentiating through the whole module.
for parameter in net.parameters():
self.assertIsNone(parameter.grad)
output.sum().backward()
for parameter in net.parameters():
self.assertFalse(parameter.grad is None)
self.assertGreater(parameter.grad.sum(), 0)
# Try calling zero_grad()
net.zero_grad()
for p in net.parameters():
self.assertEqual(p.grad, torch.zeros_like(p))
# Test train(), eval(), training (a property)
self.assertTrue(net.training)
net.eval()
self.assertFalse(net.training)
net.train()
self.assertTrue(net.training)
net.eval()
# Try calling the additional methods we registered.
biased_input = torch.randn(4, 5)
output_before = net.forward(biased_input)
bias = net.get_bias().clone()
self.assertEqual(list(bias.shape), [2])
net.set_bias(bias + 1)
self.assertEqual(net.get_bias(), bias + 1)
output_after = net.forward(biased_input)
self.assertNotEqual(output_before, output_after)
# Try accessing parameters
self.assertEqual(len(net.parameters()), 2)
np = net.named_parameters()
self.assertEqual(len(np), 2)
self.assertIn("fc.weight", np)
self.assertIn("fc.bias", np)
self.assertEqual(len(net.buffers()), 1)
nb = net.named_buffers()
self.assertEqual(len(nb), 1)
self.assertIn("buf", nb)
self.assertEqual(nb[0][1], torch.eye(5))
def test_cpp_frontend_module_has_up_to_date_attributes(self):
extension = torch.utils.cpp_extension.load(
name="cpp_frontend_extension",
sources="cpp_extensions/cpp_frontend_extension.cpp",
verbose=True,
)
net = extension.Net(5, 2)
self.assertEqual(len(net._parameters), 0)
net.add_new_parameter("foo", torch.eye(5))
self.assertEqual(len(net._parameters), 1)
self.assertEqual(len(net._buffers), 1)
net.add_new_buffer("bar", torch.eye(5))
self.assertEqual(len(net._buffers), 2)
self.assertEqual(len(net._modules), 1)
net.add_new_submodule("fc2")
self.assertEqual(len(net._modules), 2)
@unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
def test_cpp_frontend_module_python_inter_op_with_cuda(self):
extension = torch.utils.cpp_extension.load(
name="cpp_frontend_extension",
sources="cpp_extensions/cpp_frontend_extension.cpp",
verbose=True,
)
net = extension.Net(5, 2)
for p in net.parameters():
self.assertTrue(p.device.type == "cpu")
cpu_parameters = [p.clone() for p in net.parameters()]
device = torch.device("cuda", 0)
net.to(device)
for i, p in enumerate(net.parameters()):
self.assertTrue(p.device.type == "cuda")
self.assertTrue(p.device.index == 0)
self.assertEqual(cpu_parameters[i], p)
net.cpu()
net.add_new_parameter("a", torch.eye(5))
net.add_new_parameter("b", torch.eye(5))
net.add_new_buffer("c", torch.eye(5))
net.add_new_buffer("d", torch.eye(5))
net.add_new_submodule("fc2")
net.add_new_submodule("fc3")
for p in net.parameters():
self.assertTrue(p.device.type == "cpu")
net.cuda()
for p in net.parameters():
self.assertTrue(p.device.type == "cuda")
def test_returns_shared_library_path_when_is_python_module_is_true(self):
source = """
#include <torch/script.h>
torch::Tensor func(torch::Tensor x) { return x; }
static torch::RegisterOperators r("test::func", &func);
"""
torch.utils.cpp_extension.load_inline(
name="is_python_module",
cpp_sources=source,
functions="func",
verbose=True,
is_python_module=False,
)
self.assertEqual(torch.ops.test.func(torch.eye(5)), torch.eye(5))
def test_set_default_type_also_changes_aten_default_type(self):
module = torch.utils.cpp_extension.load_inline(
name="test_set_default_type",
cpp_sources="torch::Tensor get() { return torch::empty({}); }",
functions="get",
verbose=True,
)
initial_default = torch.get_default_dtype()
try:
self.assertEqual(module.get().dtype, initial_default)
torch.set_default_dtype(torch.float64)
self.assertEqual(module.get().dtype, torch.float64)
torch.set_default_dtype(torch.float32)
self.assertEqual(module.get().dtype, torch.float32)
torch.set_default_dtype(torch.float16)
self.assertEqual(module.get().dtype, torch.float16)
finally:
torch.set_default_dtype(initial_default)
def test_compilation_error_formatting(self):
# Test that the missing-semicolon error message has linebreaks in it.
# This'll fail if the message has been munged into a single line.
# It's hard to write anything more specific as every compiler has its own
# error formatting.
with self.assertRaises(RuntimeError) as e:
torch.utils.cpp_extension.load_inline(
name="test_compilation_error_formatting",
cpp_sources="int main() { return 0 }")
pattern = r'.*(\\n|\\r).*'
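# The raw pattern matches a literal backslash-n / backslash-r, which would only
# appear if the compiler output had been left as an undecoded bytes repr.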
self.assertNotRegex(str(e), pattern)
def test_warning(self):
# Note: the module created from this source will include the py::key_error
# symbol. But because of visibility and the fact that it lives in a
# different compilation unit than pybind, this trips up ubsan even though
# it is fine. "ubsan.supp" thus needs to contain "vptr:warn_mod.so".
source = '''
// error_type:
// 0: no error
// 1: torch::TypeError
// 2: python_error()
// 3: py::error_already_set
at::Tensor foo(at::Tensor x, int error_type) {
std::ostringstream err_stream;
err_stream << "Error with " << x.type();
TORCH_WARN(err_stream.str());
if(error_type == 1) {
throw torch::TypeError(err_stream.str().c_str());
}
if(error_type == 2) {
PyObject* obj = PyTuple_New(-1);
TORCH_CHECK(!obj);
// Pretend it was caught in a different thread and restored here
auto e = python_error();
e.persist();
e.restore();
throw e;
}
if(error_type == 3) {
throw py::key_error(err_stream.str());
}
return x.cos();
}
'''
# Ensure double type for the hard-coded C++ type name below
t = torch.rand(2).double()
cpp_tensor_name = r"CPUDoubleType"
# Without error handling, the warnings cannot be caught
warn_mod = torch.utils.cpp_extension.load_inline(name='warn_mod',
cpp_sources=[source],
functions=['foo'],
with_pytorch_error_handling=False)
with warnings.catch_warnings(record=True) as w:
warn_mod.foo(t, 0)
self.assertEqual(len(w), 0)
with self.assertRaisesRegex(TypeError, t.type()):
warn_mod.foo(t, 1)
self.assertEqual(len(w), 0)
with self.assertRaisesRegex(SystemError, "bad argument to internal function"):
warn_mod.foo(t, 2)
self.assertEqual(len(w), 0)
with self.assertRaisesRegex(KeyError, cpp_tensor_name):
warn_mod.foo(t, 3)
self.assertEqual(len(w), 0)
warn_mod = torch.utils.cpp_extension.load_inline(name='warn_mod',
cpp_sources=[source],
functions=['foo'],
with_pytorch_error_handling=True)
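# With error handling enabled, each bound function is wrapped so that TORCH_WARN
# messages surface as Python warnings and torch::TypeError / python_error /
# pybind exceptions are translated to their Python counterparts.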
with warnings.catch_warnings(record=True) as w:
# A warning raised with no error should be detected
warn_mod.foo(t, 0)
self.assertEqual(len(w), 1)
# A warning raised alongside a C++ error should also be detected
with self.assertRaisesRegex(TypeError, t.type()):
warn_mod.foo(t, 1)
self.assertEqual(len(w), 2)
# A warning raised alongside a Python error should also be detected
with self.assertRaisesRegex(SystemError, "bad argument to internal function"):
warn_mod.foo(t, 2)
self.assertEqual(len(w), 3)
# A warning emitted alongside a pybind error should also be recorded
# Note that there is no type name translation for pybind errors
with self.assertRaisesRegex(KeyError, cpp_tensor_name):
warn_mod.foo(t, 3)
self.assertEqual(len(w), 4)
# Make sure warnings raised as errors are handled properly
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter("error")
# No C++ error, so the warning itself is raised as an error
with self.assertRaisesRegex(UserWarning, t.type()):
warn_mod.foo(t, 0)
self.assertEqual(len(w), 0)
# A C++ error happened, so the warning is ignored
with self.assertRaisesRegex(TypeError, t.type()):
warn_mod.foo(t, 1)
self.assertEqual(len(w), 0)
def test_autograd_from_cpp(self):
source = '''
void run_back(at::Tensor x) {
x.backward({});
}
void run_back_no_gil(at::Tensor x) {
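// Releasing the GIL before calling backward lets the autograd engine
// re-acquire it when it calls back into the Python-defined backward of
// MyFn, so this variant does not deadlock.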
pybind11::gil_scoped_release no_gil;
x.backward({});
}
'''
class MyFn(torch.autograd.Function):
@staticmethod
def forward(ctx, x):
return x.clone()
@staticmethod
def backward(ctx, gx):
return gx
test_backward_deadlock = torch.utils.cpp_extension.load_inline(name='test_backward_deadlock',
cpp_sources=[source],
functions=['run_back', 'run_back_no_gil'],)
# Calling backward from C++ while holding the GIL used to deadlock when the
# graph contains a Python autograd.Function; it now raises an error instead.
inp = torch.rand(20, requires_grad=True)
loss = MyFn.apply(inp).sum()
with self.assertRaisesRegex(RuntimeError, "The autograd engine was called while holding the GIL."):
test_backward_deadlock.run_back(loss)
inp = torch.rand(20, requires_grad=True)
loss = MyFn.apply(inp).sum()
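# The GIL-releasing variant runs the same backward pass successfully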
test_backward_deadlock.run_back_no_gil(loss)
def test_custom_compound_op_autograd(self):
# Test that a custom compound op (i.e. a custom op that just calls other aten ops)
# correctly returns gradients of those other ops
source = """
#include <torch/library.h>
torch::Tensor my_add(torch::Tensor x, torch::Tensor y) {
return x + y;
}
TORCH_LIBRARY(my, m) {
m.def("add", &my_add);
}
"""
torch.utils.cpp_extension.load_inline(
name="is_python_module",
cpp_sources=source,
verbose=True,
is_python_module=False,
)
a = torch.randn(5, 5, requires_grad=True)
b = torch.randn(5, 5, requires_grad=True)
gradcheck(torch.ops.my.add, [a, b], eps=1e-2)
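# A minimal forward-only usage sketch of the registered op (illustrative):
#   out = torch.ops.my.add(torch.ones(2), torch.ones(2))  # tensor([2., 2.])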
if __name__ == "__main__":
common.run_tests()