# Owner(s): ["module: cpp-extensions"]

import os
import shutil
import sys
import unittest
import warnings
import re
import tempfile
import subprocess
import glob

import torch.testing._internal.common_utils as common
import torch
import torch.backends.cudnn
import torch.utils.cpp_extension
from torch.utils.cpp_extension import CUDA_HOME, ROCM_HOME
from torch.testing._internal.common_utils import gradcheck


TEST_CUDA = torch.cuda.is_available() and CUDA_HOME is not None
TEST_CUDNN = False
TEST_ROCM = torch.cuda.is_available() and torch.version.hip is not None and ROCM_HOME is not None
if TEST_CUDA and torch.version.cuda is not None:  # skip the cuDNN check for ROCm
    CUDNN_HEADER_EXISTS = os.path.isfile(os.path.join(CUDA_HOME, "include/cudnn.h"))
    TEST_CUDNN = (
        TEST_CUDA and CUDNN_HEADER_EXISTS and torch.backends.cudnn.is_available()
    )
IS_WINDOWS = sys.platform == "win32"

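# Helper: wipe torch.utils.cpp_extension's default build root so every test run
# starts from a clean cache of JIT-compiled extensions instead of reusing stale
# builds (skipped on Windows).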
def remove_build_path():
    if sys.platform == "win32":
        print("Not wiping extensions build folder because Windows")
        return
    default_build_root = torch.utils.cpp_extension.get_default_build_root()
    if os.path.exists(default_build_root):
        shutil.rmtree(default_build_root)


class TestCppExtensionJIT(common.TestCase):
    """Tests just-in-time cpp extensions.
    Don't confuse this with the PyTorch JIT (aka TorchScript).
    """

    def setUp(self):
        super().setUp()
        # cpp extensions use relative paths. Those paths are relative to
        # this file, so we'll change the working directory temporarily
        self.old_working_dir = os.getcwd()
        os.chdir(os.path.dirname(os.path.abspath(__file__)))

    def tearDown(self):
        super().tearDown()
        # return to the working directory saved in setUp
        os.chdir(self.old_working_dir)

    @classmethod
    def setUpClass(cls):
        remove_build_path()

    @classmethod
    def tearDownClass(cls):
        remove_build_path()

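    # The test below compiles the checked-in sources under cpp_extensions/ at test
    # time with torch.utils.cpp_extension.load() and imports the resulting module,
    # which defines tanh_add, exp_add and the Doubler class.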
    def test_jit_compile_extension(self):
        module = torch.utils.cpp_extension.load(
            name="jit_extension",
            sources=[
                "cpp_extensions/jit_extension.cpp",
                "cpp_extensions/jit_extension2.cpp",
            ],
            extra_include_paths=["cpp_extensions"],
            extra_cflags=["-g"],
            verbose=True,
        )
        x = torch.randn(4, 4)
        y = torch.randn(4, 4)

        z = module.tanh_add(x, y)
        self.assertEqual(z, x.tanh() + y.tanh())

        # Check that we can call a function defined outside the main C++ file.
        z = module.exp_add(x, y)
        self.assertEqual(z, x.exp() + y.exp())

        # Check that we can use the JIT-compiled class.
        doubler = module.Doubler(2, 2)
        self.assertIsNone(doubler.get().grad)
        self.assertEqual(doubler.get().sum(), 4)
        self.assertEqual(doubler.forward().sum(), 8)

    @unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
    def test_jit_cuda_extension(self):
        # NOTE: The name of the extension must equal the name of the module.
        module = torch.utils.cpp_extension.load(
            name="torch_test_cuda_extension",
            sources=[
                "cpp_extensions/cuda_extension.cpp",
                "cpp_extensions/cuda_extension.cu",
            ],
            extra_cuda_cflags=["-O2"],
            verbose=True,
            keep_intermediates=False,
        )

        x = torch.zeros(100, device="cuda", dtype=torch.float32)
        y = torch.zeros(100, device="cuda", dtype=torch.float32)

        z = module.sigmoid_add(x, y).cpu()

        # 2 * sigmoid(0) = 2 * 0.5 = 1
        self.assertEqual(z, torch.ones_like(z))

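    # Helper for the archflags test below: builds the CUDA extension with
    # TORCH_CUDA_ARCH_LIST set to `flags`, then runs cuobjdump on the resulting
    # shared library to check which ELF (cubin) and PTX architectures were embedded.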
    def _run_jit_cuda_archflags(self, flags, expected):
        # Compile an extension with given `flags`
        def _check_cuobjdump_output(expected_values, is_ptx=False):
            elf_or_ptx = '--list-ptx' if is_ptx else '--list-elf'
            lib_ext = '.pyd' if IS_WINDOWS else '.so'
            # Note: the extension file name may include _v1, _v2, so first find the exact name
            ext_filename = glob.glob(os.path.join(temp_dir,
                                                  'cudaext_archflag*' + lib_ext))[0]
            command = ['cuobjdump', elf_or_ptx, ext_filename]
            p = subprocess.Popen(command,
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
            output, err = p.communicate()
            output = output.decode("ascii")
            err = err.decode("ascii")

            if p.returncode != 0 or err != '':
                raise AssertionError("Flags: {}\nReturncode: {}\nStderr: {}\n"
                                     "Output: {} ".format(flags, p.returncode,
                                                          err, output))

            actual_arches = sorted(re.findall(r'sm_\d\d', output))
            expected_arches = sorted(['sm_' + xx for xx in expected_values])
            self.assertEqual(actual_arches, expected_arches,
                             msg="Flags: {}, Actual: {}, Expected: {}\n"
                                 "Stderr: {}\nOutput: {}".format(
                                     flags, actual_arches, expected_arches,
                                     err, output))

        temp_dir = tempfile.mkdtemp()
        old_envvar = os.environ.get('TORCH_CUDA_ARCH_LIST', None)
        try:
            os.environ['TORCH_CUDA_ARCH_LIST'] = flags
            torch.utils.cpp_extension.load(
                name="cudaext_archflags",
                sources=[
                    "cpp_extensions/cuda_extension.cpp",
                    "cpp_extensions/cuda_extension.cu",
                ],
                extra_cuda_cflags=["-O2"],
                verbose=True,
                build_directory=temp_dir,
            )

            # Expected output for --list-elf:
            #   ELF file 1: cudaext_archflags.1.sm_61.cubin
            #   ELF file 2: cudaext_archflags.2.sm_52.cubin
            _check_cuobjdump_output(expected[0])
            if expected[1] is not None:
                # Expected output for --list-ptx:
                #   PTX file 1: cudaext_archflags.1.sm_61.ptx
                _check_cuobjdump_output(expected[1], is_ptx=True)
        finally:
            if IS_WINDOWS:
                print("Not wiping extensions build folder because Windows")
            else:
                shutil.rmtree(temp_dir)

            if old_envvar is None:
                os.environ.pop('TORCH_CUDA_ARCH_LIST')
            else:
                os.environ['TORCH_CUDA_ARCH_LIST'] = old_envvar

    @unittest.skipIf(not TEST_CUDA, "CUDA not found")
    @unittest.skipIf(TEST_ROCM, "disabled on rocm")
    def test_jit_cuda_archflags(self):
        # Test a number of combinations:
        #   - the default for the machine we're testing on
        #   - Separators, can be ';' (most common) or ' '
        #   - Architecture names
        #   - With/without '+PTX'

        n = torch.cuda.device_count()
        capabilities = {torch.cuda.get_device_capability(i) for i in range(n)}
        # each expected value is a length-2 tuple: (list of ELF arches, list of PTX arches)
        # note: there should not be more than one PTX value
        archflags = {
            '': (['{}{}'.format(capability[0], capability[1]) for capability in capabilities], None),
            "Maxwell+Tegra;6.1": (['53', '61'], None),
            "Pascal 3.5": (['35', '60', '61'], None),
            "Volta": (['70'], ['70']),
        }
        if int(torch.version.cuda.split('.')[0]) >= 10:
            # CUDA 9 only supports compute capability <= 7.2
            archflags["7.5+PTX"] = (['75'], ['75'])
            archflags["5.0;6.0+PTX;7.0;7.5"] = (['50', '60', '70', '75'], ['60'])

        for flags, expected in archflags.items():
            self._run_jit_cuda_archflags(flags, expected)

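    # The cuDNN extension below links against cuDNN directly (cudnn.lib on Windows,
    # -lcudnn elsewhere) and exposes cudnn_relu(input, output), which requires both
    # tensors to have the same size.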
    @unittest.skipIf(not TEST_CUDNN, "CuDNN not found")
    def test_jit_cudnn_extension(self):
        # implementation of CuDNN ReLU
        if IS_WINDOWS:
            extra_ldflags = ["cudnn.lib"]
        else:
            extra_ldflags = ["-lcudnn"]
        module = torch.utils.cpp_extension.load(
            name="torch_test_cudnn_extension",
            sources=["cpp_extensions/cudnn_extension.cpp"],
            extra_ldflags=extra_ldflags,
            verbose=True,
            with_cuda=True,
        )

        x = torch.randn(100, device="cuda", dtype=torch.float32)
        y = torch.zeros(100, device="cuda", dtype=torch.float32)
        module.cudnn_relu(x, y)  # y=relu(x)
        self.assertEqual(torch.nn.functional.relu(x), y)
        with self.assertRaisesRegex(RuntimeError, "same size"):
            y_incorrect = torch.zeros(20, device="cuda", dtype=torch.float32)
            module.cudnn_relu(x, y_incorrect)

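    # load_inline() wraps the given C++ source in a pybind11 extension. When
    # `functions` is a name or a list of names, bindings are generated automatically
    # and the function name is used as its docstring; a dict maps each name to a
    # custom docstring (compare the __doc__ assertions in the next two tests).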
    def test_inline_jit_compile_extension_with_functions_as_list(self):
        cpp_source = """
        torch::Tensor tanh_add(torch::Tensor x, torch::Tensor y) {
          return x.tanh() + y.tanh();
        }
        """

        module = torch.utils.cpp_extension.load_inline(
            name="inline_jit_extension_with_functions_list",
            cpp_sources=cpp_source,
            functions="tanh_add",
            verbose=True,
        )

        self.assertEqual(module.tanh_add.__doc__.split("\n")[2], "tanh_add")

        x = torch.randn(4, 4)
        y = torch.randn(4, 4)

        z = module.tanh_add(x, y)
        self.assertEqual(z, x.tanh() + y.tanh())

    def test_inline_jit_compile_extension_with_functions_as_dict(self):
        cpp_source = """
        torch::Tensor tanh_add(torch::Tensor x, torch::Tensor y) {
          return x.tanh() + y.tanh();
        }
        """

        module = torch.utils.cpp_extension.load_inline(
            name="inline_jit_extension_with_functions_dict",
            cpp_sources=cpp_source,
            functions={"tanh_add": "Tanh and then sum :D"},
            verbose=True,
        )

        self.assertEqual(module.tanh_add.__doc__.split("\n")[2], "Tanh and then sum :D")

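    # When `functions` is omitted, load_inline() does not generate any bindings, so one
    # of the sources has to provide its own PYBIND11_MODULE block (cpp_source2 below).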
    def test_inline_jit_compile_extension_multiple_sources_and_no_functions(self):
        cpp_source1 = """
        torch::Tensor sin_add(torch::Tensor x, torch::Tensor y) {
          return x.sin() + y.sin();
        }
        """

        cpp_source2 = """
        #include <torch/extension.h>
        torch::Tensor sin_add(torch::Tensor x, torch::Tensor y);
        PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
          m.def("sin_add", &sin_add, "sin(x) + sin(y)");
        }
        """

        module = torch.utils.cpp_extension.load_inline(
            name="inline_jit_extension",
            cpp_sources=[cpp_source1, cpp_source2],
            verbose=True,
        )

        x = torch.randn(4, 4)
        y = torch.randn(4, 4)

        z = module.sin_add(x, y)
        self.assertEqual(z, x.sin() + y.sin())

@unittest.skip("Temporarily disabled")
|
2020-07-17 19:12:57 +00:00
|
|
|
@unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
|
2018-04-30 15:48:44 +00:00
|
|
|
def test_inline_jit_compile_extension_cuda(self):
|
2018-12-13 16:01:10 +00:00
|
|
|
cuda_source = """
|
2018-04-30 15:48:44 +00:00
|
|
|
__global__ void cos_add_kernel(
|
|
|
|
|
const float* __restrict__ x,
|
|
|
|
|
const float* __restrict__ y,
|
|
|
|
|
float* __restrict__ output,
|
|
|
|
|
const int size) {
|
|
|
|
|
const auto index = blockIdx.x * blockDim.x + threadIdx.x;
|
|
|
|
|
if (index < size) {
|
|
|
|
|
output[index] = __cosf(x[index]) + __cosf(y[index]);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2018-11-06 22:28:20 +00:00
|
|
|
torch::Tensor cos_add(torch::Tensor x, torch::Tensor y) {
|
|
|
|
|
auto output = torch::zeros_like(x);
|
2018-04-30 15:48:44 +00:00
|
|
|
const int threads = 1024;
|
|
|
|
|
const int blocks = (output.numel() + threads - 1) / threads;
|
|
|
|
|
cos_add_kernel<<<blocks, threads>>>(x.data<float>(), y.data<float>(), output.data<float>(), output.numel());
|
|
|
|
|
return output;
|
|
|
|
|
}
|
2018-12-13 16:01:10 +00:00
|
|
|
"""
|
2018-04-30 15:48:44 +00:00
|
|
|
|
|
|
|
|
# Here, the C++ source need only declare the function signature.
|
2018-12-13 16:01:10 +00:00
|
|
|
cpp_source = "torch::Tensor cos_add(torch::Tensor x, torch::Tensor y);"
|
2018-04-30 15:48:44 +00:00
|
|
|
|
|
|
|
|
module = torch.utils.cpp_extension.load_inline(
|
2018-12-13 16:01:10 +00:00
|
|
|
name="inline_jit_extension_cuda",
|
2018-04-30 15:48:44 +00:00
|
|
|
cpp_sources=cpp_source,
|
|
|
|
|
cuda_sources=cuda_source,
|
2018-12-13 16:01:10 +00:00
|
|
|
functions=["cos_add"],
|
|
|
|
|
verbose=True,
|
|
|
|
|
)
|
2018-04-30 15:48:44 +00:00
|
|
|
|
2018-12-13 16:01:10 +00:00
|
|
|
self.assertEqual(module.cos_add.__doc__.split("\n")[2], "cos_add")
|
2018-04-30 15:48:44 +00:00
|
|
|
|
2018-12-13 16:01:10 +00:00
|
|
|
x = torch.randn(4, 4, device="cuda", dtype=torch.float32)
|
|
|
|
|
y = torch.randn(4, 4, device="cuda", dtype=torch.float32)
|
2018-04-30 15:48:44 +00:00
|
|
|
|
|
|
|
|
z = module.cos_add(x, y)
|
|
|
|
|
self.assertEqual(z, x.cos() + y.cos())
|
2018-01-23 00:49:11 +00:00
|
|
|
|
2020-08-28 01:11:11 +00:00
|
|
|
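    # The next test registers cos_add through TORCH_LIBRARY instead of pybind11, so the
    # operator is reached via torch.ops.inline_jit_extension_custom_op_cuda.cos_add;
    # is_python_module=False means load_inline() loads the shared library only for its
    # registration side effects.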
@unittest.skip("Temporarily disabled")
|
2020-07-17 19:12:57 +00:00
|
|
|
@unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
|
|
|
|
|
def test_inline_jit_compile_custom_op_cuda(self):
|
|
|
|
|
cuda_source = """
|
|
|
|
|
__global__ void cos_add_kernel(
|
|
|
|
|
const float* __restrict__ x,
|
|
|
|
|
const float* __restrict__ y,
|
|
|
|
|
float* __restrict__ output,
|
|
|
|
|
const int size) {
|
|
|
|
|
const auto index = blockIdx.x * blockDim.x + threadIdx.x;
|
|
|
|
|
if (index < size) {
|
|
|
|
|
output[index] = __cosf(x[index]) + __cosf(y[index]);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
torch::Tensor cos_add(torch::Tensor x, torch::Tensor y) {
|
|
|
|
|
auto output = torch::zeros_like(x);
|
|
|
|
|
const int threads = 1024;
|
|
|
|
|
const int blocks = (output.numel() + threads - 1) / threads;
|
|
|
|
|
cos_add_kernel<<<blocks, threads>>>(x.data_ptr<float>(), y.data_ptr<float>(), output.data_ptr<float>(), output.numel());
|
|
|
|
|
return output;
|
|
|
|
|
}
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
# Here, the C++ source need only declare the function signature.
|
|
|
|
|
cpp_source = """
|
|
|
|
|
#include <torch/library.h>
|
|
|
|
|
torch::Tensor cos_add(torch::Tensor x, torch::Tensor y);
|
|
|
|
|
|
|
|
|
|
TORCH_LIBRARY(inline_jit_extension_custom_op_cuda, m) {
|
|
|
|
|
m.def("cos_add", cos_add);
|
|
|
|
|
}
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
torch.utils.cpp_extension.load_inline(
|
|
|
|
|
name="inline_jit_extension_custom_op_cuda",
|
|
|
|
|
cpp_sources=cpp_source,
|
|
|
|
|
cuda_sources=cuda_source,
|
|
|
|
|
verbose=True,
|
|
|
|
|
is_python_module=False,
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
x = torch.randn(4, 4, device="cuda", dtype=torch.float32)
|
|
|
|
|
y = torch.randn(4, 4, device="cuda", dtype=torch.float32)
|
|
|
|
|
|
|
|
|
|
z = torch.ops.inline_jit_extension_custom_op_cuda.cos_add(x, y)
|
|
|
|
|
self.assertEqual(z, x.cos() + y.cos())
|
|
|
|
|
|
2018-04-30 15:48:44 +00:00
|
|
|
    def test_inline_jit_compile_extension_throws_when_functions_is_bad(self):
        with self.assertRaises(ValueError):
            torch.utils.cpp_extension.load_inline(
                name="invalid_jit_extension", cpp_sources="", functions=5
            )

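    # Compiler flags and include paths may contain stray whitespace and embedded
    # newlines; load_inline() is expected to tolerate them rather than fail the build.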
    def test_lenient_flag_handling_in_jit_extensions(self):
        cpp_source = """
        torch::Tensor tanh_add(torch::Tensor x, torch::Tensor y) {
          return x.tanh() + y.tanh();
        }
        """

        module = torch.utils.cpp_extension.load_inline(
            name="lenient_flag_handling_extension",
            cpp_sources=cpp_source,
            functions="tanh_add",
            extra_cflags=["-g\n\n", "-O0 -Wall"],
            extra_include_paths=[" cpp_extensions\n"],
            verbose=True,
        )

        x = torch.zeros(100, dtype=torch.float32)
        y = torch.zeros(100, dtype=torch.float32)
        z = module.tanh_add(x, y).cpu()
        self.assertEqual(z, x.tanh() + y.tanh())

@unittest.skip("Temporarily disabled")
|
2020-07-17 19:12:57 +00:00
|
|
|
@unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
|
2018-09-10 21:04:14 +00:00
|
|
|
def test_half_support(self):
|
2018-12-13 16:01:10 +00:00
|
|
|
"""
|
2018-09-10 21:04:14 +00:00
|
|
|
Checks for an issue with operator< ambiguity for half when certain
|
|
|
|
|
THC headers are included.
|
|
|
|
|
|
|
|
|
|
See https://github.com/pytorch/pytorch/pull/10301#issuecomment-416773333
|
|
|
|
|
for the corresponding issue.
|
2018-12-13 16:01:10 +00:00
|
|
|
"""
|
|
|
|
|
cuda_source = """
|
2018-09-10 21:04:14 +00:00
|
|
|
template<typename T, typename U>
|
|
|
|
|
__global__ void half_test_kernel(const T* input, U* output) {
|
|
|
|
|
if (input[0] < input[1] || input[0] >= input[1]) {
|
|
|
|
|
output[0] = 123;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2018-11-06 22:28:20 +00:00
|
|
|
torch::Tensor half_test(torch::Tensor input) {
|
|
|
|
|
auto output = torch::empty(1, input.options().dtype(torch::kFloat));
|
2019-03-09 00:39:04 +00:00
|
|
|
AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.scalar_type(), "half_test", [&] {
|
2018-09-10 21:04:14 +00:00
|
|
|
half_test_kernel<scalar_t><<<1, 1>>>(
|
|
|
|
|
input.data<scalar_t>(),
|
|
|
|
|
output.data<float>());
|
|
|
|
|
});
|
|
|
|
|
return output;
|
|
|
|
|
}
|
2018-12-13 16:01:10 +00:00
|
|
|
"""
|
2018-09-10 21:04:14 +00:00
|
|
|
|
|
|
|
|
module = torch.utils.cpp_extension.load_inline(
|
2018-12-13 16:01:10 +00:00
|
|
|
name="half_test_extension",
|
|
|
|
|
cpp_sources="torch::Tensor half_test(torch::Tensor input);",
|
2018-09-10 21:04:14 +00:00
|
|
|
cuda_sources=cuda_source,
|
2018-12-13 16:01:10 +00:00
|
|
|
functions=["half_test"],
|
|
|
|
|
verbose=True,
|
|
|
|
|
)
|
2018-09-10 21:04:14 +00:00
|
|
|
|
2018-12-13 16:01:10 +00:00
|
|
|
x = torch.randn(3, device="cuda", dtype=torch.half)
|
2018-09-10 21:04:14 +00:00
|
|
|
result = module.half_test(x)
|
|
|
|
|
self.assertEqual(result[0], 123)
|
|
|
|
|
|
2018-09-20 21:35:02 +00:00
|
|
|
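    # Reloading under the same extension name must pick up changed sources and
    # recompile; an unchanged source may be served from the existing build cache.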
    def test_reload_jit_extension(self):
        def compile(code):
            return torch.utils.cpp_extension.load_inline(
                name="reloaded_jit_extension",
                cpp_sources=code,
                functions="f",
                verbose=True,
            )

        module = compile("int f() { return 123; }")
        self.assertEqual(module.f(), 123)

        module = compile("int f() { return 456; }")
        self.assertEqual(module.f(), 456)
        module = compile("int f() { return 456; }")
        self.assertEqual(module.f(), 456)

        module = compile("int f() { return 789; }")
        self.assertEqual(module.f(), 789)

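    # cpp_extensions/cpp_frontend_extension.cpp defines `Net`, a C++ frontend
    # (torch::nn) module with an `fc` linear submodule, a `buf` buffer and helper
    # methods (get_bias, set_bias, add_new_*). The tests below compare it against its
    # torch.nn counterpart and exercise Python interop.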
    def test_cpp_frontend_module_has_same_output_as_python(self, dtype=torch.double):
        extension = torch.utils.cpp_extension.load(
            name="cpp_frontend_extension",
            sources="cpp_extensions/cpp_frontend_extension.cpp",
            verbose=True,
        )

        input = torch.randn(2, 5, dtype=dtype)
        cpp_linear = extension.Net(5, 2)
        cpp_linear.to(dtype)
        python_linear = torch.nn.Linear(5, 2).to(dtype)

        # First make sure they have the same parameters
        cpp_parameters = dict(cpp_linear.named_parameters())
        with torch.no_grad():
            python_linear.weight.copy_(cpp_parameters["fc.weight"])
            python_linear.bias.copy_(cpp_parameters["fc.bias"])

        cpp_output = cpp_linear.forward(input)
        python_output = python_linear(input)
        self.assertEqual(cpp_output, python_output)

        cpp_output.sum().backward()
        python_output.sum().backward()

        for p in cpp_linear.parameters():
            self.assertFalse(p.grad is None)

        self.assertEqual(cpp_parameters["fc.weight"].grad, python_linear.weight.grad)
        self.assertEqual(cpp_parameters["fc.bias"].grad, python_linear.bias.grad)

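    # The C++ module can also be used as a submodule of a torch.nn.Module and placed
    # directly inside torch.nn.Sequential, as exercised below.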
    def test_cpp_frontend_module_python_inter_op(self):
        extension = torch.utils.cpp_extension.load(
            name="cpp_frontend_extension",
            sources="cpp_extensions/cpp_frontend_extension.cpp",
            verbose=True,
        )

        # Create a torch.nn.Module which uses the C++ module as a submodule.
        class M(torch.nn.Module):
            def __init__(self):
                super(M, self).__init__()
                self.x = torch.nn.Parameter(torch.tensor(1.0))
                self.net = extension.Net(3, 5)

            def forward(self, input):
                return self.net.forward(input) + self.x

        net = extension.Net(5, 2)
        net.double()
        net.to(torch.get_default_dtype())
        self.assertEqual(str(net), "Net")

        # Further embed the torch.nn.Module into a Sequential, and also add the
        # C++ module as an element of the Sequential.
        sequential = torch.nn.Sequential(M(), torch.nn.Tanh(), net, torch.nn.Sigmoid())

        input = torch.randn(2, 3)
        # Try calling the module!
        output = sequential.forward(input)
        # The call operator is bound to forward too.
        self.assertEqual(output, sequential(input))
        self.assertEqual(list(output.shape), [2, 2])

        # Make changes to the module hierarchy.
        old_dtype = torch.get_default_dtype()
        sequential.to(torch.float64)
        sequential.to(torch.float32)
        sequential.to(old_dtype)
        self.assertEqual(sequential[2].parameters()[0].dtype, old_dtype)

        # Make sure we can access these methods recursively.
        self.assertEqual(len(list(sequential.parameters())), len(net.parameters()) * 2 + 1)
        self.assertEqual(len(list(sequential.named_parameters())), len(net.named_parameters()) * 2 + 1)
        self.assertEqual(len(list(sequential.buffers())), len(net.buffers()) * 2)
        self.assertEqual(len(list(sequential.modules())), 8)

        # Test clone()
        net2 = net.clone()
        self.assertEqual(len(net.parameters()), len(net2.parameters()))
        self.assertEqual(len(net.buffers()), len(net2.buffers()))
        self.assertEqual(len(net.modules()), len(net2.modules()))

        # Try differentiating through the whole module.
        for parameter in net.parameters():
            self.assertIsNone(parameter.grad)
        output.sum().backward()
        for parameter in net.parameters():
            self.assertFalse(parameter.grad is None)
            self.assertGreater(parameter.grad.sum(), 0)

        # Try calling zero_grad()
        net.zero_grad()
        for p in net.parameters():
            self.assertEqual(p.grad, torch.zeros_like(p))

        # Test train(), eval(), training (a property)
        self.assertTrue(net.training)
        net.eval()
        self.assertFalse(net.training)
        net.train()
        self.assertTrue(net.training)
        net.eval()

        # Try calling the additional methods we registered.
        biased_input = torch.randn(4, 5)
        output_before = net.forward(biased_input)
        bias = net.get_bias().clone()
        self.assertEqual(list(bias.shape), [2])
        net.set_bias(bias + 1)
        self.assertEqual(net.get_bias(), bias + 1)
        output_after = net.forward(biased_input)

        self.assertNotEqual(output_before, output_after)

        # Try accessing parameters
        self.assertEqual(len(net.parameters()), 2)
        np = net.named_parameters()
        self.assertEqual(len(np), 2)
        self.assertIn("fc.weight", np)
        self.assertIn("fc.bias", np)

        self.assertEqual(len(net.buffers()), 1)
        nb = net.named_buffers()
        self.assertEqual(len(nb), 1)
        self.assertIn("buf", nb)
        self.assertEqual(nb[0][1], torch.eye(5))

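    # Parameters, buffers and submodules added on the C++ side should be reflected
    # immediately in the Python wrapper's _parameters, _buffers and _modules dicts.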
    def test_cpp_frontend_module_has_up_to_date_attributes(self):
        extension = torch.utils.cpp_extension.load(
            name="cpp_frontend_extension",
            sources="cpp_extensions/cpp_frontend_extension.cpp",
            verbose=True,
        )

        net = extension.Net(5, 2)

        self.assertEqual(len(net._parameters), 0)
        net.add_new_parameter("foo", torch.eye(5))
        self.assertEqual(len(net._parameters), 1)

        self.assertEqual(len(net._buffers), 1)
        net.add_new_buffer("bar", torch.eye(5))
        self.assertEqual(len(net._buffers), 2)

        self.assertEqual(len(net._modules), 1)
        net.add_new_submodule("fc2")
        self.assertEqual(len(net._modules), 2)

    @unittest.skipIf(not (TEST_CUDA or TEST_ROCM), "CUDA not found")
    def test_cpp_frontend_module_python_inter_op_with_cuda(self):
        extension = torch.utils.cpp_extension.load(
            name="cpp_frontend_extension",
            sources="cpp_extensions/cpp_frontend_extension.cpp",
            verbose=True,
        )

        net = extension.Net(5, 2)
        for p in net.parameters():
            self.assertTrue(p.device.type == "cpu")
        cpu_parameters = [p.clone() for p in net.parameters()]

        device = torch.device("cuda", 0)
        net.to(device)

        for i, p in enumerate(net.parameters()):
            self.assertTrue(p.device.type == "cuda")
            self.assertTrue(p.device.index == 0)
            self.assertEqual(cpu_parameters[i], p)

        net.cpu()
        net.add_new_parameter("a", torch.eye(5))
        net.add_new_parameter("b", torch.eye(5))
        net.add_new_buffer("c", torch.eye(5))
        net.add_new_buffer("d", torch.eye(5))
        net.add_new_submodule("fc2")
        net.add_new_submodule("fc3")

        for p in net.parameters():
            self.assertTrue(p.device.type == "cpu")

        net.cuda()

        for p in net.parameters():
            self.assertTrue(p.device.type == "cuda")

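    # With is_python_module=False, load_inline() does not import the build as a Python
    # module; the operator registered via torch::RegisterOperators below becomes
    # reachable through torch.ops.test.func once the shared library is loaded.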
    def test_returns_shared_library_path_when_is_python_module_is_true(self):
        source = """
        #include <torch/script.h>
        torch::Tensor func(torch::Tensor x) { return x; }
        static torch::RegisterOperators r("test::func", &func);
        """
        torch.utils.cpp_extension.load_inline(
            name="is_python_module",
            cpp_sources=source,
            functions="func",
            verbose=True,
            is_python_module=False,
        )
        self.assertEqual(torch.ops.test.func(torch.eye(5)), torch.eye(5))

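    # The default dtype is shared between Python and ATen: torch::empty() inside the
    # extension should follow whatever torch.set_default_dtype() selects.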
    def test_set_default_type_also_changes_aten_default_type(self):
        module = torch.utils.cpp_extension.load_inline(
            name="test_set_default_type",
            cpp_sources="torch::Tensor get() { return torch::empty({}); }",
            functions="get",
            verbose=True,
        )

        initial_default = torch.get_default_dtype()
        try:
            self.assertEqual(module.get().dtype, initial_default)
            torch.set_default_dtype(torch.float64)
            self.assertEqual(module.get().dtype, torch.float64)
            torch.set_default_dtype(torch.float32)
            self.assertEqual(module.get().dtype, torch.float32)
            torch.set_default_dtype(torch.float16)
            self.assertEqual(module.get().dtype, torch.float16)
        finally:
            torch.set_default_dtype(initial_default)

    def test_compilation_error_formatting(self):
        # Test that the missing-semicolon error message has linebreaks in it.
        # This'll fail if the message has been munged into a single line.
        # It's hard to write anything more specific as every compiler has its own
        # error formatting.
        with self.assertRaises(RuntimeError) as e:
            torch.utils.cpp_extension.load_inline(
                name="test_compilation_error_formatting",
                cpp_sources="int main() { return 0 }")
        pattern = r'.*(\\n|\\r).*'
        self.assertNotRegex(str(e), pattern)

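    # with_pytorch_error_handling controls whether calls into the extension go through
    # PyTorch's error/warning translation layer: with it enabled, C++ TORCH_WARN warnings
    # reach Python's warnings machinery even when an exception follows; with it disabled,
    # warnings emitted while an exception propagates are lost.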
    def test_warning(self):
        # Note: the module created from this source will include the py::key_error
        # symbol. But because of visibility and the fact that it lives in a
        # different compilation unit than pybind, this trips up ubsan even though
        # it is fine. "ubsan.supp" thus needs to contain "vptr:warn_mod.so".
        source = '''
        // error_type:
        //   0: no error
        //   1: torch::TypeError
        //   2: python_error()
        //   3: py::error_already_set
        at::Tensor foo(at::Tensor x, int error_type) {
            std::ostringstream err_stream;
            err_stream << "Error with " << x.type();

            TORCH_WARN(err_stream.str());
            if(error_type == 1) {
                throw torch::TypeError(err_stream.str().c_str());
            }
            if(error_type == 2) {
                PyObject* obj = PyTuple_New(-1);
                TORCH_CHECK(!obj);
                // Pretend it was caught in a different thread and restored here
                auto e = python_error();
                e.persist();
                e.restore();
                throw e;
            }
            if(error_type == 3) {
                throw py::key_error(err_stream.str());
            }
            return x.cos();
        }
        '''

        # Ensure double type for the hard-coded type name below
        t = torch.rand(2).double()
        cpp_tensor_name = r"CPUDoubleType"

# Without error handling, the warnings cannot be catched
|
2021-08-12 18:39:31 +00:00
|
|
|
warn_mod = torch.utils.cpp_extension.load_inline(name='warn_mod',
|
|
|
|
|
cpp_sources=[source],
|
|
|
|
|
functions=['foo'],
|
|
|
|
|
with_pytorch_error_handling=False)
|
2019-11-07 16:32:51 +00:00
|
|
|
|
|
|
|
|
with warnings.catch_warnings(record=True) as w:
|
|
|
|
|
warn_mod.foo(t, 0)
|
|
|
|
|
self.assertEqual(len(w), 0)
|
|
|
|
|
|
2020-02-18 19:28:35 +00:00
|
|
|
with self.assertRaisesRegex(TypeError, t.type()):
|
2019-11-07 16:32:51 +00:00
|
|
|
warn_mod.foo(t, 1)
|
|
|
|
|
self.assertEqual(len(w), 0)
|
|
|
|
|
|
2021-08-12 18:39:31 +00:00
|
|
|
with self.assertRaisesRegex(SystemError, "bad argument to internal function"):
|
2019-11-07 16:32:51 +00:00
|
|
|
warn_mod.foo(t, 2)
|
|
|
|
|
self.assertEqual(len(w), 0)
|
|
|
|
|
|
|
|
|
|
with self.assertRaisesRegex(KeyError, cpp_tensor_name):
|
|
|
|
|
warn_mod.foo(t, 3)
|
|
|
|
|
self.assertEqual(len(w), 0)
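
        # Rebuild the same source with the (default) PyTorch error handling enabled:
        # the C++ warnings should now be visible to Python, as asserted below.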
        warn_mod = torch.utils.cpp_extension.load_inline(name='warn_mod',
                                                         cpp_sources=[source],
                                                         functions=['foo'],
                                                         with_pytorch_error_handling=True)

        with warnings.catch_warnings(record=True) as w:
            # Warning with no error should be detected
            warn_mod.foo(t, 0)
            self.assertEqual(len(w), 1)

            # Warning with cpp error should also be detected
            with self.assertRaisesRegex(TypeError, t.type()):
                warn_mod.foo(t, 1)
            self.assertEqual(len(w), 2)

            # Warning with python error should also be detected
            with self.assertRaisesRegex(SystemError, "bad argument to internal function"):
                warn_mod.foo(t, 2)
            self.assertEqual(len(w), 3)

            # Warning with pybind error should also be detected
            # Note that there is no type name translation for pybind errors
            with self.assertRaisesRegex(KeyError, cpp_tensor_name):
                warn_mod.foo(t, 3)
            self.assertEqual(len(w), 4)

        # Make sure warnings that are raised as errors are handled properly
        with warnings.catch_warnings(record=True) as w:
            warnings.simplefilter("error")

            # No error from foo, so the warning is raised as an error
            with self.assertRaisesRegex(UserWarning, t.type()):
                warn_mod.foo(t, 0)
            self.assertEqual(len(w), 0)

            # Another error happened, so the warning is ignored
            with self.assertRaisesRegex(TypeError, t.type()):
                warn_mod.foo(t, 1)
            self.assertEqual(len(w), 0)

    def test_autograd_from_cpp(self):
        source = '''
        void run_back(at::Tensor x) {
            x.backward({});
        }

        void run_back_no_gil(at::Tensor x) {
            pybind11::gil_scoped_release no_gil;
            x.backward({});
        }
        '''
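        # run_back_no_gil differs from run_back only by releasing the GIL
        # (pybind11::gil_scoped_release) before calling backward(); invoking the
        # autograd engine from C++ while holding the GIL used to deadlock and now
        # raises the RuntimeError checked below.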

        class MyFn(torch.autograd.Function):
            @staticmethod
            def forward(ctx, x):
                return x.clone()

            @staticmethod
            def backward(ctx, gx):
                return gx

        test_backward_deadlock = torch.utils.cpp_extension.load_inline(name='test_backward_deadlock',
                                                                       cpp_sources=[source],
                                                                       functions=['run_back', 'run_back_no_gil'],)

        # This used to deadlock
        inp = torch.rand(20, requires_grad=True)
        loss = MyFn.apply(inp).sum()
        with self.assertRaisesRegex(RuntimeError, "The autograd engine was called while holding the GIL."):
            test_backward_deadlock.run_back(loss)

        inp = torch.rand(20, requires_grad=True)
        loss = MyFn.apply(inp).sum()
        test_backward_deadlock.run_back_no_gil(loss)

    def test_custom_compound_op_autograd(self):
        # Test that a custom compound op (i.e. a custom op that just calls other aten ops)
        # correctly returns gradients of those other ops

        source = """
        #include <torch/library.h>
        torch::Tensor my_add(torch::Tensor x, torch::Tensor y) {
          return x + y;
        }
        TORCH_LIBRARY(my, m) {
          m.def("add", &my_add);
        }
        """

        torch.utils.cpp_extension.load_inline(
            name="is_python_module",
            cpp_sources=source,
            verbose=True,
            is_python_module=False,
        )

        a = torch.randn(5, 5, requires_grad=True)
        b = torch.randn(5, 5, requires_grad=True)

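        # gradcheck perturbs each input by eps and compares the resulting
        # finite-difference Jacobian against the analytical one from autograd,
        # so it verifies that my.add propagates gradients to both inputs.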
        gradcheck(torch.ops.my.add, [a, b], eps=1e-2)


if __name__ == "__main__":
    common.run_tests()