// pytorch/torch/csrc/jit/node_hashing.cpp

#include <torch/csrc/jit/ir.h>
#include <algorithm>
#include <unordered_map>
#include <ATen/core/functional.h>
#include <ATen/core/interned_strings.h>
#include <c10/util/Exception.h>
#include <torch/csrc/jit/node_hashing.h>
#include <torch/csrc/jit/passes/common_subexpression_elimination.h>
#include <torch/csrc/utils/hash.h>

namespace torch {
namespace jit {

namespace {

bool tensorEqual(const at::Tensor& lhs, const at::Tensor& rhs) {
  return lhs.type() == rhs.type() && lhs.equal(rhs);
}

bool tensorListEqual(
    const std::vector<at::Tensor>& lhs,
    const std::vector<at::Tensor>& rhs) {
  if (lhs.size() != rhs.size())
    return false;
  return std::equal(lhs.begin(), lhs.end(), rhs.begin(), tensorEqual);
}

bool typeListEqual(
    const std::vector<TypePtr>& lhs,
    const std::vector<TypePtr>& rhs) {
  if (lhs.size() != rhs.size())
    return false;
  for (size_t i = 0; i < lhs.size(); ++i) {
    if (*lhs[i] != *rhs[i]) {
      return false;
    }
  }
  return true;
}

// Check whether two nodes have the same attributes in CSE.
// This function may be too conservative for general use.
// Graph (g) and graph-list (gs) attributes are NOT supported: nodes that
// carry them are never considered equal.
bool attributesEqualCSE(const Node* lhs, const Node* rhs) {
  AT_ASSERT(lhs != nullptr);
  AT_ASSERT(rhs != nullptr);

  // One has attributes, the other does not.
  if (lhs->hasAttributes() != rhs->hasAttributes())
    return false;

  // Neither has attributes.
  if (!lhs->hasAttributes() && !rhs->hasAttributes())
    return true;

  auto lnames = lhs->attributeNames();
  auto rnames = rhs->attributeNames();
  std::sort(lnames.begin(), lnames.end());
  std::sort(rnames.begin(), rnames.end());
  if (lnames != rnames)
    return false;

  for (auto name : lnames) {
    if (lhs->kindOf(name) != rhs->kindOf(name))
      return false;

#define COMPARE_ATTRIBUTEVALUE(selector)              \
  case AttributeKind::selector: {                     \
    if (lhs->selector(name) != rhs->selector(name))   \
      return false;                                   \
  } break;

    switch (lhs->kindOf(name)) {
      COMPARE_ATTRIBUTEVALUE(f)
      COMPARE_ATTRIBUTEVALUE(fs)
      COMPARE_ATTRIBUTEVALUE(i)
      COMPARE_ATTRIBUTEVALUE(is)
      COMPARE_ATTRIBUTEVALUE(s)
      COMPARE_ATTRIBUTEVALUE(ss)
      case AttributeKind::t: {
        if (!tensorEqual(lhs->t(name), rhs->t(name)))
          return false;
        break;
      }
      case AttributeKind::ts: {
        if (!tensorListEqual(lhs->ts(name), rhs->ts(name)))
          return false;
        break;
      }
      case AttributeKind::ty:
        if (*lhs->ty(name) != *rhs->ty(name)) {
          return false;
        }
        break;
      case AttributeKind::tys:
        if (!typeListEqual(lhs->tys(name), rhs->tys(name))) {
          return false;
        }
        break;
      case AttributeKind::g:
      case AttributeKind::gs:
        return false;
    }
#undef COMPARE_ATTRIBUTEVALUE
  }
  return true;
}
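
// Illustrative expectations for attributesEqualCSE (a sketch only, not
// exercised by this file; `a`, `b`, `c`, `d`, and `subgraph` are assumed
// to be pre-built Node* / Graph values):
//
//   a->i_(attr::dim, 1);
//   b->i_(attr::dim, 1);
//   attributesEqualCSE(a, b);   // true: same attribute kind, same value
//
//   c->g_(attr::Subgraph, subgraph);
//   d->g_(attr::Subgraph, subgraph);
//   attributesEqualCSE(c, d);   // false: graph attributes are never equal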
} // anonymous namespace

// Hash a node on its kind, the kinds of its output types, the uniques of its
// inputs, and, for prim::Constant nodes, the constant payload itself, so that
// nodes EqualNode considers equal hash to the same value.
size_t HashNode::operator()(const Node* k) const {
  AT_ASSERT(k != nullptr);
  size_t constant_hash = 0;
  if (k->kind() == prim::Constant) {
    TypePtr type = k->output()->type();
    if (type->isSubtypeOf(NumberType::get()) &&
        k->kindOf(attr::value) == AttributeKind::i) {
      constant_hash = std::hash<int64_t>{}(k->i(attr::value));
    } else if (
        type->isSubtypeOf(NumberType::get()) &&
        k->kindOf(attr::value) == AttributeKind::f) {
      constant_hash = std::hash<float>{}(k->f(attr::value));
    } else if (type->isSubtypeOf(BoolType::get())) {
      constant_hash = std::hash<bool>{}(k->i(attr::value));
    }
  }
  return get_hash(
      k->kind(),
      fmap(k->outputs(), [](const Value* v) { return v->type()->kind(); }),
      fmap(k->inputs(), [](const Value* v) { return v->unique(); }),
      constant_hash);
}

bool EqualNode::operator()(const Node* lhs, const Node* rhs) const {
  if (lhs == nullptr && rhs == nullptr)
    return true;
  if (lhs == nullptr || rhs == nullptr)
    return false;

  if (lhs->kind() != rhs->kind())
    return false;

  // Check whether the output types are the same.
  auto lhs_outputs = lhs->outputs();
  auto rhs_outputs = rhs->outputs();
  if (lhs_outputs.size() != rhs_outputs.size())
    return false;

  for (size_t i = 0; i < lhs_outputs.size(); ++i) {
    if (*lhs_outputs[i]->type() != *rhs_outputs[i]->type())
      return false;
    // Capsules wrap opaque custom-class objects, so two capsule-typed outputs
    // can never be proven identical; never treat such nodes as equal.
    if (lhs_outputs[i]->type() == CapsuleType::get())
      return false;
  }

  // Check whether the inputs are the same.
  auto lhs_inputs = lhs->inputs();
  auto rhs_inputs = rhs->inputs();
  if (lhs_inputs.size() != rhs_inputs.size())
    return false;
  if (!std::equal(lhs_inputs.begin(), lhs_inputs.end(), rhs_inputs.begin()))
    return false;

  // Check whether the attributes are the same.
  if (!attributesEqualCSE(lhs, rhs))
    return false;

  return true;
}
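
// Usage sketch (illustrative only, not part of this translation unit): passes
// such as common subexpression elimination typically key a hash set on
// structural node identity with these two functors, along the lines of:
//
//   std::unordered_set<Node*, HashNode, EqualNode> subexprs;
//   auto result = subexprs.insert(node);
//   if (!result.second) {
//     // An equivalent node already exists; reuse it and drop this one.
//     node->replaceAllUsesWith(*result.first);
//   }
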
} // namespace jit
} // namespace torch