#include <ATen/ATen.h>
#include <ATen/core/interned_strings.h>
#include <ATen/core/ivalue.h>
#include <ATen/Parallel.h>
#include <ATen/ThreadLocalDebugInfo.h>

#include "test/cpp/jit/test_base.h"
#include "test/cpp/jit/test_utils.h"

#include <torch/csrc/jit/passes/canonicalize.h>
#include <torch/csrc/jit/type_hashing.h>

#include "torch/csrc/autograd/generated/variable_factories.h"
#include "torch/csrc/autograd/variable.h"
#include "torch/csrc/jit/argument_spec.h"
#include "torch/csrc/jit/attributes.h"
#include "torch/csrc/jit/autodiff.h"
#include "torch/csrc/jit/code_template.h"
#include "torch/csrc/jit/custom_operator.h"
#include "torch/csrc/jit/fuser/interface.h"
#include "torch/csrc/jit/import.h"
#include "torch/csrc/jit/interpreter.h"
#include "torch/csrc/jit/irparser.h"
#include "torch/csrc/jit/pass_manager.h"
#include "torch/csrc/jit/passes/alias_analysis.h"
#include "torch/csrc/jit/passes/bailout_graph.h"
#include "torch/csrc/jit/passes/common_subexpression_elimination.h"
#include "torch/csrc/jit/passes/constant_propagation.h"
#include "torch/csrc/jit/passes/create_autodiff_subgraphs.h"
#include "torch/csrc/jit/passes/dead_code_elimination.h"
#include "torch/csrc/jit/passes/graph_fuser.h"
#include "torch/csrc/jit/passes/guard_elimination.h"
#include "torch/csrc/jit/passes/inline_autodiff_subgraphs.h"
#include "torch/csrc/jit/passes/insert_guards.h"
#include "torch/csrc/jit/passes/liveness.h"
#include "torch/csrc/jit/passes/lower_grad_of.h"
#include "torch/csrc/jit/passes/lower_tuples.h"
#include "torch/csrc/jit/passes/requires_grad_analysis.h"
#include "torch/csrc/jit/passes/shape_analysis.h"
#include "torch/csrc/jit/passes/utils/subgraph_utils.h"
#include "torch/csrc/jit/scope.h"
#include "torch/csrc/jit/symbolic_script.h"
#include "torch/csrc/jit/tracer.h"

#include "torch/csrc/autograd/engine.h"
#include "torch/csrc/autograd/variable.h"

#include <torch/csrc/jit/testing/file_check.h>
#include "torch/csrc/jit/profiling_record.h"
#include "torch/csrc/jit/script/compiler.h"
#include "torch/csrc/jit/script/module.h"
#include "torch/jit.h"
#include <torch/script.h>

#include "onnx/onnx_pb.h"

#include <c10/util/Exception.h>

#include <algorithm>
#include <cstddef>
// <fstream> and <iterator> are needed by testSerializationInterop below
// (std::ifstream, std::istream_iterator)
#include <fstream>
#include <functional>
#include <iostream>
#include <iterator>
#include <memory>
#include <stdexcept>
#include <string>
#include <tuple>
#include <unordered_set>
#include <utility>
#include <vector>

namespace torch {
namespace jit {

inline c10::OperatorOptions aliasAnalysisFromSchema() {
  c10::OperatorOptions result;
  result.setAliasAnalysis(c10::AliasAnalysisKind::FROM_SCHEMA);
  return result;
}

template <typename T>
std::ostream& operator<<(std::ostream& out, const std::vector<T>& list) {
  size_t i = 0;
  out << "{";
  for (auto&& e : list) {
    if (i++ > 0)
      out << ", ";
    out << e;
  }
  out << "}";
  return out;
}

void testInternedStrings() {
  ASSERT_EQ(prim::Param, Symbol::prim("Param"));
  ASSERT_EQ(prim::Return, Symbol::prim("Return"));
  ASSERT_EQ(prim::Return.toUnqualString(), std::string("Return"));
  ASSERT_EQ(prim::Return.toQualString(), std::string("prim::Return"));

  Symbol newsym = Symbol::aten("__NEW_SYMBOL");
  size_t symstart = newsym;
  ASSERT_EQ(newsym.toQualString(), std::string("aten::__NEW_SYMBOL"));
  // TODO: This test is a bit too close to the implementation details.
  ASSERT_EQ(Symbol::aten("What"), symstart + 1);
  ASSERT_EQ(Symbol::aten("What2"), symstart + 2);
  ASSERT_EQ(Symbol::aten("What"), symstart + 1);
  ASSERT_EQ(Symbol::aten("What2"), symstart + 2);
  ASSERT_EQ(Symbol(symstart + 2).toUnqualString(), std::string("What2"));
}

void testFromQualString() {
  ASSERT_EQ(Symbol::fromQualString("prim::Param"), Symbol::prim("Param"));
  ASSERT_EQ(Symbol::fromQualString("aten::mm"), Symbol::aten("mm"));
  ASSERT_EQ(Symbol::fromQualString("onnx::LSTM"), Symbol::onnx("LSTM"));
  ASSERT_EQ(Symbol::fromQualString("attr::value"), Symbol::attr("value"));
  ASSERT_EQ(Symbol::fromQualString("scope::"), Symbol::scope(""));
  ASSERT_EQ(Symbol::fromQualString("::").toUnqualString(), std::string(""));
  ASSERT_EQ(
      Symbol::fromQualString("::").ns().toQualString(),
      std::string("namespaces::"));
  ASSERT_EQ(
      Symbol::fromQualString("new_ns::param").toUnqualString(),
      std::string("param"));
  ASSERT_EQ(
      Symbol::fromQualString("new_ns::param").ns().toUnqualString(),
      std::string("new_ns"));
  ASSERT_EQ(
      Symbol::fromQualString("new_ns::param").ns(),
      Symbol::fromQualString("namespaces::new_ns"));

  auto bad_inputs = {"scope", ":", ""};
  for (auto input : bad_inputs) {
    try {
      Symbol::fromQualString(input);
      ASSERT_TRUE(0);
    } catch (const std::exception& c) {
    }
  }
}

void testTHNNConv() {
  std::vector<int64_t> input_size = {4, 3, 15, 17}; // B x C x H x W
  std::vector<int64_t> kernel_size = {3, 5};
  std::vector<int64_t> stride = {1, 2};
  std::vector<int64_t> padding = {2, 1};
  constexpr int out_channels = 5;

  // make inputs
  at::Tensor input = torch::randn(input_size);
  at::Tensor weight = torch::randn(
      {out_channels, input_size[1], kernel_size[0], kernel_size[1]});
  at::Tensor bias = torch::randn({out_channels});

  // run forward eagerly
  at::Tensor output, finput, fgradinput;
  std::tie(output, finput, fgradinput) = at::thnn_conv2d_forward(
      input, weight, kernel_size, bias, stride, padding);

  // make grad_outputs
  at::Tensor grad_output =
      torch::randn_like(output, at::MemoryFormat::Preserve);
  at::Tensor grad_finput =
      torch::zeros_like(finput, at::MemoryFormat::Preserve);
  at::Tensor grad_fgradinput =
      torch::zeros_like(fgradinput, at::MemoryFormat::Preserve);

  // run backward eagerly
  at::Tensor grad_input, grad_weight, grad_bias;
  std::tie(grad_input, grad_weight, grad_bias) = at::thnn_conv2d_backward(
      grad_output,
      input,
      weight,
      kernel_size,
      stride,
      padding,
      finput,
      fgradinput,
      {true, true, true});

  // make JIT graph
  auto graph = std::make_shared<Graph>();
  auto ksz_val = graph->insertConstant(kernel_size);
  auto kst_val = graph->insertConstant(stride);
  auto pad_val = graph->insertConstant(padding);

  auto inputg = graph->addInput("self");
  auto weightg = graph->addInput("weight");
  auto biasg = graph->addInput("bias");

  Value* conv = graph->insert(
      aten::thnn_conv2d_forward,
      {inputg, weightg, ksz_val, biasg, kst_val, pad_val});
  auto outputs = conv->node()->outputs();
  for (auto output : outputs) {
    graph->registerOutput(output);
  }
  LowerAllTuples(graph);
  graph->lint();

  // differentiate JIT graph
  EliminateDeadCode(graph); // Tracing of some ops depends on the DCE trick
  ConstantPropagation(graph);
  auto grad_spec = differentiate(graph);
  LowerGradOf(*grad_spec.df);

  // prepare JIT inputs / gradients
  tensor_list tensors_in;
  tensors_in.push_back(input);
  tensors_in.push_back(weight);
  tensors_in.push_back(bias);

  tensor_list tensor_grads_in;
  tensor_grads_in.push_back(grad_output);
  tensor_grads_in.push_back(grad_finput);
  tensor_grads_in.push_back(grad_fgradinput);

  // Get outputs from the interpreter
  tensor_list tensors_out, tensor_grads_out;
  std::tie(tensors_out, tensor_grads_out) =
      runGradient(grad_spec, tensors_in, tensor_grads_in);

  // prepare expected structs
  tensor_list expected_tensors_out, expected_tensor_grads_out;
  expected_tensors_out.push_back(output);
  expected_tensors_out.push_back(finput);
  expected_tensors_out.push_back(fgradinput);
  expected_tensor_grads_out.push_back(grad_input);
  expected_tensor_grads_out.push_back(grad_weight);
  expected_tensor_grads_out.push_back(grad_bias);

  // Compare results
  assertAllClose(tensors_out, expected_tensors_out);
  assertAllClose(tensor_grads_out, expected_tensor_grads_out);
}

void testATenNativeBatchNorm() {
  // aten::native_batch_norm(Tensor input, Tensor weight, Tensor bias, Tensor
  // running_mean, Tensor running_var, bool training, float momentum, float eps)
  // -> (Tensor, Tensor, Tensor)
  std::vector<int64_t> input_size = {4, 3, 15, 17}; // B x C x H x W
  bool training = true;
  float momentum = 0.9;
  float eps = 1e-5;

  // make inputs
  at::Tensor input = torch::randn(input_size);
  at::Tensor weight = torch::randn({input_size[1]});
  at::Tensor bias = torch::randn({input_size[1]});
  at::Tensor running_mean = torch::randn({input_size[1]});
  at::Tensor running_var = torch::randn({input_size[1]});

  // running_mean and running_var are changed in-place, so clone and send them
  at::Tensor running_mean_eager = running_mean.clone();
  at::Tensor running_var_eager = running_var.clone();
  at::Tensor running_mean_jit = running_mean.clone();
  at::Tensor running_var_jit = running_var.clone();

  // run forward eagerly
  at::Tensor output, savemean, saveinvstd;
  std::tie(output, savemean, saveinvstd) = at::native_batch_norm(
      input,
      weight,
      bias,
      running_mean_eager,
      running_var_eager,
      training,
      momentum,
      eps);

  // make grad_outputs
  at::Tensor grad_output =
      torch::randn_like(output, at::MemoryFormat::Preserve);
  at::Tensor grad_savemean =
      torch::zeros_like(savemean, at::MemoryFormat::Preserve);
  at::Tensor grad_saveinvstd =
      torch::zeros_like(saveinvstd, at::MemoryFormat::Preserve);

  // run backward eagerly
  at::Tensor grad_input, grad_weight, grad_bias;
  // aten::native_batch_norm_backward(Tensor grad_out, Tensor input, Tensor
  // weight, Tensor running_mean, Tensor running_var, Tensor save_mean, Tensor
  // save_invstd, bool train, float eps, bool[3] output_mask) -> (Tensor,
  // Tensor, Tensor)
  std::tie(grad_input, grad_weight, grad_bias) = at::native_batch_norm_backward(
      grad_output,
      input,
      weight,
      running_mean_eager,
      running_var_eager,
      savemean,
      saveinvstd,
      training,
      eps,
      {true, true, true});

  // make JIT graph
  auto graph = std::make_shared<Graph>();
  auto training_val = graph->insertConstant(IValue(training));
  auto momentum_val = graph->insertConstant(IValue(momentum));
  auto eps_val = graph->insertConstant(IValue(eps));

  auto inputg = graph->addInput("self");
  auto weightg = graph->addInput("weight");
  auto biasg = graph->addInput("bias");
  auto running_meang = graph->addInput("running_mean");
  auto running_varg = graph->addInput("running_var");

  Value* bn = graph->insert(
      aten::native_batch_norm,
      {inputg,
       weightg,
       biasg,
       running_meang,
       running_varg,
       training_val,
       momentum_val,
       eps_val});
  auto outputs = bn->node()->outputs();
  for (auto output : outputs) {
    graph->registerOutput(output);
  }
  LowerAllTuples(graph);
  graph->lint();

  // differentiate JIT graph
  EliminateDeadCode(graph); // Tracing of some ops depends on the DCE trick
  ConstantPropagation(graph);
  auto grad_spec = differentiate(graph);
  LowerGradOf(*grad_spec.df);

  // prepare JIT inputs / gradients
  tensor_list tensors_in;
  tensors_in.push_back(input);
  tensors_in.push_back(weight);
  tensors_in.push_back(bias);
  tensors_in.push_back(running_mean_jit);
  tensors_in.push_back(running_var_jit);

  tensor_list tensor_grads_in;
  tensor_grads_in.push_back(grad_output);
  tensor_grads_in.push_back(grad_savemean);
  tensor_grads_in.push_back(grad_saveinvstd);

  // Get outputs from the interpreter
  tensor_list tensors_out, tensor_grads_out;
  std::tie(tensors_out, tensor_grads_out) =
      runGradient(grad_spec, tensors_in, tensor_grads_in);

  // prepare expected structs
  tensor_list expected_tensors_out, expected_tensor_grads_out;
  expected_tensors_out.push_back(output);
  expected_tensors_out.push_back(savemean);
  expected_tensors_out.push_back(saveinvstd);
  expected_tensors_out.push_back(running_mean_eager);
  expected_tensors_out.push_back(running_var_eager);
  expected_tensor_grads_out.push_back(grad_input);
  expected_tensor_grads_out.push_back(grad_weight);
  expected_tensor_grads_out.push_back(grad_bias);

  tensors_out.push_back(running_mean_jit);
  tensors_out.push_back(running_var_jit);

  // Compare results
  assertAllClose(tensors_out, expected_tensors_out);
  assertAllClose(tensor_grads_out, expected_tensor_grads_out);
}

void testCustomFusion() {
  auto graph_string = R"IR(
    graph(%0 : Float(2, 3, 4),
          %1 : Float(2, 3, 4)):
      %2 : Tensor = aten::mul(%0, %1)
      %3 : Tensor = aten::mul(%2, %0)
      return (%3))IR";
  auto g = std::make_shared<Graph>();
  torch::jit::script::parseIR(graph_string, g.get());

  torch::jit::overrideCanFuseOnCPU(true);
  CustomFuseGraph(
      g,
      [](Node* n) { return n->kind() != prim::Param; },
      Symbol::fromQualString("prim::FusionGroup"));
  torch::jit::overrideCanFuseOnCPU(false);

  const auto& nodes = g->nodes();
  auto fusion_group =
      std::find_if(nodes.begin(), nodes.end(), [](const Node* node) {
        return node->kind() == Symbol::fromQualString("prim::FusionGroup");
      });
  AT_ASSERT(fusion_group != nodes.end());

  auto subgraph = fusion_group->g(attr::Subgraph);
  auto hits = 0;
  // two multiplications
  for (const auto& n : subgraph->nodes()) {
    (void)n;
    hits++;
  }
  AT_ASSERT(hits == 2);
}

void testCustomFusionNestedBlocks() {
  auto graph_string = R"IR(
    graph(%0 : Float(2, 3, 4),
          %1 : Float(2, 3, 4),
          %2 : Float(2, 3, 4)):
      %3 : int = prim::Constant[value=1]()
      %4 : Tensor = prim::If(%2)
        block0():
          %5 : Tensor = aten::mul(%0, %2)
          %6 : Tensor = aten::mul(%5, %1)
          -> (%6)
        block1():
          %7 : Tensor = aten::add(%0, %2, %3)
          %8 : Tensor = aten::add(%7, %1, %3)
          -> (%8)
      %9 : Tensor = aten::add(%4, %2, %3)
      return (%4))IR";
  auto g = std::make_shared<Graph>();
  torch::jit::script::parseIR(graph_string, g.get());

  CustomFuseGraph(
      g,
      [](Node* n) { return n->kind() == aten::mul; },
      Symbol::fromQualString("prim::FusionGroup"));

  // Could be done in more efficient ways, but this is only a test.
  std::function<bool(const Block*, Symbol)> dfs = [&](const Block* b,
                                                      Symbol s) {
    for (auto node : b->nodes()) {
      if (node->kind() == s)
        return true;
      for (auto nested_b : node->blocks())
        if (dfs(nested_b, s))
          return true;
    }
    return false;
  };

  AT_ASSERT(dfs(g->block(), Symbol::fromQualString("prim::FusionGroup")));
}

static const auto cf_examples = R"JIT(
  def if_test(a, b):
    # FIXME: use 0 instead of a.
    # c = 0
    c = a
    if bool(a < b):
      c = b
    else:
      c = a
    return c
  def if_one(a, b):
    c = b
    if bool(a < b):
      c = a
    return c
  def while_test(a, i):
    while bool(i < 3):
      a *= a
      i += 1
    return a
)JIT";

void testControlFlow() {
  auto cu = compile(cf_examples);

  auto run = [&](const std::string& name, std::vector<IValue> stack) {
    auto graph = cu->get_function(name).graph();
    Code code(graph);
    InterpreterState interp(code);
    interp.run(stack);
    return stack;
  };

  auto L = [](int64_t l) {
    return IValue(scalar_to_tensor(at::Scalar(l)));
  };
  auto V = [](IValue t) { return std::move(t).toTensor().item<int64_t>(); };
  auto run_binary = [&](const std::string& name, int64_t a, int64_t b) {
    return V(run(name, {L(a), L(b)})[0]);
  };
  ASSERT_EQ(2, run_binary("if_test", 1, 2));
  ASSERT_EQ(3, run_binary("if_test", 3, 2));
  ASSERT_EQ(2, run_binary("if_one", 2, 3));
  ASSERT_EQ(2, run_binary("if_one", 3, 2));
  ASSERT_EQ(256, run_binary("while_test", 2, 0));
}

void testProto() {
  ::ONNX_NAMESPACE::ModelProto proto;
  proto.set_producer_name("foo");
}

void testEvalModeForLoadedModule() {
  if (isSandcastle())
    return; // The module file to load is not generated in Sandcastle
  std::string module_path = "dropout_model.pt";
  torch::jit::script::Module module = torch::jit::load(module_path);
  AT_ASSERT(module.attr("dropout").toModule().is_training());
  module.eval();
  AT_ASSERT(!module.attr("dropout").toModule().is_training());
  module.train();
  AT_ASSERT(module.attr("dropout").toModule().is_training());
}

void testSerializationInterop() {
  if (isSandcastle()) {
    // The module file to load is not generated in Sandcastle
    return;
  }

  // This should be generated by `test/cpp/jit/tests_setup.py`
  std::ifstream input_stream("ivalue.pt");
  std::vector<char> input;
  input.insert(
      input.begin(),
      std::istream_iterator<char>(input_stream),
      std::istream_iterator<char>());
  IValue ivalue = pickle_load(input);

  auto elements = ivalue.toTuple()->elements();
  auto ones = torch::ones({2, 2});
  ASSERT_TRUE(ones.equal(elements.at(0).toTensor()));

  auto twos = torch::ones({3, 5}) * 2;
  ASSERT_TRUE(twos.equal(elements.at(1).toTensor()));
}

void testTorchSaveError() {
  if (isSandcastle()) {
    // The file to load is not generated in Sandcastle
    return;
  }

  // This should be generated by `test/cpp/jit/tests_setup.py`
  bool passed = true;
  try {
    torch::jit::load("eager_value.pt");
    passed = false;
  } catch (const std::exception& c) {
  }
  // Ensure torch::jit::load did not run
  ASSERT_TRUE(passed);
}

// test a few features that are not directly used in schemas yet
void testSchemaParser() {
  // nested arrays
  auto s = parseSchema("at::what(int[][4] foo) -> ()");
  ASSERT_TRUE(s.arguments().at(0).N() == 4);
  ASSERT_TRUE(IntType::get()->isSubtypeOf(s.arguments()
                                              .at(0)
                                              .type()
                                              ->expect<ListType>()
                                              ->getElementType()
                                              ->expect<ListType>()
                                              ->getElementType()));
  auto s2 = parseSchema("at::what(int[][] foo) -> ()");
  ASSERT_TRUE(IntType::get()->isSubtypeOf(s2.arguments()
                                              .at(0)
                                              .type()
                                              ->expect<ListType>()
                                              ->getElementType()
                                              ->expect<ListType>()
                                              ->getElementType()));

  // named returns
  parseSchema("at::what(Tensor! i_will_be_written_to) -> ()");
  auto s3 =
      parseSchema("at::what() -> (Tensor the_return, Tensor the_return2)");
  ASSERT_TRUE(s3.returns().at(0).name() == "the_return");
  ASSERT_TRUE(s3.returns().at(1).name() == "the_return2");

  // futures
  auto s4 = parseSchema("at::what(Future(int) foo) -> ()");
  ASSERT_TRUE(IntType::get()->isSubtypeOf(
      s4.arguments().at(0).type()->expect<FutureType>()->getElementType()));

  // test tensor with annotated alias sets
  parseSchema("at::what(Tensor(a) foo) -> (Tensor(a))");

  {
    const auto s = parseSchema(
        "at::what(Tensor(b|c)[](a!) list, Tensor(c) element)"
        " -> (Tensor(b|c)[](a!))");

    // The list itself is annotated with `a`
    const auto& aliasInfo = *s.arguments().at(0).alias_info();
    ASSERT_TRUE(
        aliasInfo.beforeSets() ==
        std::unordered_set<Symbol>{Symbol::fromQualString("alias::a")});
    ASSERT_TRUE(aliasInfo.isWrite());

    // Check the contained types
    ASSERT_TRUE(!aliasInfo.containedTypes().empty());
    const auto& containedAliasInfo = aliasInfo.containedTypes()[0];
    const auto expected = std::unordered_set<Symbol>{
        Symbol::fromQualString("alias::b"),
        Symbol::fromQualString("alias::c"),
    };
    ASSERT_TRUE(containedAliasInfo.beforeSets() == expected);
    ASSERT_TRUE(containedAliasInfo.afterSets() == expected);
    ASSERT_FALSE(containedAliasInfo.isWrite());
  }
  {
    const auto s = parseSchema(
        "at::what(Tensor(b -> b|c)[](a!) list, Tensor(c) element)"
        " -> (Tensor(b|c)[](a!))");

    // The list itself is annotated with `a`
    const auto& aliasInfo = *s.arguments().at(0).alias_info();
    ASSERT_EQ(
        aliasInfo.beforeSets(),
        std::unordered_set<Symbol>{Symbol::fromQualString("alias::a")});
    ASSERT_EQ(
        aliasInfo.afterSets(),
        std::unordered_set<Symbol>{Symbol::fromQualString("alias::a")});
    ASSERT_TRUE(aliasInfo.isWrite());
    ASSERT_EQ(aliasInfo.containedTypes().size(), 1);

    // Check the contained types
    ASSERT_TRUE(!aliasInfo.containedTypes().empty());
    const auto& containedAliasInfo = aliasInfo.containedTypes()[0];
    const auto expectedBefore = std::unordered_set<Symbol>{
        Symbol::fromQualString("alias::b"),
    };
    const auto expectedAfter = std::unordered_set<Symbol>{
        Symbol::fromQualString("alias::b"), Symbol::fromQualString("alias::c")};
    ASSERT_TRUE(containedAliasInfo.beforeSets() == expectedBefore);
    ASSERT_TRUE(containedAliasInfo.afterSets() == expectedAfter);
    ASSERT_FALSE(containedAliasInfo.isWrite());
  }
}

void testTopologicalIndex() {
  {
    Graph graph;
    auto node1 = graph.create(prim::AutogradZero);
    auto node2 = graph.create(prim::AutogradZero);
    auto node3 = graph.create(prim::AutogradZero);
    auto node4 = graph.create(prim::AutogradZero);

    graph.appendNode(node4);
    graph.prependNode(node1);
    node2->insertAfter(node1);
    node3->insertBefore(node4);

    // nodes should be in numerical order
    ASSERT_TRUE(node1->isBefore(node2));
    ASSERT_TRUE(node1->isBefore(node3));
    ASSERT_TRUE(node1->isBefore(node4));
    ASSERT_TRUE(node2->isAfter(node1));
    ASSERT_TRUE(node2->isBefore(node3));
    ASSERT_TRUE(node2->isBefore(node4));
    ASSERT_FALSE(node3->isBefore(node1));
    ASSERT_FALSE(node3->isBefore(node2));
    ASSERT_FALSE(node3->isAfter(node4));

    // Build up a block structure
    //  node3
    //   /\        ...
    //  A  B     block1
    //      \      ...
    //      C    block2
    auto block1 = node3->addBlock();
    auto A = graph.create(prim::AutogradZero);
    block1->appendNode(A);
    auto B = graph.create(prim::AutogradZero);
    block1->appendNode(B);
    auto block2 = B->addBlock();
    auto C = graph.create(prim::AutogradZero);
    block2->appendNode(C);

    // Check isAfter on different block levels
    ASSERT_TRUE(node1->isBefore(A));
    ASSERT_TRUE(A->isBefore(B));
    ASSERT_TRUE(A->isBefore(C));

    // make sure things don't blow up on deletions
    node2->destroy();
    auto node2p = graph.create(prim::AutogradZero);
    node2p->insertAfter(node1);
    ASSERT_TRUE(node1->isBefore(node2p));
    ASSERT_TRUE(node2p->isBefore(node3));
  }
  {
    // Induce reindexing to test that path
    Graph graph;
    std::map<size_t, Node*> nodes;

    auto anchor = graph.create(prim::AutogradZero);
    graph.appendNode(anchor);
    // Inserting to the same place a lot will trigger reindexing
    for (auto i = 0; i < 100; ++i) {
      auto n = graph.create(prim::AutogradZero);
      n->insertAfter(anchor);
      nodes[i] = n;
    }

    // Nodes should be in reverse order
    for (auto i = 0; i < 100; ++i) {
      for (auto j = i + 1; j < 100; ++j) {
        ASSERT_TRUE(nodes[i]->isAfter(nodes[j]));
      }
    }
  }
}

at::Tensor invokeTestRecordFunction(at::Tensor& t) {
  RECORD_FUNCTION("test", std::vector<c10::IValue>({t}));

  auto t2 = t.pow(2);
  return t2;
}

static const auto invokeTestRecordFunction_JIT = R"JIT(
  def forward(t):
    t2 = t.pow(2)
    return t2
)JIT";

at::Tensor invokeTestRecordFunctionJIT(at::Tensor& t) {
  RECORD_FUNCTION("test", std::vector<c10::IValue>({t}));

  auto cu = compile(invokeTestRecordFunction_JIT);
  return cu->get_function("forward")({t}).toTensor();
}

using TracedTestInputs =
    std::vector<std::tuple<std::string, std::vector<std::vector<int64_t>>>>;

void checkTracedInputs(const TracedTestInputs& inputs) {
  bool found_test = false;
  bool found_pow = false;
  bool found_mul = false;
  for (const auto& input : inputs) {
    const auto& fn = std::get<0>(input);
    const auto& sizes = std::get<1>(input);
    if (fn == "test") {
      found_test = true;
      TORCH_CHECK(sizes.size() == 1);
      TORCH_CHECK(sizes[0] == std::vector<int64_t>({1, 2, 3}));
    } else if (fn == "test::pow") {
      found_pow = true;
      TORCH_CHECK(sizes.size() == 2);
      TORCH_CHECK(sizes[0] == std::vector<int64_t>({1, 2, 3}));
      TORCH_CHECK(sizes[1].empty());
    } else if (fn.find("::mul") != std::string::npos) {
      found_mul = true;
      TORCH_CHECK(sizes.size() > 1);
      TORCH_CHECK(sizes[0] == std::vector<int64_t>({1, 2, 3}));
    }
  }
  TORCH_CHECK(found_test);
  TORCH_CHECK(found_pow);
  TORCH_CHECK(found_mul);
}

std::string getFullName(const autograd::profiler::RecordFunction* fn_ptr) {
  std::string full_name = "";
  while (fn_ptr != nullptr) {
    if (!full_name.empty()) {
      full_name = std::string(fn_ptr->name().str()) + "::" + full_name;
    } else {
      full_name = fn_ptr->name().str();
    }
    fn_ptr = fn_ptr->parent();
  }
  return full_name;
}

void testRecordFunction() {
  // [(fn, [[sizes], [sizes], ...]), ...]
  TracedTestInputs traced_inputs;
  autograd::profiler::pushCallback(
      [&traced_inputs](const autograd::profiler::RecordFunction& fn) {
        auto inputs = fn.inputs();
        std::vector<std::vector<int64_t>> sizes;
        for (const auto& input : inputs) {
          if (input.isTensor()) {
            sizes.push_back(input.toTensor().sizes().vec());
          } else if (input.isScalar()) {
            sizes.push_back(std::vector<int64_t>());
          }
        }
        traced_inputs.push_back(
            std::make_tuple(std::string(getFullName(&fn)), sizes));
      },
      [](const autograd::profiler::RecordFunction&) {},
      /* needs_inputs */ true);

  auto t = torch::randn({1, 2, 3}, at::kCPU);
  t.set_requires_grad(true);
  auto t2 = invokeTestRecordFunction(t);
  t2.backward(torch::ones_like(t2, at::MemoryFormat::Preserve));
  auto eager_inputs = traced_inputs;
  traced_inputs.clear();

  t = torch::randn({1, 2, 3}, at::kCPU);
  t.set_requires_grad(true);
  t2 = invokeTestRecordFunctionJIT(t);
  t2.backward(torch::ones_like(t2, at::MemoryFormat::Preserve));
  auto jit_inputs = traced_inputs;
  traced_inputs.clear();

  autograd::profiler::popCallback();

  checkTracedInputs(eager_inputs);
  checkTracedInputs(jit_inputs);

  // test sampled callbacks
  int sampled_cb_ctr = 0;
  autograd::profiler::pushCallback(
      [&sampled_cb_ctr](const autograd::profiler::RecordFunction& fn) {
        if (std::string(fn.name().str()) == "test") {
          ++sampled_cb_ctr;
        }
      },
      [](const autograd::profiler::RecordFunction&) {},
      /* needs_inputs */ false,
      /* sampled */ true);

  int non_sampled_cb_ctr = 0;
  autograd::profiler::pushCallback(
      [&non_sampled_cb_ctr](const autograd::profiler::RecordFunction& fn) {
        if (std::string(fn.name().str()) == "test") {
          ++non_sampled_cb_ctr;
        }
      },
      [](const autograd::profiler::RecordFunction&) {},
      /* needs_inputs */ false,
      /* sampled */ false);

  auto run_test_function = []() {
    auto t = torch::randn({1, 2, 3}, at::kCPU);
    for (auto k = 0; k < 1000; k++) {
      invokeTestRecordFunction(t);
    }
  };

  autograd::profiler::setSamplingProbability(0.5);
  run_test_function();

  TORCH_CHECK(non_sampled_cb_ctr == 1000);
  TORCH_CHECK(sampled_cb_ctr > 0 && sampled_cb_ctr < 1000);

  sampled_cb_ctr = 0;
  autograd::profiler::setSamplingProbability(0.0);
  run_test_function();

  TORCH_CHECK(non_sampled_cb_ctr == 2000);
  TORCH_CHECK(sampled_cb_ctr == 0);

  sampled_cb_ctr = 0;
  autograd::profiler::setSamplingProbability(1.0);
  run_test_function();

  TORCH_CHECK(non_sampled_cb_ctr == 3000);
  TORCH_CHECK(sampled_cb_ctr == 1000);

  autograd::profiler::popCallback();
  autograd::profiler::popCallback();
}

class TestThreadLocalDebugInfo : public at::ThreadLocalDebugInfoBase {
 public:
  int getModelId() const {
    return model_id_;
  }

  void setModelId(int model_id) {
    model_id_ = model_id;
  }

  virtual ~TestThreadLocalDebugInfo() {}

 private:
  int model_id_ = 0;
};

void testThreadLocalDebugInfo() {
  auto checkDebugInfo = []() {
    auto debug_info = at::getThreadLocalDebugInfo();
    TORCH_CHECK(debug_info != nullptr);
    auto* test_debug_info =
        dynamic_cast<TestThreadLocalDebugInfo*>(debug_info.get());
    TORCH_CHECK(test_debug_info != nullptr);
    TORCH_CHECK(test_debug_info->getModelId() == 42);
  };

  TORCH_CHECK(at::getThreadLocalDebugInfo() == nullptr);
  auto debug_info = std::make_shared<TestThreadLocalDebugInfo>();
  debug_info->setModelId(42);
  at::setThreadLocalDebugInfo(debug_info);

  checkDebugInfo();

  // check that thread local debug info is propagated through fork calls
  std::atomic<bool> done{false};
  at::launch([checkDebugInfo, &done]() {
    checkDebugInfo();
    done = true;
  });
  while (!done) {
  }
  checkDebugInfo();

  // check that thread local debug info is propagated through backward pass
  autograd::profiler::pushCallback(
      [&checkDebugInfo](const autograd::profiler::RecordFunction& fn) {
        checkDebugInfo();
      },
      [](const autograd::profiler::RecordFunction&) {});
  {
    auto t = torch::randn({1, 2, 3}, at::kCPU);
    t.set_requires_grad(true);
    auto t2 = t.pow(2);
    t2.backward(torch::ones_like(t2, at::MemoryFormat::Preserve));
  }
  autograd::profiler::popCallback();

  checkDebugInfo();
  at::setThreadLocalDebugInfo(nullptr);
  TORCH_CHECK(at::getThreadLocalDebugInfo() == nullptr);
}

void testAutogradProfiler() {
  constexpr int batch_size = 4;
  constexpr int input_size = 256;
  constexpr int seq_len = 32;

  int hidden_size = 2 * input_size;
  auto input = torch::randn({seq_len, batch_size, input_size}, at::kCPU);
  auto hx = torch::randn({batch_size, hidden_size}, at::kCPU);
  auto cx = torch::randn({batch_size, hidden_size}, at::kCPU);
  auto w_ih = t_def(torch::randn({4 * hidden_size, input_size}, at::kCPU));
  auto w_hh = t_def(torch::randn({4 * hidden_size, hidden_size}, at::kCPU));

  std::stringstream ss;
  {
    autograd::profiler::RecordProfile guard(ss);
    for (size_t i = 0; i < 100; ++i) {
      std::tie(hx, cx) = lstm(input[0], hx, cx, w_ih, w_hh);
    }
  }

  std::string result = ss.str();
  size_t count = 0;
  for (size_t pos = 0; (pos = result.find("tanh", pos)) != std::string::npos;
       count++, pos++) {
  }
  TORCH_CHECK(count == 200);
}

void testNoneSchemaMatch() {
  RegisterOperators reg({
      Operator(
          "prim::test_none() -> int?",
          [](const Node* node) -> Operation {
            return [](Stack& stack) {
              push(stack, IValue());
              return 0;
            };
          },
          aliasAnalysisFromSchema()),
      Operator(
          "prim::is_none(int? a) -> bool",
          [](const Node* node) -> Operation {
            return [](Stack& stack) {
              IValue a = pop(stack);
              if (a.isNone()) {
                push(stack, true);
              } else {
                push(stack, false);
              }
              return 0;
            };
          },
          aliasAnalysisFromSchema()),
  });

  // Constant propagation will run test_none and produce a None,
  // testing that its type is set appropriately and schema matching doesn't
  // fail when running is_none

  auto r = std::make_shared<Graph>();
  auto& g = *r;
  auto opt_int = g.insert(Symbol::fromQualString("prim::test_none"), {});
  auto out_bool = g.insert(Symbol::fromQualString("prim::is_none"), {opt_int});
  g.registerOutput(out_bool);
  ConstantPropagation(r);

  auto nodes = r->block()->nodes();
  // checking that constant propagation ran without failure
  AT_ASSERT(std::distance(nodes.begin(), nodes.end()) == 1);
}

void testModuleDefine() {
  script::Module m("m");
  m.register_parameter("foo", torch::ones({}), false);
  m.define(R"(
    def add_it(self, x, b : int = 4):
      return self.foo + x + b
  )");
  auto result = m.run_method("add_it", torch::ones({}));
  AT_ASSERT(result.toTensor().item<float>() == 6);
}

void testModuleConversion() {
  script::Module m("test");
  {
    // test cuda to cpu for params and buffers
    m.register_parameter("foo", torch::ones({}, at::kCUDA), false);
    m.register_buffer("bar", torch::ones({}, at::kCUDA));

    m.to(at::kCUDA);
    m.to(at::kCPU);
    AT_ASSERT(m.attr("foo").toTensor().device().is_cpu());
    AT_ASSERT(m.attr("bar").toTensor().device().is_cpu());
  }
  {
    // test cpu to cuda for params and buffers
    m.register_parameter("foo", torch::ones({}), false);
    m.register_buffer("bar", torch::ones({}));

    m.to(at::kCUDA);
    AT_ASSERT(m.attr("foo").toTensor().device().is_cuda());
    AT_ASSERT(m.attr("bar").toTensor().device().is_cuda());
  }
}

static int testPassValue = 0;
void fakePass(std::shared_ptr<Graph>& g) {
  testPassValue++;
  return;
}

RegisterPass p(fakePass);

void testPassManagement() {
  std::shared_ptr<Graph> graph = std::make_shared<Graph>();
  script::parseIR(
      R"IR(
graph(%a):
  return (%a))IR",
      &*graph);

  std::vector<IValue> stack = {IValue(torch::randn({22}, at::kCPU))};
  auto run = [&](std::shared_ptr<Graph>& graph, std::vector<IValue> stack) {
    GraphExecutor executor(graph);
    executor.run(stack);
    return stack;
  };
  run(graph, stack);
  // we will not run fusion in simple mode
  if (!getExecutorMode()) {
    AT_ASSERT(testPassValue);
  }
}

static void checkShape(
    Node* n,
    std::vector<int64_t> expected,
    bool prev = true) {
  auto profile = (prev) ? n->inputs().at(0)->node() : n;
  auto tp = profile->output()->type();
  auto ptp = tp->expect<TensorType>();
  ASSERT_EQ(ptp->sizes().concrete_sizes().value(), expected);
}

void testInsertAndEliminateRedundantGuards() {
  static const auto basic_example = R"JIT(
  def basic(x, y):
    a = x + y
    b = x * y
    c = x + 1
    d = a - c
    e = b - c
    return d + e
  )JIT";

  auto cu = compile(basic_example);
  auto& fun = cu->get_function("basic");
  auto pr = ProfilingRecord::instrumentGraph(fun.graph());
  auto x = at::randn({2, 3}, at::kCPU);
  auto y = at::randn({2, 3}, at::kCPU);
  auto stack = createStack({x, y});
  // introduce some profiling information
  Code cd(pr->profiled_graph_);
  InterpreterState is{cd};
  is.run(stack);
  auto copy = pr->profiled_graph_->copy();
  InsertGuards(copy);
  auto nodes = copy->block()->nodes();
  auto guard = std::find_if(nodes.begin(), nodes.end(), [](Node* n) {
    return n->kind() == prim::Guard;
  });
  ASSERT_NE(guard, nodes.end());
  ASSERT_EQ(
      guard->input()->type()->expect<TensorType>()->sizes().size(),
      c10::nullopt);
  checkShape(*guard, {2, 3}, false);
  auto is_guard = [](Node* n) { return n->kind() == prim::Guard; };
  int num_guards = std::count_if(nodes.begin(), nodes.end(), is_guard);
  ASSERT_EQ(num_guards, 12);
  // now eliminate as many guards as possible
  // we should be left with two guards on x and y's defs
  EliminateRedundantGuards(copy);
  num_guards = std::count_if(nodes.begin(), nodes.end(), is_guard);
  ASSERT_EQ(num_guards, 2);
}

void testInsertBailOuts() {
  static const auto basic_example = R"JIT(
  def basic_loop(x, y):

      a = x + 1
      b = y + 2
      c = x + y + 3

      for i in range(10):
          a = a + b
          # invariant
          d = b * c
          #
          a = a - d

      e = a + 4
      return e
  )JIT";

  auto cu = compile(basic_example);
  auto& fun = cu->get_function("basic_loop");
  auto pr = ProfilingRecord::instrumentGraph(fun.graph());
  auto x = at::randn({2, 3}, at::kCPU);
  auto y = at::randn({2, 3}, at::kCPU);
  auto stack = createStack({x, y});
  // introduce some profiling information
  Code cd(pr->profiled_graph_);
  InterpreterState is{cd};
  is.run(stack);
  auto copy = pr->profiled_graph_->copy();
  InsertGuards(copy);
  EliminateRedundantGuards(copy);
  auto nodes = copy->block()->nodes();
  auto is_guard = [](Node* n) { return n->kind() == prim::Guard; };
  auto num_guards = std::count_if(nodes.begin(), nodes.end(), is_guard);
  ASSERT_EQ(num_guards, 3);
  InsertBailOuts(copy);
  auto is_bailout = [](Node* n) { return n->kind() == prim::BailOut; };
  auto num_bailouts = std::count_if(nodes.begin(), nodes.end(), is_bailout);
  ASSERT_EQ(num_guards, num_bailouts);
  std::vector<Node*> bailouts(num_bailouts);
  std::copy_if(nodes.begin(), nodes.end(), bailouts.begin(), is_bailout);

  for (auto blo : bailouts) {
    ASSERT_EQ(blo->inputs().at(0)->node()->kind(), prim::BailoutTemplate);
  }
}

void testProfiler() {
  constexpr int batch_size = 4;
  constexpr int input_size = 256;

  int hidden_size = 2 * input_size;

  auto input = at::randn({batch_size, input_size}, at::kCPU);
  auto hx = at::randn({batch_size, hidden_size}, at::kCPU);
  auto cx = at::randn({batch_size, hidden_size}, at::kCPU);
  auto w_ih = t_def(at::randn({4 * hidden_size, input_size}, at::kCPU));
  auto w_hh = t_def(at::randn({4 * hidden_size, hidden_size}, at::kCPU));

  auto g = build_lstm();
  auto stack = createStack({input, hx, cx, w_ih, w_hh});

  auto& opt_graph = *g.get();
  ArgumentSpecCreator arg_spec_creator(opt_graph);
  ArgumentSpec spec =
      arg_spec_creator.create(autograd::GradMode::is_enabled(), stack);
  arg_spec_creator.specializeTypes(opt_graph, spec);
  auto pr = ProfilingRecord::instrumentGraph(g);
  Code cd(pr->profiled_graph_);
  InterpreterState is{cd};
  is.run(stack);

  auto begin = pr->profiled_graph_->block()->nodes().begin();
  auto end = pr->profiled_graph_->block()->nodes().end();
  auto mm =
      std::find_if(begin, end, [](Node* n) { return n->kind() == aten::mm; });
  ASSERT_NE(mm, end);
  std::vector<int64_t> mm_expected{4, 256};
  std::vector<int64_t> eltwise{4, 512};
  checkShape(*mm, mm_expected);
  auto sigmoid_n = std::find_if(
      begin, end, [](Node* n) { return n->kind() == aten::sigmoid; });
  ASSERT_NE(sigmoid_n, end);
  checkShape(*sigmoid_n, eltwise);
  auto tanh_n =
      std::find_if(begin, end, [](Node* n) { return n->kind() == aten::tanh; });
  checkShape(*tanh_n, eltwise);
}

void testCallStack() {
  const auto text = R"(
def ham(x):
    return x/7

def bar(x):
    return x*3

def baz(x):
    return ham(x)*x

def foo(x):
    return bar(x)*baz(x)*11
)";
  auto cu = compile(text);
  const Function& foo = cu->get_function("foo");
  for (Node* n : foo.optimized_graph()->nodes()) {
    if (n->kind() == prim::Constant) {
      if (!n->hasAttribute(attr::value) ||
          n->kindOf(attr::value) != AttributeKind::i) {
        continue;
      }
      int v = n->i(attr::value);
      switch (v) {
        case 3: {
          // Const 3 comes from function 'bar', which gets inlined to 'foo'.
          // The callstack for the corresponding node should contain only the
          // function 'bar'.
          ASSERT_TRUE(n->callstack());
          auto callstack_vector = (*n->callstack())->vec();
          ASSERT_EQ(callstack_vector.size(), 1);
          ASSERT_EQ(callstack_vector[0].first, &cu->get_function("bar"));
          break;
        }
        case 7: {
          // Const 7 comes from function 'ham', which gets inlined to 'baz',
          // which is then inlined to 'foo'. The callstack for the
          // corresponding node should contain these two functions.
          ASSERT_TRUE(n->callstack());
          auto callstack_vector = (*n->callstack())->vec();
          ASSERT_EQ(callstack_vector.size(), 2);
          ASSERT_EQ(callstack_vector[0].first, &cu->get_function("baz"));
          ASSERT_EQ(callstack_vector[1].first, &cu->get_function("ham"));
          break;
        }
        case 11: {
          // Const 11 comes from function 'foo', which is not inlined anywhere
          // and thus it should not have a callstack.
          ASSERT_FALSE(n->callstack());
          break;
        }
      }
    }
  }

  // Check that inlining doesn't corrupt the callstack of the callee's nodes.
  const Function& baz = cu->get_function("baz");
  for (Node* n : baz.optimized_graph()->nodes()) {
    if (n->kind() == prim::Constant) {
      if (!n->hasAttribute(attr::value) ||
          n->kindOf(attr::value) != AttributeKind::i) {
        continue;
      }
      int v = n->i(attr::value);
      ASSERT_TRUE(v == 7);
      // Const 7 comes from function 'ham', which gets inlined to 'baz'. 'baz'
      // was also inlined into 'foo', but when looking at the graph of 'baz' we
      // should only see a callstack of depth 1 (containing only 'ham').
      ASSERT_TRUE(n->callstack());
      auto callstack_vector = (*n->callstack())->vec();
      ASSERT_EQ(callstack_vector.size(), 1);
      ASSERT_EQ(callstack_vector[0].first, &cu->get_function("ham"));
    }
  }
}
void testCallStackCaching() {
  const auto text = R"(

def a(x):
    print("a1")
    print("a2")
    return x

def b(x):
    print("b1")
    print("b2")
    a(x)
    return x

def c(x):
    print("c1")
    print("c2")
    b(x)
    return x
)";
  auto cu = compile(text);
  const Function& fn_c = cu->get_function("c");
  std::unordered_map<std::string, InlinedCallStack*> callstack_objects;
  for (Node* n : fn_c.optimized_graph()->nodes()) {
    if (n->kind() == prim::Constant) {
      if (!n->hasAttribute(attr::value) ||
          n->kindOf(attr::value) != AttributeKind::s) {
        continue;
      }
      std::string v = n->s(attr::value);
      if (n->callstack()) {
        callstack_objects[v] = n->callstack()->get();
      }
    }
  }
  // We expect to see nodes prim::Constant[value="a1"] and
  // prim::Constant[value="a2"] inlined to function 'c'. Their callstacks are
  // the same (a->b->c), so we want to make sure we're not creating different
  // callstack entries for them.
  ASSERT_TRUE(callstack_objects.count("a1") && callstack_objects.count("a2"));
  ASSERT_TRUE(callstack_objects.at("a1") == callstack_objects.at("a2"));
}
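The caching behavior exercised here — two nodes with identical inlined callstacks sharing a single `InlinedCallStack` object — can be sketched as interning in plain Python (a hypothetical analogue, not the JIT's actual data structure):

```python
class CallStackCache:
    """Interns callstack tuples so equal stacks share one object."""

    def __init__(self):
        self._cache = {}

    def intern(self, frames):
        # frames: tuple of caller names, e.g. ("c", "b", "a").
        if frames not in self._cache:
            # The list is a stand-in for an InlinedCallStack node.
            self._cache[frames] = list(frames)
        return self._cache[frames]

cache = CallStackCache()
s1 = cache.intern(("c", "b", "a"))  # callstack of print("a1") after inlining
s2 = cache.intern(("c", "b", "a"))  # callstack of print("a2") after inlining
assert s1 is s2  # one shared object, mirroring the identity check above
```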
void testAutogradSymbols() {
  Symbol sym = Symbol::fromQualString("aten::test_symbol");
  Graph graph;
  auto node = graph.create(sym);
  TORCH_CHECK(canRunWithAutograd(node));

  sym = Symbol::fromQualString("prim::test_symbol");
  node = graph.create(sym);
  TORCH_CHECK(canRunWithAutograd(node));

  sym = Symbol::fromQualString("prim::FusionGroup");
  node = graph.create(sym);
  TORCH_CHECK(!canRunWithAutograd(node));

  sym = Symbol::fromQualString("custom::test_symbol");
  node = graph.create(sym);
  TORCH_CHECK(!canRunWithAutograd(node));
}
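The four checks above classify symbols by their namespace. A plain-Python analogue of that classification (hypothetical: the real `canRunWithAutograd` inspects the node, not just its qualified name):

```python
def can_run_with_autograd(qual_name: str) -> bool:
    # Assumption for this sketch: aten:: and prim:: symbols can run
    # under autograd, with fusion groups as an exception; symbols from
    # unknown namespaces (e.g. custom::) cannot.
    ns, _, name = qual_name.partition("::")
    if name == "FusionGroup":
        return False
    return ns in ("aten", "prim")

assert can_run_with_autograd("aten::test_symbol")
assert can_run_with_autograd("prim::test_symbol")
assert not can_run_with_autograd("prim::FusionGroup")
assert not can_run_with_autograd("custom::test_symbol")
```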
} // namespace jit
} // namespace torch