4 commits

Author SHA1 Message Date
Jiakai Liu
9e0ce72e9e [pytorch] change op dependency output to use double-quoted strings (#32464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32464

Changed to double-quoted strings to make the FB linter happy.

Test Plan: Imported from OSS

Differential Revision: D19507859

Pulled By: ljk53

fbshipit-source-id: fa70535c7fbea73214b3b0efb0532184b5ee6854
2020-01-24 15:27:28 -08:00
Jiakai Liu
fc598f9023 generate op dependency graph as python code
Summary:
Add support for printing op dependencies as Python code so that both the
custom build script and BUCK can import them without a YAML parser.
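
A minimal sketch of the idea, not the actual generator: assuming the
dependency graph is a mapping from op name to the list of ops it depends on
(the real layout of TORCH_DEPS is not shown in this message), it can be
rendered as Python source that callers import directly. The function name
`render_deps_as_python` and the sample ops are hypothetical.

```python
import json


def render_deps_as_python(deps, var_name="TORCH_DEPS"):
    """Render a {op: [dependent ops]} mapping as Python source text.

    Hypothetical helper: the mapping layout is an assumption made for
    illustration, not the format produced by the code analyzer.
    """
    lines = [f"{var_name} = {{"]
    for op in sorted(deps):
        # json.dumps emits double-quoted literals, which are also valid
        # Python string literals for plain ASCII op names.
        lines.append(f"    {json.dumps(op)}: [")
        for dep in sorted(deps[op]):
            lines.append(f"        {json.dumps(dep)},")
        lines.append("    ],")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Toy input, not real analyzer output.
sample = {"quantized::add": ["aten::empty"], "aten::empty": []}
source = render_deps_as_python(sample)

# Loading the generated text needs only the Python interpreter itself.
namespace = {}
exec(source, namespace)
loaded = namespace["TORCH_DEPS"]
```

Because the output is ordinary Python, a consumer needs nothing beyond the
interpreter to load it, which is the point of avoiding the YAML parser.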

Test Plan:
- generate the file:
```
ANALYZE_TORCH=1 FORMAT=py DEPLOY=1 tools/code_analyzer/build.sh -closure=false
```

- load the file in python:
```
python
>>> from tools.code_analyzer.generated.torch import TORCH_DEPS
>>> print(TORCH_DEPS)
```

Differential Revision: D18894639

Pulled By: ljk53

fbshipit-source-id: e304d0525a07a13cf6e8a9317cd22637200d044c
2020-01-02 20:26:28 -08:00
Jiakai Liu
be55874f2c style fixes to code analyzer (#30808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30808

Addressed some comments on #29550 after it landed.

Test Plan:
```
LLVM_DIR=... ANALYZE_TEST=1 CHECK_RESULT=1 tools/code_analyzer/build.sh
LLVM_DIR=... ANALYZE_TORCH=1 tools/code_analyzer/build.sh -closure=false -debug_path=true
```

Differential Revision: D18835100

Pulled By: ljk53

fbshipit-source-id: 991d292ddc0211a88b04d0bdc24719f471c7786e
2019-12-05 11:25:37 -08:00
Jiakai Liu
c0299d2707 add LLVM code analyzer in order to replace static dispatch
Summary:
[Why static dispatch]
Static dispatch was introduced to allow stripping out unused ops at link
time (with “gc-sections” linker flag) for mobile build.

The alternative approaches to do "non-static" dispatch are:
* virtual methods - old ATen dispatcher, which has already been deprecated;
* registry pattern - used by caffe2, c10 and JIT;

However, none of them are “gc-sections” friendly. Global registers are
root symbols - the linker cannot strip out any op if we use the registry
pattern for mobile.

[Why static dispatch isn’t great]
* One more code path to maintain;
* Need to recompile the framework to add new backends/ops;
* Doesn’t support AutoGrad yet, and thus blocks on-device training;

[Static Code Analysis]
This PR introduces an LLVM analysis pass. It takes LLVM bitcode /
assembly as input and generates a dependency graph among aten ops. From a
set of root ops used by a model, we can calculate the transitive closure of
all dependent ops, then ask codegen to register only these ops.
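
As a rough illustration of the closure step (a sketch, not the code used by
the build script), assume the dependency graph is available as a mapping
from op name to the ops it depends on. The function name
`transitive_closure` and the toy graph are hypothetical.

```python
from collections import deque


def transitive_closure(deps, roots):
    """Return every op reachable from `roots`, including the roots."""
    seen = set()
    queue = deque(roots)
    while queue:
        op = queue.popleft()
        if op in seen:
            continue  # already expanded; also guards against cycles
        seen.add(op)
        # Ops missing from the mapping are treated as having no dependencies.
        queue.extend(deps.get(op, ()))
    return seen


# Toy graph for illustration only, not real analyzer output.
toy_deps = {
    "quantized::add": ["aten::empty"],
    "aten::empty": [],
    "aten::mul": ["aten::empty_like"],
    "aten::empty_like": ["aten::empty"],
}

# A model that only uses quantized::add needs two ops registered.
needed = transitive_closure(toy_deps, ["quantized::add"])
```

Ops outside `needed` (here "aten::mul" and "aten::empty_like") are the ones
codegen could skip registering for that model.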

[Approach]
To generate the dependency graph it searches for 3 types of connections in
LLVM bitcode / assembly:
 1) op registration: op name (schema string literal) -> registered function;
 2) regular function call: function -> function;
 3) op invocation: function -> op name (schema string literal)

For 2) it uses an algorithm similar to llvm::LazyCallGraph - it not only
looks into call/invoke instructions but also recursively searches for
function pointers in each instruction's operands.

For 1) and 3) it searches for connections between operator name string
literals / function pointers and c10 op registration/invocation API calls in
the LLVM IR graph via "use" edges (bi-directional):
 1. llvm::Value has a "users()" method to get other llvm::Value nodes that
    use the value;
 2. most types derive from llvm::User, which has an "operands()" method to
    get other llvm::Value nodes being used by the value;
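
The three connection types above can be combined into an op-to-op graph
roughly as follows. This is a Python sketch over plain dictionaries, not the
LLVM pass itself; the function `op_dependencies`, the dictionary layout, and
the shortened function names are assumptions made for illustration.

```python
def op_dependencies(registrations, calls, invocations):
    """Collapse the three connection types into {op: set of invoked ops}.

    registrations: op name  -> set of registered functions   (type 1)
    calls:         function -> set of called functions       (type 2)
    invocations:   function -> set of invoked op names       (type 3)
    """
    result = {}
    for op, entry_points in registrations.items():
        seen = set()
        invoked = set()
        stack = list(entry_points)
        while stack:
            fn = stack.pop()
            if fn in seen:
                continue  # guards against recursive call chains
            seen.add(fn)
            invoked |= invocations.get(fn, set())
            stack.extend(calls.get(fn, ()))
        result[op] = invoked
    return result


# Shortened, hypothetical names loosely based on the example call path.
registrations = {"quantized::add": {"QAdd::operator()"}}
calls = {
    "QAdd::operator()": {"qadd_kernel"},
    "qadd_kernel": {"TensorIterator::build"},
    # The self-edge only demonstrates that cycles are handled.
    "TensorIterator::build": {"at::empty", "TensorIterator::build"},
}
invocations = {"at::empty": {"aten::empty"}}

direct = op_dependencies(registrations, calls, invocations)
```

Each op maps to the ops invoked anywhere in the call graph below its
registered functions; following those edges repeatedly gives the transitive
closure used for selective registration.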

[Limitation]
For now the search doesn't go beyond the function boundary because the
references to op name string literals and c10 op registration/invocation
APIs are almost always in the same function.

The script uses regular expressions to identify c10 API calls:
* op_schema_pattern="^(aten|quantized|profiler|_test)::[^ ]+"
* op_register_pattern="c10::RegisterOperators::(op|checkSchemaAndRegisterOp_)"
* op_invoke_pattern="c10::Dispatcher::findSchema|callOp"

If we create helper functions around the c10 API (e.g. the "callOp" method
defined in aten/native), we can simply add them to the regular expressions
used to identify the c10 API.
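
To see how the three patterns behave, here is a small Python sketch that
applies them to a few strings. The patterns are copied from the list above;
the `classify` helper, the order in which the patterns are tried, the use of
substring search, and the sample strings are assumptions, since the matching
code in the analyzer is not shown here.

```python
import re

# Patterns copied verbatim from the commit message.
OP_SCHEMA = re.compile(r"^(aten|quantized|profiler|_test)::[^ ]+")
OP_REGISTER = re.compile(r"c10::RegisterOperators::(op|checkSchemaAndRegisterOp_)")
OP_INVOKE = re.compile(r"c10::Dispatcher::findSchema|callOp")


def classify(text):
    """Label a string literal or demangled function name (hypothetical helper)."""
    if OP_SCHEMA.match(text):
        return "op_schema"
    # Substring search is an assumption about how the patterns are applied.
    if OP_REGISTER.search(text):
        return "op_register"
    if OP_INVOKE.search(text):
        return "op_invoke"
    return "other"


# Illustrative inputs, not taken from real analyzer logs.
samples = [
    "aten::empty",
    "c10::RegisterOperators::op(std::string const&)",
    "c10::Dispatcher::findSchema(c10::OperatorName const&)",
    "at::native::callOp",
    "at::TensorIterator::build()",
]
labels = [classify(s) for s in samples]
```

Note that the alternation in the invoke pattern is not parenthesized, so it
matches either the full "c10::Dispatcher::findSchema" name or any name that
contains "callOp". Under this reading, supporting another helper amounts to
appending one more alternative.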

[Example]
In the following example, it finds:
 1) the registered function for the "quantized::add" operator;
 2) one possible call path to the at::empty() function;
 3) the called operator name "aten::empty":

- "quantized::add"
- c10::detail::wrap_kernel_functor_unboxed_<at::native::(anonymous namespace)::QAdd<false>, at::Tensor (at::Tensor, at::Tensor, double, long)>::call(c10::OperatorKernel*, at::Tensor, at::Tensor, double, long)
- at::native::(anonymous namespace)::QAdd<false>::operator()(at::Tensor, at::Tensor, double, long)
- void at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::operator()<at::Tensor&, at::Tensor const&, at::Tensor const&>(c10::DeviceType, at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::choose_cpu_impl()
- void at::native::(anonymous namespace)::qadd_kernel<false>(at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool)
- at::TensorIterator::build()
- at::TensorIterator::fast_set_up()
- at::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>)
- "aten::empty"

[How do we know it’s correct?]
* Built a test project that contains the different op registration/invocation
  patterns found in the pytorch codebase, including both codegen and
  non-codegen cases.
* Tried different optimization flags “-O0”, “-O3” - the result seems to
  be stable.
* Filtered by common patterns: “aten::”, “at::”, “at::native”,
  “at::CPUType”, “at::TypeDefault” - manually checked that the relationships
  between function schema strings and corresponding implementations were
  captured.
* It can print instruction level data flow and show warning message if it
  encounters unexpected cases (e.g.: found 0 or multiple op names per
  registration/invocation API call, found 0 registered functions, etc).
* Verified consistent results on different Linux / macOS hosts. It can
  handle different STL library ABIs reliably, including rare corner cases
  for short string literals.

[Known issues]
* Doesn’t handle C code yet;
* Doesn’t handle overload name yet (all variants are collapsed into the
  main op name);

Test Plan:
```
LLVM_DIR=... ANALYZE_TEST=1 CHECK_RESULT=1 scripts/build_code_analyzer.sh
```

Differential Revision: D18428118

Pulled By: ljk53

fbshipit-source-id: d505363fa0cbbcdae87492c1f2c29464f6df2fed
2019-12-04 01:02:33 -08:00