Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68180
Since we've open sourced the tracing-based selective build, we can deprecate the
op-dependency-graph-based selective build and the static analyzer tool that
produces the dependency graph.
ghstack-source-id: 143108377
Test Plan: CIs
Reviewed By: seemethere
Differential Revision: D32358467
fbshipit-source-id: c61523706b85a49361416da2230ec1b035b8b99c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62185
This file can take 5 minutes on its own to compile, and is the single limiting
factor for the compile time of `libtorch_cpu` on a 32-core Threadripper.
Sharding it into 5 files that take around 1 minute each cuts a full minute off
the overall build time.
This also factors out the `.findSchemaOrThrow(...).typed` step so the code can
be shared between `call` and `redispatch`.
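As a rough sketch of the idea (not the actual codegen logic; names are illustrative), round-robin sharding of the generated wrapper functions into N translation units looks like:

```python
# Hypothetical sketch: distribute generated functions across N shard files
# so they can compile in parallel instead of in one huge translation unit.
def shard(functions, num_shards=5):
    shards = [[] for _ in range(num_shards)]
    for i, fn in enumerate(functions):
        shards[i % num_shards].append(fn)  # round-robin assignment
    return shards

fns = [f"op_{i}" for i in range(12)]
shards = shard(fns, 5)
assert [len(s) for s in shards] == [3, 3, 2, 2, 2]
assert shards[0] == ["op_0", "op_5", "op_10"]
```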
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D29962049
Pulled By: albanD
fbshipit-source-id: be5df05fbea09ada0d825855f1618c25a11abbd8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59573
To do mobile selective build, we have several options:
1. static dispatch;
2. dynamic dispatch + static analysis (to create the dependency graph);
3. dynamic dispatch + tracing;
We are developing option 3. For open source, we used to support only
option 1, and currently we support both 1 and 2.
This file is only used for 2. It was introduced when we deprecated
the static dispatch (1). The motivation was to make sure we have a
low-friction selective build workflow for dynamic dispatch (2).
As the name indicates, it is the *default* dependency graph that users
can try if they don't bother to run the static analyzer themselves.
We have a CI to run the full workflow of 2 on every PR, which creates
the dependency graph on-the-fly instead of using the committed file.
Since the workflow that automatically updates the file has been broken
for a while, it has started to confuse other PyTorch developers: people
are already editing it manually, and it may already be broken for some
models.
We reintroduced static dispatch recently, so we have decided to deprecate
this file now and automatically turn on static dispatch if users run
selective build without providing the static analysis graph.
The tracing-based selective build will be the ultimate solution we'd
like to provide for OSS, but it will take some more effort to polish
and release.
Differential Revision: D28941020
Test Plan: Imported from OSS
Reviewed By: dhruvbird
Pulled By: ljk53
fbshipit-source-id: 9977ab8568e2cc1bdcdecd3d22e29547ef63889e
Summary:
This PR greatly simplifies `mypy-strict.ini` by strictly typing everything in `.github` and `tools`, rather than picking and choosing only specific files in those two dirs. It also removes `warn_unused_ignores` from `mypy-strict.ini`, for reasons described in https://github.com/pytorch/pytorch/pull/56402#issuecomment-822743795: basically, that setting makes life more difficult depending on what libraries you have installed locally vs in CI (e.g. `ruamel`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59117
Test Plan:
```
flake8
mypy --config mypy-strict.ini
```
Reviewed By: malfet
Differential Revision: D28765386
Pulled By: samestep
fbshipit-source-id: 3e744e301c7a464f8a2a2428fcdbad534e231f2e
Summary:
I'd like the following pattern (a natural composition of Amp with full fwd+bwd capture) to work:
```python
# Create "static_input" with dummy data, run warmup iterations,
# call optimizer.zero_grad(set_to_none=True), then
g = torch.cuda._Graph()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    optimizer.zero_grad(set_to_none=True)
    g.capture_begin()
    with autocast():
        out = model(static_input)
        loss = loss_fn(out)
    scaler.scale(loss).backward()
    g.capture_end()
torch.cuda.current_stream().wait_stream(s)

# Training loop:
for b in data:
    # optimizer.zero_grad() deliberately omitted; replay()'s baked-in backward will refill statically held .grads
    static_input.copy_(b)
    g.replay()
    scaler.step(optimizer)
    scaler.update()
```
Right now `GradScaler` can't work with this pattern because `update()` creates the scale tensor for the next iteration out of place. This PR changes `update()` to act in place on a long-lived scale tensor that stays static across iterations.
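A toy sketch of why replay-based capture needs the in-place behavior (`FakeTensor` is a made-up stand-in for illustration, not a torch API):

```python
# A toy model of graph capture: "captured" stands in for a CUDA graph that
# holds a reference to the scale tensor's storage at capture time.
class FakeTensor:
    def __init__(self, v):
        self.v = v
    def mul_(self, f):       # in-place: mutates the same storage
        self.v *= f
        return self
    def mul(self, f):        # out-of-place: allocates new storage
        return FakeTensor(self.v * f)

scale = FakeTensor(2.0)
captured = [scale]           # the "graph" keeps a reference to this storage

# Out-of-place update: the graph still sees the stale scale.
new_scale = scale.mul(0.5)
assert captured[0].v == 2.0 and new_scale.v == 1.0

# In-place update on a long-lived tensor: the graph observes it.
scale.mul_(0.5)
assert captured[0].v == 1.0
```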
I'm not sure how this change affects XLA (see https://github.com/pytorch/pytorch/pull/48570), so we shouldn't merge without approval from ailzhang yaochengji.
Tagged bc-breaking because it's a change to the amp update utility function in native_functions.yaml. The function was never meant to be user-facing though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55562
Reviewed By: zou3519
Differential Revision: D28046159
Pulled By: ngimel
fbshipit-source-id: 02018c221609974546c562f691e20ab6ac611910
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50611
Removed the unused old-style code to prevent it from being used.
Added all autograd/gen_pyi sources to mypy-strict.ini config.
Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25929730
Pulled By: ljk53
fbshipit-source-id: 1fc94436fd4a6b9b368ee0736e99bfb3c01d38ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49220
Since all ops are c10-full, we can remove .impl_UNBOXED now.
This also removes the ability of KernelFunction or CppFunction to store unboxedOnly kernels.
ghstack-source-id: 119450489
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25490225
fbshipit-source-id: 32de9d591e6a842fe18abc82541580647e9cfdad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48308
The original regex that I added didn't correctly match namespaces that started with an underscore (e.g. `_test`), which caused a master-only test to fail.
The only change from the previous commit is that I updated the regex like so:
before: `^.*TORCH_LIBRARY_IMPL_init_([^_]+)_([^_]+)_[0-9]+(\(.*)?$`
after: `^.*TORCH_LIBRARY_IMPL_init_([_]*[^_]+)_([^_]+)_[0-9]+(\(.*)?$`
I added in a `[_]*` to the beginning of the namespace capture. I did the same for the `_FRAGMENT` regex.
Verified that running `ANALYZE_TEST=1 tools/code_analyzer/build.sh` (as the master-only test does) produces no diff in the output.
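A quick check of the two patterns with Python's `re`, on a hypothetical mangled symbol name:

```python
import re

before = re.compile(r'^.*TORCH_LIBRARY_IMPL_init_([^_]+)_([^_]+)_[0-9]+(\(.*)?$')
after = re.compile(r'^.*TORCH_LIBRARY_IMPL_init_([_]*[^_]+)_([^_]+)_[0-9]+(\(.*)?$')

# A namespace starting with an underscore, e.g. `_test`:
sym = "TORCH_LIBRARY_IMPL_init__test_CPU_0"
assert before.match(sym) is None           # the old pattern misses it
m = after.match(sym)
assert m.group(1) == "_test"               # new pattern captures the namespace
assert m.group(2) == "CPU"

# Plain namespaces still match the same way:
m = after.match("TORCH_LIBRARY_IMPL_init_aten_CPU_2")
assert (m.group(1), m.group(2)) == ("aten", "CPU")
```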
Fixing regex pattern to allow for underscores at the beginning of the
namespace
This reverts commit 3c936ecd3c.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25123295
Pulled By: bdhirsh
fbshipit-source-id: 54bd1e3f0c8e28145e736142ad62a18806bb9672
Summary:
`__ROOT__` ops are only used in full-jit. To keep the build size compact, disable them in inference. Since FL is still on full-jit, keep them for training only.
It saves ~17 KB for fbios.
TODO: when FL is migrated to lite_trainer, remove `__ROOT__` to save size in training too.
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision: D24686838
fbshipit-source-id: 15214cebb9d8defa3fdac3aa0d73884b352aa753
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46057
The code analyzer (which uses LLVM and runs on the OSS PyTorch git repo) already produces a YAML file that contains base operator names and the operators they depend on. Currently, this operator dependency graph is converted into a Python dictionary so it can be imported and used in BUCK. It is also mostly fed into other executables by serializing it to JSON, which the consumers piece back together by concatenating arguments. This seems unnecessary. Instead, this diff retains the original YAML file and makes all consumers consume that same YAML file.
ghstack-source-id: 114641582
Test Plan: Build Lite Predictor + sandcastle.
Reviewed By: iseeyuan
Differential Revision: D24186303
fbshipit-source-id: eecf41bf673d90b960c3efe7a1271249f0a4867f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45722
This diff does a bunch of things:
1. Introduces some abstractions as detailed in https://fb.quip.com/2oEzAR5MKqbD to help with selective build related codegen in multiple files.
2. Adds helper methods to combine operators, debug info, operator lists, etc...
3. Currently, the selective build machinery queries `op_registration_whitelist` directly at various places in the code. `op_registration_whitelist` is a list of allowed operator names (without overload names). We want to move to a world where overload names are also included, so that we can be more selective about which operators we keep. To that end, it makes sense to hide the checking logic behind a separate abstraction and have the build use that abstraction, instead of putting all of this selective-build-specific logic in the code generator itself. This change attempts to do just that.
4. Updates generate_code, unboxing-wrapper codegen, and autograd codegen to accept the operator selector paradigm as opposed to a selected operator list.
5. Update `tools/code_analyzer/gen_op_registration_allowlist.py` to expose providing an actual structured operator dependency graph in addition to a serialized string.
There are a bunch of structural changes as well:
1. `root_op_list.yaml` and `combined_op_list.yaml` are now actual YAML files (not a space separated list of operator names)
2. `generate_code.py` accepts only paths to operator list YAML files (both old style as well as new style) and not list of operator names on the command line as arguments
3. `gen.py` optionally also accepts a custom build related operators YAML path (this file has information about which operators to register in the generated library).
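A minimal Python sketch of the operator-selector abstraction described in (3); the names (`SelectiveBuilder`, `is_operator_selected`) are illustrative, not the actual tools/codegen API:

```python
# Illustrative operator selector: hides the allowlist-checking logic so
# callers no longer consult op_registration_whitelist directly.
class SelectiveBuilder:
    def __init__(self, selected_ops, include_all_operators=False):
        self.selected_ops = set(selected_ops)
        self.include_all_operators = include_all_operators

    def is_operator_selected(self, name):
        if self.include_all_operators:
            return True
        if name in self.selected_ops:          # new style: exact overload name
            return True
        base = name.split(".", 1)[0]           # "aten::add.Tensor" -> "aten::add"
        return base in self.selected_ops       # old style: base name only

sel = SelectiveBuilder(["aten::add", "aten::mul.Tensor"])
assert sel.is_operator_selected("aten::add.Tensor")    # base name allowlisted
assert sel.is_operator_selected("aten::mul.Tensor")    # exact overload allowlisted
assert not sel.is_operator_selected("aten::mul.Scalar")
```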
ghstack-source-id: 114578753
(Note: this ignores all push blocking failures!)
Test Plan:
`buck test caffe2/test:selective_build`
Generated YAML files after the change:
{P143981979}
{P143982025}
{P143982056}
Ensure that the generated files are same before and after the change:
```
[dhruvbird@devvm2490 /tmp/TypeDefault.cpp] find -name "*.cpp" | xargs md5sum
d72c3d125baa7b77e4c5581bbc7110d2 ./after_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f ./after_change/lite_predictor_lib_aten/TypeDefault.cpp
d72c3d125baa7b77e4c5581bbc7110d2 ./before_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f ./before_change/lite_predictor_lib_aten/TypeDefault.cpp
```
The `VariableType_N.cpp` files are generated the same both before and after the change:
```
[dhruvbird@devvm2490 /tmp/VariableType] find -name "*.cpp" | xargs -n 1 md5sum | sort
3be89f63fd098291f01935077a60b677 ./after/VariableType_2.cpp
3be89f63fd098291f01935077a60b677 ./before/VariableType_2.cpp
40a3e59d64e9dbe86024cf314f127fd6 ./after/VariableType_4.cpp
40a3e59d64e9dbe86024cf314f127fd6 ./before/VariableType_4.cpp
a4911699ceda3c3a430f08c64e8243fd ./after/VariableType_1.cpp
a4911699ceda3c3a430f08c64e8243fd ./before/VariableType_1.cpp
ca9aa611fcb2a573a8cba4e269468c99 ./after/VariableType_0.cpp
ca9aa611fcb2a573a8cba4e269468c99 ./before/VariableType_0.cpp
e18f639ed23d802dc4a31cdba40df570 ./after/VariableType_3.cpp
e18f639ed23d802dc4a31cdba40df570 ./before/VariableType_3.cpp
```
Reviewed By: ljk53
Differential Revision: D23837010
fbshipit-source-id: ad06b1756af5be25baa39fd801dfdf09bc565442
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44148
Automatically remove the build_code_analyzer folder each time build.sh is run
ghstack-source-id: 111458413
Test Plan:
Run build.sh with different options and compare the outputs (should be different).
Ex:
`ANALYZE_TORCH=1 DEPLOY=1 BASE_OPS_FILE=/path/to/baseops MOBILE_BUILD_FLAGS='-DBUILD_MOBILE_AUTOGRAD=OFF' tools/code_analyzer/build.sh `
should produce a shorter file than
`ANALYZE_TORCH=1 DEPLOY=1 BASE_OPS_FILE=/path/to/baseops MOBILE_BUILD_FLAGS='-DBUILD_MOBILE_AUTOGRAD=ON' tools/code_analyzer/build.sh`
Reviewed By: iseeyuan
Differential Revision: D23503886
fbshipit-source-id: 9b95d4365540da0bd2d27760e1315caed5f44eec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564
Static dispatch was originally introduced for mobile selective build.
Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23324452
Pulled By: ljk53
fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43570
Add the default op dependency graph to the source tree and use it if the user
runs a custom build in dynamic dispatch mode without providing the graph.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23326988
Pulled By: ljk53
fbshipit-source-id: 5fefe90ca08bb0ca20284e87b70fe1dba8c66084
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43155
Update the code_analyzer build.sh script to be able to take additional build flags in the mobile build/analysis
Test Plan:
Checkout associated PR or copy contents of build.sh into PyTorch repo (must be run from root of PyTorch repo)
To run with inclusion of autograd dependencies (note BUILD_MOBILE_AUTOGRAD is still an experimental build flag): `ANALYZE_TORCH=1 DEPLOY=1 BASE_OPS_FILE=/path/to/baseopsfile MOBILE_BUILD_FLAGS='-DBUILD_MOBILE_AUTOGRAD=ON' tools/code_analyzer/build.sh`
Reviewed By: ljk53
Differential Revision: D23065754
fbshipit-source-id: d83a7ad62ad366a84725430ed020adf4d56687bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39401
This uses the technique proposed by smessmer in D16451848 to selectively
register operators without codegen. See the Note inside for more
details.
This PR has feature parity with the old selective build apparatus:
it can whitelist schema def()s and impl()s, including on a per-dispatch-key
basis. It expands dispatch key whitelisting to manually written
registrations, which previously were not whitelisted at all. (This
means we may now be dropping dispatch keys where we weren't previously!)
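A Python toy model of the idea; in the real C++ implementation the allowlist check happens at compile time (constexpr string matching), so unselected registrations compile away. The names here are made up for illustration:

```python
# Codegen-free selective registration, sketched in Python: the registration
# call itself checks an allowlist and becomes a no-op for unselected ops.
ALLOWLIST = {"aten::add", "aten::mul"}
REGISTRY = {}

def register(schema, kernel):
    name = schema.split("(", 1)[0]    # "aten::add(Tensor ...) -> ..." -> "aten::add"
    if name not in ALLOWLIST:         # in C++ this check is done at compile time,
        return                        # so unselected kernels are stripped entirely
    REGISTRY[name] = kernel

register("aten::add(Tensor self, Tensor other) -> Tensor", lambda a, b: a + b)
register("aten::sub(Tensor self, Tensor other) -> Tensor", lambda a, b: a - b)
assert set(REGISTRY) == {"aten::add"}   # aten::sub was dropped
assert REGISTRY["aten::add"](2, 3) == 5
```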
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D21905593
Pulled By: ezyang
fbshipit-source-id: d4870f800c66be5ce57ec173c9b6e14a52c4a48b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42135
Tested the code analyzer with LLVM 9 & 10 and fixed a couple of issues:
- Renamed the local demangle(), which has been available as a public API since LLVM 9;
- Fixed falsely associated op registrations caused by the `phi` instruction;
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D22795508
Pulled By: ljk53
fbshipit-source-id: 2d47af088acd3312a7ea5fd9361cdccd48940fe6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40276
- add a couple of new namespaces;
- handle the case where both the contextual namespace and the operator namespace
are set (BackendSelectRegister.cpp and #39401);
- improve error messages;
Test Plan: Imported from OSS
Differential Revision: D22135686
Pulled By: ljk53
fbshipit-source-id: 14d359c93573349b8fe1e05d7e44d875295a5f6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37797
This is slow (see comment in code).
Not fixing this yet, but at least adding a warning so people are aware and don't add new call sites.
ghstack-source-id: 103887226
Test Plan: waitforsandcastle
Differential Revision: D21390364
fbshipit-source-id: 7bff1c3b9756a16c9d9110f209c23bf557266dda
Summary:
- Add debug mode to include debug information.
- Move codegen comment to FB shell script (as it's only checked-in FB repo).
- Analyze lite-predictor instead of full-JIT, as the full-JIT BUCK target contains variable kernels and thus pulls in a lot more dependencies.
- Use pre-opt bitcode instead of pre-codegen bitcode: there is one special `callOp()` case in RNN.cpp where optimized bitcode has the opname string and API body inlined together: https://fburl.com/diffusion/8rz6u4rg; pre-optimization bitcode should give more stable results.
Test Plan: - Tested the bash script with stacked diff.
Reviewed By: iseeyuan
Differential Revision: D21298837
fbshipit-source-id: be33e2db5d8cb0f804460c503e52beb0dcb4857f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37404
Many aten operators are really util functions, e.g.
aten::is_nonzero, aten::is_floating_point, etc. These ops can be called
via overloaded C++ operators, so seemingly trivial and innocent code changes
can affect how these ops are used by other ops (and thus change the output
of the static analyzer).
Most of these util ops are rather small in terms of build size cost, so
for the purpose of optimizing binary size with custom build, whether we
include these ops or not does not make a significant difference. In fact,
for non-trivial models a set of these ops is almost always used.
This PR introduces an (optional) '__BASE__' ops section in the dependency graph.
We can maintain the list of frequently used small util ops for the internal
BUCK build. This way, the output dependency graph will only contain meaningful
edges with significant binary size impact, and it will be more stable against
trivial code changes (the graph is checked into the FB codebase).
Having a stable and sparse deps graph by factoring out frequently used base ops
is also a nice property that allows us to explore alternative custom build
solutions in case we find the static code analyzer hard to maintain.
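An illustrative sketch of how a consumer might combine the '__BASE__' section with the per-op edges; op names and graph shape here are made up:

```python
# A toy dependency graph with the optional '__BASE__' ops section:
# base ops are always registered, and the per-op edges are traversed
# transitively from the model's root ops.
dep_graph = {
    "__BASE__": ["aten::is_nonzero", "aten::is_floating_point"],
    "aten::conv2d": ["aten::empty", "aten::mm"],
    "aten::mm": [],
    "aten::empty": [],
}

def ops_to_register(graph, root_ops):
    result = set(graph.get("__BASE__", []))   # base ops always included
    stack = list(root_ops)
    while stack:
        op = stack.pop()
        if op in result:
            continue
        result.add(op)
        stack.extend(graph.get(op, []))       # follow dependency edges
    return result

assert ops_to_register(dep_graph, ["aten::conv2d"]) == {
    "aten::is_nonzero", "aten::is_floating_point",
    "aten::conv2d", "aten::empty", "aten::mm",
}
```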
Test Plan: Imported from OSS
Differential Revision: D21280835
Pulled By: ljk53
fbshipit-source-id: c4d0d1f07ca868c60f23118d877fc1eeead4c875
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37393
Simplify the code analyzer by removing some unused flags and moving the
different format printer logic to python script. It's easier to add other
post processing logic to adapt to different BUCK build configs.
Test Plan: Imported from OSS
Differential Revision: D21280836
Pulled By: ljk53
fbshipit-source-id: 0d66d5891d850f012c4ab4f39eabbd9aecc1caa9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742
Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before. Instead of
```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```
you write
```
TORCH_LIBRARY(Namespace, m) {
  m.class_<Class>("Class")
      .def("foo", foo);
}
```
All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.
The custom class API previously lived in the torch/ folder and in the torch
namespace, so for consistency, the new TORCH_LIBRARY also got
moved to torch/library.h. The definition of Library::class_ is at the
bottom of that header because I need all of the class_ constructors
available, but there is a circular dependency between the two headers.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D21089648
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36258
Previously we had a && chaining style API. There are some downsides to
this API:
- It's easy to forget the 'static' qualifier in front, leading to
subtle ODR bugs.
- It is not compatible with torchbind class_ definitions, as these
need multiple levels of chaining. So in practice people end
up having to define multiple static initializers, one per class.
- It's not like pybind11.
- There's no way to conveniently get the file and line number of
the registration, as there is no macro point in the API.
- The old API doesn't really encourage people to put all of their
definitions for a library in one place, and to give a custom
namespace for it. Similarly, the old API wasn't very DRY, because
you had to keep repeating the namespace/dispatch key you
were writing implementations for.
The new API is modeled exactly off of the PYBIND11_MODULE macro:
you write:
```
TORCH_LIBRARY(aten, m) {
  m.def("aten::add(Tensor self, Tensor other) -> Tensor");
  ...
}
```
in a non-chaining fashion, and under the hood the macro expands to
define a function, and define a static initializer that allocates
c10::Library (previously called c10::Module, but we renamed it
to avoid confusion with the existing NN module concept), passes
it to your function, and then retains it for the rest of the lifetime
of the program. Specification of the namespace is mandatory,
and in a later commit I plan to make it a hard error to TORCH_LIBRARY
the same library name twice.
If you are specifying an implementation for an existing operator
(e.g., you're the XLA backend, or even if you're just putting
registrations for implementations at the implementation site),
you should use TORCH_LIBRARY_IMPL, which instead takes a backend
argument (instead of namespace) and can be used to specify an
implementation for a backend. Unlike TORCH_LIBRARY, you can do
as many of these as you want for a backend.
This needs updates to the mobile code analyzer.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929257
Pulled By: ezyang
fbshipit-source-id: ba04d78492e8c93ae7190165fb936f6872896ada
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36607
PR #36258 and subsequent PRs in the stack switch c10 registrations to
the new pybind11 style registration API. One notable difference from the old
c10 registration API is that the operator's namespace is no longer part of
the op schema string, e.g. "aten::" will be factored out from "aten::conv",
"aten::empty", etc. The namespace string is declared at the
beginning of registrations with the TORCH_LIBRARY / TORCH_LIBRARY_IMPL
macros.
A rather simple fix is to extract the namespace string from the name of the
enclosing function of the registrations, as the TORCH_LIBRARY macro will
always create an init function (per namespace) by appending the namespace
string to a common prefix.
Another side effect of the API change is that it adds some debug string
constants to the registration API, and because the namespace part is
factored out of the op name, there is no longer an effective way to
differentiate between real op names and debug strings. A simple
workaround is to only keep the first string constant encountered
while BFSing the LLVM IR - the real op name is directly passed into the
registration call while the debug string is indirectly passed via
CppFunction.
These new assumptions might be broken by future changes, but they are simple
to implement and unblock the API work.
Test Plan: Imported from OSS
Differential Revision: D21026008
Pulled By: ljk53
fbshipit-source-id: c8c171d23aaba6d6b7985d342e8797525126a713
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35941
The key step of mobile custom build is to find the ops used by a specific
model, with which it can produce a tailored build of optimal size.
However, ops can not only be called from a TorchScript model but can also
be called from C++ code directly, e.g. via torch::jit:: APIs. With
static dispatch, ops called this way are statically linked into the client
code. With dynamic dispatch, we need to obtain & keep these ops explicitly.
This PR improves the static code analyzer to dump ops that are called from
visible C++ symbols matching a specific regex. This provides a mechanism
to solve the custom build problem with dynamic dispatch.
It starts by dumping ops that are callable from functions in the torch::jit
namespace and including them in the custom build with dynamic dispatch. We can
extend it to analyze custom code, to refine the set of JIT APIs that
are relevant, etc. This is just a preliminary version; we need to
improve its usability for more general purposes.
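Conceptually, the analysis is a reachability computation over the call graph, seeded by symbols matching the regex. A toy Python version (graph, symbols, and op names are all made up):

```python
import re

# Toy call graph: symbol -> callees. Ops are the "aten::" nodes.
call_graph = {
    "torch::jit::load": ["caffe2::deserialize", "aten::empty"],
    "caffe2::deserialize": ["aten::to"],
    "my_app::main": ["torch::jit::load"],
    "unrelated::fn": ["aten::conv2d"],
}

def reachable_ops(graph, root_pattern):
    roots = [s for s in graph if re.match(root_pattern, s)]
    seen, stack, ops = set(), list(roots), set()
    while stack:
        sym = stack.pop()
        if sym in seen:
            continue
        seen.add(sym)
        for callee in graph.get(sym, []):
            if callee.startswith("aten::"):   # record reachable ops
                ops.add(callee)
            stack.append(callee)              # keep walking the graph
    return ops

# Only ops reachable from torch::jit:: symbols are dumped:
assert reachable_ops(call_graph, r"^torch::jit::") == {"aten::empty", "aten::to"}
```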
Test Plan: Imported from OSS
Differential Revision: D20835166
Pulled By: ljk53
fbshipit-source-id: a87cfb22b34f89545edd0674a5dfca6b7cff2b0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36223
Previously #35714
There are a lot of unboxed-only defs. We're committed to removing
them by the end of the half, but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove. This adds a new overload, impl_UNBOXED, that passes
the function pointer straight to CppFunction::makeUnboxedOnly.
I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).
To get some coverage of _UNBOXED API for code analysis, I switched
one of our unboxed tests to be an impl rather than a def. This
shouldn't materially affect coverage.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929259
Pulled By: ezyang
fbshipit-source-id: 72d2061b6c8a6afbcd392b47f53ade18de2f9184