pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Kimish Patel	e0fc473e47	[Pytorch, Mobile] Serialize inlined callstack pointer with debug handle. (#55062 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55062 This diff introduces the following changes: 1. InlinedCallStack pickler/serializer is introduced. It is serialized as a tuple of {module_instance_info, source range tag, callee:InlinedCallStack}. Module instance info is serialized as a tuple of {class_type_name, instance_name}. Note that the callee of the serialized inlined callstack points to the tuple of the already serialized callstack. This means the first callstack ptr to serialize will serialize the entire path of the tree, where some callee nodes might be shared with callstack pointers that will be serialized subsequently. Pickler supports memoization of pickled objects, where if a tuple has been serialized then the object id is obtained instead of serializing the object again. Thus we still serialize the tree and not every path from the root separately. Furthermore, InlinedCallStackSerializer also uses a cache to look up the pointer and return the serialized IValue. Furthermore, note that we must also serialize the source range of InlinedCallStack. In order to do this, the serializer requires a map of source-range-tags-to-source-range. This was done in the previous diff, where as part of source range serialization we also generate unique tags. These are the tags that are serialized in InlinedCallStack. Thus during deserialization we would have to deserialize source ranges before deserializing InlinedCallStacks. 2. Furthermore, each serialized InlinedCallStack is serialized with a unique debug_handle and source range tag. BackendDebugHandleManager manages generation of unique debug handles and saves the map of debug-handles-to-{source_range_tag, inlined-callstack-ptr}. This map is then serialized as callstack_debug_map.pkl. Note that the inlined callstack is not sufficient to get all the source information since it contains source information only about the nodes which are inlined. The top-of-the-stack (or bottom) node, which is the actual op node, is not part of the inlined callstack pointer and thus the source range of this node is serialized separately using source_range_tag. This is similar to how JIT creates the callstack in torch/csrc/jit/runtime/interpreter.cpp. Unique debug handles facilitate exception throwing or profiling using just the debug handle without any further qualifications, such as which function or module the inlined-callstack belongs to. Furthermore, this diff refactors the old mobile code for tracking module hierarchy information per op. Mainly now bytecode serialization will serialize debug handles corresponding to ops/nodes in graph and have callstack_debug_map.pkl help generate: 1. Entire callstack and 2. Module hierarchy information. Test Plan: python test/mobile/test_lite_script_module.py TestLiteScriptModule ./build/bin/test_jit --gtest_filter=*ModuleInfo Imported from OSS Reviewed By: raziel Differential Revision: D27468709 fbshipit-source-id: 53e2413e7703ead01c77718b7c333c7c6ff50a23	2021-05-04 09:21:12 -07:00
Mikhail Zolotukhin	6b2cb939c5	[TensorExpr] Add methods for inspecting generated code in `TensorExprKernel`. (#57074 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57074 The new methods allow peeking into bufferArgs, which describe the parameters that codegen expects. This description includes info on whether a given parameter is a scalar var or a buffer, and in case it's a buffer, allows getting the corresponding `Buf*` pointer from which we could get the expected sizes. Differential Revision: D28048289 Test Plan: Imported from OSS Reviewed By: bertmaher Pulled By: ZolotukhinM fbshipit-source-id: 3867e862a0ec3593906820826c2344bd8a8f5c0a	2021-05-03 20:02:28 -07:00
Chen Lai	ac71432c54	[PyTorch][Edge] Add api to get bytecode version from runtime (#56948 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56948 Add api to get runtime bytecode version ## Test Both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py` pass ghstack-source-id: 127939889 Test Plan: Both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py` pass Reviewed By: raziel, iseeyuan Differential Revision: D27987811 fbshipit-source-id: 35ed9bd626aecffc226f6dacfa046e6cdabfed51	2021-05-03 11:26:38 -07:00
Ailing Zhang	0ecdbfebff	s/InplaceOrView/ADInplaceOrView/g (#57372 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57372 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57324 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D28121821 Pulled By: ailzhang fbshipit-source-id: f568dd2505f6279da9ffb93ce1d22e0f98c606bb	2021-05-01 22:56:18 -07:00
Mike Ruberry	05b255c543	Revert D27487549: [TensorExpr] Add `CodeGen::call_raw` method. Test Plan: revert-hammer Differential Revision: D27487549 (`c9ab384af7`) Original commit changeset: d8f3d92262cd fbshipit-source-id: ea8e71dbe2d632bc0fb557362c8bd899eb6aa83a	2021-05-01 19:48:07 -07:00
Hui Guo	afe6b4c8ee	[NNC] Add logical Operators '&&' and '\|\|' (#56947 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56947 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D28007342 Pulled By: huiguoo fbshipit-source-id: a2ad8d2e99d7c8d8c8bdcd8f65fa3f340bdd2bbc	2021-05-01 18:44:27 -07:00
Mike Ruberry	3018093066	Revert D28110359: [TensorExpr] Add `TensorExprKernel::runFast` method. Test Plan: revert-hammer Differential Revision: D28110359 (`f219ed6627`) Original commit changeset: 4fdffc8196d2 fbshipit-source-id: 3c93a058b5dd7a3b71e399341a408ec74949ef56	2021-05-01 16:16:37 -07:00
Luca Wehrstedt	0422e67336	Use Devices instead of DeviceIndexes in TensorPipe agent (#57294 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57294 With the advent of CPUs in the device maps, and to be more generic (e.g., to support AMD GPUs), and to avoid conversions when passing to Future and RRef and such, it's easier to use Devices instead of DeviceIndices. This started by just migrating the TensorPipe agent but the RPC layer is quite intertwined so I had to migrate a lot of stuff. ghstack-source-id: 127916562 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28092733 fbshipit-source-id: 024dcb3648c5898ab13e770413c43958f04f1a8a	2021-05-01 16:12:55 -07:00
Jiakai Liu	3c4d57c18b	[pytorch][nnc] update external functions for mobile build (#56850 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56850 This is part of the changes to enable NNC AOT compilation for mobile. The generated kernels need to call these external functions thus change the declarations to use C linkage when building the mobile runtime. Added nnc_aten_addmm external function. ghstack-source-id: 127877411 Test Plan: - build & CI; - tested mobile build with stacked PRs; Reviewed By: ZolotukhinM Differential Revision: D27897154 fbshipit-source-id: 61d5499d7781a83bd2657859659fd1b5043d6b04	2021-04-30 19:07:19 -07:00
Mikhail Zolotukhin	f219ed6627	[TensorExpr] Add `TensorExprKernel::runFast` method. (#57328 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57328 This method uses `CodeGen::call_raw` instead of `CodeGen::call`. Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D28110359 Pulled By: ZolotukhinM fbshipit-source-id: 4fdffc8196d24fc3300a9b4bc69f67562042a045	2021-04-30 15:26:18 -07:00
Mikhail Zolotukhin	c9ab384af7	[TensorExpr] Add `CodeGen::call_raw` method. (#55113 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55113 The new method allows to pass input and output arguments by `void*` pointers instead of CallArgs. That helps to reduce the invocation overhead. Currently this is only supported in LLVM codegen. Differential Revision: D27487549 Test Plan: Imported from OSS Reviewed By: bertmaher Pulled By: ZolotukhinM fbshipit-source-id: d8f3d92262cde1c155beefb629454370d9af2f89	2021-04-30 15:24:37 -07:00
Scott Wolchok	b87d3fa432	[PyTorch][jit] Don't allow create() on singleton types (#56807 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56807 If I understand correctly, there's no reason to create your own instance of these global singleton types. ghstack-source-id: 127312270 Test Plan: CI Reviewed By: SplitInfinity Differential Revision: D27973447 fbshipit-source-id: f12df69d185f1baaa45f2ac6eac70570a7a65912	2021-04-30 10:28:50 -07:00
Raghavan Raman	e795f88d6b	[NNC] Make flatten transform in-place (#56629 ) Summary: Partial fix for https://github.com/pytorch/pytorch/issues/56157 This PR updates the `flatten` API in `LoopNest` to perform the flattening transformation in-place. After this transformation, the first loop in the input becomes the flattened loop. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56629 Reviewed By: H-Huang Differential Revision: D28004787 Pulled By: navahgar fbshipit-source-id: 7474ae237fae3fff0cd1c64a276a8831dc5b7db0	2021-04-30 09:51:45 -07:00
CodemodService FBSourceClangFormatLinterBot	e903e16d40	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D28088724 fbshipit-source-id: 3a350580427b92719a3c300bec310aea78375996	2021-04-29 04:12:25 -07:00
Nikita Shulga	4cb534f92e	Make PyTorch code-base clang-tidy compliant (#56892 ) Summary: This is an automatic change generated by the following script: ``` #!/usr/bin/env python3 from subprocess import check_output, check_call import os def get_compiled_files_list(): import json with open("build/compile_commands.json") as f: data = json.load(f) files = [os.path.relpath(node['file']) for node in data] for idx, fname in enumerate(files): if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'): files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')] return files def run_clang_tidy(fname): check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"]) changes = check_output(["git", "ls-files", "-m"]) if len(changes) == 0: return check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"]) def main(): git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n") compiled_files = get_compiled_files_list() for idx, fname in enumerate(git_files): if fname not in compiled_files: continue if fname.startswith("caffe2/contrib/aten/"): continue print(f"[{idx}/{len(git_files)}] Processing {fname}") run_clang_tidy(fname) if __name__ == "__main__": main() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892 Reviewed By: H-Huang Differential Revision: D27991944 Pulled By: malfet fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179	2021-04-28 14:10:25 -07:00
Chen Lai	c91ea7d488	[PyTorch][Edge] Add binaries for unittests (#57039 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57039 ## Summary Add two models (v4 and v5) for testing runtime. (v5 will be introduced in https://github.com/pytorch/pytorch/pull/56002) ## Test plan CI Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D28047615 Pulled By: cccclai fbshipit-source-id: 47f7df3094dadb7e013ed57bc713cc8b3d1c8ce0	2021-04-27 20:46:34 -07:00
Nikita Shulga	a93ceb333d	Workaround intermittent gcc-7.5 ICE in cpp tests (#57016 ) Summary: gcc-7.5 optimizer can hit an internal compiler error if both `-fopenmp` and `-faligned-new` are passed: ``` /var/lib/jenkins/workspace/test/cpp/api/transformer.cpp: In function 'void transformer_decoder_test_helper(bool)': /var/lib/jenkins/workspace/test/cpp/api/transformer.cpp:609:6: internal compiler error: in equal_mem_array_ref_p, at tree-ssa-scopedtables.c:429 void transformer_decoder_test_helper(bool is_cuda) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Fixes https://github.com/pytorch/pytorch/issues/40941 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57016 Reviewed By: walterddr Differential Revision: D28027670 Pulled By: malfet fbshipit-source-id: 834e34b95e09bcae39ada25e02749f479a7e9013	2021-04-27 09:21:23 -07:00
Mikhail Zolotukhin	f3743f097f	[TensorExpr] Nuke tensorexpr::ScalarType and instead use c10::ScalarType directly. (#56825 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56825 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D27977461 Pulled By: ZolotukhinM fbshipit-source-id: f8a72938ba395e426e2d9449627113abb1c9c34f	2021-04-26 01:51:21 -07:00
Tugsbayasgalan Manlaibaatar	2041cd6707	Enable forward/backward compatibility in TS mobile (#56079 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56079 Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D27828149 Pulled By: tugsbayasgalan fbshipit-source-id: 9291ddbf01853354fca0fa0a58b8115d5d2294da	2021-04-23 16:55:18 -07:00
Rong Rong (AI Infra)	5288d05cfd	Revert D27958477: [PyTorch][Edge] Add v4 and v5 models and remove unused model Test Plan: revert-hammer Differential Revision: D27958477 (`2e4c68a727`) Original commit changeset: 2e6f985a988d fbshipit-source-id: 520cb8a353d91cd26cb27880a0a8e27dbfcd2d99	2021-04-23 14:42:01 -07:00
Raghavan Raman	5b7317b562	[NNC] API for Buffer Compression (#55853 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/54338 This PR adds the following API in NNC to implement "buffer compression". ``` static void compressBuffer(Buf* buf, Stmt* stmt); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/55853 Reviewed By: ezyang Differential Revision: D27960986 Pulled By: navahgar fbshipit-source-id: a69988e607196f3e2db0212313ea5deefb9859ac	2021-04-23 14:12:03 -07:00
Chen Lai	2e4c68a727	[PyTorch][Edge] Add v4 and v5 models and remove unused model (#56751 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56751 ## Summary 1. Add two models (v4 and v5) for testing runtime. (v5 will be introduced in https://github.com/pytorch/pytorch/pull/56002) 2. Remove an unused model. Side note: these binaries are part of the test in https://github.com/pytorch/pytorch/pull/56002, and currently there is an ongoing issue to `ghexport` with binaries (post is https://fb.workplace.com/groups/533197713799375/permalink/1130109004108240/). `ghimport` can work with binary after checking temporary diff (D23336574). Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D27958477 Pulled By: cccclai fbshipit-source-id: 2e6f985a988da55ad08fb9a5037434a2b6db0776	2021-04-23 11:52:42 -07:00
Raghavan Raman	d43d6593cd	[NNC] Handling conditionals in reorderAxis (#56063 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56063 Reviewed By: huiguoo Differential Revision: D27894772 Pulled By: navahgar fbshipit-source-id: 403b65f20567c27eab73faf670087cfab9885f84	2021-04-21 09:35:17 -07:00
Bert Maher	c91c4a081d	[NNC] Horizontally fuse all loops (#56324 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56324 Inlining is great if LLVM's CSE kicks in; but if a kernel has multiple outputs (and thus multiple loops), CSE has no chance. So, this pass "horizontally" fuses the output loops together so that CSE can go to town. Essentially we want to turn ``` for (...) { output_1[] = some_complicated_expr... } for (...) { output_2[] = some_complicated_expr... } ``` Into: ``` for (...) { output_1[] = complicated_expr output_2[] = complicated_expr. // llvm cse should take care of this } ``` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D27841194 Pulled By: bertmaher fbshipit-source-id: 54153bb59786be87183c636d64f05963c4b1624a	2021-04-20 23:54:40 -07:00
Raghavan Raman	13ac0019ae	[NNC] Update loop-carried dependence check to handle all known dependences (#56354 ) Summary: This PR includes: * Update to the loop-carried dependence check API to correctly ignore loop-independent dependences and handle all kinds of loop-carried dependences like RAW, WAR and WAW. * Fix for the overlap API to look only for conflicting buffer accesses where at least one of them is a Store. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56354 Reviewed By: bertmaher Differential Revision: D27856202 Pulled By: navahgar fbshipit-source-id: 206e4ec771fe0f7f2ccf4b11b29e35df7b9b18bc	2021-04-20 17:12:51 -07:00
Ailing Zhang	1d8053655d	Rename AutoNonVariableTypeMode to AutoDispatchBelowAutograd and add a warning. (#56422 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56422 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D27866608 Pulled By: ailzhang fbshipit-source-id: 507bbcaa4c25edf23e67162780efaa70f64ad14a	2021-04-20 17:04:08 -07:00
Lucas Hosseini	8868f9c8e3	[TensorPipe] Use targetDevice in tensorpipe_agent. (#56346 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56346 Now that TensorPipe's API has `targetDevice`, use that instead of manually writing the CUDA device index in `metadata`. Test Plan: CI Reviewed By: lw Differential Revision: D27703235 fbshipit-source-id: c5b620e3b3ce619367412efdbe9fa3778f6b8869	2021-04-20 11:54:13 -07:00
davidriazati@fb.com	4e0760f41a	Remove `is_variable` from tests (#56305 ) Summary: `is_variable` spits out a deprecation warning during the build (if it's still something that needs to be tested we can ignore deprecated warnings for the whole test instead of this change). Pull Request resolved: https://github.com/pytorch/pytorch/pull/56305 Pulled By: driazati Reviewed By: ezyang Differential Revision: D27834218 fbshipit-source-id: c7bbea7e9d8099bac232a3a732a27e4cd7c7b950	2021-04-20 09:03:53 -07:00
Alban Desmaison	63dac82444	Make grad mode error just a warning (#56401 ) Summary: Temporary fix to give people extra time to finish the deprecation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56401 Reviewed By: xw285cornell, drdarshan Differential Revision: D27862196 Pulled By: albanD fbshipit-source-id: ed460267f314a136941ba550b904dee0321eb0c6	2021-04-20 06:30:55 -07:00
Raghavan Raman	0d94c04247	[NNC] Change fuseLoops API to return bool flag and not throw any exceptions (#56353 ) Summary: Partial fix for https://github.com/pytorch/pytorch/issues/56357 Changes the `fuseLoops` API to the following form: ``` static bool fuseLoops(const std::vector<For*>& loops, For** fused); ``` Also, adds a new API to check for loop-carried dependences: ``` static bool hasLoopCarriedDependence(For* loop); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56353 Reviewed By: bertmaher Differential Revision: D27856214 Pulled By: navahgar fbshipit-source-id: 443557088692585657faee296602c547a00117dd	2021-04-19 17:20:40 -07:00
Ailing Zhang	98162cb0bb	Enable AutoGradMode in InferenceMode. (#56107 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56107 Test Plan: Imported from OSS Reviewed By: pbelevich, driazati Differential Revision: D27807137 Pulled By: ailzhang fbshipit-source-id: bfacf11ec5a431589cec73d6371cac81b425a115	2021-04-19 10:24:20 -07:00
Raghavan Raman	b387f7ca47	[NNC] Make normalization transformation in-place (#56158 ) Summary: Partially fixes https://github.com/pytorch/pytorch/issues/56157 This PR changes `normalize` API in `LoopNest` to transform the given `For` statement and not create a new one. New API: ``` static bool normalize(For* f); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56158 Reviewed By: agolynski Differential Revision: D27798361 Pulled By: navahgar fbshipit-source-id: 57626a5a367bdf94a0efbd9dc8538f5e4e410d6b	2021-04-18 23:54:13 -07:00
Raghavan Raman	29c5cb797d	[NNC] Fuse loops that have the same bounds as expressions (#55997 ) Summary: This PR allows fusing loops whose bounds are specified as expressions that are equal. For example: ``` for (int j = 0; j < M + N; j++) { A[j] = 10 * j; } for (int k = 0; k < M + N; k++) { B[k] = 20 * k; } ``` `fuseLoops(j, k)` is possible since the stop bounds of the two loops are equal though they are different `Expr` and will result in: ``` for (int j = 0; j < M + N; j++) { A[j] = 10 * j; B[j] = 20 * j; } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/55997 Reviewed By: bertmaher Differential Revision: D27841270 Pulled By: navahgar fbshipit-source-id: a64e4503b7f8f28bc0c9823225bc923177bb4c2e	2021-04-18 11:14:26 -07:00
Mikhail Zolotukhin	85126629a5	[TensorExpr] Add support for constant tensors in tensorexpr kernel. (#56319 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56319 With this change the TorchScript graph can have constant tensors in it and we still will be able to lower it to TE. The constants are registered (or bound) within the `TensorExprKernel` object and when the codegen is called, they are passed along with usual inputs and outputs. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D27838747 Pulled By: ZolotukhinM fbshipit-source-id: 4a519d66fcc07fe5fa53f5cf9af28d25611f8437	2021-04-17 11:15:35 -07:00
Bert Maher	8e82e932f3	Reland: D27652485: [nnc] Enable CPU fusion only when num_threads == 1 (#56120 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120 This reverts commit `ad17fadbfc` (D27786457). The big annoyance here is that depending on the threading mode you may not be able to toggle num_threads at will, so the fusion tests won't fail. I hate this solution, but I'm adding a secondary override for the TE fuser. Now you need to both turn on fusion (_jit_override_can_fuse_on_cpu), and you're OK if you're running with 1 thread, or you can add `_jit_set_texpr_parallel_cpu_enabled` to enable it anyways. This is (a) mainly for tests, since a real user probably won't fiddle aimlessly with the thread count, and (b) will go away once NNC's threading support is fully baked. Test Plan: Imported from OSS Reviewed By: Krovatkin Differential Revision: D27788199 Pulled By: bertmaher fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1	2021-04-15 15:50:18 -07:00
Lucas Hosseini	3802e577fb	[TensorPipe] Use Descriptor::Tensor::sourceDevice in tensorpipe_agent. (#55821 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55821 Test Plan: CI Reviewed By: lw Differential Revision: D27661608 fbshipit-source-id: fd241f073d8928528a749758c7d0f570dfeb677b	2021-04-15 03:21:26 -07:00
Lucas Hosseini	047164437e	[TensorPipe] Prepare for new Pipe API. (#55820 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55820 Test Plan: CI Reviewed By: lw Differential Revision: D27648291 fbshipit-source-id: e08db6e8c1f5f333ec355de29e25fbe552904b25	2021-04-15 03:20:32 -07:00
Mikhail Zolotukhin	556dfcb0db	[TensorExpr] Re-enable "LoopNest.VectorizeUse" test. (#56094 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56094 Now FunctionCalls are merged with Loads and vectorization for intermediate values automatically started to work. Fixes #53553. Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D27781519 Pulled By: ZolotukhinM fbshipit-source-id: 1ed68ca2399e9bd4598639bd6dd8f369365f0ef0	2021-04-14 21:39:03 -07:00
Natalia Gimelshein	ad17fadbfc	Revert D27652485: [nnc] Enable CPU fusion only when num_threads == 1 Test Plan: revert-hammer Differential Revision: D27652485 (`e7e164f9e6`) Original commit changeset: 182580cf758d fbshipit-source-id: e3c95b06d1eef668095f3cf461485395179d94af	2021-04-14 20:23:15 -07:00
Kurt Mohler	3fe4718d16	Add `padding_idx` argument to EmbeddingBag (#49237 ) Summary: This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction. This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided. Fixes https://github.com/pytorch/pytorch/issues/3194 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237 Reviewed By: walterddr, VitalyFedyunin Differential Revision: D26948258 Pulled By: jbschlosser fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc	2021-04-14 09:38:01 -07:00
Bert Maher	e7e164f9e6	[nnc] Enable CPU fusion only when num_threads == 1 (#55621 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621 Fuser support for thread-level parallelism is a work in progress, so only fuse when the program is running single-threaded. ghstack-source-id: 126069259 Test Plan: observe fusion groups formed when torch.get_num_threads==1 vs not Reviewed By: ZolotukhinM Differential Revision: D27652485 fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef	2021-04-14 09:16:54 -07:00
Mikhail Zolotukhin	7ab654afd7	[TensorExpr] Rename `Tensor::call` to `Tensor::load` to be consistent with `Buf` and `Placeholder`. (#55826 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55826 It's a mechanical change. Differential Revision: D27717777 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: fbc1bb99602250c706cf2c8c2684119c323e4d51	2021-04-13 12:08:53 -07:00
Mikhail Zolotukhin	1263448cb2	[TensorExpr] Remove mask field from Load and Store classes. (#55825 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825 The mask has never been used (in vectorization we generate an explicit `IfThenElse` construct when we need to mask out some elements). The PR removes it and cleans up all its traces from tests. Differential Revision: D27717776 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db	2021-04-13 12:08:51 -07:00
Mikhail Zolotukhin	b01a15d3d3	[TensorExpr] Redesign Rfactor loopnest transformation. (#55324 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55324 With this change `rfactor` only affects the passed loop and its body, never touching anything outside (that was a root cause of a bug with the previous implementation). Also, we don't have an `insertion_point` parameter anymore - its meaning was vague, and the effect of it should've been achievable with other transformations anyway. The new `rfactor` semantics is as follows: ``` Requirements: * S is the reduction store * S is the only statement in the innermost loop * There are at least two reduction arguments in S * OUTER_REDUCTION_FOR loop corresponds to the outermost reduction variable used in the store and all other reduction variables are index variables of children loops of OUTER_REDUCTION_FOR * OUTER_REDUCTION_FOR is a perfect loop nest, i.e. it has only loops corresponding to the other reduction variables and the store, nested into each other What it does: * Introduce a new buffer with an extra dimension of a size equal to the span of the loop OUTER_REDUCTION_FOR (the new buffer is returned via RFAC_BUF_PTR) * Insert an initialization store for the new buffer in OUTER_REDUCTION_FOR before its nested loop * Replace the reduction store to the original buffer with the reduction store to the temp buffer, removing the index var of OUTER_REDUCTION_FOR from reduction arguments * Insert a final reduction store over the extra dimension of the new buffer to the original buffer * Returns TRUE if the transformation succeeded and FALSE otherwise Example: Original IR: S1: for i # normal axis S2: X[i] = 0 S3: for j # reduction axis S4: for k # reduction axis S5: X[i] = ReduceOp(X[i] + Y[i,j,k], reduce_axis={j,k}) After RFACTOR(S5, S3) S1: for i # normal axis S2: X[i] = 0 S3: for j # reduction axis for X, normal axis for X_rfac X_rfac[i,j] = 0 S4: for k # reduction axis X_rfac[i,j] = ReduceOp(X_rfac[i,j] + Y[i,j,k], reduce_axis={k}) X[i] = ReduceOp(X[i] + X_rfac[i,j], reduce_axis={j}) ``` Differential Revision: D27694960 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 076fa6a1df2c23f5948302aa6b43e82cb222901c	2021-04-13 12:08:48 -07:00
Raghavan Raman	d805908c34	[NNC] API to reorder multiple loops (#55568 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/52690 This PR adds the following APIs: ``` static bool areLoopsPerfectlyNested(const std::vector<For*>& loops); static std::vector<For*> reorder( const std::vector<For*>& loops, const std::vector<size_t>& permutation); ``` The first API checks if the given list of loops are perfectly nested. The second API reorders the given list of loops according to the permutation specified. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55568 Reviewed By: albanD Differential Revision: D27689734 Pulled By: navahgar fbshipit-source-id: dc1bffdbee068c3f401188035772b41847cbc7c6	2021-04-12 18:12:24 -07:00
Yukio Siraichi	93bf0ae6fc	Remove legacy constructor calls from pytorch codebase. (#54142 ) Summary: Follow up from https://github.com/pytorch/pytorch/issues/53889 Related to https://github.com/pytorch/pytorch/issues/47112 Removing every occurrence of the legacy constructor call present in PyTorch at: - _docs_ - _benchmarks_ - _test_ - _caffe2_ - _CONTRIBUTING.md_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142 Reviewed By: ngimel Differential Revision: D27699450 Pulled By: mruberry fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546	2021-04-11 15:45:17 -07:00
Ailing Zhang	6842da6251	[WIP]Relax some limitations of InferenceMode. (#54403 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54403 A few important points about InferenceMode behavior: 1. All tensors created in InferenceMode are inference tensors, except for the outputs of view ops. - View ops produce output that has the same is_inference_tensor property as their input. Namely, a view of a normal tensor inside InferenceMode produces a normal tensor, which is exactly the same as creating a view inside NoGradMode. And a view of an inference tensor outside InferenceMode produces an inference tensor as output. 2. All ops are allowed inside InferenceMode, and run faster than in normal mode. 3. An inference tensor cannot be saved for backward. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D27316483 Pulled By: ailzhang fbshipit-source-id: e03248a66d42e2d43cfe7ccb61e49cc4afb2923b	2021-04-09 14:40:37 -07:00
Joel Schlosser	defc649eca	Update to short forms of splitWithTail / splitWithMask (#55542 ) Summary: Switched to short forms of `splitWithTail` / `splitWithMask` for all tests in `test/cpp/tensorexpr/test_*.cpp` (except test_loopnest.cpp) Pull Request resolved: https://github.com/pytorch/pytorch/pull/55542 Reviewed By: mrshenli Differential Revision: D27632033 Pulled By: jbschlosser fbshipit-source-id: dc2ba134f99bff8951ae61e564cd1daea92c41df	2021-04-09 10:15:20 -07:00
Bert Maher	90f848572c	NNC depthwise conv2d implementation (#54920 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54920 Add a depthwise convolution implementation and reasonably good schedules for 3x3 stride=1,2. ghstack-source-id: 126076113 Test Plan: new tensorexpr test: Conv.DepthwiseConv2D Reviewed By: ZolotukhinM Differential Revision: D27413745 fbshipit-source-id: 833da6072b655fbe2b679704e9d56a08e1bf7e7e	2021-04-08 21:56:53 -07:00
Jeffrey Wan	3f9492c8b3	[Hackathon] Modernize API used in NNC C++ tests (1/3) (#55512 ) Summary: Partially fixes https://github.com/pytorch/pytorch/issues/55203 Fixes issues (1) and (2) in the following tests: tests in test/cpp/tensorexpr/test_loopnest.cpp from the beginning to LoopNestReorderLongStringFull (including) Pull Request resolved: https://github.com/pytorch/pytorch/pull/55512 Reviewed By: mrshenli Differential Revision: D27630679 Pulled By: soulitzer fbshipit-source-id: b581aaea4f5f54b3285f0348aa76e99779418f80	2021-04-08 08:34:25 -07:00
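
The `rfactor` semantics described in commit b01a15d3d3 can be illustrated with a minimal sketch in plain Python. This is not NNC code: the function names `reduce_original` and `reduce_rfactored` are hypothetical, the reduction is assumed to be a sum, and nested lists stand in for buffers. It only shows that splitting a two-axis reduction through an intermediate `X_rfac` buffer gives the same result as the original loop nest.

```python
# Plain-Python illustration of the rfactor idea; names are hypothetical, not the NNC API.

def reduce_original(Y):
    """X[i] = sum over j and k of Y[i][j][k], written as one two-axis reduction."""
    X = [0] * len(Y)
    for i in range(len(Y)):                # normal axis
        for j in range(len(Y[i])):         # reduction axis
            for k in range(len(Y[i][j])):  # reduction axis
                X[i] += Y[i][j][k]
    return X


def reduce_rfactored(Y):
    """Same reduction after factoring out the j axis into a temporary buffer."""
    X = [0] * len(Y)
    # New buffer with an extra dimension the size of the outer reduction loop (j).
    X_rfac = [[0] * len(Y[i]) for i in range(len(Y))]
    for i in range(len(Y)):                # normal axis
        for j in range(len(Y[i])):         # reduction axis for X, normal axis for X_rfac
            X_rfac[i][j] = 0               # initialization store for the new buffer
            for k in range(len(Y[i][j])):  # remaining reduction axis
                X_rfac[i][j] += Y[i][j][k]
            X[i] += X_rfac[i][j]           # final reduction over the extra dimension
    return X, X_rfac


if __name__ == "__main__":
    Y = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    X, X_rfac = reduce_rfactored(Y)
    print(reduce_original(Y) == X)
```

With integer inputs the two versions agree exactly; with floating-point data the change in summation order could produce small rounding differences.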

1357 commits