Commit graph

77 commits

Wang, Eikan
e5a5cd149f Simplify IfThenElse and CompareSelect within for-loop (#76793)
Analyze the loop-index range to determine whether a condition can ever be satisfied. If a for-loop body contains an `IfThenElse` or `CompareSelect` whose condition depends on the for-loop index `Var`, we analyze the index range to check whether the condition is always (or never) satisfied. If the condition is deterministic, the logic is simplified accordingly.
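The analysis described above can be sketched as a simple interval check. The helper names below are illustrative, not the actual NNC API:

```python
# Illustrative sketch of range-based condition folding (not the NNC code).
# A loop index i ranges over the half-open interval [lo, hi).

def classify_less_than(lo, hi, c):
    """Classify the condition `i < c` over all i in [lo, hi)."""
    if hi <= c:
        return "always"   # even the largest index, hi - 1, is below c
    if lo >= c:
        return "never"    # even the smallest index is not below c
    return "unknown"

def simplify_if_then_else(lo, hi, c, then_val, else_val):
    """Fold IfThenElse(i < c, then, else) when the range decides it."""
    verdict = classify_less_than(lo, hi, c)
    if verdict == "always":
        return then_val
    if verdict == "never":
        return else_val
    return None  # range does not decide the condition; leave it in place
```

When the verdict is `"unknown"`, the statement is left untouched; only a deterministic range lets the branch be replaced outright.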
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76793
Approved by: https://github.com/huiguoo
2022-05-15 20:21:28 +00:00
Mikhail Zolotukhin
1855b14922 [TensorExpr] Delete DimArg class. (#72390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72390

This class didn't add much value and only caused more boilerplate code.
This change removes the class and replaces all its uses with `ExprHandle`.

A side effect of this change is different names in loop variables, which
caused massive mechanical changes in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34030296

Pulled By: ZolotukhinM

fbshipit-source-id: 2ba4e313506a43ab129a10d99e72b638b7d40108
(cherry picked from commit c2ec46a0587cafd4e915c5bf1e0dc0b5d244e8d5)
2022-02-11 01:21:59 +00:00
Richard Barnes
e0643fa3fc use irange for loops 5 (#66744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for (TYPE var = x0; var < x_max; var++)`

to the format

`for (const auto var : irange(x_max))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.
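The actual loop-upgrader script (D28874212) is internal; as a rough illustration, the core rewrite it performs could look like a regex substitution of this shape (the pattern and function name here are ours, and the sketch only handles the simplest zero-based `i++` form):

```python
import re

# Hypothetical sketch of a loop-upgrader rewrite: turn C-style counting loops
# that start at 0 into range-based loops over c10::irange.
LOOP_RE = re.compile(
    r"for\s*\(\s*(?P<type>[\w:]+)\s+(?P<var>\w+)\s*=\s*0\s*;"
    r"\s*(?P=var)\s*<\s*(?P<bound>\w+)\s*;\s*(?P=var)\+\+\s*\)"
)

def upgrade_loops(source: str) -> str:
    """Rewrite `for (T i = 0; i < n; i++)` into `for (const auto i : c10::irange(n))`."""
    return LOOP_RE.sub(r"for (const auto \g<var> : c10::irange(\g<bound>))", source)
```

Loops that don't start at 0, or whose bound is a complex expression, fall outside the pattern and are left alone, which is why hand fix-ups were still needed.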

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705358

fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
2021-10-18 21:59:50 -07:00
Xue Li
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
Richard Barnes
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for (TYPE var = x0; var < x_max; var++)`

to the format

`for (const auto var : irange(x_max))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
Mikhail Zolotukhin
f23f21dafe [TensorExpr] Remove 'Placeholder' class. (#64887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887

BufHandle has exactly the same functionality and should be used instead.

Differential Revision: D30889483

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
2021-09-14 00:22:44 -07:00
Hui Guo
4481c87ac4 [tensorexpr] Simplify x/100 -> 0 if x is a non-negative integer less than 100. (#64763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763

Simplification pattern:
  x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N).
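The rule can be stated as an interval-containment check; the form below is illustrative, not the NNC implementation:

```python
# Illustrative form of the rule: x / N folds to 0 when N is a positive
# constant and the index x ranges over [lo, hi), a subset of [0, N).
def simplify_div(lo, hi, n):
    """Return 0 if x / n is provably 0 for every index x in [lo, hi)."""
    if n > 0 and lo >= 0 and hi <= n:
        return 0
    return None  # range not contained in [0, n): cannot fold

# Sanity check of the underlying arithmetic fact:
assert all(x // 100 == 0 for x in range(100))
```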

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30845854

Pulled By: huiguoo

fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd
2021-09-10 20:33:02 -07:00
Mikhail Zolotukhin
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that no classes use KernelArena for memory management, we can
remove it.

Differential Revision: D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
Mikhail Zolotukhin
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in the transition away from KernelArena memory
management. Tensor is essentially just a pair of <BufPtr, StmtPtr>, so we
don't need to dynamically allocate it at all: it's cheap to pass by value,
and that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.

Differential Revision: D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
Mikhail Zolotukhin
7fdba4564a [TensorExpr] IRSimplifier: sort terms in polynomials, terms, minterms, maxterms. (#63197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197

This solves non-determinism from using hash values in sort methods.
Changes in tests are mostly mechanical.
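The nondeterminism comes from ordering terms by hash value, which can vary between runs; sorting by a canonical structural key is stable. A toy illustration (term representation and key are ours, not the NNC code):

```python
# Illustrative: sort polynomial terms by a canonical key (variable names,
# then coefficient) instead of by hash, so the output order is stable
# across runs.
def canonical_order(terms):
    """Sort terms of a polynomial deterministically."""
    return sorted(terms, key=lambda t: (sorted(t["vars"]), t["coeff"]))

terms = [
    {"coeff": 2, "vars": ["y"]},
    {"coeff": 3, "vars": ["x", "y"]},
    {"coeff": 1, "vars": ["x"]},
]
ordered = canonical_order(terms)
```

The same input always yields the same order, regardless of how the terms were produced, which is what lets the test expectations be written down once.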

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292776

Pulled By: ZolotukhinM

fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055
2021-08-18 14:49:27 -07:00
Mikhail Zolotukhin
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
Raghavan Raman
59dd12042e [nnc] Removed const from all fields in IR. (#62336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336

This PR was generated by removing `const` for all types of nodes in NNC IR, and fixing compilation errors that were the result of this change.

This is the first step in making all NNC mutations in-place.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30049829

Pulled By: navahgar

fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63
2021-08-03 11:44:36 -07:00
Hui Guo
3a592730d5 [nnc] Simplify i%100 to i if i is less than 100; fixed #52580 (#60693)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60693
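The `i % 100 -> i` rule is the mod counterpart of the earlier division fold: when the index is known to lie in [0, N), the remainder is the index itself. An illustrative form (not the NNC code):

```python
# Illustrative form of the rule: x % N folds to x itself when x is a
# non-negative index known to be less than N.
def simplify_mod(lo, hi, n):
    """Return "x" (meaning: the index unchanged) if x % n == x on [lo, hi)."""
    if n > 0 and lo >= 0 and hi <= n:
        return "x"
    return None  # range not contained in [0, n): cannot fold

# Sanity check of the underlying arithmetic fact:
assert all(i % 100 == i for i in range(100))
```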

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29375938

Pulled By: huiguoo

fbshipit-source-id: 1388729c5b93805cb156efa53e8823d5462885bf
2021-08-02 18:38:54 -07:00
Hui Guo
8f7ae77040 [nnc] Add context-sensitive simplification for div/mod (#60688)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60688

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D29373313

Pulled By: huiguoo

fbshipit-source-id: 90d7f2fbfce583b0ea3b0f1c7899e22b0210bd62
2021-08-02 18:37:39 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
The GoogleTest `TEST` macro is non-compliant with this check, as is `DEFINE_DISPATCH`.

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Raghavan Raman
843c42ffd8 [nnc] Refactored test macros and updated compress buffer tests to use them (#61716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61716

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29715754

Pulled By: navahgar

fbshipit-source-id: c400a58b7f393c0f93e5a25f118403124f8834b0
2021-07-15 21:17:14 -07:00
Raghavan Raman
34d6618386 [NNC] Fixing a bug in simplifier (#58291)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58291

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28435393

Pulled By: navahgar

fbshipit-source-id: 517e47385a93a43d2ddf054382adc81c18484066
2021-05-18 01:28:33 -07:00
CodemodService FBSourceClangFormatLinterBot
cbfce376a8 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28319469

fbshipit-source-id: 8295597a8ee16b2fef3f7aacdd6c892cb22db988
2021-05-10 03:39:31 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy
Remove existing nolint warnings using following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Mikhail Zolotukhin
1263448cb2 [TensorExpr] Remove mask field from Load and Store classes. (#55825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825

The mask has never been used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). The PR
removes it and cleans up all its traces from tests.

Differential Revision: D27717776

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
2021-04-13 12:08:51 -07:00
Bert Maher
7367bca066 [nnc] Tests for proposed feature: loop bounds conditional simplification (#54121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54121

It would be nice to do range analysis to determine if a condition
cannot be satisfied.  These are some tests that we should be able to turn on
once we have this feature.
ghstack-source-id: 124116847

Test Plan: Simplify.*LoopBounds

Reviewed By: ZolotukhinM

Differential Revision: D27107956

fbshipit-source-id: bb27e3d3bc803f0101c416e4a351ba2278684980
2021-03-17 11:01:10 -07:00
Hui Guo
8737c2a1a2 [TensorExpr] Reland: "Simplify index expressions constructed in loop flattening. Fixes #51173" (#53861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53861

Replaced the iterators in the for-loops with integer index variables due to
overflow when handling empty vectors.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26998894

Pulled By: huiguoo

fbshipit-source-id: a1f6475c8ba123968ef7247b4f6f38edbf24b9ef
2021-03-11 23:52:36 -08:00
Edward Yang
07d315fce8 Revert D26676150: Simplify index expressions constructed in loop flattening - #51173
Test Plan: revert-hammer

Differential Revision:
D26676150 (1f01899e4a)

Original commit changeset: e202e0c8610e

fbshipit-source-id: 9611dda6897b67e16e44c731994bc9e5fccab0b9
2021-03-11 07:17:38 -08:00
Hui Guo
1f01899e4a Simplify index expressions constructed in loop flattening - #51173 (#52882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52882

Test Plan:
Imported from OSS

build/bin/test_tensorexpr

Reviewed By: ZolotukhinM

Differential Revision: D26676150

Pulled By: huiguoo

fbshipit-source-id: e202e0c8610eb107558a3add8a6560a0cb97704a
2021-03-10 18:37:42 -08:00
Mikhail Zolotukhin
d3b427a0e3 [TensorExpr] Add an unmasked Load constructor. (#52790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52790

Fixes #52774.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26649542

Pulled By: ZolotukhinM

fbshipit-source-id: ab1c9e55f52e59d0bd00fbde2ec3125f8c7917ee
2021-02-24 22:45:29 -08:00
Hui Guo
973e306c84 changed TE 'Allocate' API to take one argument 'Buf' instead of three arguments 'Var', 'dtype', 'dims'. (#50167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50167

Test Plan:
Imported from OSS

`python test/test_jit_fuser_te.py`
`python test/test_jit_fuser_legacy.py`
`python test/test_jit_fuser.py`
`build/bin/test_tensorexpr`

Reviewed By: ZolotukhinM

Differential Revision: D25814342

Pulled By: huiguoo

fbshipit-source-id: 44cba7f92365b826c9cb1d385a94858934570dee
2021-02-22 15:08:51 -08:00
Andres Suarez
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
Mikhail Zolotukhin
a5b27d7a31 [TensorExpr] Move SimpleIREval implementation from .h to .cpp. (#49697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49697

Mostly mechanical move. This refactoring helps to hide unnecessary
details from the SimpleIREval interface and make it more similar to a
pure 'codegen'.

Test Plan: Imported from OSS

Reviewed By: nickgg

Differential Revision: D25668696

Pulled By: ZolotukhinM

fbshipit-source-id: 423247bfcdfa88403e8ec92152f00110bb9da19c
2020-12-21 20:20:15 -08:00
Raghavan Raman
46c9a0e679 Do not use negative values in GCD computation. (#49379)
Summary:
GCD should always return positive integers. When negative values are used, we hit a corner case that results in an infinite recursion during simplification.
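The fix amounts to normalizing signs before the Euclidean recursion. With C-style truncating `%`, a naive `gcd(b, a % b)` on negative inputs can keep producing negative remainders and never reach the base case; taking absolute values up front avoids that. An iterative sketch (ours, not the NNC code):

```python
# Illustrative: normalize signs before running the Euclidean algorithm, so
# every step works on non-negative values and the loop always terminates.
def safe_gcd(a, b):
    a, b = abs(a), abs(b)   # GCD is defined on magnitudes
    while b:
        a, b = b, a % b
    return a
```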

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49379

Reviewed By: ezyang

Differential Revision: D25597115

Pulled By: navahgar

fbshipit-source-id: b0e8ac07ee50a5eb775c032628d4840df7424927
2020-12-21 15:08:43 -08:00
Peng Wu
6568572712 Support integral types for kAbs in SimpleIREvaluator (#49357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49357

This is a follow-up to PR #48679, which added support for integer inputs to
aten::abs by promoting integers to float and then demoting the result back
to integers. This PR supports integer inputs to aten::abs more efficiently
in the SimpleIREvaluator by implementing integer inputs for kAbs (renamed
from kFabs).
- Rename kFabs to kAbs
- Add support for integer input to kAbs in the SimpleIREvaluator (note that
llvm_codegen and cuda_codegen already support integer inputs to kAbs)
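Besides efficiency, the float round-trip is lossy for wide integers: a 64-bit value above 2^53 is not exactly representable as a double. A quick illustration of the two paths (sketched in Python, not the SimpleIREvaluator code):

```python
# Illustrative comparison of the old and new abs paths for integers.
def abs_via_float(v: int) -> int:
    """The old path, sketched: promote to float, take abs, demote back."""
    return int(abs(float(v)))

def abs_native(v: int) -> int:
    """The new path, sketched: evaluate abs directly on the integer."""
    return abs(v)

big = 2**53 + 1  # not exactly representable as a double
```

For `big`, the float path silently rounds to 2^53 while the native path preserves the value.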

Test Plan:
- `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 python test/test_jit_fuser_te.py
TestTEFuser.test_unary_ops`
- `python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops`

Imported from OSS

Reviewed By: eellison

Differential Revision: D25545791

fbshipit-source-id: e52f51a352d149f66ce8341fb3beb479be08a230
2020-12-18 07:57:58 -08:00
Bert Maher
07657b6001 [tensorexpr] Switch cpp tests to pure gtest (#48160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48160

We no longer use the custom C++ test infra anyway, so move to pure
gtest.

Fixes #45703
ghstack-source-id: 116977283

Test Plan: `buck test //caffe2/test/cpp/tensorexpr`

Reviewed By: navahgar, nickgg

Differential Revision: D25046618

fbshipit-source-id: da34183d87465f410379048148c28e1623618553
2020-11-18 12:23:34 -08:00
Nick Gibson
17f8c329df [NNC] IRSimplifier rules for Compare and Mod (#46412)
Summary:
Adds new rules to the NNC IRSimplifier to take care of the following cases:

* Comparisons which are symbolic but have a constant difference. E.g. this is most useful in cases like `if (x > x + 4) ...` which we can now eliminate.

* Simplification of `Mod` nodes, including simple rules such as `0 % x` and `x % 1`, but also factorization of both sides to find common symbolic multiples. E.g. `(x * y) % x` can be cancelled out to `0`.

See tests for many more examples!
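The two rule families can be sketched as follows (illustrative forms only, not the NNC code; the factor-list representation of the mod left-hand side is ours):

```python
# Illustrative sketches of the two new rule families.

def fold_compare_const_diff(diff, op):
    """Fold `x <op> x + diff`: the result depends only on diff."""
    return {"<": 0 < diff, ">": 0 > diff, "==": diff == 0}[op]

def fold_mod(lhs_factors, rhs):
    """Fold simple Mod patterns: 0 % x, x % 1, and (x * y * ...) % x."""
    if 0 in lhs_factors or rhs == 1 or rhs in lhs_factors:
        return 0
    return None  # no rule applies
```

So `x > x + 4` folds to false without knowing anything about `x`, and `(x * y) % x` cancels to 0 because `x` is a factor of the left-hand side.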

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46412

Reviewed By: navahgar

Differential Revision: D24396151

Pulled By: nickgg

fbshipit-source-id: abb954dc930867d62010dcbcd8a4701430733715
2020-10-19 19:37:09 -07:00
Nick Gibson
2fa91fa305 [NNC] Fix crash when simplifying certain subtractions (#46108)
Summary:
Fixes a crash bug in the IRSimplifier when the LHS is a Term (e.g. 2x) and the RHS is a Polynomial (e.g. 2x+1).

This case crashes 100% of the time so I guess it's not very common in models we've been benchmarking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46108

Reviewed By: agolynski

Differential Revision: D24226593

Pulled By: nickgg

fbshipit-source-id: ef454c855ff472febaeba16ec34891df932723c0
2020-10-09 15:15:55 -07:00
Mikhail Zolotukhin
4aca63d38a [TensorExpr] Change API for creating Load and Store expressions. (#45520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45520

With this change `Load`s and `Store`s no longer accept `Placeholder`s in
their constructor and `::make` functions and can only be built with
`Buf`.
`Placeholder` gets its own `store`, `load`, `storeWithMask`, and
`loadWithMask` method for more convenient construction.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23998789

Pulled By: ZolotukhinM

fbshipit-source-id: 3fe018e00c1529a563553b2b215f403b34aea912
2020-09-29 20:52:38 -07:00
Mikhail Zolotukhin
3c33695a6d [TensorExpr] Rename Buffer to Placeholder. (#45389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45389

Differential Revision: D23952866

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 17eedd3ac17897501403482ac1866c569d247c75
2020-09-29 01:21:54 -07:00
Alex Suhan
76c185dcca [TensorExpr] When lanes differ, insert Broadcast instead of Cast (#45179)
Summary:
We need to check if dtypes differ in scalar type or lanes to decide between
Cast and Broadcast.
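The decision the commit describes can be sketched as a small dispatch on the two dtype components (function and names are ours, not the NNC API):

```python
# Illustrative: choose the node to insert when converting between dtypes.
# A dtype here is a (scalar_type, lanes) pair.
def convert_node(src_scalar, src_lanes, dst_scalar, dst_lanes):
    if src_lanes != dst_lanes:
        return "Broadcast"  # lane count differs: replicate across lanes
    if src_scalar != dst_scalar:
        return "Cast"       # same lanes, different scalar type
    return "NoOp"           # dtypes already match
```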

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45179

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyBroadcastTermExpander

Reviewed By: bwasti

Differential Revision: D23873316

Pulled By: asuhan

fbshipit-source-id: ca141be67e10c2b6c5f2ff9c11e42dcfc62ac620
2020-09-23 17:06:54 -07:00
Alex Suhan
215679573e [TensorExpr] Fix operator order in combineMultilane (#45157)
Summary:
combineMultilane used the wrong operand order when the Ramp was on the
left-hand side, which matters for subtraction.
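Why the order matters: subtraction is not commutative, so a Ramp minus a Broadcast is a different vector than a Broadcast minus a Ramp. Modeling the multilane values as plain lists (illustrative, not the NNC IR):

```python
# Illustrative multilane values: a Ramp is base + i*stride per lane, a
# Broadcast repeats one scalar across all lanes.
def ramp(base, stride, lanes):
    return [base + i * stride for i in range(lanes)]

def broadcast(v, lanes):
    return [v] * lanes

def sub(lhs, rhs):
    """Lane-wise subtraction; swapping lhs/rhs changes the result."""
    return [a - b for a, b in zip(lhs, rhs)]
```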

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45157

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyRampSubBroadcast

Reviewed By: ailzhang

Differential Revision: D23851751

Pulled By: asuhan

fbshipit-source-id: 864d1611e88769fb43327ef226bb3310017bf858
2020-09-22 23:50:47 -07:00
Nick Gibson
4bbb6adff5 [NNC] fix SyncThreads insertion and reenable CudaSharedMem test (#44909)
Summary:
A previous fix for masking Cuda dimensions (https://github.com/pytorch/pytorch/issues/44733) changed the behaviour of inserting thread synchronization barriers in the Cuda CodeGen, causing the CudaSharedMemReduce_1 to be flaky and ultimately disabled.

The issue is working out where these barriers must be inserted - solving this optimally is very hard, and I think not possible without dependency analysis we don't have, so I've changed our logic to be quite pessimistic. We'll insert barriers before and after any blocks that have thread dimensions masked (even between blocks that have no data dependencies). This should be correct, but it's an area we could improve performance. To address this somewhat I've added a simplifier pass that removes obviously unnecessary syncThreads.

To avoid this test being flaky again, I've added a check against the generated code to ensure there is a syncThread in the right place.

Also fixed a couple of non-functional clarity issues in the generated code: added the missing newline after Stores in the CudaPrinter, and prevented the PrioritizeLoad mutator from pulling out loads contained within simple Let statements (such as those produced by the Registerizer).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44909

Reviewed By: agolynski

Differential Revision: D23800565

Pulled By: nickgg

fbshipit-source-id: bddef1f40d8d461da965685f01d00b468d8a2c2f
2020-09-21 09:27:22 -07:00
Nick Gibson
f175830558 [NNC] Fuse identical conditions in simplifier (#44886)
Summary:
Adds a pass to the IR Simplifier which fuses together the bodies of Cond statements which have identical conditions. e.g.

```
if (i < 10) {
  do_thing_1;
} else {
  do_thing_2;
}
if (i < 10) {
  do_thing_3;
}
```

is transformed into:

```
if (i < 10) {
  do_thing_1;
  do_thing_3;
} else {
  do_thing_2;
}
```
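A minimal sketch of the fusion pass, with statement tuples standing in for the NNC IR (the representation is ours, not the actual `Cond` class):

```python
# Illustrative: merge adjacent Cond statements with equal condition
# expressions. A Cond is ("cond", condition, then_stmts, else_stmts).
def fuse_identical_conds(stmts):
    out = []
    for s in stmts:
        if (out and s[0] == "cond" and out[-1][0] == "cond"
                and out[-1][1] == s[1]):
            prev = out[-1]  # same condition: append bodies to the previous Cond
            out[-1] = ("cond", prev[1], prev[2] + s[2], prev[3] + s[3])
        else:
            out.append(s)
    return out
```

Only directly adjacent Conds are fused; any intervening statement breaks the chain, since it could change what the condition observes.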

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44886

Reviewed By: glaringlee

Differential Revision: D23768565

Pulled By: nickgg

fbshipit-source-id: 3fe40d91e82bdfff8dcb8c56a02a4fd579c070df
2020-09-18 11:38:03 -07:00
Nick Gibson
204f985fc3 [NNC] Add simplification of Loop + Condition patterns. (#44764)
Summary:
Adds a new optimization to the IRSimplifier which changes this pattern:
```
for ...
  if ...
   do thing;
```
into:
```
if ...
  for ...
    do thing;
```

This should be almost strictly better.

There are many cases where this isn't safe to do (hence the tests), most obviously when the condition depends on something modified within the loop.
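The transform can be sketched as follows, again with tuples standing in for the IR. The invariance check here is a crude substring test on the loop variable; the real pass needs proper dependence analysis, including on anything written inside the loop:

```python
# Illustrative: hoist a loop-invariant condition out of a loop.
# A loop is ("for", var, bound, body); an if is ("if", cond, then_stmts).
def hoist_cond(loop, loop_var):
    kind, var, bound, body = loop
    if len(body) == 1 and body[0][0] == "if":
        _, cond, then = body[0]
        if loop_var not in cond:  # crude stand-in for an invariance check
            return ("if", cond, [("for", var, bound, then)])
    return loop  # not safe (or no single guarding if): leave unchanged
```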

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44764

Reviewed By: mruberry

Differential Revision: D23734463

Pulled By: nickgg

fbshipit-source-id: 51617e837de96b354fb702d0090ac65ddc523d36
2020-09-16 18:41:58 -07:00
Alex Suhan
7b3432caff [TensorExpr] Support boolean in simplifier (#44659)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44659

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.ConstantFoldCastToBool

Reviewed By: ngimel

Differential Revision: D23714675

Pulled By: asuhan

fbshipit-source-id: 4c18d972b628d5ad55bad58eddd5f6974e043d9c
2020-09-16 15:30:19 -07:00
Nick Gibson
69839ea3f6 [NNC] make inlining immediate (take 3) (#44231)
Summary:
This is a re-up of https://github.com/pytorch/pytorch/issues/43885 with an extra commit which should fix the bugs that caused it to be reverted. Read that PR for general context.

The issue here was that we were still using the side maps `tensor_to_stmt_` and `stmt_to_tensor_` which get invalidated by any transform of the IR (rather than just any transform that isn't computeInline). I added a comment about this but didn't actually address our usages of it.

I've removed these maps and changed the `getLoopBodyFor` and `getLoopStatementsFor` helpers to search the root stmt directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44231

Reviewed By: albanD

Differential Revision: D23689688

Pulled By: nickgg

fbshipit-source-id: 1c6009a880f8c0cebf2300fd06b5cc9322bffbf9
2020-09-15 11:12:24 -07:00
Raghavan Raman
ad7a2eb1c9 Simplify nested Min and Max patterns. (#44142)
Summary:
Improve simplification of nested Min and Max patterns.

Specifically, handles the following pattern simplications:
  * `Max(A, Max(A, Const)) => Max(A, Const)`
  * `Max(Min(A, B), Min(A, C)) => Min(A, Max(B, C))`
  * `Max(Const, Max(A, OtherConst) => Max(A, Max(Const, OtherConst))`
     - This case can have an arbitrarily long chain of Max ops. For example: `Max(5, Max(x, Max(y, Max(z, 8)))) => Max(Max(Max(x, 8), y), z)`

Similarly, for the case of Min as well.
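These rewrites are instances of lattice identities that hold for any total order, so they can be spot-checked numerically. A brute-force check over small integers (illustrative only, not the simplifier itself):

```python
import itertools

# Spot-check the three identities behind the rewrites over small integers.
for a, b, c in itertools.product(range(-2, 3), repeat=3):
    assert max(a, max(a, c)) == max(a, c)                       # absorption
    assert max(min(a, b), min(a, c)) == min(a, max(b, c))       # distributivity
    assert max(5, max(a, 8)) == max(a, max(5, 8))               # constant regrouping
```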

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44142

Reviewed By: albanD

Differential Revision: D23644486

Pulled By: navahgar

fbshipit-source-id: 42bd241e6c2af820566744c8494e5dee172107f4
2020-09-14 13:24:46 -07:00
Bert Maher
6d4a605ce9 Fix bug simplifying if-then-else when it can be removed (#44462)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44462

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23671157

Pulled By: bertmaher

fbshipit-source-id: b9b92ad0de1a7bd9bc1fcac390b542d885d0ca58
2020-09-13 10:29:28 -07:00
Mikhail Zolotukhin
6474057c76 Revert D23503636: [pytorch][PR] [NNC] make inlining immediate (take 2) and fix bugs
Test Plan: revert-hammer

Differential Revision:
D23503636 (70aecd2a7f)

Original commit changeset: cdbdc902b7a1

fbshipit-source-id: b5164835f874a56213de4bed9ad690164eae9230
2020-09-04 10:58:23 -07:00
Nick Gibson
70aecd2a7f [NNC] make inlining immediate (take 2) and fix bugs (#43885)
Summary:
A rework of `computeInline` which makes it work a bit better, particularly when combined with other transformations. Previously we stored Functions that were inlined and then deferred the actual inlining of the function body until prepareForCodegen was called. This has an issue when transformations are applied to the LoopNest: the function body can be different from what appears in the root_stmt, and result in inlining that a) fails, b) reverses other transformations, or c) does a weird, unpredictable combination of the two.

This PR changes that behaviour so that the inlining occurs in the root stmt immediately, which means it reflects any previous transformations and any future transformations have a true view of the internal IR. It also has the benefit that inspecting the root statement gives an accurate view of it without needing to call prepareForCodegen. I also removed the difference between `computeInline` and `computeInlineWithRand` and we handle calls to `rand()` in all branches.

This is a rework of https://github.com/pytorch/pytorch/issues/38696, with the agreed changes from ZolotukhinM and zheng-xq: we should only inline if the dimensions are trivial (ie. they are vars not exprs).

This PR is mostly tests, and I fixed a bunch of bugs I found along the way. Partial list:
* When inlining an expression involving rand, we would create random vars equal to the dimensionality of the enclosing Tensor not the produced Tensor - meaning we'd use an incorrect value if the inlined tensor was smaller. E.g: `X[i] = rand(); A[i, j] = X[i]` would produce a tensor where `A[0, 0] != A[0, 1]`. This is fixed by inserting the Let binding of the random variable at the correct loop body.
* When inlining we'd replace all calls to `rand()` rather than just those present in the Tensor being inlined.
* `rand()` was treated symbolically by the simplifier and we would aggregate or cancel calls to `rand()`. Have fixed the hasher to hash all calls to `rand()` distinctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43885

Reviewed By: gmagogsfm

Differential Revision: D23503636

Pulled By: nickgg

fbshipit-source-id: cdbdc902b7a14d269911d978a74a1c11eab004fa
2020-09-03 16:49:24 -07:00
Nick Gibson
1390cad2d8 [NNC] Hook up registerizer to Cuda codegen [2/x] (#42878)
Summary:
Insert the registerizer into the Cuda Codegen pass list, to enable scalar replacement and close the gap in simple reduction performance.

First up the good stuff, benchmark before:
```
          Column sum          Caffe2             NNC          Simple          Better
           (10, 100)          5.7917          9.7037          6.9386          6.0448
          (100, 100)          5.9338          14.972          7.1139          6.3254
        (100, 10000)          21.453          741.54          145.74          12.555
        (1000, 1000)          8.0678          122.75          22.833          9.0778

             Row sum          Caffe2             NNC          Simple          Better
           (10, 100)          5.4502          7.9661          6.1469          5.5587
          (100, 100)          5.7613          13.897           21.49          5.5808
        (100, 10000)          21.702          82.398          75.462          22.793
        (1000, 1000)          22.527             129          176.51          22.517

```

After:
```
          Column sum          Caffe2             NNC          Simple          Better
           (10, 100)          6.0458          9.4966          7.1094           6.056
          (100, 100)          5.9299          9.1482          7.1693           6.593
        (100, 10000)          21.739          121.97          162.63          14.376
        (1000, 1000)          9.2374           29.01          26.883          10.127

             Row sum          Caffe2             NNC          Simple          Better
           (10, 100)          5.9773          8.1792          7.2307          5.8941
          (100, 100)          6.1456          9.3155          24.563          5.8163
        (100, 10000)          25.384          30.212          88.531          27.185
        (1000, 1000)          26.517          32.702          209.31          26.537
```

Speedup about 3-8x depending on the size of the data (increasing with bigger inputs).

The gap between NNC and simple is closed or eliminated - remaining issue appears to be kernel launch overhead. Next up is getting us closer to the _Better_ kernel.

It required a lot of refactoring and bug fixes on the way:
* Refactored flattening of parallelized loops out of the CudaPrinter and into its own stage, so we can transform the graph in the stage between flattening and printing (where registerization occurs).
* Made AtomicAddFuser less pessimistic, it will now recognize that if an Add to a buffer is dependent on all used Block and Thread vars then it has no overlap and does not need to be atomic. This allows registerization to apply to these stores.
* Fixed PrioritizeLoad mutator so that it does not attempt to separate the Store and Load to the same buffer (i.e. reduction case).
* Moved CudaAnalysis earlier in the process, allowing later stages to use the analyzed bufs.
* Fixed a bug in the Registerizer where when adding a default initializer statement it would use the dtype of the underlying var (which is always kHandle) instead of the dtype of the Buf.
* Fixed a bug in the IRMutator where the logic for replacing Allocate statements was inverted, so they were replaced only if they did not change.
* Added simplification of simple Division patterns to the IRSimplifier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42878

Reviewed By: glaringlee

Differential Revision: D23382499

Pulled By: nickgg

fbshipit-source-id: 3640a98fd843723abad9f54e67070d48c96fe949
2020-08-31 10:39:46 -07:00
Alex Suhan
f20a04fa2d [TensorExpr] Simplify conditional select (#43350)
Summary:
Fold conditional select when both sides are constant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43350

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.ConditionalSelectFold*

Reviewed By: pbelevich

Differential Revision: D23256602

Pulled By: asuhan

fbshipit-source-id: ec04b1e4ae64f59fa574047f2d7af55a717a5262
2020-08-21 11:15:48 -07:00
Nick Gibson
6fb5ce5569 [NNC] Fix some bugs in Round+Mod simplification (#42934)
Summary:
When working on the Cuda Codegen, I found that running the IRSimplifier before generating code led to test failures. This was due to a bug in Round+Mod simplification (e.g. `(x / y * y) + (x % y) => x`) to do with the order in which the terms appeared. After fixing it and writing a few tests around those cases, I found another bug in simplification of the same pattern and have fixed it (with some more test coverage).
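The identity behind the rewrite, `(x / y) * y + (x % y) == x`, holds for C-style (truncating) integer division in either term order; the bug was that the pattern matcher was sensitive to which term appeared first. A brute-force check using truncating division modeled in Python (illustrative only):

```python
# Model C semantics: / truncates toward zero, % takes the sign of the dividend.
def c_div(x, y):
    q = abs(x) // abs(y)
    return q if (x >= 0) == (y >= 0) else -q

def c_mod(x, y):
    return x - c_div(x, y) * y

# The identity holds regardless of which term comes first.
for x in range(-20, 21):
    for y in (1, 3, 7, -4):
        assert c_div(x, y) * y + c_mod(x, y) == x
        assert c_mod(x, y) + c_div(x, y) * y == x
```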

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42934

Reviewed By: zhangguanheng66

Differential Revision: D23085548

Pulled By: nickgg

fbshipit-source-id: e780967dcaa7a5fda9f6d7d19a6b7e7b4e94374b
2020-08-13 09:47:21 -07:00