pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
Mikhail Zolotukhin	f3743f097f	[TensorExpr] Nuke tensorexpr::ScalarType and instead use c10::ScalarType directly. (#56825 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56825 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D27977461 Pulled By: ZolotukhinM fbshipit-source-id: f8a72938ba395e426e2d9449627113abb1c9c34f	2021-04-26 01:51:21 -07:00
Tugsbayasgalan Manlaibaatar	2041cd6707	Enable forward/backward compatibility in TS mobile (#56079 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56079 Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D27828149 Pulled By: tugsbayasgalan fbshipit-source-id: 9291ddbf01853354fca0fa0a58b8115d5d2294da	2021-04-23 16:55:18 -07:00
Rong Rong (AI Infra)	5288d05cfd	Revert D27958477: [PyTorch][Edge] Add v4 and v5 models and remove unused model Test Plan: revert-hammer Differential Revision: D27958477 (`2e4c68a727`) Original commit changeset: 2e6f985a988d fbshipit-source-id: 520cb8a353d91cd26cb27880a0a8e27dbfcd2d99	2021-04-23 14:42:01 -07:00
Raghavan Raman	5b7317b562	[NNC] API for Buffer Compression (#55853 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/54338 This PR adds the following API in NNC to implement "buffer compression". ``` static void compressBuffer(Buf* buf, Stmt* stmt); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/55853 Reviewed By: ezyang Differential Revision: D27960986 Pulled By: navahgar fbshipit-source-id: a69988e607196f3e2db0212313ea5deefb9859ac	2021-04-23 14:12:03 -07:00
Chen Lai	2e4c68a727	[PyTorch][Edge] Add v4 and v5 models and remove unused model (#56751 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56751 ## Summary 1. Add two models (v4 and v5) for testing runtime. (v5 will be introduced in https://github.com/pytorch/pytorch/pull/56002) 2. Remove an unused model. Side note: these binaries are part of the test in https://github.com/pytorch/pytorch/pull/56002, and currently there is an ongoing issue to `ghexport` with binaries (post is https://fb.workplace.com/groups/533197713799375/permalink/1130109004108240/). `ghimport` can work with binary after checking temporary diff (D23336574). Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D27958477 Pulled By: cccclai fbshipit-source-id: 2e6f985a988da55ad08fb9a5037434a2b6db0776	2021-04-23 11:52:42 -07:00
Raghavan Raman	d43d6593cd	[NNC] Handling conditionals in reorderAxis (#56063 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56063 Reviewed By: huiguoo Differential Revision: D27894772 Pulled By: navahgar fbshipit-source-id: 403b65f20567c27eab73faf670087cfab9885f84	2021-04-21 09:35:17 -07:00
Bert Maher	c91c4a081d	[NNC] Horizontally fuse all loops (#56324 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56324 Inlining is great if LLVM's CSE kicks in; but if a kernel has multiple outputs (and thus multiple loops), CSE has no chance. So, this pass "horizontally" fuses the output loops together so that CSE can go to town. Essentially we want to turn ``` for (...) { output_1[] = some_complicated_expr... } for (...) { output_2[] = some_complicated_expr... } ``` Into: ``` for (...) { output_1[] = complicated_expr output_2[] = complicated_expr. // llvm cse should take care of this } ``` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D27841194 Pulled By: bertmaher fbshipit-source-id: 54153bb59786be87183c636d64f05963c4b1624a	2021-04-20 23:54:40 -07:00
Raghavan Raman	13ac0019ae	[NNC] Update loop-carried dependence check to handle all known dependences (#56354 ) Summary: This PR includes: * Update to the loop-carried dependence check API to correctly ignore loop-independent dependences and handle all kinds of loop-carried dependences like RAW, WAR and WAW. * Fix for the overlap API to look only for conflicting buffer accesses where at least one of them is a Store. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56354 Reviewed By: bertmaher Differential Revision: D27856202 Pulled By: navahgar fbshipit-source-id: 206e4ec771fe0f7f2ccf4b11b29e35df7b9b18bc	2021-04-20 17:12:51 -07:00
Ailing Zhang	1d8053655d	Rename AutoNonVariableTypeMode to AutoDispatchBelowAutograd and add a warning. (#56422 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56422 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D27866608 Pulled By: ailzhang fbshipit-source-id: 507bbcaa4c25edf23e67162780efaa70f64ad14a	2021-04-20 17:04:08 -07:00
Lucas Hosseini	8868f9c8e3	[TensorPipe] Use targetDevice in tensorpipe_agent. (#56346 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56346 Now that TensorPipe's API has `targetDevice`, use that instead of manually writing the CUDA device index in `metadata`. Test Plan: CI Reviewed By: lw Differential Revision: D27703235 fbshipit-source-id: c5b620e3b3ce619367412efdbe9fa3778f6b8869	2021-04-20 11:54:13 -07:00
davidriazati@fb.com	4e0760f41a	Remove `is_variable` from tests (#56305 ) Summary: `is_variable` spits out a deprecation warning during the build (if it's still something that needs to be tested we can ignore deprecated warnings for the whole test instead of this change). Pull Request resolved: https://github.com/pytorch/pytorch/pull/56305 Pulled By: driazati Reviewed By: ezyang Differential Revision: D27834218 fbshipit-source-id: c7bbea7e9d8099bac232a3a732a27e4cd7c7b950	2021-04-20 09:03:53 -07:00
Alban Desmaison	63dac82444	Make grad mode error just a warning (#56401 ) Summary: Temporary fix to give people extra time to finish the deprecation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56401 Reviewed By: xw285cornell, drdarshan Differential Revision: D27862196 Pulled By: albanD fbshipit-source-id: ed460267f314a136941ba550b904dee0321eb0c6	2021-04-20 06:30:55 -07:00
Raghavan Raman	0d94c04247	[NNC] Change fuseLoops API to return bool flag and not throw any exceptions (#56353 ) Summary: Partial fix for https://github.com/pytorch/pytorch/issues/56357 Changes the `fuseLoops` API to the following form: ``` static bool fuseLoops(const std::vector<For>& loops, For* fused); ``` Also, adds a new API to check for loop-carried dependences: ``` static bool hasLoopCarriedDependence(For* loop); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56353 Reviewed By: bertmaher Differential Revision: D27856214 Pulled By: navahgar fbshipit-source-id: 443557088692585657faee296602c547a00117dd	2021-04-19 17:20:40 -07:00
Ailing Zhang	98162cb0bb	Enable AutoGradMode in InferenceMode. (#56107 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56107 Test Plan: Imported from OSS Reviewed By: pbelevich, driazati Differential Revision: D27807137 Pulled By: ailzhang fbshipit-source-id: bfacf11ec5a431589cec73d6371cac81b425a115	2021-04-19 10:24:20 -07:00
Raghavan Raman	b387f7ca47	[NNC] Make normalization transformation in-place (#56158 ) Summary: Partially fixes https://github.com/pytorch/pytorch/issues/56157 This PR changes `normalize` API in `LoopNest` to transform the given `For` statement and not create a new one. New API: ``` static bool normalize(For* f); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56158 Reviewed By: agolynski Differential Revision: D27798361 Pulled By: navahgar fbshipit-source-id: 57626a5a367bdf94a0efbd9dc8538f5e4e410d6b	2021-04-18 23:54:13 -07:00
Raghavan Raman	29c5cb797d	[NNC] Fuse loops that have the same bounds as expressions (#55997 ) Summary: This PR allows fusing loops whose bounds are specified as expressions that are equal. For example: ``` for (int j = 0; j < M + N; j++) { A[j] = 10 * j; } for (int k = 0; k < M + N; k++) { B[k] = 20 * k; } ``` `fuseLoops(j, k)` is possible since the stop bounds of the two loops are equal though they are different `Expr` and will result in: ``` for (int j = 0; j < M + N; j++) { A[j] = 10 * j; B[j] = 20 * j; } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/55997 Reviewed By: bertmaher Differential Revision: D27841270 Pulled By: navahgar fbshipit-source-id: a64e4503b7f8f28bc0c9823225bc923177bb4c2e	2021-04-18 11:14:26 -07:00
Mikhail Zolotukhin	85126629a5	[TensorExpr] Add support for constant tensors in tensorexpr kernel. (#56319 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56319 With this change the TorchScript graph can have constant tensors in it and we still will be able to lower it to TE. The constants are registered (or bound) within the `TensorExprKernel` object and when the codegen is called, they are passed along with usual inputs and outputs. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D27838747 Pulled By: ZolotukhinM fbshipit-source-id: 4a519d66fcc07fe5fa53f5cf9af28d25611f8437	2021-04-17 11:15:35 -07:00
Bert Maher	8e82e932f3	Reland: D27652485: [nnc] Enable CPU fusion only when num_threads == 1" (#56120 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120 This reverts commit `ad17fadbfc` (D27786457). The big annoyance here is that depending on the threading mode you may not be able to toggle num_threads at will, so the fusion tests won't fail. I hate this solution, but I'm adding a secondary override for the TE fuser. Now you need to both turn on fusion (_jit_override_can_fuse_on_cpu), and you're OK if you're running with 1 thread, or you can add `_jit_set_texpr_parallel_cpu_enabled` to enable it anyways. This is (a) mainly for tests, since a real user probably won't fiddle aimlessly with the thread count, and (b) will go away once NNC's threading support is fully baked. Test Plan: Imported from OSS Reviewed By: Krovatkin Differential Revision: D27788199 Pulled By: bertmaher fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1	2021-04-15 15:50:18 -07:00
Lucas Hosseini	3802e577fb	[TensorPipe] Use Descriptor::Tensor::sourceDevice in tensorpipe_agent. (#55821 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55821 Test Plan: CI Reviewed By: lw Differential Revision: D27661608 fbshipit-source-id: fd241f073d8928528a749758c7d0f570dfeb677b	2021-04-15 03:21:26 -07:00
Lucas Hosseini	047164437e	[TensorPipe] Prepare for new Pipe API. (#55820 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55820 Test Plan: CI Reviewed By: lw Differential Revision: D27648291 fbshipit-source-id: e08db6e8c1f5f333ec355de29e25fbe552904b25	2021-04-15 03:20:32 -07:00
Mikhail Zolotukhin	556dfcb0db	[TensorExpr] Re-enable "LoopNest.VectorizeUse" test. (#56094 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56094 Now FunctionCalls are merged with Loads and vectorization for intermediate values automatically started to work. Fixes #53553. Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D27781519 Pulled By: ZolotukhinM fbshipit-source-id: 1ed68ca2399e9bd4598639bd6dd8f369365f0ef0	2021-04-14 21:39:03 -07:00
Natalia Gimelshein	ad17fadbfc	Revert D27652485: [nnc] Enable CPU fusion only when num_threads == 1 Test Plan: revert-hammer Differential Revision: D27652485 (`e7e164f9e6`) Original commit changeset: 182580cf758d fbshipit-source-id: e3c95b06d1eef668095f3cf461485395179d94af	2021-04-14 20:23:15 -07:00
Kurt Mohler	3fe4718d16	Add `padding_idx` argument to EmbeddingBag (#49237 ) Summary: This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction. This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided. Fixes https://github.com/pytorch/pytorch/issues/3194 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237 Reviewed By: walterddr, VitalyFedyunin Differential Revision: D26948258 Pulled By: jbschlosser fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc	2021-04-14 09:38:01 -07:00
Bert Maher	e7e164f9e6	[nnc] Enable CPU fusion only when num_threads == 1 (#55621 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621 Fuser support for thread-level parallelism is a work in progress, so only fuse when the program is running single-threaded. ghstack-source-id: 126069259 Test Plan: observe fusion groups formed when torch.get_num_threads==1 vs not Reviewed By: ZolotukhinM Differential Revision: D27652485 fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef	2021-04-14 09:16:54 -07:00
Mikhail Zolotukhin	7ab654afd7	[TensorExpr] Rename `Tensor::call` to `Tensor::load` to be consistent with `Buf` and `Placeholder`. (#55826 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55826 It's a mechanical change. Differential Revision: D27717777 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: fbc1bb99602250c706cf2c8c2684119c323e4d51	2021-04-13 12:08:53 -07:00
Mikhail Zolotukhin	1263448cb2	[TensorExpr] Remove mask field from Load and Store classes. (#55825 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825 The mask has never been used (in vectorization we generate an explicit `IfThenElse` construct when we need to mask out some elements). The PR removes it and cleans up all its traces from tests. Differential Revision: D27717776 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db	2021-04-13 12:08:51 -07:00
Mikhail Zolotukhin	b01a15d3d3	[TensorExpr] Redesign Rfactor loopnest transformation. (#55324 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55324 With this change `rfactor` only affects the passed loop and its body never touching anything outside (that was a rootcause of a bug with the previous implementation). Also, we don't have an `insertion_point` parameter anymore - its meaning was vague, and the effect of it should've been achievable with other transformations anyway. The new `rfactor` semantics is as follows: ``` Requirements: * S is the reduction store * S is the only statement in the innermost loop * There is at least two reduction arguments in S * OUTER_REDUCTION_FOR loop corresponds to the outermost reduction variable used in the store and all other reduction variables are index variables of children loops of OUTER_REDUCTION_FOR * OUTER_REDUCTION_FOR is a perfect loop nest, i.e. it has only loops corresponding to the other reduction variables and the store, nested into each other What it does: * Introduce a new buffer with an extra dimension of a size equal to the span of the loop OUTER_REDUCTION_FOR (the new buffer is returned via RFAC_BUF_PTR) * Insert an initialization store for the new buffer in OUTER_REDUCTION_FOR before its nested loop * Replace the reduction store to the original buffer with the reduction store to the temp buffer, removing the index var of OUTER_REDUCTION_FOR from reduction arguments * Insert a final reduction store over the extra dimension of the new buffer to the original buffer * Returns TRUE if the transformation succeeded and FALSE otherwise Example: Original IR: S1: for i # normal axis S2: X[i] = 0 S3: for j # reduction axis S4: for k # reduction axis S5: X[i] = ReduceOp(X[i] + Y[i,j,k], reduce_axis={j,k}) After RFACTOR(S5, S3) S1: for i # normal axis S2: X[i] = 0 S3: for j # reduction axis for X, normal axis for X_rfac X_rfac[i,j] = 0 S4: for k # reduction axis X_rfac[i,j] = ReduceOp(X_rfac[i,j] + Y[i,j,k], reduce_axis={k}) X[i] = ReduceOp(X[i] + X_rfac[i,j], reduce_axis={j}) ``` Differential Revision: D27694960 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 076fa6a1df2c23f5948302aa6b43e82cb222901c	2021-04-13 12:08:48 -07:00
Raghavan Raman	d805908c34	[NNC] API to reorder multiple loops (#55568 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/52690 This PR adds the following APIs: ``` static bool areLoopsPerfectlyNested(const std::vector<For>& loops); static std::vector<For> reorder( const std::vector<For*>& loops, const std::vector<size_t>& permutation); ``` The first API checks if the given list of loops are perfectly nested. The second API reorders the given list of loops according to the permutation specified. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55568 Reviewed By: albanD Differential Revision: D27689734 Pulled By: navahgar fbshipit-source-id: dc1bffdbee068c3f401188035772b41847cbc7c6	2021-04-12 18:12:24 -07:00
Yukio Siraichi	93bf0ae6fc	Remove legacy constructor calls from pytorch codebase. (#54142 ) Summary: Follow up from https://github.com/pytorch/pytorch/issues/53889 Related to https://github.com/pytorch/pytorch/issues/47112 Removing every occurrence of the legacy constructor call present in PyTorch at: - _docs_ - _benchmarks_ - _test_ - _caffe2_ - _CONTRIBUTING.md_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142 Reviewed By: ngimel Differential Revision: D27699450 Pulled By: mruberry fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546	2021-04-11 15:45:17 -07:00
Ailing Zhang	6842da6251	[WIP]Relax some limitations of InferenceMode. (#54403 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54403 A few important points about InferenceMode behavior: 1. All tensors created in InferenceMode are inference tensors, except for the outputs of view ops. - View ops produce output that has the same is_inference_tensor property as their input. Namely, a view of a normal tensor inside InferenceMode produces a normal tensor, which is exactly the same as creating a view inside NoGradMode, and a view of an inference tensor outside InferenceMode produces an inference tensor as output. 2. All ops are allowed inside InferenceMode, and they run faster than in normal mode. 3. Inference tensors cannot be saved for backward. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D27316483 Pulled By: ailzhang fbshipit-source-id: e03248a66d42e2d43cfe7ccb61e49cc4afb2923b	2021-04-09 14:40:37 -07:00
Joel Schlosser	defc649eca	Update to short forms of splitWithTail / splitWithMask (#55542 ) Summary: Switched to short forms of `splitWithTail` / `splitWithMask` for all tests in `test/cpp/tensorexpr/test_*.cpp` (except test_loopnest.cpp) Pull Request resolved: https://github.com/pytorch/pytorch/pull/55542 Reviewed By: mrshenli Differential Revision: D27632033 Pulled By: jbschlosser fbshipit-source-id: dc2ba134f99bff8951ae61e564cd1daea92c41df	2021-04-09 10:15:20 -07:00
Bert Maher	90f848572c	NNC depthwise conv2d implementation (#54920 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54920 Add a depthwise convolution implementation and reasonably good schedules for 3x3 stride=1,2. ghstack-source-id: 126076113 Test Plan: new tensorexpr test: Conv.DepthwiseConv2D Reviewed By: ZolotukhinM Differential Revision: D27413745 fbshipit-source-id: 833da6072b655fbe2b679704e9d56a08e1bf7e7e	2021-04-08 21:56:53 -07:00
Jeffrey Wan	3f9492c8b3	[Hackathon] Modernize API used in NNC C++ tests (1/3) (#55512 ) Summary: Partially fixes https://github.com/pytorch/pytorch/issues/55203 Fixes issues (1) and (2) in the following tests: tests in test/cpp/tensorexpr/test_loopnest.cpp from the beginning to LoopNestReorderLongStringFull (including) Pull Request resolved: https://github.com/pytorch/pytorch/pull/55512 Reviewed By: mrshenli Differential Revision: D27630679 Pulled By: soulitzer fbshipit-source-id: b581aaea4f5f54b3285f0348aa76e99779418f80	2021-04-08 08:34:25 -07:00
Brian Hirsh	dd2bccafc5	nnc hackathon - use new APIs in tests (#55497 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55497 Migrating some of the NNC API's used in testing, from this issue: https://github.com/pytorch/pytorch/issues/55203 I covered the second half of `test_loopnest.cpp`, and migrated (1) and (2) in the above issue: `LoopNest::getLoopStmtsFor`, `splitWithTail`, and `splitWithMask` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D27628625 Pulled By: bdhirsh fbshipit-source-id: ec15efba45fae0bbb442ac3577fb9ca2f8023c2d	2021-04-07 13:03:25 -07:00
Martin Yuan	3551bd31be	[PyTorch] Lite interpreter with a backend delegate (#54462 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54462 Unclean files during sync - Sat Mar 20 04:00:02 PDT 2021 Unclean files during sync - Sun Mar 21 04:00:01 PDT 2021 ghstack-source-id: 124585992 Test Plan: ``` buck run xplat/caffe2/fb/test/delegate:interpreter_test -- --model_file_path=/path/to/mobile_model.ptl ``` Reviewed By: raziel Differential Revision: D27232309 fbshipit-source-id: 8504a3185339d73bfa6e924485c4745acf269cec	2021-04-06 00:55:26 -07:00
Nikitha Malgi	197f9f0826	Merge CUDA Streams and Events (#53902 ) Summary: ----------- - Updates current_stream and default stream API's to take `optional[device]` argument - Adds parsing logic to replace `torch.cuda.Stream` and `torch.cuda.Event` -> `torch.classes.cuda.Stream` and `torch.classes.cuda.Event` for JIT - Merges StreamContext manager for both Eager and JIT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53902 Test Plan: ------ Run JIT tests: python test/test_jit.py -v TestCUDA Run eager tests: python test/test_cuda.py -v TestCuda Reviewed By: glaringlee Differential Revision: D27494627 Pulled By: nikithamalgifb fbshipit-source-id: b30b0570e38a33fb335c83762eb06ffd46a44b5c	2021-04-05 08:19:55 -07:00
Louis Feng	159fdde9ae	Support needsOutputs for RecordFunction and ObserverUtil improvements (#55012 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55012 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442 Added needsOutputs support to RecordFunction, improved ObserverUtil functions to handle list data. Minor refactor names to be consistent. To get output data from kernel calls, we need to temporarily capture them before passing them to the record function. Then the results are released to function return. We handle two cases, for unboxed and boxed kernels. The boxed version is fairly simple since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels. For optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)` and should not affect other observers or when the observer is not enabled. Test Plan: ``` => buck build //caffe2/test/cpp/jit: --show-output => buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest* CUDA not available. Disabling CUDA and MultiCUDA tests Note: Google Test filter = RecordFunctionTest-_CUDA:*_MultiCUDA [==========] Running 7 tests from 1 test case. [----------] Global test environment set-up. [----------] 7 tests from RecordFunctionTest [ RUN ] RecordFunctionTest.TracedTestInputsOutputs [ OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms) [ RUN ] RecordFunctionTest.SampledCallbacks [ OK ] RecordFunctionTest.SampledCallbacks (771 ms) [ RUN ] RecordFunctionTest.RecordFunctionGuard [ OK ] RecordFunctionTest.RecordFunctionGuard (0 ms) [ RUN ] RecordFunctionTest.Callbacks [ OK ] RecordFunctionTest.Callbacks (2 ms) [ RUN ] RecordFunctionTest.ShouldRun [ OK ] RecordFunctionTest.ShouldRun (0 ms) [ RUN ] RecordFunctionTest.Basic [ OK ] RecordFunctionTest.Basic (1 ms) [ RUN ] RecordFunctionTest.OperatorNameOverload [ OK ] RecordFunctionTest.OperatorNameOverload (1 ms) [----------] 7 tests from RecordFunctionTest (1001 ms total) [----------] Global test environment tear-down [==========] 7 tests from 1 test case ran. (1002 ms total) [ PASSED ] 7 tests. ``` Reviewed By: ilia-cher Differential Revision: D27449877 fbshipit-source-id: 69918b729565f5899471d9db42a587f9af52238d	2021-04-02 15:16:17 -07:00
Maxim Grechkin	38a08a49ea	Flip clip_grad_norm default for error_if_nonfinite to false (#55169 ) Summary: Non-backwards-compatible change introduced in https://github.com/pytorch/pytorch/pull/53843 is tripping up a lot of code. Better to set it to False initially and then potentially flip to True in the later version to give people time to adapt. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55169 Reviewed By: mruberry Differential Revision: D27511150 Pulled By: jbschlosser fbshipit-source-id: 1ac018557c0900b31995c29f04aea060a27bc525	2021-04-02 12:25:32 -07:00
Lucas Hosseini	09f1f14569	Transition to new tensorpipe::Pipe API. (#55193 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55193 Test Plan: CI Reviewed By: lw Differential Revision: D27466387 fbshipit-source-id: 07b831d699f56874dd45f37e448b8c4244ead5e3	2021-04-02 02:28:07 -07:00
Mikhail Zolotukhin	0b75f862c7	[TensorExpr] Nuke FunctionCall. (#54998 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54998 The only reason why we couldn't use Load instead of FunctionCall was DepTracker. Now this is gone and we finally could replace FunctionCall with Load. Test Plan: Imported from OSS Reviewed By: bertmaher, pbelevich Differential Revision: D27446412 Pulled By: ZolotukhinM fbshipit-source-id: 9183ae5541c2618abc9026b1dc4c4c9fab085d47	2021-04-01 19:47:59 -07:00
Mikhail Zolotukhin	688e350725	[TensorExpr] Nuke DepTracker and findAllNeededTensors. (#54997 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54997 DepTracker was used to automatically pull in dependent computations from output ones. While it seems quite convenient, it's led to several architectural issues, which are fixed in this stack. DepTracker worked on Tensors, which is a pair of Buf and Stmt. However, Stmt could become stale and there was no way to reliably update the corresponding tensor. We're now using Bufs and Stmts directly and moving away from using Tensors to avoid these problems. Removing DepTracker allowed to unify Loads and FunctionCalls, which essentially were duplicates of each other. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D27446414 Pulled By: ZolotukhinM fbshipit-source-id: a2a32749d5b28beed92a601da33d126c0a2cf399	2021-04-01 19:46:26 -07:00
Lucas Hosseini	9d6a81d1a6	Avoid aggregate initialization for tensorpipe::{Cpu,Cuda}Buffer and tensorpipe::Message::Tensor. (#55136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55136 This will ease the transition to the new API where `Buffer` does not store a length anymore. Test Plan: CI Reviewed By: lw Differential Revision: D27466385 fbshipit-source-id: 9a167f8c501455a3ab49ce75257c69d8b4869925	2021-04-01 06:55:02 -07:00
Ailing Zhang	43d4f3b8d0	Implement public API InferenceMode and its error handling (#55008 ) Summary: https://www.internalfb.com/phabricator/paste/view/P360377337 Pull Request resolved: https://github.com/pytorch/pytorch/pull/53343 For easier review, here's a diff between the version before revert. https://www.internalfb.com/phabricator/paste/view/P360750919 Pull Request resolved: https://github.com/pytorch/pytorch/pull/55008 Test Plan: Imported from OSS Pulled By: ailzhang Reviewed By: bhosmer Differential Revision: D27443229 fbshipit-source-id: 01b03446a1f6373f43dd5c7170d26226b50f363c	2021-03-31 10:48:00 -07:00
Jianyu Huang	7fc03dd7c9	Back out "[pytorch][PR] Merge CUDA Streams and Events" (#54996 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54996 Original commit changeset: 45d9fee9a582 Test Plan: CI Reviewed By: jspark1105 Differential Revision: D27444718 fbshipit-source-id: deb627230817923eaf84ade50ecb14bfbce4e779	2021-03-31 10:21:35 -07:00
Jacob Szwejbka	a0ae3e520f	[Pytorch Mobile] 'fix' filter of named parameters for FL (#54633 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54633 There's currently no information that could be used to determine what is a parameter during the loading of a mobile module. This prevents named parameters from functioning correctly. This change is a temporary hack to help out federated learning, currently the sole user of this API. ghstack-source-id: 124885201 Test Plan: todo Reviewed By: dhruvbird Differential Revision: D27308738 fbshipit-source-id: 0af5d1e8381ab7b7a43b20560941aa070a02e7b8	2021-03-31 09:21:35 -07:00
Qi Zhao	5b448cf21a	Revert D25966661: Support needsOutputs for RecordFunction and ObserverUtil improvements Test Plan: revert-hammer Differential Revision: D25966661 (`0e43a73f76`) Original commit changeset: 707886e1f212 fbshipit-source-id: a4e4af29abf622c1e0aaaf7dfb019c045988b4bc	2021-03-30 15:41:12 -07:00
Louis Feng	0e43a73f76	Support needsOutputs for RecordFunction and ObserverUtil improvements (#54442 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442 Added needsOutputs support to RecordFunction, improved ObserverUtil functions to handle list data. Minor refactor names to be consistent. To get output data from kernel calls, we need to temporarily capture them before passing them to the record function. Then the results are released to function return. We handle two cases, for unboxed and boxed kernels. The boxed version is fairly simple since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels. For optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)` and should not affect other observers or when the observer is not enabled. Test Plan: ``` => buck build //caffe2/test/cpp/jit: --show-output => buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest* CUDA not available. Disabling CUDA and MultiCUDA tests Note: Google Test filter = RecordFunctionTest-_CUDA:*_MultiCUDA [==========] Running 7 tests from 1 test case. [----------] Global test environment set-up. [----------] 7 tests from RecordFunctionTest [ RUN ] RecordFunctionTest.TracedTestInputsOutputs [ OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms) [ RUN ] RecordFunctionTest.SampledCallbacks [ OK ] RecordFunctionTest.SampledCallbacks (771 ms) [ RUN ] RecordFunctionTest.RecordFunctionGuard [ OK ] RecordFunctionTest.RecordFunctionGuard (0 ms) [ RUN ] RecordFunctionTest.Callbacks [ OK ] RecordFunctionTest.Callbacks (2 ms) [ RUN ] RecordFunctionTest.ShouldRun [ OK ] RecordFunctionTest.ShouldRun (0 ms) [ RUN ] RecordFunctionTest.Basic [ OK ] RecordFunctionTest.Basic (1 ms) [ RUN ] RecordFunctionTest.OperatorNameOverload [ OK ] RecordFunctionTest.OperatorNameOverload (1 ms) [----------] 7 tests from RecordFunctionTest (1001 ms total) [----------] Global test environment tear-down [==========] 7 tests from 1 test case ran. (1002 ms total) [ PASSED ] 7 tests. ``` Reviewed By: ilia-cher Differential Revision: D25966661 fbshipit-source-id: 707886e1f212f40ba16a1fe292ea7dd33f2646e3	2021-03-30 14:26:22 -07:00
Sam Estep	5bcbbf5373	Lint trailing newlines (#54737 ) Summary: Context: https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines. The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR: - `.github/workflows/lint.yml` - `mypy-strict.ini` - `tools/README.md` - `tools/test/test_trailing_newlines.py` - `tools/trailing_newlines.py` I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository): - [How to detect file ends in newline?](https://stackoverflow.com/q/38746) - [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068) - [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800) - [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632) - [git ensure newline at end of each file](https://stackoverflow.com/q/57770972) To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737 Test Plan: Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR: - https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true In contrast, this run (after correcting the trailing newlines in this PR) succeeded: - https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241 To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow): ``` python tools/test/test_trailing_newlines.py ``` Reviewed By: malfet Differential Revision: D27409736 Pulled By: samestep fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19	2021-03-30 13:09:52 -07:00
Ailing Zhang	263180d7fc	Revert D26973911: Implement public API InferenceMode and its error handling Test Plan: revert-hammer Differential Revision: D26973911 (`7caa464631`) Original commit changeset: 0ebdac7a3cd5 fbshipit-source-id: afd37a3785bc694e8ffbd679eba1cfed89ef2273	2021-03-29 11:17:49 -07:00
Kurt Mohler	3ddc6174da	Raise error in clip_grad_norm_ if norm is non-finite (#53843 ) Summary: BC-breaking note: This change throws errors for cases that used to silently pass. The old behavior can be obtained by setting `error_if_nonfinite=False` Fixes https://github.com/pytorch/pytorch/issues/46849 Pull Request resolved: https://github.com/pytorch/pytorch/pull/53843 Reviewed By: malfet Differential Revision: D27291838 Pulled By: jbschlosser fbshipit-source-id: 216d191b26e1b5919a44a3af5cde6f35baf825c4	2021-03-29 08:41:21 -07:00
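
The `rfactor` semantics described in commit b01a15d3d3 above can be illustrated with a small model in plain Python. This is only a sketch of the "Original IR" and "After RFACTOR(S5, S3)" loop nests from that commit message, not NNC code: nested lists stand in for buffers, `+` stands in for `ReduceOp`, and the function names `reduce_original` and `reduce_rfactored` are hypothetical, chosen for illustration. Inputs are assumed to be non-empty, rectangular nested lists.

```python
# Plain-Python model of the rfactor rewrite; '+' plays the role of ReduceOp.
# Assumption: Y is a non-empty rectangular I x J x K nested list.

def reduce_original(Y):
    """Original IR: X[i] is reduced over both axes j and k."""
    I, J, K = len(Y), len(Y[0]), len(Y[0][0])
    X = [0] * I
    for i in range(I):                  # S1: normal axis
        X[i] = 0                        # S2: initialization
        for j in range(J):              # S3: reduction axis
            for k in range(K):          # S4: reduction axis
                X[i] = X[i] + Y[i][j][k]    # S5: reduction store
    return X


def reduce_rfactored(Y):
    """After RFACTOR(S5, S3): partial sums go through X_rfac[i][j]."""
    I, J, K = len(Y), len(Y[0]), len(Y[0][0])
    X = [0] * I
    # New buffer with an extra dimension sized by the span of the j loop.
    X_rfac = [[0] * J for _ in range(I)]
    for i in range(I):                  # S1: normal axis
        X[i] = 0                        # S2: initialization
        for j in range(J):              # S3: reduction axis for X, normal for X_rfac
            X_rfac[i][j] = 0            # initialization store for the new buffer
            for k in range(K):          # S4: reduction axis
                X_rfac[i][j] = X_rfac[i][j] + Y[i][j][k]
            X[i] = X[i] + X_rfac[i][j]  # final reduction over the extra dimension
    return X, X_rfac


if __name__ == "__main__":
    Y = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(reduce_original(Y))
    print(reduce_rfactored(Y))
```

Both functions produce the same `X`; the rewritten nest additionally exposes the per-`j` partial sums in `X_rfac`, which is the buffer the commit message says is returned via `RFAC_BUF_PTR`.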

1340 commits