Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/64785 by introducing a `torch.LinAlgError` for reporting errors caused by bad values in linear algebra routines, which lets users easily catch numerical failures.
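For illustration, a minimal sketch of catching the new error type on the C++ side (this assumes the `c10::LinAlgError` class and the `TORCH_CHECK_LINALG` convenience macro; treat both names as assumptions, with the same condition surfacing in Python as `torch.LinAlgError`):
```
#include <c10/util/Exception.h>
#include <iostream>

int main() {
  try {
    // Stand-in for a linear algebra routine rejecting a bad value.
    TORCH_CHECK_LINALG(false, "matrix is singular");
  } catch (const c10::LinAlgError& e) {
    // Numerical failures can now be caught separately from other c10 errors.
    std::cout << "caught: " << e.what() << "\n";
  }
  return 0;
}
```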
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68571
Reviewed By: malfet
Differential Revision: D33254087
Pulled By: albanD
fbshipit-source-id: 94b59000fdb6a9765e397158e526d1f815f18f0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69041
`TH_CONCAT_{N}` is still used by THP, so I've moved it into
its own header; all the compiled code is gone.
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D32872477
Pulled By: ngimel
fbshipit-source-id: 06c82d8f96dbcee0715be407c61dfc7d7e8be47a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69110
I pasted in the current LLVM code, reapplied the modifications listed in the code comments, and caught a few more in the diff/build process. The trivially-copyable detection is different now; if gcc builds fail, I'll try reverting to C10_IS_TRIVIALLY_COPYABLE or copying what LLVM is doing.
The motivation for this change is that, as noted in an existing comment, C10_IS_TRIVIALLY_COPYABLE did the wrong thing for std::unique_ptr, which caused problems with D32454856 / #68412.
ghstack-source-id: 145327773
Test Plan: CI
Reviewed By: bhosmer, mruberry
Differential Revision: D32733017
fbshipit-source-id: 9452ab90328e3fdf457aad23a26f2f6835b0bd3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
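As a minimal sketch of the transformation (the function is hypothetical, not taken from the diff):
```
#include <c10/util/irange.h>
#include <cstdint>

int64_t sum_upto(int64_t n) {
  int64_t total = 0;
  // Before: for (int64_t i = 0; i < n; i++) { total += i; }
  // After: the index is const and its type is deduced from the bound.
  for (const auto i : c10::irange(n)) {
    total += i;
  }
  return total;
}
```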
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705361
fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
Summary:
The `TORCH_CHECK` asserts for strictly-greater-than `kLargeBuffer`,
but the exception claims `>=`. Fix the error message to match the
code.
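As a sketch of the mismatch (illustrative code, not the actual allocator source; the constant's value here is assumed):
```
#include <c10/util/Exception.h>
#include <cstddef>

// TORCH_CHECK throws when its condition is false, so this accepts only
// sizes strictly greater than the threshold. The old message claimed ">=";
// the fixed message matches the strictly-greater-than condition.
constexpr size_t kLargeBuffer = 20971520;  // assumed value, for illustration
void check_size(size_t size) {
  TORCH_CHECK(size > kLargeBuffer,
              "expected size to be greater than kLargeBuffer");
}
```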
Happy to open an issue if it's helpful; I was hoping a fix this trivial doesn't need a separate one.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69174
Reviewed By: zou3519
Differential Revision: D32760055
Pulled By: H-Huang
fbshipit-source-id: 1a8ab68f36b326ed62d78afdcb198f4d6572d017
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69327
Original commit changeset: d44096d88265
Original Phabricator Diff: D32144240 (668574af4a)
Test Plan:
CI
The original diff failed 175 builds in CI.
Reviewed By: airboyang, anjali411
Differential Revision: D32809407
fbshipit-source-id: c7c8e69bcee0274992e2d5da901f035332e60071
Summary:
See https://pytorch.slack.com/archives/G4Z791LL8/p1638229956006300
I grepped c10, aten, and torch for CUDA_VERSION and checked the usages I saw.
I can't guarantee I made a clean sweep, but this improves the status quo.
cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69092
Reviewed By: zou3519
Differential Revision: D32786919
Pulled By: ngimel
fbshipit-source-id: 1d29827dca246f33118d81e136252ddb5bf3830f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68639
Fix all problems related to `ProcessedNode::verify_no_memory_overlap()`:
- Only enable this check for native and fallback ops that are not inplace or view ops
- Enable `ProcessedNode::verify_no_memory_overlap()` in debug mode and enforce it
- Add the gflag --static_runtime_disable_debug_memory_overlap_check to test the runtime memory overlap fix for bad schemas
fb::expand_dims's schema was incorrect once this check was re-enabled; it's fixed in D32556204 (39ab417107)
Reviewed By: mikeiovine
Differential Revision: D32553708
fbshipit-source-id: 88de63cdf1ee4f87b7726c8b65a11a5fb8a99d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66064
The only place this is used seems to be the dispatcher's `operatorLookupTable_`, so disarming `LeftRight` covers that one use case.
This should make .so loading faster and also reduce memory consumption, since `LeftRight<T>` does 2 writes for every write. I'd like a thorough review of this diff, since I want to make sure that initialization of anything that writes into the dispatcher won't happen on multiple threads for on-device use.
Created a new class named `LeftRightNoOpWrapper<T>` for use in mobile builds.
### Why is LeftRight<T> slow?
It maintains 2 copies of each data structure `T` to keep reads quick. Every write goes to both data structures, so the write cost is 2x and the memory overhead is also 2x.
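For intuition, a conceptual sketch of the 2x cost (this omits the reader-counting and synchronization that make the real `c10::LeftRight` safe; it only illustrates the doubled writes and storage):
```
#include <array>
#include <atomic>

template <class T>
class LeftRightSketch {
 public:
  template <class F>
  auto read(F&& fn) const {
    return fn(copies_[foreground_.load()]);  // readers never block on writers
  }
  template <class F>
  void write(F&& fn) {
    int bg = 1 - foreground_.load();
    fn(copies_[bg]);        // first write: apply to the background copy
    foreground_.store(bg);  // publish: the background becomes the foreground
    fn(copies_[1 - bg]);    // second write: replay on the old foreground
  }

 private:
  std::array<T, 2> copies_{};       // 2x storage
  std::atomic<int> foreground_{0};
};
```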
### Why is this safe for mobile builds?
1. .so loading never happens concurrently with model execution
2. Custom ops are loaded during .so load - initializers are all run serially
3. I don't see any threads being spawned from the global schema and kernel initializers
After discussing with dreiss, it seems like there could be rare cases in OSS apps or internal Android/iOS apps where a `.so` or `dylib` is loaded after the PT runtime is loaded, and this load happens concurrently with an in-progress inference run, which is looking up the operator table in the dispatcher.
To avoid crashes there, it seems reasonable to use the RW lock, since I don't expect any contention 99.9% of the time.
When registering operators, everything is serial, so only one thread ever holds the lock; by the next time the lock is needed, the previous holder has already released it.
During inference runs, only one thread will ask for the shared lock unless multiple concurrent inferences are in progress. Even in that case, they will all be able to simultaneously get the Read lock.
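A minimal sketch of that reader-writer pattern (names are illustrative, not the dispatcher's actual members):
```
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

class OperatorTableSketch {
 public:
  bool contains(const std::string& name) const {
    // Concurrent inference threads can all hold the shared lock at once.
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return table_.count(name) != 0;
  }
  void insert(const std::string& name, int handle) {
    // Registration during .so load takes the exclusive lock; since loading
    // is serial, there is essentially never contention here.
    std::unique_lock<std::shared_mutex> lock(mutex_);
    table_.emplace(name, handle);
  }

 private:
  mutable std::shared_mutex mutex_;
  std::unordered_map<std::string, int> table_;
};
```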
Test Plan: Build and generate a local build of the iOS app to test.
Reviewed By: swolchok
Differential Revision: D31352346
fbshipit-source-id: c3f12454de3dbd7b421a6057d561e9373ef5bf98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66130
We're reusing backing storage for these tensors, which is only safe because they have non-overlapping lifetimes. Accordingly, it seems that they can also share their StorageImpl.
ghstack-source-id: 142427752
Test Plan:
benchmarked ctr_mobile_feed local and local_ro:
Using recordio inputs for model 302008423_0
```
swolchok@devbig032 ~/f/fbcode> sudo ~/fbsource2/fbcode/scripts/bertrand/noise/denoise-env.sh \
/tmp/ptvsc2_predictor_benchNov1ArenaAllocateStorageImpls \
--scripted_model=/data/users/swolchok/ctr_mobile_feed_q3_2021/302008423_0.predictor.disagg.local \
--method_name=local.forward --pt_cleanup_activations=1 \
--pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=2 --warmup_iters=2 \
--num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 --repetitions=5 --recordio_use_ivalue_format=1 --recordio_inputs=/data/users/swolchok/ctr_mobile_feed_q3_2021/302008423_0.local.inputs.recordio
Stable
========================================
I1101 14:19:16.473964 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 20.0131. Iters per second: 49.9673
I1101 14:20:12.193130 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 20.0155. Iters per second: 49.9612
I1101 14:21:07.761898 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9751. Iters per second: 50.0624
I1101 14:22:03.218066 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9104. Iters per second: 50.2249
I1101 14:22:58.723256 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.956. Iters per second: 50.1102
I1101 14:22:58.723306 2748837 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 19.974, standard deviation: 0.043643
ArenaAllocateStorageImpls
========================================
I1101 14:08:57.070914 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9771. Iters per second: 50.0572
I1101 14:09:52.605121 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.924. Iters per second: 50.1907
I1101 14:10:48.098287 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9353. Iters per second: 50.1624
I1101 14:11:43.645395 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9723. Iters per second: 50.0694
I1101 14:12:39.171636 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9673. Iters per second: 50.0819
I1101 14:12:39.171685 2695478 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 19.9552, standard deviation: 0.0239318
difference: 0.0188 ms/iter (0.09%), which is less than 1 standard deviation
Stable, local_ro
========================================
I1101 14:26:10.796161 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25991. Iters per second: 793.708
I1101 14:26:12.194727 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.26862. Iters per second: 788.26
I1101 14:26:13.591312 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.26549. Iters per second: 790.207
I1101 14:26:14.982439 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25943. Iters per second: 794.01
I1101 14:26:16.377033 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25995. Iters per second: 793.68
I1101 14:26:16.377094 2787930 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 1.26268, standard deviation: 0.00414788
ArenaAllocateStorageImpls, local_ro
========================================
I1101 14:26:45.875073 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20987. Iters per second: 826.536
I1101 14:26:47.207271 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20827. Iters per second: 827.633
I1101 14:26:48.533766 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20023. Iters per second: 833.174
I1101 14:26:49.850610 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.19206. Iters per second: 838.884
I1101 14:26:51.172356 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.19958. Iters per second: 833.622
I1101 14:26:51.172411 2790009 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 1.202, standard deviation: 0.00722754
Difference: 0.06 ms/iter (4.8%), which is much more than 1 standard deviation
```
We can see that this is a large relative improvement on local_ro, but no effect on local.
Reviewed By: hlu1
Differential Revision: D31357486
fbshipit-source-id: 229c003677da76e89c659d0e0639002accced76e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64432
Original PR description + feedback here: https://github.com/pytorch/pytorch/pull/63048
I've addressed all of the feedback in the original PR and made some pretty large changes, listed below.
**Table of Contents**
- Starting points
- List of the main changes from the original PR
- Next Steps
- Example codegen output (for a view, mutation, and view+mutation op)
**Starting Points**
A good place to start when looking through the PR:
* Alban mentioned that this is a useful mental model (thanks Ed for originally making this clear to me): semantically, the pass currently does THREE things, all of which are needed by functorch, fused together into one big pass.
* (a) alias removal, which replaces {view} calls with {view}_copy calls, and manually tracks aliasing information, so that when one tensor is mutated, we re-apply the same mutation to all of the aliases. This is the bulk of the work - once this is done, the next 2 things are trivial to implement.
* (b) mutation removal, which is easy to do once we know that there are no aliases. Every mutation `a.add_(b)` becomes `a.replace_(a.add(b))`
* (c) reapplying views: all of the `{view}_copy` calls are replaced with `{view}` calls again. This is an optimization that we can make specifically for functorch (and strided backends), that only care about mutation removal and not alias removal
* XLA and Vulkan only want (a), or (a) + (b). Later, we'll want to split this out so that you can actually opt into different versions of this logic.
There are currently no {view}_copy operators, because the <replace views with copies> and <replace copies with views> steps have been combined into one. Later, we'll want to actually implement {view}_copy variants of each view operator, probably with codegen.
* documentation breadcrumb 1, in `FunctionalTensorWrapper.cpp`: https://github.com/pytorch/pytorch/pull/64432/files#diff-a0bac99bf205dba5b94cb64fc2466d3d55d991887572f9cd6a02e27b3a91dd60R59 (you might have to expand the `FunctionalTensorWrapper.cpp` file, which GitHub closes by default because it's large)
* documentation breadcrumb 2, in `FunctionalTensorWrapper.h`: https://github.com/pytorch/pytorch/pull/64432/files#diff-c945c71a4ccac65871f24a912e8904f9a5088b24a32e636727ea9c8fe920708aR12
* Reading through the codegen output at the bottom of this description.
**Main changes from the original PR**
(1) I use lambdas instead of a giant enum to handle all of the different views.
This results in less boilerplate per view op (and more stuff that can be codegen'd). Every `ViewMeta` object now contains `forward` and `reverse` lambdas that know how to replay the view and its inverse. This makes the actual code that executes the replaying logic a lot less boilerplate-y (see `Alias::sync_update_operations` and `FunctionalTensorWrapper::sync_`).
(2) Every tensor during the functionalization pass is always wrapped in a `FunctionalTensorWrapper`.
This is potentially unnecessary for Vulkan/XLA, and will have a mild perf impact, but for now this PR just targets the functorch use case. I previously had a more complicated design (a `FunctionalTensorImplBase` class) to avoid needing the wrapper for XLA, but it had some subtleties that are going to require more thought to fix, so I'm pushing that off for now.
(3) `FunctionalTensorWrapper` objects accurately report stride information.
It's a little annoying to do this though, because the logic that calculates stride info for each view isn't easily separated from the actual view kernels in core, `at::native::{view}`. I do this by adding logic in each `at::functionalization::{view}` kernel to call the reference implementation `at::native::{view}`. I don't do anything with the output aside from taking its size/stride/storage_offset to set the actual output tensor's size/stride/storage_offset correctly. There's another annoying part to this: I'm pretty sure that we want to pass the actual *wrapper* tensors directly into the native kernels, not their inner unwrapped values. But there are some `at::native::{view}` kernels that call other tensor methods, which re-invoke the dispatcher, calling functionalization/functorch kernels that try to do the unwrapping.
To do this, right now I have an `AutoDispatchDirectlyToNative` guard that basically ensures that any tensor methods called inside of the at::native::{view} op always redispatch straight to the CPU kernel (which will be another at::native:: kernel). This feels kind of heavy-handed, but I'm not sure of a better way to do it.
(4) `FunctionalTensorWrapper` objects accurately report aliasing information.
There's a new `FunctionalStorageImpl` class (subclass of `StorageImpl`) that allows tensors in the functionalization pass to accurately alias storage. If two tensors `a` and `b` in a functionalized program are views of one another, then `a.storage.is_alias_of(b.storage)` should return true. I added this in a pretty similar way to how meta tensors allocate storage, although I don't pass in an actual allocator (I think this is fine because you should never resize a functional tensor's storage).
One thing I'm not sure about: should `FunctionalTensorWrapper` set `storage_access_should_throw_` (a) always, (b) never, or (c) only if its wrapped tensor has it set?
Right now I leave it unset, mostly because calling the reference view functions (`at::native::{view}`) requires looking at the storage. But that means that if you try to access storage from Python in a functionalized program, you'll get silent garbage instead of an error. Related question: are we planning on exposing meta tensor storage to Python in the future (even though it contains garbage)?
(5) better docs :)
**View operator coverage**
(6) The functionalization pass now gets math-composite view ops for free.
I didn't add the `Functionalize` dispatch key to the composite set, because I don't want composite ops like `torch.ones` to get decomposed before hitting the functionalization pass. Instead, I added codegen to manually register the `at::native::` kernels of composite view ops. This is a little hairy, because the names of the `at::native::` kernels aren't easily accessible. They're stored in a `Dict[DispatchKey, BackendIndex]`. I made a best-effort attempt to get each view kernel's name, basically by assuming that every view op has either a composite or cpu implementation.
There's also a hardcoded list of composite view ops in `gen_inplace_or_view_type.py`, but it looks like it's wrong. This is probably worth rationalizing later, but instead I created a new list of the "complete" set of composite view ops, and preserved the old set by hardcoding the delta between the two sets.
(7) I've added codegen for ops that are both views AND mutations, like `transpose_()` (why do we even have these 😢).
From some light testing, it looks like they work correctly, with one caveat: I had a hard time ensuring that functorch programs that mutate their inputs using ops like `transpose_()` preserve the input mutations after the program finishes running. For now (in my corresponding functorch branch) I emit a warning when this happens and just don't preserve the mutation.
(8) I added `{view}_inverse` implementations for every view op, in `FunctionalInverses.cpp`.
These are needed to take mutations made to views and replay them back onto the base. To reduce boilerplate, the codegen generates function declarations for each `{view}_inverse` function, so you get a nice compiler error when someone eventually adds a new view op.
The only view ops currently not supported are (a) as_strided, and (b) the sparse view ops (values()/indices()).
I can add support for as_strided, but it needs an `as_strided_inverse()` function. That will look really similar to the `as_strided_backward()` function in FunctionsManual.cpp, but it has some noticeable differences: we basically want an `as_strided_embed` for autograd and `as_strided_scatter` for functionalization. We will also probably need them to be primitives w.r.t. autograd, since the current implementation for autograd uses view().copy_() calls that XLA won't be able to handle. I'm wondering if anyone has any objections, but otherwise I can make those changes (which will require writing backward formulas for `as_strided_embed` and `as_strided_scatter`).
I did a bunch of manual testing that all looks pretty good, but it's definitely not fully tested. Ed pointed out that once XLA uses this pass (or at least once there's a POC), we can just run the existing xla view test suite. Hopefully that delay is okay - if it's not, maybe we can think about using OpInfos similar to how functorch uses them for testing.
Note: there's some duplication with autograd's view code. Every `{view}_inverse` implementation is really similar to the implementation for that view listed in `derivatives.yaml`. There are some major differences though:
* the autograd implementations of those backward functions (like `permute_backwards()`, in `FunctionsManual.cpp`) internally call other view ops. For functionalization, we want them to (eventually) call `{view}_copy` operators.
* For view ops that take a subset of the original storage, like `slice/select/diagonal/as_strided()`, the autograd backward functions fill the "spaces" in the inverse call with zeroes. For functionalization, we want to fill them with the value of `base` at those positions. It looks like this currently applies to 6 total ops (since we can ignore composites):
* select
* slice
* diagonal
* as_strided
* split
* split_with_sizes
A nice end state would probably be for the autograd + functionalization codegen to both look at the same yaml (either `derivatives.yaml`, or something else) and automatically generate the right thing. I didn't leave that in scope for this PR though.
**Current State + Next Steps**
There are a bunch of followups after this PR eventually lands. Roughly in order:
* Use the current pass to register problematic composite ops in functorch. Also, nested `functionalize()` calls aren't supported yet (I mostly just need to remove some debug asserts and test it).
* Work on freeing up dispatch key space by deduplicating the `{backend}`/`Autograd{backend}`/`Sparse{backend}`/`Quantized{backend}` keys
* Once we have more dispatch keys, split up this pass into 3 pieces - it's currently fused, and doesn't do the right thing for vulkan/XLA. Specifically, all of the `{view}` calls in the current pass's view-replay logic should turn into `{view}_copy` calls that vulkan/XLA know how to implement, and there will be separate passes for (a) removing mutations, and (b) turning `{view}_copy` calls back into `{view}` calls. For Vulkan, we eventually want a pass that ONLY removes aliasing and view calls, and doesn't remove mutations. We can also probably make the 2 new passes use user dispatch keys to save dispatch key space, since they'll only be used by functorch anyway.
* Do more of a dive on perf for the vulkan/xla use cases. There are several areas to improve perf with varying levels of effort required. The simplest one that I'll probably do regardless is to codegen the out-of-place kernels instead of using a boxed fallback. Getting a POC working for xla will also be useful to test the view operator coverage.
**Example Codegen Output**
View Op:
```
::std::vector<at::Tensor> split_Tensor(c10::DispatchKeySet ks, const at::Tensor & self, int64_t split_size, int64_t dim) {
auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
::std::vector<at::Tensor> out;
{
at::AutoDispatchBelowFunctionalize guard;
auto tmp_output = at::redispatch::split(ks & c10::after_func_keyset, self_, split_size, dim);
out = at::functionalization::impl::wrapFunctionalTensor(tmp_output);
// I'm fusing the [alias removal], [mutation removal], [add views back] passes together.
// Later, we'll want to turn them into separate passes (since e.g. vulkan only cares about alias removal).
}
at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
[split_size, dim](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
return base.split(split_size, dim)[mutated_view_idx];
},
[split_size, dim](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
return at::functionalization::impl::split_inverse(base, mutated_view, mutated_view_idx, split_size, dim);
}
);
at::functionalization::impl::set_view_meta(out, self, view_meta);
at::AutoDispatchDirectlyToNative native_guard;
::std::vector<at::Tensor> reference_tensor_output = at::native::split(self, split_size, dim);
at::functionalization::impl::set_strides(out, reference_tensor_output);
return out;
}
```
Mutation Op:
```
at::Tensor & add__Tensor(c10::DispatchKeySet ks, at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {
at::functionalization::impl::sync(self);
at::functionalization::impl::sync(other);
auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
auto other_ = at::functionalization::impl::unwrapFunctionalTensor(other);
at::Tensor tmp_output;
{
at::AutoDispatchBelowFunctionalize guard;
// The functionalization pass explicitly doesn't pass out= parameters to the redispatch
tmp_output = at::redispatch::add(
ks & c10::after_func_keyset, self_, other_, alpha);
}
self.replace_(tmp_output);
at::functionalization::impl::maybe_add_update(self);
return self;
}
```
View + Mutation Op:
```
at::Tensor & transpose_(c10::DispatchKeySet ks, at::Tensor & self, int64_t dim0, int64_t dim1) {
at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
[dim0, dim1](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
return base.transpose(dim0, dim1);
},
[dim0, dim1](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
return at::functionalization::impl::transpose_inverse(base, mutated_view, dim0, dim1);
}
);
at::functionalization::impl::mutate_view_meta(self, view_meta);
// See Note [Propagating strides in the functionalization pass]
// Directly update the sizes/strides/storage_offset fields on self using the inplace call.
// I need the guard because I don't want the at::native kernel to end up calling more functionalization/functorch kernels.
// Its only job is to directly compute the output size/stride/storage_offset metadata.
at::AutoDispatchDirectlyToNative native_guard;
at::native::transpose_(self, dim0, dim1);
return self;
}
```
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31942093
Pulled By: bdhirsh
fbshipit-source-id: b95598dae35dd1842fa8b1d8d1448332f3afaadf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66753
Fixes these -Wextra compilation errors:
```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
49 | AT_DISPATCH_ALL_TYPES_AND2(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
| ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
99 | AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
| ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
99 | AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
| ^
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
99 | AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
| ^
```
And also these warnings:
```
caffe2/c10/util/Half.h(461): warning: pointless comparison of unsigned integer with zero
detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
caffe2/c10/util/Half.h(459): warning: pointless comparison of unsigned integer with zero
detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
```
I thought I'd fixed this previously using `std::is_unsigned` in D25256251 (cff1ff7fb6), but apparently that was insufficient.
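For reference, the usual shape of the fix in generic code looks like this (a sketch, not the exact diff):
```
#include <type_traits>

// Compile the `x < 0` comparison only for signed types, so the compiler
// never sees a signed/unsigned comparison that is always false.
template <typename T>
bool is_negative(T x) {
  if constexpr (std::is_signed<T>::value) {
    return x < 0;
  } else {
    (void)x;  // unsigned: a comparison with zero would be pointless
    return false;
  }
}
```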
Test Plan: Sandcastle
Reviewed By: malfet, ngimel
Differential Revision: D31708173
fbshipit-source-id: 7714f6bbf109d2f2164630d3fc46bad18046c06c
Summary:
**Summary:** Move the error reporting part to the cpp file to avoid callers inlining it, which inflates the generated code size. See https://github.com/pytorch/pytorch/issues/65830.
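A minimal sketch of the pattern (hypothetical names, not the actual `Scalar` code): the hot conversion path stays inline while the cold throw moves behind an out-of-line call.
```
#include <stdexcept>
#include <string>

// In the header: declared but not defined, so callers emit a single call
// instruction for the error path instead of inlining the formatting code.
[[noreturn]] void report_bad_conversion(const char* target_type);

template <typename T>
T checked_narrow(long long value) {
  if (static_cast<long long>(static_cast<T>(value)) != value) {
    report_bad_conversion("a narrower integer");  // cold path: out of line
  }
  return static_cast<T>(value);  // hot path: stays inline
}

// In the .cpp file:
[[noreturn]] void report_bad_conversion(const char* target_type) {
  throw std::runtime_error(std::string("value cannot be represented as ") + target_type);
}
```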
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66721
Test Plan:
Compiling the simple program below now generates ~150 lines of assembly, compared to 700+ lines before.
```
#include <c10/core/Scalar.h>
void g(float) {}
void f(const c10::Scalar& scalar) {
auto x = scalar.to<float>();
g(x);
}
```
**Reviewers:** Brian Hirsh
**Subscribers:** Brian Hirsh, Edward Yang, Yining Lu
**Tasks:** T103384490
**Tags:** pytorch
Fixes https://github.com/pytorch/pytorch/issues/65830
Reviewed By: zou3519, bdhirsh
Differential Revision: D31737607
Pulled By: andrewor14
fbshipit-source-id: 3d493c4d8e51d8f8a19d00f59b8ea28176c8a9e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66757
`InterpreterStateImpl::run()` gets the number of outputs from the current frame, but by the time the continuation completes, the frame is gone, so we're calling `front()` on an empty vector. This works out in practice (data is still there) but it is technically undefined behavior and could break in the future.
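A minimal illustration of that bug pattern (hypothetical code, not the JIT's):
```
#include <functional>
#include <vector>

std::function<int()> make_continuation(std::vector<int>& frame) {
  // BAD: capturing the container by reference; if it is emptied before the
  // continuation runs, frame.front() is undefined behavior even when the
  // old data happens to still be in memory:
  //   return [&frame] { return frame.front(); };

  // FIX: copy the value out while the frame is still alive.
  int n_outputs = frame.front();
  return [n_outputs] { return n_outputs; };
}
```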
Also, `std::polar()` expects its magnitude argument to be non-negative, but `c10::polar()` does not, so implement it explicitly (the implementation is the same as libstdc++'s).
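A sketch of the explicit construction (this is the formula libstdc++ evaluates; `c10::polar` may differ in details):
```
#include <cmath>
#include <complex>

// Unlike std::polar, this makes no assumption that rho is non-negative:
// it simply evaluates the defining formula.
template <typename T>
std::complex<T> polar_sketch(T rho, T theta) {
  return std::complex<T>(rho * std::cos(theta), rho * std::sin(theta));
}
```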
Test Plan: JIT tests pass.
Reviewed By: zhxchen17
Differential Revision: D31715587
fbshipit-source-id: 98abcc10c2742887af866d8e70169a0187c41d33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65618
This saves 8 bytes per KernelFunction, which should help in resource-constrained environments.
ghstack-source-id: 140731069
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D25405736
fbshipit-source-id: 757c0f1387da9147e46ac69af2aa9fffd2998e35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66540
Currently the macro `HAS_DEMANGLE` is determined by compiler predefined macros. Here I'm adding an option to allow `HAS_DEMANGLE` to be defined in build files.
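Roughly, the override point looks like this (a sketch of the approach; the real detection condition differs in detail):
```
// Build files can now pass -DHAS_DEMANGLE=0 or -DHAS_DEMANGLE=1; the
// compiler-based detection applies only when they don't.
#if !defined(HAS_DEMANGLE)
#if defined(__GNUG__) && !defined(__ANDROID__)
#define HAS_DEMANGLE 1
#else
#define HAS_DEMANGLE 0
#endif
#endif
```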
Test Plan: Rely on CI
Reviewed By: poweic
Differential Revision: D31600007
fbshipit-source-id: 76cf088b0f5ee940e977d3b213f1446ea64be036
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50209
This adds a new warning handler that stores all warnings in a shared
queue, which can be "replayed" at a later time and, crucially, on
another thread. Then, I use this inside the autograd engine to ensure
that warnings are processed by the handler registered on the main
thread.
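Conceptually, the buffering handler looks like this (hypothetical names; the real one lives in c10 and stores structured warning objects rather than strings):
```
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class BufferedWarningHandlerSketch {
 public:
  // Called on autograd worker threads instead of raising immediately.
  void process(std::string msg) {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_.push_back(std::move(msg));
  }
  // Called on the main thread, which forwards each warning to the handler
  // registered there (e.g. the Python warning system).
  std::vector<std::string> replay() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<std::string> out;
    out.swap(pending_);
    return out;
  }

 private:
  std::mutex mutex_;
  std::vector<std::string> pending_;
};
```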
For testing, I also add an operator that always warns in the backward
pass and test that the warning is a normal Python warning.
cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66235
Reviewed By: ejguan
Differential Revision: D31505413
Pulled By: albanD
fbshipit-source-id: 1a7f60b038f55c20591c0748b9e86735b3fec2f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66445
`Type.cpp` implements the `demangle()` function based on the macro `HAS_DEMANGLE`. This diff splits it into two `.cpp` files so that we can add either one to the build target. This change follows the pattern of `flags_use_no_gflags.cpp` and `flags_use_gflags.cpp`.
Test Plan: Rely on CI
Reviewed By: iseeyuan
Differential Revision: D31551432
fbshipit-source-id: f8b11783e513fa812228ec873459ad3043ff9147
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66290
Add full specialization for std::string type index
It slightly speeds up compilation, and it also resolves the ambiguity in how template instantiations implemented in inline namespaces are rendered during `__PRETTY_FUNCTION__` computation.
Not sure what `#pragma` controls this behaviour, but when code is compiled by clang-12+ using libstdc++, `__PRETTY_FUNCTION__` sometimes resolves `std::string` to `std::basic_string<char>` and sometimes to `std::__cxx11::basic_string<char>`, even though in the object file the symbol is always inside the `std::__cxx11::` namespace. This can break caffe2 serialization code that depends on dynamic hash generation.
Template name resolution was debugged using https://gist.github.com/malfet/c83b9ebd35730ebf8bac7af42682ea37
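A sketch of the idea (illustrative, not the actual c10 type-name machinery):
```
#include <string>

// Generic path: derive the name from __PRETTY_FUNCTION__ (a GCC/Clang
// extension), whose spelling for std::string depends on how the compiler
// renders the std::__cxx11 inline namespace.
template <typename T>
const char* type_name_sketch() {
  return __PRETTY_FUNCTION__;
}

// Full specialization: pin one spelling, independent of compiler/stdlib.
template <>
const char* type_name_sketch<std::string>() {
  return "std::basic_string<char>";
}
```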
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: r-barnes
Differential Revision: D31490050
fbshipit-source-id: 127091574cf6b92c7ec3f972821e4e76f5f626a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445
PyTorch currently uses the old style of compiling CUDA in CMake, which is just a
bunch of scripts in `FindCUDA.cmake`. Newer CMake versions support CUDA natively
as a language, just like C or C++.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31503350
fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65122
Failure to cache this seems to contribute to quadratic startup time for the static runtime.
Disclaimer: I am entirely un-versed in the performance considerations for the JIT and have no idea what the other impacts of this change may be. Let the reviewer beware.
ghstack-source-id: 140052522
Reviewed By: suo
Differential Revision: D30983268
fbshipit-source-id: 4329aee6b5781f5c2e2d2334c396fab8528d4b7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65346
Tidying up the top sources of reference count decrements seen during static runtime startup.
ghstack-source-id: 140027349
Test Plan:
CI
perf now shows under 2% of time spent in ~__shared_count, instead of about 5%.
Reviewed By: suo
Differential Revision: D31057277
fbshipit-source-id: 9a16daf2e655fda80d4ec21290b30f02ba63d8da
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary
Fixes https://github.com/pytorch/pytorch/issues/65636
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181
Original commit changeset: 3d0d5377d71e
Test Plan:
Run PyTorch XLA corresponding PR in XLA CI:
https://github.com/pytorch/xla/pull/3148/files
Reviewed By: suo
Differential Revision: D31416438
fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65545
Introduce a 2-bit qtensor. The new dtype added for this is `c10::quint2x4`.
The underlying storage is still uint8_t, so we pack four 2-bit values into each byte while quantizing.
Kernels that use this dtype should be aware of the packing format (four 2-bit values in one byte).
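A sketch of the packing layout (the bit order here is an assumption for illustration):
```
#include <cstdint>

// Pack four 2-bit quantized values into one byte, lowest bits first.
inline uint8_t pack4x2(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return static_cast<uint8_t>((a & 0x3) | ((b & 0x3) << 2) |
                              ((c & 0x3) << 4) | ((d & 0x3) << 6));
}

// Recover the idx-th 2-bit value (idx in [0, 4)).
inline uint8_t unpack2(uint8_t byte, int idx) {
  return static_cast<uint8_t>((byte >> (2 * idx)) & 0x3);
}
```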
Test Plan: `buck test mode/dev-asan caffe2/test/:quantization -- test_qtensor`
Reviewed By: supriyar
Differential Revision: D31148141
fbshipit-source-id: 1dc1de719e097adaf93fee47c6d1b8010a3eae6c
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary
Fixes https://github.com/pytorch/pytorch/issues/65636
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65635
Reviewed By: alanwaketan
Differential Revision: D31260343
Pulled By: wconstab
fbshipit-source-id: 8bb1194188e3e77fc42e08a14ba37faed37a9c2e