pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Zhengxu Chen	d459e79500	[jit][edge] Remove usage of shared_ptr<mobile::Code>. (#68037 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037 Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happen inside the interpreter loop, which doesn't outlive the module, so we don't need to use std::shared_ptr here. This should also save us 1-2 KB of binary size, because shared_ptr seems to bloat on arm64 Android. ghstack-source-id: 145818696 Test Plan: eyes. Reviewed By: qihqi, tugsbayasgalan Differential Revision: D32264616 fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a	2021-12-16 13:11:46 -08:00
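The ownership argument in d459e79500 can be illustrated with a standalone sketch. `Code`, `Function` and `run_interpreter` here are hypothetical stand-ins, not the real `mobile::Code`/`mobile::Function`: the function owns its code by value, and the interpreter only borrows it through a const reference, so no reference count is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for mobile::Code / mobile::Function (not the real classes).
struct Code {
  std::vector<std::string> instructions;
};

struct Function {
  std::string name;
  Code code;  // owned by value: the Code cannot outlive its Function

  // Callers borrow the Code; no reference counting is involved.
  const Code& get_code() const { return code; }
};

// The interpreter reads the Code only while the owning Function is alive,
// so a const reference is sufficient.
std::size_t run_interpreter(const Code& code) {
  std::size_t executed = 0;
  for (const auto& inst : code.instructions) {
    (void)inst;  // a real interpreter would dispatch on the opcode here
    ++executed;
  }
  return executed;
}
```

The design choice only holds as long as every borrower's lifetime is nested inside the owner's; if `Code` ever had to escape its `Function`, shared ownership would be needed again.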
Jiawei Lv	b4c4a015d6	Revert D33163841: Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers" Test Plan: revert-hammer Differential Revision: D33163841 Original commit changeset: e262b6d8c80a Original Phabricator Diff: D33102715 (`eb374de3f5`) fbshipit-source-id: 644216036a238a458f0a2198460b36d24fb035f8	2021-12-16 11:12:18 -08:00
Jiawei Lv	c80b5b8c8f	Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers" Test Plan: revert-hammer Differential Revision: D33102715 (`eb374de3f5`) Original commit changeset: 3816ff01c578 Original Phabricator Diff: D33102715 (`eb374de3f5`) fbshipit-source-id: e262b6d8c80a05f3a67e024fedfbadefdbfe6e29	2021-12-16 09:39:57 -08:00
David Berard	8c7f4a0d0b	[tensorexpr] check for index out of bounds in ir_eval (#68858 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68858 when executing with ir_eval, check for index out of bounds. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D32657881 Pulled By: davidberard98 fbshipit-source-id: 62dd0f85bb182b34e9c9f795ff761081290f6922	2021-12-16 09:27:45 -08:00
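A minimal sketch of the kind of check 8c7f4a0d0b adds to ir_eval, assuming a toy flat buffer rather than NNC's real buffer and evaluator types (`Buffer` and `checked_load` are hypothetical): an out-of-range index raises an error instead of reading past the end of the storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy buffer used by an IR evaluator; not the NNC API.
struct Buffer {
  std::string name;
  std::vector<float> data;
};

// Load with an explicit bounds check, so a bad index raises an error
// instead of silently reading outside the buffer.
float checked_load(const Buffer& buf, std::int64_t index) {
  if (index < 0 || index >= static_cast<std::int64_t>(buf.data.size())) {
    throw std::out_of_range("index " + std::to_string(index) +
                            " out of bounds for buffer " + buf.name +
                            " of size " + std::to_string(buf.data.size()));
  }
  return buf.data[static_cast<std::size_t>(index)];
}
```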
jiej	76d282d447	Nvfuser code bump 12 5 (#69964 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964 Things added in this PR that require review: 1. cuLaunchCooperativeKernel driver API added aten/src/ATen/cuda/detail/LazyNVRTC.cpp aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h nvfuser code update: 1. perf tuning on codegen scheduler that improves performance. 2. permutation support has been extended beyond contiguous/channels-last. (The improvements could be observed on PW benchmark) Things reverted from local changes: 1. aten::gelu with approximation 2. local changes that are upstreamed in PR https://github.com/pytorch/pytorch/issues/68804 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428 Reviewed By: ngimel Differential Revision: D33073817 Pulled By: wconstab fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb	2021-12-16 08:28:54 -08:00
Tristan Rice	eb374de3f5	Back out "Revert D32606547: torch/monitor: add C++ events and handlers" (#69923 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923 Original commit changeset: fbaf2cc06ad4 Original Phabricator Diff: D32606547 (`e61fc1c03b`) This is the same thing as the original diff but just using a normal std::mutex instead of std::shared_timed_mutex which is not available on OSX 10.11. The performance difference should be negligible and easy to change down the line if it does become a bottleneck. Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783 Test Plan: buck test //caffe2/test/cpp/monitor:monitor will add ciflow tags to ensure mac builds are fine Reviewed By: aivanou Differential Revision: D33102715 fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb	2021-12-15 22:51:43 -08:00
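The reland eb374de3f5 swaps std::shared_timed_mutex for a plain std::mutex around the event handlers. The sketch below shows that locking pattern on a hypothetical handler registry (`Event`, `EventHandler` and `HandlerRegistry` are illustrative names, not the torch/monitor API): registration and dispatch take the same exclusive lock, giving up concurrent readers in exchange for portability.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event and handler types; the real torch/monitor API may differ.
struct Event {
  std::string name;
};

struct EventHandler {
  virtual ~EventHandler() = default;
  virtual void handle(const Event& e) = 0;
};

class HandlerRegistry {
 public:
  void register_handler(std::shared_ptr<EventHandler> h) {
    std::lock_guard<std::mutex> guard(mu_);
    handlers_.push_back(std::move(h));
  }

  // Dispatch takes the same exclusive lock as registration. A
  // std::shared_timed_mutex would let dispatchers run concurrently, but a
  // plain std::mutex is available everywhere and is cheap when uncontended.
  void log_event(const Event& e) {
    std::lock_guard<std::mutex> guard(mu_);
    for (auto& h : handlers_) {
      h->handle(e);
    }
  }

 private:
  std::mutex mu_;
  std::vector<std::shared_ptr<EventHandler>> handlers_;
};

// Example handler that just counts the events it receives.
struct CountingHandler : EventHandler {
  int count = 0;
  void handle(const Event&) override { ++count; }
};
```

Because handlers run while the lock is held, a handler in this sketch must not call `log_event` itself, or it would deadlock.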
Taylor Robie	24bc3be146	[Profiler] Clean up profiler includes. (#69421 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421 I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` solely to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover several files are relying on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace. Test Plan: Unit tests and CI. Reviewed By: aaronenyeshi, albanD Differential Revision: D32865907 fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e	2021-12-15 12:50:24 -08:00
Chen Lai	408283319a	[Operator Versioning][Edge] Change OP to CALL when there is a valid upgrader (#67731 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67731 1. Register upgrader function at loading stage 2. Change OP to CALL when the operator_version from the model is smaller than the current runtime version and there exists a valid upgrader The interpreter log is: ``` RUNNING 0 STOREN 1 3 RUNNING 1 DROPR 1 RUNNING 2 LOAD 2 RUNNING 3 LOAD 3 RUNNING 4 CALL 0 RUNNING 0 STOREN 1 2 RUNNING 1 LOAD 1 RUNNING 2 OP 0, aten::is_floating_point RUNNING 3 JF 3 RUNNING 4 LOADC 1 RUNNING 5 JMP 3 RUNNING 8 STORE 3 RUNNING 9 MOVE 3 RUNNING 10 JF 5 RUNNING 11 LOAD 1 RUNNING 12 LOAD 2 RUNNING 13 OP 1, aten::div.Tensor RUNNING 14 JMP 5 RUNNING 19 STORE 4 RUNNING 20 DROPR 2 RUNNING 21 DROPR 1 RUNNING 22 MOVE 4 RUNNING 23 RET RUNNING 5 LOAD 2 RUNNING 6 LOAD 3 RUNNING 7 CALL 0 RUNNING 0 STOREN 1 2 RUNNING 1 LOAD 1 RUNNING 2 OP 0, aten::is_floating_point RUNNING 3 JF 3 RUNNING 4 LOADC 1 RUNNING 5 JMP 3 RUNNING 8 STORE 3 RUNNING 9 MOVE 3 RUNNING 10 JF 5 RUNNING 11 LOAD 1 RUNNING 12 LOAD 2 RUNNING 13 OP 1, aten::div.Tensor RUNNING 14 JMP 5 RUNNING 19 STORE 4 RUNNING 20 DROPR 2 RUNNING 21 DROPR 1 RUNNING 22 MOVE 4 RUNNING 23 RET RUNNING 8 MOVE 2 RUNNING 9 MOVE 3 RUNNING 10 CALL 0 RUNNING 0 STOREN 1 2 RUNNING 1 LOAD 1 RUNNING 2 OP 0, aten::is_floating_point RUNNING 3 JF 3 RUNNING 4 LOADC 1 RUNNING 5 JMP 3 RUNNING 8 STORE 3 RUNNING 9 MOVE 3 RUNNING 10 JF 5 RUNNING 11 LOAD 1 RUNNING 12 LOAD 2 RUNNING 13 OP 1, aten::div.Tensor RUNNING 14 JMP 5 RUNNING 19 STORE 4 RUNNING 20 DROPR 2 RUNNING 21 DROPR 1 RUNNING 22 MOVE 4 RUNNING 23 RET RUNNING 11 TUPLE_CONSTRUCT 3 RUNNING 12 RET ``` The upgrader bytecode is: ``` (STOREN, 1, 2) (LOAD, 1, 0) (OP, 0, 0) (JF, 3, 0) (LOADC, 1, 0) (JMP, 3, 0) (LOAD, 2, 0) (OP, 0, 0) (STORE, 3, 0) (MOVE, 3, 0) (JF, 5, 0) (LOAD, 1, 0) (LOAD, 2, 0) (OP, 1, 0) (JMP, 5, 0) (LOAD, 1, 0) (LOAD, 2, 0) (LOADC, 0, 0) (OP, 2, 0) (STORE, 4, 0) (DROPR, 2, 0) (DROPR, 1, 0) (MOVE, 4, 0) (RET, 0, 0) ``` ghstack-source-id: 145635622 Test Plan: described in summary and CI Reviewed By: iseeyuan Differential Revision: D32092517 fbshipit-source-id: 0314b4bda5d2578cdd4e7cfbfd1e3c07fbccf8a3	2021-12-14 19:13:12 -08:00
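The rule in 408283319a (rewrite OP to CALL only when the model's operator version is older than the runtime's *and* an upgrader covers that version) can be sketched as a pure function. The `Upgrader` fields, the inclusive version range and `maybe_upgrade` are assumptions for illustration, not the real upgrader table format.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical upgrader table entry; field names are illustrative only.
struct Upgrader {
  std::uint64_t min_version;  // inclusive
  std::uint64_t max_version;  // inclusive
  int function_index;         // index of the upgrader bytecode function
};

enum class OpCode { OP, CALL };

struct Instruction {
  OpCode op;
  int x;  // operator index for OP, function index for CALL
};

// Returns a CALL to the matching upgrader when the model's operator version is
// older than the runtime's and an upgrader covers it; otherwise keeps the OP.
Instruction maybe_upgrade(const Instruction& inst, const std::string& op_name,
                          std::uint64_t model_version,
                          std::uint64_t runtime_version,
                          const std::map<std::string, std::vector<Upgrader>>& upgraders) {
  if (inst.op != OpCode::OP || model_version >= runtime_version) {
    return inst;
  }
  auto it = upgraders.find(op_name);
  if (it == upgraders.end()) {
    return inst;  // no upgrader registered for this operator
  }
  for (const auto& u : it->second) {
    if (model_version >= u.min_version && model_version <= u.max_version) {
      return Instruction{OpCode::CALL, u.function_index};
    }
  }
  return inst;
}
```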
Chen Lai	9e4d60a552	[Operator Versioning][Edge] Use check in cpp source file for upgrader (#67728 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67728 1. Check in upgrader_mobile.h and upgrader_mobile.cpp 2. Add test to parse all bytecode from upgrader_mobile.h ghstack-source-id: 145635621 Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterUpgraderTest.Upgrader' Reviewed By: iseeyuan Differential Revision: D32087295 fbshipit-source-id: 21e95aabb5e9db76be27e01adfea8fbc41caeaf6	2021-12-14 19:10:51 -08:00
Michael Suo	f565167fbd	Revert D32606547: torch/monitor: add C++ events and handlers Test Plan: revert-hammer Differential Revision: D32606547 (`e61fc1c03b`) Original commit changeset: a00d0364092d Original Phabricator Diff: D32606547 (`e61fc1c03b`) fbshipit-source-id: fbaf2cc06ad4bec606e8a9c6f591d65c04e6fa56	2021-12-11 22:51:03 -08:00
Tristan Rice	e61fc1c03b	torch/monitor: add C++ events and handlers (#68783 ) Summary: This adds a C++ event handler corresponding to the Python one mentioned in the RFC. This changes the counters a bit to all be push driven instead of being polled. The two window types are "fixed count" and "interval". One is based off the number of logged events and the other is based off of time windows. There's currently no active ticker for interval so it needs a regular stream of events to ensure events are produced. A follow up diff can add support for things like HHWheel / simple ticker. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783 Test Plan: buck test //caffe2/test/cpp/monitor:monitor Reviewed By: kiukchung Differential Revision: D32606547 fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead	2021-12-11 16:44:46 -08:00
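The "interval" window described in e61fc1c03b has no active ticker, so a window can only be closed by the next incoming value. A minimal sketch of that behavior, assuming a hypothetical `IntervalSum` aggregator that takes the timestamp as an argument (not the torch/monitor implementation):

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical interval-window aggregator; not the torch/monitor implementation.
// There is no background ticker: a window is closed only when a new value
// arrives after the window's end, so an idle stream emits nothing.
class IntervalSum {
 public:
  using Clock = std::chrono::steady_clock;

  explicit IntervalSum(Clock::duration interval) : interval_(interval) {}

  // Adds a value observed at time `now`. Returns the previous window's sum
  // if this call closed that window.
  std::optional<double> add(double value, Clock::time_point now) {
    std::optional<double> emitted;
    if (!window_start_) {
      window_start_ = now;
    } else if (now - *window_start_ >= interval_) {
      emitted = sum_;
      sum_ = 0.0;
      window_start_ = now;  // simplification: the new window starts at this event
    }
    sum_ += value;
    return emitted;
  }

 private:
  Clock::duration interval_;
  std::optional<Clock::time_point> window_start_;
  double sum_ = 0.0;
};
```

Passing the time explicitly keeps the sketch deterministic. A real implementation would read the clock itself and might align windows to fixed boundaries instead of restarting them at the closing event.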
Yanan Cao	17f3179d60	Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796 (Note: this ignores all push blocking failures!) Test Plan: External CI + Sandcastle Reviewed By: zhxchen17 Differential Revision: D33032671 fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef	2021-12-10 21:29:53 -08:00
Hao Lu	91d16cb633	[Jit] Fix schema of aten::split int[] version (#69745 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69745 Missed in D31935573 (`6b44e75f6b`). Reviewed By: d1jang Differential Revision: D31889867 fbshipit-source-id: 417bd0b15db4891dbd641b35a803553f11d0d756	2021-12-10 02:33:36 -08:00
Nikita Shulga	3bb20ae49f	Make c10d tests -Werror clean (#69703 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69703 Test Plan: Imported from OSS Reviewed By: seemethere Differential Revision: D32997001 Pulled By: malfet fbshipit-source-id: 38b5f195c04f2b3b920e6883a96fe9a36345b9d2	2021-12-09 22:10:04 -08:00
Ivan Kobzarev	7dba88dfdb	[nnc][quant] Fix quantized concat (#69596 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69596 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D32941108 Pulled By: IvanKobzarev fbshipit-source-id: 727f608b98625648e2e444396d910838c95f58f2	2021-12-09 18:55:32 -08:00
Peter Bell	b2e79ed5ec	Remove WindowsTorchApiMacro.h in favor of Export.h (#69585 ) Summary: Follow up to https://github.com/pytorch/pytorch/issues/68095 This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585 Reviewed By: mrshenli Differential Revision: D32958594 Pulled By: albanD fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061	2021-12-09 17:30:09 -08:00
Han Qi	d3649309e6	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Test Plan: unittests Reviewed By: gmagogsfm Differential Revision: D32806835 fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57	2021-12-09 14:53:31 -08:00
Richard Barnes	afb742382a	use irange for loops 10 (#69394 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394 Modified loops in files under fbsource/fbcode/caffe2/ from the format ``` for(TYPE var=x0;var<x_max;var++) ``` to the format ``` for(const auto var: irange(x0, x_max)) ``` This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. Test Plan: Sandcastle Reviewed By: malfet Differential Revision: D32837991 fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32	2021-12-09 09:49:34 -08:00
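The loop rewrite in afb742382a relies on an integer-range helper usable in a range-based for loop. c10 ships its own in `c10/util/irange.h`; the `IntRange`/`simple_irange` below are a simplified, hypothetical stand-in that shows the idea, not that implementation.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for an irange-style helper (not c10::irange).
// Yields the integers in [begin, end) for use in a range-based for loop.
template <typename I>
class IntRange {
 public:
  class iterator {
   public:
    explicit iterator(I v) : v_(v) {}
    I operator*() const { return v_; }
    iterator& operator++() {
      ++v_;
      return *this;
    }
    bool operator!=(const iterator& other) const { return v_ != other.v_; }

   private:
    I v_;
  };

  // An empty range is produced when end < begin, so the loop never runs.
  IntRange(I begin, I end) : begin_(begin), end_(end < begin ? begin : end) {}
  iterator begin() const { return iterator(begin_); }
  iterator end() const { return iterator(end_); }

 private:
  I begin_;
  I end_;
};

template <typename I>
IntRange<I> simple_irange(I end) { return IntRange<I>(I{0}, end); }

template <typename I>
IntRange<I> simple_irange(I begin, I end) { return IntRange<I>(begin, end); }
```

The loop variable is declared `const`, so the body cannot modify the index. That is one of the safety benefits of the rewritten form over a hand-written counter loop.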
Chen Lai	13faaff54c	[Operator Versioning][Edge] Implement register function for upgrader (#67730 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67730 This PR implements the register function for the upgrader so it can be used at the loading stage ghstack-source-id: 145170986 Test Plan: ``` buck test //caffe2/test/cpp/jit:jit ``` Reviewed By: iseeyuan Differential Revision: D32092518 fbshipit-source-id: 779b51eb12b8cb162a93a55c1e66fe0becc4cb36	2021-12-09 02:18:09 -08:00
Peter Bell	e279963eef	Remove remaining THC code (#69039 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D32872476 Pulled By: ngimel fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31	2021-12-08 12:18:08 -08:00
Bin Bao	e8f4c9cc40	[LT] Upstream LazyView and view ops IR Nodes (#69277 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69277 LazyView is the main class for tracking aliasing caused by view ops. The corresponding IR classes for view ops are hand-written now, and we can switch to code-generating them in the future. Certain view ops have a reverse IR class to perform in-place updates in the backward direction on a chain of alias ops. As part of future work, we will simplify the logic for LazyView once the functionalization pass in core is ready to use. Test Plan: Imported from OSS Reviewed By: wconstab Differential Revision: D32820014 Pulled By: desertfire fbshipit-source-id: d9eb526cb23885f667e4815dc9dd291a7b7e4256	2021-12-04 08:44:54 -08:00
Ramanpreet Nara	f587267dc7	Revert D31705359: use irange for loops 8 Test Plan: revert-hammer Differential Revision: D31705359 (`17e5200441`) Original commit changeset: c9ea2fbc0f9c fbshipit-source-id: 08fff2d12beca953ad30dd0baabf86e39ac84f14	2021-12-02 12:55:08 -08:00
Richard Barnes	17e5200441	use irange for loops 8 (#66743 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743 Modified loops in files under fbsource/fbcode/caffe2/ from the format `for(TYPE var=x0;var<x_max;var++)` to the format `for(const auto var: irange(x0, x_max))` This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand. Test Plan: Sandcastle Reviewed By: malfet Differential Revision: D31705359 fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b	2021-12-02 10:21:29 -08:00
Alban Desmaison	00ebbd5ef6	Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer Test Plan: revert-hammer Differential Revision: D32010095 (`41d35dc201`) Original commit changeset: d763b0557780 fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d	2021-12-02 06:41:40 -08:00
Han Qi	41d35dc201	Add ability for a mobile::Module to save as flatbuffer (#67351 ) Summary: Included functions: * save_mobile_module -> saves a mobile::Module to flatbuffer * load_mobile_module_from_file -> loads a flatbuffer into mobile::Module * parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351 Reviewed By: iseeyuan Differential Revision: D32010095 Pulled By: qihqi fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1	2021-12-01 23:58:15 -08:00
Jacob Szwejbka	291e56eda4	[Pytorch Edge] Update Black Box Api with operator versioning (#68678 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68678 Test Plan: I'll update the unit test before landing Reviewed By: cccclai Differential Revision: D32573603 fbshipit-source-id: 19271bcbb68b61d24d6943e61a943f4f75fddb5d	2021-12-01 19:13:32 -08:00
Chen Lai	b9738e923e	[Operator Versioning][Edge] Add old models and unittest (#67726 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67726 1. Check in one model with aten:div_tensor old op with unittest in both cpp and python. The following two lines are commented out and expected to work after using upgrader. ``` _helper(mobile_module_v2, div_tensor_0_3) _helper(current_mobile_module, torch.div) ``` 2. Update the commented code accordingly. Currently there are 6 upgraders. The following old models with operators are added to cover these 6 upgraders: ``` // Tensor x Tensor test_versioned_div_tensor_v3 // Tensor x Scalar test_versioned_div_scalar_float_v3 test_versioned_div_scalar_reciprocal_int_v3 test_versioned_div_scalar_inplace_float_v3 // Scalar x Scalar test_versioned_div_scalar_scalar_v3 // Tensor x Tensor with out kwarg test_versioned_div_tensor_out_v3 // Tensor x Tensor inplace test_versioned_div_tensor_inplace_v3 // Tensor x Scalar inplace test_versioned_div_scalar_inplace_int_v3 ``` Note: In this pr, per model, it includes the following test: 1. Model (with old op) load/run test will be in both cpp and python 2. Model (with old op) + upgrader test will be in python Other tests considered adding: 1. per upgrader bytecode test 2. app level integration test ghstack-source-id: 144422418 Test Plan: CI and the added unittest Reviewed By: iseeyuan Differential Revision: D32069653 fbshipit-source-id: 96d9567088a1f709bc7795f78beed7a308e71ca9	2021-12-01 18:46:30 -08:00
Jiewen Tan	e6c435bf96	[LTC] Upstream helpers for c10::Device <=> BackendDevice (#69064 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69064 This commit upstreams helpers for converting a c10::Device to BackendDevice and vice versa. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.FromAten:BackendDeviceTest.ToAten Reviewed By: wconstab Differential Revision: D32732607 Pulled By: alanwaketan fbshipit-source-id: 0dd233d37a4a30fc4b22dba322ddd85d4cb3635b	2021-12-01 12:15:32 -08:00
Scott Wolchok	1d84d8c5d8	[PyTorch] Remove StringView from RecordFunction interface (1/2) (#68410 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68410 First step toward not heap-allocating a string in RecordFunction::before() every time ghstack-source-id: 144287654 Test Plan: CI Reviewed By: chaekit Differential Revision: D32453847 fbshipit-source-id: 080d95095fb568287b65fcc41a4ca6929b5f9a87	2021-11-30 13:20:08 -08:00
Joel Schlosser	8fef7c09f5	Remove finput from slow2d signatures (#68896 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68896 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D32655874 Pulled By: jbschlosser fbshipit-source-id: 3c9acb106961c40af1432652179edb2bc5a4bfa5	2021-11-30 09:47:24 -08:00
Jiewen Tan	0cdeb586ae	[LTC] Upstream some utilities (#69046 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69046 This commit upstreams utilities including ExceptionCleanup, MaybeRef, Iota, ToVector, ToOptionalVector and GetEnumValue. Test Plan: ./build/bin/test_lazy --gtest_filter=UtilTest.* Reviewed By: wconstab, Chillee Differential Revision: D32709090 Pulled By: alanwaketan fbshipit-source-id: 5147433becd4dbb07be7d36d66b0b8685054d714	2021-11-30 02:44:02 -08:00
Mikhail Zolotukhin	75ce040620	[TensorExpr] Allow for 'keepdim' argument in aten::mean in NNC's external call. (#68756 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68756 That fixes some warnings in our tests. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D32600952 Pulled By: ZolotukhinM fbshipit-source-id: 548eaf3659e20795cce44d8f57e77f4a47d44d98	2021-11-30 00:06:34 -08:00
Vinnam Kim	7b701ce2d4	Add set_to_none option to C++ API (#68801 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/68167. Signed-off-by: Vinnam Kim <vinnam.kim@makinarocks.ai> Pull Request resolved: https://github.com/pytorch/pytorch/pull/68801 Reviewed By: mruberry Differential Revision: D32625239 Pulled By: jbschlosser fbshipit-source-id: 5f09b959e23d5448106a47029d06ec20ad094d82	2021-11-29 08:42:39 -08:00
Bin Bao	787ded5103	Add lazy::Shape::numel() (#68314 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68314 Add a convenience to lazy::Shape for counting the number of elements (by multiplying out the dimensions). This is a method on Tensor, and in switching other lazy tensor shape utils to use aten shape inference, we need numel counts. Test Plan: add unit tests Reviewed By: alanwaketan Differential Revision: D32409138 fbshipit-source-id: 3ae725300f8826d38e45412f46501d5e5f776fb2	2021-11-29 08:38:09 -08:00
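787ded5103 counts elements "by multiplying out the dimensions". A minimal sketch on a hypothetical `Shape` struct holding only the sizes (the real `lazy::Shape` is not reproduced here):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical minimal shape type holding only the sizes; not lazy::Shape.
struct Shape {
  std::vector<std::int64_t> sizes;

  // Number of elements = product of all dimensions. A 0-d (scalar) shape is
  // the empty product, i.e. one element; any zero-sized dim gives zero.
  std::int64_t numel() const {
    return std::accumulate(sizes.begin(), sizes.end(), std::int64_t{1},
                           std::multiplies<std::int64_t>());
  }
};
```

Seeding the product with `std::int64_t{1}` rather than a plain `1` keeps the accumulation in 64-bit arithmetic.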
Han Qi	959cb03132	Populate operator_input_sizes_ (#68542 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68542 title Test Plan: unittest Reviewed By: iseeyuan Differential Revision: D32508159 fbshipit-source-id: 0773a725973a493f19a2e9a340365e559dfdf7f8	2021-11-23 12:18:06 -08:00
Tristan Rice	758d7dea9c	torch.monitor - Initial C++ Stats (#68074 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074 This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30 This defines the aggregation types, the `Stat` class and provides some simple collection of the stats. This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance. Changes: * added window_size to the stats. If specified, it will always compute the stat using the `window_size` number of values. If there aren't enough values within that window, it reports the previous stats. * This doesn't include the push metrics yet (will be coming). After more discussion it looks like the best way to handle this is to support a hybrid where the metric can set how frequently it'll be logged. For fixed window_size metrics it'll be logged each time it hits the window size. This will allow performant counters as well as lower frequency push counters (window_size=1). Performance considerations: * Updating the stats acquires a lock on that Stat object. This should be performant unless there are many threads writing to the same stat. A single thread will typically use a futex, so it should be quite fast. * Adding/removing/fetching all stats sets a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently. * Fetching stats accesses one stat at a time instead of a global lock. This means the exported values are linearizable but not serializable across multiple stats but I don't expect this to be an issue. Next steps: 1. Add StatCollector interface for push style metrics 2. Add pybind interfaces to expose to Python 3. Add default metric providers 4. Integrate into Kineto trace view Test Plan: buck test //caffe2/test/cpp/monitor:monitor CI Reviewed By: kiukchung Differential Revision: D32266032 fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a	2021-11-18 21:46:23 -08:00
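The fixed-size window behavior described in 758d7dea9c (compute over exactly `window_size` values, keep reporting the previous result until the window fills, lock per stat) can be sketched as follows. `WindowedMean` is a hypothetical class that tracks only a mean; it is not the `Stat` class from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>

// Hypothetical fixed-count windowed mean; not the torch.monitor Stat class.
// Assumes window_size >= 1.
class WindowedMean {
 public:
  explicit WindowedMean(std::size_t window_size) : window_size_(window_size) {}

  // Each update takes a per-object lock, so threads contend only when they
  // write to the same stat.
  void add(double value) {
    std::lock_guard<std::mutex> guard(mu_);
    sum_ += value;
    if (++count_ == window_size_) {
      last_ = sum_ / static_cast<double>(count_);
      sum_ = 0.0;
      count_ = 0;
    }
  }

  // Returns the mean of the last complete window; while the current window
  // is still filling, the previous result is reported.
  std::optional<double> get() const {
    std::lock_guard<std::mutex> guard(mu_);
    return last_;
  }

 private:
  mutable std::mutex mu_;
  std::size_t window_size_;
  std::size_t count_ = 0;
  double sum_ = 0.0;
  std::optional<double> last_;
};
```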
Hongyi Jia	146a7f68e2	Enable desync root cause analysis for NCCL (#68310 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68310 Enable desync root cause analysis by recording the last footprint of collective calls. On timeout, we parse the store trace and figure out the root cause of the desync issue. This feature is built on top of async error handling. Test Plan: Standalone test * Typical desync - P467288969 * Mismatched collectives - P467288916 * Mismatched broadcast size - P467288873 DDP benchmark * DDP benchmark desync - P467433483, P467520195 No perf regression: * w/o this diff https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs * w/ this diff https://www.internalfb.com/intern/fblearner/details/308534088?tab=Outputs Reviewed By: mingzhe09088 Differential Revision: D32348647 fbshipit-source-id: 43e7e96e3fa2be0ac66c1325bceb639b461a8b3a	2021-11-17 20:29:03 -08:00
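The root-cause step in 146a7f68e2 compares each rank's last recorded collective after a timeout. The sketch below assumes a simplified, hypothetical in-memory footprint (rank, sequence number, op name) rather than the real store trace format. Ranks whose last sequence number is behind the maximum are reported as lagging, and ranks at the maximum with a different op are reported as mismatched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-rank record of the last collective each rank entered.
struct Footprint {
  int rank;
  std::uint64_t seq;  // sequence number of the collective
  std::string op;     // e.g. "allreduce", "broadcast"
};

struct Diagnosis {
  std::vector<int> lagging_ranks;     // never reached the latest sequence number
  std::vector<int> mismatched_ranks;  // at the latest seq, but in a different op
};

Diagnosis diagnose(const std::vector<Footprint>& footprints) {
  Diagnosis d;
  if (footprints.empty()) return d;
  std::uint64_t max_seq = 0;
  for (const auto& f : footprints) max_seq = std::max(max_seq, f.seq);
  // Use the op of the first rank that reached max_seq as the reference; which
  // side of a mismatch is "wrong" is a judgment this sketch does not make.
  std::string reference;
  for (const auto& f : footprints) {
    if (f.seq == max_seq) {
      reference = f.op;
      break;
    }
  }
  for (const auto& f : footprints) {
    if (f.seq < max_seq) {
      d.lagging_ranks.push_back(f.rank);
    } else if (f.op != reference) {
      d.mismatched_ranks.push_back(f.rank);
    }
  }
  return d;
}
```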
Han Qi	4eb772fde6	Refactor saving jit::Module to mobile .pt in 2 steps: (#66494 ) Summary: 1. is to convert Function -> mobile::Function 2. is to serialize mobile::Function This also opens opportunity to create mobile::Module without saving/reloading Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494 Reviewed By: zhxchen17 Differential Revision: D32293022 Pulled By: qihqi fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d	2021-11-17 12:02:20 -08:00
jjsjann123	0dc3f829d9	Nvfuser code bump 11 5 (#67943 ) Summary: nvfuser code update: 1. Tuning heuristics on schedulers for reduction/normalization kernels; 2. bfloat16 on IO tensor support; 3. Refactored memory format support, now we can support dimension collapsing with non-coherent input tensors with different memory format. e.g. channels last tensor input to batch normalization. Note that we are currently limiting memory format to only Contiguous and Channels last; 4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`. Things that are reverted from our local branch: 1. changes on some entries in autodiff 2. aten::gelu with approximation 3. native_dropout(_backward) Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943 Reviewed By: ngimel Differential Revision: D32288709 Pulled By: dzhulgakov fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1	2021-11-17 01:22:17 -08:00
Raghavan Raman	2fd468e5f8	[jit] Set the graph input types before interpreting the graph during tracing (#68242 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68242 Test Plan: Imported from OSS Reviewed By: saketh-are Differential Revision: D32382958 Pulled By: navahgar fbshipit-source-id: 4e82a604a9ea2046af2755de23944147e618a65f	2021-11-15 15:44:32 -08:00
Mike Iovine	c697eeba72	[JIT] Combine concat nodes where possible (#67000 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67000 See the [related issue](https://github.com/pytorch/pytorch/issues/66654) for context. This new JIT optimization transforms patterns like this: ``` %inputs.1 : Tensor[] = prim::ListConstruct(%a, %b, %c) %concat.1 : Tensor = aten::cat(%inputs, %dim) %inputs.2 : Tensor[] = prim::ListConstruct(%x, %concat.1, %y) %concat.2 : Tensor = aten::cat(%inputs.2, %dim) ``` into this: ``` %inputs.2 : Tensor[] = prim::ListConstruct(%x, %a, %b, %c, %y) %concat.2 : Tensor = aten::cat(%inputs.2, %dim) ``` (it can do this for chains of `aten::cat` longer than 2 as well) A few conditions have to hold: 1. The `dim`s have to match. 2. `inputs.1` and `inputs.2` cannot be mutated Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOpt` Reviewed By: d1jang Differential Revision: D31819491 fbshipit-source-id: 9f1a501d52099eb1a630b5dd906df4c38c3817ba	2021-11-15 12:02:45 -08:00
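The rewrite in c697eeba72 splices the inputs of an inner `aten::cat` into an outer one when both concatenate along the same dim. Here is a toy version on a hypothetical IR in which a cat input is either a named value or another cat. It recurses, so chains longer than two are also flattened. Unlike the real pass, it does not check that the input lists are unmutated; it simply assumes so.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// Hypothetical toy IR: a cat input is either a named value or another cat.
struct Cat;
using Input = std::variant<std::string, std::shared_ptr<Cat>>;

struct Cat {
  int dim;
  std::vector<Input> inputs;
};

// cat([x, cat([a, b, c], d), y], d) -> cat([x, a, b, c, y], d).
// Inner cats along a different dim are kept as separate nodes.
// Assumes no input list is mutated, which the real pass must verify.
Cat combine_concats(const Cat& cat) {
  Cat result{cat.dim, {}};
  for (const auto& in : cat.inputs) {
    const auto* inner = std::get_if<std::shared_ptr<Cat>>(&in);
    if (inner != nullptr && (*inner)->dim == cat.dim) {
      Cat flat = combine_concats(**inner);
      result.inputs.insert(result.inputs.end(), flat.inputs.begin(),
                           flat.inputs.end());
    } else {
      result.inputs.push_back(in);
    }
  }
  return result;
}
```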
Mikhail Zolotukhin	e511a7a5b4	[TensorExpr] Remove non-determinism in iterating over unordered_set of intermediate buffers. (#68277 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68277 Differential Revision: D32400553 Test Plan: Imported from OSS Reviewed By: saketh-are, priyaramani Pulled By: ZolotukhinM fbshipit-source-id: a8fe820bbddaa19f95db432efaa6d3e36095a05e	2021-11-13 00:50:57 -08:00
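Iteration order over `std::unordered_set` is unspecified. With pointer keys it depends on allocation addresses, so code produced by walking such a set can differ between runs. The sketch below shows one common remedy: copy the elements out and sort them before iterating. Keys are strings here for simplicity, and this is not necessarily the fix applied in e511a7a5b4. `deterministic_order` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Returns the set's elements in a stable, sorted order so that anything
// generated by iterating over them is reproducible.
std::vector<std::string> deterministic_order(
    const std::unordered_set<std::string>& buffers) {
  std::vector<std::string> ordered(buffers.begin(), buffers.end());
  std::sort(ordered.begin(), ordered.end());
  return ordered;
}
```

Alternatively, the set can be paired with a vector that records insertion order, which avoids the sort and preserves the order in which buffers were discovered.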
Will Constable	6ddaf3bd37	[LT] Upstream TsNode, TsNodeLowering, TsLoweringContext (#68154 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68154 Test Plan: added a basic test; cover more by using lazy_tensor_staging tests Reviewed By: Krovatkin, alanwaketan Differential Revision: D32224303 fbshipit-source-id: ac3e1161229b8ae60fdb15ffa72e17072b595914	2021-11-12 12:57:20 -08:00
Will Constable	dc24503a89	Fix Hash(c10::Scalar), account for garbage data in union (#68201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68201 Hash(c10::Scalar) made a bad assumption that it was valid to just hash over all the bytes of data of the c10::Scalar struct. Because c10::Scalar stores a union of different (float/int/complex) types with different sizes, not all bytes are valid in all cases. Hash() should only read the bytes corresponding to the currently active type. Test Plan: Added new unit tests. Verified HashTest.Scalar failed with the original Hash() impl and then fixed. Reviewed By: alanwaketan Differential Revision: D32367564 fbshipit-source-id: ac30dd4f6dd0513954986d3d23c0c11ba802c37b	2021-11-12 07:20:08 -08:00
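The bug fixed in dc24503a89 arises because a union is as large as its biggest member. When a smaller member is active, the remaining bytes can hold anything. This sketch uses a hypothetical tagged union, not c10::Scalar's real layout. It hashes only the active member, so two equal integer scalars hash the same even when their unused bytes differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

// Hypothetical tagged union standing in for c10::Scalar (not its real layout).
struct Scalar {
  enum class Tag { Double, Int, Complex };
  Tag tag;
  union {
    double d;
    std::int64_t i;
    struct {
      double re;
      double im;
    } z;  // largest member: 16 bytes
  } v;
};

// Builds an Int scalar whose unused bytes are filled with `fill`,
// mimicking garbage left behind by a previous value.
Scalar make_int(std::int64_t value, unsigned char fill) {
  Scalar s;
  std::memset(&s, fill, sizeof(s));
  s.tag = Scalar::Tag::Int;
  s.v.i = value;
  return s;
}

std::size_t hash_combine(std::size_t seed, std::size_t h) {
  return seed ^ (h + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Hashes the tag plus only the active member; hashing every byte of `v`
// would also read the bytes past the active member.
std::size_t hash_scalar(const Scalar& s) {
  const std::size_t h = std::hash<int>{}(static_cast<int>(s.tag));
  switch (s.tag) {
    case Scalar::Tag::Double:
      return hash_combine(h, std::hash<double>{}(s.v.d));
    case Scalar::Tag::Int:
      return hash_combine(h, std::hash<std::int64_t>{}(s.v.i));
    case Scalar::Tag::Complex:
      return hash_combine(hash_combine(h, std::hash<double>{}(s.v.z.re)),
                          std::hash<double>{}(s.v.z.im));
  }
  return h;
}
```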
Howard Huang	7b376bf844	Remove ProcessGroup from TensorPipeAgent initialization (#68128 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68128 Reland of D31762735 (`0cbfd466d2`). This diff was originally reverted due to failure in test_send_export_type_through_rpc_with_custom_pickler. I updated rpc_pickler_test.py to prevent a race condition where processes were not registering their pickler before handling their rpc_sync calls. Test Plan: rpc_pickler_test file: buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test //caffe2/torch/fb/training_toolkit/backend/metrics/collectors/fbdata_aggregator/tests:batch_collector_test -- --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx rpc_pickler stress test: buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test -- --exact 'caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test - test_send_export_type_through_rpc_with_custom_pickler (caffe2.torch.fb.training_toolkit.backend.metrics.tests.rpc_pickler_test.CythonTypeRpcSpawnTest)' --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx --jobs 18 --stress-runs 10 --record-results Reviewed By: mrshenli Differential Revision: D32316077 fbshipit-source-id: e58de2335fbaa3ab46d46fe222c659197633a5e4	2021-11-11 12:28:55 -08:00
Martin Yuan	bd5f33f91e	demo backend decoupled from operators (#66100 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66100 A backend should not depend directly on ATen operators. The demo backend is changed accordingly, for testing purposes. Test Plan: Imported from OSS Reviewed By: pavithranrao Differential Revision: D31384614 Pulled By: iseeyuan fbshipit-source-id: c97f0c4aa12feb1d124f1d7a852e9955a7a2ce42	2021-11-11 10:26:17 -08:00
Will Constable	d6e6064efc	[LT] Upstream backend interfaces (#67927 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67927 BackendData - represents 'tensor data' in opaque backend storage LoweringContext - interface for performing backend-specific IR lowering BackendImplInterface - interface for lazy tensors backends to implement Reorgs backend-related files into lazy/backend subdir includes a few small fixes, which were made on lazy_tensor_staging but need to be back-ported to master. Test Plan: used by lazy_tensor_staging branch Reviewed By: desertfire Differential Revision: D32142032 fbshipit-source-id: 828c717bcd0d511876e64ad209b50f7bfb10cec5	2021-11-10 12:55:31 -08:00
Jiewen Tan	6011c35a79	[LTC] Upstream class BackendDevice (#68027 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68027 This commit upstreams class BackendDevice to master. It is a backend-specific representation of the actual hardware, for instance CPU, GPU, or TPU. This concept is important for backends like XLA, which need to know the actual hardware type behind the c10::DeviceType::Lazy virtual device during both IR construction and lowering. Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.* Reviewed By: wconstab Differential Revision: D32261838 Pulled By: alanwaketan fbshipit-source-id: 579c3fc5f9da7847c887a383c6047e8ecb9cc5bc	2021-11-10 07:05:43 -08:00
Bin Bao	a027551358	[LT] Merge cache.h (#67929 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67929 1. Write a node-hash based unit test for Cache 2. Replace CHECK with TORCH_CHECK in IrUtil Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D32246134 Pulled By: desertfire fbshipit-source-id: c464bc300126d47e9ad4af3b3e8484a389757dc0	2021-11-09 12:02:02 -08:00
Bin Bao	a473417076	[LT] Merge permutation_util into master (#67766 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67766 Test Plan: `build/bin/test_lazy` Reviewed By: wconstab Differential Revision: D32147676 Pulled By: desertfire fbshipit-source-id: 528b48c9cf789abc171235091c7146b2ab7a9c76	2021-11-09 12:00:39 -08:00


1688 commits