Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421
I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files rely on transitive includes to get access to `getTime`. Since I have to touch all the places that use `getTime` anyway, I may as well also move them to the new namespace.
Test Plan: Unit tests and CI.
Reviewed By: aaronenyeshi, albanD
Differential Revision: D32865907
fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
Summary:
`assertSignatureIsCorrect` is instantiated at least once per unique operator signature, yet its core logic is independent of the type. So, it makes sense to have a light-weight template that does nothing but call into the non-templated function with the correct `CppSignature` object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67986
Reviewed By: jbschlosser
Differential Revision: D33108600
Pulled By: swolchok
fbshipit-source-id: 7594524d3156ff2422e6edcdffcb263dc67ea346
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68483
Doesn't need to be in the header.
ghstack-source-id: 145668417
Test Plan: CI
Reviewed By: chaekit
Differential Revision: D32477113
fbshipit-source-id: 30e7796413e3220e4051544559f9110ab745022d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69087
This diff includes a variety of improvements to `set_inputs` to unify behavior with `torch::jit::Module`:
1. Eliminate code duplication between the rvalue/lvalue overloads
2. Add type checks
3. Make the input length check a `TORCH_CHECK` instead of a debug check - we have to fail when the wrong number of inputs is passed.
4. `schema` now always includes `self`, even if we release `module_`. This is consistent with `torch::jit::Module`.
ghstack-source-id: 145599837
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D32711705
fbshipit-source-id: fe97c10b4f03801ba59868b452e7d02b26b3106b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68412
These lists have the same size as CallbackHandles, so they should be the same container type.
ghstack-source-id: 145668416
Test Plan:
Run the same command as the previous diff.
Before: see previous diff, average about 0.46us
After: P467928077, average about 0.43us
Reviewed By: chaekit
Differential Revision: D32454856
fbshipit-source-id: 3a3ff4d381d99f51ef868d4dec4db7c411b5ea56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69860
Previously I made a mistake and checked in `aten::full.names` for the upgrader of `aten::full`, so this changes it back to just `aten::full`.
Test Plan: None
Reviewed By: gmagogsfm
Differential Revision: D33066985
fbshipit-source-id: a5598d60d1bff9b4455f807361388fac0689ba14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69412
TypePrinter does not need to take ownership of the Type.
This helps unblock the following diff, which stops refcounting Type singletons.
ghstack-source-id: 145671619
Test Plan: CI
Reviewed By: suo
Differential Revision: D32858525
fbshipit-source-id: df58676938fd20c7bae4a366d70b2067a852282d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69778
This PR extends fusion pattern support from a simple sequence of ops to a simple subgraph like conv - add:
```
x - conv ---\
y ---------add ---- output
```
where the inputs x, y and the output are observed/quantized.
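A minimal sketch of such a subgraph (the module and shape choices are illustrative, not taken from the PR):
```
import torch.nn as nn

# Both inputs and the output of this pattern are observed, so the
# conv - add subgraph can be matched and quantized as one fused unit.
class ConvAdd(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x, y):
        return self.conv(x) + y
```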
Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps.test_conv_add
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D33024528
fbshipit-source-id: 5c770c82c8f693fabdac5c69343942a9dfda84ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69658
This PR enables the fuse handler for sequences of three ops and merges all fuse handlers into one.
TODO: we can also move this to the backend_config_dict folder
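A hedged sketch of one such three-op sequence (conv - bn - relu is an assumed example; the diff describes the handler generically):
```
import torch.nn as nn

# A common three-op pattern that a single fuse handler can now match.
model = nn.Sequential(
    nn.Conv2d(3, 3, 3),
    nn.BatchNorm2d(3),
    nn.ReLU(),
)
```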
Test Plan:
regression fusion test
```
python test/test_quantization.py TestFuseFx
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32974907
fbshipit-source-id: ba205e74b566814145f776257c5f5bb3b24547c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69614
Previously sparse COO tensors were ignored during freezing because `tryInsertConstant` would fail in `freeze_module.cpp`, and because hashes weren't implemented for COO tensor IValues.
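A hedged sketch of the scenario this enables (module and attribute names are illustrative):
```
import torch

# A scripted module holding a sparse COO tensor attribute can now be
# frozen; previously tryInsertConstant rejected the sparse constant.
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.sparse_coo_tensor(
            torch.tensor([[0, 1], [1, 0]]),
            torch.tensor([1.0, 2.0]),
            (2, 2),
        )

    def forward(self, x):
        return self.w + x

frozen = torch.jit.freeze(torch.jit.script(M()).eval())
```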
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32954620
Pulled By: davidberard98
fbshipit-source-id: a91f97fdfc2152b417f43a6948100c94970c0831
Summary:
Refactor `torch.profiler.profile` by separating it into one low-level class and one high-level wrapper.
The PR includes the following changes:
1. Separate the class `torch.profiler.profile` into two classes: `kineto_profiler` and `torch.profiler.profile`.
2. The former exposes the low-level functionality available at the C++ level, e.g. `prepare_profiler`, `start_profiler`, `stop_profiler`.
3. The original logic in `torch.profiler.profile`, including `export_chrome_trace`, `export_stacks`, `key_averages`, `events`, and `add_metadata`, moves into `kineto_profiler`, since it is all exposed by `torch.autograd.profiler`.
4. The new `torch.profiler.profile` is fully backward-compatible with the original class since it inherits from `torch.profiler.kineto_profiler`. Its only responsibility in the new implementation is maintaining the finite state machine of `ProfilerAction`.
With this refactoring, the responsibility boundary is clear and the new logic is simple to understand.
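The user-facing API is unchanged; a minimal sketch of the high-level wrapper, whose scheduling is the `ProfilerAction` state machine described above:
```
import torch
from torch.profiler import ProfilerActivity, profile, schedule

# The wrapper only drives the ProfilerAction state machine; the actual
# prepare/start/stop calls live in the low-level profiler class.
with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=2),
) as prof:
    for _ in range(4):
        torch.randn(8, 8).mm(torch.randn(8, 8))
        prof.step()
print(prof.key_averages().table(sort_by="cpu_time_total"))
```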
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63302
Reviewed By: albanD
Differential Revision: D33006442
Pulled By: robieta
fbshipit-source-id: 30d7c9f5c101638703f1243fb2fcc6ced47fb690
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69381
Open-source the lowering workflow, related tools, and tests.
Test Plan: CI
Reviewed By: 842974287
Differential Revision: D32815136
fbshipit-source-id: 3ace30833a2bc52e9b02513c5e223cb339fb74a3
Summary:
- PyTorch and ONNX both support BFloat16; add it here to unblock some mixed-precision training models.
- Support the PyTorch TNLG model using BFloat16 tensors for the inputs/outputs of the layers that run on the NPU.
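A hedged sketch of the newly unblocked export path (the module and opset choice are illustrative; BFloat16 requires opset 13+ in ONNX):
```
import torch

# A module that routes data through BFloat16; exporting it should no
# longer fail on the unsupported dtype.
class M(torch.nn.Module):
    def forward(self, x):
        return (x.to(torch.bfloat16) + 1.0).to(torch.float)

torch.onnx.export(M(), torch.randn(2, 2), "m.onnx", opset_version=13)
```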
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66788
Reviewed By: jansel
Differential Revision: D32283510
Pulled By: malfet
fbshipit-source-id: 150d69b1465b2b917dd6554505eca58042c1262a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607
This PR adds `ShardedOptimizer` and an API to get module parameters along with `ShardedTensor` params; it allows users to use this optimizer wrapper to construct an optimizer that involves `ShardedTensor`s.
`state_dict` support will be a follow-up diff.
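A hedged sketch of the intended usage; the import path and helper name below are assumptions based on this summary, not verified against the diff:
```
import torch
import torch.nn as nn
from torch.distributed._shard.sharded_optim import (  # assumed path
    ShardedOptimizer,
    named_params_with_sharded_tensor,
)

# Collect parameters (including ShardedTensor params) and wrap a regular
# optimizer around them; in a real run the module would be sharded
# across ranks first.
module = nn.Linear(8, 8)
named_params = dict(named_params_with_sharded_tensor(module))
optim = ShardedOptimizer(named_params, torch.optim.SGD, lr=0.1)
```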
ghstack-source-id: 145532834
Test Plan: python test_sharded_optim.py
Reviewed By: pritamdamania87
Differential Revision: D32539994
fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993
This PR attempts to port `index_add` to structured kernels, but does more than that:
* Adds an `out=` variant to `index_add` (see the sketch after this list)
* Revises the `native_functions.yaml` registrations to not have multiple entries, instead passing a default value for `alpha`
* Changes the `derivatives.yaml` file for autograd support
* Revises error messages, please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615
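A short sketch of the new `out=` variant (values are illustrative):
```
import torch

# Accumulate alpha * src into x at the rows selected by index, writing
# the result into a preallocated out tensor.
x = torch.zeros(5, 3)
src = torch.ones(2, 3)
index = torch.tensor([0, 4])
out = torch.empty(5, 3)
torch.index_add(x, 0, index, src, alpha=2.0, out=out)
```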
Follow-up PRs in the near future will attempt to refactor the OpInfo test, and will take another look at the tests in `test/test_torch.py` for this function (hence the use of ghstack for this).
~This is WIP because there are tests failing for `Dimname` variant on mobile/android builds, and I'm working on fixing them.~
Issue tracker: https://github.com/pytorch/pytorch/issues/55070
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D32646426
fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68947
`_test_math_view` currently calls the operator with different values
than those specified in the `SampleInput`. This is undesirable as it
could break mathematical properties required by the operator. Instead,
this calls `math_op_view(math_op_physical(sample.input))` to get a
view that represents the same value as the original input.
`test_neg_view` already did this by returning `torch._neg_view(-x)`
from `math_op_view`, but this moves the handling into `_test_math_view`
so it applies to all view-op tests.
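The underlying identity, sketched here for the neg view:
```
import torch

# math_op_view(math_op_physical(x)): negate x, then take a neg view.
# The view records the negation lazily, so it represents the same
# values as the original x.
x = torch.randn(3)
v = torch._neg_view(-x)
assert v.is_neg()
assert torch.equal(v, x)
```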
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33064327
Pulled By: anjali411
fbshipit-source-id: 4d87e0c04fc39b95f8dc30dcabda0d554d16a1d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69272
In transformer encoder and MHA, masked_softmax's mask is a 2D tensor (B, D), while the input is a 4D tensor (B, H, D, D).
The mask could simply be broadcast to (B, H, D, D) like the input and followed by a regular masked_softmax; however, that would produce a non-contiguous mask and consume more memory.
In this diff, we keep the mask's shape unchanged and compute the corresponding mask element for the input in each CUDA thread.
This new layout is not supported on CPU yet.
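A small sketch of the broadcast semantics the kernel now performs implicitly (shapes are illustrative):
```
import torch

# A 2D key-padding mask (B, D) applies to every head and query position
# of a (B, H, D, D) input, so a thread handling element (b, h, i, j) can
# read mask[b, j] directly instead of materializing a 4D mask.
B, H, D = 2, 4, 8
inp = torch.randn(B, H, D, D)
mask = torch.zeros(B, D, dtype=torch.bool)
mask[:, D // 2:] = True  # mask out the second half of the keys
ref = inp.masked_fill(mask[:, None, None, :], float("-inf")).softmax(-1)
```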
Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn#binary.par -r test_masked_softmax
Reviewed By: ngimel
Differential Revision: D32605557
fbshipit-source-id: ef37f86981fdb2fb264d776f0e581841de5d68d2
Summary:
`torch.movedim` now directly handles a scalar (0-dim) input tensor as a no-op by returning a view of the input tensor (after all the usual checks on the other parameters).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69537
Test Plan:
This code now works fine, and `res1` is a view of `tensor`:
```
import torch
tensor = torch.rand(torch.Size([]))
res1 = torch.movedim(tensor, 0, 0)
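# res1 is a view of tensor; this call previously raised an error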
```
Fixes https://github.com/pytorch/pytorch/issues/69432
Reviewed By: jbschlosser
Differential Revision: D33020014
Pulled By: albanD
fbshipit-source-id: b3b2d380d70158bd3b3d6b40c073377104e09007
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69819
We should skip ReplaceWithCopy if the operator's inputs can be updated during inference. For a set of tensors that share data, ReplaceWithCopy should not apply to any of them if there are updates to any of them.
The check currently in place misses some cases (where updates exist and the number of uses is <= 1). This diff addresses the missing cases by querying the AliasDb.
Test Plan:
- Added test cases, including one that was problematic before this diff
- CI
Reviewed By: mikeiovine
Differential Revision: D33052562
fbshipit-source-id: 61f87e471805f41d071a28212f2f457e8c6785e7