Commit graph

42420 commits

Author SHA1 Message Date
Taylor Robie
24bc3be146 [Profiler] Clean up profiler includes. (#69421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421

I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files are relying on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace.

Test Plan: Unit tests and CI.

Reviewed By: aaronenyeshi, albanD

Differential Revision: D32865907

fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
2021-12-15 12:50:24 -08:00
Peter Bell
587f8d9924 OperatorEntry: Avoid unnecessarily templated code (#67986)
Summary:
`assertSignatureIsCorrect` is instantiated at minimum once per unique operator signature yet its core logic is independent of the type. So, it makes sense to have a light-weight template that does nothing but call into the non-templated function with the correct `CppSignature` object.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67986

Reviewed By: jbschlosser

Differential Revision: D33108600

Pulled By: swolchok

fbshipit-source-id: 7594524d3156ff2422e6edcdffcb263dc67ea346
2021-12-15 12:43:53 -08:00
Sahan Chanuka Paliskara
986d19c0a7 Avoid adding torch::deploy interpreter library to the data section (#69245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69245

Create a custom section ".embedded_interpreter" to store the interpreter instead of .data. This increases the amount of memory that can be used by the other sections of the executable (such as .text/.data/.bss) by 33% (1.5GB -> 2.0GB), removes memory limitations of the interpreter, and pays down tech debt.

Test Plan:
buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy
readelf -S ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/test_deploy
check the size of the .data section
Apply the fix and check the size of the .data section again. It should be reduced by the size of the interpreter.so

The output of `readelf -S ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/test_deploy` is as follows. The .data section is now 0.0015415GB and the .torch_deploy_payXXX section is 0.605125GB

```
(pytorch) [sahanp@devvm4333.vll0 ~/local/fbsource/fbcode] readelf -S buck-out/gen/caffe2/torch/csrc/deploy/test_deploy
There are 55 section headers, starting at offset 0x24bac82b0:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000200350  00000350
       0000000000000028  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000200378  00000378
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000200398  00000398
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .dynsym           DYNSYM           00000000002003c0  000003c0
       0000000000d07a48  0000000000000018   A       9     1     8
  [ 5] .gnu.version      VERSYM           0000000000f07e08  00d07e08
       0000000000115f86  0000000000000002   A       4     0     2
  [ 6] .gnu.version_r    VERNEED          000000000101dd90  00e1dd90
       0000000000000510  0000000000000000   A       9    15     4
  [ 7] .gnu.hash         GNU_HASH         000000000101e2a0  00e1e2a0
       00000000003b4fb0  0000000000000000   A       4     0     8
  [ 8] .hash             HASH             00000000013d3250  011d3250
       0000000000457e20  0000000000000004   A       4     0     4
  [ 9] .dynstr           STRTAB           000000000182b070  0162b070
       0000000004ef205a  0000000000000000   A       0     0     1
  [10] .rela.dyn         RELA             000000000671d0d0  0651d0d0
       0000000000110b80  0000000000000018   A       4     0     8
  [11] .rela.plt         RELA             000000000682dc50  0662dc50
       00000000000093f0  0000000000000018   A       4    35     8
  [12] .rodata           PROGBITS         0000000006837040  06637040
       00000000034067a8  0000000000000000 AMS       0     0     64
  [13] fb_build_info     PROGBITS         0000000009c3d7f0  09a3d7f0
       00000000000002ee  0000000000000000   A       0     0     16
  [14] .gcc_except_table PROGBITS         0000000009c3dae0  09a3dae0
       00000000014a9340  0000000000000000   A       0     0     4
  [15] .eh_frame_hdr     PROGBITS         000000000b0e6e20  0aee6e20
       00000000004abf54  0000000000000000   A       0     0     4
  [16] .eh_frame         PROGBITS         000000000b592d78  0b392d78
       000000000200e344  0000000000000000   A       0     0     8
  [17] .text             PROGBITS         000000000d5a2000  0d3a2000
       000000001e55944e  0000000000000000  AX       0     0     256
  [18] .init             PROGBITS         000000002bafb450  2b8fb450
       0000000000000017  0000000000000000  AX       0     0     4
  [19] .fini             PROGBITS         000000002bafb468  2b8fb468
       0000000000000009  0000000000000000  AX       0     0     4
  [20] .never_hugify     PROGBITS         000000002bafb480  2b8fb480
       0000000000000db3  0000000000000000  AX       0     0     16
  [21] text_env          PROGBITS         000000002bafc240  2b8fc240
       0000000000002e28  0000000000000000  AX       0     0     16
  [22] .plt              PROGBITS         000000002baff070  2b8ff070
       00000000000062b0  0000000000000000  AX       0     0     16
  [23] .tdata            PROGBITS         000000002bb06000  2b906000
       0000000000000b20  0000000000000000 WAT       0     0     8
  [24] .tbss             NOBITS           000000002bb06b40  2b906b20
       0000000000007cb8  0000000000000000 WAT       0     0     64
  [25] .fini_array       FINI_ARRAY       000000002bb06b20  2b906b20
       0000000000000028  0000000000000000  WA       0     0     8
  [26] .init_array       INIT_ARRAY       000000002bb06b48  2b906b48
       0000000000008878  0000000000000000  WA       0     0     8
  [27] .data.rel.ro      PROGBITS         000000002bb0f3c0  2b90f3c0
       0000000000029ce0  0000000000000000  WA       0     0     64
  [28] .ctors            PROGBITS         000000002bb390a0  2b9390a0
       0000000000000010  0000000000000000  WA       0     0     8
  [29] .dynamic          DYNAMIC          000000002bb390b0  2b9390b0
       0000000000000340  0000000000000010  WA       9     0     8
  [30] .got              PROGBITS         000000002bb393f0  2b9393f0
       000000000001f040  0000000000000000  WA       0     0     8
  [31] .bss.rel.ro       NOBITS           000000002bb58440  2b958430
       0000000000000c40  0000000000000000  WA       0     0     32
  [32] .data             PROGBITS         000000002bb5a000  2b959000
       0000000000194188  0000000000000000  WA       0     0     4096
  [33] .tm_clone_table   PROGBITS         000000002bcee188  2baed188
       0000000000000000  0000000000000000  WA       0     0     8
  [34] .probes           PROGBITS         000000002bcee188  2baed188
       0000000000000002  0000000000000000  WA       0     0     2
  [35] .got.plt          PROGBITS         000000002bcee190  2baed190
       0000000000003168  0000000000000000  WA       0     0     8
  [36] .bss              NOBITS           000000002bcf1300  2baf02f8
       00000000005214f0  0000000000000000  WA       0     0     128
  [37] .nvFatBinSegment  PROGBITS         000000002c213000  2baf1000
       0000000000002850  0000000000000000   A       0     0     8
  [38] .nv_fatbin        PROGBITS         000000002c216000  2baf4000
       0000000052baed38  0000000000000000  WA       0     0     8
  [39] .comment          PROGBITS         0000000000000000  7e6a2d38
       00000000000001dc  0000000000000000  MS       0     0     1
  [40] .debug_aranges    PROGBITS         0000000000000000  7e6a2f20
       0000000001266c00  0000000000000000           0     0     16
  [41] .debug_info       PROGBITS         0000000000000000  7f909b20
       000000007b21de49  0000000000000000           0     0     1
  [42] .debug_abbrev     PROGBITS         0000000000000000  fab27969
       000000000179f365  0000000000000000           0     0     1
  [43] .debug_line       PROGBITS         0000000000000000  fc2c6cce
       00000000176954ac  0000000000000000           0     0     1
  [44] .debug_str        PROGBITS         0000000000000000  11395c17a
       0000000039dc32b0  0000000000000001  MS       0     0     1
  [45] .debug_ranges     PROGBITS         0000000000000000  14d71f430
       0000000026a2d930  0000000000000000           0     0     16
  [46] .debug_types      PROGBITS         0000000000000000  17414cd60
       000000000b211ff5  0000000000000000           0     0     1
  [47] .debug_loc        PROGBITS         0000000000000000  17f35ed55
       000000009ca80c7e  0000000000000000           0     0     1
  [48] .debug_macinfo    PROGBITS         0000000000000000  21bddf9d3
       000000000000151c  0000000000000000           0     0     1
  [49] .note.stapsdt     NOTE             0000000000000000  21bde0ef0
       0000000000001b3c  0000000000000000           0     0     4
  [50] .debug_macro      PROGBITS         0000000000000000  21bde2a2c
       0000000000040e6a  0000000000000000           0     0     1
  [51] .torch_deploy_pay PROGBITS         0000000000000000  21be23896
       0000000026ba5d28  0000000000000000           0     0     1
  [52] .symtab           SYMTAB           0000000000000000  2429c95c0
       00000000020ce0c8  0000000000000018          54   863985     8
  [53] .shstrtab         STRTAB           0000000000000000  244a97688
       000000000000025c  0000000000000000           0     0     1
  [54] .strtab           STRTAB           0000000000000000  244a978e4
       00000000070309c6  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
```
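The GB figures quoted above can be reproduced from the hex `Size` column of the readelf dump — a quick sketch using the values from the section header table:

```python
# Convert readelf's 16-digit hex Size fields into GB (1 GB = 2**30 bytes).
def section_size_gb(hex_size: str) -> float:
    return int(hex_size, 16) / 2**30

# Sizes copied from the .data and .torch_deploy_pay rows above.
data_gb = section_size_gb("0000000000194188")     # .data
payload_gb = section_size_gb("0000000026ba5d28")  # .torch_deploy_pay

print(f".data:             {data_gb:.7f} GB")     # ~0.0015415 GB
print(f".torch_deploy_pay: {payload_gb:.6f} GB")  # ~0.605125 GB
```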

Reviewed By: shunting314

Differential Revision: D32596676

fbshipit-source-id: 1ab15b2d36422506d8f781d3bbc0c70c44bc3d91
2021-12-15 11:27:57 -08:00
Scott Wolchok
c6bcfb152d [PyTorch][easy] Move GlobalRecordFunctionCallbacks{,Entry} to cpp file (#68483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68483

Doesn't need to be in the header.
ghstack-source-id: 145668417

Test Plan: CI

Reviewed By: chaekit

Differential Revision: D32477113

fbshipit-source-id: 30e7796413e3220e4051544559f9110ab745022d
2021-12-15 09:38:51 -08:00
Mike Iovine
873585da2b [SR] Improve set_inputs (#69087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69087
This diff includes a variety of improvements to `set_inputs` to unify behavior with `torch::jit::Module`:

1. Eliminate code duplication between rvalue/lvalue overloads
2. Add type checks
3. Make the input length check a `TORCH_CHECK` instead of a debug check - we have to fail when the wrong number of inputs is passed.
4. `schema` now always includes `self`, even if we release `module_`. This is consistent with `torch::jit::Module`.
ghstack-source-id: 145599837

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32711705

fbshipit-source-id: fe97c10b4f03801ba59868b452e7d02b26b3106b
2021-12-15 09:31:19 -08:00
Scott Wolchok
aeedd89d4e [PyTorch] RecordFunction: use SmallVector for ObserverContextList (#68412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68412

These lists have the same size as CallbackHandles, so they should be the same container type.
ghstack-source-id: 145668416

Test Plan:
Run same command as previous diff.

Before: see previous diff, average about 0.46us
After: P467928077, average about 0.43us

Reviewed By: chaekit

Differential Revision: D32454856

fbshipit-source-id: 3a3ff4d381d99f51ef868d4dec4db7c411b5ea56
2021-12-15 09:31:16 -08:00
Jane Xu
29914f55bf Skip print_test_stats checks for tests that use repeat_test_for_types (#69872)
Summary:
Once https://github.com/pytorch/pytorch/issues/69865 is fixed, this change should be undone.

This will avoid print_test_stats errors in CI, such as https://github.com/pytorch/pytorch/runs/4501145212?check_suite_focus=true (HUD view fc37e5b3ed)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69872

Reviewed By: dagitses, suo

Differential Revision: D33094446

Pulled By: janeyx99

fbshipit-source-id: 7378556d75ea94dd407a2bf9dda37b15c57014f7
2021-12-15 09:29:58 -08:00
Nikita Shulga
d71b8e1a8d More distutils.version.LooseVersion changes (#69947)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69947

Reviewed By: seemethere

Differential Revision: D33111996

Pulled By: malfet

fbshipit-source-id: e7d2cc4ed3e39452e809965e360b05f0b409ec0d
2021-12-15 08:07:36 -08:00
Alban Desmaison
6f9844693f Revert D32974907: [quant][graphmode][fx] Enable fuse handler for sequence of 3 ops
Test Plan: revert-hammer

Differential Revision:
D32974907 (bf089840ac)

Original commit changeset: ba205e74b566

Original Phabricator Diff: D32974907 (bf089840ac)

fbshipit-source-id: e47838f3008ba014d884aef53460df654f0cf731
2021-12-15 05:46:49 -08:00
Alban Desmaison
87bc1f4ed8 Revert D33024528: [quant][fx][graphmode] Add support for conv add pattern in backend_config_dict
Test Plan: revert-hammer

Differential Revision:
D33024528 (59000cff91)

Original commit changeset: 5c770c82c8f6

Original Phabricator Diff: D33024528 (59000cff91)

fbshipit-source-id: 7da6f421ef63f47fbffad8b3ad91f6a31d19d867
2021-12-15 05:45:29 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
43b8e833e9 Fix bug in aten::full signature in version_map.h to accurately reflect the current schema (#69860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69860

Previously I made a mistake and checked in aten::full.names as the upgrader for aten::full, so I changed it back to just aten::full.

Test Plan: None

Reviewed By: gmagogsfm

Differential Revision: D33066985

fbshipit-source-id: a5598d60d1bff9b4455f807361388fac0689ba14
2021-12-15 01:09:31 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
5c7817fd43 Add test operator in upgrader entry (#69427)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69427

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D32867984

Pulled By: tugsbayasgalan

fbshipit-source-id: 25810fc2fd4b943911f950618968af067c04da5c
2021-12-15 00:40:05 -08:00
soulitzer
47f11730ec Add testing for forward over reverse gradgrad (#69740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69740

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33031727

Pulled By: soulitzer

fbshipit-source-id: 2bcba422b4bcea3bbc936d07ba45171a6531e578
2021-12-14 23:35:10 -08:00
soulitzer
d0fe7db1f6 Add formulas for distributions (#69690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69690

* #69558

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33031726

Pulled By: soulitzer

fbshipit-source-id: 9ae461dc6043d48d5bb8c2bbaa266d06ad99f317
2021-12-14 23:35:07 -08:00
soulitzer
b399a4d7b9 Add some reduction forward AD formulas (#69661)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69661

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020601

Pulled By: soulitzer

fbshipit-source-id: 110da6dcd490e5c3849cace62a777aa1a2b6982e
2021-12-14 23:33:43 -08:00
Scott Wolchok
3b7fc0243c [PyTorch] Make TypePrinter take const Type& (#69412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69412

TypePrinter does not need to take ownership of the Type.

This helps unblock the following diff to stop refcounting Type singletons.
ghstack-source-id: 145671619

Test Plan: CI

Reviewed By: suo

Differential Revision: D32858525

fbshipit-source-id: df58676938fd20c7bae4a366d70b2067a852282d
2021-12-14 23:13:03 -08:00
CodemodService FBSourceBuckFormatLinterBot
7a12b5063e [AutoAccept][Codemod][FBSourceBuckFormatLinter] Daily arc lint --take BUCKFORMAT
Reviewed By: zertosh

Differential Revision: D33119794

fbshipit-source-id: ca327caf34560c0bba32511e57d5dc18b71bdfe1
2021-12-14 21:54:41 -08:00
Jerry Zhang
59000cff91 [quant][fx][graphmode] Add support for conv add pattern in backend_config_dict (#69778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69778

This PR extends fusion pattern support from simple sequence of ops to a simple
subgraph like conv - add
```
x - conv ---\
y ---------add ---- output
```
where the inputs x, y and the output are observed/quantized
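The subgraph match can be illustrated with a toy matcher over a flat node list — a hypothetical sketch of the pattern, not the FX pattern-matching machinery:

```python
# Toy graph: each node is (name, op, inputs). We look for an add node
# with one input produced by a conv -- the conv-add pattern above.
def match_conv_add(graph):
    producers = {name: op for name, op, _ in graph}
    for name, op, inputs in graph:
        if op == "add" and any(producers.get(i) == "conv" for i in inputs):
            return name  # name of the matched add node
    return None

graph = [
    ("c", "conv", ["x"]),       # x - conv
    ("o", "add",  ["c", "y"]),  # conv output + y -> output
]
print(match_conv_add(graph))  # -> o
```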

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps.test_conv_add
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33024528

fbshipit-source-id: 5c770c82c8f693fabdac5c69343942a9dfda84ef
2021-12-14 20:46:01 -08:00
Chen Lai
408283319a [Operator Versioning][Edge] Change OP to CALL when there is a valid upgrader (#67731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67731

1. Register the upgrader function at loading stage
2. Change OP to CALL when the operator_version from the model is smaller than the current runtime version and there exists a valid upgrader

The interpreter log is :
```
RUNNING 0 STOREN 1 3
RUNNING 1 DROPR 1
RUNNING 2 LOAD 2
RUNNING 3 LOAD 3
RUNNING 4 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 5 LOAD 2
RUNNING 6 LOAD 3
RUNNING 7 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 8 MOVE 2
RUNNING 9 MOVE 3
RUNNING 10 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 11 TUPLE_CONSTRUCT 3
RUNNING 12 RET
```

The upgrader bytecode is:
```
(STOREN, 1, 2)
(LOAD, 1, 0)
(OP, 0, 0)
(JF, 3, 0)
(LOADC, 1, 0)
(JMP, 3, 0)
(LOAD, 2, 0)
(OP, 0, 0)
(STORE, 3, 0)
(MOVE, 3, 0)
(JF, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(OP, 1, 0)
(JMP, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(LOADC, 0, 0)
(OP, 2, 0)
(STORE, 4, 0)
(DROPR, 2, 0)
(DROPR, 1, 0)
(MOVE, 4, 0)
(RET, 0, 0)
```
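The dispatch rule described above — rewrite OP to CALL only when the model is older than the runtime and a valid upgrader is registered — can be sketched as follows (hypothetical registry and names, not the actual mobile runtime API):

```python
# Hypothetical upgrader registry keyed by operator name.
UPGRADERS = {"aten::div.Tensor": "div_Tensor_0_3"}

def resolve_instruction(op_name, model_version, runtime_version):
    """Return a CALL to the upgrader when an old model needs one,
    otherwise keep the plain OP instruction."""
    if model_version < runtime_version and op_name in UPGRADERS:
        return ("CALL", UPGRADERS[op_name])
    return ("OP", op_name)

print(resolve_instruction("aten::div.Tensor", 3, 4))  # old model: CALL upgrader
print(resolve_instruction("aten::div.Tensor", 4, 4))  # current model: plain OP
```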
ghstack-source-id: 145635622

Test Plan: describe in summary and CI

Reviewed By: iseeyuan

Differential Revision: D32092517

fbshipit-source-id: 0314b4bda5d2578cdd4e7cfbfd1e3c07fbccf8a3
2021-12-14 19:13:12 -08:00
Chen Lai
9e4d60a552 [Operator Versioning][Edge] Use check in cpp source file for upgrader (#67728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67728

1. Check in upgrader_mobile.h and upgrader_mobile.cpp
2. Add test to parse all bytecode from upgrader_mobile.h
ghstack-source-id: 145635621

Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterUpgraderTest.Upgrader'

Reviewed By: iseeyuan

Differential Revision: D32087295

fbshipit-source-id: 21e95aabb5e9db76be27e01adfea8fbc41caeaf6
2021-12-14 19:10:51 -08:00
Jerry Zhang
bf089840ac [quant][graphmode][fx] Enable fuse handler for sequence of 3 ops (#69658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69658

This PR enables the fuse handler for sequences of three ops, and merges all fuse handlers into one

TODO: we can also move this to backend_config_dict folder

Test Plan:
regression fusion test
```
python test/test_quantization.py TestFuseFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32974907

fbshipit-source-id: ba205e74b566814145f776257c5f5bb3b24547c1
2021-12-14 19:04:21 -08:00
Mike Iovine
102684b252 [SR] Fix stack/concat bug (#68777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68777

Fixed some cases where negative dimensions were not handled correctly

* `_stack_cpu` calls `maybe_wrap_dim`, but `_stack_cpu_out` does not. This is only problematic when `_stack_cpu_out` forwards to the serial kernel: [ref](https://www.internalfb.com/code/fbsource/[1b5af978b48f2e5d308d42b588bde3275869a57b]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=1541-1547).
* concat also needs to wrap its dim
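The wrapping that `_stack_cpu_out` was missing follows the usual `maybe_wrap_dim` convention — a minimal Python sketch of that helper, not the ATen implementation:

```python
def maybe_wrap_dim(dim: int, ndim: int) -> int:
    """Map a possibly-negative dim into [0, ndim), like ATen's helper:
    dim=-1 on a 3-d tensor becomes dim=2."""
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for {ndim}-d tensor")
    return dim + ndim if dim < 0 else dim

print(maybe_wrap_dim(-1, 3))  # -> 2
print(maybe_wrap_dim(0, 3))   # -> 0
```

Skipping this normalization is exactly the bug class fixed here: a negative `dim` passed straight through indexes the wrong axis (or walks off the end) in the serial kernel.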

Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Added new tests to cover this case

Reviewed By: hlu1

Differential Revision: D32604623

fbshipit-source-id: 00aaa42817cd2d3e7606ce75ab5a9744645118cf
2021-12-14 16:26:27 -08:00
David Berard
ebc35a7ead [JIT] Enable freezing for sparse COO tensors (#69614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69614

Previously sparse COO tensors were ignored during freezing, because
`tryInsertConstant` would fail during `freeze_module.cpp`, and because
hashes weren't implemented for COO tensor IValues.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32954620

Pulled By: davidberard98

fbshipit-source-id: a91f97fdfc2152b417f43a6948100c94970c0831
2021-12-14 15:43:50 -08:00
Brian Hirsh
33363cea64 Revert D32498572: allow external backend codegen to be used without autograd kernels
Test Plan: revert-hammer

Differential Revision:
D32498572 (b83b6f7424)

Original commit changeset: 3e7159c633f6

Original Phabricator Diff: D32498572 (b83b6f7424)

fbshipit-source-id: f93fa444c95a2423eef5975a2ecdb96f14e0c535
2021-12-14 15:28:49 -08:00
Brian Hirsh
f6cad53443 Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels
Test Plan: revert-hammer

Differential Revision:
D32498569 (aa0cf68c17)

Original commit changeset: ebd932d042b9

Original Phabricator Diff: D32498569 (aa0cf68c17)

fbshipit-source-id: 21a393fa339510d926512a7983d33ece327b743d
2021-12-14 15:27:24 -08:00
Brian Hirsh
0ef523633f Revert D32498570: make codegen'd device guards not cuda-specific. Allow them to be used in external codegen
Test Plan: revert-hammer

Differential Revision:
D32498570 (2e7a91c45f)

Original commit changeset: 0ce6a5614417

Original Phabricator Diff: D32498570 (2e7a91c45f)

fbshipit-source-id: 7c64ce1b5e51a680b4aeae8721e0c9e15c793289
2021-12-14 15:04:10 -08:00
Nikita Shulga
24ee1d13f6 Another attempt to fix version comparison check (#69939)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69939

Reviewed By: atalman

Differential Revision: D33108135

Pulled By: malfet

fbshipit-source-id: cadadfe5b04c4378f149136f8e1f8e8d6266775c
2021-12-14 14:54:15 -08:00
Mike Guo
d4f8313497 Add low level torch.profiler.kineto_profile base class (#63302)
Summary:
Refactor torch.profiler.profile by separating it into one low-level class and one high-level wrapper.

The PR includes the following changes:
1. Separate the class torch.profiler.profile into two classes: kineto_profiler and torch.profiler.profile.
2. The former class has the low-level functionality exposed at the C++ level, like prepare_profiler, start_profiler, and stop_profiler.
3. The original logic in torch.profiler.profile, including export_chrome_trace, export_stacks, key_averages, events, and add_metadata, is all moved into kineto_profiler since it is all exposed by torch.autograd.profiler.
4. The new torch.profiler.profile is fully backward-compatible with the original class since it inherits from torch.profiler.kineto_profiler. Its only responsibility in the new implementation is maintaining the finite state machine of ProfilerAction.

With this refactoring, the responsibility boundary is clear and the new logic is simple to understand.
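The ProfilerAction state machine that the high-level wrapper is left to maintain can be sketched as a step-to-action schedule — a simplified, hypothetical version of what torch.profiler's scheduler does:

```python
from enum import Enum

class ProfilerAction(Enum):
    NONE = 0
    WARMUP = 1
    RECORD = 2
    RECORD_AND_SAVE = 3

def schedule(step, wait=1, warmup=1, active=2):
    """Map a step index onto a repeating wait -> warmup -> active cycle;
    the last active step also saves the trace."""
    step %= wait + warmup + active
    if step < wait:
        return ProfilerAction.NONE
    if step < wait + warmup:
        return ProfilerAction.WARMUP
    if step < wait + warmup + active - 1:
        return ProfilerAction.RECORD
    return ProfilerAction.RECORD_AND_SAVE

print([schedule(s).name for s in range(4)])
# -> ['NONE', 'WARMUP', 'RECORD', 'RECORD_AND_SAVE']
```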

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63302

Reviewed By: albanD

Differential Revision: D33006442

Pulled By: robieta

fbshipit-source-id: 30d7c9f5c101638703f1243fb2fcc6ced47fb690
2021-12-14 14:47:43 -08:00
kshitij12345
e8d5c7cf7f [nn] mha : no-batch-dim support (python) (#67176)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

* [x] Update docs
* [x] Tests for shape checking

Tests take roughly 20s on the system that I use. Below are the timings for the slowest 20 tests.

```
pytest test/test_modules.py -k _multih --durations=20
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /home/kshiteej/Pytorch/pytorch_no_batch_mha, configfile: pytest.ini
plugins: hypothesis-6.23.2, repeat-0.9.1
collected 372 items / 336 deselected / 36 selected

test/test_modules.py ..............ssssssss..............                                                                                                                                                  [100%]

================================================================================================ warnings summary ================================================================================================
../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:73
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float32
  /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:73: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================================================================== slowest 20 durations ==============================================================================================
8.66s call     test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiheadAttention_cuda_float64
2.02s call     test/test_modules.py::TestModuleCPU::test_gradgrad_nn_MultiheadAttention_cpu_float64
1.89s call     test/test_modules.py::TestModuleCUDA::test_grad_nn_MultiheadAttention_cuda_float64
1.01s call     test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float32
0.51s call     test/test_modules.py::TestModuleCPU::test_grad_nn_MultiheadAttention_cpu_float64
0.46s call     test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_cuda_float32
0.45s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_cuda_float64
0.44s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_cuda_float32
0.21s call     test/test_modules.py::TestModuleCUDA::test_pickle_nn_MultiheadAttention_cuda_float64
0.21s call     test/test_modules.py::TestModuleCUDA::test_pickle_nn_MultiheadAttention_cuda_float32
0.18s call     test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_cuda_float64
0.17s call     test/test_modules.py::TestModuleCPU::test_non_contiguous_tensors_nn_MultiheadAttention_cpu_float32
0.16s call     test/test_modules.py::TestModuleCPU::test_non_contiguous_tensors_nn_MultiheadAttention_cpu_float64
0.11s call     test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float64
0.08s call     test/test_modules.py::TestModuleCPU::test_pickle_nn_MultiheadAttention_cpu_float32
0.08s call     test/test_modules.py::TestModuleCPU::test_pickle_nn_MultiheadAttention_cpu_float64
0.06s call     test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_cuda_float64
0.06s call     test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_cuda_float32
0.06s call     test/test_modules.py::TestModuleCPU::test_forward_nn_MultiheadAttention_cpu_float32
0.06s call     test/test_modules.py::TestModuleCPU::test_forward_nn_MultiheadAttention_cpu_float64
============================================================================================ short test summary info =============================================================================================
=========================================================================== 28 passed, 8 skipped, 336 deselected, 2 warnings in 19.71s ===========================================================================
```

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67176

Reviewed By: dagitses

Differential Revision: D33094285

Pulled By: jbschlosser

fbshipit-source-id: 0dd08261b8a457bf8bad5c7f3f6ded14b0beaf0d
2021-12-14 13:21:21 -08:00
Shirong Wu
37ec99c0e4 Open source trt lowering workflow (#69381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69381

Open source lowering workflow, related tools and tests.

Test Plan: CI

Reviewed By: 842974287

Differential Revision: D32815136

fbshipit-source-id: 3ace30833a2bc52e9b02513c5e223cb339fb74a3
2021-12-14 13:00:21 -08:00
Nikita Shulga
930067d129 Build clang builds with -Werror (#69712)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69712

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32997002

Pulled By: malfet

fbshipit-source-id: 8ebb5a955f8ae2d3fb67bc70636a2b1d66010c84
2021-12-14 12:41:57 -08:00
hwangdeyu
c76c6e9bd3 [ONNX] Add BFloat16 type support when export to ONNX (#66788)
Summary:
- PyTorch and ONNX have supported BFloat16; add this to unblock some mixed-precision training models.
- Support the PyTorch TNLG model using BFloat16 tensors for the inputs/outputs of the layers that run on the NPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66788

Reviewed By: jansel

Differential Revision: D32283510

Pulled By: malfet

fbshipit-source-id: 150d69b1465b2b917dd6554505eca58042c1262a
2021-12-14 12:23:32 -08:00
Wanchao Liang
800a457b6f [shard] add ShardedOptimizer (#68607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607

This PR added ShardedOptimizer and an API to get module parameters along with ShardedTensor params; it allows users to use this optimizer wrapper to construct an optimizer that involves ShardedTensors.

The state_dict support will be a follow-up diff
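The wrapper pattern described above — collect every parameter (sharded or not) into one flat mapping, then hand it to a regular optimizer — can be sketched without torch. All names here are toy stand-ins, not the real API:

```python
# Toy stand-ins: a "module" is a dict of named params; each param holds a
# value/grad pair (a ShardedTensor param would hold only the local shard).
def named_params_with_sharded(module):
    """Flatten module params into one {name: param} dict, keeping
    sharded params alongside regular ones."""
    return dict(module)

class ShardedOptimizer:
    """Wrap a plain optimizer class over the combined param dict."""
    def __init__(self, named_params, optim_cls, **kwargs):
        self.named_params = named_params
        self.optim = optim_cls(list(named_params.values()), **kwargs)

    def step(self):
        self.optim.step()

class ToySGD:
    def __init__(self, params, lr=0.1):
        self.params, self.lr = params, lr
    def step(self):
        for p in self.params:
            p["value"] = [v - self.lr * g
                          for v, g in zip(p["value"], p["grad"])]

module = {"weight": {"value": [1.0, 2.0], "grad": [1.0, 1.0]}}
opt = ShardedOptimizer(named_params_with_sharded(module), ToySGD, lr=0.1)
opt.step()
print(module["weight"]["value"])  # one SGD step applied to every param
```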
ghstack-source-id: 145532834

Test Plan: python test_sharded_optim.py

Reviewed By: pritamdamania87

Differential Revision: D32539994

fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
2021-12-14 12:15:20 -08:00
Brian Hirsh
457ba1dd3e Porting index_add to structured kernels, add an out variant (#65993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993

This PR attempts to port `index_add` to structured kernels, but does more than that:

* Adds an `out=` variant to `index_add`
* Revises `native_functions.yaml` registrations to not have multiple entries, and instead passes a default value for `alpha`.
* Changes in `derivatives.yaml` file for autograd functioning
* Revises error messages, please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615
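For reference, the semantics being ported (shown for the 1-D case, including the new `out=` flavor) can be sketched in plain Python; this is an illustration, not the ATen implementation:

```python
def index_add(self_, index, source, alpha=1, out=None):
    # 1-D sketch of index_add: out[index[i]] += alpha * source[i]
    if out is None:
        out = list(self_)   # functional variant allocates the result
    else:
        out[:] = self_      # out= variant writes into the provided buffer
    for i, idx in enumerate(index):
        out[idx] += alpha * source[i]
    return out

print(index_add([0, 0, 0], [2, 0], [10, 20], alpha=2))  # [40, 0, 20]
```

Note that repeated indices accumulate, which is why the second example adds into position 0 twice.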

Follow-up PRs in the near future will refactor the OpInfo test and take another look at the tests in `test/test_torch.py` for this function (hence the use of ghstack).

~This is WIP because tests are failing for the `Dimname` variant on mobile/Android builds, and I'm working on fixing them.~

Issue tracker: https://github.com/pytorch/pytorch/issues/55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32646426

fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5
2021-12-14 11:57:13 -08:00
Brian Hirsh
9594a94d80 fix CompositeImplicitAutograd ops improperly labeled (#69863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69863

This reverts commit 41c344d460.

Test Plan: Imported from OSS

Reviewed By: albanD, soulitzer

Differential Revision: D33072958

Pulled By: bdhirsh

fbshipit-source-id: 3d3488f37986256986ab009d6f16476f29cff625
2021-12-14 11:47:07 -08:00
Nikita Shulga
269e92669a [c2] Remove unused private fields (#69709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69709

Fixes a logic bug in `caffe2/ideep/operators/conv_op.cc`, which contained an always-false condition (`fusion_type_ == X && fusion_type_ == Y`).
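The bug class is easy to see in miniature (the fusion-type names below are hypothetical):

```python
FUSION_CONV_RELU, FUSION_CONV_SUM = 1, 2
fusion_type = FUSION_CONV_RELU

# Buggy shape: a single variable can never equal two different constants
# at once, so this `and` branch is dead code.
if fusion_type == FUSION_CONV_RELU and fusion_type == FUSION_CONV_SUM:
    raise AssertionError("unreachable: always-false conjunction")

# The intended check uses `or` (or, more idiomatically, membership):
is_fused = fusion_type in (FUSION_CONV_RELU, FUSION_CONV_SUM)
print(is_fused)  # True
```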

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997006

Pulled By: malfet

fbshipit-source-id: 23e4db1b17cf8a77eae6a8691847ffa484d4736c
2021-12-14 11:31:08 -08:00
Nikita Shulga
fef9981998 Update run_test.py (#69920)
Summary:
Do not compare a LooseVersion against a string

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69920

Reviewed By: atalman

Differential Revision: D33101166

Pulled By: malfet

fbshipit-source-id: a2df9e01d17663262718f11e580c8b009764f7b5
2021-12-14 11:26:56 -08:00
Andrew Or
3e43c478a8 [Quant][fx] Lower reference conv[1-3]d module (#69228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69228

Implement lowering logic for reference conv modules,
similar to https://github.com/pytorch/pytorch/pull/65723.
ghstack-source-id: 145058198

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_lowering

Imported from OSS

Reviewed By: anjali411

Differential Revision: D32890743

fbshipit-source-id: 04f2500628c60b0fbc84d22705164215e190aeba
2021-12-14 11:23:39 -08:00
Kevin Tse
b67eaec853 [DateLoader] more clearly expose 'default_collate' and 'default_convert' to users (#69862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69862

Fixes #69445

cc SsnL VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan, ngimel

Differential Revision: D33068792

Pulled By: NivekT

fbshipit-source-id: ef9791acdc23d014b8761fa7420062d454ce8969
2021-12-14 11:18:26 -08:00
Peter Bell
1188d89a1d TestMathBits: Call functions with original sample input values (#68947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68947

`_test_math_view` currently calls the operator with different values
than those specified in the `SampleInput`. This is undesirable as it
could break mathematical properties required by the operator. Instead,
this calls `math_op_view(math_op_physical(sample.input))` to get a
view that represents the same value as the original input.

`test_neg_view` already did this by returning `torch._neg_view(-x)`
from `math_op_view` but this moves the handling into `_test_math_view`
to make it apply to all view op tests.
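The identity being relied on can be sketched in plain Python, with negation standing in for the math op (in PyTorch, the lazy half is `torch._neg_view`):

```python
x = [1.0, -2.5, 3.0]

math_op_physical = lambda t: [-v for v in t]  # eagerly materializes the negation
math_op_view = lambda t: [-v for v in t]      # stand-in for the lazy view version

# Composing both yields a "view" that represents the same values as the
# original input, so mathematical preconditions on the sample still hold.
view_like = math_op_view(math_op_physical(x))
print(view_like == x)  # True
```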

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33064327

Pulled By: anjali411

fbshipit-source-id: 4d87e0c04fc39b95f8dc30dcabda0d554d16a1d8
2021-12-14 11:10:13 -08:00
Rui Zhu
1a299d8f1b Add support for transformer layout of masked_softmax (#69272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69272

In the transformer encoder and MHA, masked_softmax's mask is a 2D tensor (B, D), while the input is a 4D tensor (B, H, D, D).
This mask could simply be broadcast to (B, H, D, D) like the input, followed by a regular masked_softmax; however, that would produce a non-contiguous mask and consume more memory.
In this diff, we keep the mask's shape unchanged and compute the corresponding mask element for the input in each CUDA thread.

This new layout is not supported on CPU yet.
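The per-thread index arithmetic can be sketched as follows, assuming (as with a key-padding mask) that the (B, D) mask applies along the last axis of the (B, H, D, D) input. This is an illustration of the indexing only, not the CUDA kernel:

```python
def mask_flat_index(idx, B, H, D):
    # For a row-major (B, H, D, D) input element at flat index `idx`,
    # return the offset of its mask element in the flat (B, D) mask.
    b = idx // (H * D * D)   # batch of this element
    j = idx % D              # key position (last axis)
    return b * D + j

# Check against an explicit broadcast for a tiny case.
B, H, D = 2, 3, 4
mask = [[(b * D + j) % 2 for j in range(D)] for b in range(B)]
flat_mask = [m for row in mask for m in row]

ok = True
idx = 0
for b in range(B):
    for h in range(H):
        for i in range(D):
            for j in range(D):
                ok &= flat_mask[mask_flat_index(idx, B, H, D)] == mask[b][j]
                idx += 1
print(ok)  # True
```

Each thread can thus read its mask element directly, without ever materializing a broadcast (B, H, D, D) mask.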

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32605557

fbshipit-source-id: ef37f86981fdb2fb264d776f0e581841de5d68d2
2021-12-14 10:51:58 -08:00
Brian Hirsh
2e7a91c45f make codegen'd device guards not cuda-specific. Allow them to be used in external codegen (#68531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68531

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498570

Pulled By: bdhirsh

fbshipit-source-id: 0ce6a5614417671313b4d274ea84742c5b81d1b0
2021-12-14 10:25:04 -08:00
Brian Hirsh
aa0cf68c17 allow external backend codegen to toggle whether to generate out= and inplace kernels (#68530)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68530

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498569

Pulled By: bdhirsh

fbshipit-source-id: ebd932d042b988e19c71aa04a21677db9bdc9f04
2021-12-14 10:25:02 -08:00
Brian Hirsh
b83b6f7424 allow external backend codegen to be used without autograd kernels (#68529)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68529

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32498572

Pulled By: bdhirsh

fbshipit-source-id: 3e7159c633f6a80b60faa068436a4c49ebe731ca
2021-12-14 10:23:12 -08:00
Rick Weyrauch
8acd0a8b2f Allow row sizes to support int64/size_t. (#69303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69303

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/792

Follow up to D32715453 (e60fd10659), allowing row size to be 64-bit.

Test Plan:
buck test mode/opt -c fbcode.caffe2_gpu_type=v100,a100 //deeplearning/fbgemm/fbgemm_gpu:quantize_ops_test
   buck test mode/opt -c fbcode.caffe2_gpu_type=none //deeplearning/fbgemm/fbgemm_gpu:quantize_ops_test
   buck test mode/opt //caffe2/test:

Reviewed By: jspark1105, jianyuh

Differential Revision: D32768838

fbshipit-source-id: 9e2b01d8d23e71f8333820e725379c3fc1c0711a
2021-12-14 10:09:08 -08:00
francescocastelli
2c9dd886af Modify torch.movedim to handle scalar as no-op (#69537)
Summary:
`torch.movedim` now directly handles a scalar (0-dim) input tensor as a no-op, returning a view of the input tensor (after all the usual checks on the other parameters).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69537

Test Plan:
This code now works, and `res1` is a view of `tensor`:
```
import torch

tensor = torch.rand(torch.Size([]))
res1 = torch.movedim(tensor, 0, 0)
```

Fixes https://github.com/pytorch/pytorch/issues/69432

Reviewed By: jbschlosser

Differential Revision: D33020014

Pulled By: albanD

fbshipit-source-id: b3b2d380d70158bd3b3d6b40c073377104e09007
2021-12-14 09:55:59 -08:00
Ivan Kobzarev
7503ec58b2 [nnc][fix] xnnpack ifdef (#69870)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69870

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33075061

Pulled By: IvanKobzarev

fbshipit-source-id: dd53ad8b7d0ff36a68f0864540d6f7dd2284f0e0
2021-12-14 09:50:24 -08:00
Donald Dong
f7294cd865 [Static Runtime] Skip ReplaceWithCopy when inputs have writers (#69819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69819

We should skip ReplaceWithCopy if the inputs to the operator can be updated during inference. For a set of tensors that share data, ReplaceWithCopy should not happen to any of them if any of them may be updated.

The current check misses some cases (when updates exist and the number of uses is <= 1). This diff addresses the missing cases by querying the AliasDb.
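The conservative rule can be sketched with a toy alias database (the `ToyAliasDb` API below is hypothetical, not the real `torch::jit::AliasDb`):

```python
class ToyAliasDb:
    def __init__(self, alias_sets, writers):
        self._alias_sets = alias_sets   # value -> set of values sharing data
        self._writers = writers         # values with at least one writer

    def may_contain_writers(self, value):
        # Conservative: any member of the alias set having a writer taints all.
        aliases = self._alias_sets.get(value, {value})
        return any(v in self._writers for v in aliases)

def can_replace_with_copy(inputs, db):
    # Only safe when no input (or alias of an input) is ever written to.
    return all(not db.may_contain_writers(v) for v in inputs)

db = ToyAliasDb({"x": {"x", "x_view"}}, writers={"x_view"})
print(can_replace_with_copy(["x"], db))  # False: an alias of x is written to
print(can_replace_with_copy(["y"], db))  # True: y has no writers
```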

Test Plan:
- Added test cases, including one that was problematic before this diff
- CI

Reviewed By: mikeiovine

Differential Revision: D33052562

fbshipit-source-id: 61f87e471805f41d071a28212f2f457e8c6785e7
2021-12-14 09:39:49 -08:00
Nikita Shulga
07767569c9 Properly import LooseVersion (#69904)
Summary:
This fixes regression introduced by https://github.com/pytorch/pytorch/pull/57040

Somehow, importing `distutils` from `setuptools` caused `distutils.version` to be imported, which is not a documented dependency and changed with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0).
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```
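The underlying import behavior is easy to demonstrate with any stdlib package, here `xml` (clearing cached copies first so the check is meaningful):

```python
import sys

# Importing a package does not implicitly import its submodules — the same
# reason `import distutils` never brings in `distutils.version`.
for name in [m for m in list(sys.modules) if m == "xml" or m.startswith("xml.")]:
    del sys.modules[name]

import xml
loaded_by_package_import = hasattr(xml, "dom")

import xml.dom
loaded_after_explicit_import = hasattr(xml, "dom")

print(loaded_by_package_import, loaded_after_explicit_import)  # False True
```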

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
2021-12-14 09:28:19 -08:00
John Muradeli
fdcb78df38 print fix in lr_scheduler (#68338)
Summary:
`{:5d}` fails for `CosineAnnealingWarmRestarts`, which has a float `epoch`.
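The failure is a plain `str.format` property: the `d` presentation type only accepts integers. A minimal reproduction:

```python
# Integer epochs format fine with the `d` presentation type:
print("Epoch {:5d}".format(3))

# A fractional epoch (as CosineAnnealingWarmRestarts produces) raises ValueError:
try:
    "Epoch {:5d}".format(2.5)
    float_epoch_failed = False
except ValueError:
    float_epoch_failed = True
print("float epoch raises ValueError:", float_epoch_failed)

# A type-agnostic width spec handles both ints and floats:
print("Epoch {:5}".format(2.5))
```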

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68338

Reviewed By: jbschlosser

Differential Revision: D33063970

Pulled By: albanD

fbshipit-source-id: 992e987f8d5f6f8f5067924df4671e9725b6d884
2021-12-14 09:05:19 -08:00