Commit graph

1688 commits

Author SHA1 Message Date
Zhengxu Chen
d459e79500 [jit][edge] Remove usage of shared_ptr<mobile::Code>. (#68037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037

Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happens inside interpreter loop which doesn't outlive the module, so we don't need to use std::shared_ptr here. This also should saves us 1-2 KB for binary size, because shared_ptr seems to bloat on arm64 android.
ghstack-source-id: 145818696

Test Plan: eyes.

Reviewed By: qihqi, tugsbayasgalan

Differential Revision: D32264616

fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a
2021-12-16 13:11:46 -08:00
Jiawei Lv
b4c4a015d6 Revert D33163841: Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33163841

Original commit changeset: e262b6d8c80a

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: 644216036a238a458f0a2198460b36d24fb035f8
2021-12-16 11:12:18 -08:00
Jiawei Lv
c80b5b8c8f Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33102715 (eb374de3f5)

Original commit changeset: 3816ff01c578

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: e262b6d8c80a05f3a67e024fedfbadefdbfe6e29
2021-12-16 09:39:57 -08:00
David Berard
8c7f4a0d0b [tensorexpr] check for index out of bounds in ir_eval (#68858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68858

when executing with ir_eval, check for index out of bounds.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32657881

Pulled By: davidberard98

fbshipit-source-id: 62dd0f85bb182b34e9c9f795ff761081290f6922
2021-12-16 09:27:45 -08:00
jiej
76d282d447 Nvfuser code bump 12 5 (#69964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964

Things added in this PR that requires review:
1. cuLaunchCooperativeKernel driver API added
aten/src/ATen/cuda/detail/LazyNVRTC.cpp
aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h

nvfuser code update:
1. perf turning on codegen scheduler that improves performance.
2. permutation support has been extended beyond contiguous/channels-last. (The improvements could be observed on PW benchmark)

Things reverted from local changes:
1. aten::gelu with approximation
2. local changes that is upstreamed in PR https://github.com/pytorch/pytorch/issues/68804

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428

Reviewed By: ngimel

Differential Revision: D33073817

Pulled By: wconstab

fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb
2021-12-16 08:28:54 -08:00
Tristan Rice
eb374de3f5 Back out "Revert D32606547: torch/monitor: add C++ events and handlers" (#69923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923

Original commit changeset: fbaf2cc06ad4

Original Phabricator Diff: D32606547 (e61fc1c03b)

This is the same thing as the original diff but just using a normal std::mutex instead of std::shared_timed_mutex which is not available on OSX 10.11. The performance difference should be negligible and easy to change down the line if it does become a bottleneck.

Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

will add ciflow tags to ensure mac builds are fine

Reviewed By: aivanou

Differential Revision: D33102715

fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb
2021-12-15 22:51:43 -08:00
Taylor Robie
24bc3be146 [Profiler] Clean up profiler includes. (#69421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421

I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h` which winds up leaking the profiler symbols. Moreover several files are relying on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace.

Test Plan: Unit tests and CI.

Reviewed By: aaronenyeshi, albanD

Differential Revision: D32865907

fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
2021-12-15 12:50:24 -08:00
Chen Lai
408283319a [Operator Versioning][Edge] Change OP to CALL when there is a valid upgrader (#67731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67731

1. Register upgrader function at loading stage
2. Change OP to CALL when there operator_version from model is smaller than current runtime version and there exists a valid upgrader

The interpreter log is :
```
RUNNING 0 STOREN 1 3
RUNNING 1 DROPR 1
RUNNING 2 LOAD 2
RUNNING 3 LOAD 3
RUNNING 4 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 5 LOAD 2
RUNNING 6 LOAD 3
RUNNING 7 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 8 MOVE 2
RUNNING 9 MOVE 3
RUNNING 10 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 11 TUPLE_CONSTRUCT 3
RUNNING 12 RET
```

The upgrader bytecode is:
```
(STOREN, 1, 2)
(LOAD, 1, 0)
(OP, 0, 0)
(JF, 3, 0)
(LOADC, 1, 0)
(JMP, 3, 0)
(LOAD, 2, 0)
(OP, 0, 0)
(STORE, 3, 0)
(MOVE, 3, 0)
(JF, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(OP, 1, 0)
(JMP, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(LOADC, 0, 0)
(OP, 2, 0)
(STORE, 4, 0)
(DROPR, 2, 0)
(DROPR, 1, 0)
(MOVE, 4, 0)
(RET, 0, 0)
```
ghstack-source-id: 145635622

Test Plan: describe in summary and CI

Reviewed By: iseeyuan

Differential Revision: D32092517

fbshipit-source-id: 0314b4bda5d2578cdd4e7cfbfd1e3c07fbccf8a3
2021-12-14 19:13:12 -08:00
Chen Lai
9e4d60a552 [Operator Versioning][Edge] Use check in cpp source file for upgrader (#67728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67728

1. Check in upgrader_mobile.h and upgrader_mobile.cpp
2. Add test to parse all bytecode from upgrader_mobile.h
ghstack-source-id: 145635621

Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterUpgraderTest.Upgrader'

Reviewed By: iseeyuan

Differential Revision: D32087295

fbshipit-source-id: 21e95aabb5e9db76be27e01adfea8fbc41caeaf6
2021-12-14 19:10:51 -08:00
Michael Suo
f565167fbd Revert D32606547: torch/monitor: add C++ events and handlers
Test Plan: revert-hammer

Differential Revision:
D32606547 (e61fc1c03b)

Original commit changeset: a00d0364092d

Original Phabricator Diff: D32606547 (e61fc1c03b)

fbshipit-source-id: fbaf2cc06ad4bec606e8a9c6f591d65c04e6fa56
2021-12-11 22:51:03 -08:00
Tristan Rice
e61fc1c03b torch/monitor: add C++ events and handlers (#68783)
Summary:
This adds a C++ event handler corresponding to the Python one mentioned in the RFC.

This changes the counters a bit to all be push driven instead of being polled. The two window types are "fixed count" and "interval". One is based off the number of logged events and the other is based off of time windows. There's currently no active ticker for interval so it needs a regular stream of events to ensure events are produced. A follow up diff can add support for things like HHWheel / simple ticker.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan: buck test //caffe2/test/cpp/monitor:monitor

Reviewed By: kiukchung

Differential Revision: D32606547

fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead
2021-12-11 16:44:46 -08:00
Yanan Cao
17f3179d60 Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796

(Note: this ignores all push blocking failures!)

Test Plan: External CI + Sandcastle

Reviewed By: zhxchen17

Differential Revision: D33032671

fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef
2021-12-10 21:29:53 -08:00
Hao Lu
91d16cb633 [Jit] Fix schema of aten::split int[] version (#69745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69745

Missed in D31935573 (6b44e75f6b).

Reviewed By: d1jang

Differential Revision: D31889867

fbshipit-source-id: 417bd0b15db4891dbd641b35a803553f11d0d756
2021-12-10 02:33:36 -08:00
Nikita Shulga
3bb20ae49f Make c10d tests -Werror clean (#69703)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69703

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32997001

Pulled By: malfet

fbshipit-source-id: 38b5f195c04f2b3b920e6883a96fe9a36345b9d2
2021-12-09 22:10:04 -08:00
Ivan Kobzarev
7dba88dfdb [nnc][quant] Fix quantized concat (#69596)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69596

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32941108

Pulled By: IvanKobzarev

fbshipit-source-id: 727f608b98625648e2e444396d910838c95f58f2
2021-12-09 18:55:32 -08:00
Peter Bell
b2e79ed5ec Remove WindowsTorchApiMacro.h in favor of Export.h (#69585)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/68095

This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585

Reviewed By: mrshenli

Differential Revision: D32958594

Pulled By: albanD

fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061
2021-12-09 17:30:09 -08:00
Han Qi
d3649309e6 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306

Included functions:

save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer
Module object

Test Plan: unittests

Reviewed By: gmagogsfm

Differential Revision: D32806835

fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57
2021-12-09 14:53:31 -08:00
Richard Barnes
afb742382a use irange for loops 10 (#69394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394

Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32837991

fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32
2021-12-09 09:49:34 -08:00
Chen Lai
13faaff54c [Operator Versioning][Edge] Implement register function for upgrader (#67730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67730

This pr implement the register function for upgrader so it can be used at loading stage
ghstack-source-id: 145170986

Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```

Reviewed By: iseeyuan

Differential Revision: D32092518

fbshipit-source-id: 779b51eb12b8cb162a93a55c1e66fe0becc4cb36
2021-12-09 02:18:09 -08:00
Peter Bell
e279963eef Remove remaining THC code (#69039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872476

Pulled By: ngimel

fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31
2021-12-08 12:18:08 -08:00
Bin Bao
e8f4c9cc40 [LT] Upstream LazyView and view ops IR Nodes (#69277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69277

LazyView is the main class for tracking alias caused by view
ops. The corresponding IR classes for view ops are hand-written now, and
we can switch to code-gen them in future. For certain view ops, they
have a reverse IR class to perform inplace update in the backward
direction on a chain of alias ops.

As part of the future work, we will simplify the logic for LazyView once
the functionalization pass in core is ready to use.

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32820014

Pulled By: desertfire

fbshipit-source-id: d9eb526cb23885f667e4815dc9dd291a7b7e4256
2021-12-04 08:44:54 -08:00
Ramanpreet Nara
f587267dc7 Revert D31705359: use irange for loops 8
Test Plan: revert-hammer

Differential Revision:
D31705359 (17e5200441)

Original commit changeset: c9ea2fbc0f9c

fbshipit-source-id: 08fff2d12beca953ad30dd0baabf86e39ac84f14
2021-12-02 12:55:08 -08:00
Richard Barnes
17e5200441 use irange for loops 8 (#66743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705359

fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
2021-12-02 10:21:29 -08:00
Alban Desmaison
00ebbd5ef6 Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer
Test Plan: revert-hammer

Differential Revision:
D32010095 (41d35dc201)

Original commit changeset: d763b0557780

fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d
2021-12-02 06:41:40 -08:00
Han Qi
41d35dc201 Add ability for a mobile::Module to save as flatbuffer (#67351)
Summary:
Included functions:

* save_mobile_module -> saves a mobile::Module to flatbuffer
* load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
* parse_mobile_module -> parses from bytes or deserialized flatbuffer
      Module object

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351

Reviewed By: iseeyuan

Differential Revision: D32010095

Pulled By: qihqi

fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1
2021-12-01 23:58:15 -08:00
Jacob Szwejbka
291e56eda4 [Pytorch Edge] Update Black Box Api with operator versioning (#68678)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68678

Test Plan: Ill update the unit test before land

Reviewed By: cccclai

Differential Revision: D32573603

fbshipit-source-id: 19271bcbb68b61d24d6943e61a943f4f75fddb5d
2021-12-01 19:13:32 -08:00
Chen Lai
b9738e923e [Operator Versioning][Edge] Add old models and unittest (#67726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67726

1. Check in one model with aten:div_tensor old op with unittest in both cpp and python. The following two lines are commented out and expected to work after using upgrader.
```
_helper(mobile_module_v2, div_tensor_0_3)
_helper(current_mobile_module, torch.div)
```

2. Update the commented code accordingly.

Currently there are 6 upgraders. The following old models with operators are added to cover these 6 upgraders:
```
// Tensor x Tensor

test_versioned_div_tensor_v3

// Tensor x Scalar

test_versioned_div_scalar_float_v3
test_versioned_div_scalar_reciprocal_int_v3
test_versioned_div_scalar_inplace_float_v3

// Scalar x Scalar

test_versioned_div_scalar_scalar_v3

// Tensor x Tensor with out kwarg

test_versioned_div_tensor_out_v3

// Tensor x Tensor inplace

test_versioned_div_tensor_inplace_v3

// Tensor x Scalar inplace

test_versioned_div_scalar_inplace_int_v3

```
Note:
In this pr, per model, it includes the following test:
1. Model (with old op) load/run test will be in both cpp and python
2. Model (with old op) + upgrader test will be in python
Other tests considered adding:
1. per upgrader bytecode test
2. app level integration test
ghstack-source-id: 144422418

Test Plan: CI and the added unittest

Reviewed By: iseeyuan

Differential Revision: D32069653

fbshipit-source-id: 96d9567088a1f709bc7795f78beed7a308e71ca9
2021-12-01 18:46:30 -08:00
Jiewen Tan
e6c435bf96 [LTC] Upstream helpers for c10::Device <=> BackendDevice (#69064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69064

This commit upstreams helpers for converting a c10::Device to
BackendDevice and vice versa.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.FromAten:BackendDeviceTest.ToAten

Reviewed By: wconstab

Differential Revision: D32732607

Pulled By: alanwaketan

fbshipit-source-id: 0dd233d37a4a30fc4b22dba322ddd85d4cb3635b
2021-12-01 12:15:32 -08:00
Scott Wolchok
1d84d8c5d8 [PyTorch] Remove StringView from RecordFunction interface (1/2) (#68410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68410

First step toward not heap-allocating a string in RecordFunction::before() every time
ghstack-source-id: 144287654

Test Plan: CI

Reviewed By: chaekit

Differential Revision: D32453847

fbshipit-source-id: 080d95095fb568287b65fcc41a4ca6929b5f9a87
2021-11-30 13:20:08 -08:00
Joel Schlosser
8fef7c09f5 Remove finput from slow2d signatures (#68896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68896

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32655874

Pulled By: jbschlosser

fbshipit-source-id: 3c9acb106961c40af1432652179edb2bc5a4bfa5
2021-11-30 09:47:24 -08:00
Jiewen Tan
0cdeb586ae [LTC] Upstream some utilities (#69046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69046

This commit upstreams utilities including ExceptionCleanup, MaybeRef,
Iota, ToVector, ToOptionalVector and GetEnumValue.

Test Plan: ./build/bin/test_lazy --gtest_filter=UtilTest.*

Reviewed By: wconstab, Chillee

Differential Revision: D32709090

Pulled By: alanwaketan

fbshipit-source-id: 5147433becd4dbb07be7d36d66b0b8685054d714
2021-11-30 02:44:02 -08:00
Mikhail Zolotukhin
75ce040620 [TensorExpr] Allow for 'keepdim' argument in aten::mean in NNC's external call. (#68756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68756

That fixes some warnings in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600952

Pulled By: ZolotukhinM

fbshipit-source-id: 548eaf3659e20795cce44d8f57e77f4a47d44d98
2021-11-30 00:06:34 -08:00
Vinnam Kim
7b701ce2d4 Add set_to_none option to C++ API (#68801)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68167.

Signed-off-by: Vinnam Kim <vinnam.kim@makinarocks.ai>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68801

Reviewed By: mruberry

Differential Revision: D32625239

Pulled By: jbschlosser

fbshipit-source-id: 5f09b959e23d5448106a47029d06ec20ad094d82
2021-11-29 08:42:39 -08:00
Bin Bao
787ded5103 Add lazy::Shape::numel() (#68314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68314

Add a convenience to lazy::Shape for counting the number of elements (by multiplying out the dimensions).  This is a method on Tensor, and in switching other lazy tensor shape utils to use aten shape inference, we need numel counts.

Test Plan: add unit tests

Reviewed By: alanwaketan

Differential Revision: D32409138

fbshipit-source-id: 3ae725300f8826d38e45412f46501d5e5f776fb2
2021-11-29 08:38:09 -08:00
Han Qi
959cb03132 Populate operator_input_sizes_ (#68542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68542

title

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D32508159

fbshipit-source-id: 0773a725973a493f19a2e9a340365e559dfdf7f8
2021-11-23 12:18:06 -08:00
Tristan Rice
758d7dea9c torch.monitor - Initial C++ Stats (#68074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074

This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30

This defines the aggregation types, the `Stat` class and provides some simple collection of the stats.

This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance.

Changes:
* added window_size to the stats. If specified it will always compute the stat using the `window_size` number of values. If there aren't enough values within that window it reports the previous stats.
* This doesn't include the push metrics yet (will be coming).
  After more discussion it looks like the best way to handle this is to support a hybrid where the metric can set how frequently it'll be logged. For fixed window_size metrics it'll be logged each time it hits the window size. This will allow performant counters as well as lower frequency push counters (window_size=1).

Performance considerations:
* Updating the stats acquires a lock on that Stat object. This should be performant unless there's many-many threads writing to the same stat. Single thread will typically use futex so should be quite fast.
* Adding/removing/fetching all stats sets a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently.
* Fetching stats accesses one stat at a time instead of a global lock. This means the exported values are linearizable but not serializable across multiple stats but I don't expect this to be an issue.

Next steps:
1. Add StatCollector interface for push style metrics
1. Add pybind interfaces to expose to Python
1. Add default metric providers
1. Integrate into Kineto trace view

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

CI

Reviewed By: kiukchung

Differential Revision: D32266032

fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a
2021-11-18 21:46:23 -08:00
Hongyi Jia
146a7f68e2 Enable desync root cause analysis for NCCL (#68310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68310

Enable desync root cause analysis by recording the last footprint of collective calls. When timeout we parse the store trace and figure out the root cause of the desync issue. This feature is built based on async error handling.

Test Plan:
Standalone test
* Typical desync - P467288969
* Mismatched collectives - P467288916
* Mismatched broadcast size - P467288873

DDP benchmark
* DDP benchmark desync - P467433483, P467520195

No perf regression:
* w/o this diff https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs
* w/ this diff https://www.internalfb.com/intern/fblearner/details/308534088?tab=Outputs

Reviewed By: mingzhe09088

Differential Revision: D32348647

fbshipit-source-id: 43e7e96e3fa2be0ac66c1325bceb639b461a8b3a
2021-11-17 20:29:03 -08:00
Han Qi
4eb772fde6 Refactor saving jit::Module to mobile .pt in 2 steps: (#66494)
Summary:
1. is to convert Function -> mobile::Function
2. is to serialize mobile::Function

This also opens opportunity to create mobile::Module without saving/reloading

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494

Reviewed By: zhxchen17

Differential Revision: D32293022

Pulled By: qihqi

fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d
2021-11-17 12:02:20 -08:00
jjsjann123
0dc3f829d9 Nvfuser code bump 11 5 (#67943)
Summary:
nvfuser code update:
1. Tuning heuristics on schedulers for reduction/normalization kernels;
2. bfloat16 on IO tensor support;
3. Refactored memory format support, now we can support dimension collapsing with non-coherent input tensors with different memory format. e.g. channels last tensor input to batch normalization. Note that we are currently limiting memory format to only Contiguous and Channels last;
4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`.

Things that are reverted from our local branch:
1. changes on some entries in autodiff
2. aten::gelu with approximation
3. native_dropout(_backward)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943

Reviewed By: ngimel

Differential Revision: D32288709

Pulled By: dzhulgakov

fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1
2021-11-17 01:22:17 -08:00
Raghavan Raman
2fd468e5f8 [jit] Set the graph input types before interpreting the graph during tracing (#68242)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68242

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D32382958

Pulled By: navahgar

fbshipit-source-id: 4e82a604a9ea2046af2755de23944147e618a65f
2021-11-15 15:44:32 -08:00
Mike Iovine
c697eeba72 [JIT] Combine concat nodes where possible (#67000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67000

See the [related issue](https://github.com/pytorch/pytorch/issues/66654) for context.

This new JIT optimization transforms patterns like this:
```
%inputs.1 : Tensor[] = prim::ListConstruct(%a, %b, %c)
%concat.1 : Tensor = aten::cat(%inputs, %dim)
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %concat.1, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
into this:
```
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %a, %b, %c, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
(it can do this for chains of `aten::cat` longer than 2 as well)

A few conditions have to hold:
1.  The `dim`s have to match.
2. `inputs.1` and `inputs.2` cannot be mutated

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOpt`

Reviewed By: d1jang

Differential Revision: D31819491

fbshipit-source-id: 9f1a501d52099eb1a630b5dd906df4c38c3817ba
2021-11-15 12:02:45 -08:00
Mikhail Zolotukhin
e511a7a5b4 [TensorExpr] Remove non-determinism in iterating over unordered_set of intermediate buffers. (#68277)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68277

Differential Revision:
D32400553
D32400553

Test Plan: Imported from OSS

Reviewed By: saketh-are, priyaramani

Pulled By: ZolotukhinM

fbshipit-source-id: a8fe820bbddaa19f95db432efaa6d3e36095a05e
2021-11-13 00:50:57 -08:00
Will Constable
6ddaf3bd37 [LT] Upstream TsNode, TsNodeLowering, TsLoweringContext (#68154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68154

Test Plan: added a basic test; cover more by using lazy_tensor_staging tests

Reviewed By: Krovatkin, alanwaketan

Differential Revision: D32224303

fbshipit-source-id: ac3e1161229b8ae60fdb15ffa72e17072b595914
2021-11-12 12:57:20 -08:00
Will Constable
dc24503a89 Fix Hash(c10::Scalar), account for garbage data in union (#68201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68201

Hash(c10::Scalar) made a bad assumption that it was valid to just hash over all the bytes of data of the c10::Scalar struct.

Becuase c10::Scalar stores a union of different (float/int/complex) types with different sizes, not all bytes are valid in all cases.  Hash() should only read the bytes corresponding to the currently active type.

Test Plan: Added new unit tests.  Verified HashTest.Scalar failed with the original Hash() impl and then fixed.

Reviewed By: alanwaketan

Differential Revision: D32367564

fbshipit-source-id: ac30dd4f6dd0513954986d3d23c0c11ba802c37b
2021-11-12 07:20:08 -08:00
Howard Huang
7b376bf844 Remove ProcessGroup from TensorPipeAgent initialization (#68128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68128

Reland of D31762735 (0cbfd466d2).

This diff was originally reverted due to failure in test_send_export_type_through_rpc_with_custom_pickler.

I updated rpc_pickler_test.py to prevent a race condition where processes were not registering their pickler before handling their rpc_sync calls.

Test Plan:
rpc_pickler_test file:

buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test //caffe2/torch/fb/training_toolkit/backend/metrics/collectors/fbdata_aggregator/tests:batch_collector_test -- --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx

rpc_pickler stress test:

buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test -- --exact 'caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test - test_send_export_type_through_rpc_with_custom_pickler (caffe2.torch.fb.training_toolkit.backend.metrics.tests.rpc_pickler_test.CythonTypeRpcSpawnTest)' --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx --jobs 18 --stress-runs 10 --record-results

Reviewed By: mrshenli

Differential Revision: D32316077

fbshipit-source-id: e58de2335fbaa3ab46d46fe222c659197633a5e4
2021-11-11 12:28:55 -08:00
Martin Yuan
bd5f33f91e demo backend decoupled from operators (#66100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66100

A backend should not directly dependent on ATen operators. The demo backend is changed to that way for testing purpose.

Test Plan: Imported from OSS

Reviewed By: pavithranrao

Differential Revision: D31384614

Pulled By: iseeyuan

fbshipit-source-id: c97f0c4aa12feb1d124f1d7a852e9955a7a2ce42
2021-11-11 10:26:17 -08:00
Will Constable
d6e6064efc [LT] Upstream backend interfaces (#67927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67927

BackendData - represents 'tensor data' in opaque backend storage
LoweringContext - interface for performing backend-specific IR lowering
BackendImplInterface - interface for lazy tensors backends to implement

Reorgs backend-related files into lazy/backend subdir

includes a few small fixes, which were made on lazy_tensor_staging but need to be back-ported to master.

Test Plan: used by lazy_tensor_staging branch

Reviewed By: desertfire

Differential Revision: D32142032

fbshipit-source-id: 828c717bcd0d511876e64ad209b50f7bfb10cec5
2021-11-10 12:55:31 -08:00
Jiewen Tan
6011c35a79 [LTC] Upstream class BackendDevice (#68027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68027

This commit upstreams class BackendDevice to the master, which is a backend
specific representation of the actual hardware, for instances, CPU, GPU, or
TPU.

This concept is important for backend like XLA where it needs to tell the
actual hardware type from the c10::DeviceType::Lazy virtual device during
both IR constructions and lowerings.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.*

Reviewed By: wconstab

Differential Revision: D32261838

Pulled By: alanwaketan

fbshipit-source-id: 579c3fc5f9da7847c887a383c6047e8ecb9cc5bc
2021-11-10 07:05:43 -08:00
Bin Bao
a027551358 [LT] Merge cache.h (#67929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67929

1. Write a node-hash based unit test for Cache
2. Replace CHECK with TORCH_CHECK in IrUtil

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32246134

Pulled By: desertfire

fbshipit-source-id: c464bc300126d47e9ad4af3b3e8484a389757dc0
2021-11-09 12:02:02 -08:00
Bin Bao
a473417076 [LT] Merge permutation_util into master (#67766)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67766

Test Plan: `build/bin/test_lazy`

Reviewed By: wconstab

Differential Revision: D32147676

Pulled By: desertfire

fbshipit-source-id: 528b48c9cf789abc171235091c7146b2ab7a9c76
2021-11-09 12:00:39 -08:00