Commit graph

2846 commits

Author SHA1 Message Date
pbialecki
7b85adf20f Add back pycuda.autoinit to test_pt_onnx_trt (#51106)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51105 by adding back the `import pycuda.autoinit`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51106

Reviewed By: mingzhe09088

Differential Revision: D26086808

Pulled By: heitorschueroff

fbshipit-source-id: 88d98796c87a44cedaa1f6666e9f71a424293641
2021-01-27 07:10:11 -08:00
Arindam Roy
09b896261c Skip test_lc_1d for ROCM (#50964)
Summary:
The test is flaky on ROCM when deadline is set to 1 second. This is affecting builds as it is failing randomly.
Disabling for now.

Signed-off-by: Arindam Roy <rarindam@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50964

Reviewed By: houseroad

Differential Revision: D26049370

Pulled By: BIT-silence

fbshipit-source-id: 22337590a8896ad75f1281e56fbbeae897f5c3b2
2021-01-25 11:43:37 -08:00
Lu Fang
f32b10e564 [BE] Fix the broken test caffe2/caffe2/python:lazy_dyndep_test - test_allcompare (#50696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50696

Set no deadline for test_allcompare.

Test Plan: buck test mode/dev //caffe2/caffe2/python:lazy_dyndep_test -- --exact 'caffe2/caffe2/python:lazy_dyndep_test - test_allcompare (caffe2.caffe2.python.lazy_dyndep_test.TestLazyDynDepAllCompare)' --run-disabled

Reviewed By: hl475

Differential Revision: D25947800

fbshipit-source-id: d2043f97128e257ef06ebca9b68262bb1c0c5e6b
2021-01-18 16:21:06 -08:00
Lu Fang
1fdc35da2c [BE] Fix the broken test -- caffe2/caffe2/python:hypothesis_test - test_recurrent (#50668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50668

GPU initialization is sometimes slow.

Test Plan: buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --exact 'caffe2/caffe2/python:hypothesis_test - test_recurrent (caffe2.caffe2.python.hypothesis_test.TestOperators)' --run-disabled

Reviewed By: hl475

Differential Revision: D25939037

fbshipit-source-id: 832700cf42ece848cda66dd629a06ecda207f086
2021-01-17 21:21:38 -08:00
Zhijing Li
05542f6222 EMA op (#50393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50393

Exponential Moving Average

Usage:

Add `ema_options` to the adagrad optimizer. For details, please refer to the test workflow setting.

If `ema_end == -1`, the EMA never ends.
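The update itself is a standard exponential moving average, gated by a step window; a minimal sketch (the names `ema_start`, `ema_end`, and `ema_alpha` are illustrative, not the actual option fields):

```python
def ema_update(param, ema, step, ema_start=0, ema_end=-1, ema_alpha=0.1):
    """Blend the current parameter value into its running average.

    ema_end == -1 means the averaging window never closes.
    """
    if step < ema_start or (ema_end != -1 and step > ema_end):
        return ema  # outside the EMA window: leave the average untouched
    return ema_alpha * param + (1.0 - ema_alpha) * ema
```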

Test Plan:
buck test caffe2/caffe2/fb/optimizers:ema_op_optimizer_test

buck test caffe2/caffe2/fb/optimizers:ema_op_test

f240459719

Differential Revision: D25416056

fbshipit-source-id: a25e676a364969e3be2bc47750011c812fc3a62f
2021-01-13 08:58:01 -08:00
Hugo van Kemenade
473e78c0fa Remove redundant code for unsupported Python versions (#49486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49486

Remove code for Python 3.5 and lower.

There's more that can be removed/modernised, but sticking mainly to redundant version checks here, to keep the diff/PR smaller.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46579

Reviewed By: zou3519

Differential Revision: D24453571

Pulled By: ezyang

fbshipit-source-id: c2cfcf05d6c5f65df64d89c331692c9aec09248e
2021-01-06 12:45:46 -08:00
Richard Barnes
9945fd7253 Drop unused imports from caffe2/python (#49980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49980

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727359

fbshipit-source-id: c4f60005b10546423dc093d31d46deb418352286
2021-01-05 13:17:46 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.
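For reference, a docstring in the Google style the PR converges on (this example is illustrative, not taken from the codebase):

```python
def scale(tensor, factor):
    """Multiply every element of a sequence by a scalar.

    Args:
        tensor: Input values.
        factor: Scalar multiplier.

    Returns:
        A list with the scaled values.
    """
    return [x * factor for x in tensor]
```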

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
skyline75489
46b83212d1 Remove unused six code for Python 2/3 compatibility (#48077)
Summary:
This is basically a reborn version of https://github.com/pytorch/pytorch/issues/45254 .

Ref: https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48077

Reviewed By: ngimel

Differential Revision: D25687042

Pulled By: bugra

fbshipit-source-id: 05f20a6f3c5212f73d0b1505b493b720e6cf74e5
2020-12-22 18:07:08 -08:00
Taylor Robie
faf6032945 Remove deadlines for Caffe2 hypothesis_test when running on GPU. (#49591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49591

A bunch of these tests are marked flaky, and have been since time immemorial. (Read: as far back as Buck will build.) However closer inspection reveals that they fail if and only if run on a GPU worker. What seems to be going on is that there are more jobs than GPUs, so the contention causes waits which registers as timeouts on the test.

This diff is kind of hacky, but it basically just drops deadlines if a GPU is present. Because Caffe2 is going away I'm not too terribly concerned about a beautiful solution, but we may as well keep some test coverage if it's easy.
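The shape of the hack, sketched (the helper name and default are assumptions, not the actual diff): the returned value would be handed to `hypothesis.settings(deadline=...)`.

```python
def choose_deadline_ms(gpu_present, default_ms=10000):
    """Pick the deadline for hypothesis.settings(deadline=...).

    GPU workers share devices between test jobs, so contention shows up
    as spurious timeouts; drop the deadline entirely when a GPU is present.
    """
    return None if gpu_present else default_ms
```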

CC Sebastian, Ilia, Min, and Hongzheng who also have tasks for what seems to be the same flakiness.

Test Plan: Turn the tests back on and see if they fall over. (The failure repros reliably on an OnDemand GPU and is fixed by this change, so it's not really just a hail Mary.)

Reviewed By: ngimel

Differential Revision: D25632981

fbshipit-source-id: 43dcce416fea916ba91f891e9e5b59b2c11cca1a
2020-12-18 10:00:24 -08:00
Andrey Malevich
f5a26a554b [C2] Revive unsafe CoalesceOp (#49402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49402

In cases of NCCLAllReduce operations there could be non-trivial overhead for
launching cooperative kernels (especially in case of async execution of
different parts of the model). This diff is reviving this operator to make it
possible to fuse multiple operations into a single kernel.

Test Plan:
Unit-test.
Used in a later diff.

Reviewed By: xianjiec

Differential Revision: D25531206

fbshipit-source-id: 64b1c161233a726f9e2868f1059316e42a8ea1fc
2020-12-17 04:31:29 -08:00
Andrey Malevich
46debe7f23 [DPER] Introduce barrier operation to force synchronization of threads in async execution (#49322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49322

In some cases async execution might lose dependencies (alias-like ops) or produce suboptimal scheduling when there is a choice of which parts to schedule first. An example of the latter can happen in ModelParallel training, where a copy can get lower priority than the rest of the execution on a given GPU, causing the other GPUs to starve.

This operator makes it possible to address these issues by introducing extra explicit dependencies between ops.

Test Plan:
Unit test.
E2E testing in future diffs.

Reviewed By: xianjiec

Differential Revision: D24933471

fbshipit-source-id: 1668994c7856d73926cde022378a99e1e8db3567
2020-12-15 16:13:42 -08:00
Newsha Ardalani
0fb58d76a1 Support ArgMin in c2_pt_converter
Summary:
+ Add ArgMin support to Caffe2 to PyTorch converter
+ Using hypothesis to parameterize different conditions for test

Test Plan: buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test

Reviewed By: houseroad

Differential Revision: D25016203

fbshipit-source-id: 94489fcf1ed3183ec96f9796a5b4fb348fbde5bc
2020-12-05 16:35:34 -08:00
Rahul Manghwani
142b21fd44 Add SparseLengthsSum4BitRowwiseSparse in c2_pt_converter (#48240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48240

Adds the support for converting the SparseLengthsSum4BitRowwiseSparse operator from caffe2 to pytorch as a part of c2_pt_converter

Test Plan:
Added a unit test

buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test

Tests passed:
https://our.intern.facebook.com/intern/testinfra/testrun/2251799856412296

Reviewed By: houseroad

Differential Revision: D25067833

fbshipit-source-id: 45cbc331ca35bee27e083714e65a1e87a2a2d2e0
2020-12-04 14:16:25 -08:00
Tristan Rice
dc7d8a889e caffe2: refactor context to allow being typed (#48340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48340

This changes the context managed classes from using a decorator to define them to using inheritance. Inheritance allows the python static type checking to work correctly.

```
context.define_context()
class Bar(object): ...

context.define_context(allow_default=True)
class Foo(object): ...
```

becomes
```
class Foo(context.Managed): ...

class Bar(context.DefaultManaged): ...
```

Behavior differences:
* arg_name has been removed since it's not used anywhere
* classes need to call `super()` in `__enter__/__exit__` methods if they override (none do)

This also defines a context.pyi file to add types for python3. python2 support should not be affected
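To illustrate the inheritance-based design, here is a minimal sketch of what a `Managed` base class can look like (a simplified stand-in, not the actual caffe2 implementation): each subclass keeps a per-thread stack of active instances, and `__enter__`/`__exit__` push and pop it.

```python
import threading

class Managed(object):
    """Context-managed registry via inheritance instead of a decorator."""
    _local = threading.local()

    @classmethod
    def _stack(cls):
        stacks = getattr(cls._local, 'stacks', None)
        if stacks is None:
            stacks = cls._local.stacks = {}
        return stacks.setdefault(cls, [])

    @classmethod
    def current(cls):
        stack = cls._stack()
        return stack[-1] if stack else None

    def __enter__(self):
        self._stack().append(self)
        return self

    def __exit__(self, *exc_info):
        self._stack().pop()

class Bar(Managed):
    pass
```

Because `Bar` is an ordinary subclass, static type checkers see `current()` and the context-manager protocol without any decorator magic.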

Test Plan:
ci

  buck test //caffe2/caffe2/python:context_test //caffe2/caffe2/python:checkpoint_test

Reviewed By: dongyuzheng

Differential Revision: D25133469

fbshipit-source-id: 16368bf723eeb6ce3308d6827f5ac5e955b4e29a
2020-11-30 18:31:14 -08:00
Frank Seide
29f0e1e2ce Fused8BitRowwiseQuantizedToFloat operator support (#48407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48407

T79817692: Fused8BitRowwiseQuantizedToFloat operator support for c2_pt_converter.

Also refactored some repeated code from the existing test functions. (Initial commit only has refactoring.)

Test Plan: buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test

Reviewed By: bugra

Differential Revision: D25069936

fbshipit-source-id: 72f6a845a1b4639b9542c6b230c8cd74b06bc5a0
2020-11-30 17:11:39 -08:00
Xiaodong Wang
d386d3323f [dper] supress excessive msg (#48404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48404

On bento this is printing a lot of msgs like (see N408483 if you're an internal user)
```
W1123 12:09:52.322 schema.py:811] Scalar should be considered immutable. Only call Scalar.set() on newly created Scalar with unsafe=True. This will become an error soon.
```
And it ignores the log level I set globally. Removing this line unless it is super important.

Test Plan: build a local dper package and verify

Differential Revision: D25163808

fbshipit-source-id: 338d01c82b4e67269328bbeafc088987c4cbac75
2020-11-30 14:55:52 -08:00
shubhambhokare1
bdf360f9f2 [ONNX] Update onnx submodule (#47366)
Summary:
Update onnx submodule to 1.8 release

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47366

Reviewed By: hl475

Differential Revision: D24968733

Pulled By: houseroad

fbshipit-source-id: 2f0a3436ab3c9380ed8ff0887a483743c1209721
2020-11-30 00:05:46 -08:00
Tristan Rice
6eaf1e358c caffe2/core.Net: is_external_input rebuild lookup tables when necessary
Summary: is_external_input doesn't check if the lookup tables are valid. Calling .Proto() should invalidate all lookup tables and have them rebuilt on call to any methods depending on them. This adds this check to is_external_input.

Test Plan: internal unit tests

Reviewed By: dzhulgakov, esqu1

Differential Revision: D25100464

fbshipit-source-id: d792dec7e5aa9ffeafda88350e05cb757f4c4831
2020-11-20 10:53:24 -08:00
Xiaomeng Yang
2039ff3fbb [Caffe2] Optimize MishOp on CPU (#48212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48212

Optimize MishOp on CPU

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:activation_ops_test -- "mish"

Reviewed By: houseroad

Differential Revision: D25071304

fbshipit-source-id: fe94bfab512188d60412d66962983eff4f37bc07
2020-11-19 14:17:27 -08:00
Scott Wolchok
4c9eb57914 [PyTorch] Narrow Device to 2 bytes by narrowing DeviceType and DeviceIndex (#47023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47023

DeviceType pretty clearly only needs 1 byte. DeviceIndex only needs 1 byte given that machines don't have anywhere near 255 GPUs in them as far as I know.
ghstack-source-id: 116901430

Test Plan: Existing tests, added assertion to catch if my assumption about DeviceIndex is incorrect

Reviewed By: dzhulgakov

Differential Revision: D24605460

fbshipit-source-id: 7c9a89027fcf8eebd623b7cdbf6302162c981cd2
2020-11-18 19:39:40 -08:00
Tristan Rice
b10d6c6089 [caffe2] cache NextName indexes for faster name generation (#47768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47768

This stores the next ID for a given NextName(prefix, output_id), so repeated calls to NextName are significantly faster. Name generation accounts for ~65% of the time spent for large models.
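A sketch of the caching idea (names and structure are illustrative, not the actual diff): remember where the scan left off for each prefix so the next call does not rescan all existing names.

```python
class NameCache(object):
    """Cache the next free index per prefix so repeated NextName-style
    calls are near O(1) instead of rescanning all existing names."""

    def __init__(self, existing=()):
        self._used = set(existing)
        self._next_idx = {}

    def next_name(self, prefix):
        idx = self._next_idx.get(prefix, 0)
        name = '%s_%d' % (prefix, idx)
        while name in self._used:
            idx += 1
            name = '%s_%d' % (prefix, idx)
        self._next_idx[prefix] = idx + 1  # resume here on the next call
        self._used.add(name)
        return name
```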

Test Plan:
buck test //caffe2/caffe2/python/...

will launch canary job before landing to ensure no regressions + confirm speedup

Reviewed By: dzhulgakov

Differential Revision: D24876961

fbshipit-source-id: 668d73060d800513bc72d7cd405a47d15c4acc34
2020-11-17 12:24:00 -08:00
Ankur Singla
549ef1d668 [caffe][memonger] Extend operator schema check to dag memonger (#48021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48021

Extending the operator schema check from the simple memonger to the dag memonger as well. As part of this, a fix is made to handle in-place ops (ops with at least one output name identical to an input blob). Earlier, all output blobs from ops were treated as shareable, which failed the assertion that external input blobs with the same name are not allowed to be shared.

Test Plan: Added corresponding unit tests

Reviewed By: hlu1

Differential Revision: D24968862

fbshipit-source-id: b6679a388a82b0d68f65ade64b85560354aaa3ef
2020-11-16 19:17:55 -08:00
Ankur Singla
f743b5639a [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger (#47718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47718

Distributed Inference splits a predict net into multiple parts, part0 being the main part which contains ops to make remote calls to other parts. part0 predict net may contain AsyncIf ops to optimize rpc call usage. AsyncIf ops have internal nets which may refer to memongered blobs. This change handles AsyncIf ops to update internal nets to refer to memongered blobs.

As part of this change, I am also updating the dag memonger traversal to always start from root ops, i.e. ops with in-degree 0. The earlier logic started traversing ops from the input head blobs, and if one of the head inputs was used in a non-root op that got visited before its parent, the traversal threw an assertion error here: https://fburl.com/diffusion/ob110s9z . For almost all the distributed inference part0 nets, this assertion error was thrown.
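The root-first traversal described above can be sketched as a standard Kahn-style topological walk (illustrative, not the memonger code itself): visiting only from in-degree-0 ops guarantees every op is reached after all of its parents.

```python
from collections import deque

def traversal_order(ops):
    """Return a topological order over a DAG given as {op: [children]}."""
    indegree = {name: 0 for name in ops}
    for children in ops.values():
        for child in children:
            indegree[child] += 1
    # Start only from true roots: ops with in-degree 0.
    queue = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        name = queue.popleft()
        order.append(name)
        for child in ops[name]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return order
```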

Test Plan: Added corresponding tests in memonger_test.py. Could not find unit tests for the C++ version of memonger.

Reviewed By: hlu1

Differential Revision: D24872010

fbshipit-source-id: 1dc99b2fb52b2bc692fa4fc0aff6b7e4c5e4f5b0
2020-11-13 14:12:07 -08:00
Jonathan Kwok
a3e08e5344 Support ReduceSum in c2_pt_converter (#47889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47889

Adds support for converting the [caffe2 ReduceSum](https://caffe2.ai/docs/operators-catalogue#reducesum) operator to torch.
ghstack-source-id: 116580127

Test Plan:
buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test : [results](https://our.intern.facebook.com/intern/testinfra/testrun/6755399466095119)

    ✓ ListingSuccess: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - main (60.273)
    ✓ Pass: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - test_sub_op (caffe2.torch.fb.model_transform.c2_convert.c2_pt_converter_test.C2PTConverterTest) (101.119)
    ✓ Pass: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - test_layer_norm_conversion (caffe2.torch.fb.model_transform.c2_convert.c2_pt_converter_test.C2PTConverterTest) (101.404)
    ✓ Pass: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - test_local_model_conversion (caffe2.torch.fb.model_transform.c2_convert.c2_pt_converter_test.C2PTConverterTest) (101.966)
    ✓ Pass: caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test - test_reduce_sum (caffe2.torch.fb.model_transform.c2_convert.c2_pt_converter_test.C2PTConverterTest) (114.896)

Reviewed By: bugra

Differential Revision: D24925318

fbshipit-source-id: 3f3b791eff1b03e8f5adee744560fe8bc811c659
2020-11-13 12:02:58 -08:00
Gary Zheng
f1babb00f0 [caffe2] Fix ListWithEvicted _pprint_impl wrongly printing _evicted_values (#47881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47881

ListWithEvicted's _pprint_impl was accidentally printing _items before this change.

Reviewed By: dzhulgakov

Differential Revision: D24928521

fbshipit-source-id: 0d7940719b4a27defbaae3b99af104d7fe7b5144
2020-11-13 09:23:10 -08:00
Alberto Alfarano
59e96c55f7 Support MatMul in c2_pt_converter
Summary: Added the MatMul operator for caffe2

Test Plan: buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test

Reviewed By: bugra

Differential Revision: D24920937

fbshipit-source-id: 7ba09ba0439cb9bd15d6a41fd8ff1a86d8d11437
2020-11-12 20:56:58 -08:00
Peiyao Zhou
4078f44668 [TB][embedding supporting] Modify histogram to accept multiple types, to skip CastOp and avoid OOMing in CastOp
Summary: To support min/max/mean/std, SummarizeOp needs to skip size checking (similar to the LpNorm error mentioned above) and accept multiple types

Test Plan:
unit test:
`buck test //caffe2/caffe2/fb/tensorboard/tests:tensorboard_accumulate_histogram_op_test`

https://our.intern.facebook.com/intern/testinfra/testrun/1407375057859572

`buck test //caffe2/caffe2/fb/tensorboard/tests:tensorboard_accumulate_histogram_op_test --stress-runs 1000`

https://our.intern.facebook.com/intern/testinfra/testrun/2533274832166362

Reviewed By: cryptopic

Differential Revision: D24605507

fbshipit-source-id: fa08372d7c9970083c38abd432d4c86e84fb10e0
2020-11-11 12:03:54 -08:00
Richard Zou
17c58720fe Revert D24346771: [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger
Test Plan: revert-hammer

Differential Revision:
D24346771 (5882f2e540)

Original commit changeset: ad2dd2e63f3e

fbshipit-source-id: 90346f08c890eebe71f068748a8e24e4db88c250
2020-11-10 12:11:22 -08:00
Ankur Singla
5882f2e540 [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger
Summary:
Distributed Inference splits a predict net into multiple parts, part0 being the main part which contains ops to make remote calls to other parts. part0 predict net may contain AsyncIf ops to optimize rpc call usage. AsyncIf ops have internal nets which may refer to memongered blobs. This change handles AsyncIf ops to update internal nets to refer to memongered blobs. Here is one reference part0 predict net with AsyncIf ops: https://www.internalfb.com/intern/paste/P145812115/

As part of this change, I am also updating the dag memonger traversal to always start from root ops, i.e. ops with in-degree 0. The earlier logic started traversing ops from the input head blobs, and if one of the head inputs was used in a non-root op that got visited before its parent, the traversal threw an assertion error here: https://fburl.com/diffusion/ob110s9z . For almost all the distributed inference part0 nets, this assertion error was thrown.

Reviewed By: hlu1

Differential Revision: D24346771

fbshipit-source-id: ad2dd2e63f3e822ad172682f6d63f8474492255d
2020-11-10 09:35:28 -08:00
Gary Zheng
8b3f1d1288 [caffe2] Add __slots__ to all classes in schema.py (#47541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47541

The profiler has guided us to `schema.py`. Since these `Field`s are used everywhere and in huge quantities, we can easily make some optimizations system wide by adding `__slots__`.

From StackOverflow, benefits include:

* faster attribute access.
* space savings in memory.

Read more: https://stackoverflow.com/a/28059785/
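A quick illustration of the mechanism (not the schema.py code itself): with `__slots__`, instances carry no per-instance `__dict__`, which is where the memory savings and faster attribute access come from.

```python
class PlainField(object):
    def __init__(self, name):
        self.name = name  # stored in the instance __dict__

class SlottedField(object):
    __slots__ = ('name',)  # fixed storage; no per-instance __dict__

    def __init__(self, name):
        self.name = name
```

A side effect worth knowing: assigning an attribute not listed in `__slots__` raises `AttributeError`, which also catches typos early.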

Reviewed By: dzhulgakov

Differential Revision: D24771078

fbshipit-source-id: 13f6064d367440069767131a433c820eabfe931b
2020-11-09 16:16:28 -08:00
Gary Zheng
4c52a56c40 [caffe2] Properly call super init in schema.py (#47542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47542

The previous way of doing `Field.__init__(self, [])` is just wrong. Switching to Python2 compatible way: `super(ObjectName, self).__init__(...)`
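The pattern in question, sketched on a stand-in `Field` hierarchy:

```python
class Field(object):
    def __init__(self, children):
        self.children = list(children)

class Struct(Field):
    def __init__(self, *fields):
        # Python 2-compatible super call, as used in the diff;
        # in Python 3-only code this shortens to super().__init__(fields).
        super(Struct, self).__init__(fields)
```

Unlike `Field.__init__(self, ...)`, the `super(...)` form respects the method resolution order, so cooperative multiple inheritance keeps working.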

Reviewed By: dzhulgakov

Differential Revision: D24771077

fbshipit-source-id: d6798c72090c0264b6c583602cae441a1b14587c
2020-11-09 15:02:22 -08:00
Gary Zheng
4a58f35bef [caffe2] Fix duplicate name bug in Net.AddExternalInput (#47530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47530

`Net.AddExternalInput` should raise if there are duplicate names. The previous code would only raise if the addition of duplicates was in separate calls, but not if it was in the same call.
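The fix boils down to checking duplicates within the batch being added as well as against the existing inputs; a sketch (illustrative, not the actual `core.Net` code):

```python
def add_external_inputs(existing, new_names):
    """Append new_names to existing, raising on any duplicate name --
    whether it clashes with existing inputs or repeats within this call."""
    seen = set(existing)
    for name in new_names:
        if name in seen:
            raise ValueError('duplicate external input: %s' % name)
        seen.add(name)
    existing.extend(new_names)  # only mutate once the whole batch is valid
    return existing
```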

Test Plan:
Added two new regression tests

```
    ✓ Pass: caffe2/caffe2/python:core_test - testSetInputRecordWithBlobs (caffe2.caffe2.python.core_test.TestExternalInputs) (9.622)
    ✓ Pass: caffe2/caffe2/python:core_test - testAddExternalInputShouldRaiseIfDuplicate (caffe2.caffe2.python.core_test.TestExternalInputs) (9.639)
    ✓ Pass: caffe2/caffe2/python:core_test - testSetInputRecordWithoutBlobs (caffe2.caffe2.python.core_test.TestExternalInputs) (9.883)
    ✓ Pass: caffe2/caffe2/python:core_test - testAddExternalInputShouldRaiseIfDuplicateInSameCall (caffe2.caffe2.python.core_test.TestExternalInputs) (10.153)
```

Test trained 2 models. No issues

f230755456
f230754926

Reviewed By: dzhulgakov

Differential Revision: D24763586

fbshipit-source-id: c87088441d76f7198f8b07508b2607aec13521ed
2020-11-09 08:30:58 -08:00
Shiyan Deng
c19eb4ad73 BoxWithNMSLimit support int batch_splits input (#47504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47504

Allow int-typed input for `batch_splits`.

Test Plan:
```
buck test caffe2/caffe2/python/operator_test:torch_integration_test -- test_box_with_nms_limits
```

Reviewed By: jackm321

Differential Revision: D24629522

fbshipit-source-id: 61cb132e792bddd8f9f1bca5b808f1a9131808f0
2020-11-07 00:27:51 -08:00
Gary Zheng
582e852fba [caffe2] Add unittests for schema.Field init (#47512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47512

I deleted the last line of `__init__` -- `self._field_offsets.append(offset)` -- and the unittests didn't fail.

So this diff is to improve test coverage.

Test Plan:
```
    ✓ Pass: caffe2/caffe2/python:schema_test - testInitShouldSetEmptyParent (caffe2.caffe2.python.schema_test.TestField) (8.225)
    ✓ Pass: caffe2/caffe2/python:schema_test - testInitShouldSetFieldOffsetsIfNoChildren (caffe2.caffe2.python.schema_test.TestField) (8.339)
    ✓ Pass: caffe2/caffe2/python:schema_test - testInitShouldSetFieldOffsets (caffe2.caffe2.python.schema_test.TestField) (8.381)
```

Reviewed By: dzhulgakov

Differential Revision: D24767188

fbshipit-source-id: b6ce8cc96ecc61768b55360e0238f7317a2f18ea
2020-11-06 13:27:58 -08:00
Bugra Akyildiz
c26c4690fe Add sub operator
Summary: Add sub operator for caffe2

Test Plan:
```
buck test //caffe2/torch/fb/model_transform/c2_convert:c2_pt_converter_test
```

Reviewed By: houseroad

Differential Revision: D24685090

fbshipit-source-id: 60d745065d01b634ebd3087e533d8b9ddab77a1f
2020-11-06 12:31:17 -08:00
Tristan Rice
47198e3208 [caffe2] improve core.Net cloning/init performance (24x for large models!) (#47475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47475

This improves the core.Net cloning/init performance by quite a bit. It makes set_input_record run in linear time rather than quadratic by checking the external_input map instead of regenerating the external-input list on every call and iterating over it.
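The shape of the optimization, sketched (names are illustrative, not the actual diff): membership checks go against a prebuilt map instead of a list regenerated on every call.

```python
def register_input_blobs(external_input_map, blob_names):
    """O(len(blob_names)) per call: look each blob up in the map rather
    than rebuilding the full external-input list to test membership."""
    missing = [b for b in blob_names if b not in external_input_map]
    for b in missing:
        external_input_map[b] = True
    return missing
```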

Test Plan: unit tests + canary runs

Reviewed By: dzhulgakov

Differential Revision: D24765346

fbshipit-source-id: 92d9f6dec158512bd50513b78675174686f0f411
2020-11-06 11:34:12 -08:00
Yen-Jung Chang
6e22b6008d [MLF] Allow for computing prune quantile thresholds on absolute value of indicators in distributed-inference-compatible embedding LUT pruning (#46789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46789

1. Now `SelfBinningHistogram` can calculate the binning histogram using the absolute values of a given array of values.
2. Update the invocation of `SelfBinningHistogram` in `post_training_prune`.
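The quantile-threshold-on-absolute-values idea, sketched in plain Python (illustrative; the real op builds a self-binning histogram rather than sorting):

```python
def prune_threshold(values, quantile, use_abs=False):
    """Return the value at the given quantile, optionally computed on
    the absolute values of the indicators (magnitude, not sign)."""
    data = sorted(abs(v) for v in values) if use_abs else sorted(values)
    idx = min(int(quantile * len(data)), len(data) - 1)
    return data[idx]
```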

Test Plan:
1. [buck test caffe2/caffe2/python/operator_test:self_binning_histogram_test](https://www.internalfb.com/intern/testinfra/testconsole/testrun/6473924488326108/)
2. [buck test dper3/dper3_backend/delivery/tests:post_training_prune_test](https://www.internalfb.com/intern/testinfra/testconsole/testrun/2251799854023163/)

Reviewed By: hwangjeff

Differential Revision: D24494097

fbshipit-source-id: 95e47137b25746e686ef9baa9409560af5d58fc1
2020-11-02 11:31:31 -08:00
Basil Hosmer
f05b66b70d pass TypeMeta by value (#45026)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45026

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23802943

Pulled By: bhosmer

fbshipit-source-id: 81b06ef00bf8eb4375c0e0ff2032e03bd1d1188a
2020-10-30 10:14:17 -07:00
Bugra Akyildiz
eec201c138 Add last_n_window_collector
Summary:
Add `last_n_window_collector`, which C2 supports but PyTorch currently lacks: https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/caffe2/caffe2/operators/last_n_window_collector.cc?lines=139

## Problem that we are solving

This operator works on multiple pieces of data and collects the last `n` elements that have been seen.

Suppose the following pieces of data are passed in:

```
  [1, 2, 3, 4]
  [5, 6, 7]
  [8, 9, 10, 11]
```

over three passes, and the collector size is set to 6. The expected result is:

```
  [6, 7, 8, 9, 10, 11]
```
In effect we need a FIFO (first in, first out) mechanism: as we pass data through the collector, newly pushed data at the end evicts the oldest data at the front.

In this particular example, in the first pass (the data is `[1, 2, 3, 4]`), we hold `[1, 2, 3, 4]` in the queue, as our queue size is 6.

In the second pass (the data is `[5, 6, 7]`), we hold `[2, 3, 4, 5, 6, 7]` in the queue; since 1 was inserted first, it is dropped due to the size limit of the queue.

In the third pass (the data is `[8, 9, 10, 11]`), we hold `[6, 7, 8, 9, 10, 11]` in the queue, and `2, 3, 4, 5` are dropped due to the size of the queue.

For multidimension case, when we have the following data:

```
  [[1, 2], [2, 3], [3, 4], [4, 5]]
  [[5, 6], [6, 7], [7, 8]]
  [[8, 9], [9, 10], [10, 11], [11, 12]]
```

and our queue size is 6.

In the first pass, we will have `[[1, 2], [2, 3], [3, 4], [4, 5]]`.
In the second pass, we will have `[[2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8]]`.
In the third pass, we will have `[[6, 7], [7, 8], [8, 9], [9, 10], [10, 11], [11, 12]]`.

### The implementation

I am using Python's FIFO queue from the collections library (`collections.deque`). It accepts a `maxlen` argument, which sets the size of the queue.

I take the last n rows of the tensor through list indexing, and this operator does not copy.

In the test plan, I have both single-dimension and multi-dimension tensors.
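The description above maps directly onto `collections.deque` with `maxlen`; a minimal sketch of the collector:

```python
from collections import deque

class LastNWindowCollector(object):
    """Keep only the last n rows seen across collect() calls (FIFO)."""

    def __init__(self, n):
        self._queue = deque(maxlen=n)  # old rows fall off the front

    def collect(self, rows):
        self._queue.extend(rows)

    def items(self):
        return list(self._queue)
```

Feeding `[1, 2, 3, 4]`, `[5, 6, 7]`, `[8, 9, 10, 11]` into a collector of size 6 yields `[6, 7, 8, 9, 10, 11]`, matching the walkthrough above.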

### Benchmark
I used various different configurations and added a benchmark test. The PyTorch implementation is much faster than the Caffe2 implementation:

#### CPU Benchmark
```
torch_response.median
0.00019254473969340324

caffe_response.median
0.00030233583599794657
```

#### GPU Benchmark

```
torch_response.mean
0.000081007429903838786

caffe_response.mean
0.00010279081099724863
```

Test Plan:
### For CPU:
```
buck test //caffe2/torch/fb/sparsenn:test
```

### For GPU:
- Used an on-demand machine and did the following commands:
```
jf get D24435544
buck test mode/opt  //caffe2/torch/fb/sparsenn:test
```
https://www.internalfb.com/intern/testinfra/testconsole/testrun/4222124688138052/

Reviewed By: dzhulgakov, radkris-git

Differential Revision: D24435544

fbshipit-source-id: 8193b4746b20f2a4920fd4d41271341045cdcee1
2020-10-30 02:35:54 -07:00
Brandon Lin
4a581ba6c2 Implement LengthsToOffsets operator in Caffe2 (#46590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46590

This operator is very similar to LengthsToRanges but doesn't pack the offsets next to the original lengths.

Reviewed By: yf225

Differential Revision: D24419746

fbshipit-source-id: aa8b014588bb22eced324853c545f8684086c4e4
2020-10-29 07:03:34 -07:00
Kunal Bhalla
18d273dc0e [RFC][LocalSession] Fix workspace type
Summary: I was reading/looking into how LocalSession works and realized that the workspace type being passed around was the bound method on TaskGroup instead of the actual type. This meant that all workspaces for LocalSession would always be global, because they'd never match the private workspace type.

Test Plan: <not sure, could use some suggestions>

Reviewed By: cryptopic

Differential Revision: D24458428

fbshipit-source-id: 0f87874babe9c1ddff25b5363b443f9ca37e03c1
2020-10-29 04:12:17 -07:00
Dmytro Dzhulgakov
115bbf9945 [caffe2] Disable running full grad check in tests by default
Summary:
We've been seeing a lot of Hypothesis timeouts, and from profiling a few of the failing tests, one of the contributing factors is the really slow grad checker. In short, it launches the whole op for each input element, so the overall complexity is at least O(numel^2).

This applies a very unscientific hack to just run grad check on the first and last few elements. It's not ideal, but it's better than flaky tests. One can still explicitly opt in with the env var.
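The hack can be sketched as a central-difference check restricted to a handful of indices, e.g. the first and last few elements (illustrative, not the actual gradient_checker code):

```python
def numeric_grad_subset(f, x, indices, eps=1e-6):
    """Central-difference gradient of scalar f at selected indices of
    list x, dropping the cost from O(numel^2) to O(k * numel)."""
    grads = {}
    for i in indices:
        orig = x[i]
        x[i] = orig + eps
        f_plus = f(x)
        x[i] = orig - eps
        f_minus = f(x)
        x[i] = orig  # restore the input before the next probe
        grads[i] = (f_plus - f_minus) / (2 * eps)
    return grads
```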

Reviewed By: malfet

Differential Revision: D23336220

fbshipit-source-id: f04d8d43c6aa1590c2f3e72fc7ccc6aa674e49d2
2020-10-27 16:10:03 -07:00
Huan Gui
b5662ba0f0 [uhm][0/n] add cuda Mod Op (#46732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46732

as titled

Test Plan:
unittest

buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:mod_op_test

Reviewed By: xianjiec

Differential Revision: D24368100

fbshipit-source-id: 1232d22a67ac268986043911d548fa9d657470ec
2020-10-26 11:07:51 -07:00
Yunfan Zhong
e519fcd1aa Remap net name inside arg.n for AsyncIf operator
Summary: Similar to If operator, AsyncIf also contains nets in args. It needs the same handling.

Test Plan:
New unit test test_control_op_remap
`buck test caffe2/caffe2/python:core_test`

Also it worked end to end in prototype of dist bulk eval workflow f226680903

Reviewed By: yyetim

Differential Revision: D24451775

fbshipit-source-id: 50594e2ab9bb457329ed8da7b035f7409461b5f6
2020-10-23 10:41:06 -07:00
Alexander Grund
93719440b8 Replace map(lambda constructs (#46462)
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal

Makes them more readable and possibly faster. Care has to be taken because `map` applies the function immediately while `(x for x in xs)` is a generator expression which gets evaluated later. This is a benefit in some cases where it is not required to actually create the list of values in memory (e.g. when passing to `tuple` or `extend` or `join`)
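For example, the kind of rewrite this PR performs:

```python
words = ['alpha', 'beta', 'gamma']

# Before: map with a lambda.
upper_map = map(lambda w: w.upper(), words)

# After: a generator expression, evaluated lazily when consumed --
# here fed straight into join() without building an intermediate list.
joined = '-'.join(w.upper() for w in words)
```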

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462

Reviewed By: zou3519

Differential Revision: D24422343

Pulled By: ezyang

fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
2020-10-22 09:50:22 -07:00
Jeff Hwang
9b5197b763 [mlf][efficiency] add tensor inference function to last-n collector op (#46693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46693

title

Test Plan: unit tests

Reviewed By: hx89

Differential Revision: D23946770

fbshipit-source-id: f7c3d4a1b4ef3b0e5f56e5a9a30f5003ce9f40b0
2020-10-22 01:15:00 -07:00
Alexander Grund
5b0f400488 Replace list(map(...)) constructs by list comprehensions (#46461)
Summary:
As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant.

It also fixes a bug detected by this where the argument order of `map` was confused: 030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)

Fixes https://github.com/pytorch/pytorch/issues/46392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461

Reviewed By: ailzhang

Differential Revision: D24367015

Pulled By: ezyang

fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7
2020-10-19 18:42:49 -07:00
Jongsoo Park
c37baa9177 [caffe2] add concat benchmark (#46457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46457

Wanted to see if a CopyMatrix specialized for float that uses mkl_somatcopy could be faster, but it wasn't. Still want to check in the benchmark, which can be used later.

Test Plan: .

Reviewed By: dskhudia

Differential Revision: D24345901

fbshipit-source-id: d3e68dbb560e3138fda11c55789cd41bc0715c6d
2020-10-16 08:48:42 -07:00
Nikita Shulga
84771fc64f [caffe2] Add 10s deadline for all Caffe2 hypothesis fuzz tests
Test Plan: CI

Reviewed By: walterddr

Differential Revision: D24298118

fbshipit-source-id: 2286c1e37ed9c43f404b888386c0bd4b0b6a55c6
2020-10-14 06:30:09 -07:00