Summary:
This pull request contains changes for:
1. Removing ConvTranspose related changes from caffe2/operators/hip/conv_op_miopen.cc
2. Adding the file caffe2/operators/hip/conv_transpose_op_miopen.cc
3. Modifying the tests to run convTranspose op using MIOpen engine
Differential Revision: D13055099
Pulled By: bddppq
fbshipit-source-id: ca284f8f9a073005b22013c375cc958257815865
Summary: Currently Lambdarank applies exponential emphasis on relevance, i.e., g=2^rel, when calculating DCG. This diff adds an option that supports g=rel in the loss function.
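As a rough illustration of the two gain choices, the sketch below computes DCG with either g=2^rel or g=rel. The function name `dcg`, the `exponential_gain` flag, and the 1/log2(rank + 2) position discount are assumptions made for this example; they are not taken from the Lambdarank operator itself.

```python
import math


def dcg(relevances, exponential_gain=True):
    """Discounted cumulative gain over a ranked list of relevance labels.

    exponential_gain=True uses g = 2^rel, False uses g = rel.
    The log2 position discount is an assumption of this sketch.
    """
    total = 0.0
    for rank, rel in enumerate(relevances):
        gain = 2.0 ** rel if exponential_gain else float(rel)
        total += gain / math.log2(rank + 2)
    return total


ranked = [3, 1]
print(dcg(ranked, exponential_gain=True))   # gains 8 and 2
print(dcg(ranked, exponential_gain=False))  # gains 3 and 1
```

With the linear gain, a highly relevant document dominates the sum far less than it does with the exponential gain.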
Reviewed By: itomatik
Differential Revision: D9891514
fbshipit-source-id: 64730d467a665670edd37e6dc1c077987991d1a8
Summary:
Add a markdown document summarizing the coverage of serialized operator tests. This currently only takes into account what has been covered by the tests with respect to the entire registry of c2 operators.
Next, we will break down the coverage by which operators have unit tests associated with them, which have hypothesis tests, and which have tests more specifically calling assertReferenceChecks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13703
Reviewed By: dzhulgakov
Differential Revision: D12970810
Pulled By: ajyu
fbshipit-source-id: 4f0cd057b1cf734371333e24d26cbab630a170e1
Summary:
I was hitting this error:
caffe2/caffe2/operators/stats_put_ops.h:66:25: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'
The assignment from int64_t to float loses some precision, and the rounded value then overflows when it is converted back to an integer.
Reproduced this issue with diff D12945013.
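The rounding behind this error can be reproduced outside of Caffe2. The sketch below is only an illustration of the arithmetic, not the code in stats_put_ops.h; the helper `round_to_float32` is a hypothetical name that round-trips a value through IEEE-754 single precision.

```python
import struct

INT64_MAX = 2**63 - 1


def round_to_float32(value):
    """Round-trip a number through IEEE-754 single precision."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


as_float = round_to_float32(INT64_MAX)

# A float has a 24-bit significand, so INT64_MAX rounds up to exactly 2**63.
print(as_float == 2.0**63)

# Converting that value back to a 64-bit integer would exceed INT64_MAX.
print(int(as_float) > INT64_MAX)
```

Both checks print True, which matches the reported value 9.22337e+18 being just outside the range of a 64-bit signed integer.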
Reviewed By: mlappelbaum, jdshi-fb
Differential Revision: D12927086
fbshipit-source-id: 7eae7fe25ab49d5ac15279335bd5b1fa89d6e683
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733
Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution because we need NHWC layout for best performance (note that NHWC layout in general gives better performance on CPU, not just for quantized operators). For example, our quantized ops can measure quantization error operator by operator, but this requires running a shadow fp32 operator, which is not easy when no 3D conv in NHWC layout is available (currently we do layout conversion on the fly for the shadow fp32 operator, which is error prone). Some Caffe2 frameworks like brew generate an error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench because aibench uses brew.
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D10333829
fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13554
D10233252 broke ROCM test.
We don't have group conv in NHWC for hip yet and this diff omits related tests.
Reviewed By: hyuen
Differential Revision: D12917880
fbshipit-source-id: 9baf36a8cb061ee8cf393b2c438a2d1460ce5cd8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12428
Group conv in NHWC layout was enabled in CPU after D7547497.
In D7547497, the unit test of group conv in NHWC layout on CPU was enabled in group_conv_test.py but not in conv_test.py. This diff also enables it in conv_test.py.
Reviewed By: BIT-silence
Differential Revision: D10233252
fbshipit-source-id: aeeaf3eedc60e1cf6321b5a1dbe6a561e3aacbde
Summary:
Essentially makes cuDNN treat those kernels as Nx1 ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12902
Reviewed By: BIT-silence
Differential Revision: D10852862
Pulled By: soumith
fbshipit-source-id: 7416cf6d131177340d21cbf1d42c1daa6c7cad8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12843
This adds a CUDA implementation for the UpsampleBilinearOp and UpsampleBilinearGradientOp.
The CUDA code is based off of the corresponding ResizeNearest operators but with bilinear interpolation logic taken from the CPU implementation.
Reviewed By: houseroad
Differential Revision: D10453776
fbshipit-source-id: b29ac330b72465974ddb27c0587bca590773fdec
Summary:
This is mostly for reusing all the cudnn test cases in our python operator_tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12278
Differential Revision: D10842592
Pulled By: bddppq
fbshipit-source-id: 4b3ed91fca64ff02060837b3270393bc2f9a9898
Summary:
TSIA - we want to deprecate numba in fbcode when moving to new compiler tiers.
Converted the old test to a non-numba regular python op test.
Reviewed By: xw285cornell
Differential Revision: D10519910
fbshipit-source-id: 0e9188a6d0fc159100f0db704b106fbfde3c5833
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12736
This updates UpsampleBilinearOp and UpsampleBilinearGradientOp to support scales, bringing them in line with ResizeNearestOp (https://github.com/pytorch/pytorch/pull/12720).
Reviewed By: houseroad
Differential Revision: D10416228
fbshipit-source-id: f339b7e06979c9c566afb4cee64a2d939b352957
Summary: Added 2 years ago in D3665603, never used, kill it.
Reviewed By: ezyang
Differential Revision: D10421336
fbshipit-source-id: 1b027a9ef2b71d0dd2c572cd4338bc8e046320d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12382
Implement fp16 -> (uint8 + scale and bias in fp32) quantization.
This is similar to fp32 rowwise quantization.
We could have stored scale and bias in fp16, but there is little motivation to do so: the savings are small, and those values have to be converted to fp32 for processing anyway since x86 doesn't support half-float operations.
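A minimal sketch of what rowwise 8-bit quantization with a per-row scale and bias can look like is given below. It assumes the fp16 inputs have already been widened to ordinary Python floats, and it assumes the common scheme scale = (max - min) / 255 and bias = min; the function names are hypothetical and the operator in this diff may differ in detail.

```python
def quantize_row(row):
    """Map one row of floats to uint8 codes plus a per-row scale and bias."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # constant row: any nonzero scale reconstructs it exactly
    codes = [int(round((x - lo) / scale)) for x in row]
    return codes, scale, lo


def dequantize_row(codes, scale, bias):
    """Approximate reconstruction of the original row."""
    return [c * scale + bias for c in codes]


row = [-1.0, 0.25, 3.0]
codes, scale, bias = quantize_row(row)
restored = dequantize_row(codes, scale, bias)
print(codes)
print(restored)
```

Each row keeps its own scale and bias, so the reconstruction error per element stays within about half a quantization step of that row's range.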
Reviewed By: csummersea
Differential Revision: D10220463
fbshipit-source-id: 6c382026de881f03798c2e5fc43abfc80f84ea1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12390
Introduce a no op optimizer for when we don't want updates to happen, but don't want to affect downstream processes.
Reviewed By: mlappelbaum
Differential Revision: D10209812
fbshipit-source-id: 2af4ebc0fb42e78ea851c3a9f4860f3d224037b6
Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. Those tests are disabled on that config only, which is okay because we still run them in other test jobs.
After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389
Differential Revision: D10224267
Pulled By: yf225
fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
Summary:
Original commit changeset: f5614a5d2607
D9986213 is causing a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) in Multifeed Aggregator and has been blocking aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock aggregator push.
Reviewed By: orionr
Differential Revision: D10123245
fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11349
Special case BatchGather and BatchGatherGradient for block_size=1. This makes BatchGather 3-4X faster and BatchGatherGradient 10X faster for this case.
Reviewed By: jspark1105, ilia-cher
Differential Revision: D7218043
fbshipit-source-id: ea12042239a8adc92b9efcbd0b66e354fb43f4c7
Summary:
Followup to [the serialized test framework](https://github.com/pytorch/pytorch/pull/10594)
Round 1 for refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner.
I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).
1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well. We simply explicitly do the conversion to dtype=object.
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.
I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` not compatible with adding more hypothesis decorators on top. For example, there are tests that do
```
@settings(...)
@given(...)
def test_my_stuff(...):
    ...
```
But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor this to `serialized_test_util.given`. I've avoided decorating these kinds of tests for now; I hope that's alright.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11350
Reviewed By: houseroad
Differential Revision: D9693857
Pulled By: ajyu
fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413
LengthsTileOp was implemented using a sequence of device memcopies initiated on the CPU. This was very slow. I changed it to use a kernel. TUM benchmark QPS improved from 13k QPS to 20k QPS as a result.
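For readers unfamiliar with the operator, the sketch below shows the result LengthsTile is expected to produce, assuming it repeats the i-th row of the data `lengths[i]` times. It is a plain Python reference for the semantics only; the function name `lengths_tile` is hypothetical, and the sketch says nothing about how the kernel in this diff is written.

```python
def lengths_tile(data, lengths):
    """Repeat data[i] lengths[i] times, preserving row order."""
    if len(data) != len(lengths):
        raise ValueError("data and lengths must have the same outer size")
    out = []
    for row, count in zip(data, lengths):
        # Copy the row so that output rows do not alias each other.
        out.extend(list(row) for _ in range(count))
    return out


print(lengths_tile([[1, 2], [3, 4], [5, 6]], [2, 0, 1]))
# [[1, 2], [1, 2], [5, 6]]
```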
Reviewed By: manojkris, xianjiec
Differential Revision: D9724988
fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291
This new operator will do the following:
Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where:
1. Each length in input vector is split into n_splits values (thus output vector should have LENGTHS.size(0) * n_splits elements)
2. The new lengths in output should be evenly split, and if the length is not divisible by n_splits, then order new values in descending order. (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits > some element in the array, its split elements will contain 0s. (e.g. n_splits = 3, length = 2 -> 1 1 0)
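The three rules above can be sketched in a few lines of Python. This is an illustrative reference under the stated rules, not the operator's implementation; `lengths_split` is a hypothetical name.

```python
def lengths_split(lengths, n_splits):
    """Split every length into n_splits parts, larger parts first."""
    if n_splits <= 0:
        raise ValueError("n_splits must be positive")
    out = []
    for length in lengths:
        base, extra = divmod(length, n_splits)
        # The first `extra` parts get one additional element, so the
        # parts for each length come out in descending order.
        out.extend(base + 1 if i < extra else base for i in range(n_splits))
    return out


print(lengths_split([5], 3))     # [2, 2, 1]
print(lengths_split([2], 3))     # [1, 1, 0]
print(lengths_split([5, 2], 3))  # [2, 2, 1, 1, 1, 0]
```

The output always has `len(lengths) * n_splits` elements, and the parts for each input length sum back to that length.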
Reviewed By: bddppq, chocjy
Differential Revision: D9013119
fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888
Add a CUDA version of SpatialBNOp and also optimize SpatialBN on CPU.
Reviewed By: houseroad
Differential Revision: D9512435
fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.
Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect
Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264
Differential Revision: D9656793
Pulled By: yf225
fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run both hypothesis tests that are actually random and a single fixed-seed hypothesis test.
To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked in test case.
Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?).
Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change as usually there are multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594
Reviewed By: ezyang
Differential Revision: D9370359
Pulled By: ajyu
fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955
Add GPU version of HardSigmoid Op to Caffe2. Updated test file to include GPU tests.
Reviewed By: enosair
Differential Revision: D9499353
fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545