Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413
LengthsTileOp was implemented as a sequence of device memcopies initiated from the CPU, which was very slow. Changed it to use a kernel instead. The TUM benchmark improved from 13k QPS to 20k QPS as a result.
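For reference, what LengthsTileOp computes can be sketched in NumPy (names are illustrative; the kernelized version does this on-device without per-row copies):

```python
import numpy as np

def lengths_tile(data, lengths):
    # Repeat row i of `data`, lengths[i] times along the outer dimension.
    return np.repeat(np.asarray(data), lengths, axis=0)
```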
Reviewed By: manojkris, xianjiec
Differential Revision: D9724988
fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291
This new operator will do the following:
Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where:
1. Each length in the input vector is split into n_splits values (so the output vector has LENGTHS.size(0) * n_splits elements)
2. The new lengths in the output are split as evenly as possible; if a length is not divisible by n_splits, the new values are ordered in descending order (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits is greater than some element in the array, its split values will contain 0s (e.g. n_splits = 3, length = 2 -> 1 1 0)
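The three rules above can be sketched as follows (a minimal illustration, not the actual Caffe2 implementation):

```python
def split_lengths(lengths, n_splits):
    """Split each length into n_splits values, as evenly as possible,
    larger parts first; pad with zeros when n_splits exceeds a length."""
    out = []
    for length in lengths:
        base, rem = divmod(length, n_splits)
        # The first `rem` parts get one extra element, so values descend.
        out.extend([base + 1] * rem + [base] * (n_splits - rem))
    return out
```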
Reviewed By: bddppq, chocjy
Differential Revision: D9013119
fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888
Add a CUDA version of SpatialBNOp and also optimize SpatialBN on CPU.
Reviewed By: houseroad
Differential Revision: D9512435
fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.
Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect
Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264
Differential Revision: D9656793
Pulled By: yf225
fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
Summary:
Generate serialized test inputs/outputs/backward graphs for tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random, plus a single fixed-seed hypothesis test.
To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked in test case.
Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?).
Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change as usually there are multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594
Reviewed By: ezyang
Differential Revision: D9370359
Pulled By: ajyu
fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955
Add a GPU version of the HardSigmoid op to Caffe2. Updated the test file to include GPU tests.
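For reference, HardSigmoid is a piecewise-linear approximation of the sigmoid; a NumPy sketch (the alpha/beta defaults shown follow the ONNX definition and are illustrative):

```python
import numpy as np

def hard_sigmoid(x, alpha=0.2, beta=0.5):
    # y = clip(alpha * x + beta, 0, 1)
    return np.clip(alpha * np.asarray(x, dtype=float) + beta, 0.0, 1.0)
```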
Reviewed By: enosair
Differential Revision: D9499353
fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439
Update Im2Col and related code in preparation for group conv in NHWC order.
Reviewed By: houseroad
Differential Revision: D9285344
fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10395
Order switch ops (NCHW2NHWC and NHWC2NCHW) only supported 2D images.
This diff generalizes them to 1D and 3D, and also adds a unit test we didn't have.
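The generalization amounts to moving the channel axis across any number of spatial dims, not just two. A NumPy sketch:

```python
import numpy as np

def nchw2nhwc(x):
    # (N, C, D1, ..., Dk) -> (N, D1, ..., Dk, C); works for 1D, 2D, and 3D.
    return np.moveaxis(x, 1, -1)

def nhwc2nchw(x):
    # (N, D1, ..., Dk, C) -> (N, C, D1, ..., Dk)
    return np.moveaxis(x, -1, 1)
```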
Reviewed By: protonu
Differential Revision: D9261177
fbshipit-source-id: 56e7ec54c9a8fb71781ac1336f3f28cf024b4bda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10390
Fixed a bug in box_with_nms_limit where it could produce more bounding boxes than specified.
* The original code first finds the score threshold for the box at the 'detections_per_im' position, then filters out boxes scoring below that threshold.
* When multiple boxes share that threshold score, the op returns more boxes than 'detections_per_im'.
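The fix can be sketched as selecting exactly the top-k boxes by score rather than filtering by the score at that rank, which keeps every tied box (names here are illustrative, not the op's actual code):

```python
import numpy as np

def limit_detections(scores, detections_per_im):
    """Return indices of at most detections_per_im highest-scoring boxes."""
    if len(scores) <= detections_per_im:
        return np.arange(len(scores))
    # Stable descending sort, then truncate: ties cannot inflate the count.
    order = np.argsort(-np.asarray(scores), kind="stable")
    return order[:detections_per_im]
```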
Reviewed By: wat3rBro
Differential Revision: D9252726
fbshipit-source-id: 63f40829bcd275cb181692bc7547c384cee01499
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10389
Added some unit test for box_with_nms_limit_op.
Reviewed By: wat3rBro
Differential Revision: D9237860
fbshipit-source-id: 2d65744bd387314071b68d2a0c934289fc64a731
Summary:
This operator implements b-bit (1/2/4/8) stochastic quantization of a floating-point
matrix in a row-wise fashion. 8/b quantized values are packed into each byte
and returned in a uint8 tensor. PR: https://github.com/pytorch/pytorch/pull/8629
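A minimal sketch of row-wise b-bit stochastic quantization with byte packing (illustrative only; the actual op's layout and metadata handling differ):

```python
import numpy as np

def stochastic_quantize_row(row, bitwidth, rng):
    levels = (1 << bitwidth) - 1
    lo, hi = row.min(), row.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (row - lo) / scale          # values now lie in [0, levels]
    floor = np.floor(normalized)
    # Round up with probability equal to the fractional part (unbiased).
    q = (floor + (rng.random(row.shape) < normalized - floor)).astype(np.uint8)
    # Pack 8/bitwidth quantized values into each output byte.
    per_byte = 8 // bitwidth
    packed = np.zeros((len(q) + per_byte - 1) // per_byte, dtype=np.uint8)
    for i, v in enumerate(q):
        packed[i // per_byte] |= v << (bitwidth * (i % per_byte))
    return packed, lo, scale
```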
Reviewed By: harouwu
Differential Revision: D8493264
fbshipit-source-id: 01f64066568a1e5a2b87c6d2134bd31cdf119c02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905
This diff improves the LARS operator in Caffe2 by applying clipping to the computed learning rate.
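The clipping idea can be sketched as follows (constant names and defaults are illustrative, not the op's actual arguments):

```python
import numpy as np

def lars_local_lr(w, grad, trust=0.001, eps=1e-9, lr_min=0.0, lr_max=0.02):
    # LARS-style local learning rate: trust * ||w|| / (||grad|| + eps),
    # clipped so that outlier norm ratios cannot blow up a step.
    ratio = trust * np.linalg.norm(w) / (np.linalg.norm(grad) + eps)
    return float(np.clip(ratio, lr_min, lr_max))
```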
Reviewed By: pjh5
Differential Revision: D9020606
fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9826
Differential Revision: D9032840
Pulled By: bddppq
fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9747
Currently the ctc_greedy_decoder op initializes the `merge_repeated` argument only if it has been provided by the user. Change to initialize in all cases.
Reviewed By: houseroad
Differential Revision: D8963635
fbshipit-source-id: 18955c7c26a77d9d7f5137e4dec085252ffabfeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9581
Mostly to simplify code. Should also improve performance but order switch ops
don't take much time anyway.
Reviewed By: viswanathgs
Differential Revision: D8909766
fbshipit-source-id: 17a302d5bf4aba2755d88223fc01a41fd72c5919
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9643
The current map interface assumes a float data type, which is not always correct.
Reviewed By: kennyhorror
Differential Revision: D8455784
fbshipit-source-id: b94a31267760f7f97c15aa4b03008affc347fd10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598
The "max_length" should be passed to UnPackSegmentsOp if "max_length" is given when calling PackSegmentsOp.
Reviewed By: jerryzh168
Differential Revision: D8919799
fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594
When the input vector is a zero vector, the previous GPU code produced NaN in the backward pass. We fix this.
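The failure mode and guard can be illustrated generically (assuming an L2-normalize-style op; the function below is a sketch, not the actual kernel): the gradient of x / ||x|| divides by ||x||, so a zero input yields NaN unless guarded.

```python
import numpy as np

def l2_normalize_grad(x, grad_out, eps=1e-12):
    norm = np.linalg.norm(x)
    if norm < eps:
        return np.zeros_like(x)  # define the gradient as 0 at the origin
    y = x / norm
    # d(x/||x||)/dx applied to grad_out: (I - y y^T) grad_out / ||x||
    return (grad_out - y * np.dot(y, grad_out)) / norm
```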
Reviewed By: pjh5
Differential Revision: D8849732
fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9403
In BBoxTransform and GenerateProposal ops, clip_boxes makes sure the bbox fits
within the images. For rotated boxes, this doesn't always make sense as there
could be multiple ways to clip a rotated box within an image boundary.
Moreover, clipping to a horizontal box means we leave out pixels of interest
potentially. Therefore, we clip only boxes with angle almost equal to 0 (with a
specified `angle_thresh` tolerance).
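The angle-gated clipping rule can be sketched for a single box in (ctr_x, ctr_y, w, h, angle) format (names are illustrative; the actual op operates on full box tensors):

```python
def clip_box_if_upright(box, img_w, img_h, angle_thresh=1.0):
    cx, cy, w, h, angle = box
    if abs(angle) > angle_thresh:
        return box  # clipping a rotated box is ambiguous; leave it untouched
    # Nearly axis-aligned: clip the corners to the image, then re-center.
    x1 = max(cx - w / 2, 0.0)
    y1 = max(cy - h / 2, 0.0)
    x2 = min(cx + w / 2, img_w - 1.0)
    y2 = min(cy + h / 2, img_h - 1.0)
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1, angle)
```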
Reviewed By: pjh5
Differential Revision: D8828588
fbshipit-source-id: 39c1eafdb5d39d383780faa0a47e76149145e50c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299
ONNX has ReduceL1 and ReduceL2 operators that facilitate this, so allow PyTorch to export them and Caffe2 to run them.
I only implemented this on CPU so far.
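For reference, the two reductions in question, sketched with NumPy: ReduceL1 sums absolute values along the given axes, ReduceL2 takes the root of summed squares.

```python
import numpy as np

def reduce_l1(x, axes=None, keepdims=True):
    return np.sum(np.abs(x), axis=axes, keepdims=keepdims)

def reduce_l2(x, axes=None, keepdims=True):
    return np.sqrt(np.sum(np.square(x), axis=axes, keepdims=keepdims))
```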
Reviewed By: pjh5
Differential Revision: D8757381
fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9385
The operator transforms dense features into sparse features by bucketizing. Only the features in the indices tensor are transformed and output.
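The bucketization step can be sketched in NumPy (names are illustrative): only the features named in `indices` are looked up in the dense vector and mapped to bucket ids against sorted boundaries.

```python
import numpy as np

def bucketize_sparse(dense, indices, boundaries):
    vals = np.asarray(dense)[np.asarray(indices)]
    # searchsorted returns, for each value, the id of the bucket it falls into.
    return np.searchsorted(np.asarray(boundaries), vals, side="right")
```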
Reviewed By: bddppq
Differential Revision: D8820351
fbshipit-source-id: a66cae546b870c6b2982ac20641f198334f2e853
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8999
Implemented the WRgrad optimizer operator for the dense case (the base case as well as the case with additional outputs for the effective learning rate and update value) and the sparse case.
Reviewed By: pjh5
Differential Revision: D8627933
fbshipit-source-id: a63cde46c04bcc6b428ab5f77a4b3b2beb66c046