Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26227
In the previous implementation of composite lr, the lr_scale of every sub policy was overwritten by the last lr_scale.
Because of another bug in the unit test (policy_lr_scale was the same for all sub policies), this bug was not detected by the unit test.
Fix: add an additional field to CompositeLearningRateItem so that lr_scale values are stored for all sub policies.
With the unit test fixed, the previous implementation fails:
https://fburl.com/testinfra/ikdbnmey
With the fix:
https://fburl.com/testinfra/m694ehl1
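A minimal Python sketch of the intended behavior, not the actual C++ code: each item carries its own lr_scale, so a later sub policy cannot overwrite an earlier one's scale. `CompositeItem` and `composite_lr` are hypothetical names, and passing a local iteration to each sub policy is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CompositeItem:
    # Hypothetical stand-in for CompositeLearningRateItem.
    num_iter: int                   # iterations this sub policy is active
    policy: Callable[[int], float]  # maps an iteration to an unscaled rate
    lr_scale: float                 # per-item scale (the field the fix adds)


def composite_lr(items: List[CompositeItem], iteration: int) -> float:
    """Return the scaled rate of the sub policy that owns `iteration`."""
    start = 0
    for item in items:
        if iteration < start + item.num_iter:
            # Use this item's own scale, not a scale shared by all items.
            return item.lr_scale * item.policy(iteration - start)
        start += item.num_iter
    # Assumption: past the last boundary, keep using the final sub policy.
    last = items[-1]
    return last.lr_scale * last.policy(iteration - (start - last.num_iter))


# Distinct scales per sub policy, which is what the fixed unit test needs
# in order to notice an overwritten lr_scale.
items = [
    CompositeItem(num_iter=2, policy=lambda i: 1.0, lr_scale=0.1),
    CompositeItem(num_iter=2, policy=lambda i: 1.0, lr_scale=0.5),
]
rates = [composite_lr(items, i) for i in range(4)]
```

With a single shared scale, all four rates would come out equal, which is why a test that uses the same policy_lr_scale everywhere cannot catch the bug.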
Test Plan:
unittest
buck test caffe2/caffe2/python/operator_test:learning_rate_op_test -- test_composite_learning_rate_op
Reviewed By: chocjy, alex1o1o7cloud
Differential Revision: D17380363
fbshipit-source-id: 161e9cb71bb2ea7f0734a3361e270616057a08e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26080
Will be used in the conversion of the c2 ctr_mbl_feed model to PyTorch.
Test Plan: Unit test
Reviewed By: yinghai
Differential Revision: D17337604
fbshipit-source-id: a90d9f5dc38301608d1562c6f2418e7f4616e753
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24357
SparseNormalize does not need to know the gradient values for the lookup table, only the indices of the embeddings that need to be updated. By removing this input, we allow SparseNormalize to be used alongside SparseAdagradFusion.
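A rough NumPy sketch of why the indices alone are enough: only the rows that were touched get rescaled, and the rescaling depends on the current row values, not on the gradient. The function below is illustrative, not the operator's implementation; the `norm` and `use_max_norm` parameters and their exact semantics are assumptions here.

```python
import numpy as np


def sparse_normalize(param, indices, norm=1.0, use_max_norm=True):
    """Rescale, in place, only the rows of `param` listed in `indices`."""
    for i in np.unique(indices):
        row_norm = np.linalg.norm(param[i])
        if row_norm == 0.0:
            continue  # nothing to rescale
        if use_max_norm and row_norm <= norm:
            continue  # assumption: max-norm mode leaves short rows alone
        param[i] *= norm / row_norm
    return param


table = np.array([[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]])
sparse_normalize(table, np.array([0, 1]))
# Row 0 is scaled down to unit norm, row 1 is already within the limit,
# and row 2 is not listed in the indices, so it is never read or written.
```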
Differential Revision: D16809919
fbshipit-source-id: cc19692ba4dea8854663ae1ed8cf9365e90c99bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23679
Full Canary: https://fburl.com/fblearner/sa1pkpya
Add LambdaRank DCG Loss Option
* when use_idcg_normalization == true, regular LambdaRank with NDCG loss is used
* when use_idcg_normalization == false, the gradient and loss functions are not normalized by IDCG
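To illustrate only the normalization switch (not the LambdaRank gradient itself), here is a small NumPy sketch of DCG with and without division by the ideal DCG. The helper names and the `2^rel - 1` gain with a `log2` discount are the usual textbook choices and are assumptions about this op.

```python
import numpy as np


def dcg(relevance, scores):
    """DCG of `relevance` when items are ranked by descending `scores`."""
    order = np.argsort(-scores, kind="stable")
    gains = 2.0 ** relevance[order] - 1.0
    discounts = 1.0 / np.log2(np.arange(2, len(order) + 2))
    return float(np.sum(gains * discounts))


def ranking_metric(relevance, scores, use_idcg_normalization=True):
    value = dcg(relevance, scores)
    if not use_idcg_normalization:
        return value  # plain DCG, not divided by the ideal DCG
    # Ideal DCG: rank the items by their true relevance.
    idcg = dcg(relevance, relevance.astype(float))
    return value / idcg if idcg > 0 else 0.0


relevance = np.array([3, 2, 0])
scores = np.array([0.9, 0.5, 0.1])  # already a perfect ranking
ndcg_value = ranking_metric(relevance, scores, use_idcg_normalization=True)
dcg_value = ranking_metric(relevance, scores, use_idcg_normalization=False)
```

For a perfect ranking the normalized value is 1, while the unnormalized value grows with the relevance labels and the list length.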
Differential Revision: D16605459
fbshipit-source-id: a16f071e69516974e48d27bef4ca179019ca4ae7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22348
This is the last step of LRU hash eviction weight re-init. This diff checks whether there are evicted values in sparse_lookup and, if so, calls the op created in D15709866 to re-init the values for the indices in evicted_values. It also creates a gradient op for the operator; the gradient op just passes the output gradient through as the input gradient.
Reviewed By: itomatik
Differential Revision: D16044736
fbshipit-source-id: 9afb85209b0de1038c5153bcb7dfc5f52e0b2abb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21927
Add `OUTPUT_PROB` output to CTCBeamSearchDecoderOp to return a probability for each sequence.
Add argument to output top-k instead of top-1 decoded sequences.
Reviewed By: SuperIRabbit
Differential Revision: D15797371
fbshipit-source-id: 737ca5cc4f90a0bcc3660ac9f58519a175977b69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22279
This new operator is used for embedding table weight re-init. Once we have the evicted indices, they identify the rows that need resetting in the embedding table. We can then create a 1-D tensor with default values and apply this operator to copy that tensor into every evicted row of the embedding table.
The gradient op will be added in the next diff.
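The copy described above can be sketched in NumPy as follows. `reinit_rows` is a hypothetical name and this is only an illustration of the semantics, not the operator's code.

```python
import numpy as np


def reinit_rows(table, evicted_indices, default_row):
    """Overwrite every evicted row of `table` with `default_row`, in place."""
    # Indexed assignment broadcasts the 1-D default row over all listed rows.
    table[evicted_indices] = default_row
    return table


embedding = np.arange(12, dtype=np.float32).reshape(4, 3)
defaults = np.zeros(3, dtype=np.float32)
reinit_rows(embedding, np.array([1, 3]), defaults)
# Rows 1 and 3 now hold the default values; rows 0 and 2 are unchanged.
```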
Reviewed By: itomatik
Differential Revision: D15709866
fbshipit-source-id: 2297b70a7326591524d0be09c73a588da245cc08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21230
tsia; this diff adds empty tensor support to the reshape operator
Reviewed By: jerryzh168
Differential Revision: D15583356
fbshipit-source-id: 6d44c04e95ca3546509bfb12102e29c878f9a7c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20868
When `input_boxes_include_bg_cls` is false (which means `input_scores_fg_cls_starting_id` is 0), the class index of each score is not mapped correctly when sorting and limiting the detections over all classes after NMS.
Reviewed By: newstzpz
Differential Revision: D15472706
fbshipit-source-id: dc1e808b63ad09fb4bd95acf866771bb3fa92d69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20802
Need this for sequence model
Reviewed By: dzhulgakov
Differential Revision: D15448529
fbshipit-source-id: cd5abe3b689fc0e02feff10faf8cd61c99369f4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20502
Following D15307410, remove more floating point exceptions in unit tests.
Reviewed By: hx89
Differential Revision: D15340930
fbshipit-source-id: 269fc75e0800bc9d39126767a0f3ca15cd8b0cad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20501
Fix unit tests for optimizer-related operators.
Reviewed By: hx89
Differential Revision: D15307410
fbshipit-source-id: e5400c26e08f26191ee542fe6b02e0a69bc4e1ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20020
Add shape inference for the LearningRate op. The output (lr) should have the same shape as the input (iteration) but a different type (float vs. int).
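As a sketch of the rule, written in Python instead of the C++ schema code: the inferred output copies the input's shape and swaps the element type. `TensorSpec` and `infer_learning_rate_output` are hypothetical names, and the concrete dtype strings are assumptions.

```python
from typing import NamedTuple, Tuple


class TensorSpec(NamedTuple):
    # Hypothetical minimal description of a tensor for shape inference.
    shape: Tuple[int, ...]
    dtype: str


def infer_learning_rate_output(iteration: TensorSpec) -> TensorSpec:
    """Same shape as the iteration input, but a float element type."""
    return TensorSpec(shape=iteration.shape, dtype="float32")


iteration_spec = TensorSpec(shape=(1,), dtype="int64")
lr_spec = infer_learning_rate_output(iteration_spec)
```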
Reviewed By: un-disclosed
Differential Revision: D15112300
fbshipit-source-id: 09969aefa15172a6f3c70cd9b2548e3020da5d7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19660
Implementation of an aggregated Scale operator.
The operator takes a list of tensors as input and scales all of them by the float value given as an argument.
The tensor sizes can differ, so the GPU version of the kernel needs to keep track of the sizes of the tensors and the pointers to them.
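The bookkeeping can be pictured with a CPU-side NumPy sketch: one flat index runs over all elements of all tensors, and cumulative sizes tell which tensor each flat index belongs to. This is only an illustration under that assumption; the actual kernel layout may differ, and `scale_tensors` is a hypothetical name.

```python
import numpy as np


def scale_tensors(tensors, scale):
    """Scale every tensor in the list by `scale` using one flat index space."""
    sizes = np.array([t.size for t in tensors], dtype=np.int64)
    ends = np.cumsum(sizes)  # exclusive end offset of each tensor
    outputs = [np.empty_like(t) for t in tensors]
    total = int(ends[-1]) if len(ends) else 0
    for flat in range(total):
        # Find the tensor that owns this flat element index.
        which = int(np.searchsorted(ends, flat, side="right"))
        local = flat - (int(ends[which - 1]) if which else 0)
        outputs[which].flat[local] = tensors[which].flat[local] * scale
    return outputs


a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
scaled = scale_tensors([a, b], 2.0)
```

Each output keeps the shape of its input; only the element values change.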
Reviewed By: BIT-silence
Differential Revision: D14984233
fbshipit-source-id: 37cc97159a4f2c38cd6fff4f5710ab7d3a773611
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20044
We do not have a gating functor; this diff adds one. I'm building on the existing learning rate op because there are other policies in it that I'll need to use together with the new one.
* Since other policies in LearningRateOp will be used together with it, I chose to add it as a LearningRateOp policy.
* constantwarmup cannot express a step function that is nonzero first and zero later.
* There are multiple uses for it:
  * e.g. as a gating blob generator, which is useful for turning things off.
  * e.g. as a learning rate switcher at a certain iteration.
* For generality, no restriction is placed on the range of the values.
* see figure below for illustration
{F157366621}
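A minimal sketch of the step function described above. The parameter names (`num_iter`, `multiplier_1`, `multiplier_2`) and the strict `<` at the boundary are assumptions for illustration, not necessarily what the op uses.

```python
def gate(iteration, num_iter, multiplier_1, multiplier_2):
    """Return multiplier_1 before the switch point and multiplier_2 after it."""
    # No range check on the multipliers: any values are allowed.
    return multiplier_1 if iteration < num_iter else multiplier_2


# Nonzero first and zero later, e.g. to switch something off after 2 steps.
gate_values = [gate(i, num_iter=2, multiplier_1=1.0, multiplier_2=0.0)
               for i in range(4)]
```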
Reviewed By: ccheng16
Differential Revision: D15178229
fbshipit-source-id: 1e66e9a4bc1bfb946a57f8aefc97d8170f6be731
Summary:
When output blob names are specified while load_all=1, the output blob names are ignored, but this behavior is not documented. This diff disallows providing blob names when load_all=1.
See discussion at https://fb.workplace.com/groups/1405155842844877/permalink/2714909788536136/
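The new check amounts to something like the following Python sketch. `check_load_args` and the error message are hypothetical; the real check lives in the operator itself.

```python
def check_load_args(load_all, output_blob_names):
    """Reject output blob names when every blob is going to be loaded."""
    if load_all and output_blob_names:
        raise ValueError(
            "output blob names must not be provided when load_all=1"
        )


check_load_args(load_all=False, output_blob_names=["w", "b"])  # allowed
check_load_args(load_all=True, output_blob_names=[])           # allowed
```

Failing loudly replaces the earlier behavior of silently ignoring the names.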
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19133
Reviewed By: dzhulgakov
Differential Revision: D14883698
Pulled By: chandlerzuo
fbshipit-source-id: 6e4171e36c4ccc4f857e79da98b858a06b7d8ad6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19083
As we have discussed, there are too many AdjustBatch ops; they incur reallocation overhead and hurt performance. We will eliminate these ops by
- inlining the input adjust batch op into Glow
- inlining the output adjust batch op into OnnxifiOp and doing that only conditionally.
This is the C2 part of the change; it requires a change on the Glow side to work e2e.
Reviewed By: rdzhabarov
Differential Revision: D14860582
fbshipit-source-id: ac2588b894bac25735babb62b1924acc559face6