pytorch/caffe2/python/layers
Yangxin Zhong ed788ec780 Linearizable Label: Class Weights, Allow Missing Label, and Average by Batch Size (#29707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29707

In D17885977, linearizable label (a multi-class classification formulation) was implemented in MTML.

In this diff, we add several items for Linearizable label:

- Assigning different weights to each class through ```model_def.tasks[i].class_weights```.

  - This option is a dictionary whose keys are the class indices and whose values are the weights for each class.

  - For example, if a linearizable-label task has 4 classes and its ```class_weights = {"0": 1, "1": 0.1, "2": 0.1, "3": 0.01}```, then in the loss function of this task we assign weight 1 to its first class, weight 0.1 to its second and third classes, and weight 0.01 to its fourth class. The index/order of the classes follows the logic of linearizable label.

  - Note that when you assign different weights to different classes, you need to correct the calibration by setting an appropriate ```model_def.tasks[i].calibration.linearizable_class_weight```. The class weights in calibration should be the reciprocals of the class weights in the loss function, so ```calibration.linearizable_class_weight = {"0": 1, "1": 10, "2": 10, "3": 100}``` for the example above.

  - Example FBLearner job: f150763093

- We also support ```model_def.allow_missing_label_with_zero_weight``` for linearizable label, which ignores examples whose first label is missing by assigning them zero weight in the loss function.

  - We need to set ```allow_missing_label_with_zero_weight = true``` to enable it.

  - Example FBLearner job: f150763093

- Last but not least, we update the caffe2 operator ```SoftmaxWithLoss``` to support averaging the loss by batch size.

  - We need to set ```model_def.tasks[i].loss.softmaxLoss.average_by_batch_size = true``` to enable it.

  - Previously, the loss was averaged by the weighted sum of the examples in the batch; this remains the default behavior (when ```average_by_batch_size = null``` or ```average_by_batch_size = false```).

  - Without this new feature, the calibration is incorrect when non-equal class weights are applied in training a linearizable task.

  - Example FBLearner job with ```average_by_batch_size = true``` results in a correct calibration: f150763093

  - Example FBLearner job with ```average_by_batch_size = null``` results in an incorrect calibration: f150762990
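The interaction between class weights, missing-label zero weights, the two averaging modes, and the reciprocal calibration weights can be sketched in NumPy. This is a hypothetical illustration of the behavior described above, not the actual ```SoftmaxWithLoss``` implementation; the function and variable names are made up:

```python
import numpy as np

def weighted_softmax_loss(logits, labels, weights, average_by_batch_size=False):
    """Sketch of a class-weighted softmax cross-entropy loss.

    logits:  (N, C) raw scores
    labels:  (N,) integer class indices (in linearizable-label order)
    weights: (N,) per-example weights, e.g. class_weights[label], or 0 for an
             example whose first label is missing
             (allow_missing_label_with_zero_weight).
    """
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -log_probs[np.arange(len(labels)), labels] * weights
    if average_by_batch_size:
        # New mode: normalize by N, independent of the example weights.
        return per_example.sum() / len(labels)
    # Default mode: normalize by the weighted sum of the examples.
    return per_example.sum() / weights.sum()

# Hypothetical 4-class task with class_weights = {"0": 1, "1": 0.1, "2": 0.1, "3": 0.01}.
class_weights = np.array([1.0, 0.1, 0.1, 0.01])
labels = np.array([0, 1, 3])
weights = class_weights[labels]  # per-example weight looked up by label
logits = np.array([[2.0, 0.5, 0.1, 0.0],
                   [0.3, 1.5, 0.2, 0.1],
                   [0.1, 0.2, 0.4, 1.0]])

loss_default = weighted_softmax_loss(logits, labels, weights)
loss_by_batch = weighted_softmax_loss(logits, labels, weights,
                                      average_by_batch_size=True)

# Calibration class weights are the reciprocals of the loss class weights.
calibration_weights = {str(i): 1.0 / w for i, w in enumerate(class_weights)}
# -> {"0": 1.0, "1": 10.0, "2": 10.0, "3": 100.0}
```

The two modes differ only in the normalizer, which is why they agree when all example weights are 1 but diverge (and affect calibration) once non-equal class weights are used.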

Test Plan:
buck test caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_linearizable_label_task_with_class_weights
buck test caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_linearizable_label_task_with_zero_weight
buck test caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_linearizable_label_task_average_by_batch_size

All tests passed.

full canary: https://fburl.com/fblearner/troznfgh

Reviewed By: chenshouyuan

Differential Revision: D18461163

fbshipit-source-id: aaf3df031406ae94f74e2e365b57e47409ef0bfe
2019-11-13 16:52:27 -08:00
__init__.py
adaptive_weight.py
add_bias.py
arc_cosine_feature_map.py
batch_huber_loss.py Add new regression loss function type to FBLearner (#21080) 2019-06-17 17:43:00 -07:00
batch_lr_loss.py Exponential decay of the weight of task loss (#27508) 2019-10-08 09:15:41 -07:00
batch_mse_loss.py Change dper3 loss module to match dper2 (#28265) 2019-10-18 10:08:38 -07:00
batch_normalization.py
batch_sigmoid_cross_entropy_loss.py
batch_softmax_loss.py Linearizable Label: Class Weights, Allow Missing Label, and Average by Batch Size (#29707) 2019-11-13 16:52:27 -08:00
blob_weighted_sum.py
bpr_loss.py Add BPR loss to TTSN (#24439) 2019-08-15 23:20:15 -07:00
bucket_weighted.py add feature name into module and update position weighted to match dper2 2019-10-14 08:06:19 -07:00
build_index.py
concat.py
constant_weight.py
conv.py
dropout.py
fc.py Integrate FC fp16 exporter into Dper2 (#26582) 2019-09-29 10:19:28 -07:00
fc_with_bootstrap.py Creating new layer FCWithBootstrap used in bootstrapping uncertainty approach (#29152) 2019-11-04 21:18:15 -08:00
fc_without_bias.py
feature_sparse_to_dense.py Return list of AccessedFeatures from get_accessed_features (#23983) 2019-08-14 10:50:27 -07:00
functional.py
gather_record.py
homotopy_weight.py
label_smooth.py
last_n_window_collector.py
layer_normalization.py
layers.py Return list of AccessedFeatures from get_accessed_features (#23983) 2019-08-14 10:50:27 -07:00
margin_rank_loss.py
merge_id_lists.py
pairwise_similarity.py
position_weighted.py
random_fourier_features.py
reservoir_sampling.py
sampling_train.py
sampling_trainable_mixin.py
select_record_by_context.py
semi_random_features.py
sparse_dropout_with_replacement.py hook up dropout sparse with replacement operator 2019-07-23 14:34:25 -07:00
sparse_feature_hash.py Refactor and expose metadata of tum_history layer for online prediction 2019-08-15 00:27:11 -07:00
sparse_lookup.py Fix predict net issue with LRU hash eviction 2019-10-14 16:08:14 -07:00
split.py Enable variable size embedding (#25782) 2019-09-09 22:08:32 -07:00
tags.py
uniform_sampling.py