Summary: New hybrid randomized sparse NN, which allows layers of a sparse NN model to be randomized, semi-random, or learnable
Reviewed By: chocjy
Differential Revision: D5416489
fbshipit-source-id: eb8640ddf463865097ba054b9f8d63da7403024d
Summary:
The current implementation for s=0 doesn't support the backward pass.
Switching to the pow op instead as a temporary solution.
Reviewed By: jackielxu
Differential Revision: D5551742
fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3
Summary:
To achieve this, I modified the blob naming scheme defined in a layer.
Before it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
within the same scope).
Now I change it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.
I also overrode the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the parameter-sharing context.
There are some details such as making the initializer more structured
that I need to finalize.
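To illustrate the naming scheme described above, here is a minimal Python sketch (a hypothetical helper, not the actual LayerModelHelper code) of deriving blob names from a uniquified scoped layer name:

```python
# Hypothetical sketch of the new blob-naming scheme: make the scoped
# layer name unique first, then derive blob names from it.
def resolve_param_name(scope, layer_name, param, taken):
    i = 0
    name = f"{scope}/{layer_name}"
    while name in taken:
        i += 1
        name = f"{scope}/{layer_name}_auto_{i - 1}"
    taken.add(name)
    # Blob names now rely on the uniqueness of the scoped layer name.
    return f"{name}/{param}"

taken = set()
print(resolve_param_name("scope", "fc", "w", taken))  # scope/fc/w
print(resolve_param_name("scope", "fc", "w", taken))  # scope/fc_auto_0/w
```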
Reviewed By: kennyhorror
Differential Revision: D5435132
fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
Summary:
Feed team uses distributed training and wants to also use transfer learning.
Currently, transfer learning is implemented by overwriting the layer parameter
initializer. Therefore, the PS builder can't correctly infer the parameter shape.
To fix this, add a field 'shape' in `layer_parameter` and set the shape if we
overwrite its initializer.
We also enforce a check that the parameter shape matches between the original
initializer and the loaded blob (this adds extra cost).
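A minimal sketch of the shape check described above, with hypothetical names (the real check lives in the parameter-loading path):

```python
import numpy as np

def check_loaded_shape(declared_shape, loaded_blob):
    # Compare the shape declared on the layer parameter with the blob
    # loaded from the pretrained model; fail fast on mismatch.
    if tuple(declared_shape) != loaded_blob.shape:
        raise ValueError(
            f"shape mismatch: declared {tuple(declared_shape)}, "
            f"loaded {loaded_blob.shape}")
    return loaded_blob

w = check_loaded_shape([4, 8], np.zeros((4, 8)))
```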
Differential Revision: D5520541
fbshipit-source-id: 80547dbd328b3f6cbfcea0b2daaf4004703dfe81
Summary:
Updated the semi-random layer to support multi-layer models built from semi-random layers.
Notable changes:
- Inputs and outputs for the semi-random layer are now a Struct with "full" and "random" components
- A flag was added to choose whether the output schema is initialized in Arc Cosine or not (i.e., whether output schema initialization will happen in the Semi-Random layer)
Reviewed By: chocjy
Differential Revision: D5496034
fbshipit-source-id: 5245e287a5b1cbffd5e8d2e3da31477c65b41e04
Summary:
The original issue was that the initialized parameters for randomized layers (Arc Cosine and Semi-Random) were not fixed across distributed runs of the layers. Moreover, as the weights are initialized as (constant) parameters, when the layer is added to the preprocessing part, these weights won't be saved after training since they don't exist on the trainer.
I fixed the issue here by building an option to add the randomized parameters to the model global constants so that the same parameter values can be accessed. Also, the parameters can be saved when the training is finished.
In this diff, I've:
- Updated randomized parameters to be added as a global constant across distributed runs of Arc Cosine Feature Map and Semi Random Feature layers
- Updated unit tests
- Ran an end-to-end test, enabling multiple readers to test the fixed issue
Reviewed By: chocjy
Differential Revision: D5483372
fbshipit-source-id: b4617f9ffc1c414d5a381dbded723a31a8be3ccd
Summary:
In this revision, I mainly implemented the DRelu activation. See https://arxiv.org/pdf/1706.06978v1.pdf for details.
To sum up, unlike standard ReLU and PReLU, which divide the input into two parts with a boundary at zero, DRelu computes another value p to divide the activation into two parts. p is the softmax value of the output of Batch Normalization. For the f(x)=x part of ReLU, you can find a similar pattern in f(x)=px, and for the f(x)=0 part of ReLU, you can find a similar pattern in f(x)=a(1-p)x, in which a is a parameter to tune. The DRelu activation result is the sum of these two parts: f(x) = a(1-p)x + px.
To implement DRelu, I use BatchNormalization as the superclass and then apply the above formula for computation. To allow users to choose activation methods, which usually happens when calling the add_mlp function in processor_util.py, I pass the parameter in model_option from the UI down to the implementation, just as dropout does. Currently, I place it in extra_option, but can move it if the AML team needs to redesign the UI.
I also added unit tests for DRelu. We check the shape of the output and also run numeric unit tests.
For the unit tests, I first check the numeric value of BatchNormalization, since there was no similar test before. I then compute the value of the DRelu outputs and compare the results with the current DRelu layer.
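The formula above can be sketched numerically as follows (a NumPy sketch only, treating the two-class softmax of the batch-normalized input as a sigmoid; `a` and `eps` are illustrative defaults, not the layer's actual ones):

```python
import numpy as np

def drelu(x, a=0.1, eps=1e-5):
    # Batch-normalize x per feature, then squash to p in (0, 1);
    # a two-class softmax over (bn, 0) reduces to a sigmoid.
    bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    p = 1.0 / (1.0 + np.exp(-bn))
    # f(x) = p * x + a * (1 - p) * x
    return p * x + a * (1.0 - p) * x

x = np.array([[-2.0, 1.0], [2.0, -1.0]])
y = drelu(x)
```

Since p is in (0, 1) and a > 0, the combined coefficient p + a(1-p) is always positive, so DRelu preserves the sign of its input while rescaling it smoothly across the data-dependent boundary.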
Reviewed By: chocjy
Differential Revision: D5341464
fbshipit-source-id: 896b4dcc49cfd5493d97a8b448401b19e9c80630
Summary: Adding a None pooling option, so that SparseLookup will gather the embedding for each id.
Reviewed By: kittipatv
Differential Revision: D5421667
fbshipit-source-id: 1e8e2b550893ff3869dab12f8eb1fe24a063c3d5
Summary: This diff adds a TensorInferenceFunction for the ExpandDims operator, so that the ExpandDims layer is no longer needed (it can be handled by the functional layer).
Reviewed By: kittipatv
Differential Revision: D5430889
fbshipit-source-id: 4f895f2751663c45db4cc4f87e5114c63cda9fbb
Summary:
- (Split diff from Arc Cosine)
- Implemented [[ https://arxiv.org/pdf/1702.08882.pdf | Semi-Random Features ]] Layer
- Created a buck unit test for SRF Layer
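The semi-random unit from the paper (in its linear form) can be sketched in NumPy: the random half gates, while the learnable half carries the value. All names and dimensions below are illustrative, not the actual layer code:

```python
import numpy as np

rng = np.random.RandomState(0)
d, k = 5, 8
X = rng.randn(3, d)

W_random = rng.randn(d, k)   # fixed random weights (never trained)
W_learned = rng.randn(d, k)  # trainable weights (random init here)

# Linear semi-random unit: f(x) = 1[x @ W_random > 0] * (x @ W_learned)
# The random projection decides which units fire; the learned
# projection decides what value they emit.
gate = (X @ W_random > 0).astype(X.dtype)
out = gate * (X @ W_learned)
```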
Reviewed By: chocjy
Differential Revision: D5374803
fbshipit-source-id: 0293fd91ed5bc19614d418c2fce9c1cfdd1128ae
Summary: The number of input channels for NHWC should come from the last dimension, C. Since the batch size is omitted, the axis index should be 2 instead of 3.
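For concreteness, a tiny sketch of the axis indexing in question (illustrative shape):

```python
# With the batch dimension omitted, an NHWC input is (H, W, C),
# so the channel count lives at axis 2, not axis 3.
shape_nhwc = (28, 28, 3)  # H, W, C
num_input_channels = shape_nhwc[2]
```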
Reviewed By: chocjy
Differential Revision: D5418538
fbshipit-source-id: a6939a863817b7566198ea2a665a1d236a2cf63d
Summary:
We want to be able to register layer subclasses that
are not direct children of ModelLayer.
This requires us to find subclasses of ModelLayer recursively.
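A minimal sketch of recursive subclass discovery (the class names below are hypothetical stand-ins, not the real layer registry):

```python
def all_subclasses(cls):
    # Walk the class hierarchy recursively so that indirect
    # descendants are registered too, not just direct children.
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found

class ModelLayer: pass
class FC(ModelLayer): pass          # direct child
class FancyFC(FC): pass             # indirect descendant

names = [c.__name__ for c in all_subclasses(ModelLayer)]
```

A plain `ModelLayer.__subclasses__()` would miss `FancyFC`; the recursion is what picks it up.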
Reviewed By: kittipatv, kennyhorror
Differential Revision: D5397120
fbshipit-source-id: cb1e03d72e3bedb960b1b865877a76e413218a71
Summary: This diff makes the functional layer return a scalar if there is only one output. It also corrects all other corresponding implementations.
Reviewed By: kittipatv
Differential Revision: D5386853
fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42
Summary: This diff adds an optimizer field to param_info, plus the associated implementations for ModelHelper and brew to set an optimizer for each individual parameter.
Reviewed By: kennyhorror
Differential Revision: D5385432
fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92
Summary:
- Integrated RFF into the preprocessing workflow for dense features
- Developed Flow interface to input RFF parameters
- Created unit test for using RFF with sparseNN
Reviewed By: chocjy
Differential Revision: D5367534
fbshipit-source-id: 07307259c501a614d9ee68a731f0cc8ecd17db68
Summary:
- Created the random fourier features layer
- Generated a unit test to test the random fourier features layer is built correctly
- Inspired by the paper [[ https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf | Random Features for Large-Scale Kernel Machines]]
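The construction from the Rahimi & Recht paper can be sketched in NumPy as follows (illustrative dimensions and kernel width, not the actual layer code):

```python
import numpy as np

rng = np.random.RandomState(0)
d, D, gamma = 4, 16, 1.0

# Random features approximating an RBF kernel:
# z(x) = sqrt(2/D) * cos(W x + b), with W ~ N(0, 2*gamma), b ~ U[0, 2*pi],
# so that z(x).z(y) approximates exp(-gamma * ||x - y||^2).
W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.randn(5, d)
Z = rff(X)
```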
Reviewed By: chocjy
Differential Revision: D5318105
fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
Summary:
In some cases we don't want to compute the full FC during eval.
These layers allow us to compute dot product between
X and W[idx,:] where idx is an input, e.g., label.
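A NumPy sketch of the gather-then-dot computation these layers perform (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(4, 8)            # batch of examples
W = rng.randn(100, 8)          # full FC weight (num_rows x dim)
idx = np.array([3, 7, 7, 42])  # one row index (e.g., label) per example

# Instead of the full X @ W.T, compute only the rows we need:
# out[i] = X[i] . W[idx[i], :]
out = np.einsum('ij,ij->i', X, W[idx])
```

This is the eval-time shortcut the summary describes: O(batch * dim) work instead of O(batch * num_rows * dim).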
Reviewed By: kittipatv
Differential Revision: D5305364
fbshipit-source-id: 0b6a1b61cc8fcb26c8def8bcd037a4a35d223078
Summary:
Similar to sparse_nn all-GPU, this is our first step towards offline full-GPU experiments.
**Compare Run**
cat(128, 32)512-512 :
GPU 21138598 https://fburl.com/jpeod1pi
CPU 21138787 https://fburl.com/vma7225l
Reviewed By: dzhulgakov
Differential Revision: D5308789
fbshipit-source-id: 413819bf9c5fff125d6967ed48faa5c7b3d6fa85
Summary:
As described in T19378176 by kittipatv, this diff fixes an issue with __getitem__() in schema.List.
For example, given Map(int32, float) (Map is a special List), field_names() will return "lengths", "values:keys", and "values:values". "values:keys" and "values:values" are not accessible via __getitem__(), because __getitem__() bypasses the "values" prefix and directly accesses the fields in the map. Other APIs (e.g., _SchemaNode and dataset_ops) expect "values:keys" and "values:values", as this simplifies traversal logic. Therefore, we should keep field_names() as is and fix __getitem__().
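A minimal sketch of the intended lookup behavior, using a plain dict in place of the real schema classes (hypothetical helper, not the actual schema.List code):

```python
# A Map(int32, float) flattened the way field_names() reports it:
# "lengths", "values:keys", "values:values".
fields = {
    "lengths": [2, 1],
    "values": {"keys": [1, 2, 3], "values": [0.1, 0.2, 0.3]},
}

def get_field(root, name):
    # Walk the colon-separated path instead of bypassing the
    # "values" prefix, so "values:keys" resolves correctly.
    node = root
    for part in name.split(":"):
        node = node[part]
    return node

keys = get_field(fields, "values:keys")
```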
Reviewed By: kittipatv
Differential Revision: D5251657
fbshipit-source-id: 1acfb8d6e53e286eb866cf5ddab01d2dce97e1d2
Summary:
- Incorporated dropout layer to the sparseNN training and testing pipeline
- Integrated an advanced model options feature on Flow UI for users to specify dropout rate
- Created an end-to-end unit test to build and run a model with dropout
Reviewed By: chocjy
Differential Revision: D5273478
fbshipit-source-id: f7ae7bf4de1172b6e320f5933eaaebca3fd8749e
Summary:
Truncate the id list using the max length computed in compute meta, so that it has a deterministic length,
which is useful for the position-weighted pooling method.
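A NumPy sketch of this truncation, assuming the usual (lengths, values) id-list representation (hypothetical helper name):

```python
import numpy as np

def truncate_id_list(lengths, values, max_length):
    # Keep at most max_length ids per example so every example has a
    # bounded, deterministic length (needed for per-position weights).
    new_lengths = np.minimum(lengths, max_length)
    out, offset = [], 0
    for n, keep in zip(lengths, new_lengths):
        out.extend(values[offset:offset + keep])
        offset += n
    return new_lengths, np.array(out)

lengths = np.array([3, 1, 4])
values = np.array([10, 11, 12, 20, 30, 31, 32, 33])
L, V = truncate_id_list(lengths, values, 2)
```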
Reviewed By: sunwael
Differential Revision: D5233739
fbshipit-source-id: f73deec1bb50144ba14c4f8cfa545e1ced5071ce
Summary:
The SparseToDense layer essentially calls the SparseToDenseMask op.
This makes it impossible to call the functional layer with the true SparseToDense op.
This diff renames the layer.
Please let me know if I missed anything or you have a better name suggestion.
Differential Revision: D5169353
fbshipit-source-id: 724d3c6dba81448a6db054f044176ffc7f708bdb
Summary:
If there are two SparseToDense layers densifying the same IdList feature,
we might export invalid input for prediction in the input specs. This diff
changes the behavior to use an Alias to a new blob instead of passing
things through directly.
Reviewed By: dzhulgakov
Differential Revision: D5093754
fbshipit-source-id: ef4fa4ac3722331d6e72716bd0c6363b3a629cf7
Summary: Currently, using two-tower models with cosine distance results in bad calibration. Adding a bias to the output of the cosine term solves the problem.
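A minimal sketch of the idea, with an illustrative scale and bias (not the actual layer code): cosine similarity is bounded in [-1, 1], so an affine transform before the final sigmoid gives the model room to calibrate.

```python
import numpy as np

def cosine_with_bias(u, v, w=1.0, b=0.0):
    # Scale and shift the bounded cosine score so downstream
    # calibration (e.g., a sigmoid) can match the label prior.
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return w * cos + b

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
score = cosine_with_bias(u, v, w=2.0, b=0.5)
```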
Reviewed By: xianjiec
Differential Revision: D5132606
fbshipit-source-id: eb4fa75acf908db89954eeee67627b4a00572f61
Summary:
In the Dper utility, add a function `load_parameters_from_model_init_options` to
allow initializing parameters from pretrained models.
Reviewed By: xianjiec
Differential Revision: D4926075
fbshipit-source-id: 5ab563140b5b072c9ed076bbba1aca43e71c6ac5
Summary:
Segment-based ops require increasing segment ids without gaps. Lengths-based ops do not
have this requirement.
Other pooling methods, e.g., LogExpMean, do not have Lengths-based ops available yet.
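A NumPy sketch contrasting the two op families (illustrative, not the actual Caffe2 ops):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Segment-based ops consume sorted, gap-free segment ids:
seg_ids = np.array([0, 0, 1, 1])
seg_sum = np.array(
    [data[seg_ids == i].sum() for i in range(seg_ids.max() + 1)])

# Lengths-based ops take per-segment lengths instead, so explicit
# ids never need to be materialized or kept gap-free:
lengths = np.array([2, 2])
splits = np.split(data, np.cumsum(lengths)[:-1])
len_sum = np.array([s.sum() for s in splits])
```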
Differential Revision: D5019165
fbshipit-source-id: ab01a220e10d4ed9fa2162939579d346607f905e