Summary: New hybrid randomized sparse NN, which allows layers of a sparse NN model to be randomized, semi-random, or learnable
Reviewed By: chocjy
Differential Revision: D5416489
fbshipit-source-id: eb8640ddf463865097ba054b9f8d63da7403024d
Summary:
The current implementation for s=0 doesn't support the backward pass.
Switching to the pow op instead as a temporary solution.
Reviewed By: jackielxu
Differential Revision: D5551742
fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3
Summary:
To achieve this, I modified the blob naming scheme defined in a layer.
Before it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
within the same scope).
Now I change it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.
I also overrode the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the parameter-sharing context.
There are some details such as making the initializer more structured
that I need to finalize.
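To illustrate the naming scheme described above, here is a minimal Python sketch (a hypothetical helper, not the actual LayerModelHelper code) of deriving blob names from a uniquified scoped layer name:

```python
# Hypothetical sketch of the new blob-naming scheme: make the scoped
# layer name unique first, then derive blob names from it.
def resolve_param_name(scope, layer_name, param, taken):
    i = 0
    name = f"{scope}/{layer_name}"
    while name in taken:
        i += 1
        name = f"{scope}/{layer_name}_auto_{i - 1}"
    taken.add(name)
    # Blob names now rely on the uniqueness of the scoped layer name.
    return f"{name}/{param}"

taken = set()
print(resolve_param_name("scope", "fc", "w", taken))  # scope/fc/w
print(resolve_param_name("scope", "fc", "w", taken))  # scope/fc_auto_0/w
```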
Reviewed By: kennyhorror
Differential Revision: D5435132
fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
Summary:
Feed team uses distributed training and wants to also use transfer learning.
Currently, transfer learning is implemented by overwriting the layer parameter
initializer. Therefore, the PS builder can't correctly infer the parameter shape.
To fix this, add a field 'shape' in `layer_parameter` and set the shape if we
overwrite its initializer.
We also enforce a check that the parameter shape matches between the original
initializer and the loaded blob (this adds extra cost).
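A minimal sketch of the shape check described above, with hypothetical names (the real check lives in the parameter-loading path):

```python
import numpy as np

def check_loaded_shape(declared_shape, loaded_blob):
    # Compare the shape declared on the layer parameter with the blob
    # loaded from the pretrained model; fail fast on mismatch.
    if tuple(declared_shape) != loaded_blob.shape:
        raise ValueError(
            f"shape mismatch: declared {tuple(declared_shape)}, "
            f"loaded {loaded_blob.shape}")
    return loaded_blob

w = check_loaded_shape([4, 8], np.zeros((4, 8)))
```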
Differential Revision: D5520541
fbshipit-source-id: 80547dbd328b3f6cbfcea0b2daaf4004703dfe81
Summary:
Updated the semi-random layer to support multi-layer models built from semi-random layers.
Notable changes:
- Inputs and outputs for the semi-random layer are now a Struct with "full" and "random" components
- A flag was added to choose whether the output schema is initialized in Arc Cosine or not (i.e., whether output schema initialization will happen in the Semi-Random layer)
Reviewed By: chocjy
Differential Revision: D5496034
fbshipit-source-id: 5245e287a5b1cbffd5e8d2e3da31477c65b41e04
Summary:
The original issue was that the initialized parameters for randomized layers (Arc Cosine and Semi-Random) were not fixed across distributed runs of the layers. Moreover, as the weights are initialized as (constant) parameters, when the layer is added to the preprocessing part, these weights won't be saved after training since they don't exist on the trainer.
I fixed the issue here by building an option to add the randomized parameters to the model global constants so that the same parameter values can be accessed. Also, the parameters can be saved when the training is finished.
In this diff, I've:
- Updated randomized parameters to be added as a global constant across distributed runs of Arc Cosine Feature Map and Semi Random Feature layers
- Updated unit tests
- Ran an end-to-end test, enabling multiple readers to test the fixed issue
Reviewed By: chocjy
Differential Revision: D5483372
fbshipit-source-id: b4617f9ffc1c414d5a381dbded723a31a8be3ccd
Summary:
In this revision, I mainly implemented the DRelu activation. See https://arxiv.org/pdf/1706.06978v1.pdf for details.
To sum up, unlike standard ReLU and PReLU, which divide the input into two parts with a boundary at zero, DRelu computes another value p to divide the activation into two parts. p is the softmax value of the output of Batch Normalization. For the f(x)=x part of ReLU, you can find a similar pattern in f(x)=px, and for the f(x)=0 part of ReLU, you can find a similar pattern in f(x)=a(1-p)x, in which a is a parameter to tune. The DRelu activation result is the sum of these two parts: f(x) = a(1-p)x + px.
To implement DRelu, I use BatchNormalization as the superclass and then apply the above formula for computation. To allow users to choose activation methods, which usually happens when calling the add_mlp function in processor_util.py, I pass the parameter in model_option from the UI down to the implementation, just as dropout does. Currently, I place it in extra_option, but can move it if the AML team needs to redesign the UI.
I also added unit tests for DRelu. We check the shape of the output and also run numeric unit tests.
For the unit tests, I first check the numeric value of BatchNormalization, since there was no similar test before. I then compute the value of the DRelu outputs and compare the results with the current DRelu layer.
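The formula above can be sketched numerically as follows (a NumPy sketch only, treating the two-class softmax of the batch-normalized input as a sigmoid; `a` and `eps` are illustrative defaults, not the layer's actual ones):

```python
import numpy as np

def drelu(x, a=0.1, eps=1e-5):
    # Batch-normalize x per feature, then squash to p in (0, 1);
    # a two-class softmax over (bn, 0) reduces to a sigmoid.
    bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    p = 1.0 / (1.0 + np.exp(-bn))
    # f(x) = p * x + a * (1 - p) * x
    return p * x + a * (1.0 - p) * x

x = np.array([[-2.0, 1.0], [2.0, -1.0]])
y = drelu(x)
```

Since p is in (0, 1) and a > 0, the combined coefficient p + a(1-p) is always positive, so DRelu preserves the sign of its input while rescaling it smoothly across the data-dependent boundary.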
Reviewed By: chocjy
Differential Revision: D5341464
fbshipit-source-id: 896b4dcc49cfd5493d97a8b448401b19e9c80630
Summary: Adding a None pooling option, so that SparseLookup will gather the embedding for each id.
Reviewed By: kittipatv
Differential Revision: D5421667
fbshipit-source-id: 1e8e2b550893ff3869dab12f8eb1fe24a063c3d5
Summary: This diff adds a TensorInferenceFunction for the ExpandDims operator, so that the ExpandDims layer is no longer needed (it can be handled by the functional layer).
Reviewed By: kittipatv
Differential Revision: D5430889
fbshipit-source-id: 4f895f2751663c45db4cc4f87e5114c63cda9fbb
Summary:
- (Split diff from Arc Cosine)
- Implemented [[ https://arxiv.org/pdf/1702.08882.pdf | Semi-Random Features ]] Layer
- Created a buck unit test for SRF Layer
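The semi-random unit from the paper (in its linear form) can be sketched in NumPy: the random half gates, while the learnable half carries the value. All names and dimensions below are illustrative, not the actual layer code:

```python
import numpy as np

rng = np.random.RandomState(0)
d, k = 5, 8
X = rng.randn(3, d)

W_random = rng.randn(d, k)   # fixed random weights (never trained)
W_learned = rng.randn(d, k)  # trainable weights (random init here)

# Linear semi-random unit: f(x) = 1[x @ W_random > 0] * (x @ W_learned)
# The random projection decides which units fire; the learned
# projection decides what value they emit.
gate = (X @ W_random > 0).astype(X.dtype)
out = gate * (X @ W_learned)
```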
Reviewed By: chocjy
Differential Revision: D5374803
fbshipit-source-id: 0293fd91ed5bc19614d418c2fce9c1cfdd1128ae
Summary: The number of input channels for NHWC should come from the last dimension, C. Since the batch size is omitted, the axis index should be 2 instead of 3.
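For concreteness, a tiny sketch of the axis indexing in question (illustrative shape):

```python
# With the batch dimension omitted, an NHWC input is (H, W, C),
# so the channel count lives at axis 2, not axis 3.
shape_nhwc = (28, 28, 3)  # H, W, C
num_input_channels = shape_nhwc[2]
```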
Reviewed By: chocjy
Differential Revision: D5418538
fbshipit-source-id: a6939a863817b7566198ea2a665a1d236a2cf63d
Summary:
We want to be able to register layer subclasses that
are not direct children of ModelLayer.
This requires us to find subclasses of ModelLayer recursively.
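A minimal sketch of recursive subclass discovery (the class names below are hypothetical stand-ins, not the real layer registry):

```python
def all_subclasses(cls):
    # Walk the class hierarchy recursively so that indirect
    # descendants are registered too, not just direct children.
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found

class ModelLayer: pass
class FC(ModelLayer): pass          # direct child
class FancyFC(FC): pass             # indirect descendant

names = [c.__name__ for c in all_subclasses(ModelLayer)]
```

A plain `ModelLayer.__subclasses__()` would miss `FancyFC`; the recursion is what picks it up.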
Reviewed By: kittipatv, kennyhorror
Differential Revision: D5397120
fbshipit-source-id: cb1e03d72e3bedb960b1b865877a76e413218a71
Summary: This diff makes the functional layer return a scalar if there is only one output. It also corrects all other corresponding implementations.
Reviewed By: kittipatv
Differential Revision: D5386853
fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42
Summary: This diff adds an optimizer field to param_info, plus the associated implementations for ModelHelper and brew to set an optimizer for each individual parameter.
Reviewed By: kennyhorror
Differential Revision: D5385432
fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92
Summary:
- Integrated RFF into the preprocessing workflow for dense features
- Developed Flow interface to input RFF parameters
- Created unit test for using RFF with sparseNN
Reviewed By: chocjy
Differential Revision: D5367534
fbshipit-source-id: 07307259c501a614d9ee68a731f0cc8ecd17db68
Summary:
- Created the random fourier features layer
- Generated a unit test to test the random fourier features layer is built correctly
- Inspired by the paper [[ https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf | Random Features for Large-Scale Kernel Machines]]
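The construction from the Rahimi & Recht paper can be sketched in NumPy as follows (illustrative dimensions and kernel width, not the actual layer code):

```python
import numpy as np

rng = np.random.RandomState(0)
d, D, gamma = 4, 16, 1.0

# Random features approximating an RBF kernel:
# z(x) = sqrt(2/D) * cos(W x + b), with W ~ N(0, 2*gamma), b ~ U[0, 2*pi],
# so that z(x).z(y) approximates exp(-gamma * ||x - y||^2).
W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.randn(5, d)
Z = rff(X)
```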
Reviewed By: chocjy
Differential Revision: D5318105
fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
Summary:
In some cases we don't want to compute the full FC during eval.
These layers allow us to compute dot product between
X and W[idx,:] where idx is an input, e.g., label.
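A NumPy sketch of the gather-then-dot computation these layers perform (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(4, 8)            # batch of examples
W = rng.randn(100, 8)          # full FC weight (num_rows x dim)
idx = np.array([3, 7, 7, 42])  # one row index (e.g., label) per example

# Instead of the full X @ W.T, compute only the rows we need:
# out[i] = X[i] . W[idx[i], :]
out = np.einsum('ij,ij->i', X, W[idx])
```

This is the eval-time shortcut the summary describes: O(batch * dim) work instead of O(batch * num_rows * dim).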
Reviewed By: kittipatv
Differential Revision: D5305364
fbshipit-source-id: 0b6a1b61cc8fcb26c8def8bcd037a4a35d223078
Summary:
Similar to sparse_nn all-GPU, this is our first step towards offline full-GPU experiments.
**Compare Run**
cat(128, 32)512-512 :
GPU 21138598 https://fburl.com/jpeod1pi
CPU 21138787 https://fburl.com/vma7225l
Reviewed By: dzhulgakov
Differential Revision: D5308789
fbshipit-source-id: 413819bf9c5fff125d6967ed48faa5c7b3d6fa85
Summary:
As described in T19378176 by kittipatv, this diff fixes an issue with __getitem__() in schema.List.
For example, given Map(int32, float) (Map is a special List), field_names() will return "lengths", "values:keys", and "values:values". "values:keys" and "values:values" are not accessible via __getitem__(), because __getitem__() bypasses the "values" prefix and directly accesses the fields in the map. Other APIs (e.g., _SchemaNode and dataset_ops) expect "values:keys" and "values:values", as this simplifies traversal logic. Therefore, we should keep field_names() as is and fix __getitem__().
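A minimal sketch of the intended lookup behavior, using a plain dict in place of the real schema classes (hypothetical helper, not the actual schema.List code):

```python
# A Map(int32, float) flattened the way field_names() reports it:
# "lengths", "values:keys", "values:values".
fields = {
    "lengths": [2, 1],
    "values": {"keys": [1, 2, 3], "values": [0.1, 0.2, 0.3]},
}

def get_field(root, name):
    # Walk the colon-separated path instead of bypassing the
    # "values" prefix, so "values:keys" resolves correctly.
    node = root
    for part in name.split(":"):
        node = node[part]
    return node

keys = get_field(fields, "values:keys")
```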
Reviewed By: kittipatv
Differential Revision: D5251657
fbshipit-source-id: 1acfb8d6e53e286eb866cf5ddab01d2dce97e1d2
Summary:
- Incorporated dropout layer to the sparseNN training and testing pipeline
- Integrated an advanced model options feature on Flow UI for users to specify dropout rate
- Created an end-to-end unit test to build and run a model with dropout
Reviewed By: chocjy
Differential Revision: D5273478
fbshipit-source-id: f7ae7bf4de1172b6e320f5933eaaebca3fd8749e
Summary:
Truncate the id list using the max length computed in compute meta, so that it has a deterministic length,
which is useful for the position-weighted pooling method.
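A NumPy sketch of this truncation, assuming the usual (lengths, values) id-list representation (hypothetical helper name):

```python
import numpy as np

def truncate_id_list(lengths, values, max_length):
    # Keep at most max_length ids per example so every example has a
    # bounded, deterministic length (needed for per-position weights).
    new_lengths = np.minimum(lengths, max_length)
    out, offset = [], 0
    for n, keep in zip(lengths, new_lengths):
        out.extend(values[offset:offset + keep])
        offset += n
    return new_lengths, np.array(out)

lengths = np.array([3, 1, 4])
values = np.array([10, 11, 12, 20, 30, 31, 32, 33])
L, V = truncate_id_list(lengths, values, 2)
```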
Reviewed By: sunwael
Differential Revision: D5233739
fbshipit-source-id: f73deec1bb50144ba14c4f8cfa545e1ced5071ce
Summary:
The SparseToDense layer essentially calls the SparseToDenseMask op.
This makes it impossible to call the functional layer with the true SparseToDense op.
This diff renames the layer.
Please let me know if I missed anything or you have a better name suggestion.
Differential Revision: D5169353
fbshipit-source-id: 724d3c6dba81448a6db054f044176ffc7f708bdb
Summary:
If there are two SparseToDense layers densifying the same IdList feature,
we might export invalid input for prediction in the input specs. This diff
changes the behavior to use an Alias to a new blob instead of passing
things through directly.
Reviewed By: dzhulgakov
Differential Revision: D5093754
fbshipit-source-id: ef4fa4ac3722331d6e72716bd0c6363b3a629cf7
Summary: Currently, using two-tower models with cosine distance results in bad calibration. Adding a bias to the output of the cosine term solves the problem.
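A minimal sketch of the idea, with an illustrative scale and bias (not the actual layer code): cosine similarity is bounded in [-1, 1], so an affine transform before the final sigmoid gives the model room to calibrate.

```python
import numpy as np

def cosine_with_bias(u, v, w=1.0, b=0.0):
    # Scale and shift the bounded cosine score so downstream
    # calibration (e.g., a sigmoid) can match the label prior.
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return w * cos + b

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
score = cosine_with_bias(u, v, w=2.0, b=0.5)
```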
Reviewed By: xianjiec
Differential Revision: D5132606
fbshipit-source-id: eb4fa75acf908db89954eeee67627b4a00572f61
Summary:
In the Dper utility, add a function `load_parameters_from_model_init_options` to
allow initializing parameters from pretrained models.
Reviewed By: xianjiec
Differential Revision: D4926075
fbshipit-source-id: 5ab563140b5b072c9ed076bbba1aca43e71c6ac5
Summary:
Segment-based ops require increasing segment ids without gaps. Lengths-based ops do not
have this requirement.
Other pooling methods, e.g., LogExpMean, do not have Lengths-based ops available yet.
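A NumPy sketch contrasting the two op families (illustrative, not the actual Caffe2 ops):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Segment-based ops consume sorted, gap-free segment ids:
seg_ids = np.array([0, 0, 1, 1])
seg_sum = np.array(
    [data[seg_ids == i].sum() for i in range(seg_ids.max() + 1)])

# Lengths-based ops take per-segment lengths instead, so explicit
# ids never need to be materialized or kept gap-free:
lengths = np.array([2, 2])
splits = np.split(data, np.cumsum(lengths)[:-1])
len_sum = np.array([s.sum() for s in splits])
```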
Differential Revision: D5019165
fbshipit-source-id: ab01a220e10d4ed9fa2162939579d346607f905e