pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Huazhong Ning	f7ad13694c	support model init Summary: a parameter can be initialized multiple times in init_net if parameter sharing is enabled. With the original implementation, only the first parameter init will be replaced by pre-trained parameters and the next are still unchanged. This overwrites the initialization with pre-trained parameters. This diff fixes this issue and also support model init for ads-intent project Reviewed By: dragonxlwang Differential Revision: D5991291 fbshipit-source-id: 36173f6239c56bd0d604a77bd94e36072f32faa7	2017-10-19 15:56:37 -07:00
Bangsheng Tang	7b30436201	remove Alias in SparseFeatureHash Summary: remove Alias in SparseFeatureHash Reviewed By: kennyhorror Differential Revision: D6094663 fbshipit-source-id: f313aeb17bf6cfdacae62b2c1ad6b4175d0882dd	2017-10-19 13:24:20 -07:00
Dmytro Dzhulgakov	2972a6ca02	Revert D6026557: [caffe2][PR] Fix "No handlers could be found for logger" Summary: This reverts commit 95c634872ac02be721257169e38c8fead04cd66b bypass-lint Differential Revision: D6026557 fbshipit-source-id: 663c28583ce3b01070ff5449115ed7e222f71776	2017-10-12 20:21:52 -07:00
Artem Volkhin	5b10ad255b	Use EMBEDDING feature type instead of FLOAT_TENSOR Summary: create a special type for embeddings Differential Revision: D5997808 fbshipit-source-id: 9a5ad8ecc019d10536705d3b25f2436ca8a56454	2017-10-11 13:50:03 -07:00
Luke Yeager	75bece6ede	Fix "No handlers could be found for logger" Summary: Closes https://github.com/caffe2/caffe2/pull/1316 Differential Revision: D6026557 Pulled By: Yangqing fbshipit-source-id: 95c634872ac02be721257169e38c8fead04cd66b	2017-10-10 22:32:13 -07:00
Xianjie Chen	9455eda57b	cast distill loss teacher label to float Summary: it failed for the case when the `prod_prediction` is used as teacher label, which is double, instead of float. Reviewed By: kittipatv Differential Revision: D6018163 fbshipit-source-id: cd93fd46996e07c7f762eedbeb67331a4665d4c4	2017-10-10 01:16:07 -07:00
Kittipat Virochsiri	d5f60b240d	Fix distill loss Summary: The layer should also apply to evaluation as it's needed for feature importance run. Reviewed By: xianjiec Differential Revision: D6016125 fbshipit-source-id: e1db1a2eb3d45515e3cdc71b4badaaf738a4afd8	2017-10-09 18:17:31 -07:00
Artem Volkhin	fb8a7679cc	preprocs for embeddings Summary: embeddings Differential Revision: D5888420 fbshipit-source-id: b293df6444cba49e2feab6ccf8b8346019e5b421	2017-10-04 22:18:21 -07:00
Hassan Eslami	8e309c014c	Tagging sparse parameters Summary: This is the first step on DPER side to use net transformation step (`parallelize_net`). So far, it tags the sparse parameters (in init_net and train_net) once distributed trainer nets are built. Next step is to merge the part that creates distributed trainer nets (`create_distributed_trainer_nets`) into the part that creates single-trainer, multi-reader nets ('create_distributed_reader_nets`). This step should get rid of parts of `MixtureStrategyModelBuilder`. Reviewed By: azzolini Differential Revision: D5902733 fbshipit-source-id: 85fbddbb6c2704badd82b237f1dd2c7c5790e43a	2017-10-04 18:46:48 -07:00
Hassan Eslami	7fc7756487	Refactor param initialization from model manipulation to layers logic Summary: This diff refactors the parameter initialization logic from model manipulation to layers Reviewed By: azzolini Differential Revision: D5920225 fbshipit-source-id: 50d230e406bc9ce0b00bdd164802c504cf32ea46	2017-10-02 22:08:40 -07:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Xiaolong Wang	642dea487d	update inline comment Summary: as desc Reviewed By: kennyhorror Differential Revision: D5930526 fbshipit-source-id: 510388fd66b487410ff748a9e6f546a8ce27bc1d	2017-09-28 10:17:13 -07:00
Huazhong Ning	808c9e3e70	fix a small typo error in sparse_lookup Summary: as title Reviewed By: kittipatv Differential Revision: D5908455 fbshipit-source-id: e7c66e84a27273156d66dfd043e9cfd9b0ab9a98	2017-09-25 21:46:56 -07:00
Kittipat Virochsiri	5aac6a2e06	Make LastNWindowCollector thread-safe Summary: Make LastNWindowCollector optionally thread-safe. The main benefit is that the mutex can then be used to lock the buffer later, avoiding the need to copy the data. Reviewed By: chocjy Differential Revision: D5858335 fbshipit-source-id: 209b4374544661936af597f741726510355f7d8e	2017-09-22 09:48:30 -07:00
Anshul Verma	a340d141de	Check num_elements > num_samples in UniformSampling Summary: When num_elements is less than num_samples, a workflow should fail during net construction time. Currently, it fails at run time. Reviewed By: kittipatv Differential Revision: D5858085 fbshipit-source-id: e2ab3e59848bca58806eff00adefe7c30e9ad891	2017-09-21 16:37:20 -07:00
Kittipat Virochsiri	1b059f4c98	Add option to ignore parameter initialization Summary: When parameter sharing is used, the model may not own the parameters. Emptying out initializer ensures that the shared model doesn't overwrite initialization. Reviewed By: chocjy Differential Revision: D5870362 fbshipit-source-id: f8587b84c3a13f331a3251973e8206563939606a	2017-09-20 12:03:22 -07:00
Yan Shang	6a883d1bc0	Remove dot_product layer Summary: This dot_product layer was added before functional layer was added. Now we have functional layer, this dot_product layer is no longer needed. This diff removes dot_product layer. Reviewed By: kittipatv Differential Revision: D5783303 fbshipit-source-id: 5d13f729918148ee57836fb47c48e6f24773654b	2017-09-07 18:48:30 -07:00
Xianjie Chen	ec713d437d	make sure the output of sparse lookup layer is float Summary: currently, if reduer=Nonoe, the output if fp16 Differential Revision: D5773560 fbshipit-source-id: 24d7e5fae366d70352582e9a1ee14c7613753b7a	2017-09-07 17:47:39 -07:00
Dmitrii Podoprikhin	c7684e3b27	Rowwise quantization Reviewed By: kennyhorror Differential Revision: D5753626 fbshipit-source-id: 680c627a81658bcd653feab68e7040db0cb7a185	2017-09-06 10:19:38 -07:00
Long Jin	3faeb621d3	support id_score_list for Feed Reviewed By: xianjiec Differential Revision: D5624894 fbshipit-source-id: 1b2caba9ffcce68f346020485cb1f4edb01ca5e7	2017-08-24 00:32:05 -07:00
Jeonghee Yi	98da4e3a04	pairwise dot product with dot_groups support Summary: extending pairwise dot-product only between dot_groups Differential Revision: D5527060 fbshipit-source-id: be5d3178c332e122853a2f9d8da12a880608b0ab	2017-08-23 15:23:36 -07:00
Jeonghee Yi	d675c101e9	extend pairwise dot product for non-equal x & y dimension size Summary: extend pairwise dot product for different number of embeddings on x & y dimensions Differential Revision: D5663553 fbshipit-source-id: 1743a2c101cb8c0fc1f0f3d89c19530802400ec6	2017-08-23 02:08:20 -07:00
Badri Narayan Bhaskar	9507cae9e0	Create MergeIdListsLayer Summary: We create a layer for MergeIdListsOp Differential Revision: D5531348 fbshipit-source-id: a2e227e1abda05cefa893fd41a2c3ca997851e25	2017-08-22 17:00:55 -07:00
Kittipat Virochsiri	0e5fcc7ca2	Make Tags a decorator as well Summary: In case the whole function should be wrapped in certain context, this make it less ugly. Reviewed By: xianjiec Differential Revision: D5665253 fbshipit-source-id: ecdc6b1a08e91bae6a4352341f97ee37f3aa677a	2017-08-22 11:01:14 -07:00
Yan Shang	57c93435e3	Dedup name in functional layer Summary: Before this fix, a functional layer name can appear several time in a blob and causes confusion. This diff fix this issue. Reviewed By: kittipatv Differential Revision: D5641354 fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2	2017-08-17 17:50:34 -07:00
Kittipat Virochsiri	fa984af0f9	use create_param() in layers Summary: These layers were not codemoded Reviewed By: chocjy Differential Revision: D5645982 fbshipit-source-id: 4325f77a0f8152dfe6dfdeee59697b25ecb1de35	2017-08-17 13:47:57 -07:00
Kittipat Virochsiri	b91c2f5064	Make reservoir sampling thread safe Summary: Guarding reservoir sampling with mutex & fix the bug in counting number of new entries. Reviewed By: chocjy Differential Revision: D5503300 fbshipit-source-id: fd6b0bacb71fbab99d6d5df2c72da523fba02847	2017-08-10 15:27:21 -07:00
Kittipat Virochsiri	9c4872f4bc	Reservoir sampling with object ID deduplication Summary: Adding the option to dedup by object ID so that more frequent objects are not present more than once in the reservoir Reviewed By: chocjy Differential Revision: D5503109 fbshipit-source-id: e36c3ad8eea134d6c10a4c875fceadc0f843c976	2017-08-10 15:27:20 -07:00
Kittipat Virochsiri	f78af06f1b	Features collection with reservoir sampling Summary: Make the candidate pool less localized Reviewed By: chocjy Differential Revision: D5453289 fbshipit-source-id: 848cb7551d7112f6f47f2cf647bb0daca6eff341	2017-08-10 15:27:20 -07:00
Long Jin	ef64a4f6b2	Add conv layer and layer tests Reviewed By: xianjiec Differential Revision: D5569206 fbshipit-source-id: ed836315f3ee4d7983da94f2633a3085fe99194d	2017-08-08 10:57:43 -07:00
Jacqueline Xu	a1bf14d8e6	Building new randomized sparse nn model Summary: New hybrid randomized sparse nn, which allows layers of sparse NN model to be randomized, semi-random, or learnable Reviewed By: chocjy Differential Revision: D5416489 fbshipit-source-id: eb8640ddf463865097ba054b9f8d63da7403024d	2017-08-07 12:48:58 -07:00
Jiyan Yang	4b80ff89e2	Use softsign op for s=0 in arc-cosine feature map Summary: The current implementation for s=0 doesn't support backward pass. Switching to using pow op instead as a temporary solution. Reviewed By: jackielxu Differential Revision: D5551742 fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3	2017-08-03 23:35:11 -07:00
Wenlin Chen	adc5510ecb	dynamic embedding Summary: refactor get_categorical_limit Reviewed By: xianjiec Differential Revision: D5459389 fbshipit-source-id: 14a7e07394db52fb090c6923e341c34576fcb6d6	2017-08-03 00:33:18 -07:00
Jiyan Yang	a8695178aa	Adding parameter sharing API to Dper2 Summary: To achive this, I modified the blob name scheme defined in a layer. Before it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc within the same scope). Now I change it to scope/fc/w and scope/fc_auto_0/w. That is, we rely on the uniqueness of the scoped layer name to define names for blobs. I also overwrote the create_param method in LayerModelHelper to let it use the resolved name for blobs given the sharingparameter context. There are some details such as making the initializer more structured that I need to finalize. Reviewed By: kennyhorror Differential Revision: D5435132 fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455	2017-08-03 00:33:18 -07:00
Xiaolong Wang	82adbde878	pass layer_parameter shape to ps builder if cannot inferred from initializer Summary: Feed team uses distributed training and wants to also use transfer learning. Currently, transfer learning implements by overwriting the layer parameter initializer. Therefore, PS builder can't infer correctly the parameter shape. To fix this, add a field 'shape' in `layer_parameter` and set the shape if we overwrite its initializer. We also enforce the check of parameter shape between the original initializer and the loaded blob. (this adds extra cost) Differential Revision: D5520541 fbshipit-source-id: 80547dbd328b3f6cbfcea0b2daaf4004703dfe81	2017-07-31 16:04:23 -07:00
Jacqueline Xu	13569c9aa0	Fixing semi-random layer model for multi-layer models Summary: Updated the semi-random layer model for multi-layer models using semi-random layers. Notable changes: - Input and outputs for the semi-random layer is now a Struct with "full" and "random" components - Flag was added to choose to initialize output schema in Arc Cosine or not (if output schema initialization will happen in Semi Random layer) Reviewed By: chocjy Differential Revision: D5496034 fbshipit-source-id: 5245e287a5b1cbffd5e8d2e3da31477c65b41e04	2017-07-27 15:25:19 -07:00
Bangsheng Tang	d8443b8ffa	BatchGatherOp Summary: 1. added BatchGatherOp and BatchGatherGradientOp 2. unit tests Reviewed By: xianjiec Differential Revision: D5443965 fbshipit-source-id: bdcbb7f9f91c55484372a4bdb1727ae6d49e2018	2017-07-27 10:17:42 -07:00
Jacqueline Xu	9bec54bbf1	Modify arc cosine feature map and semi random layers to initialize parameters as global constants Summary: The original issue was that the initialized parameters for randomized layers (Arc Cosine and Semi-Random) were not fixed across distributed runs of the layers. Moreover, as the weights are initialized as (constant) parameters, when the layer is added to the preprocessing part, these weights won't be saved after training since they don't exist on the trainer. I fixed the issue here by building an option to add the randomized parameters to the model global constants so that the same parameter values can be accessed. Also, the parameters can be saved when the training is finished. In this diff, I've: - Updated randomized parameters to be added as a global constant across distributed runs of Arc Cosine Feature Map and Semi Random Feature layers - Updated unit tests - Ran an end-to-end test, enabling multiple readers to test the fixed issue Reviewed By: chocjy Differential Revision: D5483372 fbshipit-source-id: b4617f9ffc1c414d5a381dbded723a31a8be3ccd	2017-07-26 16:37:00 -07:00
Honghao Wei	290acab2c7	implement drelu and unittest Summary: In this revision, I mainly implemented the DRelu activation. See https://arxiv.org/pdf/1706.06978v1.pdf for details. To sum up, different from standard relu and purely, which divide the scope into two parts with boundary at zero, DRelu calculate another value p to divide the activation into two part. P is the softmax value of the output of Batch Normalization. For f(x)=x part in relu, you can find similar patten in f(x)=px, and for f(x)=0 part in rely, you can find similar pattern in f(x)=a(1-p)x, in which a is a parameter to tune. Drelu activation result is the sum of these two parts, f(x) = a(1-p)x + px. To implement DRelu, I take BatchNormalization as super class and then use the above formula for computation. In order to allow users to choose activation methods, which usually takes place when calling add_mlp function in processor_util.py, I pass the parameter transfer in model_option from UI to the details, just as what dropout do. Currently, I place it in extra_option, but can modify it if AML team needs to redesign the UI. I also add units test for DRelu. We check the shape of output and also do the numeric unit tests. For Unit test, I first check the numeric value of BatchNormalization, since there is no similar test before. I then compute the value of DRelu outputs and compare the results with current DRelu layer. Reviewed By: chocjy Differential Revision: D5341464 fbshipit-source-id: 896b4dcc49cfd5493d97a8b448401b19e9c80630	2017-07-20 11:50:08 -07:00
Tao Wu	4a81b0f24a	make SparseLookup support None pooling Summary: Adding pooling option as None, and SparseLookup will gather the embedding for each id. Reviewed By: kittipatv Differential Revision: D5421667 fbshipit-source-id: 1e8e2b550893ff3869dab12f8eb1fe24a063c3d5	2017-07-18 16:39:55 -07:00
Bangsheng Tang	e5a7891038	dot product using matmul Summary: 1. PairwiseDotProduct in layers 2. add_axis argument in Concat and Split(just for backward propagtion) Reviewed By: xianjiec Differential Revision: D5383208 fbshipit-source-id: 8e18ce371fff2da2da77b1a728142d69cd48e9c3	2017-07-17 23:20:37 -07:00
Tao Wu	427cc68ba2	added TensorInferenceFunction for ExpandDims operator; deleted Reshape layer. Summary: The diff added TensorInferenceFunction for ExpandDims operator, so that ExpandDims layer is no longer needed (it can be handled by functional layer) Reviewed By: kittipatv Differential Revision: D5430889 fbshipit-source-id: 4f895f2751663c45db4cc4f87e5114c63cda9fbb	2017-07-17 21:03:00 -07:00
Jacqueline Xu	2aa8fc7e8d	Implementing Semi-Random Features Layer Summary: - (Split diff from Arc Cosine) - Implemented [[ https://arxiv.org/pdf/1702.08882.pdf \| Semi-Random Features ]] Layer - Created a buck unit test for SRF Layer Reviewed By: chocjy Differential Revision: D5374803 fbshipit-source-id: 0293fd91ed5bc19614d418c2fce9c1cfdd1128ae	2017-07-14 13:15:50 -07:00
Honghao Wei	34f7acbedf	Report bugs in BatchNormalization, the dimension is wrong for second order Summary: The number input dimension for NHWC should be the last dimension C. Since batch size is omitted, it should be 2 instead of 3. Reviewed By: chocjy Differential Revision: D5418538 fbshipit-source-id: a6939a863817b7566198ea2a665a1d236a2cf63d	2017-07-13 18:31:18 -07:00
Jiyan Yang	043640c3eb	Return top K classes Reviewed By: kittipatv Differential Revision: D5363481 fbshipit-source-id: 27ce37878434917c1a7c5f325ed77c989a1448af	2017-07-13 00:20:00 -07:00
Jiyan Yang	d6f5452240	Allow to import subclasses of layers Summary: We want it to be able to register children of layers who are not direct children of ModelLayer. This requires us to find subclasses of ModelLayer recursively. Reviewed By: kittipatv, kennyhorror Differential Revision: D5397120 fbshipit-source-id: cb1e03d72e3bedb960b1b865877a76e413218a71	2017-07-12 20:19:47 -07:00
Tao Wu	02aa5ad9fb	make functional layer return scalar if only one output Summary: This diff makes functional layer return scalar if only one output. This diff also corrects all other corresponding implementations. Reviewed By: kittipatv Differential Revision: D5386853 fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42	2017-07-12 11:34:31 -07:00
Tao Wu	b9e64ecef1	allow param_info to set optimizer Summary: this diff adds optimizer into param_info, and the associated implementations for modelhelper and brew to set optimizer for each individual parameter. Reviewed By: kennyhorror Differential Revision: D5385432 fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92	2017-07-12 08:49:48 -07:00
Jacqueline Xu	e89e71c595	Simplifying Random Fourier Features and layer test Summary: - Condensed operators in RFF layer - Adjusted RFF layer test; made test code more concise Reviewed By: chocjy Differential Revision: D5391436 fbshipit-source-id: 08748861cd6fb4a9e4cc9c8762996371492020a1	2017-07-11 00:40:53 -07:00
Jacqueline Xu	6ea71155c1	Implementing Arc Cosine Layer Summary: - Implemented the [[ http://cseweb.ucsd.edu/~saul/papers/nips09_kernel.pdf \| Arc Cosine ]] layer - Developed buck unit test for Arc Cosine Reviewed By: chocjy Differential Revision: D5367604 fbshipit-source-id: ffd3ee081bc055b06c075c34aa6ce329b62ce2e0	2017-07-10 10:10:36 -07:00

1 2 3

135 commits