Summary: Introduced weights for labels in the multi-label setting. An extra weight blob is introduced and read in the operator when the label setting is weighted sparse.
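A minimal NumPy sketch of the idea, with illustrative names (the actual operator and blob names in this diff may differ): per-label weights scale each label's contribution to the loss.
```
import numpy as np

logits = np.array([[2.0, -1.0, 0.5]])   # scores for 3 labels
labels = np.array([[1.0, 0.0, 1.0]])    # multi-label targets
weights = np.array([[0.5, 1.0, 2.0]])   # the extra weight blob, one weight per label

probs = 1.0 / (1.0 + np.exp(-logits))   # element-wise sigmoid
xent = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
loss = (weights * xent).sum()           # weights scale each label's loss term
```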
Reviewed By: kevinwilfong
Differential Revision: D5812467
fbshipit-source-id: efb209092e1e9effc915b0a753fa0c67b47a4fb6
Summary:
Now that Buck supports a way to opt external C/C++ libs out of omnibus linking,
this diff removes the hack we previously relied on (and which got copy-pasted everywhere).
Reviewed By: pixelb
Differential Revision: D5832450
fbshipit-source-id: cc3d12488f8498be6fb12bce1fedb3ad1accb518
Summary: On CPU there is no need to replicate parameters, so we try using only one copy (on cpu_0) of the parameters. Made resnet50_trainer use the shared model in CPU mode.
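A hedged sketch of the intended usage, assuming the `Parallelize_CPU` entry point in `caffe2.python.data_parallel_model`; the `shared_model` keyword is an assumption based on this diff's description (a single parameter copy living on cpu_0), and the builder functions are stubs:
```
from caffe2.python import model_helper, data_parallel_model

model = model_helper.ModelHelper(name="cpu_trainer")

def add_inputs(model):
    pass  # hook up data readers here

def add_model(model, loss_scale):
    return []  # build the forward pass, return the losses

def add_param_updates(model):
    pass  # attach optimizers here

data_parallel_model.Parallelize_CPU(
    model,
    input_builder_fun=add_inputs,
    forward_pass_builder_fun=add_model,
    param_update_builder_fun=add_param_updates,
    devices=[0, 1, 2, 3],
    shared_model=True,  # assumed flag: keep a single parameter copy on cpu_0
)
```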
Reviewed By: wesolwsk
Differential Revision: D5812181
fbshipit-source-id: 93254733edbc4a62bd74a629a68f5fa23f7e96ea
Summary: This caused gradient generation problems. The output was made in-place in PR-1185, by mistake, I believe.
Differential Revision: D5844825
fbshipit-source-id: 4ad84d0fb468aafde9f78463b9acf89316e633ca
Summary: Ported existing ad-hoc test code to use Python unittests. Small tweak to caffe2.python.hypothesis_test_util.
Reviewed By: kmatzen
Differential Revision: D5837295
fbshipit-source-id: daa2360db3c18c7d4bda7785e7a0b9175f5858af
Summary:
This is useful for pure throughput tests where
we don't care about training a real model.
Reviewed By: akyrola
Differential Revision: D5834293
fbshipit-source-id: dab528c9269fb713e6f6b42457966219c06e0a35
Summary: When training on billions of examples, the Adagrad gradient square sum can become very large, creating the numerical issue of adding small numbers to big numbers. This diff allows decaying the Adagrad gradient square sum.
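A NumPy sketch of the update; the decay parameter's name and placement are assumptions, but with decay = 1.0 this reduces to standard Adagrad, and decay < 1.0 keeps the square sum from growing without bound:
```
import numpy as np

def adagrad_step(w, sq_sum, grad, lr=0.01, decay=0.99, eps=1e-8):
    sq_sum = decay * sq_sum + grad * grad        # decayed gradient square sum
    w = w - lr * grad / (np.sqrt(sq_sum) + eps)  # standard Adagrad step
    return w, sq_sum

w, sq = np.zeros(3), np.zeros(3)
w, sq = adagrad_step(w, sq, grad=np.array([0.1, -0.2, 0.3]))
```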
Reviewed By: queqichao
Differential Revision: D5825932
fbshipit-source-id: 570224483b77d42ae53410fa2f767af86de167eb
Summary: PR 1175 caused a build error because gemmBatched was defined only under a specific #ifdef. It has now been moved outside the #ifdef, and the build works.
Reviewed By: asaadaldien
Differential Revision: D5834868
fbshipit-source-id: 072a64c8f4b259ff7504104121766115b46b8aa0
Summary: Otherwise weights and biases are not created and test creation fails.
Reviewed By: gsethi523
Differential Revision: D5836438
fbshipit-source-id: 32a75313b6b9ebecbfaa43ebd39f19c8eaba8cd1
Summary: get and getheader are the same in Python 2
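This holds because Python 2's rfc822.Message (the base of the mimetools.Message used for httplib response headers) defines `get` as an alias of `getheader`, so the rename is behavior-preserving. A Python 2 demonstration:
```
import rfc822
from StringIO import StringIO

msg = rfc822.Message(StringIO("Content-Type: text/plain\r\n\r\n"))
assert msg.get("Content-Type") == msg.getheader("Content-Type")
```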
Reviewed By: akyrola
Differential Revision: D5836486
fbshipit-source-id: 3bacfccc872c44741d7f26c68ba967093fce45c2
Summary:
To speed up deprecating legacy_pad, we added an option
to remove legacy_pad in the caffe_translator.
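A hedged sketch of the translation call; `TranslateModel` lives in `caffe2.python.caffe_translator`, and the `remove_legacy_pad` option name is an assumption based on this diff:
```
from caffe.proto import caffe_pb2
from caffe2.python.caffe_translator import TranslateModel

caffe_net = caffe_pb2.NetParameter()          # parse the prototxt into this
pretrained_params = caffe_pb2.NetParameter()  # parse the .caffemodel into this
net, params = TranslateModel(
    caffe_net, pretrained_params, is_test=True,
    remove_legacy_pad=True,  # assumed option name from this diff
)
```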
Reviewed By: bddppq
Differential Revision: D5724079
fbshipit-source-id: 25465d26f35bd009aa71667c7c523047de42e802
Summary: Fix comment on core.Net.RunAllOnMKL (the comment was actually for core.Net.RunAllOnGPU)
Reviewed By: zem7
Differential Revision: D5734309
fbshipit-source-id: 2cc40a99a2c0083c73ec1e4c8279f55f296a003c
Summary:
Also add the ability to mark an argument as required.
Added a string constant `OpSchema::Arg_IsTest` for the `is_test` arg.
If users define the `is_test` argument with `ArgIsTest(...)`, it automatically becomes a required argument; meanwhile, users can still use `Arg("is_test", ...)` to define an optional `is_test` argument.
Reviewed By: akyrola
Differential Revision: D5812391
fbshipit-source-id: eaaba50d027813a8012389edc6c459de23c3c728
Summary: For data-parallel training we need the batch size to be a multiple of the number of replicas. With this diff we achieve that via Dataset(rec).trim(multiple_of=num_replicas), as sketched below.
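A hedged sketch mirroring the call from this summary; the exact `trim` signature (e.g. whether it also takes a net) is not shown here, and the schema is illustrative:
```
import numpy as np
from caffe2.python import schema
from caffe2.python.dataset import Dataset

num_replicas = 4
rec = schema.Struct(('dense', schema.Scalar(np.float32)))  # illustrative schema
ds = Dataset(rec)
ds.trim(multiple_of=num_replicas)  # drop trailing rows so rows % num_replicas == 0
```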
Reviewed By: dzhulgakov, harouwu
Differential Revision: D5753861
fbshipit-source-id: c5d728b925707dbd3d1f500a93e67e185c223569
Summary:
I would expect tests marked "expected failure" to mean that there is a known issue in the code which will be fixed later. Both of these tests simply verify proper error-checking; nothing needs fixing.
Before (looks like something is wrong):
```
======================================= 2 xfailed in 0.27 seconds =======================================
```
After:
```
======================================= 2 passed in 0.28 seconds ========================================
```
/cc akyrola gsethi523
Closes https://github.com/caffe2/caffe2/pull/1209
Differential Revision: D5825373
Pulled By: akyrola
fbshipit-source-id: 1b98f503e4e406f69567d02425532f43bd16a465
Summary: The default value for timeout in CreateOrCloneCommonWorld does not work properly: if the value of dpm._DEFAULT_TIMEOUT is changed, the default still stays at the old 30s, because Python evaluates default argument values once, at function definition time. Changed to use None as the default instead.
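The underlying gotcha is general Python behavior: a module-level "constant" captured as a default argument never sees later updates. A self-contained illustration (names are illustrative):
```
_DEFAULT_TIMEOUT = 30

def create_common_world_old(timeout=_DEFAULT_TIMEOUT):
    return timeout

def create_common_world_new(timeout=None):
    return _DEFAULT_TIMEOUT if timeout is None else timeout

_DEFAULT_TIMEOUT = 60
print(create_common_world_old())  # 30 -- stale value baked in at definition time
print(create_common_world_new())  # 60 -- reads the updated value at call time
```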
Reviewed By: pietern
Differential Revision: D5813228
fbshipit-source-id: f617ceec40a03893c27d3e13c426e1ca6b2114e2
Summary:
Computes a fixed grid of RMAC region coordinates for a given 4D feature tensor
(NCHW) as described in https://arxiv.org/abs/1511.05879. The output is the
`roi` format expected by RoIPoolOp. To compute the actual RMAC itself, the
output of this op should be passed to RoIPoolOp.
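A hedged sketch of the chaining; the region-generator op name (`RMACRegions`) and its arguments are assumptions, while `RoIPool` is the existing Caffe2 op:
```
from caffe2.python import core

net = core.Net("rmac")
# Assumed op name/args for this diff's fixed-grid region generator:
rois = net.RMACRegions("features", "rmac_rois", scales=3)
pooled = net.RoIPool(["features", rois], "rmac_pooled",
                     is_test=1, pooled_h=1, pooled_w=1, spatial_scale=1.0)
```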
Reviewed By: wickedfoo
Differential Revision: D5594994
fbshipit-source-id: 5edac98a18137b53555f9a16354419b424679c99
Summary: Adds an explicit function to sync blobs. Note that it must be called before CreateNet(), and that it syncs the blobs on every run.
Reviewed By: asaadaldien, jay-mahadeokar
Differential Revision: D5805891
fbshipit-source-id: 58a1bb47805d75d5cbead136e2e0e9fe663ea954
Summary: This will allow the same decoder to handle different go tokens.
Differential Revision: D5801811
fbshipit-source-id: ddd309963c97e32c728b15d2ccd4ba0c4ad5ebbe
Summary:
If the Gloo InfiniBand transport is used, the Gloo algorithms can use
GPUDirect to DMA directly from/to GPU memory. This is done through the
CudaDeviceWorkspace. This change adds a "gpu_direct" option to the
Allreduce operator that makes it use GPUDirect if the transport
supports it.
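A hedged sketch of enabling the new option; the `gpu_direct` argument name comes from this diff, and the blob names are illustrative:
```
from caffe2.python import core

allreduce = core.CreateOperator(
    "Allreduce",
    ["comm_world", "gpu_blob"],
    ["gpu_blob"],
    gpu_direct=1,  # DMA directly from/to GPU memory when the transport supports it
)
```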
Closes https://github.com/caffe2/caffe2/pull/1203
Reviewed By: wesolwsk
Differential Revision: D5806366
Pulled By: pietern
fbshipit-source-id: 9e9a78f059f2b5c6e4fbf6574b7db4776a94696c
Summary: The RNN executor previously relied on getting the mapping from x to x_prev (and gradients) from recurrent.py, but we can just infer them from the links. This makes all models compatible with the RNN executor, given the enable_rnn_executor=1 argument.
Reviewed By: jamesr66a
Differential Revision: D5801436
fbshipit-source-id: 14d0e26dfbad6347f645d907da493187c98e9b17
Summary:
Before this change there were two ways for machines to rendezvous for a
distributed run: shared file system or Redis. If you're using an MPI
cluster it is much more convenient to simply execute mpirun and expect
the "right thing (tm)" to happen. This change adds the "mpi_rendezvous"
option to the CreateCommonWorld operator. If this is set, the common
world size and rank will be pulled from the MPI context and Gloo
rendezvous takes place using MPI. Note that this does NOT mean the MPI
BTL is used; MPI is only used for rendezvous.
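A hedged sketch; the `mpi_rendezvous` argument name comes from this diff, and since the common world size and rank are pulled from the MPI context they are omitted here:
```
from caffe2.python import core

cw = core.CreateOperator(
    "CreateCommonWorld",
    [],               # no kv-store handler needed when MPI drives rendezvous
    ["comm_world"],
    mpi_rendezvous=True,  # option added by this diff
)
# Launched under MPI, e.g.: mpirun -np 8 python trainer.py
```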
Closes https://github.com/caffe2/caffe2/pull/1190
Reviewed By: akyrola
Differential Revision: D5796060
Pulled By: pietern
fbshipit-source-id: f8276908d3f3afef2ac88594ad377e38c17d0226
Summary: As title. Made the configurations op-specific since many models run multiple RNNs.
Reviewed By: jamesr66a
Differential Revision: D5796208
fbshipit-source-id: 88173879dfff9f3f7bf583ccc4f4c6385cca5aca
Summary: Allow context to be passed into piper function
Reviewed By: volkhin
Differential Revision: D5684716
fbshipit-source-id: 693f0464fe28f8692d75901705a85a0a413a7bed
Summary:
`ModifierContext` is the base class for `OptimizerContext` and `RegularizationContext`.
`UseModifierBase` is the base class for `UseRegularizer` and `UseOptimizer`.
Most of the code in `OptimizerContext`, `RegularizationContext`, and other potential context classes in the future could be shared, so we implemented a new base class, `ModifierContext`, to support this. The same holds for `UseRegularizer` and `UseOptimizer`, for which we implemented a new base class called `UseModifierBase`.
This way, users only need to provide the API for the **get** and **has** operations, and to specify the **context class**.
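A minimal sketch of the pattern (class and method names are illustrative, not the exact Caffe2 ones): the base classes factor out the get/has bookkeeping, and subclasses only name their context class.
```
class ModifierContext(object):
    def __init__(self):
        self._modifiers = {}

    def _has(self, name):
        return name in self._modifiers

    def _get(self, name, default=None):
        return self._modifiers.get(name, default)


class OptimizerContext(ModifierContext):
    def has_optimizer(self, name):
        return self._has(name)

    def get_optimizer(self, name):
        return self._get(name)


class UseModifierBase(object):
    def _context_class(self):
        raise NotImplementedError  # subclasses say which context they use


class UseOptimizer(UseModifierBase):
    def _context_class(self):
        return OptimizerContext
```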
**Note**
Mirrored code in fbandroid and fbobj will be added at final check-in.
Reviewed By: kittipatv, xianjiec
Differential Revision: D5724613
fbshipit-source-id: de19bb822dcd41ec5c459d65065603a0abe2fd20
Summary:
Regularization added for caffe2 and dper.
This regularization is intended for dense features only; sparse features are handled by their individual optimizers, see D5618405 and D5534579 for details.
The implementation of dense regularization is similar to the one in the optimizer. We now support `l1 norm` and `l2 norm` in the regularizer. In dper, we call different regularizations based on the regularization type defined in model_definition.thrift.
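A NumPy sketch of the two supported penalties on a dense parameter w; in the real code the regularizer emits Caffe2 ops rather than NumPy, and the 0.5 factor on L2 is a common convention, not necessarily this diff's:
```
import numpy as np

def l1_penalty(w, lam):
    return lam * np.abs(w).sum()      # encourages sparsity

def l2_penalty(w, lam):
    return 0.5 * lam * (w * w).sum()  # shrinks weights smoothly

w = np.array([0.5, -1.0, 2.0])
task_loss = 1.234  # placeholder for the model's data loss
loss = task_loss + l2_penalty(w, lam=0.01)
```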
Reviewed By: xianjiec
Differential Revision: D5724851
fbshipit-source-id: 0fbee698cfeff1ac477fc9d07785406069f8d9c8
Summary:
These arguments control which Gloo transport (TCP or IB) and which
network interface are used for the common world. If not specified, it
defaults to using TCP and the network interface for the IP that the
machine's hostname resolves to.
The valid values for the transport argument are "tcp" and "ibverbs".
For ibverbs to work, Gloo must have been compiled with ibverbs
support. If Gloo is built as part of Caffe2 (sourced from the
third_party directory), then you can pass -DUSE_IBVERBS=ON to CMake to
enable ibverbs support in Gloo.
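A hedged sketch of selecting the transport and NIC explicitly; the `transport` values come from this description, while the `interface` argument name and "ib0" are assumptions:
```
from caffe2.python import core

cw = core.CreateOperator(
    "CreateCommonWorld",
    ["rendezvous_kv_handler"],
    ["comm_world"],
    size=8,
    rank=0,
    transport="ibverbs",  # or "tcp" (the default)
    interface="ib0",      # assumed argument name for the network interface
)
```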
Closes https://github.com/caffe2/caffe2/pull/1177
Reviewed By: akyrola
Differential Revision: D5789729
Pulled By: pietern
fbshipit-source-id: 0dea1a115c729e54c5c1f9fdd5fb29c14a834a82
Summary:
The predictor export functions allowed specifying a net type, but no way to specify num_workers when using net type 'dag'. This adds that option to the PredictorExportMeta named tuple and populates the field in the exported protobuf. Also added parameters to call sites in the NMT ensemble model class and the model repackager to populate net_type and num_workers.
Using DAGNet for our base predictor net (not recurrent stepnets) speeds up our inference by 1.15x, since we can now run encoder forward and backward RecurrentNet's for each model in the ensemble in parallel.
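A hedged sketch of populating the new fields; blob names are illustrative and the remaining PredictorExportMeta fields are elided:
```
from caffe2.python import core
from caffe2.python.predictor.predictor_exporter import PredictorExportMeta

predict_net = core.Net("predictor")  # built elsewhere in a real exporter
meta = PredictorExportMeta(
    predict_net=predict_net,
    parameters=["w", "b"],
    inputs=["data"],
    outputs=["softmax"],
    net_type="dag",       # new: run the predictor with DAGNet
    num_workers=4,        # new: worker threads for the dag executor
)
```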
Reviewed By: salexspb
Differential Revision: D5792203
fbshipit-source-id: cb9a8237a0cbe1a09645d4de051dfbb23f06dcfa
Summary: The dot_product layer was added before the functional layer existed. Now that we have the functional layer, the dot_product layer is no longer needed; this diff removes it.
Reviewed By: kittipatv
Differential Revision: D5783303
fbshipit-source-id: 5d13f729918148ee57836fb47c48e6f24773654b
Summary: The shape inference of distance_op has issues (it only works when the inputs are 1D tensors). This diff fixes the shape inference and the unit test.
Reviewed By: kittipatv
Differential Revision: D5788744
fbshipit-source-id: cb1b7facf7b9ccd64b54edca156325eceef50f33