Summary: Allow drilling down on data throughput, overall and per field.
Reviewed By: dzhulgakov
Differential Revision: D4622168
fbshipit-source-id: 1462bb2fac05824fda0c02f4f5f0b8713893e650
Summary:
Use AddNet and AddBlobs to add nets and blobs to meta_net_def.
This is a codemod and does not change the functionality.
It is in preparation for the protobuf change.
Depends on: D4770648
Reviewed By: salexspb
Differential Revision: D4771110
fbshipit-source-id: 00cecb2105f2c332bd50c3c51b9a10e1004fa90f
Summary:
This was a nasty one to track down. This was the error message:
```
E0323 14:47:46.138900 2870 context_gpu.h:126] Encountered CUDA error: an illegal memory access was encountered
F0323 14:47:46.139143 2870 operator.h:176] Computation on device returned error in operator
input: "x_gpu_2" output: "loss" name: "" type: "AveragedLoss" device_option { device_type: 1 cuda_gpu_id: 1 }
```
Closes https://github.com/caffe2/caffe2/pull/220
Differential Revision: D4771086
Pulled By: Yangqing
fbshipit-source-id: f2d0f39f1647c84d97d9745f8a0305a389bfbc41
Summary:
Codemod to use a separate function, in preparation for a later protobuf change.
It does not change the functionality.
Reviewed By: salexspb
Differential Revision: D4770648
fbshipit-source-id: d8090f45d31ffa5ca1dca47297fb7c196f34d8a6
Summary: We accumulate values of this blob (param_grad) in another special internal blob anyway.
Differential Revision: D4768643
fbshipit-source-id: a9d08b7eafd25f278a8db722f9cdb1d0064b852a
Summary: Apart from copying gradient blobs for inputs with initial_cell_input, we needed to perform a similar operation for external parameters used by the step net.
Reviewed By: salexspb
Differential Revision: D4752259
fbshipit-source-id: 13ee48cf583ed86221a4cc1cc9f57f5c3a7d2450
Summary:
Currently the output schema and blobs are named "field_i", which is
bad for debugging. This diff allows us to specify output names.
Reviewed By: kennyhorror
Differential Revision: D4744949
fbshipit-source-id: 8ac4d3c75cacbb4c9b5f55793ac969fe1cf20467
Summary:
Add a ConvNd interface for N-d convolution and keep Conv for 2-d convolution.
I added _BaseConv to share code between ConvNd and Conv.
Reviewed By: Yangqing
Differential Revision: D4660822
fbshipit-source-id: 8339421351ce9a36ce5a165f7fa455cfcc61733d
Summary:
This completes the fix that viswanathgs started in an earlier diff but did not
cover the full Caffe convention. It should have proper guards for all the stuff
that Caffe implies, either supporting it or throwing an explicit exception.
Reviewed By: viswanathgs
Differential Revision: D4751751
fbshipit-source-id: 474e921c33840cff333a631b7b19f881b39ebccd
Summary: This didn't work for a reason specified in the comments. Also some cleanup in the unit tests; inference now uses a custom workspace to run the cell net.
Reviewed By: urikz
Differential Revision: D4742670
fbshipit-source-id: 04165c029fddec5ae31b20b207faf06d2fa20816
Summary:
aaronmarkham this solves your Windows build issue. Basically:
(1) VS 2017 does not have CUDA support yet, and we will be waiting on NVIDIA for that.
(2) VS 2015 and 2017 need different cmake generator strings.
This PR shows how to determine those and also updates appveyor to provide contbuild guards for the following 3 settings:
- VS2015 without cuda
- VS2017 without cuda
- VS2015 with cuda
Closes https://github.com/caffe2/caffe2/pull/210
Differential Revision: D4745007
Pulled By: Yangqing
fbshipit-source-id: 50952552843abd0eb6f4145d9f132daeee3a6794
Summary: Created `BatchDistillLRLoss` layer and added support for it in DPer2.
Differential Revision: D4718333
fbshipit-source-id: b873954ea704daafed94ac65fef47a20d56858e2
Summary: D4734505 part 2. Remove more instances of the batch_size parameter
Reviewed By: urikz
Differential Revision: D4736906
fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
Reviewed By: urikz
Differential Revision: D4734505
fbshipit-source-id: d9c23d85be84f61124106e752ef2b4f6945e2a07
Summary: We don't use this one any more, except in a few tests.
Reviewed By: urikz
Differential Revision: D4731401
fbshipit-source-id: c5c28b7594e3251f501fc28455dfc9bd2093a836
Summary: Adds synchronous optimization on GPUs to the translation training pipeline via data_parallel_model.Parallelize_GPU, which needs to be updated to provide some way of performing sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU.
Reviewed By: urikz
Differential Revision: D4631914
fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441
Summary: This adds a nearest neighbor interpolation resizing operator to caffe2. CPU only, NCHW only, no gradients. Also adds torch2caffe support. This is probably not optimal in terms of performance, but it works.
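As an illustrative sketch of what nearest neighbor interpolation means for NCHW tensors, the following uses plain numpy (this is a stand-in for the semantics, not the actual caffe2 operator implementation):

```python
import numpy as np

def resize_nearest_nchw(x, scale_h, scale_w):
    """Nearest-neighbor upsampling of an NCHW tensor (illustrative only)."""
    n, c, h, w = x.shape
    # Map each output row/column back to its nearest source index.
    rows = (np.arange(int(h * scale_h)) / scale_h).astype(int)
    cols = (np.arange(int(w * scale_w)) / scale_w).astype(int)
    return x[:, :, rows][:, :, :, cols]

x = np.arange(4, dtype=np.float32).reshape(1, 1, 2, 2)
y = resize_nearest_nchw(x, 2, 2)
print(y.shape)  # (1, 1, 4, 4)
```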
Reviewed By: ajtulloch
Differential Revision: D4724244
fbshipit-source-id: b8295061141fb513da84acf91fdfd67264119059
Summary:
1. migrate the basic mtml model to dper 2
2. test dper 2 mtml model
3. test all optimizers
Reviewed By: kittipatv
Differential Revision: D4680215
fbshipit-source-id: 7aac5c59bdac22fcad8ed869b98e9e62dca1d337
Summary: A layer that takes a (label, prediction) pair and outputs the L2 loss.
Reviewed By: kittipatv
Differential Revision: D4702111
fbshipit-source-id: 09f2ede44d1b548e61096de741f1b2aa0b66bbcb
Summary:
It was broken in trunk, and I fixed it locally but then had a
wrong merge in D4672026. This is just a revert of those changes.
Reviewed By: ajtulloch
Differential Revision: D4723138
fbshipit-source-id: 14757d9c8ae5135bd7c084003a64e25efc74b54f
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
Reviewed By: urikz
Differential Revision: D4702086
fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
Summary:
/cc akyrola
I basically just copied all the `ShapeCall` stuff as `TypeCall`. Is there a better way?
Closes https://github.com/caffe2/caffe2/pull/187
Differential Revision: D4699312
Pulled By: Yangqing
fbshipit-source-id: 92f736ffe4127b00b5821acb1eb359771975fdd7
Summary: For some embedding tasks, we don't want to include a bias term in the embedding computation.
Reviewed By: xianjiec
Differential Revision: D4689620
fbshipit-source-id: 4168584681d30c0eaa1d17ceaf68edda11924644
Summary:
Make it use Gloo and optionally use Redis for rendezvous (where a
shared filesystem is not available).
Differential Revision: D4709943
fbshipit-source-id: 59cc7a14316c7b634417ea5161a75fab3c19f2fa
Summary:
We have more and more nested Struct schemas, so there is an increasing need to get/add a field by nested name. For example, given the following nested Struct schema:
```
st = Struct(
    ('a', Scalar()),
    ('b', Struct(
        ('c', Scalar()),
    )),
)
```
we may want to get the field "b:c" and/or insert a new field "b:x". The immediate need is for dper2 metrics. This diff achieves that.
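A minimal dict-based sketch of the colon-separated nested lookup/insert described above (a plain-Python stand-in for the idea, not caffe2's schema.Struct API):

```python
# Hypothetical helpers illustrating "b:c"-style nested field addressing.
def get_nested(struct, name, sep=':'):
    """Walk a nested dict by a 'b:c'-style path and return the field."""
    field = struct
    for part in name.split(sep):
        field = field[part]
    return field

def insert_nested(struct, name, value, sep=':'):
    """Insert a new field at a 'b:x'-style path."""
    parts = name.split(sep)
    parent = get_nested(struct, sep.join(parts[:-1])) if len(parts) > 1 else struct
    parent[parts[-1]] = value

st = {'a': 'Scalar', 'b': {'c': 'Scalar'}}
print(get_nested(st, 'b:c'))    # look up an existing nested field
insert_nested(st, 'b:x', 'Scalar')
print(get_nested(st, 'b:x'))    # the newly inserted field
```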
Reviewed By: kittipatv
Differential Revision: D4690225
fbshipit-source-id: 71d4a74b36bd1228a2fefd901db2f200602152b7
Summary: For example, test and train nets could have shared workspaces, leading to a race condition. This adds an assertion and appends a running counter to the workspace-blob name.
Reviewed By: jhcross
Differential Revision: D4712152
fbshipit-source-id: 808d7069095bac24ebfe0c9d31ebd134f4cf0956
Summary:
No longer need GPU to CPU copies. The allreduce operator no longer
uses 'local allreduce - global allreduce - local broadcast' sequence
when Gloo is used, but passes all input blobs directly.
Depends on D4708860.
Differential Revision: D4709897
fbshipit-source-id: 4d745d5d8bac9c2fcca081dd5d812c902808c3b6
Summary:
This is going to allow experimenting with various training-from-scratch / fine-tuning techniques. The code for the new model is not intended to be used as is. Instead, one could train a full-precision model first, then add quantization for the last layer, then for the next one, and so on.
In my experiments I took a pretrained model and quantized all inception layers with 4 bits. This restored the original accuracy after several dozen iterations.
Also in this diff I added a common prefix to the model checkpoint and added this prefix to git/hg ignore, plus some extra logs that are useful for quickly seeing how things change right after enabling quantization.
Differential Revision: D4672026
fbshipit-source-id: b022c8ccf11dd8a2af1a7b2e92673483bc741a11
Summary:
These are all essentially no-op changes which allow for nose-style (or pytest-style) test discovery.
With this patch, you can use any of these methods to discover and run tests under `caffe2/python`:
```
python -m unittest discover -p '*test*.py' caffe2/python/
python -m nose caffe2/python/
python -m pytest caffe2/python/
```
Future work:
* Get all of the tests to pass
* Some seem to be testing operations which don't have GPU implementations
* I get a segfault unless I set `CUDA_VISIBLE_DEVICES=0`
* Some tests are flaky
* Allow test discovery throughout the whole project (e.g. the `experiments/` dir)
Closes https://github.com/caffe2/caffe2/pull/199
Reviewed By: pietern
Differential Revision: D4704504
Pulled By: Yangqing
fbshipit-source-id: 8f5687ec9c8aa873dfaff30dbf44272bc38a206b
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
- cell net/step net external inputs must be namespace scoped
- prevent double-namescoping of cellnet inputs
- make data parallel model understand recurrent nets so the device-mapping works
Reviewed By: salexspb
Differential Revision: D4708840
fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
Summary: Some operators, e.g., SoftmaxWithLoss, return a scalar-typed tensor. This allows us to use those ops without having to write a layer manually.
Reviewed By: xianjiec, kennyhorror
Differential Revision: D4703982
fbshipit-source-id: f33969971c57fc037c9b44adb37af1caba4084b6
Summary: When cloning a recurrent net op, we do a remapping of the lengths-blobs. But if they don't exist (as with CRF), we should not do that.
Differential Revision: D4702123
fbshipit-source-id: 37a22d11e709011b8b98b2cc3d9f08eb9fda06c4
Summary: These python helpers are going to provide sufficient book keeping when adding quantization for conv layers
Reviewed By: Yangqing
Differential Revision: D4671478
fbshipit-source-id: 292e2f633dd30969c0afbe7a8075b340ce9a6d12
Summary: UNK needs to be indexed in the vocabulary for validation to work. Default args now result in decreasing training loss.
Reviewed By: urikz
Differential Revision: D4703393
fbshipit-source-id: e4d6ad100daf8392f8ba1e502f9ecf39bb8ce24a
Summary:
It has been a pain to save predictor-compatible models from Caffe2. This diff adds a function, ExtractPredictorNet, that takes a training model and outputs a predictor model by removing all operators that are not relevant for prediction, such as the backward pass and dequeue ops for input loading (in the predictor, the input data is an external input).
We can also consider including this directly in the predictor exporter for FB usage.
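To illustrate the idea behind the extraction (this is a hypothetical sketch with a stand-in `Op` tuple, not caffe2's protobuf types or the actual ExtractPredictorNet signature): walk the training net's ops backward from the prediction outputs, keeping only ops whose outputs are needed, and stop at the declared input blobs so gradient and input-loading ops fall away.

```python
from collections import namedtuple

# Stand-in for a net operator: type name, input blob names, output blob names.
Op = namedtuple('Op', ['type', 'inputs', 'outputs'])

def extract_predictor_ops(ops, input_blobs, output_blobs):
    """Keep only ops needed to compute output_blobs from input_blobs."""
    needed = set(output_blobs)
    kept = []
    for op in reversed(ops):
        if any(o in needed for o in op.outputs):
            kept.append(op)
            # Inputs of a kept op become needed, unless fed externally.
            needed.update(b for b in op.inputs if b not in input_blobs)
    return list(reversed(kept))

train_ops = [
    Op('TensorProtosDBInput', [], ['data']),               # input loading
    Op('FC', ['data', 'w', 'b'], ['pred']),                # forward pass
    Op('FCGradient', ['data', 'w', 'pred_grad'], ['w_grad', 'b_grad']),
]
pred_ops = extract_predictor_ops(train_ops, input_blobs={'data'}, output_blobs={'pred'})
print([op.type for op in pred_ops])  # ['FC']
```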
Reviewed By: rpenggithub
Differential Revision: D4693264
fbshipit-source-id: e81abbbec0bd4d717159cf36488d0baaf0130090
Summary:
Implement ReduceBackSum & ReduceBackMean with gradients for CPU & GPU contexts.
The reduction happens along the last dimension; for example, if the input is an
M x N matrix, ReduceBackSum will produce a vector of dim M x 1 containing the
row-wise sums.
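The semantics can be sketched in numpy (for illustration only, not the caffe2 operator itself): reducing over the trailing dimension of an M x N input yields the M row-wise sums or means.

```python
import numpy as np

x = np.arange(6, dtype=np.float32).reshape(2, 3)  # M=2, N=3
row_sums = x.sum(axis=-1)    # ReduceBackSum analogue over the last dim
row_means = x.mean(axis=-1)  # ReduceBackMean analogue
print(row_sums)   # [ 3. 12.]
print(row_means)  # [1. 4.]
```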
Differential Revision: D4689768
fbshipit-source-id: 5b0482d4341867ecf23526dc6c4d544420e7d8f7
Summary: Add shape inference for Reshape. Because the shape of the reshaped tensor cannot be inferred when it depends on runtime tensor data, set `out[0].set_unknown_shape(true)` if no `shape` argument is given.
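A sketch of the inference rule described above (illustrative Python, not the actual caffe2 implementation): the output shape is only inferable when a static `shape` argument is present, with a -1 entry resolved from the input's total size; otherwise the shape is unknown.

```python
def infer_reshape_shape(in_shape, shape_arg=None):
    """Return the inferred output shape, or None when it is unknowable."""
    if shape_arg is None:
        return None  # target shape comes from a runtime tensor: unknown
    total = 1
    for d in in_shape:
        total *= d
    known = 1
    for d in shape_arg:
        if d != -1:
            known *= d
    # Resolve a single -1 entry from the remaining element count.
    return [total // known if d == -1 else d for d in shape_arg]

print(infer_reshape_shape([2, 3, 4], [6, -1]))  # [6, 4]
print(infer_reshape_shape([2, 3, 4]))           # None
```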
Differential Revision: D4671125
fbshipit-source-id: 685a9198f9b08e3336014c792f20051b381d8619
Summary: We should be using the vocabulary built on the training data, and corpus_eval as data for the evaluation phase.
Reviewed By: urikz
Differential Revision: D4700382
fbshipit-source-id: ca1dd043a28f9bb585faad050c82fb12c1cdf6cc