Commit graph

157 commits

Andrey Malevich
ec51f887bf Create only one instance of SigridTransform in DPerExample.
Summary:
The DPer example has been creating multiple copies of the transform config in the
net definition until now, which meant I hit the ProtoBuf limit (64MB) for
certain Task requests (especially visible because of the ValidationPipeline
I was adding).

After this diff we're going to store SigridTransforms in one instance per
machine for training (or 1 instance per reading).

The difference in plan sizes for a simple SparseNN model is ~30 MB (even accounting for the fact that the second model has a validation plan as well).

TODO: Do similar logic for NNPreProc as well (it's also pretty large).

Reviewed By: dzhulgakov

Differential Revision: D4441441

fbshipit-source-id: 4452dd86a4dc49b2c7f5b7642f443aed5720b047
2017-01-22 19:29:16 -08:00
Aapo Kyrola
06398e9bfb softmax-with-loss, handle gracefully cases when total weight is 0
Summary:
Spatial Softmax allows specifying locations that are not counted for the loss. If none of the locations are counted, this resulted in NaNs and headaches. This diff fixes that by handling these cases explicitly.

+ assertion for label blob dimension(0)

Created a new test as well.
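For illustration, a minimal NumPy sketch of the zero-weight guard described above (function and variable names are hypothetical, not the actual op code):

```python
import numpy as np

def weighted_softmax_loss(logits, labels, weights):
    # logits: (N, C) scores, labels: (N,) ints, weights: (N,) floats
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_loc = -log_probs[np.arange(len(labels)), labels] * weights
    total_weight = weights.sum()
    if total_weight == 0:
        # No location is counted: define the loss as 0 instead of 0/0 = NaN
        return 0.0
    return per_loc.sum() / total_weight
```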

Differential Revision: D4442939

fbshipit-source-id: 8641bfad2a994e517ca3eda39345380a6ca1ba50
2017-01-20 15:29:21 -08:00
Aapo Kyrola
e18643f90b More fixes
Summary:
When testing the code, a couple of issues arose:
 - the last layer needs a different name from the one in the preprocessed model, otherwise a shape assertion is triggered
 - preprocess_noaugmentation still needs to crop images larger than 227x227, otherwise things fail.

Reviewed By: viswanathgs

Differential Revision: D4442700

fbshipit-source-id: 05f54e7f17c266280f5ba5bb57af1721fe30df12
2017-01-20 13:44:24 -08:00
Kevin Matzen
6a7dd236fa instance norm
Summary: Added gradient and GPU implementation to caffe2 InstanceNorm op
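For reference, a hedged NumPy sketch of the instance-norm forward pass the op computes (the NCHW layout and eps value are assumptions):

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    # x: (N, C, H, W); gamma, beta: (C,) scale and shift.
    # Each (sample, channel) plane is normalized over its spatial dims.
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]
```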

Reviewed By: Yangqing

Differential Revision: D4304808

fbshipit-source-id: 6feecaed589ea9f825260a49b39b4260da6e5426
2017-01-20 12:29:28 -08:00
Alexander Sidorov
3f66f66da9 DebugMode helper for Caffe2
Summary:
It helps when developing scripts locally (outside of Flow). One doesn't have to rerun the script in order to catch an exception in the debugger or add a print statement. (Flow does this kind of thing automatically.)

Usage example:

```
if __name__ == '__main__':
  # workspace import added so the snippet is self-contained;
  # main is the user's own entry point
  from caffe2.python import workspace
  from caffe2.python.utils import DebugMode

  workspace.GlobalInit(['caffe2', '--caffe2_log_level=2'])
  DebugMode.enable()
  DebugMode.run(main)
```

Reviewed By: Yangqing

Differential Revision: D4424096

fbshipit-source-id: 73f418c80f581820e70139df7e166981e4d8c55f
2017-01-20 09:29:31 -08:00
Aapo Kyrola
afe822ebd7 Small tweaks
Summary:
Some tweaks, hopefully getting us to 0.98 MAP
- no cropping for test dataset (as per patrick)
- spatialBN momentum 0.1 (default is 0.9)

Also added some additional logging and reduced frequency of running of test net and logging.

Reviewed By: viswanathgs

Differential Revision: D4439790

fbshipit-source-id: 700705b811a5fc8c7139a265de96db646605ca5a
2017-01-19 18:44:26 -08:00
Ahmed Taei
411059d649 Generate huffman tree
Summary:
In this diff :
[1] Change the output from generating all root-to-label paths to a TreeProto.
The TreeProto itself is required by inference, and we can use hsm_util to get
the paths from it.

[2] Fix hsm_util index assignment.
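For context, a minimal sketch of Huffman-tree construction over label frequencies (plain Python with heapq; the real op emits a TreeProto rather than nested tuples):

```python
import heapq
import itertools

def build_huffman_tree(frequencies):
    # frequencies: {label: count}. Returns nested (left, right) tuples;
    # leaves are labels. Rare labels end up deeper in the tree.
    counter = itertools.count()  # tie-breaker so heapq never compares subtrees
    heap = [(freq, next(counter), label) for label, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(counter), (left, right)))
    return heap[0][2]

# Example: labels 0..3 with skewed counts
print(build_huffman_tree({0: 50, 1: 30, 2: 15, 3: 5}))
```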

Differential Revision: D4416731

fbshipit-source-id: 657d8b9b4df6fa30c9f92d391cf7e07b5c5db1f8
2017-01-19 16:14:23 -08:00
Ahmed Taei
dd51336611 Fix label start index for HuffmanTreeHierarchyOp
Summary: Change label indices to lie in the half-open range [0, num_classes).

Differential Revision: D4416685

fbshipit-source-id: b16ca8539fd538ad62bf1298dbad3f1553956241
2017-01-19 15:14:53 -08:00
Andrey Malevich
9f0a7935f6 Replace one more place from _net.external_input to _external_input_map
Summary: #accept2ship

Reviewed By: dzhulgakov

Differential Revision: D4435301

fbshipit-source-id: 6b62492c190325e82bc14d5397852106d07d5235
2017-01-19 12:29:30 -08:00
Yangqing Jia
91ebfa3c7c Unit test for big batch size avg pooling
Summary: Basically copied test_pooling and hard-coded the values.

Reviewed By: prigoyal

Differential Revision: D4428162

fbshipit-source-id: 6c0444ac8c21f08824df7ff53999a94967607dc4
2017-01-18 19:29:20 -08:00
Viswanath Sivakumar
be97f491e6 Unbreak caffe_translator for Conv op
Summary:
A minor bug in D4426513: the bias is always added as an input blob. Running it
on xray throws: "RuntimeError: [enforce fail at operator.cc:25] blob !=
nullptr. op Conv: Encountered a non-existing input blob:
caffe.SpatialConvolution_0_b"

Reviewed By: Yangqing

Differential Revision: D4429231

fbshipit-source-id: 0d3905ea6e87128ec1aa9d0f0a2f43126b1069b1
2017-01-18 14:00:04 -08:00
Viswanath Sivakumar
e67425647a Support bias for Scale layer in caffe_translate
Summary:
Turns out xray models have some standalone Scale layers (with bias) besides
the Conv-Scale pairs. We could still fuse them with previous layers with some
work, but for simplicity this emits an Add op after the Mul for the bias when
needed. We can revisit layer-fusion optimizations in the future once we have
something working for xray.

Reviewed By: Yangqing

Differential Revision: D4427266

fbshipit-source-id: ef7d8677ccd7d10dbd20759eeed378d9bc4522d1
2017-01-18 09:59:21 -08:00
Yangqing Jia
bfca2b86c3 Removed the old group convolution code
Summary: Now that we directly support group convolution, this is no longer needed. I also took the chance to add dilated convolution and an optional bias.

Reviewed By: prigoyal

Differential Revision: D4426513

fbshipit-source-id: eb2bb0aa619f8ff5f732512570f736bc59cd57dd
2017-01-18 00:44:31 -08:00
Andrew Tulloch
e23ddf06e9 UnsafeCoalesceOp for nn.Module.flattenParameters style coalescing
Summary:
This is a handy tool for amortizing expensive operations (e.g.
distributed communication, some heavier kernel launches) over a
lot of small blobs (e.g. all the biases in a network). We can just
coalesce these small blobs in-place into a single blob, act on them in
operators as if they were non-coalesced (passing them as inputs to
operators, etc.), and then, for the heavier operations, work on
the coalesced blob that contains each of these units.

I named it UnsafeCoalesce since it introduces blob aliasing, which
needs care in work like memory management, graph rewriting as in
memonger, etc.
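A rough NumPy analogy for the aliasing idea (illustrative only, not the op's implementation): pack many small arrays into one flat buffer and keep views into it, so a heavy operation can run once over the coalesced storage.

```python
import numpy as np

def coalesce(blobs):
    # Pack small arrays into one flat buffer; return (buffer, views).
    # Each view aliases the buffer, so a bulk update of the buffer is
    # visible through every view, and vice versa.
    total = sum(b.size for b in blobs)
    buf = np.empty(total, dtype=blobs[0].dtype)
    views, offset = [], 0
    for b in blobs:
        view = buf[offset:offset + b.size].reshape(b.shape)
        view[...] = b
        views.append(view)
        offset += b.size
    return buf, views

biases = [np.ones(3), np.ones(5), np.ones(2)]
buf, views = coalesce(biases)
buf *= 0.5       # one bulk op over the coalesced blob...
print(views[1])  # ...is reflected in every aliased view
```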

Reviewed By: Yangqing

Differential Revision: D3557149

fbshipit-source-id: 09cff4459b84270fe9e1da3b4a168fd66d01f795
2017-01-17 17:14:35 -08:00
Viswanath Sivakumar
d63f58013b Throw error in caffe_translator on Scale layer with bias
Summary: Failing fast instead of swallowing the bias term.

Differential Revision: D4419130

fbshipit-source-id: 98ce0af9a20adecfb027ffe8293ff69910873abc
2017-01-17 09:59:20 -08:00
Viswanath Sivakumar
7d6742f2f5 Tool to convert caffe models to c2 + fixes for xray v10
Summary:
Simple tool similar to caffe_translator_test.py for conversion from caffe to
caffe2. The differences are:

There are a couple of issues that need to be fixed as mentioned in
https://our.intern.facebook.com/intern/tasks?t=15424761, especially related to
the 'legacy_pad' field in conv op.

Differential Revision: D4407146

fbshipit-source-id: ec641f6d7e0cf6cdf2eca21f058b4451635d4a56
2017-01-17 08:59:58 -08:00
Aapo Kyrola
b96c2ed6ab fix validation to consider cpu-only ops
Summary: Data parallel model has a sanity check ensuring that operators' inputs/outputs do not cross device boundaries. The check failed when the operator was CPU-only (such as the new AccuracyOp version). This fixes that.

Reviewed By: prigoyal

Differential Revision: D4417841

fbshipit-source-id: 9bc4e7a2074a544ca4db69ecf24183bbd41f84ca
2017-01-13 18:59:32 -08:00
Yangqing Jia
8683737410 Caffe translator: match torch pooling
Summary: See code comments: legacy is a legend.

Reviewed By: viswanathgs

Differential Revision: D4414447

fbshipit-source-id: 7cd96778bbc00aff053100871f273b2e1b43c973
2017-01-13 10:59:20 -08:00
Ahmed Taei
9ad10959ee Enable large PlanDef protobuf message.
Summary:
Enable cases where the PlanDef message is bigger than the protobuf
string-decoding limit.

Differential Revision: D4412736

fbshipit-source-id: 91ee02d7a8ab85b1c8169683a6c1dccd4c79be40
2017-01-13 09:29:29 -08:00
Bram Wasti
0d5f3654b2 Adding back untracked files from manual github pull
Summary: Github import didn't work and the manual import lost some files.

Reviewed By: Yangqing

Differential Revision: D4408509

fbshipit-source-id: ec8edb8c02876410f0ef212bde6847a7ba327fe4
2017-01-12 08:59:19 -08:00
Yangqing Jia
1cd166d330 CMake completions work
Summary: Closes https://github.com/caffe2/caffe2/pull/88

Differential Revision: D4404292

Pulled By: bwasti

fbshipit-source-id: 8a4351c2dee5136aaa12b90f1a61fd7afee51994
2017-01-11 16:59:22 -08:00
Pooya Davoodi
92ebb58a06 Top-k accuracy operator on host
Summary:
Automatically copy from device -> host if necessary.

Thanks to pooyadavoodi for the host top-k code.
Closes https://github.com/caffe2/caffe2/pull/51
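For reference, a small NumPy sketch of host-side top-k accuracy (argpartition-based; names are illustrative, not the operator's code):

```python
import numpy as np

def topk_accuracy(scores, labels, k=5):
    # scores: (N, C) class scores; labels: (N,) int labels.
    # A sample counts as correct if its label is among its k highest scores.
    topk = np.argpartition(scores, -k, axis=1)[:, -k:]  # k largest, unordered
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

scores = np.random.randn(8, 10)
labels = np.random.randint(0, 10, size=8)
print(topk_accuracy(scores, labels, k=3))
```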

Reviewed By: Yangqing

Differential Revision: D4348953

Pulled By: bwasti

fbshipit-source-id: be650855cdd6c2c7bed838155f30e9fa92759dfe
2017-01-10 18:44:30 -08:00
Andrey Malevich
8047b8dc83 Fix random issues with some of the layers getting missing from registry.
Summary:
It looks like for types that are created directly through a type(...)
call, we don't store strong references anywhere. As a result, a GC pass
in Python may or may not clean up these classes, depending on the phase
of the moon and other random factors. This means that in some cases a
layer as simple as Relu might disappear.

cat_shame
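The failure mode, sketched in plain Python (assumed names, not the actual layer-registry code): classes created via type(...) stay alive only while someone holds a strong reference, so the registry should hold one itself.

```python
_REGISTRY = {}  # strong references keep type(...)-created classes alive

def register_layer(name, bases, namespace):
    # Create a class dynamically and pin it in the registry, so GC
    # cannot collect it while only weak references exist elsewhere.
    cls = type(name, bases, namespace)
    _REGISTRY[name] = cls
    return cls

Relu = register_layer('Relu', (object,), {'op_type': 'Relu'})
assert _REGISTRY['Relu'] is Relu
```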

Reviewed By: xianjiec

Differential Revision: D4396289

fbshipit-source-id: ba4e9b7ef54ee43349853b0acc3d3f40c74e4d73
2017-01-10 15:14:31 -08:00
Aapo Kyrola
bb928f3cc0 Latest fixes to Xray Flow workflows for Caffe2
Summary:
(Ignore the convolution-op related changes; they will be patched separately later.)

This diff includes work from the last few weeks:
- some refactoring of the flow ops
- no_bias setting
- MAP computation (instead of accuracy) for OC
- adaptive learning rate for Xray concepts
- various small bug fixes

Reviewed By: viswanathgs

Differential Revision: D4329500

fbshipit-source-id: 000d4fd22ec408af5290480c788eb86546bff52e
2017-01-10 12:59:23 -08:00
Aapo Kyrola
4f1db36cff add CUDA gradient for Div
Summary: DivOp was missing a gradient for CUDA, so this implements it. Also added an operator test.

Differential Revision: D4396638

fbshipit-source-id: 9949e47aa3735bb418a0db003e2b2f4896056a71
2017-01-09 21:59:23 -08:00
Aapo Kyrola
95b3309a87 Gradient Input memory sharing using memonger blob sharing
Summary:
This diff brings us roughly on par with Torch on ResNet memory usage. On batch size 32, ResNet-50 took 7497 MiB before and 5010 MiB after. This will thus allow us to handle 64 images / GPU, or 256 images / 4 GPUs.

In addition, I added a special argument to DagNet that causes it to run only one thread for the first iteration. This is needed since there are allocations on the first iteration's backward pass due to gradient sharing, which would cause NCCL to deadlock.

Sharing gradient buffers requires inferring which gradients can share memory (i.e. that they are not used concurrently). The previous memonger code uses a topological sort, but rbgirshick showed that it does not work with tree-like models. Thus, I wrote a new optimization algorithm based on DFS. It takes about 0.25 secs / GPU on ResNet-50, so it is clearly fast enough.

Module data_parallel_model supports this feature natively.
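As a simplified illustration of blob sharing (a greedy liveness sketch under assumed inputs, not the DFS algorithm this diff introduces): once a blob's last consumer has run, its buffer can be recycled for a later blob.

```python
def share_blobs(ops):
    # ops: list of (inputs, outputs) blob-name tuples in execution order.
    # Returns blob -> buffer id; buffer sizes are ignored for brevity.
    last_use = {}
    for t, (ins, outs) in enumerate(ops):
        for name in ins + outs:
            last_use[name] = t
    assignment, free, next_buf = {}, [], 0
    for t, (ins, outs) in enumerate(ops):
        for name in outs:                      # allocate output buffers
            if name not in assignment:
                if free:
                    assignment[name] = free.pop()
                else:
                    assignment[name] = next_buf
                    next_buf += 1
        for name in set(ins + outs):           # recycle dead blobs
            if last_use[name] == t and name in assignment:
                free.append(assignment[name])
    return assignment

ops = [((), ('a',)), (('a',), ('b',)), (('b',), ('c',))]
print(share_blobs(ops))  # 'c' reuses the buffer freed by 'a'
```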

Reviewed By: prigoyal

Differential Revision: D4363209

fbshipit-source-id: 73b11e7610438098bb11bff0af8075ab0cf2c0f1
2017-01-09 19:44:23 -08:00
Yangqing Jia
3732a0044c Move mpi_python.cc to the python folder to be more consistent about source file locations.
Summary: TSIA

Differential Revision: D4386553

fbshipit-source-id: 2c7196171be7d0af90b46b75f68c949ee3980c2e
2017-01-09 10:59:39 -08:00
Bram Wasti
737000b166 Linter fix up to sync fbsource and github 2017-01-06 15:36:17 -08:00
Bram Wasti
3833dad5f6 manual sync of old never sync'd files 2017-01-06 15:28:45 -08:00
Yangqing Jia
375c0816b3 goodbye old brewery 2017-01-04 20:58:35 -08:00
Yangqing Jia
5bfd6c4cd1 semicolon 2017-01-04 14:36:16 -08:00
Yangqing Jia
311ae2ba33 build file fix and avx2 on mac fix 2017-01-04 14:35:15 -08:00
Bram Wasti
2f3b5d7943 Moved binaries/python CMake files to reflect paradigm of the rest of the codebase 2017-01-04 14:02:52 -08:00
Bram Wasti
7ea9f9e0ee Updated naming convention of Caffe2_LINK* 2017-01-04 12:03:27 -08:00
Yangqing Jia
b1a31942fc Merge remote-tracking branch 'upstream/master' into cmake 2017-01-03 18:10:58 -08:00
Simon Layton
7c3f1521a7 Gpu transform
Summary:
Adds a thread pool for image decoding, plus optional GPU-based data conversion, mean subtraction, and std division.
Closes https://github.com/caffe2/caffe2/pull/56
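The conversion step amounts to per-channel normalization; a NumPy sketch of the math (the mean/std values here are illustrative):

```python
import numpy as np

def normalize_batch(images, mean, std):
    # images: (N, C, H, W) float array; mean, std: per-channel (C,).
    # Same math as the GPU mean-subtraction / std-division pass.
    return (images - mean[None, :, None, None]) / std[None, :, None, None]

batch = np.random.rand(4, 3, 224, 224).astype(np.float32)
out = normalize_batch(batch,
                      mean=np.array([0.485, 0.456, 0.406]),
                      std=np.array([0.229, 0.224, 0.225]))
```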

Reviewed By: Yangqing

Differential Revision: D4341326

Pulled By: bwasti

fbshipit-source-id: 6485616ea7d212c7701274a40fae912db30dff4a
2017-01-03 17:59:34 -08:00
Alisson Gusatti Azzolini
6618d7462d Improvements+fixes for NetBuilder
Summary: Title.

Reviewed By: dzhulgakov

Differential Revision: D4358227

fbshipit-source-id: 21afe5107bed27eec2027f16f2c77db62c70c6e8
2017-01-03 16:59:24 -08:00
bwasti
9ce23cbb71 Fix false positive for non-clang compilers. 2016-12-29 11:39:50 -08:00
Bram Wasti
b48f1ff810 OS X build 2016-12-29 12:25:53 -05:00
Xianjie Chen
4b3bd06a7f sparse nn converges better by dedupping sparse gradient by mean
Summary:
This normalizes the sparse gradient, so that the "effective learning rate" of each sparse parameter will NOT be affected by the number of examples in a batch that "use" this sparse parameter.

Experiments show it helps convergence (about 0.1% better train NE): https://fburl.com/1230747813683956. It's not conclusive yet, and we still need more experiments. But this diff adds it as an option and does not change the default behavior, so we can get this in first.
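The idea in NumPy form (a hedged sketch with illustrative names): duplicate row indices in a sparse gradient are averaged rather than summed, so rows touched by many examples in a batch don't receive proportionally larger updates.

```python
import numpy as np

def dedup_sparse_gradient_mean(indices, values):
    # indices: (K,) parameter-row ids, possibly repeated; values: (K, D)
    # per-example gradients. Returns unique ids and the MEAN gradient per
    # id (a plain sum would scale with the row's frequency in the batch).
    uniq, inverse, counts = np.unique(indices, return_inverse=True,
                                      return_counts=True)
    summed = np.zeros((len(uniq), values.shape[1]), dtype=values.dtype)
    np.add.at(summed, inverse, values)
    return uniq, summed / counts[:, None]

idx = np.array([3, 1, 3, 3])
grads = np.ones((4, 2))
print(dedup_sparse_gradient_mean(idx, grads))  # row 3 averages to 1, not 3
```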

Differential Revision: D4367283

fbshipit-source-id: 49ea80dfa9ea776ff4160e220cf6c86593521607
2016-12-27 22:59:29 -08:00
Jason Jeong
9e75aa4d35 specify path to write htrace logs
Summary: This diff adds a gflag for specifying the path for htrace span log files. This flag is used by the net types `HTraceDAGNet` and `HTraceAsyncDAGNet`.

Differential Revision: D4366849

fbshipit-source-id: 56038d3d64a3fd5ab363feda86a19a6f2496971c
2016-12-27 11:44:31 -08:00
Bram Wasti
826abe8438 Merge pull request #5 from caffe2/master
merge caffe2:master into bwasti:master
2016-12-26 14:19:30 -05:00
Ou Jin
a4f3721e15 weightedsum on ps
Summary:
Rewrite of D3993337 based on the new stack.
Compared to the old one, we need more readers to achieve the same speed. But so far the speed is the same, and the new bottleneck is the write bandwidth of the trainer. Model quality is the same as the baseline.

Reviewed By: azzolini

Differential Revision: D4310803

fbshipit-source-id: 6d04ae8040c1ee7caa9aea5287f054e73fbe325a
2016-12-22 19:14:38 -08:00
Ievgen Soboliev
a7f8fe0423 introduce request net into prediction schema
Summary: As title. We want to have a request_only net which runs on user_only sparse features. Submitting to get early feedback.

Reviewed By: dzhulgakov

Differential Revision: D4282783

fbshipit-source-id: 71241bf5444550075884c788c2da4783659bc1e0
2016-12-22 15:59:27 -08:00
Aapo Kyrola
e51e651255 Remove redundant and failing test of FeedBlob asserts
Summary: Recently a PR landed that removed the asserts for feeding float64 to FeedBlob on GPUs and turned them into a warning. Thus the test that exercised those assertions started to fail. Removing it.

Reviewed By: Yangqing

Differential Revision: D4363780

fbshipit-source-id: d9e222c309302243138d4ff3c223c711a4d2052d
2016-12-22 14:59:28 -08:00
Priya Goyal
3eb08feff5 Support no_bias in naive group conv implementation
Summary:
I was testing the perf difference between naive group conv and cudnn group conv. I am doing no_bias conv and added support for that in the naive implementation.
Although it's deprecated, I thought it would be nice to have working things in our code.

Differential Revision: D4363168

fbshipit-source-id: 29719013d79b449fd359884709c7a1195be51ae3
2016-12-22 14:14:26 -08:00
Bram Wasti
d4a783405f Merge branch 'master' of github.com:caffe2/caffe2 into cmake 2016-12-22 13:15:05 -08:00
Aapo Kyrola
db5cc8f278 revert exhaustive_search setting to False
Summary: As per discussion in D4355529

Reviewed By: prigoyal

Differential Revision: D4362162

fbshipit-source-id: 795fcf1507235a7dc3c7a10b0453037936d057aa
2016-12-22 12:44:42 -08:00
Maxime Boucher
e2181a32ca Normalize rank loss gradient to avoid convergence issues when the number of pairs is really large
Summary:
Essentially, when the number of pairs is around 1000, the few positive samples in the list get a massive boost from all the negative examples. This diff normalizes the gradient and the loss by the number of pairs.

This diff also adds protection against NaN and more logging to help debug.
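A toy NumPy illustration of the normalization (a logistic pairwise loss is assumed here; this is not the actual operator code): dividing by the pair count keeps gradient magnitudes stable as the list grows.

```python
import numpy as np

def pairwise_rank_loss_grad(scores, labels):
    # Logistic pairwise ranking loss over all (pos, neg) pairs, with
    # loss and gradient normalized by the number of pairs.
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]              # (P, N) margins
    n_pairs = max(diff.size, 1)
    loss = np.log1p(np.exp(-diff)).sum() / n_pairs
    dpair = -1.0 / (1.0 + np.exp(diff)) / n_pairs   # d(loss)/d(diff)
    grad = np.zeros_like(scores)
    grad[labels == 1] = dpair.sum(axis=1)
    grad[labels == 0] = -dpair.sum(axis=0)
    return loss, grad
```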

Reviewed By: kdub0

Differential Revision: D4359782

fbshipit-source-id: 7240344ddb1f2f670d1eec1b03e7f6e413f3dfcc
2016-12-21 17:29:24 -08:00
Yangqing Jia
2c6a579859 Make all convolution operators allow optional bias term
Summary:
It used to be that only the cudnn engine supported it; now it should be
fully supported by any conv engine.

To skip the bias, simply use a convolution op that has two inputs instead of
three. The gradient operator will automatically figure out that it should not
compute the bias gradient.
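In the Python API this is just a matter of how many inputs you pass; a minimal sketch (assumes a working Caffe2 build; shapes are illustrative):

```python
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob('X', np.random.randn(1, 3, 8, 8).astype(np.float32))
workspace.FeedBlob('W', np.random.randn(6, 3, 3, 3).astype(np.float32))
workspace.FeedBlob('b', np.zeros(6, dtype=np.float32))

# Three inputs: convolution with a bias term.
with_bias = core.CreateOperator('Conv', ['X', 'W', 'b'], ['Y'], kernel=3)
# Two inputs: bias is skipped, and its gradient won't be computed.
no_bias = core.CreateOperator('Conv', ['X', 'W'], ['Y_nb'], kernel=3)

workspace.RunOperatorOnce(with_bias)
workspace.RunOperatorOnce(no_bias)
```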

Reviewed By: prigoyal

Differential Revision: D4354183

fbshipit-source-id: cf71b6289a254d15a6a663a85df63fbbaec3702b
2016-12-21 15:14:24 -08:00