Commit graph

382 commits

Author SHA1 Message Date
Simon Layton
a2b31cf9e2 Install fixes
Fix paths, __init__.py initialization
Other assorted fixes
2016-12-21 09:14:04 -05:00
Simon Layton
99e97a4b7a Correction to paths to find cuDNN 2016-12-16 16:03:23 -05:00
Simon Layton
dac78727fb Add missing file 2016-12-16 08:00:47 -05:00
Bram Wasti
0154db83c0 Merge pull request #54 from slayton58/cmake
Initial CMake building with deps
2016-12-15 10:46:19 -08:00
Simon Layton
03c9d54fd0 Support openCV 2 2016-12-14 14:59:59 -05:00
Simon Layton
a46f0fb3cb Merge branch 'cmake' of https://github.com/slayton58/caffe2 into cmake 2016-12-14 11:00:17 -05:00
Simon Layton
788f715a6e third_party protobuf support
Fix python lib missed proto dep
2016-12-14 10:54:15 -05:00
Simon Layton
df12f431e0 Removing extraneous cmake files
Leftover from Caffe cmake build system
2016-12-13 09:29:01 -05:00
Simon Layton
d7eeebc269 Refactored CUDA detection a bit
Refactoring, minor fixes
2016-12-13 09:29:01 -05:00
Simon Layton
d74bd7ee55 Add CUDA NVRTC cases 2016-12-13 09:29:01 -05:00
Simon Layton
fbbb87cd46 Enhancements
Add BLAS chooser
Move cuDNN detection from Cuda -> FindCuDNN
Refactor main C2 libs, should enable no-GPU build (untested)
2016-12-13 09:29:01 -05:00
Simon Layton
5e699ce6c2 CUDA fixes
Fix NCCL build
move CUDA dep into Dependencies file
2016-12-13 09:29:01 -05:00
Simon Layton
b9599c7464 Compiling entire project
Can run CIFAR10 Python example!
2016-12-13 09:29:01 -05:00
Simon Layton
e9f1222408 Compiling most of the project
Now compiles all CPU + GPU code, tests + binaries with deps
2016-12-13 09:29:01 -05:00
Simon Layton
c05ff206b6 Build binaries 2016-12-13 09:29:01 -05:00
Simon Layton
2610d62813 Build Python libs 2016-12-13 09:29:01 -05:00
Simon Layton
52f09fe2c9 Initial building with deps 2016-12-13 09:29:01 -05:00
Bram Wasti
e9de70f296 Added basic build system 2016-12-13 09:29:01 -05:00
Simon Layton
122e115937 Removing extraneous cmake files
Leftover from Caffe cmake build system
2016-12-12 12:50:08 -05:00
Simon Layton
681267b66a Refactored CUDA detection a bit
Refactoring, minor fixes
2016-12-12 12:29:00 -05:00
Simon Layton
9f35f47411 Add CUDA NVRTC cases 2016-12-09 11:01:27 -05:00
Simon Layton
09de969e9f Enhancements
Add BLAS chooser
Move cuDNN detection from Cuda -> FindCuDNN
Refactor main C2 libs, should enable no-GPU build (untested)
2016-12-09 10:29:06 -05:00
Simon Layton
cdb2fb6737 CUDA fixes
Fix NCCL build
move CUDA dep into Dependencies file
2016-12-09 09:02:26 -05:00
Simon Layton
f79bffc78d Compiling entire project
Can run CIFAR10 Python example!
2016-12-08 13:23:04 -05:00
Simon Layton
4255ee9944 Compiling most of the project
Now compiles all CPU + GPU code, tests + binaries with deps
2016-12-08 08:40:29 -05:00
Simon Layton
497659ce0d Build binaries 2016-12-07 10:54:06 -05:00
Simon Layton
f3c20620ed Build Python libs 2016-12-06 13:06:16 -05:00
Simon Layton
3d719f4bff Initial building with deps 2016-12-06 11:39:15 -05:00
Xianjie Chen
dea27ca4ca use TIndex for set in math.h
Summary: as desc

Differential Revision: D4271900

fbshipit-source-id: 92f7cbbe33e0ce4fcc21a8af9ded4f436afb43e2
2016-12-05 11:53:27 -08:00
Alisson Gusatti Azzolini
5f7d1f02f2 Use native reader for evaluation
Summary:
Since hashing is different.

This should be ready to commit now. Running ads nn canaries.

Differential Revision: D4264009

fbshipit-source-id: 3aa16b0c47c61f9a442b0375524c5f1580af5892
2016-12-05 11:53:27 -08:00
Byung-Gon Chun
1aba4280d8 Make xray net_type configurable
Summary: Make xray net_type configurable via a command line argument

Differential Revision: D4262076

fbshipit-source-id: e2ecb9cd5bee5d6aaebe0ea8d2d4d9b378058cba
2016-12-05 11:53:27 -08:00
Pieter Noordhuis
6c13dc3dd0 Fix CreateCommonWorld schema
Summary: TSIA

Reviewed By: dzhulgakov

Differential Revision: D4264328

fbshipit-source-id: 59eaf791a05b0202000f3b7266aba63e146229d4
2016-12-05 11:53:27 -08:00
Yangqing Jia
ab3fea540d Add serialization interface for MKLMemory
Summary: This allows us to serialize things between MKLMemory and a TensorProto.

Reviewed By: dzhulgakov

Differential Revision: D4218044

fbshipit-source-id: 934181493b482cb259c17ff4b17008eac52fd885
2016-12-05 11:53:27 -08:00
Aapo Kyrola
e65eeff665 LMDB example
Summary:
This example writes an LMDB database of image data and (random) labels. It then reads them back using Caffe2's TensorProtosDBInput and validates that the checksums match. This example shows how to coerce image data into TensorProtos and be happy.

Previously, there was no clear example of how to create databases for Caffe2.

Differential Revision: D4263614

fbshipit-source-id: 21e08066899095b4efcc2d23dbc3ede81e75914a
2016-12-05 11:53:26 -08:00
Aapo Kyrola
96a5e88d63 Fix consecutive checkpoint syncs
Summary: Switching to Pieter-MPI changed the way we set up the network between operators. For synchronizing parameters after a checkpoint load, we run a checkpoint_net that contains operators for creating the common world and broadcast operators. Unfortunately, this fails when the checkpoint sync is done a second time, because we would have created a duplicate common world. The solution is to split the common world op and the broadcast ops into an init net and the actual broadcasting net, and to run the init net only once. This problem did not arise in the Flow version, since I did only one checkpoint load per operator (process).

Differential Revision: D4251754

fbshipit-source-id: ba030579e651e529e29bbf2d27920075078d8ff9
2016-12-05 11:53:26 -08:00
Dmytro Dzhulgakov
3125e6a821 Hacky fix for cloned model rewriting
Summary:
Disclaimer: this is really hacky

Continues a fix from D4218902. The root problem is that DPER builds the net incrementally and input_record doesn't support this properly. For now I just manipulate the input record directly. Alisson wants to fix it properly later by allowing set_input_record to accept a superset of the current record.

But it should unblock our experimentation.

I'm curious how it's going to look in dper_example world.

Reviewed By: azzolini

Differential Revision: D4255285

fbshipit-source-id: ff65b6f943d705a9b3399035597e2e8ded2e1ff3
2016-12-05 11:53:26 -08:00
Martin Raison
ea9a0f24bf automatic aggregation of sparse gradients
Summary:
This adds support for automatic aggregation of sparse gradients. We simply concatenate indices and values (no attempt to deduplicate, since this is already done before feeding into the optimizer). This should support various cases (indices and/or values can be generated by one or more gradient ops, or gradient outputs can be directly passed from inputs).

I tried to minimize the code footprint, but I introduced SparseGradGenMeta because GradGenMeta didn't lend itself well to use with sparse gradients.

Reviewed By: dzhulgakov

Differential Revision: D4219788

fbshipit-source-id: 1d074664cffd82a8764e4b1473ada6bc46e6c51a
2016-12-05 11:53:26 -08:00
Xianjie Chen
2045a5de9f add position based weighting
Summary: adding more methods to the layer representation. The corresponding implementation in DPER is: https://fburl.com/563869364

Differential Revision: D4256583

fbshipit-source-id: 91326b7bb9e960a5bc70b5a13812fce90054eceb
2016-12-05 11:53:26 -08:00
Aapo Kyrola
3410939459 pass learning rate scaling factor to parameter update builder function
Summary:
When refactoring the data parallel model, the division of the LR by the number of devices was dropped, so we effectively ended up multiplying gradients by the number of devices. We therefore need to scale the LR by 1/numgpus.

Created a test to confirm that data_parallel_model produces exactly the same results on different numbers of GPUs for a fixed total batch size.

Reviewed By: prigoyal

Differential Revision: D4248907

fbshipit-source-id: af21ede113e6ac25f12c556de298cb18974548be
2016-12-05 11:53:26 -08:00
Pieter Noordhuis
a3942b2d64 Add store ops and tests
Summary: Basic ops to set/get/check/wait against a StoreHandler.

Differential Revision: D4248059

fbshipit-source-id: cc53061fcc13823d4b9eed6b7c1c346b9e8ec991
2016-12-05 11:53:26 -08:00
Pieter Noordhuis
f3403a1110 Add RedisStoreHandler
Summary:
Add store handler implementation backed by a Redis server.

This allows for easy rendezvous when participating machines have no
access to a shared filesystem.

Differential Revision: D4241715

fbshipit-source-id: 4ce881df3a96af24f7efbb02d1050b3b2b9bc3c0
2016-12-05 11:53:26 -08:00
Dmytro Dzhulgakov
119b687994 Allow PythonOp to access the workspace
Summary:
DPER has very strange Python ops that play with the Workspace. They are somewhat similar to LoadOp/SaveOp, so I guess the semantics are fine.

Thus it makes sense to allow Python operators to receive a workspace pointer, similarly to regular Operators.

I didn't find a better way to implement an optional argument than checking the number of arguments the function receives on the Python side.

Reviewed By: ajtulloch

Differential Revision: D4242943

fbshipit-source-id: d97d4227815b741c8f884cfe254b06d2b56b5a41
2016-12-05 11:53:26 -08:00
Andrey Malevich
2390dfefdb Kill few more CHECKs.
Summary:
One more small batch of CHECKs that were left in the C2 codebase. Most of the leftovers
should be in tests/GPU-only code.

Reviewed By: Yangqing

Differential Revision: D4243782

fbshipit-source-id: a4a03c116ea8ba16facd2efc135746d5921f19d5
2016-12-05 11:53:25 -08:00
Jason Jeong
af2a3076a2 add header for AsyncDAGNet
Summary: This diff adds a header file for net_gpu.cc so that the AsyncDAGNet class can be used to create other derived classes.

Reviewed By: ajtulloch

Differential Revision: D4230046

fbshipit-source-id: 379c3ff7ebb7aeeb4294f39e6f5d1ecad48b92f0
2016-12-05 11:53:25 -08:00
Bram Wasti
8f398d795e Added basic build system 2016-12-04 16:42:00 -08:00
Yangqing Jia
107966b059 add error message for asan
Summary:
This makes sure that we have useful CUDA error messages in ASan mode. Also
made an fb-specific task pass by explicitly marking it as not ASan-able.

Reviewed By: dzhulgakov

Differential Revision: D4243471

fbshipit-source-id: 2ce303b97b3b4728c05575a8e7e21eb5960ecbc7
2016-11-29 15:18:39 -08:00
Martin Raison
da72658fa8 sparsehash-based implementation of UniqueOp
Summary:
Faster implementation of UniqueOp using google::dense_hash_map, as suggested by dzhulgakov. I haven't benchmarked it precisely, but early measurements with my workflow show a significant speedup (this operation went from using 20% of overall CPU time down to 7%).

I gated the implementation using the "engine" feature, to avoid adding sparsehash as a dependency to caffe2.

Reviewed By: dzhulgakov

Differential Revision: D4219768

fbshipit-source-id: 2f142981e772105b42fffa24afb199ef816f8e0c
2016-11-29 15:18:39 -08:00
Maxime Boucher
f16c2fe3da Create a reserve operation for tensors to avoid reallocating memory on Extend() and Resize() operations
Summary: I want to collect tensors over multiple batches, so this operation helps allocate enough memory from the beginning.

Reviewed By: dzhulgakov

Differential Revision: D4216198

fbshipit-source-id: e6b67cc7d80d71455487878da9b6b7a225035085
2016-11-29 15:18:39 -08:00
Liang Xiong
1aafeb3565 clean up memory of c2/sigrid predictor
Summary: Trying to optimize C2 predictor memory usage, mainly by removing the unused dbreader and DPER metadata.

Differential Revision: D4232595

fbshipit-source-id: dcd7aa7dd09587ec9811a9e5ec725e0c22757665
2016-11-29 15:18:39 -08:00
Xianjie Chen
f41b2ca85c fix sliceop for empty batch
Summary: Used in the NNPreProc layers. Online training fails when there is an empty batch.

Reviewed By: dzhulgakov

Differential Revision: D4235498

fbshipit-source-id: bde00a011831762e44a3f9bf2190d4b241a06ccc
2016-11-29 15:18:39 -08:00