pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
Simon Layton	a2b31cf9e2	Install fixes Fix paths, __init__.py initialization Other assorted fixes	2016-12-21 09:14:04 -05:00
Simon Layton	99e97a4b7a	Correction to paths to find cuDNN	2016-12-16 16:03:23 -05:00
Simon Layton	dac78727fb	Add missing file	2016-12-16 08:00:47 -05:00
Bram Wasti	0154db83c0	Merge pull request #54 from slayton58/cmake Initial CMake building with deps	2016-12-15 10:46:19 -08:00
Simon Layton	03c9d54fd0	Support openCV 2	2016-12-14 14:59:59 -05:00
Simon Layton	a46f0fb3cb	Merge branch 'cmake' of https://github.com/slayton58/caffe2 into cmake	2016-12-14 11:00:17 -05:00
Simon Layton	788f715a6e	third_party protobuf support Fix python lib missed proto dep	2016-12-14 10:54:15 -05:00
Simon Layton	df12f431e0	Removing extraneous cmake files Leftover from Caffe cmake build system	2016-12-13 09:29:01 -05:00
Simon Layton	d7eeebc269	Refactored CUDA detection a bit Refactoring, minor fixes	2016-12-13 09:29:01 -05:00
Simon Layton	d74bd7ee55	Add CUDA NVRTC cases	2016-12-13 09:29:01 -05:00
Simon Layton	fbbb87cd46	Enhancements Add BLAS chooser Move cuDNN detection from Cuda -> FindCuDNN Refactor main C2 libs, should enable no-GPU build (untested)	2016-12-13 09:29:01 -05:00
Simon Layton	5e699ce6c2	CUDA fixes Fix NCCL build move CUDA dep into Dependencies file	2016-12-13 09:29:01 -05:00
Simon Layton	b9599c7464	Compiling entire project Can run CIFAR10 Python example!	2016-12-13 09:29:01 -05:00
Simon Layton	e9f1222408	Compiling most of the project Now compiles all CPU + GPU code, tests + binaries with deps	2016-12-13 09:29:01 -05:00
Simon Layton	c05ff206b6	Build binaries	2016-12-13 09:29:01 -05:00
Simon Layton	2610d62813	Build Python libs	2016-12-13 09:29:01 -05:00
Simon Layton	52f09fe2c9	Initial building with deps	2016-12-13 09:29:01 -05:00
Bram Wasti	e9de70f296	Added basic build system	2016-12-13 09:29:01 -05:00
Simon Layton	122e115937	Removing extraneous cmake files Leftover from Caffe cmake build system	2016-12-12 12:50:08 -05:00
Simon Layton	681267b66a	Refactored CUDA detection a bit Refactoring, minor fixes	2016-12-12 12:29:00 -05:00
Simon Layton	9f35f47411	Add CUDA NVRTC cases	2016-12-09 11:01:27 -05:00
Simon Layton	09de969e9f	Enhancements Add BLAS chooser Move cuDNN detection from Cuda -> FindCuDNN Refactor main C2 libs, should enable no-GPU build (untested)	2016-12-09 10:29:06 -05:00
Simon Layton	cdb2fb6737	CUDA fixes Fix NCCL build move CUDA dep into Dependencies file	2016-12-09 09:02:26 -05:00
Simon Layton	f79bffc78d	Compiling entire project Can run CIFAR10 Python example!	2016-12-08 13:23:04 -05:00
Simon Layton	4255ee9944	Compiling most of the project Now compiles all CPU + GPU code, tests + binaries with deps	2016-12-08 08:40:29 -05:00
Simon Layton	497659ce0d	Build binaries	2016-12-07 10:54:06 -05:00
Simon Layton	f3c20620ed	Build Python libs	2016-12-06 13:06:16 -05:00
Simon Layton	3d719f4bff	Initial building with deps	2016-12-06 11:39:15 -05:00
Xianjie Chen	dea27ca4ca	use TIndex for set in math.h Summary: as desc Differential Revision: D4271900 fbshipit-source-id: 92f7cbbe33e0ce4fcc21a8af9ded4f436afb43e2	2016-12-05 11:53:27 -08:00
Alisson Gusatti Azzolini	5f7d1f02f2	Use native reader for evaluation Summary: Since hashing is different. This should be ready to commit now. Running ads nn canaries. Differential Revision: D4264009 fbshipit-source-id: 3aa16b0c47c61f9a442b0375524c5f1580af5892	2016-12-05 11:53:27 -08:00
Byung-Gon Chun	1aba4280d8	Make xray net_type configurable Summary: Make xray net_type configurable via a command line argument Differential Revision: D4262076 fbshipit-source-id: e2ecb9cd5bee5d6aaebe0ea8d2d4d9b378058cba	2016-12-05 11:53:27 -08:00
Pieter Noordhuis	6c13dc3dd0	Fix CreateCommonWorld schema Summary: TSIA Reviewed By: dzhulgakov Differential Revision: D4264328 fbshipit-source-id: 59eaf791a05b0202000f3b7266aba63e146229d4	2016-12-05 11:53:27 -08:00
Yangqing Jia	ab3fea540d	Add serialization interface for MKLMemory Summary: This allows us to serialize things between MKLMemory and a TensorProto. Reviewed By: dzhulgakov Differential Revision: D4218044 fbshipit-source-id: 934181493b482cb259c17ff4b17008eac52fd885	2016-12-05 11:53:27 -08:00
Aapo Kyrola	e65eeff665	LMDB example Summary: This example writes an LMDB database of image data and labels (random). Then it reads them using Caffe2's TensorProtosDBInput and validates that the checksums match. This example shows how to coerce image data into TensorProtos and be happy. Before, there was no clear example of how to create databases for Caffe2. Differential Revision: D4263614 fbshipit-source-id: 21e08066899095b4efcc2d23dbc3ede81e75914a	2016-12-05 11:53:26 -08:00
Aapo Kyrola	96a5e88d63	Fix consecutive checkpoint syncs Summary: Switching to Pieter-MPI changed the way we set up the network between operators. For synchronizing parameters after a checkpoint load, we run a checkpoint_net that contained operators for creating the common world and broadcast operators. Unfortunately this fails when the checkpoint sync is done a second time, because we would have created a duplicate common world. The solution is to separate the common world op and the broadcast op into an init net and the actual broadcasting net, and to run the init net only once. This problem did not arise in the Flow version since I did only one checkpoint loading per operator (process). Differential Revision: D4251754 fbshipit-source-id: ba030579e651e529e29bbf2d27920075078d8ff9	2016-12-05 11:53:26 -08:00
Dmytro Dzhulgakov	3125e6a821	Hacky fix for cloned model rewriting Summary: Disclaimer: this is really hacky. Continues a fix from D4218902. The root problem is that DPER builds the net incrementally and input_record doesn't support it properly. For now I just manipulate the input record directly. Alisson wants to fix it properly later by allowing set_input_record to accept a superset of the current record. But it should unblock our experimentation. I'm curious how it's going to look in the dper_example world. Reviewed By: azzolini Differential Revision: D4255285 fbshipit-source-id: ff65b6f943d705a9b3399035597e2e8ded2e1ff3	2016-12-05 11:53:26 -08:00
Martin Raison	ea9a0f24bf	automatic aggregation of sparse gradients Summary: This adds support for automatic aggregation of sparse gradients. We simply concatenate indices and values (no attempt to deduplicate, since this is already done before feeding into the optimizer). This should support various cases (indices and/or values can be generated by one or more gradient ops, or gradient outputs can be directly passed from inputs). I tried to minimize the code footprint, but I introduced SparseGradGenMeta because GradGenMeta didn't lend itself very well to be used with sparse gradients. Reviewed By: dzhulgakov Differential Revision: D4219788 fbshipit-source-id: 1d074664cffd82a8764e4b1473ada6bc46e6c51a	2016-12-05 11:53:26 -08:00
Xianjie Chen	2045a5de9f	add position based weighting Summary: adding more methods to the layer representation. The corresponding implementation in DPER is: https://fburl.com/563869364 Differential Revision: D4256583 fbshipit-source-id: 91326b7bb9e960a5bc70b5a13812fce90054eceb	2016-12-05 11:53:26 -08:00
Aapo Kyrola	3410939459	pass learning rate scaling factor to parameter update builder function Summary: When refactoring data parallel model, the division of LR by number of devices was dropped, and thus we ended up effectively multiplying gradients by the number of devices. Thus, we need to scale the LR by 1/numgpus. Created a test to confirm that data_parallel_model produces exactly the same results on different numbers of GPUs, given the total batch size. Reviewed By: prigoyal Differential Revision: D4248907 fbshipit-source-id: af21ede113e6ac25f12c556de298cb18974548be	2016-12-05 11:53:26 -08:00
Pieter Noordhuis	a3942b2d64	Add store ops and tests Summary: Basic ops to set/get/check/wait against a StoreHandler. Differential Revision: D4248059 fbshipit-source-id: cc53061fcc13823d4b9eed6b7c1c346b9e8ec991	2016-12-05 11:53:26 -08:00
Pieter Noordhuis	f3403a1110	Add RedisStoreHandler Summary: Add store handler implementation backed by a Redis server. This allows for easy rendezvous when participating machines have no access to a shared filesystem. Differential Revision: D4241715 fbshipit-source-id: 4ce881df3a96af24f7efbb02d1050b3b2b9bc3c0	2016-12-05 11:53:26 -08:00
Dmytro Dzhulgakov	119b687994	Allow PythonOp to access the workspace Summary: DPER has very strange python ops that play with Workspace - they are somewhat similar to LoadOp/SaveOp, so I guess the semantics is fine. Thus it makes sense to allow python operators to receive workspace pointer similarly to regular Operators. I didn't figure out a better way to implement optional argument than just checking the number of args function receives on python side. Reviewed By: ajtulloch Differential Revision: D4242943 fbshipit-source-id: d97d4227815b741c8f884cfe254b06d2b56b5a41	2016-12-05 11:53:26 -08:00
Andrey Malevich	2390dfefdb	Kill a few more CHECKs. Summary: One more small batch of CHECKs that were left in the C2 codebase. Most of the leftovers should be in tests/GPU-only code. Reviewed By: Yangqing Differential Revision: D4243782 fbshipit-source-id: a4a03c116ea8ba16facd2efc135746d5921f19d5	2016-12-05 11:53:25 -08:00
Jason Jeong	af2a3076a2	add header for AsyncDAGNet Summary: This diff adds a header file for net_gpu.cc so that the AsyncDAGNet class can be used to create other derived classes. Reviewed By: ajtulloch Differential Revision: D4230046 fbshipit-source-id: 379c3ff7ebb7aeeb4294f39e6f5d1ecad48b92f0	2016-12-05 11:53:25 -08:00
Bram Wasti	8f398d795e	Added basic build system	2016-12-04 16:42:00 -08:00
Yangqing Jia	107966b059	add error message for asan Summary: This makes sure that we have useful CUDA error message in asan mode. Also made a fb specific task pass by explicitly marking it not asan-able. Reviewed By: dzhulgakov Differential Revision: D4243471 fbshipit-source-id: 2ce303b97b3b4728c05575a8e7e21eb5960ecbc7	2016-11-29 15:18:39 -08:00
Martin Raison	da72658fa8	sparsehash-based implementation of UniqueOp Summary: Faster implementation of UniqueOp using google::dense_hash_map, as suggested by dzhulgakov. I haven't benchmarked it precisely but early measurements with my workflow show a significant speed bump (this operation went from using 20% of overall CPU time down to 7%). I gated the implementation using the "engine" feature, to avoid adding sparsehash as a dependency to caffe2. Reviewed By: dzhulgakov Differential Revision: D4219768 fbshipit-source-id: 2f142981e772105b42fffa24afb199ef816f8e0c	2016-11-29 15:18:39 -08:00
Maxime Boucher	f16c2fe3da	Create a reserve operation for tensors to avoid reallocating memory on Extend() and Resize() operations Summary: I want to collect tensors over multiple batches and so this operation could become helpful to allocate enough memory from the beginning Reviewed By: dzhulgakov Differential Revision: D4216198 fbshipit-source-id: e6b67cc7d80d71455487878da9b6b7a225035085	2016-11-29 15:18:39 -08:00
Liang Xiong	1aafeb3565	clean up memory of c2/sigrid predictor Summary: trying to optimize c2 predictor memory usage, mainly by removing unused dbreader and dper metadata. Differential Revision: D4232595 fbshipit-source-id: dcd7aa7dd09587ec9811a9e5ec725e0c22757665	2016-11-29 15:18:39 -08:00
Xianjie Chen	f41b2ca85c	fix sliceop for empty batch Summary: Used in the NNPreProc layers. Online training fails when there is an empty batch. Reviewed By: dzhulgakov Differential Revision: D4235498 fbshipit-source-id: bde00a011831762e44a3f9bf2190d4b241a06ccc	2016-11-29 15:18:39 -08:00

382 commits