The way `splits()` is currently used is convoluted, which makes it impossible to compose ReaderBuilders. I'm working on a composite reader, so this is a prerequisite for it.
The idea is that the ReaderBuilder should maintain the state it needs to create a reader. Any setup is done through the new `setup()` method. Currently, `setup()` only needs to be called once, but it should be safe to call it multiple times if needed.
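A minimal pure-Python sketch of the "builder owns its state, `setup()` is idempotent" idea. `ReaderBuilderSketch`, `num_splits` and `new_reader()` are hypothetical names chosen for illustration; this is not caffe2's actual ReaderBuilder API, and the round-robin splitting is an assumption.

```python
class ReaderBuilderSketch:
    """Hypothetical builder that owns all the state needed to create readers."""

    def __init__(self, data, num_splits=2):
        self._data = list(data)
        self._num_splits = num_splits  # configuration fixed at construction time
        self._splits = None            # computed lazily by setup()

    def setup(self):
        # Idempotent: only the first call computes the splits; later calls
        # return the same state instead of recomputing or duplicating it.
        if self._splits is None:
            self._splits = [
                self._data[i::self._num_splits] for i in range(self._num_splits)
            ]
        return self._splits

    def new_reader(self, split_id):
        # A reader can only be created once setup() has prepared the state.
        if self._splits is None:
            raise RuntimeError("setup() must be called before new_reader()")
        return iter(self._splits[split_id])
```

Keeping the configuration in the constructor rather than in `setup()`'s arguments is what makes repeated `setup()` calls unambiguous: there is nothing a second call could change.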
Summary: The interface is not used anywhere AFAICT; cleaning up to make it less confusing.
Reviewed By: kuttas
Differential Revision: D6867040
fbshipit-source-id: 3e8a77df76ef09c6864c308561825777b326f76c
Summary: Add a ReaderWithTimeLimit() class that stops reading after a specified amount of time.
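A minimal sketch of the time-limit behavior as a plain Python iterator wrapper, assuming the limit is a wall-clock duration measured from when iteration starts. `ReaderWithTimeLimitSketch` and its `clock` parameter are hypothetical and not the caffe2 class's signature; the injectable clock only exists to make the behavior testable.

```python
import time


class ReaderWithTimeLimitSketch:
    """Hypothetical wrapper: yields items from `source` until `duration_s` elapses."""

    def __init__(self, source, duration_s, clock=time.monotonic):
        self._source = source
        self._duration_s = duration_s
        self._clock = clock  # monotonic clock, so system time changes don't matter

    def __iter__(self):
        deadline = self._clock() + self._duration_s
        it = iter(self._source)
        # Check the deadline *before* pulling the next item, so no item is
        # consumed from the source and then silently dropped.
        while self._clock() < deadline:
            try:
                item = next(it)
            except StopIteration:
                return  # source ran out of data before the time limit
            yield item
```

Usage: `list(ReaderWithTimeLimitSketch(records, duration_s=60.0))` reads for at most about a minute, or until `records` is exhausted, whichever comes first.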
Reviewed By: boryiingsu
Differential Revision: D6477623
fbshipit-source-id: 165874c9344b0c9c7e0b33e12e72e24c46669cb2
Summary:
The comments say these functions are experimental and should not be used, but they are on the critical path of pipeline.py, so it is probably better to remove the comment.
Also changed the if-else to check for None first. Although Python does not crash on getattr(None, "x", default), it is confusing.
Also fixed some lint issues.
Reviewed By: azzolini
Differential Revision: D5853639
fbshipit-source-id: 977de5ba0ea3ae26343ae5fcacac883faf892b0e
Summary: Allow context to be passed into piper function
Reviewed By: volkhin
Differential Revision: D5684716
fbshipit-source-id: 693f0464fe28f8692d75901705a85a0a413a7bed
Summary:
The goal of this diff is:
1) Enable checkpointing to honor batches_per_epoch
2) Resume hive_readers mid-split
Reviewed By: azzolini
Differential Revision: D5004212
fbshipit-source-id: 2ff5df30ba946eefadd109d80056cde67398a080
Summary:
Previously, we didn't propagate the 'out-of-data' signal if splits_per_epoch wasn't specified.
For now this is a hacky fix (it just reuses ReaderWithLimit). azzolini, any suggestions for a more elegant solution? I could create an extra reader that just exports the "is empty" signal.
Overall, I think we need to turn global_queue into a more sustainable unit test that verifies all possible combinations; I'm still not sure it's correct :-\
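A pure-Python model of why reusing a limit wrapper fixes the missing signal: the wrapper records `data_finished` whenever the underlying source is exhausted, even when no limit (`num_iter=None`) is set, and keeps that distinct from merely hitting the limit. `ReaderWithLimitSketch`, `read()` and `drain()` are hypothetical illustrations, not caffe2's ReaderWithLimit API.

```python
_END = object()  # sentinel marking an exhausted source


class ReaderWithLimitSketch:
    """Hypothetical reader: at most `num_iter` items (None = unlimited)."""

    def __init__(self, source, num_iter=None):
        self._it = iter(source)
        self._num_iter = num_iter
        self._count = 0
        # True only when the *source* ran out, not when the limit was hit.
        self.data_finished = False

    def read(self):
        """Return (should_stop, item)."""
        if self._num_iter is not None and self._count >= self._num_iter:
            return True, None  # limit reached; the source may still hold data
        item = next(self._it, _END)
        if item is _END:
            self.data_finished = True  # propagated even with no limit set
            return True, None
        self._count += 1
        return False, item


def drain(reader):
    """Read until the reader asks to stop; return the items read."""
    items = []
    while True:
        should_stop, item = reader.read()
        if should_stop:
            return items
        items.append(item)
```

A caller (for example, an epoch loop) can then tell "this split is done for now" (`should_stop` with `data_finished == False`) apart from "there is no more data at all".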
Reviewed By: xianjiec
Differential Revision: D4665677
fbshipit-source-id: fe44d10ee82c3383145635e67dea1d9b666e061f
Summary: This makes sure dper_example is compatible with the new way of defining checkpoint epochs. See D4499320.
Reviewed By: xianjiec
Differential Revision: D4511618
fbshipit-source-id: f5188010cdefe3739f87f6049d1ea6aee765c514
Summary: For customers like Ads, Feeds, and MarketPlace, the training data is very large, and going over all of it to compute meta information is unnecessary and costly. This diff adds a numSample option to preCompute so users can control how many samples are used when computing meta information.
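A minimal sketch of capping meta-information computation at a sample count, assuming the meta information is simple summary statistics over a stream. `pre_compute_meta` and `num_sample` are hypothetical stand-ins for preCompute/numSample; the statistics chosen are illustrative only.

```python
import math
from itertools import islice
from typing import Iterable, Optional


def pre_compute_meta(samples: Iterable[float], num_sample: Optional[int] = None) -> dict:
    """Hypothetical: summary stats over at most `num_sample` samples (None = all)."""
    count, total = 0, 0.0
    lo, hi = math.inf, -math.inf
    # islice stops after num_sample items, so a huge (or even unbounded)
    # stream is never read past the requested number of samples.
    for x in islice(samples, num_sample):
        count += 1
        total += x
        lo = min(lo, x)
        hi = max(hi, x)
    return {
        "count": count,
        "min": lo,
        "max": hi,
        "mean": total / count if count else None,
    }
```

Because `islice` treats `None` as "no limit", leaving `num_sample` unset preserves the old behavior of scanning all the data.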
Differential Revision: D4492399
fbshipit-source-id: 7199381d226ee6300a959fc5e116d39984d199fc
(1) nccl submodule, cnmem submodule
(2) mpi ops fallback test
(3) a bit more blob interface
(4) fixed tests
(5) caffe2.python.io -> caffe2.python.dataio to avoid name conflicts
(6) In the build system, autogenerate __init__.py instead of having manual
rules just to copy over an empty __init__.py.