Summary: Adding a simple video data layer which allows to read video data from frames, videos and output 5D tensor. It also allows multiple labels. The current implementation is based on ffmpeg
Differential Revision: D4801798
fbshipit-source-id: 46448e9c65fb055c2d71855447383a33ade0e444