(1) Loss: do not coerce loss operators into producing a gradient output. Although
doing so may be numerically more efficient, it muddles the definition of a loss
when one does not actually want to run the backward pass.
(2) Autodifferentiation: added a more explicit in-place check (in-place is now
opt-in) and implemented a simple SSA/IR gradient generation scheme (see the
sketch after this list). Also added some core gradient tests.
Misc bugfixes as well.
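On (2), a minimal sketch of what an SSA-style gradient pass can look like. The
names here (OpDef, GradName, GradRegistry) are hypothetical stand-ins rather
than the actual Caffe2 API; the point is that every gradient blob is written
exactly once, which is what makes the opt-in in-place check easy to enforce:

    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical, trimmed-down op description; the real OperatorDef
    // lives in caffe2.proto and carries many more fields.
    struct OpDef {
      std::string type;
      std::vector<std::string> inputs;
      std::vector<std::string> outputs;
      bool allow_inplace = false;  // in-place gradients are opt-in
    };

    // SSA naming: every blob has exactly one gradient blob, and that
    // gradient blob is written by exactly one op.
    std::string GradName(const std::string& blob) { return blob + "_grad"; }

    // Per-op-type gradient generators: given the forward op, return the
    // ops that compute its input gradients from its output gradients.
    using GradGen = std::function<std::vector<OpDef>(const OpDef&)>;
    std::map<std::string, GradGen>& GradRegistry() {
      static std::map<std::string, GradGen> registry;
      return registry;
    }

    // Walk the forward net backwards and emit gradient ops. Since the
    // result is in SSA form, a later pass can verify that no op reads a
    // gradient before it is written, and that no op overwrites a forward
    // blob unless allow_inplace is set.
    std::vector<OpDef> GenerateGradient(const std::vector<OpDef>& net) {
      std::vector<OpDef> grad_ops;
      for (auto op = net.rbegin(); op != net.rend(); ++op) {
        for (OpDef& g : GradRegistry().at(op->type)(*op)) {
          grad_ops.push_back(std::move(g));
        }
      }
      return grad_ops;
    }

Because each op's gradient generator is looked up by type, supporting a new
operator is just one more registry entry.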
(1) added cudnn support for the convolution operator.
(2) cublas: after going through the work, I feel it's better to use HOST pointer
mode, so changed it (see the example after this list).
(3) storage order: although googlenet and multibox use NHWC, it seems better to
keep NCHW as the default to stay consistent with caffe and cudnn; moved to
NCHW as the default (see the layout helpers after this list).
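On (2): with CUBLAS_POINTER_MODE_HOST, scalars such as alpha are plain host-side
values, so call sites do not need a device-side staging buffer for them. A small
sketch (error checking elided):

    #include <cublas_v2.h>

    // With HOST pointer mode, alpha/beta are ordinary host scalars; with
    // DEVICE pointer mode they would have to live in GPU memory.
    void ScaleAndAdd(cublasHandle_t handle, int n,
                     const float* x, float* y) {
      cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_HOST);
      const float alpha = 2.0f;  // &alpha is a host pointer
      cublasSaxpy(handle, n, &alpha, x, 1, y, 1);  // y = alpha * x + y
    }

DEVICE pointer mode mainly pays off when the scalar itself is produced on the
GPU and a round trip to the host would stall the stream.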
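On (3): the two layouts differ only in their linear offset formula; a pair of
helpers makes the difference concrete:

    #include <cstddef>

    // NCHW (the caffe/cudnn default): channels vary slower than the
    // spatial dimensions; NHWC keeps channels innermost.
    std::size_t OffsetNCHW(std::size_t n, std::size_t c,
                           std::size_t h, std::size_t w,
                           std::size_t C, std::size_t H, std::size_t W) {
      return ((n * C + c) * H + h) * W + w;
    }

    std::size_t OffsetNHWC(std::size_t n, std::size_t c,
                           std::size_t h, std::size_t w,
                           std::size_t C, std::size_t H, std::size_t W) {
      return ((n * H + h) * W + w) * C + c;
    }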
(1) various bugfixes.
(2) Tensor is now a class independent of its data type, which makes it easier
to write type-independent operators (see the sketch after this list).
(3) code conventions changed a bit: dtype -> T, and Tensor<*Context> now has
Tensor* aliases.
(4) renamed ParallelNet to DAGNet, to be more consistent with what it does.
(5) Caffe2's own flags library instead of gflags.
(6) Caffe2's own logging library instead of glog; glog can still be chosen with
the compile-time definition -DCAFFE2_USE_GOOGLE_GLOG. As a result, glog macros
like CHECK and DCHECK now have the prefix CAFFE_, and LOG(*) becomes
CAFFE_LOG_* (see the usage example after this list).
(7) an optional protobuf inclusion, which can be selected with
USE_SYSTEM_PROTOBUF in build_env.py.
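On (2) and (3): a rough sketch of the idea behind a type-erased tensor. The
member names are illustrative, not the exact Caffe2 ones, and a malloc-based
CPU path stands in for the Context-driven allocation of the real code:

    #include <cstdlib>
    #include <vector>

    // The class itself is not templated on the element type; only the
    // data accessors are, so one Tensor type serves float, int, etc.
    template <class Context>
    class Tensor {
     public:
      ~Tensor() { std::free(data_); }

      void Resize(const std::vector<int>& dims) {
        dims_ = dims;
        size_ = 1;
        for (int d : dims) size_ *= d;
      }

      template <typename T>
      T* mutable_data() {
        const std::size_t bytes = size_ * sizeof(T);
        if (bytes != capacity_) {        // (re)allocate on type/size change
          std::free(data_);
          data_ = std::malloc(bytes);    // real code allocates via Context
          capacity_ = bytes;
        }
        return static_cast<T*>(data_);
      }

     private:
      std::vector<int> dims_;
      std::size_t size_ = 0;
      std::size_t capacity_ = 0;
      void* data_ = nullptr;
    };

    class CPUContext {};
    using TensorCPU = Tensor<CPUContext>;  // the Tensor* alias from (3)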
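On (6): what a call site looks like after the rename, assuming the replacement
macros keep glog's streaming style (the include path is illustrative):

    #include "caffe2/core/logging.h"  // illustrative path

    void InitGpus(int num_gpus) {
      CAFFE_CHECK_GE(num_gpus, 0) << "negative gpu count?";
      if (num_gpus == 0) {
        CAFFE_LOG_WARNING << "running in cpu-only mode.";
      }
      CAFFE_DCHECK_LT(num_gpus, 1024);  // debug-only, like glog's DCHECK
    }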
(2) added comments for blob serialization.
(3) cudnn: put it under a separate device name, so one can explicitly choose
cudnn instead of having the CUDA device prioritize it (see the sketch after
this list).
(4) note that mint is not available within ipython due to a zeromq conflict.
(5) added the db_throughput utility.
(6) added gprofiler.
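On (3): the effect is that an operator def can name cudnn directly in its
device option instead of relying on the CUDA registry to pick the cudnn
kernel. A hedged sketch, assuming a CUDNN entry in the DeviceType enum of
caffe2.proto (the include path is illustrative):

    #include "caffe2/proto/caffe2.pb.h"  // illustrative path

    // Hypothetical: explicitly request the cudnn conv implementation by
    // device name, instead of letting the generic CUDA device pick it.
    caffe2::OperatorDef MakeCudnnConv() {
      caffe2::OperatorDef op;
      op.set_type("Conv");
      op.add_input("data");
      op.add_input("filter");
      op.add_output("conv_out");
      op.mutable_device_option()->set_device_type(caffe2::CUDNN);
      return op;
    }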
(1) added blob serialization (see the sketch after this list).
(2) the registry can now use key types other than string (see the sketch after
this list).
(3) changed load_save_op so the load and save ops interface with a db.
(4) changed the sgd iter op: it now does increments, so we can resume from a
saved iteration.
(5) the mnist linear classifier test exercises the snapshot functionality.
(6) added protodb, which is a small db wrapper over TensorProtos.
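On (1) and (6): conceptually, serialization turns a blob into a TensorProtos
string that can be stored under a key in a db, which is exactly what protodb
holds. A rough sketch; the proto field names follow caffe2.proto as released,
so treat the details as an approximation:

    #include <cstdint>
    #include <string>
    #include <vector>
    #include "caffe2/proto/caffe2.pb.h"

    // Pack a float blob into a TensorProto, wrap it in a TensorProtos
    // message, and serialize it to a string that can be written into a
    // db under the blob's name as the key.
    std::string SerializeFloatBlob(const std::vector<float>& values,
                                   const std::vector<int64_t>& dims) {
      caffe2::TensorProto proto;
      for (int64_t d : dims) proto.add_dims(d);
      proto.set_data_type(caffe2::TensorProto::FLOAT);
      for (float v : values) proto.add_float_data(v);

      caffe2::TensorProtos protos;
      *protos.add_protos() = proto;
      return protos.SerializeAsString();
    }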
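On (2): the gist of a registry that is generic over its key type; a minimal
sketch with hypothetical names, not the exact Caffe2 declarations:

    #include <functional>
    #include <map>
    #include <memory>
    #include <utility>

    // A registry whose key can be any ordered type (string, int, an
    // enum, ...), mapping keys to object creators.
    template <class KeyType, class ObjectType, class... Args>
    class Registry {
     public:
      using Creator = std::function<std::unique_ptr<ObjectType>(Args...)>;

      void Register(const KeyType& key, Creator creator) {
        registry_[key] = std::move(creator);
      }

      std::unique_ptr<ObjectType> Create(const KeyType& key, Args... args) {
        auto it = registry_.find(key);
        return it == registry_.end() ? nullptr : it->second(args...);
      }

     private:
      std::map<KeyType, Creator> registry_;
    };

This keeps, for example, an operator registry keyed on the op type string,
while a device-specific registry can be keyed on an integer or an enum.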