Make it easier to plug in intermediate steps between preprocessing & trainer by maintaining a stable schema.
I also fixed enqueue() so that we can pass the same blob in multiple locations without causing data corruption.
The way `splits()` is currently used is convoluted and makes it impossible to compose ReaderBuilders. I'm working on a composite reader, and this change is a prerequisite for it.
The idea is that the ReaderBuilder should maintain the states it needs to create a reader. Any setup is done through the new `setup()` method. Currently, `setup()` should only be called once, but, if needed, it should be safe to call it multiple times.
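A minimal sketch of the intended contract, assuming the `dataio.ReaderBuilder` base class; the helper names below are hypothetical:
```
from caffe2.python.dataio import ReaderBuilder

class MyReaderBuilder(ReaderBuilder):
    # Sketch: the builder holds the state it needs to create readers.
    def __init__(self, source):
        self._source = source
        self._state = None

    def setup(self, **kwargs):
        # Normally called once; written so that calling it again is safe.
        if self._state is None:
            self._state = compute_reader_state(self._source)  # hypothetical helper

    def new_reader(self, **kwargs):
        return build_reader(self._source, self._state)  # hypothetical helper
```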
* Add CollectAndDistributeFpnRpnProposalsOp for FPN support
* Adds a C++ operator equivalent to the Python op in Detectron
* Once some additional GenerateProposalsOp changes are made this will
let us support Detectron FPN models with straight Caffe2 C++ ops
* RetinaNet and segmentation models require additional work
* Remove some uses of conservativeResize
* Add notes about training and inputs/outputs to operator documentation
* Fixing conda
* Adding hypothesis and onnx to conda builds
* Updates but still not working
* Adding required changes to conda_full
* Updates
* Moving to more general build_anaconda script
* Adding check for gcc version
* Adding general ways to add/remove packages from meta.yaml
* Changes for specific packages to build on gcc 5.4
* Fix with glog spec
* Requiring numpy >1.12 for Python 3 to satisfy the opencv dependency
* Adding pydot to required testing packages
* Adding script to read conda versions for gcc ABI
* Trying to fix segfault by installing in env instead
* conda activate -> source activate
* Trying adding back leveldb
* Setting locale for ONNX; also handling conda-search's changed output format
* read_conda_versions handles libprotobuf
* Conda script updates
* Adding a protobuf-working test
* Removing changes to proto defs b/c they will require internal changes in a separate diff
* Fix useless opset_import in onnx
* Set the default ir version in make_model
* Use the target_opset_version in Caffe2Frontend
* remove make_model from helper in caffe2.python.onnx
* Reduce Sum and Reduce Mean
* Handle reductions with empty 'axes'
* Merge codebase and simplify tensor reduction logic
* Restructure code and add comments.
* Fix parameter to scale
* [GanH]: two_task_discriminator
Adds a two-task discriminator and label smoothing.
* [Dper2] Simplified UI options needed for blob magnitude visualization
* [GanH]: fix tags
* Added type and shape inference for GatherRange operator
This helps with type/shape inference when using this operator in layers, and it's also just nice to have in general.
* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python
We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching it in Python, using caffe2::StoreHandlerTimeoutException.
* Bind Gloo IoException to IoError in Python
Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.
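A sketch of the recovery pattern these bindings enable; the Python-side exception class names here are assumptions:
```
from caffe2.python import workspace

try:
    workspace.RunNet("trainer_net")
except Exception as e:
    # With the exceptions registered via pybind, they surface as catchable
    # Python exceptions; the exact class names below are assumptions.
    if type(e).__name__ == "StoreHandlerTimeoutError":
        pass  # e.g., retry the rendezvous with backoff
    elif type(e).__name__ == "IoError":
        pass  # e.g., handle the failed peer and resume from a checkpoint
    else:
        raise
```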
* [GanH]: add label smoothing to softmax with loss
* [C2] Enable LARS in Adagrad and hook it to DPER
* [DPER] Don't pass LayerModelHelper in create_trainer_nodes
Since we're planning to get rid of it eventually, and I want access to the
NetDef-only interface ASAP, I'm working toward removing all references to
LMH where we don't really need them.
* fix bugs in LambdaRankNdcgOp
The loss and gradient in LambdaRankNdcgOp are incorrect: the loss should be the negative log of the probs instead of the log.
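An illustrative check of the sign fix (not the op's actual code):
```
import numpy as np

p = np.float32(0.9)       # pairwise preference probability
buggy_loss = np.log(p)    # negative, goes to -inf as p -> 0: wrong
fixed_loss = -np.log(p)   # non-negative, grows as p -> 0: correct
```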
* Restrict thread pool on iOS to only big cores
Historically, iPhones exposed only one type of core, and the Caffe2 thread pool used all of them.
However, the iPhone 8/iPhone X expose 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, fast cores end up waiting for the slow ones, so it may be better to restrict execution to only the 2 fast cores, as we do on Android.
* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine
* make clang happy and get fewer warnings
* [Personalization] Support add_output_schema() in layer_model_helper
Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.
Solution:
For flexibility, we want to add fields to output_schema incrementally.
Plan:
Wrap changes of `model._output_schema` into a new function, `add_output_schema()`, that appends fields to the existing output_schema (see the sketch after this entry).
Callsite:
`add_output_schema()` should be called instead at https://fburl.com/efth5zer
Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
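A minimal sketch of the proposed helper, assuming output_schema is a `schema.Struct`; the actual layer_model_helper change may differ:
```
from caffe2.python import schema

def add_output_schema(model, name, field):
    # Append a field to the existing output_schema instead of replacing it.
    existing = model._output_schema or schema.Struct()
    model._output_schema = schema.Struct(
        *(list(existing.get_children()) + [(name, field)])
    )
```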
* [C2] Don't crash kernel in case of invalid shapes for ConcatOp
Enforce correctness of the shapes of the input tensors so we won't access an invalid index.
* [Caffe2] Add analytical performance counters to Dynolog
Initial diff for counting analytical FLOPs and memory writes for C2 operators.
* BBoxTransform op: Handle RoIs from multiple images per batch
The BBoxTransform op used during typical Faster-RCNN inference operates only on
RoIs from a single image (no batching). Added support for multiple images per
batch, with an optional output blob containing the batch splits (i.e., the
number of RoIs belonging to each item in the batch). The change is fully
backward compatible and shouldn't break any existing models.
* [mkl] Make MKL-DNN cooperate with memongered nets
C2's MKL-DNN implementation caches input dims and reuses intermediate and
output buffers across net runs, which prevents memonger from being used. This
may not always be useful since input dims may vary widely in many cases and
we'll end up reallocating anyway. Added an option to force reallocation when
memonger is used.
* [oncall] fix batch gather ops for empty input
Still need to bisect for the breaking change, but this should fix the case of empty input.
The error logging looks like: https://interncache-ftw.fbcdn.net/t49.3276-7/23938497_293562711176943_6500112636590424064_n.txt?_nc_log=1
@raychen, can you help subscribe the oncall from the ads side? This may affect the Sigrid online trainer.
* optimize BatchOneHotOp
We want to iterate in row-major rather than column-major order for better locality; a sketch of the access pattern follows.
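An illustrative numpy sketch (not the op's actual code); each iteration fills one contiguous output row:
```
import numpy as np

data = np.array([[1, 3], [2, 3]])           # (batch, features)
vals = [np.array([1, 2]), np.array([3])]    # candidate values per feature
out = np.zeros((len(data), sum(len(v) for v in vals)), dtype=np.float32)
for i, row in enumerate(data):              # row-major: contiguous writes
    col = 0
    for j, candidates in enumerate(vals):
        for v in candidates:
            out[i, col] = float(row[j] == v)
            col += 1
```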
* Supported exporting models with int blobs
Needed by CondenseNet.
* BoxWithNMSLimit op: Handle boxes from multiple images per batch
Similar to D7135360. Added support for multiple images per batch in the op.
It takes an optional additional input, "batch_splits", as output by the
BBoxTransform op, and returns new batch_splits after applying NMS and
filtering. Otherwise, backward compatibility is maintained.
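A hedged usage sketch; the argument names reflect my understanding of the op and may not match exactly:
```
from caffe2.python import core

op = core.CreateOperator(
    "BoxWithNMSLimit",
    ["scores", "boxes", "batch_splits"],  # batch_splits input is optional
    ["scores_out", "boxes_out", "classes_out", "batch_splits_out"],
    score_thresh=0.05,
    nms=0.5,
    detections_per_im=100,
)
```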
Summary:
Executing the loop's body in a separate workspace, using WorkspaceStack to
support saving and reusing workspaces
Test Plan:
python caffe2/python/operator_test/onnx_while_test.py
Reviewers: caffe2-review, jamesreed
This op is used for gradient clipping, to take care of exploding/vanishing gradients. If original_norm is larger than the threshold, each element of the tensor is scaled by threshold / original_norm.
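A minimal numpy sketch of the described scaling (names are illustrative):
```
import numpy as np

def clip_by_norm(tensor, threshold):
    original_norm = np.linalg.norm(tensor)
    if original_norm > threshold:
        tensor = tensor * (threshold / original_norm)
    return tensor
```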
Adding NUMA awareness through numa_node_id in DeviceOption. Blobs of operators
with numa_node_id are allocated on the corresponding memory banks, and the
operators run on CPU pools with the matching NUMA affinity.
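A sketch of pinning an operator via the new field, assuming the usual DeviceOption plumbing:
```
from caffe2.proto import caffe2_pb2
from caffe2.python import core

opt = caffe2_pb2.DeviceOption()
opt.device_type = caffe2_pb2.CPU
opt.numa_node_id = 1  # allocate this op's blobs on NUMA node 1
op = core.CreateOperator("Relu", ["X"], ["Y"], device_option=opt)
```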
With python3, np.int defaults to int64. This diff should fix it. I don't know if a test already exists for this function, but the following ASR test was breaking when I switched to py3:
```
buck test caffe2/caffe2/fb/speech/asr_training/:tensor_parser_test
```
After D6953547, some of the blobs were no longer impacted by uint8 quantization,
but they would still generate operators expecting uint8 inputs and thus fail.
This diff adds a temporary hack to skip this quantization when the layer
is not quantized.
Will fix it properly by switching to Net rewriting instead.
* Scope MultiRNN blobs with name as well as layers
Also don't double scope MultiRNN in case of multiple layers.
* Scope input projection of first layer with name
We don't scope it with layers because the projection is done
outside of the layer.
* Avoid scoping input blob in MemongerTest.test_rnn
* Rectify input_blob in prepare_input
Revert change in memonger_test because rectifying input will solve the problem.
* First attempt at a Sqrt op
* Adding the Sqrt op along with the test cases
* Made changes per @Yangqing's questions re: tensor format, and used Hypothesis to generate input tensors (a sketch follows)
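A sketch of such a Hypothesis-driven check (the real test lives under caffe2/python/operator_test/ and uses its utilities):
```
import numpy as np
import hypothesis.strategies as st
from hypothesis import given
from caffe2.python import core, workspace

@given(st.lists(st.floats(min_value=0.0, max_value=1e4),
                min_size=1, max_size=16))
def test_sqrt(values):
    x = np.array(values, dtype=np.float32)
    workspace.FeedBlob("X", x)
    workspace.RunOperatorOnce(core.CreateOperator("Sqrt", ["X"], ["Y"]))
    np.testing.assert_allclose(workspace.FetchBlob("Y"), np.sqrt(x), rtol=1e-3)
```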