Summary:
Support tensors with size 0 in any dimension in MKL-DNN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15295
Differential Revision: D13573747
Pulled By: yinghai
fbshipit-source-id: 5bf7a0b9e2567e80f44981a7823be5407fc94e53
Summary:
The speed-up of a single operation is up to 3X.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15106
Differential Revision: D13429596
Pulled By: bddppq
fbshipit-source-id: f8d987cafeac9bef9c3daf7e43ede8c6a4ee2ce5
Summary:
This will let us install tests and other Caffe2 python code as a part of running Caffe2 tests in PyTorch.
Broken out of https://github.com/pytorch/pytorch/pull/13733/
cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14898
Reviewed By: pjh5
Differential Revision: D13381123
Pulled By: orionr
fbshipit-source-id: 0ec96629b0570f6cc2abb1d1d6fce084e7464dbe
Summary:
Add "axis" and "axis_w" arguments to FC to support a custom axis for reducing dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12971
Reviewed By: bddppq
Differential Revision: D12850675
Pulled By: yinghai
fbshipit-source-id: f1cde163201bd7add53b8475329db1f038a73019
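A minimal sketch of what custom flattening axes mean for FC shapes. It assumes X is viewed as an (M, K) matrix split at "axis" and W as an (N, K) matrix split at "axis_w"; the helper names are hypothetical and only non-negative axes are handled.

```python
from functools import reduce
from operator import mul


def prod(dims):
    """Product of a list of dimensions; the empty product is 1."""
    return reduce(mul, dims, 1)


def fc_output_shape(x_shape, w_shape, axis=1, axis_w=1):
    """Infer the FC output shape when X and W are flattened at custom axes.

    Assumption: X is split at `axis` into (M, K) and W is split at `axis_w`
    into (N, K). Returns the output shape and the (M, K, N) matrix sizes.
    """
    m, k = prod(x_shape[:axis]), prod(x_shape[axis:])
    n, k_w = prod(w_shape[:axis_w]), prod(w_shape[axis_w:])
    if k != k_w:
        raise ValueError(f"inner dimensions differ: {k} vs {k_w}")
    # The leading (un-reduced) dimensions of X are preserved in the output.
    return list(x_shape[:axis]) + [n], (m, k, n)
```

With axis=2, a (2, 3, 4, 5) input is treated as a 6 x 20 matrix, so a (6, 20) weight gives an output of shape (2, 3, 6).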
Summary:
This test flushes out the issue that IDEEP cannot handle tensors with dims like (0, 2), which is a valid tensor shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8459
Differential Revision: D10419328
Pulled By: yinghai
fbshipit-source-id: c5efcd152364a544180a8305c47a2a2d126ab070
Summary:
The speed-up of a single operation is up to 6X on Broadwell (BDW).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11686
Reviewed By: yinghai
Differential Revision: D9828129
Pulled By: wesolwsk
fbshipit-source-id: 7dbacea90609e18438f6fe1229c641937d0696c8
Summary:
1. Support the ops needed for Faster-RCNN/Mask-RCNN inference in Detectron, mostly as direct fallbacks.
2. Use the CPU device to hold 0-dim tensors and integer tensors in both the fallback op and the blob feeder, as needed by Detectron models.
3. Ignore 0-dim tensors in the MKL-DNN concat operator.
4. Generate a dynamic library of the Detectron module for the CPU device.
This PR obsoletes #9164.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10157
Differential Revision: D9276837
Pulled By: yinghai
fbshipit-source-id: dc364932ae4a2e7fcefdee70b5fce3c0cee91b6f
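Item 3 can be illustrated with a small shape-inference sketch. It assumes "0-dim" means a tensor with a zero-size dimension (no elements); the function names are hypothetical and this is not the operator's actual code.

```python
def is_empty(shape):
    """True if a tensor of this shape holds no elements (assumed meaning of "0-dim")."""
    return 0 in shape


def concat_shape(shapes, axis):
    """Output shape of concatenating `shapes` along `axis`, skipping empty inputs."""
    if not shapes:
        raise ValueError("concat needs at least one input")
    kept = [list(s) for s in shapes if not is_empty(s)]
    if not kept:
        # Nothing to concatenate; the result is itself empty.
        return list(shapes[0])
    out = kept[0][:]
    for s in kept[1:]:
        same_rank = len(s) == len(out)
        if not same_rank or any(a != b for i, (a, b) in enumerate(zip(s, out)) if i != axis):
            raise ValueError(f"incompatible shapes: {out} vs {s}")
        out[axis] += s[axis]
    return out
```

Skipping the empty inputs first means a shape such as (0,) does not trip the rank check against the remaining inputs.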
Summary:
AffineChannel is being used by public Detectron models, e.g. Mask-RCNN and Faster-RCNN. This PR folds this op into convolution the same way as BN to speed up inference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10293
Differential Revision: D9276789
Pulled By: yinghai
fbshipit-source-id: fbf6dd2c1be05f5713f760752e7245b1320a122b
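A minimal sketch of the folding arithmetic, assuming AffineChannel computes y[c] = scale[c] * x[c] + bias[c] per channel and directly follows the convolution. The function names and the single-pixel 1x1 convolution are illustrative only, not the actual fusion pass.

```python
def fold_affine_channel(weights, conv_bias, scale, bias):
    """Fold y[c] = scale[c] * conv(x)[c] + bias[c] into the conv parameters.

    weights: one flat list of filter coefficients per output channel.
    """
    folded_w = [[w * s for w in filt] for filt, s in zip(weights, scale)]
    folded_b = [b * s + t for b, s, t in zip(conv_bias, scale, bias)]
    return folded_w, folded_b


def conv1x1(x, weights, conv_bias):
    """A 1x1 convolution at a single pixel; x holds one value per input channel."""
    return [sum(w * v for w, v in zip(filt, x)) + b
            for filt, b in zip(weights, conv_bias)]


def affine_channel(y, scale, bias):
    """Per-channel scale and shift."""
    return [v * s + t for v, s, t in zip(y, scale, bias)]
```

Running the folded convolution alone should match running the original convolution followed by the affine op, which is what removes one op from the inference graph.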
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9667
MKL-DNN doesn't support 64-bit integers (cfee61bf81/include/mkldnn_types.h (L62-L75)), so force-converting a `TensorCPU<long>` to an `s32` Ideep tensor causes memory issues. This diff gives an alternative solution, where we just fall through to TensorCPU. The reasoning is that since MKL-DNN doesn't support 64-bit integer tensors, downstream ops have to be in CPUContext anyway, so there is no reason to force a conversion to an Ideep tensor and back.
Reviewed By: pjh5
Differential Revision: D8943544
fbshipit-source-id: f514903cda27e34b8887271c9df56c8220895116
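A rough sketch of the fall-through decision and of why a forced conversion is unsafe. The dtype set, the "ideep"/"cpu" labels, and the function names are assumptions made for illustration, not the feeder's real interface.

```python
import struct

# Standard sizes of the two integer types involved.
INT64_BYTES = struct.calcsize("=q")  # 64-bit signed integer
S32_BYTES = struct.calcsize("=i")    # 32-bit signed integer

# Assumption for illustration: dtypes the MKL-DNN path accepts in this sketch.
MKLDNN_DTYPES = {"float32"}


def feed_target(dtype):
    """Pick where a fed tensor lives; unsupported dtypes fall through to CPU."""
    return "ideep" if dtype in MKLDNN_DTYPES else "cpu"


def bytes_needed(num_elements, itemsize):
    """Size in bytes of a dense buffer with the given element count."""
    return num_elements * itemsize
```

An int64 buffer is twice as large as an s32 buffer with the same element count, so reinterpreting one as the other mismatches the memory layout; falling through to the CPU path avoids the conversion entirely.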
* [mpscnn] MPSCNNChannelShuffle
att
* [Easy] Adding tags as an argument to the functional layer
Without it "tags" would be added as an argument to the operator.
The change here is based on the assumption that there is no operator that takes "tags" as an argument.
* Fix locally_connected_op schema check.
* [C2] Add TypeAndShape inference for few more operators
As desc
* [c2] Shape inference should support 0 as dimension
Tensors can have 0 in their dimension.
* Make MockHiveReader loop over and support max_examples
Replace DatasetReader with RandomDatasetReader so that MockHiveReader can simulate
a large data input using a small sample file as the source.
* Utility function to wipe cache between benchmark runs
The Caffe2 benchmark does not wipe the cache between runs, which potentially creates an unrealistically optimistic picture of performance. This diff adds a utility function to wipe the cache.
* Allow caffe2 GlobalInit to be invoked multiple times
Successive invocations re-parse gflags and update logging levels, but do not re-run init functions or perform other one-time initialization.
* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes
Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG
* Rethrow current exception on failure
Rethrow current exception instead of copy constructing a new one on op failure.
* Make `clone()` return subclass of List/Struct
`clone()` does not work correctly when those classes are subclassed.
* Wipe the cache before the net run
The util function is copied from D7409424.
Will rebase once D7409424 has landed.
* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds
* Correct includes
async_polling include -> async_base include
* Prepare execution flags for executor migration
Make async_scheduling aware of the underlying net type to prepare for executor
migration
* Add operator level observers into async executor
Adding operator level observers into RunAsync operators' calls
* Cleanup TEST_Benchmark
Remove duplicate code and provide default implementation in NetBase
* [C2] Fix type and shape inference for binary comparison ops
As desc.
* Add GlobalInit to predictor to ensure initialization is always done before prediction
FACEBOOK:
Redo D7651453 the correct way.
Now use a static variable for the arguments passed to GLog
* Remove spammy log message
This method is currently used in various places inside Caffe itself.
* Disable events for operators inside a chain
We don't need to use events in operators within a chain because the chain is
always scheduled on a single stream; keep only the first and last events for
scheduling purposes
* Ensure correct finish run order
In rare cases we might call finishRun and trigger the net's destruction while
another worker is still holding a shared_ptr to a thread pool. That can cause
the thread pool to be destroyed from within a worker thread if no other nets
are using the pool. This diff fixes the order of calling finishRun and also
changes pool() to return a raw pointer, keeping the pool's ownership within the net
* Reduce unnecessary polling
Make sure we don't waste CPU by polling operators on which we can set
efficient callbacks
* Squash commit of syncing 9506eeb from github to fbcode
Patch xplat buck fix
add virtual destructor to OptimizationPass
build fixes for sync
* Fix net tracing
Fix net tracing from async_scheduling
* Fix logging
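
The GlobalInit item above (re-parse flags on every call, run one-time initialization only once) can be sketched as follows. The class, the `--log_level=` flag name, and the counters are hypothetical stand-ins, not Caffe2's actual API.

```python
class GlobalInitState:
    """Toy model of an init routine that is safe to call more than once."""

    def __init__(self):
        self.initialized = False
        self.init_runs = 0   # how many times one-time init actually ran
        self.log_level = 0

    def global_init(self, argv):
        # Flags are re-parsed on every call, so the logging level can change.
        for arg in argv:
            if arg.startswith("--log_level="):  # hypothetical flag name
                self.log_level = int(arg.split("=", 1)[1])
        if self.initialized:
            # One-time initialization is skipped on successive calls.
            return True
        self.init_runs += 1
        self.initialized = True
        return True
```

Calling `global_init` twice with different flag values updates `log_level` both times while `init_runs` stays at 1.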