Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15865
Factored out the code used in the tests for the Add, Mul and Sub operators
into two new methods: the first generates the test vectors, and the second
runs the actual tests given a caffe2 operator and a Python operator.
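A minimal sketch of that two-method split, for illustration only. The names `generate_test_vectors`, `run_binary_op_test` and `elementwise` are hypothetical and are not the helpers added by the PR; the "operator under test" here is a pure-Python stand-in rather than a real caffe2 operator.

```python
import operator
import random


def generate_test_vectors(num_cases=3, size=4, seed=0):
    """Return a list of (a, b) input pairs; seeded so runs are repeatable."""
    rng = random.Random(seed)
    return [
        (
            [rng.uniform(-1.0, 1.0) for _ in range(size)],
            [rng.uniform(-1.0, 1.0) for _ in range(size)],
        )
        for _ in range(num_cases)
    ]


def run_binary_op_test(op_under_test, reference_op, vectors):
    """Compare op_under_test against an element-wise Python reference."""
    for a, b in vectors:
        expected = [reference_op(x, y) for x, y in zip(a, b)]
        actual = op_under_test(a, b)
        assert len(actual) == len(expected)
        for got, want in zip(actual, expected):
            assert abs(got - want) < 1e-6, (got, want)
    return len(vectors)


def elementwise(op):
    """Stand-in for the operator under test (hypothetical, not caffe2)."""
    return lambda a, b: [op(x, y) for x, y in zip(a, b)]


vectors = generate_test_vectors()
for ref in (operator.add, operator.mul, operator.sub):
    run_binary_op_test(elementwise(ref), ref, vectors)
```

Sharing one vector generator means every operator is exercised on the same inputs, and a new operator only needs one extra call to the runner.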
Reviewed By: houseroad
Differential Revision: D13526955
fbshipit-source-id: 8970ba5a1305ca19a54a14b51816d4a19f19d678
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15553
Add unit test and implementation of NHWC layout for Resize operator.
Also, add a parallel-loop pragma to the old NCHW layout.
Reviewed By: jspark1105
Differential Revision: D13540762
fbshipit-source-id: eebf252bf0d1efdff180a171d804181045f100a5
Summary:
Enable conv+add fusion, same as conv+sum.
Caution: on IDEEP, only element-wise add without scalar broadcast is
supported. Otherwise, the fusion is illegal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15268
Differential Revision: D13577375
Pulled By: yinghai
fbshipit-source-id: 92c9c4b667c5ca5f7a262a5bffaa8aa68eeff3bd
Summary:
Hello,
This is a little patch to fix `DeprecationWarning: invalid escape sequence`.
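For context, this warning typically appears when a regular string literal contains a backslash sequence that Python does not recognize, such as `\d`. The sketch below shows the usual remedy, a raw string; `VERSION_PATTERN` and `find_version` are hypothetical names and are not taken from the patch.

```python
import re

# A regular literal containing backslash-d triggers the warning, because
# backslash-d is not a recognized string escape. A raw string passes the
# backslashes through to the regex engine unchanged.
VERSION_PATTERN = r"\d+\.\d+"


def find_version(text):
    """Return the first 'major.minor' number in text, or None."""
    match = re.search(VERSION_PATTERN, text)
    return match.group(0) if match else None
```

Doubling each backslash in a regular literal would also silence the warning, but raw strings keep the pattern readable.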
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15733
Differential Revision: D13587291
Pulled By: soumith
fbshipit-source-id: ce68db2de92ca7eaa42f78ca5ae6fbc1d4d90e05
Summary:
Support size 0 in any of the tensor dimensions in mkldnn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15295
Differential Revision: D13573747
Pulled By: yinghai
fbshipit-source-id: 5bf7a0b9e2567e80f44981a7823be5407fc94e53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15625
3D group conv (both NCHW and NHWC layouts) was not correct.
Added group=2 to test_1d_convolution and test_3d_convolution in conv_test.
Reviewed By: protonu
Differential Revision: D13562099
fbshipit-source-id: 586e8a7574a2764f2a3b559db6c2415b3ab90453
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15417
Right now the way we test whether a Blob contains a CPU tensor in `PythonOpBase` is broken, which means the non-CPU path might never be taken.
Searching through the codebase, the non-GPU path is used in PythonDLPack, which in turn is used in PytorchOp, which is unused. So we'll remove the non-GPU path in this diff.
Reviewed By: dzhulgakov
Differential Revision: D13495011
fbshipit-source-id: 9fe9537f05026d2a2cf7051efa81d184de722710
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15632
Just formatting and a few lints.
Reviewed By: yinghai
Differential Revision: D13562403
fbshipit-source-id: c56f8ee61f68cdaccc0828a764ff729454f68259
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15588
Use the NHWC2NCHW and NCHW2NHWC functions, which are easier to understand than code using transpose and which generalize to non-2D convolutions.
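To illustrate why such helpers generalize beyond 2D, here is a small sketch of the axis permutations involved. The function names below are hypothetical and only compute the axis order; they are not the caffe2 helpers themselves.

```python
def nhwc_to_nchw_axes(ndim):
    """Axis order that moves the trailing channel axis to position 1."""
    if ndim < 3:
        raise ValueError("need batch, at least one spatial dim, and channels")
    return (0, ndim - 1) + tuple(range(1, ndim - 1))


def nchw_to_nhwc_axes(ndim):
    """Axis order that moves the channel axis from position 1 to the end."""
    if ndim < 3:
        raise ValueError("need batch, channels, and at least one spatial dim")
    return (0,) + tuple(range(2, ndim)) + (1,)


def permute_shape(shape, axes):
    """Shape a tensor would have after transposing with the given axes."""
    return tuple(shape[a] for a in axes)
```

The same two functions cover 1D, 2D and 3D convolutions, whereas a hard-coded `(0, 3, 1, 2)` transpose only works for 4D tensors.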
Reviewed By: csummersea
Differential Revision: D13557674
fbshipit-source-id: c4fdb8850503ea58f6b17b188513ae2b29691ec0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15082
We didn't have a unit test for low-precision rowwise adagrad.
Reviewed By: chocjy
Differential Revision: D13300732
fbshipit-source-id: 46e7bdfc82c5a6855eeb6f653c0a96b0b3a20546
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15389
SparseLengthsMean was generating uninitialized data for empty inputs (lengths == 0). We should return zeros.
The unit tests also did not cover this special case; this diff fixes that.
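The intended behavior can be sketched with a pure-Python reference, assuming the usual semantics of gathering rows of `data` by `indices` and averaging them per segment. This is an illustrative reference only, not the operator's implementation.

```python
def sparse_lengths_mean(data, indices, lengths):
    """Average the gathered rows of each segment; empty segments yield zeros."""
    width = len(data[0]) if data else 0
    out = []
    pos = 0
    for n in lengths:
        row = [0.0] * width  # initialized up front, so n == 0 returns zeros
        for idx in indices[pos:pos + n]:
            for j, value in enumerate(data[idx]):
                row[j] += value
        if n > 0:
            row = [value / n for value in row]
        out.append(row)
        pos += n
    if pos != len(indices):
        raise ValueError("sum(lengths) must equal len(indices)")
    return out
```

Guarding the division with `n > 0` and initializing the row before the loop is what keeps an empty segment well defined.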
Reviewed By: salexspb
Differential Revision: D13515970
fbshipit-source-id: 3c35265638f64f13f0262cee930c94f8628005da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15458
Many nets in the wild seem to have outputs that are never produced by the net.
Reviewed By: ZolotukhinM
Differential Revision: D13534185
fbshipit-source-id: 2b23b39c28404c53f68868f3bf6df53c5fea9eab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15453
Just move things around to facilitate further development. No logic change.
Reviewed By: rdzhabarov
Differential Revision: D13533959
fbshipit-source-id: eebab1306939e802aacffb24a711d372fd67916c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15174
Previously, Caffe2 maintained a separate per-thread per-device
current logical CUDA stream ID. In this PR, we switch Caffe2 over
to using c10::Stream to manage the current stream, and also
manage the allocation of cudaStream_t objects.
This results in a slight behavior change: previously, Caffe2
would have been willing to allocate an arbitrary number of
CUDA streams, depending on how high the logical stream IDs
went. The c10::Stream pool has a fixed number of streams; once
you exceed it, it wraps around.
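The wrap-around behavior can be pictured with a toy round-robin pool. This is only a sketch of the idea under the assumption that streams are handed out in order and reused once the pool is exhausted; `RoundRobinStreamPool` is a hypothetical name, the strings stand in for real stream handles, and the pool size is arbitrary.

```python
import itertools


class RoundRobinStreamPool:
    """Toy fixed-size pool: requests beyond the pool size reuse streams."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("pool size must be positive")
        # Placeholder handles; a real pool would hold stream objects.
        self._streams = ["stream-%d" % i for i in range(size)]
        self._cycle = itertools.cycle(self._streams)

    def acquire(self):
        """Return the next stream, wrapping around after the last one."""
        return next(self._cycle)
```

With a pool of three, the fourth request returns the first stream again, so work that previously ran on distinct streams may now share one.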
Reviewed By: dzhulgakov
Differential Revision: D13451550
fbshipit-source-id: da6cf33ee026932a2d873835f6e090f7b8a7d8dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15366
Swap the old implementation for one that is slightly easier to understand.
I ran the tests and compared the number of chains against the old algorithm. The new one produces fewer chains on every test, but we have yet to see whether that affects performance at all.
old chain 34 nomnigraph chain 25
old chain 46 nomnigraph chain 34
old chain 228 nomnigraph chain 188
old chain 397 nomnigraph chain 338
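For a rough sense of scale, the relative reduction in chain count can be computed directly from the four pairs above. The helper name `relative_reduction` is just for illustration.

```python
# (old chain count, nomnigraph chain count) pairs from the test runs above.
chain_counts = [(34, 25), (46, 34), (228, 188), (397, 338)]


def relative_reduction(old, new):
    """Fraction by which the chain count dropped relative to the old count."""
    return (old - new) / old


for old, new in chain_counts:
    print("old %d -> new %d: %.1f%% fewer chains"
          % (old, new, 100 * relative_reduction(old, new)))
```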
Reviewed By: ilia-cher
Differential Revision: D13057451
fbshipit-source-id: ccd050bfead6eb94ab9c7b0a70b09a22c2b9e499
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15250
This adds `__repr__` methods to all of the classes in task.py. This makes the objects much easier to work with interactively, such as in a Jupyter notebook.
The default `__repr__` method just returns the object ID, which is very unhelpful.
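As a small illustration of the difference, consider the sketch below. `ExampleTask` and its fields are hypothetical and do not correspond to any particular class in task.py.

```python
class ExampleTask:
    """Hypothetical class used only to show a descriptive __repr__."""

    def __init__(self, name, num_steps):
        self.name = name
        self.num_steps = num_steps

    def __repr__(self):
        # Show the fields a reader needs, in constructor-call form.
        return "{}(name={!r}, num_steps={!r})".format(
            type(self).__name__, self.name, self.num_steps
        )
```

Typing `ExampleTask("train", 3)` at a prompt then echoes the name and step count instead of an opaque object address.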
Reviewed By: hanli0612
Differential Revision: D13475758
fbshipit-source-id: 6e1b166ec35163b9776c797b6a2e0d002560cd29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15191
OSS:
Just splitting out the basic flags from a unit test so that I can extend them in another test that needs additional flags.
Reviewed By: yinghai
Differential Revision: D13159184
fbshipit-source-id: 9823e792cf0ed8d0379235c44564862b7d784845
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15110
Support casting to string on CPU.
Reviewed By: intermilan
Differential Revision: D13429381
fbshipit-source-id: b737a1ba1237b10f692d5c42b42a544b94ba9fd1
Summary:
The speed-up of a single operation is up to 3x.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15106
Differential Revision: D13429596
Pulled By: bddppq
fbshipit-source-id: f8d987cafeac9bef9c3daf7e43ede8c6a4ee2ce5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14631
Add an empty name scope to allow people to jump out of the current namescope.
This could be useful when you want to access a blob from a parent or sibling scope.
Facebook:
e.g. we encountered a potential use case in D13124249 (it's a large diff, please search for EmptyNameScope in that diff): we need to access a blob declared in the root namescope from a device namescope (the device namescope is used by the parallel_GPU API). `EmptyNameScope` can help us do that with ease.
I referred to `EmptyDeviceScope` (D6103412) while implementing this one.
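The idea can be sketched with a pair of context managers. This is a self-contained illustration and does not use caffe2; `name_scope`, `empty_name_scope`, `current_name_scope` and `scoped_name` are hypothetical names, and the "/" separator is an assumption.

```python
import contextlib
import threading

_state = threading.local()  # per-thread scope prefix


def current_name_scope():
    return getattr(_state, "prefix", "")


@contextlib.contextmanager
def name_scope(name):
    """Append `name` to the current prefix for the duration of the block."""
    previous = current_name_scope()
    _state.prefix = previous + name + "/"
    try:
        yield
    finally:
        _state.prefix = previous


@contextlib.contextmanager
def empty_name_scope():
    """Temporarily reset the prefix to the root scope, then restore it."""
    previous = current_name_scope()
    _state.prefix = ""
    try:
        yield
    finally:
        _state.prefix = previous


def scoped_name(blob):
    return current_name_scope() + blob
```

Inside `with name_scope("gpu_0"):`, `scoped_name("w")` yields `gpu_0/w`; wrapping the call in `with empty_name_scope():` yields plain `w`, and the outer prefix is restored on exit.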
Reviewed By: yinghai
Differential Revision: D13272240
fbshipit-source-id: d4cde5abcc2336e456b6c6ef086266ef94d86da8
Summary:
…done once
This allows no-op builds to work correctly even when BUILD_CAFFE2_OPS is on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14982
Differential Revision: D13413960
Pulled By: zdevito
fbshipit-source-id: 6e5412a8c375af8a47c76f548cdd31cff15f3853
Summary:
Currently in caffe2, one cannot properly fetch the content of Int8 blobs.
After digging through the source code, it turns out that the relevant source file is not being compiled. Adding it to CMakeLists.txt fixes this issue.
First time ever doing a pull request. Please let me know if there's any rule I should follow. Thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15047
Differential Revision: D13417583
Pulled By: bddppq
fbshipit-source-id: dd39575971a3012635edbf97a045d80e4b62a8eb
Summary:
Fix autograd summing for IfOp where an intermediate output needs renaming.
Bug before this diff:
- we only renamed the output of IfOp without changing the outputs of the subnet ops
- this resulted in a blob-not-found error
The unit test provides an example.
This diff fixes that for IfOp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14772
Differential Revision: D13327090
Pulled By: harouwu
fbshipit-source-id: ec40ee88526ace3619c54551e223dd71158a02f8
Summary:
This will let us install tests and other Caffe2 python code as a part of running Caffe2 tests in PyTorch.
Broken out of https://github.com/pytorch/pytorch/pull/13733/
cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14898
Reviewed By: pjh5
Differential Revision: D13381123
Pulled By: orionr
fbshipit-source-id: 0ec96629b0570f6cc2abb1d1d6fce084e7464dbe