Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12685
In this diff, we push the fake run of the net into the ONNXIFI transformer, because
1. We cannot do shape inference for every op
2. Since the net has been SSA rewritten, we cannot use shape info from outer workspace directly.
In addition, this diff adds input shape info when querying the `onnxBackendCompatibility` function.
Reviewed By: bddppq
Differential Revision: D10390164
fbshipit-source-id: 80475444da2170c814678ed0ed3298e28a1fba92
Summary:
Simple additions that make it vastly easier to use nomnigraph in
python
Reviewed By: duc0
Differential Revision: D10383027
fbshipit-source-id: 441a883b84d4c53cca4f9c6fcc70e58692b8f782
Summary:
This seems to be a typo that never got caught - no actual functionality changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12688
Differential Revision: D10391704
Pulled By: Yangqing
fbshipit-source-id: ce633776957628c4881956c5423bfab78294d512
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12661
Was disabled since workspace.has_mkldnn is now set to false
Reviewed By: yinghai
Differential Revision: D10383913
fbshipit-source-id: ad6dc705f0606b3711e8b450dc384ad3ebb87686
Summary:
The pytorch.org site redirects all of the http:// requests to the https:// site anyway, so the comments and error messages might as well refer directly to the https:// site. The GitHub project description should also be updated to point to https://pytorch.org
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12636
Differential Revision: D10377099
Pulled By: soumith
fbshipit-source-id: f47eaba1dd3eecc5dbe62afaf7022573dc3fd039
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12382
implement fp16-> (uint8 + scale and bias in fp32)
this is similar to fp32 rowwise quantization
we could have done scale and bias in fp16 but not too motivated since we are not saving much and those datatypes have to be converted to fp32 to process since x86 doesn't support half float operations anyways
Reviewed By: csummersea
Differential Revision: D10220463
fbshipit-source-id: 6c382026de881f03798c2e5fc43abfc80f84ea1f
Summary:
This resolves an issue where the `model.Copy` operation would
copy to the wrong GPU, such that the below `net.Sum` operation
would use an input argument for which p2p access was not enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12342
Differential Revision: D10343181
Pulled By: ezyang
fbshipit-source-id: fd2d6d0ec6c09cda2db0a9a4f8086b3560e5a3ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12178
Fisher GAN calls processor_util.add_mlp, which inject the layer norm through the
normalizer. We allow to use alternative impl for LayerNorn in the normalizer.
Reviewed By: Wakeupbuddy
Differential Revision: D9235528
fbshipit-source-id: 88c126c658102926613242ef84a481f6de1676ed
Summary:
1. Fix BN translator
IntelCaffe and NVCaffe fuse BN+Scale, and the "BatchNorm" op contains 5 params including (scale and bias)
2. Fix Scale translator
the translated outputs of scale have the same names with those of Conv.
All their names are output + '_w' and output + '_b'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10056
Differential Revision: D10099205
Pulled By: yinghai
fbshipit-source-id: 73a73868e3e16c495e8b233fdb1d373d556a9537
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12390
Introduce a no op optimizer for when we don't want updates to happen, but don't want to affect downstream processes.
Reviewed By: mlappelbaum
Differential Revision: D10209812
fbshipit-source-id: 2af4ebc0fb42e78ea851c3a9f4860f3d224037b6
Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. Disabling those tests on that config only, which is okay to do because we are still running those tests in other test jobs.
After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389
Differential Revision: D10224267
Pulled By: yf225
fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12180
I had to fix a lot of call sites, because a lot of places assume that
you can actually get a const vector&, and if the internal representation
of sizes in a tensor is NOT a vector, it's not possible to fulfill
this API contract.
Framework changes:
- I deleted TensorImpl::dims(); caffe2::Tensor::dims() just forwards to
sizes() now.
- De-templatized SetDims; now it is an explicit list of ArrayRef and
variadic overloads. This makes implicit conversions work again,
so I don't need to explicitly list the std::vector cases too.
- As a knock-on effect, this causes Reset() to accept at::IntList as well as
const std::vector<int64_t>&
- Edited variadic overloads of SetDims to all forward to the underlying
arbitrary-dim implementation, reducing code duplication. (It's probably
marginally less efficient in the new world.)
- Replace Tensor constructor accepting const std::vector<int64_t>& with at::IntList
- Make MKLTensor accept ArrayRef along with vector in constructor and
Reset (unfortunately, no implicit conversions here, since it's templated on
index type.)
- There are a few other places, like cudnn, where I changed functions
that previously took const std::vector<int64_t>& to take at::IntList
instead.
Classification of call site changes:
- 'const std::vector<int64_t>& x_dims = x.dims()' ==>
'at::IntList x_dims = x.dims()'
- 'std::vector<int64_t> x_dims = x.dims()' ==>
'std::vector<int64_t> x_dims = x.dims().vec()' (we need a copy!)
Usually this is because we're about to mutably modify the vector
to compute some new dimension. However, it also very commonly occurs in the
form: 'x_dims_ = x.dims()' because we frequently cache sizes in operators.
- Instead of constructing std::vector<int64_t>{blah, blah}, construct an
at::IntList directly
ArrayRef changes:
- cbegin()/cend() iterators, they operate the same aas begin()/end() because
everything on ArrayRef is const.
- Moved operator<< into ArrayRef.h, so that it's always available when
working with ArrayRef. I also templated it, so it now works on an
ArrayRef of any type.
- Add operator== overload for ArrayRef, and also add variants to permit
comparison of ArrayRef with std::vector, a very common operation.
(The non-templated version of operator== can get these automatically
via implicit conversion, but with templates C++ refuses to do
any explicit conversions.)
I'm planning to audit all dims() call sites to make sure they don't
expect 'auto x = t.dims()' to give you an x whose lifetime can validly
outlive the tensor.
I opted not to do a dims() to sizes() rename, because dims() also matches
the protobufs accessor. Bad news!
Reviewed By: jerryzh168
Differential Revision: D10111759
fbshipit-source-id: a2a81dc4b92c22ad4b3b8ef4077a7e97b6479452
Summary:
All usages of the `ndarray` construct have now been guarded with `USE_NUMPY`. This eliminates the requirement of NumPy while building PyTorch from source.
Fixes#11757
Reviewed By: Yangqing
Differential Revision: D10031862
Pulled By: SsnL
fbshipit-source-id: 32d84fd770a7714d544e2ca1895a3d7c75b3d712
Summary:
If block is missing control inputs when do caffe2 net execution, this PR add them back and remove the un-SSA semantics
jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12224
Differential Revision: D10135408
Pulled By: wanchaol
fbshipit-source-id: 746c870bde54ed4ca627167361db1b3f36cd235c
Summary:
Original commit changeset: f5614a5d2607
D9986213 is causing Multifeed Aggregator a [huge performance different](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) and is blocking aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock aggregator push.
Reviewed By: orionr
Differential Revision: D10123245
fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11349
Special case BatchGather and BatchGatherGradient for block_size=1. This makes BatchGather 3-4X faster and BatchGatherGradient 10X for this case.
Reviewed By: jspark1105, ilia-cher
Differential Revision: D7218043
fbshipit-source-id: ea12042239a8adc92b9efcbd0b66e354fb43f4c7
Summary:
the speed-up of a single operation is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11686
Reviewed By: yinghai
Differential Revision: D9828129
Pulled By: wesolwsk
fbshipit-source-id: 7dbacea90609e18438f6fe1229c641937d0696c8
Summary:
This does 6 things:
- add c10/util/Registry.h as the unified registry util
- cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10
Also, an important note that we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077
Reviewed By: ezyang
Differential Revision: D10050771
Pulled By: Yangqing
fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12021
TestPilot runs stress tests in parallel. These fail for serialized tests because extracting (and subsequent deletion) of binary data during the process isn't threadsafe. Extract zips into tempfile to avoid this problem.
Also remove some accidentally checked in zips of a test that we didn't end up including for now.
Reviewed By: houseroad
Differential Revision: D10013682
fbshipit-source-id: 6e13b850b38dee4106d3c10a9372747d17b67c5a
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.
This is a codemod by mechanically doing the following change:
CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019
Reviewed By: ezyang, teng-li
Differential Revision: D10016276
Pulled By: Yangqing
fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12043
Re-trying D9979976, this time with all call sites fixed.
D9979976 got reverted because there was a call site that wasn't covered by sandcastle it seems.
I fixed it and used 'grep' to ensure there aren't any more call sites in fbsource.
Reviewed By: ezyang
Differential Revision: D10026392
fbshipit-source-id: cd341514a8e53a40147ea0ee3e52f63bb6444157
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12020
- make it less verbose to create random blobs in python unit test by adding some test helper methods
- move str_compare test helper method to test_util.py
Reviewed By: ZolotukhinM
Differential Revision: D10003637
fbshipit-source-id: cb79d2ad508341f750a1bb8f564e87d055c65652
Summary: The controller you requested could not be found. Original commit changeset: 2ea17724e223
Differential Revision:
D10026321
Ninja: stable broken
fbshipit-source-id: faf87cb7cc0f78c2c10d4aa6fceea279cd27acd6