Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40139
This unit test runs the same set of operations locally and then with
DDP + RPC to verify correctness.
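A minimal sketch of the test's structure (not the actual test code; the model, shapes, and single-process gloo setup are assumptions for illustration): run the same ops locally, run them again under DDP, and compare outputs and gradients.

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Single-process gloo group so the sketch runs standalone.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    torch.manual_seed(0)
    model = torch.nn.Linear(4, 2)
    inputs = torch.randn(8, 4)

    # Local run.
    local_out = model(inputs)
    local_out.sum().backward()
    local_grads = [p.grad.clone() for p in model.parameters()]
    for p in model.parameters():
        p.grad = None

    # Same ops under DDP; with world_size 1 the allreduce is an identity.
    ddp_model = DDP(model)
    ddp_out = ddp_model(inputs)
    ddp_out.sum().backward()

    assert torch.allclose(local_out, ddp_out)
    for g, p in zip(local_grads, ddp_model.parameters()):
        assert torch.allclose(g, p.grad)

    dist.destroy_process_group()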
ghstack-source-id: 106287490
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/:ddp_under_dist_autograd
I ran the following to make sure I was working on a clean git repo:
git submodule update --init --recursive
This pulls the latest TensorPipe code; otherwise the build fails.
Install, recording the installed binaries and torch package wheels to system paths:
with-proxy env BUILD_CAFFE2_OPS=0 USE_CUDA=0 USE_MKLDNN=0 USE_DISTRIBUTED=1 python setup.py install --record files.txt
Remove those binaries and torch package wheels from system paths:
xargs rm -rf < files.txt
Build in develop mode:
with-proxy env BUILD_CAFFE2_OPS=0 USE_CUDA=0 USE_MKLDNN=0 USE_DISTRIBUTED=1 python setup.py develop
pytest test/distributed/test_ddp_under_dist_autograd.py::TestDdpUnderDistAutograd -v
Differential Revision: D22084385
fbshipit-source-id: e1f57e86ceddd4c96920ed904898e1763b47e8f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37968
Modify memory format promotion rules to avoid promoting when one of the inputs is ambiguous. The new rules are (illustrated in the sketch after the list):
Ambiguous + Contiguous = Contiguous
Ambiguous + Channels Last = Channels Last
Contiguous + Ambiguous ( NC11 ) = Contiguous
Contiguous + Channels Last = Contiguous ( + Warning ) Before this PR: Channels Last
Channels Last + Contiguous = Channels Last ( + Warning )
Channels Last + Ambiguous = Channels Last
Bias + Channels Last = Channels Last
Channels Last + Bias = Channels Last
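A quick way to observe the new rules in eager mode (a sketch; shapes are illustrative, and NC11 tensors are ambiguous because their contiguous and channels-last strides describe the same memory layout):

    import torch

    contig = torch.randn(2, 3, 4, 4)  # contiguous
    cl = torch.randn(2, 3, 4, 4).contiguous(memory_format=torch.channels_last)
    ambig = torch.randn(2, 3, 1, 1)   # NC11: ambiguous

    print((ambig + contig).is_contiguous())                                # Ambiguous + Contiguous = Contiguous
    print((ambig + cl).is_contiguous(memory_format=torch.channels_last))   # Ambiguous + Channels Last = Channels Last
    print((contig + cl).is_contiguous())                                   # Contiguous + Channels Last = Contiguous (+ warning)
    print((cl + contig).is_contiguous(memory_format=torch.channels_last))  # Channels Last + Contiguous = Channels Last (+ warning)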
Test Plan: Imported from OSS
Differential Revision: D21819573
Pulled By: VitalyFedyunin
fbshipit-source-id: 7381aad11720b2419fb37a6da6ff4f54009c6532
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/387
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39985
AVX2-optimized 2/4-bit row-wise quantization/dequantization in perfkernels.
This diff slightly changes the numerics of quantization by multiplying by the inverse of the scale instead of dividing by the scale.
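A minimal NumPy sketch of the scheme (not the perfkernels implementation; the function name is illustrative and the fused byte packing is omitted):

    import numpy as np

    def rowwise_nbit_quantize(row, bit_rate):
        # 2**bit_rate - 1 quantization levels per row (3 for 2-bit, 15 for 4-bit).
        num_levels = (1 << bit_rate) - 1
        xmin, xmax = row.min(), row.max()
        scale = (xmax - xmin) / num_levels if xmax > xmin else 1.0
        # The change in this diff: compute 1/scale once per row and multiply,
        # instead of dividing every element by scale.
        inv_scale = 1.0 / scale
        q = np.clip(np.round((row - xmin) * inv_scale), 0, num_levels)
        return q.astype(np.uint8), np.float32(scale), np.float32(xmin)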
Test Plan:
On my devserver:
for i in 2 4 8; do echo $i; buck run mode/opt :fused_rowwise_nbit_conversion_bench -- --bit-rate=$i; done
Before this diff
2-bit
3.35394 ms. 100%. FloatToFused2BitRowwiseQuantized
4-bit
3.60351 ms. 100%. FloatToFused4BitRowwiseQuantized
8-bit
0.434467 ms. 100%. FloatToFused8BitRowwiseQuantized
After this diff
2-bit
0.606386 ms. 100%. FloatToFused2BitRowwiseQuantized
4-bit
0.446683 ms. 100%. FloatToFused4BitRowwiseQuantized
8-bit
0.4349 ms. 100%. FloatToFused8BitRowwiseQuantized
Reviewed By: choudharydhruv, jianyuh
Differential Revision: D22033195
fbshipit-source-id: d3a219e47b8345268d90a160c9314ed0d5b71467
Summary: NVIDIA's Apex is updating to no longer rely on this behavior, but we're reverting this Python2->Python3 update to unblock internal Apex users.
Test Plan: Sandcastle + OSS CI.
Reviewed By: ngimel
Differential Revision: D22146782
fbshipit-source-id: f9483d2cbf9dc3a469ad48a6c863edea3ae51070
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40318
Rename the layernorm fakefp16 op to follow the right naming convention and
add it to the map of replacement ops.
This can be done even if the operator is not complete because we are
blacklisting anyway.
Test Plan: Ran net_runner and inspected the log to confirm the replacement happened.
Reviewed By: venkatacrc
Differential Revision: D22145900
fbshipit-source-id: f19794ec05234b877f7697ed8b05dd8f46606c47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40249
Blocking wait didn't work for dist.barrier() since we performed a
cudaDeviceSynchronize() before any of the timeout checks. As a result, in
case of failures/desync the barrier() call would get stuck on
cudaDeviceSynchronize() and would never return a timeout error to the user.
To fix this, I've moved the device synchronization after the timeout checks.
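A hedged Python sketch of the reordered control flow (the real change is in the C++ NCCL process group; names here are illustrative):

    import time
    import torch

    def blocking_wait(work, timeout_seconds):
        start = time.monotonic()
        while not work.is_completed():
            if time.monotonic() - start > timeout_seconds:
                # The timeout check now runs BEFORE device synchronization,
                # so a desynced barrier() reports an error instead of hanging.
                raise RuntimeError("Operation timed out")
            time.sleep(0.001)
        # Device synchronization (cudaDeviceSynchronize), moved after the
        # timeout checks.
        torch.cuda.synchronize()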
ghstack-source-id: 106250153
Test Plan: waitforbuildbot
Differential Revision: D22126152
fbshipit-source-id: d919a7a6507cca7111d8ad72e916777b986d0d67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40296
1. Added a link to parameter server tutorial
2. Explained current states for TorchScript support
Test Plan: Imported from OSS
Differential Revision: D22142647
Pulled By: mrshenli
fbshipit-source-id: ffd697dd64a3aa874cf3f3488122ed805903370d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40276
- add a couple of new namespaces;
- handle the case where both contextual namespace and operator namespace
are set (BackendSelectRegister.cpp and #39401);
- improve the error message;
Test Plan: Imported from OSS
Differential Revision: D22135686
Pulled By: ljk53
fbshipit-source-id: 14d359c93573349b8fe1e05d7e44d875295a5f6d
Summary:
Make `common_utils.TestCase.precision` a property, because it is overridden as such in `common_device_type`.
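A minimal sketch of the shape of the change (not the actual common_utils code): exposing precision as a property on the base class keeps it consistent with subclasses that override it as a property.

    class TestCase:
        _precision = 1e-5

        @property
        def precision(self):
            return self._precision

        @precision.setter
        def precision(self, value):
            self._precision = value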
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40057
Differential Revision: D22138385
Pulled By: malfet
fbshipit-source-id: 0e7c14654bf60f18f585efc61f96fdd0af23346f
Summary:
Update pytorch/onnx docs for the new export API args:
use_external_data_format and training.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39802
Reviewed By: hl475
Differential Revision: D22139664
Pulled By: houseroad
fbshipit-source-id: 7d6dcf75129cb88987f8c37b7d9d48ca594c0f38
Summary:
Remove black_listed_operators for opset 12 as we now support these ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39414
Reviewed By: hl475
Differential Revision: D21915584
Pulled By: houseroad
fbshipit-source-id: 37ec7bdd2b5a845484535054026d6613d0921b7a
Summary: Enhance the SLS test to reflect the shapes and values.
Test Plan: Ran SLS tests on device and emulator.
Reviewed By: amylittleyang
Differential Revision: D22094433
fbshipit-source-id: 610a79433ae6c58f626b5984a3d89d9e1bbf4668
Summary:
This imports a few TensorPipe features:
- a fix to a race condition happening in SHM's use of epoll
- a new XTH channel, that uses a memcpy to transfer between threads of the same process
- a new MPT channel, that chunks and multiplexes tensors over multiple transport event loops
Test Plan: Run in CircleCI
Reviewed By: patricklabatut
Differential Revision: D22140736
fbshipit-source-id: a3cee8a3839d98a42b8438844a9fd24fd85b2744
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39126
`futureResponseMessage` was shadowed in the pipeWrite lambda, which created
some confusion: it was used in the initial error handling, but then a future
of the same name was created when marking the future as completed. This
change removes the shadowing by getting rid of the futureResponseMessage
capture and capturing the message id instead, which also means we no longer
need to copy the future into the lambda.
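A rough Python analogue of the change (the real code is C++; the helpers below are hypothetical stand-ins for the agent's bookkeeping):

    def mark_future_completed(message_id):          # hypothetical helper
        pass

    def mark_future_with_error(message_id, error):  # hypothetical helper
        pass

    def schedule_pipe_write(future_response_message, pipe_write):
        # Capture only the id; previously the whole future was captured and
        # then shadowed by a same-named local inside the callback.
        message_id = future_response_message.message_id

        def on_write_done(error=None):
            if error is not None:
                mark_future_with_error(message_id, error)
            else:
                mark_future_completed(message_id)

        pipe_write(on_write_done)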
ghstack-source-id: 106211353
Test Plan: CI
Differential Revision: D22127398
fbshipit-source-id: c98a53b5630ce487461e4ca9cd72fbd34788298d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39677
Test Plan:
Moved a test class suite between files; since this is a simple code refactor
meant to keep the same functionality, I verified that the test output was the
same before and after the refactor.
The image below shows the output of TestGraphModePostTrainingStatic before the refactor
{F239676498}
This image shows the output of TestQuantizeScript (the renamed version, now in test_quantize_script.py instead of test_quantize.py)
{F239676509}
Differential Revision: D21940638
Pulled By: edmundw314
fbshipit-source-id: 54160a5151aadf3a34bdac2bcaeb52904e6653ed
Summary:
There was a missing '=' in an rpc_sync call in the RPC example (see the illustrative snippet below).
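Illustrative only (the exact doc line isn't reproduced here), assuming an already-initialized RPC agent with a peer named "worker1":

    import torch
    import torch.distributed.rpc as rpc

    # Before (broken): a keyword argument was missing its '='.
    # After:
    ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 3))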
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40280
Differential Revision: D22137619
Pulled By: mrshenli
fbshipit-source-id: f4e4b85f68fd68d29834e199416176454b6bbcc2
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4618
`onnxInputNames_` originated from positional name binding, inherited from C2, where inputs are bound by position. So it's useless to check the name here as long as `onnxInputNames_` is filled. This should save cycles on string comparison.
Test Plan: run it.
Reviewed By: jackm321
Differential Revision: D22104338
fbshipit-source-id: 250463744aa37ed291aebd337e26d573048583ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40187
There were two issues:
1) The hand-written definition included an ambiguous default, which prevented the deprecated signature from being selected. This didn't match the handwritten torch.nonzero; now they do.
2) A parsing bug for empty argument lists meant the signature wasn't being marked as deprecated.
Test Plan: Imported from OSS
Differential Revision: D22118236
Pulled By: gchanan
fbshipit-source-id: a433ce9069fef28aea97cbd76f2adf5a285abd73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38840
The JIT graph executor runs some canonical optimizations such as CSE, dead
code elimination, etc. before constructing the code that the interpreter
executes.
Since we do not have the full JIT in the lite interpreter, any such graph
optimizations must happen AOT.
This diff applies those canonical optimizations to the graph.
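A minimal sketch of the resulting AOT flow (the module is illustrative; optimize_for_mobile is the public mobile optimizer entry point):

    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    class MyModule(torch.nn.Module):
        def forward(self, x):
            y = x + 1
            z = x + 1  # common subexpression, foldable by CSE
            return y * z

    scripted = torch.jit.script(MyModule())
    # Canonical optimizations (CSE, dead code elimination, ...) run ahead
    # of time, since the lite interpreter cannot run them itself.
    optimized = optimize_for_mobile(scripted)
    optimized._save_for_lite_interpreter("model.ptl")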
Test Plan: CI's test_mobile_optimizer.
Reviewed By: dreiss
Differential Revision: D21675855
fbshipit-source-id: 5dd898088ef8250103ccbbb6aa2bbce156a8d61d
Summary:
Previously the module would log some data using `print()`. This can be
a problem when used in contexts where the process expects to write data to
stdout itself. This diff changes the log statements to use `logger` instead.
This makes it similar to other log statements in the same module.
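A sketch of the pattern (message text and names are illustrative):

    import logging

    logger = logging.getLogger(__name__)

    def report_created(module_name):
        # Before: print(...) wrote to stdout, which the host process may own.
        # After: route through the module logger, like the rest of the file.
        logger.info("RemoteModule %s created", module_name)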
Test Plan:
Confirmed nothing unexpected showed up when running:
buck test caffe2/test/distributed/nn/api:remote_module_fork
Differential Revision: D22136172
fbshipit-source-id: a3d144eba6c75925ed684981793c84b36eb45a5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40222
Mention the TensorPipe agent in the RPC docs and give users the information they need to choose which agent to use.
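For instance, the agent is chosen at init time (a sketch; the single-worker setup is just to keep it self-contained):

    import os
    import torch.distributed.rpc as rpc

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    rpc.init_rpc(
        "worker0",
        rank=0,
        world_size=1,
        backend=rpc.BackendType.TENSORPIPE,
    )
    rpc.shutdown()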
ghstack-source-id: 106225711
Test Plan: Export to GitHub, build locally and try out the docs.
Differential Revision: D22116494
fbshipit-source-id: 30703ba8410c40f64e785f60d71dfd9faa8de4a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40270
Original commit changeset: 1227e243ab94
D22082806 (1e03d603c6) broke model generation for PyPer models, which trace a namedtuple as input. To unblock development of the PyPer project, let's revert the diff first.
Sorry about the inconvenience, SplitInfinity
ghstack-source-id: 106217609
Test Plan: buck run dper3/dper3_models/experimental/pytorch/feed:feed_generation_script -- --model_files_dir=/tmp/
Reviewed By: alyssawangqq
Differential Revision: D22132960
fbshipit-source-id: ce9278c8462602a341e231ea890e46f74e743ddf