pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Shen Li	cf7a0e5af4	Use RPC context streams to cover serde ops (#57926 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57926 Test Plan: Imported from OSS Reviewed By: lw Differential Revision: D28316526 Pulled By: mrshenli fbshipit-source-id: 1907ec8f46e40fa5049d810c6ad959263361b6aa	2021-05-11 07:07:51 -07:00
Luca Wehrstedt	36e47af58b	Pass reference to parent future in callbacks (#57635 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57635 Note: this PR looks massive, but it's just one simple change, codemodded many times. In many cases, a callback needs to access the value/error produced by the parent future. In Python this was easy because the callback was invoked with the parent future as argument, and could thus inspect it. In C++ the callbacks didn't take any arguments, thus in many cases we worked around this by capturing the future in its own callback. This is risky (leads to reference cycle and thus memory leak) and must be done carefully (spoiler: sometimes we weren't). ghstack-source-id: 128296580 Test Plan: CI Reviewed By: wanchaol Differential Revision: D28178783 fbshipit-source-id: 6de02c4568be42123372edc008f630d5ddae0081	2021-05-07 03:59:18 -07:00
Luca Wehrstedt	0422e67336	Use Devices instead of DeviceIndexes in TensorPipe agent (#57294 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57294 With the advent of CPUs in the device maps, and to be more generic (e.g., to support AMD GPUs), and to avoid conversions when passing to Future and RRef and such, it's easier to use Devices instead of DeviceIndices. This started by just migrating the TensorPipe agent but the RPC layer is quite intertwined so I had to migrate a lot of stuff. ghstack-source-id: 127916562 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28092733 fbshipit-source-id: 024dcb3648c5898ab13e770413c43958f04f1a8a	2021-05-01 16:12:55 -07:00
Scott Wolchok	b87d3fa432	[PyTorch][jit] Don't allow create() on singleton types (#56807 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56807 If I understand correctly, there's no reason to create your own instance of these global singleton types. ghstack-source-id: 127312270 Test Plan: CI Reviewed By: SplitInfinity Differential Revision: D27973447 fbshipit-source-id: f12df69d185f1baaa45f2ac6eac70570a7a65912	2021-04-30 10:28:50 -07:00
Lucas Hosseini	8868f9c8e3	[TensorPipe] Use targetDevice in tensorpipe_agent. (#56346 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56346 Now that TensorPipe's API has `targetDevice`, use that instead of manually writing the CUDA device index in `metadata`. Test Plan: CI Reviewed By: lw Differential Revision: D27703235 fbshipit-source-id: c5b620e3b3ce619367412efdbe9fa3778f6b8869	2021-04-20 11:54:13 -07:00
Lucas Hosseini	3802e577fb	[TensorPipe] Use Descriptor::Tensor::sourceDevice in tensorpipe_agent. (#55821 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55821 Test Plan: CI Reviewed By: lw Differential Revision: D27661608 fbshipit-source-id: fd241f073d8928528a749758c7d0f570dfeb677b	2021-04-15 03:21:26 -07:00
Lucas Hosseini	047164437e	[TensorPipe] Prepare for new Pipe API. (#55820 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55820 Test Plan: CI Reviewed By: lw Differential Revision: D27648291 fbshipit-source-id: e08db6e8c1f5f333ec355de29e25fbe552904b25	2021-04-15 03:20:32 -07:00
Lucas Hosseini	09f1f14569	Transition to new tensorpipe::Pipe API. (#55193 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55193 Test Plan: CI Reviewed By: lw Differential Revision: D27466387 fbshipit-source-id: 07b831d699f56874dd45f37e448b8c4244ead5e3	2021-04-02 02:28:07 -07:00
Lucas Hosseini	9d6a81d1a6	Avoid aggregate initialization for tensorpipe::{Cpu,Cuda}Buffer and tensorpipe::Message::Tensor. (#55136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55136 This will ease the transition to the new API where `Buffer` does not store a length anymore. Test Plan: CI Reviewed By: lw Differential Revision: D27466385 fbshipit-source-id: 9a167f8c501455a3ab49ce75257c69d8b4869925	2021-04-01 06:55:02 -07:00
Lucas Hosseini	a84afb3a7c	Use type-erased union for Buffer. (#54251 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54251 Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/324 In order to merge the channel hierarchies, we need a generic `Buffer` type, that can wrap either a `CpuBuffer` or a `CudaBuffer`. The constraints are that, since this type is used by the channels, it cannot explicitly refer to `CudaBuffer`. We propose here a type-erasure based solution, with small-buffer optimization to avoid heap-allocating the wrapped concrete buffer. This is a new version of D27001339 (`c618dc13d2`) which broke PyTorch OSS build. Test Plan: CI Reviewed By: lw, mrshenli Differential Revision: D27156053 fbshipit-source-id: 4244302af33a3be91dcd06093c0d6045d081d3cc	2021-03-19 04:58:09 -07:00
Mike Ruberry	8caa7889fc	Revert D27001339: Use type-erased union for Buffer. Test Plan: revert-hammer Differential Revision: D27001339 (`c618dc13d2`) Original commit changeset: 26d7dc19d69d fbshipit-source-id: 6e036ed7e1f71c9cf20e3361607c4fe4fa2d3d02	2021-03-18 05:27:17 -07:00
Lucas Hosseini	c618dc13d2	Use type-erased union for Buffer. (#322 ) Summary: Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/322 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54145 In order to merge the channel hierarchies, we need a generic `Buffer` type, that can wrap either a `CpuBuffer` or a `CudaBuffer`. The constraints are that, since this type is used by the channels, it cannot explicitly refer to `CudaBuffer`. We propose here a type-erasure based solution, with small-buffer optimization to avoid heap-allocating the wrapped concrete buffer. ghstack-source-id: 124131499 Test Plan: CI Reviewed By: lw Differential Revision: D27001339 fbshipit-source-id: 26d7dc19d69d7e3336df6fd4ff6ec118dc17c5b6	2021-03-18 02:23:17 -07:00
Wanchao Liang	a4f0f8b1e9	[distributed] add base processgroup::options (#53662 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53662 Add a base processgroup::options so that we can do inheritance and provide a universal option API in python Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D26968856 Pulled By: wanchaol fbshipit-source-id: 858f4b61b27aecb1943959bba68f8c14114f67d8	2021-03-17 18:40:04 -07:00
Shen Li	c7b1979b6b	Use Store collect and verify names in all RPC agents (#53209 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53209 closes #40048 Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D26791524 Pulled By: mrshenli fbshipit-source-id: fc75589f9707014334fcfae6f05af3c04217783b	2021-03-07 16:51:46 -08:00
Rong Rong (AI Infra)	b0aa03b703	fix tensorpipe_agent linked even when USE_TENSORPIPE is turned off (#53281 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53281 Reviewed By: xuzhao9 Differential Revision: D26822375 Pulled By: walterddr fbshipit-source-id: d4e2b7ed1b38782a9e7f6c5b96b7bb0e31c4bdae	2021-03-04 13:29:27 -08:00
Rohan Varma	3b11822825	[RPC] Refactor rref_context to not use utils::Future (#51697 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51697 Refactors the rest of rref_context, specifically pendingOwners map and `getOwnerRRef` to use JitFuture. ghstack-source-id: 122037611 Test Plan: CI Reviewed By: wanchaol Differential Revision: D26243268 fbshipit-source-id: ab8874c8253274e8fe50dcd7291e0655a8f3f1df	2021-02-19 00:59:38 -08:00
Pritam Damania	fd41ed1cce	Fix flaky TestTrainingLoop - TestE2ETensorPipe (#51939 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51939 TestTrainingLoop - TestE2ETensorPipe was flaky since there would still be inflight background RPCs running as we performed the assertions. This resulted in these assertions failing since we didn't wait for all RPCs on the agent to finish. To resolve this issue, in this PR we join() and shutdown() the RPC agent to ensure no further RPCs are done. Then we assert on the map sizes to ensure no leaks occurred. In addition to this, added a messageIdToTimeout map to look up the appropriate timeout for a messageId. This ensures we remove the appropriate entry from the map. The previous solution was passing the expirationTime through the lambda, but it is not guaranteed the lambda would read the response of the request we just sent out. ghstack-source-id: 121412604 Test Plan: 1) unit tests 2) waitforbuildbot Reviewed By: rohan-varma Differential Revision: D26331585 fbshipit-source-id: a41e0534d7d4dfd240446e661e5541311931c7d7	2021-02-10 22:14:06 -08:00
Pritam Damania	40eea6d9d1	Support device map for distributed autograd while using TensorPipe. (#44859 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44859 TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: https://github.com/pytorch/pytorch/issues/44170 ghstack-source-id: 119950842 Test Plan: 1) waitforbuildbot 2) Unit test added. Reviewed By: mrshenli Differential Revision: D23751975 fbshipit-source-id: 2717d0ef5bde3db029a6172d98aad95734d52140	2021-01-27 13:01:44 -08:00
Pritam Damania	8b501dfd98	Fix memory leak in TensorPipeAgent. (#50564 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50564 When an RPC was sent, the associated future was stored in two maps: pendingResponseMessage_ and timeoutMap_. Once the response was received, the entry was only removed from pendingResponseMessage_ and not timeoutMap_. The pollTimeoutRpcs method then eventually removed the entry from timeoutMap_ after the timeout duration had passed. However, in scenarios where there is a large timeout and a large number of RPCs being used, it is very easy for the timeoutMap_ to grow without any bounds. This was discovered in https://github.com/pytorch/pytorch/issues/50522. To fix this issue, I've added some code to clean up timeoutMap_ as well once we receive a response. ghstack-source-id: 119925182 Test Plan: 1) Unit test added. 2) Tested with repro in https://github.com/pytorch/pytorch/issues/50522 #Closes: https://github.com/pytorch/pytorch/issues/50522 Reviewed By: mrshenli Differential Revision: D25919650 fbshipit-source-id: a0a42647e706d598fce2ca2c92963e540b9d9dbb	2021-01-18 16:34:28 -08:00
Shen Li	098751016e	Completely Remove FutureMessage from RPC cpp tests (#50027 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50027 Test Plan: Imported from OSS Reviewed By: lw Differential Revision: D25753815 Pulled By: mrshenli fbshipit-source-id: 85b9b03fec52b4175288ac3a401285607744b451	2021-01-07 19:50:50 -08:00
Shen Li	008206decc	Replace FutureMessage with ivalue::Future in RRefContext (#49960 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49960 Test Plan: Imported from OSS Reviewed By: lw Differential Revision: D25730530 Pulled By: mrshenli fbshipit-source-id: 5d54572c653592d79c40aed616266c87307a1ad8	2021-01-07 19:50:19 -08:00
Shen Li	25ef605132	Replace FutureMessage with ivalue::Future in distributed/autograd/utils.* (#49927 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49927 Test Plan: Imported from OSS Reviewed By: lw Differential Revision: D25724241 Pulled By: mrshenli fbshipit-source-id: d608e448f5224e41fbb0b5be6b9ac51a587f25b4	2021-01-07 19:50:16 -08:00
Wanchao Liang	553ccccc54	[c10d] switch ProcessGroup to be managed by intrusive_ptr (#47343 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47343 Test Plan: Imported from OSS Reviewed By: gmagogsfm Differential Revision: D24723418 Pulled By: wanchaol fbshipit-source-id: 0463819b96c53b12bdbb3905431110d7b21beb77	2020-11-12 07:36:23 -08:00
Wanchao Liang	665ac2f7b0	[reland] [c10d] switch Store to be managed by intrusive_ptr (#47808 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47808 reland https://github.com/pytorch/pytorch/pull/47074 Test Plan: wait for ci Reviewed By: gmagogsfm Differential Revision: D24905246 fbshipit-source-id: edeb7e6e486570ce889f12512e9dc02061d6cc03	2020-11-11 22:53:20 -08:00
Wanchao Liang	1f946e942d	Revert D24667128: [c10d] switch Store to be managed by intrusive_ptr Test Plan: revert-hammer Differential Revision: D24667128 (`0cfe3451d4`) Original commit changeset: 9b6024c31c85 fbshipit-source-id: d8ddf9eb2fccef5023e05698e0c4662708fe4945	2020-11-11 10:49:58 -08:00
Wanchao Liang	0cfe3451d4	[c10d] switch Store to be managed by intrusive_ptr (#47074 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47074 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D24667128 Pulled By: wanchaol fbshipit-source-id: 9b6024c31c851b7c3243540f460ae57323da523b	2020-11-10 23:36:44 -08:00
Pritam Damania	bf85642c4c	Remove lock from GraphTask::set_exception_without_signal. (#45867 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45867 In most cases the lock ordering was to hold a lock in local autograd and then a lock in DistAutogradContext. In the case of `set_exception_without_signal` the lock order was reversed, and as a result we saw potential deadlock issues in our TSAN tests. To fix this, I removed the lock and instead just used std::atomic exchange. In addition to this, I fixed TestE2E to ensure that we use the appropriate timeout. TestE2EProcessGroup was flaky for these two reasons and is now fixed. ghstack-source-id: 113592709 Test Plan: waitforbuildbot. Reviewed By: albanD Differential Revision: D24120962 fbshipit-source-id: 12447b84ceae772b91e9a183c90d1e6340f44e66	2020-10-05 20:02:29 -07:00
Lucas Hosseini	ac8c7c4e9f	Make Channel API accept buffer structs rather than raw pointers. (#45014 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45014 Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/219 Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/212 + Introduce buffer.h defining the buffer struct(s). The `CpuBuffer` struct is always defined, while the `CudaBuffer` struct is defined only when `TENSORPIPE_SUPPORTS_CUDA` is true. + Update all channels to take a `CpuBuffer` or `CudaBuffer` for `send`/`recv` rather than a raw pointer and a length. + Make the base `Channel`/`Context` classes templated on `TBuffer`, effectively creating two channel hierarchies (one for CPU channels, one for CUDA channels). + Update the Pipe and the generic channel tests to use the new API. So far, generic channel tests are CPU only, and tests for the CUDA IPC channel are (temporarily) disabled. A subsequent PR will take care of refactoring tests so that generic tests work for CUDA channels. Another PR will add support for CUDA tensors in the Pipe. Differential Revision: D23598033 Test Plan: Imported from OSS Reviewed By: lw Pulled By: beauby fbshipit-source-id: 1d6c3f91e288420858835cd5e7962e8da051b44b	2020-09-21 10:18:45 -07:00
Lucas Hosseini	af3fc9725d	Extract rpc/tensorpipe_utils.{cpp,h} from rpc/utils.{cpp,h} (#44803 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44803 Test Plan: CI Reviewed By: lw Differential Revision: D23732022 fbshipit-source-id: 5b839c7997bbee162a14d03414ee32baabbc8ece	2020-09-18 13:51:43 -07:00
generatedunixname89002005287564@sandcastle1415.cln1.facebook.com	1dd658f28f	[Codemod][GleanFbcode] Remove dead includes in caffe2/test (#43953 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43953 Reviewed By: malfet Differential Revision: D23445556 fbshipit-source-id: 89cd6833aa06f35c5d3c99d698abb08cd61ae4ab	2020-09-01 21:48:28 -07:00
Shen Li	06aaf8c20d	Add set_device_map to TensorPipeOptions to support GPU args (#42637 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42637 This commit enables sending non-CPU tensors through RPC using the TensorPipe backend. Users can configure device mappings by calling set_device_map on `TensorPipeRpcBackendOptions`. Internally, the `init_rpc` API verifies the correctness of device mappings. It will shut down RPC if the check fails, or proceed and pass global mappings to `TensorPipeAgent` if the check succeeds. For serde, we added a device indices field to TensorPipe read and write buffers, which should be either empty (all tensors must be on CPU) or match the tensors in order and number in the RPC message. This commit does not yet achieve zero-copy: the tensor is always moved to CPU on the sender and then moved to the specified device on the receiver. Test Plan: Imported from OSS Reviewed By: izdeby Differential Revision: D23011572 Pulled By: mrshenli fbshipit-source-id: 62b617eed91237d4e9926bc8551db78b822a1187	2020-08-14 18:46:55 -07:00
Nikita Shulga	2f9fd8ad29	Build test_e2e_tensorpipe only if Gloo is enabled (#43041 ) Summary: test_e2e_tensorpipe depends on ProcessGroupGloo, therefore it could not be tested with Gloo disabled Otherwise, it re-introduces https://github.com/pytorch/pytorch/issues/42776 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43041 Reviewed By: lw Differential Revision: D23122101 Pulled By: malfet fbshipit-source-id: a8a088b6522a3bc888238ede5c2d589b83c6ea94	2020-08-14 09:24:47 -07:00
Luca Wehrstedt	ed242cbec5	Guard TensorPipe agent by USE_TENSORPIPE (#42682 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42682 ghstack-source-id: 109834351 Test Plan: CI Reviewed By: malfet Differential Revision: D22978717 fbshipit-source-id: 18b7cbdb532e78ff9259e82f0f92ad279124419d	2020-08-14 02:57:36 -07:00
Luca Wehrstedt	8493b0d5d6	Enroll TensorPipe agent in C++-only E2E test (#42680 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42680 ghstack-source-id: 109544678 Test Plan: CI Reviewed By: mrshenli Differential Revision: D22978714 fbshipit-source-id: 04d6d190c240c6ead9bd9f3b7f3a5f964d7451e8	2020-08-13 07:07:30 -07:00
Nikita Shulga	64a7939ee5	test_cpp_rpc: Build test_e2e_process_group.cpp only if USE_GLOO is true (#42836 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42776 Pull Request resolved: https://github.com/pytorch/pytorch/pull/42836 Reviewed By: seemethere Differential Revision: D23041274 Pulled By: malfet fbshipit-source-id: 8605332701271bea6d9b3a52023f548c11d8916f	2020-08-10 16:54:26 -07:00
Luca Wehrstedt	c30bc6d4d7	Update TensorPipe submodule (#42522 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522 Main changes: - Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch. - Changed the way the preprocessor flags are provided, and changed their name. There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header. So instead we link those targets to the tensorpipe target in order for them to pick up the correct include directories. I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one. Test Plan: CI Reviewed By: malfet Differential Revision: D22959472 fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67	2020-08-06 02:14:58 -07:00
Edward Yang	352e15f1a2	Revert D22812445: Update TensorPipe submodule Test Plan: revert-hammer Differential Revision: D22812445 (`2335430086`) Original commit changeset: e6d824bb28f5 fbshipit-source-id: 606632a9aaf2513b5ac949e4d6687aa7563eae5d	2020-07-31 10:16:48 -07:00
Luca Wehrstedt	2335430086	Update TensorPipe submodule (#42225 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42225 Main changes: - Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch. - Changed the way the preprocessor flags are provided, and changed their name. There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header. I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one. Test Plan: CircleCI is all green. Reviewed By: beauby Differential Revision: D22812445 fbshipit-source-id: e6d824bb28f5afe75fd765de0430968174f3531f	2020-07-30 02:32:52 -07:00
Pritam Damania	ff6e560301	Add C++ end to end test for RPC and distributed autograd. (#36893 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36893 Adding an end to end test for running a simple training loop in C++ for the distributed RPC framework. The goal of this change is to enable LeakSanitizer and potentially catch memory leaks in the Future. Enabling LSAN with python multiprocessing is tricky and we haven't found a solution for this. As a result, adding a C++ test that triggers most of the critical codepaths would be good for now. As an example, this unit test would've caught the memory leak fixed by: https://github.com/pytorch/pytorch/pull/31030 ghstack-source-id: 107781167 Test Plan: 1) Verify the test catches memory leaks. 2) waitforbuildbot Reviewed By: mrshenli Differential Revision: D21112208 fbshipit-source-id: 4eb2a6b409253108f6b6e14352e593d250c7a64d	2020-07-15 12:59:19 -07:00
Luca Wehrstedt	72f2ff5950	[TensorPipe] Improve serialization (#39010 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39010 The initial version of the serialization for the TensorPipe RPC agent (i.e., the conversion from rpc::Message to tensorpipe::Message) worked around a limitation of TensorPipe of only allowing one payload per message by pickling each tensor separately and storing the pickles as metadata (which is a less efficient way of sending data over, as it goes through more copies). Having now lifted that limitation, we can improve the way we serialize. We now put the type and the id as their own payloads, we do a single pickling pass for all the tensors of the message (which allows us to deduplicate them) and store the pickle as a payload. My impression is that pickling is a somewhat costly operation, so reducing the number of times we do it should be beneficial for performance. For this same reason, another change I've made here is to separate the allocation of the buffers from the deserialization. This will allow us (in the future) to perform the allocation on the I/O event loop but perform the unpickling in the worker thread, thus keeping the event loop more responsive. ghstack-source-id: 104810740 Test Plan: RPC tests Differential Revision: D21716067 fbshipit-source-id: c1475cc78afdcf0820a485ffd98c91abb35796c7	2020-05-28 10:48:24 -07:00
Luca Wehrstedt	bc09478a60	[TensorPipe] Use the new multi-payload message API (#37919 ) Summary: In D21209901 TensorPipe added support for a vector of payloads inside each message, instead of a single one, so that users with multiple payloads can send them separately as they are instead of having to copy them into a new block of contiguous memory. The PyTorch agent is using the old API, which is preventing us from deleting it. This change has no effects on over-the-wire format and thus on performance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37919 ghstack-source-id: 103572164 Test Plan: On both workers ``` import os import torch import torch.distributed.rpc as rpc os.environ["MASTER_ADDR"] = "127.0.0.1" os.environ["MASTER_PORT"] = "8765" ``` On worker 0 ``` rpc.init_rpc(name="foo", rank=0, backend=rpc.backend_registry.BackendType.TENSORPIPE, world_size=2, rpc_backend_options=rpc.TensorPipeRpcBackendOptions(worker_name_to_id={"foo": 0, "bar": 0})) ``` On worker 1 ``` rpc.init_rpc(name="bar", rank=1, backend=rpc.backend_registry.BackendType.TENSORPIPE, world_size=2, rpc_backend_options=rpc.TensorPipeRpcBackendOptions(worker_name_to_id={"foo": 0, "bar": 0})) ``` On worker 0 ``` In [15]: rpc.rpc_sync("bar", torch.add, args=(torch.full((2,2), 1), torch.full((2,2), 2))) Out[15]: tensor([[3., 3.], [3., 3.]]) In [16]: rpc.rpc_sync("bar", torch.add, args=(1, 2)) Out[16]: 3 ``` Differential Revision: D21425536 fbshipit-source-id: a0ec2be825556b39aff018a2834baf815a6d8fa5	2020-05-07 02:52:30 -07:00
Edward Yang	fe88806784	Back out "Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count" (#37893 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37893 Original commit changeset: 50746043acf3 Test Plan: sandcastle and ossci Reviewed By: malfet, seemethere, ngimel Differential Revision: D21416509 fbshipit-source-id: 735ec4e61f9d36d4537f52dd2dc6267751aeb94b	2020-05-05 22:43:15 -07:00
Edward Yang	a2fc7f787a	Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count Test Plan: revert-hammer Differential Revision: D21171334 Original commit changeset: 37329a379de9 fbshipit-source-id: 50746043acf3c76754688de0fe6f1cc12437ea2f	2020-05-05 16:36:15 -07:00
Kurt Mohler	3706803b60	Change StorageImpl to track byte count rather than element count (#37776 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776 * Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl * Changed numel() and set_numel() to nbytes() and set_nbytes() * Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter * Update all callers of the changed API Part of issue https://github.com/pytorch/pytorch/issues/33950 Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028 Differential Revision: D21171334 Pulled By: ezyang fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8	2020-05-05 14:20:51 -07:00
Hongyi Jia	3411ec6e32	[TensorPipe/RPC] Serialize and deserialize message (#36197 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36197 Create APIs to convert between rpc::message and tensorpipe::message 1. tensorpipeSerialize() - converts rpc::message to tensorpipe::message without memory copy (tensors). 2. tensorpipeAllocateMessage - allocates rpc::message based on received tensorpipe descriptor to prepare memory-copy-free receiving. Test Plan: buck test caffe2/test/cpp/rpc:test_tensorpipe_serialization Reviewed By: lw Differential Revision: D20084125 fbshipit-source-id: ffbc310f93443e50261aed752be0fe176610dd2a	2020-05-05 05:45:57 -07:00
Jeremy Lilley	443fe7ca0e	[rpc] Avoid wireDeserializer overreading buffers by 1 byte (#36976 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36976 The bounds check and the read were swapped in two places - I noticed ASAN complaining in an unrelated change on an erroneous buffer. Adding a couple simple test cases. ghstack-source-id: 102606986 Test Plan: buck test mode/dev caffe2/test/cpp/rpc: Differential Revision: D21148936 fbshipit-source-id: 7ec5007535f7310437ac1b9a72852a223b9dd29a	2020-04-21 17:01:45 -07:00
Nikita Shulga	b9adbb5002	Fix/relax CMake linter rules (#35574 ) Summary: Ignore mixed upper-case/lower-case style for now Fix space between function and its arguments violation Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574 Test Plan: CI Differential Revision: D20712969 Pulled By: malfet fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78	2020-03-27 16:52:33 -07:00
Jeremy Lilley	fff6fe83a7	[pytorch-rpc] WireSerializer should check has_storage() (#34626 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34626 We need to check has_storage() before looking at it in cloneSparseTensors(), to avoid gratuitously throwing. Ideally, we'd add a test for this (I wrote one up but had to disable it), but it won't work until the JIT Pickler supports sparse tensors. ghstack-source-id: 100018077 Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcAgent/... Differential Revision: D20399971 fbshipit-source-id: 5debfa8140eb1f949d37336330223962cc320abc	2020-03-12 11:35:21 -07:00
generatedunixname89002005287564	9482683065	Remove dead includes in caffe2/test Reviewed By: ezyang Differential Revision: D19273220 fbshipit-source-id: 3dfc3388914e60611c84472e3fc529f5b5e40534	2020-01-21 11:30:34 -08:00
Jeremy Lilley	dff7b945bf	Avoid sending large unneeded data over wire in process_group_agent. (#31357 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31357 If a user selects a subset of a Tensor and sends it in an RPC, we were sending the whole original Tensor Storage over the network. While this sounds reasonable, in practice, we observed view-like Tensors being sent over rpc, where only 1% of the data in the provided Tensor's Storage was actually used/needed. The simple solution here is to just force a clone in the serializer code if we see that less than (arbitrary) half the bits are used, and the tensor is more than a nominal few KB. Add related tests to ensure this doesn't break. An alternate approach would be to modify the Pickler. That said, since Pickler is shared by more components, the logic might be harder to tailor appropriately at that layer (particularly given that the Pickler has explicit logic to share a single Storage* among several Tensors that commonly point to the same Storage*). It's possible that we might want to further refine the basic thresholds in this change. In practice, we've seen a mostly bimodal distribution thus far for the percent of Tensor Storage referred by a Tensor in observed rpcs (i.e. either 90%+ or sub-10% of the Storage referenced), hence the existing 50% threshold here is probably not an unreasonable starting point. ghstack-source-id: 95925474 Test Plan: buck test mode/dev caffe2/test/cpp/rpc/... Differential Revision: D19137056 fbshipit-source-id: e2b3a4dd0cc6e1de820fd0740aa1d59883dbf8d4	2019-12-18 19:24:24 -08:00
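
The clone heuristic described in commit dff7b945bf can be sketched roughly as follows. This is an illustration only, not the serializer's actual code: the function name `should_clone_before_send`, its parameters, and the 4 KiB floor are hypothetical stand-ins for the "nominal few KB" and "half" thresholds the message mentions.

```python
def should_clone_before_send(
    view_nbytes: int,
    storage_nbytes: int,
    min_storage_nbytes: int = 4096,  # assumed value for the "nominal few KB" floor
) -> bool:
    """Decide whether to copy a tensor view into its own compact storage.

    view_nbytes:    bytes actually referenced by the tensor being sent
    storage_nbytes: bytes in the underlying storage that would go on the wire
    """
    # Small storages are cheap to send whole, so never bother cloning them.
    if storage_nbytes <= min_storage_nbytes:
        return False
    # Clone only when the view uses less than half of its storage.
    return view_nbytes * 2 < storage_nbytes
```

With the roughly bimodal usage the message reports (either 90%+ or under 10% of the storage referenced), a 50% cutoff separates the two groups without needing fine tuning.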

51 commits
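
The bookkeeping behind commits 8b501dfd98 and fd41ed1cce (remove an RPC from the timeout map as soon as its response arrives, using a per-message lookup of its expiration time) might look like the sketch below. It is a simplified Python model, not the C++ agent: the class `PendingRpcTracker` and its method names are hypothetical, and only the three maps mirror names from the commit messages.

```python
from collections import defaultdict


class PendingRpcTracker:
    """Toy model of the per-RPC bookkeeping described in the commit messages."""

    def __init__(self):
        self.pending_responses = {}           # message id -> future
        self.timeout_map = defaultdict(set)   # expiration time -> message ids
        self.message_id_to_timeout = {}       # message id -> expiration time

    def on_send(self, message_id, future, expiration_time):
        # Record the RPC in all three maps when it is sent.
        self.pending_responses[message_id] = future
        self.timeout_map[expiration_time].add(message_id)
        self.message_id_to_timeout[message_id] = expiration_time

    def on_response(self, message_id):
        # Remove the RPC from every map, not just pending_responses;
        # otherwise timeout_map grows until each timeout finally expires.
        future = self.pending_responses.pop(message_id, None)
        expiration_time = self.message_id_to_timeout.pop(message_id, None)
        if expiration_time is not None:
            ids = self.timeout_map[expiration_time]
            ids.discard(message_id)
            if not ids:
                del self.timeout_map[expiration_time]
        return future
```

Looking up the expiration time by message id, rather than capturing it in the response callback, matches the reasoning in fd41ed1cce: the callback is not guaranteed to handle the response to the request that was just sent.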