Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543
Since now c10d is part of libtorch, it would also be nice if the sources lived all in one place.
ghstack-source-id: 132306292
Test Plan: It builds
Reviewed By: cbalioglu
Differential Revision: D29062002
fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59881
This pass is not included in the JIT flow or anywhere else at this point. The idea is, once this lands, everyone can use this to test their workflow with this transformation and once we are convinced this is useful and/or improves performance, we can include it in the appropriate workflow.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D29277876
Pulled By: navahgar
fbshipit-source-id: b5be7bdcc98dced59295bd7b8f6627619cb58d41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60550
Original commit changeset: ed655497a981
Whatever gcc version OSS Bazel uses wasn't happy move-constructing the
SimpleIREvaluator, so use a unique_ptr instead.
Test Plan:
CI. Hope that the gcc version used by OSS Bazel build is
happier with this (it should be), since actually testing it locally is
an intractable pain.
Reviewed By: navahgar
Differential Revision: D29333116
fbshipit-source-id: c3e4b5d8c91eb96a43ae5315a01ca0c0f4d4a99d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59525
This PR added NNC external function call binding for two XNNPack ops:
- prepacked::linear_clamp_run
- prepacked::conv2d_clamp_run
Both ops take two arguments: a regular input tensor and a prepacked context
object that contains other parameters like weights/bias/etc. The prepacked
context object's type is a custom class.
NNC doesn't generate assembly code that reads the content of the prepacked
object directly. It simply passes it into the XNNPack ops wrapper, so both
NNC and the generated assembly code don't need to know the custom class type.
At compilation time, we use a size-1 dummy tensor as the placeholder for the
prepacked XNNPack context object.
At runtime, we pass in the raw pointer of the XNNPack context object as if it
were a regular tensor storage data pointer.
Inside the external function call wrapper, we reinterpret_cast the raw pointer
back to the custom class type before dispatching to the XNNPack ops.
ghstack-source-id: 132135512
Test Plan: unit test
Reviewed By: bertmaher
Differential Revision: D28924934
fbshipit-source-id: 15326b35dc6c022f4c3f247a2037c361e06e80b4
Summary:
The grad() function needs to return the updated values, and hence
needs a non-empty inputs to populate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52016
Test Plan:
Passes Python and C++ unit tests, and added new tests to catch this behavior.
Fixes https://github.com/pytorch/pytorch/issues/47061
Reviewed By: albanD
Differential Revision: D26406444
Pulled By: dagitses
fbshipit-source-id: 023aeca9a40cd765c5bad6a1a2f8767a33b75a1a
Summary:
Description:
- Before this, logging level could only be changed by changing the env
variable "PYTORCH_JIT_LOG_LEVEL"
- Can change the level from python now
- Have not added stream configuration for now
- Configuration is stored in a singleton class managing the options
Issue Link: https://github.com/pytorch/pytorch/issues/54188
Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
- This is because when running test cases, two different instances
of the singleton are created for the test suite and the actual code
(`jit_log.cpp`)
- On using these methods directly, `is_enabled` calls the singleton
in `jit_log.cpp` while we are setting the config using another
singleton
- See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times
API:
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`
Testing:
- UTs were added for C++
- A very simple UT was added for python to just check if the API is
being called correctly
- The API was checked by running trace in a sample python file
- Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination`
- The error output had logs of form [DUMP..] [UPDATE...] etc
Fixes https://github.com/pytorch/pytorch/issues/54188
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821
Reviewed By: soulitzer
Differential Revision: D29116712
Pulled By: ZolotukhinM
fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655
This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791
Reviewed By: gchanan
Differential Revision: D29242015
Pulled By: jbschlosser
fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60001
Fix the aten::to schema to reflect that the output may alias input.
Test Plan: Added new unit tests.
Reviewed By: ezyang
Differential Revision: D29121620
fbshipit-source-id: c29b6aa22d367ffedf06e47116bc46b3e188c39c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58570
**What the PR does**
Generate a fast-path `at::meta::{op}` API for calling meta functions without having to go through the dispatcher. This will be important for perf for external backends that want to use meta functions for shape checking (which seems likely to be what we end up doing for LazyTensorCore).
**Details**
In order to avoid naming collisions I had to make two small changes:
- rename `MetaFunctions.h` template -> `NativeMetaFunctions.h` (this is the file that declares the impl() function for every structured operator).
- rename the meta class: `at::meta::{op}::meta()` -> `at::meta::structured_{op}::meta()`
I also deleted a few unnecessary includes, since any file that includes NativeFunctions.h will automatically include NativeMetaFunctions.h.
**Why I made the change**
This change isn't actually immediately used anywhere; I already started writing it because I thought it would be useful for structured composite ops, but that isn't actually true (see [comment](https://github.com/pytorch/pytorch/pull/58266#issuecomment-843213147)). The change feels useful and unambiguous though so I think it's safe to add. I added explicit tests for C++ meta function calls just to ensure that I wrote it correctly - which is actually how I hit the internal linkage issue in the PR below this in the stack.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D28711299
Pulled By: bdhirsh
fbshipit-source-id: d410d17358c2b406f0191398093f17308b3c6b9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59683
Replaces usages of throw std::runtime_error("foo") with the better
torch_check(false, "foo") which allows C++ stacktraces to show up when
TORCH_SHOW_CPP_STACKTRACES=1. This will hopefully provide much better debugging
information when debugging crashes/flaky tests.
ghstack-source-id: 131167210
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D28981327
fbshipit-source-id: 677f569e28600263cab18759eb1b282e0391aa7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58873
BackenDebugInforRecorder
Prior to this PR:
In order to generate debug handles corresponding to the graph being
lowered, backend's preprocess will call generate_debug_handles and will
get map of Node*-to-debug_handles.
In order to facilitate this, to_backend will own
BackendDebugInfoRecorder and initialize thread local pointer to it.
generate_debug_handle function will query thread local pointer to see if
there is a valid BackendDebugInforRecorder for the context. If there is
it will generate debug handles.
After this PR:
Signature of preprocess is changed such that backends have to register
preprocess that accepts instance of BackendDebugInfoRecorder by
reference. generate_debug_handles is no more a free function but becomes
part of the API of BackendDebugInfoRecorder. Now backend's preprocess
function will call generate_debug_handles on BackendDebugInfoRecorder
instead of free function.
Reason for this change:
With RAII that initializes thread local pointer, results in a lose
contract with backends, which may result in backends not storing
debug information. Making it part of API results in
backends having to be aware of BackendDebugInfoRecorder and explicitly
chosing not to generate/store debug information if they chose to do so.
Test Plan:
backend tests
Imported from OSS
Reviewed By: jbschlosser, raziel
Differential Revision: D28648613
fbshipit-source-id: c9b7e7bf0f78e87023ea7bc08612cf893b08cb98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508
An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28918342
Pulled By: ZolotukhinM
fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35379
- Adds `retains_grad` attribute backed by cpp as a native function. The python bindings for the function are skipped to be consistent with `is_leaf`.
- Tried writing it without native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?).
- Python API now uses `retain_grad` implementation from cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362
Reviewed By: jbschlosser
Differential Revision: D28969298
Pulled By: soulitzer
fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4661
- Add warnings in engine's `execute` function so it can be triggered through both cpp and python codepaths
- Adds an RAII guard version of `c10::Warning::set_warnAlways` and replaces all prior usages of the set_warnAlways with the new one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59412
Reviewed By: jbschlosser
Differential Revision: D28969294
Pulled By: soulitzer
fbshipit-source-id: b03369c926a3be18ce1cf363b39edd82a14245f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58328
This PR is part of a stack that addresses the GitHub issue #41614; it introduces a new `TCPStore` constructor that takes its optional parameters via a newly introduced `TCPStoreOptions` structure. This gives the API callers the flexibility to specify only the desired options while skipping the rest.
The main motivation behind this change is the introduction of the `multiTenant` constructor option in the second PR of this stack.
ghstack-source-id: 130676384
Test Plan: Run the existing tests since there are no behavioral changes.
Reviewed By: H-Huang
Differential Revision: D28417742
fbshipit-source-id: e6ac2a057f7ad1908581176ee6d2c2554c3c74a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59472
Previously, the lite interpreter would refuse to load any model
with a version greater than kProducedBytecodeVersion. Now, we're
able to independently advance the loading and saving code, so we
can roll out changes without breaking forward compatibility.
Test Plan:
CI.
Loaded a bytecode v5 model even with setting kProducedBytecodeVersion
to v4.
Reviewed By: raziel
Differential Revision: D28904350
fbshipit-source-id: 598c22f0adf47d4ed3e976bcbebdf3959dacb1df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279
There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.
Differential Revision:
D28819780
D28819780
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
Summary:
Adds `is_inference` as a native function w/ manual cpp bindings.
Also changes instances of `is_inference_tensor` to `is_inference` to be consistent with other properties such as `is_complex`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58729
Reviewed By: mruberry
Differential Revision: D28874507
Pulled By: soulitzer
fbshipit-source-id: 0fa6bcdc72a4ae444705e2e0f3c416c1b28dadc7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58753
TSAN was (rightfully!) detecting and complaining about a race due to the fact that upon init the TP agent exchanges the device maps between nodes using RPC requests (and by doing so it accesses the device maps) and then sets the reverse device maps (thus possibly modifying the set of devices). This resulted in a data race, i.e., simultaneously reading and writing the set of devices without synchronizing.
One solution is to add a mutex around the devices, which works, but is "annoying". An alternative solution is to make the set of devices immutable (i.e., `const`). For that to work, we need to exchange the device maps without using RPC calls. We can do so using the process group that we need to create anyways.
Since now there's a lot more logic in Python, I've moved (and restructured) all safety checks over there, and removed them from C++.
ghstack-source-id: 130583775
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D28603754
fbshipit-source-id: 88533e65d72d1eb806dc41bec8d55def5082e290
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59212
Reland of https://github.com/pytorch/pytorch/pull/58428
Until now, the TP agent expected the output of a remote function to be on the same streams as the inputs. In other words, it used the lazy stream context of the inputs to synchronize the output tensors. This was true in the most common case of a synchronous remote function. However it wasn't true for async functions, for fetching RRefs, ... The more generic way is to use the CUDA events held by the Future to perform this synchronization. (These events may be on the input streams, or they may not be!).
ghstack-source-id: 130202842
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28623885
fbshipit-source-id: 29333bcb75d077ab801eac92017d0e381e8f5569
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59205
Reland of https://github.com/pytorch/pytorch/pull/58422
Similar to Future (which I tackled recently), Message is an ivalue type (a "custom class" one), and the natural way to represent it is inside an intrusive_ptr. However in the RPC code we had a mix of usages, often passing Message by value. This has undesirable consequences, as it could easily trigger a copy by accident, which I believe is why in many places we accepted _rvalue references_ to Message, in order to force the caller to move. In my experience this is non-idiomatic in C++ (normally a function signature specifies how the function consumes its arguments, and it's up to the caller to then decide whether to copy or move).
By moving to intrusive_ptr everywhere I think we eliminate and simplify many of the problems above.
In this PR I do half of the migration, by updating everything except the `toMessageImpl` methods, which will come in the next PR.
ghstack-source-id: 130202849
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28623891
fbshipit-source-id: c9aeea3440679a11741ca78c06b03c57cb815a5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56504
Having callbacks registered but disabled via their
`shouldRun` callback defeats the `shouldRunRecordFunction`
optimization (no relation between the two things, despite the
shared prefix on the names) that aims to skip `RecordFunction`
construction.
This diff attempts to safely rectify this issue: we drop support for
`shouldRun` callbacks (this is bc-breaking; does anything use these
externally? do I need to add the support back and just stop using it
internally?), add support for enabling and disabling callbacks, and
(for global callbacks) make doing so thread-safe.
There is an interesting subtlety with `std::atomic` that came up: it
is neither copyable nor movable, which precludes putting it into
`std::vector`. I manually overrode this because the thread safety
reasons it is neither copyable nor movable don't apply here; we
already state that adding or removing callbacks (the operations that
might copy/move an atomic) are not thread-safe and should be done at
initialization time.
ghstack-source-id: 129614296
Test Plan:
Existing CI should cover correctness, right? Inspected
perf report of a simple benchmark that runs nn.Linear in a loop on
CUDA, where internally have Kineto initialized and thus had a
shouldRun observer previously; we are no longer going through the
dispatcher's slow RecordFunction path or spending measurable time
constructing RecordFunction instances.
Reviewed By: ilia-cher
Differential Revision: D27834944
fbshipit-source-id: 93db1bc0a28b5372f7307490c908457e7853fa92
Summary:
Two main changes:
1. Change the argument of the collection of backport_v{i}_to_v{i-1} from (reader, writer) to (input_model_stream, output_model_stream), so it's easier to backport a model in option 2.
> 2) [Both format and content change] ]Use torch.jit.load() to load the stream,
and save it to output_model_stream.
2. Fix an issue in the test `backportAllVersionCheck`. Previous it declares `std::ostringstream oss` and uses `oss.clear()` to reset the stringstream. However, the `clear()` function doesn't reset the stream content, and causes problematic stream. As a mitigation, checks are added to prevent corrupted stream for each iteration in while loop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58790
ghstack-source-id: 129929960
Test Plan:
CI
```
buck test mode/dev //caffe2/test/cpp/jit:jit
```
Reviewed By: raziel, iseeyuan
Differential Revision: D28620961
fbshipit-source-id: b0cbe0e88645ae278eb3999e2a84800702b5f985
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58892
The torchscript model after backport misses the `constants` archive. Add it back, and extend the unit test to run torchscript part.
ghstack-source-id: 129853819
Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit
- LiteInterpreterTest.BackPortByteCodeModelAllVersions'
```
Reviewed By: raziel, iseeyuan
Differential Revision: D28664507
fbshipit-source-id: 5f98723231cc64ed203c062ee6f00d8adbdccf77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57481
This diff introduces function name to InlinedCallStack.
Since we are using InlinedCallStack for debug information in lite
interpreter as well as delegate backends, where InlinedCallStack cannot
be constructed from model source code, we need to save function name.
In the absence of function name Function* is used to get name of the
function. This is when JIT compiles code at runtime.
When that is not possible, this diff introduces a way to obtain function
name.
Test Plan:
test_backend
test_cs_debug_info_serialization
test_backend
test_cs_debug_info_serialization
Imported from OSS
Differential Revision:
D28159097
D28159097
Reviewed By: raziel, ZolotukhinM
Pulled By: kimishpatel
fbshipit-source-id: deacaea3325e27273f92ae96cf0cd0789bbd6e72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57441
debug info
Previous diffs did not save operator name in debug info. For delegated
backends that only idenfity op for profiling with debug handle, operator
name should be stores as well.
Furthermore to complete debug informaton also serialize function name.
Test Plan:
Existing lite interpreter and backend tests
Existing lite interpreter and backend tests
Imported from OSS
Differential Revision:
D28144581
D28144581
Reviewed By: raziel
Pulled By: kimishpatel
fbshipit-source-id: 415210f147530a53b444b07f1d6ee699a3570d99