From 891736f115cfb8414bb46df2821693a7cee140bf Mon Sep 17 00:00:00 2001
From: lezcano
Date: Sat, 13 Apr 2024 01:51:52 +0000
Subject: [PATCH] Fix links rendering when surrounding code in Dynamo deepdive
 (#123427)

I thought the RST was rendering correctly, but here we are.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123427
Approved by: https://github.com/peterbell10
---
 docs/source/export.rst                        |  3 +-
 docs/source/torch.compiler.rst                |  2 +-
 .../source/torch.compiler_dynamo_deepdive.rst | 95 ++++++++-----------
 ...rst => torch.compiler_dynamo_overview.rst} |  6 +-
 4 files changed, 49 insertions(+), 57 deletions(-)
 rename docs/source/{torch.compiler_deepdive.rst => torch.compiler_dynamo_overview.rst} (99%)

diff --git a/docs/source/export.rst b/docs/source/export.rst
index f3278e2721c..2ebf780944a 100644
--- a/docs/source/export.rst
+++ b/docs/source/export.rst
@@ -668,7 +668,8 @@ Read More
    :caption: Deep Dive for PyTorch Developers
    :maxdepth: 1

-   torch.compiler_deepdive
+   torch.compiler_dynamo_overview
+   torch.compiler_dynamo_deepdive
    torch.compiler_dynamic_shapes
    torch.compiler_fake_tensor

diff --git a/docs/source/torch.compiler.rst b/docs/source/torch.compiler.rst
index 69dcd70effb..c861e413d07 100644
--- a/docs/source/torch.compiler.rst
+++ b/docs/source/torch.compiler.rst
@@ -102,7 +102,7 @@ Read More
    :caption: Deep Dive for PyTorch Developers
    :maxdepth: 1

-   torch.compiler_deepdive
+   torch.compiler_dynamo_overview
    torch.compiler_dynamo_deepdive
    torch.compiler_dynamic_shapes
    torch.compiler_nn_module

diff --git a/docs/source/torch.compiler_dynamo_deepdive.rst b/docs/source/torch.compiler_dynamo_deepdive.rst
index 79af3dab268..f4c45807d11 100644
--- a/docs/source/torch.compiler_dynamo_deepdive.rst
+++ b/docs/source/torch.compiler_dynamo_deepdive.rst
@@ -1,3 +1,5 @@
+.. _torch.compiler_dynamo_deepdive:
+
 Dynamo Deep-Dive
 ================

ground up. We will discuss the functionality it provides, and how it is
implemented. By the end of this post, you will have a better
understanding of what went wrong when you ``torch.compiled`` a PyTorch
program and the compilation errored out, or succeeded but the speed-up
-was not what you expected. [1]_
+was not what you expected.

A Gentle Introduction to Dynamo
-------------------------------

we see the output that Dynamo traced

We call this a **graph (or trace) of the function for the given
inputs**. This is represented via an `FX
-graph `__. We will simply think
+graph `__. We will simply think
of an FX graph as a container that stores a list of function calls.

The first thing we should notice is that the graph is a linear sequence
-of PyTorch operations. [2]_ Dynamo records all the PyTorch operations
+of PyTorch operations. [1]_ Dynamo records all the PyTorch operations
and stores them sequentially. For example, it splits ``z = (x - y) ** 2``
into its two constituent operations, ``sub = l_x_ - l_y_`` and
``z = sub ** 2``.

variables and their names
-  The builtin functions like ``abs`` or ``print``

You can see all the fields
-`here `__. [3]_
+`here `__. [2]_

In summary, CPython provides the user's interpreter with all the
-information necessary to execute the function. [4]_
+information necessary to execute the function. [3]_
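None of this is specific to Dynamo. As a quick illustration, we can poke
at a live frame from plain Python, using only the standard library:

.. code-block:: python

   import dis
   import sys

   def f(a, b):
       frame = sys._getframe()           # the frame CPython is using to run f
       print(frame.f_locals)             # {'a': 1, 'b': 2, 'frame': ...}
       print(frame.f_code.co_varnames)   # ('a', 'b', 'frame')
       print(frame.f_code.co_consts)     # constants referenced by the bytecode
       dis.dis(frame.f_code)             # the bytecode being executed
       return a + b

   f(1, 2)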
With this API, we can implement a tracer by implementing an interpreter
that runs the code and records in a graph all the PyTorch operations

Implementing CPython in Python
------------------------------

So, we are back in the Python world. We have the bytecode of a function,
and all the context necessary to execute it. In particular, we have
landed at
-```_convert_frame_assert`` `__.
+`_convert_frame_assert `__.
This is the function that the decorator ``torch.compile`` returns! We
get to this function from
-```_dynamo.optimize`` `__.
+`_dynamo.optimize `__.
The decorator ``torch.compile`` is just a nice API around
``_dynamo.optimize``.

of Dynamo. The parent class of the internal class structure is
``VariableTracker`` and represents the different objects that Dynamo
understands. For example, ``ListVariable`` represents a ``list``
object, and keeps
-internally a `list of
-``VariableTracker``\ s `__.
+internally a `list of VariableTrackers `__.
Another example of ``VariableTracker`` is
`ConstantVariable `__.
``ConstantVariable`` wraps all the `objects considered constant by
Dynamo `__.
We also have special subclasses for objects that require special
attention, like
`TensorVariable `__.
All these internal classes are defined in the
-```torch/_dynamo/variables`` `__
+`torch/_dynamo/variables `__
folder.

Python objects are wrapped into their corresponding ``VariableTracker``
class in
-```VariableBuilder._wrap`` `__.
+`VariableBuilder._wrap `__.
This function is just a very long chain of ``elif``\ s that tries to
recursively pattern-match the Python inputs into the appropriate type
of ``VariableTracker``.

traced into the right ``VariableTracker``.

Ok, so we have an IR for our tracer, now we *just* need to reimplement
CPython's stack machine. This is implemented by
-```InstructorTranslatorBase`` `__
+`InstructionTranslatorBase `__
in
-```symbolic_convert.py`` `__.
+`symbolic_convert.py `__.

``InstructionTranslatorBase`` has about 200 methods, implementing
almost all Python bytecodes. As an example, we can see the
implementation of

Generating the Output Graph
---------------------------

With a way to symbolically execute Python code, we are set to extract
the PyTorch operations that happen during the symbolic execution of a
program given some inputs. This is implemented in Dynamo via the
-```OutputGraph`` `__
+`OutputGraph `__
object. The ``OutputGraph`` object is `bound to an
-``InstructionTranslator``
-object `__
+InstructionTranslator object `__
and it tracks all the data necessary to create the FX graph which will
be returned by Dynamo.

All the inputs and intermediary elements of the FX graph are
``fx.Proxy``\ s. ``fx.Proxy``\ s are used to build the FX graph.
In particular, they record every PyTorch operation performed on them
into the graph. You can create a new operation to be added to
-the graph by calling ```create_proxy`` `__.
+the graph by calling `create_proxy `__.
Then, we can add it to the graph through the function
-```wrap_fx_proxy`` `__.
+`wrap_fx_proxy `__.
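For illustration, the same record-the-calls idea is visible through the
public ``torch.fx`` API. ``torch.fx.symbolic_trace`` also builds its
graph by recording the operations performed on ``Proxy`` objects. This
is a sketch of the mechanism, not of how Dynamo itself invokes it:

.. code-block:: python

   import torch
   import torch.fx

   def fn(x, y):
       z = (x - y) ** 2
       return z * 2

   # symbolic_trace runs fn with Proxy objects as inputs; every operation
   # performed on them is recorded as a node of the resulting graph.
   gm = torch.fx.symbolic_trace(fn)
   print(gm.graph)  # the container of recorded calls: sub, pow, mul
   print(gm.code)   # Python source regenerated from the graph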
A graph stores operations on tensors… and operations on symbolic
integers. We will discuss symbolic integers later on, but first we will

Making Dynamo Sound: Guards
---------------------------

At this point, we have a way to trace programs completely disregarding
control flow. And for that, we have reimplemented all of CPython… If
this sounds like a bit of overkill, that is because it is.
-```torch.jit.trace`` `__
+`torch.jit.trace `__
already implements this without all this machinery, so what gives?

The issue with ``torch.jit.trace``, as warned in its docs, is that

with ``TORCH_LOGS=guards`` prints (among other guards)

   L['b'] == 'Hello'

This reads as “the local variable ``b`` should have a specific type
-(``str`` in this case, represented by the constant `9433...`) and
+(``str`` in this case, represented by the constant ``9433...``) and
its value should be ``'Hello'``”.

If we then execute the function again passing a different argument

the objects they contain. In

      return a * x

``x`` and ``y`` have
-```LocalSource`` `__
+`LocalSource `__
as their source, and ``y[0]`` has
-```GetItemSource`` `__,
+`GetItemSource `__,
which stores a ``LocalSource`` inside. On the other hand, ``a`` will
not have a source as it is an intermediate variable that only exists
within the fx graph.

All these are defined in
-```torch/_dynamo/source.py`` `__.
+`torch/_dynamo/source.py `__.

We can see the guard generated by ``GetItemSource`` in the following
example:

Symbolic Shapes
---------------

Another point we discussed in the introduction is that Dynamo knows how
to trace integers. In order to implement this, we use a symbolic class
-```torch.SymInt`` `__\ [5]_
+`torch.SymInt `__
that acts like an ``int`` but it records all the operations performed
-on it in the output FX graph. We already saw this class in the
+on it in the output FX graph. [4]_ We already saw this class in the
introduction when introducing symbolic integer tracing.

Let us now discuss the three properties that define symbolic shape

more general guards on this more generic kernel.

**Compilation performance tip**. If you know that a dimension will vary
in size, you can mark it as dynamic by calling
-```torch._dynamo.mark_dynamic`` `__
+`torch._dynamo.mark_dynamic `__
before calling ``torch.compile``. This will avoid the first compilation
with a static shape. There are other useful utility functions like
``maybe_mark_dynamic`` or ``mark_static``. You can also have all

arbitrary Python code” is perhaps a bit too general. Dynamo implements
a good part of Python, but does it implement the more complex parts,
like coroutines or async? Does it implement the whole Python standard
library? NumPy also has a Python API. Does ``torch.compile`` also
-understand NumPy? and Django? [6]_
+understand NumPy? And Django? [5]_

Python’s ecosystem is massive, and a good part of it is written in
other, more performant languages like C++ or Rust, and it just exposes
Python bindings.

The usual way machine learning tracers handle this issue is by
informing the user of the operation they choked on and giving up
tracing altogether. This would pose a real usability issue in the case
of PyTorch, whose users are used to the flexibility it gives them. As a
-real-world example the ```doctr_det_predictor`` model uses NumPy and the
-``cv2`` library to postprocess the model’s
+real-world example, the ``doctr_det_predictor`` model uses NumPy and the
+``cv2`` library to `postprocess the model’s
result `__.
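As noted in the footnotes below, the NumPy half of that question has a
happy answer: ``torch.compile`` does understand NumPy programs, because
we reimplemented NumPy on top of PyTorch. A minimal sketch of this (it
assumes PyTorch 2.1 or later; ``numpy_fn`` is just an example name):

.. code-block:: python

   import numpy as np
   import torch

   @torch.compile
   def numpy_fn(x: np.ndarray) -> np.ndarray:
       # Dynamo traces these NumPy calls by mapping them onto torch ops
       return np.sum(x * x, axis=0)

   print(numpy_fn(np.random.randn(3, 4)))

Libraries like ``cv2``, on the other hand, are exactly the kind of code
Dynamo cannot trace.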
Here is another place where having access to CPython is interesting.
Rather than erroring out, Dynamo can let CPython run that problematic
code! To do this, Dynamo generates at trace time one graph with all the
operations before the problematic code, and one with all the operations
after. [6]_ Then, at runtime, it will delegate to CPython to execute
the first graph, then the problematic code, and then the second graph.
This process of stopping the tracing and generating multiple graphs is
called a **graph break**.

implementing the strategy that we described before

The code generation of the stack in Dynamo is delegated to
``VariableTracker`` subclasses. Every ``VariableTracker`` object in
-Dynamo has a ```reconstruct``
-method `__
-that generates the necessary bytecode to create the python object it
-represents on the stack.
+Dynamo has a `reconstruct `__
+method that generates the necessary bytecode to create the python
+object it represents on the stack.

**Debugging tip**. Graph breaks hamper performance, and as such, it is
best to avoid them. Running a program with ``TORCH_LOGS=graph_breaks``

github `__.

+Footnotes
+---------

-.. [2]
-   In the literature, this is called a Directed Acyclical Graph (DAG).
+.. [1] In the literature, this is called a Directed Acyclic Graph (DAG).

-.. [3]
-   All this binding code lives in ``torch/csrc/dynamo/eval_frame.c``.
+.. [2] All this binding code lives in ``torch/csrc/dynamo/eval_frame.c``.

-.. [4]
-   In CPython lingo, the set of all these objects are called `a
+.. [3] In CPython lingo, the set of all these objects is called `a
   frame `__.

-.. [5]
-   There are also ``SymBool`` and ``SymFloat`` classes. The latter one
+.. [4] There are also ``SymBool`` and ``SymFloat`` classes. The latter one
   is not used all that much at the time of this writing.

-.. [6]
-   Interestingly enough, it does understand NumPy code! Have a look at
+.. [5] Interestingly enough, it does understand NumPy code! Have a look at
   `this blogpost `__
-   and `the
-   docs `__.
+   and `the docs `__.
   Now, this is just possible because we reimplemented NumPy using
   PyTorch. Good luck implementing Django in PyTorch though…

-.. [7]
-   Assuming there is just one piece of problematic code. If there are
+.. [6] Assuming there is just one piece of problematic code. If there are
   more, Dynamo can split the code into as many graphs as it needs.

diff --git a/docs/source/torch.compiler_deepdive.rst b/docs/source/torch.compiler_dynamo_overview.rst
similarity index 99%
rename from docs/source/torch.compiler_deepdive.rst
rename to docs/source/torch.compiler_dynamo_overview.rst
index bdaf13278e8..cce1c393160 100644
--- a/docs/source/torch.compiler_deepdive.rst
+++ b/docs/source/torch.compiler_dynamo_overview.rst
@@ -1,5 +1,5 @@
-TorchDynamo Deep Dive
-=====================
+TorchDynamo Overview
+====================

 Before you read this section, read :ref:`torch.compiler_overview`.

To summarize, the compiled code is conceptually equivalent to the code
below:

The following diagram demonstrates how ``torch.compile`` transforms and
optimizes user-written code: it first extracts computation graphs from
the user-written function, compiles these graphs into optimized
functions, and then assembles them into a new function that is
functionally equivalent to the user-written code but optimized for
computation speed.

 .. image:: _static/img/dynamo/flowchart.jpg
+
+To learn more about how all this is implemented internally, see
+:ref:`torch.compiler_dynamo_deepdive`.