pytorch/torch/csrc/jit/fuser
Vitaly Fedyunin 4bfe2f0900 Fix jit outplace tracing and reapply changes to *_like operators. (#28839)
Summary:
Reapply the reverted changes and fix the files `gen_variable_type.py` and `test_jit.py`

https://github.com/pytorch/pytorch/issues/27891 Cleanup testing of _like operators
https://github.com/pytorch/pytorch/issues/27890 Add memory format support to randn_like operator
https://github.com/pytorch/pytorch/issues/27889 Add memory format support to randint_like operator
https://github.com/pytorch/pytorch/issues/27562 Add memory format support to zeros_like operator
https://github.com/pytorch/pytorch/issues/27561 Add memory format support to rand_like operator
https://github.com/pytorch/pytorch/issues/27270 Add memory format support to ones_like operator
https://github.com/pytorch/pytorch/issues/27262 Add memory format support to full_like operator
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28839

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

buck test mode/dev //language_technology/neural_mt/os/pytorch_translate/test:test_onnx -- 'test_forced_decoder_export_vocab_reduction \(language_technology\.neural_mt\.os\.pytorch_translate\.test\.test_onnx\.TestONNX\)'

Differential Revision: D18203397

Pulled By: VitalyFedyunin

fbshipit-source-id: eea41cbd4c232cf5a54172b1e1b16b173798f298
2019-10-31 13:23:08 -07:00
cpu Enable CPU fused kernel on Windows 2019-09-17 07:29:40 -07:00
cuda add some support for the occupancy API on ROCm (#27390) 2019-10-04 14:45:53 -07:00
arg_spec.h
codegen.cpp Fix jit outplace tracing and reapply changes to *_like operators. (#28839) 2019-10-31 13:23:08 -07:00
codegen.h
compiler.cpp Use c10::to_string in more places (#28605) 2019-10-24 15:52:05 -07:00
compiler.h
executor.cpp Switching tests to ProfilingExecutor (rebased) 2019-10-29 11:41:42 -07:00
executor.h
fallback.cpp Whenever possible, use function pointers rather than std::function to represent Operation's. (#26560) 2019-09-21 20:51:24 -07:00
fallback.h
fused_kernel.h
interface.cpp
interface.h
kernel_cache.cpp Exclude file:line from graphs used for fuser kernel cache (#21252) 2019-06-01 16:18:55 -07:00
kernel_cache.h
kernel_spec.h Disable fusion of grad_sum_to_size (#23372) 2019-07-25 08:55:33 -07:00
partition_desc.h
README.md
tensor_desc.h Merge ProfiledTensorType and TensorType (#24284) 2019-08-20 13:01:28 -07:00
tensor_info.h

PyTorch Fuser

The fuser accepts subgraphs wrapped in "fusion nodes" and tries to execute them by just-in-time (JIT) compiling kernels that run all the graph operations.
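
For intuition, a chain of pointwise operations like out = x * y + z would ordinarily run one kernel per operation and materialize the intermediate x * y; a fused kernel computes the whole expression in a single pass over the data. A minimal C++ sketch of the idea (illustrative only, not the fuser's actual output):

    // Conceptual fused kernel for out = x * y + z: one loop, no temporary
    // buffer for x * y. Real generated kernels also handle strides,
    // broadcasting, and multiple outputs.
    #include <cstddef>

    void fused_mul_add(const float* x, const float* y, const float* z,
                       float* out, std::size_t n) {
      for (std::size_t i = 0; i != n; ++i) {
        out[i] = x[i] * y[i] + z[i];
      }
    }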

Code Organization

The fuser is designed hierarchically, with device-independent logic eventually deferring to device-specific logic and implementation. The device-specific code is (mostly) found in each device's subdirectory. The device-independent logic has six components:

  • The Interface (interface.h/cpp) has functions to register and run fusions, interrogate fusion functionality, and perform debugging (see the first sketch after this list).
  • The Compiler (compiler.h/cpp) performs "upfront" and "runtime" compilation. When fusions are registered, upfront compilation produces fallback code and performs some shape inference. When a fusion is run, runtime compilation invokes code generation and the device-specific compilation logic.
  • The Code Generator (codegen.h/cpp) produces the string to be compiled on the device.
  • The Executor (executor.h/cpp) runs requested fusions. It performs shape inference, expands tensors as necessary, determines the device to run on, acquires a cached compiled kernel or requests that the Compiler produce a new one, invokes device-specific code to launch the kernel, and updates the stack (see the second sketch after this list).
  • The Fallback (fallback.h/cpp) runs subgraphs that can't be fused, either because shape inference didn't determine a common tensor size or because the device the tensors are on doesn't support fusion.
  • The Kernel Specification Cache (kernel_cache.h/cpp) is a thread-safe cache holding the device-independent specifications produced during upfront compilation. These specifications each have their own thread-safe stores of compiled kernels that the Executor checks before requesting runtime compilation.
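
A sketch of how a caller drives the Interface, assuming declarations along the lines of those in interface.h (registerFusion returning a key for the registered subgraph, runFusion executing it against the interpreter stack); treat this as a simplified outline, not the exact API:

    // Simplified register-then-run flow; see interface.h for the actual
    // declarations. Registration happens once when the graph is compiled;
    // runFusion executes on every call with the inputs on the stack.
    #include <torch/csrc/jit/fuser/interface.h>
    #include <torch/csrc/jit/ir.h>

    void executeFusion(const torch::jit::Node* fusion_group,
                       torch::jit::Stack& stack) {
      // Upfront compilation: record a specification and hand back a key.
      const int64_t key = torch::jit::registerFusion(fusion_group);
      // Runtime compilation and launch (or a hit in the kernel cache).
      torch::jit::runFusion(key, stack);
    }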
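
And a sketch of the Executor's cache-then-compile path; the helper names here (getOrCompileKernel, findKernel, compileKernel) are hypothetical, chosen to mirror the prose above rather than the exact signatures in executor.cpp and kernel_cache.cpp:

    #include <memory>

    // Hypothetical sketch: each KernelSpec owns a thread-safe store of
    // compiled kernels keyed by the runtime argument configuration.
    std::shared_ptr<FusedKernel> getOrCompileKernel(
        KernelSpec& spec, const ArgSpec& arg_spec) {
      if (auto kernel = spec.findKernel(arg_spec)) {
        return *kernel;  // cache hit: reuse the compiled kernel
      }
      // Cache miss: run code generation and device-specific compilation,
      // then store the result so later calls with the same ArgSpec hit.
      compileKernel(spec, arg_spec);
      return *spec.findKernel(arg_spec);
    }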

The device-specific logic for compiling and running the generated code lives in FusedKernelCPU (cpu/fused_kernel.h/cpp) and FusedKernelCUDA (cuda/fused_kernel.h/cpp).
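
For a sense of what gets compiled, the Code Generator's output is ordinary C++ (CPU) or CUDA (GPU) source built from a template. A hypothetical CPU emission for a two-input pointwise fusion might look roughly like the following; the real template in codegen.cpp differs in naming and computes strided offsets through per-tensor descriptors:

    // Hypothetical generated source, illustrative only.
    extern "C" void fused_kernel(int total_elements, float* out,
                                 const float* x, const float* y) {
      for (int i = 0; i < total_elements; ++i) {
        out[i] = x[i] * y[i];
      }
    }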