pytorch/tools
Ilia Cherniavskii f7a8bf2855 Use libkineto in profiler (#46470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46470

Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py

python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                       Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       1.000us             2
                                      sgemm_32x32x32_NN         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       2.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                       Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                                            aten::randn         5.17%      74.000us         6.71%      96.000us      48.000us       0.000us         0.00%       0.000us       0.000us             2
                                            aten::empty         1.33%      19.000us         1.33%      19.000us       4.750us       0.000us         0.00%       0.000us       0.000us             4
                                          aten::normal_         1.05%      15.000us         1.05%      15.000us       7.500us       0.000us         0.00%       0.000us       0.000us             2
                                               aten::to        77.90%       1.114ms        91.61%       1.310ms     436.667us       0.000us         0.00%       3.000us       1.000us             3
                                    aten::empty_strided         2.52%      36.000us         2.52%      36.000us      12.000us       0.000us         0.00%       0.000us       0.000us             3
                                            aten::copy_         2.73%      39.000us        11.19%     160.000us      53.333us       0.000us         0.00%       3.000us       1.000us             3
                                        cudaMemcpyAsync         4.34%      62.000us         4.34%      62.000us      20.667us       0.000us         0.00%       0.000us       0.000us             3
                                  cudaStreamSynchronize         1.61%      23.000us         1.61%      23.000us       7.667us       0.000us         0.00%       0.000us       0.000us             3
                                               aten::mm         0.21%       3.000us         7.20%     103.000us     103.000us       0.000us         0.00%       2.000us       2.000us             1
                                           aten::stride         0.21%       3.000us         0.21%       3.000us       1.000us       0.000us         0.00%       0.000us       0.000us             3
                                       cudaLaunchKernel         2.45%      35.000us         2.45%      35.000us      17.500us       0.000us         0.00%       0.000us       0.000us             2
                                              aten::add         0.49%       7.000us         4.27%      61.000us      61.000us       0.000us         0.00%       1.000us       1.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

benchmark: https://gist.github.com/ilia-cher/a5a9eb6b68504542a3cad5150fc39b1a

Reviewed By: Chillee

Differential Revision: D25142223

Pulled By: ilia-cher

fbshipit-source-id: b0dff46c28da5fb0a8e01cf548aa4f2b723fde80
2020-11-25 04:32:16 -08:00
..
amd_build Remove __future__ imports for legacy Python2 supports (#45033) 2020-09-23 17:57:02 -07:00
autograd Added linalg.eigh, linalg.eigvalsh (#45526) 2020-11-22 04:57:28 -08:00
clang_format_hash
code_analyzer Revert D25056091: migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API 2020-11-19 19:10:14 -08:00
code_coverage add lcov to oss for beautiful html report (#44568) 2020-09-11 15:29:24 -07:00
codegen [pytorch][codegen] add autograd data model (#48249) 2020-11-19 21:47:05 -08:00
config
docker
jit [pytorch] add script to run all codegen (#46243) 2020-10-29 22:55:12 -07:00
pyi [pytorch][codegen] gen_python_functions.py loading native_functions.yaml / deprecated.yaml directly (#47746) 2020-11-14 02:27:57 -08:00
rules
setup_helpers Fix typos in comments (#48316) 2020-11-24 10:56:40 -08:00
shared Cleanup unused code for Python < 3.6 (#47822) 2020-11-13 21:37:01 -08:00
__init__.py
aten_mirror.sh
build_libtorch.py Remove Incorrect Comment in tools/build_libtorch and remove Python2 support in the module import (#44888) 2020-09-18 10:03:36 -07:00
build_pytorch_libs.py Cleanup unused code for Python < 3.6 (#47822) 2020-11-13 21:37:01 -08:00
build_variables.bzl Use libkineto in profiler (#46470) 2020-11-25 04:32:16 -08:00
clang_format_all.py Replaced whitelist with allowlist (#45796) 2020-10-06 09:18:51 -07:00
clang_format_ci.sh
clang_format_utils.py
clang_tidy.py Remove __future__ imports for legacy Python2 supports (#45033) 2020-09-23 17:57:02 -07:00
download_mnist.py Remove Incorrect Comment in tools/build_libtorch and remove Python2 support in the module import (#44888) 2020-09-18 10:03:36 -07:00
flake8_hook.py
generate_torch_version.py Fix torch.version.debug generation (#47006) 2020-10-28 12:48:30 -07:00
generated_dirs.txt
git-clang-format
git-pre-commit
git_add_generated_dirs.sh
git_reset_generated_dirs.sh
nightly.py Fix typos in comments (#48316) 2020-11-24 10:56:40 -08:00
pytorch.version
README.md
update_disabled_tests.sh

This folder contains a number of scripts which are used as part of the PyTorch build process. This directory also doubles as a Python module hierarchy (thus the __init__.py).

Overview

Modern infrastructure:

  • autograd - Code generation for autograd. This includes definitions of all our derivatives.
  • jit - Code generation for JIT
  • shared - Generic infrastructure that scripts in tools may find useful.
    • module_loader.py - Makes it easier to import arbitrary Python files in a script, without having to add them to the PYTHONPATH first.

Legacy infrastructure (we should kill this):

  • cwrap - Implementation of legacy code generation for THNN/THCUNN. This is used by nnwrap.

Build system pieces:

  • setup_helpers - Helper code for searching for third-party dependencies on the user system.
  • build_pytorch_libs.py - cross-platform script that builds all of the constituent libraries of PyTorch, but not the PyTorch Python extension itself.
  • build_libtorch.py - Script for building libtorch, a standalone C++ library without Python support. This build script is tested in CI.

Developer tools which you might find useful:

Important if you want to run on AMD GPU:

  • amd_build - HIPify scripts, for transpiling CUDA into AMD HIP. Right now, PyTorch and Caffe2 share logic for how to do this transpilation, but have separate entry-points for transpiling either PyTorch or Caffe2 code.
    • build_amd.py - Top-level entry point for HIPifying our codebase.

Tools which are only situationally useful: