pytorch/tools
Dhruv Matani 594c546b69 [PyTorch Edge] Eliminate non-determinism when generating build YAML file (#56539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56539

A potential source of non-determinism when generating YAML files during the build seems to be that Python lists are written out in list order. This isn't a problem per se, but if you look at how these lists are generated, you'll see that they come from sets, which are inherently [not order preserving](https://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set) in Python.

I can't guarantee that this removes non-determinism, but it removes all non-determinism that I know of so far. The surface area of codegen isn't sprawling, and the YAML file is generated by converting the object `toDict()` and passing it into the YAML serializer, so this should cover it (I think). Dictionaries are serialized in key order by pyyaml, so that's not a problem.
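
The idea described above can be illustrated with a minimal sketch. This is not the code from the pull request: the helper names `to_serializable` and `dump_deterministic` are hypothetical, and the standard-library `json` module stands in for the YAML serializer so that the example has no third-party dependencies. It only shows the general technique of converting sets to sorted lists before serialization.

```python
import json


def to_serializable(value):
    """Recursively convert sets to sorted lists so output order is stable.

    Hypothetical helper for illustration; not taken from the PyTorch codegen.
    """
    if isinstance(value, (set, frozenset)):
        # Sets have no guaranteed iteration order, so sort before emitting.
        return sorted(to_serializable(v) for v in value)
    if isinstance(value, dict):
        return {k: to_serializable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Lists keep their own order; only their elements are normalized.
        return [to_serializable(v) for v in value]
    return value


def dump_deterministic(obj):
    """Serialize obj with sorted keys; json is a stand-in for a YAML dumper."""
    return json.dumps(to_serializable(obj), sort_keys=True, indent=2)


# Two logically equal objects built in different orders.
first = {"name": "model", "ops": {"aten::add", "aten::mul", "aten::sub"}}
second = {"ops": {"aten::sub", "aten::mul", "aten::add"}, "name": "model"}
print(dump_deterministic(first) == dump_deterministic(second))
```

Sorting at the serialization boundary keeps the in-memory data structures as sets while making the written file independent of set iteration order.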

This could be related to the elevated Android build times being seen [here](https://fb.workplace.com/groups/pytorch.edge.users/permalink/841622146708080/).
ghstack-source-id: 126987721

Test Plan: Build + Sandcastle.

Reviewed By: JacobSzwejbka

Differential Revision: D27893058

fbshipit-source-id: 6d7bcb09f34c05b71fbb4a0673bac1c4c33f23d7
2021-04-20 17:26:14 -07:00
amd_build
autograd s/AutoNonVariableTypeMode/AutoDispatchBelowAutograd/ (#56423) 2021-04-20 17:17:46 -07:00
clang_format_hash [tools] Remove newline from clang-format reference hashes (#55328) 2021-04-06 17:17:19 -07:00
code_analyzer Port put_ and take from TH to ATen (#53356) 2021-04-05 18:05:38 -07:00
code_coverage
codegen [PyTorch Edge] Eliminate non-determinism when generating build YAML file (#56539) 2021-04-20 17:26:14 -07:00
config
docker
fast_nvcc
gdb Fix Flake8 (#54540) 2021-03-23 13:50:03 -07:00
jit
lite_interpreter Un-ignore F403 in .flake8 (#55838) 2021-04-13 09:24:07 -07:00
pyi Un-ignore F403 in .flake8 (#55838) 2021-04-13 09:24:07 -07:00
rules [codemod][fbcode][1/n] Apply buildifier 2021-04-12 11:04:32 -07:00
setup_helpers Catch and ignore tracebacks for compilation errors (#55986) 2021-04-14 13:05:27 -07:00
shared matches_jit_signatures is dead (#53637) 2021-04-15 12:31:19 -07:00
stats_utils fix boto3 resource not close (#55082) 2021-03-31 16:49:15 -07:00
test Harden "Add annotations" workflow (#56071) 2021-04-16 07:46:20 -07:00
__init__.py
actions_local_runner.py [skip ci] Add simple local actions runner (#56439) 2021-04-20 12:17:55 -07:00
build_libtorch.py
build_pytorch_libs.py
build_variables.bzl add channels last for MaxPool2d (#56361) 2021-04-20 15:02:18 -07:00
clang_format_all.py
clang_format_ci.sh
clang_format_utils.py [tools] Remove newline from clang-format reference hashes (#55328) 2021-04-06 17:17:19 -07:00
clang_tidy.py [BE] Make torch/csrc/jit/tensorexpr/ clang-tidy clean (#55628) 2021-04-08 19:44:14 -07:00
download_mnist.py
export_slow_tests.py Sort slow tests json by test name (#55862) 2021-04-12 20:08:56 -07:00
extract_scripts.py Harden "Add annotations" workflow (#56071) 2021-04-16 07:46:20 -07:00
flake8_hook.py
generate_torch_version.py
generated_dirs.txt
git-clang-format
git-pre-commit
git_add_generated_dirs.sh
git_reset_generated_dirs.sh
mypy_wrapper.py Use mypy internals instead of fnmatch for mypy wrapper (#55702) 2021-04-12 11:30:16 -07:00
nightly.py Fix nightly tool for python 3.6 (#55776) 2021-04-12 09:34:29 -07:00
print_test_stats.py Include short test suites in total_seconds stat (#56040) 2021-04-14 11:53:55 -07:00
pytorch.version
README.md Harden "Add annotations" workflow (#56071) 2021-04-16 07:46:20 -07:00
run_shellcheck.sh Run ShellCheck on scripts in GitHub Actions workflows (#55486) 2021-04-08 13:15:00 -07:00
test_history.py Clarify tools/test_history.py output for re-runs (#55106) 2021-03-31 14:54:38 -07:00
trailing_newlines.py Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
translate_annotations.py Translate annotation line numbers from merge to head (#55569) 2021-04-09 11:12:40 -07:00

This folder contains a number of scripts which are used as part of the PyTorch build process. This directory also doubles as a Python module hierarchy (thus the __init__.py).

Overview

Modern infrastructure:

  • autograd - Code generation for autograd. This includes definitions of all our derivatives.
  • jit - Code generation for JIT
  • shared - Generic infrastructure that scripts in tools may find useful.
    • module_loader.py - Makes it easier to import arbitrary Python files in a script, without having to add them to the PYTHONPATH first.
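
As a rough illustration of importing a Python file by path without touching PYTHONPATH, here is a minimal sketch built on the standard-library `importlib.util` API. The function name `import_from_path` is hypothetical, and this is not necessarily how module_loader.py is implemented.

```python
import importlib.util
from types import ModuleType


def import_from_path(name: str, path: str) -> ModuleType:
    """Load the Python file at `path` as a module called `name`.

    Hypothetical helper for illustration only.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    # Execute the file's top-level code inside the new module object.
    spec.loader.exec_module(module)
    return module
```

A script could then call, for example, `import_from_path("helper", "/some/dir/helper.py")` (a made-up path) and use the returned module object directly.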

Legacy infrastructure (we should kill this):

  • cwrap - Implementation of legacy code generation for THNN/THCUNN. This is used by nnwrap.

Build system pieces:

  • setup_helpers - Helper code for searching for third-party dependencies on the user system.
  • build_pytorch_libs.py - Cross-platform script that builds all of the constituent libraries of PyTorch, but not the PyTorch Python extension itself.
  • build_libtorch.py - Script for building libtorch, a standalone C++ library without Python support. This build script is tested in CI.
  • fast_nvcc - Mostly-transparent wrapper over nvcc that parallelizes compilation when used to build CUDA files for multiple architectures at once.
    • fast_nvcc.py - Python script, entrypoint to the fast nvcc wrapper.

Developer tools which you might find useful:

  • clang_tidy.py - Script for running clang-tidy on the lines of your code that you changed.
  • extract_scripts.py - Extract scripts from .github/workflows/*.yml into a specified dir, on which linters such as run_shellcheck.sh can be run. Assumes that every run script has shell: bash unless a different shell is explicitly listed on that specific step (so defaults doesn't currently work), but also has some rules for other situations such as actions/github-script. Exits with nonzero status if any of the extracted scripts contain GitHub Actions expressions: ${{ <expression> }}
  • git_add_generated_dirs.sh and git_reset_generated_dirs.sh - Use these to force-add generated files to your Git index, so that you can conveniently run diffs on them when working on code generation. (See also generated_dirs.txt, which specifies the list of directories with generated files.)
  • mypy_wrapper.py - Run mypy on a single file using the appropriate subset of our mypy*.ini configs.
  • run_shellcheck.sh - Find *.sh files (recursively) in the directories specified as arguments, and run ShellCheck on all of them.
  • test_history.py - Query S3 to display history of a single test across multiple jobs over time.
  • trailing_newlines.py - Take names of UTF-8 files from stdin, print names of nonempty files whose contents don't end in exactly one trailing newline, exit with status 1 if no output printed or 0 if some filenames were printed.
  • translate_annotations.py - Read Flake8 or clang-tidy warnings (according to a --regex) from a --file, convert to the JSON format accepted by pytorch/add-annotations-github-action, and translate line numbers from HEAD back in time to the given --commit by running git diff-index --unified=0 appropriately.
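
The rule that trailing_newlines.py is described as checking can be sketched as follows. This is a minimal illustration and not the script's actual code: the function names are hypothetical, only LF line endings are considered, and file contents are passed in as bytes rather than read from names on stdin.

```python
from typing import Dict, List


def has_correct_trailing_newline(data: bytes) -> bool:
    """Return True if data is empty or ends in exactly one newline.

    Hypothetical helper; empty content is accepted because only
    nonempty files are reported.
    """
    if not data:
        return True
    return data.endswith(b"\n") and not data.endswith(b"\n\n")


def files_to_report(contents_by_name: Dict[str, bytes]) -> List[str]:
    """Return the names whose contents violate the trailing-newline rule."""
    return [
        name
        for name, data in contents_by_name.items()
        if not has_correct_trailing_newline(data)
    ]
```

A caller would print each reported name and choose its exit status from whether the returned list is empty.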

Important if you want to run on AMD GPU:

  • amd_build - HIPify scripts, for transpiling CUDA into AMD HIP. Right now, PyTorch and Caffe2 share logic for how to do this transpilation, but have separate entry-points for transpiling either PyTorch or Caffe2 code.
    • build_amd.py - Top-level entry point for HIPifying our codebase.

Tools which are only situationally useful: