pytorch/tools/stats
Huy Do 6db196b744 Specify the head branch when upload perf stats to Rockset (#97643)
Before this, my assumption was that the workflow was only run on the main branch. This is not correct anymore as it could also now be run as part of the PR, i.e. https://hud.pytorch.org/pr/91316.  So this change does two things:

* Always upload inductor-A100-perf-nightly artifacts to S3 once completed by removing the main branch gating.
* Add `head_branch` to Rockset records, so that the [dashboard](https://torchci-git-fork-huydhn-add-compilers-bench-74abf8-fbopensource.vercel.app/benchmark/compilers) knows whether the records come from the daily schedule on the main branch or from an experimental PR. In the former case, `head_branch` would be set to `master`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97643
Approved by: https://github.com/desertfire
2023-03-27 17:17:52 +00:00
__init__.py
check_disabled_tests.py Upload external contribution data to s3 (#95747) 2023-03-02 21:57:28 +00:00
export_test_times.py
import_test_stats.py Use s3 for some test infra files (#94642) 2023-02-14 19:45:41 +00:00
monitor.py Bump black version to 23.1.0 (#96578) 2023-03-15 06:27:59 +00:00
README.md
upload_artifacts.py
upload_dynamo_perf_stats.py Specify the head branch when upload perf stats to Rockset (#97643) 2023-03-27 17:17:52 +00:00
upload_external_contrib_stats.py Upload external contribution data to s3 (#95747) 2023-03-02 21:57:28 +00:00
upload_sccache_stats.py
upload_stats_lib.py Bump black version to 23.1.0 (#96578) 2023-03-15 06:27:59 +00:00
upload_test_stats.py Upload failed and rerun tests (#97304) 2023-03-22 22:03:56 +00:00

PyTorch CI Stats

We track various stats about each CI job.

  1. Jobs upload their artifacts to an intermediate data store (either GitHub Actions artifacts or S3, depending on what permissions the job has). Example: a9f6a35a33/.github/workflows/_linux-build.yml (L144-L151)
  2. When a workflow completes, a workflow_run event triggers upload-test-stats.yml.
  3. upload-test-stats downloads the raw stats from the intermediate data store and uploads them as JSON to Rockset, our metrics backend.
```mermaid
graph LR
    J1[Job with AWS creds<br>e.g. linux, win] --raw stats--> S3[(AWS S3)]
    J2[Job w/o AWS creds<br>e.g. mac] --raw stats--> GHA[(GH artifacts)]

    S3 --> uts[upload-test-stats.yml]
    GHA --> uts

    uts --json--> R[(Rockset)]
```
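
The flow above can be sketched in Python. This is only an illustration under stated assumptions, not the code in this directory: `pick_intermediate_store`, `build_records`, the store labels, and the record field names are all hypothetical, and the actual upload to Rockset is left out. It shows step 1 (choosing the intermediate store based on the job's credentials) and step 3 (wrapping downloaded raw stats as JSON records).

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def pick_intermediate_store(has_aws_creds: bool) -> str:
    """Step 1: jobs with AWS credentials use S3, all others use GitHub artifacts.

    The returned labels are hypothetical names chosen for this sketch.
    """
    return "s3" if has_aws_creds else "gha-artifacts"


def build_records(stats_dir: Path, workflow_run_id: int) -> List[Dict[str, Any]]:
    """Step 3: wrap each downloaded raw stats file in a JSON-serializable record.

    Only *.json files directly under stats_dir are read, in sorted order.
    """
    records = []
    for path in sorted(stats_dir.glob("*.json")):
        raw = json.loads(path.read_text())
        records.append(
            {
                "workflow_run_id": workflow_run_id,  # hypothetical field name
                "source_file": path.name,  # hypothetical field name
                "stats": raw,
            }
        )
    return records


if __name__ == "__main__":
    # Stand-in for the directory where raw stats were downloaded.
    with tempfile.TemporaryDirectory() as tmp:
        stats_dir = Path(tmp)
        (stats_dir / "linux.json").write_text(json.dumps({"tests": 10, "failures": 1}))
        records = build_records(stats_dir, workflow_run_id=123)
        print(json.dumps(records, indent=2))
```

Keeping the record building separate from any network call means the untrusted side only ever produces files, while the privileged side decides what is written to the metrics backend.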

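Why this weird indirection? Because writing to Rockset requires special permissions which, for security reasons, we do not want to give to pull request CI. Instead, we implemented GitHub's recommended pattern for cases like this: pull request jobs only produce artifacts, and a separate workflow, triggered by the workflow_run event and run with the required permissions, processes them.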

For more details about what stats we export, check out upload-test-stats.yml.