pytorch/tools/stats
Catherine Lee dab272eed8 [td] Consistent pytest cache (#113804)
Move pytest cache downloading into the build step and store the cache in the additional CI files so that it stays consistent across shards.

Only the build env is taken into account now, instead of also the test config, since we might not have the test config at build time. This makes the cache less specific, but that may actually be better, since a test that fails in one test config is likely to fail in the others too (it might even be worth not keying on build env at all, but that's a different topic).
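Keying only on the build env means the cache location can be computed before any test config exists. A minimal sketch of such a key derivation (the helper name and layout are assumptions, not the actual PyTorch code):

```python
import hashlib

def pytest_cache_key(build_env: str, prefix: str = "pytest_cache") -> str:
    """Derive a cache location from the build environment alone.

    Test config is deliberately left out of the key, so every shard of
    a given build resolves to the same cache entry.
    (Hypothetical helper, not the actual PyTorch implementation.)
    """
    digest = hashlib.sha256(build_env.encode()).hexdigest()[:16]
    return f"{prefix}/{digest}"
```

Because the key depends only on the build env, all shards of one build read and write the same cache entry.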

Each cache upload should only include information from the current run. Do not merge the current cache with the downloaded cache during upload (this shouldn't matter anyway, since the downloaded cache won't exist at upload time).
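The no-merge rule can be sketched as: read only the `lastfailed` entries pytest wrote during this run and emit them as a fresh payload, ignoring any previously downloaded cache. The file layout follows pytest's `.pytest_cache/v/cache/lastfailed` convention; the function itself is hypothetical, not the actual PyTorch code:

```python
import json
import tempfile
from pathlib import Path

def build_upload_payload(pytest_cache_dir: str) -> dict:
    """Build the cache upload from the current run only.

    Deliberately does NOT merge with any previously downloaded cache:
    each upload should reflect just this run's failures.
    (Hypothetical sketch, not the actual PyTorch implementation.)
    """
    lastfailed = Path(pytest_cache_dir) / "v" / "cache" / "lastfailed"
    if not lastfailed.exists():
        return {"lastfailed": {}}
    return {"lastfailed": json.loads(lastfailed.read_text())}

# Demo: simulate a cache directory that pytest wrote during this run.
with tempfile.TemporaryDirectory() as cache_dir:
    entry_dir = Path(cache_dir) / "v" / "cache"
    entry_dir.mkdir(parents=True)
    (entry_dir / "lastfailed").write_text(json.dumps({"test_a.py::test_x": True}))
    payload = build_upload_payload(cache_dir)
```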

From what I can tell of the S3 retention policy, pytest cache files will be deleted after 30 days (cc @ZainRizvi to confirm), so we never have to worry about space or about pulling old versions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113804
Approved by: https://github.com/ZainRizvi
2023-11-17 23:45:47 +00:00
Files in this directory:

- __init__.py
- check_disabled_tests.py
- export_test_times.py
- import_test_stats.py
- monitor.py
- README.md
- upload_artifacts.py
- upload_dynamo_perf_stats.py
- upload_external_contrib_stats.py
- upload_metrics.py (last touched by "Include job name in the emitted metrics" (#113884), 2023-11-16)
- upload_sccache_stats.py
- upload_stats_lib.py
- upload_test_stat_aggregates.py
- upload_test_stats.py (last touched by "[ez] Remove unused code in upload_test_stats" (#111504), 2023-10-19)

PyTorch CI Stats

We track various stats about each CI job.

  1. Jobs upload their artifacts to an intermediate data store (either GitHub Actions artifacts or S3, depending on what permissions the job has). Example: a9f6a35a33/.github/workflows/_linux-build.yml (L144-L151)
  2. When a workflow completes, a workflow_run event triggers upload-test-stats.yml.
  3. upload-test-stats downloads the raw stats from the intermediate data store and uploads them as JSON to Rockset, our metrics backend.
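Step 3 amounts to collecting the per-job JSON blobs and writing them to Rockset in batches. A rough sketch of the batching side (the function and batch size are illustrative assumptions; the real upload is done by the scripts in this directory using credentials the workflow holds):

```python
from typing import Iterator

def batch_docs(docs: list, batch_size: int = 500) -> Iterator[list]:
    """Yield documents in fixed-size batches for upload.

    (Illustrative sketch; the batch size is an assumption, not the
    value the actual upload scripts use.)
    """
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]

# Example: 1200 job-stat documents split into upload batches.
docs = [{"job_id": i, "status": "success"} for i in range(1200)]
batches = list(batch_docs(docs))
```

Batching keeps each write request to the metrics backend bounded in size, regardless of how many jobs a workflow ran.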
The flow, as a mermaid diagram:

    graph LR
        J1[Job with AWS creds<br>e.g. linux, win] --raw stats--> S3[(AWS S3)]
        J2[Job w/o AWS creds<br>e.g. mac] --raw stats--> GHA[(GH artifacts)]

        S3 --> uts[upload-test-stats.yml]
        GHA --> uts

        uts --json--> R[(Rockset)]

Why this weird indirection? Because writing to Rockset requires special permissions which, for security reasons, we do not want to give to pull request CI. Instead, we implemented GitHub's recommended pattern for cases like this.

For more details about what stats we export, check out upload-test-stats.yml.