# PyTorch CI Stats

We track various stats about each CI job.

  1. Jobs upload their artifacts to an intermediate data store (either GitHub Actions artifacts or S3, depending on what permissions the job has). Example: a9f6a35a33/.github/workflows/_linux-build.yml (L144-L151)
  2. When a workflow completes, a `workflow_run` event triggers `upload-test-stats.yml`.
  3. `upload-test-stats` downloads the raw stats from the intermediate data store and uploads them as JSON to S3, from which they are ingested into our database backend.
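The upload in step 3 amounts to serializing raw stat records into a database-ready JSON form. A minimal sketch of that conversion, assuming newline-delimited JSON as the target shape (the record fields and function name here are hypothetical; the actual logic lives in the scripts in this directory):

```python
import json


def to_json_rows(records):
    """Serialize raw stat records as newline-delimited JSON, a common
    shape for object-store files that a database backend ingests.

    Illustrative sketch only: the field names below are made up, and
    the real upload logic lives in tools/stats/.
    """
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


# Hypothetical raw stats pulled from the intermediate data store.
raw_stats = [
    {"job": "linux-build", "duration_s": 312.4},
    {"job": "win-test", "duration_s": 1048.9},
]

print(to_json_rows(raw_stats))
```

Sorting keys keeps the serialized rows deterministic, which makes re-uploads idempotent and diffs easy to inspect.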
```mermaid
graph LR
    J1[Job with AWS creds<br>e.g. linux, win] --raw stats--> S3[(AWS S3)]
    J2[Job w/o AWS creds<br>e.g. mac] --raw stats--> GHA[(GH artifacts)]

    S3 --> uts[upload-test-stats.yml]
    GHA --> uts

    uts --json--> s3[(s3)]
    s3 --> DB[(database)]
```

Why this weird indirection? Because writing to the database requires special permissions which, for security reasons, we do not want to give to pull request CI. Instead, we implemented GitHub's recommended pattern for cases like this.
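Concretely, the pattern splits the pipeline into an unprivileged workflow that runs on pull requests and only writes artifacts, and a privileged workflow that runs in the base repository's context and holds the upload credentials. A minimal sketch of the privileged side's trigger (the workflow names here are illustrative):

```yaml
# Runs in the context of the base repository, so it can hold the
# S3/database credentials that pull-request CI must never see.
name: upload-test-stats
on:
  workflow_run:
    # Fires after the named CI workflow completes, regardless of
    # which fork the triggering pull request came from.
    workflows: [pull]
    types: [completed]
```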

For more details about what stats we export, check out `upload-test-stats.yml`.