pytorch/.github/actions
saienduri 7eb51e5464 Ensure GPU isolation for kubernetes pod MI300 runners. (#145829)
Fixes the underlying issue that originally caused these tests to be moved to unstable (https://github.com/pytorch/pytorch/pull/145790).
We ensure GPU isolation for each Kubernetes pod by propagating the drivers selected for the pod at the Kubernetes layer up to the docker run invocation in PyTorch CI.
Each test runner now uses only the GPUs originally assigned to its pod, so there is no overlap between test runners.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145829
Approved by: https://github.com/jeffdaily
2025-01-28 17:20:46 +00:00
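The isolation idea described in the commit above can be illustrated with a small sketch. This is not the code from #145829 or from the setup-rocm action; it is a hypothetical example that assumes the pod's device plugin exposes only the GPUs assigned to that pod as DRM render nodes under /dev/dri, and that the container is started with docker run. The helper names pod_render_nodes and docker_device_args are invented for illustration. The key point is passing individual render nodes with --device rather than the whole /dev/dri directory, so a container cannot reach GPUs that belong to another pod.

```python
import glob
from typing import Iterable, List

# ROCm compute interface (Kernel Fusion Driver); GPU compute inside the
# container needs this node in addition to the per-GPU render nodes.
KFD_DEVICE = "/dev/kfd"


def pod_render_nodes(pattern: str = "/dev/dri/renderD*") -> List[str]:
    """List the DRM render nodes visible inside this pod.

    Hypothetical helper. Assumption: the Kubernetes device plugin exposes
    only the GPUs assigned to this pod, so every node found here is ours.
    """
    return sorted(glob.glob(pattern))


def docker_device_args(render_nodes: Iterable[str]) -> List[str]:
    """Build docker run flags that pass through only the given GPUs.

    Hypothetical helper, shown for illustration only.
    """
    nodes = sorted(set(render_nodes))  # de-duplicate, stable order
    if not nodes:
        raise RuntimeError("no GPU render nodes assigned to this pod")
    args = [f"--device={KFD_DEVICE}"]
    # One --device flag per assigned render node, instead of mounting all
    # of /dev/dri, keeps other pods' GPUs out of this container.
    args += [f"--device={node}" for node in nodes]
    return args
```

Usage under the same assumptions: inside a pod that was assigned a single GPU, " ".join(docker_device_args(pod_render_nodes())) would yield flags such as --device=/dev/kfd --device=/dev/dri/renderD128, which are then appended to the docker run command line.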
build-android
checkout-pytorch [BE] Get rid of malfet/checkout@silent-checkout (#143516) 2024-12-19 00:36:36 +00:00
chown-workspace
diskspace-cleanup [ROCm] Enable post-merge trunk workflow on MI300 runners; skip and fix MI300 related failed tests (#143673) 2025-01-09 05:18:57 +00:00
download-build-artifacts
download-td-artifacts Silent TD warnings when there is no td_results.json (#142083) 2024-12-04 23:43:29 +00:00
filter-test-configs
get-workflow-job-id
linux-test
pytest-cache-download
pytest-cache-upload
setup-linux
setup-rocm Ensure GPU isolation for kubernetes pod MI300 runners. (#145829) 2025-01-28 17:20:46 +00:00
setup-win
setup-xpu
teardown-rocm
teardown-win
teardown-xpu
test-pytorch-binary Remove builder repo from workflows and scripts (#143776) 2024-12-24 14:11:51 +00:00
upload-sccache-stats Upload sccache stats into benchmark database with build step time (#140839) 2024-11-21 22:38:45 +00:00
upload-test-artifacts