pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Huy Do	9f39123d18	Allow to continue when fail to configure Windows Defender (#103454) Windows Defender will soon be removed from the AMI. Without the service, the step fails with the following error: ``` Set-MpPreference : Invalid class At C:\actions-runner\_work\_temp\1f029685-bb66-496d-beb8-19268ecbe44a.ps1:5 char:1 + Set-MpPreference -DisableRealtimeMonitoring $True + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : MetadataError: (MSFT_MpPreference:root\Microsoft\...FT_MpPreference) [Set-MpPreference], CimException + FullyQualifiedErrorId : HRESULT 0x80041010,Set-MpPreference ``` For example, see https://github.com/pytorch/pytorch-canary/actions/runs/5267043497/jobs/9521809176. This is expected, as the service has been completely removed. Every place where `Set-MpPreference` is used is listed at https://github.com/search?type=code&q=org%3Apytorch+Set-MpPreference. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103454 Approved by: https://github.com/atalman	2023-06-15 18:30:58 +00:00
PyTorch MergeBot	43127f19f1	Revert "Allow disable binary build jobs on CI (#100754)" This reverts commit `4c3b52a5a9`. Reverted https://github.com/pytorch/pytorch/pull/100754 on behalf of https://github.com/huydhn due to The subset of Windows binary jobs running only in trunk fails because the runners do not have Python set up ([comment](https://github.com/pytorch/pytorch/pull/100754#issuecomment-1539586399))	2023-05-09 07:15:32 +00:00
Huy Do	4c3b52a5a9	Allow disable binary build jobs on CI (#100754) Given the recent outage w.r.t. binary workflows running on CI, I want to close the gap between them and regular CI jobs. The first part is to add the same filter step used by regular CI jobs so that oncalls can disable the job if needed. * Nightly runs are excluded as they include the step to publish nightly binaries. Allowing oncalls to disable this part requires more thought, so this covers only CI binary build and test jobs * As binary jobs don't have a concept of test matrix config, which is a required parameter to the filter script, I use a pseudo input of test config default there ### Testing * https://github.com/pytorch/pytorch/issues/100758. The job is skipped in https://github.com/pytorch/pytorch/actions/runs/4911034089/jobs/8768782689 * https://github.com/pytorch/pytorch/issues/100759. The job is skipped in https://github.com/pytorch/pytorch/actions/runs/4911033966/jobs/8768713669 Note that Windows binary jobs no longer run on PRs after https://github.com/pytorch/pytorch/pull/100638, and macOS binary jobs only run nightly. So there are only Linux jobs left. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100754 Approved by: https://github.com/ZainRizvi	2023-05-09 06:53:34 +00:00
Huy Do	1d5577b601	No need to run Windows binary build for every PR (#100638) Per the discussion with @malfet, there is no need to run the Windows binary build for every PR. We will keep it running in trunk (on push) though, just in case. This also moves the workflow back from unstable after the symlink copy fix in `860d444515`. Another data point to back this up is the high correlation between the Windows binary debug and release builds and the Windows CPU CI job. The numbers are: * `libtorch-cpu-shared-with-deps-debug` and `win-vs2019-cpu-py3` have a 0.95 correlation * `libtorch-cpu-shared-with-deps-release` and `win-vs2019-cpu-py3` have the same 0.95 correlation The rest is noise, eh? Pull Request resolved: https://github.com/pytorch/pytorch/pull/100638 Approved by: https://github.com/atalman	2023-05-04 21:57:39 +00:00
Huy Do	478a5ddd8a	Mark Windows CPU jobs as unstable (#100581) Caused by https://github.com/pytorch/pytorch/pull/100377: something removes the VS2019 installation on the non-ephemeral runner. I think moving this to unstable is a nicer way to gather signals in trunk without completely disabling the job or reverting https://github.com/pytorch/pytorch/pull/100377 (for the Nth time) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100581 Approved by: https://github.com/clee2000, https://github.com/malfet	2023-05-03 21:43:43 +00:00
Jean Schmidt	2ac6ee7f12	Migrate jobs: `windows.4xlarge`->`windows.4xlarge.nonephemeral` (#100548) This reopens PR https://github.com/pytorch/pytorch/pull/100377 # About this PR Due to increased pressure on our Windows runners, and the elevated cost of instantiating and bringing down those instances, we want to migrate instances from ephemeral to non-ephemeral. Possible impacts include breakages or misbehavior in CI jobs that put the runners in a bad state. Other possible impacts relate to resource exhaustion, especially of disk space, though memory might also be a contender, as CI trash piles up on those instances. As a somewhat middle-of-the-road approach, non-ephemeral instances are currently rotated stochastically, as older instances get higher priority to be terminated when demand is lower. The instance definitions can be found here: https://github.com/pytorch/test-infra/pull/4072 This is the first step in a multi-step approach where we will migrate away from all ephemeral Windows instances and follow the lead of `windows.g5.4xlarge.nvidia.gpu` in order to help reduce queue times for those instances. 
The phased approach is as follows: * migrate `windows.4xlarge` to `windows.4xlarge.nonephemeral` instances under `pytorch/pytorch` * migrate `windows.8xlarge.nvidia.gpu` to `windows.8xlarge.nvidia.gpu.nonephemeral` instances under `pytorch/pytorch` * submit PRs to all repositories under the `pytorch/` organization to migrate `windows.4xlarge` to `windows.4xlarge.nonephemeral` * submit PRs to all repositories under the `pytorch/` organization to migrate `windows.8xlarge.nvidia.gpu` to `windows.8xlarge.nvidia.gpu.nonephemeral` * retire `windows.4xlarge` and `windows.8xlarge.nvidia.gpu` * evaluate and start the work related to the adoption of `windows.g5.4xlarge.nvidia.gpu` to replace `windows.8xlarge.nvidia.gpu.nonephemeral` in other repositories and use cases (proposed by @huydhn) The reasoning for this phased approach is to reduce the number of possible causes to investigate if particular CI jobs misbehave. # Copilot Summary <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 579d87a</samp> This pull request migrates some windows workflows to use `nonephemeral` runners for better performance and reliability. It also adds support for new Python and CUDA versions for some binary builds. It affects the following files: `.github/templates/windows_binary_build_workflow.yml.j2`, `.github/workflows/generated-windows-binary-*.yml`, `.github/workflows/pull.yml`, `.github/actionlint.yaml`, `.github/workflows/_win-build.yml`, `.github/workflows/periodic.yml`, and `.github/workflows/trunk.yml`. 
# Copilot Poem <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 579d87a</samp> > _We're breaking free from the ephemeral chains_ > _We're running on the nonephemeral lanes_ > _We're building faster, testing stronger, supporting newer_ > _We're the non-ephemeral runners of fire_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/100377 Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/atalman (cherry picked from commit `7caac545b1`) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/100548 Approved by: https://github.com/jeanschmidt, https://github.com/janeyx99	2023-05-03 15:47:18 +00:00
PyTorch MergeBot	543b7ebb50	Revert "Migrate jobs from windows.4xlarge windows.4xlarge.nonephemeral instances (#100377)" This reverts commit `7caac545b1`. Reverted https://github.com/pytorch/pytorch/pull/100377 on behalf of https://github.com/malfet due to This is not the PR I've reviewed ([comment](https://github.com/pytorch/pytorch/pull/100377#issuecomment-1532148086))	2023-05-02 21:05:53 +00:00
Jean Schmidt	7caac545b1	Migrate jobs from windows.4xlarge windows.4xlarge.nonephemeral instances (#100377) This reopens PR [100091](https://github.com/pytorch/pytorch/pull/100091) # About this PR Due to increased pressure on our Windows runners, and the elevated cost of instantiating and bringing down those instances, we want to migrate instances from ephemeral to non-ephemeral. Possible impacts include breakages or misbehavior in CI jobs that put the runners in a bad state. Other possible impacts relate to resource exhaustion, especially of disk space, though memory might also be a contender, as CI trash piles up on those instances. As a somewhat middle-of-the-road approach, non-ephemeral instances are currently rotated stochastically, as older instances get higher priority to be terminated when demand is lower. The instance definitions can be found here: https://github.com/pytorch/test-infra/pull/4072 This is the first step in a multi-step approach where we will migrate away from all ephemeral Windows instances and follow the lead of `windows.g5.4xlarge.nvidia.gpu` in order to help reduce queue times for those instances. 
The phased approach is as follows: * migrate `windows.4xlarge` to `windows.4xlarge.nonephemeral` instances under `pytorch/pytorch` * migrate `windows.8xlarge.nvidia.gpu` to `windows.8xlarge.nvidia.gpu.nonephemeral` instances under `pytorch/pytorch` * submit PRs to all repositories under the `pytorch/` organization to migrate `windows.4xlarge` to `windows.4xlarge.nonephemeral` * submit PRs to all repositories under the `pytorch/` organization to migrate `windows.8xlarge.nvidia.gpu` to `windows.8xlarge.nvidia.gpu.nonephemeral` * retire `windows.4xlarge` and `windows.8xlarge.nvidia.gpu` * evaluate and start the work related to the adoption of `windows.g5.4xlarge.nvidia.gpu` to replace `windows.8xlarge.nvidia.gpu.nonephemeral` in other repositories and use cases (proposed by @huydhn) The reasoning for this phased approach is to reduce the number of possible causes to investigate if particular CI jobs misbehave. # Copilot Summary <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 579d87a</samp> This pull request migrates some windows workflows to use `nonephemeral` runners for better performance and reliability. It also adds support for new Python and CUDA versions for some binary builds. It affects the following files: `.github/templates/windows_binary_build_workflow.yml.j2`, `.github/workflows/generated-windows-binary-*.yml`, `.github/workflows/pull.yml`, `.github/actionlint.yaml`, `.github/workflows/_win-build.yml`, `.github/workflows/periodic.yml`, and `.github/workflows/trunk.yml`. 
# Copilot Poem <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 579d87a</samp> > _We're breaking free from the ephemeral chains_ > _We're running on the nonephemeral lanes_ > _We're building faster, testing stronger, supporting newer_ > _We're the non-ephemeral runners of fire_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/100377 Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/atalman	2023-05-02 20:41:12 +00:00
PyTorch MergeBot	e5291e633f	Revert "Migrate jobs from windows.4xlarge to windows.4xlarge.nonephemeral instances (#100091)" This reverts commit `1183eecbf1`. Reverted https://github.com/pytorch/pytorch/pull/100091 on behalf of https://github.com/huydhn due to CPU jobs start failing in trunk due to some error in MSVC setup	2023-04-26 19:17:58 +00:00
Jean Schmidt	1183eecbf1	Migrate jobs from windows.4xlarge to windows.4xlarge.nonephemeral instances (#100091)	2023-04-26 18:32:50 +02:00
Huy Do	06f19fdbe5	Turn off Windows Defender in temp folder on binary build workflow (#99389) This issue started to show up recently (https://github.com/pytorch/pytorch/actions/runs/4724983231/jobs/8385139626), and I'm pretty sure that the root cause is Windows Defender, as I did a similar fix on Windows CI a while ago (https://github.com/pytorch/pytorch/pull/96931). Without this, the Windows binary build could fail flakily when Windows Defender chooses to delete/quarantine a file in the temp folder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99389 Approved by: https://github.com/weiwangmeta	2023-04-18 16:45:38 +00:00
Nikita Shulga	2418b94576	Rename default branch to `main` (#99210) Mostly `s/@master/@main` in numerous `.yml` files. Keep `master` in `weekly.yml`, as it refers to the `xla` repo, and in `test_trymerge.py`, as it refers to the branch a PR originates from.	2023-04-16 18:48:14 -07:00
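
The `s/@master/@main` substitution described in commit `2418b94576` can be sketched in Python as below. This is only an illustration under stated assumptions, not the script the author used: the helper names `rename_refs` and `rename_in_tree` and the `KEEP_MASTER` skip set are hypothetical, and only the `weekly.yml` exception comes from the commit message (`test_trymerge.py` is not a `.yml` file, so the glob never visits it).

```python
import re
from pathlib import Path

# Hypothetical skip set: the commit message says weekly.yml keeps `master`
# because it refers to the xla repo.
KEEP_MASTER = {"weekly.yml"}

# Match "@master" only as a whole ref, so "@masterful" is left alone.
_REF = re.compile(r"@master\b")


def rename_refs(text: str) -> str:
    """Return text with every whole `@master` ref replaced by `@main`."""
    return _REF.sub("@main", text)


def rename_in_tree(root: Path) -> list[Path]:
    """Rewrite *.yml files under root in place; return the files that changed."""
    changed = []
    for path in sorted(root.rglob("*.yml")):
        if path.name in KEEP_MASTER:
            continue  # deliberately left on `master`
        old = path.read_text(encoding="utf-8")
        new = rename_refs(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `rename_in_tree(Path(".github"))` would rewrite the workflow files in place, so a dry run that only calls `rename_refs` on each file's text is a safer first step.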

12 commits