Mirror of https://github.com/saymrwulf/pytorch.git (synced 2026-05-14 20:57:59 +00:00)
Disable core dump when rerunning disabled tests (#104131)
Fixes https://github.com/pytorch/pytorch/issues/103612

Figuring out a way to dynamically stop generating core dumps on the Linux runners is harder than I thought. The recommended solution is to set a custom script in `/proc/sys/kernel/core_pattern`, as documented in https://man7.org/linux/man-pages/man5/core.5.html, so that we could dynamically stop generating more core files when the disk space drops below a certain threshold. However, AFAICT this is not yet supported inside a Docker container (https://stackoverflow.com/questions/59986788).

In addition, when the runner runs out of space, none of the subsequent cleanup steps can run. The next job on that runner will also fail because nothing can be set up, e.g. https://github.com/pytorch/pytorch/actions/runs/5357044327/jobs/9717914230

So this is only a limited fix: do not generate core dumps while re-running disabled tests, because a crashed test is run multiple times there and will generate multiple core files.

### Testing

```
ulimit -c 0
kill -3 PID
```

Check that no core file is generated afterwards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104131
Approved by: https://github.com/kit1980, https://github.com/malfet
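The fix relies on the core-size limit being inherited by every process the test script starts. The following is a minimal sketch of that behavior, not part of the commit; the variable name `child_limit` is made up for illustration. It sets the limit inside a subshell and asks a child `bash` what it sees.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a core-size limit of 0 is inherited by child processes.
set -euo pipefail

# Run inside a command substitution (a subshell) so that the caller's own
# limit is left untouched.
child_limit=$(
  ulimit -c 0            # disable core dumps for this subshell and its children
  bash -c 'ulimit -c'    # a child process reports the limit it inherited
)

echo "core limit seen by child: ${child_limit}"
```

Because `ulimit -c 0` without `-S` or `-H` lowers both the soft and the hard limit, a child process cannot raise the limit again unless it is privileged.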
Parent: 75dab587ef
Commit: 202a9108f7
1 changed file with 13 additions and 0 deletions
@@ -58,6 +58,19 @@ if [[ "$BUILD_ENVIRONMENT" == *clang9* ]]; then
   export VALGRIND=OFF
 fi
 
+if [[ "${PYTORCH_TEST_RERUN_DISABLED_TESTS}" == "1" ]]; then
+  # When rerunning disable tests, do not generate core dumps as it could consume
+  # the runner disk space when crashed tests are run multiple times. Running out
+  # of space is a nasty issue because there is no space left to even download the
+  # GHA to clean up the disk
+  ulimit -c 0
+
+  # Note that by piping the core dump to a script set in /proc/sys/kernel/core_pattern
+  # as documented in https://man7.org/linux/man-pages/man5/core.5.html, we could
+  # dynamically stop generating more core file when the disk space drops below a
+  # certain threshold. However, this is not supported inside Docker container atm
+fi
+
 # Get fully qualified path using realpath
 if [[ "$BUILD_ENVIRONMENT" != *bazel* ]]; then
   CUSTOM_TEST_ARTIFACT_BUILD_DIR=$(realpath "${CUSTOM_TEST_ARTIFACT_BUILD_DIR:-"build/custom_test_artifacts"}")
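For reference, the `core_pattern` alternative mentioned in the comment above could look roughly like the sketch below. This is not what the commit does, and per the commit message it is not usable from inside a Docker container. All names here (`core_handler.sh`, `CORE_DIR`, `MIN_FREE_KB`, `has_enough_space`, `free_kb_for`, `handle_core`) are hypothetical. The only behavior taken from core(5) is that a pattern starting with `|` pipes the core to the handler's standard input, and that `%p` and `%e` expand to the PID and executable name.

```shell
#!/usr/bin/env bash
# Hypothetical core_pattern pipe handler: keep a core dump only while the
# target filesystem still has enough free space. It would be registered on
# the host, as root, with something like:
#   echo '|/usr/local/bin/core_handler.sh %p %e' > /proc/sys/kernel/core_pattern

CORE_DIR="${CORE_DIR:-/tmp/cores}"        # assumed location for kept cores
MIN_FREE_KB="${MIN_FREE_KB:-1048576}"     # assumed threshold: 1 GiB, in KiB

# Succeeds when the free space is at or above the threshold.
has_enough_space() {
  local free_kb="$1" min_kb="$2"
  [[ "$free_kb" -ge "$min_kb" ]]
}

# Prints the available space, in KiB, of the filesystem holding the path.
free_kb_for() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Reads the core from stdin and either stores it or throws it away.
handle_core() {
  local pid="$1" exe="$2"
  mkdir -p "$CORE_DIR"
  if has_enough_space "$(free_kb_for "$CORE_DIR")" "$MIN_FREE_KB"; then
    cat > "${CORE_DIR}/core.${exe}.${pid}"
  else
    cat > /dev/null   # drain the pipe so the kernel is not left blocked
  fi
}

# Act only when executed directly with the two expected arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 2 ]]; then
  handle_core "$1" "$2"
fi
```

Keeping the threshold check in its own function makes the policy easy to test without producing a real crash.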