RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v5]
Christian Hagedorn
chagedorn at openjdk.org
Wed Aug 13 08:43:17 UTC 2025
On Fri, 8 Aug 2025 10:51:42 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:
>> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations.
>>
>> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out.
>>
>> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers.
>>
>> Testing:
>> - [x] Github Actions
>> - [x] tier1, tier2 on all platforms
>> - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug
>> - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail)
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8308094-timeout
> - Rename _timer
> - remove _timeout_armed
> - ASSERT
> - Merge branch 'master' into JDK-8308094-timeout
> - No acquire release semantics
> - Factor Linux specific timeout functionality out of share/
> - Move timeout disarm above if
> - Merge branch 'master' into JDK-8308094-timeout
> - Fix SIGALRM test
> - ... and 1 more: https://git.openjdk.org/jdk/compare/c5920029...8bb5eb7a
Nice improvement! I left some small comments in the code but otherwise the change looks reasonable!
Can we also add some tests for the new `CompileTaskTimeout` flag? Maybe we can add a positive test and negative test:
- Positive test: Could just be a hello world test with a reasonably large non-zero value for `CompileTaskTimeout`.
- Negative test: Maybe we can just set `CompileTaskTimeout=1` which will probably crash immediately for a hello world program. That could be run in a separate VM and then we can check the output. If we are able to also dump the compile task/method that is timing out, we might even be able to match on that when run with `CompileOnly` for a single method. But not sure if the latter is possible.
What do you think?
src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 43:
> 41: switch (signo) {
> 42: case TIMEOUT_SIGNAL: {
> 43: assert(false, "compile task timed out");
Can you somehow also print the task which caused the timeout? Will just accessing `CompilerThread::current()->task()` work?
src/hotspot/os/linux/compilerThreadTimeout_linux.hpp line 46:
> 44: #endif // !PRODUCT
> 45: public:
> 46: CompilerThreadTimeoutLinux() NOT_PRODUCT(DEBUG_ONLY(: _timer(nullptr))) {};
Why do you need the `NOT_PRODUCT`? It only wraps `DEBUG_ONLY`. If that's not set, the `NOT_PRODUCT` wraps nothing.
src/hotspot/os/linux/globals_linux.hpp line 94:
> 92: develop(intx, CompileTaskTimeout, 0, \
> 93: "Set the timeout for compile tasks' CPU time in milliseconds."\
> 94: "0 = no timeout (default)") \
Suggestion:
" 0 = no timeout (default)") \
src/hotspot/share/compiler/compilerThread.hpp line 52:
> 50: CompilerThreadTimeoutGeneric() {};
> 51: void arm() { return; };
> 52: void disarm() { return; };
You can remove the `return`:
Suggestion:
void arm() {};
void disarm() {};
src/hotspot/share/compiler/compilerThread.hpp line 54:
> 52: void disarm() { return; };
> 53: bool init_timeout() { return true; };
> 54: };
Should we also guard this with `ifndef LINUX` since it's only used for non-Linux?
-------------
PR Review: https://git.openjdk.org/jdk/pull/26023#pullrequestreview-3114494124
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272483803
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272509382
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272512593
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272517713
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272516811
More information about the hotspot-dev
mailing list