RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2]

Dean Long dlong at openjdk.org
Tue Aug 5 04:00:09 UTC 2025


On Tue, 1 Jul 2025 09:11:32 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations.
>> 
>> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out.
>> 
>> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers.
>> 
>> Testing:
>>  - [x] Github Actions
>>  - [x] tier1, tier2 on all platforms
>>  - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug
>>  - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail)
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8308094-timeout
>  - Fix SIGALRM test
>  - Add timeout functionality to compiler threads

src/hotspot/share/compiler/compileBroker.cpp line 236:

> 234:     {
> 235:       MutexLocker notifier(thread, CompileTaskWait_lock);
> 236:       thread->timeout_disarm();

Is holding the lock above important for disasming?  If not, can we move the disarms from the if/else branches and do it unconditionally before the if?

src/hotspot/share/compiler/compilerThread.cpp line 97:

> 95:     switch (signo) {
> 96:       case TIMEOUT_SIGNAL: {
> 97:         assert(!Atomic::load_acquire(&_timeout_armed), "compile task timed out");

Why do we need acquire?  Only the current thread is ever going to be looking at this value, right?

src/hotspot/share/compiler/compilerThread.cpp line 157:

> 155:     // Start the timer.
> 156:     timer_settime(_timeout_timer, 0, &its, nullptr);
> 157:     Atomic::release_store(&_timeout_armed, (bool) true);

Same questions about release.  Are other threads reading/writing this value?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2253044899
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2253045931
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2253046604


More information about the hotspot-dev mailing list