RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v5]

Christian Hagedorn chagedorn at openjdk.org
Wed Aug 13 08:43:17 UTC 2025


On Fri, 8 Aug 2025 10:51:42 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations.
>> 
>> The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out.
>> 
>> Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers.
>> 
>> Testing:
>>  - [x] Github Actions
>>  - [x] tier1, tier2 on all platforms
>>  - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug
>>  - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail)
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8308094-timeout
>  - Rename _timer
>  - remove _timeout_armed
>  - ASSERT
>  - Merge branch 'master' into JDK-8308094-timeout
>  - No acquire release semantics
>  - Factor Linux specific timeout functionality out of share/
>  - Move timeout disarm above if
>  - Merge branch 'master' into JDK-8308094-timeout
>  - Fix SIGALRM test
>  - ... and 1 more: https://git.openjdk.org/jdk/compare/c5920029...8bb5eb7a

Nice improvement! I left some small comments in the code but otherwise the change looks reasonable!

Can we also add some tests for the new `CompileTaskTimeout` flag? Maybe we can add a positive test and negative test:
- Positive test: Could just be a hello world test with a reasonably large non-zero value for `CompileTaskTimeout`.
- Negative test: Maybe we can just set `CompileTaskTimeout=1` which will probably crash immediately for a hello world program. That could be run in a separate VM and then we can check the output. If we are able to also dump the compile task/method that is timing out, we might even be able to match on that when run with `CompileOnly` for a single method. But not sure if the latter is possible.

What do you think?

src/hotspot/os/linux/compilerThreadTimeout_linux.cpp line 43:

> 41:   switch (signo) {
> 42:     case TIMEOUT_SIGNAL: {
> 43:       assert(false, "compile task timed out");

Can you somehow also print the task which caused the timeout? Will just accessing `CompilerThread::current()->task()` work?

src/hotspot/os/linux/compilerThreadTimeout_linux.hpp line 46:

> 44: #endif // !PRODUCT
> 45:  public:
> 46:   CompilerThreadTimeoutLinux() NOT_PRODUCT(DEBUG_ONLY(: _timer(nullptr))) {};

Why do you need the `NOT_PRODUCT`? It only wraps `DEBUG_ONLY`. If that's not set, the `NOT_PRODUCT` wraps nothing.

src/hotspot/os/linux/globals_linux.hpp line 94:

> 92:   develop(intx, CompileTaskTimeout, 0,                                  \
> 93:           "Set the timeout for compile tasks' CPU time in milliseconds."\
> 94:           "0 = no timeout (default)")                                   \

Suggestion:

          " 0 = no timeout (default)")                                  \

src/hotspot/share/compiler/compilerThread.hpp line 52:

> 50:   CompilerThreadTimeoutGeneric() {};
> 51:   void arm() { return; };
> 52:   void disarm() { return; };

You can remove the `return`:
Suggestion:

  void arm() {};
  void disarm() {};

src/hotspot/share/compiler/compilerThread.hpp line 54:

> 52:   void disarm() { return; };
> 53:   bool init_timeout() { return true; };
> 54: };

Should we also guard this with `ifndef LINUX` since it's only used for non-Linux?

-------------

PR Review: https://git.openjdk.org/jdk/pull/26023#pullrequestreview-3114494124
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272483803
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272509382
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272512593
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272517713
PR Review Comment: https://git.openjdk.org/jdk/pull/26023#discussion_r2272516811


More information about the hotspot-dev mailing list