RFR: 8371990: Remove two second delayed OOME after GC shutdown
Stefan Karlsson
stefank at openjdk.org
Mon Nov 17 13:53:40 UTC 2025
On Mon, 17 Nov 2025 13:15:14 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:
> In [JDK-8366865](https://bugs.openjdk.org/browse/JDK-8366865) the shutdown code was tweaked so that allocating code would try to block for two seconds and if the JVM didn't shut down within that time, an OOME was thrown from the allocating thread.
>
> One of the reasons this code was introduced was to deal with a shutdown problem where the thread that was shutting down the JVM would first initiate the shutdown of the GC and only *after* that call the JVMTI shutdown events and callbacks. The JVMTI callbacks could call arbitrary Java code that could try to allocate memory, and if the heap was full, it would have to wait for a GC to run and hand back memory. But the GC had already initiated its termination protocol and could be unresponsive to that request, which in turn would lead to a hanging JVM process.
>
> The problem described above was finally fixed with [JDK-8367902](https://bugs.openjdk.org/browse/JDK-8367902).
>
> So, I propose that we get rid of the workaround put into place with [JDK-8366865](https://bugs.openjdk.org/browse/JDK-8366865).
>
> The proposed patch restructures the GC shutdown a little bit. The idea is that all threads that want to schedule a GC VM Operation already take the Heap_lock, and while holding that lock they check the `_is_shutting_down` variable. If the JVM is indeed shutting down, the threads refuse to schedule the GC operation.
>
> Depending on the type of thread that is trying to schedule the GC operation, we do one of two things:
>
> 1) If it is a Java thread, we simply block the thread from running. Either the thread is a daemon thread, in which case blocking it will not hinder the shutdown, or it is a non-daemon thread but the Java code called System.halt, which doesn't wait for non-daemon threads.
>
> 2) If it is a Concurrent GC thread, we let the thread proceed but instruct it to skip the GC operation. This is done because the current shutdown code calls "stop" on the Concurrent GC threads and then waits for them to signal back when they have stopped running their code. So, we need to let them run to completion.
>
> There are some G1-specific details to look at:
>
> 1) I've reverted the G1 `concurrent_mark_is_terminating` checks.
>
> 2) `try_collect_concurrently` queries the `_is_shutting_down` while holding the lock, and then uses that queried value after the lock is released.
>
> 3) I've left some breadcrumbs in `should_clear_region`. Any suggestions on what to do with the comment and assert?
>
> This has been ...
Moved from hotspot to hotspot-gc.
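
For readers skimming the quoted description, the scheduling check it outlines might be sketched roughly as follows. This is a minimal, standalone illustration and not the code in the patch: `GCScheduler`, `ThreadKind`, `ScheduleResult`, `begin_shutdown` and `try_schedule_gc` are hypothetical names, a `std::mutex` stands in for the Heap_lock, and a `BlockCaller` return value stands in for actually blocking the Java thread.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for illustration only; not HotSpot's types.
enum class ThreadKind { Java, ConcurrentGC };
enum class ScheduleResult { Scheduled, Skipped, BlockCaller };

class GCScheduler {
  std::mutex _heap_lock;           // models the Heap_lock
  bool _is_shutting_down = false;  // only accessed while holding _heap_lock
  int _scheduled_ops = 0;          // counts GC operations that were accepted

public:
  // Called by the thread that shuts down the JVM.
  void begin_shutdown() {
    std::lock_guard<std::mutex> guard(_heap_lock);
    _is_shutting_down = true;
  }

  // Every thread that wants a GC operation checks the flag under the lock.
  ScheduleResult try_schedule_gc(ThreadKind kind) {
    std::lock_guard<std::mutex> guard(_heap_lock);
    if (_is_shutting_down) {
      // Java threads are told to block (the sketch only reports this);
      // concurrent GC threads proceed but skip the operation so that
      // they can run to completion and acknowledge "stop".
      return kind == ThreadKind::Java ? ScheduleResult::BlockCaller
                                      : ScheduleResult::Skipped;
    }
    ++_scheduled_ops;
    return ScheduleResult::Scheduled;
  }

  int scheduled_ops() {
    std::lock_guard<std::mutex> guard(_heap_lock);
    return _scheduled_ops;
  }
};

// Usage example: requests are accepted before shutdown and refused after.
inline void example_usage() {
  GCScheduler scheduler;
  assert(scheduler.try_schedule_gc(ThreadKind::Java) == ScheduleResult::Scheduled);
  scheduler.begin_shutdown();
  assert(scheduler.try_schedule_gc(ThreadKind::ConcurrentGC) == ScheduleResult::Skipped);
}
```

Because the flag is set and read under the same lock, a request either is accepted before shutdown begins or observes the flag and is refused; no timeout is involved in this sketch.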
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28349#issuecomment-3541930053