RFR: 8371990: Remove two second delayed OOME after GC shutdown

Tue Nov 18 10:21:32 UTC 2025

On Mon, 17 Nov 2025 13:15:14 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> In [JDK-8366865](https://bugs.openjdk.org/browse/JDK-8366865) the shutdown code was tweaked so that allocating code would try to block for two seconds and if the JVM didn't shut down within that time, an OOME was thrown from the allocating thread.
> 
> One of the reasons this code was introduced was to deal with a shutdown problem where the thread that was shutting down the JVM would first initiate the shutdown of the GC and only *after* that call the JVMTI shutdown events and callbacks. The JVMTI callbacks could call arbitrary Java code that could try to allocate memory, and if the heap was full, that code would have to wait for a GC to do its thing and hand back memory. But the GC had already initiated its termination protocol and could be unresponsive to that request, which in turn would lead to a hanging JVM process.
> 
> The problem described above was finally fixed with [JDK-8367902](https://bugs.openjdk.org/browse/JDK-8367902).
> 
> So, I propose that we get rid of the workaround put into place with [JDK-8366865](https://bugs.openjdk.org/browse/JDK-8366865).
> 
> The proposed patch restructures the GC shutdown a little bit. The idea is that all threads that want to schedule a GC VM Operation already take the Heap_lock, and while holding that lock they check the `_is_shutting_down` variable. If the JVM is indeed shutting down, the threads refuse to schedule the GC operation.
> 
> Depending on the type of thread that is trying to schedule the GC operation, we do one of two things:
> 
> 1) If it is a Java thread, we simply block the thread from running. Either the thread is a daemon thread, in which case blocking it will not hinder the shutdown, or it is a non-daemon thread but the Java code called System.halt, which doesn't wait for non-daemon threads.
> 
> 2) If it is a Concurrent GC thread, then we let the thread proceed but with the order to skip the GC operation. This is done because the current shutdown code calls "stop" on the Concurrent GC threads and then waits for them to signal back when they have stopped running their code. So, we need to let them run to completion.
> 
> There are some G1 specific details to look at:
> 
> 1) I've reverted the G1 `concurrent_mark_is_terminating` checks.
> 
> 2) `try_collect_concurrently` queries `_is_shutting_down` while holding the lock, and then uses that queried value after the lock is released.
> 
> 3) I've left some breadcrumbs in `should_clear_region`. Any suggestions on what to do with the comment and assert?
> 
> This has been ...
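
For readers following along, the check described above can be sketched roughly as follows. This is a minimal standalone model, not the code in the patch: `HeapModel`, `ThreadKind`, `ScheduleResult`, and `try_schedule_gc` are hypothetical names, and a `std::mutex` stands in for `Heap_lock`. It also shows the pattern mentioned for `try_collect_concurrently`, where the flag is read under the lock and the saved value is used after the lock is released.

```cpp
#include <mutex>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
enum class ThreadKind { Java, ConcurrentGC };
enum class ScheduleResult { Scheduled, Skipped, BlockJavaThread };

class HeapModel {
  std::mutex _heap_lock;           // models Heap_lock
  bool _is_shutting_down = false;  // read and written only under _heap_lock
  int _scheduled_ops = 0;          // counts GC operations that were accepted

public:
  // Models the shutdown thread flipping the flag while holding the lock.
  void initiate_shutdown() {
    std::lock_guard<std::mutex> guard(_heap_lock);
    _is_shutting_down = true;
  }

  // Models a thread that wants to schedule a GC VM operation.
  ScheduleResult try_schedule_gc(ThreadKind kind) {
    bool shutting_down;
    {
      std::lock_guard<std::mutex> guard(_heap_lock);
      shutting_down = _is_shutting_down;  // query while holding the lock
      if (!shutting_down) {
        _scheduled_ops++;
        return ScheduleResult::Scheduled;
      }
    }
    // The lock is released here; only the value read under the lock is used.
    // Java threads are told to block; concurrent GC threads skip the
    // operation so they can run to completion and acknowledge "stop".
    return kind == ThreadKind::Java ? ScheduleResult::BlockJavaThread
                                    : ScheduleResult::Skipped;
  }

  int scheduled_ops() {
    std::lock_guard<std::mutex> guard(_heap_lock);
    return _scheduled_ops;
  }
};
```

In this model a caller that receives `BlockJavaThread` would park the thread, and a caller that receives `Skipped` would simply continue without running the GC operation. How the real code blocks a Java thread is not shown here.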

Marked as reviewed by iwalulya (Reviewer).

src/hotspot/share/gc/g1/g1Policy.cpp line 1284:

> 1282:   // We should not be starting a concurrent start pause if the concurrent mark
> 1283:   // thread is terminating.
> 1284:   assert(!_g1h->concurrent_mark_is_terminating(), "Should not reach here");

We can also remove this assert and associated comment.

src/hotspot/share/gc/g1/g1RemSet.cpp line 1020:

> 1018:       // Mark phase midway, which might have also left stale marks in old generation regions.
> 1019:       // There might actually have been scheduled multiple collections, but at that point we do
> 1020:       // not care that much about performance and just do the work multiple times if needed.

We can do away with the comment and the assert as we can no longer have GCs after shutdown has been initiated.

-------------

PR Review: https://git.openjdk.org/jdk/pull/28349#pullrequestreview-3476648375
PR Review Comment: https://git.openjdk.org/jdk/pull/28349#discussion_r2537281693
PR Review Comment: https://git.openjdk.org/jdk/pull/28349#discussion_r2537241668