RFR: 8344332: (bf) Migrate DirectByteBuffer to use java.lang.ref.Cleaner [v11]

Tue Feb 18 11:04:24 UTC 2025

On Wed, 29 Jan 2025 19:56:00 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> DirectByteBuffers are still using old `jdk.internal.ref.Cleaner` implementation. That implementation carries a doubly-linked list, and so makes DBB suffer from the same issue fixed for generic `java.lang.ref.Cleaner` users with [JDK-8343704](https://bugs.openjdk.org/browse/JDK-8343704). See the bug for the reproducer.
>> 
>> We can migrate DBBs to use `java.lang.ref.Cleaner`.
>> 
>> There are two pecularities during this rewrite.
>> 
>> First, the old ad-hoc `Cleaner` implementation used to exit the VM when cleaning action failed. I presume it was to avoid memory leak / accidental reuse of the buffer. I moved the relevant block to `Deallocator` directly. Unfortunately, I cannot easily test it.
>> 
>> Second is quite a bit hairy. Old DBB cleaning code was hooked straight into `Reference` processing loop. This was possible because we could infer that the weak references we are processing were DBB cleaning actions, since old `Cleaner` was the only use of this code. With standard `Cleaner`, we have lost this association, and so we cannot really do this from the reference processing loop. With the patched version, we now rely on normal `Cleaner` thread to do cleanups for us.
>> 
>> Because of this, there is a new outpacing opportunity window where reference processing might have been over, but cleaner thread has not reacted yet. This is why we need another way to check progress that involves checking if cleaner has acted.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `java/nio java/io`
>>  - [x] Linux AArch64 server fastdebug, `java/nio java/io`
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert waitForReferenceProcessing removals, see JDK-8305186

There are several changes here that I think are of interest.

(1) Put the cleanables for DirectByteBuffer in a separate j.l.r.Cleaner.  I
don't think that was done in previous experiments with eliminating the use of
j.i.r.Cleaner.  That seems like a good thing to do, to avoid interleaving the
processing of these short and potentially more performance-critical cleanables
with others.

(2) Do a GC for each iteraton of the slow path (i.e. after each exponentially
increasing sleep period).  Neither the existing code (which I had a hand in)
nor its predecessor did that.  I'm currently doubtful of this being a good
idea.  The exponential backoff in the current code was just retained from the
prior code, and I think doesn't actually do much other than delay OOME to
allow other threads to happen to close any direct buffers they might be using.
In the current code (using j.i.r.Cleaner), once waitForReference returns false
all Cleaners discovered by a prior GC will have been processed.  I wonder,
with the current code, under what circumstances do we actually get into those
sleeps, but don't ultimately OOME?

To provide something similar with j.l.r.Cleaner would require an operation
on that class similar to waitForReferenceProcessing.  If there is pending
cleanup work then wait for some and then return true.  If there isn't any
pending cleanup work then return false.  I've spent some time thinking about
how this could be done, and think it's feasible (working on a prototype that
seems like it should work, but hasn't been tested at all yet), though not as
simple as one might wish.

(3) Put the slow path under a lock.  That seems clearly essential with the
change to iterate GCs.  It might be that it's a mistake in the old code to not
do something similar.  The current code allows essentially back to back to
back GCs as multiple threads hit the slow path more or less simultaneously.
You could have several threads come in and all waitForReferenceProcessing, all
finish that loop at the same time and gang-request GCs.  That doesn't seem
good.

(4) Add the canary mechanism.  I think this is a poor replacement for what
waitForReferenceProcessing does currently, since it is not even remotely
reliable.  It might make the path to OOME less likely, but bad luck could make
it do nothing useful at all.

src/java.base/share/classes/java/nio/Bits.java line 122:

> 120:         // Short on memory, with potentially many threads competing for it.
> 121:         // To alleviate progress races, acquire the lock and go slow.
> 122:         synchronized (Bits.class) {

Using a (somewhat) public object for this seems questionable.  Why not a
private lock object?

src/java.base/share/classes/java/nio/Bits.java line 156:

> 154:                 // Exponentially back off waiting for Cleaner to catch up.
> 155:                 try {
> 156:                     Thread.sleep(sleepTime);

There isn't a tryReserveMemory after the last sleep period, so that period is
just completely wasted time.  This is why the old code didn't use a for-loop
over the sleep counter, but instead checked for reaching the limit in the
middle of the loop.

-------------

Changes requested by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22165#pullrequestreview-2623158486
PR Review Comment: https://git.openjdk.org/jdk/pull/22165#discussion_r1959508352
PR Review Comment: https://git.openjdk.org/jdk/pull/22165#discussion_r1959509819