RFR: 8344332: (bf) Migrate DirectByteBuffer to use java.lang.ref.Cleaner [v13]
Kim Barrett
kbarrett at openjdk.org
Sun Feb 23 10:23:05 UTC 2025
On Tue, 18 Feb 2025 14:33:16 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> DirectByteBuffers are still using old `jdk.internal.ref.Cleaner` implementation. That implementation carries a doubly-linked list, and so makes DBB suffer from the same issue fixed for generic `java.lang.ref.Cleaner` users with [JDK-8343704](https://bugs.openjdk.org/browse/JDK-8343704). See the bug for the reproducer.
>>
>> We can migrate DBBs to use `java.lang.ref.Cleaner`.
>>
>> There are two peculiarities during this rewrite.
>>
>> First, the old ad-hoc `Cleaner` implementation used to exit the VM when cleaning action failed. I presume it was to avoid memory leak / accidental reuse of the buffer. I moved the relevant block to `Deallocator` directly. Unfortunately, I cannot easily test it.
>>
>> Second is quite a bit hairy. Old DBB cleaning code was hooked straight into `Reference` processing loop. This was possible because we could infer that the weak references we are processing were DBB cleaning actions, since old `Cleaner` was the only use of this code. With standard `Cleaner`, we have lost this association, and so we cannot really do this from the reference processing loop. With the patched version, we now rely on normal `Cleaner` thread to do cleanups for us.
>>
>> Because of this, there is a new outpacing opportunity window where reference processing might have been over, but cleaner thread has not reacted yet. This is why we need another way to check progress that involves checking if cleaner has acted.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `java/nio java/io`
>> - [x] Linux AArch64 server fastdebug, `java/nio java/io`
>> - [x] Linux x86_64 server fastdebug, `all`
>> - [x] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> A bit more comments
I've made a prototype that adds Cleaner.waitForCleaning() and changes
Bits.reserveMemory() to use it.
https://github.com/openjdk/jdk/compare/master...kimbarrett:openjdk-jdk:wait-for-cleaning?expand=1
It's based on the shipilev commits 1-19, for ease of comparison and
prototyping. There are also some shortcuts (like making some things public
that likely shouldn't be), again for ease of prototyping.
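To make the shape of that change concrete, here is a rough sketch of a
reservation slow path built around such a method. This is my illustration, not
the prototype's actual code: the exact name, signature, and behavior of
Cleaner.waitForCleaning() are assumptions, and tryReserveMemory refers to the
existing helper in Bits.java.

    // Sketch only: CLEANER stands for the shared direct-buffer Cleaner, and
    // waitForCleaning() is assumed to block until the cleaner thread has
    // processed the references cleared by the preceding GC.
    private static boolean reserveWithCleaning(long size, long cap, int maxRetries)
            throws InterruptedException {
        if (tryReserveMemory(size, cap)) {     // fast path, no GC needed
            return true;
        }
        for (int i = 0; i < maxRetries; i++) {
            System.gc();                       // make dead direct buffers discoverable
            CLEANER.waitForCleaning();         // assumed blocking semantics
            if (tryReserveMemory(size, cap)) {
                return true;
            }
        }
        return false;                          // caller would throw OutOfMemoryError
    }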
I ran the DirectBufferAllocTest test a few hundred times without any
problems. I also ran it directly, configured with a 1/2 hour runtime, again
without problems. I ran the two new micros against baseline, the shipilev
changes only, and the waitForCleaning changes. Results below.
DirectByteBufferGC

baseline
Benchmark                (count)  Mode  Cnt    Score    Error  Units
DirectByteBufferGC.test    16384  avgt    9    4.036 ±  0.244  ms/op
DirectByteBufferGC.test    65536  avgt    9    7.117 ±  0.184  ms/op
DirectByteBufferGC.test   262144  avgt    9   18.482 ±  1.064  ms/op
DirectByteBufferGC.test  1048576  avgt    9  194.590 ± 17.639  ms/op
DirectByteBufferGC.test  4194304  avgt    9  802.354 ± 57.576  ms/op

shipilev
Benchmark                (count)  Mode  Cnt    Score    Error  Units
DirectByteBufferGC.test    16384  avgt    9    4.104 ±  0.045  ms/op
DirectByteBufferGC.test    65536  avgt    9    6.875 ±  0.244  ms/op
DirectByteBufferGC.test   262144  avgt    9   17.943 ±  0.446  ms/op
DirectByteBufferGC.test  1048576  avgt    9   59.002 ±  3.616  ms/op
DirectByteBufferGC.test  4194304  avgt    9  331.328 ± 36.580  ms/op

waitForCleaning
Benchmark                (count)  Mode  Cnt    Score    Error  Units
DirectByteBufferGC.test    16384  avgt    9    4.058 ±  0.167  ms/op
DirectByteBufferGC.test    65536  avgt    9    6.775 ±  0.281  ms/op
DirectByteBufferGC.test   262144  avgt    9   17.724 ±  0.602  ms/op
DirectByteBufferGC.test  1048576  avgt    9   57.284 ±  2.679  ms/op
DirectByteBufferGC.test  4194304  avgt    9  330.002 ± 44.200  ms/op
DirectByteBufferChurn

baseline
Benchmark                   (recipFreq)  Mode  Cnt   Score   Error  Units
DirectByteBufferChurn.test          128  avgt    9  15.693 ± 0.991  ns/op
DirectByteBufferChurn.test          256  avgt    9  11.255 ± 0.369  ns/op
DirectByteBufferChurn.test          512  avgt    9   9.422 ± 0.382  ns/op
DirectByteBufferChurn.test         1024  avgt    9   8.529 ± 0.311  ns/op
DirectByteBufferChurn.test         2048  avgt    9   7.833 ± 0.287  ns/op

shipilev
Benchmark                   (recipFreq)  Mode  Cnt   Score   Error  Units
DirectByteBufferChurn.test          128  avgt    9  12.524 ± 0.476  ns/op
DirectByteBufferChurn.test          256  avgt    9   9.248 ± 0.175  ns/op
DirectByteBufferChurn.test          512  avgt    9   8.583 ± 0.197  ns/op
DirectByteBufferChurn.test         1024  avgt    9   8.227 ± 0.274  ns/op
DirectByteBufferChurn.test         2048  avgt    9   7.526 ± 0.394  ns/op

waitForCleaning
Benchmark                   (recipFreq)  Mode  Cnt   Score   Error  Units
DirectByteBufferChurn.test          128  avgt    9  11.235 ± 1.998  ns/op
DirectByteBufferChurn.test          256  avgt    9   9.129 ± 0.324  ns/op
DirectByteBufferChurn.test          512  avgt    9   8.446 ± 0.115  ns/op
DirectByteBufferChurn.test         1024  avgt    9   7.786 ± 0.425  ns/op
DirectByteBufferChurn.test         2048  avgt    9   7.477 ± 0.150  ns/op
The baseline DirectByteBufferGC suffers significantly at higher counts,
probably because of the long doubly-linked list of registered
jdk.internal.ref.Cleaner objects.
By eye, the performance of both shipilev and waitForCleaning on the DBBChurn
benchmark seems consistently slightly better than baseline, and the two are
within the noise of each other on both benchmarks.
This waitForCleaning approach raises several questions.
(1) Is the improved reliability of this approach worth the additional effort?
My inclination is to say yes. But note that it should perhaps be different
effort - see item (2).
(2) Should Cleaner be responsible for providing the additional functionality?
An alternative is to have Bits implement its own cleaning, directly using a
private PhantomReference derivative, ReferenceQueue, and Thread (sketched at
the end of this item).
If there were multiple (potential) clients for this sort of thing then putting
it in Cleaner may make sense, so long as their requirements are similar. But
we're kind of fighting with it, especially with the canary approach, and even
with the waitForCleaning approach. I think things could be simpler and easier to
understand with a direct implementation. And Cleaner was not intended to be
the last word in the area of reference-based cleanup. Rather, it's a
convenient package for addressing some common patterns (esp. with
finalization). It provides some useful infrastructure that would need to be
replaced in a bespoke implementation, but I don't think there's anything all
that large or complicated there, and some streamlining is possible.
Note that I haven't done any work in that direction yet.
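For the sake of discussion, a bespoke implementation along those lines might
look roughly like the following minimal sketch. Class and member names are
illustrative only; the PENDING set stands in for the keep-alive bookkeeping
that the shared Cleaner does with its internal list.

    import java.lang.ref.PhantomReference;
    import java.lang.ref.ReferenceQueue;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of a Bits-private cleaning mechanism: a PhantomReference
    // subclass carrying the deallocation action, a queue, and a dedicated
    // daemon thread draining it.
    final class DirectBufferCleaner {
        private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
        // Keeps each reference strongly reachable until its action has run.
        private static final Set<Ref> PENDING = ConcurrentHashMap.newKeySet();

        private static final class Ref extends PhantomReference<Object> {
            final Runnable action;               // e.g. the buffer's Deallocator
            Ref(Object referent, Runnable action) {
                super(referent, QUEUE);
                this.action = action;
            }
        }

        static void register(Object buffer, Runnable deallocator) {
            PENDING.add(new Ref(buffer, deallocator));
        }

        static {
            Thread t = new Thread(() -> {
                while (true) {
                    try {
                        Ref r = (Ref) QUEUE.remove();  // blocks until a buffer dies
                        PENDING.remove(r);
                        r.action.run();                // free the native memory
                    } catch (Throwable ignore) {
                        // a real version would mirror Deallocator's failure handling
                    }
                }
            }, "DirectBuffer-Cleaner");
            t.setDaemon(true);
            t.start();
        }

        private DirectBufferCleaner() {}
    }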
(3) Do we need the retry stuff, and how should it be done? I did some
monitoring of the retries, and waitForCleaning DBBA does need the retries
(though rarely more than 1, and I never saw more than 2). I kind of feel like
the retries are a kludgy workaround for what seems like an unreasonable test,
but oh well.
The point of the retries is to allow other threads to drop some DBBs that can
become reclaimable during the associated wait, giving the waiting thread an
opportunity to make use of that now available memory.
I experimented with a retry style more similar to the current code, where each
thread on the slow path does at most one GC and then goes into a wait/sleep
loop if reservation continues to fail (sketched after this item). (I added some
locking to avoid the current code's issue with concurrent requests for GC.)
That didn't seem to
make any difference compared to something like the proposed serialized GC and
sleep loop. I can construct theoretical cases that seem to favor either, but
the style in the proposal seems simpler.
Of course, the question of when to do GCs and how many to do is moot if
-XX:+DisableExplicitGC is used. (The default for that option is false.)
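For reference, that alternative retry style could be sketched roughly as
follows; the lock, the sleep bounds, and the helper names are illustrative,
not the code I actually tested.

    // Sketch: each slow-path thread triggers at most one GC (serialized by a
    // lock so concurrent threads don't pile up System.gc() calls), then loops
    // with short, doubling sleeps while retrying the reservation.
    private static final Object GC_LOCK = new Object();

    private static boolean slowPathReserve(long size, long cap)
            throws InterruptedException {
        synchronized (GC_LOCK) {
            if (tryReserveMemory(size, cap)) { // another thread's GC may have freed enough
                return true;
            }
            System.gc();                       // at most one GC per slow-path entry
        }
        long sleepMillis = 1;
        for (int i = 0; i < 9; i++) {          // bounded wait/sleep loop
            if (tryReserveMemory(size, cap)) {
                return true;
            }
            Thread.sleep(sleepMillis);
            sleepMillis <<= 1;
        }
        return tryReserveMemory(size, cap);    // last attempt before giving up
    }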
src/java.base/share/classes/java/nio/Bits.java line 149:
> 147: // 1. GC needs to discover dead references and hand them over to Reference
> 148: // processing thread. This activity can be asynchronous and can complete after
> 149: // we unblock from System.gc().
Adding cleared references to the pending list is always completed before the
GC invocation completes. Doing otherwise would break or significantly
complicate Reference.waitForReferenceProcessing(), and to no good purpose.
That function should only return false when all references cleared by a prior
GC have been enqueued in their respective queues. There are tests that depend
on that. (I looked, and so far as I can tell, none of the extant GCs violate
this.)
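For context, the pre-patch slow path in Bits.reserveMemory leans on exactly
that property: it keeps helping until waitForReferenceProcessing() returns
false, meaning nothing cleared by a prior GC remains pending. Roughly
(simplified from the existing code, via the JavaLangRefAccess shared secret):

    import jdk.internal.access.JavaLangRefAccess;
    import jdk.internal.access.SharedSecrets;

    // Simplified sketch of the existing pattern in Bits.reserveMemory: retry
    // the reservation while waitForReferenceProcessing() reports outstanding
    // work; a false return means everything cleared by prior GCs has been
    // enqueued on its respective queue.
    private static boolean reserveHelpingReferenceProcessing(long size, long cap)
            throws InterruptedException {
        JavaLangRefAccess jlra = SharedSecrets.getJavaLangRefAccess();
        while (jlra.waitForReferenceProcessing()) {
            if (tryReserveMemory(size, cap)) {
                return true;
            }
        }
        return tryReserveMemory(size, cap);
    }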
src/java.base/share/classes/java/nio/Bits.java line 166:
> 164: // install the canary, call System.gc(), wait for canary to get processed (dead). This
> 165: // signals that since our last call to System.gc(), steps (1) and (2) have finished, and
> 166: // step (3) is currently in progress.
The canary having been processed doesn't tell us anything definitive about
step (2), because steps (2) and (3) occur concurrently. The reference
processing thread transfers references to their respective queues, and the
cleaner thread processes references that have been placed in its queue, both
running at the same time. All we know is that the canary got transferred and
cleaned. There may or may not have been other references similarly
transferred and cleaned. There may or may not be more references in the
pending list, in the cleaner queue, or both. That the canary has been cleaned
doesn't give us any information about either the pending list or the cleaner
queue.
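To spell out the pattern under discussion, the canary step amounts to roughly
the following (a minimal sketch using the public java.lang.ref.Cleaner API,
not the patch's actual code):

    import java.lang.ref.Cleaner;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    // Register a throwaway canary object, trigger a GC, and wait until the
    // cleaner thread has run its action. As noted above, this only proves the
    // canary itself was transferred and cleaned; it says nothing about other
    // references still in the pending list or the cleaner queue.
    static void gcAndAwaitCanary(Cleaner cleaner) throws InterruptedException {
        CountDownLatch cleaned = new CountDownLatch(1);
        cleaner.register(new Object(), cleaned::countDown); // referent dies immediately
        do {
            System.gc();
        } while (!cleaned.await(100, TimeUnit.MILLISECONDS));
    }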
src/java.base/share/classes/java/nio/Bits.java line 172:
> 170: // corner case would be handled with a normal retry attempt, after trying to allocate.
> 171: // If allocation succeeds even after partial cleanup, we are done. If it does not, we get
> 172: // to try again, this time reliably getting the results of the first cleanup run. Not
After trying again, all we know is that both the previous and the new canary
have been processed. We don't know anything about other references, either
from the latest or preceding GCs. Consider this unfortunate scenario. The
first canary ended up at the head of the pending list. The reference
processing thread transferred it to the cleaner queue and then stalled. The
cleaner thread processed the first canary. The waiting thread noted that,
retried and failed the reservation, and retried the GC and wait for the
canary. The retry canary also happened to end up at the front of the updated
pending list. The reference processor transferred it and again stalled. The
cleaner thread processed the retry canary. No real cleaning has happened,
i.e. we have not reliably gotten the results of the first cleanup run.
-------------
Changes requested by kbarrett (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22165#pullrequestreview-2635485651
PR Review Comment: https://git.openjdk.org/jdk/pull/22165#discussion_r1966720234
PR Review Comment: https://git.openjdk.org/jdk/pull/22165#discussion_r1966720361
PR Review Comment: https://git.openjdk.org/jdk/pull/22165#discussion_r1966720509