RFR: JDK-8149925 We don't need jdk.internal.ref.Cleaner any more

Tue Apr 5 14:41:26 UTC 2016

Hi Roger,

On 04/04/2016 11:50 PM, Roger Riggs wrote:
> Hi Peter,
>
> Stepping back just a bit.

Right, let's clear things up.

>
> The old Cleaner running on the Reference processing thread had a few
> (2) very well controlled functions: reference processing and
> deallocating DirectByteBuffers. Maybe we can't do too much better
> than that.

...yes, at the beginning, until it was (re)used for other purposes too,
in java.lang.ProcessImpl,
java.lang.invoke.MethodHandleNatives.CallSiteContext,
jdk.internal.perf.Perf, sun.nio.ch.IOVecWrapper, sun.nio.fs.NativeBuffer
and sun.java2d.marlin.OffHeapArray.

But those other usages have been converted to use the new
java.lang.ref.Cleaner, so the old cleaner is now back to basics:
DirectByteBuffers. With that, DirectByteBuffer-allocating threads only
help the ReferenceHandler thread enqueue References and execute
DirectByteBuffer deallocators, which is an improvement.

But should we keep that status quo? There's nothing wrong with it as it
is, except that I think we can do better.

>
> The old worst case, performance/latency-wise, was that the reference
> processing thread did the work and the allocating thread did very
> little synchronizing and just did the retries.

The number of retries was exactly the same as the number of References
the allocating thread helped to enqueue or, in the case of Cleaners, to
execute:

         // retry while helping enqueue pending Reference objects
         // which includes executing pending Cleaner(s) which includes
         // Cleaner(s) that free direct buffer memory
         while (jlra.tryHandlePendingReference()) {
             if (tryReserveMemory(size, cap)) {
                 return;
             }
         }

If the share of pending References that were also Cleaners was high,
chances were higher that not much helping was needed, as one cleaned
DBB.Deallocator could unreserve enough memory for the next reservation
attempt to succeed. So the allocating thread helped only until it
succeeded in reserving the native memory, leaving the rest of the work
to another allocating request/thread or to the ReferenceHandler.
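To make that concrete, here is a minimal, single-threaded model of the helping loop quoted above, assuming a fixed byte budget and a queue of pending deallocations. HelpingReservation, pendingDeallocators and reserveWhileHelping are hypothetical names; only tryReserveMemory and tryHandlePendingReference mirror names in the real code, whose versions are concurrent, take a capacity argument too, and fall back to System.gc() when helping is not enough.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the old helping loop; not the JDK implementation.
final class HelpingReservation {
    private final long limit;                        // max reservable bytes
    private final AtomicLong reserved = new AtomicLong();
    // Each entry stands for a pending DBB.Deallocator and the bytes it frees.
    final Deque<Long> pendingDeallocators = new ArrayDeque<>();

    HelpingReservation(long limit) { this.limit = limit; }

    // CAS loop: reserve 'size' bytes only if the limit is not exceeded.
    boolean tryReserveMemory(long size) {
        while (true) {
            long current = reserved.get();
            if (current + size > limit) return false;
            if (reserved.compareAndSet(current, current + size)) return true;
        }
    }

    void unreserveMemory(long size) { reserved.addAndGet(-size); }

    // Stand-in for tryHandlePendingReference(): run one pending
    // deallocator, or report that there was nothing to help with.
    boolean tryHandlePendingReference() {
        Long size = pendingDeallocators.poll();
        if (size == null) return false;
        unreserveMemory(size);
        return true;
    }

    // Help only until the reservation succeeds; remaining entries are
    // left for other allocating threads or the ReferenceHandler.
    boolean reserveWhileHelping(long size) {
        if (tryReserveMemory(size)) return true;
        while (tryHandlePendingReference()) {
            if (tryReserveMemory(size)) return true;
        }
        return false; // the real code would go on to System.gc() and back off
    }

    long reserved() { return reserved.get(); }
}
```

With a 100-byte budget fully reserved and two pending deallocations of 60 and 40 bytes, reserveWhileHelping(50) runs only the first one and succeeds, leaving the second one pending for someone else.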

> In the best case, all the real work was done by the allocating thread,
> if the interactions with GC work out perfectly. But it was still the
> case that the buffer alloc/dealloc throughput was met with the division
> of work separating the reference processing thread and the allocating
> thread.

Yes, whichever thread was quicker. If the ReferenceHandler thread took a
long time to wake up from wait(), the allocating thread could have
already processed all the References before the ReferenceHandler finally
started to look around. If lots of new pending references were
discovered, the ReferenceHandler thread could finally join the party and
fight for the same lock...

>
> The function that can only be provided by CleanerImpl / Reference
> processing thread state is knowledge that the cleaning queue is empty.

...and that the discovered pending references have actually been 
enqueued before that...

> The helping functions were/are a bit troublesome because of mixing
> execution environments of the thread allocating direct buffers and the
> cleanables, and it seemed that more than a little complexity was needed
> to compensate.

I totally agree.

>
> If the bottleneck in processing is between the reference processing
> and cleanup, then it should be ok (based on previous comments) for the
> CleanerImpl to help with reference processing (after it has an empty
> queue and before it blocks waiting, or in every loop).
> Though if you already tried this combination, I don't recall the results.

I don't think there is a problem because of any bottleneck. And if there
were a bottleneck, we would only have a problem with
allocation/deallocation throughput and not with OOME(s). The problem
arises because reference discovery is not triggered by native memory
reservation approaching or reaching the limit. There is no heap memory
pressure from DirectByteBuffer(s) because they are small objects (the
native memory they own is not part of the Java heap). So a mechanism
must be in place that triggers reference discovery and waits for the
discovered references to be processed before failing the native memory
allocation: a mechanism that tries to simulate what happens with GC when
there is heap memory pressure. GC guarantees that a full GC is executed
and the heap allocation retried after that before finally giving up with
OOME. We need a mechanism that attempts to do the same for direct
memory. Throughput is a nice property, but we are not directly seeking
to improve it. We just don't want to make things much worse.

Helping the Cleaner thread process cleanup functions is the easiest way
to wait for cleanup functions to be processed and for the queue to
drain, simply because of the ReferenceQueue API: if you poll() the next
Reference from the queue and get null, you know the queue is empty, but
if you get something, you have to execute it and not just ignore it.
Maybe we could patch the ReferenceQueue implementation and extend its
API with an internal method that would not return the next Reference but
just information that the ReferenceHandler thread has done so or that
the queue is empty. I'll think about it.
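The constraint can be seen with the public java.lang.ref API alone. In this sketch, CleanupRef and QueueDrain are hypothetical names; Reference.enqueue() is used only to put references on the queue deterministically, standing in for GC discovery plus ReferenceHandler enqueueing. Once poll() has returned a reference, it is no longer in the queue, so the helper has to run its action rather than discard it.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical reference type that carries its own cleanup action.
final class CleanupRef extends WeakReference<Object> {
    final Runnable action;

    CleanupRef(Object referent, ReferenceQueue<Object> queue, Runnable action) {
        super(referent, queue);
        this.action = action;
    }
}

final class QueueDrain {
    // Polls until the queue reports empty (null). Every reference obtained
    // along the way has been removed from the queue, so its action runs here.
    static int drain(ReferenceQueue<Object> queue) {
        int executed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            ((CleanupRef) ref).action.run();
            executed++;
        }
        return executed;
    }
}
```

Learning "the queue is empty" this way necessarily means executing whatever was found on the way, which is exactly why a separate "is it drained?" query would let the allocating thread wait without helping.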

>
> As you pointed out, it would be more efficient if the allocating thread
> could be aware when it was known there was nothing ready to clean up,
> so it can retry and invoke GC or throw out of memory if appropriate.
> Adding a method that returned the count of completed cleaning cycles
> (or similar) to CleanerImpl could exist with a minimum of coupling and
> still provide the information needed without commingling the execution
> threads.

I'll think about how to surface this functionality in the CleanerImpl
most elegantly. Providing only a getter for the count of cleaning cycles
might not be the most appropriate. We also need some mechanism to wait
and be woken up to retry the reservation only at appropriate points in
time; otherwise, allocating threads could just spin, eating CPU time. So
my latest attempt was to encapsulate the entire retry logic inside
ExtendedCleaner, with ByteBuffer/Bits only providing the allocation
function to this logic, which, in my view, is a pretty decoupled and
general API.
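As a rough illustration of the difference between a bare counter and a waitable one, here is a hypothetical CleaningCycles helper (not an existing JDK class): the cleaner thread would call cycleCompleted() whenever it finds its queue empty, while an allocating thread remembers the count it last saw and blocks in awaitCycleAfter() instead of spinning. The timeout is an assumption, there only so a waiter cannot hang if no further cycle happens.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not part of CleanerImpl.
final class CleaningCycles {
    private long completed; // guarded by 'this'

    // Cleaner thread: call after the cleaning queue has been found empty.
    synchronized void cycleCompleted() {
        completed++;
        notifyAll(); // wake allocating threads waiting to retry
    }

    synchronized long completed() {
        return completed;
    }

    // Allocating thread: wait for a cycle newer than 'seen' or the timeout.
    // Returns the current count; a larger value means "retry the reservation".
    synchronized long awaitCycleAfter(long seen, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (completed <= seen) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) break;
            try {
                // wait(0) would mean "forever", so never pass less than 1 ms
                wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remaining)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve status, stop waiting
                break;
            }
        }
        return completed;
    }
}
```

The caller would compare the returned count with the one it passed in: if it grew, a full cleaning cycle has finished since the last failed attempt, which is an appropriate moment to retry.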

>
> I don't see the need to change Cleaner to an interface to be able to
> provide an additional method on CleanerImpl or a subclass; a factory
> method could provide for a clean and very targeted interface to
> Bits/Direct buffer.

I would like this to be an instance method so it would naturally pertain
to a particular Cleaner instance. Or it could be a static method that
takes a Cleaner instance. One of my previous webrevs did have such a
method on the CleanerImpl, but I was advised to move it to Cleaner as a
package-private method and expose it via SharedSecrets to internal code.
I feel such "camouflage" is very awkward now that we have modules and
other mechanisms. So I thought it would be most elegant to make Cleaner
an interface so it can be extended with an internal interface to
communicate intent in a type-safe and auto-discoverable way. The change
to make it an interface:

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.part2.1.rev01/

...actually simplifies the implementation (33 lines removed in total) and
could be seen as an improvement in itself.
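To make the "type-safe and auto-discoverable" point concrete, here is a small sketch with hypothetical stand-in names (SketchCleaner for the public interface, InternalCleaner for the internal extension, InternalCleanerImpl for the implementation); they are not the types in the webrev. Internal code handed a plain cleaner discovers the extra capability with instanceof instead of going through SharedSecrets. register() here only queues the action; a real cleaner tracks the object with a PhantomReference.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

// Public-facing API, in the role of java.lang.ref.Cleaner as an interface.
interface SketchCleaner {
    interface Cleanable { void clean(); }

    Cleanable register(Object obj, Runnable action);
}

// Internal extension; would live in a non-exported package.
interface InternalCleaner extends SketchCleaner {
    // Retries 'action' while helping to run pending cleanup actions.
    boolean retryWhileHelpingClean(BooleanSupplier action);
}

final class InternalCleanerImpl implements InternalCleaner {
    // Single-threaded sketch: pending actions in registration order.
    private final Deque<Runnable> pending = new ArrayDeque<>();

    @Override
    public Cleanable register(Object obj, Runnable action) {
        pending.add(action); // a real cleaner would also track obj's reachability
        return () -> { if (pending.remove(action)) action.run(); }; // at most once
    }

    @Override
    public boolean retryWhileHelpingClean(BooleanSupplier action) {
        if (action.getAsBoolean()) return true;
        Runnable next;
        while ((next = pending.poll()) != null) {
            next.run(); // help: execute one cleanup, then retry
            if (action.getAsBoolean()) return true;
        }
        return false;
    }
}

final class Internals {
    // Capability discovered through the type system, no SharedSecrets needed.
    static boolean reserve(SketchCleaner cleaner, BooleanSupplier tryReserve) {
        if (cleaner instanceof InternalCleaner) {
            return ((InternalCleaner) cleaner).retryWhileHelpingClean(tryReserve);
        }
        return tryReserve.getAsBoolean();
    }
}
```

Note how register() takes a cleanup action and retryWhileHelpingClean() takes the opposite, a retriable allocating action, on the same object, since the two have to be coordinated.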

Are you afraid that if Cleaner were an interface, others would attempt to
make implementations of it? Now that we have default methods on
interfaces, it is easy to extend the API compatibly even for an
interface, so that no third-party implementations are immediately
broken. Are you thinking of security implications when some code is
handed a Cleaner instance that it doesn't trust? I don't think there is
any use case for passing Cleaner instances from untrusted to trusted
code, do you?
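The compatibility argument rests on default methods. A sketch with hypothetical names: EvolvingCleaner gains registerAll() in some later release, and ThirdPartyCleaner, written against the original one-method interface, keeps compiling and simply inherits the new behaviour.

```java
import java.util.ArrayList;
import java.util.List;

interface EvolvingCleaner {
    interface Cleanable { void clean(); }

    // Original abstract method.
    Cleanable register(Object obj, Runnable action);

    // Added later with a default body, so existing implementors are not broken.
    default Cleanable registerAll(Object obj, List<Runnable> actions) {
        List<Cleanable> cleanables = new ArrayList<>();
        for (Runnable action : actions) {
            cleanables.add(register(obj, action));
        }
        return () -> cleanables.forEach(Cleanable::clean);
    }
}

// Written before registerAll() existed; needs no changes.
final class ThirdPartyCleaner implements EvolvingCleaner {
    @Override
    public Cleanable register(Object obj, Runnable action) {
        return action::run; // sketch: runs the action when clean() is called
    }
}
```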

In the end it doesn't really matter. We can do it one way or the other. 
I just feel that using an interface is cleaner.

>
> I'm sorry I haven't had time to try out concretely what I have in mind.
> Please correct or remind me of missing salient considerations.

The bottom line is that we need a mechanism that:

- triggers reference discovery when the native memory limit is
approached or reached
- retries native memory reservation at appropriate time slots until it
succeeds or until all pending references have been processed and
Cleanables executed, at which time native memory reservation can fail
with OOME
- if possible, doesn't execute cleanup functions in the allocating
thread but just waits for system threads to do the job
- when triggered, does not make native memory allocation a bottleneck.
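A minimal sketch of a reservation loop with these properties, written against a hypothetical CleanupProgress callback rather than any existing JDK API. hasPendingWork() is assumed to report whether discovered-but-unprocessed references or unexecuted Cleanables may still exist (including those found by the triggered discovery), and awaitProgress() is assumed to block, without running cleanups itself, until the system threads have made some progress. In real code, triggerDiscovery would be something like System.gc().

```java
import java.util.function.BooleanSupplier;

// Hypothetical view of ReferenceHandler/Cleaner progress; not a JDK API.
interface CleanupProgress {
    boolean hasPendingWork();               // references/Cleanables still to process?
    void awaitProgress(long timeoutMillis); // block until some progress or timeout
}

final class DirectReservation {
    static void reserve(BooleanSupplier tryReserve, Runnable triggerDiscovery,
                        CleanupProgress progress, long timeoutMillis) {
        if (tryReserve.getAsBoolean()) return; // fast path, no GC involvement
        triggerDiscovery.run();                // limit reached: force reference discovery
        while (true) {
            // Sample before the attempt: if nothing was pending before it,
            // a failed attempt is conclusive and we may fail with OOME.
            boolean pending = progress.hasPendingWork();
            if (tryReserve.getAsBoolean()) return;
            if (!pending) throw new OutOfMemoryError("Direct buffer memory");
            progress.awaitProgress(timeoutMillis); // wait for system threads, don't help
        }
    }
}
```

Sampling hasPendingWork() before the attempt, rather than after it, avoids throwing OOME when the last cleanup frees memory between a failed attempt and the emptiness check.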

I think that what I did in my latest webrevs with the ReferenceHandler
thread is an improvement in minimizing contended synchronization and
interference of allocating thread(s) with Reference enqueueing. But the
interaction of allocating thread(s) with the Cleaner background thread
could be improved, and I have a couple of ideas to explore.

>
> Thanks, Roger
>

Regards, Peter

>
> On 4/2/2016 7:24 AM, Peter Levart wrote:
>> Hi Roger,
>>
>> Thanks for looking at the patch.
>>
>> On 04/02/2016 01:31 AM, Roger Riggs wrote:
>>> Hi Peter,
>>>
>>> I overlooked the introduction of another nested class (Task) to
>>> handle the cleanup. But there are too many changes to see which ones
>>> solve a single problem.
>>>
>>> Sorry to make more work, but I think we need to go back to the
>>> minimum necessary change to make progress on this. Omit all of the
>>> little cleanups until the end or do them first and separately.
>>>
>>> Thanks, Roger
>>
>> No problem. I understand. So let's proceed in stages. Since part1 is 
>> already pushed, I'll call part2 stages with names: part2.1, part2.2, 
>> ... and I'll start counting webrev revisions from 01 again, so webrev 
>> names will be in the form: webrev.part2.1.rev01. Each part will be an 
>> incremental change to the previous one.
>>
>> part2.1: This is preparation work to be able to have an extended 
>> java.lang.ref.Cleaner type for internal use. Since 
>> java.lang.ref.Cleaner is a final class, I propose to make it an 
>> interface instead:
>>
>> http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.part2.1.rev01/
>>
>> This is a source-compatible change and it also simplifies 
>> implementation (no injection of Cleaner.impl access function into 
>> CleanerImpl needed any more). What used to be java.lang.ref.Cleaner 
>> is renamed to jdk.internal.ref.CleanerImpl. What used to be 
>> jdk.internal.ref.CleanerImpl is now a nested static class 
>> jdk.internal.ref.CleanerImpl.Task (because it implements Runnable). 
>> Otherwise nothing has changed in the overall architecture of the 
>> Cleaner except that public-facing API is now an interface instead of 
>> a final class. This allows specifying internal extension interface 
>> and internal extension implementation.
>>
>> CleanerTest passes with this change.
>>
>> So what do you think?
>>
>> Regards, Peter
>>
>>>
>>>
>>>
>>>
>>> On 4/1/16 5:51 PM, Roger Riggs wrote:
>>>> Hi Peter,
>>>>
>>>> Thanks for the diffs to look at.
>>>>
>>>> Two observations on the changes.
>>>>
>>>> - The Cleaner instance was intentionally and necessarily different
>>>> than the CleanerImpl to enable the CleanerImpl and its thread to
>>>> terminate if the Cleaner is no longer referenced.
>>>> Folding them into a single object breaks that.
>>>>
>>>> Perhaps it is not too bad for ExtendedCleaner to subclass
>>>> CleanerImpl with the cleanup helper/supplier behavior and expose
>>>> itself to Bits. There will be fewer moving parts. There is no need
>>>> for two factory methods for ExtendedCleaner unless you are going to
>>>> use a separate ThreadFactory.
>>>>
>>>> - The Deallocator (and now Allocator) nested classes are identical,
>>>> and there is a separate copy for each type derived from the
>>>> Direct-X-template. But it may not be worth fixing until the rest of
>>>> it is settled, to avoid more moving parts.
>>>>
>>>> I don't have an opinion on the code changes in Reference; that's a
>>>> different kettle of fish.
>>>>
>>>> More next week.
>>>>
>>>> Have a good weekend, Roger
>>>>
>>>>
>>>> On 4/1/2016 12:46 PM, Peter Levart wrote:
>>>>>
>>>>>
>>>>> On 04/01/2016 06:08 PM, Peter Levart wrote:
>>>>>>
>>>>>>
>>>>>> On 04/01/2016 05:18 PM, Peter Levart wrote:
>>>>>>> @Roger:
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> About entanglement between nio Bits and 
>>>>>>> ExtendedCleaner.retryWhileHelpingClean(). It is the same level 
>>>>>>> of entanglement as between the DirectByteBuffer constructor and 
>>>>>>> Cleaner.register(). On both occasions an action is provided to 
>>>>>>> the Cleaner. Cleaner.register() takes a cleanup action and 
>>>>>>> ExtendedCleaner.retryWhileHelpingClean() takes a retriable 
>>>>>>> "allocating" or "reservation" action. "allocation" or 
>>>>>>> "reservation" is the opposite of cleanup. Both methods are 
>>>>>>> encapsulated in the same object because those two functions must 
>>>>>>> be coordinated. So I think that collocating them together makes 
>>>>>>> sense. What do you think?
>>>>>>
>>>>>> ...to illustrate what I mean, here's a variant that totally 
>>>>>> untangles Bits from Cleaner and moves the whole Cleaner 
>>>>>> interaction into the DirectByteBuffer itself:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.13.part2/ 
>>>>>>
>>>>>>
>>>>>> Notice the symmetry between Cleaner.retryWhileHelpingClean : 
>>>>>> Cleaner.register and Allocator : Deallocator ?
>>>>>>
>>>>>>
>>>>>> Regards, Peter
>>>>>>
>>>>>
>>>>> And here's also a diff between webrev.12.part2 and webrev.13.part2:
>>>>>
>>>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.diff.12to13.part2/ 
>>>>>
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>
>>>
>>
>