RFR(M,v3): JDK-8059036 : Implement Diagnostic Commands for heap and finalizerinfo

Sat May 16 07:35:39 UTC 2015

Derek,

Personally, I'm for volatile over explicit fence too.

So I'll change the code if Peter don't have objections.

-Dmitry

On 2015-05-16 01:59, Derek White wrote:
> Hi Dmitry, Peter,
> 
> I should read my email more frequently - I missed Dmitry's last webrev
> email.
> 
> My comments on a preference for volatile over Unsafe are not a strong
> objection to using Unsafe fences. I just don't have enough experience
> with Unsafe fences yet to agree that this use is correct.
> 
> While looking over this Reference code I found 3-6 really simple things
> that need cleaning up that have been there for years, so I'm not a big
> cheerleader for more complicated things here :-)
> 
>  - Derek
> 
> On 5/15/15 6:46 PM, Derek White wrote:
>> Hi Peter,
>>
>> Some comments below...
>>
>> On 5/14/15 3:50 AM, Peter Levart wrote:
>>> Hi Derek,
>>>
>>> On 05/13/2015 10:34 PM, Derek White wrote:
>>>> Hi Peter,
>>>>
>>>> I don't have smoking gun evidence that your change introduces a bug,
>>>> but I have some concerns...
>>>
>>> I did have a concern too, ...
>>>
>>>>
>>>> On 5/12/15 6:05 PM, Peter Levart wrote:
>>>>> Hi Dmitry,
>>>>>
>>>>> You iterate the queue then, not the unfinalized list. That's more
>>>>> logical.
>>>>>
>>>>> Holding the queue's lock may pause reference handler and finalizer
>>>>> threads for the entire time of iteration. This can blow up the
>>>>> application. Suppose one wants to diagnose the application because
>>>>> he suspects that finalizer thread hardly keeps up with production
>>>>> of finalizable instances. This can happen if the allocation rate of
>>>>> finalizable objects is very high and/or finalize() methods are not
>>>>> coded to be fast enough. Suppose the queue of Finalizer(s) builds
>>>>> up gradually and is already holding several million objects. Now
>>>>> you initiate the diagnostic command to print the queue. The
>>>>> iteration over and grouping/counting of several millions of
>>>>> Finalizer(s) takes some time. This blocks finalizer thread and
>>>>> reference handler thread. But does not block the threads that
>>>>> allocate finalizable objects. By the time the iteration is over,
>>>>> the JVM blows up and application slows down to a halt doing GCs
>>>>> most of the time, getting OOMEs, etc...
>>>>>
>>>>> It is possible to iterate the elements of the queue for diagnostic
>>>>> purposes without holding a lock. The modification required to do
>>>>> that is the following (havent tested this, but I think it should work):
>>>>>
>>>>>    
>>>>> http://cr.openjdk.java.net/~plevart/misc/Finalizer.printFinalizationQueue/webrev.01/
>>>>>
>>>> One issue to watch out for is the garbage collectors inspect the
>>>> Reference.next field from C code directly (to distinguish active vs.
>>>> pending, iterate over oops, etc.). You can search hotspot for
>>>> java_lang_ref_Reference::next*, java_lang_ref_Reference::set_next*.
>>>>
>>>> Your change makes "inactive" References superficially look like
>>>> "enqueued" References. The GC code is rather subtle and I haven't
>>>> yet seen a case where it would get confused by this change, but
>>>> there could easily be something like that lurking in the GC code.
>>>
>>> ...but then I thought that GC can in no way treat a Reference
>>> differently whether it is still enqueued in a ReferenceQueue or
>>> already dequeued (inactive) - responsibility is already on the Java
>>> side. Currently the definition of Reference.next is this:
>>>
>>>     /* When active:   NULL
>>>      *     pending:   this
>>>      *    Enqueued:   next reference in queue (or this if last)
>>>      *    Inactive:   this
>>>      */
>>>     @SuppressWarnings("rawtypes")
>>>     Reference next;
>>>
>>> We see that, unless GC inspects all ReferenceQueue instances and
>>> scans their lists too, the state of a Reference that is enqueued as
>>> last in list is indistinguishable from the state of inactive
>>> Reference. So I deduced that this distinction (enqueued/inactive)
>>> can't be established solely on the value of .next field ( == this or
>>> != this)...
>>>
>>> But I should inspect the GC code too to build a better understanding
>>> of that part of the story...
>>>
>>> Anyway. It turns out that there is already enough state in Reference
>>> to destinguish between it being enqueued as last in list and already
>>> dequeued (inactive) - the additional state is Reference.queue that
>>> transitions from ReferenceQueue.ENQUEUED -> ReferenceQueue.NULL in
>>> ReferenceQueue.reallyPoll. We need to just make sure the two fields
>>> (r.next and r.queue) are assigned and read in correct order. This can
>>> be achieved either by making Reference.next a volatile field or by a
>>> couple of explicit fences:
>>>
>>>    
>>> http://cr.openjdk.java.net/~plevart/misc/Finalizer.printFinalizationQueue/webrev.02/
>>>
>>> The assignment of r.queue to ReferenceQueue.ENQUEUED already happens
>>> before linking the reference into the queue's head in
>>> ReferenceQueue.enqueue():
>>>
>>>             r.queue = ENQUEUED;
>>>             r.next = (head == null) ? r : head;
>>>             head = r;
>>>
>>> Both stores are volatile.
>>
>> This sounds a bit better. I don't have a good handle on the pros and
>> cons of a volatile field over explicit fences via Unsafe in this case.
>> Kim might jump in soon.
>>
>> I'd like to suggest an entirely less-clever solution. Since the
>> problem is that forEach() might see inconsistent values for the queue
>> and next fields of a Reference, but we don't want to grab a lock
>> walking the whole queue, we could instead grab the lock as we look at
>> each Reference (and do the "back-up" trick if we were interfered
>> with). Of course this is slower than going lock-free, but it's not any
>> more overhead than the ReferenceHandler thread or FinalizerThreads incur.
>>
>> The only benefit of this solution over the others is that it is less
>> clever, and I'm not sure how much cleverness this problem deserves.
>> Given that, and ranking the solutions as "lock" < (clever) "volatile"
>> < "fence", I'd be happier with the "volatile" or "lock" solution if
>> that is sufficient.
>>
>>  - Derek
>>>>
>>>>> I also suggest the addition to the ReferenceQueue to be contained
>>>>> (package-private) and as generic as possible. That's why I suggest
>>>>> forEach iteration method with no concrete logic.
>>>>>
>>>>> It would be possible to encapsulate the entire logic into a special
>>>>> package-private class (say java.lang.ref.DiagnosticCommands) with
>>>>> static method(s) (printFinalizationQueue)... You could just expose
>>>>> a package-private forEach static method from Finalizer and code the
>>>>> rest in DiagnosticCommands.
>>>> That's good for encapsulation. But I'm concerned that if "forEach"
>>>> got exposed beyond careful use in DiagnosticCommands, and the
>>>> Referents were leaked back into the heap, then we could get
>>>> unexpected object resurrection after finalization. This isn't a bug
>>>> on it's own - any finalizer might resurrect its object if not
>>>> careful, but ordinarily /only/ the finalizer could resurrect the
>>>> object. This could invalidate application invariants?
>>>
>>> Well, it all stays in the confines of package-private API - internal
>>> to JDK. Any future additional use should be reviewed carefully.
>>> Comments on the forEach() method should warn about that.
>>>
>>>>
>>>> I agree that using a lock over the ReferenceQueue iteration opens up
>>>> /another/ case of diagnostics affecting application behavior. And
>>>> there are pathological scenarios where this gets severe. This is
>>>> unfortunate but not uncommon. There is enough complication here that
>>>> you should be sure that the fix for diagnostics performance doesn't
>>>> introduce subtle bugs to the system in general. A change in this
>>>> area should get a thorough review from both the runtime and GC sides.
>>>
>>> Is the webrev.02 proposed above more acceptable in that respect? It
>>> does not introduce any logical changes to existing code.
>>>
>>>>
>>>> Better yet, the reference handling code in GC and runtime should
>>>> probably be thrown out and re-written, but that's another issue :-)
>>>
>>> I may have some proposals in that direction. Stay tuned.
>>>
>>> Regards, Peter
>>>
>>>>
>>>> - Derek
>>>>
>>>>> Regards, Peter
>>>>>
>>>>>
>>>>> On 05/12/2015 07:10 PM, Dmitry Samersoff wrote:
>>>>>> Everybody,
>>>>>>
>>>>>> Updated version:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dsamersoff/JDK-8059036/webrev.03/
>>>>>>
>>>>>> Now it iterates over queue and output result sorted by number of instances.
>>>>>>
>>>>>> -Dmitry
>>>>>>
>>>>>> On 2015-05-07 00:51, Derek White wrote:
>>>>>>> Hi Dmitry, Staffan,
>>>>>>>
>>>>>>> Lots of good comments here.
>>>>>>>
>>>>>>> On the topic of what list should be printed out, I think we should focus
>>>>>>> on objects waiting to be finalized - e.g. the contents of the
>>>>>>> ReferenceQueue. It's more of a pain to walk the ReferenceQueue, but you
>>>>>>> could add a summerizeQueue(TreeMap<String, Integer>) method, or a
>>>>>>> general iterator and lambda.
>>>>>>>
>>>>>>> A histogram of objects with finalize methods might also be interesting,
>>>>>>> but you can get most of the same information from a heap histogram
>>>>>>> (ClassHistogramDCmd, and search for count of Finalizer instances).
>>>>>>>
>>>>>>> BTW, by only locking the ReferenceQueue, we should only be blocking the
>>>>>>> FinalizerThread from making progress (and I think there's a GC thread
>>>>>>> that runs after GC that sorts found References objects that need
>>>>>>> processing into their respective ReferenceQueues). But locking the
>>>>>>> "unfinalized" list blocks initializing any object with a finalize() method.
>>>>>>>
>>>>>>> The sorting suggestion is a nice touch.
>>>>>>>
>>>>>>>  - Derek White, GC team
>>>>>>>
>>>>>>> On 5/5/15 10:40 AM, Peter Levart wrote:
>>>>>>>> Hi Dmitry, Staffan,
>>>>>>>>
>>>>>>>> On 05/05/2015 12:38 PM, Staffan Larsen wrote:
>>>>>>>>> Dmitry,
>>>>>>>>>
>>>>>>>>> I think this should be reviewed on hotspot-gc and core-libs-dev as
>>>>>>>>> well considering the changes to Finalizer.
>>>>>>>>>
>>>>>>>>> I’m a little worried about the potentially very large String that is
>>>>>>>>> returned from printFinalizationQueue(). A possible different approach
>>>>>>>>> would be to write printFinalizationQueue() in C++ inside Hotspot.
>>>>>>>>> That would allow for outputting one line at a time. The output would
>>>>>>>>> still be saved in memory (since the stream is buffered), but at least
>>>>>>>>> the data is only saved once in memory, then. It would make the code a
>>>>>>>>> bit harder to write, so its a question of how scared we are of
>>>>>>>>> running out of memory.
>>>>>>>> If the output is just a histogram of # of instances per class name,
>>>>>>>> then it should not be too large, as there are not many different
>>>>>>>> classes overriding finalize() method (I counted 72 overriddings of
>>>>>>>> finalize() method in the whole jdk/src tree).
>>>>>>>>
>>>>>>>> I'm more concerned about the fact that while traversing the list, a
>>>>>>>> lock is held, which might impact prompt finalization(). Is it
>>>>>>>> acceptable for diagnostic output to impact performance and/or
>>>>>>>> interfere with synchronization?
>>>>>>>>
>>>>>>>> It might be possible to scan the list without holding a lock for
>>>>>>>> diagnostic purposes if Finalizer.remove() didn't set the removed
>>>>>>>> Finalizer.next pointer to point back to itself:
>>>>>>>>
>>>>>>>>     private void remove() {
>>>>>>>>         synchronized (lock) {
>>>>>>>>             if (unfinalized == this) {
>>>>>>>>                 if (this.next != null) {
>>>>>>>>                     unfinalized = this.next;
>>>>>>>>                 } else {
>>>>>>>>                     unfinalized = this.prev;
>>>>>>>>                 }
>>>>>>>>             }
>>>>>>>>             if (this.next != null) {
>>>>>>>>                 this.next.prev = this.prev;
>>>>>>>>             }
>>>>>>>>             if (this.prev != null) {
>>>>>>>>                 this.prev.next = this.next;
>>>>>>>>             }
>>>>>>>>             // this.next = this; must not be set so that we can
>>>>>>>> traverse the list unsynchronized
>>>>>>>>             this.prev = this;   /* Indicates that this has been
>>>>>>>> finalized */
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>
>>>>>>>> For detecting whether a Finalizer is already processed, the 'prev'
>>>>>>>> pointer could be used instead:
>>>>>>>>
>>>>>>>>     private boolean hasBeenFinalized() {
>>>>>>>>         return (prev == this);
>>>>>>>>     }
>>>>>>>>
>>>>>>>> Also, to make sure a data race dereferencing 'unfinalized' in
>>>>>>>> unsynchronized printFinalizationQueue() would get you a fully
>>>>>>>> initialized Finalizer instance (in particular the next pointer), you
>>>>>>>> would have to make the 'unfinalized' field volatile or insert an
>>>>>>>> Unsafe.storeFence() before assigning to 'unfinalized':
>>>>>>>>
>>>>>>>>     private void add() {
>>>>>>>>         synchronized (lock) {
>>>>>>>>             if (unfinalized != null) {
>>>>>>>>                 this.next = unfinalized;
>>>>>>>>                 unfinalized.prev = this;
>>>>>>>>             }
>>>>>>>>             // make sure a data race dereferencing 'unfinalized'
>>>>>>>>             // in printFinalizationQueue() does see the Finalizer
>>>>>>>>             // instance fully initialized
>>>>>>>>             // (in particular the 'next' pointer)
>>>>>>>>             U.storeFence();
>>>>>>>>             unfinalized = this;
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>
>>>>>>>>
>>>>>>>> By doing these modifications, I think you can remove
>>>>>>>> synchronized(lock) {} from printFinalizationQueue().
>>>>>>>>
>>>>>>>>> I see you are traversing the ‘unfinalized’ list in Finalizer, whereas
>>>>>>>>> the old SA code for ‘-finalizerinfo' traverses the ‘queue’ list. I am
>>>>>>>>> not sure of the difference, perhaps someone from the GC team can help
>>>>>>>>> out?
>>>>>>>> The 'queue' is a ReferenceQueue of Finalizer (FinalReference)
>>>>>>>> instances pending processing by finalizer thread because their
>>>>>>>> referents are eligible for finalization (they are not reachable any
>>>>>>>> more). The 'unfinalized' is a doubly-linked list of all Finalizer
>>>>>>>> instances for which their referents have not been finalized yet
>>>>>>>> (including those that are still reachable and alive). The later serves
>>>>>>>> two purposes:
>>>>>>>> - it keeps Finalizer instances reachable until they are processed
>>>>>>>> - it is a source of unfinalized instances for
>>>>>>>> running-finalizers-on-exit if requested by
>>>>>>>> System.runFinalizersOnExit(true);
>>>>>>>>
>>>>>>>> So it really depends on what one would like to see. Showing the queue
>>>>>>>> may be interesting if one wants to see how the finalizer thread is
>>>>>>>> coping with processing the finalize() invocations. Showing unfinalized
>>>>>>>> list may be interesting if one wants to know how many live +
>>>>>>>> finalization pending instances are there on the heap that override
>>>>>>>> finalize() method.
>>>>>>>>
>>>>>>>> Regards, Peter
>>>>>>>>
>>>>>>>>> For the output, it would be a nice touch to sort it on the number of
>>>>>>>>> references for each type. Perhaps outputting it more like a table
>>>>>>>>> (see the old code again) would also make it easier to digest.
>>>>>>>>>
>>>>>>>>> In diagnosticCommand.hpp, the new commands should override
>>>>>>>>> permission() and limit access:
>>>>>>>>>
>>>>>>>>>    static const JavaPermission permission() {
>>>>>>>>>      JavaPermission p = {"java.lang.management.ManagementPermission",
>>>>>>>>>                          "monitor", NULL};
>>>>>>>>>      return p;
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> The two tests don’t validate the output in any way. Would it be
>>>>>>>>> possible to add validation? Perhaps hard to make sure an object is on
>>>>>>>>> the finalizer queue… Hmm.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> /Staffan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 5 maj 2015, at 12:01, Dmitry Samersoff
>>>>>>>>>> <dmitry.samersoff at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Everyone,
>>>>>>>>>>
>>>>>>>>>> Please review the fix:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dsamersoff/JDK-8059036/webrev.01/
>>>>>>>>>>
>>>>>>>>>> heap dcmd outputs the same information as SIGBREAK
>>>>>>>>>>
>>>>>>>>>> finalizerinfo dcmd outputs a list of all classes in finalization queue
>>>>>>>>>> with count
>>>>>>>>>>
>>>>>>>>>> -Dmitry
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> Dmitry Samersoff
>>>>>>>>>> Oracle Java development team, Saint Petersburg, Russia
>>>>>>>>>> * I would love to change the world, but they won't give me the sources.
>>>>>
>>>>
>>>
>>
> 

-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.