RFR(M,v3): JDK-8059036 : Implement Diagnostic Commands for heap and finalizerinfo
Derek White
derek.white at oracle.com
Fri May 15 22:59:52 UTC 2015
Hi Dmitry, Peter,
I should read my email more frequently - I missed Dmitry's last webrev
email.
My comments on a preference for volatile over Unsafe are not a strong
objection to using Unsafe fences. I just don't have enough experience
with Unsafe fences yet to agree that this use is correct.
While looking over this Reference code I found 3-6 really simple things
that need cleaning up that have been there for years, so I'm not a big
cheerleader for more complicated things here :-)
- Derek
On 5/15/15 6:46 PM, Derek White wrote:
> Hi Peter,
>
> Some comments below...
>
> On 5/14/15 3:50 AM, Peter Levart wrote:
>> Hi Derek,
>>
>> On 05/13/2015 10:34 PM, Derek White wrote:
>>> Hi Peter,
>>>
>>> I don't have smoking gun evidence that your change introduces a bug,
>>> but I have some concerns...
>>
>> I did have a concern too, ...
>>
>>>
>>> On 5/12/15 6:05 PM, Peter Levart wrote:
>>>> Hi Dmitry,
>>>>
>>>> You iterate the queue then, not the unfinalized list. That's more
>>>> logical.
>>>>
>>>> Holding the queue's lock may pause reference handler and finalizer
>>>> threads for the entire time of iteration. This can blow up the
>>>> application. Suppose one wants to diagnose the application because
>>>> he suspects that finalizer thread hardly keeps up with production
>>>> of finalizable instances. This can happen if the allocation rate of
>>>> finalizable objects is very high and/or finalize() methods are not
>>>> coded to be fast enough. Suppose the queue of Finalizer(s) builds
>>>> up gradually and is already holding several million objects. Now
>>>> you initiate the diagnostic command to print the queue. The
>>>> iteration over and grouping/counting of several millions of
>>>> Finalizer(s) takes some time. This blocks finalizer thread and
>>>> reference handler thread. But does not block the threads that
>>>> allocate finalizable objects. By the time the iteration is over,
>>>> the JVM blows up and application slows down to a halt doing GCs
>>>> most of the time, getting OOMEs, etc...
>>>>
>>>> It is possible to iterate the elements of the queue for diagnostic
>>>> purposes without holding a lock. The modification required to do
>>>> that is the following (havent tested this, but I think it should work):
>>>>
>>>> http://cr.openjdk.java.net/~plevart/misc/Finalizer.printFinalizationQueue/webrev.01/
>>>>
>>> One issue to watch out for is the garbage collectors inspect the
>>> Reference.next field from C code directly (to distinguish active vs.
>>> pending, iterate over oops, etc.). You can search hotspot for
>>> java_lang_ref_Reference::next*, java_lang_ref_Reference::set_next*.
>>>
>>> Your change makes "inactive" References superficially look like
>>> "enqueued" References. The GC code is rather subtle and I haven't
>>> yet seen a case where it would get confused by this change, but
>>> there could easily be something like that lurking in the GC code.
>>
>> ...but then I thought that GC can in no way treat a Reference
>> differently whether it is still enqueued in a ReferenceQueue or
>> already dequeued (inactive) - responsibility is already on the Java
>> side. Currently the definition of Reference.next is this:
>>
>> /* When active: NULL
>> * pending: this
>> * Enqueued: next reference in queue (or this if last)
>> * Inactive: this
>> */
>> @SuppressWarnings("rawtypes")
>> Reference next;
>>
>> We see that, unless GC inspects all ReferenceQueue instances and
>> scans their lists too, the state of a Reference that is enqueued as
>> last in list is indistinguishable from the state of inactive
>> Reference. So I deduced that this distinction (enqueued/inactive)
>> can't be established solely on the value of .next field ( == this or
>> != this)...
>>
>> But I should inspect the GC code too to build a better understanding
>> of that part of the story...
>>
>> Anyway. It turns out that there is already enough state in Reference
>> to destinguish between it being enqueued as last in list and already
>> dequeued (inactive) - the additional state is Reference.queue that
>> transitions from ReferenceQueue.ENQUEUED -> ReferenceQueue.NULL in
>> ReferenceQueue.reallyPoll. We need to just make sure the two fields
>> (r.next and r.queue) are assigned and read in correct order. This can
>> be achieved either by making Reference.next a volatile field or by a
>> couple of explicit fences:
>>
>> http://cr.openjdk.java.net/~plevart/misc/Finalizer.printFinalizationQueue/webrev.02/
>>
>> The assignment of r.queue to ReferenceQueue.ENQUEUED already happens
>> before linking the reference into the queue's head in
>> ReferenceQueue.enqueue():
>>
>> r.queue = ENQUEUED;
>> r.next = (head == null) ? r : head;
>> head = r;
>>
>> Both stores are volatile.
>
> This sounds a bit better. I don't have a good handle on the pros and
> cons of a volatile field over explicit fences via Unsafe in this case.
> Kim might jump in soon.
>
> I'd like to suggest an entirely less-clever solution. Since the
> problem is that forEach() might see inconsistent values for the queue
> and next fields of a Reference, but we don't want to grab a lock
> walking the whole queue, we could instead grab the lock as we look at
> each Reference (and do the "back-up" trick if we were interfered
> with). Of course this is slower than going lock-free, but it's not any
> more overhead than the ReferenceHandler thread or FinalizerThreads incur.
>
> The only benefit of this solution over the others is that it is less
> clever, and I'm not sure how much cleverness this problem deserves.
> Given that, and ranking the solutions as "lock" < (clever) "volatile"
> < "fence", I'd be happier with the "volatile" or "lock" solution if
> that is sufficient.
>
> - Derek
>>>
>>>> I also suggest the addition to the ReferenceQueue to be contained
>>>> (package-private) and as generic as possible. That's why I suggest
>>>> forEach iteration method with no concrete logic.
>>>>
>>>> It would be possible to encapsulate the entire logic into a special
>>>> package-private class (say java.lang.ref.DiagnosticCommands) with
>>>> static method(s) (printFinalizationQueue)... You could just expose
>>>> a package-private forEach static method from Finalizer and code the
>>>> rest in DiagnosticCommands.
>>> That's good for encapsulation. But I'm concerned that if "forEach"
>>> got exposed beyond careful use in DiagnosticCommands, and the
>>> Referents were leaked back into the heap, then we could get
>>> unexpected object resurrection after finalization. This isn't a bug
>>> on it's own - any finalizer might resurrect its object if not
>>> careful, but ordinarily /only/ the finalizer could resurrect the
>>> object. This could invalidate application invariants?
>>
>> Well, it all stays in the confines of package-private API - internal
>> to JDK. Any future additional use should be reviewed carefully.
>> Comments on the forEach() method should warn about that.
>>
>>>
>>> I agree that using a lock over the ReferenceQueue iteration opens up
>>> /another/ case of diagnostics affecting application behavior. And
>>> there are pathological scenarios where this gets severe. This is
>>> unfortunate but not uncommon. There is enough complication here that
>>> you should be sure that the fix for diagnostics performance doesn't
>>> introduce subtle bugs to the system in general. A change in this
>>> area should get a thorough review from both the runtime and GC sides.
>>
>> Is the webrev.02 proposed above more acceptable in that respect? It
>> does not introduce any logical changes to existing code.
>>
>>>
>>> Better yet, the reference handling code in GC and runtime should
>>> probably be thrown out and re-written, but that's another issue :-)
>>
>> I may have some proposals in that direction. Stay tuned.
>>
>> Regards, Peter
>>
>>>
>>> - Derek
>>>
>>>> Regards, Peter
>>>>
>>>>
>>>> On 05/12/2015 07:10 PM, Dmitry Samersoff wrote:
>>>>> Everybody,
>>>>>
>>>>> Updated version:
>>>>>
>>>>> http://cr.openjdk.java.net/~dsamersoff/JDK-8059036/webrev.03/
>>>>>
>>>>> Now it iterates over queue and output result sorted by number of instances.
>>>>>
>>>>> -Dmitry
>>>>>
>>>>> On 2015-05-07 00:51, Derek White wrote:
>>>>>> Hi Dmitry, Staffan,
>>>>>>
>>>>>> Lots of good comments here.
>>>>>>
>>>>>> On the topic of what list should be printed out, I think we should focus
>>>>>> on objects waiting to be finalized - e.g. the contents of the
>>>>>> ReferenceQueue. It's more of a pain to walk the ReferenceQueue, but you
>>>>>> could add a summerizeQueue(TreeMap<String, Integer>) method, or a
>>>>>> general iterator and lambda.
>>>>>>
>>>>>> A histogram of objects with finalize methods might also be interesting,
>>>>>> but you can get most of the same information from a heap histogram
>>>>>> (ClassHistogramDCmd, and search for count of Finalizer instances).
>>>>>>
>>>>>> BTW, by only locking the ReferenceQueue, we should only be blocking the
>>>>>> FinalizerThread from making progress (and I think there's a GC thread
>>>>>> that runs after GC that sorts found References objects that need
>>>>>> processing into their respective ReferenceQueues). But locking the
>>>>>> "unfinalized" list blocks initializing any object with a finalize() method.
>>>>>>
>>>>>> The sorting suggestion is a nice touch.
>>>>>>
>>>>>> - Derek White, GC team
>>>>>>
>>>>>> On 5/5/15 10:40 AM, Peter Levart wrote:
>>>>>>> Hi Dmitry, Staffan,
>>>>>>>
>>>>>>> On 05/05/2015 12:38 PM, Staffan Larsen wrote:
>>>>>>>> Dmitry,
>>>>>>>>
>>>>>>>> I think this should be reviewed on hotspot-gc and core-libs-dev as
>>>>>>>> well considering the changes to Finalizer.
>>>>>>>>
>>>>>>>> I’m a little worried about the potentially very large String that is
>>>>>>>> returned from printFinalizationQueue(). A possible different approach
>>>>>>>> would be to write printFinalizationQueue() in C++ inside Hotspot.
>>>>>>>> That would allow for outputting one line at a time. The output would
>>>>>>>> still be saved in memory (since the stream is buffered), but at least
>>>>>>>> the data is only saved once in memory, then. It would make the code a
>>>>>>>> bit harder to write, so its a question of how scared we are of
>>>>>>>> running out of memory.
>>>>>>> If the output is just a histogram of # of instances per class name,
>>>>>>> then it should not be too large, as there are not many different
>>>>>>> classes overriding finalize() method (I counted 72 overriddings of
>>>>>>> finalize() method in the whole jdk/src tree).
>>>>>>>
>>>>>>> I'm more concerned about the fact that while traversing the list, a
>>>>>>> lock is held, which might impact prompt finalization(). Is it
>>>>>>> acceptable for diagnostic output to impact performance and/or
>>>>>>> interfere with synchronization?
>>>>>>>
>>>>>>> It might be possible to scan the list without holding a lock for
>>>>>>> diagnostic purposes if Finalizer.remove() didn't set the removed
>>>>>>> Finalizer.next pointer to point back to itself:
>>>>>>>
>>>>>>> private void remove() {
>>>>>>> synchronized (lock) {
>>>>>>> if (unfinalized == this) {
>>>>>>> if (this.next != null) {
>>>>>>> unfinalized = this.next;
>>>>>>> } else {
>>>>>>> unfinalized = this.prev;
>>>>>>> }
>>>>>>> }
>>>>>>> if (this.next != null) {
>>>>>>> this.next.prev = this.prev;
>>>>>>> }
>>>>>>> if (this.prev != null) {
>>>>>>> this.prev.next = this.next;
>>>>>>> }
>>>>>>> // this.next = this; must not be set so that we can
>>>>>>> traverse the list unsynchronized
>>>>>>> this.prev = this; /* Indicates that this has been
>>>>>>> finalized */
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> For detecting whether a Finalizer is already processed, the 'prev'
>>>>>>> pointer could be used instead:
>>>>>>>
>>>>>>> private boolean hasBeenFinalized() {
>>>>>>> return (prev == this);
>>>>>>> }
>>>>>>>
>>>>>>> Also, to make sure a data race dereferencing 'unfinalized' in
>>>>>>> unsynchronized printFinalizationQueue() would get you a fully
>>>>>>> initialized Finalizer instance (in particular the next pointer), you
>>>>>>> would have to make the 'unfinalized' field volatile or insert an
>>>>>>> Unsafe.storeFence() before assigning to 'unfinalized':
>>>>>>>
>>>>>>> private void add() {
>>>>>>> synchronized (lock) {
>>>>>>> if (unfinalized != null) {
>>>>>>> this.next = unfinalized;
>>>>>>> unfinalized.prev = this;
>>>>>>> }
>>>>>>> // make sure a data race dereferencing 'unfinalized'
>>>>>>> // in printFinalizationQueue() does see the Finalizer
>>>>>>> // instance fully initialized
>>>>>>> // (in particular the 'next' pointer)
>>>>>>> U.storeFence();
>>>>>>> unfinalized = this;
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> By doing these modifications, I think you can remove
>>>>>>> synchronized(lock) {} from printFinalizationQueue().
>>>>>>>
>>>>>>>> I see you are traversing the ‘unfinalized’ list in Finalizer, whereas
>>>>>>>> the old SA code for ‘-finalizerinfo' traverses the ‘queue’ list. I am
>>>>>>>> not sure of the difference, perhaps someone from the GC team can help
>>>>>>>> out?
>>>>>>> The 'queue' is a ReferenceQueue of Finalizer (FinalReference)
>>>>>>> instances pending processing by finalizer thread because their
>>>>>>> referents are eligible for finalization (they are not reachable any
>>>>>>> more). The 'unfinalized' is a doubly-linked list of all Finalizer
>>>>>>> instances for which their referents have not been finalized yet
>>>>>>> (including those that are still reachable and alive). The later serves
>>>>>>> two purposes:
>>>>>>> - it keeps Finalizer instances reachable until they are processed
>>>>>>> - it is a source of unfinalized instances for
>>>>>>> running-finalizers-on-exit if requested by
>>>>>>> System.runFinalizersOnExit(true);
>>>>>>>
>>>>>>> So it really depends on what one would like to see. Showing the queue
>>>>>>> may be interesting if one wants to see how the finalizer thread is
>>>>>>> coping with processing the finalize() invocations. Showing unfinalized
>>>>>>> list may be interesting if one wants to know how many live +
>>>>>>> finalization pending instances are there on the heap that override
>>>>>>> finalize() method.
>>>>>>>
>>>>>>> Regards, Peter
>>>>>>>
>>>>>>>> For the output, it would be a nice touch to sort it on the number of
>>>>>>>> references for each type. Perhaps outputting it more like a table
>>>>>>>> (see the old code again) would also make it easier to digest.
>>>>>>>>
>>>>>>>> In diagnosticCommand.hpp, the new commands should override
>>>>>>>> permission() and limit access:
>>>>>>>>
>>>>>>>> static const JavaPermission permission() {
>>>>>>>> JavaPermission p = {"java.lang.management.ManagementPermission",
>>>>>>>> "monitor", NULL};
>>>>>>>> return p;
>>>>>>>> }
>>>>>>>>
>>>>>>>> The two tests don’t validate the output in any way. Would it be
>>>>>>>> possible to add validation? Perhaps hard to make sure an object is on
>>>>>>>> the finalizer queue… Hmm.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> /Staffan
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 5 maj 2015, at 12:01, Dmitry Samersoff
>>>>>>>>> <dmitry.samersoff at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Everyone,
>>>>>>>>>
>>>>>>>>> Please review the fix:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dsamersoff/JDK-8059036/webrev.01/
>>>>>>>>>
>>>>>>>>> heap dcmd outputs the same information as SIGBREAK
>>>>>>>>>
>>>>>>>>> finalizerinfo dcmd outputs a list of all classes in finalization queue
>>>>>>>>> with count
>>>>>>>>>
>>>>>>>>> -Dmitry
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Dmitry Samersoff
>>>>>>>>> Oracle Java development team, Saint Petersburg, Russia
>>>>>>>>> * I would love to change the world, but they won't give me the sources.
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20150515/f2d3adc9/attachment-0001.html>
More information about the serviceability-dev
mailing list