Understanding High Object Copy Times
nezih yigitbasi
nezihyigitbasi at gmail.com
Mon May 1 21:40:12 UTC 2017
Charlie, Kirk, thanks for all the insights. We have actually fixed a leak of
sorts in the app, and the situation has definitely improved.
Do you guys think having THP set to madvise is just as "bad" as having it set
to always?
Thanks!
Nezih
2017-05-01 14:37 GMT-07:00 charlie hunt <charlie.hunt at oracle.com>:
> Yeah, if a cache is being loaded and heap occupancy continues to grow … that
> is not sounding good. As you have said, the other observations will not
> really matter if that’s the case.
>
> thanks,
>
> charlie
>
> On May 1, 2017, at 4:32 PM, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>
>
> On May 1, 2017, at 11:19 PM, charlie hunt <charlie.hunt at oracle.com> wrote:
>
> Hi Kirk,
>
> A comment from me embedded below.
>
> charlie
>
> On May 1, 2017, at 3:42 PM, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>
> So, never mind all this. I scrubbed all of the noise out of the log, and what
> I see here is a Heap After GC chart that shows the heap completely filled.
> Unless the Full GC cleaned things up, this VM isn’t going any further. The
> heap fills very quickly.
>
>
>
>
>> The remembered set is the data structure that keeps track of pointers from
>> other regions into the region we are operating on. If there are pointers from
>> old regions to objects in the young regions, we cannot collect those objects.
>> After a GC pause, the live objects are evacuated to either survivor or old
>> gen regions, and then the remembered sets need to be updated. That is what I
>> was referring to; it is accounted as part of object copy. (A toy sketch of
>> the idea follows below.)
>>
>> As Kirk mentioned, the parallelism of the GC threads seems fine. Charlie
>> Hunt suggested that maybe the object graph is very deep, so the GC threads
>> cannot steal work and end up spinning.
>>
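>> As a toy illustration only (this is not G1’s actual data structure, which
>> tracks card-sized chunks per region; the Region/RememberedSet names here are
>> made up), the remembered-set idea sketched above looks roughly like this:
>>
>>     import java.util.HashMap;
>>     import java.util.HashSet;
>>     import java.util.Map;
>>     import java.util.Set;
>>
>>     // Toy model: for each region, remember which other regions hold pointers into it.
>>     class Region {}
>>
>>     class RememberedSet {
>>         // region -> set of regions containing references that point into it
>>         private final Map<Region, Set<Region>> incoming = new HashMap<>();
>>
>>         // Record that 'from' holds a pointer into 'into' (maintained by a write barrier).
>>         void recordReference(Region from, Region into) {
>>             incoming.computeIfAbsent(into, r -> new HashSet<>()).add(from);
>>         }
>>
>>         // When live objects are evacuated out of 'collected' into 'target', the
>>         // incoming references now point into 'target', so the remembered sets must
>>         // be updated; this bookkeeping is what shows up under object copy.
>>         void onEvacuation(Region collected, Region target) {
>>             Set<Region> refs = incoming.remove(collected);
>>             if (refs != null) {
>>                 incoming.computeIfAbsent(target, r -> new HashSet<>()).addAll(refs);
>>             }
>>         }
>>     }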
>
> I’d like to hear more from Charlie on this point, because if the parallelism
> is OK then I would conclude that the workload is well balanced, implying that
> there shouldn’t be much work stealing. I’m prepared to be wrong on this, but
> that said, the number of termination attempts suggests that something was
> going on at the tail end of the collection, and maybe the work stealing was
> far too granular? No idea…
>
>
> I haven’t had a chance to look at the logs in detail. That said, generally,
> high termination times suggest that one or more of the GC threads are in the
> termination protocol, waiting for one or more of the other GC threads to
> finish their work. Hence my thought that there may be some part of the object
> graph that is really deep and is keeping one GC thread occupied much longer
> than the other GC threads.
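> To make that concrete, here is a contrived sketch (the DeepChain/Node names
> and the depth are made up for illustration) of the kind of object graph shape
> that resists work stealing: a single very deep chain is discovered one
> reference at a time, so one GC worker ends up with most of the copying while
> the others sit in the termination protocol:
>
>     // Contrived example: one long singly linked chain. Each node's successor is
>     // only discovered after the node itself has been scanned, so tracing and
>     // copying the chain is inherently serial work for a single GC thread.
>     public class DeepChain {
>         static final class Node {
>             Node next;
>         }
>
>         static Node build(int depth) {
>             Node head = null;
>             for (int i = 0; i < depth; i++) {
>                 Node n = new Node();
>                 n.next = head;
>                 head = n;
>             }
>             return head;
>         }
>
>         public static void main(String[] args) {
>             // Assumes enough heap for ~10M small objects (a few hundred MB).
>             Node head = build(10_000_000);
>             System.out.println("Built chain, head = " + head);
>         }
>     }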
>
>
> OK, so we’re on the same page… If this were happening, I’d expect to see some
> imbalance in the parallelization of the workload, and I don’t see that in this
> case. However, I am using a heuristic to estimate this value, and the
> heuristic could be a wee bit off. But then, all the GC threads seem to have
> trouble with termination, so… I don’t know what to say about this except that
> it is a minor problem compared to the object copy times. Besides, since the
> heap is filling up completely, I’d expect the object copy times to keep
> climbing higher and higher.
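> As a rough sketch of one common way to estimate that balance from the
> per-worker times in the log (not necessarily the heuristic referred to above;
> the names are made up), see below. A value near 1.0 means the workers were
> busy for the whole pause; values well below 1.0 suggest workers idling, e.g.
> waiting in termination.
>
>     // Rough load-balance estimate from per-worker phase times and the pause time.
>     public class GcBalance {
>         static double parallelEfficiency(double[] workerMillis, double pauseMillis) {
>             double busy = 0.0;
>             for (double t : workerMillis) {
>                 busy += t;
>             }
>             return busy / (workerMillis.length * pauseMillis);
>         }
>
>         public static void main(String[] args) {
>             // Hypothetical per-worker object copy times (ms) for a 120 ms pause.
>             double[] workers = {110.0, 108.0, 112.0, 109.0};
>             System.out.printf("efficiency = %.2f%n",
>                     parallelEfficiency(workers, 120.0));
>         }
>     }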
>
>
> Btw, thanks for jumping in and offering a detailed analysis … a full heap
> followed by a Full GC is not good, especially if it didn’t make much space
> available. :-|
>
>
> Don’t know, the record is corrupted.
>
>
> Ooh, before I forget, did I see Nezih mention that THP was enabled? If
> that’s the case, I would disable it.
>
>
> I kept meaning to put that in, but I don’t think it’s making a difference to
> the primary problem in this case: the heap occupancy increases after each and
> every collection. It looks like a cache is being loaded.
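> The classic shape of that is an unbounded, long-lived cache; a contrived
> sketch (names made up) of the pattern that produces a steadily rising
> heap-after-GC line:
>
>     import java.util.HashMap;
>     import java.util.Map;
>
>     // Contrived example: entries are added but never evicted, so they survive
>     // every young collection, get promoted, and the post-GC heap occupancy keeps
>     // climbing until even a Full GC cannot reclaim much.
>     public class UnboundedCache {
>         private static final Map<String, byte[]> CACHE = new HashMap<>();
>
>         static byte[] lookup(String key) {
>             // No size bound and no eviction policy.
>             return CACHE.computeIfAbsent(key, k -> new byte[4096]);
>         }
>     }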
>
> Regards,
> Kirk
>
>
>