Extremely long ParNew/CMS promotion failure scenario?
John O'Brien
jobrien at ieee.org
Fri Oct 19 11:13:51 PDT 2012
Srinivas,
I am interested in how this gets resolved. Can I confirm that you are
referring to the "GC--" events in the GC log, where the minor GC
unwinds and turns into a major GC?
I'd like to figure out whether this is something different from what
I've seen before. One case where I have seen GC times blow out is when
Transparent Huge Pages interfered, doing its coalescing at the same
time as a GC. Can you clarify what kernel version you are on and
whether huge/large pages are enabled?
That may or may not help you, but it will help me follow the
discussion better.
Thanks,
John
On Fri, Oct 19, 2012 at 6:36 AM, Charlie Hunt <chunt at salesforce.com> wrote:
> Interesting discussion. :-)
>
> Ramki's observation of high context switches to me suggests active locking
> as a possible culprit. Fwiw, based on your discussion it looks like you're
> headed down a path that makes sense.
>
> charlie...
>
> On Oct 19, 2012, at 3:40 AM, Srinivas Ramakrishna wrote:
>
>
>
> On Thu, Oct 18, 2012 at 5:27 PM, Peter B. Kessler
> <Peter.B.Kessler at oracle.com> wrote:
>>
>> When there's no room in the old generation and a worker has filled its
>> PLAB to capacity, but it still has instances to try to promote, does it try
>> to allocate a new PLAB, and fail? That would lead to each of the workers
>> eventually failing to allocate a new PLAB for each promotion attempt. IIRC,
>> PLAB allocation grabs a real lock (since it happens so rarely :-). In the
>> promotion failure case, that lock could get incandescent. Maybe it's gone
>> unnoticed because for modest young generations it doesn't stay hot enough
>> for long enough for people to witness the supernova? Having a young
>> generation the size you do would exacerbate the problem. If you have lots
>> of workers, that would increase the amount of contention, too.
>
>
> Yes, that's exactly my thinking too. In the case of CMS, the PLABs are
> "local free block lists", and allocation from the shared global pool is
> even worse, far more heavyweight than an atomic pointer bump, with a
> lock protecting several layers of checks.
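>
> Schematically, the per-object slow path is something like this (an
> illustrative sketch, not the actual HotSpot code; plab_allocate,
> refill_plab_from_pool and PromotionPool_lock are invented names):
>
>   // Called by each scavenge worker for every object it tries to promote.
>   HeapWord* allocate_in_old(size_t word_sz) {
>     HeapWord* dest = plab_allocate(word_sz);   // lock-free bump in the local PLAB
>     if (dest == NULL) {                        // PLAB exhausted: go to the shared pool
>       MutexLocker ml(PromotionPool_lock);      // global lock, several layers of checks
>       dest = refill_plab_from_pool(word_sz);   // NULL when the old gen is full
>     }
>     return dest;                               // NULL means promotion failure
>   }
>
> Once the old gen is exhausted, every worker takes that lock on every
> promotion attempt just to rediscover the failure, which also fits the
> high context switching I mentioned.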
>
>>
>>
>> PLAB allocation might be a place where you could put a test for having
>> failed promotion, so just return null and let the worker self-loop this
>> instance. That would keep the test off the fast-path (when things are going
>> well).
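>>
>> Something like this, say (sketch only; _promotion_failed and
>> refill_plab_from_pool are invented names):
>>
>>   HeapWord* allocate_plab(size_t word_sz) {
>>     if (_promotion_failed) {     // set by the first worker that fails
>>       return NULL;               // fail fast; never touch the lock again
>>     }
>>     return refill_plab_from_pool(word_sz);  // existing locked slow path
>>   }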
>
>
> Yes, that's a good idea, might well be sufficient, and was also my first
> thought. However, I also wonder whether just moving the promotion
> failure test (a volatile read) into the fast path of the copy routine,
> and immediately failing all subsequent copies after the first failure
> (with the global flag propagating that failure across all the workers
> immediately), wouldn't be quicker, without having added much to the
> fast path. In that case it seems we may even be able to avoid the
> self-looping and the subsequent single-threaded fixup. The first thread
> that fails sets the volatile global, so every subsequent copy of an
> uncopied object, in any thread, fails artificially. Any object
> reference found pointing to an object in Eden or From space that hasn't
> yet been copied will call the copy routine, which will (artificially)
> fail and return the original address.
>
> I'll do some experiments; devils may lurk in the details, but it seems
> to me that this will work and be much more efficient in the slow case,
> without making the fast path much slower.
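>
> In sketch form the copy routine would become something like this
> (illustrative only; copy_and_forward stands in for the real
> copy/promote logic, which also resolves copy races over the
> forwarding pointer):
>
>   static volatile bool _promotion_failed = false;  // one global, read by every worker
>
>   oop copy_to_survivor_space(oop old) {
>     if (old->is_forwarded()) {
>       return old->forwardee();      // already copied somewhere; keep that copy
>     }
>     if (_promotion_failed) {
>       return old;                   // artificial failure: object stays in place
>     }
>     oop new_obj = copy_and_forward(old);  // normal copy into To space or the old gen
>     if (new_obj == NULL) {                // first genuine failure anywhere
>       _promotion_failed = true;           // propagate to all workers at once
>       return old;
>     }
>     return new_obj;
>   }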
>
>>
>>
>> I'm still guessing.
>
>
> Your guesses are good, and very helpful, and I think we are on the right
> track with this one as regards the cause of the slowdown.
>
> I'll update.
>
> -- ramki
>
>>
>>
>>
>> ... peter
>>
>> Srinivas Ramakrishna wrote:
>>>
>>> System data show high context switching in the vicinity of the event,
>>> and point at the futile-allocation bottleneck as a theory with some
>>> legs...
>>>
>>> more later.
>>> -- ramki
>>>
>>> On Thu, Oct 18, 2012 at 3:47 PM, Srinivas Ramakrishna
>>> <ysr1729 at gmail.com> wrote:
>>>
>>> Thanks Peter... the possibility of paging, or of a related VM-system
>>> issue, did occur to me, especially because system time shows up as
>>> somewhat high here. The problem is that this server runs without
>>> swap :-) so the time is going elsewhere.
>>>
>>> The cache miss theory is interesting (but that time would not show up
>>> as system time), and your back-of-the-envelope calculation gives
>>> about 0.8 us for fetching a cache line, although I am pretty sure the
>>> hardware prefetcher would figure out the misses and stream in the
>>> cache lines, since, as you say, we are going in address order. I'd
>>> expect it to be no worse than when we do an initial-mark pause on a
>>> full Eden, give or take a little, and this is some 30x worse.
>>>
>>> One possibility I am looking at is the part where we self-loop. I
>>> suspect the ParNew/CMS combination running with multiple worker
>>> threads is hit hard here if the failure happens very early. From what
>>> I saw of that code recently, we don't consult the flag that says we
>>> have failed (and hence should just return and self-loop); rather, we
>>> retry allocation for each subsequent object, fail that, and then do
>>> the self-loop. The repeated failed attempts might be adding up,
>>> especially since each one involves looking at the shared pool. I'll
>>> look at how that is done and see if we can fail fast after the first
>>> failure, rather than try to do the rest of the scavenge, since we'll
>>> need to do a fixup anyway.
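>>>
>>> (By "self-loop" I mean the usual promotion-failure trick of pointing
>>> the object's forwarding pointer at itself, roughly
>>>
>>>   obj->forward_to(obj);  // forwardee == obj marks "failed, stayed in place"
>>>
>>> so the later single-threaded fixup pass can tell which objects never
>>> moved.)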
>>>
>>> Thanks for the discussion, and I'll update as and when I do some more
>>> investigation. Keep those ideas coming, and I'll submit a bug report
>>> once I have spent a few more cycles looking at the available data and
>>> ruminating.
>>>
>>> - ramki
>>>
>>>
>>> On Thu, Oct 18, 2012 at 1:20 PM, Peter B. Kessler
>>> <Peter.B.Kessler at oracle.com> wrote:
>>>
>>> IIRC, promotion failure still has to finish the evacuation
>>> attempt (and some objects may get promoted while the ones that
>>> fail get self-looped). That part is the usual multi-threaded
>>> object graph walk, with failed PLAB allocations thrown in to
>>> slow you down. Then you get to start the pass that deals with
>>> the self-loops, which you say is single-threaded. Undoing the
>>> self-loops is in address order, but it walks by the object
>>> sizes, so probably it mostly misses in the cache. 40GB at the
>>> average object size (call them 40 bytes to make the math easy)
>>> is a lot of cache misses. How fast is your memory system?
>>> Probably faster than (10 minutes / (40 GB / 40 bytes)) per cache
>>> miss.
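>>>
>>> (Worked out: 10 minutes = 600 s, and 40 GB / 40 bytes = 10^9
>>> objects, so that expression comes to 600 s / 10^9 = 600 ns per
>>> object, a bound that a miss all the way to main memory beats
>>> comfortably.)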
>>>
>>> Is it possible you are paging? Maybe not when things are running
>>> smoothly, but maybe a 10-minute stall on one service causes things
>>> to back up on (and grow the heaps of) other services on the same
>>> machine? I'm guessing.
>>>
>>> ... peter
>>>
>>> Srinivas Ramakrishna wrote:
>>>
>>>
>>> Has anyone come across extremely long (upwards of 10
>>> minutes) promotion failure unwinding scenarios when using
>>> any of the collectors, but especially with ParNew/CMS?
>>> I recently came across one such occurrence with ParNew/CMS
>>> that, with a 40 GB young gen, took upwards of 10 minutes to
>>> "unwind". I looked through the code and I can see
>>> that the unwinding steps can be a source of slowdown as we
>>> iterate single-threaded (DefNew) through the large Eden to
>>> fix up self-forwarded objects, but that still wouldn't
>>> seem to explain such a large pause, even with a 40 GB young
>>> gen. I am looking through the promotion failure paths to see
>>> what might be the cause of such a large pause,
>>> but if anyone has experienced this kind of scenario before
>>> or has any conjectures or insights, I'd appreciate it.
>>>
>>> thanks!
>>> -- ramki
>>>