The GCLocker blues...
Peter B. Kessler
Peter.B.Kessler at Oracle.COM
Fri Jun 27 21:04:32 UTC 2014
Right. VM_GC_Operation::skip_operation() could be made smarter. The current version just looks at the count of collections, but there's lots of data available about why a collection was requested, and again by the time the operation reaches its prologue, e.g., young generation occupancy. There's all the time in the world to make a good decision, relative to the time for the collection (if one happens) or the savings (if a collection is avoided).
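(Purely as illustration, a minimal sketch of the kind of occupancy-aware check this could grow into; the HeapSnapshot type, the helper name, and the 10% threshold are made up for the sketch and are not HotSpot code:)
  // Hypothetical sketch only; not the real VM_GC_Operation::skip_operation().
  // HeapSnapshot stands in for whatever the heap can actually report at
  // prologue time (collection count, eden occupancy, request cause, ...).
  struct HeapSnapshot {
    unsigned int total_collections;  // collections completed so far
    double       eden_occupancy;     // fraction of eden in use, 0.0 .. 1.0
  };
  // Skip the requested collection if another collection has already run since
  // the request was made and the eden is still nearly empty afterwards.
  static bool should_skip_requested_gc(const HeapSnapshot& now,
                                       unsigned int gc_count_at_request,
                                       double occupancy_threshold = 0.10) {
    bool collected_since_request = now.total_collections > gc_count_at_request;
    bool eden_still_nearly_empty  = now.eden_occupancy < occupancy_threshold;
    return collected_since_request && eden_still_nearly_empty;
  }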
... peter
On 06/27/14 12:18, Jon Masamitsu wrote:
> Peter,
>
> Some of the cases that Tony points out have edens
> with small amounts of used space. That would indicate that
> a GC finished, the mutators were restarted, and then
> the GC-locker-needed GC was requested.
>
> Jon
>
> On 6/27/2014 11:35 AM, Peter B. Kessler wrote:
>> I thought there was code somewhere to address this in the case that multiple threads requested collections (for whatever reason, not just GC-locker).
>>
>> Something like: when the collection is requested, record the time (or an epoch number like the collection count), and then, when the requested collection is processed, compare that time (or epoch number) to the time (or epoch number) of the last collection. If there's been a collection since the request, assume the requested collection is redundant.
>>
>> There it is: line 88 of
>>
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/b67a3f81b630/src/share/vm/gc_implementation/shared/vmGCOperations.cpp
>>
>> VM_GC_Operation::skip_operation(). With a comment describing why it's there. Maybe it's not detailed enough to prevent an extra collection in your situation?
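>>
>> (Paraphrasing that check as a sketch, not the actual line-88 code; 'gc_count_before' here plays the role of the recorded epoch:)
>>
>>   // Sketch only: the shape of the count-based redundancy check, not the
>>   // real skip_operation() from vmGCOperations.cpp.
>>   // 'gc_count_before' is the collection count recorded when the GC was
>>   // requested; 'current_gc_count' is the count when the VM operation runs.
>>   static bool request_is_redundant(unsigned int gc_count_before,
>>                                    unsigned int current_gc_count) {
>>     // Any collection completed between the request and now means the
>>     // requested collection is assumed redundant and can be skipped.
>>     return current_gc_count != gc_count_before;
>>   }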
>>
>> ... peter
>>
>> On 06/27/14 09:41, Tony Printezis wrote:
>>> Hi Jon,
>>>
>>> Great to hear from you! :-)
>>>
>>> I haven't actually tried running the test with JDK 6 (I could if it'd be helpful...).
>>>
>>> Yes, I know exactly what's going on. There's a race between one thread in jni_unlock() scheduling the GCLocker-initiated young GC (let's call it GC-L) and another thread also scheduling a young GC (let's call it GC-A) because it couldn't allocate due to the eden being full. Under certain circumstances, GC-A can happen first, with GC-L being scheduled and going ahead as soon as GC-A finishes.
>>>
>>> I'll open a CR and add a more detailed analysis to it.
>>>
>>> Tony
>>>
>>> On 6/27/14, 11:25 AM, Jon Masamitsu wrote:
>>>> Tony,
>>>>
>>>> I don't recall talk within the GC group about this type of
>>>> problem. I didn't find a CR that relates to that behavior.
>>>> If there is one, I don't think it is on anyone's radar.
>>>>
>>>> Can I infer that the problem does not occur in jdk6?
>>>>
>>>> Any theories on what's going on?
>>>>
>>>> Jon
>>>>
>>>>
>>>> On 6/27/2014 6:00 AM, Tony Printezis wrote:
>>>>> Hi all,
>>>>>
>>>>> (trying again from my Twitter address; moderator: feel free to disregard the original I accidentally sent from my personal address)
>>>>>
>>>>> We have recently noticed an interesting problem which seems to happen quite frequently under certain circumstances. Immediately after a young GC, a second one happens that seems unnecessary, given that it starts with an empty or almost-empty eden. Here's an example:
>>>>>
>>>>> {Heap before GC invocations=2 (full 0):
>>>>> par new generation total 471872K, used 433003K [0x00000007bae00000, 0x00000007dae00000, 0x00000007dae00000)
>>>>> eden space 419456K, 100% used [0x00000007bae00000, 0x00000007d47a0000, 0x00000007d47a0000)
>>>>> from space 52416K, 25% used [0x00000007d47a0000, 0x00000007d54dacb0, 0x00000007d7ad0000)
>>>>> to space 52416K, 0% used [0x00000007d7ad0000, 0x00000007d7ad0000, 0x00000007dae00000)
>>>>> tenured generation total 524288K, used 0K [0x00000007dae00000, 0x00000007fae00000, 0x00000007fae00000)
>>>>> the space 524288K, 0% used [0x00000007dae00000, 0x00000007dae00000, 0x00000007dae00200, 0x00000007fae00000)
>>>>> compacting perm gen total 21248K, used 2549K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
>>>>> the space 21248K, 12% used [0x00000007fae00000, 0x00000007fb07d7a0, 0x00000007fb07d800, 0x00000007fc2c0000)
>>>>> No shared spaces configured.
>>>>> 1.119: [GC (Allocation Failure)[ParNew: 433003K->15843K(471872K), 0.0103090 secs] 433003K->15843K(996160K), 0.0103320 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
>>>>> Heap after GC invocations=3 (full 0):
>>>>> par new generation total 471872K, used 15843K [0x00000007bae00000, 0x00000007dae00000, 0x00000007dae00000)
>>>>> eden space 419456K, 0% used [0x00000007bae00000, 0x00000007bae00000, 0x00000007d47a0000)
>>>>> from space 52416K, 30% used [0x00000007d7ad0000, 0x00000007d8a48c88, 0x00000007dae00000)
>>>>> to space 52416K, 0% used [0x00000007d47a0000, 0x00000007d47a0000, 0x00000007d7ad0000)
>>>>> tenured generation total 524288K, used 0K [0x00000007dae00000, 0x00000007fae00000, 0x00000007fae00000)
>>>>> the space 524288K, 0% used [0x00000007dae00000, 0x00000007dae00000, 0x00000007dae00200, 0x00000007fae00000)
>>>>> compacting perm gen total 21248K, used 2549K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
>>>>> the space 21248K, 12% used [0x00000007fae00000, 0x00000007fb07d7a0, 0x00000007fb07d800, 0x00000007fc2c0000)
>>>>> No shared spaces configured.
>>>>> }
>>>>> {Heap before GC invocations=3 (full 0):
>>>>> par new generation total 471872K, used 24002K [0x00000007bae00000, 0x00000007dae00000, 0x00000007dae00000)
>>>>> eden space 419456K, 1% used [0x00000007bae00000, 0x00000007bb5f7c50, 0x00000007d47a0000)
>>>>> from space 52416K, 30% used [0x00000007d7ad0000, 0x00000007d8a48c88, 0x00000007dae00000)
>>>>> to space 52416K, 0% used [0x00000007d47a0000, 0x00000007d47a0000, 0x00000007d7ad0000)
>>>>> tenured generation total 524288K, used 0K [0x00000007dae00000, 0x00000007fae00000, 0x00000007fae00000)
>>>>> the space 524288K, 0% used [0x00000007dae00000, 0x00000007dae00000, 0x00000007dae00200, 0x00000007fae00000)
>>>>> compacting perm gen total 21248K, used 2549K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
>>>>> the space 21248K, 12% used [0x00000007fae00000, 0x00000007fb07d7a0, 0x00000007fb07d800, 0x00000007fc2c0000)
>>>>> No shared spaces configured.
>>>>> 1.130: [GC (GCLocker Initiated GC)[ParNew: 24002K->12748K(471872K), 0.0123930 secs] 24002K->12748K(996160K), 0.0124130 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]
>>>>> Heap after GC invocations=4 (full 0):
>>>>> par new generation total 471872K, used 12748K [0x00000007bae00000, 0x00000007dae00000, 0x00000007dae00000)
>>>>> eden space 419456K, 0% used [0x00000007bae00000, 0x00000007bae00000, 0x00000007d47a0000)
>>>>> from space 52416K, 24% used [0x00000007d47a0000, 0x00000007d5413320, 0x00000007d7ad0000)
>>>>> to space 52416K, 0% used [0x00000007d7ad0000, 0x00000007d7ad0000, 0x00000007dae00000)
>>>>> tenured generation total 524288K, used 0K [0x00000007dae00000, 0x00000007fae00000, 0x00000007fae00000)
>>>>> the space 524288K, 0% used [0x00000007dae00000, 0x00000007dae00000, 0x00000007dae00200, 0x00000007fae00000)
>>>>> compacting perm gen total 21248K, used 2549K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
>>>>> the space 21248K, 12% used [0x00000007fae00000, 0x00000007fb07d7a0, 0x00000007fb07d800, 0x00000007fc2c0000)
>>>>> No shared spaces configured.
>>>>> }
>>>>>
>>>>> Notice that:
>>>>>
>>>>> * The timestamp of the second GC (1.130) is almost equal to the timestamp of the first GC plus the duration of the first GC (1.119 + 0.0103320 = 1.1293). In this test young GCs normally happen at a frequency of one every 100ms-110ms or so.
>>>>> * The eden at the start of the second GC is almost empty (1% occupancy). We've also seen it very often with a completely empty eden.
>>>>> * (the big hint) The second GC is GClocker-initiated.
>>>>>
>>>>> This happens most often with ParNew (in some cases, more than 30% of the GCs are these unnecessary ones); it also happens with ParallelGC, but less frequently (maybe 1%-1.5% of the GCs are unnecessary). I was unable to reproduce it with G1.
>>>>>
>>>>> I can reproduce it with the latest JDK 7, JDK 8, and also the latest hotspot-gc/hotspot workspace.
>>>>>
>>>>> Are you guys looking into this (and is there a CR?)? I have a small test I can reproduce it with and a diagnosis / proposed fix(es) if you're interested.
>>>>>
>>>>> Tony
>>>>>
>>>>
>>>
>