G1 issue: falling over to Full GC

Charlie Hunt chunt at salesforce.com
Fri Nov 2 12:26:46 PDT 2012


Thanks for the update, John!  Great news on isolating and fixing the issue.

I haven't poked around just yet ... is there an OpenJDK workspace for the HotSpot 24 that'll be included in 7u12?

I'd be happy to build HotSpot from that repository and try out the changes, to see if they get rid of the premature evacuations I've observed with 7u9.
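
Fwiw, for anyone else who wants a cheap way to poke at the GC locker while testing, here's a minimal sketch (class name and buffer sizes are mine, and it assumes java.util.zip's native code still pins its arrays via JNI Get/ReleasePrimitiveArrayCritical, which is what engages the GC locker):

    import java.util.zip.Deflater;

    public class GCLockerStress {
        public static void main(String[] args) {
            final byte[] input = new byte[1 << 20];  // 1 MB buffer, compressed over and over
            for (int t = 0; t < 8; t++) {
                new Thread(new Runnable() {
                    public void run() {
                        byte[] out = new byte[1 << 20];
                        Deflater d = new Deflater();
                        while (true) {
                            // deflate() enters native code while the input and output
                            // arrays are pinned, so a GC requested during the call is
                            // deferred by the GC locker until the critical section ends.
                            d.reset();
                            d.setInput(input);
                            d.finish();
                            d.deflate(out);
                        }
                    }
                }).start();
            }
        }
    }

Run it against a smallish heap and the GC locker gets exercised more or less continuously.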

Love the progress you, Bengt and Monica are making with G1!

charlie ...

On Nov 2, 2012, at 11:35 AM, John Cuthbertson wrote:

> Hi Charlie,
> 
> I'm jumping in here late as well and I'll try to answer Andreas' 
> questions in a separate email later today.
> 
> I just wanted to let you know what's happening with 7143858. The fix for 
> this CR is already in hs24 which, I believe, is intended for jdk7u12. So 
> fortunately no backporting is needed. With this fix about 90% of the 
> premature evacuations due to the GC locker are eliminated (in one 
> workload they went from around 30 to 3).  The remainder are being 
> tracked using 7181612. Looking at the code I can see a possible scenario 
> that might result in an unexpected evacuation pause, but I haven't been
> able to prove it - yet.
> 
> Regards,
> 
> JohnC
> 
> On 11/2/2012 5:34 AM, Charlie Hunt wrote:
>> Jumping in a bit late ...
>> 
>> I strongly suggest that anyone evaluating G1 not use anything prior to 7u4.  Even better, use (as of this writing) 7u9, or the latest production Java 7 HotSpot VM.
>> 
>> Fwiw, I'm really liking what I'm seeing in 7u9, with the exception of one issue (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7143858), whose fix is currently slated to be backported to a future Java 7 update (thanks to Monica, John Cuthbertson and Bengt for tackling this!).
>> 
>> From looking at your observations and others' comments thus far, my initial reaction is that with a 1G Java heap, you might get the best results with -XX:+UseParallelOldGC.  Are you using -XX:+UseParallelGC, or -XX:+UseParallelOldGC?  Or are you not setting a GC at all?  Not until 7u4 is -XX:+UseParallelOldGC automatically set for what's called a "server class" machine when you don't specify a GC.
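>>
>> For example, something along these lines (heap sizes and the main class are placeholders):
>>
>>    java -Xms1g -Xmx1g -XX:+UseParallelOldGC -verbose:gc -XX:+PrintGCDetails YourApp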
>> 
>> The lengthy concurrent mark could be the result of the implementation of G1 in 6u*, or it could be that your system is swapping.  Could you check whether your system is swapping?  On Solaris you can monitor this with vmstat, watching not just free memory but also sr == scan rate, along with pi == page in and po == page out.  Seeing sr (page scan activity) together with low free memory and pi & po activity is a strong indication of swapping.  Seeing low free memory and no sr activity is ok, i.e. no swapping.
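>>
>> For example (the numbers below are made up, just to show which columns to watch):
>>
>>    $ vmstat 5
>>     kthr      memory            page            ...
>>     r b w   swap  free  re  mf pi po fr de sr ...
>>     0 0 0 882384  6120   3  18 45 38 52  0 97    <- low free plus sr/pi/po activity: swapping
>>     0 0 0 882384 91432   0   4  0  0  0  0  0    <- sr == 0: ok, no swapping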
>> 
>> Additionally, you are right: "partial" was changed to "mixed" in the GC logs.  For those interested in a bit of history ... the change was made because we felt "partial" was misleading.  "Partial" was intended to mean a partial old gen collection, which did occur.  But that same GC event also included a young gen GC.  As a result, we renamed the GC event "mixed", since it is really a combination of a young gen GC and a portion of an old gen GC.
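>>
>> In other words, an event that a 6u* G1 would have logged as, say,
>>
>>    4972.437: [GC pause (partial), 1.89505180 secs]
>>
>> now shows up as
>>
>>    4972.437: [GC pause (mixed), 1.89505180 secs]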
>> 
>> Simone also has a good suggestion: include -XX:+PrintFlagsFinal and -showversion output as part of the GC log data you collect, especially with G1 continuing to improve and evolve.
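>>
>> For example (GC log file name and main class are placeholders):
>>
>>    java -showversion -XX:+PrintFlagsFinal -XX:+UseG1GC -XX:+PrintGCDetails -Xloggc:gc.log YourApp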
>> 
>> Look forward to seeing your GC logs!
>> 
>> hths,
>> 
>> charlie ....
>> 
>> On Nov 2, 2012, at 5:46 AM, Andreas Müller wrote:
>> 
>>> Hi Simone,
>>> 
>>>> 4972.437: [GC pause (partial), 1.89505180 secs]
>>>> that I cannot decipher (to Monica: what does "partial" mean?), and no mixed GCs, which seems unusual as well.
>>> Oops, I understand that now: 'partial' used to be what 'mixed' is now!
>>> Our portal usually runs on Java 6u33. For the G1 tests I switched to 7u7 because I had learned that G1 is far from mature in 6u33.
>>> But automatic deployments can overwrite the start script and thus switch back to 6u33.
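>>>
>>> To guard against that in future runs, I will have the start script record the JVM it actually launches, e.g. something like:
>>>
>>>    $JAVA_HOME/bin/java -version 2>&1 | tee -a start.log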
>>> 
>>>> Are you sure you are actually using 1.7.0_u7?
>>> I have checked that in the archived start scripts and the result, unfortunately, is: no.
>>> The 'good case' was actually running on 7u7 (that's why it was good), but the 'bad case' was unwittingly run on 6u33 again.
>>> That's the true reason why the results were so much worse and so incomprehensible.
>>> Thank you very much for looking at the log and for asking good questions!
>>> 
>>> I'll try to repeat the test and post the results on this list.
>>> 
>>> Regards
>>> Andreas