Bug in G1 logging? incorrect stop times?

Thu Oct 2 14:34:54 UTC 2014

On 2014-10-02 15:07, charlie hunt wrote:
> Hi Thomas,
>
> Don't mean to jump in the middle of your discussion with Bengt ...
>
> Since this is M5, have you tried setting the FX Solaris CPU scheduler, 
> (based on the hypothesis this may be thread scheduling related)?

Good point, Charlie. My guess is that there is some kind of scheduling 
issue. 2200 threads are quite a few threads to start and stop...

Thomas, have you tried with ParallelGC to see if you happen to get the 
same issue there. If it is a scheduling issue for safepoints I don't 
think it should be GC dependent.

Bengt

>
> hths,
>
> Charlie
>
>
>
> Sent from my iPhone
>
> On Oct 2, 2014, at 5:35 AM, Thomas Viessmann 
> <thomas.viessmann at oracle.com <mailto:thomas.viessmann at oracle.com>> wrote:
>
>> Hi Bengt,
>>
>> Better late than never :-) .
>> There are 2200 threads, about 10% are runnable.
>> The application is the only one running on this M5.
>>
>> We observe this effect quite often with G1. Relatively short
>> GC times and long application thread stop times.
>> Interestingly I've never seen this with CMS. But
>> CMS is not an option here due to fragmentation.
>>
>> Thanks and Regards
>>
>> Thomas
>>
>>
>>
>>
>>
>>
>>
>>
>> On 10/02/14 11:15, Bengt Rutisson wrote:
>>>
>>> Hi Thomas,
>>>
>>> A very late reply. I happened to find this in my inbox and have some 
>>> comments.
>>>
>>> On 2014-08-29 14:15, Thomas Viessmann wrote:
>>>> Hi all,
>>>>
>>>>
>>>> I suspect a bug in G1 logging. Before I file a (possibly incorrect) 
>>>> bug I would like to ask here.
>>>> Is my understanding correct that the stw evacuation phase starts at 
>>>> *48448.931* and ends at *48454**.034* ?
>>>> This would mean a stop time of 5.103 s, which is clearly a mismatch 
>>>> to the reported *9.3426141* s
>>>> If the stw phase in fact started at a later time e.g. at *48449.288 
>>>> *then the stop pause would have been even shorter.
>>>>
>>>> In addition I do not understand how all the relatively short 
>>>> evacuation phases sum up to several seconds.
>>>> If I add all the MAX values *69.1 + 22.2 + 45+ 150.8 + 0.1 + 155.1 
>>>> + 0.9 + 0.5**= 443.7**ms*
>>>>
>>>>
>>>>
>>>>
>>>> 48443.380: Total time for which application threads were stopped: 
>>>> 0.0084701 seconds
>>>> *48448.931*: [GC pause (mixed) 48448.931: [G1Ergonomics (CSet 
>>>> Construction) start choosing CSet, _pending_cards: 89458, predicted 
>>>> base time: 73.58 ms, remaining time: 176.42 ms, target pause time: 
>>>> 250.00 ms]
>>>>  48448.931: [G1Ergonomics (CSet Construction) add young regions to 
>>>> CSet, eden: 33 regions, survivors: 5 regions, predicted young 
>>>> region time: 35.73 ms]
>>>>  48448.933: [G1Ergonomics (CSet Construction) finish adding old 
>>>> regions to CSet, reason: reclaimable percentage not over threshold, 
>>>> old: 19 regions, max: 77 regions, reclaimable: 2570649712 bytes 
>>>> (9.98 %), threshold: 10.00 %]
>>>>  48448.933: [G1Ergonomics (CSet Construction) added expensive 
>>>> regions to CSet, reason: old CSet region num not reached min, old: 
>>>> 19 regions, expensive: 7 regions, min: 26 regions, remaining time: 
>>>> 0.00 ms]
>>>>  48448.933: [G1Ergonomics (CSet Construction) finish choosing CSet, 
>>>> eden: 33 regions, survivors: 5 regions, old: 19 regions, predicted 
>>>> pause time: 340.58 ms, target pause time: 250.00 ms]
>>>> 48449.252: [SoftReference, 962 refs, 0.0064679 secs]48449.259: 
>>>> [WeakReference, 7411 refs, 0.0027722 secs]48449.261: 
>>>> [FinalReference, 272 refs, 0.0022754 secs]48449.264: 
>>>> [PhantomReference, 6 refs, 0.0024348 secs]48449.266: [JNI Weak 
>>>> Reference, 0.0000337 secs] 48449.288: [G1Ergonomics (Concurrent 
>>>> Cycles) do not request concurrent cycle initiation, reason: still 
>>>> doing mixed collections, occupancy: 14898167808 bytes, allocation 
>>>> request: 0 bytes, threshold: 11596411665 bytes (45.00 %), source: 
>>>> end of GC]
>>>> *48449.288*: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, 
>>>> reason: reclaimable percentage not over threshold, candidate old 
>>>> regions: 156 regions, reclaimable: 2570649712 bytes (9.98 %), 
>>>> threshold: 10.00 %]
>>>> , 0.3573636 secs]
>>>>    [Parallel Time: 314.9 ms, GC Workers: 40]
>>>>       [GC Worker Start (ms): Min: 48448933.6, Avg: 48448934.2, Max: 
>>>> 48448934.7, Diff: 1.1]
>>>>       [Ext Root Scanning (ms): Min: 15.1, Avg: 17.7, Max: *69.1*, 
>>>> Diff: 54.0, Sum: 708.0]
>>>>       [Update RS (ms): Min: 0.0, Avg: 14.5, Max: *22.2*, Diff: 
>>>> 22.2, Sum: 580.3]
>>>>          [Processed Buffers: Min: 0, Avg: 22.0, Max: *45*, Diff: 
>>>> 45, Sum: 879]
>>>>       [Scan RS (ms): Min: 88.7, Avg: 142.6, Max: *150.8*, Diff: 
>>>> 62.1, Sum: 5705.2]
>>>>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: *0.1*, 
>>>> Diff: 0.1, Sum: 0.1]
>>>>       [Object Copy (ms): Min: 131.4, Avg: 138.2, Max: *155.1*, 
>>>> Diff: 23.8, Sum: 5526.6]
>>>>       [Termination (ms): Min: 0.0, Avg: 0.7, Max: *0.9*, Diff: 0.9, 
>>>> Sum: 27.2]
>>>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.3, Max: *0.5*, Diff: 
>>>> 0.5, Sum: 10.0]
>>>>       [GC Worker Total (ms): Min: 313.2, Avg: 313.9, Max: *314.7*, 
>>>> Diff: 1.5, Sum: 12557.5]
>>>>       [GC Worker End (ms): Min: 48449247.9, Avg: 48449248.1, Max: 
>>>> 48449248.4, Diff: 0.5]
>>>>    [Code Root Fixup: 2.2 ms]
>>>>    [Code Root Migration: 0.2 ms]
>>>>    [Clear CT: 2.0 ms]
>>>>    [Other: 38.2 ms]
>>>>       [Choose CSet: 1.7 ms]
>>>>       [Ref Proc: 15.6 ms]
>>>>       [Ref Enq: 1.2 ms]
>>>>       [Free CSet: 2.4 ms]
>>>>    [Eden: 1056.0M(1056.0M)->0.0B(1088.0M) Survivors: 160.0M->128.0M 
>>>> Heap: 14.5G(24.0G)->13.9G(24.0G)]
>>>>  [Times: user=12.45 sys=0.03, real=0.36 secs]
>>>> *48454**.034*: Total time for which application threads were 
>>>> stopped: *9.3426141* seconds
>>>>
>>>> For completeness; here are the cmd line options:
>>>>
>>>>
>>>> /usr/jdk/instances/jdk1.7.0/bin/sparcv9/java -Xms24g -Xmx24g 
>>>> -server  -XX:StringTableSize=27001 -XX:PermSize=512m 
>>>> -XX:MaxPermSize=512m  -Xss512k
>>>> -XX:+UseG1GC  -XX:SoftRefLRUPolicyMSPerMB=500 -XX:+PrintGCDetails  
>>>> -XX:+PrintGCTimeStamps 
>>>> -Xloggc:/var/cacao/instances/oem-ec/logs/gc.log 
>>>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 
>>>> -XX:GCLogFileSize=10m  -XX:+PrintGCApplicationStoppedTime 
>>>> -XX:MaxGCPauseMillis=250-XX:InitiatingHeapOccupancyPercent=45 
>>>> -XX:+ParallelRefProcEnabled  -XX:ParallelGCThreads=40 
>>>> -XX:+PrintAdaptiveSizePolicy -XX:+PrintSafepointStatistics 
>>>> -XX:+PrintReferenceGC -XX:G1HeapRegionSize=32m  
>>>> -XX:HeapBaseMinAddress=2G
>>>
>>>
>>> The GC starts at 48448.931 and reports that it lasts for 0.36 
>>> seconds, so it should have ended at 48449.291. The last log line 
>>> that states that the threads were stopped for 9 seconds is from the 
>>> PrintGCApplicationStoppedTime flag, which logs how long the 
>>> safepoint was. i.e. how long the Java threads where stopped.
>>>
>>> This means that the safepoint lasted for 9 seconds but the GC only 
>>> lasted for 0.36 seconds. So the rest of the time must be attributed 
>>> to stopping and starting the Java threads. The safepoint was started 
>>> at 48444.691 (48454.034 - 9.3426141), which means that it took 4.240 
>>> seconds (48448.931 - 48444.691) to stop all Java threads and 4.743 
>>> seconds (9.3426141 - 4.240 - 0.36) to start them again.
>>>
>>> I don't have a good explanation for why it takes so long to stop and 
>>> start the threads. Are there many Java threads in the application? 
>>> Are there many other processes running that could cause the Java 
>>> threads to be scheduled out?
>>>
>>> Thanks,
>>> Bengt
>>>
>>>>
>>>>
>>>>
>>>> Thanks and Regards
>>>>
>>>> Thomas
>>>>
>>>> -- 
>>>> <mime-attachment.gif> <http://www.oracle.com>
>>>> THOMAS VIESSMANN | Senior Principal Technical Support Engineer - Java
>>>> Phone: +498914302496 <tel:+49814302496> | Mobile: +491743005467 
>>>> <tel:+491743005467>
>>>> Oracle Customer Technical Support - Java
>>>>
>>>> ORACLE Deutschland B.V. & Co. KG | Riesstr.25 | D-80992 Muenchen
>>>>
>>>> ORACLE Deutschland B.V. & Co. KG
>>>> Hauptverwaltung: Riesstr. 25, D-80992 Muenchen
>>>> Registergericht: Amtsgericht Muenchen, HRA 95603
>>>> Geschäftsführere: Juergen Kunz
>>>>
>>>> Komplementärin: ORACLE Deutschland Verwaltung B.V.
>>>> Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
>>>> Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
>>>> Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher
>>>>
>>>> ------------------------------------------------------------------------
>>>> ------------------------------------------------------------------------
>>>> <mime-attachment.gif> <http://www.oracle.com/commitment> Oracle is 
>>>> committed to developing practices and products that help protect 
>>>> the environment
>>>
>>
>> -- 
>> <oracle_sig_logo.gif> <http://www.oracle.com>
>> THOMAS VIESSMANN | Senior Principal Technical Support Engineer - Java
>> Phone: +498914302496 <tel:+49814302496> | Mobile: +491743005467 
>> <tel:+491743005467>
>> Oracle Customer Technical Support - Java
>>
>> ORACLE Deutschland B.V. & Co. KG | Riesstr.25 | D-80992 Muenchen
>>
>> ORACLE Deutschland B.V. & Co. KG
>> Hauptverwaltung: Riesstr. 25, D-80992 Muenchen
>> Registergericht: Amtsgericht Muenchen, HRA 95603
>> Geschäftsführere: Juergen Kunz
>>
>> Komplementärin: ORACLE Deutschland Verwaltung B.V.
>> Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
>> Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
>> Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher
>>
>> ------------------------------------------------------------------------
>> ------------------------------------------------------------------------
>> <green-for-email-sig_0.gif> <http://www.oracle.com/commitment> Oracle 
>> is committed to developing practices and products that help protect 
>> the environment

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20141002/ca39fbf8/attachment.htm>