Bug in G1 logging? incorrect stop times?
Bengt Rutisson
bengt.rutisson at oracle.com
Thu Oct 2 14:34:54 UTC 2014
On 2014-10-02 15:07, charlie hunt wrote:
> Hi Thomas,
>
> Don't mean to jump in the middle of your discussion with Bengt ...
>
> Since this is M5, have you tried setting the FX Solaris CPU scheduler,
> (based on the hypothesis this may be thread scheduling related)?
Good point, Charlie. My guess is that there is some kind of scheduling
issue. 2200 threads are quite a few threads to start and stop...
Thomas, have you tried with ParallelGC to see if you happen to get the
same issue there. If it is a scheduling issue for safepoints I don't
think it should be GC dependent.
Bengt
>
> hths,
>
> Charlie
>
>
>
> Sent from my iPhone
>
> On Oct 2, 2014, at 5:35 AM, Thomas Viessmann
> <thomas.viessmann at oracle.com <mailto:thomas.viessmann at oracle.com>> wrote:
>
>> Hi Bengt,
>>
>> Better late than never :-) .
>> There are 2200 threads, about 10% are runnable.
>> The application is the only one running on this M5.
>>
>> We observe this effect quite often with G1. Relatively short
>> GC times and long application thread stop times.
>> Interestingly I've never seen this with CMS. But
>> CMS is not an option here due to fragmentation.
>>
>> Thanks and Regards
>>
>> Thomas
>>
>>
>>
>>
>>
>>
>>
>>
>> On 10/02/14 11:15, Bengt Rutisson wrote:
>>>
>>> Hi Thomas,
>>>
>>> A very late reply. I happened to find this in my inbox and have some
>>> comments.
>>>
>>> On 2014-08-29 14:15, Thomas Viessmann wrote:
>>>> Hi all,
>>>>
>>>>
>>>> I suspect a bug in G1 logging. Before I file a (possibly incorrect)
>>>> bug I would like to ask here.
>>>> Is my understanding correct that the stw evacuation phase starts at
>>>> *48448.931* and ends at *48454**.034* ?
>>>> This would mean a stop time of 5.103 s, which is clearly a mismatch
>>>> to the reported *9.3426141* s
>>>> If the stw phase in fact started at a later time e.g. at *48449.288
>>>> *then the stop pause would have been even shorter.
>>>>
>>>> In addition I do not understand how all the relatively short
>>>> evacuation phases sum up to several seconds.
>>>> If I add all the MAX values *69.1 + 22.2 + 45+ 150.8 + 0.1 + 155.1
>>>> + 0.9 + 0.5**= 443.7**ms*
>>>>
>>>>
>>>>
>>>>
>>>> 48443.380: Total time for which application threads were stopped:
>>>> 0.0084701 seconds
>>>> *48448.931*: [GC pause (mixed) 48448.931: [G1Ergonomics (CSet
>>>> Construction) start choosing CSet, _pending_cards: 89458, predicted
>>>> base time: 73.58 ms, remaining time: 176.42 ms, target pause time:
>>>> 250.00 ms]
>>>> 48448.931: [G1Ergonomics (CSet Construction) add young regions to
>>>> CSet, eden: 33 regions, survivors: 5 regions, predicted young
>>>> region time: 35.73 ms]
>>>> 48448.933: [G1Ergonomics (CSet Construction) finish adding old
>>>> regions to CSet, reason: reclaimable percentage not over threshold,
>>>> old: 19 regions, max: 77 regions, reclaimable: 2570649712 bytes
>>>> (9.98 %), threshold: 10.00 %]
>>>> 48448.933: [G1Ergonomics (CSet Construction) added expensive
>>>> regions to CSet, reason: old CSet region num not reached min, old:
>>>> 19 regions, expensive: 7 regions, min: 26 regions, remaining time:
>>>> 0.00 ms]
>>>> 48448.933: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>>> eden: 33 regions, survivors: 5 regions, old: 19 regions, predicted
>>>> pause time: 340.58 ms, target pause time: 250.00 ms]
>>>> 48449.252: [SoftReference, 962 refs, 0.0064679 secs]48449.259:
>>>> [WeakReference, 7411 refs, 0.0027722 secs]48449.261:
>>>> [FinalReference, 272 refs, 0.0022754 secs]48449.264:
>>>> [PhantomReference, 6 refs, 0.0024348 secs]48449.266: [JNI Weak
>>>> Reference, 0.0000337 secs] 48449.288: [G1Ergonomics (Concurrent
>>>> Cycles) do not request concurrent cycle initiation, reason: still
>>>> doing mixed collections, occupancy: 14898167808 bytes, allocation
>>>> request: 0 bytes, threshold: 11596411665 bytes (45.00 %), source:
>>>> end of GC]
>>>> *48449.288*: [G1Ergonomics (Mixed GCs) do not continue mixed GCs,
>>>> reason: reclaimable percentage not over threshold, candidate old
>>>> regions: 156 regions, reclaimable: 2570649712 bytes (9.98 %),
>>>> threshold: 10.00 %]
>>>> , 0.3573636 secs]
>>>> [Parallel Time: 314.9 ms, GC Workers: 40]
>>>> [GC Worker Start (ms): Min: 48448933.6, Avg: 48448934.2, Max:
>>>> 48448934.7, Diff: 1.1]
>>>> [Ext Root Scanning (ms): Min: 15.1, Avg: 17.7, Max: *69.1*,
>>>> Diff: 54.0, Sum: 708.0]
>>>> [Update RS (ms): Min: 0.0, Avg: 14.5, Max: *22.2*, Diff:
>>>> 22.2, Sum: 580.3]
>>>> [Processed Buffers: Min: 0, Avg: 22.0, Max: *45*, Diff:
>>>> 45, Sum: 879]
>>>> [Scan RS (ms): Min: 88.7, Avg: 142.6, Max: *150.8*, Diff:
>>>> 62.1, Sum: 5705.2]
>>>> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: *0.1*,
>>>> Diff: 0.1, Sum: 0.1]
>>>> [Object Copy (ms): Min: 131.4, Avg: 138.2, Max: *155.1*,
>>>> Diff: 23.8, Sum: 5526.6]
>>>> [Termination (ms): Min: 0.0, Avg: 0.7, Max: *0.9*, Diff: 0.9,
>>>> Sum: 27.2]
>>>> [GC Worker Other (ms): Min: 0.0, Avg: 0.3, Max: *0.5*, Diff:
>>>> 0.5, Sum: 10.0]
>>>> [GC Worker Total (ms): Min: 313.2, Avg: 313.9, Max: *314.7*,
>>>> Diff: 1.5, Sum: 12557.5]
>>>> [GC Worker End (ms): Min: 48449247.9, Avg: 48449248.1, Max:
>>>> 48449248.4, Diff: 0.5]
>>>> [Code Root Fixup: 2.2 ms]
>>>> [Code Root Migration: 0.2 ms]
>>>> [Clear CT: 2.0 ms]
>>>> [Other: 38.2 ms]
>>>> [Choose CSet: 1.7 ms]
>>>> [Ref Proc: 15.6 ms]
>>>> [Ref Enq: 1.2 ms]
>>>> [Free CSet: 2.4 ms]
>>>> [Eden: 1056.0M(1056.0M)->0.0B(1088.0M) Survivors: 160.0M->128.0M
>>>> Heap: 14.5G(24.0G)->13.9G(24.0G)]
>>>> [Times: user=12.45 sys=0.03, real=0.36 secs]
>>>> *48454**.034*: Total time for which application threads were
>>>> stopped: *9.3426141* seconds
>>>>
>>>> For completeness; here are the cmd line options:
>>>>
>>>>
>>>> /usr/jdk/instances/jdk1.7.0/bin/sparcv9/java -Xms24g -Xmx24g
>>>> -server -XX:StringTableSize=27001 -XX:PermSize=512m
>>>> -XX:MaxPermSize=512m -Xss512k
>>>> -XX:+UseG1GC -XX:SoftRefLRUPolicyMSPerMB=500 -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps
>>>> -Xloggc:/var/cacao/instances/oem-ec/logs/gc.log
>>>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>>>> -XX:GCLogFileSize=10m -XX:+PrintGCApplicationStoppedTime
>>>> -XX:MaxGCPauseMillis=250-XX:InitiatingHeapOccupancyPercent=45
>>>> -XX:+ParallelRefProcEnabled -XX:ParallelGCThreads=40
>>>> -XX:+PrintAdaptiveSizePolicy -XX:+PrintSafepointStatistics
>>>> -XX:+PrintReferenceGC -XX:G1HeapRegionSize=32m
>>>> -XX:HeapBaseMinAddress=2G
>>>
>>>
>>> The GC starts at 48448.931 and reports that it lasts for 0.36
>>> seconds, so it should have ended at 48449.291. The last log line
>>> that states that the threads were stopped for 9 seconds is from the
>>> PrintGCApplicationStoppedTime flag, which logs how long the
>>> safepoint was. i.e. how long the Java threads where stopped.
>>>
>>> This means that the safepoint lasted for 9 seconds but the GC only
>>> lasted for 0.36 seconds. So the rest of the time must be attributed
>>> to stopping and starting the Java threads. The safepoint was started
>>> at 48444.691 (48454.034 - 9.3426141), which means that it took 4.240
>>> seconds (48448.931 - 48444.691) to stop all Java threads and 4.743
>>> seconds (9.3426141 - 4.240 - 0.36) to start them again.
>>>
>>> I don't have a good explanation for why it takes so long to stop and
>>> start the threads. Are there many Java threads in the application?
>>> Are there many other processes running that could cause the Java
>>> threads to be scheduled out?
>>>
>>> Thanks,
>>> Bengt
>>>
>>>>
>>>>
>>>>
>>>> Thanks and Regards
>>>>
>>>> Thomas
>>>>
>>>> --
>>>> <mime-attachment.gif> <http://www.oracle.com>
>>>> THOMAS VIESSMANN | Senior Principal Technical Support Engineer - Java
>>>> Phone: +498914302496 <tel:+49814302496> | Mobile: +491743005467
>>>> <tel:+491743005467>
>>>> Oracle Customer Technical Support - Java
>>>>
>>>> ORACLE Deutschland B.V. & Co. KG | Riesstr.25 | D-80992 Muenchen
>>>>
>>>> ORACLE Deutschland B.V. & Co. KG
>>>> Hauptverwaltung: Riesstr. 25, D-80992 Muenchen
>>>> Registergericht: Amtsgericht Muenchen, HRA 95603
>>>> Geschäftsführere: Juergen Kunz
>>>>
>>>> Komplementärin: ORACLE Deutschland Verwaltung B.V.
>>>> Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
>>>> Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
>>>> Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher
>>>>
>>>> ------------------------------------------------------------------------
>>>> ------------------------------------------------------------------------
>>>> <mime-attachment.gif> <http://www.oracle.com/commitment> Oracle is
>>>> committed to developing practices and products that help protect
>>>> the environment
>>>
>>
>> --
>> <oracle_sig_logo.gif> <http://www.oracle.com>
>> THOMAS VIESSMANN | Senior Principal Technical Support Engineer - Java
>> Phone: +498914302496 <tel:+49814302496> | Mobile: +491743005467
>> <tel:+491743005467>
>> Oracle Customer Technical Support - Java
>>
>> ORACLE Deutschland B.V. & Co. KG | Riesstr.25 | D-80992 Muenchen
>>
>> ORACLE Deutschland B.V. & Co. KG
>> Hauptverwaltung: Riesstr. 25, D-80992 Muenchen
>> Registergericht: Amtsgericht Muenchen, HRA 95603
>> Geschäftsführere: Juergen Kunz
>>
>> Komplementärin: ORACLE Deutschland Verwaltung B.V.
>> Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
>> Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
>> Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher
>>
>> ------------------------------------------------------------------------
>> ------------------------------------------------------------------------
>> <green-for-email-sig_0.gif> <http://www.oracle.com/commitment> Oracle
>> is committed to developing practices and products that help protect
>> the environment
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20141002/ca39fbf8/attachment.htm>
More information about the hotspot-gc-dev
mailing list