zgc-dev Digest, Vol 26, Issue 4
Per Liden
per.liden at oracle.com
Mon Mar 2 10:20:09 UTC 2020
Hi Pierre,
On 2/13/20 2:58 PM, Pierre Mevel wrote:
> Good morning,
>
> Following on "Information on how to parse/interpret ZGC Logs",
> I did get the same issues back in October. (
> http://mail.openjdk.java.net/pipermail/zgc-dev/2019-October/000779.html for
> the curious).
>
> Basically, our application runs on relatively big servers, and allocates
> memory at a very high pace.
>
> We get enormous allocation stalls with ZGC, and increasing the number of
> threads running will simply delay the first allocation stalls, not resolve
> the issue.
> Because ZGC is almost entirely concurrent, the application still allocates
> memory during the Concurrent Relocation phase.
> We have two root issues that clash with each other:
> 1. The allocation rate can be much higher than the reclamation rate (which
> makes us want to give more OS resources to the GC).
> 2. The allocation rate can vary greatly (when it is low, we do not want many
> threads running concurrent phases; when it is high, it is because clients
> want answers from the system, and we cannot afford a long Allocation Stall).
>
> As I tried to suggest in my previous mail in October, the worker count could
> be boosted to a higher value (Max(ParallelGCThreads, ConcGCThreads), maybe?)
> as soon as an Allocation Stall is triggered, and restored to normal at the
> end of the phase (maybe?).
Changing the number of worker threads in the middle of a GC cycle is
non-trivial (and I'm not even sure it would be the right solution for
this). If I understand your problem correctly, the root of it is that ZGC's
heuristics don't have a long enough memory. I.e. after a long period of low
allocation rate, they will not remember (and take into account) that huge
allocation spikes happen now and then. As I mentioned earlier, you can try
using -XX:SoftMaxHeapSize to tell ZGC to collect earlier, which increases
the safety margin and helps avoid allocation stalls. Did you try that?
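
For example, something along these lines (the numbers are purely
illustrative, "MyApp" is just a placeholder, and note that
-XX:SoftMaxHeapSize is only available in JDK 13 and later):

  # Keep -Xmx as the hard limit; the soft limit makes ZGC start collecting
  # earlier, leaving headroom for allocation spikes.
  java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
       -Xms16G -Xmx16G \
       -XX:SoftMaxHeapSize=12G \
       -Xlog:gc*:file=gc.log:time \
       MyApp
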
cheers,
Per
>
> Apologies for having taken up your time,
> Best Regards,
> Pierre Mével
>
>
>
> On Thu, Feb 13, 2020 at 1:03 PM <zgc-dev-request at openjdk.java.net> wrote:
>
>>
>> Today's Topics:
>>
>> 1. Re: Information on how to parse/interpret ZGC Logs
>> (Prabhash Rathore)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 12 Feb 2020 23:20:40 -0800
>> From: Prabhash Rathore <prabhashrathore at gmail.com>
>> To: Peter Booth <peter_booth at me.com>
>> Cc: zgc-dev at openjdk.java.net
>> Subject: Re: Information on how to parse/interpret ZGC Logs
>>
>> Thank you Per for your help! It's very helpful.
>>
>> I started my application's GC configuration with default settings; I just
>> had Xmx set. But because of memory allocation stalls and GC pauses, I tuned
>> the concurrent and parallel thread counts, as the defaults didn't seem to
>> be enough.
>>
>> My reasoning for the high parallel thread configuration
>> (-XX:ParallelGCThreads=78) is that application threads are stalled anyway
>> during a full pause, so having more threads (for now 80% of OS threads)
>> work on collection should keep the GC pause time lower. Similarly, I
>> increased the concurrent thread count from its default value to keep the
>> collection rate on par with the allocation rate.
>>
>> You mentioned that when I see an Allocation Stall, I should increase the
>> heap size. I already have the heap configured at 80% of RAM. For such
>> allocation stalls, is there anything else I can tune other than the heap
>> size and the concurrent and parallel thread counts?
>>
>>
>> Hi Peter,
>>
>> This application runs on Linux RHEL 7.7. The kernel version is 3.10.0-1062.
>>
>> Output of *egrep "thp|trans" /proc/vmstat:*
>> nr_anon_transparent_hugepages 4722
>> thp_fault_alloc 51664
>> thp_fault_fallback 620147
>> thp_collapse_alloc 11462
>> thp_collapse_alloc_failed 20085
>> thp_split 9350
>> thp_zero_page_alloc 1
>> thp_zero_page_alloc_failed 0
>>
>>
>>
>> Output of *tail -28 /proc/cpuinfo*
>> power management:
>>
>> processor : 95
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 85
>> model name : Intel(R) Xeon(R) Gold 6263CY CPU @ 2.60GHz
>> stepping : 7
>> microcode : 0x5000021
>> cpu MHz : 3304.431
>> cache size : 33792 KB
>> physical id : 1
>> siblings : 48
>> core id : 29
>> cpu cores : 24
>> apicid : 123
>> initial apicid : 123
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 22
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
>> pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
>> rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
>> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
>> est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
>> popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
>> 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba
>> ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid
>> fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a
>> avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl
>> xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
>> dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke
>> avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
>> bogomips : 5206.07
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 46 bits physical, 48 bits virtual
>> power management:
>>
>>
>>
>>
>>
>> On Mon, Feb 10, 2020 at 9:28 AM Peter Booth <peter_booth at me.com> wrote:
>>
>>> Prabhash,
>>>
>>> What OS version?
>>> Is it a vanilla OS install?
>>> Can you print the output of the following?
>>> (Assuming Linux)
>>> egrep "thp|trans" /proc/vmstat
>>> tail -28 /proc/cpuinfo
>>>
>>> Peter
>>>
>>> Sent from my iPhone
>>>
>>>> On Feb 9, 2020, at 1:56 AM, Prabhash Rathore <prabhashrathore at gmail.com> wrote:
>>>>
>>>> Hi Per,
>>>>
>>>> Thanks for your reply!
>>>>
>>>> About the ZGC logs, in general I am trying to understand the following:
>>>>
>>>> - What are the full pause times?
>>>> - How many such pauses are there per unit of time?
>>>> - Anything else which helps me eliminate GC as the cause of high
>>>>   application latency.
>>>>
>>>> This is how I have configured ZGC logging at the JVM level, and I am
>>>> wondering if I should add other tags like safepoint to get more details
>>>> about GC stats:
>>>> -Xlog:gc*=debug:file=gc.log
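>>>>
>>>> For example, I am guessing something along these lines would also capture
>>>> safepoint times (the syntax is just my reading of the unified logging
>>>> options, so treat it as a sketch rather than something I have verified):
>>>>
>>>> # GC details plus safepoint timings, timestamped, into a rotating file
>>>> -Xlog:gc*=debug,safepoint=info:file=gc.log:time,uptime,level,tags:filecount=10,filesize=67108864
>>>>
>>>> # ZGC pause phases could then be pulled out of the log with, e.g.:
>>>> grep -E "Pause (Mark Start|Mark End|Relocate Start)" gc.log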
>>>>
>>>> All JVM flags used in my application:
>>>> -Xms130G -Xmx130G
>>>> -Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864
>>>> -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError
>>>> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48
>>>> -XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow
>>>>
>>>> It's a large machine with 96 threads and 196 GB RAM.
>>>>
>>>> I also have -XX:+AlwaysPreTouch configured. With the AlwaysPreTouch
>>>> option, the Linux top command shows very high shared and resident memory.
>>>> My max heap size is configured as 130 GB, but shared memory is shown as
>>>> 388 GB and resident memory as 436 GB. On the other hand, the total virtual
>>>> memory for this process in top is shown as 17.1 terabytes. How is this
>>>> possible? My whole machine is 196 GB (is this accounting for things
>>>> swapped out to disk?). I did see that without AlwaysPreTouch, the numbers
>>>> look close to the heap size. I am trying to understand why, with
>>>> pre-touch, the process memory is shown as higher than the configured size.
>>>> I understand that shared memory has all the shared libraries mapped, but
>>>> how can it be such a large size?
>>>>
>>>> Regarding the high GC pause times, I did notice that my machine was low on
>>>> memory and it was swapping, which slowed everything down. For now I have
>>>> disabled swapping completely with the kernel VM swappiness tunable, but I
>>>> am still trying to find the actual cause of why swapping kicked in. This
>>>> machine only runs this particular Java application, which has a 130 GB
>>>> heap. Other than the heap, I still have 66 GB of memory available on the
>>>> host. I am trying to figure out if there is a native memory leak. If you
>>>> have any inputs on this then please share.
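>>>>
>>>> By the kernel VM tunable I mean the standard swappiness setting, roughly
>>>> as follows (commands shown only as an illustration):
>>>>
>>>> # discourage the kernel from swapping out anonymous memory
>>>> sysctl -w vm.swappiness=0
>>>>
>>>> # watch the si/so columns to confirm swap activity has stopped
>>>> vmstat 1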
>>>>
>>>> Thanks!
>>>> Prabhash Rathore
>>>>
>>>>> On Mon, Feb 3, 2020 at 2:35 AM Per Liden <per.liden at oracle.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> On 2020-02-03 06:52, Prabhash Rathore wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We have decided to use the ZGC garbage collector for our Java application
>>>>>> running on Java 11. I was wondering if there are any tools or any
>>>>>> documentation on how to interpret ZGC logs.
>>>>>
>>>>> Is there something in particular in the logs you're wondering about?
>>>>>
>>>>>>
>>>>>> I found the following statistics in the ZGC log, which as per my
>>>>>> understanding show a very large allocation stall of 3902042.342
>>>>>> milliseconds. It would be really great if I could get some help to
>>>>>> understand this further.
>>>>>
>>>>> I can see that you've had marking times that are more than an hour long,
>>>>> which suggests that something in your system is seriously wrong (like it
>>>>> being extremely overloaded, or an extremely slow disk that you log to...
>>>>> just guessing here). I think you need to have a broader look at the
>>>>> health of the system before we can draw any conclusions from the GC logs.
>>>>>
>>>>> cheers,
>>>>> Per
>>>>>
>>>>>>
>>>>>> [2020-02-02T22:37:36.883+0000] === Garbage Collection Statistics =====================================================================================================
>>>>>> [2020-02-02T22:37:36.883+0000]                                                        Last 10s             Last 10m               Last 10h                   Total
>>>>>> [2020-02-02T22:37:36.883+0000]                                                        Avg / Max            Avg / Max              Avg / Max                  Avg / Max
>>>>>> [2020-02-02T22:37:36.883+0000]   Collector: Garbage Collection Cycle                  0.000 / 0.000        7789.187 / 7789.187    12727.424 / 3903938.012    1265.033 / 3903938.012    ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Contention: Mark Segment Reset Contention            0 / 0                10 / 1084              176 / 15122                42 / 15122                ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Contention: Mark SeqNum Reset Contention             0 / 0                0 / 5                  0 / 31                     0 / 31                    ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Contention: Relocation Contention                    0 / 0                0 / 3                  1 / 708                    7 / 890                   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Critical: Allocation Stall                           0.000 / 0.000        0.000 / 0.000          6714.722 / 3902042.342    6714.722 / 3902042.342    ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Critical: Allocation Stall                           0 / 0                0 / 0                  12 / 4115                  2 / 4115                  ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Critical: GC Locker Stall                            0.000 / 0.000        0.000 / 0.000          3.979 / 6.561              1.251 / 6.561             ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Critical: GC Locker Stall                            0 / 0                0 / 0                  0 / 1                      0 / 1                     ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Allocation Rate                              0 / 0                6 / 822                762 / 25306                1548 / 25306              MB/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Heap Used After Mark                         0 / 0                92170 / 92170          89632 / 132896             30301 / 132896            MB
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Heap Used After Relocation                   0 / 0                76376 / 76376          67490 / 132928             8047 / 132928             MB
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Heap Used Before Mark                        0 / 0                92128 / 92128          84429 / 132896             29452 / 132896            MB
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Heap Used Before Relocation                  0 / 0                86340 / 86340          76995 / 132896             15862 / 132896            MB
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Out Of Memory                                0 / 0                0 / 0                  0 / 0                      0 / 0                     ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Page Cache Flush                             0 / 0                0 / 0                  62 / 2868                  16 / 2868                 MB/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Page Cache Hit L1                            0 / 0                7 / 2233               277 / 11553                583 / 11553               ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Page Cache Hit L2                            0 / 0                0 / 0                  20 / 4619                  59 / 4619                 ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Page Cache Miss                              0 / 0                0 / 0                  15 / 1039                  3 / 1297                  ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Undo Object Allocation Failed                0 / 0                0 / 0                  0 / 24                     0 / 24                    ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Undo Object Allocation Succeeded             0 / 0                0 / 3                  1 / 708                    7 / 890                   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Memory: Undo Page Allocation                         0 / 0                0 / 12                 30 / 3464                  7 / 3464                  ops/s
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Concurrent Destroy Detached Pages             0.000 / 0.000        0.004 / 0.004          11.675 / 1484.886          1.155 / 1484.886          ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Concurrent Mark                               0.000 / 0.000        7016.569 / 7016.569    11758.365 / 3901893.544    1103.558 / 3901893.544    ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Concurrent Mark Continue                      0.000 / 0.000        0.000 / 0.000          1968.844 / 3674.454        1968.844 / 3674.454       ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Concurrent Prepare Relocation Set             0.000 / 0.000        453.732 / 453.732      364.535 / 7103.720         39.453 / 7103.720         ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Concurrent Process Non-Strong References      0.000 / 0.000        2.003 / 2.003          2.738 / 34.406             2.253 / 34.406            ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Concurrent Relocate                           0.000 / 0.000        261.822 / 261.822      335.954 / 2207.669         45.868 / 2207.669         ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Concurrent Reset Relocation Set               0.000 / 0.000        6.083 / 6.083          13.489 / 1128.678          3.574 / 1128.678          ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Concurrent Select Relocation Set              0.000 / 0.000        6.379 / 6.379          97.530 / 1460.679          18.439 / 1460.679         ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Pause Mark End                                0.000 / 0.000        4.420 / 4.420          6.219 / 26.498             6.474 / 40.883            ms
>>>>>> [2020-02-02T22:37:36.883+0000]   Phase: Pause Mark Start                              0.000 / 0.000        14.836 / 14.836        11.893 / 28.350            11.664 / 41.767           ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Phase: Pause Relocate Start                          0.000 / 0.000        13.411 / 13.411        30.849 / 697.344           11.995 / 697.344          ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent Mark                            0.000 / 0.000        7015.793 / 7016.276    18497.265 / 3901893.075    1690.497 / 3901893.075    ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent Mark Idle                       0.000 / 0.000        1.127 / 13.510         1.292 / 219.999            1.280 / 219.999           ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent Mark Try Flush                  0.000 / 0.000        1.295 / 2.029          47.094 / 34869.359         4.797 / 34869.359         ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent Mark Try Terminate              0.000 / 0.000        1.212 / 14.847         1.760 / 3799.238           1.724 / 3799.238          ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent References Enqueue              0.000 / 0.000        0.009 / 0.009          0.022 / 1.930              0.017 / 2.350             ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent References Process              0.000 / 0.000        0.599 / 0.599          0.768 / 23.966             0.495 / 23.966            ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent Weak Roots                      0.000 / 0.000        0.882 / 1.253          1.155 / 21.699             1.077 / 23.602            ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent Weak Roots JNIWeakHandles       0.000 / 0.000        0.301 / 0.943          0.308 / 10.868             0.310 / 23.219            ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent Weak Roots StringTable          0.000 / 0.000        0.289 / 0.496          0.390 / 12.794             0.363 / 22.907            ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Concurrent Weak Roots VMWeakHandles        0.000 / 0.000        0.230 / 0.469          0.329 / 21.267             0.331 / 23.135            ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Pause Mark Try Complete                    0.000 / 0.000        0.000 / 0.000          0.501 / 4.801              0.480 / 17.208            ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Pause Remap TLABS                          0.000 / 0.000        0.252 / 0.252          0.195 / 0.528              0.226 / 3.451             ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Pause Retire TLABS                         0.000 / 0.000        1.195 / 1.195          1.324 / 5.082              1.408 / 11.219            ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Pause Roots                                0.000 / 0.000        6.968 / 10.865         12.329 / 693.701           6.431 / 1300.994          ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Pause Roots ClassLoaderDataGraph           0.000 / 0.000        4.819 / 8.232          9.635 / 693.405            3.476 / 693.405           ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Pause Roots CodeCache                      0.000 / 0.000        0.842 / 2.731          0.996 / 83.553             0.780 / 83.553            ms
>>>>>> [2020-02-02T22:37:36.884+0000]   Subphase: Pause Roots JNIHandles                     0.000 / 0.000        1.171 / 6.314          0.866 / 17.875             0.837 / 25.708            ms
>>>>>>
>>>>>>
>>>>>> Thank you!
>>>>>> Prabhash Rathore
>>>>>>
>>>>>
>>>
>>>
>>
>>