From per.liden at oracle.com Mon Mar 2 10:20:09 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 2 Mar 2020 11:20:09 +0100
Subject: zgc-dev Digest, Vol 26, Issue 4
In-Reply-To:
References:
Message-ID: <9c8dc70a-ea87-843c-8fae-aaaf1e92d335@oracle.com>

Hi Pierre,

On 2/13/20 2:58 PM, Pierre Mevel wrote:
> Good morning,
>
> Following on "Information on how to parse/interpret ZGC Logs",
> I did get the same issues back in October. (
> http://mail.openjdk.java.net/pipermail/zgc-dev/2019-October/000779.html for
> the curious).
>
> Basically, our application runs on relatively big servers, and allocates
> memory at a very high pace.
>
> We get enormous allocation stalls with ZGC, and increasing the number of
> threads running will simply delay the first allocation stalls, not resolve
> the issue.
> Because ZGC is almost entirely concurrent, the application still allocates
> memory during the Concurrent Relocation phase.
> We have two root issues that clash with each other:
> 1. The allocation rate can be much higher than the reclamation rate (which
> makes us want to give more OS resources to the GC).
> 2. The allocation rate can vary greatly (and when it's at a low, we do not
> want to have many threads running Concurrent phases) (and when it's at a
> high, it's because clients want some answers from the system, and we can't
> afford a long Allocation Stall).
>
> As I tried to suggest in my previous mail in October, I suggest that the
> worker count be boosted to a higher value (Max(ParallelGCThreads,
> ConcGCThreads) maybe?) as soon as an Allocation Stall is triggered, and
> restored to normal at the end of the phase (maybe?).

Changing the number of worker threads in the middle of a GC cycle is
non-trivial (and I'm not even sure it would be the right solution for
this). If I understand your problem, the root of it is that ZGC's
heuristics don't have a long enough memory. I.e. after a long period of
low allocation rate, they will not remember (and take into account) that
huge allocation spikes happen now and then. As I mentioned earlier, you
can try using -XX:SoftMaxHeapSize to tell ZGC to collect earlier to
increase the safety margin and avoid allocation stalls. Did you try that?
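For example (a sketch only; the sizes here are placeholders, not a
recommendation for your workload):

  java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
       -Xmx64g -XX:SoftMaxHeapSize=48g ...

With something like this, ZGC aims to keep heap usage at or below 48g and
starts cycles early enough to do so, while the remaining headroom up to
-Xmx absorbs allocation spikes.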
cheers,
Per

> Apologies for having taken up your time,
> Best Regards,
> Pierre Mével
>
> On Thu, Feb 13, 2020 at 1:03 PM wrote:
>
>> Send zgc-dev mailing list submissions to
>> zgc-dev at openjdk.java.net
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://mail.openjdk.java.net/mailman/listinfo/zgc-dev
>> or, via email, send a message with subject or body 'help' to
>> zgc-dev-request at openjdk.java.net
>>
>> You can reach the person managing the list at
>> zgc-dev-owner at openjdk.java.net
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of zgc-dev digest..."
>>
>> Today's Topics:
>>
>>    1. Re: Information on how to parse/interpret ZGC Logs
>>       (Prabhash Rathore)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 12 Feb 2020 23:20:40 -0800
>> From: Prabhash Rathore
>> To: Peter Booth
>> Cc: zgc-dev at openjdk.java.net
>> Subject: Re: Information on how to parse/interpret ZGC Logs
>> Message-ID:
>> <CAFw09gKVw4514CLDUdtvbCd0UvgE75XUGLa-CCVSdKXn4Hhcxg at mail.gmail.com>
>> Content-Type: text/plain; charset="UTF-8"
>>
>> Thank you Per for your help! It's very helpful.
>>
>> I started my application GC configuration with default settings; I just
>> had Xmx set, but because of memory allocation stalls and GC pauses, I
>> tuned concurrent threads and parallel threads, as the default options
>> didn't seem enough.
>>
>> My reasoning for the high parallel thread configuration
>> (-XX:ParallelGCThreads=78) is that application threads are anyway stalled
>> during a full pause, so having more threads (for now 80% of OS threads)
>> can work on collection and keep the GC pause time lower. Again, I
>> increased concurrent threads from the default value to keep the
>> collection rate on par with the allocation rate.
>>
>> You mentioned that when I see an Allocation Stall, I should increase the
>> heap size. I think I already have the heap configured at 80% of RAM size.
>> For such allocation stalls, is there anything else I can tune other than
>> heap size and the concurrent and parallel thread counts?
>>
>> Hi Peter,
>>
>> This application runs on Linux RHEL 7.7. Kernel version is 3.10.0-1062.
>>
>> Output of egrep "thp|trans" /proc/vmstat:
>> nr_anon_transparent_hugepages 4722
>> thp_fault_alloc 51664
>> thp_fault_fallback 620147
>> thp_collapse_alloc 11462
>> thp_collapse_alloc_failed 20085
>> thp_split 9350
>> thp_zero_page_alloc 1
>> thp_zero_page_alloc_failed 0
>>
>> Output of tail -28 /proc/cpuinfo:
>> power management:
>>
>> processor : 95
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 85
>> model name : Intel(R) Xeon(R) Gold 6263CY CPU @ 2.60GHz
>> stepping : 7
>> microcode : 0x5000021
>> cpu MHz : 3304.431
>> cache size : 33792 KB
>> physical id : 1
>> siblings : 48
>> core id : 29
>> cpu cores : 24
>> apicid : 123
>> initial apicid : 123
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 22
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
>> pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
>> rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
>> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
>> est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
>> popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
>> 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba
>> ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid
>> fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a
>> avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl
>> xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
>> dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke
>> avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
>> bogomips : 5206.07
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 46 bits physical, 48 bits virtual
>> power management:
>>
>> On Mon, Feb 10, 2020 at 9:28 AM Peter Booth wrote:
>>
>>> Prabhash,
>>>
>>> What OS version?
>>> Is it a vanilla OS install?
>>> Can you print the output of the following?
>>> (Assuming Linux)
>>> egrep "thp|trans" /proc/vmstat
>>> tail -28 /proc/cpuinfo
>>>
>>> Peter
>>>
>>> Sent from my iPhone
>>>
>>>> On Feb 9, 2020, at 1:56 AM, Prabhash Rathore
>>>> <prabhashrathore at gmail.com> wrote:
>>>>
>>>> Hi Per,
>>>>
>>>> Thanks for your reply!
>>>>
>>>> About ZGC logs, in general I am trying to understand the following:
>>>>
>>>> - What are the full pause times?
>>>> - How many such pauses per unit time?
>>>> - Anything else which helps me eliminate GC as the cause of high
>>>> application latency.
>>>>
>>>> This is how I have configured ZGC logging at the JVM level, wondering
>>>> if I should add other tags like safepoint to get more details about GC
>>>> stats: -Xlog:gc*=debug:file=gc.log
>>>>
>>>> All JVM flags used in my application:
>>>> -Xms130G -Xmx130G
>>>> -Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864
>>>> -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError
>>>> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48
>>>> -XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow
>>>>
>>>> It's a large machine with 96 threads and 196 GB RAM.
>>>>
>>>> I have -XX:+AlwaysPreTouch configured as another option. With the
>>>> AlwaysPreTouch option, the Linux top command shows very high shared
>>>> and resident memory. My max heap size is configured as 130 GB, but I
>>>> see shared memory shown as 388 GB and resident memory as 436 GB. On the
>>>> other hand, total virtual memory for this process in top is shown as
>>>> 17.1 terabytes. How is this possible? My whole machine size is 196 GB
>>>> (is this accounting for things swapped out to disk?). I did see that
>>>> without AlwaysPreTouch, the numbers look close to the heap size. Trying
>>>> to understand why, with PreTouch, process memory is shown as higher
>>>> than the configured size. I understand shared memory has all shared
>>>> libs mapped, but how can it be such a large size?
>>>>
>>>> Regarding the high GC pause time, I did notice that my machine was low
>>>> on memory and it was swapping, hence slowing down everything. For now I
>>>> have disabled swappiness completely with the kernel VM tunable, but I
>>>> am still trying to find the actual cause of why swapping kicked in.
>>>> This machine only runs this particular Java application, which has a
>>>> 130 GB heap size. Other than heap, I still have 66 GB of memory
>>>> available on the host. Trying to figure out if there is a native memory
>>>> leak. If you have any inputs on this then please share.
>>>>
>>>> Thanks!
>>>> Prabhash Rathore
>>>>
>>>>> On Mon, Feb 3, 2020 at 2:35 AM Per Liden wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> On 2020-02-03 06:52, Prabhash Rathore wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We have decided to use the ZGC garbage collector for our Java
>>>>>> application running on Java 11. I was wondering if there are any
>>>>>> tools or any documentation on how to interpret ZGC logs.
>>>>>
>>>>> Is there something in particular in the logs you're wondering about?
>>>>>
>>>>>>
>>>>>> I found the following statistics in the ZGC log, which as per my
>>>>>> understanding show a very large allocation stall of 3902042.342
>>>>>> milliseconds. It would be really great if I can get some help to
>>>>>> understand this further.
>>>>>
>>>>> I can see that you've had marking times that are more than an hour
>>>>> long, which suggests that something in your system is seriously wrong
>>>>> (like extremely overloaded or an extremely slow disk that you log
>>>>> to... just guessing here). I think you need to have a broader look at
>>>>> the health of the system before we can draw any conclusion from the
>>>>> GC logs.
>>>>>
>>>>> cheers,
>>>>> Per
>>>>>
>>>>>>
>>>>>> [2020-02-02T22:37:36.883+0000] === Garbage Collection Statistics =======================================================================
>>>>>> [2020-02-02T22:37:36.883+0000]   (columns: Last 10s, Last 10m, Last 10h, Total; each as Avg / Max)
>>>>>> [2020-02-02T22:37:36.883+0000] Collector: Garbage Collection Cycle   0.000 / 0.000   7789.187 / 7789.187   12727.424 / 3903938.012   1265.033 / 3903938.012   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Contention: Mark Segment Reset Contention   0 / 0   10 / 1084   176 / 15122   42 / 15122   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Contention: Mark SeqNum Reset Contention   0 / 0   0 / 5   0 / 31   0 / 31   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Contention: Relocation Contention   0 / 0   0 / 3   1 / 708   7 / 890   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Critical: Allocation Stall   0.000 / 0.000   0.000 / 0.000   6714.722 / 3902042.342   6714.722 / 3902042.342   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Critical: Allocation Stall   0 / 0   0 / 0   12 / 4115   2 / 4115   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Critical: GC Locker Stall   0.000 / 0.000   0.000 / 0.000   3.979 / 6.561   1.251 / 6.561   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Critical: GC Locker Stall   0 / 0   0 / 0   0 / 1   0 / 1   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Allocation Rate   0 / 0   6 / 822   762 / 25306   1548 / 25306   MB/s
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used After Mark   0 / 0   92170 / 92170   89632 / 132896   30301 / 132896   MB
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used After Relocation   0 / 0   76376 / 76376   67490 / 132928   8047 / 132928   MB
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used Before Mark   0 / 0   92128 / 92128   84429 / 132896   29452 / 132896   MB
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used Before Relocation   0 / 0   86340 / 86340   76995 / 132896   15862 / 132896   MB
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Out Of Memory   0 / 0   0 / 0   0 / 0   0 / 0   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Flush   0 / 0   0 / 0   62 / 2868   16 / 2868   MB/s
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Hit L1   0 / 0   7 / 2233   277 / 11553   583 / 11553   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Hit L2   0 / 0   0 / 0   20 / 4619   59 / 4619   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Miss   0 / 0   0 / 0   15 / 1039   3 / 1297   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Undo Object Allocation Failed   0 / 0   0 / 0   0 / 24   0 / 24   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Undo Object Allocation Succeeded   0 / 0   0 / 3   1 / 708   7 / 890   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Memory: Undo Page Allocation   0 / 0   0 / 12   30 / 3464   7 / 3464   ops/s
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Destroy Detached Pages   0.000 / 0.000   0.004 / 0.004   11.675 / 1484.886   1.155 / 1484.886   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Mark   0.000 / 0.000   7016.569 / 7016.569   11758.365 / 3901893.544   1103.558 / 3901893.544   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Mark Continue   0.000 / 0.000   0.000 / 0.000   1968.844 / 3674.454   1968.844 / 3674.454   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Prepare Relocation Set   0.000 / 0.000   453.732 / 453.732   364.535 / 7103.720   39.453 / 7103.720   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Process Non-Strong References   0.000 / 0.000   2.003 / 2.003   2.738 / 34.406   2.253 / 34.406   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Relocate   0.000 / 0.000   261.822 / 261.822   335.954 / 2207.669   45.868 / 2207.669   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Reset Relocation Set   0.000 / 0.000   6.083 / 6.083   13.489 / 1128.678   3.574 / 1128.678   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Select Relocation Set   0.000 / 0.000   6.379 / 6.379   97.530 / 1460.679   18.439 / 1460.679   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Pause Mark End   0.000 / 0.000   4.420 / 4.420   6.219 / 26.498   6.474 / 40.883   ms
>>>>>> [2020-02-02T22:37:36.883+0000] Phase: Pause Mark Start   0.000 / 0.000   14.836 / 14.836   11.893 / 28.350   11.664 / 41.767   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Phase: Pause Relocate Start   0.000 / 0.000   13.411 / 13.411   30.849 / 697.344   11.995 / 697.344   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark   0.000 / 0.000   7015.793 / 7016.276   18497.265 / 3901893.075   1690.497 / 3901893.075   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark Idle   0.000 / 0.000   1.127 / 13.510   1.292 / 219.999   1.280 / 219.999   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark Try Flush   0.000 / 0.000   1.295 / 2.029   47.094 / 34869.359   4.797 / 34869.359   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark Try Terminate   0.000 / 0.000   1.212 / 14.847   1.760 / 3799.238   1.724 / 3799.238   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent References Enqueue   0.000 / 0.000   0.009 / 0.009   0.022 / 1.930   0.017 / 2.350   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent References Process   0.000 / 0.000   0.599 / 0.599   0.768 / 23.966   0.495 / 23.966   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots   0.000 / 0.000   0.882 / 1.253   1.155 / 21.699   1.077 / 23.602   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots JNIWeakHandles   0.000 / 0.000   0.301 / 0.943   0.308 / 10.868   0.310 / 23.219   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots StringTable   0.000 / 0.000   0.289 / 0.496   0.390 / 12.794   0.363 / 22.907   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots VMWeakHandles   0.000 / 0.000   0.230 / 0.469   0.329 / 21.267   0.331 / 23.135   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Mark Try Complete   0.000 / 0.000   0.000 / 0.000   0.501 / 4.801   0.480 / 17.208   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Remap TLABS   0.000 / 0.000   0.252 / 0.252   0.195 / 0.528   0.226 / 3.451   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Retire TLABS   0.000 / 0.000   1.195 / 1.195   1.324 / 5.082   1.408 / 11.219   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots   0.000 / 0.000   6.968 / 10.865   12.329 / 693.701   6.431 / 1300.994   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots ClassLoaderDataGraph   0.000 / 0.000   4.819 / 8.232   9.635 / 693.405   3.476 / 693.405   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots CodeCache   0.000 / 0.000   0.842 / 2.731   0.996 / 83.553   0.780 / 83.553   ms
>>>>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots JNIHandles   0.000 / 0.000   1.171 / 6.314   0.866 / 17.875   0.837 / 25.708   ms
>>>>>>
>>>>>> Thank you!
>>>>>> Prabhash Rathore
>>
>> End of zgc-dev Digest, Vol 26, Issue 4
>> **************************************

From per.liden at oracle.com Mon Mar 2 10:27:48 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 2 Mar 2020 11:27:48 +0100
Subject: zgc-dev Digest, Vol 26, Issue 4
In-Reply-To: <9c8dc70a-ea87-843c-8fae-aaaf1e92d335@oracle.com>
References: <9c8dc70a-ea87-843c-8fae-aaaf1e92d335@oracle.com>
Message-ID: <035f89d6-27ab-7a0f-2a93-f273ca382ff5@oracle.com>

On 3/2/20 11:20 AM, Per Liden wrote:
> Hi Pierre,
>
> On 2/13/20 2:58 PM, Pierre Mevel wrote:
>> [...]
>> As I tried to suggest in my previous mail in October, I suggest that the
>> worker count be boosted to a higher value (Max(ParallelGCThreads,
>> ConcGCThreads) maybe?) as soon as an Allocation Stall is triggered, and
>> restored to normal at the end of the phase (maybe?).
>
> Changing the number of worker threads in the middle of a GC cycle is
> non-trivial (and I'm not even sure it would be the right solution for
> this). If I understand your problem, the root of it is that ZGC's
> heuristics don't have a long enough memory. I.e. after a long period of
> low allocation rate, they will not remember (and take into account) that
> huge allocation spikes happen now and then. As I mentioned earlier, you
> can try using -XX:SoftMaxHeapSize to tell ZGC to collect earlier to
> increase the safety margin and avoid allocation stalls. Did you try that?

Btw, to better withstand huge allocation rates, the long term plan is to
make ZGC generational. But we're not quite there yet.

cheers,
Per

> cheers,
> Per
From erik.osterlund at oracle.com Tue Mar 3 08:32:16 2020
From: erik.osterlund at oracle.com (erik_osterlund)
Date: Tue, 3 Mar 2020 08:32:16 GMT
Subject: git: openjdk/zgc: 3 new changesets
Message-ID: <207d5c3f-110e-4bac-9fee-3273669a0577@oracle.com>

Changeset: b74c5680
Author: erik_osterlund
Date: 2020-02-19 10:02:02 +0000
URL: https://git.openjdk.java.net/zgc/commit/b74c5680

ZGC: Mechanism to opt out from safepoint coalescing

! src/hotspot/share/runtime/vmOperations.hpp
! src/hotspot/share/runtime/vmThread.cpp

Changeset: 1b90b281
Author: erik_osterlund
Date: 2020-02-11 10:21:13 +0000
URL: https://git.openjdk.java.net/zgc/commit/1b90b281

ZGC: Remove hotness counter sampling from safepoint cleanup

! src/hotspot/share/runtime/safepoint.cpp
! src/hotspot/share/runtime/sweeper.cpp
! src/hotspot/share/runtime/sweeper.hpp

Changeset: 833f56fb
Author: erik_osterlund
Date: 2019-11-04 16:17:14 +0000
URL: https://git.openjdk.java.net/zgc/commit/833f56fb

ZGC: Concurrent execution stack processing

!
src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp + src/hotspot/cpu/aarch64/c2_safepointPollStubTable_aarch64.hpp ! src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp ! src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp ! src/hotspot/cpu/arm/c1_LIRAssembler_arm.cpp + src/hotspot/cpu/arm/c2_safepointPollStubTable_arm.hpp ! src/hotspot/cpu/arm/interp_masm_arm.cpp ! src/hotspot/cpu/arm/macroAssembler_arm.cpp ! src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp + src/hotspot/cpu/ppc/c2_safepointPollStubTable_ppc.hpp ! src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp ! src/hotspot/cpu/ppc/macroAssembler_ppc.cpp ! src/hotspot/cpu/ppc/templateTable_ppc_64.cpp ! src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp + src/hotspot/cpu/s390/c2_safepointPollStubTable_s390.hpp ! src/hotspot/cpu/s390/interp_masm_s390.cpp ! src/hotspot/cpu/s390/macroAssembler_s390.cpp ! src/hotspot/cpu/s390/templateTable_s390.cpp ! src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp + src/hotspot/cpu/sparc/c2_safepointPollStubTable_sparc.hpp ! src/hotspot/cpu/sparc/interp_masm_sparc.cpp ! src/hotspot/cpu/sparc/macroAssembler_sparc.cpp ! src/hotspot/cpu/sparc/templateTable_sparc.cpp ! src/hotspot/cpu/x86/c1_CodeStubs_x86.cpp ! src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp + src/hotspot/cpu/x86/c2_safepointPollStubTable_x86.cpp + src/hotspot/cpu/x86/c2_safepointPollStubTable_x86.hpp ! src/hotspot/cpu/x86/frame_x86.cpp ! src/hotspot/cpu/x86/interp_masm_x86.cpp ! src/hotspot/cpu/x86/macroAssembler_x86.cpp ! src/hotspot/cpu/x86/macroAssembler_x86.hpp ! src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp ! src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp ! src/hotspot/cpu/x86/templateTable_x86.cpp ! src/hotspot/cpu/x86/vm_version_x86.hpp ! src/hotspot/cpu/x86/x86_64.ad ! src/hotspot/cpu/zero/cppInterpreter_zero.cpp ! src/hotspot/share/c1/c1_CodeStubs.hpp ! src/hotspot/share/c1/c1_LIR.cpp ! src/hotspot/share/c1/c1_LIR.hpp ! src/hotspot/share/c1/c1_LIRAssembler.cpp ! src/hotspot/share/c1/c1_LIRAssembler.hpp ! src/hotspot/share/c1/c1_LinearScan.cpp ! src/hotspot/share/c1/c1_Runtime1.cpp ! src/hotspot/share/compiler/oopMap.cpp ! src/hotspot/share/compiler/oopMap.hpp ! src/hotspot/share/gc/z/zArguments.cpp ! src/hotspot/share/gc/z/zBarrier.cpp ! src/hotspot/share/gc/z/zBarrier.hpp ! src/hotspot/share/gc/z/zBarrier.inline.hpp ! src/hotspot/share/gc/z/zBarrierSet.cpp ! src/hotspot/share/gc/z/zDriver.cpp ! src/hotspot/share/gc/z/zHeapIterator.cpp ! src/hotspot/share/gc/z/zHeapIterator.hpp ! src/hotspot/share/gc/z/zMark.cpp ! src/hotspot/share/gc/z/zObjectAllocator.cpp ! src/hotspot/share/gc/z/zRootsIterator.cpp ! src/hotspot/share/gc/z/zRootsIterator.hpp + src/hotspot/share/gc/z/zStackWatermark.cpp + src/hotspot/share/gc/z/zStackWatermark.hpp ! src/hotspot/share/gc/z/zThreadLocalAllocBuffer.cpp ! src/hotspot/share/gc/z/zThreadLocalAllocBuffer.hpp ! src/hotspot/share/gc/z/z_globals.hpp ! src/hotspot/share/interpreter/bytecodeInterpreter.cpp ! src/hotspot/share/interpreter/interpreterRuntime.cpp ! src/hotspot/share/interpreter/interpreterRuntime.hpp ! src/hotspot/share/jfr/periodic/sampling/jfrCallTrace.cpp ! src/hotspot/share/jfr/periodic/sampling/jfrThreadSampler.cpp ! src/hotspot/share/jfr/recorder/stacktrace/jfrStackTrace.cpp ! src/hotspot/share/jvmci/jvmciCodeInstaller.cpp ! src/hotspot/share/logging/logTag.hpp ! src/hotspot/share/opto/compile.hpp ! src/hotspot/share/opto/output.cpp ! src/hotspot/share/opto/runtime.cpp + src/hotspot/share/opto/safepointPollStubTable.hpp ! src/hotspot/share/prims/forte.cpp ! src/hotspot/share/runtime/abstract_vm_version.hpp ! 
src/hotspot/share/runtime/deoptimization.cpp ! src/hotspot/share/runtime/frame.cpp ! src/hotspot/share/runtime/frame.hpp ! src/hotspot/share/runtime/handshake.cpp ! src/hotspot/share/runtime/interfaceSupport.inline.hpp ! src/hotspot/share/runtime/mutexLocker.cpp ! src/hotspot/share/runtime/objectMonitor.cpp ! src/hotspot/share/runtime/objectMonitor.inline.hpp ! src/hotspot/share/runtime/registerMap.hpp ! src/hotspot/share/runtime/safepoint.cpp ! src/hotspot/share/runtime/safepointMechanism.cpp ! src/hotspot/share/runtime/safepointMechanism.hpp ! src/hotspot/share/runtime/safepointMechanism.inline.hpp ! src/hotspot/share/runtime/serviceThread.cpp ! src/hotspot/share/runtime/serviceThread.hpp ! src/hotspot/share/runtime/sharedRuntime.cpp + src/hotspot/share/runtime/stackWatermark.cpp + src/hotspot/share/runtime/stackWatermark.hpp + src/hotspot/share/runtime/stackWatermark.inline.hpp + src/hotspot/share/runtime/stackWatermarkSet.cpp + src/hotspot/share/runtime/stackWatermarkSet.hpp + src/hotspot/share/runtime/stackWatermarkSet.inline.hpp ! src/hotspot/share/runtime/sweeper.cpp ! src/hotspot/share/runtime/synchronizer.cpp ! src/hotspot/share/runtime/thread.cpp ! src/hotspot/share/runtime/thread.hpp ! src/hotspot/share/runtime/thread.inline.hpp ! src/hotspot/share/runtime/vframe.cpp ! src/hotspot/share/runtime/vframe.hpp ! src/hotspot/share/runtime/vframe.inline.hpp ! src/hotspot/share/runtime/vmOperations.hpp

From sergeicelov at gmail.com Sat Mar 14 03:20:28 2020
From: sergeicelov at gmail.com (Sergey Tselovalnikov)
Date: Sat, 14 Mar 2020 14:20:28 +1100
Subject: Experience with ZGC
Message-ID:

Hi,

I met Chad (https://twitter.com/chadarimura) a few weeks ago at the UnVoxxed
Hawaii unconference and mentioned that we use ZGC at Canva, and he encouraged
me to share the details. So I wanted to share our experience here. I hope
sharing our success with ZGC can encourage other people to try it out.

At Canva, we use ZGC for our API Gateway (AFE for short). ZGC helped us
reduce GC pauses from around 30-50ms, with occasional spikes to hundreds of
ms, down to only 1-2ms pauses [0]. GC pauses used to cause issues with the
TCP backlog filling up, which would result in further queuing inside the app
and would require allocating more threads/connections to clear up the queue.
These two graphs show the difference we observed [1].

To give some background, AFE runs on a few dozen c5.large AWS instances. The
application runs on OpenJDK JDK 13 with a 1.5 GB max heap size and a stable
heap size around 400 MB. It uses Jetty with non-blocking APIs as a web
framework, and Finagle as an RPC framework. When fully warmed up, less than
10% of CPU time is spent in GC threads. Enabling ZGC didn't require any
special tuning; however, we increased the max heap size, which was previously
lower, following the recommendations [2].

There were a few issues that we faced:

* Occasional crashes prior to 13.0.2
Prior to JDK 13.0.2, we observed a number of crashes that would happen after
running the app for around 14 hours. The symptoms were very similar to the
ones in JDK-8230565. Looking at the crash logs, we found that the crashes
would happen when one of the application methods was being recompiled from
level 3 to level 4, so we had to mitigate this issue. However, after updating
to 13.0.2, we haven't seen them anymore.

* Occasional allocation stalls
We're still seeing occasional "Allocation Stall" events which are a bit
harder to debug.
It doesn't happen very often, and we're still collecting data, but it seems
that at least in some cases it's preceded by a number of "ICBufferFull"
safepoints.

* The results depend on the load profile
ZGC worked really well for us for AFE, for which the workload consists of a
large number of requests flowing through at a relatively steady rate.
However, we didn't have much success trying to apply it to some of the
services with a bit spikier workloads. For such workloads, ZGC resulted in a
large number of allocation stalls, after which the app wasn't able to keep
up with the incoming requests anymore.

Thanks for working on this awesome GC!

[0] https://user-images.githubusercontent.com/1780970/76673536-116fd000-659e-11ea-8832-4aefa06f02b2.png
[1] https://user-images.githubusercontent.com/1780970/76673708-93acc400-659f-11ea-903e-0a9d50ef154d.png
[2] https://wiki.openjdk.java.net/display/zgc/Main#Main-SettingHeapSize

--
Cheers,
Sergey Tselovalnikov

From erik.osterlund at oracle.com Mon Mar 16 20:40:57 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 16 Mar 2020 21:40:57 +0100
Subject: Experience with ZGC
In-Reply-To:
References:
Message-ID: <1fabb4c8-a4b1-31ea-9d14-72b105a0acf0@oracle.com>

Hi Sergey,

Thank you for sharing your experience using ZGC. I am glad to hear that you
like it in general.

As for the more spiky workloads, it is possible to tame the GC by tuning two
knobs:

1) -XX:ZAllocationSpikeTolerance

This flag sets a factor of how much we can expect the allocation rate to
fluctuate. The default is 2. Higher values will trigger GC earlier,
anticipating that allocation rates will spike more.

2) -XX:SoftMaxHeapSize

The GC will try to keep the heap below this size. So by setting it lower
than the max heap size, you can accommodate spikier allocation rates and
heap residency better.
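As an illustration only (the numbers below are made up for a 1.5g heap like
yours, not a recommendation):

  java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
       -Xmx1536m -XX:SoftMaxHeapSize=1g -XX:ZAllocationSpikeTolerance=5 ...

This would make ZGC start cycles early enough to try to stay below 1g, and
assume the allocation rate can spike to 5x the recently measured rate when
deciding when to start the next cycle.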
I hope this helps you. Having said that, I would love for the defaults to be
able to catch such issues better automatically, so if there is any way I
could try one of the spiky workloads, I would be delighted to have a closer
look at it.

Thanks,
/Erik

On 2020-03-14 04:20, Sergey Tselovalnikov wrote:
> [...]
> * The results depend on the load profile
> ZGC worked really well for us for AFE, for which the workload consists of
> a large number of requests flowing through at a relatively steady rate.
> However, we didn't have much success trying to apply it to some of the
> services with a bit spikier workloads. For such workloads, ZGC resulted in
> a large number of allocation stalls, after which the app wasn't able to
> keep up with the incoming requests anymore.
> [...]

From erik.osterlund at oracle.com Wed Mar 18 10:29:08 2020
From: erik.osterlund at oracle.com (erik_osterlund)
Date: Wed, 18 Mar 2020 10:29:08 GMT
Subject: git: openjdk/zgc: ZGC: Add support for ZVerifyRoots
Message-ID:

Changeset: 074fb856
Author: erik_osterlund
Date: 2020-03-16 15:42:39 +0000
URL: https://git.openjdk.java.net/zgc/commit/074fb856

ZGC: Add support for ZVerifyRoots

! src/hotspot/share/gc/z/zVerify.cpp
! src/hotspot/share/gc/z/zVerify.hpp
! src/hotspot/share/runtime/stackWatermark.cpp
! src/hotspot/share/runtime/stackWatermark.hpp
! src/hotspot/share/runtime/stackWatermark.inline.hpp

From raell at web.de Thu Mar 19 00:11:03 2020
From: raell at web.de (raell at web.de)
Date: Thu, 19 Mar 2020 01:11:03 +0100
Subject: Are good colors more common than bad colors?
Message-ID:

Dear all,

on slide 35 of the presentation about ZGC by Per Lidén and Stefan Karlsson
[1] it is stated that most object references will have the good color. I
tried to analyze the probability that a non-root reference that is loaded
for the first time has the good color:

A complete cycle has three phases:

1. remapping/marking
2. relocation
3. no GC action (till the next marking/remapping starts)

In phase 1, 'good' means that the right marked bit is set. At the beginning
of the phase all non-root references are bad; at the end of the phase all
are good. So, if a non-root reference is selected randomly, it is good with
a probability of about 50%.

In phases 2 and 3, 'good' means that the remapped bit is set. Since in these
phases no remapping is done (except by the load barrier), all non-root
references are bad. So, if a non-root reference is selected randomly, it is
good with a probability of 0%.

Altogether, it seems to me that in most parts of a cycle a non-root
reference will have the bad color.

Of course, I may be missing something.
Therefore, I would be interested in an argument why most object references
are expected to have a good color.

Thank you very much!

Ralph

[1] http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf

From per.liden at oracle.com Thu Mar 19 07:34:52 2020
From: per.liden at oracle.com (Per Liden)
Date: Thu, 19 Mar 2020 08:34:52 +0100
Subject: Are good colors more common than bad colors?
In-Reply-To:
References:
Message-ID: <619dea79-1e23-b0a0-1356-bd4c87961bb4@oracle.com>

Hi,

On 3/19/20 1:11 AM, raell at web.de wrote:
> [...]
> Altogether, it seems to me that in most parts of a cycle a non-root
> reference will have the bad color.
>
> Of course, I may be missing something. Therefore, I would be interested
> in an argument why most object references are expected to have a good
> color.

Your observations are correct. However, applications typically don't load
randomly selected references. They tend to load a much smaller subset of
references over and over again, and a reference will only be bad the first
time it's loaded (within the same phase). From our measurements, the
chances of loading a bad pointer (i.e. the load barrier taking the slow
path) are on the order of 1 in a million.

cheers,
Per

From gil at azul.com Thu Mar 19 15:45:04 2020
From: gil at azul.com (Gil Tene)
Date: Thu, 19 Mar 2020 15:45:04 +0000
Subject: Are good colors more common than bad colors?
In-Reply-To: <619dea79-1e23-b0a0-1356-bd4c87961bb4@oracle.com>
References: , <619dea79-1e23-b0a0-1356-bd4c87961bb4@oracle.com>
Message-ID: <4AB1DD2D-29B3-422E-9945-261C2470F4D2@azul.com>

The dramatic (multi-order-of-magnitude) impact of self healing on barrier
triggering likelihood, and on the overall cost of self-healing read barriers
compared to traditional non-self-healing read barriers (such as Brooks and
Baker style barriers, which do not self heal and keep triggering during GC
activity), is highlighted in section 2.2 of [1]. That section explains the
fundamental barrier qualities that enable self healing: the semantic
placement of the barrier between the load of a reference value and the use
of that loaded (reference) value, giving the barrier access to the address
that the reference value is loaded from. The introduction of self healing
completely changed the effective costs of read barrier schemes on GC'ed
workloads.
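To make the mechanics concrete, here is a minimal Java-flavored sketch of
such a loaded-value barrier. It is only an illustration of the idea, not
actual collector code: isBad() and remapOrMark() are hypothetical stand-ins
for the color check and the remap/mark slow path, and real collectors emit
this in compiled code and heal the slot with a CAS rather than a plain store:

  class LvbSketch {
      // Fast path: load, check color, return. Slow path: fix the reference
      // AND the slot it was loaded from, so later loads take the fast path.
      static Object lvbLoad(Object[] slots, int index) {
          Object ref = slots[index];     // load the reference value
          if (isBad(ref)) {              // rare: wrong color for this phase
              ref = remapOrMark(ref);    // obtain the good-colored reference
              slots[index] = ref;        // self-heal the source slot
          }
          return ref;                    // mutator only ever sees good refs
      }

      static boolean isBad(Object ref) { return false; }    // placeholder
      static Object remapOrMark(Object ref) { return ref; } // placeholder
  }

Because the barrier sits between the load of the reference value and its
use, it knows the address the value came from, which is exactly what makes
the one-time heal possible.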
From gil at azul.com  Thu Mar 19 15:45:04 2020
From: gil at azul.com (Gil Tene)
Date: Thu, 19 Mar 2020 15:45:04 +0000
Subject: Are good colors more common than bad colors?
In-Reply-To: <619dea79-1e23-b0a0-1356-bd4c87961bb4@oracle.com>
References: , <619dea79-1e23-b0a0-1356-bd4c87961bb4@oracle.com>
Message-ID: <4AB1DD2D-29B3-422E-9945-261C2470F4D2@azul.com>

The dramatic (multi-order-of-magnitude) impact of self healing on barrier
triggering likelihood, and on the overall costs of self-healing read
barriers compared to traditional non-self-healing read barriers (such as
Brooks and Baker style barriers, which do not self heal and keep triggering
during GC activity), is highlighted in section 2.2 of [1]. That section
explains the fundamental barrier qualities that enable self healing: the
semantic placement of the barrier between the load of a reference value and
the use of that loaded (reference) value, giving the barrier access to the
address that the reference value is loaded from.

The introduction of self healing completely changed the effective costs of
read barrier schemes on GC'ed workloads. The change was profound enough that
prior work surveying the costs of read barrier schemes (usually showing read
barrier costs to be prohibitive) no longer applied to modern GC
implementations that employ self healing read barriers. With all modern
concurrent collectors seemingly coalescing on self healing read barriers
(first Pauseless & C4, then ZGC, then the latest variant of Shenandoah), we
can probably start referring to collectors that do not employ self healing
read barriers as "legacy collectors".

A lower bound on the probability of a mutator NOT loading a
barrier-triggering reference [a "bad color" in the terminology used below]
in a self healing loaded (reference) value barrier (LVB) based system can be
trivially approximated as:

1 - ( RefsInLiveSet / (RefsAccessedPerSecond / GCCycleFrequency) )

RefsInLiveSet = number of references in the live set
RefsAccessedPerSecond = number of references accessed per second by the mutator
GCCycleFrequency = frequency (in cycles per second) of GC cycles

Since reference loads are extremely common operations in Java execution,
this number tends to have many 9s in it.

An even tighter (higher) lower bound approximation can be expressed as:

1 - ( (RefsInLiveSet - RefsHealedByCollector) / (RefsAccessedPerSecond / GCCycleFrequency) )

RefsHealedByCollector = number of references healed by the collector (as
opposed to the mutator)

Note: In C4/ZGC, this healing-by-the-collector is done in the remap/fixup as
part of the next mark phase. In Shenandoah 2.x (2.0 with the new self
healing "LRB"), it is done in a separate pass.

In workloads where a significant portion of the references in the heap are
rarely loaded by mutators (to the point where they are often not loaded
between two consecutive GC cycles), this number has even more 9s in it...

[1] C4: The Continuously Concurrent Compacting Collector
http://paperhub.s3.amazonaws.com/d14661878f7811e5ee9c43de88414e86.pdf

Sent from my iPad

On Mar 19, 2020, at 12:36 AM, Per Liden wrote:

> Hi,
>
> [...]
>
> Your observations are correct. However, applications typically don't load
> randomly selected references. They tend to load a much smaller subset of
> references over and over again, and a reference will only be bad the first
> time it's loaded (within the same phase). From our measurements, the
> chances of loading a bad pointer (i.e. the load barrier taking the slow
> path) are on the order of 1 in a million.
>
> cheers,
> Per
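[A quick worked example of Gil's first bound, with made-up numbers chosen
only for illustration; none of these values come from the thread:]

#include <cstdio>

int main() {
  // Illustrative inputs (assumptions, not measurements):
  double refsInLiveSet      = 1e8;  // 100M references in the live set
  double refsAccessedPerSec = 1e9;  // 1G reference loads per second
  double gcCycleFrequency   = 0.1;  // one GC cycle every 10 seconds

  // References accessed per GC cycle:
  double refsPerCycle = refsAccessedPerSec / gcCycleFrequency;  // 1e10

  // Lower bound on P(loaded reference has the good color):
  double pGood = 1.0 - refsInLiveSet / refsPerCycle;            // 0.99

  std::printf("lower bound = %.4f\n", pGood);
  return 0;
}

With these numbers the bound has "only" two 9s; a hotter access rate or a
smaller live set adds 9s, which is also the direction Gil's second, tighter
bound pushes.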
From conniall at amazon.com  Thu Mar 19 17:28:13 2020
From: conniall at amazon.com (Connaughton, Niall)
Date: Thu, 19 Mar 2020 17:28:13 +0000
Subject: Experience with ZGC
In-Reply-To:
References:
Message-ID: <397D0AE1-6596-4317-92F8-029B6491A11C@amazon.com>

Hey Sergey, thanks for the interesting read. I'm interested to hear more
about how you are tracking the Allocation Stalls. Are you post-processing
the GC logs to look for stalls, or do you have some other mechanism?

On 3/13/20, 20:22, "zgc-dev on behalf of Sergey Tselovalnikov"
<zgc-dev-bounces at openjdk.java.net on behalf of sergeicelov at gmail.com> wrote:

    Hi,

    I met Chad (https://twitter.com/chadarimura) a few weeks ago at UnVoxxed
    Hawaii unconference and mentioned that we use ZGC at Canva, and he
    encouraged me to share the details. So I wanted to share our experience
    here. I hope sharing our success with ZGC can encourage other people to
    try it out.

    At Canva, we use ZGC for our API Gateway (further AFE for short). ZGC
    helped us to reduce GC pauses from around 30-50ms, with occasional spikes
    to hundreds of ms, down to only 1-2ms pauses [0]. GC pauses used to cause
    issues with the TCP backlog filling up, which would result in further
    queuing inside the app and would require allocating more
    threads/connections to clear up the queue. These two graphs show the
    difference we observed [1].

    To give some background, AFE runs on a few dozen c5.large AWS instances.
    The application runs on OpenJDK JDK 13 with 1.5 GB max heap size, and a
    stable heap size around 400 MB. It uses Jetty with non-blocking APIs as
    a web framework, and Finagle as an RPC framework. When fully warmed up,
    less than 10% of CPU time is spent in GC threads. Enabling ZGC didn't
    require any special tuning; however, we increased the max heap size,
    which was previously lower, following the recommendations [2].

    There were a few issues that we faced:

    * Occasional crashes prior to 13.0.2
    Prior to JDK 13.0.2, we observed a number of crashes that would happen
    after running the app for around 14 hours. The symptoms were very
    similar to the ones in JDK-8230565. Looking at the crash logs, we found
    that the crashes would happen when one of the application methods is
    being recompiled from level 3 to level 4, so we had to mitigate this
    issue. However, after updating to 13.0.2, we haven't seen them anymore.

    * Occasional allocation stalls
    We're still seeing occasional "Application Stall" events which are a bit
    harder to debug. It doesn't happen very often, and we're still
    collecting data, but it seems that at least in some cases it's preceded
    by a number of "ICBufferFull" safepoints.

    * The results depend on the load profile
    ZGC worked really well for us for AFE, for which the workload consists
    of a large number of requests flowing through at a relatively steady
    rate. However, we didn't have much success trying to apply it to some of
    the services with a bit spikier workloads. For such workloads, ZGC
    resulted in a large number of application stalls after which the app
    wasn't able to keep up with the incoming requests anymore.

    Thanks for working on this awesome GC!

    [0] https://user-images.githubusercontent.com/1780970/76673536-116fd000-659e-11ea-8832-4aefa06f02b2.png
    [1] https://user-images.githubusercontent.com/1780970/76673708-93acc400-659f-11ea-903e-0a9d50ef154d.png
    [2] https://wiki.openjdk.java.net/display/zgc/Main#Main-SettingHeapSize

    --
    Cheers,
    Sergey Tselovalnikov
From raell at web.de  Thu Mar 19 22:19:31 2020
From: raell at web.de (raell at web.de)
Date: Thu, 19 Mar 2020 23:19:31 +0100
Subject: Are good colors more common than bad colors?
In-Reply-To:
References:
Message-ID:

Per and Gil,

thank you very much for your explanations and for sharing your insights.
This is very helpful.

Regards
Ralph

From sergeicelov at gmail.com  Thu Mar 19 23:21:47 2020
From: sergeicelov at gmail.com (Sergey Tselovalnikov)
Date: Fri, 20 Mar 2020 10:21:47 +1100
Subject: Experience with ZGC
In-Reply-To: <397D0AE1-6596-4317-92F8-029B6491A11C@amazon.com>
References: <397D0AE1-6596-4317-92F8-029B6491A11C@amazon.com>
Message-ID:

Hey, Niall!

It's mostly looking at the GC logs post-factum. For instance, GC logs are
one of the few things to look at if the length of the request queue jumps.
Seeing this value in a real-time-ish way, e.g. by integrating with the
Datadog agent (which we use for monitoring), could be interesting, but not
very useful since it's not actionable.

On Fri, 20 Mar 2020 at 04:28, Connaughton, Niall wrote:

> Hey Sergey, thanks for the interesting read. I'm interested to hear more
> about how you are tracking the Allocation Stalls. Are you post-processing
> the GC logs to look for stalls, or do you have some other mechanism?
>
> [...]

--
Cheers,
Sergey Tselovalnikov
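[For anyone wanting to do the same kind of post-processing: ZGC reports
these events in the unified log, so a plain text search over the GC log is
usually enough. A minimal sketch, assuming the JVM ran with
-Xlog:gc*:file=gc.log:]

grep "Allocation Stall" gc.log

Each matching line names the stalled thread and the stall duration, which
makes it easy to correlate stalls with request-queue spikes after the fact.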
From sergeicelov at gmail.com  Thu Mar 19 23:34:45 2020
From: sergeicelov at gmail.com (Sergey Tselovalnikov)
Date: Fri, 20 Mar 2020 10:34:45 +1100
Subject: Experience with ZGC
In-Reply-To: <1fabb4c8-a4b1-31ea-9d14-72b105a0acf0@oracle.com>
References: <1fabb4c8-a4b1-31ea-9d14-72b105a0acf0@oracle.com>
Message-ID:

Hi, Erik!

Thank you, these knobs can come in really handy!

> if there is any way I could try one of the spiky workloads, I would be
> delighted to have a closer look at it.

I can't share them, unfortunately, but the idea is that some of the services
have scheduled tasks every N (typically 5-15) minutes. These tasks might be
CPU intensive with lots of allocation involved, for instance rebuilding
indexes. With GC, such tasks do not noticeably interfere with serving the
requests. With ZGC, such services fail to keep up with the incoming
requests, causing most of them to fail.

On Tue, 17 Mar 2020 at 07:41, Erik Österlund wrote:

> Hi Sergey,
>
> Thank you for sharing your experience using ZGC. I am glad to hear that
> you like it in general.
>
> As for the more spiky workloads, it is possible to tame the GC by tuning
> two knobs:
>
> 1) -XX:ZAllocationSpikeTolerance
> This flag sets a factor of how much we can expect the allocation rate to
> fluctuate. The default is 2. Higher values will trigger GC earlier,
> anticipating that allocation rates will spike more.
>
> 2) -XX:SoftMaxHeapSize
> The GC will try to keep the heap below this size. So by setting it lower
> than the MaxHeapSize, you can accommodate more spiky allocation rate and
> heap residency better.
>
> I hope this helps you. Having said that, I would love for the defaults to
> be able to catch such issues better automatically, so if there is any way
> I could try one of the spiky workloads, I would be delighted to have a
> closer look at it.
>
> Thanks,
> /Erik
>
> On 2020-03-14 04:20, Sergey Tselovalnikov wrote:
> > [...]

--
Cheers,
Sergey Tselovalnikov
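[Concretely, for readers who want to experiment with Erik's two knobs, a
command line could look like the one below. The heap sizes, the tolerance
factor and app.jar are illustrative assumptions, not recommendations from
this thread:]

java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -Xms4g -Xmx4g -XX:SoftMaxHeapSize=3g \
     -XX:ZAllocationSpikeTolerance=5 \
     -Xlog:gc*:file=gc.log \
     -jar app.jar

The intent of such a setup is that the collector starts cycles early enough
(soft max below the hard max, higher spike tolerance) to absorb periodic
allocation bursts without stalling allocating threads.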
From eosterlund at openjdk.java.net  Mon Mar 23 09:49:02 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Mon, 23 Mar 2020 09:49:02 GMT
Subject: git: openjdk/zgc: 2 new changesets
Message-ID:

Changeset: cda7af81
Author: Erik Österlund
Date: 2020-03-23 09:31:21 +0000
URL: https://git.openjdk.java.net/zgc/commit/cda7af81

ZGC: Assume nmethod and stack barriers are always available

! src/hotspot/share/gc/z/zBarrierSet.cpp
! src/hotspot/share/gc/z/zMark.cpp
! src/hotspot/share/gc/z/zNMethod.cpp
! src/hotspot/share/gc/z/zNMethod.hpp
! src/hotspot/share/gc/z/zRelocate.cpp
! src/hotspot/share/gc/z/zRootsIterator.cpp
! src/hotspot/share/gc/z/zRootsIterator.hpp
! src/hotspot/share/gc/z/zVerify.cpp

Changeset: 4f00c861
Author: Erik Österlund
Date: 2020-03-23 09:42:07 +0000
URL: https://git.openjdk.java.net/zgc/commit/4f00c861

ZGC: Remove VMThread root processing

! src/hotspot/share/gc/z/zRootsIterator.cpp
! src/hotspot/share/gc/z/zRootsIterator.hpp

From eosterlund at openjdk.java.net  Mon Mar 23 11:49:12 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Mon, 23 Mar 2020 11:49:12 GMT
Subject: git: openjdk/zgc: 2 new changesets
Message-ID:

Changeset: 0d11461f
Author: Erik Österlund
Date: 2020-03-23 10:50:52 +0000
URL: https://git.openjdk.java.net/zgc/commit/0d11461f

ZGC: Remove do_frames default parameter on Thread::oops_do

! src/hotspot/share/gc/parallel/psParallelCompact.cpp
! src/hotspot/share/gc/parallel/psScavenge.cpp
! src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.inline.hpp
! src/hotspot/share/gc/z/zHeapIterator.cpp
! src/hotspot/share/jfr/leakprofiler/checkpoint/rootResolver.cpp
! src/hotspot/share/runtime/serviceThread.hpp
! src/hotspot/share/runtime/thread.cpp
! src/hotspot/share/runtime/thread.hpp
! src/hotspot/share/runtime/vmThread.cpp

Changeset: 72d6d21e
Author: Erik Österlund
Date: 2020-03-23 11:11:38 +0000
URL: https://git.openjdk.java.net/zgc/commit/72d6d21e

ZGC: Remove default parameters on frame iterators

! src/hotspot/share/jfr/leakprofiler/checkpoint/rootResolver.cpp
! src/hotspot/share/jvmci/jvmciCompilerToVM.cpp
! src/hotspot/share/prims/whitebox.cpp
! src/hotspot/share/runtime/deoptimization.cpp
! src/hotspot/share/runtime/frame.hpp
! src/hotspot/share/runtime/interfaceSupport.cpp
! src/hotspot/share/runtime/thread.cpp
! src/hotspot/share/runtime/vframe.cpp
! src/hotspot/share/runtime/vframe.hpp
! src/hotspot/share/runtime/vframe.inline.hpp
! src/hotspot/share/runtime/vmOperations.cpp
! src/hotspot/share/utilities/vmError.cpp

From per.liden at oracle.com  Mon Mar 23 13:28:40 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 23 Mar 2020 14:28:40 +0100
Subject: ZGC | What's new in JDK 14
Message-ID: <5a4c52e9-5226-0344-42ed-226dede1aadf@oracle.com>

I wrote a blog post highlighting some of the more important and interesting
enhancements we did to ZGC in JDK 14.

https://malloc.se/blog/zgc-jdk14

cheers,
Per

From thomas.stuefe at gmail.com  Wed Mar 25 07:14:18 2020
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Wed, 25 Mar 2020 08:14:18 +0100
Subject: gc/z/TestUncommit times out
Message-ID:

Hi,

on some of my machines (all not that slow) gc/z/TestUncommit keeps timing
out. It needs about 3-4 minutes to run through successfully, which means the
default 120 seconds are too short.

Is that normal? I ran into this testing our new Metaspace prototype, but the
timeouts happen in the stock VM too (jdk15 tip).

If that is normal, would it make sense to adjust the test to not hit the
timeout on standard machines?

Cheers, Thomas

From per.liden at oracle.com  Wed Mar 25 08:35:36 2020
From: per.liden at oracle.com (Per Liden)
Date: Wed, 25 Mar 2020 09:35:36 +0100
Subject: gc/z/TestUncommit times out
In-Reply-To:
References:
Message-ID:

Hi Thomas,

On 3/25/20 8:14 AM, Thomas Stüfe wrote:
> Hi,
>
> on some of my machines (all not that slow) gc/z/TestUncommit keeps timing
> out. It needs about 3-4 minutes to run through successfully, which means
> the default 120 seconds are too short.
>
> Is that normal? I ran into this testing our new Metaspace prototype, but
> the timeouts happen in the stock VM too (jdk15 tip).
>
> If that is normal, would it make sense to adjust the test to not hit the
> timeout on standard machines?

Yes, that test takes 4 minutes or so in total, but each @run should be less
than 120 seconds. I've never seen this time out in our testing, but I can
see that the first @run (which does 3 iterations) is on the border. We could
lower that to 2 iterations to play it safe. I assume that would help you?

--- a/test/hotspot/jtreg/gc/z/TestUncommit.java
+++ b/test/hotspot/jtreg/gc/z/TestUncommit.java
@@ -27,7 +27,7 @@
  * @test TestUncommit
  * @requires vm.gc.Z & !vm.graal.enabled & vm.compMode != "Xcomp"
  * @summary Test ZGC uncommit unused memory
- * @run main/othervm -XX:+UseZGC -Xlog:gc*,gc+stats=off -Xms128M -Xmx512M -XX:ZUncommitDelay=10 gc.z.TestUncommit true 3
+ * @run main/othervm -XX:+UseZGC -Xlog:gc*,gc+stats=off -Xms128M -Xmx512M -XX:ZUncommitDelay=10 gc.z.TestUncommit true 2
  * @run main/othervm -XX:+UseZGC -Xlog:gc*,gc+stats=off -Xms512M -Xmx512M -XX:ZUncommitDelay=10 gc.z.TestUncommit false 1
  * @run main/othervm -XX:+UseZGC -Xlog:gc*,gc+stats=off -Xms128M -Xmx512M -XX:ZUncommitDelay=10 -XX:-ZUncommit gc.z.TestUncommit false 1
  */

cheers,
Per

From thomas.stuefe at gmail.com  Wed Mar 25 09:38:22 2020
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Wed, 25 Mar 2020 10:38:22 +0100
Subject: gc/z/TestUncommit times out
In-Reply-To:
References:
Message-ID:

Thank you Per, yes, this helps.

Cheers, Thomas

On Wed, Mar 25, 2020 at 9:35 AM Per Liden wrote:

> Hi Thomas,
>
> [...]
>
> We could lower that to 2 iterations to play it safe. I assume that would
> help you?
>
> cheers,
> Per

From per.liden at oracle.com  Wed Mar 25 12:01:28 2020
From: per.liden at oracle.com (Per Liden)
Date: Wed, 25 Mar 2020 13:01:28 +0100
Subject: gc/z/TestUncommit times out
In-Reply-To:
References:
Message-ID: <12909ea1-8090-9cd4-5bc3-af4e72b123a9@oracle.com>

Ok, good! I'll file a bug and post a patch for review on hotspot-gc-dev.

/Per

On 3/25/20 10:38 AM, Thomas Stüfe wrote:
> Thank you Per, yes, this helps.
>
> [...]
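[Side note for anyone hitting this timeout before a fix lands: jtreg can
also scale all test timeouts from the command line. A sketch, assuming jtreg
is run directly against the jdk tree:]

jtreg -timeoutFactor:4 test/hotspot/jtreg/gc/z/TestUncommit.java

This multiplies the default 120-second action timeout rather than changing
the test itself, which can be handy on slower or heavily loaded machines.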
From stumon01 at arm.com  Thu Mar 26 22:42:08 2020
From: stumon01 at arm.com (Stuart Monteith)
Date: Thu, 26 Mar 2020 22:42:08 +0000
Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
Message-ID: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com>

Hello,
Please review this change to implement nmethod entry barriers on
aarch64, and hence concurrent class unloading with ZGC. Shenandoah will
need to be separately tested and enabled - there are problems with this
on Shenandoah.

It has been tested with JTreg, runs with SPECjbb, gcbench, and Lucene as
well as Netbeans.

In terms of interesting features:
With nmethod entry barriers, immediate oops are removed by
LIR_Assembler::jobject2reg and MacroAssembler::movoop. This is to ensure
consistency with the entry barrier, as with an immediate we'd otherwise
need an ISB.

I've added "-XX:DeoptNMethodBarrierALot". I found this functionality
useful in testing as deoptimisation is very infrequent. I've written it
as an atomic to avoid it happening too frequently. As it is a new
option, I'm not sure whether any more is needed than this review. A new
test has been added,
"test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java", to
test GC with that option enabled.

BarrierSetAssembler::nmethod_entry_barrier
This method emits the barrier code. In internal review it was suggested
the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not
done this as the BarrierSetNMethod code checks the exact instruction
sequence, and I prefer to be explicit.

Benchmarking method entry shows an increase of around 6ns with the
nmethod entry barrier.

The deoptimisation code was contributed by Andrew Haley.

The bug:
https://bugs.openjdk.java.net/browse/JDK-8216557

The webrev:
http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/

BR,
Stuart

From erik.osterlund at oracle.com  Fri Mar 27 09:47:41 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 27 Mar 2020 10:47:41 +0100
Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
In-Reply-To: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com>
References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com>
Message-ID: <8f317840-a2b2-3ccb-fbb2-a38b2ebcbf4b@oracle.com>

Hi Stuart,

Thanks for sorting this out on AArch64. It is nice to see that you can
implement these barriers on platforms that do not have instruction cache
coherency.

One small change request: It looks like in C1 you inject the entry barrier
right after build_frame is done:

 629       build_frame();
 630       {
 631         // Insert nmethod entry barrier into frame.
 632         BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler();
 633         bs->nmethod_entry_barrier(_masm);
 634       }

Unfortunately, this is in the platform-independent part of the LIR
assembler. In the x86 version we inject it at the very end of build_frame()
instead, which is a platform-specific function. The platform-specific
function is in the C1 macro assembler file for that platform. We
intentionally put it in the platform-specific path as it is a
platform-specific feature. Now on x86, the barrier code will be emitted once
in build_frame() and once after returning from build_frame, resulting in two
nmethod entry barriers, and only the first one will get patched, causing the
second one to mostly take slow paths, which isn't necessarily wrong, but
will cause regressions. I would propose you just move those lines into the
very end of the AArch64-specific part of build_frame(). I don't need to see
another webrev for that trivial code motion. This looks good to me.

Again, thanks a lot for fixing this! It will allow me to go forward with
concurrent stack scanning on AArch64 as well.

Thanks,
/Erik

On 2020-03-26 23:42, Stuart Monteith wrote:
> Hello,
> Please review this change to implement nmethod entry barriers on
> aarch64, and hence concurrent class unloading with ZGC.
>
> [...]

From per.liden at oracle.com  Fri Mar 27 11:36:37 2020
From: per.liden at oracle.com (Per Liden)
Date: Fri, 27 Mar 2020 12:36:37 +0100
Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
In-Reply-To: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com>
References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com>
Message-ID: <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com>

Hi Stuart,

Awesome, thanks a lot for doing this!

On 3/26/20 11:42 PM, Stuart Monteith wrote:
> Hello,
> Please review this change to implement nmethod entry barriers on
> aarch64, and hence concurrent class unloading with ZGC.
>
> [...]

I'll leave the aarch64-specific part for others to review. I just have two
minor comments on the rest.

* May I suggest that we rename DeoptNMethodBarrierALot to
DeoptimizeNMethodBarriersALot, to better match -XX:DeoptimizeALot and
friends?

* The "counter" used should probably be an unsigned type, to avoid any
overflow UB. That variable could also move into the scope where it's used.
Like:

----------------------------------------------------------
diff --git a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp
--- a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp
+++ b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp
@@ -50,7 +50,6 @@
 int BarrierSetNMethod::nmethod_stub_entry_barrier(address* return_address_ptr) {
   address return_address = *return_address_ptr;
   CodeBlob* cb = CodeCache::find_blob(return_address);
-  static volatile int counter=0;

   assert(cb != NULL, "invariant");

@@ -67,8 +66,9 @@

   // Diagnostic option to force deoptimization 1 in 3 times. It is otherwise
   // a very rare event.
-  if (DeoptNMethodBarrierALot) {
-    if (Atomic::add(&counter, 1) % 3 == 0) {
+  if (DeoptimizeNMethodBarriersALot) {
+    static volatile uint32_t counter = 0;
+    if (Atomic::add(&counter, 1u) % 3 == 0) {
       may_enter = false;
     }
   }
diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp
--- a/src/hotspot/share/runtime/globals.hpp
+++ b/src/hotspot/share/runtime/globals.hpp
@@ -2489,7 +2489,7 @@
   product(bool, UseEmptySlotsInSupers, true,                                \
                 "Allow allocating fields in empty slots of super-classes") \
                                                                             \
-  diagnostic(bool, DeoptNMethodBarrierALot, false,                          \
+  diagnostic(bool, DeoptimizeNMethodBarriersALot, false,                    \
                 "Make nmethod barriers deoptimise a lot.")                  \

 // Interface macros
----------------------------------------------------------

* Instead of adding a new file for the test, we could just add a new
section in the existing test.

* The test also needs to supply -XX:+UnlockDiagnosticVMOptions. Like:

----------------------------------------------------------
diff --git a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
--- a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
+++ b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2016, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -35,6 +35,18 @@
  * @summary Stress ZGC
  * @run main/othervm/timeout=200 -Xlog:gc*=info -Xmx384m -server -XX:+UnlockExperimentalVMOptions -XX:+UseZGC gc.stress.gcbasher.TestGCBasherWithZ 120000
  */
+
+/*
+ * @test TestGCBasherDeoptWithZ
+ * @key gc stress
+ * @library /
+ * @requires vm.gc.Z
+ * @requires vm.flavor == "server" & !vm.emulatedClient & !vm.graal.enabled & vm.opt.ClassUnloading != false
+ * @summary Stress ZGC with nmethod barrier forced deoptimization enabled
+ * @run main/othervm/timeout=200 -Xlog:gc*,nmethod+barrier=trace -Xmx384m -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
+ *                               -XX:+DeoptimizeNMethodBarriersALot -XX:-Inline gc.stress.gcbasher.TestGCBasherWithZ 120000
+ */
+
 public class TestGCBasherWithZ {
     public static void main(String[] args) throws IOException {
         TestGCBasher.main(args);
----------------------------------------------------------

cheers,
Per

> [...]
>
> BR,
> Stuart
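[For readers who haven't seen nmethod entry barriers before, here is a rough
conceptual sketch in C++ of what the emitted check does. This is
illustrative only, not HotSpot's actual code; the names NMethodStub,
global_epoch and slow_path_fixup_and_disarm are invented:]

#include <atomic>

// Illustrative state (assumptions for this sketch):
static std::atomic<int> global_epoch{1};     // bumped by the GC each cycle

struct NMethodStub {
  std::atomic<int> guard{0};                 // per-nmethod, patched by the GC
};

static bool slow_path_fixup_and_disarm(NMethodStub* nm) {
  // Stub: a real slow path would fix up the nmethod's embedded oops for
  // the current cycle (or request deoptimization), then disarm the guard.
  nm->guard.store(global_epoch.load(), std::memory_order_release);
  return true;
}

// Conceptually emitted at the entry of every compiled method. Fast path:
// one load and compare. Slow path: taken once per nmethod per GC cycle.
bool nmethod_entry_barrier(NMethodStub* nm) {
  if (nm->guard.load(std::memory_order_acquire) ==
      global_epoch.load(std::memory_order_relaxed)) {
    return true;                             // disarmed: enter the method
  }
  return slow_path_fixup_and_disarm(nm);     // may refuse entry (deopt)
}

The acquire ordering on the guard load is the C++ analogue of the
dmb(ISHLD)/membar(LoadLoad) question discussed above: loads of the data the
guard protects must not be reordered ahead of the guard load itself.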
From per.liden at oracle.com  Fri Mar 27 11:59:42 2020
From: per.liden at oracle.com (Per Liden)
Date: Fri, 27 Mar 2020 12:59:42 +0100
Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
In-Reply-To: <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com>
References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com>
Message-ID:

Hi again,

On 3/27/20 12:36 PM, Per Liden wrote:
> [...]
>
> * The test also needs to supply -XX:+UnlockDiagnosticVMOptions.

Meh, forgot -XX:+UnlockDiagnosticVMOptions in my patch. Updated:

----------------------------------------------------------
diff --git a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
--- a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
+++ b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2016, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -35,6 +35,18 @@
  * @summary Stress ZGC
  * @run main/othervm/timeout=200 -Xlog:gc*=info -Xmx384m -server -XX:+UnlockExperimentalVMOptions -XX:+UseZGC gc.stress.gcbasher.TestGCBasherWithZ 120000
  */
+
+/*
+ * @test TestGCBasherDeoptWithZ
+ * @key gc stress
+ * @library /
+ * @requires vm.gc.Z
+ * @requires vm.flavor == "server" & !vm.emulatedClient & !vm.graal.enabled & vm.opt.ClassUnloading != false
+ * @summary Stress ZGC with nmethod barrier forced deoptimization enabled
+ * @run main/othervm/timeout=200 -Xlog:gc*,nmethod+barrier=trace -Xmx384m -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
+ *                               -XX:+UnlockDiagnosticVMOptions -XX:+DeoptimizeNMethodBarriersALot -XX:-Inline gc.stress.gcbasher.TestGCBasherWithZ 120000
+ */
+
 public class TestGCBasherWithZ {
     public static void main(String[] args) throws IOException {
         TestGCBasher.main(args);
----------------------------------------------------------

cheers,
Per

> [...]
> > In terms of interesting features: > With nmethod entry barriers, immediate oops are removed by: > LIR_Assembler::jobject2reg and MacroAssembler::movoop > This is to ensure consistency with the entry barrier, as otherwise with > an immediate we'd otherwise need an ISB. > > I've added "-XX:DeoptNMethodBarrierALot". I found this functionality > useful in testing as deoptimisation is very infrequent. I've written it > as an atomic to avoid it happening too frequently. As it is a new > option, I'm not sure whether any more is needed than this review. A new > test has been added > "test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java" to > test GC with that option enabled. > > BarrierSetAssembler::nmethod_entry_barrier > This method emits the barrier code. In internal review it was suggested > the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not > done this as the BarrierSetNMethod code checks the exact instruction > sequence, and I prefer to be explicit. > > Benchmarking method entry shows an increase of around 6ns with the > nmethod entry barrier. > > > The deoptimisation code was contributed by Andrew Haley. > > The bug: > https://bugs.openjdk.java.net/browse/JDK-8216557 > > The webrev: > http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/ > > > BR, > Stuart > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. > From stumon01 at arm.com Fri Mar 27 15:28:23 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Fri, 27 Mar 2020 15:28:23 +0000 Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <8e4978ac-e563-56df-72b9-81f37f8adc39@redhat.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <8e4978ac-e563-56df-72b9-81f37f8adc39@redhat.com> Message-ID: Hello Zhengyu, That is the same test I had trouble with. 
One of the stack traces I had is:

V  [libjvm.so+0x4dd538]  CompressedKlassPointers::decode_not_null(unsigned int)+0x70
V  [libjvm.so+0xb87130]  InterpreterRuntime::throw_ClassCastException(JavaThread*, oopDesc*)+0x148
j  java.lang.invoke.LambdaForm$MH.invoke(Ljava/lang/Object;I)Ljava/lang/Object;+1 java.base at 15-internal
J 426 c2 java.lang.invoke.Invokers$Holder.linkToTargetMethod(ILjava/lang/Object;)Ljava/lang/Object; java.base at 15-internal (9 bytes) @ 0x0000ffff978ecc24 [0x0000ffff978ecbc0+0x0000000000000064]
j  TestStringDedupStress.main([Ljava/lang/String;)V+162
v  ~StubRoutines::call_stub
V  [libjvm.so+0xb95328]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x6f8
V  [libjvm.so+0x12531d4]  invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*) [clone .isra.138]+0xd74
V  [libjvm.so+0x125380c]  Reflection::invoke_method(oop, Handle, objArrayHandle, Thread*)+0x1a4
V  [libjvm.so+0xd05280]  JVM_InvokeMethod+0x210
j  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0 java.base at 15-internal
j  jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100 java.base at 15-internal
j  jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6 java.base at 15-internal
j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+59 java.base at 15-internal
j  com.sun.javatest.regtest.agent.MainWrapper$MainThread.run()V+172
j  java.lang.Thread.run()V+11 java.base at 15-internal
v  ~StubRoutines::call_stub
V  [libjvm.so+0xb95328]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x6f8
V  [libjvm.so+0xb95784]  JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2ac
V  [libjvm.so+0xb95974]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0xac
V  [libjvm.so+0xcefce8]  thread_entry(JavaThread*, Thread*)+0x98
V  [libjvm.so+0x1507cc8]  JavaThread::thread_main_inner()+0x258
V  [libjvm.so+0x150fdac]  JavaThread::run()+0x27c
V  [libjvm.so+0x150d4a4]  Thread::call_run()+0x10c
V  [libjvm.so+0x115ff70]  thread_native_entry(Thread*)+0x120
C  [libpthread.so.0+0x8880]  start_thread+0x1a0

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  java.lang.invoke.LambdaForm$MH.invoke(Ljava/lang/Object;I)Ljava/lang/Object;+1 java.base at 15-internal
J 426 c2 java.lang.invoke.Invokers$Holder.linkToTargetMethod(ILjava/lang/Object;)Ljava/lang/Object; java.base at 15-internal (9 bytes) @ 0x0000ffff978ecc24 [0x0000ffff978ecbc0+0x0000000000000064]
j  TestStringDedupStress.main([Ljava/lang/String;)V+162
v  ~StubRoutines::call_stub
j  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0 java.base at 15-internal
j  jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100 java.base at 15-internal
j  jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6 java.base at 15-internal
j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+59 java.base at 15-internal
j  com.sun.javatest.regtest.agent.MainWrapper$MainThread.run()V+172
j  java.lang.Thread.run()V+11 java.base at 15-internal
v  ~StubRoutines::call_stub

There are variations on that theme, but that was one of the more common
ones.

Thanks,
Stuart

On 27/03/2020 14:01, Zhengyu Gu wrote:
> Hi Stuart,
>
> Great work!
>
> [...]
>
> I identified a problem that failed TestStringDedupStress.java, and I have
> a fix for it.
>
> Would you mind sharing what else failed with Shenandoah?
>
> Thanks,
>
> -Zhengyu

From zgu at redhat.com  Fri Mar 27 15:30:31 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 27 Mar 2020 11:30:31 -0400
Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
In-Reply-To:
References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <8e4978ac-e563-56df-72b9-81f37f8adc39@redhat.com>
Message-ID: <55a800b2-1564-1162-7177-485e674dc618@redhat.com>

Hi Stuart,

Yes, this is the same problem I saw. I filed JDK-8241765, will RFR once it
passes all tests.

Thanks,

-Zhengyu

On 3/27/20 11:28 AM, Stuart Monteith wrote:
> Hello Zhengyu,
> That is the same test I had trouble with.
>
> One of the stack traces I had is:
>
> [...]
>
> There are variations on that theme, but that was one of the more common ones.
>
> Thanks,
> Stuart
>
> On 27/03/2020 14:01, Zhengyu Gu wrote:
>> [...]

From stumon01 at arm.com  Fri Mar 27 15:32:53 2020
From: stumon01 at arm.com (Stuart Monteith)
Date: Fri, 27 Mar 2020 15:32:53 +0000
Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
In-Reply-To: <55a800b2-1564-1162-7177-485e674dc618@redhat.com>
References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <8e4978ac-e563-56df-72b9-81f37f8adc39@redhat.com> <55a800b2-1564-1162-7177-485e674dc618@redhat.com>
Message-ID: <40210d3e-15f5-02b9-9197-d35549d40cf7@arm.com>

Thanks, that's great. It's great we have two GCs able to exercise the new barrier.

On 27/03/2020 15:30, Zhengyu Gu wrote:
> Hi Stuart,
>
> Yes, this is the same problem I saw. I filed JDK-8241765, will RFR once it passes all tests.
>
> Thanks,
>
> -Zhengyu
>
> On 3/27/20 11:28 AM, Stuart Monteith wrote:
>> [...]
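For readers following along: the nmethod entry barrier at the centre of this
thread amounts to a guard value that compiled code checks on every method
entry, taking a slow path that heals the method's embedded references
whenever the guard is stale. The sketch below is a standalone model of that
idea, not HotSpot code - every name in it is invented, and the std::atomic
acquire load merely stands in for the dmb(ISHLD)/membar(LoadLoad) ordering
point debated elsewhere in the thread.

----------------------------------------------------------
#include <atomic>
#include <cstdio>

static std::atomic<int> gc_phase{1};   // bumped once per concurrent GC cycle

struct FakeNMethod {
  std::atomic<int> guard{0};           // phase this method was last healed in

  void heal() {
    // Slow path: the real VM fixes up the oops embedded in the method's
    // code here, then disarms the barrier for the current cycle.
    std::printf("slow path: healing method\n");
    guard.store(gc_phase.load(std::memory_order_relaxed),
                std::memory_order_release);
  }

  void enter() {
    // Fast path: a single load-and-compare on every method entry. The
    // acquire ordering keeps the method body's loads from being hoisted
    // above the guard check.
    if (guard.load(std::memory_order_acquire) !=
        gc_phase.load(std::memory_order_relaxed)) {
      heal();
    }
    // ... compiled method body would run here ...
  }
};

int main() {
  FakeNMethod m;
  m.enter();              // armed: takes the slow path once
  m.enter();              // disarmed: fast path only
  gc_phase.fetch_add(1);  // a new GC cycle re-arms every method
  m.enter();              // slow path again
}
----------------------------------------------------------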
From stumon01 at arm.com  Fri Mar 27 15:35:30 2020
From: stumon01 at arm.com (Stuart Monteith)
Date: Fri, 27 Mar 2020 15:35:30 +0000
Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
In-Reply-To: <105c4a4a-59c9-8095-6d45-642595f65539@redhat.com>
References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <105c4a4a-59c9-8095-6d45-642595f65539@redhat.com>
Message-ID: <9bd573db-1983-1dd3-dd11-f7803de3f851@arm.com>

Thanks Andrew, I'll change that round. The code verifying the barrier would
catch any change there anyway.

On 27/03/2020 12:36, Andrew Haley wrote:
> On 3/26/20 10:42 PM, Stuart Monteith wrote:
>>
>> BarrierSetAssembler::nmethod_entry_barrier
>> This method emits the barrier code. In internal review it was suggested
>> the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not
>> done this as the BarrierSetNMethod code checks the exact instruction
>> sequence, and I prefer to be explicit.
>
> I understand, but LoadLoad is the semantics you need, and it's more important
> to say that. The mere existence of verification code shouldn't determine
> how you express the runtime code.
>
> I'll do a thorough review later.

From stumon01 at arm.com  Fri Mar 27 23:12:14 2020
From: stumon01 at arm.com (Stuart Monteith)
Date: Fri, 27 Mar 2020 23:12:14 +0000
Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
In-Reply-To: <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com>
References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com>
Message-ID: 

Thanks Per,
That all makes sense - I've made those changes, they'll appear in the
next patch set.

On 27/03/2020 11:36, Per Liden wrote:
> Hi Stuart,
>
> Awesome, thanks a lot for doing this!
>
> On 3/26/20 11:42 PM, Stuart Monteith wrote:
>> [...]
>
> I'll leave the aarch64-specific part for others to review. I just have two minor comments on the rest.
>
> * May I suggest that we rename DeoptNMethodBarrierALot to DeoptimizeNMethodBarriersALot, to better match
> -XX:DeoptimizeALot and friends?
>
> * The "counter" used should probably be an unsigned type, to avoid any overflow UB. That variable could also move into
> the scope where it's used.
>
> Like:
>
> ----------------------------------------------------------
> diff --git a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp
> --- a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp
> +++ b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp
> @@ -50,7 +50,6 @@
>  int BarrierSetNMethod::nmethod_stub_entry_barrier(address* return_address_ptr) {
>    address return_address = *return_address_ptr;
>    CodeBlob* cb = CodeCache::find_blob(return_address);
> -  static volatile int counter=0;
>
>    assert(cb != NULL, "invariant");
>
> @@ -67,8 +66,9 @@
>
>    // Diagnostic option to force deoptimization 1 in 3 times. It is otherwise
>    // a very rare event.
> -  if (DeoptNMethodBarrierALot) {
> -    if (Atomic::add(&counter, 1) % 3 == 0) {
> +  if (DeoptimizeNMethodBarriersALot) {
> +    static volatile uint32_t counter = 0;
> +    if (Atomic::add(&counter, 1u) % 3 == 0) {
>        may_enter = false;
>      }
>    }
> diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp
> --- a/src/hotspot/share/runtime/globals.hpp
> +++ b/src/hotspot/share/runtime/globals.hpp
> @@ -2489,7 +2489,7 @@
>    product(bool, UseEmptySlotsInSupers, true,                            \
>           "Allow allocating fields in empty slots of super-classes")     \
>                                                                          \
> -  diagnostic(bool, DeoptNMethodBarrierALot, false,                      \
> +  diagnostic(bool, DeoptimizeNMethodBarriersALot, false,                \
>           "Make nmethod barriers deoptimise a lot.")                     \
>
>  // Interface macros
> ----------------------------------------------------------
>
> * Instead of adding a new file for the test, we could just add a new section in the existing test.
>
> * The test also needs to supply -XX:+UnlockDiagnosticVMOptions.
>
> Like:
>
> ----------------------------------------------------------
> diff --git a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
> --- a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
> +++ b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2016, 2020, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -35,6 +35,18 @@
>   * @summary Stress ZGC
>   * @run main/othervm/timeout=200 -Xlog:gc*=info -Xmx384m -server -XX:+UnlockExperimentalVMOptions -XX:+UseZGC gc.stress.gcbasher.TestGCBasherWithZ 120000
>   */
> +
> +/*
> + * @test TestGCBasherDeoptWithZ
> + * @key gc stress
> + * @library /
> + * @requires vm.gc.Z
> + * @requires vm.flavor == "server" & !vm.emulatedClient & !vm.graal.enabled & vm.opt.ClassUnloading != false
> + * @summary Stress ZGC with nmethod barrier forced deoptimization enabled
> + * @run main/othervm/timeout=200 -Xlog:gc*,nmethod+barrier=trace -Xmx384m -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+DeoptimizeNMethodBarriersALot -XX:-Inline gc.stress.gcbasher.TestGCBasherWithZ 120000
> + */
> +
>  public class TestGCBasherWithZ {
>      public static void main(String[] args) throws IOException {
>          TestGCBasher.main(args);
> ----------------------------------------------------------
>
> cheers,
> Per
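Per's point about overflow is easy to demonstrate outside HotSpot: signed
overflow is undefined behaviour in C++, while unsigned arithmetic wraps with
well-defined results, so a counter that may run indefinitely is safer as
uint32_t. Below is a standalone sketch of the 1-in-3 pattern from the diff
above, using std::atomic in place of HotSpot's Atomic (note that Atomic::add
returns the updated value while fetch_add returns the previous one, hence
the +1):

----------------------------------------------------------
#include <atomic>
#include <cstdint>

// Force the "deoptimize" answer roughly one call in three, as in the diff
// above. The counter is unsigned so that wrapping at 2^32 is well defined.
static bool should_force_deopt() {
  static std::atomic<uint32_t> counter{0};
  // fetch_add returns the previous value; add 1 to mirror Atomic::add,
  // which returns the updated value.
  return (counter.fetch_add(1u) + 1u) % 3 == 0;
}

int main() {
  int deopts = 0;
  for (int i = 0; i < 9; i++) {
    if (should_force_deopt()) deopts++;
  }
  return deopts == 3 ? 0 : 1;   // calls 3, 6 and 9 trigger it
}
----------------------------------------------------------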
From stumon01 at arm.com  Fri Mar 27 23:42:52 2020
From: stumon01 at arm.com (Stuart Monteith)
Date: Fri, 27 Mar 2020 23:42:52 +0000
Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
In-Reply-To: <8f317840-a2b2-3ccb-fbb2-a38b2ebcbf4b@oracle.com>
References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <8f317840-a2b2-3ccb-fbb2-a38b2ebcbf4b@oracle.com>
Message-ID: <64351542-2e88-b918-025d-74456d507d1a@arm.com>

Hi Erik,
I'm scratching my head a little as to why I ventured into
platform-independent code. Anyhow, I've moved the code back to where it
belongs, and that'll be in my next webrev.

Thanks,
Stuart

On 27/03/2020 09:47, Erik Österlund wrote:
> Hi Stuart,
>
> Thanks for sorting this out on AArch64. It is nice to see that you can implement these
> barriers on platforms that do not have instruction cache coherency.
>
> One small change request:
> It looks like in C1 you inject the entry barrier right after build_frame is done:
>
> 629 build_frame();
> 630 {
> 631   // Insert nmethod entry barrier into frame.
> 632   BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler();
> 633   bs->nmethod_entry_barrier(_masm);
> 634 }
>
> Unfortunately, this is in the platform-independent part of the LIR assembler. In the x86 version
> we inject it at the very end of build_frame() instead, which is a platform-specific function.
> The platform-specific function is in the C1 macro assembler file for that platform.
>
> We intentionally put it in the platform-specific path as it is a platform-specific feature.
> Now on x86, the barrier code will be emitted once in build_frame() and once after returning
> from build_frame, resulting in two nmethod entry barriers, and only the first one will get
> patched, causing the second one to mostly take slow paths, which isn't necessarily wrong,
> but will cause regressions.
>
> I would propose you just move those lines into the very end of the AArch64-specific part of
> build_frame().
>
> I don't need to see another webrev for that trivial code motion. This looks good to me.
> Again, thanks a lot for fixing this! It will allow me to go forward with concurrent stack
> scanning on AArch64 as well.
>
> Thanks,
> /Erik
>
> On 2020-03-26 23:42, Stuart Monteith wrote:
>> [...]
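The cost Erik describes - a second, never-patched barrier - can be modelled
in a few lines. In this toy version (invented names throughout, nothing from
HotSpot's actual data layout), only the barrier registered with the runtime
ever gets disarmed, so a duplicate copy fails its check and takes the slow
path on every entry:

----------------------------------------------------------
#include <cstdio>

// One guard word per emitted barrier.
struct Barrier { int guard = 0; };

static int current_phase = 1;

// The runtime disarms the one barrier it registered for the method, i.e.
// the copy emitted inside build_frame().
static void disarm(Barrier& registered) { registered.guard = current_phase; }

static bool takes_slow_path(const Barrier& b) { return b.guard != current_phase; }

int main() {
  Barrier in_build_frame;   // registered with the runtime
  Barrier duplicate;        // accidentally emitted a second time

  for (int entry = 0; entry < 3; entry++) {
    if (takes_slow_path(in_build_frame)) {
      disarm(in_build_frame);             // slow path once, then healed
    }
    if (takes_slow_path(duplicate)) {
      // Never disarmed: the slow path fires on every single entry.
      std::printf("entry %d: duplicate barrier fired\n", entry);
    }
  }
}
----------------------------------------------------------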
From blazember at gmail.com  Tue Mar 31 19:59:51 2020
From: blazember at gmail.com (Zoltan Baranyi)
Date: Tue, 31 Mar 2020 21:59:51 +0200
Subject: JVM stalls around uncommitting
Message-ID: <2000f65b-07a1-6b90-f065-ede64d9f9413@gmail.com>

Hi ZGC Team,

I run benchmarks against our application using ZGC on heaps at the
few-hundred-GB scale. In the beginning everything goes smoothly, but
eventually I experience very long JVM stalls, sometimes longer than one
minute. According to the JVM log, reaching safepoints occasionally takes a
very long time, matching the duration of the stalls I experience.

After a few iterations, I started looking at uncommitting and learned that
the way ZGC performs uncommitting - flushing the pages, punching holes,
removing blocks from the backing file - can be expensive [1] when
uncommitting tens or more than a hundred GB of memory. The trace-level
heap logs confirmed that uncommitting blocks of this size takes many
seconds. After disabling uncommitting, my benchmark runs without the huge
stalls and the overall experience with ZGC is quite good.

Since uncommitting is done asynchronously to the mutators, I expected it
not to interfere with them. My understanding is that flushing, bookkeeping
and uncommitting are done under a mutex [2], and contention on that can be
the source of the stalls I see, such as when there is a demand to commit
memory while uncommitting is taking place. Can you confirm whether the
above explanation makes sense to you? If so, is there a cure for this that
I couldn't find? Like a time bound, or a cap on the amount of memory that
can be uncommitted in one go.

This is an example log captured during a stall:

[1778,704s][info ][safepoint] Safepoint "ZMarkStart", Time since last: 34394880194 ns, Reaching safepoint: 247308 ns, At safepoint: 339634 ns, Total: 586942 ns
[1833,707s][trace][gc,heap ] Uncommitting memory: 459560M-459562M (2M)
[...]
[... zillions of continuous uncommitting log lines ...]
[...]
[1846,076s][trace][gc,heap ] Uncommitting memory: 84M-86M (2M)
[1846,076s][info ][gc,heap ] Capacity: 528596M(86%)->386072M(63%), Uncommitted: 142524M
[1846,076s][trace][gc,heap ] Uncommit Timeout: 1s
[1846,078s][info ][safepoint] Safepoint "Cleanup", Time since last: 18001682918 ns, Reaching safepoint: 49371131055 ns, At safepoint: 252559 ns, Total: 49371383614 ns

In the above case TTSP is 49s, while the uncommitting lines cover only
13s. The TTSP would indicate that the safepoint request was signaled at
1797s, but the log is empty between 1778s and 1833s. If my understanding
above is correct, could it be that waiting for the mutex, flushing, etc.
take that much time and are just not visible in the log? If needed, I can
dig out more details since I can reliably reproduce the stalls.

My environment is OpenJDK 14 running on Linux 5.2.9 with these arguments:
"-Xmx600G -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockExperimentalVMOptions
-XX:+UseZGC -XX:+UseNUMA -XX:+AlwaysPreTouch
-Xlog:gc,safepoint,gc+heap=trace:jvm.log".

Best regards,
Zoltan

[1] https://github.com/openjdk/zgc/blob/d90d2b1097a9de06d8b6e3e6f2f6bd4075471fa0/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp#L566-L573
[2] https://github.com/openjdk/zgc/blob/d90d2b1097a9de06d8b6e3e6f2f6bd4075471fa0/src/hotspot/share/gc/z/zPageAllocator.cpp#L685-L711
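A closing note on the mechanics referenced in [1]: on Linux, ZGC uncommits
by punching holes in its heap backing file so the kernel can reclaim the
pages, and the trace log above shows it doing so in 2M granules, which over
~140 GB becomes a long-running loop. Below is a standalone, Linux-only
sketch of that mechanism - not the JVM's code, and the file name is
invented. For tuning, recent JDKs expose the behaviour through
-XX:ZUncommit and -XX:ZUncommitDelay=<seconds>, so -XX:-ZUncommit is the
supported way to switch uncommitting off, as was done here.

----------------------------------------------------------
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
  // An anonymous memory-backed file standing in for ZGC's heap backing
  // file (Linux-specific; g++ defines _GNU_SOURCE by default).
  int fd = memfd_create("fake-zgc-heap", 0);
  if (fd < 0) { perror("memfd_create"); return 1; }

  const off_t size = 64L * 1024 * 1024;
  if (ftruncate(fd, size) != 0) { perror("ftruncate"); return 1; }

  // "Commit": map the file and touch every page so it is actually backed.
  char* mem = static_cast<char*>(
      mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
  if (mem == MAP_FAILED) { perror("mmap"); return 1; }
  std::memset(mem, 1, size);

  // "Uncommit": punch a hole so the kernel reclaims the backing pages.
  // ZGC does this in 2M granules; repeated over tens of GB it becomes the
  // long-running work visible in the trace log above.
  if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, size) != 0) {
    perror("fallocate");
    return 1;
  }
  std::printf("punched %ld bytes back to the kernel\n", (long)size);

  munmap(mem, size);
  close(fd);
  return 0;
}
----------------------------------------------------------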