Massive slowdown in regex processing while GC log fills with ICBufferFull messages

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Tue Sep 20 18:14:19 UTC 2022


Hi Paul,

What you are seeing may be related to unwanted nmethod sweeper activity.
When space in the code cache becomes tight, the sweeper may switch into 
aggressive mode and start invalidating nmethods, which triggers massive 
code recompilation. That in turn causes repeated inline cache 
invalidations and eventually leads to ICBufferFull-related GCs.
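
If you want to see this from the JVM itself, sweeper activity can also 
be logged through unified logging (this assumes the codecache/sweep log 
tags present in JDK 17; the log file name is arbitrary):

   -Xlog:codecache+sweep*=debug:file=sweeper.log:time,uptimemillis

Frequent, back-to-back sweep cycles during the slowdown would point in 
this direction.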

How to diagnose:
   - -XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:-DisplayVMOutput 
     -XX:LogFile=hotspot.log -XX:+PrintCompilation
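
For example, added to the existing java command line (the jar name 
below is only a placeholder):

$ java -XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput \
       -XX:-DisplayVMOutput -XX:LogFile=hotspot.log \
       -XX:+PrintCompilation -jar app.jar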

Things to watch for in the output:


   (1) continuous and repeated massive "made not entrant"/"made zombie" 
events coming from the sweeper thread;


   (2) multiple "made not compilable" events caused by continuous 
recompilation, e.g.:

        made not compilable on level 4  ...   decompile_count > PerMethodRecompilationCutoff
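
A quick way to count both kinds of events in the resulting hotspot.log 
is a plain grep (nothing JVM-specific here):

$ grep -c "made not entrant" hotspot.log
$ grep -c "made zombie" hotspot.log
$ grep "made not compilable" hotspot.log | grep -c PerMethodRecompilationCutoff

If those counts keep climbing while the slowdown is in progress, it 
points at continuous sweeping/recompilation.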



Also, sample code cache utilization using jcmd before and after the 
problem occurs:

$ jcmd <pid> Compiler.codecache
<pid>:
CodeHeap 'non-profiled nmethods': size=120028Kb used=9159Kb 
max_used=9159Kb free=110868Kb
  bounds [0x0000000018215000, 0x0000000018b15000, 0x000000001f74c000]
CodeHeap 'profiled nmethods': size=120028Kb used=23937Kb 
max_used=23937Kb free=96090Kb
  bounds [0x0000000010cde000, 0x000000001244e000, 0x0000000018215000]
CodeHeap 'non-nmethods': size=5704Kb used=1473Kb max_used=1514Kb free=4231Kb
  bounds [0x000000001074c000, 0x00000000109bc000, 0x0000000010cde000]
  total_blobs=10880 nmethods=10045 adapters=746
  compilation: enabled
               stopped_count=0, restarted_count=0
  full_count=0
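
To see how utilization evolves over time, the same command can simply 
be sampled in a loop (replace <pid> as before; the interval and output 
file are arbitrary):

$ while true; do date; jcmd <pid> Compiler.codecache; sleep 60; done \
      >> codecache-samples.log

In particular, watch whether "used" in the profiled/non-profiled heaps 
approaches the heap size and whether full_count starts incrementing.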



I see you specify -XX:ReservedCodeCacheSize=150M. Try increasing it to 
reduce sweeper activity.
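
For example (the exact value is just a starting point, not a tuned 
recommendation):

   -XX:ReservedCodeCacheSize=256M

Since -XX:+SegmentedCodeCache is also set, the larger reservation is 
split across the non-profiled, profiled and non-nmethod segments, so 
each heap gets more headroom before the sweeper has to get aggressive.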


Best regards,
Vladimir Ivanov

On 7/31/22 23:28, Paul Wagland wrote:
> Hi David,
> 
> Thanks for creating that issue!
> 
> As to when we first saw it: we first saw this in 17.0.1, but that was also the first version that we widely deployed; prior to that we were using OpenJ9 Java 11. When moving to Java 17, we also moved back to HotSpot.
> 
> Cheers,
> Paul
> 
>> On 1 Aug 2022, at 03:16, David Holmes <david.holmes at oracle.com> wrote:
>>
>> https://bugs.openjdk.org/browse/JDK-8291596
>>
>> David
>>
>> On 1/08/2022 10:58 am, David Holmes wrote:
>>> Hi Paul,
>>> Redirecting to hotspot-dev as this issue is not likely to get much visibility on jdk-updates-dev.
>>> Probably worth filing an issue in bugs.openjdk.org if you can.
>>> Cheers,
>>> David
>>> On 1/08/2022 10:37 am, Paul Wagland wrote:
>>>> Hi all,
>>>>
>>>> We have been running into a problem with Java 17.0.1, 17.0.2, and now 17.0.3.
>>>>
>>>> # java --version
>>>> openjdk 17.0.3 2022-04-19
>>>> OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
>>>> OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing)
>>>>
>>>>
>>>> The issue that we are seeing is very similar in behaviour to https://bugs.openjdk.org/browse/JDK-8277212, with a couple of notable exceptions:
>>>> 1. We are seeing this issue in 17.0.3, whereas JDK-8277212 was resolved in either 17.0.2 or 17.0.3-oracle.
>>>> 2. We are using G1GC, not ZGC
>>>>
>>>> Unfortunately, we don’t know how to reproduce this issue, we are seeing it happen about once a week across our fleet. We have about 300 instances of Java running various different workloads.
>>>>
>>>> What happens is that on an affected machine, after a period of time, CPU usage suddenly spikes and the gc logs start getting flooded with ICBufferFull messages, at a rate of about 430 per second. When we take thread dumps at that point, we find most threads very deep in a regex-processing stack; unfortunately we have one of these fairly central to the product, so this ends up affecting most threads. Almost everything that isn't regex processing operates at, or near to, full speed. This is one of the reasons this feels similar to JDK-8277212: that bug was mashing the VTABLES, and because of the way regexes work they appear to use a lot of VTABLE lookups, so any slowdown in that area is likely to affect them greatly. Note that this is pure supposition, and I would love to know if there is any way we can prove or disprove it, either via testing or additional logging.
>>>>
>>>> Unfortunately, as mentioned above, we are not able to reproduce this in our test environments, and we are only seeing this in customer environments, and then not with any clearly identifiable cause. We have seen the issue about a dozen times, over 17.0.1, 17.0.2, and recently, 17.0.3.
>>>>
>>>> We have done some performance testing, and during the slow-down we get, for our regex processing test:
>>>>
>>>> 10 iterations of testString 1 took 10700ms (1070004460ns/match) (matched=false)
>>>> 10 iterations of testString 2 took 8965ms (896560879ns/match) (matched=true)
>>>>
>>>> Normal timings looks like:
>>>> 18503 iterations of testString 1 took 1000ms (54047ns/match) (matched=false)
>>>> 40954 iterations of testString 2 took 1000ms (24417ns/match) (matched=true)
>>>>
>>>> So that represents a slowdown of roughly 20,000 times!
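>>>>
>>>> For reference, the timing test is essentially a timed loop with a minimum of 10 iterations. A simplified sketch follows; the real pattern and test strings come from the product, the ones below are only placeholders:
>>>>
>>>> import java.util.concurrent.TimeUnit;
>>>> import java.util.regex.Pattern;
>>>>
>>>> public class RegexTimingTest {
>>>>     // Placeholder pattern and input; the real ones come from the product.
>>>>     private static final Pattern PATTERN = Pattern.compile("^(a+b?)+c$");
>>>>     private static final String TEST_STRING = "aaaaaaaaaaaaaaaaaaaaaaac";
>>>>
>>>>     public static void main(String[] args) {
>>>>         long budget = TimeUnit.SECONDS.toNanos(1);  // aim for ~1 second of matching
>>>>         long start = System.nanoTime();
>>>>         long iterations = 0;
>>>>         boolean matched = false;
>>>>         // Keep matching until both the time budget and a minimum of
>>>>         // 10 iterations are reached, then report the average per match.
>>>>         while (iterations < 10 || System.nanoTime() - start < budget) {
>>>>             matched = PATTERN.matcher(TEST_STRING).matches();
>>>>             iterations++;
>>>>         }
>>>>         long elapsed = System.nanoTime() - start;
>>>>         System.out.printf("%d iterations took %dms (%dns/match) (matched=%b)%n",
>>>>                 iterations, TimeUnit.NANOSECONDS.toMillis(elapsed),
>>>>                 elapsed / iterations, matched);
>>>>     }
>>>> }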
>>>>
>>>> We have another test that is mostly map lookup heavy, and during the high CPU it has:
>>>> doMapOperations:2:31076ns
>>>> doMapOperations:3:16951ns
>>>> doMapOperations:4:14192ns
>>>>
>>>> While normally it has:
>>>> doMapOperations:2:27392ns
>>>> doMapOperations:3:7987ns
>>>> doMapOperations:4:6079ns
>>>>
>>>> So a slowdown of maybe a factor of 2, which is perfectly explainable by the very high CPU/system load.
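>>>>
>>>> The map test has the same timed-loop shape; the sketch below only illustrates the output format, the real operations come from the product:
>>>>
>>>> import java.util.HashMap;
>>>> import java.util.Map;
>>>>
>>>> public class MapTimingTest {
>>>>     public static void main(String[] args) {
>>>>         Map<String, Integer> map = new HashMap<>();
>>>>         for (int i = 0; i < 10_000; i++) {
>>>>             map.put("key-" + i, i);
>>>>         }
>>>>         long checksum = 0;
>>>>         // Time a small batch of lookups per "step" and print the elapsed
>>>>         // time in the doMapOperations:<step>:<ns> format used above.
>>>>         for (int step = 2; step <= 4; step++) {
>>>>             long start = System.nanoTime();
>>>>             for (int i = 0; i < 100; i++) {
>>>>                 checksum += map.get("key-" + ((i * step) % 10_000));
>>>>             }
>>>>             long elapsed = System.nanoTime() - start;
>>>>             System.out.println("doMapOperations:" + step + ":" + elapsed + "ns");
>>>>         }
>>>>         System.out.println("checksum=" + checksum);  // keep the lookups from being optimized away
>>>>     }
>>>> }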
>>>>
>>>> Our startup arguments are:
>>>>
>>>> JVM startup arguments:
>>>> -javaagent:/opt/jmx-exporter/jmx-exporter.jar=9404:/opt/jmx-exporter/config.yaml
>>>> --add-opens=java.base/java.lang=ALL-UNNAMED
>>>> --add-opens=java.base/java.io=ALL-UNNAMED
>>>> --add-opens=java.base/java.util=ALL-UNNAMED
>>>> --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
>>>> --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
>>>> -Dnop
>>>> -Djava.security.egd=file:/dev/./urandom
>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>> -XX:HeapDumpPath=/opt/…/logs
>>>> -Djava.awt.headless=true
>>>> -XX:+PrintFlagsFinal
>>>> -DRP_CONTEXTPATH=
>>>> -DRP_UNPACKWAR=true
>>>> -XX:FlightRecorderOptions=stackdepth=256
>>>> -XX:+UnlockDiagnosticVMOptions
>>>> -XX:+DebugNonSafepoints
>>>> -Xlog:gc*,safepoint=info:file=/opt/…/logs/verbosegc.log:time,uptimemillis:filecount=10,filesize=50M
>>>> -Xlog:class*=info:file=/opt/…/logs/classload.log:time,uptimemillis,level,tags:filecount=10,filesize=50M
>>>> -XX:+UseG1GC
>>>> -XX:G1PeriodicGCInterval=10000
>>>> -XX:+ParallelRefProcEnabled
>>>> -XX:MaxDirectMemorySize=25M
>>>> -XX:MaxMetaspaceSize=300M
>>>> -XX:MetaspaceSize=100M
>>>> -Xss1024K
>>>> -XX:ReservedCodeCacheSize=150M
>>>> -XX:+SegmentedCodeCache
>>>> -Xmx862m
>>>> -XX:MaxMetaspaceFreeRatio=20
>>>> -XX:MinMetaspaceFreeRatio=5
>>>> -XX:MaxHeapFreeRatio=40
>>>> -XX:MinHeapFreeRatio=5
>>>> -Djdk.nio.maxCachedBufferSize=8192
>>>> -XX:SoftMaxHeapSize=646m
>>>> -Dlog4j2.disableJmx=true
>>>> -Djdk.tls.ephemeralDHKeySize=2048
>>>> -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
>>>> -Dorg.apache.catalina.security.SecurityListener.UMASK=0027
>>>> -Dignore.endorsed.dirs=
>>>> -Dcatalina.base=/opt/…/server1
>>>> -Dcatalina.home=/opt/…/tomcat
>>>> -Djava.io.tmpdir=/opt/…/temp
>>>>
>>>> Can anyone think of anything we can do to ameliorate this problem, apart from not using regexes? We are moving away from them anyway, but I think they are the victim here, not the cause of the problem, since in normal use they are “fast enough”.
>>>>
>>>> If there is any information that I can add, please let me know!
>>>>
>>>> Cheers,
>>>> Paul
> 

