Massive slowdown in regex processing while GC log fills with ICBufferFull messages

David Holmes david.holmes at oracle.com
Mon Aug 1 01:16:54 UTC 2022


https://bugs.openjdk.org/browse/JDK-8291596

David

On 1/08/2022 10:58 am, David Holmes wrote:
> Hi Paul,
> 
> Redirecting to hotspot-dev as this issue is not likely to get much 
> visibility on jdk-updates-dev.
> 
> Probably worth filing an issue in bugs.openjdk.org if you can.
> 
> Cheers,
> David
> 
> 
> On 1/08/2022 10:37 am, Paul Wagland wrote:
>> Hi all,
>>
>> We have been running into a problem with Java 17.0.1, 17.0.2, and now 
>> 17.0.3.
>>
>> # java --version
>> openjdk 17.0.3 2022-04-19
>> OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
>> OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, 
>> sharing)
>>
>>
>> The issue that we are seeing is very similar in behaviour to 
>> https://bugs.openjdk.org/browse/JDK-8277212, with a couple of notable 
>> differences:
>> 1. We are seeing this issue in 17.0.3, whereas JDK-8277212 was 
>> resolved in either 17.0.2 or 17.0.3-oracle.
>> 2. We are using G1GC, not ZGC.
>>
>> Unfortunately, we don’t know how to reproduce this issue; we see it 
>> happen about once a week across our fleet of roughly 300 Java 
>> instances running a variety of different workloads.
>>
>> What happens is that on an affected machine, after a period of time, 
>> CPU usage suddenly spikes and the GC logs start getting flooded with 
>> ICBufferFull messages, at a rate of about 430 per second. When we 
>> take thread dumps at that point, we find most threads very deep in a 
>> regex processing stack; unfortunately we have one regex that is 
>> fairly central to the product, so this ends up affecting most 
>> threads. Almost everything that isn’t regex processing seems to 
>> operate at, or near to, full speed. This is one of the reasons this 
>> feels similar to JDK-8277212: that bug was mashing the vtables, and 
>> because of the way regexes work they appear to use a lot of vtable 
>> lookups, so any slowdown in that area is likely to affect them 
>> greatly. Note that this is pure supposition, and I would love to know 
>> if there is any way we can prove or disprove it, either via testing 
>> or additional logging.
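>>
>> (For what it’s worth, the kind of probe I imagine running is sketched 
>> below. This is illustrative only, not our actual code: it times the 
>> same work through a monomorphic and a megamorphic interface call 
>> site, on the theory that an inline-cache/vtable problem should hit 
>> the megamorphic loop much harder than the monomorphic one during an 
>> episode.)
>>
>> import java.util.function.IntUnaryOperator;
>>
>> public class DispatchProbe {
>>     // Five distinct lambda classes, so the call site in time() becomes megamorphic.
>>     static final IntUnaryOperator[] OPS = {
>>         x -> x + 1, x -> x + 2, x -> x + 3, x -> x + 4, x -> x + 5
>>     };
>>
>>     public static void main(String[] args) {
>>         long mono = time(new IntUnaryOperator[] { OPS[0] }); // one receiver type
>>         long mega = time(OPS);                               // five receiver types
>>         System.out.println("monomorphic: " + mono + "ns/call, megamorphic: " + mega + "ns/call");
>>     }
>>
>>     // Runs for about one second and returns the average cost of one interface call.
>>     static long time(IntUnaryOperator[] ops) {
>>         int sink = 0;
>>         long calls = 0;
>>         long start = System.nanoTime();
>>         while (System.nanoTime() - start < 1_000_000_000L) {
>>             for (IntUnaryOperator op : ops) {  // virtual/interface dispatch happens here
>>                 sink += op.applyAsInt(1);
>>                 calls++;
>>             }
>>         }
>>         if (sink == 42) System.out.println("(ignore)");  // keep 'sink' live
>>         return (System.nanoTime() - start) / Math.max(1, calls);
>>     }
>> }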
>>
>> Unfortunately, as mentioned above, we are not able to reproduce this 
>> in our test environments; we only see it in customer environments, 
>> and then without any clearly identifiable trigger. We have seen the 
>> issue about a dozen times, across 17.0.1, 17.0.2 and, recently, 
>> 17.0.3.
>>
>> We have done some performance testing; during the slowdown, our 
>> regex processing test reports:
>>
>> 10 iterations of testString 1 took 10700ms (1070004460ns/match) 
>> (matched=false)
>> 10 iterations of testString 2 took 8965ms (896560879ns/match) 
>> (matched=true)
>>
>> Normal timings look like:
>> 18503 iterations of testString 1 took 1000ms (54047ns/match) 
>> (matched=false)
>> 40954 iterations of testString 2 took 1000ms (24417ns/match) 
>> (matched=true)
>>
>> So that represents a slowdown of roughly 20,000 times!
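>>
>> (For context, a minimal sketch of the kind of harness that produces 
>> output in that shape is below; the pattern and inputs are 
>> placeholders, not our real ones, and the real test differs in 
>> details such as enforcing a minimum number of iterations.)
>>
>> import java.util.regex.Pattern;
>>
>> public class RegexProbe {
>>     static final Pattern PATTERN = Pattern.compile("(a|b)+c");  // placeholder pattern
>>
>>     // Runs matches against 'input' for about one second and reports the rate.
>>     static void probe(String label, String input) {
>>         long iterations = 0;
>>         boolean matched = false;
>>         long start = System.nanoTime();
>>         while (System.nanoTime() - start < 1_000_000_000L) {
>>             matched = PATTERN.matcher(input).matches();
>>             iterations++;
>>         }
>>         long elapsed = System.nanoTime() - start;
>>         System.out.println(iterations + " iterations of " + label + " took "
>>                 + (elapsed / 1_000_000) + "ms (" + (elapsed / iterations)
>>                 + "ns/match) (matched=" + matched + ")");
>>     }
>>
>>     public static void main(String[] args) {
>>         probe("testString 1", "abababab");   // placeholder inputs
>>         probe("testString 2", "ababababc");
>>     }
>> }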
>>
>> We have another test that is mostly map-lookup heavy, and during the 
>> high-CPU period it has:
>> doMapOperations:2:31076ns
>> doMapOperations:3:16951ns
>> doMapOperations:4:14192ns
>>
>> While normally it has:
>> doMapOperations:2:27392ns
>> doMapOperations:3:7987ns
>> doMapOperations:4:6079ns
>>
>> So a slowdown of maybe a factor of 2, which is perfectly explainable 
>> by the very high CPU/system load.
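>>
>> (Again purely for illustration, a minimal sketch of what a map-heavy 
>> probe in the same spirit could look like; the map contents and sizes 
>> below are made up for the example.)
>>
>> import java.util.HashMap;
>> import java.util.Map;
>>
>> public class MapProbe {
>>     public static void main(String[] args) {
>>         Map<Integer, String> map = new HashMap<>();
>>         for (int i = 0; i < 10_000; i++) {
>>             map.put(i, "value-" + i);
>>         }
>>         // Each round times one batch of lookups and prints the total in ns,
>>         // in the spirit of the doMapOperations:<round>:<ns> lines above.
>>         for (int round = 2; round <= 4; round++) {
>>             long start = System.nanoTime();
>>             int hits = 0;
>>             for (int i = 0; i < 1_000; i++) {
>>                 if (map.get(i) != null) hits++;
>>             }
>>             long elapsed = System.nanoTime() - start;
>>             System.out.println("doMapOperations:" + round + ":" + elapsed + "ns (hits=" + hits + ")");
>>         }
>>     }
>> }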
>>
>> Our JVM startup arguments are:
>>
>> -javaagent:/opt/jmx-exporter/jmx-exporter.jar=9404:/opt/jmx-exporter/config.yaml 
>>
>> --add-opens=java.base/java.lang=ALL-UNNAMED
>> --add-opens=java.base/java.io=ALL-UNNAMED
>> --add-opens=java.base/java.util=ALL-UNNAMED
>> --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
>> --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
>> -Dnop
>> -Djava.security.egd=file:/dev/./urandom
>> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=/opt/…/logs
>> -Djava.awt.headless=true
>> -XX:+PrintFlagsFinal
>> -DRP_CONTEXTPATH=
>> -DRP_UNPACKWAR=true
>> -XX:FlightRecorderOptions=stackdepth=256
>> -XX:+UnlockDiagnosticVMOptions
>> -XX:+DebugNonSafepoints
>> -Xlog:gc*,safepoint=info:file=/opt/…/logs/verbosegc.log:time,uptimemillis:filecount=10,filesize=50M 
>>
>> -Xlog:class*=info:file=/opt/…/logs/classload.log:time,uptimemillis,level,tags:filecount=10,filesize=50M 
>>
>> -XX:+UseG1GC
>> -XX:G1PeriodicGCInterval=10000
>> -XX:+ParallelRefProcEnabled
>> -XX:MaxDirectMemorySize=25M
>> -XX:MaxMetaspaceSize=300M
>> -XX:MetaspaceSize=100M
>> -Xss1024K
>> -XX:ReservedCodeCacheSize=150M
>> -XX:+SegmentedCodeCache
>> -Xmx862m
>> -XX:MaxMetaspaceFreeRatio=20
>> -XX:MinMetaspaceFreeRatio=5
>> -XX:MaxHeapFreeRatio=40
>> -XX:MinHeapFreeRatio=5
>> -Djdk.nio.maxCachedBufferSize=8192
>> -XX:SoftMaxHeapSize=646m
>> -Dlog4j2.disableJmx=true
>> -Djdk.tls.ephemeralDHKeySize=2048
>> -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
>> -Dorg.apache.catalina.security.SecurityListener.UMASK=0027
>> -Dignore.endorsed.dirs=
>> -Dcatalina.base=/opt/…/server1
>> -Dcatalina.home=/opt/…/tomcat
>> -Djava.io.tmpdir=/opt/…/temp
>>
>> Can anyone think of anything that we can do to ameliorate this 
>> problem, apart from not using regexes? We are moving in that 
>> direction, but I think the regexes are the victim here rather than 
>> the cause of the problem, since in normal use they are “fast enough”.
>>
>> If there is any information that I can add, please let me know!
>>
>> Cheers,
>> Paul

