Stack walking performance issue

Amir Hadadi amirhadadi at hotmail.com
Mon Mar 18 07:19:57 UTC 2019


It seems that some instances of the application are affected more than others, though all of them show an upward CPU trend.
These are the top stack traces for one of the worst instances, where stack walking now consumes more than 50% of CPU: https://gist.github.com/amirhadadi/31cf546c063fa351feefb918ba0ba8ed
________________________________
From: Amir Hadadi <amirhadadi at hotmail.com>
Sent: Sunday, March 17, 2019 11:18 PM
To: zgc-dev at openjdk.java.net
Subject: Re: Stack walking performance issue

Well, there's a different org.apache.logging.log4j.util.StackLocator class which is geared for Java 9: https://github.com/apache/logging-log4j2/blob/ef22be05ae037350836c1dfbaefa4a7560fbb1e8/log4j-api-java9/src/main/java/org/apache/logging/log4j/util/StackLocator.java

That one uses the Java 9 StackWalker API, is not deprecated, and is the one actually used in my case.
log4j is working fine, BTW.
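
For context, here is a minimal sketch of the kind of StackWalker lookup a caller-locating utility performs. It is not log4j's actual code: the class name StackWalkSketch and the method frameClassName are hypothetical, and the only JDK API used is java.lang.StackWalker (Java 9+). Each call walks the current thread's stack lazily and stops at the requested frame.

```java
import java.util.Optional;

// Hypothetical illustration only; not taken from log4j.
final class StackWalkSketch {

    // A walker with default options; it exposes class names but not Class objects.
    private static final StackWalker WALKER = StackWalker.getInstance();

    /**
     * Returns the class name of the frame at the given depth, where depth 0
     * is this method itself and depth 1 is its direct caller. Returns an
     * empty Optional when the stack is shallower than the requested depth.
     */
    static Optional<String> frameClassName(int depth) {
        return WALKER.walk(frames -> frames
                .skip(depth)                              // drop the frames above the target
                .findFirst()                              // stop walking at the target frame
                .map(StackWalker.StackFrame::getClassName));
    }

    public static void main(String[] args) {
        // Depth 1 from inside main is main's own class.
        System.out.println(frameClassName(1).orElse("<no frame>"));
    }
}
```

Because the stream is consumed lazily, the walk should touch only the frames up to the requested depth, which is why a steadily growing cost for such a call is surprising.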
________________________________
From: Peter Booth <peter_booth at me.com>
Sent: Sunday, March 17, 2019 10:46 PM
To: Amir Hadadi
Cc: zgc-dev at openjdk.java.net
Subject: Re: Stack walking performance issue

The comments within org.apache.logging.log4j.util.StackLocator say that the class is deprecated in Java 8 and doesn’t work in Java 9. Do you see *any* log4j output? If it were me, I’d disable log4j.

> On Mar 17, 2019, at 10:43 AM, Amir Hadadi <amirhadadi at hotmail.com> wrote:
>
> We've encountered the following performance issue, which happens on an instance deployed in Docker with Ubuntu 16.04, Linux kernel 4.4.0-92-generic and OpenJDK 11.0.2.
> The issue shows up with ZGC but does not show up with G1.
>
> During a period of 10 days after deployment, CPU usage goes up steadily at a rate of ~10% per day. Eventually we have to restart all instances.
> I profiled our app after 4 days of uptime using async-profiler and found that the following stack is the most frequent stack: https://gist.github.com/amirhadadi/48b6f84e3b2412124e817a50608e6ddd
> I tried restarting the instance and waited 10 minutes before profiling, and stack walking shows up much less in sampling: https://gist.github.com/amirhadadi/0c43b087b9bfd995119a97cbf3557d21
> This is how the stack walk looks when profiling an instance deployed with G1 after 3 days: https://gist.github.com/amirhadadi/224c33a19bfd9ea8dcc264cefc641496
>
> Please help me figure this one out.