Stack walking performance issue
    Stefan Karlsson 
    stefan.karlsson at oracle.com
       
    Mon Mar 18 11:33:38 UTC 2019
    
    
  
On 2019-03-18 10:46, Amir Hadadi wrote:
> Thanks Stefan and Alexey!
> As the app is not loading more and more classes, I was not expecting not having class unloading to be an issue.
> Can you please explain what's the ResolvedMethodTable and what can cause it to fill up?
I'm not too familiar with that code, but it looks like it creates an 
entry in the table for each different java stack frame it encounters in 
the stack walking code.
There are comments about the ResolvedMethodName here:
http://hg.openjdk.java.net/jdk/jdk/file/5e1480a38a43/src/java.base/share/classes/java/lang/invoke/MemberName.java#l46
> Can it keep filling up even when classes are not being constantly loaded?
A GC that performs class unloading will actually do more cleaning work 
than unloading classes. Cleaning ResolvedMethodTable is one. Cleaning 
out unused entries in the StringTable is another.
The ResolvedMethodTable holds "weak" pointers to ResolvedMethodNames 
java objects. If a ResolvedNMethodName is not strongly reachable, the GC 
will remove that entry (if class unloading is enabled). The 
ResolvedMethodName is installed in MemberName java objects (see code 
above). The stack frame code installs the MemberName object in 
StackFrameInfo.memberName here:
void java_lang_StackFrameInfo::set_method_and_bci(Handle stackFrame, 
const methodHandle& method, int bci, TRAPS) {
...
   // Create and install ResolvedMethodName in ResolvedMethodTable
   CallInfo info(method(), ik, CHECK);
   // Installs ResolveMethodName in MemberName
   MethodHandles::init_method_MemberName(mname, info);
And from your stack trace, the StackFrameInfo objects gets fetched here:
  java_lang_StackFrameInfo::set_method_and_bci(Handle, methodHandle 
const&, int, Thread*)
  JavaFrameStream::fill_frame(int, objArrayHandle, methodHandle const&, 
Thread*)
  StackWalk::fill_in_frames(long, BaseFrameStream&, int, int, 
objArrayHandle, int&, Thread*)
  StackWalk::fetchNextBatch(Handle, long, long, int, int, 
objArrayHandle, Thread*)
  JVM_MoreStackWalk
  java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames
I guess that these StackFrameInfos are used further down in that code, 
and when they're not needed any more they will be dropped from the 
object graph. When that happens there will be no other path to the 
MemberName, and no path to the ResolvedMethodName, and the next class 
unloading GC will clear the entry in the ResolvedMethodTable.
In ZGC (in JDK 11), we actually do clean out the "weak" pointers in the 
ResolvedMethodTable, but we do not remove the entries. Therefore, if you 
do a lot of stack walking and throw away the StackFrameInfos, the 
ResolvedMethodTable will fill up with cleared entries pointing to NULL.
So, in short, using the stack walking API with the JDK 11 version of 
ZGC, has the performance problem you've reported.
> 
> I will run the experiments you suggested and report back.
Thanks!
StefanK
> ________________________________
> From: Stefan Karlsson <stefan.karlsson at oracle.com>
> Sent: Monday, March 18, 2019 10:54 AM
> To: Amir Hadadi; zgc-dev at openjdk.java.net
> Subject: Re: Stack walking performance issue
> 
> Hi Amir,
> 
> Thanks for testing ZGC and reporting back this issue.
> 
> The stack trace indicates that the JVM is performing lookups in the
> ResolvedMethodTable:
> 
>    ResolvedMethodTable::lookup(int, unsigned int, Method*)
>    java_lang_invoke_ResolvedMethodName::find_resolved_method(methodHandle
> const&, Thread*)
> CallInfo::CallInfo(Method*, Klass*, Thread*)
> 
> This table has a small, fixed size:
> 
>     enum Constants {
>       _table_size  = 1007
>     };
> 
> and it gets cleaned out during class unloading.
> 
> ZGC doesn't have class unloading in JDK 11. In JDK 12, which is about to
> go GA tomorrow [1], we've implemented concurrent class unloading for
> ZGC. So, that version will clean out the ResolvedMethodTable.
> 
> It would be good to verify that this is indeed the problem. Two things
> that could be tested:
> 
> 1) Run with G1 and class unloading disabled with -XX:-ClassUnloading. If
> this is showing the same problem as ZGC, then we're on the right track.
> 
> 2) Try with ZGC on JDK 12.
> 
> When running with class unloading, you could also try to add
> membername+table=debug to the -Xlog flag. For example:
> 
>    -Xlog:gc*,membername+table=debug
> 
> This will print the following line every time the GC cleans the
> ResolvedMemberTable during class unloading:
> 
>     log_debug(membername, table) ("ResolvedMethod entries counted %d
> removed %d", _oops_counted, _oops_removed);
> 
> This will show us how full this table is, and if it gets cleaned a lot.
> If it's heavily populated even with class unloading, we probably need to
> change it's implementation to use the new ConcurrentHashTable.
> 
> Thanks,
> StefanK
> 
> [1] https://openjdk.java.net/projects/jdk/12/
> 
> On 2019-03-17 15:43, Amir Hadadi wrote:
>> We've encountered the following performance issue which happens on an instance deployed in docker with ubuntu 16.04, Linux kernel 4.4.0-92-generic and OpenJDK 11.0.2.
>> The issue shows up with zgc but does not show up with G1.
>>
>> During a period of 10 days after deployment, CPU usage goes up steadily at a rate of ~10% per day. Eventually we have to restart all instances.
>> I profiled our app after 4 days of uptime using async-profiler and found that the following stack is the most frequent stack: https://gist.github.com/amirhadadi/48b6f84e3b2412124e817a50608e6ddd
>> I tried restarting the instance and waited 10 minutes before profiling, and stack walking shows up much less in sampling: https://gist.github.com/amirhadadi/0c43b087b9bfd995119a97cbf3557d21
>> This is how the stack walk looks when profiling an instance deployed with g1 after 3 days: https://gist.github.com/amirhadadi/224c33a19bfd9ea8dcc264cefc641496
>>
>> Please help me figure this one out.
>>
>>
>>
>>
>>
    
    
More information about the zgc-dev
mailing list