Slow performance of StackWalker.getCallerClass() vs Reflection.getCallerClass()

Ralph Goers ralph.goers at dslextreme.com
Wed Jul 3 01:57:42 UTC 2019


Thanks Mandy, 

It seems I commented on the thread mentioned in the issue you linked to. Unfortunately, it doesn’t look like any work has been done on the issue.in the last 18 months. 

Yes, LogRecord doesn’t get the StackTraceElement. We only get it for the one stack entry we are interested in and we only do that because Log4j is still compatible with Java 7 and creating our own class would be too disruptive. Still, the cost seems to be in locating the correct frame, not creating the StackTraceElement. I don’t have an account in the bug tracking system or I would comment on the bug there instead of here. But it seems like adding toArray() and toArray(int frames) wouldn’t be very hard and has to perform better than Stream.toArray. While it was a nice idea to use Streams to deal with the stack trace, in practical terms almost everything that needs to do it is going to be extremely performance sensitive and is going to want to traverse it as fast as possible.  

Ralph

> On Jul 2, 2019, at 4:12 PM, Mandy Chung <mandy.chung at oracle.com> wrote:
> 
> Hi Ralph,
> 
> Thanks for this info.  
> 
> Quick comments: LogRecord does not get the line number nor StackTraceElement.  There is cost to construct the string-based StackTraceElement or get the line number mapped from BCI.   And it is orthogonal to StackWalker::getCallerClass that is only interested in the Class object. 
> 
> You may also be interesting in https://bugs.openjdk.java.net/browse/JDK-8189752 <https://bugs.openjdk.java.net/browse/JDK-8189752> to take a snapshot of a stack trace (possibly the top N frames).
> 
> Mandy
> 
> On 7/2/19 2:49 PM, Ralph Goers wrote:
>> The timing of this question is perfect. I have been doing some testing over the last week to address https://issues.apache.org/jira/browse/LOG4J2-2644 <https://issues.apache.org/jira/browse/LOG4J2-2644> and found some interesting things - although they are related to the walk() method, not getCallerClass().
>> 1. Using skip(n) to find a frame a couple frames up is pretty fast but isn’t too much faster than finding the same element using new Throwable().getStackTrace() was in Java 8.
>> 2. The cost of walking the stack becomes much more costly as the number of elements needing to be walked increases.
>> 3. The most shocking to me was that the fastest way to traverse a stack trace is to use a Function that immediately converts the Stream to an array and then use an old style for loop to traverse it. However, doing this is incredibly awkward because StackWalker only supports streams so there is no good way to pass the value being searched for into the Function. I had to resort to storing it in a ThreadLocal. Having a toArray() method on StackWalker would be a lot nicer, especially if I could limit the number of frames retrieved. I should note that java.util.logging.LogRecord uses a Filter to walk the stack which is faster than the stream methods I was originally using, but is much slower than what I ended up with. 
>> 
>> As for the issue mentioned here, I believe I reported that getCallerClass was much slower than the Reflection class in Java 9 and opened a bug here. As I recall that was addressed and I believe I verified that fix but it probably wouldn’t hurt for me to do it again.
>> 
>> Ralph
>> 
>>> On Jul 2, 2019, at 10:48 AM, Mandy Chung <mandy.chung at oracle.com <mailto:mandy.chung at oracle.com>> wrote:
>>> 
>>> MethodHandles::lookup is optimized (@ForceInline) and so it may not
>>> represent apple-to-apple comparison.StackWalker::getCallerClass
>>> does have overhead compared to Reflection::getCallerClass and
>>> need to get the microbenchmark in the jdk repo and rerun the numbers [1].
>>> 
>>> I'm not getting how getCallerClass is used and related to access check.
>>> Can you elaborate?
>>> 
>>> Mandy
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8221623 <https://bugs.openjdk.java.net/browse/JDK-8221623>
>>> 
>>> 
>>> On 7/2/19 6:07 AM, Kasper Nielsen wrote:
>>>> Hi Remi,
>>>> 
>>>> Yes, setting up a StackWalker is more or less free. It is just
>>>> wrapping a set of options.
>>>> 
>>>> public class StackWalkerPerf {
>>>> 
>>>>     static final StackWalker sw =
>>>> StackWalker.getInstance(Option.RETAIN_CLASS_REFERENCE);
>>>> 
>>>>     @Benchmark
>>>>     public StackWalker stackWalkerSetup() {
>>>>         return StackWalker.getInstance(Option.RETAIN_CLASS_REFERENCE);
>>>>     }
>>>> 
>>>>     @Benchmark
>>>>     public Class<?> stackWalkerCallerClass() {
>>>>         return sw.getCallerClass();
>>>>     }
>>>> 
>>>>     @Benchmark
>>>>     public Lookup reflectionCallerClass() {
>>>>         return MethodHandles.lookup();
>>>>     }
>>>> }
>>>> 
>>>> Benchmark                               Mode  Cnt     Score    Error  Units
>>>> StackWalkerPerf.stackWalkerSetup        avgt   10    11.958 ±  0.353  ns/op
>>>> StackWalkerPerf.reflectionCallerClass   avgt   10     8.511 ±  0.415  ns/op
>>>> StackWalkerPerf.stackWalkerCallerClass  avgt   10  1269.825 ± 66.471  ns/op
>>>> 
>>>> I'm using MethodHandles.lookup() in this test because it is cheapest
>>>> way to invoke Reflection.getCallerClass() without any tricks.
>>>> So real performance is likely better.
>>>> 
>>>> /Kasper
>>>> 
>>>> On Tue, 2 Jul 2019 at 13:53, Remi Forax <forax at univ-mlv.fr <mailto:forax at univ-mlv.fr>> wrote:
>>>>> Hi Kasper,
>>>>> did you store the StackWalker instance in a static final field ?
>>>>> 
>>>>> Rémi
>>>>> 
>>>>> ----- Mail original -----
>>>>>> De: "Kasper Nielsen" <kasperni at gmail.com <mailto:kasperni at gmail.com>>
>>>>>> À: "core-libs-dev" <core-libs-dev at openjdk.java.net <mailto:core-libs-dev at openjdk.java.net>>
>>>>>> Envoyé: Mardi 2 Juillet 2019 11:09:11
>>>>>> Objet: Slow performance of StackWalker.getCallerClass() vs Reflection.getCallerClass()
>>>>>> Hi,
>>>>>> 
>>>>>> Are there any security reasons for why StackWalker.getCallerClass()
>>>>>> cannot be made as performant as Reflection.getCallerClass()?
>>>>>> StackWalker.getCallerClass() is at least 100 times slower then
>>>>>> Reflection.getCallerClass() (~1000 ns/op vs ~10 ns/op).
>>>>>> 
>>>>>> I'm trying to retrofit some existing APIs where I cannot take a Lookup
>>>>>> object to do some access control checks.
>>>>>> But the performance of StackWalker.getCallerClass() is making it impossible.
>>>>>> 
>>>>>> Best
>>>>>>  Kasper
>>> 
>>> 
>> 
> 



More information about the core-libs-dev mailing list