Isolating setup of StackWalker test call stack in JMH?

Wed Aug 16 04:52:00 UTC 2023

Invoking nanoTime twice per operation, you will get large nanoTime overhead.
https://shipilev.net/blog/2014/nanotrusting-nanotime/
Not sure that this overhead is smaller than the building up of the call stack (at some degree).
By the way, have you looked how Mode.SampleTime works. In this case, JMH measures the time only of SOME operations (not all of them). It's too expensive to measure the time of each operation.

May I ask you - why do you need to move stack construction into "setup"?

________________________________________
From: jmh-jdk-microbenchmarks-dev <jmh-jdk-microbenchmarks-dev-retn at openjdk.org> on behalf of Brent Christian <brent.christian at oracle.com>
Sent: Tuesday, August 15, 2023 4:53 PM
To: jmh-dev at openjdk.org; jmh-jdk-microbenchmarks-dev at openjdk.org
Subject: Isolating setup of StackWalker test call stack in JMH?

Greetings, JMH users and enthusiasts.

The JDK microbenchmarks include benchmarks to measure the performance of java.lang.StackWalker operations. A 'depth' @Param is used to test various call stack depths. See:

https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/lang/StackWalkBench.java

The TestStack nested class (L69) builds up 'n' method calls, and then run()s the SW method under test once n=depth calls are on the call stack. The issue is that this building up of the call stack is "setup" code, but is executed during the benchmark, influencing the scoring. In particular, it skews the StackWalker.getCallerClass() results, which should perform the same regardless of stack depth. E.g.:

Benchmark                      (depth)  Mode  Cnt     Score   Error  Units
StackWalkBench.getCallerClass        4  avgt   15   445.778 ± 8.920  ns/op
StackWalkBench.getCallerClass      100  avgt   15   496.285 ± 6.631  ns/op
StackWalkBench.getCallerClass     1000  avgt   15  1563.743 ± 7.601  ns/op

Because of the unique needs of setting up a StackWalker call at an artificial stack depth, the setup doesn't seem like something that could be moved into a @Setup method. I looked through the JMH samples, and the only promising thing I found was @AuxCounter. Maybe I could add an AuxCounter to separately report *just* the duration spent in the StackWalker call? (Not ideal, but perhaps it's all there is.)

So I gave that a try. The changes are here:
https://github.com/bchristi-git/jdk/compare/master...bchristi-git:jdk:swBench

However the results are not what I expected:

Benchmark                                   (depth)  Mode  Cnt        Score        Error  Units
StackWalkBench.getCallerClass                     4  avgt   10      983.959 ±     10.895  ns/op
StackWalkBench.getCallerClass:benchNanoDur        4  avgt   10  1073151.585 ± 123551.526  ns/op
StackWalkBench.getCallerClass                   100  avgt   10     1063.878 ±     16.790  ns/op
StackWalkBench.getCallerClass:benchNanoDur      100  avgt   10  1085369.870 ±  87001.613  ns/op
StackWalkBench.getCallerClass                  1000  avgt   10     2143.571 ±     17.697  ns/op
StackWalkBench.getCallerClass:benchNanoDur     1000  avgt   10   868953.256 ± 237607.736  ns/op

The reported benchNanoDur score is much greater than I anticipated.

I think don't have a full grasp of how AuxCounters are calculated or meant to be used. I'd appreciate any hints on what I might be doing wrong. :)

(Or, perhaps, other ideas on how JMH could separate out the call-stack-building-setup for this benchmark, if such a thing is possible.)

Much appreciated,
-Brent