From brent.christian at oracle.com  Tue Aug 15 23:53:14 2023
From: brent.christian at oracle.com (Brent Christian)
Date: Tue, 15 Aug 2023 16:53:14 -0700
Subject: Isolating setup of StackWalker test call stack in JMH?
Message-ID: <8c066010-4418-b505-b32f-9fe408a8e3c7@oracle.com>

Greetings, JMH users and enthusiasts.

The JDK microbenchmarks include benchmarks to measure the performance of
java.lang.StackWalker operations. A 'depth' @Param is used to test various
call stack depths. See:

https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/lang/StackWalkBench.java

The TestStack nested class (L69) builds up 'n' method calls, and then
run()s the StackWalker method under test once n=depth calls are on the
call stack.

The issue is that building up the call stack is "setup" code, but it is
executed during the benchmark and influences the scoring. In particular,
it skews the StackWalker.getCallerClass() results, which should be the
same regardless of stack depth. E.g.:

Benchmark                      (depth)  Mode  Cnt     Score   Error  Units
StackWalkBench.getCallerClass        4  avgt   15   445.778 ± 8.920  ns/op
StackWalkBench.getCallerClass      100  avgt   15   496.285 ± 6.631  ns/op
StackWalkBench.getCallerClass     1000  avgt   15  1563.743 ± 7.601  ns/op

Because of the unique needs of setting up a StackWalker call at an
artificial stack depth, the setup doesn't seem like something that could
be moved into a @Setup method.

I looked through the JMH samples, and the only promising thing I found was
@AuxCounters. Maybe I could use @AuxCounters to separately report *just*
the duration spent in the StackWalker call? (Not ideal, but perhaps it's
all there is.)

So I gave that a try. The changes are here:

https://github.com/bchristi-git/jdk/compare/master...bchristi-git:jdk:swBench

However, the results are not what I expected:

Benchmark                                   (depth)  Mode  Cnt        Score        Error  Units
StackWalkBench.getCallerClass                     4  avgt   10      983.959 ±     10.895  ns/op
StackWalkBench.getCallerClass:benchNanoDur        4  avgt   10  1073151.585 ± 123551.526  ns/op
StackWalkBench.getCallerClass                   100  avgt   10     1063.878 ±     16.790  ns/op
StackWalkBench.getCallerClass:benchNanoDur      100  avgt   10  1085369.870 ±  87001.613  ns/op
StackWalkBench.getCallerClass                  1000  avgt   10     2143.571 ±     17.697  ns/op
StackWalkBench.getCallerClass:benchNanoDur     1000  avgt   10   868953.256 ± 237607.736  ns/op

The reported benchNanoDur score is much greater than I anticipated. I
don't think I have a full grasp of how AuxCounters are calculated or meant
to be used. I'd appreciate any hints on what I might be doing wrong. :)

(Or, perhaps, other ideas on how JMH could separate out the
call-stack-building setup for this benchmark, if such a thing is
possible.)

Much appreciated,
-Brent

From sergey.kuksenko at oracle.com  Wed Aug 16 04:52:00 2023
From: sergey.kuksenko at oracle.com (Sergey Kuksenko)
Date: Wed, 16 Aug 2023 04:52:00 +0000
Subject: Isolating setup of StackWalker test call stack in JMH?
In-Reply-To: <8c066010-4418-b505-b32f-9fe408a8e3c7@oracle.com>
References: <8c066010-4418-b505-b32f-9fe408a8e3c7@oracle.com>
Message-ID: 

Invoking nanoTime twice per operation, you will incur a large nanoTime
overhead:

https://shipilev.net/blog/2014/nanotrusting-nanotime/

I'm not sure that this overhead is smaller than the cost of building up
the call stack (at least to some degree).

By the way, have you looked at how Mode.SampleTime works? In that mode,
JMH measures the time of only SOME operations (not all of them); it is
too expensive to measure the time of every operation. (A rough sketch
follows below.)

May I ask you - why do you need to move stack construction into "setup"?
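
To illustrate the Mode.SampleTime idea, a sampling-mode variant of the
benchmark would only need the mode changed on the benchmark. This is a
minimal sketch, not the actual StackWalkBench code: it assumes the
existing 'depth' @Param and the TestStack helper from the benchmark, and
the class and method names are illustrative.

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;
    import org.openjdk.jmh.infra.Blackhole;

    @BenchmarkMode(Mode.SampleTime)          // time only a sample of the operations
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Thread)
    public class StackWalkSampleBench {

        @Param({"4", "100", "1000"})
        public int depth;

        private static final StackWalker WALKER =
            StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

        @Benchmark
        public void getCallerClassSampled(Blackhole bh) {
            // Same shape as the existing benchmark: build a call stack of
            // 'depth' frames, then invoke the StackWalker method under test.
            final boolean[] done = {false};
            new TestStack(depth, () -> {
                bh.consume(WALKER.getCallerClass());
                done[0] = true;
            }).start();
            if (!done[0]) {
                throw new RuntimeException("TestStack did not run");
            }
        }
    }

The per-operation cost of reading the clock stays out of most operations,
but the stack-building work is still inside the timed region.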
From brent.christian at oracle.com  Fri Aug 25 23:43:52 2023
From: brent.christian at oracle.com (Brent Christian)
Date: Fri, 25 Aug 2023 16:43:52 -0700
Subject: Isolating setup of StackWalker test call stack in JMH?
In-Reply-To: 
References: <8c066010-4418-b505-b32f-9fe408a8e3c7@oracle.com>
Message-ID: <02fbfdec-8fcb-3c1e-7103-477a3334dd1f@oracle.com>

Thanks for the response, Sergey.

On 8/15/23 9:52 PM, Sergey Kuksenko wrote:
> Invoking nanoTime twice per operation, you will incur a large nanoTime
> overhead:
> https://shipilev.net/blog/2014/nanotrusting-nanotime/

That's very interesting.

> May I ask you - why do you need to move stack construction into "setup"?

I originally wanted the StackWalkBench.getCallerClass() benchmark to
confirm that StackWalker.getCallerClass() performs the same regardless of
call stack depth.
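
In other words, the benchmark does something like the following before the
measured call. This is a simplified sketch of the idea only, not the
actual TestStack code; the class name is illustrative.

    // Recurse until the requested number of frames is on the stack,
    // then run the test action at that depth.
    final class CallStackBuilder {
        private final int targetDepth;
        private final Runnable action;

        CallStackBuilder(int targetDepth, Runnable action) {
            this.targetDepth = targetDepth;
            this.action = action;
        }

        void start() {
            descend(0);
        }

        private void descend(int currentDepth) {
            if (currentDepth >= targetDepth) {
                action.run();   // the StackWalker call happens here, 'targetDepth' frames deep
            } else {
                descend(currentDepth + 1);
            }
        }
    }

    // usage: new CallStackBuilder(depth, () -> { /* measured call */ }).start();

The key point is that the measured call runs only after the extra frames
are in place, so the frame-building cost lands inside the timed region.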
The benchmark first makes method calls to create a call stack of depth
`depth` (which I don't want to be part of the measurement) before calling
getCallerClass().

At this point, StackWalker is mature enough that we're confident that
getCallerClass() behaves and performs as expected regardless of call stack
depth. So in PR 15370 [1], the benchmark is simplified to just call
getCallerClass(), without adding to the call stack.

Thanks,
-Brent

1. https://github.com/openjdk/jdk/pull/15370

From sergey.kuksenko at oracle.com  Sat Aug 26 03:58:23 2023
From: sergey.kuksenko at oracle.com (Sergey Kuksenko)
Date: Sat, 26 Aug 2023 03:58:23 +0000
Subject: Isolating setup of StackWalker test call stack in JMH?
In-Reply-To: <02fbfdec-8fcb-3c1e-7103-477a3334dd1f@oracle.com>
References: <8c066010-4418-b505-b32f-9fe408a8e3c7@oracle.com>
 <02fbfdec-8fcb-3c1e-7103-477a3334dd1f@oracle.com>
Message-ID: 

You may write a simple baseline benchmark that just constructs the call
stack. Something like this:

    @Benchmark
    public void forEach_makeCallStack(Blackhole bh) {
        // Build 'depth' frames, then run an empty action - no StackWalker call.
        final boolean[] done = {false};
        new TestStack(depth, new Runnable() {
            public void run() {
                done[0] = true;
            }
        }).start();
        if (!done[0]) {
            throw new RuntimeException();
        }
    }

Then check the difference between this baseline and the corresponding
"StackWalker.getCallerClass()" benchmark.
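
For example, the measured side could have the same shape, with the
StackWalker call inside the Runnable - again only a sketch: 'walker'
stands for a StackWalker created with RETAIN_CLASS_REFERENCE, and the
method name is illustrative, not the existing benchmark code.

    @Benchmark
    public void getCallerClass_withCallStack(Blackhole bh) {
        // Same stack-building as the baseline, but the action now does the
        // StackWalker call under test. Score(this) - Score(baseline) should
        // approximate the cost of getCallerClass() alone at this depth.
        final boolean[] done = {false};
        new TestStack(depth, new Runnable() {
            public void run() {
                bh.consume(walker.getCallerClass());
                done[0] = true;
            }
        }).start();
        if (!done[0]) {
            throw new RuntimeException();
        }
    }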