Request for discussion: rewrite invokeinterface dispatch, JMH benchmark

Wed Oct 9 09:18:14 UTC 2024

Hello Andrew,

Your observations are quite interesting. If you remember
https://github.com/openjdk/jdk/pull/13460, example micro-benchmark
improvements for x86 were ~10%, and only ~3% in Naive Bayes. Those results
varied greatly depending on the CPU microarchitecture, and some CPUs
showed almost no difference (some data is in the PR). However, we did not
make the benchmarks any less predictable. Also, other micro-benchmarks were
not very sensitive to the change at all. It is hard to say what the
patterns are in applications where dispatching has significant weight (which
can be as high as 15%), but it won't be surprising to meet both
predictable and unpredictable scenarios. It might be that the predictable
case is more common but the unpredictable case brings more pain when it
happens.

-Dmitry

On Mon, Oct 7, 2024 at 7:19 PM Andrew Haley <aph-open at littlepinkcloud.com>
wrote:

> I've been looking at rewriting invokeinterface, with a view to making
> it more efficient and predictable on today's hardware, hopefully (near)
> O(1) execution time. Also, we (again, hopefully) wouldn't need to
> dynamically generate and manage itable stubs.
>
> I've been trying a few approaches and don't yet have anything ready to
> present, but I've come across an interesting anomaly in our
> benchmarking. No matter what I did, and however bad my experiment, the
> performance barely changed at all! It was as though my changes were
> doing nothing, but eyeballing the generated code showed it was
> different.
>
> org.openjdk.bench.vm.compiler.InterfaceCalls.test2ndInt5Types looks
> like this:
>
>      as[0] = new FirstClass();
>      as[1] = new SecondClass();
>      as[2] = new ThirdClass();
>      as[3] = new FourthClass();
>      as[4] = new FifthClass();
>
>      // ...
>
>      int l = 0;
>
>      @Benchmark
>      public int test2ndInt5Types() {
>          SecondInterface ai = (SecondInterface) as[l];
>          l = ++ l % asLength;
>          return ai.getIntSecond();
>      }
>
> That is to say, we serially step through an array, invoking the same
> interface method on a different concrete class in turn.
>
> The performance (Apple M1) is sparklingly good:
>
> InterfaceCalls.test2ndInt5Types    6.026 ± 0.009      ns/op
>
> But this is so fast as to be incredible, only 19.3 clocks per
> invocation, including the control loop and the called method. A load
> from L1 cache takes about 3-4 cycles, and there are several dependent
> loads in the method lookup path. I suspected that because this
> benchmark is unrealistically predictable, it does not fairly represent
> real-world performance.
>
> So, let's try mixing it up a little, and jump about rather than
> cycling through the array:
>
>      static final int scramble(int n) {
>          int x = n;
>          x ^= x << 13;
>          x ^= x >>> 17;
>          x ^= x << 5;
>          return x == 0 ? 1 : x;
>      }
>
>      @Benchmark
>      public int test2ndInt5TypesScrambled() {
>          l = scramble(l);
>          SecondInterface ai = (SecondInterface) as[Math.floorMod(l, asLength)];
>          return ai.getIntSecond();
>      }
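
To make the difference between the two access patterns concrete, here is a minimal plain-Java sketch (no JMH) that records which array slot each variant visits. The class bodies and the names DispatchOrderSketch, sequentialIndices and scrambledIndices are hypothetical stand-ins, not the real benchmark sources; only scramble() and the indexing logic are copied from the snippets quoted above.

```java
import java.util.Arrays;

// Hypothetical stand-in for the benchmark's interface and receiver classes.
interface SecondInterface { int getIntSecond(); }
final class FirstClass  implements SecondInterface { public int getIntSecond() { return 1; } }
final class SecondClass implements SecondInterface { public int getIntSecond() { return 2; } }
final class ThirdClass  implements SecondInterface { public int getIntSecond() { return 3; } }
final class FourthClass implements SecondInterface { public int getIntSecond() { return 4; } }
final class FifthClass  implements SecondInterface { public int getIntSecond() { return 5; } }

class DispatchOrderSketch {
    static final SecondInterface[] AS = {
        new FirstClass(), new SecondClass(), new ThirdClass(),
        new FourthClass(), new FifthClass()
    };

    // Same xorshift step as in the quoted scramble().
    static int scramble(int n) {
        int x = n;
        x ^= x << 13;
        x ^= x >>> 17;
        x ^= x << 5;
        return x == 0 ? 1 : x;
    }

    // Receiver indices visited by the original benchmark: 0, 1, 2, 3, 4, 0, ...
    static int[] sequentialIndices(int count) {
        int[] out = new int[count];
        int l = 0;
        for (int i = 0; i < count; i++) {
            out[i] = l;
            l = (l + 1) % AS.length;
        }
        return out;
    }

    // Receiver indices visited by the scrambled variant.
    static int[] scrambledIndices(int count) {
        int[] out = new int[count];
        int l = 0;
        for (int i = 0; i < count; i++) {
            l = scramble(l);
            out[i] = Math.floorMod(l, AS.length);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + Arrays.toString(sequentialIndices(15)));
        System.out.println("scrambled:  " + Arrays.toString(scrambledIndices(15)));
        // Each index selects the receiver of one interface call.
        int sum = 0;
        for (int i : scrambledIndices(15)) {
            sum += AS[i].getIntSecond();
        }
        System.out.println("sum of results: " + sum);
    }
}
```

The sequential variant repeats with period five, which a modern indirect branch predictor can plausibly learn; the scrambled sequence has no short period for it to lock onto.
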
>
> This adds only a few instructions, but the measured performance is
> radically different:
>
> InterfaceCalls.test2ndInt5TypesScrambled  19.363 ± 0.084      ns/op
>
> This is 62 clocks per invocation, and I suspect this result is far
> more realistic. But is it really? Maybe invokeinterface calls are
> generally very predictable, so the benchmark we already have is
> representative.
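
For reference, the clock counts quoted above follow from multiplying ns/op by the core frequency. A small sketch of that arithmetic, assuming a clock of about 3.2 GHz for the M1 performance cores (the frequency is not stated in the message, and ClockEstimate is a made-up name):

```java
class ClockEstimate {
    // Assumption: the benchmark ran on a core clocked at roughly 3.2 GHz.
    static final double ASSUMED_GHZ = 3.2;

    // One nanosecond at f GHz is f clock cycles, so cycles = ns * GHz.
    static double cyclesPerOp(double nsPerOp, double ghz) {
        return nsPerOp * ghz;
    }

    public static void main(String[] args) {
        double sequentialNs = 6.026;   // test2ndInt5Types
        double scrambledNs = 19.363;   // test2ndInt5TypesScrambled
        System.out.printf("sequential: %.1f cycles/op%n",
                cyclesPerOp(sequentialNs, ASSUMED_GHZ));
        System.out.printf("scrambled:  %.1f cycles/op%n",
                cyclesPerOp(scrambledNs, ASSUMED_GHZ));
        System.out.printf("slowdown:   %.2fx%n", scrambledNs / sequentialNs);
    }
}
```

Under that assumption the two results come out at about 19.3 and 62.0 cycles per invocation, which matches the figures quoted, and the scrambled variant is roughly 3.2 times slower.
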
>
> Questions:
>
> - Which benchmarks should we be optimizing for? I guess it could be
>    the scrambled one, but maybe that would have no benefit if generally
>    (or overwhelmingly often) invokeinterface is predictable.
>
> - How many of the (micro-)benchmarks in HotSpot suffer from this
>    problem? I'm guessing a lot of them, and perhaps it's partly because
>    they were written in the days when speculative execution was less
>    aggressive and branch prediction wasn't so good.
>
> --
> Andrew Haley  (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671