FFM performance tweaks

Ioannis Tsakpinis iotsakp at gmail.com
Fri Nov 22 17:13:45 UTC 2024


Hey Maurizio,

I looked into inlining again, since the other issues affecting us have
been addressed in 24-ea+24. I used LWJGL's HelloVulkan sample as an
approximation of real-world rendering that is heavy on foreign calls
and off-heap memory access. I tested the exact same code with two
different "backends":

1. LWJGL with JNI downcalls and Unsafe memory access
2. LWJGL with FFM downcalls and everything-segment memory access
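For reference, the "everything-segment" access pattern in backend 2 boils
down to something like the sketch below (a minimal standalone example, not
LWJGL's actual code; the class and variable names are mine):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class EverythingSegmentDemo {
    // A zero-length segment at address 0, reinterpreted to span the whole
    // address space: any raw pointer can then be dereferenced through it,
    // much like sun.misc.Unsafe. reinterpret() is a restricted method, so
    // this prints a warning unless --enable-native-access is passed.
    static final MemorySegment EVERYTHING =
            MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buf = arena.allocate(ValueLayout.JAVA_INT);
            long addr = buf.address();
            // Off-heap write/read through the everything segment, addressed
            // by raw pointer rather than by a per-allocation segment.
            EVERYTHING.set(ValueLayout.JAVA_INT, addr, 42);
            System.out.println(EVERYTHING.get(ValueLayout.JAVA_INT, addr));
        }
    }
}
```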

In this code, the inlining failure happens in demo_draw_build_cmd [1],
with the following call stack:

main -> run -> demo_run -> <event loop> { demo_draw -> demo_draw_build_cmd }

So the stack is not too deep, and indeed the code is not affected by
MaxInlineLevel.
All inlining failures are reported with NodeCountInliningCutoff. Both
the JNI and FFM implementations suffer from this, but it does happen
much earlier with FFM:

- First failure happens at line 1710 with JNI
- First failure happens at line 1655 with FFM, not even halfway through
the method.

I have dug into C2 a bit, and this is my current understanding:

- NodeCountInliningCutoff is a develop flag, hardcoded to 18000 and not
changed since the first git commit (2007). [2]
- NodeCountInliningCutoff is only applicable when incremental inlining
is disabled for the method being compiled. [3]
- Setting LiveNodeCountInliningCutoff to a really high value (1M) has
no effect on incremental inlining decisions, for this particular code
at least.

Having no other (obvious) way to affect inlining in a product JVM, one
workaround that did work was -XX:+StressIncrementalInlining (with some
variance due to the randomization in should_delay_inlining()). Not sure why
this is a product flag, but it does make a huge difference. Everything
in demo_draw_build_cmd gets fully inlined and GC activity drops to
nothing, with either the JNI or FFM backends.
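Concretely, the workaround amounts to the invocation below (the classpath
and main-class arguments are placeholders for however the sample is
launched; only the JVM flag comes from the experiment above):

```shell
# Force C2's incremental-inlining stress mode, which randomizes
# should_delay_inlining() decisions and, in this workload, lets
# demo_draw_build_cmd inline fully.
java -XX:+StressIncrementalInlining \
     -cp samples.jar org.lwjgl.demo.vulkan.HelloVulkan
```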

I hope this helps in some way and would be happy to do more testing if
necessary.

- Ioannis

[1]: https://github.com/LWJGL/lwjgl3/blob/master/modules/samples/src/test/java/org/lwjgl/demo/vulkan/HelloVulkan.java#L1631

(note, this sample has been ported from C and intentionally maintains
the original code style)

[2]: https://github.com/openjdk/jdk/blob/13987b4244614d594dc8f94c288eddb6239a066f/src/hotspot/share/opto/c2_globals.hpp#L435
[3]: https://github.com/openjdk/jdk/blob/13987b4244614d594dc8f94c288eddb6239a066f/src/hotspot/share/opto/compile.hpp#L1108

On Fri, 22 Nov 2024 at 13:50, Maurizio Cimadamore
<maurizio.cimadamore at oracle.com> wrote:
>
> We are taking a look on our side as well, and we do notice the inliner
> giving up, with both workarounds (specialized var handle and everything
> segment).
>
> We will share some updates as soon as we understand this a bit better
> (this will probably take some time).
>
> Cheers
> Maurizio
>
> On 21/11/2024 22:14, Brian S O'Neill wrote:
> > So what's going on? Ignoring the memory copy difference, it seems it's
> > really just the inliner giving up. The rebalancing code is broken up
> > into four very large methods, with lots of special edge cases which
> > get expanded, and so it ends up getting quite huge. I have confirmed
> > in previous test runs that the inliner does give up, but I was unable
> > to determine if it was in the rebalancing code itself. I suspect that
> > it was.

