FFM API: Simplifying MemorySegment::ofBuffer

Tue Oct 29 10:56:05 UTC 2024

Hey Maurizio,

Yes, forcing inlining with CompileCommand helps in a simple benchmark:

https://gist.github.com/Spasi/bd45f53fb71997e9059f7bb22443a24b

It also helps in a real application, with significantly fewer young GC
cycles. I've attached two JITWatch screenshots to the gist above: the
first from a run with default JVM options, the second showing the same
call site in a run with CompileCommand.

On Tue, 29 Oct 2024 at 11:42, Maurizio Cimadamore
<maurizio.cimadamore at oracle.com> wrote:
>
> Hi Ioannis,
> the idea you suggest of moving bits and pieces of the MS::ofBuffer
> factory seems very good to me.
>
> Have you tried forcing inlining of that method to see if that helps?
> That could be an interim move we could make (e.g. adding a @ForceInline
> annotation there while we wait for the code to be improved).
>
> I've filed this:
>
> https://bugs.openjdk.org/browse/JDK-8343188
>
> Thanks!
>
> Maurizio
>
> On 28/10/2024 10:47, Ioannis Tsakpinis wrote:
> > Hello again,
> >
> > In the context of writing a "compatibility" layer between JNI/NIO-based
> > bindings and FFM-based bindings, one method that shows up as
> > problematic in performance testing is MemorySegment::ofBuffer.
> > Specifically, it is too big to be inlined at most call sites, which
> > hurts optimizations in the surrounding code.
> >
> > (As with my previous thread, this is mostly about escape analysis (EA)
> > not scalar-replacing very short-lived instances created behind the
> > public API.)
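For illustration, a minimal sketch of the kind of compatibility-layer call site described above, assuming Java 22 or later. The class and method names (CompatCallSite, sumInts, hotLoop) are hypothetical; only MemorySegment.ofBuffer, byteSize and get are real API. The segment created per call never escapes sumInts, so whether EA can eliminate it depends on ofBuffer being inlined there.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

public final class CompatCallSite {
    private CompatCallSite() {}

    // Hypothetical compatibility-layer entry point: legacy callers pass an NIO
    // buffer, while the FFM-based implementation works on a MemorySegment.
    static int sumInts(ByteBuffer buffer) {
        // Short-lived wrapper; it can only be scalar-replaced if the JIT
        // inlines MemorySegment.ofBuffer into this method.
        MemorySegment segment = MemorySegment.ofBuffer(buffer);
        int sum = 0;
        // Reads native-order ints from the buffer's position up to its limit.
        for (long offset = 0; offset + Integer.BYTES <= segment.byteSize(); offset += Integer.BYTES) {
            sum += segment.get(ValueLayout.JAVA_INT_UNALIGNED, offset);
        }
        return sum;
    }

    // Calls the wrapper repeatedly so the call site becomes hot and gets compiled.
    static long hotLoop(ByteBuffer buffer, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += sumInts(buffer);
        }
        return total;
    }
}
```

To compare inlining decisions, one could run such a loop with and without something like -XX:CompileCommand=inline,jdk.internal.foreign.AbstractMemorySegmentImpl::ofBuffer (assuming that is the internal method MemorySegment.ofBuffer delegates to), inspecting the result with JITWatch or -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining.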
> >
> > I don't have a benchmark to share, nor have I tested potential fixes,
> > but looking at the implementation, the first thing I would try is to
> > extract the HeapMemorySegmentImpl switch into a separate method and
> > combine it with the getScaleFactor switch for that case. Right now the
> > type switch happens twice, for no apparent benefit: although the two
> > switches test different types, the result is effectively always the
> > same. Then, I'd remove the "The provided heap buffer is not backed by
> > an array." check at the start. AFAICT, that can never happen, at least
> > not without someone doing nasty things with Unsafe. These two changes
> > should make the method small enough to be inlined at more call sites.
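The "single switch" idea can be sketched with public API only, assuming Java 22 or later. HeapBufferSegments, ofHeapBuffer and slice are hypothetical names, and this is not the JDK's internal code: one type switch on the buffer selects both the MemorySegment.ofArray overload and the element shift, so no second switch is needed. Unlike the real ofBuffer, it rejects read-only and direct buffers, because those need internal access.

```java
import java.lang.foreign.MemorySegment;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.DoubleBuffer;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;
import java.nio.LongBuffer;
import java.nio.ShortBuffer;

public final class HeapBufferSegments {
    private HeapBufferSegments() {}

    // Hypothetical helper: views the remaining elements of a writable,
    // array-backed heap buffer as a MemorySegment, using a single type switch.
    public static MemorySegment ofHeapBuffer(Buffer buffer) {
        if (!buffer.hasArray()) {
            // Direct and read-only buffers are out of scope for this sketch.
            throw new IllegalArgumentException("expected a writable, array-backed heap buffer");
        }
        // Each case picks the matching ofArray overload and a constant shift
        // (log2 of the element size in bytes) at the same time.
        return switch (buffer) {
            case ByteBuffer b   -> slice(MemorySegment.ofArray(b.array()), b, 0);
            case CharBuffer c   -> slice(MemorySegment.ofArray(c.array()), c, 1);
            case ShortBuffer s  -> slice(MemorySegment.ofArray(s.array()), s, 1);
            case IntBuffer i    -> slice(MemorySegment.ofArray(i.array()), i, 2);
            case FloatBuffer f  -> slice(MemorySegment.ofArray(f.array()), f, 2);
            case LongBuffer l   -> slice(MemorySegment.ofArray(l.array()), l, 3);
            case DoubleBuffer d -> slice(MemorySegment.ofArray(d.array()), d, 3);
            default -> throw new AssertionError("unknown buffer type");
        };
    }

    // arrayOffset(), position() and remaining() count elements; the shift
    // converts them to byte offsets and sizes within the array segment.
    private static MemorySegment slice(MemorySegment array, Buffer b, int shift) {
        long offset = (long) (b.arrayOffset() + b.position()) << shift;
        long size = (long) b.remaining() << shift;
        return array.asSlice(offset, size);
    }
}
```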
> >
> > A more involved approach, but one that would definitely be more
> > efficient, is to move the MemorySegment creation to the NIO API
> > instead. MemorySegment::asByteBuffer is simple, because each segment
> > subclass knows exactly what kind of ByteBuffer to create. Likewise,
> > each Buffer subclass could have a method (non-public, exposed via
> > NIO_ACCESS or something) that creates the appropriate MemorySegment
> > subclass. For example, a direct DoubleBuffer would implicitly know to
> > create a NativeMemorySegmentImpl with a scale factor of 8.
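The NIO-side alternative replaces runtime type tests with virtual dispatch. Since the real Buffer subclasses and segment implementations are JDK-internal, the following is only a toy model with hypothetical types (ToyBuffers, ToyBuffer, ToyIntBuffer, ToyDoubleBuffer), backed by heap arrays rather than native memory. Each class creates its own segment with a constant element shift, mirroring how MemorySegment::asByteBuffer lets each segment class decide what to create.

```java
import java.lang.foreign.MemorySegment;

// Toy model, not java.nio: each buffer class knows how to create its own segment.
public final class ToyBuffers {
    private ToyBuffers() {}

    abstract static class ToyBuffer {
        final int position;
        final int limit;

        ToyBuffer(int position, int limit) {
            this.position = position;
            this.limit = limit;
        }

        // Implemented per subclass: no type switch is needed at the call site.
        abstract MemorySegment asSegment();

        // Shared slicing logic; each subclass passes its element shift as a constant.
        final MemorySegment slice(MemorySegment whole, int shift) {
            return whole.asSlice((long) position << shift, (long) (limit - position) << shift);
        }
    }

    static final class ToyIntBuffer extends ToyBuffer {
        private final int[] array;

        ToyIntBuffer(int[] array, int position, int limit) {
            super(position, limit);
            this.array = array;
        }

        @Override
        MemorySegment asSegment() {
            return slice(MemorySegment.ofArray(array), 2); // 4-byte elements
        }
    }

    static final class ToyDoubleBuffer extends ToyBuffer {
        private final double[] array;

        ToyDoubleBuffer(double[] array, int position, int limit) {
            super(position, limit);
            this.array = array;
        }

        @Override
        MemorySegment asSegment() {
            return slice(MemorySegment.ofArray(array), 3); // 8-byte elements
        }
    }
}
```

At a monomorphic call site, the JIT only has to inline one small asSegment body, rather than a method that handles every buffer type.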
> >
> > Note that this is a low-priority request: just a small improvement that
> > would be good to have, but that we can live without. The motivation is
> > being able to run the current LWJGL on FFM internally and without Unsafe
> > (a prototype with --sun-misc-unsafe-memory-access=deny actually works!),
> > while providing decent performance compared to the legacy solution.
> > Future versions, however, will be 100% FFM-based, so they will not be
> > affected by this issue at all.
> >
> > - Ioannis