FFM API: Simplifying MemorySegment::ofBuffer

Tue Oct 29 11:16:52 UTC 2024

Thanks for confirming!

I'd say we should probably do something to provide relief in the short 
term, while we spend some more cycles simplifying the implementation 
for good.

Maurizio

On 29/10/2024 10:56, Ioannis Tsakpinis wrote:
> Hey Maurizio,
>
> Yes, forcing inlining with CompileCommand in a simple benchmark helps:
>
> https://gist.github.com/Spasi/bd45f53fb71997e9059f7bb22443a24b
>
> It also helps in a real application, with significantly fewer young GC
> cycles. I've attached two JITWatch screenshots to the gist above: the
> first from a run with default JVM options, the second showing the same
> call site in a run with CompileCommand.
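To reproduce the effect, here is a minimal sketch of the kind of hot call site under discussion, assuming JDK 22 or later (where FFM is final). The class `OfBufferHotLoop` and its `sum` method are hypothetical, not taken from the gist. The CompileCommand target in the comment assumes that the bulk of the factory lives in `jdk.internal.foreign.AbstractMemorySegmentImpl::ofBuffer`, with the public `MemorySegment.ofBuffer` delegating to it. The exact command used in the gist may differ.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.DoubleBuffer;

// Hypothetical harness: every call wraps a buffer in a short-lived MemorySegment.
// Whether that segment can be scalar-replaced depends on ofBuffer being inlined.
//
// Assumed invocations (the inline target is an assumption about JDK internals):
//   java -Xlog:gc OfBufferHotLoop
//   java -Xlog:gc -XX:CompileCommand=inline,jdk.internal.foreign.AbstractMemorySegmentImpl::ofBuffer OfBufferHotLoop
public class OfBufferHotLoop {

    // Sums the doubles between the buffer's position and limit via a temporary segment.
    static double sum(DoubleBuffer buf) {
        MemorySegment seg = MemorySegment.ofBuffer(buf); // never escapes this method
        long count = seg.byteSize() / Double.BYTES;
        double total = 0;
        for (long i = 0; i < count; i++) {
            total += seg.getAtIndex(ValueLayout.JAVA_DOUBLE, i);
        }
        return total;
    }

    public static void main(String[] args) {
        DoubleBuffer buf = DoubleBuffer.wrap(new double[] {1.0, 2.0, 3.0, 4.0});
        double acc = 0;
        // Enough iterations for C2 to compile sum(); compare GC logs between the two runs.
        for (int i = 0; i < 5_000_000; i++) {
            acc += sum(buf);
        }
        System.out.println(acc);
    }
}
```

Heap buffers are used here so that the segment's native-order reads match the array contents. A direct buffer created with `ByteBuffer.allocateDirect(n).asDoubleBuffer()` defaults to big-endian and would need `.order(ByteOrder.nativeOrder())` first.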
>
> On Tue, 29 Oct 2024 at 11:42, Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>> Hi Ioannis,
>> the idea you suggest of moving bits and pieces of the MS::ofBuffer
>> factory seems very good to me.
>>
>> Have you tried forcing inlining of that method to see if that helps?
>> That could be an interim move (e.g. adding a @ForceInline annotation
>> there while we wait for the code to be improved).
>>
>> I've filed this:
>>
>> https://bugs.openjdk.org/browse/JDK-8343188
>>
>> Thanks!
>>
>> Maurizio
>>
>> On 28/10/2024 10:47, Ioannis Tsakpinis wrote:
>>> Hello again,
>>>
>>> In the context of writing a "compatibility" layer between JNI/NIO-based
>>> bindings and FFM-based bindings, one method that stands out as
>>> problematic in performance testing is MemorySegment::ofBuffer.
>>> Specifically, it is too big to be inlined at most call sites, which
>>> hurts optimization of the surrounding code.
>>>
>>> (As with my previous thread, this is mostly about escape analysis (EA)
>>> not scalar-replacing very short-lived instances created behind the
>>> public API.)
>>>
>>> I don't have a benchmark to share, nor have I tested potential fixes,
>>> but looking at the implementation, the first thing I would try is to
>>> extract the HeapMemorySegmentImpl switch into a separate method and
>>> merge it with the getScaleFactor switch for that case. Right now the
>>> type switch happens twice, for no apparent benefit: while the
>>> switched-on types differ, the outcome is effectively always the same.
>>> Then, I'd remove the "The provided heap buffer is not backed by an
>>> array." check at the start. AFAICT, that condition can never occur, at
>>> least not without someone doing nasty things with Unsafe. These two
>>> changes should make the method small enough to be inlined at more
>>> call sites.
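The first suggestion, a single type dispatch for the heap case, can be sketched as follows. This is a model of the control flow built only from public API (`MemorySegment.ofArray` plus `asSlice`), not a patch to the JDK-internal `HeapMemorySegmentImpl` path. The class `HeapBufferSegments`, the record `Backing`, and the method `ofHeapBuffer` are all hypothetical. One difference matters: the JDK factory reaches the backing array through internal access even for read-only buffers, whereas public `Buffer.hasArray()` legitimately returns false for them. That is why this sketch keeps a precondition check that the internal code would not need.

```java
import java.lang.foreign.MemorySegment;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.DoubleBuffer;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;
import java.nio.LongBuffer;
import java.nio.ShortBuffer;

// Hypothetical sketch: one type switch yields both the array-backed segment and
// the element-size shift, instead of one switch on the backing-array type and a
// second switch on the buffer type.
final class HeapBufferSegments {

    private HeapBufferSegments() {}

    // Everything the single dispatch needs to produce.
    private record Backing(MemorySegment array, int shift) {}

    static MemorySegment ofHeapBuffer(Buffer buffer) {
        // Public-API precondition: read-only heap buffers report hasArray() == false.
        if (buffer.isDirect() || !buffer.hasArray()) {
            throw new IllegalArgumentException("expected a writable heap buffer");
        }
        Backing b = switch (buffer) {
            case ByteBuffer bb   -> new Backing(MemorySegment.ofArray(bb.array()), 0);
            case CharBuffer cb   -> new Backing(MemorySegment.ofArray(cb.array()), 1);
            case ShortBuffer sb  -> new Backing(MemorySegment.ofArray(sb.array()), 1);
            case IntBuffer ib    -> new Backing(MemorySegment.ofArray(ib.array()), 2);
            case FloatBuffer fb  -> new Backing(MemorySegment.ofArray(fb.array()), 2);
            case LongBuffer lb   -> new Backing(MemorySegment.ofArray(lb.array()), 3);
            case DoubleBuffer db -> new Backing(MemorySegment.ofArray(db.array()), 3);
            default -> throw new IllegalArgumentException("unsupported buffer type");
        };
        // Convert element indices to byte offsets using the shift from the same switch.
        long start = (long) (buffer.arrayOffset() + buffer.position()) << b.shift();
        long length = (long) buffer.remaining() << b.shift();
        return b.array().asSlice(start, length);
    }
}
```

For a writable heap buffer, the resulting segment should cover the same bytes as `MemorySegment.ofBuffer` (position to limit). Only the dispatch structure is meant to be illustrative.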
>>>
>>> A more involved approach, but one that would definitely be more
>>> efficient, is to move the MemorySegment creation to the NIO API
>>> instead. MemorySegment::asByteBuffer is simple, because each segment
>>> subclass knows exactly what kind of ByteBuffer to create. Likewise,
>>> each Buffer subclass could have a method (non-public, exposed via
>>> NIO_ACCESS or something) that creates the appropriate MemorySegment
>>> subclass. For example, a direct DoubleBuffer would implicitly know to
>>> create a NativeMemorySegmentImpl with a scale factor of 8.
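The second, more involved proposal replaces the type switches with virtual dispatch: each buffer class already knows its storage kind and element size, so it can build the matching segment directly. `NIO_ACCESS` and the real `Buffer` subclasses cannot be extended from user code, so the following sketch models the idea with hypothetical classes (`ModelBuffer`, `ModelDirectDoubleBuffer`, `ModelHeapIntBuffer`, `NioSideFactoryDemo`). It deliberately ignores scopes, read-only flags, and mapped buffers, all of which the real factory must handle.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical model: each buffer subclass overrides one factory method,
// so creating a segment needs no type switch at all.
abstract class ModelBuffer {
    final int position;
    final int limit;

    ModelBuffer(int position, int limit) {
        this.position = position;
        this.limit = limit;
    }

    // Stands in for the non-public hook that NIO_ACCESS could expose.
    abstract MemorySegment createSegment();
}

// Direct double buffer: knows it is off-heap and that elements are 8 bytes.
final class ModelDirectDoubleBuffer extends ModelBuffer {
    private final MemorySegment memory; // stands in for the buffer's native address

    ModelDirectDoubleBuffer(MemorySegment memory, int position, int limit) {
        super(position, limit);
        this.memory = memory;
    }

    @Override
    MemorySegment createSegment() {
        return memory.asSlice((long) position * Double.BYTES,
                              (long) (limit - position) * Double.BYTES);
    }
}

// Heap int buffer: knows its backing array type and that elements are 4 bytes.
final class ModelHeapIntBuffer extends ModelBuffer {
    private final int[] array;

    ModelHeapIntBuffer(int[] array, int position, int limit) {
        super(position, limit);
        this.array = array;
    }

    @Override
    MemorySegment createSegment() {
        return MemorySegment.ofArray(array).asSlice((long) position * Integer.BYTES,
                                                    (long) (limit - position) * Integer.BYTES);
    }
}

public class NioSideFactoryDemo {

    // The public entry point shrinks to one virtual call, a good inlining candidate.
    static MemorySegment ofBuffer(ModelBuffer buffer) {
        return buffer.createSegment();
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment mem = arena.allocateFrom(ValueLayout.JAVA_DOUBLE, 1.0, 2.0, 3.0, 4.0);
            MemorySegment direct = ofBuffer(new ModelDirectDoubleBuffer(mem, 1, 3));
            MemorySegment heap = ofBuffer(new ModelHeapIntBuffer(new int[] {10, 20, 30, 40}, 1, 3));
            System.out.println(direct.byteSize() + " " + heap.byteSize());
        }
    }
}
```

Whether the JIT actually inlines `createSegment` would still depend on how many buffer types a given call site sees. A call site that only ever handles one or two buffer kinds is the favorable case.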
>>>
>>> Note that this is a low-priority request: just a small improvement that
>>> would be good to have, but that we can live without. The motivation is
>>> being able to run current LWJGL using FFM internally and without Unsafe
>>> (a prototype with --sun-misc-unsafe-memory-access=deny actually works!),
>>> while providing decent performance compared to the legacy solution.
>>> Future versions, however, will be 100% FFM-based, so they will not be
>>> affected by this issue at all.
>>>
>>> - Ioannis