AW: AW: Using MemoryAccess with structured MemoryLayout

Fri Feb 26 10:52:15 UTC 2021

Nice work Remi!

I second what Jorn says - it's good to see that what we have allows
higher-level stories to be built atop. There are a few directions to go
(as Jorn mentions) - one being the one you describe, another more
class-oriented, where e.g. you could define an interface with
getters/setters and then ask the runtime to instantiate it for you
using a layout.
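
A minimal sketch of that class-oriented direction, for illustration
only. It assumes a hypothetical `StructBinder` helper, uses a plain
`ByteBuffer` and a name-to-offset map as stand-ins for a segment and a
layout, and relies on `java.lang.reflect.Proxy`; none of this is part
of the Panama API, and a real binder would more likely spin classes:

```java
import java.lang.reflect.Proxy;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;

public class StructBinder {

    /** Hypothetical struct view: one getter/setter pair per 32-bit field. */
    public interface KeyValue {
        int key();
        void key(int newKey);
        int value();
        void value(int newValue);
    }

    /**
     * Binds an interface to a buffer. Each method name is looked up in
     * fieldOffsets (an assumed stand-in for a real layout) to find the byte
     * offset of a 32-bit field: zero-argument methods read, one-argument
     * methods write. Methods without an offset entry (toString included)
     * are rejected.
     */
    @SuppressWarnings("unchecked")
    public static <T> T bind(Class<T> type, ByteBuffer buffer,
                             Map<String, Integer> fieldOffsets) {
        return (T) Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] { type },
            (proxy, method, args) -> {
                Integer offset = fieldOffsets.get(method.getName());
                if (offset == null) {
                    throw new UnsupportedOperationException(method.getName());
                }
                if (args == null || args.length == 0) {
                    return buffer.getInt(offset);          // getter
                }
                buffer.putInt(offset, (Integer) args[0]);  // setter
                return null;
            });
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8).order(ByteOrder.nativeOrder());
        KeyValue kv = bind(KeyValue.class, buffer, Map.of("key", 0, "value", 4));
        kv.key(7);
        kv.value(42);
        System.out.println(kv.key() + " -> " + kv.value());
    }
}
```

The reflective dispatch keeps the sketch short, but it is not how one
would get good performance; it only shows the API shape.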

One note here is that your solution has some generality issues when it
comes to indices: you support two longs, which will probably be ok in
the vast majority of cases, but there could be more. The class-based
approach is more scalable, but at the same time adds a lot of static
footprint (startup of binder-based FFI was easily 4x what we have now).
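
To make the index point concrete, here is a rough sketch of how an
offset could be computed for any number of nested sequence indices. The
`LayoutOffsets` class and its `offset` method are hypothetical names
used only for illustration, and the strides are written by hand rather
than derived from a MemoryLayout:

```java
public class LayoutOffsets {

    /**
     * Computes the byte offset of a field reached through nested sequences.
     * strides[i] is the size in bytes of one element of the i-th sequence,
     * indices[i] is the position selected in that sequence, and baseOffset
     * is the offset of the field inside the innermost element.
     */
    public static long offset(long baseOffset, long[] strides, long... indices) {
        if (indices.length != strides.length) {
            throw new IllegalArgumentException(
                "expected " + strides.length + " indices, got " + indices.length);
        }
        long offset = baseOffset;
        for (int i = 0; i < strides.length; i++) {
            // Exact arithmetic so that an overflow fails loudly.
            offset = Math.addExact(offset, Math.multiplyExact(strides[i], indices[i]));
        }
        return offset;
    }

    public static void main(String[] args) {
        // A sequence of 5-element sequences of { int key; int value; } structs:
        // outer stride = 5 * 8 = 40 bytes, inner stride = 8 bytes,
        // and "value" sits at byte 4 inside the struct.
        long[] strides = { 40, 8 };
        System.out.println(offset(4, strides, 2, 3)); // 4 + 2*40 + 3*8 = 108
    }
}
```

Whether a variable-arity signature like this can be made as cheap as a
fixed pair of longs is not something the sketch tries to answer.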

Anyway - my point here is: for now the goal is to make sure that the
foundations we have are stable and efficient. Once we get there, and
once we see what kind of libraries people are developing on top of the
foreign memory access/linker, there's definitely room to see if we
can uplevel some of that in the JDK (as we have done other times in the
past).

Thanks

Maurizio

On Fri, 2021-02-26 at 11:14 +0100, Jorn Vernee wrote:
> Hi Rémi,
> 
> I see trusted record fields are already being put to good use ;)
> 
> Using lazily initialized MutableCallSites is an interesting way to
> fold multiple access shapes into the same capability object. I
> thought you'd end up with a chain of ifs to check the shape at each
> use-site if you have multiple different shapes spread over different
> use-sites, but I guess the path String having to be constant makes
> sure that those checks are folded away as well. Very clever :)
> 
> To me, this validates the work we've done on the memory access API
> over the past year and a half, which for a large part was about
> "finding the right primitive" to add to the JDK. If the foundation is
> solid, it opens up all kinds of possibilities for building other
> things on top, such as this FastAccess example.
> 
> It also opens up the opportunity to implement some of these
> middle-ground APIs in the JDK, whether it's something like this, or
> something like the old Panama binder re-implemented on top of the
> current API. There seem to be many things to choose from here. I
> think we'll have to see though; each API layer we add requires
> maintenance, and it's not worth it if users just end up spinning
> their own thing in the end, because they wanted a different API
> shape.
> 
> For now, we're still finalizing the basement :)
> 
> Jorn
> 
> On 25/02/2021 23:46, Remi Forax wrote:
> > [sneaking into this conversation]
> > 
> > > While I agree that the state of the basement is now far better
> > > with Panama than with JNI, I also think you can have a kind of
> > > middle-ground API that is based on MemoryLayout but offers a
> > > slightly higher-level API than just MemoryAccess.
> > > 
> > > Something like this: I can describe my layout
> > 
> >      SequenceLayout keyValues = MemoryLayout.ofSequence(
> >          MemoryLayout.ofStruct(
> >              MemoryLayout.ofValueBits(32,
> > nativeOrder()).withName("key"),
> >              MemoryLayout.ofValueBits(32,
> > nativeOrder()).withName("value")
> >          )
> >      ).withName("KeyValues");
> > 
> > 
> > > and then create a FastAccess object on that MemoryLayout
> > > 
> > >      private static final FastAccess FAST_ACCESS = FastAccess.of(keyValues);
> > > 
> > > Then I have access to methods getInt/getLong, ..., setInt, setLong,
> > > etc. that take a kind of ad hoc DSL describing an array of
> > > PathElement in a more compact way, and in a way that is considered
> > > a constant by the JIT, so I can write code like this
> > 
> >      try (var segment = MemorySegment.allocateNative(400)) {
> >        for (int i = 0 ; i < 100 ; i++) {
> >          MemoryAccess.setIntAtIndex(segment, i, i);
> >        }
> > 
> >        assertEquals(4, FAST_ACCESS.getInt(segment, "[].key", 2));
> >        assertEquals(3, FAST_ACCESS.getInt(segment, "[].value", 1));
> >      }
> > 
> > The prototype is here
> >    https://github.com/forax/panama-fastaccess
> > 
> > Rémi
> > 
> > > ----- Original Message -----
> > > > From: "Maurizio Cimadamore" <maurizio.cimadamore at oracle.com>
> > > > To: markus at headcrashing.eu, "panama-dev at openjdk.java.net"
> > > >     <panama-dev at openjdk.java.net>
> > > > Sent: Thursday, 25 February 2021 22:58:06
> > > > Subject: Re: AW: AW: Using MemoryAccess with structured
> > > > MemoryLayout
> > > I think I disagree on a couple of points :-)
> > > 
> > > On Thu, 2021-02-25 at 19:33 +0100, Markus KARG wrote:
> > > > Maurizio,
> > > > 
> > > > > I really appreciate your long reply, indeed, and I understand
> > > > > what you mean with "seeing from the other side".
> > > > > 
> > > > > But see, as an application vendor, I am not convinced by the
> > > > > solution you provide to the struct-member problem. Tooling is
> > > > > fine, but it should not be a MUST just to get a highly
> > > > > performant AND readable solution. Really: Most coders I know
> > > > > HATE tooling but LOVE coding. A Java core API shall in itself
> > > > > be standalone and not force tooling. This just badly smells
> > > > > like javah.
> > > > This is one of the points where I (strongly) disagree; in my
> > > > opinion there is a huge difference between what javah generates
> > > > and what jextract generates; regardless of whether you love or
> > > > hate the tools, their options, or the flavor of the code that
> > > > comes out of them, one thing is _very_ different: javah
> > > > generates C header files - jextract generates plain Java files.
> > > > The latter are ready to be included in your repository of
> > > > choice, you need zero extra work to build them and run them,
> > > > your IDE can index them, autocompletion works, etc. The same,
> > > > sadly, cannot be said about what comes out of javah - which
> > > > forces you to write some C glue code just to be able to call
> > > > simple functions like getpid.
> > > 
> > > > So, I think I cannot agree with you there - yes, they are both
> > > > tools, and they both generate code, but let's please stop and
> > > > recognize how useful and handy it is to be able to call a native
> > > > function without writing a single line of native code!
> > > 
> > > > Also, on the topic of coders loving to code, but hating tooling
> > > > - well, I think there's code and code. No matter how much you
> > > > improve the API for accessing struct members, there is still a
> > > > significant amount of information that has to be derived from
> > > > the header files; if you take a look at libraries like this:
> > > 
> > > http://www.jcuda.org/jcuda/doc/index.html
> > > 
> > > (hat tip to Marco Hutter who has the patience and will power to
> > > maintain it :-) )
> > > 
> > > > I don't think that many developers would want to write code
> > > > like this? Which is why JCuda exists in the first place, so that
> > > > they don't have to! So, yes, tooling has a bad rep, I get it,
> > > > but problems like these can only be solved with tools.
> > > > 
> > > > And if you only need to interact with a few structs - then, it
> > > > shouldn't matter too much if it takes some code to get there?
> > > 
> > > 
> > > >   Infact, what you describe with "writing your own wrapper"
> > > > TaggedValues.setValue(segment, 42) looks pretty well, but why
> > > > shall
> > > > EVERY programmer reinvent the wheel here? From a new API that
> > > > provides nice wrappers for single fields I actually do expect
> > > > that it
> > > > also provides a just-as-nice way for structs, too. So speaking
> > > > of
> > > > what API users do expect, from a high level view, would be NOT
> > > > writing customs wrappers around VarHandle, NEITHER using
> > > > tooling, but
> > > > just using a simple, standard API for that:
> > > > 
> > > > /*
> > > >   * Define a memory layout as a Java-view on a C struct
> > > >   * Use MemoryAccess static methods to most easily (and in high
> > > > performance) push values into the struct members
> > > >   */
> > > > MemoryLayout struct = ...;
> > > > MemoryAccess.setInt(structMember, value); // yes, THIS short!
> > > Well, sure, I'd like to be able to write less code, and have it
> > > run
> > > even faster :-) but from what you write, I'm having trouble
> > > picturing
> > > what exactly you are proposing.
> > > 
> > > > What is structMember? Is it a layout? Is it a segment? Is it
> > > > both? If it's both, I think I already replied as to why that's
> > > > not great API-wise. A layout is fundamentally a _static_
> > > > abstraction - it exists as defined somewhere (a header file, a
> > > > protobuf file, somewhere) and there it lies. A memory segment is
> > > > a _dynamic_ entity: it's allocated, it's freed, it's sliced,
> > > > it's traversed with a spliterator... layouts are used _at the
> > > > boundaries_ (e.g. if you need to know how much to allocate) -
> > > > but a segment is just a bunch of bytes, and that's a big part of
> > > > what makes the access API efficient and universally applicable.
> > > 
> > > > But regardless, I think I've also explained how, from the
> > > > nitty-gritty performance perspective, the API you propose
> > > > doesn't really make sense in the JVM we have (where, to be
> > > > optimized, var handles/method handles have to be constant static
> > > > fields). It _might_ (or not!) make sense in the VM we'll have 5
> > > > years from now, and if it does, rest assured that we'll circle
> > > > back to this, but that's a story for another day, I think.
> > > 
> > > > There's no magic trick we can pull out of the hat here - it's a
> > > > choice between having a high-level API (which performs horribly)
> > > > or having a lower-level API which performs well, _and that can
> > > > be specialized_ for the use cases that you want to work with.
> > > 
> > > > I understand you feel that's not enough; we believe that, when
> > > > combined, the Foreign Memory Access API and the Foreign Linker
> > > > API significantly enhance Java's ability to interop with native
> > > > memory and libraries, and to do so in 100% Java code.
> > > 
> > > 
> > > > As a case of "eating your own dog's food": our jextract tool
> > > > works on top of LLVM/libclang; in fact, jextract was built on
> > > > top of a handwritten JNI port of libclang. Over the last year or
> > > > so, we replaced this ad-hoc JNI code with 100% auto-generated
> > > > foreign-linker bindings. To be honest we never looked back. It's
> > > > (far) easier to maintain, and when libclang gets new goodies, we
> > > > just run the tool again and commit the sources: all the new
> > > > functions/structs/constants are there.
> > > 
> > > > Even at the beginning, when the performance of the foreign
> > > > linker API wasn't great (Panama used to be several Xs slower
> > > > than JNI at calling native functions, and then some more Xs
> > > > slower when calling Java code back from native), it was still
> > > > worth it, because the difference in performance was negligible
> > > > overall compared to how much time we saved by no longer having
> > > > to maintain that JNI port.
> > > 
> > > > Now that the linker API is, performance-wise, on a more solid
> > > > footing (for downcalls; for upcalls we're about to get
> > > > significantly faster than JNI [1]), I honestly don't see many
> > > > reasons as to why one should stick with JNI - other than
> > > > compatibility (which in some contexts can be a big one, I know).
> > > > Yes, the new APIs might be a little on the low-level side, but
> > > > at least you know we tried hard to squeeze every ounce of
> > > > available power into them :-)
> > > 
> > > Maurizio
> > > 
> > > [1] - https://github.com/openjdk/panama-foreign/pull/457
> > > 
> > > > > So the easiest way to get the structMember is to get a
> > > > > MemorySegment simply by its NAME.
> > > > > Hence, what I REALLY miss the most in the Memory Access API is
> > > > > a simple command to get a named MemorySegment from a
> > > > > structured MemorySegment root plus a String name -- without
> > > > > using VarHandle or tooling. :-)
> > > > 
> > > > -Markus
> > > >   
> > > > 
> > > > > -----Original Message-----
> > > > > From: Maurizio Cimadamore [mailto:maurizio.cimadamore at oracle.com]
> > > > > Sent: Thursday, 25 February 2021 19:13
> > > > > To: markus at headcrashing.eu; panama-dev at openjdk.java.net
> > > > > Subject: Re: AW: Using MemoryAccess with structured
> > > > > MemoryLayout
> > > > 
> > > > On Thu, 2021-02-25 at 18:31 +0100, Markus KARG wrote:
> > > > > Maurizio,
> > > > > 
> > > > > thank you for your kind answer.
> > > > > 
> > > > > > Yes, indeed I am already using VarHandle currently, but
> > > > > > actually I like the idea of MemoryAccess more, as the code
> > > > > > looks a bit simpler to me.
> > > > > > 
> > > > > > What I envision is something like doing this instead, as it
> > > > > > spares one code line (the actual invocation of the
> > > > > > VarHandle):
> > > > > 
> > > > > ```
> > > > > > MemorySegment valueSegment = taggedValues.memorySegment(
> > > > > >         PathElement.sequenceElement(3),
> > > > > >         PathElement.groupElement("value"));
> > > > > MemoryAccess.setInt(valueSegment, someInteger);
> > > > > ```
> > > > > 
> > > > > It would be cool to have this additional possibility, as it
> > > > > makes
> > > > > using structs rather simple compared to the VarHandle way.
> > > > Hi,
> > > > > I see that you would like to somehow attach the layout to the
> > > > > segment - but layouts and segments are orthogonal, and for good
> > > > > reasons.
> > > > 
> > > > > First, when accessing a segment you might not always know the
> > > > > layout of the thing being accessed - in a lot of cases access
> > > > > is much more ad-hoc.
> > > > 
> > > > > Second, if a notion of layout is always associated with a
> > > > > segment, you end up in a place where, in order to slice a
> > > > > segment, you probably have to follow that operation with some
> > > > > kind of "cast" (e.g. where you set the layout of the slice to
> > > > > something else). We've been there with a past incarnation of
> > > > > the Panama API, and, while an API like the one you describe is
> > > > > probably more suited to closely model a C pointer type, that
> > > > > API is not very "primitive" - meaning that it is quite useless
> > > > > if you start using a memory segment in a more buffer-like way.
> > > > 
> > > > > Note that not _all_ the users of the Memory Access API are
> > > > > interested in native interop - many just want to be able to
> > > > > allocate slabs of native memory, and free them
> > > > > deterministically. So, the more baggage we add, the more those
> > > > > non-linker use cases become bloated with unnecessary overhead.
> > > > 
> > > > Third, I imagine that you would like a method like this:
> > > > 
> > > > MemoryAccess.setIntAtLayout(valueSegment, someInteger,
> > > > PathElement...)
> > > > 
> > > > > E.g. you want/need to specify a path into the segment to
> > > > > obtain one of the leaves (otherwise I don't see how the runtime
> > > > > can infer which element you wanna access). But here we rub
> > > > > against another big problem: VarHandle (and MethodHandle) work
> > > > > best when they are _constants_, e.g. declared as static final
> > > > > variables in your code. When that happens, the VM is able to
> > > > > inline all the var handle goo away, and optimize the code
> > > > > enough that accessing a segment in a tight loop will often
> > > > > result in a sequence of unrolled MOV instructions (in some
> > > > > cases you can even see auto-vectorization kicking in).
> > > > 
> > > > > If the VarHandle is not constant - well, none of these
> > > > > optimizations will occur - meaning that your memory access
> > > > > will easily be 10x slower. The reason MemoryAccess works is
> > > > > that it relies on a number of predefined VarHandles which are
> > > > > created as static constants under the hood, once and for all.
> > > > 
> > > > But your API would require a _fresh_ VarHandle to be created on
> > > > every
> > > > call, based on the coordinates passed in. Hence, the var handle
> > > > would
> > > > not be constant, and performance would suffer big time.
> > > > 
> > > > > The fine line we're walking in this project is to expose the
> > > > > tools and the knobs which allow clients to perform memory
> > > > > access/foreign function access in the fastest possible way we
> > > > > know of/is possible within the JVM. To do that, sometimes (not
> > > > > always) we have to "look the other way" when it comes to
> > > > > usability - simply because it would be impossible to have an
> > > > > API that is both 100% efficient and 100% usable.
> > > > 
> > > > > The main trick that users can adopt in these cases is to
> > > > > mediate access; that is, if there is a particular kind of
> > > > > struct that you want to operate with, nothing prevents you
> > > > > from declaring _your own_ MemoryAccess-like abstraction that
> > > > > works for specific fields of that struct - e.g.
> > > > 
> > > > TaggedValues.setValue(segment, 42);
> > > > 
> > > > > TaggedValues will have constant method handles (one for each
> > > > > field), and a bunch of accessors (a pair for each field).
> > > > > There is nothing magic in MemoryAccess - it's just shorthand
> > > > > for accessing ubiquitous primitive types. There's no reason
> > > > > users cannot replicate the same idiom in their code - so that
> > > > > clients will be _both_ fast AND usable/readable.
> > > > 
> > > > > Of course, when working with bigger libraries, there might be
> > > > > many structs to work with, and manually defining a "wrapper
> > > > > static class" for each struct might prove too tedious. But
> > > > > that's why we're investing in tooling: that's exactly the job
> > > > > that jextract does: it parses a complex C header and turns it
> > > > > into a bunch of static declarations which help you access your
> > > > > native API more quickly (as the boilerplate has been generated
> > > > > for you) and more safely (as the static wrappers will avoid
> > > > > direct VarHandle usage, which can sometimes be "sharp").
> > > > 
> > > > > Even at the jextract level, we are aware that some people
> > > > > would expect an API that is closer to the C world (e.g. a
> > > > > `Pointer` type? Struct wrappers?) - but again here our approach
> > > > > is to enable people to write code which targets the library
> > > > > they wanna use quickly (e.g. way faster than using JNI), but
> > > > > w/o introducing unnecessary translation steps in the middle -
> > > > > which would make the bindings too slow for some advanced use
> > > > > cases.
> > > > 
> > > > I apologize for the (too) big reply - I hope you find it
> > > > helpful to
> > > > understand the "why not" part of your earlier question.
> > > > 
> > > > Cheers
> > > > Maurizio
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > > -Markus
> > > > > 
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Maurizio Cimadamore [mailto:maurizio.cimadamore at oracle.com]
> > > > > > Sent: Thursday, 25 February 2021 17:18
> > > > > > To: markus at headcrashing.eu; panama-dev at openjdk.java.net
> > > > > > Subject: Re: Using MemoryAccess with structured MemoryLayout
> > > > > 
> > > > > Hi Markus,
> > > > > to read inside the struct, you can:
> > > > > 
> > > > > > * use the MemoryAccess API - but doing so is limited - e.g.
> > > > > > MemoryAccess only supports access by physical offset or
> > > > > > logical index.
> > > > > 
> > > > > * create your own VarHandle which points to the desired part
> > > > > of the
> > > > > layout, and use that
> > > > > 
> > > > > Here:
> > > > > 
> > > > > > https://download.java.net/java/early_access/jdk16/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/MemoryLayout.html
> > > > > 
> > > > > More specifically:
> > > > > 
> > > > > 
> > > > > ```
> > > > > > SequenceLayout taggedValues = MemoryLayout.ofSequence(5,
> > > > > >      MemoryLayout.ofStruct(
> > > > > >          MemoryLayout.ofValueBits(8, ByteOrder.nativeOrder()).withName("kind"),
> > > > > >          MemoryLayout.ofPaddingBits(24),
> > > > > >          MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("value")
> > > > > >      )
> > > > > > ).withName("TaggedValues");
> > > > > 
> > > > > ```
> > > > > 
> > > > > And
> > > > > 
> > > > > ```
> > > > > > VarHandle valueHandle = taggedValues.varHandle(int.class,
> > > > > >         PathElement.sequenceElement(),
> > > > > >         PathElement.groupElement("value"));
> > > > > ```
> > > > > 
> > > > > Cheers
> > > > > Maurizio
> > > > > 
> > > > > 
> > > > > On Thu, 2021-02-25 at 17:01 +0100, Markus KARG wrote:
> > > > > > > On Windows, many API functions take C structs as parameters.
> > > > > > 
> > > > > > It is rather straightforward to set up a structured
> > > > > > MemoryLayout.
> > > > > > 
> > > > > > > In case I want to easily poke bytes into that struct, I'd
> > > > > > > like to use MemoryAccess.
> > > > > > > 
> > > > > > > Unfortunately, there seems to be no EASY / SIMPLE way to
> > > > > > > write:
> > > > > > 
> > > > > > MemoryAccess.setIntAt(MEMBER_OF_SUCH_A_STRUCT,
> > > > > > VALUE_OF_THAT_MEMBER);
> > > > > > 
> > > > > > > ...or I failed to spot it in the JavaDocs.
> > > > > > 
> > > > > > Is this possible? If yes, how? If not, why not?
> > > > > > 
> > > > > > -Markus
> > > > > > 
> > > > > >