AW: AW: Using MemoryAccess with structured MemoryLayout

Fri Feb 26 13:30:14 UTC 2021

----- Original Message -----
> From: "Jorn Vernee" <jorn.vernee at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>, "Maurizio Cimadamore" <maurizio.cimadamore at oracle.com>
> Cc: "markus" <markus at headcrashing.eu>, "panama-dev at openjdk.java.net" <panama-dev at openjdk.java.net>
> Sent: Friday, 26 February 2021 11:14:55
> Subject: Re: AW: AW: Using MemoryAccess with structured MemoryLayout

> Hi Rémi,

Hi Jorn,

> 
> I see trusted record fields are already being put to good use ;)

yep :)

> 
> Using lazily initialized MutableCallSites is an interesting way to fold
> multiple access shapes into the same capability object.

I think that using @Stable is better because you can lazily initialize the whole method handle; here i still need to initialize the MutableCallSite.
But as a user i don't have access to @Stable.
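
For readers who have not used the pattern, here is a minimal sketch of a lazily initialized MutableCallSite. It is not taken from the FastAccess prototype; the class and method names (LazyCallSiteSketch, slowPath, twice) are hypothetical, and the "real" target is a trivial stand-in for whatever handle the first call would compute.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical sketch: the call site starts on a slow path that installs
// the real target on first use, so later calls go straight to the target.
final class LazyCallSiteSketch {
  // Counts slow-path executions (for demonstration only).
  static int initCount = 0;

  private static final MutableCallSite CALL_SITE;
  private static final MethodHandle INVOKER;

  static {
    try {
      MethodType type = MethodType.methodType(int.class, int.class);
      CALL_SITE = new MutableCallSite(type);
      // The call site object itself must exist up front; only its target is lazy.
      CALL_SITE.setTarget(
          MethodHandles.lookup().findStatic(LazyCallSiteSketch.class, "slowPath", type));
      INVOKER = CALL_SITE.dynamicInvoker();
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // First call lands here: compute the real handle, install it, then delegate.
  private static int slowPath(int x) throws Throwable {
    initCount++;
    MethodHandle real = MethodHandles.lookup().findStatic(
        LazyCallSiteSketch.class, "twice", MethodType.methodType(int.class, int.class));
    CALL_SITE.setTarget(real);
    return (int) real.invokeExact(x);
  }

  // Stand-in for the "real" operation.
  private static int twice(int x) {
    return x * 2;
  }

  static int call(int x) {
    try {
      return (int) INVOKER.invokeExact(x);
    } catch (Throwable t) {
      throw new IllegalStateException(t);
    }
  }

  public static void main(String[] args) {
    System.out.println(call(21));  // slow path runs, then the real target
    System.out.println(call(5));   // real target only
    System.out.println(initCount);
  }
}
```

The sketch ignores cross-thread visibility of the new target (see MutableCallSite.syncAll), which a real implementation would have to consider.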

> I thought you'd end up with a chain of ifs to check the shape at each use-site if you
> have multiple different shapes spread over different use-sites, but I
> guess the path String having to be constant makes sure that those checks
> are folded away as well. Very clever :)

yep, the JIT removes all the useless branches at each call site.
That said, you can hit a performance cliff if you create too many SSA nodes: the JIT will just stop and it will not be pretty.

> 
> To me, this validates the work we've done on the memory access API
> over the past year and a half, which for a large part was about "finding
> the right primitive" to add to the JDK. If the foundation is solid, it
> opens up all kinds of possibilities for building other things on top,
> such as this FastAccess example.

yes, i fully agree,
internally i'm standing on the shoulders of giants, just using the MemoryLayout/PathElement/VarHandle objects.

> 
> It also opens up the opportunity to implement some of these
> middle-ground APIs in the JDK, whether it's something like this, or
> something like the old Panama binder re-implemented on top of the
> current API. There seem to be many things to choose from here. I think
> we'll have to see though; each API layer we add requires maintenance,
> and it's not worth it if users just end up spinning their own thing in
> the end, because they wanted a different API shape.
> 
> For now, we're still finalizing the basement :)
> 

Rémi

> Jorn
> 
> On 25/02/2021 23:46, Remi Forax wrote:
>> [sneaking into this conversation]
>>
>> While i agree that the state of the basement is now far better with panama than
>> with JNI,
>> I also think you can have a kind of middle-ground API that is based on
>> MemoryLayout but proposes a slightly higher-level API than just MemoryAccess.
>>
>> Something like this,
>> i can describe my layout
>>
>>      SequenceLayout keyValues = MemoryLayout.ofSequence(
>>          MemoryLayout.ofStruct(
>>              MemoryLayout.ofValueBits(32, nativeOrder()).withName("key"),
>>              MemoryLayout.ofValueBits(32, nativeOrder()).withName("value")
>>          )
>>      ).withName("KeyValues");
>>
>>
>> and then create a FastAccess object on that MemoryLayout
>>
>>      private static final FastAccess FAST_ACCESS = FastAccess.of(keyValues);
>>
>>      
>> then i have access to methods getInt/getLong, ..., setInt, setLong etc. that take
>> a kind of ad hoc DSL that describes an array of PathElement in a more compact
>> way and in a way that is considered a constant by the JIT, so i can write
>> code like this
>>
>>      try (var segment = MemorySegment.allocateNative(400)) {
>>        for (int i = 0 ; i < 100 ; i++) {
>>          MemoryAccess.setIntAtIndex(segment, i, i);
>>        }
>>
>>        assertEquals(4, FAST_ACCESS.getInt(segment, "[].key", 2));
>>        assertEquals(3, FAST_ACCESS.getInt(segment, "[].value", 1));
>>      }
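
To see why the two assertions above hold, here is a toy sketch of the offset arithmetic behind a path such as "[].key". It is not how the prototype is implemented: it uses a plain ByteBuffer instead of a MemorySegment, and PathOffsetSketch, offsetOf and the hard-coded field table are hypothetical names standing in for what a MemoryLayout would provide.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;

// Hypothetical stand-in for the KeyValues layout: each element is a struct
// of two 32-bit ints, "key" at byte 0 and "value" at byte 4.
final class PathOffsetSketch {
  private static final int ELEMENT_SIZE = 8;
  private static final Map<String, Integer> FIELD_OFFSETS = Map.of("key", 0, "value", 4);

  // Resolves a path of the form "[].field" plus an element index to a byte offset.
  static int offsetOf(String path, int index) {
    if (!path.startsWith("[].")) {
      throw new IllegalArgumentException("unsupported path: " + path);
    }
    Integer fieldOffset = FIELD_OFFSETS.get(path.substring(3));
    if (fieldOffset == null) {
      throw new IllegalArgumentException("unknown field in path: " + path);
    }
    return index * ELEMENT_SIZE + fieldOffset;
  }

  static int getInt(ByteBuffer buffer, String path, int index) {
    return buffer.getInt(offsetOf(path, index));
  }

  public static void main(String[] args) {
    // Same setup as the quoted test: 100 consecutive ints 0..99.
    ByteBuffer buffer = ByteBuffer.allocate(400).order(ByteOrder.nativeOrder());
    for (int i = 0; i < 100; i++) {
      buffer.putInt(i * 4, i);
    }
    System.out.println(getInt(buffer, "[].key", 2));   // byte offset 16, i.e. int number 4
    System.out.println(getInt(buffer, "[].value", 1)); // byte offset 12, i.e. int number 3
  }
}
```

Element 2's key sits at byte 2 * 8 + 0 = 16, which is the fifth int written (value 4); element 1's value sits at byte 1 * 8 + 4 = 12, the fourth int written (value 3).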
>>
>> The prototype is here
>>    https://github.com/forax/panama-fastaccess
>>
>> Rémi
>>
>> ----- Original Message -----
>>> From: "Maurizio Cimadamore" <maurizio.cimadamore at oracle.com>
>>> To: markus at headcrashing.eu, "panama-dev at openjdk.java.net"
>>> <panama-dev at openjdk.java.net>
>>> Sent: Thursday, 25 February 2021 22:58:06
>>> Subject: Re: AW: AW: Using MemoryAccess with structured MemoryLayout
>>> I think I disagree on a couple of points :-)
>>>
>>> On Thu, 2021-02-25 at 19:33 +0100, Markus KARG wrote:
>>>> Maurizio,
>>>>
>>>> I really appreciate your long reply, indeed, and I understand what
>>>> you mean with "seeing from the other side".
>>>>
>>>> But see, as an application vendor, I am not convinced by the solution
>>>> you provide to the struct-member-problem. Tooling is fine, but it
>>>> should not be a MUST just to get a highly performant AND readable
>>>> solution. Really: Most coders I know HATE tooling but LOVE coding. A
>>>> Java core API should stand on its own and not force tooling.
>>>> This just badly smells like javah.
>>> This is one of the points where I (strongly) disagree; in my opinion
>>> there is a huge difference between what javah generates and what
>>> jextract generates; regardless of whether you love or hate the tool,
>>> their options, or the flavor of the code that comes out of them, one
>>> thing is _very_ different: javah generates C header files - jextract
>>> generates plain Java files. The latter are ready to be included in your
>>> repository of choice, you need zero extra work to build them and run
>>> them, your IDE can index them, autocompletion works, etc. The same,
>>> sadly, cannot be said about what comes out of javah - which forces you
>>> to write some C glue code just to be able to call simple functions like
>>> getpid.
>>>
>>> So, I think I cannot agree with you there - yes, they are both tools,
>>> and they both generate code, but let's please stop and recognize how
>>> useful and handy it is to be able to call a native function without
>>> writing a single line of native code!
>>>
>>> Also, on the topic of coders loving to code, but hating tooling - well,
>>> I think there's code and code. No matter how you can improve the API
>>> for accessing struct members, there is still a significant amount of
>>> information that has to be derived from the header files; if you take a
>>> look at libraries like this:
>>>
>>> http://www.jcuda.org/jcuda/doc/index.html
>>>
>>> (hat tip to Marco Hutter who has the patience and will power to
>>> maintain it :-) )
>>>
>>> I don't think that many developers would want to write code like this?
>>> Which is why JCuda exists in the first place, so that they don't have
>>> to! So, yes, tooling has a bad rep, I get it, but problems like these
>>> can only be solved with tools.
>>>
>>> And if you only need to interact with few structs - then, it shouldn't
>>> matter too much if it takes some code to get there?
>>>
>>>
>>>>   In fact, what you describe with "writing your own wrapper"
>>>> TaggedValues.setValue(segment, 42) looks pretty well, but why shall
>>>> EVERY programmer reinvent the wheel here? From a new API that
>>>> provides nice wrappers for single fields I actually do expect that it
>>>> also provides a just-as-nice way for structs, too. So speaking of
>>>> what API users do expect, from a high level view, would be NOT
>>>> writing custom wrappers around VarHandle, NOR using tooling, but
>>>> just using a simple, standard API for that:
>>>>
>>>> /*
>>>>   * Define a memory layout as a Java-view on a C struct
>>>>   * Use MemoryAccess static methods to most easily (and in high
>>>>   * performance) push values into the struct members
>>>>   */
>>>> MemoryLayout struct = ...;
>>>> MemoryAccess.setInt(structMember, value); // yes, THIS short!
>>> Well, sure, I'd like to be able to write less code, and have it run
>>> even faster :-) but from what you write, I'm having trouble picturing
>>> what exactly you are proposing.
>>>
>>> What is structMember? Is it a layout? Is it a segment? Is it both? If
>>> it's both, I think I already replied as to why that's not great API-
>>> wise. A layout is fundamentally a _static_ abstraction - it exists as
>>> defined somewhere (a header file, a protobuf file, somewhere) and
>>> there it lies. A memory segment is a _dynamic_ entity: it's allocated,
>>> it's freed, it's sliced, it's traversed with a spliterator... layouts are
>>> used _at the boundaries_ (e.g. if you need to know how much to
>>> allocate) - but a segment is just a bunch of bytes, and that's a big
>>> part in what makes the access API efficient and universally applicable.
>>>
>>> But regardless, I think I've also explained how, from the nitty-gritty
>>> performance perspective, the API you propose doesn't really make sense
>>> in the JVM we have (where, to be optimized, var handles/method handles
>>> have to be constant static fields). It _might_ (or not!) make sense in
>>> the VM we'll have 5 years from now, and if it does, rest assured that
>>> we'll circle back to this, but that's a story for another day, I think.
>>>
>>> There's no magic trick we can pull out of the hat here - it's a choice
>>> between having a high-level API (which performs horribly) or have a
>>> lower-level API which performs well, _and that can be specialized_ for
>>> the use cases that you want to work with.
>>>
>>> I understand you feel that's not enough; we believe that, when combined,
>>> the Foreign Memory Access API and the Foreign Linker API
>>> significantly enhance Java's ability to interop with native memory and
>>> libraries, and to do so in 100% Java code.
>>>
>>>
>>> As a case of "eating your own dog food": our jextract tool works on
>>> top of LLVM/libclang; in fact, jextract was built on top of a
>>> handwritten JNI port of libclang. Over the last year or so, we
>>> replaced this ad-hoc JNI code with 100% auto-generated foreign-linker
>>> bindings. To be honest we never looked back. It's (far) easier to
>>> maintain, and when libclang gets new goodies, we just run the tool
>>> again and commit the sources: all the new functions/structs/constants
>>> are there.
>>>
>>> Even at the beginning, when the performance of the foreign linker API
>>> wasn't great (Panama used to be several Xs slower than JNI at calling
>>> native functions, and then some more Xs slower when calling Java code
>>> back from native), it was still worth it, because the difference in
>>> performance was negligible overall compared to how much time we saved
>>> by no longer having to maintain that JNI port.
>>>
>>> Now that the linker API is, performance-wise, on a more solid footing
>>> (for downcalls, for upcalls we're about to get significantly faster
>>> than JNI [1]), I honestly don't see many reasons as to why one should
>>> stick with JNI - other than compatibility (which in some contexts can
>>> be a big one, I know). Yes, the new APIs might be a little on the low-
>>> level side, but at least you know we tried hard to squeeze every ounce
>>> of available power into them :-)
>>>
>>> Maurizio
>>>
>>> [1] - https://github.com/openjdk/panama-foreign/pull/457
>>>
>>>> So the most easy way to get the structMember is to get a
>>>> MemorySegment simply by its NAME.
>>>> Hence, what I REALLY miss the most in the Memory Access API is a
>>>> simple command to get a named MemorySegment from a structured
>>>> MemorySegment root plus a String name -- without using VarHandle or
>>>> tooling. :-)
>>>>
>>>> -Markus
>>>>   
>>>>
>>>> -----Original Message-----
>>>> From: Maurizio Cimadamore [mailto:maurizio.cimadamore at oracle.com]
>>>> Sent: Thursday, 25 February 2021 19:13
>>>> To: markus at headcrashing.eu; panama-dev at openjdk.java.net
>>>> Subject: Re: AW: Using MemoryAccess with structured MemoryLayout
>>>>
>>>> On Thu, 2021-02-25 at 18:31 +0100, Markus KARG wrote:
>>>>> Maurizio,
>>>>>
>>>>> thank you for your kind answer.
>>>>>
>>>>> Yes, indeed I am already using VarHandle currently, but actually I
>>>>> like the idea of MemoryAccess more, as the code looks a bit simpler
>>>>> to me.
>>>>>
>>>>> What I envision is something like doing this instead, as it spares
>>>>> one code line (the actual invocation of the VarHandle):
>>>>>
>>>>> ```
>>>>> MemorySegment valueSegment = taggedValues.memorySegment(
>>>>>                                        PathElement.sequenceElement(3),
>>>>>                                        PathElement.groupElement("value"));
>>>>> MemoryAccess.setInt(valueSegment, someInteger);
>>>>> ```
>>>>>
>>>>> It would be cool to have this additional possibility, as it makes
>>>>> using structs rather simple compared to the VarHandle way.
>>>> Hi,
>>>> I see that you would like to somehow attach the layout to the segment -
>>>> but layouts and segments are orthogonal, and for good reasons.
>>>>
>>>> First, when accessing a segment you might not always know the
>>>> layout of the thing being accessed - in a lot of cases access is much
>>>> more ad-hoc.
>>>>
>>>> Second, if a notion of layout is always associated with a segment, you
>>>> end up in a place where, in order to slice a segment, you probably have
>>>> to follow that operation with some kind of "cast" (e.g. where you set
>>>> the layout of the slice to something else). We've been there with a
>>>> past incarnation of the Panama API, and, while an API like the one you
>>>> describe is probably more suited to closely model a C pointer type,
>>>> that API is not very "primitive" - meaning that it is quite useless if
>>>> you start using a memory segment in a more buffer-like way.
>>>>
>>>> Note that not _all_ the users of the Memory Access API are interested
>>>> in native interop - many just want to be able to allocate slabs of
>>>> native memory, and free them deterministically. So, the more baggage we
>>>> add, the more those non-linker use cases become bloated with unnecessary
>>>> overhead.
>>>>
>>>> Third, I imagine that you would like a method like this:
>>>>
>>>> MemoryAccess.setIntAtLayout(valueSegment, someInteger,
>>>> PathElement...)
>>>>
>>>> E.g. you want/need to specify a path into the segment to obtain one of
>>>> the leaves (otherwise I don't see how the runtime can infer which
>>>> element you wanna access). But here we rub against another big problem:
>>>> VarHandle (and MethodHandle) work best when they are _constants_ e.g.
>>>> declared as static final variables in your code. When that happens, the
>>>> VM is able to inline all the var handle goo away, and optimize the code
>>>> enough that accessing a segment in a tight loop will often result in a
>>>> sequence of unrolled MOV instructions (in some cases you can even see
>>>> auto-vectorization kicking in).
>>>>
>>>> If the VarHandle is not constant - well, none of these optimizations
>>>> will occur - meaning that your memory access will easily be 10x slower.
>>>> The reason MemoryAccess works is that it works on a number of
>>>> predefined VarHandles which are created as static constants under the
>>>> hood, once and for all.
>>>>
>>>> But your API would require a _fresh_ VarHandle to be created on every
>>>> call, based on the coordinates passed in. Hence, the var handle would
>>>> not be constant, and performance would suffer big time.
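
The two shapes being contrasted can be sketched as follows. This is only an illustration under stated assumptions: it uses MethodHandles.byteBufferViewVarHandle over a ByteBuffer rather than the incubating segment API, and the class name HandleConstancySketch is hypothetical. Both methods return the same value; the difference described above is only in what the JIT can treat as a constant.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class HandleConstancySketch {
  // Constant shape: created once, held in a static final field.
  private static final VarHandle INT_HANDLE =
      MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

  // Access through the static final handle (the shape the thread recommends).
  static int readWithConstantHandle(ByteBuffer buffer, int byteOffset) {
    return (int) INT_HANDLE.get(buffer, byteOffset);
  }

  // Access through a handle created on every call: functionally equivalent,
  // but the handle is not a constant from the JIT's point of view.
  static int readWithFreshHandle(ByteBuffer buffer, int byteOffset) {
    VarHandle fresh =
        MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());
    return (int) fresh.get(buffer, byteOffset);
  }

  public static void main(String[] args) {
    ByteBuffer buffer = ByteBuffer.allocate(16).order(ByteOrder.nativeOrder());
    buffer.putInt(8, 1234);
    System.out.println(readWithConstantHandle(buffer, 8));
    System.out.println(readWithFreshHandle(buffer, 8));
  }
}
```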
>>>>
>>>> The fine line we're walking in this project is to expose the tools and
>>>> the knobs which allow clients to perform memory access/foreign function
>>>> access in the fastest possible way we know of/is possible within the
>>>> JVM. To do that, sometimes (not always) we have to "look the other way"
>>>> when it comes to usability - simply because it would be impossible to
>>>> have an API that is both 100% efficient and 100% usable.
>>>>
>>>> The main trick that users can adopt in these cases, is to mediate
>>>> access; that is, if there is a particular kind of struct that you want
>>>> to operate with, nothing prevents you from declaring _your own_
>>>> MemoryAccess-like abstraction that works for specific fields of that
>>>> struct - e.g.
>>>>
>>>> TaggedValues.setValue(segment, 42);
>>>>
>>>> TaggedValues will have constant method handles (one for each field),
>>>> and a bunch of accessors (a pair for each field). There is nothing
>>>> magic in MemoryAccess - it's just shorthand for accessing ubiquitous
>>>> primitive types. There's no reason users cannot replicate the same
>>>> idiom in their code - so that clients will be _both_ fast AND
>>>> usable/readable.
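
A rough sketch of what such a wrapper might look like, under loud assumptions: it uses a ByteBuffer and MethodHandles.byteBufferViewVarHandle instead of MemorySegment and the incubating layout API, it hard-codes the TaggedValues offsets (8-bit kind, 24 bits of padding, 32-bit value), and it adds an element index parameter that the one-line example above does not have.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical hand-written wrapper: one constant handle, one accessor pair per field.
final class TaggedValues {
  private static final int STRUCT_SIZE = 8;   // 8 + 24 + 32 bits
  private static final int KIND_OFFSET = 0;   // byte offset of "kind"
  private static final int VALUE_OFFSET = 4;  // byte offset of "value"

  // Static final, so the JIT can treat the handle as a constant.
  private static final VarHandle VALUE =
      MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

  private TaggedValues() {}

  static void setValue(ByteBuffer buffer, int index, int value) {
    VALUE.set(buffer, index * STRUCT_SIZE + VALUE_OFFSET, value);
  }

  static int getValue(ByteBuffer buffer, int index) {
    return (int) VALUE.get(buffer, index * STRUCT_SIZE + VALUE_OFFSET);
  }

  static void setKind(ByteBuffer buffer, int index, byte kind) {
    buffer.put(index * STRUCT_SIZE + KIND_OFFSET, kind);
  }

  static byte getKind(ByteBuffer buffer, int index) {
    return buffer.get(index * STRUCT_SIZE + KIND_OFFSET);
  }

  public static void main(String[] args) {
    ByteBuffer buffer = ByteBuffer.allocate(5 * STRUCT_SIZE); // room for 5 elements
    setKind(buffer, 3, (byte) 1);
    setValue(buffer, 3, 42);
    System.out.println(getKind(buffer, 3) + " " + getValue(buffer, 3));
  }
}
```

Call sites then read as TaggedValues.setValue(buffer, 3, 42), with the handle and offsets hidden behind the wrapper.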
>>>>
>>>> Of course, when working with bigger libraries, there might be many
>>>> structs to work with, and manually defining a "wrapper static class"
>>>> for each struct might prove too tedious. But that's why we're investing
>>>> in tooling: that's exactly the job that jextract does: it parses a
>>>> complex C header and turns it into a bunch of static declarations which
>>>> help you access your native API more quickly (as the boilerplate has
>>>> been generated for you) and more safely (as the static wrappers will
>>>> avoid direct VarHandle usage, which can sometimes be "sharp").
>>>>
>>>> Even at the jextract level, we are aware that some people would expect
>>>> an API that is closer to the C world (e.g. a `Pointer` type? Struct
>>>> wrappers?) - but again here our approach is to enable people to write
>>>> code which targets the library they wanna use quickly (e.g. way faster
>>>> than using JNI), but w/o introducing unnecessary translation steps in
>>>> the middle - which would make the bindings too slow for some advanced
>>>> use cases.
>>>>
>>>> I apologize for the (too) big reply - I hope you find it helpful to
>>>> understand the "why not" part of your earlier question.
>>>>
>>>> Cheers
>>>> Maurizio
>>>>
>>>>> -Markus
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Maurizio Cimadamore [mailto:maurizio.cimadamore at oracle.com]
>>>>> Sent: Thursday, 25 February 2021 17:18
>>>>> To: markus at headcrashing.eu; panama-dev at openjdk.java.net
>>>>> Subject: Re: Using MemoryAccess with structured MemoryLayout
>>>>>
>>>>> Hi Markus,
>>>>> to read inside the struct, you can:
>>>>>
>>>>> * use the MemoryAccess API - but doing so is limited - e.g.
>>>>> MemoryAccess only supports access by physical offset or logical index.
>>>>>
>>>>> * create your own VarHandle which points to the desired part of the
>>>>> layout, and use that
>>>>>
>>>>> Here:
>>>>>
>>>>> https://download.java.net/java/early_access/jdk16/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/MemoryLayout.html
>>>>>
>>>>> More specifically:
>>>>>
>>>>>
>>>>> ```
>>>>> SequenceLayout taggedValues = MemoryLayout.ofSequence(5,
>>>>>      MemoryLayout.ofStruct(
>>>>>          MemoryLayout.ofValueBits(8, ByteOrder.nativeOrder()).withName("kind"),
>>>>>          MemoryLayout.ofPaddingBits(24),
>>>>>          MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("value")
>>>>>      )
>>>>> ).withName("TaggedValues");
>>>>>
>>>>> ```
>>>>>
>>>>> And
>>>>>
>>>>> ```
>>>>> VarHandle valueHandle = taggedValues.varHandle(int.class,
>>>>>                                                 PathElement.sequenceElement(),
>>>>>                                                 PathElement.groupElement("value"));
>>>>> ```
>>>>>
>>>>> Cheers
>>>>> Maurizio
>>>>>
>>>>>
>>>>> On Thu, 2021-02-25 at 17:01 +0100, Markus KARG wrote:
>>>>>> On Windows, many API functions take C structs as parameters.
>>>>>>
>>>>>> It is rather straightforward to set up a structured MemoryLayout.
>>>>>>
>>>>>> In case I want to easily poke bytes into that struct, I'd like to use
>>>>>> MemoryAccess.
>>>>>>
>>>>>> Unfortunately, there seems to be no EASY / SIMPLE way to write:
>>>>>>
>>>>>> MemoryAccess.setIntAt(MEMBER_OF_SUCH_A_STRUCT,
>>>>>> VALUE_OF_THAT_MEMBER);
>>>>>>
>>>>>> ...or I missed it in the JavaDocs.
>>>>>>
>>>>>> Is this possible? If yes, how? If not, why not?
>>>>>>
>>>>>> -Markus
>>>>>>