[foreign-memaccess] layout constants and endianness - take two

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Jul 18 12:10:28 UTC 2019


Quoting from an email sent by Brian a few days ago on this list:

> Let's just bear in mind that we can think of at least three categories 
> of users, each of which will have different ideas about endianness:
>
>  - Traditional FFI users.  These folks are calling native methods, and 
> dealing with native layouts.  Some of the time, these folks don't care 
> about endianness, but sometimes they do -- such as if they are laying 
> out a network packet.  They want the ability to control endianness, 
> but probably want simple defaults.
>
>  - "Pure" Java off-heap users.  These folks are using off-heap memory 
> strictly to sidestep the GC and runtime; the data is never leaving the 
> machine.  They aggressively don't care about endianness, and will hate 
> you if you make them think about it.
>
>  - Java interop users.  These folks are doing things like protobuf, 
> which by definition is leaving the machine.  These guys need explicit 
> control over endianness all the time.
>
> There might be others too!  I guess my plea here is: don't sacrifice 
> the "Pure" java users for the sake of the others. There needs to be an 
> easy way to say "Java primitive int please, with all that entails", 
> and not get wrapped up in bit sizes, order, etc. 

This is a good characterization of who the users are. Now, when we move 
on to consider which set of layout constants we should provide, I think 
there are three kinds of them (one per user):

1) explicit sized constants - such as INT8, INT16, INT32, FLOAT32, 
FLOAT64... - these are useful for message protocols
2) Java-like constants such as JAVA_INT, JAVA_FLOAT, JAVA_CHAR... - 
these are useful for pure off-heap
3) ABI-dependent constants, such as C_INT, C_FLOAT ... - these are 
useful for FFI users

I think it's fair to say that, no matter what we do, a hypothetical 
class listing all relevant layout constants should list all the possible 
combinations; e.g. for a Java int, there should be at least two 
constants, JAVA_INT_BE and JAVA_INT_LE, in case users want to reach 
for them.

The big question is, what do endianness-less constants mean - e.g. 
something like a plain JAVA_INT? Here are some options:

A) Nothing - endianness is always explicit, we simply do NOT provide any 
constant that has no endianness suffix in it.

A2) Like A, but provide the constants in 'bundles', so that you can 
import them separately. E.g. instead of having JAVA_INT_BE, let's have 
BE.JAVA_INT instead (and the user can static import BE.*)

B) A default endianness value takes precedence over others - e.g. big 
endian, as in ByteBuffer API - that is, JAVA_INT == JAVA_INT_BE

C) We allow layouts to be created w/o endianness - meaning that, upon 
use, clients will have to force the desired endianness - e.g. 
JAVA_INT.order(ByteOrder.BIG_ENDIAN)

D) We use machine endianness for all constants that do not have an 
explicit endianness suffix

The simplest option is, of course, (A). This option side-steps the 
'default' problem entirely. It's actually not even that bad in the sense 
that, if a user really doesn't care about endianness because he's 
serializing and deserializing on the same machine, whichever 
endianness he picks will be fine. But for constants like the ones in 
(3), which are used with FFI, such an approach would be too taxing - if I 
want to call a native function, of course I want the same endianness as 
the machine I'm running on.

A2 improves on that, by offering a one-shot move to import all the 
constants you want at the top of the file. FFI users will pick the one 
that corresponds to the platform they want to work with; pure off-heap 
users don't care, so they will just 'pick one', but they'll do so only 
once. For message protocol users it's a bit more complex, in that they 
can't just static import both sets, as there will be conflicts - so they 
will have to explicitly use qualified names such as BE.JAVA_INT and 
LE.JAVA_INT. Not much worse than JAVA_INT_BE and JAVA_INT_LE, anyway.
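
To make A2 a bit more concrete, here is a minimal sketch of what the 
bundles could look like. SketchLayout, Bundles, BE and LE are 
hypothetical names invented for illustration (a layout is reduced to a 
bit size plus a byte order); this is not the actual memory access API.

```java
import java.nio.ByteOrder;

// Hypothetical stand-in for a value layout: a bit size plus a byte order.
final class SketchLayout {
    final int bitSize;
    final ByteOrder order;

    SketchLayout(int bitSize, ByteOrder order) {
        this.bitSize = bitSize;
        this.order = order;
    }
}

// Hypothetical holder for the two bundles described in option A2.
final class Bundles {
    private Bundles() { }

    // Big-endian bundle: a client would static import BE.* once per file.
    static final class BE {
        static final SketchLayout JAVA_INT  = new SketchLayout(32, ByteOrder.BIG_ENDIAN);
        static final SketchLayout JAVA_LONG = new SketchLayout(64, ByteOrder.BIG_ENDIAN);
    }

    // Little-endian bundle: same constant names, opposite byte order.
    static final class LE {
        static final SketchLayout JAVA_INT  = new SketchLayout(32, ByteOrder.LITTLE_ENDIAN);
        static final SketchLayout JAVA_LONG = new SketchLayout(64, ByteOrder.LITTLE_ENDIAN);
    }
}
```

A client that needs both orders in the same file would use the 
qualified names, e.g. Bundles.BE.JAVA_INT next to Bundles.LE.JAVA_INT.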

(B) kind of builds on the idea that, if you remain on the same machine, 
you don't really care about endianness, so whatever default is picked is 
gonna work fine. Plus, if the default is the same as the one used by 
ByteBuffer, there's less impedance mismatch when going from 
MemorySegments to ByteBuffers. But this still does nothing for our FFI 
users who have to be endianness explicit nearly all the time (since the 
vast majority of platforms are little endian).
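
For reference, the ByteBuffer default that (B) would align with can be 
observed with plain java.nio. The class and method names below are 
made up for this sketch; only ByteBuffer and ByteOrder are real API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferDefaultOrder {
    private ByteBufferDefaultOrder() { }

    // Encodes an int using whatever order a freshly allocated buffer starts with.
    static byte[] encode(int value) {
        ByteBuffer bb = ByteBuffer.allocate(Integer.BYTES); // initial order is BIG_ENDIAN
        bb.putInt(value);
        return bb.array();
    }

    // Reports the initial order of a newly allocated buffer.
    static ByteOrder initialOrder() {
        return ByteBuffer.allocate(Integer.BYTES).order();
    }
}
```

Encoding 0x01020304 this way yields the bytes 01 02 03 04 regardless of 
the machine, which is exactly the behaviour JAVA_INT == JAVA_INT_BE 
would mirror.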

(C) was initially co-proposed by me/John [1] in a discussion revolving 
around the foreign branch. The idea is that we have a third endianness 
state -  NO_ENDIAN and, if the user tries to use a NO_ENDIAN layout e.g. 
to produce a memory access VarHandle, he will be met with an error 
because there's missing endianness info. I'd say a solution like this, 
while elegant, looks like overkill for pure off-heap cases (as we 
already stated, endianness is NOT relevant there). And it's also not 
great for FFI users who will have to go through a lot of goop to 
instantiate the layout with the correct endianness. So, all things 
considered, while more elegant, this has the same usability issues that 
(A) has.
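
A rough sketch of how (C) might feel to a client follows. 
OrderlessLayout and its methods are hypothetical and only model the 
idea (a null order stands in for NO_ENDIAN, and a byte[] read stands in 
for VarHandle creation); nothing here is taken from the foreign branch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Objects;

// Hypothetical layout whose byte order may be left unspecified (option C).
final class OrderlessLayout {
    // A 32-bit layout with no endianness attached yet.
    static final OrderlessLayout JAVA_INT = new OrderlessLayout(null);

    private final ByteOrder order; // null models the NO_ENDIAN state

    private OrderlessLayout(ByteOrder order) {
        this.order = order;
    }

    // Returns a copy of this layout with the given byte order forced on it.
    OrderlessLayout order(ByteOrder newOrder) {
        return new OrderlessLayout(Objects.requireNonNull(newOrder));
    }

    // Reads a 32-bit value; fails if no endianness has been chosen.
    int readInt(byte[] bytes, int offset) {
        if (order == null) {
            throw new IllegalStateException("no endianness: call order(...) first");
        }
        return ByteBuffer.wrap(bytes).order(order).getInt(offset);
    }
}
```

Every access site has to go through 
JAVA_INT.order(ByteOrder.BIG_ENDIAN) or similar first, which is the 
extra ceremony described above.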

(D) is based on the idea that, if you don't care about endianness, well, 
you don't - so whatever we pick will be fine (including native 
endianness). So constants/users in bucket (2) will be totally fine with 
this. On the other hand, message protocol heavy clients will need to be 
explicit anyway, and will probably prefer the endianness explicit 
constants to the implicit ones (e.g. they will use JAVA_INT_BE, not 
JAVA_INT, to adhere to the protocol more explicitly). And, 99.99% of 
the time, when doing FFI, you really just do want native endianness.


So, looking at the scoreboard, it seems that D and A2 are the only 
solutions that have some chance to cater to all the various use cases. 
When it comes to D, precise, endianness-ful constants are still there 
for people who want/need to reach for them, but handy defaults are also 
provided. On the other hand, D is itself not perfect, and it has some 
pain points:

* it will bite when interfacing with ByteBuffers, which are BE by 
default (yes, I've been a victim of this when writing a test for the 
memory access API)
* the same source code won't mean the same thing on all platforms; some 
differences can be poked at if the code 'reflectively' looks at the 
endianness property of a layout
* if we have native order-dependent constants, Constable/folding support 
kind of goes out of the window, or is made more complex by the fact that 
endianness at compile-time might be different from that at run-time
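
The first two pain points can be reproduced with plain java.nio, 
without any layout API at all. In this sketch (class and method names 
are made up), writing in native order stands in for what a suffix-less 
constant would mean under D, and the read goes through a ByteBuffer 
left in its initial order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class NativeOrderPitfall {
    private NativeOrderPitfall() { }

    // Writes 'value' in native order, then reads it back through a
    // ByteBuffer that keeps its initial (big-endian) order.
    static int writeNativeReadDefault(int value) {
        byte[] storage = new byte[Integer.BYTES];
        ByteBuffer.wrap(storage).order(ByteOrder.nativeOrder()).putInt(0, value);
        return ByteBuffer.wrap(storage).getInt(0);
    }
}
```

On a little-endian machine the value comes back byte-swapped (as 
Integer.reverseBytes(value)), while on a big-endian machine it comes 
back unchanged, so the same source behaves differently per platform.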

A2 is of course free from all these issues, since it basically 
side-steps the question of setting a default, and the missing default 
can be worked around by using the right set of static imports. This 
would still mean that endianness-agnostic users have to make an 
endianness-dependent choice in their imports - but given this is 
a one-off, maybe that's not so bad.


Of course, we don't have to use a single solution for all the constants 
in buckets 1-3, so we could do something like this:

* use D (or, better, A2) for FFI-related constants - this will give FFI 
users the set of constants they want - either implicitly or via explicit 
import (a la A2)
* use B for constants in (2) - pure off-heap users don't care much about 
endianness, and, in this case, compatibility with ByteBuffer is more 
important (since most of the use cases in this space use BB at the moment)
* use A for constants in (1) - after all, message protocol users care 
about endianness anyway


So I guess there are some questions here:

* how worried are we about the problems with D listed above?
* how odd would it be to apply different endianness decisions to 
constants in different buckets?
* is A2 'good enough' - if we only did that, will people be happy with 
adding an import at the top of the file to choose the polarity they want?

Comments welcome

Maurizio

[1] - 
https://mail.openjdk.java.net/pipermail/panama-dev/2019-February/004147.html







