[foreign] RFR 8218153: Read/Write of Value layout should honor declared endianness

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Jan 31 17:08:37 UTC 2019


Hi Henry,
the code looks good, but stepping back I think there's an overall theme 
here of what the right choice for the default should be, and of the 
tension between implicit and explicit. So far we have erred on the side 
of explicitness, arguing that implicit info is typically a place where 
subtle bugs can hide. I think endianness puts this assumption to the 
test, in a way that I will try to capture in this email.

tl;dr:

the name of the game is _inferring_ sensible defaults; this doesn't mean 
that the underlying layouts have no explicit information, just that the 
explicit information is _inferred_.


First, let's remove jextract from the discussion - this discussion is 
NOT about having the binder magically guess what the right layout 
should be - we assume that jextract will construct fully explicit 
layouts, as per the descriptions in the annotations (which are fully 
explicit).

This discussion is then mostly about programmatic access - and what 
defaults the API should provide (if any); I see mainly two places that 
would be affected by this (both sketched in code right after this list):

*) constants such as NativeTypes.INT32 - should these be LE, BE or PE 
(platform endianness)?

*) When clients build a value layout w/o specifying endianness, should 
the result be LE, BE or PE?
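
To make the two cases concrete, here is a sketch - note that the 
Value.ofSignedInt builder and the Value.Endianness names below are 
hypothetical, used only to give the question a shape:

// (1) a pre-canned constant - which byte order does it carry?
LayoutType<Integer> t = NativeTypes.INT32;

// (2) a programmatically built value layout, with and without an
//     explicit order (builder and enum names are hypothetical)
Value explicit = Value.ofSignedInt(Value.Endianness.LITTLE_ENDIAN, 32);
Value defaulted = Value.ofSignedInt(32);   // LE, BE or PE?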

I think that we have three options:

a) default means platform endianness

b) default means something specific, either LE or BE

c) we remove defaults - e.g. it is not possible to construct a value 
layout w/o endianness, there are no NativeTypes constants w/o 
endianness - everything is explicit

Let's consider the options:

I believe (b) will be problematic - as is visible in the memory region 
code, if a user creates/uses an int native type, they would expect to 
be able to write an int and read it back - example:

try (Scope sc = Scope.newNativeScope()) {
     Pointer<Integer> pi = sc.allocate(NativeTypes.INT32);
     pi.set(42);
     assertEquals(pi.get(), 42);
}

If the above test fails in a platform-dependent way, I think users will 
be very confused!
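
For the record, the underlying mechanism can be shown with plain 
java.nio, independently of the layout API - the following fragment 
(runnable as-is in jshell) stores an int under a fixed big-endian 
default and re-reads it under the platform's native order:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

ByteBuffer buf = ByteBuffer.allocate(4);

// store 42 under a fixed BE default: bytes 00 00 00 2A
buf.order(ByteOrder.BIG_ENDIAN).putInt(0, 42);

// re-read the same four bytes under the platform's native order:
// 42 on BE hardware, 704643072 (0x2A000000) on LE hardware
int reread = buf.order(ByteOrder.nativeOrder()).getInt(0);
System.out.println(reread);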

Which leaves us with either (a) or (c). (c) seems a bit of a nuclear 
option - it is, of course, a self-consistent possibility - but now 
all users will have to reason about endianness when writing simple 
snippets such as the one above. If you have jextract to help you, this 
is not an issue, as the correct information will be extracted from the C 
API; but here we're talking about a piece of Java code, and its ability 
to be portable across platforms. So this decision deeply affects the 
usability of our off-heap API.

(a) is of course the most user-friendly option - if the user doesn't 
specify anything, we pick something that makes sense (e.g. where ints 
can be written to and re-read from a memory location w/o extra swaps :-)). 
The cost of this approach becomes more evident when different pieces of 
code interact using different endianness - let's imagine:

try (Scope sc = Scope.newNativeScope()) {
     Pointer<Integer> pi = sc.allocate(NativeTypes.INT32);
     Pointer<Integer> pi_le = sc.allocate(NativeTypes.LE_INT32);
     pi.set(42);
     Pointer.copy(pi, pi_le);
     assertEquals(pi_le.get(), 42); //??
}

The result of this operation is, again, going to be platform-dependent - 
on LE platforms this test passes; on BE platforms it fails (as our 
pointer copy cannot take care of swapping the LE/BE bytes).
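
The same mechanism, once more as a plain java.nio fragment - here the 
two orders are pinned to BE and LE, so the mismatch reproduces on any 
machine; the copy is a verbatim byte-for-byte copy, which is exactly 
the point (no swapping happens):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

ByteBuffer be = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
ByteBuffer le = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);

be.putInt(0, 42);               // bytes: 00 00 00 2A

for (int i = 0; i < 4; i++) {   // raw copy - no byte swapping
    le.put(i, be.get(i));
}

// the LE view decodes the verbatim bytes differently:
// prints 704643072 (0x2A000000), not 42
System.out.println(le.getInt(0));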


Overall, I think there's no 'right' solution here. It is pretty much a 
'pick your poison' scenario - do we care more about usability? Or do we 
care more about correctness? The former implies (a), the latter implies 
(c). As much as I'd like to strive for correctness, I'm well aware that 
if we go down the (c) path and make it impossible for people to say 
INT32 w/o specifying endianness, (1) people will be mad and (2) after 
they have finished fuming, they will immediately write code like this:

static final LayoutType<Integer> INT32 =
        ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN
                ? NativeTypes.BE_INT32
                : NativeTypes.LE_INT32; // why, oh why?

And the resulting code might not be that much better off.

This might be one of those cases where being too paternalistic could 
backfire. So, going out on a limb, I'd say let's pick (a).

Also, as discussed offline with Henry, I think it's important that, 
if we go down the (a) path, "inferring" the right endianness is 
something that happens when we create the layout - after which we should 
be able to see which endianness has been inferred by querying the layout 
API. This IMHO makes (a) easier to swallow, as it's always possible to 
recover the info and diagnose what is going on.
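
As a sketch (the endianness() accessor, like the Value.ofSignedInt 
builder above, is an assumed name, not the current API):

import java.nio.ByteOrder;

// built w/o an explicit order...
Value v = Value.ofSignedInt(32);

// ...but the order inferred at construction time remains queryable
Value.Endianness e = v.endianness();
assert e == (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN
        ? Value.Endianness.BIG_ENDIAN
        : Value.Endianness.LITTLE_ENDIAN);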

If we like this idea of 'inferring' information to offer a more usable 
API, then I'd say that we should also provide _unsized_ constants in 
NativeTypes like INT, LONG etc. whose size will be inferred at 
construction time (again, the user can query that size if they want to). 
This would also eliminate the need for having the Type enum in the 
SystemABI interface.
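
Something like this (again, the constant and the bytesSize() query are 
hypothetical names, used only to illustrate the idea):

// an unsized constant whose actual size is fixed when the constant is
// constructed, based on the platform ABI (e.g. a C 'long' is 8 bytes
// on LP64, 4 bytes on LLP64/32-bit)
LayoutType<?> lt = NativeTypes.LONG;
long size = lt.bytesSize();   // the inferred size is queryable too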

Thoughts?

Maurizio

On 31/01/2019 16:21, Henry Jen wrote:
> Hi,
>
> Please review the webrev[1], which performs the necessary conversions based on the declared endianness of Value layouts.
>
> The webrev also fixes some issues:
>
> 1. enum constants need to have ACC_FINAL
> 2. union field offsets were not correct
>
> [1] https://cr.openjdk.java.net/~henryjen/panama/endianness/03/webrev/
>
> Cheers,
> Henry
>

