[foreign] RFR 8218153: Read/Write of Value layout should honor declared endianness
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Jan 31 17:08:37 UTC 2019
Hi Henry,
the code looks good, but stepping back I think there's an overall theme
here of what the right choice for the default should be, and of the
tension between implicit and explicit. So far we have erred on the side
of explicitness, arguing that implicit info is typically a place where
subtle bugs can hide. I think endianness puts this assumption to the
test, in a way that I will try to capture in this email.
tl;dr;
the name of the game is _inferring_ sensible defaults; this doesn't mean
that the underlying layouts have no explicit information, just that the
explicit information is _inferred_.
First, let's take jextract out of the discussion - this discussion is
NOT about having the binder magically guess what the right layout
should be - we assume that jextract will construct fully explicit
layouts, as per the descriptions in the annotations (which are fully
explicit).
This discussion is then mostly about programmatic access - and about
what defaults (if any) the API should provide; I see mainly two places
that would be affected by this:
*) constants such as NativeTypes.INT32 - should this be LE, BE or PE
(platform endianness)?
*) when clients build a value layout w/o specifying endianness, should
the result be LE, BE or PE?
I think that we have three options:
a) default means platform endianness
b) default means something specific, either LE or BE
c) we remove defaults - i.e. it is not possible to construct a value
layout w/o endianness, and there are no NativeTypes constants w/o
endianness - everything is explicit
Let's consider the options:
I believe (b) will be problematic - as visible in the memory region
code, if a user creates/uses an int native type, he would expect to be
able to read back the int he wrote - example:
try (Scope sc = Scope.newNativeScope) {
    Pointer<Integer> pi = sc.allocate(NativeTypes.INT32);
    pi.set(42);
    assertEquals(pi.get(), 42);
}
If the above test fails in a platform-dependent way, I think users will
be very confused!
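As an aside, plain NIO buffers can be used to show the kind of surprise
a fixed default leads to (just an analogy, not our layout API): a
ByteBuffer always starts out BIG_ENDIAN regardless of the host, so as
soon as the same bytes are viewed in native order - which is essentially
what happens when the memory is handed to native code - the result
becomes platform-dependent:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class FixedDefaultSurprise {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4);  // default order: BIG_ENDIAN
        bb.putInt(0, 42);                              // stored as 00 00 00 2A

        // same bytes, read the way native code on this platform would read them
        int nativeView = bb.duplicate().order(ByteOrder.nativeOrder()).getInt(0);

        // prints 42 on BE hosts, 704643072 (0x2A000000) on LE hosts such as x86
        System.out.println(nativeView);
    }
}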
Which leaves us with either (a) or (c). (c) seems a bit of a nuclear
option - it is of course a self-consistent possibility - but now all
users will have to reason about endianness when writing simple snippets
such as the one above. If you have jextract to help you, this is not an
issue, as the correct information will be extracted from the C API; but
here we're talking about a piece of Java code, and its ability to be
portable across platforms. So this decision deeply affects the usability
of our off-heap API.
(a) is of course the most user-friendly option - if the user doesn't
specify anything, we pick something that makes sense (e.g. ints can be
written to a memory location and re-read w/o extra swaps :-)).
The cost of this approach becomes more evident when different pieces of
code interact using different endianness - let's imagine:
try (Scope sc = Scope.newNativeScope) {
    Pointer<Integer> pi = sc.allocate(NativeTypes.INT32);
    Pointer<Integer> pi_le = sc.allocate(NativeTypes.LE_INT32);
    pi.set(42);
    Pointer.copy(pi, pi_le);
    assertEquals(pi_le.get(), 42); //??
}
The result of this operation is, again, going to be platform-dependent -
on LE platforms this test passes, on BE platforms it fails (as our
pointer copy cannot take care of swapping the LE/BE bytes).
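The same mechanics can be reproduced with plain NIO buffers standing in
for the two pointers above (again just an analogy, not the layout API):
a raw byte-by-byte copy between views with different declared orders has
no chance to swap anything on the way.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class CopyAcrossOrders {
    public static void main(String[] args) {
        // "pi": a 4-byte region viewed in native (platform) order
        ByteBuffer nat = ByteBuffer.allocateDirect(4).order(ByteOrder.nativeOrder());
        // "pi_le": another 4-byte region viewed as little-endian
        ByteBuffer le = ByteBuffer.allocateDirect(4).order(ByteOrder.LITTLE_ENDIAN);

        nat.putInt(0, 42);
        for (int i = 0; i < 4; i++) {
            le.put(i, nat.get(i));    // raw copy, no swapping
        }

        // prints 42 on LE hosts, 704643072 (0x2A000000) on BE hosts
        System.out.println(le.getInt(0));
    }
}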
Overall, I think there's no 'right' solution here. It is pretty much a
'pick your poison' scenario - do we care more about usability, or more
about correctness? The former implies (a), the latter implies (c). As
much as I'd like to strive for correctness, I'm well aware that if we go
down the (c) path and make it impossible for people to say INT32 w/o
specifying endianness, (1) people will be mad and (2) once they've
finished fuming, they will immediately write code like this:
static LayoutType INT32 = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN ?
        NativeTypes.BE_INT32 : NativeTypes.LE_INT32; // why, oh why?
And the resulting code might not be that much better off.
This might be one of those cases where being too paternalistic could
backfire. So, out on a limb, I'd say let's pick (a).
Also, as discussed offline with Henry, I think it's important that, if
we go down the (a) path, "inferring" the right endianness is something
that happens when we create the layout - after which we should be able
to see which endianness has been inferred by querying the layout API.
This IMHO makes (a) easier to swallow, as it's always possible to
recover the info and diagnose what is going on.
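There's precedent for this in java.nio: a buffer always carries its byte
order as an explicit, queryable property, even when that order came from
a default - a minimal sketch (again plain NIO, not our API):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class QueryableOrder {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4).order(ByteOrder.nativeOrder());
        System.out.println(bb.order());              // the order applied at creation...
        System.out.println(ByteOrder.nativeOrder()); // ...can always be recovered and compared
    }
}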
If we like this idea of 'inferring' information to offer a more usable
API, then I'd say we should also provide _unsized_ constants in
NativeTypes, like INT, LONG etc., whose size will be inferred at
construction time (again, the user can query that size if they want to).
This would also eliminate the need for the Type enum in the SystemABI
interface.
Thoughts?
Maurizio
On 31/01/2019 16:21, Henry Jen wrote:
> Hi,
>
> Please review the webrev[1], which performs the necessary conversions based on the declared endianness of a Value layout.
>
> In the webrev, I also fixed some other issues:
>
> 1. enum constants need to have ACC_FINAL
> 2. union field offsets were not correct
>
> [1] https://cr.openjdk.java.net/~henryjen/panama/endianness/03/webrev/
>
> Cheers,
> Henry
>