[foreign] RFR 8218153: Read/Write of Value layout should honor declared endianness

John Rose john.r.rose at oracle.com
Thu Jan 31 18:41:16 UTC 2019


On Jan 31, 2019, at 9:08 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
> Thoughts?

You already know my main thought here:  It's easy to
create deep future troubles with these choices.

Some more thoughts on this thorny problem:  I think
it's reasonable to opt in explicitly to platform polarity
and even platform sizes.  So, yes, there should be constants
that provide all of that.

But there also need to be ways to get precision, without
having the platform in the way.

One key question is what should be the *first* set of
types that a Panama user encounters?  That first set
is the set of types which sets the overall tone of either
precision or magical platform dependence.

The magic in the latter case feels like a creature comfort
at first.  And then your system size grows to include some
portability requirement, such as WORA or network protocols.
At that point all of your creature comforts become crawling
bugs.  You hit a wall until you find all of them.  If you didn't
opt into them explicitly, then it takes a very long time to
find them all, and your portability story keeps failing until
you find them all.

This took *years* to do in HotSpot when we went from x86
to x86+SPARC.  It was miserable.  There were many bad
fixes due to some engineer hopefully swapping bytes at
one point, and later finding out the bad order was
somewhere else.  I think there are still bad spots where
we have a double-swap somewhere, or a poorly named
hi/lo or first/second distinction that no longer makes sense.
Moral of the story:  You can't take back a decision to ignore
byte order or integer size, without spending months of
reengineering and bug chasing.  Let's not do that in Panama.
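
To make the failure mode concrete, here is a minimal sketch using only
the standard java.nio classes (not the Panama API).  The class and
method names are made up for illustration.  The same four bytes decode
to different values depending on byte order, and only the reads that
name their order behave the same on every host.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; not part of any Panama API.
final class ExplicitOrderDemo {
    // Four bytes as they might arrive from a file or a network protocol.
    static final byte[] WIRE = {0x12, 0x34, 0x56, 0x78};

    // Portable: the byte order is stated at the read site.
    static int readBigEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt(0);
    }

    // Portable: also explicit, just the other polarity.
    static int readLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    // Platform-dependent: the result changes with the host CPU.
    static int readNativeOrder(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.nativeOrder()).getInt(0);
    }

    // Two misplaced swaps cancel out, which is how a double-swap bug hides.
    static int swapTwice(int value) {
        return Integer.reverseBytes(Integer.reverseBytes(value));
    }

    public static void main(String[] args) {
        System.out.printf("BE=0x%08x LE=0x%08x native=0x%08x%n",
                readBigEndian(WIRE), readLittleEndian(WIRE), readNativeOrder(WIRE));
    }
}
```

The native-order read agrees with exactly one of the other two, and
which one depends on where the code runs.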

One reason I care about portability here is that portability
is one of the values that Java adds to C, and Panama is the
place where C libraries can be upgraded to play in Java's
world.  Once you are coding in Java, you (usually) don't
have to worry about platform dependencies.  In Panama, jextract
is where the dependencies are injected, and it's clear that
a jextract-generated API is platform dependent.  But writing
Java code from scratch needs to be platform-independent
until proven otherwise (which means an explicit opt-in).

(Similar points can be made about safety.  Java is safe
routinely where C is unsafe routinely.  Java code from
scratch should be as safe and portable as we can make it.)

So, my conviction is that when a user programs with the
Java API, as opposed to jextract, we need to make sure
the first names that user encounters are the solid names.
Solid names don't have secret platform dependencies on
size or byte order.  Does the user not care about platform
neutrality?  That's fine, just opt into the platform specific
names.  I would prefer to do this with an import of a
nested class, so there's evidence at the top of the source
file, and in the classfile, that platform dependencies are
being injected into the code.

import static java.foreign.NativeType.*;  // solid types only
import static java.foreign.CurrentPlatform.*;  // platform-endian types
import static java.foreign.CurrentCType.*;  // platform types defined by C

(I just noticed that endian polarity and int-size work
like locale does in string operations.  Regarding locale,
I think our overall practice is to back away from "magic"
APIs which vary their behavior based on what country
the JVM woke up in.  IIRC in the early days of Java there
were more such "magic" APIs, because who could object
to helping the programmer make an easy decision?
Let's learn from our past mistakes!)

By the way, platform sizes are not CPU-specific but
ABI-specific and even C-language specific.  The
endian polarity of the platform is visible even if you
are staying inside of Java, because of the byte order
of objects on the heap.  But there's never any doubt
about the size of Java values.  That's why I posit three
sets of names in the import above.
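
As a small illustration with standard JDK calls only (the class name is
invented for this sketch), the platform byte order can be queried from
plain Java, while the sizes of Java primitives are fixed constants.  C
sizes have no such constant: for example, C `long` is 8 bytes under the
LP64 convention used by 64-bit Linux and 4 bytes under the LLP64
convention used by 64-bit Windows.

```java
import java.nio.ByteOrder;

// Hypothetical helper class; it only reports facts the JDK already exposes.
final class PlatformFacts {
    // Varies by host: LITTLE_ENDIAN on x86, BIG_ENDIAN on SPARC.
    static ByteOrder platformOrder() {
        return ByteOrder.nativeOrder();
    }

    // Never varies: Java fixes int at 4 bytes and long at 8 bytes.
    static String javaSizes() {
        return "int bytes = " + Integer.BYTES + ", long bytes = " + Long.BYTES;
    }

    public static void main(String[] args) {
        System.out.println("native order = " + platformOrder());
        System.out.println(javaSizes());
    }
}
```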

Now for the portable names, there's the question of
whether LE or BE should be the default, or whether
both should be explicit.  I'd be fine with any of
those three answers, because, given an assurance
that the names are not "magic" and don't secretly
change their meanings, a programmer can reasonably
learn any fixed convention.  I think little-endian
is a graceful choice for a fixed convention, but I
would hate to waste time replaying a tedious flame
war between endian advocates.  Anybody familiar
with assembly-level programming in both polarities
can form a pretty clear opinion as to which convention
is slightly more natural than the other, depending
on their own personal definition of "natural".  And
they should probably keep it to themselves.

One way to please everybody on polarity would be
to (again) supply a way to make an explicit import,
at the header of the source file, to show exactly
what's going on:

import static java.foreign.NativeType.LE.*;
//or import static java.foreign.NativeType.BE.*;

And then we have NativeType.LE.INT32 and
*also* NativeType.BE.INT32.  A funny naming
convention is invented, and everybody queues
up at their chosen window.  No confusion,
because every source file (and every classfile)
says exactly what are the ground rules.
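
A rough sketch of how such nested holders might look.  Everything here
is an assumption made for illustration: `SolidTypes` stands in for the
proposed NativeType, and `Layout` is an invented helper built on
java.nio, not the actual Panama layout class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the proposed NativeType; not the real API.
final class SolidTypes {
    // Invented helper: a size plus a fixed, explicit byte order.
    static final class Layout {
        final int bits;
        final ByteOrder order;

        Layout(int bits, ByteOrder order) {
            this.bits = bits;
            this.order = order;
        }

        // Decodes 4 bytes at the given offset using this layout's order.
        int readInt(byte[] bytes, int offset) {
            return ByteBuffer.wrap(bytes).order(order).getInt(offset);
        }
    }

    // Little-endian names; a source file opts in via a static import.
    static final class LE {
        static final Layout INT32 = new Layout(32, ByteOrder.LITTLE_ENDIAN);
        private LE() {}
    }

    // Big-endian names, same spelling, different holder.
    static final class BE {
        static final Layout INT32 = new Layout(32, ByteOrder.BIG_ENDIAN);
        private BE() {}
    }

    private SolidTypes() {}
}
```

With this shape, a file that starts with a static import of
SolidTypes.LE.* can write plain INT32, and the import line records
which polarity that name means.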

— John

