[foreign] RFR 8218153: Read/Write of Value layout should honor declared endianness
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Jan 31 18:59:42 UTC 2019
I think all of this argues for the nuclear option (c), with some
relief from static imports.
I also think that always inferring LE or BE (as per my option b) is as
wrong as inferring platform endianness: because the choice is implicit,
it gives bugs a place to hide and makes it hard to find where such bugs
are coming from.
So, I think the Value layout API should NOT have a default constructor
w/o endianness.
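To make that concrete, here is a minimal sketch of what "no default
endianness" could look like. The class ValueLayoutSketch and its methods
are made up for illustration and are not the Panama API; only
java.nio.ByteBuffer and java.nio.ByteOrder are real. The point is just
that the byte order is a required argument, so it always shows up at the
use site.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Objects;

// Hypothetical stand-in for a 32-bit value layout; NOT the Panama API.
final class ValueLayoutSketch {
    private final ByteOrder order;

    // Private constructor: there is deliberately no overload without a byte order.
    private ValueLayoutSketch(ByteOrder order) {
        this.order = Objects.requireNonNull(order, "byte order must be explicit");
    }

    // The only factory; callers must state the byte order.
    static ValueLayoutSketch ofInt32(ByteOrder order) {
        return new ValueLayoutSketch(order);
    }

    // Writes a 32-bit value at the given offset using the declared byte order.
    void writeInt(byte[] bytes, int offset, int value) {
        ByteBuffer.wrap(bytes, offset, 4).order(order).putInt(value);
    }

    // Reads a 32-bit value at the given offset using the declared byte order.
    int readInt(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        ValueLayoutSketch le = ofInt32(ByteOrder.LITTLE_ENDIAN);
        ValueLayoutSketch be = ofInt32(ByteOrder.BIG_ENDIAN);
        le.writeInt(buf, 0, 0x01020304);
        System.out.println(Integer.toHexString(le.readInt(buf, 0))); // prints 1020304
        System.out.println(Integer.toHexString(be.readInt(buf, 0))); // prints 4030201
    }
}
```

Reading the same four bytes through the two layouts gives two different
values, and which one you get is decided by code you can see.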
Maurizio
On 31/01/2019 18:41, John Rose wrote:
> On Jan 31, 2019, at 9:08 AM, Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>>
>> Thoughts?
>
> You already know my main thought here: It's easy to
> create deep future troubles with these choices.
>
> Some more thoughts on this thorny problem: I think
> it's reasonable to opt in explicitly to platform polarity
> and even platform sizes. So, yes, there should be constants
> that provide all of that.
>
> But there also need to be ways to get precision, without
> having the platform in the way.
>
> One key question is what should be the *first* set of
> types that a Panama user encounters? That first set
> is the set of types which sets the overall tone of either
> precision or magical platform dependence.
>
> The magic in the latter case feels like a creature comfort
> at first. And then your system size grows to include some
> portability requirement, such as WORA or network protocols.
> At that point all of your creature comforts become crawling
> bugs. You hit a wall until you find all of them. If you didn't
> opt into them explicitly, then it takes a very long time to
> find them all, and your portability story keeps failing until
> you find them all.
>
> This took *years* to do in HotSpot when we went from x86
> to x86+SPARC. It was miserable. There were many bad
> fixes due to some engineer hopefully swapping bytes at
> one point, and later finding out the bad order was
> somewhere else. I think there are still bad spots where
> we have a double-swap somewhere, or a poorly named
> hi/lo or first/second distinction that no longer makes sense.
> Moral of the story: You can't take back a decision to ignore
> byte order or integer size, without spending months of
> reengineering and bug chasing. Let's not do that in Panama.
>
> One reason I care about portability here is that portability
> is one of the values that Java adds to C, and Panama is the
> place where C libraries can be upgraded to play in Java's
> world. Once you are coding in Java, you (usually) don't
> have to worry about platform dependencies. In Panama, jextract
> is where the dependencies are injected, and it's clear that
> a jextract-generated API is platform dependent. But writing
> Java code from scratch needs to be platform-independent
> until proven otherwise (which means an explicit opt-in).
>
> (Similar points can be made about safety. Java is safe
> routinely where C is unsafe routinely. Java code from
> scratch should be as safe and portable as we can make it.)
>
> So, my conviction is that when a user programs with the
> Java API, as opposed to jextract, we need to make sure
> the first names that user encounters are the solid names.
> Solid names don't have secret platform dependencies on
> size or byte order. Does the user not care about platform
> neutrality? That's fine, just opt into the platform specific
> names. I would prefer to do this with an import of a
> nested class, so there's evidence at the top of the source
> file, and in the classfile, that platform dependencies are
> being injected into the code.
>
> import static java.foreign.NativeType.*; // solid types only
> import static java.foreign.CurrentPlatform.*; // platform-endian types
> import static java.foreign.CurrentCType.*; // platform types defined by C
>
> (I just noticed that endian polarity and int-size work
> like locale does in string operations. Regarding locale,
> I think our overall practice is to back away from "magic"
> APIs which vary their behavior based on what country
> the JVM woke up in. IIRC in the early days of Java there
> were more such "magic" APIs, because who could object
> to helping the programmer make an easy decision?
> Let's learn from our past mistakes!)
>
> By the way, platform sizes are not CPU-specific but
> ABI-specific and even C-language specific. The
> endian polarity of the platform is visible even if you
> are staying inside of Java, because of the byte order
> of objects on the heap. But there's never any doubt
> about the size of Java values. That's why I posit three
> sets of names in the import above.
>
> Now for the portable names, there's the question of
> whether LE or BE should be the default, or whether
> both should be explicit. I'd be fine with any of
> those three answers, because, given an assurance
> that the names are not "magic" and don't secretly
> change their meanings, a programmer can reasonably
> learn any fixed convention. I think little-endian
> is a graceful choice for a fixed convention, but I
> would hate to waste time replaying a tedious flame
> war between endian advocates. Anybody familiar
> with assembly-level programming in both polarities
> can form a pretty clear opinion as to which convention
> is slightly more natural than the other, depending
> on their own personal definition of "natural". And
> they should probably keep it to themselves.
>
> One way to please everybody on polarity would be
> to (again) supply a way to make an explicit import,
> at the header of the source file, to show exactly
> what's going on:
>
> import static java.foreign.NativeType.LE.*;
> //or import static java.foreign.NativeType.BE.*;
>
> And then we have NativeType.LE.INT32 and
> *also* NativeType.BE.INT32. A funny naming
> convention is invented, and everybody queues
> up at their chosen window. No confusion,
> because every source file (and every classfile)
> says exactly what are the ground rules.
>
> — John
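
The naming scheme in the quoted message (fixed-order names under LE and
BE, plus a separate, explicitly imported set of platform-order names) can
be sketched roughly as below. Every name here (NativeTypeSketch, Int32,
LE, BE, CurrentPlatform) is hypothetical and only mirrors the shape of the
proposal; none of it is the java.foreign API. The byte-order handling is
borrowed from java.nio, which is real.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for the "solid" and platform-dependent names; NOT java.foreign.
final class NativeTypeSketch {
    private NativeTypeSketch() {}

    // A 32-bit integer type whose byte order is fixed when it is created.
    static final class Int32 {
        final ByteOrder order;

        Int32(ByteOrder order) {
            this.order = order;
        }

        // Decodes four bytes at the given offset using this type's byte order.
        int read(byte[] bytes, int offset) {
            return ByteBuffer.wrap(bytes, offset, 4).order(order).getInt();
        }
    }

    // Solid names: the meaning never changes with the host platform.
    static final class LE {
        static final Int32 INT32 = new Int32(ByteOrder.LITTLE_ENDIAN);
    }

    static final class BE {
        static final Int32 INT32 = new Int32(ByteOrder.BIG_ENDIAN);
    }

    // Platform names: the dependency is only reachable through this nested class.
    static final class CurrentPlatform {
        static final Int32 INT32 = new Int32(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        byte[] wire = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(LE.INT32.read(wire, 0))); // prints 12345678
        System.out.println(Integer.toHexString(BE.INT32.read(wire, 0))); // prints 78563412
        // Result depends on the machine running the code, by explicit request.
        System.out.println(Integer.toHexString(CurrentPlatform.INT32.read(wire, 0)));
    }
}
```

A source file that wants one polarity would then start with something like
"import static NativeTypeSketch.LE.*;" and use a bare INT32, so the choice
is recorded at the top of the file.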