[9] RFR (S): 8161720: Better byte behavior for off-heap data
Zoltán Majó
zoltan.majo at oracle.com
Fri Aug 26 13:54:34 UTC 2016
Hi John,
thank you for the feedback!
On 08/25/2016 10:25 PM, John Rose wrote:
> On Aug 25, 2016, at 11:00 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>>
>> Would you mind adding some comments to byte2bool and bool2byte
>> saying this is consistent with the behaviour in HotSpot, e.g., that reads
>> work for T_BOOLEAN or JNI values, and writes normalize?
>
> +1
I added the documentation you and Paul have requested. I included some
of the text on conversion conventions from your previous email. I hope
that's fine.
Here is the updated webrev (I've changed only the jdk code relative to
the previous webrev):
http://cr.openjdk.java.net/~zmajo/8161720/webrev.03/jdk/
Thank you!
Best regards,
Zoltan
>
> Enclosed is some background, FTR.
>
> — John
>
> The JVM converts ints to booleans using two different conventions:
> testing a byte against zero, and truncation to the least-significant bit.
>
> The JNI documents specify that, at least for returning values from
> native methods, a Java boolean (T_BOOLEAN) value is converted
> to the value-set 0..1 by first truncating to a byte (0..255 or maybe
> -128..127) and then testing against zero. The present change
> (JDK-8161720) extends this behavior to booleans loaded from
> off-heap data, which is nice and consistent. Thus, Java booleans
> in non-Java data structures are by convention represented as
> 8-bit containers containing either zero (for false) or any non-zero
> value (for true).
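The JNI-style convention described above (truncate to a byte, then test against zero) can be sketched in Java as follows. The class and method names are hypothetical and only illustrate the arithmetic; this is not HotSpot's or the JNI's actual code.

```java
// Hypothetical helper illustrating the JNI/off-heap convention:
// truncate to a byte, then test that byte against zero.
final class OffHeapBool {
    static boolean byteTestToBoolean(int x) {
        // (byte) x keeps only the low 8 bits; any non-zero byte means true.
        return (byte) x != 0;
    }

    public static void main(String[] args) {
        System.out.println(byteTestToBoolean(0));     // false
        System.out.println(byteTestToBoolean(1));     // true
        System.out.println(byteTestToBoolean(2));     // true
        System.out.println(byteTestToBoolean(0x80));  // true: byte is -128
        System.out.println(byteTestToBoolean(0x100)); // false: low byte is 0
    }
}
```

Note that a value like 0x100 reads as false: the bits above the byte container are discarded before the test.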
>
> (The choice of convention is not highly constrained, since C does
> not share the boolean type with Java. C data structures contain
> Java booleans only when they hide under other types.)
>
> Meanwhile, Java booleans in the heap are also stored in bytes,
> but are strongly normalized to the value-set 0..1. If you happen
> to use Unsafe to load such an on-heap boolean as if it were
> off-heap, the compare-against-zero normalization will be a no-op.
>
> (If the compiler can prove that in fact the load only applies to
> on-heap data, then the compare-against-zero can be elided.
> That's what Zoltan has done here at my request—thanks!)
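A small sketch, assuming a hypothetical `normalize` helper that applies the compare-against-zero rule to a byte, shows why that rule is a no-op for on-heap booleans (and therefore safe to elide when the load is known to be on-heap): the only stored values are 0 and 1, and both map to themselves.

```java
final class NormalizedNoOp {
    // Hypothetical off-heap style normalization: any non-zero byte becomes 1.
    static byte normalize(byte b) {
        return (byte) (b != 0 ? 1 : 0);
    }

    public static void main(String[] args) {
        // On-heap booleans are stored as exactly 0 or 1, so normalizing
        // them returns the same byte; only "dirty" bytes are changed.
        for (byte b : new byte[] {0, 1}) {
            System.out.println(b + " -> " + normalize(b));
        }
        System.out.println("7 -> " + normalize((byte) 7));
    }
}
```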
>
> (Note that Unsafe is carefully designed so that a single instruction
> can point, dynamically, to either on-heap or off-heap data.
> This allows certain hot loops to be devirtualized.)
>
> People who look closely will notice that compilers (and MethodHandles)
> use a different convention for normalizing on-heap and in-JVM values,
> in which byte2bool(x) := (x & 1) ? true : false. This is compatible
> with the reverse bool2byte(x) := x ? 1 : 0, but allows slightly better
> code, since the single low bit gets copied through any number of
> back-and-forth conversions, with never any testing against zero.
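The in-JVM convention can be written out in Java using the `byte2bool`/`bool2byte` notation from the text. The method bodies are an illustrative sketch of the stated formulas, not the compilers' or MethodHandles' actual implementation; `byte2bool` takes an `int` here (an assumption) so that it can be applied to any stacked value.

```java
final class LsbBool {
    // In-JVM convention: keep only the least significant bit.
    static boolean byte2bool(int x) {
        return (x & 1) != 0;
    }

    // Reverse conversion: true -> 1, false -> 0.
    static byte bool2byte(boolean z) {
        return (byte) (z ? 1 : 0);
    }

    public static void main(String[] args) {
        int x = 0x2B; // binary 0010_1011, low bit is 1
        // Round trips only ever carry the low bit; no test against zero.
        boolean z = byte2bool(x);
        byte b = bool2byte(z);
        System.out.println(z + " " + b + " " + byte2bool(b)); // true 1 true
        System.out.println(byte2bool(2)); // false: low bit is 0
    }
}
```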
>
> (Testing against zero requires a little extra help from the CPU carry
> propagator, and thus an occasional extra instruction. Also, if a 32-bit
> value is converted from the JVM stack to a boolean, it's better to truncate
> directly to the least significant bit, instead of truncating to a byte
> and then testing those eight bits against zero. As I said, it's *slightly*
> better just to copy the bit around, and ignore the size of the container.)
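To see where the two conventions actually disagree, a sketch with hypothetical class and method names can apply both to a few 32-bit values. They agree on the normalized values 0 and 1, but differ on values such as 2 or 0x102, whose low bit is clear while their low byte is non-zero.

```java
final class ConventionCompare {
    // Truncate straight to the least significant bit (in-JVM convention).
    static boolean lsb(int x) {
        return (x & 1) != 0;
    }

    // Truncate to a byte, then test those eight bits against zero (JNI/off-heap convention).
    static boolean byteTest(int x) {
        return (byte) x != 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 3, 0x100, 0x101, 0x102};
        for (int x : samples) {
            System.out.printf("0x%03X lsb=%b byteTest=%b%n", x, lsb(x), byteTest(x));
        }
    }
}
```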
>
> So, for values that are part of pure bytecode execution, a boolean
> is defined as a one-bit field of a byte container. If an integral value
> is stored into a boolean variable, it is truncated in one step by
> discarding all but the least significant bit (LSB).
>
> Since the Java type system does not allow free conversion between
> boolean and other types, this aspect of booleans does not usually
> show up, but the truncation is happening in there.
>
> The JVM user can see boolean truncation in two places.
> First, MethodHandles.explicitCastArguments is specified to use
> truncation-to-LSB when converting numeric values to boolean.
> Second, for directly generated bytecodes, certain bytecodes
> quietly mask off all but the LSB of a stacked value before passing
> it to a value with a type descriptor of "Z". These bytecodes include
> "ireturn", "putfield", and "iastore", so that clients of methods
> and data structures that produce booleans can be assured
> that these booleans will be clean (either 0 or 1).
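The first of these, `MethodHandles.explicitCastArguments`, can be observed from plain Java. The sketch below retypes `MethodHandles.identity(int.class)` so that it returns `boolean`; based on the documented rule for converting another primitive to `boolean` (cast to `byte`, then test the low bit as if by `(x & 1) != 0`), an input of 2 is expected to yield `false`. The wrapper class and the `castToBoolean` helper are illustrative names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ExplicitCastToBoolean {
    // identity(int) has type (int)int; explicitCastArguments retypes it to
    // (int)boolean, converting the int return value to boolean.
    private static final MethodHandle INT_TO_BOOLEAN =
        MethodHandles.explicitCastArguments(
            MethodHandles.identity(int.class),
            MethodType.methodType(boolean.class, int.class));

    static boolean castToBoolean(int x) {
        try {
            return (boolean) INT_TO_BOOLEAN.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(castToBoolean(1)); // true
        System.out.println(castToBoolean(2)); // false: only the low bit survives
        System.out.println(castToBoolean(3)); // true
    }
}
```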
>
> (This surprises people sometimes, as does the fact that the
> JVM verifier primarily concentrates on the distinctions among
> int/long/float/double/reference, and doesn't distinguish among
> the various subrange-types carried by int.)
>
> The truncations in ireturn/putfield/iastore were added as a
> security fix to the JVM fairly recently (I won't comment on
> why that might have been) but they are a no-op for all
> Java code, and in fact for all honest bytecodes.
>
> The bottom line of all this is that on-heap booleans are
> stored in normalized form, but off-heap booleans are not
> presumed to be so normalized.
>
> Sometimes on-heap data is used under a different type,
> as when byte buffer viewing operations inspect the bytes
> of a byte array as if they were little-endian or big-endian
> longs. (Unsafe allows this also.) This is a useful feature,
> which is important to Project Panama, where native
> structures (think "struct stat") can be captured as
> bitwise snapshots in on-heap long arrays.
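Viewing the bytes of a byte array as a long of either byte order can be sketched with the public `java.nio.ByteBuffer` API rather than `Unsafe`; the helper name `readLong` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteArrayAsLong {
    // Reads the first 8 bytes of the array as a long in the given byte order.
    static long readLong(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getLong(0);
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(readLong(bytes, ByteOrder.LITTLE_ENDIAN)); // 1
        System.out.println(readLong(bytes, ByteOrder.BIG_ENDIAN));    // 72057594037927936
    }
}
```

The same eight bytes yield 1 or 1L << 56 depending only on the chosen byte order.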
>
> In those cases of on-heap type punning, when
> Unsafe.getBoolean grabs a boolean from a non-boolean
> on-heap variable (or part of a variable), it will normalize
> the loaded byte just as if it came from off-heap.
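The normalization applied when a boolean is read out of a punned on-heap variable can be modeled with plain shifts on a `long` snapshot. This is a sketch under stated assumptions: `booleanAt` is a hypothetical helper, and byte 0 is taken to be the least significant byte (a little-endian layout). It is not how `Unsafe.getBoolean` is implemented.

```java
final class PunnedBooleanRead {
    // Hypothetical helper: treat byte k (0 = least significant, i.e. a
    // little-endian layout) of a long as a boolean, normalizing it with
    // the off-heap compare-against-zero rule.
    static boolean booleanAt(long word, int k) {
        byte raw = (byte) (word >>> (8 * k));
        return raw != 0;
    }

    public static void main(String[] args) {
        long snapshot = 0x0000_0000_0002_0100L; // byte 0 = 0x00, byte 1 = 0x01, byte 2 = 0x02
        System.out.println(booleanAt(snapshot, 0)); // false
        System.out.println(booleanAt(snapshot, 1)); // true
        System.out.println(booleanAt(snapshot, 2)); // true: 0x02 is non-zero
    }
}
```

Under the LSB convention, byte 2 (0x02) would read as false, which is why the punned case uses the off-heap rule instead.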
>
> When a boolean is *stored* to a byte in this way,
> the JVM will be working with a pre-normalized value
> (since booleans on heap and in the JVM stack
> are strongly normalized), and will just store the
> byte without testing it. Of course, the apparent
> test (z?1:0) is optimized to a simple copy when
> the JIT can deduce that the tested value is normalized.
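The store direction can be modeled the same way: because the boolean is already normalized, writing it is just a copy of 0 or 1 into the target byte. `withBooleanAt` is a hypothetical helper, again assuming byte 0 is the least significant byte.

```java
final class PunnedBooleanWrite {
    // Hypothetical helper: store z into byte k of a long (0 = least significant).
    // z is already normalized, so (z ? 1 : 0) is just a copy of its bit.
    static long withBooleanAt(long word, int k, boolean z) {
        int shift = 8 * k;
        long cleared = word & ~(0xFFL << shift); // zero the target byte
        return cleared | ((long) (z ? 1 : 0) << shift);
    }

    public static void main(String[] args) {
        long w = withBooleanAt(0L, 1, true);
        System.out.println(Long.toHexString(w));                          // 100
        System.out.println(Long.toHexString(withBooleanAt(w, 1, false))); // 0
    }
}
```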
>