[9] RFR (S): 8161720: Better byte behavior for off-heap data

Thu Aug 25 20:25:57 UTC 2016

On Aug 25, 2016, at 11:00 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> 
> Would you mind adding some comments to byte2bool and bool2byte
> saying this is consistent with the behaviour in HotSpot e.g. that reads
> work for T_BOOLEAN or JNI values, and writes normalize?

+1

Enclosed is some background, FTR.

— John

The JVM converts ints to booleans using two different conventions:
testing a byte against zero, and truncation to the least-significant bit.

The JNI documents specify that, at least for returning values from
native methods, a Java boolean (T_BOOLEAN) value is converted
to the value-set 0..1 by first truncating to a byte (0..255 or maybe
-128..127) and then testing against zero.  The present change
(JDK-8161720) extends this behavior when loading a byte from
off-heap data, which is nice and consistent.  Thus, Java booleans
in non-Java data structures are by convention represented as
8-bit containers containing either zero (for false) or any non-zero
value (for true).
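
As a rough illustration of the off-heap/JNI convention described above,
here is a minimal Java sketch.  The names byte2bool and bool2byte come
from Paul's request; the class name and the int2bool helper are
hypothetical, and none of this is the actual HotSpot code.

```java
// Hypothetical helper class; a sketch of the convention, not HotSpot's code.
final class OffHeapBooleanSketch {
    // Read side: any non-zero byte counts as true (test against zero).
    static boolean byte2bool(byte b) {
        return b != 0;
    }

    // Write side: normalize to exactly 0 or 1.
    static byte bool2byte(boolean z) {
        return (byte) (z ? 1 : 0);
    }

    // An int is first truncated to a byte, then tested against zero.
    static boolean int2bool(int x) {
        return byte2bool((byte) x);
    }

    public static void main(String[] args) {
        System.out.println(byte2bool((byte) 2));  // true: non-zero byte
        System.out.println(int2bool(0x100));      // false: low byte is zero
        System.out.println(bool2byte(true));      // 1
    }
}
```

Note that under this convention the byte value 2 reads as true, even
though its low bit is clear.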

(The choice of convention is not highly constrained, since C does
not share the boolean type with Java.  C data structures contain
Java booleans only when they hide under other types.)

Meanwhile, Java booleans in the heap are also stored in bytes,
but are strongly normalized to the value-set 0..1.  If you happen
to use Unsafe to load such an on-heap boolean as if it were
off-heap, the compare-against-zero normalization will be a no-op.

(If the compiler can prove that in fact the load only applies to
on-heap data, then the compare-against-zero can be elided.
That's what Zoltan has done here at my request—thanks!)

(Note that Unsafe is carefully designed so that a single instruction
can point, dynamically, to either on-heap or off-heap data.
This allows certain hot loops to be devirtualized.)

People who look closely will notice that compilers (and MethodHandles)
use a different convention for normalizing on-heap and in-JVM values,
in which byte2bool(x) := (x & 1) ? true : false.  This is compatible
with the reverse bool2byte(x) := x ? 1 : 0, but allows slightly better
code, since the single low bit gets copied through any number of
back-and-forth conversions, with never any testing against zero.

(Testing against zero requires a little extra help from the CPU carry
propagator, and thus an occasional extra instruction.  Also, if a 32-bit
value is converted from JVM stack to a boolean, it's better to truncate
directly to the least significant bit, instead of truncating to a byte,
and then testing those eight bits against zero.  As I said, it's *slightly*
better just to copy the bit around, and ignore the size of the container.)
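
To make the difference between the two conventions concrete, here is a
small Java sketch that applies both to the same int values.  The class
and method names are hypothetical, chosen only for this illustration.

```java
// Hypothetical names; compares the two int-to-boolean conventions.
final class LsbBooleanSketch {
    // In-JVM convention: keep only the least significant bit.
    static boolean lsb2bool(int x) {
        return (x & 1) != 0;
    }

    // Off-heap/JNI convention: truncate to a byte, then test against zero.
    static boolean zeroTest2bool(int x) {
        return ((byte) x) != 0;
    }

    // Reverse direction, shared by both conventions.
    static int bool2int(boolean z) {
        return z ? 1 : 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 3, 0x100};
        for (int x : samples) {
            System.out.println(x + ": lsb=" + lsb2bool(x)
                + " zeroTest=" + zeroTest2bool(x));
        }
    }
}
```

The two agree on 0 and 1, so normalized values round-trip under either
rule; they disagree on values such as 2, where the low bit is clear but
the byte is non-zero.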

So, for values that are part of pure bytecode execution, a boolean
is defined as a one-bit field of a byte container.  If an integral value
is stored into a boolean variable, it is truncated in one step by
discarding all but the least significant bit (LSB).

Since the Java type system does not allow free conversion between
boolean and other types, this aspect of booleans does not usually
show up, but the truncation is happening in there.

The JVM user can see boolean truncation in two places.
First, MethodHandles.explicitCastArguments is specified to use
truncation-to-LSB when converting numeric values to boolean.
Second, for directly generated bytecodes, certain bytecodes
quietly mask off all but the LSB of a stacked value before passing
it to a value with a type descriptor of "Z".  These bytecodes include
"ireturn", "putfield", and "bastore" (the bytecode that stores into
boolean arrays), so that clients of methods
and data structures that produce booleans can be assured
that these booleans will be clean (either 0 or 1).

(This surprises people sometimes, as does the fact that the
JVM verifier primarily concentrates on the distinctions among
int/long/float/double/reference, and doesn't distinguish among
the various subrange-types carried by int.)

The truncations in ireturn/putfield/bastore were added as a
security fix to the JVM fairly recently (I won't comment on
why that might have been) but they are a no-op for all
Java code, and in fact for all honest bytecodes.
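
The first of the two user-visible truncation points,
MethodHandles.explicitCastArguments, can be observed from plain Java.
The sketch below adapts an identity handle on boolean so that it accepts
an int; the class and method names are hypothetical, while the
java.lang.invoke calls are the standard API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical wrapper class; only the java.lang.invoke calls are real API.
final class ExplicitCastSketch {
    // identity(boolean) adapted to type (int)boolean; the adapter performs
    // the int-to-boolean conversion on the incoming argument.
    private static final MethodHandle INT_TO_BOOLEAN =
        MethodHandles.explicitCastArguments(
            MethodHandles.identity(boolean.class),
            MethodType.methodType(boolean.class, int.class));

    static boolean intToBoolean(int x) {
        try {
            return (boolean) INT_TO_BOOLEAN.invokeExact(x);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(intToBoolean(2));  // false: low bit is clear
        System.out.println(intToBoolean(3));  // true: low bit is set
    }
}
```

Here 2 converts to false, which is the truncation-to-LSB rule rather
than the test against zero.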

The bottom line of all this is that on-heap booleans are
stored in normalized form, but off-heap booleans are not
presumed to be so normalized.

Sometimes on-heap data is used under a different type,
as when byte buffer viewing operations inspect the bytes
of a byte array as if they were little-endian or big-endian
longs.  (Unsafe allows this also.)  This is a useful feature,
which is important to Project Panama, where native
structures (think "struct stat") can be captured as
bitwise snapshots in on-heap long arrays.
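
For readers unfamiliar with the byte buffer viewing operations, here is
a minimal sketch using java.nio.ByteBuffer to read the same eight
on-heap bytes as a long in either byte order.  The class and helper
names are hypothetical; this uses the public buffer API, not Unsafe.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper; reads 8 bytes of a byte[] as one long.
final class ByteViewSketch {
    static long readLong(byte[] bytes, ByteOrder order) {
        // wrap() creates a view over the array; no bytes are copied.
        return ByteBuffer.wrap(bytes).order(order).getLong(0);
    }

    public static void main(String[] args) {
        byte[] raw = {1, 0, 0, 0, 0, 0, 0, 0};
        // Little-endian: the first byte is the least significant.
        System.out.println(readLong(raw, ByteOrder.LITTLE_ENDIAN));
        // Big-endian: the first byte is the most significant.
        System.out.println(Long.toHexString(readLong(raw, ByteOrder.BIG_ENDIAN)));
    }
}
```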

In those cases of on-heap type punning, when
Unsafe.getBoolean grabs a boolean from a non-boolean
on-heap variable (or part of a variable), it will normalize
the loaded byte just as if it came from off-heap.

When a boolean is *stored* to a byte in this way,
the JVM will be working with a pre-normalized value
(since booleans on heap and in the JVM stack
are strongly normalized), and will just store the
byte without testing it.  Of course, the apparent
test (z?1:0) is optimized to a simple copy when
the JIT can deduce that the tested value is normalized.
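
The load and store behavior for punned data can be modeled without
Unsafe.  The sketch below uses a direct ByteBuffer as a stand-in for
memory that holds non-boolean data; getBoolean and putBoolean here are
hypothetical helpers that imitate the described behavior, not the
Unsafe methods themselves.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the described load/store behavior; not Unsafe.
final class PunnedBooleanSketch {
    // Load: the byte may not be normalized, so test it against zero.
    static boolean getBoolean(ByteBuffer mem, int offset) {
        return mem.get(offset) != 0;
    }

    // Store: the boolean is already clean, so z ? 1 : 0 is in effect a copy.
    static void putBoolean(ByteBuffer mem, int offset, boolean z) {
        mem.put(offset, (byte) (z ? 1 : 0));
    }

    public static void main(String[] args) {
        ByteBuffer mem = ByteBuffer.allocateDirect(8);
        // Default byte order is big-endian, so byte 6 holds 0x40, byte 7 holds 0.
        mem.putLong(0, 0x4000L);
        System.out.println(getBoolean(mem, 6));  // true: 0x40 is non-zero
        System.out.println(getBoolean(mem, 7));  // false: byte is zero
        putBoolean(mem, 7, true);
        System.out.println(mem.get(7));          // 1
    }
}
```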
