A special, built-in value type: a 64-bit "fixnum"

John Rose john.r.rose at oracle.com
Wed Apr 22 19:45:45 UTC 2015

The point is to be able to overlay primitives with references,
so that their storage is compact, and so they can be arranged in arrays.

I worked through this line of thought and got here after some refinements:

63 bits for a reference is more than anybody will need for a long time.
OTOH if you only ask for about 50 bits for references (still generous),
you can have all possible double values and nearly all long values.

The simplicity of the null check, and any new is-ref check, critically affects
GC performance.  Also, the HotSpot GC assumes strongly that managed
pointers are never encoded or obscured (except, uniformly, by scaling
when they are compressed).

These factors push us towards an "address-native" storage format.  The details
of the format depend sensitively on pointer compression mode, endian-ness,
and whether object addresses can be negative (sign bits set).  For that reason,
any API for such a tagged value must hide the position and coding of the tag bits.

("Address-native" means that if the variable is in the is-ref state the memory
contents are indistinguishable from a regular managed reference.  This means
that loading a long or double requires some sort of rotation in value space.)

The union check done by the GC becomes a range check (or high-bit test)
instead of a single-bit test.  This is preferable to a bit test because it can
(on some machines) be merged into the null check which the GC already performs.
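
To make the range-check idea concrete, here is a minimal sketch in Java that
simulates one possible address-native encoding using plain long values.  It is
an illustration only, not the format used in "tagu.patch".  It assumes that
references occupy the unsigned range [0, 2^50) with null as 0, and that doubles
are stored as their canonical bit pattern plus a fixed rotation constant.  The
names TaggedWord, REF_LIMIT and ROTATION are hypothetical.  The sketch covers
doubles only; longs would need their own treatment.

```java
/** Hypothetical simulation of an address-native tagged 64-bit word. */
final class TaggedWord {
    // Assumption: a word in the unsigned range [0, 2^50) is a reference; 0 is null.
    static final long REF_LIMIT = 1L << 50;

    // Canonical double bit patterns never fall in 0xFFF0000000000001..0xFFFFFFFFFFFFFFFF
    // (negative NaNs, which doubleToLongBits collapses to one positive NaN).
    // Adding 2^52 - 1 rotates that unused range onto [0, 2^52 - 2], which
    // contains the whole reference range, so encoded doubles never look like refs.
    static final long ROTATION = (1L << 52) - 1;

    static long encodeDouble(double d) {
        return Double.doubleToLongBits(d) + ROTATION;   // wraps modulo 2^64
    }

    static double decodeDouble(long word) {
        return Double.longBitsToDouble(word - ROTATION);
    }

    // One unsigned comparison answers "is this a reference (including null)?"
    static boolean isRef(long word) {
        return Long.compareUnsigned(word, REF_LIMIT) < 0;
    }
}
```

Under these assumptions a word holding a reference is bit-for-bit the reference
itself, and the is-ref test is a single unsigned compare that also covers null.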

All that said, any such change is going to be really expensive if it introduces
a new signature type.  The next break we make for signatures must have a bigger
payoff: either parametric polymorphism or full value types.  This is why
I (personally) stopped working on "tagu.patch".

But, to end on a more hopeful note, Rickard Backman has prototyped
something like this in a clever way that avoids committing us to a new
value type or signature:  He has created an ad hoc array object that
can hold the sorts of two-way ref/prim unioned things you want.
I suppose you could build heterogeneous sequences on top of this.

One final "but":  You can build compact heterogeneous sequences today,
with a little care.  A bundle of three arrays would do nicely:  N bytes for
tags, P longs for the primitive bits, and R objects for the refs (where N=P+R).
On some JVMs, that could be more compact than an array of unions,
when there are mostly (32-bit) refs.  Three array headers is more than
one, yes, but that only matters if you have very short sequences.
In the JSR 292 implementation we use old-fashioned Object[] varargs
arrays of boxed numbers, when necessary.  I periodically reconsider
using N/P/R bundles, but it hasn't seemed worth it yet.  Perhaps
your use case makes Object[] arrays impractical?
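
As a rough sketch of such an N/P/R bundle in Java (the class name MixedSeq, its
methods, and the tag values are all hypothetical, chosen for illustration):
one tag byte per element, a long[] for primitive bits (doubles stored as raw
bits), and an Object[] for refs.  Only in-order traversal is shown; random
access would need an extra index or a scan over the tags.

```java
import java.util.Arrays;

/** Hypothetical N/P/R bundle; all names here are illustrative. */
final class MixedSeq {
    /** Callback for in-order traversal. */
    interface Visitor {
        void onLong(long value);
        void onDouble(double value);
        void onRef(Object value);
    }

    private static final byte TAG_LONG = 0, TAG_DOUBLE = 1, TAG_REF = 2;

    private byte[] tags = new byte[4];      // N entries, one per element
    private long[] prims = new long[4];     // P entries: longs or raw double bits
    private Object[] refs = new Object[4];  // R entries: references
    private int n, p, r;                    // used lengths; n == p + r

    void addLong(long v)     { addPrim(TAG_LONG, v); }
    void addDouble(double v) { addPrim(TAG_DOUBLE, Double.doubleToRawLongBits(v)); }

    void addRef(Object o) {
        if (r == refs.length) refs = Arrays.copyOf(refs, r * 2);
        refs[r++] = o;
        addTag(TAG_REF);
    }

    private void addPrim(byte tag, long bits) {
        if (p == prims.length) prims = Arrays.copyOf(prims, p * 2);
        prims[p++] = bits;
        addTag(tag);
    }

    private void addTag(byte tag) {
        if (n == tags.length) tags = Arrays.copyOf(tags, n * 2);
        tags[n++] = tag;
    }

    int size() { return n; }

    /** Walks elements in insertion order, advancing a cursor into each side array. */
    void forEach(Visitor v) {
        int pi = 0, ri = 0;
        for (int i = 0; i < n; i++) {
            switch (tags[i]) {
                case TAG_LONG:   v.onLong(prims[pi++]); break;
                case TAG_DOUBLE: v.onDouble(Double.longBitsToDouble(prims[pi++])); break;
                default:         v.onRef(refs[ri++]); break;
            }
        }
    }
}
```

Each side array grows independently, so a sequence that is mostly refs pays
for few long slots, which is the compactness argument made above.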

— John

On Apr 22, 2015, at 4:11 AM, Ron Pressler <ron at paralleluniverse.co> wrote:
> Hi.
> I'd like to propose that the Valhalla project include a single special,
> built-in value type: a 64-bit "fixnum". The value has a single bit
> discriminating between a reference or a 63-bit long. It will, of course, be
> treated correctly by the GC.
> For completeness, a couple of static helper functions may be introduced.
> One that takes a long and, preserving the sign, truncates it to 63 bits,
> throwing an exception in the case of an overflow, and the other taking a
> double and truncating down to 63 bits, truncating precision by one bit (and
> another for the reverse 63-bit double -> double operation).
> I believe this will be immensely useful for some applications that
> currently require two separate arrays to store a value of either a
> primitive or a reference, yet would require minimal work for GC support. Of
> course, this proposal can be extended to directly support any 63-bit (or
> smaller) value type, but even in its minimal form it is extremely useful.
> Ron
