A special, built-in value type: a 64-bit "fixnum"
Ron Pressler
ron at paralleluniverse.co
Thu Apr 23 13:58:59 UTC 2015
The use cases I had in mind are indeed fixnums for alternative languages (I
do not propose adding fixnum semantics into Java and/or the JVM), and, more
importantly perhaps, the ability to implement B-Trees and other similar
data structures efficiently. The three-array (or even two-array) bundle is
what we use today. Some analysis of such data structures (with JMH and
perfasm) showed that the multiple arrays are the main source of "bad"
cache misses. In a balanced tree data structure, you really want to keep
index information (primitives) and references to child nodes in the same
array (usually with the index cell or cells immediately adjacent to the
child-node reference).
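
To make this concrete, here is a minimal sketch (the class and names are
illustrative, not our actual code) of the parallel-array node shape described
above; keys[i] and children[i] belong together logically but live in two
separate arrays, so walking from a key to its child routinely costs an extra
cache miss:

    // Illustrative parallel-array B-tree node (not production code).
    final class Node {
        final long[]   keys;      // primitive index information
        final Object[] children;  // child-node references, parallel to keys
        int size;

        Node(int capacity) {
            keys = new long[capacity];
            children = new Object[capacity];
        }

        // Find the child to descend into: the search touches keys[],
        // then the load of children[lo] hits a different array entirely.
        Object childFor(long key) {
            int lo = 0, hi = size - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (keys[mid] < key) lo = mid + 1; else hi = mid;
            }
            return children[lo];
        }
    }

With a tagged 64-bit slot, the key and the adjacent child reference could
share a single array and, typically, a single cache line.
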
Regardless of future tagged value-unions that may or may not come one day,
this single value type does not seem to restrict future ideas (of course,
the tag bit is obscured, though it's pretty obvious it should be the high
bit). Doesn't the fact that this is a single value type ensure that the GC
isn't affected in the common case?
Ron
On Wed, Apr 22, 2015 at 10:45 PM, John Rose <john.r.rose at oracle.com> wrote:
> The point is to be able to overlay primitives with references,
> so that their storage is compact, and so they can be arranged in arrays.
>
> I worked through this line of thought and got here after some refinements:
> http://hg.openjdk.java.net/mlvm/mlvm/hotspot/raw-file/tip/tagu.txt
>
> 63 bits for a reference is more than anybody needs for a long time.
> OTOH if you only ask for about 50 bits for references (still generous),
> you can have all possible double values and nearly all long values.
>
> The simplicity of the null check, and any new is-ref check, critically
> affects
> GC performance. Also, the HotSpot GC assumes strongly that managed
> pointers are never encoded or obscured (except, uniformly, by scaling
> when they are compressed).
>
> These factors push us towards an "address-native" storage format. The
> details
> of the format depend sensitively on pointer compression mode, endian-ness,
> and whether object addresses can be negative (sign bits set). For that
> reason,
> any API for such a tagged value must hide the position and coding of the
> tag bits.
>
> ("Address-native" means that if the variable is in the is-ref state the
> memory
> contents are indistinguishable from a regular managed reference. This
> means
> that loading a long or double requires some sort of rotation in value
> space.)
>
> The union check done by the GC becomes a range check (or high-bit test)
> instead of a single-bit test. This is preferable to a bit test because it
> can
> (on some machines) be merged into the null check which the GC already
> does.
>
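For illustration, here is a rough sketch of how I read the address-native
encoding and the merged check, under the simplifying assumption (not
necessarily what tagu.txt specifies) that the tag is the high bit and that
heap addresses never have the sign bit set; all names are hypothetical:

    // Hypothetical 64-bit slot: references keep their native bit pattern
    // (high bit clear under the assumption above), while primitives are
    // "rotated" into the negative half of the value space via the high bit.
    final class TaggedSlot {
        private static final long PRIM_TAG = 1L << 63;

        // Caller guarantees the value fits in 63 signed bits.
        static long encodePrim(long v63) {
            return v63 | PRIM_TAG;
        }

        // Undo the rotation: drop the tag bit and sign-extend from bit 62.
        static long decodePrim(long bits) {
            return (bits << 1) >> 1;
        }

        // One signed comparison covers the cases the GC cares about:
        // bits == 0 is null, bits > 0 looks like an address, bits < 0 is a
        // tagged primitive -- so the is-ref test folds into the null/range
        // check instead of needing a separate bit test.
        static boolean isManagedRef(long bits) {
            return bits > 0;
        }
    }
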
> All that said, any such change is going to be really disruptive if it introduces a new
> signature type. The next break we make for signatures must have a bigger
> payoff—either parametric polymorphism or full value types. This is why
> I (personally) stopped working on "tagu.patch".
>
> But, to end on a more hopeful note, Rickard Backman has prototyped
> something like this in a clever way that avoids committing us to a new
> value type or signature: He has created an ad hoc array object that
> can hold the sorts of two-way ref/prim unioned things you want.
> I suppose you could build heterogeneous sequences on top of this.
>
> One final "but": You can build compact heterogeneous sequences today,
> with a little care. A bundle of three arrays would do nicely: N bytes for
> tags, P longs for the primitive bits, and R objects for the refs (where
> N=P+R).
> On some JVMs, that could be more compact than an array of unions,
> when there are mostly (32-bit) refs. Three array headers is more than
> one, yes, but that only matters if you have very short sequences.
> In the JSR 292 implementation we use old-fashioned Object[] varargs
> arrays of boxed numbers, when necessary. I periodically reconsider
> using N/P/R bundles, but it hasn't seemed worth it yet. Perhaps
> your use case makes Object[] arrays impractical?
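
For what it's worth, here is a bare-bones sketch of how I read the N/P/R
bundle (purely illustrative; I don't know what Rickard's prototype or the
JSR 292 code actually looks like), written as a sequential cursor where
tags[] says which backing array the next logical slot lives in:

    // Hypothetical three-array bundle: N one-byte tags, P primitive
    // payloads and R references, with N = prims.length + refs.length.
    final class BundleCursor {
        final byte[]   tags;   // 0 = primitive, 1 = reference, one per slot
        final long[]   prims;  // primitive bits, in slot order
        final Object[] refs;   // references, in slot order
        private int slot, p, r;

        BundleCursor(byte[] tags, long[] prims, Object[] refs) {
            this.tags = tags; this.prims = prims; this.refs = refs;
        }

        boolean hasNext()   { return slot < tags.length; }
        boolean nextIsRef() { return tags[slot] != 0; }
        long    nextPrim()  { slot++; return prims[p++]; }
        Object  nextRef()   { slot++; return refs[r++]; }
    }

The GC only ever sees refs[], a plain Object[], which is why this needs no
VM support; the cost is that a slot's tag, payload and neighbours end up in
different arrays, which is exactly the cache behaviour we measured.
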
>
> — John
>
> On Apr 22, 2015, at 4:11 AM, Ron Pressler <ron at paralleluniverse.co> wrote:
> >
> > Hi.
> > I'd like to propose that the Valhalla project include a single special,
> > built-in value type: a 64-bit "fixnum". The value has a single bit
> > discriminating between a reference and a 63-bit long. It will, of course,
> > be treated correctly by the GC.
> >
> > For completeness, a couple of static helper functions may be introduced.
> > One takes a long and, preserving the sign, truncates it to 63 bits,
> > throwing an exception on overflow; the other takes a double and truncates
> > it down to 63 bits, dropping one bit of precision (and another handles the
> > reverse 63-bit encoding -> double operation).
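
A minimal sketch of the helpers I have in mind (the class name, signatures
and the exact double encoding are only illustrative):

    // Hypothetical static helpers for the proposed 63-bit fixnum encoding.
    final class Fixnum63 {
        static final long MAX = (1L << 62) - 1;   // largest 63-bit signed value
        static final long MIN = -(1L << 62);

        // Narrow a long to 63 bits, preserving the sign; throw on overflow.
        static long fromLong(long v) {
            if (v < MIN || v > MAX)
                throw new ArithmeticException("does not fit in 63 bits: " + v);
            return v;
        }

        // Drop the lowest significand bit so the double fits in 63 bits.
        static long fromDouble(double d) {
            return Double.doubleToRawLongBits(d) >>> 1;
        }

        // Reverse operation: widen the 63-bit encoding back to a double
        // (the dropped bit comes back as zero).
        static double toDouble(long bits63) {
            return Double.longBitsToDouble(bits63 << 1);
        }
    }
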
> >
> > I believe this will be immensely useful for some applications that
> > currently require two separate arrays to store a value of either a
> > primitive or a reference, yet would require minimal work for GC support.
> > Of course, this proposal can be extended to directly support any 63-bit (or
> > smaller) value type, but even in its minimal form it is extremely useful.
> >
> > Ron
>
>