A special, built-in value type: a 64-bit "fixnum"

Thu Apr 23 16:05:16 UTC 2015

Ron,

Here are some relevant discussions on mlvm-dev about Rickard's 
heterogeneous array work:
   http://mail.openjdk.java.net/pipermail/mlvm-dev/2012-July/004665.html
   http://mail.openjdk.java.net/pipermail/mlvm-dev/2012-September/004957.html

Best regards,
Vladimir Ivanov

On 4/23/15 6:55 PM, Ron Pressler wrote:
> John, having read your document -- which pretty much describes my proposal
> exactly -- I'd like you to explain your reservations. As value types are
> being added anyway as part of Valhalla, why not tack this on? Also, can you
> please describe Rickard Backman's heterogeneous array work in more detail?
>
> Ron
>
> On Wed, Apr 22, 2015 at 10:45 PM, John Rose <john.r.rose at oracle.com> wrote:
>
>> The point is to be able to overlay primitives with references,
>> so that their storage is compact, and so they can be arranged in arrays.
>>
>> I worked through this line of thought and got here after some refinements:
>>    http://hg.openjdk.java.net/mlvm/mlvm/hotspot/raw-file/tip/tagu.txt
>>
>> 63 bits for a reference is more than anybody needs for a long time.
>> OTOH if you only ask for about 50 bits for references (still generous),
>> you can have all possible double values and nearly all long values.
>>
>> The simplicity of the null check, and any new is-ref check, critically
>> affects GC performance.  Also, the HotSpot GC assumes strongly that managed
>> pointers are never encoded or obscured (except, uniformly, by scaling
>> when they are compressed).
>>
>> These factors push us towards an "address-native" storage format.  The
>> details of the format depend sensitively on pointer compression mode,
>> endian-ness, and whether object addresses can be negative (sign bits set).
>> For that reason, any API for such a tagged value must hide the position
>> and coding of the tag bits.
>>
>> ("Address-native" means that if the variable is in the is-ref state the
>> memory contents are indistinguishable from a regular managed reference.
>> This means that loading a long or double requires some sort of rotation
>> in value space.)
>>
>> The union check done by the GC becomes a range check (or high-bit test)
>> instead of a single-bit test.  This is preferable to a bit test because it
>> can (on some machines) be merged into the null check which the GC already
>> does.
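
The merged null/range check described above can be modeled in plain Java.
This is only a sketch under stated assumptions: the class name, the
REF_LIMIT constant (references occupying the low 50 bits of the word) and
both method names are hypothetical, and nothing here is taken from HotSpot
or from tagu.txt.  It shows how a single unsigned comparison can answer
"non-null and in the reference range" at once.

```java
// Model of an "address-native" tagged 64-bit word (hypothetical layout).
// Assumption: 0 is null, words in [1, REF_LIMIT) are managed references,
// and everything at or above REF_LIMIT (unsigned) is primitive payload.
final class TaggedWordModel {
    static final long REF_LIMIT = 1L << 50;  // assumed, "about 50 bits"

    // One comparison: subtracting 1 wraps null (0) around to the largest
    // unsigned value, so it fails the same range check as a primitive.
    static boolean isNonNullRef(long word) {
        return Long.compareUnsigned(word - 1, REF_LIMIT - 1) < 0;
    }

    // The straightforward two-test version, for comparison.
    static boolean isNonNullRefTwoTests(long word) {
        return word != 0 && Long.compareUnsigned(word, REF_LIMIT) < 0;
    }
}
```

Both methods agree on every input; the first simply folds the null test
into the range test.
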
>>
>> All that said, any such change is going to be really costly if it introduces
>> a new signature type.  The next break we make for signatures must have a
>> bigger payoff: either parametric polymorphism or full value types.  This is
>> why I (personally) stopped working on "tagu.patch".
>>
>> But, to end on a more hopeful note, Rickard Backman has prototyped
>> something like this in a clever way that avoids committing us to a new
>> value type or signature:  He has created an ad hoc array object that
>> can hold the sorts of two-way ref/prim unioned things you want.
>> I suppose you could build heterogeneous sequences on top of this.
>>
>> One final "but":  You can build compact heterogeneous sequences today,
>> with a little care.  A bundle of three arrays would do nicely:  N bytes for
>> tags, P longs for the primitive bits, and R objects for the refs (where
>> N=P+R).  On some JVMs, that could be more compact than an array of unions,
>> when there are mostly (32-bit) refs.  Three array headers are more than
>> one, yes, but that only matters if you have very short sequences.
>> In the JSR 292 implementation we use old-fashioned Object[] varargs
>> arrays of boxed numbers, when necessary.  I periodically reconsider
>> using N/P/R bundles, but it hasn't seemed worth it yet.  Perhaps
>> your use case makes Object[] arrays impractical?
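
For illustration, the N/P/R bundle described above might look roughly like
the following.  This is a minimal sketch, not the JSR 292 implementation:
the class name HeteroSeq, its methods and the Visitor interface are all
hypothetical.  It is append-only and read sequentially, because locating
element i in the prims or refs array requires knowing how many tags of
each kind precede it.

```java
import java.util.Arrays;

// Heterogeneous sequence stored as three parallel arrays (hypothetical API):
// N tag bytes, P primitive longs, R object references, with N = P + R.
final class HeteroSeq {
    private static final byte PRIM = 0, REF = 1;

    private byte[] tags = new byte[8];
    private long[] prims = new long[8];
    private Object[] refs = new Object[8];
    private int n, p, r;  // used lengths of tags, prims, refs

    void addLong(long v) {
        if (p == prims.length) prims = Arrays.copyOf(prims, p * 2);
        prims[p++] = v;
        addTag(PRIM);
    }

    void addRef(Object o) {
        if (r == refs.length) refs = Arrays.copyOf(refs, r * 2);
        refs[r++] = o;
        addTag(REF);
    }

    private void addTag(byte t) {
        if (n == tags.length) tags = Arrays.copyOf(tags, n * 2);
        tags[n++] = t;
    }

    int size() { return n; }

    interface Visitor {
        void prim(long bits);
        void ref(Object o);
    }

    // Walks the tags in order, advancing a separate cursor per payload array.
    void forEach(Visitor v) {
        int pi = 0, ri = 0;
        for (int i = 0; i < n; i++) {
            if (tags[i] == PRIM) v.prim(prims[pi++]);
            else v.ref(refs[ri++]);
        }
    }
}
```

Random access could be added with a per-element offset array, at the cost
of some of the compactness that motivates the layout.
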
>>
>> — John
>>
>> On Apr 22, 2015, at 4:11 AM, Ron Pressler <ron at paralleluniverse.co> wrote:
>>>
>>> Hi.
>>> I'd like to propose that the Valhalla project include a single special,
>>> built-in value type: a 64-bit "fixnum". The value has a single bit
>>> discriminating between a reference and a 63-bit long. It will, of course,
>>> be treated correctly by the GC.
>>>
>>> For completeness, a couple of static helper functions may be introduced.
>>> One that takes a long and, preserving the sign, truncates it to 63 bits,
>>> throwing an exception in the case of an overflow, and the other taking a
>>> double and truncating down to 63 bits, truncating precision by one bit
>>> (and another for the reverse 63-bit double -> double operation).
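
The helper functions proposed above could be sketched as follows.  The
class and method names are hypothetical (the proposal does not name them),
and the choice to drop the lowest mantissa bit of the double is one
possible reading of "truncating precision by one bit".

```java
// Hypothetical helpers for a 63-bit payload; not part of any JDK API.
final class Fixnum63 {
    static final long MIN = -(1L << 62);     // smallest 63-bit signed value
    static final long MAX = (1L << 62) - 1;  // largest 63-bit signed value

    // Returns the value unchanged if it fits in 63 signed bits, else throws.
    static long toFixnum(long value) {
        if (value < MIN || value > MAX) {
            throw new ArithmeticException("does not fit in 63 bits: " + value);
        }
        return value;
    }

    // Packs a double into 63 bits by discarding the lowest mantissa bit.
    // doubleToLongBits canonicalizes NaN, so a NaN cannot turn into an
    // infinity when its low bit is dropped.
    static long doubleTo63(double d) {
        return Double.doubleToLongBits(d) >>> 1;
    }

    // Reverse operation: the discarded mantissa bit comes back as zero.
    static double doubleFrom63(long bits63) {
        return Double.longBitsToDouble(bits63 << 1);
    }
}
```

A double whose lowest mantissa bit is already zero, such as 1.5 or -2.5,
survives the round trip exactly; others are rounded toward zero by one
unit in the last place.
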
>>>
>>> I believe this will be immensely useful for some applications that
>>> currently require two separate arrays to store a value of either a
>>> primitive or a reference, yet would require minimal work for GC support.
>>> Of course, this proposal can be extended to directly support any 63-bit
>>> (or smaller) value type, but even in its minimal form it is extremely
>>> useful.
>>>
>>> Ron
>>
>>