TaggedArrays (Proposal)

Rémi Forax forax at univ-mlv.fr
Mon Jul 2 06:57:30 PDT 2012


On 07/02/2012 03:05 PM, Jim Laskey wrote:
> During a week in the rarefied air of Stockholm back in May, a 
> sleepless night got me thinking.  The day after that, the thinking 
> became a reality.  I've been sitting on the code since, not sure what 
> to do next.  So..., why not start the month leading up to the JVMLS 
> with a discussion about dynamic values.
>
> Every jLanguage developer knows that primitive boxing is the enemy. 
>  Even more so for untyped languages.  We need a way to interleave 
> primitive types with references.
>
> Tagged values (value types) for dynamic languages have been approached 
> from a dozen different angles over the history of Java.  However, no 
> one seems to be satisfied with any of the proposals so far.  Either 
> the implementation is too limiting for the language developer or too 
> complex to implement.
>
> Most recently, John (Rose) proposed hiding value tagging in the JVM 
> via the Integer/Long/Float/Double.valueof methods.  I saw a few issues 
> with this proposal.  First, the implementation works differently on 32 
> bit and 64 bit platforms (only half a solution on each).  Secondly, 
> control of the tag bits is hidden such that it doesn't give a language 
> implementor any leeway on bit usage.  Finally, it will take a long 
> time for it to get introduced into the JVM.  The implementation is 
> complex, scattered all over the VM and will lead to a significant 
> multiplier for testing coverage.

but it will also help Java perf.

>
> It occurred to me on that sleepless Monday night, that the solution 
> for most dynamic languages could be so much simpler.  First, we have 
> to look at what it is we really need.  Ultimately it's about boxing. 
>  We want to avoid allocating memory whenever we need to store a 
> primitive value in an object.  Concerning ourselves with passing 
> around tagged values in registers and storing in stack frames is all 
> red herring.  All that is needed is a mechanism for storing tagged 
> values (or compressed values) in a no-type slot of a generic object. 
>  Thinking about it in these terms isolates all issues to a single 
> array-like class, and thus simplifies implementation and simplifies 
> testing.  Instances of this class can be used as objects, as stack 
> frames and even full stacks.  A good percentage of a dynamic language 
> needs are covered.

using it as a stack frames will require a pretty good escape analysis if 
you want same perf as the native stack
or is there a trick somewhere ?
But given that there is a trick to avoid boxing for local variables (see 
my talk at next JVM Summit),
having an array like this just for storing fields is enough to pull its 
weight.

>
> So, Rickard Bäckman (also of Oracle) and I defined an API and 
> implemented (in HotSpot) an interface called TaggedArray. 
>  Conceptional, TaggedArray is a fixed array of no-type slots (64-bit), 
> where each slot can contain either a reference or a tagged long value 
> (least significant bit set.)  Internally, TaggedArray class's doOop 
> method knows that it should skip any 64-bit value with the least 
> significant bit set.  How the language developer uses the other 63 
> bits is up to them.  References are just addresses.  On 32 bit 
> machines, the address (or packed address) is stored in the high 
> 32-bits (user has no access)  So there is no interference with the tag 
> bit.
>
> We supply four implementations of the API.  1) is a naive two parallel 
> arrays (one Object[], one long[]) implementation for platforms not 
> supporting TaggedArrays (and JDK 1.7), 2) an optimized version of 1) 
>  that allocates each array on demand, 3) a JNI implementation 
> (minimally needed for the interpreter) that uses the native 
> implementation and 4) the native implementation that is recognized by 
> both the C1/C2 compilers (effort only partially completed.)  In 
> general, the implementation choice is transparent to the user (optimal 
> choice.)

Being able to subclass it in order to add fixed field like a metaclass 
field, i.e a field that
is always a reference, would be cool too.

About the API, the two method set should be setValue()/setReference().
I think that getValue()/setValue() should return the long with the bit 
set because
If i want to execute x + 1, I can convert it to x + 2 at compile time 
thus avoid the shifts at runtime.

>
> I've enclosed a JavaDoc and the roughed out source.  For discussion. 
>  Fire away.
>
> Cheers,
>
> -- Jim

cheers,
Rémi



More information about the mlvm-dev mailing list