TaggedArrays (Proposal)

Mon Jul 2 10:23:30 PDT 2012

>From Jim

It occurred to me on that sleepless Monday night, that the solution for 
most dynamic languages could be so much simpler.  First, we have to look 
at what it is we really need.  Ultimately it's about boxing.  We want to 
avoid allocating memory whenever we need to store a primitive value in an 
object.  Concerning ourselves with passing around tagged values in 
registers and storing in stack frames is all red herring.  All that is 
needed is a mechanism for storing tagged values (or compressed values) in 
a no-type slot of a generic object.  Thinking about it in these terms 
isolates all issues to a single array-like class, and thus simplifies 
implementation and simplifies testing.  Instances of this class can be 
used as objects, as stack frames and even full stacks.  A good percentage 
of a dynamic language needs are covered.

Just having spent the last year implementing Smalltalk on the JVM the 
issue of boxing ( particularly for integers ) is of interest for me. While 
I agree with
your statement that allocating memory is the big issue,  I don't really 
understand your comment about 'when we store a primitive'.  My most 
visible issue
is in a 'for loop' like construct where I generate an index and use it to 
access a byte array.  In this case I need to create both the index and the 
byte accessed.
For instance scanning a million byte array requires that I create a 
million indexes and then another million ( assuming no cache ) one for 
each byte accessed. 
These are all then discarded.  Its not clear to me how your proposal would 
help here.

As I have thought about the boxing issue ( the Smalltalk I am using as a 
reference has tagged integers ) I keep thinking that any jvm solution is 
probably 
going to have some 'java' or other target driven characteristics that make 
it hard to use.  In my case I have a java class that all of my objects are 
instances of. 
If there were a 'tagged' object type then it would have to be able to 
substitute as one of my instances,  hold or reference the same information 

(like method lookups, class, shape etc ), exist on the stack or in a temp, 
 be testable in a GWT .... 

Here I agree with you that making this work in the JVM is probably too 
difficult.

So my thoughts are back to how to have a boxed primitive with no 
allocation overhead unless it is saved or accessed outside of a thread, in 
other words 
how to reuse the box.  I can do this for some situations where I can prove 
the scope but I have yet to figure out a general solution.

So while I can see a use for mixing references and primitives in an array 
it has not shown up in my work as amajor issue.  Perhaps this is due to my 
not 
keeping parallel stacks?

In any case hope to hear more on this at the summit

regards
mark

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120702/7b291f52/attachment.html