TaggedArrays (Proposal)

Jim Laskey jlaskey at me.com
Mon Jul 2 11:50:52 PDT 2012


Mark,

I'll walk into the trap of offering a solution.  While I'm not familiar with your implementation of SmallTalk, the array example might work like this (generic/loose java)

	// language stack
	public static TaggedArray stack = TaggedArrays.allocate(1024*1024);
	public static int TOS = -1;

	// value conversions
	long toValue(int i) { return ((long)i << TaggedArrays.SHIFT) | TaggedArrays.TAG; }
	int toInt(long v) { return (int)(v >> TaggedArrays.SHIFT) ; }

	// stack manipulation
	void pushObject(Object o) { stack.setReference(++TOS, o); }
	void pushInt(int i) { stack.setValue(++TOS, toValue(i)); }
	Object popObject() { stack.getReference(TOS--); }
	int popInt() { toInt(stack.getValue(TOS--)); }

	....

 	// push array on stack
	pushObject(new byte[10000]);
	// push index on stack
	pushInt(10);
	
 	// get the array element, leaving the result on the stack
	invoke("getArray"); 

	// work with the result
	int b = popInt(); 


The code for getArray might be something like (loose error handling)

	int i;
	Object a;

	try {
		i = popInt();
	} catch (TaggedArrayException | IndexOutOfBoundsException ex) {
		// handle the fact it wasn't a value or underflow
	}

	try {
		a = popObject();
	} catch (TaggedArrayException | IndexOutOfBoundsException ex) {
		// handle the fact it wasn't an object or underflow
	}

	...

	if (a instanceof byte[]) {
		// get the byte array element
		long b = ((byte[])a)[i];
		// push result on the stack
		pushInt(b);
	} else ...
	
	
Not the best of examples, but
- there are no boxed values involved here
- the JVM compiler would convert setValue/setReference/getValue/getReference to simple memory access, so will perform well
- the net result is very simple native code without a lot of runtime handling
- note: you can choose to encrypt "values" suitable for your language needs 
	
I'll see if we can set up a TaggedArray breakout session at the summit and run thru other possibilities.

Cheers,

-- Jim





On 2012-07-02, at 2:23 PM, Mark Roos wrote:

> From Jim 
> 
> It occurred to me on that sleepless Monday night, that the solution for most dynamic languages could be so much simpler.  First, we have to look at what it is we really need.  Ultimately it's about boxing.  We want to avoid allocating memory whenever we need to store a primitive value in an object.  Concerning ourselves with passing around tagged values in registers and storing in stack frames is all red herring.  All that is needed is a mechanism for storing tagged values (or compressed values) in a no-type slot of a generic object.  Thinking about it in these terms isolates all issues to a single array-like class, and thus simplifies implementation and simplifies testing.  Instances of this class can be used as objects, as stack frames and even full stacks.  A good percentage of a dynamic language needs are covered. 
> 
> Just having spent the last year implementing Smalltalk on the JVM the issue of boxing ( particularly for integers ) is of interest for me.  While I agree with 
> your statement that allocating memory is the big issue,  I don't really understand your comment about 'when we store a primitive'.  My most visible issue 
> is in a 'for loop' like construct where I generate an index and use it to access a byte array.  In this case I need to create both the index and the byte accessed. 
> For instance scanning a million byte array requires that I create a million indexes and then another million ( assuming no cache ) one for each byte accessed.   
> These are all then discarded.  Its not clear to me how your proposal would help here. 
> 
> As I have thought about the boxing issue ( the Smalltalk I am using as a reference has tagged integers ) I keep thinking that any jvm solution is probably   
> going to have some 'java' or other target driven characteristics that make it hard to use.  In my case I have a java class that all of my objects are instances of. 
> If there were a 'tagged' object type then it would have to be able to substitute as one of my instances,  hold or reference the same information 
> (like method lookups, class, shape etc ), exist on the stack or in a temp,  be testable in a GWT .... 
> 
> Here I agree with you that making this work in the JVM is probably too difficult. 
> 
> So my thoughts are back to how to have a boxed primitive with no allocation overhead unless it is saved or accessed outside of a thread, in other words 
> how to reuse the box.  I can do this for some situations where I can prove the scope but I have yet to figure out a general solution. 
> 
> So while I can see a use for mixing references and primitives in an array it has not shown up in my work as amajor issue.  Perhaps this is due to my not 
> keeping parallel stacks? 
> 
> In any case hope to hear more on this at the summit 
> 
> regards 
> mark 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120702/3395221e/attachment-0001.html 


More information about the mlvm-dev mailing list