TaggedArrays (Proposal)
Krystal Mok
rednaxelafx at gmail.com
Mon Jul 2 22:40:25 PDT 2012
On Tue, Jul 3, 2012 at 1:28 PM, Howard Lovatt <howard.lovatt at gmail.com>wrote:
> I like the idea and something along these lines would be a great addition
> to the standard library, which I will come back to as a PS.
>
> In com.sun.misc.Unsafe there are already getLong(Object, int) and
> setLong(Object, int, long) methods and the same for Object. No doubt if we
> used getLong and setLong as they stand they would fail due to type checking
> if used to access references to Objects in an Object[]. However if similar
> methods were added that circumvented the type checking,
> getLongOrReference(Object, int) and setLongOrReference(Object, int, long),
> then you could write:
>
> private Object[] vsAndRs;
>
> ...
>
> @Override public long getValue(final int index) {
> checkIndex(pos);
> final long value = Unsafe.getLongOrReference(vsAndRs, index);
> if ( (value & TaggedArrays.TAG) != 0 ) {
> return value;
> }
> throw new TaggedArrayException("value at index =" + index + " is not
> tagged");
> }
>
> @Override public Object getReference(final int index) {
> if ( isReference(index) ) {
> return vsAndRs[index];
> }
> throw new TaggedArrayException("reference at index =" + index +" is
> tagged");
> }
>
> ..
>
> Which would save half the space and still be understandable Java (i.e. not
> a collection of native calls).
>
> It's not just about type checking at Java level, but also something deep
within the VM: the need to distinguish between a plain value and a
reference (pointer), which is fundamental to a exact/accurate garbage
collector.
Jim's idea includes modifying the doOop() function in the VM, which is
exactly where HotSpot handles reference fields within an object.
If you put random values into a place where HotSpot (or other JVM impls
using exact GC) think it's a reference, then something bad may happen later
(e.g. during GC), that the VM think it had found a corrupted reference.
- Kris
> Great work,
>
> -- Howard.
>
> PS A few random thoughts for the interface, assuming that something like
> this makes it into the JDK then the following additions might be nice:
>
> 1. Fills that accept a lambda that receives the index as its argument
> (filling function as opposed to filling constant).
> 2. A second interface and second factory that define tagged arrays for
> long and double in addition to Objects, like the example code given.
> 3. TaggedArray32, primarily for phones etc., with the extended interface,
> point 2 above, for TaggedArray32 accepting Objects, ints, and floats. On a
> 64 bit system TaggedArray32 would be the same as TaggedArray, but for 32
> bit systems it would save space.
>
>
> On 2 July 2012 23:05, Jim Laskey <jlaskey at me.com> wrote:
>
>> During a week in the rarefied air of Stockholm back in May, a sleepless
>> night got me thinking. The day after that, the thinking became a reality.
>> I've been sitting on the code since, not sure what to do next. So..., why
>> not start the month leading up to the JVMLS with a discussion about dynamic
>> values.
>>
>> Every jLanguage developer knows that primitive boxing is the enemy. Even
>> more so for untyped languages. We need a way to interleave primitive types
>> with references.
>>
>> Tagged values (value types) for dynamic languages have been approached
>> from a dozen different angles over the history of Java. However, no one
>> seems to be satisfied with any of the proposals so far. Either the
>> implementation is too limiting for the language developer or too complex to
>> implement.
>>
>> Most recently, John (Rose) proposed hiding value tagging in the JVM via
>> the Integer/Long/Float/Double.valueof methods. I saw a few issues with
>> this proposal. First, the implementation works differently on 32 bit and
>> 64 bit platforms (only half a solution on each). Secondly, control of the
>> tag bits is hidden such that it doesn't give a language implementor any
>> leeway on bit usage. Finally, it will take a long time for it to get
>> introduced into the JVM. The implementation is complex, scattered all over
>> the VM and will lead to a significant multiplier for testing coverage.
>>
>> It occurred to me on that sleepless Monday night, that the solution for
>> most dynamic languages could be so much simpler. First, we have to look at
>> what it is we really need. Ultimately it's about boxing. We want to avoid
>> allocating memory whenever we need to store a primitive value in an object.
>> Concerning ourselves with passing around tagged values in registers and
>> storing in stack frames is all red herring. All that is needed is a
>> mechanism for storing tagged values (or compressed values) in a no-type
>> slot of a generic object. Thinking about it in these terms isolates all
>> issues to a single array-like class, and thus simplifies implementation and
>> simplifies testing. Instances of this class can be used as objects, as
>> stack frames and even full stacks. A good percentage of a dynamic language
>> needs are covered.
>>
>> So, Rickard Bäckman (also of Oracle) and I defined an API and implemented
>> (in HotSpot) an interface called TaggedArray. Conceptional, TaggedArray is
>> a fixed array of no-type slots (64-bit), where each slot can contain either
>> a reference or a tagged long value (least significant bit set.)
>> Internally, TaggedArray class's doOop method knows that it should skip
>> any 64-bit value with the least significant bit set. How the language
>> developer uses the other 63 bits is up to them. References are just
>> addresses. On 32 bit machines, the address (or packed address) is stored
>> in the high 32-bits (user has no access) So there is no interference with
>> the tag bit.
>>
>> We supply four implementations of the API. 1) is a naive two parallel
>> arrays (one Object[], one long[]) implementation for platforms not
>> supporting TaggedArrays (and JDK 1.7), 2) an optimized version of 1) that
>> allocates each array on demand, 3) a JNI implementation (minimally needed
>> for the interpreter) that uses the native implementation and 4) the native
>> implementation that is recognized by both the C1/C2 compilers (effort only
>> partially completed.) In general, the implementation choice is transparent
>> to the user (optimal choice.)
>>
>> I've enclosed a JavaDoc and the roughed out source. For discussion.
>> Fire away.
>>
>> Cheers,
>>
>> -- Jim
>>
>>
>>
>>
>>
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>
>>
>
>
> --
> -- Howard.
>
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120703/3554aec7/attachment-0001.html
More information about the mlvm-dev
mailing list