Transparency

Thu Jul 15 16:45:57 PDT 2010

2010/7/15 Reinier Zwitserloot <reinier at zwitserloot.com>

> What happens if some class has multiple type parameters? Should we create a
> Map$Short$Byte? That's going to explode into a very very large number of
> types as you add on more type parameters. What seems to be required for your
> scheme is a mix of complete reification combined with a lot of magic
> bytecode rewriting, which you haven't fully elaborated on.

If you want a Map<short, byte>, yes, we create a Map$Short$Byte. This is the
reason why you need dynamic code expansion, that happens on demand at
reify() calls. You only pay for what you are actually using. The Cartesian
product of all your generic parameters in collections and other interesting
classes X all primitive types would indeed result in thousands of possible
expansions; but in practice, any given application is not likely to use more
than a handful of different expansions. The reifier would hold the produced
classes in a weak cache, so when some class is not used for a long time,
it's class-GC'd. (Same technique already used by synthetic classes produced
by reflection, and other dynamic frameworks.)

> A few questions that I'm not sure your proposal answers:
>  - When should the JVM stop specializing new classes? With for example 4
> type parameters, there would be 8*8*8*8 = 4096 different classes required,
> which seems quite infeasible.
>

As explained above, this only happens if the process actually instantiates
all these 4096 combinations... not very likely.

>  - Exactly how does Magic.reify() work? The backing array field in the
> ArrayList implementation is not of type T[]; it's of type Object[]. How
> could any library know that it should rewrite this Object[] to int[] in the
> generated specialization without hardcoding a ruleset for all known classes?
>

I have actually proposed this, a configuration-by-exception mechanism - you
can have annotations to hint what to change, but the reifier could also have
default rules and heuristics to use in classes that lack such metadata.

>  - Even if ArrayList is rewritten to use T[] instead of Object[], that's no
> help for third party classes that used Object[] - Oracle can't very well
> change every library ever written for java. In a class where T[] cannot be
> rewritten to int[], exactly what does ArrayList$Int even do? Presumably it
> would have to have an (I)Z method auto-generated by the classloader as
> calling code will expect it to exist.
>

Non an issue, see previous item. BTW the reifier _must_ only rely on the
erased types like Object[], because it's a runtime transformation on
bytecode, not a compile-time tool that could see source code.

>  - What happens when heap corruption occurs? i.e: List<int> list = new
> ArrayList() /* Raw type! */; list.add(10) /* Generates call to non-existing
> (I)Z add method */; - Normally this results in a MethodNotFoundError, which
> is rarely caught as errors aren't usually supposed to be caught. That seems
> entirely inappropriate for heap corruption.
>

See http://mail.openjdk.java.net/pipermail/lambda-dev/2010-July/001925.html,
we won't have any (I)Z method, we just call the standard add(Object) and
rely on VM optimizations to get rid of the overhead of box/CALL/unbox
sequences.

>  - If the syntax is to remain ArrayList<Integer> and not ArrayList<int>,
> then what happens if one attempts to add null?
>

The first thing that the reified version of add(Object) does is unboxing -
((Integer)e).intValue() - so the result is a NullPointerException - that is
consistent with the existing autoboxing semantics. Notice that when the VM
can optimize those box/CALL/unbox patterns, the issue is non-existent as the
original value is guaranteed not-null.

>  - How does the auto-generator know that isEmpty() is not dependent on
> reification, but e.g. add() is?
> NB: At this point in time, the Oracle JVM does *NOT* eliminate autoboxing
> and unboxing, though the JRockit VM does.
>

I know that HotSpot doesn't provide special optimizations for autoboxing,
but I was just guessing that it can remove this overhead at least for
trivial cases as effect of EA & scalar replacement... if not, hopefully this
could be added with little effort, I guess it's just a matter of
intrinsifying the valueOf()/xxxValue() methods.

A+
Osvaldo

>
>
>
>
>
>  --Reinier Zwitserloot
>
>
>
>
> On Wed, Jul 14, 2010 at 6:16 PM, Osvaldo Doederlein <opinali at gmail.com>wrote:
>
>> A concrete suggestion: we could support this via bytecode transformation,
>> e.g.
>>
>> ArrayList<Integer> list = Magic.reify(ArrayList.class, int.class);
>>
>> ...where reify() will basically change the type of ArrayList's internal
>> Object[] backing store to int[]. This would probably benefit some help from
>> the original collection, e.g. annotations that list which
>> fields/parameters/returns/locals must be changed (or not); but this mapping
>> could be provided on a configuration-by-exception manner. The bytecode
>> transformation is mostly simple, it just needs also to translate some API
>> calls (e.g. to Arrays.copyOf(T[], int, int)) to equivalent calls for the
>> required primitive-array type (or fail the whole thing, if it hits some call
>> that is unknown to the transformer or has no primitive-friendly overloads).
>>
>> Several reifications of the same origin class could be arranged in a
>> hierarchy that's designed to both reflect the original hierarchy and to
>> share methods that don't directly depend on the element type (e.g. isEmpty()
>> { return size()==0; }). So we get synthetic classes like:
>>
>> AbstractCollection$ extends AbstractCollection
>> AbstractCollection$Int extends AbstractCollection$
>> AbstractList$Int extends AbstractCollection$Int
>> ArrayList$Int extends AbstractList$Int
>>
>> The new classes can even inherit the top-level class from the original
>> hierarchy, as long as that class has no fields or at least no "container"
>> Object[] fields.
>>
>> Returning a type like ArrayList<Integer> means no changes in syntax,
>> method signatures, classfile format, reflection etc. to support
>> ArrayList<int>. So the proposal is a simple library without any exceptional
>> demands - which is the only kind of proposal that we could hope adding to
>> JDK 7, at this late time. The tradeoff is that we're stuck with signatures
>> like add(I)Z, so we depend on HotSpot to optimize out the boxing/unboxing
>> that happens in each method call. At least for inlined calls I guess this
>> optimization is already done; the remaining work should be feasible, and it
>> could wait for some 7uXX maintenance update if there's no time do do that
>> until FCS.
>>
>> There is no Object[] field to worry about... but still, some Object[]s in
>> APIs like toArray(). In that case there is no significant extra conversion
>> cost (toArray() already does a defensive copy so we just do a different copy
>> from int[] to Object[]). The reified class could implement some extra,
>> type-specific helper interface that adds methods like int[] toArrayInt();
>> this would probably make the transformer more tightly-bound to the Java SE
>> Collections API, or require extra annotations for custom additions; not
>> pretty, but not hard to implement. And while we're at it, I would love to
>> add equivalent methods that DON'T do a defensive copy, just return the
>> internal array, or assign an external array to the collection's field.
>> Perf-critical code often benefits from that; I have personally had this
>> problem. When your code is already optimized enough, redundant data copies
>> always bubble up to the top of the performance profile. This is why
>> estimating the initial size is everybody's favorite collections
>> optimization, as it avoids lots of allocation and copy from growing.
>>
>> Finally, it's also easy to implement as the bytecode transformation, full
>> reification for any non-public method. So we can't touch add(Object), but we
>> certainly can do that for all private and even protected methods, and also
>> entire non-public helper classes like AbstractList's Itr, ListItr, SubList,
>> and RandomAccessSubList.
>>
>> I don't like ideas like new IntList(), or some Lists.of() that's just a
>> factory to hide types like IntList. This would add enormous bloat to the
>> core, and/or would be very limited - there's a hefty number of collection
>> classes in the core (not to mention third-party code) and I may need
>> primitive-optimized versions of ANY of these classes. If I need a List of
>> int, I may also need the specific features or O(N) traits of particular
>> implementations like ArrayList, LinkedList, or even CopyOnWriteArrayList.
>> Don't make me choose between optimal element storage (primitive reification)
>> and optimal API/structure (compact array, linked nodes, concurrency etc.). A
>> partial solution is sometimes worse than no solution at all - it avoids a
>> much bigger mess, e.g. using a core collection for ArrayList-of-int, and GNU
>> Trove for LinkedList-of-int or whatever.
>>
>> As a final remark, this implementation would be valid:
>>
>> <T> T reify (T collection, Class elementType) {
>>   return collection;
>> }
>>
>> ..it just wouldn't gain any performance. But it's good enough, for
>> example, to add (as a compatibility feature) to platforms where code
>> footprint is a higher concern, like the lowest Java ME profiles. Or even to
>> the first release of JDK 7, if there is no time for a full implementation of
>> this API.
>>
>> A+
>> Osvaldo
>>
>>
>> 2010/7/14 Reinier Zwitserloot <reinier at zwitserloot.com>
>>
>> Indeed, having: new ArrayList<int>(); be as fast as an int array requires
>>> reification; once ArrayList creates a backing array of type Object[], the
>>> performance game is lost.
>>>
>>> Nothing short of reification can address this issue. To be specific,
>>> specialization won't solve it any more than primitives-in-generics does.
>>> However, this:
>>>
>>> List<int> list = Lists.of(int.class);
>>> list.add(10);
>>>
>>> or:
>>>
>>> List<int> list = new IntList();
>>> list.add(10);
>>>
>>>
>>> *CAN* be made fast. Specialization, interestingly enough, cannot do
>>> this*,
>>> but my primitives in generics proposal can. Which partly makes me wonder
>>> why
>>> we're talking about the far more complicated specialization concept in
>>> the
>>> first place. I know scala uses this, but they either introduced their
>>> collections API with specialization from day one, or they created a
>>> backwards incompatible retrofit later.
>>>
>>> *) list.add(10), is, as far as the compiler knows, a call to add(T) on a
>>> List<int>. If this is to be fast under specialization, it would therefore
>>> have to generate a call to an add method with signature (I)Z and not with
>>> signature (Ljava/lang/Object;)Z like it would today (and autobox to make
>>> the
>>> int fit as an object). However, if it is to generate such a call, then
>>> *ANY*
>>> list, no matter how it was obtained, must have an add (I)Z method, but
>>> lists
>>> written today do not because that's not part of the List.java interface.
>>> We
>>> can't add methods to interfaces or we'd break backwards compatibility,
>>> and
>>> breaking backwards compatibility is not acceptable. Therefore, in list
>>> implementations that lack add(I)Z, the JVM or the classloader has to
>>> generate these. There's precedence of course: Brian Goetz' proposal for
>>> defender methods does something quite similar. Still, this is far from
>>> trivial, and I'd like to see a proposal on how one could update List.java
>>> so
>>> that the JVM / ClassLoader considers the specialization methods as
>>> defenderish, but the object-based one itself not. Primitives-in-generics
>>> keeps things simple by letting the call to (Ljava/lang/Object;)Z stand,
>>> and
>>> letting the JVM hotspot compiler eliminate the inefficiency instead.
>>>
>>>  --Reinier Zwitserloot
>>>
>>>
>>>
>>> On Tue, Jul 13, 2010 at 11:08 PM, Nathan Bryant <
>>> nathan.bryant at linkshare.com
>>> > wrote:
>>>
>>> > John, it's just array creation
>>> >
>>> > class Foo<T> {
>>> >        T[] arr = new T[SIZE]; // Can't write this without reification
>>> > }
>>> >
>>> > class Foo<T> {
>>> >        Object[] arr = new Object[SIZE];
>>> >
>>> >        void set(int i, T t) { arr[i] = t; } // Either causes autoboxing
>>> > unbeknownst to the sorry fool who wrote new Foo<int>, or becomes
>>> illegal
>>> > }
>>> >
>>> >
>>> > John Nilsson wrote:
>>> >
>>> > > What exactly is it that requires reification to work?
>>> >
>>> > BR,
>>> > John
>>> >
>>> >
>>>
>>>
>>
>