Transparency

Wed Jul 14 09:16:29 PDT 2010

A concrete suggestion: we could support this via bytecode transformation,
e.g.

ArrayList<Integer> list = Magic.reify(ArrayList.class, int.class);

...where reify() will basically change the type of ArrayList's internal
Object[] backing store to int[]. This would probably benefit some help from
the original collection, e.g. annotations that list which
fields/parameters/returns/locals must be changed (or not); but this mapping
could be provided on a configuration-by-exception manner. The bytecode
transformation is mostly simple, it just needs also to translate some API
calls (e.g. to Arrays.copyOf(T[], int, int)) to equivalent calls for the
required primitive-array type (or fail the whole thing, if it hits some call
that is unknown to the transformer or has no primitive-friendly overloads).

Several reifications of the same origin class could be arranged in a
hierarchy that's designed to both reflect the original hierarchy and to
share methods that don't directly depend on the element type (e.g. isEmpty()
{ return size()==0; }). So we get synthetic classes like:

AbstractCollection$ extends AbstractCollection
AbstractCollection$Int extends AbstractCollection$
AbstractList$Int extends AbstractCollection$Int
ArrayList$Int extends AbstractList$Int

The new classes can even inherit the top-level class from the original
hierarchy, as long as that class has no fields or at least no "container"
Object[] fields.

Returning a type like ArrayList<Integer> means no changes in syntax, method
signatures, classfile format, reflection etc. to support ArrayList<int>. So
the proposal is a simple library without any exceptional demands - which is
the only kind of proposal that we could hope adding to JDK 7, at this late
time. The tradeoff is that we're stuck with signatures like add(I)Z, so we
depend on HotSpot to optimize out the boxing/unboxing that happens in each
method call. At least for inlined calls I guess this optimization is already
done; the remaining work should be feasible, and it could wait for some 7uXX
maintenance update if there's no time do do that until FCS.

There is no Object[] field to worry about... but still, some Object[]s in
APIs like toArray(). In that case there is no significant extra conversion
cost (toArray() already does a defensive copy so we just do a different copy
from int[] to Object[]). The reified class could implement some extra,
type-specific helper interface that adds methods like int[] toArrayInt();
this would probably make the transformer more tightly-bound to the Java SE
Collections API, or require extra annotations for custom additions; not
pretty, but not hard to implement. And while we're at it, I would love to
add equivalent methods that DON'T do a defensive copy, just return the
internal array, or assign an external array to the collection's field.
Perf-critical code often benefits from that; I have personally had this
problem. When your code is already optimized enough, redundant data copies
always bubble up to the top of the performance profile. This is why
estimating the initial size is everybody's favorite collections
optimization, as it avoids lots of allocation and copy from growing.

Finally, it's also easy to implement as the bytecode transformation, full
reification for any non-public method. So we can't touch add(Object), but we
certainly can do that for all private and even protected methods, and also
entire non-public helper classes like AbstractList's Itr, ListItr, SubList,
and RandomAccessSubList.

I don't like ideas like new IntList(), or some Lists.of() that's just a
factory to hide types like IntList. This would add enormous bloat to the
core, and/or would be very limited - there's a hefty number of collection
classes in the core (not to mention third-party code) and I may need
primitive-optimized versions of ANY of these classes. If I need a List of
int, I may also need the specific features or O(N) traits of particular
implementations like ArrayList, LinkedList, or even CopyOnWriteArrayList.
Don't make me choose between optimal element storage (primitive reification)
and optimal API/structure (compact array, linked nodes, concurrency etc.). A
partial solution is sometimes worse than no solution at all - it avoids a
much bigger mess, e.g. using a core collection for ArrayList-of-int, and GNU
Trove for LinkedList-of-int or whatever.

As a final remark, this implementation would be valid:

<T> T reify (T collection, Class elementType) {
  return collection;
}

..it just wouldn't gain any performance. But it's good enough, for example,
to add (as a compatibility feature) to platforms where code footprint is a
higher concern, like the lowest Java ME profiles. Or even to the first
release of JDK 7, if there is no time for a full implementation of this API.

A+
Osvaldo

2010/7/14 Reinier Zwitserloot <reinier at zwitserloot.com>

> Indeed, having: new ArrayList<int>(); be as fast as an int array requires
> reification; once ArrayList creates a backing array of type Object[], the
> performance game is lost.
>
> Nothing short of reification can address this issue. To be specific,
> specialization won't solve it any more than primitives-in-generics does.
> However, this:
>
> List<int> list = Lists.of(int.class);
> list.add(10);
>
> or:
>
> List<int> list = new IntList();
> list.add(10);
>
>
> *CAN* be made fast. Specialization, interestingly enough, cannot do this*,
> but my primitives in generics proposal can. Which partly makes me wonder
> why
> we're talking about the far more complicated specialization concept in the
> first place. I know scala uses this, but they either introduced their
> collections API with specialization from day one, or they created a
> backwards incompatible retrofit later.
>
> *) list.add(10), is, as far as the compiler knows, a call to add(T) on a
> List<int>. If this is to be fast under specialization, it would therefore
> have to generate a call to an add method with signature (I)Z and not with
> signature (Ljava/lang/Object;)Z like it would today (and autobox to make
> the
> int fit as an object). However, if it is to generate such a call, then
> *ANY*
> list, no matter how it was obtained, must have an add (I)Z method, but
> lists
> written today do not because that's not part of the List.java interface. We
> can't add methods to interfaces or we'd break backwards compatibility, and
> breaking backwards compatibility is not acceptable. Therefore, in list
> implementations that lack add(I)Z, the JVM or the classloader has to
> generate these. There's precedence of course: Brian Goetz' proposal for
> defender methods does something quite similar. Still, this is far from
> trivial, and I'd like to see a proposal on how one could update List.java
> so
> that the JVM / ClassLoader considers the specialization methods as
> defenderish, but the object-based one itself not. Primitives-in-generics
> keeps things simple by letting the call to (Ljava/lang/Object;)Z stand, and
> letting the JVM hotspot compiler eliminate the inefficiency instead.
>
>  --Reinier Zwitserloot
>
>
>
> On Tue, Jul 13, 2010 at 11:08 PM, Nathan Bryant <
> nathan.bryant at linkshare.com
> > wrote:
>
> > John, it's just array creation
> >
> > class Foo<T> {
> >        T[] arr = new T[SIZE]; // Can't write this without reification
> > }
> >
> > class Foo<T> {
> >        Object[] arr = new Object[SIZE];
> >
> >        void set(int i, T t) { arr[i] = t; } // Either causes autoboxing
> > unbeknownst to the sorry fool who wrote new Foo<int>, or becomes illegal
> > }
> >
> >
> > John Nilsson wrote:
> >
> > > What exactly is it that requires reification to work?
> >
> > BR,
> > John
> >
> >
>
>