Transparency

Reinier Zwitserloot reinier at zwitserloot.com
Thu Jul 15 18:02:20 PDT 2010


Throwing a NullPointerException in response to this code:

List<Integer> list = new ArrayList<Integer>();
list.add(null);

is not backwards compatible.

 --Reinier Zwitserloot



On Fri, Jul 16, 2010 at 1:45 AM, Osvaldo Doederlein <opinali at gmail.com>wrote:

> 2010/7/15 Reinier Zwitserloot <reinier at zwitserloot.com>
>
> What happens if some class has multiple type parameters? Should we create a
>> Map$Short$Byte? That's going to explode into a very very large number of
>> types as you add on more type parameters. What seems to be required for your
>> scheme is a mix of complete reification combined with a lot of magic
>> bytecode rewriting, which you haven't fully elaborated on.
>
>
> If you want a Map<short, byte>, yes, we create a Map$Short$Byte. This is
> the reason why you need dynamic code expansion, that happens on demand at
> reify() calls. You only pay for what you are actually using. The Cartesian
> product of all your generic parameters in collections and other interesting
> classes X all primitive types would indeed result in thousands of possible
> expansions; but in practice, any given application is not likely to use more
> than a handful of different expansions. The reifier would hold the produced
> classes in a weak cache, so when some class is not used for a long time,
> it's class-GC'd. (Same technique already used by synthetic classes produced
> by reflection, and other dynamic frameworks.)
>
>
>> A few questions that I'm not sure your proposal answers:
>>  - When should the JVM stop specializing new classes? With for example 4
>> type parameters, there would be 8*8*8*8 = 4096 different classes required,
>> which seems quite infeasible.
>>
>
> As explained above, this only happens if the process actually instantiates
> all these 4096 combinations... not very likely.
>
>
>>  - Exactly how does Magic.reify() work? The backing array field in the
>> ArrayList implementation is not of type T[]; it's of type Object[]. How
>> could any library know that it should rewrite this Object[] to int[] in the
>> generated specialization without hardcoding a ruleset for all known classes?
>>
>
> I have actually proposed this, a configuration-by-exception mechanism - you
> can have annotations to hint what to change, but the reifier could also have
> default rules and heuristics to use in classes that lack such metadata.
>
>
>>  - Even if ArrayList is rewritten to use T[] instead of Object[], that's
>> no help for third party classes that used Object[] - Oracle can't very well
>> change every library ever written for java. In a class where T[] cannot be
>> rewritten to int[], exactly what does ArrayList$Int even do? Presumably it
>> would have to have an (I)Z method auto-generated by the classloader as
>> calling code will expect it to exist.
>>
>
> Non an issue, see previous item. BTW the reifier _must_ only rely on the
> erased types like Object[], because it's a runtime transformation on
> bytecode, not a compile-time tool that could see source code.
>
>
>>  - What happens when heap corruption occurs? i.e: List<int> list = new
>> ArrayList() /* Raw type! */; list.add(10) /* Generates call to non-existing
>> (I)Z add method */; - Normally this results in a MethodNotFoundError, which
>> is rarely caught as errors aren't usually supposed to be caught. That seems
>> entirely inappropriate for heap corruption.
>>
>
> See
> http://mail.openjdk.java.net/pipermail/lambda-dev/2010-July/001925.html,
> we won't have any (I)Z method, we just call the standard add(Object) and
> rely on VM optimizations to get rid of the overhead of box/CALL/unbox
> sequences.
>
>
>>  - If the syntax is to remain ArrayList<Integer> and not ArrayList<int>,
>> then what happens if one attempts to add null?
>>
>
> The first thing that the reified version of add(Object) does is unboxing -
> ((Integer)e).intValue() - so the result is a NullPointerException - that is
> consistent with the existing autoboxing semantics. Notice that when the VM
> can optimize those box/CALL/unbox patterns, the issue is non-existent as the
> original value is guaranteed not-null.
>
>
>>  - How does the auto-generator know that isEmpty() is not dependent on
>> reification, but e.g. add() is?
>> NB: At this point in time, the Oracle JVM does *NOT* eliminate autoboxing
>> and unboxing, though the JRockit VM does.
>>
>
> I know that HotSpot doesn't provide special optimizations for autoboxing,
> but I was just guessing that it can remove this overhead at least for
> trivial cases as effect of EA & scalar replacement... if not, hopefully this
> could be added with little effort, I guess it's just a matter of
> intrinsifying the valueOf()/xxxValue() methods.
>
> A+
> Osvaldo
>
>
>>
>>
>>
>>
>>
>>  --Reinier Zwitserloot
>>
>>
>>
>>
>> On Wed, Jul 14, 2010 at 6:16 PM, Osvaldo Doederlein <opinali at gmail.com>wrote:
>>
>>> A concrete suggestion: we could support this via bytecode transformation,
>>> e.g.
>>>
>>> ArrayList<Integer> list = Magic.reify(ArrayList.class, int.class);
>>>
>>> ...where reify() will basically change the type of ArrayList's internal
>>> Object[] backing store to int[]. This would probably benefit some help from
>>> the original collection, e.g. annotations that list which
>>> fields/parameters/returns/locals must be changed (or not); but this mapping
>>> could be provided on a configuration-by-exception manner. The bytecode
>>> transformation is mostly simple, it just needs also to translate some API
>>> calls (e.g. to Arrays.copyOf(T[], int, int)) to equivalent calls for the
>>> required primitive-array type (or fail the whole thing, if it hits some call
>>> that is unknown to the transformer or has no primitive-friendly overloads).
>>>
>>> Several reifications of the same origin class could be arranged in a
>>> hierarchy that's designed to both reflect the original hierarchy and to
>>> share methods that don't directly depend on the element type (e.g. isEmpty()
>>> { return size()==0; }). So we get synthetic classes like:
>>>
>>> AbstractCollection$ extends AbstractCollection
>>> AbstractCollection$Int extends AbstractCollection$
>>> AbstractList$Int extends AbstractCollection$Int
>>> ArrayList$Int extends AbstractList$Int
>>>
>>> The new classes can even inherit the top-level class from the original
>>> hierarchy, as long as that class has no fields or at least no "container"
>>> Object[] fields.
>>>
>>> Returning a type like ArrayList<Integer> means no changes in syntax,
>>> method signatures, classfile format, reflection etc. to support
>>> ArrayList<int>. So the proposal is a simple library without any exceptional
>>> demands - which is the only kind of proposal that we could hope adding to
>>> JDK 7, at this late time. The tradeoff is that we're stuck with signatures
>>> like add(I)Z, so we depend on HotSpot to optimize out the boxing/unboxing
>>> that happens in each method call. At least for inlined calls I guess this
>>> optimization is already done; the remaining work should be feasible, and it
>>> could wait for some 7uXX maintenance update if there's no time do do that
>>> until FCS.
>>>
>>> There is no Object[] field to worry about... but still, some Object[]s in
>>> APIs like toArray(). In that case there is no significant extra conversion
>>> cost (toArray() already does a defensive copy so we just do a different copy
>>> from int[] to Object[]). The reified class could implement some extra,
>>> type-specific helper interface that adds methods like int[] toArrayInt();
>>> this would probably make the transformer more tightly-bound to the Java SE
>>> Collections API, or require extra annotations for custom additions; not
>>> pretty, but not hard to implement. And while we're at it, I would love to
>>> add equivalent methods that DON'T do a defensive copy, just return the
>>> internal array, or assign an external array to the collection's field.
>>> Perf-critical code often benefits from that; I have personally had this
>>> problem. When your code is already optimized enough, redundant data copies
>>> always bubble up to the top of the performance profile. This is why
>>> estimating the initial size is everybody's favorite collections
>>> optimization, as it avoids lots of allocation and copy from growing.
>>>
>>> Finally, it's also easy to implement as the bytecode transformation, full
>>> reification for any non-public method. So we can't touch add(Object), but we
>>> certainly can do that for all private and even protected methods, and also
>>> entire non-public helper classes like AbstractList's Itr, ListItr, SubList,
>>> and RandomAccessSubList.
>>>
>>> I don't like ideas like new IntList(), or some Lists.of() that's just a
>>> factory to hide types like IntList. This would add enormous bloat to the
>>> core, and/or would be very limited - there's a hefty number of collection
>>> classes in the core (not to mention third-party code) and I may need
>>> primitive-optimized versions of ANY of these classes. If I need a List of
>>> int, I may also need the specific features or O(N) traits of particular
>>> implementations like ArrayList, LinkedList, or even CopyOnWriteArrayList.
>>> Don't make me choose between optimal element storage (primitive reification)
>>> and optimal API/structure (compact array, linked nodes, concurrency etc.). A
>>> partial solution is sometimes worse than no solution at all - it avoids a
>>> much bigger mess, e.g. using a core collection for ArrayList-of-int, and GNU
>>> Trove for LinkedList-of-int or whatever.
>>>
>>> As a final remark, this implementation would be valid:
>>>
>>> <T> T reify (T collection, Class elementType) {
>>>   return collection;
>>> }
>>>
>>> ..it just wouldn't gain any performance. But it's good enough, for
>>> example, to add (as a compatibility feature) to platforms where code
>>> footprint is a higher concern, like the lowest Java ME profiles. Or even to
>>> the first release of JDK 7, if there is no time for a full implementation of
>>> this API.
>>>
>>> A+
>>> Osvaldo
>>>
>>>
>>> 2010/7/14 Reinier Zwitserloot <reinier at zwitserloot.com>
>>>
>>> Indeed, having: new ArrayList<int>(); be as fast as an int array requires
>>>> reification; once ArrayList creates a backing array of type Object[],
>>>> the
>>>> performance game is lost.
>>>>
>>>> Nothing short of reification can address this issue. To be specific,
>>>> specialization won't solve it any more than primitives-in-generics does.
>>>> However, this:
>>>>
>>>> List<int> list = Lists.of(int.class);
>>>> list.add(10);
>>>>
>>>> or:
>>>>
>>>> List<int> list = new IntList();
>>>> list.add(10);
>>>>
>>>>
>>>> *CAN* be made fast. Specialization, interestingly enough, cannot do
>>>> this*,
>>>> but my primitives in generics proposal can. Which partly makes me wonder
>>>> why
>>>> we're talking about the far more complicated specialization concept in
>>>> the
>>>> first place. I know scala uses this, but they either introduced their
>>>> collections API with specialization from day one, or they created a
>>>> backwards incompatible retrofit later.
>>>>
>>>> *) list.add(10), is, as far as the compiler knows, a call to add(T) on a
>>>> List<int>. If this is to be fast under specialization, it would
>>>> therefore
>>>> have to generate a call to an add method with signature (I)Z and not
>>>> with
>>>> signature (Ljava/lang/Object;)Z like it would today (and autobox to make
>>>> the
>>>> int fit as an object). However, if it is to generate such a call, then
>>>> *ANY*
>>>> list, no matter how it was obtained, must have an add (I)Z method, but
>>>> lists
>>>> written today do not because that's not part of the List.java interface.
>>>> We
>>>> can't add methods to interfaces or we'd break backwards compatibility,
>>>> and
>>>> breaking backwards compatibility is not acceptable. Therefore, in list
>>>> implementations that lack add(I)Z, the JVM or the classloader has to
>>>> generate these. There's precedence of course: Brian Goetz' proposal for
>>>> defender methods does something quite similar. Still, this is far from
>>>> trivial, and I'd like to see a proposal on how one could update
>>>> List.java so
>>>> that the JVM / ClassLoader considers the specialization methods as
>>>> defenderish, but the object-based one itself not. Primitives-in-generics
>>>> keeps things simple by letting the call to (Ljava/lang/Object;)Z stand,
>>>> and
>>>> letting the JVM hotspot compiler eliminate the inefficiency instead.
>>>>
>>>>  --Reinier Zwitserloot
>>>>
>>>>
>>>>
>>>> On Tue, Jul 13, 2010 at 11:08 PM, Nathan Bryant <
>>>> nathan.bryant at linkshare.com
>>>> > wrote:
>>>>
>>>> > John, it's just array creation
>>>> >
>>>> > class Foo<T> {
>>>> >        T[] arr = new T[SIZE]; // Can't write this without reification
>>>> > }
>>>> >
>>>> > class Foo<T> {
>>>> >        Object[] arr = new Object[SIZE];
>>>> >
>>>> >        void set(int i, T t) { arr[i] = t; } // Either causes
>>>> autoboxing
>>>> > unbeknownst to the sorry fool who wrote new Foo<int>, or becomes
>>>> illegal
>>>> > }
>>>> >
>>>> >
>>>> > John Nilsson wrote:
>>>> >
>>>> > > What exactly is it that requires reification to work?
>>>> >
>>>> > BR,
>>>> > John
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>


More information about the lambda-dev mailing list