Transparency

Thu Jul 15 06:21:04 PDT 2010

2010/7/14 Collin Fagan <collin.fagan at gmail.com>

>
> It could be my imperfect understanding your proposal, but when I hear you
> talk about ArrayList and not the interface List, List. Well, I get the
> feeling that you are talking about an implementation of primitives in
> generics for a select few JDK classes only and this mechanism would not be
> available to any old programmer who wants to use primitives.
>

My use of ArrayList, not List, was not intentional to mean anything, but
you've spotted a bug in my proposal: a call like
Magic.reify(ArrayList.class, int.class) should obviously return a subtype of
ArrayList. This is not compatible with my proposed draft of implementation
hierarchy. But of course, we don't want the reified collection to be a
subclass of ArrayList, remarkably because this would carry together
ArrayList's fields, and we surely don't need its stinkin' Object[] field.
Well, that's not a big deal if this field can just be left uninitialized.
But forced subtyping of ArrayList still has the problem of making harder to
create an implementation hierarchy that favors code reuse (and this is an
important concern in any template expansion-like mechanism).

What I really want is: use the ArrayList class as an implementation
template, but return something that just implements the same interfaces
(remarkably List and RandomAccess). In that case, the reify() API could
become:

public class Collections {
    public static <T> List<T> reifiedCollection (Class<? extends
Collection<?>> template, Class<?> primitive);
    public static <T> List<T> reifiedList (Class<? extends List> template,
Class<?> primitive);
    public static <K,V> Map<K,V> reifiedMap (Class<? extends Map> template,
Class<?> primitive);
}
...
List<Integer> list = Collections.reifiedList(ArrayList.class, int.class);

The reifiedXxx() methods can be just facades to a single transformer; they
return each of the core Collections interfaces, allowing invocations to be
warning-free. Their signature also makes explicit that the returned object
is just a Collection/List/Map, not necessarily a subtype of the template
class.

Notice that the runtime should not depend on static knowledge of
implementation types to perform the proposed optimizations. Making this more
explicit, say you have the list variable allocated above, being used
somewhere else (in a method too far away, so the compiler cannot use flow
analysis to find that its type is always the reified ArrayList$Int). It
doesn't matter. If we do this:

list.add(10);

thjat code is compiled as usual, as a full-blown invokeinterface to
add(Object). But that doesn't matter. The method that's actually invoked is
the reified method, like this:

public boolean add (Object e) {
    int reified_e = ((Integer)e).intValue();
    ensureCapacity(size + 1);
    elementData[size++] = e;
    return true;
}

...where elementData has static type int[], only the first line contains
unboxing cost.

But if and when the VM inlines our call, it's easy to get rid of that cost,
both the boxing right before the CALL and the unboxing right after it. In
fact I think current HotSpot can already do this kind of optimization (comes
for free with inlining + escape analysis + scalar replacement).

Even for non-inlined / JIT-compiled methods, maybe it's possible to do some
good speedup. The interpreter could have new quickened bytecode to invoke
methods that have this boxing/CALL/unboxing pattern. The JIT compiler could
emit two entry points for the same method, so a call-site that has an int to
pass can just call the unboxed entry point directly, without boxing before
the call and without landing in prolog code that needs to unbox - this would
be good when the call cannot be inlined for any reason. But these tricks
could be added in later releases, they are "implementation detail" that
doesn't depend on public APIs or bytecode format changes. An initial
implementation that removes all boxing overhead only for "hot" methods, only
in HotSpot Server, would already be a big step forward.

> Also while I'm commenting on this idea I'd like to mention that the
> auto-boxing genie is out of the bottle. As an example my friend was working
> with some code that was backed by a bunch of Map<Long,Set<Long>> objects.
> The methods to populate this map took parameters declared as just long. In
> this case auto-boxing causes a massive amounts of new Long objects to be
> created.
>
> Personally I feel autoboxing ArrayList<int> to ArrayList<Integer> is no
> more dangerous or surprising then the current behavior. Which in my
> experience is very surprising to people. I think it would be a real benefit
> if one could minimize the number of objects created by autoboxing in
> general. Then something like this won't be as harmful.
>
> To see just how bad memory allocation was under this model my friend and I
> hacked up a quick Long cache. 3/4 of a gig of ram for a heap that was about
> 8 gig. He is championing a campaign to remove all primitive longs from his
> companies code base, but it's an uphill battle.
>

Removing usage 'long' won't buy much advantage to your friend. If you have,
say, a database where all PKs are longs but in practice their values always
fit in an int, you can just implement a Long cache that contains all
possible 2^31 positive int values (as negative values are not usual for
PKs). Such cache is just 2X bigger than the all-values Integer cache that
you need if all longs are replaced by ints.

For huge collections of primitive values or keys, the per-element overhead
is often enormous even without boxing, e.g. a big Map.Entry object for every
entry of a HashMap. In extreme cases (millions of elements and up) even the
reified java.util collections won't do the trick; you need alternative
collection implementations (you can implement a hashtable without
per-element containers like Entry).

A+
Osvaldo

>
> Collin
>
>
> On Wed, Jul 14, 2010 at 5:57 PM, Osvaldo Doederlein <opinali at gmail.com>wrote:
>
>> 2010/7/14 John Nilsson <john at milsson.nu>
>>
>> > On Wed, Jul 14, 2010 at 12:16 AM, Nathan Bryant
>> > <nathan.bryant at linkshare.com> wrote:
>> > > John,
>> > >
>> > > If you mean what I think you mean, that seems to result in a generics
>> > that is useful only for interface definition. Either it's not possible
>> to
>> > write:
>> > >
>> > > foo = new ArrayList<int>()
>> > >
>> > > Or doing so results in autoboxing to Integer, unbeknownst to the user,
>> > which is a serious violation of the principle of least surprise.
>> >
>> > Or there is some clever hacks in ArrayList to handle the issue. An
>> > empty array is just empty, so no need to have any specific array-type.
>> > If the first element added is an int or Integer you could
>> > optimistically assume that all input will be ints and thus allocate an
>> > int[], should that assumption turnout to be wrong, in later adds you
>> > allocate an Object[].
>> >
>>
>> This would create some extra effort - instanceof, typecast and branches -
>> in
>> many critical methods like add(), get() etc. For collections of
>> primitives,
>> this would be much better than the cost of boxing; but for collections of
>> reference types, we would make existing code (close to 100% of current
>> code)
>> slower, even if just a bit... which is not acceptable, breaks the "first,
>> do
>> no harm" principle. (Collections are critical enough that any 1% slowdown
>> in
>> the general/dominant usage will only pass with some big fight.)
>>
>>
>> > I can see how this could be problematic for uninitialized values,
>> > return null or 0? Maybe this could be solved by adding a generic
>> > "zero"-value that is valid for all types?
>> >
>>
>> This is not a problem. If the first value added is int and later we
>> transition int[]->Object, any unused position (slack due to the
>> exponential
>> growing strategy) becomes nulls, but these values are not really contained
>> by the collection (they are in positions >= size()). If we ever transition
>> Object[]->int[], slack elements are initialized with 0 but once again
>> they're not contained. If the collection (in Object[] state) does contains
>> nulls, then that collection simply cannot transition to the optimized
>> primitive-array form.
>>
>> A+
>> Osvaldo
>>
>>
>> >
>> > I'm not saying that this is particularily elegent though...
>> >
>> > BR,
>> > John
>> >
>> >
>>
>>
>
>