64-bit array functionality implemented via libraries?

Reinier Zwitserloot reinier at zwitserloot.com
Tue Jun 2 08:32:27 PDT 2009


There is no expectation that inserting near the front of an enormous  
list is going to be performant.

More to the point: with arrays you're in the same boat, and the only  
'benefit' is that arrays are so inflexible that the slowness is more  
obvious. They also force more boilerplate if, for whatever reason, you  
want to do this anyway - whereas with an interface you can swap in an  
implementation that supports much faster near-front insertions.


Wouldn't it be far more fruitful for the javac/JVM core team to figure  
out how to make primitive generics work properly? With such a  
construct, arrays could truly be relegated to a niche: ArrayList<byte>  
would be as fast as byte[], and require about the same amount of  
memory, instead of 9 to 17 times more depending on the JVM. Such a  
change undoubtedly means it can't be java.util.List, but a new  
collections API instead - and once that opportunity is taken, support  
for primitives in generics is quite easy: let a generics parameter  
declaration explicitly state that it accepts primitives as well as  
classes. Once you have the freedom of not being tied to retrofitting  
existing classes, it's doable.

Before someone claims this is utterly impossible, various languages  
have managed it before. Here's a layout of one strategy to make this  
work in java:

Not just any generics can take primitives; only those explicitly  
marked as such, like so:

package java.nutil; //new util - not compatible with old util's List.

public interface List<T extends any> {
     public T get(long idx);
     public long size();
}

Such generic classes always carry the raw type of their T as a  
Class<T> reference in each instance. That is, an instance of  
List<List<String>> knows it's a List of Lists (but not what kind of  
Lists), while the inner Lists know they are Lists of Strings. A  
List<int> is perfectly legal, and its inner type would then be  
'int.class' (which is different from Integer.class).
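The bookkeeping this relies on already half-exists in today's Java:  
int.class is a real Class object, distinct from Integer.class, so a  
per-instance Class<?> field could record whether an instance is a  
List<int> or a List<Integer>. A small demonstration (the class name is  
mine, purely for illustration):

```java
public class RawTypeDemo {
    public static void main(String[] args) {
        // int.class and Integer.class are already distinct Class objects:
        System.out.println(int.class == Integer.class);  // false
        System.out.println(int.class.isPrimitive());     // true
        System.out.println(Integer.class.isPrimitive()); // false
        System.out.println(int.class.getName());         // "int"
    }
}
```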

Internally, a variant is generated for each possible type of 'T'. The  
above 'get(long idx)', therefore, translates to many methods: an  
Object-returning get(long idx), an int-returning one, a short-returning  
one, etcetera. Similarly, an .add(T x) method would expand to many  
methods: add(Object), add(int), add(boolean), etc.
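To make the expansion concrete, here is what the generated surface  
might look like if sketched in today's Java. Since real Java cannot  
overload on return type alone, the get variants are written with  
explicit names here; all names in this sketch are mine, not part of  
the proposal:

```java
// Hypothetical compiler-generated expansion of List<T extends any>.
public interface ExpandedList {
    long size();

    // The 'get(long idx)' variants, one per possible raw 'T'
    // (named explicitly because Java can't overload on return type):
    Object getObject(long idx);
    int getInt(long idx);
    short getShort(long idx);
    boolean getBoolean(long idx);

    // The 'add(T x)' variants expand by parameter type directly:
    void add(Object x);
    void add(int x);
    void add(short x);
    void add(boolean x);
}
```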


If compile-time integrity of the generics is assured (that is, there  
were no generics warnings), then a call to an .add() method will  
always result in calling either add(int), or add(Object) with an  
instance of java.lang.Integer. These can be trivially channeled into  
the same 'T'-accepting method. When integrity is compromised, the  
internal reference to the raw type ensures a ClassCastException can be  
generated appropriately.
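A minimal sketch of that channeling, simulated in today's Java (class,  
field, and method names are all invented for illustration): both  
expanded overloads funnel into one backing method, and the stored raw  
type produces the CCE when integrity was compromised.

```java
import java.util.ArrayList;
import java.util.List;

public class ChannelingList {
    private final Class<?> componentType;       // the runtime 'T'
    private final List<Object> backing = new ArrayList<>();

    public ChannelingList(Class<?> componentType) {
        this.componentType = componentType;
    }

    // Generated primitive overload: valid only for a list of int.
    public void add(int x) {
        if (componentType != int.class) {
            throw new ClassCastException("not a List<int>: " + componentType);
        }
        addImpl(x);                             // autoboxes to Integer
    }

    // Generated reference overload: checked against the stored raw type.
    public void add(Object x) {
        if (x != null && componentType != int.class
                && !componentType.isInstance(x)) {
            throw new ClassCastException(x.getClass() + " is not " + componentType);
        }
        addImpl(x);
    }

    private void addImpl(Object x) {            // the single 'T' method
        backing.add(x);
    }

    public Object get(long idx) { return backing.get((int) idx); }
    public long size() { return backing.size(); }

    public static void main(String[] args) {
        ChannelingList ints = new ChannelingList(int.class);
        ints.add(10);
        System.out.println(ints.get(0));        // 10

        ChannelingList strings = new ChannelingList(String.class);
        try {
            strings.add(42);                    // integrity compromised
        } catch (ClassCastException e) {
            System.out.println("CCE as expected");
        }
    }
}
```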

Null assignments are allowed but can result in NPEs if the 'T' happens  
to be a primitive. Because 'null' is not part of Java's type system,  
this can't be (easily) fixed. Regrettable, but certainly nothing new;  
auto-unboxing suffers from the same flaw.

Any 'T'-typed variables are, just as now, Objects; even a List<int>  
that holds a 'private T foo;' field would use java.lang.Integer and  
not 'int'. However, creating arrays is allowed, via a special utility  
class. (This is possible here, unlike with java1.5 generics, because  
of the internal reference to the raw type.)

Thus, code that looks like this:

NewList<int> list = FastList.create();
list.add(10);

would boil down to:

Create a new instance of FastList, and store 'int.class' somewhere  
inside it.

list.add(10); - on a non-hotspotted run, the add(int) method will  
first check that this is a list of int.class or Integer.class and  
delegate to the right method; if it isn't, a CCE is thrown. During a  
hotspotted run, hotspot would ascertain that this check is pointless  
and skip it entirely, linking the .add() call straight to the code  
that actually works with a primitive int.
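The array-creation half of this can be simulated today with  
java.lang.reflect.Array, which genuinely allocates an int[] when  
handed int.class. A sketch of the FastList idea under that assumption  
(in the proposal, create() would infer int.class from context; real  
Java needs it passed explicitly, and every name below is invented):

```java
import java.lang.reflect.Array;

public class FastList<T> {
    private final Class<?> componentType;   // the stored raw 'T'
    private final Object backing;           // a real int[] for a List<int>
    private int size;

    private FastList(Class<?> componentType) {
        this.componentType = componentType;
        // Array.newInstance(int.class, n) returns a genuine int[],
        // which erased java1.5 generics cannot do.
        this.backing = Array.newInstance(componentType, 16);
    }

    public static <T> FastList<T> create(Class<?> componentType) {
        return new FastList<>(componentType);
    }

    // Non-hotspotted path: check the raw type, then store unboxed.
    public void add(int x) {
        if (componentType != int.class) {
            throw new ClassCastException("not a list of int");
        }
        Array.setInt(backing, size++, x);
    }

    public int getInt(int idx) {
        return Array.getInt(backing, idx);
    }

    public static void main(String[] args) {
        FastList<Integer> list = FastList.create(int.class);
        list.add(10);
        System.out.println(list.getInt(0));                  // 10
        System.out.println(((int[]) list.backing).length);   // 16
    }
}
```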


My point isn't so much that this mail is a complete proposal for such  
a system (it certainly isn't, and, indeed, I doubt such a proposal  
would be even close to small enough to fit within Project Coin's  
mission), but rather that jumping through hoops to change the language  
itself seems short-sighted to me. The value of a long arrays proposal  
(vs. just using libraries to do it) exists only because Java currently  
lacks something that is fixable and would have far more utility.

  --Reinier Zwitserloot



On 02-06-2009, at 15:30, Mark Thornton wrote:

> james lowden wrote:
>> My preference for doing something like this would be to introduce  
>> a parallel LargeList/LargeCollection interface that used 64-bit  
>> indexing but otherwise shared semantics with the extant List/ 
>> Collection interfaces, and have the [] access work for either, and  
>> then do the large array via a "LargeArrayList" (or something)  
>> implementation of the large list, probably represented internally  
>> as an array-of-arrays.  I'll toss something together at some  
>> point. . .
>>
> Large collections raise additional issues of use cases and performance
> expectations that don't arise with a simple large array. For example
> consider inserting near the front of a 4 billion element array list.
>
> Mark Thornton
>
>



