64-bit array functionality implemented via libraries?
Reinier Zwitserloot
reinier at zwitserloot.com
Tue Jun 2 08:32:27 PDT 2009
There is no expectation that inserting near the front of an enormous
list is going to be performant.
More to the point, with arrays you're in exactly the same boat. The
only 'benefit' is that arrays are so inflexible that it's more obvious
this is going to be very slow, and they force more boilerplate if for
whatever reason you want to do it anyway. With an interface you can
swap in an implementation that supports much faster near-front
insertions.
Wouldn't it be far more fruitful for the javac/JVM core team to figure
out how to make primitive generics work properly? With such a
construct, arrays can truly be relegated to a niche: ArrayList<byte>
would be as fast as byte[], and require about the same amount of
memory instead of 9 to 17 times more, depending on the JVM. Such a
change undoubtedly means it can't be java.util.List but a new
collections API instead, and while that opportunity is there anyway,
support for primitives in generics is quite easy: let a generics
parameter declaration explicitly mention that it supports primitives
as well as classes. Once you have the freedom of not being tied to
retrofitting existing classes, it's doable.
Before someone claims this is utterly impossible: various languages
have managed it before. Here's an outline of one strategy to make this
work in Java.
Not just any generics can take primitives; only those explicitly
marked as such, like so:
package java.nutil; // new util - not compatible with old util's List.

public interface List<T extends any> {
    public T get(long idx);
    public long size();
}
Such generic classes always carry the raw type of their T as a
Class<T> reference in each instance. That is, an instance of
List<List<String>> knows it's a List of Lists, but not what kind of
Lists; the inner Lists, in turn, know they are Lists of Strings. A
List<int> is perfectly legal, and its inner type would then be
'int.class' (which is different from Integer.class).
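As a rough sketch of what that internal reference could look like (the
class name FastList and its create() factory are placeholders for this
mail, written in today's syntax, so without the proposed 'extends any'
and with the raw type passed by hand where the compiler would really
supply it):

public class FastList<T> {
    // Each instance remembers the raw type of its T: int.class,
    // String.class, List.class, and so on.
    private final Class<?> elementType;

    private FastList(Class<?> elementType) {
        this.elementType = elementType;
    }

    public static <T> FastList<T> create(Class<?> elementType) {
        return new FastList<T>(elementType);
    }
}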
Internally, for each mention of 'T', all variants are generated. The
above 'T get(long idx)', therefore, translates to many methods:
Object get(long idx), int get(long idx), short get(long idx),
etcetera. Similarly, an .add(T x) method would expand to many methods:
add(Object), add(int), add(boolean), etc.
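To make the .add() side concrete (a sketch only; the interface name is
made up, and the .get() side would need return-type overloads, which a
class file can hold but Java source cannot express):

public interface PrimitiveAdd {
    // Reference variant, used for any non-primitive T.
    void add(Object x);
    // Primitive variants, one per primitive type T can take on.
    void add(int x);
    void add(long x);
    void add(boolean x);
    // ... and so on for byte, short, char, float and double.
}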
If compile-time integrity of the generics is assured (that is, there
were no generics warnings), then a call to an .add() method will
always result in either calling add(int), or calling add(Object) with
an instance of java.lang.Integer. These can be trivially channeled
towards the same 'T'-accepting method. When integrity is compromised,
the internal reference to the raw type ensures a ClassCastException
can be generated appropriately.
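In code, the generated add(int) of the hypothetical FastList sketched
above might check and channel roughly like this (addImpl is a made-up
name for the single 'T'-accepting backend):

public void add(int x) {
    // Raw-type check; with clean generics this always passes, and
    // hotspot can later drop it (see the walkthrough below).
    if (elementType != int.class && elementType != Integer.class) {
        throw new ClassCastException(
            "add(int) called on a list of " + elementType.getName());
    }
    addImpl(x); // boxes for now; a specialized backend could keep the int
}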
Null assignments are allowed but can result in NPEs if the 'T' happens
to be a primitive. Because 'null' is not part of Java's type system,
this can't (easily) be fixed. Regrettable, but certainly nothing new;
auto-unboxing suffers from the same flaw.
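For comparison, the existing flaw looks like this:

Integer boxed = null;
int n = boxed; // compiles fine, throws a NullPointerException on unboxing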
Any 'T'-typed variables are, just like now, Objects; even a List<int>
that holds a 'private T foo;' field would use java.lang.Integer and
not 'int'. However, creating arrays is allowed, via a special utility
class. (Possible here, unlike with Java 1.5 generics, because of the
internal reference to the raw type.)
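Such a utility class needs nothing exotic; with the raw type in hand,
the reflective array factory that exists today already does the job
(NewArrays is a made-up name for this sketch):

import java.lang.reflect.Array;

public final class NewArrays {
    private NewArrays() {}

    // int.class yields an int[], String.class a String[], and so on.
    public static Object newArray(Class<?> elementType, int length) {
        return Array.newInstance(elementType, length);
    }
}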
Thus, code that looks like this:
NewList<int> list = FastList.create();
list.add(10);
would boil down to the following. Creating the list means: create a
new instance of FastList and store 'int.class' somewhere inside it.
For list.add(10): on a non-hotspotted run, the add(int) method will
first check that this is a list of int.class or Integer.class and
delegate to the right method; if not, a CCE is generated. During a
hotspotted run, hotspot would ascertain that this is a pointless check
and skip it entirely, linking the .add() call straight to the code
that actually works with a primitive int.
My point isn't so much that this mail is a complete proposal for such
a system (it certainly isn't, and, indeed, I doubt such a proposal
would be even close to small enough to fit within Project Coin's
mission), but rather that jumping through hoops to change the language
itself seems short-sighted to me. The value of a long arrays proposal
(versus just using libraries to do it) exists only because Java
currently lacks something that is fixable and would have far more
utility.
--Reinier Zwitserloot
On 02-06-2009, at 15:30, Mark Thornton wrote:
> james lowden wrote:
>> My preference for doing something like this would be to introduce
>> a parallel LargeList/LargeCollection interface that used 64-bit
>> indexing but otherwise shared semantics with the extant List/
>> Collection interfaces, and have the [] access work for either, and
>> then do the large array via a "LargeArrayList" (or something)
>> implementation of the large list, probably represented internally
>> as an array-of-arrays. I'll toss something together at some
>> point. . .
>>
> Large collections raise additional issues of use cases and performance
> expectations that don't arise with a simple large array. For example
> consider inserting near the front of a 4 billion element array list.
>
> Mark Thornton
>
>