Proposal: Large arrays

james lowden jl0235 at yahoo.com
Tue Mar 24 11:24:20 PDT 2009


My (rough) thoughts on this:

There are going to be a few different levels of difficulty as regards dealing with existing array API.

The easiest case is methods like the one-argument Arrays.sort.  The signature doesn't need to change to support this, and no duplicate public method is necessary; the implementation needs to change to check the size as a long, and that's about it.

Next are methods like the multi-argument Arrays.sort and Arrays.fill that take in indices.  We can't change the existing methods to take in longs without breaking existing code, but we can just add a parallel method that takes in longs and it's pretty transparent to the caller; either way, you're passing in an array and indices.

The binarySearch is nastier as you can't (easily) create or call two methods with the same parameters but different return types.  My preference would be to create new parallel methods (call it logNSearch or something?) and leave the old ones there for extant code but deprecate them, so that if I try and use an int-only binary search, my code still compiles but the compiler yells at me and encourages me to change it.  If the old binary search is called on an array too big for it to handle, throw IllegalArgumentException (or maybe ArrayIndexOutOfBoundsException, although that's not *quite* in line with the current meaning of the ArrayIndexOutOfBoundsException) or maybe some new exception type like ArraySizeException or EpicArrayFailException or something.

The case that really worries me, actually, is collections, as we can't retrofit interfaces, and java.util.Collection is built around an int size.
We could maybe extend Collection, Set, List, Map, etc. with some new interface that defines long-based size/indexing methods and retroactively apply this to the concrete implementations like ArrayList, HashMap, etc., and deprecate the current Collection.size ().

It's definitely a hairy problem, but I think it's one that the language is going to have to deal with sooner or later--I'm currently hackishly using two-dimensional arrays in cases where my array might need to be outside of the current limits, but this results in ugly code and incurs some performance penalties.

-JL-


--- On Tue, 3/24/09, Joshua Bloch <jjb at google.com> wrote:

> From: Joshua Bloch <jjb at google.com>
> Subject: Re: Proposal: Large arrays
> To: "Joseph D. Darcy" <Joe.Darcy at sun.com>
> Cc: jl0235 at yahoo.com, coin-dev at openjdk.java.net
> Date: Tuesday, March 24, 2009, 1:03 PM
> Much as like the idea of 63-bit addressable arrays (hell
> even 32 would be
> nice, as opposed to the 31 we have now), I believe the
> source impact is
> large.  Consider our old friend Arrays.binarySearch:
> 
> public static int *binarySearch*(char[] a,
>                                char key)
> 
> Searches the specified array of chars for the specified
> value using the
> binary search algorithm. The array must be sorted (as by
> the
> sort(char[])<http://java.sun.com/javase/6/docs/api/java/util/Arrays.html#sort%28char%5B%5D%29>method)
> prior to making this call. If it is not sorted, the results
> are
> undefined. If the array contains multiple elements with the
> specified value,
> there is no guarantee which one will be found.
> 
> *Parameters:*a - the array to be searchedkey - the value to
> be searched for
> *Returns:*index of the search key, if it is contained in
> the array;
> otherwise, (-(*insertion point*) - 1). The *insertion
> point* is defined as
> the point at which the key would be inserted into the
> array: the index of
> the first element greater than the key, or a.length if all
> elements in the
> array are less than the specified key. Note that this
> guarantees that the
> return value will be >= 0 if and only if the key is
> found. I don't see how
> we could change the return value to long--it's binary
> incompatible.  That
> means we need a second version of the call that can
> tolerate big arrays, and
> people have to remember to call it.
> 
> And what do we do if the original version is given an array
> that's longer
> than Integer.MAX_VALUE locations? I suppose we have to
> throw an
> IllegalArgumentException.  So how much damage will this
> cause to the
> libraries?  How many methods will need twins that tolerate
> large arrays?
> How much damage will it cause to user code?  I don't
> think these are easy
> questions to answer.
> 
>               Josh
> 
> 
> 
> 
> On Tue, Mar 24, 2009 at 10:52 AM, Joseph D. Darcy
> <Joe.Darcy at sun.com> wrote:
> 
> > james lowden wrote:
> > > Proposal: Large arrays
> > >
> > > AUTHOR(S):
> > >
> > > J. Lowden
> > >
> > > VERSION:
> > >
> > > 1.0    Initial version.
> > >
> > >
> > > OVERVIEW
> > >
> > > FEATURE SUMMARY:
> > >
> > > Java arrays are accessed via 32-bit ints,
> resulting in a maximum
> > theoretical array size of 2147483647 elements.  While
> > >
> > > this is enough for most uses, some applications
> that need to handle large
> > sequential data sets in memory would benefit
> > >
> > > from the ability to work with arrays indexed
> using 64-bit indices.  As
> > "simply" changing the current array syntax
> to use
> > >
> > > longs as indices rather than ints would have the
> potential to break a
> > large amount of existing code,
> >
> > I don't think the source compatibility impact is
> that large actually.
> > Let's say *all* arrays can potentially be 64-bit
> now.  The existing code
> > that uses an int expression to index into the array
> will still index
> > into the same element and negative indexes will still
> cause an
> > exception.  Now, how should the length field be set? 
> I'd guess it
> > should saturate to Integer.MAX_VALUE for oversize
> arrays and a new "long
> > size()" method could be added to return the full
> length.
> >
> > The JVM would need to be modified to handle the long
> indexes of course;
> > perhaps the wide bytecode prefix could be put to use
> here.
> >
> > [snip]
> >
> > > ALTERNATIVES:
> > >
> > > Build your own class for storing long-indexes
> sequences, possibly as an
> > array-or-arrays.
> > >
> >
> > This would be eased if the bracket operator could be
> used for
> > user-defined types too since today functionally an
> array is just a
> > fixed-length map from int to some other type.
> >
> > -Joe
> >
> >
> >


      



More information about the coin-dev mailing list