Proposal: Large arrays

Joshua Bloch jjb at google.com
Tue Mar 24 11:03:29 PDT 2009


Much as like the idea of 63-bit addressable arrays (hell even 32 would be
nice, as opposed to the 31 we have now), I believe the source impact is
large.  Consider our old friend Arrays.binarySearch:

public static int *binarySearch*(char[] a,
                               char key)

Searches the specified array of chars for the specified value using the
binary search algorithm. The array must be sorted (as by the
sort(char[])<http://java.sun.com/javase/6/docs/api/java/util/Arrays.html#sort%28char%5B%5D%29>method)
prior to making this call. If it is not sorted, the results are
undefined. If the array contains multiple elements with the specified value,
there is no guarantee which one will be found.

*Parameters:*a - the array to be searchedkey - the value to be searched for
*Returns:*index of the search key, if it is contained in the array;
otherwise, (-(*insertion point*) - 1). The *insertion point* is defined as
the point at which the key would be inserted into the array: the index of
the first element greater than the key, or a.length if all elements in the
array are less than the specified key. Note that this guarantees that the
return value will be >= 0 if and only if the key is found. I don't see how
we could change the return value to long--it's binary incompatible.  That
means we need a second version of the call that can tolerate big arrays, and
people have to remember to call it.

And what do we do if the original version is given an array that's longer
than Integer.MAX_VALUE locations? I suppose we have to throw an
IllegalArgumentException.  So how much damage will this cause to the
libraries?  How many methods will need twins that tolerate large arrays?
How much damage will it cause to user code?  I don't think these are easy
questions to answer.

              Josh




On Tue, Mar 24, 2009 at 10:52 AM, Joseph D. Darcy <Joe.Darcy at sun.com> wrote:

> james lowden wrote:
> > Proposal: Large arrays
> >
> > AUTHOR(S):
> >
> > J. Lowden
> >
> > VERSION:
> >
> > 1.0    Initial version.
> >
> >
> > OVERVIEW
> >
> > FEATURE SUMMARY:
> >
> > Java arrays are accessed via 32-bit ints, resulting in a maximum
> theoretical array size of 2147483647 elements.  While
> >
> > this is enough for most uses, some applications that need to handle large
> sequential data sets in memory would benefit
> >
> > from the ability to work with arrays indexed using 64-bit indices.  As
> "simply" changing the current array syntax to use
> >
> > longs as indices rather than ints would have the potential to break a
> large amount of existing code,
>
> I don't think the source compatibility impact is that large actually.
> Let's say *all* arrays can potentially be 64-bit now.  The existing code
> that uses an int expression to index into the array will still index
> into the same element and negative indexes will still cause an
> exception.  Now, how should the length field be set?  I'd guess it
> should saturate to Integer.MAX_VALUE for oversize arrays and a new "long
> size()" method could be added to return the full length.
>
> The JVM would need to be modified to handle the long indexes of course;
> perhaps the wide bytecode prefix could be put to use here.
>
> [snip]
>
> > ALTERNATIVES:
> >
> > Build your own class for storing long-indexes sequences, possibly as an
> array-or-arrays.
> >
>
> This would be eased if the bracket operator could be used for
> user-defined types too since today functionally an array is just a
> fixed-length map from int to some other type.
>
> -Joe
>
>
>



More information about the coin-dev mailing list