Proposal: Large arrays
james lowden
jl0235 at yahoo.com
Tue Mar 24 09:35:30 PDT 2009
Proposal: Large arrays
AUTHOR(S):
J. Lowden
VERSION:
1.0 Initial version.
OVERVIEW
FEATURE SUMMARY:
Java arrays are accessed via 32-bit ints, resulting in a maximum theoretical array size of 2147483647 elements. While
this is enough for most uses, some applications that need to handle large sequential data sets in memory would benefit
from the ability to work with arrays indexed using 64-bit indices. As "simply" changing the current array syntax to use
longs as indices rather than ints would have the potential to break a large amount of existing code, I'm not proposing
doing any such thing. Instead, I'd like to see a separate but parallel array feature for creating and accessing both
types of arrays. A rather boring example, which simply fills a byte array with hexadecimal "FF" values, is shown below:
//int-indexed array
byte [] foo = new byte [200000];
for (int i = 0; i < foo.length; i++)
foo [i] = (byte) 0xff;
//long-indexed array
byte [[]] foo2 = new byte [[20000000000L]];
for (long i = 0; i < foo2.length; i++)
foo2 [i] = (byte) 0xff;
Syntax for declaration, instantation, instanceof and casting would use doubled square brackets to differentiate it from
current array access. Single brackets can still be used for access/mutation as the compiler should have sufficient
information to decide whether it can treat a specific reference as an int-indexed or long-indexed ("large") array. The length field would be a long rather than an int. Like standard arrays, large arrays would derive directly from
java.lang.Object.
MAJOR ADVANTAGE:
Java gains the ability to easily work with large sequential data sets.
MAJOR BENEFIT:
Declaring extremely large arrays simply isn't possible right now; in cases where such a structure is desirable, something
has to be hacked together. For applications dealing with large data sets, the current upper limit on array size is constricting, and will only grow more so as 64-bit systems continue to become more common and RAM less expensive.
MAJOR DISADVANTAGE:
This can't be implemented well solely via a compiler patch; existing VMs would likely need to be changed to support this
concept natively; new VM instructions parallel to the existing array instructions would likely be required. Also, a class
parallel to java.util.Arrays for performing standard operations on large arrays would likely be desirable (although
presumably fairly simple to implement.)
ALTERNATIVES:
Build your own class for storing long-indexes sequences, possibly as an array-or-arrays.
EXAMPLES
SIMPLE EXAMPLE:
// calculate the first 3 billion fibonacci numbers and store them in an array
long [[]] fib = new long [[3000000000L]];
fib [0] = 0;
fib [1] = 0;
for (long i = 2; i < fib.length; i++)
fib [i] = fib [i - 1] + fib [i - 2];
ADVANCED EXAMPLE:
// this doesn't really do anything particular useful, but does serve to show how casting and instanceof are handled
byte [] foo = new byte [400000];
byte [[]] blah = new byte [[40000000000L]];
Object temp = Math.random () < 0.5 ? foo : blah;
if (foo instanceof byte [[]])
System.out.println (((byte [[]]) foo).length);
else
System.out.println (((byte []) foo).length);
DETAILS
SPECIFICATION:
Syntax-wise, the following expressions become legal:
SomeType [[]] foo; // declares foo as a long-indexed "large" array of SomeType,
// where sometype is either a primitive or reference type
foo = new SomeType [[some_long_value]]; // instantiates a large array of length
// some_long_value, which is either a long
// or some type that can upconvert or unbox
// to a long
long some_long = foo.length; // large arrays will have a "length" field, which is
// a long indicating the number of elements in the array
foo instanceof SomeType [[]] // returns true if foo is a large array
// of SomeType, false otherwise
(SomeType [[]]) foo // casts foo to a large array of SomeType. If foo isn't
// a large array of SomeType or a subclass of SomeType,
// throws a ClassCastException
Large arrays are not castable or assignable to or from int-indexed arrays of the same element type.
int [[]] p = new int [40000]; // compiler error
int [] p = new int [[760000]]; // compiler error
int [] foo = (int []) (new int [[4040404040L]]); // throws ClassCastException
int [[]] foo = (int [[]]) (new int [4040]); // throws ClassCastException
Canonical class names for large arrays would be generated in a similar fashion to those for standard arrays, but
using the opposite square bracket. The following table shows some examples.
declared type class name
int [[]] ]I
int [[]][[]] ]]I
Object [[]] ]Ljava.lang.Object;
The same set of indicator character (i.e., 'I' for int, 'L' for reference types, etc.) as are currently used for arrays
would be used for large arrays.
A set of additional VM instructions parallel to the current array instructions would be added to support this feature.
The following table shows the current instruction, the proposed instruction for large arrays, and any semantic
differences (reference http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc.html):
array instruction proposed instruction differences
aaload lxaaload index value becomes a long
aastore lxaastore index value becomes a long
anewarray lxanewarray count value becomes a long
arraylength lxarraylength return value becomes a long
baload lxbaload index value becomes a long
bastore lxbastore index value becomes a long
caload lxcaload index value becomes a long
castore lxcastore index value becomes a long
daload lxdaload index value becomes a long
dastore lxdastore index value becomes a long
faload lxfaload index value becomes a long
fastore lxfastore index value becomes a long
iaload lxiaload index value becomes a long
iastore lxiastore index value becomes a long
laload lxlaload index value becomes a long
lastore lxlastore index value becomes a long
multianewarray lxmultianewarray countN values become longs
newarray lxnewarray count value becomes a long
saload lxsaload index value becomes a long
sastore lxsastore index value becomes a long
COMPILATION:
Compilation would be parallel to the present mechanism for compiling arrays, but would output the lx* VM instructions listed above for large arrays instead of the current array instructions.
TESTING:
Any test sufficient to test support for current arrays should work for this as well.
LIBRARY SUPPORT:
It would probably be desirable to create large array versions of the various java.util.Arrays methods, whether that be done by adding methods to the existing java.util.Arrays class or putting them somewhere new. At some point in the future, large java.util.List<X> might be a desirable library feature, but that is beyond the scope of this proposal.
REFLECTIVE APIS:
An additional class (LargeArray or something of that sort) would need to be added to the java.lang.reflect package. This would have a similar set of methods and constructors to java.lang.reflect.Array, but with long "index" and "length" parameters where appropriate.
OTHER CHANGES:
VM needs to support the above-listed large array instructions.
MIGRATION:
Existing "hacks" to provide this kind of behavior could be replaced with large arrays.
COMPATIBILITY
BREAKING CHANGES:
None. This feature simple doesn't exist; the proposed declaration/instantation syntax currently results in a compiler error.
EXISTING PROGRAMS:
No changes.
REFERENCES
EXISTING BUGS:
4880587 (64-Bit indexing for arrays)
URL FOR PROTOTYPE (optional):
None.
More information about the coin-dev
mailing list