Proposal: Large arrays
james lowden
jl0235 at yahoo.com
Tue Mar 24 10:05:26 PDT 2009
Not terribly relevant to the content of the proposal, but the fibonacci example should read:
long [[]] fib = new long [[3000000000L]];
fib [0] = 1;
fib [1] = 1;
for (long i = 2; i < fib.length; i++)
fib [i] = fib [i - 1] + fib [i - 2];
--- On Tue, 3/24/09, james lowden <jl0235 at yahoo.com> wrote:
> From: james lowden <jl0235 at yahoo.com>
> Subject: Proposal: Large arrays
> To: coin-dev at openjdk.java.net
> Date: Tuesday, March 24, 2009, 11:35 AM
> Proposal: Large arrays
>
> AUTHOR(S):
>
> J. Lowden
>
> VERSION:
>
> 1.0 Initial version.
>
>
> OVERVIEW
>
> FEATURE SUMMARY:
>
> Java arrays are accessed via 32-bit ints, resulting in a
> maximum theoretical array size of 2147483647 elements.
> While
>
> this is enough for most uses, some applications that need
> to handle large sequential data sets in memory would benefit
>
>
> from the ability to work with arrays indexed using 64-bit
> indices. As "simply" changing the current array
> syntax to use
>
> longs as indices rather than ints would have the potential
> to break a large amount of existing code, I'm not
> proposing
>
> doing any such thing. Instead, I'd like to see a
> separate but parallel array feature for creating and
> accessing both
>
> types of arrays. A rather boring example, which simply
> fills a byte array with hexadecimal "FF" values,
> is shown below:
>
> //int-indexed array
> byte [] foo = new byte [200000];
> for (int i = 0; i < foo.length; i++)
> foo [i] = (byte) 0xff;
>
> //long-indexed array
> byte [[]] foo2 = new byte [[20000000000L]];
> for (long i = 0; i < foo2.length; i++)
> foo2 [i] = (byte) 0xff;
>
> Syntax for declaration, instantation, instanceof and
> casting would use doubled square brackets to differentiate
> it from
>
> current array access. Single brackets can still be used
> for access/mutation as the compiler should have sufficient
>
> information to decide whether it can treat a specific
> reference as an int-indexed or long-indexed
> ("large") array. The length field would be a long
> rather than an int. Like standard arrays, large arrays
> would derive directly from
>
> java.lang.Object.
>
>
> MAJOR ADVANTAGE:
>
> Java gains the ability to easily work with large sequential
> data sets.
>
>
> MAJOR BENEFIT:
>
> Declaring extremely large arrays simply isn't possible
> right now; in cases where such a structure is desirable,
> something
>
> has to be hacked together. For applications dealing with
> large data sets, the current upper limit on array size is
> constricting, and will only grow more so as 64-bit systems
> continue to become more common and RAM less expensive.
>
>
> MAJOR DISADVANTAGE:
>
> This can't be implemented well solely via a compiler
> patch; existing VMs would likely need to be changed to
> support this
>
> concept natively; new VM instructions parallel to the
> existing array instructions would likely be required. Also,
> a class
>
> parallel to java.util.Arrays for performing standard
> operations on large arrays would likely be desirable
> (although
>
> presumably fairly simple to implement.)
>
>
> ALTERNATIVES:
>
> Build your own class for storing long-indexes sequences,
> possibly as an array-or-arrays.
>
>
> EXAMPLES
>
> SIMPLE EXAMPLE:
>
> // calculate the first 3 billion fibonacci numbers and
> store them in an array
>
> long [[]] fib = new long [[3000000000L]];
> fib [0] = 0;
> fib [1] = 0;
> for (long i = 2; i < fib.length; i++)
> fib [i] = fib [i - 1] + fib [i - 2];
>
>
> ADVANCED EXAMPLE:
>
> // this doesn't really do anything particular useful,
> but does serve to show how casting and instanceof are
> handled
> byte [] foo = new byte [400000];
> byte [[]] blah = new byte [[40000000000L]];
> Object temp = Math.random () < 0.5 ? foo : blah;
> if (foo instanceof byte [[]])
> System.out.println (((byte [[]]) foo).length);
> else
> System.out.println (((byte []) foo).length);
>
>
> DETAILS
>
> SPECIFICATION:
>
> Syntax-wise, the following expressions become legal:
>
> SomeType [[]] foo; // declares foo as a long-indexed
> "large" array of SomeType,
> // where sometype is either a primitive or reference
> type
>
> foo = new SomeType [[some_long_value]]; // instantiates a
> large array of length
> // some_long_value, which is either a long
> // or some type that can upconvert or unbox
> // to a long
>
> long some_long = foo.length; // large arrays will have a
> "length" field, which is
> // a long indicating the number of elements in the
> array
>
> foo instanceof SomeType [[]] // returns true if foo is a
> large array
> // of SomeType, false otherwise
>
> (SomeType [[]]) foo // casts foo to a large array of
> SomeType. If foo isn't
> // a large array of SomeType or a subclass of SomeType,
> // throws a ClassCastException
>
>
> Large arrays are not castable or assignable to or from
> int-indexed arrays of the same element type.
>
> int [[]] p = new int [40000]; // compiler error
> int [] p = new int [[760000]]; // compiler error
> int [] foo = (int []) (new int [[4040404040L]]); // throws
> ClassCastException
> int [[]] foo = (int [[]]) (new int [4040]); // throws
> ClassCastException
>
>
> Canonical class names for large arrays would be generated
> in a similar fashion to those for standard arrays, but
>
> using the opposite square bracket. The following table
> shows some examples.
>
> declared type class name
> int [[]] ]I
> int [[]][[]] ]]I
> Object [[]] ]Ljava.lang.Object;
>
>
> The same set of indicator character (i.e., 'I' for
> int, 'L' for reference types, etc.) as are currently
> used for arrays
>
> would be used for large arrays.
>
> A set of additional VM instructions parallel to the current
> array instructions would be added to support this feature.
>
> The following table shows the current instruction, the
> proposed instruction for large arrays, and any semantic
>
> differences (reference
> http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc.html):
>
> array instruction proposed instruction differences
> aaload lxaaload index value
> becomes a long
> aastore lxaastore index value
> becomes a long
> anewarray lxanewarray count value
> becomes a long
> arraylength lxarraylength return value
> becomes a long
> baload lxbaload index value
> becomes a long
> bastore lxbastore index value
> becomes a long
> caload lxcaload index value
> becomes a long
> castore lxcastore index value
> becomes a long
> daload lxdaload index value
> becomes a long
> dastore lxdastore index value
> becomes a long
> faload lxfaload index value
> becomes a long
> fastore lxfastore index value
> becomes a long
> iaload lxiaload index value
> becomes a long
> iastore lxiastore index value
> becomes a long
> laload lxlaload index value
> becomes a long
> lastore lxlastore index value
> becomes a long
> multianewarray lxmultianewarray countN values
> become longs
> newarray lxnewarray count value
> becomes a long
> saload lxsaload index value
> becomes a long
> sastore lxsastore index value
> becomes a long
>
>
> COMPILATION:
>
> Compilation would be parallel to the present mechanism for
> compiling arrays, but would output the lx* VM instructions
> listed above for large arrays instead of the current array
> instructions.
>
> TESTING:
>
> Any test sufficient to test support for current arrays
> should work for this as well.
>
> LIBRARY SUPPORT:
>
> It would probably be desirable to create large array
> versions of the various java.util.Arrays methods, whether
> that be done by adding methods to the existing
> java.util.Arrays class or putting them somewhere new. At
> some point in the future, large java.util.List<X>
> might be a desirable library feature, but that is beyond the
> scope of this proposal.
>
> REFLECTIVE APIS:
>
> An additional class (LargeArray or something of that sort)
> would need to be added to the java.lang.reflect package.
> This would have a similar set of methods and constructors to
> java.lang.reflect.Array, but with long "index" and
> "length" parameters where appropriate.
>
> OTHER CHANGES:
>
> VM needs to support the above-listed large array
> instructions.
>
>
> MIGRATION:
>
> Existing "hacks" to provide this kind of behavior
> could be replaced with large arrays.
>
>
> COMPATIBILITY
>
> BREAKING CHANGES:
>
> None. This feature simple doesn't exist; the proposed
> declaration/instantation syntax currently results in a
> compiler error.
>
>
> EXISTING PROGRAMS:
>
> No changes.
>
>
> REFERENCES
>
> EXISTING BUGS:
>
> 4880587 (64-Bit indexing for arrays)
>
>
> URL FOR PROTOTYPE (optional):
>
> None.
More information about the coin-dev
mailing list