Proposal: Large arrays

james lowden jl0235 at yahoo.com
Tue Mar 24 10:05:26 PDT 2009


Not terribly relevant to the content of the proposal, but the fibonacci example should read:

 
long [[]] fib = new long [[3000000000L]];
fib [0] = 1;
fib [1] = 1;
for (long i = 2; i < fib.length; i++)
  fib [i] = fib [i - 1] + fib [i - 2];


--- On Tue, 3/24/09, james lowden <jl0235 at yahoo.com> wrote:

> From: james lowden <jl0235 at yahoo.com>
> Subject: Proposal: Large arrays
> To: coin-dev at openjdk.java.net
> Date: Tuesday, March 24, 2009, 11:35 AM
> Proposal: Large arrays
> 
> AUTHOR(S):
> 
> J. Lowden
> 
> VERSION:
> 
> 1.0    Initial version.
> 
> 
> OVERVIEW
> 
> FEATURE SUMMARY:
> 
> Java arrays are accessed via 32-bit ints, resulting in a
> maximum theoretical array size of 2147483647 elements. 
> While 
> 
> this is enough for most uses, some applications that need
> to handle large sequential data sets in memory would benefit
> 
> 
> from the ability to work with arrays indexed using 64-bit
> indices.  As "simply" changing the current array
> syntax to use 
> 
> longs as indices rather than ints would have the potential
> to break a large amount of existing code, I'm not
> proposing 
> 
> doing any such thing.  Instead, I'd like to see a
> separate but parallel array feature for creating and
> accessing both 
> 
> types of arrays.  A rather boring example, which simply
> fills a byte array with hexadecimal "FF" values,
> is shown below:
> 
> //int-indexed array
> byte [] foo = new byte [200000];
> for (int i = 0; i < foo.length; i++)
>   foo [i] = (byte) 0xff;
> 
> //long-indexed array
> byte [[]] foo2 = new byte [[20000000000L]];
> for (long i = 0; i < foo2.length; i++)
>   foo2 [i] = (byte) 0xff;
> 
> Syntax for declaration, instantation, instanceof and
> casting would use doubled square brackets to differentiate
> it from 
> 
> current array access.  Single brackets can still be used
> for access/mutation as the compiler should have sufficient 
> 
> information to decide whether it can treat a specific
> reference as an int-indexed or long-indexed
> ("large") array.  The length field would be a long
> rather than an int.  Like standard arrays, large arrays
> would derive directly from 
> 
> java.lang.Object.
> 
> 
> MAJOR ADVANTAGE:
> 
> Java gains the ability to easily work with large sequential
> data sets.
> 
> 
> MAJOR BENEFIT:
> 
> Declaring extremely large arrays simply isn't possible
> right now; in cases where such a structure is desirable,
> something 
> 
> has to be hacked together.  For applications dealing with
> large data sets, the current upper limit on array size is
> constricting, and will only grow more so as 64-bit systems
> continue to become more common and RAM less expensive.
> 
> 
> MAJOR DISADVANTAGE:
> 
> This can't be implemented well solely via a compiler
> patch; existing VMs would likely need to be changed to
> support this 
> 
> concept natively; new VM instructions parallel to the
> existing array instructions would likely be required.  Also,
> a class 
> 
> parallel to java.util.Arrays for performing standard
> operations on large arrays would likely be desirable
> (although 
> 
> presumably fairly simple to implement.)
> 
> 
> ALTERNATIVES:
> 
> Build your own class for storing long-indexes sequences,
> possibly as an array-or-arrays.
> 
> 
> EXAMPLES
> 
> SIMPLE EXAMPLE:
> 
> // calculate the first 3 billion fibonacci numbers and
> store them in an array
> 
> long [[]] fib = new long [[3000000000L]];
> fib [0] = 0;
> fib [1] = 0;
> for (long i = 2; i < fib.length; i++)
>   fib [i] = fib [i - 1] + fib [i - 2];
> 
> 
> ADVANCED EXAMPLE:
> 
> // this doesn't really do anything particular useful,
> but does serve to show how casting and instanceof are
> handled
> byte [] foo = new byte [400000];
> byte [[]] blah = new byte [[40000000000L]];
> Object temp = Math.random () < 0.5 ? foo : blah;
> if (foo instanceof byte [[]])
>   System.out.println (((byte [[]]) foo).length);
> else
>   System.out.println (((byte []) foo).length);
> 
> 
> DETAILS
> 
> SPECIFICATION:
> 
> Syntax-wise, the following expressions become legal:
> 
> SomeType [[]] foo;    // declares foo as a long-indexed
> "large" array of SomeType,
> 			// where sometype is either a primitive or reference
> type
> 
> foo = new SomeType [[some_long_value]];   // instantiates a
> large array of length
> 					// some_long_value, which is either a long
> 					// or some type that can upconvert or unbox
> 					// to a long
> 
> long some_long = foo.length;   // large arrays will have a
> "length" field, which is
> 				// a long indicating the number of elements in the
> array
> 
> foo instanceof SomeType [[]]   // returns true if foo is a
> large array
> 				// of SomeType, false otherwise
> 
> (SomeType [[]]) foo    // casts foo to a large array of
> SomeType.  If foo isn't 
> 			// a large array of SomeType or a subclass of SomeType,
> 			// throws a ClassCastException
> 
> 
> Large arrays are not castable or assignable to or from
> int-indexed arrays of the same element type.
> 
> int [[]] p = new int [40000]; // compiler error
> int [] p = new int [[760000]]; // compiler error
> int [] foo = (int []) (new int [[4040404040L]]); // throws
> ClassCastException
> int [[]] foo = (int [[]]) (new int [4040]); // throws
> ClassCastException
>  
> 
> Canonical class names for large arrays would be generated
> in a similar fashion to those for standard arrays, but 
> 
> using the opposite square bracket.  The following table
> shows some examples.
> 
> declared type     class name
> int [[]]          ]I
> int [[]][[]]      ]]I
> Object [[]]       ]Ljava.lang.Object;
> 
> 
> The same set of indicator character (i.e., 'I' for
> int, 'L' for reference types, etc.) as are currently
> used for arrays 
> 
> would be used for large arrays.
> 
> A set of additional VM instructions parallel to the current
> array instructions would be added to support this feature.  
> 
> The following table shows the current instruction, the
> proposed instruction for large arrays, and any semantic 
> 
> differences (reference
> http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc.html):
> 
> array instruction   proposed instruction   differences
> aaload              lxaaload               index value
> becomes a long
> aastore             lxaastore              index value
> becomes a long
> anewarray           lxanewarray            count value
> becomes a long
> arraylength         lxarraylength          return value
> becomes a long
> baload              lxbaload               index value
> becomes a long
> bastore             lxbastore              index value
> becomes a long
> caload              lxcaload               index value
> becomes a long
> castore             lxcastore              index value
> becomes a long
> daload              lxdaload               index value
> becomes a long
> dastore             lxdastore              index value
> becomes a long
> faload              lxfaload               index value
> becomes a long
> fastore             lxfastore              index value
> becomes a long
> iaload              lxiaload               index value
> becomes a long
> iastore             lxiastore              index value
> becomes a long
> laload              lxlaload               index value
> becomes a long
> lastore             lxlastore              index value
> becomes a long
> multianewarray      lxmultianewarray       countN values
> become longs
> newarray            lxnewarray             count value
> becomes a long
> saload              lxsaload               index value
> becomes a long
> sastore             lxsastore              index value
> becomes a long
> 
> 
> COMPILATION:
> 
> Compilation would be parallel to the present mechanism for
> compiling arrays, but would output the lx* VM instructions
> listed above for large arrays instead of the current array
> instructions.
> 
> TESTING:
> 
> Any test sufficient to test support for current arrays
> should work for this as well.
> 
> LIBRARY SUPPORT:
> 
> It would probably be desirable to create large array
> versions of the various java.util.Arrays methods, whether
> that be done by adding methods to the existing
> java.util.Arrays class or putting them somewhere new.  At
> some point in the future, large java.util.List<X>
> might be a desirable library feature, but that is beyond the
> scope of this proposal.
> 
> REFLECTIVE APIS:
> 
> An additional class (LargeArray or something of that sort)
> would need to be added to the java.lang.reflect package. 
> This would have a similar set of methods and constructors to
> java.lang.reflect.Array, but with long "index" and
> "length" parameters where appropriate.
> 
> OTHER CHANGES:
> 
> VM needs to support the above-listed large array
> instructions.
> 
> 
> MIGRATION:
> 
> Existing "hacks" to provide this kind of behavior
> could be replaced with large arrays.
> 
> 
> COMPATIBILITY
> 
> BREAKING CHANGES:
> 
> None. This feature simple doesn't exist; the proposed
> declaration/instantation syntax currently results in a
> compiler error.
> 
> 
> EXISTING PROGRAMS:
> 
> No changes.
> 
> 
> REFERENCES
> 
> EXISTING BUGS:
> 
> 4880587 (64-Bit indexing for arrays)
> 
> 
> URL FOR PROTOTYPE (optional):
> 
> None.


      



More information about the coin-dev mailing list