Proposal: Large arrays

james lowden jl0235 at yahoo.com
Tue Mar 24 10:54:55 PDT 2009


Ideally, I would like to see the ability to declare and manipulate arrays with long indicies exactly the same way that we currently do with ints.  Unfortunately, this introduces the need to have the standard array's length field become a long, which immediately breaks a whole lot of existing code; thus I went with proposing a totally new array type.

-JL-

--- On Tue, 3/24/09, Reinier Zwitserloot <reinier at zwitserloot.com> wrote:

> From: Reinier Zwitserloot <reinier at zwitserloot.com>
> Subject: Re: Proposal: Large arrays
> To: jl0235 at yahoo.com
> Cc: coin-dev at openjdk.java.net
> Date: Tuesday, March 24, 2009, 11:53 AM
> This change has serious impact on the JVM, and the language
> support for java is clearly a patchy hack: The
> 'best' solution (however unattainable that may be)
> is to just ensure that arrays can be long-indexed, and
> forego the double bracket notation. You also forgot to
> mention you need rather a lot of work in the reflection API,
> because you're introducing a new magic type here (there
> are the primitives, arrays, and objects. You're adding a
> new one: large arrays).
> 
> 
> I'm opposed to any solution where a clearly superior
> one that is similar is obvious, and not clearly impossible
> to implement (even if it will be implemented in java8 or
> beyond). If your proposal was somehow compatible with such a
> future, that'd be okay, I guess, but it isn't: If
> this double-bracket feature is added, then the extra type
> will never go away.
> 
> 
> If you're willing to eat up 20 of the precious opcode
> space, you can instead propose something that is much
> simpler for all involved:
> 
> Any attempt to use arrays with a long instead of an int
> will work fine, it'll just use a different opcodes.
> Internally, there are such things as large arrays, but you
> can index into them using an int, and you can index into a
> plain array with a long (with the usual result if your long
> has a non-zero upper int: AIOOBException). For most other
> things including reflection, there is no discernable
> difference between large and non-large arrays. You can't
> grow arrays, so I don't really understand why you'd
> want a distinction in the first place. That's just
> implementation detail leaking through.
> 
>  --Reinier Zwitserloot
> 
> 
> 
> On Mar 24, 2009, at 17:35, james lowden wrote:
> 
> > 
> > Proposal: Large arrays
> > 
> > AUTHOR(S):
> > 
> > J. Lowden
> > 
> > VERSION:
> > 
> > 1.0    Initial version.
> > 
> > 
> > OVERVIEW
> > 
> > FEATURE SUMMARY:
> > 
> > Java arrays are accessed via 32-bit ints, resulting in
> a maximum theoretical array size of 2147483647 elements. 
> While
> > 
> > this is enough for most uses, some applications that
> need to handle large sequential data sets in memory would
> benefit
> > 
> > from the ability to work with arrays indexed using
> 64-bit indices.  As "simply" changing the current
> array syntax to use
> > 
> > longs as indices rather than ints would have the
> potential to break a large amount of existing code, I'm
> not proposing
> > 
> > doing any such thing.  Instead, I'd like to see a
> separate but parallel array feature for creating and
> accessing both
> > 
> > types of arrays.  A rather boring example, which
> simply fills a byte array with hexadecimal "FF"
> values, is shown below:
> > 
> > //int-indexed array
> > byte [] foo = new byte [200000];
> > for (int i = 0; i < foo.length; i++)
> >  foo [i] = (byte) 0xff;
> > 
> > //long-indexed array
> > byte [[]] foo2 = new byte [[20000000000L]];
> > for (long i = 0; i < foo2.length; i++)
> >  foo2 [i] = (byte) 0xff;
> > 
> > Syntax for declaration, instantation, instanceof and
> casting would use doubled square brackets to differentiate
> it from
> > 
> > current array access.  Single brackets can still be
> used for access/mutation as the compiler should have
> sufficient
> > 
> > information to decide whether it can treat a specific
> reference as an int-indexed or long-indexed
> ("large") array.  The length field would be a long
> rather than an int.  Like standard arrays, large arrays
> would derive directly from
> > 
> > java.lang.Object.
> > 
> > 
> > MAJOR ADVANTAGE:
> > 
> > Java gains the ability to easily work with large
> sequential data sets.
> > 
> > 
> > MAJOR BENEFIT:
> > 
> > Declaring extremely large arrays simply isn't
> possible right now; in cases where such a structure is
> desirable, something
> > 
> > has to be hacked together.  For applications dealing
> with large data sets, the current upper limit on array size
> is constricting, and will only grow more so as 64-bit
> systems continue to become more common and RAM less
> expensive.
> > 
> > 
> > MAJOR DISADVANTAGE:
> > 
> > This can't be implemented well solely via a
> compiler patch; existing VMs would likely need to be changed
> to support this
> > 
> > concept natively; new VM instructions parallel to the
> existing array instructions would likely be required.  Also,
> a class
> > 
> > parallel to java.util.Arrays for performing standard
> operations on large arrays would likely be desirable
> (although
> > 
> > presumably fairly simple to implement.)
> > 
> > 
> > ALTERNATIVES:
> > 
> > Build your own class for storing long-indexes
> sequences, possibly as an array-or-arrays.
> > 
> > 
> > EXAMPLES
> > 
> > SIMPLE EXAMPLE:
> > 
> > // calculate the first 3 billion fibonacci numbers and
> store them in an array
> > 
> > long [[]] fib = new long [[3000000000L]];
> > fib [0] = 0;
> > fib [1] = 0;
> > for (long i = 2; i < fib.length; i++)
> >  fib [i] = fib [i - 1] + fib [i - 2];
> > 
> > 
> > ADVANCED EXAMPLE:
> > 
> > // this doesn't really do anything particular
> useful, but does serve to show how casting and instanceof
> are handled
> > byte [] foo = new byte [400000];
> > byte [[]] blah = new byte [[40000000000L]];
> > Object temp = Math.random () < 0.5 ? foo : blah;
> > if (foo instanceof byte [[]])
> >  System.out.println (((byte [[]]) foo).length);
> > else
> >  System.out.println (((byte []) foo).length);
> > 
> > 
> > DETAILS
> > 
> > SPECIFICATION:
> > 
> > Syntax-wise, the following expressions become legal:
> > 
> > SomeType [[]] foo;    // declares foo as a
> long-indexed "large" array of SomeType,
> > 			// where sometype is either a primitive or
> reference type
> > 
> > foo = new SomeType [[some_long_value]];   //
> instantiates a large array of length
> > 					// some_long_value, which is either a long
> > 					// or some type that can upconvert or unbox
> > 					// to a long
> > 
> > long some_long = foo.length;   // large arrays will
> have a "length" field, which is
> > 				// a long indicating the number of elements in the
> array
> > 
> > foo instanceof SomeType [[]]   // returns true if foo
> is a large array
> > 				// of SomeType, false otherwise
> > 
> > (SomeType [[]]) foo    // casts foo to a large array
> of SomeType.  If foo isn't
> > 			// a large array of SomeType or a subclass of
> SomeType,
> > 			// throws a ClassCastException
> > 
> > 
> > Large arrays are not castable or assignable to or from
> int-indexed arrays of the same element type.
> > 
> > int [[]] p = new int [40000]; // compiler error
> > int [] p = new int [[760000]]; // compiler error
> > int [] foo = (int []) (new int [[4040404040L]]); //
> throws ClassCastException
> > int [[]] foo = (int [[]]) (new int [4040]); // throws
> ClassCastException
> > 
> > 
> > Canonical class names for large arrays would be
> generated in a similar fashion to those for standard arrays,
> but
> > 
> > using the opposite square bracket.  The following
> table shows some examples.
> > 
> > declared type     class name
> > int [[]]          ]I
> > int [[]][[]]      ]]I
> > Object [[]]       ]Ljava.lang.Object;
> > 
> > 
> > The same set of indicator character (i.e., 'I'
> for int, 'L' for reference types, etc.) as are
> currently used for arrays
> > 
> > would be used for large arrays.
> > 
> > A set of additional VM instructions parallel to the
> current array instructions would be added to support this
> feature.
> > 
> > The following table shows the current instruction, the
> proposed instruction for large arrays, and any semantic
> > 
> > differences (reference
> http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc.html):
> > 
> > array instruction   proposed instruction   differences
> > aaload              lxaaload               index value
> becomes a long
> > aastore             lxaastore              index value
> becomes a long
> > anewarray           lxanewarray            count value
> becomes a long
> > arraylength         lxarraylength          return
> value becomes a long
> > baload              lxbaload               index value
> becomes a long
> > bastore             lxbastore              index value
> becomes a long
> > caload              lxcaload               index value
> becomes a long
> > castore             lxcastore              index value
> becomes a long
> > daload              lxdaload               index value
> becomes a long
> > dastore             lxdastore              index value
> becomes a long
> > faload              lxfaload               index value
> becomes a long
> > fastore             lxfastore              index value
> becomes a long
> > iaload              lxiaload               index value
> becomes a long
> > iastore             lxiastore              index value
> becomes a long
> > laload              lxlaload               index value
> becomes a long
> > lastore             lxlastore              index value
> becomes a long
> > multianewarray      lxmultianewarray       countN
> values become longs
> > newarray            lxnewarray             count value
> becomes a long
> > saload              lxsaload               index value
> becomes a long
> > sastore             lxsastore              index value
> becomes a long
> > 
> > 
> > COMPILATION:
> > 
> > Compilation would be parallel to the present mechanism
> for compiling arrays, but would output the lx* VM
> instructions listed above for large arrays instead of the
> current array instructions.
> > 
> > TESTING:
> > 
> > Any test sufficient to test support for current arrays
> should work for this as well.
> > 
> > LIBRARY SUPPORT:
> > 
> > It would probably be desirable to create large array
> versions of the various java.util.Arrays methods, whether
> that be done by adding methods to the existing
> java.util.Arrays class or putting them somewhere new.  At
> some point in the future, large java.util.List<X>
> might be a desirable library feature, but that is beyond the
> scope of this proposal.
> > 
> > REFLECTIVE APIS:
> > 
> > An additional class (LargeArray or something of that
> sort) would need to be added to the java.lang.reflect
> package.  This would have a similar set of methods and
> constructors to java.lang.reflect.Array, but with long
> "index" and "length" parameters where
> appropriate.
> > 
> > OTHER CHANGES:
> > 
> > VM needs to support the above-listed large array
> instructions.
> > 
> > 
> > MIGRATION:
> > 
> > Existing "hacks" to provide this kind of
> behavior could be replaced with large arrays.
> > 
> > 
> > COMPATIBILITY
> > 
> > BREAKING CHANGES:
> > 
> > None. This feature simple doesn't exist; the
> proposed declaration/instantation syntax currently results
> in a compiler error.
> > 
> > 
> > EXISTING PROGRAMS:
> > 
> > No changes.
> > 
> > 
> > REFERENCES
> > 
> > EXISTING BUGS:
> > 
> > 4880587 (64-Bit indexing for arrays)
> > 
> > 
> > URL FOR PROTOTYPE (optional):
> > 
> > None.
> > 
> > 
> > 
> >



      



More information about the coin-dev mailing list