RFE: 64 bit pointers needed

Thu May 10 22:16:28 UTC 2012

You can use a direct byte buffer plus sun.misc.Unsafe as a workaround
if you only need arrays of primitives.
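
A minimal sketch of the off-heap idea, under stated assumptions. It
uses only the supported ByteBuffer.allocateDirect API and leaves out
sun.misc.Unsafe, which is an unsupported internal class. Because a
single direct buffer is itself capped at Integer.MAX_VALUE bytes, the
sketch splits the storage into chunks and maps a long index onto a
(chunk, offset) pair. The class name BigDoubleArray and the parameter
elementsPerChunk are hypothetical, chosen for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical long-indexed array of doubles kept off-heap in direct buffers. */
final class BigDoubleArray {
    private final ByteBuffer[] chunks;
    private final int elementsPerChunk;
    private final long length;

    BigDoubleArray(long length, int elementsPerChunk) {
        if (length < 0 || elementsPerChunk <= 0
                || elementsPerChunk > Integer.MAX_VALUE / Double.BYTES) {
            throw new IllegalArgumentException("bad length or chunk size");
        }
        this.length = length;
        this.elementsPerChunk = elementsPerChunk;

        // Number of chunks, rounded up without overflowing long arithmetic.
        long count = length / elementsPerChunk
                + (length % elementsPerChunk == 0 ? 0 : 1);
        chunks = new ByteBuffer[Math.toIntExact(count)];

        for (int c = 0; c < chunks.length; c++) {
            long remaining = length - (long) c * elementsPerChunk;
            int n = (int) Math.min(elementsPerChunk, remaining);
            // Direct buffers are zero-filled on allocation.
            chunks[c] = ByteBuffer.allocateDirect(n * Double.BYTES)
                                  .order(ByteOrder.nativeOrder());
        }
    }

    long length() {
        return length;
    }

    double get(long index) {
        checkIndex(index);
        return chunks[(int) (index / elementsPerChunk)]
                .getDouble((int) (index % elementsPerChunk) * Double.BYTES);
    }

    void set(long index, double value) {
        checkIndex(index);
        chunks[(int) (index / elementsPerChunk)]
                .putDouble((int) (index % elementsPerChunk) * Double.BYTES, value);
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

With a chunk size such as 1 << 27 elements (1 GiB per buffer) this can
hold more than 2**31 doubles, subject to the JVM's direct memory limit
(-XX:MaxDirectMemorySize). The storage is not one contiguous block, so
it does not help code that needs a single flat array.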

Rémi

On 05/10/2012 11:48 PM, Senseney, Justin (NIH/CIT) [E] wrote:
> Title: RFE: 64 bit pointers needed
> Author: Justin Senseney
> Organization: National Institutes of Health
> Owner: Justin Senseney
> Created: 2012/04/17
> Type: Feature
> State: Draft
> Exposure: Open
> Component: core/lang
> Scope: JDK
> JSR: TBD
> RFE: 4963452 (4850923, 4880587, 4088441, 6292967)
> Discussion: compiler-dev at openjdk.java.net
> Start: 2012/Q3
> Depends:
> Blocks:
> Effort: XL
> Duration: L
> Template: 1.0
> Internal-refs:
> Reviewed-by:
> Endorsed-by:
> Funded-by:
>
> Summary
> -------
>
> As per the Java Language Specification, section 10.4, all array access in Java is done by using an int as the index. Since an int is a signed 32-bit value, this limits the total number of addressable elements of an array to 2**31 - 1 (about 2 billion). It should be possible to address an array using 64-bit values.
>
> Goals
> -----
>
> Improved handling of large datasets that need to be stored in contiguous arrays.
>
> Non-Goals
> ---------
>
> The existing range of Integer will not be changed.
>
> Success Metrics
> ---------------
>
> Able to compile boolean[] a = new boolean[Long.MAX_VALUE];
>
> Motivation
> ----------
>
> While having access to 2 billion entries may seem sufficient, there are very compelling performance reasons to be able to use more in a single array. As an example, consider a square n*n matrix, stored as an array (either row or column major, it doesn't matter which). Since an array stores fewer than 2**31 entries, n can be at most floor(sqrt(2**31)) = 46340, so the matrix cannot be very large. For multidimensional arrays this is an even more severe limitation (a cubic 3D tensor can have at most 1290 elements per side).
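
The bounds in the paragraph above can be checked with a short sketch.
It assumes the limit is Integer.MAX_VALUE elements per array (actual
JVMs may allow slightly fewer); the class name ArrayLimits and its
methods are hypothetical helpers, not part of any library.

```java
/** Hypothetical helper: largest side length of a square/cubic array stored flat. */
final class ArrayLimits {

    /** True if n^dims <= limit, computed without overflowing long. */
    static boolean fits(long n, int dims, long limit) {
        long product = 1;
        for (int d = 0; d < dims; d++) {
            if (product > limit / n) {
                return false; // product * n would exceed the limit
            }
            product *= n;
        }
        return true;
    }

    /** Largest n such that n^dims elements fit in one int-indexed array. */
    static int maxSide(int dims) {
        if (dims < 1) {
            throw new IllegalArgumentException("dims must be >= 1");
        }
        long limit = Integer.MAX_VALUE;
        long lo = 1;
        long hi = Integer.MAX_VALUE;
        // Binary search for the last n that still fits.
        while (lo < hi) {
            long mid = lo + (hi - lo + 1) / 2;
            if (fits(mid, dims, limit)) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return (int) lo;
    }

    public static void main(String[] args) {
        System.out.println(maxSide(2)); // 46340
        System.out.println(maxSide(3)); // 1290
    }
}
```

For scale, a 46340 x 46340 matrix of doubles is roughly 16 GiB, which
fits in the memory of a large 64-bit machine even though a larger
matrix cannot be indexed.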
>
> Description
> -----------
>
> The scope of this work is extensive; however, the solution should be technically feasible.
>
> Alternatives
> ------------
>
> A workaround is to use an array of arrays (e.g., double[][]). However, there is no guarantee that successive rows will be laid out linearly in memory, and therefore performance may be severely penalized. Experimentally, performance may suffer by a factor of more than 2, and often far more.
>
> Also, most existing matrix packages (e.g., LAPACK) assume linear storage and are thus incompatible with double[][] storage (they require double[]). Calling a LAPACK routine with jagged storage thus requires extra array copying and memory allocation, which can further decrease performance and increase memory requirements.
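
The extra copy mentioned above can be sketched as follows. This is an
illustration only: it assumes a rectangular double[][] and the
column-major layout that Fortran LAPACK routines expect, and the class
name Flatten and method toColumnMajor are hypothetical. Note that the
flat result is itself subject to the int index limit.

```java
/** Hypothetical helper: copy a rectangular double[][] into one flat double[]. */
final class Flatten {

    /** Returns the matrix in column-major order: element (i, j) lands at j * rows + i. */
    static double[] toColumnMajor(double[][] a) {
        int rows = a.length;
        int cols = rows == 0 ? 0 : a[0].length;

        // The flat copy must itself fit in a single int-indexed array.
        long total = (long) rows * cols;
        if (total > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("matrix too large for one array");
        }

        double[] flat = new double[(int) total];
        for (int i = 0; i < rows; i++) {
            if (a[i].length != cols) {
                throw new IllegalArgumentException("row " + i + " has a different length");
            }
            for (int j = 0; j < cols; j++) {
                flat[j * rows + i] = a[i][j];
            }
        }
        return flat;
    }
}
```

The copy doubles the memory needed for the matrix while both forms are
live, which is the cost the paragraph above describes.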
>
>
> Testing
> -------
>
> It should be possible to address arrays using 64-bit integers (long?), as this provides a seamless transition for users of 64-bit computers.
>
> Risks and Assumptions
> ---------------------
>
> Array-of-arrays constructs (double[][] instead of double[]) remain possible as a workaround. Arrays indexed by 64-bit values are well supported in C/C++ without any problem, so this should be technically feasible to implement.
>
> Dependences
> -----------
>
> None.
>
> Impact
> ------
>
> My group has requested this feature for several years. It is currently listed as one of the top 25 RFEs on http://bugs.sun.com/top25_rfes.do. Please help Java maintain its relevance by implementing this. I have several image processing applications that are severely limited by this restriction; the images they handle cannot be opened in most Java applications. These include electron microscopy and micro-CT images where storage of a single slice requires more entries than a Java array allows.
>
>
>
> Thank you for considering this RFE,
> Justin Senseney
> BIRSS/ISL/DCB/CIT/NIH
> 301-594-5887
> 301-480-0028 (fax)
> Building 12A/2015
>
> http://mipav.cit.nih.gov
> http://dcb.cit.nih.gov/~senseneyj
> http://image.nih.gov
>