RFE: 64 bit pointers needed

Senseney, Justin (NIH/CIT) [E] senseneyj at mail.nih.gov
Thu May 10 14:48:34 PDT 2012


Title: RFE: 64 bit pointers needed
Author: Justin Senseney
Organization: National Institutes of Health
Owner: Justin Senseney
Created: 2012/04/17
Type: Feature
State: Draft
Exposure: Open
Component: core/lang
Scope: JDK
JSR: TBD
RFE: 4963452 (4850923, 4880587, 4088441, 6292967)
Discussion: compiler-dev at openjdk.java.net
Start: 2012/Q3
Depends:
Blocks:
Effort: XL
Duration: L
Template: 1.0
Internal-refs:
Reviewed-by:
Endorsed-by:
Funded-by:

Summary
-------

As per the Java Language Specification, section 10.4, all array access in Java is done by using an int as index. Since an int is a signed 32-bit value, this limits the total number of addressable elements of an array to 2**31 - 1 (about 2 billion). It should be possible to address an array using 64-bit values.
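
To make the restriction concrete, here is a minimal sketch of what code has to do today when an index arrives as a long. The class and method names (IntIndexDemo, get) are hypothetical and only serve the illustration.

```java
// Illustrative sketch: how the int-only index rule of JLS 10.4 surfaces in code.
// IntIndexDemo and get are hypothetical names, not part of any JDK API.
final class IntIndexDemo {
    // Reads element i of a, where i arrives as a long (for example a file offset).
    // The long has to be range-checked and narrowed to int before it can be used.
    static double get(double[] a, long i) {
        if (i < 0 || i > Integer.MAX_VALUE) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return a[(int) i];
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        System.out.println(get(a, 2L));
        // double x = a[2L];  // rejected by the compiler: a long is not a valid index
    }
}
```

Any index beyond Integer.MAX_VALUE cannot be expressed at all, whatever the amount of memory available.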

Goals
-----

Improved handling of large datasets that need to be stored in contiguous arrays.

Non-Goals
---------

Changing the existing range of int or Integer is not a goal.

Success Metrics
---------------

A statement such as boolean[] a = new boolean[Long.MAX_VALUE]; compiles.

Motivation
----------

While having access to 2 billion entries may seem sufficient, there are very compelling performance reasons to be able to use more in a single array. As an example, consider a square n*n matrix, stored as an array (either row or column major, it doesn't matter which). Since an array stores at most 2**31 - 1 entries, n can be at most floor(sqrt(2**31 - 1)) = 46340, so the matrix cannot be very large. For multidimensional arrays the limitation is even more severe (a cubic 3D tensor could have a side of at most 1290).
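
The bounds quoted above can be checked with a few lines of arithmetic. This is only a sketch; ArrayDimensionLimits and largestSide are hypothetical names, and the helper assumes the limit is small enough that (n+1)**k still fits in a long, which holds for the int-sized limits discussed here.

```java
// Sketch: largest side n of a k-dimensional cube whose n^k elements fit in one array.
// ArrayDimensionLimits and largestSide are hypothetical names used for illustration.
final class ArrayDimensionLimits {
    // Integer power; assumes the result fits in a long.
    static long pow(long base, int k) {
        long r = 1;
        for (int i = 0; i < k; i++) r *= base;
        return r;
    }

    // Largest n with n^k <= limit: floating-point estimate, then exact integer correction.
    static long largestSide(long limit, int k) {
        long n = (long) Math.floor(Math.pow((double) limit, 1.0 / k));
        while (pow(n + 1, k) <= limit) n++;
        while (pow(n, k) > limit) n--;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(largestSide(Integer.MAX_VALUE, 2)); // 46340
        System.out.println(largestSide(Integer.MAX_VALUE, 3)); // 1290
    }
}
```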

Description
-----------

The scope of this work is extensive; however, the solution may be quite technically feasible.

Alternatives
------------

A workaround is to use an array of arrays (e.g. double[][]). However, there is no guarantee that successive rows will be laid out linearly in memory, and therefore performance may be severely penalized. Experimentally, performance may suffer by a factor of over 2, and often far more.
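
For reference, the array-of-arrays workaround is usually wrapped so that callers can use a single long index. The following is a minimal sketch under that assumption; ChunkedDoubleArray and its chunkBits parameter are hypothetical and not part of any JDK API. Each access needs two dependent loads plus shift and mask arithmetic, and the chunks are separate heap objects with no guaranteed adjacency, which is the overhead described above.

```java
// Hypothetical helper: a long-indexed view built on double[][] chunks.
final class ChunkedDoubleArray {
    private final double[][] chunks;
    private final long length;
    private final int chunkBits;   // each full chunk holds 2^chunkBits elements
    private final long chunkMask;  // low bits select the offset inside a chunk

    ChunkedDoubleArray(long length, int chunkBits) {
        if (length < 0) throw new NegativeArraySizeException("length " + length);
        if (chunkBits < 1 || chunkBits > 30) {
            throw new IllegalArgumentException("chunkBits " + chunkBits);
        }
        this.length = length;
        this.chunkBits = chunkBits;
        this.chunkMask = (1L << chunkBits) - 1;
        long tail = length & chunkMask;  // elements in the last, partial chunk
        long nChunks = (length >>> chunkBits) + (tail != 0 ? 1 : 0);
        if (nChunks > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chunks: " + nChunks);
        }
        chunks = new double[(int) nChunks][];
        for (int c = 0; c < chunks.length; c++) {
            boolean isTail = (c == chunks.length - 1) && tail != 0;
            chunks[c] = new double[isTail ? (int) tail : 1 << chunkBits];
        }
    }

    long length() { return length; }

    double get(long i) {
        checkIndex(i);
        return chunks[(int) (i >>> chunkBits)][(int) (i & chunkMask)];
    }

    void set(long i, double v) {
        checkIndex(i);
        chunks[(int) (i >>> chunkBits)][(int) (i & chunkMask)] = v;
    }

    private void checkIndex(long i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException("index " + i);
    }

    public static void main(String[] args) {
        // Tiny chunks (4 elements) so the example runs anywhere; a real use would
        // pick something like chunkBits = 27 and a length above Integer.MAX_VALUE.
        ChunkedDoubleArray a = new ChunkedDoubleArray(10, 2);
        a.set(9, 42.0);
        System.out.println(a.get(9));
    }
}
```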

Also, most existing matrix packages (e.g. LAPACK) assume linear storage, and are thus incompatible with double[][] storage (they require double[]). Calling a LAPACK routine with jagged storage thus requires extra array copying and memory allocation, which can further decrease performance and increase memory requirements.
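
The extra copy mentioned above looks roughly like the sketch below. JaggedToFlat and toColumnMajor are hypothetical names; column-major order is chosen because that is the layout LAPACK's Fortran routines conventionally expect. Note that the flat result is itself a single Java array, so it is still capped at 2**31 - 1 elements.

```java
// Sketch: copy a rectangular double[][] into one contiguous column-major double[].
// JaggedToFlat and toColumnMajor are hypothetical names used for illustration.
final class JaggedToFlat {
    static double[] toColumnMajor(double[][] rows) {
        int m = rows.length;                      // number of rows
        int n = (m == 0) ? 0 : rows[0].length;    // number of columns
        long total = (long) m * n;
        if (total > Integer.MAX_VALUE) {
            // The flat copy would exceed the maximum array length.
            throw new IllegalArgumentException("matrix too large: " + total);
        }
        double[] flat = new double[(int) total];  // second allocation of the full matrix
        for (int i = 0; i < m; i++) {
            if (rows[i].length != n) {
                throw new IllegalArgumentException("row " + i + " has a different length");
            }
            for (int j = 0; j < n; j++) {
                flat[j * m + i] = rows[i][j];     // element (i, j) in column-major order
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(java.util.Arrays.toString(toColumnMajor(a)));
    }
}
```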


Testing
-------

It should be possible to address arrays using 64-bit integers (long?), as this provides a seamless transition for users of 64-bit computers.

Risks and Assumptions
---------------------

Arrays of arrays (double[][] instead of double[]) remain available as a workaround. Indexing arrays with 64-bit values works in C/C++ without any problem, so this feature should be quite technically feasible to implement.

Dependences
-----------

None.

Impact
------

My group has requested this feature for several years. It is currently listed as one of the top 25 RFEs on http://bugs.sun.com/top25_rfes.do. Please help Java maintain its relevance by implementing this. I have several image processing applications that are severely limited by this restriction: the images they handle cannot be opened in most Java applications. These include electron microscopy and micro-CT images where storage of a single slice requires more entries than are allowed in a Java array.



Thank you for considering this RFE,
Justin Senseney
BIRSS/ISL/DCB/CIT/NIH
301-594-5887
301-480-0028 (fax)
Building 12A/2015

http://mipav.cit.nih.gov
http://dcb.cit.nih.gov/~senseneyj
http://image.nih.gov
