A List implementation backed by multiple small arrays rather than the traditional single large array.
Kevin L. Stern
kevin.l.stern at gmail.com
Mon Mar 29 23:24:26 UTC 2010
One advantage of this approach over the VM approach is that no data copy is
necessary when the capacity of the data structure is expanded (new arrays
are tacked on to the end of the top level array of references) or contracted
(arrays of null are removed from the top level array of references) aside
from the (any?) copy of the top level array of references. One way to
address your concern, though, is to create a ChunkedArray class that simply
wraps an array and provides expand and contract functionality. This could
be reused in any/all collections.
On Mon, Mar 29, 2010 at 10:08 AM, Osvaldo Doederlein <opinali at gmail.com>wrote:
> Initially, it would be good enough to replace only java.util.ArrayList with
> minimal overhead. ArrayList does not support efficient add-at-front or other
> enhancements of ArrayDeque; but ArrayList is still a much more important and
> popular collection, it's the primary "straight replacement for primitive
> arrrays" and I guess it should continue with that role.
>
> One problem of both ArrayList and primitive arrays is that they're not
> GC-friendly; huge arrays suck for GC. IBM's realtime Metronome collector
> uses the "arraylet" structure for primitive arrays, so there is a hard
> upper-limit on object size (well, at least as long as apps don't define
> classes with thousands of fields, I guess). This avoids the whole issue of
> "large objects" which permits a simpler heap layout, better incremental GC,
> etc. There are two tradeoffs. First, some overhead for all array operations
> - but this is the least important, remarkably as the arraylet trick is
> implement at the VM level so we can rely on the JIT to perform extra
> optimizations (e.g., unrolling and other loop optimizations; bounds-check
> elimination and other array opts, may be arraylet-aware so most overhead is
> cancelled or at least lifted out of loop bodies and hot paths.) Second, no
> support at all for huge arrays is incompatible with native code that expects
> a continuous layout, e.g. for the byte[]s inside Images - so all these uses
> must be identified and fixed somehow, e.g using DirectBuffers, or changing
> the native layer to understand arraylets (image libaries may be OK with
> banding), or in the worst case just copy the data to/from a continuous,
> native array (in most cases I think this copy already happens for other
> reasons, so there's no extra copy, just a slightly more expensive copy).
>
> Now we're talking about some big VM change of course, but HotSpot would not
> be the first production VM to do this so maybe it's a viable project for the
> future, remarkably as Sun plans to keep raising the bar in
> incremental/realtime GC (G1 may already be a great step forward, but huge
> arrays will always spoil the fun for many apps).
>
> In summary I think the ChunkedArrayList would serve only as a stopgap
> solution, with extremely limited benefits unless it's sufficiently good so
> like Martin says, we can replace more List implementations. And I'll even
> add, replace many other collections too - e.g. a giant HashMap will contain
> a giant Entry[] array inside it, I want this array to be chunked too
> (ConcurrentHashMap already is, but it's tuned up differently, for concurrent
> usage - and that's just one example anyway). And by "replace" I further mean
> "change the implementation of all existing collections that are
> array-backed", not "offer new collections" as the latter will only be
> heavily used ten years from today when JavaSE7 is considered the minimum
> JavaSE release to be supported by apps/libraries/frameworks/containers/etc.
> Even then, the benefits will be clearly inferior to what can be achireved by
> VM-level arraylets.
>
> A+
> Osvaldo
>
>
> 2010/3/29 Martin Buchholz <martinrb at google.com>
>
> On Sun, Mar 28, 2010 at 04:55, Kevin L. Stern <kevin.l.stern at gmail.com>
>> wrote:
>> > I put together the following class, ChunkedArrayList, in response to
>> > Martin's request (excerpted from an earlier conversation on this web
>> board)
>> > below.
>> >
>> >
>> https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50
>> >
>> > Thoughts?
>>
>> This class is well on the way to what I was thinking of,
>> but my bar for acceptance is a little higher.
>> In particular, I don't want to add yet another class
>> that is can replace some, but not all of existing
>> list implementations.
>>
>> Most obviously, I don't want to lose the ability,
>> introduced in ArrayDeque, of having O(1) insertion
>> at the front and end of the collection.
>> Perhaps you can do this by having one "arraylet"
>> always be shared by both ends, which
>> grow towards each other in circular fashion.
>>
>> I also think we should shrink the array when
>> necessary, so that occupancy never drops
>> below, say 50%.
>>
>> Perhaps we should also have amortized O(1)
>> insertion in the middle by using a "gap array".
>> Probably more important for byte/char collections
>> like StringBuilder...
>>
>> I believe there are more complicated implementations
>> that permit O(1) insertions at the ends, and only
>> O(sqrt(N)) space overhead.
>>
>> ....
>>
>> E.g. Use your favorite search engine to do
>> some research on:
>> Resizable arrays in optimal time and space
>> Succinct dynamic data structures
>>
>> Meta-comment: there is not enough transfer of
>> academic research results into practice; I would think this
>> is one of the responsibilities of the researchers.
>>
>> I presume you'd be willing to sign a
>> contributor agreement to get your changes into
>> the JDK someday.
>>
>> Martin
>>
>> > Regards,
>> >
>> > Kevin
>> >
>> >
>> > On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz <martinrb at google.com>
>> > wrote:
>> >
>> > It surely is not a good idea to use a single backing array
>> > for huge arrays. As you point out, it's up to 32GB
>> > for just one object. But the core JDK
>> > doesn't offer a suitable alternative for users who need very
>> > large collections.
>> >
>> > It would have been more in the spirit of Java to have a
>> > collection class instead of ArrayList that was not fastest at
>> > any particular operation, but had excellent asymptotic behaviour,
>> > based on backing arrays containing backing arrays.
>> > But:
>> > - no such excellent class has been written yet
>> > (or please point me to such a class)
>> > - even if it were, such a best-of-breed-general-purpose
>> > List implementation would probably need to be introduced as a
>> > separate class, because of the performance expectations of
>> > existing implementations.
>> >
>> > In the meantime, we have to maintain what we got,
>> > and that includes living with arrays and classes that wrap them.
>> >
>> > Changing the spec is unlikely to succeed..
>> >
>> > Martin
>> >
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100329/c718d23d/attachment.html>
More information about the core-libs-dev
mailing list