A List implementation backed by multiple small arrays rather than the traditional single large array.

Sun Apr 4 11:12:24 UTC 2010

The data structure is available at the second link that I originally
provided (once again, it is
https://docs.google.com/Doc?docid=0Aabrz3MPBDdhZGdrbnEzejdfM2M3am5wM2Mz&hl=en).
This does not have O(1) time insertion at the front as yet as it was unclear
to me whether or not it was agreed upon:
_________________
From: Osvaldo Doederlein <opinali at gmail.com>
Date: Mon, Mar 29, 2010 at 10:08 AM
Subject: Re: A List implementation backed by multiple small arrays rather
than the traditional single large array.
To: Martin Buchholz <martinrb at google.com>
Cc: "Kevin L. Stern" <kevin.l.stern at gmail.com>,
core-libs-dev at openjdk.java.net

Initially, it would be good enough to replace only java.util.ArrayList with
minimal overhead. ArrayList does not support efficient add-at-front or other
enhancements of ArrayDeque; but ArrayList is still a much more important and
popular collection, it's the primary "straight replacement for primitive
arrrays" and I guess it should continue with that role.
_________________

As a disclaimer, I'm still tinkering with this so I'll be updating the
document at the provided link as I find improvements.

Thoughts?

Thanks,

Kevin

On Thu, Apr 1, 2010 at 10:28 PM, Martin Buchholz <martinrb at google.com>wrote:

> Hi Kevin,
>
> You're probably the only one on this list who has
> seriously read the paper.  It is not surprising that
> taking a research paper into production would
> discover bugs - the research never had to undergo
> rigorous testing.  (I like the Java culture of
> combining spec + implementation + test suite)
>
> I suggest you ask the authors directly about the bug.
> They would probably also be interested to hear
> about your implementation.
>
> Are you aware of Integer.numberOfLeadingZeros?
>
> http://download.java.net/jdk7/docs/api/java/lang/Integer.html#numberOfLeadingZeros(int)<http://download.java.net/jdk7/docs/api/java/lang/Integer.html#numberOfLeadingZeros%28int%29>
>
> Martin
>
> On Wed, Mar 31, 2010 at 19:34, Kevin L. Stern <kevin.l.stern at gmail.com>
> wrote:
> > I'm almost convinced now that the paper is incorrect.  The code below
> gives
> > me the appropriate index into the index array and the offset into the
> data
> > block.  That being said, remember when I mentioned that this will include
> a
> > bit more work to access an element than a simple bit shift and a bit
> mask?
> > Well this is more than a bit more - we'll be doing this each time an
> index
> > is requested.  I'll spend some time trying to twiddle the bits to see if
> I
> > can eliminate/combine some of the operations.
> >
> >         for (int r = 1; r < 33; r++) {
> >             int k = lg(r);
> >             int floorKO2 = k >> 1;
> >             int powFloorKO2 = (1 << floorKO2);
> >             int p = ((1 << floorKO2) - 1) << 1;
> >             int ceilKO2;
> >             if ((k & 1) == 1) {
> >                 ceilKO2 = floorKO2 + 1;
> >                 p += powFloorKO2;
> >             } else {
> >                 ceilKO2 = floorKO2;
> >             }
> >             int e = r & ((1 << ceilKO2) - 1);
> >             int b = (r >> ceilKO2) & (powFloorKO2 - 1);
> >
> >             System.out.println((r - 1) + " " + (p + b) + " " + e);
> >         }
> >
> > Kevin
> >
> > On Wed, Mar 31, 2010 at 7:08 PM, Kevin L. Stern <kevin.l.stern at gmail.com
> >
> > wrote:
> >>
> >> I realize that 2 * (2^(k/2) - 1) only works for even numbered
> superblocks,
> >> the odd numbered superblocks need an additional term added (the number
> of
> >> data blocks in SB_[k-1]) to jive with my interpretation; anyhow, I also
> came
> >> across an alternative characterization of superblock in the paper which
> >> states that data blocks are grouped within a superblock when they are
> the
> >> same size - to me, though, that implies that my example structure below
> >> would be
> >>
> >> SB_0: [1]
> >> SB_1: [2][2][2]
> >> SB_2: [4][4][4][4][4][4]
> >>
> >> which seems to contradict my understanding of (1) below.  I must be
> >> reading this upside down.
> >>
> >> On Wed, Mar 31, 2010 at 6:36 PM, Kevin L. Stern <
> kevin.l.stern at gmail.com>
> >> wrote:
> >>>
> >>> What am I missing here?  In "Resizable arrays in optimal time and
> space"
> >>> the authors define their data structure with the following property:
> >>>
> >>> (1)  "When superblock SB_k is fully allocated, it consists of
> >>> 2^(floor(k/2)) data blocks, each of size 2^(ceil(k/2))."
> >>>
> >>> Since the superblock is zero-based indexed this implies the following
> >>> structure:
> >>>
> >>> SB_0: [1]
> >>> SB_1: [2]
> >>> SB_2: [2][2]
> >>> SB_3: [4][4]
> >>> SB_4: [4][4][4][4]
> >>> [...]
> >>>
> >>> Let's have a look at Algorithm 3, Locate(i), with i = 3:
> >>>
> >>> r = 100 (the binary expansion of i + 1)
> >>> k = |r| - 1 = 2
> >>> p = 2^k - 1 = 3
> >>>
> >>> What concerns me is their statement that p represents "the number of
> data
> >>> blocks in superblocks prior to SB_k."  There are only two data blocks
> in
> >>> superblocks prior to SB_2, not three.  Given (1) above, unless I'm
> >>> misinterpreting it, the number of data blocks in superblocks prior to
> SB_k
> >>> should be:
> >>>
> >>> 2 * Sum[i=0->k/2-1] 2^i = 2 * (2^(k/2) - 1)
> >>>
> >>> This, of course, seems to work out much better in my example above,
> >>> giving the correct answer to my interpretation of their data structure,
> but
> >>> I have a hard time believing that this is their mistake rather than my
> >>> misinterpretation.
> >>>
> >>> Thoughts?
> >>>
> >>> Kevin
> >>>
> >>> On Tue, Mar 30, 2010 at 5:20 PM, Martin Buchholz <martinrb at google.com>
> >>> wrote:
> >>>>
> >>>> On Tue, Mar 30, 2010 at 04:25, Kevin L. Stern <
> kevin.l.stern at gmail.com>
> >>>> wrote:
> >>>> > Hi Martin,
> >>>> >
> >>>> > Thanks much for your feedback.  The first approach that comes to
> mind
> >>>> > to
> >>>> > implement O(1) time front as well as rear insertion is to create a
> >>>> > cyclic
> >>>> > list structure with a front/rear pointer - to insert at the front
> >>>> > requires
> >>>> > decrementing the front pointer (modulo the size) and to insert at
> the
> >>>> > rear
> >>>> > requires incrementing the rear pointer (modulo the size).  We need
> to
> >>>> > resize
> >>>> > when the two pointers bump into each other.  Could you explain more
> >>>> > about
> >>>> > your suggestion of introducing an arraylet that is shared by the
> front
> >>>> > and
> >>>> > the rear?
> >>>>
> >>>> It was a half-baked idea - I don't know if there's a way to turn it
> into
> >>>> something useful.  I was thinking of the ArrayDeque implementation,
> >>>> where all the elements live in a single array.
> >>>>
> >>>> >  It's not clear to me how that would help and/or be a better
> >>>> > approach than the cyclic list.  Anyhow, the paper that you
> reference,
> >>>> > "Resizable arrays in optimal time and space", gives a deque so if we
> >>>> > take
> >>>> > that approach then the deque is specified.
> >>>>
> >>>> Technically, ArrayList also supports the Deque operations -
> >>>> just not efficiently.
> >>>
> >>
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100404/fbd6cdd5/attachment.html>