A List implementation backed by multiple small arrays rather than the traditional single large array.

Martin Buchholz martinrb at google.com
Fri Apr 2 03:28:18 UTC 2010


Hi Kevin,

You're probably the only one on this list who has
seriously read the paper.  It is not surprising that
taking a research paper into production would
discover bugs - the research never had to undergo
rigorous testing.  (I like the Java culture of
combining spec + implementation + test suite)

I suggest you ask the authors directly about the bug.
They would probably also be interested to hear
about your implementation.

Are you aware of Integer.numberOfLeadingZeros?
http://download.java.net/jdk7/docs/api/java/lang/Integer.html#numberOfLeadingZeros(int)

Martin

On Wed, Mar 31, 2010 at 19:34, Kevin L. Stern <kevin.l.stern at gmail.com> wrote:
> I'm almost convinced now that the paper is incorrect.  The code below gives
> me the appropriate index into the index array and the offset into the data
> block.  That being said, remember when I mentioned that this will include a
> bit more work to access an element than a simple bit shift and a bit mask?
> Well this is more than a bit more - we'll be doing this each time an index
> is requested.  I'll spend some time trying to twiddle the bits to see if I
> can eliminate/combine some of the operations.
>
>         for (int r = 1; r < 33; r++) {
>             int k = lg(r);
>             int floorKO2 = k >> 1;
>             int powFloorKO2 = (1 << floorKO2);
>             int p = ((1 << floorKO2) - 1) << 1;
>             int ceilKO2;
>             if ((k & 1) == 1) {
>                 ceilKO2 = floorKO2 + 1;
>                 p += powFloorKO2;
>             } else {
>                 ceilKO2 = floorKO2;
>             }
>             int e = r & ((1 << ceilKO2) - 1);
>             int b = (r >> ceilKO2) & (powFloorKO2 - 1);
>
>             System.out.println((r - 1) + " " + (p + b) + " " + e);
>         }
>
> Kevin
>
> On Wed, Mar 31, 2010 at 7:08 PM, Kevin L. Stern <kevin.l.stern at gmail.com>
> wrote:
>>
>> I realize that 2 * (2^(k/2) - 1) only works for even numbered superblocks,
>> the odd numbered superblocks need an additional term added (the number of
>> data blocks in SB_[k-1]) to jive with my interpretation; anyhow, I also came
>> across an alternative characterization of superblock in the paper which
>> states that data blocks are grouped within a superblock when they are the
>> same size - to me, though, that implies that my example structure below
>> would be
>>
>> SB_0: [1]
>> SB_1: [2][2][2]
>> SB_2: [4][4][4][4][4][4]
>>
>> which seems to contradict my understanding of (1) below.  I must be
>> reading this upside down.
>>
>> On Wed, Mar 31, 2010 at 6:36 PM, Kevin L. Stern <kevin.l.stern at gmail.com>
>> wrote:
>>>
>>> What am I missing here?  In "Resizable arrays in optimal time and space"
>>> the authors define their data structure with the following property:
>>>
>>> (1)  "When superblock SB_k is fully allocated, it consists of
>>> 2^(floor(k/2)) data blocks, each of size 2^(ceil(k/2))."
>>>
>>> Since the superblock is zero-based indexed this implies the following
>>> structure:
>>>
>>> SB_0: [1]
>>> SB_1: [2]
>>> SB_2: [2][2]
>>> SB_3: [4][4]
>>> SB_4: [4][4][4][4]
>>> [...]
>>>
>>> Let's have a look at Algorithm 3, Locate(i), with i = 3:
>>>
>>> r = 100 (the binary expansion of i + 1)
>>> k = |r| - 1 = 2
>>> p = 2^k - 1 = 3
>>>
>>> What concerns me is their statement that p represents "the number of data
>>> blocks in superblocks prior to SB_k."  There are only two data blocks in
>>> superblocks prior to SB_2, not three.  Given (1) above, unless I'm
>>> misinterpreting it, the number of data blocks in superblocks prior to SB_k
>>> should be:
>>>
>>> 2 * Sum[i=0->k/2-1] 2^i = 2 * (2^(k/2) - 1)
>>>
>>> This, of course, seems to work out much better in my example above,
>>> giving the correct answer to my interpretation of their data structure, but
>>> I have a hard time believing that this is their mistake rather than my
>>> misinterpretation.
>>>
>>> Thoughts?
>>>
>>> Kevin
>>>
>>> On Tue, Mar 30, 2010 at 5:20 PM, Martin Buchholz <martinrb at google.com>
>>> wrote:
>>>>
>>>> On Tue, Mar 30, 2010 at 04:25, Kevin L. Stern <kevin.l.stern at gmail.com>
>>>> wrote:
>>>> > Hi Martin,
>>>> >
>>>> > Thanks much for your feedback.  The first approach that comes to mind
>>>> > to
>>>> > implement O(1) time front as well as rear insertion is to create a
>>>> > cyclic
>>>> > list structure with a front/rear pointer - to insert at the front
>>>> > requires
>>>> > decrementing the front pointer (modulo the size) and to insert at the
>>>> > rear
>>>> > requires incrementing the rear pointer (modulo the size).  We need to
>>>> > resize
>>>> > when the two pointers bump into each other.  Could you explain more
>>>> > about
>>>> > your suggestion of introducing an arraylet that is shared by the front
>>>> > and
>>>> > the rear?
>>>>
>>>> It was a half-baked idea - I don't know if there's a way to turn it into
>>>> something useful.  I was thinking of the ArrayDeque implementation,
>>>> where all the elements live in a single array.
>>>>
>>>> >  It's not clear to me how that would help and/or be a better
>>>> > approach than the cyclic list.  Anyhow, the paper that you reference,
>>>> > "Resizable arrays in optimal time and space", gives a deque so if we
>>>> > take
>>>> > that approach then the deque is specified.
>>>>
>>>> Technically, ArrayList also supports the Deque operations -
>>>> just not efficiently.
>>>
>>
>
>



More information about the core-libs-dev mailing list