RFR: [6904367]: (coll) IdentityHashMap is resized before exceeding the expected maximum size

Sun Jul 6 03:50:26 UTC 2014

Thank you Doug for your comment!

On 04.07.2014 23:38, Doug Lea wrote:
> On 07/04/2014 01:33 PM, Ivan Gerasimov wrote:
>> On 04.07.2014 8:14, David Holmes wrote:
>>> Hi Ivan,
>>>
>>> I find the change to capacity somewhat obfuscating and I can't see 
>>> what the
>>> actual bug was.
>>>
>> The bug was in rounding in the expression minCapacity = (3 * 
>> expectedMaxSize)/2.
>> Suppose, we want the expected size to be 11, then minCapacity becomes 
>> (int)(11 *
>> 3 / 2) == 16.
>> The threshold is calculated later to be (int)(16 * 2 / 3) == 10, 
>> which is less
>> than expected size 11.
>
> So the bug report was based on someone noticing that with
> odd-valued sizes, the undocumented 2/3 target load was
> rounded down, not up, which they didn't like?
>
Not quite.

There are two small issues that I'm trying to address here:
1) The wrong capacity is calculated based on the expected size due to 
rounding error. This is only noticeable for the expected sizes 11, 43, 
171, 683 and a few others. This leads to a redundant table resize when 
inserting the expected number of elements.
2) The wrong capacity due to overflow error. For numbers N larger or 
equal to 1431655765, the expression (int)(N * 3) / 2 produces 
non-negative small numbers. Frankly speaking, this is also true for my 
last proposed fix, so the updated webrev will follow.
3) In put() the table gets resized as soon as the threshold is reached. 
This is wrong for expected sizes 2, 3, 5, 10, 21, 42, and some others.
4) If put() was unsuccessful, the map gets corrupt.

> I don't object to rounding it up and modernizing the size
> calculations to use highestOneBit, but it's a weak pretense :-)
>

Simple rounding the minimum capacity up as minCapacity = (3 * 
expectedMaxSize + 1)/2 would produce the same result of course.
I just took a chance to replace the while loop with a single function call.

In the last webrev I also modified the check to address the issue #2 
from the list above.

>>
>> To address the issue I combined the division by 2 with the rounding 
>> up to the
>> nearest power of two.
>> I also took a chance to replace a while-loop with a single call to the
>> highestOneBit method, which calculates exactly what we need here.
>>
>>> The recursive call to put after a resize seems very sub-optimal as 
>>> you will
>>> re-search the map for the non-existent key. Can you not just 
>>> recompute the
>>> correct indices and do the store?
>
> The initial rationale for post-insert-resize was to avoid these issues
> given the need to preserve at least one empty slot. Which I
> expect to remain faster than pre-resize even with your modified patch.
> Please provide systematic performance comparisons before committing!
>
> -Doug 

Yes, you are right.
Microbenchmark showed more than 20% slowdown of a single put operation.
So, I reverted this change to the original code, but added a single 
check of the current table size before doing any modifications to the map.
This is to address the issue #4, and this doesn't seem to introduce any 
significant penalty.

Would you please help review the updated webrev:

http://cr.openjdk.java.net/~igerasim/6904367/3/webrev/

Sincerely yours,
Ivan