RFR: [6904367]: (coll) IdentityHashMap is resized before exceeding the expected maximum size
Peter Levart
peter.levart at gmail.com
Wed Jul 9 22:17:56 UTC 2014
On 07/09/2014 06:36 PM, Martin Buchholz wrote:
> OK, I think we're down to the smallest possible bug:
I wouldn't call it a bug, since it's not present.
>
> if size == MAX_CAPACITY - 1 and we putAll a concurrent map with
> greater size, but the concurrent map shrinks while we're adding it,
> and no actual elements get added to the IHM (but each element put
> takes ~ 2**28 probes), then an ISE is thrown when it was not strictly
> necessary.
We won't be adding any elements if resize() throws "capacity exceeded",
so this is just hypothetical thinking what would happen if we didn't
call resize() and just started adding elements from the argument map. We
might hit the ceiling or we might not.
>
> Meanwhile, I'm actually wishing we could throw at something like 80%
> full...
I created a little simulation:
public class IMHSimulation {
static final int MAXIMUM_CAPACITY = 1 << 29;
static final int MASK = MAXIMUM_CAPACITY - 1;
static int hash(Object x) {
int h = System.identityHashCode(x);
// Multiply by -127 to use least bit as part of hash
return (h - (h << 7)) & MASK;
}
static int add(BitSet bs, Object o) {
int i = hash(o);
int probes = 0;
while (bs.get(i)) {
i = (i + 1) & MASK;
probes++;
}
bs.set(i);
return probes;
}
public static void main(String[] args) {
BitSet bs = new BitSet(MAXIMUM_CAPACITY);
long totalProbes = 0;
int maxProbes = 0;
int percent = 10;
int nextReportAt = (int) ((long) percent * MAXIMUM_CAPACITY / 100);
for (int i = 0; i < MAXIMUM_CAPACITY; i++) {
int probes = add(bs, new Object());
totalProbes += probes;
maxProbes = Math.max(maxProbes, probes);
if (i == nextReportAt) {
System.out.printf(
"%3d%% full, avg. %4.1f, max. %6d probes\n",
percent, (double) totalProbes / i, maxProbes
);
if (percent < 60)
percent += 10;
else if (percent < 90)
percent += 5;
else
percent++;
nextReportAt = (int) ((long) percent * MAXIMUM_CAPACITY
/ 100);
}
}
}
Which produces the following:
10% full, avg. 0.1, max. 9 probes
20% full, avg. 0.1, max. 15 probes
30% full, avg. 0.2, max. 25 probes
40% full, avg. 0.3, max. 38 probes
50% full, avg. 0.5, max. 64 probes
60% full, avg. 0.7, max. 93 probes
65% full, avg. 0.9, max. 147 probes
70% full, avg. 1.2, max. 224 probes
75% full, avg. 1.5, max. 354 probes
80% full, avg. 2.0, max. 441 probes
85% full, avg. 2.8, max. 789 probes
90% full, avg. 4.5, max. 1869 probes
91% full, avg. 5.1, max. 2377 probes
92% full, avg. 5.7, max. 3409 probes
93% full, avg. 6.6, max. 3804 probes
94% full, avg. 7.8, max. 5824 probes
95% full, avg. 9.5, max. 7021 probes
96% full, avg. 12.0, max. 12607 probes
97% full, avg. 16.2, max. 17643 probes
98% full, avg. 24.5, max. 34561 probes
99% full, avg. 49.6, max. 131371 probes
IMH resizing is arranged so that the table is always 33% ... 66% full
(if nothing is ever removed from it) except when capacity reaches 2**29,
then it can be filled up to the top.
avg. # of probes can be taken as a rough estimation of average
slow-down, max. # of probes can be taken as a rough estimation of
worst-case-operation slow-down.
So where to draw the line?
Regards, Peter
>
> Relatedly: It occurs to me that it's traditional in java.util to throw
> OOME instead of ISE for capacity exceeded.
>
>
> On Wed, Jul 9, 2014 at 12:33 AM, Peter Levart <peter.levart at gmail.com
> <mailto:peter.levart at gmail.com>> wrote:
>
>
> On 07/09/2014 09:23 AM, Peter Levart wrote:
>>
>> On 07/09/2014 02:46 AM, Martin Buchholz wrote:
>>> Let me understand - you're worried that when size is
>>> MAX_CAPACITY - 1, then a call to putAll that does not actually
>>> add any elements might throw?
>>
>> This is not possible, because resize() is only called from
>> putAll(map) if argument map.size() > this.size. At least one
>> element will be added to the map and it's correct to throw if
>> current size == MAX_CAPACITY - 1.
>
> ...at least if the argument map obeys the rules of Map contract.
> Even if it's a HashMap or another IdentityHashMap, it should not
> contain the same INSTANCE of the key in two or more mappings,
> should not "overshoot" reporting it's size() and should be stable
> during the whole putAll() operation... So calling IHM.addAll()
> with a live changing ConcurrentHashMap is not advisable. Even
> then, addAll() could only throw, because at some point the
> argument's size indicated that IHM could reach it's max. capacity.
>
> Peter
>
>
>>
>>> Well, I'm having a hard time considering that corner case a
>>> bug, given how unusable the map is at this point.
>>
>> It seems even this corner case is not present.
>>
>>>
>>> Your suggested fix of returning immediately in case of no resize
>>> would cause put to succeed and reach size == MAX_CAPACITY, which
>>> you were trying to prevent.
>>
>> That's not possible either, since resize() is always called from
>> put() with current table.length, which makes newLength == 2 *
>> oldLength, therefore (oldLength >= newLength) will never succeed
>> in this case.
>>
>> Peter
>>
>>>
>>>
>>> On Tue, Jul 8, 2014 at 5:25 PM, Ivan Gerasimov
>>> <ivan.gerasimov at oracle.com <mailto:ivan.gerasimov at oracle.com>>
>>> wrote:
>>>
>>>
>>> On 09.07.2014 3:20, Martin Buchholz wrote:
>>>> I agree with Peter that we do not need to increment
>>>> modCount on resize.
>>>>
>>>> My latest webrev is again "done".
>>>>
>>>> Ivan, feel free to take over.
>>>
>>> Yes, thanks! The fix is much much better now.
>>>
>>> I think I see yet another very minor glitch, though.
>>> If the table is already of size 2 * MAXIMUM_CAPACITY, and
>>> we're merging in another map, which has more elements with
>>> putAll(), resize() will be called and it will throw, even
>>> though all the elements could fit into the map due to the
>>> key duplicates.
>>>
>>> To avoid this the following check should be done first in
>>> resize():
>>>
>>> 469 if (oldLength >= newLength)
>>> 470 return false;
>>>
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>>
>>
>
>
More information about the core-libs-dev
mailing list