RFR: [6904367]: (coll) IdentityHashMap is resized before exceeding the expected maximum size

Peter Levart peter.levart at gmail.com
Wed Jul 9 22:17:56 UTC 2014


On 07/09/2014 06:36 PM, Martin Buchholz wrote:
> OK, I think we're down to the smallest possible bug:

I wouldn't call it a bug, since the problem isn't actually present.

>
> if size == MAX_CAPACITY - 1 and we putAll a concurrent map with 
> greater size, but the concurrent map shrinks while we're adding it,
> and no actual elements get added to the IHM (but each element put 
> takes ~  2**28 probes), then an ISE is thrown when it was not strictly 
> necessary.

We won't be adding any elements if resize() throws "capacity exceeded", 
so this is just hypothetical thinking about what would happen if we didn't 
call resize() and instead started adding elements from the argument map 
right away. We might hit the ceiling or we might not.
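
To make the ordering concrete, here is a rough sketch of the putAll() path 
being discussed (the class, field and helper names below are mine and purely 
illustrative, not the actual JDK source). The point is just that the 
pre-sizing resize() runs before a single entry of the argument map is copied, 
so "capacity exceeded" can only be thrown up front, based on the argument's 
claimed size:

import java.util.Map;

// Rough sketch only; not the real IdentityHashMap implementation.
class PutAllSketch<K, V> {
     static final int MAX_CAPACITY = 1 << 29;
     int size;          // number of mappings currently in the map
     int capacity = 4;  // current capacity (a power of two)

     public void putAll(Map<? extends K, ? extends V> m) {
         int n = m.size();
         if (n == 0)
             return;
         if (n > size)                // argument claims to be larger than us
             resize(capacityFor(n));  // may throw here, before any entry is copied
         for (Map.Entry<? extends K, ? extends V> e : m.entrySet())
             put(e.getKey(), e.getValue());
     }

     void resize(int newCapacity) {
         if (capacity == MAX_CAPACITY)
             throw new IllegalStateException("Capacity exhausted.");
         capacity = Math.min(newCapacity, MAX_CAPACITY);  // rehashing omitted
     }

     int capacityFor(int expected) {
         // smallest power of two >= 3*expected/2 (overflow handling omitted)
         int c = Integer.highestOneBit(Math.max(3 * expected / 2 - 1, 1)) << 1;
         return Math.min(c, MAX_CAPACITY);
     }

     void put(K key, V value) {  // linear-probe insert details omitted
         size++;
     }
}

With that ordering, the argument's claimed size alone decides whether we 
throw, which is what makes the shrinking-ConcurrentHashMap scenario above 
purely hypothetical.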

>
> Meanwhile, I'm actually wishing we could throw at something like 80% 
> full...

I created a little simulation:

import java.util.BitSet;

// Simulates linear-probe insertions into a table of MAXIMUM_CAPACITY slots,
// keyed by identity hash codes, and reports the average and maximum number
// of probes at increasing fill levels.
public class IMHSimulation {

     static final int MAXIMUM_CAPACITY = 1 << 29;
     static final int MASK = MAXIMUM_CAPACITY - 1;

     static int hash(Object x) {
         int h = System.identityHashCode(x);
         // Multiply by -127, as IdentityHashMap does, to spread the hash
         return (h - (h << 7)) & MASK;
     }

     // Linear-probe insert; returns the number of occupied slots that had
     // to be skipped before a free slot was found.
     static int add(BitSet bs, Object o) {
         int i = hash(o);
         int probes = 0;
         while (bs.get(i)) {
             i = (i + 1) & MASK;
             probes++;
         }
         bs.set(i);
         return probes;
     }

     public static void main(String[] args) {
         BitSet bs = new BitSet(MAXIMUM_CAPACITY);
         long totalProbes = 0;
         int maxProbes = 0;
         int percent = 10;
         int nextReportAt = (int) ((long) percent * MAXIMUM_CAPACITY / 100);
         for (int i = 0; i < MAXIMUM_CAPACITY; i++) {
             int probes = add(bs, new Object());
             totalProbes += probes;
             maxProbes = Math.max(maxProbes, probes);
             if (i == nextReportAt) {
                 System.out.printf(
                     "%3d%% full, avg. %4.1f, max. %6d probes\n",
                     percent, (double) totalProbes / i, maxProbes
                 );
                 if (percent < 60)
                     percent += 10;
                 else if (percent < 90)
                     percent += 5;
                 else
                     percent++;
                 nextReportAt = (int) ((long) percent * MAXIMUM_CAPACITY / 100);
             }
         }
     }
}


This produces the following output:

  10% full, avg.  0.1, max.      9 probes
  20% full, avg.  0.1, max.     15 probes
  30% full, avg.  0.2, max.     25 probes
  40% full, avg.  0.3, max.     38 probes
  50% full, avg.  0.5, max.     64 probes
  60% full, avg.  0.7, max.     93 probes
  65% full, avg.  0.9, max.    147 probes
  70% full, avg.  1.2, max.    224 probes
  75% full, avg.  1.5, max.    354 probes
  80% full, avg.  2.0, max.    441 probes
  85% full, avg.  2.8, max.    789 probes
  90% full, avg.  4.5, max.   1869 probes
  91% full, avg.  5.1, max.   2377 probes
  92% full, avg.  5.7, max.   3409 probes
  93% full, avg.  6.6, max.   3804 probes
  94% full, avg.  7.8, max.   5824 probes
  95% full, avg.  9.5, max.   7021 probes
  96% full, avg. 12.0, max.  12607 probes
  97% full, avg. 16.2, max.  17643 probes
  98% full, avg. 24.5, max.  34561 probes
  99% full, avg. 49.6, max. 131371 probes


IdentityHashMap's resizing is arranged so that the table is always 33% ... 66% 
full (if nothing is ever removed from it), except once capacity reaches 2**29, 
after which it can be filled up to the top.
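
To spell that invariant out, here is a minimal sketch (the threshold 
arithmetic is how I read the growth rule, not copied from the source): the 
capacity doubles once the size reaches roughly two thirds of it, so below the 
2**29 ceiling the fill factor keeps oscillating between ~33% and ~66%.

// Sketch of the grow-at-two-thirds rule described above (illustrative only).
static int capacityAfterInsert(int capacity, int newSize) {
     final int MAXIMUM_CAPACITY = 1 << 29;
     // grow once the map is ~2/3 full, unless already at the ceiling
     if (capacity < MAXIMUM_CAPACITY && newSize >= (2 * capacity) / 3)
         return capacity * 2;  // doubling drops the fill back to ~33%
     return capacity;          // otherwise the fill stays at or below ~66%,
                               // except at MAXIMUM_CAPACITY, where the table
                               // can fill all the way up
}

The long probe chains in the tail of the simulation above can therefore only 
appear after the last doubling has already happened, which is exactly the 
regime where the question of where to draw the line arises.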

The avg. number of probes can be taken as a rough estimate of the average 
slow-down; the max. number of probes as a rough estimate of the slow-down of 
the worst-case single operation.

So where to draw the line?

Regards, Peter

>
> Relatedly: It occurs to me that it's traditional in java.util to throw 
> OOME instead of ISE for capacity exceeded.

>
>
> On Wed, Jul 9, 2014 at 12:33 AM, Peter Levart <peter.levart at gmail.com> wrote:
>
>
>     On 07/09/2014 09:23 AM, Peter Levart wrote:
>>
>>     On 07/09/2014 02:46 AM, Martin Buchholz wrote:
>>>     Let me understand - you're worried that when size is
>>>     MAX_CAPACITY - 1, then a call to putAll that does not actually
>>>     add any elements might throw?
>>
>>     This is not possible, because resize() is only called from
>>     putAll(map) if argument map.size() > this.size. At least one
>>     element will be added to the map and it's correct to throw if
>>     current size == MAX_CAPACITY - 1.
>
>     ...at least if the argument map obeys the rules of the Map contract.
>     Even if it's a HashMap or another IdentityHashMap, it should not
>     contain the same INSTANCE of a key in two or more mappings, should
>     not "overshoot" when reporting its size(), and should be stable
>     during the whole putAll() operation... So calling IHM.putAll()
>     with a live, changing ConcurrentHashMap is not advisable. Even then,
>     putAll() could only throw because at some point the argument's size
>     indicated that the IHM could reach its max. capacity.
>
>     Peter
>
>
>>
>>>      Well, I'm having a hard time considering that corner case a
>>>     bug, given how unusable the map is at this point.
>>
>>     It seems even this corner case is not present.
>>
>>>
>>>     Your suggested fix of returning immediately in case of no resize
>>>     would cause put to succeed and reach size == MAX_CAPACITY, which
>>>     you were trying to prevent.
>>
>>     That's not possible either, since resize() is always called from
>>     put() with the current table.length, which makes newLength == 2 *
>>     oldLength, so the (oldLength >= newLength) check can never be true
>>     in this case.
>>
>>     Peter
>>
>>>
>>>
>>>     On Tue, Jul 8, 2014 at 5:25 PM, Ivan Gerasimov
>>>     <ivan.gerasimov at oracle.com> wrote:
>>>
>>>
>>>         On 09.07.2014 3:20, Martin Buchholz wrote:
>>>>         I agree with Peter that we do not need to increment
>>>>         modCount on resize.
>>>>
>>>>         My latest webrev is again "done".
>>>>
>>>>         Ivan, feel free to take over.
>>>
>>>         Yes, thanks! The fix is much much better now.
>>>
>>>         I think I see yet another very minor glitch, though.
>>>         If the table is already of size 2 * MAXIMUM_CAPACITY, and we're
>>>         merging in another map with more elements via putAll(), resize()
>>>         will be called and it will throw, even though all the elements
>>>         could fit into the map due to key duplicates.
>>>
>>>         To avoid this the following check should be done first in
>>>         resize():
>>>
>>>           469         if (oldLength >= newLength)
>>>           470             return false;
>>>           
>>>
>>>         Sincerely yours,
>>>         Ivan
>>>
>>>
>>
>
>



