RFR(s): 8205583: Crash in ConcurrentHashTable do_bulk_delete_locked_for

Tue Jun 26 08:20:51 UTC 2018

Hi David,

- Resize only happens if we have a low load factor but strings clustered in one
or a few buckets.
- Cleaning only happens after a GC, and only when the number of dead entries
compared to the table size is over a high water mark.
- Growing is preferred: if we go over the high water mark for the load factor,
we grow (growing also cleans) instead of cleaning/rehashing.
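
As a rough illustration of the three rules above, here is a minimal Java sketch of
the decision order. It is not the HotSpot code: the class, the method, and all
three thresholds are hypothetical placeholders. It reads "resize" in the first
bullet as the rehash discussed in the rest of this thread, and the order between
cleaning and rehashing is an assumption (only "growing is preferred" is stated).

```java
final class TableMaintenancePolicySketch {
    enum Action { NONE, GROW, CLEAN, REHASH }

    // Hypothetical thresholds, chosen only for illustration.
    static final double GROW_LOAD_FACTOR = 2.0;   // entries per bucket
    static final double CLEAN_DEAD_RATIO = 0.5;   // dead entries per bucket
    static final int REHASH_CHAIN_LENGTH = 100;   // longest bucket chain

    static Action choose(long entries, long buckets, long deadEntries,
                         int longestChain, boolean afterGc) {
        double loadFactor = (double) entries / buckets;
        // Growing is preferred; it also cleans.
        if (loadFactor > GROW_LOAD_FACTOR) {
            return Action.GROW;
        }
        // Cleaning only after a GC, when enough entries are dead.
        if (afterGc && (double) deadEntries / buckets > CLEAN_DEAD_RATIO) {
            return Action.CLEAN;
        }
        // Low load factor but strings clustered in one or a few buckets.
        if (longestChain > REHASH_CHAIN_LENGTH) {
            return Action.REHASH;
        }
        return Action.NONE;
    }
}
```

With these placeholder values, a table with 500 entries in 1000 buckets and 600
dead entries is cleaned only when the check runs after a GC.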

It is very difficult to write a reliable test that safepoints in the
middle of the cleaning and chooses to rehash using just String.intern. A really
advanced gtest, or exposing several functions via the whitebox API, could do it.
This is also why our ordinary testing didn't show this until now.

The exact circumstances for G1 are:
- Intern a lot of strings and remove the references to most of them.
- Exhaust memory or call System.gc().
- The GC safepoint clears the weak string oops.
- After the safepoint, cleaning starts if the load factor is too low to grow.
- Intern many strings with the same or very similar hash codes, which will
trigger a rehash in the next safepoint.
- Safepoint _before_ cleaning ends.
- Rehash.
- If we now continue cleaning after the safepoint, we crash.
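
The Java-visible part of the steps above can be sketched roughly as follows.
This is not a reliable reproducer: it cannot force a safepoint in the middle of
the cleaning. The class and method names and all counts are hypothetical. It
also assumes the string table's hash clusters the same strings that collide
under String.hashCode(); "Aa" and "BB" share a hash code, so any equal-length
sequence of those two blocks collides.

```java
import java.util.ArrayList;
import java.util.List;

final class InternCollisionSketch {
    // Builds 2^blocks distinct strings that all share one String.hashCode().
    static List<String> collidingStrings(int blocks) {
        List<String> out = new ArrayList<>();
        int count = 1 << blocks;
        for (int i = 0; i < count; i++) {
            StringBuilder sb = new StringBuilder(2 * blocks);
            for (int b = 0; b < blocks; b++) {
                sb.append(((i >> b) & 1) == 0 ? "Aa" : "BB");
            }
            out.add(sb.toString());
        }
        return out;
    }

    // Interns n strings and keeps no reference to any of them.
    static void internAndDrop(int n) {
        for (int i = 0; i < n; i++) {
            ("garbage-" + i).intern();
        }
    }

    public static void main(String[] args) {
        internAndDrop(100_000);        // intern a lot, keep no references
        System.gc();                   // request a GC so dead entries can be cleared
        for (String s : collidingStrings(12)) {
            s.intern();                // many strings with the same hash code
        }
        System.gc();                   // request another GC (and thus a safepoint)
    }
}
```

Whether the second safepoint lands before the cleaning ends is a matter of
timing, which is why a plain String.intern test is not dependable here.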

/Robbin

On 06/26/2018 03:32 AM, David Holmes wrote:
> Hi Robbin,
> 
> Do you have any idea on what exact circumstances caused this bug to be exposed? 
> I'm a little concerned that my mach5 testing accidentally tripped over it while 
> our proper testing has not!
> 
> Thanks,
> David
> 
> On 26/06/2018 4:26 AM, Robbin Ehn wrote:
>> Hi all, please review.
>>
>> Webrev: http://cr.openjdk.java.net/~rehn/8205583/v0/webrev/index.html
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8205583
>>
>> The problem is that the cancel-able cleaning operation is unaware of rehash.
>> The cleaning starts, pauses for a safepoint which does the rehash, destroying
>> the old table. When the cleaning resumes after the safepoint, it keeps
>> working in the destroyed table.
>>
>> The cancelability of the cleaning operation is not needed and just creates
>> complexity. In this change-set I remove that functionality, which means a rehash
>> will be postponed until the cleaning has finished (the same as for growing).
>>
>> Passed 700 runs of the test that produced the error and tier 1-3.
>>
>> Thanks, Robbin