RFR (XS) 8161280 - assert failed: reference count underflow for symbol

Ioi Lam ioi.lam at oracle.com
Wed Aug 24 02:24:45 UTC 2016



On 8/23/16 6:51 PM, David Holmes wrote:
> On 24/08/2016 11:32 AM, Ioi Lam wrote:
>> On 8/23/16 6:08 PM, David Holmes wrote:
>>> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>>>> Hi Coleen, thanks for suggestion the simplification:
>>>>
>>>> void Symbol::decrement_refcount() {
>>>> #ifdef ASSERT
>>>>   if (_refcount == 0) {
>>>>       print();
>>>>       assert(false, "reference count underflow for symbol");
>>>>     }
>>>>   }
>>>> #endif
>>>>   Atomic::dec(&_refcount);
>>>> }
>>>>
>>>> There's a race condition that won't detect the underflow. E.g., 
>>>> refcount
>>>> is 1. Two threads comes in and decrement at the same time. We will end
>>>> up with -1.
>>>
>>> So if we're going this path then you can get rid of the race by using
>>> Atomic::add(&_refcount, -1), which returns the updated value. If you
>>> get back -1 then assert.
>>>
>>
>> The problem is we allow -1 to mean "permanent". Symbols in the CDS
>> archive are mapped read-only and have refcount==-1. If we decrement them
>> we get a SEGV.
>
> Unless you set them using the decrement_refcount function I don't see 
> how this is a problem. The assertion would only trigger if we 
> decrement from zero to -1.
>
>> Also, if we blindly decrement the refcount, we could end up rolling back
>> all the way from 0xffff to 0x0, at which point we may free the Symbol
>> which may still be in use.
>
> ?? As soon as a bad decrement happens the assert is  trigerred and the 
> VM is dead.

The assert doesn't happen in product VM. Currently the product VM will 
continue to work, and all symbols with underflown/overflown refcounts 
will be considered as permanent.

- Ioi


>
> David
> -----
>
>> The fundamental problem is we need a two-step operation and we avoid
>> proper synchronization (for performance/avoiding deadlock/etc). So we
>> will always have a race condition somewhere, and we just need to allow
>> only the benign ones.
>>
>> - Ioi
>>
>>
>>> Cheers,
>>> David
>>>
>>>> However, it's not worse than before. The old version also has a race
>>>> condition:
>>>>
>>>>     refcount is 0
>>>>     thread A decrements
>>>>     thread B increments
>>>>     thread A checks for underflow
>>>>
>>>> the decrementing thread will read _refcount==0 at the end so it won't
>>>> detect the (transient) underflow.
>>>>
>>>> I think the failure to detect underflow is fine, since this happens 
>>>> only
>>>> with concurrent access. The kinds of underflow that we are interested
>>>> usually can be caught in single-threaded situations.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>>
>>>>
>>>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>>>>
>>>>> This doesn't make sense for me and I have to go in gdb to print out
>>>>> what -16384 is.   It appears that this is trying to detect that we
>>>>> went below zero from zero, which is an error, but this isn't clear at
>>>>> all.
>>>>>
>>>>> It seems that
>>>>>
>>>>>    if (_refcount >= 0) {
>>>>>
>>>>>
>>>>> Should be > 0 and we should assert if this is ever zero instead, and
>>>>> allow anything negative to mean that this count has gone immortal.
>>>>>
>>>>> Kim thought it should use CAS rather than atomic increment and
>>>>> decrement, but maybe that isn't necessary, especially since there
>>>>> isn't a short version of cmpxchg.
>>>>>
>>>>> thanks,
>>>>> Coleen
>>>>>
>>>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/ 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Summary:
>>>>>>
>>>>>> The test was loading a lot of JCK classes into the same VM. Many of
>>>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", so the
>>>>>> refcount (a signed short integer) of this Symbol would run up and
>>>>>> past 0x7fff.
>>>>>>
>>>>>> The assert was caused by a race condition: the refcount started with
>>>>>> a large (16-bit) positive value such as 0x7fff, one thread is
>>>>>> decrementing and several other threads are incrementing. The 
>>>>>> refcount
>>>>>> will end up being 0x8000 or slightly higher (limited to the 
>>>>>> number of
>>>>>> concurrent threads that are running within a small window of several
>>>>>> instructions in the decrementing thread, so most likely it will be
>>>>>> 0x800?).
>>>>>>
>>>>>> As a result, the decrementing thread found that the refecount is
>>>>>> negative after the operation, and thought that an underflow had
>>>>>> happened.
>>>>>>
>>>>>> The fix is to ignore any value that may appear in the [0x8000 -
>>>>>> 0xbfff] range and do not flag these as underflows (since they are
>>>>>> most likely overflows -- overflows are already handled by making the
>>>>>> Symbol permanent).
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>
>>>>
>>



More information about the hotspot-runtime-dev mailing list