RFR (XS) 8161280 - assert failed: reference count underflow for symbol

David Holmes david.holmes at oracle.com
Wed Aug 24 01:51:14 UTC 2016


On 24/08/2016 11:32 AM, Ioi Lam wrote:
> On 8/23/16 6:08 PM, David Holmes wrote:
>> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>>> Hi Coleen, thanks for suggestion the simplification:
>>>
>>> void Symbol::decrement_refcount() {
>>> #ifdef ASSERT
>>>   if (_refcount == 0) {
>>>       print();
>>>       assert(false, "reference count underflow for symbol");
>>>     }
>>>   }
>>> #endif
>>>   Atomic::dec(&_refcount);
>>> }
>>>
>>> There's a race condition that won't detect the underflow. E.g., refcount
>>> is 1. Two threads comes in and decrement at the same time. We will end
>>> up with -1.
>>
>> So if we're going this path then you can get rid of the race by using
>> Atomic::add(&_refcount, -1), which returns the updated value. If you
>> get back -1 then assert.
>>
>
> The problem is we allow -1 to mean "permanent". Symbols in the CDS
> archive are mapped read-only and have refcount==-1. If we decrement them
> we get a SEGV.

Unless you set them using the decrement_refcount function I don't see 
how this is a problem. The assertion would only trigger if we decrement 
from zero to -1.

> Also, if we blindly decrement the refcount, we could end up rolling back
> all the way from 0xffff to 0x0, at which point we may free the Symbol
> which may still be in use.

?? As soon as a bad decrement happens the assert is  trigerred and the 
VM is dead.

David
-----

> The fundamental problem is we need a two-step operation and we avoid
> proper synchronization (for performance/avoiding deadlock/etc). So we
> will always have a race condition somewhere, and we just need to allow
> only the benign ones.
>
> - Ioi
>
>
>> Cheers,
>> David
>>
>>> However, it's not worse than before. The old version also has a race
>>> condition:
>>>
>>>     refcount is 0
>>>     thread A decrements
>>>     thread B increments
>>>     thread A checks for underflow
>>>
>>> the decrementing thread will read _refcount==0 at the end so it won't
>>> detect the (transient) underflow.
>>>
>>> I think the failure to detect underflow is fine, since this happens only
>>> with concurrent access. The kinds of underflow that we are interested
>>> usually can be caught in single-threaded situations.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>>
>>>
>>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>>>
>>>> This doesn't make sense for me and I have to go in gdb to print out
>>>> what -16384 is.   It appears that this is trying to detect that we
>>>> went below zero from zero, which is an error, but this isn't clear at
>>>> all.
>>>>
>>>> It seems that
>>>>
>>>>    if (_refcount >= 0) {
>>>>
>>>>
>>>> Should be > 0 and we should assert if this is ever zero instead, and
>>>> allow anything negative to mean that this count has gone immortal.
>>>>
>>>> Kim thought it should use CAS rather than atomic increment and
>>>> decrement, but maybe that isn't necessary, especially since there
>>>> isn't a short version of cmpxchg.
>>>>
>>>> thanks,
>>>> Coleen
>>>>
>>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/
>>>>>
>>>>>
>>>>>
>>>>> Summary:
>>>>>
>>>>> The test was loading a lot of JCK classes into the same VM. Many of
>>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", so the
>>>>> refcount (a signed short integer) of this Symbol would run up and
>>>>> past 0x7fff.
>>>>>
>>>>> The assert was caused by a race condition: the refcount started with
>>>>> a large (16-bit) positive value such as 0x7fff, one thread is
>>>>> decrementing and several other threads are incrementing. The refcount
>>>>> will end up being 0x8000 or slightly higher (limited to the number of
>>>>> concurrent threads that are running within a small window of several
>>>>> instructions in the decrementing thread, so most likely it will be
>>>>> 0x800?).
>>>>>
>>>>> As a result, the decrementing thread found that the refecount is
>>>>> negative after the operation, and thought that an underflow had
>>>>> happened.
>>>>>
>>>>> The fix is to ignore any value that may appear in the [0x8000 -
>>>>> 0xbfff] range and do not flag these as underflows (since they are
>>>>> most likely overflows -- overflows are already handled by making the
>>>>> Symbol permanent).
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>>
>>>
>


More information about the hotspot-runtime-dev mailing list