RFR (XS) 8161280 - assert failed: reference count underflow for symbol

Ioi Lam ioi.lam at oracle.com
Wed Aug 24 03:44:29 UTC 2016


On 8/23/16 7:54 PM, David Holmes wrote:
> On 24/08/2016 12:24 PM, Ioi Lam wrote:
>>
>>
>> On 8/23/16 6:51 PM, David Holmes wrote:
>>> On 24/08/2016 11:32 AM, Ioi Lam wrote:
>>>> On 8/23/16 6:08 PM, David Holmes wrote:
>>>>> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>>>>>> Hi Coleen, thanks for suggestion the simplification:
>>>>>>
>>>>>> void Symbol::decrement_refcount() {
>>>>>> #ifdef ASSERT
>>>>>>   if (_refcount == 0) {
>>>>>>       print();
>>>>>>       assert(false, "reference count underflow for symbol");
>>>>>>     }
>>>>>>   }
>>>>>> #endif
>>>>>>   Atomic::dec(&_refcount);
>>>>>> }
>>>>>>
>>>>>> There's a race condition that won't detect the underflow. E.g.,
>>>>>> refcount
>>>>>> is 1. Two threads comes in and decrement at the same time. We 
>>>>>> will end
>>>>>> up with -1.
>>>>>
>>>>> So if we're going this path then you can get rid of the race by using
>>>>> Atomic::add(&_refcount, -1), which returns the updated value. If you
>>>>> get back -1 then assert.
>>>>>
>>>>
>>>> The problem is we allow -1 to mean "permanent". Symbols in the CDS
>>>> archive are mapped read-only and have refcount==-1. If we decrement 
>>>> them
>>>> we get a SEGV.
>>>
>>> Unless you set them using the decrement_refcount function I don't see
>>> how this is a problem. The assertion would only trigger if we
>>> decrement from zero to -1.
>>>
>>>> Also, if we blindly decrement the refcount, we could end up rolling 
>>>> back
>>>> all the way from 0xffff to 0x0, at which point we may free the Symbol
>>>> which may still be in use.
>>>
>>> ?? As soon as a bad decrement happens the assert is  trigerred and the
>>> VM is dead.
>>
>> The assert doesn't happen in product VM. Currently the product VM will
>> continue to work, and all symbols with underflown/overflown refcounts
>> will be considered as permanent.
>
> Okay - note this bug is about an assertion failure :)
>
> So there are two problems:
>
> 1. There is a race related to the assert that the Atomic::add(-1) can 
> fix.
>
> 2. There is the product-mode unconditional decrementing of the 
> refcount which will mark everything as "permanent". I haven't see any 
> suggestions to address that existing issue. One obvious suggestion is 
> to use a different value than -1 so it doesn't easily get hit by 
> unexpected underflow.
>

Here's an updated version that's an improvement over my original patch:

+ A negative refcount means permanent symbols.
+ Only non-permanent symbols can be incremented/decremented
+ Underflow is detected by 0 -> -1 transition. Real underflows are 
always caught.
+ Theoretically there's still a race condition for false asserts:

[1] thread A observes that refcount is 0x7fff, proceeds to decrement it
[2] but in the mean time, around 32768 threads come in. They all observe 
that the refcount is non-negative, and then all proceed to increment the 
refcount to 0x0.
[3] thread A decrements the refcount and observes that refcount is -1 
afterwards

but I think this can be safely ignored :-)

http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v02/

I like this version as it doesn't change the logic for the product VM 
and only limits the possibility of false asserts.

Thanks
- Ioi

> David
>
>> - Ioi
>>
>>
>>>
>>> David
>>> -----
>>>
>>>> The fundamental problem is we need a two-step operation and we avoid
>>>> proper synchronization (for performance/avoiding deadlock/etc). So we
>>>> will always have a race condition somewhere, and we just need to allow
>>>> only the benign ones.
>>>>
>>>> - Ioi
>>>>
>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>>> However, it's not worse than before. The old version also has a race
>>>>>> condition:
>>>>>>
>>>>>>     refcount is 0
>>>>>>     thread A decrements
>>>>>>     thread B increments
>>>>>>     thread A checks for underflow
>>>>>>
>>>>>> the decrementing thread will read _refcount==0 at the end so it 
>>>>>> won't
>>>>>> detect the (transient) underflow.
>>>>>>
>>>>>> I think the failure to detect underflow is fine, since this happens
>>>>>> only
>>>>>> with concurrent access. The kinds of underflow that we are 
>>>>>> interested
>>>>>> usually can be caught in single-threaded situations.
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>>>>>>
>>>>>>> This doesn't make sense for me and I have to go in gdb to print out
>>>>>>> what -16384 is.   It appears that this is trying to detect that we
>>>>>>> went below zero from zero, which is an error, but this isn't 
>>>>>>> clear at
>>>>>>> all.
>>>>>>>
>>>>>>> It seems that
>>>>>>>
>>>>>>>    if (_refcount >= 0) {
>>>>>>>
>>>>>>>
>>>>>>> Should be > 0 and we should assert if this is ever zero instead, 
>>>>>>> and
>>>>>>> allow anything negative to mean that this count has gone immortal.
>>>>>>>
>>>>>>> Kim thought it should use CAS rather than atomic increment and
>>>>>>> decrement, but maybe that isn't necessary, especially since there
>>>>>>> isn't a short version of cmpxchg.
>>>>>>>
>>>>>>> thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Summary:
>>>>>>>>
>>>>>>>> The test was loading a lot of JCK classes into the same VM. 
>>>>>>>> Many of
>>>>>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", so the
>>>>>>>> refcount (a signed short integer) of this Symbol would run up and
>>>>>>>> past 0x7fff.
>>>>>>>>
>>>>>>>> The assert was caused by a race condition: the refcount started 
>>>>>>>> with
>>>>>>>> a large (16-bit) positive value such as 0x7fff, one thread is
>>>>>>>> decrementing and several other threads are incrementing. The
>>>>>>>> refcount
>>>>>>>> will end up being 0x8000 or slightly higher (limited to the
>>>>>>>> number of
>>>>>>>> concurrent threads that are running within a small window of 
>>>>>>>> several
>>>>>>>> instructions in the decrementing thread, so most likely it will be
>>>>>>>> 0x800?).
>>>>>>>>
>>>>>>>> As a result, the decrementing thread found that the refecount is
>>>>>>>> negative after the operation, and thought that an underflow had
>>>>>>>> happened.
>>>>>>>>
>>>>>>>> The fix is to ignore any value that may appear in the [0x8000 -
>>>>>>>> 0xbfff] range and do not flag these as underflows (since they are
>>>>>>>> most likely overflows -- overflows are already handled by 
>>>>>>>> making the
>>>>>>>> Symbol permanent).
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>



More information about the hotspot-runtime-dev mailing list