RFR (XS) 8161280 - assert failed: reference count underflow for symbol

David Holmes david.holmes at oracle.com
Wed Aug 24 05:12:10 UTC 2016


Hi Ioi,

Sorry I have a problem here. I just realized there is no 
Atomic::add(jshort*, jshort). But I can't accept changing the 
Atomic::inc/dec API only for the jshort case (and why return jint ??). 
So either the whole Atomic inc/dec API needs updating across the board 
(not a small task) or else we need to introduce Atomic::add(jshort*, 
jshort). The latter seems a small task - the existing inc/dec 
implementations can call the new add version.

Also in a product build might the "new_value" variable trigger an 
"unused" warning? If so an ugly option would be:

DEBUG_ONLY(jshort new_value =) Atomic::dec(...);
assert(new_value != -1, "...");

Thanks,
David

On 24/08/2016 1:44 PM, Ioi Lam wrote:
>
>
> On 8/23/16 7:54 PM, David Holmes wrote:
>> On 24/08/2016 12:24 PM, Ioi Lam wrote:
>>>
>>>
>>> On 8/23/16 6:51 PM, David Holmes wrote:
>>>> On 24/08/2016 11:32 AM, Ioi Lam wrote:
>>>>> On 8/23/16 6:08 PM, David Holmes wrote:
>>>>>> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>>>>>>> Hi Coleen, thanks for suggestion the simplification:
>>>>>>>
>>>>>>> void Symbol::decrement_refcount() {
>>>>>>> #ifdef ASSERT
>>>>>>>   if (_refcount == 0) {
>>>>>>>       print();
>>>>>>>       assert(false, "reference count underflow for symbol");
>>>>>>>     }
>>>>>>>   }
>>>>>>> #endif
>>>>>>>   Atomic::dec(&_refcount);
>>>>>>> }
>>>>>>>
>>>>>>> There's a race condition that won't detect the underflow. E.g.,
>>>>>>> refcount
>>>>>>> is 1. Two threads comes in and decrement at the same time. We
>>>>>>> will end
>>>>>>> up with -1.
>>>>>>
>>>>>> So if we're going this path then you can get rid of the race by using
>>>>>> Atomic::add(&_refcount, -1), which returns the updated value. If you
>>>>>> get back -1 then assert.
>>>>>>
>>>>>
>>>>> The problem is we allow -1 to mean "permanent". Symbols in the CDS
>>>>> archive are mapped read-only and have refcount==-1. If we decrement
>>>>> them
>>>>> we get a SEGV.
>>>>
>>>> Unless you set them using the decrement_refcount function I don't see
>>>> how this is a problem. The assertion would only trigger if we
>>>> decrement from zero to -1.
>>>>
>>>>> Also, if we blindly decrement the refcount, we could end up rolling
>>>>> back
>>>>> all the way from 0xffff to 0x0, at which point we may free the Symbol
>>>>> which may still be in use.
>>>>
>>>> ?? As soon as a bad decrement happens the assert is  trigerred and the
>>>> VM is dead.
>>>
>>> The assert doesn't happen in product VM. Currently the product VM will
>>> continue to work, and all symbols with underflown/overflown refcounts
>>> will be considered as permanent.
>>
>> Okay - note this bug is about an assertion failure :)
>>
>> So there are two problems:
>>
>> 1. There is a race related to the assert that the Atomic::add(-1) can
>> fix.
>>
>> 2. There is the product-mode unconditional decrementing of the
>> refcount which will mark everything as "permanent". I haven't see any
>> suggestions to address that existing issue. One obvious suggestion is
>> to use a different value than -1 so it doesn't easily get hit by
>> unexpected underflow.
>>
>
> Here's an updated version that's an improvement over my original patch:
>
> + A negative refcount means permanent symbols.
> + Only non-permanent symbols can be incremented/decremented
> + Underflow is detected by 0 -> -1 transition. Real underflows are
> always caught.
> + Theoretically there's still a race condition for false asserts:
>
> [1] thread A observes that refcount is 0x7fff, proceeds to decrement it
> [2] but in the mean time, around 32768 threads come in. They all observe
> that the refcount is non-negative, and then all proceed to increment the
> refcount to 0x0.
> [3] thread A decrements the refcount and observes that refcount is -1
> afterwards
>
> but I think this can be safely ignored :-)
>
> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v02/
>
>
> I like this version as it doesn't change the logic for the product VM
> and only limits the possibility of false asserts.
>
> Thanks
> - Ioi
>
>> David
>>
>>> - Ioi
>>>
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>> The fundamental problem is we need a two-step operation and we avoid
>>>>> proper synchronization (for performance/avoiding deadlock/etc). So we
>>>>> will always have a race condition somewhere, and we just need to allow
>>>>> only the benign ones.
>>>>>
>>>>> - Ioi
>>>>>
>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>>
>>>>>>> However, it's not worse than before. The old version also has a race
>>>>>>> condition:
>>>>>>>
>>>>>>>     refcount is 0
>>>>>>>     thread A decrements
>>>>>>>     thread B increments
>>>>>>>     thread A checks for underflow
>>>>>>>
>>>>>>> the decrementing thread will read _refcount==0 at the end so it
>>>>>>> won't
>>>>>>> detect the (transient) underflow.
>>>>>>>
>>>>>>> I think the failure to detect underflow is fine, since this happens
>>>>>>> only
>>>>>>> with concurrent access. The kinds of underflow that we are
>>>>>>> interested
>>>>>>> usually can be caught in single-threaded situations.
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>>>>>>>
>>>>>>>> This doesn't make sense for me and I have to go in gdb to print out
>>>>>>>> what -16384 is.   It appears that this is trying to detect that we
>>>>>>>> went below zero from zero, which is an error, but this isn't
>>>>>>>> clear at
>>>>>>>> all.
>>>>>>>>
>>>>>>>> It seems that
>>>>>>>>
>>>>>>>>    if (_refcount >= 0) {
>>>>>>>>
>>>>>>>>
>>>>>>>> Should be > 0 and we should assert if this is ever zero instead,
>>>>>>>> and
>>>>>>>> allow anything negative to mean that this count has gone immortal.
>>>>>>>>
>>>>>>>> Kim thought it should use CAS rather than atomic increment and
>>>>>>>> decrement, but maybe that isn't necessary, especially since there
>>>>>>>> isn't a short version of cmpxchg.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Summary:
>>>>>>>>>
>>>>>>>>> The test was loading a lot of JCK classes into the same VM.
>>>>>>>>> Many of
>>>>>>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", so the
>>>>>>>>> refcount (a signed short integer) of this Symbol would run up and
>>>>>>>>> past 0x7fff.
>>>>>>>>>
>>>>>>>>> The assert was caused by a race condition: the refcount started
>>>>>>>>> with
>>>>>>>>> a large (16-bit) positive value such as 0x7fff, one thread is
>>>>>>>>> decrementing and several other threads are incrementing. The
>>>>>>>>> refcount
>>>>>>>>> will end up being 0x8000 or slightly higher (limited to the
>>>>>>>>> number of
>>>>>>>>> concurrent threads that are running within a small window of
>>>>>>>>> several
>>>>>>>>> instructions in the decrementing thread, so most likely it will be
>>>>>>>>> 0x800?).
>>>>>>>>>
>>>>>>>>> As a result, the decrementing thread found that the refecount is
>>>>>>>>> negative after the operation, and thought that an underflow had
>>>>>>>>> happened.
>>>>>>>>>
>>>>>>>>> The fix is to ignore any value that may appear in the [0x8000 -
>>>>>>>>> 0xbfff] range and do not flag these as underflows (since they are
>>>>>>>>> most likely overflows -- overflows are already handled by
>>>>>>>>> making the
>>>>>>>>> Symbol permanent).
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>


More information about the hotspot-runtime-dev mailing list