RFR (XS) 8161280 - assert failed: reference count underflow for symbol

David Holmes david.holmes at oracle.com
Wed Aug 24 07:14:29 UTC 2016


Hi Ioi,

On 24/08/2016 5:01 PM, Ioi Lam wrote:
> Hi David,
>
> Here's an updated version that added Atomic::add(jshort*, jshort) as you
> suggested.

Thanks. Looks good.

> To appease the "unused" warnings, I just added (void)new_value.

I think I prefer the DEBUG_ONLY version. :)

Cheers,
David

> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v03/
>
>
> I am running RBT with "--test
> hotspot/test/:hotspot_all,vm.parallel_class_loading,vm.runtime.testlist"
> to make sure everything works.
>
> Thanks
> - Ioi
>
> On 8/23/16 10:12 PM, David Holmes wrote:
>> Hi Ioi,
>>
>> Sorry I have a problem here. I just realized there is no
>> Atomic::add(jshort*, jshort). But I can't accept changing the
>> Atomic::inc/dec API only for the jshort case (and why return jint ??).
>> So either the whole Atomic inc/dec API needs updating across the board
>> (not a small task) or else we need to introduce Atomic::add(jshort*,
>> jshort). The latter seems a small task - the existing inc/dec
>> implementations can call the new add version.
>>
>> Also in a product build might the "new_value" variable trigger an
>> "unused" warning? If so an ugly option would be:
>>
>> DEBUG_ONLY(jshort new_value =) Atomic::dec(...);
>> assert(new_value != -1, "...");
>>
>> Thanks,
>> David
>>
>> On 24/08/2016 1:44 PM, Ioi Lam wrote:
>>>
>>>
>>> On 8/23/16 7:54 PM, David Holmes wrote:
>>>> On 24/08/2016 12:24 PM, Ioi Lam wrote:
>>>>>
>>>>>
>>>>> On 8/23/16 6:51 PM, David Holmes wrote:
>>>>>> On 24/08/2016 11:32 AM, Ioi Lam wrote:
>>>>>>> On 8/23/16 6:08 PM, David Holmes wrote:
>>>>>>>> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>>>>>>>>> Hi Coleen, thanks for suggestion the simplification:
>>>>>>>>>
>>>>>>>>> void Symbol::decrement_refcount() {
>>>>>>>>> #ifdef ASSERT
>>>>>>>>>   if (_refcount == 0) {
>>>>>>>>>       print();
>>>>>>>>>       assert(false, "reference count underflow for symbol");
>>>>>>>>>     }
>>>>>>>>>   }
>>>>>>>>> #endif
>>>>>>>>>   Atomic::dec(&_refcount);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> There's a race condition that won't detect the underflow. E.g.,
>>>>>>>>> refcount
>>>>>>>>> is 1. Two threads comes in and decrement at the same time. We
>>>>>>>>> will end
>>>>>>>>> up with -1.
>>>>>>>>
>>>>>>>> So if we're going this path then you can get rid of the race by
>>>>>>>> using
>>>>>>>> Atomic::add(&_refcount, -1), which returns the updated value. If
>>>>>>>> you
>>>>>>>> get back -1 then assert.
>>>>>>>>
>>>>>>>
>>>>>>> The problem is we allow -1 to mean "permanent". Symbols in the CDS
>>>>>>> archive are mapped read-only and have refcount==-1. If we decrement
>>>>>>> them
>>>>>>> we get a SEGV.
>>>>>>
>>>>>> Unless you set them using the decrement_refcount function I don't see
>>>>>> how this is a problem. The assertion would only trigger if we
>>>>>> decrement from zero to -1.
>>>>>>
>>>>>>> Also, if we blindly decrement the refcount, we could end up rolling
>>>>>>> back
>>>>>>> all the way from 0xffff to 0x0, at which point we may free the
>>>>>>> Symbol
>>>>>>> which may still be in use.
>>>>>>
>>>>>> ?? As soon as a bad decrement happens the assert is trigerred and the
>>>>>> VM is dead.
>>>>>
>>>>> The assert doesn't happen in product VM. Currently the product VM will
>>>>> continue to work, and all symbols with underflown/overflown refcounts
>>>>> will be considered as permanent.
>>>>
>>>> Okay - note this bug is about an assertion failure :)
>>>>
>>>> So there are two problems:
>>>>
>>>> 1. There is a race related to the assert that the Atomic::add(-1) can
>>>> fix.
>>>>
>>>> 2. There is the product-mode unconditional decrementing of the
>>>> refcount which will mark everything as "permanent". I haven't see any
>>>> suggestions to address that existing issue. One obvious suggestion is
>>>> to use a different value than -1 so it doesn't easily get hit by
>>>> unexpected underflow.
>>>>
>>>
>>> Here's an updated version that's an improvement over my original patch:
>>>
>>> + A negative refcount means permanent symbols.
>>> + Only non-permanent symbols can be incremented/decremented
>>> + Underflow is detected by 0 -> -1 transition. Real underflows are
>>> always caught.
>>> + Theoretically there's still a race condition for false asserts:
>>>
>>> [1] thread A observes that refcount is 0x7fff, proceeds to decrement it
>>> [2] but in the mean time, around 32768 threads come in. They all observe
>>> that the refcount is non-negative, and then all proceed to increment the
>>> refcount to 0x0.
>>> [3] thread A decrements the refcount and observes that refcount is -1
>>> afterwards
>>>
>>> but I think this can be safely ignored :-)
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v02/
>>>
>>>
>>>
>>> I like this version as it doesn't change the logic for the product VM
>>> and only limits the possibility of false asserts.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> David
>>>>
>>>>> - Ioi
>>>>>
>>>>>
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> The fundamental problem is we need a two-step operation and we avoid
>>>>>>> proper synchronization (for performance/avoiding deadlock/etc).
>>>>>>> So we
>>>>>>> will always have a race condition somewhere, and we just need to
>>>>>>> allow
>>>>>>> only the benign ones.
>>>>>>>
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> However, it's not worse than before. The old version also has a
>>>>>>>>> race
>>>>>>>>> condition:
>>>>>>>>>
>>>>>>>>>     refcount is 0
>>>>>>>>>     thread A decrements
>>>>>>>>>     thread B increments
>>>>>>>>>     thread A checks for underflow
>>>>>>>>>
>>>>>>>>> the decrementing thread will read _refcount==0 at the end so it
>>>>>>>>> won't
>>>>>>>>> detect the (transient) underflow.
>>>>>>>>>
>>>>>>>>> I think the failure to detect underflow is fine, since this
>>>>>>>>> happens
>>>>>>>>> only
>>>>>>>>> with concurrent access. The kinds of underflow that we are
>>>>>>>>> interested
>>>>>>>>> usually can be caught in single-threaded situations.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>>>>>>>>>
>>>>>>>>>> This doesn't make sense for me and I have to go in gdb to
>>>>>>>>>> print out
>>>>>>>>>> what -16384 is.   It appears that this is trying to detect
>>>>>>>>>> that we
>>>>>>>>>> went below zero from zero, which is an error, but this isn't
>>>>>>>>>> clear at
>>>>>>>>>> all.
>>>>>>>>>>
>>>>>>>>>> It seems that
>>>>>>>>>>
>>>>>>>>>>    if (_refcount >= 0) {
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Should be > 0 and we should assert if this is ever zero instead,
>>>>>>>>>> and
>>>>>>>>>> allow anything negative to mean that this count has gone
>>>>>>>>>> immortal.
>>>>>>>>>>
>>>>>>>>>> Kim thought it should use CAS rather than atomic increment and
>>>>>>>>>> decrement, but maybe that isn't necessary, especially since there
>>>>>>>>>> isn't a short version of cmpxchg.
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Summary:
>>>>>>>>>>>
>>>>>>>>>>> The test was loading a lot of JCK classes into the same VM.
>>>>>>>>>>> Many of
>>>>>>>>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", so the
>>>>>>>>>>> refcount (a signed short integer) of this Symbol would run up
>>>>>>>>>>> and
>>>>>>>>>>> past 0x7fff.
>>>>>>>>>>>
>>>>>>>>>>> The assert was caused by a race condition: the refcount started
>>>>>>>>>>> with
>>>>>>>>>>> a large (16-bit) positive value such as 0x7fff, one thread is
>>>>>>>>>>> decrementing and several other threads are incrementing. The
>>>>>>>>>>> refcount
>>>>>>>>>>> will end up being 0x8000 or slightly higher (limited to the
>>>>>>>>>>> number of
>>>>>>>>>>> concurrent threads that are running within a small window of
>>>>>>>>>>> several
>>>>>>>>>>> instructions in the decrementing thread, so most likely it
>>>>>>>>>>> will be
>>>>>>>>>>> 0x800?).
>>>>>>>>>>>
>>>>>>>>>>> As a result, the decrementing thread found that the refecount is
>>>>>>>>>>> negative after the operation, and thought that an underflow had
>>>>>>>>>>> happened.
>>>>>>>>>>>
>>>>>>>>>>> The fix is to ignore any value that may appear in the [0x8000 -
>>>>>>>>>>> 0xbfff] range and do not flag these as underflows (since they
>>>>>>>>>>> are
>>>>>>>>>>> most likely overflows -- overflows are already handled by
>>>>>>>>>>> making the
>>>>>>>>>>> Symbol permanent).
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> - Ioi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>


More information about the hotspot-runtime-dev mailing list