RFR (XS) 8161280 - assert failed: reference count underflow for symbol

Ioi Lam ioi.lam at oracle.com
Wed Aug 24 08:28:17 UTC 2016



On 8/24/16 12:14 AM, David Holmes wrote:
> Hi Ioi,
>
> On 24/08/2016 5:01 PM, Ioi Lam wrote:
>> Hi David,
>>
>> Here's an updated version that added Atomic::add(jshort*, jshort) as you
>> suggested.
>
> Thanks. Looks good.
>
>> To appease the "unused" warnings, I just added (void)new_value.
>
> I think I prefer the DEBUG_ONLY version. :)
>

I am not a big fan of ending DEBUG_ONLY with a "=", so I'll keep the 
code as is :-)

- Ioi


> Cheers,
> David
>
>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v03/ 
>>
>>
>>
>> I am running RBT with "--test
>> hotspot/test/:hotspot_all,vm.parallel_class_loading,vm.runtime.testlist"
>> to make sure everything works.
>>
>> Thanks
>> - Ioi
>>
>> On 8/23/16 10:12 PM, David Holmes wrote:
>>> Hi Ioi,
>>>
>>> Sorry I have a problem here. I just realized there is no
>>> Atomic::add(jshort*, jshort). But I can't accept changing the
>>> Atomic::inc/dec API only for the jshort case (and why return jint ??).
>>> So either the whole Atomic inc/dec API needs updating across the board
>>> (not a small task) or else we need to introduce Atomic::add(jshort*,
>>> jshort). The latter seems a small task - the existing inc/dec
>>> implementations can call the new add version.
>>>
>>> Also in a product build might the "new_value" variable trigger an
>>> "unused" warning? If so an ugly option would be:
>>>
>>> DEBUG_ONLY(jshort new_value =) Atomic::dec(...);
>>> assert(new_value != -1, "...");
>>>
>>> Thanks,
>>> David
>>>
>>> On 24/08/2016 1:44 PM, Ioi Lam wrote:
>>>>
>>>>
>>>> On 8/23/16 7:54 PM, David Holmes wrote:
>>>>> On 24/08/2016 12:24 PM, Ioi Lam wrote:
>>>>>>
>>>>>>
>>>>>> On 8/23/16 6:51 PM, David Holmes wrote:
>>>>>>> On 24/08/2016 11:32 AM, Ioi Lam wrote:
>>>>>>>> On 8/23/16 6:08 PM, David Holmes wrote:
>>>>>>>>> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>>>>>>>>>> Hi Coleen, thanks for suggestion the simplification:
>>>>>>>>>>
>>>>>>>>>> void Symbol::decrement_refcount() {
>>>>>>>>>> #ifdef ASSERT
>>>>>>>>>>   if (_refcount == 0) {
>>>>>>>>>>       print();
>>>>>>>>>>       assert(false, "reference count underflow for symbol");
>>>>>>>>>>     }
>>>>>>>>>>   }
>>>>>>>>>> #endif
>>>>>>>>>>   Atomic::dec(&_refcount);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> There's a race condition that won't detect the underflow. E.g.,
>>>>>>>>>> refcount
>>>>>>>>>> is 1. Two threads comes in and decrement at the same time. We
>>>>>>>>>> will end
>>>>>>>>>> up with -1.
>>>>>>>>>
>>>>>>>>> So if we're going this path then you can get rid of the race by
>>>>>>>>> using
>>>>>>>>> Atomic::add(&_refcount, -1), which returns the updated value. If
>>>>>>>>> you
>>>>>>>>> get back -1 then assert.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The problem is we allow -1 to mean "permanent". Symbols in the CDS
>>>>>>>> archive are mapped read-only and have refcount==-1. If we 
>>>>>>>> decrement
>>>>>>>> them
>>>>>>>> we get a SEGV.
>>>>>>>
>>>>>>> Unless you set them using the decrement_refcount function I 
>>>>>>> don't see
>>>>>>> how this is a problem. The assertion would only trigger if we
>>>>>>> decrement from zero to -1.
>>>>>>>
>>>>>>>> Also, if we blindly decrement the refcount, we could end up 
>>>>>>>> rolling
>>>>>>>> back
>>>>>>>> all the way from 0xffff to 0x0, at which point we may free the
>>>>>>>> Symbol
>>>>>>>> which may still be in use.
>>>>>>>
>>>>>>> ?? As soon as a bad decrement happens the assert is trigerred 
>>>>>>> and the
>>>>>>> VM is dead.
>>>>>>
>>>>>> The assert doesn't happen in product VM. Currently the product VM 
>>>>>> will
>>>>>> continue to work, and all symbols with underflown/overflown 
>>>>>> refcounts
>>>>>> will be considered as permanent.
>>>>>
>>>>> Okay - note this bug is about an assertion failure :)
>>>>>
>>>>> So there are two problems:
>>>>>
>>>>> 1. There is a race related to the assert that the Atomic::add(-1) can
>>>>> fix.
>>>>>
>>>>> 2. There is the product-mode unconditional decrementing of the
>>>>> refcount which will mark everything as "permanent". I haven't see any
>>>>> suggestions to address that existing issue. One obvious suggestion is
>>>>> to use a different value than -1 so it doesn't easily get hit by
>>>>> unexpected underflow.
>>>>>
>>>>
>>>> Here's an updated version that's an improvement over my original 
>>>> patch:
>>>>
>>>> + A negative refcount means permanent symbols.
>>>> + Only non-permanent symbols can be incremented/decremented
>>>> + Underflow is detected by 0 -> -1 transition. Real underflows are
>>>> always caught.
>>>> + Theoretically there's still a race condition for false asserts:
>>>>
>>>> [1] thread A observes that refcount is 0x7fff, proceeds to 
>>>> decrement it
>>>> [2] but in the mean time, around 32768 threads come in. They all 
>>>> observe
>>>> that the refcount is non-negative, and then all proceed to 
>>>> increment the
>>>> refcount to 0x0.
>>>> [3] thread A decrements the refcount and observes that refcount is -1
>>>> afterwards
>>>>
>>>> but I think this can be safely ignored :-)
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v02/ 
>>>>
>>>>
>>>>
>>>>
>>>> I like this version as it doesn't change the logic for the product VM
>>>> and only limits the possibility of false asserts.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> David
>>>>>
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> The fundamental problem is we need a two-step operation and we 
>>>>>>>> avoid
>>>>>>>> proper synchronization (for performance/avoiding deadlock/etc).
>>>>>>>> So we
>>>>>>>> will always have a race condition somewhere, and we just need to
>>>>>>>> allow
>>>>>>>> only the benign ones.
>>>>>>>>
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> However, it's not worse than before. The old version also has a
>>>>>>>>>> race
>>>>>>>>>> condition:
>>>>>>>>>>
>>>>>>>>>>     refcount is 0
>>>>>>>>>>     thread A decrements
>>>>>>>>>>     thread B increments
>>>>>>>>>>     thread A checks for underflow
>>>>>>>>>>
>>>>>>>>>> the decrementing thread will read _refcount==0 at the end so it
>>>>>>>>>> won't
>>>>>>>>>> detect the (transient) underflow.
>>>>>>>>>>
>>>>>>>>>> I think the failure to detect underflow is fine, since this
>>>>>>>>>> happens
>>>>>>>>>> only
>>>>>>>>>> with concurrent access. The kinds of underflow that we are
>>>>>>>>>> interested
>>>>>>>>>> usually can be caught in single-threaded situations.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> - Ioi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>>>>>>>>>>
>>>>>>>>>>> This doesn't make sense for me and I have to go in gdb to
>>>>>>>>>>> print out
>>>>>>>>>>> what -16384 is.   It appears that this is trying to detect
>>>>>>>>>>> that we
>>>>>>>>>>> went below zero from zero, which is an error, but this isn't
>>>>>>>>>>> clear at
>>>>>>>>>>> all.
>>>>>>>>>>>
>>>>>>>>>>> It seems that
>>>>>>>>>>>
>>>>>>>>>>>    if (_refcount >= 0) {
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Should be > 0 and we should assert if this is ever zero 
>>>>>>>>>>> instead,
>>>>>>>>>>> and
>>>>>>>>>>> allow anything negative to mean that this count has gone
>>>>>>>>>>> immortal.
>>>>>>>>>>>
>>>>>>>>>>> Kim thought it should use CAS rather than atomic increment and
>>>>>>>>>>> decrement, but maybe that isn't necessary, especially since 
>>>>>>>>>>> there
>>>>>>>>>>> isn't a short version of cmpxchg.
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>> Coleen
>>>>>>>>>>>
>>>>>>>>>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/ 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Summary:
>>>>>>>>>>>>
>>>>>>>>>>>> The test was loading a lot of JCK classes into the same VM.
>>>>>>>>>>>> Many of
>>>>>>>>>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", so 
>>>>>>>>>>>> the
>>>>>>>>>>>> refcount (a signed short integer) of this Symbol would run up
>>>>>>>>>>>> and
>>>>>>>>>>>> past 0x7fff.
>>>>>>>>>>>>
>>>>>>>>>>>> The assert was caused by a race condition: the refcount 
>>>>>>>>>>>> started
>>>>>>>>>>>> with
>>>>>>>>>>>> a large (16-bit) positive value such as 0x7fff, one thread is
>>>>>>>>>>>> decrementing and several other threads are incrementing. The
>>>>>>>>>>>> refcount
>>>>>>>>>>>> will end up being 0x8000 or slightly higher (limited to the
>>>>>>>>>>>> number of
>>>>>>>>>>>> concurrent threads that are running within a small window of
>>>>>>>>>>>> several
>>>>>>>>>>>> instructions in the decrementing thread, so most likely it
>>>>>>>>>>>> will be
>>>>>>>>>>>> 0x800?).
>>>>>>>>>>>>
>>>>>>>>>>>> As a result, the decrementing thread found that the 
>>>>>>>>>>>> refecount is
>>>>>>>>>>>> negative after the operation, and thought that an underflow 
>>>>>>>>>>>> had
>>>>>>>>>>>> happened.
>>>>>>>>>>>>
>>>>>>>>>>>> The fix is to ignore any value that may appear in the 
>>>>>>>>>>>> [0x8000 -
>>>>>>>>>>>> 0xbfff] range and do not flag these as underflows (since they
>>>>>>>>>>>> are
>>>>>>>>>>>> most likely overflows -- overflows are already handled by
>>>>>>>>>>>> making the
>>>>>>>>>>>> Symbol permanent).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> - Ioi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>



More information about the hotspot-runtime-dev mailing list