RFR (XS) 8161280 - assert failed: reference count underflow for symbol

Coleen Phillimore coleen.phillimore at oracle.com
Wed Aug 24 12:42:32 UTC 2016



On 8/24/16 8:37 AM, Ioi Lam wrote:
>
>
> On 8/24/16 5:14 AM, Coleen Phillimore wrote:
>>
>> I still don't want us to observe a _refcount == 0 - this is always an 
>> error.
>>
>> + if (_refcount >= 0) { // not a permanent symbol
>>
>>
>> Can this be changed to something like:
>>
>> + volatile int ref = _refcount;
>> + assert(ref != 0, "underflow");
>>
>> + if (ref > 0) { // not a permanent symbol
>>
>>
> This will probably mess up product VM, as refcount will stay being 
> zero. I'd rather have it fall negative and become a permanent symbol.
>
> The current code will make the product VM more resilient even if 
> underflow happens.

Okay, that's fine.
Thanks,
Coleen

>
> Thanks
> - Ioi
>
>> I really like the atomic::add checking return value.
>>
>> I'm working on a change for later that will eagerly delete symbols 
>> whose refcounts go to zero.
>>
>> Thanks,
>> Coleen
>>
>> On 8/24/16 4:28 AM, Ioi Lam wrote:
>>>
>>>
>>> On 8/24/16 12:14 AM, David Holmes wrote:
>>>> Hi Ioi,
>>>>
>>>> On 24/08/2016 5:01 PM, Ioi Lam wrote:
>>>>> Hi David,
>>>>>
>>>>> Here's an updated version that added Atomic::add(jshort*, jshort) 
>>>>> as you
>>>>> suggested.
>>>>
>>>> Thanks. Looks good.
>>>>
>>>>> To appease the "unused" warnings, I just added (void)new_value.
>>>>
>>>> I think I prefer the DEBUG_ONLY version. :)
>>>>
>>>
>>> I am not a big fan of ending DEBUG_ONLY with a "=", so I'll keep the 
>>> code as is :-)
>>>
>>> - Ioi
>>>
>>>
>>>> Cheers,
>>>> David
>>>>
>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v03/ 
>>>>>
>>>>>
>>>>>
>>>>> I am running RBT with "--test
>>>>> hotspot/test/:hotspot_all,vm.parallel_class_loading,vm.runtime.testlist" 
>>>>>
>>>>> to make sure everything works.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>> On 8/23/16 10:12 PM, David Holmes wrote:
>>>>>> Hi Ioi,
>>>>>>
>>>>>> Sorry I have a problem here. I just realized there is no
>>>>>> Atomic::add(jshort*, jshort). But I can't accept changing the
>>>>>> Atomic::inc/dec API only for the jshort case (and why return jint 
>>>>>> ??).
>>>>>> So either the whole Atomic inc/dec API needs updating across the 
>>>>>> board
>>>>>> (not a small task) or else we need to introduce Atomic::add(jshort*,
>>>>>> jshort). The latter seems a small task - the existing inc/dec
>>>>>> implementations can call the new add version.
>>>>>>
>>>>>> Also in a product build might the "new_value" variable trigger an
>>>>>> "unused" warning? If so an ugly option would be:
>>>>>>
>>>>>> DEBUG_ONLY(jshort new_value =) Atomic::dec(...);
>>>>>> assert(new_value != -1, "...");
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 24/08/2016 1:44 PM, Ioi Lam wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 8/23/16 7:54 PM, David Holmes wrote:
>>>>>>>> On 24/08/2016 12:24 PM, Ioi Lam wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/23/16 6:51 PM, David Holmes wrote:
>>>>>>>>>> On 24/08/2016 11:32 AM, Ioi Lam wrote:
>>>>>>>>>>> On 8/23/16 6:08 PM, David Holmes wrote:
>>>>>>>>>>>> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>>>>>>>>>>>>> Hi Coleen, thanks for suggestion the simplification:
>>>>>>>>>>>>>
>>>>>>>>>>>>> void Symbol::decrement_refcount() {
>>>>>>>>>>>>> #ifdef ASSERT
>>>>>>>>>>>>>   if (_refcount == 0) {
>>>>>>>>>>>>>       print();
>>>>>>>>>>>>>       assert(false, "reference count underflow for symbol");
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>   Atomic::dec(&_refcount);
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> There's a race condition that won't detect the underflow. 
>>>>>>>>>>>>> E.g.,
>>>>>>>>>>>>> refcount
>>>>>>>>>>>>> is 1. Two threads comes in and decrement at the same time. We
>>>>>>>>>>>>> will end
>>>>>>>>>>>>> up with -1.
>>>>>>>>>>>>
>>>>>>>>>>>> So if we're going this path then you can get rid of the 
>>>>>>>>>>>> race by
>>>>>>>>>>>> using
>>>>>>>>>>>> Atomic::add(&_refcount, -1), which returns the updated 
>>>>>>>>>>>> value. If
>>>>>>>>>>>> you
>>>>>>>>>>>> get back -1 then assert.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The problem is we allow -1 to mean "permanent". Symbols in 
>>>>>>>>>>> the CDS
>>>>>>>>>>> archive are mapped read-only and have refcount==-1. If we 
>>>>>>>>>>> decrement
>>>>>>>>>>> them
>>>>>>>>>>> we get a SEGV.
>>>>>>>>>>
>>>>>>>>>> Unless you set them using the decrement_refcount function I 
>>>>>>>>>> don't see
>>>>>>>>>> how this is a problem. The assertion would only trigger if we
>>>>>>>>>> decrement from zero to -1.
>>>>>>>>>>
>>>>>>>>>>> Also, if we blindly decrement the refcount, we could end up 
>>>>>>>>>>> rolling
>>>>>>>>>>> back
>>>>>>>>>>> all the way from 0xffff to 0x0, at which point we may free the
>>>>>>>>>>> Symbol
>>>>>>>>>>> which may still be in use.
>>>>>>>>>>
>>>>>>>>>> ?? As soon as a bad decrement happens the assert is trigerred 
>>>>>>>>>> and the
>>>>>>>>>> VM is dead.
>>>>>>>>>
>>>>>>>>> The assert doesn't happen in product VM. Currently the product 
>>>>>>>>> VM will
>>>>>>>>> continue to work, and all symbols with underflown/overflown 
>>>>>>>>> refcounts
>>>>>>>>> will be considered as permanent.
>>>>>>>>
>>>>>>>> Okay - note this bug is about an assertion failure :)
>>>>>>>>
>>>>>>>> So there are two problems:
>>>>>>>>
>>>>>>>> 1. There is a race related to the assert that the 
>>>>>>>> Atomic::add(-1) can
>>>>>>>> fix.
>>>>>>>>
>>>>>>>> 2. There is the product-mode unconditional decrementing of the
>>>>>>>> refcount which will mark everything as "permanent". I haven't 
>>>>>>>> see any
>>>>>>>> suggestions to address that existing issue. One obvious 
>>>>>>>> suggestion is
>>>>>>>> to use a different value than -1 so it doesn't easily get hit by
>>>>>>>> unexpected underflow.
>>>>>>>>
>>>>>>>
>>>>>>> Here's an updated version that's an improvement over my original 
>>>>>>> patch:
>>>>>>>
>>>>>>> + A negative refcount means permanent symbols.
>>>>>>> + Only non-permanent symbols can be incremented/decremented
>>>>>>> + Underflow is detected by 0 -> -1 transition. Real underflows are
>>>>>>> always caught.
>>>>>>> + Theoretically there's still a race condition for false asserts:
>>>>>>>
>>>>>>> [1] thread A observes that refcount is 0x7fff, proceeds to 
>>>>>>> decrement it
>>>>>>> [2] but in the mean time, around 32768 threads come in. They all 
>>>>>>> observe
>>>>>>> that the refcount is non-negative, and then all proceed to 
>>>>>>> increment the
>>>>>>> refcount to 0x0.
>>>>>>> [3] thread A decrements the refcount and observes that refcount 
>>>>>>> is -1
>>>>>>> afterwards
>>>>>>>
>>>>>>> but I think this can be safely ignored :-)
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v02/ 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I like this version as it doesn't change the logic for the 
>>>>>>> product VM
>>>>>>> and only limits the possibility of false asserts.
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> The fundamental problem is we need a two-step operation and 
>>>>>>>>>>> we avoid
>>>>>>>>>>> proper synchronization (for performance/avoiding deadlock/etc).
>>>>>>>>>>> So we
>>>>>>>>>>> will always have a race condition somewhere, and we just 
>>>>>>>>>>> need to
>>>>>>>>>>> allow
>>>>>>>>>>> only the benign ones.
>>>>>>>>>>>
>>>>>>>>>>> - Ioi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>> However, it's not worse than before. The old version also 
>>>>>>>>>>>>> has a
>>>>>>>>>>>>> race
>>>>>>>>>>>>> condition:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     refcount is 0
>>>>>>>>>>>>>     thread A decrements
>>>>>>>>>>>>>     thread B increments
>>>>>>>>>>>>>     thread A checks for underflow
>>>>>>>>>>>>>
>>>>>>>>>>>>> the decrementing thread will read _refcount==0 at the end 
>>>>>>>>>>>>> so it
>>>>>>>>>>>>> won't
>>>>>>>>>>>>> detect the (transient) underflow.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the failure to detect underflow is fine, since this
>>>>>>>>>>>>> happens
>>>>>>>>>>>>> only
>>>>>>>>>>>>> with concurrent access. The kinds of underflow that we are
>>>>>>>>>>>>> interested
>>>>>>>>>>>>> usually can be caught in single-threaded situations.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> - Ioi
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This doesn't make sense for me and I have to go in gdb to
>>>>>>>>>>>>>> print out
>>>>>>>>>>>>>> what -16384 is.   It appears that this is trying to detect
>>>>>>>>>>>>>> that we
>>>>>>>>>>>>>> went below zero from zero, which is an error, but this isn't
>>>>>>>>>>>>>> clear at
>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems that
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    if (_refcount >= 0) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Should be > 0 and we should assert if this is ever zero 
>>>>>>>>>>>>>> instead,
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> allow anything negative to mean that this count has gone
>>>>>>>>>>>>>> immortal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kim thought it should use CAS rather than atomic 
>>>>>>>>>>>>>> increment and
>>>>>>>>>>>>>> decrement, but maybe that isn't necessary, especially 
>>>>>>>>>>>>>> since there
>>>>>>>>>>>>>> isn't a short version of cmpxchg.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>> Coleen
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/ 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The test was loading a lot of JCK classes into the same VM.
>>>>>>>>>>>>>>> Many of
>>>>>>>>>>>>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", 
>>>>>>>>>>>>>>> so the
>>>>>>>>>>>>>>> refcount (a signed short integer) of this Symbol would 
>>>>>>>>>>>>>>> run up
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> past 0x7fff.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The assert was caused by a race condition: the refcount 
>>>>>>>>>>>>>>> started
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>> a large (16-bit) positive value such as 0x7fff, one 
>>>>>>>>>>>>>>> thread is
>>>>>>>>>>>>>>> decrementing and several other threads are incrementing. 
>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>> refcount
>>>>>>>>>>>>>>> will end up being 0x8000 or slightly higher (limited to the
>>>>>>>>>>>>>>> number of
>>>>>>>>>>>>>>> concurrent threads that are running within a small 
>>>>>>>>>>>>>>> window of
>>>>>>>>>>>>>>> several
>>>>>>>>>>>>>>> instructions in the decrementing thread, so most likely it
>>>>>>>>>>>>>>> will be
>>>>>>>>>>>>>>> 0x800?).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As a result, the decrementing thread found that the 
>>>>>>>>>>>>>>> refecount is
>>>>>>>>>>>>>>> negative after the operation, and thought that an 
>>>>>>>>>>>>>>> underflow had
>>>>>>>>>>>>>>> happened.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The fix is to ignore any value that may appear in the 
>>>>>>>>>>>>>>> [0x8000 -
>>>>>>>>>>>>>>> 0xbfff] range and do not flag these as underflows (since 
>>>>>>>>>>>>>>> they
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> most likely overflows -- overflows are already handled by
>>>>>>>>>>>>>>> making the
>>>>>>>>>>>>>>> Symbol permanent).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> - Ioi
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>



More information about the hotspot-runtime-dev mailing list