RFR (XS) 8161280 - assert failed: reference count underflow for symbol

Coleen Phillimore coleen.phillimore at oracle.com
Wed Aug 24 12:14:46 UTC 2016


I still don't want us to observe a _refcount == 0 - this is always an error.

+ if (_refcount >= 0) { // not a permanent symbol


Can this be changed to something like:

+ volatile int ref = _refcount;
+ assert(ref != 0, "underflow");

+ if (ref > 0) { // not a permanent symbol


I really like the atomic::add checking return value.

I'm working on a change for later that will eagerly delete symbols whose 
refcounts go to zero.

Thanks,
Coleen

On 8/24/16 4:28 AM, Ioi Lam wrote:
>
>
> On 8/24/16 12:14 AM, David Holmes wrote:
>> Hi Ioi,
>>
>> On 24/08/2016 5:01 PM, Ioi Lam wrote:
>>> Hi David,
>>>
>>> Here's an updated version that added Atomic::add(jshort*, jshort) as 
>>> you
>>> suggested.
>>
>> Thanks. Looks good.
>>
>>> To appease the "unused" warnings, I just added (void)new_value.
>>
>> I think I prefer the DEBUG_ONLY version. :)
>>
>
> I am not a big fan of ending DEBUG_ONLY with a "=", so I'll keep the 
> code as is :-)
>
> - Ioi
>
>
>> Cheers,
>> David
>>
>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v03/ 
>>>
>>>
>>>
>>> I am running RBT with "--test
>>> hotspot/test/:hotspot_all,vm.parallel_class_loading,vm.runtime.testlist" 
>>>
>>> to make sure everything works.
>>>
>>> Thanks
>>> - Ioi
>>>
>>> On 8/23/16 10:12 PM, David Holmes wrote:
>>>> Hi Ioi,
>>>>
>>>> Sorry I have a problem here. I just realized there is no
>>>> Atomic::add(jshort*, jshort). But I can't accept changing the
>>>> Atomic::inc/dec API only for the jshort case (and why return jint ??).
>>>> So either the whole Atomic inc/dec API needs updating across the board
>>>> (not a small task) or else we need to introduce Atomic::add(jshort*,
>>>> jshort). The latter seems a small task - the existing inc/dec
>>>> implementations can call the new add version.
>>>>
>>>> Also in a product build might the "new_value" variable trigger an
>>>> "unused" warning? If so an ugly option would be:
>>>>
>>>> DEBUG_ONLY(jshort new_value =) Atomic::dec(...);
>>>> assert(new_value != -1, "...");
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 24/08/2016 1:44 PM, Ioi Lam wrote:
>>>>>
>>>>>
>>>>> On 8/23/16 7:54 PM, David Holmes wrote:
>>>>>> On 24/08/2016 12:24 PM, Ioi Lam wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 8/23/16 6:51 PM, David Holmes wrote:
>>>>>>>> On 24/08/2016 11:32 AM, Ioi Lam wrote:
>>>>>>>>> On 8/23/16 6:08 PM, David Holmes wrote:
>>>>>>>>>> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>>>>>>>>>>> Hi Coleen, thanks for suggestion the simplification:
>>>>>>>>>>>
>>>>>>>>>>> void Symbol::decrement_refcount() {
>>>>>>>>>>> #ifdef ASSERT
>>>>>>>>>>>   if (_refcount == 0) {
>>>>>>>>>>>       print();
>>>>>>>>>>>       assert(false, "reference count underflow for symbol");
>>>>>>>>>>>     }
>>>>>>>>>>>   }
>>>>>>>>>>> #endif
>>>>>>>>>>>   Atomic::dec(&_refcount);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> There's a race condition that won't detect the underflow. E.g.,
>>>>>>>>>>> refcount
>>>>>>>>>>> is 1. Two threads comes in and decrement at the same time. We
>>>>>>>>>>> will end
>>>>>>>>>>> up with -1.
>>>>>>>>>>
>>>>>>>>>> So if we're going this path then you can get rid of the race by
>>>>>>>>>> using
>>>>>>>>>> Atomic::add(&_refcount, -1), which returns the updated value. If
>>>>>>>>>> you
>>>>>>>>>> get back -1 then assert.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The problem is we allow -1 to mean "permanent". Symbols in the 
>>>>>>>>> CDS
>>>>>>>>> archive are mapped read-only and have refcount==-1. If we 
>>>>>>>>> decrement
>>>>>>>>> them
>>>>>>>>> we get a SEGV.
>>>>>>>>
>>>>>>>> Unless you set them using the decrement_refcount function I 
>>>>>>>> don't see
>>>>>>>> how this is a problem. The assertion would only trigger if we
>>>>>>>> decrement from zero to -1.
>>>>>>>>
>>>>>>>>> Also, if we blindly decrement the refcount, we could end up 
>>>>>>>>> rolling
>>>>>>>>> back
>>>>>>>>> all the way from 0xffff to 0x0, at which point we may free the
>>>>>>>>> Symbol
>>>>>>>>> which may still be in use.
>>>>>>>>
>>>>>>>> ?? As soon as a bad decrement happens the assert is trigerred 
>>>>>>>> and the
>>>>>>>> VM is dead.
>>>>>>>
>>>>>>> The assert doesn't happen in product VM. Currently the product 
>>>>>>> VM will
>>>>>>> continue to work, and all symbols with underflown/overflown 
>>>>>>> refcounts
>>>>>>> will be considered as permanent.
>>>>>>
>>>>>> Okay - note this bug is about an assertion failure :)
>>>>>>
>>>>>> So there are two problems:
>>>>>>
>>>>>> 1. There is a race related to the assert that the Atomic::add(-1) 
>>>>>> can
>>>>>> fix.
>>>>>>
>>>>>> 2. There is the product-mode unconditional decrementing of the
>>>>>> refcount which will mark everything as "permanent". I haven't see 
>>>>>> any
>>>>>> suggestions to address that existing issue. One obvious 
>>>>>> suggestion is
>>>>>> to use a different value than -1 so it doesn't easily get hit by
>>>>>> unexpected underflow.
>>>>>>
>>>>>
>>>>> Here's an updated version that's an improvement over my original 
>>>>> patch:
>>>>>
>>>>> + A negative refcount means permanent symbols.
>>>>> + Only non-permanent symbols can be incremented/decremented
>>>>> + Underflow is detected by 0 -> -1 transition. Real underflows are
>>>>> always caught.
>>>>> + Theoretically there's still a race condition for false asserts:
>>>>>
>>>>> [1] thread A observes that refcount is 0x7fff, proceeds to 
>>>>> decrement it
>>>>> [2] but in the mean time, around 32768 threads come in. They all 
>>>>> observe
>>>>> that the refcount is non-negative, and then all proceed to 
>>>>> increment the
>>>>> refcount to 0x0.
>>>>> [3] thread A decrements the refcount and observes that refcount is -1
>>>>> afterwards
>>>>>
>>>>> but I think this can be safely ignored :-)
>>>>>
>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v02/ 
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I like this version as it doesn't change the logic for the product VM
>>>>> and only limits the possibility of false asserts.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>> David
>>>>>>
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> The fundamental problem is we need a two-step operation and we 
>>>>>>>>> avoid
>>>>>>>>> proper synchronization (for performance/avoiding deadlock/etc).
>>>>>>>>> So we
>>>>>>>>> will always have a race condition somewhere, and we just need to
>>>>>>>>> allow
>>>>>>>>> only the benign ones.
>>>>>>>>>
>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> However, it's not worse than before. The old version also has a
>>>>>>>>>>> race
>>>>>>>>>>> condition:
>>>>>>>>>>>
>>>>>>>>>>>     refcount is 0
>>>>>>>>>>>     thread A decrements
>>>>>>>>>>>     thread B increments
>>>>>>>>>>>     thread A checks for underflow
>>>>>>>>>>>
>>>>>>>>>>> the decrementing thread will read _refcount==0 at the end so it
>>>>>>>>>>> won't
>>>>>>>>>>> detect the (transient) underflow.
>>>>>>>>>>>
>>>>>>>>>>> I think the failure to detect underflow is fine, since this
>>>>>>>>>>> happens
>>>>>>>>>>> only
>>>>>>>>>>> with concurrent access. The kinds of underflow that we are
>>>>>>>>>>> interested
>>>>>>>>>>> usually can be caught in single-threaded situations.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> - Ioi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> This doesn't make sense for me and I have to go in gdb to
>>>>>>>>>>>> print out
>>>>>>>>>>>> what -16384 is.   It appears that this is trying to detect
>>>>>>>>>>>> that we
>>>>>>>>>>>> went below zero from zero, which is an error, but this isn't
>>>>>>>>>>>> clear at
>>>>>>>>>>>> all.
>>>>>>>>>>>>
>>>>>>>>>>>> It seems that
>>>>>>>>>>>>
>>>>>>>>>>>>    if (_refcount >= 0) {
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Should be > 0 and we should assert if this is ever zero 
>>>>>>>>>>>> instead,
>>>>>>>>>>>> and
>>>>>>>>>>>> allow anything negative to mean that this count has gone
>>>>>>>>>>>> immortal.
>>>>>>>>>>>>
>>>>>>>>>>>> Kim thought it should use CAS rather than atomic increment and
>>>>>>>>>>>> decrement, but maybe that isn't necessary, especially since 
>>>>>>>>>>>> there
>>>>>>>>>>>> isn't a short version of cmpxchg.
>>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>> Coleen
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/ 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The test was loading a lot of JCK classes into the same VM.
>>>>>>>>>>>>> Many of
>>>>>>>>>>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", 
>>>>>>>>>>>>> so the
>>>>>>>>>>>>> refcount (a signed short integer) of this Symbol would run up
>>>>>>>>>>>>> and
>>>>>>>>>>>>> past 0x7fff.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The assert was caused by a race condition: the refcount 
>>>>>>>>>>>>> started
>>>>>>>>>>>>> with
>>>>>>>>>>>>> a large (16-bit) positive value such as 0x7fff, one thread is
>>>>>>>>>>>>> decrementing and several other threads are incrementing. The
>>>>>>>>>>>>> refcount
>>>>>>>>>>>>> will end up being 0x8000 or slightly higher (limited to the
>>>>>>>>>>>>> number of
>>>>>>>>>>>>> concurrent threads that are running within a small window of
>>>>>>>>>>>>> several
>>>>>>>>>>>>> instructions in the decrementing thread, so most likely it
>>>>>>>>>>>>> will be
>>>>>>>>>>>>> 0x800?).
>>>>>>>>>>>>>
>>>>>>>>>>>>> As a result, the decrementing thread found that the 
>>>>>>>>>>>>> refecount is
>>>>>>>>>>>>> negative after the operation, and thought that an 
>>>>>>>>>>>>> underflow had
>>>>>>>>>>>>> happened.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The fix is to ignore any value that may appear in the 
>>>>>>>>>>>>> [0x8000 -
>>>>>>>>>>>>> 0xbfff] range and do not flag these as underflows (since they
>>>>>>>>>>>>> are
>>>>>>>>>>>>> most likely overflows -- overflows are already handled by
>>>>>>>>>>>>> making the
>>>>>>>>>>>>> Symbol permanent).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> - Ioi
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>



More information about the hotspot-runtime-dev mailing list