RFR (XS) 8161280 - assert failed: reference count underflow for symbol
Ioi Lam
ioi.lam at oracle.com
Wed Aug 24 07:01:29 UTC 2016
Hi David,
Here's an updated version that added Atomic::add(jshort*, jshort) as you
suggested.
To appease the "unused" warnings, I just added (void)new_value.
http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v03/
I am running RBT with "--test
hotspot/test/:hotspot_all,vm.parallel_class_loading,vm.runtime.testlist"
to make sure everything works.
Thanks
- Ioi
On 8/23/16 10:12 PM, David Holmes wrote:
> Hi Ioi,
>
> Sorry I have a problem here. I just realized there is no
> Atomic::add(jshort*, jshort). But I can't accept changing the
> Atomic::inc/dec API only for the jshort case (and why return jint ??).
> So either the whole Atomic inc/dec API needs updating across the board
> (not a small task) or else we need to introduce Atomic::add(jshort*,
> jshort). The latter seems a small task - the existing inc/dec
> implementations can call the new add version.
>
> Also in a product build might the "new_value" variable trigger an
> "unused" warning? If so an ugly option would be:
>
> DEBUG_ONLY(jshort new_value =) Atomic::dec(...);
> assert(new_value != -1, "...");
>
> Thanks,
> David
>
> On 24/08/2016 1:44 PM, Ioi Lam wrote:
>>
>>
>> On 8/23/16 7:54 PM, David Holmes wrote:
>>> On 24/08/2016 12:24 PM, Ioi Lam wrote:
>>>>
>>>>
>>>> On 8/23/16 6:51 PM, David Holmes wrote:
>>>>> On 24/08/2016 11:32 AM, Ioi Lam wrote:
>>>>>> On 8/23/16 6:08 PM, David Holmes wrote:
>>>>>>> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>>>>>>>> Hi Coleen, thanks for suggestion the simplification:
>>>>>>>>
>>>>>>>> void Symbol::decrement_refcount() {
>>>>>>>> #ifdef ASSERT
>>>>>>>> if (_refcount == 0) {
>>>>>>>> print();
>>>>>>>> assert(false, "reference count underflow for symbol");
>>>>>>>> }
>>>>>>>> }
>>>>>>>> #endif
>>>>>>>> Atomic::dec(&_refcount);
>>>>>>>> }
>>>>>>>>
>>>>>>>> There's a race condition that won't detect the underflow. E.g.,
>>>>>>>> refcount
>>>>>>>> is 1. Two threads comes in and decrement at the same time. We
>>>>>>>> will end
>>>>>>>> up with -1.
>>>>>>>
>>>>>>> So if we're going this path then you can get rid of the race by
>>>>>>> using
>>>>>>> Atomic::add(&_refcount, -1), which returns the updated value. If
>>>>>>> you
>>>>>>> get back -1 then assert.
>>>>>>>
>>>>>>
>>>>>> The problem is we allow -1 to mean "permanent". Symbols in the CDS
>>>>>> archive are mapped read-only and have refcount==-1. If we decrement
>>>>>> them
>>>>>> we get a SEGV.
>>>>>
>>>>> Unless you set them using the decrement_refcount function I don't see
>>>>> how this is a problem. The assertion would only trigger if we
>>>>> decrement from zero to -1.
>>>>>
>>>>>> Also, if we blindly decrement the refcount, we could end up rolling
>>>>>> back
>>>>>> all the way from 0xffff to 0x0, at which point we may free the
>>>>>> Symbol
>>>>>> which may still be in use.
>>>>>
>>>>> ?? As soon as a bad decrement happens the assert is trigerred and the
>>>>> VM is dead.
>>>>
>>>> The assert doesn't happen in product VM. Currently the product VM will
>>>> continue to work, and all symbols with underflown/overflown refcounts
>>>> will be considered as permanent.
>>>
>>> Okay - note this bug is about an assertion failure :)
>>>
>>> So there are two problems:
>>>
>>> 1. There is a race related to the assert that the Atomic::add(-1) can
>>> fix.
>>>
>>> 2. There is the product-mode unconditional decrementing of the
>>> refcount which will mark everything as "permanent". I haven't see any
>>> suggestions to address that existing issue. One obvious suggestion is
>>> to use a different value than -1 so it doesn't easily get hit by
>>> unexpected underflow.
>>>
>>
>> Here's an updated version that's an improvement over my original patch:
>>
>> + A negative refcount means permanent symbols.
>> + Only non-permanent symbols can be incremented/decremented
>> + Underflow is detected by 0 -> -1 transition. Real underflows are
>> always caught.
>> + Theoretically there's still a race condition for false asserts:
>>
>> [1] thread A observes that refcount is 0x7fff, proceeds to decrement it
>> [2] but in the mean time, around 32768 threads come in. They all observe
>> that the refcount is non-negative, and then all proceed to increment the
>> refcount to 0x0.
>> [3] thread A decrements the refcount and observes that refcount is -1
>> afterwards
>>
>> but I think this can be safely ignored :-)
>>
>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v02/
>>
>>
>>
>> I like this version as it doesn't change the logic for the product VM
>> and only limits the possibility of false asserts.
>>
>> Thanks
>> - Ioi
>>
>>> David
>>>
>>>> - Ioi
>>>>
>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> The fundamental problem is we need a two-step operation and we avoid
>>>>>> proper synchronization (for performance/avoiding deadlock/etc).
>>>>>> So we
>>>>>> will always have a race condition somewhere, and we just need to
>>>>>> allow
>>>>>> only the benign ones.
>>>>>>
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>>
>>>>>>>> However, it's not worse than before. The old version also has a
>>>>>>>> race
>>>>>>>> condition:
>>>>>>>>
>>>>>>>> refcount is 0
>>>>>>>> thread A decrements
>>>>>>>> thread B increments
>>>>>>>> thread A checks for underflow
>>>>>>>>
>>>>>>>> the decrementing thread will read _refcount==0 at the end so it
>>>>>>>> won't
>>>>>>>> detect the (transient) underflow.
>>>>>>>>
>>>>>>>> I think the failure to detect underflow is fine, since this
>>>>>>>> happens
>>>>>>>> only
>>>>>>>> with concurrent access. The kinds of underflow that we are
>>>>>>>> interested
>>>>>>>> usually can be caught in single-threaded situations.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>>>>>>>>
>>>>>>>>> This doesn't make sense for me and I have to go in gdb to
>>>>>>>>> print out
>>>>>>>>> what -16384 is. It appears that this is trying to detect
>>>>>>>>> that we
>>>>>>>>> went below zero from zero, which is an error, but this isn't
>>>>>>>>> clear at
>>>>>>>>> all.
>>>>>>>>>
>>>>>>>>> It seems that
>>>>>>>>>
>>>>>>>>> if (_refcount >= 0) {
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Should be > 0 and we should assert if this is ever zero instead,
>>>>>>>>> and
>>>>>>>>> allow anything negative to mean that this count has gone
>>>>>>>>> immortal.
>>>>>>>>>
>>>>>>>>> Kim thought it should use CAS rather than atomic increment and
>>>>>>>>> decrement, but maybe that isn't necessary, especially since there
>>>>>>>>> isn't a short version of cmpxchg.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Summary:
>>>>>>>>>>
>>>>>>>>>> The test was loading a lot of JCK classes into the same VM.
>>>>>>>>>> Many of
>>>>>>>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", so the
>>>>>>>>>> refcount (a signed short integer) of this Symbol would run up
>>>>>>>>>> and
>>>>>>>>>> past 0x7fff.
>>>>>>>>>>
>>>>>>>>>> The assert was caused by a race condition: the refcount started
>>>>>>>>>> with
>>>>>>>>>> a large (16-bit) positive value such as 0x7fff, one thread is
>>>>>>>>>> decrementing and several other threads are incrementing. The
>>>>>>>>>> refcount
>>>>>>>>>> will end up being 0x8000 or slightly higher (limited to the
>>>>>>>>>> number of
>>>>>>>>>> concurrent threads that are running within a small window of
>>>>>>>>>> several
>>>>>>>>>> instructions in the decrementing thread, so most likely it
>>>>>>>>>> will be
>>>>>>>>>> 0x800?).
>>>>>>>>>>
>>>>>>>>>> As a result, the decrementing thread found that the refecount is
>>>>>>>>>> negative after the operation, and thought that an underflow had
>>>>>>>>>> happened.
>>>>>>>>>>
>>>>>>>>>> The fix is to ignore any value that may appear in the [0x8000 -
>>>>>>>>>> 0xbfff] range and do not flag these as underflows (since they
>>>>>>>>>> are
>>>>>>>>>> most likely overflows -- overflows are already handled by
>>>>>>>>>> making the
>>>>>>>>>> Symbol permanent).
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> - Ioi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
More information about the hotspot-runtime-dev
mailing list