RFR (XS) 8047212: fix race between ObjectMonitor alloc and verification code
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Oct 21 03:30:38 UTC 2015
On 10/20/15, 9:24 PM, David Holmes wrote:
> On 21/10/2015 12:51 PM, Daniel D. Daugherty wrote:
>> Thanks for the quick review!
>>
>>
>> On 10/20/15, 8:34 PM, David Holmes wrote:
>>> On 21/10/2015 10:43 AM, serguei.spitsyn at oracle.com wrote:
>>>> Dan,
>>>>
>>>> Nice finding!
>>>>
>>>> The fixes look good to me.
>>>> The changes in the oops_do() and deflate_idle_monitors() are not
>>>> necessary
>>>> as these functions are called in VM_Operation's.
>>>> But some unification with OrderAccess::load_ptr_acquire() does not
>>>> harm
>>>> either.
>>>
>>> I was thinking on the same lines: that reading the variable at a
>>> safepoint should already be safe, but that consistently using
>>> load_acquire_ptr is harmless and stylistically a 'Good Thing".
>>
>> Good. We're on the same page here...
>>
>>> The other thought I had was perhaps making the verification routines
>>> (which are non-product) use the lock as well. :)
>>
>> If the verification routine had used the lock, then we wouldn't
>> have spotted this unsafe use of the lock-free algorithm in this
>> context. The lock-free calls to ObjectSynchronizer::monitors_iterate()
>> would simply miss everything other than the new ObjectMonitor block
>> on occasion; not an easy thing to diagnose unless you are looking
>> for something specific on the list that you know should be there...
>
> AFAICS monitors_iterate() is either called at a safepoint or else with
> the lock held. Exactly which call to monitors_iterate exposed this
> problem?
I could swear I ran across a monitors_iterate() call from
JVM/TI that didn't go to a safepoint because the target
thread was suspended... I have a vague memory that JVM/TI
uses a safepoint-op when the target thread is not suspended
and goes direct when the target thread is suspended...
Dan
>
>>
>>> Aside: did you check that all the other global monitor variables are
>>> accessed correctly - ie either always with the lock held or using
>>> OrderAccess operations if lock-free?
>>
>> Nothing more than a cursory look. I have a deflate_idle_monitors bug
>> on my plate also so I'll be looking at this code even more...
>
> Ok.
>
> Thanks,
> David
>
>>
>>> I have still have the query about the vmStructs changes but otherwise
>>> thumbs up.
>>
>> My reply to that query crossed in the ether with this review...
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 10/20/15 14:15, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I've updated the fix based on feedback from Carsten V and David H.
>>>>>
>>>>> Webrev URL:
>>>>> http://cr.openjdk.java.net/~dcubed/8047212-webrev/1-jdk9-hs-rt/
>>>>>
>>>>> Changes relative to round 0:
>>>>>
>>>>> - only src/share/vm/runtime/synchronizer.cpp has changed
>>>>> - reads of gBlockList now use OrderAccess::load_ptr_acquire()
>>>>>
>>>>> code style cleanups:
>>>>>
>>>>> - only cleaned up the functions that I touched to make the
>>>>> OrderAccess::load_ptr_acquire() changes
>>>>> - changed implied booleans into real boolean expressions
>>>>> - moved some locals to narrower context
>>>>> - added/removed some blank lines
>>>>> - made casts consistent with the majority style in this file
>>>>>
>>>>> I'm repeating all of the same testing that I did for round 0. The
>>>>> round 1 bits have not yet made it through JPRT-west, but the jobs
>>>>> are mostly done.
>>>>>
>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 10/19/15, 9:02 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have a fix for a long standing race between the lock-free
>>>>>> ObjectMonitor
>>>>>> verification code and the normal (locked) ObjectMonitor block
>>>>>> allocation
>>>>>> code path. For this fix, I would like at least a Runtime team
>>>>>> reviewer
>>>>>> and a Serviceability team reviewer. Thanks!
>>>>>>
>>>>>> JDK-8047212
>>>>>> runtime/ParallelClassLoading/bootstrap/random/inner-complex
>>>>>> assert(ObjectSynchronizer::verify_objmon_isinpool(inf)) failed:
>>>>>> monitor is invalid
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8047212
>>>>>>
>>>>>> Webrev URL:
>>>>>> http://cr.openjdk.java.net/~dcubed/8047212-webrev/0-jdk9-hs-rt/
>>>>>>
>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch
>>>>>> 4 inner-complex fastdebug parallel runs for 4+ days and
>>>>>> 600K iterations without seeing this failure; the
>>>>>> experiment
>>>>>> is still running; final results to be reported at the end
>>>>>> of the review cycle
>>>>>> JPRT -testset hotspot
>>>>>>
>>>>>> This fix:
>>>>>>
>>>>>> - makes ObjectMonitor::gBlockList volatile
>>>>>> - uses "OrderAccess::release_store_ptr(&gBlockList, temp)" to
>>>>>> make sure the new block updates _happen before_ gBlockList is
>>>>>> changed to refer to the new block
>>>>>> - add SA support for a "static pointer volatile" field like:
>>>>>>
>>>>>> static ObjectMonitor * volatile gBlockList;
>>>>>>
>>>>>> See the following link for a nice description of what "volatile"
>>>>>> means in the different positions on a variable/parameter decl line:
>>>>>>
>>>>>> http://www.embedded.com/electronics-blogs/beginner-s-corner/4023801/Introduction-to-the-Volatile-Keyword
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>
More information about the serviceability-dev
mailing list