RFR (XS) 8047212: fix race between ObjectMonitor alloc and verification code
David Holmes
david.holmes at oracle.com
Wed Oct 21 03:33:49 UTC 2015
On 21/10/2015 1:30 PM, Daniel D. Daugherty wrote:
> On 10/20/15, 9:24 PM, David Holmes wrote:
>> On 21/10/2015 12:51 PM, Daniel D. Daugherty wrote:
>>> Thanks for the quick review!
>>>
>>>
>>> On 10/20/15, 8:34 PM, David Holmes wrote:
>>>> On 21/10/2015 10:43 AM, serguei.spitsyn at oracle.com wrote:
>>>>> Dan,
>>>>>
>>>>> Nice finding!
>>>>>
>>>>> The fixes look good to me.
>>>>> The changes in the oops_do() and deflate_idle_monitors() are not
>>>>> necessary
>>>>> as these functions are called in VM_Operation's.
>>>>> But some unification with OrderAccess::load_ptr_acquire() does not
>>>>> harm
>>>>> either.
>>>>
>>>> I was thinking on the same lines: that reading the variable at a
>>>> safepoint should already be safe, but that consistently using
>>>> load_acquire_ptr is harmless and stylistically a 'Good Thing".
>>>
>>> Good. We're on the same page here...
>>>
>>>> The other thought I had was perhaps making the verification routines
>>>> (which are non-product) use the lock as well. :)
>>>
>>> If the verification routine had used the lock, then we wouldn't
>>> have spotted this unsafe use of the lock-free algorithm in this
>>> context. The lock-free calls to ObjectSynchronizer::monitors_iterate()
>>> would simply miss everything other than the new ObjectMonitor block
>>> on occasion; not an easy thing to diagnose unless you are looking
>>> for something specific on the list that you know should be there...
>>
>> AFAICS monitors_iterate() is either called at a safepoint or else with
>> the lock held. Exactly which call to monitors_iterate exposed this
>> problem?
>
> I could swear I ran across a monitors_iterate() call from
> JVM/TI that didn't go to a safepoint because the target
> thread was suspended... I have a vague memory that JVM/TI
> uses a safepoint-op when the target thread is not suspended
> and goes direct when the target thread is suspended...
My bad - you are right, it skips the safepoint when the target is suspended.
David
> Dan
>
>
>>
>>>
>>>> Aside: did you check that all the other global monitor variables are
>>>> accessed correctly - ie either always with the lock held or using
>>>> OrderAccess operations if lock-free?
>>>
>>> Nothing more than a cursory look. I have a deflate_idle_monitors bug
>>> on my plate also so I'll be looking at this code even more...
>>
>> Ok.
>>
>> Thanks,
>> David
>>
>>>
>>>> I have still have the query about the vmStructs changes but otherwise
>>>> thumbs up.
>>>
>>> My reply to that query crossed in the ether with this review...
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 10/20/15 14:15, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I've updated the fix based on feedback from Carsten V and David H.
>>>>>>
>>>>>> Webrev URL:
>>>>>> http://cr.openjdk.java.net/~dcubed/8047212-webrev/1-jdk9-hs-rt/
>>>>>>
>>>>>> Changes relative to round 0:
>>>>>>
>>>>>> - only src/share/vm/runtime/synchronizer.cpp has changed
>>>>>> - reads of gBlockList now use OrderAccess::load_ptr_acquire()
>>>>>>
>>>>>> code style cleanups:
>>>>>>
>>>>>> - only cleaned up the functions that I touched to make the
>>>>>> OrderAccess::load_ptr_acquire() changes
>>>>>> - changed implied booleans into real boolean expressions
>>>>>> - moved some locals to narrower context
>>>>>> - added/removed some blank lines
>>>>>> - made casts consistent with the majority style in this file
>>>>>>
>>>>>> I'm repeating all of the same testing that I did for round 0. The
>>>>>> round 1 bits have not yet made it through JPRT-west, but the jobs
>>>>>> are mostly done.
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/19/15, 9:02 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a fix for a long standing race between the lock-free
>>>>>>> ObjectMonitor
>>>>>>> verification code and the normal (locked) ObjectMonitor block
>>>>>>> allocation
>>>>>>> code path. For this fix, I would like at least a Runtime team
>>>>>>> reviewer
>>>>>>> and a Serviceability team reviewer. Thanks!
>>>>>>>
>>>>>>> JDK-8047212
>>>>>>> runtime/ParallelClassLoading/bootstrap/random/inner-complex
>>>>>>> assert(ObjectSynchronizer::verify_objmon_isinpool(inf)) failed:
>>>>>>> monitor is invalid
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8047212
>>>>>>>
>>>>>>> Webrev URL:
>>>>>>> http://cr.openjdk.java.net/~dcubed/8047212-webrev/0-jdk9-hs-rt/
>>>>>>>
>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch
>>>>>>> 4 inner-complex fastdebug parallel runs for 4+ days and
>>>>>>> 600K iterations without seeing this failure; the
>>>>>>> experiment
>>>>>>> is still running; final results to be reported at the end
>>>>>>> of the review cycle
>>>>>>> JPRT -testset hotspot
>>>>>>>
>>>>>>> This fix:
>>>>>>>
>>>>>>> - makes ObjectMonitor::gBlockList volatile
>>>>>>> - uses "OrderAccess::release_store_ptr(&gBlockList, temp)" to
>>>>>>> make sure the new block updates _happen before_ gBlockList is
>>>>>>> changed to refer to the new block
>>>>>>> - add SA support for a "static pointer volatile" field like:
>>>>>>>
>>>>>>> static ObjectMonitor * volatile gBlockList;
>>>>>>>
>>>>>>> See the following link for a nice description of what "volatile"
>>>>>>> means in the different positions on a variable/parameter decl line:
>>>>>>>
>>>>>>> http://www.embedded.com/electronics-blogs/beginner-s-corner/4023801/Introduction-to-the-Volatile-Keyword
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>
More information about the hotspot-runtime-dev
mailing list