RFR (XS) 8047212: fix race between ObjectMonitor alloc and verification code

Carsten Varming varming at gmail.com
Tue Oct 20 15:52:01 UTC 2015


Dear Dan,

See inline.

On Tue, Oct 20, 2015 at 7:29 AM, Daniel D. Daugherty <
daniel.daugherty at oracle.com> wrote:

> Carsten,
>
> It's been a long time! Thanks for the quick review...
>

It was a nice small change. Sorry, I am asking for a larger change.


> Replies embedded below...
>
> On 10/19/15, 10:09 PM, Carsten Varming wrote:
>
> Dear Dan,
>
> I thought acquire/release semantics mandated that the release should be
> paired with an acquire.
>
>
> Yes, src/share/vm/runtime/orderAccess.hpp is pretty clear about
> that. I modeled this fix after some other ObjectMonitor code
> that doesn't (yet) follow that mandate.
>
>
> In this case, the first line in verify_objmon_isinpool should be changed
> from "PaddedEnd<ObjectMonitor> * block = (PaddedEnd<ObjectMonitor>
> *)gBlockList" to use a load_acquire. Actually, it seems like there are a
> lot of reads of gBlockList that should be replaced with a load_acquire in
> synchronizer.cpp.
>
>
> Yes, that would be a good change and cleanup. The really annoying
> part is that the current fix seems to have resolved the bug...
>
> Current numbers: 700K iterations of 4 fastdebug instances in parallel
> in 4.5 days. I typically saw one failure per day in a three day run...
>

I suspect the data dependency between "block = gBlockList" in the top of
verify_objmon_isinpool and "block = block->FreeNext" at the bottom prevents
the compiler from re-ordering the loads. Hence the assembly will be
correct. Likewise the data dependency prevents the cpu from re-ordering the
loads. Hence the program doesn't crash. Nevertheless, I encourage you to
use the right acquire/release semantics.


>
>
> Otherwise the change looks great.
>
>
> Thanks!
>

You are welcome.

Carsten


> Dan
>
>
>
> Carsten
> (not a reviewer).
>
> On Mon, Oct 19, 2015 at 8:02 PM, Daniel D. Daugherty <
> daniel.daugherty at oracle.com> wrote:
>
>> Greetings,
>>
>> I have a fix for a long standing race between the lock-free ObjectMonitor
>> verification code and the normal (locked) ObjectMonitor block allocation
>> code path. For this fix, I would like at least a Runtime team reviewer
>> and a Serviceability team reviewer. Thanks!
>>
>> JDK-8047212 runtime/ParallelClassLoading/bootstrap/random/inner-complex
>>             assert(ObjectSynchronizer::verify_objmon_isinpool(inf))
>> failed:
>>             monitor is invalid
>> https://bugs.openjdk.java.net/browse/JDK-8047212
>>
>> Webrev URL:
>> http://cr.openjdk.java.net/~dcubed/8047212-webrev/0-jdk9-hs-rt/
>>
>> Testing: Aurora Adhoc RT-SVC nightly batch
>>          4 inner-complex fastdebug parallel runs for 4+ days and
>>            600K iterations without seeing this failure; the experiment
>>            is still running; final results to be reported at the end
>>            of the review cycle
>>          JPRT -testset hotspot
>>
>> This fix:
>>
>> - makes ObjectMonitor::gBlockList volatile
>> - uses "OrderAccess::release_store_ptr(&gBlockList, temp)" to
>>   make sure the new block updates _happen before_ gBlockList is
>>   changed to refer to the new block
>> - add SA support for a "static pointer volatile" field like:
>>
>>     static ObjectMonitor * volatile gBlockList;
>>
>> See the following link for a nice description of what "volatile"
>> means in the different positions on a variable/parameter decl line:
>>
>>
>> http://www.embedded.com/electronics-blogs/beginner-s-corner/4023801/Introduction-to-the-Volatile-Keyword
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>
>>
>


More information about the hotspot-runtime-dev mailing list