Code Review request for bug fixes found by the Contended Locking project

Thu Jan 17 19:49:41 PST 2013

Hi Dan,

Yup, those are the patterns I was referring to.

So to make sure I understand correctly, you're saying that some pthreads
implementations either (a) allow code motion by compiler that moves other
code above the unlock() call or (b) generate assembly instructions that
don't entail a full fence and thus allow the CPU to reorder surrounding
memory accesses.  I'm a bit unclear on what it means, at compiler or
CPU/assembly level, for pthreads implementation to defer memory fences
until a later point in time - is there an example? Is this a theoretical
issue that you guys spotted or did someone come across a bug that was due
to this?

Also, what performance effect, if any, will this have for platforms using a
strong pthreads impl? Two consecutive full barriers will be executed right
after each other (don't know whether compiler will pick up on this and
coalesce them into 1).  Maybe the 2nd one is "cheap" since the first one
will take the hit of stalling the pipeline and emptying store buffers.

Finally, what supported architectures/platforms could have this problem? I
know Alpha is funky, but I don't think hotspot supports it.  What's left,
PowerPC or something?

Sorry to sidetrack this thread a bit. :)

Thanks for your response

Sent from my phone
On Jan 17, 2013 10:13 PM, "Daniel D. Daugherty" <daniel.daugherty at oracle.com>
wrote:

>  Vitaly,
>
> Thanks for the question. Reply embedded below.
>
>
>  On 1/17/13 4:16 PM, Vitaly Davidovich wrote:
>
> Hi Dan,
>
> Just curious - what's the reason for adding OrderAccess::fence() after the
> pthread_mutex_unlock() calls? Is this working around some libc issues or
> something? I'd expect those unlocks to provide the ordering and visibility
> guarantee.
>
>
> I presume you are asking about one of these two patterns:
>
> src/os/linux/vm/os_linux.cpp:
>
> 4979 void os::PlatformEvent::park() {
> <snip>
> 5004     _Event = 0 ;
> 5005      status = pthread_mutex_unlock(_mutex);
> 5006      assert_status(status == 0, status, "mutex_unlock");
> 5007     // Paranoia to ensure our locked and lock-free paths interact
> 5008     // correctly with each other.
> 5009     OrderAccess::fence();
>
> or
>
> 5194 void Parker::park(bool isAbsolute, jlong time) {
> <snip>
> 5240     _counter = 0;
> 5241     status = pthread_mutex_unlock(_mutex);
> 5242     assert (status == 0, "invariant") ;
> 5243     // Paranoia to ensure our locked and lock-free paths interact
> 5244     // correctly with each other and Java-level accesses.
> 5245     OrderAccess::fence();
> 5246     return;
>
>
> The short answer is that we use both locked and lock-free paths
> in os::PlatformEvent::park(), os::PlatformEvent::unpark() and in
> Parker::park(). Those optimizations on our part may not play well
> with optimizations done in a looser pthreads implementation. This
> is _not_ a bug in the pthreads implementation. pthreads is allowed
> to presume that everyone is playing nice and using locks; this
> allows pthreads to make some optimizations of their own like
> deferring memory syncing to certain points.
>
> The slightly longer answer is:
>
> These changes were made as part of the fix for the following bug:
>
>     8004902 correctness fixes motivated by contended locking work (6607129)
>
> In the last paragraph of the triangular-race.txt attachment of 8004902,
> I wrote the following about these fence() calls:
>
> The addition of fence() calls after the second and third settings of
> "_counter = 0" was done in the original fix for 6822370 for similar
> reasons: worries about weaker memory model semantics. Those two settings
> of "_counter = 0" are done while the mutex lock is held and they
> are followed by unlock(_mutex) operations. In all of the transaction
> diagrams, I included a "// _counter flushes" comment after T2 did an
> "unlock(_mutex)" operation. However, there is concern that unlock(_mutex)
> may not force _counter to be flushed. It appears that a loose pthreads
> implementation is allowed to be lax on memory flushes on unlock as long
> as everyone uses locks. Since unpark() always uses locks, I think it is
> OK for the transaction diagrams to show T2's unlock() calls including the
> "// _counter flushes" comment. In support of this, I'll note that the
> original fix for 6822370 does not include fence() calls after unpark()
> calls unlock(_mutex). Dave D's latest changes also do not include fence()
> calls after unpark() calls unlock(_mutex)
>
>
> There is some context in the above explanation that assumes that you
> have read other parts of triangular-race.txt.
>
> For the extremely long answer about the entire triangular race, I
> refer you to the triangular-race.txt attachment.
>
> Thanks again for the question. Please let me know if I have not addressed
> the question to your satisfaction.
>
> Dan
>
>
>   Thanks
>
> Sent from my phone
> On Jan 17, 2013 1:46 PM, "Daniel D. Daugherty" <
> daniel.daugherty at oracle.com> wrote:
>
>> Greetings,
>>
>> I have been working on the Contended Locking project and there are some
>> bug fixes that I'd like to push into HSX-25 independently of the Contended
>> Locking project.
>>
>> I'm using three different bug IDs to track these very distinct bug
>> fixes:
>>
>>     6444286 Possible naked oop related to biased locking revocation
>>             safepoint in jni_exit()
>>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6444286
>>     https://jbs.oracle.com/bugs/browse/JDK-6444286
>>
>>     8004902 correctness fixes motivated by contended locking work
>> (6607129)
>>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8004902
>>     https://jbs.oracle.com/bugs/browse/JDK-8004902
>>
>>     8004903 VMThread::execute() calls
>> Thread::check_for_valid_safepoint_state()
>>             on concurrent VM ops
>>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8004903
>>     https://jbs.oracle.com/bugs/browse/JDK-8004903
>>
>> Here is the URL for the webrev:
>>
>>     http://cr.openjdk.java.net/~dcubed/cl_bug_fix_bucket-webrev/2/
>>
>> These changes have been through two internal rounds of code review so I
>> think they are ready for wider review. The changes for 8004902 are very
>> tricky and there is long write-up attached to the 8004902 bug report that
>> explains the intricate details of the "triangular race". The changes for
>> 6444286 and 8004903 are more straight forward.
>>
>> These changes have been through JPRT build-and-test and have also been
>> tested with the vm.parallel_class_loading and vm.quick subsuites via
>> the Aurora ad-hoc testing mechanism.
>>
>> The current set of reviewers is:
>>
>>     acorn, dholmes, dice
>>
>> As always, other reviewers are welcome. Comments, questions and
>> suggestions are also welcome.
>>
>> Dan
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130117/74239722/attachment-0001.html