Code Review request for bug fixes found by the Contended Locking project

Thu Jan 17 19:13:25 PST 2013

Vitaly,

Thanks for the question. Reply embedded below.

On 1/17/13 4:16 PM, Vitaly Davidovich wrote:
>
> Hi Dan,
>
> Just curious - what's the reason for adding OrderAccess::fence() after 
> the pthread_mutex_unlock() calls? Is this working around some libc 
> issues or something? I'd expect those unlocks to provide the ordering 
> and visibility guarantee.
>

I presume you are asking about one of these two patterns:

src/os/linux/vm/os_linux.cpp:

4979 void os::PlatformEvent::park() {
<snip>
5004     _Event = 0 ;
5005      status = pthread_mutex_unlock(_mutex);
5006      assert_status(status == 0, status, "mutex_unlock");
5007     // Paranoia to ensure our locked and lock-free paths interact
5008     // correctly with each other.
5009     OrderAccess::fence();

or

5194 void Parker::park(bool isAbsolute, jlong time) {
<snip>
5240     _counter = 0;
5241     status = pthread_mutex_unlock(_mutex);
5242     assert (status == 0, "invariant") ;
5243     // Paranoia to ensure our locked and lock-free paths interact
5244     // correctly with each other and Java-level accesses.
5245     OrderAccess::fence();
5246     return;

The short answer is that we use both locked and lock-free paths
in os::PlatformEvent::park(), os::PlatformEvent::unpark() and in
Parker::park(). Those optimizations on our part may not play well
with optimizations done in a looser pthreads implementation. This
is _not_ a bug in the pthreads implementation. pthreads is allowed
to presume that everyone is playing nice and using locks; this
allows pthreads to make some optimizations of their own like
deferring memory syncing to certain points.

The slightly longer answer is:

These changes were made as part of the fix for the following bug:

     8004902 correctness fixes motivated by contended locking work (6607129)

In the last paragraph of the triangular-race.txt attachment of 8004902,
I wrote the following about these fence() calls:

> The addition of fence() calls after the second and third settings of
> "_counter = 0" was done in the original fix for 6822370 for similar
> reasons: worries about weaker memory model semantics. Those two settings
> of "_counter = 0" are done while the mutex lock is held and they
> are followed by unlock(_mutex) operations. In all of the transaction
> diagrams, I included a "// _counter flushes" comment after T2 did an
> "unlock(_mutex)" operation. However, there is concern that unlock(_mutex)
> may not force _counter to be flushed. It appears that a loose pthreads
> implementation is allowed to be lax on memory flushes on unlock as long
> as everyone uses locks. Since unpark() always uses locks, I think it is
> OK for the transaction diagrams to show T2's unlock() calls including the
> "// _counter flushes" comment. In support of this, I'll note that the
> original fix for 6822370 does not include fence() calls after unpark()
> calls unlock(_mutex). Dave D's latest changes also do not include fence()
> calls after unpark() calls unlock(_mutex)

There is some context in the above explanation that assumes that you
have read other parts of triangular-race.txt.

For the extremely long answer about the entire triangular race, I
refer you to the triangular-race.txt attachment.

Thanks again for the question. Please let me know if I have not addressed
the question to your satisfaction.

Dan

> Thanks
>
> Sent from my phone
>
> On Jan 17, 2013 1:46 PM, "Daniel D. Daugherty" 
> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>
>     Greetings,
>
>     I have been working on the Contended Locking project and there are
>     some
>     bug fixes that I'd like to push into HSX-25 independently of the
>     Contended
>     Locking project.
>
>     I'm using three different bug IDs to track these very distinct bug
>     fixes:
>
>         6444286 Possible naked oop related to biased locking revocation
>                 safepoint in jni_exit()
>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6444286
>     https://jbs.oracle.com/bugs/browse/JDK-6444286
>
>         8004902 correctness fixes motivated by contended locking work
>     (6607129)
>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8004902
>     https://jbs.oracle.com/bugs/browse/JDK-8004902
>
>         8004903 VMThread::execute() calls
>     Thread::check_for_valid_safepoint_state()
>                 on concurrent VM ops
>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8004903
>     https://jbs.oracle.com/bugs/browse/JDK-8004903
>
>     Here is the URL for the webrev:
>
>     http://cr.openjdk.java.net/~dcubed/cl_bug_fix_bucket-webrev/2/
>     <http://cr.openjdk.java.net/%7Edcubed/cl_bug_fix_bucket-webrev/2/>
>
>     These changes have been through two internal rounds of code review
>     so I
>     think they are ready for wider review. The changes for 8004902 are
>     very
>     tricky and there is long write-up attached to the 8004902 bug
>     report that
>     explains the intricate details of the "triangular race". The
>     changes for
>     6444286 and 8004903 are more straight forward.
>
>     These changes have been through JPRT build-and-test and have also been
>     tested with the vm.parallel_class_loading and vm.quick subsuites via
>     the Aurora ad-hoc testing mechanism.
>
>     The current set of reviewers is:
>
>         acorn, dholmes, dice
>
>     As always, other reviewers are welcome. Comments, questions and
>     suggestions are also welcome.
>
>     Dan
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130117/43b25b70/attachment-0001.html