RFR(s): 8204166: TLH: Semaphore may not be destroy until signal have returned.

Robbin Ehn robbin.ehn at oracle.com
Thu Jun 14 10:11:30 UTC 2018


Hi all, please review.

Bug: https://bugs.openjdk.java.net/browse/JDK-8204166
Webrev: http://cr.openjdk.java.net/~rehn/8204166/v1/webrev/

The root cause of this failure is a bug in glibc's POSIX semaphores:
https://sourceware.org/bugzilla/show_bug.cgi?id=12674

Thread a:
sem_post(my_sem);

Thread b:
sem_wait(my_sem);
sem_destroy(my_sem);

Thread b is waiting on my_sem (count 0), Thread a posts (count 0->1).
If Thread b starts executing directly after the increment in sem_post, but
before Thread a has left the call to sem_post, and manages to destroy the
semaphore, Thread a _can_ get EINVAL from sem_post! This is fixed in newer
glibc (2.21).
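
The interleaving described above can be sketched as a small standalone C
program. This is only an illustration, not HotSpot code: the names thread_a,
thread_b, post_errno and run_post_destroy_once are hypothetical, and the race
window is narrow, so on most runs (and on a fixed glibc) sem_post is expected
to simply succeed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t my_sem;     /* same name as in the pseudocode above */
static int post_errno;   /* errno seen by Thread a, 0 if sem_post succeeded */

/* Thread a: posts, and records whether sem_post reported an error. */
static void *thread_a(void *arg) {
    (void)arg;
    post_errno = (sem_post(&my_sem) == 0) ? 0 : errno;
    return NULL;
}

/* Thread b: waits, then destroys the semaphore as soon as it wakes up.
 * On an affected glibc this may happen while Thread a is still inside
 * sem_post. */
static void *thread_b(void *arg) {
    (void)arg;
    while (sem_wait(&my_sem) == -1 && errno == EINTR) {
        /* retry if interrupted by a signal */
    }
    sem_destroy(&my_sem);
    return NULL;
}

/* Runs the pattern once. Returns 0 if sem_post succeeded, otherwise the
 * errno it reported (EINVAL is the value described in this mail). */
int run_post_destroy_once(void) {
    pthread_t a, b;
    int rc;

    post_errno = 0;
    rc = sem_init(&my_sem, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, thread_b, NULL);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, thread_a, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return post_errno;
}
```

Calling run_post_destroy_once() in a loop is one way to try to provoke the
failure on an affected system; a single call returning 0 does not show that
the race is absent.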

Note that mutexes have had the same issue on some platforms:
https://sourceware.org/bugzilla/show_bug.cgi?id=13690
Fixed in 2.23.

Since we only have one handshake operation running at any time (safepoints and
handshakes are also mutually exclusive, both run on the VM Thread) we can
actually always use the same semaphore. This patch changes the _done semaphore
to be static instead, thus avoiding the post<->destroy race.
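
As a rough C analogue of that idea (not the actual patch, which is in the
webrev), the sketch below keeps one statically allocated semaphore that is
initialized once and never destroyed, and reuses it for every operation. It
assumes, as in the mail, that only one operation is in flight at a time. The
names done_sem, target_thread and run_operations are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t done_sem;   /* shared by all operations, never destroyed */
static pthread_once_t done_once = PTHREAD_ONCE_INIT;

static void init_done_sem(void) {
    int rc = sem_init(&done_sem, 0, 0);
    assert(rc == 0);
    (void)rc;
}

/* Stand-in for the thread that executes the operation and signals
 * completion. It only touches done_sem via sem_post. */
static void *target_thread(void *arg) {
    int *completed = arg;
    (*completed)++;        /* the "operation" itself */
    sem_post(&done_sem);   /* safe: nobody ever destroys done_sem */
    return NULL;
}

/* Runs n operations one after another, waiting on the same semaphore
 * each time. Returns the number of operations that completed. */
int run_operations(int n) {
    int completed = 0;

    pthread_once(&done_once, init_done_sem);
    for (int i = 0; i < n; i++) {
        pthread_t t;
        int rc = pthread_create(&t, NULL, target_thread, &completed);
        assert(rc == 0);
        (void)rc;

        while (sem_wait(&done_sem) == -1 && errno == EINTR) {
            /* retry if interrupted by a signal */
        }
        pthread_join(t, NULL);
    }
    return completed;
}
```

Because there is no sem_destroy call at all, a waiter that wakes up early
cannot invalidate the semaphore while the poster is still inside sem_post.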

The patch also contains some small changes: removal of dead code, removal of
unneeded state, handling of cases which we can't easily say will never happen,
and some additional error checks.

The handshake tests pass, but they don't trigger the original issue. More
interesting is that this issue does not happen when running ZGC, which
utilizes handshakes, with the static semaphore.

Thanks, Robbin

