RFR(s): 8204166: TLH: Semaphore may not be destroy until signal have returned.
Robbin Ehn
robbin.ehn at oracle.com
Thu Jun 14 10:11:30 UTC 2018
Hi all, please review.
Bug: https://bugs.openjdk.java.net/browse/JDK-8204166
Webrev: http://cr.openjdk.java.net/~rehn/8204166/v1/webrev/
The root cause of this failure is a bug in the posix semaphores:
https://sourceware.org/bugzilla/show_bug.cgi?id=12674
Thread a:
sem_post(my_sem);
Thread b:
sem_wait(my_sem);
sem_destroy(my_sem);
Thread b is waiting on my_sem (count 0), Thread a posts (count 0->1).
If Thread b start executing directly after the increment in post but before
Thread a leaves the call to post and manage to destroy the semaphore. Thread a
_can_ get EINVAL from sem_post! This is fixed in newer glibc(2.21).
Note that mutexes have had same issue on some platforms:
https://sourceware.org/bugzilla/show_bug.cgi?id=13690
Fixed in 2.23.
Since we only have one handshake operation running at anytime (safepoints and
handshakes are also mutual exclusive, both run on VM Thread) we can actually
always use the same semaphore. This patch changes the _done semaphore to be
static instead, thus avoiding the post<->destroy race.
Patch also contains some small changes which remove of dead code, remove
unneeded state, handling of cases which we can't easily say will never happen
and some additional error checks.
Handshakes test passes, but they don't trigger the original issue, so more
interesting is that this issue do not happen when running ZGC which utilize
handshakes with the static semaphore.
Thanks, Robbin
More information about the hotspot-dev
mailing list