RFR(s): 8204166: TLH: Semaphore may not be destroy until signal have returned.

Mon Jun 18 13:07:28 UTC 2018

Hi all,

After some internal discussions I changed the patch to:
http://rehn-ws.se.oracle.com/cr_mirror/8204166/v2/

Which handles thread off javathreads list better.

Passes handshake testing and ZGC testing seems okay.

Thanks, Robbin

On 06/14/2018 12:11 PM, Robbin Ehn wrote:
> Hi all, please review.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8204166
> Webrev: http://cr.openjdk.java.net/~rehn/8204166/v1/webrev/
> 
> The root cause of this failure is a bug in the posix semaphores: 
> https://sourceware.org/bugzilla/show_bug.cgi?id=12674
> 
> Thread a:
> sem_post(my_sem);
> 
> Thread b:
> sem_wait(my_sem);
> sem_destroy(my_sem);
> 
> Thread b is waiting on my_sem (count 0), Thread a posts (count 0->1).
> If Thread b start executing directly after the increment in post but before
> Thread a leaves the call to post and manage to destroy the semaphore. Thread a
> _can_ get EINVAL from sem_post! This is fixed in newer glibc(2.21).
> 
> Note that mutexes have had same issue on some platforms:
> https://sourceware.org/bugzilla/show_bug.cgi?id=13690
> Fixed in 2.23.
> 
> Since we only have one handshake operation running at anytime (safepoints and 
> handshakes are also mutual exclusive, both run on VM Thread) we can actually 
> always use the same semaphore. This patch changes the _done semaphore to be 
> static instead, thus avoiding the post<->destroy race.
> 
> Patch also contains some small changes which remove of dead code, remove 
> unneeded state, handling of cases which we can't easily say will never happen 
> and some additional error checks.
> 
> Handshakes test passes, but they don't trigger the original issue, so more 
> interesting is that this issue do not happen when running ZGC which utilize 
> handshakes with the static semaphore.
> 
> Thanks, Robbin