RFR(m): 8214271: Fast primitive to wake many threads

Mon Dec 17 11:18:12 UTC 2018

Hi David,

On 11/28/18 6:41 AM, David Holmes wrote:
> Hi Robbin,
> 
> Part 3 (and final part)
> 
> This is looking at the futex implementation and the test.
> 
> Overall looks good!

Great!

> 
> Shouldn't you be using the FUTEX_*_PRIVATE variants as this is a process-local 
> FUTEX?

Yes

> 
> Your main code would be simpler to read (and thus understand) if your futex 
> helper function handled the unused arguments e.g. instead of:
> 
> static int futex(volatile int *uaddr, int futex_op, int val,
>                   const struct timespec *timeout, int *uaddr2, int val3)
>   {
>     return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
>   }
> 
> use (with name changes):
> 
> static int futex(volatile int *addr, int futex_op, int op_arg) {
>     return syscall(SYS_futex, addr, futex_op, op_arg, NULL, NULL, 0);
> }
> 

Fixed

> ---
> 
> Minor suggestions on this comment block:
> 
>    88     // Return value 0, re-check in case of spurious wake-up.
>    89     // EINTR and re-check and go back to waiting.
>    90     // EAGAIN we already are disarmed, we should pass the check,
>    91     // if not re-armed with same tag.
> 
> ->
> 
>    // Return value 0: woken up, but re-check in case of spurious wakeup
>    // Error EINTR: woken by signal, so re-check and re-wait if necessary.
>    // Error EAGAIN: we are already disarmed and so will pass the check
> 

Fixed

> As per initial review comments I think re-arming with same tag should be an 
> error. I think the tag semantics should be that the tag is a strictly increasing 
> value, and we can then store the previous tag and check that.

Since the futex only have a 32-bit this would eventually wrap.
But we could save previously armed tag make sure the new one is different.

> 
> ---
> 
> The test could do with at least an introductory comment block explaining the 
> basic operation of the test. Further inline comments wouldn't hurt either.

Sure!

Sending update with reply to first RFR mail, when tested and gtest commented.

/Robbin

> 
> Thanks,
> David
> 
> On 24/11/2018 2:55 am, Robbin Ehn wrote:
>> Forgot RFR in subject.
>>
>> /Robbin
>>
>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>> Hi all, please review.
>>>
>>> When a safepoint is ended we need a way to get back to 100% utilization as fast
>>> as possible. 100% utilization means no idle cpu in the system if there is a
>>> JavaThread that could be executed. The traditional ways to wake many, e.g.
>>> semaphore, pthread_cond, is not implemented with a single syscall instead they
>>> typical do one syscall per thread to wake.
>>>
>>> This change-set contains that primitive, the WaitBarrier, and a gtest for it.
>>> No actual users, which is in coming patches.
>>>
>>> The WaitBarrier solves by doing a cooperative semaphore posting, threads woken
>>> will also post. On Linux we can instead directly use a futex and with one
>>> syscall wake all. Depending on how many threads and cpus the performance vary,
>>> but a good utilization of the machine, just on the edge of saturated, the 
>>> time to reach 100% utilization is around 3 times faster with the WaitBarrier 
>>> (where futex is faster than semaphore).
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>
>>> CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>
>>> Passes 100 iterations of gtest on our platforms, both fastdebug and release.
>>> And have been stable when used in safepoints (t1-8) (coming patches).
>>>
>>> Thanks, Robbin