RFR (XS): 8236177: assert(status == 0) failed: error ETIMEDOUT(60), cond_wait

gerard ziemski gerard.ziemski at oracle.com
Fri May 1 20:59:07 UTC 2020


hi David,

On 3/27/20 5:53 PM, David Holmes wrote:
> Hi Gerard,
>
> On 28/03/2020 2:23 am, gerard ziemski wrote:
>> Thank you David for your feedback!
>>
>> On 3/25/20 6:43 PM, David Holmes wrote:
>>> Hi Gerard,
>>>
>>> On 26/03/2020 4:03 am, gerard ziemski wrote:
>>>> hi all,
>>>>
>>>> Please review this "workaround", which cannot be called an actual 
>>>> fix just yet. It is designed to help figure out why, on Mac OS X, 
>>>> we (very rarely) get ETIMEDOUT when calling the pthread_cond_wait() 
>>>> API. On the other hand, it might actually fix it.
>>>
>>> The ETIMEDOUT should be treated as a "spurious wakeup" and we will 
>>> naturally retry the wait if the condition is not yet met. All we 
>>> have to do to our code is adjust the assert so that ETIMEDOUT 
>>> doesn't cause it to fail.
>>
>> My initial approach, as I noted in the bug, was to do exactly that, 
>> but when you commented in the bug that we used to run into these 
>> kinds of issues before, and used to have workarounds in place for 
>> them, it made me worry that we would add the workaround again, 
>> remove it at some point in the future, and be back to square one.
>
> I'm sorry my comment misled you.
>
>> I was hoping that now we do something to get a better sense of what 
>> the underlying issue is, which is why I proposed this change instead.
>>
>> Is that OK?
>
> There is no need for that kind of workaround. The effect of the bug is 
> at worst a spurious wakeup (and we can't tell if it is spurious or not 
> without more investigation - not that it matters.) We just need to fix 
> the assert.

I now understand why all we need to do to fix this issue is adjust the 
assert (thank you David!). Any client that calls pthread_cond_wait() 
must check the predicate associated with the wait regardless, so it 
doesn't matter why the call returns 
(https://linux.die.net/man/3/pthread_cond_wait). In all 3 places where 
we call pthread_cond_wait(), we do indeed do that (usually in a while 
loop), though the calling chain may obscure it.
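
To illustrate the pattern, here is a minimal sketch of a wait that re-checks its predicate in a loop, so that an unexpected return (including ETIMEDOUT) is harmless. The names event_t, event_init, event_wait, event_signal and signaller are hypothetical and are not taken from the HotSpot sources; this only shows the general idea.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type for illustration only; not HotSpot code. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            ready;  /* the predicate, protected by mutex */
} event_t;

static void event_init(event_t *ev) {
    pthread_mutex_init(&ev->mutex, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->ready = false;
}

/* Block until ev->ready has been set, then consume it. */
static void event_wait(event_t *ev) {
    pthread_mutex_lock(&ev->mutex);
    while (!ev->ready) {
        int status = pthread_cond_wait(&ev->cond, &ev->mutex);
        /* ETIMEDOUT is tolerated here: the enclosing loop re-checks
         * the predicate, so it behaves like a spurious wakeup. */
        assert(status == 0 || status == ETIMEDOUT);
        (void)status;  /* silence unused warning when NDEBUG is set */
    }
    ev->ready = false;
    pthread_mutex_unlock(&ev->mutex);
}

/* Set the predicate and wake one waiter. */
static void event_signal(event_t *ev) {
    pthread_mutex_lock(&ev->mutex);
    ev->ready = true;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->mutex);
}

/* Thread entry point that signals the event passed as its argument. */
static void *signaller(void *arg) {
    event_signal((event_t *)arg);
    return NULL;
}
```

A caller would start a thread with pthread_create(&t, NULL, signaller, &ev) and then call event_wait(&ev). The wait returns only once the predicate is true, no matter how many times pthread_cond_wait() itself returns early.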

In the fix I introduce a MAC_ONLY() macro, as per David's suggestion.
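
As a rough sketch of how such a macro could relax the check on macOS only: the definition below is my assumption of its shape (expands to its argument on Apple platforms, to nothing elsewhere), and cond_wait_status_ok and check_cond_wait_status are hypothetical helpers. The webrev has the actual change.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed definition, for illustration: keep the argument on macOS,
 * drop it on every other platform. */
#ifdef __APPLE__
#define MAC_ONLY(code) code
#else
#define MAC_ONLY(code)
#endif

/* Hypothetical helper: a cond_wait status is acceptable if it is 0,
 * or, on macOS only, ETIMEDOUT. */
static bool cond_wait_status_ok(int status) {
    return status == 0 MAC_ONLY(|| status == ETIMEDOUT);
}

/* Hypothetical helper: assert form of the same check. */
static void check_cond_wait_status(int status) {
    assert(cond_wait_status_ok(status));
    (void)status;  /* silence unused warning when NDEBUG is set */
}
```

With this shape, other platforms keep the strict status == 0 check, and only the macOS build accepts ETIMEDOUT.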

bug link at https://bugs.openjdk.java.net/browse/JDK-8236177
open webrev at http://cr.openjdk.java.net/~gziemski/8236177_rev3
testing: Mach hs-tier1,2,3,4,5 in progress...


cheers




