[jdk17] RFR: 8269865: Async UL needs to handle ERANGE on exceeding SEM_VALUE_MAX [v4]
David Holmes
david.holmes at oracle.com
Thu Jul 8 02:01:59 UTC 2021
On 8/07/2021 11:39 am, Xin Liu wrote:
> On Wed, 7 Jul 2021 08:08:10 GMT, Xin Liu <xliu at openjdk.org> wrote:
>
>>> This patch solves the semaphore overflow issue reported as errno ERANGE or EOVERFLOW.
>>> Previously, the p/v operations on semaphore _sem were asymmetric: each iteration
>>> decremented _sem by only 1 but dequeued N messages. If logging threads keep preempting
>>> the async logging thread, the value of _sem can accumulate until it overflows!
>>>
>>> The patch corrects the value of _sem after write(). n messages are dequeued/processed.
>>> We need to invoke _sem.wait() max(n-1, 1) times. This ensures that each iteration
>>> decrements _sem by n instead of 1.
>>
>> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Use the new API signal_overflow of semaphore
>>
>> This patch handles overflow scenarios for Posix and Windows.
>> The MacOS platform doesn't report any error, so we ignore it.
>
> Hi, @tstuefe ,
> Do you mind trying this patch? David and I think we should ignore the `sem_post` error when the semaphore's value has overflowed. It happens rarely, and ignoring this error won't affect correctness. I just wonder whether this approach can solve the LogConfigurationTest.reconfigure_decorators_MT_vm failure on AIX?
>
>> Figuring out the best way to handle this is proving to be quite tricky
>
> If it works out, the one open issue is what we should do on MacOS. My opinion is that we just leave it alone, since `semaphore_signal` (not a POSIX semaphore) on Darwin doesn't overflow. We can revisit this later if the assumption turns out to be false.

It isn't that it doesn't overflow, but that it doesn't appear to detect
the overflow. So what we don't know is what happens to the operation of
the semaphore after an overflow occurs - we might incur a situation
where wait() will not return until we increment from a huge negative
number back to a positive one.

I'm also thinking that perhaps we need to handle overflow in product
builds, which means changing the API even more. For example:

   bool signal(int n = 1)

and return false if we hit an error like EOVERFLOW.
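
To make the suggested shape concrete, here is a minimal sketch on top of a POSIX semaphore. It is not the HotSpot Semaphore class: the name CheckedSemaphore and the trywait() helper are hypothetical, and it assumes a platform where sem_post() reports overflow by returning -1 (with errno set to something like EOVERFLOW) rather than wrapping silently.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <semaphore.h>

// Hypothetical wrapper for illustration only; not HotSpot's Semaphore.
class CheckedSemaphore {
 public:
  explicit CheckedSemaphore(unsigned int initial = 0) {
    int rc = sem_init(&_sem, 0, initial);
    assert(rc == 0 && "sem_init failed");
    (void)rc;
  }
  ~CheckedSemaphore() { sem_destroy(&_sem); }

  CheckedSemaphore(const CheckedSemaphore&) = delete;
  CheckedSemaphore& operator=(const CheckedSemaphore&) = delete;

  // Posts up to n times. Returns false as soon as sem_post() fails
  // (for example with EOVERFLOW) instead of asserting, so the caller
  // decides what to do. Posts that already succeeded are not undone.
  bool signal(int n = 1) {
    for (int i = 0; i < n; i++) {
      if (sem_post(&_sem) != 0) {
        return false;
      }
    }
    return true;
  }

  // Blocks until the count is positive, retrying if interrupted.
  void wait() {
    int rc;
    do {
      rc = sem_wait(&_sem);
    } while (rc != 0 && errno == EINTR);
    assert(rc == 0 && "sem_wait failed");
    (void)rc;
  }

  // Non-blocking decrement; returns true if the count was positive.
  bool trywait() { return sem_trywait(&_sem) == 0; }

 private:
  sem_t _sem;
};
```

With something like this, a caller in a product build could test the result of signal() and choose to drop or log the missed wakeup rather than abort. Whether that is the right policy for async logging is a separate question.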

Cheers,
David
> -------------
>
> PR: https://git.openjdk.java.net/jdk17/pull/216
>