Project Loom VirtualThreads hang
Alex Otenko
oleksandr.otenko at gmail.com
Wed Dec 28 18:10:45 UTC 2022
Hi Robert,
Since you have a reproducer that doesn't use RingBuffer, this is a little
unnecessary, but I can explain a bit.
1. When you see multiple atomic operations, you wonder whether things can
happen out of order. Who can mutate the tail counter, who can mutate the
array values, and what precludes further mutations of tail and the array
values in between? If you can't prove constructively that it can't happen,
you need to assume that it may happen.
2. When you see integer arithmetic, you wonder what happens on wraparound,
and when that happens.
3. When you see a condition check (and an indication that failing it is an
error, i.e. an exception is thrown), you wonder what the flip side of that
condition may be: can the error go undetected?
Those are red flags that drive investigation. Even without a proof of how
things go wrong, I would raise questions about its safety.
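To make red flag 2 concrete, here is a small sketch of what happens to a slot index derived from an int tail counter when the counter overflows. The class and method names (WraparoundDemo, naiveIndex, floorModIndex) are hypothetical and not taken from the RingBuffer under discussion. It shows that Java's % keeps the sign of the dividend, so a naive index goes negative after overflow, and that Math.floorMod fixes the sign but still skips slots at the wrap unless the capacity is a power of two (i.e. divides 2^32).

```java
// Hypothetical sketch: how a ring-buffer slot index behaves when an int tail counter wraps.
class WraparoundDemo {
    // Naive slot index: Java's % keeps the sign of the dividend, so it can go negative.
    static int naiveIndex(int counter, int capacity) {
        return counter % capacity;
    }

    // Sign-safe slot index: Math.floorMod always returns a value in [0, capacity).
    static int floorModIndex(int counter, int capacity) {
        return Math.floorMod(counter, capacity);
    }

    public static void main(String[] args) {
        int tail = Integer.MAX_VALUE;
        int next = tail + 1; // int overflow: silently wraps to Integer.MIN_VALUE
        System.out.println("tail = " + tail + ", tail + 1 = " + next);

        // Capacity 8 (a power of two): naive % on a wrapped counter goes negative.
        System.out.println("naive index:    " + naiveIndex(next + 1, 8));    // -7
        System.out.println("floorMod index: " + floorModIndex(next + 1, 8)); // 1

        // Capacity 10 (not a power of two): even floorMod jumps from slot 7 to slot 2
        // at the wrap, because 2^32 is not a multiple of 10.
        System.out.println("capacity 10: " + floorModIndex(tail, 10)
                + " -> " + floorModIndex(next, 10)); // 7 -> 2
    }
}
```

Switching to a long counter postpones the overflow by a factor of 2^32, and a power-of-two capacity keeps the index sequence continuous across the wrap; either way, the question "when does that happen?" has a concrete answer only once the counter type and capacity are fixed.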
However, in this case we can even show how to deadlock.
Producers A and B, RingBuffer of size N.
The Consumer has emptied the buffer, so head == tail.
Producer A, ready to offer, increments tail and gets suspended before it
updates the array at position tail (e.g. an interrupt preempts the thread).
Producer B offers N values. Observe that tail wraparound occurs.
The Consumer consumes all N values. Now the array cell at tail is null again.
Producer B offers N-2 values. Now the array cell at tail is not null again.
The Consumer consumes at least 1 value. Now the array cell at tail is null.
Producer A resumes and stores the value, because the correctness condition
is met (the flip side of the exception throw: the inconsistency goes
unobserved).
Done. Now the Consumer will not be able to access the array cell with the
value Producer A has just stored, and will wait indefinitely. You may need
to track head to see why that is the case.
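Below is a compressed, single-threaded replay of the key steps, assuming a hypothetical TicketRingBuffer (a stand-in, not the actual RingBuffer, whose code isn't part of this message). In this model a producer claims a ticket with getAndIncrement on an AtomicLong tail, then stores into slot (ticket mod N) in a separate step, guarded only by an "is the slot empty?" check. The replay keeps A's stall, B lapping the buffer, the Consumer draining it, and A's late store passing the check, but skips B's second batch. In this simplified model the outcome is silent reordering rather than a hang; whether the real buffer hangs depends on how its consumer tracks head.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

class RingBufferReplay {
    // Hypothetical ring buffer: "offer" is split into reserve() and publish(),
    // two separate atomic operations with a window in between.
    static final class TicketRingBuffer {
        private final int capacity;
        private final AtomicLong tail = new AtomicLong();
        private final AtomicReferenceArray<String> slots;
        private long head; // single consumer, so a plain field is enough here

        TicketRingBuffer(int capacity) {
            this.capacity = capacity;
            this.slots = new AtomicReferenceArray<>(capacity);
        }

        // Step 1 of offer: claim a position.
        long reserve() {
            return tail.getAndIncrement();
        }

        // Step 2 of offer: store into the claimed slot. The only sanity check is
        // "slot is empty"; the slot does not record which lap it belongs to.
        void publish(long ticket, String value) {
            int idx = (int) Math.floorMod(ticket, (long) capacity);
            if (!slots.compareAndSet(idx, null, value)) {
                throw new IllegalStateException("slot " + idx + " already occupied");
            }
        }

        void offer(String value) {
            publish(reserve(), value);
        }

        // Non-blocking poll so the replay stays deterministic; returns null if the slot at head is empty.
        String poll() {
            int idx = (int) Math.floorMod(head, (long) capacity);
            String v = slots.getAndSet(idx, null);
            if (v != null) head++;
            return v;
        }
    }

    static List<String> replay() {
        final int n = 4;
        TicketRingBuffer buf = new TicketRingBuffer(n);
        List<String> consumed = new ArrayList<>();

        // Producer A claims ticket 0, then is "suspended" before publishing.
        long ticketA = buf.reserve();

        // Producer B offers N values: tickets 1..N; ticket N wraps around onto A's slot 0.
        for (int i = 1; i <= n; i++) buf.offer("B" + i);

        // Consumer drains N values; slot 0 yields B's ticket-N value first, then all slots are empty.
        for (int i = 0; i < n; i++) consumed.add(buf.poll());

        // Producer A resumes. Slot 0 is empty again, so the check passes and nothing is detected.
        buf.publish(ticketA, "A0");

        // Consumer picks up A's value in the position of ticket N (the next lap).
        consumed.add(buf.poll());
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // [B4, B1, B2, B3, A0]
    }
}
```

Ticket order would have been A0, B1, B2, B3, B4, yet no exception fires: the slot records only empty vs. full, not which lap it belongs to, so a stale ticket is indistinguishable from a current one even though the store itself is a CAS. Bounded MPMC queue designs commonly keep a per-slot sequence number for exactly this reason.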
Alex
On Wed, 28 Dec 2022, 12:50 Robert Engels, <rengels at ix.netcom.com> wrote:
>
> Alex,
>
> You write:
>
> > I won't go into detail, but the producers not synchronizing between
> > themselves leads to hangs.
>
>
> Can you explain? I'm fairly certain the producers do valid synchronization.
> As I said, using native threads the code runs to completion fine.
>
> Now that I understand the issue, I was able to reproduce it very
> simply without any queues. See SpinTest posted to the project. The
> “starver” thread fails to make any progress; this should not be possible
> if Thread.yield() were fair.
>
> > On Dec 27, 2022, at 11:15 PM, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
> > I won't go into detail, but the producers not synchronizing between
> > themselves leads to hangs.
>
More information about the loom-dev mailing list