<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">I realized I left off the list, which caused part of the discussion with Alex to be dropped.<div class=""><br class=""></div><div class="">——</div><div class=""><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">I think the check + exception was not just an assertion. It could actually happen if Producer B offers N values and Producer A resumes before the consumer makes any progress. This is probably not the case after adding the check.</span><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""><br class=""></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""><br class=""></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">Alex</div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">——</div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class=""><br class=""></div><div class=""><span class="" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);">Thanks for the very thoughtful analysis.</span><div class="" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);"><br class=""><div class=""><div dir="auto" class=""><div dir="auto" class="">Luckily, I am pretty certain there is a trivial fix for the out-of-order case as well - simply ensure that “next tail” != head when checking if there is space in the buffer. The consumer cannot race ahead because it must read a non-null value. I posted the updated code. If Producer A never gets rescheduled the system will hang, but that would be considered a critical failure (which is what prompted the original inquiry).</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">To your other points:</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">1. 
Yes, it requires significant cognitive reasoning to get right - oops.</div><div dir="auto" class="">2. This is why next() is used - to avoid problems with integer math.</div><div dir="auto" class="">3. The condition check is really an assert. All else being correct, it should never occur.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Luckily, as you point out, the bug in the ring buffer had nothing to do with the problems being reported: 1) that an “unparked” vthread never runs, and 2) that Thread.yield() does not behave as expected with vthreads. Happily, both seem to be addressed in JDK 20.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">——</div><div dir="auto" class="">But doing those tracking steps myself, it looks like I rushed to conclusions. We only observe elements issued by Producer B in the wrong order, but that's only a problem if it is meant to be a FIFO queue.<div dir="auto" class=""><br class=""></div><div dir="auto" class="">Alex</div><div dir="auto" class="">——</div><div dir="auto" class=""><br class=""></div></div></div></div></div><div><br class=""><blockquote type="cite" class=""><div class="">On Dec 28, 2022, at 12:10 PM, Alex Otenko <<a href="mailto:oleksandr.otenko@gmail.com" class="">oleksandr.otenko@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="auto" class="">Hi Robert, <div dir="auto" class=""><br class=""></div><div dir="auto" class="">Since you have a reproducer that doesn't use RingBuffer, this is a little unnecessary, but I can explain a bit.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">1. When you see multiple atomic operations, you wonder if things can happen out of order. Who can mutate the tail counter and who can mutate an array value, and what precludes more mutations of tail and array values? If you can't prove constructively that it can't happen, you need to assume that it may happen. 
</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">2. When you see integer arithmetic, you wonder what happens upon wraparound, and when does that happen?</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">3. When you see a condition check (and an indicator that it is an error - an exception is thrown), you wonder what may be the flip side of that condition (may it go undetected?)</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Those are red flags that drive investigation. Even without a proof of how things go wrong, I would raise questions about its safety.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">However, in this case we can even show how to deadlock.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Producer A and B, RingBuffer of size N.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Consumer emptied the buffer, so head == tail.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Producer A ready to offer, increments tail, and gets suspended before it updates array at position tail. (E.g. interrupt preempts the thread)</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Producer B offers N values. Observe that tail wraparound occurs.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Consumer consumes all N values. Now array cell at tail is null again.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Producer B offers N-2 values. Now array cell at tail is not null again. </div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Consumer consumes at least 1 value. Now array cell at tail is null.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Producer A resumes, stores the value, because the correctness condition is met. 
(the flip side of exception throw - the inconsistency went unobserved)</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Done. Now Consumer will not be able to access array cell with the value Producer A has just stored, and will wait indefinitely. You may need to track head to see why that is the case.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Alex</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 28 Dec 2022, 12:50 Robert Engels, <<a href="mailto:rengels@ix.netcom.com" class="">rengels@ix.netcom.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="">
Alex,<br class="">
<br class="">
You write:<br class="">
<br class="">
> I won't go into detail, but the producers not synchronizing between themselves leads to hangs.<br class="">
<br class="">
<br class="">
Can you explain? I'm fairly certain the producers do valid synchronization. As I said, using native threads the code runs to completion fine.<br class="">
<br class="">
Now that I understand the issue, I was able to reproduce it very simply without any queues. See SpinTest posted to the project. The “starver” thread fails to make any progress - this should not be possible if Thread.yield() were fair.<br class="">
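<br class="">
For illustration, a queue-free test of this kind might look roughly like the sketch below. It is a hypothetical reconstruction, not the SpinTest posted to the project: the names SpinSketch, run, spinner and starver are made up here, and the ThreadFactory parameter is an assumption added so the sketch also runs with platform threads. On a JDK with virtual threads, one would presumably pass Thread.ofVirtual().factory() to exercise the vthread scheduler instead.<br class="">
<pre class="">
```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one thread spins on Thread.yield() while a second
// "starver" thread counts how often it gets to run.
class SpinSketch {

    // Runs both threads for roughly `millis` milliseconds and returns the
    // number of iterations the starver completed.
    static long run(ThreadFactory factory, long millis) throws InterruptedException {
        AtomicBoolean stop = new AtomicBoolean();
        AtomicLong progress = new AtomicLong();

        // Does nothing but yield until told to stop.
        Thread spinner = factory.newThread(() -> {
            while (!stop.get()) {
                Thread.yield();
            }
        });

        // Counts each time it is scheduled, yielding in between.
        Thread starver = factory.newThread(() -> {
            while (!stop.get()) {
                progress.incrementAndGet();
                Thread.yield();
            }
        });

        spinner.start();
        starver.start();
        Thread.sleep(millis);
        stop.set(true);
        spinner.join();
        starver.join();
        return progress.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Platform threads (assumption: used here only so the sketch runs on any JDK).
        System.out.println("starver iterations: " + run(Thread::new, 200));
    }
}
```
</pre>
If yield is reasonably fair, the starver's count should grow along with the run time; a count stuck at or near zero would correspond to the starvation described above.<br class="">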
<br class="">
> On Dec 27, 2022, at 11:15 PM, Alex Otenko <<a href="mailto:oleksandr.otenko@gmail.com" target="_blank" rel="noreferrer" class="">oleksandr.otenko@gmail.com</a>> wrote:<br class="">
> I won't go into detail, but the producers not synchronizing between themselves leads to hangs.<br class="">
</blockquote></div>
</div></blockquote></div><br class=""></div></div></body></html>