<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr"></div><div dir="ltr">I’ll try to create a simpler standalone test case without the dependencies. It may be tough, because I am guessing that the async/socket IO layer combined with the multithreaded nature is what is causing the problem - something internal related to the multiple schedulers. </div><div dir="ltr"><br><blockquote type="cite">On Dec 27, 2022, at 7:29 PM, Eric Kolotyluk &lt;eric@kolotyluk.net&gt; wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr">
<p>My 2 cents...</p>
<p>Right, Virtual Threads are not preemptable, and as explained by
Ron previously, it would be hard to make them preemptable...<br>
</p>
<p>If a thread spins, this can be a problem, but I don't know that "all
other threads will be blocked." I'm not sure why it's assumed that
fork-join is not stealing work...<br>
</p>
<p>My intuition would be that Thread.yield() should move the current
thread to the end of the run queue, as I have often used this in
the past to serialize operations and achieve similar effects. If
it does not move the current thread to the end of the queue, I
would love to know why.</p>
<p>Cheers, Eric<br>
</p>
<div class="moz-cite-prefix">On 2022-12-27 4:44 p.m., Robert Engels
wrote:<br>
</div>
<blockquote type="cite" cite="mid:3F84261B-7ED6-42EE-A5F2-2F98E050FC13@ix.netcom.com">
<div dir="ltr">Bummer. That is a really significant limitation. </div>
<div dir="ltr"><br>
</div>
<div dir="ltr">At a minimum it seems Thread.yield() should put the
calling thread at the end of the run queue. </div>
<div dir="ltr"><br>
</div>
<div dir="ltr">It seems this boils down to Loom not having a sense
of “fair” thread scheduling. But I am also not sure that is the
whole story. Even with far fewer runnable virtual threads than
carrier threads, the virtual thread seems stuck in runnable but
never runs. </div>
<div dir="ltr"><br>
<blockquote type="cite">On Dec 27, 2022, at 6:27 PM, Dr Heinz M.
Kabutz <a class="moz-txt-link-rfc2396E" href="mailto:heinz@javaspecialists.eu">&lt;heinz@javaspecialists.eu&gt;</a> wrote:<br>
<br>
</blockquote>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="auto">Correct. </div>
<div><br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, 28 Dec 2022 at
00:54, robert engels &lt;<a href="mailto:rengels@ix.netcom.com" moz-do-not-send="true" class="moz-txt-link-freetext">rengels@ix.netcom.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word;line-break:after-white-space">Further
diagnosis seems to show that virtual threads are not
preemptible - and it seems that the fork-join pool is
not stealing work, so if one thread spins - all other
threads will be blocked.
<div><br>
</div>
<div>Does this sound reasonable? If so, this seems
like a significant limitation which will cause all
sorts of spin/lock-free code to fail under virtual
threads.</div>
</div>
<div style="word-wrap:break-word;line-break:after-white-space">
<div><br>
<div><br>
<blockquote type="cite">
<div>On Dec 27, 2022, at 4:27 PM, robert engels
&lt;<a href="mailto:rengels@ix.netcom.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">rengels@ix.netcom.com</a>&gt;
wrote:</div>
<br>
<div>
<div style="word-wrap:break-word;line-break:after-white-space">Hi
devs,
<div><br>
</div>
<div>First, <br>
<div><br>
</div>
<span>Thanks for this amazing work!!! It
literally solves the only remaining
problem Java had.</span></div>
<div><span><br>
</span></div>
<div><span>Sorry for the long email.</span></div>
<div><font><span><br>
</span></font>
<div>I have been very excited to
test-drive Project Loom in JDK19. I have
extensive experience in highly
concurrent systems/HFT/HPC, so I usually
:) know what I am doing.</div>
<div><br>
</div>
<div>For the easiest test, I took a highly
threaded (connection based) server based
system (Java port of Go’s <a href="http://nats.io/" target="_blank" moz-do-not-send="true">nats.io</a> message
broker), and converted the threads to
virtual threads. The project (jnatsd) is
available <a href="https://github.com/robaho/jnatsd" target="_blank" moz-do-not-send="true">here</a>.
The ‘master’ branch runs very well with
excellent performance, but I thought
switching to virtual threads might be
able to improve things over using async
IO, channels, etc. (I have a branch for
this that works as well, but it is much
more complex, and didn’t provide a huge
performance benefit).</div>
<div><br>
</div>
<div>There are two branches:
‘simple_virtual_threads’ and
‘virtual_threads’.</div>
<div><br>
</div>
<div>In the former, it is literally a 2
line change to enable the virtual
threads, but it doesn’t work. I narrowed
the issue down to
LockSupport.unpark(thread) not working
consistently. At some point, the virtual
thread is never scheduled again. I
enabled the debug options and I see that
the virtual thread is in:</div>
<div><br>
</div>
<div>
<pre>yield0:365, Continuation (jdk.internal.vm)
yield:357, Continuation (jdk.internal.vm)
yieldContinuation:370, VirtualThread (java.lang)
park:499, VirtualThread (java.lang)
parkVirtualThread:2606, System$2 (java.lang)
park:54, VirtualThreads (jdk.internal.misc)
park:369, LockSupport (java.util.concurrent.locks)
run:88, Connection$ConnectionWriter (com.robaho.jnatsd)
run:287, VirtualThread (java.lang)
lambda$new$0:174, VirtualThread$VThreadContinuation (java.lang)
run:-1, VirtualThread$VThreadContinuation$$Lambda$50/0x0000000801065670 (java.lang)
enter0:327, Continuation (jdk.internal.vm)
enter:320, Continuation (jdk.internal.vm)
</pre>
<div>The instance state is:</div>
</div>
<div><br>
</div>
<div>
<div>this =
{VirtualThread$VThreadContinuation@1775} </div>
<div> target =
{VirtualThread$VThreadContinuation$lambda@1777} </div>
<div> arg$1 = {VirtualThread@1699} </div>
<div> scheduler = {ForkJoinPool@1781} </div>
<div> cont =
{VirtualThread$VThreadContinuation@1775} </div>
<div> runContinuation =
{VirtualThread$lambda@1782} </div>
<div> state = 2</div>
<div> parkPermit = true</div>
<div> carrierThread = null</div>
<div> termination = null</div>
<div> eetop = 0</div>
<div> tid = 76</div>
<div> name = ""</div>
<div> interrupted = false</div>
<div> contextClassLoader =
{ClassLoaders$AppClassLoader@1784} </div>
<div> inheritedAccessControlContext =
{AccessControlContext@1785} </div>
<div> holder = null</div>
<div> threadLocals = null</div>
<div> inheritableThreadLocals = null</div>
<div> extentLocalBindings = null</div>
<div> interruptLock = {Object@1786} </div>
<div> parkBlocker = null</div>
<div> nioBlocker = null</div>
<div> Thread.cont = null</div>
<div> uncaughtExceptionHandler = null</div>
<div> threadLocalRandomSeed = 0</div>
<div> threadLocalRandomProbe = 0</div>
<div> threadLocalRandomSecondarySeed =
0</div>
<div> container =
{ThreadContainers$RootContainer$CountingRootContainer@1787} </div>
<div> headStackableScopes = null</div>
<div> arg$2 =
{Connection$ConnectionWriter@1780} </div>
<div> scope = {ContinuationScope@1776} </div>
<div> parent = null</div>
<div> child = null</div>
<div> tail = {StackChunk@1778} </div>
<div> done = false</div>
<div> mounted = false</div>
<div> yieldInfo = null</div>
<div> preempted = false</div>
<div> extentLocalCache = null</div>
<div>scope = {ContinuationScope@1776} </div>
<div>child = null</div>
</div>
<div><br>
</div>
<div>As you see in the above, the
parkPermit is true, but it never runs
again.</div>
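<div><br>
</div>
<div>A minimal standalone sketch of the park/unpark pattern described above, offered only as a starting point for a reproducer. It assumes JDK 21 or later (where Thread.ofVirtual() is a final API; on JDK 19 it would need --enable-preview). The class ParkUnparkSketch and the method parkThenUnpark are hypothetical names, not part of jnatsd, and this simplified shape may well not trigger the hang seen in the real server:</div>
<pre>
```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical reproducer shape; this is NOT the jnatsd ConnectionWriter.
final class ParkUnparkSketch {

    // Returns true if the parked virtual thread resumed and finished
    // within timeoutMillis (pass a positive value; 0 would wait forever).
    static boolean parkThenUnpark(long timeoutMillis) {
        AtomicBoolean ready = new AtomicBoolean(false);
        AtomicBoolean resumed = new AtomicBoolean(false);

        Thread writer = Thread.ofVirtual().name("writer").start(() -> {
            // park() may return spuriously, so always re-check the condition.
            while (!ready.get()) {
                LockSupport.park();
            }
            resumed.set(true);
        });

        ready.set(true);            // publish the condition first
        LockSupport.unpark(writer); // then make the permit available

        try {
            // Bounded wait, so a lost wakeup shows up as "false" instead of a hang.
            writer.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return resumed.get();
    }

    public static void main(String[] args) {
        System.out.println("resumed = " + parkThenUnpark(5_000));
    }
}
```
</pre>
<div>Running parkThenUnpark in a loop, or from many threads at once, might be one way to look for a lost wakeup; whether it reproduces the behaviour above is untested.</div>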
<div><br>
</div>
<div>In the latter branch,
‘virtual_threads’, I changed the
lock-free RingBuffer class to use simple
synchronized primitives - under the
assumption that with virtual threads
lock/wait/notify should be highly
efficient. It worked, but it was nearly
2x slower than the original thread based
lock-free implementation. So, I added a
‘spin loop’ in the RingBuffer methods.
This code is completely optional and can
be no-op’d, and I was able to increase
performance to above that of the Thread
based version.</div>
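<div><br>
</div>
<div>The idea of an optional spin phase in front of synchronized wait/notify can be sketched roughly as follows. This is an illustration under stated assumptions, not the jnatsd RingBuffer: SpinThenWaitSlot is a hypothetical single-slot buffer, spinLimit is an invented tuning knob (0 turns the spin into a no-op), and JDK 21 or later is assumed for Thread.ofVirtual():</div>
<pre>
```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-slot buffer; the real RingBuffer is not reproduced here.
final class SpinThenWaitSlot {
    private final int spinLimit;     // 0 disables the spin phase entirely
    private volatile Integer item;   // null means the slot is empty

    SpinThenWaitSlot(int spinLimit) {
        this.spinLimit = Math.max(0, spinLimit);
    }

    void put(int value) throws InterruptedException {
        spinWhileSlotIs(true);       // optimistic: wait briefly for the slot to drain
        synchronized (this) {
            while (item != null) wait();
            item = value;
            notifyAll();
        }
    }

    int take() throws InterruptedException {
        spinWhileSlotIs(false);      // optimistic: wait briefly for the slot to fill
        synchronized (this) {
            while (item == null) wait();
            int value = item;
            item = null;
            notifyAll();
            return value;
        }
    }

    // Bounded busy-wait; gives up after spinLimit checks and lets the
    // synchronized block above do the real waiting.
    private void spinWhileSlotIs(boolean full) {
        for (int i = 0; i != spinLimit; i++) {
            if ((item != null) != full) return;
            Thread.onSpinWait();
        }
    }

    // Sends 1..count (count must be non-negative) from the calling thread
    // to a virtual consumer and returns the sum the consumer computed.
    static long transfer(int count, int spinLimit) {
        SpinThenWaitSlot slot = new SpinThenWaitSlot(spinLimit);
        AtomicLong sum = new AtomicLong();
        Thread consumer = Thread.ofVirtual().start(() -> {
            try {
                for (int i = 0; i != count; i++) sum.addAndGet(slot.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            for (int i = 1; i != count + 1; i++) slot.put(i);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum.get();
    }

    public static void main(String[] args) {
        System.out.println("sum = " + transfer(100, 1_000));
    }
}
```
</pre>
<div>Because the spin is bounded, a spinning thread cannot hold its carrier indefinitely. One caveat: on JDKs before 24, a virtual thread that blocks inside a synchronized block stays pinned to its carrier thread.</div>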
<div><br>
</div>
<div>I dug a little deeper, and decided
that using Thread.yield() should be even
more efficient than
LockSupport.parkNanos(1) - problem is
that changing that simple line brings
back the hangs. I think there is very
little semantic difference between
LockSupport.parkNanos(1) and
Thread.yield() but the latter should
avoid any timer scheduling. The
RingBuffer code there is fairly trivial.</div>
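<div><br>
</div>
<div>For comparison, the two backoff choices discussed above can be put side by side in a small sketch. BackoffSketch and awaitFlag are hypothetical names, and JDK 21 or later is assumed. The flag is deliberately set from the calling platform thread, so the setter cannot be starved by the waiter; this makes the sketch safe to run but also means it does not exercise the virtual-to-virtual scheduling path where the hang was observed:</div>
<pre>
```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical comparison harness, not code from jnatsd.
final class BackoffSketch {

    // Runs a virtual thread that backs off until the flag is set.
    // Returns the number of backoff calls it made, or -1 if it did
    // not finish within timeoutMillis.
    static long awaitFlag(boolean useYield, long timeoutMillis) {
        AtomicBoolean flag = new AtomicBoolean(false);
        AtomicLong backoffs = new AtomicLong();

        Thread waiter = Thread.ofVirtual().start(() -> {
            while (!flag.get()) {
                if (useYield) {
                    Thread.yield();            // scheduling hint only
                } else {
                    LockSupport.parkNanos(1);  // timed park with a 1 ns deadline
                }
                backoffs.incrementAndGet();
            }
        });

        try {
            Thread.sleep(20);                  // give the waiter time to back off a few times
            flag.set(true);
            waiter.join(timeoutMillis);
        } catch (InterruptedException e) {
            flag.set(true);                    // do not leave the waiter looping
            Thread.currentThread().interrupt();
        }
        return waiter.isAlive() ? -1 : backoffs.get();
    }

    public static void main(String[] args) {
        System.out.println("yield backoffs     = " + awaitFlag(true, 5_000));
        System.out.println("parkNanos backoffs = " + awaitFlag(false, 5_000));
    }
}
```
</pre>
<div>A result of -1 would indicate that the waiter never observed the flag within the timeout. The backoff counts are only a rough indication of how often each variant hands control back, not a benchmark.</div>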
<div><br>
</div>
<div>So, before I dig deeper, is it a
known issue that Thread.yield() does not
work as expected? Is it a known issue
that LockSupport.unpark() fails to
reschedule threads?</div>
<div><br>
</div>
<div>Could it be because
virtual threads do not implement the Java
memory model properly?</div>
<div><br>
</div>
<div>Any ideas how to further diagnose?</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
</div>
</div>
-- <br>
<div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Dr Heinz M. Kabutz (PhD
CompSci)<br>
Author of "The Java(tm) Specialists' Newsletter"<br>
Sun/Oracle Java Champion<br>
JavaOne Rockstar Speaker<br>
<a href="http://www.javaspecialists.eu" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">http://www.javaspecialists.eu</a><br>
Tel: +30 69 75 595 262<br>
Skype: kabutz<br>
</div>
</div>
</blockquote>
</blockquote>
</div></blockquote></body></html>