Project Loom VirtualThreads hang
Eric Kolotyluk
eric at kolotyluk.net
Wed Dec 28 01:28:48 UTC 2022
My 2 cents...
Right, Virtual Threads are not preemptible, and as Ron explained
previously, it would be hard to make them preemptible...
If a thread spins, that can be a problem, but I don't know that "all
other threads will be blocked." I'm also not sure why it's assumed that
the fork-join pool is not stealing work...
My intuition would be that Thread.yield() should move the current thread
to the end of the run queue, as I have often used this in the past to
serialize operations and achieve similar effects. If it does not move
the current thread to the end of the queue, I would love to know why.
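
For what it's worth, here is a minimal sketch of the pattern I mean
(illustrative only, not from jnatsd), where yield() is expected to hand
the carrier to other runnable virtual threads:

import java.util.concurrent.atomic.AtomicBoolean;

public class YieldSpinExample {
    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);

        Thread waiter = Thread.ofVirtual().start(() -> {
            // Busy-wait, but yield on every iteration so that (by my
            // intuition) the spinning thread goes to the back of the
            // run queue and other virtual threads get a turn.
            while (!ready.get()) {
                Thread.yield();
            }
            System.out.println("saw ready flag");
        });

        Thread setter = Thread.ofVirtual().start(() -> ready.set(true));

        setter.join();
        waiter.join();
    }
}
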
Cheers, Eric
On 2022-12-27 4:44 p.m., Robert Engels wrote:
> Bummer. That is a really significant limitation.
>
> At a minimum it seems Thread.yield() should put the calling thread at
> the end of the run queue.
>
> It seems this boils down to Loom not having a sense of “fair” thread
> scheduling. But I am also not sure that is the whole story. Even with
> far fewer runnable virtual threads than carrier threads, the virtual
> thread seems stuck in the runnable state but never runs.
>
>> On Dec 27, 2022, at 6:27 PM, Dr Heinz M. Kabutz
>> <heinz at javaspecialists.eu> wrote:
>>
>>
>> Correct.
>>
>> On Wed, 28 Dec 2022 at 00:54, robert engels <rengels at ix.netcom.com>
>> wrote:
>>
>> Further diagnosis seems to show that virtual threads are not
>> preemptible - and it seems that the fork-join pool is not
>> stealing work, so if one thread spins - all other threads will be
>> blocked.
>>
>> Does this sound reasonable? If so, this seems like a significant
>> limitation which will cause all sorts of spin/lock-free code to
>> fail under virtual threads.
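>>
>> Roughly the kind of reproducer I have in mind (illustrative only,
>> not the jnatsd code; running with
>> -Djdk.virtualThreadScheduler.parallelism=1 makes the effect easy
>> to see):
>>
>> import java.util.concurrent.locks.LockSupport;
>>
>> public class SpinStarvation {
>>     public static void main(String[] args) throws InterruptedException {
>>         Thread parked = Thread.ofVirtual().start(() -> {
>>             LockSupport.park();
>>             System.out.println("woke up");   // may never print
>>         });
>>
>>         // Pure spin: this virtual thread never blocks, so it never
>>         // unmounts, and its carrier thread is never released.
>>         Thread.ofVirtual().start(() -> {
>>             while (true) { }
>>         });
>>
>>         Thread.sleep(100);
>>         LockSupport.unpark(parked);   // the permit is granted...
>>         parked.join();                // ...but this can hang if no carrier frees up
>>     }
>> }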
>>
>>
>>> On Dec 27, 2022, at 4:27 PM, robert engels
>>> <rengels at ix.netcom.com> wrote:
>>>
>>> Hi devs,
>>>
>>> First,
>>>
>>> Thanks for this amazing work!!! It literally solves the only
>>> remaining problem Java had.
>>>
>>> Sorry for the long email.
>>>
>>> I have been very excited to test-drive Project Loom in JDK19. I
>>> have extensive experience in highly concurrent systems/HFT/HPC,
>>> so I usually :) know what I am doing.
>>>
>>> For the easiest test, I took a highly threaded (connection-based)
>>> server system (a Java port of Go’s nats.io <http://nats.io/>
>>> message broker) and converted its threads to virtual threads. The
>>> project (jnatsd) is available here
>>> <https://github.com/robaho/jnatsd>. The ‘master’ branch runs
>>> very well with excellent performance, but I thought switching to
>>> virtual threads might improve things over using async IO,
>>> channels, etc. (I have a branch for that as well; it works, but
>>> it is much more complex and didn’t provide a huge performance
>>> benefit).
>>>
>>> There are two branches ’simple_virtual_threads’ and
>>> ‘virtual_threads’.
>>>
>>> In the former, it is literally a two-line change to enable
>>> virtual threads, but it doesn’t work. I narrowed it down to the
>>> issue that LockSupport.unpark(thread) does not work
>>> consistently. At some point, the virtual thread is never
>>> scheduled again. I enabled the debug options, and I see that the
>>> virtual thread is in:
>>>
>>> yield0:365, Continuation (jdk.internal.vm)
>>> yield:357, Continuation (jdk.internal.vm)
>>> yieldContinuation:370, VirtualThread (java.lang)
>>> park:499, VirtualThread (java.lang)
>>> parkVirtualThread:2606, System$2 (java.lang)
>>> park:54, VirtualThreads (jdk.internal.misc)
>>> park:369, LockSupport (java.util.concurrent.locks)
>>> run:88, Connection$ConnectionWriter (com.robaho.jnatsd)
>>> run:287, VirtualThread (java.lang)
>>> lambda$new$0:174, VirtualThread$VThreadContinuation (java.lang)
>>> run:-1, VirtualThread$VThreadContinuation$$Lambda$50/0x0000000801065670 (java.lang)
>>> enter0:327, Continuation (jdk.internal.vm)
>>> enter:320, Continuation (jdk.internal.vm)
>>> The instance state is:
>>>
>>> this = {VirtualThread$VThreadContinuation at 1775}
>>> target = {VirtualThread$VThreadContinuation$lambda at 1777}
>>> arg$1 = {VirtualThread at 1699}
>>> scheduler = {ForkJoinPool at 1781}
>>> cont = {VirtualThread$VThreadContinuation at 1775}
>>> runContinuation = {VirtualThread$lambda at 1782}
>>> state = 2
>>> parkPermit = true
>>> carrierThread = null
>>> termination = null
>>> eetop = 0
>>> tid = 76
>>> name = ""
>>> interrupted = false
>>> contextClassLoader = {ClassLoaders$AppClassLoader at 1784}
>>> inheritedAccessControlContext = {AccessControlContext at 1785}
>>> holder = null
>>> threadLocals = null
>>> inheritableThreadLocals = null
>>> extentLocalBindings = null
>>> interruptLock = {Object at 1786}
>>> parkBlocker = null
>>> nioBlocker = null
>>> Thread.cont = null
>>> uncaughtExceptionHandler = null
>>> threadLocalRandomSeed = 0
>>> threadLocalRandomProbe = 0
>>> threadLocalRandomSecondarySeed = 0
>>> container =
>>> {ThreadContainers$RootContainer$CountingRootContainer at 1787}
>>> headStackableScopes = null
>>> arg$2 = {Connection$ConnectionWriter at 1780}
>>> scope = {ContinuationScope at 1776}
>>> parent = null
>>> child = null
>>> tail = {StackChunk at 1778}
>>> done = false
>>> mounted = false
>>> yieldInfo = null
>>> preempted = false
>>> extentLocalCache = null
>>> scope = {ContinuationScope at 1776}
>>> child = null
>>>
>>> As you can see above, the parkPermit is true, but the thread never
>>> runs again.
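>>>
>>> The writer loop that is parking there is essentially the following
>>> pattern (a simplified sketch, not the actual jnatsd source):
>>>
>>> import java.util.Queue;
>>> import java.util.concurrent.ConcurrentLinkedQueue;
>>> import java.util.concurrent.locks.LockSupport;
>>>
>>> class WriterLoop implements Runnable {
>>>     private final Queue<byte[]> queue = new ConcurrentLinkedQueue<>();
>>>     private volatile Thread writer;
>>>
>>>     public void run() {
>>>         writer = Thread.currentThread();
>>>         while (true) {
>>>             byte[] msg = queue.poll();
>>>             if (msg == null) {
>>>                 LockSupport.park();   // the park shown in the trace above
>>>                 continue;
>>>             }
>>>             // ... write msg to the socket ...
>>>         }
>>>     }
>>>
>>>     void offer(byte[] msg) {
>>>         queue.add(msg);
>>>         LockSupport.unpark(writer);   // should make the writer runnable again
>>>     }
>>> }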
>>>
>>> In the latter branch, ‘virtual_threads’, I changed the lock-free
>>> RingBuffer class to use simple synchronized primitives - under
>>> the assumption that with virtual threads lock/wait/notify should
>>> be highly efficient. It worked, but it was nearly 2x slower than
>>> the original thread-based lock-free implementation. So I added
>>> a ’spin loop’ to the RingBuffer methods. This code is completely
>>> optional and can be no-op’d, and with it I was able to increase
>>> performance to above that of the thread-based version.
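>>>
>>> The change was roughly of this shape (a simplified sketch, not the
>>> actual RingBuffer class): a synchronized put/take, with the optional
>>> bounded spin in front of the blocking path.
>>>
>>> import java.util.concurrent.locks.LockSupport;
>>>
>>> class SpinThenBlockBuffer<T> {
>>>     private final Object[] ring;
>>>     private int count, putIdx, getIdx;
>>>
>>>     SpinThenBlockBuffer(int capacity) { ring = new Object[capacity]; }
>>>
>>>     private synchronized boolean offer(T item) {
>>>         if (count == ring.length) return false;
>>>         ring[putIdx] = item;
>>>         putIdx = (putIdx + 1) % ring.length;
>>>         count++;
>>>         notifyAll();
>>>         return true;
>>>     }
>>>
>>>     void put(T item) throws InterruptedException {
>>>         // The optional spin: it can be no-op'd, and it is what
>>>         // recovered the throughput lost by the plain synchronized
>>>         // version.
>>>         for (int i = 0; i < 100; i++) {
>>>             if (offer(item)) return;
>>>             LockSupport.parkNanos(1);
>>>         }
>>>         synchronized (this) {
>>>             while (!offer(item)) wait();
>>>         }
>>>     }
>>>
>>>     synchronized T take() throws InterruptedException {
>>>         while (count == 0) wait();
>>>         @SuppressWarnings("unchecked")
>>>         T item = (T) ring[getIdx];
>>>         ring[getIdx] = null;
>>>         getIdx = (getIdx + 1) % ring.length;
>>>         count--;
>>>         notifyAll();
>>>         return item;
>>>     }
>>> }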
>>>
>>> I dug a little deeper and decided that using Thread.yield()
>>> should be even more efficient than LockSupport.parkNanos(1) - the
>>> problem is that changing that one line brings back the hangs.
>>> I think there is very little semantic difference between
>>> LockSupport.parkNanos(1) and Thread.yield(), but the latter
>>> should avoid any timer scheduling. The RingBuffer code there is
>>> fairly trivial.
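>>>
>>> A minimal self-contained version of the kind of handoff I mean
>>> (again illustrative, not the actual RingBuffer code) - swapping
>>> parkNanos(1) for Thread.yield() is the one-line change in question:
>>>
>>> import java.util.concurrent.atomic.AtomicInteger;
>>> import java.util.concurrent.locks.LockSupport;
>>>
>>> public class YieldPingPong {
>>>     static final AtomicInteger turn = new AtomicInteger(0);
>>>     static final int ROUNDS = 100_000;
>>>
>>>     static void play(int me) {
>>>         for (int i = 0; i < ROUNDS; i++) {
>>>             while (turn.get() != me) {
>>>                 // The line in question: in jnatsd, parkNanos(1)
>>>                 // works here, while Thread.yield() brings back
>>>                 // the hang.
>>>                 Thread.yield();
>>>                 // LockSupport.parkNanos(1);
>>>             }
>>>             turn.set(1 - me);   // hand the turn to the other player
>>>         }
>>>     }
>>>
>>>     public static void main(String[] args) throws InterruptedException {
>>>         Thread a = Thread.ofVirtual().start(() -> play(0));
>>>         Thread b = Thread.ofVirtual().start(() -> play(1));
>>>         a.join();
>>>         b.join();
>>>         System.out.println("done");
>>>     }
>>> }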
>>>
>>> So, before I dig deeper, is this a known issue that
>>> Thread.yield() does not work as expected? Is it a known issue
>>> that LockSupport.unpark() fails to reschedule threads?
>>>
>>> Could it be that the VirtualThread implementation does not honor
>>> the Java memory model properly?
>>>
>>> Any ideas how to further diagnose?
>>>
>>>
>>
>> --
>> Dr Heinz M. Kabutz (PhD CompSci)
>> Author of "The Java(tm) Specialists' Newsletter"
>> Sun/Oracle Java Champion
>> JavaOne Rockstar Speaker
>> http://www.javaspecialists.eu
>> Tel: +30 69 75 595 262
>> Skype: kabutz