Project Loom VirtualThreads hang
Eric Kolotyluk
eric at kolotyluk.net
Wed Dec 28 01:28:48 UTC 2022
My 2 cents...
Right, Virtual Threads are not preemptible, and as Ron explained
previously, it would be hard to make them preemptible...
If a thread spins, that can be a problem, but I don't know that "all
other threads will be blocked." I'm also not sure why it's assumed that
the fork-join pool is not stealing work...
My intuition would be that Thread.yield() should move the current thread
to the end of the run queue, as I have often used this in the past to
serialize operations and achieve similar effects. If it does not move
the current thread to the end of the queue, I would love to know why.
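
For what it's worth, here is a minimal sketch of the pattern I mean
(illustrative only, not from jnatsd), where yield() is expected to hand
the carrier to other runnable virtual threads:

import java.util.concurrent.atomic.AtomicBoolean;

public class YieldSpinExample {
    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);

        Thread waiter = Thread.ofVirtual().start(() -> {
            // Busy-wait, but yield on every iteration so that (by my
            // intuition) the spinning thread goes to the back of the
            // run queue and other virtual threads get a turn.
            while (!ready.get()) {
                Thread.yield();
            }
            System.out.println("saw ready flag");
        });

        Thread setter = Thread.ofVirtual().start(() -> ready.set(true));

        setter.join();
        waiter.join();
    }
}
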
Cheers, Eric
On 2022-12-27 4:44 p.m., Robert Engels wrote:
> Bummer. That is a really significant limitation.
>
> At a minimum it seems Thread.yield() should put the calling thread at
> the end of the run queue.
>
> It seems this boils down to Loom not having a sense of “fair” thread
> scheduling. But I am also not sure that is the whole story. Even with
> far fewer runnable virtual threads than carrier threads, the virtual
> thread seems stuck in the runnable state but never runs.
>
>> On Dec 27, 2022, at 6:27 PM, Dr Heinz M. Kabutz
>> <heinz at javaspecialists.eu> wrote:
>>
>>
>> Correct.
>>
>> On Wed, 28 Dec 2022 at 00:54, robert engels <rengels at ix.netcom.com>
>> wrote:
>>
>> Further diagnosis seems to show that virtual threads are not
>> preemptible - and it seems that the fork-join pool is not
>> stealing work, so if one thread spins - all other threads will be
>> blocked.
>>
>> Does this sound reasonable? If so, this seems like a significant
>> limitation which will cause all sorts of spin/lock-free code to
>> fail under virtual threads.
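>>
>> Roughly the kind of reproducer I have in mind (illustrative only,
>> not the jnatsd code; running with
>> -Djdk.virtualThreadScheduler.parallelism=1 makes the effect easy
>> to see):
>>
>> import java.util.concurrent.locks.LockSupport;
>>
>> public class SpinStarvation {
>>     public static void main(String[] args) throws InterruptedException {
>>         Thread parked = Thread.ofVirtual().start(() -> {
>>             LockSupport.park();
>>             System.out.println("woke up");   // may never print
>>         });
>>
>>         // Pure spin: this virtual thread never blocks, so it never
>>         // unmounts, and its carrier thread is never released.
>>         Thread.ofVirtual().start(() -> {
>>             while (true) { }
>>         });
>>
>>         Thread.sleep(100);
>>         LockSupport.unpark(parked);   // the permit is granted...
>>         parked.join();                // ...but this can hang if no carrier frees up
>>     }
>> }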
>>
>>
>>> On Dec 27, 2022, at 4:27 PM, robert engels
>>> <rengels at ix.netcom.com> wrote:
>>>
>>> Hi devs,
>>>
>>> First,
>>>
>>> Thanks for this amazing work!!! It literally solves the only
>>> remaining problem Java had.
>>>
>>> Sorry for the long email.
>>>
>>> I have been very excited to test-drive Project Loom in JDK19. I
>>> have extensive experience in highly concurrent systems/HFT/HPC,
>>> so I usually :) know what I am doing.
>>>
>>> For the easiest test, I took a highly threaded (connection-based)
>>> server system (a Java port of Go’s nats.io <http://nats.io/>
>>> message broker) and converted its threads to virtual threads. The
>>> project (jnatsd) is available here
>>> <https://github.com/robaho/jnatsd>. The ‘master’ branch runs
>>> very well with excellent performance, but I thought switching to
>>> virtual threads might improve things over using async IO,
>>> channels, etc. (I have a branch for that as well; it works, but
>>> it is much more complex and didn’t provide a huge performance
>>> benefit).
>>>
>>> There are two branches ’simple_virtual_threads’ and
>>> ‘virtual_threads’.
>>>
>>> In the former, it is literally a two-line change to enable
>>> virtual threads, but it doesn’t work. I narrowed it down to the
>>> issue that LockSupport.unpark(thread) does not work
>>> consistently. At some point, the virtual thread is never
>>> scheduled again. I enabled the debug options, and I see that the
>>> virtual thread is in:
>>>
>>> yield0:365, Continuation (jdk.internal.vm)
>>> yield:357, Continuation (jdk.internal.vm)
>>> yieldContinuation:370, VirtualThread (java.lang)
>>> park:499, VirtualThread (java.lang)
>>> parkVirtualThread:2606, System$2 (java.lang)
>>> park:54, VirtualThreads (jdk.internal.misc)
>>> park:369, LockSupport (java.util.concurrent.locks)
>>> run:88, Connection$ConnectionWriter (com.robaho.jnatsd)
>>> run:287, VirtualThread (java.lang)
>>> lambda$new$0:174, VirtualThread$VThreadContinuation (java.lang)
>>> run:-1, VirtualThread$VThreadContinuation$$Lambda$50/0x0000000801065670 (java.lang)
>>> enter0:327, Continuation (jdk.internal.vm)
>>> enter:320, Continuation (jdk.internal.vm)
>>> The instance state is:
>>>
>>> this = {VirtualThread$VThreadContinuation at 1775}
>>> target = {VirtualThread$VThreadContinuation$lambda at 1777}
>>> arg$1 = {VirtualThread at 1699}
>>> scheduler = {ForkJoinPool at 1781}
>>> cont = {VirtualThread$VThreadContinuation at 1775}
>>> runContinuation = {VirtualThread$lambda at 1782}
>>> state = 2
>>> parkPermit = true
>>> carrierThread = null
>>> termination = null
>>> eetop = 0
>>> tid = 76
>>> name = ""
>>> interrupted = false
>>> contextClassLoader = {ClassLoaders$AppClassLoader at 1784}
>>> inheritedAccessControlContext = {AccessControlContext at 1785}
>>> holder = null
>>> threadLocals = null
>>> inheritableThreadLocals = null
>>> extentLocalBindings = null
>>> interruptLock = {Object at 1786}
>>> parkBlocker = null
>>> nioBlocker = null
>>> Thread.cont = null
>>> uncaughtExceptionHandler = null
>>> threadLocalRandomSeed = 0
>>> threadLocalRandomProbe = 0
>>> threadLocalRandomSecondarySeed = 0
>>> container =
>>> {ThreadContainers$RootContainer$CountingRootContainer at 1787}
>>> headStackableScopes = null
>>> arg$2 = {Connection$ConnectionWriter at 1780}
>>> scope = {ContinuationScope at 1776}
>>> parent = null
>>> child = null
>>> tail = {StackChunk at 1778}
>>> done = false
>>> mounted = false
>>> yieldInfo = null
>>> preempted = false
>>> extentLocalCache = null
>>> scope = {ContinuationScope at 1776}
>>> child = null
>>>
>>> As you can see above, the parkPermit is true, but the thread never
>>> runs again.
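>>>
>>> The writer loop that is parking there is essentially the following
>>> pattern (a simplified sketch, not the actual jnatsd source):
>>>
>>> import java.util.Queue;
>>> import java.util.concurrent.ConcurrentLinkedQueue;
>>> import java.util.concurrent.locks.LockSupport;
>>>
>>> class WriterLoop implements Runnable {
>>>     private final Queue<byte[]> queue = new ConcurrentLinkedQueue<>();
>>>     private volatile Thread writer;
>>>
>>>     public void run() {
>>>         writer = Thread.currentThread();
>>>         while (true) {
>>>             byte[] msg = queue.poll();
>>>             if (msg == null) {
>>>                 LockSupport.park();   // the park shown in the trace above
>>>                 continue;
>>>             }
>>>             // ... write msg to the socket ...
>>>         }
>>>     }
>>>
>>>     void offer(byte[] msg) {
>>>         queue.add(msg);
>>>         LockSupport.unpark(writer);   // should make the writer runnable again
>>>     }
>>> }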
>>>
>>> In the latter branch, ‘virtual_threads’, I changed the lock-free
>>> RingBuffer class to use simple synchronized primitives - under
>>> the assumption that with virtual threads lock/wait/notify should
>>> be highly efficient. It worked, but it was nearly 2x slower than
>>> the original thread-based lock-free implementation. So I added
>>> a ’spin loop’ to the RingBuffer methods. This code is completely
>>> optional and can be no-op’d, and with it I was able to increase
>>> performance to above that of the thread-based version.
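>>>
>>> The change was roughly of this shape (a simplified sketch, not the
>>> actual RingBuffer class): a synchronized put/take, with the optional
>>> bounded spin in front of the blocking path.
>>>
>>> import java.util.concurrent.locks.LockSupport;
>>>
>>> class SpinThenBlockBuffer<T> {
>>>     private final Object[] ring;
>>>     private int count, putIdx, getIdx;
>>>
>>>     SpinThenBlockBuffer(int capacity) { ring = new Object[capacity]; }
>>>
>>>     private synchronized boolean offer(T item) {
>>>         if (count == ring.length) return false;
>>>         ring[putIdx] = item;
>>>         putIdx = (putIdx + 1) % ring.length;
>>>         count++;
>>>         notifyAll();
>>>         return true;
>>>     }
>>>
>>>     void put(T item) throws InterruptedException {
>>>         // The optional spin: it can be no-op'd, and it is what
>>>         // recovered the throughput lost by the plain synchronized
>>>         // version.
>>>         for (int i = 0; i < 100; i++) {
>>>             if (offer(item)) return;
>>>             LockSupport.parkNanos(1);
>>>         }
>>>         synchronized (this) {
>>>             while (!offer(item)) wait();
>>>         }
>>>     }
>>>
>>>     synchronized T take() throws InterruptedException {
>>>         while (count == 0) wait();
>>>         @SuppressWarnings("unchecked")
>>>         T item = (T) ring[getIdx];
>>>         ring[getIdx] = null;
>>>         getIdx = (getIdx + 1) % ring.length;
>>>         count--;
>>>         notifyAll();
>>>         return item;
>>>     }
>>> }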
>>>
>>> I dug a little deeper and decided that using Thread.yield()
>>> should be even more efficient than LockSupport.parkNanos(1) - the
>>> problem is that changing that one line brings back the hangs.
>>> I think there is very little semantic difference between
>>> LockSupport.parkNanos(1) and Thread.yield(), but the latter
>>> should avoid any timer scheduling. The RingBuffer code there is
>>> fairly trivial.
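>>>
>>> A minimal self-contained version of the kind of handoff I mean
>>> (again illustrative, not the actual RingBuffer code) - swapping
>>> parkNanos(1) for Thread.yield() is the one-line change in question:
>>>
>>> import java.util.concurrent.atomic.AtomicInteger;
>>> import java.util.concurrent.locks.LockSupport;
>>>
>>> public class YieldPingPong {
>>>     static final AtomicInteger turn = new AtomicInteger(0);
>>>     static final int ROUNDS = 100_000;
>>>
>>>     static void play(int me) {
>>>         for (int i = 0; i < ROUNDS; i++) {
>>>             while (turn.get() != me) {
>>>                 // The line in question: in jnatsd, parkNanos(1)
>>>                 // works here, while Thread.yield() brings back
>>>                 // the hang.
>>>                 Thread.yield();
>>>                 // LockSupport.parkNanos(1);
>>>             }
>>>             turn.set(1 - me);   // hand the turn to the other player
>>>         }
>>>     }
>>>
>>>     public static void main(String[] args) throws InterruptedException {
>>>         Thread a = Thread.ofVirtual().start(() -> play(0));
>>>         Thread b = Thread.ofVirtual().start(() -> play(1));
>>>         a.join();
>>>         b.join();
>>>         System.out.println("done");
>>>     }
>>> }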
>>>
>>> So, before I dig deeper, is this a known issue that
>>> Thread.yield() does not work as expected? Is it a known issue
>>> that LockSupport.unpark() fails to reschedule threads?
>>>
>>> Could it be that the VirtualThread implementation does not honor
>>> the Java memory model properly?
>>>
>>> Any ideas how to further diagnose?
>>>
>>>
>>
>> --
>> Dr Heinz M. Kabutz (PhD CompSci)
>> Author of "The Java(tm) Specialists' Newsletter"
>> Sun/Oracle Java Champion
>> JavaOne Rockstar Speaker
>> http://www.javaspecialists.eu
>> Tel: +30 69 75 595 262
>> Skype: kabutz