EA builds with changes to object monitor implementation to avoid pinning with virtual threads

masoud parvari masoud.parvari at gmail.com
Wed Feb 28 13:17:59 UTC 2024


I modified the test a bit, removed the platform thread from the picture for
now, and managed to reproduce the hang again.

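import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadHangingTest {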

    public static void main(String[] args) throws InterruptedException {
        int size = 1000;
        Thread start = Thread.ofVirtual().start(() -> {
            try {
                test(size);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        start.join();
    }

    private static void test(int size) throws InterruptedException {
        ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor();
        CountDownLatch latch = new CountDownLatch(size);
        AtomicBoolean notify = new AtomicBoolean();
        Object object = new Object();
        AtomicInteger waiting = new AtomicInteger();
        AtomicInteger started = new AtomicInteger();
        long start = System.currentTimeMillis();
        for (int i = 0; i < size; i++) {
            executorService.submit(() -> {
                started.getAndIncrement();
                synchronized (object) {
                    try {
                        if (!notify.get()) {
                            waiting.getAndIncrement();
                            object.wait();
                        }
                    } catch (InterruptedException e) {
                        //do nothing;
                    }
                }
                latch.countDown();
            });
        }
        //expensive operation before notify()
        Thread.sleep(5000);
        synchronized (object) {
            notify.set(true);
            object.notifyAll();
        }
        System.out.println(String.format("notified. started: %s , waiting: %s",
                started.get(), waiting.get()));
        latch.await();
        long duration = System.currentTimeMillis() - start;
        System.out.println("took " + duration + " milliseconds");
    }
}

With size 1000, it hangs most of the time. When it hangs, I usually see the
line "notified. started ...", but sometimes I don't even see that line,
meaning that in some scenarios the thread running the test method doesn't
wake up from sleep at all.

The wait set is also always a lot smaller than 256, and I don't understand
why. Based on the compensation algorithm, I expected to always see up to
256 threads waiting. Does this mean that releasing the monitor in
object.wait() doesn't always wake up threads that are competing to enter
the monitor? If you run the same code with platform threads, the wait set
is always equal to size.
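
A minimal sketch of that comparison, assuming JDK 21 or later. The class
and method names are made up for illustration and are not part of the test
above. It counts the threads that are inside wait() at the moment
notifyAll() is called, and instead of a fixed sleep it polls until either
every task is waiting or a deadline passes:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the original test.
public class WaitSetComparison {

    // Returns how many tasks were inside lock.wait() when notifyAll() was called.
    static int waitersAtNotify(ThreadFactory factory, int size, long settleMillis)
            throws InterruptedException {
        Object lock = new Object();
        boolean[] notified = {false};            // guarded by lock
        AtomicInteger waiting = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(size);
        try (ExecutorService executor = Executors.newThreadPerTaskExecutor(factory)) {
            for (int i = 0; i < size; i++) {
                executor.submit(() -> {
                    try {
                        synchronized (lock) {
                            if (!notified[0]) {
                                waiting.incrementAndGet();   // counted once per task
                                while (!notified[0]) {       // guards against spurious wakeups
                                    lock.wait();
                                }
                            }
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            // Give the tasks time to reach wait(); stop early once all of them have.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(settleMillis);
            while (waiting.get() < size && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            int seen;
            synchronized (lock) {
                // Holding the monitor here means every counted task has released it in wait().
                seen = waiting.get();
                notified[0] = true;
                lock.notifyAll();
            }
            done.await();
            return seen;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int size = 1000;
        System.out.println("platform: "
                + waitersAtNotify(Thread.ofPlatform().factory(), size, 2000));
        System.out.println("virtual:  "
                + waitersAtNotify(Thread.ofVirtual().factory(), size, 2000));
    }
}
```

Here main() runs on a platform thread, so this variant is not expected to
reproduce the hang by itself; it only makes the wait-set sizes comparable.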

I even observed the hang with size 200, for example, but it happened far
less often than with size 1000. So I don't know what the minimum number is
that can trigger the hang here.

Btw, is there any way to override the hard limit of 256 on managed blocking
through a VM argument or something?
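
One candidate, stated as an assumption rather than something verified
against the EA build: the JDK 21 documentation for java.lang.Thread lists
the system properties jdk.virtualThreadScheduler.parallelism and
jdk.virtualThreadScheduler.maxPoolSize, the latter documented as defaulting
to 256. Whether it also governs the compensation limit for Object.wait() in
the EA builds is not confirmed here. The class below (a hypothetical name)
only reports what was passed on the command line; the fallbacks it prints
are the documented defaults, not values read from the scheduler:

```java
// Hypothetical helper that echoes the scheduler-related system properties.
public class SchedulerSettings {

    static final String PARALLELISM = "jdk.virtualThreadScheduler.parallelism";
    static final String MAX_POOL_SIZE = "jdk.virtualThreadScheduler.maxPoolSize";

    // Returns the value given with -D<property>=<n>, or the fallback if it is not set.
    static int configured(String property, int fallback) {
        return Integer.getInteger(property, fallback);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(PARALLELISM + " = " + configured(PARALLELISM, cores));
        System.out.println(MAX_POOL_SIZE + " = " + configured(MAX_POOL_SIZE, 256));
    }
}
```

It could be run as, for example,
java -Djdk.virtualThreadScheduler.maxPoolSize=1024 SchedulerSettings
and the same -D option added to the test's command line to see whether the
wait set grows.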

On Wed, Feb 28, 2024 at 11:17 AM Alan Bateman <Alan.Bateman at oracle.com>
wrote:

> On 28/02/2024 05:31, Patricio Chilano Mateo wrote:
> > The reason why you see this difference is because with platform
> > threads all 1000 threads will be started and will wait simultaneously,
> > as releasing the monitor on wait() allows another thread to enter it
> > and wait() too. For virtual threads, since we still pin on
> > Object.wait(), we can only run as many threads as workers in the FJP.
> > So the code will behave like running in batches, where after the first
> > batch finishes waiting, the next one will run. We actually compensate
> > on Object.wait() until a max pool size of 256, which will give you
> > around 4 batches, so that explains the 4x-5x you are seeing. If you
> > increase the wait time you will see this more clearly. This behavior
> > will be fixed once we remove pinning on Object.wait(). As for the
> > difference between virtual threads themselves against jdk21 I see a
> > difference too. I'll need to investigate a bit more to check exactly
> > where the overhead is coming from.
>
> It's an unusual test. One thing that would be interesting to look at is
> what the actual wait duration is. The creation of 1000 platform threads
> takes some time and a lot of the threads will already be in the wait set
> by the time that the main thread enters the monitor and notifies. This
> will have the effect that many, maybe all, of the platform threads don't
> actually wait 100ms. With JDK 21 and the Loom EA builds then at most 256
> threads will be in the wait set. The other thing is that the virtual
> threads contending on the monitor enter will unmount in the Loom EA
> builds whereas the pinning at contended monitorenter with JDK 21 means
> that there is no rescheduling whatsoever going on.
>
> -Alan
>


More information about the loom-dev mailing list