Experience using virtual threads in EA 23-loom+4-102
Matthew Swift
matthew.swift at gmail.com
Fri Jun 21 17:05:18 UTC 2024
Hello again,
As promised, here is my second (shorter I hope!) email sharing feedback on
the recent Loom EA build (23-loom+4-102). It follows up on my previous
email https://mail.openjdk.org/pipermail/loom-dev/2024-June/006788.html.
I performed some experiments using the same application described in my
previous email. However, to properly test the improvements to Object
monitors (synchronized blocks and Object.wait()), I reverted all of the
thread-pinning-related changes that I had made to support virtual threads
on JDK 21. Specifically, I reverted the changes that converted uses of
monitors to ReentrantLock.
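To illustrate the kind of change that was reverted (the class and method
names below are made up for the example, not taken from the application),
the JDK 21 workaround replaced monitor-based waiting with
ReentrantLock/Condition, and this build lets the original monitor code run
unchanged:

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    // JDK 21 workaround (now reverted): ReentrantLock avoided pinning the carrier thread.
    class LockBasedQueue<T> {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition notEmpty = lock.newCondition();
        private final Queue<T> items = new ArrayDeque<>();

        void put(T item) {
            lock.lock();
            try {
                items.add(item);
                notEmpty.signal();
            } finally {
                lock.unlock();
            }
        }

        T take() throws InterruptedException {
            lock.lock();
            try {
                while (items.isEmpty()) {
                    notEmpty.await();   // does not pin the carrier thread on JDK 21
                }
                return items.remove();
            } finally {
                lock.unlock();
            }
        }
    }

    // Original monitor-based code, restored for this experiment: on JDK 21 the
    // Object.wait() below pinned the carrier thread; on 23-loom+4-102 it should not.
    class MonitorBasedQueue<T> {
        private final Queue<T> items = new ArrayDeque<>();

        synchronized void put(T item) {
            items.add(item);
            notify();
        }

        synchronized T take() throws InterruptedException {
            while (items.isEmpty()) {
                wait();
            }
            return items.remove();
        }
    }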
I'm pleased to say that this EA build looks extremely promising! :-)
### Experiment #1: read stress test
* platform threads: 215K/s throughput, CPU 14% idle
* virtual threads: 235K/s throughput, CPU 5% idle.
Comment: there's a slight throughput improvement, but CPU utilization is
slightly higher too. Presumably this is due to the number of carrier
threads being closely matched to the number of CPUs (I noticed
significantly less context switching with virtual threads).
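For context, the two configurations being compared correspond roughly to the
following (the thread count is illustrative; the real test harness differs):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ExecutorConfigs {
        public static void main(String[] args) {
            // Illustrative only - not the actual test harness.

            // Platform-thread configuration: a large fixed pool means many more
            // runnable OS threads than cores, hence more context switching.
            ExecutorService platformPool = Executors.newFixedThreadPool(200);

            // Virtual-thread configuration: one virtual thread per task. The
            // default carrier pool (a ForkJoinPool) sizes itself to the number
            // of available processors, so the number of runnable OS threads
            // stays close to the CPU count.
            ExecutorService virtualPool = Executors.newVirtualThreadPerTaskExecutor();

            platformPool.close();
            virtualPool.close();
        }
    }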
### Experiment #2: heavily indexed write stress test, with 40 clients
* platform threads: 9300/s throughput, CPU 27% idle
* virtual threads: 8800/s throughput, CPU 24% idle.
Comment: there is a ~5% throughput degradation with virtual threads. This is
smaller than the degradation I observed in my previous email after switching
to ReentrantLock, though.
### Experiment #3: extreme heavily indexed write stress test, with 120 clients
* platform threads: 1450/s throughput
* virtual threads: 1450/s throughput (i.e. about the same).
Comment:
This test is intended to stress the internal locking mechanisms as much as
possible and expose any pinning problems.
With JDK 21 virtual threads, the test would sometimes deadlock, and thread
dumps would show 100+ ForkJoinPool carrier threads.
This is no longer the case with the EA build. It looks really solid.
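(For anyone wanting to reproduce this kind of thread dump, note that the
classic jstack output does not include virtual threads. A minimal sketch
using the JDK's HotSpotDiagnosticMXBean is below; the output path is
arbitrary, and jcmd Thread.dump_to_file works equally well.)

    import java.io.IOException;
    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class ThreadDumpExample {
        public static void main(String[] args) throws IOException {
            // Sketch only: write a JSON thread dump that includes virtual threads.
            // The path must be absolute and the file must not already exist.
            HotSpotDiagnosticMXBean diagnostics =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            diagnostics.dumpThreads("/tmp/threads.json",
                    HotSpotDiagnosticMXBean.ThreadDumpFormat.JSON);
        }
    }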
This test does expose one important difference between platform threads and
virtual threads though. Let's take a look at the response times:
Platform threads:
-------------------------------------------------------------------------------
|     Throughput    |                 Response Time                |          |
|    (ops/second)   |                (milliseconds)                |          |
|   recent  average |   recent  average    99.9%   99.99%  99.999% |  err/sec |
-------------------------------------------------------------------------------
...
|   1442.6   1606.6 |   83.097   74.683   448.79   599.79   721.42 |      0.0 |
|   1480.8   1594.0 |   81.125   75.282   442.50   599.79   721.42 |      0.0 |
Virtual threads:
-------------------------------------------------------------------------------
|     Throughput    |                 Response Time                |          |
|    (ops/second)   |                (milliseconds)                |          |
|   recent  average |   recent  average    99.9%   99.99%  99.999% |  err/sec |
-------------------------------------------------------------------------------
...
|   1445.4   1645.3 |   81.375   72.623  3170.89  4798.28  8925.48 |      0.0 |
|   1442.2   1625.0 |   81.047   73.371  3154.12  4798.28  6106.91 |      0.0 |
The outliers with virtual threads are much, much higher. Could this be due
to potential starvation when rescheduling virtual threads in the
ForkJoinPool?
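One way to investigate would be to stream the standard
jdk.VirtualThreadPinned JFR event while the test runs and see whether long
pauses coincide with pinning (the threshold and observation window below are
arbitrary):

    import java.time.Duration;
    import jdk.jfr.consumer.RecordingStream;

    public class PinnedEventWatcher {
        public static void main(String[] args) throws InterruptedException {
            try (RecordingStream rs = new RecordingStream()) {
                // Report virtual threads that park while pinned for more than 20 ms.
                rs.enable("jdk.VirtualThreadPinned")
                  .withThreshold(Duration.ofMillis(20))
                  .withStackTrace();
                rs.onEvent("jdk.VirtualThreadPinned", event ->
                        System.out.println("pinned for " + event.getDuration().toMillis()
                                + " ms in " + event.getThread().getJavaName()));
                rs.startAsync();                     // stream in the background
                Thread.sleep(Duration.ofMinutes(5)); // observe while the stress test runs
            }
        }
    }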
Cheers,
Matt