Experience using virtual threads in EA 23-loom+4-102
Ron Pressler
ron.pressler at oracle.com
Sat Jul 6 20:38:16 UTC 2024
> On 6 Jul 2024, at 19:20, Matthew Swift <matthew.swift at gmail.com> wrote:
>
> Thanks everyone for the helpful responses and continued discussion.
>
> Returning to my original message about high tail latencies when using virtual threads compared with platform threads, Viktor's response, in particular, prompted me to question whether I was missing something obvious. I fully agree that the high tail latencies are due to oversaturation, but I still wanted to understand why the tail latencies were an order of magnitude worse when using virtual threads.
>
> I'm pleased to report (and slightly embarrassed!) that the cause of the bigger tail latencies is slightly more banal than subtle scheduling characteristics of the FJ pool, etc.
>
> As a reminder, the test involved a client sustaining 120[*] concurrent database transactions, each touching and locking over 400 DB records. The fixed size platform thread pool had 40 threads which had the effect of limiting concurrency to 40 simultaneous transactions. I naively swapped the platform thread pool for a virtual thread pool, which had the effect of removing the concurrency limits, allowing all 120 transactions to run concurrently and with a corresponding increase in lock thrashing / convoys. To test this I adjusted the platform thread pool to 120 threads and observed exactly the same high tail latencies that I saw with virtual threads. Conversely, the high tail latencies were reduced by an order of magnitude by throttling the writes using a Semaphore. In fact, I was able to reduce the P99.999 latency from over 10 seconds to 200ms with 20 permits, with no impact on P50.
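A minimal sketch of the throttling pattern described above: a virtual-thread-per-task executor with a Semaphore capping concurrent writes at 20 permits. The task count, permit count, and the `performWriteTransaction` helper are illustrative placeholders standing in for the actual benchmark, which is not shown in the thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class ThrottledWrites {
    // 20 permits: the value reported above to cut P99.999 from >10s to 200ms.
    private static final Semaphore WRITE_PERMITS = new Semaphore(20);

    public static void main(String[] args) {
        // One cheap virtual thread per transaction; the Semaphore, not the
        // thread pool size, now limits write concurrency.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 120; i++) {
                final int txn = i;
                executor.submit(() -> {
                    try {
                        WRITE_PERMITS.acquire(); // blocking parks only the virtual thread
                        try {
                            performWriteTransaction(txn);
                        } finally {
                            WRITE_PERMITS.release();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
    }

    // Placeholder for the write transaction touching ~400 locked records.
    private static void performWriteTransaction(int txn) {
    }
}
```

Blocking on the Semaphore is cheap for a virtual thread (the carrier thread is released while parked), which is why JEP 444 recommends this over sizing a thread pool to impose the same limit.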
>
> What's the conclusion here? In hindsight, the guidance in the section "Do not pool virtual threads" of JEP 444 is pretty clear about using Semaphores for limiting concurrent access to limited resources. I think that the platform thread pool that we were using was using a small number of threads (e.g. 2 x #CPUs), as it was optimized for a mix of read txns (CPU-bound, minimal locking) and write txns (low-latency SSD IO, short-lived record-level locks), rather than high-latency network IO. Thus, coincidentally, we never really observed significant lock-thrashing-related tail latencies and, therefore, didn't think we needed to limit concurrent access for writes.
>
> Cheers,
> Matt
> [*] I know virtual threads are intended for use cases involving thousands of concurrent requests. We see virtual threads as an opportunity for implementing features that may block threads for significant periods of time, such as fine-grained rate limiting, distributed transactions, and offloading password hashing to external microservices. Naturally, there will be a corresponding very significant increase in the number of concurrent requests. However, core functionality still needs to perform well and degrade gracefully, hence the test described in this thread.
Very good. Thank you very much for your experience reports. Even reporting on such a "user mistake" is helpful, because this way we know what pitfalls may trip people up.
— Ron
More information about the loom-dev
mailing list