Experience using virtual threads in EA 23-loom+4-102

Robert Engels robaho at icloud.com
Mon Jun 24 20:12:45 UTC 2024


I still think it might be helpful to use virtual threads for all connections (simplicity!) - but when performing CPU-intensive work like hashing, submit a callable/future to a “cpu only” executor with a capped number of platform threads and join(). It should be a trivial refactor of the code.
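A minimal sketch of that split, assuming Java 21+ (the class name, pool size, and hash() stand-in are placeholders, not from this thread):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitExecutorSketch {
    // Capped platform-thread pool reserved for CPU-bound work.
    // Sizing it to the core count is an assumption, not a recommendation.
    static final ExecutorService CPU_POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public static void main(String[] args) throws Exception {
        // One virtual thread per connection/request: blocking here is cheap.
        try (ExecutorService vts = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> result = vts.submit(SplitExecutorSketch::handleRequest);
            System.out.println(result.get());
        } finally {
            CPU_POOL.shutdown();
        }
    }

    // Runs on a virtual thread; does the (blocking) IO itself and
    // hands only the CPU-intensive step to the capped pool.
    static String handleRequest() throws Exception {
        String payload = "password"; // stand-in for data read from the connection
        Future<String> hashed = CPU_POOL.submit(() -> hash(payload));
        return hashed.get(); // blocks the virtual thread, freeing its carrier
    }

    // Stand-in for an expensive hash such as PBKDF2 or Argon2.
    static String hash(String s) {
        return Integer.toHexString(s.hashCode());
    }
}
```

The point is that the virtual thread blocks in get() and releases its carrier, while the fixed pool caps how many CPU-heavy hashes run concurrently.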

The problem with using VTs for everything is that a virtual thread is not time-sliced, so CPU-bound tasks can quickly consume all of the carrier threads, and then you make no progress on the IO (fan-out) requests - which is especially bad if they are simply calling out to other servers (less bad if doing lots of local disk IO).

> On Jun 24, 2024, at 12:05 PM, Matthew Swift <matthew.swift at gmail.com> wrote:
> 
> Thanks Robert.
> 
> The main issue we face with our application is that the client load can vary substantially over time. For example, we might experience a lot of CPU-intensive authentication traffic (e.g. PBKDF2 hashing) in the morning, but then a lot of IO-bound traffic at other times. It's hard to find the ideal number of worker threads: many threads work well for IO-bound traffic, as you say, but sacrifice performance when the load is more CPU-bound. On my 10-core (20 hyper-threads) laptop, I observe a 15-20% drop in throughput when subjecting the server to 1200 concurrent CPU-bound requests, but a much smaller drop when using virtual threads:
> 
> * 10 platform threads: ~260K requests/s (note: this is too few threads for more IO bound traffic)
> * 40 platform threads: ~220K requests/s
> * 1200 platform threads: ~220K requests/s (note: this is the equivalent of one platform thread per request)
> * virtual threads: ~252K requests/s (note: the FJ pool defaults to 20 carrier threads on my laptop - I didn't try disabling hyperthreading).
> 
> I find the "one size fits all" model provided by virtual threads much easier for developers and our users alike. I don't have to worry about complex architectures involving split thread pools (one for CPU, one for IO), etc. We also have to deal with slow, misbehaving clients, which has meant using async IO and hard-to-debug callback hell :-) All of this goes away with virtual threads, as they will allow us to use simpler blocking network IO and a simple one-thread-per-request design that is much more tolerant of heterogeneous traffic patterns.
> 
> It also opens up the possibility of future enhancements that would definitely shine with virtual threads, as you suggest. For example, modern hashing algorithms, such as Argon2, can take hundreds of milliseconds of computation, which is simply too costly to scale horizontally in the data layer. We want to offload this to an external elastic compute service, but we could very quickly end up with thousands of blocked platform threads with response times this high.
> 
> Cheers,
> Matt
> 
> 
> On Fri, 21 Jun 2024 at 19:29, robert engels <rengels at ix.netcom.com <mailto:rengels at ix.netcom.com>> wrote:
> Hi,
> 
> Just an FYI: until you get into the order of 1K, 10K, etc. concurrent clients, I would expect platform threads to outperform virtual threads by quite a bit (best case, they would be the same). Modern OSes routinely handle thousands of active threads. (My OS X desktop with 4 true cores has nearly 5K threads running.)
> 
> Also, if you can already saturate your CPUs or local IO bus, adding more threads isn't going to help. Virtual threads shine when the request handler is fanning out to multiple remote services.
> 
> Regards,
> Robert
> 


