Native methods and virtual threads

Fri Jul 14 16:51:33 UTC 2023

That’s something we actually struggled with (and have yet to reach either unanimity or strong confidence on). The problem with compensating by default is that it is unlikely to work for the actual bread-and-butter of virtual threads. If your common IO happens to occur while threads are pinned then at best you’ll quickly get an OOME, and at worst you’ll get strange performance behaviour that doesn’t make detecting a problem any easier. And that’s the point: if your most common operations block OS threads then there’s nothing we can do to raise your throughput beyond what platform thread pools do. You have an incompatibility with virtual threads that you must resolve somehow.

The question was, therefore, what would be the behaviour that would make detecting such problems easier? I don’t know if we have the right answer, nor am I certain that we won’t change it, but that was the question that guided us.

— Ron

> On 14 Jul 2023, at 17:31, Alejandro Revilla <apr at jpos.org> wrote:
> 
> I'm tempted to chime in with a comment/idea that I'm not sure you have considered.
> 
> This is not specific to Native methods, but somehow related to the problem you are addressing here.
> 
> TL;DR go to the last paragraph.
> 
> While working with Loom in our project (jPOS, payments-related stuff), we immediately struggled with a few virtual threads consuming all available platform threads. This issue had us perplexed for a couple of days until we identified that Java Flight Recorder's `jdk.VirtualThreadPinned` could assist us, after which resolving the remaining code issues became significantly easier.
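
A minimal sketch of the kind of JFR-based detection described above, assuming JDK 21 and the standard `jdk.jfr.consumer.RecordingStream` API. The class name `PinnedMonitor`, the counter, and the 20 ms threshold are illustrative choices, not anything taken from jPOS. The `main` method provokes pinning by sleeping inside a `synchronized` block; on newer JDKs where `synchronized` no longer pins virtual threads, the event may simply never fire.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical helper: counts and logs jdk.VirtualThreadPinned events.
class PinnedMonitor {
    static final AtomicLong pinnedCount = new AtomicLong();

    // Starts an in-process JFR stream; the caller closes it when done.
    static RecordingStream start(Duration threshold) {
        RecordingStream rs = new RecordingStream();
        rs.enable("jdk.VirtualThreadPinned")
          .withThreshold(threshold)   // only report pins at least this long
          .withStackTrace();          // stack trace shows where the pin happened
        rs.onEvent("jdk.VirtualThreadPinned", event -> {
            pinnedCount.incrementAndGet();
            System.err.println("virtual thread pinned for "
                    + event.getDuration().toMillis() + " ms");
            System.err.println(event.getStackTrace());
        });
        rs.startAsync();              // events are delivered on a JFR thread
        return rs;
    }

    public static void main(String[] args) throws Exception {
        try (RecordingStream rs = start(Duration.ofMillis(20))) {
            Object lock = new Object();
            Thread t = Thread.ofVirtual().start(() -> {
                synchronized (lock) { // blocking while holding a monitor
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            t.join();
            Thread.sleep(2000);       // JFR streams flush periodically
        }
        System.out.println("pinned events seen: " + pinnedCount.get());
    }
}
```

The same event can also be captured with an ordinary recording and inspected offline; streaming is shown here only because it makes it easy to raise an alarm at runtime.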
> 
> Loom is an absolute game-changer for our specific needs allowing us to move away from manually crafted continuations and reactive programming (we deal with a large number of in-flight transactions, sometimes in the tens of thousands, usually waiting for remote issuers, HSMs, databases, etc.).
> 
> While we are fixing all library-related synchronization issues, a typical application has user code that may not behave as well, so what we have in mind is to detect whether we are struggling in terms of response time so that we can beef up our TransactionManager's sessions with more platform threads (instead of virtual ones). In our case, when we receive a request, we handle it in a thread, and it doesn't matter whether that thread is platform or virtual. So the idea is that once we get a request, if we find the system is "OK" (so to speak), we offload it to a virtual thread; otherwise, we offload it to a platform one (raising alarms so that we can investigate why we had to create them).
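
The dispatch idea in the paragraph above could be sketched roughly as follows, assuming JDK 21. Everything here is hypothetical: `HybridDispatcher`, the `healthy` flag, and the fallback counter stand in for whatever response-time check and alarm mechanism an application actually uses, and nothing is taken from the jPOS TransactionManager.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical dispatcher: virtual threads when healthy, platform threads otherwise.
class HybridDispatcher implements AutoCloseable {
    private final ExecutorService virtual = Executors.newVirtualThreadPerTaskExecutor();
    private final ExecutorService platform = Executors.newCachedThreadPool();
    private final AtomicBoolean healthy = new AtomicBoolean(true);
    private final AtomicLong fallbacks = new AtomicLong();

    // Assumed to be driven by some external response-time monitor.
    void setHealthy(boolean ok) {
        healthy.set(ok);
    }

    long fallbackCount() {
        return fallbacks.get();
    }

    Future<?> dispatch(Runnable request) {
        if (healthy.get()) {
            return virtual.submit(request);
        }
        // Count the fallback so an alarm can be raised and the cause investigated.
        fallbacks.incrementAndGet();
        return platform.submit(request);
    }

    @Override
    public void close() {
        virtual.close();   // ExecutorService.close() waits for submitted tasks
        platform.close();
    }

    public static void main(String[] args) throws Exception {
        try (HybridDispatcher d = new HybridDispatcher()) {
            Runnable report = () -> System.out.println(
                    "running on virtual thread: " + Thread.currentThread().isVirtual());
            d.dispatch(report).get();
            d.setHealthy(false);   // simulate degraded response times
            d.dispatch(report).get();
            System.out.println("fallbacks: " + d.fallbackCount());
        }
    }
}
```

Note that an unbounded cached pool only moves the problem: if the pinned operations are the common case, the platform threads will pile up just as they did before virtual threads.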
> 
> After this lengthy introduction (my apologies), I have a suggestion to make. During this initial transition period, as many libraries adapt to Loom, wouldn't it be beneficial for the pool of platform threads used by Loom to be optionally dynamic? This would allow it to expand to thousands of platform threads if needed, which would reassure early adopters that, in a worst-case scenario, the system would operate as it did in the past using platform threads.
> 
> My 2c. Thank you for reading.
> 
> --
> @apr
> 
> On Fri, Jul 14, 2023 at 12:23 PM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
> On 14/07/2023 16:19, Brian S O'Neill wrote:
> > This sounds reasonable, but the current "wait and see" model is 
> > inconsistent. JEP 444 says, use JFR and we'll get back to you. Was 
> > this process followed when the decision was made to add the Blocker 
> > class, or is it a premature optimization that should be removed? 
> > Likewise, the FFM API has the isTrivial option which adds even more 
> > risks. What was the process for deciding that this optimization was 
> > necessary? 
> 
> I can speak to the latter, which was added to mitigate cases where users 
> migrate away from critical JNI to do low-latency native calls.
> 
> (We might also look into hints to enable pinning of heap objects for 
> the very same reasons).
> 
> Maurizio
> 
>