Loom and high performance networking

Tue Aug 13 21:11:17 UTC 2024

It’s been a long time since I’ve looked at the ForkJoinPool code and it appears to have become way more complex than I remember.

Can someone point me to the code where, after a task completes, the CarrierThread/ForkJoinWorkerThread tries to get more work? Is there any spin loop there at all?

My new hypothesis is that with enough parallelism, a task completes but there is no waiting work, so the carrier parks. The park/unpark cycle is far more expensive than the time until the poller enqueues another VT as ready. With less parallelism there is a higher chance that work is already available when a task completes, which limits the number of park/unpark cycles and improves overall performance.
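The cost side of this hypothesis can be checked in isolation. Below is a rough sketch, assuming plain platform threads are a fair stand-in for carriers: two threads pass a token back and forth with `LockSupport.park`/`unpark` and report the average round trip. The class and method names are hypothetical. It is not a JMH benchmark and does not exercise the virtual-thread scheduler or the poller, so treat the result only as an order-of-magnitude figure for the machine it runs on.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper: rough park/unpark ping-pong between two platform threads.
// Not a rigorous benchmark (no JMH, crude warmup); results vary by OS and hardware.
final class ParkUnparkPingPong {

    /** Average time in nanoseconds for one full round trip (two hand-offs). */
    static double averageRoundTripNanos(int rounds) {
        AtomicInteger turn = new AtomicInteger(0); // 0 = pinger's turn, 1 = ponger's turn
        Thread pinger = Thread.currentThread();
        Thread ponger = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                while (turn.get() != 1) {
                    LockSupport.park(); // park may return spuriously, so re-check in a loop
                }
                turn.set(0);
                LockSupport.unpark(pinger); // hand the token back
            }
        });
        ponger.start();
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            turn.set(1);
            LockSupport.unpark(ponger); // wake the other side
            while (turn.get() != 0) {
                LockSupport.park();
            }
        }
        long elapsed = System.nanoTime() - start;
        try {
            ponger.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (double) elapsed / rounds;
    }

    public static void main(String[] args) {
        averageRoundTripNanos(10_000); // crude warmup so the JIT compiles the loops
        System.out.printf("avg park/unpark round trip: %.0f ns%n", averageRoundTripNanos(100_000));
    }
}
```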

I would think a queue like this should spin for at least as long as the expected cost of a park/unpark cycle.
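A minimal sketch of the spin-then-park idea, assuming a single consumer and a hypothetical `SpinThenParkQueue` class (this is an illustration of the general technique, not how ForkJoinPool is actually structured): the consumer polls for up to `maxSpins` iterations with `Thread.onSpinWait()` before registering itself as the waiter and parking, and producers only pay for an `unpark` when a waiter is registered.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Hypothetical single-consumer queue: spin for a bounded number of polls, then park.
final class SpinThenParkQueue<T> {
    private final ConcurrentLinkedQueue<T> items = new ConcurrentLinkedQueue<>();
    private final int maxSpins;
    private volatile Thread waiter; // set only while the consumer may park

    SpinThenParkQueue(int maxSpins) {
        this.maxSpins = maxSpins;
    }

    void offer(T item) {
        items.offer(item);
        Thread w = waiter; // read after the enqueue so a parking consumer is not missed
        if (w != null) {
            LockSupport.unpark(w);
        }
    }

    /** Must be called from a single consumer thread. */
    T take() {
        // Phase 1: spin, hoping work arrives before a park/unpark would pay off.
        for (int i = 0; i < maxSpins; i++) {
            T item = items.poll();
            if (item != null) {
                return item;
            }
            Thread.onSpinWait();
        }
        // Phase 2: publish ourselves as the waiter, re-check the queue, then park.
        waiter = Thread.currentThread();
        try {
            while (true) {
                T item = items.poll();
                if (item != null) {
                    return item;
                }
                LockSupport.park(this); // spurious or stale wakeups just loop and re-poll
            }
        } finally {
            waiter = null;
        }
    }
}
```

Following the reasoning above, `maxSpins` would be tuned so the spin phase lasts roughly as long as one park/unpark cycle on the target machine; spinning much longer burns CPU that the poller or other carriers might need.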

> On Aug 13, 2024, at 10:34 AM, robert engels <robaho at icloud.com> wrote:
> 
> I did. It didn’t make any difference. I checked the thread dump as well, and the extra pollers were created. 
> 
> I’m surprised that lowering the priority didn’t help, so now I need to think about other options. It feels like when the carriers can use all the cores, the poller is prevented from running, as if some lock were being held by the carrier/VT, so it thrashes around until it eventually gets a chance to run. 
> 
>> On Aug 13, 2024, at 10:26 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>> 
>> On 13/08/2024 15:59, robert engels wrote:
>>> Surprisingly, lowering the priority of the carrier threads did not result in the same performance gains as reducing the parallelism.
>>> 
>> Did you do any experiments with -Djdk.readPollers=2 or -Djdk.readPollers=4 to take contention on the kqueue out of the picture?
>> 
>> -Alan