Loom and high performance networking

Tue Aug 13 21:13:30 UTC 2024

I found it. It is ForkJoinPool::awaitWork() and it does appear to spin.
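
For reference, the "spin for about as long as a park/unpark costs, then park" idea from my earlier message (quoted below) can be sketched roughly as follows. This is only an illustration and is not the ForkJoinPool code: the class SpinThenParkQueue, its methods and the 100 microsecond spin budget are made up for the example, and it assumes a single consumer thread.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical single-consumer queue: spin briefly for work, then park.
// NOT how ForkJoinPool is implemented; names and numbers are assumptions.
class SpinThenParkQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicLong parks = new AtomicLong();
    private final long spinNanos;       // spin budget, ideally ~ park/unpark cost
    private volatile Thread waiter;     // consumer that is (about to be) parked

    SpinThenParkQueue(long spinNanos) {
        this.spinNanos = spinNanos;
    }

    void put(T item) {
        queue.add(item);
        Thread t = waiter;              // read after the add, so a waiter is not missed
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    // Returns the next item, or null if the calling thread is interrupted.
    T take() {
        // Phase 1: spin until an item shows up or the budget is used.
        long start = System.nanoTime();
        T item;
        while ((item = queue.poll()) == null
                && System.nanoTime() - start < spinNanos) {
            Thread.onSpinWait();
        }
        if (item != null) {
            return item;
        }
        // Phase 2: publish ourselves as the waiter, re-check, then park.
        waiter = Thread.currentThread();
        try {
            while ((item = queue.poll()) == null) {
                if (Thread.currentThread().isInterrupted()) {
                    return null;        // interrupt flag is left set for the caller
                }
                parks.incrementAndGet();
                LockSupport.park(this); // may return spuriously; the loop re-checks
            }
            return item;
        } finally {
            waiter = null;
        }
    }

    long parkCount() {
        return parks.get();
    }

    public static void main(String[] args) {
        SpinThenParkQueue<Integer> q = new SpinThenParkQueue<>(100_000L);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                LockSupport.parkNanos(20_000L); // simulated gap between readiness events
                q.put(i);
            }
        });
        producer.start();
        for (int i = 0; i < 5; i++) {
            System.out.println("took " + q.take());
        }
        System.out.println("parks: " + q.parkCount());
    }
}
```

With a spin budget of 0 every empty take() goes straight to park; with a budget longer than the typical gap between items, most takes should complete during the spin phase. The actual numbers will depend on the machine and the scheduler.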

> On Aug 13, 2024, at 4:11 PM, Robert Engels <robaho at icloud.com> wrote:
> 
> It’s been a long time since I’ve looked at the ForkJoinPool code and it appears to have become way more complex than I remember.
> 
> Can someone point me to the code area where after a task completes the CarrierThread/ForkJoinThread tries to get more work? Is there any spin loop here at all?
> 
> My new hypothesis is that with enough parallelism the task completes but there is no waiting work, so the carrier parks. The park/unpark is way more expensive than the time until the poller enqueues another VT as ready - so with less parallelism there is a higher chance of work being available, which limits the number of park/unpark cycles and improves the overall performance.
> 
> I would think a queue like this should spin at least as long as the expected park/unpark cost (time).
> 
>> On Aug 13, 2024, at 10:34 AM, robert engels <robaho at icloud.com> wrote:
>> 
>> I did. It didn’t make any difference. I checked the thread dump as well and the extras were created. 
>> 
>> Surprised that lowering the priority didn’t help - so now I need to think about other options. It feels like, when the carriers can use all the cores, the poller is prevented from running - as if some sort of lock is held by the carrier/VT, so it thrashes around until it eventually gets a chance.
>> 
>>> On Aug 13, 2024, at 10:26 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>>> 
>>> On 13/08/2024 15:59, robert engels wrote:
>>>> Surprisingly, lowering the priority of the carrier threads did not result in the same performance gains as reducing the parallelism.
>>>> 
>>> Did you do any experiments with -Djdk.readPollers=2 or -Djdk.readPollers=4 to take contention on the kqueue out of the picture?
>>> 
>>> -Alan
>