Loom and high performance networking

Robert Engels robaho at icloud.com
Tue Aug 13 21:21:12 UTC 2024


Actually, it seems the spin was taken out sometime after JDK 21. Does anyone know why? (Maybe it was not removed, but awaitWork is now far more complex, with no spins variable and little documentation.)
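
To make the spin-before-park idea from the quoted messages below concrete, here is a minimal sketch. It is not the ForkJoinPool code: SpinThenParkQueue, spinNanos, and the single-consumer restriction are all hypothetical, chosen only for illustration. The consumer spins for a budget assumed to be roughly the cost of one park/unpark round trip, and parks only when no work has arrived by then.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch, not ForkJoinPool: a single-consumer queue whose
// take() spins for a bounded time before falling back to park().
final class SpinThenParkQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final long spinNanos;     // assumed budget, about one park/unpark round trip
    private volatile Thread waiter;   // the (single) consumer, set only while it may park

    SpinThenParkQueue(long spinNanos) {
        this.spinNanos = spinNanos;
    }

    // Producer side: enqueue, then wake the consumer if it announced it may park.
    void offer(T item) {
        queue.add(item);
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    // Consumer side: poll, spin while within the budget, then park.
    // Interrupt handling is deliberately left out of this sketch.
    T take() {
        long deadline = System.nanoTime() + spinNanos;
        T item;
        while ((item = queue.poll()) == null) {
            if (System.nanoTime() - deadline < 0) {
                Thread.onSpinWait();              // still inside the spin budget
            } else {
                waiter = Thread.currentThread();  // announce before the final re-check
                item = queue.poll();              // re-check to avoid a lost wakeup
                if (item == null) {
                    LockSupport.park(this);       // may return spuriously; the loop re-polls
                }
                waiter = null;
                if (item != null) {
                    return item;
                }
            }
        }
        return item;
    }
}
```

With a budget near the measured park/unpark cost, a consumer that usually receives new work within that window would rarely park, which is the effect I was expecting from a spin in awaitWork.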

> On Aug 13, 2024, at 4:13 PM, Robert Engels <robaho at icloud.com> wrote:
> 
> I found it. It is ForkJoinPool::awaitWork() and it does appear to spin.
> 
>> On Aug 13, 2024, at 4:11 PM, Robert Engels <robaho at icloud.com> wrote:
>> 
>> It’s been a long time since I’ve looked at the ForkJoinPool code and it appears to have become way more complex than I remember.
>> 
>> Can someone point me to the code area where after a task completes the CarrierThread/ForkJoinThread tries to get more work? Is there any spin loop here at all?
>> 
>> My new hypothesis is that with enough parallelism, a task completes but there is no waiting work, so the carrier parks. The park/unpark is far more expensive than the time until the poller enqueues another VT as ready, so with less parallelism there is a higher chance of work being available, which limits the number of park/unpark cycles and improves overall performance.
>> 
>> I would think a queue like this should spin at least as long as the expected park/unpark cost (time).
>> 
>>> On Aug 13, 2024, at 10:34 AM, robert engels <robaho at icloud.com> wrote:
>>> 
>>> I did. It didn’t make any difference. I checked the thread dump as well, and the extra pollers were created.
>>> 
>>> Surprised that lowering the priority didn’t help, so now I need to think about other options. It feels like when the carriers can use all the cores, the poller is prevented from running, as if some sort of lock is being held by the carrier/VT, and so it thrashes around until it eventually gets a chance.
>>> 
>>>> On Aug 13, 2024, at 10:26 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>>>> 
>>>> On 13/08/2024 15:59, robert engels wrote:
>>>>> Surprisingly, lowering the priority of the carrier threads did not result in the same performance gains as reducing the parallelism.
>>>>> 
>>>> Did you do any experiments with -Djdk.readPollers=2 or -Djdk.readPollers=4 to take contention on the kqueue out of the picture?
>>>> 
>>>> -Alan
>> 
> 



More information about the loom-dev mailing list