Loom and high performance networking
Robert Engels
robaho at icloud.com
Tue Aug 13 21:21:12 UTC 2024
Actually, it seems the spin was taken out sometime after JDK 21. Does anyone know why? (Maybe it was not removed, but awaitWork is now far more complex, with no spins variable and little documentation.)
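
To make the "spin at least as long as the expected park/unpark cost" idea from my earlier message (quoted below) concrete, here is a minimal sketch of a single-consumer queue that spins for a bounded time before parking. This is not how ForkJoinPool does it; the class name SpinThenParkQueue and the spinNanos parameter are made up for illustration, and the right spin budget would have to be measured on the target machine.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical single-consumer queue: spin briefly for work, then park.
final class SpinThenParkQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicReference<Thread> parked = new AtomicReference<>();
    private final long spinNanos; // assumed spin budget, roughly the park/unpark cost

    SpinThenParkQueue(long spinNanos) {
        this.spinNanos = spinNanos;
    }

    // Producer side: enqueue, then wake the consumer if it announced it is parking.
    void put(T item) {
        queue.add(item);
        Thread t = parked.get();
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    // Consumer side (one thread only). Returns null if interrupted while waiting.
    T take() {
        long deadline = System.nanoTime() + spinNanos;
        while (true) {
            T item = queue.poll();
            if (item != null) {
                return item;
            }
            if (System.nanoTime() - deadline < 0) {
                Thread.onSpinWait(); // still inside the spin budget
                continue;
            }
            // Spin budget used up: announce the intent to park, then re-check
            // the queue so that a concurrent put() cannot be missed.
            parked.set(Thread.currentThread());
            item = queue.poll();
            if (item == null) {
                LockSupport.park(this);
            }
            parked.set(null);
            if (item != null) {
                return item;
            }
            if (Thread.currentThread().isInterrupted()) {
                return null; // interrupt flag is left set for the caller
            }
            deadline = System.nanoTime() + spinNanos; // spin again after a wakeup
        }
    }
}
```

If an item arrives within the spin budget, the consumer never pays for a park/unpark pair; otherwise it falls back to parking, so an idle consumer does not burn a core indefinitely.
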
> On Aug 13, 2024, at 4:13 PM, Robert Engels <robaho at icloud.com> wrote:
>
> I found it. It is ForkJoinPool::awaitWork() and it does appear to spin.
>
>> On Aug 13, 2024, at 4:11 PM, Robert Engels <robaho at icloud.com> wrote:
>>
>> It’s been a long time since I’ve looked at the ForkJoinPool code and it appears to have become way more complex than I remember.
>>
>> Can someone point me to the code area where after a task completes the CarrierThread/ForkJoinThread tries to get more work? Is there any spin loop here at all?
>>
>> My new hypothesis is that with enough parallelism the task completes but there is no waiting work, so the carrier parks. The park/unpark is far more expensive than the time until the poller enqueues another VT as ready. With less parallelism there is a higher chance of work being available, which limits the number of park/unpark cycles and improves overall performance.
>>
>> I would think a queue like this should spin at least as long as the expected park/unpark cost (time).
>>
>>> On Aug 13, 2024, at 10:34 AM, robert engels <robaho at icloud.com> wrote:
>>>
>>> I did. It didn’t make any difference. I checked the thread dump as well and the extras were created.
>>>
>>> Surprised that lowering the priority didn’t help, so now I need to think about other options. It feels like, when the carriers can use all the cores, the poller is prevented from running, as if some sort of lock is being held by the carrier/VT, and so it thrashes around until it eventually gets a chance.
>>>
>>>> On Aug 13, 2024, at 10:26 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>>>>
>>>> On 13/08/2024 15:59, robert engels wrote:
>>>>> Surprisingly, lowering the priority of the carrier threads did not result in the same performance gains as reducing the parallelism.
>>>>>
>>>> Did you do any experiments with -Djdk.readPollers=2 or -Djdk.readPollers=4 to remove contention on the kqueue from the picture?
>>>>
>>>> -Alan
>>
>