Starting and joining a lot of threads slower on AMD systems compared to Intel systems

David Holmes david.holmes at oracle.com
Sun May 16 22:52:42 UTC 2021


On 16/05/2021 5:02 pm, Thomas Stüfe wrote:
> On Sun, May 16, 2021 at 8:06 AM Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
> 
>> The difference to J9 is annoying though.
>>
>> It just occurred to me that we have this handshake between creator thread
>> and newborn thread. The creator thread waits for the newborn to be up and
>> running, in order to avoid inconsistent thread states. Which requires the
>> newborn to get at least some cycles, competing with all its siblings.
>>
> 
> Which, interestingly, we don't do on all platforms (neither on AIX nor on
> Windows). Another reason to execute such tests on the same OS and hardware.

As Linux/BSD/macOS can't start threads suspended, we have to emulate that
with this handshake. It is needed to ensure the newly created thread
can't run to completion and delete itself before the creator has
finished the initial interaction with it.
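
The handshake can be pictured with a rough Java analogy. This is only a
sketch of the idea, not HotSpot's actual C++ code; the names
StartHandshakeSketch, Handle, createSuspended and release are made up for
illustration. The creator blocks until the newborn has signalled that it is
up, and the newborn then parks until the creator releases it, so it cannot
finish before the creator is done with its setup.

```java
import java.util.concurrent.CountDownLatch;

class StartHandshakeSketch {
    /** Hypothetical handle for a thread that is running but parked until released. */
    static final class Handle {
        final Thread thread;
        private final CountDownLatch released;

        Handle(Thread thread, CountDownLatch released) {
            this.thread = thread;
            this.released = released;
        }

        /** Lets the parked thread proceed into its body. */
        void release() {
            released.countDown();
        }
    }

    /** Emulates "start suspended": returns only after the new thread is up and parked. */
    static Handle createSuspended(Runnable body) {
        CountDownLatch initialized = new CountDownLatch(1);
        CountDownLatch released = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            initialized.countDown();   // newborn: tell the creator we are alive
            awaitQuietly(released);    // newborn: wait until the creator lets us go
            body.run();
        });
        t.start();
        awaitQuietly(initialized);     // creator: the newborn must get some cycles first
        return new Handle(t, released);
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** The creator's setup always happens before the body, however short the body is. */
    static String demo() {
        StringBuffer log = new StringBuffer();   // StringBuffer is thread-safe
        Handle h = createSuspended(() -> log.append("body;"));
        log.append("setup;");                    // the "initial interaction" with the new thread
        h.release();
        try {
            h.thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "setup;body;"
    }
}
```

The wait on the first latch is the part that costs time when many threads are
created at once: the creator cannot continue until the newborn has actually
been scheduled.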

David
-----

> 
>>
>> I wonder whether J9 does this differently. May be worth looking into.
>>
>>
> Found a trivial low hanging fruit at least:
> https://github.com/openjdk/jdk/pull/4042
> 
> ..Thomas
> 
> 
>> ..Thomas
>>
>>
>> On Sun, May 16, 2021 at 7:46 AM Aleksey Shipilev <shade at redhat.com> wrote:
>>
>>> In addition to Thomas Stuefe's excellent reply:
>>>
>>> On 5/15/21 10:15 PM, Waishon wrote:
>>>> In a study project we should create 100,000 threads to get a feeling for the time it takes to
>>>> create a lot of threads and why it's more efficient to use tasks instead.
>>> I think what you found is in the spirit of the assignment: not only did
>>> you discover that the overheads are high, but also that they differ
>>> across systems, OSes, etc.
>>>
>>>> What might be the reason why starting and joining threads is so much slower on AMD systems
>>>> compared to Intel systems?
>>>>
>>>> (Disclaimer: This question was also posted on Stackoverflow, which referred to this mailing list:
>>>> https://stackoverflow.com/questions/67550679/creating-threads-is-slower-on-amd-systems-compared-to-intel-systems
>>>> )
>>> Well, it could be a number of things. Since thread creation goes through
>>> OS syscalls, most of the time is probably spent on memory and task
>>> management for the new threads. But to know more, you need to follow the
>>> advice of the very first SO comment: "You should probably use a profiler
>>> and it will better tell you where the time is being spent." [1]
>>>
>>> For example, async-profiler [2] would profile Java, JVM, and kernel code:
>>>    $ java -agentpath:$asyncProfilerPath/libasyncProfiler.so=start,event=cpu,frequency=10000,file=profile.html ThreadCreator
>>>
>>> --
>>> Thanks,
>>> -Aleksey
>>>
>>> [1] https://stackoverflow.com/questions/67550679/creating-threads-is-slower-on-amd-systems-compared-to-intel-systems#comment119397887_67550679
>>> [2] https://github.com/jvm-profiling-tools/async-profiler
>>>
>>>
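
The experiment from the quoted question (start and join many threads, then
compare with tasks) could look roughly like the sketch below. It is not the
poster's ThreadCreator program; the class ThreadVsTaskSketch and its methods
are hypothetical, and the timings it prints are not a rigorous benchmark (no
warm-up, single run). It only shows the two shapes being compared.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadVsTaskSketch {
    /** Starts one thread per unit of work, then joins them all. */
    static int runWithThreads(int n) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(counter::incrementAndGet);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    /** Runs the same work as tasks on a small, reused pool of threads. */
    static int runWithTasks(int n, int poolSize) {
        AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < n; i++) {
            pool.execute(counter::incrementAndGet);
        }
        pool.shutdown();   // already submitted tasks still run to completion
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        long t0 = System.nanoTime();
        runWithThreads(n);
        long t1 = System.nanoTime();
        runWithTasks(n, Runtime.getRuntime().availableProcessors());
        long t2 = System.nanoTime();
        System.out.printf("threads: %d ms, tasks: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Running it under a profiler on both machines, as suggested in the quoted
reply, is what would show where the thread-per-unit variant spends its time.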

