Starting and joining a lot of threads slower on AMD systems compared to Intel systems
Thomas Stüfe
thomas.stuefe at gmail.com
Sun May 16 05:40:06 UTC 2021
Hi,
Just some points:
- you run the tests on different OSes with different kernels and different
libc implementations. So the software stack below the VM is different, not
only the hardware. I would run the tests on the same software stack.
- you want to know the cost of joining, but measure thread start + thread
joining. If you want to measure thread joining, create the threads, wait
until they are up, then measure joining.
- you have an unknown element of "how many threads are alive at the same
time". This is timing dependent. Your threads return right away. A fast
machine may create the threads faster and hence push a higher wave of
concurrently alive threads, whereas a slower machine may spread thread
creation out more. A higher number of concurrently alive threads takes more
time for task switches and puts more stress on memory management, see
below.
- About memory:
- each thread needs a stack. By default a stack costs 1M+x of virtual memory.
Creating 100k threads will cost you a *lot* of address space. On my 64 GB
machine I can create about 32k threads before address space runs out. To
limit this effect I would reduce the max thread stack size.
- more importantly each thread needs:
- stack guard pages (16k per thread on Linux)
- a little bit of stack, one page or two, even if it does nothing
- VM control structures
All that memory needs to be committed and counts toward the commit
charge. 100k threads may bring you well into swapping territory. On my
machine, I need about 400M for 25k threads. Make sure that you don't start
swapping. Again, only measure thread joining after all threads are up, not
creation + joining.
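
As a rough sketch of the points above (not a definitive benchmark): start the threads, wait until all of them are running, and only then time the joins. The class name JoinOnlyBenchmark, the method measureJoinNanos, the 256 KB stack size and the default of 1000 threads are assumptions made for illustration. The stack size passed to the Thread constructor is only a hint that the VM may round or ignore, and the measured interval still includes the time the threads need to wake up and exit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class JoinOnlyBenchmark {

    // Starts threadCount threads, waits until all of them are running,
    // then releases them and times only the release + join phase.
    static long measureJoinNanos(int threadCount, long stackSizeBytes)
            throws InterruptedException {
        CountDownLatch allUp = new CountDownLatch(threadCount); // counts started threads
        CountDownLatch release = new CountDownLatch(1);         // keeps threads alive
        List<Thread> threads = new ArrayList<>(threadCount);

        for (int i = 0; i < threadCount; i++) {
            Runnable body = () -> {
                allUp.countDown();       // signal: this thread is up
                try {
                    release.await();     // stay alive until the measurement starts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            // stackSizeBytes is a hint only; the VM may adjust or ignore it.
            Thread t = new Thread(null, body, "bench-" + i, stackSizeBytes);
            t.start();
            threads.add(t);
        }

        allUp.await();                   // thread creation is excluded from the timing
        long start = System.nanoTime();
        release.countDown();             // let all threads finish
        for (Thread t : threads) {
            t.join();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed defaults: 1000 threads, 256 KB requested stack per thread.
        int threadCount = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        long nanos = measureJoinNanos(threadCount, 256 * 1024);
        System.out.println("Join only: " + (nanos / 1_000_000) + " ms for "
                + threadCount + " threads");
    }
}
```

With this shape the number of concurrently alive threads is known (all of them), so runs on different machines are easier to compare. Start with a small thread count and watch memory use before going anywhere near 100k.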
Hope that helps,
Cheers, Thomas
On Sat, May 15, 2021 at 10:15 PM Waishon <waishon009 at gmail.com> wrote:
> Hey there!
>
> In a study project we should create 100.000 threads to get a feeling for
> the time it takes to create a lot of threads and why it's more efficient to
> use tasks instead.
>
> However, we found that the same "Create and start 100.000 threads"
> code runs a lot slower on a modern AMD Ryzen system than on some older
> (even notebook) Intel systems. We did some benchmarking with different
> JDKs, but all using Java 16 (older versions didn't make a difference). We
> know that this isn't a good "real world" example, but we would be very
> interested in the reason why AMD is so much slower in this particular scenario.
>
> import java.util.ArrayList;
> import java.util.List;
>
> public class ThreadCreator {
>     public static void main(String[] args) throws InterruptedException {
>         List<Thread> startedThreads = new ArrayList<>();
>         long startTime = System.currentTimeMillis();
>
>         // Start 100,000 threads that return immediately.
>         for (int i = 0; i < 100_000; i++) {
>             Thread t = new Thread(() -> {});
>             t.start();
>             startedThreads.add(t);
>         }
>
>         // Wait for every thread to finish.
>         for (Thread t : startedThreads) {
>             t.join();
>         }
>
>         // The measured time covers both thread start and thread join.
>         System.out.println("Duration: " + (System.currentTimeMillis() - startTime));
>     }
> }
>
> The benchmark results:
>
> AMD Ryzen 7 3700X System (Java 16, Ubuntu 20.04):
> Adopt OpenJDK (Hotspot): 13882ms
> Adopt OpenJDK (OpenJ9): 7521ms
>
> Intel i7-8550U System (Fedora 34, Java 16):
> Adopt OpenJDK (Hotspot): 5321ms
> Adopt OpenJDK (OpenJ9): 3089ms
>
> Intel i5-6600k System (Windows 10, Java 16):
> Adopt OpenJDK (Hotspot): 29433ms (Maybe related to low memory of this
> system)
> Adopt OpenJDK (OpenJ9): 5119ms
>
> The OpenJ9 JVM nearly halves the time on both systems.
> However, the AMD system never reaches the time of the Intel systems. The AMD
> system only runs at 10% CPU utilisation during this test.
>
> What might be the reason why starting and joining threads is so much
> slower on AMD systems compared to Intel systems?
>
> (Disclaimer: This question was also posted on Stackoverflow, which
> referred to this mailing list:
> https://stackoverflow.com/questions/67550679/creating-threads-is-slower-on-amd-systems-compared-to-intel-systems
> )
>
> Thank you very much in advance!
>