Strange interaction with hyperthreading on Intel hybrid CPU

Sun Oct 15 08:41:27 UTC 2023

On Tue, Oct 10, 2023 at 15:56, Alan Bateman <Alan.Bateman at oracle.com> wrote:

> There's a table of system properties in the java.lang.Thread javadoc with
> the configuration; you probably want
> -Djdk.virtualThreadScheduler.maxPoolSize=<N> for your testing. It's hard to
> know what to take from your mail, as virtual threads are only going to help
> if most of the time is spent blocking at the queue; the compilation and
> class generation tasks seem very compute bound.
>

Reducing maxPoolSize did not move the needle much.

Another idea was more fruitful. Virtual threads compete for computing
resources with other parts of the JVM: JIT compilation, garbage
collection, and probably more. At around 16 logical cores, the JIT
compiler grabs an outsized share of this pie, which shows up in the
-XX:+PrintFlagsFinal output as a jump in CICompilerCount from 4 to 12:

@@ -50,7 +50,7 @@
      bool C1ProfileInlinedCalls                    = true       {C2 product} {default}
      bool C1ProfileVirtualCalls                    = true       {C2 product} {default}
      bool C1UpdateMethodData                       = true       {C2 product} {default}
-     intx CICompilerCount                          = 4          {product} {ergonomic}
+     intx CICompilerCount                          = 12         {product} {ergonomic}
      bool CICompilerCountPerCPU                    = true       {product} {default}
      bool CITime                                   = false      {product} {default}
      bool CheckJNICalls                            = false      {product} {default}
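
The jump from 4 to 12 lines up with HotSpot's per-CPU ergonomics for CICompilerCount. The sketch below assumes that, with CICompilerCountPerCPU=true, the count is floor(log2 n) * floor(log2 floor(log2 n)) * 3 / 2 (integer arithmetic, minimum 2), with roughly a third of the threads going to C1. That is my reading of HotSpot's compilation policy, not a documented contract: it may differ between JDK versions, and the VM can further cap the count (e.g. by code cache size). The class and method names are hypothetical.

```java
// Hypothetical sketch of HotSpot's CICompilerCount ergonomics when
// CICompilerCountPerCPU is true. Assumption: based on a reading of HotSpot's
// compilation policy; real JDKs may also cap the count by code cache or memory.
public class CompilerCountSketch {

    // Integer floor of log2(n) for n >= 1.
    static int log2i(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    // Estimated total number of JIT compiler threads for a logical CPU count.
    static int ciCompilerCount(int cpus) {
        int logCpu = log2i(cpus);
        int logLogCpu = log2i(Math.max(logCpu, 1));
        return Math.max(logCpu * logLogCpu * 3 / 2, 2);
    }

    // Assumed split under tiered compilation: one third C1, the rest C2.
    static int c1Count(int total) {
        return Math.max(total / 3, 1);
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {4, 8, 12, 16, 24, 32}) {
            int total = ciCompilerCount(cpus);
            int c1 = c1Count(total);
            System.out.printf("%2d logical CPUs -> CICompilerCount=%2d (C1=%d, C2=%d)%n",
                    cpus, total, c1, Math.max(total - c1, 1));
        }
    }
}
```

Under this assumption, 8 logical CPUs give 4 compiler threads and 16 give 12, which matches the two PrintFlagsFinal values; the step happens exactly when floor(log2 n) reaches 4.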

Taking 8+0+0 (8 P-cores, no HT, no E-cores) as a starting point, running
200 back-to-back iterations (with ~570k fine-grained virtual threads)
yields these timings:

real 71.37
user 394.47
sys 8.05

Enabling hyperthreading (8+8+0) roughly doubles user and sys time, and
real time degrades by about 16%:

real 83.06
user 776.37
sys 16.38

But 8+8+0 plus -XX:CICompilerCount=4 closes most of the gap to the 8+0+0
timings:

real 74.23
user 477.18
sys 9.75
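
To put the three runs side by side, the sketch below derives the real-time speedup relative to 8+0+0 (base real / run real, so values below 1 are slowdowns), the user-time ratio, and (user + sys) / real as the average number of logical CPUs kept busy. It assumes the figures are seconds as printed by a POSIX-style `time -p`; the class and record names are hypothetical. With these numbers, 8+0+0 keeps about 5.6 logical CPUs busy on average, default 8+8+0 about 9.5, and 8+8+0 with CICompilerCount=4 about 6.6.

```java
// Hypothetical helper: derives ratios from the three posted runs.
// Assumption: real/user/sys are seconds, as printed by `time -p`.
public class TimingRatios {

    record Run(String label, double real, double user, double sys) {
        // Average number of logical CPUs busy over the wall-clock time.
        double busyCpus() {
            return (user + sys) / real;
        }
    }

    // Real-time speedup of `run` relative to `base`: > 1 faster, < 1 slower.
    static double speedup(Run base, Run run) {
        return base.real() / run.real();
    }

    public static void main(String[] args) {
        Run base = new Run("8+0+0", 71.37, 394.47, 8.05);
        Run ht = new Run("8+8+0", 83.06, 776.37, 16.38);
        Run htTuned = new Run("8+8+0, CICompilerCount=4", 74.23, 477.18, 9.75);

        for (Run r : new Run[] {base, ht, htTuned}) {
            System.out.printf("%-26s speedup=%.2f user x%.2f busy CPUs=%.1f%n",
                    r.label(), speedup(base, r), r.user() / base.user(), r.busyCpus());
        }
    }
}
```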

The picture is similar for the elapsed time of the very first bootstrap
iteration alone, i.e., when warmup is just starting.

Adding further options to restore the 8+0+0 garbage collection settings
seems to be a wash, while setting jdk.virtualThreadScheduler.maxPoolSize
and jdk.virtualThreadScheduler.parallelism to 8 seems slightly beneficial.
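
For anyone reproducing this kind of comparison, a minimal harness sketch follows, assuming JDK 21 or later. It reports the logical CPU count, the effective CICompilerCount and its origin (read through com.sun.management.HotSpotDiagnosticMXBean), and the two scheduler properties, which are passed with -D at launch as the java.lang.Thread javadoc describes. It then runs many small CPU-bound tasks, one virtual thread each. The class name, task count, and LCG-style placeholder workload are hypothetical and do not reproduce the original benchmark.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical harness (JDK 21+): prints the settings discussed above, then
// runs many small CPU-bound tasks, one virtual thread per task.
//
// Example launch, reusing the flag values from the text:
//   java -XX:CICompilerCount=4 \
//        -Djdk.virtualThreadScheduler.parallelism=8 \
//        -Djdk.virtualThreadScheduler.maxPoolSize=8 \
//        SchedulerReport.java
public class SchedulerReport {

    // Current value and origin (e.g. ERGONOMIC, VM_CREATION) of a HotSpot flag.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption(name);
        return option.getValue() + " (" + option.getOrigin() + ")";
    }

    // Runs `tasks` placeholder tasks on virtual threads; returns how many completed.
    static long runTasks(int tasks) {
        LongAdder done = new LongAdder();
        LongAdder checksum = new LongAdder(); // keeps the busy loop from being optimized away
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                final long seed = i;
                executor.submit(() -> {
                    long x = seed;
                    for (int k = 0; k < 1_000; k++) {
                        x = x * 6364136223846793005L + 1442695040888963407L; // LCG step
                    }
                    checksum.add(x);
                    done.increment();
                });
            }
        } // close() waits for every submitted task to finish
        return done.sum();
    }

    public static void main(String[] args) {
        System.out.println("logical CPUs:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("CICompilerCount: " + vmOption("CICompilerCount"));
        for (String key : new String[] {
                "jdk.virtualThreadScheduler.parallelism",
                "jdk.virtualThreadScheduler.maxPoolSize"}) {
            System.out.println(key + " = " + System.getProperty(key, "(not set, JDK default)"));
        }
        long start = System.nanoTime();
        long completed = runTasks(100_000);
        System.out.printf("%d tasks in %.2f s%n", completed, (System.nanoTime() - start) / 1e9);
    }
}
```

Printing the origin alongside the value makes it easy to tell whether CICompilerCount came from ergonomics or from the command line when switching between 8+0+0 and 8+8+0 configurations.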


On top of this, hyperthreading seems to yield diminishing returns as the
core count increases. Going from 4+0+0 to 4+4+0 gives a real time speedup
of 1.07, while going from 8+0+0 to 8+8+0 gives a speedup of 0.97, i.e. a
slowdown.

-- mva