Strange interaction with hyperthreading on Intel hybrid CPU

Sun Oct 15 14:09:44 UTC 2023

On Sun, 15 Oct 2023 at 14:09, Francesco Nigro <
nigro.fra at gmail.com> wrote:

> To echo what @robert engels <rengels at ix.netcom.com> said,
> https://www.moreno.marzolla.name/teaching/HPC/vol6iss1_art01.pdf which is
> a bit old, but relevant enough..
> From my understanding, in workloads where cache misses are a factor, HT
> can be beneficial, because the CPU can keep feeding the CPU frontend (or
> experience fewer L3 transitions, because 2 SMT threads can share the same
> data, actually). Sadly, cache-miss-heavy and computationally intensive
> tasks are both considered CPU-bound scenarios, while tbh they can be
> frontend/backend-bound instead. Although not I/O-intensive, if
> backend-bound, HT can boost a VT workload, but only due to the nature
> of the workload...
> That's why I suggest (for exploration) to use a proper profiler which can
> report cache misses or specific CPU events.
>

I've seen HT described as a cheap means (with regard to transistor count or
chip area) to push some workloads/benchmarks higher.  Such a cost/benefit
analysis probably makes sense for a consumer CPU like this one.

For this reason I'm not surprised that HT does not deliver much upside or
downside for my workload.  Memory access patterns of compilers tend to be
irregular anyway.  What brought this situation to my attention was the
considerable additional resource consumption when HT is enabled.  But this
seems to be only the result of a C1/C2 compilation ergonomics decision at
JVM startup, caused by counting "HT cores" as "full cores".
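
A minimal sketch of how one could check what the JVM decided, assuming a
HotSpot JVM with the jdk.management module available.  The class and method
names (CompilerErgonomics, logicalCpus, hotspotFlag) are made up for
illustration; Runtime.availableProcessors() and HotSpotDiagnosticMXBean are
the standard APIs.  SMT siblings show up as separate processors here, which
is the number the ergonomics work from.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the original measurements.
final class CompilerErgonomics {

    /** Logical CPUs as seen by the JVM; each SMT sibling counts as one processor. */
    static int logicalCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Current value of a HotSpot flag such as "CICompilerCount" (HotSpot only). */
    static String hotspotFlag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + logicalCpus());
        System.out.println("CICompilerCount     = " + hotspotFlag("CICompilerCount"));
    }
}
```

Running this once with HT enabled and once with HT disabled (or with
-XX:ActiveProcessorCount=N, or pinning -XX:CICompilerCount=N) should show
whether the number of compiler threads follows the logical CPU count on a
given machine.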

The CPU events recorded by async-profiler support this story:

8+0+0: 41750 samples total, with 23160 (55.47%)
       under CompileBroker::compiler_thread_loop()
8+8+0: 78431 samples total, with 54435 (69.40%)
       under CompileBroker::compiler_thread_loop()
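
The arithmetic behind these figures can be redone with a short sketch.  The
sample counts are the ones listed above; the class and method names are
made up for illustration.  Comparing absolute sample counts between the two
runs assumes both were profiled with the same event and sampling interval.

```java
// Hypothetical helper that only re-derives numbers from the counts above.
final class SampleShares {

    /** Percentage of all samples that fall under one frame. */
    static double sharePercent(long part, long total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        long total800 = 41750, compiler800 = 23160; // 8+0+0
        long total880 = 78431, compiler880 = 54435; // 8+8+0

        System.out.printf("8+0+0 compiler share: %.2f%%%n",
                sharePercent(compiler800, total800));
        System.out.printf("8+8+0 compiler share: %.2f%%%n",
                sharePercent(compiler880, total880));

        // Samples outside the compiler threads in each run.
        System.out.println("8+0+0 other samples: " + (total800 - compiler800));
        System.out.println("8+8+0 other samples: " + (total880 - compiler880));

        // Growth of compiler samples from 8+0+0 to 8+8+0.
        System.out.printf("compiler samples ratio: %.2f%n",
                (double) compiler880 / compiler800);
    }
}
```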

The additional resources spent on compilation do not pay off here, neither
in the first iteration nor over 200 iterations, and they even cannibalize
the work virtual threads could be doing for the application instead.

-- mva