Thread-Local Handshakes slowdown on single-cpu machines
Patricio Chilano
patricio.chilano.mateo at oracle.com
Wed Apr 29 00:18:53 UTC 2020
Hi Miklos,
On 4/28/20 8:04 AM, Karakó Miklós wrote:
> Hello David,
>
> I've tried with OpenJDK 15 EA Build 20. Unfortunately it's slow as
> with OpenJDK14.
> A colleague of mine set CPU affinity with taskset to only one CPU on
> Linux and did not experience the slowdown. So the issue seems
> Windows-related.
I wouldn’t rule out this being a scheduling issue. With thread-local
handshakes, some operations like the ones you see in your logs
(deoptimization, biased-locking revocation) do not need all JavaThreads
to be stopped at the same time in order to be executed. If you only have
one CPU but many JavaThreads running, then depending on the scheduler
the JavaThread you need to handshake could be constantly left behind in
the scheduler queue, so the operation takes longer to execute. When you
add -XX:-ThreadLocalHandshakes, those operations are executed using
safepoints, which means all JavaThreads are stopped. So even if the
scheduler decides to give priority to other JavaThreads, those will
block and free the CPU. Adding more CPUs also increases the likelihood
of the target JavaThread getting scheduled, which might explain why you
see that fix the issue.
Alternatively, the JavaThread that needs to be handshaked is being
scheduled but is not polling for the handshake. But I think in that
case you should still have the same issue with
-XX:-ThreadLocalHandshakes because the polling mechanism is the same,
although it's true that the misbehaving JavaThread will have more CPU
time to complete whatever it is doing while the others are stopped at a
safepoint.
One thing that makes me question the scheduler theory is that you see
such improvements in 13.0.2 just from disabling biased locking, because
that was still using safepoints back then. Can you check the logs to see
in which operations the pauses occur when running with default options
compared to -XX:-UseBiasedLocking?
I’m also surprised you see those long logging pauses, because all the
other JavaThreads were already processed in the handshake, so something
must be executing. Maybe you can switch the logging to trace instead of
debug. You could also add logging in the Java code so you can follow
what is actually running. You could send it to the same app.txt and
flush after every write so it appears in order in the log. You can also
get the uptime you see in the UL logs with
ManagementFactory.getRuntimeMXBean().getUptime(). That logging will
affect timing but might give some insights.
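
A minimal sketch of that application-side logging, under some
assumptions: the class name UptimeLog, the "[app]" tag and the
"[12.345s]" prefix are only illustrative, chosen to resemble the uptime
decoration in the UL output. getUptime() returns milliseconds, so the
value is converted to seconds. Whether two writers appending to the
same file interleave cleanly may depend on the platform.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.util.Locale;

// Hypothetical helper: appends lines prefixed with the JVM uptime so they
// can be lined up against the uptime stamps in the UL output.
public class UptimeLog {
    private final PrintWriter out;

    public UptimeLog(String path) throws IOException {
        // append = true, so existing content in the file is preserved
        this.out = new PrintWriter(new FileWriter(path, true));
    }

    // Builds one log line; uptime is given in milliseconds.
    public static String format(long uptimeMillis, String thread, String msg) {
        return String.format(Locale.ROOT, "[%.3fs][app][%s] %s",
                uptimeMillis / 1000.0, thread, msg);
    }

    // Writes one line and flushes right away so ordering in the file
    // follows the order of the calls.
    public synchronized void log(String msg) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        out.println(format(uptime, Thread.currentThread().getName(), msg));
        out.flush();
    }
}
```

Usage would be something like new UptimeLog("app.txt").log("before
request") at the points of interest. The synchronized write and the
flush on every call add overhead, so timings will shift somewhat.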
Thanks,
Patricio
> Miklos
>
> On 4/28/20 1:02 AM, David Holmes wrote:
>> Hi Miklos,
>>
>> On 28/04/2020 7:26 am, Karakó Miklós wrote:
>>> Hello,
>>>
>>> We bumped into a possible Thread-Local Handshakes issue with
>>> multiple apps. It seems that enabled TLH slows down applications
>>> running on (although rare) single CPU boxes. I would be grateful if
>>> you could confirm that this is a known trade-off for these rare
>>> setups or a possible JVM bug. That would save us at least a few
>>> hours of debugging.
>>>
>>> TL;DR: Both tested apps stop frequently around "HandshakeOneThread",
>>> "HandshakeAllThreads" and "Revoked bias of currently-unlocked
>>> object" running with OpenJDK12/OpenJDK13/OpenJDK14. OpenJDK 13 with
>>> -XX:-ThreadLocalHandshakes fixes the issue. Adding a second CPU to
>>> the virtual machine fixes the issue. Enabling hyper-threading fixes
>>> the issue.
>>
>> Can you try with latest JDK 15 build, just to see if this may be
>> something already addressed?
>>
>> Thanks,
>> David
>>
>>> More details are available at StackOverflow:
>>> https://stackoverflow.com/questions/61375565/slow-application-frequent-jvm-hangs-with-single-cpu-setups-and-java-12
>>>
>>>
>>> All thoughts are welcome.
>>>
>>> Best,
>>> Miklos
>>>
>
More information about the hotspot-dev
mailing list