Thread-Local Handshakes slowdown on single-cpu machines

Wed Apr 29 03:26:43 UTC 2020

Hi Patricio,

Yes, it seems like a scheduling issue for us too. One of my colleagues 
spotted the following sentence in Robbin Ehn's Handshaking HotSpot 
presentation (https://youtu.be/VBCOfAJ409s?t=2222): ~"if the VM thread 
finds that all threads are unsafe, it keeps spinning"
Could it be that actually the VM thread gets more CPU time and actually 
blocks JavaThread?

I did not except that but I could reproduce the issue with a simple 
hello world:

public class Main {
     public static void main(String[] args) {
         System.out.println("Hello world");
     }
}

A few measurements on Windows 10 x64 in VirtualBox:

PS Z:\main> Measure-Command { .\main-1cpu.bat }

Minutes           : 0
Seconds           : 7
Milliseconds      : 20
Ticks             : 70203455
TotalMinutes      : 0.117005758333333
TotalSeconds      : 7.0203455
TotalMilliseconds : 7020.3455

PS Z:\main> Measure-Command { .\main-full.bat }

Minutes           : 0
Seconds           : 1
Milliseconds      : 575
Ticks             : 15751352
TotalSeconds      : 1.5751352
TotalMilliseconds : 1575.1352

PS Z:\main> Measure-Command { .\main-full.bat }

Minutes           : 0
Seconds           : 1
Milliseconds      : 558
Ticks             : 15585384
TotalSeconds      : 1.5585384
TotalMilliseconds : 1558.5384

PS Z:\main> Measure-Command { .\main-1cpu.bat }

Minutes           : 0
Seconds           : 23
Milliseconds      : 648
Ticks             : 236482685
TotalSeconds      : 23.6482685
TotalMilliseconds : 23648.2685

main-1cpu.bat:

c:\windows\system32\cmd.exe /C start /affinity 1 \jdk-14\bin\java 
-Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50 Main

main-full.bat:

c:\windows\system32\cmd.exe /C start /affinity FF \jdk-14\bin\java 
-Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50 Main

The files are also available here:
https://palacsint.hu/etc/jdk/jdk14-single-cpu-slowdown-reprod-poc/
Or in a zip:
https://palacsint.hu/etc/jdk/jdk14-single-cpu-slowdown-reprod-poc.zip

You can find trace level logs of three testruns in the run1-2-3 
directories as well.

Best,
Miklos

On 4/29/20 2:18 AM, Patricio Chilano wrote:
> Hi Miklos,
>
> On 4/28/20 8:04 AM, Karakó Miklós wrote:
>> Hello David,
>>
>> I've tried with OpenJDK 15 EA Build 20. Unfortunately it's slow as 
>> with OpenJDK14.
>> A colleague of mine set CPU affinity with taskset to only one CPU on 
>> Linux and did not experience the slowdown. So the issue seems 
>> Windows-related.
> I wouldn’t rule out this being a scheduling issue. With thread-local 
> handshakes some operations like the ones you see in your logs 
> (deoptimization, biasedlocking revocation) do not need all JavaThreads 
> to be stopped at the same time in order to be executed. If you only 
> have one CPU but many JavaThreads running, depending on the scheduler, 
> maybe it could happen that the JavaThread you need to handshake is 
> constantly being left behind in the scheduler queue and so the 
> operation takes longer to execute. When you add 
> -XX:-ThreadLocalHandshakes, those operations will be executed using 
> safepoints, which means all JavaThreads are stopped. So even if the 
> scheduler decides to give priority to other JavaThreads those will 
> block and free the CPU. Also when adding more CPU that will increase 
> the likelihood of a JavaThread getting scheduled and might explain why 
> you see it gets fixed.
>
> Alternatively the JavaThread that needs to be handshaked is being 
> scheduled but is not polling for the handshake. But I think in that 
> the case you should still have the same issue with 
> -XX:-ThreadLocalHandshakes because the polling mechanism is the same, 
> although it's true that the misbehaving JavaThread will have more CPU 
> time to complete whatever is doing while others are stopped when 
> safepointing.
>
> One thing that makes me question the scheduler theory is the fact that 
> you see such improvements when only disabling biased-locking in 13.0.2 
> because that was still using safepoints back then. Can you check the 
> logs to see in which operations you see the pauses when running with 
> default options compared to -XX:-UseBiasedLocking?
>
> I’m also surprised you see those long logging pauses because all the 
> other JavaThreads were already processed in the handshake so something 
> must be executing. Maybe you can switch the logging to trace instead 
> of debug. Also maybe you can add logging at Java code so you can 
> follow what is actually running? You could send it to the same app.txt 
> and flush every time you write to see it ordered in the log. You can 
> also get the uptime you see in the UL logs with 
> ManagementFactory.getRuntimeMXBean().getUptime(). That logging will 
> affect timing but might give some insights.
>
> Thanks,
> Patricio
>> Miklos
>>
>> On 4/28/20 1:02 AM, David Holmes wrote:
>>> Hi Miklos,
>>>
>>> On 28/04/2020 7:26 am, Karakó Miklós wrote:
>>>> Hello,
>>>>
>>>> We bumped into a possible Thread-Local Handshakes issue with 
>>>> multiple apps. It seems that enabled TLH slows down applications 
>>>> running on (although rare) single CPU boxes. I would be grateful if 
>>>> you could confirm that this is a known trade-off for these rare 
>>>> setups or a possible JVM bug. That would save us at least a few 
>>>> hours of debugging.
>>>>
>>>> TL;DR: Both tested apps stop frequently around 
>>>> "HandshakeOneThread", "HandshakeAllThreads" and "Revoked bias of 
>>>> currently-unlocked object" running with 
>>>> OpenJDK12/OpenJDK13/OpenJDK14. OpenJDK 13 with 
>>>> -XX:-ThreadLocalHandshakes fixes the issue. Adding a second CPU to 
>>>> the virtual machine fixes the issue. Enabling hyper-threading fixes 
>>>> the issue.
>>>
>>> Can you try with latest JDK 15 build, just to see if this may be 
>>> something already addressed?
>>>
>>> Thanks,
>>> David
>>>
>>>> More details are available at StackOverflow: 
>>>> https://stackoverflow.com/questions/61375565/slow-application-frequent-jvm-hangs-with-single-cpu-setups-and-java-12 
>>>>
>>>>
>>>> All thoughts are welcome.
>>>>
>>>> Best,
>>>> Miklos
>>>>
>>
>