Thread-Local Handshakes slowdown on single-cpu machines

Wed Apr 29 05:31:42 UTC 2020

On 29/04/2020 1:26 pm, Karakó Miklós wrote:
> Hi Patricio,
> 
> Yes, it seems like a scheduling issue for us too. One of my colleagues 
> spotted the following sentence in Robbin Ehn's Handshaking HotSpot 
> presentation (https://youtu.be/VBCOfAJ409s?t=2222): ~"if the VM thread 
> finds that all threads are unsafe, it keeps spinning"
> Could it be that actually the VM thread gets more CPU time and actually 
> blocks JavaThread?

The "spinning" is intended to be done in a scheduler-friendly way via:

void os::naked_short_nanosleep(jlong ns) {
   assert(ns > -1 && ns < NANOUNITS, "Un-interruptable sleep, short time 
use only");

   int64_t start = os::javaTimeNanos();
   do {
     if (SwitchToThread() == 0) {
       // Nothing else is ready to run on this cpu, spin a little
       SpinPause();
     }
   } while (os::javaTimeNanos() - start < ns);
}

but this won't be that scheduler-friendly because it places the VMThread 
at the head of the priority queue, so it will cede up-to a timeslice to 
another thread, then run another iteration of the loop and hence 
alternate between a JavaThread and the VMThread until they are all 
blocked at the safepoint. What is really wanted is for the VMThread to 
go to the back of the scheduling queue.

David
-----

> I did not except that but I could reproduce the issue with a simple 
> hello world:
> 
> public class Main {
>      public static void main(String[] args) {
>          System.out.println("Hello world");
>      }
> }
> 
> A few measurements on Windows 10 x64 in VirtualBox:
> 
> 
> PS Z:\main> Measure-Command { .\main-1cpu.bat }
> 
> Minutes           : 0
> Seconds           : 7
> Milliseconds      : 20
> Ticks             : 70203455
> TotalMinutes      : 0.117005758333333
> TotalSeconds      : 7.0203455
> TotalMilliseconds : 7020.3455
> 
> 
> 
> PS Z:\main> Measure-Command { .\main-full.bat }
> 
> Minutes           : 0
> Seconds           : 1
> Milliseconds      : 575
> Ticks             : 15751352
> TotalSeconds      : 1.5751352
> TotalMilliseconds : 1575.1352
> 
> 
> 
> PS Z:\main> Measure-Command { .\main-full.bat }
> 
> Minutes           : 0
> Seconds           : 1
> Milliseconds      : 558
> Ticks             : 15585384
> TotalSeconds      : 1.5585384
> TotalMilliseconds : 1558.5384
> 
> 
> 
> PS Z:\main> Measure-Command { .\main-1cpu.bat }
> 
> Minutes           : 0
> Seconds           : 23
> Milliseconds      : 648
> Ticks             : 236482685
> TotalSeconds      : 23.6482685
> TotalMilliseconds : 23648.2685
> 
> 
> main-1cpu.bat:
> 
> c:\windows\system32\cmd.exe /C start /affinity 1 \jdk-14\bin\java 
> -Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50 Main
> 
> main-full.bat:
> 
> c:\windows\system32\cmd.exe /C start /affinity FF \jdk-14\bin\java 
> -Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50 Main
> 
> The files are also available here:
> https://palacsint.hu/etc/jdk/jdk14-single-cpu-slowdown-reprod-poc/
> Or in a zip:
> https://palacsint.hu/etc/jdk/jdk14-single-cpu-slowdown-reprod-poc.zip
> 
> You can find trace level logs of three testruns in the run1-2-3 
> directories as well.
> 
> Best,
> Miklos
> 
> 
> On 4/29/20 2:18 AM, Patricio Chilano wrote:
>> Hi Miklos,
>>
>> On 4/28/20 8:04 AM, Karakó Miklós wrote:
>>> Hello David,
>>>
>>> I've tried with OpenJDK 15 EA Build 20. Unfortunately it's slow as 
>>> with OpenJDK14.
>>> A colleague of mine set CPU affinity with taskset to only one CPU on 
>>> Linux and did not experience the slowdown. So the issue seems 
>>> Windows-related.
>> I wouldn’t rule out this being a scheduling issue. With thread-local 
>> handshakes some operations like the ones you see in your logs 
>> (deoptimization, biasedlocking revocation) do not need all JavaThreads 
>> to be stopped at the same time in order to be executed. If you only 
>> have one CPU but many JavaThreads running, depending on the scheduler, 
>> maybe it could happen that the JavaThread you need to handshake is 
>> constantly being left behind in the scheduler queue and so the 
>> operation takes longer to execute. When you add 
>> -XX:-ThreadLocalHandshakes, those operations will be executed using 
>> safepoints, which means all JavaThreads are stopped. So even if the 
>> scheduler decides to give priority to other JavaThreads those will 
>> block and free the CPU. Also when adding more CPU that will increase 
>> the likelihood of a JavaThread getting scheduled and might explain why 
>> you see it gets fixed.
>>
>> Alternatively the JavaThread that needs to be handshaked is being 
>> scheduled but is not polling for the handshake. But I think in that 
>> the case you should still have the same issue with 
>> -XX:-ThreadLocalHandshakes because the polling mechanism is the same, 
>> although it's true that the misbehaving JavaThread will have more CPU 
>> time to complete whatever is doing while others are stopped when 
>> safepointing.
>>
>> One thing that makes me question the scheduler theory is the fact that 
>> you see such improvements when only disabling biased-locking in 13.0.2 
>> because that was still using safepoints back then. Can you check the 
>> logs to see in which operations you see the pauses when running with 
>> default options compared to -XX:-UseBiasedLocking?
>>
>> I’m also surprised you see those long logging pauses because all the 
>> other JavaThreads were already processed in the handshake so something 
>> must be executing. Maybe you can switch the logging to trace instead 
>> of debug. Also maybe you can add logging at Java code so you can 
>> follow what is actually running? You could send it to the same app.txt 
>> and flush every time you write to see it ordered in the log. You can 
>> also get the uptime you see in the UL logs with 
>> ManagementFactory.getRuntimeMXBean().getUptime(). That logging will 
>> affect timing but might give some insights.
>>
>> Thanks,
>> Patricio
>>> Miklos
>>>
>>> On 4/28/20 1:02 AM, David Holmes wrote:
>>>> Hi Miklos,
>>>>
>>>> On 28/04/2020 7:26 am, Karakó Miklós wrote:
>>>>> Hello,
>>>>>
>>>>> We bumped into a possible Thread-Local Handshakes issue with 
>>>>> multiple apps. It seems that enabled TLH slows down applications 
>>>>> running on (although rare) single CPU boxes. I would be grateful if 
>>>>> you could confirm that this is a known trade-off for these rare 
>>>>> setups or a possible JVM bug. That would save us at least a few 
>>>>> hours of debugging.
>>>>>
>>>>> TL;DR: Both tested apps stop frequently around 
>>>>> "HandshakeOneThread", "HandshakeAllThreads" and "Revoked bias of 
>>>>> currently-unlocked object" running with 
>>>>> OpenJDK12/OpenJDK13/OpenJDK14. OpenJDK 13 with 
>>>>> -XX:-ThreadLocalHandshakes fixes the issue. Adding a second CPU to 
>>>>> the virtual machine fixes the issue. Enabling hyper-threading fixes 
>>>>> the issue.
>>>>
>>>> Can you try with latest JDK 15 build, just to see if this may be 
>>>> something already addressed?
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> More details are available at StackOverflow: 
>>>>> https://stackoverflow.com/questions/61375565/slow-application-frequent-jvm-hangs-with-single-cpu-setups-and-java-12 
>>>>>
>>>>>
>>>>> All thoughts are welcome.
>>>>>
>>>>> Best,
>>>>> Miklos
>>>>>
>>>
>>
>