Thread-Local Handshakes slowdown on single-cpu machines
Robbin Ehn
robbin.ehn at oracle.com
Wed Apr 29 07:45:44 UTC 2020
Hi,
On 2020-04-29 07:31, David Holmes wrote:
> On 29/04/2020 1:26 pm, Karakó Miklós wrote:
>> Hi Patricio,
>>
>> Yes, it seems like a scheduling issue for us too. One of my colleagues
>> spotted the following sentence in Robbin Ehn's Handshaking HotSpot
>> presentation (https://youtu.be/VBCOfAJ409s?t=2222): ~"if the VM thread
>> finds that all threads are unsafe, it keeps spinning"
>> Could it be that actually the VM thread gets more CPU time and
>> actually blocks JavaThread?
>
At the moment handshake do not use yield or sleep.
http://hg.openjdk.java.net/jdk/jdk/file/ddb12e77e8dd/src/hotspot/share/runtime/handshake.cpp
Do while loop L184 -> L201.
I have tested at least three different approaches trying to ease off the
CPU, all off them suffered from performance regression on MP machines.
We have internally seen this on very rare occasions on MP (we do not
have UPs). Since this have been so rare, not trivial to fix without
performance regression and I was under the assumption that UP were
basically dead :), it have not been prioritized.
To just fix the UP case we could add something like:
diff -r f7919010c7c2 src/hotspot/share/runtime/handshake.cpp
--- a/src/hotspot/share/runtime/handshake.cpp Tue Apr 28 09:14:17 2020
+0200
+++ b/src/hotspot/share/runtime/handshake.cpp Wed Apr 29 09:36:42 2020
+0200
@@ -200,2 +200,5 @@
}
+ if (!os::is_MP()) {
+ os::naked_yield();
+ }
} while (!_op->is_completed());
Since no thread can progress (assuming that one iteration is much
shorter than a time-slice, this is extremely unlikely) there is no need
for a second iteration until we executed something else.
And should be trivial to backport to JDK 11.
Thanks, Robbin
> The "spinning" is intended to be done in a scheduler-friendly way via:
>
> void os::naked_short_nanosleep(jlong ns) {
> assert(ns > -1 && ns < NANOUNITS, "Un-interruptable sleep, short time
> use only");
>
> int64_t start = os::javaTimeNanos();
> do {
> if (SwitchToThread() == 0) {
> // Nothing else is ready to run on this cpu, spin a little
> SpinPause();
> }
> } while (os::javaTimeNanos() - start < ns);
> }
>
> but this won't be that scheduler-friendly because it places the VMThread
> at the head of the priority queue, so it will cede up-to a timeslice to
> another thread, then run another iteration of the loop and hence
> alternate between a JavaThread and the VMThread until they are all
> blocked at the safepoint. What is really wanted is for the VMThread to
> go to the back of the scheduling queue.
>
> David
> -----
>
>> I did not except that but I could reproduce the issue with a simple
>> hello world:
>>
>> public class Main {
>> public static void main(String[] args) {
>> System.out.println("Hello world");
>> }
>> }
>>
>> A few measurements on Windows 10 x64 in VirtualBox:
>>
>>
>> PS Z:\main> Measure-Command { .\main-1cpu.bat }
>>
>> Minutes : 0
>> Seconds : 7
>> Milliseconds : 20
>> Ticks : 70203455
>> TotalMinutes : 0.117005758333333
>> TotalSeconds : 7.0203455
>> TotalMilliseconds : 7020.3455
>>
>>
>>
>> PS Z:\main> Measure-Command { .\main-full.bat }
>>
>> Minutes : 0
>> Seconds : 1
>> Milliseconds : 575
>> Ticks : 15751352
>> TotalSeconds : 1.5751352
>> TotalMilliseconds : 1575.1352
>>
>>
>>
>> PS Z:\main> Measure-Command { .\main-full.bat }
>>
>> Minutes : 0
>> Seconds : 1
>> Milliseconds : 558
>> Ticks : 15585384
>> TotalSeconds : 1.5585384
>> TotalMilliseconds : 1558.5384
>>
>>
>>
>> PS Z:\main> Measure-Command { .\main-1cpu.bat }
>>
>> Minutes : 0
>> Seconds : 23
>> Milliseconds : 648
>> Ticks : 236482685
>> TotalSeconds : 23.6482685
>> TotalMilliseconds : 23648.2685
>>
>>
>> main-1cpu.bat:
>>
>> c:\windows\system32\cmd.exe /C start /affinity 1 \jdk-14\bin\java
>> -Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50 Main
>>
>> main-full.bat:
>>
>> c:\windows\system32\cmd.exe /C start /affinity FF \jdk-14\bin\java
>> -Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50 Main
>>
>> The files are also available here:
>> https://palacsint.hu/etc/jdk/jdk14-single-cpu-slowdown-reprod-poc/
>> Or in a zip:
>> https://palacsint.hu/etc/jdk/jdk14-single-cpu-slowdown-reprod-poc.zip
>>
>> You can find trace level logs of three testruns in the run1-2-3
>> directories as well.
>>
>> Best,
>> Miklos
>>
>>
>> On 4/29/20 2:18 AM, Patricio Chilano wrote:
>>> Hi Miklos,
>>>
>>> On 4/28/20 8:04 AM, Karakó Miklós wrote:
>>>> Hello David,
>>>>
>>>> I've tried with OpenJDK 15 EA Build 20. Unfortunately it's slow as
>>>> with OpenJDK14.
>>>> A colleague of mine set CPU affinity with taskset to only one CPU on
>>>> Linux and did not experience the slowdown. So the issue seems
>>>> Windows-related.
>>> I wouldn’t rule out this being a scheduling issue. With thread-local
>>> handshakes some operations like the ones you see in your logs
>>> (deoptimization, biasedlocking revocation) do not need all
>>> JavaThreads to be stopped at the same time in order to be executed.
>>> If you only have one CPU but many JavaThreads running, depending on
>>> the scheduler, maybe it could happen that the JavaThread you need to
>>> handshake is constantly being left behind in the scheduler queue and
>>> so the operation takes longer to execute. When you add
>>> -XX:-ThreadLocalHandshakes, those operations will be executed using
>>> safepoints, which means all JavaThreads are stopped. So even if the
>>> scheduler decides to give priority to other JavaThreads those will
>>> block and free the CPU. Also when adding more CPU that will increase
>>> the likelihood of a JavaThread getting scheduled and might explain
>>> why you see it gets fixed.
>>>
>>> Alternatively the JavaThread that needs to be handshaked is being
>>> scheduled but is not polling for the handshake. But I think in that
>>> the case you should still have the same issue with
>>> -XX:-ThreadLocalHandshakes because the polling mechanism is the same,
>>> although it's true that the misbehaving JavaThread will have more CPU
>>> time to complete whatever is doing while others are stopped when
>>> safepointing.
>>>
>>> One thing that makes me question the scheduler theory is the fact
>>> that you see such improvements when only disabling biased-locking in
>>> 13.0.2 because that was still using safepoints back then. Can you
>>> check the logs to see in which operations you see the pauses when
>>> running with default options compared to -XX:-UseBiasedLocking?
>>>
>>> I’m also surprised you see those long logging pauses because all the
>>> other JavaThreads were already processed in the handshake so
>>> something must be executing. Maybe you can switch the logging to
>>> trace instead of debug. Also maybe you can add logging at Java code
>>> so you can follow what is actually running? You could send it to the
>>> same app.txt and flush every time you write to see it ordered in the
>>> log. You can also get the uptime you see in the UL logs with
>>> ManagementFactory.getRuntimeMXBean().getUptime(). That logging will
>>> affect timing but might give some insights.
>>>
>>> Thanks,
>>> Patricio
>>>> Miklos
>>>>
>>>> On 4/28/20 1:02 AM, David Holmes wrote:
>>>>> Hi Miklos,
>>>>>
>>>>> On 28/04/2020 7:26 am, Karakó Miklós wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We bumped into a possible Thread-Local Handshakes issue with
>>>>>> multiple apps. It seems that enabled TLH slows down applications
>>>>>> running on (although rare) single CPU boxes. I would be grateful
>>>>>> if you could confirm that this is a known trade-off for these rare
>>>>>> setups or a possible JVM bug. That would save us at least a few
>>>>>> hours of debugging.
>>>>>>
>>>>>> TL;DR: Both tested apps stop frequently around
>>>>>> "HandshakeOneThread", "HandshakeAllThreads" and "Revoked bias of
>>>>>> currently-unlocked object" running with
>>>>>> OpenJDK12/OpenJDK13/OpenJDK14. OpenJDK 13 with
>>>>>> -XX:-ThreadLocalHandshakes fixes the issue. Adding a second CPU to
>>>>>> the virtual machine fixes the issue. Enabling hyper-threading
>>>>>> fixes the issue.
>>>>>
>>>>> Can you try with latest JDK 15 build, just to see if this may be
>>>>> something already addressed?
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> More details are available at StackOverflow:
>>>>>> https://stackoverflow.com/questions/61375565/slow-application-frequent-jvm-hangs-with-single-cpu-setups-and-java-12
>>>>>>
>>>>>>
>>>>>> All thoughts are welcome.
>>>>>>
>>>>>> Best,
>>>>>> Miklos
>>>>>>
>>>>
>>>
>>
More information about the hotspot-dev
mailing list