Thread-Local Handshakes slowdown on single-cpu machines
David Holmes
david.holmes at oracle.com
Wed Apr 29 13:04:11 UTC 2020
Hi Robbin,
On 29/04/2020 5:45 pm, Robbin Ehn wrote:
> Hi,
>
> On 2020-04-29 07:31, David Holmes wrote:
>> On 29/04/2020 1:26 pm, Karakó Miklós wrote:
>>> Hi Patricio,
>>>
>>> Yes, it seems like a scheduling issue for us too. One of my
>>> colleagues spotted the following sentence in Robbin Ehn's Handshaking
>>> HotSpot presentation (https://youtu.be/VBCOfAJ409s?t=2222): ~"if the
>>> VM thread finds that all threads are unsafe, it keeps spinning"
>>> Could it be that actually the VM thread gets more CPU time and
>>> actually blocks JavaThread?
>>
>
> At the moment handshake do not use yield or sleep.
> http://hg.openjdk.java.net/jdk/jdk/file/ddb12e77e8dd/src/hotspot/share/runtime/handshake.cpp
>
> Do while loop L184 -> L201.
>
> I have tested at least three different approaches trying to ease off the
> CPU, all off them suffered from performance regression on MP machines.
> We have internally seen this on very rare occasions on MP (we do not
> have UPs). Since this have been so rare, not trivial to fix without
> performance regression and I was under the assumption that UP were
> basically dead :), it have not been prioritized.
So I have to wonder why we did not use the same VMThread back_off()
strategy with handshakes?
Cheers,
David
-----
> To just fix the UP case we could add something like:
> diff -r f7919010c7c2 src/hotspot/share/runtime/handshake.cpp
> --- a/src/hotspot/share/runtime/handshake.cpp Tue Apr 28 09:14:17 2020
> +0200
> +++ b/src/hotspot/share/runtime/handshake.cpp Wed Apr 29 09:36:42 2020
> +0200
> @@ -200,2 +200,5 @@
> }
> + if (!os::is_MP()) {
> + os::naked_yield();
> + }
> } while (!_op->is_completed());
>
> Since no thread can progress (assuming that one iteration is much
> shorter than a time-slice, this is extremely unlikely) there is no need
> for a second iteration until we executed something else.
>
> And should be trivial to backport to JDK 11.
>
> Thanks, Robbin
>
>
>
>> The "spinning" is intended to be done in a scheduler-friendly way via:
>>
>> void os::naked_short_nanosleep(jlong ns) {
>> assert(ns > -1 && ns < NANOUNITS, "Un-interruptable sleep, short
>> time use only");
>>
>> int64_t start = os::javaTimeNanos();
>> do {
>> if (SwitchToThread() == 0) {
>> // Nothing else is ready to run on this cpu, spin a little
>> SpinPause();
>> }
>> } while (os::javaTimeNanos() - start < ns);
>> }
>>
>> but this won't be that scheduler-friendly because it places the
>> VMThread at the head of the priority queue, so it will cede up-to a
>> timeslice to another thread, then run another iteration of the loop
>> and hence alternate between a JavaThread and the VMThread until they
>> are all blocked at the safepoint. What is really wanted is for the
>> VMThread to go to the back of the scheduling queue.
>>
>> David
>> -----
>>
>>> I did not except that but I could reproduce the issue with a simple
>>> hello world:
>>>
>>> public class Main {
>>> public static void main(String[] args) {
>>> System.out.println("Hello world");
>>> }
>>> }
>>>
>>> A few measurements on Windows 10 x64 in VirtualBox:
>>>
>>>
>>> PS Z:\main> Measure-Command { .\main-1cpu.bat }
>>>
>>> Minutes : 0
>>> Seconds : 7
>>> Milliseconds : 20
>>> Ticks : 70203455
>>> TotalMinutes : 0.117005758333333
>>> TotalSeconds : 7.0203455
>>> TotalMilliseconds : 7020.3455
>>>
>>>
>>>
>>> PS Z:\main> Measure-Command { .\main-full.bat }
>>>
>>> Minutes : 0
>>> Seconds : 1
>>> Milliseconds : 575
>>> Ticks : 15751352
>>> TotalSeconds : 1.5751352
>>> TotalMilliseconds : 1575.1352
>>>
>>>
>>>
>>> PS Z:\main> Measure-Command { .\main-full.bat }
>>>
>>> Minutes : 0
>>> Seconds : 1
>>> Milliseconds : 558
>>> Ticks : 15585384
>>> TotalSeconds : 1.5585384
>>> TotalMilliseconds : 1558.5384
>>>
>>>
>>>
>>> PS Z:\main> Measure-Command { .\main-1cpu.bat }
>>>
>>> Minutes : 0
>>> Seconds : 23
>>> Milliseconds : 648
>>> Ticks : 236482685
>>> TotalSeconds : 23.6482685
>>> TotalMilliseconds : 23648.2685
>>>
>>>
>>> main-1cpu.bat:
>>>
>>> c:\windows\system32\cmd.exe /C start /affinity 1 \jdk-14\bin\java
>>> -Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50
>>> Main
>>>
>>> main-full.bat:
>>>
>>> c:\windows\system32\cmd.exe /C start /affinity FF \jdk-14\bin\java
>>> -Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50
>>> Main
>>>
>>> The files are also available here:
>>> https://palacsint.hu/etc/jdk/jdk14-single-cpu-slowdown-reprod-poc/
>>> Or in a zip:
>>> https://palacsint.hu/etc/jdk/jdk14-single-cpu-slowdown-reprod-poc.zip
>>>
>>> You can find trace level logs of three testruns in the run1-2-3
>>> directories as well.
>>>
>>> Best,
>>> Miklos
>>>
>>>
>>> On 4/29/20 2:18 AM, Patricio Chilano wrote:
>>>> Hi Miklos,
>>>>
>>>> On 4/28/20 8:04 AM, Karakó Miklós wrote:
>>>>> Hello David,
>>>>>
>>>>> I've tried with OpenJDK 15 EA Build 20. Unfortunately it's slow as
>>>>> with OpenJDK14.
>>>>> A colleague of mine set CPU affinity with taskset to only one CPU
>>>>> on Linux and did not experience the slowdown. So the issue seems
>>>>> Windows-related.
>>>> I wouldn’t rule out this being a scheduling issue. With thread-local
>>>> handshakes some operations like the ones you see in your logs
>>>> (deoptimization, biasedlocking revocation) do not need all
>>>> JavaThreads to be stopped at the same time in order to be executed.
>>>> If you only have one CPU but many JavaThreads running, depending on
>>>> the scheduler, maybe it could happen that the JavaThread you need to
>>>> handshake is constantly being left behind in the scheduler queue and
>>>> so the operation takes longer to execute. When you add
>>>> -XX:-ThreadLocalHandshakes, those operations will be executed using
>>>> safepoints, which means all JavaThreads are stopped. So even if the
>>>> scheduler decides to give priority to other JavaThreads those will
>>>> block and free the CPU. Also when adding more CPU that will increase
>>>> the likelihood of a JavaThread getting scheduled and might explain
>>>> why you see it gets fixed.
>>>>
>>>> Alternatively the JavaThread that needs to be handshaked is being
>>>> scheduled but is not polling for the handshake. But I think in that
>>>> the case you should still have the same issue with
>>>> -XX:-ThreadLocalHandshakes because the polling mechanism is the
>>>> same, although it's true that the misbehaving JavaThread will have
>>>> more CPU time to complete whatever is doing while others are stopped
>>>> when safepointing.
>>>>
>>>> One thing that makes me question the scheduler theory is the fact
>>>> that you see such improvements when only disabling biased-locking in
>>>> 13.0.2 because that was still using safepoints back then. Can you
>>>> check the logs to see in which operations you see the pauses when
>>>> running with default options compared to -XX:-UseBiasedLocking?
>>>>
>>>> I’m also surprised you see those long logging pauses because all the
>>>> other JavaThreads were already processed in the handshake so
>>>> something must be executing. Maybe you can switch the logging to
>>>> trace instead of debug. Also maybe you can add logging at Java code
>>>> so you can follow what is actually running? You could send it to the
>>>> same app.txt and flush every time you write to see it ordered in the
>>>> log. You can also get the uptime you see in the UL logs with
>>>> ManagementFactory.getRuntimeMXBean().getUptime(). That logging will
>>>> affect timing but might give some insights.
>>>>
>>>> Thanks,
>>>> Patricio
>>>>> Miklos
>>>>>
>>>>> On 4/28/20 1:02 AM, David Holmes wrote:
>>>>>> Hi Miklos,
>>>>>>
>>>>>> On 28/04/2020 7:26 am, Karakó Miklós wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> We bumped into a possible Thread-Local Handshakes issue with
>>>>>>> multiple apps. It seems that enabled TLH slows down applications
>>>>>>> running on (although rare) single CPU boxes. I would be grateful
>>>>>>> if you could confirm that this is a known trade-off for these
>>>>>>> rare setups or a possible JVM bug. That would save us at least a
>>>>>>> few hours of debugging.
>>>>>>>
>>>>>>> TL;DR: Both tested apps stop frequently around
>>>>>>> "HandshakeOneThread", "HandshakeAllThreads" and "Revoked bias of
>>>>>>> currently-unlocked object" running with
>>>>>>> OpenJDK12/OpenJDK13/OpenJDK14. OpenJDK 13 with
>>>>>>> -XX:-ThreadLocalHandshakes fixes the issue. Adding a second CPU
>>>>>>> to the virtual machine fixes the issue. Enabling hyper-threading
>>>>>>> fixes the issue.
>>>>>>
>>>>>> Can you try with latest JDK 15 build, just to see if this may be
>>>>>> something already addressed?
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> More details are available at StackOverflow:
>>>>>>> https://stackoverflow.com/questions/61375565/slow-application-frequent-jvm-hangs-with-single-cpu-setups-and-java-12
>>>>>>>
>>>>>>>
>>>>>>> All thoughts are welcome.
>>>>>>>
>>>>>>> Best,
>>>>>>> Miklos
>>>>>>>
>>>>>
>>>>
>>>
More information about the hotspot-dev
mailing list