Strange STW pauses

Wed Aug 23 12:49:47 UTC 2017

David Holmes wrote
> On 23/08/2017 6:48 PM, Nick Chadwick wrote:
>> Thanks both for your replies.
>> 
>> Yes, it is a recurring issue - it doesn't go away after the JVM has
>> warmed
>> up. I am able to trigger the pauses quite easily and regularly with
>> minimal
>> load.
>> 
>> I am already logging GC in addition, but there are no GCs close to the
>> pauses that I am seeing. As I understand it, the log entries above are
>> explicitly not garbage collections - they would show something other than
>> "no vm operation" under vmop (e.g. CollectForMetadataAllocation,
>> ParallelGCFailedAllocation, etc) - but guaranteed safepoints. I
>> understand I
>> can tweak the GuaranteedSafepointInterval, but that doesn't feel like the
>> solution to me, and would probably just make the pauses longer, albeit
>> less
>> often.
>> 
>> If you do think it's still worth me posting some GC logs here as well, I
>> can, but it definitely feels like it is not GC-related. (I am running
>> with a
>> large heap, and see only a few, short pauses for GC)
>> 
>> Although I agree it looks like the vmop itself is not taking up the time,
>> I
>> am seeing stop-the-world pauses that correlate very closely with the
>> total
>> time spent under sync/cleanup in these specific pauses.
> 
> I did not glean that from the log fragments you posted. The cleaning 
> times seemed very short, as was the time to reach the safepoint. But the 
> formatting was problematic so perhaps I misread something ??
> 
> If there are long delays in reaching a safepoint then it is likely a C2 
> issue where safepoint checks have been elided from lengthy code 
> fragments. Try running with -XX:+UseCountedLoopSafepoints
> 
> David
> 
>> Grateful for any further insight.
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://openjdk.5641.n7.nabble.com/Strange-STW-pauses-tp312324p312352.html
>> Sent from the OpenJDK Hotspot Runtime System mailing list archive at
>> Nabble.com.
>>

Admittedly, the snippets in my original post are a bit confusing - the first
two pauses and the second two pauses are notably different in the way
they're logged, albeit they all caused STW pauses of multiple seconds.

If you look at the last two, the"total time for which application threads
were stopped" (5.7s and 4.4s respectively) closely match the "cleanup" time
- and the pauses I observed. That seems logical to me.

The first two, though, show hardly any "total time for which application
threads were stopped", but substantially longer numbers in "sync" time. The
pauses I observed were approximately equal to that sync time.

Anyway, I'll revert when I've got some more logs to share, and will look at
adding UseCountedLoopSafepoints into the mix. I think I can guess what that
does :)

I should state that there are no valid situations where threads running
application code should be taking several seconds to sync to a safepoint -
there are no long loops, as fast response time is a key requirement of the
application.

Nick

--
View this message in context: http://openjdk.5641.n7.nabble.com/Strange-STW-pauses-tp312324p312374.html
Sent from the OpenJDK Hotspot Runtime System mailing list archive at Nabble.com.