Increased safepoint cleanup tasks in Java 11 (vs 8)

Tue Jun 18 19:44:40 UTC 2019

Sorry for the delay in getting back to this query. I've been focused
on other tasks...

On 6/12/19 12:46 PM, Vitaly Davidovich wrote:
>
>
> On Thu, Jun 6, 2019 at 10:29 AM Daniel D. Daugherty 
> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>
>     On 6/6/19 10:08 AM, Vitaly Davidovich wrote:
>>
>>
>>     On Wed, Jun 5, 2019 at 10:13 PM Vitaly Davidovich
>>     <vitalyd at gmail.com <mailto:vitalyd at gmail.com>> wrote:
>>
>>
>>
>>         On Wed, Jun 5, 2019 at 7:37 PM Vitaly Davidovich
>>         <vitalyd at gmail.com <mailto:vitalyd at gmail.com>> wrote:
>>
>>
>>
>>             On Wed, Jun 5, 2019 at 7:10 PM Daniel D. Daugherty
>>             <daniel.daugherty at oracle.com
>>             <mailto:daniel.daugherty at oracle.com>> wrote:
>>
>>                 On 6/5/19 7:03 PM, Vitaly Davidovich wrote:
>>>
>>>
>>>                 On Wed, Jun 5, 2019 at 6:55 PM David Holmes
>>>                 <david.holmes at oracle.com
>>>                 <mailto:david.holmes at oracle.com>> wrote:
>>>
>>>                     On 6/06/2019 2:31 am, Vitaly Davidovich wrote:
>>>                     > On Wed, Jun 5, 2019 at 12:15 PM Daniel D.
>>>                     Daugherty <
>>>                     > daniel.daugherty at oracle.com
>>>                     <mailto:daniel.daugherty at oracle.com>> wrote:
>>>                     >
>>>                     >> On 6/5/19 12:04 PM, Vitaly Davidovich wrote:
>>>                     >>
>>>                     >>
>>>                     >>
>>>                     >> On Wed, Jun 5, 2019 at 11:44 AM Daniel D.
>>>                     Daugherty <
>>>                     >> daniel.daugherty at oracle.com
>>>                     <mailto:daniel.daugherty at oracle.com>> wrote:
>>>                     >>
>>>                     >>> On 6/5/19 11:37 AM, Vitaly Davidovich wrote:
>>>                     >>>> On Wed, Jun 5, 2019 at 8:58 AM Vitaly
>>>                     Davidovich <vitalyd at gmail.com
>>>                     <mailto:vitalyd at gmail.com>>
>>>                     >>> wrote:
>>>                     >>>>
>>>                     >>>>> Hey Aleksey,
>>>                     >>>>>
>>>                     >>>>> On Wed, Jun 5, 2019 at 8:52 AM Aleksey
>>>                     Shipilev <shade at redhat.com
>>>                     <mailto:shade at redhat.com>>
>>>                     >>> wrote:
>>>                     >>>>>
>>>                     >>>>>> On 6/5/19 2:33 PM, Vitaly Davidovich wrote:
>>>                     >>>>>>> Does anyone have ideas on pinpointing
>>>                     this further?
>>>                     >>>>>> I have the educated guess:
>>>                     >>>>>>
>>>                     https://bugs.openjdk.java.net/browse/JDK-8181859
>>>                     >>>>>>
>>>                     >>>>>> Which means you can try if
>>>                     -XX:MonitorUsedDeflationThreshold=0 reduces
>>>                     >>>>>> the number of "guaranteed"
>>>                     >>>>>> safepoints.
>>>                     >>>>>>
>>>                     >>>>> Hmm, the application doesn't use JVM
>>>                     monitors (in any meaningful
>>>                     >>>>> capacity).  But interesting JBS entry - I
>>>                     will try it out, thanks! The
>>>                     >>>>> description of safepoint clean up in there
>>>                     is/was consistent with my
>>>                     >>>>> understanding as well. On Java 8, we'd see
>>>                     the "no vm operation"
>>>                     >>> safepoint
>>>                     >>>>> operations, and I always assumed it was IC
>>>                     buffer cleaning (even had a
>>>                     >>> mail
>>>                     >>>>> thread on that topic a few years ago,
>>>                     either on this list or
>>>                     >>>>> hotspot-compiler).  That tended to
>>>                     stabilize (i.e. they went away)
>>>                     >>> after
>>>                     >>>>> the application reached steady state,
>>>                     whereas I'm seeing essentially
>>>                     >>>>> non-stop safepoint cleanups on 11.
>>>                     >>>>>
>>>                     >>>> Ok, adding
>>>                     -XX:MonitorUsedDeflationThreshold=0 seems to
>>>                     alleviate the
>>>                     >>> issue
>>>                     >>>> - # of entered safepoint cleanups looks on
>>>                     par with Java 8, although I'm
>>>                     >>>> still assessing a bit more.
>>>                     >>>
>>>                     >>> This observation doesn't really match with
>>>                     your earlier comment:
>>>                     >>>
>>>                     >>>> Hmm, the application doesn't use JVM
>>>                     monitors (in any meaningful
>>>                     >>>> capacity).
>>>                     >>>
>>>                     >>> The default value for
>>>                     MonitorUsedDeflationThreshold is 90. So a cleanup
>>>                     >>> safepoint should only be triggered if
>>>                     monitor use exceeds 90% of the
>>>                     >>> monitor pool.
>>>                     >>>
>>>                     >> Hmm, ok - I see what you mean:
>>>                     >>
>>>                     https://github.com/openjdk/jdk11u/blob/d46b56eb81b07b452141c7dcd078d33fc8e49d63/src/hotspot/share/runtime/safepoint.cpp#L604
>>>                     >>
>>>                     >> So there's either monitor usage that I'm not
>>>                     aware of (or forgot about) or
>>>                     >> the ratio is > 90 but the absolute # of them
>>>                     is small (which I'd consider
>>>                     >> "don't use in any meaningful capacity").  Is
>>>                     there a way to print out
>>>                     >> monitor population so I can see?
>>>                     >>
>>>                     >>
>>>                     >> I've been adding a lot of logging support to
>>>                     the Java Monitor subsystem,
>>>                     >> but those changes are in JDK13 and not in
>>>                     JDK11. I'm trying to remember
>>>                     >> what there is in JDK11, but ...
>>>                     >>
>>>                     > Yeah, I'm having a hard time keeping track of
>>>                     what features/changes are in
>>>                     > what Java release or update version or
>>>                     upstream/downstream backport or ...
>>>                     >
>>>                     > By the way, do JNI (specifically, Java calling
>>>                     into native) calls cause
>>>                     > monitor inflation? The app in question does
>>>                     have substantial JNI usage.  I
>>>                     > guess by not using java monitors extensively,
>>>                     I meant the vanilla flavor of
>>>                     > Java code with synchronized blocks/methods.
>>>
>>>                     JNI calls do not in themselves impact monitors.
>>>                     However if native code
>>>                     uses the JNI MonitorEnter then that does trigger
>>>                     inflation.
>>>
>>>                     David
>>>                     -----
>>>
>>>                 Thanks for clarifying, David.
>>>
>>>                 Pretty certain that JNI code I have is not calling
>>>                 MonitorEnter, but will double check.
>>>
>>>                 So beyond monitor contention via vanilla Java code,
>>>                 are there other scenarios resulting in inflation? Is
>>>                 there perhaps a list of these that someone could
>>>                 point to? I can spelunk through hotspot code, but
>>>                 maybe someone can spare me that exercise :).
>>
>>                 Java Monitor inflation is caused by:
>>
>>                 - contention on the same monitor between 2 or more
>>                 threads
>>                 - monitor.wait() call
>>                 - synchronized (obj) { obj.hashCode(); }
>>
>>                 The last one is specific to the HotSpot VM.
>>
>>                 Dan
>>
>>             Awesome, thanks Dan! In lieu of more precise monitor
>>             usage logging in 11, I’ll look at our code for these.
>>
>>         I’m going to try "monitorinflation=debug" tomorrow:
>>         https://github.com/openjdk/jdk11u/blob/737d8437886ad97c6ed21a25b9911c10b3886f61/src/hotspot/share/runtime/synchronizer.cpp#L1519
>>
>>         Maybe that’ll help shed light on inflations.
>>
>>     So it doesn't look like there's much monitor inflation - some
>>     early during startup, but they go away fairly quickly (if
>>     monitorinflation=debug is to be trusted comprehensive).
>
>     Yes the 'monitorinflation=debug' logging in JDK11u will tell you about
>     every inflation and deflation of an ObjectMonitor. It is rather
>     verbose
>     if you have a lot of ObjectMonitor inflation/deflation happening.
>     If it
>     is not overloading you with information, then monitor deflation is not
>     your issue.
>
> So have been playing around with 'monitorinflation=debug' in 
> conjunction with -XX:MonitorUsedDeflationThreshold=0. One (so far) 
> puzzling observation is with the UL flag set, I still see recurring 
> safepoint cleanups but no trace of monitor inflation/deflation 
> events.  Naively, I'd expect to see (at least) inflation events if a 
> subsequent safepoint cleanup is entered for deflations.

Are you not seeing _any_ monitor inflation events?
Or are you seeing _some_ monitor inflation events at the beginning
and then no more after a steady state is reached?

> Setting MonitorUsedDeflationThreshold=0 does stop the cleanup storm, 
> however.

So that indicates that you've found the right knob to stop this change
in behavior. That's good.

> So unless there's some inflation that is not reported via UL, it's not 
> clear why there's this seeming disconnect.

These two spots in the code are the only two where we inflate a
monitor:

open/src/hotspot/share/runtime/synchronizer.cpp:

     // CASE: stack-locked
<snip>
       if (log_is_enabled(Debug, monitorinflation)) {
         if (object->is_instance()) {
           ResourceMark rm;
           log_debug(monitorinflation)("Inflating object " INTPTR_FORMAT 
" , mark " INTPTR_FORMAT " , type %s",
                                       p2i(object), p2i(object->mark()),
object->klass()->external_name());
         }
       }
       if (event.should_commit()) {
         post_monitor_inflate_event(&event, object, cause);
       }
       return m;
     }

     // CASE: neutral
<snip>
     if (log_is_enabled(Debug, monitorinflation)) {
       if (object->is_instance()) {
         ResourceMark rm;
         log_debug(monitorinflation)("Inflating object " INTPTR_FORMAT " 
, mark " INTPTR_FORMAT " , type %s",
                                     p2i(object), p2i(object->mark()),
object->klass()->external_name());
       }
     }
     if (event.should_commit()) {
       post_monitor_inflate_event(&event, object, cause);
     }
     return m;
   }

I have changed this code a bit in JDK13. For example, I've
removed the "if (object->is_instance())" checks so in JDK11u
is possible for there to be inflations related to objects
where "object->is_instance() == false" and those won't be
reported.

In my own testing of the logging improvements in JDK13, I
saw very few messages where "object->is_instance() == false"
and I saw no ill effects from stripping that if-statement
so I deleted it...

Dan

>
>
>     Just FYI: in JDK13 that particular set of diagnostics have moved from
>     'monitorinflation=debug' to 'monitorinflation=trace'.
>
> Ok, good to know - thanks for the heads up.
>
>
>     Dan
>
>
>>     Yet safepoints continue.  Interestingly, the timing reported in
>>     safepoint cleanup logging shows miniscule time spent "deflating
>>     idle monitors" (e.g. 0.0000004 secs = 400 ns), which I take as
>>     essentially overhead of going through that codepath without
>>     actually doing much of anything. "updating inline caches" is
>>     about the same (400-600 ns).
>>
>>     However, "compilation policy safepoint handler" is tens of micros
>>     (e.g. 30-50us).  Looking at the code,
>>     https://github.com/openjdk/jdk11u/blob/737d8437886ad97c6ed21a25b9911c10b3886f61/src/hotspot/share/runtime/compilationPolicy.cpp#L321,
>>     it appears to be doing method invocation counter decaying.  That
>>     does stackwalks (the application has about 30 threads), so fine -
>>     it's not very cheap.  But, according to
>>     https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/runtime/safepoint.cpp#L602,
>>     decay itself does no gate doing cleanup operations.
>>
>>     Unfortunately, it doesn't appear that I can trace ICBuffers in
>>     product builds (TraceICBuffer flag isn't present), so can't tell
>>     whether there's some change (vs java 8) in their regard. 
>>     Naively, it also seems odd that a single stub in the ICBuffer
>>     would require a cleanup:
>>     https://github.com/openjdk/jdk11u/blob/737d8437886ad97c6ed21a25b9911c10b3886f61/src/hotspot/share/code/icBuffer.cpp#L163.
>>
>>     I'll keep investigating.
>>
>>
>>         Dan, please let me know if you think this will be fruitless :).
>>
>>
>>             Only remaining input I’d appreciate from the list is
>>             whether setting MonitorUsedDeflationThreshold=0
>>             “restores” 8 behavior, or whether I’d run the risk of
>>             something unfortunate as a result (again, with respect to 8).
>>
>>             Thanks
>>
>>
>>
>>
>>>
>>>                 Thanks!
>>>
>>>
>>>
>>>                     >
>>>                     >>
>>>                     >> Dan
>>>                     >>
>>>                     >>
>>>                     >>
>>>                     >> Thanks Dan
>>>                     >>
>>>                     >>>
>>>                     >>> Dan
>>>                     >>>
>>>                     >>>
>>>                     >>>>
>>>                     >>>> Of course now the trouble is this is an
>>>                     experimental option, and
>>>                     >>> enabling
>>>                     >>>> things like that in production raises
>>>                     eyebrows.  Is it safe to assume,
>>>                     >>>> however, that setting this to 0 essentially
>>>                     behaves like Java 8? It
>>>                     >>> appears
>>>                     >>>> so, based on that JBS Aleksey linked above.
>>>                     But just double checking.
>>>                     >>>>
>>>                     >>>> If that's the case, then perhaps we can
>>>                     overlook the experimental bit
>>>                     >>> and
>>>                     >>>> wait until this is done outside safepoints,
>>>                     per Robbin's email.
>>>                     >>>>
>>>                     >>>> Thanks to everyone that responded on this
>>>                     thread! Very helpful.
>>>                     >>>>
>>>                     >>>>>> --
>>>                     >>>>>> Thanks,
>>>                     >>>>>> -Aleksey
>>>                     >>>>>>
>>>                     >>>>>>
>>>                     >>>
>>>                     >>>
>>>                     >>
>>>
>>>                 -- 
>>>                 Sent from my phone
>>
>>             -- 
>>             Sent from my phone
>>
>>         -- 
>>         Sent from my phone
>>
>