Increased safepoint cleanup tasks in Java 11 (vs 8)
Daniel D. Daugherty
daniel.daugherty at oracle.com
Thu Jun 20 13:43:43 UTC 2019
On 6/20/19 9:03 AM, Vitaly Davidovich wrote:
> Hey Dan,
>
> On Wed, Jun 19, 2019 at 4:44 PM Daniel D. Daugherty
> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>
> Vitaly,
>
> I filed the following bug:
>
> JDK-8226416 MonitorUsedDeflationThreshold can cause repeated
> cleanup safepoints
> https://bugs.openjdk.java.net/browse/JDK-8226416
>
> for this issue. I've created a test that reproduces the bug
> in my JDK13 environment and _should_ also recreate the problem
> in JDK11u... I have not tried it on JDK11u myself...
>
> Thanks! You mentioned the following in the bug report:
>
> at this point gMonitorFreeCount has reached 0 because all of the
> free global monitors have been allocated to per-thread lists.
>
> How do free global monitors get allocated to per-thread lists?
Please see ObjectSynchronizer::omAlloc() in
src/hotspot/share/runtime/synchronizer.cpp.
Short version: when omAlloc() is called, it will:
1) check the per-thread free list
2) check the global free list
3) allocate new ObjectMonitors
In the case that I'm talking about, the thread moved free nodes
from the global free list to its own per-thread free list (that's
step 2 above; see the sketch below).
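
Here's a much-simplified, self-contained sketch of that order. The
types and the helper function below are illustrative placeholders,
not the real code in synchronizer.cpp (which also has to take the
global list lock, update the counters, and carve monitors out of
larger blocks); it is only meant to show the per-thread-list ->
global-list -> new-block decision and how a reprovision drains the
global free list:

// Much-simplified model of the omAlloc() allocation order. The structs
// and the helper are placeholders, not the real HotSpot code.
struct ObjectMonitor {
  ObjectMonitor* FreeNext = nullptr;
};

struct Thread {
  ObjectMonitor* omFreeList = nullptr;  // per-thread free list
  int omFreeCount = 0;
  int omFreeProvision = 32;             // how many to grab from the global list
};

static ObjectMonitor* gFreeList = nullptr;  // global free list
static int gMonitorFreeCount = 0;
static int gMonitorPopulation = 0;

// Placeholder for step 3: the VM allocates a block of fresh ObjectMonitors
// and prepends them to the global free list.
static void allocate_block_onto_global_free_list(int n = 32) {
  for (int i = 0; i < n; i++) {
    ObjectMonitor* m = new ObjectMonitor();
    m->FreeNext = gFreeList;
    gFreeList = m;
    gMonitorFreeCount++;
    gMonitorPopulation++;
  }
}

static ObjectMonitor* omAlloc(Thread* Self) {
  for (;;) {
    // 1) Per-thread free list first.
    if (ObjectMonitor* m = Self->omFreeList) {
      Self->omFreeList = m->FreeNext;
      Self->omFreeCount--;
      return m;
    }
    // 2) Reprovision from the global free list: move up to omFreeProvision
    //    monitors onto this thread's list, then retry step 1. This is how
    //    gMonitorFreeCount can drop to 0 while most monitors are still
    //    free (they are just free on per-thread lists).
    if (gFreeList != nullptr) {
      for (int i = Self->omFreeProvision; --i >= 0 && gFreeList != nullptr;) {
        ObjectMonitor* take = gFreeList;
        gFreeList = take->FreeNext;
        gMonitorFreeCount--;
        take->FreeNext = Self->omFreeList;
        Self->omFreeList = take;
        Self->omFreeCount++;
      }
      Self->omFreeProvision += 1 + (Self->omFreeProvision / 2);  // ask for more next time
      continue;
    }
    // 3) Both lists empty: grow the global population and retry.
    allocate_block_onto_global_free_list();
  }
}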
> In your demo program, there's only a single (Java/user) thread, so
> what are the "per-thread lists" you're referring to? I didn't quite
> understand why, with 33 monitor inflations, the gMonitorFreeCount
> dropped by a lot more. It must be this per-thread list allocation,
> but perhaps you can explain a bit how that works.
The ObjectMonitor allocation and management mechanisms are complicated.
The first time a thread asks for ObjectMonitors, it asks for 32 of them.
If it uses all of those up, then the next time it asks for more. How
much more is determined by this line of code:
    Self->omFreeProvision += 1 + (Self->omFreeProvision/2);
so omFreeProvision starts at 32 and next time it will be
32 + 1 + (32 / 2) => 33 + 16 => 49
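
Applying that same update repeatedly gives 32, 49, 74, 112, ...
which matches the progression of the "Provision" column in the
tables below. If you want to convince yourself of the arithmetic,
here is a trivial standalone check (plain C++, nothing VM-specific):

// Standalone check of the omFreeProvision growth rule quoted above.
// Prints "32 49 74 112 169"; the 32 -> 49 -> 74 steps match the
// "Provision" column in the per-thread tables below.
#include <cstdio>

int main() {
  int omFreeProvision = 32;                         // initial provision
  for (int i = 0; i < 5; i++) {
    std::printf("%d ", omFreeProvision);
    omFreeProvision += 1 + (omFreeProvision / 2);   // the quoted update
  }
  std::printf("\n");
  return 0;
}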
If you look at the "debug" logging output files you'll find
some tables which show the "Provision" value for each thread.
So when I inflated 33 monitors in that thread, I was intentionally
using up the initial allocation and causing a new allocation request.
Although the omFreeProvision value went to 49, that does not mean
the thread actually allocated that many more; it only takes that
many if they are readily available on the global free list.
These are lines from the debug logging for our target thread:
[0.112s][debug][monitorinflation] Per-Thread Lists:       InUse       Free  Provision
[0.112s][debug][monitorinflation] ================== ========== ========== ==========
[0.112s][debug][monitorinflation] 0x00007fbc3f809000          0          0         32
[0.120s][debug][monitorinflation] 0x00007fbc3f809000          0          0         32
[0.207s][debug][monitorinflation] 0x00007fbc3f809000         19         13         49
[1.212s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
[2.219s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
[3.225s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
[4.230s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
[5.233s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
[6.240s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
[7.243s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
[8.249s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
[9.251s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
So our thread starts with an omFreeProvision value of 32, but
doesn't have any ObjectMonitors at all for the first two reports.
The third data row shows that it has 32 of them, split between
in-use (19) and free (13), and that the omFreeProvision value is
now 49; it was already bumped by that line of code above in
preparation for the _next_ allocation. The fourth data row shows
our thread with 33 in-use and none free: even though
omFreeProvision == 49 when we needed one more, we only got that
one more and did not get anything extra for our free list.
Another thing to keep in mind is that there are other threads, and
the first time one of them needs even a single inflated ObjectMonitor,
it will also grab 32 of them (if that many are available).
Here are the last tables before we hit the 10 second mark and
unlock all of our inflated monitors:
[9.251s][debug][monitorinflation] Global Lists:           InUse       Free      Total
[9.251s][debug][monitorinflation] ================== ========== ========== ==========
[9.251s][debug][monitorinflation]                             0          0        127
[9.251s][debug][monitorinflation] Per-Thread Lists:       InUse       Free  Provision
[9.251s][debug][monitorinflation] ================== ========== ========== ==========
[9.251s][debug][monitorinflation] 0x00007fbc3f809000         33          0         74
[9.251s][debug][monitorinflation] 0x00007fbc3f02f000          0         31         49
[9.251s][debug][monitorinflation] 0x00007fbc4000a800          1         30         49
[9.251s][debug][monitorinflation] 0x00007fbc3f941000          0          0         32
[9.251s][debug][monitorinflation] 0x00007fbc3f95e000          0          0         32
[9.251s][debug][monitorinflation] 0x00007fbc3f03a000          0          0         32
[9.251s][debug][monitorinflation] 0x00007fbc3f03d800          0          0         32
[9.251s][debug][monitorinflation] 0x00007fbc3f9ba800          0          0         32
[9.251s][debug][monitorinflation] 0x00007fbc3f03b000          1         31         49
[9.251s][debug][monitorinflation] Checking global lists:
[9.251s][debug][monitorinflation] gMonitorPopulation=127 equals chkMonitorPopulation=127
[9.251s][debug][monitorinflation] gOmInUseCount=0 equals chkOmInUseCount=0
[9.251s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0
We have a total of 127 ObjectMonitors out there spread across
4 threads. 3 threads are using one or more ObjectMonitors.
This line is interesting:
0x00007fbc3f02f000          0         31         49
This thread obviously needed an ObjectMonitor at some point
(because it has some on its free list), but none are currently
in-use. My guess is that this thread used an ObjectMonitor,
finished with it, the monitor was deflated and put back on the
global free list, and that's where our thread (0x00007fbc3f809000)
picked up one more ObjectMonitor to get to 33. If we switched from
the 'debug' level to the 'trace' level, the logging would show all
of that in gory detail...
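
That gMonitorFreeCount=0 state is also the heart of JDK-8226416:
the cleanup trigger is driven by the global counters, so with all
92 free monitors parked on per-thread lists the pool looks 100%
used, which stays above the default MonitorUsedDeflationThreshold
of 90, and every guaranteed safepoint keeps requesting a cleanup
(setting the threshold to 0 short-circuits the check). Roughly
what that check computes, written here as a standalone model
rather than the actual HotSpot code:

// Standalone model of the cleanup trigger discussed in this thread;
// field and flag names match the log output above, but this is
// illustrative arithmetic, not the actual HotSpot source.
#include <cstdio>

static int gMonitorPopulation = 127;             // from the 9.251s table
static int gMonitorFreeCount  = 0;               // all free monitors are per-thread
static int MonitorUsedDeflationThreshold = 90;   // default; 0 disables the check

static bool monitors_used_above_threshold() {
  if (MonitorUsedDeflationThreshold == 0 || gMonitorPopulation == 0) {
    return false;                                // threshold 0 => never trigger
  }
  int monitors_used = gMonitorPopulation - gMonitorFreeCount;
  int monitor_usage = (monitors_used * 100) / gMonitorPopulation;
  return monitor_usage > MonitorUsedDeflationThreshold;
}

int main() {
  // With 127 of 127 "used" from the global counters' point of view, the
  // usage is 100% > 90%, so cleanup keeps being requested at every poll.
  std::printf("cleanup requested: %s\n",
              monitors_used_above_threshold() ? "yes" : "no");
  return 0;
}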
Like I said... complicated...
Dan
>
> Dan
>
> Thanks,
>
> Vitaly
>
>
>
> On 6/18/19 4:09 PM, Vitaly Davidovich wrote:
>> Hey Dan,
>>
>> On Tue, Jun 18, 2019 at 3:44 PM Daniel D. Daugherty
>> <daniel.daugherty at oracle.com
>> <mailto:daniel.daugherty at oracle.com>> wrote:
>>
>> Sorry for the delay in getting back to this query. I've been
>> focused
>> on other tasks...
>>
>> No worries, I understand and appreciate your time and help.
>>
>>
>> On 6/12/19 12:46 PM, Vitaly Davidovich wrote:
>>>
>>>
>>> On Thu, Jun 6, 2019 at 10:29 AM Daniel D. Daugherty
>>> <daniel.daugherty at oracle.com
>>> <mailto:daniel.daugherty at oracle.com>> wrote:
>>>
>>> On 6/6/19 10:08 AM, Vitaly Davidovich wrote:
>>>>
>>>>
>>>> On Wed, Jun 5, 2019 at 10:13 PM Vitaly Davidovich
>>>> <vitalyd at gmail.com <mailto:vitalyd at gmail.com>> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Jun 5, 2019 at 7:37 PM Vitaly Davidovich
>>>> <vitalyd at gmail.com <mailto:vitalyd at gmail.com>> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Jun 5, 2019 at 7:10 PM Daniel D.
>>>> Daugherty <daniel.daugherty at oracle.com
>>>> <mailto:daniel.daugherty at oracle.com>> wrote:
>>>>
>>>> On 6/5/19 7:03 PM, Vitaly Davidovich wrote:
>>>>>
>>>>>
>>>>> On Wed, Jun 5, 2019 at 6:55 PM David
>>>>> Holmes <david.holmes at oracle.com
>>>>> <mailto:david.holmes at oracle.com>> wrote:
>>>>>
>>>>> On 6/06/2019 2:31 am, Vitaly
>>>>> Davidovich wrote:
>>>>> > On Wed, Jun 5, 2019 at 12:15 PM
>>>>> Daniel D. Daugherty <
>>>>> > daniel.daugherty at oracle.com
>>>>> <mailto:daniel.daugherty at oracle.com>>
>>>>> wrote:
>>>>> >
>>>>> >> On 6/5/19 12:04 PM, Vitaly
>>>>> Davidovich wrote:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Wed, Jun 5, 2019 at 11:44 AM
>>>>> Daniel D. Daugherty <
>>>>> >> daniel.daugherty at oracle.com
>>>>> <mailto:daniel.daugherty at oracle.com>>
>>>>> wrote:
>>>>> >>
>>>>> >>> On 6/5/19 11:37 AM, Vitaly
>>>>> Davidovich wrote:
>>>>> >>>> On Wed, Jun 5, 2019 at 8:58 AM
>>>>> Vitaly Davidovich <vitalyd at gmail.com
>>>>> <mailto:vitalyd at gmail.com>>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>>> Hey Aleksey,
>>>>> >>>>>
>>>>> >>>>> On Wed, Jun 5, 2019 at 8:52 AM
>>>>> Aleksey Shipilev <shade at redhat.com
>>>>> <mailto:shade at redhat.com>>
>>>>> >>> wrote:
>>>>> >>>>>
>>>>> >>>>>> On 6/5/19 2:33 PM, Vitaly
>>>>> Davidovich wrote:
>>>>> >>>>>>> Does anyone have ideas on
>>>>> pinpointing this further?
>>>>> >>>>>> I have the educated guess:
>>>>> >>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8181859
>>>>> >>>>>>
>>>>> >>>>>> Which means you can try if
>>>>> -XX:MonitorUsedDeflationThreshold=0
>>>>> reduces
>>>>> >>>>>> the number of "guaranteed"
>>>>> >>>>>> safepoints.
>>>>> >>>>>>
>>>>> >>>>> Hmm, the application doesn't use
>>>>> JVM monitors (in any meaningful
>>>>> >>>>> capacity). But interesting JBS
>>>>> entry - I will try it out, thanks! The
>>>>> >>>>> description of safepoint clean
>>>>> up in there is/was consistent with my
>>>>> >>>>> understanding as well. On Java
>>>>> 8, we'd see the "no vm operation"
>>>>> >>> safepoint
>>>>> >>>>> operations, and I always assumed
>>>>> it was IC buffer cleaning (even had a
>>>>> >>> mail
>>>>> >>>>> thread on that topic a few years
>>>>> ago, either on this list or
>>>>> >>>>> hotspot-compiler). That tended
>>>>> to stabilize (i.e. they went away)
>>>>> >>> after
>>>>> >>>>> the application reached steady
>>>>> state, whereas I'm seeing essentially
>>>>> >>>>> non-stop safepoint cleanups on 11.
>>>>> >>>>>
>>>>> >>>> Ok, adding
>>>>> -XX:MonitorUsedDeflationThreshold=0
>>>>> seems to alleviate the
>>>>> >>> issue
>>>>> >>>> - # of entered safepoint cleanups
>>>>> looks on par with Java 8, although I'm
>>>>> >>>> still assessing a bit more.
>>>>> >>>
>>>>> >>> This observation doesn't really
>>>>> match with your earlier comment:
>>>>> >>>
>>>>> >>>> Hmm, the application doesn't use
>>>>> JVM monitors (in any meaningful
>>>>> >>>> capacity).
>>>>> >>>
>>>>> >>> The default value for
>>>>> MonitorUsedDeflationThreshold is 90.
>>>>> So a cleanup
>>>>> >>> safepoint should only be triggered
>>>>> if monitor use exceeds 90% of the
>>>>> >>> monitor pool.
>>>>> >>>
>>>>> >> Hmm, ok - I see what you mean:
>>>>> >>
>>>>> https://github.com/openjdk/jdk11u/blob/d46b56eb81b07b452141c7dcd078d33fc8e49d63/src/hotspot/share/runtime/safepoint.cpp#L604
>>>>> >>
>>>>> >> So there's either monitor usage
>>>>> that I'm not aware of (or forgot about) or
>>>>> >> the ratio is > 90 but the absolute
>>>>> # of them is small (which I'd consider
>>>>> >> "don't use in any meaningful
>>>>> capacity"). Is there a way to print out
>>>>> >> monitor population so I can see?
>>>>> >>
>>>>> >>
>>>>> >> I've been adding a lot of logging
>>>>> support to the Java Monitor subsystem,
>>>>> >> but those changes are in JDK13 and
>>>>> not in JDK11. I'm trying to remember
>>>>> >> what there is in JDK11, but ...
>>>>> >>
>>>>> > Yeah, I'm having a hard time keeping
>>>>> track of what features/changes are in
>>>>> > what Java release or update version
>>>>> or upstream/downstream backport or ...
>>>>> >
>>>>> > By the way, do JNI (specifically,
>>>>> Java calling into native) calls cause
>>>>> > monitor inflation? The app in
>>>>> question does have substantial JNI
>>>>> usage. I
>>>>> > guess by not using java monitors
>>>>> extensively, I meant the vanilla flavor of
>>>>> > Java code with synchronized
>>>>> blocks/methods.
>>>>>
>>>>> JNI calls do not in themselves impact
>>>>> monitors. However if native code
>>>>> uses the JNI MonitorEnter then that
>>>>> does trigger inflation.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>> Thanks for clarifying, David.
>>>>>
>>>>> Pretty certain that JNI code I have is not
>>>>> calling MonitorEnter, but will double check.
>>>>>
>>>>> So beyond monitor contention via vanilla
>>>>> Java code, are there other scenarios
>>>>> resulting in inflation? Is there perhaps a
>>>>> list of these that someone could point to?
>>>>> I can spelunk through hotspot code, but
>>>>> maybe someone can spare me that exercise :).
>>>>
>>>> Java Monitor inflation is caused by:
>>>>
>>>> - contention on the same monitor between 2
>>>> or more threads
>>>> - monitor.wait() call
>>>> - synchronized (obj) { obj.hashCode(); }
>>>>
>>>> The last one is specific to the HotSpot VM.
>>>>
>>>> Dan
>>>>
>>>> Awesome, thanks Dan! In lieu of more precise
>>>> monitor usage logging in 11, I’ll look at our
>>>> code for these.
>>>>
>>>> I’m going to try "monitorinflation=debug" tomorrow:
>>>> https://github.com/openjdk/jdk11u/blob/737d8437886ad97c6ed21a25b9911c10b3886f61/src/hotspot/share/runtime/synchronizer.cpp#L1519
>>>>
>>>> Maybe that’ll help shed light on inflations.
>>>>
>>>> So it doesn't look like there's much monitor inflation
>>>> - some early during startup, but they go away fairly
>>>> quickly (if monitorinflation=debug is to be trusted as
>>>> comprehensive).
>>>
>>> Yes the 'monitorinflation=debug' logging in JDK11u will
>>> tell you about
>>> every inflation and deflation of an ObjectMonitor. It is
>>> rather verbose
>>> if you have a lot of ObjectMonitor inflation/deflation
>>> happening. If it
>>> is not overloading you with information, then monitor
>>> deflation is not
>>> your issue.
>>>
>>> So have been playing around with 'monitorinflation=debug' in
>>> conjunction with -XX:MonitorUsedDeflationThreshold=0. One
>>> (so far) puzzling observation is with the UL flag set, I
>>> still see recurring safepoint cleanups but no trace of
>>> monitor inflation/deflation events. Naively, I'd expect to
>>> see (at least) inflation events if a subsequent safepoint
>>> cleanup is entered for deflations.
>>
>> Are you not seeing _any_ monitor inflation events?
>> Or are you seeing _some_ monitor inflation events at the
>> beginning
>> and then no more after a steady state is reached?
>>
>> There are some inflation events early in the process' lifetime,
>> and a bunch of safepoints (for other reasons as well, e.g. JIT
>> deopts). But then there are _no_ monitor inflation events in UL,
>> yet safepoint cleanups continue. This is why I asked if you
>> think previous monitor cleanups didn't make sufficient progress
>> to get under the MonitorUsedDeflationThreshold(90), and thus
>> every subsequent guaranteed safepoint poll actually enters cleanup.
>>
>>
>>
>>> Setting MonitorUsedDeflationThreshold=0 does stop the
>>> cleanup storm, however.
>>
>> So that indicates that you've found the right knob to stop
>> this change
>> in behavior. That's good.
>>
>> Yeah, that seems to have worked. My hesitation was/is in it
>> being an experimental flag, and running a production process with
>> *any* experimental flag enabled. At a minimum, it raises eyebrows.
>>
>>
>>
>>> So unless there's some inflation that is not reported via
>>> UL, it's not clear why there's this seeming disconnect.
>>
>> These two spots in the code are the only two where we inflate a
>> monitor:
>>
>> open/src/hotspot/share/runtime/synchronizer.cpp:
>>
>> // CASE: stack-locked
>> <snip>
>>     if (log_is_enabled(Debug, monitorinflation)) {
>>       if (object->is_instance()) {
>>         ResourceMark rm;
>>         log_debug(monitorinflation)("Inflating object " INTPTR_FORMAT
>>                                     " , mark " INTPTR_FORMAT " , type %s",
>>                                     p2i(object), p2i(object->mark()),
>>                                     object->klass()->external_name());
>>       }
>>     }
>>     if (event.should_commit()) {
>>       post_monitor_inflate_event(&event, object, cause);
>>     }
>>     return m;
>>   }
>>
>> // CASE: neutral
>> <snip>
>>     if (log_is_enabled(Debug, monitorinflation)) {
>>       if (object->is_instance()) {
>>         ResourceMark rm;
>>         log_debug(monitorinflation)("Inflating object " INTPTR_FORMAT
>>                                     " , mark " INTPTR_FORMAT " , type %s",
>>                                     p2i(object), p2i(object->mark()),
>>                                     object->klass()->external_name());
>>       }
>>     }
>>     if (event.should_commit()) {
>>       post_monitor_inflate_event(&event, object, cause);
>>     }
>>     return m;
>>   }
>>
>> I have changed this code a bit in JDK13. For example, I've
>> removed the "if (object->is_instance())" checks, so in JDK11u
>> it is possible for there to be inflations related to objects
>> where "object->is_instance() == false" and those won't be
>> reported.
>>
>> In my own testing of the logging improvements in JDK13, I
>> saw very few messages where "object->is_instance() == false"
>> and I saw no ill effects from stripping that if-statement
>> so I deleted it...
>>
>> Ok, good to know.
>>
>>
>> Dan
>>
>>>
>>>
>>> Just FYI: in JDK13 that particular set of diagnostics
>>> have moved from
>>> 'monitorinflation=debug' to 'monitorinflation=trace'.
>>>
>>> Ok, good to know - thanks for the heads up.
>>>
>>>
>>> Dan
>>>
>>>
>>>> Yet safepoints continue. Interestingly, the timing
>>>> reported in safepoint cleanup logging shows minuscule
>>>> time spent "deflating idle monitors" (e.g. 0.0000004
>>>> secs = 400 ns), which I take as essentially overhead of
>>>> going through that codepath without actually doing much
>>>> of anything. "updating inline caches" is about the
>>>> same (400-600 ns).
>>>>
>>>> However, "compilation policy safepoint handler" is tens
>>>> of micros (e.g. 30-50us). Looking at the code,
>>>> https://github.com/openjdk/jdk11u/blob/737d8437886ad97c6ed21a25b9911c10b3886f61/src/hotspot/share/runtime/compilationPolicy.cpp#L321,
>>>> it appears to be doing method invocation counter
>>>> decaying. That does stackwalks (the application has
>>>> about 30 threads), so fine - it's not very cheap. But,
>>>> according to
>>>> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/runtime/safepoint.cpp#L602,
>>>> decay itself does not gate doing cleanup operations.
>>>>
>>>> Unfortunately, it doesn't appear that I can trace
>>>> ICBuffers in product builds (TraceICBuffer flag isn't
>>>> present), so can't tell whether there's some change (vs
>>>> java 8) in their regard. Naively, it also seems odd
>>>> that a single stub in the ICBuffer would require a
>>>> cleanup:
>>>> https://github.com/openjdk/jdk11u/blob/737d8437886ad97c6ed21a25b9911c10b3886f61/src/hotspot/share/code/icBuffer.cpp#L163.
>>>>
>>>> I'll keep investigating.
>>>>
>>>>
>>>> Dan, please let me know if you think this will be
>>>> fruitless :).
>>>>
>>>>
>>>> Only remaining input I’d appreciate from the
>>>> list is whether setting
>>>> MonitorUsedDeflationThreshold=0 “restores” 8
>>>> behavior, or whether I’d run the risk of
>>>> something unfortunate as a result (again, with
>>>> respect to 8).
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>> >
>>>>> >>
>>>>> >> Dan
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> Thanks Dan
>>>>> >>
>>>>> >>>
>>>>> >>> Dan
>>>>> >>>
>>>>> >>>
>>>>> >>>>
>>>>> >>>> Of course now the trouble is this
>>>>> is an experimental option, and
>>>>> >>> enabling
>>>>> >>>> things like that in production
>>>>> raises eyebrows. Is it safe to assume,
>>>>> >>>> however, that setting this to 0
>>>>> essentially behaves like Java 8? It
>>>>> >>> appears
>>>>> >>>> so, based on that JBS Aleksey
>>>>> linked above. But just double checking.
>>>>> >>>>
>>>>> >>>> If that's the case, then perhaps
>>>>> we can overlook the experimental bit
>>>>> >>> and
>>>>> >>>> wait until this is done outside
>>>>> safepoints, per Robbin's email.
>>>>> >>>>
>>>>> >>>> Thanks to everyone that responded
>>>>> on this thread! Very helpful.
>>>>> >>>>
>>>>> >>>>>> --
>>>>> >>>>>> Thanks,
>>>>> >>>>>> -Aleksey
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>>
>>>>> --
>>>>> Sent from my phone
>>>>
>>>> --
>>>> Sent from my phone
>>>>
>>>> --
>>>> Sent from my phone
>>>>
>>>
>>
>