Increased safepoint cleanup tasks in Java 11 (vs 8)

Thu Jun 27 16:39:42 UTC 2019

Dan,

Thanks for the details! Indeed, more complicated than one would naively
assume.  It seems geared towards applications that heavily use monitors; I
wonder if that’s the right default for modern Java apps ...

On Thu, Jun 20, 2019 at 9:46 AM Daniel D. Daugherty <
daniel.daugherty at oracle.com> wrote:

> On 6/20/19 9:03 AM, Vitaly Davidovich wrote:
>
> Hey Dan,
>
> On Wed, Jun 19, 2019 at 4:44 PM Daniel D. Daugherty <
> daniel.daugherty at oracle.com> wrote:
>
>> Vitaly,
>>
>> I filed the following bug:
>>
>>     JDK-8226416 MonitorUsedDeflationThreshold can caused repeated cleanup
>> safepoints
>>     https://bugs.openjdk.java.net/browse/JDK-8226416
>>
>> for this issue. I've created a test that reproduces the bug
>> in my JDK13 environment and _should_ also recreate the problem
>> in JDK11u... I have not tried it on JDK11u myself...
>>
> Thanks! You mentioned the following in the bug report:
>
>> at this point gMonitorFreeCount has reached 0 because all of the free
>> global monitors have been allocated to per-thread lists.
>
> How do free global monitors get allocated to per-thread lists?
>
>
> Please see ObjectSynchronizer::omAlloc() in
> src/hotspot/share/runtime/synchronizer.cpp.
>
> Short version: When omAlloc() is called:
>
> 1) check the per-thread free list
> 2) check the global free list
> 3) allocate new ObjectMonitors
>
> In the case that I'm talking about, the thread moved
> free nodes from the global free list to its own per-thread
> free list.
>
>
> In your demo program, there's only a single (Java/user) thread, so what
> are the "per-thread lists" you're referring to? I didn't quite understand
> why, with 33 monitor inflations, the gMonitorFreeCount dropped by a lot
> more.  It must be this per-thread list allocation, but perhaps you can
> explain a bit how that works.
>
>
> The ObjectMonitor allocation and management mechanisms are complicated.
> The first time a thread asks for ObjectMonitors, it asks for 32 of them.
> If it uses all of those up, then the next time it asks for more. That's
> this line of code:
>
> Self->omFreeProvision += 1 + (Self->omFreeProvision/2);
>
>
> so omFreeProvision starts at 32 and next time it will be
>
>   32 + 1 + (32 / 2) => 33 + 16 => 49
>
> If you look at the "debug" logging output files you'll find
> some tables which so the "Provision" value for each thread.
>
> So when I inflated 33 monitors in that thread, I was intentionally
> using up the initial allocation and causing a new allocation request.
> Although the omFreeProvision value went to 49, that does not mean
> that the thread actually allocated that many more. It does that only
> if they are readily available.
>
> These are lines from the debug logging for our target thread:
>
> [0.112s][debug][monitorinflation]  Per-Thread Lists:       InUse
> Free   Provision
> [0.112s][debug][monitorinflation] ==================  ==========
> ==========  ==========
> [0.112s][debug][monitorinflation] 0x00007fbc3f809000           0
> 0          32
> [0.120s][debug][monitorinflation] 0x00007fbc3f809000           0
> 0          32
> [0.207s][debug][monitorinflation] 0x00007fbc3f809000          19
> 13          49
> [1.212s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
> [2.219s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
> [3.225s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
> [4.230s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
> [5.233s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
> [6.240s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
> [7.243s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
> [8.249s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
> [9.251s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
>
> So our thread starts with an omFreeProvision value of 32, but
> doesn't have any for the first two reports. The third line shows
> that it has 32 split between in-use (19) and free (13), but the
> omFreeProvision value is now 49. That's because it got updated
> by that line of code above for the _next_ allocation. The fourth
> line shows that our thread has just 33 in-use so even though the
> omFreeProvision == 49 when we needed one more, we only got the
> one more and did not get any more for our free list.
>
> Another thing to keep in mind is that there are other threads
> and the first time they need a single inflated ObjectMonitor,
> they will also get 32 of them (if they are available).
>
> Here's the last table before we hit the 10 second mark and
> unlock all of our inflated monitors:
>
> [9.251s][debug][monitorinflation]      Global Lists:       InUse
> Free       Total
> [9.251s][debug][monitorinflation] ==================  ==========
> ==========  ==========
> [9.251s][debug][monitorinflation]                              0
> 0         127
> [9.251s][debug][monitorinflation]  Per-Thread Lists:       InUse
> Free   Provision
> [9.251s][debug][monitorinflation] ==================  ==========
> ==========  ==========
> [9.251s][debug][monitorinflation] 0x00007fbc3f809000          33
> 0          74
> [9.251s][debug][monitorinflation] 0x00007fbc3f02f000           0
> 31          49
> [9.251s][debug][monitorinflation] 0x00007fbc4000a800           1
> 30          49
> [9.251s][debug][monitorinflation] 0x00007fbc3f941000           0
> 0          32
> [9.251s][debug][monitorinflation] 0x00007fbc3f95e000           0
> 0          32
> [9.251s][debug][monitorinflation] 0x00007fbc3f03a000           0
> 0          32
> [9.251s][debug][monitorinflation] 0x00007fbc3f03d800           0
> 0          32
> [9.251s][debug][monitorinflation] 0x00007fbc3f9ba800           0
> 0          32
> [9.251s][debug][monitorinflation] 0x00007fbc3f03b000           1
> 31          49
> [9.251s][debug][monitorinflation] Checking global lists:
> [9.251s][debug][monitorinflation] gMonitorPopulation=127 equals
> chkMonitorPopulation=127
> [9.251s][debug][monitorinflation] gOmInUseCount=0 equals chkOmInUseCount=0
> [9.251s][debug][monitorinflation] gMonitorFreeCount=0 equals
> chkMonitorFreeCount=0
>
> We have a total of 127 ObjectMonitors out there spread across
> 4 threads. 3 threads are using one or more ObjectMonitors.
>
> This line is interesting:
>
> 0x00007fbc3f02f000           0          31          49
>
> This thread obviously needed an ObjectMonitor (because it has
> some on its free list), but none are currently in-use. My guess
> would be that this thread used an ObjectMonitor, finished with
> it, it was deflated and put back on the global free list and
> that's where our thread (0x00007fbc3f809000) picked up one more
> ObjectMonitor to get to 33. If we switched from 'debug' level
> to 'trace' level we would see logging that would show all of
> that info in gory detail...
>
> Like I said... complicated...
>
> Dan
>
>
>
>> Dan
>>
> Thanks,
>
> Vitaly
>
>>
>>
>> On 6/18/19 4:09 PM, Vitaly Davidovich wrote:
>>
>> Hey Dan,
>>
>> On Tue, Jun 18, 2019 at 3:44 PM Daniel D. Daugherty <
>> daniel.daugherty at oracle.com> wrote:
>>
>>> Sorry for the delay in getting back to this query. I've been focused
>>> on other tasks...
>>>
>> No worries, I understand and appreciate your time and help.
>>
>>>
>>> On 6/12/19 12:46 PM, Vitaly Davidovich wrote:
>>>
>>>
>>>
>>> On Thu, Jun 6, 2019 at 10:29 AM Daniel D. Daugherty <
>>> daniel.daugherty at oracle.com> wrote:
>>>
>>>> On 6/6/19 10:08 AM, Vitaly Davidovich wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Jun 5, 2019 at 10:13 PM Vitaly Davidovich <vitalyd at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 5, 2019 at 7:37 PM Vitaly Davidovich <vitalyd at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 5, 2019 at 7:10 PM Daniel D. Daugherty <
>>>>>> daniel.daugherty at oracle.com> wrote:
>>>>>>
>>>>>>> On 6/5/19 7:03 PM, Vitaly Davidovich wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 5, 2019 at 6:55 PM David Holmes <david.holmes at oracle.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 6/06/2019 2:31 am, Vitaly Davidovich wrote:
>>>>>>>> > On Wed, Jun 5, 2019 at 12:15 PM Daniel D. Daugherty <
>>>>>>>> > daniel.daugherty at oracle.com> wrote:
>>>>>>>> >
>>>>>>>> >> On 6/5/19 12:04 PM, Vitaly Davidovich wrote:
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Wed, Jun 5, 2019 at 11:44 AM Daniel D. Daugherty <
>>>>>>>> >> daniel.daugherty at oracle.com> wrote:
>>>>>>>> >>
>>>>>>>> >>> On 6/5/19 11:37 AM, Vitaly Davidovich wrote:
>>>>>>>> >>>> On Wed, Jun 5, 2019 at 8:58 AM Vitaly Davidovich <
>>>>>>>> vitalyd at gmail.com>
>>>>>>>> >>> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> Hey Aleksey,
>>>>>>>> >>>>>
>>>>>>>> >>>>> On Wed, Jun 5, 2019 at 8:52 AM Aleksey Shipilev <
>>>>>>>> shade at redhat.com>
>>>>>>>> >>> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>>> On 6/5/19 2:33 PM, Vitaly Davidovich wrote:
>>>>>>>> >>>>>>> Does anyone have ideas on pinpointing this further?
>>>>>>>> >>>>>> I have the educated guess:
>>>>>>>> >>>>>>     https://bugs.openjdk.java.net/browse/JDK-8181859
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Which means you can try if
>>>>>>>> -XX:MonitorUsedDeflationThreshold=0 reduces
>>>>>>>> >>>>>> the number of "guaranteed"
>>>>>>>> >>>>>> safepoints.
>>>>>>>> >>>>>>
>>>>>>>> >>>>> Hmm, the application doesn't use JVM monitors (in any
>>>>>>>> meaningful
>>>>>>>> >>>>> capacity).  But interesting JBS entry - I will try it out,
>>>>>>>> thanks! The
>>>>>>>> >>>>> description of safepoint clean up in there is/was consistent
>>>>>>>> with my
>>>>>>>> >>>>> understanding as well.  On Java 8, we'd see the "no vm
>>>>>>>> operation"
>>>>>>>> >>> safepoint
>>>>>>>> >>>>> operations, and I always assumed it was IC buffer cleaning
>>>>>>>> (even had a
>>>>>>>> >>> mail
>>>>>>>> >>>>> thread on that topic a few years ago, either on this list or
>>>>>>>> >>>>> hotspot-compiler).  That tended to stabilize (i.e. they went
>>>>>>>> away)
>>>>>>>> >>> after
>>>>>>>> >>>>> the application reached steady state, whereas I'm seeing
>>>>>>>> essentially
>>>>>>>> >>>>> non-stop safepoint cleanups on 11.
>>>>>>>> >>>>>
>>>>>>>> >>>> Ok, adding -XX:MonitorUsedDeflationThreshold=0 seems to
>>>>>>>> alleviate the
>>>>>>>> >>> issue
>>>>>>>> >>>> - # of entered safepoint cleanups looks on par with Java 8,
>>>>>>>> although I'm
>>>>>>>> >>>> still assessing a bit more.
>>>>>>>> >>>
>>>>>>>> >>> This observation doesn't really match with your earlier comment:
>>>>>>>> >>>
>>>>>>>> >>>> Hmm, the application doesn't use JVM monitors (in any
>>>>>>>> meaningful
>>>>>>>> >>>> capacity).
>>>>>>>> >>>
>>>>>>>> >>> The default value for MonitorUsedDeflationThreshold is 90. So a
>>>>>>>> cleanup
>>>>>>>> >>> safepoint should only be triggered if monitor use exceeds 90%
>>>>>>>> of the
>>>>>>>> >>> monitor pool.
>>>>>>>> >>>
>>>>>>>> >> Hmm, ok - I see what you mean:
>>>>>>>> >>
>>>>>>>> https://github.com/openjdk/jdk11u/blob/d46b56eb81b07b452141c7dcd078d33fc8e49d63/src/hotspot/share/runtime/safepoint.cpp#L604
>>>>>>>> >>
>>>>>>>> >> So there's either monitor usage that I'm not aware of (or forgot
>>>>>>>> about) or
>>>>>>>> >> the ratio is > 90 but the absolute # of them is small (which I'd
>>>>>>>> consider
>>>>>>>> >> "don't use in any meaningful capacity").  Is there a way to
>>>>>>>> print out
>>>>>>>> >> monitor population so I can see?
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> I've been adding a lot of logging support to the Java Monitor
>>>>>>>> subsystem,
>>>>>>>> >> but those changes are in JDK13 and not in JDK11. I'm trying to
>>>>>>>> remember
>>>>>>>> >> what there is in JDK11, but ...
>>>>>>>> >>
>>>>>>>> > Yeah, I'm having a hard time keeping track of what
>>>>>>>> features/changes are in
>>>>>>>> > what Java release or update version or upstream/downstream
>>>>>>>> backport or ...
>>>>>>>> >
>>>>>>>> > By the way, do JNI (specifically, Java calling into native) calls
>>>>>>>> cause
>>>>>>>> > monitor inflation? The app in question does have substantial JNI
>>>>>>>> usage.  I
>>>>>>>> > guess by not using java monitors extensively, I meant the vanilla
>>>>>>>> flavor of
>>>>>>>> > Java code with synchronized blocks/methods.
>>>>>>>>
>>>>>>>> JNI calls do not in themselves impact monitors. However if native
>>>>>>>> code
>>>>>>>> uses the JNI MonitorEnter then that does trigger inflation.
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>
>>>>>>> Thanks for clarifying, David.
>>>>>>>
>>>>>>> Pretty certain that JNI code I have is not calling MonitorEnter, but
>>>>>>> will double check.
>>>>>>>
>>>>>>> So beyond monitor contention via vanilla Java code, are there other
>>>>>>> scenarios resulting in inflation? Is there perhaps a list of these that
>>>>>>> someone could point to? I can spelunk through hotspot code, but maybe
>>>>>>> someone can spare me that exercise :).
>>>>>>>
>>>>>>>
>>>>>>> Java Monitor inflation is caused by:
>>>>>>>
>>>>>>> - contention on the same monitor between 2 or more threads
>>>>>>> - monitor.wait() call
>>>>>>> - synchronized (obj) { obj.hashCode(); }
>>>>>>>
>>>>>>> The last one is specific to the HotSpot VM.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>> Awesome, thanks Dan! In lieu of more precise monitor usage logging in
>>>>>> 11, I’ll look at our code for these.
>>>>>>
>>>>> I’m going to try "monitorinflation=debug" tomorrow:
>>>>>
>>>>> https://github.com/openjdk/jdk11u/blob/737d8437886ad97c6ed21a25b9911c10b3886f61/src/hotspot/share/runtime/synchronizer.cpp#L1519
>>>>>
>>>>> Maybe that’ll help shed light on inflations.
>>>>>
>>>> So it doesn't look like there's much monitor inflation - some early
>>>> during startup, but they go away fairly quickly (if monitorinflation=debug
>>>> is to be trusted comprehensive).
>>>>
>>>>
>>>> Yes the 'monitorinflation=debug' logging in JDK11u will tell you about
>>>> every inflation and deflation of an ObjectMonitor. It is rather verbose
>>>> if you have a lot of ObjectMonitor inflation/deflation happening. If it
>>>> is not overloading you with information, then monitor deflation is not
>>>> your issue.
>>>>
>>> So have been playing around with 'monitorinflation=debug' in conjunction
>>> with -XX:MonitorUsedDeflationThreshold=0.  One (so far) puzzling
>>> observation is with the UL flag set, I still see recurring safepoint
>>> cleanups but no trace of monitor inflation/deflation events.  Naively, I'd
>>> expect to see (at least) inflation events if a subsequent safepoint cleanup
>>> is entered for deflations.
>>>
>>>
>>> Are you not seeing _any_ monitor inflation events?
>>> Or are you seeing _some_ monitor inflation events at the beginning
>>> and then no more after a steady state is reached?
>>>
>> There are some inflation events early in the process' lifetime, and a
>> bunch of safepoints (for other reasons as well, e.g. JIT deopts).  But then
>> there are _no_ monitor inflation events in UL, yet safepoint cleanups
>> continue.  This is why I asked if you think previous monitor cleanups
>> didn't make sufficient progress to get under the
>> MonitorUsedDeflationThreshold(90), and thus every subsequent guaranteed
>> safepoint poll actually enters cleanup.
>>
>>>
>>>
>>> Setting MonitorUsedDeflationThreshold=0 does stop the cleanup storm,
>>> however.
>>>
>>>
>>> So that indicates that you've found the right knob to stop this change
>>> in behavior. That's good.
>>>
>> Yeah, that seems to have worked.  My hesitation was/is in it being an
>> experimental flag, and running a production process with *any* experimental
>> flag enabled.  At a minimum, it raises eyebrows.
>>
>>>
>>>
>>> So unless there's some inflation that is not reported via UL, it's not
>>> clear why there's this seeming disconnect.
>>>
>>>
>>> These two spots in the code are the only two where we inflate a
>>> monitor:
>>>
>>> open/src/hotspot/share/runtime/synchronizer.cpp:
>>>
>>>     // CASE: stack-locked
>>> <snip>
>>>       if (log_is_enabled(Debug, monitorinflation)) {
>>>         if (object->is_instance()) {
>>>           ResourceMark rm;
>>>           log_debug(monitorinflation)("Inflating object " INTPTR_FORMAT
>>> " , mark " INTPTR_FORMAT " , type %s",
>>>                                       p2i(object), p2i(object->mark()),
>>>                                       object->klass()->external_name());
>>>         }
>>>       }
>>>       if (event.should_commit()) {
>>>         post_monitor_inflate_event(&event, object, cause);
>>>       }
>>>       return m;
>>>     }
>>>
>>>     // CASE: neutral
>>> <snip>
>>>     if (log_is_enabled(Debug, monitorinflation)) {
>>>       if (object->is_instance()) {
>>>         ResourceMark rm;
>>>         log_debug(monitorinflation)("Inflating object " INTPTR_FORMAT "
>>> , mark " INTPTR_FORMAT " , type %s",
>>>                                     p2i(object), p2i(object->mark()),
>>>                                     object->klass()->external_name());
>>>       }
>>>     }
>>>     if (event.should_commit()) {
>>>       post_monitor_inflate_event(&event, object, cause);
>>>     }
>>>     return m;
>>>   }
>>>
>>> I have changed this code a bit in JDK13. For example, I've
>>> removed the "if (object->is_instance())" checks so in JDK11u
>>> is possible for there to be inflations related to objects
>>> where "object->is_instance() == false" and those won't be
>>> reported.
>>>
>>> In my own testing of the logging improvements in JDK13, I
>>> saw very few messages where "object->is_instance() == false"
>>> and I saw no ill effects from stripping that if-statement
>>> so I deleted it...
>>>
>> Ok, good to know.
>>
>>>
>>> Dan
>>>
>>>
>>>
>>>> Just FYI: in JDK13 that particular set of diagnostics have moved from
>>>> 'monitorinflation=debug' to 'monitorinflation=trace'.
>>>>
>>> Ok, good to know - thanks for the heads up.
>>>
>>>>
>>>> Dan
>>>>
>>>>
>>>> Yet safepoints continue.  Interestingly, the timing reported in
>>>> safepoint cleanup logging shows miniscule time spent "deflating idle
>>>> monitors" (e.g. 0.0000004 secs = 400 ns), which I take as essentially
>>>> overhead of going through that codepath without actually doing much of
>>>> anything.  "updating inline caches" is about the same (400-600 ns).
>>>>
>>>> However, "compilation policy safepoint handler" is tens of micros (e.g.
>>>> 30-50us).  Looking at the code,
>>>> https://github.com/openjdk/jdk11u/blob/737d8437886ad97c6ed21a25b9911c10b3886f61/src/hotspot/share/runtime/compilationPolicy.cpp#L321,
>>>> it appears to be doing method invocation counter decaying.  That does
>>>> stackwalks (the application has about 30 threads), so fine - it's not very
>>>> cheap.  But, according to
>>>> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/runtime/safepoint.cpp#L602,
>>>> decay itself does no gate doing cleanup operations.
>>>>
>>>> Unfortunately, it doesn't appear that I can trace ICBuffers in product
>>>> builds (TraceICBuffer flag isn't present), so can't tell whether there's
>>>> some change (vs java 8) in their regard.  Naively, it also seems odd that a
>>>> single stub in the ICBuffer would require a cleanup:
>>>> https://github.com/openjdk/jdk11u/blob/737d8437886ad97c6ed21a25b9911c10b3886f61/src/hotspot/share/code/icBuffer.cpp#L163
>>>> .
>>>>
>>>> I'll keep investigating.
>>>>
>>>>>
>>>>> Dan, please let me know if you think this will be fruitless :).
>>>>>
>>>>>>
>>>>>> Only remaining input I’d appreciate from the list is whether setting
>>>>>> MonitorUsedDeflationThreshold=0 “restores” 8 behavior, or whether I’d run
>>>>>> the risk of something unfortunate as a result (again, with respect to 8).
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >
>>>>>>>> >>
>>>>>>>> >> Dan
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> Thanks Dan
>>>>>>>> >>
>>>>>>>> >>>
>>>>>>>> >>> Dan
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>>
>>>>>>>> >>>> Of course now the trouble is this is an experimental option,
>>>>>>>> and
>>>>>>>> >>> enabling
>>>>>>>> >>>> things like that in production raises eyebrows.  Is it safe to
>>>>>>>> assume,
>>>>>>>> >>>> however, that setting this to 0 essentially behaves like Java
>>>>>>>> 8? It
>>>>>>>> >>> appears
>>>>>>>> >>>> so, based on that JBS Aleksey linked above.  But just double
>>>>>>>> checking.
>>>>>>>> >>>>
>>>>>>>> >>>> If that's the case, then perhaps we can overlook the
>>>>>>>> experimental bit
>>>>>>>> >>> and
>>>>>>>> >>>> wait until this is done outside safepoints, per Robbin's email.
>>>>>>>> >>>>
>>>>>>>> >>>> Thanks to everyone that responded on this thread! Very helpful.
>>>>>>>> >>>>
>>>>>>>> >>>>>> --
>>>>>>>> >>>>>> Thanks,
>>>>>>>> >>>>>> -Aleksey
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>>
>>>>>>> --
>>>>>>> Sent from my phone
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>> Sent from my phone
>>>>>>
>>>>> --
>>>>> Sent from my phone
>>>>>
>>>>
>>>>
>>>
>>
> --
Sent from my phone