RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
David Holmes
david.holmes at oracle.com
Tue Dec 17 07:03:00 UTC 2019
<resend as my mailer crashed during last send>
David
On 17/12/2019 4:57 pm, David Holmes wrote:
> Hi Richard,
>
> On 14/12/2019 5:01 am, Reingruber, Richard wrote:
>> Hi David,
>>
>> > Some further queries/concerns:
>> >
>> > src/hotspot/share/runtime/objectMonitor.cpp
>> >
>> > Can you please explain the changes to ObjectMonitor::wait:
>> >
>> > ! _recursions = save // restore the old recursion count
>> > ! + jt->get_and_reset_relock_count_after_wait(); //
>> > increased by the deferred relock count
>> >
>> > what is the "deferred relock count"? I gather it relates to
>> >
>> > "The code was extended to be able to deoptimize objects of a
>> frame that
>> > is not the top frame and to let another thread than the owning
>> thread do
>> > it."
>>
>> Yes, these relate. Currently EA based optimizations are reverted, when
>> a compiled frame is replaced
>> with corresponding interpreter frames. Part of this is relocking
>> objects with eliminated
>> locking. New with the enhancement is that we do this also just before
>> object references are acquired
>> through JVMTI. In this case we deoptimize also the owning compiled
>> frame C and we register
>> deoptimized objects as deferred updates. When control returns to C it
>> gets deoptimized, we notice
>> that objects are already deoptimized (reallocated and relocked), so we
>> don't do it again (relocking
>> twice would be incorrect of course). Deferred updates are copied into
>> the new interpreter frames.
>>
>> Problem: relocking is not possible if the target thread T is waiting
>> on the monitor that needs to be
>> relocked. This happens only with non-local objects with
>> EliminateNestedLocks. Instead relocking is
>> deferred until T owns the monitor again. This is what the piece of
>> code above does.
>
> Sorry I need some more detail here. How can you wait() on an object
> monitor if the object allocation and/or locking was optimised away? And
> what is a "non-local object" in this context? Isn't EA restricted to
> thread-confined objects?
>
> Is it just that some of the locking gets optimized away e.g.
>
> synchronised(obj) {
> synchronised(obj) {
> synchronised(obj) {
> obj.wait();
> }
> }
> }
>
> If this is reduced to a form as-if it were a single lock of the monitor
> (due to EA) and the wait() triggers a JVM TI event which leads to the
> escape of "obj" then we need to reconstruct the true lock state, and so
> when the wait() internally unblocks and reacquires the monitor it has to
> set the true recursion count to 3, not the 1 that it appeared to be when
> wait() was initially called. Is that the scenario?
>
> If so I find this truly awful. Anyone using wait() in a realistic form
> requires a notification and so the object cannot be thread confined. In
> which case I would strongly argue that upon hitting the wait() the deopt
> should occur unconditionally and so the lock state is correct before we
> wait and so we don't need to mess with the recursion count internally
> when we reacquire the monitor.
>
>>
>> > which I don't like the sound of at all when it comes to
>> ObjectMonitor
>> > state. So I'd like to understand in detail exactly what is going
>> on here
>> > and why. This is a very intrusive change that seems to badly break
>> > encapsulation and impacts future changes to ObjectMonitor that
>> are under
>> > investigation.
>>
>> I would not regard this as breaking encapsulation. Certainly not badly.
>>
>> I've added a property relock_count_after_wait to JavaThread. The
>> property is well
>> encapsulated. Future ObjectMonitor implementations have to deal with
>> recursion too. They are free in
>> choosing a way to do that as long as that property is taken into
>> account. This is hardly a
>> limitation.
>
> I do think this badly breaks encapsulation as you have to add a callout
> from the guts of the ObjectMonitor code to reach into the thread to get
> this lock count adjustment. I understand why you have had to do this but
> I would much rather see a change to the EA optimisation strategy so that
> this is not needed.
>
>> Note also that the property is a straight forward extension of the
>> existing concept of deferred
>> local updates. It is embedded into the structure holding them. So not
>> even the footprint of a
>> JavaThread is enlarged if no deferred updates are generated.
>>
>> > ---
>> >
>> > src/hotspot/share/runtime/thread.cpp
>> >
>> > Can you please explain why
>> JavaThread::wait_for_object_deoptimization
>> > has to be handcrafted in this way rather than using proper
>> transitions.
>> >
>>
>> I wrote wait_for_object_deoptimization taking
>> JavaThread::java_suspend_self_with_safepoint_check
>> as template. So in short: for the same reasons :)
>>
>> Threads reach both methods as part of thread state transitions,
>> therefore special handling is
>> required to change thread state on top of ongoing transitions.
>>
>> > We got rid of "deopt suspend" some time ago and it is disturbing
>> to see
>> > it being added back (effectively). This seems like it may be
>> something
>> > that handshakes could be used for.
>>
>> Deopt suspend used to be something rather different with a similar
>> name[1]. It is not being added back.
>
> I stand corrected. Despite comments in the code to the contrary
> deopt_suspend didn't actually cause a self-suspend. I was doing a lot of
> cleanup in this area 13 years ago :)
>
>>
>> I'm actually duplicating the existing external suspend mechanism,
>> because a thread can be suspended
>> at most once. And hey, and don't like that either! But it seems not
>> unlikely that the duplicate can
>> be removed together with the original and the new type of handshakes
>> that will be used for
>> thread suspend can be used for object deoptimization too. See today's
>> discussion in JDK-8227745 [2].
>
> I hope that discussion bears some fruit, at the moment it seems not to
> be possible to use handshakes here. :(
>
> The external suspend mechanism is a royal pain in the proverbial that we
> have to carefully live with. The idea that we're duplicating that for
> use in another fringe area of functionality does not thrill me at all.
>
> To be clear, I understand the problem that exists and that you wish to
> solve, but for the runtime parts I balk at the complexity cost of
> solving it.
>
> Thanks,
> David
> -----
>
>> Thanks, Richard.
>>
>> [1] Deopt suspend was something like an async. handshake for
>> architectures with register windows,
>> where patching the return pc for deoptimization of a compiled
>> frame was racy if the owner thread
>> was in native code. Instead a "deopt" suspend flag was set on
>> which the thread patched its own
>> frame upon return from native. So no thread was suspended. It got
>> its name only from the name of
>> the flags.
>>
>> [2] Discussion about using handshakes to sync. with the target thread:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727
>>
>>
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Freitag, 13. Dezember 2019 00:56
>> To: Reingruber, Richard <richard.reingruber at sap.com>;
>> serviceability-dev at openjdk.java.net;
>> hotspot-compiler-dev at openjdk.java.net;
>> hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>> Performance in the Presence of JVMTI Agents
>>
>> Hi Richard,
>>
>> Some further queries/concerns:
>>
>> src/hotspot/share/runtime/objectMonitor.cpp
>>
>> Can you please explain the changes to ObjectMonitor::wait:
>>
>> ! _recursions = save // restore the old recursion count
>> ! + jt->get_and_reset_relock_count_after_wait(); //
>> increased by the deferred relock count
>>
>> what is the "deferred relock count"? I gather it relates to
>>
>> "The code was extended to be able to deoptimize objects of a frame that
>> is not the top frame and to let another thread than the owning thread do
>> it."
>>
>> which I don't like the sound of at all when it comes to ObjectMonitor
>> state. So I'd like to understand in detail exactly what is going on here
>> and why. This is a very intrusive change that seems to badly break
>> encapsulation and impacts future changes to ObjectMonitor that are under
>> investigation.
>>
>> ---
>>
>> src/hotspot/share/runtime/thread.cpp
>>
>> Can you please explain why JavaThread::wait_for_object_deoptimization
>> has to be handcrafted in this way rather than using proper transitions.
>>
>> We got rid of "deopt suspend" some time ago and it is disturbing to see
>> it being added back (effectively). This seems like it may be something
>> that handshakes could be used for.
>>
>> Thanks,
>> David
>> -----
>>
>> On 12/12/2019 7:02 am, David Holmes wrote:
>>> On 12/12/2019 1:07 am, Reingruber, Richard wrote:
>>>> Hi David,
>>>>
>>>> > Most of the details here are in areas I can comment on in detail,
>>>> but I
>>>> > did take an initial general look at things.
>>>>
>>>> Thanks for taking the time!
>>>
>>> Apologies the above should read:
>>>
>>> "Most of the details here are in areas I *can't* comment on in detail
>>> ..."
>>>
>>> David
>>>
>>>> > The only thing that jumped out at me is that I think the
>>>> > DeoptimizeObjectsALotThread should be a hidden thread.
>>>> >
>>>> > + bool is_hidden_from_external_view() const { return true; }
>>>>
>>>> Yes, it should. Will add the method like above.
>>>>
>>>> > Also I don't see any testing of the DeoptimizeObjectsALotThread.
>>>> Without
>>>> > active testing this will just bit-rot.
>>>>
>>>> DeoptimizeObjectsALot is meant for stress testing with a larger
>>>> workload. I will add a minimal test
>>>> to keep it fresh.
>>>>
>>>> > Also on the tests I don't understand your @requires clause:
>>>> >
>>>> > @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled &
>>>> > (vm.opt.TieredCompilation != true))
>>>> >
>>>> > This seems to require that TieredCompilation is disabled, but
>>>> tiered is
>>>> > our normal mode of operation. ??
>>>> >
>>>>
>>>> I removed the clause. I guess I wanted to target the tests towards the
>>>> code they are supposed to
>>>> test, and it's easier to analyze failures w/o tiered compilation and
>>>> with just one compiler thread.
>>>>
>>>> Additionally I will make use of
>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Mittwoch, 11. Dezember 2019 08:03
>>>> To: Reingruber, Richard <richard.reingruber at sap.com>;
>>>> serviceability-dev at openjdk.java.net;
>>>> hotspot-compiler-dev at openjdk.java.net;
>>>> hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>>>> Performance in the Presence of JVMTI Agents
>>>>
>>>> Hi Richard,
>>>>
>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote:
>>>>> Hi,
>>>>>
>>>>> I would like to get reviews please for
>>>>>
>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/
>>>>>
>>>>> Corresponding RFE:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745
>>>>>
>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915
>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1]
>>>>>
>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without
>>>>> issues (thanks!). In addition the
>>>>> change is being tested at SAP since I posted the first RFR some
>>>>> months ago.
>>>>>
>>>>> The intention of this enhancement is to benefit performance wise from
>>>>> escape analysis even if JVMTI
>>>>> agents request capabilities that allow them to access local variable
>>>>> values. E.g. if you start-up
>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then
>>>>> escape analysis is disabled right
>>>>> from the beginning, well before a debugger attaches -- if ever one
>>>>> should do so. With the
>>>>> enhancement, escape analysis will remain enabled until and after a
>>>>> debugger attaches. EA based
>>>>> optimizations are reverted just before an agent acquires the
>>>>> reference to an object. In the JBS item
>>>>> you'll find more details.
>>>>
>>>> Most of the details here are in areas I can comment on in detail, but I
>>>> did take an initial general look at things.
>>>>
>>>> The only thing that jumped out at me is that I think the
>>>> DeoptimizeObjectsALotThread should be a hidden thread.
>>>>
>>>> + bool is_hidden_from_external_view() const { return true; }
>>>>
>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread.
>>>> Without
>>>> active testing this will just bit-rot.
>>>>
>>>> Also on the tests I don't understand your @requires clause:
>>>>
>>>> @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled &
>>>> (vm.opt.TieredCompilation != true))
>>>>
>>>> This seems to require that TieredCompilation is disabled, but tiered is
>>>> our normal mode of operation. ??
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745
>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch
>>>>>
>>>>>
>>>>>
More information about the hotspot-compiler-dev
mailing list