RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
Reingruber, Richard
richard.reingruber at sap.com
Wed Apr 1 06:19:10 UTC 2020
> Thanks for cleaning up thread.hpp!
Thanks for providing the feedback!
I justed noticed that the forward declaration of class jvmtiDeferredLocalVariableSet is not required anymore. Will remove it in the next webrev. Hope to get some more (partial) reviews.
Thanks, Richard.
-----Original Message-----
From: Robbin Ehn <robbin.ehn at oracle.com>
Sent: Dienstag, 31. März 2020 16:21
To: Reingruber, Richard <richard.reingruber at sap.com>; Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; David Holmes <david.holmes at oracle.com>; Vladimir Kozlov (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
Thanks for cleaning up thread.hpp!
/Robbin
On 2020-03-30 10:31, Reingruber, Richard wrote:
> Hi,
>
> this is webrev.5 based on Robbin's feedback and Martin's review - thanks! :)
>
> The change affects jvmti, hotspot and c2. Partial reviews are very welcome too.
>
> Full: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/
> Delta: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5.inc/
>
> Robbin, Martin, please let me know, if anything shouldn't be quite as you wanted it. Also find my
> comments on your feedback below.
>
> Robbin, can I count you as Reviewer for the runtime part?
>
> Thanks, Richard.
>
> --
>
>> DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
>> You can move both declaration and definition to that file, no need to clobber
>> thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
>
> Done.
>
>> Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's own
>> hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
>
> I moved JvmtiDeferredUpdates to vframe_hp.hpp where preexisting jvmtiDeferredLocalVariableSet is
> declared.
>
>> src/hotspot/share/code/compiledMethod.cpp
>> Nice cleanup!
>
> Thanks :)
>
>> src/hotspot/share/code/debugInfoRec.cpp
>> src/hotspot/share/code/debugInfoRec.hpp
>> Additional parmeters. (Remark: I think "non_global_escape_in_scope" would read better than "not_global_escape_in_scope", but your version is consistent with existing code, so no change request from my side.) Ok.
>
> I've been thinking about this too and finally stayed with not_global_escape_in_scope. It's supposed
> to mean an object whose escape state is not GlobalEscape is in scope.
>
>> src/hotspot/share/compiler/compileBroker.cpp
>> src/hotspot/share/compiler/compileBroker.hpp
>> Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a follow up change together with the test in order to make this webrev smaller, but since it is included, I'm reviewing everything at once. Not a big deal.) Ok.
>
> Yes the change would be a little smaller. And if it helps I'll split it off. In general I prefer
> patches that bring along a suitable amount of tests.
>
>> src/hotspot/share/opto/c2compiler.cpp
>> Make do_escape_analysis independent of JVMCI capabilities. Nice!
>
> It is the main goal of the enhancement. It is done for C2, but could be done for JVMCI compilers
> with just a small effort as well.
>
>> src/hotspot/share/opto/escape.cpp
>> Annotation for MachSafePointNodes. Your added functionality looks correct.
>> But I'd prefer to move the bulky code out of the large function.
>> I suggest to factor out something like has_not_global_escape and has_arg_escape. So the code could look like this:
>> SafePointNode* sfn = sfn_worklist.at(next);
>> sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn));
>> if (sfn->is_CallJava()) {
>> CallJavaNode* call = sfn->as_CallJava();
>> call->set_arg_escape(has_arg_escape(call));
>> }
>> This would also allow us to get rid of the found_..._escape_in_args variables making the loops better readable.
>
> Done.
>
>> It's kind of ugly to use strcmp to recognize uncommon trap, but that seems to be the way to do it (there are more such places). So it's ok.
>
> Yeah. I copied the snippet.
>
>> src/hotspot/share/prims/jvmtiImpl.cpp
>> src/hotspot/share/prims/jvmtiImpl.hpp
>> The sequence is pretty complex:
>> VM_GetOrSetLocal element initialization executes EscapeBarrier code which suspends the target thread (extra VM Operation).
>
> Note that the target threads have to be suspended already for VM_GetOrSetLocal*. So it's mainly the
> synchronization effect of EscapeBarrier::sync_and_suspend_one() that is required here. Also no extra
> _handshake_ is executed, since sync_and_suspend_one() will find the target threads already
> suspended.
>
>> VM_GetOrSetLocal::doit_prologue performs object deoptimization (by VM Thread to prepare VM Operation with frame deoptimization).
>> VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor which resumes the target thread.
>> But I don't have any improvement proposal. Performance is probably not a concern, here. So it's ok.
>
>> VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has non-globally escaping objects and other frames if they have arg escaping ones. Good.
>
> It's not specifically the top frame, but the frame that is accessed.
>
>> src/hotspot/share/runtime/deoptimization.cpp
>> Object deoptimization. I have more comments and proposals, here.
>> First of all, handling recursive and waiting locks in relock_objects is tricky, but looks correct.
>> Comments are sufficient to understand why things are done as they are implemented.
>
>> BiasedLocking related parts are complex, but we may get rid of them in the future (with BiasedLocking removal).
>> Anyway, looks correct, too.
>
>> Typo in comment: "regularily" => "regularly"
>
>> Deoptimization::fetch_unroll_info_helper is the only place where _jvmti_deferred_updates get deallocated (except JavaThread destructor). But I think we always go through it, so I can't see a memory leak or such kind of issues.
>
> That's correct. The compiled frame for which deferred updates are allocated is always deoptimized
> before (see EscapeBarrier::deoptimize_objects()). This is also asserted in
> compiledVFrame::update_deferred_value(). I've added the same assertion to
> Deoptimization::relock_objects(). So we can be sure that _jvmti_deferred_updates are deallocated
> again in fetch_unroll_info_helper().
>
>> EscapeBarrier::deoptimize_objects: ResourceMark should use calling_thread().
>
> Sure, well spotted!
>
>> You can use MutexLocker and MonitorLocker with Thread* to save the Thread::current() call.
>
> Right, good hint. This was recently introduced with 8235678. I even had to resolve conflicts. Should
> have done this then.
>
>> I'd make set_objs_are_deoptimized static and remove it from the EscapeBarrier interface because I think it shouldn't be used outside of EscapeBarrier::deoptimize_objects.
>
> Done.
>
>> Typo in comment: "we must only deoptimize" => "we only have to deoptimize"
>
> Replaced with "[...] we deoptimize iff local objects are passed as args"
>
>> "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and barrier_active() is redundant. Implementation can get moved to hpp file.
>
> Ok. Done.
>
>> I'll get back to suspend flags, later.
>
>> There are weird cases regarding _self_deoptimization_in_progress.
>> Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. C can set _self_deoptimization_in_progress while A performs the handshake for suspending C. I think this doesn't lead to errors, but it's probably not desired.
>> I think it would be better to use only one "wait" call in sync_and_suspend_one and sync_and_suspend_all.
>
> You're right. We've discussed that face-to-face, but couldn't find a real issue. But now, thinking again, a reckon I found one:
>
> 2808 // Sync with other threads that might be doing deoptimizations
> 2809 {
> 2810 // Need to switch to _thread_blocked for the wait() call
> 2811 ThreadBlockInVM tbivm(_calling_thread);
> 2812 MonitorLocker ml(EscapeBarrier_lock, Mutex::_no_safepoint_check_flag);
> 2813 while (_self_deoptimization_in_progress) {
> 2814 ml.wait();
> 2815 }
> 2816
> 2817 if (self_deopt()) {
> 2818 _self_deoptimization_in_progress = true;
> 2819 }
> 2820
> 2821 while (_deoptee_thread->is_ea_obj_deopt_suspend()) {
> 2822 ml.wait();
> 2823 }
> 2824
> 2825 if (self_deopt()) {
> 2826 return;
> 2827 }
> 2828
> 2829 // set suspend flag for target thread
> 2830 _deoptee_thread->set_ea_obj_deopt_flag();
> 2831 }
>
> - A waits in 2822
> - C is suspended
> - B notifies all in resume_one()
> - A and C wake up
> - C wins over A and sets _self_deoptimization_in_progress = true in 2818
> - C does the self deoptimization
> - A executes 2830 _deoptee_thread->set_ea_obj_deopt_flag()
>
> C will self suspend at some undefined point. The resulting state is illegal.
>
>> I first thought it'd be better to move ThreadBlockInVM before wait() to reduce thread state transitions, but that seems to be problematic because ThreadBlockInVM destructor contains a safepoint check which we shouldn't do while holding EscapeBarrier_lock. So no change request.
>
> Yes, would be nice to have the state change only if needed, but for the reason you mentioned it is
> not quite as easy as it seems to be. I experimented as well with a second lock, but did not succeed.
>
>> Change in thred_added:
>> I think the sequence would be more comprehensive if we waited for deopt_all_threads in Thread::start and all other places where a new thread can run into Java code (e.g. JVMTI attach).
>> Your version makes new threads come up with suspend flag set. That looks correct, too. Advantage is that you only have to change one place (thread_added). It'll be interesting to see how it will look like when we use async handshakes instead of suspend flags.
>> For now, I'm ok with your version.
>
> I had a version that did what you are suggesting. The current version also has the advantage, that
> there are fewer places where a thread has to wait for ongoing object deoptimization. This means
> viewer places where you have to worry about correct thread state transitions, possible deadlocks,
> and if all oops are properly Handle'ed.
>
>> I'd only move MutexLocker ml(EscapeBarrier_lock...) after if (!jt->is_hidden_from_external_view()).
>
> Done.
>
>> Having 4 different deoptimize_objects functions makes it a little hard to keep an overview of which one is used for what.
>> Maybe adding suffixes would help a little bit, but I can also live with what you have.
>> Implementation looks correct to me.
>
> 2 are internal. I added the suffix _internal to them. This leaves 2 to choose from.
>
>> src/hotspot/share/runtime/deoptimization.hpp
>> Escape barriers and object deoptimization functions.
>> Typo in comment: "helt" => "held"
>
> Done in place already.
>
>> src/hotspot/share/runtime/interfaceSupport.cpp
>> InterfaceSupport::deoptimizeAllObjects() is only used for DeoptimizeObjectsALot = 1.
>> I think DeoptimizeObjectsALot = 2 is more important, but I think it's not bad to have DeoptimizeObjectsALot = 1 in addition. Ok.
>
> I never used DeoptimizeObjectsALot = 1 that much. It could be more deterministic in single threaded
> scenarios. I wouldn't object to get rid of it though.
>
>> src/hotspot/share/runtime/stackValue.hpp
>> Better reinitilization in StackValue. Good.
>
> StackValue::obj_is_scalar_replaced() should not return true after calling set_obj().
>
>> src/hotspot/share/runtime/thread.cpp
>> src/hotspot/share/runtime/thread.hpp
>> src/hotspot/share/runtime/thread.inline.hpp
>> wait_for_object_deoptimization, suspend flag, deferred updates and test feature to deoptimize objects.
>
>> In the long term, we want to get rid of suspend flags, so it's not so nice to introduce a new one. But I agree with Götz that it should be acceptable as temporary solution until async handshakes are available (which takes more time). So I'm ok with your change.
>
> I'm keen to build the feature on async handshakes when the arive.
>
>> You can use MutexLocker with Thread*.
>
> Done.
>
>> JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class out of thread.hpp.
>
> Done.
>
>> src/hotspot/share/runtime/vframe.cpp
>> Added support for entry frame to new_vframe. Ok.
>
>
>> src/hotspot/share/runtime/vframe_hp.cpp
>> src/hotspot/share/runtime/vframe_hp.hpp
>
>> I think code()->as_nmethod() in not_global_escape_in_scope() and arg_escape() should better be under #ifdef ASSERT or inside the assert statement (no need for code cache walking in product build).
>
> Done.
>
>> jvmtiDeferredLocalVariableSet::update_monitors:
>> Please add a comment explaining that owner referenced by original info may be scalar replaced, but it is deoptimized in the vframe.
>
> Done.
>
> -----Original Message-----
> From: Doerr, Martin <martin.doerr at sap.com>
> Sent: Donnerstag, 12. März 2020 17:28
> To: Reingruber, Richard <richard.reingruber at sap.com>; 'Robbin Ehn' <robbin.ehn at oracle.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; David Holmes <david.holmes at oracle.com>; Vladimir Kozlov (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
>
> Hi Richard,
>
>
> I managed to find time for a (almost) complete review of webrev.4. (I'll review the tests separately.)
>
> First of all, the change seems to be in pretty good quality for its significant complexity. I couldn't find any real bugs. But I'd like to propose minor improvements.
> I'm convinced that it's mature because we did substantial testing.
>
> I like the new functionality for object deoptimization. It can possibly be reused for future escape analysis based optimizations. So I appreciate having it available in the code base.
> In addition to that, your change makes the JVMTI implementation better integrated into the VM.
>
>
> Now to the details:
>
>
> src/hotspot/share/c1/c1_IR.hpp
> describe_scope parameters. Ok.
>
>
> src/hotspot/share/ci/ciEnv.cpp
> src/hotspot/share/ci/ciEnv.hpp
> Fix for JvmtiExport::can_walk_any_space() capability. Ok.
>
>
> src/hotspot/share/code/compiledMethod.cpp
> Nice cleanup!
>
>
> src/hotspot/share/code/debugInfoRec.cpp
> src/hotspot/share/code/debugInfoRec.hpp
> Additional parmeters. (Remark: I think "non_global_escape_in_scope" would read better than "not_global_escape_in_scope", but your version is consistent with existing code, so no change request from my side.) Ok.
>
>
> src/hotspot/share/code/nmethod.cpp
> Nice cleanup!
>
>
> src/hotspot/share/code/pcDesc.hpp
> Additional parameters. Ok.
>
>
> src/hotspot/share/code/scopeDesc.cpp
> src/hotspot/share/code/scopeDesc.hpp
> Improved implementation + additional parameters. Ok.
>
>
> src/hotspot/share/compiler/compileBroker.cpp
> src/hotspot/share/compiler/compileBroker.hpp
> Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a follow up change together with the test in order to make this webrev smaller, but since it is included, I'm reviewing everything at once. Not a big deal.) Ok.
>
>
> src/hotspot/share/jvmci/jvmciCodeInstaller.cpp
> Additional parameters. Ok.
>
>
> src/hotspot/share/opto/c2compiler.cpp
> Make do_escape_analysis independent of JVMCI capabilities. Nice!
>
>
> src/hotspot/share/opto/callnode.hpp
> Additional fields for MachSafePointNodes. Ok.
>
>
> src/hotspot/share/opto/escape.cpp
> Annotation for MachSafePointNodes. Your added functionality looks correct.
> But I'd prefer to move the bulky code out of the large function.
> I suggest to factor out something like has_not_global_escape and has_arg_escape. So the code could look like this:
> SafePointNode* sfn = sfn_worklist.at(next);
> sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn));
> if (sfn->is_CallJava()) {
> CallJavaNode* call = sfn->as_CallJava();
> call->set_arg_escape(has_arg_escape(call));
> }
> This would also allow us to get rid of the found_..._escape_in_args variables making the loops better readable.
>
> It's kind of ugly to use strcmp to recognize uncommon trap, but that seems to be the way to do it (there are more such places). So it's ok.
>
>
> src/hotspot/share/opto/machnode.hpp
> Additional fields for MachSafePointNodes. Ok.
>
>
> src/hotspot/share/opto/macro.cpp
> Allow elimination of non-escaping allocations. Ok.
>
>
> src/hotspot/share/opto/matcher.cpp
> src/hotspot/share/opto/output.cpp
> Copy attribute / pass parameters. Ok.
>
>
> src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp
> Nice cleanup!
>
>
> src/hotspot/share/prims/jvmtiEnv.cpp
> src/hotspot/share/prims/jvmtiEnvBase.cpp
> Escape barriers + deoptimize objects for target thread. Good.
>
>
> src/hotspot/share/prims/jvmtiImpl.cpp
> src/hotspot/share/prims/jvmtiImpl.hpp
> The sequence is pretty complex:
> VM_GetOrSetLocal element initialization executes EscapeBarrier code which suspends the target thread (extra VM Operation).
> VM_GetOrSetLocal::doit_prologue performs object deoptimization (by VM Thread to prepare VM Operation with frame deoptimization).
> VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor which resumes the target thread.
> But I don't have any improvement proposal. Performance is probably not a concern, here. So it's ok.
>
> VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has non-globally escaping objects and other frames if they have arg escaping ones. Good.
>
>
> src/hotspot/share/prims/jvmtiTagMap.cpp
> Escape barriers + deoptimize objects for all threads. Ok.
>
>
> src/hotspot/share/prims/whitebox.cpp
> Added WB_IsFrameDeoptimized to API. Ok.
>
>
> src/hotspot/share/runtime/deoptimization.cpp
> Object deoptimization. I have more comments and proposals, here.
> First of all, handling recursive and waiting locks in relock_objects is tricky, but looks correct.
> Comments are sufficient to understand why things are done as they are implemented.
>
> BiasedLocking related parts are complex, but we may get rid of them in the future (with BiasedLocking removal).
> Anyway, looks correct, too.
>
> Typo in comment: "regularily" => "regularly"
>
> Deoptimization::fetch_unroll_info_helper is the only place where _jvmti_deferred_updates get deallocated (except JavaThread destructor). But I think we always go through it, so I can't see a memory leak or such kind of issues.
>
> EscapeBarrier::deoptimize_objects: ResourceMark should use calling_thread().
>
> You can use MutexLocker and MonitorLocker with Thread* to save the Thread::current() call.
>
> I'd make set_objs_are_deoptimized static and remove it from the EscapeBarrier interface because I think it shouldn't be used outside of EscapeBarrier::deoptimize_objects.
>
> Typo in comment: "we must only deoptimize" => "we only have to deoptimize"
>
> "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and barrier_active() is redundant. Implementation can get moved to hpp file.
>
> I'll get back to suspend flags, later.
>
> There are weird cases regarding _self_deoptimization_in_progress.
> Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. C can set _self_deoptimization_in_progress while A performs the handshake for suspending C. I think this doesn't lead to errors, but it's probably not desired.
> I think it would be better to use only one "wait" call in sync_and_suspend_one and sync_and_suspend_all.
>
> I first thought it'd be better to move ThreadBlockInVM before wait() to reduce thread state transitions, but that seems to be problematic because ThreadBlockInVM destructor contains a safepoint check which we shouldn't do while holding EscapeBarrier_lock. So no change request.
>
> Change in thred_added:
> I think the sequence would be more comprehensive if we waited for deopt_all_threads in Thread::start and all other places where a new thread can run into Java code (e.g. JVMTI attach).
> Your version makes new threads come up with suspend flag set. That looks correct, too. Advantage is that you only have to change one place (thread_added). It'll be interesting to see how it will look like when we use async handshakes instead of suspend flags.
> For now, I'm ok with your version.
>
> I'd only move MutexLocker ml(EscapeBarrier_lock...) after if (!jt->is_hidden_from_external_view()).
>
> Having 4 different deoptimize_objects functions makes it a little hard to keep an overview of which one is used for what.
> Maybe adding suffixes would help a little bit, but I can also live with what you have.
> Implementation looks correct to me.
>
>
> src/hotspot/share/runtime/deoptimization.hpp
> Escape barriers and object deoptimization functions.
> Typo in comment: "helt" => "held"
>
>
> src/hotspot/share/runtime/globals.hpp
> Addition of develop flag DeoptimizeObjectsALotInterval. Ok.
>
>
> src/hotspot/share/runtime/interfaceSupport.cpp
> InterfaceSupport::deoptimizeAllObjects() is only used for DeoptimizeObjectsALot = 1.
> I think DeoptimizeObjectsALot = 2 is more important, but I think it's not bad to have DeoptimizeObjectsALot = 1 in addition. Ok.
>
>
> src/hotspot/share/runtime/interfaceSupport.inline.hpp
> Addition of deoptimizeAllObjects. Ok.
>
>
> src/hotspot/share/runtime/mutexLocker.cpp
> src/hotspot/share/runtime/mutexLocker.hpp
> Addition of EscapeBarrier_lock. Ok.
>
>
> src/hotspot/share/runtime/objectMonitor.cpp
> Make recursion count relock aware. Ok.
>
>
> src/hotspot/share/runtime/stackValue.hpp
> Better reinitilization in StackValue. Good.
>
>
> src/hotspot/share/runtime/thread.cpp
> src/hotspot/share/runtime/thread.hpp
> src/hotspot/share/runtime/thread.inline.hpp
> wait_for_object_deoptimization, suspend flag, deferred updates and test feature to deoptimize objects.
>
> In the long term, we want to get rid of suspend flags, so it's not so nice to introduce a new one. But I agree with Götz that it should be acceptable as temporary solution until async handshakes are available (which takes more time). So I'm ok with your change.
>
> You can use MutexLocker with Thread*.
>
> JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class out of thread.hpp.
>
>
> src/hotspot/share/runtime/vframe.cpp
> Added support for entry frame to new_vframe. Ok.
>
>
> src/hotspot/share/runtime/vframe_hp.cpp
> src/hotspot/share/runtime/vframe_hp.hpp
>
> I think code()->as_nmethod() in not_global_escape_in_scope() and arg_escape() should better be under #ifdef ASSERT or inside the assert statement (no need for code cache walking in product build).
>
> jvmtiDeferredLocalVariableSet::update_monitors:
> Please add a comment explaining that owner referenced by original info may be scalar replaced, but it is deoptimized in the vframe.
>
>
> src/hotspot/share/utilities/macros.hpp
> Addition of NOT_COMPILER2_OR_JVMCI_RETURN macros. Ok.
>
>
> test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysisEnabled.java
> test/hotspot/jtreg/serviceability/jvmti/Heap/libIterateHeapWithEscapeAnalysisEnabled.c
> New test. Will review separately.
>
>
> test/jdk/TEST.ROOT
> Addition of vm.jvmci as required property. Ok.
>
>
> test/jdk/com/sun/jdi/EATests.java
> test/jdk/com/sun/jdi/EATestsJVMCI.java
> New test. Will review separately.
>
>
> test/lib/sun/hotspot/WhiteBox.java
> Added isFrameDeoptimized to API. Ok.
>
>
> That was it. Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: hotspot-compiler-dev <hotspot-compiler-dev-
>> bounces at openjdk.java.net> On Behalf Of Reingruber, Richard
>> Sent: Dienstag, 3. März 2020 21:23
>> To: 'Robbin Ehn' <robbin.ehn at oracle.com>; Lindenmaier, Goetz
>> <goetz.lindenmaier at sap.com>; David Holmes <david.holmes at oracle.com>;
>> Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; serviceability-dev at openjdk.java.net;
>> hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-
>> dev at openjdk.java.net
>> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better
>> Performance in the Presence of JVMTI Agents
>>
>> Hi Robbin,
>>
>>>> I understand that Robbin proposed to replace the usage of
>>>> _suspend_flag with handshakes. Apparently, async handshakes
>>>> are needed to do so. We have been waiting a while for removal
>>>> of the _suspend_flag / introduction of async handshakes [2].
>>>> What is the status here?
>>
>>> I have an old prototype which I would like to continue to work on.
>>> So do not assume asynch handshakes will make 15.
>>> Even if it would, I think there are a lot more investigate work to remove
>>> _suspend_flag.
>>
>> Let us know, if we can be of any help to you and be it only testing.
>>
>>>>> Full:
>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/
>>
>>> DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
>>> You can move both declaration and definition to that file, no need to
>> clobber
>>> thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
>>
>> Will do.
>>
>>> Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's
>> own
>>> hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
>>
>> You are right. It shouldn't be declared in thread.hpp. I will look into that.
>>
>>> Note that we also think we may have a bug in deopt:
>>> https://bugs.openjdk.java.net/browse/JDK-8238237
>>
>>> I think it would be best, if possible, to push after that is resolved.
>>
>> Sure.
>>
>>> Not even nearly a full review :)
>>
>> I know :)
>>
>> Anyways, thanks a lot,
>> Richard.
>>
>>
>> -----Original Message-----
>> From: Robbin Ehn <robbin.ehn at oracle.com>
>> Sent: Monday, March 2, 2020 11:17 AM
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Reingruber, Richard
>> <richard.reingruber at sap.com>; David Holmes <david.holmes at oracle.com>;
>> Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; serviceability-dev at openjdk.java.net;
>> hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-
>> dev at openjdk.java.net
>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance
>> in the Presence of JVMTI Agents
>>
>> Hi,
>>
>> On 2/24/20 5:39 PM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> I had a look at the progress of this change. Nothing
>>> happened since Richard posted his update using more
>>> handshakes [1].
>>> But we (SAP) would appreciate a lot if this change could
>>> be successfully reviewed and pushed.
>>>
>>> I think there is basic understanding that this
>>> change is helpful. It fixes a number of issues with JVMTI,
>>> and will deliver the same performance benefits as EA
>>> does in current production mode for debugging scenarios.
>>>
>>> This is important for us as we run our VMs prepared
>>> for debugging in production mode.
>>>
>>> I understand that Robbin proposed to replace the usage of
>>> _suspend_flag with handshakes. Apparently, async handshakes
>>> are needed to do so. We have been waiting a while for removal
>>> of the _suspend_flag / introduction of async handshakes [2].
>>> What is the status here?
>>
>> I have an old prototype which I would like to continue to work on.
>> So do not assume asynch handshakes will make 15.
>> Even if it would, I think there are a lot more investigate work to remove
>> _suspend_flag.
>>
>>>
>>> I think we should no longer wait, but proceed with
>>> this change. We will look into removing the usage of
>>> suspend_flag introduced here once it is possible to implement
>>> it with handshakes.
>>
>> Yes, sure.
>>
>>>> Full: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/
>>
>> DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
>> You can move both declaration and definition to that file, no need to clobber
>> thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
>>
>> Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's
>> own
>> hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
>>
>> Note that we also think we may have a bug in deopt:
>> https://bugs.openjdk.java.net/browse/JDK-8238237
>>
>> I think it would be best, if possible, to push after that is resolved.
>>
>> Not even nearly a full review :)
>>
>> Thanks, Robbin
>>
>>
>>>> Incremental:
>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4.inc/
>>>>
>>>> I was not able to eliminate the additional suspend flag now. I'll take care
>> of this
>>>> as soon as the
>>>> existing suspend-resume-mechanism is reworked.
>>>>
>>>> Testing:
>>>>
>>>> Nightly tests @SAP:
>>>>
>>>> JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015,
>> Renaissance
>>>> Suite, SAP specific tests
>>>> with fastdebug and release builds on all platforms
>>>>
>>>> Stress testing with DeoptimizeObjectsALot running SPECjvm2008 40x
>> parallel
>>>> for 24h
>>>>
>>>> Thanks, Richard.
>>>>
>>>>
>>>> More details on the changes:
>>>>
>>>> * Hide DeoptimizeObjectsALotThread from external view.
>>>>
>>>> * Changed EscapeBarrier_lock to be a _safepoint_check_never lock.
>>>> It used to be _safepoint_check_sometimes, which will be eliminated
>> sooner or
>>>> later.
>>>> I added explicit thread state changes with ThreadBlockInVM to code
>> paths
>>>> where we can wait()
>>>> on EscapeBarrier_lock to become safepoint safe.
>>>>
>>>> * Use handshake EscapeBarrierSuspendHandshake to suspend target
>> threads
>>>> instead of vm operation
>>>> VM_ThreadSuspendAllForObjDeopt.
>>>>
>>>> * Removed uses of Threads_lock. When adding a new thread we suspend
>> it iff
>>>> EA optimizations are
>>>> being reverted. In the previous version we were waiting on
>> Threads_lock
>>>> while EA optimizations
>>>> were reverted. See EscapeBarrier::thread_added().
>>>>
>>>> * Made tests require Xmixed compilation mode.
>>>>
>>>> * Made tests agnostic regarding tiered compilation.
>>>> I.e. tc isn't disabled anymore, and the tests can be run with tc enabled or
>>>> disabled.
>>>>
>>>> * Exercising EATests.java as well with stress test options
>>>> DeoptimizeObjectsALot*
>>>> Due to the non-deterministic deoptimizations some tests need to be
>> skipped.
>>>> We do this to prevent bit-rot of the stress test code.
>>>>
>>>> * Executing EATests.java as well with graal if available. Driver for this is
>>>> EATestsJVMCI.java. Graal cannot pass all tests, because it does not
>> provide all
>>>> the new debug info
>>>> (namely not_global_escape_in_scope and arg_escape in
>> scopeDesc.hpp).
>>>> And graal does not yet support the JVMTI operations force early return
>> and
>>>> pop frame.
>>>>
>>>> * Removed tracing from new jdi tests in EATests.java. Too much trace
>> output
>>>> before the debugging
>>>> connection is established can cause deadlock because output buffers fill
>> up.
>>>> (See https://bugs.openjdk.java.net/browse/JDK-8173304)
>>>>
>>>> * Many copyright year changes and smaller clean-up changes of testing
>> code
>>>> (trailing white-space and
>>>> the like).
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Donnerstag, 19. Dezember 2019 03:12
>>>> To: Reingruber, Richard <richard.reingruber at sap.com>; serviceability-
>>>> dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net;
>> hotspot-
>>>> runtime-dev at openjdk.java.net; Vladimir Kozlov
>> (vladimir.kozlov at oracle.com)
>>>> <vladimir.kozlov at oracle.com>
>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>> Performance in
>>>> the Presence of JVMTI Agents
>>>>
>>>> Hi Richard,
>>>>
>>>> I think my issue is with the way EliminateNestedLocks works so I'm going
>>>> to look into that more deeply.
>>>>
>>>> Thanks for the explanations.
>>>>
>>>> David
>>>>
>>>> On 18/12/2019 12:47 am, Reingruber, Richard wrote:
>>>>> Hi David,
>>>>>
>>>>> > > > Some further queries/concerns:
>>>>> > > >
>>>>> > > > src/hotspot/share/runtime/objectMonitor.cpp
>>>>> > > >
>>>>> > > > Can you please explain the changes to ObjectMonitor::wait:
>>>>> > > >
>>>>> > > > ! _recursions = save // restore the old recursion count
>>>>> > > > ! + jt->get_and_reset_relock_count_after_wait(); //
>>>>> > > > increased by the deferred relock count
>>>>> > > >
>>>>> > > > what is the "deferred relock count"? I gather it relates to
>>>>> > > >
>>>>> > > > "The code was extended to be able to deoptimize objects of a
>>>>> > > frame that
>>>>> > > > is not the top frame and to let another thread than the owning
>>>>> > > thread do
>>>>> > > > it."
>>>>> > >
>>>>> > > Yes, these relate. Currently EA based optimizations are reverted,
>> when a
>>>> compiled frame is
>>>>> > > replaced with corresponding interpreter frames. Part of this is
>> relocking
>>>> objects with eliminated
>>>>> > > locking. New with the enhancement is that we do this also just
>> before
>>>> object references are
>>>>> > > acquired through JVMTI. In this case we deoptimize also the
>> owning
>>>> compiled frame C and we
>>>>> > > register deoptimized objects as deferred updates. When control
>> returns
>>>> to C it gets deoptimized,
>>>>> > > we notice that objects are already deoptimized (reallocated and
>>>> relocked), so we don't do it again
>>>>> > > (relocking twice would be incorrect of course). Deferred updates
>> are
>>>> copied into the new
>>>>> > > interpreter frames.
>>>>> > >
>>>>> > > Problem: relocking is not possible if the target thread T is waiting
>> on the
>>>> monitor that needs to
>>>>> > > be relocked. This happens only with non-local objects with
>>>> EliminateNestedLocks. Instead relocking
>>>>> > > is deferred until T owns the monitor again. This is what the piece of
>>>> code above does.
>>>>> >
>>>>> > Sorry I need some more detail here. How can you wait() on an
>> object
>>>>> > monitor if the object allocation and/or locking was optimised away?
>> And
>>>>> > what is a "non-local object" in this context? Isn't EA restricted to
>>>>> > thread-confined objects?
>>>>>
>>>>> "Non-local object" is an object that escapes its thread. The issue I'm
>>>> addressing with the changes
>>>>> in ObjectMonitor::wait are almost unrelated to EA. They are caused by
>>>> EliminateNestedLocks, where C2
>>>>> eliminates recursive locking of an already owned lock. The lock owning
>> object
>>>> exists on the heap, it
>>>>> is locked and you can call wait() on it.
>>>>>
>>>>> EliminateLocks is the C2 option that controls lock elimination based on
>> EA.
>>>> Both optimizations have
>>>>> in common that objects with eliminated locking need to be relocked
>> when
>>>> deoptimizing a frame,
>>>>> i.e. when replacing a compiled frame with equivalent interpreter
>>>>> frames. Deoptimization::relock_objects does that job for /all/ eliminated
>>>> locks in scope. /All/ can
>>>>> be a mix of eliminated nested locks and locks of not-escaping objects.
>>>>>
>>>>> New with the enhancement: I call relock_objects earlier, just before
>> objects
>>>> pontentially
>>>>> escape. But then later when the owning compiled frame gets
>> deoptimized, I
>>>> must not do it again:
>>>>>
>>>>> See call to EscapeBarrier::objs_are_deoptimized in deoptimization.cpp:
>>>>>
>>>>> 373 if ((jvmci_enabled || ((DoEscapeAnalysis ||
>> EliminateNestedLocks) &&
>>>> EliminateLocks))
>>>>> 374 && !EscapeBarrier::objs_are_deoptimized(thread,
>> deoptee.id())) {
>>>>> 375 bool unused;
>>>>> 376 eliminate_locks(thread, chunk, realloc_failures, deoptee,
>> exec_mode,
>>>> unused);
>>>>> 377 }
>>>>>
>>>>> Now when calling relock_objects early it is quiet possible that I have to
>> relock
>>>> an object the
>>>>> target thread currently waits for. Obviously I cannot relock in this case,
>>>> instead I chose to
>>>>> introduce relock_count_after_wait to JavaThread.
>>>>>
>>>>> > Is it just that some of the locking gets optimized away e.g.
>>>>> >
>>>>> > synchronised(obj) {
>>>>> > synchronised(obj) {
>>>>> > synchronised(obj) {
>>>>> > obj.wait();
>>>>> > }
>>>>> > }
>>>>> > }
>>>>> >
>>>>> > If this is reduced to a form as-if it were a single lock of the monitor
>>>>> > (due to EA) and the wait() triggers a JVM TI event which leads to the
>>>>> > escape of "obj" then we need to reconstruct the true lock state, and
>> so
>>>>> > when the wait() internally unblocks and reacquires the monitor it
>> has to
>>>>> > set the true recursion count to 3, not the 1 that it appeared to be
>> when
>>>>> > wait() was initially called. Is that the scenario?
>>>>>
>>>>> Kind of... except that the locking is not eliminated due to EA and there is
>> no
>>>> JVM TI event
>>>>> triggered by wait.
>>>>>
>>>>> Add
>>>>>
>>>>> LocalObject l1 = new LocalObject();
>>>>>
>>>>> in front of the synchrnized blocks and assume a JVM TI agent acquires l1.
>> This
>>>> triggers the code in
>>>>> question.
>>>>>
>>>>> See that relocking/reallocating is transactional. If it is done then for /all/
>>>> objects in scope and it is
>>>>> done at most once. It wouldn't be quite so easy to split this in relocking
>> of
>>>> nested/EA-based
>>>>> eliminated locks.
>>>>>
>>>>> > If so I find this truly awful. Anyone using wait() in a realistic form
>>>>> > requires a notification and so the object cannot be thread confined.
>> In
>>>>>
>>>>> It is not thread confined.
>>>>>
>>>>> > which case I would strongly argue that upon hitting the wait() the
>> deopt
>>>>> > should occur unconditionally and so the lock state is correct before
>> we
>>>>> > wait and so we don't need to mess with the recursion count
>> internally
>>>>> > when we reacquire the monitor.
>>>>> >
>>>>> > >
>>>>> > > > which I don't like the sound of at all when it comes to
>> ObjectMonitor
>>>>> > > > state. So I'd like to understand in detail exactly what is going on
>> here
>>>>> > > > and why. This is a very intrusive change that seems to badly
>> break
>>>>> > > > encapsulation and impacts future changes to ObjectMonitor
>> that are
>>>> under
>>>>> > > > investigation.
>>>>> > >
>>>>> > > I would not regard this as breaking encapsulation. Certainly not
>> badly.
>>>>> > >
>>>>> > > I've added a property relock_count_after_wait to JavaThread. The
>>>> property is well
>>>>> > > encapsulated. Future ObjectMonitor implementations have to deal
>> with
>>>> recursion too. They are free
>>>>> > > in choosing a way to do that as long as that property is taken into
>>>> account. This is hardly a
>>>>> > > limitation.
>>>>> >
>>>>> > I do think this badly breaks encapsulation as you have to add a
>> callout
>>>>> > from the guts of the ObjectMonitor code to reach into the thread to
>> get
>>>>> > this lock count adjustment. I understand why you have had to do
>> this but
>>>>> > I would much rather see a change to the EA optimisation strategy so
>> that
>>>>> > this is not needed.
>>>>> >
>>>>> > > Note also that the property is a straight forward extension of the
>>>> existing concept of deferred
>>>>> > > local updates. It is embedded into the structure holding them. So
>> not
>>>> even the footprint of a
>>>>> > > JavaThread is enlarged if no deferred updates are generated.
>>>>> >
>>>>> > [...]
>>>>> >
>>>>> > >
>>>>> > > I'm actually duplicating the existing external suspend mechanism,
>>>> because a thread can be
>>>>> > > suspended at most once. And hey, and don't like that either! But it
>>>> seems not unlikely that the
>>>>> > > duplicate can be removed together with the original and the new
>> type
>>>> of handshakes that will be
>>>>> > > used for thread suspend can be used for object deoptimization
>> too. See
>>>> today's discussion in
>>>>> > > JDK-8227745 [2].
>>>>> >
>>>>> > I hope that discussion bears some fruit, at the moment it seems not
>> to
>>>>> > be possible to use handshakes here. :(
>>>>> >
>>>>> > The external suspend mechanism is a royal pain in the proverbial
>> that we
>>>>> > have to carefully live with. The idea that we're duplicating that for
>>>>> > use in another fringe area of functionality does not thrill me at all.
>>>>> >
>>>>> > To be clear, I understand the problem that exists and that you wish
>> to
>>>>> > solve, but for the runtime parts I balk at the complexity cost of
>>>>> > solving it.
>>>>>
>>>>> I know it's complex, but by far no rocket science.
>>>>>
>>>>> Also I find it hard to imagine another fix for JDK-8233915 besides
>> changing
>>>> the JVM TI specification.
>>>>>
>>>>> Thanks, Richard.
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>> Sent: Dienstag, 17. Dezember 2019 08:03
>>>>> To: Reingruber, Richard <richard.reingruber at sap.com>; serviceability-
>>>> dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net;
>> hotspot-
>>>> runtime-dev at openjdk.java.net; Vladimir Kozlov
>> (vladimir.kozlov at oracle.com)
>>>> <vladimir.kozlov at oracle.com>
>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>> Performance
>>>> in the Presence of JVMTI Agents
>>>>>
>>>>> <resend as my mailer crashed during last send>
>>>>>
>>>>> David
>>>>>
>>>>> On 17/12/2019 4:57 pm, David Holmes wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> On 14/12/2019 5:01 am, Reingruber, Richard wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> > Some further queries/concerns:
>>>>>>> >
>>>>>>> > src/hotspot/share/runtime/objectMonitor.cpp
>>>>>>> >
>>>>>>> > Can you please explain the changes to ObjectMonitor::wait:
>>>>>>> >
>>>>>>> > ! _recursions = save // restore the old recursion count
>>>>>>> > ! + jt->get_and_reset_relock_count_after_wait(); //
>>>>>>> > increased by the deferred relock count
>>>>>>> >
>>>>>>> > what is the "deferred relock count"? I gather it relates to
>>>>>>> >
>>>>>>> > "The code was extended to be able to deoptimize objects of a
>>>>>>> frame that
>>>>>>> > is not the top frame and to let another thread than the owning
>>>>>>> thread do
>>>>>>> > it."
>>>>>>>
>>>>>>> Yes, these relate. Currently EA based optimizations are reverted,
>> when
>>>>>>> a compiled frame is replaced
>>>>>>> with corresponding interpreter frames. Part of this is relocking
>>>>>>> objects with eliminated
>>>>>>> locking. New with the enhancement is that we do this also just before
>>>>>>> object references are acquired
>>>>>>> through JVMTI. In this case we deoptimize also the owning compiled
>>>>>>> frame C and we register
>>>>>>> deoptimized objects as deferred updates. When control returns to C
>> it
>>>>>>> gets deoptimized, we notice
>>>>>>> that objects are already deoptimized (reallocated and relocked), so
>> we
>>>>>>> don't do it again (relocking
>>>>>>> twice would be incorrect of course). Deferred updates are copied into
>>>>>>> the new interpreter frames.
>>>>>>>
>>>>>>> Problem: relocking is not possible if the target thread T is waiting
>>>>>>> on the monitor that needs to be
>>>>>>> relocked. This happens only with non-local objects with
>>>>>>> EliminateNestedLocks. Instead relocking is
>>>>>>> deferred until T owns the monitor again. This is what the piece of
>>>>>>> code above does.
>>>>>>
>>>>>> Sorry I need some more detail here. How can you wait() on an object
>>>>>> monitor if the object allocation and/or locking was optimised away?
>> And
>>>>>> what is a "non-local object" in this context? Isn't EA restricted to
>>>>>> thread-confined objects?
>>>>>>
>>>>>> Is it just that some of the locking gets optimized away e.g.
>>>>>>
>>>>>> synchronised(obj) {
>>>>>> synchronised(obj) {
>>>>>> synchronised(obj) {
>>>>>> obj.wait();
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> If this is reduced to a form as-if it were a single lock of the monitor
>>>>>> (due to EA) and the wait() triggers a JVM TI event which leads to the
>>>>>> escape of "obj" then we need to reconstruct the true lock state, and so
>>>>>> when the wait() internally unblocks and reacquires the monitor it has to
>>>>>> set the true recursion count to 3, not the 1 that it appeared to be when
>>>>>> wait() was initially called. Is that the scenario?
>>>>>>
>>>>>> If so I find this truly awful. Anyone using wait() in a realistic form
>>>>>> requires a notification and so the object cannot be thread confined. In
>>>>>> which case I would strongly argue that upon hitting the wait() the
>> deopt
>>>>>> should occur unconditionally and so the lock state is correct before we
>>>>>> wait and so we don't need to mess with the recursion count internally
>>>>>> when we reacquire the monitor.
>>>>>>
>>>>>>>
>>>>>>> > which I don't like the sound of at all when it comes to
>>>>>>> ObjectMonitor
>>>>>>> > state. So I'd like to understand in detail exactly what is going
>>>>>>> on here
>>>>>>> > and why. This is a very intrusive change that seems to badly
>> break
>>>>>>> > encapsulation and impacts future changes to ObjectMonitor that
>>>>>>> are under
>>>>>>> > investigation.
>>>>>>>
>>>>>>> I would not regard this as breaking encapsulation. Certainly not badly.
>>>>>>>
>>>>>>> I've added a property relock_count_after_wait to JavaThread. The
>>>>>>> property is well
>>>>>>> encapsulated. Future ObjectMonitor implementations have to deal
>> with
>>>>>>> recursion too. They are free in
>>>>>>> choosing a way to do that as long as that property is taken into
>>>>>>> account. This is hardly a
>>>>>>> limitation.
>>>>>>
>>>>>> I do think this badly breaks encapsulation as you have to add a callout
>>>>>> from the guts of the ObjectMonitor code to reach into the thread to
>> get
>>>>>> this lock count adjustment. I understand why you have had to do this
>> but
>>>>>> I would much rather see a change to the EA optimisation strategy so
>> that
>>>>>> this is not needed.
>>>>>>
>>>>>>> Note also that the property is a straight forward extension of the
>>>>>>> existing concept of deferred
>>>>>>> local updates. It is embedded into the structure holding them. So not
>>>>>>> even the footprint of a
>>>>>>> JavaThread is enlarged if no deferred updates are generated.
>>>>>>>
>>>>>>> > ---
>>>>>>> >
>>>>>>> > src/hotspot/share/runtime/thread.cpp
>>>>>>> >
>>>>>>> > Can you please explain why
>>>>>>> JavaThread::wait_for_object_deoptimization
>>>>>>> > has to be handcrafted in this way rather than using proper
>>>>>>> transitions.
>>>>>>> >
>>>>>>>
>>>>>>> I wrote wait_for_object_deoptimization taking
>>>>>>> JavaThread::java_suspend_self_with_safepoint_check
>>>>>>> as template. So in short: for the same reasons :)
>>>>>>>
>>>>>>> Threads reach both methods as part of thread state transitions,
>>>>>>> therefore special handling is
>>>>>>> required to change thread state on top of ongoing transitions.
>>>>>>>
>>>>>>> > We got rid of "deopt suspend" some time ago and it is disturbing
>>>>>>> to see
>>>>>>> > it being added back (effectively). This seems like it may be
>>>>>>> something
>>>>>>> > that handshakes could be used for.
>>>>>>>
>>>>>>> Deopt suspend used to be something rather different with a similar
>>>>>>> name[1]. It is not being added back.
>>>>>>
>>>>>> I stand corrected. Despite comments in the code to the contrary
>>>>>> deopt_suspend didn't actually cause a self-suspend. I was doing a lot of
>>>>>> cleanup in this area 13 years ago :)
>>>>>>
>>>>>>>
>>>>>>> I'm actually duplicating the existing external suspend mechanism,
>>>>>>> because a thread can be suspended
>>>>>>> at most once. And hey, and don't like that either! But it seems not
>>>>>>> unlikely that the duplicate can
>>>>>>> be removed together with the original and the new type of
>> handshakes
>>>>>>> that will be used for
>>>>>>> thread suspend can be used for object deoptimization too. See
>> today's
>>>>>>> discussion in JDK-8227745 [2].
>>>>>>
>>>>>> I hope that discussion bears some fruit, at the moment it seems not to
>>>>>> be possible to use handshakes here. :(
>>>>>>
>>>>>> The external suspend mechanism is a royal pain in the proverbial that
>> we
>>>>>> have to carefully live with. The idea that we're duplicating that for
>>>>>> use in another fringe area of functionality does not thrill me at all.
>>>>>>
>>>>>> To be clear, I understand the problem that exists and that you wish to
>>>>>> solve, but for the runtime parts I balk at the complexity cost of
>>>>>> solving it.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thanks, Richard.
>>>>>>>
>>>>>>> [1] Deopt suspend was something like an async. handshake for
>>>>>>> architectures with register windows,
>>>>>>> where patching the return pc for deoptimization of a compiled
>>>>>>> frame was racy if the owner thread
>>>>>>> was in native code. Instead a "deopt" suspend flag was set on
>>>>>>> which the thread patched its own
>>>>>>> frame upon return from native. So no thread was suspended. It
>> got
>>>>>>> its name only from the name of
>>>>>>> the flags.
>>>>>>>
>>>>>>> [2] Discussion about using handshakes to sync. with the target thread:
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-
>>>>
>> 8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.syst
>> e
>>>> m.issuetabpanels:comment-tabpanel#comment-14306727
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>> Sent: Freitag, 13. Dezember 2019 00:56
>>>>>>> To: Reingruber, Richard <richard.reingruber at sap.com>;
>>>>>>> serviceability-dev at openjdk.java.net;
>>>>>>> hotspot-compiler-dev at openjdk.java.net;
>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>>>>>>> Performance in the Presence of JVMTI Agents
>>>>>>>
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Some further queries/concerns:
>>>>>>>
>>>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>>>>
>>>>>>> Can you please explain the changes to ObjectMonitor::wait:
>>>>>>>
>>>>>>> ! _recursions = save // restore the old recursion count
>>>>>>> ! + jt->get_and_reset_relock_count_after_wait(); //
>>>>>>> increased by the deferred relock count
>>>>>>>
>>>>>>> what is the "deferred relock count"? I gather it relates to
>>>>>>>
>>>>>>> "The code was extended to be able to deoptimize objects of a frame
>> that
>>>>>>> is not the top frame and to let another thread than the owning thread
>> do
>>>>>>> it."
>>>>>>>
>>>>>>> which I don't like the sound of at all when it comes to ObjectMonitor
>>>>>>> state. So I'd like to understand in detail exactly what is going on here
>>>>>>> and why. This is a very intrusive change that seems to badly break
>>>>>>> encapsulation and impacts future changes to ObjectMonitor that are
>> under
>>>>>>> investigation.
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>>>
>>>>>>> Can you please explain why
>> JavaThread::wait_for_object_deoptimization
>>>>>>> has to be handcrafted in this way rather than using proper transitions.
>>>>>>>
>>>>>>> We got rid of "deopt suspend" some time ago and it is disturbing to
>> see
>>>>>>> it being added back (effectively). This seems like it may be something
>>>>>>> that handshakes could be used for.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>> On 12/12/2019 7:02 am, David Holmes wrote:
>>>>>>>> On 12/12/2019 1:07 am, Reingruber, Richard wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> > Most of the details here are in areas I can comment on in
>> detail,
>>>>>>>>> but I
>>>>>>>>> > did take an initial general look at things.
>>>>>>>>>
>>>>>>>>> Thanks for taking the time!
>>>>>>>>
>>>>>>>> Apologies the above should read:
>>>>>>>>
>>>>>>>> "Most of the details here are in areas I *can't* comment on in detail
>>>>>>>> ..."
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>> > The only thing that jumped out at me is that I think the
>>>>>>>>> > DeoptimizeObjectsALotThread should be a hidden thread.
>>>>>>>>> >
>>>>>>>>> > + bool is_hidden_from_external_view() const { return true; }
>>>>>>>>>
>>>>>>>>> Yes, it should. Will add the method like above.
>>>>>>>>>
>>>>>>>>> > Also I don't see any testing of the
>> DeoptimizeObjectsALotThread.
>>>>>>>>> Without
>>>>>>>>> > active testing this will just bit-rot.
>>>>>>>>>
>>>>>>>>> DeoptimizeObjectsALot is meant for stress testing with a larger
>>>>>>>>> workload. I will add a minimal test
>>>>>>>>> to keep it fresh.
>>>>>>>>>
>>>>>>>>> > Also on the tests I don't understand your @requires clause:
>>>>>>>>> >
>>>>>>>>> > @requires ((vm.compMode != "Xcomp") &
>> vm.compiler2.enabled
>>>> &
>>>>>>>>> > (vm.opt.TieredCompilation != true))
>>>>>>>>> >
>>>>>>>>> > This seems to require that TieredCompilation is disabled, but
>>>>>>>>> tiered is
>>>>>>>>> > our normal mode of operation. ??
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>> I removed the clause. I guess I wanted to target the tests towards
>> the
>>>>>>>>> code they are supposed to
>>>>>>>>> test, and it's easier to analyze failures w/o tiered compilation and
>>>>>>>>> with just one compiler thread.
>>>>>>>>>
>>>>>>>>> Additionally I will make use of
>>>>>>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>>> Sent: Mittwoch, 11. Dezember 2019 08:03
>>>>>>>>> To: Reingruber, Richard <richard.reingruber at sap.com>;
>>>>>>>>> serviceability-dev at openjdk.java.net;
>>>>>>>>> hotspot-compiler-dev at openjdk.java.net;
>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>>>>>>>>> Performance in the Presence of JVMTI Agents
>>>>>>>>>
>>>>>>>>> Hi Richard,
>>>>>>>>>
>>>>>>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I would like to get reviews please for
>>>>>>>>>>
>>>>>>>>>>
>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/
>>>>>>>>>>
>>>>>>>>>> Corresponding RFE:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745
>>>>>>>>>>
>>>>>>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915
>>>>>>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-
>> 8214584 [1]
>>>>>>>>>>
>>>>>>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing
>> without
>>>>>>>>>> issues (thanks!). In addition the
>>>>>>>>>> change is being tested at SAP since I posted the first RFR some
>>>>>>>>>> months ago.
>>>>>>>>>>
>>>>>>>>>> The intention of this enhancement is to benefit performance wise
>> from
>>>>>>>>>> escape analysis even if JVMTI
>>>>>>>>>> agents request capabilities that allow them to access local variable
>>>>>>>>>> values. E.g. if you start-up
>>>>>>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,
>> then
>>>>>>>>>> escape analysis is disabled right
>>>>>>>>>> from the beginning, well before a debugger attaches -- if ever one
>>>>>>>>>> should do so. With the
>>>>>>>>>> enhancement, escape analysis will remain enabled until and after
>> a
>>>>>>>>>> debugger attaches. EA based
>>>>>>>>>> optimizations are reverted just before an agent acquires the
>>>>>>>>>> reference to an object. In the JBS item
>>>>>>>>>> you'll find more details.
>>>>>>>>>
>>>>>>>>> Most of the details here are in areas I can comment on in detail, but
>> I
>>>>>>>>> did take an initial general look at things.
>>>>>>>>>
>>>>>>>>> The only thing that jumped out at me is that I think the
>>>>>>>>> DeoptimizeObjectsALotThread should be a hidden thread.
>>>>>>>>>
>>>>>>>>> + bool is_hidden_from_external_view() const { return true; }
>>>>>>>>>
>>>>>>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread.
>>>>>>>>> Without
>>>>>>>>> active testing this will just bit-rot.
>>>>>>>>>
>>>>>>>>> Also on the tests I don't understand your @requires clause:
>>>>>>>>>
>>>>>>>>> @requires ((vm.compMode != "Xcomp") &
>> vm.compiler2.enabled &
>>>>>>>>> (vm.opt.TieredCompilation != true))
>>>>>>>>>
>>>>>>>>> This seems to require that TieredCompilation is disabled, but tiered
>> is
>>>>>>>>> our normal mode of operation. ??
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745
>>>>>>>>>>
>>>>
>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.pa
>> tc
>>>> h
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
More information about the serviceability-dev
mailing list