RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Mon Jul 13 06:42:13 UTC 2020

Hi Goetz,

thanks for looking at this!

And my apologies for taking that long...

So here is the new webrev.6

Webrev.6: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/
Delta:    http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.inc/

I spent most of the time running a microbenchmark [1] I wrote to answer questions from your
review. At first I had trouble with variance in the results until I found out it was due to the NUMA
architecture of the server I used. After that I noticed that there was a performance regression of
about 5% even at low agent activity. I finally found out that it was due to the implementation of
JavaThread::wait_for_object_deoptimization() which is called by the target of the JVMTI operation to
self suspend for object deoptimization. I fixed this by adding limited spinning before calling
wait() on the monitor.

The delta includes many changes in comments, renaming of names, etc. So I'd like to summarize
functional changes:

* Collected all the code for the testing feature DeoptimizeObjectsALot in compileBroker.cpp and
  reworked it.

  With DeoptimizeObjectsALot enabled internal threads are started that deoptimize frames and
  objects. The number of threads started are given with DeoptimizeObjectsALotThreadCountAll and
  DeoptimizeObjectsALotThreadCountSingle. The former targets all existing threads whereas the
  latter operates on a single thread selected round robin.

  I removed the mode where deoptimizations were performed at every nth exit from the runtime. I
  never used it.

* EscapeBarrier::sync_and_suspend_one(): use a direct handshake and execute it always independently
  of is_thread_fully_suspended().

* Bugfix in EscapeBarrier::thread_added(): must not clear deopt flag. Found this testing with
  DeoptimizeObjectsALot.

* Added EscapeBarrier::thread_removed().

* EscapeBarrier constructors: barriers can now be entirely disabled by disabling DoEscapeAnalysis.
  This effectively disables the enhancement.

* JavaThread::wait_for_object_deoptimization():

  - Bugfix: the last check of is_obj_deopt_suspend() must be /after/ the safepoint check! This
    caused issues with not walkable stacks with DeoptimizeObjectsALot.

  - Added limited spinning inspired by HandshakeSpinYield to fix regression in microbenchmark [1]

I refer to some more changes answering your questions and comments inline below.

Thanks,
Richard.

[1] Microbenchmark: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/

> Hi Richard,
> 
> I had a look at your change.  It's complex, but not that big.
> A lot of code is just passing info through layers of abstraction.

Also it leverages preexisting functionality like materialization of virtual objects in non-top
frames (see materializeVirtualObjects).

> Also, one can tell this went through some iterations by now, 
> I think it's very well engineered.
> I had a look at webrev.05
> 
> Unfortunately
> "8242425: JVMTI monitor operations should use Thread-Local Handshakes" 
> breaks webrev.05.
> I updated to before that change and took that as base of my review.
> 
> I see four parts of the change that can be looked at
> rather individually.
> 
>  * Refactoring the scopeDesc constructors. Trivial.
>  * Persisting information about the optimizations done by the compilers.
>    Large and mostly trivial.
>  * Deoptimizing. The most complicated part. Really well abstracted, though.
>  * DeoptimizeObjectsALot for testing and the tests.
> 
> Review of compiler changes:
> 
> I understand you annotate at safepoints where the escape analysis
> finds out that an object is "better" than global escape. 
> This are the cases where the analysis identifies optimization 
> opportunities. These annotations are then used to deoptimize
> frames and the objects referenced by them.
> Doesn't this overestimate the optimized 
> objects?  E.g., eliminate_alloc_node has many cases where it bails
> out.

Yes, the implementation is conservative, but it is comparatively simple and the additional debug
info is just 2 flags per safepoint. On the other hand, those JVMTI operations that really trigger
deoptimizations are expected to be comparatively infrequent such that switching to the interpreter
for a few microseconds will hardly have an effect.

I've done microbenchmarking to check this.

http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/

I found that in the worst case performance can be impacted by 10%. If the agent is extremely active
and does relevant JVMTI calls like GetOwnedMonitorStackDepthInfo() every millisecond or more often,
then the performance impact can be 30%. But I would think that this is not realistic. These calls
are issued in interactive sessions to analyze deadlocks.

We could get more precise deoptimizations by adding a third flag per safepoint for ea-local objects
among the owned monitors. This would help improve the worst case in the benchmark. But I'm not
convinced, if it is worth it.

Refer to the README.txt of the microbenchmark for a more detailled discussion.

> c1_IR.hpp   
> 
> OK, nothing to do for C1, just adapt to extended method signature.
> 
> Break line once more so that it matches above line length.

Done.

> ciEnv.h|cpp
> 
> Pass through another jvmti capability.  Trivial & good.
> 
> 
> debugInfoRec.hpp
> 
> Pass through escape info that must be recorded. OK.
> 
> pcDesc.hpp
> 
> I would like to see some documentation of the methods.
>
> Maybe:
>   // There is an object in the scope that does not escape globally.
>   // It either does not escape at all or it escapes as arguemnt.
> and
>   // One of the arguments is an object that is not globally visible
>   // but escapes to the callee.

Done. I didn't take your text, though, because I only noticed it after writing my own. Let me know
if you are not ok with it.

> scopeDesc.cpp
> 
>   Besides refactoring copy escape info from pcDesc to scopeDesc
>   and add accessors. Trivial.
> 
>   In scopeDesc.hpp you talk about NoEscape and ArgEscape. 
>   This are opto terms, but scopeDesc is a shared datastructure
>   that does not depend on a specific compiler. 
>   Please explain what is going on without using these terms.

Actually these are not too opto specific terms. They are used in the paper referenced in
escape.hpp. Also you can easily google them. I'd rather keep the comments as they are.

> jvmciCodeInstaller.cpp
> 
>   OK, nothing for JVMCI. Here support for Object Optimizations 
>   for JVMCI compilers could be added. Leave this to graal people.
> 
> callnode.hpp
> 
> You add functionality to annotate callnodes with escape information 
> This is carried through code generation to final output where it is
> added to the compiled methods meta information.
> 
> At Safepoints in general jvmti can access
>   - Objects that were scalar replaced. They must be reallocated.
>     (Flag EliminateAllocations)
>   - Objects that should be locked but are not because they never 
>     escape the thread. They need to be relocked.
> 
> At calls, Objects where locks have been removed escape to callees.
> We must persist this information so that if jvmti accesses the 
> object in a callee, we can determine by looking at the caller that
> it needs to be relocked.

Note that the ea-optimization must not be at the current location, it can also follow when control
returns to the caller. Lock elimination isn't the only relevant optimization. Accesses to instance
members or array elements can be optimized as well.

> A side comment: 
> I think the flage handling in Opto is not very intuitive.
> DoEscapeAnalysis depends on the jvmti capabilities.
> This makes no sense. It is only an analysis. The optimizations
> should depend on the jvmti capabilities.
> The correct setup would be to handle this in 
> CompilerConfig::ergo_initialize():
> If the jvmti capabilities allow, enable the optimizations 
> EliminateAllocations or  EliminateLocks/EliminateNestedLocks.
> If one of these optimizations is on, enable EscapeAnalysis.
>  -- end side comment.
>
> So I would propose the following comments:
> 
>   // In the scope of this safepoints there are objects
>   // that do not globally escape. They are either NoEscape or
>   // ArgEscape. As such, they might be subject to optimizations.
>   // Persist this information here so that the frame an the
>   // Objects in scope can 
>   // be deoptimized if jvmti accesses an object at this safepoint.
>   void set_not_global_escape_in_scope(bool b) {
> 
>   // This call passes objects that do not globally escape 
>   // to its callee. The object might be subject to optimization, 
>   // e.g. a lock might be omitted. Persist this information here 
>   // so that on a jvmti access to the callee frame we can deoptimize
>   // the object and this frame.
>   void  set_arg_escape(bool f)             { _arg_escape = f; }

I do not really like these comments. They are too verbose and do not match the comment style of the
surrounding code. The names are descriptive enough IMO. Also the measures taken depending on the
flags should be commented at the locations, where the flags are read.

> Actuall I am not sure whether the name of these fields (and all 
> the others in the course of this change) should refer to 
> escape analysis.  I think the term "Object deoptimization" 
> you also use is much better. You could call these properties 
> (througout the whole change) 
>   set_optimized_objects_in_scope()
> and
>   set_passes_optimized_objects().
> 
> I think this would make the whole matter much easier
> to understand. 

I'd prefer the current names. They are closer to established terminology.  And it is actually
unknown, if optimizations based on their escape state exist.

> Anyways, locks can already be removed without running
> escape analysis at all. C2 recognizes some local patterns
> that allow this.
> 
> escape.h|cpp
> 
> The code looks good. 
> 
> Line 325: The comment could be a bit more elaborate:
>   // Annotate at safepoints if they have <= ArgEscape objects in their
>   // scope. Additionally, if the safepoint is a java call, annotate
>   // whether it passes ArgEscape objects as parameters.
> 
> And maybe add these comments?:
> 
> // Returns true if an oop in the scope of sfn does not escape
> // globally.
> bool ConnectionGraph::has_not_global_escape_in_scope(SafePointNode* sfn) {
> 
> // Returns true if at least one of the arguments to the call is an oop
> // that does not escape globally.
> bool ConnectionGraph::has_arg_escape(CallJavaNode* call) {

IMHO the method names are descriptive and don't need the comments. But I give in :) (only replaced
"oop" with "object")

> General question:
> You collect the information you want to annotate to the 
> method during escape analysis.
> Don't you overestimate the optimized objects by this?
> E.g. elimination of allocations does bail out for 
> various reasons. At the end, no optimization might 
> have happened, but then during runtime the frame is 
> deoptimized nevertheless.

Please see statements and worst case microbenchmark above.

> machnode.hpp:
> 
> Extends MachSafePointNode similar to the ideal version.  Good.
> 
> matcher.cpp
>   
> Copy info from ideal to mach node. good.
> 
> output.cpp
> 
> Now finally the information is written to the 
> debug info.  Good.
> 
> ---------------------------------------------------------
> 
> So now let's have a look at the runtime part (including
> relaxing constraints to escape analysis):
> 
> rootResolver.cpp
> 
> Adapt to changed interface. good.
> 
> c2compiler.cpp / macro.cpp
> 
> Make EscpaeAnlysis independent of jvmti capabilities. Good.
> 
> jvmtiEnv.cpp/jvmtiEnvBase.cpp
> 
> You add deoptimization of objects where they are 
> accessed. good.
> 
> jvmtiImpl.cpp
> 
> In deoptimize_objects, you check for DoEscapeAnalysis.
> This is correct given the current design of the flag
> handling in the compiler.
> It's not really nice to have a dependency to C2 here, 
> though. I understand it's an optimization, the code 
> could be run anyways, it would check but not find
> anything. But actually I would excpect dependencies
> on EliminateLocks and EliminateAllocations (if they
> were set according to jvmti capabilitiers as I elaborated
> above.)  
> Would it make sense to protect the ArgEscape
> loop by if (EliminateLocks)?

You are right, it is not correct how flags are checked. Especially if only running with the JVMCI
compiler.

I changed Deoptimization::deoptimize_objects_internal() to make reallocation and relocking dependent
on similar checks as in Deoptimization::fetch_unroll_info_helper(). Furthermore EscapeBarriers are
conditionally activated depending on the following (see EscapeBarrier ctors):

JVMCI_ONLY(UseJVMCICompiler) NOT_JVMCI(false) COMPILER2_PRESENT(|| DoEscapeAnalysis)

So the enhancement can be practically completely disabled by disabling DoEscapeAnalysis, which is
what C2 currently does if JVMTI capabilities that allow access to local references are taken.

> jvmtiTagMap.cpp
> 
> Deoptimize for jvmti operations.  Good.
> 
> deoptimization.cpp
> 
> I guess this is the core of your work.
> 
> 
> You add a new mode that just deoptimizes objects but not frames. 
> Good idea. You have to use reallocated objects in upper frames, 
> or by jvmti accesses to inner frames, which can not easily be
> replaced by interpreter frames.
> This way you can wait with replacing the frame until just before
> execution returns.
> 
> eliminate_allocations():
> (Strange method name, should at least be in past tense, even
> better reallocate_eliminated_allocations() or 
> allocate_scalarized_objects(). Confused me until
> I groked the code. Legacy though, not your business.)

I still don't grok the name... ;) but it's preexisting as you noted

> It's not that nice to return whether you only deoptimized
> objects by the boolean reference argument. After all, 
> it again depends on the mode you pass in.
> A different design would be to clone the method and 
> have an eliminate_allocations_no_unpack() variant, but that would
> not be better as some code would be duplicated.
> Maybe a comment for argument eliminate_allocations:
> // deoptimized_objects is set to true if objects were deoptimized
> // but not the frame. It is unchanged if there are no objects to 
> // be deoptimized, or if the frame was deoptim

I agree: duplicating the code would be really bad, but I don't think that having reference
parameters is not nice. I think it is a common pattern, if you return an error code and additional
result data. The variable is a minor detail. With the meaningful name it is not necessary to
document it.

In my eyes it should be set independently of the exec_mode. I didn't do it to make the change smaller.

> Similar for eliminate_locks():
> // deoptimized_objects is set to true if objects were relocked,
> // else it is left unchanged.
> 
> You reuse and extend the existing realloc/relock_objects, but extended it.
> 
> deoptimize_objects_internal()
> 
> Simple version of fetch_unroll_info_helper for EscapeBarrier.
> Good.
> I attributed the comment "Then relock objects if synchronization on them was eliminated."
> to the if() just below. Add an empty line to make clear the comment
> refers to the next 10 lines.
> Alternatively, replace the whole comment by 
> // At first, reallocate the non-escaping objects and restore their fields
> // so they are available for relocking.
> And add 
> // Now relock objects with eliminated locks.
> befor the if ((DoEscape... below.

I went for the latter.

> In fetch_unroll_info_helper, I don't understand why you need 
>  && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) {
> for eliminated locks, but not for skalar replaced objects?

In short reallocation is idempotent, relocking is not.

Without the enhancement Deoptimization::realloc_objects() can already be called more than once for a frame:

First call in materializeVirtualObjects() (also iterateFrames()).

Second (indirect) call in fetch_unroll_info_helper().

The objects from the first call are saved as jvmti deferred updates when realloc_objects()
returns. Note that there is no relationship to jvmti. The thing in common is that updates cannot be
directely installed into a compiled frame, it is necessary to deoptimize the frame and defer the
updates until the compiled frame gets replaced. Every time the vframes corresponding to the owner
frame are iterated, they get the deferred updates. So in fetch_unroll_info_helper() the
GrowableArray<compiledVFrame*>* chunk reference them too. All references to the objects created by
the second (indirect) call to realloc_objects() are never used, because compiledVFrame accessors to
locals, expressions, and monitors override them with the deferred updates. The objects become
unreachable and get gc'ed.

materializeVirtualObjects() does not bother with relocking. deoptimize_objects_internal(), which is
introduced by the enhancement, does relock objects, after all the lock elimination becomes illegal
with the change in escape state. Relocking twice does not work, so the enhancement avoids it by
checking EscapeBarrier::objs_are_deoptimized(thread, deoptee.id()).

Note that materializeVirtualObjects() can be called more than once and will always return the very
same objects, even though it calls realloc_objects() again.

> I would guess it is because the eliminated locks can be applied to
> argEscape, but scalar replacement only to noescape objects?
> I.e. it might have been done before?
> 
> But why isn't this the case for eliminate_allocations?
> deoptimize_objects_internal does both unconditionally,
> so both can happen to inner frames, right?

Sorry, I don't quite understand. Hope the explanation above helps.

> relock_objects()
> 
> Ok, you need to undo biased locking. Also, you remember the 
> lock nesting for later relocking if waiting for lock.
> 
> revoke_for_object_deoptimization()
>   I like if boolean operators are at the beginning of broken lines, 
>   but I think hotspot convention is to have them at the end.

Ok, fixed.

> Code will get much more simple if BiasedLocking is removed.
> 
> EscapeBarrier:: ...
> 
> (This class maybe would qualify for a file of its own.)
> 
> deoptimize_objects()
> I would mention escape analysis only as side remark.  Also, as I understand, 
> there is only one frame at given depth?
> // Deoptimize frames with optimized objects. This can be omitted locks and 
> // objects not allocated but replaced by scalars. In C2, these optimizations
> // are based on escape analysis.
> // Up to depth, deoptimize frames with any optimized objects.
> // From depth to entry_frame, deoptimize only frames that
> // pass optimized objects to their callees.
> (First part similar for the comment above EscapeBarrier::deoptimize_objects_internal().)

I've reworked the comment. Let me know if you still think it needs to be improved.

> 
> What is the check (cur_depth <= depth) good for? Can you 
> ever walk past entry_frame?  

Yes (assuming you mean the outer while-statement), there are java frames beyond the entry frame if a
native method calls java methods again. So we visit all frames up to the given depth and from there
we continue to the entry frame. It is not necessary to continue beyond that entry frame, because
escape analysis assumes that arguments to native functions escape globally.

Example: Let the java stack look like this:

+---------+
| Frame A |
+---------+
| Frame N |
+---------+
| Frame B |
+---------+ <- top of stack

Where java method A calls native method N and N calls java method B.

Very simplified the native stack will look like this

+-------------------------+
| Frame of JIT Compiled A |
+-------------------------+
| Frame N                 |
+-------------------------+
| Entry Frame             |
+-------------------------+
| Frame B                 |
+-------------------------+ <- top of stack

The entry frame is an activation of the call stub, which is a small assembler routine that
translates from the native calling convention to the java calling convention.

There cannot be any ArgEscape that is passed to B (see above), therefore we can stop the stackwalk
at the entry frame if depth is 1. If depth is 3 we have to continue to Frame A, as it is directely
accessed.

> Isn't vf->is_compiled_frame() prerequisite that "Move to next physical frame" 
> is needed? You could move it into the other check.
> If so, similar for deoptimize_objects_all_threads().

Only compiledVFrame require moving to the /top/ frame. Fixed.

> Syncronization: looks good. I think others had a look at this before.
> 
> EscapeBarrier::deoptimize_objects_internal()
>   The method name is misleading, it is not used by 
>   deoptimize_objects().
>   Also, method with the same name is in Deopitmization.
>   Proposal: deoptimize_objects_thread() ?

Sorry, but I don't see, why it would be misleading.
What would be the meaning of 'deoptimize_objects_thread'? I don't understand that name.

> C1 stubs: this really shows you tested all configurations, great!
> 
> 
> mutexLocker: ok.
> objectMonitor.cpp: ok
> stackValue.hpp   Is this missing clearing a bug?

In short: that change is not needed anymore. I'll remove it again.

Details: it is not a real bug, but the assertion in vframeArrayElement::fill_in() was triggered:

assert(!value->obj_is_scalar_replaced() || realloc_failures) failed: object should be reallocated already.

But only with the first version of the enhancement (webrev.0), were objects were only reallocated
when replacing a compiled frame with equivalent interpreter frames iff virtual objects where not
reallocated before.

I changed this after prexisting code was refactored (JDK-8226705), because practically never already
reallocated objects exist and if there should be any, it does not harm to reallocate again, because
the unnecessarily allocated objects become immediately garbage and last but not least no tricky
synchronization is required.

Also that's what happens with the preexisting code if virtual objects are materialized with
materializeVirtualObjects().

> 
> thread.hpp
> 
> I would remove "_ea" from the flag and method names.

Done.

> 
> Renaming deferred_locals to deferred_updates is good, as well as 
> adding a datastructure for it. 
> (Adding this data structure might be a breakout, too.)
> 
> good.
> 
> thread.cpp
> 
> good.
> 
> vframe.cpp
> 
> Is this a bug in existing code?
> Makes sense. 

Depends on your definition of bug. There are no references to vframe::is_entry_frame() in the
existing code. I would think it is a bug.

> 
> vframe_hp.hpp 
> (What stands _hp for? helper? The file should be named compiledVFrame ...)
> 
> not_global_escape_in_scope() ...
> Again, you mention escape analysis here. Comments above hold, too.

I think it is the right name, because it is meaningful and simple.

> You introduce JvmtiDeferredUpdates. Good.
> 
> vframe_hp.cpp
> 
> Changes for JvmtiDeferredUpdates, escape state accessors,
> 
> line 422:
> Would an assertion assert(!info->owner_is_scalar_replaced(), ...) hold here?
> 
> 
> macros.hpp
>   Good.  
> 
> 
> Test coding
> ============
> 
> compileBroker.h|cpp
> 
> You introduce a third class of threads handled here and 
> add a new flag to distinguish it. Before, the two kinds
> of threads were distinguished implicitly by passing in 
> a compiler for compiler threads.
> The new thread kind is only used for testing in debug.
> 
> make_thread:
> You could assert (comp != NULL...) to assure previous
> conditions.

If replaced the if-statements with a switch-statement, made sure all enum-elements are covered, and
added the assertion you suggested.

> line 989 indentation broken

You are referring to this block I assume:
(from http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/src/hotspot/share/compiler/compileBroker.cpp.frames.html)

 976   if (MethodFlushing) {
 977     // Initialize the sweeper thread
 978     Handle thread_oop = create_thread_oop("Sweeper thread", CHECK);
 979     jobject thread_handle = JNIHandles::make_local(THREAD, thread_oop());
 980     make_thread(sweeper_t, thread_handle, NULL, NULL, THREAD);
 981   }
 982 
 983 #if defined(ASSERT) && COMPILER2_OR_JVMCI
 984   if (DeoptimizeObjectsALot == 2) {
 985     // Initialize and start the object deoptimizer threads
 986     for (int thread_count = 0; thread_count < DeoptimizeObjectsALotThreadCount; thread_count++) {
 987       Handle thread_oop = create_thread_oop("Deoptimize objects a lot thread", CHECK);
 988       jobject thread_handle = JNIHandles::make_local(THREAD, thread_oop());
 989       make_thread(deoptimizer_t, thread_handle, NULL, NULL, THREAD);
 990     }
 991   }
 992 #endif // defined(ASSERT) && COMPILER2_OR_JVMCI

I cannot really see broken indentation here. Am I looking at the wrong location?

> escape.cpp
> 
> You enable the optimization in case of testruns. good.
> 
> whitebox.cpp  ok.
> 
> deoptimization.cpp
> 
> deoptimize_objects_alot_loop()  Good.
> 
> globals.hpp
> 
> Nice docu of flags, but pleas mention "for testing purposes"
> or the like in DeoptimizeObjectsALot.
> I would place the flags next to each other. 
> 
> interfaceSupport.cpp: good.

Thanks! :)

-----Original Message-----
From: Lindenmaier, Goetz <goetz.lindenmaier at sap.com> 
Sent: Mittwoch, 6. Mai 2020 12:28
To: Reingruber, Richard <richard.reingruber at sap.com>; Doerr, Martin <martin.doerr at sap.com>; 'Robbin Ehn' <robbin.ehn at oracle.com>; David Holmes <david.holmes at oracle.com>; Vladimir Kozlov (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Hi Richard,

I had a look at your change.  It's complex, but not that big.
A lot of code is just passing info through layers of abstraction.
Also, one can tell this went through some iterations by now, 
I think it's very well engineered.
I had a look at webrev.05

Unfortunately
"8242425: JVMTI monitor operations should use Thread-Local Handshakes" 
breaks webrev.05.
I updated to before that change and took that as base of my review.

I see four parts of the change that can be looked at
rather individually.

 * Refactoring the scopeDesc constructors. Trivial.
 * Persisting information about the optimizations done by the compilers.
   Large and mostly trivial.
 * Deoptimizing. The most complicated part. Really well abstracted, though.
 * DeoptimizeObjectsALot for testing and the tests.

Review of compiler changes:

I understand you annotate at safepoints where the escape analysis
finds out that an object is "better" than global escape. 
This are the cases where the analysis identifies optimization 
opportunities. These annotations are then used to deoptimize
frames and the objects referenced by them.
Doesn't this overestimate the optimized 
objects?  E.g., eliminate_alloc_node has many cases where it bails
out.

c1_IR.hpp   

OK, nothing to do for C1, just adapt to extended method signature.

Break line once more so that it matches above line length.

ciEnv.h|cpp

Pass through another jvmti capability.  Trivial & good.

debugInfoRec.hpp

Pass through escape info that must be recorded. OK.

pcDesc.hpp

I would like to see some documentation of the methods.

Maybe:
  // There is an object in the scope that does not escape globally.
  // It either does not escape at all or it escapes as arguemnt.
and
  // One of the arguments is an object that is not globally visible
  // but escapes to the callee.

scopeDesc.cpp

  Besides refactoring copy escape info from pcDesc to scopeDesc
  and add accessors. Trivial.

  In scopeDesc.hpp you talk about NoEscape and ArgEscape. 
  This are opto terms, but scopeDesc is a shared datastructure
  that does not depend on a specific compiler. 
  Please explain what is going on without using these terms.

jvmciCodeInstaller.cpp

  OK, nothing for JVMCI. Here support for Object Optimizations 
  for JVMCI compilers could be added. Leave this to graal people.

callnode.hpp

You add functionality to annotate callnodes with escape information 
This is carried through code generation to final output where it is
added to the compiled methods meta information.

At Safepoints in general jvmti can access
  - Objects that were scalar replaced. They must be reallocated.
    (Flag EliminateAllocations)
  - Objects that should be locked but are not because they never 
    escape the thread. They need to be relocked.

At calls, Objects where locks have been removed escape to callees.
We must persist this information so that if jvmti accesses the 
object in a callee, we can determine by looking at the caller that
it needs to be relocked.

A side comment: 
I think the flage handling in Opto is not very intuitive.
DoEscapeAnalysis depends on the jvmti capabilities.
This makes no sense. It is only an analysis. The optimizations
should depend on the jvmti capabilities.
The correct setup would be to handle this in 
CompilerConfig::ergo_initialize():
If the jvmti capabilities allow, enable the optimizations 
EliminateAllocations or  EliminateLocks/EliminateNestedLocks.
If one of these optimizations is on, enable EscapeAnalysis.
 -- end side comment.

So I would propose the following comments:

  // In the scope of this safepoints there are objects
  // that do not globally escape. They are either NoEscape or
  // ArgEscape. As such, they might be subject to optimizations.
  // Persist this information here so that the frame an the
  // Objects in scope can 
  // be deoptimized if jvmti accesses an object at this safepoint.
  void set_not_global_escape_in_scope(bool b) {

  // This call passes objects that do not globally escape 
  // to its callee. The object might be subject to optimization, 
  // e.g. a lock might be omitted. Persist this information here 
  // so that on a jvmti access to the callee frame we can deoptimize
  // the object and this frame.
  void  set_arg_escape(bool f)             { _arg_escape = f; }

Actuall I am not sure whether the name of these fields (and all 
the others in the course of this change) should refer to 
escape analysis.  I think the term "Object deoptimization" 
you also use is much better. You could call these properties 
(througout the whole change) 
  set_optimized_objects_in_scope()
and
  set_passes_optimized_objects().

I think this would make the whole matter much easier
to understand. 

Anyways, locks can already be removed without running
escape analysis at all. C2 recognizes some local patterns
that allow this.

escape.h|cpp

The code looks good. 

Line 325: The comment could be a bit more elaborate:
  // Annotate at safepoints if they have <= ArgEscape objects in their
  // scope. Additionally, if the safepoint is a java call, annotate
  // whether it passes ArgEscape objects as parameters.

And maybe add these comments?:

// Returns true if an oop in the scope of sfn does not escape
// globally.
bool ConnectionGraph::has_not_global_escape_in_scope(SafePointNode* sfn) {

// Returns true if at least one of the arguments to the call is an oop
// that does not escape globally.
bool ConnectionGraph::has_arg_escape(CallJavaNode* call) {

General question:
You collect the information you want to annotate to the 
method during escape analysis.
Don't you overestimate the optimized objects by this?
E.g. elimination of allocations does bail out for 
various reasons. At the end, no optimization might 
have happened, but then during runtime the frame is 
deoptimized nevertheless.

machnode.hpp:

Extends MachSafePointNode similar to the ideal version.  Good.

matcher.cpp

Copy info from ideal to mach node. good.

output.cpp

Now finally the information is written to the 
debug info.  Good.

---------------------------------------------------------

So now let's have a look at the runtime part (including
relaxing constraints to escape analysis):

rootResolver.cpp

Adapt to changed interface. good.

c2compiler.cpp / macro.cpp

Make EscpaeAnlysis independent of jvmti capabilities. Good.

jvmtiEnv.cpp/jvmtiEnvBase.cpp

You add deoptimization of objects where they are 
accessed. good.

jvmtiImpl.cpp

In deoptimize_objects, you check for DoEscapeAnalysis.
This is correct given the current design of the flag
handling in the compiler.
It's not really nice to have a dependency to C2 here, 
though. I understand it's an optimization, the code 
could be run anyways, it would check but not find
anything. But actually I would excpect dependencies
on EliminateLocks and EliminateAllocations (if they
were set according to jvmti capabilitiers as I elaborated
above.)  
Would it make sense to protect the ArgEscape
loop by if (EliminateLocks)?

jvmtiTagMap.cpp

Deoptimize for jvmti operations.  Good.

deoptimization.cpp

I guess this is the core of your work.

You add a new mode that just deoptimizes objects but not frames. 
Good idea. You have to use reallocated objects in upper frames, 
or by jvmti accesses to inner frames, which can not easily be
replaced by interpreter frames.
This way you can wait with replacing the frame until just before
execution returns.

eliminate_allocations():
(Strange method name, should at least be in past tense, even
better reallocate_eliminated_allocations() or 
allocate_scalarized_objects(). Confused me until
I groked the code. Legacy though, not your business.)

It's not that nice to return whether you only deoptimized
objects by the boolean reference argument. After all, 
it again depends on the mode you pass in.
A different design would be to clone the method and 
have an eliminate_allocations_no_unpack() variant, but that would
not be better as some code would be duplicated.
Maybe a comment for argument eliminate_allocations:
// deoptimized_objects is set to true if objects were deoptimized
// but not the frame. It is unchanged if there are no objects to 
// be deoptimized, or if the frame was deoptim

Similar for eliminate_locks():
// deoptimized_objects is set to true if objects were relocked,
// else it is left unchanged.

You reuse and extend the existing realloc/relock_objects, but extended it.

deoptimize_objects_internal()

Simple version of fetch_unroll_info_helper for EscapeBarrier.
Good.
I attributed the comment "Then relock objects if synchronization on them was eliminated."
to the if() just below. Add an empty line to make clear the comment
refers to the next 10 lines.
Alternatively, replace the whole comment by 
// At first, reallocate the non-escaping objects and restore their fields
// so they are available for relocking.
And add 
// Now relock objects with eliminated locks.
befor the if ((DoEscape... below.

In fetch_unroll_info_helper, I don't understand why you need 
 && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) {
for eliminated locks, but not for skalar replaced objects?
I would guess it is because the eliminated locks can be applied to
argEscape, but scalar replacement only to noescape objects?
I.e. it might have been done before?

But why isn't this the case for eliminate_allocations?
deoptimize_objects_internal does both unconditionally,
so both can happen to inner frames, right?

relock_objects()

Ok, you need to undo biased locking. Also, you remember the 
lock nesting for later relocking if waiting for lock.

revoke_for_object_deoptimization()
  I like if boolean operators are at the beginning of broken lines, 
  but I think hotspot convention is to have them at the end.

Code will get much more simple if BiasedLocking is removed.

EscapeBarrier:: ...

(This class maybe would qualify for a file of its own.)

deoptimize_objects()
I would mention escape analysis only as side remark.  Also, as I understand, 
there is only one frame at given depth?
// Deoptimize frames with optimized objects. This can be omitted locks and 
// objects not allocated but replaced by scalars. In C2, these optimizations
// are based on escape analysis.
// Up to depth, deoptimize frames with any optimized objects.
// From depth to entry_frame, deoptimize only frames that
// pass optimized objects to their callees.
(First part similar for the comment above EscapeBarrier::deoptimize_objects_internal().)

What is the check (cur_depth <= depth) good for? Can you 
ever walk past entry_frame?  

Isn't vf->is_compiled_frame() prerequisite that "Move to next physical frame" 
is needed? You could move it into the other check.
If so, similar for deoptimize_objects_all_threads().

Syncronization: looks good. I think others had a look at this before.

EscapeBarrier::deoptimize_objects_internal()
  The method name is misleading, it is not used by 
  deoptimize_objects().
  Also, method with the same name is in Deopitmization.
  Proposal: deoptimize_objects_thread() ?

C1 stubs: this really shows you tested all configurations, great!

mutexLocker: ok.
objectMonitor.cpp: ok
stackValue.hpp   Is this missing clearing a bug?

thread.hpp

I would remove "_ea" from the flag and method names.

Renaming deferred_locals to deferred_updates is good, as well as 
adding a datastructure for it. 
(Adding this data structure might be a breakout, too.)

good.

thread.cpp

good.

vframe.cpp

Is this a bug in existing code?
Makes sense. 

vframe_hp.hpp 
(What stands _hp for? helper? The file should be named compiledVFrame ...)

not_global_escape_in_scope() ...
Again, you mention escape analysis here. Comments above hold, too.

You introduce JvmtiDeferredUpdates. Good.

vframe_hp.cpp

Changes for JvmtiDeferredUpdates, escape state accessors,

line 422:
Would an assertion assert(!info->owner_is_scalar_replaced(), ...) hold here?

macros.hpp
  Good.  

Test coding
============

compileBroker.h|cpp

You introduce a third class of threads handled here and 
add a new flag to distinguish it. Before, the two kinds
of threads were distinguished implicitly by passing in 
a compiler for compiler threads.
The new thread kind is only used for testing in debug.

make_thread:
You could assert (comp != NULL...) to assure previous
conditions.

line 989 indentation broken

escape.cpp

You enable the optimization in case of testruns. good.

whitebox.cpp  ok.

deoptimization.cpp

deoptimize_objects_alot_loop()  Good.

globals.hpp

Nice docu of flags, but pleas mention "for testing purposes"
or the like in DeoptimizeObjectsALot.
I would place the flags next to each other. 

interfaceSupport.cpp: good.

I'll look at the test themselves in an extra mail (learning from 
Martin ��)

Best regards,
  Goetz.

> -----Original Message-----
> From: Reingruber, Richard <richard.reingruber at sap.com>
> Sent: Wednesday, April 1, 2020 8:15 AM
> To: Doerr, Martin <martin.doerr at sap.com>; 'Robbin Ehn'
> <robbin.ehn at oracle.com>; Lindenmaier, Goetz
> <goetz.lindenmaier at sap.com>; David Holmes <david.holmes at oracle.com>;
> Vladimir Kozlov (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
> serviceability-dev at openjdk.java.net; hotspot-compiler-
> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance
> in the Presence of JVMTI Agents
> 
> Hi Martin,
> 
> > thanks for addressing all my points. I've looked over webrev.5 and I'm
> satisfied with your changes.
> 
> Thanks!
> 
> > I had also promised to review the tests.
> 
> Thanks++
> I appreciate it very much, the tests are many lines of code.
> 
> > test/jdk/com/sun/jdi/EATests.java
> > This is a substantial amount of tests which is appropriate for a such a large
> change. Skipping some subtests with UseJVMCICompiler makes sense
> because it doesn't provide the necessary JVMTI functionality, yet.
> > Nice work!
> > I also like that you test with and without BiasedLocking. Your tests will still
> be fine after BiasedLocking deprecation.
> 
> Hope so :)
> 
> > Very minor nits:
> > - 2 typos in comment above EARelockingNestedInflatedTarget: "lockes are
> ommitted" (sounds funny)
> > - You sometimes write "graal" and sometimes "Graal". I guess the capital G
> is better. (Also in EATestsJVMCI.java.)
> 
> > test/jdk/com/sun/jdi/EATestsJVMCI.java
> > EATests with Graal enabled. Nice that you support Graal to some extent.
> Maybe Graal folks want to enhance them in the future. I think this is a good
> starting point.
> 
> Will change this in the next webrev.
> 
> > Conclusion: Looks good and not trivial :-)
> > Now, you have one full review. I'd be ok with covering 2nd review by partial
> reviews.
> > Compiler and JVMTI parts are not too complicated IMHO.
> > Runtime part should get at least one additional careful review.
> 
> Thanks a lot,
> Richard.
> 
> -----Original Message-----
> From: Doerr, Martin <martin.doerr at sap.com>
> Sent: Dienstag, 31. März 2020 16:01
> To: Reingruber, Richard <richard.reingruber at sap.com>; 'Robbin Ehn'
> <robbin.ehn at oracle.com>; Lindenmaier, Goetz
> <goetz.lindenmaier at sap.com>; David Holmes <david.holmes at oracle.com>;
> Vladimir Kozlov (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
> serviceability-dev at openjdk.java.net; hotspot-compiler-
> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance
> in the Presence of JVMTI Agents
> 
> Hi Richard,
> 
> thanks for addressing all my points. I've looked over webrev.5 and I'm
> satisfied with your changes.
> 
> 
> I had also promised to review the tests.
> 
> test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysis
> Enabled.java
> Thanks for updating the @summary comment. Looks good in webrev.5.
> 
> test/hotspot/jtreg/serviceability/jvmti/Heap/libIterateHeapWithEscapeAnaly
> sisEnabled.c
> JVMTI agent for object tagging and heap iteration. Good.
> 
> test/jdk/com/sun/jdi/EATests.java
> This is a substantial amount of tests which is appropriate for a such a large
> change. Skipping some subtests with UseJVMCICompiler makes sense
> because it doesn't provide the necessary JVMTI functionality, yet.
> Nice work!
> I also like that you test with and without BiasedLocking. Your tests will still be
> fine after BiasedLocking deprecation.
> 
> Very minor nits:
> - 2 typos in comment above EARelockingNestedInflatedTarget: "lockes are
> ommitted" (sounds funny)
> - You sometimes write "graal" and sometimes "Graal". I guess the capital G is
> better. (Also in EATestsJVMCI.java.)
> 
> test/jdk/com/sun/jdi/EATestsJVMCI.java
> EATests with Graal enabled. Nice that you support Graal to some extent.
> Maybe Graal folks want to enhance them in the future. I think this is a good
> starting point.
> 
> 
> Conclusion: Looks good and not trivial :-)
> Now, you have one full review. I'd be ok with covering 2nd review by partial
> reviews.
> Compiler and JVMTI parts are not too complicated IMHO.
> Runtime part should get at least one additional careful review.
> 
> Best regards,
> Martin
> 
> 
> > -----Original Message-----
> > From: Reingruber, Richard <richard.reingruber at sap.com>
> > Sent: Montag, 30. März 2020 10:32
> > To: Doerr, Martin <martin.doerr at sap.com>; 'Robbin Ehn'
> > <robbin.ehn at oracle.com>; Lindenmaier, Goetz
> > <goetz.lindenmaier at sap.com>; David Holmes
> <david.holmes at oracle.com>;
> > Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > <vladimir.kozlov at oracle.com>; serviceability-dev at openjdk.java.net;
> > hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-
> > dev at openjdk.java.net
> > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance
> > in the Presence of JVMTI Agents
> >
> > Hi,
> >
> > this is webrev.5 based on Robbin's feedback and Martin's review - thanks! :)
> >
> > The change affects jvmti, hotspot and c2. Partial reviews are very welcome
> > too.
> >
> > Full:  http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/
> > Delta:
> > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5.inc/
> >
> > Robbin, Martin, please let me know, if anything shouldn't be quite as you
> > wanted it. Also find my
> > comments on your feedback below.
> >
> > Robbin, can I count you as Reviewer for the runtime part?
> >
> > Thanks, Richard.
> >
> > --
> >
> > > DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
> > > You can move both declaration and definition to that file, no need to
> > clobber
> > > thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
> >
> > Done.
> >
> > > Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in
> it's
> > own
> > > hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
> >
> > I moved JvmtiDeferredUpdates to vframe_hp.hpp where preexisting
> > jvmtiDeferredLocalVariableSet is
> > declared.
> >
> > > src/hotspot/share/code/compiledMethod.cpp
> > > Nice cleanup!
> >
> > Thanks :)
> >
> > > src/hotspot/share/code/debugInfoRec.cpp
> > > src/hotspot/share/code/debugInfoRec.hpp
> > > Additional parmeters. (Remark: I think "non_global_escape_in_scope"
> > would read better than "not_global_escape_in_scope", but your version is
> > consistent with existing code, so no change request from my side.) Ok.
> >
> > I've been thinking about this too and finally stayed with
> > not_global_escape_in_scope. It's supposed
> > to mean an object whose escape state is not GlobalEscape is in scope.
> >
> > > src/hotspot/share/compiler/compileBroker.cpp
> > > src/hotspot/share/compiler/compileBroker.hpp
> > > Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into
> > a follow up change together with the test in order to make this webrev
> > smaller, but since it is included, I'm reviewing everything at once. Not a big
> > deal.) Ok.
> >
> > Yes the change would be a little smaller. And if it helps I'll split it off. In
> > general I prefer
> > patches that bring along a suitable amount of tests.
> >
> > > src/hotspot/share/opto/c2compiler.cpp
> > > Make do_escape_analysis independent of JVMCI capabilities. Nice!
> >
> > It is the main goal of the enhancement. It is done for C2, but could be done
> > for JVMCI compilers
> > with just a small effort as well.
> >
> > > src/hotspot/share/opto/escape.cpp
> > > Annotation for MachSafePointNodes. Your added functionality looks
> > correct.
> > > But I'd prefer to move the bulky code out of the large function.
> > > I suggest to factor out something like has_not_global_escape and
> > has_arg_escape. So the code could look like this:
> > >       SafePointNode* sfn = sfn_worklist.at(next);
> > >       sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn));
> > >       if (sfn->is_CallJava()) {
> > >         CallJavaNode* call = sfn->as_CallJava();
> > >         call->set_arg_escape(has_arg_escape(call));
> > >       }
> > > This would also allow us to get rid of the found_..._escape_in_args
> > variables making the loops better readable.
> >
> > Done.
> >
> > > It's kind of ugly to use strcmp to recognize uncommon trap, but that
> seems
> > to be the way to do it (there are more such places). So it's ok.
> >
> > Yeah. I copied the snippet.
> >
> > > src/hotspot/share/prims/jvmtiImpl.cpp
> > > src/hotspot/share/prims/jvmtiImpl.hpp
> > > The sequence is pretty complex:
> > > VM_GetOrSetLocal element initialization executes EscapeBarrier code
> > which suspends the target thread (extra VM Operation).
> >
> > Note that the target threads have to be suspended already for
> > VM_GetOrSetLocal*. So it's mainly the
> > synchronization effect of EscapeBarrier::sync_and_suspend_one() that is
> > required here. Also no extra
> > _handshake_ is executed, since sync_and_suspend_one() will find the
> > target threads already
> > suspended.
> >
> > > VM_GetOrSetLocal::doit_prologue performs object deoptimization (by
> VM
> > Thread to prepare VM Operation with frame deoptimization).
> > > VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor
> > which resumes the target thread.
> > > But I don't have any improvement proposal. Performance is probably not
> a
> > concern, here. So it's ok.
> >
> > > VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it
> > has non-globally escaping objects and other frames if they have arg
> escaping
> > ones. Good.
> >
> > It's not specifically the top frame, but the frame that is accessed.
> >
> > > src/hotspot/share/runtime/deoptimization.cpp
> > > Object deoptimization. I have more comments and proposals, here.
> > > First of all, handling recursive and waiting locks in relock_objects is tricky,
> > but looks correct.
> > > Comments are sufficient to understand why things are done as they are
> > implemented.
> >
> > > BiasedLocking related parts are complex, but we may get rid of them in
> the
> > future (with BiasedLocking removal).
> > > Anyway, looks correct, too.
> >
> > > Typo in comment: "regularily" => "regularly"
> >
> > > Deoptimization::fetch_unroll_info_helper is the only place where
> > _jvmti_deferred_updates get deallocated (except JavaThread destructor).
> > But I think we always go through it, so I can't see a memory leak or such
> kind
> > of issues.
> >
> > That's correct. The compiled frame for which deferred updates are
> allocated
> > is always deoptimized
> > before (see EscapeBarrier::deoptimize_objects()). This is also asserted in
> > compiledVFrame::update_deferred_value(). I've added the same assertion
> > to
> > Deoptimization::relock_objects(). So we can be sure that
> > _jvmti_deferred_updates are deallocated
> > again in fetch_unroll_info_helper().
> >
> > > EscapeBarrier::deoptimize_objects: ResourceMark should use
> > calling_thread().
> >
> > Sure, well spotted!
> >
> > > You can use MutexLocker and MonitorLocker with Thread* to save the
> > Thread::current() call.
> >
> > Right, good hint. This was recently introduced with 8235678. I even had to
> > resolve conflicts. Should
> > have done this then.
> >
> > > I'd make set_objs_are_deoptimized static and remove it from the
> > EscapeBarrier interface because I think it shouldn't be used outside of
> > EscapeBarrier::deoptimize_objects.
> >
> > Done.
> >
> > > Typo in comment: "we must only deoptimize" => "we only have to
> > deoptimize"
> >
> > Replaced with "[...] we deoptimize iff local objects are passed as args"
> >
> > > "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and
> > barrier_active() is redundant. Implementation can get moved to hpp file.
> >
> > Ok. Done.
> >
> > > I'll get back to suspend flags, later.
> >
> > > There are weird cases regarding _self_deoptimization_in_progress.
> > > Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C.
> > C can set _self_deoptimization_in_progress while A performs the
> handshake
> > for suspending C. I think this doesn't lead to errors, but it's probably not
> > desired.
> > > I think it would be better to use only one "wait" call in
> > sync_and_suspend_one and sync_and_suspend_all.
> >
> > You're right. We've discussed that face-to-face, but couldn't find a real
> issue.
> > But now, thinking again, a reckon I found one:
> >
> > 2808   // Sync with other threads that might be doing deoptimizations
> > 2809   {
> > 2810     // Need to switch to _thread_blocked for the wait() call
> > 2811     ThreadBlockInVM tbivm(_calling_thread);
> > 2812     MonitorLocker ml(EscapeBarrier_lock,
> > Mutex::_no_safepoint_check_flag);
> > 2813     while (_self_deoptimization_in_progress) {
> > 2814       ml.wait();
> > 2815     }
> > 2816
> > 2817     if (self_deopt()) {
> > 2818       _self_deoptimization_in_progress = true;
> > 2819     }
> > 2820
> > 2821     while (_deoptee_thread->is_ea_obj_deopt_suspend()) {
> > 2822       ml.wait();
> > 2823     }
> > 2824
> > 2825     if (self_deopt()) {
> > 2826       return;
> > 2827     }
> > 2828
> > 2829     // set suspend flag for target thread
> > 2830     _deoptee_thread->set_ea_obj_deopt_flag();
> > 2831   }
> >
> > - A waits in 2822
> > - C is suspended
> > - B notifies all in resume_one()
> > - A and C wake up
> > - C wins over A and sets _self_deoptimization_in_progress = true in 2818
> > - C does the self deoptimization
> > - A executes 2830 _deoptee_thread->set_ea_obj_deopt_flag()
> >
> > C will self suspend at some undefined point. The resulting state is illegal.
> >
> > > I first thought it'd be better to move ThreadBlockInVM before wait() to
> > reduce thread state transitions, but that seems to be problematic because
> > ThreadBlockInVM destructor contains a safepoint check which we
> shouldn't
> > do while holding EscapeBarrier_lock. So no change request.
> >
> > Yes, would be nice to have the state change only if needed, but for the
> > reason you mentioned it is
> > not quite as easy as it seems to be. I experimented as well with a second
> > lock, but did not succeed.
> >
> > > Change in thred_added:
> > > I think the sequence would be more comprehensive if we waited for
> > deopt_all_threads in Thread::start and all other places where a new thread
> > can run into Java code (e.g. JVMTI attach).
> > > Your version makes new threads come up with suspend flag set. That
> looks
> > correct, too. Advantage is that you only have to change one place
> > (thread_added). It'll be interesting to see how it will look like when we use
> > async handshakes instead of suspend flags.
> > > For now, I'm ok with your version.
> >
> > I had a version that did what you are suggesting. The current version also
> has
> > the advantage, that
> > there are fewer places where a thread has to wait for ongoing object
> > deoptimization. This means
> > viewer places where you have to worry about correct thread state
> > transitions, possible deadlocks,
> > and if all oops are properly Handle'ed.
> >
> > > I'd only move MutexLocker ml(EscapeBarrier_lock...) after if (!jt-
> > >is_hidden_from_external_view()).
> >
> > Done.
> >
> > > Having 4 different deoptimize_objects functions makes it a little hard to
> > keep an overview of which one is used for what.
> > > Maybe adding suffixes would help a little bit, but I can also live with what
> > you have.
> > > Implementation looks correct to me.
> >
> > 2 are internal. I added the suffix _internal to them. This leaves 2 to choose
> > from.
> >
> > > src/hotspot/share/runtime/deoptimization.hpp
> > > Escape barriers and object deoptimization functions.
> > > Typo in comment: "helt" => "held"
> >
> > Done in place already.
> >
> > > src/hotspot/share/runtime/interfaceSupport.cpp
> > > InterfaceSupport::deoptimizeAllObjects() is only used for
> > DeoptimizeObjectsALot = 1.
> > > I think DeoptimizeObjectsALot = 2 is more important, but I think it's not
> bad
> > to have DeoptimizeObjectsALot = 1 in addition. Ok.
> >
> > I never used DeoptimizeObjectsALot = 1 that much. It could be more
> > deterministic in single threaded
> > scenarios. I wouldn't object to get rid of it though.
> >
> > > src/hotspot/share/runtime/stackValue.hpp
> > > Better reinitilization in StackValue. Good.
> >
> > StackValue::obj_is_scalar_replaced() should not return true after calling
> > set_obj().
> >
> > > src/hotspot/share/runtime/thread.cpp
> > > src/hotspot/share/runtime/thread.hpp
> > > src/hotspot/share/runtime/thread.inline.hpp
> > > wait_for_object_deoptimization, suspend flag, deferred updates and test
> > feature to deoptimize objects.
> >
> > > In the long term, we want to get rid of suspend flags, so it's not so nice to
> > introduce a new one. But I agree with Götz that it should be acceptable as
> > temporary solution until async handshakes are available (which takes more
> > time). So I'm ok with your change.
> >
> > I'm keen to build the feature on async handshakes when the arive.
> >
> > > You can use MutexLocker with Thread*.
> >
> > Done.
> >
> > > JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class
> > out of thread.hpp.
> >
> > Done.
> >
> > > src/hotspot/share/runtime/vframe.cpp
> > > Added support for entry frame to new_vframe. Ok.
> >
> >
> > > src/hotspot/share/runtime/vframe_hp.cpp
> > > src/hotspot/share/runtime/vframe_hp.hpp
> >
> > > I think code()->as_nmethod() in not_global_escape_in_scope() and
> > arg_escape() should better be under #ifdef ASSERT or inside the assert
> > statement (no need for code cache walking in product build).
> >
> > Done.
> >
> > > jvmtiDeferredLocalVariableSet::update_monitors:
> > > Please add a comment explaining that owner referenced by original info
> > may be scalar replaced, but it is deoptimized in the vframe.
> >
> > Done.
> >
> > -----Original Message-----
> > From: Doerr, Martin <martin.doerr at sap.com>
> > Sent: Donnerstag, 12. März 2020 17:28
> > To: Reingruber, Richard <richard.reingruber at sap.com>; 'Robbin Ehn'
> > <robbin.ehn at oracle.com>; Lindenmaier, Goetz
> > <goetz.lindenmaier at sap.com>; David Holmes
> <david.holmes at oracle.com>;
> > Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > <vladimir.kozlov at oracle.com>; serviceability-dev at openjdk.java.net;
> > hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-
> > dev at openjdk.java.net
> > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance
> > in the Presence of JVMTI Agents
> >
> > Hi Richard,
> >
> >
> > I managed to find time for a (almost) complete review of webrev.4. (I'll
> > review the tests separately.)
> >
> > First of all, the change seems to be in pretty good quality for its significant
> > complexity. I couldn't find any real bugs. But I'd like to propose minor
> > improvements.
> > I'm convinced that it's mature because we did substantial testing.
> >
> > I like the new functionality for object deoptimization. It can possibly be
> > reused for future escape analysis based optimizations. So I appreciate
> having
> > it available in the code base.
> > In addition to that, your change makes the JVMTI implementation better
> > integrated into the VM.
> >
> >
> > Now to the details:
> >
> >
> > src/hotspot/share/c1/c1_IR.hpp
> > describe_scope parameters. Ok.
> >
> >
> > src/hotspot/share/ci/ciEnv.cpp
> > src/hotspot/share/ci/ciEnv.hpp
> > Fix for JvmtiExport::can_walk_any_space() capability. Ok.
> >
> >
> > src/hotspot/share/code/compiledMethod.cpp
> > Nice cleanup!
> >
> >
> > src/hotspot/share/code/debugInfoRec.cpp
> > src/hotspot/share/code/debugInfoRec.hpp
> > Additional parmeters. (Remark: I think "non_global_escape_in_scope"
> > would read better than "not_global_escape_in_scope", but your version is
> > consistent with existing code, so no change request from my side.) Ok.
> >
> >
> > src/hotspot/share/code/nmethod.cpp
> > Nice cleanup!
> >
> >
> > src/hotspot/share/code/pcDesc.hpp
> > Additional parameters. Ok.
> >
> >
> > src/hotspot/share/code/scopeDesc.cpp
> > src/hotspot/share/code/scopeDesc.hpp
> > Improved implementation + additional parameters. Ok.
> >
> >
> > src/hotspot/share/compiler/compileBroker.cpp
> > src/hotspot/share/compiler/compileBroker.hpp
> > Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a
> > follow up change together with the test in order to make this webrev
> > smaller, but since it is included, I'm reviewing everything at once. Not a big
> > deal.) Ok.
> >
> >
> > src/hotspot/share/jvmci/jvmciCodeInstaller.cpp
> > Additional parameters. Ok.
> >
> >
> > src/hotspot/share/opto/c2compiler.cpp
> > Make do_escape_analysis independent of JVMCI capabilities. Nice!
> >
> >
> > src/hotspot/share/opto/callnode.hpp
> > Additional fields for MachSafePointNodes. Ok.
> >
> >
> > src/hotspot/share/opto/escape.cpp
> > Annotation for MachSafePointNodes. Your added functionality looks
> correct.
> > But I'd prefer to move the bulky code out of the large function.
> > I suggest to factor out something like has_not_global_escape and
> > has_arg_escape. So the code could look like this:
> >       SafePointNode* sfn = sfn_worklist.at(next);
> >       sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn));
> >       if (sfn->is_CallJava()) {
> >         CallJavaNode* call = sfn->as_CallJava();
> >         call->set_arg_escape(has_arg_escape(call));
> >       }
> > This would also allow us to get rid of the found_..._escape_in_args
> variables
> > making the loops better readable.
> >
> > It's kind of ugly to use strcmp to recognize uncommon trap, but that seems
> > to be the way to do it (there are more such places). So it's ok.
> >
> >
> > src/hotspot/share/opto/machnode.hpp
> > Additional fields for MachSafePointNodes. Ok.
> >
> >
> > src/hotspot/share/opto/macro.cpp
> > Allow elimination of non-escaping allocations. Ok.
> >
> >
> > src/hotspot/share/opto/matcher.cpp
> > src/hotspot/share/opto/output.cpp
> > Copy attribute / pass parameters. Ok.
> >
> >
> > src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp
> > Nice cleanup!
> >
> >
> > src/hotspot/share/prims/jvmtiEnv.cpp
> > src/hotspot/share/prims/jvmtiEnvBase.cpp
> > Escape barriers + deoptimize objects for target thread. Good.
> >
> >
> > src/hotspot/share/prims/jvmtiImpl.cpp
> > src/hotspot/share/prims/jvmtiImpl.hpp
> > The sequence is pretty complex:
> > VM_GetOrSetLocal element initialization executes EscapeBarrier code
> which
> > suspends the target thread (extra VM Operation).
> > VM_GetOrSetLocal::doit_prologue performs object deoptimization (by VM
> > Thread to prepare VM Operation with frame deoptimization).
> > VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor
> which
> > resumes the target thread.
> > But I don't have any improvement proposal. Performance is probably not a
> > concern, here. So it's ok.
> >
> > VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has
> > non-globally escaping objects and other frames if they have arg escaping
> > ones. Good.
> >
> >
> > src/hotspot/share/prims/jvmtiTagMap.cpp
> > Escape barriers + deoptimize objects for all threads. Ok.
> >
> >
> > src/hotspot/share/prims/whitebox.cpp
> > Added WB_IsFrameDeoptimized to API. Ok.
> >
> >
> > src/hotspot/share/runtime/deoptimization.cpp
> > Object deoptimization. I have more comments and proposals, here.
> > First of all, handling recursive and waiting locks in relock_objects is tricky,
> but
> > looks correct.
> > Comments are sufficient to understand why things are done as they are
> > implemented.
> >
> > BiasedLocking related parts are complex, but we may get rid of them in the
> > future (with BiasedLocking removal).
> > Anyway, looks correct, too.
> >
> > Typo in comment: "regularily" => "regularly"
> >
> > Deoptimization::fetch_unroll_info_helper is the only place where
> > _jvmti_deferred_updates get deallocated (except JavaThread destructor).
> > But I think we always go through it, so I can't see a memory leak or such
> kind
> > of issues.
> >
> > EscapeBarrier::deoptimize_objects: ResourceMark should use
> > calling_thread().
> >
> > You can use MutexLocker and MonitorLocker with Thread* to save the
> > Thread::current() call.
> >
> > I'd make set_objs_are_deoptimized static and remove it from the
> > EscapeBarrier interface because I think it shouldn't be used outside of
> > EscapeBarrier::deoptimize_objects.
> >
> > Typo in comment: "we must only deoptimize" => "we only have to
> > deoptimize"
> >
> > "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and
> > barrier_active() is redundant. Implementation can get moved to hpp file.
> >
> > I'll get back to suspend flags, later.
> >
> > There are weird cases regarding _self_deoptimization_in_progress.
> > Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C.
> C
> > can set _self_deoptimization_in_progress while A performs the handshake
> > for suspending C. I think this doesn't lead to errors, but it's probably not
> > desired.
> > I think it would be better to use only one "wait" call in
> > sync_and_suspend_one and sync_and_suspend_all.
> >
> > I first thought it'd be better to move ThreadBlockInVM before wait() to
> > reduce thread state transitions, but that seems to be problematic because
> > ThreadBlockInVM destructor contains a safepoint check which we
> shouldn't
> > do while holding EscapeBarrier_lock. So no change request.
> >
> > Change in thred_added:
> > I think the sequence would be more comprehensive if we waited for
> > deopt_all_threads in Thread::start and all other places where a new thread
> > can run into Java code (e.g. JVMTI attach).
> > Your version makes new threads come up with suspend flag set. That looks
> > correct, too. Advantage is that you only have to change one place
> > (thread_added). It'll be interesting to see how it will look like when we use
> > async handshakes instead of suspend flags.
> > For now, I'm ok with your version.
> >
> > I'd only move MutexLocker ml(EscapeBarrier_lock...) after if (!jt-
> > >is_hidden_from_external_view()).
> >
> > Having 4 different deoptimize_objects functions makes it a little hard to
> keep
> > an overview of which one is used for what.
> > Maybe adding suffixes would help a little bit, but I can also live with what
> you
> > have.
> > Implementation looks correct to me.
> >
> >
> > src/hotspot/share/runtime/deoptimization.hpp
> > Escape barriers and object deoptimization functions.
> > Typo in comment: "helt" => "held"
> >
> >
> > src/hotspot/share/runtime/globals.hpp
> > Addition of develop flag DeoptimizeObjectsALotInterval. Ok.
> >
> >
> > src/hotspot/share/runtime/interfaceSupport.cpp
> > InterfaceSupport::deoptimizeAllObjects() is only used for
> > DeoptimizeObjectsALot = 1.
> > I think DeoptimizeObjectsALot = 2 is more important, but I think it's not bad
> > to have DeoptimizeObjectsALot = 1 in addition. Ok.
> >
> >
> > src/hotspot/share/runtime/interfaceSupport.inline.hpp
> > Addition of deoptimizeAllObjects. Ok.
> >
> >
> > src/hotspot/share/runtime/mutexLocker.cpp
> > src/hotspot/share/runtime/mutexLocker.hpp
> > Addition of EscapeBarrier_lock. Ok.
> >
> >
> > src/hotspot/share/runtime/objectMonitor.cpp
> > Make recursion count relock aware. Ok.
> >
> >
> > src/hotspot/share/runtime/stackValue.hpp
> > Better reinitilization in StackValue. Good.
> >
> >
> > src/hotspot/share/runtime/thread.cpp
> > src/hotspot/share/runtime/thread.hpp
> > src/hotspot/share/runtime/thread.inline.hpp
> > wait_for_object_deoptimization, suspend flag, deferred updates and test
> > feature to deoptimize objects.
> >
> > In the long term, we want to get rid of suspend flags, so it's not so nice to
> > introduce a new one. But I agree with Götz that it should be acceptable as
> > temporary solution until async handshakes are available (which takes more
> > time). So I'm ok with your change.
> >
> > You can use MutexLocker with Thread*.
> >
> > JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class
> out
> > of thread.hpp.
> >
> >
> > src/hotspot/share/runtime/vframe.cpp
> > Added support for entry frame to new_vframe. Ok.
> >
> >
> > src/hotspot/share/runtime/vframe_hp.cpp
> > src/hotspot/share/runtime/vframe_hp.hpp
> >
> > I think code()->as_nmethod() in not_global_escape_in_scope() and
> > arg_escape() should better be under #ifdef ASSERT or inside the assert
> > statement (no need for code cache walking in product build).
> >
> > jvmtiDeferredLocalVariableSet::update_monitors:
> > Please add a comment explaining that owner referenced by original info
> may
> > be scalar replaced, but it is deoptimized in the vframe.
> >
> >
> > src/hotspot/share/utilities/macros.hpp
> > Addition of NOT_COMPILER2_OR_JVMCI_RETURN macros. Ok.
> >
> >
> >
> test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysi
> > sEnabled.java
> >
> test/hotspot/jtreg/serviceability/jvmti/Heap/libIterateHeapWithEscapeAnal
> > ysisEnabled.c
> > New test. Will review separately.
> >
> >
> > test/jdk/TEST.ROOT
> > Addition of vm.jvmci as required property. Ok.
> >
> >
> > test/jdk/com/sun/jdi/EATests.java
> > test/jdk/com/sun/jdi/EATestsJVMCI.java
> > New test. Will review separately.
> >
> >
> > test/lib/sun/hotspot/WhiteBox.java
> > Added isFrameDeoptimized to API. Ok.
> >
> >
> > That was it. Best regards,
> > Martin
> >
> >
> > > -----Original Message-----
> > > From: hotspot-compiler-dev <hotspot-compiler-dev-
> > > bounces at openjdk.java.net> On Behalf Of Reingruber, Richard
> > > Sent: Dienstag, 3. März 2020 21:23
> > > To: 'Robbin Ehn' <robbin.ehn at oracle.com>; Lindenmaier, Goetz
> > > <goetz.lindenmaier at sap.com>; David Holmes
> > <david.holmes at oracle.com>;
> > > Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > > <vladimir.kozlov at oracle.com>; serviceability-dev at openjdk.java.net;
> > > hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-
> > > dev at openjdk.java.net
> > > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better
> > > Performance in the Presence of JVMTI Agents
> > >
> > > Hi Robbin,
> > >
> > > > > I understand that Robbin proposed to replace the usage of
> > > > > _suspend_flag with handshakes. Apparently, async handshakes
> > > > > are needed to do so. We have been waiting a while for removal
> > > > > of the _suspend_flag / introduction of async handshakes [2].
> > > > > What is the status here?
> > >
> > > > I have an old prototype which I would like to continue to work on.
> > > > So do not assume asynch handshakes will make 15.
> > > > Even if it would, I think there are a lot more investigate work to remove
> > > > _suspend_flag.
> > >
> > > Let us know, if we can be of any help to you and be it only testing.
> > >
> > > > >> Full:
> > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/
> > >
> > > > DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
> > > > You can move both declaration and definition to that file, no need to
> > > clobber
> > > > thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
> > >
> > > Will do.
> > >
> > > > Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in
> > it's
> > > own
> > > > hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
> > >
> > > You are right. It shouldn't be declared in thread.hpp. I will look into that.
> > >
> > > > Note that we also think we may have a bug in deopt:
> > > > https://bugs.openjdk.java.net/browse/JDK-8238237
> > >
> > > > I think it would be best, if possible, to push after that is resolved.
> > >
> > > Sure.
> > >
> > > > Not even nearly a full review :)
> > >
> > > I know :)
> > >
> > > Anyways, thanks a lot,
> > > Richard.
> > >
> > >
> > > -----Original Message-----
> > > From: Robbin Ehn <robbin.ehn at oracle.com>
> > > Sent: Monday, March 2, 2020 11:17 AM
> > > To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Reingruber,
> > Richard
> > > <richard.reingruber at sap.com>; David Holmes
> > <david.holmes at oracle.com>;
> > > Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > > <vladimir.kozlov at oracle.com>; serviceability-dev at openjdk.java.net;
> > > hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-
> > > dev at openjdk.java.net
> > > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> Performance
> > > in the Presence of JVMTI Agents
> > >
> > > Hi,
> > >
> > > On 2/24/20 5:39 PM, Lindenmaier, Goetz wrote:
> > > > Hi,
> > > >
> > > > I had a look at the progress of this change. Nothing
> > > > happened since Richard posted his update using more
> > > > handshakes [1].
> > > > But we (SAP) would appreciate a lot if this change could
> > > > be successfully reviewed and pushed.
> > > >
> > > > I think there is basic understanding that this
> > > > change is helpful. It fixes a number of issues with JVMTI,
> > > > and will deliver the same performance benefits as EA
> > > > does in current production mode for debugging scenarios.
> > > >
> > > > This is important for us as we run our VMs prepared
> > > > for debugging in production mode.
> > > >
> > > > I understand that Robbin proposed to replace the usage of
> > > > _suspend_flag with handshakes. Apparently, async handshakes
> > > > are needed to do so. We have been waiting a while for removal
> > > > of the _suspend_flag / introduction of async handshakes [2].
> > > > What is the status here?
> > >
> > > I have an old prototype which I would like to continue to work on.
> > > So do not assume asynch handshakes will make 15.
> > > Even if it would, I think there are a lot more investigate work to remove
> > > _suspend_flag.
> > >
> > > >
> > > > I think we should no longer wait, but proceed with
> > > > this change. We will look into removing the usage of
> > > > suspend_flag introduced here once it is possible to implement
> > > > it with handshakes.
> > >
> > > Yes, sure.
> > >
> > > >> Full:
> > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/
> > >
> > > DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
> > > You can move both declaration and definition to that file, no need to
> > clobber
> > > thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
> > >
> > > Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in
> it's
> > > own
> > > hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
> > >
> > > Note that we also think we may have a bug in deopt:
> > > https://bugs.openjdk.java.net/browse/JDK-8238237
> > >
> > > I think it would be best, if possible, to push after that is resolved.
> > >
> > > Not even nearly a full review :)
> > >
> > > Thanks, Robbin
> > >
> > >
> > > >> Incremental:
> > > >>
> > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4.inc/
> > > >>
> > > >> I was not able to eliminate the additional suspend flag now. I'll take
> care
> > > of this
> > > >> as soon as the
> > > >> existing suspend-resume-mechanism is reworked.
> > > >>
> > > >> Testing:
> > > >>
> > > >> Nightly tests @SAP:
> > > >>
> > > >>    JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015,
> > > Renaissance
> > > >> Suite, SAP specific tests
> > > >>    with fastdebug and release builds on all platforms
> > > >>
> > > >>    Stress testing with DeoptimizeObjectsALot running SPECjvm2008 40x
> > > parallel
> > > >> for 24h
> > > >>
> > > >> Thanks, Richard.
> > > >>
> > > >>
> > > >> More details on the changes:
> > > >>
> > > >> * Hide DeoptimizeObjectsALotThread from external view.
> > > >>
> > > >> * Changed EscapeBarrier_lock to be a _safepoint_check_never lock.
> > > >>    It used to be _safepoint_check_sometimes, which will be eliminated
> > > sooner or
> > > >> later.
> > > >>    I added explicit thread state changes with ThreadBlockInVM to code
> > > paths
> > > >> where we can wait()
> > > >>    on EscapeBarrier_lock to become safepoint safe.
> > > >>
> > > >> * Use handshake EscapeBarrierSuspendHandshake to suspend target
> > > threads
> > > >> instead of vm operation
> > > >>    VM_ThreadSuspendAllForObjDeopt.
> > > >>
> > > >> * Removed uses of Threads_lock. When adding a new thread we
> > suspend
> > > it iff
> > > >> EA optimizations are
> > > >>    being reverted. In the previous version we were waiting on
> > > Threads_lock
> > > >> while EA optimizations
> > > >>    were reverted. See EscapeBarrier::thread_added().
> > > >>
> > > >> * Made tests require Xmixed compilation mode.
> > > >>
> > > >> * Made tests agnostic regarding tiered compilation.
> > > >>    I.e. tc isn't disabled anymore, and the tests can be run with tc
> enabled
> > or
> > > >> disabled.
> > > >>
> > > >> * Exercising EATests.java as well with stress test options
> > > >> DeoptimizeObjectsALot*
> > > >>    Due to the non-deterministic deoptimizations some tests need to be
> > > skipped.
> > > >>    We do this to prevent bit-rot of the stress test code.
> > > >>
> > > >> * Executing EATests.java as well with graal if available. Driver for this is
> > > >>    EATestsJVMCI.java. Graal cannot pass all tests, because it does not
> > > provide all
> > > >> the new debug info
> > > >>    (namely not_global_escape_in_scope and arg_escape in
> > > scopeDesc.hpp).
> > > >>    And graal does not yet support the JVMTI operations force early
> > return
> > > and
> > > >> pop frame.
> > > >>
> > > >> * Removed tracing from new jdi tests in EATests.java. Too much trace
> > > output
> > > >> before the debugging
> > > >>    connection is established can cause deadlock because output buffers
> > fill
> > > up.
> > > >>    (See https://bugs.openjdk.java.net/browse/JDK-8173304)
> > > >>
> > > >> * Many copyright year changes and smaller clean-up changes of
> testing
> > > code
> > > >> (trailing white-space and
> > > >>    the like).
> > > >>
> > > >>
> > > >> -----Original Message-----
> > > >> From: David Holmes <david.holmes at oracle.com>
> > > >> Sent: Donnerstag, 19. Dezember 2019 03:12
> > > >> To: Reingruber, Richard <richard.reingruber at sap.com>; serviceability-
> > > >> dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net;
> > > hotspot-
> > > >> runtime-dev at openjdk.java.net; Vladimir Kozlov
> > > (vladimir.kozlov at oracle.com)
> > > >> <vladimir.kozlov at oracle.com>
> > > >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> > > Performance in
> > > >> the Presence of JVMTI Agents
> > > >>
> > > >> Hi Richard,
> > > >>
> > > >> I think my issue is with the way EliminateNestedLocks works so I'm
> going
> > > >> to look into that more deeply.
> > > >>
> > > >> Thanks for the explanations.
> > > >>
> > > >> David
> > > >>
> > > >> On 18/12/2019 12:47 am, Reingruber, Richard wrote:
> > > >>> Hi David,
> > > >>>
> > > >>>     > >    > Some further queries/concerns:
> > > >>>     > >    >
> > > >>>     > >    > src/hotspot/share/runtime/objectMonitor.cpp
> > > >>>     > >    >
> > > >>>     > >    > Can you please explain the changes to ObjectMonitor::wait:
> > > >>>     > >    >
> > > >>>     > >    > !   _recursions = save      // restore the old recursion count
> > > >>>     > >    > !                 + jt->get_and_reset_relock_count_after_wait(); //
> > > >>>     > >    > increased by the deferred relock count
> > > >>>     > >    >
> > > >>>     > >    > what is the "deferred relock count"? I gather it relates to
> > > >>>     > >    >
> > > >>>     > >    > "The code was extended to be able to deoptimize objects of
> a
> > > >>>     > > frame that
> > > >>>     > >    > is not the top frame and to let another thread than the
> > owning
> > > >>>     > > thread do
> > > >>>     > >    > it."
> > > >>>     > >
> > > >>>     > > Yes, these relate. Currently EA based optimizations are reverted,
> > > when a
> > > >> compiled frame is
> > > >>>     > > replaced with corresponding interpreter frames. Part of this is
> > > relocking
> > > >> objects with eliminated
> > > >>>     > > locking. New with the enhancement is that we do this also just
> > > before
> > > >> object references are
> > > >>>     > > acquired through JVMTI. In this case we deoptimize also the
> > > owning
> > > >> compiled frame C and we
> > > >>>     > > register deoptimized objects as deferred updates. When control
> > > returns
> > > >> to C it gets deoptimized,
> > > >>>     > > we notice that objects are already deoptimized (reallocated and
> > > >> relocked), so we don't do it again
> > > >>>     > > (relocking twice would be incorrect of course). Deferred
> updates
> > > are
> > > >> copied into the new
> > > >>>     > > interpreter frames.
> > > >>>     > >
> > > >>>     > > Problem: relocking is not possible if the target thread T is
> waiting
> > > on the
> > > >> monitor that needs to
> > > >>>     > > be relocked. This happens only with non-local objects with
> > > >> EliminateNestedLocks. Instead relocking
> > > >>>     > > is deferred until T owns the monitor again. This is what the
> piece
> > of
> > > >> code above does.
> > > >>>     >
> > > >>>     >  Sorry I need some more detail here. How can you wait() on an
> > > object
> > > >>>     >  monitor if the object allocation and/or locking was optimised
> > away?
> > > And
> > > >>>     >  what is a "non-local object" in this context? Isn't EA restricted to
> > > >>>     >  thread-confined objects?
> > > >>>
> > > >>> "Non-local object" is an object that escapes its thread. The issue I'm
> > > >> addressing with the changes
> > > >>> in ObjectMonitor::wait are almost unrelated to EA. They are caused
> by
> > > >> EliminateNestedLocks, where C2
> > > >>> eliminates recursive locking of an already owned lock. The lock
> owning
> > > object
> > > >> exists on the heap, it
> > > >>> is locked and you can call wait() on it.
> > > >>>
> > > >>> EliminateLocks is the C2 option that controls lock elimination based
> on
> > > EA.
> > > >> Both optimizations have
> > > >>> in common that objects with eliminated locking need to be relocked
> > > when
> > > >> deoptimizing a frame,
> > > >>> i.e. when replacing a compiled frame with equivalent interpreter
> > > >>> frames. Deoptimization::relock_objects does that job for /all/
> > eliminated
> > > >> locks in scope. /All/ can
> > > >>> be a mix of eliminated nested locks and locks of not-escaping objects.
> > > >>>
> > > >>> New with the enhancement: I call relock_objects earlier, just before
> > > objects
> > > >> pontentially
> > > >>> escape. But then later when the owning compiled frame gets
> > > deoptimized, I
> > > >> must not do it again:
> > > >>>
> > > >>> See call to EscapeBarrier::objs_are_deoptimized in
> > deoptimization.cpp:
> > > >>>
> > > >>>    373   if ((jvmci_enabled || ((DoEscapeAnalysis ||
> > > EliminateNestedLocks) &&
> > > >> EliminateLocks))
> > > >>>    374       && !EscapeBarrier::objs_are_deoptimized(thread,
> > > deoptee.id())) {
> > > >>>    375     bool unused;
> > > >>>    376     eliminate_locks(thread, chunk, realloc_failures, deoptee,
> > > exec_mode,
> > > >> unused);
> > > >>>    377   }
> > > >>>
> > > >>> Now when calling relock_objects early it is quiet possible that I have
> to
> > > relock
> > > >> an object the
> > > >>> target thread currently waits for. Obviously I cannot relock in this
> case,
> > > >> instead I chose to
> > > >>> introduce relock_count_after_wait to JavaThread.
> > > >>>
> > > >>>     >  Is it just that some of the locking gets optimized away e.g.
> > > >>>     >
> > > >>>     >  synchronised(obj) {
> > > >>>     >     synchronised(obj) {
> > > >>>     >       synchronised(obj) {
> > > >>>     >         obj.wait();
> > > >>>     >       }
> > > >>>     >     }
> > > >>>     >  }
> > > >>>     >
> > > >>>     >  If this is reduced to a form as-if it were a single lock of the
> monitor
> > > >>>     >  (due to EA) and the wait() triggers a JVM TI event which leads to
> > the
> > > >>>     >  escape of "obj" then we need to reconstruct the true lock state,
> > and
> > > so
> > > >>>     >  when the wait() internally unblocks and reacquires the monitor it
> > > has to
> > > >>>     >  set the true recursion count to 3, not the 1 that it appeared to be
> > > when
> > > >>>     >  wait() was initially called. Is that the scenario?
> > > >>>
> > > >>> Kind of... except that the locking is not eliminated due to EA and
> there
> > is
> > > no
> > > >> JVM TI event
> > > >>> triggered by wait.
> > > >>>
> > > >>> Add
> > > >>>
> > > >>> LocalObject l1 = new LocalObject();
> > > >>>
> > > >>> in front of the synchrnized blocks and assume a JVM TI agent
> acquires
> > l1.
> > > This
> > > >> triggers the code in
> > > >>> question.
> > > >>>
> > > >>> See that relocking/reallocating is transactional. If it is done then for
> > /all/
> > > >> objects in scope and it is
> > > >>> done at most once. It wouldn't be quite so easy to split this in
> relocking
> > > of
> > > >> nested/EA-based
> > > >>> eliminated locks.
> > > >>>
> > > >>>     >  If so I find this truly awful. Anyone using wait() in a realistic form
> > > >>>     >  requires a notification and so the object cannot be thread
> > confined.
> > > In
> > > >>>
> > > >>> It is not thread confined.
> > > >>>
> > > >>>     >  which case I would strongly argue that upon hitting the wait() the
> > > deopt
> > > >>>     >  should occur unconditionally and so the lock state is correct
> before
> > > we
> > > >>>     >  wait and so we don't need to mess with the recursion count
> > > internally
> > > >>>     >  when we reacquire the monitor.
> > > >>>     >
> > > >>>     > >
> > > >>>     > >    > which I don't like the sound of at all when it comes to
> > > ObjectMonitor
> > > >>>     > >    > state. So I'd like to understand in detail exactly what is going
> > on
> > > here
> > > >>>     > >    > and why.  This is a very intrusive change that seems to badly
> > > break
> > > >>>     > >    > encapsulation and impacts future changes to ObjectMonitor
> > > that are
> > > >> under
> > > >>>     > >    > investigation.
> > > >>>     > >
> > > >>>     > > I would not regard this as breaking encapsulation. Certainly not
> > > badly.
> > > >>>     > >
> > > >>>     > > I've added a property relock_count_after_wait to JavaThread.
> > The
> > > >> property is well
> > > >>>     > > encapsulated. Future ObjectMonitor implementations have to
> > deal
> > > with
> > > >> recursion too. They are free
> > > >>>     > > in choosing a way to do that as long as that property is taken
> into
> > > >> account. This is hardly a
> > > >>>     > > limitation.
> > > >>>     >
> > > >>>     >  I do think this badly breaks encapsulation as you have to add a
> > > callout
> > > >>>     >  from the guts of the ObjectMonitor code to reach into the thread
> > to
> > > get
> > > >>>     >  this lock count adjustment. I understand why you have had to do
> > > this but
> > > >>>     >  I would much rather see a change to the EA optimisation strategy
> > so
> > > that
> > > >>>     >  this is not needed.
> > > >>>     >
> > > >>>     > > Note also that the property is a straight forward extension of
> the
> > > >> existing concept of deferred
> > > >>>     > > local updates. It is embedded into the structure holding them.
> So
> > > not
> > > >> even the footprint of a
> > > >>>     > > JavaThread is enlarged if no deferred updates are generated.
> > > >>>     >
> > > >>>     > [...]
> > > >>>     >
> > > >>>     > >
> > > >>>     > > I'm actually duplicating the existing external suspend
> mechanism,
> > > >> because a thread can be
> > > >>>     > > suspended at most once. And hey, and don't like that either!
> But
> > it
> > > >> seems not unlikely that the
> > > >>>     > > duplicate can be removed together with the original and the
> new
> > > type
> > > >> of handshakes that will be
> > > >>>     > > used for thread suspend can be used for object deoptimization
> > > too. See
> > > >> today's discussion in
> > > >>>     > > JDK-8227745 [2].
> > > >>>     >
> > > >>>     >  I hope that discussion bears some fruit, at the moment it seems
> > not
> > > to
> > > >>>     >  be possible to use handshakes here. :(
> > > >>>     >
> > > >>>     >  The external suspend mechanism is a royal pain in the proverbial
> > > that we
> > > >>>     >  have to carefully live with. The idea that we're duplicating that
> for
> > > >>>     >  use in another fringe area of functionality does not thrill me at all.
> > > >>>     >
> > > >>>     >  To be clear, I understand the problem that exists and that you
> > wish
> > > to
> > > >>>     >  solve, but for the runtime parts I balk at the complexity cost of
> > > >>>     >  solving it.
> > > >>>
> > > >>> I know it's complex, but by far no rocket science.
> > > >>>
> > > >>> Also I find it hard to imagine another fix for JDK-8233915 besides
> > > changing
> > > >> the JVM TI specification.
> > > >>>
> > > >>> Thanks, Richard.
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: David Holmes <david.holmes at oracle.com>
> > > >>> Sent: Dienstag, 17. Dezember 2019 08:03
> > > >>> To: Reingruber, Richard <richard.reingruber at sap.com>;
> serviceability-
> > > >> dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net;
> > > hotspot-
> > > >> runtime-dev at openjdk.java.net; Vladimir Kozlov
> > > (vladimir.kozlov at oracle.com)
> > > >> <vladimir.kozlov at oracle.com>
> > > >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> > > Performance
> > > >> in the Presence of JVMTI Agents
> > > >>>
> > > >>> <resend as my mailer crashed during last send>
> > > >>>
> > > >>> David
> > > >>>
> > > >>> On 17/12/2019 4:57 pm, David Holmes wrote:
> > > >>>> Hi Richard,
> > > >>>>
> > > >>>> On 14/12/2019 5:01 am, Reingruber, Richard wrote:
> > > >>>>> Hi David,
> > > >>>>>
> > > >>>>>      > Some further queries/concerns:
> > > >>>>>      >
> > > >>>>>      > src/hotspot/share/runtime/objectMonitor.cpp
> > > >>>>>      >
> > > >>>>>      > Can you please explain the changes to ObjectMonitor::wait:
> > > >>>>>      >
> > > >>>>>      > !   _recursions = save      // restore the old recursion count
> > > >>>>>      > !                 + jt->get_and_reset_relock_count_after_wait(); //
> > > >>>>>      > increased by the deferred relock count
> > > >>>>>      >
> > > >>>>>      > what is the "deferred relock count"? I gather it relates to
> > > >>>>>      >
> > > >>>>>      > "The code was extended to be able to deoptimize objects of a
> > > >>>>> frame that
> > > >>>>>      > is not the top frame and to let another thread than the owning
> > > >>>>> thread do
> > > >>>>>      > it."
> > > >>>>>
> > > >>>>> Yes, these relate. Currently EA based optimizations are reverted,
> > > when
> > > >>>>> a compiled frame is replaced
> > > >>>>> with corresponding interpreter frames. Part of this is relocking
> > > >>>>> objects with eliminated
> > > >>>>> locking. New with the enhancement is that we do this also just
> > before
> > > >>>>> object references are acquired
> > > >>>>> through JVMTI. In this case we deoptimize also the owning
> compiled
> > > >>>>> frame C and we register
> > > >>>>> deoptimized objects as deferred updates. When control returns to
> > C
> > > it
> > > >>>>> gets deoptimized, we notice
> > > >>>>> that objects are already deoptimized (reallocated and relocked), so
> > > we
> > > >>>>> don't do it again (relocking
> > > >>>>> twice would be incorrect of course). Deferred updates are copied
> > into
> > > >>>>> the new interpreter frames.
> > > >>>>>
> > > >>>>> Problem: relocking is not possible if the target thread T is waiting
> > > >>>>> on the monitor that needs to be
> > > >>>>> relocked. This happens only with non-local objects with
> > > >>>>> EliminateNestedLocks. Instead relocking is
> > > >>>>> deferred until T owns the monitor again. This is what the piece of
> > > >>>>> code above does.
> > > >>>>
> > > >>>> Sorry I need some more detail here. How can you wait() on an
> object
> > > >>>> monitor if the object allocation and/or locking was optimised away?
> > > And
> > > >>>> what is a "non-local object" in this context? Isn't EA restricted to
> > > >>>> thread-confined objects?
> > > >>>>
> > > >>>> Is it just that some of the locking gets optimized away e.g.
> > > >>>>
> > > >>>> synchronised(obj) {
> > > >>>>      synchronised(obj) {
> > > >>>>        synchronised(obj) {
> > > >>>>          obj.wait();
> > > >>>>        }
> > > >>>>      }
> > > >>>> }
> > > >>>>
> > > >>>> If this is reduced to a form as-if it were a single lock of the monitor
> > > >>>> (due to EA) and the wait() triggers a JVM TI event which leads to the
> > > >>>> escape of "obj" then we need to reconstruct the true lock state, and
> > so
> > > >>>> when the wait() internally unblocks and reacquires the monitor it
> has
> > to
> > > >>>> set the true recursion count to 3, not the 1 that it appeared to be
> > when
> > > >>>> wait() was initially called. Is that the scenario?
> > > >>>>
> > > >>>> If so I find this truly awful. Anyone using wait() in a realistic form
> > > >>>> requires a notification and so the object cannot be thread confined.
> > In
> > > >>>> which case I would strongly argue that upon hitting the wait() the
> > > deopt
> > > >>>> should occur unconditionally and so the lock state is correct before
> > we
> > > >>>> wait and so we don't need to mess with the recursion count
> internally
> > > >>>> when we reacquire the monitor.
> > > >>>>
> > > >>>>>
> > > >>>>>      > which I don't like the sound of at all when it comes to
> > > >>>>> ObjectMonitor
> > > >>>>>      > state. So I'd like to understand in detail exactly what is going
> > > >>>>> on here
> > > >>>>>      > and why.  This is a very intrusive change that seems to badly
> > > break
> > > >>>>>      > encapsulation and impacts future changes to ObjectMonitor
> > that
> > > >>>>> are under
> > > >>>>>      > investigation.
> > > >>>>>
> > > >>>>> I would not regard this as breaking encapsulation. Certainly not
> > badly.
> > > >>>>>
> > > >>>>> I've added a property relock_count_after_wait to JavaThread. The
> > > >>>>> property is well
> > > >>>>> encapsulated. Future ObjectMonitor implementations have to deal
> > > with
> > > >>>>> recursion too. They are free in
> > > >>>>> choosing a way to do that as long as that property is taken into
> > > >>>>> account. This is hardly a
> > > >>>>> limitation.
> > > >>>>
> > > >>>> I do think this badly breaks encapsulation as you have to add a
> callout
> > > >>>> from the guts of the ObjectMonitor code to reach into the thread to
> > > get
> > > >>>> this lock count adjustment. I understand why you have had to do
> this
> > > but
> > > >>>> I would much rather see a change to the EA optimisation strategy so
> > > that
> > > >>>> this is not needed.
> > > >>>>
> > > >>>>> Note also that the property is a straight forward extension of the
> > > >>>>> existing concept of deferred
> > > >>>>> local updates. It is embedded into the structure holding them. So
> > not
> > > >>>>> even the footprint of a
> > > >>>>> JavaThread is enlarged if no deferred updates are generated.
> > > >>>>>
> > > >>>>>      > ---
> > > >>>>>      >
> > > >>>>>      > src/hotspot/share/runtime/thread.cpp
> > > >>>>>      >
> > > >>>>>      > Can you please explain why
> > > >>>>> JavaThread::wait_for_object_deoptimization
> > > >>>>>      > has to be handcrafted in this way rather than using proper
> > > >>>>> transitions.
> > > >>>>>      >
> > > >>>>>
> > > >>>>> I wrote wait_for_object_deoptimization taking
> > > >>>>> JavaThread::java_suspend_self_with_safepoint_check
> > > >>>>> as template. So in short: for the same reasons :)
> > > >>>>>
> > > >>>>> Threads reach both methods as part of thread state transitions,
> > > >>>>> therefore special handling is
> > > >>>>> required to change thread state on top of ongoing transitions.
> > > >>>>>
> > > >>>>>      > We got rid of "deopt suspend" some time ago and it is
> > disturbing
> > > >>>>> to see
> > > >>>>>      > it being added back (effectively). This seems like it may be
> > > >>>>> something
> > > >>>>>      > that handshakes could be used for.
> > > >>>>>
> > > >>>>> Deopt suspend used to be something rather different with a
> similar
> > > >>>>> name[1]. It is not being added back.
> > > >>>>
> > > >>>> I stand corrected. Despite comments in the code to the contrary
> > > >>>> deopt_suspend didn't actually cause a self-suspend. I was doing a
> lot
> > of
> > > >>>> cleanup in this area 13 years ago :)
> > > >>>>
> > > >>>>>
> > > >>>>> I'm actually duplicating the existing external suspend mechanism,
> > > >>>>> because a thread can be suspended
> > > >>>>> at most once. And hey, and don't like that either! But it seems not
> > > >>>>> unlikely that the duplicate can
> > > >>>>> be removed together with the original and the new type of
> > > handshakes
> > > >>>>> that will be used for
> > > >>>>> thread suspend can be used for object deoptimization too. See
> > > today's
> > > >>>>> discussion in JDK-8227745 [2].
> > > >>>>
> > > >>>> I hope that discussion bears some fruit, at the moment it seems not
> > to
> > > >>>> be possible to use handshakes here. :(
> > > >>>>
> > > >>>> The external suspend mechanism is a royal pain in the proverbial
> that
> > > we
> > > >>>> have to carefully live with. The idea that we're duplicating that for
> > > >>>> use in another fringe area of functionality does not thrill me at all.
> > > >>>>
> > > >>>> To be clear, I understand the problem that exists and that you wish
> to
> > > >>>> solve, but for the runtime parts I balk at the complexity cost of
> > > >>>> solving it.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> David
> > > >>>> -----
> > > >>>>
> > > >>>>> Thanks, Richard.
> > > >>>>>
> > > >>>>> [1] Deopt suspend was something like an async. handshake for
> > > >>>>> architectures with register windows,
> > > >>>>>        where patching the return pc for deoptimization of a compiled
> > > >>>>> frame was racy if the owner thread
> > > >>>>>        was in native code. Instead a "deopt" suspend flag was set on
> > > >>>>> which the thread patched its own
> > > >>>>>        frame upon return from native. So no thread was suspended.
> It
> > > got
> > > >>>>> its name only from the name of
> > > >>>>>        the flags.
> > > >>>>>
> > > >>>>> [2] Discussion about using handshakes to sync. with the target
> > thread:
> > > >>>>>
> > > >>>>> https://bugs.openjdk.java.net/browse/JDK-
> > > >>
> > >
> >
> 8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.syst
> > > e
> > > >> m.issuetabpanels:comment-tabpanel#comment-14306727
> > > >>>>>
> > > >>>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: David Holmes <david.holmes at oracle.com>
> > > >>>>> Sent: Freitag, 13. Dezember 2019 00:56
> > > >>>>> To: Reingruber, Richard <richard.reingruber at sap.com>;
> > > >>>>> serviceability-dev at openjdk.java.net;
> > > >>>>> hotspot-compiler-dev at openjdk.java.net;
> > > >>>>> hotspot-runtime-dev at openjdk.java.net
> > > >>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> > > >>>>> Performance in the Presence of JVMTI Agents
> > > >>>>>
> > > >>>>> Hi Richard,
> > > >>>>>
> > > >>>>> Some further queries/concerns:
> > > >>>>>
> > > >>>>> src/hotspot/share/runtime/objectMonitor.cpp
> > > >>>>>
> > > >>>>> Can you please explain the changes to ObjectMonitor::wait:
> > > >>>>>
> > > >>>>> !   _recursions = save      // restore the old recursion count
> > > >>>>> !                 + jt->get_and_reset_relock_count_after_wait(); //
> > > >>>>> increased by the deferred relock count
> > > >>>>>
> > > >>>>> what is the "deferred relock count"? I gather it relates to
> > > >>>>>
> > > >>>>> "The code was extended to be able to deoptimize objects of a
> > frame
> > > that
> > > >>>>> is not the top frame and to let another thread than the owning
> > thread
> > > do
> > > >>>>> it."
> > > >>>>>
> > > >>>>> which I don't like the sound of at all when it comes to
> ObjectMonitor
> > > >>>>> state. So I'd like to understand in detail exactly what is going on
> here
> > > >>>>> and why.  This is a very intrusive change that seems to badly break
> > > >>>>> encapsulation and impacts future changes to ObjectMonitor that
> > are
> > > under
> > > >>>>> investigation.
> > > >>>>>
> > > >>>>> ---
> > > >>>>>
> > > >>>>> src/hotspot/share/runtime/thread.cpp
> > > >>>>>
> > > >>>>> Can you please explain why
> > > JavaThread::wait_for_object_deoptimization
> > > >>>>> has to be handcrafted in this way rather than using proper
> > transitions.
> > > >>>>>
> > > >>>>> We got rid of "deopt suspend" some time ago and it is disturbing
> to
> > > see
> > > >>>>> it being added back (effectively). This seems like it may be
> > something
> > > >>>>> that handshakes could be used for.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> David
> > > >>>>> -----
> > > >>>>>
> > > >>>>> On 12/12/2019 7:02 am, David Holmes wrote:
> > > >>>>>> On 12/12/2019 1:07 am, Reingruber, Richard wrote:
> > > >>>>>>> Hi David,
> > > >>>>>>>
> > > >>>>>>>       > Most of the details here are in areas I can comment on in
> > > detail,
> > > >>>>>>> but I
> > > >>>>>>>       > did take an initial general look at things.
> > > >>>>>>>
> > > >>>>>>> Thanks for taking the time!
> > > >>>>>>
> > > >>>>>> Apologies the above should read:
> > > >>>>>>
> > > >>>>>> "Most of the details here are in areas I *can't* comment on in
> > detail
> > > >>>>>> ..."
> > > >>>>>>
> > > >>>>>> David
> > > >>>>>>
> > > >>>>>>>       > The only thing that jumped out at me is that I think the
> > > >>>>>>>       > DeoptimizeObjectsALotThread should be a hidden thread.
> > > >>>>>>>       >
> > > >>>>>>>       > +  bool is_hidden_from_external_view() const { return true;
> > }
> > > >>>>>>>
> > > >>>>>>> Yes, it should. Will add the method like above.
> > > >>>>>>>
> > > >>>>>>>       > Also I don't see any testing of the
> > > DeoptimizeObjectsALotThread.
> > > >>>>>>> Without
> > > >>>>>>>       > active testing this will just bit-rot.
> > > >>>>>>>
> > > >>>>>>> DeoptimizeObjectsALot is meant for stress testing with a larger
> > > >>>>>>> workload. I will add a minimal test
> > > >>>>>>> to keep it fresh.
> > > >>>>>>>
> > > >>>>>>>       > Also on the tests I don't understand your @requires clause:
> > > >>>>>>>       >
> > > >>>>>>>       >   @requires ((vm.compMode != "Xcomp") &
> > > vm.compiler2.enabled
> > > >> &
> > > >>>>>>>       > (vm.opt.TieredCompilation != true))
> > > >>>>>>>       >
> > > >>>>>>>       > This seems to require that TieredCompilation is disabled,
> but
> > > >>>>>>> tiered is
> > > >>>>>>>       > our normal mode of operation. ??
> > > >>>>>>>       >
> > > >>>>>>>
> > > >>>>>>> I removed the clause. I guess I wanted to target the tests
> towards
> > > the
> > > >>>>>>> code they are supposed to
> > > >>>>>>> test, and it's easier to analyze failures w/o tiered compilation
> and
> > > >>>>>>> with just one compiler thread.
> > > >>>>>>>
> > > >>>>>>> Additionally I will make use of
> > > >>>>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the
> > tests.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Richard.
> > > >>>>>>>
> > > >>>>>>> -----Original Message-----
> > > >>>>>>> From: David Holmes <david.holmes at oracle.com>
> > > >>>>>>> Sent: Mittwoch, 11. Dezember 2019 08:03
> > > >>>>>>> To: Reingruber, Richard <richard.reingruber at sap.com>;
> > > >>>>>>> serviceability-dev at openjdk.java.net;
> > > >>>>>>> hotspot-compiler-dev at openjdk.java.net;
> > > >>>>>>> hotspot-runtime-dev at openjdk.java.net
> > > >>>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> > > >>>>>>> Performance in the Presence of JVMTI Agents
> > > >>>>>>>
> > > >>>>>>> Hi Richard,
> > > >>>>>>>
> > > >>>>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote:
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> I would like to get reviews please for
> > > >>>>>>>>
> > > >>>>>>>>
> > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/
> > > >>>>>>>>
> > > >>>>>>>> Corresponding RFE:
> > > >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745
> > > >>>>>>>>
> > > >>>>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915
> > > >>>>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-
> > > 8214584 [1]
> > > >>>>>>>>
> > > >>>>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing
> > > without
> > > >>>>>>>> issues (thanks!). In addition the
> > > >>>>>>>> change is being tested at SAP since I posted the first RFR some
> > > >>>>>>>> months ago.
> > > >>>>>>>>
> > > >>>>>>>> The intention of this enhancement is to benefit performance
> > wise
> > > from
> > > >>>>>>>> escape analysis even if JVMTI
> > > >>>>>>>> agents request capabilities that allow them to access local
> > variable
> > > >>>>>>>> values. E.g. if you start-up
> > > >>>>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,
> > > then
> > > >>>>>>>> escape analysis is disabled right
> > > >>>>>>>> from the beginning, well before a debugger attaches -- if ever
> > one
> > > >>>>>>>> should do so. With the
> > > >>>>>>>> enhancement, escape analysis will remain enabled until and
> > after
> > > a
> > > >>>>>>>> debugger attaches. EA based
> > > >>>>>>>> optimizations are reverted just before an agent acquires the
> > > >>>>>>>> reference to an object. In the JBS item
> > > >>>>>>>> you'll find more details.
> > > >>>>>>>
> > > >>>>>>> Most of the details here are in areas I can comment on in detail,
> > but
> > > I
> > > >>>>>>> did take an initial general look at things.
> > > >>>>>>>
> > > >>>>>>> The only thing that jumped out at me is that I think the
> > > >>>>>>> DeoptimizeObjectsALotThread should be a hidden thread.
> > > >>>>>>>
> > > >>>>>>> +  bool is_hidden_from_external_view() const { return true; }
> > > >>>>>>>
> > > >>>>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread.
> > > >>>>>>> Without
> > > >>>>>>> active testing this will just bit-rot.
> > > >>>>>>>
> > > >>>>>>> Also on the tests I don't understand your @requires clause:
> > > >>>>>>>
> > > >>>>>>>       @requires ((vm.compMode != "Xcomp") &
> > > vm.compiler2.enabled &
> > > >>>>>>> (vm.opt.TieredCompilation != true))
> > > >>>>>>>
> > > >>>>>>> This seems to require that TieredCompilation is disabled, but
> > tiered
> > > is
> > > >>>>>>> our normal mode of operation. ??
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> David
> > > >>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>> Richard.
> > > >>>>>>>>
> > > >>>>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745
> > > >>>>>>>>
> > > >>
> > >
> >
> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.pa
> > > tc
> > > >> h
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>