RFR: 8241605: Shenandoah: More aggressive reference discovery

Thu Mar 26 15:21:29 UTC 2020

Hi Roman,

For the record, the barriers in debugInfo.cpp are for locals 
materialized through nmethod oops.
That is not the problem I talked about. It is locals on-stack. You do 
indeed apply barriers
when deoptimizing on said locals. It eventually goes through a shared 
stack walking API in
stackValue.cpp, where we see the following code protecting oops loads 
from the stack:

#if INCLUDE_SHENANDOAHGC
       if (UseShenandoahGC) {
         val = 
ShenandoahBarrierSet::barrier_set()->load_reference_barrier(val);
       }
#endif

This function seems to take care of relocation, but not keeping things 
alive. I'm guessing it is
presumed that things on-stack should already be live. ;)

On a related note... note that this also implies that if youdeoptimize 
between a referent load
and its keep-alive barrier, you will expose broken objectsto the object 
graph, as nothing keeps
that alive. This comprises part of the reason we don'twant to allow the 
compiler to put safepoints
between loads and their load barriers. This problembecomes a million 
times worse when you have
concurrent reference processing, as you can't knowthe strength used to 
load the referent, and
depending on the strength, the stack fixup code needsto change the value 
to different things.
Before we switched over to our super late barrierexpansion, there were 
still a number of scenarios
causing safepoints to float in between loads and theirbarriers. I would 
be concerned about that,
because I don't believe anything has changed. It's just very hard to 
find the bugs.

Anyway, thank you for withdrawing the RFR.

Thanks,
/Erik

On 2020-03-26 14:49, Roman Kennke wrote:
> Hi Erik,
>
> Thank you for sharing your thoughts.
>
> You are right. Some isses can (and already are) addressed, for example
> the barriers in debugInfo.cpp take care of exposure via deoptimization.
> I haven't checked the stackwalker, but I think it can be taken care of
> in a similar fashion.
>
> The main problem is indeed the exposure of objects that are transitively
> reachable from the referent (incl. classes). This can not be easily fixed.
>
> I'll withdraw the RFR.
>
> Thank you!
> Roman
>
>
>> Hi Roman,
>>
>> First of all, interesting idea. Thanks for sharing your thoughts. I
>> would however like to share
>> some conceptual concerns. The patch has some flaws that needs further
>> consideration.
>> The core assumption that is wrong is that if there are no stores/returns
>> exposing the referent,
>> then it is safe to elide the keep-alive barrier. This is not true.
>>
>> Listing a selection of concerns/bugs here:
>>
>> 1) What the compiler proves can't happen, can always suddenly happen,
>> due to class redefinition.
>> So any idea utilizing compiler proof of what paths can be taken, always
>> need to be backed up
>> by fall-back paths when the optimization eventually gets proven wrong.
>> Let's have a look at
>> this trivial example:
>>
>> void A(List<Reference> references) {
>>    for (var ref: references) {
>>      Object local = ref.get();
>>      for (int i = 0; i < 10000; ++i) (
>>        B(local);
>>        // safepoint poll with local being live
>>      }
>>    }
>> }
>>
>> void B(Object referent) {
>>    if (referent == System.out) {
>>      System.out.println("I like fish. They swim. Blub blub.");
>>    }
>> }
>>
>> Assume that we compile A, inlining B. The compiler can now prove that
>> the referent doesn't escape.
>> However, we then hit a safepoint in A, and in that safepoint, B is
>> redefined to do this:
>>
>> void B(Object referent) {
>>    globalList.add(referent);
>> }
>>
>> What will happen now is that the nmethod of A (and B inlined) gets
>> deoptimized. Once execution
>> starts, the deopt handler will run, transferring the state (including
>> the local) over to the
>> interpreter, and in the interpreted version of B, the referent illegally
>> escapes, eventually
>> causing heap corruption, as subsequent remarks will not see the escaped
>> local.
>>
>> 2) Even without class redefinition changing the logic that the compiler
>> proved had no escaping
>> referents through stores, stack walkers can expose locals at any
>> safepoint poll. So in the above
>> example, a stack walker can expose the local to other Java code, which
>> attaches it to the object
>> graph, without ever marking it, again causing heap corruption.
>>
>> Because of this, stack walkers and deoptimization logic needs to keep
>> everything they see in the
>> stack alive, or things will break in subtle ways that are not fun to debug.
>>
>> 3) The analysis needs to also consider not just exposing of the
>> referent, but exposure of anything
>> transitively reachable from the object graph under the referent. If you
>> load the referent, which is a
>> wrapper object, then load the contents of the wrapper, and expose that
>> to the object graph, then that
>> will similarly break, even though the referent itself was never exposed.
>> I can't see that your analysis
>> catches that (LoadP with an AddP with the referent as base, and
>> transitively performing the analysis on that)
>>
>> 4) Even then, there are more problems, involving class unloading. It is
>> not just the object (and its
>> transitive closure) you need to consider w.r.t. whether to keep it alive
>> or not, but implicitly
>> transitively also its class, and the classes of everything transitively
>> reachable from the referent,
>> which is now ignored. For example, if someone calls
>> referent.(chain_of_objects).getClass(), then you can
>> expose classes that have not been kept alive, as your strong native
>> barriers do not keep things alive, AFAICT.
>>
>> 5) Classes are implicitly used when code is run from its class holder,
>> and there are a few ways in
>> which this could go wrong. The nmethod entry barriers shields you
>> against many of them (including
>> speculative inlining of bytecodes from otherwise not kept alive Klass),
>> but not all.
>> For example, if such a transitive class under the referent is acquired,
>> and passed as an argument
>> to the interpreter, it can reflectively call methods without holding its
>> class alive, causing various
>> forms of failure modes crashing the JVM.
>>
>> 6) Sometimes the use of classes is even more subtle and implicit. Here
>> is one example I can think of:
>> Imagine that your referent performs a virtual call. It is found throuch
>> CHA that there exists a single
>> unique selected method (callee). Then it is inlined straight into the
>> code, without any conditional checking
>> that the klass pointer was right. An entry is created into the
>> dependency context. But the class tracked
>> in the dependency context is not the selected class (i.e. the one you
>> inlined), but the base class,
>> because that is the class that will get poked during class loading, to
>> check when the CHA assumption
>> fails in the future. So the dependency context does not track the callee
>> method(). The oop finalization
>> will not find any explicit mentions of this klass. It is just implicitly
>> used. The only scenario in which
>> we explicitly track inlined callees classes, is with the following code
>> for class redefinition:
>>
>>    // Always register dependence if JVMTI is enabled, because
>>    // either breakpoint setting or hotswapping of methods may
>>    // cause deoptimization.
>>    if (C->env()->jvmti_can_hotswap_or_post_breakpoint()) {
>>       C->dependencies()->assert_evol_method(method());
>>    }
>>
>> This dependency will create a metadata entry in the nmethod which is
>> caught by the oop finalizer, and
>> hence by your nmethod entry barriers. When
>> jvmti_can_hotswap_or_post_breakpoint() is disabled, the callee holder
>> is not tracked anywhere, because it is assumed that the receiver of such
>> an inlined method call is live,
>> which seems like a reasonable assumption. Now that this elision of
>> keep-alive barriers is used, we enter a
>> gateway to hell. In bytecodes of this not kept alive class loader, you
>> can do lots of fun things, like
>> allocating an object of a classin that class loader, or materialize
>> constants from its static finals,
>> that will not be caught by the GC.This results in objects with broken
>> class pointers being allocated,
>> and dead objects being attached to the object graph. From there on, it
>> is game over.
>>
>> 7) I can imagine code involving method handles going wrong too. You can
>> perform method handle calls through
>> a referent, and the whole method handle call may get inlined. It tracks
>> the holder through a MemberName.
>> It is assumed that if you have a reference to the MemberName, it has a
>> reference to the class holder, and
>> hence it should be safe to execute its bytecodes. But when we start
>> relaxing things such that this has not
>> been kept alive, I can imagine such code blowing up in unexpected and
>> obscure ways. It looks like nothing
>> transitively reachable from the referent is being exposed in any way,
>> and it looks like there are no calls,
>> but inlined use of bytecodes of reachable metadata is an implicit side
>> effect that exposes not the referent
>> but just references to its transitively reachable class loaders, without
>> there ever being an obvious store
>> or side effect involving the referent or its transitive closure of
>> objects or uses.
>>
>> This aside, I think there are many other fun assumptions made over the
>> years about things being live, not
>> listed above, implyingvarious subtle assumptions in our compilers. I
>> would be quite wary of messing with
>> that without a very careful lookinto all the things that can go
>> wrong.Catching such bugs will be a nightmare.
>>
>> So given these various issues, are you still sure you want to continue
>> down this path? If I were you, I
>> think I would probably not perform this change.It gives me a migraine to
>> think about the things that can
>> go wrong, and I feel like I have just scratched the surface of it. I
>> would be quite afraid of destabilizing.
>>
>> /Erik
>>
>> On 2020-03-25 18:29, Roman Kennke wrote:
>>> Shenandoah uses SATB for concurrent marking, and thus needs to track
>>> stores and explicitely make previous values grey. An exception to this
>>> are (soft/weak/final) references: the reachability of referents can be
>>> changed (from weak to strong) when a referent is accessed. For this
>>> reason we require a keep-alive barrier which also makes referents grey
>>> whenever they are accessed via Reference.get(). The downside of this is
>>> if a workload churns weak-references (i.e. accesses them often) they
>>> might never get a chance to be reclaimed.
>>>
>>> The key insight for improving the situation is that the change of
>>> reachability of the referent is only relevant when that referent is
>>> stored somewhere else. We can elide the keep-alive barrier, if the
>>> compiler can prove that this doesn't happen. (We also need to check if
>>> the referent leaves the scope of the method via method-call or return,
>>> because it may be stored outside of the method.) The caveat here is that
>>> we must do another scan of the threads' stacks at final-mark to catch
>>> referents that are in a local variable at final-mark - we must not loose
>>> those. However, according to my measurements, this is only a minor (if
>>> any) regression in final-mark-latency.
>>>
>>> Issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8241605
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/
>>>
>>> Testing: hotspot_gc_shenandoah (including new testcase that verifies the
>>> improved behaviour), manual tests, specjbb and specjvm runs
>>>
>>> I'd like to push this to shenandoah/jdk first, to give it a few more
>>> rounds of testing before hitting mainline JDK.
>>>
>>> Ok?
>>>
>>> Roman
>>>