RFR: 8256020: Don't resurrect objects on argument-dependency access

Tue Nov 10 14:11:55 UTC 2020

On Sun, 8 Nov 2020 21:35:29 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> In Shenandoah-testing, we noticed that compiler/jsr292/CallSiteDepContextTest.java fails with the following error:
> 
> Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:92), pid=906849, tid=907073
> Error: Before Updating References, Marked; Must be marked in complete bitmap
> 
> Referenced from:
>   interior location: 0x00000000fff87504
>   0x00000000fff874f8 - klass 0x000000010004ecd8 java.lang.invoke.MutableCallSite
>         allocated after mark start
>     not after update watermark
>         marked strong
>         marked weak
>     not in collection set
>   mark: mark(is_neutral no_hash age=0)
>   region: | 2565|R |BTE fff80000, fffc0000, fffc0000|TAMS fff80000|UWM fffc0000|U 256K|T 0B|G 256K|S 0B|L 0B|CP 0
> 
> Object:
>   0x00000000d80a9210 - klass 0x000000010004cf58 java.lang.invoke.DirectMethodHandle
>     not allocated after mark start
>     not after update watermark
>     not marked strong
>     not marked weak
>         in collection set
>   mark: mark(is_neutral no_hash age=0)
>   region: | 9|CS |BTE d8080000, d80c0000, d80c0000|TAMS d80c0000|UWM d80c0000|U 256K|T 256K|G 0B|S 0B|L 22464B|CP 0
> 
> Forwardee:
>   (the object itself)
> 
> In other words, a reachable (marked) MutableCallSite references an unreachable DirectMethodHandle. That reference would subsequently become dangling and lead to crashes if accessed.
> 
> I narrowed it down to the access in Dependencies::DepStream::recorded_oop_at(int i), which is done as 'strong'. That means it returns the reference even if the referent is unreachable, e.g. during concurrent class-unloading. This resurrection of the unreachable DMH is potentially fatal: eventually the reference will become dangling (after GC) and lead to crashes when accessed. I believe that access should be 'phantom' instead, which causes GCs like Shenandoah and ZGC to return NULL when encountering unreachable objects.
> 
> (Note that the bug only manifested after JDK-8255691; before that change, we accidentally applied the resurrection-preventing weak-LRB to strong accesses too.)
> 
> Testing: the offending CallSiteDepContextTest.java, tier1+UseShenandoahGC+ShenandoahVerify, tier2+UseShenandoahGC+ShenandoahVerify, hotspot_gc_shenandoah
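
To make the 'strong' versus 'phantom' distinction in the quoted message concrete, here is a minimal toy sketch in C++. It is not HotSpot code: ToyObject, ToyGC, load_strong and load_phantom are hypothetical names invented for illustration, and the real access goes through the GC's barriers rather than a simple flag check. It only models the behaviour described above, assuming marking has finished and dead objects have not yet been reclaimed.

```cpp
#include <cassert>

// Toy stand-in for a heap object (hypothetical, not a HotSpot type).
// `marked` models "found reachable by the completed marking".
struct ToyObject {
    const char* name;
    bool marked;
};

// Toy stand-in for the collector state relevant to this discussion.
struct ToyGC {
    // True while marking is done but dead objects are not yet reclaimed,
    // e.g. during concurrent class unloading.
    bool unloading_in_progress;

    // 'Strong' load: returns whatever the slot holds, dead or alive.
    ToyObject* load_strong(ToyObject* const* slot) const {
        return *slot;
    }

    // 'Phantom' load: refuses to hand out an object that marking
    // found unreachable, returning null instead of resurrecting it.
    ToyObject* load_phantom(ToyObject* const* slot) const {
        ToyObject* obj = *slot;
        if (obj != nullptr && unloading_in_progress && !obj->marked) {
            return nullptr;
        }
        return obj;
    }
};
```

With an unmarked object in the slot and unloading_in_progress set, load_strong returns the dead object while load_phantom returns nullptr; for a marked object both return the same pointer.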

So your theory is that someone calls Dependencies::DepStream::recorded_oop_at on an nmethod after marking has terminated, leaking out a dead object. For your theory to be true, you would have acquired an is_unloading() nmethod from somewhere, and called Dependencies::DepStream::recorded_oop_at on it. That immediately excludes e.g. all on-stack nmethods, all nmethods handed out through dependency contexts, and all nmethods handed out through only_alive_and_not_unloading CodeCache iterators, which is almost all of them. There are very few code cache iterations that expose is_unloading() nmethods, and what they have in common is that they are *not* poking around at oops.

So I suppose I really don't understand along what path this could possibly happen, where you hold an is_unloading() nmethod and start poking around at its oops. Would you mind elaborating a bit more on the context from which you think Dependencies::DepStream::recorded_oop_at() is being called on an is_unloading() nmethod?
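
As a rough illustration of the filtering argument above, here is a toy C++ sketch. It is not HotSpot code: ToyNMethod and the free function below are hypothetical stand-ins, and only the name only_alive_and_not_unloading is borrowed from the text. The point it models is that a consumer of such an iteration never sees an is_unloading() nmethod, so it cannot reach that nmethod's recorded oops.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an nmethod (hypothetical, not the HotSpot class).
struct ToyNMethod {
    int id;
    bool alive;
    bool unloading;  // models is_unloading()
};

// Models an "only alive and not unloading" walk over the code cache:
// nmethods that are dead or unloading are skipped and never exposed.
std::vector<const ToyNMethod*> only_alive_and_not_unloading(
        const std::vector<ToyNMethod>& code_cache) {
    std::vector<const ToyNMethod*> visible;
    for (const ToyNMethod& nm : code_cache) {
        if (nm.alive && !nm.unloading) {
            visible.push_back(&nm);
        }
    }
    return visible;
}
```

Given a cache holding one live nmethod, one unloading nmethod and one dead nmethod, only the first is returned.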

-------------

PR: https://git.openjdk.java.net/jdk/pull/1113