RFR: 8240693: Sweeper should not examine dying metadata in is_unloading() nmethod during static call stub cleaning
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Thu Apr 2 18:59:40 UTC 2020
Erik,
The redefinition part looks good to me. It should have been is_alive
and not unloading, like the code cache walk to determine evol dependent
nmethods.
thanks,
Coleen
On 4/2/20 12:28 PM, Erik Österlund wrote:
> Hi,
>
> There is currently an unfortunate race condition between concurrent
> code cache unloading and the Sweeper
> (affecting ZGC and Shenandoah).
>
> First of all, big thanks to Nils for the initial analysis, requiring
> downloading a 90 GB core file. Ouch.
> He discovered that we crashed accessing a freed metadata relocation in
> an unloaded nmethod from the sweeper.
>
> I am fairly certain it is due to the following race condition that can
> indeed happen today:
>
> 1. GC terminates marking in a safepoint
> 2. GC performs concurrent code cache unlinking. In this step, inline
> caches are cleaned for is_alive() &&
> !is_unloading() nmethods. The key is that it is not performed for
> is_alive() && is_unloading() nmethods.
> So metadata relocations of their static/opt virtual call stubs
> remain in the nmethod, and could potentially
> be dying.
> 3. A handshake operation is issued to separate unlinking from purging.
> The sweeper replies to the handshake.
> 4. GC performs concurrent code cache purging, making all nmethods
> unloaded. But just before one nmethod is unloaded...
> 5. The sweeper starts processing the nmethod. It is is_alive() &&
> is_unloading() and the sweeper correctly
> decides to clean its inline caches. As part of that inline cache
> cleaning, it examines if the metadata
> relocations of static/opt virtual call stubs need clearing
> (conditional of the holder of the target method being unloading).
> 6. Just before examining the metadata relocation, the sweeper gets
> preempted.
> 7. The GC finishes concurrent code cache purging.
> 8. The GC finishes concurrent metaspace purging (the Method* is now
> dead).
> 9. The sweeper wakes up from its slumber and starts processing the now
> freed Method* of the metadata relocation.
> 10. Crash.
>
> The key here is that the reason why the sweeper clears metadata
> relocations of static/opt virtual call stubs,
> is to unlink the metadata of is_alive() && !is_unloading() nmethods,
> so that they are not used after the class
> unloading handshake. However, the GC does not unlink said metadata
> from is_alive() && is_unloading() nmethods,
> because it is assumed that nobody should follow metadata from
> is_unloading() nmethods. This violation in the
> sweeper is the exception to that rule.
>
> The fix is to skip the processing of metadata relocations of
> is_unloading() nmethods, during inline cache cleaning.
> I also noticed another related but harmless violation of this rule by
> class redefinition, where metadata only filters
> is_alive(), not && !is_unloading(). I figured that this is actually
> harmless, because class redefinition happens in
> a safepoint, and code cache purging is done with the suspendible
> thread set taken, blocking out safepoints. Therefore
> it will happen either completely before or after code cache purging.
> If it happens before, then the metadata is not
> yet deleted, and if it happens after, then it is no longer is_alive().
> However, I would like to fix that anyway,
> just to make the model more consistent.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8240693
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8240693/webrev.00/
>
> Sanity checked with hs-tier1-8 with ZGC.
>
> Thanks,
> /Erik
More information about the hotspot-dev
mailing list