RFR: 8276801: gc/stress/CriticalNativeStress.java fails intermittently with Shenandoah

Tue Nov 9 16:32:36 UTC 2021

On Tue, 9 Nov 2021 15:08:32 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> JDK-8276205 fixed a bug that prevents concurrent code cache iteration, without holding CodeCache_lock. That causes register_nmethod() calls can go through during concurrent code cache iteration.
> 
> This is no a problem for new nmethod, cause it is *not* in cache cache snapshot for concurrent code cache iteration, but *nmethod patching* is. We can not allow *nmethod patching* during concurrent code cache iteration, should block it until iteration is completed.
> 
> Test:
> 
> - [x] hotspot_gc_shenandoah
> - [x] Stress test on gc/stress/CriticalNativeStress.java

src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.cpp line 119:

> 117: 
> 118: void ShenandoahCodeRoots::unregister_nmethod(nmethod* nm) {
> 119:   assert(CodeCache_lock->owned_by_self(), "Must have CodeCache_lock held");

I don't understand this change.

This method is called from:

   // Unregister must be done before the state change
  {
    MutexLocker ml(SafepointSynchronize::is_at_safepoint() ? NULL : CodeCache_lock,
                     Mutex::_no_safepoint_check_flag);
    Universe::heap()->unregister_nmethod(this);
  }

...which would mean we _do not_ hold the `CodeCache_lock` at safepoint? Previous assert captured that fact. New one does not. Shouldn't this assert fail intermittently? Same goes for `register_nmethod`, `flush_nmethod`, probably.

src/hotspot/share/gc/shenandoah/shenandoahNMethod.cpp line 283:

> 281:     data->update();
> 282:   } else {
> 283:     // For a new nmethod, we can safely append it to the list, cause

Suggestion:

    // For a new nmethod, we can safely append it to the list, because

-------------

PR: https://git.openjdk.java.net/jdk/pull/6316