[master] RFR: Implement Shenandoah support [v14]

Mon Mar 28 17:14:03 UTC 2022

> This implements support for the Shenandoah GC in Lilliput. The following areas require special treatment in order for Shenandoah to work:
> 
> ### JVMTI/JFR
> 
> For JVMTI and JFR heap iteration, we used to check the (marked+masked) header for NULL. However, we now preserve the Klass* in the upper 32 bits of the header, which breaks this check. I changed it to check ShenandoahHeap::is_in(..) instead. It seems kinda brittle: can we guarantee that a narrowKlass in the upper bits doesn't look like a heap pointer? In any case, it seems to work for now. We might want to figure out something more reliable, or at least add some verification that the compressed and shifted Klass* can't look like a heap address.
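
The verification mentioned above could take roughly the following shape. This is only a sketch under stated assumptions: the header is 64 bits wide with the narrowKlass in the upper 32 bits, the lower 32 bits may hold any value, and `header_may_alias_heap` is a hypothetical helper, not an existing HotSpot function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of HotSpot. Assumes a 64-bit header with the
// narrowKlass in the upper 32 bits and arbitrary lower 32 bits.
// Returns true if some header carrying a narrowKlass in [min_nk, max_nk]
// could have a value inside the heap range [heap_base, heap_end).
bool header_may_alias_heap(uint64_t heap_base, uint64_t heap_end,
                           uint32_t min_nk, uint32_t max_nk) {
  assert(heap_base < heap_end && min_nk <= max_nk);
  // Smallest and largest header values reachable with these narrowKlass values.
  uint64_t lowest  = static_cast<uint64_t>(min_nk) << 32;
  uint64_t highest = (static_cast<uint64_t>(max_nk) << 32) | 0xFFFFFFFFull;
  // Plain interval-overlap test between [lowest, highest] and the heap range.
  return lowest < heap_end && highest >= heap_base;
}
```

Under these assumptions, a heap that ends at or below 4 GB can never be aliased as long as narrowKlass 0 is never used, while a heap placed higher in the address space can be. A startup check along these lines would at least make the brittleness visible.
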
> 
> ### Stack-locking
> 
> Accessing the header for the Klass* (and therefore, for size) requires a special protocol to ensure that we're not chasing a stack-locked displaced header that is about to be unlocked, and therefore accessing potentially garbage memory. This is done in #25. However, in Shenandoah it is slightly more complicated, because we need to access the size of an object in from-space, and don't want to observe a stack-lock that is about to be unlocked by another thread. In particular, I am worried about the following scenario:
> T1:
> 1. Leaves the final-mark safepoint, while holding lock O
> 2. Starts concurrent evacuation of O (concurrent threads evacuation)
> 3. CAS fwdptr to header of O
> 4. Unlocks O
> 
> T2 (possibly GC thread):
> 1. Starts evacuation of O
> 2. Accesses size/Klass*/header of O *in from space*, observes stack-lock
> 3. CAS fwdptr to header of O
> 
> If a context switch happens after T2's step 2, then T2 has loaded a stack-lock, T1 succeeds in evacuating *and* unlocking O, and T2 then accesses a dangling stack-lock.
> 
> We can use the same protocol that we implemented in #25 to prevent this: whenever we access the header of a from-space object, CAS 0 (INFLATING) into the header to prevent progress by any other thread, while at the same time getting a safe hold on the stack-lock (or on the neutral lock if the other thread was faster). In order for this to work, we need to change the evacuation protocol such that it retries (in a busy-loop) when it observes a 0. The same goes for any code that loads the mark-word in from-space (not all that many places). For loading the mark-word, we need to extend the protocol a little to allow reaching through the forwarding pointer. This is GC specific, and thus requires a hook in BarrierSet.
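
A minimal sketch of the protocol described above, using `std::atomic` instead of HotSpot's own primitives. The names `safe_load_mark` and `install_forwardee` here are illustrative stand-ins, not the functions from the patch, and the lock-bit encoding (`00` stack-locked, `11` forwarded, all-zero INFLATING) is assumed to match the usual mark-word layout. Reaching through the forwarding pointer and the BarrierSet hook are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uintptr_t LOCK_MASK    = 0x3;  // lower two bits of the mark word
constexpr uintptr_t STACK_LOCKED = 0x0;  // header holds a pointer to the stack lock
constexpr uintptr_t FORWARDED    = 0x3;  // header holds a forwarding pointer
constexpr uintptr_t INFLATING    = 0;    // whole word zero: header is claimed

// Illustrative: load a mark word without chasing a dangling stack lock.
uintptr_t safe_load_mark(std::atomic<uintptr_t>& header) {
  for (;;) {
    uintptr_t m = header.load(std::memory_order_acquire);
    if (m == INFLATING) continue;                   // someone holds it: spin
    if ((m & LOCK_MASK) != STACK_LOCKED) return m;  // neutral, monitor or forwarded
    // Stack-locked: claim the header so the owner cannot unlock underneath us.
    if (header.compare_exchange_strong(m, INFLATING, std::memory_order_acq_rel)) {
      uintptr_t displaced = *reinterpret_cast<const uintptr_t*>(m);
      header.store(m, std::memory_order_release);   // hand the header back
      return displaced;
    }
    // Lost the race (unlocked or evacuated meanwhile): retry.
  }
}

// Illustrative: evacuation that retries while it observes INFLATING.
// Returns the address of the winning copy.
uintptr_t install_forwardee(std::atomic<uintptr_t>& header, uintptr_t copy) {
  for (;;) {
    uintptr_t m = header.load(std::memory_order_acquire);
    if (m == INFLATING) continue;                             // busy-loop on 0
    if ((m & LOCK_MASK) == FORWARDED) return m & ~LOCK_MASK;  // already evacuated
    if (header.compare_exchange_strong(m, copy | FORWARDED,
                                       std::memory_order_acq_rel)) {
      return copy;
    }
  }
}
```

In this sketch the unlocking thread's own CAS fails for as long as the header reads 0, which is what keeps the displaced header alive during the read; that thread would have to retry as well.
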
> 
> ### Monitors
> 
> Monitors exhibit a similar problem: when observing a monitor while accessing an object's header, the concurrent deflater thread might concurrently deflate that monitor, and our thread might access a dangling monitor pointer. For Java threads, this is already prevented by the deflating protocol:
> - First the deflater thread fixes all monitor headers back to neutral. During this phase, it is ok to racily load a monitor header: the monitor is still there, and the displaced header is safe to access.
> - All Java threads are rendezvous'ed.
> - Deflater destroys all deflated monitors. At this point, all Java threads see a neutral header, and cannot access the destroyed monitors anymore.
> 
> This protocol is already extended by #27 to also rendezvous GC threads. This only requires that concurrent GC threads participate in SuspendibleThreadSet. Shenandoah has already implemented this, but it is turned off by default. The remaining step for Shenandoah to safely access monitor headers is to enable suspendible GC workers.
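
The three deflation phases can be illustrated with a deliberately simplified model. Everything here is hypothetical (`Monitor`, `Object`, `Deflation`, `deflate_all`); the real deflater uses CAS and handles contended monitors, and the rendezvous is a handshake with Java threads and, after #27, with the SuspendibleThreadSet, which this sketch reduces to a callback.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

constexpr uintptr_t MONITOR_TAG = 0x2;  // assumed lock bits for "has monitor"

struct Monitor { uintptr_t displaced_header; };   // hypothetical monitor
struct Object  { std::atomic<uintptr_t> header; };
struct Deflation { Object* obj; Monitor* mon; };

// Hypothetical model of the phases; `rendezvous` stands in for the point
// after which no thread can still hold a monitor pointer it loaded earlier.
void deflate_all(std::vector<Deflation>& work,
                 const std::function<void()>& rendezvous) {
  // Phase 1: restore neutral headers. Racy readers may still see the
  // monitor pointer, which is fine because the monitor is still alive.
  for (auto& d : work) {
    d.obj->header.store(d.mon->displaced_header, std::memory_order_release);
  }
  // Phase 2: wait until every participating thread has passed the rendezvous.
  rendezvous();
  // Phase 3: only now is it safe to destroy the monitors.
  for (auto& d : work) {
    delete d.mon;
    d.mon = nullptr;
  }
}
```

The point of the ordering is that a thread can only reach a monitor through a header it loaded before phase 2, so nothing can reference the monitor once phase 3 starts.
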
> 
> ### Other
> 
> The change also folds ObjectSynchronizer::safe_load_mark() and safe_mark() into one. They basically do the same thing, except that one of them does the loop that we require.
> 
> Also, markWord::has_monitor() has been changed to be more reliable, and not require checking is_marked() first to catch the case where both lower bits are set.
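
To illustrate the difference, here is a sketch assuming the usual lock-bit encoding (`10` for a monitor, `11` for marked). The function names are made up for the comparison and the bodies are not copied from the patch.

```cpp
#include <cassert>
#include <cstdint>

constexpr uintptr_t LOCK_MASK     = 0x3;  // lower two bits of the mark word
constexpr uintptr_t MONITOR_VALUE = 0x2;  // assumed encoding: 0b10
constexpr uintptr_t MARKED_VALUE  = 0x3;  // assumed encoding: 0b11

// Tests only the monitor bit, so a marked header (0b11) also matches.
// Callers have to rule out is_marked() first.
bool has_monitor_bit_test(uintptr_t mark) {
  return (mark & MONITOR_VALUE) != 0;
}

// Compares both lock bits, so a marked header no longer matches.
bool has_monitor_exact(uintptr_t mark) {
  return (mark & LOCK_MASK) == MONITOR_VALUE;
}
```

With the exact comparison, a forwarded (marked) header is never mistaken for a monitor header, which matters once GC code inspects headers of from-space objects.
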
> 
> Testing:
>  - [x] hotspot_gc_shenandoah (x86_64, x86_32, aarch64)
>  - [x] tier1 +UseShenandoahGC (x86_64, x86_32, aarch64)
>  - [x] tier2 +UseShenandoahGC (x86_64, x86_32, aarch64)
>  - [x] tier3 +UseShenandoahGC (x86_64, x86_32)
>  - [ ] tier4 +UseShenandoahGC
>  - [x] specjvm

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Revert include

-------------

Changes:
  - all: https://git.openjdk.java.net/lilliput/pull/32/files
  - new: https://git.openjdk.java.net/lilliput/pull/32/files/0942889d..926fd399

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=32&range=13
 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=32&range=12-13

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.java.net/lilliput/pull/32.diff
  Fetch: git fetch https://git.openjdk.java.net/lilliput pull/32/head:pull/32

PR: https://git.openjdk.java.net/lilliput/pull/32