From rkennke at openjdk.java.net Mon May 2 16:35:21 2022
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Mon, 2 May 2022 16:35:21 GMT
Subject: [master] RFR: Implement Shenandoah support [v28]
In-Reply-To:
References:
Message-ID:

> This implements support for the Shenandoah GC in Lilliput. The following areas require special treatment in order for Shenandoah to work:
>
> ### JVMTI/JFR
>
> I replace the fwdptr-resolve in the object scan loop with a proper LRB to prevent exposing from-space objects to the JVMTI runtime.
>
> ### Stack-locking
>
> Accessing the header for the Klass* (and therefore, for size) requires a special protocol to ensure that we're not chasing a stack-locked displaced header that is about to be unlocked, and therefore accessing potentially garbage memory. This is done in #25. However, in Shenandoah it is slightly more complicated, because we need to access the size of an object in from-space, and don't want to observe a stack-lock that is about to be unlocked by another thread. In particular, I am worried about the following scenario:
> T1:
> 1. Leaves the final-mark safepoint while holding lock O
> 2. Starts concurrent evacuation of O (concurrent threads evacuation)
> 3. CAS fwdptr to header of O
> 4. Unlocks O
>
> T2 (possibly GC thread):
> 1. Starts evacuation of O
> 2. Accesses size/Klass*/header of O *in from space*, observes stack-lock
> 3. CAS fwdptr to header of O
>
> If a context switch happens after T2's step 2, then T2 has loaded a stack-lock, T1 then succeeds in evacuating *and* unlocking O, and T2 then accesses a dangling stack-lock.
>
> We can use the same protocol that we implemented in #25 to prevent this: whenever we access the header of a from-space object, CAS 0 (INFLATING) into the header to prevent progress by any other thread, while at the same time getting a safe hold on the stack-lock (or neutral lock if the other thread was faster). In order for this to work, we need to change the evacuation protocol such that it retries (in a busy-loop) when it observes a 0.
> The same goes for any code that loads the mark-word in from-space (not all that many places). For loading the mark-word, we need to extend the protocol a little to allow reaching through the forwarding pointer. This is GC-specific for Shenandoah. That is why I put the implementation for this into ShenandoahObjectUtils, which mostly mirrors the ObjectSynchronizer implementation to load the mark word, with additional from-space object handling (which would not be necessary in regular runtime accesses to the mark-word, outside of GC code).
>
> ### Monitors
>
> Monitors exhibit a similar problem: when observing a monitor while accessing an object's header, the concurrent deflater thread might concurrently deflate that monitor, and our thread might access a dangling monitor pointer. For Java threads, this is already prevented by the deflating protocol:
> - First the deflater thread fixes all monitor headers back to neutral. During this phase, it is ok to racily load a monitor header: the monitor is still there, and the displaced header is safe to access.
> - All Java threads are rendezvous'ed.
> - Deflater destroys all deflated monitors. At this point, all Java threads would see a neutral header, and cannot access the destroyed monitors anymore.
>
> This protocol is already extended by #27 to also rendezvous GC threads. This only requires that concurrent GC threads participate in SuspendibleThreadSet. Shenandoah has already implemented this, but it is turned off by default. The remaining step for Shenandoah to safely access monitor headers is to enable Suspendible GC workers.
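
The stack-locking protocol described above (CAS 0, i.e. INFLATING, into the header, and busy-retry whenever a 0 is observed) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the tag-bit values, the name `read_stable_header`, and modelling a stack-lock as a pointer to a plain word holding the displaced header are all hypothetical simplifications. Monitors and the forwarding-pointer handling that ShenandoahObjectUtils adds are left out.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Assumed, simplified encodings (illustrative only, not HotSpot's markWord):
// the two low bits tag the header; a whole-word 0 means INFLATING.
constexpr uintptr_t INFLATING    = 0;
constexpr uintptr_t TAG_MASK     = 0x3;
constexpr uintptr_t STACK_LOCKED = 0x0;  // header is a pointer to the displaced header
constexpr uintptr_t NEUTRAL      = 0x1;  // header is stored in place

// Hypothetical helper: returns a header value that is safe to decode
// (e.g. for Klass*/size), even if the object is currently stack-locked.
inline uintptr_t read_stable_header(std::atomic<uintptr_t>& header) {
  for (;;) {
    uintptr_t h = header.load();
    if (h == INFLATING) {
      // Another thread holds the header; busy-retry, reloading each time.
      std::this_thread::yield();
      continue;
    }
    if ((h & TAG_MASK) != STACK_LOCKED) {
      // Not stack-locked (neutral in this sketch): readable in place.
      return h;
    }
    // Stack-locked: claim the header so that the owner cannot unlock (and
    // pop its stack frame) while we dereference the stack-lock.
    if (header.compare_exchange_strong(h, INFLATING)) {
      uintptr_t displaced = *reinterpret_cast<const uintptr_t*>(h);
      header.store(h);  // release our claim, restoring the stack-lock
      return displaced;
    }
    // CAS failed: the other thread was faster; reload and retry.
  }
}
```

The sketch only holds if every other writer cooperates: an unlocking or evacuating thread is assumed to update the header with a CAS against the value it last saw, so that it fails (and retries) while the header is 0.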
>
>
> Testing:
> - [x] hotspot_gc_shenandoah (x86_64, x86_32, aarch64)
> - [x] tier1 +UseShenandoahGC (x86_64, x86_32, aarch64)
> - [x] tier2 +UseShenandoahGC (x86_64, x86_32, aarch64)
> - [x] tier3 +UseShenandoahGC (x86_64, x86_32)
> - [ ] tier4 +UseShenandoahGC
> - [x] specjvm

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits:

 - Merge branch 'master' into shenandoah-lilliput
 - Merge branch 'master' into shenandoah-lilliput
 - Remove superfluous continue statement
 - Update/fix mark of copy object before CASing forward pointer
 - Reload mark-word in retry-loop when encountering INFLATING
 - Zhengyu's suggestions
 - Merge remote-tracking branch 'origin/shenandoah-lilliput' into shenandoah-lilliput
 - Merge remote-tracking branch 'origin/shenandoah-lilliput' into shenandoah-lilliput
 - Merge branch 'master' into shenandoah-lilliput
 - Merge branch 'master' into shenandoah-lilliput
 - ... and 40 more: https://git.openjdk.java.net/lilliput/compare/d2048dd3...c7ca7bae

-------------

Changes: https://git.openjdk.java.net/lilliput/pull/32/files
Webrev: https://webrevs.openjdk.java.net/?repo=lilliput&pr=32&range=27
Stats: 238 lines in 11 files changed: 209 ins; 7 del; 22 mod
Patch: https://git.openjdk.java.net/lilliput/pull/32.diff
Fetch: git fetch https://git.openjdk.java.net/lilliput pull/32/head:pull/32

PR: https://git.openjdk.java.net/lilliput/pull/32

From rkennke at openjdk.java.net Mon May 2 19:59:06 2022
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Mon, 2 May 2022 19:59:06 GMT
Subject: [master] RFR: Implement Shenandoah support [v28]
In-Reply-To:
References:
Message-ID:

On Mon, 2 May 2022 16:35:21 GMT, Roman Kennke wrote:

>> This implements support for the Shenandoah GC in Lilliput.
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.

Ok thanks Zhengyu!

-------------

PR: https://git.openjdk.java.net/lilliput/pull/32

From rkennke at openjdk.java.net Mon May 2 20:03:03 2022
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Mon, 2 May 2022 20:03:03 GMT
Subject: [master] Integrated: Implement Shenandoah support
In-Reply-To:
References:
Message-ID:

On Fri, 17 Dec 2021 22:19:13 GMT, Roman Kennke wrote:

> This implements support for the Shenandoah GC in Lilliput.

This pull request has now been integrated.

Changeset: 9f4a50fe
Author: Roman Kennke
URL: https://git.openjdk.java.net/lilliput/commit/9f4a50febc483a342ef09ac0c08f6d742857178d
Stats: 238 lines in 11 files changed: 209 ins; 7 del; 22 mod

Implement Shenandoah support

Reviewed-by: zgu

-------------

PR: https://git.openjdk.java.net/lilliput/pull/32

From jwaters at openjdk.java.net Tue May 3 02:44:28 2022
From: jwaters at openjdk.java.net (Julian Waters)
Date: Tue, 3 May 2022 02:44:28 GMT
Subject: [master] RFR: Only include Mark Word when required [v4]
In-Reply-To:
References:
Message-ID: <5NbHhInebntY2ZhwxXYXrY9r_L-Ynx9ySo-8JlFXmnE=.951817d8-a95e-461e-8ef6-b18bc5ec35e4@github.com>

> WIP, some code here is meant as a placeholder
>
> Discussion: https://bugs.openjdk.java.net/browse/JDK-8198331
>
> Current goals and dependencies (To help keep track)
> - [x] 2 bit identity Hash Code
> - [ ] Elimination of locking from the Mark Word entirely (Hopefully)
> - [ ] 5 bit GC section (4 bit object age, 1 bit forwarding state)
> - [ ] No Mark Word in header until required
> - [ ] Support and optimizations for objects guaranteed to not have a Mark Word (May depend on Valhalla for the GC section)

Julian Waters has updated the pull
request with a new target base due to a merge or a rebase. The pull request now contains two commits:

 - Don't initially allocate the Mark Word
 - Initial Commit

-------------

Changes: https://git.openjdk.java.net/lilliput/pull/47/files
Webrev: https://webrevs.openjdk.java.net/?repo=lilliput&pr=47&range=03
Stats: 12 lines in 2 files changed: 11 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/lilliput/pull/47.diff
Fetch: git fetch https://git.openjdk.java.net/lilliput pull/47/head:pull/47

PR: https://git.openjdk.java.net/lilliput/pull/47

From rkennke at redhat.com Wed May 4 16:15:22 2022
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 4 May 2022 18:15:22 +0200
Subject: Milestone 1
Message-ID: <4e72691e-320b-23be-47e6-c5d382b5d5fc@redhat.com>

Hi all,

I would like to announce that Lilliput reached an important milestone: 64-bit object headers. :-)

Currently, in upstream JDK, objects in the Java heap have 96 bits of header (or even 128 when running with -XX:-UseCompressedClassPointers): 64 bits for the mark-word (which is multi-purpose, covering lock-bits, pointers to stack-locks or object monitors, GC age bits, GC forwarding pointer and identity hashcode) and 32 or 64 bits for the Klass* pointer (possibly compressed).

This milestone of Lilliput merges the mark-word and the Klass* into a single 64-bit word, thus saving ~32 bits per object (or 64 bits compared to running without compressed class pointers). In order to do so, we turn on class-pointer compression unconditionally, and put the 32-bit Klass* into the upper 32 bits of the object header. This means we need to coordinate with the synchronization subsystem and the GC in order to read the correct header in case the object header has been displaced by a stack-lock, object monitor or GC forwarding pointer. Any code that loads the Klass* now needs to check for such a situation (by checking the lock-bits) and carefully extract the Klass* if that is the case.

Caveats:
- The identity hashcode is currently limited to 25 bits.
That is so that we can fit everything into the header. The next milestone will handle the identity hashcode entirely differently, requiring only 2 bits in the header, and allowing 32 bits (or more, if we wanted) for the actual hashcode.
- The serviceability agent is not supported. It would have to deal with header overloads to get to the Klass*, and I did not want to bother with that, especially given that I intend to get rid of the overload entirely - then this problem will disappear.
- Only x86_64 and aarch64 are supported. x86_32 should work, but doesn't behave any differently (mark-word and Klass* are already 32 bits each there, which makes 64 bits together).
- Some array types don't benefit from the improvement, due to alignment restrictions. Those are double[], long[] and Object[] (when running without compressed oops). In other words, all 64-bit array element types need to be aligned to 64-bit addresses, and thus cannot use the extra 32 bits of free space after the 64-bit header and 32-bit array length.

You can grab milestone 1 by checking out the lilliput-milestone-1 tag and building it yourself:
https://github.com/openjdk/lilliput/tree/lilliput-milestone-1

Or by trying one of Aleksey's nightly builds (they follow the master branch, though, so they will eventually drift away from the tag):
https://builds.shipilev.net/openjdk-jdk-lilliput/

Let us know how it goes!

Huge thanks go to everybody involved, be it by contributing code, discussions, reviews, infrastructure, etc.
We are already working on the next milestone, which will be 32-bit headers :-D

Cheers,
Roman

--
Red Hat GmbH, Registered seat: Werner von Siemens Ring 14, D-85630 Grasbrunn, Germany
Commercial register: Amtsgericht Muenchen/Munich, HRB 153243,
Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross

From jwaters at openjdk.java.net Mon May 16 07:51:21 2022
From: jwaters at openjdk.java.net (Julian Waters)
Date: Mon, 16 May 2022 07:51:21 GMT
Subject: [master] RFR: Only include Mark Word when required [v5]
In-Reply-To:
References:
Message-ID:

> WIP, some code here is meant as a placeholder
>
> Discussion: https://bugs.openjdk.java.net/browse/JDK-8198331
>
> Current goals and dependencies (To help keep track)
> - [x] 2 bit identity Hash Code
> - [ ] Elimination of locking from the Mark Word entirely (Hopefully)
> - [ ] 5 bit GC section (4 bit object age, 1 bit forwarding state)
> - [ ] No Mark Word in header until required
> - [ ] Support and optimizations for objects guaranteed to not have a Mark Word (May depend on Valhalla for the GC section)

Julian Waters has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available.
The pull request now contains one commit:

  Initial Commit

-------------

Changes:
 - all: https://git.openjdk.java.net/lilliput/pull/47/files
 - new: https://git.openjdk.java.net/lilliput/pull/47/files/aad1f197..c38a772b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=47&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=47&range=03-04

Stats: 6 lines in 2 files changed: 0 ins; 5 del; 1 mod
Patch: https://git.openjdk.java.net/lilliput/pull/47.diff
Fetch: git fetch https://git.openjdk.java.net/lilliput pull/47/head:pull/47

PR: https://git.openjdk.java.net/lilliput/pull/47

From tanksherman27 at gmail.com Sun May 29 12:37:15 2022
From: tanksherman27 at gmail.com (Julian Waters)
Date: Sun, 29 May 2022 20:37:15 +0800
Subject: Compressed Class Pointers
Message-ID:

Hi all,

Apologies for the sudden noise in the list, but I've come to notice that Lilliput seems to be enforcing Compressed Class Pointers if the running HotSpot VM happens to be 64-bit. ( https://github.com/openjdk/lilliput/blob/9f4a50febc483a342ef09ac0c08f6d742857178d/src/hotspot/share/oops/oop.inline.hpp#L99 )

I'm admittedly a little behind on the work done with Lilliput, so I'm not entirely sure what the rationale behind this choice is. Is there anywhere I could go to for a little catching up?
(Alternatively, Compressed Class Pointers aren't actually being enforced, and I'm missing something else.)

best regards,
Julian

From forax at univ-mlv.fr Sun May 29 15:38:07 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 29 May 2022 17:38:07 +0200 (CEST)
Subject: Compressed Class Pointers
In-Reply-To:
References:
Message-ID: <1802102141.14337273.1653838687119.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Julian Waters"
> To: lilliput-dev at openjdk.java.net
> Sent: Sunday, May 29, 2022 2:37:15 PM
> Subject: Compressed Class Pointers

> Hi all,
>
> Apologies for the sudden noise in the list, but I've come to notice that
> Lilliput seems to be enforcing Compressed Class Pointers if the running
> HotSpot VM happens to be 64-bit. (
> https://github.com/openjdk/lilliput/blob/9f4a50febc483a342ef09ac0c08f6d742857178d/src/hotspot/share/oops/oop.inline.hpp#L99
> )
>
> I'm admittedly a little behind on the work done with Lilliput, so I'm not
> entirely sure what the rationale behind this choice is, is there anywhere I
> could go to for a little catching up? (Alternatively Compressed Class
> Pointers aren't actually being enforced, and I'm mistakenly missing
> something else)
>
> best regards,
> Julian

Hi Julian,
The aim of Lilliput is to reduce the size of the object header to 64 bits or, better, to 32 bits if possible.
If a class pointer is 64 bits, i.e. the size of an uncompressed pointer on a 64-bit VM, the game is already lost.

regards,
Rémi
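
The bit-budget argument in the reply above can be made concrete with a little arithmetic. The sketch below assumes the layout described in the Milestone 1 announcement earlier in this archive (a single 64-bit header word with the 32-bit compressed Klass* in the upper 32 bits). The constant and function names are hypothetical, and nothing here is taken from the Lilliput sources.

```cpp
#include <cstdint>

// Assumed layout, following the Milestone 1 description: one 64-bit header
// word, compressed Klass* in the upper 32 bits. Names are illustrative only.
constexpr unsigned HEADER_BITS       = 64;
constexpr unsigned NARROW_KLASS_BITS = 32;
constexpr unsigned KLASS_SHIFT       = HEADER_BITS - NARROW_KLASS_BITS;

// Bits left over for lock bits, GC age, identity hash, etc.
constexpr unsigned remaining_bits(unsigned header_bits, unsigned klass_bits) {
  return header_bits - klass_bits;
}

// With a compressed class pointer, half of the header stays available.
static_assert(remaining_bits(HEADER_BITS, NARROW_KLASS_BITS) == 32, "32 bits left");
// With an uncompressed 64-bit class pointer, nothing is left.
static_assert(remaining_bits(HEADER_BITS, 64) == 0, "no room for anything else");

// Pack and unpack a narrow Klass* in the upper half of the header word.
constexpr uint64_t make_header(uint32_t narrow_klass, uint32_t low_bits) {
  return (static_cast<uint64_t>(narrow_klass) << KLASS_SHIFT) | low_bits;
}
constexpr uint32_t narrow_klass_of(uint64_t header) {
  return static_cast<uint32_t>(header >> KLASS_SHIFT);
}
constexpr uint32_t low_bits_of(uint64_t header) {
  return static_cast<uint32_t>(header);  // truncates to the lower 32 bits
}
```

This deliberately ignores the displaced-header cases (stack-lock, object monitor, GC forwarding pointer) that, according to the announcement, must be checked before the Klass* can be extracted.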