From jrose at openjdk.java.net Thu Jul 1 01:46:33 2021 From: jrose at openjdk.java.net (John R Rose) Date: Thu, 1 Jul 2021 01:46:33 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits In-Reply-To: References:

Message-ID: On Wed, 30 Jun 2021 09:22:16 GMT, Roman Kennke wrote: > For this PR, I'd like to take the BL bit. But I believe it should be ok, because it's only relevant during GC and only if the lowest two bits are also set, when doing self-forwarding in case of a promotion failure. In this scenario, the header bits would be overwritten anyway, and GC would have to ensure that the header (including the BL bit) is preserved if there is anything interesting in the lower header bits. I think I agree, given that the BL bit will be used only while the header is in the special forwarding state and/or during a safepoint. Except for the larval marking (which is akin to a locking state), these Valhalla header bits are really hoisted copies of properties that can be reconstituted from the `Klass` block associated with the object. So, worst case, the GC might blow them away in favor of a forwarding pointer, and then "heal" them by re-fetching header bits from the `Klass`. The larval marking should probably be healed (if necessary) by whatever mechanism takes care of lock states. ------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From rkennke at openjdk.java.net Mon Jul 5 15:09:11 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 5 Jul 2021 15:09:11 GMT Subject: [master] RFR: Implement sliding forwarding scheme that preserves upper header bits [v12] In-Reply-To: References: Message-ID: > The current way of storing forwarding information - by storing the forwardee address into the header - poses a problem for sliding collectors: it overrides the upper bits that keep the (compressed) Klass*. Sliding collectors have no way to recover the Klass*, and therefore need a different forwarding scheme that preserves the upper bits. 
I propose a scheme that compresses the forwarding pointer and by taking advantage of the fact that each region only ever forwards to at most two other regions (when dividing the heap into equal-sized logical regions), it can address all of the heap, regardless of its size. This obviously works well with regionalized collectors, but it also works with contiguous-heap collectors like serial or parallel GC. The latter would divide the heap into logical regions sized by 4G (or single region if heap is smaller than that). Notice that evacuating and scavenging collectors don't have this problem: they can safely stomp over the from-space copy of objects, the Klass* information is preserved in the to-space copy. > > G1 is special: the fallback from parallel to serial full GC means that we can have maximum of N target regions with N == ParallelGCThreads. For this reason, I use an encoding with 5 bits for target region index, that is enough to address 32 target regions from each region, assuming a maximum heap region size of 32M. In the future we will be able to steal bits from the Klass* in the upper half of the header, allowing us to address even more regions. Alternatively, we can change the full-GC fallback to re-forward all objects serially, instead of trying to re-use the already-forwarded state from the parallel full GC. > > Serial GC and Shenandoah GC use only 1 bit to address target regions, because they really can only ever have 2 target regions out of a single region. 
> > Testing: > - [x] manual testing with +passive and -degen-gc > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 > - [x] tier1 (x86_32) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Disable G1 serial full-GC, simplify sliding forwarding ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/8/files - new: https://git.openjdk.java.net/lilliput/pull/8/files/92c2f1ae..83a607f2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=8&range=11 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=8&range=10-11 Stats: 222 lines in 27 files changed: 84 ins; 78 del; 60 mod Patch: https://git.openjdk.java.net/lilliput/pull/8.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/8/head:pull/8 PR: https://git.openjdk.java.net/lilliput/pull/8 From rkennke at openjdk.java.net Mon Jul 5 16:38:27 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 5 Jul 2021 16:38:27 GMT Subject: [master] RFR: Implement sliding forwarding scheme that preserves upper header bits [v13] In-Reply-To: References: Message-ID: > The current way of storing forwarding information - by storing the forwardee address into the header - poses a problem for sliding collectors: it overrides the upper bits that keep the (compressed) Klass*. Sliding collectors have no way to recover the Klass*, and therefore need a different forwarding scheme that preserves the upper bits. I propose a scheme that compresses the forwarding pointer and by taking advantage of the fact that each region only ever forwards to at most two other regions (when dividing the heap into equal-sized logical regions), it can address all of the heap, regardless of its size. This obviously works well with regionalized collectors, but it also works with contiguous-heap collectors like serial or parallel GC. The latter would divide the heap into logical regions sized by 4G (or single region if heap is smaller than that). 
Notice that evacuating and scavenging collectors don't have this problem: they can safely stomp over the from-space copy of objects, the Klass* information is preserved in the to-space copy. > > G1 is special: the fallback from parallel to serial full GC means that we can have maximum of N target regions with N == ParallelGCThreads. For this reason, I use an encoding with 5 bits for target region index, that is enough to address 32 target regions from each region, assuming a maximum heap region size of 32M. In the future we will be able to steal bits from the Klass* in the upper half of the header, allowing us to address even more regions. Alternatively, we can change the full-GC fallback to re-forward all objects serially, instead of trying to re-use the already-forwarded state from the parallel full GC. > > Serial GC and Shenandoah GC use only 1 bit to address target regions, because they really can only ever have 2 target regions out of a single region. > > Testing: > - [x] manual testing with +passive and -degen-gc > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 > - [x] tier1 (x86_32) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Merge branch 'master' into sliding-forwarding - Disable G1 serial full-GC, simplify sliding forwarding - Merge branch 'master' into sliding-forwarding - Fixes for G1 - SlidingForwarding support for G1GC - Const-ness fixes for sliding forwarding in Serial GC - Build fixes for templated SlidingForwarding - Use sliding forwarding in Serial GC - Remove include of precompiled.hpp - Fix region_contains() - ... 
and 8 more: https://git.openjdk.java.net/lilliput/compare/81f823a4...de1d4e09 ------------- Changes: https://git.openjdk.java.net/lilliput/pull/8/files Webrev: https://webrevs.openjdk.java.net/?repo=lilliput&pr=8&range=12 Stats: 443 lines in 31 files changed: 381 ins; 6 del; 56 mod Patch: https://git.openjdk.java.net/lilliput/pull/8.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/8/head:pull/8 PR: https://git.openjdk.java.net/lilliput/pull/8 From rkennke at openjdk.java.net Mon Jul 5 18:14:08 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 5 Jul 2021 18:14:08 GMT Subject: [master] RFR: Implement sliding forwarding scheme that preserves upper header bits [v14] In-Reply-To: References: Message-ID: > The current way of storing forwarding information - by storing the forwardee address into the header - poses a problem for sliding collectors: it overrides the upper bits that keep the (compressed) Klass*. Sliding collectors have no way to recover the Klass*, and therefore need a different forwarding scheme that preserves the upper bits. I propose a scheme that compresses the forwarding pointer and by taking advantage of the fact that each region only ever forwards to at most two other regions (when dividing the heap into equal-sized logical regions), it can address all of the heap, regardless of its size. This obviously works well with regionalized collectors, but it also works with contiguous-heap collectors like serial or parallel GC. The latter would divide the heap into logical regions sized by 4G (or single region if heap is smaller than that). Notice that evacuating and scavenging collectors don't have this problem: they can safely stomp over the from-space copy of objects, the Klass* information is preserved in the to-space copy. > > G1 is special: the fallback from parallel to serial full GC means that we can have maximum of N target regions with N == ParallelGCThreads. 
For this reason, I use an encoding with 5 bits for target region index, that is enough to address 32 target regions from each region, assuming a maximum heap region size of 32M. In the future we will be able to steal bits from the Klass* in the upper half of the header, allowing us to address even more regions. Alternatively, we can change the full-GC fallback to re-forward all objects serially, instead of trying to re-use the already-forwarded state from the parallel full GC. > > Serial GC and Shenandoah GC use only 1 bit to address target regions, because they really can only ever have 2 target regions out of a single region. > > Testing: > - [x] manual testing with +passive and -degen-gc > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 > - [x] tier1 (x86_32) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Some cosmetic fixes ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/8/files - new: https://git.openjdk.java.net/lilliput/pull/8/files/de1d4e09..7d41e930 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=8&range=13 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=8&range=12-13 Stats: 10 lines in 8 files changed: 5 ins; 2 del; 3 mod Patch: https://git.openjdk.java.net/lilliput/pull/8.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/8/head:pull/8 PR: https://git.openjdk.java.net/lilliput/pull/8 From shade at openjdk.java.net Wed Jul 7 17:28:18 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 7 Jul 2021 17:28:18 GMT Subject: [master] RFR: Implement sliding forwarding scheme that preserves upper header bits [v14] In-Reply-To: References:

Message-ID: On Mon, 5 Jul 2021 18:14:08 GMT, Roman Kennke wrote: >> The current way of storing forwarding information - by storing the forwardee address into the header - poses a problem for sliding collectors: it overrides the upper bits that keep the (compressed) Klass*. Sliding collectors have no way to recover the Klass*, and therefore need a different forwarding scheme that preserves the upper bits. I propose a scheme that compresses the forwarding pointer and by taking advantage of the fact that each region only ever forwards to at most two other regions (when dividing the heap into equal-sized logical regions), it can address all of the heap, regardless of its size. This obviously works well with regionalized collectors, but it also works with contiguous-heap collectors like serial or parallel GC. The latter would divide the heap into logical regions sized by 4G (or single region if heap is smaller than that). Notice that evacuating and scavenging collectors don't have this problem: they can safely stomp over the from-space copy of objects, the Klass* information is preserved in the to-space copy. >> >> G1 is special: the fallback from parallel to serial full GC means that we can have maximum of N target regions with N == ParallelGCThreads. For this reason, I use an encoding with 5 bits for target region index, that is enough to address 32 target regions from each region, assuming a maximum heap region size of 32M. In the future we will be able to steal bits from the Klass* in the upper half of the header, allowing us to address even more regions. Alternatively, we can change the full-GC fallback to re-forward all objects serially, instead of trying to re-use the already-forwarded state from the parallel full GC. >> >> Serial GC and Shenandoah GC use only 1 bit to address target regions, because they really can only ever have 2 target regions out of a single region. 
>> >> Testing: >> - [x] manual testing with +passive and -degen-gc >> - [x] hotspot_gc_shenandoah >> - [x] tier1 >> - [x] tier2 >> - [x] tier1 (x86_32) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Some cosmetic fixes Okay, I admit this is cute! I have a few cosmetic comments, but otherwise, it looks good for the experimental code. I would probably have a round of code tidy-ups later... src/hotspot/share/gc/g1/g1FullCollector.cpp line 309: > 307: > 308: // To avoid OOM when there is memory left. > 309: // NOTE: Disabled for now because it violates sliding-forwarding assumption. I think it should be "TODO", so source analyzers can highlight this. src/hotspot/share/gc/g1/g1FullGCOopClosures.inline.hpp line 92: > 90: // Forwarded, just update. > 91: oop forwardee = _forwarding->forwardee(obj); > 92: assert(G1CollectedHeap::heap()->is_in_reserved(forwardee), "should be in object space: " PTR_FORMAT "(" PTR_FORMAT ", " PTR_FORMAT ") " INTPTR_FORMAT, p2i(forwardee), p2i(G1CollectedHeap::heap()->reserved().start()), p2i(G1CollectedHeap::heap()->reserved().end()), obj->mark().value()); Style: break the line here. src/hotspot/share/gc/shared/preservedMarks.hpp line 67: > 65: // Iterate over the stack, adjust all preserved marks according > 66: // to their forwarding location stored in the mark. > 67: void adjust_during_full_gc(); Does this method have any uses? Should those be updated to use `forwarding`? If not, it should be removed for safety? src/hotspot/share/gc/shared/slidingForwarding.hpp line 40: > 38: * The idea is to use a pointer compression scheme very similar to the one that is used for compressed oops. > 39: * We divide the heap into number of logical regions. Each region spans maximum of 2^NUM_BITS words. 
> 40: * We take advantage of the fact that sliding compaction can forward objects from ore region to a maximum of Suggestion: * We take advantage of the fact that sliding compaction can forward objects from one region to a maximum of src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 50: > 48: HeapWord* encode_base; > 49: uintptr_t region_idx; > 50: for (region_idx = 0; region_idx < (ONE << NUM_REGION_BITS); region_idx++) { I think `ONE << NUM_REGION_BITS` really deserves a separate constant. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 377: > 375: > 376: public: > 377: ShenandoahPrepareForCompactionTask(PreservedMarksSet* preserved_marks, ShenandoahHeapRegionSet **worker_slices) : Irrelevant change. test/hotspot/jtreg/gc/stress/TestMultiThreadStressRSet.java line 52: > 50: * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI > 51: * -XX:+UseG1GC -XX:G1SummarizeRSetStatsPeriod=100 -Xlog:gc > 52: * -Xmx1100M -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=1000 gc.stress.TestMultiThreadStressRSet 60 16 Why this change? ------------- Marked as reviewed by shade (Committer). PR: https://git.openjdk.java.net/lilliput/pull/8 From shade at openjdk.java.net Wed Jul 7 17:43:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 7 Jul 2021 17:43:38 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits In-Reply-To: References: Message-ID: On Mon, 28 Jun 2021 13:49:09 GMT, Roman Kennke wrote: > In a few places in GCs we self-forward objects to indicate promotion failures. This is problematic with Lilliput because it irreversibly overrides header bits. > > I propose to address this problem by using the recently-free'd biased-locking bit to indicate self-forwarding, without actually writing the ptr-to-self in the header. > > A few notes: > - The code in g1FullGCCompactionPoint.cpp keeps degenerating into ugly mess. This certainly warrants some rewriting. 
> - We have some naked header-decodings, which get tidied-up. This could also be brought upstream. > - cas_forward_to() kinda duplicates forward_to_atomic(), and has been replaced by the new forward_to_self_atomic(). It could be unduplicated upstream, too. > > An alternative *may be* to preserve the header of self-forwarded objects in a side-table (like PreservedMarksStack) instead. This may be possible but hairy: we could not access the compressed-klass* in the upper bits until the header gets restored. (This also affects calls to size(), etc). > > The ex-biased-locking-bit may still be used in regular operation. It only acts as self-forwarding-indicator when the lower two bits are also set. It requires the usual marks-preservation if we do this. > > We might want to have a discussion which project would need header bits, and how to realistically allocate them. #4522 mentions Valhalla as possible taker of the BL header bit. We may be able to free one or two bits when we compress the klass* even more. For example, we currently use 25 bits for i-hash, and 32 bits for nklass*. We usually want 32bits for i-hash instead. This would leave 25bits for nklass, which can address 268MB of Klass*-space (usual compression scheme), or 32million classes (table-lookup), or something in-between if we use fixed-size Klass (seems unrealistic though). Taking away another bit means halving the addressable space. > > (It would be kinda-nice to have the BL-bit for Shenandoah, too, and for a similar purpose: indicate evacuation failure. But we do have a working solution, however that is ugly and affects performance a little.) This looks generally fine for the experimental code (i.e. no obvious bugs), I just have a few questions. src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp line 171: > 169: // We only re-prepare objects forwarded within the current region, so > 170: // skip objects that are already forwarded to another region. 
> 171: if (obj->is_forwarded()) { Yeah, this seems like a nice micro-optimization for upstream. Do you want to take care of it? src/hotspot/share/oops/markWord.hpp line 135: > 133: static const uintptr_t marked_value = 3; > 134: > 135: static const uintptr_t self_forwarded_value = 1 << self_forwarded_shift; This actually looks like `self_forwarded_in_place`? Because `self_forwarded_value` is just `1`. src/hotspot/share/oops/oop.hpp line 250: > 248: > 249: inline void forward_to(oop p); > 250: inline void forward_to_self(); Yes, this rename makes total sense for upstream. Want to do it? src/hotspot/share/oops/oop.inline.hpp line 314: > 312: return forwardee(old_mark); > 313: } else { > 314: compare = old_mark; Is it even possible to get to this branch? I.e. when CAS fails, aren't we guaranteed to see `old_mark.is_marked()`? ------------- Marked as reviewed by shade (Committer). PR: https://git.openjdk.java.net/lilliput/pull/10 From rkennke at openjdk.java.net Wed Jul 7 19:47:30 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 7 Jul 2021 19:47:30 GMT Subject: [master] RFR: Implement sliding forwarding scheme that preserves upper header bits [v15] In-Reply-To: References: Message-ID: > The current way of storing forwarding information - by storing the forwardee address into the header - poses a problem for sliding collectors: it overrides the upper bits that keep the (compressed) Klass*. Sliding collectors have no way to recover the Klass*, and therefore need a different forwarding scheme that preserves the upper bits. I propose a scheme that compresses the forwarding pointer and by taking advantage of the fact that each region only ever forwards to at most two other regions (when dividing the heap into equal-sized logical regions), it can address all of the heap, regardless of its size. This obviously works well with regionalized collectors, but it also works with contiguous-heap collectors like serial or parallel GC. 
The latter would divide the heap into logical regions sized by 4G (or single region if heap is smaller than that). Notice that evacuating and scavenging collectors don't have this problem: they can safely stomp over the from-space copy of objects, the Klass* information is preserved in the to-space copy. > > G1 is special: the fallback from parallel to serial full GC means that we can have maximum of N target regions with N == ParallelGCThreads. For this reason, I use an encoding with 5 bits for target region index, that is enough to address 32 target regions from each region, assuming a maximum heap region size of 32M. In the future we will be able to steal bits from the Klass* in the upper half of the header, allowing us to address even more regions. Alternatively, we can change the full-GC fallback to re-forward all objects serially, instead of trying to re-use the already-forwarded state from the parallel full GC. > > Serial GC and Shenandoah GC use only 1 bit to address target regions, because they really can only ever have 2 target regions out of a single region. 
> > Testing: > - [x] manual testing with +passive and -degen-gc > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 > - [x] tier1 (x86_32) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Aleksey's comments ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/8/files - new: https://git.openjdk.java.net/lilliput/pull/8/files/7d41e930..af4420fd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=8&range=14 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=8&range=13-14 Stats: 20 lines in 8 files changed: 9 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/lilliput/pull/8.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/8/head:pull/8 PR: https://git.openjdk.java.net/lilliput/pull/8 From rkennke at openjdk.java.net Wed Jul 7 19:47:35 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 7 Jul 2021 19:47:35 GMT Subject: [master] RFR: Implement sliding forwarding scheme that preserves upper header bits [v14] In-Reply-To: References:

Message-ID: On Wed, 7 Jul 2021 17:01:48 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Some cosmetic fixes > > src/hotspot/share/gc/g1/g1FullGCOopClosures.inline.hpp line 92: > >> 90: // Forwarded, just update. >> 91: oop forwardee = _forwarding->forwardee(obj); >> 92: assert(G1CollectedHeap::heap()->is_in_reserved(forwardee), "should be in object space: " PTR_FORMAT "(" PTR_FORMAT ", " PTR_FORMAT ") " INTPTR_FORMAT, p2i(forwardee), p2i(G1CollectedHeap::heap()->reserved().start()), p2i(G1CollectedHeap::heap()->reserved().end()), obj->mark().value()); > > Style: break the line here. I'd rather remove the new debug output. > src/hotspot/share/gc/shared/preservedMarks.hpp line 67: > >> 65: // Iterate over the stack, adjust all preserved marks according >> 66: // to their forwarding location stored in the mark. >> 67: void adjust_during_full_gc(); > > Does this method have any uses? Should those be updated to use `forwarding`? If not, it should be removed for safety? Actually not, except in a unit test. This test should be written to use a forwarding instead, but this probably requires a mockable/abstract forwarding that can be plugged in for the purpose of the test. I added TODOs and postpone this to later. > test/hotspot/jtreg/gc/stress/TestMultiThreadStressRSet.java line 52: > >> 50: * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >> 51: * -XX:+UseG1GC -XX:G1SummarizeRSetStatsPeriod=100 -Xlog:gc >> 52: * -Xmx1100M -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=1000 gc.stress.TestMultiThreadStressRSet 60 16 > > Why this change? This is related to disabling the last-last-ditch serial full GC. Apparently, in the old configuration, the serial-full-GC would free enough remaining regions to make this test work. Without the change, it runs into OOM. 
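For readers following the thread, the encoding described in the PR summary can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code in slidingForwarding.hpp: the class name `SlidingForwardingSketch`, all constants, the region size of 2^10 words, and the use of plain word indices instead of `HeapWord*` are hypothetical. It shows the one-bit variant (two target regions per source region) and keeps the upper 32 header bits untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants for illustration; the real values live in HotSpot.
constexpr unsigned kRegionShift = 10;                          // 2^10 words per logical region (assumed)
constexpr uint64_t kRegionMask  = (uint64_t(1) << kRegionShift) - 1;
constexpr unsigned kAltBits     = 1;                           // 1 bit selects one of 2 target regions
constexpr unsigned kNumAlts     = 1u << kAltBits;
constexpr uint64_t kUnused      = ~uint64_t(0);                // marks an unclaimed table slot
constexpr uint64_t kMarked      = 3;                           // low two bits set means "forwarded"
constexpr unsigned kLowShift    = 2;                           // payload sits above the two low bits
constexpr uint64_t kLowerMask   = 0xFFFFFFFFull;               // lower half of the header

// Simplified model: addresses are word indices into the heap.
class SlidingForwardingSketch {
  // For each source region, the (at most kNumAlts) target regions it forwards to.
  std::vector<uint64_t> _targets;

public:
  explicit SlidingForwardingSketch(std::size_t num_regions)
    : _targets(num_regions * kNumAlts, kUnused) {}

  // Returns the header with the compressed forwarding written into the
  // lower 32 bits; the upper 32 bits (the compressed Klass*) are preserved.
  uint64_t forward(uint64_t header, uint64_t from, uint64_t to) {
    uint64_t from_region = from >> kRegionShift;
    uint64_t to_region   = to >> kRegionShift;
    for (unsigned alt = 0; alt < kNumAlts; alt++) {
      uint64_t& slot = _targets[from_region * kNumAlts + alt];
      if (slot == kUnused) slot = to_region;   // claim a free slot for this target region
      if (slot == to_region) {
        uint64_t offset  = to & kRegionMask;
        uint64_t encoded = (((offset << kAltBits) | alt) << kLowShift) | kMarked;
        return (header & ~kLowerMask) | encoded;
      }
    }
    assert(false && "more target regions than the encoding supports");
    return header;
  }

  // Decodes the forwardee from a header previously produced by forward().
  uint64_t forwardee(uint64_t header, uint64_t from) const {
    assert((header & kMarked) == kMarked);
    uint64_t payload   = (header & kLowerMask) >> kLowShift;
    unsigned alt       = static_cast<unsigned>(payload & (kNumAlts - 1));
    uint64_t offset    = payload >> kAltBits;
    uint64_t to_region = _targets[(from >> kRegionShift) * kNumAlts + alt];
    return (to_region << kRegionShift) | offset;
  }
};
```

In this model an object in region 2 can be forwarded into regions 0 and 1, but a third distinct target region would trip the assert, which mirrors why the G1 serial fallback (up to N target regions) needed either more bits or to be disabled.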
------------- PR: https://git.openjdk.java.net/lilliput/pull/8 From rkennke at openjdk.java.net Wed Jul 7 20:10:05 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 7 Jul 2021 20:10:05 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits In-Reply-To: References:

Message-ID: On Wed, 7 Jul 2021 17:29:50 GMT, Aleksey Shipilev wrote: >> In a few places in GCs we self-forward objects to indicate promotion failures. This is problematic with Lilliput because it irreversibly overrides header bits. >> >> I propose to address this problem by using the recently-free'd biased-locking bit to indicate self-forwarding, without actually writing the ptr-to-self in the header. >> >> A few notes: >> - The code in g1FullGCCompactionPoint.cpp keeps degenerating into ugly mess. This certainly warrants some rewriting. >> - We have some naked header-decodings, which get tidied-up. This could also be brought upstream. >> - cas_forward_to() kinda duplicates forward_to_atomic(), and has been replaced by the new forward_to_self_atomic(). It could be unduplicated upstream, too. >> >> An alternative *may be* to preserve the header of self-forwarded objects in a side-table (like PreservedMarksStack) instead. This may be possible but hairy: we could not access the compressed-klass* in the upper bits until the header gets restored. (This also affects calls to size(), etc). >> >> The ex-biased-locking-bit may still be used in regular operation. It only acts as self-forwarding-indicator when the lower two bits are also set. It requires the usual marks-preservation if we do this. >> >> We might want to have a discussion which project would need header bits, and how to realistically allocate them. #4522 mentions Valhalla as possible taker of the BL header bit. We may be able to free one or two bits when we compress the klass* even more. For example, we currently use 25 bits for i-hash, and 32 bits for nklass*. We usually want 32bits for i-hash instead. This would leave 25bits for nklass, which can address 268MB of Klass*-space (usual compression scheme), or 32million classes (table-lookup), or something in-between if we use fixed-size Klass (seems unrealistic though). Taking away another bit means halving the addressable space. 
>> >> (It would be kinda-nice to have the BL-bit for Shenandoah, too, and for a similar purpose: indicate evacuation failure. But we do have a working solution, however that is ugly and affects performance a little.) > > src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp line 171: > >> 169: // We only re-prepare objects forwarded within the current region, so >> 170: // skip objects that are already forwarded to another region. >> 171: if (obj->is_forwarded()) { > > Yeah, this seems like a nice micro-optimization for upstream. Do you want to take care of it? Naa, this doesn't really do much, unless we extract the header once, and then check is_forwarded() on it, and avoid decoding if it's not forwarded - but even then this seems to be too micro ;-) Here it is not an optimization but a necessary change, because we cannot blindly decode the forwardee anymore. > src/hotspot/share/oops/markWord.hpp line 135: > >> 133: static const uintptr_t marked_value = 3; >> 134: >> 135: static const uintptr_t self_forwarded_value = 1 << self_forwarded_shift; > > This actually looks like `self_forwarded_in_place`? Because `self_forwarded_value` is just `1`. It's actually self_forwarded_mask_in_place, I removed the _value and simplified the set_self_forwarded() method. > src/hotspot/share/oops/oop.inline.hpp line 314: > >> 312: return forwardee(old_mark); >> 313: } else { >> 314: compare = old_mark; > > Is it even possible to get to this branch? I.e. when CAS fails, aren't we guaranteed to see `old_mark.is_marked()`? I don't think so. I thought a little too far here, Shenandoah-style territory. I removed this branch, and the whole loop. 
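The bit layout discussed above can be sketched as a standalone model. This is an illustration under assumptions, not the markWord.hpp code: the `sketch` namespace, the free functions, and the choice of bit 2 (the former biased-locking bit) for `self_forwarded_shift` are hypothetical; only `marked_value = 3` and the name `self_forwarded_mask_in_place` come from the review comments.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

constexpr uint64_t lock_mask_in_place = 3;      // the two lowest header bits
constexpr uint64_t marked_value       = 3;      // both set: object is "marked"/forwarded
constexpr unsigned self_forwarded_shift = 2;    // assumed: the former biased-locking bit
constexpr uint64_t self_forwarded_mask_in_place = uint64_t(1) << self_forwarded_shift;

// True if the lower two bits carry the marked pattern.
inline bool is_marked(uint64_t mark) {
  return (mark & lock_mask_in_place) == marked_value;
}

// The bit only means "self-forwarded" when the lower two bits are also set;
// in any other state it remains available for regular use.
inline bool is_self_forwarded(uint64_t mark) {
  return is_marked(mark) && (mark & self_forwarded_mask_in_place) != 0;
}

// Flags the object as self-forwarded without writing a pointer-to-self,
// so every bit above the lowest three survives. The lowest three bits are
// overwritten, which is why the usual mark preservation still applies.
inline uint64_t set_self_forwarded(uint64_t mark) {
  return mark | self_forwarded_mask_in_place | marked_value;
}

} // namespace sketch
```

A concurrent variant would install the result of `set_self_forwarded()` with a single compare-and-swap and, on failure, return whatever forwarding the winning thread installed; as noted in the reply above, no retry loop is needed for that.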
------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From rkennke at openjdk.java.net Wed Jul 7 20:15:50 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 7 Jul 2021 20:15:50 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits [v2] In-Reply-To: References: Message-ID: > In a few places in GCs we self-forward objects to indicate promotion failures. This is problematic with Lilliput because it irreversibly overrides header bits. > > I propose to address this problem by using the recently-free'd biased-locking bit to indicate self-forwarding, without actually writing the ptr-to-self in the header. > > A few notes: > - The code in g1FullGCCompactionPoint.cpp keeps degenerating into ugly mess. This certainly warrants some rewriting. > - We have some naked header-decodings, which get tidied-up. This could also be brought upstream. > - cas_forward_to() kinda duplicates forward_to_atomic(), and has been replaced by the new forward_to_self_atomic(). It could be unduplicated upstream, too. > > An alternative *may be* to preserve the header of self-forwarded objects in a side-table (like PreservedMarksStack) instead. This may be possible but hairy: we could not access the compressed-klass* in the upper bits until the header gets restored. (This also affects calls to size(), etc). > > The ex-biased-locking-bit may still be used in regular operation. It only acts as self-forwarding-indicator when the lower two bits are also set. It requires the usual marks-preservation if we do this. > > We might want to have a discussion which project would need header bits, and how to realistically allocate them. #4522 mentions Valhalla as possible taker of the BL header bit. We may be able to free one or two bits when we compress the klass* even more. For example, we currently use 25 bits for i-hash, and 32 bits for nklass*. We usually want 32bits for i-hash instead. 
This would leave 25bits for nklass, which can address 268MB of Klass*-space (usual compression scheme), or 32million classes (table-lookup), or something in-between if we use fixed-size Klass (seems unrealistic though). Taking away another bit means halving the addressable space. > > (It would be kinda-nice to have the BL-bit for Shenandoah, too, and for a similar purpose: indicate evacuation failure. But we do have a working solution, however that is ugly and affects performance a little.) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Alekey's comments ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/10/files - new: https://git.openjdk.java.net/lilliput/pull/10/files/e4a27baa..52062320 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=10&range=01 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=10&range=00-01 Stats: 15 lines in 2 files changed: 0 ins; 6 del; 9 mod Patch: https://git.openjdk.java.net/lilliput/pull/10.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/10/head:pull/10 PR: https://git.openjdk.java.net/lilliput/pull/10 From rkennke at openjdk.java.net Wed Jul 7 20:15:51 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 7 Jul 2021 20:15:51 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits [v2] In-Reply-To: References:

Message-ID: On Wed, 7 Jul 2021 17:39:40 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Alekey's comments > > src/hotspot/share/oops/oop.hpp line 250: > >> 248: >> 249: inline void forward_to(oop p); >> 250: inline void forward_to_self(); > > Yes, this rename makes total sense for upstream. Want to do it? This is not the rename. cas_forward_to() is only used in a single place for self-forwarding, but the replacement in upstream would be forward_to_atomic(), which is doing almost exactly the same (except it returns a value not a bool). Filed: https://bugs.openjdk.java.net/browse/JDK-8270041 ------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From shade at openjdk.java.net Thu Jul 8 06:43:59 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 8 Jul 2021 06:43:59 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits [v2] In-Reply-To: References:

Message-ID: <4oMU8WM73IEJzULWcAE-lp9HqwkxkFyABonYkl65NXU=.dbe76e06-5546-4b5a-890c-67cf6767e5bb@github.com> On Wed, 7 Jul 2021 20:15:50 GMT, Roman Kennke wrote: >> In a few places in GCs we self-forward objects to indicate promotion failures. This is problematic with Lilliput because it irreversibly overrides header bits. >> >> I propose to address this problem by using the recently freed biased-locking bit to indicate self-forwarding, without actually writing the ptr-to-self in the header. >> >> A few notes: >> - The code in g1FullGCCompactionPoint.cpp keeps degenerating into an ugly mess. This certainly warrants some rewriting. >> - We have some naked header-decodings, which get tidied-up. This could also be brought upstream. >> - cas_forward_to() kinda duplicates forward_to_atomic(), and has been replaced by the new forward_to_self_atomic(). It could be unduplicated upstream, too. >> >> An alternative *may be* to preserve the header of self-forwarded objects in a side-table (like PreservedMarksStack) instead. This may be possible but hairy: we could not access the compressed-klass* in the upper bits until the header gets restored. (This also affects calls to size(), etc). >> >> The ex-biased-locking-bit may still be used in regular operation. It only acts as self-forwarding-indicator when the lower two bits are also set. It requires the usual marks-preservation if we do this. >> >> We might want to have a discussion which project would need header bits, and how to realistically allocate them. #4522 mentions Valhalla as possible taker of the BL header bit. We may be able to free one or two bits when we compress the klass* even more. For example, we currently use 25 bits for i-hash, and 32 bits for nklass*. We usually want 32 bits for i-hash instead. This would leave 25 bits for nklass, which can address 268MB of Klass*-space (usual compression scheme), or 32 million classes (table-lookup), or something in-between if we use fixed-size Klass (seems unrealistic though). 
Taking away another bit means halving the addressable space. >> >> (It would be kinda-nice to have the BL-bit for Shenandoah, too, and for a similar purpose: indicate evacuation failure. But we do have a working solution, however that is ugly and affects performance a little.) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Alekey's comments Marked as reviewed by shade (Committer). ------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From shade at openjdk.java.net Thu Jul 8 06:44:00 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 8 Jul 2021 06:44:00 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits [v2] In-Reply-To: References:

Message-ID: On Wed, 7 Jul 2021 20:06:46 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/markWord.hpp line 135: >> >>> 133: static const uintptr_t marked_value = 3; >>> 134: >>> 135: static const uintptr_t self_forwarded_value = 1 << self_forwarded_shift; >> >> This actually looks like `self_forwarded_in_place`? Because `self_forwarded_value` is just `1`. > > It's actually self_forwarded_mask_in_place, I removed the _value and simplified the set_self_forwarded() method. Good. ------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From shade at openjdk.java.net Fri Jul 9 10:18:19 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 9 Jul 2021 10:18:19 GMT Subject: [master] RFR: Implement sliding forwarding scheme that preserves upper header bits [v15] In-Reply-To: References:

Message-ID: On Wed, 7 Jul 2021 19:47:30 GMT, Roman Kennke wrote: >> The current way of storing forwarding information - by storing the forwardee address into the header - poses a problem for sliding collectors: it overrides the upper bits that keep the (compressed) Klass*. Sliding collectors have no way to recover the Klass*, and therefore need a different forwarding scheme that preserves the upper bits. I propose a scheme that compresses the forwarding pointer and, by taking advantage of the fact that each region only ever forwards to at most two other regions (when dividing the heap into equal-sized logical regions), can address all of the heap, regardless of its size. This obviously works well with regionalized collectors, but it also works with contiguous-heap collectors like serial or parallel GC. The latter would divide the heap into logical regions sized by 4G (or single region if heap is smaller than that). Notice that evacuating and scavenging collectors don't have this problem: they can safely stomp over the from-space copy of objects, the Klass* information is preserved in the to-space copy. >> >> G1 is special: the fallback from parallel to serial full GC means that we can have a maximum of N target regions with N == ParallelGCThreads. For this reason, I use an encoding with 5 bits for target region index, that is enough to address 32 target regions from each region, assuming a maximum heap region size of 32M. In the future we will be able to steal bits from the Klass* in the upper half of the header, allowing us to address even more regions. Alternatively, we can change the full-GC fallback to re-forward all objects serially, instead of trying to re-use the already-forwarded state from the parallel full GC. >> >> Serial GC and Shenandoah GC use only 1 bit to address target regions, because they really can only ever have 2 target regions out of a single region. 
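The encoding described in the quoted proposal can be illustrated with a small standalone sketch. This is not the Lilliput code: the bit positions, the 22-bit word offset (a 32M region with 8-byte words), the helper names and the per-region `target_bases` table are all assumptions made for illustration. Only the ideas of a 5-bit target region index, a base shift of 3, and preserving the upper header bits come from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Assumed toy layout of the lower 32 header bits (illustrative only):
//   bits 0..1  : marked_value (3), meaning "forwarded"
//   bit  2     : left free (base shift of 3)
//   bits 3..7  : index of the target region (5 bits, up to 32 targets)
//   bits 8..29 : offset into the target region, in 8-byte words
constexpr uint64_t marked_value = 0x3;
constexpr int      base_shift   = 3;
constexpr int      index_bits   = 5;
constexpr int      offset_shift = base_shift + index_bits;
constexpr int      offset_bits  = 22;  // 32M region / 8-byte words
constexpr uint64_t low_mask     = (uint64_t(1) << 32) - 1;

// Pack (target index, word offset) into the lower half of the header while
// keeping the upper 32 bits (where the compressed Klass* lives) untouched.
inline uint64_t encode_forwarding(uint64_t old_header,
                                  uint32_t target_index,
                                  uint32_t word_offset) {
  assert(target_index < (1u << index_bits));
  assert(word_offset < (1u << offset_bits));
  uint64_t low = (uint64_t(word_offset) << offset_shift)
               | (uint64_t(target_index) << base_shift)
               | marked_value;
  return (old_header & ~low_mask) | low;
}

// Rebuild the forwardee address. target_bases is a hypothetical table, kept
// per source region, holding the start address of each target region.
inline uint64_t decode_forwardee(uint64_t header, const uint64_t* target_bases) {
  uint32_t index = static_cast<uint32_t>(
      (header >> base_shift) & ((uint64_t(1) << index_bits) - 1));
  uint64_t offset = (header >> offset_shift) & ((uint64_t(1) << offset_bits) - 1);
  return target_bases[index] + offset * 8;
}
```

Because only a small index plus a region-relative offset is stored, the header never needs to hold a full 64-bit address, which is what leaves the upper bits intact.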
>> >> Testing: >> - [x] manual testing with +passive and -degen-gc >> - [x] hotspot_gc_shenandoah >> - [x] tier1 >> - [x] tier2 >> - [x] tier1 (x86_32) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Aleksey's comments Marked as reviewed by shade (Committer). ------------- PR: https://git.openjdk.java.net/lilliput/pull/8 From rkennke at openjdk.java.net Fri Jul 9 11:37:27 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 9 Jul 2021 11:37:27 GMT Subject: [master] Integrated: Implement sliding forwarding scheme that preserves upper header bits In-Reply-To: References: Message-ID: <7jlgWVwamtrjGl1Ar6ZbeRSKCqwi_3TMJJHcViMDQSE=.69ea0517-9771-4b51-ad9b-5003da9487ae@github.com> On Thu, 10 Jun 2021 13:06:18 GMT, Roman Kennke wrote: > The current way of storing forwarding information - by storing the forwardee address into the header - poses a problem for sliding collectors: it overrides the upper bits that keep the (compressed) Klass*. Sliding collectors have no way to recover the Klass*, and therefore need a different forwarding scheme that preserves the upper bits. I propose a scheme that compresses the forwarding pointer and, by taking advantage of the fact that each region only ever forwards to at most two other regions (when dividing the heap into equal-sized logical regions), can address all of the heap, regardless of its size. This obviously works well with regionalized collectors, but it also works with contiguous-heap collectors like serial or parallel GC. The latter would divide the heap into logical regions sized by 4G (or single region if heap is smaller than that). Notice that evacuating and scavenging collectors don't have this problem: they can safely stomp over the from-space copy of objects, the Klass* information is preserved in the to-space copy. 
> > G1 is special: the fallback from parallel to serial full GC means that we can have a maximum of N target regions with N == ParallelGCThreads. For this reason, I use an encoding with 5 bits for target region index, that is enough to address 32 target regions from each region, assuming a maximum heap region size of 32M. In the future we will be able to steal bits from the Klass* in the upper half of the header, allowing us to address even more regions. Alternatively, we can change the full-GC fallback to re-forward all objects serially, instead of trying to re-use the already-forwarded state from the parallel full GC. > > Serial GC and Shenandoah GC use only 1 bit to address target regions, because they really can only ever have 2 target regions out of a single region. > > Testing: > - [x] manual testing with +passive and -degen-gc > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 > - [x] tier1 (x86_32) This pull request has now been integrated. Changeset: 5f6cb177 Author: Roman Kennke URL: https://git.openjdk.java.net/lilliput/commit/5f6cb1777d4b9cf51486ad33f8b975bf29d8d3a0 Stats: 448 lines in 31 files changed: 392 ins; 5 del; 51 mod Implement sliding forwarding scheme that preserves upper header bits Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/lilliput/pull/8 From rkennke at openjdk.java.net Fri Jul 9 12:12:37 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 9 Jul 2021 12:12:37 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits [v3] In-Reply-To: References: Message-ID: > In a few places in GCs we self-forward objects to indicate promotion failures. This is problematic with Lilliput because it irreversibly overrides header bits. > > I propose to address this problem by using the recently freed biased-locking bit to indicate self-forwarding, without actually writing the ptr-to-self in the header. 
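A minimal sketch of the self-forwarding idea, under stated assumptions: the header is modeled as a plain 64-bit integer, the former biased-locking bit is taken to be bit 2 (`self_forwarded_shift` here is an assumed value), and the free functions are hypothetical stand-ins rather than the actual `markWord` methods. The names `marked_value` and `self_forwarded_mask_in_place` follow the review comments in this thread.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t lock_mask            = 0x3;  // lowest two header bits
constexpr uint64_t marked_value         = 0x3;  // both set: marked / forwarded
constexpr int      self_forwarded_shift = 2;    // assumed: former biased-locking bit
constexpr uint64_t self_forwarded_mask_in_place =
    uint64_t(1) << self_forwarded_shift;

inline bool is_marked(uint64_t header) {
  return (header & lock_mask) == marked_value;
}

// The extra bit only means "self-forwarded" while the low two bits are set.
inline bool is_self_forwarded(uint64_t header) {
  return is_marked(header) && (header & self_forwarded_mask_in_place) != 0;
}

// Flag the object as forwarded to itself without writing a pointer into the
// header, so everything above bit 2 survives unchanged.
inline uint64_t set_self_forwarded(uint64_t header) {
  return header | marked_value | self_forwarded_mask_in_place;
}
```

Note that setting the two low bits still destroys the previous lock state in this sketch, which is consistent with the remark in the thread that the usual mark preservation remains necessary.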
> > A few notes: > - The code in g1FullGCCompactionPoint.cpp keeps degenerating into an ugly mess. This certainly warrants some rewriting. > - We have some naked header-decodings, which get tidied-up. This could also be brought upstream. > - cas_forward_to() kinda duplicates forward_to_atomic(), and has been replaced by the new forward_to_self_atomic(). It could be unduplicated upstream, too. > > An alternative *may be* to preserve the header of self-forwarded objects in a side-table (like PreservedMarksStack) instead. This may be possible but hairy: we could not access the compressed-klass* in the upper bits until the header gets restored. (This also affects calls to size(), etc). > > The ex-biased-locking-bit may still be used in regular operation. It only acts as self-forwarding-indicator when the lower two bits are also set. It requires the usual marks-preservation if we do this. > > We might want to have a discussion which project would need header bits, and how to realistically allocate them. #4522 mentions Valhalla as possible taker of the BL header bit. We may be able to free one or two bits when we compress the klass* even more. For example, we currently use 25 bits for i-hash, and 32 bits for nklass*. We usually want 32 bits for i-hash instead. This would leave 25 bits for nklass, which can address 268MB of Klass*-space (usual compression scheme), or 32 million classes (table-lookup), or something in-between if we use fixed-size Klass (seems unrealistic though). Taking away another bit means halving the addressable space. > > (It would be kinda-nice to have the BL-bit for Shenandoah, too, and for a similar purpose: indicate evacuation failure. But we do have a working solution, however that is ugly and affects performance a little.) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Increase sliding forwarding base shift to 3, to leave room for self-forwarding - Merge - Alekey's comments - Implement self-forwarding of objects that preserves header bits ------------- Changes: https://git.openjdk.java.net/lilliput/pull/10/files Webrev: https://webrevs.openjdk.java.net/?repo=lilliput&pr=10&range=02 Stats: 60 lines in 10 files changed: 35 ins; 2 del; 23 mod Patch: https://git.openjdk.java.net/lilliput/pull/10.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/10/head:pull/10 PR: https://git.openjdk.java.net/lilliput/pull/10 From shade at openjdk.java.net Fri Jul 9 15:09:20 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 9 Jul 2021 15:09:20 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits [v3] In-Reply-To: References:

Message-ID: On Fri, 9 Jul 2021 12:12:37 GMT, Roman Kennke wrote: >> In a few places in GCs we self-forward objects to indicate promotion failures. This is problematic with Lilliput because it irreversibly overrides header bits. >> >> I propose to address this problem by using the recently freed biased-locking bit to indicate self-forwarding, without actually writing the ptr-to-self in the header. >> >> A few notes: >> - The code in g1FullGCCompactionPoint.cpp keeps degenerating into an ugly mess. This certainly warrants some rewriting. >> - We have some naked header-decodings, which get tidied-up. This could also be brought upstream. >> - cas_forward_to() kinda duplicates forward_to_atomic(), and has been replaced by the new forward_to_self_atomic(). It could be unduplicated upstream, too. >> >> An alternative *may be* to preserve the header of self-forwarded objects in a side-table (like PreservedMarksStack) instead. This may be possible but hairy: we could not access the compressed-klass* in the upper bits until the header gets restored. (This also affects calls to size(), etc). >> >> The ex-biased-locking-bit may still be used in regular operation. It only acts as self-forwarding-indicator when the lower two bits are also set. It requires the usual marks-preservation if we do this. >> >> We might want to have a discussion which project would need header bits, and how to realistically allocate them. #4522 mentions Valhalla as possible taker of the BL header bit. We may be able to free one or two bits when we compress the klass* even more. For example, we currently use 25 bits for i-hash, and 32 bits for nklass*. We usually want 32 bits for i-hash instead. This would leave 25 bits for nklass, which can address 268MB of Klass*-space (usual compression scheme), or 32 million classes (table-lookup), or something in-between if we use fixed-size Klass (seems unrealistic though). Taking away another bit means halving the addressable space. 
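The numbers in the quoted paragraph can be checked with a few compile-time constants. The 3-bit shift (8-byte granularity) for the "usual compression scheme" is an assumption made here to reproduce the 268MB figure, and the constant names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr int      nklass_bits  = 25;
constexpr uint64_t nklass_count = uint64_t(1) << nklass_bits;  // distinct nklass values

// Table lookup: one class per nklass value, roughly 32 million classes.
static_assert(nklass_count == 33554432, "2^25 entries");

// Shift-based compression, assuming 8-byte granularity (shift of 3).
constexpr int      assumed_klass_shift = 3;
constexpr uint64_t addressable_bytes   = nklass_count << assumed_klass_shift;
static_assert(addressable_bytes == 268435456, "2^28 bytes, about 268MB");

// Giving up one more bit halves the addressable Klass* space.
constexpr uint64_t with_24_bits = (uint64_t(1) << 24) << assumed_klass_shift;
static_assert(with_24_bits * 2 == addressable_bytes, "one bit less, half the space");
```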
>> >> (It would be kinda-nice to have the BL-bit for Shenandoah, too, and for a similar purpose: indicate evacuation failure. But we do have a working solution, however that is ugly and affects performance a little.) >> >> Testing: >> - [x] tier1 >> - [x] hotspot_gc >> - [x] hotspot_gc_shenandoah > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Increase sliding forwarding base shift to 3, to leave room for self-forwarding > - Merge > - Alekey's comments > - Implement self-forwarding of objects that preserves header bits Marked as reviewed by shade (Committer). ------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From rkennke at openjdk.java.net Fri Jul 9 15:16:20 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 9 Jul 2021 15:16:20 GMT Subject: [master] Integrated: Implement self-forwarding of objects that preserves header bits In-Reply-To: References: Message-ID: On Mon, 28 Jun 2021 13:49:09 GMT, Roman Kennke wrote: > In a few places in GCs we self-forward objects to indicate promotion failures. This is problematic with Lilliput because it irreversibly overrides header bits. > > I propose to address this problem by using the recently freed biased-locking bit to indicate self-forwarding, without actually writing the ptr-to-self in the header. > > A few notes: > - The code in g1FullGCCompactionPoint.cpp keeps degenerating into an ugly mess. This certainly warrants some rewriting. > - We have some naked header-decodings, which get tidied-up. This could also be brought upstream. > - cas_forward_to() kinda duplicates forward_to_atomic(), and has been replaced by the new forward_to_self_atomic(). It could be unduplicated upstream, too. > > An alternative *may be* to preserve the header of self-forwarded objects in a side-table (like PreservedMarksStack) instead. 
This may be possible but hairy: we could not access the compressed-klass* in the upper bits until the header gets restored. (This also affects calls to size(), etc). > > The ex-biased-locking-bit may still be used in regular operation. It only acts as self-forwarding-indicator when the lower two bits are also set. It requires the usual marks-preservation if we do this. > > We might want to have a discussion which project would need header bits, and how to realistically allocate them. #4522 mentions Valhalla as possible taker of the BL header bit. We may be able to free one or two bits when we compress the klass* even more. For example, we currently use 25 bits for i-hash, and 32 bits for nklass*. We usually want 32 bits for i-hash instead. This would leave 25 bits for nklass, which can address 268MB of Klass*-space (usual compression scheme), or 32 million classes (table-lookup), or something in-between if we use fixed-size Klass (seems unrealistic though). Taking away another bit means halving the addressable space. > > (It would be kinda-nice to have the BL-bit for Shenandoah, too, and for a similar purpose: indicate evacuation failure. But we do have a working solution, however that is ugly and affects performance a little.) > > Testing: > - [x] tier1 > - [x] hotspot_gc > - [x] hotspot_gc_shenandoah This pull request has now been integrated. Changeset: 0130503d Author: Roman Kennke URL: https://git.openjdk.java.net/lilliput/commit/0130503d8df710b24c85aadcbde0eb3307dfc2ba Stats: 60 lines in 10 files changed: 35 ins; 2 del; 23 mod Implement self-forwarding of objects that preserves header bits Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From tschatzl at openjdk.java.net Mon Jul 12 09:02:44 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 12 Jul 2021 09:02:44 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits [v3] In-Reply-To: References:

Message-ID: <_9u7ILck5pdjq5Yj0cF1kmABB26R2nChMnAUguf_K7w=.6d4b5ee0-23c7-4ef6-9d34-1a6356e61003@github.com> On Fri, 9 Jul 2021 12:12:37 GMT, Roman Kennke wrote: >> In a few places in GCs we self-forward objects to indicate promotion failures. This is problematic with Lilliput because it irreversibly overrides header bits. >> >> I propose to address this problem by using the recently freed biased-locking bit to indicate self-forwarding, without actually writing the ptr-to-self in the header. >> >> A few notes: >> - The code in g1FullGCCompactionPoint.cpp keeps degenerating into an ugly mess. This certainly warrants some rewriting. >> - We have some naked header-decodings, which get tidied-up. This could also be brought upstream. >> - cas_forward_to() kinda duplicates forward_to_atomic(), and has been replaced by the new forward_to_self_atomic(). It could be unduplicated upstream, too. >> >> An alternative *may be* to preserve the header of self-forwarded objects in a side-table (like PreservedMarksStack) instead. This may be possible but hairy: we could not access the compressed-klass* in the upper bits until the header gets restored. (This also affects calls to size(), etc). >> >> The ex-biased-locking-bit may still be used in regular operation. It only acts as self-forwarding-indicator when the lower two bits are also set. It requires the usual marks-preservation if we do this. >> >> We might want to have a discussion which project would need header bits, and how to realistically allocate them. #4522 mentions Valhalla as possible taker of the BL header bit. We may be able to free one or two bits when we compress the klass* even more. For example, we currently use 25 bits for i-hash, and 32 bits for nklass*. We usually want 32 bits for i-hash instead. This would leave 25 bits for nklass, which can address 268MB of Klass*-space (usual compression scheme), or 32 million classes (table-lookup), or something in-between if we use fixed-size Klass (seems unrealistic though). 
Taking away another bit means halving the addressable space. >> >> (It would be kinda-nice to have the BL-bit for Shenandoah, too, and for a similar purpose: indicate evacuation failure. But we do have a working solution, however that is ugly and affects performance a little.) >> >> Testing: >> - [x] tier1 >> - [x] hotspot_gc >> - [x] hotspot_gc_shenandoah > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Increase sliding forwarding base shift to 3, to leave room for self-forwarding > - Merge > - Alekey's comments > - Implement self-forwarding of objects that preserves header bits Sorry, I'm late, but I only now got the time to look into this. src/hotspot/share/gc/g1/g1OopClosures.inline.hpp line 237: > 235: markWord m = obj->mark(); > 236: if (m.is_marked()) { > 237: forwardee = obj->forwardee(m); The original code has been done that way intentionally: the changed code reloads the value from the header from memory (from what I can tell from generated code), while the original code does not, just reusing the value in the register. Even if this is a nano-nano-optimization I would prefer to keep it as is. Some of the other changes seem to cause very similar "regressions". (Yeah, I'm late, sorry). ------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From rkennke at openjdk.java.net Mon Jul 12 10:49:32 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 12 Jul 2021 10:49:32 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits [v3] In-Reply-To: <_9u7ILck5pdjq5Yj0cF1kmABB26R2nChMnAUguf_K7w=.6d4b5ee0-23c7-4ef6-9d34-1a6356e61003@github.com> References:

<_9u7ILck5pdjq5Yj0cF1kmABB26R2nChMnAUguf_K7w=.6d4b5ee0-23c7-4ef6-9d34-1a6356e61003@github.com> Message-ID: <4Ir8JgplVi6FlnZ7pjIOoP4Djc7MK6MCu-kEd0dKCng=.94bf7625-a084-42e9-b068-f6e386d09d9d@github.com> On Mon, 12 Jul 2021 08:46:06 GMT, Thomas Schatzl wrote: > The original code has been done that way intentionally: the changed code reloads the value from the header from memory (from what I can tell from generated code), while the original code does not, just reusing the value in the register. > > Even if this is a nano-nano-optimization I would prefer to keep it as is. Some of the other changes seem to cause very similar "regressions". > > (Yeah, I'm late, sorry). I think I preserved the original behavior of re-using the already-loaded header. Notice that I added an oopDesc::forwardee(markWord m) to do that. ------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From tschatzl at openjdk.java.net Mon Jul 12 13:02:47 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 12 Jul 2021 13:02:47 GMT Subject: [master] RFR: Implement self-forwarding of objects that preserves header bits [v3] In-Reply-To: <4Ir8JgplVi6FlnZ7pjIOoP4Djc7MK6MCu-kEd0dKCng=.94bf7625-a084-42e9-b068-f6e386d09d9d@github.com> References:

<_9u7ILck5pdjq5Yj0cF1kmABB26R2nChMnAUguf_K7w=.6d4b5ee0-23c7-4ef6-9d34-1a6356e61003@github.com> <4Ir8JgplVi6FlnZ7pjIOoP4Djc7MK6MCu-kEd0dKCng=.94bf7625-a084-42e9-b068-f6e386d09d9d@github.com> Message-ID: <9kEkfc9KUtisWpzR5S_TL-WcFOYJn6IfGoaTFVOch0E=.6bac8bce-b8ab-4ab9-971b-fca05600fd10@github.com> On Mon, 12 Jul 2021 10:46:28 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/g1/g1OopClosures.inline.hpp line 237: >> >>> 235: markWord m = obj->mark(); >>> 236: if (m.is_marked()) { >>> 237: forwardee = obj->forwardee(m); >> >> The original code has been done that way intentionally: the changed code reloads the value from the header from memory (from what I can tell from generated code), while the original code does not, just reusing the value in the register. >> >> Even if this is a nano-nano-optimization I would prefer to keep it as is. Some of the other changes seem to cause very similar "regressions". >> >> (Yeah, I'm late, sorry). > >> The original code has been done that way intentionally: the changed code reloads the value from the header from memory (from what I can tell from generated code), while the original code does not, just reusing the value in the register. >> >> Even if this is a nano-nano-optimization I would prefer to keep it as is. Some of the other changes seem to cause very similar "regressions". >> >> (Yeah, I'm late, sorry). > > I think I preserved the original behavior of re-using the already-loaded header. Notice that I added an oopDesc::forwardee(markWord m) to do that. Okay, then ignore me :) ------------- PR: https://git.openjdk.java.net/lilliput/pull/10 From rkennke at openjdk.java.net Tue Jul 27 17:09:16 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 27 Jul 2021 17:09:16 GMT Subject: [master] RFR: Read class from object header Message-ID: This changes the Hotspot runtime to load the Klass* from the header instead of the dedicated Klass* word. 
The dedicated word is only still used for verification and for access by generated code (the former will eventually go away, the latter will be implemented separately). Currently, this means we need to coordinate with the ObjectSynchronizer: when encountering a header that is a stack lock or a monitor, the header is displaced. Worse, if it is a stack lock that is owned by a thread other than the calling thread, we must first inflate the lock to a full monitor. This is particularly bad for GCs. Luckily, most paths only do this at a safepoint, where we can actually safely access foreign stack locks and don't need to worry about inflation. A notable exception is concurrent marking by G1GC, which can cause inflation of locks, but it doesn't hurt very much. It's really bad for Shenandoah and ZGC, though: when relocating objects, GC needs to know the object size of the from-space copy. However, this can cause inflation, and inflation creates new WeakHandle in the resulting monitor, and that would be initialized with a from-space copy, which is a no-go during evacuation/relocation. That said, I have been told that work is under way to get rid of displaced headers altogether, which would neatly solve all those problems. I have no desire to make complicated workarounds for Shenandoah GC and ZGC. I disabled both in my own builds for now, and will implement them as soon as the monitor changes arrive. In a couple of places in GC we need to access the header carefully: when concurrently forwarding (by parallel GC threads), we need to ensure we access the Klass* from an unforwarded header, and must also ensure to avoid re-loading the Klass* once we have the good header (that is why so many asserts have been removed - they would potentially re-load the Klass* from a header that may now be forwarded). 
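The "load the header once" discipline from the last paragraph can be sketched as follows. This is a simplified model, not HotSpot code: the header is a bare `std::atomic<uint64_t>`, the compressed Klass* is assumed to sit in the upper 32 bits, displaced headers (stack locks, monitors) are ignored, and `ToyObject`, `klass_from_snapshot` and `try_read_klass` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint64_t lock_mask    = 0x3;  // lowest two header bits
constexpr uint64_t marked_value = 0x3;  // both set: header holds forwarding data

struct ToyObject {
  std::atomic<uint64_t> header;
  explicit ToyObject(uint64_t h) : header(h) {}
};

// Decode the narrow klass from a header value that was already loaded.
// Returns false if that snapshot is a forwarded header.
inline bool klass_from_snapshot(uint64_t snapshot, uint32_t* narrow_klass) {
  if ((snapshot & lock_mask) == marked_value) {
    return false;  // forwarded: upper bits may no longer hold the klass
  }
  *narrow_klass = static_cast<uint32_t>(snapshot >> 32);
  return true;
}

// Load the header exactly once and derive everything from that snapshot.
// A second load could observe a header that another GC thread has forwarded
// in the meantime.
inline bool try_read_klass(const ToyObject& obj, uint32_t* narrow_klass) {
  uint64_t snapshot = obj.header.load(std::memory_order_relaxed);
  return klass_from_snapshot(snapshot, narrow_klass);
}
```

An assert that calls a klass accessor on the object itself would perform such a second load, which matches the stated reason for removing those asserts.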
Testing: - [x] tier1 - [x] tier2 - [x] hotspot_gc (all without Shenandoah and ZGC, see above) ------------- Commit messages: - Merge branch 'master' into klass-from-header - Removed commented-out code - Revert unnecessary change - Simplify fetching klass from header at safepoint - Fix 32bit build - Simplify oopDesc::klass() and related methods - Revert ZGC changes - Revert one more Shenandoah change - Revert Shenandoah changes - Simplify OS::stable_mark() - ... and 39 more: https://git.openjdk.java.net/lilliput/compare/18ffb596...f98e3545 Changes: https://git.openjdk.java.net/lilliput/pull/12/files Webrev: https://webrevs.openjdk.java.net/?repo=lilliput&pr=12&range=00 Stats: 172 lines in 22 files changed: 107 ins; 35 del; 30 mod Patch: https://git.openjdk.java.net/lilliput/pull/12.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/12/head:pull/12 PR: https://git.openjdk.java.net/lilliput/pull/12 From rkennke at openjdk.java.net Wed Jul 28 09:07:23 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 28 Jul 2021 09:07:23 GMT Subject: [master] RFR: Read class from object header [v2] In-Reply-To: References: Message-ID: > This changes the Hotspot runtime to load the Klass* from the header instead of the dedicated Klass* word. The dedicated word is only still used for verification and for access by generated code (the former will eventually go away, the latter will be implemented separately). > > Currently, this means we need to coordinate with the ObjectSynchronizer: when encountering a header that is a stack lock or a monitor, the header is displaced. Worse, if it is a stack lock that is owned by a thread other than the calling thread, we must first inflate the lock to a full monitor. This is particularly bad for GCs. Luckily, most paths only do this at a safepoint, where we can actually safely access foreign stack locks and don't need to worry about inflation. 
A notable exception is concurrent marking by G1GC, which can cause inflation of locks, but it doesn't hurt very much. > > It's really bad for Shenandoah and ZGC, though: when relocating objects, GC needs to know the object size of the from-space copy. However, this can cause inflation, and inflation creates new WeakHandle in the resulting monitor, and that would be initialized with a from-space copy, which is a no-go during evacuation/relocation. > > That said, I have been told that work is under way to get rid of displaced headers altogether, which would neatly solve all those problems. I have no desire to make complicated workarounds for Shenandoah GC and ZGC. I disabled both in my own builds for now, and will implement them as soon as the monitor changes arrive. > > In a couple of places in GC we need to access the header carefully: when concurrently forwarding (by parallel GC threads), we need to ensure we access the Klass* from an unforwarded header, and must also ensure to avoid re-loading the Klass* once we have the good header (that is why so many asserts have been removed - they would potentially re-load the Klass* from a header that may now be forwarded). 
> > Testing: > - [x] tier1 > - [x] tier2 > - [x] hotspot_gc > (all without Shenandoah and ZGC, see above) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Disable Shenandoah and ZGC in GA jobs ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/12/files - new: https://git.openjdk.java.net/lilliput/pull/12/files/f98e3545..3203a503 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=12&range=01 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=12&range=00-01 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/lilliput/pull/12.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/12/head:pull/12 PR: https://git.openjdk.java.net/lilliput/pull/12 From rkennke at openjdk.java.net Wed Jul 28 13:22:42 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 28 Jul 2021 13:22:42 GMT Subject: [master] RFR: Read class from object header [v3] In-Reply-To: References: Message-ID: <_KjFB_7rJQ_bW_t30Q2eO7uvWExH3np7zqJ3QnaQ8h8=.d28df3c6-b690-4766-bc3d-8a63a67c5076@github.com> > This changes the Hotspot runtime to load the Klass* from the header instead of the dedicated Klass* word. The dedicated word is only still used for verification and for access by generated code (the former will eventually go away, the latter will be implemented separately). > > Currently, this means we need to coordinate with the ObjectSynchronizer: when encountering a header that is a stack lock or a monitor, the header is displaced. Worse, if it is a stack lock that is owned by a thread other than the calling thread, we must first inflate the lock to a full monitor. This is particularly bad for GCs. Luckily, most paths only do this at a safepoint, where we can actually safely access foreign stack locks and don't need to worry about inflation. 
A notable exception is concurrent marking by G1GC, which can cause inflation of locks, but it doesn't hurt very much. > > It's really bad for Shenandoah and ZGC, though: when relocating objects, GC needs to know the object size of the from-space copy. However, this can cause inflation, and inflation creates new WeakHandle in the resulting monitor, and that would be initialized with a from-space copy, which is a no-go during evacuation/relocation. > > That said, I have been told that work is under way to get rid of displaced headers altogether, which would neatly solve all those problems. I have no desire to make complicated workarounds for Shenandoah GC and ZGC. I disabled both in my own builds for now, and will implement them as soon as the monitor changes arrive. > > In a couple of places in GC we need to access the header carefully: when concurrently forwarding (by parallel GC threads), we need to ensure we access the Klass* from an unforwarded header, and must also ensure to avoid re-loading the Klass* once we have the good header (that is why so many asserts have been removed - they would potentially re-load the Klass* from a header that may now be forwarded). 
>
> Testing:
> - [x] tier1
> - [x] tier2
> - [x] hotspot_gc
> (all without Shenandoah and ZGC, see above)

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Add missing new markWord.inline.hpp

-------------

Changes:
  - all: https://git.openjdk.java.net/lilliput/pull/12/files
  - new: https://git.openjdk.java.net/lilliput/pull/12/files/3203a503..1f9e5aef

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=12&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=12&range=01-02

  Stats: 66 lines in 1 file changed: 66 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/lilliput/pull/12.diff
  Fetch: git fetch https://git.openjdk.java.net/lilliput pull/12/head:pull/12

PR: https://git.openjdk.java.net/lilliput/pull/12

From shade at openjdk.java.net Fri Jul 30 08:09:54 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 30 Jul 2021 08:09:54 GMT
Subject: [master] RFR: Read class from object header [v3]
In-Reply-To: <_KjFB_7rJQ_bW_t30Q2eO7uvWExH3np7zqJ3QnaQ8h8=.d28df3c6-b690-4766-bc3d-8a63a67c5076@github.com>
References: <_KjFB_7rJQ_bW_t30Q2eO7uvWExH3np7zqJ3QnaQ8h8=.d28df3c6-b690-4766-bc3d-8a63a67c5076@github.com>
Message-ID: <9IVtuzNAPCOn5oGNKk4acD8iFZvSqhjHmpzaz7GUw84=.f4406f05-c065-439d-bbc9-4c899aa5ddab@github.com>

On Wed, 28 Jul 2021 13:22:42 GMT, Roman Kennke wrote:

>> This changes the HotSpot runtime to load the Klass* from the header instead of the dedicated Klass* word. The dedicated word is only still used for verification and for access by generated code (the former will eventually go away, the latter will be implemented separately).
>>
>> Currently, this means we need to coordinate with the ObjectSynchronizer: when encountering a header that is a stack lock or a monitor, the header is displaced. Worse, if it is a stack lock that is owned by a thread other than the calling thread, we must first inflate the lock to a full monitor. This is particularly bad for GCs.
>> Luckily, most paths only do this at a safepoint, where we can actually safely access foreign stack locks and don't need to worry about inflation. A notable exception is concurrent marking by G1GC, which can cause inflation of locks, but it doesn't hurt very much.
>>
>> It's really bad for Shenandoah and ZGC, though: when relocating objects, GC needs to know the object size of the from-space copy. However, this can cause inflation, and inflation creates a new WeakHandle in the resulting monitor, and that would be initialized with a from-space copy, which is a no-go during evacuation/relocation.
>>
>> That said, I have been told that work is under way to get rid of displaced headers altogether, which would neatly solve all those problems. I have no desire to make complicated workarounds for Shenandoah GC and ZGC. I disabled both in my own builds for now, and will implement them as soon as the monitor changes arrive.
>>
>> In a couple of places in GC we need to access the header carefully: when concurrently forwarding (by parallel GC threads), we need to ensure we access the Klass* from an unforwarded header, and must also avoid re-loading the Klass* once we have the good header (that is why so many asserts have been removed - they would potentially re-load the Klass* from a header that may now be forwarded).
>>
>> Testing:
>> - [x] tier1
>> - [x] tier2
>> - [x] hotspot_gc
>> (all without Shenandoah and ZGC, see above)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add missing new markWord.inline.hpp

All right, it looks fine for the experimental code. A few questions/comments:

src/hotspot/share/gc/shared/preservedMarks.inline.hpp line 58:

> 56:     header = header.displaced_mark_helper();
> 57:   }
> 58:   narrowKlass nklass = header.narrow_klass();

This assumes `UseCompressedClassPointers` is `true`, right? Needs to be asserted?
src/hotspot/share/oops/markWord.inline.hpp line 54:

> 52:   if (mrk.has_displaced_mark_helper()) {
> 53:     mrk = mrk.displaced_mark_helper();
> 54:   }

Suggestion:

    markWord m = *this;
    if (m.has_displaced_mark_helper()) {
      m = m.displaced_mark_helper();
    }

src/hotspot/share/oops/oop.hpp line 80:

> 78:   inline Klass* klass_or_null_acquire() const;
> 79:
> 80:   narrowKlass narrow_klass() const { return _metadata._compressed_klass; }

Maybe we should rename this to e.g. `narrow_klass_legacy` to avoid accidentally using it? AFAIU, this is for legacy code paths?

src/hotspot/share/oops/oop.inline.hpp line 117:

> 115:   markWord header = mark();
> 116:   if (!header.is_neutral()) {
> 117:     header =ObjectSynchronizer::stable_mark(cast_to_oop(this));

Suggestion:

    header = ObjectSynchronizer::stable_mark(cast_to_oop(this));

src/hotspot/share/runtime/synchronizer.cpp line 755:

> 753:     // This is a stack lock owned by the calling thread so fetch the
> 754:     // displaced markWord from the BasicLock on the stack.
> 755:     mark = mark.displaced_mark_helper();

Do we have to check `has_displaced_mark_helper` before accessing?

src/hotspot/share/runtime/synchronizer.cpp line 763:

> 761:   assert(mark.is_neutral(), "invariant: header=" INTPTR_FORMAT, mark.value());
> 762:   assert(!mark.is_marked(), "no forwarded objects here");
> 763:   return mark;

Style: seems like `return mark` can be moved at the end of the method.

-------------

Marked as reviewed by shade (Committer).

PR: https://git.openjdk.java.net/lilliput/pull/12
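
Two of the review questions above (asserting the compressed-class-pointer assumption, and checking for a displaced mark before following it) can be sketched in isolation. This is a hedged, self-contained toy, not the patch under review: `ToyMark`, `narrow_klass_from`, `kUseCompressedClassPointers` and the bit layout are hypothetical stand-ins, and the displaced header is modelled as a plain 64-bit slot rather than a `BasicLock` or `ObjectMonitor`.

```cpp
#include <cassert>
#include <cstdint>

// Toy lock-bit encoding, assumed for this sketch only.
constexpr uint64_t kLockMask    = 0x3;
constexpr uint64_t kStackLocked = 0x0;  // remaining bits: address of the displaced header
constexpr uint64_t kNeutral     = 0x1;  // header holds the class bits itself
constexpr uint64_t kMonitor     = 0x2;  // remaining bits: address of the displaced header
constexpr int      kKlassShift  = 32;

// Stand-in for the VM flag the review asks to assert on.
constexpr bool kUseCompressedClassPointers = true;

class ToyMark {
  uint64_t _value;

 public:
  explicit ToyMark(uint64_t v) : _value(v) {}

  uint64_t tag() const { return _value & kLockMask; }
  bool is_neutral() const { return tag() == kNeutral; }

  bool has_displaced_mark_helper() const {
    return tag() == kStackLocked || tag() == kMonitor;
  }

  // Follows the pointer to the displaced header; only legal when one exists.
  ToyMark displaced_mark_helper() const {
    assert(has_displaced_mark_helper());
    const uint64_t* slot =
        reinterpret_cast<const uint64_t*>(static_cast<uintptr_t>(_value & ~kLockMask));
    return ToyMark(*slot);
  }

  uint32_t narrow_klass() const {
    assert(kUseCompressedClassPointers && "narrow class ids need compressed class pointers");
    assert(is_neutral() && "class bits are only valid in a neutral header");
    return static_cast<uint32_t>(_value >> kKlassShift);
  }
};

// Resolve a possibly displaced header first, then read the class bits.
inline uint32_t narrow_klass_from(ToyMark m) {
  if (m.has_displaced_mark_helper()) {
    m = m.displaced_mark_helper();
  }
  return m.narrow_klass();
}
```

Keeping the check inside `displaced_mark_helper()` as an assert and repeating it at the call site as a branch is one possible answer to the "do we have to check" question; whether the real code needs the branch depends on invariants of the caller that this toy does not model.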