From ayang at openjdk.org Tue May 2 08:01:19 2023
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Tue, 2 May 2023 08:01:19 GMT
Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326
In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com>
References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com>
Message-ID:

On Wed, 26 Apr 2023 09:20:46 GMT, Thomas Schatzl wrote:

> Hi all,
>
> please review this refactoring of collection set candidate set handling.
>
> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection.
>
> These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)).
>
> This patch only uses candidates from marking at this time.
>
> Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise.
>
> In detail:
> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time.
>
> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point).
>
> * there are several additional helper sets/lists
>   * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies.
>   * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list.
>
> All these sets implement C++ iterators for simpler use in various places.
>
> Testing:
> - this patch only: tier1-3, gha
> - with JDK-8140326 tier1-7 (or 8?)
>
> Thanks,
> Thomas

src/hotspot/share/gc/g1/g1CollectionSet.cpp line 328:

> 326: assert(_optional_old_regions.length() == 0, "must be");
> 327:
> 328: if (collector_state()->in_mixed_phase()) {

Why check the same condition again (L322 the first time)?

src/hotspot/share/gc/g1/g1CollectionSet.cpp line 329:

> 327:
> 328: if (collector_state()->in_mixed_phase()) {
> 329: time_remaining_ms = _policy->select_candidates_from_marking(&candidates()->marking_regions(),

`time_remaining_ms` seems unused after the assignment.

src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 47:

> 45: }
> 46:
> 47: void G1CollectionCandidateList::append_unsorted(HeapRegion* r) {

Some methods in this file seem never used.

src/hotspot/share/gc/shared/ptrQueue.hpp line 43:

> 41: class BufferNode;
> 42: class PtrQueueSet;
> 43: class PtrQueue : public CHeapObj {

Why is this required?

(Seems to work fine without it when I tried it.)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182204221
PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182204738
PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182212123
PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182203296

From tschatzl at openjdk.org Tue May 2 12:04:18 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 2 May 2023 12:04:18 GMT
Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326
In-Reply-To:
References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com>
Message-ID:

On Tue, 2 May 2023 07:49:42 GMT, Albert Mingkun Yang wrote:

>> Hi all,
>>
>> please review this refactoring of collection set candidate set handling.
>>
>> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection.
>>
>> These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)).
>>
>> This patch only uses candidates from marking at this time.
>>
>> Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise.
>>
>> In detail:
>> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time.
>>
>> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point).
>>
>> * there are several additional helper sets/lists
>>   * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies.
>>   * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list.
>>
>> All these sets implement C++ iterators for simpler use in various places.
>>
>> Testing:
>> - this patch only: tier1-3, gha
>> - with JDK-8140326 tier1-7 (or 8?)
>>
>> Thanks,
>> Thomas
>
> src/hotspot/share/gc/g1/g1CollectionSet.cpp line 328:
>
>> 326: assert(_optional_old_regions.length() == 0, "must be");
>> 327:
>> 328: if (collector_state()->in_mixed_phase()) {
>
> Why check the same condition again (L322 the first time)?

In https://bugs.openjdk.org/browse/JDK-8140326 the first condition will change to something like "are there collection set candidates" and retained regions will be added later. Will remove.
> src/hotspot/share/gc/g1/g1CollectionSet.cpp line 329:
>
>> 327:
>> 328: if (collector_state()->in_mixed_phase()) {
>> 329: time_remaining_ms = _policy->select_candidates_from_marking(&candidates()->marking_regions(),
>
> `time_remaining_ms` seems unused after the assignment.

Same reason as above. Later changes will need/use this. Removed.

> src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 47:
>
>> 45: }
>> 46:
>> 47: void G1CollectionCandidateList::append_unsorted(HeapRegion* r) {
>
> Some methods in this file seem never used.

They are used in https://bugs.openjdk.org/browse/JDK-8140326 . I will look through and remove unused ones.

> src/hotspot/share/gc/shared/ptrQueue.hpp line 43:
>
>> 41: class BufferNode;
>> 42: class PtrQueueSet;
>> 43: class PtrQueue : public CHeapObj {
>
> Why is this required?
>
> (Seems to work fine without it when I tried it.)

Required for https://bugs.openjdk.org/browse/JDK-8140326. Will remove.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182448685
PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182449132
PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182451617
PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182452932

From shade at openjdk.org Tue May 2 12:07:29 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 2 May 2023 12:07:29 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v15]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Apr 2023 19:34:53 GMT, Roman Kennke wrote:

>> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
>>
>> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header.
>>
>> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>>
>> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>>
>> With this, forwarding information would be encoded like this:
>> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
>> - Bit 2: Used for 'fallback'-forwarding (see below)
>> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
>> - Bits 4..31: The number of heap words from the target base address
>>
>> This works well for all sliding GCs in Serial, G1 and Shenandoah.
>> The exception is in G1: there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>>
>> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>>
>> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding.
>> It will become useful later when we want to do 32-bit compact object headers, because then the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>>
>> Testing:
>> - [x] hotspot_gc -UseAltGCForwarding
>> - [x] hotspot_gc +UseAltGCForwarding
>> - [x] tier1 -UseAltGCForwarding
>> - [x] tier1 +UseAltGCForwarding
>> - [x] tier2 -UseAltGCForwarding
>> - [x] tier2 +UseAltGCForwarding
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Some more changes by @shipilev

I think we are very close. Another round of review:

src/hotspot/share/gc/shared/gcForwarding.hpp line 39:

> 37:
> 38: public:
> 39: static void initialize(MemRegion heap, size_t region_size_words_shift);

Suggestion:

    static void initialize(MemRegion heap, size_t region_size_words);

src/hotspot/share/gc/shared/genCollectedHeap.cpp line 136:

> 134: GCInitLogger::print();
> 135:
> 136: GCForwarding::initialize(_reserved, SpaceAlignment);

The second argument is not "shift" anymore, right? So this should be the actual reserved space size?

src/hotspot/share/gc/shared/slidingForwarding.cpp line 131:

> 129: assert(val < TABLE_SIZE, "must fit in table: val: " UINT64_FORMAT ", table-size: " UINTX_FORMAT ", table-size-bits: %d",
> 130: val, TABLE_SIZE, log2i_exact(TABLE_SIZE));
> 131: return static_cast(val);

Want to cast first, and _then_ assert, maybe?
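
For readers following the review, the 32-bit encoding listed in the PR description can be sketched roughly as below. This is a minimal standalone sketch, not the code under review: the names `encode_forwarding`, `decode_forwarding`, `Decoded` and all constants are hypothetical, and only the bit positions (bits 0-1 set to '11', bit 2 fallback, bit 3 region select, bits 4..31 offset in heap words) are taken from the description. The region table lookup and the fallback hash table are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring the bit layout in the PR description.
constexpr std::uint32_t FORWARDED_TAG  = 0x3;      // bits 0-1: '11' marks "forwarded"
constexpr std::uint32_t FALLBACK_BIT   = 1u << 2;  // bit 2: forwardee lives in the fallback table
constexpr std::uint32_t ALT_REGION_BIT = 1u << 3;  // bit 3: selects target region 0 or 1
constexpr int OFFSET_SHIFT    = 4;                 // bits 4..31: offset in heap words
constexpr int NUM_OFFSET_BITS = 32 - OFFSET_SHIFT; // 28 bits of offset

struct Decoded {
  bool          fallback;       // true if bit 2 is set
  unsigned      region_select;  // 0 or 1, index into the per-region target pair
  std::uint32_t offset_words;   // heap words from the selected target base
};

// Packs the forwarding into the low 32 bits and preserves the upper 32 bits,
// which is where a compact header would keep its class information.
inline std::uint64_t encode_forwarding(std::uint64_t header,
                                       unsigned region_select,
                                       std::uint32_t offset_words) {
  assert(region_select < 2);
  assert(offset_words < (1u << NUM_OFFSET_BITS));
  std::uint32_t low = FORWARDED_TAG
                    | (region_select ? ALT_REGION_BIT : 0u)
                    | (offset_words << OFFSET_SHIFT);
  return (header & 0xFFFFFFFF00000000ull) | low;
}

// Reads the fields back; the caller would add offset_words to the base
// address found in the (not shown) region table.
inline Decoded decode_forwarding(std::uint64_t header) {
  std::uint32_t low = static_cast<std::uint32_t>(header);
  assert((low & FORWARDED_TAG) == FORWARDED_TAG);
  return Decoded{ (low & FALLBACK_BIT) != 0,
                  (low & ALT_REGION_BIT) ? 1u : 0u,
                  low >> OFFSET_SHIFT };
}
```

With 28 offset bits, a single region can span at most 2^28 heap words, which matches the "whole heap as a single region (if it fits)" case mentioned for Serial GC.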
src/hotspot/share/gc/shared/slidingForwarding.hpp line 68:

> 66: * ^------------------------------------------- alternate region select
> 67: * ^----------------------------------------- in-region offset
> 68: * ^----------------------- compressed class pointer (not handled, but also *not touched* by this code)

I think we can invert these:

```
 * 64                      32                      0
 * [........................|OOOOOOOOOOOOOOO|A|F|TT]
 *                                               ^--- normal lock bits, would record "object is forwarded"
 *                                             ^----- fallback bit (explained below)
 *                                           ^------- alternate region select
 *                           ^----------------------- in-region offset
 *  ^------------------------------------------------ protected area, *not touched* by this code, useful for
 *                                                    compressed class pointer with compact object headers
```

src/hotspot/share/gc/shared/slidingForwarding.hpp line 93:

> 91: class SlidingForwarding : public CHeapObj {
> 92: private:
> 93: static const uintptr_t MARK_LOWER_HALF_MASK = 0xffffffff;

This is just `right_n_bits(32)`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1408928430
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182441736
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182440390
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182452646
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182434367
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182438481

From tschatzl at openjdk.org Tue May 2 12:15:36 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 2 May 2023 12:15:36 GMT
Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v2]
In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com>
References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com>
Message-ID:

> Hi all,
>
> please review this refactoring of collection set
> candidate set handling.
>
> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection.
>
> These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)).
>
> This patch only uses candidates from marking at this time.
>
> Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise.
>
> In detail:
> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time.
>
> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point).
>
> * there are several additional helper sets/lists
>   * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies.
>   * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list.
>
> All these sets implement C++ iterators for simpler use in various places.
>
> Testing:
> - this patch only: tier1-3, gha
> - with JDK-8140326 tier1-7 (or 8?)
>
> Thanks,
> Thomas

Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:

  ayang review - remove unused methods

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13666/files
  - new: https://git.openjdk.org/jdk/pull/13666/files/e58864e1..ee76b9ed

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=00-01

  Stats: 29 lines in 5 files changed: 0 ins; 23 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/13666.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666

PR: https://git.openjdk.org/jdk/pull/13666

From rkennke at openjdk.org Tue May 2 12:50:22 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 2 May 2023 12:50:22 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v15]
In-Reply-To:
References:
Message-ID:

On Tue, 2 May 2023 11:47:15 GMT, Aleksey Shipilev wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Some more changes by @shipilev
>
> src/hotspot/share/gc/shared/genCollectedHeap.cpp line 136:
>
>> 134: GCInitLogger::print();
>> 135:
>> 136: GCForwarding::initialize(_reserved, SpaceAlignment);
>
> The second argument is not "shift" anymore, right? So this should be the actual reserved space size?

I think SpaceAlignment is correct. We want to pass a region-size there, and the (default) region size for Serial should be the space alignment, because that is what eden, survivors and old-space will be aligned at. Unfortunately, Serial GC doesn't generally slide from top to bottom: it starts to slide old into old, then young into old until old is full, then slide the rest into young. Even worse, the survivor spaces are swapped with every GC cycle, so we really don't know that sliding goes top -> bottom. Using 'virtual' regions that align at SpaceAlignment solves the problem, though.
(One exception is when the whole heap fits into our 2^28 words range, in which case we can treat the whole heap as single region)
That said, I see a bug in the line: GCForwarding::initialize() takes region size *in words* but SpaceAlignment is *in bytes*. I'm fixing that to pass space-alignment in words instead.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182500382

From rkennke at openjdk.org Tue May 2 13:00:29 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 2 May 2023 13:00:29 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v16]
In-Reply-To:
References:
Message-ID:

> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location.
> Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
>
> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header.
>
> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>
> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>
> With this, forwarding information would be encoded like this:
> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
> - Bit 2: Used for 'fallback'-forwarding (see below)
> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
> - Bits 4..31: The number of heap words from the target base address
>
> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1: there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table.
> When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>
> All the table accesses can be done unsynchronized because:
> - Serial GC is single-threaded anyway
> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
> - G1 serial compaction is single-threaded
>
> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>
> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32-bit compact object headers, because then the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>
> Testing:
> - [x] hotspot_gc -UseAltGCForwarding
> - [x] hotspot_gc +UseAltGCForwarding
> - [x] tier1 -UseAltGCForwarding
> - [x] tier1 +UseAltGCForwarding
> - [x] tier2 -UseAltGCForwarding
> - [x] tier2 +UseAltGCForwarding

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Address @shipilev's review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13582/files
  - new: https://git.openjdk.org/jdk/pull/13582/files/74d4ad1f..5892ad5d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=14-15

  Stats: 15 lines in 4 files changed: 2 ins; 0 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/13582.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582

PR: https://git.openjdk.org/jdk/pull/13582

From tschatzl at openjdk.org Tue May 2 13:41:28 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 2 May 2023 13:41:28 GMT
Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v3]
In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com>
References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com>
Message-ID:

> Hi all,
>
> please review this refactoring of collection set candidate set handling.
>
> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection.
>
> These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)).
>
> This patch only uses candidates from marking at this time.
>
> Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise.
>
> In detail:
> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time.
>
> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point).
>
> * there are several additional helper sets/lists
>   * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies.
>   * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list.
>
> All these sets implement C++ iterators for simpler use in various places.
>
> Testing:
> - this patch only: tier1-3, gha
> - with JDK-8140326 tier1-7 (or 8?)
>
> Thanks,
> Thomas

Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:

 - Merge branch 'master' into 8306541-refactor-cset-candidates
 - ayang review - remove unused methods
 - Whitespace fixes
 - typo
 - More cleanup
 - Cleanup
 - Cleanup
 - Refactor collection set candidates

   Improve the interface to collection set candidates and prepare for having collection set candidates at any time.

   Preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained, i.e. evacuation failed regions).

   This patch only uses candidates from marking at this time.

   Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise.

   * the collection set candidates set is not temporarily allocated any more, but the candidate set object must be available all the time.

   * G1CollectionSetCandidates is the main class, representing the current candidates.
     Contains the "from marking" candidate list only (at this point).

   * there are several additional helper sets/lists
     * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies.
     * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list.

   All these sets implement C++ iterators for simpler use in various places.

   Everything else are changes to use these helper sets/lists throughout.

   Some additional FIXME for log messages to remove are in there. Please ignore.

-------------

Changes: https://git.openjdk.org/jdk/pull/13666/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=02
Stats: 1085 lines in 26 files changed: 622 ins; 217 del; 246 mod
Patch: https://git.openjdk.org/jdk/pull/13666.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666

PR: https://git.openjdk.org/jdk/pull/13666

From shade at openjdk.org Tue May 2 14:23:27 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 2 May 2023 14:23:27 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v15]
In-Reply-To:
References:
Message-ID:

On Tue, 2 May 2023 12:47:26 GMT, Roman Kennke wrote:

>> src/hotspot/share/gc/shared/genCollectedHeap.cpp line 136:
>>
>>> 134: GCInitLogger::print();
>>> 135:
>>> 136: GCForwarding::initialize(_reserved, SpaceAlignment);
>>
>> The second argument is not "shift" anymore, right? So this should be the actual reserved space size?
>
> I think SpaceAlignment is correct. We want to pass a region-size there, and the (default) region size for Serial should be the space alignment, because that is what eden, survivors and old-space will be aligned at. Unfortunately, Serial GC doesn't generally slide from top to bottom: it starts to slide old into old, then young into old until old is full, then slide the rest into young. Even worse, the survivor spaces are swapped with every GC cycle, so we really don't know that sliding goes top -> bottom. Using 'virtual' regions that align at SpaceAlignment solves the problem, though.
> (One exception is when the whole heap fits into our 2^28 words range, in which case we can treat the whole heap as single region)
> That said, I see a bug in the line: GCForwarding::initialize() takes region size *in words* but SpaceAlignment is *in bytes*. I'm fixing that to pass space-alignment in words instead.

Ah, that is _region size_, okay. `SpaceAlignment` seems okay then.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182618027

From shade at openjdk.org Tue May 2 14:51:29 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 2 May 2023 14:51:29 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v16]
In-Reply-To:
References:
Message-ID: <5K91Rc15LdUc1SOEllnetA3-IS5T_pDYSkEXFIR8M64=.4ba97ce6-1033-490e-a4a6-911a9a870109@github.com>

On Tue, 2 May 2023 13:00:29 GMT, Roman Kennke wrote:

>> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
>>
>> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header.
>>
>> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>>
>> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>>
>> With this, forwarding information would be encoded like this:
>> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
>> - Bit 2: Used for 'fallback'-forwarding (see below)
>> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
>> - Bits 4..31: The number of heap words from the target base address
>>
>> This works well for all sliding GCs in Serial, G1 and Shenandoah.
>> The exception is in G1: there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>>
>> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>>
>> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding.
It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Address @shipilev's review Some more... src/hotspot/share/gc/shared/slidingForwarding.cpp line 40: > 38: SlidingForwarding::SlidingForwarding(MemRegion heap, size_t region_size_words) > 39: : _heap_start(heap.start()), > 40: _num_regions(((heap.end() - heap.start()) / region_size_words) + 1), This one overestimates the number of regions by 1, if heap is covered by regions exactly, right? Seems innocuous, though. src/hotspot/share/gc/shared/slidingForwarding.hpp line 112: > 110: > 111: // How many bits we use for the compressed pointer > 112: static const int NUM_COMPRESSED_BITS = 32 - OFFSET_BITS_SHIFT; Suggestion: // How many bits we use for the offset static const int NUM_OFFSET_BITS = 32 - OFFSET_BITS_SHIFT; src/hotspot/share/gc/shared/slidingForwarding.hpp line 165: > 163: }; > 164: > 165: static const size_t TABLE_SIZE = 128; Any reason why we do `128` here? I think we can take a bit larger table here, given that: a) the footprint would be eaten by chaining anyway; b) we delete the table after use anyway. 1K entries would take about 32K native memory, if I calculate it right. 
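A side note on the region-count remark above (slidingForwarding.cpp line 40): the following is a minimal sketch, assuming plain word counts instead of `HeapWord*` arithmetic, that compares the quoted `/ region_size_words + 1` expression with a round-up division. The helper names `num_regions_ceil` and `num_regions_plus_one` are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the patch): number of regions needed to
// cover heap_size_words, rounding up so a partial last region is counted.
constexpr std::size_t num_regions_ceil(std::size_t heap_size_words,
                                       std::size_t region_size_words) {
  return (heap_size_words + region_size_words - 1) / region_size_words;
}

// Same shape as the quoted initializer: truncating division plus one.
// This yields one extra region when the heap is covered exactly.
constexpr std::size_t num_regions_plus_one(std::size_t heap_size_words,
                                           std::size_t region_size_words) {
  return heap_size_words / region_size_words + 1;
}
```

The two only differ when the heap size is an exact multiple of the region size, which matches the "seems innocuous" assessment: the extra table slot is simply never used.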
------------- PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1409233470 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182631375 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182651630 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182660172 From eosterlund at openjdk.org Tue May 2 15:15:26 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 2 May 2023 15:15:26 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References:

Message-ID: On Tue, 2 May 2023 13:00:29 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. 
The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. 
It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Address @shipilev's review src/hotspot/share/gc/shared/preservedMarks.cpp line 52: > 50: if (GCForwarding::is_forwarded(obj)) { > 51: elem->set_oop(GCForwarding::forwardee(obj)); > 52: } Is PreservedMarks still useful after moving the spacious forwarding/mark information out from the markWord? I can see that we need it while transitioning to using your new code, but that's about it right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182691096 From stuefe at openjdk.org Tue May 2 15:28:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 2 May 2023 15:28:37 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References:

Message-ID: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> On Tue, 2 May 2023 13:00:29 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. 
The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. 
It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Address @shipilev's review Hi Roman, Small general concern, the last-last-ditch-GC fallback table may be impractical cost-wise. How large is that expected to grow? You pay 24+x (~48 on glibc with internal overhead) bytes per forwarded oop. Very easy first-step mitigation: Let the table house the first n (1000-10000) nodes as an inline member array. Allocate nodes from there, only allocate spilloffs from C-heap. Allocation would be a lot faster and cheaper memory wise, and its just some lines of code. Cheers, Thomas src/hotspot/share/gc/shared/slidingForwarding.cpp line 35: > 33: > 34: // We cannot use 0, because that may already be a valid base address in zero-based heaps. > 35: // 0x1 is safe because heap base addresses must be aligned by much larger alginemnt typo src/hotspot/share/gc/shared/slidingForwarding.cpp line 44: > 42: _region_size_words_shift(log2i_exact(region_size_words)), > 43: _bases_table(nullptr), > 44: _fallback_table(nullptr) { Assert for sane values for region_size? At least >= word size? src/hotspot/share/gc/shared/slidingForwarding.cpp line 81: > 79: _bases_table = nullptr; > 80: > 81: if (_fallback_table != nullptr) { null check not needed src/hotspot/share/gc/shared/slidingForwarding.cpp line 137: > 135: void FallbackTable::forward_to(HeapWord* from, HeapWord* to) { > 136: size_t idx = home_index(from); > 137: if (_table[idx]._from != nullptr) { Here you need to do a contains check, right? 
Because, as you wrote in your answer to Aleksey, forwardings can be rewritten: https://github.com/openjdk/jdk/pull/13582/files#r1180126262 src/hotspot/share/gc/shared/slidingForwarding.hpp line 121: > 119: size_t _region_size_words; > 120: size_t _region_size_words_shift; > 121: HeapWord** _bases_table; Small nit. For clarity, I would prefer if we had a real structure here, e.g.: struct region_forwarding { HeapWord* dest; HeapWord* alt_dest; }; region_forwarding* _table; src/hotspot/share/gc/shared/slidingForwarding.hpp line 168: > 166: FallbackTableEntry _table[TABLE_SIZE]; > 167: > 168: static size_t home_index(HeapWord* from); Nitpicking, but I'd prefer an int or unsigned as return val here. src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 56: > 54: // Primary is free > 55: _bases_table[base_idx] = to_region_base; > 56: } else if (region_contains(_bases_table[base_idx], to)) { Stupid question, could this not just be `else if ( _bases_table[base_idx] == to_region_base)` ? Same below? src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 87: > 85: > 86: HeapWord* SlidingForwarding::decode_forwarding(HeapWord* from, uintptr_t encoded) const { > 87: assert((encoded & markWord::marked_value) == markWord::marked_value, "must be marked as forwarded"); Assert for !FALLBACK too? src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 95: > 93: size_t base_idx = from_idx + alt_region; > 94: > 95: HeapWord* decoded = _bases_table[base_idx] + offset; Maybe assert that table slot != UNUSED_BASE first src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 115: > 113: uintptr_t encoded = encode_forwarding(from_hw, to_hw); > 114: markWord new_header = markWord((from_header.value() & ~MARK_LOWER_HALF_MASK) | encoded); > 115: from->set_mark(new_header); What happens if the header is displaced into an OM? Should we not update the displaced header instead? 
src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 125: > 123: assert(_bases_table != nullptr, "call begin() before asking for forwarding"); > 124: > 125: markWord header = from->mark(); Could this header be displaced? test/hotspot/gtest/gc/shared/test_slidingForwarding.cpp line 40: > 38: return ((uintptr_t(1) << 2) /* fallback */ | 3 /* forwarded */); > 39: } > 40: Could you add a test that forwarding works for displaced Oop+OM ? ------------- Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1405661292 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182596471 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182631488 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182633369 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182691084 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182612217 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182688315 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182622823 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182636381 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182640645 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182644415 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182647104 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182706683 From stuefe at openjdk.org Tue May 2 15:28:41 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 2 May 2023 15:28:41 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v10] In-Reply-To: <1rB2fkk813I4tm8B-G2ArcAjSJxWzyYIgf8yBWBVGwc=.8dc9ecb5-24c0-4a86-bf29-4cf6408a1b1b@github.com> References: <1rB2fkk813I4tm8B-G2ArcAjSJxWzyYIgf8yBWBVGwc=.8dc9ecb5-24c0-4a86-bf29-4cf6408a1b1b@github.com> Message-ID: 
<2F7cnbE_2v4qCk1LoBFRI2S9Ky2bcwgPMxHC0Cf0lHU=.0973cc70-d5df-409c-87b0-f21562f1010d@github.com> On Fri, 28 Apr 2023 07:52:53 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. 
The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. 
It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix minimal build test/hotspot/gtest/gc/shared/test_slidingForwarding.cpp line 35: > 33: // Test simple forwarding within the same region. > 34: TEST_VM(SlidingForwarding, simple) { > 35: HeapWord heap[16]; Please initialize array for release build gtests ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1180220812 From rkennke at openjdk.org Tue May 2 15:33:24 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 15:33:24 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References:

Message-ID: On Tue, 2 May 2023 15:11:59 GMT, Erik Österlund wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @shipilev's review > > src/hotspot/share/gc/shared/preservedMarks.cpp line 52: > >> 50: if (GCForwarding::is_forwarded(obj)) { >> 51: elem->set_oop(GCForwarding::forwardee(obj)); >> 52: } > > Is PreservedMarks still useful after moving the spacious forwarding/mark information out from the markWord? I can see that we need it while transitioning to using your new code, but that's about it right? It is still useful. This PR implements a compression that needs only the lowest 32 bits of the mark-word for the forwarding pointer, but it still essentially uses the mark-word to store that information. That means that it overwrites the i-hash-code and lock-bits just the same as the normal implementation, and thus must preserve this information. I *also* prototyped a hash-table-based forwarding which no longer uses the mark-word to store forwarding. However, I found that to be 1. significantly slower and 2. significantly larger. That was a trade-off that I did not want to make at this point, when we 'only' want 64-bit-headers, simply because it's not yet necessary. It *will* become necessary to make that trade-off, or come up with a better overall approach (e.g. use scissor-GC like Parallel GC does, or come up with a better fwd-table like in that paper that you sent me: https://dl.acm.org/doi/abs/10.1145/3546918.3546928) but this needs to be researched. So yeah, the sliding forwarding algorithm is an interim solution, but I think it is worth having at this point in time.
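To make the "lowest 32 bits" point concrete, here is a minimal sketch of the bit layout listed in the PR description (bits 0 and 1 as the '11' forwarded tag, bit 2 for fallback, bit 3 selecting the target region, bits 4..31 holding the word offset). All constant and function names are assumptions for illustration; they are not the identifiers used in slidingForwarding.hpp, and the base-address table lookup is left out.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants; names are hypothetical, layout follows the
// PR description only.
constexpr std::uint32_t kForwardedTag = 0x3;       // bits 0-1: '11' = forwarded
constexpr std::uint32_t kFallbackBit  = 1u << 2;   // bit 2: look up in fallback table
constexpr std::uint32_t kAltRegionBit = 1u << 3;   // bit 3: target region 0 or 1
constexpr int           kOffsetShift  = 4;         // bits 4..31: offset in heap words
constexpr std::uint32_t kMaxOffset    = (1u << (32 - kOffsetShift)) - 1;

// Pack a region selector and a word offset into the low 32 bits.
// The caller must ensure offset <= kMaxOffset, otherwise high bits are lost.
constexpr std::uint32_t encode(bool alt_region, std::uint32_t offset) {
  return (offset << kOffsetShift) | (alt_region ? kAltRegionBit : 0u) | kForwardedTag;
}

// Marker for objects whose forwardee lives in the fallback table.
constexpr std::uint32_t fallback_marker() { return kFallbackBit | kForwardedTag; }

constexpr bool is_forwarded(std::uint32_t e)    { return (e & kForwardedTag) == kForwardedTag; }
constexpr bool is_fallback(std::uint32_t e)     { return (e & kFallbackBit) != 0; }
constexpr bool uses_alt_region(std::uint32_t e) { return (e & kAltRegionBit) != 0; }
constexpr std::uint32_t decode_offset(std::uint32_t e) { return e >> kOffsetShift; }
```

In this sketch a full decode would still add `decode_offset(e)` to the base address selected by `uses_alt_region(e)` from the per-region table. Because the encoding occupies the same low bits as the hash and lock state, the original mark still has to be preserved elsewhere, as the reply above says.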
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182713491 From tschatzl at openjdk.org Tue May 2 15:53:17 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 2 May 2023 15:53:17 GMT Subject: RFR: 8306836: Remove pinned tag for G1 heap regions [v5] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that removes the pinned tag from `HeapRegion`. > > So that "pinned" tag for G1 heap regions indicates that the region should not move during (young) gc. This applies to now removed archive regions and humongous objects/regions. > > With "real" g1 region pinning to deal with gclocker in g1 once and for all upcoming we need a refcount, a single bit is not sufficient anymore. Further there will be a naming conflict as this kind of "pinning" is different to g1 region pinning "pinning". The former indicates "contents can not be moved, but can be reclaimed", while the latter means "contents can not be moved and not reclaimed". > > The (current) pinned flag is surprisingly little used, only for policy decisions. > > The suggestion this change implements is to remove the "pinned" tag as it is, and reserve it for future g1 region pinning (that needs to store the pinning attribute differently as a refcount anyway). > > Testing: tier1-3, gha > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Merge branch 'master' into 8306836-remove-pinned-tag - remove is_young_gc_movable in full gc code - cplummer review - ayang review - Fix hsdb - compilation fixes - Initial implementation ------------- Changes: https://git.openjdk.org/jdk/pull/13643/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13643&range=04 Stats: 69 lines in 20 files changed: 12 ins; 30 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/13643.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13643/head:pull/13643 PR: https://git.openjdk.org/jdk/pull/13643 From tschatzl at openjdk.org Tue May 2 16:47:06 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 2 May 2023 16:47:06 GMT Subject: RFR: 8306836: Remove pinned tag for G1 heap regions [v6] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that removes the pinned tag from `HeapRegion`. > > So that "pinned" tag for G1 heap regions indicates that the region should not move during (young) gc. This applies to now removed archive regions and humongous objects/regions. > > With "real" g1 region pinning to deal with gclocker in g1 once and for all upcoming we need a refcount, a single bit is not sufficient anymore. Further there will be a naming conflict as this kind of "pinning" is different to g1 region pinning "pinning". The former indicates "contents can not be moved, but can be reclaimed", while the latter means "contents can not be moved and not reclaimed". > > The (current) pinned flag is surprisingly little used, only for policy decisions. > > The suggestion this change implements is to remove the "pinned" tag as it is, and reserve it for future g1 region pinning (that needs to store the pinning attribute differently as a refcount anyway). 
> > Testing: tier1-3, gha > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Remove is_young_gc_movable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13643/files - new: https://git.openjdk.org/jdk/pull/13643/files/3577054b..3516e982 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13643&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13643&range=04-05 Stats: 17 lines in 6 files changed: 1 ins; 9 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13643.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13643/head:pull/13643 PR: https://git.openjdk.org/jdk/pull/13643 From tschatzl at openjdk.org Tue May 2 16:47:36 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 2 May 2023 16:47:36 GMT Subject: RFR: 8306836: Remove pinned tag for G1 heap regions [v5] In-Reply-To: References:

Message-ID: On Tue, 2 May 2023 15:53:17 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that removes the pinned tag from `HeapRegion`. >> >> So that "pinned" tag for G1 heap regions indicates that the region should not move during (young) gc. This applies to now removed archive regions and humongous objects/regions. >> >> With "real" g1 region pinning to deal with gclocker in g1 once and for all upcoming we need a refcount, a single bit is not sufficient anymore. Further there will be a naming conflict as this kind of "pinning" is different to g1 region pinning "pinning". The former indicates "contents can not be moved, but can be reclaimed", while the latter means "contents can not be moved and not reclaimed". >> >> The (current) pinned flag is surprisingly little used, only for policy decisions. >> >> The suggestion this change implements is to remove the "pinned" tag as it is, and reserve it for future g1 region pinning (that needs to store the pinning attribute differently as a refcount anyway). >> >> Testing: tier1-3, gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into 8306836-remove-pinned-tag > - remove is_young_gc_movable in full gc code > - cplummer review > - ayang review > - Fix hsdb > - compilation fixes > - Initial implementation I removed the `young_gc_is_movable()` predicate; this change is probably the wrong place to introduce a more abstract concept like that. I also deferred the refactoring of `G1CollectionSetChooser::should_add()` and its caller to a later change, since it is not relevant to this change either.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13643#issuecomment-1531806113 From rkennke at openjdk.org Tue May 2 16:51:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 16:51:13 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v17] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). 
> > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: More @shipilev's review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/5892ad5d..494ec9ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=15-16 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Tue May 2 16:54:24 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 16:54:24 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> References:

<9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: On Tue, 2 May 2023 15:25:10 GMT, Thomas Stuefe wrote: > Hi Roman, > > Small general concern, the last-last-ditch-GC fallback table may be impractical cost-wise. How large is that expected to grow? You pay 24+x (~48 on glibc with internal overhead) bytes per forwarded oop. > > Very easy first-step mitigation: Let the table house the first n (1000-10000) nodes as an inline member array. Allocate nodes from there, only allocate spilloffs from C-heap. Allocation would be a lot faster and cheaper memory-wise, and it's just some lines of code. > I did some experiments with the only jtreg test that seems to exercise the G1 serial compaction (and thus the fallback-table) (the test is: gc/stress/TestMultiThreadStressRSet.java). With fallback-table size 128 I'd typically end up with several dozen excess nodes, sometimes more than the base table size. Up to a table size of 512 this reduces significantly, but still typically leaves one to several dozen extra nodes. When I switched to a table size of 1024, the extra node count dropped to below one dozen in most cases. I'll leave the table size at this value until we find a good reason to extend it, ok? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1531813673 From rkennke at openjdk.org Tue May 2 17:37:28 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 17:37:28 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> References:

<9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: On Tue, 2 May 2023 14:16:16 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @shipilev's review > > src/hotspot/share/gc/shared/slidingForwarding.hpp line 121: > >> 119: size_t _region_size_words; >> 120: size_t _region_size_words_shift; >> 121: HeapWord** _bases_table; > > Small nit. For clarity, I would prefer if we had a real structure here, e.g.: > > struct region_forwarding { HeapWord* dest; HeapWord* alt_dest; }; > region_forwarding* _table; Ok, I am changing it. It looks like it's introducing a branch on the decoding-path though. I am not sure if a C++ compiler would optimise it to a branch-free code, though. It's probably a very minor concern. > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 115: > >> 113: uintptr_t encoded = encode_forwarding(from_hw, to_hw); >> 114: markWord new_header = markWord((from_header.value() & ~MARK_LOWER_HALF_MASK) | encoded); >> 115: from->set_mark(new_header); > > What happens if the header is displaced into an OM? Should we not update the displaced header instead? When the header is displaced, it will be recorded in the preserved-marks table. Then we over-write the mark-word with the forwarding. At the end of the GC, we will restore the original mark from the preserved-marks table. This is the same mechanism that is already used in normal uncompressed forwarding. > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 125: > >> 123: assert(_bases_table != nullptr, "call begin() before asking for forwarding"); >> 124: >> 125: markWord header = from->mark(); > > Could this header be displaced? No. See above. 
We actually check for that in decode_forwarding(): assert((encoded & markWord::marked_value) == markWord::marked_value, "must be marked as forwarded"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182827422 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182846767 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182847913 From shade at openjdk.org Tue May 2 17:37:28 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 May 2023 17:37:28 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References:

<9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: On Tue, 2 May 2023 17:12:12 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shared/slidingForwarding.hpp line 121: >> >>> 119: size_t _region_size_words; >>> 120: size_t _region_size_words_shift; >>> 121: HeapWord** _bases_table; >> >> Small nit. For clarity, I would prefer if we had a real structure here, e.g.: >> >> struct region_forwarding { HeapWord* dest; HeapWord* alt_dest; }; >> region_forwarding* _table; > > Ok, I am changing it. It looks like it's introducing a branch on the decoding-path though. I am not sure if a C++ compiler would optimise it to a branch-free code, though. It's probably a very minor concern. No wait, let's keep it as `HeapWord*` array. The fact that alternate selection is just a math addition matters a bit for decoding performance. I think it does not complicate the code all that much to warrant extra abstraction here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182832355 From shade at openjdk.org Tue May 2 17:37:31 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 May 2023 17:37:31 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> References:

<9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: On Tue, 2 May 2023 14:23:36 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @shipilev's review > > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 56: > >> 54: // Primary is free >> 55: _bases_table[base_idx] = to_region_base; >> 56: } else if (region_contains(_bases_table[base_idx], to)) { > > Stupid question, could this not just be `else if ( _bases_table[base_idx] == to_region_base)` ? Same below? (kicks himself a little). Yes. Yes, it can. We would not need `region_contains` method then at all, I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182826854 From rkennke at openjdk.org Tue May 2 17:37:32 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 17:37:32 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References:

<9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: On Tue, 2 May 2023 17:11:35 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 56: >> >>> 54: // Primary is free >>> 55: _bases_table[base_idx] = to_region_base; >>> 56: } else if (region_contains(_bases_table[base_idx], to)) { >> >> Stupid question, could this not just be `else if ( _bases_table[base_idx] == to_region_base)` ? Same below? > > (kicks himself a little). Yes. Yes, it can. We would not need `region_contains` method then at all, I think. Indeed! Well spotted! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182827529 From rkennke at openjdk.org Tue May 2 17:46:37 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 17:46:37 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> References:

<9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: <30hPs9U3Wt5ITn5XHdjVPuVcbqK4YWq1Xxfw2LznDYo=.0194177b-9472-41e1-bd2a-056eb80104ff@github.com> On Tue, 2 May 2023 15:11:59 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @shipilev's review > > src/hotspot/share/gc/shared/slidingForwarding.cpp line 137: > >> 135: void FallbackTable::forward_to(HeapWord* from, HeapWord* to) { >> 136: size_t idx = home_index(from); >> 137: if (_table[idx]._from != nullptr) { > > Here you need to do a contains check, right? Because, as you wrote in your answer to Aleksey, forwardings can be rewritten: https://github.com/openjdk/jdk/pull/13582/files#r1180126262 I don't think that ever happens (I think we'd only ever re-forward from normal forwarding to fallback-forwarding once), but I am adding that check for extra sanity. > test/hotspot/gtest/gc/shared/test_slidingForwarding.cpp line 40: > >> 38: return ((uintptr_t(1) << 2) /* fallback */ | 3 /* forwarded */); >> 39: } >> 40: > > Could you add a test that forwarding works for displaced Oop+OM ? Uhhh, that would involve the OM and preserved-marks subsystems. The saving and restoring of 'interesting mark-words' is done outside of the GCForwarding subsystem and is not its responsibility. I'd rather not test for that.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182855383 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182856956 From rkennke at openjdk.org Tue May 2 18:06:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 18:06:30 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v18] In-Reply-To: References: Message-ID: <06loJyeqlW5aON-IGrWJzY6DQBLkC3kyuxxeCMxq3xI=.da8bbc4c-0cd7-4c7f-9bb2-0b087ce70d11@github.com> > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. 
> - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31: The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is G1, which has a special mode called 'serial compaction' that acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker processes a disjoint set of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
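
As a rough illustration of the fallback table mentioned above, here is a hypothetical standalone sketch. The description only calls it a "simple open hash-table"; the chaining scheme, the hash function, the name `FallbackSketch`, and the use of `uintptr_t` instead of `HeapWord*` are assumptions for this example and are not taken from the pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch: a fixed array of home entries, with overflow
// entries chained off each home slot and allocated individually.
struct FallbackSketch {
  struct Entry {
    uintptr_t from = 0;           // 0 means "empty" in this sketch
    uintptr_t to = 0;
    std::unique_ptr<Entry> next;  // overflow chain
  };

  std::vector<Entry> table;

  explicit FallbackSketch(size_t size = 1024) : table(size) {}

  // Assumed hash: multiplicative hashing on the source address.
  size_t home_index(uintptr_t from) const {
    uint64_t h = static_cast<uint64_t>(from) * 0x9E3779B97F4A7C15ull;
    return static_cast<size_t>(h >> 32) % table.size();
  }

  // Records (or overwrites) the forwardee for 'from'.
  void forward_to(uintptr_t from, uintptr_t to) {
    assert(from != 0);
    Entry* e = &table[home_index(from)];
    if (e->from == 0) {           // home slot is free
      e->from = from;
      e->to = to;
      return;
    }
    for (;;) {
      if (e->from == from) {      // re-forwarding an existing entry
        e->to = to;
        return;
      }
      if (!e->next) break;
      e = e->next.get();
    }
    e->next.reset(new Entry());   // append an overflow node
    e->next->from = from;
    e->next->to = to;
  }

  // Returns the forwardee, or 0 if 'from' was never recorded.
  uintptr_t forwardee(uintptr_t from) const {
    for (const Entry* e = &table[home_index(from)]; e != nullptr; e = e->next.get()) {
      if (e->from == from) return e->to;
    }
    return 0;
  }
};
```

Constructing the sketch with a table size of 1 forces every insertion after the first into the overflow chain, which is a convenient way to exercise the collision path.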
> > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Address @tstuefe's review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/494ec9ad..84181db6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=16-17 Stats: 38 lines in 3 files changed: 8 ins; 6 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Tue May 2 18:21:18 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 18:21:18 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v19] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. 
> > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31: The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is G1, which has a special mode called 'serial compaction' that acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker processes a disjoint set of regions.
> - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Initialize 'heap' elements in test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/84181db6..8366454e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=17-18 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From sspitsyn at openjdk.org Tue May 2 19:02:22 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 19:02:22 GMT Subject: RFR: 8306836: Remove pinned tag for G1 heap regions [v6] In-Reply-To: References:

Message-ID: On Tue, 2 May 2023 16:47:06 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that removes the pinned tag from `HeapRegion`. >> >> So that "pinned" tag for G1 heap regions indicates that the region should not move during (young) gc. This applies to now removed archive regions and humongous objects/regions. >> >> With "real" g1 region pinning to deal with gclocker in g1 once and for all upcoming we need a refcount, a single bit is not sufficient anymore. Further there will be a naming conflict as this kind of "pinning" is different to g1 region pinning "pinning". The former indicates "contents can not be moved, but can be reclaimed", while the latter means "contents can not be moved and not reclaimed". >> >> The (current) pinned flag is surprisingly little used, only for policy decisions. >> >> The suggestion this change implements is to remove the "pinned" tag as it is, and reserve it for future g1 region pinning (that needs to store the pinning attribute differently as a refcount anyway). >> >> Testing: tier1-3, gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Remove is_young_gc_movable This looks good in general. I can't judge on the GC side decision about this removal and all updated comments but it looks consistent. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13643#pullrequestreview-1409712756 From ayang at openjdk.org Tue May 2 22:08:18 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 2 May 2023 22:08:18 GMT Subject: RFR: 8306836: Remove pinned tag for G1 heap regions [v6] In-Reply-To: References:

Message-ID: On Tue, 2 May 2023 16:47:06 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that removes the pinned tag from `HeapRegion`. >> >> So that "pinned" tag for G1 heap regions indicates that the region should not move during (young) gc. This applies to now removed archive regions and humongous objects/regions. >> >> With "real" g1 region pinning to deal with gclocker in g1 once and for all upcoming we need a refcount, a single bit is not sufficient anymore. Further there will be a naming conflict as this kind of "pinning" is different to g1 region pinning "pinning". The former indicates "contents can not be moved, but can be reclaimed", while the latter means "contents can not be moved and not reclaimed". >> >> The (current) pinned flag is surprisingly little used, only for policy decisions. >> >> The suggestion this change implements is to remove the "pinned" tag as it is, and reserve it for future g1 region pinning (that needs to store the pinning attribute differently as a refcount anyway). >> >> Testing: tier1-3, gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Remove is_young_gc_movable Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13643#pullrequestreview-1409951167 From ysr at openjdk.org Wed May 3 00:32:23 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 3 May 2023 00:32:23 GMT Subject: RFR: 8305062: Refactor CardTable::resize_covered_region [v3] In-Reply-To: References:

Message-ID: On Tue, 18 Apr 2023 09:21:54 GMT, Albert Mingkun Yang wrote: >> Simple refactoring to make logic around cardtable cover-region more concrete, since #generations and gen-boundary is fixed for Serial/Parallel. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Left a comment re `guard_region`. src/hotspot/share/gc/shared/cardTable.hpp line 63: > 61: > 62: // The last card is a guard card; never committed. > 63: MemRegion _guard_region; @albertnetymk : It looks like, following this refactor, you have stopped using `guard_region` for its previous role. I'd either put some of those checks back in, or just delete this now otherwise obsolete field. It is possible, however, that I am missing something here. Thanks! ------------- PR Review: https://git.openjdk.org/jdk/pull/13206#pullrequestreview-1410041342 PR Review Comment: https://git.openjdk.org/jdk/pull/13206#discussion_r1183156270 From ysr at openjdk.org Wed May 3 06:46:25 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 3 May 2023 06:46:25 GMT Subject: RFR: 8305062: Refactor CardTable::resize_covered_region [v3] In-Reply-To: References:

Message-ID: <0A_VTfQsSSTI5BGWdFlDWFfwVugfN8MhwYOo_b2astU=.35e7969c-9c98-4565-be8c-af8a4ca7b5a4@github.com> On Tue, 18 Apr 2023 09:21:54 GMT, Albert Mingkun Yang wrote: >> Simple refactoring to make logic around cardtable cover-region more concrete, since #generations and gen-boundary is fixed for Serial/Parallel. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/gc/shared/cardTable.hpp line 63: > 61: > 62: // The last card is a guard card; never committed. > 63: MemRegion _guard_region; Doh, scratch that comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13206#discussion_r1183291917 From iwalulya at openjdk.org Wed May 3 08:20:17 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 3 May 2023 08:20:17 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v3] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Tue, 2 May 2023 13:41:28 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. >> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). >> >> This patch only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. >> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. 
>> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into 8306541-refactor-cset-candidates > - ayang review - remove unused methods > - Whitespace fixes > - typo > - More cleanup > - Cleanup > - Cleanup > - Refactor collection set candidates > > Improve the interface to collection set candidates and prepare for having collection set > candidates at any time. Preparations to allow for multiple sources for these candidates > (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch > only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associates it with the list element as it's > not used otherwise. > > * the collection set candidates set is not temporarily allocated any more, but the candidate > set object must be available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains > the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not > necessarily). Also does not contain gc efficiencies.
> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. > Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Everything else are changes to use these helper sets/lists throughout. > > Some additional FIXME for log messages to remove are in there. Please ignore. src/hotspot/share/gc/g1/g1CollectionSet.hpp line 155: > 153: // When doing mixed collections we can add old regions to the collection set, which > 154: // will be collected only if there is enough time. We call these optional regions. > 155: // This member records the current number of regions that are of that type that Comment needs to be revised src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 50: > 48: guarantee((uint)_candidates.length() >= other->length(), "must be"); > 49: > 50: if ((other->length() == 0) || (_candidates.length() == 0)) { `guarantee((uint)_candidates.length() >= other->length(), "must be");` implies that the second part of the predicate is not necessary i.e `|| (_candidates.length() == 0)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1183278338 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1183285839 From ayang at openjdk.org Wed May 3 09:54:19 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 3 May 2023 09:54:19 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v3] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: <33tIj1LuZJo-0_EbMmYXzw5SgePPVqmhY66M49yQgeA=.d48c62d4-9fa0-4889-810b-d7b0ad30a70b@github.com> On Tue, 2 May 2023 13:41:28 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. 
>> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). >> >> This patch only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise. >> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into 8306541-refactor-cset-candidates > - ayang review - remove unused methods > - Whitespace fixes > - typo > - More cleanup > - Cleanup > - Cleanup > - Refactor collection set candidates > > Improve the interface to collection set candidates and prepare for having collection set > candidates at any time.
Preparations to allow for multiple sources for these candidates > (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch > only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's > not used otherwise. > > * the collection set candidates set is not temporarily allocated any more, but the candidate > set object must be available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains > the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not > necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. > Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Everything else are changes to use these helper sets/lists throughout. > > Some additional FIXME for log messages to remove are in there. Please ignore. src/hotspot/share/gc/g1/heapRegion.inline.hpp line 344: > 342: } > 343: > 344: inline bool HeapRegion::in_collection_set_candidates() const { The impl is identical to `is_collection_set_candidate`. Maybe one is enough? src/hotspot/share/gc/shared/ptrQueue.hpp line 202: > 200: // In particular, the individual queues allocate buffers from this shared > 201: // set, and return completed buffers to the set. > 202: class PtrQueueSet : public CHeapObj { This doesn't seem required in this PR. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182609579
PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182610148

From duke at openjdk.org Wed May 3 10:11:22 2023
From: duke at openjdk.org (olivergillespie)
Date: Wed, 3 May 2023 10:11:22 GMT
Subject: RFR: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code
Message-ID:

Add logging for the time taken to collect/report ObjectCount(AfterGC) event to Serial, Parallel and G1 full collectors, for parity with the G1 concurrent collector. This event can be *very* expensive to collect (single threaded STW full heap scan), so it's very important to show it clearly to the user.

Added output:

Serial

[7.976s][debug][gc,phases ] GC(7) Trigger cleanups 0.002ms
[7.977s][debug][gc,phases ] GC(7) Class Unloading 0.556ms
++ [7.977s][debug][gc,phases,start] GC(7) Report Object Count
++ [8.529s][debug][gc,phases ] GC(7) Report Object Count 552.065ms
[8.529s][info ][gc,phases ] GC(7) Phase 1: Mark live objects 1066.882ms
[8.529s][info ][gc,phases,start] GC(7) Phase 2: Compute new object addresses

Parallel

[5.786s][debug][gc,phases ] GC(12) Trigger cleanups 0.002ms
[5.786s][debug][gc,phases ] GC(12) Class Unloading 0.556ms
++ [5.786s][debug][gc,phases,start] GC(12) Report Object Count
++ [6.313s][debug][gc,phases ] GC(12) Report Object Count 526.307ms
[6.313s][info ][gc,phases ] GC(12) Marking Phase 889.900ms
[6.313s][info ][gc,phases,start] GC(12) Summary Phase

G1 Full

[3.922s][debug][gc,phases ] GC(24) Trigger cleanups 0.002ms
[3.922s][debug][gc,phases ] GC(24) Phase 1: Class Unloading and Cleanup 0.159ms
++ [3.922s][debug][gc,phases,start ] GC(24) Report Object Count
++ [4.442s][debug][gc,phases ] GC(24) Report Object Count 519.292ms
[4.442s][info ][gc,phases ] GC(24) Phase 1: Mark live objects 653.209ms
[4.442s][info ][gc,phases,start ] GC(24) Phase 2: Prepare compaction

-------------

Commit messages:
 - 8307346 - Add missing gc+phases
logging for ObjectCount(AfterGC) JFR event collection code Changes: https://git.openjdk.org/jdk/pull/13772/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13772&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307346 Stats: 12 lines in 3 files changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13772.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13772/head:pull/13772 PR: https://git.openjdk.org/jdk/pull/13772 From tschatzl at openjdk.org Wed May 3 10:19:20 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 10:19:20 GMT Subject: RFR: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code In-Reply-To: References: Message-ID: On Wed, 3 May 2023 10:04:14 GMT, olivergillespie wrote: > Add logging for the time taken to collect/report ObjectCount(AfterGC) event to Serial, Parallel and G1 full collectors, for parity with the G1 concurrent collector. This event can be *very* expensive to collect (single threaded STW full heap scan), so it's very important to show it clearly to the user. 
> > Added output: > > Serial > > [7.976s][debug][gc,phases ] GC(7) Trigger cleanups 0.002ms > [7.977s][debug][gc,phases ] GC(7) Class Unloading 0.556ms > ++ [7.977s][debug][gc,phases,start] GC(7) Report Object Count > ++ [8.529s][debug][gc,phases ] GC(7) Report Object Count 552.065ms > [8.529s][info ][gc,phases ] GC(7) Phase 1: Mark live objects 1066.882ms > [8.529s][info ][gc,phases,start] GC(7) Phase 2: Compute new object addresses > > Parallel > > [5.786s][debug][gc,phases ] GC(12) Trigger cleanups 0.002ms > [5.786s][debug][gc,phases ] GC(12) Class Unloading 0.556ms > ++ [5.786s][debug][gc,phases,start] GC(12) Report Object Count > ++ [6.313s][debug][gc,phases ] GC(12) Report Object Count 526.307ms > [6.313s][info ][gc,phases ] GC(12) Marking Phase 889.900ms > [6.313s][info ][gc,phases,start] GC(12) Summary Phase > > G1 Full > > [3.922s][debug][gc,phases ] GC(24) Trigger cleanups 0.002ms > [3.922s][debug][gc,phases ] GC(24) Phase 1: Class Unloading and Cleanup 0.159ms > ++ [3.922s][debug][gc,phases,start ] GC(24) Report Object Count > ++ [4.442s][debug][gc,phases ] GC(24) Report Object Count 519.292ms > [4.442s][info ][gc,phases ] GC(24) Phase 1: Mark live objects 653.209ms > [4.442s][info ][gc,phases,start ] GC(24) Phase 2: Prepare compaction Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13772#pullrequestreview-1410586664 From shade at openjdk.org Wed May 3 10:19:23 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 May 2023 10:19:23 GMT Subject: RFR: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code In-Reply-To: References: Message-ID: <6IdRUilFngVdJMReyIYaKHl-3j3JQWPWvKqdiq81h54=.3505b32b-7120-4c04-add3-0dceceb1ec90@github.com> On Wed, 3 May 2023 10:04:14 GMT, olivergillespie wrote: > Add logging for the time taken to collect/report ObjectCount(AfterGC) event to Serial, Parallel and G1 full collectors, for parity with the G1 concurrent collector. 
This event can be *very* expensive to collect (single threaded STW full heap scan), so it's very important to show it clearly to the user. > > Added output: > > Serial > > [7.976s][debug][gc,phases ] GC(7) Trigger cleanups 0.002ms > [7.977s][debug][gc,phases ] GC(7) Class Unloading 0.556ms > ++ [7.977s][debug][gc,phases,start] GC(7) Report Object Count > ++ [8.529s][debug][gc,phases ] GC(7) Report Object Count 552.065ms > [8.529s][info ][gc,phases ] GC(7) Phase 1: Mark live objects 1066.882ms > [8.529s][info ][gc,phases,start] GC(7) Phase 2: Compute new object addresses > > Parallel > > [5.786s][debug][gc,phases ] GC(12) Trigger cleanups 0.002ms > [5.786s][debug][gc,phases ] GC(12) Class Unloading 0.556ms > ++ [5.786s][debug][gc,phases,start] GC(12) Report Object Count > ++ [6.313s][debug][gc,phases ] GC(12) Report Object Count 526.307ms > [6.313s][info ][gc,phases ] GC(12) Marking Phase 889.900ms > [6.313s][info ][gc,phases,start] GC(12) Summary Phase > > G1 Full > > [3.922s][debug][gc,phases ] GC(24) Trigger cleanups 0.002ms > [3.922s][debug][gc,phases ] GC(24) Phase 1: Class Unloading and Cleanup 0.159ms > ++ [3.922s][debug][gc,phases,start ] GC(24) Report Object Count > ++ [4.442s][debug][gc,phases ] GC(24) Report Object Count 519.292ms > [4.442s][info ][gc,phases ] GC(24) Phase 1: Mark live objects 653.209ms > [4.442s][info ][gc,phases,start ] GC(24) Phase 2: Prepare compaction Some nits. src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2071: > 2069: > 2070: { > 2071: GCTraceTime(Debug, gc, phases) debug("Report Object Count", &_gc_timer); Nit: in this file, the holder variables are called `tm`, not `debug`. src/hotspot/share/gc/serial/genMarkSweep.cpp line 214: > 212: > 213: { > 214: GCTraceTime(Debug, gc, phases) debug("Report Object Count", gc_timer()); Nit: in this file, the holder variables are called `tm_m`, not `debug`. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13772#pullrequestreview-1410586067 PR Review Comment: https://git.openjdk.org/jdk/pull/13772#discussion_r1183496549 PR Review Comment: https://git.openjdk.org/jdk/pull/13772#discussion_r1183496759 From ayang at openjdk.org Wed May 3 10:25:14 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 3 May 2023 10:25:14 GMT Subject: RFR: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code In-Reply-To: References: Message-ID: On Wed, 3 May 2023 10:04:14 GMT, olivergillespie wrote: > single threaded STW full heap scan `HeapInspection::populate_table` can use multiple threads. Could `report_object_count_after_gc` invoke the parallel version? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13772#issuecomment-1532783089 From duke at openjdk.org Wed May 3 10:31:15 2023 From: duke at openjdk.org (olivergillespie) Date: Wed, 3 May 2023 10:31:15 GMT Subject: RFR: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code In-Reply-To: References:

Message-ID:

On Wed, 3 May 2023 10:22:52 GMT, Albert Mingkun Yang wrote:

> Could report_object_count_after_gc invoke the parallel version?

Yes, I was just thinking the same thing! I think it could, I will follow up to implement that change, thanks for the suggestion.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13772#issuecomment-1532788952

From shade at openjdk.org Wed May 3 10:34:15 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 3 May 2023 10:34:15 GMT
Subject: RFR: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code
In-Reply-To:
References:

Message-ID: On Wed, 3 May 2023 10:27:59 GMT, olivergillespie wrote: > > Could report_object_count_after_gc invoke the parallel version? > > Yes, I was just thinking the same thing! I think it could, I will follow up to implement that change, thanks for the suggestion. Filed: https://bugs.openjdk.org/browse/JDK-8307348 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13772#issuecomment-1532793618 From tschatzl at openjdk.org Wed May 3 10:34:19 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 10:34:19 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v3] In-Reply-To: <33tIj1LuZJo-0_EbMmYXzw5SgePPVqmhY66M49yQgeA=.d48c62d4-9fa0-4889-810b-d7b0ad30a70b@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <33tIj1LuZJo-0_EbMmYXzw5SgePPVqmhY66M49yQgeA=.d48c62d4-9fa0-4889-810b-d7b0ad30a70b@github.com> Message-ID: On Tue, 2 May 2023 14:14:21 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang review - remove unused methods >> - Whitespace fixes >> - typo >> - More cleanup >> - Cleanup >> - Cleanup >> - Refactor collection set candidates >> >> Improve the interface to collection set candidates and prepare for having collection set >> candidates at any time. Preparations to allow for multiple sources for these candidates >> (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch >> only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's >> not used otherwise. >> >> * the collection set candidates set is not temporarily allocated any more, but the candidate >> set object must be available all the time. 
>> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains >> the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not >> necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. >> Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Everything else are changes to use these helper sets/lists throughout. >> >> Some additional FIXME for log messages to remove are in there. Please ignore. > > src/hotspot/share/gc/g1/heapRegion.inline.hpp line 344: > >> 342: } >> 343: >> 344: inline bool HeapRegion::in_collection_set_candidates() const { > > The impl is identical to `is_collection_set_candidate`. Maybe one is enough? I inlined a few helpers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1183512571 From duke at openjdk.org Wed May 3 10:40:24 2023 From: duke at openjdk.org (olivergillespie) Date: Wed, 3 May 2023 10:40:24 GMT Subject: RFR: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code [v2] In-Reply-To: References: Message-ID: > Add logging for the time taken to collect/report ObjectCount(AfterGC) event to Serial, Parallel and G1 full collectors, for parity with the G1 concurrent collector. This event can be *very* expensive to collect (single threaded STW full heap scan), so it's very important to show it clearly to the user. 
> > Added output: > > Serial > > [7.976s][debug][gc,phases ] GC(7) Trigger cleanups 0.002ms > [7.977s][debug][gc,phases ] GC(7) Class Unloading 0.556ms > ++ [7.977s][debug][gc,phases,start] GC(7) Report Object Count > ++ [8.529s][debug][gc,phases ] GC(7) Report Object Count 552.065ms > [8.529s][info ][gc,phases ] GC(7) Phase 1: Mark live objects 1066.882ms > [8.529s][info ][gc,phases,start] GC(7) Phase 2: Compute new object addresses > > Parallel > > [5.786s][debug][gc,phases ] GC(12) Trigger cleanups 0.002ms > [5.786s][debug][gc,phases ] GC(12) Class Unloading 0.556ms > ++ [5.786s][debug][gc,phases,start] GC(12) Report Object Count > ++ [6.313s][debug][gc,phases ] GC(12) Report Object Count 526.307ms > [6.313s][info ][gc,phases ] GC(12) Marking Phase 889.900ms > [6.313s][info ][gc,phases,start] GC(12) Summary Phase > > G1 Full > > [3.922s][debug][gc,phases ] GC(24) Trigger cleanups 0.002ms > [3.922s][debug][gc,phases ] GC(24) Phase 1: Class Unloading and Cleanup 0.159ms > ++ [3.922s][debug][gc,phases,start ] GC(24) Report Object Count > ++ [4.442s][debug][gc,phases ] GC(24) Report Object Count 519.292ms > [4.442s][info ][gc,phases ] GC(24) Phase 1: Mark live objects 653.209ms > [4.442s][info ][gc,phases,start ] GC(24) Phase 2: Prepare compaction olivergillespie has updated the pull request incrementally with one additional commit since the last revision: Use correct holder var names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13772/files - new: https://git.openjdk.org/jdk/pull/13772/files/3ef4f4cc..ce7227c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13772&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13772&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13772.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13772/head:pull/13772 PR: https://git.openjdk.org/jdk/pull/13772 From shade at openjdk.org Wed May 3 10:41:16 2023 From: shade at openjdk.org 
(Aleksey Shipilev) Date: Wed, 3 May 2023 10:41:16 GMT Subject: RFR: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code [v2] In-Reply-To: References:

Message-ID: On Wed, 3 May 2023 10:40:24 GMT, olivergillespie wrote: >> Add logging for the time taken to collect/report ObjectCount(AfterGC) event to Serial, Parallel and G1 full collectors, for parity with the G1 concurrent collector. This event can be *very* expensive to collect (single threaded STW full heap scan), so it's very important to show it clearly to the user. >> >> Added output: >> >> Serial >> >> [7.976s][debug][gc,phases ] GC(7) Trigger cleanups 0.002ms >> [7.977s][debug][gc,phases ] GC(7) Class Unloading 0.556ms >> ++ [7.977s][debug][gc,phases,start] GC(7) Report Object Count >> ++ [8.529s][debug][gc,phases ] GC(7) Report Object Count 552.065ms >> [8.529s][info ][gc,phases ] GC(7) Phase 1: Mark live objects 1066.882ms >> [8.529s][info ][gc,phases,start] GC(7) Phase 2: Compute new object addresses >> >> Parallel >> >> [5.786s][debug][gc,phases ] GC(12) Trigger cleanups 0.002ms >> [5.786s][debug][gc,phases ] GC(12) Class Unloading 0.556ms >> ++ [5.786s][debug][gc,phases,start] GC(12) Report Object Count >> ++ [6.313s][debug][gc,phases ] GC(12) Report Object Count 526.307ms >> [6.313s][info ][gc,phases ] GC(12) Marking Phase 889.900ms >> [6.313s][info ][gc,phases,start] GC(12) Summary Phase >> >> G1 Full >> >> [3.922s][debug][gc,phases ] GC(24) Trigger cleanups 0.002ms >> [3.922s][debug][gc,phases ] GC(24) Phase 1: Class Unloading and Cleanup 0.159ms >> ++ [3.922s][debug][gc,phases,start ] GC(24) Report Object Count >> ++ [4.442s][debug][gc,phases ] GC(24) Report Object Count 519.292ms >> [4.442s][info ][gc,phases ] GC(24) Phase 1: Mark live objects 653.209ms >> [4.442s][info ][gc,phases,start ] GC(24) Phase 2: Prepare compaction > > olivergillespie has updated the pull request incrementally with one additional commit since the last revision: > > Use correct holder var names This looks fine to me. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13772#pullrequestreview-1410628693 From rkennke at openjdk.org Wed May 3 10:54:43 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 10:54:43 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v20] In-Reply-To: References: Message-ID: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above)
> - Bits 4..31: the number of heap words from the target base address
>
> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1: there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>
> All the table accesses can be done unsynchronized because:
> - Serial GC is single-threaded anyway
> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
> - G1 serial compaction is single-threaded
>
> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
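
As an illustration of the bit layout listed above, the following standalone C++ sketch packs a forwarding into 32 bits and unpacks it again. It is not the code from the PR: the class name `SlidingForwardingSketch`, the constant names, the use of word indices instead of real heap addresses, and the `kNoBase` sentinel are all assumptions made to keep the example self-contained.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants mirroring the bit layout from the description.
constexpr uint32_t kForwardedBits   = 0x3;            // bits 0..1 == '11': object is forwarded
constexpr uint32_t kFallbackBit     = 1u << 2;        // bit 2: look up in the fallback table
constexpr uint32_t kRegionSelectBit = 1u << 3;        // bit 3: target region 0 or 1
constexpr unsigned kOffsetShift     = 4;              // bits 4..31: offset in heap words
constexpr uint32_t kMaxOffset       = (1u << 28) - 1; // largest offset that fits in 28 bits
constexpr size_t   kNoBase          = SIZE_MAX;       // sentinel: target slot not yet used

// Assumption: the heap is modelled as word indices split into equal-sized regions.
class SlidingForwardingSketch {
  size_t _region_words;
  // One entry per source region: the start of its (at most two) target regions.
  std::vector<std::array<size_t, 2>> _bases;

public:
  SlidingForwardingSketch(size_t num_regions, size_t region_words)
    : _region_words(region_words),
      _bases(num_regions, std::array<size_t, 2>{{kNoBase, kNoBase}}) {}

  // Returns the low 32 header bits that encode "from_word is forwarded to to_word".
  uint32_t encode(size_t from_word, size_t to_word) {
    std::array<size_t, 2>& entry = _bases[from_word / _region_words];
    size_t target_base = (to_word / _region_words) * _region_words;
    for (uint32_t sel = 0; sel < 2; ++sel) {
      if (entry[sel] == kNoBase) {
        entry[sel] = target_base;   // claim an unused slot for this target region
      }
      if (entry[sel] == target_base) {
        uint32_t offset = static_cast<uint32_t>(to_word - target_base);
        assert(offset <= kMaxOffset);
        return (offset << kOffsetShift) | (sel != 0 ? kRegionSelectBit : 0u) | kForwardedBits;
      }
    }
    // A third target region breaks the sliding assumption; the forwardee
    // would have to be recorded in a separate fallback table.
    return kFallbackBit | kForwardedBits;
  }

  static bool is_fallback(uint32_t bits) { return (bits & kFallbackBit) != 0; }

  // Recovers the target word index from the encoded bits.
  size_t decode(size_t from_word, uint32_t bits) const {
    assert((bits & kForwardedBits) == kForwardedBits);
    assert(!is_fallback(bits));
    size_t sel = (bits & kRegionSelectBit) != 0 ? 1 : 0;
    return _bases[from_word / _region_words][sel] + (bits >> kOffsetShift);
  }
};
```

With 1024-word regions, forwarding an object in region 2 to word 5 gives the bits `0x53` (offset 5, selector 0, low bits `11`). A third distinct target region for the same source region only returns the fallback marker, which is the case the fallback table is meant to cover.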
> > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: More Thomas' comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/8366454e..b623db55 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=18-19 Stats: 25 lines in 3 files changed: 7 ins; 5 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From stuefe at openjdk.org Wed May 3 11:11:22 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 May 2023 11:11:22 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v20] In-Reply-To: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> Message-ID: <9CI4EFHvG-HhtLlu8Kf8x6zgmiKqi7au0zn6iHoVrYw=.1a5248d5-5967-428e-bfeb-76f37f8658ad@github.com> On Wed, 3 May 2023 10:54:43 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>>
>> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>>
>> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
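
The fallback table is described above only as a simple open hash-table. A minimal sketch of one possible layout follows, assuming (as the review discussion on slidingForwarding.cpp in this thread suggests) that the first entry of each bucket is stored inline in the bucket array and further entries are chained behind it. All names (`FallbackTableSketch`, `Entry`, `kBuckets`), the hash function and the use of `new`/`delete` are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: maps a 'from' address to its forwardee ('to').
class FallbackTableSketch {
  struct Entry {
    uintptr_t from = 0;       // 0 marks an unused inline head entry
    uintptr_t to   = 0;
    Entry*    next = nullptr; // chained entries for colliding addresses
  };

  static constexpr size_t kBuckets = 16;
  Entry _table[kBuckets];     // head entries live inline in the array

  static size_t bucket_of(uintptr_t from) { return (from >> 3) % kBuckets; }

public:
  FallbackTableSketch() = default;
  FallbackTableSketch(const FallbackTableSketch&) = delete;
  FallbackTableSketch& operator=(const FallbackTableSketch&) = delete;

  ~FallbackTableSketch() {
    // Only chained entries were heap-allocated; heads belong to the array.
    for (Entry& head : _table) {
      Entry* e = head.next;
      while (e != nullptr) {
        Entry* next = e->next;
        delete e;
        e = next;
      }
    }
  }

  void put(uintptr_t from, uintptr_t to) {
    assert(from != 0);
    Entry* head = &_table[bucket_of(from)];
    Entry* entry = head;
    if (head->from != 0) {
      // Bucket in use: search the chain, starting with the inline head.
      while (entry != nullptr && entry->from != from) {
        entry = entry->next;
      }
      if (entry == nullptr) {
        // The head cannot be replaced because it is part of the array,
        // so the new entry is linked in directly behind it.
        entry = new Entry();
        entry->next = head->next;
        head->next = entry;
      }
    }
    // Set from and to in the new or found entry.
    entry->from = from;
    entry->to = to;
  }

  // Returns the forwardee, or 0 if 'from' was never recorded.
  uintptr_t get(uintptr_t from) const {
    assert(from != 0);
    for (const Entry* e = &_table[bucket_of(from)]; e != nullptr; e = e->next) {
      if (e->from == from) {
        return e->to;
      }
    }
    return 0;
  }
};
```

Because the head is embedded in the array, a new entry can only be inserted after it, never in front of it; this is the constraint that makes a plain "prepend to the list" update unsuitable for this layout.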
>> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > More Thomas' comments src/hotspot/share/gc/shared/slidingForwarding.cpp line 153: > 151: // Set from and to in new or found entry. > 152: entry->_from = from; > 153: entry->_to = to; Why so complicated? Proposal: while (entry != nullptr && entry->_from != from) { entry = entry->_next; } if (entry == nullptr) { FallbackTableEntry* new_entry = NEW_C_HEAP_OBJ(FallbackTableEntry, mtGC); new_entry->next = head; new_entry->_from = from; head = entry = new_entry; } entry->_to = to; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183545518 From rkennke at openjdk.org Wed May 3 11:16:25 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 11:16:25 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v20] In-Reply-To: <9CI4EFHvG-HhtLlu8Kf8x6zgmiKqi7au0zn6iHoVrYw=.1a5248d5-5967-428e-bfeb-76f37f8658ad@github.com> References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> <9CI4EFHvG-HhtLlu8Kf8x6zgmiKqi7au0zn6iHoVrYw=.1a5248d5-5967-428e-bfeb-76f37f8658ad@github.com> Message-ID: On Wed, 3 May 2023 11:08:04 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> More Thomas' comments > > src/hotspot/share/gc/shared/slidingForwarding.cpp line 153: > >> 151: // Set from and to in new or found entry. >> 152: entry->_from = from; >> 153: entry->_to = to; > > Why so complicated? 
> Proposal:
>
> while (entry != nullptr && entry->_from != from) {
>   entry = entry->_next;
> }
> if (entry == nullptr) {
>   FallbackTableEntry* new_entry = NEW_C_HEAP_OBJ(FallbackTableEntry, mtGC);
>   new_entry->next = head;
>   new_entry->_from = from;
>   head = entry = new_entry;
> }
> entry->_to = to;

Uhm, so this would not change the actual head

> Why so complicated? Proposal:
>
> ```
> while (entry != nullptr && entry->_from != from) {
>   entry = entry->_next;
> }
> if (entry == nullptr) {
>   FallbackTableEntry* new_entry = NEW_C_HEAP_OBJ(FallbackTableEntry, mtGC);
>   new_entry->next = head;
>   new_entry->_from = from;
>   head = entry = new_entry;
> }
> entry->_to = to;
> ```

Remember that head points into the array. We cannot actually prepend the new entry, we can only insert it as the first linked entry after head. If I see it correctly, it would not actually change the head-entry (the stuff in the array) except for its _to field. Also, the new_entry would not get linked anywhere. Or what am I missing?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183550376

From duke at openjdk.org Wed May 3 11:18:29 2023
From: duke at openjdk.org (olivergillespie)
Date: Wed, 3 May 2023 11:18:29 GMT
Subject: RFR: 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection
Message-ID:

ObjectCount(AfterGC) event does a full single-threaded heap scan at a safepoint. After https://bugs.openjdk.org/browse/JDK-8215624, it is trivial to use the parallel version of the heap scan, reducing the time spent at the safepoint, and thus reducing the overhead of this event.

The performance improvement is obvious, but just for confirmation, on my 16-core host, at around 1GB occupancy:

Before: 770ms ( [3.059s][debug][gc,phases ] GC(13) Report Object Count 770.317ms )
After: 92ms ( [2.335s][debug][gc,phases ] GC(13) Report Object Count 91.742ms )

Question 1: Should this be the default behaviour for populate_table (use the number of active workers as the parallelism, if nothing else is specified)?

Question 2: Is active_workers the correct value to use here? Or is max_workers more appropriate?

-------------

Commit messages:
 - 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection

Changes: https://git.openjdk.org/jdk/pull/13774/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13774&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8307348
Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/13774.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13774/head:pull/13774

PR: https://git.openjdk.org/jdk/pull/13774

From stuefe at openjdk.org Wed May 3 11:20:23 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 3 May 2023 11:20:23 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v20]
In-Reply-To:
References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> <9CI4EFHvG-HhtLlu8Kf8x6zgmiKqi7au0zn6iHoVrYw=.1a5248d5-5967-428e-bfeb-76f37f8658ad@github.com>
Message-ID: <5fyq8JC1XDQJffYmRtITjnFebGWAuYN8doLD_DoiPN0=.8809283a-4cbc-4e47-9598-aeb8a335c8eb@github.com>

On Wed, 3 May 2023 11:13:47 GMT, Roman Kennke wrote:

>> src/hotspot/share/gc/shared/slidingForwarding.cpp line 153:
>>
>>> 151: // Set from and to in new or found entry.
>>> 152: entry->_from = from;
>>> 153: entry->_to = to;
>>
>> Why so complicated?
Proposal: >> >> while (entry != nullptr && entry->_from != from) { >> entry = entry->_next; >> } >> if (entry == nullptr) { >> FallbackTableEntry* new_entry = NEW_C_HEAP_OBJ(FallbackTableEntry, mtGC); >> new_entry->next = head; >> new_entry->_from = from; >> head = entry = new_entry; >> } >> entry->_to = to; > > Uhm, so this would not change the actual head > >> Why so complicated? Proposal: >> >> ``` >> while (entry != nullptr && entry->_from != from) { >> entry = entry->_next; >> } >> if (entry == nullptr) { >> FallbackTableEntry* new_entry = NEW_C_HEAP_OBJ(FallbackTableEntry, mtGC); >> new_entry->next = head; >> new_entry->_from = from; >> head = entry = new_entry; >> } >> entry->_to = to; >> ``` > > Remember that head points into the array. We cannot actually prepend the new entry, we can only insert it as the first linked entry after head. If I see it correctly, it would not actually change the head-entry (the stuff in the array) except for its _to field. Also, the new_entry would not get linked anywhere. Or what am I missing? Ah, sorry, I just realized you inlined the head elements into the table. Okay, never mind then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183553784 From duke at openjdk.org Wed May 3 11:23:50 2023 From: duke at openjdk.org (olivergillespie) Date: Wed, 3 May 2023 11:23:50 GMT Subject: RFR: 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection [v2] In-Reply-To: References: Message-ID: > ObjectCount(AfterGC) event does a full single-threaded heap scan at a safepoint. After https://bugs.openjdk.org/browse/JDK-8215624, it is trivial to use the parallel version of the heap scan, reducing the time spent at the safepoint, and thus reducing the overhead of this event. 
> > The performance improvement is obvious, but just for confirmation, on my 16-core host, at around 1GB occupancy: > > > Before: 770ms ( [3.059s][debug][gc,phases ] GC(13) Report Object Count 770.317ms ) > After: 92ms ( [2.335s][debug][gc,phases ] GC(13) Report Object Count 91.742ms ) > > > Question 1: Should this be the default behaviour for populate_table (use the number active workers as the parallelism, if nothing else specified)? > > Question 2: Is active_workers the correct value to use here? Or is max_workers more appropriate? olivergillespie has updated the pull request incrementally with one additional commit since the last revision: Fix compile error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13774/files - new: https://git.openjdk.org/jdk/pull/13774/files/b8b30b5e..88eb1ede Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13774&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13774&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13774/head:pull/13774 PR: https://git.openjdk.org/jdk/pull/13774 From stuefe at openjdk.org Wed May 3 11:24:28 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 May 2023 11:24:28 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v20] In-Reply-To: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> Message-ID: On Wed, 3 May 2023 10:54:43 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
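The bit layout quoted in this description (bits 0 and 1 set to '11', bit 2 for fallback, bit 3 selecting one of two target regions, bits 4..31 holding a heap-word offset) can be sketched roughly as follows. This is only an illustration of the layout, not the code from the PR: every name here (`RegionTargets`, `encode_forwarding`, `decode_forwarding`, the constants) is hypothetical, and an 8-byte heap word is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constants mirroring the bit layout in the PR description.
constexpr uint32_t kForwardedMask   = 0x3;            // bits 0-1: '11' means forwarded
constexpr uint32_t kFallbackBit     = 1u << 2;        // bit 2: look up in fallback table
constexpr uint32_t kTargetSelectBit = 1u << 3;        // bit 3: target region 0 or 1
constexpr unsigned kOffsetShift     = 4;              // bits 4..31: offset in heap words
constexpr uint32_t kMaxWordOffset   = (1u << 28) - 1; // 28 bits of offset
constexpr size_t   kHeapWordSize    = 8;              // assumption: 64-bit heap words

// Per-source-region table entry: start addresses of the two possible
// target regions (hypothetical representation).
struct RegionTargets {
  uintptr_t base[2];
};

// Build the low 32 header bits for a regular (non-fallback) forwarding.
inline uint32_t encode_forwarding(unsigned target_index, uint32_t word_offset) {
  assert(target_index < 2);
  assert(word_offset <= kMaxWordOffset);
  return (word_offset << kOffsetShift)
       | (target_index != 0 ? kTargetSelectBit : 0u)
       | kForwardedMask;
}

inline bool is_forwarded(uint32_t low_bits) {
  return (low_bits & kForwardedMask) == kForwardedMask;
}

inline bool uses_fallback(uint32_t low_bits) {
  return (low_bits & kFallbackBit) != 0;
}

// Recover the forwardee address from the encoded bits and the table entry
// of the object's source region.
inline uintptr_t decode_forwarding(uint32_t low_bits, const RegionTargets& targets) {
  assert(is_forwarded(low_bits) && !uses_fallback(low_bits));
  unsigned target_index = (low_bits & kTargetSelectBit) != 0 ? 1 : 0;
  uintptr_t word_offset = low_bits >> kOffsetShift;
  return targets.base[target_index] + word_offset * kHeapWordSize;
}
```

With 28 offset bits and 8-byte words, such an encoding could address up to 2 GB within a target region under these assumptions; anything that does not fit the two-target scheme would have to go through the fallback path.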
>> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > More Thomas' comments Okay, good so far. src/hotspot/share/gc/shared/slidingForwarding.cpp line 147: > 145: new_entry->_next = head->_next; > 146: new_entry->_from = head->_from; > 147: new_entry->_to = head->_to; You could probably just use assignment here, which does memberwise copy. `*new_entry = *head;` ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1410687480 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183556810 From tschatzl at openjdk.org Wed May 3 11:27:37 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 11:27:37 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v4] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. 
> > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang, iwalulya review fix inlining in g1CollectionSet.inline.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13666/files - new: https://git.openjdk.org/jdk/pull/13666/files/30a157ed..cdc63375 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=02-03 Stats: 30 lines in 8 files changed: 3 ins; 10 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From duke at openjdk.org Wed May 3 12:01:13 2023 From: duke at openjdk.org (olivergillespie) Date: Wed, 3 May 2023 12:01:13 GMT Subject: RFR: 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection [v3] In-Reply-To: References: Message-ID: 
<_4E0C256mo0CLZnMrKJq0JoCn7tsFppEGyGt-SsjH9A=.eaa625c9-f745-4341-87d0-97127820bb21@github.com> > ObjectCount(AfterGC) event does a full single-threaded heap scan at a safepoint. After https://bugs.openjdk.org/browse/JDK-8215624, it is trivial to use the parallel version of the heap scan, reducing the time spent at the safepoint, and thus reducing the overhead of this event. > > The performance improvement is obvious, but just for confirmation, on my 16-core host, at around 1GB occupancy: > > > Before: 770ms ( [3.059s][debug][gc,phases ] GC(13) Report Object Count 770.317ms ) > After: 92ms ( [2.335s][debug][gc,phases ] GC(13) Report Object Count 91.742ms ) > > > Question 1: Should this be the default behaviour for populate_table (use the number active workers as the parallelism, if nothing else specified)? > > Question 2: Is active_workers the correct value to use here? Or is max_workers more appropriate? olivergillespie has updated the pull request incrementally with one additional commit since the last revision: Fix compile error ``` === Output from failing command(s) repeated here === * For target hotspot_variant-server_libjvm_objs_gcTrace.o: /home/runner/work/jdk/jdk/src/hotspot/share/gc/shared/gcTrace.cpp: In member function 'void GCTracer::report_object_count_after_gc(BoolObjectClosure*)': /home/runner/work/jdk/jdk/src/hotspot/share/gc/shared/gcTrace.cpp:114:48: error: invalid use of incomplete type 'class CollectedHeap' 114 | WorkerThreads* workers = Universe::heap()->safepoint_workers(); | ^~ In file included from /home/runner/work/jdk/jdk/src/hotspot/share/gc/shared/gcTrace.cpp:35: /home/runner/work/jdk/jdk/src/hotspot/share/memory/universe.hpp:42:7: note: forward declaration of 'class CollectedHeap' 42 | class CollectedHeap; | ^~~~~~~~~~~~~ * All command lines available in /home/runner/work/jdk/jdk/build/linux-x64/make-support/failure-logs. 
=== End of repeated output === ``` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13774/files - new: https://git.openjdk.org/jdk/pull/13774/files/88eb1ede..22b9b6d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13774&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13774&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13774/head:pull/13774 PR: https://git.openjdk.org/jdk/pull/13774 From shade at openjdk.org Wed May 3 12:06:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 May 2023 12:06:15 GMT Subject: RFR: 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection [v3] In-Reply-To: <_4E0C256mo0CLZnMrKJq0JoCn7tsFppEGyGt-SsjH9A=.eaa625c9-f745-4341-87d0-97127820bb21@github.com> References: <_4E0C256mo0CLZnMrKJq0JoCn7tsFppEGyGt-SsjH9A=.eaa625c9-f745-4341-87d0-97127820bb21@github.com> Message-ID: On Wed, 3 May 2023 12:01:13 GMT, olivergillespie wrote: >> ObjectCount(AfterGC) event does a full single-threaded heap scan at a safepoint. After https://bugs.openjdk.org/browse/JDK-8215624, it is trivial to use the parallel version of the heap scan, reducing the time spent at the safepoint, and thus reducing the overhead of this event. >> >> The performance improvement is obvious, but just for confirmation, on my 16-core host, at around 1GB occupancy: >> >> >> Before: 770ms ( [3.059s][debug][gc,phases ] GC(13) Report Object Count 770.317ms ) >> After: 92ms ( [2.335s][debug][gc,phases ] GC(13) Report Object Count 91.742ms ) >> >> >> Question 1: Should this be the default behaviour for populate_table (use the number active workers as the parallelism, if nothing else specified)? >> >> Question 2: Is active_workers the correct value to use here? Or is max_workers more appropriate? 
> > olivergillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix compile error > > ``` > === Output from failing command(s) repeated here === > * For target hotspot_variant-server_libjvm_objs_gcTrace.o: > /home/runner/work/jdk/jdk/src/hotspot/share/gc/shared/gcTrace.cpp: In member function 'void GCTracer::report_object_count_after_gc(BoolObjectClosure*)': > /home/runner/work/jdk/jdk/src/hotspot/share/gc/shared/gcTrace.cpp:114:48: error: invalid use of incomplete type 'class CollectedHeap' > 114 | WorkerThreads* workers = Universe::heap()->safepoint_workers(); > | ^~ > In file included from /home/runner/work/jdk/jdk/src/hotspot/share/gc/shared/gcTrace.cpp:35: > /home/runner/work/jdk/jdk/src/hotspot/share/memory/universe.hpp:42:7: note: forward declaration of 'class CollectedHeap' > 42 | class CollectedHeap; > | ^~~~~~~~~~~~~ > > * All command lines available in /home/runner/work/jdk/jdk/build/linux-x64/make-support/failure-logs. > === End of repeated output === > ``` Why not just `hi.populate_table(&cit, is_alive_cl, ParallelGCThreads);`, and let the `populate_table` deal with the rest? I think we have a convention that `ParallelGCThreads` is roughly the proxy for the number of GC threads at paused operation. (It is weird that `HeapInspection::populate_table` uses `safepoint_workers` -- maybe that's for additional isolation from the GC threads -- let's not proliferate it here. `populate_table` also caps the worker count at `max_workers`, which answers one of your questions) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13774#issuecomment-1532905321 From duke at openjdk.org Wed May 3 12:16:13 2023 From: duke at openjdk.org (olivergillespie) Date: Wed, 3 May 2023 12:16:13 GMT Subject: RFR: 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection [v4] In-Reply-To: References: Message-ID: > ObjectCount(AfterGC) event does a full single-threaded heap scan at a safepoint. 
After https://bugs.openjdk.org/browse/JDK-8215624, it is trivial to use the parallel version of the heap scan, reducing the time spent at the safepoint, and thus reducing the overhead of this event. > > The performance improvement is obvious, but just for confirmation, on my 16-core host, at around 1GB occupancy: > > > Before: 770ms ( [3.059s][debug][gc,phases ] GC(13) Report Object Count 770.317ms ) > After: 92ms ( [2.335s][debug][gc,phases ] GC(13) Report Object Count 91.742ms ) > > > Question 1: Should this be the default behaviour for populate_table (use the number active workers as the parallelism, if nothing else specified)? > > Question 2: Is active_workers the correct value to use here? Or is max_workers more appropriate? olivergillespie has updated the pull request incrementally with one additional commit since the last revision: Use ParallelGCThreads instead of active_workers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13774/files - new: https://git.openjdk.org/jdk/pull/13774/files/22b9b6d5..711cb643 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13774&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13774&range=02-03 Stats: 9 lines in 1 file changed: 1 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13774/head:pull/13774 PR: https://git.openjdk.org/jdk/pull/13774 From duke at openjdk.org Wed May 3 12:16:14 2023 From: duke at openjdk.org (olivergillespie) Date: Wed, 3 May 2023 12:16:14 GMT Subject: RFR: 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection [v3] In-Reply-To: References: <_4E0C256mo0CLZnMrKJq0JoCn7tsFppEGyGt-SsjH9A=.eaa625c9-f745-4341-87d0-97127820bb21@github.com> Message-ID: <7ZZt-QYarRp7IHjf8vYEvn8Asp2cmZ8KFwoQLwWKLy8=.bd45eca2-ed82-4cb8-b10c-c1c97b5fab14@github.com> On Wed, 3 May 2023 12:03:44 GMT, Aleksey Shipilev wrote: > Why not just `hi.populate_table(&cit, is_alive_cl, ParallelGCThreads);`, and 
let the `populate_table` deal with the rest? I think we have a convention that `ParallelGCThreads` is roughly the proxy for the number of GC threads at paused operation. > > (It is weird that `HeapInspection::populate_table` uses `safepoint_workers` -- maybe that's for additional isolation from the GC threads -- let's not proliferate it here. `populate_table` also caps the worker count at `max_workers`, which answers one of your questions) Thanks, that's fine by me, whatever is most idiomatic. Updated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13774#issuecomment-1532917696 From rkennke at openjdk.org Wed May 3 12:21:44 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 12:21:44 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v21] In-Reply-To: References: Message-ID: <9ne0qVqlv8GjcVAZ76BIuMLqypEmpAhS-W_cHi_FRfE=.8491768a-6c51-4af9-a07f-d99863d634a5@github.com> > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. 
This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes.
Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Refactor GCForwarding into SlidingForwarding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/b623db55..568e5ea3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=19-20 Stats: 343 lines in 20 files changed: 87 ins; 184 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Wed May 3 12:34:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 12:34:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v22] 
In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. 
Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
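As a rough illustration of the fallback table mentioned in this description, and of the point raised in review that the head entries are inlined into the table array (so a new entry cannot be prepended in front of the head), here is a minimal standalone sketch. It is an assumption-laden reconstruction, not the PR's code: `FallbackTable`, `FallbackEntry`, the bucket count, the hash function, and the use of a null `from` to mark an unused head slot are all hypothetical, and plain `new`/`delete` stands in for HotSpot's C-heap allocation macros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One chain element. The first element of each chain lives inside the
// bucket array itself; only overflow elements are heap-allocated.
struct FallbackEntry {
  FallbackEntry* next = nullptr;
  const void*    from = nullptr;  // assumption: nullptr marks an unused head slot
  const void*    to   = nullptr;
};

class FallbackTable {
  static constexpr size_t kBuckets = 64;  // arbitrary size for the sketch
  FallbackEntry _table[kBuckets];         // head entries are inlined here

  static size_t index_of(const void* from) {
    // Hypothetical hash: drop alignment bits, then take the modulus.
    return (reinterpret_cast<uintptr_t>(from) >> 3) % kBuckets;
  }

public:
  FallbackTable() = default;
  FallbackTable(const FallbackTable&) = delete;
  FallbackTable& operator=(const FallbackTable&) = delete;

  ~FallbackTable() {
    for (size_t i = 0; i < kBuckets; ++i) {
      FallbackEntry* e = _table[i].next;
      while (e != nullptr) {
        FallbackEntry* next = e->next;
        delete e;
        e = next;
      }
    }
  }

  void put(const void* from, const void* to) {
    FallbackEntry* head  = &_table[index_of(from)];
    FallbackEntry* entry = head;
    if (head->from != nullptr) {
      // Bucket in use: search the chain for an existing mapping.
      while (entry != nullptr && entry->from != from) {
        entry = entry->next;
      }
      if (entry == nullptr) {
        // Not found. The head cannot be replaced because it lives in the
        // array, so copy its current contents into a new node (memberwise
        // copy), link that node directly after the head, and reuse the head.
        FallbackEntry* moved = new FallbackEntry(*head);
        head->next = moved;
        entry = head;
      }
    }
    // Set from and to in the new or found entry.
    entry->from = from;
    entry->to   = to;
  }

  const void* get(const void* from) const {
    for (const FallbackEntry* e = &_table[index_of(from)]; e != nullptr; e = e->next) {
      if (e->from == from) {
        return e->to;
      }
    }
    return nullptr;
  }
};
```

Keeping the head inline presumably avoids an allocation and a pointer chase for the common case of one entry per bucket, at the cost of the slightly unusual insert path shown in `put`.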
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: - Place 'public' correctly - Use member assignment, instead of explicitly copying the struct - Set UseAltGCForwarding flag in test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/568e5ea3..7691eb81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=20-21 Stats: 8 lines in 3 files changed: 4 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From ayang at openjdk.org Wed May 3 12:56:16 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 3 May 2023 12:56:16 GMT Subject: RFR: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code [v2] In-Reply-To: References:

Message-ID: On Wed, 26 Apr 2023 17:28:49 GMT, Chris Plummer wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> cplummer review > > SA changes look good. Thanks @plummercj @sspitsyn @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/13643#issuecomment-1533064129 From tschatzl at openjdk.org Wed May 3 13:53:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 13:53:28 GMT Subject: Integrated: 8306836: Remove pinned tag for G1 heap regions In-Reply-To: References: Message-ID: <4wdBNSgTzWoVKhbSXY8vlBwj_3eE2pyB3knxVGWKDHk=.0225c1ae-6a26-4170-b2ea-1e85ea6e6a64@github.com> On Tue, 25 Apr 2023 13:49:05 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes the pinned tag from `HeapRegion`. > > So that "pinned" tag for G1 heap regions indicates that the region should not move during (young) gc. This applies to now removed archive regions and humongous objects/regions. > > With "real" g1 region pinning to deal with gclocker in g1 once and for all upcoming we need a refcount, a single bit is not sufficient anymore. Further there will be a naming conflict as this kind of "pinning" is different to g1 region pinning "pinning". The former indicates "contents can not be moved, but can be reclaimed", while the latter means "contents can not be moved and not reclaimed". > > The (current) pinned flag is surprisingly little used, only for policy decisions. > > The suggestion this change implements is to remove the "pinned" tag as it is, and reserve it for future g1 region pinning (that needs to store the pinning attribute differently as a refcount anyway). > > Testing: tier1-3, gha > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: fc76687c Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/fc76687c2fac39fcbf706c419bfa170b8efa5747 Stats: 62 lines in 18 files changed: 5 ins; 31 del; 26 mod 8306836: Remove pinned tag for G1 heap regions Reviewed-by: ayang, cjplummer, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/13643 From rkennke at openjdk.org Wed May 3 14:10:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 14:10:33 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v23] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. 
> > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. 
I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Bunch of fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/7691eb81..f30039a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=21-22 Stats: 13 lines in 3 files changed: 0 ins; 10 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From shade at openjdk.org Wed May 3 14:34:22 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 May 2023 14:34:22 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v23] In-Reply-To: References:

Message-ID: <1JFhieDv0YPe9ntcx6S2IFzkMdj6NGQtuoWPcG0KXUU=.eb4cafe0-4086-49d2-9a6d-720aa9b2fe69@github.com> On Wed, 3 May 2023 14:10:33 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. 
The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. 
It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>>
>> Testing:
>> - [x] hotspot_gc -UseAltGCForwarding
>> - [x] hotspot_gc +UseAltGCForwarding
>> - [x] tier1 -UseAltGCForwarding
>> - [x] tier1 +UseAltGCForwarding
>> - [x] tier2 -UseAltGCForwarding
>> - [x] tier2 +UseAltGCForwarding
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Bunch of fixes

Performance: the worst case I can come up with is Serial Full GC that moves the entire heap full of smallest objects, like this:

    public class Retain {
        static final int RETAINED = Integer.getInteger("retained", 10_000_000);
        static final int GCS = Integer.getInteger("gcs", 100);

        static Object[] OBJECTS = new Object[RETAINED];

        public static void main(String... args) {
            for (int t = 0; t < GCS; t++) {
                for (int c = 0; c < RETAINED; c++) {
                    OBJECTS[c] = new Object();
                }
                System.gc();
            }
        }
    }

On my `c6n.8xlarge` instance, with `java -Xmx1g -Xlog:gc -XX:+UseSerialGC Retain.java`, I see:

    baseline:                  364 +- 5 ms
    patched, -AltGCForwarding: 385 +- 3 ms [+6%]
    patched, +AltGCForwarding: 445 +- 5 ms [+22%]

There are regressions even with `-AltGCForwarding`, and judging from the profiles and the point experiments, those are caused by the `AltGCForwarding` flag checks for every `forward_to` and `forwardee`, split evenly between these two paths. But given the very targeted workload above running back-to-back Full GCs intentionally, this regression looks okay. (I think the only way to dodge it would be to template the bunch of GC code and dispatch to it once per GC phase, rather than per oop, which would be very intrusive and serve no practical need, IMO.)

The regression with `+AltGCForwarding` looks impressive in comparison: it is "only" worth three flag checks or so.
The code I am seeing in profiles is already quite polished, so we would unlikely squeeze more from it without investing much more time. I don't think any of this would show up at larger benchmarks running in usual (young, mixed) GC modes. Indeed, I ran a few point experiments, and there seem to be no visible change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1533133187 From tschatzl at openjdk.org Wed May 3 15:35:20 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 15:35:20 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v5] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). 
> > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Merge branch 'master' into 8306541-refactor-cset-candidates - ayang, iwalulya review fix inlining in g1CollectionSet.inline.hpp - Merge branch 'master' into 8306541-refactor-cset-candidates - ayang review - remove unused methods - Whitespace fixes - typo - More cleanup - Cleanup - Cleanup - Refactor collection set candidates Improve the interface to collection set candidates and prepare for having collection set candidates at any time. Preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch only uses candidates from marking at this time. Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. * the collection set candidates set is not temporarily allocated any more, but the candidate set object must be available all the time. * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). * there are several additional helper sets/lists * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. 
Building block for the actual collection set candidates list. All these sets implement C++ iterators for simpler use in various places. Everything else are changes to use these helper sets/lists throughout. Some additional FIXME for log messages to remove are in there. Please ignore. ------------- Changes: https://git.openjdk.org/jdk/pull/13666/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=04 Stats: 1082 lines in 25 files changed: 617 ins; 219 del; 246 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From stuefe at openjdk.org Wed May 3 16:02:23 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 May 2023 16:02:23 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v23] In-Reply-To: References:

Message-ID: On Wed, 3 May 2023 14:10:33 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. 
The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. 
It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>>
>> Testing:
>> - [x] hotspot_gc -UseAltGCForwarding
>> - [x] hotspot_gc +UseAltGCForwarding
>> - [x] tier1 -UseAltGCForwarding
>> - [x] tier1 +UseAltGCForwarding
>> - [x] tier2 -UseAltGCForwarding
>> - [x] tier2 +UseAltGCForwarding
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Bunch of fixes

Changes requested by stuefe (Reviewer).

src/hotspot/share/gc/shared/slidingForwarding.cpp line 162:

> 160:   FallbackTableEntry* head = &_table[idx];
> 161:   FallbackTableEntry* entry = head;
> 162:   // Search existing entry in chain starting at idx.

You don't use the head node. You should use the head node before creating a new node.

-------------

PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1411249288
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183900419

From rkennke at openjdk.org Wed May 3 17:47:24 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 3 May 2023 17:47:24 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v24]
In-Reply-To: References: Message-ID:

> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
>
> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header.
> > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. 
Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>
> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
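
The header encoding described earlier in this message (bits 0..1 as the forwarded marker, bit 2 for fallback, bit 3 selecting one of two target bases, bits 4..31 holding a heap-word offset) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the `sketch` namespace, `SlidingTable`, `encode`, `decode` and all constants are hypothetical names, addresses are modelled as plain heap-word indices, and the fallback hash table itself is omitted (the sketch merely sets the fallback bit).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// All names here are hypothetical; this is not the code from the PR.
constexpr uint32_t kForwardedBits = 0x3;      // bits 0..1: object is forwarded
constexpr uint32_t kFallbackBit   = 1u << 2;  // bit 2: forwardee is in the fallback table
constexpr uint32_t kAltRegionBit  = 1u << 3;  // bit 3: use target base 1 instead of 0
constexpr unsigned kOffsetShift   = 4;        // bits 4..31: offset in heap words
constexpr size_t   kMaxOffset     = (size_t(1) << 28) - 1;
constexpr size_t   kUnusedBase    = SIZE_MAX;

inline bool is_fallback(uint32_t bits) { return (bits & kFallbackBit) != 0; }

// Addresses are modelled as heap-word indices, starting at 0.
class SlidingTable {
  unsigned shift_;              // log2 of the region size in words
  std::vector<size_t> bases_;   // two target-base slots per source region
 public:
  SlidingTable(size_t num_regions, unsigned region_words_shift)
      : shift_(region_words_shift), bases_(num_regions * 2, kUnusedBase) {}

  // Returns the low 32 header bits for an object moving from `from` to `to`.
  uint32_t encode(size_t from, size_t to) {
    size_t slot = (from >> shift_) * 2;
    size_t to_base = (to >> shift_) << shift_;
    uint32_t alt = 0;
    if (bases_[slot] == kUnusedBase) {
      bases_[slot] = to_base;                 // primary slot is free
    } else if (bases_[slot] != to_base) {
      slot += 1;                              // try the secondary slot
      alt = kAltRegionBit;
      if (bases_[slot] == kUnusedBase) {
        bases_[slot] = to_base;
      } else if (bases_[slot] != to_base) {
        // Third target region: a real implementation would record the
        // forwardee in a fallback table here; the sketch only flags it.
        return kFallbackBit | kForwardedBits;
      }
    }
    size_t offset = to - to_base;
    assert(offset <= kMaxOffset);
    return (static_cast<uint32_t>(offset) << kOffsetShift) | alt | kForwardedBits;
  }

  // Recovers the target word index from the source address and header bits.
  size_t decode(size_t from, uint32_t bits) const {
    assert((bits & kForwardedBits) == kForwardedBits);
    assert(!is_fallback(bits));
    size_t slot = (from >> shift_) * 2 + ((bits & kAltRegionBit) != 0 ? 1 : 0);
    return bases_[slot] + (bits >> kOffsetShift);
  }
};

}  // namespace sketch
```

With 16-word regions, for example, `SlidingTable t(4, 4)` lets objects from source region 2 be forwarded into two distinct target regions through the two slots; a third distinct target region for the same source region comes back with the fallback bit set.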
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Flatten SlidingForwarding and use heads of FallbackTable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/f30039a0..fe0915e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=22-23 Stats: 117 lines in 3 files changed: 19 ins; 43 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Wed May 3 19:19:44 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 19:19:44 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: References: Message-ID: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. 
> > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. 
Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>
> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix type narrowing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/fe0915e2..5ee17597 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=23-24 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From tschatzl at openjdk.org Wed May 3 21:42:24 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 21:42:24 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> Message-ID: <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> On Wed, 3 May 2023 19:19:44 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. 
>> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. 
>> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix type narrowing Changes requested by tschatzl (Reviewer). 
src/hotspot/share/gc/shared/gc_globals.hpp line 699: > 697: \ > 698: product(bool, UseAltGCForwarding, false, EXPERIMENTAL, \ > 699: "Use alternative GC forwarding that preserves object headers") \ I would strongly prefer if this were not a product flag at this time, but a develop flag. It potentially decreases performance of serial gc full gcs by a significant amount with no upside at all (not that worried about g1 or other concurrent gcs). Can you give me reasons why an end user would ever consciously enable this flag? Using a develop flag is only a minor annoyance for development - we already do that for other features like evacuation failure injection in G1. For end users this would result in (guaranteed) zero performance impact. Only when adding compressed object headers with Lilliput this should be changed to a product flag. I do not know your schedule for upstreaming Lilliput, but if it would miss JDK 21, people would suffer from this for the entire lifetime of JDK 21.... which is an LTS release. (Fwiw I would suggest the same for a non-LTS release, it seems to be worse in this situation though). src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 43: > 41: > 42: uint SlidingForwarding::region_index_containing(HeapWord* addr) { > 43: uint index = static_cast<uint>(pointer_delta(addr, _heap_start) >> _region_size_words_shift); I believe it is possible to bias the array pointer to avoid that subtraction of the `_heap_start` like we do e.g. for the card table. See also `G1BiasedArray` or so for a kind of ready-made class implementing this. Not sure it will help a lot, but at least remove the subtraction and the load of the `_heap_start` value.
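As an aside on the biasing idea raised above, here is a minimal standalone sketch, not HotSpot code and not the code under review. `RegionTable`, its members, and the use of plain `uintptr_t` values as addresses are assumptions made for illustration; only `_heap_start` and the "subtract, then shift by the region size" computation come from the quoted snippet. The sketch assumes the heap start is aligned to the region size and that integer-to-pointer round trips behave as they do on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-region side table (illustration only, not a HotSpot class).
class RegionTable {
  std::vector<int> _slots;   // one slot per region
  uintptr_t _heap_start;     // first heap address, aligned to the region size
  unsigned  _shift;          // log2 of the region size in bytes
  uintptr_t _biased_base;    // address of _slots[0] minus the bias

public:
  RegionTable(uintptr_t heap_start, size_t num_regions, unsigned shift)
    : _slots(num_regions, 0), _heap_start(heap_start), _shift(shift) {
    // The bias trick only works if the heap start is region-aligned.
    assert((heap_start & ((uintptr_t(1) << shift) - 1)) == 0);
    // Pre-subtract the slot offset of the heap start once, at setup time.
    _biased_base = reinterpret_cast<uintptr_t>(_slots.data())
                 - (heap_start >> shift) * sizeof(int);
  }

  // Unbiased: loads _heap_start and subtracts it on every call.
  size_t index_of(uintptr_t addr) const {
    return (addr - _heap_start) >> _shift;
  }

  // Unbiased table access via the explicit region index.
  int& slot_plain(uintptr_t addr) {
    return _slots[index_of(addr)];
  }

  // Biased table access: only a shift, a scale and an add on the hot path.
  int& slot_biased(uintptr_t addr) {
    return *reinterpret_cast<int*>(_biased_base + (addr >> _shift) * sizeof(int));
  }
};
```

Note that the bias only removes work when the shifted address is used to address a table directly; if the region index itself is the desired result, the subtraction (or an equivalent constant offset) is still needed.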
src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 59: > 57: // Primary is free > 58: _bases_table[base_idx] = to_region_base; > 59: } else if (_bases_table[base_idx] == to_region_base) { This probably won't help at all with performance, but I would kind of put the checks for the common cases where the table values are set (particularly the first one) first (I may be wrong about whether this is possible). The `UNUSED_BASE` values in the tables will be encountered exactly once... ------------- PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1411879100 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184309687 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184310440 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184313798 From tschatzl at openjdk.org Wed May 3 22:08:24 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 22:08:24 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> Message-ID: On Wed, 3 May 2023 21:30:31 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix type narrowing > > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 43: > >> 41: >> 42: uint SlidingForwarding::region_index_containing(HeapWord* addr) { >> 43: uint index = static_cast<uint>(pointer_delta(addr, _heap_start) >> _region_size_words_shift); > > I believe it is possible to bias the array pointer to avoid that subtraction of the `_heap_start` like we do e.g. for the card table.
See also `G1BiasedArray` or so for a kind of ready-made class implementing this. > > Not sure it will help a lot, but at least remove the subtraction and the load of the `_heap_start` value. Maybe possible ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184338484 From tschatzl at openjdk.org Wed May 3 22:08:25 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 22:08:25 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> Message-ID: On Wed, 3 May 2023 19:19:44 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. 
Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix type narrowing src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 82: > 80: > 81: uintptr_t encoded = (offset << OFFSET_BITS_SHIFT) | > 82: (alt_region << ALT_REGION_SHIFT) | While I understand that a `bool` is typically encoded as either `0` or `1` (not sure if it's actually specified somewhere) it would likely make the code cleaner to use a real integer of some type here to me. Also, the shift could be inlined in the assignments above. Like setting `alt_region` to either `0 (<< ALT_REGION_SHIFT)` or `1 << ALT_REGION_SHIFT` directly in the code. This is obviously a nano-optimization that probably won't show up anywhere... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184337013 From kbarrett at openjdk.org Thu May 4 05:35:26 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 4 May 2023 05:35:26 GMT Subject: RFR: 8306836: Remove pinned tag for G1 heap regions [v6] In-Reply-To: References:

Message-ID: On Tue, 2 May 2023 16:47:06 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that removes the pinned tag from `HeapRegion`. >> >> So that "pinned" tag for G1 heap regions indicates that the region should not move during (young) gc. This applies to now removed archive regions and humongous objects/regions. >> >> With "real" g1 region pinning to deal with gclocker in g1 once and for all upcoming we need a refcount, a single bit is not sufficient anymore. Further there will be a naming conflict as this kind of "pinning" is different to g1 region pinning "pinning". The former indicates "contents can not be moved, but can be reclaimed", while the latter means "contents can not be moved and not reclaimed". >> >> The (current) pinned flag is surprisingly little used, only for policy decisions. >> >> The suggestion this change implements is to remove the "pinned" tag as it is, and reserve it for future g1 region pinning (that needs to store the pinning attribute differently as a refcount anyway). >> >> Testing: tier1-3, gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Remove is_young_gc_movable Sorry to be late to the review. I noticed a problem in a comment. ------------- PR Review: https://git.openjdk.org/jdk/pull/13643#pullrequestreview-1407061087 From kbarrett at openjdk.org Thu May 4 05:35:29 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 4 May 2023 05:35:29 GMT Subject: RFR: 8306836: Remove pinned tag for G1 heap regions [v4] In-Reply-To: References:

Message-ID: <6sav8G_h5tJF6Chc-hLzW2k_7WtHPc6uk5Fr7zmuGSM=.bcece9ab-0f31-4f84-8dcf-05e530cac9df@github.com> On Thu, 27 Apr 2023 12:31:24 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that removes the pinned tag from `HeapRegion`. >> >> So that "pinned" tag for G1 heap regions indicates that the region should not move during (young) gc. This applies to now removed archive regions and humongous objects/regions. >> >> With "real" g1 region pinning to deal with gclocker in g1 once and for all upcoming we need a refcount, a single bit is not sufficient anymore. Further there will be a naming conflict as this kind of "pinning" is different to g1 region pinning "pinning". The former indicates "contents can not be moved, but can be reclaimed", while the latter means "contents can not be moved and not reclaimed". >> >> The (current) pinned flag is surprisingly little used, only for policy decisions. >> >> The suggestion this change implements is to remove the "pinned" tag as it is, and reserve it for future g1 region pinning (that needs to store the pinning attribute differently as a refcount anyway). >> >> Testing: tier1-3, gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > remove is_young_gc_movable in full gc code src/hotspot/share/gc/g1/g1CollectionSetChooser.hpp line 57: > 55: // Determine whether to add the given region to the collection set candidates or > 56: // not. Currently, we skip regions that we will never move during young gc, and > 57: // regions which liveness is below the occupancy threshold. 
s/liveness is below/liveness is over/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13643#discussion_r1181174243 From rkennke at openjdk.org Thu May 4 06:01:25 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:01:25 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> Message-ID: On Wed, 3 May 2023 21:29:20 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix type narrowing > > src/hotspot/share/gc/shared/gc_globals.hpp line 699: > >> 697: \ >> 698: product(bool, UseAltGCForwarding, false, EXPERIMENTAL, \ >> 699: "Use alternative GC forwarding that preserves object headers") \ > > I would strongly prefer if this were not a product flag at this time, but a develop flag. > > It potentially decreases performance of serial gc full gcs by a significant amount with no upside at all (not that worried about g1 or other concurrent gcs). Can you give me reasons why an end user would ever consciously enable this flag? > > Using a develop flag is only a minor annoyance for development - we already do that for other features like evacuation failure injection in G1. For end users this would result in (guaranteed) zero performance impact. > > Only when adding compressed object headers with Lilliput this should be changed to a product flag. > > I do not know your schedule for upstreaming Lilliput, but if it would miss JDK 21, people would suffer from this for the entire lifetime of JDK 21.... which is an LTS release. (Fwiw I would suggest the same for a non-LTS release, it seems to be worse in this situation though). 
Ok that is reasonable, I will do that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184571039 From rkennke at openjdk.org Thu May 4 06:01:27 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:01:27 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> Message-ID: On Wed, 3 May 2023 22:05:43 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 43: >> >>> 41: >>> 42: uint SlidingForwarding::region_index_containing(HeapWord* addr) { >>> 43: uint index = static_cast<uint>(pointer_delta(addr, _heap_start) >> _region_size_words_shift); >> >> I believe it is possible to bias the array pointer to avoid that subtraction of the `_heap_start` like we do e.g. for the card table. See also `G1BiasedArray` or so for a kind of ready-made class implementing this. >> >> Not sure it will help a lot, but at least remove the subtraction and the load of the `_heap_start` value. > > Maybe possible ;) I don't think so. The biasing in G1 GC (and Shenandoah GC) uses an array to look up per-region stuff (like cset property) without first calculating the actual region index. Instead, it lets us simply shift an address and use that biased index to address the biased array. Here we don't have an array, we only want the index of the region that contains the address.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184571191 From rkennke at openjdk.org Thu May 4 06:05:25 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:05:25 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> Message-ID: On Wed, 3 May 2023 21:35:03 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix type narrowing > > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 59: > >> 57: // Primary is free >> 58: _bases_table[base_idx] = to_region_base; >> 59: } else if (_bases_table[base_idx] == to_region_base) { > > This probably won't help at all with performance, but I would kind of put the checks for the common cases where the table values are set (particularly the first one) first (I may be wrong about whether this is possible). > The `UNUSED_BASE` values in the tables will be encountered exactly once... I believe we can safely swap the UNUSED with the primary check. I'll do that. 
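The reordering discussed here can be sketched as follows. This is a hedged, standalone illustration and not the code from the PR: `record_target`, the `Slot` enum, the value chosen for `UNUSED_BASE`, and the `std::vector` table are assumptions; only the names `_bases_table`/`base_idx`, `to_region_base` and `UNUSED_BASE`, and the two-entries-per-source-region layout, come from the thread. The reordering is only safe under the assumption that `UNUSED_BASE` can never equal a real region base.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed sentinel value; it must never be a valid region base address.
static const uintptr_t UNUSED_BASE = 0;

// Hypothetical result codes for this sketch.
enum Slot { PRIMARY = 0, ALTERNATE = 1, NEEDS_FALLBACK = 2 };

// bases_table holds two entries per source region:
//   [2 * r]     primary target base
//   [2 * r + 1] alternate target base
Slot record_target(std::vector<uintptr_t>& bases_table,
                   size_t from_region,
                   uintptr_t to_region_base) {
  size_t base_idx = from_region * 2;
  // Common case first: the primary entry is already set to this target.
  if (bases_table[base_idx] == to_region_base) {
    return PRIMARY;
  }
  // Seen at most once per source region: the primary entry is still free.
  if (bases_table[base_idx] == UNUSED_BASE) {
    bases_table[base_idx] = to_region_base;
    return PRIMARY;
  }
  // Same ordering for the alternate entry.
  if (bases_table[base_idx + 1] == to_region_base) {
    return ALTERNATE;
  }
  if (bases_table[base_idx + 1] == UNUSED_BASE) {
    bases_table[base_idx + 1] = to_region_base;
    return ALTERNATE;
  }
  // A third distinct target: the two-entry assumption does not hold here.
  return NEEDS_FALLBACK;
}
```

With this ordering, every call after the first one for a given source region hits the first comparison when the target is unchanged, and the `UNUSED_BASE` branches are taken once per entry.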
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184573418 From rkennke at openjdk.org Thu May 4 06:09:26 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:09:26 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> Message-ID: <3DQM2ay8VGdLVhxa2iCqOOS3KX3AvXyoq_w3t228Sm0=.9385d002-3124-40ac-bddf-5340015dfed5@github.com> On Wed, 3 May 2023 22:04:08 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix type narrowing > > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 82: > >> 80: >> 81: uintptr_t encoded = (offset << OFFSET_BITS_SHIFT) | >> 82: (alt_region << ALT_REGION_SHIFT) | > > While I understand that a `bool` is typically encoded as either `0` or `1` (not sure if it's actually specified somewhere) it would likely make the code cleaner to use a real integer of some type here to me. > > Also, the shift could be inlined in the assignments above. > Like setting `alt_region` to either `0 (<< ALT_REGION_SHIFT)` or `1 << ALT_REGION_SHIFT` directly in the code. > This is obviously a nano-optimization that probably won't show up anywhere... Oh yes, I'll change it to an integral value. I don't see how moving the shift to the assignment would help, and I'd prefer to keep it in the place where we encode the value, I think that is more readable/less confusing. 
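For readers following the encoding discussion, here is a minimal sketch of the bit layout given in the PR description (bits 0..1 set to '11', bit 2 for fallback, bit 3 for the target region, bits 4..31 for the offset in heap words), with `alt_region` kept as an integral value and the shift applied at the point of encoding. `OFFSET_BITS_SHIFT` and `ALT_REGION_SHIFT` are names from the quoted snippet; their values here follow the description, while `FORWARDED_BITS`, `FALLBACK_SHIFT`, `OFFSET_BITS` and all function names are assumptions for illustration, not the PR's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static const uintptr_t FORWARDED_BITS    = 0x3; // bits 0..1: '11' marks "forwarded"
static const int       FALLBACK_SHIFT    = 2;   // bit 2: look up in the fallback table
static const int       ALT_REGION_SHIFT  = 3;   // bit 3: target region 0 or 1
static const int       OFFSET_BITS_SHIFT = 4;   // bits 4..31: offset in heap words
static const int       OFFSET_BITS       = 28;

// Encode a non-fallback forwarding: target region selector plus word offset.
uintptr_t encode_forwarding(size_t offset_words, uintptr_t alt_region) {
  assert(alt_region <= 1);
  assert(offset_words < (size_t(1) << OFFSET_BITS));
  return (uintptr_t(offset_words) << OFFSET_BITS_SHIFT) |
         (alt_region << ALT_REGION_SHIFT) |
         FORWARDED_BITS;
}

bool is_forwarded(uintptr_t encoded) {
  return (encoded & FORWARDED_BITS) == FORWARDED_BITS;
}

bool is_fallback(uintptr_t encoded) {
  return ((encoded >> FALLBACK_SHIFT) & 1) != 0;
}

uintptr_t alt_region_of(uintptr_t encoded) {
  return (encoded >> ALT_REGION_SHIFT) & 1;
}

size_t offset_words_of(uintptr_t encoded) {
  return (encoded >> OFFSET_BITS_SHIFT) & ((uintptr_t(1) << OFFSET_BITS) - 1);
}
```

Decoding the forwardee would then be the base address looked up for the selected target region plus `offset_words_of(encoded)` heap words; that table lookup is left out of the sketch.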
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184575641 From rkennke at openjdk.org Thu May 4 06:30:01 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:30:01 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v26] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
> > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Switch back to size_t for some fields - Address @tschatzl's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/5ee17597..2762f1b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=24-25 Stats: 40 lines in 4 files changed: 4 ins; 4 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Thu May 4 07:04:19 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 07:04:19 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v27] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. 
Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. 
> > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix release build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/2762f1b1..0cc732ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=25-26 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From tschatzl at openjdk.org Thu May 4 07:56:17 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 4 May 2023 07:56:17 GMT Subject: RFR: 8307421: Fix comment in g1CollectionSetChooser.hpp after JDK-8306836 Message-ID: Hi all, please review this trivial comment fix @kimbarrett noticed while reviewing the [JDK-8306836](https://bugs.openjdk.org/browse/JDK-8306836) change after having it pushed. 
Testing: local compilation ------------- Commit messages: - fix comment, kbarrett finding Changes: https://git.openjdk.org/jdk/pull/13793/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13793&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307421 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13793.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13793/head:pull/13793 PR: https://git.openjdk.org/jdk/pull/13793 From duke at openjdk.org Thu May 4 09:22:26 2023 From: duke at openjdk.org (olivergillespie) Date: Thu, 4 May 2023 09:22:26 GMT Subject: Integrated: 8307346 - Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code In-Reply-To: References: Message-ID: On Wed, 3 May 2023 10:04:14 GMT, olivergillespie wrote: > Add logging for the time taken to collect/report ObjectCount(AfterGC) event to Serial, Parallel and G1 full collectors, for parity with the G1 concurrent collector. This event can be *very* expensive to collect (single threaded STW full heap scan), so it's very important to show it clearly to the user. 
> > Added output: > > Serial > > [7.976s][debug][gc,phases ] GC(7) Trigger cleanups 0.002ms > [7.977s][debug][gc,phases ] GC(7) Class Unloading 0.556ms > ++ [7.977s][debug][gc,phases,start] GC(7) Report Object Count > ++ [8.529s][debug][gc,phases ] GC(7) Report Object Count 552.065ms > [8.529s][info ][gc,phases ] GC(7) Phase 1: Mark live objects 1066.882ms > [8.529s][info ][gc,phases,start] GC(7) Phase 2: Compute new object addresses > > Parallel > > [5.786s][debug][gc,phases ] GC(12) Trigger cleanups 0.002ms > [5.786s][debug][gc,phases ] GC(12) Class Unloading 0.556ms > ++ [5.786s][debug][gc,phases,start] GC(12) Report Object Count > ++ [6.313s][debug][gc,phases ] GC(12) Report Object Count 526.307ms > [6.313s][info ][gc,phases ] GC(12) Marking Phase 889.900ms > [6.313s][info ][gc,phases,start] GC(12) Summary Phase > > G1 Full > > [3.922s][debug][gc,phases ] GC(24) Trigger cleanups 0.002ms > [3.922s][debug][gc,phases ] GC(24) Phase 1: Class Unloading and Cleanup 0.159ms > ++ [3.922s][debug][gc,phases,start ] GC(24) Report Object Count > ++ [4.442s][debug][gc,phases ] GC(24) Report Object Count 519.292ms > [4.442s][info ][gc,phases ] GC(24) Phase 1: Mark live objects 653.209ms > [4.442s][info ][gc,phases,start ] GC(24) Phase 2: Prepare compaction This pull request has now been integrated. 
Changeset: 3f1927a7 Author: Oli Gillespie Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/3f1927a7f3a2914402a25335c47a5a8bdd5511a6 Stats: 12 lines in 3 files changed: 9 ins; 0 del; 3 mod 8307346: Add missing gc+phases logging for ObjectCount(AfterGC) JFR event collection code Reviewed-by: tschatzl, shade, ayang ------------- PR: https://git.openjdk.org/jdk/pull/13772 From eosterlund at openjdk.org Thu May 4 09:37:28 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 4 May 2023 09:37:28 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v12] In-Reply-To: References: <7wh9oQNWTga_roKBanu2246zZ53TTL3HyoeAdnVQYpk=.f69973bf-821c-47dc-a8df-f4b47af1c9a6@github.com>

<3H0IufrunNdAbYOGOTTp9oLepzSV7so-xIBrWUXoFkU=.980fe927-a804-48be-9c5a-aa7de4b8142e@github.com>

Message-ID: On Fri, 28 Apr 2023 17:52:33 GMT, Erik Österlund wrote: >> It seems to be used in a couple of places already: >> >> grep -R ff51afd7ed558ccd src >> src/jdk.jfr/share/classes/jdk/jfr/internal/EventWriterKey.java: z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL; >> src/java.base/share/classes/java/util/SplittableRandom.java: z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL; // MurmurHash3 mix constants >> src/java.base/share/classes/java/util/concurrent/ThreadLocalRandom.java: z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL; >> src/java.base/share/classes/jdk/internal/util/random/RandomSupport.java: z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL; >> src/hotspot/share/gc/shared/slidingForwarding.cpp: val *= 0xff51afd7ed558ccdULL; >> src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp: key *= UINT64_C(0xff51afd7ed558ccd); > > Sounds good then. If you wouldn't mind, I think this one is even better: https://github.com/iwanowww/jdk/blob/ssc.cascading/src/hotspot/share/oops/klass.cpp?plain=1#L309 @iwanowww is using it for faster type checking. Our very own @rose00 is behind this one, so we can definitely use it. It performs very well (~1ns per hash). The hash algo passes BigCrush (as a CBPRNG) and SMHasher (with the right loop to combine the input blocks). It's basically a fantastic hash function, that we are free to use. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184778234 From rkennke at openjdk.org Thu May 4 10:53:26 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 10:53:26 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v12] In-Reply-To: References: <7wh9oQNWTga_roKBanu2246zZ53TTL3HyoeAdnVQYpk=.f69973bf-821c-47dc-a8df-f4b47af1c9a6@github.com>

<3H0IufrunNdAbYOGOTTp9oLepzSV7so-xIBrWUXoFkU=.980fe927-a804-48be-9c5a-aa7de4b8142e@github.com>

Message-ID: <4-Ztc15xMXxNX5tiyMT-9YokDi65JlIuVX1TFuC__-s=.64094113-c745-4376-8a88-ab75d5cf0beb@github.com> On Thu, 4 May 2023 09:34:27 GMT, Erik Österlund wrote: >> Sounds good then. > > If you wouldn't mind, I think this one is even better: https://github.com/iwanowww/jdk/blob/ssc.cascading/src/hotspot/share/oops/klass.cpp?plain=1#L309 > @iwanowww is using it for faster type checking. Our very own @rose00 is behind this one, so we can definitely use it. It performs very well (~1ns per hash). The hash algo passes BigCrush (as a CBPRNG) and SMHasher (with the right loop to combine the input blocks). It's basically a fantastic hash function, that we are free to use. Ok that is great stuff! Maybe it'd be useful to move it to a central place with this PR (alternative full GC fwding), because we're going to need it for other purposes, too (other GC tables, i-hash, faster type checking, maybe more?) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184854210 From eosterlund at openjdk.org Thu May 4 11:00:29 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 4 May 2023 11:00:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v12] In-Reply-To: <4-Ztc15xMXxNX5tiyMT-9YokDi65JlIuVX1TFuC__-s=.64094113-c745-4376-8a88-ab75d5cf0beb@github.com> References: <7wh9oQNWTga_roKBanu2246zZ53TTL3HyoeAdnVQYpk=.f69973bf-821c-47dc-a8df-f4b47af1c9a6@github.com>

<3H0IufrunNdAbYOGOTTp9oLepzSV7so-xIBrWUXoFkU=.980fe927-a804-48be-9c5a-aa7de4b8142e@github.com>

<4-Ztc15xMXxNX5tiyMT-9YokDi65JlIuVX1TFuC__-s=.64094113-c745-4376-8a88-ab75d5cf0beb@github.com> Message-ID: <0DF0LOv43PBl8SDn7spkqCULAqEEwOjjgBZS4ewHF7U=.6d38770e-1849-42db-9274-2adaae264a80@github.com> On Thu, 4 May 2023 10:50:17 GMT, Roman Kennke wrote: >> If you wouldn't mind, I think this one is even better: https://github.com/iwanowww/jdk/blob/ssc.cascading/src/hotspot/share/oops/klass.cpp?plain=1#L309 >> @iwanowww is using it for faster type checking. Our very own @rose00 is behind this one, so we can definitely use it. It performs very well (~1ns per hash). The hash algo passes BigCrush (as a CBPRNG) and SMHasher (with the right loop to combine the input blocks). It's basically a fantastic hash function, that we are free to use. > > Ok that is great stuff! Maybe it'd be useful to move it to a central place with this PR (alternative full GC fwding), because we're going to need it for other purposes, too (other GC tables, i-hash, faster type checking, maybe more?) > > Oh and that code is using __int128 type, how/where do I get that outside of GCC? Yes - great idea. Maybe somewhere in utilities. We might swap to it with ZGC as well when things settle down there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184860529 From rkennke at openjdk.org Thu May 4 11:27:35 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 11:27:35 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v28] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs forward objects by overwriting the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers are restored. Also, the preserved-headers tables would grow quite large. 
> > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. 
> - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32-bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. 
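For readers following the thread, the header encoding described above (tag bits, fallback bit, region-select bit, 28-bit word offset) can be sketched roughly as below. This is an illustration only and not the code from the pull request: the names `SlidingForwardingSketch`, `encode`, `decode` and all constants are hypothetical, addresses are modelled as plain heap-word indices instead of `HeapWord*`, and the fallback case merely returns a marker value instead of filling a hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sliding_sketch {

// Hypothetical constants mirroring the bit layout from the PR description.
constexpr uint32_t FORWARDED_TAG    = 0x3;      // bits 0..1: '11' marks a forwarded object
constexpr uint32_t FALLBACK_BIT     = 1u << 2;  // bit 2: forwardee must be looked up elsewhere
constexpr uint32_t ALT_REGION_SHIFT = 3;        // bit 3: selects target region 0 or 1
constexpr uint32_t OFFSET_SHIFT     = 4;        // bits 4..31: word offset from the target base
constexpr uint32_t OFFSET_BITS      = 28;
constexpr uint64_t UNUSED_BASE      = ~uint64_t(0);  // sentinel chosen for this sketch only

class SlidingForwardingSketch {
  uint64_t _region_size_words;
  std::vector<uint64_t> _bases;  // two candidate target bases per source region

public:
  SlidingForwardingSketch(size_t num_regions, uint64_t region_size_words)
    : _region_size_words(region_size_words),
      _bases(num_regions * 2, UNUSED_BASE) {
    // Every in-region offset must fit into the 28 offset bits.
    assert(region_size_words > 0 &&
           region_size_words <= (uint64_t(1) << OFFSET_BITS));
  }

  // Returns the 32-bit forwarding word for an object moving from 'from' to 'to'.
  uint32_t encode(uint64_t from, uint64_t to) {
    size_t   slot    = static_cast<size_t>(from / _region_size_words) * 2;
    uint64_t to_base = (to / _region_size_words) * _region_size_words;
    uint32_t alt;
    if (_bases[slot] == to_base) {
      alt = 0;
    } else if (_bases[slot] == UNUSED_BASE) {
      _bases[slot] = to_base;       // first target region seen for this source region
      alt = 0;
    } else if (_bases[slot + 1] == to_base) {
      alt = 1;
    } else if (_bases[slot + 1] == UNUSED_BASE) {
      _bases[slot + 1] = to_base;   // second target region seen
      alt = 1;
    } else {
      // A third target region breaks the sliding assumption; signal fallback.
      return FALLBACK_BIT | FORWARDED_TAG;
    }
    uint64_t encoded = ((to - to_base) << OFFSET_SHIFT)
                     | (uint64_t(alt) << ALT_REGION_SHIFT)
                     | FORWARDED_TAG;
    assert((encoded >> 32) == 0);   // must fit the lowest 32 header bits
    return static_cast<uint32_t>(encoded);
  }

  // Reverses encode() for non-fallback forwardings.
  uint64_t decode(uint64_t from, uint32_t encoded) const {
    assert((encoded & FORWARDED_TAG) == FORWARDED_TAG);
    assert((encoded & FALLBACK_BIT) == 0);
    size_t slot = static_cast<size_t>(from / _region_size_words) * 2;
    size_t alt  = (encoded >> ALT_REGION_SHIFT) & 1;
    return _bases[slot + alt] + (encoded >> OFFSET_SHIFT);
  }
};

}  // namespace sliding_sketch
```

With 4 regions of 1024 words, forwarding an object in region 2 first to word 5 and then another to word 1031 uses base slots 0 and 1 of that region; a third distinct target region yields the fallback marker, which is the case the description handles with the fallback table.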
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use @rose00's fast-hash impl instead of murmur ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/0cc732ed..ad9fb171 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=26-27 Stats: 106 lines in 2 files changed: 93 ins; 12 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Thu May 4 11:27:36 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 11:27:36 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v12] In-Reply-To: <0DF0LOv43PBl8SDn7spkqCULAqEEwOjjgBZS4ewHF7U=.6d38770e-1849-42db-9274-2adaae264a80@github.com> References: <7wh9oQNWTga_roKBanu2246zZ53TTL3HyoeAdnVQYpk=.f69973bf-821c-47dc-a8df-f4b47af1c9a6@github.com>

<3H0IufrunNdAbYOGOTTp9oLepzSV7so-xIBrWUXoFkU=.980fe927-a804-48be-9c5a-aa7de4b8142e@github.com>

<4-Ztc15xMXxNX5tiyMT-9YokDi65JlIuVX1TFuC__-s=.64094113-c745-4376-8a88-ab75d5cf0beb@github.com> <0DF0LOv43PBl8SDn7spkqCULAqEEwOjjgBZS4ewHF7U=.6d38770e-1849-42db-9274-2adaae264a80@github.com> Message-ID: On Thu, 4 May 2023 10:57:07 GMT, Erik Österlund wrote: >> Ok that is great stuff! Maybe it'd be useful to move it to a central place with this PR (alternative full GC fwding), because we're going to need it for other purposes, too (other GC tables, i-hash, faster type checking, maybe more?) >> >> Oh and that code is using __int128 type, how/where do I get that outside of GCC? > > Yes - great idea. Maybe somewhere in utilities. We might swap to it with ZGC as well when things settle down there. Ok, I pushed a change that uses @rose00's better hashing. I added/changed the 128-bit multiplication to (hopefully) make it portable. Let's see what GHA has to say about this ;-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184882541 From rkennke at openjdk.org Thu May 4 11:40:14 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 11:40:14 GMT Subject: RFR: 8307395: Add missing STS to Shenandoah Message-ID: Testing in project Lilliput has revealed that Shenandoah GC is lacking one STS. This causes a reliable crash (with Lilliput) when running TestGCBasherWithShenandoah.java with -XX:+UseHeavyMonitors because it touches an already deflated monitor. 
Testing (all in Lilliput where it caused the troubles, but applies to upstream as well): - [x] TestGCBasherWithShenandoah.java +UseHeavyMonitors - [x] hotspot_gc_shenandoah +UseHeavyMonitors ------------- Commit messages: - 8307395: Add missing STS to Shenandoah Changes: https://git.openjdk.org/jdk/pull/13799/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13799&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307395 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13799/head:pull/13799 PR: https://git.openjdk.org/jdk/pull/13799 From rkennke at openjdk.org Thu May 4 11:56:22 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 11:56:22 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v29] In-Reply-To: References: Message-ID: Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Add usual header include guards ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/ad9fb171..0f3604aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=27-28 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From shade at openjdk.org Thu May 4 12:24:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 May 2023 12:24:15 GMT Subject: RFR: 8307395: Add missing STS to Shenandoah In-Reply-To: References: Message-ID: <9NYMLTyK8H2Ruo-8pZ8UtOR0m_tjv7rXucsYlYIhFUs=.3495923c-6f52-42e1-90b9-9b8930f111d3@github.com> On Thu, 4 May 2023
11:34:15 GMT, Roman Kennke wrote: > Testing in project Lilliput has revealed that Shenandoah GC is lacking one STS. This causes a reliable crash (with Lilliput) when running TestGCBasherWithShenandoah.java with -XX:+UseHeavyMonitors because it touches an already deflated monitor. > > Testing (all in Lilliput where it caused the troubles, but applies to upstream as well): > - [x] TestGCBasherWithShenandoah.java +UseHeavyMonitors > - [x] hotspot_gc_shenandoah +UseHeavyMonitors Looks fine, provided testing is clean. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13799#pullrequestreview-1412969497 From shade at openjdk.org Thu May 4 13:28:16 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 May 2023 13:28:16 GMT Subject: RFR: 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection [v4] In-Reply-To: References:

Message-ID: On Wed, 3 May 2023 12:16:13 GMT, olivergillespie wrote: >> ObjectCount(AfterGC) event does a full single-threaded heap scan at a safepoint. After https://bugs.openjdk.org/browse/JDK-8215624, it is trivial to use the parallel version of the heap scan, reducing the time spent at the safepoint, and thus reducing the overhead of this event. >> >> The performance improvement is obvious, but just for confirmation, on my 16-core host, at around 1GB occupancy: >> >> >> Before: 770ms ( [3.059s][debug][gc,phases ] GC(13) Report Object Count 770.317ms ) >> After: 92ms ( [2.335s][debug][gc,phases ] GC(13) Report Object Count 91.742ms ) >> >> >> Question 1: Should this be the default behaviour for populate_table (use the number active workers as the parallelism, if nothing else specified)? >> >> Question 2: Is active_workers the correct value to use here? Or is max_workers more appropriate? > > olivergillespie has updated the pull request incrementally with one additional commit since the last revision: > > Use ParallelGCThreads instead of active_workers I looked if there might be a better option, like passing the `WorkerThreads*` from callers to actually figure out the number of active workers from the GC itself, but all of this cuts rather deep. I would say that should be done in a separate PR. `ParallelGCThreads` should work well meanwhile. @albertnetymk, @tschatzl might have an opinion here. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13774#pullrequestreview-1413089971 From shade at openjdk.org Thu May 4 13:29:14 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 May 2023 13:29:14 GMT Subject: RFR: 8307421: Fix comment in g1CollectionSetChooser.hpp after JDK-8306836 In-Reply-To: References: Message-ID: On Thu, 4 May 2023 07:49:43 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial comment fix @kimbarrett noticed while reviewing the [JDK-8306836](https://bugs.openjdk.org/browse/JDK-8306836) change after having it pushed. > > Testing: local compilation Looks fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13793#pullrequestreview-1413092518 From rkennke at openjdk.org Thu May 4 13:56:07 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 13:56:07 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v30] In-Reply-To: References: Message-ID: Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Clamp home index. Duh. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/0f3604aa..c3b9ae9e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=28-29 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From stuefe at openjdk.org Thu May 4 13:56:09 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 May 2023 13:56:09 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v30] In-Reply-To: References:

Message-ID: <7Sc3jztrzRq8rXZh4l3V4LdyBnR33qvKh8OZCTBQZbY=.9120353e-0db0-4135-81ea-30732ced0fb0@github.com> On Thu, 4 May 2023 13:51:53 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Clamp home index. Duh. LGTM. I ok this now; my remaining comments are suggestions - up to you to take them or not. I removed some of my obsolete comments to clear the space. Tests are missing. A simple way would be to run a selection of our standard GC tests with +AltGCForwarding. This is especially important if you follow Thomas' advice and make AltGCForwarding a develop switch. The only other thing that occurred to me is that you could probably change initialization: don't require caller to specify it but calculate it yourself such that the 28 bit offset is maximally used. That would save some memory since the bases table can be smaller. Again, up to you. src/hotspot/share/gc/shared/slidingForwarding.cpp line 127: > 125: size_t FallbackTable::home_index(HeapWord* from) { > 126: uint64_t val = reinterpret_cast(from); > 127: uint64_t hash = FastHash::get_hash64(val, 0xAAAAAAAAAAAAAAAA); Use UCONST64(0xAAA..AA) ? ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1413025879 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1185044658 From stuefe at openjdk.org Thu May 4 13:56:19 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 May 2023 13:56:19 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v29] In-Reply-To: References:

Message-ID: On Thu, 4 May 2023 11:56:22 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. 
The exception is in G1: there is a special mode called 'serial compaction', which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker processes a disjoint set of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding.
It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add usual header include guards src/hotspot/share/gc/shared/slidingForwarding.cpp line 35: > 33: // We cannot use 0, because that may already be a valid base address in zero-based heaps. > 34: // 0x1 is safe because heap base addresses must be aligned by much larger alignment > 35: HeapWord* const SlidingForwarding::UNUSED_BASE = reinterpret_cast<HeapWord*>(0x1); I try to understand under which circumstances a zero heap location would be okay. This is *uncompressed* oops, right? If that were 0, you could just hardcode constexpr 0 in the header. src/hotspot/share/gc/shared/slidingForwarding.cpp line 111: > 109: _table[i]._from = nullptr; > 110: _table[i]._to = nullptr; > 111: } It would be enough to set _from to nullptr. src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 60: > 58: } else if (_bases_table[base_idx] == UNUSED_BASE) { > 59: // Primary is free > 60: _bases_table[base_idx] = to_region_base; Since the else branch is probably much more common, would it make sense to swap the conditions? Same below. src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 87: > 85: assert(to == decode_forwarding(from, encoded), "must be reversible"); > 86: return encoded; > 87: } Since encoding should produce a 32-bit value, why not return a 32-bit value? Same below, for decoding. Or, at least assert that returned value has no higher bits set.
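The comments above on `UNUSED_BASE` and on the order of the checks concern how a source region claims one of its two target-base slots. The following is a rough sketch of that idea only, reconstructed from the quoted fragments and the PR description rather than from the patch itself: `Word`, `UNUSED_BASE_SKETCH`, `TargetBases` and `select` are hypothetical names, and the `-1` result stands in for "use the fallback table".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for HotSpot's HeapWord.
using Word = uintptr_t;

// Sentinel for an unclaimed slot. 0 is avoided because a zero-based heap could
// legitimately start there; 0x1 can never be an aligned region base.
static Word* const UNUSED_BASE_SKETCH = reinterpret_cast<Word*>(0x1);

struct TargetBases {
  std::vector<Word*> slots;  // two slots (primary, alternate) per source region

  explicit TargetBases(size_t num_regions)
    : slots(num_regions * 2, UNUSED_BASE_SKETCH) {}

  // Returns 0 or 1 for the slot that holds to_region_base, claiming a free
  // slot if necessary, or -1 if both slots already hold other regions.
  int select(size_t from_region_idx, Word* to_region_base) {
    size_t base_idx = from_region_idx * 2;
    for (int alt = 0; alt < 2; alt++) {
      Word*& slot = slots[base_idx + alt];
      if (slot == to_region_base) {
        return alt;                      // slot already records this target
      }
      if (slot == UNUSED_BASE_SKETCH) {
        slot = to_region_base;           // first object forwarded to this target claims the slot
        return alt;
      }
    }
    return -1;                           // neither slot fits: fallback forwarding
  }
};
```

In this sketch the match is tested before the sentinel, on the assumption that a slot is claimed once per region and matched on every later forwarding from that region.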
src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 90: > 88: > 89: HeapWord* SlidingForwarding::decode_forwarding(HeapWord* from, uintptr_t encoded) { > 90: assert((encoded & markWord::marked_value) == markWord::marked_value, "must be marked as forwarded"); s/marked_value/lock_mask ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184986345 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184989465 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184972889 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184976288 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184980683 From shade at openjdk.org Thu May 4 14:05:30 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 May 2023 14:05:30 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v30] In-Reply-To: References:

Message-ID: On Thu, 4 May 2023 13:56:07 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Clamp home index. Duh. Another round... ------------- PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1409678649 From shade at openjdk.org Thu May 4 14:05:36 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 May 2023 14:05:36 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v20] In-Reply-To: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> Message-ID: On Wed, 3 May 2023 10:54:43 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > More Thomas' comments src/hotspot/share/gc/g1/g1FullGCOopClosures.inline.hpp line 35: > 33: #include "gc/g1/g1FullGCMarker.inline.hpp" > 34: #include "gc/g1/heapRegionRemSet.inline.hpp" > 35: #include "gc/shared/gcForwarding.inline.hpp" Copyright years need to be updated. src/hotspot/share/gc/serial/markSweep.inline.hpp line 33: > 31: #include "classfile/javaClasses.inline.hpp" > 32: #include "gc/shared/continuationGCSupport.inline.hpp" > 33: #include "gc/shared/gcForwarding.inline.hpp" Copyright years need to be updated.
src/hotspot/share/gc/shared/gc_globals.hpp line 698: > 696: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \ > 697: \ > 698: product(bool, UseAltGCForwarding, false, EXPERIMENTAL, \ See if copyright years need to be updated. src/hotspot/share/gc/shared/preservedMarks.cpp line 26: > 24: > 25: #include "precompiled.hpp" > 26: #include "gc/shared/gcForwarding.inline.hpp" Copyright years need to be updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183540583 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183540894 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183542696 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183542195 From shade at openjdk.org Thu May 4 14:05:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 May 2023 14:05:39 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v29] In-Reply-To: References:

Message-ID: On Thu, 4 May 2023 11:56:22 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add usual header include guards src/hotspot/share/gc/shared/gc_globals.hpp line 698: > 696: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \ > 697: \ > 698: develop(bool, UseAltGCForwarding, false, \ I don't think you can opt-in into `true` in release bits, if this flag is `develop` when the rest of Lilliput arrives. In release bits, all checks involving this flag would fold with `false`. Maybe that's the intent here, as it keeps the release performance at the baseline level, but this makes performance overhead estimations for this PR a bit hard :) [I'll hack the flag back to experimental for tests] ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1185043173 From shade at openjdk.org Thu May 4 14:05:41 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 May 2023 14:05:41 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v19] In-Reply-To: References:

Message-ID: On Tue, 2 May 2023 18:21:18 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Initialize 'heap' elements in test case src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 43: > 41: bool SlidingForwarding::region_contains(HeapWord* region_base, HeapWord* addr) const { > 42: return (region_base <= addr) && (addr < (region_base + _region_size_words)); > 43: } Now unused! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182911404 From rkennke at openjdk.org Thu May 4 14:34:27 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 14:34:27 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v30] In-Reply-To: <7Sc3jztrzRq8rXZh4l3V4LdyBnR33qvKh8OZCTBQZbY=.9120353e-0db0-4135-81ea-30732ced0fb0@github.com> References:

<7Sc3jztrzRq8rXZh4l3V4LdyBnR33qvKh8OZCTBQZbY=.9120353e-0db0-4135-81ea-30732ced0fb0@github.com> Message-ID: On Thu, 4 May 2023 13:50:36 GMT, Thomas Stuefe wrote: > LGTM. I ok this now; my remaining comments are suggestions - up to you to take them or not. Thanks! > Tests are missing. A simple way would be to run a selection of our standard GC tests with +AltGCForwarding. This is especially important if you follow Thomas' advice and make AltGCForwarding a develop switch. I run hotspot_gc with UseAltGCForwarding turned on. Not sure if there is an easy way to make this a test task. I could perhaps add a few run configurations to tests that are useful. For example, gc/stress/TestMultiThreadStressRSet.java tended to exercise both the sliding-forwarding and the fallback-forwarding. But would the develop-only switch not complicate this? Because it means we could only run such tests in debug builds. > The only other thing that occurred to me is that you could probably change initialization: don't require caller to specify it but calculate it yourself such that the 28 bit offset is maximally used. That would save some memory since the bases table can be smaller. Again, up to you. Yeah maybe. SpaceAlignment should be set by all GCs to a reasonable region-size, we could probably just pick that up. OTOH, we need a little bit of cooperation from the GC here: The whole sliding-forwarding algo relies on the fact that GC workers divide up their work based on their regions, and are essentially single-threaded within their work queues. I'm a bit worried about touching this stuff at this point, and causing another round of reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1534889536 From duke at openjdk.org Thu May 4 14:55:16 2023 From: duke at openjdk.org (olivergillespie) Date: Thu, 4 May 2023 14:55:16 GMT Subject: RFR: 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection [v4] In-Reply-To: References:

Message-ID: On Thu, 4 May 2023 13:25:34 GMT, Aleksey Shipilev wrote: > I looked if there might be a better option, like passing the `WorkerThreads*` from callers to actually figure out the number of active workers from the GC itself, but all of this cuts rather deep. I would say that should be done in a separate PR. `ParallelGCThreads` should work well meanwhile. > > @albertnetymk, @tschatzl might have an opinion here. As you suggested, the typical way is passing the `WorkerThreads*` along instead of passing a thread number and the code selecting the `WorkerThread*` by itself. Actually I'm not sure why safepoint_workers are used at all. The changes to do that do not seem to be that significant to me actually, so I would prefer that. The current change is an improvement already though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13774#issuecomment-1534953027 From ayang at openjdk.org Thu May 4 15:41:14 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 4 May 2023 15:41:14 GMT Subject: RFR: 8307348 - Parallelize heap walk for ObjectCount(AfterGC) JFR event collection [v4] In-Reply-To: References:

Message-ID: On Thu, 4 May 2023 15:09:52 GMT, Thomas Schatzl wrote: > Actually I'm not sure why safepoint_workers are used at all. I believe it's semantically incorrect to use `safepoint_workers` here. Maybe `HeapInspection` should live in `gc` folder. > The changes to do that does not seem to be that significant to me actually, so I would prefer that. How does that affect another caller, `VM_GC_HeapInspection::doit`? Unclear to me how one can get gc-workers in that context. Could one introduce another API in `class CollectedHeap`, sth like `virtual WorkerThreads* gc_workers() { return nullptr; }`, next to `safepoint_workers`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13774#issuecomment-1535000662 From rkennke at openjdk.org Thu May 4 16:35:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 16:35:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v29] In-Reply-To: References:

Message-ID: On Thu, 4 May 2023 12:53:23 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Add usual header include guards > > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 60: > >> 58: } else if (_bases_table[base_idx] == UNUSED_BASE) { >> 59: // Primary is free >> 60: _bases_table[base_idx] = to_region_base; > > Since the else branch is probably much more common, would it make sense to swap the conditions? Same below. I just swapped that around to what it is now: I think the UNUSED_BASE would be taken exactly once per region, then every other call would find a 'good' target. Which means the if-branch would be much more common and else only taken rarely. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1185251911 From rkennke at openjdk.org Thu May 4 16:41:28 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 16:41:28 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v29] In-Reply-To: References: