From rkennke at openjdk.org  Fri Mar 7 12:34:24 2025
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 7 Mar 2025 12:34:24 GMT
Subject: [master] Withdrawn: Merge jdk:jdk-25+9
In-Reply-To:
References:
Message-ID:

On Mon, 10 Feb 2025 14:32:20 GMT, Roman Kennke wrote:

> Merging upstream jdk-25+9 and re-applying/merging the 3 Lilliput patches on top.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/lilliput/pull/194

From rkennke at openjdk.org  Fri Mar 14 17:11:07 2025
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 14 Mar 2025 17:11:07 GMT
Subject: [master] RFR: 8347711: [Lilliput] Parallel GC support for compact identity hashcode
Message-ID:

The Parallel GC does not support Lilliput 2 yet. The big problem has been that the Parallel full GC is too rigid with respect to object sizes, and we could not make it work with compact identity hashcode, which requires that objects can grow during GC.

The PR implements an alternative full GC for Parallel GC, which is more flexible. The algorithm mostly follows G1 and Shenandoah, with the difference that it creates temporary 'regions' (because Parallel GC does not use heap regions), with boundaries chosen such that no object crosses a region boundary, and then, after GC, fills any gaps at the end of regions with dummy objects.

The implementation has a special 'serial' mode, which sets up only 4 regions that exactly match the 4 heap spaces (old, eden, from, to), and performs the forwarding and compaction phases serially to achieve perfect compaction at the expense of performance. (The marking and adjust-refs phases will still be done with parallel workers.)

I've run the micro benchmarks for systemgc; there seem to be only minor differences, which look mostly like a few milliseconds of offset in the new implementation:

Baseline Full GC:

AllDead.gc ss 25 31.120 ± 0.447 ms/op
AllLive.gc ss 25 83.655 ± 2.238 ms/op
DifferentObjectSizesArray.gc ss 25 179.725 ± 1.171 ms/op
DifferentObjectSizesHashMap.gc ss 25 186.011 ± 1.409 ms/op
DifferentObjectSizesTreeMap.gc ss 25 65.668 ± 3.333 ms/op
HalfDeadFirstPart.gc ss 25 64.862 ± 0.696 ms/op
HalfDeadInterleaved.gc ss 25 67.764 ± 3.139 ms/op
HalfDeadInterleavedChunks.gc ss 25 59.160 ± 1.667 ms/op
HalfDeadSecondPart.gc ss 25 66.210 ± 1.167 ms/op
HalfHashedHalfDead.gc ss 25 69.584 ± 2.276 ms/op
NoObjects.gc ss 25 18.462 ± 0.270 ms/op
OneBigObject.gc ss 25 587.425 ± 27.493 ms/op

New Parallel Full GC:

AllDead.gc ss 25 39.891 ± 0.461 ms/op
AllLive.gc ss 25 87.898 ± 1.940 ms/op
DifferentObjectSizesArray.gc ss 25 184.109 ± 0.795 ms/op
DifferentObjectSizesHashMap.gc ss 25 189.620 ± 2.236 ms/op
DifferentObjectSizesTreeMap.gc ss 25 69.915 ± 3.308 ms/op
HalfDeadFirstPart.gc ss 25 70.664 ± 0.804 ms/op
HalfDeadInterleaved.gc ss 25 71.318 ± 1.583 ms/op
HalfDeadInterleavedChunks.gc ss 25 65.050 ± 1.827 ms/op
HalfDeadSecondPart.gc ss 25 70.964 ± 0.878 ms/op
HalfHashedHalfDead.gc ss 25 72.506 ± 1.617 ms/op
NoObjects.gc ss 25 23.809 ± 0.494 ms/op
OneBigObject.gc ss 25 403.461 ± 27.079 ms/op

Testing:
- [x] tier1 (+UseParallelGC +UCOH)
- [x] tier2 (+UseParallelGC +UCOH)
- [x] hotspot_gc (+UCOH)

-------------

Commit messages:
- 8347711: [Lilliput] Parallel GC support for compact identity hashcode

Changes: https://git.openjdk.org/lilliput/pull/195/files
Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=195&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8347711
Stats: 2080 lines in 11 files changed: 2065 ins; 4 del; 11 mod
Patch: https://git.openjdk.org/lilliput/pull/195.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/195/head:pull/195

PR: https://git.openjdk.org/lilliput/pull/195

From zgu at openjdk.org  Sun Mar 16 17:14:06 2025
From: zgu at openjdk.org (Zhengyu Gu)
Date: Sun, 16 Mar 2025 17:14:06 GMT
Subject: [master] RFR: 8347711: [Lilliput] Parallel GC support for compact identity hashcode
In-Reply-To:
References:
Message-ID:

On Fri, 14 Mar 2025 11:25:23 GMT, Roman Kennke wrote:

> The Parallel GC does not support Lilliput 2 yet. The big problem has been that the Parallel full GC is too rigid with respect to object sizes, and we could not make it work with compact identity hashcode, which requires that objects can grow during GC.
>
> The PR implements an alternative full GC for Parallel GC, which is more flexible. The algorithm mostly follows G1 and Shenandoah, with the difference that it creates temporary 'regions' (because Parallel GC does not use heap regions), with boundaries chosen such that no object crosses a region boundary, and then, after GC, fills any gaps at the end of regions with dummy objects.
>
> The implementation has a special 'serial' mode, which sets up only 4 regions that exactly match the 4 heap spaces (old, eden, from, to), and performs the forwarding and compaction phases serially to achieve perfect compaction at the expense of performance. (The marking and adjust-refs phases will still be done with parallel workers.)
> I've run the micro benchmarks for systemgc; there seem to be only minor differences, which look mostly like a few milliseconds of offset in the new implementation:
>
> Baseline Full GC:
>
> AllDead.gc ss 25 31.120 ± 0.447 ms/op
> AllLive.gc ss 25 83.655 ± 2.238 ms/op
> DifferentObjectSizesArray.gc ss 25 179.725 ± 1.171 ms/op
> DifferentObjectSizesHashMap.gc ss 25 186.011 ± 1.409 ms/op
> DifferentObjectSizesTreeMap.gc ss 25 65.668 ± 3.333 ms/op
> HalfDeadFirstPart.gc ss 25 64.862 ± 0.696 ms/op
> HalfDeadInterleaved.gc ss 25 67.764 ± 3.139 ms/op
> HalfDeadInterleavedChunks.gc ss 25 59.160 ± 1.667 ms/op
> HalfDeadSecondPart.gc ss 25 66.210 ± 1.167 ms/op
> HalfHashedHalfDead.gc ss 25 69.584 ± 2.276 ms/op
> NoObjects.gc ss 25 18.462 ± 0.270 ms/op
> OneBigObject.gc ss 25 587.425 ± 27.493 ms/op
>
> New Parallel Full GC:
>
> AllDead.gc ss 25 39.891 ± 0.461 ms/op
> AllLive.gc ss 25 87.898 ± 1.940 ms/op
> DifferentObjectSizesArray.gc ss 25 184.109 ± 0.795 ms/op
> DifferentObjectSizesHashMap.gc ss 25 189.620 ± 2.236 ms/op
> DifferentObjectSizesTreeMap.gc ss 25 69.915 ± 3.308 ms/op
> HalfDeadFirstPart.gc ss 25 70.664 ± 0.804 ms/op
> HalfDeadInterleaved.gc ss 25 71.318 ± 1.583 ms/op
> HalfDeadInterleavedChunks.gc ss 25 65.050...

Can you articulate why objects cannot grow?

-------------

PR Comment: https://git.openjdk.org/lilliput/pull/195#issuecomment-2727547321

From rkennke at openjdk.org  Mon Mar 17 09:12:33 2025
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 17 Mar 2025 09:12:33 GMT
Subject: [master] RFR: 8347711: [Lilliput] Parallel GC support for compact identity hashcode
In-Reply-To:
References:
Message-ID:

On Sun, 16 Mar 2025 17:11:49 GMT, Zhengyu Gu wrote:

> Can you articulate why objects cannot grow?

Yes. The way PSParallelCompact currently works: it collects liveness information during the marking phase. At the end of it, we know precisely how many words are live in each region. In this phase, we also determine where objects cross region boundaries.
Then, during the summary phase, it uses this information to determine which source regions get compacted into which target regions. It also determines the addresses at which a region gets split - that is, the part below the split point gets compacted into one region, and the part above the split point gets compacted into another region. Then, during the forwarding phase, all this information is used to calculate the target addresses for each object in parallel. It is now possible to divide the forwarding work across several GC worker threads because we know exactly where each region gets compacted into.

The problem is that, with compact identity hashcode, we don't know the size of the target object until the forwarding phase. That is because we need to know whether or not an object moves at all, and we don't know that until we do the actual forwarding. One possibility might be to pessimistically assume that all objects move, and thus might need to grow (if they have an i-hash and no gap to store it), and then deal with the waste that gets produced, but some of the calculations about the split points and the locations where objects cross region boundaries would still be thrown off.

The new approach is much more flexible, and the implementation is much simpler (but it tends to produce more waste).

-------------

PR Comment: https://git.openjdk.org/lilliput/pull/195#issuecomment-2728694713

From zgu at openjdk.org  Mon Mar 17 13:33:20 2025
From: zgu at openjdk.org (Zhengyu Gu)
Date: Mon, 17 Mar 2025 13:33:20 GMT
Subject: [master] RFR: 8347711: [Lilliput] Parallel GC support for compact identity hashcode
In-Reply-To:
References:
Message-ID:

On Mon, 17 Mar 2025 09:09:28 GMT, Roman Kennke wrote:

> > Can you articulate why objects cannot grow?
>
> Yes. The way PSParallelCompact currently works: it collects liveness information during the marking phase. At the end of it, we know precisely how many words are live in each region.
> In this phase, we also determine where objects cross region boundaries. Then, during the summary phase, it uses this information to determine which source regions get compacted into which target regions. It also determines the addresses at which a region gets split - that is, the part below the split point gets compacted into one region, and the part above the split point gets compacted into another region. Then, during the forwarding phase, all this information is used to calculate the target addresses for each object in parallel. It is now possible to divide the forwarding work across several GC worker threads because we know exactly where each region gets compacted into.

You can count liveness with/without object expansion during the mark phase, can't you, given that it reads the object header and size anyway? I assume that only regions that "overflow" due to expansion can cause problems, but the chance is very slim; special-casing them wouldn't be so bad, right?

> The problem is that, with compact identity hashcode, we don't know the size of the target object until the forwarding phase. That is because we need to know whether or not an object moves at all, and we don't know that until we do the actual forwarding. One possibility might be to pessimistically assume that all objects move, and thus might need to grow (if they have an i-hash and no gap to store it), and then deal with the waste that gets produced, but some of the calculations about the split points and the locations where objects cross region boundaries would still be thrown off.
>
> The new approach is much more flexible, and the implementation is much simpler (but it tends to produce more waste).
-------------

PR Comment: https://git.openjdk.org/lilliput/pull/195#issuecomment-2729518280

From rkennke at openjdk.org  Mon Mar 17 14:21:04 2025
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 17 Mar 2025 14:21:04 GMT
Subject: [master] RFR: 8347711: [Lilliput] Parallel GC support for compact identity hashcode
In-Reply-To:
References:
Message-ID: <20RhrwHy30cmdGZoTeIw-56HCv8gyvTywvLKDwNw6ts=.63fc5747-99be-4f08-b91f-e11911928dc8@github.com>

On Mon, 17 Mar 2025 13:30:19 GMT, Zhengyu Gu wrote:

> > > Can you articulate why objects cannot grow?
> >
> > Yes. The way PSParallelCompact currently works: it collects liveness information during the marking phase. At the end of it, we know precisely how many words are live in each region. In this phase, we also determine where objects cross region boundaries. Then, during the summary phase, it uses this information to determine which source regions get compacted into which target regions. It also determines the addresses at which a region gets split - that is, the part below the split point gets compacted into one region, and the part above the split point gets compacted into another region. Then, during the forwarding phase, all this information is used to calculate the target addresses for each object in parallel. It is now possible to divide the forwarding work across several GC worker threads because we know exactly where each region gets compacted into.
>
> You can count liveness with/without object expansion during the mark phase, can't you?

Yes, I can. I actually implemented that in an earlier attempt.

> I assume that only regions that "overflow" due to expansion can cause problems, but the chance is very slim; special-casing them wouldn't be so bad, right?

The problem is not potential overflow. In fact, overflow cannot really happen. The problem is that all those calculations are assumed and required to be precise. Turning them into guesses breaks the algorithm.
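The circular dependency described in this exchange can be illustrated with a toy model (all names invented for illustration; this is not HotSpot/PSParallelCompact code): under compact identity hashcode, a hashed object's final size depends on whether it moves, which is only known once its forwarding address is computed, so a marking-phase liveness count based on current sizes is no longer exact.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: each live object has a current address, a base size in words,
// and a flag saying an identity hash is installed in its header.
struct Obj {
    std::size_t addr;      // current address, in words
    std::size_t base_size; // size in words without an explicit i-hash field
    bool        hashed;    // identity hash has been installed
};

// Compact identity hashcode, simplified: an object that moves and is
// hashed grows by one word to store the hash explicitly.
std::size_t moved_size(const Obj& o, bool moves) {
    return o.base_size + ((moves && o.hashed) ? 1 : 0);
}

// Sliding compaction: each object's target is the current free pointer.
// Whether the object moves (and therefore how big it ends up) is only
// known here, after its target address has been computed.
std::vector<std::size_t> forward(const std::vector<Obj>& objs) {
    std::vector<std::size_t> targets;
    std::size_t free = 0;
    for (const Obj& o : objs) {
        bool moves = (free != o.addr);
        targets.push_back(free);
        free += moved_size(o, moves);
    }
    return targets;
}

// What a marking phase counting current sizes would report as live words.
// It underestimates whenever a hashed object later turns out to move.
std::size_t live_words_base(const std::vector<Obj>& objs) {
    std::size_t sum = 0;
    for (const Obj& o : objs) sum += o.base_size;
    return sum;
}
```

With a 3-word hashed object at address 5 that slides down to address 2, the marking-phase count reports 5 live words while compaction actually consumes 6; any split point or boundary-crossing location derived from the count of 5 is off by a word, which is why turning the precise calculation into a guess breaks the algorithm.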
-------------

PR Comment: https://git.openjdk.org/lilliput/pull/195#issuecomment-2729690308

From rkennke at openjdk.org  Mon Mar 17 14:28:47 2025
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 17 Mar 2025 14:28:47 GMT
Subject: [master] RFR: 8347711: [Lilliput] Parallel GC support for compact identity hashcode [v2]
In-Reply-To:
References:
Message-ID:

> The Parallel GC does not support Lilliput 2 yet. The big problem has been that the Parallel full GC is too rigid with respect to object sizes, and we could not make it work with compact identity hashcode, which requires that objects can grow during GC.
>
> The PR implements an alternative full GC for Parallel GC, which is more flexible. The algorithm mostly follows G1 and Shenandoah, with the difference that it creates temporary 'regions' (because Parallel GC does not use heap regions), with boundaries chosen such that no object crosses a region boundary, and then, after GC, fills any gaps at the end of regions with dummy objects.
>
> The implementation has a special 'serial' mode, which sets up only 4 regions that exactly match the 4 heap spaces (old, eden, from, to), and performs the forwarding and compaction phases serially to achieve perfect compaction at the expense of performance. (The marking and adjust-refs phases will still be done with parallel workers.)
>
> I've run the micro benchmarks for systemgc; there seem to be only minor differences, which look mostly like a few milliseconds of offset in the new implementation:
>
> Baseline Full GC:
>
> AllDead.gc ss 25 31.120 ± 0.447 ms/op
> AllLive.gc ss 25 83.655 ± 2.238 ms/op
> DifferentObjectSizesArray.gc ss 25 179.725 ± 1.171 ms/op
> DifferentObjectSizesHashMap.gc ss 25 186.011 ± 1.409 ms/op
> DifferentObjectSizesTreeMap.gc ss 25 65.668 ± 3.333 ms/op
> HalfDeadFirstPart.gc ss 25 64.862 ± 0.696 ms/op
> HalfDeadInterleaved.gc ss 25 67.764 ± 3.139 ms/op
> HalfDeadInterleavedChunks.gc ss 25 59.160 ± 1.667 ms/op
> HalfDeadSecondPart.gc ss 25 66.210 ± 1.167 ms/op
> HalfHashedHalfDead.gc ss 25 69.584 ± 2.276 ms/op
> NoObjects.gc ss 25 18.462 ± 0.270 ms/op
> OneBigObject.gc ss 25 587.425 ± 27.493 ms/op
>
> New Parallel Full GC:
>
> AllDead.gc ss 25 39.891 ± 0.461 ms/op
> AllLive.gc ss 25 87.898 ± 1.940 ms/op
> DifferentObjectSizesArray.gc ss 25 184.109 ± 0.795 ms/op
> DifferentObjectSizesHashMap.gc ss 25 189.620 ± 2.236 ms/op
> DifferentObjectSizesTreeMap.gc ss 25 69.915 ± 3.308 ms/op
> HalfDeadFirstPart.gc ss 25 70.664 ± 0.804 ms/op
> HalfDeadInterleaved.gc ss 25 71.318 ± 1.583 ms/op
> HalfDeadInterleavedChunks.gc ss 25 65.050...

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Fix big comments that describe the algorithm

-------------

Changes:
  - all: https://git.openjdk.org/lilliput/pull/195/files
  - new: https://git.openjdk.org/lilliput/pull/195/files/08703bcf..ec376a91

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=lilliput&pr=195&range=01
 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=195&range=00-01

Stats: 155 lines in 2 files changed: 39 ins; 39 del; 77 mod
Patch: https://git.openjdk.org/lilliput/pull/195.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/195/head:pull/195

PR: https://git.openjdk.org/lilliput/pull/195

From rkennke at openjdk.org  Wed Mar 26 09:30:31 2025
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 26 Mar 2025 09:30:31 GMT
Subject: [master] RFR: 8347711: [Lilliput] Parallel GC support for compact identity hashcode [v3]
In-Reply-To:
References:
Message-ID:

> The Parallel GC does not support Lilliput 2 yet. The big problem has been that the Parallel full GC is too rigid with respect to object sizes, and we could not make it work with compact identity hashcode, which requires that objects can grow during GC.
>
> The PR implements an alternative full GC for Parallel GC, which is more flexible.
> The algorithm mostly follows G1 and Shenandoah, with the difference that it creates temporary 'regions' (because Parallel GC does not use heap regions), with boundaries chosen such that no object crosses a region boundary, and then, after GC, fills any gaps at the end of regions with dummy objects.
>
> The implementation has a special 'serial' mode, which sets up only 4 regions that exactly match the 4 heap spaces (old, eden, from, to), and performs the forwarding and compaction phases serially to achieve perfect compaction at the expense of performance. (The marking and adjust-refs phases will still be done with parallel workers.)
>
> I've run the micro benchmarks for systemgc; there seem to be only minor differences, which look mostly like a few milliseconds of offset in the new implementation:
>
> Baseline Full GC:
>
> AllDead.gc ss 25 31.120 ± 0.447 ms/op
> AllLive.gc ss 25 83.655 ± 2.238 ms/op
> DifferentObjectSizesArray.gc ss 25 179.725 ± 1.171 ms/op
> DifferentObjectSizesHashMap.gc ss 25 186.011 ± 1.409 ms/op
> DifferentObjectSizesTreeMap.gc ss 25 65.668 ± 3.333 ms/op
> HalfDeadFirstPart.gc ss 25 64.862 ± 0.696 ms/op
> HalfDeadInterleaved.gc ss 25 67.764 ± 3.139 ms/op
> HalfDeadInterleavedChunks.gc ss 25 59.160 ± 1.667 ms/op
> HalfDeadSecondPart.gc ss 25 66.210 ± 1.167 ms/op
> HalfHashedHalfDead.gc ss 25 69.584 ± 2.276 ms/op
> NoObjects.gc ss 25 18.462 ± 0.270 ms/op
> OneBigObject.gc ss 25 587.425 ± 27.493 ms/op
>
> New Parallel Full GC:
>
> AllDead.gc ss 25 39.891 ± 0.461 ms/op
> AllLive.gc ss 25 87.898 ± 1.940 ms/op
> DifferentObjectSizesArray.gc ss 25 184.109 ± 0.795 ms/op
> DifferentObjectSizesHashMap.gc ss 25 189.620 ± 2.236 ms/op
> DifferentObjectSizesTreeMap.gc ss 25 69.915 ± 3.308 ms/op
> HalfDeadFirstPart.gc ss 25 70.664 ± 0.804 ms/op
> HalfDeadInterleaved.gc ss 25 71.318 ± 1.583 ms/op
> HalfDeadInterleavedChunks.gc ss 25 65.050...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Don't require addresses to be aligned in ParMarkBitMap::find_obj_beg_reverse()

-------------

Changes:
  - all: https://git.openjdk.org/lilliput/pull/195/files
  - new: https://git.openjdk.org/lilliput/pull/195/files/ec376a91..738c3093

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=lilliput&pr=195&range=02
 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=195&range=01-02

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/lilliput/pull/195.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/195/head:pull/195

PR: https://git.openjdk.org/lilliput/pull/195
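The two region properties the RFR text describes (boundaries placed so that no object crosses them, and end-of-region gaps plugged with dummy objects so the heap stays parseable) can be sketched with a minimal toy model. Everything below uses invented names and word-indexed addresses; it is not the PR's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// obj_starts: sorted start addresses (word indices) of live objects.
// Nominal boundaries every region_words are snapped up to the start of
// the next object, so every object lies entirely within one region.
std::vector<std::size_t> region_starts(const std::vector<std::size_t>& obj_starts,
                                       std::size_t heap_words,
                                       std::size_t region_words) {
    std::vector<std::size_t> starts{0};
    for (std::size_t b = region_words; b < heap_words; b += region_words) {
        std::size_t snapped = heap_words; // sentinel: no object at or above b
        for (std::size_t s : obj_starts) {
            if (s >= b) { snapped = s; break; }
        }
        if (snapped < heap_words && snapped != starts.back()) {
            starts.push_back(snapped);
        }
    }
    return starts;
}

// After compacting into a region, the gap between the compaction top and
// the region end is the size of the dummy (filler) object needed to keep
// the heap walkable.
std::size_t filler_words(std::size_t region_end, std::size_t compaction_top) {
    return region_end - compaction_top;
}
```

For example, with objects starting at words 0, 3, 6, and 10 in a 12-word heap and a nominal 4-word region size, the boundaries at 4 and 8 snap up to 6 and 10, so the regions begin at 0, 6, and 10 and no object straddles a boundary; a region ending at word 8 whose compaction top stopped at word 6 then gets a 2-word filler.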