[master] RFR: 8347711: [Lilliput] Parallel GC support for compact identity hashcode [v4]

Mon Apr 7 13:51:18 UTC 2025

On Mon, 7 Apr 2025 11:12:40 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Until now, the Parallel GC has not supported Lilliput 2. The big problem has been that the Parallel Full GC is too rigid with respect to object sizes, so we could not make it work with compact identity hashcode, which requires that objects can grow during GC.
>> 
>> The PR implements an alternative full GC for Parallel GC, which is more flexible. The algorithm mostly follows G1 and Shenandoah, with the difference that it creates temporary 'regions' (because Parallel GC does not use heap regions), with boundaries chosen such that no object crosses a region boundary, and then, after GC, fills any gaps at the end of regions with dummy objects.
>> 
>> The implementation has a special 'serial' mode, which sets up only 4 regions that exactly match the 4 heap spaces (old, eden, from, to), and performs the forwarding and compaction phases serially to achieve perfect compaction at the expense of performance. (The marking and adjust-refs phases are still done with parallel workers.)
>> 
>> I've run the micro benchmarks for systemgc. There seem to be only minor differences, mostly an offset of a few milliseconds in the new implementation:
>> 
>> Baseline Full GC:
>> 
>> AllDead.gc                        ss   25   31.120 ±  0.447  ms/op
>> AllLive.gc                        ss   25   83.655 ±  2.238  ms/op
>> DifferentObjectSizesArray.gc      ss   25  179.725 ±  1.171  ms/op
>> DifferentObjectSizesHashMap.gc    ss   25  186.011 ±  1.409  ms/op
>> DifferentObjectSizesTreeMap.gc    ss   25   65.668 ±  3.333  ms/op
>> HalfDeadFirstPart.gc              ss   25   64.862 ±  0.696  ms/op
>> HalfDeadInterleaved.gc            ss   25   67.764 ±  3.139  ms/op
>> HalfDeadInterleavedChunks.gc      ss   25   59.160 ±  1.667  ms/op
>> HalfDeadSecondPart.gc             ss   25   66.210 ±  1.167  ms/op
>> HalfHashedHalfDead.gc             ss   25   69.584 ±  2.276  ms/op
>> NoObjects.gc                      ss   25   18.462 ±  0.270  ms/op
>> OneBigObject.gc                   ss   25  587.425 ± 27.493  ms/op
>> 
>> 
>> New Parallel Full GC:
>> 
>> 
>> AllDead.gc                        ss   25   39.891 ±  0.461  ms/op
>> AllLive.gc                        ss   25   87.898 ±  1.940  ms/op
>> DifferentObjectSizesArray.gc      ss   25  184.109 ±  0.795  ms/op
>> DifferentObjectSizesHashMap.gc    ss   25  189.620 ±  2.236  ms/op
>> DifferentObjectSizesTreeMap.gc    ss   25   69.915 ±  3.308  ms/op
>> HalfDeadFirstPart.gc              ss   25   70.664 ±  0.804  ms/op
>> HalfDeadInterleaved.gc            ss   25   71.31...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Avoid racy update in Klass::expand_for_hash()
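
To make the quoted description of temporary regions concrete, here is a minimal sketch in Python (not HotSpot code). It assumes a heap modelled as a list of (start, size) pairs in address order, measured in words. The names build_regions, compact_region and target_words are hypothetical, and compacting each region on its own is a simplification of the forwarding the PR actually performs. It shows boundaries that only ever fall on object starts, and the gap left at the end of a compacted region that a dummy object would fill.

```python
def build_regions(objects, target_words):
    """Cut the heap into regions of roughly target_words words.

    objects: non-empty list of (start, size) pairs in address order.
    A boundary is only ever placed at an object start, so no object
    crosses a region boundary. An object larger than target_words ends
    up in an oversized region (an assumption of this sketch).
    """
    regions = []
    region_start = objects[0][0]
    for start, size in objects:
        would_overflow = start + size - region_start > target_words
        if would_overflow and start > region_start:
            regions.append((region_start, start))
            region_start = start
    last_start, last_size = objects[-1]
    regions.append((region_start, last_start + last_size))
    return regions


def compact_region(region, live_objects):
    """Slide the live objects of one region to the bottom of that region.

    Returns the old-start -> new-start mapping plus the (start, size) of
    the gap at the end of the region, or None if the region is full.
    The gap is what a dummy filler object would cover after GC.
    """
    region_start, region_end = region
    top = region_start
    new_address = {}
    for start, size in live_objects:
        new_address[start] = top
        top += size
    filler = (top, region_end - top) if top < region_end else None
    return new_address, filler


heap = [(0, 4), (4, 4), (8, 6), (14, 2), (16, 20), (36, 3)]
regions = build_regions(heap, target_words=10)
# Only the 2-word object at address 14 survives in the second region.
new_address, filler = compact_region(regions[1], live_objects=[(14, 2)])
```

In this model the 20-word object is bigger than the 10-word target, so it simply gets a region to itself; how the real implementation sizes regions around such objects is not stated in this thread.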

> > How does this algorithm deal with objects larger than region size?
> 
> Currently not particularly well: it doesn't move them at all (because they don't fit), and it _also_ doesn't move other objects past them. There's a TODO to make it possible to move objects around large objects (they don't have to be > region-sized; objects of about 1/2 region size may also be difficult to move). I'll address that in a follow-up PR.

Are you talking about Parallel specifically? Thanks.
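
For readers following the quoted answer about large objects, the behaviour it describes can be modelled roughly as below. This is a hypothetical Python sketch, not the PR's code: forward_addresses, region_words and the flat (start, size) heap model are assumptions, and the sketch ignores destination-region boundaries entirely. It only illustrates that an object larger than a region stays where it is and that no later object is forwarded to an address below it.

```python
def forward_addresses(live_objects, region_words):
    """Compute forwarding addresses for a simple sliding compaction.

    live_objects: (start, size) pairs in address order, sizes in words.
    Objects larger than region_words are treated as immovable, and the
    compaction point is reset to just behind them, so the free space in
    front of a large object is never used by later objects.
    """
    forwarding = {}
    compact_top = 0
    for start, size in live_objects:
        if size > region_words:
            forwarding[start] = start       # large object stays in place
            compact_top = start + size      # nothing is moved past it
        else:
            forwarding[start] = compact_top
            compact_top += size
    return forwarding


live = [(0, 2), (10, 3), (20, 50), (80, 4), (90, 1)]
forwarding = forward_addresses(live, region_words=16)
# Words between the end of the second object and the large object stay unused.
wasted_before_large = 20 - (forwarding[10] + 3)
```

In this model the two small objects behind the 50-word object slide down only as far as its end, and the free words in front of it stay unused, which is the kind of fragmentation the follow-up TODO would address.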

-------------

PR Comment: https://git.openjdk.org/lilliput/pull/195#issuecomment-2783405252