RFR: 8357443: ZGC: Optimize old page iteration in remap remembered phase

Axel Boldt-Christmas aboldtch at openjdk.org
Wed May 21 11:49:52 UTC 2025


On Wed, 21 May 2025 09:49:52 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> Before starting the relocation phase of a major collection we remap all pointers into the young generation so that we can disambiguate when an oop has bad bits for both the young generation and the old generation. See comment in remap_young_roots.
> 
> One part of this requires us to visit all old pages. To parallelize that part we have a class that distributes page table indices to the GC worker threads (see ZIndexDistributor).
> 
> While looking into a potential, minor performance regression on Windows, I noticed that the usage of constexpr in ZIndexDistributorClaimTree wasn't giving us the inlining we hoped for, which caused noticeably worse performance on Windows compared to the other platforms. I created a patch for this that gave us the expected inlining. See https://github.com/openjdk/jdk/compare/master...stefank:jdk:8357443_zgc_optimize_remap_remembered
> 
> While thinking about this a bit more I realized that we could use the "found old" optimization that we already use for the remset scanning. This finds the old pages without scanning the entire page table. This gives a significant enough boost that I propose that we do that instead. 
> 
> This mainly lowers the Major Collection times when you run a GC without any significant amount of objects in the old generation. So it is most likely relevant mainly for microbenchmarks and small workloads.
> 
> Below is the average time (ms) of the Concurrent Remap Roots phase when running only `System.gc()` 50 times, before and after this PR.
> 
> 
> 4 GB MaxHeapSize
>                     Original       Patch
> Default threads
> 
> mac:                0.27812        0.0507
> win:                0.9485         0.10452
> linux-x64:          0.53858        0.092
> linux-x64 NUMA:     0.89974        0.15452
> linux-aarch64:      0.32574        0.15832
> 
> 4 threads
> 
> mac:                0.19112        0.04916
> win:                0.83346        0.08796
> linux-x64:          0.57692        0.09526
> linux-x64 NUMA:     1.23684        0.17008
> linux-aarch64:      0.334          0.21918
> 
> 1 thread
> 
> mac:                0.19678        0.0589
> win:                1.96496        0.09928
> linux-x64:          1.00788        0.1381
> linux-x64 NUMA:     2.77312        0.21134
> linux-aarch64:      0.63696        0.31286
> 
> 
> The second set of data is from using the extreme end of the supported heap size. This mimics how we previously used to have a large page table even for smaller heap size (we don't do that anymore for JDK 25). It shows a quite significant difference, bu...

lgtm. Very nice to use information we are already tracking rather than walking everything.
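
To illustrate why this helps, here is a minimal sketch that contrasts a full page table walk with iterating over a side bitmap of slots known to hold old pages. This is not the ZGC code: `Page`, `visit_old_full_scan` and `FoundOldBitmap` are hypothetical names, and the bitmap layout is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a page; not ZGC's actual ZPage.
struct Page {
  bool is_old;
};

// Baseline: inspect every slot of the page table, including empty ones.
// Returns the number of old pages seen; *slots_inspected counts the work done.
inline size_t visit_old_full_scan(const std::vector<const Page*>& table,
                                  size_t* slots_inspected) {
  size_t visited = 0;
  for (const Page* page : table) {
    ++*slots_inspected;
    if (page != nullptr && page->is_old) {
      ++visited;
    }
  }
  return visited;
}

// Side bitmap with one bit per page table slot. A bit is set when an old
// page is installed in that slot (assumed to happen elsewhere).
class FoundOldBitmap {
public:
  explicit FoundOldBitmap(size_t slots) : _words((slots + 63) / 64, 0) {}

  void mark(size_t index) {
    assert(index / 64 < _words.size());
    _words[index / 64] |= uint64_t(1) << (index % 64);
  }

  // Calls f(index) for each marked slot, in ascending order. All-zero
  // words are skipped, so sparse tables cost roughly slots / 64 checks.
  template <typename Function>
  void for_each_marked(Function f) const {
    for (size_t w = 0; w < _words.size(); ++w) {
      uint64_t word = _words[w];
      for (size_t bit = 0; word != 0; ++bit, word >>= 1) {
        if ((word & 1) != 0) {
          f(w * 64 + bit);
        }
      }
    }
  }

private:
  std::vector<uint64_t> _words;
};
```

With few old pages in a large table, the full scan touches every slot, while the bitmap walk only does per-slot work for the marked entries. That matches the shape of the numbers above, where the gain is largest when the old generation is nearly empty.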

> This change removes the last usage of ZIndexDistributor. I don't know if we want to remove it, or leave it in case we need it for any of our upcoming features.

It is probably nice to at least keep our page table iterators in the code base, so you do not have to go dig them up / do something ad hoc if you ever want to check something. Whether they need ZIndexDistributor or not is another question.
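
For reference, the basic idea of handing out page table indices to workers can be sketched with a single atomic counter. This is a deliberately simplified assumption for illustration, not how ZIndexDistributor or its claim tree is implemented; `SimpleIndexDistributor` and `drain` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hands out each index in [0, count) exactly once, to whichever caller
// asks first. Safe to call from multiple threads concurrently.
class SimpleIndexDistributor {
public:
  explicit SimpleIndexDistributor(size_t count) : _next(0), _count(count) {}

  // Returns true and stores a unique index, or false once all are claimed.
  bool claim(size_t* index) {
    const size_t claimed = _next.fetch_add(1, std::memory_order_relaxed);
    if (claimed >= _count) {
      return false;
    }
    *index = claimed;
    return true;
  }

private:
  std::atomic<size_t> _next;
  const size_t _count;
};

// Worker loop: claim indices until none remain and record each visit.
// Each index is claimed once, so no two callers write the same element.
inline size_t drain(SimpleIndexDistributor& distributor, std::vector<int>& visits) {
  size_t claimed = 0;
  size_t index = 0;
  while (distributor.claim(&index)) {
    assert(index < visits.size());
    ++visits[index];
    ++claimed;
  }
  return claimed;
}
```

Each GC worker would run a loop like `drain` against one shared distributor. A single counter is easy to reason about but concentrates contention on one cache line, which is one reason a real distributor may use a more elaborate scheme.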

src/hotspot/share/gc/z/zGeneration.cpp line 952:

> 950: 
> 951: ZRemembered* ZGenerationYoung::remembered() {
> 952:   return  &_remembered;

Suggestion:

  return &_remembered;

src/hotspot/share/gc/z/zRemembered.cpp line 405:

> 403: 
> 404:   // This iterator uses the "found old" optimization.
> 405: bool ZRemsetTableIterator::next(ZRemsetTableEntry* entry_addr)  {

Suggestion:

bool ZRemsetTableIterator::next(ZRemsetTableEntry* entry_addr) {

src/hotspot/share/gc/z/zRemembered.cpp line 475:

> 473:       _remembered(remembered),
> 474:       _mark(mark),
> 475:       _remset_table_iterator(remembered, true /* previous */)  {

Suggestion:

      _remset_table_iterator(remembered, true /* previous */) {

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25345#pullrequestreview-2857188465
PR Review Comment: https://git.openjdk.org/jdk/pull/25345#discussion_r2099943312
PR Review Comment: https://git.openjdk.org/jdk/pull/25345#discussion_r2100063807
PR Review Comment: https://git.openjdk.org/jdk/pull/25345#discussion_r2100064432


More information about the hotspot-gc-dev mailing list