RFR: 8357533: ZGC: ZIndexDistributorClaimTree functions not inlined on Windows

Stefan Karlsson stefank at openjdk.org
Thu May 22 08:12:01 UTC 2025


While investigating a minor performance regression in a System.gc() microbenchmark I saw that the Concurrent Remap Roots phase took a significant portion of the GC cycle time.

It turns out that even though we use constexpr we don't get the inlining of the the functions as we expected and this results in a noticeable performance loss.

When running System.gc() with on old GC worker thread with a 16TB max heap size we spend around 300 ms just iterating over the page table and its indices via ZIndexDistributorClaimTree. If the inlining is fixed this drops down to ~70 ms, which is similar to what we see on Linux and MacOS.

We already have a patch out to remove the last usage of the ZIndexDistributor mechanism in https://github.com/openjdk/jdk/pull/25345, but ZIndexDistributor has less restrictions and is easier to use so we might want to keep it around (with fixed performance) so that it can be used for prototyping and maybe future features.

Noteworthy in the patch is the following function:

  static constexpr int level_multiplier(int level) {
    assert(level < ClaimLevels, "Must be");
    constexpr int array[ClaimLevels]{16, 16, 16};
    return array[level];
  }


When the last statement is changed to:

    constexpr int result = array[level];
    return result;


The MS compiler complains that the expression is not a constant. And if we get the correct inlining the warning goes away.

To get the required inlining I've changed the `level` parameter to be a template parameter and restructured the code to use template specialization instead of if-statements. This required some finagling with template classes to get partial specialization to work.

Some other changes in the patch:
* Limited includes of zIndexDistributor.inline.hpp
* Extracted the atomic inc into zfetch_then_inc for easier evaluation of the performance impact of our Atomic implementation vs relaxed std::atomic vs non-atomic updates.
* Hid the logging under a compile-time define. The logging is useful when changing/debugging this code, but otherwise it is not.

-------------

Commit messages:
 - 8357533: ZGC: ZIndexDistributorClaimTree functions not inlined on Windows

Changes: https://git.openjdk.org/jdk/pull/25385/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25385&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357533
  Stats: 410 lines in 9 files changed: 227 ins; 84 del; 99 mod
  Patch: https://git.openjdk.org/jdk/pull/25385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25385/head:pull/25385

PR: https://git.openjdk.org/jdk/pull/25385


More information about the hotspot-gc-dev mailing list