RFR: 8346194: Improve G1 pre-barrier C2 cost estimate

Roberto Castañeda Lozano rcastanedalo at openjdk.org
Thu Mar 6 09:05:55 UTC 2025


On Mon, 3 Mar 2025 12:30:23 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Hi all,
> 
>   please review this change that modifies the pre-barrier node cost used for loop unrolling so that it only considers the fast path. The reasoning (and the new cost) is similar to ZGC's: only the part of the barrier inlined into the main code stream is counted, since the slow path is laid out separately and does not, or at least should not, directly affect performance (particularly if there is no marking going on).
> 
> There is no impact on performance since the post-barrier cost is still very large, which will be fixed elsewhere.
> 
> Testing: gha, perf testing standalone (neither micros nor actual benchmarks give any difference outside of variance), testing with JDK-8342382
> 
> Hth,
>   Thomas

src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 298:

> 296:     // directly affect performance.
> 297:     // It has a cost of 4 (Cmp, Bool, If, IfProj).
> 298:     nodes += 4;

It probably does not make a big overall difference to the loop unrolling heuristics, but for better accuracy you might want to also count the nodes for loading the "active" byte: one node for computing the address relative to the thread-local storage base and one node for the load itself. That gives 6 nodes in total rather than 4. I would not count the `ThreadLocal` node representing the thread-local storage base as part of the pre-barrier fast path, because it is shared with other barrier operations:

![pre-barrier-fast-path](https://github.com/user-attachments/assets/7718ec94-1612-44ac-b18c-78d962057ab6)


Suggestion:

    // It has a cost of 6 (AddP, LoadB, Cmp, Bool, If, IfProj).
    nodes += 6;
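
To make the counting rule explicit, here is a minimal standalone sketch of how the 6 is derived. It is not HotSpot code: `FastPathNode`, `kPreBarrierFastPath` and `pre_barrier_fast_path_cost` are hypothetical names made up for illustration, and the only facts taken from the discussion above are the node names and whether each one is counted.

```cpp
#include <array>
#include <string_view>

// Hypothetical model of the pre-barrier fast path (illustration only,
// not part of g1BarrierSetC2.cpp).
struct FastPathNode {
  std::string_view name;
  bool counted;  // false if the node is shared with other barrier operations
};

constexpr std::array<FastPathNode, 7> kPreBarrierFastPath = {{
  {"ThreadLocal", false},  // thread-local storage base, shared: not counted
  {"AddP",        true},   // address of the "active" byte relative to the base
  {"LoadB",       true},   // load of the "active" byte
  {"Cmp",         true},   // compare the loaded byte
  {"Bool",        true},   // turn the comparison into a condition
  {"If",          true},   // branch on the condition
  {"IfProj",      true},   // projection for the fast-path successor
}};

// Sums the nodes that belong exclusively to the pre-barrier fast path.
constexpr int pre_barrier_fast_path_cost() {
  int nodes = 0;
  for (const FastPathNode& n : kPreBarrierFastPath) {
    if (n.counted) {
      nodes++;
    }
  }
  return nodes;
}

// Matches the suggested "nodes += 6".
static_assert(pre_barrier_fast_path_cost() == 6, "expected 6 fast-path nodes");
```

The sketch assumes C++17 (for `std::string_view` and `constexpr` iteration over `std::array`). In the actual patch a hard-coded constant with a comment listing the nodes is all that is needed; the table form only serves to show which nodes are in and which one is left out.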

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23862#discussion_r1982961196

