RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size [v2]

Tue Jan 16 10:28:24 UTC 2024

On Mon, 15 Jan 2024 09:01:14 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling.
>> 
>> #### Testing
>> 
>> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64).
>> - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only).
>> 
>> #### Performance and code size evaluation
>> 
>> - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset slightly reduces the size of the C2-generated code (around 0.3% fewer bytes per compiled bytecode for the DaCapo `fop` benchmark) and speeds up SPECjvm2008's `Serial` by around 4%.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Update copyright years
>  - Exclude size of slow path from estimation

src/hotspot/share/opto/loopTransform.cpp line 1003:

> 1001:   // Also count ModL, DivL, MulL, and other nodes that expand mightly
> 1002:   for (uint k = 0; k < _body.size(); k++) {
> 1003:     Node* n = _body.at(k);

Several functions here do the same thing, namely estimate the size of a node. I think we should consolidate them and express the estimate in a single well-defined unit, such as the number of machine instructions. Maybe that could be done later?
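
To make the suggestion concrete, here is a rough, standalone sketch of what a single consolidated estimator could look like. It is not HotSpot code: `NodeKind`, `estimated_instructions`, `estimated_loop_body_size`, and `fits_unroll_budget` are hypothetical names, and the per-node costs are placeholder numbers chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node kinds for illustration; C2's real IR has many more opcodes.
enum class NodeKind {
  Simple,           // e.g. an add or a compare
  MulL,
  DivL,
  ModL,
  LoadWithBarrier,  // a load whose GC barrier is only expanded late
  StoreWithBarrier  // a store whose GC barrier is only expanded late
};

// One place that maps a node to an estimated number of machine instructions.
// All values are assumed placeholders, not measurements.
inline std::size_t estimated_instructions(NodeKind kind) {
  switch (kind) {
    case NodeKind::Simple:           return 1;
    case NodeKind::MulL:             return 3;
    case NodeKind::DivL:             return 20;
    case NodeKind::ModL:             return 20;
    case NodeKind::LoadWithBarrier:  return 4;  // load plus fast-path barrier
    case NodeKind::StoreWithBarrier: return 8;  // store plus fast-path barrier
  }
  return 1;  // conservative default for kinds not listed above
}

// Sum the estimates over all nodes in a loop body.
inline std::size_t estimated_loop_body_size(const std::vector<NodeKind>& body) {
  std::size_t total = 0;
  for (NodeKind kind : body) {
    total += estimated_instructions(kind);
  }
  return total;
}

// Decide whether unrolling by `unroll_factor` stays within `limit` instructions.
inline bool fits_unroll_budget(const std::vector<NodeKind>& body,
                               std::size_t unroll_factor,
                               std::size_t limit) {
  return estimated_loop_body_size(body) * unroll_factor <= limit;
}
```

With a table like this, every heuristic that needs a node size would call the same function, and barrier costs that are hidden in the IR would be accounted for in one place rather than in each caller.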

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17367#discussion_r1453225329