RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size

Mon Jan 15 09:01:15 UTC 2024

On Thu, 11 Jan 2024 08:47:41 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling.
> 
> #### Testing
> 
> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64).
> - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only).
> 
> #### Performance and code size evaluation
> 
> - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset slightly reduces the size of the C2-generated code (around 0.3% fewer bytes per compiled bytecode for the DaCapo `fop` benchmark) and speeds up SPECjvm2008's `Serial` by around 4%.

The latest changes exclude the barrier slow path from the loop size estimation, as suggested by @fisk (offline) and @merykitty. Compared to the original changeset, this makes loop unrolling for ZGC more aggressive at the expense of code size, a trade-off deemed acceptable in the scenarios where ZGC is typically used. Compared to mainline, the code size improvement is now limited to around 0.3%, and only for DaCapo `fop`, but in return SPECjvm2008 `Serial` is sped up by around 4%. Please review.
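
To make the trade-off concrete, here is a toy model of the size estimate, written in Python for brevity. It is only a sketch and not the C2 implementation: the names `LoopBody`, `estimated_size` and `max_unroll_factor`, the per-barrier costs, and the budget of 80 are all hypothetical values chosen for illustration. It compares the unroll factor obtained when barriers are ignored, when both fast and slow paths are counted, and when only the fast path is counted.

```python
from dataclasses import dataclass

# Hypothetical cost model: none of these names or numbers come from HotSpot.
FAST_PATH_COST = 4   # assumed inline instructions per barrier
SLOW_PATH_COST = 20  # assumed out-of-line instructions per barrier


@dataclass
class LoopBody:
    ir_nodes: int  # nodes visible in the intermediate representation
    barriers: int  # accesses that later expand into GC barriers


def estimated_size(body: LoopBody, count_barriers: bool, count_slow_path: bool) -> int:
    """Estimate the final code size of one loop iteration."""
    size = body.ir_nodes
    if count_barriers:
        size += body.barriers * FAST_PATH_COST
        if count_slow_path:
            size += body.barriers * SLOW_PATH_COST
    return size


def max_unroll_factor(body: LoopBody, limit: int,
                      count_barriers: bool, count_slow_path: bool) -> int:
    """Largest power-of-two factor whose unrolled size stays within the limit."""
    size = estimated_size(body, count_barriers, count_slow_path)
    factor = 1
    while size * factor * 2 <= limit:
        factor *= 2
    return factor


body = LoopBody(ir_nodes=10, barriers=2)
limit = 80  # hypothetical unrolling budget

ignoring = max_unroll_factor(body, limit, False, False)  # barriers invisible
with_slow = max_unroll_factor(body, limit, True, True)   # fast + slow path counted
fast_only = max_unroll_factor(body, limit, True, False)  # fast path only

print(ignoring, with_slow, fast_only)  # prints: 8 1 4
```

In this toy setting, ignoring the barriers gives the largest factor, counting the slow path gives the most conservative one, and counting only the fast path lands in between, which mirrors the direction of the change described above. The actual magnitudes depend on the real barrier sizes and unrolling limits.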

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17367#issuecomment-1891614933