RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size
Roberto Castañeda Lozano
rcastanedalo at openjdk.org
Fri Jan 12 10:56:19 UTC 2024
On Thu, 11 Jan 2024 10:06:54 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling.
>>
>> #### Testing
>>
>> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64).
>> - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only).
>>
>> #### Performance and code size evaluation
>>
>> - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset slightly reduces the size of the C2-generated code (around 0.5% fewer bytes per compiled bytecode for the `fop` and `luindex` DaCapo benchmarks) and has no significant overall performance effect.
>
> src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp line 334:
>
>> 332: // seven more nodes (CallLeaf, control Proj, memory Proj, data Proj, Region,
>> 333: // memory Phi, data Phi).
>> 334: return uncolor_or_color_size + 12;
>
> I thought the runtime call does not lie inside the loop. Is it necessary to take them into account, too?
Conceptually, the runtime call belongs to the loop, even if it is laid out in the cold section of the method. The current unrolling heuristic counts all basic blocks in the loop, regardless of whether they are hot or cold and how they are arranged in the final code. This changeset does the same for consistency.
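
As a rough illustration of the point above, here is a toy model of a size-based unrolling check that charges every barrier for its full estimated expansion, slow path included. It is a sketch under stated assumptions, not the HotSpot implementation: `ToyNode`, `estimated_loop_size`, `should_unroll`, and the node limit are hypothetical names and values, and the 1-versus-2 uncolor/color cost for loads and stores is assumed. Only the constant 12 is taken from the quoted snippet.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for an IR node; not HotSpot's Node class.
struct ToyNode {
  bool is_load;      // true for loads, false for stores and other nodes
  bool has_barrier;  // node carries a GC barrier that is expanded late
};

// Estimated number of extra nodes hidden behind one barrier.
// The per-kind uncolor/color cost (1 for loads, 2 otherwise) is an assumption;
// the + 12 mirrors the constant in the quoted snippet and covers the check
// plus the runtime call, even though the call ends up in cold code.
unsigned estimated_barrier_size(const ToyNode& n) {
  if (!n.has_barrier) {
    return 0;
  }
  unsigned uncolor_or_color_size = n.is_load ? 1 : 2;
  return uncolor_or_color_size + 12;
}

// Loop body size: one unit per visible node, plus the hidden barrier
// expansion when count_barriers is set.
unsigned estimated_loop_size(const std::vector<ToyNode>& body, bool count_barriers) {
  unsigned size = 0;
  for (const ToyNode& n : body) {
    size += 1;
    if (count_barriers) {
      size += estimated_barrier_size(n);
    }
  }
  return size;
}

// Hypothetical unrolling check: unroll only if the unrolled body stays
// within node_limit.
bool should_unroll(unsigned body_size, unsigned unroll_factor, unsigned node_limit) {
  assert(unroll_factor > 0);
  return body_size * unroll_factor <= node_limit;
}
```

In this model, a ten-node body with two barriered loads and one barriered store has size 10 when barriers are ignored and size 50 when they are counted. With an unroll factor of 4 and an assumed limit of 60 nodes, the first estimate permits unrolling and the second does not.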
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17367#discussion_r1448818759