RFR: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators()
Aleksey Shipilev
shade at openjdk.org
Thu Oct 24 18:12:16 UTC 2024
On Thu, 24 Oct 2024 17:10:42 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> Noticed this while looking at Leyden profiles. C2 seems to spend considerable time in this loop, and the disassembly shows it is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. memset is also what we normally do around C2 for arena-allocated data. We seem to touch a lot of these structs later on, so pulling them into cache with memset is likely "free".
>
> It also looks like the current initialization misses initializing the last element (at `C->unique()+1`).
>
> I'll put performance data in a separate comment.
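For readers without the diff at hand, here is a minimal sketch of the two initialization styles being compared. It is not the actual HotSpot code: the `Rec` struct, its fields, and both function names are hypothetical stand-ins, and the loop is only assumed to clear a single field over the first `n` slots of an `n + 1` element array, as the description above suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical per-node record; the real struct in C2 is different.
struct Rec {
  const void* control;
  unsigned semi;
  unsigned size;
};

// Loop style (assumed shape): clears one field for indices [0, n) and
// leaves the extra slot at index n untouched.
void init_with_loop(Rec* recs, std::size_t n) {
  assert(recs != nullptr);
  for (std::size_t i = n; i-- > 0; ) {
    recs[i].control = nullptr;
  }
}

// memset style: zeroes every byte of all n + 1 records, including the last.
// Assumes a null pointer is all-zero bits, which holds on mainstream platforms.
void init_with_memset(Rec* recs, std::size_t n) {
  static_assert(std::is_trivially_copyable<Rec>::value,
                "memset is only safe for trivially copyable types");
  assert(recs != nullptr);
  std::memset(recs, 0, (n + 1) * sizeof(Rec));
}
```

The memset variant writes more bytes, but it is a single bulk store over contiguous memory, and it also covers the trailing element that the loop in this sketch skips.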
In various tests on x86_64, this gives me up to about 1% faster runs in `-Xcomp` scenarios. The effect is not very visible with a "normal" amount of C2 compilations.
## HelloWorld, -Xcomp -XX:-TieredCompilation
# Before
Time (mean ± σ): 617.7 ms ± 2.5 ms [User: 584.6 ms, System: 31.5 ms]
Range (min … max): 614.2 ms … 624.5 ms 20 runs
# After
Time (mean ± σ): 611.3 ms ± 1.9 ms [User: 578.0 ms, System: 31.8 ms]
Range (min … max): 608.0 ms … 614.4 ms 20 runs
## JavacBenchApp 50, -XX:-TieredCompilation
# Before
Time (mean ± σ): 1.733 s ± 0.011 s [User: 3.074 s, System: 0.139 s]
Range (min … max): 1.719 s … 1.753 s 20 runs
# After
Time (mean ± σ): 1.727 s ± 0.011 s [User: 3.023 s, System: 0.144 s]
Range (min … max): 1.704 s … 1.751 s 20 runs
## JavacBenchApp 50, -Xcomp -XX:-TieredCompilation
# Before
Time (mean ± σ): 15.223 s ± 0.061 s [User: 15.048 s, System: 0.239 s]
Range (min … max): 15.152 s … 15.330 s 10 runs
# After
Time (mean ± σ): 15.080 s ± 0.021 s [User: 14.896 s, System: 0.239 s]
Range (min … max): 15.048 s … 15.115 s 10 runs
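To put the mean times above on a common scale, the relative improvement can be computed as `(before - after) / before`. The helper below is only an illustrative sketch; the name `improvement_percent` is made up for this note.

```cpp
#include <cassert>

// Relative improvement in percent, given two mean times in the same unit.
double improvement_percent(double before, double after) {
  assert(before > 0.0);
  return (before - after) / before * 100.0;
}
```

Applied to the means reported above, this gives roughly 1.0% for HelloWorld with `-Xcomp`, roughly 0.9% for JavacBenchApp with `-Xcomp`, and roughly 0.3% for JavacBenchApp without `-Xcomp`. In that last case the 0.006 s difference is smaller than the reported 0.011 s standard deviation.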
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21690#issuecomment-2435882250