RFR: 8296545: C2 Blackholes should allow load optimizations
Aleksey Shipilev
shade at openjdk.org
Tue Nov 8 22:12:22 UTC 2022
If you look at the generated code for a JMH benchmark like this:

    public class ArrayRead {
        @Param({"1", "100", "10000", "1000000"})
        int size;

        int[] is;

        @Setup
        public void setup() {
            is = new int[size];
            for (int c = 0; c < size; c++) {
                is[c] = c;
            }
        }

        @Benchmark
        public void test(Blackhole bh) {
            for (int i = 0; i < is.length; i++) {
                bh.consume(is[i]);
            }
        }
    }

...then you would notice that the loop always re-reads `is`, `is.length`, does the range check, etc. -- all the things we would otherwise expect to be hoisted out of the loop.
This is because C2 blackholes are modeled as membars that pinch both control and memory slices (as you would expect from an opaque non-inlined call), so every iteration has to re-read the referenced memory contents and recompute everything that depends on those loads. This behavior is not new -- the old, non-compiler blackholes were doing the same thing, accidentally -- but it used to be drowned in blackhole overheads. With compiler blackholes, these effects are clearly visible.
We can try to do this a bit better: allow load optimizations to work across the blackholes, leaving only the "prevent dead code elimination" part, as minimally required by blackhole semantics.
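
To make the intended effect concrete, here is a hand-written sketch of the two loop shapes. It is an illustration only, not C2 output: `HoistingSketch` and `Sink` are hypothetical names, and `Sink` is a plain stand-in for a blackhole, not JMH's `Blackhole`. The first method spells out what the generated code effectively does when the blackhole pinches memory; the second shows what the optimizer becomes free to do once loads may move across it.

```java
// Illustration only: hand-written equivalents of the two loop shapes.
// Sink is a hypothetical stand-in for a blackhole; it is NOT JMH's Blackhole.
final class HoistingSketch {
    static final class Sink {
        long acc;                        // keeps consumed values observable
        void consume(int v) { acc += v; }
    }

    int[] is;

    HoistingSketch(int size) {
        is = new int[size];
        for (int c = 0; c < size; c++) {
            is[c] = c;
        }
    }

    // Shape when the blackhole pinches memory: the field and the array
    // length are read again on every iteration, as if consume() could
    // have changed them.
    void pinched(Sink sink) {
        for (int i = 0; i < this.is.length; i++) {
            sink.consume(this.is[i]);
        }
    }

    // Shape once loads may move across the blackhole: the field and the
    // length are read once, and only the element load stays in the loop.
    void hoisted(Sink sink) {
        final int[] local = this.is;
        final int len = local.length;
        for (int i = 0; i < len; i++) {
            sink.consume(local[i]);
        }
    }
}
```

Both methods consume exactly the same values in the same order; only the number of loads per iteration differs, which is the part the blackhole does not need to protect.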
Motivational improvements on the test above:

    Benchmark        (size)  Mode  Cnt        Score       Error  Units

    # Before, full Java blackholes
    ArrayRead.test        1  avgt    9        5.422 ±     0.023  ns/op
    ArrayRead.test      100  avgt    9      460.619 ±     0.421  ns/op
    ArrayRead.test    10000  avgt    9    44697.909 ±  1964.787  ns/op
    ArrayRead.test  1000000  avgt    9  4332723.304 ±  2791.324  ns/op

    # Before, compiler blackholes
    ArrayRead.test        1  avgt    9        1.791 ±     0.007  ns/op
    ArrayRead.test      100  avgt    9      114.103 ±     1.677  ns/op
    ArrayRead.test    10000  avgt    9     8528.544 ±    52.010  ns/op
    ArrayRead.test  1000000  avgt    9  1005139.070 ±  2883.011  ns/op

    # After, compiler blackholes
    ArrayRead.test        1  avgt    9        1.686 ±     0.006  ns/op  ; ~1.1x better
    ArrayRead.test      100  avgt    9       16.249 ±     0.019  ns/op  ; ~7.0x better
    ArrayRead.test    10000  avgt    9     1375.265 ±     2.420  ns/op  ; ~6.2x better
    ArrayRead.test  1000000  avgt    9   136862.574 ±  1057.100  ns/op  ; ~7.3x better

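
The "x better" annotations are the ratio of the "before" to the "after" score with compiler blackholes, rounded to one decimal. A small sketch that recomputes them from the numbers above (`SpeedupCheck` and its methods are hypothetical helper names, not part of JMH or the patch):

```java
import java.util.Locale;

// Recomputes the "~N.Nx better" annotations from the reported scores.
// SpeedupCheck is a hypothetical helper, not part of JMH or the patch.
final class SpeedupCheck {
    // Lower ns/op is better, so the speedup is before / after.
    static double speedup(double beforeNsPerOp, double afterNsPerOp) {
        return beforeNsPerOp / afterNsPerOp;
    }

    static String format(double ratio) {
        return String.format(Locale.ROOT, "~%.1fx", ratio);
    }

    public static void main(String[] args) {
        // { size, before (compiler blackholes), after (compiler blackholes) }
        double[][] rows = {
            {1,       1.791,       1.686},
            {100,     114.103,     16.249},
            {10000,   8528.544,    1375.265},
            {1000000, 1005139.070, 136862.574},
        };
        for (double[] r : rows) {
            System.out.println((long) r[0] + ": " + format(speedup(r[1], r[2])));
        }
    }
}
```

Running it gives ~1.1x, ~7.0x, ~6.2x and ~7.3x, matching the annotations in the table.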
`-prof perfasm` shows the reason for these improvements clearly:
Before:

              ↗  0x00007f0b54498360:   mov    0xc(%r12,%r10,8),%edx     ; range check 1
      7.97%   │  0x00007f0b54498365:   cmp    %edx,%r11d
      1.27%   │  0x00007f0b54498368:   jae    0x00007f0b5449838f
              │  0x00007f0b5449836a:   shl    $0x3,%r10
      0.03%   │  0x00007f0b5449836e:   mov    0x10(%r10,%r11,4),%r10d   ; get "is[i]"
      7.76%   │  0x00007f0b54498373:   mov    0x10(%r9),%r10d           ; restore "is"
      0.24%   │  0x00007f0b54498377:   mov    0x3c0(%r15),%rdx          ; safepoint poll, part 1
     17.48%   │  0x00007f0b5449837e:   inc    %r11d                     ; i++
      0.17%   │  0x00007f0b54498381:   test   %eax,(%rdx)               ; safepoint poll, part 2
     53.26%   │  0x00007f0b54498383:   mov    0xc(%r12,%r10,8),%edx     ; loop index check
      4.84%   │  0x00007f0b54498388:   cmp    %edx,%r11d
      0.31%   ╰  0x00007f0b5449838b:   jl     0x00007f0b54498360

After:

              ↗  0x00007fa06c49a8b0:   mov    0x2c(%rbp,%r10,4),%r9d    ; stride read
     19.66%   │  0x00007fa06c49a8b5:   mov    0x28(%rbp,%r10,4),%edx
      0.14%   │  0x00007fa06c49a8ba:   mov    0x10(%rbp,%r10,4),%ebx
     22.09%   │  0x00007fa06c49a8bf:   mov    0x14(%rbp,%r10,4),%ebx
      0.21%   │  0x00007fa06c49a8c4:   mov    0x18(%rbp,%r10,4),%ebx
     20.19%   │  0x00007fa06c49a8c9:   mov    0x1c(%rbp,%r10,4),%ebx
      0.04%   │  0x00007fa06c49a8ce:   mov    0x20(%rbp,%r10,4),%ebx
     24.02%   │  0x00007fa06c49a8d3:   mov    0x24(%rbp,%r10,4),%ebx
      0.21%   │  0x00007fa06c49a8d8:   add    $0x8,%r10d                ; i += 8
              │  0x00007fa06c49a8dc:   cmp    %esi,%r10d
      0.07%   ╰  0x00007fa06c49a8df:   jl     0x00007fa06c49a8b0

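
In source terms, the "After" loop body corresponds to a main loop unrolled by 8: eight element loads per iteration, one `i += 8`, one compare, and no per-iteration range check or reload of `is`. A rough hand-written equivalent follows, as a sketch only. `UnrollSketch` and `Sink` are hypothetical names, `Sink` is not JMH's `Blackhole`, and the tail loop is my assumption about how leftover elements would be handled; the listing above shows only the main loop.

```java
// Illustration only: a hand-unrolled equivalent of the "After" loop shape.
// Sink is a hypothetical stand-in for a blackhole; it is NOT JMH's Blackhole.
final class UnrollSketch {
    static final class Sink {
        long acc;     // sum of consumed values
        int count;    // number of consume() calls
        void consume(int v) { acc += v; count++; }
    }

    static void consumeAll(int[] a, Sink sink) {
        int i = 0;
        // Main loop: eight loads per iteration, index advanced by 8.
        // The limit keeps a[i + 7] in bounds, so no per-element check
        // is needed inside the body.
        final int limit = a.length - 7;
        for (; i < limit; i += 8) {
            sink.consume(a[i]);
            sink.consume(a[i + 1]);
            sink.consume(a[i + 2]);
            sink.consume(a[i + 3]);
            sink.consume(a[i + 4]);
            sink.consume(a[i + 5]);
            sink.consume(a[i + 6]);
            sink.consume(a[i + 7]);
        }
        // Tail loop (assumed): handles the remaining 0..7 elements.
        for (; i < a.length; i++) {
            sink.consume(a[i]);
        }
    }
}
```

Every element is still consumed exactly once, which is the only guarantee the blackhole has to preserve.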
Additional testing:
- [x] Eyeballing JMH Samples `-prof perfasm`
- [x] Linux x86_64 fastdebug, `compiler/blackhole`, `compiler/c2/irTests/blackhole`
- [x] Linux x86_64 fastdebug, JDK benchmark corpus
-------------
Commit messages:
- Fix
Changes: https://git.openjdk.org/jdk/pull/11041/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11041&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8296545
Stats: 128 lines in 3 files changed: 127 ins; 1 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/11041.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11041/head:pull/11041
PR: https://git.openjdk.org/jdk/pull/11041