RFR: 8272493: Suboptimal code generation around Preconditions.checkIndex intrinsic with AVX2

Yi Yang yyang at openjdk.java.net
Thu Mar 10 08:01:56 UTC 2022


JDK-8272493 reports a minor regression when Preconditions.checkIndex is used in String.checkIndex. The cause is that some unnecessary vzeroupper instructions are emitted. They were introduced by [JDK-8190934](https://bugs.openjdk.java.net/browse/JDK-8190934) and are emitted because inline_preconditions_checkIndex sets the clear_upper_avx flag. I did some digging into the history of this code; please correct me if I have misunderstood anything.

[JDK-8178811](https://bugs.openjdk.java.net/browse/JDK-8178811) emits vzeroupper in every MachEpilogNode to avoid the AVX <-> SSE transition penalty during the call.

[JDK-8190934](https://bugs.openjdk.java.net/browse/JDK-8190934) emits vzeroupper only in those MachEpilogNodes for which the clear_upper_avx flag is set. vzeroupper is itself a costly instruction, so we do not want to emit it at every function exit.

[JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493) (this issue) reports that vzeroupper is emitted for Preconditions.checkIndex because inline_preconditions_checkIndex sets the clear_upper_avx flag.

Microbenchmark results are as follows:

-------Preconditions.checkIndex without clear_upper_avx
Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.257 ± 0.011 ns/op

Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.251 ± 0.008 ns/op

Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.254 ± 0.003 ns/op

-------Preconditions.checkIndex with clear_upper_avx (current implementation)
Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.421 ± 0.003 ns/op

Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.419 ± 0.002 ns/op

Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.433 ± 0.044 ns/op

------- -XX:DisableIntrinsic=_Preconditions_checkIndex
Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.229 ± 0.018 ns/op

Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.224 ± 0.006 ns/op

Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.218 ± 0.011 ns/op

------- -XX:UseAVX=1
Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.247 ± 0.022 ns/op

Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.234 ± 0.018 ns/op

Benchmark Mode Cnt Score Error Units
StringBuilders.charAtLatin1 avgt 15 6.261 ± 0.042 ns/op
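
To put the numbers above in perspective, here is a small sketch that averages the three runs with and without clear_upper_avx and reports the relative difference. The class and method names are illustrative only and are not part of the JDK or of this patch; the scores are copied from the listings above. By this arithmetic the current implementation is roughly 0.17 ns/op (about 2.7%) slower.

```java
public class CharAtLatin1Summary {
    // Scores in ns/op, copied from the three runs of each configuration above.
    static final double[] WITHOUT_FLAG = {6.257, 6.251, 6.254};
    static final double[] WITH_FLAG = {6.421, 6.419, 6.433};

    // Arithmetic mean of the given scores.
    static double mean(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    // Relative slowdown of candidate versus baseline, in percent.
    static double slowdownPercent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double without = mean(WITHOUT_FLAG);
        double with = mean(WITH_FLAG);
        System.out.printf("without clear_upper_avx: %.3f ns/op%n", without);
        System.out.printf("with clear_upper_avx:    %.3f ns/op%n", with);
        System.out.printf("slowdown:                %.2f%%%n",
                slowdownPercent(without, with));
    }
}
```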

As I understand it, inline_preconditions_checkIndex only performs a simple range check and involves no xmm (SSE) / ymm (AVX) registers, so I propose removing the clear_upper_avx flag to avoid emitting vzeroupper for this intrinsic.
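
For readers less familiar with the check, the following is a minimal Java sketch of the range check at the Java level, assuming the usual contract 0 <= index < length. checkIndexSketch is a hypothetical stand-in written for illustration, not the JDK source and not the code C2 generates; java.util.Objects.checkIndex is the public API (since Java 9) with the same contract. The point is that only scalar integer comparisons are involved.

```java
import java.util.Objects;

public class RangeCheckSketch {
    // Hypothetical stand-in for the range check (not the JDK implementation):
    // returns index if 0 <= index < length, otherwise throws.
    static int checkIndexSketch(int index, int length) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(
                    "Index " + index + " out of bounds for length " + length);
        }
        return index;
    }

    public static void main(String[] args) {
        // In-range index: both the sketch and the public API return the index.
        System.out.println(checkIndexSketch(2, 5));
        System.out.println(Objects.checkIndex(2, 5));

        // Out-of-range index: both reject it with IndexOutOfBoundsException.
        try {
            checkIndexSketch(5, 5);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("sketch rejected index 5");
        }
        try {
            Objects.checkIndex(5, 5);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Objects.checkIndex rejected index 5");
        }
    }
}
```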

-------------

Commit messages:
 - 8272493: Suboptimal code generation around Preconditions.checkIndex intrinsic with AVX2

Changes: https://git.openjdk.java.net/jdk/pull/7770/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7770&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8272493
  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7770.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7770/head:pull/7770

PR: https://git.openjdk.java.net/jdk/pull/7770

