RFR: 8272493: Suboptimal code generation around Preconditions.checkIndex intrinsic with AVX2
Tobias Hartmann
thartmann at openjdk.java.net
Thu Mar 10 11:16:48 UTC 2022
On Thu, 10 Mar 2022 07:55:16 GMT, Yi Yang <yyang at openjdk.org> wrote:
> JDK-8272493 reports a minor regression when Preconditions.checkIndex is used in String.checkIndex. The cause is that some unnecessary vzeroupper instructions are emitted. They were introduced by [JDK-8190934](https://bugs.openjdk.java.net/browse/JDK-8190934) and are emitted because inline_preconditions_checkIndex sets the clear_upper_avx flag. I did some digging into the history of this code; please correct me if I have misunderstood something.
>
> [JDK-8178811](https://bugs.openjdk.java.net/browse/JDK-8178811) emits vzeroupper on every MachEpilogueNode to avoid the AVX <-> SSE transition penalty during the call.
>
> [JDK-8190934](https://bugs.openjdk.java.net/browse/JDK-8190934) emits vzeroupper only on those MachEpilogueNodes for which the clear_upper_avx flag is set. Because vzeroupper is itself a costly instruction, we don't want to emit it at every function exit.
>
> [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493) is the resulting issue: vzeroupper is emitted because inline_preconditions_checkIndex sets the clear_upper_avx flag.
>
> Microbenchmark results are as follows:
>
> -------Preconditions.checkIndex without clear_upper_avx
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.257 ± 0.011 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.251 ± 0.008 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.254 ± 0.003 ns/op
>
> -------Preconditions.checkIndex with clear_upper_avx (current implementation)
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.421 ± 0.003 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.419 ± 0.002 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.433 ± 0.044 ns/op
>
> ------- -XX:DisableIntrinsic=_Preconditions_checkIndex
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.229 ± 0.018 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.224 ± 0.006 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.218 ± 0.011 ns/op
>
> ------- -XX:UseAVX=1
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.247 ± 0.022 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.234 ± 0.018 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.261 ± 0.042 ns/op
>
> As I understand it, inline_preconditions_checkIndex only performs a simple range check; no xmm (SSE) or ymm (AVX)
> registers are involved. I therefore propose removing the clear_upper_avx flag so that vzeroupper is not emitted for this intrinsic.
Looks good to me. I'm not sure why `clear_upper_avx()` was set there.
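
For reference, here is a rough Java-level sketch of the range check the intrinsic stands in for. The class and method names (`CheckIndexSketch`, `checkIndexSketch`) are hypothetical and used only for illustration; this is not the JDK source and not the code C2 generates. It only shows that the check is plain integer comparison, which is why no vector registers should be needed.

```java
// Illustrative sketch only: CheckIndexSketch and checkIndexSketch are
// hypothetical names, not part of the JDK.
final class CheckIndexSketch {

    // Returns index if 0 <= index < length, otherwise throws.
    // Only scalar integer comparisons are involved.
    static int checkIndexSketch(int index, int length) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(
                "Index " + index + " out of bounds for length " + length);
        }
        return index;
    }

    public static void main(String[] args) {
        // In-range index is returned unchanged.
        System.out.println(checkIndexSketch(2, 5));

        // Out-of-range index raises IndexOutOfBoundsException.
        try {
            checkIndexSketch(5, 5);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Since nothing here touches floating-point or vector state, there is no upper-YMM state for the intrinsic itself to dirty.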
-------------
Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/7770
More information about the hotspot-compiler-dev mailing list