RFR: 8272493: Suboptimal code generation around Preconditions.checkIndex intrinsic with AVX2
Yi Yang
yyang at openjdk.java.net
Thu Mar 10 09:32:39 UTC 2022
On Thu, 10 Mar 2022 07:55:16 GMT, Yi Yang <yyang at openjdk.org> wrote:
> 8272493 reports a minor regression when using Preconditions.checkIndex in String.checkIndex. The reason is some unnecessary vzeroupper instructions were emitted. The vzerouppers are introduced in [JDK-8190934](https://bugs.openjdk.java.net/browse/JDK-8190934), which are emitted by clear_upper_avx within inline_preconditions_checkIndex. I did some digging into the history of this code. Please correct me if I misunderstand something
>
> [JDK-8178811](https://bugs.openjdk.java.net/browse/JDK-8178811) emits vzeroupper on every MachEpilogueNode to avoid AVX <-> SSE transition penalty during the call.
>
> [JDK-8190934](https://bugs.openjdk.java.net/browse/JDK-8190934) emits vzeroupper on some MachEpilogueNode by setting clear_upper_avx flag, because vzeroupper itself is a high-cost instruction, we don't want to emit it everywhere a function is finished.
>
> [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493) emits vzeroupper because inline_preconditions_checkIndex sets clear_upper_avx flag.
>
> Micro benchmark are as follows
>
> -------Preconditions.checkIndex without clear_upper_avx
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.257 ± 0.011 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.251 ± 0.008 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.254 ± 0.003 ns/op
>
> -------Preconditions.checkIndex with clear_upper_avx(Current Implementation)
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.421 ± 0.003 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.419 ± 0.002 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.433 ± 0.044 ns/op
>
> ------- -XX:DisableIntrinsic=_Preconditions_checkIndex
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.229 ± 0.018 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.224 ± 0.006 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.218 ± 0.011 ns/op
>
> ------- -XX:UseAVX=1
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.247 ± 0.022 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.234 ± 0.018 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.261 ± 0.042 ns/op
>
> As I understand, inline_Preconditions_checkIndex only do some simple range check, there is no xmm(sse)/ymm(avx)
> registers involved, so I propose to remove clear_upper_avx flag to avoid emitting vzeroupper for this intrinsic.
Thanks for the clarifation!
> Yes what I mean here is that even if we touch the YMM registers we still don't need to emit vzeroupper everywhere but on transition to codes not emitted by the hotspot assembler.
Yes, until we see a real AVX <-> SSE transition like cases you mentioned, we don't need to emit it
-------------
PR: https://git.openjdk.java.net/jdk/pull/7770
More information about the hotspot-compiler-dev
mailing list