RFR: 8272493: Suboptimal code generation around Preconditions.checkIndex intrinsic with AVX2
Claes Redestad
redestad at openjdk.java.net
Thu Mar 10 10:18:38 UTC 2022
On Thu, 10 Mar 2022 07:55:16 GMT, Yi Yang <yyang at openjdk.org> wrote:
> 8272493 reports a minor regression when using Preconditions.checkIndex in String.checkIndex. The reason is some unnecessary vzeroupper instructions were emitted. The vzerouppers are introduced in [JDK-8190934](https://bugs.openjdk.java.net/browse/JDK-8190934), which are emitted by clear_upper_avx within inline_preconditions_checkIndex. I did some digging into the history of this code. Please correct me if I misunderstand something
>
> [JDK-8178811](https://bugs.openjdk.java.net/browse/JDK-8178811) emits vzeroupper on every MachEpilogueNode to avoid AVX <-> SSE transition penalty during the call.
>
> [JDK-8190934](https://bugs.openjdk.java.net/browse/JDK-8190934) emits vzeroupper on some MachEpilogueNode by setting clear_upper_avx flag, because vzeroupper itself is a high-cost instruction, we don't want to emit it everywhere a function is finished.
>
> [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493) emits vzeroupper because inline_preconditions_checkIndex sets clear_upper_avx flag.
>
> Micro benchmark are as follows
>
> -------Preconditions.checkIndex without clear_upper_avx
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.257 ± 0.011 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.251 ± 0.008 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.254 ± 0.003 ns/op
>
> -------Preconditions.checkIndex with clear_upper_avx(Current Implementation)
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.421 ± 0.003 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.419 ± 0.002 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.433 ± 0.044 ns/op
>
> ------- -XX:DisableIntrinsic=_Preconditions_checkIndex
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.229 ± 0.018 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.224 ± 0.006 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.218 ± 0.011 ns/op
>
> ------- -XX:UseAVX=1
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.247 ± 0.022 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.234 ± 0.018 ns/op
>
> Benchmark Mode Cnt Score Error Units
> StringBuilders.charAtLatin1 avgt 15 6.261 ± 0.042 ns/op
>
> As I understand, inline_Preconditions_checkIndex only do some simple range check, there is no xmm(sse)/ymm(avx)
> registers involved, so I propose to remove clear_upper_avx flag to avoid emitting vzeroupper for this intrinsic.
IIRC the intent of the patch for [JDK-8190934](https://bugs.openjdk.java.net/browse/JDK-8190934) was to reduce the number of unnecessary vzeroupper instructions generated by the JIT by marking the compilation and emitting it as late as possible and (hopefully) only once per path. There was a short time when we emitted vzerouppers all over the place, and JDK-8190934 fixed most of the adverse effects from that. It's likely still a bit too conservative, but the alternative cost is high and might be elusive.
To me it looks like this patch fixes one place where it's definitely not needed. While the win is small in absolute terms in isolation, there are a lot of places where we want to use `Preconditions.checkIndex` and friends.
-------------
Marked as reviewed by redestad (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/7770
More information about the hotspot-compiler-dev
mailing list