RFR: 8320206: Some intrinsics/stubs missing vzeroupper on x86_64
Vladimir Kozlov
kvn at openjdk.org
Fri Nov 17 16:00:32 UTC 2023
On Fri, 17 Nov 2023 01:45:16 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Okay. Then if an intrinsic stub is called only from compiled code, you don't need `vzeroupper`. You only need it for intrinsics that are called from the Interpreter or the runtime. Those are the `crc32`, `crc32c`, and `float16` intrinsics: [templateInterpreterGenerator.cpp#L472](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L472)
>>
>> Arraycopy stubs could be used by the VM's runtime, I think. At least they are called from `test_arraycopy_func`.
>>
>> I may have forgotten something about intrinsics. Why do we need `vzeroupper` on every intrinsic exit?
>>
>> There was actually the issue with `vzeroupper` called in intrinsics: [JDK-8078113](https://bugs.openjdk.org/browse/JDK-8078113).
>> Then there was [JDK-8178811](https://bugs.openjdk.org/browse/JDK-8178811) and followup [JDK-8190934](https://bugs.openjdk.org/browse/JDK-8190934)
>>
>> There are currently a lot of places in the VM where `vzeroupper` is emitted, and it is a mess. We need to clean it up and clearly state where we should use it. Maybe add comments in all places where it is called to state why it is called there.
>
> @vnkozlov
> Currently, in the stock JVM, we generate vzeroupper at the end of a stub and at the end of a C2 jitted method only if it uses vector instructions wider than 128 bits. For a C2 jitted method, this could be due either to auto-vectorization or to inlined intrinsics. The clear_upper_avx() call only marks that the method has larger vectors. The vzeroupper is generated in the method epilog when the marker is found set for the method.
>
> This PR is not deviating from that scheme.
>
> We settled on this scheme during our discussion on [JDK-8178811](https://bugs.openjdk.org/browse/JDK-8178811) and its followup [JDK-8190934](https://bugs.openjdk.org/browse/JDK-8190934). The discussion thread for the former is at https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2017-April/026049.html. As part of JDK-8190934 we restricted vzeroupper generation in the epilog of a C2 jitted method from always to only when larger vectors are used in the method. This resolved any over-generation of vzeroupper.
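
For readers following the thread, the mark-then-emit scheme described in the quoted text can be sketched roughly as below. This is a toy model, not HotSpot code: `MethodSketch`, `emit_vector_op`, `emit_epilog`, and `compile_sketch` are hypothetical names, and the boolean flag only stands in for the marker that the quoted text says clear_upper_avx() sets.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not HotSpot code: a method being compiled records
// whether any emitted instruction used vectors wider than 128 bits.
struct MethodSketch {
    bool uses_wide_vectors = false;   // stands in for the marker set via clear_upper_avx()
    std::vector<std::string> code;    // emitted instruction mnemonics

    // Emit an instruction that operates on vectors of the given width in bits.
    void emit_vector_op(const std::string& mnemonic, int width_bits) {
        assert(width_bits > 0);
        code.push_back(mnemonic);
        if (width_bits > 128) {
            uses_wide_vectors = true; // only mark here; vzeroupper is not emitted yet
        }
    }

    // Method epilog: emit vzeroupper only when the marker was set.
    void emit_epilog() {
        if (uses_wide_vectors) {
            code.push_back("vzeroupper");
        }
        code.push_back("ret");
    }
};

// Helper: "compile" a method with a single vector op and return its code.
std::vector<std::string> compile_sketch(const std::string& op, int width_bits) {
    MethodSketch m;
    m.emit_vector_op(op, width_bits);
    m.emit_epilog();
    return m.code;
}
```

In this model, a method that uses a 256-bit op gets `vzeroupper` before `ret`, while a method limited to 128-bit ops gets only `ret`, which is the restriction the quoted text attributes to JDK-8190934.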
@sviswa7 please file followup RFEs based on my and Vladimir Ivanov's comments.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16678#issuecomment-1816681479