[jdk16] RFR: 8258703: Incorrect 512-bit vector registers restore on x86_32

Jie Fu jiefu at openjdk.java.net
Wed Dec 23 23:59:59 UTC 2020


On Wed, 23 Dec 2020 19:01:56 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Hi all,
>> 
>> Following tests fail on our AVX512 machines with x86_32:
>>   - compiler/runtime/Test7196199.java
>>   - compiler/runtime/safepoints/TestRegisterRestoring.java
>>   - compiler/vectorization/TestVectorsNotSavedAtSafepoint.java
>> 
>> The reason is that 512-bit registers (zmm0 ~ zmm7) are restored incorrectly.
>> 
>> Current restore logic for 512-bit registers includes:
>>   1) restore zmm[511..256] [1]
>>   2) restore zmm[255..128] [2]  <-- Wrong on AVX512 with avx512vl
>> 
>> On our AVX512 machine, Assembler::vinsertf128 [3] was called in step 2).
>> According to the Intel instruction set reference,  vinsertf128 just copies the lower half of zmm, which lost the upper half of zmm.
>>    VINSERTF128 (VEX encoded version)
>>    TEMP[255:0] <- SRC1[255:0]
>>    CASE (imm8[0]) OF
>>    0: TEMP[127:0]   <- SRC2[127:0]
>>    1: TEMP[255:128] <- SRC2[127:0]
>>    ESAC
>>    DEST <- TEMP
>> 
>> The fix just changes the order of the restore logic for 512-bit registers:
>>   1) restore zmm[255..128] 
>>   2) restore zmm[511..256] 
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> [1] https://github.com/openjdk/jdk16/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp#L320
>> [2] https://github.com/openjdk/jdk16/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp#L326
>> [3] https://github.com/openjdk/jdk16/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L1463
>
> src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 326:
> 
>> 324:     // Restore upper half of YMM registers.
>> 325:     for (int n = 0; n < num_xmm_regs; n++) {
>> 326:       __ vinsertf128_high(as_XMMRegister(n), Address(rsp, n*16));
> 
> `dst` register is used as `src1`:
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L1459
> I would assume that it should copy all bits from `dst` first before inserting 128 bits from `src2` which is stack in this case.

The instruction reference says only src1[255..0] is copied.
http://ftp.neutrino.es/x86InstructionSet/VINSERTF128.html

So I would assume dst[511..256] will be unpredictable after VINSERTF128 is executed.

-------------

PR: https://git.openjdk.java.net/jdk16/pull/64


More information about the hotspot-compiler-dev mailing list