RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts

Emanuel Peter epeter at openjdk.org
Thu May 1 06:32:50 UTC 2025


On Thu, 1 May 2025 06:17:44 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>>> @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct?
>> 
>> @eme64 Thanks for looking. Point form in attempt to be concise:
>> - Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473)
>> - The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). 
>> - (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed.
>
>> > @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct?
>> 
>> @eme64 Thanks for looking. Point form in attempt to be concise:
>> 
>> * Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473)
>> * The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization).
>> * (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed.
> 
> Hi @vpaprotsk , @eme64,
> 
> Just to fill in the missing details about compress/expand handling on AVX2, we maintain an in-memory lookup table of permutation indices corresponding to a mask value. Each row of lookup table either holds a valid permute index, which is a positive index value less than the vector lane count OR a -1 index.
> 
> Since blend emulation always expects to operate over a blend mask vector whose lanes either hold a -1 or a 0 value hence there is a need to re-compose the desired blend mask by signed extending the MSB bits to fill the entire lane. Your fix to recompute the mask looks good to me. 
> 
> 
> Best Regards,
> Jatin

@jatin-bhateja Thanks for reviewing!

@vpaprotsk I'm realdy to give the approval too, just want to run some internal testing first - please ping me again in 24h :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2844193698


More information about the hotspot-compiler-dev mailing list