Memory Corruption in JDK-21 on x86_64 with AVX-512

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri Sep 29 02:29:14 UTC 2023


Thank you, Sandhya, for your suggestion.

But even backing off such not small changes may introduce instability 
into release which is 2 weeks away.

Disabling avx512 is only temporary. It will be restored together with 
fix in next update in 3 months.

We will have release note about that so users will not be surprised by 
temporary performance lost.

And finally, I don't think a lot of people will immediately switch to 
JDK 21 and be affected by this temporary change.

This will give us time to work on fix and regression test without rush.

Regards,
Vladimir K

On 9/28/23 2:47 PM, Viswanathan, Sandhya wrote:
> I meant 9037 (https://github.com/openjdk/jdk/pull/9037). The PR was for SVE and not at all related to x86 architecture.
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: Viswanathan, Sandhya
> Sent: Thursday, September 28, 2023 2:25 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Carter Kozak <ckozak at ckozak.net>; Xiaohong Gong <Xiaohong.Gong at arm.com>; hotspot-compiler-dev at openjdk.org
> Cc: Tobias Hartmann <tobias.hartmann at oracle.com>; Bhateja, Jatin <Jatin.Bhateja at intel.com>
> Subject: RE: Memory Corruption in JDK-21 on x86_64 with AVX-512
> 
> Could we not back out 9307 instead, the offending PR causing the issue? Even the alternate fix discussed by Xiaohong is small. JDK 21 performance for many real-world applications on x86_64 will be much worse if we set AVX=2. This is not same as JDK 9.
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Sent: Thursday, September 28, 2023 1:43 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Carter Kozak <ckozak at ckozak.net>; Xiaohong Gong <Xiaohong.Gong at arm.com>; hotspot-compiler-dev at openjdk.org
> Cc: Tobias Hartmann <tobias.hartmann at oracle.com>; Bhateja, Jatin <jatin.bhateja at intel.com>
> Subject: Re: Memory Corruption in JDK-21 on x86_64 with AVX-512
> 
> Hi,
> 
> After discussion we decided to postpone fix to 21.0.2.
> There is no much time left before 21.0.1 is released and we still don't have regression test and did not review or fully tested the fix.
> 
> But because of severity of the issue and to be safe we decided to set UseAVX == 2 by default even on AVX512 CPUs in 21.0.1. And add release note about this. It is temporary solution until next update release 21.0.2. We did similar thing (disabled avx512 instructions) in JDK 9.
> 
> I am also thinking about setting UseSVE to `false` by default on Aarch64 with SVE since it could be affected too. We don't have such machine so I can't verify the fix.
> 
> Meanwhile I will work on test and fix for JDK 22. And I hope Xiaohong will help, when he is back from break, with SVE testing.
> 
> Regards,
> Vladimir K
> 
> On 9/27/23 11:32 AM, Vladimir Kozlov wrote:
>> https://bugs.openjdk.org/browse/JDK-8317121
>>
>> On 9/27/23 11:19 AM, Vladimir Kozlov wrote:
>>> Officially jdk21.0.1 is closed for fixes but we are discussing this
>>> issue now.
>>>
>>> I verified the suggested by Xiaohong fix:
>>> retrun nullptr in StoreVectorNode::Ideal() and LoadVectorNode::Ideal().
>>>
>>> I will file a bug and follow the process.
>>>
>>> Regards,
>>> Vladimir K
>>>
>>> On 9/27/23 9:54 AM, Viswanathan, Sandhya wrote:
>>>> Copying Tobias.
>>>>
>>>>   From what I see JDK21 update schedule has a mid-October release
>>>> plan (https://wiki.openjdk.org/display/JDKUpdates/JDK+21u).
>>>>
>>>> Tobias, do you know the code freeze date for that. Hopefully, we
>>>> still have a couple of days left.
>>>>
>>>> At the minimum we should revert to return nullptr in both
>>>> LoadVectorMaskedNode::Ideal and StoreVectorMaskedNode::Ideal as a
>>>> workaround.
>>>>
>>>> Best Regards,
>>>> Sandhya
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev <hotspot-compiler-dev-retn at openjdk.org>
>>>> On Behalf Of Carter Kozak
>>>> Sent: Wednesday, September 27, 2023 7:26 AM
>>>> To: Xiaohong Gong <Xiaohong.Gong at arm.com>;
>>>> hotspot-compiler-dev at openjdk.org
>>>> Subject: Re: Memory Corruption in JDK-21 on x86_64 with AVX-512
>>>>
>>>> Thank you for such a quick diagnosis, I really appreciate it!
>>>>
>>>> While I don't have a good sense of the mapping between bug severity
>>>> and backport/release criteria, I suspect this issue may be a strong
>>>> candidate for an out-of-band jdk-21 release. I'm certainly biased
>>>> because I had invalid bytes silently persisted, and defer to folks
>>>> closer to the process.
>>>>
>>>> Best,
>>>> Carter Kozak
>>>>
>>>> On Wed, Sep 27, 2023, at 02:57, Xiaohong Gong wrote:
>>>>> Hi Carter,
>>>>> Thanks for reporting this issue!
>>>>> It seems the issue is caused by the calling of the super class’s
>>>>> “Ideal()” method in “LoadVectorMaskedNode::Ideal()” (code is:
>>>>> 8286941: Add mask IR for partial vector operations for ARM SVE by
>>>>> XiaohongGong · Pull Request #9037 · openjdk/jdk (github.com)
>>>>> <https://github.com/openjdk/jdk/pull/9037/files#diff-692826251cae892bc4737919579c6afbd317551cd507f99c7bd29d585c1282e2R1028>).
>>>>> Some optimization applied on “LoadNode” is not suitable for
>>>>> “LoadVectorMaskedNode” now. Reverting it back to “return nullptr”
>>>>> can make your benchmark pass.
>>>>> I will look at this issue and try to fix it next. Thanks!


More information about the hotspot-compiler-dev mailing list