Intel AMX and feature detection

Thu Jun 27 23:21:10 UTC 2024

On Skylake we restrict the JVM to UseAVX=2 overall (no AVX-512 ISA), and the user can override this with -XX:UseAVX=3.
On Cascade Lake we only restrict the auto-vectorizer to 256-bit vector width while keeping the AVX-512 ISA, and still allow the intrinsics and the Vector API to benefit from 512-bit vector width. FMA is expensive, but not every vector instruction is, and when a user explicitly uses IntVector.SPECIES_512 they would expect to get 512-bit code generation, wouldn't they? Also, SPECIES_PREFERRED and SPECIES_MAX for an element type are interconnected. So there are a lot of questions about how to handle everything if we do want to restrict SPECIES_PREFERRED.
As Paul mentioned, we need an override anyway. So the thought came to mind: if the existing option (-XX:MaxVectorSize=32) works, why not use the override the other way around, so that we don't have to go into all these complications?
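
For anyone who wants to check what a given JVM actually ended up with, here is a minimal sketch that reads the two flags discussed above through the standard HotSpotDiagnosticMXBean. The class name VectorFlagsProbe and its helper methods are made up for illustration; only getVMOption() is an existing API. It assumes a HotSpot JVM: UseAVX exists only on x86 builds and MaxVectorSize only when C2 is present, so a missing flag is reported as "n/a" rather than treated as an error.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the JDK: prints the vector-related
// HotSpot flags that the running JVM settled on.
class VectorFlagsProbe {

    // Returns the current value of a HotSpot flag, or null if this JVM
    // does not know the flag (e.g. UseAVX on a non-x86 build).
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? null : bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // getVMOption throws IllegalArgumentException for unknown flags.
            return null;
        }
    }

    // MaxVectorSize is expressed in bytes, while species such as
    // SPECIES_256 and SPECIES_512 are named by their width in bits.
    static int bytesToBits(int maxVectorSizeBytes) {
        return maxVectorSizeBytes * 8;
    }

    public static void main(String[] args) {
        String useAvx = flagValue("UseAVX");
        String maxVec = flagValue("MaxVectorSize");
        System.out.println("UseAVX        = " + (useAvx == null ? "n/a" : useAvx));
        System.out.println("MaxVectorSize = " + (maxVec == null ? "n/a" : maxVec));
        if (maxVec != null) {
            int bits = bytesToBits(Integer.parseInt(maxVec));
            System.out.println("Max vector width = " + bits + " bits");
        }
    }
}
```

Running it once with default flags and once with -XX:MaxVectorSize=32 on the same machine should show whether the override took effect. What it prints depends on the CPU and the JDK build.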

Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com> 
Sent: Thursday, June 27, 2024 3:55 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Paul Sandoz <paul.sandoz at oracle.com>; John Rose <john.r.rose at oracle.com>
Cc: Uwe Schindler <uschindler at apache.org>; panama-dev at openjdk.org
Subject: Re: Intel AMX and feature detection

> It is possible to do the “down bit’ing” today by setting -XX:MaxVectorSize=32 on the JVM command line. This sets the preferred species to 256-bit vector size with the AVX-512 ISA. Would that work by any chance? Or I guess that is not what we want ...

HotSpot JVM already does that: by default, AVX512 is not used on Skylake CPUs [1] even though they support AVX512F et al. (unless the user explicitly specifies -XX:UseAVX=3). Maybe the JVM should be more aggressive, hard to say. (It'll affect Cascade Lake.) The choice is not specific to the Vector API, but affects the whole JVM (in particular, VM intrinsics and the C2 auto-vectorizer).

Best regards,
Vladimir Ivanov

[1]
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L1005

> -----Original Message-----
> From: panama-dev <panama-dev-retn at openjdk.org> On Behalf Of Paul 
> Sandoz
> Sent: Thursday, June 27, 2024 7:59 AM
> To: John Rose <john.r.rose at oracle.com>
> Cc: Uwe Schindler <uschindler at apache.org>; panama-dev at openjdk.org
> Subject: Re: Intel AMX and feature detection
> 
> 
> 
>> On Jun 26, 2024, at 3:55 PM, John Rose <john.r.rose at oracle.com> wrote:
>>
>> Actually we have VectorShape.S_64_BIT which, if it is the preferred 
>> shape, is really telling you the “vector” processing is inside the 
>> CPU not the VPU.
>> That’s a good-enough hint to avoid the Vector API, right?
> 
> Yes, I think that is a good proxy for lack of any compiler support.
> 
> I would like to hear opinions from the Intel folks on “down bit’ing” from 512 to 256 on AVX-512 without VBMI2. It seems pragmatic. What about the auto-vectorizer? We also have HotSpot flags to use AVX2 on an AVX-512 machine as a workaround. If we do this, perhaps we need a flag to override the “down bit’ing”.
> 
> I will log issues for both.
> 
> Paul.