Intel AMX and feature detection

Mon Jun 24 13:48:32 UTC 2024

.NET offers this sort of functionality: Arm64.IsSupported, Avx512Vbmi.IsSupported, Avx2.IsSupported, etc. We know that the JVM has to have this information, so exposing it ought to be feasible.

Lucene is obviously highly sophisticated and its maintainers are willing to do hard work... but for smaller projects, or for projects with people who are not as sophisticated... you need to make it easy to selectively activate the Vector API.
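
To make the workaround described in the quoted message concrete, here is a minimal sketch of flag-based guessing. It is not Lucene's code and not a JDK feature-detection API. The class name, method names and thresholds are my own assumptions. The only things it relies on are the HotSpot flags TieredStopAtLevel, UseJVMCICompiler and MaxVectorSize, read through HotSpotDiagnosticMXBean.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

// Hypothetical helper for illustration only; names and thresholds are assumptions.
final class VectorHeuristics {

    /** Returns the value of a HotSpot VM flag, or empty if it cannot be read. */
    static Optional<String> vmFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(bean.getVMOption(name).getValue());
        } catch (RuntimeException | LinkageError e) {
            // Unknown flag, non-HotSpot JVM, or management classes not present.
            return Optional.empty();
        }
    }

    /** Reads an integer flag, falling back when it is missing or not a number. */
    static int intFlag(String name, int fallback) {
        try {
            return vmFlag(name).map(Integer::parseInt).orElse(fallback);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /**
     * Pure decision function, kept separate from flag lookup so it can be tested.
     * Assumed policy: require C2 (tier 4), no JVMCI compiler such as Graal,
     * and vectors of at least 16 bytes (128 bits).
     */
    static boolean shouldUseVectorApi(int tieredStopAtLevel, boolean jvmciCompiler,
                                      int maxVectorBytes) {
        boolean c2Enabled = tieredStopAtLevel >= 4;
        boolean wideEnough = maxVectorBytes >= 16;
        return c2Enabled && !jvmciCompiler && wideEnough;
    }

    public static void main(String[] args) {
        // Conservative fallbacks: if a flag is unreadable, assume the slow case.
        int level = intFlag("TieredStopAtLevel", 0);
        boolean jvmci = vmFlag("UseJVMCICompiler").map(Boolean::parseBoolean).orElse(false);
        int maxVectorBytes = intFlag("MaxVectorSize", 0);
        System.out.println("use Vector API: " + shouldUseVectorApi(level, jvmci, maxVectorBytes));
    }
}
```

Every project that wants to use the Vector API safely has to carry something like this today, and it still cannot say whether masks or FMA are fast on the current CPU. That is the part a real feature detection API would replace.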

> Hi,
> 
> I fully agree with the 2nd point. The Vector API requires some feature detection; otherwise it is impossible to use it without risking a dramatic slowdown (40x with Graal or C1 only). In Apache Lucene we support the Vector API, but we decide which of our algorithms are delegated to the Panama vectorized implementation based on best guesses, made by parsing the HotSpot command-line flags exposed through the HotSpot MXBean.
> 
> In addition, the FFM API is also damn slow once you enable Graal or disable C2 (e.g., client VM). So our code is a real spaghetti-code mess to detect if it is useful to switch to vectorized impls using Panama-Vector.
> 
> I am planning to submit a feature request about this. It would be good to get at least the actual maximum bit size and which of the vector operators are supported (like masks, FMA, ...). Another problem is that if C2 is disabled, the code returns default values for the maximum vector size/species.
> 
> Have a look at these code disasters:
> 
>  • https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java#L103-L139 (worst, it parses Hotspot flags and disables by inspecting system properties)
>  • https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73 (this is mostly OK)
>  • https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73 (here we have to use different implementations depending on the vector bit size of the default species, ...)
> 
> Some of that code can't be avoided even with a feature detection API. For example, we avoid Panama vectors with FMA on Apple Silicon and avoid AVX-512 on some Intel/AMD silicon. I am not sure what exactly the problem was, but it was certainly slowness in some combinations.
> 
> Uwe
> 
> Am 17.06.2024 um 06:26 schrieb Andrii Lomakin:
>> Hi guys.
>> 
>> I have three questions:
>> 
>> 1. Do you plan to add support for Intel AMX instructions? According
>> to Intel reports, it can add 2-3 times speedup in deep learning model
>> inference.
>> 2. The next question follows from the first one. Even now, masks are
>> not supported in every architecture, but AFAIK, there is no way to
>> detect whether they are supported at runtime. Do you plan to provide a
>> so-called "feature detection" API?
>> 3. And the last question: even on older sets of commands, there are
>> some that use register values as masks, blending, for example. Will
>> those instructions be supported on architectures that do not support
>> masking registers per se?
>> 
> -- 
> Uwe Schindler
> uschindler at apache.org 
> ASF Member, Member of PMC and Committer of Apache Lucene and Apache Solr
> Bremen, Germany
> https://lucene.apache.org/
> https://solr.apache.org/