<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div>.NET already offers this sort of functionality: Arm64.IsSupported, Avx512Vbmi.IsSupported, Avx2.IsSupported, and so on. We know the JVM has to have this information internally, so exposing it ought to be feasible.<br></div><div><br></div><div>Lucene is obviously a highly sophisticated project and its maintainers are willing to do the hard work... but for smaller projects, or for teams without that level of low-level expertise, it needs to be easy to decide at runtime whether to activate the Vector API at all.<br></div><div><br></div>
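<div>To make this concrete, here is a purely hypothetical sketch of what such a feature-detection API could look like on the Java side. None of these types or methods exist in the JDK today; all names are invented for illustration only:<br></div><div><br></div><pre>
// Purely hypothetical sketch, in the spirit of .NET's Arm64.IsSupported /
// Avx2.IsSupported. None of these types or methods exist in the JDK today;
// every name here is invented for illustration only.
public interface VectorRuntimeInfo {

  // Will Vector API call sites actually be compiled to SIMD code
  // (C2, or a Graal build with the intrinsics), or fall back to bytecode?
  boolean isIntrinsified();

  // The vector width, in bits, the JIT will really use, as opposed to
  // the default species that is reported when C2 is disabled.
  int effectiveMaxVectorBitSize();

  // Per-feature queries, comparable to the .NET IsSupported properties.
  boolean supportsMaskedOperations();
  boolean supportsFusedMultiplyAdd();
  boolean supportsIsaExtension(String name);  // e.g. "AVX2", "AVX-512F", "SVE", "AMX"
}
</pre><div>Even just the first two methods would let smaller projects guard their Vector API call sites with a single check instead of guessing from VM flags.<br></div><div><br></div>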
<blockquote type="cite" id="qt" style=""><p>Hi,<br></p><p>I fully agree with the second point. The Vector API requires some kind of feature detection; otherwise it is impossible to use it without risking a dramatic slowdown (40x with Graal, or with C1 only). In Apache Lucene we have support for the Vector API, but we decide which of the algorithms in Apache Lucene are delegated to the Panama vectorized implementation based on best guesses made by parsing HotSpot command-line flags via the HotSpot MXBeans.<br></p><p>In addition, the FFM API is also damn slow once you enable Graal or disable C2 (e.g., on a client VM). So our code has become a real spaghetti mess just to detect whether it is worthwhile to switch to the vectorized implementations using Panama Vector.<br></p>
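<p>To give an idea, here is a heavily simplified sketch of such a "best guess" via the HotSpot diagnostic MXBean. This is not the actual Lucene code (see the links below); which flags to inspect, and the thresholds, are assumptions for illustration only:<br></p><pre>
// Heavily simplified sketch of a "best guess" check, not the actual Lucene
// code. Which flags to inspect, and the thresholds, are assumptions here.
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

final class PanamaGuess {

  static boolean looksSafeToUsePanamaVectors() {
    try {
      HotSpotDiagnosticMXBean hotspot =
          ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
      if (hotspot == null) {
        return false;                     // not a HotSpot-style VM at all
      }
      // C2 produces tier-4 code; if compilation stops earlier (e.g. a
      // "client" configuration), Vector API calls stay slow bytecode.
      int tier = Integer.parseInt(
          hotspot.getVMOption("TieredStopAtLevel").getValue());
      if (tier != 4) {
        return false;
      }
      // MaxVectorSize is in bytes; 16 bytes (128 bits) is the smallest
      // width for which vectorization pays off at all.
      int maxVectorBytes = Integer.parseInt(
          hotspot.getVMOption("MaxVectorSize").getValue());
      return maxVectorBytes >= 16;
    } catch (RuntimeException e) {
      // Unknown flag, non-HotSpot VM, unparsable value, ... -> stay safe.
      return false;
    }
  }
}
</pre>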
<p>I am planning to submit a feature request about this. It would be good to get at least the actual maximum vector bit size and which of the vector operators are supported (masks, FMA, ...). Another problem is that, if C2 is disabled, the API just returns default values for the maximum vector size/species.<br></p>
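<p>This is about all that application code can query from the incubating API today (small self-contained example; run with --add-modules jdk.incubator.vector). When C2 is disabled these calls still succeed, but they silently report the default species, so you cannot tell real hardware capabilities from fallback values:<br></p><pre>
// About all the incubating API exposes about the platform. With C2
// disabled these calls still succeed but report the default species,
// so the values cannot be trusted as "real" hardware capabilities.
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorShape;

public class ReportVectorDefaults {
  public static void main(String[] args) {
    System.out.println("preferred shape: " + VectorShape.preferredShape());
    System.out.println("preferred bits:  " + FloatVector.SPECIES_PREFERRED.vectorBitSize());
    System.out.println("max shape bits:  " + VectorShape.S_Max_BIT.vectorBitSize());
    // There is no way to ask "are masks or FMA supported in hardware?"
    // or "will any of this be intrinsified at all?"
  }
}
</pre>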
<p>Have a look at these code disasters:<br></p><ul><li><a class="qt-moz-txt-link-freetext" href="https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java#L103-L139">https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java#L103-L139</a> (the worst one: it parses HotSpot flags and disables features by inspecting system properties)<br></li><li><a class="qt-moz-txt-link-freetext" href="https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73">https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73</a> (this one is mostly OK)<br></li><li><a class="qt-moz-txt-link-freetext" href="https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73">https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73</a> (here we have to use different implementations depending on the vector bit size of the default species, ...)<br></li></ul><p>Some of that code cannot be replaced by a feature detection API anyway: for example, we avoid Panama vectors with FMA on Apple Silicon and avoid AVX-512 on some Intel/AMD silicon. I am no longer sure what the exact problem was - slowness in some combinations, for sure.<br></p>
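<p>For illustration only, this is the kind of hand-maintained opt-out it leads to. It is not the actual Lucene logic; the conditions and the knownSlowAvx512Cpu parameter are made-up placeholders:<br></p><pre>
// Illustration of hand-maintained hardware opt-outs; not the actual Lucene
// logic. The conditions and the knownSlowAvx512Cpu flag are placeholders.
final class HardwareOptOuts {

  // FMA turned out to be slower on Apple Silicon in measurements, so the
  // vectorized code simply avoids it there.
  static boolean useFma() {
    boolean appleSilicon = System.getProperty("os.name", "").startsWith("Mac")
        && System.getProperty("os.arch", "").equals("aarch64");
    return !appleSilicon;
  }

  // On some CPUs 512-bit species were slower in some combinations, so the
  // species gets capped to 256 bits even where 512 bits is "preferred".
  // Detecting which CPU model you are on is exactly what no API helps with.
  static int capVectorBits(int preferredBits, boolean knownSlowAvx512Cpu) {
    if (knownSlowAvx512Cpu && preferredBits == 512) {
      return 256;
    }
    return preferredBits;
  }
}
</pre>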
<p>Uwe<br></p><div class="qt-moz-cite-prefix">On 17.06.2024 at 06:26, Andrii Lomakin wrote:<br></div><blockquote type="cite" cite="mid:CAEjdjO+BZ0F5pMSVL2LfNmhyLPJsc3RZ6z04Ki80ESPSTcFmkw@mail.gmail.com"><pre class="qt-moz-quote-pre">Hi guys.
I have three questions:
1. Do you plan to add support for Intel AMX instructions? According
to Intel's reports, they can provide a 2-3x speedup in deep learning
model inference.
2. The next question follows from the first one. Even now, masks are
not supported on every architecture, but AFAIK there is no way to
detect at runtime whether they are supported. Do you plan to provide a
so-called "feature detection" API?
3. And the last question: even in older instruction sets there are
some instructions that use register values as masks - blending, for
example. Will those instructions be supported on architectures that do
not have mask registers per se?
<br></pre></blockquote><pre class="qt-moz-signature" cols="72">--
Uwe Schindler
<a class="qt-moz-txt-link-abbreviated" href="mailto:uschindler@apache.org">uschindler@apache.org</a>
ASF Member, Member of PMC and Committer of Apache Lucene and Apache Solr
Bremen, Germany
<a class="qt-moz-txt-link-freetext" href="https://lucene.apache.org/">https://lucene.apache.org/</a>
<a class="qt-moz-txt-link-freetext" href="https://solr.apache.org/">https://solr.apache.org/</a><br></pre></blockquote><div><br></div></body></html>