<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div>.NET already offers this sort of functionality: Arm64.IsSupported, Avx512Vbmi.IsSupported, Avx2.IsSupported, and so on. We know the JVM has to have this information internally, so exposing it ought to be feasible.<br></div><div><br></div><div>Lucene is obviously a highly sophisticated project and its maintainers are willing to do the hard work... but for smaller projects, or for teams without that level of low-level expertise, it needs to be easy to decide at runtime whether to activate the Vector API at all.<br></div><div><br></div>
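<div>To make this concrete, here is a purely hypothetical sketch of what such a feature-detection API could look like on the Java side. None of these types or methods exist in the JDK today; all names are invented for illustration only:<br></div><div><br></div><pre>
// Purely hypothetical sketch, in the spirit of .NET's Arm64.IsSupported /
// Avx2.IsSupported. None of these types or methods exist in the JDK today;
// every name here is invented for illustration only.
public interface VectorRuntimeInfo {

  // Will Vector API call sites actually be compiled to SIMD code
  // (C2, or a Graal build with the intrinsics), or fall back to bytecode?
  boolean isIntrinsified();

  // The vector width, in bits, the JIT will really use, as opposed to
  // the default species that is reported when C2 is disabled.
  int effectiveMaxVectorBitSize();

  // Per-feature queries, comparable to the .NET IsSupported properties.
  boolean supportsMaskedOperations();
  boolean supportsFusedMultiplyAdd();
  boolean supportsIsaExtension(String name);  // e.g. "AVX2", "AVX-512F", "SVE", "AMX"
}
</pre><div>Even just the first two methods would let smaller projects guard their Vector API call sites with a single check instead of guessing from VM flags.<br></div><div><br></div>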
<blockquote type="cite" id="qt" style=""><p>Hi,<br></p><p>I fully agree with the second point. The Vector API requires some kind of feature detection; otherwise it is impossible to use it without risking a dramatic slowdown (40x with Graal, or with C1 only). In Apache Lucene we have support for the Vector API, but we decide which of the algorithms in Apache Lucene are delegated to the Panama vectorized implementation based on best guesses made by parsing HotSpot command-line flags via the HotSpot MXBeans.<br></p><p>In addition, the FFM API is also damn slow once you enable Graal or disable C2 (e.g., on a client VM). So our code has become a real spaghetti mess just to detect whether it is worthwhile to switch to the vectorized implementations using Panama Vector.<br></p>
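<p>To give an idea, here is a heavily simplified sketch of such a "best guess" via the HotSpot diagnostic MXBean. This is not the actual Lucene code (see the links below); which flags to inspect, and the thresholds, are assumptions for illustration only:<br></p><pre>
// Heavily simplified sketch of a "best guess" check, not the actual Lucene
// code. Which flags to inspect, and the thresholds, are assumptions here.
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

final class PanamaGuess {

  static boolean looksSafeToUsePanamaVectors() {
    try {
      HotSpotDiagnosticMXBean hotspot =
          ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
      if (hotspot == null) {
        return false;                     // not a HotSpot-style VM at all
      }
      // C2 produces tier-4 code; if compilation stops earlier (e.g. a
      // "client" configuration), Vector API calls stay slow bytecode.
      int tier = Integer.parseInt(
          hotspot.getVMOption("TieredStopAtLevel").getValue());
      if (tier != 4) {
        return false;
      }
      // MaxVectorSize is in bytes; 16 bytes (128 bits) is the smallest
      // width for which vectorization pays off at all.
      int maxVectorBytes = Integer.parseInt(
          hotspot.getVMOption("MaxVectorSize").getValue());
      return maxVectorBytes >= 16;
    } catch (RuntimeException e) {
      // Unknown flag, non-HotSpot VM, unparsable value, ... -> stay safe.
      return false;
    }
  }
}
</pre>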
<p>I am planning to submit a feature request about this. It would be good to get at least the actual maximum vector bit size and which of the vector operators are supported (masks, FMA, ...). Another problem is that, if C2 is disabled, the API just returns default values for the maximum vector size/species.<br></p>
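<p>This is about all that application code can query from the incubating API today (small self-contained example; run with --add-modules jdk.incubator.vector). When C2 is disabled these calls still succeed, but they silently report the default species, so you cannot tell real hardware capabilities from fallback values:<br></p><pre>
// About all the incubating API exposes about the platform. With C2
// disabled these calls still succeed but report the default species,
// so the values cannot be trusted as "real" hardware capabilities.
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorShape;

public class ReportVectorDefaults {
  public static void main(String[] args) {
    System.out.println("preferred shape: " + VectorShape.preferredShape());
    System.out.println("preferred bits:  " + FloatVector.SPECIES_PREFERRED.vectorBitSize());
    System.out.println("max shape bits:  " + VectorShape.S_Max_BIT.vectorBitSize());
    // There is no way to ask "are masks or FMA supported in hardware?"
    // or "will any of this be intrinsified at all?"
  }
}
</pre>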
<p>Have a look at these code disasters:<br></p><ul><li><a class="qt-moz-txt-link-freetext" href="https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java#L103-L139">https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java#L103-L139</a> (the worst one: it parses HotSpot flags and disables features by inspecting system properties)<br></li><li><a class="qt-moz-txt-link-freetext" href="https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73">https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73</a> (this one is mostly OK)<br></li><li><a class="qt-moz-txt-link-freetext" href="https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73">https://github.com/apache/lucene/blob/3ae59a9809d9239593aa94dcc23f8ce382d59e60/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java#L40-L73</a> (here we have to use different implementations depending on the vector bit size of the default species, ...)<br></li></ul><p>Some of that code cannot be replaced by a feature detection API anyway: for example, we avoid Panama vectors with FMA on Apple Silicon and avoid AVX-512 on some Intel/AMD silicon. I am no longer sure what the exact problem was - slowness in some combinations, for sure.<br></p>
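<p>For illustration only, this is the kind of hand-maintained opt-out it leads to. It is not the actual Lucene logic; the conditions and the knownSlowAvx512Cpu parameter are made-up placeholders:<br></p><pre>
// Illustration of hand-maintained hardware opt-outs; not the actual Lucene
// logic. The conditions and the knownSlowAvx512Cpu flag are placeholders.
final class HardwareOptOuts {

  // FMA turned out to be slower on Apple Silicon in measurements, so the
  // vectorized code simply avoids it there.
  static boolean useFma() {
    boolean appleSilicon = System.getProperty("os.name", "").startsWith("Mac")
        && System.getProperty("os.arch", "").equals("aarch64");
    return !appleSilicon;
  }

  // On some CPUs 512-bit species were slower in some combinations, so the
  // species gets capped to 256 bits even where 512 bits is "preferred".
  // Detecting which CPU model you are on is exactly what no API helps with.
  static int capVectorBits(int preferredBits, boolean knownSlowAvx512Cpu) {
    if (knownSlowAvx512Cpu && preferredBits == 512) {
      return 256;
    }
    return preferredBits;
  }
}
</pre>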
<p>Uwe<br></p><div class="qt-moz-cite-prefix">On 17.06.2024 at 06:26, Andrii Lomakin wrote:<br></div><blockquote type="cite" cite="mid:CAEjdjO+BZ0F5pMSVL2LfNmhyLPJsc3RZ6z04Ki80ESPSTcFmkw@mail.gmail.com"><pre class="qt-moz-quote-pre">Hi guys.
I have three questions:
1. Do you plan to add support for Intel AMX instructions? According
to Intel's reports, they can provide a 2-3x speedup in deep learning
model inference.
2. The next question follows from the first one. Even now, masks are
not supported on every architecture, but AFAIK there is no way to
detect at runtime whether they are supported. Do you plan to provide a
so-called "feature detection" API?
3. And the last question: even in older instruction sets there are
some instructions that use register values as masks - blending, for
example. Will those instructions be supported on architectures that do
not have mask registers per se?
<br></pre></blockquote><pre class="qt-moz-signature" cols="72">--
Uwe Schindler
<a class="qt-moz-txt-link-abbreviated" href="mailto:uschindler@apache.org">uschindler@apache.org</a>
ASF Member, Member of PMC and Committer of Apache Lucene and Apache Solr
Bremen, Germany
<a class="qt-moz-txt-link-freetext" href="https://lucene.apache.org/">https://lucene.apache.org/</a>
<a class="qt-moz-txt-link-freetext" href="https://solr.apache.org/">https://solr.apache.org/</a><br></pre></blockquote><div><br></div></body></html>