Vectorizing POPCNT (Long.bitCount)
Stefan Reich
stefan.reich.maker.of.eye at googlemail.com
Sat Sep 3 00:59:54 UTC 2022
Hey tthere,
I'd like to parallelize this function:
static int countDifferingBits(long[] array1, long[] array2) {
int n = array1.length;
int diff = 0;
for (int i = 0; i < n; i++)
diff += Long.bitCount(array1[i]^array2[i]);
return diff;
}
The function calculates the difference between two shapes (pixel by pixel)
and is the central piece of an image recognition I am making.
Example for a bunch of shapes (these are all the shapes found in the
extended latin alphabet, in fact): https://botcompany.de/images/1103149
Example for a shape comparison: https://botcompany.de/images/1103151
The routine above is already mega-fast as is (thanks to HotSpot!), I can
compare two 256 byte arrays in 29 CPU cycles. That's less than one cycle
per 64 bit!
But I'd still like to try to vectorize it.
However, I think the bitCount function will prevent this because it doesn't
seem to exist in a vectorized form. Or does it? (That would be my main
question here.)
Many greetings,
Stefan
--
== Gaz.AI ==
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20220903/019e7ac8/attachment-0001.htm>
More information about the panama-dev
mailing list