Vectorizing POPCNT (Long.bitCount)

Stefan Reich stefan.reich.maker.of.eye at googlemail.com
Sat Sep 3 00:59:54 UTC 2022


Hey tthere,

I'd like to parallelize this function:

static int countDifferingBits(long[] array1, long[] array2) {
  int n = array1.length;
  int diff = 0;
  for (int i = 0; i < n; i++)
    diff += Long.bitCount(array1[i]^array2[i]);
  return diff;
}

The function calculates the difference between two shapes (pixel by pixel)
and is the central piece of an image recognition I am making.

Example for a bunch of shapes (these are all the shapes found in the
extended latin alphabet, in fact): https://botcompany.de/images/1103149
Example for a shape comparison: https://botcompany.de/images/1103151

The routine above is already mega-fast as is (thanks to HotSpot!), I can
compare two 256 byte arrays in 29 CPU cycles. That's less than one cycle
per 64 bit!

But I'd still like to try to vectorize it.

However, I think the bitCount function will prevent this because it doesn't
seem to exist in a vectorized form. Or does it? (That would be my main
question here.)

Many greetings,
Stefan

-- 
== Gaz.AI ==
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20220903/019e7ac8/attachment-0001.htm>


More information about the panama-dev mailing list