Vector API / SIMD optimizations possible?

Mon Aug 8 15:28:34 UTC 2022

Hello,

I've found a method that is called very often and accounts for roughly
20% of CPU time, according to YourKit, when importing a 3.8 GB JSON file
with DeweyID storage enabled (currently the default):

https://github.com/sirixdb/sirix/blob/05fcb0ea4f989bcc90028213d7b1ef66e3bb9f20/bundles/sirix-core/src/main/java/org/sirix/node/SirixDeweyID.java#L507

Within it, the time is spent in
https://github.com/sirixdb/sirix/blob/05fcb0ea4f989bcc90028213d7b1ef66e3bb9f20/bundles/sirix-core/src/main/java/org/sirix/node/SirixDeweyID.java#L455

Mainly in this loop:

// calculate the rest of the bits
int rest = divisionSize - prefix.length;
for (int i = 1; i <= rest; i++) {
  int k = 1;
  k = k << rest - i;
  if (suffix >= k) {
    suffix -= k;
    byteArray[bitIndex / 8] |= (int) Math.pow(2, 7 - (bitIndex % 8));
  }
  bitIndex++;
}
return bitIndex;
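A minimal sketch of the "other tricks" route (not the Vector API). It assumes 0 <= suffix < 2^rest, which the greedy subtraction appears to rely on but which has not been verified here against the BrackitDB encoding. Under that assumption the loop simply copies the low `rest` bits of `suffix` into `byteArray`, most significant bit first, starting at `bitIndex`. The class and method names are hypothetical wrappers, not SirixDeweyID API: `shiftLoop` drops `Math.pow` and the subtraction, and `bulkWrite` fills up to eight bits per iteration. Whether either is measurably faster would need a benchmark (for example with JMH).

```java
import java.util.Arrays;
import java.util.Random;

public class DeweyBitsSketch {

    // The quoted loop, wrapped in a hypothetical method so the variants can be compared.
    static int originalLoop(byte[] byteArray, int bitIndex, int suffix, int rest) {
        for (int i = 1; i <= rest; i++) {
            int k = 1 << (rest - i);
            if (suffix >= k) {
                suffix -= k;
                byteArray[bitIndex / 8] |= (int) Math.pow(2, 7 - (bitIndex % 8));
            }
            bitIndex++;
        }
        return bitIndex;
    }

    // Variant 1: test bit (rest - i) of suffix directly and set the target bit
    // with a shift instead of Math.pow. Assumes 0 <= suffix < (1 << rest), rest <= 30.
    static int shiftLoop(byte[] byteArray, int bitIndex, int suffix, int rest) {
        for (int shift = rest - 1; shift >= 0; shift--, bitIndex++) {
            if (((suffix >>> shift) & 1) != 0) {
                byteArray[bitIndex >>> 3] |= (byte) (0x80 >>> (bitIndex & 7));
            }
        }
        return bitIndex;
    }

    // Variant 2: per iteration, copy as many of the remaining bits as fit into the
    // free low-order bits of the current byte (at most 8). Same assumptions.
    static int bulkWrite(byte[] byteArray, int bitIndex, int suffix, int rest) {
        int remaining = rest;
        while (remaining > 0) {
            int free = 8 - (bitIndex & 7);          // unused bits left in this byte
            int n = Math.min(free, remaining);      // bits written in this step
            int chunk = (suffix >>> (remaining - n)) & ((1 << n) - 1);
            byteArray[bitIndex >>> 3] |= (byte) (chunk << (free - n));
            bitIndex += n;
            remaining -= n;
        }
        return bitIndex;
    }

    // Runs all three on random inputs (random prior byte contents, random start bit)
    // and reports whether they produce identical bytes and end indices.
    static boolean agree(long seed, int trials) {
        Random rnd = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int rest = rnd.nextInt(25);
            int suffix = rest == 0 ? 0 : rnd.nextInt(1 << rest);
            int start = rnd.nextInt(16);
            byte[] a = new byte[8];
            rnd.nextBytes(a);
            byte[] b = a.clone();
            byte[] c = a.clone();
            int ra = originalLoop(a, start, suffix, rest);
            int rb = shiftLoop(b, start, suffix, rest);
            int rc = bulkWrite(c, start, suffix, rest);
            if (ra != rb || ra != rc || !Arrays.equals(a, b) || !Arrays.equals(a, c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(42, 100_000) ? "all variants agree" : "mismatch");
    }
}
```

Since each call only touches a handful of bytes, there may be little for SIMD lanes to work on; cutting the per-bit work looks like the more likely win, though that is an untested guess.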

I do not fully understand the code (it comes from BrackitDB), which is my
main problem in the first place. However, I wondered whether it might be
possible to speed up the toBytes() method with the Vector API (or with
other tricks). I optionally store node labels for each node in a JSON
tree:
http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/users/mathis/documents/HHM_SBBD.pdf

These are labels of the form 1, 1.17, 1.17.1.32, ..., and each label also
encodes all of its ancestors. Furthermore, you can check whether one node
precedes another in preorder, determine their common ancestor, and so on,
which is immensely useful in some situations.
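To illustrate the label operations mentioned above, here is a minimal sketch that treats a DeweyID as a plain `int[]` of divisions. This is a simplifying assumption: it ignores the byte encoding and any special meaning the scheme may give to particular division values (for example, values reserved for later insertions between siblings), so `isAncestor` and `commonAncestor` may need adjusting for the real scheme. Under that simplification, preorder is lexicographic order, ancestors are proper prefixes, and the lowest common ancestor is the longest common prefix. `DeweyLabelSketch` and its methods are hypothetical; `Arrays.compare` and `Arrays.mismatch` require Java 9 or later.

```java
import java.util.Arrays;

public class DeweyLabelSketch {

    // Parses a dotted label such as "1.17.1.32" into {1, 17, 1, 32}.
    static int[] parse(String label) {
        return Arrays.stream(label.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Negative if a precedes b in preorder; a prefix (ancestor) sorts before its descendants.
    static int comparePreorder(int[] a, int[] b) {
        return Arrays.compare(a, b);
    }

    // True if a is a proper ancestor of b, i.e. a is a proper prefix of b.
    static boolean isAncestor(int[] a, int[] b) {
        return a.length < b.length && Arrays.mismatch(a, b) == a.length;
    }

    // Lowest common ancestor (or the node itself if one label is a prefix of the other):
    // the longest common prefix of both labels.
    static int[] commonAncestor(int[] a, int[] b) {
        int m = Arrays.mismatch(a, b);                // -1 means the labels are equal
        return m < 0 ? a.clone() : Arrays.copyOf(a, m);
    }

    public static void main(String[] args) {
        int[] x = parse("1.17.1.32");
        int[] y = parse("1.17.3");
        System.out.println(comparePreorder(x, y) < 0);             // true
        System.out.println(Arrays.toString(commonAncestor(x, y))); // [1, 17]
        System.out.println(isAncestor(parse("1.17"), x));          // true
    }
}
```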

Kind regards
Johannes