Vector API : Lack of support for vectorized lookup tables

Daniel Lemire daniel at lemire.me
Wed Jun 19 16:17:46 UTC 2024


When parsing strings with SIMD instructions, vectorized table lookup like vtbl (ARM NEON) are important. They are cheap (often run in 1 cycle) and powerful.


Though the details depend on the exact instruction set, the general idea is that you provide a 16-byte table, and a vector with indexes. If the indexes are in the range [0,16), then the byte is retrieved from the 16-byte table. When programming in C#, you can call them directly: e.g., Ssse3.Shuffle or AdvSimd.Arm64.VectorTableLookup. Google relies on Highway (a C++ framework) which offers TableLookupBytes for this purpose.

You can use these instructions to validate and transcode Unicode, to parse DNS records, faster regular expression parsers, base64 codecs, cryptographic hash functions and so forth. The applications are almost endless (see reference at the end). These instructions are effectively ubiquitous in 2024. On x64 systems, SSSE3 and its pshufb instruction (equivalent to ARM's vtbl) are increasingly assumed as a requirement (Windows 11, RedHat, etc.). For example, it is used in Chromium (arguably the most important Web engine) to parse HTML quickly:
https://chromium-review.googlesource.com/c/chromium/src/+/5538407

The .NET runtime uses these instructions, calling them from C#: E.g., see their base64 encoder... (System/Buffers/Text/Base64Decoder.cs) In C#, we see fast tokenizers and parsers making use of these instructions.

Unfortunately, the Vector API in Java has no equivalent.

It may seem lie rearrange and selectFrom are related, but these are not vectorized lookups. And once compiled, they generate a long flow of instructions. It provides the same functionality, but without the performance. And, of course, the performance is the whole point of using something like the Vector API.

Overall, this lack of access to an important functionality simply cuts off important algorithmic optimizations from Java.



- Daniel


---
References:


Transcoding Billions of Unicode Characters per Second with SIMD Instructions
Software: Practice and Experience 52 (2), 2022
https://arxiv.org/abs/2109.10433

Validating UTF-8 In Less Than One Instruction Per Byte
Software: Practice and Experience 51 (5), 2021
https://arxiv.org/abs/2010.03090

Faster Base64 Encoding and Decoding using AVX2 Instructions
ACM Transactions on the Web 12 (3), 2018
https://arxiv.org/abs/1704.00605

Parsing Gigabytes of JSON per Second
VLDB Journal 28 (6), 2019
https://arxiv.org/abs/1902.08318



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20240619/3e594bfe/attachment-0001.htm>


More information about the panama-dev mailing list