UTF-8 validation using Vector API
Quân Anh Mai
anhmdq at gmail.com
Sun Feb 20 08:43:58 UTC 2022
Hi,
I have spent this weekend trying to make a UTF-8 validator using the Vector
API and the look-up algorithm proposed by Keiser and Lemire [1]. The
implementation is straightforward and the result looks good with the
throughput validating twitter.json reaching 9GB/s. For reference, the
throughput of my machine running simdjson [2] is around 20.5GB/s. Using the
Vector API and tunning the performance, there are some noticeable points I
can come up with:
1, The most significant anomaly is that the compiler seems to be not so
good at register management. Even after many attempts to reduce the number
of vectors living across iterations, there are still a lot of vectors being
spilled on the stack, this leads to a lot of load and store in a
latency-sensitive hot loop.
2, A rearrange operation is often used in look-up tables. However, the
current situation requires excessive range checks on every invocation.
Remove these range checks
(set jdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0) increases the
throughput by roughly 15%, resulting in more than 10GB/s. Furthermore,
currently, the wrapping is conducted upon the creations of the vector
shuffles. As a result, I propose adding a method rearrangeWithWrap that
takes an additional argument representing the wrapping modulus. If the
modulus is constant, it opens the ability to remove the costly range check
as well as more efficient rearrange implementations as rearrangement
requires O(n^2) connections and a wide rearrangement may be much more
costly than that on narrower vectors. For experiments, I replaced the
rearrangement with a cheaper operation in the lookup phase (the add
operation). This combined with the range check removal resulted in a nearly
40% increase in throughput which approached 13GB/s.
3, Slice operation should be improved. Given it takes several API points in
the Vector class, I believe we should implement it using a more efficient
manner than a general rearrange one.
4, Vectors loaded from constant vector objects are currently not hoisted
outside the loop, while easy to make ones (all zeros and all ones) are fine
creating on the fly.
The source code of my implementation can be found here [3], the files I
used for testing and benchmarking are taken from here [4].
Thanks a lot for your reading,
Quan Anh
[1]: [2010.03090] Validating UTF-8 In Less Than One Instruction Per Byte
(arxiv.org) <https://arxiv.org/abs/2010.03090>
[2]: simdjson/simdjson: Parsing gigabytes of JSON per second (github.com)
<https://github.com/simdjson/simdjson>
[3]: merykitty/utf8-validator: A high performance UTF-8 validator written
using Java Vector API (github.com)
<https://github.com/merykitty/utf8-validator>
[4]: AugustNagro/utf8.java: Vectorized UTF-8 Validation for Java
(github.com) <https://github.com/AugustNagro/utf8.java/>
More information about the panama-dev
mailing list