[vectorIntrinsics] RFR: Add utf8 decoding benchmarks

Wed Nov 25 07:58:01 UTC 2020

On Tue, 24 Nov 2020 00:46:37 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

> Following discussions on the mailing list, I'm submitting three benchmarks around UTF-8 decoding:
>  - decode: uses a while-loop based implementation currently in use in the JDK
>  - decodeVector: uses a lookup table with vector operations for 1- to 3-byte characters
>  - decodeVectorASCII: uses a simple vector operation to accelerate parsing ASCII-only characters
> 
> We don't observe the expected speedups with either decodeVector or decodeVectorASCII, so these are, I think, good test cases to further develop the Vector API.

Yes, let's figure out how to make LUT-based algorithms really "sing".

I wonder if there's any benefit to intrinsifying some or all of the steps between deriving a syndrome number and applying the corresponding selected shuffle(s).  In this example the steps are:  Do a `compare`, convert the comparison to a scalar bit mask (syndrome number), use it as a `get` key on a Java object, make some more indirections, grab a shuffle vector, and finally use it to steer the original data.  There are also bits of control flow interspersed.  That's a lot of stuff for the JIT to "see through".
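
To make the sequence of steps concrete, here is a minimal scalar model of the compare, syndrome, table lookup, shuffle pipeline.  It is only a sketch and not the benchmark's code: the class name, the 8-lane width, the "high bit set" comparison, and the table contents (which just move ASCII lanes to the front) are all assumptions chosen for illustration.  In the Vector API the corresponding operations would be roughly a lanewise `compare`, `VectorMask.toLong()`, and `rearrange` with a `VectorShuffle`; the plain loops below only stand in for them.

```java
import java.util.Arrays;

public class SyndromeShuffleSketch {
    // Hypothetical lane count; a real vector species would determine this.
    static final int LANES = 8;

    // Steps 1 and 2: compare every lane, then pack the per-lane results
    // into a scalar bit mask (the "syndrome number").
    static int syndrome(byte[] lanes) {
        int mask = 0;
        for (int i = 0; i < LANES; i++) {
            if (lanes[i] < 0) {          // high bit set: not an ASCII byte
                mask |= 1 << i;
            }
        }
        return mask;
    }

    // Step 3: a lookup table keyed by syndrome. Each entry is a shuffle,
    // i.e. for every output lane, the index of the input lane to read.
    // Illustrative contents: ASCII lanes first, then the non-ASCII lanes.
    static final int[][] SHUFFLES = new int[1 << LANES][];
    static {
        for (int s = 0; s < SHUFFLES.length; s++) {
            int[] idx = new int[LANES];
            int out = 0;
            for (int i = 0; i < LANES; i++) {
                if ((s & (1 << i)) == 0) idx[out++] = i;
            }
            for (int i = 0; i < LANES; i++) {
                if ((s & (1 << i)) != 0) idx[out++] = i;
            }
            SHUFFLES[s] = idx;
        }
    }

    // Step 4: apply the selected shuffle to steer the original data.
    static byte[] rearrange(byte[] lanes, int[] shuffle) {
        byte[] out = new byte[LANES];
        for (int i = 0; i < LANES; i++) {
            out[i] = lanes[shuffle[i]];
        }
        return out;
    }

    // The whole chain the JIT has to "see through" in one place.
    static byte[] steer(byte[] lanes) {
        return rearrange(lanes, SHUFFLES[syndrome(lanes)]);
    }

    public static void main(String[] args) {
        // 'a', the two UTF-8 bytes of U+00E9, 'b', then zero padding.
        byte[] in = { 'a', (byte) 0xC3, (byte) 0xA9, 'b', 0, 0, 0, 0 };
        System.out.println("syndrome = " + syndrome(in));
        System.out.println(Arrays.toString(steer(in)));
    }
}
```

Even in this toy form, the table load sits between the mask and the shuffle as an ordinary data-dependent array access, which is the part that is hard to optimize as a unit.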

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/26