Usage of the Vector API in java.base

Fri Nov 13 12:54:18 UTC 2020

It's better with the link to the gist.

[2] https://gist.github.com/luhenry/dc84c661997d5bc1cd51c2088d234727

________________________________________
From: Ludovic Henry
Sent: Friday, November 13, 2020 13:52
To: Paul Sandoz
Cc: panama-dev at openjdk.java.net; openjdk-aarch64
Subject: RE: Usage of the Vector API in java.base

Hi,

> Thanks for the update. I think charset encoding/decoding is a good place to focus efforts. See also Daniel Lemire’s work [1], and in fact other areas he has worked on for a good source of inspiration.

I didn't find that in my research so I'll definitely take a look at it.

> It's difficult for me to comment on the specifics of lack of char support without looking more deeply at your prototype. However, I suspect one feature we are missing is the ability to load/store a ShortVector from/to a char[] array i.e. treat the signed lane element values as if they are unsigned for the purpose of the algorithm.
>
> Within the JDK itself java.lang.String stores characters in byte[], which might help avoid some of the API restrictions. There are also a bunch of intrinsics to support operating on the characters in byte[]. I suspect we could replace them with loops using the Vector API.
>
> I would like to avoid adding Vector<Character> (or CharVector). I suspect we can lean on ShortVector for now, and use short values as the carrier for the 16 bits. Then later if possible with Valhalla consider unsigned integral types. Then deal with conversion of values between such integral types and char.

You're absolutely right that the need for a CharVector could be met by a ShortVector that can load/store from/to a char[]. The reason to operate on short/char values is to process 2 bytes at a time. The algorithm I've looked at for a prototype [1] doesn't support 3-byte UTF-8 characters, but those are rare enough that we can fall back to the "slow path", which is the current implementation.
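
To illustrate the "short as a carrier for 16 bits" idea in scalar form, here is a minimal sketch. It does not use the Vector API and is not taken from the gist; the class and method names (ShortCarrierSketch, toCarrier, fromCarrier, fitsFastPath, fitsFastPathSigned) are hypothetical. It assumes the fast path only handles code units below 0x800 (those that encode to at most 2 UTF-8 bytes), and shows why the lanes have to be compared as unsigned values.

```java
public class ShortCarrierSketch {

    // Reinterpret each char as a short: the 16 bits are unchanged,
    // only the signed/unsigned view differs.
    static short[] toCarrier(char[] chars) {
        short[] lanes = new short[chars.length];
        for (int i = 0; i < chars.length; i++) {
            lanes[i] = (short) chars[i];
        }
        return lanes;
    }

    // Reverse reinterpretation, short back to char.
    static char[] fromCarrier(short[] lanes) {
        char[] chars = new char[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            chars[i] = (char) lanes[i];
        }
        return chars;
    }

    // Assumed fast-path condition: every code unit is below 0x800,
    // i.e. it encodes to 1 or 2 UTF-8 bytes. The lane is widened
    // as an unsigned value before comparing.
    static boolean fitsFastPath(short[] lanes) {
        for (short lane : lanes) {
            if (Short.toUnsignedInt(lane) >= 0x800) {
                return false; // caller would fall back to the slow path
            }
        }
        return true;
    }

    // Same check with a signed comparison, shown only for contrast:
    // code units >= 0x8000 look negative and wrongly pass.
    static boolean fitsFastPathSigned(short[] lanes) {
        for (short lane : lanes) {
            if (lane >= 0x800) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, a char such as U+8000 round-trips through short unchanged, is rejected by fitsFastPath, but is accepted by the signed variant. That is the kind of mistake a vectorized version working on short lanes would have to avoid.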

I've uploaded an implementation of the algorithm at [2]. It's incomplete because I unfortunately lost my progress from some weeks back when I changed computers as part of a move. I'm slowly but surely getting it back to where I was. I haven't yet reached a stage where I can test the algorithm itself, but it compiles. I'll keep updating this gist with my latest changes, and I'll ping this mailing list as soon as I have something more substantial.

Ludovic

[1] https://dirtyhandscoding.github.io/posts/utf8lut-vectorized-utf-8-converter-introduction.html