Usage of the Vector API in java.base

Paul Sandoz paul.sandoz at oracle.com
Thu Nov 5 16:22:41 UTC 2020


Hi,

Thanks for the update. I think charset encoding/decoding is a good place to focus efforts. See also Daniel Lemire’s work [1]; the other areas he has worked on are also a good source of inspiration.

It's difficult for me to comment on the specifics of the lack of char support without looking more deeply at your prototype. However, I suspect one feature we are missing is the ability to load/store a ShortVector from/to a char[] array, i.e., to treat the signed lane element values as if they were unsigned for the purpose of the algorithm.
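As a minimal sketch of that idea, the following plain scalar Java (not the Vector API) copies a char[] into a short[] and back without changing any bits. The class and method names are hypothetical and only illustrate that a signed short lane can hold any 16-bit code unit; a vectorized load/store would presumably do the same reinterpretation a whole vector at a time.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class CharShortCarrier {
    // "Load": reinterpret each 16-bit char as a short. The bit pattern is
    // unchanged, but code units >= 0x8000 read back as negative shorts.
    static short[] toShorts(char[] src) {
        short[] dst = new short[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
        return dst;
    }

    // "Store": the cast to char keeps the low 16 bits, so the original
    // code units are recovered exactly.
    static char[] toChars(short[] src) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        char[] text = {'A', '\u00e9', '\u20ac', '\uffff'};
        short[] lanes = toShorts(text);
        System.out.println(Arrays.toString(lanes));
        System.out.println(Arrays.equals(text, toChars(lanes)));
    }
}
```

The round trip is lossless; only the interpretation of the lane values (signed versus unsigned) differs.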

Within the JDK itself java.lang.String stores characters in byte[], which might help avoid some of the API restrictions. There are also a bunch of intrinsics to support operating on the characters in byte[]. I suspect we could replace them with loops using the Vector API.
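To make the kind of byte[] loop concrete, here is a scalar sketch of one such operation: checking whether any byte in a range has its high bit set, which for single-byte-per-character content means "not pure ASCII". The class ByteScan and its method are hypothetical stand-ins rather than the JDK's own code, and the loop is plain Java; a Vector API version would presumably test a vector's worth of bytes per iteration and finish with a scalar tail like this one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class ByteScan {
    // Returns true if any byte in [off, off + len) is negative,
    // i.e. has its high bit set and is therefore outside ASCII.
    static boolean hasNegatives(byte[] bytes, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (bytes[i] < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.ISO_8859_1);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(hasNegatives(ascii, 0, ascii.length));
        System.out.println(hasNegatives(latin1, 0, latin1.length));
    }
}
```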

I would like to avoid adding Vector<Character> (or CharVector). I suspect we can lean on ShortVector for now and use short values as the carrier for the 16 bits. Later, if possible with Valhalla, we could consider unsigned integral types and then deal with the conversion of values between such integral types and char.
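One practical consequence of using short as the carrier is that comparisons have to be written with unsigned semantics in mind. The sketch below (plain scalar Java; the class and method names are hypothetical) shows a range check that gives the wrong answer when the lane is compared as a signed value and the right answer after widening it with Short.toUnsignedInt.

```java
// Hypothetical helper, for illustration only.
final class UnsignedLanes {
    // Incorrect for code units >= 0x8000: as signed shorts they are
    // negative, so they wrongly pass a "< 0x80" test.
    static boolean isAsciiSigned(short lane) {
        return lane < 0x80;
    }

    // Correct: widen to the unsigned 16-bit value before comparing.
    static boolean isAscii(short lane) {
        return Short.toUnsignedInt(lane) < 0x80;
    }

    public static void main(String[] args) {
        short ascii = (short) 'A';
        short hangul = (short) '\uac00'; // 0xAC00, negative as a signed short
        System.out.println(isAsciiSigned(ascii) + " " + isAscii(ascii));
        System.out.println(isAsciiSigned(hangul) + " " + isAscii(hangul));
    }
}
```

An algorithm that carries UTF-16 code units in short lanes would need the equivalent care for every ordered comparison; equality tests and bitwise operations are unaffected by the sign.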

Paul.

[1] https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/

> On Nov 5, 2020, at 5:51 AM, Ludovic Henry <luhenry at microsoft.com> wrote:
> 
> Hi,
> 
> I have had little time to invest in that investigation, but I want to update you on what I have so far.
> 
> For the pieces of java.base that would benefit from vectorization, I've mostly focused on sun.nio.cs.UTF_8. I've looked at the literature for vectorized algorithms to encode and decode strings between UTF16 and UTF8 [1] [2] [3], and I've started to implement a prototype for [1] (it's the most open one in terms of licenses and it supports both encoding and decoding).
> 
> When implementing this algorithm, I've noted the lack of support for all primitive types (`char`, for example). It is especially impactful for UTF16/UTF8 conversion since it's all about manipulating bytes and chars. AFAIU, that is a known limitation of the current implementation since it can only support "value types" (primitive and inline types), and the general support for inline types (Valhalla) hasn't landed in the OpenJDK yet. However, since `char` is a primitive type, I wonder whether it has been explicitly and knowingly left out and whether you would be OK with adding it as part of the existing Vector types (a CharVector with the accompanying types and intrinsics).
> 
> Judging by other runtimes (like .NET, for example), more places in java.base and other core libraries would benefit from vectorization. I am looking to invest some more time in the Vector API in the near future as part of my work, so if there is anything that seems particularly interesting to investigate, please let me know, and I'd be happy to look into it together.
> 
> Thank you,
> Ludovic
> 
> [1] https://dirtyhandscoding.github.io/posts/utf8lut-vectorized-utf-8-converter-introduction.html
> [2] https://dl.acm.org/doi/10.1145/1345206.1345222
> [3] https://researcher.watson.ibm.com/researcher/files/jp-INOUEHRS/IPSJPRO2008_SIMDdecoding.pdf
> 
> -----Original Message-----
> From: Paul Sandoz <paul.sandoz at oracle.com> 
> Sent: Saturday, 3 October 2020 01:25
> To: Ludovic Henry <luhenry at microsoft.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: Usage of the Vector API in java.base
> 
> Hi,
> 
> Perhaps a good location is within the embedded vector benchmarks project:
> 
> https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark
> 
> 
>> On Oct 2, 2020, at 1:07 PM, Ludovic Henry <luhenry at microsoft.com> wrote:
>> 
>> Hi Paul,
>> 
>>> That will help us evaluate. Is that something you are interested in pursuing?
>> 
>> I'm absolutely interested to do so. Where would be the best place to contribute to in the repo?
>> 
>>> In that spirit, I think what is valuable right now is identifying the use-cases in java.base and writing stand alone examples.
>> 
>> I very much agree. I'll start implementing some locally and submit a PR whenever I have anything substantial.
>> 
>> As for identifying what use-cases are valuable, I'll document what I have and send it your way in this discussion. Does that work for you?
> 
> Yes, thanks.
> 
> 
>> 
>>> If the value of leveraging the Vector API functionality from within java.base is considered large enough, there is another option we could consider at a future point.
>> 
>> Judging from other runtimes that have vectorized some algorithms in their core libraries, there are some substantial gains to be had. I'm interested in getting some more concrete numbers for the OpenJDK specifically.
>> 
>> For the integration into java.base, do you expect it to be months away or years away?
>> 
> 
> More the latter scenario. While I don’t anticipate making large functional changes to the API and will endeavor to preserve compatibility where possible, Valhalla may well necessitate some breaking changes for the better (with generic specialization).
> 
> Paul.


