Some questions on intrinsic for UTF8 to UTF16 decoding

Vladimir Kozlov vladimir.kozlov at oracle.com
Sat Nov 21 17:13:02 UTC 2020


Hi Ludovic,

On 11/20/20 3:01 AM, Ludovic Henry wrote:
> Hi,
> 
> I've started to implement an intrinsic to vectorize the decoding (and soon encoding) of UTF-8 to UTF-16. My current work in progress is at [1]. However, I'm running into limitations in my knowledge of Hotspot, and I am seeking your advice and know-how.
> 
> The first thing I'm running into is how to pass parameters to the intrinsic by reference. AFAIK there is no way to do such a thing in Java code. My hope is then that when creating the call to the intrinsic in library_call.cpp, we can pass the address of the variable instead of the value. But I don't know if it's even possible, and if it is, how to do so.

The simplest solution is a new int[] array that holds the values of the local variables, which the intrinsic can 
update and return.
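
A minimal sketch of the int[] idea, in plain Java. The class name, the method signature and the layout of the state 
array are all hypothetical, not taken from the patch; the loop body only stands in for what an intrinsic would do:

```java
// Hypothetical sketch: the names and the state layout are illustrative only.
final class StateArraySketch {
    static final int SP = 0; // slot holding the source position
    static final int DP = 1; // slot holding the destination position

    // Plain-Java stand-in for the intrinsic. It copies the leading ASCII
    // bytes and writes the advanced positions back through the state array,
    // which is how the caller observes the "by reference" updates.
    static void decodeArrayVectorized(byte[] sa, int sl, char[] da, int dl, int[] state) {
        int sp = state[SP];
        int dp = state[DP];
        while (sp < sl && dp < dl && sa[sp] >= 0) {
            da[dp++] = (char) sa[sp++];
        }
        state[SP] = sp;
        state[DP] = dp;
    }

    public static void main(String[] args) {
        byte[] sa = {'a', 'b', (byte) 0xC3, (byte) 0xA9}; // "ab" followed by a 2-byte sequence
        char[] da = new char[4];
        int[] state = {0, 0};
        decodeArrayVectorized(sa, sa.length, da, da.length, state);
        System.out.println("sp=" + state[SP] + " dp=" + state[DP]);
    }
}
```

The caller reads `state[SP]` and `state[DP]` back into its locals after the call and continues with the scalar loop.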

Another solution (with more complex code in library_call.cpp) is to pass `src` and `dst` to the intrinsic and 
read/update their fields inside it:

         private CoderResult decodeArrayLoop(ByteBuffer src,
                                             CharBuffer dst)
         {
             decodeArrayVectorized(src, dst);

             // This method is optimized for ASCII input.
             byte[] sa = src.array();
             int sp = src.arrayOffset() + src.position();
             int sl = src.arrayOffset() + src.limit();
             char[] da = dst.array();
             int dp = dst.arrayOffset() + dst.position();
             int dl = dst.arrayOffset() + dst.limit();


> 
> The second thing I'm running into is not so much a technical limitation, but a question that, I am sure, is going to be raised during the review. This vectorization depends on a lookup table, but this lookup table can grow quite big (32768 elements, each of a size of 64-72 bytes, so ~2MB). I understand that this is much bigger than anything currently existing for any of the intrinsics, so I'm currently trying to figure out how I can reduce drastically the size of this table (compaction, lazy building, etc.). But first, I would like to hear your ideas as it may be an issue that was already faced in the past and for which a better solution was found.

Yes, 2MB is too much. And the problem is not the size itself but the effect on startup time, since the table is 
calculated dynamically.

Another issue I see with such an intrinsic is that the decodeArrayLoop() code has a lot of checks for malformed 
strings which the intrinsic does not have. Most likely it will not pass JCK testing.
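
To illustrate the kind of behavior that has to be preserved, here is a small sketch using the public 
java.nio.charset API. The class and method names are hypothetical. I believe a wrapped array buffer takes the 
array-based path in the JDK's UTF-8 decoder, but treat that as an assumption:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports whether the JDK's UTF-8 decoder rejects the input.
final class MalformedCheckSketch {
    static boolean isMalformed(byte[] bytes) {
        try {
            // A fresh decoder reports malformed input by default, so the
            // convenience decode() method throws instead of substituting.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Overlong encoding of U+0000, which is not valid UTF-8.
        System.out.println(isMalformed(new byte[] {(byte) 0xC0, (byte) 0x80}));
        // Truncated 2-byte sequence at the end of the input.
        System.out.println(isMalformed(new byte[] {'a', (byte) 0xC3}));
        // Plain ASCII.
        System.out.println(isMalformed(new byte[] {'a', 'b', 'c'}));
    }
}
```

Any intrinsic would have to reject the same inputs, or fall back to the existing Java code when it sees them.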

It would be interesting to see the performance if you vectorize only the ASCII copy loop, which seems to be the 
most common case and does not need a table:

             // ASCII only loop
             while (dp < dlASCII && sa[sp] >= 0)
                 da[dp++] = (char) sa[sp++];

I don't think C2 can auto-vectorize it because of the sa[sp] >= 0 check. The intrinsic can return the number of 
elements copied, which can be used to update `sp` and `dp`.
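
Roughly, the Java side could look like the sketch below. `copyAscii` is a hypothetical name for the method that 
would be intrinsified; its body here is only the scalar fallback, not a vectorized implementation:

```java
// Hypothetical sketch of a count-returning ASCII copy and its call site.
final class AsciiCopySketch {
    // Copies at most len bytes from sa[sp..] to da[dp..], stopping at the
    // first non-ASCII (negative) byte. Returns the number of elements copied.
    static int copyAscii(byte[] sa, int sp, char[] da, int dp, int len) {
        int i = 0;
        while (i < len && sa[sp + i] >= 0) {
            da[dp + i] = (char) sa[sp + i];
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] sa = {'J', 'D', 'K', (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        char[] da = new char[8];
        int sp = 0, sl = sa.length;
        int dp = 0, dl = da.length;

        // The caller advances both positions by the returned count and then
        // continues with the existing non-ASCII handling.
        int n = copyAscii(sa, sp, da, dp, Math.min(sl - sp, dl - dp));
        sp += n;
        dp += n;
        System.out.println("copied=" + n + " sp=" + sp + " dp=" + dp);
    }
}
```

With a single int return value there is no need to pass anything by reference for this loop.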

Regards,
Vladimir

> 
> Thank you,
> 
> --
> Ludovic
> 
> [1] https://github.com/openjdk/jdk/compare/master...luhenry:vectorUTF8
> 


More information about the hotspot-compiler-dev mailing list