Some questions on intrinsic for UTF8 to UTF16 decoding
Ludovic Henry
luhenry at microsoft.com
Fri Nov 20 11:01:17 UTC 2020
Hi,
I've started implementing an intrinsic to vectorize the decoding of UTF-8 to UTF-16 (encoding will follow soon). My current work in progress is at [1]. However, I'm running into the limits of my knowledge of HotSpot, and I am seeking your advice and know-how.
The first issue is how to pass parameters to the intrinsic by reference. AFAIK there is no way to do this in Java code. My hope is that, when creating the call to the intrinsic in library_call.cpp, we can pass the address of a variable instead of its value. But I don't know whether that is even possible and, if it is, how to do it.
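To make the constraint concrete, here is a minimal Java-side sketch of one common workaround for the lack of by-reference parameters: returning two 32-bit counters packed into a single long. This is only an illustration, not what the work in progress at [1] does and not an existing HotSpot API. The class name PackedResult, its helpers, and the ASCII-only decodeAsciiPrefix stand-in are all hypothetical.

```java
public final class PackedResult {
    private PackedResult() {}

    // Pack the number of source bytes consumed (high 32 bits) and the
    // number of chars produced (low 32 bits) into one long return value.
    public static long pack(int srcConsumed, int dstProduced) {
        return ((long) srcConsumed << 32) | (dstProduced & 0xFFFFFFFFL);
    }

    public static int srcConsumed(long packed) {
        return (int) (packed >>> 32);
    }

    public static int dstProduced(long packed) {
        return (int) packed;
    }

    // Hypothetical stand-in for a method that could be intrinsified.
    // It only copies the leading ASCII bytes and stops at the first
    // non-ASCII byte, so that the caller can see both updated counters.
    public static long decodeAsciiPrefix(byte[] src, int sp, int sl,
                                         char[] dst, int dp) {
        int s = sp;
        int d = dp;
        while (s < sl && d < dst.length && src[s] >= 0) {
            dst[d++] = (char) src[s++];
        }
        return pack(s - sp, d - dp);
    }
}
```

The caller unpacks both values with srcConsumed() and dstProduced() and advances its own positions, so nothing needs to be passed by address. Whether this is preferable to passing an address from library_call.cpp remains an open question.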
The second issue is less a technical limitation than a question that, I am sure, will be raised during review. The vectorization depends on a lookup table, and this table can grow quite big (32768 elements of 64-72 bytes each, so roughly 2 MB). I understand that this is much bigger than anything currently existing for any of the intrinsics, so I'm trying to figure out how to drastically reduce the size of this table (compaction, lazy building, etc.). But first I would like to hear your ideas, as this may be an issue that was already faced in the past and for which a better solution was found.
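As a rough illustration of the size estimate and of the "lazy building" idea, the sketch below computes the eager footprint and materializes entries only on first use. It is a plain Java model under assumptions: the class name LazyLookupTable and the caller-supplied builder function are hypothetical, the real table contents are not modeled, the class is not thread-safe, and a table used from a HotSpot stub would have different constraints than a Java array.

```java
import java.util.function.IntFunction;

public final class LazyLookupTable {
    private final byte[][] entries;            // null until first requested
    private final IntFunction<byte[]> builder; // hypothetical entry generator
    private int built;                         // number of entries built so far

    public LazyLookupTable(int size, IntFunction<byte[]> builder) {
        this.entries = new byte[size][];
        this.builder = builder;
    }

    // Build the entry on first access, then reuse it.
    public byte[] get(int index) {
        byte[] entry = entries[index];
        if (entry == null) {
            entry = builder.apply(index);
            entries[index] = entry;
            built++;
        }
        return entry;
    }

    public int builtCount() {
        return built;
    }

    // Footprint in bytes of a fully populated table with fixed-size entries.
    public static long eagerFootprintBytes(int entryCount, int entryBytes) {
        return (long) entryCount * entryBytes;
    }
}
```

With 32768 entries, eagerFootprintBytes gives 2,097,152 bytes at 64 bytes per entry and 2,359,296 bytes at 72 bytes per entry, which matches the roughly 2 MB figure. The lazy variant pays only for the entries actually looked up, at the cost of a null check on every access.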
Thank you,
--
Ludovic
[1] https://github.com/openjdk/jdk/compare/master...luhenry:vectorUTF8