RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

Mon Nov 14 17:51:38 UTC 2022

On Sun, 13 Nov 2022 21:01:21 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> src/hotspot/share/opto/intrinsicnode.hpp line 175:
>> 
>>> 173:   // as well as adjusting for special treatment of various encoding of String
>>> 174:   // arrays. Must correspond to declared constants in jdk.internal.util.ArraysSupport
>>> 175:   typedef enum HashModes { LATIN1 = 0, UTF16 = 1, BYTE = 2, CHAR = 3, SHORT = 4, INT = 5 } HashMode;
>> 
>> I question the need for `LATIN1` and `UTF16` modes. If you lift some of the input adjustments (initial value and input size) into the JDK, they become indistinguishable from `BYTE`/`CHAR`. Then you can reuse the existing constants for basic types.
>
> UTF16 can easily be replaced with CHAR by lifting up the shift, as you say, but LATIN1 needs to be distinguished from BYTE since the former needs unsigned semantics. Modeling this as a signed/unsigned input is possible, but I figured we might as well call it UNSIGNED_BYTE and decouple it logically from String::LATIN1.

FTR, `T_BOOLEAN` effectively represents an unsigned byte.
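To make the signed/unsigned distinction concrete, here is a minimal Java sketch of the plain polynomial (31 * h + x) recurrence. The class and method names are hypothetical and do not correspond to the intrinsic or to `ArraysSupport`. It shows that hashing Latin-1 bytes with sign extension gives a different result from `String.hashCode()` as soon as a byte is >= 0x80, while zero-extending (`b & 0xff`) matches. It also shows the initial-value difference mentioned above: `Arrays.hashCode(byte[])` starts from 1, whereas `String.hashCode()` starts from 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HashModeSketch {
    // Signed polynomial hash: each byte is sign-extended to int,
    // the same recurrence Arrays.hashCode(byte[]) uses (with initial = 1).
    static int signedByteHash(byte[] a, int initial) {
        int h = initial;
        for (byte b : a) {
            h = 31 * h + b;
        }
        return h;
    }

    // Unsigned polynomial hash: each byte is zero-extended via & 0xff,
    // which is what hashing Latin-1 string contents requires.
    static int unsignedByteHash(byte[] a, int initial) {
        int h = initial;
        for (byte b : a) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // U+00E9 is 0xE9, i.e. -23 as a signed byte
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(unsignedByteHash(latin1, 0) == s.hashCode());           // true
        System.out.println(signedByteHash(latin1, 0) == s.hashCode());             // false
        System.out.println(signedByteHash(latin1, 1) == Arrays.hashCode(latin1)); // true
    }
}
```

Under these assumptions, only the per-element extension differs between the two loops, so a single kernel parameterized by element type and signedness (plus a caller-supplied initial value) could in principle cover both cases.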

-------------

PR: https://git.openjdk.org/jdk/pull/10847