RFR: 8282664: Unroll by hand StringUTF16, StringLatin1, and Arrays polynomial hash loops [v12]

Thu May 12 15:38:57 UTC 2022

On Thu, 12 May 2022 11:08:10 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> Looks like you are making great progress.
>> 
>> Have you thought about ways the intrinsic implementation might be simplified if some code is retained in Java and passed as constant arguments? e.g. table of constants, scalar loop, bounds checks etc, such that the intrinsic primarily focuses on the vectorized code. To some extent that's related to John's point on generalization, and through simplification there may be some generalization.
>> 
>> For example if there was a general intrinsic that returned a long value (e.g. first 32 bits are the offset in the array to continue processing, the second 32 bits are the current hashcode value) then we could call that from the Java implementations that then proceed with the scalar loop up to the array length. The Java implementation intrinsic would return `(0L | 1L << 32)`.
>> 
>> Separately it would be nice to consider computing the hash code from the contents of a memory segment, similar to how we added `mismatch` support, but the trick of returning a value that merges the offset and hash code would not work, unless we return the remaining elements to process and that guaranteed to be less than a certain value (or alternatively a relative offset value given an upper bound argument, and performing the intrinsic call in a loop until no progress can be made, which works better for safepoints).
>> 
>> The `long[]` hashcode is annoying given `(element ^ (element >>> 32))`, but if we simplify the intrinsic maybe we can add back that complexity?
>
> @PaulSandoz yes, keeping the "short" string part in pure Java and switching to an intrinsified/vectorized version for "long" strings is on the next avenue of exploration. I would also put the intrinsic as a runtime stub to avoid unnecessarily increase the size of the calling method unnecessarily. The overhead of the call would be amortised because it would only be called for longer strings anyway.
> 
> I haven't given much thoughts to how we could split up the different elements of the algorithm to generalise the approach just yet. I'll give it a try, see how far I can get with it, and keep you updated on my findings.

@luhenry ok, we took a similar approach to the mismatch intrinsic, carefully analyzing the threshold by which the intrinsic would be called.

My suggestion would be to follow that approach further and head towards an internal intrinsic perhaps with this signature:

@IntrinsicCandidate
static long hashCode(Class<T> eType, Object base, long offset, int length /* in bytes * /) {
  return 0 | (1L << 32); // or perhaps just return 0
}  
``` 

Then on a further iteration try and pass the polynomial constant and table of powers (stable array) as arguments.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7700