RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8]

Mon May 5 18:12:49 UTC 2025

On Mon, 5 May 2025 10:17:27 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

>> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>> 
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
> 
>   change slli+add sequence to shadd

As you might expect, I am trying to implement the following code with RVV:

for (; i + (N-1) < cnt; i += N) {
   h =   31^^N     * h
       + 31^^(N-1) * val[i + 0]
       + 31^^(N-2) * val[i + 1]
         ...
       + 31^^1     * val[i + (N-2)]
       + 31^^0     * val[i + (N-1)];
}
for (; i < cnt; i++) {
   h = 31 * h + val[i];
}
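For reference, the chunked recurrence above can be checked against the plain scalar loop with a small C sketch. It assumes a hypothetical chunk size `N = 4`, models Java `int` wraparound with `uint32_t` arithmetic, and uses a scalar dot product where the RVV code would presumably use a vector multiply plus reduction; `hash_scalar` and `hash_chunked` are illustrative names, not code from the patch.

```c
#include <stdint.h>
#include <stddef.h>

#define N 4  /* hypothetical chunk size (elements per vector); illustration only */

/* Reference loop: h = 31 * h + val[i]. uint32_t arithmetic wraps mod 2^32,
   which matches Java int overflow bit for bit. */
static uint32_t hash_scalar(const int32_t *val, size_t cnt, uint32_t h) {
    for (size_t i = 0; i < cnt; i++) {
        h = 31u * h + (uint32_t)val[i];
    }
    return h;
}

/* Chunked form of the same recurrence: each iteration folds N elements with
   precomputed coefficients coef[k] = 31^(N-1-k) and multiplier 31^N;
   leftover elements go through the scalar loop. */
static uint32_t hash_chunked(const int32_t *val, size_t cnt, uint32_t h) {
    uint32_t coef[N];
    uint32_t p = 1;
    for (int k = N - 1; k >= 0; k--) {     /* coef[N-1] = 1, ..., coef[0] = 31^(N-1) */
        coef[k] = p;
        p *= 31u;
    }
    const uint32_t pow_n = p;              /* 31^N */

    size_t i = 0;
    for (; i + (N - 1) < cnt; i += N) {
        uint32_t sum = 0;
        for (size_t k = 0; k < N; k++) {
            sum += coef[k] * (uint32_t)val[i + k];
        }
        h = pow_n * h + sum;
    }
    for (; i < cnt; i++) {                 /* scalar tail */
        h = 31u * h + (uint32_t)val[i];
    }
    return h;
}
```

For example, `{'a', 'b', 'c', 'd'}` with `h = 0` goes through one full chunk and yields 2987074, the same value as Java's `"abcd".hashCode()`.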

where `N` is the number of array elements processed per "chunk".
IIUC, the main issue with your approach is the "reversed" order of the array elements relative to the preloaded `31^^X` coefficients WHEN the number of remaining elements is less than `N`, say `M=N-1`:

   h =   31^^M     * h
       + 31^^(M-1) * val[i + 0]
       + 31^^(M-2) * val[i + 1]
         ...
       + 31^^1     * val[i + (M-2)]
       + 31^^0     * val[i + (M-1)];

or, in terms of `N` for clarity:

   h =   31^^(N-1) * h
       + 31^^(N-2) * val[i + 0]
       + 31^^(N-3) * val[i + 1]
         ...
       + 31^^1     * val[i + (N-3)]
       + 31^^0     * val[i + (N-2)];

Now we need to "slide down" the preloaded multiplier coefficients in the designated vector register by one lane (since `M=N-1`) so that they are in "sync" with `val[i + X]` (possibly moving them into a temporary vector register in the process). Moreover, we must DO this operation only IF the remaining `cnt` is less than `N`, which means an additional check on every iteration. That is probably acceptable as a one-time operation in the tail phase, but NOT inside the main loop...
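A minimal sketch of that one-time tail handling, under the same assumptions as before (hypothetical `N = 4`, `uint32_t` to model Java `int` overflow, illustrative function name `hash_slide_tail`). The leftover `M = cnt - i` elements are folded in a single step by reading the preloaded coefficients starting at lane `N - M`, the scalar analogue of sliding the coefficient register down by `N - M` lanes (e.g. with RVV's `vslidedown`). The slide happens once, after the main loop, so the main loop itself carries no extra check.

```c
#include <stdint.h>
#include <stddef.h>

#define N 4  /* hypothetical chunk size (elements per vector); illustration only */

/* Hypothetical sketch: chunked hash whose M < N leftover elements are folded
   in one step using a "slid down" window of the preloaded coefficients. */
static uint32_t hash_slide_tail(const int32_t *val, size_t cnt, uint32_t h) {
    uint32_t coef[N];                      /* coef[k] = 31^(N-1-k) */
    uint32_t p = 1;
    for (int k = N - 1; k >= 0; k--) {
        coef[k] = p;
        p *= 31u;
    }
    const uint32_t pow_n = p;              /* 31^N */

    size_t i = 0;
    for (; i + (N - 1) < cnt; i += N) {    /* main loop: no tail check inside */
        uint32_t sum = 0;
        for (size_t k = 0; k < N; k++) {
            sum += coef[k] * (uint32_t)val[i + k];
        }
        h = pow_n * h + sum;
    }

    size_t m = cnt - i;                    /* 0 <= m < N remaining elements */
    if (m > 0) {
        /* One-time "slide down" by N - m lanes: slid[k] = 31^(m-1-k). */
        const uint32_t *slid = coef + (N - m);
        uint32_t sum = 0;
        for (size_t k = 0; k < m; k++) {
            sum += slid[k] * (uint32_t)val[i + k];
        }
        h = coef[N - 1 - m] * h + sum;     /* coef[N-1-m] = 31^m */
    }
    return h;
}
```

Because `coef[k]` holds `31^^(N-1-k)`, the same table also supplies the tail multiplier `31^^M` as `coef[N-1-M]`, so nothing beyond the slide needs to be preloaded for the tail.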

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2851905398