RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8]
Yuri Gaevsky
duke at openjdk.org
Mon May 5 18:12:49 UTC 2025
On Mon, 5 May 2025 10:17:27 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
>> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>>
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
> change slli+add sequence to shadd
As you might expect, I am trying to implement the following code with RVV:
    for (; i + (N-1) < cnt; i += N) {
        h = 31^^N * h
          + 31^^(N-1) * val[i + 0]
          + 31^^(N-2) * val[i + 1]
          ...
          + 31^^1 * val[i + (N-2)]
          + 31^^0 * val[i + (N-1)];
    }
    for (; i < cnt; i++) {
        h = 31 * h + val[i];
    }
where `N` is the number of array elements processed in one "chunk".
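For reference, the loop structure above can be written as a plain scalar Java sketch. This is only an illustration of the arithmetic, not the RVV code from the patch: the class name `ChunkedHashSketch`, the helper `coeffs`, and the parameter `n` (standing in for `N`) are assumptions made here. The inner loop over `k` is the part a vector unit would do in one multiply plus a reduction.

```java
// Scalar model of the chunked loop; all names are hypothetical.
class ChunkedHashSketch {
    // coeffs[k] = 31^(n-1-k), computed with the usual int wrap-around.
    static int[] coeffs(int n) {
        int[] c = new int[n];
        int p = 1;
        for (int k = n - 1; k >= 0; k--) {
            c[k] = p;
            p *= 31;
        }
        return c;
    }

    // Hash val[] starting from h, taking n elements per main-loop iteration.
    static int chunkedHash(int h, int[] val, int n) {
        int[] c = coeffs(n);
        int pn = 31 * c[0];          // 31^n, the multiplier applied to h
        int i = 0;
        for (; i + (n - 1) < val.length; i += n) {
            int sum = 0;
            for (int k = 0; k < n; k++) {
                sum += c[k] * val[i + k];
            }
            h = pn * h + sum;
        }
        for (; i < val.length; i++) { // scalar tail
            h = 31 * h + val[i];
        }
        return h;
    }
}
```

With an initial `h` of 1 the result should agree with `java.util.Arrays.hashCode(int[])`, because int arithmetic wraps identically in both formulations.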
IIUC, the main issue with your approach is the "reverse" order of array elements versus the preloaded `31^^X` coeffs WHEN the remaining number of elements is less than `N`, say `M=N-1`:
    h = 31^^M * h
      + 31^^(M-1) * val[i + 0]
      + 31^^(M-2) * val[i + 1]
      ...
      + 31^^1 * val[i + (M-2)]
      + 31^^0 * val[i + (M-1)];
or, returning to our `N` for clarity:
    h = 31^^(N-1) * h
      + 31^^(N-2) * val[i + 0]
      + 31^^(N-3) * val[i + 1]
      ...
      + 31^^1 * val[i + (N-3)]
      + 31^^0 * val[i + (N-2)];
Now we need to "slide down" the preloaded multiplier coeffs in the designated vector register by one (as `M=N-1`) so that they are in "sync" with `val[i + X]` (maybe moving them into a temporary VR in the process), and moreover, DO this operation IFF the remaining `cnt` is less than `N` (==> an additional check on every iteration). That's probably acceptable only in the tail phase as a one-time operation, but NOT inside the main loop...
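To make the "slide down" concrete, here is a scalar Java sketch of a tail step that reuses the preloaded coefficients at an offset. It is a model of the index arithmetic only, under the assumption that the coefficient table holds `31^^(N-1) ... 31^^0` in that order; the names `TailSlideSketch`, `coeffs`, `tail` and `hash` are hypothetical and do not come from the patch. On real hardware the offset access would presumably be a slide of the coefficient register, which is exactly the extra work I would like to keep out of the main loop.

```java
// Scalar model of a one-time tail step with "slid" coefficients; names are hypothetical.
class TailSlideSketch {
    // c[k] = 31^(n-1-k), with int wrap-around.
    static int[] coeffs(int n) {
        int[] c = new int[n];
        int p = 1;
        for (int k = n - 1; k >= 0; k--) {
            c[k] = p;
            p *= 31;
        }
        return c;
    }

    // Process the m = val.length - from remaining elements, where 0 <= m < n.
    static int tail(int h, int[] val, int from, int[] c) {
        int n = c.length;
        int m = val.length - from;
        if (m == 0) {
            return h;
        }
        int shift = n - m;            // how far the coefficients are slid down
        int sum = 0;
        for (int k = 0; k < m; k++) {
            sum += c[k + shift] * val[from + k];
        }
        // c[shift - 1] = 31^(n - shift) = 31^m is the multiplier for h.
        return c[shift - 1] * h + sum;
    }

    // Main loop with unshifted coefficients, then a single slid tail step.
    static int hash(int h, int[] val, int n) {
        int[] c = coeffs(n);
        int pn = 31 * c[0];           // 31^n
        int i = 0;
        for (; i + (n - 1) < val.length; i += n) {
            int sum = 0;
            for (int k = 0; k < n; k++) {
                sum += c[k] * val[i + k];
            }
            h = pn * h + sum;
        }
        return tail(h, val, i, c);
    }
}
```

For `M=N-1` the shift is 1, matching the case discussed above; the check and the shift happen once after the main loop rather than on every iteration.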
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2851905398