RFR: 8319716: RISC-V: Add SHA-2
Robbin Ehn
rehn at openjdk.org
Tue Nov 21 09:02:07 UTC 2023
On Tue, 21 Nov 2023 08:47:37 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>>> Depending on the hardware pipeline depth, this load can actually be executed after "__ vadd_vv(v14, v15, v10);", so that instruction may already be retired by the time round 1 is reached.
>>>
>>> Preloading these can itself be very costly, depending on the number of vector-load ports, since they can't be executed out-of-order in parallel.
>>
>> Makes sense. I was expecting those to retire when reaching the first round (round0).
>>
>>> So hiding the load in the previous round can be faster; therefore my quick conclusion, without numbers, was that, at least for a single pass, not preloading _should_ be better on bigger hardware.
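To make "hiding the load in the previous round" concrete, here is a minimal C++ sketch of the two schedule shapes: loading a round's constants at the top of that round versus issuing the load for round r + 1 while round r is still computing (classic software pipelining). `Quad`, `quad_round`, `load_in_round` and `load_one_round_ahead` are hypothetical names, and `quad_round` is deliberately NOT the SHA-256 round function, only a stand-in that consumes one 4-word constant block. A C++ compiler (and an out-of-order core) may reorder either version anyway; the sketch only shows the source-level shape of each schedule and that both compute the same result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One 4-word constant block, standing in for what vl1re32_v loads per quad-round.
using Quad = std::array<std::uint32_t, 4>;

// Hypothetical stand-in for a quad-round: NOT the SHA-256 round function,
// just a computation that consumes one constant block.
static std::uint32_t quad_round(std::uint32_t state, const Quad& k) {
  for (std::uint32_t w : k) {
    state = (state ^ w) * 0x9E3779B1u + (state >> 7);
  }
  return state;
}

// Schedule 1: each round loads its own constants and consumes them immediately.
std::uint32_t load_in_round(const Quad* consts, std::size_t rounds, std::uint32_t state) {
  assert(consts != nullptr || rounds == 0);
  for (std::size_t r = 0; r < rounds; ++r) {
    Quad k = consts[r];            // load ...
    state = quad_round(state, k);  // ... then use it right away
  }
  return state;
}

// Schedule 2 (software pipelined): the load for round r + 1 is issued during
// round r, so its latency can overlap with round r's work.
std::uint32_t load_one_round_ahead(const Quad* consts, std::size_t rounds, std::uint32_t state) {
  assert(consts != nullptr || rounds == 0);
  if (rounds == 0) return state;
  Quad next = consts[0];  // prologue: load for round 0
  for (std::size_t r = 0; r < rounds; ++r) {
    Quad k = next;                             // constants loaded one iteration earlier
    if (r + 1 < rounds) next = consts[r + 1];  // issue next round's load before this round's work
    state = quad_round(state, k);
  }
  return state;
}
```

The pipelined form needs one extra live register (`next`) per round, which is the same register-pressure trade-off the thread is weighing against preloading every constant up front.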
>>
>> But I see that there is a true data dependence on the vector load in each round. Anything I missed?
>> Say, for round 2:
>>
>> // Quad-round 2 (+2, v12->v13->v10->v11)
>> __ vl1re32_v(v15, consts); ----> Define v15
>> __ addi(consts, consts, 16);
>> __ vadd_vv(v14, v15, v12); ----> Use v15
>
> __ vadd_vv(v14, v15, v11);
> <load can start> <<-------------------------------------+
> __ vsha2cl_vv(v17, v16, v14);                           |
> __ vsha2ch_vv(v16, v17, v14);                           |
> __ vmerge_vvm(v14, v13, v12);                           |
> __ vsha2ms_vv(v11, v14, v10); // Generate W[23:20]      |
> //----------------------------------------------------  |
> // Quad-round 2 (+2, v12->v13->v10->v11)                |
> __ vl1re32_v(v15, consts); -----------------------------+
>
>
> No?
If you consider register renaming, the load should be able to start even earlier.
E.g., load into vX, then rename vX to v15 after the add that uses v15.
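The argument can be checked with a small C++ sketch of a dependence analysis over the window quoted above. For the round-2 `vl1re32_v(v15, consts)` it finds the earliest program-order slot the load could move to: true (RAW) dependences always constrain it, while the WAR/WAW "name" dependences on v15 only matter when the core cannot rename registers. `Insn`, `earliest_slot` and `round1_to_round2` are hypothetical names. The read/write sets are hand-written approximations: the destination of `vsha2cl`/`vsha2ch`/`vsha2ms` is treated as also read (those instructions update it in place), `v0` is the `vmerge_vvm` mask, and the first two entries (round 1's load and `addi`) are assumed by analogy with the round-2 pattern, since the quoted window starts at the `vadd_vv`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified model of one instruction: the registers it writes and reads.
struct Insn {
  std::string text;
  std::set<std::string> writes;
  std::set<std::string> reads;
};

static bool overlaps(const std::set<std::string>& a, const std::set<std::string>& b) {
  for (const std::string& r : a) {
    if (b.count(r) != 0) return true;
  }
  return false;
}

// Earliest program-order slot for insns[j]: slot k means "right after insns[k - 1]"
// (slot 0 = top of the window). RAW dependences always constrain the move;
// WAR and WAW (name) dependences only constrain it without register renaming.
std::size_t earliest_slot(const std::vector<Insn>& insns, std::size_t j, bool renaming) {
  assert(j < insns.size());
  const Insn& cur = insns[j];
  std::size_t slot = 0;
  for (std::size_t i = 0; i < j; ++i) {
    const Insn& prev = insns[i];
    bool raw = overlaps(prev.writes, cur.reads);   // prev produces a value cur reads
    bool war = overlaps(prev.reads, cur.writes);   // cur would clobber a value prev still reads
    bool waw = overlaps(prev.writes, cur.writes);  // both write the same register
    if (raw || (!renaming && (war || waw))) slot = std::max(slot, i + 1);
  }
  return slot;
}

// End of quad-round 1 through the round-2 constant load (approximate read/write sets).
std::vector<Insn> round1_to_round2() {
  return {
      {"vl1re32_v(v15, consts)", {"v15"}, {"consts"}},                // 0: round 1 load (assumed)
      {"addi(consts, consts, 16)", {"consts"}, {"consts"}},           // 1: (assumed)
      {"vadd_vv(v14, v15, v11)", {"v14"}, {"v15", "v11"}},            // 2: last reader of v15
      {"vsha2cl_vv(v17, v16, v14)", {"v17"}, {"v17", "v16", "v14"}},  // 3
      {"vsha2ch_vv(v16, v17, v14)", {"v16"}, {"v16", "v17", "v14"}},  // 4
      {"vmerge_vvm(v14, v13, v12)", {"v14"}, {"v13", "v12", "v0"}},   // 5
      {"vsha2ms_vv(v11, v14, v10)", {"v11"}, {"v11", "v14", "v10"}},  // 6
      {"vl1re32_v(v15, consts)", {"v15"}, {"consts"}},                // 7: round 2 load
  };
}
```

Under these assumptions, `earliest_slot(round1_to_round2(), 7, false)` is 3, i.e. right after `vadd_vv(v14, v15, v11)`, where the diagram marks `<load can start>`. With `renaming = true` only the true dependence on `consts` remains, giving 2: right after round 1's `addi`, before the add that still reads the old v15.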
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1400231771
More information about the hotspot-dev mailing list