RFR: 8338407: Support grouping several of existing regs into a new one

Sat Aug 31 10:48:18 UTC 2024

On Thu, 29 Aug 2024 15:11:32 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this patch to add `group` support to operand?
> 
> ### Some background about this pr
> 
> In some platforms, there is some concept like a group of registers, for example on riscv there is vector group, which is a group of other single vectors. For example, m2 could be v2+v3, or v4+v5, m4 could be v4+v5+v6+v7, or v8+v9+v10+v11.
> And, it's helpful to represent these vector group explicitly, otherwise it's tedious and error-prone. For example, in existing code, there's some like below:
> 
> instruct vstring_compareUL(iRegP_R11 str1, iRegI_R12 cnt1, iRegP_R13 str2, iRegI_R14 cnt2,
>                            iRegI_R10 result, vReg_V4 v4, vReg_V5 v5, vReg_V6 v6, vReg_V7 v7,
>                            vReg_V8 v8, vReg_V9 v9, vReg_V10 v10, vReg_V11 v11,
>                            iRegP_R28 tmp1, iRegL_R29 tmp2)
> // ...
>   effect(KILL tmp1, KILL tmp2, USE_KILL str1, USE_KILL str2, USE_KILL cnt1, USE_KILL cnt2,
>          TEMP v4, TEMP v5, TEMP v6, TEMP v7, TEMP v8, TEMP v9, TEMP v10, TEMP v11);
> // ...
>     __ string_compare_v($str1$$Register, $str2$$Register,
>                         $cnt1$$Register, $cnt2$$Register, $result$$Register,
>                         $tmp1$$Register, $tmp2$$Register,
>                         StrIntrinsicNode::UL);
> 
> The potential problems of the above code are that we need to
> 1. write v4~v11 explicitly in its `instruct` and its `effect`, it's tedious;
> 2. vector group are represented implicitly, which is not clear and error-prone;
> 3. in its encoding `string_compare_v`, we need to specify m4, and v4/v8 explicitly.
> 4. if some day we need to adjust from m4 to m2 or m8, it's really tedious and error-prone to make that change in both ad file and macro assembler files.
> 
> 
> ### This PR
> 
> The proposed solution is to represent a group of vector registers with a real vector group, e.g. `vReg_V4 v4, vReg_V5 v5, vReg_V6 v6, vReg_V7 v7` with `vReg_V4M4 v4m4`, `TEMP v4, TEMP v5, TEMP v6, TEMP v7` with `TEMP v4m4` and in `string_compare_v` implementation, we could query the length of of vector group (i.e. m4 in this case) and set its vtype automatically.
> This solution solve the above listed issues, especially the last issue, that means in the future if we need to adjust m4 to m2 or m8, we only need to change the code in ad file and the change is simpler, and no change in string_compare_v is needed.
> 
> ### What it looks like
> 
> For more usage details, please please check [here](https://github.com/openjdk/jdk/compare/master...Hamlin-Li:jdk:explicit-v-reg-gro...

Thanks for taking a look and suggestions!

> Are you trying to support arbitrary groups of vectors, or only aligned and sequential groups, like on riscv?

I'm trying supporting the latter.
Some more backgroud: On riscv, its vector only support scale one (that means it's vectorA in hotspot, length of one single vector register is dynamic, it could be 128/256/512... bits). The start vector reg in a vector group must be aligned to the length of vector group, e.g. if a vector group length is 8, then it can only be one of (v0-v7), (v8-v15), (v16-v23), (v24-v31).

> 
> > 1. One of them is to just add some new reg class and operand, it kindly worked, but can only prevent other regs in an instruct using the one of the vector regs in a vector group.
> 
> I don't see why this wouldn't work. As long as the register mask is correct, it should prevent/exclude all the vector regs in a vector group. I believe this is how arm32 implements vecD on the dflt_low_reg registers. 64-bit "D" vectors are composed of two adjacent 32-bit "S" registers.

(I could be wrong about VecD on arm below.)

[Here](https://github.com/openjdk/jdk/compare/master...Hamlin-Li:jdk:explicit-v-reg-group-v1) is the alternative implementation of "`just add some new reg class and operand`" mentioned above, please check changes in src/hotspot/cpu/riscv/riscv.ad and other related changes.
I think the reasons why this change (or the way you suggested above) does not work as expected are:
* vecD and vecA are different way to control vector register allocation in hotspot, vecD is fixed size, vecA is dynamic, so that vecD works does not directly mean vecA could work as well in the same way. (riscv only support vecA.) 
* with new reg_class and operand introduced (take v2m2_reg, vReg_V2_m2 as example), current code base only ensure vReg_V2_m2 will take one vector register from between v2 and v3, but will not ensure both v2 and v3 are allocated at the same time for it. So if an instruct in ad file has (..., vReg_V2_m2 v2m2, vReg vx, ...) as its inputs (and suppose both are `TEMP`), then there is a chance vx will finally be allocated at v2 or v3, which is unexpected. But on the other hand, if an instruct in ad file has (..., vReg vx, vReg vy, ...) as its inputs  (and suppose both are `TEMP`), then there is no chance vx will be same as vy.

Hope the above could answer your question, but it could also be that I'm doing some wrong things in [that implementation](https://github.com/openjdk/jdk/compare/master...Hamlin-Li:jdk:explicit-v-reg-group-v1).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20775#issuecomment-2322858439