RFR: 8356159: RISC-V: Add Zabha [v4]

Wed May 21 02:22:53 UTC 2025

On Tue, 20 May 2025 12:37:06 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi, please consider.
>> 
>> This adds the byte and halfword atomic memory operations (Zabha) - https://github.com/riscv/riscv-zabha.
>> All amo-instructions, except load-reserve and store-conditional, can also be performed on natural aligned half-words and bytes. (i.e. the extension do not add lr.h/b or sc.h/b) This includes amocas if zacas extension is present.
>> 
>> The majority of this patch is to support amocas.h/b. We are now starting to really feel the pain of all these extensions, as CAS:ing 16/8-bits can now be done in three different ways:
>> - lr.w/sc.w 'narrow' CAS (no extension)
>> - amocas.w 'narrow' CAS (Zacas)
>> - amocas.h/b (Zacas + Zabha)
>> 
>> There is no hwprobe support yet.
>> 
>> Ran t1-3 with Zacas+Zabha and t1 without Zabha in qemu.
>> 
>> Thanks, Robbin
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review fix

The performance issue is gone after the update. Several minor comments remain. Thanks.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4002:

> 4000: void MacroAssembler::cmpxchg(Register addr, Register expected,
> 4001:                              Register new_val,
> 4002:                              Assembler::operand_size size,

But the signature in the header file macroAssembler_riscv.hpp isn't updated?
And you might want to rename the same param for `cmpxchg_narrow_value_helper`, `cmpxchg_narrow_value`, `weak_cmpxchg_narrow_value`, `atomic_cas`, `load_reserved`, `store_conditional` as well.

src/hotspot/cpu/riscv/riscv.ad line 5232:

> 5230: // standard CompareAndSwapX when we are using barriers
> 5231: // these have higher priority than the rules selected by a predicate
> 5232: instruct compareAndSwapB_narrow(iRegINoSp res, indirect mem, iRegI_R12 oldval, iRegI_R13 newval,

You might want to update opto print at the same time. Similar for other newly-added instructs in this file.

  format %{
    "cmpxchg $mem, $oldval, $newval\t# (byte) if $mem == $oldval then $mem <-- $newval\n\t"
    "mv $res, $res == $oldval\t# $res <-- ($res == $oldval ? 1 : 0), #@compareAndSwapB"    <===========
  %}

src/hotspot/cpu/riscv/riscv.ad line 5265:

> 5263:   ins_cost(LOAD_COST + STORE_COST + ALU_COST * 10 + BRANCH_COST * 4);
> 5264: 
> 5265:   effect(TEMP_DEF res, USE_KILL oldval, USE_KILL newval, KILL cr);

It's similar to word and doubleword CompareAndSwap cases under UseZabha && UseZacas. We don't have this side effect for word and doubleword CompareAndSwap cases. So I don't think we need this effect either. Same for `compareAndSwapS`, `compareAndSwapBAcq`, and `compareAndSwapSAcq` and the weak CompareAndSwap variants.

src/hotspot/cpu/riscv/riscv.ad line 5638:

> 5636:   ins_cost(LOAD_COST + STORE_COST + BRANCH_COST * 3 + ALU_COST * 5);
> 5637: 
> 5638:   effect(TEMP_DEF res, KILL cr, USE_KILL oldval, USE_KILL newval);

It's similar to word and doubleword CompareAndExchange cases under UseZabha && UseZacas. We have a more simpler `effect(TEMP_DEF res)` for word and doubleword CompareAndExchange cases. So I think we can simplify this effect as well. Similar for `compareAndExchangeS`, `compareAndExchangeBAcq` and `compareAndExchangeSAcq`.

And after having another look at the code of `MacroAssembler::cmpxchg`, seems we don't even need this `effect(TEMP_DEF res)` for all these including the existing word and doubleword CompareAndExchange cases.

-------------

PR Review: https://git.openjdk.org/jdk/pull/25252#pullrequestreview-2855931591
PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2099100316
PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2099126525
PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2099119065
PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2099158261