RFR: 8340241: RISC-V: Returns mispredicted

Fei Yang fyang at openjdk.org
Fri Oct 11 08:01:17 UTC 2024


On Tue, 8 Oct 2024 12:54:35 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> Hi, please consider.
> 
> RISC-V don't have dedicated call/ret instructions.
> Instead the registers used in the jal/jalr instructions determine if this is a JUMP or CALL/RET.
> The cpu have a return-address stack where it stores return addresses for prediction.
> There are two possible calling conventions: x1 and x5 (or using both for co-routines).
> This stack is updated according this table (from unpriv manual, 2.5.1. Unconditional Jumps) for JALR:
> 
> | rd is x1/x5 | rs1 is x1/x5 | rd=rs1 | RAS action
> | ------------- | ------------- | ------------- |------------- |
> |No | No | — | None|
> |No | Yes | — | Pop|
> |Yes | No | — | Push|
> |Yes | Yes | No | Pop, then push|
> |Yes | Yes | Yes | Push|
> 
> And additionally:
> "A JAL instruction should push the return address onto a return-address stack (RAS) only when rd is 'x1' or x5."
> 
> As the JDK is using x5/(t0) as main scratch all plains jumps are actually calls and calls are co-routine calls (push and pop).
> This causes performance issues as the predictions is often wrong.
> 
> Average time for 10 best iterations (VF2):
> | Benchmark | Baseline (ms) | RAS fixed (ms) | Diff |
> |-- | -- | -- | -- |
> |future-genetic | 22126.6 | 20461.8 | -7.52%|
> |akka-uct | 97119.6 | 97498 | 0.39%|
> |movie-lens | 82359.3 | 81009.2 | -1.64%|
> |scala-doku | 29246.1 | 24518.6 | -16.16%|
> |chi-square | 10207.3 | 10624.9 | 4.09%|
> |fj-kmeans | 55127.9 | 56169.1 | 1.89%|
> |finagle-http | 24845 | 24891.9 | 0.19%|
> |reactors | 97473.9 | 96655.5 | -0.84%|
> |dec-tree | 8322.99 | 8243.11 | -0.96%|
> |naive-bayes | 79249.1 | 76851.9 | -3.02%|
> |als | 52678 | 51245.9 | -2.72%|
> |par-mnemonics | 52237.4 | 53149.8 | 1.75%|
> |scala-kmeans | 2990.88 | 2992.14 | 0.04%|
> |philosophers | 9156.9 | 7754.5 | -15.32%|
> |log-regression | 7621.65 | 7540.85 | -1.06%|
> |gauss-mix | 9835.7 | 9396.25 | -4.47%|
> |mnemonics | 73087.3 | 69426.6 | -5.01%|
> |dotty | 10970.9 | 10719.1 | -2.30%|
> |finagle-chirper | 23386.1 | 23630.3 | 1.04%|
> |recursive fibonacci | 7338.56 | 5369.83 | **-26.83%**|
> 
> For some of workloads, e.g. call to small function in a loop, it really matters.
> 
> This patch blacklist x5(/t0) for JAL/JALR as we only use x1 calling convention.
> And changes all jumps to use x6(/t1) instead of x5(/t0).
> This patch was incrementally done, i.e. the first change removed the default t0.
> I visited all places makings jumps, to make sure t1 was available.
> Then changed to default t1 and removed argument in many cases.
> 
> Other approaches was tested, e.g. completely switch t0 <-> t1.
> This was much harder and more intr...

Hi, I witnessed performance improvement on other vendor's hardware too. Minor comments after a cursory look. Will take a more closer look. Thanks.

src/hotspot/cpu/riscv/c1_CodeStubs_riscv.cpp line 97:

> 95:   // t0 and t1 are used as args in generate_exception_throw,
> 96:   // so use x18 as the tmp register for rt_call.
> 97:   __ rt_call(Runtime1::entry_for(stub_id), x18);

Can we use `x9` instead here? It's the first callee-save register in numbering.

src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 299:

> 297:   }
> 298:   if (entry_point != nullptr) {
> 299:     __ mv(t1, continuation);

I see `continuation` will be changed into `x9` for some switch cases, so maybe we can let `continuation` be `x9` and unconditionallly move `ra` into `x9` on entry. That would save the update of `continuation` and avoid use of `t1`, which I think will be cleaner.

src/hotspot/cpu/riscv/templateTable_riscv.cpp line 722:

> 720:   __ mv(x13, array);
> 721:   __ mv(t1, Interpreter::_throw_ArrayIndexOutOfBoundsException_entry);
> 722:   __ jr(t1);

Please also update preceding code comment: `// destroys x11, t0, t1`

-------------

PR Review: https://git.openjdk.org/jdk/pull/21406#pullrequestreview-2362130148
PR Review Comment: https://git.openjdk.org/jdk/pull/21406#discussion_r1796575024
PR Review Comment: https://git.openjdk.org/jdk/pull/21406#discussion_r1796548679
PR Review Comment: https://git.openjdk.org/jdk/pull/21406#discussion_r1796537523


More information about the hotspot-dev mailing list