RFR: 8294729: [s390] Implement nmethod entry barriers
Martin Doerr
mdoerr at openjdk.org
Wed Oct 26 20:00:36 UTC 2022
On Tue, 4 Oct 2022 14:27:09 GMT, Tyler Steele <tsteele at openjdk.org> wrote:
> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform.
I've taken a quick look. Please find my change requests.
Please find my change requests.
Please fix the offset computation for Z_ARG1 and try your tests again.
LGTM. I think you can mark it ready for review if the tests are passing.
Found another missing piece.
Looks complete, now.
I just spotted the bug (see below).
About the register usage: Passing R0 as tmp while using R1 implicitly is inconsistent. I'd prefer e.g. using both regs implicitly and removing the tmp argument.
You were also wondering about usage of an nv reg on PPC64: Don't confuse C calling convention with Java calling convention! Java callers don't expect them to be preserved.
One more missing part: nmethod_entry_barrier + `C->output()->set_frame_complete(cbuf.insts_size());` at the end of C2 MachPrologNode::emit (s390.ad).
Please check the build log in the Pre-submit tests. Seems like an `include` is missing. Can probably get reproduced by building without precompiled headers (configure flag `--disable-precompiled-headers`).
1. Why don’t you use R0 like at all other places? You could even remove the argument and use R0 inside of `nmethod_entry_barrier`.
2. There’s a `source %{` section. PPC64 uses it to `#include "oops/klass.inline.hpp“`.
3. Looks like R2 got overwritten at some point. You should see more information in the hs_err file and figure out where it was called (C1 method, C2 method, native wrapper). For that, you could try switching off C1 by -XX:-TieredCompilation or C2 by -XX:TieredStopAtLevel=1.
src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 95:
> 93:
> 94: // Conditional Jump
> 95: __ z_bcr(Assembler::bcondNotEqual, Z_R1_scratch); // 2 bytes
This is only a jump. We would need a call which sets Z_R14 = return address. It should be possible to set Z_R14 manually to the return address before the jump (z_lghrl). Alternatively, you could implement a stub like on x86_64.
src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 49:
> 47:
> 48: public:
> 49: static const int BARRIER_TOTAL_LENGTH = GUARD_INSTRUCTION_OFFSET + 2*6 + 2; // bytes
Please either use 14 or something which matches the sequence: 4 (patchable constant) + 6 (larl) + 4 (bcr)
src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 60:
> 58: int32_t* data_addr = (int32_t*)get_patchable_data_address();
> 59:
> 60:
Extra empty line.
src/hotspot/cpu/s390/s390.ad line 851:
> 849: Compile* C = ra_->C;
> 850: C2_MacroAssembler _masm(&cbuf);
> 851: Register nmethod_tmp = Z_R3;
I guess we can't kill Z_R3, here.
src/hotspot/cpu/s390/s390.ad line 851:
> 849: Compile* C = ra_->C;
> 850: C2_MacroAssembler _masm(&cbuf);
> 851: // Register nmethod_tmp = Z_R3;
Don't kill R3! I'd use R0 and R1 only.
src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 1546:
> 1544:
> 1545: BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler();
> 1546: bs->nmethod_entry_barrier(masm, Z_R3);
Don't kill R3! I'd use R0 and R1 only.
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2869:
> 2867: // Save caller's sp & return_pc
> 2868: __ push_frame(frame::z_abi_16_size);
> 2869: __ z_stmg(Z_R14, Z_R15, _z_abi16(callers_sp), Z_SP);
Please use `save_return_pc()`. Z_R15 = Z_SP is already saved by push_frame.
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2869:
> 2867: // Save caller's sp & return_pc
> 2868: __ push_frame(frame::z_abi_16_size);
> 2869: __ save_return_pc();
Wrong order: return pc needs to get stored in the caller's frame header (before push_frame). See PPC64 or other usages on s390.
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2875:
> 2873: // We construct a pointer to the location of R14 stored above.
> 2874: __ z_xgr(Z_R2, Z_R2);
> 2875: __ z_ag(Z_R2, _z_abi(return_pc), 0, Z_SP);
Please use a better fitting instruction like z_la.
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2876:
> 2874: __ z_lay(Z_R1_scratch, -32, Z_R0, Z_R14); // R1 <- R14 - 32
> 2875: __ z_stg(Z_R1_scratch, _z_abi(carg_2), Z_R0, Z_SP); // SP[abi_carg2] <- R1
> 2876: __ z_la(Z_ARG1, _z_abi(carg_2), Z_R0, Z_SP); // R2 <- SP + abi_carg2
Z_ARG1 should point to the address _z_abi16(return_pc) + Z_SP in the caller frame. (Don't generate a copy!) That matches _z_abi16(return_pc) + current frame size + Z_SP in the current frame at this point.
In addition, I'm missing save_volatile_gprs & restore_volatile_gprs for GP and FP regs. I think they should get saved directly before you use Z_ARG1 for the return pc address and restored after the call_VM_leaf + z_ltr(Z_RET, Z_RET) which needs to get moved before the restoration. Note that this will need extra stack space: (5 + 8) * BytesPerWord
(See `MacroAssembler::verify_oop` for reference, but note that you don't need to include_flags which reduces complexity.)
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2880:
> 2878: __ z_stg(Z_R1_scratch, _z_abi(carg_2), Z_R0, Z_SP); // SP[abi_carg2] <- R1
> 2879: __ z_la(Z_ARG1, _z_abi(carg_2), Z_R0, Z_SP); // R2 <- SP + abi_carg2
> 2880: // __ z_la(Z_ARG1, _z_abi(return_pc), Z_R0, Z_SP);
Offset needs to be computed relative to the callee's SP, here: _z_abi(return_pc) + frame::z_abi_160_size + nbytes_volatile
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2883:
> 2881:
> 2882: // Restore caller's sp & return_pc
> 2883: __ z_lmg(Z_R14, Z_R15, _z_abi(callers_sp), Z_SP);
Like above.
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2883:
> 2881: // Restore caller's sp & return_pc
> 2882: __ restore_return_pc();
> 2883: __ pop_frame();
Like above.
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2890:
> 2888: // if (return val != 0)
> 2889: // return to caller
> 2890: __ z_bcr(Assembler::bcondNotZero, Z_R14);
The condition is inverted! Should be `bcondZero`.
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2894:
> 2892: // if (return val != 0)
> 2893: // return to caller
> 2894: __ z_ltr(Z_R0_scratch, Z_R0_scratch);
This is redundant. Flags were already set above and not killed by restore_volatile_regs + pop_frame + restore_return_pc.
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2895:
> 2893: // Get handle to wrong-method-stub for s390
> 2894: __ load_const_optimized(Z_R1_scratch, SharedRuntime::get_handle_wrong_method_stub());
> 2895: __ z_br(Z_R1_scratch);
Missing: Pop the frame built in the prologue and load the respective return_pc before the jump (see PPC64).
-------------
Changes requested by mdoerr (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10558Marked as reviewed by mdoerr (Reviewer).
More information about the hotspot-dev
mailing list