RFR: 8294729: [s390] Implement nmethod entry barriers
Tyler Steele
tsteele at openjdk.org
Wed Oct 26 20:00:37 UTC 2022
On Tue, 25 Oct 2022 10:35:08 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:
> Please check the build log in the Pre-submit tests.
It looks like there are 2 issues in the pre-submit tests:
1. I copied a named register variable use without copying it's declaration. I see PPC used R22, which seems odd to me. Should I use a non-volatile register on s390 as well?
2. An import issue that complains about BarrierSetAssembler, but there are no c-style imports in s390.ad. How do I control the header files imported in .ad files?
I am also encountering an issue on my builds:
# Internal Error (/home/ty/openjdk/jdk-current/src/hotspot/share/runtime/frame.cpp:1082), pid=1276093, tid=1276098
# assert(Universe::heap()->is_in_or_null(r)) failed: bad receiver: 0x000003ffb48ebb60 (4396780796768)
Which happens after the jump to wrong_method_stub. I've seen errors here on and off throughout development, and I was really hoping that adding the missing pop_frame and restore_return_pc before the jump there was the missing piece.
> About the register usage
I think this is a good point. I've changed s390's nmethod_entry_barrier to take no tmp reg, and use R0 and R1 implicitly.
> Don't confuse C calling convention with Java calling convention!
I sometimes do make the error of thinking that registers have platform imposed restrictions on their behaviour. Thanks for pointing this out.
> src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 49:
>
>> 47:
>> 48: public:
>> 49: static const int BARRIER_TOTAL_LENGTH = GUARD_INSTRUCTION_OFFSET + 2*6 + 2; // bytes
>
> Please either use 14 or something which matches the sequence: 4 (patchable constant) + 6 (larl) + 4 (bcr)
GUARD_INSTRUCTION_OFFSET is the offset to the beginning of the patchable instruction. So, I believe it should be: 6 (cfi) + 6 (larl) + 2 (bcr).
It may be worth renaming 'GUARD_INSTRUCTION_OFFSET' as I feel it's a bit confusing.
> src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 60:
>
>> 58: int32_t* data_addr = (int32_t*)get_patchable_data_address();
>> 59:
>> 60:
>
> Extra empty line.
Removed. Thanks.
> src/hotspot/cpu/s390/s390.ad line 851:
>
>> 849: Compile* C = ra_->C;
>> 850: C2_MacroAssembler _masm(&cbuf);
>> 851: Register nmethod_tmp = Z_R3;
>
> I guess we can't kill Z_R3, here.
I have removed the reference to Z_R3.
> src/hotspot/cpu/s390/s390.ad line 851:
>
>> 849: Compile* C = ra_->C;
>> 850: C2_MacroAssembler _masm(&cbuf);
>> 851: // Register nmethod_tmp = Z_R3;
>
> Don't kill R3! I'd use R0 and R1 only.
I would guess this is because, even though it's a volatile register, R3 can be a parameter or return value. So in the context of the compiler, we really don't want to use this register?
BTW, I have now removed my changes to this file.
> src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 1546:
>
>> 1544:
>> 1545: BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler();
>> 1546: bs->nmethod_entry_barrier(masm, Z_R3);
>
> Don't kill R3! I'd use R0 and R1 only.
Thanks!
> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2869:
>
>> 2867: // Save caller's sp & return_pc
>> 2868: __ push_frame(frame::z_abi_16_size);
>> 2869: __ save_return_pc();
>
> Wrong order: return pc needs to get stored in the caller's frame header (before push_frame). See PPC64 or other usages on s390.
Thanks.
> Z_ARG1 should point to the address _z_abi16(return_pc) + Z_SP in the caller frame.
This matches what the PPC implementation does, but when I do the same thing on s390 I get a cache miss in nmethod_stub_entry_barrier (the vm-call). It looked as though CodeCache::find_blob expects the address of the start of the compiled code, so I tried subtracting the size of the barrier from R14 (which currently points to end of the barrier in the compiled frame). After doing this I no longer saw the CodeCache miss.
> I'm missing save_volatile_gprs & restore_volatile_gprs for GP and FP regs.
I had been trying to get the volatile registers saved, but didn't have any luck. I tried it today with your suggestions and it worked like a charm. Not sure what the difference was. Thanks for the pointers.
> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2880:
>
>> 2878: __ z_stg(Z_R1_scratch, _z_abi(carg_2), Z_R0, Z_SP); // SP[abi_carg2] <- R1
>> 2879: __ z_la(Z_ARG1, _z_abi(carg_2), Z_R0, Z_SP); // R2 <- SP + abi_carg2
>> 2880: // __ z_la(Z_ARG1, _z_abi(return_pc), Z_R0, Z_SP);
>
> Offset needs to be computed relative to the callee's SP, here: _z_abi(return_pc) + frame::z_abi_160_size + nbytes_volatile
Got it. Thanks.
> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2890:
>
>> 2888: // if (return val != 0)
>> 2889: // return to caller
>> 2890: __ z_bcr(Assembler::bcondNotZero, Z_R14);
>
> The condition is inverted! Should be `bcondZero`.
That appears to have solved it. Many thanks 😄
> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2894:
>
>> 2892: // if (return val != 0)
>> 2893: // return to caller
>> 2894: __ z_ltr(Z_R0_scratch, Z_R0_scratch);
>
> This is redundant. Flags were already set above and not killed by restore_volatile_regs + pop_frame + restore_return_pc.
Agreed. Thanks for catching that.
> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2895:
>
>> 2893: // Get handle to wrong-method-stub for s390
>> 2894: __ load_const_optimized(Z_R1_scratch, SharedRuntime::get_handle_wrong_method_stub());
>> 2895: __ z_br(Z_R1_scratch);
>
> Missing: Pop the frame built in the prologue and load the respective return_pc before the jump (see PPC64).
I saw that the PPC impl does that, and had been experimenting with it. But it always felt weird to call pop_frame twice here.
Thanks for confirming that it is necessary. I believe I have a better understanding of why this is needed.
-------------
PR: https://git.openjdk.org/jdk/pull/10558
More information about the hotspot-dev
mailing list