RFR: 8294729: [s390] Implement nmethod entry barriers

Tyler Steele tsteele at openjdk.org
Wed Oct 26 20:00:37 UTC 2022


On Tue, 25 Oct 2022 10:35:08 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

> Please check the build log in the Pre-submit tests.

It looks like there are 2 issues in the pre-submit tests:

1. I copied a named register variable use without copying it's declaration. I see PPC used R22, which seems odd to me. Should I use a non-volatile register on s390 as well?
2. An import issue that complains about BarrierSetAssembler, but there are no c-style imports in s390.ad. How do I control the header files imported in .ad files?

I am also encountering an issue on my builds:


#  Internal Error (/home/ty/openjdk/jdk-current/src/hotspot/share/runtime/frame.cpp:1082), pid=1276093, tid=1276098
#  assert(Universe::heap()->is_in_or_null(r)) failed: bad receiver: 0x000003ffb48ebb60 (4396780796768)

Which happens after the jump to wrong_method_stub. I've seen errors here on and off throughout development, and I was really hoping that adding the missing pop_frame and restore_return_pc before the jump there was the missing piece.

> About the register usage

I think this is a good point. I've changed s390's nmethod_entry_barrier to take no tmp reg, and use R0 and R1 implicitly.

> Don't confuse C calling convention with Java calling convention!

I sometimes do make the error of thinking that registers have platform imposed restrictions on their behaviour. Thanks for pointing this out.

> src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 49:
> 
>> 47: 
>> 48:   public:
>> 49:     static const int BARRIER_TOTAL_LENGTH = GUARD_INSTRUCTION_OFFSET + 2*6 + 2; // bytes
> 
> Please either use 14 or something which matches the sequence: 4 (patchable constant) + 6 (larl) + 4 (bcr)

GUARD_INSTRUCTION_OFFSET is the offset to the beginning of the patchable instruction. So, I believe it should be: 6 (cfi) + 6 (larl) + 2 (bcr).

It may be worth renaming 'GUARD_INSTRUCTION_OFFSET' as I feel it's a bit confusing.

> src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 60:
> 
>> 58:       int32_t* data_addr = (int32_t*)get_patchable_data_address();
>> 59: 
>> 60: 
> 
> Extra empty line.

Removed. Thanks.

> src/hotspot/cpu/s390/s390.ad line 851:
> 
>> 849:   Compile* C = ra_->C;
>> 850:   C2_MacroAssembler _masm(&cbuf);
>> 851:   Register nmethod_tmp = Z_R3;
> 
> I guess we can't kill Z_R3, here.

I have removed the reference to Z_R3.

> src/hotspot/cpu/s390/s390.ad line 851:
> 
>> 849:   Compile* C = ra_->C;
>> 850:   C2_MacroAssembler _masm(&cbuf);
>> 851:   // Register nmethod_tmp = Z_R3;
> 
> Don't kill R3! I'd use R0 and R1 only.

I would guess this is because, even though it's a volatile register, R3 can be a parameter or return value. So in the context of the compiler, we really don't want to use this register?

BTW, I have now removed my changes to this file.

> src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 1546:
> 
>> 1544: 
>> 1545:   BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler();
>> 1546:   bs->nmethod_entry_barrier(masm, Z_R3);
> 
> Don't kill R3! I'd use R0 and R1 only.

Thanks!

> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2869:
> 
>> 2867:     // Save caller's sp & return_pc
>> 2868:     __ push_frame(frame::z_abi_16_size);
>> 2869:     __ save_return_pc();
> 
> Wrong order: return pc needs to get stored in the caller's frame header (before push_frame). See PPC64 or other usages on s390.

Thanks.

> Z_ARG1 should point to the address _z_abi16(return_pc) + Z_SP in the caller frame.

This matches what the PPC implementation does, but when I do the same thing on s390 I get a cache miss in nmethod_stub_entry_barrier (the vm-call). It looked as though CodeCache::find_blob expects the address of the start of the compiled code, so I tried subtracting the size of the barrier from R14 (which currently points to end of the barrier in the compiled frame). After doing this I no longer saw the CodeCache miss.

> I'm missing save_volatile_gprs & restore_volatile_gprs for GP and FP regs.

I had been trying to get the volatile registers saved, but didn't have any luck. I tried it today with your suggestions and it worked like a charm. Not sure what the difference was. Thanks for the pointers.

> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2880:
> 
>> 2878:     __ z_stg(Z_R1_scratch, _z_abi(carg_2), Z_R0, Z_SP); // SP[abi_carg2] <- R1
>> 2879:     __ z_la(Z_ARG1, _z_abi(carg_2), Z_R0, Z_SP);        // R2 <- SP + abi_carg2
>> 2880:     // __ z_la(Z_ARG1, _z_abi(return_pc), Z_R0, Z_SP);
> 
> Offset needs to be computed relative to the callee's SP, here: _z_abi(return_pc) + frame::z_abi_160_size + nbytes_volatile

Got it. Thanks.

> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2890:
> 
>> 2888:     // if (return val != 0)
>> 2889:     // return to caller
>> 2890:     __ z_bcr(Assembler::bcondNotZero, Z_R14);
> 
> The condition is inverted! Should be `bcondZero`.

That appears to have solved it. Many thanks 😄

> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2894:
> 
>> 2892:     // if (return val != 0)
>> 2893:     // return to caller
>> 2894:     __ z_ltr(Z_R0_scratch, Z_R0_scratch);
> 
> This is redundant. Flags were already set above and not killed by restore_volatile_regs + pop_frame + restore_return_pc.

Agreed. Thanks for catching that.

> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2895:
> 
>> 2893:     // Get handle to wrong-method-stub for s390
>> 2894:     __ load_const_optimized(Z_R1_scratch, SharedRuntime::get_handle_wrong_method_stub());
>> 2895:     __ z_br(Z_R1_scratch);
> 
> Missing: Pop the frame built in the prologue and load the respective return_pc before the jump (see PPC64).

I saw that the PPC impl does that, and had been experimenting with it. But it always felt weird to call pop_frame twice here.

Thanks for confirming that it is necessary. I believe I have a better understanding of why this is needed.

-------------

PR: https://git.openjdk.org/jdk/pull/10558


More information about the hotspot-dev mailing list