RFR: 8318157: RISC-V: implement ensureMaterializedForStackWalk intrinsic
Fei Yang
fyang at openjdk.org
Wed Nov 29 02:11:07 UTC 2023
On Tue, 28 Nov 2023 14:02:49 GMT, Fei Yang <fyang at openjdk.org> wrote:
>> Please, review addition of the ensureMaterializedForStackWalk C2 intrinsic for RISC-V.
>> It's implemented by analogy to AArch64 and x86_64 [1].
>>
>> The benchmark shows the following performance improvement on the T-Head RVB-ICE board:
>> **Before**
>>
>> Benchmark Mode Cnt Score Error Units
>> ScopedValues.CreateBindThenGetThenRemove_ScopedValue avgt 10 873.072 12.051 ns/op
>> ScopedValues.bindThenGetNoRemove_ThreadLocal avgt 10 55.507 18.006 ns/op
>> ScopedValues.bindThenGetThenRemove_ScopedValue avgt 10 758.185 14.585 ns/op
>> ScopedValues.bindThenGetThenRemove_ThreadLocal avgt 10 922.742 9.597 ns/op
>> ScopedValues.bindViaGet_ScopedValue avgt 10 603.164 13.694 ns/op
>> ScopedValues.bind_ScopedValue avgt 10 557.377 9.591 ns/op
>> ScopedValues.bind_ThreadLocal avgt 10 854.636 17.014 ns/op
>> ScopedValues.counter_ScopedValue avgt 10 35.367 0.421 ns/op
>> ScopedValues.counter_ThreadLocal avgt 10 45.345 0.283 ns/op
>> ScopedValues.setNoRemove_ScopedValue avgt 10 39.838 0.432 ns/op
>> ScopedValues.setNoRemove_ThreadLocal avgt 10 44.769 0.404 ns/op
>> ScopedValues.sixValues_ScopedValue avgt 10 2.566 0.006 us/op
>> ScopedValues.sixValues_ThreadLocal avgt 10 9.067 0.020 us/op
>> ScopedValues.thousandAdds_ScopedValue avgt 10 0.158 0.004 us/op
>> ScopedValues.thousandAdds_ThreadLocal avgt 10 9.662 0.019 us/op
>> ScopedValues.thousandIsBoundQueries avgt 10 31.203 0.451 ns/op
>> ScopedValues.thousandMaybeGets avgt 10 156.356 4.469 ns/op
>>
>> **After**
>>
>> Benchmark Mode Cnt Score Error Units
>> ScopedValues.CreateBindThenGetThenRemove_ScopedValue avgt 10 367.616 4.382 ns/op
>> ScopedValues.bindThenGetNoRemove_ThreadLocal avgt 10 55.935 18.048 ns/op
>> ScopedValues.bindThenGetThenRemove_ScopedValue avgt 10 371.733 2.835 ns/op
>> ScopedValues.bindThenGetThenRemove_ThreadLocal avgt 10 902.112 12.709 ns/op
>> ScopedValues.bindViaGet_ScopedValue avgt 10 158.404 2.920 ns/op
>> ScopedValues.bind_ScopedValue avgt 10...
>
> Hi, thanks for finding this. Looks OK. Any correctness test performed?
> @RealFYang, yes, tier1 tests passed.
>
> The only thing, I'm hesitating about, is the size of __ nop(). I assume that it'll be 4 bytes because, for example, I see in code related to deoptimization the following: [src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L1290](https://github.com/openjdk/jdk/blob/a40d8d97e84d88d1a65aba81bfc09339be95e427/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L1290)
>
> ```
> // First instruction must be a nop as it may need to be patched on deoptimisation
> {
> Assembler::IncompressibleRegion ir(masm); // keep the nop as 4 bytes for patching.
> MacroAssembler::assert_alignment(__ pc());
> __ nop(); // 4 bytes
> }
> ```
>
> But from the other hand, according to the comments, in case of RVC __ nop() shall be 2 bytes: [src/hotspot/cpu/riscv/riscv.ad#L1292](https://github.com/openjdk/jdk/blob/98f7d263915e20a78f1ab53016079e3f15988aa3/src/hotspot/cpu/riscv/riscv.ad#L1292)
>
> ```
> Assembler::CompressibleRegion cr(&_masm); // nops shall be 2-byte under RVC for alignment purposes.
> for (int i = 0; i < _count; i++) {
> __ nop();
> }
> ```
>
> Is there possibility of error?
Note that there is a `Assembler::IncompressibleRegion ir(&_masm);` declaration on the function entry [1]. This effectivly ensures that the `nop` planted here won't be compressed and thus will be 4-bytes in size.
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv.ad#L2369
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16808#issuecomment-1831077362
More information about the hotspot-compiler-dev
mailing list