RFR: 8318157: RISC-V: implement ensureMaterializedForStackWalk intrinsic

Wed Nov 29 00:00:05 UTC 2023

On Tue, 28 Nov 2023 14:02:49 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Please, review addition of the ensureMaterializedForStackWalk C2 intrinsic for RISC-V. 
>> It's implemented by analogy to AArch64 and x86_64 [1].
>> 
>> The benchmark shows the following performance improvement on the T-Head RVB-ICE board:
>> **Before**
>> 
>> Benchmark                                             Mode  Cnt    Score    Error  Units
>> ScopedValues.CreateBindThenGetThenRemove_ScopedValue  avgt   10  873.072  12.051  ns/op
>> ScopedValues.bindThenGetNoRemove_ThreadLocal          avgt   10   55.507  18.006  ns/op
>> ScopedValues.bindThenGetThenRemove_ScopedValue        avgt   10  758.185  14.585  ns/op
>> ScopedValues.bindThenGetThenRemove_ThreadLocal        avgt   10  922.742   9.597  ns/op
>> ScopedValues.bindViaGet_ScopedValue                   avgt   10  603.164  13.694  ns/op
>> ScopedValues.bind_ScopedValue                         avgt   10  557.377   9.591  ns/op
>> ScopedValues.bind_ThreadLocal                         avgt   10  854.636  17.014  ns/op
>> ScopedValues.counter_ScopedValue                      avgt   10   35.367   0.421  ns/op
>> ScopedValues.counter_ThreadLocal                      avgt   10   45.345   0.283  ns/op
>> ScopedValues.setNoRemove_ScopedValue                  avgt   10   39.838   0.432  ns/op
>> ScopedValues.setNoRemove_ThreadLocal                  avgt   10   44.769   0.404  ns/op
>> ScopedValues.sixValues_ScopedValue                    avgt   10    2.566   0.006  us/op
>> ScopedValues.sixValues_ThreadLocal                    avgt   10    9.067   0.020  us/op
>> ScopedValues.thousandAdds_ScopedValue                 avgt   10    0.158   0.004  us/op
>> ScopedValues.thousandAdds_ThreadLocal                 avgt   10    9.662   0.019  us/op
>> ScopedValues.thousandIsBoundQueries                   avgt   10   31.203   0.451  ns/op
>> ScopedValues.thousandMaybeGets                        avgt   10  156.356   4.469  ns/op
>> 
>> **After**
>> 
>> Benchmark                                             Mode  Cnt    Score    Error  Units
>> ScopedValues.CreateBindThenGetThenRemove_ScopedValue  avgt   10  367.616   4.382  ns/op
>> ScopedValues.bindThenGetNoRemove_ThreadLocal          avgt   10   55.935  18.048  ns/op
>> ScopedValues.bindThenGetThenRemove_ScopedValue        avgt   10  371.733   2.835  ns/op
>> ScopedValues.bindThenGetThenRemove_ThreadLocal        avgt   10  902.112  12.709  ns/op
>> ScopedValues.bindViaGet_ScopedValue                   avgt   10  158.404   2.920  ns/op
>> ScopedValues.bind_ScopedValue                         avgt   10...
>
> Hi, thanks for finding this. Looks OK. Any correctness test performed?

@RealFYang, yes, tier1 tests passed.

The only thing, I'm hesitating about, is the size of __ nop(). I assume that it'll be 4 bytes because, for example, I see in code related to deoptimization the following:
[src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L1290](https://github.com/openjdk/jdk/blob/a40d8d97e84d88d1a65aba81bfc09339be95e427/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L1290)

    // First instruction must be a nop as it may need to be patched on deoptimisation
    {
      Assembler::IncompressibleRegion ir(masm);  // keep the nop as 4 bytes for patching.
      MacroAssembler::assert_alignment(__ pc());
      __ nop();  // 4 bytes
    } 

But from the other hand, according to the comments, in case of RVC __ nop() shall be 2 bytes:
[src/hotspot/cpu/riscv/riscv.ad#L1292](https://github.com/openjdk/jdk/blob/98f7d263915e20a78f1ab53016079e3f15988aa3/src/hotspot/cpu/riscv/riscv.ad#L1292)

    Assembler::CompressibleRegion cr(&_masm); // nops shall be 2-byte under RVC for alignment purposes.
    for (int i = 0; i < _count; i++) {
      __ nop();
    }

Is there possibility of error?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16808#issuecomment-1830964623