RFR: 8330266: RISC-V: Restore frm to RoundingMode::rne after JNI [v2]

Vladimir Kempik vkempik at openjdk.org
Tue Apr 16 11:21:59 UTC 2024


On Tue, 16 Apr 2024 10:52:58 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> Here are results from c910 ( licheePi4a). using:
>> a) jni_blank as is
>> b) modified jni_blank where native func() does this:
>> 
>> 
>> int x = 2;
>>     asm
>>         ("csrw fcsr,%0\n\t"
>>         :
>>         : "r" (x)
>>         );
>> 
>> branchful - exactly this PR
>> branchless - this PR without csrr&beq
>> 
>> results:
>> 
>> no fcsr change in native code
>> 
>> 
>> branchless
>> Benchmark                       Mode  Cnt    Score   Error  Units
>> CallOverheadConstant.jni_blank  avgt   30  133.586 ? 1.431  ns/op
>> CallOverheadVirtual.jni_blank   avgt   30  131.715 ? 0.570  ns/op
>> 
>> 
>> branchful
>> Benchmark                       Mode  Cnt    Score   Error  Units
>> CallOverheadConstant.jni_blank  avgt   30  133.376 ? 1.491  ns/op
>> CallOverheadVirtual.jni_blank   avgt   30  133.560 ? 1.782  ns/op
>> 
>> 
>> fcsr changed to rdn in native code
>> 
>> 
>> 
>> branchless
>> Benchmark                       Mode  Cnt    Score   Error  Units
>> CallOverheadConstant.jni_blank  avgt   30  153.708 ? 1.191  ns/op
>> CallOverheadVirtual.jni_blank   avgt   30  150.653 ? 1.617  ns/op
>> 
>> branchful
>> Benchmark                       Mode  Cnt    Score    Error  Units
>> CallOverheadConstant.jni_blank  avgt   30  153.595 ?  0.759  ns/op
>> CallOverheadVirtual.jni_blank   avgt   30  149.992 ?  1.605  ns/op
>> 
>> 
>> Basically there are not difference here ( thanks to BranchPredictor), so why would you make code more complex (and require additional registers) ?
>
> Great to see there is no performance degradation on the current generation of hardware. That's a great motivator to get this change in with the branch as we already know that the branch is going to bring a performance improvement on next/other generations of hardware.

Even when showing middle finger to branch predictor:

unsigned char prbarray[512];
static int setupme()
{
    srand(time(NULL));
    for(int i = 0; i < 512; i++)
    {
	int r = rand()%15;
	prbarray[i]= (r > 10) ? (r-10) : 0;
    }

    return 1;
}

EXPORT void func() {
    static int setup = 0;
    static unsigned long counter = 0;
    if(!setup) setup = setupme();
    int x = prbarray[counter++%512];
    asm
        ("csrw fcsr,%0\n\t"
        :
        : "r" (x)
        );
}


results are still equal between branchless and branchful versions

I'm ok with this change, buts lets wait for yet another opinion: do we accept such "future proof" changes or not

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18785#discussion_r1567180887


More information about the hotspot-dev mailing list