RFR: 8330266: RISC-V: Restore frm to RoundingMode::rne after JNI [v2]
Vladimir Kempik
vkempik at openjdk.org
Tue Apr 16 11:21:59 UTC 2024
On Tue, 16 Apr 2024 10:52:58 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:
>> Here are results from c910 ( licheePi4a). using:
>> a) jni_blank as is
>> b) modified jni_blank where native func() does this:
>>
>>
>> int x = 2;
>> asm
>> ("csrw fcsr,%0\n\t"
>> :
>> : "r" (x)
>> );
>>
>> branchful - exactly this PR
>> branchless - this PR without csrr&beq
>>
>> results:
>>
>> no fcsr change in native code
>>
>>
>> branchless
>> Benchmark Mode Cnt Score Error Units
>> CallOverheadConstant.jni_blank avgt 30 133.586 ? 1.431 ns/op
>> CallOverheadVirtual.jni_blank avgt 30 131.715 ? 0.570 ns/op
>>
>>
>> branchful
>> Benchmark Mode Cnt Score Error Units
>> CallOverheadConstant.jni_blank avgt 30 133.376 ? 1.491 ns/op
>> CallOverheadVirtual.jni_blank avgt 30 133.560 ? 1.782 ns/op
>>
>>
>> fcsr changed to rdn in native code
>>
>>
>>
>> branchless
>> Benchmark Mode Cnt Score Error Units
>> CallOverheadConstant.jni_blank avgt 30 153.708 ? 1.191 ns/op
>> CallOverheadVirtual.jni_blank avgt 30 150.653 ? 1.617 ns/op
>>
>> branchful
>> Benchmark Mode Cnt Score Error Units
>> CallOverheadConstant.jni_blank avgt 30 153.595 ? 0.759 ns/op
>> CallOverheadVirtual.jni_blank avgt 30 149.992 ? 1.605 ns/op
>>
>>
>> Basically there are not difference here ( thanks to BranchPredictor), so why would you make code more complex (and require additional registers) ?
>
> Great to see there is no performance degradation on the current generation of hardware. That's a great motivator to get this change in with the branch as we already know that the branch is going to bring a performance improvement on next/other generations of hardware.
Even when showing middle finger to branch predictor:
unsigned char prbarray[512];
static int setupme()
{
srand(time(NULL));
for(int i = 0; i < 512; i++)
{
int r = rand()%15;
prbarray[i]= (r > 10) ? (r-10) : 0;
}
return 1;
}
EXPORT void func() {
static int setup = 0;
static unsigned long counter = 0;
if(!setup) setup = setupme();
int x = prbarray[counter++%512];
asm
("csrw fcsr,%0\n\t"
:
: "r" (x)
);
}
results are still equal between branchless and branchful versions
I'm ok with this change, buts lets wait for yet another opinion: do we accept such "future proof" changes or not
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18785#discussion_r1567180887
More information about the hotspot-dev
mailing list