RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java

Fri Apr 4 09:27:53 UTC 2025

On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this patch?
> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double.
> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path.
> So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`.
> 
> Tested on both x86 and riscv64.
> 
> Thanks

026     lh  R28, [R11, #12]     # short, #@loadS ! Field: jdk/incubator/vector/Float16.value (constant)
02a     NullCheck R11

02a     B2: #   out( B5 B3 ) <- in( B1 )  Freq: 0.999999
02a +    --     // R23=Thread::current(), empty, #@tlsLoadP
02a     ld  R10, [R23, #464]    # ptr, #@loadP
02e +   fmv.h.x F2, zr  # float, #@loadConH0
032 +   fmv.h.x F0, R28
036 +   ld  R7, [R23, #480]     # ptr, #@loadP
03a +   addi  R28, R10, #16     # ptr, #@addP_reg_imm
03e +   binop_hf F3, F2, F0
042 +   bgeu  R28, R7, B5       #@cmpP_branch  P=0.000100 C=-1.000000

046     B3: #   out( B4 ) <- in( B2 )  Freq: 0.999899
046 +   mv R7, #1       # long, #@loadConL
048 +   sd  R28, [R23, #464]    # ptr, #@storeP
04c +   mv  R29, narrowklass: precise jdk/incubator/vector/Float16: 0x00007fa4dc396ba8 (java/io/Serializable,java/lang/Comparable):Constant:exact *     # compressed klass ptr, #@loadConNKlass
058 +   sd  R7, [R10]   # long, #@storeL
05c +   sw  R29, [R10, #8]      # compressed klass ptr, #@storeNKlass
060 +   prefetch_w [R28, #192]  # Prefetch for write
064 +   sw  zr, [R10, #12]      # int, #@storeimmI0

068     B4: #   out( N1 ) <- in( B6 B3 )  Freq: 0.999999
068 +   fmv.h.x F0, zr  # float, #@loadConH0
06c +   binop_hf F0, F0, F3
070 +   fmv.x.h R28, F0
074 +   sh  R28, [R10, #12]     # short, #@storeC
078
078 +   MEMBAR-store-store      #@membar_storestore
078 +   # checkcastPP of R10, #@checkCastPP
078     # pop frame 48
        add  sp, sp, #48
        ld  ra, [sp,#-16]
        ld  fp, [sp,#-8]
        # test polling word
        ld t0, [xthread,#40]
        bgtu sp, t0, #slow_path
08a +   ret     // return register, #@Ret

08c     B5: #   out( B8 B6 ) <- in( B2 )  Freq: 0.000100016
08c +   fmv.w.x F0, zr  # float, #@loadConF0
090 +   convHF2SAndHF2F F2, F3
094 +   fsub.s  F0, F0, F2      #@subF_reg_reg
098     spill F3 -> [sp, #0]    # spill size = 32
09c +   spill F0 -> [sp, #4]    # spill size = 32
0a0 +   mv  R11, precise jdk/incubator/vector/Float16: 0x00007fa4dc396ba8 (java/io/Serializable,java/lang/Comparable):Constant:exact *  # ptr, #@loadConP
0b8     CALL,static 0x00007fa4cb85ba80  #@CallStaticJavaDirect wrapper for: C2 Runtime new_instance
        # jdk.incubator.vector.Float16::valueOf @ bci:0 (line 329) L[0]=sp + #4
        # jdk.incubator.vector.Float16::subtract @ bci:9 (line 1137) L[0]=_ L[1]=_
        # compiler.floatingpoint.TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 62) L[0]=_
        # OopMap {off=196/0xc4}

0d0     B6: #   out( B4 ) <- in( B5 )  Freq: 0.000100014
        # Block is sole successor of call
0d0 +   spill [sp, #0] -> F3    # spill size = 32
0d4 +   j  B4   #@branch

Thanks for having a look, interesting!

Please check the B5 block (which is when TLAB run out I think), there is an *extra* `fsub.s` putting value into F0, but F0 value is not really used, as in B4 block it just loads zero into F0. This fsub.s should be useless, although it should do no harness in the sense of correctness.

I checked the x86, find out I don't have the CPU feature to generate real float16 instructions, so it only has 2 SubF rather than SubHF which I think is expected for Float16.
Not sure if this useless `fsub.s` is only an issue on riscv or maybe also on x86 if `supports_avx512_fp16` return true. It will be great if someone can help to verify it.

Also not sure if this useless `extra` instruction (here it's `fsub.s`) could be generated in other situations.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778059148