Re: RVC by default?

Xiaolin Zheng yunyao.zxl at alibaba-inc.com
Thu Sep 15 15:21:43 UTC 2022


Hi Vladimir,
Extremely interesting, which allows me remember the https://github.com/riscv-collab/riscv-openjdk/issues/23 <https://github.com/riscv-collab/riscv-openjdk/issues/23 > issue one year before: the "hs_err_pid4188454.log <https://github.com/riscv-collab/riscv-openjdk/files/7230851/hs_err_pid4188454.log >".
At that time I was developing one RVC prototype; after merge it, we found this issue.
We can see it has nearly the same interesting thing here, in a similar form.
In that hs_err, 
```
> x1(ra) =0x0000003fa9732c66 is at code_begin+470 in ...
```
ra is 0x0000003fa9732c66, so if we search the ra:
```
DeoptimizationBlob
--------------------------------------------------------------------------------
Decoding CodeBlob, name: DeoptimizationBlob, at [0x0000003fa9732a90, 0x0000003fa9732d38] 680 bytes
 0x0000003fa9732a90: addi sp,sp,-16
 0x0000003fa9732a92: sd ra,8(sp)
 0x0000003fa9732a94: sd s0,0(sp)
 0x0000003fa9732a96: mv s0,sp
 0x0000003fa9732a98: addi sp,sp,-240
 0x0000003fa9732a9a: sd zero,0(sp)
 0x0000003fa9732a9c: sd gp,8(sp)
 0x0000003fa9732a9e: sd tp,16(sp)
 0x0000003fa9732aa0: sd t0,24(sp)
...
...
0x0000003fa9732c36: fsd ft9,232(sp)
 0x0000003fa9732c38: fsd ft10,240(sp)
 0x0000003fa9732c3a: fsd ft11,248(sp)
 0x0000003fa9732c3c: li s10,1
 0x0000003fa9732c3e: ld a3,936(s7)
 0x0000003fa9732c42: sd a3,8(s0)
 0x0000003fa9732c44: sd zero,936(s7)
 0x0000003fa9732c48: auipc t0,0x0
 0x0000003fa9732c4c: addi t0,t0,30 # 0x0000003fa9732c66
 0x0000003fa9732c50: sd t0,664(s7)
 0x0000003fa9732c54: mv t0,sp
 0x0000003fa9732c56: sd t0,656(s7)
 0x0000003fa9732c5a: mv a0,s7
 0x0000003fa9732c5c: mv a1,s10
 0x0000003fa9732c5e: auipc t0,0xf2c4
 0x0000003fa9732c62: jalr -1722(t0) # 0x0000003fb89f65a4 = DeoptReasonSerializer::serialize(JfrCheckpointWriter&)+220 <- ??????? here, very alike your hs_err right? :-) It's flying away. Of course we are not calling JFR here.
 0x0000003fa9732c66: sd zero,656(s7)
 0x0000003fa9732c6a: sd zero,664(s7)
 0x0000003fa9732c6e: mv a5,a0
 0x0000003fa9732c70: lwu s10,60(a5)
```
In your presented hs_err, the flying pc is:
```
0x0000003f8458af26: jalr 2014(t0) # 0x0000003f94507700 = SharedRuntime::look_for_reserved_stack_annotated_method(JavaThread*, frame)+892
```
Of course in the resolve_static_call it should definitely not going to a function (SharedRuntime::look_for_reserved_stack_annotated_method) only SIG handler calls.
(Specially thanks Feilong for preserving the original issue so I can find it)
I have no proof that these two are the same issue anyway, but they look so much similar, which recalls me maybe there has some relationship.
The solution at that time is https://github.com/riscv-collab/riscv-openjdk/issues/23#issuecomment-939247481 <https://github.com/riscv-collab/riscv-openjdk/issues/23#issuecomment-939247481 > , from Yadong.
I would like to hear from your opinions about this, for you have the reproducible environment. This is an indeed interesting issue.
Best,
Xiaolin
------------------------------------------------------------------
From:Vladimir Kempik <vladimir.kempik at gmail.com>
Send Time:2022年9月15日(星期四) 22:17
To:郑孝林(云矅) <yunyao.zxl at alibaba-inc.com>
Cc:riscv-port-dev <riscv-port-dev-retn at openjdk.org>; Aleksey Shipilev <shade at redhat.com>; riscv-port-dev at openjdk.org <riscv-port-dev at openjdk.org>
Subject:Re: RVC by default?
The main thing there is the amount of to be allocated memory, clearly it’s not normal.
Here is another bug with RVC which again happens only on fpga, but stable and I can see wrong code generation, could be interesting to you:
# SIGILL (0x4) at pc=0x0000003f94507700, pid=363, tid=364
Stack: [0x0000003f939b0000,0x0000003f93bb0000], sp=0x0000003f93bad9a0, free space=2038k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x8dc700] SharedRuntime::look_for_reserved_stack_annotated_method(JavaThread*, frame)+0x37c
J 2 c1 java.lang.String.hashCode()I java.base (60 bytes) @ 0x0000003f7d000874 [0x0000003f7d000580+0x00000000000002f4]
j java.util.ImmutableCollections$SetN.probe(Ljava/lang/Object;)I+1 java.base
j java.util.ImmutableCollections$SetN.<init>([Ljava/lang/Object;)V+35 java.base
j java.util.Set.of([Ljava/lang/Object;)Ljava/util/Set;+64 java.base
j jdk.internal.module.SystemModules$default.moduleDescriptors()[Ljava/lang/module/ModuleDescriptor;+1890 java.base
j jdk.internal.module.SystemModuleFinders.of(Ljdk/internal/module/SystemModules;)Ljava/lang/module/ModuleFinder;+1 java.base
j jdk.internal.module.ModuleBootstrap.boot2()Ljava/lang/ModuleLayer;+240 java.base
j jdk.internal.module.ModuleBootstrap.boot()Ljava/lang/ModuleLayer;+64 java.base
j java.lang.System.initPhase2(ZZ)I+0 java.base
v ~StubRoutines::call_stub 0x0000003f8453849c
V [libjvm.so+0x5b8018] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x1d6
V [libjvm.so+0x5b8274] JavaCalls::call_static(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, JavaThread*)+0xe8
V [libjvm.so+0xa0320c] Threads::create_vm(JavaVMInitArgs*, bool*)+0x63c
V [libjvm.so+0x647d6c] JNI_CreateJavaVM+0x6a
C [libjli.so+0x3658] JavaMain+0x7a
C [libjli.so+0x670e] ThreadJavaMain+0xc
C [libc.so.6+0x675a6]
C [libc.so.6+0xb3a02]
pc =0x0000003f94507700
x1(ra) =0x0000003f8458af2a is at code_begin+170 in 
[CodeBlob (0x0000003f8458ae10)]
Framesize: 62
Runtime Stub (0x0000003f8458ae10): resolve_static_call
--------------------------------------------------------------------------------
Decoding CodeBlob, name: resolve_static_call, at [0x0000003f8458ae80, 0x0000003f8458b078] 504 bytes
 0x0000003f8458ae80: addi sp,sp,-16
 0x0000003f8458ae84: sd ra,8(sp)
 0x0000003f8458ae88: sd s0,0(sp)
 0x0000003f8458ae8c: addi s0,sp,16
 0x0000003f8458ae90: addi sp,sp,-224
 0x0000003f8458ae92: sd t0,8(sp)
…..
0x0000003f8458af08: fsd ft11,248(sp)
 0x0000003f8458af0a: auipc t0,0x0
 0x0000003f8458af0e: addi t0,t0,32 # 0x0000003f8458af2a
 0x0000003f8458af12: sd t0,712(s7)
 0x0000003f8458af16: mv t0,sp
 0x0000003f8458af1a: sd t0,704(s7)
 0x0000003f8458af1e: mv a0,s7
 0x0000003f8458af22: auipc t0,0xff7c
 0x0000003f8458af26: jalr 2014(t0) # 0x0000003f94507700 = SharedRuntime::look_for_reserved_stack_annotated_method(JavaThread*, frame)+892
 0x0000003f8458af2a: sd zero,704(s7)
 0x0000003f8458af2e: sd zero,712(s7)
 0x0000003f8458af32: ld t0,8(s7)
 0x0000003f8458af36: bnez t0,0x0000003f8458afd8
 0x0000003f8458af3a: ld t6,800(s7)
 0x0000003f8458af3e: sd zero,800(s7)
 0x0000003f8458af42: sd t6,472(sp)
 0x0000003f8458af46: sd a0,264(sp)
the interesting line here is 
 0x0000003f8458af26: jalr 2014(t0) # 0x0000003f94507700 = SharedRuntime::look_for_reserved_stack_annotated_method(JavaThread*, frame)+892
+892 offset is 0x37c in hex, it’s exactly our crash site as backtrace says ( aka pc = 0x0000003f94507700) , lets check what’s there
0x0000003f945076e0: c9 02 23 34 d9 02 23 38 f9 02 ef 60 30 ff 7d 77
0x0000003f945076f0: 93 07 87 70 a2 97 90 63 aa 85 17 65 28 00 13 05
0x0000003f94507700: 65 9e 97 30 b2 ff e7 80 20 05 7d 77 13 07 87 70
0x0000003f94507710: 17 ac 36 00 03 3c 0c 5e 22 97 18 63 83 47 1c 1f
0x3f945076e0: 02c9 addi t0,t0,18
0x3f945076e2: 02d93423 sd a3,40(s2)
0x3f945076e6: 02f93823 sd a5,48(s2)
0x3f945076ea: ff3060ef jal ra,-1019918 # 0x3f9440e6dc
0x3f945076ee: 777d lui a4,-4096
0x3f945076f0: 70870793 addi a5,a4,1800
0x3f945076f4: 97a2 add a5,a5,s0
0x3f945076f6: 6390 ld a2,0(a5)
0x3f945076f8: 85aa mv a1,a0
0x3f945076fa: 00286517 auipc a0,2646016 # 0x3f9478d6fa
0x3f945076fe: 9e650513 addi a0,a0,-1562
0x3f94507702: ffb23097 auipc ra,-5099520 # 0x3f9402a702
0x3f94507706: 052080e7 jalr ra,ra,82
0x3f9450770a: 777d lui a4,-4096
0x3f9450770c: 70870713 addi a4,a4,1800
0x3f94507710: 0036ac17 auipc s8,3579904 # 0x3f94871710
0x3f94507714: 5e0c3c03 ld s8,1504(s8)
0x3f94507718: 9722 add a4,a4,s0
0x3f9450771a: 6318 ld a4,0(a4)
0x3f9450771c: 1f1c4783 lbu a5,497(s8)
so, jalr jumped into the middle of opcode
0x3f945076fe: 9e650513 addi a0,a0,-1562
So this could be an issue with runtime blob generation.
Regards, Vladimir
15 сент. 2022 г., в 16:01, Xiaolin Zheng <yunyao.zxl at alibaba-inc.com <mailto:yunyao.zxl at alibaba-inc.com >> написал(а):
Hi Vladimir,
Thank you for the information. RVC's performance gain is a side effect alike thing, and it seems the larger the icache size, the less performance gain of it. Besides, the current RVC implement in the backend is only a basic one, covering some of C2 match rules, far from complete. So I might not assume observing performance gain with the current RVC implementation, but we should also not observe regressions of generated code here. So of course I'd agree with your analysis.
The second one seems interesting as well. Weird, it seems a common native out of memory, so shouldn't turning off RVC reveal the same issue, I guess? I will wait for the JBS issue and do some JVM options tuning to simulate that case to see if I can reproduce it in the meantime.
Best,
Xiaolin
------------------------------------------------------------------
From:Vladimir Kempik <vladimir.kempik at gmail.com <mailto:vladimir.kempik at gmail.com >>
Send Time:2022年9月15日(星期四) 20:33
To:郑孝林(云矅) <yunyao.zxl at alibaba-inc.com <mailto:yunyao.zxl at alibaba-inc.com >>
Cc:riscv-port-dev <riscv-port-dev-retn at openjdk.org <mailto:riscv-port-dev-retn at openjdk.org >>; Aleksey Shipilev <shade at redhat.com <mailto:shade at redhat.com >>; riscv-port-dev at openjdk.org <mailto:riscv-port-dev at openjdk.org > <riscv-port-dev at openjdk.org <mailto:riscv-port-dev at openjdk.org >>
Subject:Re: RVC by default?
Hello
Yes, it’s slightly unstable. even on fpga.
I have found I can compare results only from two consequential runs ( e.g. first run without RVC, second run with RVC), then some average result from iterations 5-15, removing some too slow results.
I think your results shows no perf gain from RVC, that’s expected as RVC gives no perf improvements for opcodes, only requiring less i-cache space.
Another interesting moment with RVC, I see some jdk failure only when RVC is enabled and only on fpga. ( on philosophers test)
it’s very strange, I will try to debug it and file a bug in JBS if it turns out to be a real jdk bug (or this could easily be a fpga "core" issue)
Regards, Vladimir
# Native memory allocation (malloc) failed to allocate 4352974235792 bytes for Chunk::new
# Out of Memory Error (arena.cpp:184), pid=5722, tid=5723
Stack: [0x0000003f83111000,0x0000003f83311000], sp=0x0000003f8330e2e0, free space=2036k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xa6c064] VMError::report_and_die(int, char const*, char const*, void*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x16a
V [libjvm.so+0xa6ca9e] VMError::report_and_die(Thread*, char const*, int, unsigned long, VMErrorType, char const*, void*)+0x28
V [libjvm.so+0x3ff306] report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*, ...)+0x6a
V [libjvm.so+0x2603de] Chunk::operator new(unsigned long, AllocFailStrategy::AllocFailEnum, unsigned long)+0x108
V [libjvm.so+0x260cf2] Arena::grow(unsigned long, AllocFailStrategy::AllocFailEnum)+0x36
V [libjvm.so+0x8d7392] AdapterHandlerLibrary::create_adapter(AdapterBlob*&, int, BasicType*, bool)+0x39e
V [libjvm.so+0x8dcb7e] AdapterHandlerLibrary::get_adapter(methodHandle const&)+0x41e
J 5 c1 java.util.ImmutableCollections$SetN.probe(Ljava/lang/Object;)I java.base (56 bytes) @ 0x0000003f696e5858 [0x0000003f696e5700+0x0000000000000158]
j java.util.ImmutableCollections$SetN.<init>([Ljava/lang/Object;)V+35 java.base
j java.util.Set.of([Ljava/lang/Object;)Ljava/util/Set;+64 java.base
j jdk.internal.module.SystemModules$default.moduleDescriptors()[Ljava/lang/module/ModuleDescriptor;+3619 java.base
j jdk.internal.module.SystemModuleFinders.of(Ljdk/internal/module/SystemModules;)Ljava/lang/module/ModuleFinder;+1 java.base
j jdk.internal.module.ModuleBootstrap.boot2()Ljava/lang/ModuleLayer;+240 java.base
j jdk.internal.module.ModuleBootstrap.boot()Ljava/lang/ModuleLayer;+64 java.base
j java.lang.System.initPhase2(ZZ)I+0 java.base
v ~StubRoutines::call_stub 0x0000003f70c1c49c
V [libjvm.so+0x5b790c] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x1d6
V [libjvm.so+0x5b7b68] JavaCalls::call_static(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, JavaThread*)+0xe8
V [libjvm.so+0xa0281c] Threads::create_vm(JavaVMInitArgs*, bool*)+0x63c
V [libjvm.so+0x647656] JNI_CreateJavaVM+0x6a
C [libjli.so+0x3658] JavaMain+0x7a
C [libjli.so+0x670e] ThreadJavaMain+0xc
15 сент. 2022 г., в 06:25, Xiaolin Zheng <yunyao.zxl at alibaba-inc.com <mailto:yunyao.zxl at alibaba-inc.com >> написал(а):
Hi Vladimir,
There are some minor updates for the philosophers in Renaissance discussed days before: I have tested the philosophers on my Unmatched board, and found the test itself seems not stable, even if the JMH version. I gave its JMH version a two-day long run, exclusively, but the score varies in the 13000 ms/op range (iterations = 30 by default), even if RVC doesn't get turned on. Have you encountered the same issue?
+ /home/ubuntu/yunyao/jdk19-release/bin/java -XX:+UnlockExperimentalVMOptions -XX:-UseRVC -jar renaissance-jmh-0.14.1.jar org.renaissance.scala.stm.JmhPhilosophers.runOperation
JmhPhilosophers.runOperation ss 40 14307.472 ± 656.456 ms/op
JmhPhilosophers.runOperation ss 40 13175.640 ± 303.038 ms/op
JmhPhilosophers.runOperation ss 40 13474.124 ± 349.349 ms/op
JmhPhilosophers.runOperation ss 40 13545.786 ± 327.735 ms/op
JmhPhilosophers.runOperation ss 40 13085.097 ± 306.891 ms/op
JmhPhilosophers.runOperation ss 40 12880.270 ± 265.028 ms/op
JmhPhilosophers.runOperation ss 40 13232.006 ± 209.613 ms/op
JmhPhilosophers.runOperation ss 40 13334.098 ± 443.757 ms/op
JmhPhilosophers.runOperation ss 40 13168.990 ± 575.965 ms/op
JmhPhilosophers.runOperation ss 40 13424.250 ± 381.084 ms/op
JmhPhilosophers.runOperation ss 40 13655.426 ± 428.624 ms/op
JmhPhilosophers.runOperation ss 40 14430.485 ± 488.797 ms/op
JmhPhilosophers.runOperation ss 40 13999.061 ± 359.320 ms/op
JmhPhilosophers.runOperation ss 40 13623.308 ± 531.513 ms/op
JmhPhilosophers.runOperation ss 40 13757.331 ± 373.905 ms/op
+ /home/ubuntu/yunyao/jdk19-release/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseRVC -jar renaissance-jmh-0.14.1.jar org.renaissance.scala.stm.JmhPhilosophers.runOperation
JmhPhilosophers.runOperation ss 40 12772.517 ± 227.409 ms/op
JmhPhilosophers.runOperation ss 40 13456.228 ± 498.724 ms/op
JmhPhilosophers.runOperation ss 40 13727.211 ± 476.491 ms/op
JmhPhilosophers.runOperation ss 40 13122.838 ± 246.673 ms/op
JmhPhilosophers.runOperation ss 40 13082.768 ± 405.194 ms/op
JmhPhilosophers.runOperation ss 40 13905.753 ± 456.474 ms/op
JmhPhilosophers.runOperation ss 40 13503.479 ± 351.191 ms/op
JmhPhilosophers.runOperation ss 40 13365.138 ± 380.285 ms/op
JmhPhilosophers.runOperation ss 40 13842.509 ± 487.629 ms/op
JmhPhilosophers.runOperation ss 40 13965.286 ± 330.423 ms/op
JmhPhilosophers.runOperation ss 40 13615.975 ± 352.590 ms/op
JmhPhilosophers.runOperation ss 40 13564.777 ± 452.947 ms/op
JmhPhilosophers.runOperation ss 40 13720.022 ± 519.965 ms/op
JmhPhilosophers.runOperation ss 40 14033.287 ± 404.377 ms/op
JmhPhilosophers.runOperation ss 40 13680.432 ± 539.549 ms/op
The noise here is a little big; I was wondering if it's stable on the FPGA?
Maybe I need to find some more stable tests anyway.
Best,
Xiaolin
------------------------------------------------------------------
From:Vladimir Kempik <vladimir.kempik at gmail.com <mailto:vladimir.kempik at gmail.com >>
Send Time:2022年9月8日(星期四) 20:24
To:郑孝林(云矅) <yunyao.zxl at alibaba-inc.com <mailto:yunyao.zxl at alibaba-inc.com >>
Cc:riscv-port-dev <riscv-port-dev-retn at openjdk.org <mailto:riscv-port-dev-retn at openjdk.org >>; Aleksey Shipilev <shade at redhat.com <mailto:shade at redhat.com >>; riscv-port-dev at openjdk.org <mailto:riscv-port-dev at openjdk.org > <riscv-port-dev at openjdk.org <mailto:riscv-port-dev at openjdk.org >>
Subject:Re: RVC by default?
Hello
To be more specific, I saw slight perf decrease with RVC only on a core running on fpga.
On thead c910 results ( -RVC and + RVC) are on par.
Regards, Vladimir
8 сент. 2022 г., в 15:09, Xiaolin Zheng <yunyao.zxl at alibaba-inc.com <mailto:yunyao.zxl at alibaba-inc.com >> написал(а):
Hi Aleksey and Vladimir,
The current RVC support is okay but not complete: it only covers ~10% of total instructions emitted (mostly C2 code, including some part of Stub code), and we might want to transform instructions into the compressed counterparts as much as possible, so maybe the design will change from a whitelist mode (the class CompressibleRegion) to a black list mode. There is one implementation at my local branch https://github.com/zhengxiaolinX/jdk/commits/REBASE-rvc-beautify <https://github.com/zhengxiaolinX/jdk/commits/REBASE-rvc-beautify > (might not be stable yet, I have not gotten enough time to give it a sufficient test on jtregs and specjbb2015/other benchmarks yet). There are plans reserved to commit them (which cover ~20% of instructions under some tests) after reviewing, but this is currently WIP and waiting loom port to merge first.
And thank you Vladimir for your observations, I will test the Renaissance benchmark as you have mentioned. I have given tests for specjbb2015 months before and found slight performance increase there; as far as I know, the compile time will increase for the transformation logic is extra overhead during the instruction emission phase, such as the code in Assembler::add. Theoretically, when running the compiled code with RVC turning on, though IPC and CPI are not changed, the code size shrinks; I think it should have the same effect as the icache size becoming larger. Maybe something goes wrong? :-) I might need to look into the performance problem in a high priority, so will test the Renaissance first.
Best,
Xiaolin
------------------------------------------------------------------
From:Aleksey Shipilev <shade at redhat.com <mailto:shade at redhat.com >>
Send Time:2022年9月8日(星期四) 18:34
To:undefined <undefined>; undefined <undefined>
Subject:RVC by default?
Hi,
I was looking at some generated code on RISC-V, and realized while we have RVC support, we don't 
enable it by default. On HiFive Unleashed:
$ test-jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal 2>&1 | grep RVC
 bool UseRVC = false {ARCH 
experimental} {default}
Is there a reason not to do RVC by default? Can we reliably poll the RVC capabilities in current 
hardware?
-- 
Thanks,
-Aleksey
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/riscv-port-dev/attachments/20220915/91169ba9/attachment-0001.htm>


More information about the riscv-port-dev mailing list