Misaligned memory accesses from JDK
Vladimir Kempik
vladimir.kempik at gmail.com
Fri Mar 17 10:50:19 UTC 2023
Hello
Continuing on misaligned memory accesses from JDK.
Having heard no news from Yadong's team [4], I decided to take a look myself.
I have an FPGA with RISC-V cores; in this configuration it has no hardware support for misaligned loads/stores.
When such a memory access happens, it traps and the access is then emulated in M-mode (very similar to what OpenSBI does).
I also have two perf counters, trp_lam and trp_sam (for misaligned loads and stores, respectively).
Using them and perfasm.jar I can track every misaligned access.
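
For reference, an access is misaligned when its address is not a multiple of the access size. The sketch below models that check in C++; the function name is hypothetical and is not taken from the JDK or from the patch. On the FPGA configuration described above, every load for which this check fails would trap and bump trp_lam.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true when an access of `size` bytes at `addr` is naturally aligned.
// Assumption: `size` is a power of two (1, 2, 4 or 8), as for RISC-V
// lb/lh/lw/ld style accesses.
constexpr bool is_naturally_aligned(std::uintptr_t addr, std::size_t size) {
    return (addr & (size - 1)) == 0;
}

// An 8-byte access at 0x1000 is aligned; a 2-byte access at 0x1001 is not.
static_assert(is_naturally_aligned(0x1000, 8), "8-byte access at 0x1000");
static_assert(!is_naturally_aligned(0x1001, 2), "2-byte access at 0x1001");
```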
I started with the patch from Xiaolin Zheng, which fixes misaligned memory accesses when writing/reading instructions. That completely removed the trp_sam events, but trp_lam was mostly unaffected.
Using perfasm.jar, I traced the remaining trp_lam events to code generated by the Template Interpreter.
Here are the numbers on current jdk21-dev (without Xiaolin's patch):
java -Xint -version
239163 trp_lam
16289 trp_sam
5.602736519 seconds time elapsed
5.260201000 seconds user
Total executed instructions: 380M
1:1600 (trp_lam:total), a pretty high ratio.
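
The ratio can be checked with a few lines of C++. This is only a sketch of the arithmetic: the names are made up for illustration, and 380M is the rounded instruction total quoted above, so the result is approximate.

```cpp
// Average number of executed instructions per misaligned-load trap.
constexpr double instructions_per_trap(double total_instructions, double traps) {
    return total_instructions / traps;
}

// Figures from the run above; 380M is a rounded total, so this is approximate.
constexpr double kTotalInstructions = 380e6;
constexpr double kTrpLam = 239163;
constexpr double kRatio = instructions_per_trap(kTotalInstructions, kTrpLam);

// Roughly one misaligned load per 1600 executed instructions.
static_assert(kRatio > 1500 && kRatio < 1600, "about 1:1600");
```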
I was able to identify and fix all of them (with Xiaolin's patch applied as well).
New results:
java -Xint -version
0 trp_lam
0 trp_sam
4.273510055 seconds time elapsed
3.926482000 seconds user
Notice the time improvement.
Also, running renaissance philosophers in Xint mode for 20 minutes:
0 trp_lam
0 trp_sam
1290.397695196 seconds time elapsed
2099.607472000 seconds user
40.825845000 seconds sys
A clear win for this FPGA.
I can still get some trp_lam events when running java -Xcomp -version, but the count is pretty low (fewer than 50) and they come from C2-generated code.
Now I need to check whether these changes affect performance on real hardware (I don't want to regress it).
java -Xint -version finishes too quickly to measure there, so I ran renaissance philosophers in Xint mode, one repetition per run, over multiple runs.
On T-Head (C910 core):
before:
671-684 seconds
after:
657-689 seconds
Good: it is not worse.
On HiFive Unmatched:
before:
2638-2663 seconds
after:
1489-1504 seconds
The HiFive Unmatched clearly benefits from it.
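
To put the two boards side by side, the timings above can be turned into speedup factors. The sketch below uses the midpoint of each reported range, which is an assumption made here for illustration (the post only gives ranges), and the names are hypothetical. With these inputs the C910 comes out within noise of 1.0x and the HiFive Unmatched at roughly 1.77x.

```cpp
// Midpoint of a [low, high] range of run times, in seconds.
constexpr double midpoint(double low, double high) {
    return (low + high) / 2.0;
}

// Speedup = time before / time after; values above 1.0 mean the patched
// build is faster.
constexpr double speedup(double before, double after) {
    return before / after;
}

// Ranges taken from the measurements above.
constexpr double kC910Speedup =
    speedup(midpoint(671, 684), midpoint(657, 689));
constexpr double kUnmatchedSpeedup =
    speedup(midpoint(2638, 2663), midpoint(1489, 1504));
```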
I would like to get some pre-review of my patch [1].
Main points:
- the safety of using the t0/t1 registers.
- the method void TemplateTable::load_unsigned_short_at_bcp(Register dst, int offset, Register tmp); maybe it should be designed differently [2] [3]
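
As background for the second point: bytecode operands are stored big-endian and the bytecode pointer can sit at any byte offset, so a single halfword load of a 16-bit operand may be misaligned. A common way around this is to assemble the value from two byte loads, which are always aligned. The C++ below is only a model of that idea, not the macro-assembler code from the patch [1]; the function name is hypothetical, and whether the patch does exactly this should be checked against [2] and [3].

```cpp
#include <cstdint>

// Reads a 16-bit big-endian operand at `bcp + offset` using two single-byte
// loads. Byte loads are always naturally aligned, so this cannot trigger a
// misaligned-load trap regardless of where `bcp + offset` points.
inline std::uint16_t load_unsigned_short_bytewise(const std::uint8_t* bcp,
                                                  int offset) {
    const std::uint16_t hi = bcp[offset];      // most significant byte first
    const std::uint16_t lo = bcp[offset + 1];  // then least significant byte
    return static_cast<std::uint16_t>((hi << 8) | lo);
}
```

The cost is a second load plus a shift and an OR, and in generated code a second register to hold the other byte, in exchange for never trapping.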
The patch [1] has comments describing how many trp_lam events each change eliminated.
Regards, Vladimir
[1] https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616
[2] https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616#diff-ecc50a63ee11d784ec34c55425afb755500a58f9ef4065cdc691fe18fce3692dR148
[3] https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616#diff-412c07ae1ae7770f87b04175c0d65ed3cc1f60dca186e3cfaf0af6b6d00b597eR104
[4] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-July/000563.html