Missaligned memory accesses from JDK
yangfei at iscas.ac.cn
Mon Mar 20 07:29:36 UTC 2023
Hi,
Just did a quick scan of the changes. I have a few comments.
It's interesting to see that changes were made in hotspot shared code, especially in file: src/hotspot/share/asm/codeBuffer.hpp
For each of the modified emit_intX functions, I see there is a corresponding version which handles unaligned access. For example, 'void emit_int16(uint8_t x1, uint8_t x2)' for 'void emit_int16(uint16_t x)'.
So if we encounter an unaligned access issue when using 'emit_int16(uint16_t x)', shouldn't we change the callsite to use 'emit_int16(uint8_t x1, uint8_t x2)' instead?
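To make the difference between the two overloads concrete, here is a minimal sketch. TinyBuffer is a hypothetical stand-in, not HotSpot's CodeSection, and the memcpy-based body of the one-argument overload is only an assumption about one portable way to write it, not what the patch does. The two-argument overload issues two single-byte stores, so it cannot fault on alignment, but the caller has to pick the byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a code buffer; NOT HotSpot's CodeSection.
struct TinyBuffer {
  uint8_t data[16] = {};
  size_t  pos = 0;

  // Byte-wise emit: two single-byte stores, valid at any alignment.
  // The caller decides the byte order by choosing x1 and x2.
  void emit_int16(uint8_t x1, uint8_t x2) {
    assert(pos + 2 <= sizeof(data));
    data[pos++] = x1;
    data[pos++] = x2;
  }

  // Whole-value emit in host byte order. memcpy (an assumption of this
  // sketch) leaves it to the compiler to pick stores that are legal for
  // the target, instead of dereferencing a possibly misaligned uint16_t*.
  void emit_int16(uint16_t x) {
    assert(pos + sizeof(x) <= sizeof(data));
    std::memcpy(data + pos, &x, sizeof(x));
    pos += sizeof(x);
  }
};
```

On a little-endian target such as RISC-V, a call site could pass the low byte first and the high byte second to get the same bytes as the one-argument overload.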
I think this will also be a potential issue for other platforms with respect to functionality or performance. It doesn't look nice for us to handle that in a platform-dependent way.
Also, instead of changing file: src/hotspot/share/interpreter/templateTable.hpp to add the new function 'load_unsigned_short_at_bcp', I personally prefer inlining it at its callsites.
And what about C1 & C2 JIT compilers?
Thanks,
Fei Yang
-----Original Messages-----
From:"Vladimir Kempik" <vladimir.kempik at gmail.com>
Sent Time:2023-03-17 18:50:19 (Friday)
To: riscv-port-dev <riscv-port-dev at openjdk.org>
Cc: yunyao.zxl at alibaba-inc.com
Subject: Missaligned memory accesses from JDK
Hello
Continuing on misaligned memory accesses from JDK.
Hearing no news from Yadong's team [4], I have decided to take a look myself.
I have an FPGA with RISC-V cores; in this configuration it has no support for misaligned loads/stores.
When such a memory access happens, it traps and the access is then emulated in M-mode (very similar to OpenSBI).
Also I have two perf counters - trp_lam/trp_sam (for misaligned loads and stores resp.)
Using them and perfasm.jar I can track every misaligned access.
I have started with the patch from Xiaolin Zheng (which fixes misaligned memory access when writing/reading instructions), that completely removed trp_sam events, but trp_lam was mostly unaffected.
Using perfasm.jar I have found the rest of trp_lam to originate from Template Interpreter's generated code.
Here are the numbers on current jdk21-dev (without Xiaolin's patch):
java -Xint -version
239163 trp_lam
16289 trp_sam
5.602736519 seconds time elapsed
5.260201000 seconds user
Total executed instructions - 380M
1:1600 (trp_lam:total) - a pretty high ratio.
I was able to identify and fix all of them (also applying Xiaolin's patch)
New results:
java -Xint -version
0 trp_lam
0 trp_sam
4.273510055 seconds time elapsed
3.926482000 seconds user
Notice the time improvement.
Also running renaissance philosophers in Xint mode for 20 minutes:
0 trp_lam
0 trp_sam
1290.397695196 seconds time elapsed
2099.607472000 seconds user
40.825845000 seconds sys
Clear win, for this fpga.
I can still get some trp_lam when running java -Xcomp -version, but their number is pretty low (less than 50) and they come from C2 generated code.
Now I need to check whether these changes affect performance on real hardware (I don't want to hurt performance there).
java -Xint -version is too fast for it, so I was running renaissance philosophers in Xint mode, just one repetition, multiple runs.
Checking on Thead (c910 core):
before:
671-684 seconds
after:
657-689 seconds
It's good that it’s not worse.
On HiFive Unmatched:
before:
2638-2663 seconds
after:
1489-1504 seconds
HiFive benefits from it.
I would like to get some pre-review for my patch [1]
Main points:
- safeness of using t0/t1 registers.
- the method void TemplateTable::load_unsigned_short_at_bcp(Register dst, int offset, Register tmp), maybe it has to be designed differently [2] [3]
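For discussion, the value such a helper has to produce can be modelled in plain C++ as below. This is only a sketch: load_unsigned_short_at is a hypothetical name, it is not the TemplateTable method from the patch (which emits RISC-V instructions rather than reading memory itself), and it assumes only that two-byte bytecode operands are stored big-endian, as the class file format specifies.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the computed value; NOT the method from the patch.
// Reads a big-endian unsigned 16-bit operand at bcp + offset using two
// single-byte loads, so the address never needs to be 2-byte aligned.
inline uint16_t load_unsigned_short_at(const uint8_t* bcp, size_t offset) {
  uint16_t hi = bcp[offset];      // first byte: most significant
  uint16_t lo = bcp[offset + 1];  // second byte: least significant
  return static_cast<uint16_t>((hi << 8) | lo);
}
```

The trade-off is two byte loads plus a shift and an or in place of one halfword load, in exchange for never taking a misaligned-load trap.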
The patch [1] has some comments describing how many trp_lam events each change eliminated.
Regards, Vladimir
[1] https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616
[2] https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616#diff-ecc50a63ee11d784ec34c55425afb755500a58f9ef4065cdc691fe18fce3692dR148
[3] https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616#diff-412c07ae1ae7770f87b04175c0d65ed3cc1f60dca186e3cfaf0af6b6d00b597eR104
[4] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-July/000563.html