<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div>Hello</div><div>Continuing the topic of misaligned memory accesses from the JDK.</div><div><div>Hearing no news from Yadong's team [4], I have decided to take a look myself.</div></div><div>I have an FPGA with RISC-V cores; in this configuration it has no support for misaligned loads/stores.</div><div>When such a memory access happens, it traps and the M-mode emulator is used (very similar to OpenSBI).</div><div>I also have two perf counters, trp_lam and trp_sam (for misaligned loads and stores respectively).</div><div>Using them and perfasm.jar I can track every misaligned access.</div><div>I started with the patch from Xiaolin Zheng (which fixes misaligned memory accesses when writing/reading instructions). That completely removed the trp_sam events, but trp_lam was mostly unaffected.</div><div>Using perfasm.jar I found that the remaining trp_lam events originate from the Template Interpreter's generated code.</div><div><br></div><div>Here are the numbers on current jdk21-dev (without Xiaolin's patch):</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>java -Xint -version</div><div><br></div><div> 239163 trp_lam </div><div> 16289 trp_sam </div><div><br></div><div> 5.602736519 seconds time elapsed</div><div> 5.260201000 seconds user</div><div><br></div><div>Total executed instructions - 380M</div><div>1:1600 (trp_lam:total) - a pretty high ratio.</div><div><br></div><div>I was able to identify and fix all of them (with Xiaolin's patch applied as well).</div><div>New results:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>java -Xint -version</div><div><br></div><div> 0 trp_lam </div><div> 0 trp_sam </div><div><br></div><div> 4.273510055 seconds time elapsed</div><div> 3.926482000 seconds user</div><div><br></div><div>Note the time improvement.</div><div><br></div><div>Also running 
renaissance philosophers in Xint mode for 20 minutes:</div><div><br></div><div> 0 trp_lam </div><div> 0 trp_sam </div><div><br></div><div> 1290.397695196 seconds time elapsed</div><div><br></div><div> 2099.607472000 seconds user</div><div> 40.825845000 seconds sys</div><div><br></div><div>A clear win for this FPGA.</div><div><br></div><div>I can still get some trp_lam events when running java -Xcomp -version, but their number is pretty low (fewer than 50) and they come from C2-generated code.</div><div><br></div><div>Now I need to check whether these changes affect performance on real hardware (I don't want to degrade it).</div><div>java -Xint -version is too fast for that, so I ran renaissance philosophers in Xint mode, just one repetition, multiple runs.</div><div><br></div><div>Checking on T-Head (C910 core):</div><div>before:</div><div>671-684 seconds</div><div><br></div><div>after:</div><div>657-689 seconds</div><div><br></div><div>Good, it's not worse.</div><div><br></div><div>On HiFive Unmatched:</div><div>before:</div><div>2638-2663 seconds</div><div><br></div><div>after:</div><div>1489-1504 seconds</div><div><br></div><div>HiFive benefits from it.</div><div><br></div><div>I would like to get some pre-review for my patch [1].</div><div>Main points:</div><div> - safety of using the t0/t1 registers.</div><div> - the method void TemplateTable::load_unsigned_short_at_bcp(Register dst, int offset, Register tmp); maybe it has to be designed differently [2] [3]</div><div><br></div><div><br></div><div> The patch [1] has some comments describing how many trp_lam events I eliminated at each place.</div><div><br></div><div>Regards, Vladimir</div><div><br></div><div> [1] https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616</div><div> [2] https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616#diff-ecc50a63ee11d784ec34c55425afb755500a58f9ef4065cdc691fe18fce3692dR148</div><div> [3] <a 
href="https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616#diff-412c07ae1ae7770f87b04175c0d65ed3cc1f60dca186e3cfaf0af6b6d00b597eR104">https://github.com/VladimirKempik/jdk/commit/18d7f399ce1bc213b2495411193938d914d3f616#diff-412c07ae1ae7770f87b04175c0d65ed3cc1f60dca186e3cfaf0af6b6d00b597eR104</a></div><div> [4] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-July/000563.html</div></body></html>