<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">Hello.<div>Currently (if RVV is not used), copy_memory does not perform very well.</div><div>At best it copies just 8 bytes per loop iteration (the copy8 label: one ld, one sd).</div><div><br></div><div>I propose that we use a faster version when possible: </div><div>4 ld instructions in a row followed by 4 sd, copying 32 bytes per loop iteration, similar to [1].</div><div><br></div><div>I have made a prototype [2]; see the copy32 label there. It also has some comments on other parts of the copy_memory stub.</div><div>Here are the results of JMH testing on an rvb-ice thead c910 board:</div><div><br></div><div><div>Before (copy8 only)</div><div><br></div><div>Benchmark (size) Mode Cnt Score Error Units</div><div>ArrayCopyObject.conjoint_micro 31 thrpt 25 6653.095 ± 251.565 ops/ms</div><div>ArrayCopyObject.conjoint_micro 63 thrpt 25 4933.970 ± 77.559 ops/ms</div><div>ArrayCopyObject.conjoint_micro 127 thrpt 25 3627.454 ± 34.589 ops/ms</div><div>ArrayCopyObject.conjoint_micro 2047 thrpt 25 368.249 ± 0.453 ops/ms</div><div>ArrayCopyObject.conjoint_micro 4095 thrpt 25 187.776 ± 0.306 ops/ms</div><div>ArrayCopyObject.conjoint_micro 8191 thrpt 25 94.477 ± 0.340 ops/ms</div><div><br></div><div>After (with copy32)</div><div><br></div><div>Benchmark (size) Mode Cnt Score Error Units</div><div>ArrayCopyObject.conjoint_micro 31 thrpt 25 7620.546 ± 69.756 ops/ms</div><div>ArrayCopyObject.conjoint_micro 63 thrpt 25 6677.978 ± 33.112 ops/ms</div><div>ArrayCopyObject.conjoint_micro 127 thrpt 25 5206.973 ± 22.612 ops/ms</div><div>ArrayCopyObject.conjoint_micro 2047 thrpt 25 653.655 ± 31.494 ops/ms</div><div>ArrayCopyObject.conjoint_micro 4095 thrpt 25 352.905 ± 7.390 ops/ms</div><div>ArrayCopyObject.conjoint_micro 8191 thrpt 25 165.127 ± 0.832 ops/ms</div></div><div><br></div><div>However, I still have some issues with the code: when the copy mode is (!is_aligned and 
!is_backward), I get ClassNotFound exceptions from the class loader while trying to run the JMH tests.</div><div>I think this is related to my patch; I have made a simple workaround for this case [3] so that I could take some measurements.</div><div><br></div><div>Any help in catching these bugs is highly appreciated.</div><div><br></div><div>Best Regards, Vladimir.</div><div>[1] <a href="https://github.com/eblot/newlib/blob/master/newlib/libc/string/memcpy.c">https://github.com/eblot/newlib/blob/master/newlib/libc/string/memcpy.c</a></div><div>[2] <a href="https://github.com/VladimirKempik/jdk/commit/e113d454dc2808889906eceaa1fb9cd560140fbc">https://github.com/VladimirKempik/jdk/commit/e113d454dc2808889906eceaa1fb9cd560140fbc</a></div><div>[3] https://github.com/VladimirKempik/jdk/commit/e113d454dc2808889906eceaa1fb9cd560140fbc#r89241535</div></body></html>