RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory

Eugene Astigeevich github.com+42899633+eastig at openjdk.java.net
Mon Nov 23 21:07:05 UTC 2020


On Sun, 22 Nov 2020 20:57:51 GMT, Eugene Astigeevich <github.com+42899633+eastig at openjdk.org> wrote:

>> JMH microbenchmark results for testChar:
>> |Benchmark|Length|Count|Units|ldpq vs ld4|Maximum Relative Error|
>> |-|-|-|-|-|-|
>> |ArrayCopyAligned.testChar|33|25|ns/op|-29.41%|0.73%|
>> |ArrayCopyAligned.testChar|34|25|ns/op|-30.14%|0.99%|
>> |ArrayCopyAligned.testChar|35|25|ns/op|-29.37%|0.44%|
>> |ArrayCopyAligned.testChar|36|25|ns/op|-29.85%|0.70%|
>> |ArrayCopyAligned.testChar|37|25|ns/op|-29.33%|0.65%|
>> |ArrayCopyAligned.testChar|38|25|ns/op|-29.69%|0.52%|
>> |ArrayCopyAligned.testChar|39|25|ns/op|-29.44%|0.79%|
>> |ArrayCopyAligned.testChar|40|25|ns/op|-29.82%|0.82%|
>> |ArrayCopyAligned.testChar|41|25|ns/op|-29.62%|0.74%|
>> |ArrayCopyAligned.testChar|42|25|ns/op|-29.88%|0.61%|
>> |ArrayCopyAligned.testChar|43|25|ns/op|-29.19%|0.64%|
>> |ArrayCopyAligned.testChar|44|25|ns/op|-29.89%|0.71%|
>> |ArrayCopyAligned.testChar|45|25|ns/op|-29.52%|0.80%|
>> |ArrayCopyAligned.testChar|46|25|ns/op|-29.71%|0.58%|
>> |ArrayCopyAligned.testChar|47|25|ns/op|-29.49%|0.71%|
>> |ArrayCopyAligned.testChar|48|25|ns/op|-29.89%|0.91%|
>> |ArrayCopyUnalignedBoth.testChar|33|25|ns/op|-29.04%|0.87%|
>> |ArrayCopyUnalignedBoth.testChar|34|25|ns/op|-29.21%|0.70%|
>> |ArrayCopyUnalignedBoth.testChar|35|25|ns/op|-27.70%|1.22%|
>> |ArrayCopyUnalignedBoth.testChar|36|25|ns/op|-28.68%|1.86%|
>> |ArrayCopyUnalignedBoth.testChar|37|25|ns/op|-27.81%|1.43%|
>> |ArrayCopyUnalignedBoth.testChar|38|25|ns/op|-29.54%|0.61%|
>> |ArrayCopyUnalignedBoth.testChar|39|25|ns/op|-29.89%|0.85%|
>> |ArrayCopyUnalignedBoth.testChar|40|25|ns/op|-30.97%|0.68%|
>> |ArrayCopyUnalignedBoth.testChar|41|25|ns/op|-29.96%|0.78%|
>> |ArrayCopyUnalignedBoth.testChar|42|25|ns/op|-30.79%|0.81%|
>> |ArrayCopyUnalignedBoth.testChar|43|25|ns/op|-29.57%|0.58%|
>> |ArrayCopyUnalignedBoth.testChar|44|25|ns/op|-31.02%|0.34%|
>> |ArrayCopyUnalignedBoth.testChar|45|25|ns/op|-30.05%|0.75%|
>> |ArrayCopyUnalignedBoth.testChar|46|25|ns/op|-30.56%|0.55%|
>> |ArrayCopyUnalignedBoth.testChar|47|25|ns/op|-30.39%|0.52%|
>> |ArrayCopyUnalignedBoth.testChar|48|25|ns/op|-30.94%|0.38%|
>> |ArrayCopyUnalignedDst.testChar|33|25|ns/op|-19.97%|1.08%|
>> |ArrayCopyUnalignedDst.testChar|34|25|ns/op|-16.05%|0.89%|
>> |ArrayCopyUnalignedDst.testChar|35|25|ns/op|-20.83%|1.26%|
>> |ArrayCopyUnalignedDst.testChar|36|25|ns/op|-16.09%|0.77%|
>> |ArrayCopyUnalignedDst.testChar|37|25|ns/op|-20.11%|1.24%|
>> |ArrayCopyUnalignedDst.testChar|38|25|ns/op|-15.26%|0.91%|
>> |ArrayCopyUnalignedDst.testChar|39|25|ns/op|-29.54%|0.56%|
>> |ArrayCopyUnalignedDst.testChar|40|25|ns/op|-29.53%|0.77%|
>> |ArrayCopyUnalignedDst.testChar|41|25|ns/op|-29.52%|0.87%|
>> |ArrayCopyUnalignedDst.testChar|42|25|ns/op|-29.45%|0.77%|
>> |ArrayCopyUnalignedDst.testChar|43|25|ns/op|-29.57%|1.06%|
>> |ArrayCopyUnalignedDst.testChar|44|25|ns/op|-29.69%|0.61%|
>> |ArrayCopyUnalignedDst.testChar|45|25|ns/op|-29.52%|0.83%|
>> |ArrayCopyUnalignedDst.testChar|46|25|ns/op|-29.31%|0.48%|
>> |ArrayCopyUnalignedDst.testChar|47|25|ns/op|-29.64%|0.50%|
>> |ArrayCopyUnalignedDst.testChar|48|25|ns/op|-29.75%|0.22%|
>> |ArrayCopyUnalignedSrc.testChar|33|25|ns/op|-29.33%|0.76%|
>> |ArrayCopyUnalignedSrc.testChar|34|25|ns/op|-30.11%|0.39%|
>> |ArrayCopyUnalignedSrc.testChar|35|25|ns/op|-29.54%|0.80%|
>> |ArrayCopyUnalignedSrc.testChar|36|25|ns/op|-30.07%|0.36%|
>> |ArrayCopyUnalignedSrc.testChar|37|25|ns/op|-29.41%|0.40%|
>> |ArrayCopyUnalignedSrc.testChar|38|25|ns/op|-29.95%|0.32%|
>> |ArrayCopyUnalignedSrc.testChar|39|25|ns/op|-29.39%|0.82%|
>> |ArrayCopyUnalignedSrc.testChar|40|25|ns/op|-29.85%|0.69%|
>> |ArrayCopyUnalignedSrc.testChar|41|25|ns/op|-28.93%|0.67%|
>> |ArrayCopyUnalignedSrc.testChar|42|25|ns/op|-29.50%|0.70%|
>> |ArrayCopyUnalignedSrc.testChar|43|25|ns/op|-28.95%|0.71%|
>> |ArrayCopyUnalignedSrc.testChar|44|25|ns/op|-29.75%|0.66%|
>> |ArrayCopyUnalignedSrc.testChar|45|25|ns/op|-29.02%|0.87%|
>> |ArrayCopyUnalignedSrc.testChar|46|25|ns/op|-29.76%|0.69%|
>> |ArrayCopyUnalignedSrc.testChar|47|25|ns/op|-29.37%|0.50%|
>> |ArrayCopyUnalignedSrc.testChar|48|25|ns/op|-29.71%|0.73%|
>
> JMH microbenchmark results for testInt:
> |Benchmark|Length|Count|Units|ldpq vs ld4|Maximum Relative Error|
> |-|-|-|-|-|-|
> |ArrayCopyAligned.testInt|17|25|ns/op|-26.25%|2.08%|
> |ArrayCopyAligned.testInt|18|25|ns/op|-29.10%|0.04%|
> |ArrayCopyAligned.testInt|19|25|ns/op|-28.93%|0.19%|
> |ArrayCopyAligned.testInt|20|25|ns/op|-29.11%|0.06%|
> |ArrayCopyAligned.testInt|21|25|ns/op|-29.06%|0.06%|
> |ArrayCopyAligned.testInt|22|25|ns/op|-29.12%|0.13%|
> |ArrayCopyAligned.testInt|23|25|ns/op|-29.10%|0.04%|
> |ArrayCopyAligned.testInt|24|25|ns/op|-28.96%|0.16%|
> |ArrayCopyUnalignedBoth.testInt|17|25|ns/op|-25.34%|2.05%|
> |ArrayCopyUnalignedBoth.testInt|18|25|ns/op|-28.96%|0.07%|
> |ArrayCopyUnalignedBoth.testInt|19|25|ns/op|-29.01%|0.09%|
> |ArrayCopyUnalignedBoth.testInt|20|25|ns/op|-28.95%|0.10%|
> |ArrayCopyUnalignedBoth.testInt|21|25|ns/op|-29.01%|0.07%|
> |ArrayCopyUnalignedBoth.testInt|22|25|ns/op|-29.04%|0.12%|
> |ArrayCopyUnalignedBoth.testInt|23|25|ns/op|-29.01%|0.10%|
> |ArrayCopyUnalignedBoth.testInt|24|25|ns/op|-29.05%|0.04%|
> |ArrayCopyUnalignedDst.testInt|17|25|ns/op|-27.63%|3.12%|
> |ArrayCopyUnalignedDst.testInt|18|25|ns/op|-25.75%|3.44%|
> |ArrayCopyUnalignedDst.testInt|19|25|ns/op|-29.06%|0.06%|
> |ArrayCopyUnalignedDst.testInt|20|25|ns/op|-29.07%|0.04%|
> |ArrayCopyUnalignedDst.testInt|21|25|ns/op|-29.02%|0.07%|
> |ArrayCopyUnalignedDst.testInt|22|25|ns/op|-29.03%|0.06%|
> |ArrayCopyUnalignedDst.testInt|23|25|ns/op|-29.01%|0.07%|
> |ArrayCopyUnalignedDst.testInt|24|25|ns/op|-29.05%|0.07%|
> |ArrayCopyUnalignedSrc.testInt|17|25|ns/op|-27.76%|1.35%|
> |ArrayCopyUnalignedSrc.testInt|18|25|ns/op|-28.91%|0.10%|
> |ArrayCopyUnalignedSrc.testInt|19|25|ns/op|-28.92%|0.12%|
> |ArrayCopyUnalignedSrc.testInt|20|25|ns/op|-28.91%|0.09%|
> |ArrayCopyUnalignedSrc.testInt|21|25|ns/op|-28.97%|0.06%|
> |ArrayCopyUnalignedSrc.testInt|22|25|ns/op|-28.95%|0.29%|
> |ArrayCopyUnalignedSrc.testInt|23|25|ns/op|-29.01%|0.04%|
> |ArrayCopyUnalignedSrc.testInt|24|25|ns/op|-28.93%|0.23%|

JMH microbenchmark results for testLong:
|Benchmark|Length|Count|Units|ldpq vs ld4|Maximum Relative Error|
|-|-|-|-|-|-|
|ArrayCopyAligned.testLong|9|25|ns/op|-29.05%|0.04%|
|ArrayCopyAligned.testLong|10|25|ns/op|-28.91%|0.06%|
|ArrayCopyAligned.testLong|11|25|ns/op|-29.08%|0.06%|
|ArrayCopyAligned.testLong|12|25|ns/op|-29.07%|0.04%|
|ArrayCopyUnalignedBoth.testLong|9|25|ns/op|-29.08%|0.06%|
|ArrayCopyUnalignedBoth.testLong|10|25|ns/op|-2.83%|0.56%|
|ArrayCopyUnalignedBoth.testLong|11|25|ns/op|-29.13%|0.04%|
|ArrayCopyUnalignedBoth.testLong|12|25|ns/op|-16.06%|0.45%|
|ArrayCopyUnalignedDst.testLong|9|25|ns/op|-29.03%|0.06%|
|ArrayCopyUnalignedDst.testLong|10|25|ns/op|-28.88%|0.04%|
|ArrayCopyUnalignedDst.testLong|11|25|ns/op|-29.02%|0.07%|
|ArrayCopyUnalignedDst.testLong|12|25|ns/op|-28.92%|0.07%|
|ArrayCopyUnalignedSrc.testLong|9|25|ns/op|-29.11%|0.04%|
|ArrayCopyUnalignedSrc.testLong|10|25|ns/op|-29.10%|0.06%|
|ArrayCopyUnalignedSrc.testLong|11|25|ns/op|-29.11%|0.04%|
|ArrayCopyUnalignedSrc.testLong|12|25|ns/op|-29.12%|0.03%|
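
For readers who want to reproduce the "ldpq vs ld4" column from raw JMH scores, here is a minimal sketch of how such a percentage could be computed. It assumes the column is the relative change of the ldpq score against the ld4 score in ns/op, so a negative value means ldpq took less time; the tables do not state this convention explicitly. The class name, method name, and sample scores are hypothetical and are not taken from the benchmark runs.

```java
// Hypothetical helper, not part of the patch or of JMH.
final class RelativeChange {
    // Percentage change of the candidate score relative to the baseline score.
    // Both scores are in ns/op; a negative result means the candidate is faster.
    static double percentChange(double baselineNsPerOp, double candidateNsPerOp) {
        return (candidateNsPerOp - baselineNsPerOp) / baselineNsPerOp * 100.0;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only.
        double ld4Score = 10.0;   // assumed baseline (ld4/st4) in ns/op
        double ldpqScore = 7.1;   // assumed candidate (ldpq/stpq) in ns/op
        System.out.printf("ldpq vs ld4: %.2f%%%n", percentChange(ld4Score, ldpqScore));
    }
}
```

Under this assumed convention, a value near -29% would correspond to the ldpq version taking roughly 71% of the ld4 time per operation.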

-------------

PR: https://git.openjdk.java.net/jdk/pull/1293

