RFR: 8255351: Add detection for Graviton 1 & 2 CPUs
Eugene Astigeevich
github.com+42899633+eastig at openjdk.java.net
Mon Nov 23 21:07:10 UTC 2020
On Fri, 20 Nov 2020 17:47:13 GMT, Eugene Astigeevich <github.com+42899633+eastig at openjdk.org> wrote:
>> Hi Evegeny,
>>
>> in general, your changes look good to me.
>>
>> You've disabled SIMD instructions for copying byte arrays <= 96 because you say there are no benefits from using them.
>> Have you seen regressions in the microbenchmarks? As far as I can see, you've only pasted the results for char, int and long (where the improvements are quite nice, by the way :)
>>
>> Could you please also post some results for byte arrays (with and without SIMD)?
>>
>> Thank you and best regards,
>> Volker
>
> I work for the Amazon Corretto team and am covered by the Amazon OCA. See the [comment](https://github.com/openjdk/jdk/pull/1315#issuecomment-731262807) from Volker above.
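These are the JMH microbenchmark results with UseSIMDForMemoryOps enabled for all types of copying.
Scores are in ns/op, so a positive Diff means slower and a negative Diff means faster; regressions are marked 🔴.
The regressions are due to the use of ld4/st4 and are fixed in PR https://github.com/openjdk/jdk/pull/1293
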
|Benchmark|Length|Cnt|Units|Diff|Max Relative Error|
|-|-|-|-|-|-|
|ArrayCopy.arrayCopyChar|46|25|ns/op|5.26%🔴|0.01%|
|ArrayCopy.arrayCopyCharNonConst|46|25|ns/op|15.79%🔴|0.01%|
|ArrayCopy.arrayCopyObject|200|25|ns/op|-5.91%|1.76%|
|ArrayCopy.arrayCopyObjectNonConst|200|25|ns/op|-5.78%|0.73%|
|ArrayCopy.arrayCopyObjectSameArraysBackwa|200|25|ns/op|-1.27%|0.71%|
|ArrayCopy.arrayCopyObjectSameArraysForwar|200|25|ns/op|-2.05%|0.76%|
|ArrayCopyAligned.testByte|70|25|ns/op|48.59%🔴|0.48%|
|ArrayCopyAligned.testByte|150|25|ns/op|1.72%|4.19%|
|ArrayCopyAligned.testByte|300|25|ns/op|-3.20%|4.78%|
|ArrayCopyAligned.testByte|600|25|ns/op|-8.24%|2.62%|
|ArrayCopyAligned.testByte|1200|25|ns/op|-13.33%|3.13%|
|ArrayCopyAligned.testChar|20|25|ns/op|-5.57%|0.01%|
|ArrayCopyAligned.testChar|70|25|ns/op|-5.42%|3.64%|
|ArrayCopyAligned.testChar|150|25|ns/op|-4.96%|1.58%|
|ArrayCopyAligned.testChar|300|25|ns/op|-12.06%|0.83%|
|ArrayCopyAligned.testChar|600|25|ns/op|-16.13%|0.37%|
|ArrayCopyAligned.testChar|1200|25|ns/op|-16.12%|0.80%|
|ArrayCopyAligned.testInt|10|25|ns/op|-5.55%|0.04%|
|ArrayCopyAligned.testInt|20|25|ns/op|34.75%🔴|1.30%|
|ArrayCopyAligned.testInt|70|25|ns/op|-8.75%|2.12%|
|ArrayCopyAligned.testInt|150|25|ns/op|-11.74%|1.13%|
|ArrayCopyAligned.testInt|300|25|ns/op|-14.38%|0.69%|
|ArrayCopyAligned.testInt|600|25|ns/op|-17.87%|1.08%|
|ArrayCopyAligned.testInt|1200|25|ns/op|-18.01%|0.90%|
|ArrayCopyAligned.testLong|5|25|ns/op|-4.37%|0.77%|
|ArrayCopyAligned.testLong|10|25|ns/op|27.45%🔴|6.59%|
|ArrayCopyAligned.testLong|20|25|ns/op|-1.95%|2.76%|
|ArrayCopyAligned.testLong|70|25|ns/op|-11.46%|1.42%|
|ArrayCopyAligned.testLong|150|25|ns/op|-16.28%|0.68%|
|ArrayCopyAligned.testLong|300|25|ns/op|-18.02%|1.90%|
|ArrayCopyAligned.testLong|600|25|ns/op|-18.08%|0.90%|
|ArrayCopyAligned.testLong|1200|25|ns/op|-18.67%|1.16%|
|ArrayCopyUnalignedBoth.testByte|70|25|ns/op|38.98%🔴|0.47%|
|ArrayCopyUnalignedBoth.testByte|150|25|ns/op|0.32%|1.15%|
|ArrayCopyUnalignedBoth.testByte|300|25|ns/op|-1.94%|1.72%|
|ArrayCopyUnalignedBoth.testByte|600|25|ns/op|-4.96%|1.05%|
|ArrayCopyUnalignedBoth.testByte|1200|25|ns/op|-11.10%|1.11%|
|ArrayCopyUnalignedBoth.testChar|20|25|ns/op|-5.56%|0.07%|
|ArrayCopyUnalignedBoth.testChar|70|25|ns/op|-2.05%|1.43%|
|ArrayCopyUnalignedBoth.testChar|150|25|ns/op|-5.62%|1.32%|
|ArrayCopyUnalignedBoth.testChar|300|25|ns/op|-10.33%|0.70%|
|ArrayCopyUnalignedBoth.testChar|600|25|ns/op|-14.68%|0.39%|
|ArrayCopyUnalignedBoth.testChar|1200|25|ns/op|-16.43%|0.44%|
|ArrayCopyUnalignedBoth.testInt|10|25|ns/op|-5.55%|0.01%|
|ArrayCopyUnalignedBoth.testInt|20|25|ns/op|33.55%🔴|0.36%|
|ArrayCopyUnalignedBoth.testInt|70|25|ns/op|-5.35%|2.38%|
|ArrayCopyUnalignedBoth.testInt|150|25|ns/op|-11.15%|1.75%|
|ArrayCopyUnalignedBoth.testInt|300|25|ns/op|-14.18%|1.27%|
|ArrayCopyUnalignedBoth.testInt|600|25|ns/op|-15.84%|0.68%|
|ArrayCopyUnalignedBoth.testInt|1200|25|ns/op|-16.42%|0.45%|
|ArrayCopyUnalignedBoth.testLong|5|25|ns/op|-5.54%|0.01%|
|ArrayCopyUnalignedBoth.testLong|10|25|ns/op|35.60%🔴|1.69%|
|ArrayCopyUnalignedBoth.testLong|20|25|ns/op|-3.18%|2.26%|
|ArrayCopyUnalignedBoth.testLong|70|25|ns/op|-10.66%|0.99%|
|ArrayCopyUnalignedBoth.testLong|150|25|ns/op|-15.68%|1.01%|
|ArrayCopyUnalignedBoth.testLong|300|25|ns/op|-15.57%|0.47%|
|ArrayCopyUnalignedBoth.testLong|600|25|ns/op|-17.11%|0.23%|
|ArrayCopyUnalignedBoth.testLong|1200|25|ns/op|-17.00%|0.55%|
|ArrayCopyUnalignedDst.testByte|70|25|ns/op|48.32%🔴|0.30%|
|ArrayCopyUnalignedDst.testByte|150|25|ns/op|-0.68%|4.05%|
|ArrayCopyUnalignedDst.testByte|300|25|ns/op|-7.50%|1.35%|
|ArrayCopyUnalignedDst.testByte|600|25|ns/op|-10.04%|1.51%|
|ArrayCopyUnalignedDst.testByte|1200|25|ns/op|-14.07%|0.98%|
|ArrayCopyUnalignedDst.testChar|20|25|ns/op|-5.54%|0.01%|
|ArrayCopyUnalignedDst.testChar|70|25|ns/op|-5.70%|3.79%|
|ArrayCopyUnalignedDst.testChar|150|25|ns/op|-6.26%|1.59%|
|ArrayCopyUnalignedDst.testChar|300|25|ns/op|-12.78%|0.86%|
|ArrayCopyUnalignedDst.testChar|600|25|ns/op|-14.29%|0.54%|
|ArrayCopyUnalignedDst.testChar|1200|25|ns/op|-17.37%|1.18%|
|ArrayCopyUnalignedDst.testInt|10|25|ns/op|-5.55%|0.01%|
|ArrayCopyUnalignedDst.testInt|20|25|ns/op|34.84%🔴|1.46%|
|ArrayCopyUnalignedDst.testInt|70|25|ns/op|-8.32%|1.33%|
|ArrayCopyUnalignedDst.testInt|150|25|ns/op|-11.82%|0.61%|
|ArrayCopyUnalignedDst.testInt|300|25|ns/op|-13.78%|0.59%|
|ArrayCopyUnalignedDst.testInt|600|25|ns/op|-16.61%|0.94%|
|ArrayCopyUnalignedDst.testInt|1200|25|ns/op|-17.63%|0.78%|
|ArrayCopyUnalignedDst.testLong|5|25|ns/op|-5.51%|0.07%|
|ArrayCopyUnalignedDst.testLong|10|25|ns/op|37.01%🔴|1.03%|
|ArrayCopyUnalignedDst.testLong|20|25|ns/op|-3.52%|2.72%|
|ArrayCopyUnalignedDst.testLong|70|25|ns/op|-10.93%|1.16%|
|ArrayCopyUnalignedDst.testLong|150|25|ns/op|-16.99%|0.61%|
|ArrayCopyUnalignedDst.testLong|300|25|ns/op|-16.65%|0.45%|
|ArrayCopyUnalignedDst.testLong|600|25|ns/op|-16.55%|0.60%|
|ArrayCopyUnalignedDst.testLong|1200|25|ns/op|-17.37%|0.76%|
|ArrayCopyUnalignedSrc.testByte|70|25|ns/op|38.66%🔴|0.22%|
|ArrayCopyUnalignedSrc.testByte|150|25|ns/op|1.64%|1.69%|
|ArrayCopyUnalignedSrc.testByte|300|25|ns/op|-5.86%|0.64%|
|ArrayCopyUnalignedSrc.testByte|600|25|ns/op|-10.30%|1.71%|
|ArrayCopyUnalignedSrc.testByte|1200|25|ns/op|-14.25%|0.91%|
|ArrayCopyUnalignedSrc.testChar|20|25|ns/op|-5.73%|0.10%|
|ArrayCopyUnalignedSrc.testChar|70|25|ns/op|-3.69%|1.68%|
|ArrayCopyUnalignedSrc.testChar|150|25|ns/op|-8.36%|2.28%|
|ArrayCopyUnalignedSrc.testChar|300|25|ns/op|-9.90%|0.49%|
|ArrayCopyUnalignedSrc.testChar|600|25|ns/op|-15.08%|0.55%|
|ArrayCopyUnalignedSrc.testChar|1200|25|ns/op|-17.08%|0.49%|
|ArrayCopyUnalignedSrc.testInt|10|25|ns/op|-5.55%|0.01%|
|ArrayCopyUnalignedSrc.testInt|20|25|ns/op|33.53%🔴|0.28%|
|ArrayCopyUnalignedSrc.testInt|70|25|ns/op|-8.23%|2.06%|
|ArrayCopyUnalignedSrc.testInt|150|25|ns/op|-12.65%|1.27%|
|ArrayCopyUnalignedSrc.testInt|300|25|ns/op|-14.22%|0.41%|
|ArrayCopyUnalignedSrc.testInt|600|25|ns/op|-16.20%|0.37%|
|ArrayCopyUnalignedSrc.testInt|1200|25|ns/op|-15.81%|1.09%|
|ArrayCopyUnalignedSrc.testLong|5|25|ns/op|-5.54%|0.01%|
|ArrayCopyUnalignedSrc.testLong|10|25|ns/op|35.81%🔴|0.54%|
|ArrayCopyUnalignedSrc.testLong|20|25|ns/op|-3.96%|2.10%|
|ArrayCopyUnalignedSrc.testLong|70|25|ns/op|-10.90%|0.79%|
|ArrayCopyUnalignedSrc.testLong|150|25|ns/op|-15.83%|0.64%|
|ArrayCopyUnalignedSrc.testLong|300|25|ns/op|-17.88%|1.61%|
|ArrayCopyUnalignedSrc.testLong|600|25|ns/op|-18.03%|0.88%|
|ArrayCopyUnalignedSrc.testLong|1200|25|ns/op|-18.93%|0.04%|
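
For readers who want to reproduce a Diff column like the one above from two JMH runs, here is a minimal sketch. It assumes Diff is the relative change of the score with UseSIMDForMemoryOps on against a baseline score, which the table does not state explicitly. The class name, method names, the 5% threshold and the sample scores are all hypothetical and are not taken from the PR or from the table.

```java
public class DiffSketch {
    // Hypothetical helper: relative change of the candidate score against
    // the baseline score, in percent. With ns/op, positive means slower.
    static double relativeDiffPercent(double baselineNsPerOp, double candidateNsPerOp) {
        if (baselineNsPerOp <= 0.0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (candidateNsPerOp - baselineNsPerOp) / baselineNsPerOp * 100.0;
    }

    // Hypothetical rule: flag a result as a regression when it is slower
    // than the baseline by more than the given threshold (assumed, not
    // the rule used to place the red markers in the table).
    static boolean isRegression(double diffPercent, double thresholdPercent) {
        return diffPercent > thresholdPercent;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only.
        double baseline = 10.0;   // ns/op, assumed baseline run
        double candidate = 12.5;  // ns/op, assumed run with SIMD enabled
        double diff = relativeDiffPercent(baseline, candidate);
        String verdict = isRegression(diff, 5.0) ? "regression" : "ok";
        System.out.printf("%.2f%% %s%n", diff, verdict);
    }
}
```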
-------------
PR: https://git.openjdk.java.net/jdk/pull/1315