RFR: 8179444: AArch64: Put zero_words on a diet
Andrew Haley
aph at redhat.com
Wed May 3 17:05:10 UTC 2017
New version, corrected:
The code we generate for ClearArray in C2 is much too verbose. It
looks like this:
0x000003ffad2213a4: cbz x11, 0x000003ffad22140c
0x000003ffad2213a8: tbz w10, #3, 0x000003ffad2213b4
0x000003ffad2213ac: str xzr, [x10],#8
0x000003ffad2213b0: sub x11, x11, #0x1
0x000003ffad2213b4: subs xscratch1, x11, #0x20
0x000003ffad2213b8: b.lt 0x000003ffad2213c0
0x000003ffad2213bc: bl Stub::zero_longs ; {external_word}
0x000003ffad2213c0: and xscratch1, x11, #0xe
0x000003ffad2213c4: sub x11, x11, xscratch1
0x000003ffad2213c8: add x10, x10, xscratch1, lsl #3
0x000003ffad2213cc: adr xscratch2, 0x000003ffad2213fc
0x000003ffad2213d0: sub xscratch2, xscratch2, xscratch1, lsl #1
0x000003ffad2213d4: br xscratch2
0x000003ffad2213d8: add x10, x10, #0x80
0x000003ffad2213dc: stp xzr, xzr, [x10,#-128]
0x000003ffad2213e0: stp xzr, xzr, [x10,#-112]
0x000003ffad2213e4: stp xzr, xzr, [x10,#-96]
0x000003ffad2213e8: stp xzr, xzr, [x10,#-80]
0x000003ffad2213ec: stp xzr, xzr, [x10,#-64]
0x000003ffad2213f0: stp xzr, xzr, [x10,#-48]
0x000003ffad2213f4: stp xzr, xzr, [x10,#-32]
0x000003ffad2213f8: stp xzr, xzr, [x10,#-16]
0x000003ffad2213fc: subs x11, x11, #0x10
0x000003ffad221400: b.ge 0x000003ffad2213d8
0x000003ffad221404: tbz w11, #0, 0x000003ffad22140c
0x000003ffad221408: str xzr, [x10],#8
This patch takes much of this code and puts it into a stub. The new
version of ClearArray is:
0x000003ff8022b7b0: cmp x11, #0x8
0x000003ff8022b7b4: b.cc 0x000003ff8022b7bc
0x000003ff8022b7b8: bl Stub::zero_blocks ; {runtime_call StubRoutines (2)}
0x000003ff8022b7bc: tbz w11, #2, 0x000003ff8022b7c8
0x000003ff8022b7c0: stp xzr, xzr, [x10],#16
0x000003ff8022b7c4: stp xzr, xzr, [x10],#16
0x000003ff8022b7c8: tbz w11, #1, 0x000003ff8022b7d0
0x000003ff8022b7cc: stp xzr, xzr, [x10],#16
0x000003ff8022b7d0: tbz w11, #0, 0x000003ff8022b7d8
0x000003ff8022b7d4: str xzr, [x10]
... which I hope you'll agree is much better.
The idea is to handle array sizes of 0-7 words inline, so that small
arrays are dealt with very quickly, and to handle anything larger in
Stub::zero_blocks. I wanted to make sure that there is no significant
loss of performance, so I have attached the results of the benchmark
I used, which does no more than create int arrays of various
sizes. There are winners and losers, but no score moves by more than
about 2%, and the code cache usage of each ClearArray goes down from 104
to 40 bytes.
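For readers who would rather not trace the disassembly, here is a minimal Java sketch of the control flow in the new inline sequence. It is an illustration only, not the code in the webrev. The class name, the method names and the contract of zeroBlocks (zero whole 8-word blocks and leave a remainder of fewer than 8 words) are assumptions inferred from the way the listing falls through into the tbz tests after the call.

```java
import java.util.Arrays;

class ClearArraySketch {
    // Threshold taken from "cmp x11, #0x8" in the new listing.
    static final int BLOCK_WORDS = 8;

    // Hypothetical stand-in for Stub::zero_blocks. Assumed contract: zero as
    // many whole 8-word blocks as fit and report how many words were zeroed.
    static int zeroBlocks(long[] words, int base, int cnt) {
        int done = (cnt / BLOCK_WORDS) * BLOCK_WORDS;
        Arrays.fill(words, base, base + done, 0L);
        return done;
    }

    // Zero cnt words starting at words[base], shaped like the inline code.
    static void zeroWords(long[] words, int base, int cnt) {
        if (cnt >= BLOCK_WORDS) {             // cmp / b.cc: skip the call when cnt < 8
            int done = zeroBlocks(words, base, cnt);
            base += done;
            cnt -= done;                      // assumed: remainder is now 0..7
        }
        if ((cnt & 4) != 0) {                 // tbz w11, #2: two stp, four words
            Arrays.fill(words, base, base + 4, 0L);
            base += 4;
        }
        if ((cnt & 2) != 0) {                 // tbz w11, #1: one stp, two words
            Arrays.fill(words, base, base + 2, 0L);
            base += 2;
        }
        if ((cnt & 1) != 0) {                 // tbz w11, #0: one str, one word
            words[base] = 0L;
        }
    }

    public static void main(String[] args) {
        // Check every count from 0 to 40 against a guard-filled buffer.
        for (int cnt = 0; cnt <= 40; cnt++) {
            long[] a = new long[cnt + 4];
            Arrays.fill(a, -1L);
            zeroWords(a, 2, cnt);
            for (int i = 0; i < a.length; i++) {
                long want = (i >= 2 && i < 2 + cnt) ? 0L : -1L;
                if (a[i] != want) throw new AssertionError("cnt=" + cnt + " i=" + i);
            }
        }
        System.out.println("ok");
    }
}
```

The point of the three bit tests is that any remainder of 0-7 words is covered by at most four stores with no loop and no computed branch, which is where most of the code size saving comes from.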
http://cr.openjdk.java.net/~aph/8179444-2/
OK?
Andrew.
Before:
Benchmark             (size)  Mode  Cnt    Score   Error  Units
CreateArray.newArray       5  avgt   10   48.273 ± 1.679  ns/op
CreateArray.newArray       7  avgt   10   48.915 ± 0.793  ns/op
CreateArray.newArray      10  avgt   10   49.826 ± 0.868  ns/op
CreateArray.newArray      15  avgt   10   52.582 ± 0.521  ns/op
CreateArray.newArray      23  avgt   10   57.589 ± 0.670  ns/op
CreateArray.newArray      34  avgt   10   67.233 ± 0.984  ns/op
CreateArray.newArray      51  avgt   10  120.652 ± 2.018  ns/op
CreateArray.newArray      77  avgt   10  102.745 ± 1.034  ns/op
CreateArray.newArray     115  avgt   10  136.703 ± 1.067  ns/op
CreateArray.newArray     173  avgt   10  182.247 ± 1.093  ns/op
CreateArray.newArray     259  avgt   10  163.168 ± 5.967  ns/op
CreateArray.newArray     389  avgt   10  233.874 ± 3.400  ns/op
CreateArray.newArray     584  avgt   10  251.286 ± 4.892  ns/op
CreateArray.newArray     876  avgt   10  242.510 ± 0.520  ns/op
CreateArray.newArray    1314  avgt   10  382.846 ± 0.624  ns/op
CreateArray.newArray    1971  avgt   10  487.590 ± 1.409  ns/op
After:
Benchmark             (size)  Mode  Cnt    Score   Error  Units
CreateArray.newArray       5  avgt   10   47.208 ± 0.656  ns/op
CreateArray.newArray       7  avgt   10   47.838 ± 0.608  ns/op
CreateArray.newArray      10  avgt   10   48.798 ± 0.797  ns/op
CreateArray.newArray      15  avgt   10   51.981 ± 0.424  ns/op
CreateArray.newArray      23  avgt   10   56.614 ± 1.064  ns/op
CreateArray.newArray      34  avgt   10   65.986 ± 1.114  ns/op
CreateArray.newArray      51  avgt   10  119.811 ± 0.857  ns/op
CreateArray.newArray      77  avgt   10  101.694 ± 1.192  ns/op
CreateArray.newArray     115  avgt   10  137.169 ± 2.159  ns/op
CreateArray.newArray     173  avgt   10  185.815 ± 0.754  ns/op
CreateArray.newArray     259  avgt   10  163.305 ± 2.107  ns/op
CreateArray.newArray     389  avgt   10  234.049 ± 3.162  ns/op
CreateArray.newArray     584  avgt   10  250.729 ± 1.714  ns/op
CreateArray.newArray     876  avgt   10  242.921 ± 0.577  ns/op
CreateArray.newArray    1314  avgt   10  384.337 ± 1.465  ns/op
CreateArray.newArray    1971  avgt   10  486.948 ± 5.303  ns/op