RFR: 8179444: AArch64: Put zero_words on a diet
Andrew Haley
aph at redhat.com
Fri Apr 28 19:28:11 UTC 2017
The code we generate for ClearArray in C2 is much too verbose. It
looks like this:
0x000003ffad2213a4: cbz x11, 0x000003ffad22140c
0x000003ffad2213a8: tbz w10, #3, 0x000003ffad2213b4
0x000003ffad2213ac: str xzr, [x10],#8
0x000003ffad2213b0: sub x11, x11, #0x1
0x000003ffad2213b4: subs xscratch1, x11, #0x20
0x000003ffad2213b8: b.lt 0x000003ffad2213c0
0x000003ffad2213bc: bl Stub::zero_longs ; {external_word}
0x000003ffad2213c0: and xscratch1, x11, #0xe
0x000003ffad2213c4: sub x11, x11, xscratch1
0x000003ffad2213c8: add x10, x10, xscratch1, lsl #3
0x000003ffad2213cc: adr xscratch2, 0x000003ffad2213fc
0x000003ffad2213d0: sub xscratch2, xscratch2, xscratch1, lsl #1
0x000003ffad2213d4: br xscratch2
0x000003ffad2213d8: add x10, x10, #0x80
0x000003ffad2213dc: stp xzr, xzr, [x10,#-128]
0x000003ffad2213e0: stp xzr, xzr, [x10,#-112]
0x000003ffad2213e4: stp xzr, xzr, [x10,#-96]
0x000003ffad2213e8: stp xzr, xzr, [x10,#-80]
0x000003ffad2213ec: stp xzr, xzr, [x10,#-64]
0x000003ffad2213f0: stp xzr, xzr, [x10,#-48]
0x000003ffad2213f4: stp xzr, xzr, [x10,#-32]
0x000003ffad2213f8: stp xzr, xzr, [x10,#-16]
0x000003ffad2213fc: subs x11, x11, #0x10
0x000003ffad221400: b.ge 0x000003ffad2213d8
0x000003ffad221404: tbz w11, #0, 0x000003ffad22140c
0x000003ffad221408: str xzr, [x10],#8
This patch takes much of this code and puts it into a stub. The new
version of ClearArray is:
0x000003ffad21088c: cmp x11, #0x8
0x000003ffad210890: b.lt 0x000003ffad210898
0x000003ffad210894: bl Stub::zero_blocks ; {runtime_call StubRoutines (2)}
0x000003ffad210898: and xscratch1, x11, #0x6
0x000003ffad21089c: adr xscratch2, 0x000003ffad2108b4
0x000003ffad2108a0: sub xscratch2, xscratch2, xscratch1, lsl #1
0x000003ffad2108a4: br xscratch2
0x000003ffad2108a8: stp xzr, xzr, [x10],#16
0x000003ffad2108ac: stp xzr, xzr, [x10],#16
0x000003ffad2108b0: stp xzr, xzr, [x10],#16
0x000003ffad2108b4: tbz w11, #0, 0x000003ffad2108bc
0x000003ffad2108b8: str xzr, [x10]
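For readers who find the computed branch hard to follow, here is a rough Java model of the new inline sequence. It is only a sketch: the class and method names are made up for illustration, and the loop standing in for Stub::zero_blocks assumes that the stub clears whole 8-word blocks and leaves the remainder (0-7 words) in x11, which the listing itself does not show.

```java
import java.util.Arrays;

public final class ZeroWordsModel {
    // Assumption: the stub works in blocks of 8 words (not stated in the listing).
    static final int BLOCK_WORDS = 8;

    /** Zeroes cnt words of mem starting at index base, following the listing. */
    static void zeroWords(long[] mem, int base, int cnt) {
        int x10 = base;   // models the pointer register x10 (as an array index)
        int x11 = cnt;    // models the word count in x11

        // cmp x11, #0x8 ; b.lt skips the call for counts below 8.
        if (x11 >= 8) {
            // Hypothetical stand-in for "bl Stub::zero_blocks".
            while (x11 >= BLOCK_WORDS) {
                for (int i = 0; i < BLOCK_WORDS; i++) {
                    mem[x10++] = 0;
                }
                x11 -= BLOCK_WORDS;
            }
        }

        // and xscratch1, x11, #0x6 gives 0, 2, 4 or 6 words. The branch target
        // is the tbz address minus (xscratch1 << 1) bytes, i.e. 0 to 3
        // instructions back, so exactly (x11 & 6) / 2 of the stp's execute.
        int pairs = (x11 & 6) >> 1;
        for (int i = 0; i < pairs; i++) {
            mem[x10++] = 0;   // stp xzr, xzr, [x10],#16 stores two words
            mem[x10++] = 0;
        }

        // tbz w11, #0 skips the final str when the count is even.
        if ((x11 & 1) != 0) {
            mem[x10] = 0;
        }
    }

    public static void main(String[] args) {
        long[] mem = new long[9];
        Arrays.fill(mem, -1L);
        zeroWords(mem, 1, 7);   // clear 7 words, leaving a guard word at each end
        System.out.println(Arrays.toString(mem));
    }
}
```

The point of the model is that bits 1-2 of the count select how many of the three stp instructions run and bit 0 selects the trailing str, so every count from 0 to 7 is covered without a loop.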
The idea is to handle array sizes of 0-7 words inline, so that small
arrays are dealt with very quickly, and to handle anything larger in
Stub::zero_blocks. I wanted to make sure that there is no significant
loss of performance, so I have included below the results of the
benchmark I used, which does no more than create arrays of ints of
various sizes. There are winners and losers, but nothing changes by
very much, and the code cache usage of each ClearArray goes down from
104 to 48 bytes.
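The size figures can be checked against the addresses in the two listings, since every AArch64 instruction is 4 bytes. The snippet below is just that arithmetic; the class and method names are illustrative, and the end addresses are the branch targets that fall immediately after each sequence.

```java
public final class ClearArrayCodeSize {
    /** Size of a code sequence given its first address and the first address after it. */
    static long sizeInBytes(long firstInsn, long firstAddrAfter) {
        return firstAddrAfter - firstInsn;
    }

    public static void main(String[] args) {
        // Old sequence: starts at the cbz, ends at the cbz/tbz target.
        long before = sizeInBytes(0x000003ffad2213a4L, 0x000003ffad22140cL);
        // New sequence: starts at the cmp, ends at the tbz target.
        long after = sizeInBytes(0x000003ffad21088cL, 0x000003ffad2108bcL);

        System.out.println("bytes:        " + before + " -> " + after);
        System.out.println("instructions: " + (before / 4) + " -> " + (after / 4));
    }
}
```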
http://cr.openjdk.java.net/~aph/8179444/
OK?
Andrew.
Machine A:
Before:
Benchmark             (size)  Mode  Cnt    Score     Error  Units
CreateArray.newArray       5  avgt    5   48.221 ±   3.185  ns/op
CreateArray.newArray       7  avgt    5   48.853 ±   1.921  ns/op
CreateArray.newArray      10  avgt    5   49.963 ±   2.240  ns/op
CreateArray.newArray      15  avgt    5   52.538 ±   1.332  ns/op
CreateArray.newArray      23  avgt    5   57.289 ±   1.120  ns/op
CreateArray.newArray      34  avgt    5   67.091 ±   2.207  ns/op
CreateArray.newArray      51  avgt    5  119.948 ±   1.839  ns/op
CreateArray.newArray      77  avgt    5  101.851 ±   1.968  ns/op
CreateArray.newArray     115  avgt    5  142.568 ±   3.621  ns/op
CreateArray.newArray     173  avgt    5  180.204 ±   2.908  ns/op
CreateArray.newArray     259  avgt    5  170.446 ±   6.083  ns/op
CreateArray.newArray     389  avgt    5  231.124 ±   1.804  ns/op
CreateArray.newArray     584  avgt    5  248.411 ±   0.438  ns/op
CreateArray.newArray     876  avgt    5  241.776 ±   1.261  ns/op
CreateArray.newArray    1314  avgt    5  383.609 ±   1.363  ns/op
CreateArray.newArray    1971  avgt    5  483.217 ±   8.044  ns/op
After:
Benchmark             (size)  Mode  Cnt    Score     Error  Units
CreateArray.newArray       5  avgt    5   47.256 ±   1.511  ns/op
CreateArray.newArray       7  avgt    5   48.674 ±   1.046  ns/op
CreateArray.newArray      10  avgt    5   50.915 ±   2.581  ns/op
CreateArray.newArray      15  avgt    5   53.351 ±   6.562  ns/op
CreateArray.newArray      23  avgt    5   56.746 ±   3.820  ns/op
CreateArray.newArray      34  avgt    5   65.796 ±   3.357  ns/op
CreateArray.newArray      51  avgt    5  119.825 ±   2.268  ns/op
CreateArray.newArray      77  avgt    5  100.708 ±   1.647  ns/op
CreateArray.newArray     115  avgt    5  135.210 ±   2.844  ns/op
CreateArray.newArray     173  avgt    5  180.521 ±   1.373  ns/op
CreateArray.newArray     259  avgt    5  160.899 ±   2.677  ns/op
CreateArray.newArray     389  avgt    5  230.253 ±   1.412  ns/op
CreateArray.newArray     584  avgt    5  249.173 ±   2.827  ns/op
CreateArray.newArray     876  avgt    5  242.180 ±   0.991  ns/op
CreateArray.newArray    1314  avgt    5  385.272 ±   1.872  ns/op
CreateArray.newArray    1971  avgt    5  485.198 ±   3.196  ns/op
Machine B:
The timings for Machine B are very noisy with small array sizes, so
it's hard to conclude very much, but I don't think there is any
regression.
Before:
Benchmark             (size)  Mode  Cnt    Score     Error  Units
CreateArray.newArray       5  avgt    5   89.209 ±  11.640  ns/op
CreateArray.newArray       7  avgt    5   93.453 ±   2.113  ns/op
CreateArray.newArray      10  avgt    5   93.388 ±  21.406  ns/op
CreateArray.newArray      15  avgt    5  102.904 ±  23.075  ns/op
CreateArray.newArray      23  avgt    5  117.167 ±  19.673  ns/op
CreateArray.newArray      34  avgt    5  130.184 ±   1.042  ns/op
CreateArray.newArray      51  avgt    5  132.981 ±   8.446  ns/op
CreateArray.newArray      77  avgt    5  137.438 ±   5.723  ns/op
CreateArray.newArray     115  avgt    5  135.289 ±   3.393  ns/op
CreateArray.newArray     173  avgt    5  151.245 ±   8.469  ns/op
CreateArray.newArray     259  avgt    5  157.292 ±   2.087  ns/op
CreateArray.newArray     389  avgt    5  176.621 ±   3.741  ns/op
CreateArray.newArray     584  avgt    5  200.957 ±   6.825  ns/op
CreateArray.newArray     876  avgt    5  233.122 ±   3.508  ns/op
CreateArray.newArray    1314  avgt    5  280.525 ±   5.696  ns/op
CreateArray.newArray    1971  avgt    5  360.799 ±   8.859  ns/op
After:
Benchmark             (size)  Mode  Cnt    Score     Error  Units
CreateArray.newArray       5  avgt    5   90.168 ±   4.363  ns/op
CreateArray.newArray       7  avgt    5   88.221 ±  32.537  ns/op
CreateArray.newArray      10  avgt    5   97.991 ±   1.778  ns/op
CreateArray.newArray      15  avgt    5  102.441 ±  30.219  ns/op
CreateArray.newArray      23  avgt    5  120.875 ±  11.074  ns/op
CreateArray.newArray      34  avgt    5  130.916 ±   2.476  ns/op
CreateArray.newArray      51  avgt    5  134.765 ±  10.002  ns/op
CreateArray.newArray      77  avgt    5  138.228 ±   2.479  ns/op
CreateArray.newArray     115  avgt    5  135.907 ±   1.025  ns/op
CreateArray.newArray     173  avgt    5  150.318 ±   9.291  ns/op
CreateArray.newArray     259  avgt    5  156.671 ±   2.023  ns/op
CreateArray.newArray     389  avgt    5  175.735 ±   3.861  ns/op
CreateArray.newArray     584  avgt    5  206.501 ±   9.117  ns/op
CreateArray.newArray     876  avgt    5  233.676 ±   3.463  ns/op
CreateArray.newArray    1314  avgt    5  280.259 ±   4.131  ns/op
CreateArray.newArray    1971  avgt    5  360.037 ±   9.968  ns/op