RFR: 8179444: AArch64: Put zero_words on a diet

Andrew Haley aph at redhat.com
Wed May 3 17:05:10 UTC 2017


New version, corrected:


The code we generate for ClearArray in C2 is much too verbose.  It
looks like this:

  0x000003ffad2213a4: cbz	x11, 0x000003ffad22140c
  0x000003ffad2213a8: tbz	w10, #3, 0x000003ffad2213b4
  0x000003ffad2213ac: str	xzr, [x10],#8
  0x000003ffad2213b0: sub	x11, x11, #0x1
  0x000003ffad2213b4: subs	xscratch1, x11, #0x20
  0x000003ffad2213b8: b.lt	0x000003ffad2213c0
  0x000003ffad2213bc: bl	Stub::zero_longs  ;   {external_word}
  0x000003ffad2213c0: and	xscratch1, x11, #0xe
  0x000003ffad2213c4: sub	x11, x11, xscratch1
  0x000003ffad2213c8: add	x10, x10, xscratch1, lsl #3
  0x000003ffad2213cc: adr	xscratch2, 0x000003ffad2213fc
  0x000003ffad2213d0: sub	xscratch2, xscratch2, xscratch1, lsl #1
  0x000003ffad2213d4: br	xscratch2
  0x000003ffad2213d8: add	x10, x10, #0x80
  0x000003ffad2213dc: stp	xzr, xzr, [x10,#-128]
  0x000003ffad2213e0: stp	xzr, xzr, [x10,#-112]
  0x000003ffad2213e4: stp	xzr, xzr, [x10,#-96]
  0x000003ffad2213e8: stp	xzr, xzr, [x10,#-80]
  0x000003ffad2213ec: stp	xzr, xzr, [x10,#-64]
  0x000003ffad2213f0: stp	xzr, xzr, [x10,#-48]
  0x000003ffad2213f4: stp	xzr, xzr, [x10,#-32]
  0x000003ffad2213f8: stp	xzr, xzr, [x10,#-16]
  0x000003ffad2213fc: subs	x11, x11, #0x10
  0x000003ffad221400: b.ge	0x000003ffad2213d8
  0x000003ffad221404: tbz	w11, #0, 0x000003ffad22140c
  0x000003ffad221408: str	xzr, [x10],#8

This patch takes much of this code and puts it into a stub.  The new
version of ClearArray is:

  0x000003ff8022b7b0: cmp       x11, #0x8
  0x000003ff8022b7b4: b.cc      0x000003ff8022b7bc
  0x000003ff8022b7b8: bl        Stub::zero_blocks  ;   {runtime_call StubRoutines (2)}
  0x000003ff8022b7bc: tbz       w11, #2, 0x000003ff8022b7c8
  0x000003ff8022b7c0: stp       xzr, xzr, [x10],#16
  0x000003ff8022b7c4: stp       xzr, xzr, [x10],#16
  0x000003ff8022b7c8: tbz       w11, #1, 0x000003ff8022b7d0
  0x000003ff8022b7cc: stp       xzr, xzr, [x10],#16
  0x000003ff8022b7d0: tbz       w11, #0, 0x000003ff8022b7d8
  0x000003ff8022b7d4: str       xzr, [x10]

... which I hope you'll agree is much better.
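
For readers who would rather not trace the disassembly, here is a rough
Java model of the new inline sequence. It is a sketch, not the HotSpot
code: the names ClearArraySketch, zeroBlocks, storePair and zeroWords
are made up for illustration, memory is modeled as a long[] with an
index standing in for x10, and I am assuming that the stub clears whole
8-word blocks and leaves the pointer just past the last block, so that
the low three bits of the count describe what is left.

```java
import java.util.Arrays;

public class ClearArraySketch {
    // Assumed block size in 8-byte words, inferred from "cmp x11, #0x8".
    static final int BLOCK_WORDS = 8;

    // Hypothetical stand-in for Stub::zero_blocks: clears whole blocks and
    // returns the index just past the last word it cleared.
    static int zeroBlocks(long[] mem, int base, int count) {
        int end = base + (count / BLOCK_WORDS) * BLOCK_WORDS;
        Arrays.fill(mem, base, end, 0L);
        return end;
    }

    // Models "stp xzr, xzr, [x10],#16": clear two words, then advance by two.
    static int storePair(long[] mem, int ptr) {
        mem[ptr] = 0L;
        mem[ptr + 1] = 0L;
        return ptr + 2;
    }

    // Models the inline sequence; ptr plays the role of x10, count of x11.
    static void zeroWords(long[] mem, int base, int count) {
        int ptr = base;
        if (count >= BLOCK_WORDS) {            // cmp x11, #8 ; b.cc skips the call
            ptr = zeroBlocks(mem, ptr, count); // bl Stub::zero_blocks
        }
        if ((count & 4) != 0) {                // tbz w11, #2
            ptr = storePair(mem, ptr);
            ptr = storePair(mem, ptr);
        }
        if ((count & 2) != 0) {                // tbz w11, #1
            ptr = storePair(mem, ptr);
        }
        if ((count & 1) != 0) {                // tbz w11, #0
            mem[ptr] = 0L;                     // str xzr, [x10]
        }
    }

    public static void main(String[] args) {
        long[] mem = new long[12];
        Arrays.fill(mem, -1L);
        zeroWords(mem, 1, 7);                  // small case: handled entirely inline
        System.out.println(Arrays.toString(mem));
    }
}
```

The point of the bit tests is that a remainder of 0-7 words needs at
most three forward branches and no loop.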

The idea is to handle array sizes of 0-7 words inline, so that small
arrays are dealt with very quickly, and to handle anything larger in
Stub::zero_blocks.  I wanted to make sure that there is no significant
loss of performance, so I have attached the results of the benchmark
I used, which does no more than create arrays of ints of various
sizes.  There are winners and losers, but nothing changes by very
much, and the code cache usage of each ClearArray goes down from 104
to 40 bytes.
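
The benchmark source is not included in this message. As a rough idea
of its shape, here is a plain-Java stand-in that only allocates int
arrays of the sizes listed in the tables below. It is an assumption on
every point except the sizes: the real benchmark was run under JMH,
whereas this sketch uses System.nanoTime, and the names
CreateArraySketch, sink and averageNanos are hypothetical. Its numbers
will not be comparable with the attached ones.

```java
public class CreateArraySketch {
    // Sizes copied from the "(size)" column of the results tables.
    static final int[] SIZES = {
        5, 7, 10, 15, 23, 34, 51, 77, 115, 173, 259, 389, 584, 876, 1314, 1971
    };

    // Written on every iteration so the JIT cannot discard the allocation.
    static volatile int[] sink;

    // The operation being measured: allocate an int array, which must be zeroed.
    static int[] newArray(int size) {
        return new int[size];
    }

    // Crude average time per allocation; JMH does this far more carefully.
    static double averageNanos(int size, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = newArray(size);
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        for (int size : SIZES) {
            averageNanos(size, 1_000_000);   // warm-up pass, result ignored
            System.out.printf("newArray %5d  %8.3f ns/op%n",
                              size, averageNanos(size, 10_000_000));
        }
    }
}
```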

http://cr.openjdk.java.net/~aph/8179444-2/

OK?

Andrew.


Before:

Benchmark             (size)  Mode  Cnt    Score   Error  Units
CreateArray.newArray       5  avgt   10   48.273 ± 1.679  ns/op
CreateArray.newArray       7  avgt   10   48.915 ± 0.793  ns/op
CreateArray.newArray      10  avgt   10   49.826 ± 0.868  ns/op
CreateArray.newArray      15  avgt   10   52.582 ± 0.521  ns/op
CreateArray.newArray      23  avgt   10   57.589 ± 0.670  ns/op
CreateArray.newArray      34  avgt   10   67.233 ± 0.984  ns/op
CreateArray.newArray      51  avgt   10  120.652 ± 2.018  ns/op
CreateArray.newArray      77  avgt   10  102.745 ± 1.034  ns/op
CreateArray.newArray     115  avgt   10  136.703 ± 1.067  ns/op
CreateArray.newArray     173  avgt   10  182.247 ± 1.093  ns/op
CreateArray.newArray     259  avgt   10  163.168 ± 5.967  ns/op
CreateArray.newArray     389  avgt   10  233.874 ± 3.400  ns/op
CreateArray.newArray     584  avgt   10  251.286 ± 4.892  ns/op
CreateArray.newArray     876  avgt   10  242.510 ± 0.520  ns/op
CreateArray.newArray    1314  avgt   10  382.846 ± 0.624  ns/op
CreateArray.newArray    1971  avgt   10  487.590 ± 1.409  ns/op



After:

Benchmark             (size)  Mode  Cnt    Score   Error  Units
CreateArray.newArray       5  avgt   10   47.208 ± 0.656  ns/op
CreateArray.newArray       7  avgt   10   47.838 ± 0.608  ns/op
CreateArray.newArray      10  avgt   10   48.798 ± 0.797  ns/op
CreateArray.newArray      15  avgt   10   51.981 ± 0.424  ns/op
CreateArray.newArray      23  avgt   10   56.614 ± 1.064  ns/op
CreateArray.newArray      34  avgt   10   65.986 ± 1.114  ns/op
CreateArray.newArray      51  avgt   10  119.811 ± 0.857  ns/op
CreateArray.newArray      77  avgt   10  101.694 ± 1.192  ns/op
CreateArray.newArray     115  avgt   10  137.169 ± 2.159  ns/op
CreateArray.newArray     173  avgt   10  185.815 ± 0.754  ns/op
CreateArray.newArray     259  avgt   10  163.305 ± 2.107  ns/op
CreateArray.newArray     389  avgt   10  234.049 ± 3.162  ns/op
CreateArray.newArray     584  avgt   10  250.729 ± 1.714  ns/op
CreateArray.newArray     876  avgt   10  242.921 ± 0.577  ns/op
CreateArray.newArray    1314  avgt   10  384.337 ± 1.465  ns/op
CreateArray.newArray    1971  avgt   10  486.948 ± 5.303  ns/op




