Re: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA'
On 04/19/2016 01:54 PM, Long Chen wrote:
Thanks for all these nice comments. Here is a revised version:
http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.v02.patch
Changes:
1. Are DC and IC really synonyms?
The DC and IC assembly was meant to be distinguished by different cache_maintenance parameters. In the revised patch I created two enums, 'icache_maintenance' and 'dcache_maintenance', to make this clearer:
+  enum icache_maintenance {IVAU = 0b0101};
+  enum dcache_maintenance {CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, ZVA = 0b100};

+  void dc(dcache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
+  }
+
+  void ic(icache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
+  }
That looks better, yes.
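For reference, the sys() call above corresponds to the generic AArch64 SYS instruction, so the dc/ic encodings can be cross-checked against the architecture's fixed bit layout. A minimal standalone sketch (the encode_sys helper and wrapper names are mine, not HotSpot's):

```cpp
#include <cstdint>

// AArch64 SYS #op1, Cn, Cm, #op2, Xt encoding (base opcode 0xD5080000).
// DC and IC both use CRn = 0b0111; only CRm (the *cache_maintenance
// enum value in the patch) selects the particular operation.
uint32_t encode_sys(uint32_t op1, uint32_t crn, uint32_t crm,
                    uint32_t op2, uint32_t rt) {
    return 0xD5080000u | (op1 << 16) | (crn << 12) | (crm << 8)
                       | (op2 << 5) | rt;
}

// DC ZVA, Xt  -> SYS #3, C7, C4, #1, Xt   (CRm = ZVA = 0b100)
uint32_t dc_zva(uint32_t rt)  { return encode_sys(0b011, 0b0111, 0b0100, 0b001, rt); }
// IC IVAU, Xt -> SYS #3, C7, C5, #1, Xt   (CRm = IVAU = 0b0101)
uint32_t ic_ivau(uint32_t rt) { return encode_sys(0b011, 0b0111, 0b0101, 0b001, rt); }
```

For example, dc_zva(0) yields 0xD50B7420, the encoding of `dc zva, x0`.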
5. To avoid scratching a new register, I wrote a small piece of code after the dc zva loop in block_zero, so that block_zero doesn't need to fall through to fill_words to zero the small part of the array. This code might not perform as well as fill_words (unrolled), but it requires one less register, and the code size is smaller as well. The final code looks like this:
0x0000007f7d3dd4fc: cmp  x11, #0x20
0x0000007f7d3dd500: b.lt 0x0000007f7d3dd538
0x0000007f7d3dd504: neg  x8, x10
0x0000007f7d3dd508: and  x8, x8, #0x3f
0x0000007f7d3dd50c: cbz  x8, 0x0000007f7d3dd520
0x0000007f7d3dd510: sub  x11, x11, x8, asr #3
0x0000007f7d3dd514: sub  x8, x8, #0x8
0x0000007f7d3dd518: str  xzr, [x10],#8
0x0000007f7d3dd51c: cbnz x8, 0x0000007f7d3dd514
0x0000007f7d3dd520: sub  x11, x11, #0x8
0x0000007f7d3dd524: dc   zva, x10
0x0000007f7d3dd528: subs x11, x11, #0x8
0x0000007f7d3dd52c: add  x10, x10, #0x40
0x0000007f7d3dd530: b.ge 0x0000007f7d3dd524
0x0000007f7d3dd534: add  x11, x11, #0x8
0x0000007f7d3dd538: tbz  w11, #0, 0x0000007f7d3dd544
0x0000007f7d3dd53c: str  xzr, [x10],#8
0x0000007f7d3dd540: sub  x11, x11, #0x1
0x0000007f7d3dd544: cbz  x11, 0x0000007f7d3dd554
0x0000007f7d3dd548: sub  x11, x11, #0x2
0x0000007f7d3dd54c: stp  xzr, xzr, [x10],#16
0x0000007f7d3dd550: cbnz x11, 0x0000007f7d3dd548
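As a cross-check of the control flow above, here is a rough C++ rendering of the same algorithm, with memset standing in for dc zva so the sketch can run on any host (the function name and the fixed 64-byte line size are my assumptions; real code would read the block size from DCZID_EL0):

```cpp
#include <cstdint>
#include <cstring>

// Mirrors the generated code above: cnt is in 64-bit words, and the
// DC ZVA block size is assumed to be 64 bytes (8 words).
void block_zero(uint64_t* base, size_t cnt) {
    if (cnt >= 32) {                       // cmp x11, #0x20 / b.lt small
        // head: single-word stores until base is 64-byte aligned
        while ((uintptr_t)base & 63) { *base++ = 0; --cnt; }
        // body: one dc zva per cache line (memset stands in off-target)
        while (cnt >= 8) {
            memset(base, 0, 64);           // dc zva, x10
            base += 8;
            cnt  -= 8;
        }
    }
    // tail: odd word first, then pairs (str xzr / stp xzr, xzr)
    if (cnt & 1) { *base++ = 0; --cnt; }
    while (cnt) { base[0] = 0; base[1] = 0; base += 2; cnt -= 2; }
}
```

Note that the tail works for any remaining count, so the small case (cnt < 32) never touches the alignment or dc zva code at all.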
Would this be fine?
It might well be. I'd like Ed to do a few measurements of large and small block zeroing. My guess is that a reasonably small unrolled loop doing STP ZR, ZR will work better than anything else, but we'll see. Thanks, Andrew.
On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote:
On 04/19/2016 01:54 PM, Long Chen wrote:
Would this be fine?
It might well be. I'd like Ed to do a few measurements of large and small block zeroing. My guess is that a reasonably small unrolled loop doing STP ZR, ZR will work better than anything else, but we'll see.
OK. So I started by doing some basic measurements of how long it takes to clear a cache line on 3 different partners' HW using 3 different methods:

1) A sequence of str zr, [base, #N] instructions
2) A sequence of stp zr, zr, [base, #N] instructions
3) Using dc zva

Each test was repeated for 3 different memory sizes: 100 cache lines, 10000 cache lines and 1E7 cache lines, to simulate the cases where we are hitting L1, L2 and main memory respectively.

The results are here. I have normalised the time for the 100 cache line str to 100 for each partner to avoid disclosing any absolute performance figures.

http://people.linaro.org/~edward.nevill/block_zero/zva.pdf
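In C++ terms, the three sequences being timed look roughly like this (function names are mine; on real hardware the first two compile down to str/stp of xzr, and the third is a single dc zva per line):

```cpp
#include <cstdint>
#include <cstring>

// Method 1: eight 64-bit stores per 64-byte line (str xzr, [base, #N])
void clear_line_str(uint64_t* line) {
    for (int i = 0; i < 8; i++) line[i] = 0;
}

// Method 2: four paired stores per line (stp xzr, xzr, [base, #N])
void clear_line_stp(uint64_t* line) {
    for (int i = 0; i < 8; i += 2) { line[i] = 0; line[i + 1] = 0; }
}

// Method 3: one data-cache zero op per line; memset stands in off-target
void clear_line_zva(uint64_t* line) {
    memset(line, 0, 64);   // real code: dc zva on a 64-byte-aligned line
}
```

All three clear the same 64 bytes; the interesting difference is purely how many instructions and store-buffer slots each one costs.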
From this I get the following conclusions
Partner X:
- Significant improvement using stp vs str across all block zero sizes
- Significant improvement using dc zva over stp across all sizes

Partner Y:
- Virtually no performance improvement using stp vs str at all sizes
- Significant improvement using dc zva

Partner Z:
- Small improvement using stp vs str on L2 sized clears
- Small improvement using dc zva on L1/L2 sized clears
- Large block zeros show no performance improvement for str/stp/dc zva (this is probably a feature of the external memory system on the partner Z board)

So, guided by this, I modified the block zeroing patch as follows:

  <zero single word to align base to 128 bit aligned address>
  if (!small) {
    <zero remainder of first cache line using unrolled stp>
    <zero cache lines using dc zva>
  }
  <zero tail using unrolled stp>
  <zero final word>

Here is the webrev for this:

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v03/

I also made a minor modification to Long Chen's v02 patch. In the following code

+    tbz(cnt, 0, store_pair);
+    str(zr, Address(post(base, 8)));
+    sub(cnt, cnt, 1);
+    bind(store_pair);
+    cbz(cnt, done);
+    bind(loop_store_pair);
+    sub(cnt, cnt, 2);
+    stp(zr, zr, Address(post(base, 16)));
+    cbnz(cnt, loop_store_pair);
+    bind(done);

it unnecessarily misaligns the base before continuing to do the stps. We know the base is aligned in the large case because it has just finished clearing cache lines. I moved the single word zero to the end; the number of instructions is the same. The webrev for this is here:

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04

For completeness I also implemented a version using stp only and not using dc zva at all. Webrev here:

http://people.linaro.org/~edward.nevill/block_zero/stp

I have tested all of these, including Long Chen's v01 and v02 patches, using jmh as before (http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JM...)
Results are here; I have normalised the original value in each case to 1E7uS to avoid disclosing any absolute performance figures.

http://people.linaro.org/~edward.nevill/block_zero/zero.pdf

In this:

orig   - is a clean jdk9/hs-comp build (results normalised to 1E7uS)
stp    - is the stp patch above using only stps (no dc zva)
bzero1 - is Long Chen's v01 patch
bzero2 - is Long Chen's v02 patch
bzero3 - is my patch above
bzero4 - is Long Chen's v02 patch with the minor mod to avoid misaligning the stps
From this it looks like bzero3 or bzero4 would be the preferred options, and I would suggest bzero4 as bzero3 is significantly larger.
If people are happy, could I prepare a final changeset for review based on bzero4 (ie this one)?

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04

All the best,
Ed.
On 20/04/16 18:08, Edward Nevill wrote:
If people are happy could I prepare final changeset for review based on bzero4 (ie this one)
http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04
Yeah, science rocks! bzero4 is it for me, but you need a nod from the boss.

regards,

Andrew Dinn
Hi, On Wed, 2016-04-20 at 18:08 +0100, Edward Nevill wrote:
On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote:
On 04/19/2016 01:54 PM, Long Chen wrote:
Would this be fine?
It might well be. I'd like Ed to do a few measurements of large and small block zeroing. My guess is that a reasonably small unrolled loop doing STP ZR, ZR will work better than anything else, but we'll see.
OK. So I started by doing some basic measurements of how long it takes to clear a cache line on 3 different partners HW using 3 different methods.
I have redone these benchmarks using a JMH test provided by Andrew Haley. Thanks Andrew!

The test is here:
http://people.linaro.org/~edward.nevill/block_zero/ArrayFill.java

And the results are here:
http://people.linaro.org/~edward.nevill/block_zero/zva1.pdf

As a reminder, the different patches are:

http://people.linaro.org/~edward.nevill/block_zero/stp.patch
  Uses stp instead of str (no use of dc zva)

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v01.patch
  Long Chen's V01 patch

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v02.patch
  Long Chen's V02 patch

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v03.patch
  <zero single word to align base to 128 bit aligned address>
  if (!small) {
    <zero remainder of first cache line using unrolled stp>
    <zero cache lines using dc zva>
  }
  <zero tail using unrolled stp>
  <zero final word>

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04.patch
  Long Chen's v02 patch modified to avoid unaligned stp instructions
From this it seems that patches bzero3 and bzero4 produce better performance on all except very small zeros <= 16 bytes.
bzero3 is significantly larger than bzero4 and would probably need outlining. Also, the cutoff point for using stp/str instead of dc zva is set at 2 x cache lines (to guarantee there is at least 1 use of dc zva); a larger value may be better.

What I propose next is to look only at bzero3 and bzero4, to modify bzero3 to move the dc zva loop out of line, and to vary the cutoff point from stp/str to dc zva to determine the optimum value.

Thoughts?

Ed.
On Mon, 2016-04-25 at 10:09 +0100, Edward Nevill wrote:
Hi,
On Wed, 2016-04-20 at 18:08 +0100, Edward Nevill wrote:
On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote:
On 04/19/2016 01:54 PM, Long Chen wrote:
Would this be fine?
It might well be. I'd like Ed to do a few measurements of large and small block zeroing. My guess is that a reasonably small unrolled loop doing STP ZR, ZR will work better than anything else, but we'll see.
OK. So I started by doing some basic measurements of how long it takes to clear a cache line on 3 different partners HW using 3 different methods.
I have redone these benchmarks using a JMH test provided by Andrew Haley. Thanks Andrew!
The test is here
http://people.linaro.org/~edward.nevill/block_zero/ArrayFill.java
One interesting data point is the interaction between zeroing memory and allocation prefetch. The following shows this:

http://people.linaro.org/~edward.nevill/block_zero/noprefetch.pdf

Performance is improved significantly by turning off allocation prefetch. The problem is that allocation prefetch forces the cache line into L1; the zero mem then has to wait until the cache line has loaded before it can zero it. Therefore performance is much better with allocation prefetch turned off altogether. When allocation prefetch is turned off, the str/stp/dc zva just zeros L2 cache and L1 remains unaffected.

Also, ZeroTlab should improve performance, since this will reverse the order of the str/stp/dc zva and the prefetch. However, I think the tuning of prefetch / zero tlab is a separate exercise.

All the best,
Ed.
On 04/25/2016 10:09 AM, Edward Nevill wrote:
And the results are here
Bigger numbers are worse, right? Andrew.
On Mon, 2016-04-25 at 11:05 +0100, Andrew Haley wrote:
On 04/25/2016 10:09 AM, Edward Nevill wrote:
And the results are here
Bigger numbers are worse, right?
Right, original normalised to 100%.

Regards,
Ed.
participants (3)
- Andrew Dinn
- Andrew Haley
- Edward Nevill