aarch64: RFR: Block zeroing by 'DC ZVA'
Andrew Haley
aph at redhat.com
Tue Apr 19 13:19:31 UTC 2016
On 04/19/2016 01:54 PM, Long Chen wrote:
> Thanks for all these nice comments. Here is a revised version:
>
> http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.v02.patch
>
>
> Changes:
>
> 1. Are DC and IC really synonyms?
>
> DC and IC assembling was supposed to be distinguished by different
> cache_maintenance parameters. I created two enums ‘icache_maintenance’ and
> ‘dcache_maintenance’ in the revised patch, to make it look better.
>
> + enum icache_maintenance {IVAU = 0b0101};
> + enum dcache_maintenance {CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, ZVA = 0b100};
> + void dc(dcache_maintenance cm, Register Rt) {
> + sys(0b011, 0b0111, cm, 0b001, Rt);
> + }
> +
> + void ic(icache_maintenance cm, Register Rt) {
> + sys(0b011, 0b0111, cm, 0b001, Rt);
> }
That looks better, yes.
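As an illustration of why dc() and ic() can share every field except the enum value, here is a minimal sketch of the SYS encoding. It is not part of the patch: encode_sys, encode_dc and encode_ic are hypothetical free functions standing in for the assembler methods, and registers are passed as plain numbers. Only the CRm field (the enum value) differs between the operations.

```cpp
#include <cstdint>

// CRm values taken from the quoted patch.
enum icache_maintenance : std::uint32_t { IVAU = 0b0101 };
enum dcache_maintenance : std::uint32_t {
    CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, ZVA = 0b0100
};

// Hypothetical stand-in for Assembler::sys(): packs the fields of the
// A64 "SYS #op1, Cn, Cm, #op2, Xt" instruction into a 32-bit word.
constexpr std::uint32_t encode_sys(std::uint32_t op1, std::uint32_t CRn,
                                   std::uint32_t CRm, std::uint32_t op2,
                                   std::uint32_t Rt) {
    return 0xD5080000u | (op1 << 16) | (CRn << 12) | (CRm << 8) | (op2 << 5) | Rt;
}

// Same op1, CRn and op2 for both; the typed enum selects the operation.
constexpr std::uint32_t encode_dc(dcache_maintenance cm, std::uint32_t Rt) {
    return encode_sys(0b011, 0b0111, cm, 0b001, Rt);
}

constexpr std::uint32_t encode_ic(icache_maintenance cm, std::uint32_t Rt) {
    return encode_sys(0b011, 0b0111, cm, 0b001, Rt);
}

// "dc zva, x10" and "ic ivau, x0"
static_assert(encode_dc(ZVA, 10) == 0xD50B742Au, "dc zva, x10");
static_assert(encode_ic(IVAU, 0) == 0xD50B7520u, "ic ivau, x0");
```

With separate enum types, a call such as encode_ic(ZVA, 0) fails to compile, which is the point of splitting the original single enum.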
> 5. To avoid clobbering another register, I wrote a small piece of code
> after the dc zva loop in block_zero, so that block_zero doesn’t need to
> fall through to fill_words to zero the small remainder of the array. This
> code might not perform as well as fill_words (unrolled), but it requires one
> fewer register, and the code size becomes smaller as well.
> The final code is like this:
>
> 0x0000007f7d3dd4fc: cmp x11, #0x20
> 0x0000007f7d3dd500: b.lt 0x0000007f7d3dd538
> 0x0000007f7d3dd504: neg x8, x10
> 0x0000007f7d3dd508: and x8, x8, #0x3f
> 0x0000007f7d3dd50c: cbz x8, 0x0000007f7d3dd520
> 0x0000007f7d3dd510: sub x11, x11, x8, asr #3
> 0x0000007f7d3dd514: sub x8, x8, #0x8
> 0x0000007f7d3dd518: str xzr, [x10],#8
> 0x0000007f7d3dd51c: cbnz x8, 0x0000007f7d3dd514
> 0x0000007f7d3dd520: sub x11, x11, #0x8
> 0x0000007f7d3dd524: dc zva, x10
> 0x0000007f7d3dd528: subs x11, x11, #0x8
> 0x0000007f7d3dd52c: add x10, x10, #0x40
> 0x0000007f7d3dd530: b.ge 0x0000007f7d3dd524
> 0x0000007f7d3dd534: add x11, x11, #0x8
> 0x0000007f7d3dd538: tbz w11, #0, 0x0000007f7d3dd544
> 0x0000007f7d3dd53c: str xzr, [x10],#8
> 0x0000007f7d3dd540: sub x11, x11, #0x1
> 0x0000007f7d3dd544: cbz x11, 0x0000007f7d3dd554
> 0x0000007f7d3dd548: sub x11, x11, #0x2
> 0x0000007f7d3dd54c: stp xzr, xzr, [x10],#16
> 0x0000007f7d3dd550: cbnz x11, 0x0000007f7d3dd548
>
> Would this be fine?
It might well be. I'd like Ed to do a few measurements of large and
small block zeroing. My guess is that a reasonably small unrolled loop
doing STP ZR, ZR will work better than anything else, but we'll see.
Thanks,
Andrew.