[aarch64-port-dev ] RFR: 8155617: aarch64: ClearArray does not use DC ZVA

Edward Nevill edward.nevill at gmail.com
Fri Apr 29 15:30:33 UTC 2016


Hi Roland,


On Fri, 2016-04-29 at 16:05 +0200, Roland Westrelin wrote:
> Hi Ed,
> 
> I can't start a debug VM since that change was pushed:
> 
> #  Internal Error (/scratch/rwestrel/hs-comp/hotspot/src/share/vm/runtime/stubRoutines.cpp:344), pid=16911, tid=16912
> #  assert(s.body[i] == 32) failed: what?
> 
> I think the problem is MacroAssembler::fill_words() moves the pointer
> past the last word that was written (i.e. there'a hole between the last
> written word and the value of the base). It passes again with this
> patch:

Sorry, I missed the post-condition that base should point to the end.

Could you please try the following patch

diff -r e118111d4433 src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp	Fri Apr 29 14:32:19 2016 +0200
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp	Fri Apr 29 08:08:47 2016 -0700
@@ -4767,15 +4767,15 @@
   br(rscratch2);
 
   bind(loop);
+  add(base, base, unroll * 16);
   for (int i = -unroll; i < 0; i++)
     stp(value, value, Address(base, i * 16));
   bind(entry);
   subs(cnt, cnt, unroll * 2);
-  add(base, base, unroll * 16);
   br(Assembler::GE, loop);
 
   tbz(cnt, 0, fini);
-  str(value, Address(base, -unroll * 16));
+  str(value, Address(post(base, 8)));
   bind(fini);
 }
 

I believe the only thing wrong with the code is that it exits without the base being updated as expected.

> 
> diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> @@ -4754,12 +4754,12 @@
>    const int unroll = 8; // Number of stp instructions we'll unroll
>  
>    cbz(cnt, fini);
> -  tbz(base, 3, skip);
> +  tbz(cnt, 0, skip);
>    str(value, Address(post(base, 8)));
>    sub(cnt, cnt, 1);
>    bind(skip);

The intention here is to align the base, not ensure the count is even. An odd count is handled by the final store.

>  
> -  andr(rscratch1, cnt, (unroll-1) * 2);
> +  andr(rscratch1, cnt, unroll* 2 - 1);

The mask (unroll-1) * 2 ensures rscratch1 is even. With an unroll value of 8 the mask will be 0xe giving a count of the even no of values to be stored by the jump into the loop.


>    sub(cnt, cnt, rscratch1);
>    add(base, base, rscratch1, Assembler::LSL, 3);
>    adr(rscratch2, entry);
> @@ -4767,15 +4767,14 @@
>    br(rscratch2);
>  
>    bind(loop);
> -  for (int i = -unroll; i < 0; i++)
> +  add(base, base, unroll * 16);
> +  sub(cnt, cnt, unroll * 2);
> +  for (int i = -unroll; i < 0; i++) {
>      stp(value, value, Address(base, i * 16));
> +  }
>    bind(entry);
> -  subs(cnt, cnt, unroll * 2);
> -  add(base, base, unroll * 16);
> -  br(Assembler::GE, loop);

I need to do subs / b GE here because cnt may be odd. Therefore I cannot use cbnz cnt, ... because the count may never become zero.

Moving the update of the base to the head of the loop is correct as that ensures the base points correctly to the end on exit.

> -
> -  tbz(cnt, 0, fini);
> -  str(value, Address(base, -unroll * 16));
> +  cbnz(cnt, loop);
> +
>    bind(fini);
>  }
>  
> 
> I've done very little testing with it but a few things felt strange in
> that code:
> 
> - I don't understand why aligning the base at the beginning is
>   necessary. Once the base is aligned there's no guarantee the cnt is an
>   even number of words. so when we update the base with:
> 
> add(base, base, rscratch1, Assembler::LSL, 3);

Yes, but the mask (unroll-1) * 2 ensures rscratch1 is even.

> 
> base can be misaligned again. I changed it to test whether cnt is even
> so at least we don't have to worry about that anymore.
> 
> - (unroll-1) * 2 seems like an incorrect mask.
> 
> - then there's the problem of base being updated at the end of the loop
>   that I mentioned above and that can lead to an incorrect base.

Yes, sorry I completely missed that postcondition.

> 
> - That part at the end:
>  str(value, Address(base, -unroll * 16));

Because I did the update of the base at the end, base points unroll * 16 beyond the end. So if there is an odd word at the end it is stored at offset -unroll * 16.

But updating the base at the head of the loop ensures it is pointing to the end on exit. Then if there is an odd word at the end we can use post(base, 8) to store the odd word and ensure the base is updated.

> 
> I don't understand.
> 
> Anyway, as I said I've done very little testing so it's entirely
> possible my patch above is broken.

Thanks for finding this and sorry again. I'll give the above patch a good testing and submit it for RFR. If you could also try it that would be appreciated.

All the best,
Ed.




More information about the hotspot-compiler-dev mailing list