aarch64: RFR: Block zeroing by 'DC ZVA'

Andrew Haley aph at redhat.com
Mon Apr 18 12:55:12 UTC 2016


One other thing.  This is rather a lot of code to emit every time an
array is created:

 ;; zero_words {
  0x0000007fa880f5f0: cmp	x11, #0x20
  0x0000007fa880f5f4: b.lt	0x0000007fa880f62c

  0x0000007fa880f5f8: neg	x8, x10
  0x0000007fa880f5fc: and	x8, x8, #0x7f
  0x0000007fa880f600: cbz	x8, 0x0000007fa880f614
  0x0000007fa880f604: sub	x11, x11, x8, asr #3
  0x0000007fa880f608: sub	x8, x8, #0x8
  0x0000007fa880f60c: str	xzr, [x10],#8
  0x0000007fa880f610: cbnz	x8, 0x0000007fa880f608
  0x0000007fa880f614: sub	x11, x11, #0x10
  0x0000007fa880f618: dc	zva, x10
  0x0000007fa880f61c: subs	x11, x11, #0x10
  0x0000007fa880f620: add	x10, x10, #0x80
  0x0000007fa880f624: b.ge	0x0000007fa880f618
  0x0000007fa880f628: add	x11, x11, #0x10

  0x0000007fa880f62c: and	x8, x11, #0x7

I don't think this CBZ does anything useful, because when x8 is zero
the computed branch that follows lands on the same target anyway:

  0x0000007fa880f630: cbz	x8, 0x0000007fa880f670

(I'm assuming that the 0-7 cases are uniformly distributed.)
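
One way to read that remark is as a small instruction-count model, sketched
here in Python. The address comes from the listing and every A64
instruction is 4 bytes. The function names are made up for illustration,
and the model counts instructions only: it says nothing about cycles or
about how well the CBZ or the indirect BR is predicted.

```python
# Address of the final `cbnz x11, ...` in the listing; the ADR loads it into x9.
LOOP_TEST = 0x7FA880F670


def dispatch_target(rem):
    """Address reached by `br x9` for rem = x11 & 7.

    Mirrors `adr x9, LOOP_TEST` followed by `sub x9, x9, x8, lsl #2`:
    each remaining word backs up over one 4-byte STUR.
    """
    return LOOP_TEST - (rem << 2)


def expected_instructions(with_cbz):
    """Average instructions executed from the CBZ (if any) through the BR,
    assuming rem is uniformly distributed over 0..7."""
    total = 0
    for rem in range(8):
        if with_cbz:
            # The CBZ always runs; the 5-instruction dispatch
            # (sub, add, adr, sub, br) runs only when rem != 0.
            total += 1 if rem == 0 else 1 + 5
        else:
            # No CBZ: the dispatch always runs, and is a no-op for rem == 0.
            total += 5
    return total / 8
```

With rem == 0 the SUB and ADD change nothing and dispatch_target(0) is the
CBZ's own target, so dropping the CBZ does not change behaviour. Under the
uniform assumption the average is 5.375 instructions with the CBZ and 5
without it, and the sequence is one instruction shorter.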

  0x0000007fa880f634: sub	x11, x11, x8
  0x0000007fa880f638: add	x10, x10, x8, lsl #3
  0x0000007fa880f63c: adr	x9, 0x0000007fa880f670
  0x0000007fa880f640: sub	x9, x9, x8, lsl #2
  0x0000007fa880f644: br	x9
  0x0000007fa880f648: add	x10, x10, #0x40
  0x0000007fa880f64c: sub	x11, x11, #0x8
  0x0000007fa880f650: stur	xzr, [x10,#-64]
  0x0000007fa880f654: stur	xzr, [x10,#-56]
  0x0000007fa880f658: stur	xzr, [x10,#-48]
  0x0000007fa880f65c: stur	xzr, [x10,#-40]
  0x0000007fa880f660: stur	xzr, [x10,#-32]
  0x0000007fa880f664: stur	xzr, [x10,#-24]
  0x0000007fa880f668: stur	xzr, [x10,#-16]
  0x0000007fa880f66c: stur	xzr, [x10,#-8]
  0x0000007fa880f670: cbnz	x11, 0x0000007fa880f648
 ;; } zero_words
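
For readers who would rather not trace the registers by hand, here is a
rough Python model of what the sequence above appears to do. It is a sketch
based on my reading of the listing, not the HotSpot source: `mem` is a
bytearray standing in for memory, offset 0 is assumed to be 128-byte
aligned, and the 128-byte DC ZVA block size is inferred from the #0x7f mask
and the #0x80 stride.

```python
WORD = 8           # bytes per 64-bit word
ZVA_BLOCK = 0x80   # DC ZVA block size implied by the listing (assumption)


def zero_words(mem, base, cnt):
    """Zero cnt words of bytearray mem starting at byte offset base.

    base stands in for x10 and cnt for x11; base must be 8-byte aligned.
    Returns (head stores, DC ZVA blocks, tail stores).
    """
    head = blocks = tail = 0
    if cnt >= 0x20:                      # cmp x11, #0x20 ; b.lt
        pad = (-base) & 0x7F             # bytes up to the next block boundary
        cnt -= pad >> 3                  # sub x11, x11, x8, asr #3
        while pad:                       # str xzr, [x10], #8 loop
            mem[base:base + WORD] = bytes(WORD)
            base += WORD
            pad -= WORD
            head += 1
        cnt -= 16                        # bias so SUBS / B.GE can test the sign
        while True:
            mem[base:base + ZVA_BLOCK] = bytes(ZVA_BLOCK)   # dc zva, x10
            blocks += 1
            cnt -= 16
            base += ZVA_BLOCK
            if cnt < 0:
                break
        cnt += 16                        # undo the bias
    while cnt:
        # The first pass handles cnt % 8 words via the computed branch;
        # every later pass runs all eight STURs.
        n = (cnt & 7) or 8
        mem[base:base + n * WORD] = bytes(n * WORD)
        base += n * WORD
        cnt -= n
        tail += n
    return head, blocks, tail
```

For example, zeroing 40 words from offset 8 takes 15 head stores to reach
the block boundary, one DC ZVA, and 9 tail stores.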

We could think about moving the large block case into a stub which is
emitted after the main body of the method, or even into a shared stub.
A shared stub would require the args to be in fixed registers, though.
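
To put a number on what a stub might save, here is a quick calculation from
the addresses in the listing, relying only on the fact that every A64
instruction is 4 bytes. The helper name is made up for illustration, and the
cost of the branch or call needed to reach a stub is deliberately left out,
since that depends on how the stub would be entered.

```python
INSN_BYTES = 4  # every A64 instruction is 4 bytes


def span(first, last):
    """(instructions, bytes) covered by an inclusive address range."""
    n = (last - first) // INSN_BYTES + 1
    return n, n * INSN_BYTES


# Whole zero_words sequence: from the CMP to the final CBNZ.
whole = span(0x7FA880F5F0, 0x7FA880F670)
# Large block case only: alignment loop plus DC ZVA loop (NEG .. final ADD).
large = span(0x7FA880F5F8, 0x7FA880F628)
```

That gives 33 instructions (132 bytes) per array creation, of which the
large block case is 13 instructions (52 bytes), so moving it out of line
would remove at most 52 bytes from each site.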

Andrew.


