RFR: 8144028: Use AArch64 bit-test instructions in C2
Andrew Haley
aph at redhat.com
Wed Dec 2 11:13:39 UTC 2015
On 02/12/15 09:45, Andrew Haley wrote:
> On 01/12/15 21:18, Vladimir Kozlov wrote:
>> Thanks. I will add -Xbatch flag. It will make sure to trigger compilation when threshold is reached. And I will verify.
>
> That does not work, I'm afraid. When -Xbatch is used, C2 does not
> generate the instructions I'm trying to test. The problem is that
> it generates conditional branches and moves instead of CMove.
>
> Why should it do this? Maybe the profile counts are different,
> but I don't think they are.
Here is the good code, without -Xbatch:
  ;; B2: # N40 <- B1 Freq: 0.999999
  0x0000007fa8daf4c0: eor x11, x11, x11, lsl #13
                          ;*lxor {reexecute=0 rethrow=0 return_oop=0}
                          ; - XorShift::nextLong@12 (line 159)
                          ; - BitTests::testLongMaskBranch@4 (line 101)
  0x0000007fa8daf4c4: eor x11, x11, x11, lsr #17
                          ;*lxor {reexecute=0 rethrow=0 return_oop=0}
                          ; - XorShift::nextLong@28 (line 160)
                          ; - BitTests::testLongMaskBranch@4 (line 101)
  0x0000007fa8daf4c8: eor x11, x11, x11, lsl #5 ;*lxor {reexecute=0 rethrow=0 return_oop=0}
                          ; - XorShift::nextLong@43 (line 161)
                          ; - BitTests::testLongMaskBranch@4 (line 101)
  0x0000007fa8daf4cc: tst x3, x11
  0x0000007fa8daf4d0: str x11, [x10,#16] ;*invokevirtual nextLong {reexecute=0 rethrow=0 return_oop=0}
                          ; - BitTests::testLongMaskBranch@4 (line 101)
  0x0000007fa8daf4d4: add x10, x2, #0x1
  0x0000007fa8daf4d8: csel x0, x10, x2, ne ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                          ; - BitTests::testLongMaskBranch@18 (line 104)
Note that the call to XorShift::nextLong has been nicely inlined, and

    if ((((int)r.nextLong() & mask) != 0)) {
        counter++;
    }

generates just a TST, an ADD, and a CSEL. (It's the TST
instruction that this test case needs to exercise.)
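The branch in the test and the TST/ADD/CSEL sequence compute the same result: C2 forms both candidate values (counter and counter + 1) and selects one on the flags, with no branch. A minimal Java sketch of that equivalence, assuming a hypothetical reconstruction rather than the actual BitTests source: the class name BitTestsSketch, the methods maskBranch and maskSelect, the seed and the mask are all illustrative, and the xorshift shift amounts (13, 17, 5) are inferred from the three EOR instructions in the disassembly.

```java
// Hypothetical reconstruction for illustration only; names, seed and
// mask are assumptions, not the real BitTests test source.
public class BitTestsSketch {

    // Minimal xorshift generator. The shift amounts are inferred from the
    // EOR ... LSL #13 / LSR #17 / LSL #5 instructions in the listing.
    static final class XorShift {
        private long x;

        XorShift(long seed) {
            x = seed; // must be non-zero, or the sequence stays at 0
        }

        long nextLong() {
            x ^= x << 13;
            x ^= x >>> 17; // unsigned shift, matching LSR
            x ^= x << 5;
            return x;
        }
    }

    // Branchy form, using the same condition as the snippet above.
    static long maskBranch(XorShift r, long mask, long counter) {
        if ((((int) r.nextLong() & mask) != 0)) {
            counter++;
        }
        return counter;
    }

    // Select form: compute counter + 1 up front, test the masked bits,
    // then pick one of the two values (roughly what ADD + TST + CSEL do).
    static long maskSelect(XorShift r, long mask, long counter) {
        long incremented = counter + 1;
        boolean nonZero = ((int) r.nextLong() & mask) != 0;
        return nonZero ? incremented : counter;
    }

    public static void main(String[] args) {
        XorShift a = new XorShift(42);
        XorShift b = new XorShift(42);
        long mask = 0x0800L; // hypothetical mask value
        long c1 = 0, c2 = 0;
        for (int i = 0; i < 100_000; i++) {
            c1 = maskBranch(a, mask, c1);
            c2 = maskSelect(b, mask, c2);
        }
        System.out.println(c1 == c2); // both forms must agree
    }
}
```

Both generators start from the same seed, so the two forms see identical random values and must produce identical counters; which form the compiler emits is purely a code-generation choice.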
Here is the bad code, with -Xbatch:
  ;; B2: # B8 B3 <- B1 Freq: 0.999999
  0x0000007f851d92c8: bl 0x0000007f85148300 ; ImmutableOopMap{}
                          ;*invokevirtual nextLong {reexecute=0 rethrow=0 return_oop=0}
                          ; - BitTests::testLongMaskBranch@4 (line 101)
                          ; {optimized virtual_call}
  ;; B3: # B6 B4 <- B2 Freq: 0.999979
  0x0000007f851d92cc: ldr x10, [sp]
  0x0000007f851d92d0: and x10, x0, x10
  0x0000007f851d92d4: cbz x10, 0x0000007f851d92f0
                          ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                          ; - BitTests::testLongMaskBranch@11 (line 101)
  ;; B4: # B5 <- B3 Freq: 0.899981
  0x0000007f851d92d8: add x0, xfp, #0x1 ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                          ; - BitTests::testLongMaskBranch@18 (line 104)
  ;; B5: # N62 <- B4 B6 Freq: 0.999979
  0x0000007f851d92dc: ldp xfp, xlr, [sp,#32]
  0x0000007f851d92e0: add sp, sp, #0x30
  0x0000007f851d92e4: adrp xscratch1, 0x0000007f8f099000
                          ; {poll_return}
  0x0000007f851d92e8: ldr wzr, [xscratch1] ; {poll_return}
  0x0000007f851d92ec: ret
  ;; B6: # B5 <- B3 Freq: 0.0999979
  0x0000007f851d92f0: mov x0, xfp
  0x0000007f851d92f4: b 0x0000007f851d92dc
Here C2 doesn't inline a trivial method, and it doesn't use CSEL
either: it calls nextLong out of line and then branches on the result
with AND and CBZ. This looks like C1 code, but it's not. Is C2 doing
some kind of quick-and-dirty compilation? This pattern (great code
without -Xbatch, poor code with -Xbatch) recurs throughout these
tests.
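A driver along these lines could be used to reproduce the comparison: it calls the method enough times to cross the C2 compile threshold and can be run once with and once without -Xbatch. This is a hypothetical sketch; the class name WarmUpDriver, the seed, the mask and the iteration counts are assumptions. -Xbatch is equivalent to -XX:-BackgroundCompilation, so the calling thread waits for each compilation to finish; -XX:+PrintAssembly is a diagnostic flag and needs the hsdis disassembler library.

```java
// Hypothetical warm-up driver; names, seed, mask and iteration count are
// illustrative assumptions, not taken from the actual test.
//
// Compare C2 output with and without -Xbatch, for example:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly WarmUpDriver
//   java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly WarmUpDriver
public class WarmUpDriver {

    // Same xorshift shape as in the listings (shifts 13, 17, 5).
    static final class XorShift {
        private long x = 42L; // any non-zero seed

        long nextLong() {
            x ^= x << 13;
            x ^= x >>> 17;
            x ^= x << 5;
            return x;
        }
    }

    private final XorShift r = new XorShift();

    // Method under test, mirroring the condition quoted in the message.
    long testLongMaskBranch(long counter, long mask) {
        if ((((int) r.nextLong() & mask) != 0)) {
            counter++;
        }
        return counter;
    }

    // Repeated calls make the method hot so that C2 compiles it.
    static long run(int iterations) {
        WarmUpDriver d = new WarmUpDriver();
        long counter = 0;
        for (int i = 0; i < iterations; i++) {
            counter = d.testLongMaskBranch(counter, 0x0800L);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

The printed count only keeps the work observable; the interesting output is the disassembly of testLongMaskBranch in the two runs.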
Andrew.
More information about the hotspot-compiler-dev mailing list