[aarch64-port-dev ] Error in server compiler when packing/unpacking data from arrays using shift and mask ops.
Andy Johnson
andy.johnson at linaro.org
Wed Dec 4 13:07:42 PST 2013
The jtreg hotspot/compiler test TestCharVect.java contains the following
code snippet:
long l0 = (long)a1[i*4+0];
long l1 = (long)a1[i*4+1];
long l2 = (long)a1[i*4+2];
long l3 = (long)a1[i*4+3];
p4[i] = (l0 & 0xFFFFl) |
        ((l1 & 0xFFFFl) << 16) |
        ((l2 & 0xFFFFl) << 32) |
        ((l3 & 0xFFFFl) << 48);
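For reference, the packing expression can be exercised on its own. The sketch below wraps it in a hypothetical helper (the names Pack4Sketch and pack4 are illustrative and do not come from TestCharVect.java) so the expected bit layout is easy to see: each char occupies one 16-bit lane of the long, with the lowest-indexed char in the least significant lane.

```java
// Illustrative sketch only: Pack4Sketch and pack4 are assumed names,
// not part of the jtreg test.
final class Pack4Sketch {

    // Packs the four chars a1[i*4] .. a1[i*4+3] into one long,
    // 16 bits per char, lowest-indexed char in the lowest lane.
    static long pack4(char[] a1, int i) {
        long l0 = (long) a1[i * 4 + 0];
        long l1 = (long) a1[i * 4 + 1];
        long l2 = (long) a1[i * 4 + 2];
        long l3 = (long) a1[i * 4 + 3];
        return (l0 & 0xFFFFL)
             | ((l1 & 0xFFFFL) << 16)
             | ((l2 & 0xFFFFL) << 32)
             | ((l3 & 0xFFFFL) << 48);
    }

    public static void main(String[] args) {
        char[] a1 = {0x1111, 0x2222, 0x3333, 0x4444};
        // All four lanes should be populated.
        System.out.println(Long.toHexString(pack4(a1, 0))); // 4444333322221111
    }
}
```

A result whose top four hex digits are zero for non-zero input would correspond to the failure described below.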
The code generated by the server compiler is this:
0x00007fcac91a7d04: add xscratch1, x2, #0x10
0x00007fcac91a7d08: ldrh w15, [xscratch1,w11,sxtw #1]
0x00007fcac91a7d0c: ldrh w18, [x14,#24] ;*caload
  ; - TestCharVect2::test_pack4_swap@24 (line 1309)
0x00007fcac91a7d10: ldrh w3, [x14,#22] ;*caload
  ; - TestCharVect2::test_pack4_swap@54 (line 1312)
0x00007fcac91a7d14: ldrh w1, [x14,#20] ;*caload
  ; - TestCharVect2::test_pack4_swap@44 (line 1311)
0x00007fcac91a7d18: ldrh w5, [x14,#18] ;*caload
  ; - TestCharVect2::test_pack4_swap@34 (line 1310)
0x00007fcac91a7d1c: ldrh w11, [x14,#30] ;*caload
  ; - TestCharVect2::test_pack4_swap@54 (line 1312)
0x00007fcac91a7d20: ldrh w17, [x14,#28] ;*caload
  ; - TestCharVect2::test_pack4_swap@44 (line 1311)
0x00007fcac91a7d24: ldrh w14, [x14,#26] ;*caload
  ; - TestCharVect2::test_pack4_swap@34 (line 1310)
0x00007fcac91a7d28: sbfx x15, x15, #16, #16
0x00007fcac91a7d2c: sbfiz x14, x14, #32, #32
0x00007fcac91a7d30: sbfiz x16, x17, #16, #32
0x00007fcac91a7d34: sxtw x11, w11
0x00007fcac91a7d38: orr x11, x11, x16
0x00007fcac91a7d3c: orr x11, x11, x14
0x00007fcac91a7d40: sbfiz x14, x5, #32, #32
0x00007fcac91a7d44: sbfiz x16, x1, #16, #32
0x00007fcac91a7d48: sxtw x17, w3
0x00007fcac91a7d4c: orr x16, x17, x16
0x00007fcac91a7d50: orr x14, x16, x14
0x00007fcac91a7d54: orr x14, x14, x15
0x00007fcac91a7d58: add xscratch1, x13, #0x10
0x00007fcac91a7d5c: str x14, [xscratch1,w0,sxtw #3]
0x00007fcac91a7d60: sbfx x14, x18, #16, #16
0x00007fcac91a7d64: orr x11, x11, x14
0x00007fcac91a7d68: add xscratch1, x13, #0x18
0x00007fcac91a7d6c: str x11, [xscratch1,w0,sxtw #3]
  ;*lastore
  ; - TestCharVect2::test_pack4_swap@96 (line 1313)
0x00007fcac91a7d70: add w0, w0, #0x2 ;*iinc
  ; - TestCharVect2::test_pack4_swap@97 (line 1308)
0x00007fcac91a7d74: cmp w0, w12
0x00007fcac91a7d78: b.lt 0x00007fcac91a7cfc ;*if_icmpge
  ; - TestCharVect2::test_pack4_swap@15 (line 1308)
;; B16: # B18 B17 <- B14 B15 Freq: 0.481246
0x00007fcac91a7d7c: cmp w0, w10
0x00007fcac91a7d80: b.ge 0x00007fcac91a7dd0 ;*aload_1
  ; - TestCharVect2::test_pack4_swap@18 (line 1309)
;; B17: # B17 B18 <- B16 B17 Loop: B17-B17 inner post of N324 Freq: 0.481246
0x00007fcac91a7d84: lsl w11, w0, #2 ;*imul
  ; - TestCharVect2::test_pack4_swap@21 (line 1309)
0x00007fcac91a7d88: add xmethod, x2, w11, sxtw #1
  ;*caload
  ; - TestCharVect2::test_pack4_swap@54 (line 1312)
0x00007fcac91a7d8c: add xscratch1, x2, #0x10
0x00007fcac91a7d90: ldrh w14, [xscratch1,w11,sxtw #1]
  ;*caload
  ; - TestCharVect2::test_pack4_swap@24 (line 1309)
0x00007fcac91a7d94: ldrh w11, [xmethod,#18] ;*caload
  ; - TestCharVect2::test_pack4_swap@34 (line 1310)
0x00007fcac91a7d98: ldrh w16, [xmethod,#22] ;*caload
  ; - TestCharVect2::test_pack4_swap@54 (line 1312)
0x00007fcac91a7d9c: ldrh w12, [xmethod,#20] ;*caload
  ; - TestCharVect2::test_pack4_swap@44 (line 1311)
0x00007fcac91a7da0: sbfx x14, x14, #16, #16
0x00007fcac91a7da4: sbfiz xmethod, xmethod, #16, #32
0x00007fcac91a7da8: sxtw x15, w16
0x00007fcac91a7dac: orr xmethod, x15, xmethod
0x00007fcac91a7db0: sbfiz x11, x11, #32, #32
0x00007fcac91a7db4: orr x11, xmethod, x11
0x00007fcac91a7db8: orr x11, x11, x14
0x00007fcac91a7dbc: add xscratch1, x13, #0x10
0x00007fcac91a7dc0: str x11, [xscratch1,w0,sxtw #3]
  ;*lastore
  ; - TestCharVect2::test_pack4_swap@96 (line 1313)
0x00007fcac91a7dc4: add w0, w0, #0x1 ;*iinc
  ; - TestCharVect2::test_pack4_swap@97 (line 1308)
0x00007fcac91a7dc8: cmp w0, w10
0x00007fcac91a7dcc: b.lt 0x00007fcac91a7d84 ;*arraylength
  ; - TestCharVect2::test_pack4_swap@5 (line 1307)
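The loop has clearly been unrolled: the main loop body stores two longs and
advances the index by 2 per pass (add w0, w0, #0x2), and block B17 is the
post-loop that handles the remaining iterations one at a time.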
For some reason, only the low 48 bits of each 64-bit long are set; the 16
most significant bits (numerically, regardless of endianness) are all zero.
The pack test fails, and so does the subsequent unpack test: it multiplies
the sub-components together, and because one of them is zero, the final
result is zero.
However, if I add the following line:
dummy_print_long(i, p4[i], print_flag);
I get the following code:
0x00007fb7c91b153c: add xscratch1, x2, #0x10
0x00007fb7c91b1540: ldrh w11, [xscratch1,w10,sxtw #1]
0x00007fb7c91b1544: add x10, x2, w10, sxtw #1 ;*caload
  ; - TestCharVect::test_pack4_swap@54 (line 1312)
0x00007fb7c91b1548: str x2, [sp]
0x00007fb7c91b154c: ldrh w12, [x10,#22]
0x00007fb7c91b1550: ldrh w13, [x10,#18]
0x00007fb7c91b1554: ldrh w10, [x10,#20]
0x00007fb7c91b1558: orr x10, xmethod, x10, lsl #16
0x00007fb7c91b155c: orr x10, x10, x13, lsl #32
0x00007fb7c91b1560: orr x2, x10, x11, lsl #48 ;*lor
  ; - TestCharVect::test_pack4_swap@95 (line 1313)
0x00007fb7c91b1564: ldr x10, [sp,#24]
0x00007fb7c91b1568: ldr w11, [sp,#8]
0x00007fb7c91b156c: add xscratch1, x10, #0x10
0x00007fb7c91b1570: str x2, [xscratch1,w11,sxtw #3]
  ;*aload_1
  ; - TestCharVect::test_pack4_swap@18 (line 1309)
0x00007fb7c91b1574: ldr w1, [sp,#8]
0x00007fb7c91b1578: ldr w3, [sp,#16]
0x00007fb7c91b157c: bl 0x00007fb7c91482a0 ; OopMap{[0]=Oop [24]=Oop off=256}
  ;*invokestatic dummy_print_long
  ; - TestCharVect::test_pack4_swap@102 (line 1318)
  ; {static_call}
;; B15: # B13 B16 <- B14 Freq: 12.8298
0x00007fb7c91b1580: ldr w11, [sp,#8]
0x00007fb7c91b1584: add w12, w11, #0x1 ;*iinc
  ; - TestCharVect::test_pack4_swap@105 (line 1308)
0x00007fb7c91b1588: ldr w13, [sp,#12]
0x00007fb7c91b158c: cmp w12, w13
0x00007fb7c91b1590: b.lt 0x00007fb7c91b152c ;*if_icmpge
  ; - TestCharVect::test_pack4_swap@15 (line 1308)
;; B16: # B12 <- B15 Freq: 0.481118
0x00007fb7c91b1594: b 0x00007fb7c91b1510
;; B17: # N1 <- B10 B5 B6 B7 B8 B9 Freq: 2.90666e-06
;; 0xFFFFFF86
0x00007fb7c91b1598: movn w1, #0x79
0x00007fb7c91b159c: str x13, [sp]
0x00007fb7c91b15a0: str x2, [sp,#8]
0x00007fb7c91b15a4: bl 0x00007fb7c9149520 ; OopMap{[0]=Oop [8]=Oop off=296}
  ;*aload_1
  ; - TestCharVect::test_pack4_swap@18 (line 1309)
  ; {runtime_call}
0x00007fb7c91b15a8: brk #0x3e7 ;*invokestatic dummy_print_long
  ; - TestCharVect::test_pack4_swap@102 (line 1318)
Note that in this case, the loop is not unrolled.
All 64 bits are included in the printout, and the validation tests pass.
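One possible reading of the two listings, offered as a sketch rather than a confirmed diagnosis: in the passing code the top lane is merged with "orr x2, x10, x11, lsl #48", while in the failing code the corresponding value goes through "sbfx x15, x15, #16, #16". SBFX extracts a bitfield rather than shifting left, and because ldrh zero-extends the loaded char, bits 16..31 of the source are always zero. The Java model below imitates the two bitfield instructions to show the effect; the class and method names (BitfieldSketch, sbfx, sbfiz) are assumptions made for illustration and are not HotSpot code.

```java
// Illustrative model of two AArch64 bitfield instructions on 64-bit registers.
// BitfieldSketch, sbfx and sbfiz are assumed names, not HotSpot code.
final class BitfieldSketch {

    // SBFX Xd, Xn, #lsb, #width:
    // extract `width` bits starting at bit `lsb`, then sign-extend.
    static long sbfx(long x, int lsb, int width) {
        return (x << (64 - lsb - width)) >> (64 - width);
    }

    // SBFIZ Xd, Xn, #lsb, #width:
    // take the low `width` bits, sign-extend, then shift left by `lsb`.
    static long sbfiz(long x, int lsb, int width) {
        return ((x << (64 - width)) >> (64 - width)) << lsb;
    }

    public static void main(String[] args) {
        long c = 0x4444L; // a char as ldrh leaves it: zero-extended 16 bits

        // What the failing listing computes for the top lane.
        System.out.println(Long.toHexString(sbfx(c, 16, 16)));  // 0

        // What the source expression asks for.
        System.out.println(Long.toHexString(c << 48));          // 4444000000000000

        // The middle lane in the failing listing is still placed correctly.
        System.out.println(Long.toHexString(sbfiz(c, 32, 32))); // 444400000000
    }
}
```

If this reading is right, the lanes shifted by 16 and 32 survive and only the lane that should be shifted by 48 collapses to zero, which matches the missing high-order 16 bits.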
BTW, in this particular test, the size of the array is 997. If I reduce it
to 120, the test still fails, but with any size smaller than 120 it passes.
I assume that below 120 elements the method never reaches the threshold for
triggering a compilation, so the code is running in the interpreter.
Also, there are two other tests, TestByteVect.java and TestShortVect.java,
that exhibit the same behavior, and I suspect that a single fix will
correct all of these problems.