[aarch64-port-dev ] Error in server compiler when packing/unpacking data from arrays using shift and mask ops.

Andy Johnson andy.johnson at linaro.org
Wed Dec 4 13:07:42 PST 2013


The jtreg hotspot/compiler test TestCharVect.java contains the following
code snippet:
      long l0 = (long)a1[i*4+0];
      long l1 = (long)a1[i*4+1];
      long l2 = (long)a1[i*4+2];
      long l3 = (long)a1[i*4+3];
      p4[i] = (l0 & 0xFFFFl) |
             ((l1 & 0xFFFFl) << 16) |
             ((l2 & 0xFFFFl) << 32) |
             ((l3 & 0xFFFFl) << 48);
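
For reference, here is a minimal standalone sketch of what that expression is expected to produce, so the correct result can be compared with what the compiled code stores. The class name PackSketch, the helper pack4 and the sample values are hypothetical and are not taken from the jtreg test.

```java
public class PackSketch {
    // Pack four 16-bit chars starting at a[base] into one long,
    // with the lowest array index ending up in the lowest 16 bits.
    public static long pack4(char[] a, int base) {
        long l0 = (long) a[base + 0];
        long l1 = (long) a[base + 1];
        long l2 = (long) a[base + 2];
        long l3 = (long) a[base + 3];
        return (l0 & 0xFFFFL)
             | ((l1 & 0xFFFFL) << 16)
             | ((l2 & 0xFFFFL) << 32)
             | ((l3 & 0xFFFFL) << 48);
    }

    public static void main(String[] args) {
        // Hypothetical sample input, chosen so each 16-bit lane is easy to spot.
        char[] a1 = {0x1111, 0x2222, 0x3333, 0x4444};
        long packed = pack4(a1, 0);
        System.out.println(Long.toHexString(packed)); // prints 4444333322221111
    }
}
```

All four 16-bit lanes should be populated, including bits 48-63.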

The code generated by the server compiler is this:
  0x00007fcac91a7d04: add    xscratch1, x2, #0x10
  0x00007fcac91a7d08: ldrh    w15, [xscratch1,w11,sxtw #1]
  0x00007fcac91a7d0c: ldrh    w18, [x14,#24]  ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 24 (line 1309)

  0x00007fcac91a7d10: ldrh    w3, [x14,#22]   ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 54 (line 1312)

  0x00007fcac91a7d14: ldrh    w1, [x14,#20]   ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 44 (line 1311)

  0x00007fcac91a7d18: ldrh    w5, [x14,#18]   ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 34 (line 1310)

  0x00007fcac91a7d1c: ldrh    w11, [x14,#30]  ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 54 (line 1312)

  0x00007fcac91a7d20: ldrh    w17, [x14,#28]  ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 44 (line 1311)

  0x00007fcac91a7d24: ldrh    w14, [x14,#26]  ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 34 (line 1310)

  0x00007fcac91a7d28: sbfx    x15, x15, #16, #16
  0x00007fcac91a7d2c: sbfiz    x14, x14, #32, #32
  0x00007fcac91a7d30: sbfiz    x16, x17, #16, #32
  0x00007fcac91a7d34: sxtw    x11, w11
  0x00007fcac91a7d38: orr    x11, x11, x16
  0x00007fcac91a7d3c: orr    x11, x11, x14
  0x00007fcac91a7d40: sbfiz    x14, x5, #32, #32
  0x00007fcac91a7d44: sbfiz    x16, x1, #16, #32
  0x00007fcac91a7d48: sxtw    x17, w3
  0x00007fcac91a7d4c: orr    x16, x17, x16
  0x00007fcac91a7d50: orr    x14, x16, x14
  0x00007fcac91a7d54: orr    x14, x14, x15
  0x00007fcac91a7d58: add    xscratch1, x13, #0x10
  0x00007fcac91a7d5c: str    x14, [xscratch1,w0,sxtw #3]
  0x00007fcac91a7d60: sbfx    x14, x18, #16, #16
  0x00007fcac91a7d64: orr    x11, x11, x14
  0x00007fcac91a7d68: add    xscratch1, x13, #0x18
  0x00007fcac91a7d6c: str    x11, [xscratch1,w0,sxtw #3]
                                                ;*lastore
                                                ; -
TestCharVect2::test_pack4_swap at 96 (line 1313)

  0x00007fcac91a7d70: add    w0, w0, #0x2    ;*iinc
                                                ; -
TestCharVect2::test_pack4_swap at 97 (line 1308)

  0x00007fcac91a7d74: cmp    w0, w12
  0x00007fcac91a7d78: b.lt    0x00007fcac91a7cfc  ;*if_icmpge
                                                ; -
TestCharVect2::test_pack4_swap at 15 (line 1308)

  ;; B16: #    B18 B17 <- B14 B15  Freq: 0.481246

  0x00007fcac91a7d7c: cmp    w0, w10
  0x00007fcac91a7d80: b.ge    0x00007fcac91a7dd0  ;*aload_1
                                                ; -
TestCharVect2::test_pack4_swap at 18 (line 1309)

  ;; B17: #    B17 B18 <- B16 B17     Loop: B17-B17 inner post of N324  Freq: 0.481246

  0x00007fcac91a7d84: lsl    w11, w0, #2     ;*imul
                                                ; -
TestCharVect2::test_pack4_swap at 21 (line 1309)

  0x00007fcac91a7d88: add    xmethod, x2, w11, sxtw #1
                                                ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 54 (line 1312)

  0x00007fcac91a7d8c: add    xscratch1, x2, #0x10
  0x00007fcac91a7d90: ldrh    w14, [xscratch1,w11,sxtw #1]
                                                ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 24 (line 1309)

  0x00007fcac91a7d94: ldrh    w11, [xmethod,#18]  ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 34 (line 1310)

  0x00007fcac91a7d98: ldrh    w16, [xmethod,#22]  ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 54 (line 1312)

  0x00007fcac91a7d9c: ldrh    w12, [xmethod,#20]  ;*caload
                                                ; -
TestCharVect2::test_pack4_swap at 44 (line 1311)

  0x00007fcac91a7da0: sbfx    x14, x14, #16, #16
  0x00007fcac91a7da4: sbfiz    xmethod, xmethod, #16, #32
  0x00007fcac91a7da8: sxtw    x15, w16
  0x00007fcac91a7dac: orr    xmethod, x15, xmethod
  0x00007fcac91a7db0: sbfiz    x11, x11, #32, #32
  0x00007fcac91a7db4: orr    x11, xmethod, x11
  0x00007fcac91a7db8: orr    x11, x11, x14
  0x00007fcac91a7dbc: add    xscratch1, x13, #0x10
  0x00007fcac91a7dc0: str    x11, [xscratch1,w0,sxtw #3]
                                                ;*lastore
                                                ; -
TestCharVect2::test_pack4_swap at 96 (line 1313)

  0x00007fcac91a7dc4: add    w0, w0, #0x1    ;*iinc
                                                ; -
TestCharVect2::test_pack4_swap at 97 (line 1308)

  0x00007fcac91a7dc8: cmp    w0, w10
  0x00007fcac91a7dcc: b.lt    0x00007fcac91a7d84  ;*arraylength
                                                ; -
TestCharVect2::test_pack4_swap at 5 (line 1307)


Obviously the loop has been unrolled.

For some reason, only 48 bits of the 64-bit long are set.  The 16
high-order (numerical, not endian) bits are all zero.  The pack test fails,
as does the subsequent unpack test, since it multiplies together each of
the sub-components, one of which is zero, so the final result is zero.
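
To illustrate why a zeroed top lane also breaks the unpack check, here is a small sketch. It only simulates the symptom by masking off bits 48-63; it does not reproduce the compiler bug. UnpackSketch, lane and laneProduct are hypothetical names, and the product is just a stand-in for the check the test performs.

```java
public class UnpackSketch {
    // Extract the 16-bit lane at the given index (0 = lowest 16 bits).
    public static long lane(long packed, int index) {
        return (packed >>> (16 * index)) & 0xFFFFL;
    }

    // Multiply the four lanes together.
    public static long laneProduct(long packed) {
        return lane(packed, 0) * lane(packed, 1) * lane(packed, 2) * lane(packed, 3);
    }

    public static void main(String[] args) {
        long good = 0x0004000300020001L;       // lanes 1, 2, 3, 4
        long bad = good & 0x0000FFFFFFFFFFFFL; // simulate the missing top 16 bits
        System.out.println(laneProduct(good)); // prints 24
        System.out.println(laneProduct(bad));  // prints 0
    }
}
```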

However, if I add the following line:
   dummy_print_long(i, p4[i], print_flag);
I get the following code:
  0x00007fb7c91b153c: add    xscratch1, x2, #0x10
  0x00007fb7c91b1540: ldrh    w11, [xscratch1,w10,sxtw #1]
  0x00007fb7c91b1544: add    x10, x2, w10, sxtw #1  ;*caload
                                                ; -
TestCharVect::test_pack4_swap at 54 (line 1312)

  0x00007fb7c91b1548: str    x2, [sp]
  0x00007fb7c91b154c: ldrh    w12, [x10,#22]
  0x00007fb7c91b1550: ldrh    w13, [x10,#18]
  0x00007fb7c91b1554: ldrh    w10, [x10,#20]
  0x00007fb7c91b1558: orr    x10, xmethod, x10, lsl #16
  0x00007fb7c91b155c: orr    x10, x10, x13, lsl #32
  0x00007fb7c91b1560: orr    x2, x10, x11, lsl #48  ;*lor
                                                ; -
TestCharVect::test_pack4_swap at 95 (line 1313)

  0x00007fb7c91b1564: ldr    x10, [sp,#24]
  0x00007fb7c91b1568: ldr    w11, [sp,#8]
  0x00007fb7c91b156c: add    xscratch1, x10, #0x10
  0x00007fb7c91b1570: str    x2, [xscratch1,w11,sxtw #3]
                                                ;*aload_1
                                                ; -
TestCharVect::test_pack4_swap at 18 (line 1309)

  0x00007fb7c91b1574: ldr    w1, [sp,#8]
  0x00007fb7c91b1578: ldr    w3, [sp,#16]
  0x00007fb7c91b157c: bl    0x00007fb7c91482a0  ; OopMap{[0]=Oop [24]=Oop
off=256}
                                                ;*invokestatic
dummy_print_long
                                                ; -
TestCharVect::test_pack4_swap at 102 (line 1318)
                                                ;   {static_call}
  ;; B15: #    B13 B16 <- B14  Freq: 12.8298

  0x00007fb7c91b1580: ldr    w11, [sp,#8]
  0x00007fb7c91b1584: add    w12, w11, #0x1  ;*iinc
                                                ; -
TestCharVect::test_pack4_swap at 105 (line 1308)

  0x00007fb7c91b1588: ldr    w13, [sp,#12]
  0x00007fb7c91b158c: cmp    w12, w13
  0x00007fb7c91b1590: b.lt    0x00007fb7c91b152c  ;*if_icmpge
                                                ; -
TestCharVect::test_pack4_swap at 15 (line 1308)

  ;; B16: #    B12 <- B15  Freq: 0.481118

  0x00007fb7c91b1594: b    0x00007fb7c91b1510
  ;; B17: #    N1 <- B10 B5 B6 B7 B8 B9  Freq: 2.90666e-06

  ;; 0xFFFFFF86
  0x00007fb7c91b1598: movn    w1, #0x79
  0x00007fb7c91b159c: str    x13, [sp]
  0x00007fb7c91b15a0: str    x2, [sp,#8]
  0x00007fb7c91b15a4: bl    0x00007fb7c9149520  ; OopMap{[0]=Oop [8]=Oop
off=296}
                                                ;*aload_1
                                                ; -
TestCharVect::test_pack4_swap at 18 (line 1309)
                                                ;   {runtime_call}
  0x00007fb7c91b15a8: brk    #0x3e7          ;*invokestatic dummy_print_long
                                                ; -
TestCharVect::test_pack4_swap at 102 (line 1318)

Note that in this case the loop is not unrolled.

All 64 bits are included in the printout, and the validation tests pass.

BTW, in this particular test, the size of the array is 997.  If I reduce it
to 120, it still fails, but with any size smaller than 120 the test passes.
I assume that with fewer than 120 elements the loop never reaches the
threshold for triggering a compilation, and that the code is running in the
interpreter.

There are also two other tests, TestByteVect.java and TestShortVect.java,
that exhibit the same behavior, and I suspect that a single fix will
correct all of these problems.


