[9] RFR (S): 8161720: Better byte behavior for off-heap data

Andrew Haley aph at redhat.com
Mon Aug 22 16:13:59 UTC 2016


Hi,

On 22/08/16 15:25, Zoltán Majó wrote:

> 
> 
> Solution: Normalize the result returned by Unsafe.getBoolean in
> - src/share/vm/prims/unsafe.cpp (used by the interpreter and by compiled 
> code if the compiler intrinsics for Unsafe.getBoolean() are disabled)
> - the C1 and C2 intrinsics for Unsafe.getBoolean().
> 
> 
> Webrev:
> http://cr.openjdk.java.net/~zmajo/8161720/webrev.00/
> 
> Testing:
> - JPRT (incl. Unsafe[On|Off]HeapBooleanTest.java);
> - RBT testing with all hotspot tests both w/ -Xmixed and -Xcomp (no new
> problems have shown up).

Result looks pretty decent for AArch64.

For test case:

    static boolean bang() {
        boolean l = false;
        for(int i = 0; i < SIZE; i++) {
            l |= UNSAFE.getBoolean(null, offHeapMemory+i);
        }
        return l;
    }

Before on the left, after on the right: (Sorry, needs a wide window)

;; B3: #        B2 B4 <- B1 B2  Loop: B3-B2 inner main of N14 Freq: 1045.37     ;; B3: #        B2 B4 <- B10 B2         Loop: B3-B2 inner main of N34 Freq: 1

  0x000003ff892665b4: ldrb      w11, [x23,w1,sxtw #0]                             0x000003ffa5279004: adrp      x10, 0x000003ffa84b9000
  0x000003ff892665b8: ldrb      w4, [x22,w1,sxtw #0]                                                                           ;   {external_word}
  0x000003ff892665bc: orr       w13, w0, w11                                      0x000003ffa5279008: add       x10, x10, #0x1a0
  0x000003ff892665c0: ldrb      w10, [x5,w1,sxtw #0]                              0x000003ffa527900c: ldrb      w10, [x10,w13,sxtw #0]
  0x000003ff892665c4: orr       w13, w13, w4                                      0x000003ffa5279010: cmp       w10, #0x0
  0x000003ff892665c8: ldrb      w11, [x3,w1,sxtw #0]                              0x000003ffa5279014: ldrb      w12, [x19,w13,sxtw #0]
  0x000003ff892665cc: orr       w13, w13, w10                                     0x000003ffa5279018: csel      w10, w20, w10, ne
  0x000003ff892665d0: ldrb      w4, [x2,w1,sxtw #0]                               0x000003ffa527901c: cmp       w12, #0x0
  0x000003ff892665d4: orr       w13, w13, w11                                     0x000003ffa5279020: ldrb      w15, [x21,w13,sxtw #0]
  0x000003ff892665d8: ldrb      w10, [x18,w1,sxtw #0]                             0x000003ffa5279024: csel      w11, w20, w12, ne
  0x000003ff892665dc: orr       w13, w13, w4                                      0x000003ffa5279028: cmp       w15, #0x0
  0x000003ff892665e0: ldrb      w11, [x17,w1,sxtw #0]                             0x000003ffa527902c: ldrb      w18, [x22,w13,sxtw #0]
  0x000003ff892665e4: orr       w10, w13, w10                                     0x000003ffa5279030: csel      w15, w20, w15, ne
  0x000003ff892665e8: ldrb      w4, [x16,w1,sxtw #0]                              0x000003ffa5279034: orr       w10, w10, w16
  0x000003ff892665ec: orr       w11, w10, w11                                     0x000003ffa5279038: cmp       w18, #0x0
  0x000003ff892665f0: ldrb      w13, [x15,w1,sxtw #0]                             0x000003ffa527903c: ldrb      w12, [x23,w13,sxtw #0]
  0x000003ff892665f4: orr       w11, w11, w4                                      0x000003ffa5279040: orr       w16, w11, w10
  0x000003ff892665f8: ldrb      w10, [x14,w1,sxtw #0]                             0x000003ffa5279044: csel      w10, w20, w18, ne
  0x000003ff892665fc: orr       w11, w11, w13                                     0x000003ffa5279048: cmp       w12, #0x0
  0x000003ff89266600: ldrb      w4, [x12,w1,sxtw #0]                              0x000003ffa527904c: ldrb      w17, [x24,w13,sxtw #0]
  0x000003ff89266604: orr       w11, w11, w10                                     0x000003ffa5279050: orr       w16, w16, w15
  0x000003ff89266608: ldrb      w13, [x7,w1,sxtw #0]                              0x000003ffa5279054: csel      w15, w20, w12, ne
  0x000003ff8926660c: orr       w11, w11, w4                                      0x000003ffa5279058: cmp       w17, #0x0
  0x000003ff89266610: ldrb      w10, [x19,w1,sxtw #0]                             0x000003ffa527905c: ldrb      w11, [x25,w13,sxtw #0]
  0x000003ff89266614: orr       w13, w11, w13                                     0x000003ffa5279060: csel      w12, w20, w17, ne
  0x000003ff89266618: ldrb      w4, [x20,w1,sxtw #0]                              0x000003ffa5279064: orr       w10, w16, w10
  0x000003ff8926661c: orr       w10, w13, w10                                     0x000003ffa5279068: cmp       w11, #0x0
  0x000003ff89266620: ldrb      w11, [x6,w1,sxtw #0]                              0x000003ffa527906c: ldrb      w18, [x26,w13,sxtw #0]
  0x000003ff89266624: orr       w10, w10, w4                                                                                    ;*invokevirtual getBoolean {r
  0x000003ff89266628: ldrb      w13, [x21,w1,sxtw #0]                                                                           ; - Bytes::bang at 22 (line 20)
  0x000003ff8926662c: orr       w11, w10, w11                                                                                   ; - Bytes::main at 11 (line 30)
  0x000003ff89266630: orr       w0, w11, w13    ;*ior {reexecute=0 rethrow=
                                                ; - Bytes::bang at 25 (line 20       0x000003ffa5279070: orr       w10, w15, w10
                                                                                  0x000003ffa5279074: csel      w15, w20, w11, ne
  0x000003ff89266634: add       w11, w1, #0x10  ;*iinc {reexecute=0 rethrow       0x000003ffa5279078: orr       w10, w12, w10
                                                ; - Bytes::bang at 27 (line 19       0x000003ffa527907c: cmp       w18, #0x0
                                                                                  0x000003ffa5279080: orr       w12, w15, w10
  0x000003ff89266638: cmp       w11, #0x3f1                                       0x000003ffa5279084: csel      w11, w20, w18, ne
  0x000003ff8926663c: b.lt      0x000003ff892665b0  ;*if_icmpge {reexecute=       0x000003ffa5279088: add       w10, w13, #0x8  ;*iinc {reexecute=0 rethrow=0
                                                ; - Bytes::bang at 8 (line 19)                                                     ; - Bytes::bang at 27 (line 19)
                                                                                                                                ; - Bytes::main at 11 (line 30)

                                                                                  0x000003ffa527908c: orr       w16, w11, w12   ;*ior {reexecute=0 rethrow=0
                                                                                                                                ; - Bytes::bang at 25 (line 20)
                                                                                                                                ; - Bytes::main at 11 (line 30)

                                                                                  0x000003ffa5279090: cmp       w10, #0x3f9
                                                                                  0x000003ffa5279094: b.lt      0x000003ffa5279000  ;*if_icmpge {reexecute=0
                                                                                                                                ; - Bytes::bang at 8 (line 19)
                                                                                                                                ; - Bytes::main at 11 (line 30)

The new version is only unrolled 8 times rather than 16 times like the
old one (the loop counter now advances by #0x8 per iteration instead
of #0x10).  But the comparisons and conditional selects happen almost
entirely in the shadow of the load instructions, so there is no loss
of performance.
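
For anyone reading the listing without the webrev open, here is a rough
Java-level sketch of what the extra cmp/csel pairs appear to compute:
each loaded byte is reduced to a proper boolean before it is ORed into
the accumulator. This is only an illustration under assumptions, not the
code from the patch. It uses a direct ByteBuffer as a stand-in for
off-heap memory instead of Unsafe, and the names NormalizeSketch and
normalizedGetBoolean are hypothetical.

```java
import java.nio.ByteBuffer;

class NormalizeSketch {
    // Hypothetical helper: models normalization as
    // "any non-zero byte reads as true, zero reads as false".
    static boolean normalizedGetBoolean(ByteBuffer mem, int offset) {
        return mem.get(offset) != 0;
    }

    // Same shape as bang() in the test case, but reading from a
    // direct ByteBuffer rather than through Unsafe.getBoolean().
    static boolean bang(ByteBuffer mem, int size) {
        boolean l = false;
        for (int i = 0; i < size; i++) {
            l |= normalizedGetBoolean(mem, i);
        }
        return l;
    }

    public static void main(String[] args) {
        // Direct buffers start out zero-filled.
        ByteBuffer mem = ByteBuffer.allocateDirect(16);
        // Store a byte that is neither 0 nor 1, as off-heap data might contain.
        mem.put(3, (byte) 0x02);
        System.out.println(bang(mem, 16)); // prints "true"
    }
}
```

The point of the sketch is only that the compare against zero is cheap
per element, which matches what the disassembly shows.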

C1 code looks fine too.  OK from me.

Andrew.


More information about the hotspot-compiler-dev mailing list