[9] RFR (S): 8161720: Better byte behavior for off-heap data
Andrew Haley
aph at redhat.com
Mon Aug 22 16:13:59 UTC 2016
Hi,
On 22/08/16 15:25, Zoltán Majó wrote:
>
>
> Solution: Normalize the result returned by Unsafe.getBoolean in
> - src/share/vm/prims/unsafe.cpp (used by the interpreter and by compiled
> code if the compiler intrinsics for Unsafe.getBoolean() are disabled)
> - the C1 and C2 intrinsics for Unsafe.getBoolean().
>
>
> Webrev:
> http://cr.openjdk.java.net/~zmajo/8161720/webrev.00/
>
> Testing:
> - JPRT (incl. Unsafe[On|Off]HeapBooleanTest.java);
> - RBT testing with all hotspot tests both w/ -Xmixed and -Xcomp (no new
> problems have shown up).
Result looks pretty decent for AArch64.
For this test case:
    static boolean bang() {
        boolean l = false;
        for (int i = 0; i < SIZE; i++) {
            l |= UNSAFE.getBoolean(null, offHeapMemory + i);
        }
        return l;
    }
Before:

  ;; B3: # B2 B4 <- B1 B2  Loop: B3-B2 inner main of N14 Freq: 1045.37
  0x000003ff892665b4: ldrb w11, [x23,w1,sxtw #0]
  0x000003ff892665b8: ldrb w4, [x22,w1,sxtw #0]
  0x000003ff892665bc: orr w13, w0, w11
  0x000003ff892665c0: ldrb w10, [x5,w1,sxtw #0]
  0x000003ff892665c4: orr w13, w13, w4
  0x000003ff892665c8: ldrb w11, [x3,w1,sxtw #0]
  0x000003ff892665cc: orr w13, w13, w10
  0x000003ff892665d0: ldrb w4, [x2,w1,sxtw #0]
  0x000003ff892665d4: orr w13, w13, w11
  0x000003ff892665d8: ldrb w10, [x18,w1,sxtw #0]
  0x000003ff892665dc: orr w13, w13, w4
  0x000003ff892665e0: ldrb w11, [x17,w1,sxtw #0]
  0x000003ff892665e4: orr w10, w13, w10
  0x000003ff892665e8: ldrb w4, [x16,w1,sxtw #0]
  0x000003ff892665ec: orr w11, w10, w11
  0x000003ff892665f0: ldrb w13, [x15,w1,sxtw #0]
  0x000003ff892665f4: orr w11, w11, w4
  0x000003ff892665f8: ldrb w10, [x14,w1,sxtw #0]
  0x000003ff892665fc: orr w11, w11, w13
  0x000003ff89266600: ldrb w4, [x12,w1,sxtw #0]
  0x000003ff89266604: orr w11, w11, w10
  0x000003ff89266608: ldrb w13, [x7,w1,sxtw #0]
  0x000003ff8926660c: orr w11, w11, w4
  0x000003ff89266610: ldrb w10, [x19,w1,sxtw #0]
  0x000003ff89266614: orr w13, w11, w13
  0x000003ff89266618: ldrb w4, [x20,w1,sxtw #0]
  0x000003ff8926661c: orr w10, w13, w10
  0x000003ff89266620: ldrb w11, [x6,w1,sxtw #0]
  0x000003ff89266624: orr w10, w10, w4
  0x000003ff89266628: ldrb w13, [x21,w1,sxtw #0]
  0x000003ff8926662c: orr w11, w10, w11
  0x000003ff89266630: orr w0, w11, w13            ;*ior {reexecute=0 rethrow=
                                                  ; - Bytes::bang at 25 (line 20

  0x000003ff89266634: add w11, w1, #0x10          ;*iinc {reexecute=0 rethrow
                                                  ; - Bytes::bang at 27 (line 19

  0x000003ff89266638: cmp w11, #0x3f1
  0x000003ff8926663c: b.lt 0x000003ff892665b0     ;*if_icmpge {reexecute=
                                                  ; - Bytes::bang at 8 (line 19)

After:

  ;; B3: # B2 B4 <- B10 B2  Loop: B3-B2 inner main of N34 Freq: 1
  0x000003ffa5279004: adrp x10, 0x000003ffa84b9000
                                                  ; {external_word}
  0x000003ffa5279008: add x10, x10, #0x1a0
  0x000003ffa527900c: ldrb w10, [x10,w13,sxtw #0]
  0x000003ffa5279010: cmp w10, #0x0
  0x000003ffa5279014: ldrb w12, [x19,w13,sxtw #0]
  0x000003ffa5279018: csel w10, w20, w10, ne
  0x000003ffa527901c: cmp w12, #0x0
  0x000003ffa5279020: ldrb w15, [x21,w13,sxtw #0]
  0x000003ffa5279024: csel w11, w20, w12, ne
  0x000003ffa5279028: cmp w15, #0x0
  0x000003ffa527902c: ldrb w18, [x22,w13,sxtw #0]
  0x000003ffa5279030: csel w15, w20, w15, ne
  0x000003ffa5279034: orr w10, w10, w16
  0x000003ffa5279038: cmp w18, #0x0
  0x000003ffa527903c: ldrb w12, [x23,w13,sxtw #0]
  0x000003ffa5279040: orr w16, w11, w10
  0x000003ffa5279044: csel w10, w20, w18, ne
  0x000003ffa5279048: cmp w12, #0x0
  0x000003ffa527904c: ldrb w17, [x24,w13,sxtw #0]
  0x000003ffa5279050: orr w16, w16, w15
  0x000003ffa5279054: csel w15, w20, w12, ne
  0x000003ffa5279058: cmp w17, #0x0
  0x000003ffa527905c: ldrb w11, [x25,w13,sxtw #0]
  0x000003ffa5279060: csel w12, w20, w17, ne
  0x000003ffa5279064: orr w10, w16, w10
  0x000003ffa5279068: cmp w11, #0x0
  0x000003ffa527906c: ldrb w18, [x26,w13,sxtw #0] ;*invokevirtual getBoolean {r
                                                  ; - Bytes::bang at 22 (line 20)
                                                  ; - Bytes::main at 11 (line 30)

  0x000003ffa5279070: orr w10, w15, w10
  0x000003ffa5279074: csel w15, w20, w11, ne
  0x000003ffa5279078: orr w10, w12, w10
  0x000003ffa527907c: cmp w18, #0x0
  0x000003ffa5279080: orr w12, w15, w10
  0x000003ffa5279084: csel w11, w20, w18, ne
  0x000003ffa5279088: add w10, w13, #0x8          ;*iinc {reexecute=0 rethrow=0
                                                  ; - Bytes::bang at 27 (line 19)
                                                  ; - Bytes::main at 11 (line 30)

  0x000003ffa527908c: orr w16, w11, w12           ;*ior {reexecute=0 rethrow=0
                                                  ; - Bytes::bang at 25 (line 20)
                                                  ; - Bytes::main at 11 (line 30)

  0x000003ffa5279090: cmp w10, #0x3f9
  0x000003ffa5279094: b.lt 0x000003ffa5279000     ;*if_icmpge {reexecute=0
                                                  ; - Bytes::bang at 8 (line 19)
                                                  ; - Bytes::main at 11 (line 30)

The new version is only 8-unrolled rather than 16-unrolled like the
old one. But the comparison and conditional selects happen almost
entirely in the shadow of the load instructions, so there is no loss
of performance.
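
To make the difference between the two loops concrete, here is a rough Java model of the per-element work. It is only a sketch, not the HotSpot change itself: the class and method names are made up for illustration, a plain byte[] stands in for the off-heap region, and it assumes that w20 in the new code holds the constant 1 (the listing does not show where w20 is set).

```java
public class NormalizeSketch {
    // Models the old loop: each byte is zero-extended (as ldrb does)
    // and ORed into the accumulator unchanged.
    public static int orRaw(byte[] data) {
        int acc = 0;
        for (byte b : data) {
            acc |= (b & 0xff);
        }
        return acc;
    }

    // Models the new loop: each byte is first collapsed to 0 or 1
    // (the cmp/csel pair, assuming w20 == 1) and only then ORed in.
    public static int orNormalized(byte[] data) {
        int acc = 0;
        for (byte b : data) {
            acc |= (b != 0) ? 1 : 0;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Hypothetical memory contents with non-canonical "true" bytes.
        byte[] data = {0, 2, 0, 4};
        System.out.println(orRaw(data));        // prints 6
        System.out.println(orNormalized(data)); // prints 1
    }
}
```

With only 0 and 1 in memory the two versions agree; they differ only when a byte holds some other nonzero value, which is the case the normalization is meant to handle.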
C1 code looks fine too. OK from me.
Andrew.