JEP 254, HotSpot, and optimizations
Rémi Forax
forax at univ-mlv.fr
Tue Mar 22 20:59:30 UTC 2016
Value types that define a vectorized operation API are all you need.
Rémi
On 22 March 2016 21:29:07 CET, Andrew Haley <aph at redhat.com> wrote:
>I'm looking at compact strings for AArch64. I know that they are
>intended to be implemented by HotSpot intrinsics, and the Java
>methods are placeholders, but I was tempted to investigate: could you
>write efficient implementations of the methods in pure Java?
>
>Here's what I tried for StringLatin1::inflate:
>
>    while (srcPtr < endPtr) {
>        long bytes = U.getLongUnaligned(src, srcPtr);
>        srcPtr += 8;
>
>        // Zero-extend the four low-order bytes (the first four in memory on little endian) into 16-bit chars
>        long chars = bytes << 56 >>> 56;
>        chars |= bytes << 48 >>> 56 << 16;
>        chars |= bytes << 40 >>> 56 << 32;
>        chars |= bytes << 32 >>> 56 << 48;
>
>        U.putLongUnaligned(dst, dstPtr, chars);
>        dstPtr += 8;
>
>        // Same for the four high-order bytes
>        chars = bytes << 24 >>> 56;
>        chars |= bytes << 16 >>> 56 << 16;
>        chars |= bytes << 8 >>> 56 << 32;
>        chars |= bytes >>> 56 << 48;
>
>        U.putLongUnaligned(dst, dstPtr, chars);
>        dstPtr += 8;
>    }
>
>
>and here's the inner loop generated by C2:
>
>0x0000007fa8725de0: ldr   x11, [x17,x3]        ;*invokevirtual getLongUnaligned {reexecute=0 rethrow=0 return_oop=0}
>0x0000007fa8725de4: ubfx  x12, x11, #8, #8
>0x0000007fa8725de8: and   x13, x11, #0xff
>0x0000007fa8725dec: ubfx  x14, x11, #16, #8
>0x0000007fa8725df0: orr   x12, x13, x12, lsl #16
>0x0000007fa8725df4: ubfx  x15, x11, #40, #8
>0x0000007fa8725df8: ubfx  x16, x11, #32, #8
>0x0000007fa8725dfc: ubfx  x13, x11, #48, #8
>0x0000007fa8725e00: ubfx  x18, x11, #24, #8
>0x0000007fa8725e04: orr   x12, x12, x14, lsl #32
>0x0000007fa8725e08: orr   x14, x16, x15, lsl #16
>0x0000007fa8725e0c: lsr   x11, x11, #56
>0x0000007fa8725e10: orr   x13, x14, x13, lsl #32
>0x0000007fa8725e14: orr   x12, x12, x18, lsl #48
>0x0000007fa8725e18: orr   x11, x13, x11, lsl #48
>0x0000007fa8725e1c: add   x13, x2, x4
>0x0000007fa8725e20: str   x12, [x13]           ;*invokevirtual putLongUnaligned {reexecute=0 rethrow=0 return_oop=0}
>0x0000007fa8725e24: str   x11, [x13,#8]        ;*invokevirtual putLongUnaligned {reexecute=0 rethrow=0 return_oop=0}
>0x0000007fa8725e28: add   x4, x4, #0x10        ;*goto {reexecute=0 rethrow=0 return_oop=0}
>0x0000007fa8725e2c: add   x3, x3, #0x8         ; ImmutableOopMap{r17=Oop c_rarg2=Oop }
>0x0000007fa8725e30: ldr   wzr, [x5]            ;*goto {reexecute=0 rethrow=0 return_oop=0}
>                                               ;   {poll}
>0x0000007fa8725e34: cmp   x3, x6
>0x0000007fa8725e38: b.lt  0x0000007fa8725de0   ;*ifge {reexecute=0 rethrow=0 return_oop=0}
>                                               ; - java.lang.StringLatin1::inflate@61 (line 576)
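For readers comparing the two listings: `ubfx Xd, Xn, #lsb, #width` is the AArch64 unsigned bitfield extract, which yields `(Xn >>> lsb) & ((1 << width) - 1)`. C2 appears to have recognised each `bytes << (56 - 8*k) >>> 56` shift pair in the Java loop as an 8-bit field extract of byte k: k = 0 became `and x13, x11, #0xff`, k = 7 became `lsr x11, x11, #56`, and the rest became `ubfx`. The sketch below checks that equivalence in plain Java; `ubfx` and `shiftPair` are hypothetical helper names that model the instruction and the Java idiom, not HotSpot code.

```java
import java.util.Random;

public class UbfxSketch {
    // Hypothetical Java model of AArch64 UBFX: extract `width` bits starting at bit `lsb`, zero-extended.
    static long ubfx(long x, int lsb, int width) {
        return (x >>> lsb) & ((1L << width) - 1);
    }

    // The shift pair used in the Java loop to isolate byte k (k = 0 is the least significant byte).
    static long shiftPair(long x, int k) {
        return x << (56 - 8 * k) >>> 56;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        boolean allEqual = true;
        for (int n = 0; n < 100_000; n++) {
            long x = rnd.nextLong();
            for (int k = 0; k < 8; k++) {
                allEqual &= shiftPair(x, k) == ubfx(x, 8 * k, 8);
            }
        }
        System.out.println(allEqual); // true
    }
}
```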
>
>This is pretty good code. (It's only little endian, but that's not
>hard to fix.) I could not do any better writing this by hand unless I
>used the vector processor. C1-generated code is worse than this, but
>it's still not bad.
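A self-contained sketch of the same word-at-a-time inflate, under two stated assumptions: `ByteBuffer` absolute `getLong`/`putLong` stand in for `Unsafe.getLongUnaligned`/`putLongUnaligned`, and the output is a `byte[]` of UTF-16 code units in an explicitly chosen byte order rather than the `char` layout that `StringLatin1.inflate` actually writes. It also illustrates one possible fix for the endianness caveat: in a big-endian word the first four bytes in memory sit in the high half, so the same two shift sequences still work if the two stores are swapped. `InflateSketch` and its methods are hypothetical names; this is not the JDK implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class InflateSketch {
    // Zero-extend bits 0..31 of `bytes` into four 16-bit lanes (same shifts as the loop above).
    static long spreadLow(long bytes) {
        long chars = bytes << 56 >>> 56;
        chars |= bytes << 48 >>> 56 << 16;
        chars |= bytes << 40 >>> 56 << 32;
        chars |= bytes << 32 >>> 56 << 48;
        return chars;
    }

    // Zero-extend bits 32..63 of `bytes` into four 16-bit lanes.
    static long spreadHigh(long bytes) {
        long chars = bytes << 24 >>> 56;
        chars |= bytes << 16 >>> 56 << 16;
        chars |= bytes << 8 >>> 56 << 32;
        chars |= bytes >>> 56 << 48;
        return chars;
    }

    // Inflate Latin-1 bytes into UTF-16 bytes in the given order, 8 input bytes per iteration.
    static byte[] inflate(byte[] src, ByteOrder order) {
        byte[] dst = new byte[src.length * 2];
        ByteBuffer in = ByteBuffer.wrap(src).order(order);
        ByteBuffer out = ByteBuffer.wrap(dst).order(order);
        boolean little = order == ByteOrder.LITTLE_ENDIAN;
        int i = 0;
        for (; i + 8 <= src.length; i += 8) {
            long bytes = in.getLong(i);
            // The earlier four bytes are in the low half on little endian, the high half on big endian.
            out.putLong(2 * i, little ? spreadLow(bytes) : spreadHigh(bytes));
            out.putLong(2 * i + 8, little ? spreadHigh(bytes) : spreadLow(bytes));
        }
        // Scalar tail for the last 0..7 bytes.
        for (; i < src.length; i++) {
            out.putChar(2 * i, (char) (src[i] & 0xFF));
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] latin1 = "Caf\u00e9, na\u00efve r\u00e9sum\u00e9!".getBytes(StandardCharsets.ISO_8859_1);
        String expected = new String(latin1, StandardCharsets.ISO_8859_1);
        String le = new String(inflate(latin1, ByteOrder.LITTLE_ENDIAN), StandardCharsets.UTF_16LE);
        String be = new String(inflate(latin1, ByteOrder.BIG_ENDIAN), StandardCharsets.UTF_16BE);
        System.out.println(le.equals(expected)); // true
        System.out.println(be.equals(expected)); // true
    }
}
```

The 19-character sample exercises both the 8-byte fast path (two iterations) and the scalar tail (three bytes).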
>
>Perhaps it doesn't matter; perhaps we know there is no real point
>trying to make the Java versions of these methods "efficient". We
>know that the real goal is the intrinsics which use the vector
>processor.
>
>And one other thing: if we had a few simple vector pack and unpack
>operations available as HotSpot intrinsics, we wouldn't need to write
>all these hand-carved assembly-language String intrinsics.
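To make that suggestion concrete, here is a scalar reference model of what such pack and unpack primitives might compute, treating a `long` as eight byte lanes or four 16-bit lanes. The names `unpackLo`, `unpackHi`, `pack` and `fitsLatin1` are hypothetical, not an existing HotSpot or JDK API; an intrinsic would presumably map them onto vector widening and narrowing instructions (on AArch64, for example, `uxtl` and `xtn`), whereas this sketch only pins down the semantics. With them, inflate is one unpack per half-word, and the reverse direction (compressing UTF-16 to Latin-1) is a range check followed by a pack.

```java
public class PackUnpackSketch {
    // Hypothetical "unpack": zero-extend byte lanes 0..3 of `bytes` into 16-bit lanes 0..3.
    static long unpackLo(long bytes) {
        long r = 0;
        for (int k = 0; k < 4; k++) {
            r |= ((bytes >>> (8 * k)) & 0xFF) << (16 * k);
        }
        return r;
    }

    // Hypothetical "unpack": zero-extend byte lanes 4..7 into 16-bit lanes 0..3.
    static long unpackHi(long bytes) {
        return unpackLo(bytes >>> 32);
    }

    // Hypothetical "pack": keep the low byte of each 16-bit lane; lo fills byte lanes 0..3, hi fills 4..7.
    static long pack(long lo, long hi) {
        long r = 0;
        for (int k = 0; k < 4; k++) {
            r |= ((lo >>> (16 * k)) & 0xFF) << (8 * k);
            r |= ((hi >>> (16 * k)) & 0xFF) << (8 * (k + 4));
        }
        return r;
    }

    // True if every 16-bit lane is <= 0xFF, i.e. packing loses nothing (all chars are Latin-1).
    static boolean fitsLatin1(long lo, long hi) {
        return ((lo | hi) & 0xFF00FF00FF00FF00L) == 0;
    }

    public static void main(String[] args) {
        long bytes = 0x0123456789ABCDEFL;
        long lo = unpackLo(bytes);
        long hi = unpackHi(bytes);
        System.out.println(pack(lo, hi) == bytes);        // true: pack undoes unpack
        System.out.println(fitsLatin1(lo, hi));           // true: unpacked bytes always fit
        System.out.println(fitsLatin1(lo | 0x0100L, hi)); // false: lane 0 becomes 0x01EF
    }
}
```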
>
>Thoughts? Opinions?
>
>Andrew.