A question about bytecodes + unsigned load performance vs. add performance
Christian Thalinger
Christian.Thalinger at Sun.COM
Mon Jan 19 06:43:13 PST 2009
On Fri, 2009-01-16 at 12:29 -0800, John Rose wrote:
> Yes. It's a valid ideal node type, not hardware specific and useful
> to optimizations.
As I've already written on hotspot-dev, the optimization works but it's
generally not faster.
I'm not sure yet why this is the case, as the new code is denser. Maybe
both code sequences are translated into the same micro-ops, but then the
performance should be at least equal.
The following numbers were measured with my changes applied (Intel Core2
Duo T9300 @ 2.5GHz):
time for map[a & 0xFF]: 1525 ms
time for map[a + 0x80]: 1461 ms
The first one boils down to:
0xfffffd7ffa3029d2: movzbl 0x19(%r10),%r10d
0xfffffd7ffa3029d7: movzwl 0x18(%r14,%r10,2),%r10d ;*caload
0xfffffd7ffa3029dd: mov %r10w,0x1a(%rbp,%rdi,2) ;*castore
and the second to:
0xfffffd7ffa302b24: movsbl 0x19(%r13,%rdi,1),%r10d
0xfffffd7ffa302b4e: movslq %r10d,%r10
0xfffffd7ffa302b51: movzwl 0x118(%r14,%r10,2),%r10d ;*caload
0xfffffd7ffa302b5a: mov %r10w,0x1a(%rbp,%rdi,2) ;*castore
Maybe out-of-order execution and micro-op optimizations (I don't know
if there are any in an Intel CPU) can combine movsbl and movslq into one
micro-op, but even then both variants should have the same performance.
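For reference, the two loop bodies behind the listings above are presumably
of the following shape. This is only a sketch: the class, method, and array
names are made up for illustration, and the real benchmark source is not
shown in this mail. The biased table layout is an assumption that makes both
variants compute the same result.

```java
import java.util.Arrays;

// Hypothetical reconstruction of the two lookup variants; not the original benchmark.
public class UnsignedLoadSketch {
    // Variant 1: mask the sign-extended byte to get an unsigned index 0..255.
    static void translateMasked(byte[] src, char[] map, char[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = map[src[i] & 0xFF];
        }
    }

    // Variant 2: keep the signed byte (-128..127) and bias it into 0..255.
    static void translateBiased(byte[] src, char[] biasedMap, char[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = biasedMap[src[i] + 0x80];
        }
    }

    // Reorders a table indexed by (b & 0xFF) into one indexed by (b + 0x80).
    static char[] biasMap(char[] map) {
        char[] biased = new char[256];
        for (int b = -128; b <= 127; b++) {
            biased[b + 0x80] = map[b & 0xFF];
        }
        return biased;
    }

    public static void main(String[] args) {
        byte[] src = new byte[1024];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        char[] map = new char[256];
        for (int i = 0; i < map.length; i++) {
            map[i] = (char) ('A' + (i % 26));
        }
        char[] viaMask = new char[src.length];
        char[] viaBias = new char[src.length];
        translateMasked(src, map, viaMask);
        translateBiased(src, biasMap(map), viaBias);
        System.out.println("variants agree: " + Arrays.equals(viaMask, viaBias));
    }
}
```

The first variant is the one that can use a single zero-extending byte load;
the second needs a sign-extending load, with the + 0x80 folded into the
displacement of the caload address.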
Generating movzbq instead of movzbl gives:
time for map[a & 0xFF]: 1533 ms
0xfffffd7ffa303312: movzbq 0x19(%r10),%r10
0xfffffd7ffa303317: movzwl 0x18(%r14,%r10,2),%r10d ;*caload
0xfffffd7ffa30331d: mov %r10w,0x1a(%rbp,%rdi,2) ;*castore
However, I think we should integrate my changes, as they make new
optimizations easier to add, e.g. superword. The unrolled loop could
then use a code sequence like:
pxor xmm7, xmm7
movdqa xmm1, xmm0 ; copy source
punpcklbw xmm0, xmm7 ; unpack the 8 low-end bytes
; into 8 zero-extended 16-bit words
punpckhbw xmm1, xmm7 ; unpack the 8 high-end bytes
; into 8 zero-extended 16-bit words
This processes 8 or 16 values at once, and that should definitely be
faster...
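To make the effect of the unpack sequence concrete, here is a scalar model
of what punpcklbw/punpckhbw against a zeroed register compute for one
16-byte vector. It is only an illustration in plain Java with hypothetical
names, not code the compiler would generate.

```java
// Scalar model of unpacking 16 bytes into two groups of 8 zero-extended words.
public class UnpackSketch {
    // Models punpcklbw with a zero register: zero-extend bytes 0..7.
    static char[] unpackLow(byte[] v16) {
        char[] words = new char[8];
        for (int i = 0; i < 8; i++) {
            words[i] = (char) (v16[i] & 0xFF);
        }
        return words;
    }

    // Models punpckhbw with a zero register: zero-extend bytes 8..15.
    static char[] unpackHigh(byte[] v16) {
        char[] words = new char[8];
        for (int i = 0; i < 8; i++) {
            words[i] = (char) (v16[8 + i] & 0xFF);
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] v = new byte[16];
        for (int i = 0; i < v.length; i++) {
            v[i] = (byte) (0xF0 + i); // all values are negative as signed bytes
        }
        char[] lo = unpackLow(v);
        char[] hi = unpackHigh(v);
        System.out.println(Integer.toHexString(lo[0]) + " " + Integer.toHexString(hi[7]));
    }
}
```

Each loop iteration here corresponds to one lane of the SSE2 instruction, so
the eight iterations of unpackLow or unpackHigh collapse into a single unpack.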
-- Christian
More information about the hotspot-compiler-dev
mailing list