A question about bytecodes + unsigned load performance ./. add performance
Christian Thalinger
Christian.Thalinger at Sun.COM
Mon Jan 19 10:39:12 PST 2009
On Mon, 2009-01-19 at 15:43 +0100, Christian Thalinger wrote:
> However, I think we should integrate my changes, as they open up the
> possibility of new optimizations more easily, e.g. superword. The
> unrolled loop could then use a code sequence like:
>
> pxor xmm7, xmm7
> movdqa xmm1, xmm0 ; copy source
> punpcklbw xmm0, xmm7 ; unpack the 8 low-end bytes
> ; into 8 zero-extended 16-bit words
> punpckhbw xmm1, xmm7 ; unpack the 8 high-end bytes
> ; into 8 zero-extended 16-bit words
>
> That would process 8 or 16 values at once, which should definitely be
> faster...
...but in this particular decoder example, that sequence could only be
applied to the non-map case.
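
To illustrate the distinction, here is a minimal Java sketch of the two loop
shapes as I understand them. The class and method names (WidenSketch,
decodeDirect, decodeMapped) are hypothetical and are not taken from the actual
decoder sources. In the direct loop every byte is simply zero-extended to a
16-bit char, which is the operation punpcklbw/punpckhbw against a zeroed
register performs on 8 bytes at a time. In the mapped loop the zero-extended
byte is only an index into a table, so each element needs its own load and
the unpack sequence alone does not produce the result.

```java
public class WidenSketch {
    // Non-map case (assumed shape): each byte is zero-extended to a char.
    // This is the per-element equivalent of unpacking bytes against zero.
    public static void decodeDirect(byte[] src, char[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) (src[i] & 0xFF); // unsigned byte load
        }
    }

    // Map case (assumed shape): the unsigned byte is only a table index,
    // so every element requires a separate, data-dependent load.
    public static void decodeMapped(byte[] src, char[] dst, char[] map) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = map[src[i] & 0xFF];
        }
    }

    public static void main(String[] args) {
        byte[] src = { 0x41, (byte) 0xE9, (byte) 0xFF };
        char[] dst = new char[src.length];
        decodeDirect(src, dst);
        // prints "65 233 255"
        System.out.println((int) dst[0] + " " + (int) dst[1] + " " + (int) dst[2]);
    }
}
```

Whether the compiler would actually emit the unpack sequence for the direct
loop depends on the superword support discussed above; the sketch only shows
why the mapped loop does not fit that pattern.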
-- Christian