A question about bytecodes + unsigned load performance ./. add performance

Christian Thalinger Christian.Thalinger at Sun.COM
Mon Jan 19 10:39:12 PST 2009


On Mon, 2009-01-19 at 15:43 +0100, Christian Thalinger wrote:
> However, I think we should integrate my changes as it opens up the
> possibility for new optimizations more easily, e.g. superword.  The
> unrolled loop could then use a code sequence like:
> 
> pxor      xmm7, xmm7
> movdqa    xmm1, xmm0 ; copy source
> punpcklbw xmm0, xmm7 ; unpack the 8 low-end bytes
>                      ; into 8 zero-extended 16-bit words
> punpckhbw xmm1, xmm7 ; unpack the 8 high-end bytes
>                      ; into 8 zero-extended 16-bit words
> 
> Processing 8 or 16 values at once.  And that should definitely be
> faster...

...but in this particular decoder example, that sequence could only be
applied in the non-map case.
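
To make the distinction concrete, here is a minimal Java sketch of the
two loop shapes, assuming a single-byte decoder along the lines of the
one discussed in this thread. The class and method names (DecodeSketch,
decodeDirect, decodeMapped) are hypothetical and not taken from the JDK
sources. The direct loop only zero-extends each byte, which is what the
punpcklbw/punpckhbw sequence above computes for 16 bytes at a time. The
mapped loop needs a table lookup per byte, which those unpack
instructions cannot express.

```java
// Hypothetical sketch; names are illustrative and not from the JDK.
final class DecodeSketch {

    // Non-map case: every byte is zero-extended to a 16-bit char.
    // This is the loop shape the unpack sequence could cover.
    static char[] decodeDirect(byte[] src) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) (src[i] & 0xFF); // unsigned byte load
        }
        return dst;
    }

    // Map case: the zero-extended byte is only an index into a table.
    // The result depends on the table contents, so a plain unpack
    // cannot produce it.
    static char[] decodeMapped(byte[] src, char[] map) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = map[src[i] & 0xFF];
        }
        return dst;
    }
}
```

In both loops the `& 0xFF` is the unsigned byte load from the subject
line. Only in decodeDirect is that load also the final result.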

-- Christian




More information about the hotspot-compiler-dev mailing list