Request for reviews (L): 6814842: Load shortening optimizations

Mon May 11 01:55:11 PDT 2009

On Fri, 2009-05-08 at 19:01 +0200, Ulf Zibis wrote:
> > I already said that once, it's not the code that makes the second one
> > slower, it's the register allocation.
> 
> Does that mean that I should use (char)a to decode ASCII-bytes 
> [0x00..0x7F] and (char)(a & 0xFF) to decode ISO-8859-1-bytes 
> [0x00..0xFF] to have best performance? Using (char)(a & 0xFF) in general 
> would waste performance for only-ASCII-bytes (...but win for 
> ISO-8859-1-bytes, if using -XX:LoopUnrollLimit=1, see below).

No, the & 0xFF case should always be faster, because the shifter unit is
not involved.  This is definitely the case on SPARC; I'm not sure whether
there is any performance benefit on x86 CPUs.
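
Apart from performance, the two casts are not interchangeable for bytes
above 0x7F: (char)a sign-extends the byte first, while (char)(a & 0xFF)
zero-extends it.  A minimal sketch to illustrate this (the class and
method names are hypothetical and not taken from the JDK sources):

```java
// Hypothetical demo class; names are illustrative only.
final class CastDemo {
    // byte -> char goes through int, so negative bytes are sign-extended:
    // correct only for ASCII bytes 0x00..0x7F.
    public static char decodeAscii(byte a) {
        return (char) a;
    }

    // Masking with 0xFF zero-extends the byte, so 0x80..0xFF map to
    // U+0080..U+00FF as ISO-8859-1 requires.
    public static char decodeLatin1(byte a) {
        return (char) (a & 0xFF);
    }

    public static void main(String[] args) {
        byte ascii = 0x41;           // 'A'
        byte latin1 = (byte) 0xE9;   // e-acute in ISO-8859-1

        System.out.println(Integer.toHexString(decodeAscii(ascii)));   // 41
        System.out.println(Integer.toHexString(decodeLatin1(ascii)));  // 41
        System.out.println(Integer.toHexString(decodeAscii(latin1)));  // ffe9
        System.out.println(Integer.toHexString(decodeLatin1(latin1))); // e9
    }
}
```

So the masked form is the only one of the two that is correct for the
full ISO-8859-1 range; for pure ASCII input both give the same result.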

> 
> >   When running with:
> >
> > -XX:LoopUnrollLimit=1
> >   
> 
> Result:
> time for (char)a: 2280 ms
> time for (char)(a & 0xFF): 2284 ms
> 
> How should I interpret this?

The performance is the same.

> In case of ISO-8859-1-bytes it's faster than without 
> -XX:LoopUnrollLimit=1, but slower in case of only-ASCII-bytes.
> Which values are relevant for the real-world case? I'm not sure which Java 
> code to use for the fastest charset decoders.

That's hard to tell; it depends a lot on the call site and on the code
surrounding what we are currently talking about.  A good recipe is to
write Java code that is as straightforward as possible, since the VM can
optimize such code best in all use cases and not only in a particular one.
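
As a rough sketch of what "straightforward" could mean for an ISO-8859-1
decoder, assuming a plain byte[] to char[] conversion (the class and
method names are hypothetical, and this is not the JDK's actual charset
code):

```java
// Hypothetical example; not the JDK's ISO-8859-1 decoder.
final class Latin1Decode {
    // Plain counted loop with no manual unrolling or special-casing,
    // leaving unrolling and register allocation to the JIT compiler.
    public static char[] decode(byte[] src) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) (src[i] & 0xFF); // zero-extend each byte
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] input = { 0x48, 0x69, (byte) 0xE9 };
        char[] decoded = decode(input);
        System.out.println(decoded.length); // 3
    }
}
```

Whether this form is the fastest for a given call site would still have
to be measured.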

-- Christian