Request for reviews (L): 6814842: Load shortening optimizations
Christian Thalinger
Christian.Thalinger at Sun.COM
Fri May 8 05:01:17 PDT 2009
On Fri, 2009-05-08 at 12:29 +0200, Ulf Zibis wrote:
> On 07.05.2009 15:55, Christian Thalinger wrote:
> > On Thu, 2009-05-07 at 15:40 +0200, Ulf Zibis wrote:
> >
> >> E.g.:
> >> char decode(byte b) {
> >>     return (char)(b & 0xFF);
> >> }
> >>
> >
> > This has been handled by HotSpot for a long time.
> >
>
> Hm, my experience is different.
>
> Results from my benchmark on JDK7 b51:
> <https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/test/DecoderBenchmark.java?rev=672&view=markup>
>
> time for (char)a: 1652 ms
> time for (char)(a & 0xFF): 2772 ms
>
> IMHO (char)(a & 0xFF) should be faster than, or at least as fast as,
> (char)a, because only a basic LoadUB (load unsigned byte) should be executed.
I already said that once: it's not the code that makes the second one
slower, it's the register allocation. When running with:
-XX:LoopUnrollLimit=1
you get numbers that more accurately reflect the generated code.
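For anyone who wants to reproduce the comparison without the original DecoderBenchmark.java, here is a minimal, hand-rolled sketch. The class and method names (DecodeLoopBench, loopSigned, loopUnsigned) are hypothetical and this is not the benchmark from the thread. It only shows the shape of the two loops being compared. Run it as `java -XX:LoopUnrollLimit=1 DecodeLoopBench` to apply the flag mentioned above, which, as I understand it, effectively keeps C2 from unrolling the loops. The absolute times will differ from the numbers quoted here.

```java
// Hypothetical micro-benchmark sketch, not the DecoderBenchmark.java from the thread.
// Suggested invocation: java -XX:LoopUnrollLimit=1 DecodeLoopBench
public class DecodeLoopBench {

    // (char) a: byte is sign-extended to int, then narrowed to char.
    static int loopSigned(byte[] src, char[] dst) {
        int sum = 0;
        for (int i = 0; i < src.length; i++) {
            char c = (char) src[i];
            dst[i] = c;
            sum += c;
        }
        return sum;
    }

    // (char) (a & 0xFF): the mask keeps only the low 8 bits (zero extension).
    static int loopUnsigned(byte[] src, char[] dst) {
        int sum = 0;
        for (int i = 0; i < src.length; i++) {
            char c = (char) (src[i] & 0xFF);
            dst[i] = c;
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] src = new byte[1 << 12];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;            // cycles through all 256 byte values
        }
        char[] dst = new char[src.length];
        int sink = 0;

        // Warm-up so the JIT compiles both loops before they are timed.
        for (int r = 0; r < 20_000; r++) {
            sink += loopSigned(src, dst) + loopUnsigned(src, dst);
        }

        long t0 = System.nanoTime();
        for (int r = 0; r < 50_000; r++) sink += loopSigned(src, dst);
        long t1 = System.nanoTime();
        for (int r = 0; r < 50_000; r++) sink += loopUnsigned(src, dst);
        long t2 = System.nanoTime();

        System.out.println("time for (char)a:          " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("time for (char)(a & 0xFF): " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("(ignore) " + sink);   // keeps the results live
    }
}
```

Hand-rolled timing loops like this are sensitive to warm-up, code alignment and cache effects, so a harness such as JMH is a more trustworthy way to compare the two forms today.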
I looked at the code of loop3 and loop4 and they are actually identical,
except that loop4 uses a zero-extension move instead of a sign-extension
one, which is what we want.
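To make the zero-extension versus sign-extension point concrete, the following small sketch shows the semantic difference between the two casts. The class and method names are hypothetical. In Java, (char) b first widens the byte to int with sign extension and then narrows it to char. As a result, bytes 0x80..0xFF end up as U+FF80..U+FFFF. The masked form yields U+0080..U+00FF, which is what a Latin-1 style decode(byte) needs.

```java
// Hypothetical demo of the two byte -> char conversions discussed in the thread.
public class ExtensionDemo {

    // (char) b: byte -> int sign-extends, int -> char keeps the low 16 bits,
    // so any byte >= 0x80 becomes 0xFF80..0xFFFF.
    static char decodeSignExtended(byte b) {
        return (char) b;
    }

    // (char) (b & 0xFF): the mask clears the sign-extended upper bits,
    // so every byte maps to 0x0000..0x00FF (ISO-8859-1 style).
    static char decodeZeroExtended(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9;   // 'é' in ISO-8859-1
        System.out.printf("(char) b          = U+%04X%n", (int) decodeSignExtended(b));  // U+FFE9
        System.out.printf("(char) (b & 0xFF) = U+%04X%n", (int) decodeZeroExtended(b));  // U+00E9
    }
}
```

The two forms agree for 0x00..0x7F and differ only when the high bit is set. That difference is why the compiled code for the masked form should use a zero-extending load or move.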
time for (char)a: 1350 ms
time for (char)(a & 0xFF): 1429 ms
time for inlined (char)a: 1493 ms
time for inlined (char)(a & 0xFF): 1494 ms
The differences in runtime might be related to other things
(cache, ...), since loop3/loop4 generate exactly the same code as
inline3/inline4.
-- Christian
More information about the hotspot-compiler-dev mailing list