Request for reviews (L): 6814842: Load shortening optimizations

Fri May 8 09:36:39 PDT 2009

Thanks for the detailed information.

Some more questions:
- Where can I find information about -XX:LoopUnrollLimit?
- Where can I find information about other -XX HotSpot options (for jdk7)?
- How can I see the optimized code that HotSpot generates? I don't know 
where to find the documentation for the option flags.
 > it's not the code that makes the second one slower, it's the register 
allocation.
Does that mean that sign-extension has faster register access than 
zero-extension?

-Ulf

On 08.05.2009 14:01, Christian Thalinger wrote:
> On Fri, 2009-05-08 at 12:29 +0200, Ulf Zibis wrote:
>   
>> On 07.05.2009 15:55, Christian Thalinger wrote:
>>     
>>> On Thu, 2009-05-07 at 15:40 +0200, Ulf Zibis wrote:
>>>   
>>>       
>>>> E.g.:
>>>>     char decode(byte b) {
>>>>         return (char)(b & 0xFF);
>>>>     }
>>>>     
>>>>         
>>> HotSpot has handled this for a long time.
>>>   
>>>       
>> Hm, that doesn't match my experience.
>>
>> Results from my benchmark on JDK7 b51:
>> <https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/test/DecoderBenchmark.java?rev=672&view=markup>
>>
>> time for (char)a: 1652 ms
>> time for (char)(a & 0xFF): 2772 ms
>>
>> IMHO (char)(a & 0xFF) should be faster than, or at least as fast as, (char)a, 
>> because only a basic LoadUB should be executed.
>>     
>
> As I said before, it's not the code that makes the second one
> slower, it's the register allocation.  When running with:
>
> -XX:LoopUnrollLimit=1
>
> you get more accurate numbers regarding the generated code.
>
> I looked at the code of loop3 and loop4 and they are actually identical,
> except that loop4 uses a zero-extension move instead of a sign-extension one,
> which is what we want.
>
> time for (char)a: 1350 ms
> time for (char)(a & 0xFF): 1429 ms
>
> time for inlined (char)a: 1493 ms
> time for inlined (char)(a & 0xFF): 1494 ms
>
> The differences in runtime might be related to other things
> (cache, ...), since loop3/loop4 generate exactly the same code as
> inline3/inline4.
>
> -- Christian
>
>
>
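
For readers following the sign-extension versus zero-extension point in the thread, here is a minimal Java sketch of the semantic difference between the two expressions being benchmarked. The class and method names (ExtensionDemo, signExtended, zeroExtended) are made up for illustration and do not come from DecoderBenchmark.java. The sketch shows only what the two casts compute, not which machine instructions HotSpot emits for them.

```java
public class ExtensionDemo {
    // (char) b: the byte is widened to int with sign extension,
    // then narrowed to char, which keeps the low 16 bits.
    public static char signExtended(byte b) {
        return (char) b;
    }

    // (char) (b & 0xFF): the mask clears everything above the low 8 bits,
    // so the result is the byte's unsigned value (zero extension).
    public static char zeroExtended(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE4; // -28 as a signed byte
        System.out.println((int) signExtended(b)); // 65508 (0xFFE4)
        System.out.println((int) zeroExtended(b)); // 228 (0x00E4)
    }
}
```

For bytes in the range 0x00..0x7F both methods return the same char. They differ only for negative byte values, which is why a decoder normally wants the zero-extending form.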