A question about bytecodes + unsigned load performance ./. add performance

Ulf Zibis Ulf.Zibis at gmx.de
Mon Jan 12 11:19:13 PST 2009


Hi Christian,

I must admit, to me this code reads like Chinese characters.

But I'm very happy that you are taking the time to find out how HotSpot
could be made to kick in here.
I'm looking forward to your results, in the hope that { char c = 
(char)((byte[])sa[sp] & 0xFF) } etc. will be optimised as well as
possible, and the +128 trick will become superfluous.
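
To make the two idioms concrete, here is a minimal sketch. It assumes the
"+128 trick" means biasing the signed byte by 128 and indexing a lookup
table shifted by the same amount; the class, method, and table names are
made up for illustration and are not taken from any JDK source.

```java
public class UnsignedByteDemo {
    // Idiom 1: mask the sign-extended byte so the result is 0..255.
    // This is the AndI(LoadB, 255) shape that loadUB is meant to match.
    static char decodeMasked(byte[] sa, int sp) {
        return (char) (sa[sp] & 0xFF);
    }

    // Idiom 2 (assumed meaning of the "+128 trick"): add 128 to the
    // signed byte to get an index 0..255 into a table shifted by 128.
    static char decodeBiased(byte[] sa, int sp, char[] biasedTable) {
        return biasedTable[sa[sp] + 128];
    }

    // Hypothetical identity table: slot (b + 128) holds the unsigned
    // value of the byte b, so both idioms return the same char.
    static char[] buildBiasedTable() {
        char[] t = new char[256];
        for (int i = 0; i < 256; i++) {
            t[i] = (char) ((i - 128) & 0xFF);
        }
        return t;
    }
}
```

If the masked form compiles to a single zero-extending load, the shifted
table no longer buys anything.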

-Ulf


On 12.01.2009 17:29, Christian Thalinger wrote:
> On Sat, 2009-01-10 at 10:39 -0800, John Rose wrote:
>
>   
>> It's already in there, to some degree, but hindered somehow by the
>> peepholing problem.  See 'instruct loadUB' around line 6406 of:
>>   http://hg.openjdk.java.net/jdk7/hotspot/hotspot/file/tip/src/cpu/x86/vm/x86_32.ad
>>
>>
>> What that does is, when it is time to "match" (or lower) ideal to
>> machine nodes in the IR graph, if a suitable AndI and LoadB are
>> adjacent, and if the LoadB is unshared, they are coalesced into a
>> loadUB machine node.
>>
>>
>> It would be a detailed debugging exercise to find out why, in the case
>> of your code, that optimization does not appear to kick in.
>>     
>
> I tried to take a look at it, but now I'm stuck.
>
> The ideal nodes in question are:
>
>  129	LoadB	===  311  51  127  [[ 141 ]]  @byte[int:>=0]:exact+any *, idx=4; #byte !jvms: test::foo @ bci:28
>  140	ConI	===  0  [[ 141  217  268  347  439  441 ]]  #int:255
>  141	AndI	===  458  129  140  [[ 164 ]]  !orig=[377] !jvms: test::decode @ bci:4 test::foo @ bci:29
>
> So loadUB should match but it does not (and I don't know why, yet).  The
> opto output is:
>
> 102   B7: #	B6 B8 <- B6  Freq: 2
> 102   	movslq  R10, R11	# i2l
> 105   	movq    R8, [rsp + #8]	# spill
> 10a   	movsbl  R8, [R8 + #24 + R10]	# byte
> 110   	incl    R11	# int
> 113   	movzbl  R8, R8	# int & 0xFF
> 117   	movw    [R9 + #24 + R10 << #1], R8	# char/short
> 11d   	cmpl    R11, #1
> 121   	jl,s   B6	# loop end  P=0.500000 C=22950.000000
>
> It seems the increment of the loop variable gets scheduled between LoadB
> and immI_255 and thus loadUB cannot match.
>
> Not sure yet when matching is applied and if I'm right with my
> assumption above.  I'm looking further...
>
> -- Christian
>
>
>   
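
The matching rule John describes in the quoted text can be modelled
roughly as below. This is only a toy sketch: the Node class, its fields,
and the match method are invented for illustration and are not HotSpot's
real matcher or AD-file machinery, and control inputs are left out.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadUBMatchSketch {
    // Toy stand-in for an ideal node; not HotSpot's real Node class.
    static final class Node {
        final String op;      // e.g. "LoadB", "ConI", "AndI"
        final int constant;   // only meaningful for "ConI"
        final List<Node> inputs = new ArrayList<>();
        int uses = 0;         // number of nodes consuming this node

        Node(String op, int constant, Node... ins) {
            this.op = op;
            this.constant = constant;
            for (Node in : ins) {
                inputs.add(in);
                in.uses++;
            }
        }
    }

    // Returns "loadUB" when the node is AndI(LoadB, ConI 255) and the
    // LoadB has no other user; otherwise returns the node's own opcode.
    static String match(Node n) {
        if (n.op.equals("AndI") && n.inputs.size() == 2) {
            Node load = n.inputs.get(0);
            Node mask = n.inputs.get(1);
            boolean isMask255 = mask.op.equals("ConI") && mask.constant == 0xFF;
            if (load.op.equals("LoadB") && isMask255 && load.uses == 1) {
                return "loadUB";
            }
        }
        return n.op;
    }
}
```

In this model a second user of the LoadB is enough to block the
coalescing, which corresponds to the "unshared" condition John mentions.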



