A question about bytecodes + unsigned load performance ./. add performance
Christian Thalinger
christian.thalinger at gmail.com
Mon Jan 12 08:29:40 PST 2009
On Sat, 2009-01-10 at 10:39 -0800, John Rose wrote:
> It's already in there, to some degree, but hindered somehow by the
> peepholing problem. See 'instruct loadUB' around line 6406 of:
> http://hg.openjdk.java.net/jdk7/hotspot/hotspot/file/tip/src/cpu/x86/vm/x86_32.ad
>
>
> What that does is, when it is time to "match" (or lower) ideal to
> machine nodes in the IR graph, if a suitable AndI and LoadB are
> adjacent, and if the LoadB is unshared, they are coalesced into a
> loadUB machine node.
>
>
> It would be a detailed debugging exercise to find out why, in the case
> of your code, that optimization does not appear to kick in.
I tried to take a look at it, but now I'm stuck.
The ideal nodes in question are:
129 LoadB === 311 51 127 [[ 141 ]] @byte[int:>=0]:exact+any *, idx=4; #byte !jvms: test::foo @ bci:28
140 ConI === 0 [[ 141 217 268 347 439 441 ]] #int:255
141 AndI === 458 129 140 [[ 164 ]] !orig=[377] !jvms: test::decode @ bci:4 test::foo @ bci:29
So loadUB should match but it does not (and I don't know why, yet). The
opto output is:
102 B7: # B6 B8 <- B6 Freq: 2
102 movslq R10, R11 # i2l
105 movq R8, [rsp + #8] # spill
10a movsbl R8, [R8 + #24 + R10] # byte
110 incl R11 # int
113 movzbl R8, R8 # int & 0xFF
117 movw [R9 + #24 + R10 << #1], R8 # char/short
11d cmpl R11, #1
121 jl,s B6 # loop end P=0.500000 C=22950.000000
It seems the increment of the loop variable gets scheduled between LoadB
and immI_255 and thus loadUB cannot match.
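For readers without the test case at hand: the source is not included in this mail, so the following is only a guess at its shape, reconstructed from the jvms info in the nodes above (test::foo calling test::decode). The class name, method signatures and array contents are assumptions for illustration. The point is that `b & 0xFF` on a loaded byte is the LoadB + AndI(255) pair that loadUB is meant to cover with a single zero-extending load, instead of the movsbl followed by movzbl seen in the output.

```java
public class UnsignedLoadSketch {
    // Hypothetical stand-in for test::decode: mask a signed byte to 0..255.
    // In the ideal graph this is an AndI whose inputs are the loaded byte
    // and the constant 255.
    public static int decode(byte b) {
        return b & 0xFF;
    }

    // Hypothetical stand-in for test::foo: widen each byte of src into a
    // char of dst. The array read is the LoadB, the array write is the
    // char/short store in the loop body.
    public static void foo(byte[] src, char[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) decode(src[i]);
        }
    }

    public static void main(String[] args) {
        byte[] src = { 0, 1, 127, (byte) 128, (byte) 255 };
        char[] dst = new char[src.length];
        foo(src, dst);
        for (char c : dst) {
            System.out.println((int) c);
        }
    }
}
```

Whether the two loads are fused or not, the result must be the same: a sign-extending load followed by the mask gives the value a zero-extending load would give directly, so only the instruction count changes, not the values stored.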
I'm not sure yet at which point matching is applied, or whether my
assumption above is correct. I'm looking further...
-- Christian