A question about bytecodes + unsigned load performance ./. add performace
Tom Rodriguez
Thomas.Rodriguez at Sun.COM
Mon Jan 12 13:52:27 PST 2009
> I tried to take a look at it, but now I'm stuck.
I suspect this is a problem I remember seeing long ago in the logic
for breaking match trees with multiple memory edges. When converting
the ideal form into the mach form, the matcher has to take the ideal
form, which is a DAG, and cut it up into trees. The matcher can only
operate on binary trees. Within these binary trees it's possible for
multiple ideal nodes to be swallowed into a single mach node. One
danger with this is when matching multiple nodes that have memory
inputs it might be possible to violate anti-dependence. There's a
some cutouts in Matcher::Label_Root that force a break in the match
when there are multiple memory inputs in the match tree:
// Check for leaves of the State Tree; things that cannot be a
part of
// the current tree. If it finds any, that value is matched as a
// register operand. If not, then the normal matching is used.
if( match_into_reg(n, m, control, i, is_shared(m)) ||
//
// Stop recursion if this is LoadNode and the root of this
tree is a
// StoreNode and the load & store have different memories.
((mem!=(Node*)1) && m->is_Load() && m->in(MemNode::Memory) !=
mem) ||
// Can NOT include the match of a subtree when its memory state
// is used by any of the other subtrees
(input_mem == NodeSentinel) ) {
The ideal for the simple example is something like (StoreC mem2 addr2
(AndI (LoadB mem1 addr1) (ConI 0xff))). The code above will break the
match at the load, forcing the value into a register. It's seem like
an excessively strong cutout but I'm not sure how to phrase it better,
particularly since I don't know what exactly what problem it designed
to eliminate. I believe it's probably the anti-dep issue but without
a concrete failure it's hard to know what exactly it should look like.
tom
>
>
> The ideal nodes in question are:
>
> 129 LoadB === 311 51 127 [[ 141 ]] @byte[int:>=0]:exact+any *,
> idx=4; #byte !jvms: test::foo @ bci:28
> 140 ConI === 0 [[ 141 217 268 347 439 441 ]] #int:255
> 141 AndI === 458 129 140 [[ 164 ]] !orig=[377] !jvms:
> test::decode @ bci:4 test::foo @ bci:29
>
> So loadUB should match but it does not (and I don't know why, yet).
> The
> opto output is:
>
> 102 B7: # B6 B8 <- B6 Freq: 2
> 102 movslq R10, R11 # i2l
> 105 movq R8, [rsp + #8] # spill
> 10a movsbl R8, [R8 + #24 + R10] # byte
> 110 incl R11 # int
> 113 movzbl R8, R8 # int & 0xFF
> 117 movw [R9 + #24 + R10 << #1], R8 # char/short
> 11d cmpl R11, #1
> 121 jl,s B6 # loop end P=0.500000 C=22950.000000
>
> It seems the increment of the loop variable gets scheduled between
> LoadB
> and immI_255 and thus loadUB cannot match.
>
> Not sure yet when matching is applied and if I'm right with my
> assumption above. I'm looking further...
>
> -- Christian
>
More information about the hotspot-dev
mailing list