Request for reviews (S): 6814842: Load shortening optimizations
Tom Rodriguez
Thomas.Rodriguez at Sun.COM
Mon Mar 9 12:28:24 PDT 2009
>> I think the bang for the buck is low to change it at the ideal
>> level. Can't this be addressed in the ad files? The adds to deal
>> with endianness will get hidden in the encoding so it won't affect
>> alias analysis.
>
> We tried shortening 64- to 32-bit loads once, and we backed off.
> IIRC, there was an interaction with copy coalescing or some other
> post-match logic. I looked in the old teamware histories, but
> couldn't find evidence of it, so it's just a hazy memory of years ago.
I can't see how it would interact with copy coalescing. It would screw
up anti-dep computations, since stores to and loads from the same field
would be in different alias classes if the offset were exposed in the
AddP expression. A shortened load and a store to the same location
could be incorrectly scheduled because they would appear unrelated. I
remembered seeing a comment to that effect somewhere, though you have
to go back a bit to find it. From an older workspace:
// Fold up with a prior LoadL: LoadL->ConvL2I ==> LoadI
// Requires we understand the 'endianess' of Longs.
if( andl_op == Op_LoadL ) {
  Node *adr = andl->in(MemNode::Address);
  // VM_LITTLE_ENDIAN is #defined appropriately in the Makefiles
#ifndef VM_LITTLE_ENDIAN
  // The transformation can cause problems on BIG_ENDIAN architectures
  // where the jint is not the same address as the jlong.  Specifically, we
  // will fail to insert an anti-dependence in GCM between the LoadI and a
  // subsequent StoreL because different memory offsets provoke
  // flatten_alias_type() into indicating two different types.  See bug
  // 4755222.
  // Node *base = adr->is_AddP() ? adr->in(AddPNode::Base) : adr;
  // adr = phase->transform( new (phase->C, 4) AddPNode(base,adr,phase->MakeConX(sizeof(jint))));
  return NULL;
#else
  if (phase->C->alias_type(andl->adr_type())->is_volatile()) {
    // Picking up the low half by itself bypasses the atomic load and we could
    // end up with more than one non-atomic load.  See bugs 4432655 and 4526490.
    // We could go to the trouble of iterating over andl's output edges and
    // punting only if there's more than one real use, but we don't bother.
    return NULL;
  }
  return new (phase->C, 3) LoadINode(andl->in(MemNode::Control), andl->in(MemNode::Memory), adr,
                                     ((LoadLNode*)andl)->raw_adr_type());
#endif
}
In current builds this whole thing has been replaced with:
// Disable optimization: LoadL->ConvL2I ==> LoadI.
// It causes problems (sizes of Load and Store nodes do not match)
// in objects initialization code and Escape Analysis.
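To make the aliasing hazard concrete, here's a toy, self-contained
sketch (my own model, not HotSpot code) of alias classes keyed on
(base, offset). Once the shortened LoadI carries offset 4 while the
StoreL stays at offset 0, anything that consults only the alias
classes sees the two as unrelated and inserts no anti-dependence:

#include <cstdio>
#include <map>
#include <utility>

// Toy stand-in for flatten_alias_type(): one alias class per (base, offset).
static std::map<std::pair<int,int>, int> alias_classes;

int alias_class(int base, int offset) {
  auto key = std::make_pair(base, offset);
  auto it = alias_classes.find(key);
  if (it == alias_classes.end())
    it = alias_classes.emplace(key, (int)alias_classes.size()).first;
  return it->second;
}

int main() {
  int load_i  = alias_class(/*base=*/1, /*offset=*/4);  // shortened LoadI
  int store_l = alias_class(/*base=*/1, /*offset=*/0);  // StoreL, same jlong
  std::printf("LoadI=%d StoreL=%d -> %s\n", load_i, store_l,
              load_i == store_l ? "anti-dep inserted"
                                : "treated as unrelated (the bug)");
}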
At the ideal level there's also the problem that the store_Opcode()
method will no longer match, which is part of what's being referred to
in the above comment, I think. I suspect we could use an assert in the
alias idx computation logic to barf if it sees offsets in the middle of
an existing field.
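Roughly what I have in mind for that assert, as a self-contained toy
(the field-layout table and helper names are made up for illustration,
not the real alias idx code):

#include <cassert>
#include <vector>

struct Field { int offset; int size; };

// Barf if 'offset' points into the interior of any field in 'layout'.
void check_alias_offset(const std::vector<Field>& layout, int offset) {
  for (const Field& f : layout) {
    assert(!(offset > f.offset && offset < f.offset + f.size) &&
           "offset lands in the middle of an existing field");
  }
}

int main() {
  std::vector<Field> layout = {{0, 8}, {8, 4}};  // a jlong, then a jint
  check_alias_offset(layout, 8);  // fine: start of the jint
  check_alias_offset(layout, 4);  // asserts: interior of the jlong
}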
> Load shortening might make sense very late, after register
> allocation, near post-allocation copy removal. At that point, you
> should be able to alter offsets also, so you are not limited just to
> the word or byte which happens to come first in the CPU's endian
> order.
Doing it in the ad files should be sufficient for hiding them: the
offset would only exist in the encoding, and it's not possible to look
inside the encoding.
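For illustration, a toy emitter (not real ad-file syntax or HotSpot
code) showing what "hidden in the encoding" means here: the address
that the matcher and alias analysis see keeps the jlong's offset, and
only instruction emission applies the big-endian +4 adjustment:

#include <cstdio>

constexpr bool kLittleEndian = false;  // pretend we're on a big-endian target

struct Address { int base_reg; int offset; };  // what alias analysis sees

// Emit a 32-bit load of the low word of a 64-bit field; the +4 adjustment
// lives only here, inside the "encoding".
void emit_load_low_word(const Address& a) {
  int disp = a.offset + (kLittleEndian ? 0 : 4);
  std::printf("ld [r%d + %d], rX\n", a.base_reg, disp);
}

int main() {
  Address long_field{1, 16};       // a jlong at offset 16 from r1
  emit_load_low_word(long_field);  // prints "ld [r1 + 20], rX"
}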
tom
> -- John