Request for reviews (S): 6814842: Load shortening optimizations

Tom Rodriguez Thomas.Rodriguez at Sun.COM
Mon Mar 9 12:28:24 PDT 2009


>> I think the bang for the buck is low to change it at the ideal  
>> level.  Can't this be addressed in the ad files?  The adds to deal  
>> with endianness will get hidden in the encoding so it won't affect
>> alias analysis.
>
> We tried shortening 64- to 32-bit loads once, and we backed off.   
> IIRC, there was an interaction with copy coalescing or some other  
> post-match logic.  I looked in the old teamware histories, but  
> couldn't find evidence of it, so it's just a hazy memory of years ago.

I can't see how it would interact with copy coalescing.  It will screw  
up anti-dependence computation, since stores to the field and loads  
from it would be in different alias classes if the offset were exposed  
in the AddP expression: a shortening load and a store to the same  
location could be incorrectly scheduled because they would appear  
unrelated.  I remembered seeing a comment to that effect somewhere;  
you have to go back a bit to find it.  From an older workspace:

   // Fold up with a prior LoadL: LoadL->ConvL2I ==> LoadI
   // Requires we understand the 'endianness' of Longs.
   if( andl_op == Op_LoadL ) {
     Node *adr = andl->in(MemNode::Address);
     // VM_LITTLE_ENDIAN is #defined appropriately in the Makefiles
#ifndef VM_LITTLE_ENDIAN
     // The transformation can cause problems on BIG_ENDIAN architectures
     // where the jint is not at the same address as the jlong.  Specifically,
     // we will fail to insert an anti-dependence in GCM between the LoadI
     // and a subsequent StoreL because different memory offsets provoke
     // flatten_alias_type() into indicating two different types.  See bug
     // 4755222.

     // Node *base = adr->is_AddP() ? adr->in(AddPNode::Base) : adr;
     // adr = phase->transform( new (phase->C, 4) AddPNode(base, adr, phase->MakeConX(sizeof(jint))) );
     return NULL;
#else
     if (phase->C->alias_type(andl->adr_type())->is_volatile()) {
       // Picking up the low half by itself bypasses the atomic load and we
       // could end up with more than one non-atomic load.  See bugs 4432655
       // and 4526490.  We could go to the trouble of iterating over andl's
       // output edges and punting only if there's more than one real use,
       // but we don't bother.
       return NULL;
     }
     return new (phase->C, 3) LoadINode(andl->in(MemNode::Control),
                                        andl->in(MemNode::Memory),
                                        adr,
                                        ((LoadLNode*)andl)->raw_adr_type());
#endif
   }
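
For concreteness, here's a tiny standalone sketch (not HotSpot code, just an illustration) of the endianness issue that comment describes: on a little-endian machine the low 32 bits of a 64-bit value sit at the same address as the value itself, while on a big-endian machine they sit at base + 4, which is exactly the sizeof(jint) adjustment the commented-out AddP was trying to make:

   #include <cstdint>
   #include <cstdio>
   #include <cstring>

   int main() {
     uint64_t l = 0x1122334455667788ULL;   // low 32 bits are 0x55667788
     uint32_t lo;
     memcpy(&lo, &l, sizeof(lo));          // read 32 bits at offset 0
     if (lo == 0x55667788u) {
       printf("little-endian: low half at offset 0\n");
     } else {
       // Big-endian: the low half lives at offset 4, so a shortened load
       // needs a different address than the original 64-bit load, and the
       // two addresses flatten to different alias types.
       memcpy(&lo, reinterpret_cast<const char*>(&l) + 4, sizeof(lo));
       printf("big-endian: low half at offset 4 (0x%x)\n", lo);
     }
     return 0;
   }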

In current builds this whole thing has been replaced with:

   // Disable optimization: LoadL->ConvL2I ==> LoadI.
   // It causes problems (sizes of Load and Store nodes do not match)
   // in objects initialization code and Escape Analysis.

At the ideal level there's also the problem that the store_Opcode()  
method will no longer match, which I think is part of what's being  
referred to in the above comment.  I suspect we could add an assert  
to the alias index computation logic to barf if it sees an offset in  
the middle of an existing field.
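
Something along these lines, purely as a sketch (Field, the field table, and offset_is_field_start are made up for illustration; the real check would live near flatten_alias_type() in the alias index computation):

   #include <cassert>
   #include <cstddef>

   // Hypothetical version of the suggested assert: barf when an alias index
   // is requested for an offset that lands strictly inside a known field
   // rather than at its start.
   struct Field { size_t offset; size_t size; };

   static bool offset_is_field_start(const Field* fields, int nfields, size_t off) {
     for (int i = 0; i < nfields; i++) {
       if (off == fields[i].offset) return true;   // exact field start: fine
       if (off > fields[i].offset && off < fields[i].offset + fields[i].size)
         return false;                             // mid-field offset: barf
     }
     return true;  // not inside any known field (header, raw memory, etc.)
   }

   int main() {
     const Field fields[] = { { 12, 8 } };           // a jlong field at offset 12
     assert( offset_is_field_start(fields, 1, 12));  // LoadL of the whole field
     assert(!offset_is_field_start(fields, 1, 16));  // shortened LoadI into its middle
     return 0;
   }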

> Load shortening might make sense very late, after register  
> allocation, near post-allocation copy removal.  At that point, you  
> should be able to alter offsets also, so you are not limited just to  
> the word or byte which happens to come first in the CPU's endian  
> order.

Doing it in the ad file should be sufficient for hiding them.  The  
offset adjustment would only exist in the encoding, and it's not  
possible to look inside the encoding.
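
A hedged sketch of what that might look like in an ad file (SPARC-flavored; the instruction name, cost, pipe class, and the plus_disp helper are assumptions, not taken from any real port):

   // Hypothetical match rule: implement ConvL2I of a LoadL as a plain
   // 32-bit load.  The big-endian +4 adjustment exists only inside the
   // encoding, so the ideal graph and alias analysis still see the
   // original long address.
   instruct convL2I_mem(iRegI dst, memory mem) %{
     match(Set dst (ConvL2I (LoadL mem)));
     ins_cost(MEMORY_REF_COST);
     format %{ "LDUW   $mem+4,$dst\t! low half of long" %}
     ins_encode %{
       __ lduw($mem$$Address.plus_disp(4), $dst$$Register);  // assumed helper
     %}
     ins_pipe(iload_mem);
   %}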

tom

> -- John



