Unnecessary Long computation/loads feeding into ConvL2I

Thu Dec 11 09:41:25 PST 2008

Ed,

These changes are specific to 32-bit x86, and we are trying
to stay platform-neutral during ideal optimizations.

What you can do instead is add new mach nodes in x86_32.ad:

instruct xorl2i_reg_reg(eRegI dst, eRegL src1, eRegL src2, eFlagsReg cr) %{
   match(Set dst (ConvL2I (XorL src1 src2)));
   expand %{
     xorI_eReg(src1, src2, cr);
     convL2I_reg(dst, src1);
   %}
%}

instruct xorl2i_reg_mem(eRegI dst, eRegL src1, memory src2, eFlagsReg cr) %{
   match(Set dst (ConvL2I (XorL src1 src2)));
   expand %{
     xorI_eReg_mem(src1, src2, cr);
     convL2I_reg(dst, src1);
   %}
%}

Unfortunately, with our current implementation you can't
avoid the L2I conversion if the register allocator uses
different registers for src1 and dst.

Vladimir

Edward Lee wrote:
> I was looking at the OptoAssembly and final x86 code for the following
> code sequence and noticed a bit of unnecessary work in the generated
> code:
> 
> int hash = (int) (this.identity.containerId ^ this.identity.segmentId);
> hash = (int) ((long) hash ^ pageNumber);
> 
> The ideal graph looks like..
> L2I(XorL(
>   I2L(L2I(XorL(containerId, segmentId))),
>   pageNumber))
> 
> It eventually becomes..
> mov    0x10(%esi),%ecx ; containerId.lo
> mov    0x14(%esi),%ebx ; containerId.hi
> xor    0x8(%esi),%ecx ; XorL segmentId.lo
> xor    0xc(%esi),%ebx ; XorL segmentId.hi
> mov    %ecx,%ebx ; 2-line upcast I2L
> sar    $0x1f,%ebx ; I2L
> xor    0x40(%esp),%ecx ; XorL pageNumber.lo
> xor    0x44(%esp),%ebx ; XorL pageNumber.hi
> mov    %ecx,%ebx ; unnecessary ?? L2I (Opto: MOV    EBX,ECX.lo)
> 
> The attached patch splits L2I(XorL(a,b)) into XorI(L2I(a),L2I(b)) and
> in this situation, it's simplified to..
> 
> XorI(
>   L2I(XorL(containerId, segmentId)),
>   L2I(pageNumber))
> 
> mov    0x10(%esi),%ebp ; containerId.lo
> mov    0x14(%esi),%edi ; containerId.hi
> xor    0x8(%esi),%ebp ; XorL segmentId.lo
> xor    0xc(%esi),%edi ; XorL segmentId.hi
> mov    0x40(%esp),%eax ; pageNumber.lo
> mov    0x44(%esp),%edx ; pageNumber.hi
> mov    %ebp,%ebx ; (Opto: MOV    EBX,EBP.lo)
> xor    %eax,%ebx ; XorI pageNumber.lo
> 
> Unfortunately, there are still a number of rough spots. As per the
> comments in ConvL2INode::Ideal..
>   // Disable optimization: LoadL->ConvL2I ==> LoadI.
>   // It causes problems (sizes of Load and Store nodes do not match)
>   // in objects initialization code and Escape Analysis.
> 
> This forces the loads to stay as long loads, and this causes
> unnecessary register pressure. In the "after" x86 code, pageNumber is
> explicitly moved into eax and edx even though xor could have just used
> 0x40(%esp) directly. Also, all the high bits of the longs should be
> completely ignored.
> 
> Additionally, it seems that EBP.lo in the OptoAssembly output gets
> treated as a separate entity from EBP. Otherwise, the xor could have
> just used %ebp (EBP.lo) directly as in "xor 0x40(%esp), %ebp".
> 
> Ideally the whole sequence should look like..
> 
> mov    0x10(%esi),%ebp ; containerId.lo
> xor    0x8(%esi),%ebp ; XorI segmentId.lo
> xor    0x40(%esp),%ebp ; XorI pageNumber.lo
> 
> Ed
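
The identity the quoted patch relies on, L2I(XorL(a, b)) == XorI(L2I(a), L2I(b)), can be checked at the Java level. The following is only a sketch: the class and method names (L2IXorCheck, hashViaLong, hashViaInt) are made up for illustration and are not part of HotSpot or of the patch. It compares the original long-based hash with the int-only form on random inputs, and says nothing about what code C2 actually emits.

```java
import java.util.Random;

// Hypothetical helper class, for illustration only.
final class L2IXorCheck {

    // Shape of the original source: narrow, widen again, xor as long, narrow.
    // Corresponds to L2I(XorL(I2L(L2I(XorL(c, s))), p)).
    static int hashViaLong(long containerId, long segmentId, long pageNumber) {
        int hash = (int) (containerId ^ segmentId);
        hash = (int) ((long) hash ^ pageNumber);
        return hash;
    }

    // Shape after splitting L2I(XorL(a, b)) into XorI(L2I(a), L2I(b)):
    // only the low 32 bits of each long are ever used.
    static int hashViaInt(long containerId, long segmentId, long pageNumber) {
        return ((int) containerId ^ (int) segmentId) ^ (int) pageNumber;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so the run is repeatable
        for (int i = 0; i < 1_000_000; i++) {
            long c = random.nextLong();
            long s = random.nextLong();
            long p = random.nextLong();
            if (hashViaLong(c, s, p) != hashViaInt(c, s, p)) {
                throw new AssertionError("mismatch for " + c + ", " + s + ", " + p);
            }
        }
        System.out.println("both forms agree on all sampled inputs");
    }
}
```

The two forms agree because xor works bit by bit, so the low 32 bits of the result depend only on the low 32 bits of the operands. The sign extension done by I2L changes only the high word, which the final L2I discards.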