Unnecessary Long computation/loads feeding into ConvL2I
Vladimir Kozlov
Vladimir.Kozlov at Sun.COM
Thu Dec 11 09:41:25 PST 2008
Ed,
These changes are specific to 32-bit x86,
and we are trying to stay platform-neutral during ideal
optimizations.
What you can do is add new mach nodes in x86_32.ad:
instruct xorl2i_reg_reg(eRegI dst, eRegL src1, eRegL src2, eFlagsReg cr) %{
  match(Set dst (ConvL2I (XorL src1 src2)));
  expand %{
    xorI_eReg(src1, src2, cr);
    convL2I_reg(dst, src1);
  %}
%}

instruct xorl2i_reg_mem(eRegI dst, eRegL src1, memory src2, eFlagsReg cr) %{
  match(Set dst (ConvL2I (XorL src1 src2)));
  expand %{
    xorI_eReg_mem(src1, src2, cr);
    convL2I_reg(dst, src1);
  %}
%}
Unfortunately, with our current implementation you can't
avoid the L2I conversion if the register allocator uses different
registers for src1 and dst.
Vladimir
Edward Lee wrote:
> I was looking at the OptoAssembly and final x86 code for the following
> code sequence and noticed a bit of unnecessary work in the generated
> code:
>
> int hash = (int) (this.identity.containerId ^ this.identity.segmentId);
> hash = (int) ((long) hash ^ pageNumber);
>
> The ideal graph looks like..
> L2I(XorL(
> I2L(L2I(XorL(containerId, segmentId))),
> pageNumber))
>
> It eventually becomes..
> mov 0x10(%esi),%ecx ; containerId.lo
> mov 0x14(%esi),%ebx ; containerId.hi
> xor 0x8(%esi),%ecx ; XorL segmentId.lo
> xor 0xc(%esi),%ebx ; XorL segmentId.hi
> mov %ecx,%ebx ; 2-line upcast I2L
> sar $0x1f,%ebx ; I2L
> xor 0x40(%esp),%ecx ; XorL pageNumber.lo
> xor 0x44(%esp),%ebx ; XorL pageNumber.hi
> mov %ecx,%ebx ; unnecessary ?? L2I (Opto: MOV EBX,ECX.lo)
>
> The attached patch splits L2I(XorL(a,b)) into XorI(L2I(a),L2I(b)) and
> in this situation, it's simplified to..
>
> XorI(
> L2I(XorL(containerId, segmentId)),
> L2I(pageNumber))
>
> mov 0x10(%esi),%ebp ; containerId.lo
> mov 0x14(%esi),%edi ; containerId.hi
> xor 0x8(%esi),%ebp ; XorL segmentId.lo
> xor 0xc(%esi),%edi ; XorL segmentId.hi
> mov 0x40(%esp),%eax ; pageNumber.lo
> mov 0x44(%esp),%edx ; pageNumber.hi
> mov %ebp,%ebx ; (Opto: MOV EBX,EBP.lo)
> xor %eax,%ebx ; XorI pageNumber.lo
>
> Unfortunately, there are still a number of rough spots. As per the comments
> in ConvL2INode::Ideal:
> // Disable optimization: LoadL->ConvL2I ==> LoadI.
> // It causes problems (sizes of Load and Store nodes do not match)
> // in objects initialization code and Escape Analysis.
>
> This forces the loads to stay as long loads, and this causes
> unnecessary register pressure. In the "after" x86 code, pageNumber is
> explicitly moved into eax and edx even though xor could have just used
> 0x40(%esp) directly. Also, all the high bits of the longs should be
> completely ignored.
>
> Additionally, it seems that EBP.lo in the OptoAssembly output gets
> treated as a separate entity from EBP. Otherwise, the xor could have
> just used %ebp (EBP.lo) directly as in "xor 0x40(%esp), %ebp".
>
> Ideally the whole sequence should look like..
>
> mov 0x10(%esi),%ebp ; containerId.lo
> xor 0x8(%esi),%ebp ; XorI segmentId.lo
> xor 0x40(%esp),%ebp ; XorI pageNumber.lo
>
> Ed
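
The rewrite discussed in the quoted message rests on the fact that narrowing a long to an int keeps only the low 32 bits, and XOR works on each bit independently, so L2I(XorL(a, b)) and XorI(L2I(a), L2I(b)) give the same int. The Java sketch below checks that at the source level for the hash computation from the thread. The class and method names (XorNarrowing, hashWide, hashNarrow) and the sample values are made up for illustration; it says nothing about what code C2 actually emits.

```java
public class XorNarrowing {
    // Form from the thread: narrow to int, widen back to long,
    // xor as a long, then narrow again.
    static int hashWide(long containerId, long segmentId, long pageNumber) {
        int hash = (int) (containerId ^ segmentId);
        return (int) ((long) hash ^ pageNumber);
    }

    // Equivalent form that only ever touches the low 32 bits of each long,
    // mirroring XorI(L2I(a), L2I(b)).
    static int hashNarrow(long containerId, long segmentId, long pageNumber) {
        return (int) containerId ^ (int) segmentId ^ (int) pageNumber;
    }

    public static void main(String[] args) {
        // Illustrative sample values, chosen to exercise the sign bit
        // and the high halves.
        long[] samples = {
            0L, 1L, -1L, 0x80000000L, 0xFFFFFFFF00000000L,
            0x123456789ABCDEF0L, Long.MIN_VALUE, Long.MAX_VALUE
        };
        for (long a : samples) {
            for (long b : samples) {
                for (long c : samples) {
                    if (hashWide(a, b, c) != hashNarrow(a, b, c)) {
                        throw new AssertionError(
                            "mismatch for " + a + ", " + b + ", " + c);
                    }
                }
            }
        }
        System.out.println("all sample combinations agree");
    }
}
```

Because the high halves never influence the result, the ideal three-instruction sequence in the quoted message (one 32-bit load followed by two 32-bit xors) computes the same value as hashNarrow.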