Unnecessary Long computation/loads feeding into ConvL2I

Tom Rodriguez Thomas.Rodriguez at Sun.COM
Thu Dec 11 09:49:26 PST 2008


You might also add a rule for (Set dst (ConvL2I (LoadL mem))) that loads
from the appropriate half of the long.
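
To illustrate why loading one half is enough: on little-endian x86 the low 32 bits of a long sit at offset 0 and the high 32 bits at offset 4, so a 32-bit load from offset 0 gives the same value as ConvL2I of the full 64-bit load. A minimal Java sketch of that layout follows; the class and method names are hypothetical, and a ByteBuffer stands in for the memory operand.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LongHalvesDemo {
    // Store the long in little-endian order (as x86 does) and read back
    // only the 4 bytes at offset 0, i.e. the low half.
    static int lowHalf(long value) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, value);
        return buf.getInt(0);
    }

    // Same layout, but read the 4 bytes at offset 4, i.e. the high half.
    static int highHalf(long value) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, value);
        return buf.getInt(4);
    }

    public static void main(String[] args) {
        long v = 0x1122334455667788L;
        // The low half matches the narrowing cast, which is what ConvL2I computes.
        System.out.println(lowHalf(v) == (int) v);
        System.out.println(highHalf(v) == (int) (v >>> 32));
    }
}
```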

tom

On Dec 11, 2008, at 9:41 AM, Vladimir Kozlov wrote:

> Ed,
>
> These changes are specific to 32-bit x86,
> and we are trying to stay platform neutral during ideal
> optimizations.
>
> What you can do is to add new mach nodes in x86_32.ad:
>
> instruct xorl2i_reg_reg(eRegI dst, eRegL src1, eRegL src2, eFlagsReg cr) %{
>  match(Set dst (ConvL2I (XorL src1 src2)));
>  expand %{
>    xorI_eReg(src1, src2, cr);
>    convL2I_reg(dst, src1);
>  %}
> %}
>
> instruct xorl2i_reg_mem(eRegI dst, eRegL src1, memory src2, eFlagsReg cr) %{
>  match(Set dst (ConvL2I (XorL src1 (LoadL src2))));
>  expand %{
>    xorI_eReg_mem(src1, src2, cr);
>    convL2I_reg(dst, src1);
>  %}
> %}
>
> Unfortunately, with our current implementation you can't
> avoid the L2I conversion if the register allocator uses different
> registers for src1 and dst.
>
> Vladimir
>
>
> Edward Lee wrote:
>> I was looking at the OptoAssembly and final x86 code for the following
>> code sequence and noticed a bit of unnecessary work in the generated
>> code:
>> int hash = (int) (this.identity.containerId ^ this.identity.segmentId);
>> hash = (int) ((long) hash ^ pageNumber);
>> The ideal graph looks like..
>> L2I(XorL(
>>  I2L(L2I(XorL(containerId, segmentId))),
>>  pageNumber))
>> It eventually becomes..
>> mov    0x10(%esi),%ecx ; containerId.lo
>> mov    0x14(%esi),%ebx ; containerId.hi
>> xor    0x8(%esi),%ecx ; XorL segmentId.lo
>> xor    0xc(%esi),%ebx ; XorL segmentId.hi
>> mov    %ecx,%ebx ; 2-line upcast I2L
>> sar    $0x1f,%ebx ; I2L
>> xor    0x40(%esp),%ecx ; XorL pageNumber.lo
>> xor    0x44(%esp),%ebx ; XorL pageNumber.hi
>> mov    %ecx,%ebx ; unnecessary ?? L2I (Opto: MOV    EBX,ECX.lo)
>> The attached patch splits L2I(XorL(a,b)) into XorI(L2I(a),L2I(b)) and
>> in this situation, it's simplified to..
>> XorI(
>>  L2I(XorL(containerId, segmentId)),
>>  L2I(pageNumber))
>> mov    0x10(%esi),%ebp ; containerId.lo
>> mov    0x14(%esi),%edi ; containerId.hi
>> xor    0x8(%esi),%ebp ; XorL segmentId.lo
>> xor    0xc(%esi),%edi ; XorL segmentId.hi
>> mov    0x40(%esp),%eax ; pageNumber.lo
>> mov    0x44(%esp),%edx ; pageNumber.hi
>> mov    %ebp,%ebx ; (Opto: MOV    EBX,EBP.lo)
>> xor    %eax,%ebx ; XorI pageNumber.lo
>> Unfortunately, there are still a number of rough spots. As per the
>> comments in ConvL2INode::Ideal:
>>  // Disable optimization: LoadL->ConvL2I ==> LoadI.
>>  // It causes problems (sizes of Load and Store nodes do not match)
>>  // in objects initialization code and Escape Analysis.
>> This forces the loads to stay as long loads, which causes
>> unnecessary register pressure. In the "after" x86 code, pageNumber is
>> explicitly moved into eax and edx even though xor could have just used
>> 0x40(%esp) directly. Also, the high 32 bits of all the longs should be
>> ignored completely.
>> Additionally, it seems that EBP.lo in the OptoAssembly output gets
>> treated as a separate entity from EBP. Otherwise, the xor could have
>> just used %ebp (EBP.lo) directly as in "xor 0x40(%esp), %ebp".
>> Ideally the whole sequence should look like..
>> mov    0x10(%esi),%ebp ; containerId.lo
>> xor    0x8(%esi),%ebp ; XorI segmentId.lo
>> xor    0x40(%esp),%ebp ; XorI pageNumber.lo
>> Ed
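
The rewrite in Ed's patch, L2I(XorL(a,b)) into XorI(L2I(a),L2I(b)), relies on XOR working bit by bit, so the low 32 bits of the result depend only on the low 32 bits of the inputs. The Java sketch below checks that the original source sequence and a low-word-only version agree on a handful of sample values. The class and method names are hypothetical, and this is only a spot check of the arithmetic, not a statement about what C2 emits.

```java
public class L2IXorDemo {
    // The sequence from the original message: XOR as longs, narrow to int,
    // widen back to long (sign extension), XOR again, narrow again.
    static int hashOriginal(long containerId, long segmentId, long pageNumber) {
        int hash = (int) (containerId ^ segmentId);
        hash = (int) ((long) hash ^ pageNumber);
        return hash;
    }

    // The "ideal" form: narrow each input first, then use 32-bit XORs only.
    // The high halves of the longs are never touched.
    static int hashLowWords(long containerId, long segmentId, long pageNumber) {
        return (int) containerId ^ (int) segmentId ^ (int) pageNumber;
    }

    public static void main(String[] args) {
        long[] samples = {
            0L, 1L, -1L, 0x123456789ABCDEF0L, Long.MIN_VALUE, Long.MAX_VALUE
        };
        // Compare both forms over every combination of the sample values.
        for (long a : samples) {
            for (long b : samples) {
                for (long c : samples) {
                    if (hashOriginal(a, b, c) != hashLowWords(a, b, c)) {
                        throw new AssertionError("mismatch for " + a + ", " + b + ", " + c);
                    }
                }
            }
        }
        System.out.println("all sample inputs agree");
    }
}
```

The intermediate I2L only sign-extends the int, which changes the high word but leaves the low word alone, so it drops out once the final L2I discards the high word.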



