[9] RFR(S): 8075136: Unnecessary sign extension for byte array access
Roland Westrelin
roland.westrelin at oracle.com
Thu Mar 19 16:25:01 UTC 2015
> Here's the final webrev:
> http://cr.openjdk.java.net/~thartmann/8075136/webrev.05/
Looks good.
Roland.
>
> Best,
> Tobias
>
> On 17.03.2015 19:26, Vladimir Kozlov wrote:
>> On 3/17/15 5:33 AM, Tobias Hartmann wrote:
>>>
>>> On 13.03.2015 19:03, Vladimir Kozlov wrote:
>>>> Looks good. What about other platforms? I see the same optimization in aarch64.ad
>>>
>>> Thanks, Vladimir. I fixed the issue on ARM64 as well. Looks like other platforms are not affected.
>>>
>>> http://cr.openjdk.java.net/~thartmann/8075136/webrev.03/
>>
>> Looks good for me. Please, finish discussion with Andrew Dinn to make sure he is okay with changes.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> While verifying, I noticed that the costs of memory operands in aarch64.ad are inconsistent:
>>>
>>> The following operands have cost 0:
>>> - indirect
>>> - indIndexScaledI2L
>>> - indIndexScaled
>>> - indIndex
>>> - indOffL
>>> - indirectN
>>> - indIndexScaledOffsetIN
>>> - indIndexScaledI2LN
>>> - indIndexScaledN
>>> - indIndexN
>>> - indOffIN
>>> - indOffLN
>>>
>>> Whereas the following operands have cost 'INSN_COST':
>>> - indIndexScaledOffsetI
>>> - indIndexScaledOffsetL
>>> - indIndexScaledOffsetI2L
>>> - indOffI
>>> - indIndexScaledOffsetLN
>>> - indIndexScaledOffsetI2LN
>>>
>>> This does not make sense to me. In this case 'AddP (AddP reg (ConvI2L ireg)) off' is matched to 'addP_reg_reg_ext' (cost INSN_COST) and 'indOffL' (cost 0) instead of the newly added 'indIndexOffsetI2L' (cost INSN_COST) because the sum of the costs is equal.
>>>
>>> I filed JDK-8075324 [1] to take care of this. I would suggest that we set the cost of a memory operand to either 0 (if only immediates) or 10 (if more complex) like we do on x86.
>>>
>>> I verified that the patch (together with the fix for JDK-8075324) solves the issue.
>>>
>>> Before:
>>> 034 B3: # N1 <- B2 Freq: 0.999998
>>> 034 + add R10, R10, R1, sxtw # ptr
>>> 038 + ldrsbw R0, [R10, #16] # byte
>>> After:
>>> 034 B3: # N1 <- B2 Freq: 0.999998
>>> 034 + ldrsbw R0, R10, R1, #16 I2L # byte
>>>
>>> Thanks,
>>> Tobias
>>>
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8075324
>>>
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/13/15 7:15 AM, Tobias Hartmann wrote:
>>>>> Hi,
>>>>>
>>>>> please review the following patch.
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8075136
>>>>> http://cr.openjdk.java.net/~thartmann/8075136/webrev.00/
>>>>>
>>>>> Problem:
>>>>> C2 adds an unnecessary 'movslq' sign extension while compiling a byte array access:
>>>>>
>>>>> public static byte accessByte(int index) {
>>>>> return byteArr[index];
>>>>> }
>>>>>
>>>>> 0x00007f76f4747208: movslq %esi,%r11
>>>>> 0x00007f76f474720b: movsbl 0x10(%r10,%r11,1),%eax
>>>>>
>>>>> Where the 'movslq' is not necessary because we emit range checks guaranteeing that index %esi is not negative.
>>>>> For a char array access no such sign extension is created:
>>>>>
>>>>> public static char accessChar(int index) {
>>>>> return charArr[index];
>>>>> }
>>>>>
>>>>> 0x00007fab3916b188: movzwl 0x10(%r10,%rsi,2),%eax
>>>>>
>>>>> This is because we only have matching rules to fold the corresponding ConvI2LNode in the following case:
>>>>>
>>>>> match(AddP (AddP (DecodeN reg) (LShiftL (ConvI2L idx) scale)) off);
>>>>>
>>>>> This applies to charAccess [1] but not to byteAccess [2] because the byte array access does not require a scaling factor.
>>>>>
>>>>> Solution:
>>>>> I added the corresponding matching rule to fold the ConvI2LNode.
>>>>>
>>>>> match(AddP (AddP (DecodeN reg) (ConvI2L idx)) off);
>>>>>
>>>>> Testing:
>>>>> - Testcase
>>>>> - JPRT
>>>>>
>>>>> Thanks,
>>>>> Tobias
>>>>>
>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/26108/accessChar.png
>>>>> [2] https://bugs.openjdk.java.net/secure/attachment/26107/accessByte.png
>>>>>
More information about the hotspot-compiler-dev
mailing list