[9] RFR(S): 8075136: Unnecessary sign extension for byte array access
Tobias Hartmann
tobias.hartmann at oracle.com
Wed Mar 18 09:51:21 UTC 2015
Hi Andrew,
thanks a lot for clarifying this. Please see comments inline.
> On 17/03/15 16:11, Andrew Dinn wrote:
>> Apologies for jumping the gun wiht my previous comment. Having reviewed
>> the earlier traffic in this thread I now understand why the two new
>> rules are needed (i.e. to deal with the case where there is no LShift in
>> the matched operand).
Actually, the original intention of this change was to get rid of unnecessary sign extensions if the ConvI2LNode is not needed. For example, on x86 we avoid a sign extension if idx is guaranteed to be >= 0. See 'indPosIndexScaleOffset' in x86_64.ad:
operand indPosIndexScaleOffset(any_RegP reg, immL32 off, rRegI idx, immI2 scale)
%{
constraint(ALLOC_IN_RC(ptr_reg));
predicate(n->in(2)->in(3)->in(1)->as_Type()->type()->is_long()->_lo >= 0);
[...]
On x86 this optimization was missing for the unscaled case (for example, a byte array access). That's why I added this case to the arm64 code.
I assumed that we do the same optimization on arm64 but now it seems to me that we don't. See the following output with the 'baseline' version:
Byte array access:
0x0000007f7453d8b4: add x10, x10, w1, sxtw
0x0000007f7453d8b8: ldrsb w0, [x10,#16]
Char array access:
0x0000007f7cb6f8b4: add x8, x10, #0x10
0x0000007f7cb6f8b8: ldrh w0, [x8,w1,sxtw #1]
In both cases we have a 'sxtw' which is not needed because we have range checks that guarantee that index (w1) is always >= 0.
> The two new rules both specify an unscaled 32-bit integer register as
> index and, of course, the scale value is installed as 0 in the rule. So,
> the correct patch to function storeLoad is to include these two switch
> cases in the set which specify a sign-extend:
>
> . . .
> switch (opcode) {
> case INDINDEXSCALEDOFFSETI2L:
> case INDINDEXSCALEDI2L:
> case INDINDEXSCALEDOFFSETI2LN:
> case INDINDEXSCALEDI2LN:
> + case INDINDEXOFFSETI2L:
> + case INDINDEXOFFSETI2LN:
> scale = Address::sxtw(size);
> break;
> default:
> scale = Address::lsl(size);
> . . .
>
Adapting the code as you suggested has no effect on the generated assembly code. It is equal to the baseline version. In fact, the indIndexOffsetI2L operand is not used.
0x0000007fa91c06b4: add x10, x10, w1, sxtw
0x0000007fa91c06b8: ldrsb w0, [x10,#16]
http://cr.openjdk.java.net/~thartmann/8075136/webrev.04/
I assume that the changes are not required, since we don't do the 'unnecessary sign extension optimization'. What do you think?
Thanks,
Tobias
More information about the hotspot-compiler-dev
mailing list