RFR(S): 8159611: C2: ArrayCopy elimination skips required parameter checks

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Sep 13 16:32:46 UTC 2016


Yes, I agree with generate_negative_guard() in inline_arraycopy().

But I think we should path flag to ArrayCopyNode::make() when negative guards is generated in inline_arraycopy().
It is generated under several conditions so I don't want it to be missed in expand_arraycopy_node().

Thanks,
Vladimir

On 9/12/16 9:35 AM, Volker Simonis wrote:
> Sorry for the long delay...
>
> Here's my new version:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v3/
>
> I've actually changed PhaseMacroExpand::expand_arraycopy_node() such
> that it calls generate_arraycopy() with 'length_never_negative' set to
> true if EliminateAllocations is true (in this case we already checked
> in LibraryCallKit::inline_arraycopy() that 'length' is not negative).
> This way I could leave generate_arraycopy() untouched.
>
> The generated code now looks as follows:
>
> Original version (without 'length < 0' check):
>
> 0a7   B5: #    B17 B6 <- B4  Freq: 0,999998
> 0a7       cmpl    R9, R11    # unsigned
> 0aa       jb,u  B17  P=0,000001 C=-1,000000
> ...
> 0da   B7: #    B18 B8 <- B6 B12 B13  Freq: 0,999997
> 0da       movl    R11, [rsp + #8]    # spill
> 0df       testl   R11, R11
> 0e2       jle     B18  P=0,000001 C=-1,000000
> ...
> 0e8   B8: #    B9 <- B7  Freq: 0,999996
> 0f9       call_leaf_nofp,runtime  oop_disjoint_arraycopy
> ...
> 106   B9: #    B10 <- B8 B18 B20  Freq: 0,999997
> 113       ret
> ...
> 184   B17: #    N1 <- B4 B5  Freq: 2,01328e-06
> 193       call,static  wrapper for:
> uncommon_trap(reason='intrinsic_or_type_checked_inlining'
> action='make_not_entrant' debug_id='0')
>
> 19d   B18: #    B9 B19 <- B7  Freq: 9,99997e-07
> 19d       testl   R11, R11
> 1a0       jge     B9  P=0,999999 C=-1,000000
> 1a0
> 1a6   B19: #    B22 B20 <- B18  Freq: 9,99997e-13
> 1a6       movq    RSI, R8    # spill
> 1a9       movl    RDX, #1    # int
> 1ae       movq    RCX, R10    # spill
> 1b1       movl    R8, #1    # int
> 1b7       movl    R9, R11    # spill
>            nop     # 1 bytes pad for loops and calls
> 1bb       call,static  wrapper for: slow_arraycopy
>
> In B5 there's a check if 'offset+length' is still in the array range.
> If not we jump to the uncommon trap in B17.
> In B7 there's the first check from
> PhaseMacroExpand::generate_arraycopy() (i.e.
> generate_nonpositive_guard()). If 'length is less than or equal to
> zero we jump to B18 where there's the second check from
> PhaseMacroExpand::generate_arraycopy() (i.e.
> generate_negative_guard()). If 'length' is  zero, we jump to B9 and
> return. Otherwise we fall into B19 from where we call slow_arraycopy.
> slow_arraycopy (which is generated in ObjArrayKlass::copy_array() will
> throw an AIOOB exception if 'length' is negative.
>
> The new version now looks as follows:
>
> 0a2   B5: #    B19 B6 <- B4  Freq: 0,999998
> 0a2       cmpl    R10, RCX    # unsigned
> 0a5       jb,u  B19  P=0,000001 C=-1,000000
> 0a5
> 0ab   B6: #    B20 B7 <- B5  Freq: 0,999997
> 0ab       movl    R10, [rsp + #0]    # spill
> 0af       testl   R10, R10
> 0b2       jl     B20  P=0,000001 C=-1,000000
> 0b2
> ...
> 0e2   B8: #    B10 B9 <- B7 B13 B14  Freq: 0,999996
> 0e2       testl   R10, R10
> 0e5       je,s   B10  P=0,000001 C=-1,000000
> ...
> 0e7   B9: #    B10 <- B8  Freq: 0,999995
> 0f8       call_leaf_nofp,runtime  oop_disjoint_arraycopy
> ...
> 105   B10: #    B11 <- B9 B8  Freq: 0,999996
> 112       ret
> ...
> 18e   B19: #    B20 <- B5  Freq: 9,99998e-07
> 192   B20: #    N1 <- B18 B19 B6  Freq: 3,01327e-06
> 1a3       call,static  wrapper for:
> uncommon_trap(reason='intrinsic_or_type_checked_inlining'
> action='make_not_entrant' debug_id='0')
>
> B5 is like before, but is now followed by the extra check for 'length'
> being not negative in B6. In B8 we we now have the first check (i.e.
> generate_negative_guard()) from
> PhaseMacroExpand::generate_arraycopy(). It directly checks if 'length'
> is zero and jumps to B10 (i.e. returns) if so. Otherwise we fall
> directly into oop_disjoint_arraycopy(). There's no need to check for
> 'length' being negative and calling 'slow_arraycopy' because this case
> is already handled before now (in B6).
>
> Is this OK now?
>
> Thank you and best regards,
> Volker
>
>
> On Fri, Aug 26, 2016 at 3:51 AM, Vladimir Kozlov
> <vladimir.kozlov at oracle.com> wrote:
>> Looks good.
>>
>> Check does not fold because it is different: LT vs LE.
>>
>> Actually there are 3 checks together with yours (see
>> PhaseMacroExpand::generate_arraycopy()):
>>
>>    Node* not_pos = generate_nonpositive_guard(ctrl, copy_length,
>> length_never_negative);
>>    if (not_pos != NULL) {
>>      Node* local_ctrl = not_pos, *local_io = *io;
>>      MergeMemNode* local_mem = MergeMemNode::make(mem);
>>      transform_later(local_mem);
>>
>>      // (6) length must not be negative.
>>      if (!length_never_negative) {
>>        generate_negative_guard(&local_ctrl, copy_length, slow_region);
>>      }
>>
>> I think the only way to avoid this is to modify code in generate_arraycopy()
>> when EliminateAllocations is true. In such case you need to generate only
>> length == 0 check.
>>
>> Thanks,
>> Vladimir
>>
>>
>> On 8/25/16 10:03 AM, Volker Simonis wrote:
>>>
>>> On Tue, Aug 16, 2016 at 11:49 PM, Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>> Not generating exception is definitely bug.
>>>>
>>>> First, about test case. It would be nice if it also verifies other
>>>> IndexOutOfBoundsException cases.
>>>>
>>>
>>> I've extended the test case. See:
>>>
>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v2/
>>>
>>> With the new test I've caught another problem in C1 (only on x86 and
>>> s390, but that's not in the OpenJDK yet :).
>>>
>>> LIR_Assembler::emit_arraycopy() had a shortcut for length==0 which
>>> prevented the throwing of an ArrayStoreException if src and dst arrays
>>> have incompatible type (see do_test2() in the new regression test).
>>> Note that this is a different error from 8160591 and not fixed by the
>>> change for 8160591.
>>>
>>> I've also moved the new check after the offset + length check as
>>> suggested by you (see new webrev).
>>>
>>> Unfortunately, the new check is still not eliminated. Here's how it looks:
>>>
>>> 0ae   B6: #    B20 B7 <- B5  Freq: 0,999997
>>> 0ae       movl    R9, [rsp + #0]    # spill
>>> 0b2       testl   R9, R9
>>> 0b5       jl     B20  P=0,000001 C=-1,000000
>>> 0b5
>>> 0bb   B7: #    B12 B8 <- B6  Freq: 0,999996
>>> 0bb       movl    R11, [R10 + #8 (8-bit)]    # compressed klass ptr
>>> 0bf       decode_klass_not_null RAX,R11
>>> 0cc       movl    RBX, [RAX + #16 (8-bit)]    # int
>>> 0cf       movslq  RCX, RBX    # i2l
>>> 0d2       movq    RSI, precise klass [Ljava/lang/Object;:
>>> 0x00007ff1080320d0:Constant:exact *    # ptr
>>> 0dc       movq    RCX, [RSI + RCX]    # class
>>> 0e0       cmpq    RAX, RCX    # ptr
>>> 0e3       jne,us  B12  P=0,170000 C=-1,000000
>>> 0e3
>>> 0e5   B8: #    B21 B9 <- B7 B13 B14  Freq: 0,999996
>>> 0e5       testl   R9, R9
>>> 0e8       jle     B21  P=0,000001 C=-1,000000
>>>
>>> As you can see 'testl   R9, R9' is executed two times.
>>>
>>> I've even tried to move the new check after the subtype check, but
>>> that doesn't helps either:
>>>
>>> 0da   B7: #    B20 B8 <- B6 B13 B14  Freq: 0,999997
>>> 0da       movl    R11, [rsp + #8]    # spill
>>> 0df       testl   R11, R11
>>> 0e2       jl     B20  P=0,000001 C=-1,000000
>>> 0e2
>>> 0e8   B8: #    B10 B9 <- B7  Freq: 0,999996
>>> 0e8       testl   R11, R11
>>> 0eb       jle,s   B10  P=0,000001 C=-1,000000
>>>
>>> Any idea how this could be fixed?
>>>
>>> Thanks,
>>> Volker
>>>
>>> PS: and I still don't have a reproducible benchmark which shows a
>>> regression with my change...
>>>
>>>
>>>> Actually additional dynamic check will help in case of negative length is
>>>> know during compilation. The allocation code will be eliminated very
>>>> early
>>>> instead of waiting macro expansion:
>>>>
>>>>        int length = alloc->in(AllocateNode::ALength)->find_int_con(-1);
>>>>        if (length < 0) {
>>>>          NOT_PRODUCT(fail_eliminate = "Array's size is not constant";)
>>>>          can_eliminate = false;
>>>>        }
>>>>
>>>> About additional length check in your new test. I think it may be
>>>> collapsed
>>>> with preceding check since it is generated after other checks.
>>>> So I would suggest to move it after offset + length check.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>
>>>> On 8/16/16 7:57 AM, Volker Simonis wrote:
>>>>>
>>>>>
>>>>> On Tue, Aug 16, 2016 at 7:24 AM, Tobias Hartmann
>>>>> <tobias.hartmann at oracle.com> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Volker,
>>>>>>
>>>>>> thanks for taking care of this issue!
>>>>>>
>>>>>> Did you check what happens if the allocation is not eliminated and
>>>>>> macro
>>>>>> expansion phase emits another negative guard? Are the checks merged?
>>>>>>
>>>>>
>>>>> It depends. I just saw that in some cases the regression test worked
>>>>> before, because the length check was done in
>>>>> SharedRuntime::slow_arraycopy_C(). So in that case there's obviously
>>>>> nothing that can be merged. But the test case is obviously a
>>>>> degenerated example anyway, so I don't think that's a problem.
>>>>>
>>>>> If I do a more real-world example like this where the arracopy can not
>>>>> be eliminated because one of its arguments escapes:
>>>>>
>>>>>    public static boolean do_test2(int length, Object[] dest) {
>>>>>      try {
>>>>>        System.arraycopy(new Object[10], 1, dest, 1, length);
>>>>>        return false;
>>>>>      } catch (IndexOutOfBoundsException e) {
>>>>>        return true;
>>>>>      }
>>>>>    }
>>>>>
>>>>> and call it with:
>>>>>
>>>>> do_test2(8, new Object[10])
>>>>>
>>>>> the generated code for do_test2() unfortunately contains one more
>>>>> check now with my change (the 'length' field is in [rsp + #0]):
>>>>>
>>>>> 0a2   B4: #    B18 B5 <- B3  Freq: 0,999999
>>>>> 0a2       movl    R9, [rsp + #0]    # spill
>>>>> 0a6       testl   R9, R9
>>>>> 0a9       jl     B18  P=0,000001 C=-1,000000
>>>>> 0a9
>>>>> 0af   B5: #    B18 B6 <- B4  Freq: 0,999998
>>>>> 0af       movl    RBX, R9    # spill
>>>>> 0b2       incl    RBX    # int
>>>>> 0b4       cmpl    RBX, #10    # unsigned
>>>>> 0b7       jnbe,u  B18  P=0,000001 C=-1,000000
>>>>>
>>>>> The generated code before my change looked like this (againthe
>>>>> 'length' field is in [rsp + #0]):
>>>>>
>>>>> 0a1   B4: #    B17 B5 <- B3  Freq: 0,999999
>>>>> 0a1       movl    R11, [rsp + #8]    # spill
>>>>> 0a6       incl    R11    # int
>>>>> 0a9       cmpl    R11, #10    # unsigned
>>>>> 0ad       jnbe,u  B17  P=0,000001 C=-1,000000
>>>>>
>>>>> It seems that the 'length' check has been completely eliminated before.
>>>>>
>>>>> So I need to do some more tests to understand why the new check isn't
>>>>> eliminated.
>>>>>
>>>>> Do you think the new check results in a performance regression? Have
>>>>> you run some benchmarks?
>>>>>
>>>>>> I would prefer brackets around the if body but you don't need to send
>>>>>> another webrev:
>>>>>>   if (EliminateAllocations) {
>>>>>>     generate_negative_guard(length, slow_region);
>>>>>>   }
>>>>>
>>>>>
>>>>>
>>>>> Yes, I agree.
>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Tobias
>>>>>>
>>>>>> On 12.08.2016 21:13, Volker Simonis wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> can I please have a review and sponsor for the following fix:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8159611
>>>>>>>
>>>>>>>
>>>>>>> We are inserting several checks for the arguments of
>>>>>>> System.arraycopy() in LibraryCallKit::inline_arraycopy() before
>>>>>>> intensifying the call in LibraryCallKit::inline_arraycopy. However the
>>>>>>> check for the 'length' argument of arracopy is postponed to the macro
>>>>>>> expansion phase in PhaseMacroExpand::generate_arraycopy().
>>>>>>>
>>>>>>> But if we are running with EscapeAnalysis and EliminateAllocations,
>>>>>>> the array allocations inside a call to System.arraycopy() may get
>>>>>>> eliminated and thus the complete call to System.arraycopy() will be
>>>>>>> removed (see PhaseMacroExpand::process_users_of_allocation). In this
>>>>>>> case the extra 'length' check won't be added by
>>>>>>> PhaseMacroExpand::generate_arraycopy() any more because macro
>>>>>>> expansion happens after the elimination of macro nodes.
>>>>>>>
>>>>>>> In such a case it may happen that System.arraycopy() will silently
>>>>>>> accept an invalid (i.e. negative) 'length' parameter, although it
>>>>>>> should actually throw an ArrayOutOfBounds exception.
>>>>>>>
>>>>>>> The fix is simple: also insert a check for the length field in
>>>>>>> LibraryCallKit::inline_arraycopy() if we are running with
>>>>>>> EliminateAllocations.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Volker
>>>>>>>
>>>>
>>


More information about the hotspot-compiler-dev mailing list