RFR(S): 8159611: C2: ArrayCopy elimination skips required parameter checks

Zoltán Majó zoltan.majo at oracle.com
Wed Sep 14 12:19:15 UTC 2016


Hi Volker,


On 09/13/2016 05:04 PM, Zoltán Majó wrote:
> Hi Volker,
>
>
> On 09/12/2016 06:35 PM, Volker Simonis wrote:
>> Sorry for the long delay...
>
> thank you for spending more time on this bug and also for the detailed 
> description of the way your solution works!
>
>>
>> Here's my new version:
>>
>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v3/
>
> That looks good to me.
>
> I did a preliminary performance evaluation with Octane-Gbemu and 
> Octane-PdfJS, results look good on all platforms.  Let me now do a 
> more detailed evaluation.  I'll get back to you once the results are 
> available.

The performance evaluation with webrev.v3 is complete now. The change 
does not cause any performance regressions (neither for SPECjvm2008 nor 
for Octane). Once you update the code according to Vladimir's 
suggestions, I can look again.

Thank you!

Best regards,


Zoltan

>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>>
>> I've actually changed PhaseMacroExpand::expand_arraycopy_node() such
>> that it calls generate_arraycopy() with 'length_never_negative' set to
>> true if EliminateAllocations is true (in this case we already checked
>> in LibraryCallKit::inline_arraycopy() that 'length' is not negative).
>> This way I could leave generate_arraycopy() untouched.
>>
>> The generated code now looks as follows:
>>
>> Original version (without 'length < 0' check):
>>
>> 0a7   B5: #    B17 B6 <- B4  Freq: 0,999998
>> 0a7       cmpl    R9, R11    # unsigned
>> 0aa       jb,u  B17  P=0,000001 C=-1,000000
>> ...
>> 0da   B7: #    B18 B8 <- B6 B12 B13  Freq: 0,999997
>> 0da       movl    R11, [rsp + #8]    # spill
>> 0df       testl   R11, R11
>> 0e2       jle     B18  P=0,000001 C=-1,000000
>> ...
>> 0e8   B8: #    B9 <- B7  Freq: 0,999996
>> 0f9       call_leaf_nofp,runtime  oop_disjoint_arraycopy
>> ...
>> 106   B9: #    B10 <- B8 B18 B20  Freq: 0,999997
>> 113       ret
>> ...
>> 184   B17: #    N1 <- B4 B5  Freq: 2,01328e-06
>> 193       call,static  wrapper for:
>> uncommon_trap(reason='intrinsic_or_type_checked_inlining'
>> action='make_not_entrant' debug_id='0')
>>
>> 19d   B18: #    B9 B19 <- B7  Freq: 9,99997e-07
>> 19d       testl   R11, R11
>> 1a0       jge     B9  P=0,999999 C=-1,000000
>> 1a0
>> 1a6   B19: #    B22 B20 <- B18  Freq: 9,99997e-13
>> 1a6       movq    RSI, R8    # spill
>> 1a9       movl    RDX, #1    # int
>> 1ae       movq    RCX, R10    # spill
>> 1b1       movl    R8, #1    # int
>> 1b7       movl    R9, R11    # spill
>>            nop     # 1 bytes pad for loops and calls
>> 1bb       call,static  wrapper for: slow_arraycopy
>>
>> In B5 there's a check if 'offset+length' is still in the array range.
>> If not we jump to the uncommon trap in B17.
>> In B7 there's the first check from
>> PhaseMacroExpand::generate_arraycopy() (i.e.
>> generate_nonpositive_guard()). If 'length is less than or equal to
>> zero we jump to B18 where there's the second check from
>> PhaseMacroExpand::generate_arraycopy() (i.e.
>> generate_negative_guard()). If 'length' is  zero, we jump to B9 and
>> return. Otherwise we fall into B19 from where we call slow_arraycopy.
>> slow_arraycopy (which is generated in ObjArrayKlass::copy_array() will
>> throw an AIOOB exception if 'length' is negative.
>>
>> The new version now looks as follows:
>>
>> 0a2   B5: #    B19 B6 <- B4  Freq: 0,999998
>> 0a2       cmpl    R10, RCX    # unsigned
>> 0a5       jb,u  B19  P=0,000001 C=-1,000000
>> 0a5
>> 0ab   B6: #    B20 B7 <- B5  Freq: 0,999997
>> 0ab       movl    R10, [rsp + #0]    # spill
>> 0af       testl   R10, R10
>> 0b2       jl     B20  P=0,000001 C=-1,000000
>> 0b2
>> ...
>> 0e2   B8: #    B10 B9 <- B7 B13 B14  Freq: 0,999996
>> 0e2       testl   R10, R10
>> 0e5       je,s   B10  P=0,000001 C=-1,000000
>> ...
>> 0e7   B9: #    B10 <- B8  Freq: 0,999995
>> 0f8       call_leaf_nofp,runtime  oop_disjoint_arraycopy
>> ...
>> 105   B10: #    B11 <- B9 B8  Freq: 0,999996
>> 112       ret
>> ...
>> 18e   B19: #    B20 <- B5  Freq: 9,99998e-07
>> 192   B20: #    N1 <- B18 B19 B6  Freq: 3,01327e-06
>> 1a3       call,static  wrapper for:
>> uncommon_trap(reason='intrinsic_or_type_checked_inlining'
>> action='make_not_entrant' debug_id='0')
>>
>> B5 is like before, but is now followed by the extra check for 'length'
>> being not negative in B6. In B8 we we now have the first check (i.e.
>> generate_negative_guard()) from
>> PhaseMacroExpand::generate_arraycopy(). It directly checks if 'length'
>> is zero and jumps to B10 (i.e. returns) if so. Otherwise we fall
>> directly into oop_disjoint_arraycopy(). There's no need to check for
>> 'length' being negative and calling 'slow_arraycopy' because this case
>> is already handled before now (in B6).
>>
>> Is this OK now?
>>
>> Thank you and best regards,
>> Volker
>>
>>
>> On Fri, Aug 26, 2016 at 3:51 AM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>> Looks good.
>>>
>>> Check does not fold because it is different: LT vs LE.
>>>
>>> Actually there are 3 checks together with yours (see
>>> PhaseMacroExpand::generate_arraycopy()):
>>>
>>>    Node* not_pos = generate_nonpositive_guard(ctrl, copy_length,
>>> length_never_negative);
>>>    if (not_pos != NULL) {
>>>      Node* local_ctrl = not_pos, *local_io = *io;
>>>      MergeMemNode* local_mem = MergeMemNode::make(mem);
>>>      transform_later(local_mem);
>>>
>>>      // (6) length must not be negative.
>>>      if (!length_never_negative) {
>>>        generate_negative_guard(&local_ctrl, copy_length, slow_region);
>>>      }
>>>
>>> I think the only way to avoid this is to modify code in 
>>> generate_arraycopy()
>>> when EliminateAllocations is true. In such case you need to generate 
>>> only
>>> length == 0 check.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> On 8/25/16 10:03 AM, Volker Simonis wrote:
>>>> On Tue, Aug 16, 2016 at 11:49 PM, Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>> Not generating exception is definitely bug.
>>>>>
>>>>> First, about test case. It would be nice if it also verifies other
>>>>> IndexOutOfBoundsException cases.
>>>>>
>>>> I've extended the test case. See:
>>>>
>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v2/
>>>>
>>>> With the new test I've caught another problem in C1 (only on x86 and
>>>> s390, but that's not in the OpenJDK yet :).
>>>>
>>>> LIR_Assembler::emit_arraycopy() had a shortcut for length==0 which
>>>> prevented the throwing of an ArrayStoreException if src and dst arrays
>>>> have incompatible type (see do_test2() in the new regression test).
>>>> Note that this is a different error from 8160591 and not fixed by the
>>>> change for 8160591.
>>>>
>>>> I've also moved the new check after the offset + length check as
>>>> suggested by you (see new webrev).
>>>>
>>>> Unfortunately, the new check is still not eliminated. Here's how it 
>>>> looks:
>>>>
>>>> 0ae   B6: #    B20 B7 <- B5  Freq: 0,999997
>>>> 0ae       movl    R9, [rsp + #0]    # spill
>>>> 0b2       testl   R9, R9
>>>> 0b5       jl     B20  P=0,000001 C=-1,000000
>>>> 0b5
>>>> 0bb   B7: #    B12 B8 <- B6  Freq: 0,999996
>>>> 0bb       movl    R11, [R10 + #8 (8-bit)]    # compressed klass ptr
>>>> 0bf       decode_klass_not_null RAX,R11
>>>> 0cc       movl    RBX, [RAX + #16 (8-bit)]    # int
>>>> 0cf       movslq  RCX, RBX    # i2l
>>>> 0d2       movq    RSI, precise klass [Ljava/lang/Object;:
>>>> 0x00007ff1080320d0:Constant:exact *    # ptr
>>>> 0dc       movq    RCX, [RSI + RCX]    # class
>>>> 0e0       cmpq    RAX, RCX    # ptr
>>>> 0e3       jne,us  B12  P=0,170000 C=-1,000000
>>>> 0e3
>>>> 0e5   B8: #    B21 B9 <- B7 B13 B14  Freq: 0,999996
>>>> 0e5       testl   R9, R9
>>>> 0e8       jle     B21  P=0,000001 C=-1,000000
>>>>
>>>> As you can see 'testl   R9, R9' is executed two times.
>>>>
>>>> I've even tried to move the new check after the subtype check, but
>>>> that doesn't helps either:
>>>>
>>>> 0da   B7: #    B20 B8 <- B6 B13 B14  Freq: 0,999997
>>>> 0da       movl    R11, [rsp + #8]    # spill
>>>> 0df       testl   R11, R11
>>>> 0e2       jl     B20  P=0,000001 C=-1,000000
>>>> 0e2
>>>> 0e8   B8: #    B10 B9 <- B7  Freq: 0,999996
>>>> 0e8       testl   R11, R11
>>>> 0eb       jle,s   B10  P=0,000001 C=-1,000000
>>>>
>>>> Any idea how this could be fixed?
>>>>
>>>> Thanks,
>>>> Volker
>>>>
>>>> PS: and I still don't have a reproducible benchmark which shows a
>>>> regression with my change...
>>>>
>>>>
>>>>> Actually additional dynamic check will help in case of negative 
>>>>> length is
>>>>> know during compilation. The allocation code will be eliminated very
>>>>> early
>>>>> instead of waiting macro expansion:
>>>>>
>>>>>        int length = 
>>>>> alloc->in(AllocateNode::ALength)->find_int_con(-1);
>>>>>        if (length < 0) {
>>>>>          NOT_PRODUCT(fail_eliminate = "Array's size is not 
>>>>> constant";)
>>>>>          can_eliminate = false;
>>>>>        }
>>>>>
>>>>> About additional length check in your new test. I think it may be
>>>>> collapsed
>>>>> with preceding check since it is generated after other checks.
>>>>> So I would suggest to move it after offset + length check.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>
>>>>> On 8/16/16 7:57 AM, Volker Simonis wrote:
>>>>>>
>>>>>> On Tue, Aug 16, 2016 at 7:24 AM, Tobias Hartmann
>>>>>> <tobias.hartmann at oracle.com> wrote:
>>>>>>>
>>>>>>> Hi Volker,
>>>>>>>
>>>>>>> thanks for taking care of this issue!
>>>>>>>
>>>>>>> Did you check what happens if the allocation is not eliminated and
>>>>>>> macro
>>>>>>> expansion phase emits another negative guard? Are the checks 
>>>>>>> merged?
>>>>>>>
>>>>>> It depends. I just saw that in some cases the regression test worked
>>>>>> before, because the length check was done in
>>>>>> SharedRuntime::slow_arraycopy_C(). So in that case there's obviously
>>>>>> nothing that can be merged. But the test case is obviously a
>>>>>> degenerated example anyway, so I don't think that's a problem.
>>>>>>
>>>>>> If I do a more real-world example like this where the arracopy 
>>>>>> can not
>>>>>> be eliminated because one of its arguments escapes:
>>>>>>
>>>>>>    public static boolean do_test2(int length, Object[] dest) {
>>>>>>      try {
>>>>>>        System.arraycopy(new Object[10], 1, dest, 1, length);
>>>>>>        return false;
>>>>>>      } catch (IndexOutOfBoundsException e) {
>>>>>>        return true;
>>>>>>      }
>>>>>>    }
>>>>>>
>>>>>> and call it with:
>>>>>>
>>>>>> do_test2(8, new Object[10])
>>>>>>
>>>>>> the generated code for do_test2() unfortunately contains one more
>>>>>> check now with my change (the 'length' field is in [rsp + #0]):
>>>>>>
>>>>>> 0a2   B4: #    B18 B5 <- B3  Freq: 0,999999
>>>>>> 0a2       movl    R9, [rsp + #0]    # spill
>>>>>> 0a6       testl   R9, R9
>>>>>> 0a9       jl     B18  P=0,000001 C=-1,000000
>>>>>> 0a9
>>>>>> 0af   B5: #    B18 B6 <- B4  Freq: 0,999998
>>>>>> 0af       movl    RBX, R9    # spill
>>>>>> 0b2       incl    RBX    # int
>>>>>> 0b4       cmpl    RBX, #10    # unsigned
>>>>>> 0b7       jnbe,u  B18  P=0,000001 C=-1,000000
>>>>>>
>>>>>> The generated code before my change looked like this (againthe
>>>>>> 'length' field is in [rsp + #0]):
>>>>>>
>>>>>> 0a1   B4: #    B17 B5 <- B3  Freq: 0,999999
>>>>>> 0a1       movl    R11, [rsp + #8]    # spill
>>>>>> 0a6       incl    R11    # int
>>>>>> 0a9       cmpl    R11, #10    # unsigned
>>>>>> 0ad       jnbe,u  B17  P=0,000001 C=-1,000000
>>>>>>
>>>>>> It seems that the 'length' check has been completely eliminated 
>>>>>> before.
>>>>>>
>>>>>> So I need to do some more tests to understand why the new check 
>>>>>> isn't
>>>>>> eliminated.
>>>>>>
>>>>>> Do you think the new check results in a performance regression? Have
>>>>>> you run some benchmarks?
>>>>>>
>>>>>>> I would prefer brackets around the if body but you don't need to 
>>>>>>> send
>>>>>>> another webrev:
>>>>>>>   if (EliminateAllocations) {
>>>>>>>     generate_negative_guard(length, slow_region);
>>>>>>>   }
>>>>>>
>>>>>>
>>>>>> Yes, I agree.
>>>>>>
>>>>>>> Best regards,
>>>>>>> Tobias
>>>>>>>
>>>>>>> On 12.08.2016 21:13, Volker Simonis wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> can I please have a review and sponsor for the following fix:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8159611
>>>>>>>>
>>>>>>>>
>>>>>>>> We are inserting several checks for the arguments of
>>>>>>>> System.arraycopy() in LibraryCallKit::inline_arraycopy() before
>>>>>>>> intensifying the call in LibraryCallKit::inline_arraycopy. 
>>>>>>>> However the
>>>>>>>> check for the 'length' argument of arracopy is postponed to the 
>>>>>>>> macro
>>>>>>>> expansion phase in PhaseMacroExpand::generate_arraycopy().
>>>>>>>>
>>>>>>>> But if we are running with EscapeAnalysis and 
>>>>>>>> EliminateAllocations,
>>>>>>>> the array allocations inside a call to System.arraycopy() may get
>>>>>>>> eliminated and thus the complete call to System.arraycopy() 
>>>>>>>> will be
>>>>>>>> removed (see PhaseMacroExpand::process_users_of_allocation). In 
>>>>>>>> this
>>>>>>>> case the extra 'length' check won't be added by
>>>>>>>> PhaseMacroExpand::generate_arraycopy() any more because macro
>>>>>>>> expansion happens after the elimination of macro nodes.
>>>>>>>>
>>>>>>>> In such a case it may happen that System.arraycopy() will silently
>>>>>>>> accept an invalid (i.e. negative) 'length' parameter, although it
>>>>>>>> should actually throw an ArrayOutOfBounds exception.
>>>>>>>>
>>>>>>>> The fix is simple: also insert a check for the length field in
>>>>>>>> LibraryCallKit::inline_arraycopy() if we are running with
>>>>>>>> EliminateAllocations.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Volker
>>>>>>>>
>



More information about the hotspot-compiler-dev mailing list