RFR(S): 8159611: C2: ArrayCopy elimination skips required parameter checks
Zoltán Majó
zoltan.majo at oracle.com
Fri Oct 7 10:33:13 UTC 2016
Thank you, Tobias and Vladimir, for the review!
Just to be on the safe side: Before pushing the change, I'll run RBT and
I'll also do some performance evaluation. (And, of course, I'll wait for
hs to become open again.)
Best regards,
Zoltan
On 10/07/2016 10:24 AM, Tobias Hartmann wrote:
> Hi Volker,
>
> On 06.10.2016 19:00, Volker Simonis wrote:
>> Hi Vladimir,
>>
>> sorry once again for yet another long delay...
>>
>> In the meantime, Zoltan was so kind to prepare a new webrev with the
>> extension you've proposed.
>> I looked at it and from my point of view it looks good. Please take a
>> look at it as well and if you agree, please feel free to push it:
>>
>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v4/
> This looks good to me!
>
> Thanks,
> Tobias
>
>> Many thanks to Zoltan for his patience, support and help!
>>
>> Regards,
>> Volker
>>
>>
>> On Tue, Sep 13, 2016 at 6:32 PM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>> Yes, I agree with generate_negative_guard() in inline_arraycopy().
>>>
>>> But I think we should path flag to ArrayCopyNode::make() when negative
>>> guards is generated in inline_arraycopy().
>>> It is generated under several conditions so I don't want it to be missed in
>>> expand_arraycopy_node().
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> On 9/12/16 9:35 AM, Volker Simonis wrote:
>>>> Sorry for the long delay...
>>>>
>>>> Here's my new version:
>>>>
>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v3/
>>>>
>>>> I've actually changed PhaseMacroExpand::expand_arraycopy_node() such
>>>> that it calls generate_arraycopy() with 'length_never_negative' set to
>>>> true if EliminateAllocations is true (in this case we already checked
>>>> in LibraryCallKit::inline_arraycopy() that 'length' is not negative).
>>>> This way I could leave generate_arraycopy() untouched.
>>>>
>>>> The generated code now looks as follows:
>>>>
>>>> Original version (without 'length < 0' check):
>>>>
>>>> 0a7 B5: # B17 B6 <- B4 Freq: 0,999998
>>>> 0a7 cmpl R9, R11 # unsigned
>>>> 0aa jb,u B17 P=0,000001 C=-1,000000
>>>> ...
>>>> 0da B7: # B18 B8 <- B6 B12 B13 Freq: 0,999997
>>>> 0da movl R11, [rsp + #8] # spill
>>>> 0df testl R11, R11
>>>> 0e2 jle B18 P=0,000001 C=-1,000000
>>>> ...
>>>> 0e8 B8: # B9 <- B7 Freq: 0,999996
>>>> 0f9 call_leaf_nofp,runtime oop_disjoint_arraycopy
>>>> ...
>>>> 106 B9: # B10 <- B8 B18 B20 Freq: 0,999997
>>>> 113 ret
>>>> ...
>>>> 184 B17: # N1 <- B4 B5 Freq: 2,01328e-06
>>>> 193 call,static wrapper for:
>>>> uncommon_trap(reason='intrinsic_or_type_checked_inlining'
>>>> action='make_not_entrant' debug_id='0')
>>>>
>>>> 19d B18: # B9 B19 <- B7 Freq: 9,99997e-07
>>>> 19d testl R11, R11
>>>> 1a0 jge B9 P=0,999999 C=-1,000000
>>>> 1a0
>>>> 1a6 B19: # B22 B20 <- B18 Freq: 9,99997e-13
>>>> 1a6 movq RSI, R8 # spill
>>>> 1a9 movl RDX, #1 # int
>>>> 1ae movq RCX, R10 # spill
>>>> 1b1 movl R8, #1 # int
>>>> 1b7 movl R9, R11 # spill
>>>> nop # 1 bytes pad for loops and calls
>>>> 1bb call,static wrapper for: slow_arraycopy
>>>>
>>>> In B5 there's a check if 'offset+length' is still in the array range.
>>>> If not we jump to the uncommon trap in B17.
>>>> In B7 there's the first check from
>>>> PhaseMacroExpand::generate_arraycopy() (i.e.
>>>> generate_nonpositive_guard()). If 'length is less than or equal to
>>>> zero we jump to B18 where there's the second check from
>>>> PhaseMacroExpand::generate_arraycopy() (i.e.
>>>> generate_negative_guard()). If 'length' is zero, we jump to B9 and
>>>> return. Otherwise we fall into B19 from where we call slow_arraycopy.
>>>> slow_arraycopy (which is generated in ObjArrayKlass::copy_array() will
>>>> throw an AIOOB exception if 'length' is negative.
>>>>
>>>> The new version now looks as follows:
>>>>
>>>> 0a2 B5: # B19 B6 <- B4 Freq: 0,999998
>>>> 0a2 cmpl R10, RCX # unsigned
>>>> 0a5 jb,u B19 P=0,000001 C=-1,000000
>>>> 0a5
>>>> 0ab B6: # B20 B7 <- B5 Freq: 0,999997
>>>> 0ab movl R10, [rsp + #0] # spill
>>>> 0af testl R10, R10
>>>> 0b2 jl B20 P=0,000001 C=-1,000000
>>>> 0b2
>>>> ...
>>>> 0e2 B8: # B10 B9 <- B7 B13 B14 Freq: 0,999996
>>>> 0e2 testl R10, R10
>>>> 0e5 je,s B10 P=0,000001 C=-1,000000
>>>> ...
>>>> 0e7 B9: # B10 <- B8 Freq: 0,999995
>>>> 0f8 call_leaf_nofp,runtime oop_disjoint_arraycopy
>>>> ...
>>>> 105 B10: # B11 <- B9 B8 Freq: 0,999996
>>>> 112 ret
>>>> ...
>>>> 18e B19: # B20 <- B5 Freq: 9,99998e-07
>>>> 192 B20: # N1 <- B18 B19 B6 Freq: 3,01327e-06
>>>> 1a3 call,static wrapper for:
>>>> uncommon_trap(reason='intrinsic_or_type_checked_inlining'
>>>> action='make_not_entrant' debug_id='0')
>>>>
>>>> B5 is like before, but is now followed by the extra check for 'length'
>>>> being not negative in B6. In B8 we we now have the first check (i.e.
>>>> generate_negative_guard()) from
>>>> PhaseMacroExpand::generate_arraycopy(). It directly checks if 'length'
>>>> is zero and jumps to B10 (i.e. returns) if so. Otherwise we fall
>>>> directly into oop_disjoint_arraycopy(). There's no need to check for
>>>> 'length' being negative and calling 'slow_arraycopy' because this case
>>>> is already handled before now (in B6).
>>>>
>>>> Is this OK now?
>>>>
>>>> Thank you and best regards,
>>>> Volker
>>>>
>>>>
>>>> On Fri, Aug 26, 2016 at 3:51 AM, Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>> Looks good.
>>>>>
>>>>> Check does not fold because it is different: LT vs LE.
>>>>>
>>>>> Actually there are 3 checks together with yours (see
>>>>> PhaseMacroExpand::generate_arraycopy()):
>>>>>
>>>>> Node* not_pos = generate_nonpositive_guard(ctrl, copy_length,
>>>>> length_never_negative);
>>>>> if (not_pos != NULL) {
>>>>> Node* local_ctrl = not_pos, *local_io = *io;
>>>>> MergeMemNode* local_mem = MergeMemNode::make(mem);
>>>>> transform_later(local_mem);
>>>>>
>>>>> // (6) length must not be negative.
>>>>> if (!length_never_negative) {
>>>>> generate_negative_guard(&local_ctrl, copy_length, slow_region);
>>>>> }
>>>>>
>>>>> I think the only way to avoid this is to modify code in
>>>>> generate_arraycopy()
>>>>> when EliminateAllocations is true. In such case you need to generate only
>>>>> length == 0 check.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>
>>>>> On 8/25/16 10:03 AM, Volker Simonis wrote:
>>>>>>
>>>>>> On Tue, Aug 16, 2016 at 11:49 PM, Vladimir Kozlov
>>>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>>>>
>>>>>>> Not generating exception is definitely bug.
>>>>>>>
>>>>>>> First, about test case. It would be nice if it also verifies other
>>>>>>> IndexOutOfBoundsException cases.
>>>>>>>
>>>>>> I've extended the test case. See:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v2/
>>>>>>
>>>>>> With the new test I've caught another problem in C1 (only on x86 and
>>>>>> s390, but that's not in the OpenJDK yet :).
>>>>>>
>>>>>> LIR_Assembler::emit_arraycopy() had a shortcut for length==0 which
>>>>>> prevented the throwing of an ArrayStoreException if src and dst arrays
>>>>>> have incompatible type (see do_test2() in the new regression test).
>>>>>> Note that this is a different error from 8160591 and not fixed by the
>>>>>> change for 8160591.
>>>>>>
>>>>>> I've also moved the new check after the offset + length check as
>>>>>> suggested by you (see new webrev).
>>>>>>
>>>>>> Unfortunately, the new check is still not eliminated. Here's how it
>>>>>> looks:
>>>>>>
>>>>>> 0ae B6: # B20 B7 <- B5 Freq: 0,999997
>>>>>> 0ae movl R9, [rsp + #0] # spill
>>>>>> 0b2 testl R9, R9
>>>>>> 0b5 jl B20 P=0,000001 C=-1,000000
>>>>>> 0b5
>>>>>> 0bb B7: # B12 B8 <- B6 Freq: 0,999996
>>>>>> 0bb movl R11, [R10 + #8 (8-bit)] # compressed klass ptr
>>>>>> 0bf decode_klass_not_null RAX,R11
>>>>>> 0cc movl RBX, [RAX + #16 (8-bit)] # int
>>>>>> 0cf movslq RCX, RBX # i2l
>>>>>> 0d2 movq RSI, precise klass [Ljava/lang/Object;:
>>>>>> 0x00007ff1080320d0:Constant:exact * # ptr
>>>>>> 0dc movq RCX, [RSI + RCX] # class
>>>>>> 0e0 cmpq RAX, RCX # ptr
>>>>>> 0e3 jne,us B12 P=0,170000 C=-1,000000
>>>>>> 0e3
>>>>>> 0e5 B8: # B21 B9 <- B7 B13 B14 Freq: 0,999996
>>>>>> 0e5 testl R9, R9
>>>>>> 0e8 jle B21 P=0,000001 C=-1,000000
>>>>>>
>>>>>> As you can see 'testl R9, R9' is executed two times.
>>>>>>
>>>>>> I've even tried to move the new check after the subtype check, but
>>>>>> that doesn't helps either:
>>>>>>
>>>>>> 0da B7: # B20 B8 <- B6 B13 B14 Freq: 0,999997
>>>>>> 0da movl R11, [rsp + #8] # spill
>>>>>> 0df testl R11, R11
>>>>>> 0e2 jl B20 P=0,000001 C=-1,000000
>>>>>> 0e2
>>>>>> 0e8 B8: # B10 B9 <- B7 Freq: 0,999996
>>>>>> 0e8 testl R11, R11
>>>>>> 0eb jle,s B10 P=0,000001 C=-1,000000
>>>>>>
>>>>>> Any idea how this could be fixed?
>>>>>>
>>>>>> Thanks,
>>>>>> Volker
>>>>>>
>>>>>> PS: and I still don't have a reproducible benchmark which shows a
>>>>>> regression with my change...
>>>>>>
>>>>>>
>>>>>>> Actually additional dynamic check will help in case of negative length
>>>>>>> is
>>>>>>> know during compilation. The allocation code will be eliminated very
>>>>>>> early
>>>>>>> instead of waiting macro expansion:
>>>>>>>
>>>>>>> int length = alloc->in(AllocateNode::ALength)->find_int_con(-1);
>>>>>>> if (length < 0) {
>>>>>>> NOT_PRODUCT(fail_eliminate = "Array's size is not constant";)
>>>>>>> can_eliminate = false;
>>>>>>> }
>>>>>>>
>>>>>>> About additional length check in your new test. I think it may be
>>>>>>> collapsed
>>>>>>> with preceding check since it is generated after other checks.
>>>>>>> So I would suggest to move it after offset + length check.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>>
>>>>>>> On 8/16/16 7:57 AM, Volker Simonis wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 16, 2016 at 7:24 AM, Tobias Hartmann
>>>>>>>> <tobias.hartmann at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Volker,
>>>>>>>>>
>>>>>>>>> thanks for taking care of this issue!
>>>>>>>>>
>>>>>>>>> Did you check what happens if the allocation is not eliminated and
>>>>>>>>> macro
>>>>>>>>> expansion phase emits another negative guard? Are the checks merged?
>>>>>>>>>
>>>>>>>> It depends. I just saw that in some cases the regression test worked
>>>>>>>> before, because the length check was done in
>>>>>>>> SharedRuntime::slow_arraycopy_C(). So in that case there's obviously
>>>>>>>> nothing that can be merged. But the test case is obviously a
>>>>>>>> degenerated example anyway, so I don't think that's a problem.
>>>>>>>>
>>>>>>>> If I do a more real-world example like this where the arracopy can not
>>>>>>>> be eliminated because one of its arguments escapes:
>>>>>>>>
>>>>>>>> public static boolean do_test2(int length, Object[] dest) {
>>>>>>>> try {
>>>>>>>> System.arraycopy(new Object[10], 1, dest, 1, length);
>>>>>>>> return false;
>>>>>>>> } catch (IndexOutOfBoundsException e) {
>>>>>>>> return true;
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> and call it with:
>>>>>>>>
>>>>>>>> do_test2(8, new Object[10])
>>>>>>>>
>>>>>>>> the generated code for do_test2() unfortunately contains one more
>>>>>>>> check now with my change (the 'length' field is in [rsp + #0]):
>>>>>>>>
>>>>>>>> 0a2 B4: # B18 B5 <- B3 Freq: 0,999999
>>>>>>>> 0a2 movl R9, [rsp + #0] # spill
>>>>>>>> 0a6 testl R9, R9
>>>>>>>> 0a9 jl B18 P=0,000001 C=-1,000000
>>>>>>>> 0a9
>>>>>>>> 0af B5: # B18 B6 <- B4 Freq: 0,999998
>>>>>>>> 0af movl RBX, R9 # spill
>>>>>>>> 0b2 incl RBX # int
>>>>>>>> 0b4 cmpl RBX, #10 # unsigned
>>>>>>>> 0b7 jnbe,u B18 P=0,000001 C=-1,000000
>>>>>>>>
>>>>>>>> The generated code before my change looked like this (againthe
>>>>>>>> 'length' field is in [rsp + #0]):
>>>>>>>>
>>>>>>>> 0a1 B4: # B17 B5 <- B3 Freq: 0,999999
>>>>>>>> 0a1 movl R11, [rsp + #8] # spill
>>>>>>>> 0a6 incl R11 # int
>>>>>>>> 0a9 cmpl R11, #10 # unsigned
>>>>>>>> 0ad jnbe,u B17 P=0,000001 C=-1,000000
>>>>>>>>
>>>>>>>> It seems that the 'length' check has been completely eliminated
>>>>>>>> before.
>>>>>>>>
>>>>>>>> So I need to do some more tests to understand why the new check isn't
>>>>>>>> eliminated.
>>>>>>>>
>>>>>>>> Do you think the new check results in a performance regression? Have
>>>>>>>> you run some benchmarks?
>>>>>>>>
>>>>>>>>> I would prefer brackets around the if body but you don't need to send
>>>>>>>>> another webrev:
>>>>>>>>> if (EliminateAllocations) {
>>>>>>>>> generate_negative_guard(length, slow_region);
>>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, I agree.
>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Tobias
>>>>>>>>>
>>>>>>>>> On 12.08.2016 21:13, Volker Simonis wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> can I please have a review and sponsor for the following fix:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8159611
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We are inserting several checks for the arguments of
>>>>>>>>>> System.arraycopy() in LibraryCallKit::inline_arraycopy() before
>>>>>>>>>> intensifying the call in LibraryCallKit::inline_arraycopy. However
>>>>>>>>>> the
>>>>>>>>>> check for the 'length' argument of arracopy is postponed to the
>>>>>>>>>> macro
>>>>>>>>>> expansion phase in PhaseMacroExpand::generate_arraycopy().
>>>>>>>>>>
>>>>>>>>>> But if we are running with EscapeAnalysis and EliminateAllocations,
>>>>>>>>>> the array allocations inside a call to System.arraycopy() may get
>>>>>>>>>> eliminated and thus the complete call to System.arraycopy() will be
>>>>>>>>>> removed (see PhaseMacroExpand::process_users_of_allocation). In this
>>>>>>>>>> case the extra 'length' check won't be added by
>>>>>>>>>> PhaseMacroExpand::generate_arraycopy() any more because macro
>>>>>>>>>> expansion happens after the elimination of macro nodes.
>>>>>>>>>>
>>>>>>>>>> In such a case it may happen that System.arraycopy() will silently
>>>>>>>>>> accept an invalid (i.e. negative) 'length' parameter, although it
>>>>>>>>>> should actually throw an ArrayOutOfBounds exception.
>>>>>>>>>>
>>>>>>>>>> The fix is simple: also insert a check for the length field in
>>>>>>>>>> LibraryCallKit::inline_arraycopy() if we are running with
>>>>>>>>>> EliminateAllocations.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Volker
>>>>>>>>>>
More information about the hotspot-compiler-dev
mailing list