RFR(S): 8159611: C2: ArrayCopy elimination skips required parameter checks
Zoltán Majó
zoltan.majo at oracle.com
Wed Sep 14 12:19:15 UTC 2016
Hi Volker,
On 09/13/2016 05:04 PM, Zoltán Majó wrote:
> Hi Volker,
>
>
> On 09/12/2016 06:35 PM, Volker Simonis wrote:
>> Sorry for the long delay...
>
> thank you for spending more time on this bug and also for the detailed
> description of the way your solution works!
>
>>
>> Here's my new version:
>>
>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v3/
>
> That looks good to me.
>
> I did a preliminary performance evaluation with Octane-Gbemu and
> Octane-PdfJS, results look good on all platforms. Let me now do a
> more detailed evaluation. I'll get back to you once the results are
> available.
The performance evaluation with webrev.v3 is complete now. The change
does not cause any performance regressions (neither for SPECjvm2008 nor
for Octane). Once you update the code according to Vladimir's
suggestions, I can look again.
Thank you!
Best regards,
Zoltan
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>>
>> I've actually changed PhaseMacroExpand::expand_arraycopy_node() such
>> that it calls generate_arraycopy() with 'length_never_negative' set to
>> true if EliminateAllocations is true (in this case we already checked
>> in LibraryCallKit::inline_arraycopy() that 'length' is not negative).
>> This way I could leave generate_arraycopy() untouched.
>>
>> The generated code now looks as follows:
>>
>> Original version (without 'length < 0' check):
>>
>> 0a7 B5: # B17 B6 <- B4 Freq: 0,999998
>> 0a7 cmpl R9, R11 # unsigned
>> 0aa jb,u B17 P=0,000001 C=-1,000000
>> ...
>> 0da B7: # B18 B8 <- B6 B12 B13 Freq: 0,999997
>> 0da movl R11, [rsp + #8] # spill
>> 0df testl R11, R11
>> 0e2 jle B18 P=0,000001 C=-1,000000
>> ...
>> 0e8 B8: # B9 <- B7 Freq: 0,999996
>> 0f9 call_leaf_nofp,runtime oop_disjoint_arraycopy
>> ...
>> 106 B9: # B10 <- B8 B18 B20 Freq: 0,999997
>> 113 ret
>> ...
>> 184 B17: # N1 <- B4 B5 Freq: 2,01328e-06
>> 193 call,static wrapper for:
>> uncommon_trap(reason='intrinsic_or_type_checked_inlining'
>> action='make_not_entrant' debug_id='0')
>>
>> 19d B18: # B9 B19 <- B7 Freq: 9,99997e-07
>> 19d testl R11, R11
>> 1a0 jge B9 P=0,999999 C=-1,000000
>> 1a0
>> 1a6 B19: # B22 B20 <- B18 Freq: 9,99997e-13
>> 1a6 movq RSI, R8 # spill
>> 1a9 movl RDX, #1 # int
>> 1ae movq RCX, R10 # spill
>> 1b1 movl R8, #1 # int
>> 1b7 movl R9, R11 # spill
>> nop # 1 bytes pad for loops and calls
>> 1bb call,static wrapper for: slow_arraycopy
>>
>> In B5 there's a check if 'offset+length' is still in the array range.
>> If not we jump to the uncommon trap in B17.
>> In B7 there's the first check from
>> PhaseMacroExpand::generate_arraycopy() (i.e.
>> generate_nonpositive_guard()). If 'length is less than or equal to
>> zero we jump to B18 where there's the second check from
>> PhaseMacroExpand::generate_arraycopy() (i.e.
>> generate_negative_guard()). If 'length' is zero, we jump to B9 and
>> return. Otherwise we fall into B19 from where we call slow_arraycopy.
>> slow_arraycopy (which is generated in ObjArrayKlass::copy_array() will
>> throw an AIOOB exception if 'length' is negative.
>>
>> The new version now looks as follows:
>>
>> 0a2 B5: # B19 B6 <- B4 Freq: 0,999998
>> 0a2 cmpl R10, RCX # unsigned
>> 0a5 jb,u B19 P=0,000001 C=-1,000000
>> 0a5
>> 0ab B6: # B20 B7 <- B5 Freq: 0,999997
>> 0ab movl R10, [rsp + #0] # spill
>> 0af testl R10, R10
>> 0b2 jl B20 P=0,000001 C=-1,000000
>> 0b2
>> ...
>> 0e2 B8: # B10 B9 <- B7 B13 B14 Freq: 0,999996
>> 0e2 testl R10, R10
>> 0e5 je,s B10 P=0,000001 C=-1,000000
>> ...
>> 0e7 B9: # B10 <- B8 Freq: 0,999995
>> 0f8 call_leaf_nofp,runtime oop_disjoint_arraycopy
>> ...
>> 105 B10: # B11 <- B9 B8 Freq: 0,999996
>> 112 ret
>> ...
>> 18e B19: # B20 <- B5 Freq: 9,99998e-07
>> 192 B20: # N1 <- B18 B19 B6 Freq: 3,01327e-06
>> 1a3 call,static wrapper for:
>> uncommon_trap(reason='intrinsic_or_type_checked_inlining'
>> action='make_not_entrant' debug_id='0')
>>
>> B5 is like before, but is now followed by the extra check for 'length'
>> being not negative in B6. In B8 we we now have the first check (i.e.
>> generate_negative_guard()) from
>> PhaseMacroExpand::generate_arraycopy(). It directly checks if 'length'
>> is zero and jumps to B10 (i.e. returns) if so. Otherwise we fall
>> directly into oop_disjoint_arraycopy(). There's no need to check for
>> 'length' being negative and calling 'slow_arraycopy' because this case
>> is already handled before now (in B6).
>>
>> Is this OK now?
>>
>> Thank you and best regards,
>> Volker
>>
>>
>> On Fri, Aug 26, 2016 at 3:51 AM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>> Looks good.
>>>
>>> Check does not fold because it is different: LT vs LE.
>>>
>>> Actually there are 3 checks together with yours (see
>>> PhaseMacroExpand::generate_arraycopy()):
>>>
>>> Node* not_pos = generate_nonpositive_guard(ctrl, copy_length,
>>> length_never_negative);
>>> if (not_pos != NULL) {
>>> Node* local_ctrl = not_pos, *local_io = *io;
>>> MergeMemNode* local_mem = MergeMemNode::make(mem);
>>> transform_later(local_mem);
>>>
>>> // (6) length must not be negative.
>>> if (!length_never_negative) {
>>> generate_negative_guard(&local_ctrl, copy_length, slow_region);
>>> }
>>>
>>> I think the only way to avoid this is to modify code in
>>> generate_arraycopy()
>>> when EliminateAllocations is true. In such case you need to generate
>>> only
>>> length == 0 check.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> On 8/25/16 10:03 AM, Volker Simonis wrote:
>>>> On Tue, Aug 16, 2016 at 11:49 PM, Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>> Not generating exception is definitely bug.
>>>>>
>>>>> First, about test case. It would be nice if it also verifies other
>>>>> IndexOutOfBoundsException cases.
>>>>>
>>>> I've extended the test case. See:
>>>>
>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v2/
>>>>
>>>> With the new test I've caught another problem in C1 (only on x86 and
>>>> s390, but that's not in the OpenJDK yet :).
>>>>
>>>> LIR_Assembler::emit_arraycopy() had a shortcut for length==0 which
>>>> prevented the throwing of an ArrayStoreException if src and dst arrays
>>>> have incompatible type (see do_test2() in the new regression test).
>>>> Note that this is a different error from 8160591 and not fixed by the
>>>> change for 8160591.
>>>>
>>>> I've also moved the new check after the offset + length check as
>>>> suggested by you (see new webrev).
>>>>
>>>> Unfortunately, the new check is still not eliminated. Here's how it
>>>> looks:
>>>>
>>>> 0ae B6: # B20 B7 <- B5 Freq: 0,999997
>>>> 0ae movl R9, [rsp + #0] # spill
>>>> 0b2 testl R9, R9
>>>> 0b5 jl B20 P=0,000001 C=-1,000000
>>>> 0b5
>>>> 0bb B7: # B12 B8 <- B6 Freq: 0,999996
>>>> 0bb movl R11, [R10 + #8 (8-bit)] # compressed klass ptr
>>>> 0bf decode_klass_not_null RAX,R11
>>>> 0cc movl RBX, [RAX + #16 (8-bit)] # int
>>>> 0cf movslq RCX, RBX # i2l
>>>> 0d2 movq RSI, precise klass [Ljava/lang/Object;:
>>>> 0x00007ff1080320d0:Constant:exact * # ptr
>>>> 0dc movq RCX, [RSI + RCX] # class
>>>> 0e0 cmpq RAX, RCX # ptr
>>>> 0e3 jne,us B12 P=0,170000 C=-1,000000
>>>> 0e3
>>>> 0e5 B8: # B21 B9 <- B7 B13 B14 Freq: 0,999996
>>>> 0e5 testl R9, R9
>>>> 0e8 jle B21 P=0,000001 C=-1,000000
>>>>
>>>> As you can see 'testl R9, R9' is executed two times.
>>>>
>>>> I've even tried to move the new check after the subtype check, but
>>>> that doesn't helps either:
>>>>
>>>> 0da B7: # B20 B8 <- B6 B13 B14 Freq: 0,999997
>>>> 0da movl R11, [rsp + #8] # spill
>>>> 0df testl R11, R11
>>>> 0e2 jl B20 P=0,000001 C=-1,000000
>>>> 0e2
>>>> 0e8 B8: # B10 B9 <- B7 Freq: 0,999996
>>>> 0e8 testl R11, R11
>>>> 0eb jle,s B10 P=0,000001 C=-1,000000
>>>>
>>>> Any idea how this could be fixed?
>>>>
>>>> Thanks,
>>>> Volker
>>>>
>>>> PS: and I still don't have a reproducible benchmark which shows a
>>>> regression with my change...
>>>>
>>>>
>>>>> Actually additional dynamic check will help in case of negative
>>>>> length is
>>>>> know during compilation. The allocation code will be eliminated very
>>>>> early
>>>>> instead of waiting macro expansion:
>>>>>
>>>>> int length =
>>>>> alloc->in(AllocateNode::ALength)->find_int_con(-1);
>>>>> if (length < 0) {
>>>>> NOT_PRODUCT(fail_eliminate = "Array's size is not
>>>>> constant";)
>>>>> can_eliminate = false;
>>>>> }
>>>>>
>>>>> About additional length check in your new test. I think it may be
>>>>> collapsed
>>>>> with preceding check since it is generated after other checks.
>>>>> So I would suggest to move it after offset + length check.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>
>>>>> On 8/16/16 7:57 AM, Volker Simonis wrote:
>>>>>>
>>>>>> On Tue, Aug 16, 2016 at 7:24 AM, Tobias Hartmann
>>>>>> <tobias.hartmann at oracle.com> wrote:
>>>>>>>
>>>>>>> Hi Volker,
>>>>>>>
>>>>>>> thanks for taking care of this issue!
>>>>>>>
>>>>>>> Did you check what happens if the allocation is not eliminated and
>>>>>>> macro
>>>>>>> expansion phase emits another negative guard? Are the checks
>>>>>>> merged?
>>>>>>>
>>>>>> It depends. I just saw that in some cases the regression test worked
>>>>>> before, because the length check was done in
>>>>>> SharedRuntime::slow_arraycopy_C(). So in that case there's obviously
>>>>>> nothing that can be merged. But the test case is obviously a
>>>>>> degenerated example anyway, so I don't think that's a problem.
>>>>>>
>>>>>> If I do a more real-world example like this where the arracopy
>>>>>> can not
>>>>>> be eliminated because one of its arguments escapes:
>>>>>>
>>>>>> public static boolean do_test2(int length, Object[] dest) {
>>>>>> try {
>>>>>> System.arraycopy(new Object[10], 1, dest, 1, length);
>>>>>> return false;
>>>>>> } catch (IndexOutOfBoundsException e) {
>>>>>> return true;
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> and call it with:
>>>>>>
>>>>>> do_test2(8, new Object[10])
>>>>>>
>>>>>> the generated code for do_test2() unfortunately contains one more
>>>>>> check now with my change (the 'length' field is in [rsp + #0]):
>>>>>>
>>>>>> 0a2 B4: # B18 B5 <- B3 Freq: 0,999999
>>>>>> 0a2 movl R9, [rsp + #0] # spill
>>>>>> 0a6 testl R9, R9
>>>>>> 0a9 jl B18 P=0,000001 C=-1,000000
>>>>>> 0a9
>>>>>> 0af B5: # B18 B6 <- B4 Freq: 0,999998
>>>>>> 0af movl RBX, R9 # spill
>>>>>> 0b2 incl RBX # int
>>>>>> 0b4 cmpl RBX, #10 # unsigned
>>>>>> 0b7 jnbe,u B18 P=0,000001 C=-1,000000
>>>>>>
>>>>>> The generated code before my change looked like this (againthe
>>>>>> 'length' field is in [rsp + #0]):
>>>>>>
>>>>>> 0a1 B4: # B17 B5 <- B3 Freq: 0,999999
>>>>>> 0a1 movl R11, [rsp + #8] # spill
>>>>>> 0a6 incl R11 # int
>>>>>> 0a9 cmpl R11, #10 # unsigned
>>>>>> 0ad jnbe,u B17 P=0,000001 C=-1,000000
>>>>>>
>>>>>> It seems that the 'length' check has been completely eliminated
>>>>>> before.
>>>>>>
>>>>>> So I need to do some more tests to understand why the new check
>>>>>> isn't
>>>>>> eliminated.
>>>>>>
>>>>>> Do you think the new check results in a performance regression? Have
>>>>>> you run some benchmarks?
>>>>>>
>>>>>>> I would prefer brackets around the if body but you don't need to
>>>>>>> send
>>>>>>> another webrev:
>>>>>>> if (EliminateAllocations) {
>>>>>>> generate_negative_guard(length, slow_region);
>>>>>>> }
>>>>>>
>>>>>>
>>>>>> Yes, I agree.
>>>>>>
>>>>>>> Best regards,
>>>>>>> Tobias
>>>>>>>
>>>>>>> On 12.08.2016 21:13, Volker Simonis wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> can I please have a review and sponsor for the following fix:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8159611
>>>>>>>>
>>>>>>>>
>>>>>>>> We are inserting several checks for the arguments of
>>>>>>>> System.arraycopy() in LibraryCallKit::inline_arraycopy() before
>>>>>>>> intensifying the call in LibraryCallKit::inline_arraycopy.
>>>>>>>> However the
>>>>>>>> check for the 'length' argument of arracopy is postponed to the
>>>>>>>> macro
>>>>>>>> expansion phase in PhaseMacroExpand::generate_arraycopy().
>>>>>>>>
>>>>>>>> But if we are running with EscapeAnalysis and
>>>>>>>> EliminateAllocations,
>>>>>>>> the array allocations inside a call to System.arraycopy() may get
>>>>>>>> eliminated and thus the complete call to System.arraycopy()
>>>>>>>> will be
>>>>>>>> removed (see PhaseMacroExpand::process_users_of_allocation). In
>>>>>>>> this
>>>>>>>> case the extra 'length' check won't be added by
>>>>>>>> PhaseMacroExpand::generate_arraycopy() any more because macro
>>>>>>>> expansion happens after the elimination of macro nodes.
>>>>>>>>
>>>>>>>> In such a case it may happen that System.arraycopy() will silently
>>>>>>>> accept an invalid (i.e. negative) 'length' parameter, although it
>>>>>>>> should actually throw an ArrayOutOfBounds exception.
>>>>>>>>
>>>>>>>> The fix is simple: also insert a check for the length field in
>>>>>>>> LibraryCallKit::inline_arraycopy() if we are running with
>>>>>>>> EliminateAllocations.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Volker
>>>>>>>>
>
More information about the hotspot-compiler-dev
mailing list