[16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero
Christian Hagedorn
christian.hagedorn at oracle.com
Wed Jul 15 17:44:27 UTC 2020
Thank you Vladimir for your review!
Best regards,
Christian
On 15.07.20 19:43, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir
>
> On 7/15/20 6:08 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> On 14.07.20 21:07, Vladimir Kozlov wrote:
>>> > Thinking about this type propagation problem, couldn't we somehow
>>> set the type of the Opaque1 node hiding the pre-loop
>>> > limit to the same type as the pre-loop limit to allow this
>>> information to flow to the pre and main loop? Or would that
>>> > cause other problems? I guess there probably must be a reason why
>>> we don't do it like that.
>>>
>>> It has wide type to prevent premature optimizations before loop is
>>> fully transformed. That is the reason we add it in first place.
>>>
>>> But it would be interesting to see if we can use more narrow type:
>>> TypeInt::POS1 for example for positive limits (>0) (and opposite for
>>> negative limits < 0). I may be missing some nuances and it may not
>>> work but we should try.
>>
>> I had an additional discussion about this with Roland. He made a good
>> point that not the Opaque1 nodes themselves are the problem but rather
>> the type of the iv phi, or more specifically the PhiNode::Value()
>> function.
>>
>> Before creating pre/main/post loops, the iv phi has already a narrow
>> type 1..300 set by PhiNode::Value(). However, when creating the pre
>> (and post loop), we actually widen the type of the iv phi of the
>> pre-loop to int>=1 (based on the pre-loop limit which is an Opaque1
>> node with type int). Roland suggested that we should not do that but
>> instead filter the returned type with the already existing type to not
>> widen it. I think that makes sense. We are already doing that for the
>> other cases in PhiNode::Value() [1][2]. It looks like we just miss it
>> for the special handling of iv phis of trip-counted loops. This also
>> fixes the assertion failure that occurred before with webrev.00.
>>
>> I created a new webrev based on webrev.00 with this change in
>> PhiNode::Value():
>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/
>>
>> I'm currently running some testing with it again.
>>
>> Best regards,
>> Christian
>>
>>
>> [1]
>> http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1097
>>
>> [2]
>> http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1157
>>
>>
>>> Regards,
>>> Vladimir
>>>
>>> On 7/14/20 5:39 AM, Christian Hagedorn wrote:
>>>>
>>>>> [..] Since the DivINode has a control outside of the main loop [..]
>>>>
>>>> Edit: I actually meant that get_ctrl() returns a node outside of the
>>>> main-loop (i.e. the DivINode is not part of the main-loop body). The
>>>> DivINode still has NULL as control input.
>>>>
>>>> Best regards,
>>>> Christian
>>>>
>>>> On 14.07.20 14:32, Christian Hagedorn wrote:
>>>>> Hi Vladimir
>>>>>
>>>>> I had a closer look at the failing testcase with webrev.00. The
>>>>> original DivNode has its zero check removed based on correct type
>>>>> information. Afterwards its split through an induction variable phi
>>>>> for which both inputs have non-zero types. So, the DivNode end up
>>>>> after an AddINode (which adds a positive constant) which has a
>>>>> non-zero type. All good so far.
>>>>>
>>>>> Now we add pre/main/post loops and the induction variable phi for
>>>>> the pre-loop gets type int>=1 since the limit for the pre-loop is
>>>>> hidden behind an Opaque1 node which just returns int as type. The
>>>>> AddINode belonging to the loop induction variable phi in the
>>>>> pre-loop is therefore updated to have the type int as well (int>=1
>>>>> + positive_int could overflow). This type information propagates to
>>>>> the main-loop and its AddINode belonging to the loop induction
>>>>> variable phi (which is an input to the DivNode) also gets its type
>>>>> set to int.
>>>>>
>>>>> Later, we add a vector post loop where we clone the main loop and
>>>>> add a phi p for the the AddINode node and its new clone. Since the
>>>>> DivINode has a control outside of the main loop, it is not cloned
>>>>> and gets the phi p as an input. At a later point in time, we want
>>>>> to split through p. But then we detect zero as possible value due
>>>>> to the type range of both AddINodes being int.
>>>>>
>>>>> Even though the type information is not accurate enough, the
>>>>> DivINode is never zero and we could safely apply the split through
>>>>> the phi. We could think about doing a bail out for all kinds of
>>>>> phis but I think it should only be an actual problem for loop
>>>>> induction variable phis.
>>>>>
>>>>> Thinking about this type propagation problem, couldn't we somehow
>>>>> set the type of the Opaque1 node hiding the pre-loop limit to the
>>>>> same type as the pre-loop limit to allow this information to flow
>>>>> to the pre and main loop? Or would that cause other problems? I
>>>>> guess there probably must be a reason why we don't do it like that.
>>>>>
>>>>> Best regards,
>>>>> Christian
>>>>>
>>>>> On 13.07.20 19:16, Vladimir Kozlov wrote:
>>>>>> This rise question: why zero check was removed if one of merged
>>>>>> types has 0?
>>>>>> Should we be more careful when we remove zero check?
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 7/13/20 2:06 AM, Christian Hagedorn wrote:
>>>>>>> A test in some later tier testing revealed that the assertion
>>>>>>> code is actually too strong. There can be a Div/Mod node whose
>>>>>>> zero check was removed but that is then spilt through a
>>>>>>> non-induction-variable phi whose inputs have zero in their type
>>>>>>> range (which is fine, this happens in some loop opts after
>>>>>>> partial peeling was applied earlier). This happened, for example,
>>>>>>> for a phi which merged two nodes from the original and a cloned
>>>>>>> loop. I think we just need to remove the additional assertion code.
>>>>>>>
>>>>>>> New webrev:
>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Christian
>>>>>>>
>>>>>>> On 13.07.20 09:19, Christian Hagedorn wrote:
>>>>>>>> Thank you Vladimir for your review!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Christian
>>>>>>>>
>>>>>>>> On 11.07.20 01:25, Vladimir Kozlov wrote:
>>>>>>>>> Looks good.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote:
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> Please review the following patch:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552
>>>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/
>>>>>>>>>>
>>>>>>>>>> In the failing testcase, C2 removes a zero check for a
>>>>>>>>>> division/modulo node n based on the type information of the
>>>>>>>>>> loop induction variable phi p (always between 1 and 50 and
>>>>>>>>>> never 0). However, n is later split through p and ends up
>>>>>>>>>> after the AddNode which updates the induction variable p. In
>>>>>>>>>> the last iteration j equals 2 and is then updated to 0. The
>>>>>>>>>> division/modulo node n is now executed before the loop limit
>>>>>>>>>> check which results in a SIGFPE.
>>>>>>>>>>
>>>>>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a
>>>>>>>>>> division or modulo node has its zero check removed (i.e.
>>>>>>>>>> control in NULL) and is split through a phi which has an input
>>>>>>>>>> that could be zero. This should only happen for an induction
>>>>>>>>>> variable phi of a trip-counted (integer) loop.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Christian
More information about the hotspot-compiler-dev
mailing list