[16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Jul 14 19:07:31 UTC 2020
> Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the pre-loop
> limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or would that
> cause other problems? I guess there probably must be a reason why we don't do it like that.
It has wide type to prevent premature optimizations before loop is fully transformed. That is the reason we add it in
first place.
But it would be interesting to see if we can use more narrow type: TypeInt::POS1 for example for positive limits (>0)
(and opposite for negative limits < 0). I may be missing some nuances and it may not work but we should try.
Regards,
Vladimir
On 7/14/20 5:39 AM, Christian Hagedorn wrote:
>
>> [..] Since the DivINode has a control outside of the main loop [..]
>
> Edit: I actually meant that get_ctrl() returns a node outside of the main-loop (i.e. the DivINode is not part of the
> main-loop body). The DivINode still has NULL as control input.
>
> Best regards,
> Christian
>
> On 14.07.20 14:32, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> I had a closer look at the failing testcase with webrev.00. The original DivNode has its zero check removed based on
>> correct type information. Afterwards its split through an induction variable phi for which both inputs have non-zero
>> types. So, the DivNode end up after an AddINode (which adds a positive constant) which has a non-zero type. All good
>> so far.
>>
>> Now we add pre/main/post loops and the induction variable phi for the pre-loop gets type int>=1 since the limit for
>> the pre-loop is hidden behind an Opaque1 node which just returns int as type. The AddINode belonging to the loop
>> induction variable phi in the pre-loop is therefore updated to have the type int as well (int>=1 + positive_int could
>> overflow). This type information propagates to the main-loop and its AddINode belonging to the loop induction variable
>> phi (which is an input to the DivNode) also gets its type set to int.
>>
>> Later, we add a vector post loop where we clone the main loop and add a phi p for the the AddINode node and its new
>> clone. Since the DivINode has a control outside of the main loop, it is not cloned and gets the phi p as an input. At
>> a later point in time, we want to split through p. But then we detect zero as possible value due to the type range of
>> both AddINodes being int.
>>
>> Even though the type information is not accurate enough, the DivINode is never zero and we could safely apply the
>> split through the phi. We could think about doing a bail out for all kinds of phis but I think it should only be an
>> actual problem for loop induction variable phis.
>>
>> Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the pre-loop
>> limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or would that
>> cause other problems? I guess there probably must be a reason why we don't do it like that.
>>
>> Best regards,
>> Christian
>>
>> On 13.07.20 19:16, Vladimir Kozlov wrote:
>>> This rise question: why zero check was removed if one of merged types has 0?
>>> Should we be more careful when we remove zero check?
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/13/20 2:06 AM, Christian Hagedorn wrote:
>>>> A test in some later tier testing revealed that the assertion code is actually too strong. There can be a Div/Mod
>>>> node whose zero check was removed but that is then spilt through a non-induction-variable phi whose inputs have zero
>>>> in their type range (which is fine, this happens in some loop opts after partial peeling was applied earlier). This
>>>> happened, for example, for a phi which merged two nodes from the original and a cloned loop. I think we just need to
>>>> remove the additional assertion code.
>>>>
>>>> New webrev:
>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/
>>>>
>>>> Best regards,
>>>> Christian
>>>>
>>>> On 13.07.20 09:19, Christian Hagedorn wrote:
>>>>> Thank you Vladimir for your review!
>>>>>
>>>>> Best regards,
>>>>> Christian
>>>>>
>>>>> On 11.07.20 01:25, Vladimir Kozlov wrote:
>>>>>> Looks good.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> Please review the following patch:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552
>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/
>>>>>>>
>>>>>>> In the failing testcase, C2 removes a zero check for a division/modulo node n based on the type information of
>>>>>>> the loop induction variable phi p (always between 1 and 50 and never 0). However, n is later split through p and
>>>>>>> ends up after the AddNode which updates the induction variable p. In the last iteration j equals 2 and is then
>>>>>>> updated to 0. The division/modulo node n is now executed before the loop limit check which results in a SIGFPE.
>>>>>>>
>>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division or modulo node has its zero check removed (i.e.
>>>>>>> control in NULL) and is split through a phi which has an input that could be zero. This should only happen for an
>>>>>>> induction variable phi of a trip-counted (integer) loop.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Christian
More information about the hotspot-compiler-dev
mailing list