[16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero

Wed Jul 15 13:08:33 UTC 2020

Hi Vladimir

On 14.07.20 21:07, Vladimir Kozlov wrote:
>  > Thinking about this type propagation problem, couldn't we somehow set 
> the type of the Opaque1 node hiding the pre-loop
>  > limit to the same type as the pre-loop limit to allow this 
> information to flow to the pre and main loop? Or would that
>  > cause other problems? I guess there probably must be a reason why we 
> don't do it like that.
> 
> It has wide type to prevent premature optimizations before loop is fully 
> transformed. That is the reason we add it in first place.
> 
> But it would be interesting to see if we can use more narrow type: 
> TypeInt::POS1 for example for positive limits (>0) (and opposite for 
> negative limits < 0). I may be missing some nuances and it may not work 
> but we should try.

I had an additional discussion about this with Roland. He made a good 
point that not the Opaque1 nodes themselves are the problem but rather 
the type of the iv phi, or more specifically the PhiNode::Value() function.

Before creating pre/main/post loops, the iv phi has already a narrow 
type 1..300 set by PhiNode::Value(). However, when creating the pre (and 
post loop), we actually widen the type of the iv phi of the pre-loop to 
int>=1 (based on the pre-loop limit which is an Opaque1 node with type 
int). Roland suggested that we should not do that but instead filter the 
returned type with the already existing type to not widen it. I think 
that makes sense. We are already doing that for the other cases in 
PhiNode::Value() [1][2]. It looks like we just miss it for the special 
handling of iv phis of trip-counted loops. This also fixes the assertion 
failure that occurred before with webrev.00.

I created a new webrev based on webrev.00 with this change in 
PhiNode::Value():
http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/

I'm currently running some testing with it again.

Best regards,
Christian

[1] 
http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1097
[2] 
http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1157

> Regards,
> Vladimir
> 
> On 7/14/20 5:39 AM, Christian Hagedorn wrote:
>>
>>> [..] Since the DivINode has a control outside of the main loop [..]
>>
>> Edit: I actually meant that get_ctrl() returns a node outside of the 
>> main-loop (i.e. the DivINode is not part of the main-loop body). The 
>> DivINode still has NULL as control input.
>>
>> Best regards,
>> Christian
>>
>> On 14.07.20 14:32, Christian Hagedorn wrote:
>>> Hi Vladimir
>>>
>>> I had a closer look at the failing testcase with webrev.00. The 
>>> original DivNode has its zero check removed based on correct type 
>>> information. Afterwards its split through an induction variable phi 
>>> for which both inputs have non-zero types. So, the DivNode end up 
>>> after an AddINode (which adds a positive constant) which has a 
>>> non-zero type. All good so far.
>>>
>>> Now we add pre/main/post loops and the induction variable phi for the 
>>> pre-loop gets type int>=1 since the limit for the pre-loop is hidden 
>>> behind an Opaque1 node which just returns int as type. The AddINode 
>>> belonging to the loop induction variable phi in the pre-loop is 
>>> therefore updated to have the type int as well (int>=1 + positive_int 
>>> could overflow). This type information propagates to the main-loop 
>>> and its AddINode belonging to the loop induction variable phi (which 
>>> is an input to the DivNode) also gets its type set to int.
>>>
>>> Later, we add a vector post loop where we clone the main loop and add 
>>> a phi p for the the AddINode node and its new clone. Since the 
>>> DivINode has a control outside of the main loop, it is not cloned and 
>>> gets the phi p as an input. At a later point in time, we want to 
>>> split through p. But then we detect zero as possible value due to the 
>>> type range of both AddINodes being int.
>>>
>>> Even though the type information is not accurate enough, the DivINode 
>>> is never zero and we could safely apply the split through the phi. We 
>>> could think about doing a bail out for all kinds of phis but I think 
>>> it should only be an actual problem for loop induction variable phis.
>>>
>>> Thinking about this type propagation problem, couldn't we somehow set 
>>> the type of the Opaque1 node hiding the pre-loop limit to the same 
>>> type as the pre-loop limit to allow this information to flow to the 
>>> pre and main loop? Or would that cause other problems? I guess there 
>>> probably must be a reason why we don't do it like that.
>>>
>>> Best regards,
>>> Christian
>>>
>>> On 13.07.20 19:16, Vladimir Kozlov wrote:
>>>> This rise question: why zero check was removed if one of merged 
>>>> types has 0?
>>>> Should we be more careful when we remove zero check?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 7/13/20 2:06 AM, Christian Hagedorn wrote:
>>>>> A test in some later tier testing revealed that the assertion code 
>>>>> is actually too strong. There can be a Div/Mod node whose zero 
>>>>> check was removed but that is then spilt through a 
>>>>> non-induction-variable phi whose inputs have zero in their type 
>>>>> range (which is fine, this happens in some loop opts after partial 
>>>>> peeling was applied earlier). This happened, for example, for a phi 
>>>>> which merged two nodes from the original and a cloned loop. I think 
>>>>> we just need to remove the additional assertion code.
>>>>>
>>>>> New webrev:
>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/
>>>>>
>>>>> Best regards,
>>>>> Christian
>>>>>
>>>>> On 13.07.20 09:19, Christian Hagedorn wrote:
>>>>>> Thank you Vladimir for your review!
>>>>>>
>>>>>> Best regards,
>>>>>> Christian
>>>>>>
>>>>>> On 11.07.20 01:25, Vladimir Kozlov wrote:
>>>>>>> Looks good.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Please review the following patch:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552
>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/
>>>>>>>>
>>>>>>>> In the failing testcase, C2 removes a zero check for a 
>>>>>>>> division/modulo node n based on the type information of the loop 
>>>>>>>> induction variable phi p (always between 1 and 50 and never 0). 
>>>>>>>> However, n is later split through p and ends up after the 
>>>>>>>> AddNode which updates the induction variable p. In the last 
>>>>>>>> iteration j equals 2 and is then updated to 0. The 
>>>>>>>> division/modulo node n is now executed before the loop limit 
>>>>>>>> check which results in a SIGFPE.
>>>>>>>>
>>>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a 
>>>>>>>> division or modulo node has its zero check removed (i.e. control 
>>>>>>>> in NULL) and is split through a phi which has an input that 
>>>>>>>> could be zero. This should only happen for an induction variable 
>>>>>>>> phi of a trip-counted (integer) loop.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Christian