[16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero

Wed Jul 15 17:43:03 UTC 2020

Looks good.

Thanks,
Vladimir

On 7/15/20 6:08 AM, Christian Hagedorn wrote:
> Hi Vladimir
> 
> On 14.07.20 21:07, Vladimir Kozlov wrote:
>>  > Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the pre-loop
>>  > limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or would that
>>  > cause other problems? I guess there probably must be a reason why we don't do it like that.
>>
>> It has wide type to prevent premature optimizations before loop is fully transformed. That is the reason we add it in 
>> first place.
>>
>> But it would be interesting to see if we can use more narrow type: TypeInt::POS1 for example for positive limits (>0) 
>> (and opposite for negative limits < 0). I may be missing some nuances and it may not work but we should try.
> 
> I had an additional discussion about this with Roland. He made a good point that not the Opaque1 nodes themselves are 
> the problem but rather the type of the iv phi, or more specifically the PhiNode::Value() function.
> 
> Before creating pre/main/post loops, the iv phi has already a narrow type 1..300 set by PhiNode::Value(). However, when 
> creating the pre (and post loop), we actually widen the type of the iv phi of the pre-loop to int>=1 (based on the 
> pre-loop limit which is an Opaque1 node with type int). Roland suggested that we should not do that but instead filter 
> the returned type with the already existing type to not widen it. I think that makes sense. We are already doing that 
> for the other cases in PhiNode::Value() [1][2]. It looks like we just miss it for the special handling of iv phis of 
> trip-counted loops. This also fixes the assertion failure that occurred before with webrev.00.
> 
> I created a new webrev based on webrev.00 with this change in PhiNode::Value():
> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/
> 
> I'm currently running some testing with it again.
> 
> Best regards,
> Christian
> 
> 
> [1] http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1097
> [2] http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1157
> 
>> Regards,
>> Vladimir
>>
>> On 7/14/20 5:39 AM, Christian Hagedorn wrote:
>>>
>>>> [..] Since the DivINode has a control outside of the main loop [..]
>>>
>>> Edit: I actually meant that get_ctrl() returns a node outside of the main-loop (i.e. the DivINode is not part of the 
>>> main-loop body). The DivINode still has NULL as control input.
>>>
>>> Best regards,
>>> Christian
>>>
>>> On 14.07.20 14:32, Christian Hagedorn wrote:
>>>> Hi Vladimir
>>>>
>>>> I had a closer look at the failing testcase with webrev.00. The original DivNode has its zero check removed based on 
>>>> correct type information. Afterwards its split through an induction variable phi for which both inputs have non-zero 
>>>> types. So, the DivNode end up after an AddINode (which adds a positive constant) which has a non-zero type. All good 
>>>> so far.
>>>>
>>>> Now we add pre/main/post loops and the induction variable phi for the pre-loop gets type int>=1 since the limit for 
>>>> the pre-loop is hidden behind an Opaque1 node which just returns int as type. The AddINode belonging to the loop 
>>>> induction variable phi in the pre-loop is therefore updated to have the type int as well (int>=1 + positive_int 
>>>> could overflow). This type information propagates to the main-loop and its AddINode belonging to the loop induction 
>>>> variable phi (which is an input to the DivNode) also gets its type set to int.
>>>>
>>>> Later, we add a vector post loop where we clone the main loop and add a phi p for the the AddINode node and its new 
>>>> clone. Since the DivINode has a control outside of the main loop, it is not cloned and gets the phi p as an input. 
>>>> At a later point in time, we want to split through p. But then we detect zero as possible value due to the type 
>>>> range of both AddINodes being int.
>>>>
>>>> Even though the type information is not accurate enough, the DivINode is never zero and we could safely apply the 
>>>> split through the phi. We could think about doing a bail out for all kinds of phis but I think it should only be an 
>>>> actual problem for loop induction variable phis.
>>>>
>>>> Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the 
>>>> pre-loop limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or 
>>>> would that cause other problems? I guess there probably must be a reason why we don't do it like that.
>>>>
>>>> Best regards,
>>>> Christian
>>>>
>>>> On 13.07.20 19:16, Vladimir Kozlov wrote:
>>>>> This rise question: why zero check was removed if one of merged types has 0?
>>>>> Should we be more careful when we remove zero check?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 7/13/20 2:06 AM, Christian Hagedorn wrote:
>>>>>> A test in some later tier testing revealed that the assertion code is actually too strong. There can be a Div/Mod 
>>>>>> node whose zero check was removed but that is then spilt through a non-induction-variable phi whose inputs have 
>>>>>> zero in their type range (which is fine, this happens in some loop opts after partial peeling was applied 
>>>>>> earlier). This happened, for example, for a phi which merged two nodes from the original and a cloned loop. I 
>>>>>> think we just need to remove the additional assertion code.
>>>>>>
>>>>>> New webrev:
>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/
>>>>>>
>>>>>> Best regards,
>>>>>> Christian
>>>>>>
>>>>>> On 13.07.20 09:19, Christian Hagedorn wrote:
>>>>>>> Thank you Vladimir for your review!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Christian
>>>>>>>
>>>>>>> On 11.07.20 01:25, Vladimir Kozlov wrote:
>>>>>>>> Looks good.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote:
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> Please review the following patch:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552
>>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/
>>>>>>>>>
>>>>>>>>> In the failing testcase, C2 removes a zero check for a division/modulo node n based on the type information of 
>>>>>>>>> the loop induction variable phi p (always between 1 and 50 and never 0). However, n is later split through p 
>>>>>>>>> and ends up after the AddNode which updates the induction variable p. In the last iteration j equals 2 and is 
>>>>>>>>> then updated to 0. The division/modulo node n is now executed before the loop limit check which results in a 
>>>>>>>>> SIGFPE.
>>>>>>>>>
>>>>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division or modulo node has its zero check removed 
>>>>>>>>> (i.e. control in NULL) and is split through a phi which has an input that could be zero. This should only 
>>>>>>>>> happen for an induction variable phi of a trip-counted (integer) loop.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Christian