[16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero

Wed Jul 15 17:44:27 UTC 2020

Thank you Vladimir for your review!

Best regards,
Christian

On 15.07.20 19:43, Vladimir Kozlov wrote:
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 7/15/20 6:08 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> On 14.07.20 21:07, Vladimir Kozlov wrote:
>>>  > Thinking about this type propagation problem, couldn't we somehow 
>>> set the type of the Opaque1 node hiding the pre-loop
>>>  > limit to the same type as the pre-loop limit to allow this 
>>> information to flow to the pre and main loop? Or would that
>>>  > cause other problems? I guess there probably must be a reason why 
>>> we don't do it like that.
>>>
>>> It has wide type to prevent premature optimizations before loop is 
>>> fully transformed. That is the reason we add it in first place.
>>>
>>> But it would be interesting to see if we can use more narrow type: 
>>> TypeInt::POS1 for example for positive limits (>0) (and opposite for 
>>> negative limits < 0). I may be missing some nuances and it may not 
>>> work but we should try.
>>
>> I had an additional discussion about this with Roland. He made a good 
>> point that not the Opaque1 nodes themselves are the problem but rather 
>> the type of the iv phi, or more specifically the PhiNode::Value() 
>> function.
>>
>> Before creating pre/main/post loops, the iv phi has already a narrow 
>> type 1..300 set by PhiNode::Value(). However, when creating the pre 
>> (and post loop), we actually widen the type of the iv phi of the 
>> pre-loop to int>=1 (based on the pre-loop limit which is an Opaque1 
>> node with type int). Roland suggested that we should not do that but 
>> instead filter the returned type with the already existing type to not 
>> widen it. I think that makes sense. We are already doing that for the 
>> other cases in PhiNode::Value() [1][2]. It looks like we just miss it 
>> for the special handling of iv phis of trip-counted loops. This also 
>> fixes the assertion failure that occurred before with webrev.00.
>>
>> I created a new webrev based on webrev.00 with this change in 
>> PhiNode::Value():
>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/
>>
>> I'm currently running some testing with it again.
>>
>> Best regards,
>> Christian
>>
>>
>> [1] 
>> http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1097 
>>
>> [2] 
>> http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1157 
>>
>>
>>> Regards,
>>> Vladimir
>>>
>>> On 7/14/20 5:39 AM, Christian Hagedorn wrote:
>>>>
>>>>> [..] Since the DivINode has a control outside of the main loop [..]
>>>>
>>>> Edit: I actually meant that get_ctrl() returns a node outside of the 
>>>> main-loop (i.e. the DivINode is not part of the main-loop body). The 
>>>> DivINode still has NULL as control input.
>>>>
>>>> Best regards,
>>>> Christian
>>>>
>>>> On 14.07.20 14:32, Christian Hagedorn wrote:
>>>>> Hi Vladimir
>>>>>
>>>>> I had a closer look at the failing testcase with webrev.00. The 
>>>>> original DivNode has its zero check removed based on correct type 
>>>>> information. Afterwards its split through an induction variable phi 
>>>>> for which both inputs have non-zero types. So, the DivNode end up 
>>>>> after an AddINode (which adds a positive constant) which has a 
>>>>> non-zero type. All good so far.
>>>>>
>>>>> Now we add pre/main/post loops and the induction variable phi for 
>>>>> the pre-loop gets type int>=1 since the limit for the pre-loop is 
>>>>> hidden behind an Opaque1 node which just returns int as type. The 
>>>>> AddINode belonging to the loop induction variable phi in the 
>>>>> pre-loop is therefore updated to have the type int as well (int>=1 
>>>>> + positive_int could overflow). This type information propagates to 
>>>>> the main-loop and its AddINode belonging to the loop induction 
>>>>> variable phi (which is an input to the DivNode) also gets its type 
>>>>> set to int.
>>>>>
>>>>> Later, we add a vector post loop where we clone the main loop and 
>>>>> add a phi p for the the AddINode node and its new clone. Since the 
>>>>> DivINode has a control outside of the main loop, it is not cloned 
>>>>> and gets the phi p as an input. At a later point in time, we want 
>>>>> to split through p. But then we detect zero as possible value due 
>>>>> to the type range of both AddINodes being int.
>>>>>
>>>>> Even though the type information is not accurate enough, the 
>>>>> DivINode is never zero and we could safely apply the split through 
>>>>> the phi. We could think about doing a bail out for all kinds of 
>>>>> phis but I think it should only be an actual problem for loop 
>>>>> induction variable phis.
>>>>>
>>>>> Thinking about this type propagation problem, couldn't we somehow 
>>>>> set the type of the Opaque1 node hiding the pre-loop limit to the 
>>>>> same type as the pre-loop limit to allow this information to flow 
>>>>> to the pre and main loop? Or would that cause other problems? I 
>>>>> guess there probably must be a reason why we don't do it like that.
>>>>>
>>>>> Best regards,
>>>>> Christian
>>>>>
>>>>> On 13.07.20 19:16, Vladimir Kozlov wrote:
>>>>>> This rise question: why zero check was removed if one of merged 
>>>>>> types has 0?
>>>>>> Should we be more careful when we remove zero check?
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 7/13/20 2:06 AM, Christian Hagedorn wrote:
>>>>>>> A test in some later tier testing revealed that the assertion 
>>>>>>> code is actually too strong. There can be a Div/Mod node whose 
>>>>>>> zero check was removed but that is then spilt through a 
>>>>>>> non-induction-variable phi whose inputs have zero in their type 
>>>>>>> range (which is fine, this happens in some loop opts after 
>>>>>>> partial peeling was applied earlier). This happened, for example, 
>>>>>>> for a phi which merged two nodes from the original and a cloned 
>>>>>>> loop. I think we just need to remove the additional assertion code.
>>>>>>>
>>>>>>> New webrev:
>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Christian
>>>>>>>
>>>>>>> On 13.07.20 09:19, Christian Hagedorn wrote:
>>>>>>>> Thank you Vladimir for your review!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Christian
>>>>>>>>
>>>>>>>> On 11.07.20 01:25, Vladimir Kozlov wrote:
>>>>>>>>> Looks good.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote:
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> Please review the following patch:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552
>>>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/
>>>>>>>>>>
>>>>>>>>>> In the failing testcase, C2 removes a zero check for a 
>>>>>>>>>> division/modulo node n based on the type information of the 
>>>>>>>>>> loop induction variable phi p (always between 1 and 50 and 
>>>>>>>>>> never 0). However, n is later split through p and ends up 
>>>>>>>>>> after the AddNode which updates the induction variable p. In 
>>>>>>>>>> the last iteration j equals 2 and is then updated to 0. The 
>>>>>>>>>> division/modulo node n is now executed before the loop limit 
>>>>>>>>>> check which results in a SIGFPE.
>>>>>>>>>>
>>>>>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a 
>>>>>>>>>> division or modulo node has its zero check removed (i.e. 
>>>>>>>>>> control in NULL) and is split through a phi which has an input 
>>>>>>>>>> that could be zero. This should only happen for an induction 
>>>>>>>>>> variable phi of a trip-counted (integer) loop.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Christian