[16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero

Tue Jul 14 12:32:19 UTC 2020

Hi Vladimir

I had a closer look at the failing testcase with webrev.00. The original 
DivNode has its zero check removed based on correct type information. 
Afterwards its split through an induction variable phi for which both 
inputs have non-zero types. So, the DivNode end up after an AddINode 
(which adds a positive constant) which has a non-zero type. All good so far.

Now we add pre/main/post loops and the induction variable phi for the 
pre-loop gets type int>=1 since the limit for the pre-loop is hidden 
behind an Opaque1 node which just returns int as type. The AddINode 
belonging to the loop induction variable phi in the pre-loop is 
therefore updated to have the type int as well (int>=1 + positive_int 
could overflow). This type information propagates to the main-loop and 
its AddINode belonging to the loop induction variable phi (which is an 
input to the DivNode) also gets its type set to int.

Later, we add a vector post loop where we clone the main loop and add a 
phi p for the the AddINode node and its new clone. Since the DivINode 
has a control outside of the main loop, it is not cloned and gets the 
phi p as an input. At a later point in time, we want to split through p. 
But then we detect zero as possible value due to the type range of both 
AddINodes being int.

Even though the type information is not accurate enough, the DivINode is 
never zero and we could safely apply the split through the phi. We could 
think about doing a bail out for all kinds of phis but I think it should 
only be an actual problem for loop induction variable phis.

Thinking about this type propagation problem, couldn't we somehow set 
the type of the Opaque1 node hiding the pre-loop limit to the same type 
as the pre-loop limit to allow this information to flow to the pre and 
main loop? Or would that cause other problems? I guess there probably 
must be a reason why we don't do it like that.

Best regards,
Christian

On 13.07.20 19:16, Vladimir Kozlov wrote:
> This rise question: why zero check was removed if one of merged types 
> has 0?
> Should we be more careful when we remove zero check?
> 
> Thanks,
> Vladimir
> 
> On 7/13/20 2:06 AM, Christian Hagedorn wrote:
>> A test in some later tier testing revealed that the assertion code is 
>> actually too strong. There can be a Div/Mod node whose zero check was 
>> removed but that is then spilt through a non-induction-variable phi 
>> whose inputs have zero in their type range (which is fine, this 
>> happens in some loop opts after partial peeling was applied earlier). 
>> This happened, for example, for a phi which merged two nodes from the 
>> original and a cloned loop. I think we just need to remove the 
>> additional assertion code.
>>
>> New webrev:
>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/
>>
>> Best regards,
>> Christian
>>
>> On 13.07.20 09:19, Christian Hagedorn wrote:
>>> Thank you Vladimir for your review!
>>>
>>> Best regards,
>>> Christian
>>>
>>> On 11.07.20 01:25, Vladimir Kozlov wrote:
>>>> Looks good.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote:
>>>>> Hi
>>>>>
>>>>> Please review the following patch:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552
>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/
>>>>>
>>>>> In the failing testcase, C2 removes a zero check for a 
>>>>> division/modulo node n based on the type information of the loop 
>>>>> induction variable phi p (always between 1 and 50 and never 0). 
>>>>> However, n is later split through p and ends up after the AddNode 
>>>>> which updates the induction variable p. In the last iteration j 
>>>>> equals 2 and is then updated to 0. The division/modulo node n is 
>>>>> now executed before the loop limit check which results in a SIGFPE.
>>>>>
>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division 
>>>>> or modulo node has its zero check removed (i.e. control in NULL) 
>>>>> and is split through a phi which has an input that could be zero. 
>>>>> This should only happen for an induction variable phi of a 
>>>>> trip-counted (integer) loop.
>>>>>
>>>>> Best regards,
>>>>> Christian