[16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Jul 15 17:43:03 UTC 2020
Looks good.
Thanks,
Vladimir
On 7/15/20 6:08 AM, Christian Hagedorn wrote:
> Hi Vladimir
>
> On 14.07.20 21:07, Vladimir Kozlov wrote:
>> > Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the pre-loop
>> > limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or would that
>> > cause other problems? I guess there probably must be a reason why we don't do it like that.
>>
>> It has wide type to prevent premature optimizations before loop is fully transformed. That is the reason we add it in
>> first place.
>>
>> But it would be interesting to see if we can use more narrow type: TypeInt::POS1 for example for positive limits (>0)
>> (and opposite for negative limits < 0). I may be missing some nuances and it may not work but we should try.
>
> I had an additional discussion about this with Roland. He made a good point that not the Opaque1 nodes themselves are
> the problem but rather the type of the iv phi, or more specifically the PhiNode::Value() function.
>
> Before creating pre/main/post loops, the iv phi has already a narrow type 1..300 set by PhiNode::Value(). However, when
> creating the pre (and post loop), we actually widen the type of the iv phi of the pre-loop to int>=1 (based on the
> pre-loop limit which is an Opaque1 node with type int). Roland suggested that we should not do that but instead filter
> the returned type with the already existing type to not widen it. I think that makes sense. We are already doing that
> for the other cases in PhiNode::Value() [1][2]. It looks like we just miss it for the special handling of iv phis of
> trip-counted loops. This also fixes the assertion failure that occurred before with webrev.00.
>
> I created a new webrev based on webrev.00 with this change in PhiNode::Value():
> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/
>
> I'm currently running some testing with it again.
>
> Best regards,
> Christian
>
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1097
> [2] http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1157
>
>> Regards,
>> Vladimir
>>
>> On 7/14/20 5:39 AM, Christian Hagedorn wrote:
>>>
>>>> [..] Since the DivINode has a control outside of the main loop [..]
>>>
>>> Edit: I actually meant that get_ctrl() returns a node outside of the main-loop (i.e. the DivINode is not part of the
>>> main-loop body). The DivINode still has NULL as control input.
>>>
>>> Best regards,
>>> Christian
>>>
>>> On 14.07.20 14:32, Christian Hagedorn wrote:
>>>> Hi Vladimir
>>>>
>>>> I had a closer look at the failing testcase with webrev.00. The original DivNode has its zero check removed based on
>>>> correct type information. Afterwards its split through an induction variable phi for which both inputs have non-zero
>>>> types. So, the DivNode end up after an AddINode (which adds a positive constant) which has a non-zero type. All good
>>>> so far.
>>>>
>>>> Now we add pre/main/post loops and the induction variable phi for the pre-loop gets type int>=1 since the limit for
>>>> the pre-loop is hidden behind an Opaque1 node which just returns int as type. The AddINode belonging to the loop
>>>> induction variable phi in the pre-loop is therefore updated to have the type int as well (int>=1 + positive_int
>>>> could overflow). This type information propagates to the main-loop and its AddINode belonging to the loop induction
>>>> variable phi (which is an input to the DivNode) also gets its type set to int.
>>>>
>>>> Later, we add a vector post loop where we clone the main loop and add a phi p for the the AddINode node and its new
>>>> clone. Since the DivINode has a control outside of the main loop, it is not cloned and gets the phi p as an input.
>>>> At a later point in time, we want to split through p. But then we detect zero as possible value due to the type
>>>> range of both AddINodes being int.
>>>>
>>>> Even though the type information is not accurate enough, the DivINode is never zero and we could safely apply the
>>>> split through the phi. We could think about doing a bail out for all kinds of phis but I think it should only be an
>>>> actual problem for loop induction variable phis.
>>>>
>>>> Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the
>>>> pre-loop limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or
>>>> would that cause other problems? I guess there probably must be a reason why we don't do it like that.
>>>>
>>>> Best regards,
>>>> Christian
>>>>
>>>> On 13.07.20 19:16, Vladimir Kozlov wrote:
>>>>> This rise question: why zero check was removed if one of merged types has 0?
>>>>> Should we be more careful when we remove zero check?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 7/13/20 2:06 AM, Christian Hagedorn wrote:
>>>>>> A test in some later tier testing revealed that the assertion code is actually too strong. There can be a Div/Mod
>>>>>> node whose zero check was removed but that is then spilt through a non-induction-variable phi whose inputs have
>>>>>> zero in their type range (which is fine, this happens in some loop opts after partial peeling was applied
>>>>>> earlier). This happened, for example, for a phi which merged two nodes from the original and a cloned loop. I
>>>>>> think we just need to remove the additional assertion code.
>>>>>>
>>>>>> New webrev:
>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/
>>>>>>
>>>>>> Best regards,
>>>>>> Christian
>>>>>>
>>>>>> On 13.07.20 09:19, Christian Hagedorn wrote:
>>>>>>> Thank you Vladimir for your review!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Christian
>>>>>>>
>>>>>>> On 11.07.20 01:25, Vladimir Kozlov wrote:
>>>>>>>> Looks good.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote:
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> Please review the following patch:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552
>>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/
>>>>>>>>>
>>>>>>>>> In the failing testcase, C2 removes a zero check for a division/modulo node n based on the type information of
>>>>>>>>> the loop induction variable phi p (always between 1 and 50 and never 0). However, n is later split through p
>>>>>>>>> and ends up after the AddNode which updates the induction variable p. In the last iteration j equals 2 and is
>>>>>>>>> then updated to 0. The division/modulo node n is now executed before the loop limit check which results in a
>>>>>>>>> SIGFPE.
>>>>>>>>>
>>>>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division or modulo node has its zero check removed
>>>>>>>>> (i.e. control in NULL) and is split through a phi which has an input that could be zero. This should only
>>>>>>>>> happen for an induction variable phi of a trip-counted (integer) loop.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Christian
More information about the hotspot-compiler-dev
mailing list