[9] RFR(M): 8038348: Instance field load is replaced by wrong data Phi

Tue Aug 23 11:39:21 UTC 2016

On 23.08.2016 13:39, Vladimir Ivanov wrote:
> 
>>>> http://cr.openjdk.java.net/~thartmann/8038348/webrev.01/
>>>
>>> I'm curious how does PhiNode::_inst_mem_id play with PhaseRenumberLive which changes node ids. I don't see any logic to adjust PhiNode::_inst_mem_id after the renumbering is over.
>>
>> Thanks for looking at this! As I explained in my previous email, this is currently not an issue because PhaseRenumberLive is only executed *before* EA and PhiNode::_inst_mem_id is only set if Type::is_known_instance(), i.e., if instance_id is set in ConnectionGraph::split_unique_types() -> TypeOopPtr::cast_to_instance_id(). If this ever changes, the assert I added to PhaseRenumberLive will be triggered and we need to add code to adjust the _inst_mem_id field.
> 
> Sorry, missed it. Thanks for the clarification, the fix looks good.

No problem, thanks for the review!

Best regards,
Tobias

> 
> Best regards,
> Vladimir Ivanov
> 
>>>>>> On 8/3/16 11:34 AM, Tobias Hartmann wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> thanks for the review!
>>>>>>>
>>>>>>> On 03.08.2016 03:19, Vladimir Kozlov wrote:
>>>>>>>> Thank you Tobias for looking on this old problem.
>>>>>>>>
>>>>>>>> So we have to find a data Phi which gives the same result as LoadNode::split_through_phi() for split through memory Phi3. Identity() code is short cut for that.
>>>>>>>
>>>>>>> Yes, exactly.
>>>>>>>
>>>>>>>> Unfortunately I forgot what the case was when I added this code.
>>>>>>>
>>>>>>> There is a comment that suggests that this is to avoid creating "infinite" Phis in a loop:
>>>>>>>    // Search for an existing data phi which was generated before for the same
>>>>>>>    // instance's field to avoid infinite generation of phis in a loop.
>>>>>>>
>>>>>>> You also described this in JDK-6673473 (see [1], [2]) which introduced that code. It's not clear to me how creation of infinite Phis can happen, do you remember the exact problem?
>>>>>>
>>>>>> More like this one:
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/de93acbb64fc
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-6682236?focusedCommentId=12388989&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12388989
>>>>>>
>>>>>> It is the case where memory Phi's back edge input points back to this Phi (may be though nodes which does not modify the instance). As result Load node (clone on back branch) stays in graph and it will be split again because Ideal(), which does the split, called before Identity() before the fix.
>>>>>
>>>>> Right, thanks for the pointers.
>>>>>
>>>>>>>> It is definitely incorrect in this case when you have multiple data Phi nodes.
>>>>>>>> Since in general case it will be difficult to find correct Phi may be we should not do this optimization. What happened if we do not do replacement if the region has several Phis? Can you check?
>>>>>>>
>>>>>>> I also thought about removing that code entirely. It does fix the problem but I was worried that we may miss optimization opportunities if equal Phis are not merged or new Phis are created for existing data paths, respectively. I'll do a performance another run without that code.
>>>>>>
>>>>>> Also Identity() for new data Phi will not IGVNed with previous generated data Phi since back edge input is different.
>>>>>>
>>>>>> Based on the above we can't skip Identity() code even if we have several Phi data nodes. Otherwise we can get infinite Phi nodes generation again.
>>>>>>
>>>>>> We back to the problem :(
>>>>>
>>>>> I did some correctness and performance runs with the Phi optimization in LoadNode::Identity() and PhaseMacroExpand::value_from_mem_phi() removed. Turns out we don't hit any asserts/problems with JDK 9 but performance degrades by up to 9% (see comment in the bug). Looks like we don't generate infinite Phis but we miss optimization opportunities.
>>>>>
>>>>>> How about this - record more information to Phi during split_through_Phi(). There is NodeCloneInfo and CloneMap for vectorization optimization. We can record split_through_Phi information in own CloneMap and in Identity code check that Load and data Phi were created before from the same Load node.
>>>>>
>>>>> Right, we could do this but that only works for the case where the data PhiNode was created from the same LoadNode. It would solve the problem with generation of infinite Phis but there may be cases where we would like to re-use a data PhiNode that was created by splitting another LoadNode (for the same field) through the same or a different memory PhiNode.
>>>>>
>>>>> I'll investigate if this approach is good enough.
>>>>>
>>>>> Best regards,
>>>>> Tobias
>>>>>
>>>>>>>> Also what happens with LoadNode::split_through_phi() call which should happened before calling Identity()?
>>>>>>>
>>>>>>> Well, LoadNode::split_through_phi() invokes LoadNode::Identity() and if Identity does not return an existing Phi, we keep the LoadNode and remove it later. Or what do you mean exactly?
>>>>>>
>>>>>> Yes, I missed that Identity() call.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tobias
>>>>>>>
>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6673473
>>>>>>> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2008-March/000070.html
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 7/29/16 4:42 AM, Tobias Hartmann wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> please review the following patch:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8038348
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8038348/webrev.00/
>>>>>>>>>
>>>>>>>>> In C2, instance field loads may be replaced by a corresponding, already existing data Phi in LoadNode::identity():
>>>>>>>>>     // Search for an existing data phi which was generated before for the same
>>>>>>>>>     // instance's field to avoid infinite generation of phis in a loop.
>>>>>>>>>
>>>>>>>>> The problem is, that we only check if the Phi is from the same region and corresponds to the same instance field. We don't check if it contains the actual "up to date" value from the last store to the input memory:
>>>>>>>>>     if (phi->is_Phi() && phi != mem &&
>>>>>>>>>         phi->as_Phi()->is_same_inst_field(this_type, this_iid, this_index, this_offset)) {
>>>>>>>>>       return phi;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> In rare cases, it may happen that the same region contains multiple data Phis for the same instance field. For example:
>>>>>>>>>
>>>>>>>>>             AndI    AndI
>>>>>>>>>             /  \    / \
>>>>>>>>>            /  __\___| |
>>>>>>>>>            | |  |     |
>>>>>>>>>    Region  | |  AddI  AddI
>>>>>>>>>      |  \  | |  | |   |  \_________
>>>>>>>>>      |   \ | /  | |___|_____       |
>>>>>>>>>      |   Phi1   |     |     |      |
>>>>>>>>>      |________  |  ___|  Store1 Store2
>>>>>>>>>               \ | /          \   /
>>>>>>>>>                Phi2          Phi3
>>>>>>>>>                                |
>>>>>>>>>                              LoadI
>>>>>>>>>
>>>>>>>>> In this case, Phi3 is the instance field memory and Phi1/Phi2 are data Phis containing the value of the field. LoadINode::Identity() now tries to replace the load by a data Phi representing the field values corresponding to Store1 and Store2. Because Phi1Node::is_same_inst_field() returns true, the load is replaced by Phi1 which contains an "old" field value. We should use Phi2 instead.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I added PhiNode::mem_contains_value() to check if the given 'value' Phi contains the values from stores on the corresponding input paths of this memory Phi by walking up the memory and data input chains and checking for consistency. To keep complexity low, the implementation is conservative:
>>>>>>>>> - If true, 'value' is guaranteed to contain the values from stores on the memory Phi input paths.
>>>>>>>>> - If false, either 'value' is not the correct Phi or we were not able to prove this due to optimizations of the memory chain that are not (yet) reflected by the value chain nodes or vice versa.
>>>>>>>>>
>>>>>>>>> This problem was caught by the Apache Lucene team some time ago [1] because it causes "impossible" assertions in one of their tests (see my investigation in the bug comments). I was able to reproduce the bug with an old version of their test suite (with JDK 7, 8 and 9) but unfortunately, I was not able to write a regression test. This is because the problem depends on the order in which nodes are being processed by IGVN (field info needs to propagate to Phis) and the order of nodes in the out array of the Region.
>>>>>>>>>
>>>>>>>>> I verified that the fix solves the problem reported by Apache Lucene with JDK 7, 8 and 9. I also executed the hs-comp RBT. Performance numbers look okay (still running).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Tobias
>>>>>>>>>
>>>>>>>>> [1] https://issues.apache.org/jira/browse/LUCENE-5168
>>>>>>>>>