RFR: 8268744: Improve sinking algorithm in partial peeling to avoid redundant clones

Roland Westrelin roland at openjdk.java.net
Wed Oct 13 08:47:51 UTC 2021


On Wed, 28 Jul 2021 15:46:13 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

> The algorithm in step 2 in partial peeling to move data nodes from the peel section to the non-peel section uses a straight forward cloning algorithm which creates redundant clones when the IR contains one ore more diamonds of data nodes to be cloned. The number of clones grows exponentially which could lead to a bailout (added by [JDK-8256934](https://bugs.openjdk.java.net/browse/JDK-8256934) for JDK 17 with a testcase). This RFE improves this algorithm to handle node diamonds more efficiently to avoid unnecessary cloning. The testcase for JDK-8256934 does not bail out anymore and uses only few clones.
> 
> The main idea is to first find all outside of the loop uses `u` of the nodes in the initial peel region to be moved into the non-peel region. We then only need to clone any data node during the algorithm at most `u` times, once for each initial outside of the loop use. If we process a node diamond (following inputs), we can use an already cloned node for the top node of the diamond (node A in the example below). An example with 1 initial outside of the loop use and 4 nodes to be cloned, forming a diamond, is shown as comment in the code:
> https://github.com/openjdk/jdk/blob/8ae0e1a06558a1678521dcb4ed32708a1821b47d/src/hotspot/share/opto/loopopts.cpp#L3605-L3635
> 
> The algorithm is explained in more details in the comments in the code (starting in method `move_nodes_to_not_peel()`).
> 
> I also cleaned up the code for step 2 of partial peeling. I left the bailout code added by JDK-8256934 in place which I think is still required if we enter partial peeling with a huge number of live nodes (quite rare though).
> 
> I additionally ran some standard benchmarks which did not show any improvements but also no regressions. I think it is rather an edge case where the old algorithm creates a huge number of redundant clones. Nevertheless, I think this improved algorithm is still worth to have to handle the more uncommon case of node diamonds. What do you think?
> 
> Thanks,
> Christian

Can you explain, maybe with an example, how PhaseIdealLoop::get_clone_for_outside_use() works?

-------------

PR: https://git.openjdk.java.net/jdk/pull/4923


More information about the hotspot-compiler-dev mailing list