Complexity reduction when choosing the best inference leaf to be solved
Vicente Romero
vicente.romero at oracle.com
Thu Oct 12 19:04:07 UTC 2017
Hi,
Sorry for the delay in getting to this. This is one of the most magical
parts of the compiler, and I'm not referring to the algorithms but to
the general problem, which is the one I think we need to understand
before trying to optimize the current implementation. It could be the
case that the current implementation is not optimal, and we would end
up making a sub-optimal implementation run faster.
The general situation here is that we start from a directed graph
representing the dependencies between inference variables. From this
graph a directed acyclic graph is obtained by joining strongly
connected components into "meta" or "fat" nodes; this is our starting
point. Now I think that the problem we want to solve is: given a
directed acyclic graph and a list of inference variables, find the
resolution "path" with a minimal, or ideally minimum, number of
instantiations. Such a path should be an ordered list containing one or
more nodes, and the first node must be a leaf.
Given the nature of the instantiation process, once we instantiate a
variable, it becomes instantiated in every node that contains it. A
node with all of its variables instantiated, let's call it an
instantiated node, can be removed from the graph as it won't add any
further information. There is a cost associated with every node, and
that cost could be the number of variables still to be instantiated in
the node.
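Just to make the cost model concrete, here is a tiny sketch; the
FatNode name and the String-based variables are made up for
illustration, this is not the actual javac representation:

import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of a "fat" node, used only to
// illustrate the cost function discussed above.
class FatNode {
    final Set<String> ivars = new HashSet<>();         // inference variables grouped in this node
    final Set<String> instantiated = new HashSet<>();  // variables already instantiated

    // cost = number of variables still to be instantiated in this node
    int cost() {
        Set<String> pending = new HashSet<>(ivars);
        pending.removeAll(instantiated);
        return pending.size();
    }

    // a node with all of its variables instantiated adds no information
    boolean isInstantiated() {
        return cost() == 0;
    }
}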
I think that there are two main strategies here:
* aggressive
* conservative
In an aggressive strategy I would expect that if cost(n1) == cost(n2)
and n1 and n2 are leaves, then we don't have any other criterion to
propose a resolution order between n1 and n2, so we can pick either of
them. Of course I'm also assuming that n1 and n2 contain at least one
of the variables we want to solve. This means that a solution for the
aggressive strategy could be:
1. Add all leaves having at least one of the variables we want to
solve to a priority queue ordered by cost. I wonder if there could
be a case in which no leaf has any of the variables we want to solve;
I guess that in that case we would add all the leaves to the queue,
ordered by cost.
2. While there are elements in the queue:
3. Get the first node, solve the contained variables and remove it from
the graph along with all instantiated nodes. Check which other nodes
in the graph are now leaves and add them to the queue following the
same criteria (a rough sketch follows below).
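A minimal sketch of that aggressive strategy, reusing the FatNode model
above plus a hypothetical dependency map (dependsOn); the helper names
and the overall shape are assumptions of mine, not the current
GraphSolver code:

import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

class AggressiveSolver {

    // dependsOn maps a node to the nodes it depends on; a node whose
    // dependencies have all been removed from the graph is a leaf
    void solve(Set<FatNode> graph, Map<FatNode, Set<FatNode>> dependsOn,
               Set<String> varsToSolve) {
        Set<String> remaining = new HashSet<>(varsToSolve);
        PriorityQueue<FatNode> queue =
                new PriorityQueue<>(Comparator.comparingInt(FatNode::cost));

        enqueueLeaves(graph, dependsOn, remaining, queue);     // step 1

        while (!queue.isEmpty() && !remaining.isEmpty()) {     // step 2
            FatNode leaf = queue.poll();                       // step 3: cheapest leaf
            for (FatNode n : graph) {
                // instantiating the leaf's variables instantiates them
                // in every node that contains them
                n.instantiated.addAll(leaf.ivars);
            }
            remaining.removeAll(leaf.ivars);
            graph.removeIf(FatNode::isInstantiated);           // drop instantiated nodes
            enqueueLeaves(graph, dependsOn, remaining, queue); // promote new leaves
        }
    }

    // rebuild the queue with the current leaves: those containing at
    // least one variable we still want to solve, or all leaves if none do
    private void enqueueLeaves(Set<FatNode> graph,
                               Map<FatNode, Set<FatNode>> dependsOn,
                               Set<String> remaining,
                               PriorityQueue<FatNode> queue) {
        queue.clear();
        for (FatNode n : graph) {
            if (isLeaf(n, graph, dependsOn)
                    && !Collections.disjoint(n.ivars, remaining)) {
                queue.add(n);
            }
        }
        if (queue.isEmpty()) {
            for (FatNode n : graph) {
                if (isLeaf(n, graph, dependsOn)) {
                    queue.add(n);
                }
            }
        }
    }

    private boolean isLeaf(FatNode n, Set<FatNode> graph,
                           Map<FatNode, Set<FatNode>> dependsOn) {
        return dependsOn.getOrDefault(n, Collections.emptySet())
                        .stream().noneMatch(graph::contains);
    }
}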
The aggressive strategy should be close to the minimum number of
instantiations, with the benefit that finding the instantiation path is
rather simple, and I think it would give the right order and result in
most, if not all, cases.

Now let's take a look at the conservative strategy, where we want the
absolute minimum number of instantiations. So we have a graph, we have
costs, and we want to optimize. This seems to me a variation of a
classical max flow problem. The cost is on the vertices instead of on
the edges, and we want to minimize instead of maximize, but with a
couple of adaptations here and there we can get to the classical
problem. First we can update the cost to be Integer.MAX_VALUE - cost,
or subtract from any other value greater than or equal to the number of
variables. We need to add a source and a sink node; the implementation
could do this implicitly. Then we add an edge from the source to every
non-leaf node with at least one of the variables we want to solve, with
no cost associated with these edges. In addition we add an edge from
every leaf node to the sink; if we want to represent the cost on the
edges, then the cost of every edge from a leaf to the sink will be the
cost of the corresponding leaf. The path with minimum instantiations
will be the solution to the max flow problem. I assume that if there is
no non-leaf node that contains all the variables we want to solve, then
we will have to instantiate the nodes following the path, starting from
the leaf. Then we can remove the instantiated nodes from the graph and,
once we have consumed the last node in the path, if there are still
variables to be solved, we can apply the max flow algorithm again, this
time with far fewer nodes, and repeat until there are no more variables
to be solved.
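To illustrate what I mean by the adaptation, here is a very rough
sketch that only builds the augmented, edge-weighted graph (source,
sink, transformed costs). The Edge type, the constant k and the choice
of which edges carry which cost are my own assumptions; the actual
reduction to max flow still needs to be worked out:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ConservativeGraphBuilder {

    static final Object SOURCE = new Object();   // implicit source node
    static final Object SINK = new Object();     // implicit sink node

    static final class Edge {
        final Object from, to;    // FatNode, SOURCE or SINK
        final long cost;
        Edge(Object from, Object to, long cost) {
            this.from = from; this.to = to; this.cost = cost;
        }
    }

    // k must be greater than or equal to the total number of inference
    // variables, so that k - cost(n) reverses the ordering of node costs
    List<Edge> build(Set<FatNode> graph, Map<FatNode, Set<FatNode>> dependsOn,
                     Set<String> varsToSolve, long k) {
        List<Edge> edges = new ArrayList<>();
        for (FatNode n : graph) {
            Set<FatNode> deps = dependsOn.getOrDefault(n, Collections.emptySet());
            boolean leaf = deps.stream().noneMatch(graph::contains);
            boolean relevant = !Collections.disjoint(n.ivars, varsToSolve);
            if (!leaf && relevant) {
                edges.add(new Edge(SOURCE, n, 0));           // free edge from the source
            }
            if (leaf) {
                edges.add(new Edge(n, SINK, k - n.cost()));  // leaf cost on the sink edge
            }
            // keep the original dependency edges, carrying the transformed
            // cost of the node being depended on (my assumption)
            for (FatNode dep : deps) {
                if (graph.contains(dep)) {
                    edges.add(new Edge(n, dep, k - dep.cost()));
                }
            }
        }
        return edges;
    }
}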
This is my interpretation of the problem. I think we should discuss
whether this interpretation is correct, or whether there is a better
one, before considering optimizing the current implementation, which
doesn't fit very well into either of these strategies.
Comments?
Vicente
On 10/09/2017 02:00 PM, B. Blaser wrote:
> Hi,
>
> On 5 October 2017 at 20:26, B. Blaser <bsrbnd at gmail.com> wrote:
>> On 5 October 2017 at 16:28, Vicente Romero <vicente.romero at oracle.com> wrote:
>>>
>>> On 10/05/2017 04:32 AM, Maurizio Cimadamore wrote:
>>>> Please, send me a patch with all your fixes, including the new version of
>>>> (3) and I'll take a look/run some tests.
>>>
>>> +1 yes please share the last full version
>> Here it is, in attachment.
>>
>> Note that there's another possibility for pt (2), using "return
>> noPath" instead of "throw", but I think this would probably be slower.
>>
>> I did only some quick tests with this last full version, so please let
>> me know if you encounter any problem with it.
>>
>> I think the example I gave previously is really faster with (1) and
>> (3); we pick immediately, in O(1), 'X' then 'Y' and finally 'Z'.
> Here are two more examples showing the benefits of pt (2), this time.
> The first one uses 'T' via the actual arguments of 'n()' to build
> nodes with multiple ivars, and the second one uses 'List<T>' to build
> dependent nodes with a single ivar.
>
> When inferring the type of the lambda parameters in the example below,
> we first try the non-optimal node [T, B] of size 2. Then we try node
> [C] of size 1 but depending on [T, A] and [T, B]. When examining the
> first dependency [T, A] we can immediately conclude that this
> possibility is worse than the best known so far, discarding it with
> the first 'throw' and without evaluating the second dependency [T, B]:
>
> import java.util.Map;
> import java.util.List;
> import java.util.function.BiConsumer;
>
> public class H {
>     <T> T m(T t) { return t; }
>
>     <A, B, C extends Map<A, B>> void n(BiConsumer<B, C> f, A a, B b, C c) {}
>
>     void o() {
>         n( (x, y) -> {}, m(""), m(""), null);
>     }
> }
>
> This next example shows another benefit of pt (2) when inferring the
> type of the lambda parameters. We first examine node [B], which depends
> on [T] and thus has a full cost of 2. Then, we start evaluating [C],
> which depends on [A] + [T] and [B] + [T]. When computing the full cost
> of [A], we can immediately discard this possibility (via the second
> 'throw'), the latter being worse than the best known so far:
>
> import java.util.Map;
> import java.util.List;
> import java.util.function.BiConsumer;
>
> public class H {
>     <T> List<T> m(T t) { return null; }
>
>     <A, B, C extends Map<A, B>> void n(BiConsumer<B, C> f, A a, B b, C c) {}
>
>     void o() {
>         n( (x, y) -> {}, m(""), m(""), null);
>     }
> }
>
> Cheers,
> Bernard
>
>
>> Please let me know about the test results.
>>
>> Cheers,
>> Bernard
>>
>>>> Cheers
>>>> Maurizio
>>>
>>> Vicente