Linear implementation of Tarjan's algorithm

B. Blaser bsrbnd at gmail.com
Fri Nov 3 13:17:37 UTC 2017


On 3 November 2017 at 02:58, Maurizio Cimadamore
<maurizio.cimadamore at oracle.com> wrote:
>
>
> On 02/11/17 23:26, B. Blaser wrote:
>>
>> Of course, it will still require some optimization to be sub-second
>> (as you said), but this simple correction could do half of the job.
>
> Hi Bernard,
> unfortunately this optimization is unsound - there's no way to prove that P
> <: Comparable<P> is the same as P' <: Comparable<P'>, because of the
> f-bound. That's why the cache is there - it's not an optimization, it's
> something that prevents javac from doing the same check over and over again.
>
> When such a loop occurs, the right thing to do is to say 'false', as it is
> not possible to prove that the bounds are indeed the same.

If we want to prove that "P <: Comparable<P>" is the same type as "P'
<: Comparable<P'>", we have to do something like the following lines
of Prolog:

% same(P1, P2) :- samesym(P1, P2), samebounds(comparable(P1), comparable(P2)).
% samebounds(comparable(P1), comparable(P2)) :- samesym(P1, P2),
%     cache(comparable(P1), comparable(P2)).

To prove 'same(P1, P2)', we have to establish both 'samesym(P1, P2)' and
'cache(comparable(P1), comparable(P2))'.
Looking at the implementation, we find that 'samesym(P1, P2)' is given
by 'P1.tsym == P2.tsym', while 'cache(comparable(P1), comparable(P2))'
currently returns 'false'.

Now, we can look at the truth table for 'samesym()' and 'cache()':

% samesym(P1, P2) :- true.
% cache(comparable(P1), comparable(P2)) :- true.
%
% ?- same(P1, P2).
% true.

% samesym(P1, P2) :- true.
% cache(comparable(P1), comparable(P2)) :- false.
%
% ?- same(P1, P2).
% false.

% samesym(P1, P2) :- false.
% cache(comparable(P1), comparable(P2)) :- true.
%
% ?- same(P1, P2).
% false.

% samesym(P1, P2) :- false.
% cache(comparable(P1), comparable(P2)) :- false.
%
% ?- same(P1, P2).
% false.

We see that the only way to prove 'same(P1, P2) = true' is to
postulate that 'cache(...) = true', which is what I suggested.

If we make the opposite hypothesis that 'cache(...) = false', which is
currently done, we will never be able to prove that 'same(P, P) =
true'... but the strict comparison would do that successfully!
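The two hypotheses can be sketched with a small, self-contained Python
model. This is only an illustration, not javac's actual code: 'TypeVar',
'ClassType', 'make_fbounded' and 'same' are hypothetical names standing
in for javac's type variables, class types and loose equality check; the
'on_cycle' flag plays the role of the 'cache(...)' postulate.

```python
class TypeVar:
    """Models a type variable: a shared symbol plus a bound."""
    def __init__(self, sym):
        self.sym = sym      # plays the role of tsym
        self.bound = None   # filled in afterwards, to allow f-bounds

class ClassType:
    """Models a parameterized class type such as Comparable<P>."""
    def __init__(self, name, args):
        self.name = name
        self.args = args

def make_fbounded(sym):
    # Build P with the f-bound P <: Comparable<P>.
    p = TypeVar(sym)
    p.bound = ClassType("Comparable", [p])
    return p

def same(t1, t2, on_cycle, seen=frozenset()):
    """Structural sameness; 'on_cycle' is the value assumed when a pair
    already under comparison is met again - i.e. the cache(...) choice."""
    key = (id(t1), id(t2))
    if key in seen:
        return on_cycle
    seen = seen | {key}
    if isinstance(t1, TypeVar) and isinstance(t2, TypeVar):
        # samesym(P1, P2), samebounds(...)
        return t1.sym == t2.sym and same(t1.bound, t2.bound, on_cycle, seen)
    if isinstance(t1, ClassType) and isinstance(t2, ClassType):
        return (t1.name == t2.name and len(t1.args) == len(t2.args)
                and all(same(a, b, on_cycle, seen)
                        for a, b in zip(t1.args, t2.args)))
    return False

p1 = make_fbounded("P")
p2 = make_fbounded("P")   # a clone with the same symbol

print(same(p1, p2, on_cycle=True))    # True:  postulate cache(...) = true
print(same(p1, p2, on_cycle=False))   # False: the current hypothesis
print(same(p1, p1, on_cycle=False))   # False: even same(P, P) fails
```

This mirrors the truth table above: with 'on_cycle=True' the check
succeeds exactly when the symbols match, whereas with 'on_cycle=False'
the f-bound makes every comparison - including a variable against
itself - come back 'false'.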

What do you think (for the rest, I agree with you)?

Bernard


> Note that the
> fact that the two variables are clones is largely irrelevant here - we clone
> new tvars every time we check a new instance of the same generic method
> call/diamond constructor - so in a way these are all different variables
> which just happen to have the same name/same set of initial (declared)
> bounds.
>
> Moreover, the 'loose' type equality test is an artifact of the past - it was
> introduced because the strict behavior (which was the default in 6) was,
> well, sometimes too strict - essentially, if bounds of a type variable are
> replaced for other type variables (e.g. replace U in the bound of 'Z extends
> List<U>'), javac creates a different type-variable with the replaced bound.
> If you performed the same replacement twice on the original variable, the
> resulting replaced vars would not be deemed 'equals' by javac, which is why
> this looser behavior was added. Some more comments are added here:
>
> http://hg.openjdk.java.net/jdk7/jdk7/langtools/file/tip/src/share/classes/com/sun/tools/javac/code/Types.java#l632
>
> These changes have been added as a consequence of the fixes for:
>
> https://bugs.openjdk.java.net/browse/JDK-6729401
>
> Now, all this is relatively ancient history, which predates the Java 8
> changes - in fact, this 'legacy' loose behavior is very problematic - it's
> completely outside the spec, and, in the general case with f-bounds, it
> cannot even be proven to always complete (which is why the cache is there).
> During 8, I sidestepped this issue by adding the 'strict' type equality
> check, which should become the 'standard' - but, as there was only so much
> code I could rewrite in one shot (:-)), I added the new routine beside the
> old one, instead of having the new routine completely replace the old one
> (and dealing with the compatibility fallout).
>
> I think the time has come to do a bit of spring cleaning in this area,
> get rid of the legacy (and unsound) loose check, and see where we're at.
>
> That said, speaking of inference performances, if we ever move the entire
> code base to use the strict equality check, your proposed optimization would
> not even apply - in a way your optimization is exploiting the fact that that
> particular check is using the loose check instead of the strict one - with
> the strict one, two different type var instances (no matter if they are
> clones, and even have same bounds) will still be deemed different.
>
> I still stand by my earlier analysis that the particular issue highlighted
> in 8152289 is not something that needs to be addressed by looking at the
> profiler. While you can make the bottleneck smaller and smaller, you still
> have an underlying behavior that doesn't scale with the number of nested
> calls - with increasing values of N, you do 10, 100, 1000, 10000,
> 100...000 checks - so no matter how fast the rewritten logic is, you can
> always add a bunch of new nested calls to bring the compilation time more
> or less back to what it was before the fix.
>
> The real fix is to find a way so that the incorporation machinery doesn't
> have to actually perform a number of steps that is exponential in the number
> of the type variables involved, and to detect patterns that can be exploited
> in order to reduce the complexity of the incorporation step.
>
> Thanks
> Maurizio


More information about the compiler-dev mailing list