Linear implementation of Tarjan's algorithm

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Nov 3 01:58:37 UTC 2017



On 02/11/17 23:26, B. Blaser wrote:
> Of course, it will still require some optimization to be sub-second
> (as you said), but this simple correction could do half of the job.
Hi Bernard,
unfortunately this optimization is unsound - there's no way to prove 
that P <: Comparable<P> is the same as P' <: Comparable<P'>, because of 
the f-bound. That's why the cache is there - it's not an optimization, 
it's something that prevents javac from doing the same check over and 
over again.
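
For reference, here is a minimal sketch of the kind of f-bounded 
declaration at play (a made-up example, not the code from the bug 
report):

    // An f-bounded type variable: P occurs in its own bound.
    static <P extends Comparable<P>> P max(P a, P b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

Because P occurs in its own bound, comparing the bound of P with the 
bound of a clone P' ends up comparing P with P' again, recursively.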

When such a loop occurs, the right thing to do is to say 'false', as it 
is not possible to prove that the bounds are indeed the same. Note that 
the fact that the two variables are clones is largely irrelevant here - 
we clone new tvars every time we check a new instance of the same 
generic method call/diamond constructor - so in a way these are all 
different variables which just happen to have the same name/same set of 
initial (declared) bounds.
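
To make the cloning point concrete, consider a hypothetical generic 
method; during inference, each call site below gets its own fresh copy 
of P, even though the copies share the same name and the same declared 
bound:

    class Demo {
        static <P extends Comparable<P>> P choose(P a, P b) {
            return a.compareTo(b) >= 0 ? a : b;
        }
        void test() {
            String  s = choose("a", "b");   // inference solves a fresh clone of P
            Integer i = choose(1, 2);       // ... and a distinct fresh clone of P
        }
    }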

Moreover, the 'loose' type equality test is an artifact of the past - it 
was introduced because the strict behavior (which was the default in 6) 
was, well, sometimes too strict - essentially, when a substitution 
rewrites the bound of a type variable (e.g. replacing U in the bound of 
'Z extends List<U>'), javac creates a different type variable with the 
replaced bound. If you performed the same replacement twice on the 
original variable, the two resulting vars would not be deemed 'equals' 
by javac, which is why this looser behavior was added. Some more 
comments can be found here:

http://hg.openjdk.java.net/jdk7/jdk7/langtools/file/tip/src/share/classes/com/sun/tools/javac/code/Types.java#l632
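
To illustrate the problem (with a toy model - javac's actual TypeVar 
and substitution machinery is considerably more involved), suppose 
substitution allocates a fresh variable whenever it rewrites a bound; 
then performing the same substitution twice yields two objects that are 
not equals(), even though they denote the same type:

    // Toy model of a type variable with a textual bound (not javac's TypeVar).
    final class TVar {
        final String name;
        final String bound;                    // e.g. "List<U>"
        TVar(String name, String bound) { this.name = name; this.bound = bound; }
        // Substitution allocates a *new* variable carrying the rewritten bound.
        TVar substBound(String from, String to) {
            return new TVar(name, bound.replace(from, to));
        }
        // No equals() override: identity-based equality, as for real tvars.
    }

    class SubstDemo {
        public static void main(String[] args) {
            TVar z  = new TVar("Z", "List<U>");
            TVar z1 = z.substBound("U", "String");
            TVar z2 = z.substBound("U", "String");
            // false, although both denote 'Z extends List<String>':
            System.out.println(z1.equals(z2));
        }
    }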

These changes were added as a consequence of the fix for:

https://bugs.openjdk.java.net/browse/JDK-6729401

Now, all this is relatively ancient history, which predates the Java 8 
changes - in fact, this 'legacy' loose behavior is very problematic - 
it's completely outside the spec, and, in the general case with 
f-bounds, it cannot even be proven to always complete (which is why the 
cache is there). During 8, I sidestepped this issue by adding the 
'strict' type equality check, which should become the 'standard' - but, 
as there was only so much code I could rewrite in one shot (:-)), I 
added the new routine beside the old one, instead of having the new 
routine completely replace the old one (and dealing with the 
compatibility fallout).

I think the time has come to do a bit of spring cleaning in this area, 
get rid of the legacy (and unsound) loose check, and see where we're at.

That said, speaking of inference performance, if we ever move the 
entire code base to use the strict equality check, your proposed 
optimization would not even apply - in a way, your optimization is 
exploiting the fact that that particular check uses the loose check 
instead of the strict one; with the strict one, two different type var 
instances (even if they are clones with the same bounds) will still be 
deemed different.
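
In terms of the toy model above, the difference between the two checks 
could be sketched like this (again, an illustration of the distinction, 
not javac's actual code):

    // 'Loose': compare names and bound structure - clones look the same.
    static boolean looseSame(TVar a, TVar b) {
        return a.name.equals(b.name) && a.bound.equals(b.bound);
    }

    // 'Strict': fall back to reference identity - distinct clones always differ.
    static boolean strictSame(TVar a, TVar b) {
        return a == b;
    }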

I still stand by my earlier analysis that the particular issue 
highlighted in 8152289 is not something that needs to be addressed by 
looking at the profiler. While you can make the bottleneck smaller and 
smaller, you still have an underlying behavior that doesn't scale with 
the number of nested calls - with increasing values of N, you do 10, 
100, 1000, 10000, ... checks - so no matter how fast the rewritten 
logic is, you can always add a bunch of new nested calls to bring the 
compilation time more or less back to what it was before the fix.
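
To see the scaling issue concretely, here is a hypothetical generator 
in the spirit of the 8152289 test case (the names Gen, Test and m are 
made up); it emits a source file with N nested calls to a generic 
method, and compilation time grows with the nesting depth regardless of 
how fast each individual check is:

    // Prints a Java source file containing N nested generic calls: m(m(...m(x)...)).
    public class Gen {
        public static void main(String[] args) {
            int n = Integer.parseInt(args[0]);
            StringBuilder expr = new StringBuilder("x");
            for (int i = 0; i < n; i++) {
                expr.insert(0, "m(").append(')');
            }
            System.out.println("class Test {");
            System.out.println("    static <T extends Comparable<T>> T m(T t) { return t; }");
            System.out.println("    String test(String x) { return " + expr + "; }");
            System.out.println("}");
        }
    }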

The real fix is to find a way for the incorporation machinery to avoid 
performing a number of steps that is exponential in the number of type 
variables involved, and to detect patterns that can be exploited to 
reduce the complexity of the incorporation step.

Thanks
Maurizio





