[lworld] RFR: JDK-8293321: [lworld] substitutability test shall support value class with cyclic membership

Fri Oct 28 19:33:03 UTC 2022

On Fri, 28 Oct 2022 17:15:19 GMT, Mandy Chung <mchung at openjdk.org> wrote:

> L-type allows cycles. `acmp` shall support value classes with cyclic membership.  This PR adds the support in the Java implementation of `acmp` (i.e. `ValueObjectMethods::isSubstitutable`). 
> 
> This patch includes [John's first cut to make a Y combinator method handle](https://github.com/openjdk/jdk/pull/10155) to enable a method handle to call itself recursively.    PR 10155 is adequate for internal use for now.  A separate RFE will follow to define such an API.
> 
> A separate issue [JDK-8296056](https://bugs.openjdk.org/browse/JDK-8296056) tracks the C2 `acmp` support.

Two values `x`, `y` are substitutable if and only if, for every access path `x.a.b.c`, `y.a.b.c` also exists (and vice versa), and moreover the two values of those two accesses are themselves substitutable.  (The call `.getClass()` is one possible path; the rest of the paths are field references, regardless of access protection.)  This definition can be simplified by stipulating that only paths `.a.b.c` which end in a non-value object (primitive, null, id-object) are significant; the equivalence is the same.

In the case of a deeply nested (probably recursively-typed) value, the number of such paths maybe exponentially large (compared with the number of object "nodes").  So a naive exploration of all paths will be infeasible.  For example, a "ladder" of binary nodes where both sides of each "rung" points to the same "ladder" of one less depth, will have 2^D complexity of enumerating all paths.

And cyclic value graphs (which must involve recursive types) have an infinite number of paths, which is even scarier.

That said, this is a known problem with known solutions.  (It is the DFA minimization problem in disguise, aka the RE equivalence problem, and also the type structural equivalence problem which the Algol designers inflicted on themselves.) In practice, in most cases, the naive algorithm will do fine.

https://www.ics.uci.edu/~goodrich/teach/cs162/notes/rm.pdf
https://cs.uwaterloo.ca/~dberry/FTP_SITE/reprints.journals.conferences/BerrySchwartz1979TypeEquiv.pdf

As we discussed offline, I suggest using a relatively native algorithm for a start, one that doesn't "remember where it has already been", but we should run a counter (of node visits), and throw `StackOverflowError` if the counter gets "unreasonably" large.  (Regardless of how many real stack frames were consumed.)   This is allowed by the JVMS, though we frown on it.  We can refine the algorithm over time.  There are lots of ideas in our heads and in the literature for making common cases work better.  By allowing SOE, this becomes a QOS rather than a correctness issue.

We might possibly come up with a solution for the general case that we are comfortable with.  (I can imagine how one would go about it, doing on-the-fly minimization moves on dual DFAs, in the course of a depth-first traversal… but it's tricky.  And the payoff is really small.)  I would want the algorithm to withstand even determined complexity attacks, from value-object graphs crafted by Evil Masterminds.  Since that's not a game I particularly want to play for very long, I think we will also keep the `StackOverflowError` option alive even when we have "smart" algorithms.  This can be tuned by a JVM switch.

For users who take umbrage at SOE when their value graphs get really big, we will have to say, "you're holding it wrong".  No decent language can completely prevent users from harming themselves via some correct but wrong-headed Turing-footgun.  I think this is such a situation.

-------------

PR: https://git.openjdk.org/valhalla/pull/802