Re: Self-type

Fri Jun 19 21:49:24 UTC 2015

I have been thinking about how this feature could be implemented while remaining cost-conscious. 

At one end would be fully unified receiver and parameter types. This would be akin to fully unified receiver and other parameters in general and could require changing method descriptors to have them include the receiver type. And if we are willing to do that, we could add generic types into the descriptor too, while we’re at it. This would be very costly -- like you mention -- but in return we might find things like fully generic overloading, unified static-instance-method dispatch and on-demand reified generics emerging at the end.

Actually, I find the question about the other end more interesting. Here the goal would be to make some currently ill-formed programs well-formed while preventing the opposite and to reuse parts of the JLS where appropriate rather than starting a rewrite. And the chosen feature should be a proper subset of the ideal solution that won’t preclude an ideal solution from being implemented at some point in the future.

Here’s what I think could be implemented. First, if the language-generated receiver type and the annotation-less consumer-supplied receiver type are identical, disable the feature to retain source compatibility. At method declaration sites, reuse the type checks for return types on the customized receiver types, as both should be covariant. So with additional bounds considered, the first declaration would turn out to mean the same as the second:

    default <T extends String> String join(List<T> this, String delim) {}

    default <THIS extends List<T2>, T2 extends Object & String> String join(THIS this, String delim) {}

The currently ill-formed T2 extends Object & String syntax is there to keep array component types erased to Object in overridden methods in concrete subclasses, e.g. in an ArrayList<T>.
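For context, here is a minimal sketch of the receiver-parameter syntax these declarations build on, as Java 8 already accepts it. The receiver may be spelled out, but its type has to be the enclosing type itself, which is the "identical" case where nothing would change. The ReceiverParamDemo and Names classes are hypothetical, made up for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class ReceiverParamDemo {
    // Hypothetical example class, not part of any proposal.
    static class Names {
        private final List<String> items;

        Names(String... items) {
            this.items = Arrays.asList(items);
        }

        // Explicit receiver parameter: legal since Java 8, but today its type
        // must match the enclosing class, so it cannot narrow or widen anything.
        String join(Names this, String delim) {
            return String.join(delim, this.items);
        }
    }

    public static void main(String[] args) {
        // The receiver is not passed explicitly at the call site.
        System.out.println(new Names("a", "b").join(", "));
    }
}
```

The proposal above would relax exactly this restriction, letting the declared receiver type carry extra bounds.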

Next, at call sites, reuse the type checks for normal parameter types and incorporate type bounds from the receiver, but don’t consider the receiver itself as an additional parameter for overloading. So there is no overloading on just the receiver, but no backwards-compatibility problem either. That is not a big problem, as you can use differing method names as a workaround.

Next, within the method body, have the customized receiver type hide the language-generated receiver type, fully or perhaps partially. Here the typing weirdness of arrays rears its ugly head in a brand new context, as the bound of the same type variable can change from one instance method to the next. I don’t think there’s a general solution to this problem, because the subclass could be using a generic array with its component type erased or not erased, and unfortunately the type system can’t differentiate between these cases. The best that can be done here may be to provide a few knobs to force the component type into one or the other and have API developers deal with this as they come across it.
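To make the erased versus non-erased component type distinction concrete, here is a small sketch in today's Java. Both helper methods hold a T[] as far as the type system is concerned, yet the runtime component types differ. The class and method names are hypothetical.

```java
import java.lang.reflect.Array;

public class ComponentTypeDemo {
    // Erased variant: the array behind the T[] is really an Object[].
    @SuppressWarnings("unchecked")
    static <T> Class<?> erasedComponentType() {
        T[] array = (T[]) new Object[0];
        return array.getClass().getComponentType();
    }

    // Non-erased variant: the array is created with the requested component type.
    @SuppressWarnings("unchecked")
    static <T> Class<?> reifiedComponentType(Class<T> type) {
        T[] array = (T[]) Array.newInstance(type, 0);
        return array.getClass().getComponentType();
    }

    public static void main(String[] args) {
        // Same static type T[] with T = String, different runtime component types.
        System.out.println(ComponentTypeDemo.<String>erasedComponentType());
        System.out.println(reifiedComponentType(String.class));
    }
}
```

Nothing in the static types tells the two apart, which is why a knob for API developers may be the most that can be offered.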

Finally, here’s what the JVM view of this might look like. As a test case I’m using a minimal ArrayList<T> implements List<T> implementation. I’m encoding the language-generated receiver type into method names so I can choose which bridge methods to generate myself. First case, overriding a self-typed superclass method:

public interface List<T> extends Iterable<T> {
    @Generated(value = { "consumer" })
    <THIS extends List<T>> THIS add_List(THIS thiz, T t);
}

public class ArrayList<T> implements List<T> {
    protected @SuppressWarnings("unchecked") T[] array = (T[]) new Object[0];

    @Generated(value = { "javac" })
    public @Override <THIS extends List<T>> THIS add_List(THIS thiz, T t) {
        ArrayList<T> thiz2 = (ArrayList<T>) thiz;
        @SuppressWarnings("unchecked")
        THIS thiz3 = (THIS) add_ArrayList(thiz2, t);
        return thiz3;
    }

    @Generated(value = { "consumer" })
    public <THIS extends ArrayList<T>> THIS add_ArrayList(THIS thiz, T t) {
        assert this == thiz;
        thiz.array = Arrays.copyOf(thiz.array, thiz.array.length + 1);
        thiz.array[thiz.array.length - 1] = t;
        return thiz;
    }
}

You can see the bridge method javac would be generating. This is similar to what you get currently when overriding and using a covariant return type. Second case, not overriding a self-typed superclass method:

    @Generated(value = { "consumer" })
    default <THIS extends List<T>, T2 extends T> THIS addAll_List(THIS thiz, Iterable<T2> iterable) {
        assert this == thiz;
        THIS result = thiz;
        for(T2 item : iterable) {
            result = result.add_List(result, item);
        }
        return result;
    }
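As a point of comparison for the bridge methods discussed here, this sketch shows the synthetic bridge that javac already emits today when an override narrows its return type, made visible through reflection. Shape and Circle are hypothetical classes used only for illustration.

```java
import java.lang.reflect.Method;

public class BridgeDemo {
    static class Shape {
        Shape copy() { return new Shape(); }
    }

    static class Circle extends Shape {
        // Covariant return type: javac adds a synthetic "Shape copy()" bridge
        // to Circle that forwards to this method.
        @Override
        Circle copy() { return new Circle(); }
    }

    // Counts the compiler-generated bridge methods named "copy" in Circle.
    static int countBridgeCopies() {
        int count = 0;
        for (Method m : Circle.class.getDeclaredMethods()) {
            if (m.getName().equals("copy") && m.isBridge()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (Method m : Circle.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName()
                    + " " + m.getName() + "() bridge=" + m.isBridge());
        }
    }
}
```

The order in which getDeclaredMethods() lists the two copy methods is unspecified, so the sketch counts them rather than relying on position.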

Naively we could again generate the first bridge method in the subclass and then generate another bridge method to call the superclass implementation:

    @Generated(value = { "javac" })
    public @Override <THIS extends List<T>, T2 extends T> THIS addAll_List(THIS thiz, Iterable<T2> iterable) {
        ArrayList<T> thiz2 = addAll_ArrayList((ArrayList<T>) thiz, iterable);
        @SuppressWarnings("unchecked")
        THIS thiz3 = (THIS) thiz2;
        return thiz3;
    }

    @Generated(value = { "javac" })
    public <THIS extends ArrayList<T>, T2 extends T> THIS addAll_ArrayList(THIS thiz, Iterable<T2> iterable) {
        return List.super.addAll_List(thiz, iterable);
    }

But all this really amounts to is the one unstated checkcast instruction in the second generated method. Instead we can adjust the callers to call the superclass method directly. Making use of the extra type bounds from the receiver we then add the checkcasts to the call sites. So we don’t need these two new bridge methods after all and we still get the covariant return. This sounds like a win-win to me.
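The call-site cast can be approximated in today's Java with a static generic method that takes the receiver as an explicit first parameter. This is only a sketch under that assumption: the addAll helper below is a hypothetical stand-in for the self-typed method and uses java.util.ArrayList rather than the minimal classes above. Because THIS erases to List, javac inserts the checkcast to ArrayList at the call site, and no bridge method is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelfTypeCallSite {
    // Hypothetical stand-in for a self-typed method: the receiver is passed
    // explicitly and its static type is handed back to the caller.
    static <THIS extends List<T>, T> THIS addAll(THIS thiz, Iterable<? extends T> items) {
        for (T item : items) {
            thiz.add(item);
        }
        return thiz;
    }

    public static void main(String[] args) {
        // THIS is inferred as ArrayList<String>; the erased return type is List,
        // so the compiler emits the cast here, at the call site.
        ArrayList<String> list = addAll(new ArrayList<String>(), Arrays.asList("a", "b"));
        list.ensureCapacity(16); // ArrayList-only method, usable without a manual cast
        System.out.println(list);
    }
}
```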

It seems that the essence of self-types and even more could be implemented with what amounts to relatively minor changes to generated code within a restricted number of contexts, i.e. pretty cheaply. 

I’ve put the complete version of the code I’ve been playing with here:

https://gist.github.com/Overruler/5a9990c609b9ad0adfe7

-- 
Have a nice day,
Timo.


From: John Rose
Sent: Thursday, June 18, 2015 0:17
To: Timo Kinnunen
Cc: Alex Buckley, jdk9-dev at openjdk.java.net

On Jun 17, 2015, at 2:11 AM, Timo Kinnunen <timo.kinnunen at gmail.com> wrote:

> It seems to me that the receiver type being treated as a special case from the types of the other parameters is an unnecessary inconsistency.

I get what you are saying here, though I would say more carefully "some special rules for the receiver argument are unnecessarily different from the rules for other arguments".  Receivers will always be a little different from other arguments, because dynamic method selection introduces type shifting when an overloading is invoked, but those differences could be more carefully localized, in the Java of some parallel universe.  I would prefer we were in a world where the difference between "a.m(b)" and "m(a,b)" is little more than surface syntax, to be tweaked by tasteful library designers.

Here's my pet peeve in this vein:  Type inference rules treat receiver types as less coupled to constraints than other types; this is why some design patterns can only be written using Java statics, and not using instance methods—which (as you note here) is a pain, since the language syntax clearly favors non-static methods (for fluent APIs, as you refer to).  This irregularity weakens type inference in fluent builder expressions, when the thing being built has type variables.

But.  Pointing out the inconsistency is not even remotely close to deciding on a fix.  Making receiver types work consistently like other types would almost certainly require deep and destabilizing changes to the JLS.  Figuring out all the required changes and their impacts (on zillions of lines of code) is a long-term research project, somewhere (IMO) in complexity between inner classes and generics.

(Volunteers?)

— John