Collapsing the requirements

Tue Aug 6 21:03:09 UTC 2019

Good discussion!

On Aug 6, 2019, at 9:50 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
>> So, legal signatures will be:
>>  - QV;
>>  - LI;
>> and that’s it, right?
>> 
>> Q will continue to have its current semantic (flattenable, non-nullable, triggers pre/eager-loading).
>> L will continue to have its legacy semantic (indirection, nullable, no new loading rules)
> 
> Correct.  Nice and simple!  

Not completely simple.  The old contract of LV; will haunt us slightly.  Remember that LG; is a valid descriptor, for any garbage name G even if G doesn’t exist.  (E.g., “Lno/such/package/or/type!!;”.)

You can’t find all such LG;.  Therefore, LV; must be allowed as a possibility, on the same footing as LG;.

Note that reflecting over LG; will get a CNFE.  And the verifier will make only limited accommodation for such types, in effect allowing only “null” into such variables.

There’s nothing to be gained by trying to make the rules against LV; more strict than those for LG;.  Therefore, the interpretation of LV; should be “as if” the string V in that descriptor were truly a non-existing type, to be diagnosed at all the same times that any other LG; would be checked and diagnosed.
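
For concreteness, here is a minimal sketch in today's Java (no Valhalla features assumed) of how a well-formed descriptor with a garbage name behaves under reflection. The name `no.such.pkg.Garbage` and the class `GarbageDescriptorDemo` are made up for illustration; the point is only that the descriptor string is syntactically fine and the failure shows up when something tries to resolve it.

```java
import java.lang.invoke.MethodType;

public class GarbageDescriptorDemo {
    // Hypothetical garbage name: assumed not to exist on the class path.
    static final String GARBAGE_BINARY_NAME = "no.such.pkg.Garbage";
    // Well-formed method descriptor that mentions the same garbage type.
    static final String GARBAGE_DESCRIPTOR = "(Lno/such/pkg/Garbage;)V";

    /** True if a reflective lookup of the garbage name fails with CNFE. */
    static boolean lookupFails() {
        try {
            Class.forName(GARBAGE_BINARY_NAME);
            return false;
        } catch (ClassNotFoundException e) {
            return true;
        }
    }

    /** True if resolving the descriptor into a MethodType fails because the type is absent. */
    static boolean descriptorResolutionFails() {
        try {
            // A null loader means "use the system class loader".
            MethodType.fromMethodDescriptorString(GARBAGE_DESCRIPTOR, null);
            return false;
        } catch (TypeNotPresentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Class.forName fails: " + lookupFails());
        System.out.println("descriptor resolution fails: " + descriptorResolutionFails());
    }
}
```

Under the interpretation above, LV; for an inline V would be diagnosed at these same points, not earlier.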

> 
>> 
>>> 
>>> Note that the VM can optimize eclairs about as well as it could for LV; it knows that I is the adjunction of null to V, so that all non-null values of I are identity free and must be of type V.
>> 
>> Optimizing I might require some knowledge about V, but because V <: I, I could be loaded while V is not loaded yet.
> 
> If the rule is “always preload Q” (which I think is what John is suggesting), then this case cannot come up, because I’s class file will mention QV.  Similarly, the opposite case does not happen either, as we load super types first, so loading V will trigger loading I.  

Yup.  The only truly lazy scenario would be when some API uses only the LI; type, as a descriptor, not a CONSTANT_Class.  Then the normal contract for L-descriptors applies:  I.class isn’t loaded until there’s some specific need for I (as in a CONSTANT_MethodType).

That is pleasingly similar to the situation with today’s primitives and their wrappers:  “I” is hardwired but “java/lang/Integer” is not hardwired to the same degree (the verifier doesn’t have to load it always, for example).
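
A small illustration of that asymmetry, using only the existing `java.lang.invoke.MethodType` API (the class name `PrimitiveVsWrapperDescriptor` is just for this sketch): the primitive shows up in a descriptor as the single hardwired letter, while the wrapper is an ordinary L-descriptor naming a class.

```java
import java.lang.invoke.MethodType;

public class PrimitiveVsWrapperDescriptor {
    /** Builds the descriptor of a one-argument method with the given return and parameter types. */
    static String descriptorOf(Class<?> returnType, Class<?> paramType) {
        return MethodType.methodType(returnType, paramType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // int is the hardwired "I"; Integer is spelled out as an ordinary class name.
        System.out.println(descriptorOf(int.class, Integer.class)); // prints (Ljava/lang/Integer;)I
    }
}
```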

> Of course, we can twiddle these rules and get different answers, but this is my understanding based on the rules I have heard for load order.
> 
>> 
>>> 
>>> What we lose relative to V? is access to fields; it was possible to do `getfield` on a LV, but not on I.  If this is important (and maybe it’s not), we can handle this in other ways.
>> 
>> This is related to an open question that shows up in many places in this document.
>> What should be the nature of V’s super type? An interface or an abstract class?
>> If it is an abstract class, it could declare and access the fields.
>> The question extends beyond fields: what about method bodies?

Yes, these are interesting questions.  One thing that makes me happier about this model is the fact that several of the possible answers require no new JVM functionality, but are simply translation strategy decisions.

At the moment, I personally prefer the idea (out of several possible ideas) of keeping all concrete functionality inside the inline class V, lifting only API surface into I as (i) abstract methods, (ii) supers, and (iii) type variables, and further doing this lifting “the old fashioned way” by requiring javac to do the copying at compile time.  This is good enough to kick off experimentation with the resulting user model, IMO, if not in LW10 then in LW5 (if we need margin for adjustment).

Indeed, after that many questions follow, about fields, static methods, the role (if any) of non-interface supers such as ValObject (if not an interface), the possible role of covariance (or not) on V<:I within the V/I APIs, nested classes of V, type inference rules for V and I, support for user customization of I, alternative patterns other than I=V.Box, JVM or JLS support for defining various bits of the pattern, and so on.  (I’m sure I missed something!)  But simply copying the (public!) methods into an otherwise-empty I.class (as abstracts, plus supers & typevars) seems a great first cut to me.
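
As a rough sketch of what that first cut might look like if written by hand in today's Java: `Complex` stands in for the inline class V (emulated here as an ordinary final class, since inline classes are not available), and `ComplexBox` stands in for the otherwise-empty I. Both names, and the choice to mention I rather than V in the lifted signature of `plus`, are assumptions made for illustration, not a proposal for what javac would actually emit.

```java
// Stand-in for I: API surface only, every method abstract, no bodies.
interface ComplexBox {
    double re();
    double im();
    ComplexBox plus(ComplexBox other);
}

// Stand-in for the inline class V: final and immutable, holding all concrete functionality.
final class Complex implements ComplexBox {
    private final double re;
    private final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    @Override public double re() { return re; }
    @Override public double im() { return im; }

    // The body lives only in V; I just declares the signature.
    @Override public ComplexBox plus(ComplexBox other) {
        return new Complex(re + other.re(), im + other.im());
    }
}

public class LiftingDemo {
    static ComplexBox sum() {
        ComplexBox a = new Complex(1.0, 2.0);   // V widens to I for free
        return a.plus(new Complex(3.0, 4.0));   // dispatched through I, implemented in V
    }

    public static void main(String[] args) {
        ComplexBox s = sum();
        System.out.println(s.re() + " + " + s.im() + "i");
    }
}
```

Whether the lifted signatures should mention I or V is exactly the covariance question listed above; the sketch picks I only to keep the interface self-contained.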

>> Should they be in V or in I? This has an impact on the type of ’this’ in these
>> methods, even if this model has the nice property that ’this’ will always point
>> to an instance of V (as long as the JVM protects the model, and prevents external
>> forces (JVMTI, Unsafe, etc.) from breaking the special and unique relationship
>> between I and V). And the type of ’this’ will also impact the way methods are
>> invoked (invokevirtual vs invokeinterface).
> 
> There’s a longer discussion to be had about bringing abstract classes and interfaces closer together, or allowing abstract class super types of values, and if so, how.  I have some vague ideas of how the VM and language could handle this combination; rather than dive into that now, I’ll just say that here are the places where the concept of inline-extends-abstract-class has come up:
> 
>  - Migrating VBC to inline classes
>  - Inline records (as there is an abstract Record super type)
>  - Whether ValObject is an interface or an abstract class
> 
> Which is to say, we should untangle this knot, which I think is pretty closely related to the RefObject/ValObject knot, so I would think it is best to untangle them together.

+1  I think there are several ways forward on this front, and we can pick a good one.

> 
>> 
>>> 
>>> #### With sugar on top, please
>>> 
>>> We can provide syntax sugar (please, let’s not bike shed it now) so that an inline class _automatically_ acquires a corresponding interface (if one is not explicitly provided), onto which the public members (and type variables, and other super types) of C are lifted.
>> 
>> Does the interface only declare the public methods, or does it also provide the implementations (as default methods)?
> 
> If we extract an interface from the class mechanically, we would lift the public methods, the super types, and the type variables to the interface.  If the user writes the interface by hand, they will do what they’re going to do.

+1; a good first cut and maybe even the last cut.

> 
>> 
>>> For sake of exposition, let’s say this is called `C.Box` — and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.)  
>> 
>> Is it a new feature? Or just an idea how it could be implemented in the future?
>> 
>> Because I’ve tried to compile this:
>> 
>> public class C implements C.Box {
>>    static public interface Box {
>> 
>>    }
>> }
> 
> Yes, we would have to address this.  The cycle here is not a real cycle, in that Box does not depend on C for anything, except it happens to live there.  

As the author of that particular restriction I would support lifting it, at least in the case of interfaces, and probably also of any “static” nested class.  The proposed inheritance would be ill-founded if the outer were to extend a non-static inner, which is why it’s a restriction in the first place, but I widened it to a simpler rule out of an abundance of caution.  Time to change it.

> 
>>> #### Boxing conversion
>>> 
>>> Given the constraints of the eclair relationship, it would be reasonable for the compiler to derive from this that there is a boxing conversion between C and I (I is just the value set of C, plus null — which is the relationship boxes have with their corresponding primitives.)  The boxing operation is a no-op (since C <: I) and the unboxing operation is a null checking cast.
>> 
>> Could we assume that boxing/unboxing would be handled by the static compiler (like primitive boxing today),
>> and there’s no expectation that the JVM will do magic boxing when needed? (Not considering auto-bridges yet).
> 
> Yes.  In fact, we only need this in one direction; since C <: I, the conversion C -> I comes for free (subtyping), it is only the conversion I -> C that would require an unboxing conversion.  The compiler would introduce the necessary casts (which the VM can optimize to null checks.)

It’s less than the full boxing/unboxing pattern, since “boxing” is subsumed by simple widening to a super.  Also, “unboxing” is just a cast (narrowing to a sub).  We might need a new term to express this hybrid between full-on “unboxing” and a plain casting conversion, so the JLS can say “unboxing and devoxing” (or whatever) wherever today’s unboxing comes into play.
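
Spelled out in today's Java, the two directions might look roughly like this. `Meter` emulates the inline class and `MeterBox` its companion; both are hypothetical names, and the explicit `Objects.requireNonNull` is there because a plain Java cast lets null through, whereas the conversion discussed here is meant to reject it.

```java
import java.util.Objects;

// Stand-in for I (nullable companion).
interface MeterBox {
    long value();
}

// Stand-in for the inline class V (emulated as a final class).
final class Meter implements MeterBox {
    private final long value;
    Meter(long value) { this.value = value; }
    @Override public long value() { return value; }
}

public class UnboxingDemo {
    /** "Boxing": plain widening to the super, no operation needed. */
    static MeterBox box(Meter m) {
        return m;
    }

    /** "Unboxing": a narrowing cast preceded by a null check. */
    static Meter unbox(MeterBox b) {
        return (Meter) Objects.requireNonNull(b, "cannot unbox null");
    }

    /** True if unboxing null is rejected with an NPE. */
    static boolean unboxNullFails() {
        try {
            unbox(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Meter m = new Meter(42);
        System.out.println(unbox(box(m)).value());
        System.out.println(unboxNullFails());
    }
}
```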

> 
>>> 
>>> The world is indeed full of existing utterances of `LOptional`, and they will still want to work.  Fortunately, Optional follows the rules for being a value-based class.  We start with migrating Optional from a reference class to an eclair with a public abstract class and a private value implementation.  Now, existing code just works (source and binary) — and optionals are values.  But, this isn’t good enough; existing variables of type Optional are not flattened.
>> 
>> Notable difference with previous statements: here the eclair is made of an inline class and an abstract class
>> (instead of an inline class and an interface). I assume this is for backward compatibility (Optional’s methods
>> are currently invoked using invokevirtual and not invokeinterface).
> 
> Correct.  There are multiple ways to handle this.  One is to allow eclairs with abstract classes; another is to blur the distinction between abstract class and interface so that we can make Optional an interface and support the invokevirtual call sites in the wild.  I think I prefer the former, but once we start to untangle the ValObject/RefObject knot, I suspect we’ll know more.

My long-term wish list of JVM cleanups already includes deprecating invokeinterface and upgrading invokevirtual to cover its job.  This is a fine time to think about doing that.
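
As a loose analogy only (this is the existing method handle API, not the bytecode change being wished for), `MethodHandles.Lookup.findVirtual` already presents one lookup mode for both kinds of receiver: the same call works whether the named type is an interface or a class. The class name `UnifiedVirtualLookup` and helper `sizeVia` are made up for this sketch.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

public class UnifiedVirtualLookup {
    /** Looks up size() on the given receiver type (class or interface) and invokes it. */
    static int sizeVia(Class<?> receiverType, List<String> list) {
        try {
            MethodHandle mh = MethodHandles.lookup()
                    .findVirtual(receiverType, "size", MethodType.methodType(int.class));
            // invoke (not invokeExact) so the List argument is cast to the receiver type as needed.
            return (int) mh.invoke(list);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b"));
        System.out.println(sizeVia(List.class, list));      // receiver type is an interface
        System.out.println(sizeVia(ArrayList.class, list)); // receiver type is a class
    }
}
```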

> 
>> Having V’s super type be an abstract class, some additional issues have to be considered.
>> If both V and V’s super are classes (abstract or not), they both can declare fields, so they
>> could end up having different layouts. Even if javac checks against that, manually crafted
>> class files and instrumentation frameworks injecting fields (with redefineClass) could create
>> situations where a mismatch exists between V and V’s super.
>> Would this cause issues? Should the JVM guard against that? To be investigated.

Indeed.  My thought here is that fields inherited into an inline type would be completely taken over by the inline type; the layout of the abstract super would *not* be reused, so there *would* be mismatches between V and its super.  We’d have to distinguish carefully between uses of fields inside an inline instance (which are always “full custom”) and fields inside a classic “identity” (indirect) instance, which are always set inside the super and inherited as a full layout.  Unsafe field offsets would be subject to restrictions:  You can always use them on the declaring class if it’s concrete, but if a field is inherited into an inline you somehow have to determine the field offset relative to the particular inline class.  These restrictions apply to numeric offsets.  For symbolic references the problem is probably not so bad.  We can probably mandate that a symbolic reference to an inherited field must mention the inline type using the field, not the abstract declaring it; a similar effect is already obtained by the rules of protected fields.  Maybe we get some useful leverage from mandating that all fields inherited into an inline are protected?  Just brainstorming here… As Brian says, there are details to work out.

I’d be happy to exclude fields for now and have abstract superclasses define only behavior and statics, not instance state.  Or, if the problem is confined to just ValObject and methods of the Object protocol, I’d be OK with making ValObject be an interface, *but* a special one that can hold methods for the Object protocol (which is forbidden for most interfaces), and maybe also final methods (which is also forbidden for interfaces).  Like I say, we have multiple options.

> 
> It would have to be worked out.  I think John said something like “let the language guard against inline value classes extending inappropriate abstract classes, and if the VM sees inline class extend an abstract class, ignore the fields and the ctors.”  This is probably a reasonable first-order-approximation if we decide to go this route.

Yeah; if there are no fields then I think we can make a structural rule that the abstract’s constructor is just as free of behavior as that of an interface.  The JVM can verify that it is a bare call to Object.<init>()V or whatever is next up the chain, and javac would forbid constructors to be coded.
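
A reflective approximation of the “no fields” half of such a structural rule, as a sketch only: the real check would live in the JVM (and would also inspect the constructor bytecode, which plain reflection cannot do). The class names `Behavioral` and `Stateful` and the helper `declaresNoInstanceFields` are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class StatelessSuperCheck {
    // Abstract super that defines only behavior and statics: acceptable under the rule.
    static abstract class Behavioral {
        static final int LIMIT = 10;
        abstract int size();
        boolean isEmpty() { return size() == 0; }
    }

    // Abstract super that declares instance state: would be rejected.
    static abstract class Stateful {
        protected int count;
    }

    /** True if neither the type nor any superclass below Object declares an instance field. */
    static boolean declaresNoInstanceFields(Class<?> type) {
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(declaresNoInstanceFields(Behavioral.class));
        System.out.println(declaresNoInstanceFields(Stateful.class));
    }
}
```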

If there are fields then there are complications to work out…  But I’ll stop brainstorming/ratholing now, since there’s more important stuff to consider.

> 
>> 
>>> There are a few ways to get there.  One is to treat this problem as protecting such classes from uninitialized fields or array elements; another is to ensure that such classes (a) have no public fields and (b) perform the correct check at the top of each method (which can be injected by the compiler.)  I don’t want to solve that problem right here, but I think there are enough ways to get there that we can assume this isn’t a hard requirement.
>> 
>> Would (b) be applied to non-static inner inline classes, or are they definitively considered as a lost cause?
>> Currently they can throw a NPE which is not so bad after all.
> 
> Depends on how early we can guarantee that NPE.  If the class might do a bunch of side effects before hitting the dereference of the outer pointer, then we might leave things in an inconsistent state.  If we can fail faster, that is good.  This area definitely needs investigation.

Suppose C.IV is an inner inline class of C, in which both C.this and IV.this are in scope.  It might be OK if IV.default is NPE-happy *if* we can make it harder to observe.  There are lots of things we could try to do this, but perhaps the simplest thing to do is take Remi’s remedy, to confine such types to a non-public role.  A companion interface could be placed into the public API of C, as a replacement, and C would be free to pass around either null or valid instances of C.IV (but not IV.default, which C would avoid).  Since IV would be non-public, “nobody but family” would be making arrays or fields of type IV.  Maybe that’s enough; I think it’s worth an experiment.  Maybe more specific tactics would work also, such as having the JLS forbid uninitialized fields and array construction of type C.IV *outside of C’s nest*.  The JVM would allow such things, and they’d have NPE risks, but non-family would be firmly discouraged by the language from declaring variables that initialize to IV.default.  An explicit mention of IV.default is probably also to be discouraged; if C wants to export a constant that exposes this NPE-risky value, that’s the business of C’s author, but the language can forbid it outside of C’s nest.  Lots to talk about here, but we have (as I say) multiple options that seem OK.

FTR, I think Remi’s remedy (of confining inlines to non-public) is a little too restrictive for inlines in general, although maybe it’s a conservative thing to try in LW5, when we are running user model experiments.  If we want to start using inlines as “new numerics” (B-float, etc.) it’s really unfriendly to require users to encounter them only via their companion interfaces.

> 
>> The model looks promising, but a more precise specification of eclairs would be helpful
>> to estimate the impact on the JVM:
>> 
>>  - What is the nature of V’s super?

Low impact option for JVM:  Just an interface for starters, to be adjusted for ValObject later as needed.

>>  - How are fields/methods declared/implemented between V and V’s super?

Low impact:  javac takes responsibility for copying stuff up from V to V’s super.  JVM just reads the class file.

>>  - Are there any special requirements regarding static members between V and V’s super?

Low impact:  JVM just does what it’s told according to whatever is in V.class and I.class, using standard rules.

>>  - Is there a requirement that V and V’s super share the bodies of their non-static public methods?

Low impact:  javac plants abstract methods inside I.class and the JVM does what it’s told.  (Javac is handling the Mirandas this time.)

> Good questions, I hope to have answers eventually.  If you have preferred answers, please share your thinking.

(Done for my part, see above!)

— John