Reference-default style

Fri Feb 7 17:34:48 UTC 2020

I want to combine this discussion with the question about whether inline 
classes can extend abstract classes, and with the "reference projection" 
story in general.

Our initial inclination was to say "no abstract class supertypes" 
(largely on the basis that interfaces are a cleaner way to extend 
contracts) and therefore that the reference projection would always be 
an interface.  This felt clean, but opened up a pile of questions, since 
there are all sorts of things that interfaces can't express (such as 
package-private methods).

For example, here's the sort of challenge this generates. Suppose we 
want to translate:

     inline class C<T> implements I { /* methods */ }

as

     // TRANSLATION A
     ACC_INLINE class C<T> implements C$ref<T>, I, InlineObject {
         /* methods */
     }

     sealed interface C$ref<T> extends I permits C {
         /* abstract versions of the public methods */
     }

where we lift the public methods onto the interface.  But what about the 
non-public methods?  We can't represent them in an interface.  Well, one 
trick we have in our bag is that, when we encounter the use of the ref 
projection as the receiver `ref.m()`, the compiler can cast it to the 
concrete type and invoke on that: `((C) ref).m()`.  This is safe because 
a value of the ref projection is either a value of the concrete class, 
or null; in both cases, the two yield identical results.  But, that's 
kind of yucky, not because the trick is inherently yucky, but because 
we're only doing it some of the time, which seems likely to cause other 
problems down the road.  (Also, this trick depends on the inline and 
reference classes being equally accessible. That's not the case in the 
current draft, but there's a way to get there, let's put a pin in that.)
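
To make the cast trick concrete, here is a minimal sketch in plain Java 17, which has no inline classes: a sealed interface stands in for the ref projection and a final class for the inline class. The names `PointRef`, `Point`, `doubled` and `CastTrickDemo` are hypothetical and appear nowhere in the proposal.

```java
// Stand-in for C$ref: only public methods can be lifted onto it.
sealed interface PointRef permits Point {
    int x();
}

// Stand-in for the inline class C.
final class Point implements PointRef {
    private final int x;

    Point(int x) { this.x = x; }

    @Override
    public int x() { return x; }

    // Package-private, so it cannot be declared on the interface.
    int doubled() { return 2 * x; }
}

final class CastTrickDemo {
    // What the compiler would emit for `ref.doubled()`: cast, then invoke.
    static int doubledVia(PointRef ref) {
        return ((Point) ref).doubled();
    }

    public static void main(String[] args) {
        System.out.println(doubledVia(new Point(21))); // prints 42
    }
}
```

Because `PointRef` permits only `Point`, the cast can never fail with a `ClassCastException`; a null `ref` passes the cast and then throws `NullPointerException` at the invocation, just as a direct call on a null receiver would.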

If we assume that, _at least at the VM level_, inline classes can extend 
suitable abstract classes, this offers us an alternate translation 
story: the reference projection is an abstract class (one that is still 
sealed only to permit the inline class.)  Then this anomaly goes away, 
which is nice:

     // TRANSLATION B
     ACC_INLINE class C<T> extends C$ref<T> implements I, InlineObject {
         /* methods */
     }

     sealed abstract class C$ref<T> implements I permits C {
         /* abstract versions of the methods */
     }

This avoids the asymmetry, but now, here comes John to say "Yuck, you're 
repeating yourself.  Why lift the methods to the interface at all?"  
Which brings us to the philosophical question Dan raises, which is: what 
is the reference projection, really?  Is it a fundamental part of the 
language model, or something that is just a convenience, to be produced 
by compiler sugar and/or design patterns?

The reference projection is largely a _language fiction_, like generics 
or checked exceptions.  (At the VM level, `QC <: LObject`; this is what 
makes our inline narrowing and widening conversions cheap.)  The more we 
can make these two types work together, the better.  (Sealing helps; 
nestmates help; we can do more.)  John would say: let's make the 
abstract class _empty_:

     // TRANSLATION C
     sealed abstract class C$ref<T> implements I permits C {
         /* no methods */
     }

and then handle _all_ uses of C$ref as a receiver by casting to C.  This 
is more uniform, and therefore less error-prone -- and is more in line 
(heh, inline) with its role as a language fiction.
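
A rough way to see what translation (C) implies, again sketched in plain Java 17 with hypothetical names (`BoxRef`, `Box`, `EmptyRefDemo`): the ref class declares nothing, so every use of it as a receiver, public methods included, has to go through the cast.

```java
// Stand-in for an empty C$ref: no methods at all.
sealed abstract class BoxRef permits Box {
}

// Stand-in for the inline class C: all fields and methods live here.
final class Box extends BoxRef {
    private final String value;

    Box(String value) { this.value = value; }

    public String value() { return value; }
}

final class EmptyRefDemo {
    // Every `ref.value()` is compiled as a cast followed by the call.
    static String valueOf(BoxRef ref) {
        return ((Box) ref).value();
    }

    public static void main(String[] args) {
        System.out.println(valueOf(new Box("hi")));                    // prints hi
        System.out.println(BoxRef.class.getDeclaredMethods().length); // prints 0
    }
}
```

The reflection call is only there to show that nothing is declared on the ref class, so a call site naming it as the method owner would have nothing to resolve against.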

Let's put this down for a minute and talk about reference-default.  Dan 
makes the argument that migration is not the only reason why we might 
want reference-default inline classes.  In the model currently on the 
table, we have an uneasy combination of "if you want inline default, you 
get nice language sugar, if you want reference-default, you code it 
yourself and the language tries to guess at what you mean."  Dan's (3), 
which lets you declare inline classes one way and pick which way the 
"good name" goes, seems a lot simpler:

     // C and C.val refer to the same class, C
     // compiler generates C and C$ref
     /* val-default */ inline class C { }

     // D and D.ref refer to the same class, D
     // compiler generates D and D$val
     ref-default inline class D { }

This allows us to bring the reference projection closer to the inline 
class uniformly, always generating them both.  This eliminates potential 
anomalies and guesswork, and also eliminates any chance that the ref and 
val flavors could have different accessibilities, allowing the cast 
trick to work uniformly.  This seems a significant consolidation; every 
inline class has a ref projection; the ref projection is always 
generated at the same time as the val; we can safely assume useful 
relationships between them.  (It is worth noting that migrating between 
ref-default and val-default will not be a compatible move, so choose 
wisely.)

OK, now let's come back to translation.  A primary use case for 
ref-default, though not the only one, is migration. In this case, we 
know that there will be classfiles out there that do `invokevirtual 
C.foo()`.  And under translation (C), these won't work, since the class 
that keeps the legacy name would be the empty ref projection, with no 
`foo()` to link to.  We can refine our translation accordingly, where C 
is a val-default inline class and D is ref-default:

     // TRANSLATION D
     ACC_INLINE class C<T> extends C$ref<T> implements I, InlineObject {
         /* fields and methods */
     }

     sealed abstract class C$ref<T> implements I permits C {
         /* nothing */
     }

     ACC_INLINE class D$val<T> extends D<T> implements I, InlineObject {
         /* fields, but no methods */
     }

     sealed abstract class D<T> implements I permits D$val {
         /* methods (which cast D to D$val internally to access fields) */
     }

This means that the fields will always be on the val class, and the 
methods will always be on the _default_ projection.
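
The ref-default half of translation (D) can be approximated in plain Java 17 as follows. This is only a sketch under stated assumptions: `Money` plays the role of D, `MoneyVal` the role of D$val, and both names (and `RefDefaultDemo`) are made up for illustration.

```java
// Stand-in for ref-default D: the methods live on the abstract class,
// so code compiled against the old identity class `Money` still finds them.
sealed abstract class Money permits MoneyVal {
    public long cents() {
        // Cast `this` to the val class to reach the field.
        return ((MoneyVal) this).cents;
    }

    public Money plus(Money other) {
        return new MoneyVal(cents() + other.cents());
    }
}

// Stand-in for D$val: fields, but no methods.
final class MoneyVal extends Money {
    final long cents;

    MoneyVal(long cents) { this.cents = cents; }
}

final class RefDefaultDemo {
    public static void main(String[] args) {
        Money a = new MoneyVal(150);
        Money b = new MoneyVal(275);
        System.out.println(a.plus(b).cents()); // prints 425
    }
}
```

The internal cast is safe for the same reason as before: sealing guarantees that any `Money` instance is a `MoneyVal`.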

Alternately, we can pick translation (E), where we _always_ put the 
fields on the val class and _always_ put the methods on the reference 
projection, and the methods are just inherited by the val projection.  
Or, translation (F), where for ref-default we duplicate the methods onto 
the ref projection.

So, summary:

  - Yes, we should figure out how to support abstract class supertypes 
of inline classes, if only at the VM level;
  - There should be one way to declare an inline class, with a modifier 
saying which projection gets the good name;
  - Both the ref and val projections should have the same accessibility, 
in part so that the compiler can freely use inline widening/narrowing as 
convenient;
  - We would prefer to avoid duplication of the methods on both 
projections, where possible;
  - The migration case requires that, for ref-default inline classes, we 
translate so that the methods appear on the ref projection.

On 12/20/2019 3:04 PM, Brian Goetz wrote:
>
>> 1) As a design pattern
>
> This was the strawman starting point, shortly after the JVMLS meeting, 
> which kicked off the "eclair" notion.  While this one seems like "the 
> simplest thing that could work", it strikes me as too simple.
>
> When some version of this approach was floated much earlier, Stephen 
> commented "I'm not looking forward to making up new names for the 
> inline flavor of LocalDateTime and friends."  I share this concern, 
> but 100x so on behalf of the clients -- I don't want to force clients 
> to have to keep a mental database of "what is the inline flavor of 
> this called."  So I think it's basically a forced move that there is 
> some mechanical way to say "the other flavor of T".
>
> <syntax-digression>
> Several folks have come out vocally in favor of the Foo / foo naming 
> convention, which could conceivably satisfy this requirement.  But, I 
> see this as a move we will likely come to regret.  (Among other 
> things, there goes our source of conditional keywords, forever.  On 
> its own, that's a lot of damage to the future evolution of the language.)
> </syntax-digression>
>
> The "mechanical way to describe the reference 
> companion/projection/pair/whatever" becomes even stronger when we get 
> to specialized generics, as we'll need to be able to say `T.ref` for a 
> type variable `T` (this is, for example, the return type of 
> `Map::get`.)  The other direction is plausible too (when `T extends 
> InlineObject`), though I don't have compelling examples of this in 
> mind right now, so it's possible that this is only a one-way requirement.
>
>> 2) As an "advanced" feature of inline classes
>>
>> This is the State of Valhalla strategy: inline classes are designed 
>> to be inline-default, but as a special-case feature, you can also 
>> declare the 'Foo.ref' interface, give it a name, and wire it up to 
>> the inline class declaration.
>>
>> In reference-default style, the programmer gives the "good name" to 
>> the reference projection, and either gives an alternate name to the 
>> inline class or is able to elide it entirely (in that case, clients 
>> use 'Foo.inline').
>>
>> Ways this is different than (1):
>> - The 'Foo.inline' type operator
>> - Implicit conversions (although sealed types can get us there in (1))
>> - There are two types, not three (and two JVM classes, not three)
>> - Opportunities for "boilerplate reduction" in the two declarations
>
> Much of the generality of (2) comes from the goals of migrating 
> primitives to just be declared classes, while retaining the spelling 
> `Integer` for the ref projection, and not having _two_ box types. If 
> we're willing to special-case the primitives, then we may be able to 
> do better here.
>
>> 3) As an equal partner with inline-default
>>
>> An inline class declaration introduces two types, an inline type and 
>> a reference type. But a modifier on the declaration determines 
>> whether the "good name" goes to the inline type or the reference 
>> type. The other type can be derived using an operator ('Foo.ref' or 
>> 'Foo.inline'). There's never a need for an alternate name.
>>
>> In this case, the language isn't biased to one style or the other; 
>> each declaration picks one. The trade-off is that clients need to 
>> keep track of one more bit when thinking about the inline class ("Is 
>> this a *foo* inline class or a *bar* inline class?" Actual 
>> terminology to be bikeshedded...)
>
> In a previous iteration, we had an LV/QV duality at the VM level, 
> which corresponded to a null-default/zero-default duality at the 
> language level.  We hated both of these (too much complexity for too 
> little gain), so we ditched them.  What you're proposing is to 
> reintroduce a new duality, `ref-default` vs `inline-default`, which 
> would arbitrate custody of "the good name".
>
> What I like about this is that _both_ `Foo.ref` and `Foo.inline` 
> become true projections from the class declaration Foo; there's no 
> "write a bunch of classes and wire up their relationship". (Though 
> some degree of special pleading and auto-wiring would be needed for 
> primitives, which seems like it is probably acceptable.)  It is a more 
> principled position, and not actually all that different in practice 
> from (2), in that the default is still inline.
>
> What I don't like is that (a) the author has to pick a polarity at 
> development time (and therefore can pick wrong), and (b) to the extent 
> ref-default is common, the client now has to maintain a mental 
> database of the polarity of every inline class, and (c) if the 
> polarity is not effectively a forced move (as in (2), where we only 
> use it for migration), switching polarities will (at least) not be 
> binary compatible.  So the early choice (made with the least 
> information) is permanent. From a user perspective, we are introducing 
> _two_ new kinds of top level abstractions; in (2), we are introducing 
> one, and leaning on interfaces/abstract classes for the other.  On the 
> other other hand, having more ref-default classes than the migrated 
> ones will make `.inline` stick out less.
>
> <super-duper-bikeshed-alert>
> Do we want to step back away from the experiment that is `inline`, and 
> go back to `Foo.ref` and `Foo.val`?  If we're looking to level the 
> playing field, giving them equally fussy/unfussy names is a leveler...
> </super-duper-bikeshed-alert>
>
>
>> 4) As the only supported style
>>
>> An inline class declaration always gives the "good name" to the 
>> reference type, and you always use an operator to get to the inline 
>> type ('Foo.inline'—but we're gonna need better syntax.)
>>
>> This one would represent a significant shift in the design center of 
>> the feature. If you want flattening everywhere, you're going to need 
>> to make liberal use of the '.inline' operator. But if you just want 
>> to declare that a bunch of your classes don't have identity, and 
>> hopefully get a cheap performance boost as a result, it's simple. The 
>> burden of learning something new is shifted to "advanced" users and 
>> APIs to whom flattening is important.
>
> I can't really see this being a winner.
>
>> Conclusion:
>>
>> I'm not ready to completely dismiss any of these designs, but my 
>> preferences at the moment are (1) and (3). Options (4) and (5) are 
>> more ambitious, discarding some of our assumptions and taking things 
>> in a different direction.
>>
>> Like many design patterns, (1) suffers from boilerplate overhead ((2) 
>> too, without some language help). It also risks some missed 
>> opportunities for optimization or language convenience, because the 
>> relationship between the inline and reference type is incidental. 
>> (I'd like to get a clearer picture of whether this really matters or 
>> not.)
>
> The main knock on (1) is that it leans on an ad-hoc convention, and to 
> the extent this convention is not universally adhered to, user 
> confusion abounds.  (Think about how many brain cycles you've spent 
> being even mildly miffed that the box for `long` is `Long` but the box 
> for `char` is `Character`.  If it's more than zero, that's a waste of 
> cycles.)
>
> I really have a hard time seeing (1) as leading where we want.
>
>> (5) feels like something fundamentally new in Java, although if you 
>> squint it's "just" a variation on name resolution. What originally 
>> prompted this idea was seeing a similar approach in attempts to 
>> introduce nullability type operators—legacy code has the "wrong" 
>> default, so you need some lightweight way to pick a different default.
>
> (5) could be achieved with another long-standing request, aliased 
> imports:
>
>     import Foo.inline as Foo;
>
> Not saying that makes it better, but a lot of people sort of want 
> import to work this way anyway.
>
>