[External] Foo / Foo.ref is a backward default; should be Foo.val / Foo
Brian Goetz
brian.goetz at oracle.com
Mon Apr 25 14:05:21 UTC 2022
tl;dr: I find pretty much everything about this compelling. And it comes at a good time, too, because now that we’ve figured out what we can deliver, we can figure out the sensible stacking of the object model.
As a refresher, recall that we’ve been loosely organizing classes into buckets:
Bucket 1 — good old identity classes.
Bucket 2 — Identity classes, minus the identity. This has some restrictions (no representational polymorphism, no mutability), but a B2 class is still a reference type. That means it can be null (nullity is a property of references) and comes with all the existing guarantees of initialization safety (no tearing). This is the obvious migration target for value-based classes, and enables us to migrate things like Optional safely, because we can preserve all of the intended semantics, keep the L descriptors, keep the name, handle nulls, etc. (As it turns out, we can get more flattening than you might think out of these, even with nullity, but less than we’d ideally like. I’ll write another mail about performance reality.)
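To make that concrete, here’s a rough sketch; treat the “value class” spelling and the Money example as purely illustrative, not settled syntax:

    value class Money {                    // identity-free, but still a reference type
        private final long cents;          // fields are final; no mutability
        Money(long cents) { this.cents = cents; }
        long cents() { return cents; }
    }

    Money m = null;                        // fine: nullability comes with reference-ness
    // synchronized (m) { ... }            // rejected: there is no identity to lock on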
Bucket 3 — here’s where how we stack it gets a little fuzzier. Bucket 3 drops reference-ness, or more precisely, gives you the option to drop reference-ness. (And it is reference-ness that enables nullability, and prevents tearing.) A B3 class has two types, a “val” and a “ref” type, which have a relationship to each other that is not-coincidentally similar to int/Integer.
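To ground the two-types-from-one-class point, a sketch (spelling provisional, Complex illustrative):

    primitive class Complex { double re, im; /* ctor etc. elided */ }

    Complex.val c = new Complex(1.0, 2.0);  // flat, non-nullable, default is all-zeros
    Complex.ref r = c;                      // reference projection: nullable, no tearing
    Complex.ref n = null;                   // fine, exactly like Integer
    // Complex.val z = null;                // error, exactly like int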
I think we are all happy with Bucket 2; it has a single, understandable difference from B1 with clear consequences, it supports migration, and it has surprisingly good flattening *on the stack*, but doesn’t yet offer all the heap flattening we might want. I have a hard time imagining this part of the design isn’t “done”, modulo syntax.
I think we are all still bargaining with Bucket 3, because there is a certain amount of wanting to have our cake and eat it too inherent in “codes like a class, works like an int.” Who gets “custody of the good name” is part of it, but for me, the main question is “how do we let people get more flattening without fooling themselves into thinking that there aren’t additional concurrency risks (tearing).”
But, let’s address Kevin’s arguments about who should get custody of the good name.
That one class gives rise to two types is already weird, and creates opportunity for people to think that one is the “real” type and one is the “hanger on.” Unfortunately, depending on which glasses you are wearing, the relationship inverts. We see this with int and Integer. From a user perspective, int is usually the real type, and Integer is this weird compatibility shim. But when you look at their class literals, for example, Integer.class is a fully functional class literal, with member lookup and operational access, but int.class is the weird compatibility shim. The int.class literal is only useful for reflecting over descriptors with primitive types, but does none of the other things reflection does. This should be a hint that there’s a custody battle brewing.
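You can see the asymmetry with today’s reflection, where the int mirror is little more than a descriptor token:

    System.out.println(Integer.class.getMethods().length);  // dozens: a fully functional class mirror
    System.out.println(int.class.getMethods().length);      // 0: nothing to look up
    System.out.println(int.class.isPrimitive());            // true; mostly useful for describing descriptors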
In the future world, which of these declarations do we expect to see?
public final class Integer { … }
or
public mumble value class int { … }
The tension is apparent here too; I think most Java developers would hope that, were we writing the world from scratch, we’d declare the latter, and then do something to associate the compatibility shim with the real type. (Whatever we do, we still need an Integer.class on our class path, because existing code will want to load it.) This tension carries over into how we declare Complex; are we declaring the “box”, or are we declaring the primitive?
Let’s state the opposing argument up front, because it was our starting point: having to say “Complex.val” for 99% of the utterances of Complex would likely be perceived as “boy those Java guys love their boilerplate” (call this the “lol java” argument for short.) But, since then, our understanding of how this will all actually work has evolved, so it is appropriate to question whether this argument still holds the weight we thought it did at the outset.
> 1. The option with fewer hazards should usually be the default. Users won't opt themselves into extra safety, but they will sometimes opt out of it. Here, the value type is the one that has attendant risks -- risk of a bad default value, risk of a bad torn value. We want using `Foo.val` to *feel like* cracking open the shell of a `Foo` object and using its innards directly. But if it's spelled as plain `Foo` it won't "feel like" anything at all.
Let me state it more strongly: unboxed “primitives” are less safe. Despite all the efforts from the brain trust, the computational physics still points us towards “the default is zero, even if you don’t like that value” and “these things can tear under race, even though they resemble immutable objects, which don’t.” The insidious thing about tearing is that it is only exhibited in subtly broken programs. The “subtly” part is the really bad part. So we have four broad options:
- neuter primitives so they are always as safe as we might naively hope, which will result in either less performance or a worse programming model;
- keep a strong programming model, but allow users to trade away some safety (which non-broken programs won’t suffer for) with an explicit declaration-site and/or use-site opt-in (“.val”);
- same, but try to educate users about the risk of tearing under data race (good luck);
- decide the tradeoff is impossible, and keep the status quo.
The previous stake in the ground was #3; you are arguing towards #2.
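To make the tearing hazard concrete, here is the shape of the subtly broken program; the spelling is provisional and Range/Holder are purely illustrative, and the bug is the data race, not the syntax:

    primitive class Range { long lo, hi; /* invariant: lo <= hi; ctor elided */ }

    class Holder { Range.val r; }           // plain, non-volatile field, mutated without synchronization

    // Thread A: holder.r = new Range(0, 10);
    // Thread B: holder.r = new Range(100, 200);
    // Reader:   Range.val seen = holder.r; // the 128-bit read may tear under race, observing
    //                                      // e.g. (0, 200), a Range no thread ever wrote

No properly synchronized program can observe such a value, which is exactly why the failure only shows up in the subtly broken ones.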
> 2. In the current plan a `Foo.ref` should be a well-behaved bucket 2 object. But it sure looks like that `.ref` is specifically telling it NOT to be -- like it's saying "no, VM, *don't* optimize this to be a value even if you can!" That's of course not what we mean. With the change I'm proposing, `Foo.val` does make sense: it's just saying "hey runtime, while you already *might* have represented this as a value, now I'm demanding that you *definitely* do". That's a normal kind of a thing to do.
A key aspect of this is the bike shed tint; .val is not really the right indicator given that the reference type is also a “value class”. I think we’re comfortable giving the “value” name to the whole family of identity-free classes, which means that .val needs a new name. Bonus points if the name connotes “having burst free of the constraints of reference-hood”: unbound, loose, exploded, compound value, etc. And also is pretty short.
> 3. This change would permit compatible migration of an id-less to primitive class. It's a no-op, and use sites are free to migrate to the value type if and when ready. And if they already expose the type in their API, they are free to weigh the costs/benefits of foisting an incompatible change onto *their* users. They have facilities like method deprecation to do it with. In the current plan, this all seems impossible; you would have to fix all your problematic call sites *atomically* with migrating the class.
This is one of my favorite aspects of this direction. If you recall, you were skeptical from the outset about migrating classes in place at all; the previous stake in the ground said “well, they can migrate to value classes, but will never be able to shed their null footprint or get ultimate flattening.” With this, we can migrate easily from VBC to B2 with no change in client code, and then _further_ have a crack at migrating to full flatness inside the implementation capsule. That’s sweet.
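Sketching the path (spellings provisional; Digest, cache, and hash are purely illustrative):

    // Step 0: today's value-based class
    final class Digest { /* ... */ }

    // Step 1: migrate to B2 -- drop identity, keep reference-ness; same name, same
    // L descriptors, nulls still flow, so client code is untouched
    value class Digest { /* ... */ }

    // Step 2: migrate to B3 (the declaration gains the flat projection, however we
    // spell that); plain Digest still means the reference type, so clients are again
    // untouched, and use sites opt in to flatness if and when they are ready
    Digest d = cache.get(key);          // unchanged: safe, nullable reference type
    Digest.val dv = hash(bytes);        // new, explicit opt-in to flatness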
> 4. It's much (much) easier on the mental model because *every (id-less) class works in the exact same way*. Some just *also* give you something extra, that's all. This pulls no rugs out from under anyone, which is very very good.
>
> 5. The two kinds of types have always been easily distinguishable to date. The current plan would change that. But they have important differences (nullability vs. the default value chief among them) just as Long and long do, and users will need to distinguish them. For example you can spot the redundant check easily in `Foo.val foo = ...; / requireNonNull(foo);`.
It is really nice that *any* unadorned identifier is immediately recognizable as being a reference, with all that entails — initialization safety and nullity. The “mental database” burden is lower, because Foo is always a reference, and Foo.whatever is always direct/immediate/flat/whatever.
> 6. It's very nice when the *new syntax* corresponds directly to the *new thing*. That is, until a casual developer *sees* `.val` for the first time, they won't have to worry about it.
>
> 7. John seemed to like my last fruit analogy, so consider these two equivalent fruit stand signs:
>
> a) "for $1, get one apple OR one orange . . . with every orange purchased you must also take a free apple"
> b) "apples $1 . . . optional free orange with each purchase"
>
> Enough said I think :-)
>
> 8. The predefined primitives would need less magic. `int` simply acts like a type alias for `Integer.val`, simple as that. This actually shows that the whole feature will be easier to learn because it works very nearly how people already know primitives to work. Contrast with: we hack it so that what would normally be called `Integer` gets called `int` and what normally gets called `Integer.ref` or maybe `int.ref` gets called `Integer` ... that is much stranger.
One more: the .getClass() anomaly goes away.
If we have
mumble primitive mumble Complex { … }
Complex.val c = …
then what do we get when we ask c for its getClass()? The physics again point us at returning Complex.ref.class, not Complex.val.class, but under the old scheme, where the val projection gets the good name, it would seem anomalous, since we ask a val for its class and get the ref mirror. But under the Kevin interpretation, we can say “well, the CLASS is Complex, so if you ask getClass(), you get Complex.class.”
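That is, under this interpretation (spelling still provisional):

    Complex.val c = new Complex(1.0, 2.0);
    Complex r = c;                           // widen to the reference type

    assert c.getClass() == Complex.class;    // the class is Complex, so that's the mirror you get
    assert r.getClass() == Complex.class;    // same answer either way; no anomaly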