We don't need no stinkin' Q descriptors

Fri Jun 30 23:57:54 UTC 2023

This is a major step forward.  I have three sets of comments overall.

# Looking back…

First a bit of history here, as I recall it:  When we started working on
VM support for Valhalla, I remember very early conversations (2015-ish),
involving Brian, Mark R., Guy S., Doug L., Dan H., Karen K., Dan S., 
etc.,
in Burlington MA and Ottawa (IBM).  During these conversations we had
all manner of crazy ideas (as we still do today, TBH), including ornate
new syntaxes for descriptors.  Brian made the point that we should pick
just one L-like descriptor to describe the new flavor of data, and so
Q was born.  Brian further said something to this effect, IIRC:
“We won’t necessarily keep the Q forever, but it will help us, 
during
prototyping, to clearly map all of the places where value-ness needs
to be tracked.”  I remember thinking, “OK, but we’ll never get rid
of it; it’s too obviously correct.”  One result of this was we were
able to define everything about values in the VM more crisply and 
clearly.

Another result was yearly struggle sessions about how we were ever going
to handle migration of Optional, LocalDate, etc.  I’m surprised and 
glad
that we have come to a place of maximum erasure, where (a) all the 
places
where Q-ness needs mapping have been mapped, and yet (b) there is now no
remaining migration problem (despite no investment in high-tech API
bridges).

Along the way Dan S. started quietly talking about Type Restrictions,
which seemed (at first) to be some high-tech ceremony for stuff that 
could
just as easily be spelled with Q’s.  I’m glad he persisted, because 
now
they seem to be the right-sized bucket in which to place the Q-signals,
after Q’s go away.

So, although I am wistful to see that clarity of Q’s go, it is more 
with
nostalgia than regret.  We have the clarity they bought us.  And (bonus)
they seem to dovetail with the next giant task of Valhalla, which is
coping with generic data structure specialization (|List<int>|).

## Avoiding the slippery slope

Next, I want to point out that part of the trick of doing this well is
not doing too much all at once.  It’s not straightforward.  Our newly
won insights make it clear that we could do for |String!| what we
propose for |Point!|, but if we take such incremental RFEs as they
occur to us we will, in fact, be falling down a slippery slope towards
a Big Bang of VM functionality that gets deferred further and further.
(A Big Crunch would be a more likely outcome, frankly.  Happily, we
have learned to deliver incrementally, yes?)  I would like to restate
from Brian’s proposal a guiding principle to keep us off the slippery
slope, until such time as we agree to take the next steps downward.

I think one key principle here is to embrace erasure, and hide the
presence of new refinement types from legacy code.  (A nit:  We should
pick a phrase and stick with it.  “Type refinements” or “refined 
types”
are fine phrases, but it’s not clear they are exact synonyms with
“refinement types”.  Rather arbitrarily, I prefer “refinement 
type”,
perhaps because it point to two realities:  It’s a type, and there
was a refinement decision made.

This takes me to the following specific points about the Big Three use 
cases:

  - Field declaration - The refinement type can only be of the form 
|B3!|

  - Array creation - The component type can only be of the form |B3!|

  - Casting - The cast type can be either |B3!| or |B3![]|

I think Brian covered all that, except for the following lines, which I
think are a mis-step (down that slope I mentioned):

> It is a free choice as to whether we want to translate a field of type
> |Point![]| using an array refinement or fully erase it to |Point[]|.

If we support |multianewarray| then it must take a CP reference to 
|B3![]|.
But I don’t think that pulls its weight, so let’s not.

Why does |checkcast| get extra powers not enjoyed by the other two use
cases?  I think the answer is pretty simple:  |checkcast| is the last
resort for a Java compiler’s T.S. (translation strategy); if some type
cannot be represented on a VM container (and enforced by the verifier)
then either it cannot be safely cast (leading to “unchecked” 
warnings)
or else it must be dynamically checked (requiring a |checkcast|).

The case for an effective cast of the form |(Point![])a| is perhaps
less obvious, but it it very useful (from my VM perspective) to
let the programmer use it to communicate flattening intentions
outside of a loop, before the individual |Point| values are read or
written.  So the T.S. puts a dynamic check on an initialized |Point![]|
variable and then all the downstream code can “know” that flat 
access
is being performed.  Note that this design pattern works great for
multi-dimensional arrays (at source level), except that the type
|Point![][]| is uncheckable.  I’m not sure how to explain this gap
to users, but the VM-level reality is that the optimizations for
flat access care only about arrays of dimension one, so I’m happy
the gap is there.  I hope we won’t be forced to fill it, because that
will cause a large set of new compliance tests and a bug tail.

```
Point![][] a2d = …;    // T.S. cannot put checkcast on a2d, I hope
for (var a1d : a2d) {  // T.S. puts checkcast on each a1d, I hope
   for (var x : a1d) {
     … process x …
   }
}
```

Another likely use of a |checkcast| of a both kinds of type is when
the T.S. emits code to load from a field with of type |B3!| or
|B3![]|.

```
class C { B3! x; B3![] a; }
…
C c = …;
var x = c.x;  // T.S. could put a checkcast here, if it helps
var a = c.a;  // ditto
c.x = x.nextX();  // T.S. is very likely to put a checkcast here
c.a = Arrays.copyOf(a);  // ditto
```

Exactly where to put each |checkcast| (and where not to bother)
is an interesting question; perhaps it’s too much work to place
them on every read of a field.  (I think it’s a good idea, because
redundant checks are free in the VM and earlier checks are better
than later ones.)  But it seems very likely that at least field
writes will benefit from checkcasts, for all types that are
representable.  And, note that type of `new B3![]` is representable.
Its class will be `B3[].class`, but its representable type
will be something like `NullRestrictedArray.of(B3.class)`.

## Healing the rift?

One goal that is being held loosely at this moment is the old
promise of Valhalla to “heal the rift” between |int| and |Integer|.
(More generally, between primitives and references.)  We’ve come
this far, are we going to give up on that goal now?

I don’t think there is a rift-healing move we could do with field
declarations, since flat |int| fields are already fully supported.

But there is definitely a slippery slope here.  Should |int[]|
be a subtype of |Object[]|?  I think that also would be required.
I would like to do this, if possible.

(There is no cause to ask that |int|, which isn’t even a reference
type, should somehow be made to look like a subtype of |Integer|.)

I guess Dan S. is tracking these issues; I don’t recall them being
discussed recently, but maybe they will ripen after we get closure
on the bigger questions about Q.

(I suppose Brian’s mention of lifting |cast| is why I’m getting into
the question of “healing rift” at all.  Pulling on that string 
brings
us to the that rift, IMO.)

  - Leave it as is.  Sorry if you accidentally used |int.class|.
  - Enhance |int.class::cast| (and |isInstance|) to check for |Integer|.
  - Deem |cast| not liftable; make a new |RepresentableType::checkType|
    (and |isType|), and have it be total over B1/B2/B3 and primitives 
(B0??).

But I think I prefer the last.  In any case, I don’t look forward to a
widening rift between primitives (B0!) and the other types.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/valhalla-spec-observers/attachments/20230630/2950b665/attachment-0001.htm>