We don't need no stinkin' Q descriptors
John Rose
john.r.rose at oracle.com
Fri Jun 30 23:57:54 UTC 2023
This is a major step forward. I have three sets of comments overall.
# Looking back…
First a bit of history here, as I recall it: When we started working on
VM support for Valhalla, I remember very early conversations (2015-ish),
involving Brian, Mark R., Guy S., Doug L., Dan H., Karen K., Dan S., etc.,
in Burlington MA and Ottawa (IBM). During these conversations we had
all manner of crazy ideas (as we still do today, TBH), including ornate
new syntaxes for descriptors. Brian made the point that we should pick
just one L-like descriptor to describe the new flavor of data, and so
Q was born. Brian further said something to this effect, IIRC:
“We won’t necessarily keep the Q forever, but it will help us, during
prototyping, to clearly map all of the places where value-ness needs
to be tracked.” I remember thinking, “OK, but we’ll never get rid
of it; it’s too obviously correct.” One result of this was we were
able to define everything about values in the VM more crisply and
clearly.
Another result was yearly struggle sessions about how we were ever going
to handle migration of Optional, LocalDate, etc. I’m surprised and glad
that we have come to a place of maximum erasure, where (a) all the places
where Q-ness needs mapping have been mapped, and yet (b) there is now no
remaining migration problem (despite no investment in high-tech API
bridges).
Along the way Dan S. started quietly talking about Type Restrictions,
which seemed (at first) to be some high-tech ceremony for stuff that could
just as easily be spelled with Q’s. I’m glad he persisted, because now
they seem to be the right-sized bucket in which to place the Q-signals,
after Q’s go away.
So, although I am wistful to see the clarity of Q’s go, it is more with
nostalgia than regret. We have the clarity they bought us. And (bonus)
they seem to dovetail with the next giant task of Valhalla, which is
coping with generic data structure specialization (|List<int>|).
## Avoiding the slippery slope
Next, I want to point out that part of the trick of doing this well is
not doing too much all at once. It’s not straightforward. Our newly
won insights make it clear that we could do for |String!| what we
propose for |Point!|, but if we take such incremental RFEs as they
occur to us we will, in fact, be falling down a slippery slope towards
a Big Bang of VM functionality that gets deferred further and further.
(A Big Crunch would be a more likely outcome, frankly. Happily, we
have learned to deliver incrementally, yes?) I would like to restate
from Brian’s proposal a guiding principle to keep us off the slippery
slope, until such time as we agree to take the next steps downward.
I think one key principle here is to embrace erasure, and hide the
presence of new refinement types from legacy code. (A nit: We should
pick a phrase and stick with it. “Type refinements” or “refined types”
are fine phrases, but it’s not clear they are exact synonyms for
“refinement types”. Rather arbitrarily, I prefer “refinement type”,
perhaps because it points to two realities: It’s a type, and there
was a refinement decision made.)
Here is a complementary principle: In the VM, we should choose to support
exactly and only those refinement types that support Valhalla’s prime goals,
which are data structure improvement (flattening). Since |String!| doesn’t
(yet) have a flattening story, |String!| should not be a (VM) representable
type. Since |Integer!| is already covered by |int|, neither should |Integer!|
be a (VM) representable type. (A programmer may get fewer mirrors than
expected, but note that we are not adding any mirrors at all!) Although
|Point![]| is a useful specialized data structure, |Point![][]| is not
so useful; its usefulness stems from the structure of its components,
not its own top-level structure. Therefore, making a distinction between
|Point![][]| and |Point![]![]| (and |Point![][]!| and so on) is bookkeeping
which we would have to pay for but which wouldn’t pay us back.
This takes me to the following specific points about the Big Three use
cases:
- Field declaration - The refinement type can only be of the form |B3!|
- Array creation - The component type can only be of the form |B3!|
- Casting - The cast type can be either |B3!| or |B3![]|
I think Brian covered all that, except for the following lines, which I
think are a mis-step (down that slope I mentioned):
> It is a free choice as to whether we want to translate a field of type
> |Point![]| using an array refinement or fully erase it to |Point[]|.
If we support |multianewarray| then it must take a CP reference to
|B3![]|.
But I don’t think that pulls its weight, so let’s not.
Why does |checkcast| get extra powers not enjoyed by the other two use
cases? I think the answer is pretty simple: |checkcast| is the last
resort for a Java compiler’s T.S. (translation strategy); if some type
cannot be represented on a VM container (and enforced by the verifier)
then either it cannot be safely cast (leading to “unchecked” warnings)
or else it must be dynamically checked (requiring a |checkcast|).
In order for a Java cast like |(Point!)x| to be efficient, it seems
that |checkcast| should pick up the job in one go, rather than
require the T.S. to emit first |Objects::requireNN| and then a |checkcast|.
(Note also our self-imposed rules for avoiding library dependencies…)
And having |(Point!)x| be unchecked would be far too surprising, yes?
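
To make the two-step alternative concrete, here is a minimal sketch of what a T.S. would have to emit for |(Point!)x| using only today's tools. It assumes |Objects::requireNN| is shorthand for |Objects.requireNonNull|; |Point| is a stand-in record and |castToNonNullPoint| is a hypothetical helper name, neither of which comes from the proposal. Which exception a null-aware |checkcast| would actually throw is not settled here.

```java
import java.util.Objects;

// Stand-in for a B3 value class; a record is used so the sketch runs today.
record Point(int x, int y) {}

final class NullRestrictedCastSketch {
    // Two-step emulation of (Point!)x: a library null check, then a class check.
    // A null-aware checkcast would fold both steps into a single instruction.
    static Point castToNonNullPoint(Object x) {
        Objects.requireNonNull(x, "Point! cannot be null");
        return (Point) x; // a plain checkcast, which by itself lets null through
    }

    public static void main(String[] args) {
        System.out.println(castToNonNullPoint(new Point(1, 2)));
        try {
            castToNonNullPoint(null);
        } catch (NullPointerException e) {
            System.out.println("rejected null: " + e.getMessage());
        }
        try {
            castToNonNullPoint("not a point");
        } catch (ClassCastException e) {
            System.out.println("rejected wrong class");
        }
    }
}
```

The library call in the first step is exactly the kind of dependency the note about self-imposed rules is pointing at.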
The case for an effective cast of the form |(Point![])a| is perhaps
less obvious, but it is very useful (from my VM perspective) to
let the programmer use it to communicate flattening intentions
outside of a loop, before the individual |Point| values are read or
written. So the T.S. puts a dynamic check on an initialized |Point![]|
variable and then all the downstream code can “know” that flat access
is being performed. Note that this design pattern works great for
multi-dimensional arrays (at source level), except that the type
|Point![][]| is uncheckable. I’m not sure how to explain this gap
to users, but the VM-level reality is that the optimizations for
flat access care only about arrays of dimension one, so I’m happy
the gap is there. I hope we won’t be forced to fill it, because that
will cause a large set of new compliance tests and a bug tail.
```
Point![][] a2d = …;      // T.S. cannot put checkcast on a2d, I hope
for (var a1d : a2d) {    // T.S. puts checkcast on each a1d, I hope
    for (var x : a1d) {
        … process x …
    }
}
```
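
For readers who want to run something today, the same shape can be approximated in current Java. This is only a rough sketch: |requireNullFree| is a hypothetical helper that scans the row, whereas a real |checkcast| to |Point![]| would presumably inspect a property of the array itself and not its elements. |Point| is again a stand-in record.

```java
// Stand-in for a B3 value class; a record is used so the sketch runs today.
record Point(int x, int y) {}

final class HoistedArrayCheckSketch {
    // Stand-in for a checkcast to Point![]. Assumption: today the only way to
    // establish "no nulls" is an O(n) scan; the real check would not scan.
    static Point[] requireNullFree(Point[] a) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == null) {
                throw new NullPointerException("null element at index " + i);
            }
        }
        return a;
    }

    static long sumX(Point[][] a2d) {
        long sum = 0;
        for (Point[] row : a2d) {
            Point[] a1d = requireNullFree(row); // one check per row, outside the inner loop
            for (Point p : a1d) {
                sum += p.x(); // inner loop relies on the check made above
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Point[][] grid = {
            { new Point(1, 0), new Point(2, 0) },
            { new Point(3, 0) }
        };
        System.out.println(sumX(grid));
    }
}
```

The check sits on each one-dimensional row, which mirrors the point that the outer |Point![][]| type itself is left unchecked.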
Another likely use of a |checkcast| of both kinds of type is when
the T.S. emits code to load from a field of type |B3!| or |B3![]|.
```
class C { B3! x; B3![] a; }
…
C c = …;
var x = c.x; // T.S. could put a checkcast here, if it helps
var a = c.a; // ditto
c.x = x.nextX(); // T.S. is very likely to put a checkcast here
c.a = Arrays.copyOf(a, a.length); // ditto
```
Exactly where to put each |checkcast| (and where not to bother)
is an interesting question; perhaps it’s too much work to place
them on every read of a field. (I think it’s a good idea, because
redundant checks are free in the VM and earlier checks are better
than later ones.) But it seems very likely that at least field
writes will benefit from checkcasts, for all types that are
representable. And, note that the type of `new B3![]` is representable.
Its class will be `B3[].class`, but its representable type
will be something like `NullRestrictedArray.of(B3.class)`.
## Healing the rift?
One goal that is being held loosely at this moment is the old
promise of Valhalla to “heal the rift” between |int| and |Integer|.
(More generally, between primitives and references.) We’ve come
this far, are we going to give up on that goal now?
By choosing not to allow |RefinementType| to mix with |Class|, we
are committing to leaving |int.class| and other primitive classes
(and |void.class|) by themselves as outliers among the “proper” classes
and interfaces (|C.class|, |I.class|) and “array classes” (|T[].class|).
That’s not a rift-healing move, but it doesn’t have to interfere with
other rift-healing moves that we *could* do.
I don’t think there is a rift-healing move we could do with field
declarations, since flat |int| fields are already fully supported.
Although it is technically an incompatibility, we might consider
allowing legacy |int[]| arrays to interoperate with |Object[]|,
so that more generic code can be written. That would be close to
the spirit of allowing |B3![]| arrays to be viewed transparently as
possibly-null-tolerant |B3[]| arrays.
But there is definitely a slippery slope here. Should |int[]|
be a subtype of |Object[]|? I think that also would be required.
I would like to do this, if possible.
(There is no cause to ask that |int|, which isn’t even a reference
type, should somehow be made to look like a subtype of |Integer|.)
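
As a reminder of where the language stands today, the following small check shows the current asymmetry: |Integer[]| is an |Object[]| through array covariance, while |int[]| is only an |Object|. The class and method names are illustrative only.

```java
final class ArraySubtypingToday {
    // True only for arrays whose component type is a reference type.
    static boolean isObjectArray(Object o) {
        return o instanceof Object[];
    }

    public static void main(String[] args) {
        System.out.println(isObjectArray(new Integer[0]));                 // true
        System.out.println(isObjectArray(new int[0]));                     // false
        System.out.println(Object[].class.isAssignableFrom(int[].class)); // false
        System.out.println(Object.class.isAssignableFrom(int[].class));   // true
    }
}
```

Any code that relies on the second or third result today is where the technical incompatibility mentioned above would show up.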
One rift-widening move I’d like to avoid is introducing a third
representable type, between |int| and |Integer|, for the purpose
of making flat arrays of |Integer| that are not |int| arrays.
Any “value-ification” of |Integer| should avoid that trap.
Rather |Integer![]|, if it is representable at all, should be |int[]|.
I guess Dan S. is tracking these issues; I don’t recall them being
discussed recently, but maybe they will ripen after we get closure
on the bigger questions about Q.
There is another place where a “heal the rift” move might make sense,
and that is in the API for |Class|. Brian suggests that perhaps the
|Class::cast| method could be lifted to |RepresentableType|. That
will make it easier to reflectively emulate |checkcast| instructions,
but it will give wider exposure to an existing sharp edge in the
|Class| API, which is the non-functionality of primitive mirrors.
(I suppose Brian’s mention of lifting |cast| is why I’m getting into
the question of “healing the rift” at all. Pulling on that string brings
us to that rift, IMO.)
I mean that the call |int.class.cast(x)| does not work, and lifting
that non-behavior up to |RepresentableType| will make a new and
unwelcome distinction between |B3!| and |int|: The mirror for |B3!|
would (presumably) do a null check and cast to |B3|, while the mirror
for |int| would fail. Here are options to handle this sharpened edge:
- Leave it as is. Sorry if you accidentally used |int.class|.
- Enhance |int.class::cast| (and |isInstance|) to check for |Integer|.
- Deem |cast| not liftable; make a new |RepresentableType::checkType|
(and |isType|), and have it be total over B1/B2/B3 and primitives
(B0??).
Enhancing |int.class::cast| is arguably in the same spirit as allowing
|int[]| to be a subtype of |Object[]|.
But I think I prefer the last. In any case, I don’t look forward to a
widening rift between primitives (B0!) and the other types.
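
The sharp edge is easy to reproduce with the existing |Class| API. The sketch below first shows the current behavior of |int.class|, then a hypothetical |checkType| helper loosely in the spirit of the last option. The helper's name, its null handling, and its use of the wrapper class are all assumptions made for illustration, not a proposed design.

```java
import java.lang.invoke.MethodType;

final class PrimitiveMirrorCastSketch {
    // Hypothetical helper; neither the name nor the behavior is an actual JDK API.
    static Object checkType(Class<?> type, Object x) {
        if (!type.isPrimitive()) {
            return type.cast(x); // ordinary reference cast; null passes through
        }
        // Assumption: a primitive mirror accepts exactly its non-null wrapper.
        if (x == null) {
            throw new NullPointerException("null is not a value of " + type);
        }
        // MethodType.wrap() maps a primitive class to its wrapper (int -> Integer).
        Class<?> wrapper = MethodType.methodType(type).wrap().returnType();
        return wrapper.cast(x);
    }

    public static void main(String[] args) {
        // Today: the primitive mirror rejects every non-null object.
        System.out.println(int.class.isInstance(42)); // false
        try {
            int.class.cast(42);
        } catch (ClassCastException e) {
            System.out.println("int.class.cast(42) failed");
        }
        // The sketch: the same mirror accepts a boxed Integer.
        System.out.println(checkType(int.class, 42));
    }
}
```

One detail worth noticing is that |int.class.cast(null)| currently succeeds and returns null, because |Class::cast| skips the instance test for null, so a total check would need to decide that case explicitly.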