Primitives in instanceof and patterns

Brian Goetz brian.goetz at oracle.com
Fri Sep 9 18:07:41 UTC 2022


>
> The semantics you propose is not to emit a compile error but at 
> runtime to check if the value "i" is between Short.MIN_VALUE and 
> Short.MAX_VALUE.
>
> So there is perhaps a syntactic duality but clearly there is no 
> semantics duality.

Of course there is a semantic duality here.  Specifically, `int` and 
`short` are related by an _embedding-projection pair_.  Briefly: given 
two sets A and B (think "B" for "bigger"), an approximation ordering on 
B (a complete partial order), and a pair of functions `e : A -> B` and 
`p : B -> A`, they form an e-p pair if (a) p . e is the identity 
function (dot is compose), and (b) e . p produces an approximation of 
its input (according to that ordering).

The details are not critical here (though this algebraic structure shows 
up everywhere in our work if you look closely), but the point remains: 
there is an algebraic duality here.  Yes, going in one direction, no 
runtime test is needed; going in the other, because the conversion may 
be lossy, a runtime test is required.  Just like with `instanceof 
String` / `case String s` today.

Anyway, I don't think you're saying what you really mean.  Let's not get 
caught up in silly arguments about what "dual" means; that won't be 
helpful.

> Moreover, the semantics you propose is not aligned with the concept of 
> data oriented programming which says that the data are more important 
> than the code so that we should try to raise a compile error when the 
> data changed to help the developer to change the code.
>
> If we take a simple example
>   record Point(int x, int y) { }
>   Point point = ...
>   switch(point) {
>    case Point(int i, int j) -> ...
>    ...
>   }
>
> let's say now that we change Point to use longs
>   record Point(long x, long y) { }
>
> With the semantics you propose, the code still compile but the pattern 
> is now transformed to a partial pattern that will not match all Points 
> but only the ones with x and y in between Integer.MIN_VALUE and 
> Integer.MAX_VALUE.

This is an extraneous argument; if you change the declaration of Point 
to take two Strings, of course all the use sites will change their 
meaning.  Maybe they'll still compile but mean something else, maybe 
they will be errors.  Patterns are not special here; the semantics of 
nearly all language features (assignment, arithmetic, etc) will change 
when you change the type of the underlying arguments.  That the meaning 
of patterns changes also when you change the types involved is just more 
of the same.

> I believe this is exactly what Stephen Colebourne was complaining 
> about when we discussed the previous iteration of this spec: the 
> semantics of the primitive pattern change depending on the definition 
> of the data.

I think what Stephen didn't like is that there is no syntactic 
difference between a total and partial pattern at the use site.  And I 
get why that made him uncomfortable; it's a valid concern, and one could 
imagine designing the language so that total and partial patterns look 
different.  This is one of the tradeoffs we have made; I do still think 
we picked a good one.

> The remark of Tagir about array patterns also works here; having a 
> named pattern like Short.asShort() makes the semantics far clearer 
> because it disambiguates between a pattern that requests a conversion 
> and a pattern that does a conversion because the data definition has 
> changed.

If the language didn't support primitive widening in assignment / method 
invocation contexts (as Go doesn't), and instead said "use 
Integer::toLong (or Long::fromInteger) to convert int -> long", then 
yes, the natural duality would be to also represent these as named 
patterns; then conversions in both directions are mediated by API 
points, total in one direction, partial in the other.  But that's not 
the language we have!  The language we have allows us to provide an int 
where a long is needed, and the language does the needful.  Pattern 
matching allows us to recover whether a value came from a certain type, 
even after we've lost the static type information.  Just as we can 
recover the String-ness here:

     Object o = "Foo";
     if (o instanceof String s) { ... }

because reference type patterns are willing to conditionally reverse 
reference widening, all the same arguments apply to

     long n = 3;
     if (n instanceof int i) { ... }

And not allowing this makes the language *more* complicated, because now 
some conversions are reversible and some are not, for ad-hoc reasons 
that no one will be able to understand.  Can you offer any compelling 
reason why we should be able to recover the String-ness of `o` after a 
widening, but not the int-ness of `n` after a widening?

> And I'm still worried that we are muddying the water here; instanceof 
> is about instances and the subtyping relationship (hence the name), 
> and extending it to cover non-instance / primitive values is very 
> confusing.

Sorry, this is a cheap rhetorical trick; declaring words to mean what 
you want them to mean, and then pointing to that meaning as a way to 
close the argument.

Yes, saying "instanceof T is about subtyping" is a useful mental model 
*when the only types you can apply it to are those related by inclusion 
polymorphism*.  But the restriction of instanceof to reference types is 
arbitrary (and we've already decided to allow patterns in instanceof, 
which are surely not mere subtyping.)

Regardless, a better way to think about `instanceof` is that it is the 
precondition for "would a cast to this type be safe and useful."  In the 
world where we restrict to reference types, the two notions coincide.  
But the safe-cast-precondition is clearly more general (this is like the 
difference between defining the function 2^n on Z, vs on R or C; of 
course they have to agree at the integers, but the continuous 
exponential function is far more useful than the discrete one.)  
Moreover, the general mental model is just as simple: how do you know a 
cast is safe?  Ask instanceof.  What does safe mean?  No error or 
material loss of precision.
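
Under that reading, `n instanceof int` for a `long n` would have to test 
exactly "does narrowing and then widening round-trip the value without 
loss?"  Here is a sketch of that precondition in today's Java; 
`fitsInInt` is a hypothetical helper, not an existing API.

```java
// Sketch of the safe-cast precondition for int-from-long, assuming the
// "instanceof asks whether a cast would be safe and useful" reading.
// fitsInInt is a hypothetical helper, not an existing API.
public class SafeCast {
    public static boolean fitsInInt(long n) {
        // Narrow, then widen back: if nothing was lost, the cast is safe.
        return (long) (int) n == n;
    }

    public static void main(String[] args) {
        System.out.println(fitsInInt(3L));        // true
        System.out.println(fitsInInt(1L << 40));  // false
    }
}
```

This is the same shape of test the runtime already performs for 
`instanceof String`: a conditional check that the reverse of a widening 
would succeed.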

A more reasonable way to state this objection would be: "most users 
believe that `instanceof` is purely about subtyping, and it will take 
some work to bring them around to a more general interpretation; how 
are we going to do that?"


Jumping up a level, you're throwing a lot of arguments at the wall that 
mostly come down to "I don't like this feature, so let me try and 
undermine it."  That's not a particularly helpful way to go about this, 
and none of the arguments so far have been very compelling (nor are they 
new from the last time we went around on it.)  I get that you would like 
pattern matching to have a more "surface" role in the language; that's a 
valid opinion.  But I would also like you to try harder to understand 
what we're trying to achieve and why we're pushing it deeper, and to 
respond to the substance of the proposal rather than just saying "YAGNI".

(I strongly encourage everyone to re-read JLS Ch5, and to meditate on 
*why* we have the particular conversions in the contexts we have.  
They're complex, but not arbitrary; if you listen closely to the 
specification, it sometimes whispers to you.)