Feedback: Using pattern context to make type patterns more consistent and manage nulls

Brian Goetz brian.goetz at oracle.com
Mon Jan 25 12:46:36 UTC 2021



> I'm not concerned
> with null, but with basic code readability and consistency.


The great thing about being motivated by consistency is that you get to 
pick what you find it most important to be consistent with :)  In a 
complex world, there are always going to be preexisting 
"inconsistencies", which means you always have a choice about what to 
be consistent with.  For example, the null-handling treatments of 
`instanceof` and cast are different, which one could call 
"inconsistent", except that ... it is right.  Consistency is a good 
guiding principle, and gratuitous inconsistency is surely bad, but it 
is not necessarily the highest good -- nor necessarily even a 
well-defined concept.

In any case, your concern has to be at least a little bit about null, 
because ... that's the only thing that is varying here; no one disagrees 
that `String s` should match all non-null instances of String.  You are 
suggesting type patterns never match null, so they can be "consistent" 
with how the `instanceof` bytecode works. There's nothing wrong with 
that particular preference of "things to be consistent with", but it's 
not the only choice, and it has costs.

For the record, here are the consistencies we've chosen:
  - `var` should consistently mean only "type inference", so users can 
orthogonally choose whether to use manifest or inferred types, according 
to what they find more readable, and freely switch back and forth;
  - a nested pattern `x matches P(Q)` should mean nothing more than 
`P(var alpha) && alpha matches Q`.

When you make these two choices, the semantics we have pretty much write 
themselves.
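
For illustration, here is roughly what that second principle buys you, 
written in the nested-pattern syntax under discussion and with a record 
so the sketch is self-contained (the `Box` record and the method names 
are mine, not anything from the thread):

    record Box(Object contents) { }

    // Matching x against Box(String s) is, by the second principle, nothing
    // more than matching x against Box(var alpha) and then matching alpha
    // against String s:
    static String describe(Object x) {
        if (x instanceof Box(String s))        // nested pattern
            return "a box of: " + s;
        return "something else";
    }

    // ...which unfolds to roughly:
    static String describeUnfolded(Object x) {
        if (x instanceof Box(var alpha)        // match the outer pattern, bind the component
                && alpha instanceof String s)  // then match the component against the nested pattern
            return "a box of: " + s;
        return "something else";
    }

Both methods behave the same way for every input, including a box whose 
contents happen to be null; that equivalence is exactly what the second 
principle asks for.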

Of course, just because the semantics derive from consistent and 
principled choices doesn't mean they result in a good programming 
model; we have to validate that the consequences of these choices lead 
to the programs we actually want.  I think your main concern is that 
you are worried this is not the case, but we should let actual code 
be our guide here.

It's important to realize that the typical "type switch" examples we've 
long wanted in Java don't actually give us that much intuition for what 
typically happens with *nested* pattern switches.  When you write enough 
code using such things, you see tree-shaped patterns of code emerge in 
which, at each level, cases organize themselves into lattices, with 
specialized cases funneling into catch-all cases:

     case Box(Prime p):
     case Box(Even e):
     case Box(Object o):   // catch all
     ...

The typical pattern of how this works, while "inconsistent" with 
respect to who gets the nulls, turns out to be what you actually want 
a great fraction of the time.  (If null is not in the domain of what 
is in the boxes, it doesn't matter; if it is, the catch-all Box code 
(which the last case represents) will be prepared to deal with it.)  
In the cases where it is not what you want, you can exclude it (with 
guards, with null checks, whatever.)
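
Concretely, the kind of funnel I mean looks something like this (a 
hypothetical `Box` record and marker types, invented just to give the 
lattice something to hang on; the comment describes the semantics being 
discussed here):

    interface Prime { }
    interface Even { }
    record Box(Object contents) { }

    static String classify(Box box) {
        return switch (box) {
            case Box(Prime p)  -> "a box of something prime";
            case Box(Even e)   -> "a box of something even";
            case Box(Object o) -> "a box of " + o;   // catch-all: under the semantics
                                                     // described here, a null contents
                                                     // funnels down to this case too
        };
    }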

Another thing to realize is that the examples with `Box(Chocolate)` are 
just simple examples.  In the real world, we'll have deconstructors with 
handfuls of bindings (as constructors do today), and a deconstruction 
pattern that reads `Foo(var a, var b, var c, var d, var e)` may not be 
quite as appealing from a readability perspective as when there's only 
one parameter and it's obvious what it is.  Forcing users to choose 
between semantics and readability (whichever way they happen to want to 
go with var-vs-manifest) is not a good look.
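
For instance, with a hypothetical record that has a handful of 
components (names invented purely for illustration), the two styles 
look like this, and which one reads better is a call the author should 
get to make:

    import java.time.LocalDate;

    record Customer(String name, String email, LocalDate signedUp, Integer loyaltyPoints) { }

    // Manifest types: everything is visible at the use site, at some cost in noise
    static String manifest(Customer c) {
        return switch (c) {
            case Customer(String name, String email, LocalDate signedUp, Integer loyaltyPoints)
                    -> name + " <" + email + ">";
        };
    }

    // Inferred types: lighter on the page; the types live in the record declaration
    static String inferred(Customer c) {
        return switch (c) {
            case Customer(var name, var email, var signedUp, var loyaltyPoints)
                    -> name + " <" + email + ">";
        };
    }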

> That is a huge inconsistency! A developer has nothing in the code to
> separate the static one from the dynamic one:
>
>   switch (box) {
>     case Box(Integer i) ...
>     case Box(Number n) ...
>   }
>
> Reading this code I have no way of knowing what it does. None.
>
> If box is Box<Number> it does one thing. If box is Box<Object> it does
> something else. Sure the impact is only on null, but that is a
> secondary detail and not what is driving my concern. The key point is
> that someone reading the code can't tell what branch the code will
> take, and can get a different outcome for two identical patterns in
> different parts of the codebase.

I get that this seems different and scary when it is all theoretical and 
we're extrapolating from almost no examples.  Write some code with it, 
though, and I think you'll find it is surprisingly natural, and the 
things you are worried about don't happen remotely as often as you are 
scared they will.  And the alternatives are far worse.  We could set 
`var` on fire (really not such a good deal), or we could invent multiple 
kinds of type patterns, one nullable and one not (a lot of added 
complexity, just so users can spend more energy on low-level corner 
cases than they ever really want to), or we could tinker with the 
semantics so that the razor blades are hidden in even less expected 
places, or we could add `T!` type patterns but not have `T!` in the 
general type system (imagine the rants about inconsistency then.)  
Having a simple set of rules derived from a small number of clear 
principles goes a long way towards helping people reason about the 
complexity that we can't make go away.
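
To make the example above concrete, here is the difference in play, 
under the rules being discussed (the generic `Box` record and the 
method names are mine):

    record Box<T>(T contents) { }

    // Selector statically typed as Box<Number>: `Number n` is total on the
    // component type Number, so under the proposed rules the second case
    // also claims a box whose contents are null.
    static String overNumber(Box<Number> box) {
        return switch (box) {
            case Box(Integer i) -> "integer: " + i;
            case Box(Number n)  -> "other number: " + n;
        };
    }

    // Selector statically typed as Box<Object>: neither `Integer i` nor
    // `Number n` is total on Object, so under the proposed rules a null
    // contents skips both and lands in the catch-all case needed to cover
    // the wider component type.
    static String overObject(Box<Object> box) {
        return switch (box) {
            case Box(Integer i) -> "integer: " + i;
            case Box(Number n)  -> "other number: " + n;
            case Box(Object o)  -> "something else: " + o;
        };
    }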

> If you can find an alternative to using `var` in the way I propose
> that is fine by me. As I pointed out in my last email, the situations
> where there is a conflict to resolve are relatively rare, because best
> practice is to use `var` for the final case anyway.

I agree that will be common in the obvious cases.   (In which case, the 
things you are worried about will happen even less often.)  But as 
mentioned above, I'm skeptical that this will actually be a "best 
practice" when there are many bindings and its not completely obvious 
what the types are.  This is something users should get to choose for 
themselves.

> As it stands, the proposal will never be acceptable to me because it
> fails the code readability test - premature optimization by using the
> static context means the code doesn't do what it says it does.

These rules, for all the parts you don't like, are simple; I have great 
faith that you will learn how things work and how to read the patterns 
of code that typically emerge.  (You might even like it, after writing 
some actual code with it; but regardless, I don't believe that you, or any 
other Java developer, are incapable of understanding what the code says.)

We see examples of this all over the place in Java, such as:

  - Method overloading.  How do I know which overload of `x.foo()` I am 
calling, or which overload `X::foo` refers to?  A: when it's not obvious, 
ask the IDE, or look it up in the Javadoc.
  - Type inference.  How do I know what types are being inferred for 
generic method calls, or what gets inferred for `var`?  A: when it's not 
obvious, ask the IDE (or, for masochists, work it out yourself), and 
then put explicit witnesses in the code so other readers can see them.
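
A tiny, self-contained illustration of both (the class and method names 
are mine):

    import java.util.ArrayList;

    class NotObviousDemo {
        static void greet(Object o) { System.out.println("Object overload"); }
        static void greet(String s) { System.out.println("String overload"); }

        public static void main(String[] args) {
            greet(null);      // not obvious at the call site: resolves to greet(String),
                              // the most specific applicable overload
            var list = new ArrayList<>();   // what was inferred?  ArrayList<Object> --
                                            // when in doubt, ask the IDE
        }
    }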



