Equality for values -- new analysis, same conclusion

Thu Aug 29 12:13:20 UTC 2019

On Wed, 21 Aug 2019, 02:16 Brian Goetz, <brian.goetz at oracle.com> wrote:
>
> > (although I'm not looking forward to an alternate name or a default
> > value for LocalDate).
>
> Assuming that this is what we will have to do, it would be a useful exploration to identify sensible idioms and guidelines for how to name these classes.  Are these new ad-hoc names (Opt for the new Optional), or a mechanically derived scheme such as Optional.Inline?  Finding a less objectionable scheme here can go a long way to mitigating the sadness of having to give these things new names.

Braindump thoughts went as follows:

LocalDate.Inline
LocalDate.Val
VLocalDate
LocalDateV
LDate
LocDate

Digging beyond the braindump, there are various dimensions here to
consider which led me to a different answer. These include related
things being discussed, notably null and ==.

Do migrated inline classes have different names to highlight the
migration? Highlighting migration seems like a bad idea in the long
term. Specifically, in the long term it seems like the best name
should go to the inline class, not the box. This seems to rule out
`LocalDate.Inline` as surely we don't want to refer to `Ratio.Inline`
or `Long128.Inline` everywhere.

Should inline class names highlight their behaviour wrt null and ==? I
think this should be seriously considered. While I'm willing to accept
the recent proposals to have two names for inline classes (box and
inline), I'm a lot less comfortable with the proposals around null and
== and their impact on the user model.

From a user model perspective, it is pretty clear in Java today when
to use == and when not to. If it is a primitive then you should use
==, otherwise you should not (unless you are an expert or doing
something odd). Normal day-to-day coding essentially never needs to
use == on reference types, and doing so is almost always a bug. Users
don't have to think too much (curse NaN), but there are only 8
primitive types and everyone knows what they are.
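
As a rough illustration of today's rules (a minimal sketch; the class and
method names are invented for this example and are not from any proposal):

```java
public class EqualsToday {
    // Identity comparison on references: true only for the very same object.
    static boolean sameObject(Object x, Object y) {
        return x == y;
    }

    public static void main(String[] args) {
        long p = 1000L;
        long q = 1000L;
        System.out.println(p == q);            // true: primitives compare by value

        String s = new String("One third");
        String t = new String("One third");
        System.out.println(s.equals(t));       // true: same characters
        System.out.println(sameObject(s, t));  // false: two distinct objects

        double nan = Double.NaN;
        System.out.println(nan == nan);        // false: the NaN exception to the rule
    }
}
```

The reference case is the one day-to-day code avoids by calling equals().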

Again, from a user model perspective, it is clear what can be null and
what can't - primitives vs reference. Again, users don't have to think
too much with only 8 primitive types.

Now, considering a pseudo-code example of my last email's point:

inline class Ratio {
  long top
  long bottom
}
inline class NamedRatio {
  String name
  long top
  long bottom
}

The rules for == mean that:
 Ratio.of(1, 3) == Ratio.of(1, 3)
but that:
 NamedRatio.of("One third", 1, 3) != NamedRatio.of("One third", 1, 3)
in many cases (depends if the string is inlined or not).
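
The effect can be approximated in today's Java. This is only a sketch: it
assumes that == on an inline class compares field by field, with primitive
fields compared by value and reference fields by identity. The ordinary final
classes and the fieldwiseSame helper below are hypothetical stand-ins for
that behaviour, not real inline classes.

```java
public class FieldwiseEquals {
    // Stand-in for the pseudo-code Ratio: primitive fields only.
    static final class Ratio {
        final long top;
        final long bottom;
        Ratio(long top, long bottom) { this.top = top; this.bottom = bottom; }
    }

    // Stand-in for the pseudo-code NamedRatio: one reference-typed field.
    static final class NamedRatio {
        final String name;
        final long top;
        final long bottom;
        NamedRatio(String name, long top, long bottom) {
            this.name = name; this.top = top; this.bottom = bottom;
        }
    }

    // Assumed == semantics: every field compared with ==.
    static boolean fieldwiseSame(Ratio a, Ratio b) {
        return a.top == b.top && a.bottom == b.bottom;
    }

    // The name field is deliberately compared by identity, not equals().
    static boolean fieldwiseSame(NamedRatio a, NamedRatio b) {
        return a.name == b.name && a.top == b.top && a.bottom == b.bottom;
    }

    public static void main(String[] args) {
        // true: all fields are primitives
        System.out.println(fieldwiseSame(new Ratio(1, 3), new Ratio(1, 3)));
        // true: both literals refer to the same interned String
        System.out.println(fieldwiseSame(
            new NamedRatio("One third", 1, 3), new NamedRatio("One third", 1, 3)));
        // false: equal content, but two distinct String objects
        System.out.println(fieldwiseSame(
            new NamedRatio(new String("One third"), 1, 3),
            new NamedRatio(new String("One third"), 1, 3)));
    }
}
```

The last two calls differ only in how the String was obtained, which is what
makes the result look semi-random from the caller's side.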

For a naive user, this behaviour of == is deeply unhelpful and
apparently semi-random. A user has to know that it is an inline class
and that it contains a type that can't be == checked safely. Some
inline classes can be compared using == but other inline classes can't
be. And since this language feature will be used for many new
primitive types like an unsigned integer or 128-bit long, it seems
completely unreasonable and unrealistic to ask users not to use == for
these types.

Exposing more of == as currently proposed is IMO deeply confusing.
Remi and I have proposed deprecating ==, but this email essentially
proposes an alternative approach - naming.

What if inline classes are divided between "pure" and "impure" (better
names needed)? A "pure" inline class has (1) a sensible default value,
not null/exception (2) a correct == implementation that always works
as would be expected of a primitive type (3) no reference type fields.
An "impure" inline class may contain reference type fields, may have
no meaningful default value and should never be compared using == in
normal code.
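
Criterion (3) is the only one of the three that can be checked mechanically.
A minimal sketch of such a check, using reflection on ordinary classes as
stand-ins for inline classes (PurityCheck and hasOnlyPrimitiveFields are
hypothetical names, and criteria (1) and (2) are not checked at all):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class PurityCheck {
    // Ordinary classes standing in for the pseudo-code inline classes.
    static final class Ratio { long top; long bottom; }
    static final class NamedRatio { String name; long top; long bottom; }

    // True if no instance field declared directly in the class has a
    // reference type. Inherited fields are not inspected in this sketch.
    static boolean hasOnlyPrimitiveFields(Class<?> type) {
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // only declared instance state matters here
            }
            if (!f.getType().isPrimitive()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyPrimitiveFields(Ratio.class));      // true
        System.out.println(hasOnlyPrimitiveFields(NamedRatio.class)); // false
    }
}
```

A real rule would presumably also have to allow fields whose type is itself
a "pure" inline class, which this sketch does not attempt.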

Then we say that "pure" inline classes are named using lowerCamelCase,
and "impure" inline classes using UpperCamelCase.

Thus, `localDate` is the inline class for the `LocalDate` box (assuming
agreement on a suitable default value).

In the example above, `ratio` would be the inline class and `Ratio`
the box (assuming agreement on a suitable default value). But it would
be `NamedRatio`, not `namedRatio` because it contains a reference
type.

The overall user model is simpler I think. All types that are
lowerCamelCase behave like primitives wrt == and null, and all types
that are UpperCamelCase behave like reference types wrt == and null.

What about the meaning of null and the box name for "impure" inline
classes? Well, perhaps there is only one named type for "impure"
inline classes, i.e. it is effectively both the box and the inline
form. I'm thinking perhaps that the JVM could treat such "impure"
inline classes as always being boxes, but potentially inlining them
where possible. In other words, if you write an "impure" inline class
as opposed to a "pure" one, you give up the ability to remove null
from variables of that type.

I can also see it being possible/desirable for the JVM to ensure that
"pure" inline classes cannot contain reference types (providing a
`ref<T>` inline class as a backdoor to make it obvious that the
content is being compared by identity).

Summary: It seems possible to separate the user-defined primitives use
case from the faster-objects one using naming - lowerCamelCase vs
UpperCamelCase. This seems to make the user model around ==, null and
migrated types clearer.

thanks
Stephen