New candidate JEP: 401: Primitive Objects (Preview)
Brian Goetz
brian.goetz at oracle.com
Tue Apr 20 14:33:41 UTC 2021
Thanks for these thoughts. You may, or may not, find it comforting to
know that all these concerns -- and the syntactic "hack" you propose --
have been considered extensively prior to settling on this design.
While it's not intrinsically a terrible idea, it's not as powerful as it
first appears, and the problems that you are concerned about loom much
larger when contemplating this big change than they are likely to
actually be once you start using primitive classes.
The main question you are addressing is: primitive classes are
different, so should they look different? It is a very natural
temptation to want the new features to StAnD OuT and LooK D!FFeR3nt;
these things are new and we are worried users will be confused. (See
https://www.thefeedbackloop.xyz/stroustrups-rule-and-layering-over-time/
for a more detailed description of this common phenomenon.) Indeed, the
original strawman syntax of lambdas was LOUD -- the first proposal used
`#(int x, int y)(x * y)`. When we changed this to `(x, y) -> x*y`,
people first complained "that's too subtle!" But it took all of about
five minutes to get over this, and looking back to the original syntax,
it feels like a hammer blow to the head. "I'M NEW AND DIFFERENT", it
shouts!
Your proposal seems to be to continue using lower-case identifiers for
primitive classes, and the leading-upper-case version for their
reference projection. This has been suggested before. It has some apparent
upsides, as you propose, but also some downsides.
First, it takes decades of naming conventions and throws them out the
window. Until now, lower-case identifiers have been either keywords (drawn
from a fixed list, which includes `int` and friends) or variable/method
names; type names (except for the ones which are keywords) begin with an
upper-case letter. This proposal spills type names into the identifier space,
meaning that we have lost valuable clues for both types and
variable/method names. This creates new problems as it attempts to
solve others.
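
The loss of visual clues can be seen even in today's Java, where a lower-case type name is legal but unconventional. A minimal sketch follows; `NamingDemo` and the nested type `point` are hypothetical names, and `point` here is an ordinary identity class standing in for a primitive class, not the JEP 401 feature itself.

```java
public class NamingDemo {
    // Legal today, but against convention: a type name starting in lower case.
    // This is an ordinary class used as a stand-in, not a JEP 401 primitive class.
    static final class point {
        final int x;
        final int y;

        point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static int sum() {
        // The type and the variable now share one spelling; a reader can no
        // longer tell them apart by capitalization alone.
        point point = new point(1, 2);
        return point.x + point.y;
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```

The code compiles and runs, which is rather the problem: nothing but convention currently separates the two namespaces for a human reader.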
Second, it creates an uncomfortable coupling between two identifiers,
whose names are only related through an ad-hoc (and Latin-centric)
mechanism, upper-casing the first letter. Where is the definition of
`Point`? Having it be in `primitive class point { }` is confusing. The
language and JVM have gone to great lengths to avoid making such
couplings in the past.
Third, it doesn't really solve all the problems you think it does; your
point about Optional works exactly the same way under this proposal (you
have to stick with non-flat `Optional` in existing APIs, and switch to
`optional` to flatten where you can) as it does under the current plan
(switch to `Optional.val` to flatten where you can.)
Fourth, while this reduces the chance that a user will mistake a
primitive class instance for a reference class instance, the cost of
this is that APIs become, from the perspective of many users,
gratuitously inconsistent. Having some classes called "account" and
others called "AccountGroup" will also be a persistent irritant.
Fifth, using naming like this asks users to remember the
identity-primitive polarity of every identifier if they want to get the
benefits of flattening, and if they don't, they'll get the worst of both
worlds. Since `Point` is a valid type name, users are more likely to
type `Point` when they mean `point` (or worse, do so inconsistently),
and not get the runtime behavior they expect. Freely mixing `point` and
`Point` in programs is allowable, but creates potential performance
issues and null injection issues at the boundaries. If the boundary is
small and well-defined (existing APIs that have been compatibly
migrated), that's acceptable; if the boundary is pervasive and complex,
this might be worse than nothing.
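
Today's `int`/`Integer` pair gives a rough feel for the null-injection boundary described above. The sketch below uses them only as an analogy for a primitive class and its reference projection; `BoundaryDemo` and `describe` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class BoundaryDemo {
    // The map speaks in references (Integer); the local speaks in values (int).
    static String describe(Map<String, Integer> counts, String key) {
        try {
            int n = counts.get(key); // implicit unboxing at the boundary
            return "count=" + n;
        } catch (NullPointerException e) {
            // A missing key yields null, which has no int representation.
            return "null reached a value-typed variable";
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        System.out.println(describe(counts, "a"));
        System.out.println(describe(counts, "b"));
    }
}
```

With a single, well-defined boundary like this one, the failure is easy to reason about; the concern in the text is what happens when such conversions are scattered through a program.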
So, this proposal is one that I put in the category of "seems attractive
at first" (it was attractive to us, at first, too), but I don't think it
is in the long-term best interests of the language.
More comments inline.
On 4/20/2021 9:29 AM, Michael Kuhlmann wrote:
> Hi,
>
> sorry to coming back to this topic after more than a month, but I
> thought about it several times and want to share my thoughts. Maybe
> the ideas are long discussed already and nobody wants to read about it
> any more, but I looked in the archives and didn't find anything, so
> I'll give it a try. If it's a stupid idea, just let me know, I can
> live with that.
>
> I'm not a contributor, but I'm using Java for more than twenty years,
> since Java 1.1 precisely. I really love the idea of having primitive
> classes, that would be amazing! I'm just concerned that we won't make
> the best use out of it, especially because of compatibility reasons,
> and I was wondering if this can be achieved with a simple design change.
>
> The problems I'm seeing:
> * Primitive classes behave very different from standard object
> classes, but users don't immediately see this. You have to look into
> the definition to know whether an instance variable of SomeType will
> be initialized with null or a default value.
This is true, but relying on uninitialized variables isn't a
particularly great idea either way (and the language doesn't even let
you do this for locals.) This point, though, embodies a hard choice:
are users better served by presenting all user-written abstractions the
same way, or by having a mandatory syntactic designation for classes
that have a certain runtime behavior? For the reasons above, I don't think
users are well served by this (well-intentioned!) suggestion.
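
The default-versus-null distinction already exists between `int` and `Integer` fields, and the rule for locals mentioned above applies to both. A small sketch using those existing types as stand-ins (the class `DefaultsDemo` and its field names are hypothetical):

```java
public class DefaultsDemo {
    int flat;        // value-like field: starts at the default value 0
    Integer boxed;   // reference field: starts at null

    static String initialState() {
        DefaultsDemo d = new DefaultsDemo();
        // int local;
        // System.out.println(local);  // rejected by javac: local not definitely assigned
        return d.flat + "/" + d.boxed;
    }

    public static void main(String[] args) {
        System.out.println(initialState()); // 0/null
    }
}
```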
> * The suffixes .ref and .val don't fit into our concept of class
> names, they look ugly and can easily be mixed up
I'm really glad you brought this up, because it's a common misperception.
The docs on Valhalla feature .val and .ref prominently, because this is
a critical piece of making the whole thing fit together and proving that we
can solve the problems that Valhalla set out to solve. But it is easy
to jump from there to the assumption that users will be dealing with
.ref and .val as often as C programmers have to deal with the difference
between X and *X. This is totally not the case!
It has been a central design requirement to ensure that the use of .ref
and .val is minimized; in most normal situations, they will never
appear. Motivating use cases for explicit .ref are:
- When you explicitly do not want flattening, generally for memory
consumption management. This is an advanced use case for performance
weenies.
- When you want to support type circularity (e.g., a linked list node
that points to the next node.) Generally a low-level implementation
concern.
- (Later on) When you want to express generics _without_
specialization (List<Point.ref>). This is analogous to the "no
flattening" case above, for the same reasons.
Motivating use cases for .val are:
- When you are using a migrated class and want to get flattening. This
is a pure optimization; you can always use the unadorned class name
here, you just don't get flattening.
These all have to do with micro-performance adjustments. Additionally,
users may choose to use P.ref to get "nullable primitives"; they may
also figure into the story for "no good default", but this is not yet
clear.
(In an earlier version, `P.ref` was called `P.box`.)
So:
- Ugly: Ugly is in the eye of the beholder, but such opinions are (a)
not universal and (b) not always permanent. (The lambda syntax we have
now was called ugly when it was first proposed.)
- Don't fit: They don't fit because our mental model does not yet have
a concept of "two ways to represent the same value". That's the real
challenge, not the syntax.
- Easily mixed up: I don't think this will be the case in practice.
> * That we have to introduce .rel just for the existing classes is even
> worse
Not sure what this point is about. There's no `.rel`, and if you mean
`.ref`, I'm not sure what you mean.
> * Existing classes like Optional will be mostly used in their original
> form. That's unfortunate, not that much for performance reasons but
> rather because such a value should never be null, so it could make
> most use out of this concept.
Yes, but this is "glass 99% full." In the early years of this project,
people said we were insane to even consider trying to compatibly migrate
Optional. "It's impossible! Just leave it be!" (These gave way to
complaints about the complexity of migration, which is where we are
now.) I think the solution we have represents a
dramatically-better-than-expected outcome; the alternative is almost
certainly "sorry, Optional was born an identity class, and so it stays."
The syntactic hack of "colonize `optional` as the new name" is just a
different spelling of `Optional.val`; everything else about this is the
same.
> * There's already the discussion to delay the implementation of
> typical primitive classes. Raffaelo proposed to invent classes
> Decimal64 and Decimal128, but it will not be added before this JEP is
> going live to avoid the need of the ugly compatibility hack.
Same is true here; if we had Decimal64 now, regardless of how we spell
it, it would be nullable, and then if we migrated it in-place, it would
be an incompatible change for existing clients of `decimal64`. In this
case, this proposal does not improve compatibility, it just moves the
breakage around.
(General lesson: no matter how hard you think migration compatibility
is, it's harder.)
> * We have to treat the seven existing primitive types in a very
> special way.
Not as special as "very" implies, but ... again, I think this is glass
1% empty. Again, in the early days, it was considered unthinkable that
we would be able to compatibly migrate `int` to be an object, but here
we are. Yes, there are some legacy considerations, but they are fewer
than you probably think. The main one is the most superficial -- that
its name is spelled differently, and its box has an ad-hoc name too.
(But even this is half hidden behind the fact that you can spell
`Integer` as `int.ref` if you like.) The other is that you can't
synchronize on Integer any more -- but if this is the biggest
compatibility sin we've committed, then we've hit this out of the park.
What other "very special" considerations are you worried about?
> People are already used to the idea that normal classes start with an
> uppercase character, but primitives are in lowercase characters. The
> predecessor language Oak even defined string as a primitive type. So
> why not picking up this idea and forcing all future primitive types to
> start with lowercase characters as well?
>
> Java has been very concrete in style guides but very relaxed in
> enforcing them in the past. You can define a class named 'integer'
> without problems. I would see this as a design bug and would rather
> enforce some stricter rules.
>
> So we could make it mandatory to have all primitive class names start
> with a lowercase character, more concrete to a character that can be
> converted to an uppercase character. Instead of creating a twin class
> names 'someClass.ref' what is proposed in the JEP, the reference class
> could be named like the primitive class just starting with the
> uppercase character.
For the reasons above, this seems like a small change but it ripples in
unexpected ways, and not all the advantages actually work as they might
first appear.
The reality is that the visible warts of this proposal come, in no small
part, from the desire for compatible migration for existing identity
classes. For example, we could have just said "Optional is frozen in
time forever", and we might have been able to banish `.val` from the
vocabulary, and then perhaps found another spelling for `.ref`. But, is
that the world we want to live in? If we accept that compatible
migration is a worthwhile goal, and "old optional" and "new optional"
have any difference in semantics, there have to be two names, and the
existing uses have to get the old name, since it's burned into classfiles
(`java/util/Optional;`). Should we just give up on compatible migration?
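
To see how a type name ends up in a classfile, the standard `java.lang.invoke.MethodType` API can print the descriptor that a method returning `Optional` would carry (the full descriptor form adds a leading `L` and, for a method, the parameter list). `DescriptorDemo` is a hypothetical name; the sketch only illustrates why existing uses are tied to the existing name.

```java
import java.lang.invoke.MethodType;
import java.util.Optional;

public class DescriptorDemo {
    // Descriptor of a no-argument method that returns Optional,
    // as it would be recorded in a compiled class.
    static String optionalReturningDescriptor() {
        return MethodType.methodType(Optional.class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(optionalReturningDescriptor()); // ()Ljava/util/Optional;
    }
}
```

Every already-compiled caller carries this string, so whatever the old name means at link time is what those callers get.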
The real shame is that the only difference in semantics that we can't
paper over is nullability (and for Optional, this is adding insult to
injury because the Whole Point of Optional is to not use null.) If we
could, then we wouldn't have to pick another name, and there would be
different options available to us. The pain of null keeps on giving.
More information about the valhalla-dev
mailing list