New candidate JEP: 401: Primitive Objects (Preview)

Tue Apr 20 14:33:41 UTC 2021

Thanks for these thoughts. You may, or may not, find it comforting to 
know that all these concerns -- and the syntactic "hack" you propose -- 
have been considered extensively prior to settling on this design.  
While it's not intrinsically a terrible idea, it's not as powerful as it 
first appears, and the problems that you are concerned about loom much
larger when contemplating this big change than they are likely to
be once you actually start using primitive classes.

The main question you are addressing is: primitive classes are 
different, so should they look different?  It is a very natural 
temptation to want the new features to StAnD OuT and LooK D!FFeR3nt; 
these things are new and we are worried users will be confused.  (See 
https://www.thefeedbackloop.xyz/stroustrups-rule-and-layering-over-time/ 
for a more detailed description of this common phenomenon.) Indeed, the
original strawman syntax of lambdas was LOUD -- the first proposal used 
`#(int x, int y)(x * y)`.  When we changed this to `(x, y) -> x*y`, 
people first complained "that's too subtle!"  But it took all of about 
five minutes to get over this, and looking back at the original syntax,
it feels like a hammer blow to the head.  "I'M NEW AND DIFFERENT", it 
shouts!

Your proposal seems to be to continue using lower-case identifiers for 
primitive classes, and the leading-upper-case version for their 
reference projection.  This proposal has been made before.  It has some apparent
upsides, as you propose, but also some downsides.

First, it takes decades of naming conventions and throws them out the
window.  Until now, lower-case identifiers have been either keywords (drawn
from a fixed list, which includes `int` and friends) or variable/method
names; type names (except for the ones which are keywords) begin with an
upper-case letter.  This proposal spills type names into the lower-case
identifier space, meaning that we lose valuable clues for both types and
variable/method names.  This creates new problems as it attempts to
solve others.

Second, it creates an uncomfortable coupling between two identifiers, 
whose names are only related through an ad-hoc (and Latin-centric)
mechanism, upper-casing the first letter.  Where is the definition of 
`Point`?  Having it be in `primitive class point { }` is confusing.  The 
language and JVM have gone to great lengths to avoid making such 
couplings in the past.

Third, it doesn't really solve all the problems you think it does; your 
point about Optional works exactly the same way under this proposal (you 
have to stick with non-flat `Optional` in existing APIs, and switch to 
`optional` to flatten where you can) as it does under the current plan 
(switch to `Optional.val` to flatten where you can.)

Fourth, while this reduces the chance that a user will mistake a 
primitive class instance for a reference class instance, the cost of 
this is that APIs become, from the perspective of many users, 
gratuitously inconsistent.  Having some classes called "account" and 
others called "AccountGroup" will also be a persistent irritant.

Fifth, using naming like this asks users to remember the 
identity-primitive polarity of every identifier if they want to get the 
benefits of flattening, and if they don't, they'll get the worst of both 
worlds.  Since `Point` is a valid type name, users are more likely to 
type `Point` when they mean `point` (or worse, do so inconsistently), 
and not get the runtime behavior they expect.  Freely mixing `point` and 
`Point` in programs is allowable, but creates potential performance 
issues and null injection issues at the boundaries.  If the boundary is 
small and well-defined (existing APIs that have been compatibly 
migrated), that's acceptable; if the boundary is pervasive and complex, 
this might be worse than nothing.
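The null-injection risk at a `point`/`Point` boundary can be previewed with today's `int`/`Integer` pair, where unboxing a null reference fails at runtime. This is a minimal sketch under that assumption (the JEP's `primitive class` syntax is a preview feature and is not used here); `NullBoundary`, `doubled`, `crossBoundary`, and `throwsOnNull` are hypothetical names, not part of any proposed API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: int plays the flat "point" side, Integer the nullable "Point" side.
class NullBoundary {
    // Flat side: an int can never be null.
    static int doubled(int x) {
        return x * 2;
    }

    // Boundary: a nullable Integer crosses into the flat world.
    // Auto-unboxing calls intValue(), so a null here throws NullPointerException.
    static int crossBoundary(Integer maybeNull) {
        return doubled(maybeNull);
    }

    // Reports whether pushing this value across the boundary fails.
    static boolean throwsOnNull(Integer value) {
        try {
            crossBoundary(value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 21);
        System.out.println(crossBoundary(counts.get("a"))); // prints 42
        System.out.println(throwsOnNull(counts.get("b")));  // prints true: a missing key yields null
    }
}
```

The failure surfaces where the value crosses the boundary, not where the null was produced, which is why a small, well-defined boundary is far easier to live with than a pervasive one.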

So, this proposal is one that I put in the category of "seems attractive 
at first" (it was attractive to us, at first, too), but I don't think it 
is in the long-term best interests of the language.

More comments inline.

On 4/20/2021 9:29 AM, Michael Kuhlmann wrote:
> Hi,
>
> sorry for coming back to this topic after more than a month, but I
> thought about it several times and want to share my thoughts. Maybe
> the ideas have long been discussed already and nobody wants to read about it
> any more, but I looked in the archives and didn't find anything, so
> I'll give it a try. If it's a stupid idea, just let me know, I can 
> live with that.
>
> I'm not a contributor, but I've been using Java for more than twenty years,
> since Java 1.1 to be precise. I really love the idea of having primitive
> classes, that would be amazing! I'm just concerned that we won't make
> the best use of them, especially because of compatibility reasons,
> and I was wondering if this can be achieved with a simple design change.
>
> The problems I'm seeing:
> * Primitive classes behave very differently from standard object
> classes, but users don't immediately see this. You have to look into
> the definition to know whether an instance variable of SomeType will 
> be initialized with null or a default value.

This is true, but relying on uninitialized variables isn't a 
particularly great idea either way (and the language doesn't even let 
you do this for locals.)   This point, though, embodies a hard choice: 
are users better served by presenting all user-written abstractions the 
same way, or by having a mandatory syntactic designation for classes 
that have a certain runtime behavior?  For the reasons above, I don't think
users are well served by this (well-intentioned!) suggestion.
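The default-value difference described in the quoted point already exists today between `int` and `Integer`, and the definite-assignment rule for locals is what keeps it from mattering there. This is a minimal sketch assuming that pair as a stand-in for a primitive class and its `.ref` projection (no preview syntax is used); `DefaultValues`, `Holder`, `flatDefault`, and `boxedDefault` are hypothetical names.

```java
// Sketch only: today's int/Integer pair stands in for a primitive class and its
// reference projection (the JEP lets Integer be spelled int.ref).
class DefaultValues {
    static class Holder {
        int flat;       // primitive-typed field: defaults to 0
        Integer boxed;  // reference-typed field: defaults to null
    }

    static int flatDefault() {
        return new Holder().flat;
    }

    static Integer boxedDefault() {
        return new Holder().boxed;
    }

    public static void main(String[] args) {
        System.out.println(flatDefault());  // prints 0
        System.out.println(boxedDefault()); // prints null
        // int local;
        // System.out.println(local); // rejected by javac: local may not have been initialized
    }
}
```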

> * The suffixes .ref and .val don't fit into our concept of class 
> names, they look ugly and can easily be mixed up

I'm really glad you brought this up, because it's a common misperception.

The docs on Valhalla feature .val and .ref prominently, because this is 
a critical piece of making the whole fit together and proving that we 
can solve the problems that Valhalla set out to solve.  But it is easy 
to jump from there to the assumption that users will be dealing with 
.ref and .val as often as C programmers have to deal with the difference 
between `X` and `X*`.  This is totally not the case!

It has been a central design requirement to ensure that the use of .ref
and .val is minimized; in most normal situations, they will never
appear.  Motivating use cases for explicit .ref are:

  - When you explicitly do not want flattening, generally for memory 
consumption management.  This is an advanced use case for performance 
weenies.
  - When you want to support type circularity (e.g., a linked list node 
that points to the next node.)  Generally a low-level implementation 
concern.
  - (Later on) When you want to express generics _without_ 
specialization (List<Point.ref>).  This is analogous to the "no 
flattening" case above, for the same reasons.

Motivating use cases for .val are:

  - When you are using a migrated class and want to get flattening. This 
is a pure optimization; you can always use the unadorned class name 
here, you just don't get flattening.

These all have to do with micro-performance adjustments. Additionally, 
users may choose to use P.ref to get "nullable primitives"; they may 
also figure into the story for "no good default", but this is not yet 
clear.

(In an earlier version, `P.ref` was called `P.box`.)

So:

  - Ugly: Ugly is in the eye of the beholder, but such opinions are (a) 
not universal and (b) not always permanent.  (The lambda syntax we have 
now was called ugly when it was first proposed.)
  - Don't fit: They don't fit because our mental model does not yet have 
a concept of "two ways to represent the same value".  That's the real 
challenge, not the syntax.
  - Easily mixed up: I don't think this will be the case in practice.

> * That we have to introduce .rel just for the existing classes is even 
> worse

Not sure what this point is about.  There's no `.rel`, and if you mean 
`.ref`, I'm not sure what you mean.

> * Existing classes like Optional will mostly be used in their original
> form. That's unfortunate, not so much for performance reasons but
> rather because such a value should never be null, so it could make
> the most use of this concept.

Yes, but this is "glass 99% full."  In the early years of this project, 
people said we were insane to even consider trying to compatibly migrate 
Optional.  "It's impossible!  Just leave it be!"   (These gave way to 
complaints about the complexity of migration, which is where we are 
now.)  I think the solution we have represents a 
dramatically-better-than-expected outcome; the alternative is almost
certainly "sorry, Optional was born an identity class, and so it stays."

The syntactic hack of "colonize `optional` as the new name" is just a 
different spelling of `Optional.val`; everything else about this is the 
same.

> * There's already the discussion to delay the implementation of 
> typical primitive classes. Raffaelo proposed to invent classes
> Decimal64 and Decimal128, but they will not be added before this JEP
> goes live, to avoid the need for the ugly compatibility hack.

Same is true here; if we had Decimal64 now, regardless of how we spell 
it, it would be nullable, and then if we migrated it in-place, it would 
be an incompatible change for existing clients of `decimal64`.  In this 
case, this proposal does not improve compatibility, it just moves the 
breakage around.

(General lesson: no matter how hard you think migration compatibility 
is, it's harder.)

> * We have to treat the eight existing primitive types in a very
> special way.

Not as special as "very" implies, but ... again, I think this is glass 
1% empty.  Again, in the early days, it was considered unthinkable that 
we would be able to compatibly migrate `int` to be an object, but here 
we are.  Yes, there are some legacy considerations, but they are fewer 
than you probably think.  The main one is the most superficial -- that 
its name is spelled differently, and its box has an ad-hoc name too.  
(But even this is half hidden behind the fact that you can spell 
`Integer` as `int.ref` if you like.)  The other is that you can't 
synchronize on Integer any more -- but if this is the biggest 
compatibility sin we've committed, then we've hit this out of the park.

What other "very special" considerations are you worried about?

> People are already used to the idea that normal classes start with an 
> uppercase character, but primitives are in lowercase characters. The 
> predecessor language Oak even defined string as a primitive type. So
> why not pick up this idea and force all future primitive types to
> start with lowercase characters as well?
>
> Java has been very concrete in style guides but very relaxed in 
> enforcing them in the past. You can define a class named 'integer' 
> without problems. I would see this as a design bug and would rather 
> enforce some stricter rules.
>
> So we could make it mandatory for all primitive class names to start
> with a lowercase character, more precisely with a character that can be
> converted to an uppercase character. Instead of creating a twin class
> name 'someClass.ref', as proposed in the JEP, the reference class
> could be named like the primitive class, just starting with the
> uppercase character.

For the reasons above, this seems like a small change but it ripples in 
unexpected ways, and not all the advantages actually work as they might 
first appear.

The reality is that the visible warts of this proposal come, in no small 
part, from the desire for compatible migration for existing identity 
classes.  For example, we could have just said "Optional is frozen in 
time forever", and we might have been able to banish `.val` from the 
vocabulary, and then perhaps found another spelling for `.ref`.  But, is 
that the world we want to live in?  If we accept that compatible 
migration is a worthwhile goal, and "old optional" and "new optional" 
have any difference in semantics, there have to be two names, and the 
existing uses have to get the old name, since it's burned into classfiles
(`Ljava/util/Optional;`). Should we just give up on compatible migration?

The real shame is that the only difference in semantics that we can't 
paper over is nullability (and for Optional, this is adding insult to 
injury because the Whole Point of Optional is to not use null.)  If we 
could, then we wouldn't have to pick another name, and there would be 
different options available to us.  The pain of null keeps on giving.
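In current Java nothing stops a variable of type `Optional` from holding `null`, and that nullability is exactly the semantic difference separating the existing identity class from a flattened `Optional.val`. This is a minimal sketch in today's Java (no preview syntax); `NullOptional`, `lookup`, and `describe` are hypothetical names.

```java
import java.util.Optional;

// Sketch only: Optional is an identity class today, so a variable of type
// Optional<String> can hold null even though the API exists to avoid null.
class NullOptional {
    // Hypothetical lookup; the "broken" path returns null instead of Optional.empty().
    static Optional<String> lookup(boolean broken) {
        return broken ? null : Optional.empty();
    }

    // Callers must still defend against a null Optional reference.
    static String describe(Optional<String> result) {
        if (result == null) {
            return "null reference";
        }
        return result.isPresent() ? "present" : "empty";
    }

    public static void main(String[] args) {
        System.out.println(describe(lookup(false))); // prints empty
        System.out.println(describe(lookup(true)));  // prints null reference
    }
}
```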