New candidate JEP: 401: Primitive Objects (Preview)

Tue Apr 20 17:34:35 UTC 2021

Hi Brian,

Thank you so much for the detailed answer! I understand that this case
was already discussed at the very beginning. I was expecting this but
wanted to understand the reasoning behind it, so your explanation helped
a lot.

I don't want to reopen the whole discussion, but since you gave such
a detailed explanation, I want to respond to your points and share my
motivation a bit further. It's clear this won't change anything in the
current proposal; it stays as it is, and I'm fine with that. In the end,
it's just another proposal from my side, nothing more.

Please see my comments inline.

On 4/20/21 4:33 PM, Brian Goetz wrote:
> The main question you are addressing is: primitive classes are 
> different, so should they look different?  It is a very natural 
> temptation to want the new features to StAnD OuT and LooK D!FFeR3nt; 
> these things are new and we are worried users will be confused.  (See 
> https://www.thefeedbackloop.xyz/stroustrups-rule-and-layering-over-time/ 
> for a more detailed description of this common phenomenon.) Indeed, the
> original strawman syntax of lambdas was LOUD -- the first proposal used 
> `#(int x, int y)(x * y)`.  When we changed this to `(x, y) -> x*y`, 
> people first complained "that's too subtle!"  But it took all of about 
> five minutes to get over this, and looking back to the original syntax, 
> it feels like a hammer blow to the head.  "I'M NEW AND DIFFERENT", it 
> shouts!

Yes, I think they should look different. Not too much, sure, but I 
wouldn't mind marking them a bit with a different naming scheme.

Let me add that I don't think primitive classes will come up often for
a typical developer. 98% of Java developers will probably never be
tempted to write their own primitive classes. I see this as a feature
only for some rather low-level types, like Optional or some of the
java.time classes. I can hardly imagine much more, even including all
the standard frameworks: only some mathematical libraries and some
internal classes in collection frameworks will make use of it. Or the
Disruptor framework. But there the benefit is huge!

So I don't think we'll be flooded with thousands of primitive classes 
looking different in the near future. Honestly I think even Point is not 
the best example. If you deal with image or 3D vector processing, having 
a lot of point and color elements in rather low level code, it might 
make sense to have this data in a condensed form using primitive 
classes, but if you model geometrical objects such as points, circles 
and rectangles, I would go for the standard reference types (or records 
probably).

One side note: I sometimes find it confusing that enums look the same as
normal domain classes, with which they are often mixed. You can't
instantiate them, they're usually immutable, and you might want to
compare them by identity. It's good that IDEs usually render them with a
different icon, but that doesn't help within Java code.
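
To make the side note concrete, here is a minimal sketch in plain Java
(no JEP 401 syntax). The names EnumIdentityDemo and Color are made up
for illustration only:

```java
final class EnumIdentityDemo {
    // Hypothetical enum; "new Color()" would not compile.
    enum Color { RED, GREEN }

    // Each constant exists exactly once, so comparing with == is safe.
    static boolean sameInstance() {
        return Color.valueOf("RED") == Color.RED;
    }

    // Different constants are different objects.
    static boolean differentConstants() {
        return Color.RED != Color.GREEN;
    }
}
```

Nothing in the declaration "Color c" tells the reader that c behaves
this way; you have to look up the definition of Color.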

I would not propose a different naming style for enums; in the end
they're still relatively normal Java classes. But since primitive
classes are even more different, and also much less frequent, that could
justify a different naming style.

But in the end, it's a matter of opinion, and I understand your concerns
about such an "awkward look" for primitive class names.

> 
> Your proposal seems to be to continue using lower-case identifiers for 
> primitive classes, and the leading-upper-case version for their 
> reference projection.  This has been made before.  It has some apparent 
> upsides, as you propose, but also some downsides.
> 
> First, it takes decades of naming conventions and throws them out the 
> window.  Previously, lower-case identifiers are either keywords (drawn 
> from a fixed list, which includes `int` and friends) or variable/method 
> names; type names (except for the ones which are keywords) begin with an 
> upper case.  This proposal spills type names into the identifier space, 
> meaning that we have lost valuable clues for both types and 
> variable/method names.  This creates new problems as it attempts to 
> solve others.

I agree, this is a downside. But you could argue that the existing
primitive types are already defined like this: they are types starting
with a lowercase character, just like method or variable names. That all
primitive type names are also reserved keywords doesn't change much;
it's more a language spec detail.

And I wouldn't have come up with a proposal that runs contrary to the
standard naming conventions if the whole JEP weren't already kind of
revolutionary. Having all current classes automatically implement some
marker interface is definitely something very new and unexpected, so I
thought all doors were open already. ;)

> 
> Second, it creates an uncomfortable coupling between two identifiers, 
> whose names are only related through an ad-hoc (and latin-centric) 
> mechanism, upper-casing the first letter.  Where is the definition of 
> `Point`?  Having it be in `primitive class point { }` is confusing.  The 
> language and JVM have gone to great lengths to avoid making such 
> couplings in the past.

The Latin-centric mechanism is indeed something I was thinking about as
well. (But honestly, if you define the Chinese class 点 and refer to
its reference type, then 点.ref also looks very Latin-centric.)

You could also ask, where is the class definition of Point.ref? Is Point 
the package name and ref the class name?

But agreed, this argument still weighs more against the lowercase
proposal.

> 
> Third, it doesn't really solve all the problems you think it does; your 
> point about Optional works exactly the same way under this proposal (you 
> have to stick with non-flat `Optional` in existing APIs, and switch to 
> `optional` to flatten where you can) as it does under the current plan 
> (switch to `Optional.val` to flatten where you can.)

That is true. But in future code, I assume it will be much less likely
that interface designers will declare the return type with the
concatenated name monster 'Optional.val' than with a lowercase type
'optional'. And it even reads nicely: 'optional<String>' looks like an
optional String, as if optional were a modifier like volatile. Of
course, that's just my personal preference.

> 
> Fourth, while this reduces the chance that a user will mistake a 
> primitive class instance for a reference class instance, the cost of 
> this is that APIs become, from the perspective of many users, 
> gratuitously inconsistent.  Having some classes called "account" and 
> others called "AccountGroup" will also be a persistent irritant.

You will find examples where this is true, but I don't buy this one. ;)
I can't imagine why someone would want to declare the domain class
'Account' as primitive, especially when other classes in the same model
are not. If someone mixes primitive and reference types in the same
model, then they are doing something very wrong. I would be curious
whether there is a useful real-life example of such a use case, but
I can't find one.

And even if they do so, it's not much better with the current proposal.
If you ask the account for its groups, you get instances of
AccountGroup, but if you ask the group for its accounts, you get
instances of Account.ref. That's irritating as well.

> 
> Fifth, using naming like this asks users to remember the 
> identity-primitive polarity of every identifier if they want to get the 
> benefits of flattening, and if they don't, they'll get the worst of both 
> worlds.  Since `Point` is a valid type name, users are more likely to 
> type `Point` when they mean `point` (or worse, do so inconsistently), 
> and not get the runtime behavior they expect.  Freely mixing `point` and 
> `Point` in programs is allowable, but creates potential performance 
> issues and null injection issues at the boundaries.  If the boundary is 
> small and well-defined (existing APIs that have been compatibly 
> migrated), that's acceptable; if the boundary is pervasive and complex, 
> this might be worse than nothing.

That's a valid point. (Point, hehe.) But as I said, I would not
recommend using primitive classes for domain classes anyway, only for
low-level types such as Instant. You can still mix up Instant and
instant, but the people who do this are often those who already mix up
int and Integer or Long and long.
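
To illustrate the kind of mix-up I mean, here is a small sketch that
uses only today's int/Integer and long/Long. The class and method names
are hypothetical. It shows the null problem at the boxing boundary and
why equals() is the safe way to compare boxes:

```java
import java.util.HashMap;
import java.util.Map;

final class BoxingBoundaryDemo {
    // Returns true if unboxing a missing map entry fails with a
    // NullPointerException.
    static boolean unboxingNullFails() {
        Map<String, Integer> counts = new HashMap<>();
        try {
            // get() returns null here; assigning it to an int unboxes it.
            int n = counts.get("missing");
            return n < 0; // not reached
        } catch (NullPointerException e) {
            return true;
        }
    }

    // equals() compares the wrapped values; == on two boxes would
    // compare object references instead.
    static boolean equalsComparesValues() {
        Long a = 1000L;
        Long b = 1000L;
        return a.equals(b);
    }
}
```

I would expect the same class of mistakes between a primitive class and
its reference type, whichever way the two end up being spelled.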

> 
> So, this proposal is one that I put in the category of "seems attractive 
> at first" (it was attractive to us, at first, too), but I don't think it 
> is in the long-term best interests of the language.
> 
> More comments inline.

Thanks again, I'll comment on them as well.

> 
> On 4/20/2021 9:29 AM, Michael Kuhlmann wrote:
>> The problems I'm seeing:
>> * Primitive classes behave very differently from standard object
>> classes, but users don't immediately see this. You have to look into 
>> the definition to know whether an instance variable of SomeType will 
>> be initialized with null or a default value.
> 
> This is true, but relying on uninitialized variables isn't a 
> particularly great idea either way (and the language doesn't even let 
> you do this for locals.)   This point, though, embodies a hard choice: 
> are users better served by presenting all user-written abstractions the 
> same way, or by having a mandatory syntactic designation for classes 
> that have a certain runtime behavior?  For reasons above, I don't think 
> users are well served by this (well-intentioned!) suggestion.

Hmm, that depends. I agree that relying on uninitialized variables isn't
a great idea, but people do it. They know that a variable of type int is
initialized to zero, but if it's of type Point, they might wonder why
it's set to (0,0), something like the upper left corner of the screen.
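
As a sketch of what people rely on today (plain Java, a hypothetical
Holder class, no primitive classes involved): primitive fields start at
zero, reference fields start at null. As far as I understand the JEP, a
field of a primitive class type would behave like the first group.

```java
final class DefaultValuesDemo {
    // Hypothetical holder class, only for illustration.
    static final class Holder {
        int count;          // primitive field: defaults to 0
        double ratio;       // primitive field: defaults to 0.0
        Integer boxedCount; // reference field: defaults to null
        String label;       // reference field: defaults to null
    }

    // Reads the fields without ever assigning them.
    static String describe() {
        Holder h = new Holder();
        return h.count + " " + h.ratio + " " + h.boxedCount + " " + h.label;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "0 0.0 null null"
    }
}
```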

Again, it's a matter of opinion.

> 
>> * The suffixes .ref and .val don't fit into our concept of class 
>> names, they look ugly and can easily be mixed up
> 
> I'm really glad you brought this up, because it's a common misperception.
> 
> [Some good context here]

Thanks for the insight. And to be honest, I have no objections to the
.ref suffix; one would rarely use the reference type explicitly when the
value type could also be used, and if so, it's perhaps a good idea to
mark this more prominently. Also, primitive types will rarely be used in
standard collections, I assume. So that's an argument for the suffix
style.

What I find sad is that we need the .val suffix. But that's better
explained in the next section.

> 
>> * That we have to introduce .rel just for the existing classes is even 
>> worse
> 
> Not sure what this point is about.  There's no `.rel`, and if you mean 
> `.ref`, I'm not sure what you mean.

Yeah, my bad, here I already mixed those two up. Or I typed too fast. I
meant the .val suffix, and that's AFAIU only introduced for the existing
classes.

So in the future we'll have two different naming schemes for the same
concept:
* Existing classes can be addressed using Instant for the reference
type, and Instant.val for the value type.
* New classes can be addressed using Point.ref for the reference type
and Point for the value type.

So there are two distinct naming schemes for the same kind of
reference/value pairs. The only difference is that the first class was
introduced before the JEP, and the second one after it.

And for libraries that want to stay compatible with older JDKs, but
eventually want to make use of this feature in future releases, it's
even more complicated.

I was trying to find a solution to get around this. To make use of
your metaphor, if the glass is 90% full, but we can fill it up to 100%
without additional overhead, that would be even better.

> 
>> * Existing classes like Optional will be mostly used in their original 
>> form. That's unfortunate, not that much for performance reasons but 
>> rather because such a value should never be null, so it could make 
>> most use out of this concept.
> 
> Yes, but this is "glass 99% full."  In the early years of this project, 
> people said we were insane to even consider trying to compatibly migrate 
> Optional.  "It's impossible!  Just leave it be!"   (These gave way to 
> complaints about the complexity of migration, which is where we are 
> now.)  I think the solution we have represents a 
> dramatically-better-than-expected outcome; the alternative is almost
> certainly "sorry, Optional was born an identity class, and so it stays."
> 
> The syntactic hack of "colonize `optional` as the new name" is just a 
> different spelling of `Optional.val`; everything else about this is the 
> same.

True, in the end it's just a naming concept, but at least a consistent
one for old and new classes.

(But I generally agree with your arguments, and thank you for the
insights into the evaluation of these concepts.)

> 
>> * We have to treat the seven existing primitive types in a very 
>> special way.
> 
> Not as special as "very" implies, but ... again, I think this is glass 
> 1% empty.  Again, in the early days, it was considered unthinkable that 
> we would be able to compatibly migrate `int` to be an object, but here 
> we are.  Yes, there are some legacy considerations, but they are fewer 
> than you probably think.  The main one is the most superficial -- that 
> its name is spelled differently, and its box has an ad-hoc name too.  
> (But even this is half hidden behind the fact that you can spell 
> `Integer` as `int.ref` if you like.)  The other is that you can't 
> synchronize on Integer any more -- but if this is the biggest 
> compatibility sin we've committed, then we've hit this out of the park.
> 
> What other "very special" considerations are you worried about?

I wonder if we can get rid of all these inconsistencies. Not only
between legacy code and post-JEP-401 code, but also between the core
primitives and new ones.

If there were no 'float' primitive yet, we would call the class
Float and the boxed type Float.ref. Or if it had been introduced a
bit earlier, it would have been Float.val and Float. But it's neither:
the primitive name is float and the boxed type is Float, just as in my
proposal.

What I read is that you plan to define aliases, e.g., float referring
to Float.val. This wouldn't be necessary if the naming scheme already
covered the existing pattern of primitive types. Except for int and
char, which fall out of line a bit.

So the idea is avoiding three different naming schemes for the same concept.

And if someone wants to invent a type for imaginary numbers, they can
call the class imaginary, the boxed type is Imaginary, and it fits very
well with the existing primitive types. It just feels similar.
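
Just to show that the compiler itself doesn't object to such a name
today, here is a sketch with an ordinary class. Note that this is a
normal identity class with a lowercase name, not a primitive class, and
the field and method names are my own invention:

```java
// Legal Java today: the lowercase type name only breaks the convention.
final class imaginary {
    final double im; // coefficient of i

    imaginary(double im) {
        this.im = im;
    }

    // Adds two imaginary numbers and returns a new instance.
    imaginary plus(imaginary other) {
        return new imaginary(im + other.im);
    }
}

final class LowercaseNameDemo {
    static double sum() {
        return new imaginary(1.5).plus(new imaginary(2.0)).im;
    }
}
```

What my proposal would add on top is only the rule that such a name is
mandatory for primitive classes and that Imaginary names the boxed type.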

> 
>> People are already used to the idea that normal classes start with an 
>> uppercase character, but primitives are in lowercase characters. The 
>> predecessor language Oak even defined string as a primitive type. So 
>> why not pick up this idea and force all future primitive types to
>> start with lowercase characters as well?
>>
>> Java has been very concrete in style guides but very relaxed in 
>> enforcing them in the past. You can define a class named 'integer' 
>> without problems. I would see this as a design bug and would rather 
>> enforce some stricter rules.
>>
>> So we could make it mandatory to have all primitive class names start
>> with a lowercase character, or more precisely with a character that
>> can be converted to an uppercase character. Instead of creating a twin
>> class named 'someClass.ref' as proposed in the JEP, the reference
>> class could be named like the primitive class, just starting with the
>> uppercase character.
> 
> For the reasons above, this seems like a small change but it ripples in 
> unexpected ways, and not all the advantages actually work as they might 
> first appear.
> 
> 
> The reality is that the visible warts of this proposal come, in no small 
> part, from the desire for compatible migration for existing identity 
> classes.  For example, we could have just said "Optional is frozen in 
> time forever", and we might have been able to banish `.val` from the 
> vocabulary, and then perhaps found another spelling for `.ref`.  But, is 
> that the world we want to live in?  If we accept that compatible 
> migration is a worthwhile goal, and "old optional" and "new optional" 
> have any difference in semantics, there have to be two names, and the 
> existing uses have to get the old name, since it's burned into classfiles
> (`java/util/Optional;`). Should we just give up on compatible migration?
> 
> The real shame is that the only difference in semantics that we can't 
> paper over is nullability (and for Optional, this is adding insult to 
> injury because the Whole Point of Optional is to not use null.)  If we 
> could, then we wouldn't have to pick another name, and there would be 
> different options available to us.  The pain of null keeps on giving.
> 
That is so true, and I agree completely.

Thank you very much for the detailed explanation. I see that we'll be
using .val and .ref in the future, and I agree that there will be only
a few cases where it's really needed.
