We have to talk about "primitive".

Sat Dec 18 11:53:07 UTC 2021

> From: "Kevin Bourrillion" <kevinb at google.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Wednesday, 15 December 2021 19:42:55
> Subject: We have to talk about "primitive".

> (Okay, so we're doing this)
> I think the rename to "primitive classes" happened during my outage last year.
> When I came back I made the decision to like it.

> Since then, I've found that in my explanatory model I'm fighting against it
> constantly. I think it may actually be fatally flawed.

> The points I raise here were surely already known at the time, and I know there
> were good reasons for overriding them. But I feel the need to come back and
> push harder on their importance.

> Background: the textbook definition of "primitive" is centered on their nature
> of being elements-not-molecules, and I see no dispute about it. Also, there's
> no disputing the fact that we're allowed to adopt a different meaning if we so
> choose. So that's not even the fatal flaw.

As John already said, they are atoms in terms of user-defined types but not at runtime: unless it is declared volatile, a long or a double can be handled as two 32-bit values.

> The main problem I think we can't escape is that we'll still need some word that
> means only the eight predefined types. (For the sake of argument let's assume
> we can pick one and lean hard on it, whether that's "predefined", "built-in",
> "elemental", "leaf type", or whatever.)

I still hope that we can see a user-defined primitive and a builtin type the same way from a JLS point of view.
Obviously, from the JVMS POV they are different, but I think one of our goals should be that the distinction between a builtin primitive and a user-defined primitive is not visible in the JLS.

> Definitely, our trying to minimize their specialness is virtuous. They should be
> like helium: yes, they are molecules when you want a molecule! But on any
> deeper look they will clearly be "actually" elements, and the distinction will
> matter often enough.

> So we have to attempt to shift users' understanding of "primitive" while at the
> same time injecting a new term to mean exactly what primitive used to mean.
> That's the old Indiana Jones switch and I don't have to tell you how that
> turned out for him.

> It would be difficult to pull off in a world where we were just pushing some new
> server and the whole world gets the new model at once. But in this universe
> where every version of Java ever made all have to coexist, it's looking to me
> like a guaranteed source of never-ending confusion.

> I also think it robs us of our ability to smoothly portray the real changes of
> Valhalla. We want to be able to say "elements are still elements! now we have
> molecules too". Pedagogically that is always preferable to "elements aren't
> really what you thought they were". Okay, the real comparison is a little more
> nuanced than that, but I'll get to that now.

I agree that retconning is better pedagogically, because a lot of people think in terms of analogies.

> An alternative that seems to work fine, in my mental model at least, is:

>     * Primitive types are examples of value types, and have always been.
>     * Java never supported any other kinds of value types before, so we didn't
>       distinguish the terms before.
>     * Everything you associate with primitive types remains true.
>     * But most of those traits really come from their value-type-ness.

> (I plan to make the above shifts to my model document already.)

>     * Now we have user-defined value types too.
>     * The way we user-define a type is with a class, so a value type is defined by a
>       "value class" (sorry B2).
>     * The primitive types will now each get a value class.
>     * These 8 classes will look as much like user-defined types as Object does.
>     * They, like Object, will have a "cheat" in their source code that no one else
>       gets to use. (Object's is that there is no implied `extends Object` or
>       `super();`; these need no fields because the data they store is magically
>       handled by the VM. These feel like similar cheats.)

> Then mopping up the rest:

>     * Existing classes probably need a term like "reference classes" (in the model
>       I'm going to circulate that doubles down on values-are-not-objects, then this
>       wants to be "object classes", even though that feels weird at first).
>     * I think the term for bucket 2 classes really ought to center on
>       identitylessness, e.g. "noid", "noident", "idfree", or something. Anything else
>       is getting away from the essential meaning of the bucket; plus, we want people
>       to call bucket 1 classes "identity classes", don't we?

> Footnote: for a more concrete manifestation of this problem: I am sure we cannot
> possibly get away with Class.isPrimitive() being true for these classes. Right?

> Thoughts?

I agree, but I don't think we should use "value type" as a term to encompass both user-defined primitives and builtin primitives.
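
On the footnote about Class.isPrimitive(), a small sketch of what the method answers in today's Java may help frame what is at stake. The class name IsPrimitiveToday and the helper isBuiltin are only illustrative; the behavior shown is the one documented for java.lang.Class (true for the eight predefined types and for void, false for everything else).

```java
public class IsPrimitiveToday {
    // Illustrative helper: delegates to Class.isPrimitive(), which today is
    // true only for the eight predefined types and for void.
    static boolean isBuiltin(Class<?> type) {
        return type.isPrimitive();
    }

    public static void main(String[] args) {
        System.out.println(isBuiltin(int.class));     // true
        System.out.println(isBuiltin(double.class));  // true
        System.out.println(isBuiltin(void.class));    // true
        System.out.println(isBuiltin(Integer.class)); // false, the box is a class
        System.out.println(isBuiltin(String.class));  // false
        System.out.println(isBuiltin(int[].class));   // false, arrays are objects
    }
}
```

Existing code that branches on this method assumes the "only the eight predefined types (and void)" meaning, which is why returning true for user-defined classes would be a visible change.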

BTW, I think it's very interesting to have this discussion now that we have scrambled the model by introducing the B1/B2/B3 model.

This is how I see things.
Technically, we have 4 categories:
B1: user-defined object with an identity, used by reference (nullable)
B2: user-defined object with no identity, used by reference (nullable)
B3: user-defined primitive with no identity, used as a direct value
B4: builtin primitive with no identity, used as a direct value

With the previous model of Valhalla, we had only the categories B1, B2 and B3, so the cut was between having an identity or not, to the point where we introduced IdentityObject/ValueObject in the type system.
I believe that introducing B2 changes where we make the cut; we still hope that in the end we have only two categories, right?

I believe we should piggyback on the difference between reference vs direct value and do the cut here. 
After all, introducing B2 means that having identityless objects used by reference is useful. 

So, for me, it seems logical to group B1 and B2 together and to group B3 and B4 together, and to see B2 as a special kind of B1 and B4 as a special kind of B3.

So we have objects or primitives. Among the objects, we have the ones with identity and the identityless ones; among the primitives, we have the ones with a Ref box and the ones with a historical box (Integer, etc.).

On the subject of boxes, I think we should go the other way, aka "you never really understood how boxes worked", because most people don't care about how a box works, and rightly so:
once we get better generics, boxes will mostly disappear.

I think we should not introduce the interfaces IdentityObject/ValueObject, because they no longer seem useful to explain the new model: they are not the center of the model anymore, and their usefulness in terms of typing is low.
(We still need to consider empty abstract classes as special, but that is a detail for people wanting to play with primitive classes + inheritance, so it's fairly specific.)

For a regular user of Java who does not care about the JVM details:
- classes/enums/records/lambdas are handled by reference; a class/enum/record can be declared identityless with a modifier, and a lambda is identityless.
- primitives are handled as direct values (so not nullable); they are tearable and have a default value/a non-overridable default constructor.
  They are defined with the keyword primitive; the builtins (the ones written in lowercase) have a named box instead of Primitive.Box.

Examples:
String is a class, Optional is an identityless class, Complex is a primitive, int is a builtin primitive.
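
To make the reference vs. direct value cut concrete, here is a minimal sketch that runs on today's Java. It only shows the observable difference that exists now: a fresh array slot holds null for anything handled by reference and zero for a builtin primitive. The class CategoriesToday and its Complex stand-in are hypothetical names for illustration; since the primitive keyword is not valid Java today, Complex is written as an ordinary class and therefore still behaves like a reference.

```java
import java.util.Optional;

public class CategoriesToday {
    // Hypothetical stand-in for a user-defined primitive (B3). Today it is an
    // ordinary class, so it is handled by reference and is nullable.
    static final class Complex {
        final double re;
        final double im;
        Complex(double re, double im) { this.re = re; this.im = im; }
    }

    // B1 example: String, handled by reference, default slot value is null.
    static String defaultString() {
        String[] slots = new String[1];
        return slots[0];
    }

    // B2 example: Optional, still handled by reference, so also null.
    static Optional<?> defaultOptional() {
        Optional<?>[] slots = new Optional<?>[1];
        return slots[0];
    }

    // B3 stand-in: null today; under the model above it would instead hold
    // the default value of Complex, because primitives are not nullable.
    static Complex defaultComplex() {
        Complex[] slots = new Complex[1];
        return slots[0];
    }

    // B4 example: int, a direct value, default slot value is 0.
    static int defaultInt() {
        int[] slots = new int[1];
        return slots[0];
    }

    public static void main(String[] args) {
        System.out.println(defaultString());   // null
        System.out.println(defaultOptional()); // null
        System.out.println(defaultComplex());  // null (today)
        System.out.println(defaultInt());      // 0
    }
}
```

Under the grouping proposed above, only the Complex line would change: it would join int on the "direct value with a default" side instead of printing null.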

Rémi