We have to talk about "primitive".

Thu Dec 16 18:31:15 UTC 2021

Really appreciate the attention and insight here. I must respond on the
installment plan.

On Wed, Dec 15, 2021 at 7:15 PM John Rose <john.r.rose at oracle.com> wrote:

> On 15 Dec 2021, at 10:42, Kevin Bourrillion wrote:
>
> …
> The main problem I think we can't escape is that we'll still need some word
> that means only the eight predefined types. (For the sake of argument let's
> assume we can pick one and lean hard on it, whether that's "predefined",
> "built-in", "elemental", "leaf type", or whatever.)
>
> As others have said, we’ll pick a term for this. The idea of calling out a
> “leaf” in a data graph is compelling to me. As you say, people are going to
> wonder what is the foundation of the whole scheme. (No it’s not objects all
> the way down, at least that’s not what we are aiming for.)
>
> (But—spoiler alert—the division between leaf/scalar/basic type and
> composite/class type is *less important in daily practice* than the ad
> hoc mental models programmers make about which types they choose to view as
> composite and which are indivisible. Typical example: Most programmers
> choose to regard String as a sort of nullable primitive. I’ll pick up
> that thread later.)
>
Yes, I agree.
(Because I hate to drop a metaphor) Physicists want to know that the proton
is divisible, but they can do a hell of a lot without paying attention to
that fact.

> I like the term “basic type”, and (as we already discussed) I like
> “scalar” also, because “scalar” correctly suggests something about how it’s
> processed in hardware.
>
Note I'm stipulating that we'll find the most perfect term there is (next
to "primitive"), and all my arguments remain.

> Here’s a point I think is also important and has not been discussed much
> yet: A concept like “basic type” (or “scalar type”) should include
> references as well as Java’s eight current primitive types.
>
I think I've said this somewhere in this thread as well, "I'm quite
comfortable with the idea that references are the ninth primitive type",
but I backpedaled from that by the time I'd finished the conceptual model
<https://docs.google.com/document/d/1J-a_K87P-R3TscD4uW2Qsbt5BlBR_7uX_BekwJ5BLSE/preview>.
To suggest that "reference" is a type implies that each reference has *two*
static types. Pros/cons:

+ Each of those static types always functions in the exact same way.
Non-reference values just don't have the second one (or it equals the
first).
− It makes *three* types involved overall (1. it's a reference / 2. it has
this constraint on the referent's dynamic type / 3. the referent has this
dynamic type).
− It means that what users see in their code might be the first *or* the
second of those, which seems like losing ground on the opaqueness of
references. Valhalla acts to strengthen the implementation-detailness of
references, e.g. we want users to think, "the dot means member access, and
Java dereferences first if necessary". (I see the main distinction between
the "values-are-objects" and "values-ain't-objects" candidate models as
being how *far* it goes down that line.)

So my alternative is: just let *valueness* be what unifies references with
the other values. References are only special in that (a) their static type
functions totally differently and (b) they are opaque, with everything that
is done to provide that opaqueness. Right now I feel like this gets the job
done.

> As I read your messages, you would prefer to keep the term “primitive”
> narrow, because of the possible confusion of telling users “hey, what you
> think of as primitives are now the heirloom basic primitives.”
> Personally, I think users will say, to our unveiling “extended primitives”,
> something like this:
>
> Well, that’s not exactly what the dictionary says primitive means, if you
> can make new composite ones. But I do know that Java has non-reference
> types and calls them “primitive”. And I also know it would be really cool
> to define new types that work like `int`, such as `UnsignedInt` or
> `HalfFloat` or the like. I get why they don’t want to build all such types
> into the language; in fact maybe I’d like to try my hand someday at
> defining my own. So, “extended primitive”. It’s on: The Java primitives are
> now an open-ended set just like the Java objects.
>
I have quibbles here and there but I definitely agree that everyone can
find a map through this. But:

> In other words, in saying “extended primitive” (and also “basic
> primitive”) we lean away from the dictionary definition of “primitive” and
> into the Java definition. That feels like a non-confusing choice to me.
>
This might be okay except for my central point: that we simultaneously need
a new term meaning exactly the dictionary definition.

> So we have to attempt to shift users' understanding of "primitive" while at
> the same time injecting a new term to mean exactly what primitive used to
> mean. That's the old Indiana Jones switch and I don't have to tell you how
> that turned out for him.
>
> So, no, it’s not the Indy switch, at all. Users know what ~~fruit~~
> primitives are in Java, and they will have no problem with adding new
> ~~imported exotic apples~~ extended primitives to the familiar set of
> primitive types. And in exchange for this infusion of wonderful new types,
> they will learn a new term for the old types, which is ~~pears~~ basic
> primitives (or scalar primitives).
>
It would be a *worse* "Indiana Jones switch" if these were sibling
concepts. But even if he was swapping the idol for a less detailed idol
he'd better start runnin'.

> It would be difficult to pull off in a world where we were just pushing
> some new server and the whole world gets the new model at once. But in this
> universe where every version of Java ever made all have to coexist, it's
> looking to me like a guaranteed source of never-ending confusion.
>
> I also think it robs us of our ability to smoothly portray the real changes
> of Valhalla. We want to be able to say "elements are still elements! now we
> have molecules too".
>
> There are two kinds of users w.r.t. the question of “what’s a primitive”
> and you can’t please both. You and I want to please different kinds. The
> user I want to please is one who thinks of “Java primitive” as a kind of
> non-nullable scalar number (or boolean or char). The user you want to
> please thinks of “Java primitive” as “all leaves in the Big Graph”. The
> latter user will be disappointed if we say “Java primitives” can be
> non-leaves. The former user will be delighted. The latter user sees a
> String and wants to crack out its underlying array, in a Gollum-like
> quest for the roots of the mountains. The former user treats a String as
> a primitive. There are more of the former than the latter; we should cater
> to them. It’s the former who I was channeling above, concluding with “The
> Java primitives are now an open-ended set just like the Java objects.”
>
Here's where I suggest that we categorize our existing associations with
"primitive" into essential vs. incidental.

And generally, to claim essentialness for a meaning that's at odds with the
generally accepted meaning should be subject to a form of Sagan's razor
<https://www.google.com/search?q=sagan%27s+razor>. But I suppose the
question is what that generally accepted meaning is. For one thing, if any
other language has user-defined compound types that it calls "primitives"
already that would be very useful to know.

> The term “value” can be applied to composites in B3 alone, to composites
> in B2 alone, or to both. (Or neither.) All the basic types, including
> references, are values as well.
>
> This is a big choice, where to “spend” the term “value”.
>
Just a reminder (esp. for others observing) that my conceptual model
document shows at least one thorough viewpoint that "value" has a strong
existing meaning, and one that Valhalla doesn't even have to shift at all.

> B2 are references to… well, values as well.

Just lacking identity already allows for substitutable copies; that isn't
necessarily valueness, and if users don't have to think B2 instances are a
whole new kind of thing they've never seen before, that is very (sorry)
valuable.

If I get to hang onto the meaning of "value" in my document, then I can use
it to explain B2: a B2 instance is an object, whose identity is either
nonexistent or completely unobservable (no difference). That makes the VM
free to substitute equal copies, or (whenever indistinguishable) to
*represent* it as a compound value instead, or even to box that compound
value up again as needed. In any case it is still, meaningfully, an object.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com