From brian.goetz at oracle.com  Tue Nov  2 21:18:46 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 2 Nov 2021 17:18:46 -0400
Subject: Consolidating the user model
Message-ID: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>

We've been grinding away, and we think we have a reduced-complexity user
model. This is all very rough, and there's lots we need to write up more
carefully first, but I'm sharing this as a preview of how we can simplify past
where JEPs 401 and 402 currently stand.


# Consolidating the user model

As the mechanics of primitive classes have taken shape, it is time to take
another look at the user model.

Valhalla started with the goal of providing user-programmable classes which
could be flat and dense in memory. Numerics are one of the motivating use
cases, but adding new primitive types directly to the language has a very high
barrier. As we learned from [Growing a Language][growing], there are infinitely
many numeric types we might want to add to Java, but the proper way to do that
is as libraries, not as language features.

In the Java language as we have it today, objects and primitives are different
in almost every way: objects have identity, primitives do not; objects are
referred to through references, primitives are not; object references can be
null, primitives cannot; objects can have mutable state, primitives cannot;
classes can be extended, primitive types cannot; loading and storing of object
references is atomic, but loading and storing of large primitives is not. For
obvious reasons, the design center has revolved around the characteristics of
primitives, but the desire to have it both ways is strong; developers continue
to ask for variants of primitive classes that have a little more in common with
traditional classes in certain situations. These include:

 - **Nullability.** By far the most common concern raised about primitive
   classes, which "code like a class", is the treatment of null; many developers
   want the benefits of flattening but want at least the option to have `null`
   as the default value, and to get an exception when an uninitialized instance
   is used.

 - **Classes with no sensible default.** Prior to running the constructor, the
   JVM initializes all memory to zero. Since primitive classes are routinely
   stored directly rather than via reference, it is possible that users might be
   exposed to instances in this initial, all-zero state, without a constructor
   having run. For numeric classes such as complex numbers, zero is a fine
   default, and indeed a good default. But for some classes, not only is zero
   not the best default, but there _is no good default_. Storing dates as
   seconds-since-epoch would mean uninitialized dates are interpreted as Jan 1,
   1970, which is more likely to be a bug than the desired behavior. Classes
   may try to reject bad values in their constructor, but if a class has no
   sensible default, then it would rather have a default that behaves more
   like null, where you get an error if you dereference it. And if the default
   is going to behave like null, it's probably best if the default _is_ null.

 - **Migration.** Classes like `Optional` and `LocalDate` today are
   _value-based_, meaning they already disavow the use of object identity and
   therefore are good candidates for being primitive classes. However, since
   these classes exist today and are used in existing APIs and client code, they
   would have additional compatibility constraints. They would have to continue
   to be passed by object references to existing API points (otherwise the
   invocation would fail to link), and these types are already nullable.

 - **Non-tearability.** 64-bit primitives (`long` and `double`) risk _tearing_
   when accessed under race unless they are declared `volatile`. However,
   objects with final fields offer special initialization-safety guarantees
   under the JMM, even under race. So should primitive classes be more like
   primitives (risking being seen to be in impossible states), or more like
   classes (consistent views for immutable objects are guaranteed, even under
   race)? Tear-freedom has potentially significant costs, and tearing has
   significant risks, so it is unlikely one size fits all.

 - **Direct control over flattening.** In some cases, flattening is
   counterproductive. For example, if we have a primitive class with many
   fields, sorting a flattened array may be more expensive than sorting an array
   of references; while we don't pay the indirection costs, we do pay for
   increased footprint, as well as increased memory movement when swapping
   elements. Similarly, if we want to permute an array with a side index, it
   may well be cheaper to maintain an array of references rather than copying
   all the data into a separate array.
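To make the side-index point concrete, here is a minimal sketch in today's Java. It assumes a hypothetical `Wide` record standing in for a value with many fields. In current Java a `Wide[]` is already an array of references, so this only illustrates the access pattern (permute a small index array and leave the wide elements in place), not any flattened layout.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SideIndexSort {
    // Hypothetical stand-in for a value with many fields.
    record Wide(long key, long a, long b, long c, long d, long e, long f) {}

    // Returns indices ordered by key. Only the small index array is permuted;
    // the wide elements in `data` are never moved or copied.
    static Integer[] sortedIndex(Wide[] data) {
        Integer[] idx = new Integer[data.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingLong(i -> data[i].key()));
        return idx;
    }

    public static void main(String[] args) {
        Wide[] data = {
            new Wide(30, 0, 0, 0, 0, 0, 0),
            new Wide(10, 0, 0, 0, 0, 0, 0),
            new Wide(20, 0, 0, 0, 0, 0, 0),
        };
        for (int i : sortedIndex(data)) {
            System.out.println(data[i].key()); // 10, then 20, then 30
        }
    }
}
```

If elements were flattened and wide, swapping them during a sort would move every field; swapping `Integer` indices moves one reference each.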

These requests are all reasonable when taken individually; it's easy to
construct use cases where one would want it both ways for any given
characteristic. But having twelve knobs (and 2^12 possible settings) on
primitive classes is not a realistic option, nor does it result in a user model
that is easy to reason about.

In the current model, a primitive class is really like a primitive -- no nulls,
no references, always flattened, tearable when large enough. Each primitive
class `P` comes with a companion reference type (`P.ref`), which behaves much as
boxes do today (except without identity). There is also, for migration, an
option (`ref-default`) to invert the meaning of the unqualified name, so that by
default `Optional` means `Optional.ref`, and flattening must be explicitly
requested (which, in turn, is the sole motivation for the `P.val` denotation).
We would like for the use of the `.ref` and `.val` qualifiers to be rare, but
currently they are not rare enough for comfort.

Further, we've explored but have not committed to a means for primitive classes
with no good default to declare that they don't like their zero value, so that
dereferencing a zero value would result in some sort of exception. (The
nullability question is really dominated by the initialization safety
question.) This would be yet another variant of primitive class.

A serious challenge to this stacking is the proliferation of options; there are
knobs for nullability, zero-hostility, migration, tear-resistance, etc.
Explaining when to use which at the declaration site is already difficult, and
there is also the challenge of when to use `.ref` or `.val` at the use site. The
current model has done well at enumerating the requirements (and at helping us
separate the real ones from the wannabes), so it is now time to consolidate.

## Finding the buckets

Intuitively, we sense that there are three buckets here: traditional identity
classes in one bucket, traditional primitives (coded like classes) in another,
and a middle bucket that offers some "works like an int" benefits but with some
of the affordances (e.g., nullability, non-tearability) of the first.

Why have multiple buckets at all? Project Valhalla has two main goals: better
performance (enabling more routine flattening and better density), and unifying
the type system (healing the rift between primitives and objects). It's easy to
talk about flattening, but there really are at least three categories of
flattening, and different ones may be possible in different situations:

 - **Heap flattening.** Inlining the layout of one object into another object
   (or array) layout; when class `C` has a field of type `D`, rather than
   indirecting to a `D`, we inline `D`'s layout directly into `C`.

 - **Calling convention flattening.** Shredding a primitive class into its
   fields in (out-of-line) method invocations on the call stack.

 - **IR flattening.** When calling a method that allocates a new instance and
   returns it, eliding the allocation and shredding it into its fields instead.
   This only applies when we can inline through from the allocation to the
   consumption of its fields. (Escape analysis also allows this form of
   flattening, but only for provably non-escaping objects. If we know the
   object is identity-free, we can optimize in places where EA would fail.)
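The first two kinds of flattening can be pictured by doing them by hand. This is only an illustration under assumptions: `Point`, `Segment`, and `FlatSegment` are hypothetical types, ordinary records stand in for primitive classes, and the real transformations are performed by the VM and JIT rather than written in source. IR flattening has no source-level analogue here.

```java
public class FlatteningSketch {
    record Point(double x, double y) {}

    // As written: a Segment holds two references to separately allocated Points.
    record Segment(Point a, Point b) {}

    // Heap flattening, done by hand: Point's fields inlined into the container.
    record FlatSegment(double ax, double ay, double bx, double by) {}

    // Source-level call: arguments are passed as references to Point objects.
    static double length(Point a, Point b) {
        double dx = b.x() - a.x(), dy = b.y() - a.y();
        return Math.sqrt(dx * dx + dy * dy);
    }

    // Calling-convention flattening, done by hand: each Point is "shredded"
    // into its fields, so no Point object needs to exist for the call.
    static double lengthShredded(double ax, double ay, double bx, double by) {
        double dx = bx - ax, dy = by - ay;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        Segment s = new Segment(new Point(0, 0), new Point(3, 4));
        FlatSegment f = new FlatSegment(0, 0, 3, 4);
        System.out.println(length(s.a(), s.b()));                           // 5.0
        System.out.println(lengthShredded(f.ax(), f.ay(), f.bx(), f.by())); // 5.0
    }
}
```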

#### Nullability

Variables in the heap (fields and array elements) must have a default value; for
all practical purposes it is a forced move that this default value is the
all-zero-bits value. This zero-bits value is interpreted as `null` for
references, zero for numerics, and `false` for booleans today.

If primitives are to "code like a class", the constructor surely must be able to
reject bad proposed states. But what if the constructor thinks the default
value is a bad state? The desire to make some primitive classes nullable stems
from the reality that for some classes, we'd like a "safe" default -- one that
throws if you try to use it before it is initialized.
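A small illustration in today's Java of the difference between a zero default and a null default, using the seconds-since-epoch example from earlier. The helper names are hypothetical; a `long[]` stands in for flat, zero-initialized storage, and an `Instant[]` stands in for nullable references.

```java
import java.time.Instant;
import java.util.Objects;

public class DefaultValueSketch {
    // Flat-style storage: a timestamp is just a long of seconds since the epoch.
    // An element that was never assigned reads as 0, i.e. 1970-01-01T00:00:00Z.
    static Instant flatInstantAt(long[] epochSeconds, int i) {
        return Instant.ofEpochSecond(epochSeconds[i]);
    }

    // Reference-style storage: an element that was never assigned is null,
    // and using it fails fast instead of yielding a plausible-looking date.
    static Instant refInstantAt(Instant[] instants, int i) {
        return Objects.requireNonNull(instants[i], "timestamp was never initialized");
    }

    public static void main(String[] args) {
        System.out.println(flatInstantAt(new long[3], 0)); // 1970-01-01T00:00:00Z
        try {
            refInstantAt(new Instant[3], 0);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage()); // NPE: timestamp was never initialized
        }
    }
}
```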

But the "traditional" primitives are not nullable, and for good reason; zero is
a fine default value, and the primitives we have today typically use all their
bit patterns, meaning that arranging for a representation of null requires at
least an extra bit, which in reality means longs would take at least 65 bits
(which in reality means 128 bits most of the time).
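The footprint arithmetic can be checked directly. This sketch assumes values are padded to a multiple of 64 bits; actual alignment depends on the VM and hardware, and `alignUp` is a hypothetical helper.

```java
public class NullBitFootprint {
    // Round `bits` up to the next multiple of `alignBits` (assumed 64 below).
    static int alignUp(int bits, int alignBits) {
        return ((bits + alignBits - 1) / alignBits) * alignBits;
    }

    public static void main(String[] args) {
        int payload = Long.SIZE;     // 64: every bit pattern is a valid long
        int withNull = payload + 1;  // 65: one extra bit to say "this is null"
        System.out.println(withNull + " bits -> " + alignUp(withNull, 64) + " bits of storage");
        // prints: 65 bits -> 128 bits of storage
    }
}
```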

So we see nullability is a tradeoff; on the one hand, it gives us protection
from uninitialized variables, but it also has costs -- extra footprint, extra
checks. We experimented with a pair of modifiers, `null-default` and
`zero-default`, which would determine how the zero value is interpreted. But
this felt like solving the problem at the wrong level.

#### Tearing

The Java Memory Model includes special provisions for visibility of final
fields, even when the reference to their container object is shared via a data
race. These initialization safety guarantees are the bedrock of the Java
security model; a String being seen to change its value -- or to not respect
invariants established by its constructor -- would make it nearly impossible to
reason about security.

On the other hand, longs and doubles permit tearing when shared via data races.
This isn't great, but preventing tearing has a cost, and the whole reason we got
primitives in 1995 was driven by expectations and tradeoffs around arithmetical
performance. Preventing tearing is still quite expensive; above 64 bits, atomic
instructions have a significant tax, and often the best way to manage tearing is
via an indirection when stored in the heap (which is precisely what flattening
is trying to avoid).

When we can code primitives "like a class", which should they be more like? It
depends! Classes that are more like numerics may be willing to tolerate tearing
for the sake of improved performance; classes that are more like "traditional
classes" will want the initialization safety afforded to immutable objects
already.
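The tearing hazard can be replayed deterministically, without real threads. This is a sketch under stated assumptions: `Range` and its `lo <= hi` invariant are hypothetical, `FlatSlot` models a flattened layout in which a write is two separate field stores, and `RefSlot` models a reference, whose reads and writes the JLS (section 17.7) guarantees to be atomic. The interleaving shown is one that a tearable flat layout would permit.

```java
public class TearingSketch {
    // Hypothetical value whose constructor enforces lo <= hi.
    record Range(long lo, long hi) {
        Range {
            if (lo > hi) throw new IllegalArgumentException("lo > hi");
        }
    }

    // Flattened storage: the Range's fields live directly in the container,
    // so writing a Range is two separate field stores.
    static final class FlatSlot {
        long lo, hi;
        boolean invariantHolds() { return lo <= hi; }
    }

    // Reference storage: writing a Range is a single reference store.
    static final class RefSlot {
        Range value = new Range(0, 0);
    }

    // Replays one interleaving of writers A and B, one field store at a time.
    static FlatSlot tornFlatWrite(Range a, Range b) {
        FlatSlot slot = new FlatSlot();
        slot.lo = a.lo(); // A stores lo
        slot.lo = b.lo(); // B stores lo
        slot.hi = b.hi(); // B stores hi
        slot.hi = a.hi(); // A stores hi: result mixes B's lo with A's hi
        return slot;
    }

    // Each writer stores its whole value at once; readers see a or b, never a mix.
    static RefSlot referenceWrite(Range a, Range b) {
        RefSlot slot = new RefSlot();
        slot.value = a; // A stores its reference
        slot.value = b; // B stores its reference
        return slot;
    }

    public static void main(String[] args) {
        Range a = new Range(0, 1), b = new Range(10, 20);
        FlatSlot flat = tornFlatWrite(a, b);
        System.out.println("flat: lo=" + flat.lo + " hi=" + flat.hi
                + " invariant=" + flat.invariantHolds()); // flat: lo=10 hi=1 invariant=false
        Range v = referenceWrite(a, b).value;
        System.out.println("ref invariant=" + (v.lo() <= v.hi())); // ref invariant=true
    }
}
```

The torn flat state `lo=10, hi=1` is one the `Range` constructor would have rejected, yet it was never constructed; it arose purely from interleaved stores.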

So we see tearability is a tradeoff; on the one hand, preventing tearing
protects invariants from data races, but it also has costs -- expensive atomic
instructions, or reduced heap flattening. We experimented with a modifier that
marks classes as non-tearable, but this would require users to keep track of
which primitive classes are tearable and which aren't. This felt like solving
the problem at the wrong level.

#### Migration

There are some classes -- such as `java.lang.Integer`, or `java.util.Optional`
-- that meet all the requirements to be declared as (nullable) primitive
classes, but which exist today as identity classes. We would like to be able
to migrate these to primitives to get the benefits of flattening, but are
constrained in that (at least for non-private API points) they must be
represented as `L` descriptors for reasons of binary compatibility. Our existing
interpretation of `L` descriptors is that they represent references as pointers;
this means that even if we could migrate these types, we'd still give up on some
forms of flattening (heap and stack), and our migration would be less than
ideal.

Worse, the above interpretation of migration suggests that sometimes a use of
`P` is translated as `LP`, and sometimes as `QP`. To the degree that there is
uncertainty in whether a given source type translates to an `L` or `Q`
descriptor, this flows into either uncertainty of how to use reflection (users
must guess as to whether a given API point using `P` was translated with `LP` or
`QP`), or uncertainty on the part of reflection (the user calls
`getMethod(P.class)`, and reflection must consider methods that accept both `LP`
and `QP` descriptors).
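For contrast, here is how descriptors and reflection behave in Java today, where every reference type translates to exactly one `L` descriptor, so a lookup keyed by `Class` has exactly one candidate. This uses real JDK APIs (`Class.descriptorString()`, available since Java 12, and `MethodType.toMethodDescriptorString()`); `Q` descriptors were a Valhalla prototype feature and cannot be produced this way, and `describe` is a hypothetical method.

```java
import java.lang.invoke.MethodType;
import java.util.Optional;

public class DescriptorSketch {
    public static String describe(Optional<String> o) {
        return o.orElse("empty");
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(Optional.class.descriptorString()); // Ljava/util/Optional;
        System.out.println(int.class.descriptorString());      // I
        MethodType mt = MethodType.methodType(String.class, Optional.class);
        System.out.println(mt.toMethodDescriptorString());     // (Ljava/util/Optional;)Ljava/lang/String;
        // Reflection looks the method up by Class; with one descriptor per
        // source type, there is exactly one method this can mean.
        System.out.println(DescriptorSketch.class.getMethod("describe", Optional.class));
    }
}
```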

## Restacking for simplicity

The various knobs on the user model (which may flow into translation and
reflection) risk being death by 1000 cuts; they not only add complexity to the
implementation, but they add complexity for users. This prompted a rethink of
assumptions at every layer.

#### Nullable primitives

The first part of the restacking involved relaxing the assumption that primitive
classes are inherently non-nullable. We shied away from this for a long time,
knowing that there would be significant VM complexity down this road, but in the
end concluded that the complexity is better spent here than elsewhere. These
might be translated as `Q` descriptors, or might be translated as `L`
descriptors with a side channel for preloading metadata -- stay tuned for a
summary of this topic.

 > Why Q? The reason we have `Q` descriptors at all is that we need to know
 > things about classes earlier than we otherwise would, in order to make
 > decisions that are hard to unmake later (such as layout and calling
 > convention). Rather than interpreting `Q` as meaning "value type" (as the
 > early prototypes did), `Q` acquired the interpretation "go and look." When
 > the JVM encounters a field or method descriptor with a `Q` in it, rather than
 > deferring classloading as long as possible (as is the case with `L`
 > descriptors), we load the class eagerly, so we can learn all we need to know
 > about it. From classloading, we might learn not only that it is a primitive
 > class, but whether it should be nullable or not. (Since primitive classes are
 > monomorphic, carrying this information around on a per-linkage basis is cheap
 > enough.)

So some primitive classes are marked as "pure" primitives, and others as
supporting null; when the latter are used as receivers, `invokevirtual` does a
null check prior to invocation (and throws NPE if the receiver is null). When
moving values between the heap and the stack (`getfield`, `aastore`, etc.),
these bytecodes must check for the "flat null" representation in the heap and a
real null on the stack. The VM needs some help from the classfile to choose a
bit pattern for the flat null; the most obvious strategy is to inject a
synthetic boolean, but there are others that don't require additional footprint
(e.g., flow analysis that proves a field is assigned a non-default value; using
low-order bits in pointers; using spare bits in booleans; using pointer colors;
etc.). The details are for another day, but we would like for this to not
intrude on the user model.
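The synthetic-boolean strategy for a flat null can be modeled in plain Java. This is a sketch only: `FlatPointArray` is a hypothetical stand-in for a VM-chosen layout, storing each element's fields inline alongside a flag. Because the all-zero state has the flag clear, a never-written element decodes as `null`, while a stored `Point(0, 0)` remains distinguishable from it.

```java
public class FlatNullSketch {
    record Point(int x, int y) {}

    // Hypothetical flattened array of nullable Points: fields inline plus a flag.
    static final class FlatPointArray {
        private final int[] xs;
        private final int[] ys;
        // Synthetic flag; all false initially, so every element starts as null.
        private final boolean[] nonNull;

        FlatPointArray(int length) {
            xs = new int[length];
            ys = new int[length];
            nonNull = new boolean[length];
        }

        // Heap -> stack (like getfield/aaload): decode the flat null into a real null.
        Point get(int i) {
            return nonNull[i] ? new Point(xs[i], ys[i]) : null;
        }

        // Stack -> heap (like putfield/aastore): encode a real null as the flat null.
        void set(int i, Point p) {
            if (p == null) {
                nonNull[i] = false;
                xs[i] = 0;
                ys[i] = 0;
            } else {
                xs[i] = p.x();
                ys[i] = p.y();
                nonNull[i] = true;
            }
        }
    }
}
```

The cost the text mentions is visible here: one extra flag per element, plus a check on every load and store.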

#### L vs Q

The exploration into nullable primitives prompted a reevaluation of the meaning
of L vs Q. Historically we had interpreted L vs Q as being "pointer vs flat"
(though the VM always has the right to unflatten if it feels like it). But over
time we've been moving towards Q mostly being about earlier loading (so the VM
can learn what it needs to know before making hard-to-reverse decisions, such as
layout). So let's go there fully.

A `Q` descriptor means that the class must be loaded eagerly (Q for "quick")
before resolving the descriptor; an `L` descriptor means it _must not be_ (L for
"lazy"), consistent with current JVMS treatment. Since an `L` descriptor is
lazily resolved, we have to assume conservatively that it is nullable; a `Q`
descriptor might or might not be nullable (we'll know once we load the class,
which we do eagerly).

We have wrested control of flatness away from the language and ceded it to the
VM, where it belongs. The user/language expresses semantic requirements (e.g.,
nullability) and the VM chooses a representation. That's how we like it.

#### It's all about the references

The rethink of L vs Q enabled a critical restack of the user model. With this
reinterpretation, Q descriptors can (based on what is in the classfile) still be
reference types -- and these reference types can still be flattened;
alternatively, with side channels for preload metadata on `L` descriptors, we
may be able to get to non-flat references under `L` descriptors.

Returning to the tempting user knobs of nullability and tearability, we can now
put these where they belong: nullability is a property of _reference types_ --
and some primitive classes can be reference types. Similarly, the
initialization safety of immutable objects derives from the fact that object
references are loaded atomically (with respect to stores of the same
reference). Non-tearability is also a property of reference types. (The same
goes for layout circularity; references can break layout circularities.) So
rather than the user choosing nullability and non-tearability as ad-hoc
choices, we treat them as affordances of references, and let users choose
between reference-only primitive classes and the more traditional primitive
classes that come in both reference and value flavors.

 > This restack allows us to eliminate `ref-default` completely (we'll share
 > more details later), which in turn allows us to eliminate `.val` completely.
 > Further, the use cases for `.ref` become smaller.

#### The buckets

So, without further ado, let's meet the new user model. The names may change,
but the concepts seem pretty sensible. We have identity classes, as before;
let's call that the first bucket. These are unchanged; they are always
translated with L descriptors, and there is only one usable `Class` literal for
these.

The second bucket contains _identity-free reference classes_. They come with the
restrictions on identity-free classes: no mutability and limited extensibility.
Because they are reference types, they are nullable and receive tearing
protection. They are flattenable (though, depending on layout size and hardware
details, we may choose to get tearing protection by maintaining the
indirection). These might be translated with Q descriptors, or with modified L
descriptors, but there is no separate `.ref` form (they're already references)
and there is only one usable `Class` literal for these.

The third bucket contains the _true primitives_. These are also identity-free
classes, but further give rise to both value and reference types, and the value
type is the default (we denote the reference type with the familiar `.ref`).
Value types are non-nullable, and permit tearing just as existing primitives do.
The `.ref` type has all the affordances of reference types -- nullability and
tearing protection. The value type is translated with Q; the reference type is
translated with L. There are two mirrors (`P.class` and `P.ref.class`) to
reflect the difference in translation and semantics.

A valuable aspect of this translation strategy is that there is a deterministic,
1:1 correspondence between source types and descriptors.
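The 1:1 correspondence can be written down as a function from (bucket, projection) to descriptor. This is a hypothetical model, not a javac API; in particular, the text leaves Q versus a modified L open for the middle bucket, and this sketch simply assumes Q there.

```java
public class BucketDescriptors {
    enum Bucket { IDENTITY, IDENTITY_FREE_REFERENCE, TRUE_PRIMITIVE }

    /**
     * @param internalName  class name in internal form, e.g. "java/lang/String"
     * @param refProjection true for the `.ref` type; only true primitives have one
     */
    static String descriptor(Bucket bucket, String internalName, boolean refProjection) {
        if (refProjection && bucket != Bucket.TRUE_PRIMITIVE) {
            throw new IllegalArgumentException("only true primitives have a separate .ref type");
        }
        char kind = switch (bucket) {
            case IDENTITY -> 'L';                              // unchanged: always L
            case IDENTITY_FREE_REFERENCE -> 'Q';               // assumption: could be a modified L
            case TRUE_PRIMITIVE -> refProjection ? 'L' : 'Q';  // P.ref -> L, P -> Q
        };
        return kind + internalName + ";";
    }
}
```

Under these assumptions, a true primitive `Complex` (a hypothetical class) translates to `QComplex;` and `Complex.ref` to `LComplex;`, while an identity class always gets an `L` descriptor; no source type maps to two descriptors.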

How we describe the buckets is open to discussion; there are several possible
approaches. One possible framing is that the middle bucket gives up identity,
and the third further gives up references (which can be clawed back with
`.ref`), but there are plenty of ways we might express it. If these are
expressed as modifiers, then they can be applied to records as well.

Another open question is whether we double down on, or abandon, the terminology
of boxing. On the one hand, users are familiar with it, and the new semantics
are the same as the old semantics; on the other, the metaphor of boxing is no
longer accurate, and users surely have a lot of mental baggage that says "boxes
are slow." We'd like for users to come to a better understanding of the
difference between value and reference types.

#### Goodbye, direct control over flattening

In earlier explorations, we envisioned using `X.ref` as a way to explicitly
ask for no flattening. But in the proposed model, flattening is entirely
under the control of the VM -- where we think it belongs.

#### What's left for .ref?

A pleasing outcome here is that many of the use cases for `X.ref` are subsumed
into more appropriate mechanisms, leaving a relatively small set of corner-ish
cases. This is what we'd hoped `.ref` would be -- something that stays in the
corner until summoned. The remaining reasons to use `X.ref` at the use site
include:

 - Boxing. Primitives have box objects; strict value-based classes need
   companion reference types for all the same situations as today's primitives
   do. It would be odd if the box were non-denotable.
 - Null-adjunction. Some methods, like `Map::get`, return null to indicate no
   mapping was present. But if in `Map<K,V>`, `V` is not nullable, then there
   is no way to express this method. We envision that such methods would return
   `V.ref`, so that strict value-based classes would be widened to their "box"
   on return, and null would indicate no mapping present.
 - Cycle-breaking. Primitives that are self-referential (e.g., linked list node
   classes that have a next node field) would have layout circularities; using a
   reference rather than a value allows the circularity to be broken.
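Today's `int`/`Integer` pair already exhibits the null-adjunction and cycle-breaking cases, which gives a useful intuition for `V.ref`. A minimal sketch in current Java; `lookup`, `Node`, and `sum` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class RefUseCases {
    // Null-adjunction: `int` has no spare value for "absent", so Map::get
    // returns the nullable box Integer, much as V.ref would for a value type.
    static String lookup(Map<String, Integer> m, String key) {
        Integer boxed = m.get(key); // null means "no mapping"
        if (boxed == null) {
            return "absent";
        }
        int v = boxed;              // safe to unbox once null is ruled out
        return "value " + v;
    }

    // Cycle-breaking: `next` is a reference, so a Node does not contain a Node.
    // If `next` were a flattened value, the layout would contain itself.
    record Node(int value, Node next) {}

    static int sum(Node head) {
        int total = 0;
        for (Node n = head; n != null; n = n.next()) {
            total += n.value();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("zero", 0);
        System.out.println(lookup(m, "zero"));                   // value 0
        System.out.println(lookup(m, "missing"));                // absent
        System.out.println(sum(new Node(1, new Node(2, null)))); // 3
    }
}
```

Note that a stored `0` and a missing key are distinguishable only because the return type is the nullable box.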

This list is (finally!) as short as we would like it to be, and devoid of
low-level control over representation; users use `X.ref` when they need
references (either for interop with reference types, or to require
nullability).
Our hope all along was that `.ref` was mostly "break glass in case of
emergency"; I think we're finally there.

#### Migration

The topic of migration is a complex one, and I won't treat it fully here (the
details are best left until we're fully agreed on the rest). Earlier treatments
of migration were limited, in that even with all the complexity of
`ref-default`, we still didn't get all the flattening we wanted, because the
laziness of `L` descriptors kept us from knowing about potential flattenability
until it was too late. Attempts to manage "preload lists" or "side preload
channels" in previous rounds foundered due to complexity or corner cases, but
the problem has gotten simpler, since we're only choosing representation rather
than value sets now -- which means that the `L*` types might work out here.
Stay tuned for more details.

## Reflection

Earlier designs all included some non-intuitive behavior around reflection.
What we'd like to do is align the user-visible types, reflection literals, and
descriptors, following the invariant that

    new X().getClass() == X.class
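The invariant can be checked for identity classes today, and the existing `int`/`Integer` pair shows what two mirrors for one concept look like. This sketch uses only standard APIs; `X` is a hypothetical class. Note that `getClass()` on a boxed `int` yields `Integer.class`, never `int.class`.

```java
public class MirrorSketch {
    static final class X {} // a plain identity class

    static boolean invariantHolds() {
        return new X().getClass() == X.class;
    }

    public static void main(String[] args) {
        System.out.println(invariantHolds());                   // true
        // Two distinct mirrors for one "number" concept:
        System.out.println(int.class == Integer.class);         // false
        System.out.println(Integer.TYPE == int.class);          // true
        System.out.println(int.class.isPrimitive());            // true
        Object boxed = 42;                                      // autoboxed to Integer
        System.out.println(boxed.getClass() == Integer.class);  // true
    }
}
```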

## TBD

Stay tuned for some details on managing null encoding and detection,
reference types under either Q or modified L descriptors, and some
thoughts on painting the bikeshed.

[growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621


From john.r.rose at oracle.com  Tue Nov  2 21:54:17 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 2 Nov 2021 21:54:17 +0000
Subject: Consolidating the user model
In-Reply-To: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
Message-ID: <F2106BF4-775D-4B58-88C1-EDDB471F8BC9@oracle.com>

+100; great summary

> On Nov 2, 2021, at 2:18 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> which means that the `L*` types might work out here.   Stay tuned for more details.  

A footnote, FTR, about L*-descriptors, in case it doesn't ring a bell.

Brian is referring here to the thing we have talked about several
years ago, of loosely coupling a side-record with an occurrence
of L-Foo that means "link like L-Foo, but load and adapt like Q-Foo".
We went through some of these iterations even before we settled
on Q-descriptors; they are back again, but in a far more tractable
form we think.

L* is not a new descriptor, it's just an L (so it links to plain L's)
but some sort of star-like marking * (not really in the descriptor
string but a side channel!) alerts the JVM to do extra loading
and adapting.

So, one current vision of this side-channel is a very limited early
use of the "Type Restriction" mechanism, as mentioned in the
Parametric VM proposal and elsewhere.  The idea is that a type
L*-Foo would be TR-ed to itself (Foo.class) and since TRs use
eager loading (of the content of the TR, not of the type it
applies to) the effect would be similar to a Q-Foo, but it
would still be spelled L-Foo.  To avoid implementation
burdens, the JVM would not accept any more "interesting"
TRs, until we need to build them out for specialized generics.
Or we'd just have a one-shot, purpose-built side channel
which smells like an infant sibling to an eventual real T.R.
feature.  A T.R. that really restricts a type (instead of
just asking the JVM to take a closer look a la Q-Foo) is a
much deeper implementation challenge, since it creates
possible failure points when restrictions are violated.
An L* cannot violate itself since the value set is the same.
This is why L* only works on the middle bucket.

L*-Foo (using TRs or any other side-channel) is not a perfect
substitute for Q-Foo, because the stars "rub off too easily"
to ensure rigid correspondence between callers and callee.
This means L*-based API linkage requires more speculation
and runtime checking, compared to Q-based API linkage.

Although it may seem odd, there are a number of practical
reasons to use L* in the middle bucket but Q in the left
bucket.  The left bucket needs two descriptors, so L/Q.
The middle bucket has just one class mirror, so either Q
or else a mix of L and L*, and it needs some story for
migration for a few of its citizens, so L* looks good
again (linking with legacy L with a dynamic mixup).

As Brian says, we may elect to use Q uniformly for the
middle bucket, and handle the migration problem
another way.  It would be good if we could decide
Q vs. L* for the middle bucket without co-solving
the migration problem.

Anyway, such smaller details are up in the air.  The
points in Brian's message are the high-order bits, and
the stuff I've shared here is a footnote.  Please do give
the high-order bits your best attention.  It's a really
good write-up.

-- John

From brian.goetz at oracle.com  Tue Nov  2 21:58:37 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 2 Nov 2021 17:58:37 -0400
Subject: Consolidating the user model
In-Reply-To: <F2106BF4-775D-4B58-88C1-EDDB471F8BC9@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <F2106BF4-775D-4B58-88C1-EDDB471F8BC9@oracle.com>
Message-ID: <4388956e-f5a0-13ad-750f-cf15ba74b630@oracle.com>

"Links like an L; works like a Q"

On 11/2/2021 5:54 PM, John Rose wrote:
> L* is not a new descriptor, it's just an L (so it links to plain L's)
> but some sort of star-like marking * (not really in the descriptor
> string but a side channel!) alerts the JVM to do extra loading
> and adapting.

From kevinb at google.com  Tue Nov  2 22:44:34 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Tue, 2 Nov 2021 15:44:34 -0700
Subject: Consolidating the user model
In-Reply-To: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
Message-ID: <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>

Good stuff.


On Tue, Nov 2, 2021 at 2:19 PM Brian Goetz <brian.goetz at oracle.com> wrote:

> But, the "traditional" primitives are not nullable, and for good reason;
> zero is a fine default value,
>

Yes, it would have been impractical to do otherwise, but here's my stock
reminder that zero being a "fine" default value has *still nonetheless* caused
many thousands of bugs.

Again, it had to be done. But I think it's notable that those bugs happen
even for the types that have the *absolute most sensible* default values.

My concern is that the purest form of value types will be overused and
misused for even less clear-cut cases. I would like to think that we can
convince these users that they really want the next "bucket" over, which I
think comes down to whether the added cost of `null` is worth it.


> Returning to the tempting user knobs of nullability and tearability, we can
> now put these where they belong: nullability is a property of _reference
> types_ --
>

Though I've argued loudly here for the notion that nullability is not
*conceptually* intrinsic to references (and though I still think we should
start saying "the null value" instead of "the null reference"), I
nevertheless find this an acceptable compromise, because (a) I think
nullable values were just introducing too much practical complexity, and (b) I
hope most use cases really will just use the middle bucket and be fine.

Btw, am I right that for the middle bucket, `==` will fail (at compile-time
when possible)?


> The third bucket are the _true primitives_.  These are also identity-free
> classes, but further give rise to both value and reference types, and the
> value type is the default (we denote the reference type with the familiar
> `.ref`.) Value types are non-nullable, and permit tearing just as existing
> primitives do. The `.ref` type has all the affordances of reference types --
> nullability and tearing protection.
>

In fact, if I'm looking at a middle-bucket class, and I'm looking at one of
these `.ref` types of "primitive" class, as far as I can tell I should be
able to think of the two in exactly the same way.
(I'm aware you intend to define `==` differently for the two, but I'll get
into my massive concerns about that later.) Basically, that's good.


> How we describe the buckets is open to discussion; there are several
> possible approaches.  One possible framing is that the middle bucket gives
> up identity, and the third further gives up references (which can be clawed
> back with `.ref`), but there are plenty of ways we might express it.
>

We should address the conceptual-simplicity cost of this "clawing back"
sometime.


> Another open question is whether we double down, or abandon, the
> terminology of boxing.  On the one hand, users are familiar with it, and the
> new semantics are the same as the old semantics; on the other, the metaphor
> of boxing is no longer accurate, and users surely have a lot of mental
> baggage that says "boxes are slow."  We'd like for users to come to a better
> understanding of the difference between value and reference types.
>

The key for me is that the new boxing takes over for everything the old
boxing did, and more. So, it's better boxing. I see no value in fighting
against that. If users are thinking of this by starting from what they know
about int/Integer, that's actually *good*. They will just find out it's
better, that's all.


>  - Null-adjunction.  Some methods, like `Map::get`, return null to indicate
>    no mapping was present.  But if in `Map<K,V>`, `V` is not nullable, then
>    there is no way to express this method.  We envision that such methods
>    would return `V.ref`, so that strict value-based classes would widened to
>    their "box" on return, and null would indicate no mapping present.
>

Now just spell it `?` :-)

(not serious. Also, not not serious)


> ## Reflection
>
> Earlier designs all included some non-intuitive behavior around reflection.
> What we'd like to do is align the user-visible types with reflection
> literals with descriptors, following the invariant that
>
>      new X().getClass() == X.class
>

Seems like part of the goal would be making it fit naturally with the
current int/Integer relationship (of course, `42.getClass()` is uncommitted
to any precedent).

It seems like `Complex.class` (as opposed to `Complex.ref.class`) would
never be returned by `Object.getClass()` in any other condition than when
you could have just written `Complex.class` anyway.

Actually, that makes me start to wonder if `getClass()` should be another
method like `notify` that simply doesn't make sense to call on value types.
(But we still need the two distinct Class instances per class anyway.)


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com  Tue Nov  2 22:58:58 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 2 Nov 2021 22:58:58 +0000
Subject: Consolidating the user model
In-Reply-To: <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
Message-ID: <F59896B2-A3DD-4FB0-BC41-A77C6A1D0788@oracle.com>

On Nov 2, 2021, at 3:44 PM, Kevin Bourrillion <kevinb at google.com> wrote:

Btw, am I right that for the middle bucket, `==` will fail (at compile-time when possible)?


I don't see how middle bucket references, which behave very
much like old-bucket references (id-classes), would tend to
fail on ==/acmp any more than old-bucket references.

Example please?

If X is an old-bucket or middle-bucket type, then all of
these are OK and lead to expected results:

X x, x1;
x == x
x == x1
x == null

If Y is a class which is statically disjoint from X, then
these may fail, but not through any bucket-related
effect:

Y y;
x == y  //error: incomparable types: X and Y

I think I'm missing your point?

From kevinb at google.com  Tue Nov  2 23:07:30 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Tue, 2 Nov 2021 16:07:30 -0700
Subject: Consolidating the user model
In-Reply-To: <F59896B2-A3DD-4FB0-BC41-A77C6A1D0788@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <F59896B2-A3DD-4FB0-BC41-A77C6A1D0788@oracle.com>
Message-ID: <CAGKkBkvR_E_NfCzwpHKbsChF5UK9ayq+uOL_YuSpHqBH7o11qw@mail.gmail.com>

Hmm, I'd rather pretend I hadn't said it, if that will keep the focus on
the main points. :-)

I haven't caught up on the plans for equality in a long time.


On Tue, Nov 2, 2021 at 3:59 PM John Rose <john.r.rose at oracle.com> wrote:

> On Nov 2, 2021, at 3:44 PM, Kevin Bourrillion <kevinb at google.com> wrote:
>
>
> Btw, am I right that for the middle bucket, `==` will fail (at
> compile-time when possible)?
>
>
> I don't see how middle bucket references, which behave very
> much like old-bucket references (id-classes), would tend to
> fail on ==/acmp any more than old-bucket references.
>
> Example please?
>
> If X is an old-bucket or middle-bucket type, then all of
> these are OK and lead to expected results:
>
> X x, x1;
> x == x
> x == x1
> x == null
>
> If Y is a class which is statically disjoint from X, then
> these may fail, but not through any bucket-related
> effect:
>
> Y y;
> x == y  //error: incomparable types: X and Y
>
> I think I'm missing your point?
>


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com  Tue Nov  2 23:08:39 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 2 Nov 2021 23:08:39 +0000
Subject: Consolidating the user model
In-Reply-To: <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
Message-ID: <CDC7ABCB-E108-4D0C-A98F-382EA3EBDFBA@oracle.com>

On Nov 2, 2021, at 3:44 PM, Kevin Bourrillion <kevinb at google.com> wrote:


     new X().getClass() == X.class

Seems like part of the goal would be making it fit naturally with the current int/Integer relationship (of course, `42.getClass()` is uncommitted to any precedent).

It seems like `Complex.class` (as opposed to `Complex.ref.class`) would never be returned by `Object.getClass()` in any other condition than when you could have just written `Complex.class` anyway.

Actually, that makes me start to wonder if `getClass()` should be another method like `notify` that simply doesn't make sense to call on value types. (But we still need the two distinct Class instances per class anyway.)


Yep, you hit on a tricky spot there.  One part of the problem
is that getClass, specifically and uniquely, has a special relation
to the primitive types which is coupled to the typing of
class literals like int.class (which is Class<Integer> not Class<int>).
Also, Integer is a class, and Complex is a class, but they have
different "tilts":  Integer is (kinda sorta) int.ref but Complex
is not Complex.ref, and the mirrors reflect this difference.

Sorting this out seems to be an overconstrained problem.

As you say, we have not yet applied ".getClass" to any
non-ref type, but we will certainly do so, and that's
when the fun begins.

Also, trying to retype int.class as Class<int> is a related
part of the fun.

In the end, however nicely we "heal the rift" between
good old int and his new friend Complex, there will
surely be some scars on good old int from his time
marooned (with just a few friends) in primitive-land.

(My current mental metaphor for the isolation of int
is Gilligan, who had about the same number of
unfortunate island-mates as int does.)


From brian.goetz at oracle.com  Tue Nov  2 23:53:38 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 2 Nov 2021 19:53:38 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
Message-ID: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>


> My concern is that the purest form of value types will be overused and 
> misused for even less clear-cut cases. I would like to think that we 
> can convince these users that they really want the next "bucket" over, 
> which I think comes down to whether the added cost of `null` is worth it.

I share this concern.  Do you have any thoughts on how to make B2 more
attractive?

For the record, we expect to see similar stack-based and IR-based
flattening across buckets 2 and 3, but measurably less heap-based
flattening for bucket 2.  Plus the extra footprint.

> Btw, am I right that for the middle bucket, `==` will fail (at 
> compile-time when possible)?

B2 gets state-based ==, just like B3.  No difference there; if you have
no identity, then equality is state-based.

>
>
>     The third bucket are the _true primitives_.  These are also
>     identity-free classes, but further give rise to both value and
>     reference types, and the value type is the default (we denote the
>     reference type with the familiar `.ref`.)  Value types are
>     non-nullable, and permit tearing just as existing primitives do.
>     The `.ref` type has all the affordances of reference types --
>     nullability and tearing protection.
>
>
> In fact, if I'm looking at a middle-bucket class, and I'm looking at 
> one of these `.ref` types of "primitive" class, as far as I can tell I 
> should be able to think of these in exactly the same way as exactly 
> the same things.

Yes.  A B2, and a B3.ref, behave identically.

> (I'm aware you intend to define `==` differently for the two, but I'll 
> get into my massive concerns about that later.)

Actually, B2, B3, and B3.ref all have the same interpretation of ==,
which is state-based.  (You can think of this as "box (or unbox) before
comparing" a B3 with a B3.ref.)

>
>     - Null-adjunction.  Some methods, like `Map::get`, return null to
>       indicate no mapping was present.  But if in `Map<K,V>`, `V` is not
>       nullable, then there is no way to express this method.  We envision
>       that such methods would return `V.ref`, so that strict value-based
>       classes would be widened to their "box" on return, and null would
>       indicate no mapping present.
>
>
> Now just spell it `?` :-)
> (not serious. Also, not not serious)

Yeah, maybe.  If that were the only difference, I'd be more inclined.
But it drags in ref-ness, and all the reference affordances, so it feels 
more misleading than helpful at this point.
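The null-adjunction problem has a direct analogue in today's Java, with `Integer` in the role the proposal gives to `V.ref` and `int` in the role of `V`. The sketch below (the class name `NullAdjunctionDemo` is illustrative) uses only the existing `java.util.Map` API: `get` can report absence only because it returns a reference type, and forcing the result into the value type turns a missing key into a `NullPointerException`.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative class: Integer stands in for V.ref, int for the value type V.
class NullAdjunctionDemo {
    // Returns the boxed value, or null when no mapping is present.
    static Integer lookup(Map<String, Integer> m, String key) {
        return m.get(key);
    }

    // The value type has no way to say "absent": unboxing the null
    // returned for a missing key throws NullPointerException.
    static int lookupUnboxed(Map<String, Integer> m, String key) {
        return m.get(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("one", 1);
        System.out.println(lookup(m, "one")); // 1
        System.out.println(lookup(m, "two")); // null
        try {
            lookupUnboxed(m, "two");
        } catch (NullPointerException e) {
            System.out.println("absent key: no int to return");
        }
    }
}
```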

>
>     ## Reflection
>
>     Earlier designs all included some non-intuitive behavior around
>     reflection.  What we'd like to do is align the user-visible types
>     with reflection literals with descriptors, following the invariant
>     that
>
>         new X().getClass() == X.class
>
>
> Seems like part of the goal would be making it fit naturally with the 
> current int/Integer relationship (of course, `42.getClass()` is 
> uncommitted to any precedent).

There's a nasty tension here.  On the one hand, for B3 classes, it makes
sense for b3.getClass() to yield the val mirror, but int.getClass()
historically corresponds to the ref mirror (Object o = 3; o.getClass()
== Integer.class.)  To invert it, we would have to break a lot of
reflection-using code that tests for Integer.class because that's how
primitives are reflected.  Work in progress.
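For reference, here is how current Java reflects `int`, which is the behavior that any choice of mirror for B3 values has to be reconciled with. This sketch (illustrative class name) exercises only existing APIs and makes no claim about what `getClass()` will return for B3 values.

```java
// Illustrative class: today's int/Integer mirrors, not the proposed B3 behavior.
class MirrorDemo {
    // Boxing an int and asking for its class yields the ref mirror, Integer.class.
    static Class<?> classOfBoxed(int value) {
        Object o = value; // boxing conversion (javac emits Integer.valueOf)
        return o.getClass();
    }

    public static void main(String[] args) {
        System.out.println(classOfBoxed(3) == Integer.class); // true
        System.out.println(int.class == Integer.class);       // false: two distinct mirrors
        System.out.println(Integer.TYPE == int.class);        // true: TYPE is the primitive mirror
        Class<Integer> lit = int.class; // compiles: the literal is typed Class<Integer>
        System.out.println(lit.isPrimitive());                // true
    }
}
```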


> Actually, that makes me start to wonder if `getClass()` should be 
> another method like `notify` that simply doesn't make sense to call on 
> value types. (But we still need the two distinct Class instances per 
> class anyway.)

You could argue that it doesn't make sense on the values, but surely it
makes sense on their boxes.  But it's a thin argument, since classes
extend Object, and we want to treat values as objects (without appealing
to boxing) for purposes of invoking methods, accessing fields, etc.  So
getClass() shouldn't be different.

From brian.goetz at oracle.com  Wed Nov  3 14:05:21 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 3 Nov 2021 10:05:21 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <CAGKkBkvR_E_NfCzwpHKbsChF5UK9ayq+uOL_YuSpHqBH7o11qw@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <F59896B2-A3DD-4FB0-BC41-A77C6A1D0788@oracle.com>
 <CAGKkBkvR_E_NfCzwpHKbsChF5UK9ayq+uOL_YuSpHqBH7o11qw@mail.gmail.com>
Message-ID: <160ba720-56af-8657-5afb-b94c6da45088@oracle.com>


> I haven't caught up on the plans for equality in a long time.

This is a good time to catch up on this.

Today, the JVM provides an equality operation on objects in the form of
the `ACMP` instructions.  It also provides per-primitive equality
operations (`ICMP`, `FCMP`, etc) for the various primitive types. (The
JVM mostly erases boolean, byte, char, and short to int, so some of
these instructions are "missing".)

Today, the language translates the `==` operator to the appropriate ACMP
/ ICMP / etc instruction, depending on the static type of the operands.
(JLS Ch5 (Contexts and Conversions) does the heavy lifting of managing
mismatches when we, say, compare an object to a primitive.)  The
important thing to take away here is that there really are multiple `==`
operators, they are just spelled the same way, and disambiguated by
static typing; let's call them `id==`, `int==`, etc if there's any
ambiguity.  Note that `float==` and `double==` are weird when it comes
to `NaN`, so `==` on primitives is not necessarily just a straight
bitwise comparison.
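A small sketch in current Java of `==` being selected by static type (class and method names are illustrative): mixed `int`/`long` operands are compared after numeric promotion, `double==` follows IEEE 754 rules for `NaN` and signed zero, and `==` on references is the identity comparison written `id==` above.

```java
// Illustrative class: one spelling, several == operators chosen by static type.
class EqualityOperators {
    // Mixed int/long operands: binary numeric promotion widens the int,
    // so this is a long comparison selected purely by the static types.
    static boolean intVsLong(int i, long l) {
        return i == l;
    }

    // double== follows IEEE 754: NaN is unequal to itself, and 0.0 == -0.0.
    static boolean doubleEq(double a, double b) {
        return a == b;
    }

    // id== on references compares identity, not contents.
    static boolean refEq(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(intVsLong(3, 3L));                 // true
        System.out.println(doubleEq(Double.NaN, Double.NaN)); // false
        System.out.println(doubleEq(0.0, -0.0));              // true
        String s = "a";
        System.out.println(refEq(s, new String(s)));          // false: two distinct objects
        System.out.println(refEq(s, s));                      // true
    }
}
```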

Object has an `equals` method; the default implementation is:

    boolean equals(Object other) {
        return this == other;
    }

So in the absence of code to the contrary, two objects are `equals` if 
they are the same object.

Extrapolating, ACMP is a _substitutability test_; it says that
substituting one for the other would have no detectable differences.
Because all objects have a unique identity, comparing the identities is
both necessary and sufficient for a substitutability test.  This is the
foundation on which we abstract `==` on the new classes.

If C is a class with no identity, that means an instance is the state,
the whole state, and nothing but the state.  So the natural way to ask
"could I substitute instance c1 for instance c2" is to compare each of
its fields with a substitutability test.  Which is exactly what `ACMP`
does on primitive objects.  In keeping with the notion that each
primitive type has its own `==`, we'll write `Point==` for the equality
on `Point`.

For a simple `Point` primitive class, this is obvious, but it gets
tricky when a primitive is hiding behind a broader static type like
Object or an interface type.  Consider:

    primitive class Box {
        Object contents;
    }

How do we compare two boxes?  By comparing their contents.  How do we
compare contents?  With a substitutability test.  If we have identity
objects in the box, then the box comparison is "are you both boxes, and
are your contents `id==`".  What if we have Points in the box?  We need
to compare them with `Point==`.  How do we know we have Points in the
box?  By looking at their dynamic type.  So the `==` operation on
primitive objects not only recurses into fields, but for fields that
could hold _either_ identity or primitive objects (these are `Object`,
interfaces, and some abstract classes), we dynamically select the `==`
operator to use on that field.  (Edge cases: an id object is never `==`
to a primitive object; null is always `==` to itself.)
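The recursive rule above can be sketched in today's Java, with a loud caveat: this is not how `ACMP` is implemented. As an assumption for illustration only, Java records stand in for identity-free classes, primitive-typed fields (which reflection hands back boxed) are compared by value, and any other object is treated as an identity object compared with `id==`. `Substitutability`, `Point`, `Box`, and `Temperature` are hypothetical names.

```java
import java.lang.reflect.RecordComponent;

// Hypothetical sketch: records play the role of identity-free classes;
// every other object is treated as an identity object.
class Substitutability {
    record Point(int x, int y) {}
    record Box(Object contents) {}
    record Temperature(double kelvin) {}

    static boolean substitutable(Object a, Object b) {
        if (a == b) return true;                        // same reference, or both null
        if (a == null || b == null) return false;       // null only matches null
        if (a.getClass() != b.getClass()) return false; // choose the == by dynamic type
        Class<?> c = a.getClass();
        if (c.isRecord()) {
            // No identity: the instance is its state, so recurse into every field.
            for (RecordComponent rc : c.getRecordComponents()) {
                try {
                    Object fa = rc.getAccessor().invoke(a);
                    Object fb = rc.getAccessor().invoke(b);
                    if (!substitutable(fa, fb)) return false;
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
            return true;
        }
        if (isBoxedPrimitive(a)) {
            // Primitive fields arrive boxed; equals compares values, using
            // Float/Double::equals semantics for floating point.
            return a.equals(b);
        }
        return false; // distinct identity objects are never substitutable
    }

    private static boolean isBoxedPrimitive(Object o) {
        return o instanceof Integer || o instanceof Long || o instanceof Short
            || o instanceof Byte || o instanceof Character || o instanceof Boolean
            || o instanceof Float || o instanceof Double;
    }
}
```

The sketch follows the edge cases in the text: a `Box` holding an identity object matches only a `Box` holding that same object, an identity object never matches a record, and two null contents match.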

Note that `.ref` is transparent here; in order to get a `Point` into the
`Object` field, we (probably silently) converted it to `Point.ref`.  But
`Point.ref` uses the same `==` computation as `Point`.  The same is true
for the B2/B3 distinction; no difference.  Objects without identity are
equal when their state is equal, whether they're a B2, B3, or B3.ref.

Possibly surprisingly, this has been pushed all the way into `ACMP`.
This means that existing code like the default implementation of
`Object::equals` just works; if you give it primitive objects, it knows
what to do, and performs the proper substitutability test.  One rough
edge is that we don't use `==` as the test for float and double fields,
because it's not a proper substitutability test; we use the semantics of
`Float::equals` and `Double::equals` instead.  Historical wart.
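Why `==` is not a proper substitutability test for floating point, shown with existing Java semantics (illustrative class name): `0.0 == -0.0` is true even though the two values can be told apart, while `Double::equals` compares `doubleToLongBits`, so it separates the zeros and treats every `NaN` as equal to every other `NaN`.

```java
// Illustrative class: == versus Double::equals on the problem values.
class FloatSubstitutability {
    static boolean primitiveEq(double a, double b) {
        return a == b;
    }

    // Double::equals compares doubleToLongBits, so 0.0 and -0.0 differ
    // and every NaN compares equal to every other NaN.
    static boolean equalsSemantics(double a, double b) {
        return Double.valueOf(a).equals(b);
    }

    public static void main(String[] args) {
        System.out.println(primitiveEq(0.0, -0.0));                  // true
        System.out.println(1.0 / 0.0 + " vs " + 1.0 / -0.0);         // Infinity vs -Infinity
        System.out.println(equalsSemantics(0.0, -0.0));              // false
        System.out.println(primitiveEq(Double.NaN, Double.NaN));     // false
        System.out.println(equalsSemantics(Double.NaN, Double.NaN)); // true
    }
}
```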

The bottom line is that `==` is preserved as a substitutability test on 
instances of all primitive classes, whether they're "stored" by 
reference or value.? A corollary is that (finally) Integer instances 
provide reliable `==` semantics, rather than the old unreliable 
cache-based semantics.  (One rift healed.)
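For contrast, the old cache-based behavior in current Java (illustrative class name): autoboxing goes through `Integer.valueOf`, and the JLS (section 5.1.7) guarantees shared boxes only for values from -128 to 127. Outside that range, whether two boxes of the same value are `==` depends on the JVM's cache and is not specified, so the sketch does not assert it.

```java
// Illustrative class: id== on Integer boxes today.
class IntegerCacheDemo {
    static boolean boxedSame(int a, int b) {
        Integer x = a;
        Integer y = b;
        return x == y; // id== on the boxes, not int==
    }

    public static void main(String[] args) {
        System.out.println(boxedSame(127, 127));   // true: guaranteed by the JLS
        System.out.println(boxedSame(1000, 1000)); // unspecified: depends on the JVM's cache
    }
}
```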


From daniel.smith at oracle.com  Wed Nov  3 14:45:59 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 3 Nov 2021 14:45:59 +0000
Subject: EG meeting, 2021-11-03
Message-ID: <CF8FD0A6-DFC9-4719-8683-774EBDEDBE93@oracle.com>

EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT). Note that we're still on DST in the US, so the meeting won't shift to 5pm UTC until next time.

We'll discuss:

"Consolidating the user model": Brian described a user model centered on reference and value types. Sent just yesterday, so we'll probably spend most of the time just reviewing the main ideas.


From forax at univ-mlv.fr  Wed Nov  3 14:50:46 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 3 Nov 2021 15:50:46 +0100 (CET)
Subject: Consolidating the user model
In-Reply-To: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
Message-ID: <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr>

I really like this; it's far better than how I was seeing Valhalla. Pushing .ref into a corner is a good move. 

I still hope that moving from B1 to B2 can be almost backward compatible, if there is no direct access to the constructor, no synchronized, and only reasonable uses of ==. 

My only concern now is the dual of Kevin's concern: 
what if people discover that they always want to use the identity-free reference types (B2), because they are better integrated with the rest of the Java world, and in the end the OG/pure primitive types (B3) are almost never used? 

Rémi 

> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Tuesday, 2 November 2021 22:18:46
> Subject: Consolidating the user model

> We've been grinding away, and we think we have a reduced-complexity user model.
> This is all very rough, and there's lots we need to write up more carefully
> first, but I'm sharing this as a preview of how we can simplify past where JEPs
> 401 and 402 currently stand.

> # Consolidating the user model

> As the mechanics of primitive classes have taken shape, it is time to take
> another look at the user model.

> Valhalla started with the goal of providing user-programmable classes which
> could be flat and dense in memory. Numerics are one of the motivating use
> cases, but adding new primitive types directly to the language has a very high
> barrier. As we learned from [Growing a Language][growing] there are infinitely
> many numeric types we might want to add to Java, but the proper way to do that
> is as libraries, not as language features.

> In the Java language as we have today, objects and primitives are different in
> almost every way: objects have identity, primitives do not; objects are referred
> to through references, primitives are not; object references can be null,
> primitives cannot; objects can have mutable state, primitives cannot; classes
> can be extended, primitive types cannot; loading and storing of object
> references is atomic, but loading and storing of large primitives is not. For
> obvious reasons, the design center has revolved around the characteristics of
> primitives, but the desire to have it both ways is strong; developers continue
> to ask for variants of primitive classes that have a little more in common with
> traditional classes in certain situations. These include:

> - **Nullability.** By far the most common concern raised about primitive
> classes, which "code like a class", is the treatment of null; many developers
> want the benefits of flattening but want at least the option to have `null`
> as the default value, and to get an exception when an uninitialized instance
> is used.

> - **Classes with no sensible default.** Prior to running the constructor, the
> JVM initializes all memory to zero. Since primitive classes are routinely
> stored directly rather than via reference, it is possible that users might be
> exposed to instances in this initial, all-zero state, without a constructor
> having run. For numeric classes such as complex numbers, zero is a fine
> default, and indeed a good default. But for some classes, not only is zero
> not the best default, but there _is no good default_. Storing dates as
> seconds-since-epoch would mean uninitialized dates are interpreted as Jan 1,
> 1970, which is more likely to be a bug than the desired behavior. Classes
> may try to reject bad values in their constructor, but if a class has no
> sensible default, then they would rather have a default that behaves more
> like null, where you get an error if you dereference it. And if the default
> is going to behave like null, it's probably best if the default _is_ null.

> - **Migration**. Classes like `Optional` and `LocalDate` today are
> _value-based_, meaning they already disavow the use of object identity and
> therefore are good candidates for being primitive classes. However, since
> these classes exist today and are used in existing APIs and client code, they
> would have additional compatibility constraints. They would have to continue
> to be passed by object references to existing API points (otherwise the
> invocation would fail to link) and these types are already nullable.

> - **Non-tearability.** 64-bit primitives (`long` and `double`) risk _tearing_
> when accessed under race unless they are declared `volatile`. However,
> objects with final fields offer special initialization-safety guarantees
> under the JMM, even under race. So should primitive classes be more like
> primitives (risking being seen to be in impossible states), or more like
> classes (consistent views for immutable objects are guaranteed, even under
> race)? Tear-freedom has potentially significant costs, and tearing has
> significant risks, so it is unlikely one size fits all.

> - **Direct control over flattening.** In some cases, flattening is
> counterproductive. For example, if we have a primitive class with many
> fields, sorting a flattened array may be more expensive than sorting an array
> of references; while we don't pay the indirection costs, we do pay for
> increased footprint, as well as increased memory movement when swapping
> elements. Similarly, if we want to permute an array with a side index, it
> may well be cheaper to maintain an array of references rather than copying
> all the data into a separate array.
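The side-index idea from the last bullet, sketched under an assumption: since no current Java compiler flattens user classes, a wide element is modeled as `STRIDE` consecutive longs in one `long[]` (key first). `SideIndexSort` and `STRIDE` are hypothetical names. Sorting a small array of indices by key permutes the order without moving any of the wide elements.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical layout: each element is STRIDE longs stored inline, key first,
// imitating a flattened array of a wide primitive class.
class SideIndexSort {
    static final int STRIDE = 4;

    // Orders element indices by key; only the small index array is permuted,
    // so the wide elements are never copied or swapped.
    static Integer[] sortedOrder(long[] flat) {
        int n = flat.length / STRIDE;
        Integer[] idx = new Integer[n];
        for (int k = 0; k < n; k++) {
            idx[k] = k;
        }
        Arrays.sort(idx, Comparator.comparingLong((Integer i) -> flat[i * STRIDE]));
        return idx;
    }

    public static void main(String[] args) {
        long[] flat = {30, 0, 0, 0, 10, 1, 1, 1, 20, 2, 2, 2};
        System.out.println(Arrays.toString(sortedOrder(flat))); // [1, 2, 0]
    }
}
```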

> These requests are all reasonable when taken individually; it's easy to construct
> use cases where one would want it both ways for any given characteristic. But
> having twelve knobs (and 2^12 possible settings) on primitive classes is not a
> realistic option, nor does it result in a user model that is easy to reason
> about.

> In the current model, a primitive class is really like a primitive -- no nulls,
> no references, always flattened, tearable when large enough. Each primitive
> class `P` comes with a companion reference type (`P.ref`), which behaves much as
> boxes do today (except without identity.) There is also, for migration, an
> option (`ref-default`) to invert the meaning of the unqualified name, so that by
> default `Optional` means `Optional.ref`, and flattening must be explicitly
> requested (which, in turn, is the sole motivation for the `P.val` denotation.) We
> would like for the use of the `.ref` and `.val` qualifiers to be rare, but
> currently they are not rare enough for comfort.

> Further, we've explored but have not committed to a means of declaring primitive
> classes which don't like their zero value, for primitive classes with no good
> default, so that dereferencing a zero value would result in some sort of
> exception. (The nullability question is really dominated by the initialization
> safety question.) This would be yet another variant of primitive class.

> A serious challenge to this stacking is the proliferation of options; there are
> knobs for nullability, zero-hostility, migration, tear-resistance, etc.
> Explaining when to use which at the declaration site is already difficult, and
> there is also the challenge of when to use `ref` or `val` at the use site. The
> current model has done well at enumerating the requirements (and, helping us
> separate the real ones from the wannabes), so it is now time to consolidate.

> ## Finding the buckets

> Intuitively, we sense that there are three buckets here; traditional identity
> classes in one bucket, traditional primitives (coded like classes) in another,
> and a middle bucket that offers some "works like an int" benefits but with some
> of the affordances (e.g., nullability, non-tearability) of the first.

> Why have multiple buckets at all? Project Valhalla has two main goals: better
> performance (enabling more routine flattening and better density), and unifying
> the type system (healing the rift between primitives and objects.) It's easy to
> talk about flattening, but there really are at least three categories of
> flattening, and different ones may be possible in different situations:

> - **Heap flattening.** Inlining the layout of one object into another object
> (or array) layout; when class `C` has a field of type `D`, rather than
> indirecting to a `D`, we inline D's layout directly into C.

> - **Calling convention flattening.** Shredding a primitive class into its
> fields in (out-of-line) method invocations on the call stack.

> - **IR flattening.** When calling a method that allocates a new instance and
> returns it, eliding the allocation and shredding it into its fields instead.
> This only applies when we can inline through from the allocation to the
> consumption of its fields. (Escape analysis also allows this form of
> flattening, but only for provably non-escaping objects. If we know the
> object is identity free, we can optimize in places where EA would fail.)
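Heap flattening is something the VM would do invisibly; writing it out by hand in today's Java shows only the layout difference. In this hypothetical sketch (`ManualFlattening` and its `Point` record are illustrative names), the same points are stored once as an array of references to separately allocated objects, and once inline as `x, y` pairs in a single `int[]`.

```java
// Hypothetical illustration of heap flattening done by hand.
class ManualFlattening {
    record Point(int x, int y) {}

    // Indirect layout: each array element is a reference to a separate Point.
    static long sumIndirect(Point[] pts) {
        long sum = 0;
        for (Point p : pts) {
            sum += (long) p.x() + p.y();
        }
        return sum;
    }

    // Inline layout: element i occupies slots 2*i (x) and 2*i + 1 (y),
    // so there is no per-element pointer to chase.
    static int[] flatten(Point[] pts) {
        int[] xy = new int[pts.length * 2];
        for (int i = 0; i < pts.length; i++) {
            xy[2 * i] = pts[i].x();
            xy[2 * i + 1] = pts[i].y();
        }
        return xy;
    }

    static long sumFlat(int[] xy) {
        long sum = 0;
        for (int v : xy) {
            sum += v;
        }
        return sum;
    }
}
```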

> #### Nullability

> Variables in the heap (fields and array elements) must have a default value; for
> all practical purposes it is a forced move that this default value is the
> all-zero-bits value. This zero-bits value is interpreted as `null` for
> references, zero for numerics, and `false` for booleans today.
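A small sketch of the zero-bits default in current Java, plus the date problem raised in the list of requests above. The date part is hypothetical: a flattened array of a date class whose only field is seconds since the epoch is modeled as a plain `long[]` (`ZeroDefaultDemo` and `readDate` are illustrative names). A slot no constructor ever touched reads back as 1970-01-01 rather than failing.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class ZeroDefaultDemo {
    // Hypothetical: a flattened array of a date class whose only field is
    // seconds since the epoch, modeled as a plain long[].
    static LocalDate readDate(long[] flatDates, int i) {
        return Instant.ofEpochSecond(flatDates[i]).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        // Fresh heap slots hold all-zero bits, interpreted according to their type.
        Object[] refs = new Object[1];
        long[] longs = new long[1];
        boolean[] flags = new boolean[1];
        System.out.println(refs[0]);  // null
        System.out.println(longs[0]); // 0
        System.out.println(flags[0]); // false
        // For the hypothetical date class, zero bits silently mean 1970-01-01.
        System.out.println(readDate(new long[4], 2)); // 1970-01-01
    }
}
```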

> If primitives are to "code like a class", the constructor surely must be able to
> reject bad proposed states. But what if the constructor thinks the default
> value is a bad state? The desire to make some primitive classes nullable stems
> from the reality that for some classes, we'd like a "safe" default -- one that
> throws if you try to use it before it is initialized.

> But, the "traditional" primitives are not nullable, and for good reason; zero is
> a fine default value, and the primitives we have today typically use all their
> bit patterns, meaning that arranging for a representation of null requires at
> least an extra bit, which in reality means longs would take at least 65 bits
> (which in reality means 128 bits most of the time.)

> So we see nullability is a tradeoff; on the one hand, it gives us protection
> from uninitialized variables, but also has costs -- extra footprint, extra
> checks. We experimented with a pair of modifiers `null-default` and
> `zero-default`, which would determine how the zero value is interpreted. But
> this felt like solving the problem at the wrong level.

> #### Tearing

> The Java Memory Model includes special provisions for visibility of final
> fields, even when the reference to their container object is shared via a data
> race. These initialization safety guarantees are the bedrock of the Java
> security model; a String being seen to change its value -- or to not respect
> invariants established by its constructor -- would make it nearly impossible to
> reason about security.

> On the other hand, longs and doubles permit tearing when shared via data races.
> This isn't great, but preventing tearing has a cost, and the whole reason we got
> primitives in 1995 was driven by expectations and tradeoffs around arithmetical
> performance. Preventing tearing is still quite expensive; above 64 bits, atomic
> instructions have a significant tax, and often the best way to manage tearing is
> via an indirection when stored in the heap (which is precisely what flattening
> is trying to avoid.)
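Tearing is timing-dependent, so instead of a real race, this sketch replays one bad interleaving deterministically on a single thread. It assumes a hypothetical two-field value with invariant `lo <= hi`, stored flat as two separate `long` slots so that a store is two writes (`TearingReplay` is an illustrative name). A reader that runs between the writes sees a state that neither the old nor the new value had. With a reference to an immutable object, the reader instead loads one reference and sees either the whole old value or the whole new one.

```java
// Deterministic, single-threaded replay of one racy interleaving; not a real race.
class TearingReplay {
    static long[] readMidStore(long oldLo, long oldHi, long newLo, long newHi) {
        long[] slot = {oldLo, oldHi};     // flat storage currently holds the old value
        slot[0] = newLo;                  // writer: first half of the store
        long[] seen = {slot[0], slot[1]}; // reader runs between the two writes
        slot[1] = newHi;                  // writer: second half of the store
        return seen;
    }

    public static void main(String[] args) {
        long[] seen = readMidStore(0, 1, 10, 11);     // (0,1) and (10,11) both satisfy lo <= hi
        System.out.println(seen[0] + ", " + seen[1]); // 10, 1: a state no constructor produced
    }
}
```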

> When we can code primitives "like a class", which should they be more like? It
> depends! Classes that are more like numerics may be willing to tolerate tearing
> for the sake of improved performance; classes that are more like "traditional
> classes" will want the initialization safety afforded to immutable objects
> already.

> So we see tearability is a tradeoff; on the one hand, it protects invariants
> from data races, but also has costs -- expensive atomic instructions, or reduced
> heap flattening. We experimented with a modifier that marks classes as
> non-tearable, but this would require users to keep track of which primitive
> classes are tearable and which aren't. This felt like solving the problem at
> the wrong level.

> #### Migration

> There are some classes -- such as `java.lang.Integer`, or `java.util.Optional`
> -- that meet all the requirements to be declared as (nullable) primitive
> classes, but which exist today as identity classes. We would like to be able
> to migrate these to primitives to get the benefits of flattening, but are
> constrained that (at least for non-private API points) they must be represented
> as `L` descriptors for reasons of binary compatibility. Our existing
> interpretation of `L` descriptors is that they represent references as pointers;
> this means that even if we could migrate these types, we'd still give up on some
> forms of flattening (heap and stack), and our migration would be less than
> ideal.

> Worse, the above interpretation of migration suggests that sometimes a use of
> `P` is translated as `LP`, and sometimes as `QP`. To the degree that there is
> uncertainty in whether a given source type translates to an `L` or `Q`
> descriptor, this flows into either uncertainty of how to use reflection (users
> must guess as to whether a given API point using `P` was translated with `LP` or
> `QP`), or uncertainty on the part of reflection (the user calls
> `getMethod(P.class)`, and reflection must consider methods that accept both `LP`
> and `QP` descriptors.)
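Today's `int`/`Integer` pair already shows the guessing game described above: reflection matches the exact parameter mirror recorded in the descriptor, so a caller must know which one was used. This sketch (illustrative names) uses the existing `Class.getDeclaredMethod` API; it is an analogy for the `LP`/`QP` problem, not a model of it.

```java
import java.lang.reflect.Method;

// Illustrative class: lookup succeeds only with the exact parameter mirror.
class DescriptorLookup {
    static void takesInt(int x) {}         // descriptor (I)V
    static void takesInteger(Integer x) {} // descriptor (Ljava/lang/Integer;)V

    // Returns the one-argument method, or null if none has that exact parameter type.
    static Method find(String name, Class<?> param) {
        try {
            return DescriptorLookup.class.getDeclaredMethod(name, param);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(find("takesInt", int.class) != null);     // true
        System.out.println(find("takesInt", Integer.class) != null); // false: wrong mirror
    }
}
```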

> ## Restacking for simplicity

> The various knobs on the user model (which may flow into translation and
> reflection) risk being death by 1000 cuts; they not only add complexity to the
> implementation, but they add complexity for users. This prompted a rethink of
> assumptions at every layer.

> #### Nullable primitives

> The first part of the restacking involved relaxing the assumption that primitive
> classes are inherently non-nullable. We shied away from this for a long time,
> knowing that there would be significant VM complexity down this road, but in the
> end concluded that the complexity is better spent here than elsewhere. These
> might be translated as `Q` descriptors, or might be translated as `L`
> descriptors with a side channel for preloading metadata -- stay tuned for a
> summary of this topic.

> > Why Q? The reason we have `Q` descriptors at all is that we need to know
> things about classes earlier than we otherwise would, in order to make decisions
> that are hard to unmake later (such as layout and calling convention.) Rather
> than interpreting `Q` as meaning "value type" (as the early prototypes did), `Q`
> acquired the interpretation "go and look." When the JVM encounters a field or
> method descriptor with a `Q` in it, rather than deferring classloading as long
> as possible (as is the case with `L` descriptors), we load the class eagerly, so
> we can learn all we need to know about it. From classloading, we might not only
> learn that it is a primitive class, but whether it should be nullable or not.
> (Since primitive classes are monomorphic, carrying this information around on a
> per-linkage basis is cheap enough.)

> So some primitive classes are marked as "pure" primitives, and others as
> supporting null; when the latter are used as receivers, `invokevirtual` does a
> null check prior to invocation (and NPEs if the receiver is null). When moving
> values between the heap and the stack (`getfield`, `aastore`, etc), these
> bytecodes must check for the "flat null" representation in the heap and a real
> null on the stack. The VM needs some help from the classfile to help choose a
> bit pattern for the flat null; the most obvious strategy is to inject a
> synthetic boolean, but there are others that don't require additional footprint
> (e.g., flow analysis that proves a field is assigned a non-default value; using
> low-order bits in pointers; using spare bits in booleans; using pointer colors;
> etc.) The details are for another day, but we would like for this to not
> intrude on the user model.
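A hypothetical sketch of the "inject a synthetic boolean" strategy: a flattened array of a nullable 64-bit value, modeled with two parallel arrays (`FlatNullableLongs` is an illustrative name, and this is not how the VM lays anything out). All-zero bits decode as null, so fresh slots start out null; loads translate the flat null into a real null, and stores do the reverse. The extra flag per element is the footprint cost discussed in the Nullability section.

```java
// Hypothetical sketch of a flat null representation using a synthetic flag.
class FlatNullableLongs {
    private final long[] values;
    private final boolean[] present; // the synthetic boolean: false means flat null

    FlatNullableLongs(int size) {
        values = new long[size];
        present = new boolean[size]; // all-zero bits, so every slot starts out null
    }

    // Heap-to-stack move: translate the flat null representation into a real null.
    Long get(int i) {
        if (!present[i]) {
            return null;
        }
        return values[i];
    }

    // Stack-to-heap move: translate a real null into the flat null representation.
    void set(int i, Long v) {
        if (v == null) {
            present[i] = false;
            values[i] = 0L;
        } else {
            present[i] = true;
            values[i] = v;
        }
    }
}
```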

> #### L vs Q

> The exploration into nullable primitives prompted a reevaluation of the meaning
> of L vs Q. Historically we had interpreted L vs Q as being "pointer vs flat"
> (though the VM always has the right to unflatten if it feels like it.) But over
> time we've been moving towards Q mostly being about earlier loading (so the VM
> can learn what it needs to know before making hard-to-reverse decisions, such as
> layout.) So let's go there fully.

> A `Q` descriptor means that the class must be loaded eagerly (Q for "quick")
> before resolving the descriptor; an `L` descriptor means it _must not be_ (L for
> "lazy"), consistent with current JVMS treatment. Since an `L` descriptor is
> lazily resolved, we have to assume conservatively that it is nullable; a Q
> descriptor might or might not be nullable (we'll know once we load the class,
> which we do eagerly.)
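
As a rough illustration of "Q for quick, L for lazy", here is a small scanner over descriptor strings. It assumes the prototype `Q...;` syntax from the Valhalla experiments (standard JVMS descriptors only have `L...;`) and simply reports which named classes a linker would have to load before resolving the descriptor, and which it may defer. `DescriptorScan` is a hypothetical helper, not a JDK API.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits the class names in a field or method descriptor into eager (Q) and lazy (L) sets. */
final class DescriptorScan {
    final List<String> eager = new ArrayList<>(); // Q: load before resolving
    final List<String> lazy = new ArrayList<>();  // L: defer, assume nullable

    static DescriptorScan of(String descriptor) {
        DescriptorScan scan = new DescriptorScan();
        int i = 0;
        while (i < descriptor.length()) {
            char c = descriptor.charAt(i);
            if (c == 'L' || c == 'Q') {
                int end = descriptor.indexOf(';', i);
                if (end < 0) throw new IllegalArgumentException("unterminated: " + descriptor);
                String name = descriptor.substring(i + 1, end);
                (c == 'Q' ? scan.eager : scan.lazy).add(name);
                i = end + 1; // skip the whole class name
            } else {
                i++; // '(', ')', '[', or a primitive code such as I, J, Z, V
            }
        }
        return scan;
    }
}

class DescriptorDemo {
    public static void main(String[] args) {
        DescriptorScan s = DescriptorScan.of("(QPoint;Ljava/lang/String;I)QComplex;");
        System.out.println("eager: " + s.eager); // eager: [Point, Complex]
        System.out.println("lazy:  " + s.lazy);  // lazy:  [java/lang/String]
    }
}
```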

> What we've done is wrested control of flatness away from the language, and ceded
> it to the VM, where it belongs. The user/language expresses semantic
> requirements (e.g., nullability) and the VM chooses a representation. That's
> how we like it.

> #### It's all about the references

> The rethink of L vs Q enabled a critical restack of the user model. With this
> reinterpretation, Q descriptors can (based on what is in the classfile) still be
> reference types -- and these reference types can still be flattened;
> alternately, with side-channels for preload metadata on `L` descriptors, we may
> be able to get to flattened references under `L` descriptors.

> Returning to the tempting user knobs of nullability and tearability, we can now
> put these where they belong: nullability is a property of _reference types_ --
> and some primitive classes can be reference types. Similarly, the
> initialization safety of immutable objects derives from the fact that object
> references are loaded atomically (with respect to stores of the same reference.)
> Non-tearability is also a property of reference types. (The same goes for layout
> circularity; references can break layout circularities.) So rather than the
> user choosing nullability and non-tearability as ad-hoc choices, we treat them
> as affordances of references, and let users choose between reference-only
> primitive classes, and the more traditional primitive classes, that come in both
> reference and value flavors.
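
The tearing argument can be replayed deterministically in ordinary Java. The sketch below is a single-threaded simulation of one racy schedule, not a real data race: a two-word value stored flat is written one word at a time, so a reader scheduled between the two writes sees a mix of old and new words, whereas a reference slot is updated in one step (the JLS guarantees that reads and writes of references are atomic). `Pair`, `FlatSlot`, `RefSlot`, and `TearingDemo` are made-up names.

```java
/** An immutable two-word value, standing in for something like a 128-bit primitive class. */
record Pair(long hi, long lo) {}

/** Flat storage: the two words live in separate slots and are written separately. */
final class FlatSlot {
    long hi, lo;
}

/** Reference storage: one slot holding a pointer to an immutable Pair. */
final class RefSlot {
    Pair value;
}

class TearingDemo {
    /** Replays "writer stores hi, reader reads, writer stores lo" against a flat slot. */
    static Pair flatInterleaving(Pair before, Pair after) {
        FlatSlot slot = new FlatSlot();
        slot.hi = before.hi();
        slot.lo = before.lo();
        slot.hi = after.hi();                   // writer: first half of the store
        Pair seen = new Pair(slot.hi, slot.lo); // reader runs here
        slot.lo = after.lo();                   // writer: second half of the store
        return seen;
    }

    /** The same schedule against a reference slot: the store is a single step. */
    static Pair refInterleaving(Pair before, Pair after) {
        RefSlot slot = new RefSlot();
        slot.value = before;
        Pair seen = slot.value; // the reader sees either the old or the new object
        slot.value = after;
        return seen;
    }

    public static void main(String[] args) {
        Pair a = new Pair(1, 1), b = new Pair(2, 2);
        System.out.println(flatInterleaving(a, b)); // Pair[hi=2, lo=1]: a torn value
        System.out.println(refInterleaving(a, b));  // Pair[hi=1, lo=1]
    }
}
```

In a real program the torn read would come from another thread; the point is only that the reference version has no intermediate state to observe.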

> > This restack allows us to eliminate `ref-default` completely (we'll share more
> > details later), which in turn allows us to eliminate `.val` completely.
> > Further, the use cases for `.ref` become smaller.

> #### The buckets

> So, without further ado, let's meet the new user model. The names may change,
> but the concepts seem pretty sensible. We have identity classes, as before;
> let's call that the first bucket. These are unchanged; they are always
> translated with L descriptors, and there is only one usable `Class` literal for
> these.

> The second bucket is _identity-free reference classes_. They come with the
> restrictions on identity-free classes: no mutability and limited extensibility.
> Because they are reference types, they are nullable and receive tearing
> protection. They are flattenable (though, depending on layout size and hardware
> details, we may choose to get tearing protection by maintaining the
> indirection.) These might be translated with Q descriptors, or with modified L
> descriptors, but there is no separate `.ref` form (they're already references)
> and there is only one usable `Class` literal for these.

> The third bucket is the _true primitives_. These are also identity-free
> classes, but further give rise to both value and reference types, and the value
> type is the default (we denote the reference type with the familiar `.ref`.)
> Value types are non-nullable, and permit tearing just as existing primitives do.
> The `.ref` type has all the affordances of reference types -- nullability and
> tearing protection. The value type is translated with Q; the reference type is
> translated with L. There are two mirrors (`P.class` and `P.ref.class`) to
> reflect the difference in translation and semantics.
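
`P.class` and `P.ref.class` are proposed forms and can't be written today, but the existing `int`/`Integer` pair already shows the shape: one value type and one reference type, each with its own `Class` mirror. This is an analogy only, using the real `java.lang.Class` and `Integer` APIs; `MirrorDemo` is an illustrative name.

```java
class MirrorDemo {
    public static void main(String[] args) {
        Class<?> value = int.class;         // analogous to P.class (the value type)
        Class<?> reference = Integer.class; // analogous to P.ref.class (the reference type)

        System.out.println(value == reference);        // false: two distinct mirrors
        System.out.println(value.isPrimitive());       // true
        System.out.println(reference.isPrimitive());   // false
        System.out.println(Integer.TYPE == int.class); // true: the box class exposes its primitive mirror
    }
}
```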

> A valuable aspect of this translation strategy is that there is a deterministic,
> 1:1 correspondence between source types and descriptors.

> How we describe the buckets is open to discussion; there are several possible
> approaches. One possible framing is that the middle bucket gives up identity,
> and the third further gives up references (which can be clawed back with
> `.ref`), but there are plenty of ways we might express it. If these are
> expressed as modifiers, then they can be applied to records as well.

> Another open question is whether we double down on, or abandon, the terminology of
> boxing. On the one hand, users are familiar with it, and the new semantics are
> the same as the old semantics; on the other, the metaphor of boxing is no longer
> accurate, and users surely have a lot of mental baggage that says "boxes are
> slow." We'd like for users to come to a better understanding of the difference
> between value and reference types.

> #### Goodbye, direct control over flattening

> In earlier explorations, we envisioned using `X.ref` as a way to explicitly
> ask for no flattening. But in the proposed model, flattening is entirely
> under the control of the VM -- where we think it belongs.

> #### What's left for .ref?

> A pleasing outcome here is that many of the use cases for `X.ref` are subsumed
> into more appropriate mechanisms, leaving a relatively small set of corner-ish
> cases. This is what we'd hoped `.ref` would be -- something that stays in the
> corner until summoned. The remaining reasons to use `X.ref` at the use site
> include:

> - Boxing. Primitives have box objects; strict value-based classes need
> companion reference types for all the same situations as today's primitives
> do. It would be odd if the box were non-denotable.
> - Null-adjunction. Some methods, like `Map::get`, return null to indicate no
> mapping was present. But if in `Map<K,V>`, `V` is not nullable, then there
> is no way to express this method. We envision that such methods would return
> `V.ref`, so that strict value-based classes would be widened to their "box" on
> return, and null would indicate no mapping present.
> - Cycle-breaking. Primitives that are self-referential (e.g., linked list node
> classes that have a next node field) would have layout circularities; using a
> reference rather than a value allows the circularity to be broken.
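
The null-adjunction case is already visible with today's primitives. A map whose values are conceptually `int` still hands back the box `Integer` from `Map::get`, because null is how "no mapping" is reported; in the proposed model, `V.ref` would fill the role that `Integer` fills here. `NullAdjunctionDemo` and `countOrMinusOne` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

class NullAdjunctionDemo {
    /** Returns the count for a word, or -1 if absent; the null check needs the box type. */
    static int countOrMinusOne(Map<String, Integer> counts, String word) {
        Integer boxed = counts.get(word); // Integer, not int: null means "no mapping"
        return boxed == null ? -1 : boxed;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("zero", 0);
        System.out.println(countOrMinusOne(counts, "zero"));    // 0: mapped to the default value
        System.out.println(countOrMinusOne(counts, "missing")); // -1: no mapping
        // int bad = counts.get("missing"); // compiles, but unboxing null throws NullPointerException
    }
}
```

The value type alone could not distinguish a stored `0` (the default value) from a missing key.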

> This list is (finally!) as short as we would like it to be, and devoid of
> low-level control over representation; users use `X.ref` when they need
> references (either for interop with reference types, or to require nullability).
> Our hope all along was that `.ref` was mostly "break glass in case of
> emergency"; I think we're finally there.

> #### Migration

> The topic of migration is a complex one, and I won't treat it fully here (the
> details are best left until we're fully agreed on the rest.) Earlier treatments
> of migration were limited, in that even with all the complexity of
> `ref-default`, we still didn't get all the flattening we wanted, because the
> laziness of `L` descriptors kept us from knowing about potential flattenability
> until it was too late. Attempts to manage "preload lists" or "side preload
> channels" in previous rounds foundered due to complexity or corner cases, but
> the problem has gotten simpler, since we're only choosing representation rather
> than value sets now -- which means that the `L*` types might work out here.
> Stay tuned for more details.

> ## Reflection

> Earlier designs all included some non-intuitive behavior around reflection.
> What we'd like to do is align the user-visible types with reflection literals
> and with descriptors, following the invariant that

> new X().getClass() == X.class
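
For identity classes the invariant already holds. Today's primitives show why it is worth stating: a boxed `int` reports `Integer.class`, and no instance ever reports `int.class`. The nested class `X` below is a stand-in, not a class from the proposal.

```java
class GetClassDemo {
    static final class X {}

    public static void main(String[] args) {
        System.out.println(new X().getClass() == X.class); // true: the invariant for an identity class

        Object boxed = 42; // autoboxing converts the int to an Integer
        System.out.println(boxed.getClass() == Integer.class);        // true
        System.out.println(boxed.getClass() == (Class<?>) int.class); // false: no instance reports int.class
    }
}
```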

> ## TBD

> Stay tuned for some details on managing null encoding and detection,
> reference types under either Q or modified L descriptors, and some
> thoughts on painting the bikeshed.

> [growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621

From kevinb at google.com  Wed Nov  3 15:15:43 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 08:15:43 -0700
Subject: Consolidating the user model
In-Reply-To: <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr>
Message-ID: <CAGKkBkuedh9PY4cLDFYty4c2hwd+QpVN4Qg0jrC08SUxkKGTJQ@mail.gmail.com>

On Wed, Nov 3, 2021 at 7:51 AM Remi Forax <forax at univ-mlv.fr> wrote:

> My only concern now is the dual of Kevin's concern:
> what if people discover that they always want to use the identity-free
> reference types (B2), because it is better integrated with the rest of the
> Java world and that in the end, the OG/pure primitive types (B3) are almost
> never used.
>

B2 is certainly the more basic feature, and could at least hypothetically
release earlier than the rest. Regardless of timing, it does seem that the
costs and benefits of B3 need to be interpreted *relative* to B2.


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Wed Nov  3 15:54:25 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 3 Nov 2021 11:54:25 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr>
Message-ID: <278cd063-9240-47db-6cb7-956065a1fb60@oracle.com>


> I really like this; it's far better than how I was seeing Valhalla.
> Pushing .ref into a corner is a good move.

Yes, we always disliked how prevalent .ref was; it took several rounds 
of "shaking the box" to get it to stay in the corner.

> I still hope that moving from B1 to B2 can be almost backward
> compatible, if there is no direct access to the constructor, no use of
> synchronized, and only reasonable uses of ==.

Yes, this works out better than we had hoped it might; as you say, if B1 
is value-based, it should be an "almost compatible" move to convert to 
B2. Amazingly, it might even be "almost compatible" to go in the other
direction too, something we'd almost given up on the possibility of.
Codes like a class, indeed.

The cost of this is, of course, that a B2 class gets less optimization 
than a B3 one (though more than a B1 one.) Less heap flattening, more
footprint, more null checks.? Though still substantial stack (calling 
convention) / IR (scalarization) flattening.? How we guide people to 
this is the next challenge.
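
Here is a sketch of a B1 class written in the disciplined style Remi describes: no accessible constructor, no synchronization, and state-based `equals`, which is also what the existing value-based class guidelines ask for. The bucket-2 modifier itself is not shown because its spelling is not settled; `Money` and `MoneyDemo` are hypothetical examples.

```java
import java.util.Objects;

/**
 * A value-based identity class written so that a later move to an
 * identity-free class would be (almost) invisible to clients.
 */
final class Money {
    private final long cents;
    private final String currency;

    // Private constructor: clients cannot rely on `new` producing a fresh identity.
    private Money(long cents, String currency) {
        this.cents = cents;
        this.currency = Objects.requireNonNull(currency);
    }

    /** Factory: free to return a fresh, cached, or shared instance. */
    static Money of(long cents, String currency) {
        return new Money(cents, currency);
    }

    long cents() { return cents; }
    String currency() { return currency; }

    // State-based equality; clients are expected to use equals(), never ==.
    @Override public boolean equals(Object o) {
        return o instanceof Money m && m.cents == cents && m.currency.equals(currency);
    }

    @Override public int hashCode() { return Objects.hash(cents, currency); }

    @Override public String toString() { return cents + " " + currency; }
}

class MoneyDemo {
    public static void main(String[] args) {
        System.out.println(Money.of(150, "EUR").equals(Money.of(150, "EUR"))); // true
    }
}
```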

> My only concern now is the dual of Kevin's concern,
> what if people discover that they always want to use the 
> identity-free reference types (B2), because it is better integrated
> with the rest of the Java world and that in the end, the OG/pure 
> primitive types (B3) are almost never used.

In other words: having solved the almost-impossible technical problems, 
we now face the harder pedagogical problem :)

I'm actually worried about the opposite, though! I think it's a bigger
risk that people will use B3 over B2 "because performance", and put 
themselves in danger (e.g., tearing, unexpected zeroes) without 
realizing it.


From kevinb at google.com  Wed Nov  3 15:58:17 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 08:58:17 -0700
Subject: Equality operator for identityless classes
In-Reply-To: <160ba720-56af-8657-5afb-b94c6da45088@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <F59896B2-A3DD-4FB0-BC41-A77C6A1D0788@oracle.com>
 <CAGKkBkvR_E_NfCzwpHKbsChF5UK9ayq+uOL_YuSpHqBH7o11qw@mail.gmail.com>
 <160ba720-56af-8657-5afb-b94c6da45088@oracle.com>
Message-ID: <CAGKkBksG=qKJVJfjJjN-QXbzeXp2Ei2i6_pe9zMGKm2cyj8s=Q@mail.gmail.com>

I imagine we might be constrained to this design by the need to support
compatible migration. So there may be nothing we can do.

But there is a pretty serious problem here.

Background: code like IdentityHashMap, which cares about *objects per se*
instead of what those objects *represent*, is unusual, special-case, egghead,
lift-the-caution-tape code. It is not normal. It's surely more common in
JDK code. But I strongly suspect that the vast majority of `==` tests in
the wild are not expressing questions of identity at all, but are
abbreviations for `equals()` when the developer happens to believe it's
safe. Many of those are of course bugs, and then there are plain accidental
usages as well.

Today, things are pretty okay because developers can learn that `==` is a
code smell. A responsible code reviewer has to think through each one like
this:

1. Look up the type. Is it a builtin, or Class? Okay, we're fine.
2. Is it an enum? Okay, I resent having to go look it up when they could
have just used switch, but fine.
3. Wait, is this weird code that actually cares about objects instead of
what they represent? This needs a comment.

The problem is that now we'll be introducing a whole class of ... classes
... for which `==` does something reasonable: only the ones that happen to
contain no references, however deeply nested! These cannot at all be easily
distinguished. This is giving bugs a really fantastic way to hide.

I think we'd better consider some heretical options, like introducing `===`
and `!==` as sugar for Object.equals(). It seems tragic to imagine the
entire world (except the special-case code) transitioning over to that, as
it's quite ugly. But it would lead to more correct code. Maybe you have
other ideas.


On Wed, Nov 3, 2021 at 7:05 AM Brian Goetz <brian.goetz at oracle.com> wrote:

> Extrapolating, ACMP is a _substitutability test_; it says that
> substituting one for the other would have no detectable differences.
> Because all objects have a unique identity, comparing the identities is
> both necessary and sufficient for a substitutability test.


What you say here may be technically true, but people who override equals()
are already trying their best to disavow identity in the only way they
have. And that makes your statement here actually kinda *wrong*. Being a
necessary and sufficient substitutability test is literally, exactly, what
Object.equals() does (and never mind that people might implement it *wrong*).
If that method's purpose is not to give classes control over their own
substitutability test -- which they *need!* -- then I can't imagine a
purpose for it at all. (And yes, those objects still do expose identity,
but their equals() implementation is consenting to have that identity
"forgotten" at any time just by round-tripping it through some collection
etc.)


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com  Wed Nov  3 16:02:18 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 3 Nov 2021 16:02:18 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
Message-ID: <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>


On Nov 2, 2021, at 4:53 PM, Brian Goetz <brian.goetz at oracle.com> wrote:


> > Actually, that makes me start to wonder if `getClass()` should be another method like `notify` that simply doesn't make sense to call on value types. (But we still need the two distinct Class instances per class anyway.)
>
> You could argue that it doesn't make sense on the values, but surely it makes sense on their boxes.  But it's a thin argument, since classes extend Object, and we want to treat values as objects (without appealing to boxing) for purposes of invoking methods, accessing fields, etc.  So getClass() shouldn't be different.

One way to thicken this thin argument is to say that Point is not really a class.  It's a primitive.  Then it still has a value-set inclusion relation to Object, but it's not a sub-class of Object.  It is a value-set subtype.

It's probably fruitless, but worth brainstorming as a heuristic for possible moves, so... we could say that:

- Point is not a class, it's a primitive with a value set
- Point is not a subclass of Object, it's a subtype (with value set conversion, like int <: long)
- !(Point *is a* Object) & (Point *has a* Object box)
- Point does not (cannot) inherit methods from Object
- Point can *execute* methods from Object, but only after value-set mapping


From john.r.rose at oracle.com  Wed Nov  3 17:10:49 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 3 Nov 2021 17:10:49 +0000
Subject: [External] : Equality operator for identityless classes
In-Reply-To: <CAGKkBksG=qKJVJfjJjN-QXbzeXp2Ei2i6_pe9zMGKm2cyj8s=Q@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <F59896B2-A3DD-4FB0-BC41-A77C6A1D0788@oracle.com>
 <CAGKkBkvR_E_NfCzwpHKbsChF5UK9ayq+uOL_YuSpHqBH7o11qw@mail.gmail.com>
 <160ba720-56af-8657-5afb-b94c6da45088@oracle.com>
 <CAGKkBksG=qKJVJfjJjN-QXbzeXp2Ei2i6_pe9zMGKm2cyj8s=Q@mail.gmail.com>
Message-ID: <B4E75411-DF33-4345-9F95-F94735CAD4ED@oracle.com>

One of the long standing fixtures in the ecosystem is the
set of idioms for correct use of op==/acmp.  Another is lots
of articles and IDE checkers which detect other uses which
are dubious.  It's a problem that you cannot use op==/acmp
by itself in most cases; you have to accompany it by a call
to Object::equals.  We might try to fix this problem, but
it cannot be expunged from our billions of lines of
pre-existing Java code.

I like to call these equals-accompanying idioms L.I.F.E,
or Legacy Idiom(s) For Equality.  It shows up, canonically,
in this method of ju.Objects:

    public static boolean equals(Object a, Object b) {
        return (a == b) || (a != null && a.equals(b));
    }

Thus, the defective character of op==/acmp is just
(wait for it) a fact of L.I.F.E. and we cannot fight it too
much without hurting ourselves.

Turning that around, if L.I.F.E. is a dynamically common
occurrence (as it is surely statically common) then we
can expend JIT complexity budget to deal with it, and
(maybe even) adjust JVM rules around the optimizations
to make more edgy versions of the optimizations legal.

Specifically, this JIT-time transform has the potential to
radically reduce the frequency of op==/acmp:

    (a == b) || (a != null && a.equals(b))
=>
    (a == null ? b == null : a.equals(b))

This only works if all possible methods selected from
a.equals permit the dropping of op==.  The contract
of Object::equals does indeed allow this, but it is not
enforced; the JVMS allows the contract to be broken,
and the transform will expose the breakage.  And yet,
there are things we can do here to unlock this transform.
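
A minimal sketch of why the transform depends on the `equals` contract: the legacy form and the transformed form agree whenever `equals` is reflexive, but a class that breaks reflexivity makes them diverge. The method and class names are hypothetical; a JIT would apply such a rewrite internally, not in source.

```java
class LifeTransformDemo {
    /** The legacy idiom, as written in java.util.Objects.equals. */
    static boolean legacy(Object a, Object b) {
        return (a == b) || (a != null && a.equals(b));
    }

    /** The transformed form, with the op==/acmp test dropped. */
    static boolean transformed(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    /** Breaks the reflexivity clause of the Object::equals contract. */
    static final class Contrary {
        @Override public boolean equals(Object o) { return false; }
        @Override public int hashCode() { return 0; }
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(legacy(s, s) == transformed(s, s)); // true: String.equals is reflexive

        Contrary c = new Contrary();
        System.out.println(legacy(c, c));      // true: the == test short-circuits
        System.out.println(transformed(c, c)); // false: the broken equals is exposed
    }
}
```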

More generally, for other L.I.F.E.-forms, I am confident
we can build JIT transforms that reduce reliance on
acmp, which is suddenly more expensive than its coders
(and the original designers of Java) expect.

Programmers who override Object::equals to (as you
nicely say) disavow identity-based substitutability
will probably write, prompted by their IDE, in a
ceremonial mood, that one occurrence of op==/acmp
to short-circuit the rest of their Foo::equals method.
Or they may erase it, in a purifying mood.
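
The two shapes can be compared directly. Below, a hypothetical `CeremonialPoint` keeps the IDE-style `this == o` prologue and a hypothetical `PurifiedPoint` omits it; since both bodies are reflexive and state-based, they return the same answers, and the identity test serves only as a shortcut.

```java
final class CeremonialPoint {
    final int x, y;
    CeremonialPoint(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (this == o) return true; // ceremonial short-circuit via op==/acmp
        if (!(o instanceof CeremonialPoint)) return false;
        CeremonialPoint p = (CeremonialPoint) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() { return 31 * x + y; }
}

final class PurifiedPoint {
    final int x, y;
    PurifiedPoint(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        // No identity test: equality is decided by state alone.
        return o instanceof PurifiedPoint p && x == p.x && y == p.y;
    }

    @Override public int hashCode() { return 31 * x + y; }
}

class EqualsShapesDemo {
    public static void main(String[] args) {
        System.out.println(new CeremonialPoint(1, 2).equals(new CeremonialPoint(1, 2))); // true
        System.out.println(new PurifiedPoint(1, 2).equals(new PurifiedPoint(1, 2)));     // true
    }
}
```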

In either case, the above transform requires the JIT
to treat such equals methods as either actually or potentially
starting with a short-circuiting op==/acmp.
In any case, such an identity comparison will be
monomorphic in the receiver type, not a
polymorphic multi-way dispatch on Object
references.

So this is not just moving around costs that stay the
same; you can de-virtualize op==/acmp by moving
it into the prologue of all Object::equals methods.
(Non-compliant ones can be handled by splitting
the entry point.)  Once the actual or potential
op==/acmp is found at the start of Foo::equals, we
can then inline and reorder the checks in the body
of the equals method.  At that point the cost of op==
starts to go to zero.

This is old news; we've discussed it in Burlington
now these many years ago.  But I thought I'd remind
us of it.  And this is really a more hopeful approach
to L.I.F.E.  That is, even if we don't do these JIT
transforms in the first release, there is a path forward
that eventually removes the unintentional costs of
op==/acmp when L.I.F.E. throws them at us.

All this can work without requiring a global move to a
completely new operator (op===), surely an alien form
of L.I.F.E. within our ecosystem.

(Ba-DUM-ch!)


From brian.goetz at oracle.com  Wed Nov  3 17:21:18 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 3 Nov 2021 13:21:18 -0400
Subject: [External] : Equality operator for identityless classes
In-Reply-To: <CAGKkBksG=qKJVJfjJjN-QXbzeXp2Ei2i6_pe9zMGKm2cyj8s=Q@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <F59896B2-A3DD-4FB0-BC41-A77C6A1D0788@oracle.com>
 <CAGKkBkvR_E_NfCzwpHKbsChF5UK9ayq+uOL_YuSpHqBH7o11qw@mail.gmail.com>
 <160ba720-56af-8657-5afb-b94c6da45088@oracle.com>
 <CAGKkBksG=qKJVJfjJjN-QXbzeXp2Ei2i6_pe9zMGKm2cyj8s=Q@mail.gmail.com>
Message-ID: <b7a933fe-dea5-9dd9-4742-84ce84799655@oracle.com>

A related concern is that many existing uses of == are optimizations 
intended to short-circuit evaluation of `equals`, under the assumption 
that == is "much faster" than equals. When the performance reality
shifts, some of this code might get slower.? (Though in most cases it 
probably makes no difference.)

If you assume that most uses of == are accidents, many of them might get 
less wrong; for example, using == on Integer (outside of the box cache) 
is almost always wrong, but will get less wrong in the future (since it 
will compare what's in the box.) This is both better and worse, in that 
fewer bugs will manifest as problems, but then bugs may sit undetected 
for longer.
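
The `Integer` case can be checked today. JLS §5.1.7 guarantees that boxing an `int` constant in the range -128 to 127 yields identical references; outside that range `==` compares references and the outcome is unspecified (typically false with default settings, though HotSpot's `-XX:AutoBoxCacheMax` option can widen the cache). `equals` compares the contents either way. `IntegerEqualityDemo` is an illustrative name.

```java
class IntegerEqualityDemo {
    public static void main(String[] args) {
        Integer small1 = 127, small2 = 127; // inside the range JLS 5.1.7 requires to be cached
        Integer big1 = 1000, big2 = 1000;   // outside it: may or may not be the same object

        System.out.println(small1 == small2);  // true, guaranteed
        System.out.println(big1 == big2);      // usually false on a default JVM, but unspecified
        System.out.println(big1.equals(big2)); // true: compares what's in the box
    }
}
```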

(Don't get me started on the "primitives are good for numerics" -> 
"numerics will want operator overloading" -> "oh crap, == already means 
something" problem.)


On 11/3/2021 11:58 AM, Kevin Bourrillion wrote:
> I imagine we might be constrained to this design by the need to 
> support compatible migration. So there may be nothing we can do.
>
> But there is a pretty serious problem here.
>
> Background: code like IdentityHashMap, which cares about /objects per
> se/ instead of what those objects /represent/, is unusual,
> special-case, egghead, lift-the-caution-tape code. It is not normal. 
> It's surely more common in JDK code. But I strongly suspect that the 
> vast majority of `==` tests in the wild are not expressing questions
> of identity at all, but are abbreviations for `equals()` when the 
> developer happens to believe it's safe. Many of those are of course 
> bugs, and then there are plain accidental usages as well.
>
> Today, things are pretty okay because developers can learn that `==` 
> is a code smell. A responsible code reviewer has to think through each 
> one like this:
>
> 1. Look up the type. Is it a builtin, or Class? Okay, we're fine.
> 2. Is it an enum? Okay, I resent having to go look it up when they 
> could have just used switch, but fine.
> 3. Wait, is this weird code that actually cares about objects instead 
> of what they represent? This needs a comment.
>
> The problem is that now we'll be introducing a whole class of ... 
> classes ... for which `==` does something reasonable: only the ones 
> that happen to contain no references, however deeply nested! These 
> cannot at all be easily distinguished. This is giving bugs a really 
> fantastic way to hide.
>
> I think we'd better consider some heretical options, like introducing 
> `===` and `!==` as sugar for Object.equals(). It seems tragic to 
> imagine the entire world (except the special-case code) transitioning 
> over to that, as it's quite ugly. But it would lead to more 
> correct code. Maybe you have other ideas.
>
>
> On Wed, Nov 3, 2021 at 7:05 AM Brian Goetz <brian.goetz at oracle.com> wrote:
>
>     Extrapolating, ACMP is a _substitutability test_; it says that
>     substituting one for the other would have no detectable differences.
>     Because all objects have a unique identity, comparing the
>     identities is both necessary and sufficient for a substitutability test.
>
>
> What you say here may be technically true, but people who override 
> equals() are already trying their best to disavow identity in the only 
> way they have. And that makes your statement here actually kinda 
> /wrong/. Being a necessary and sufficient substitutability test is 
> literally, exactly, what Object.equals() does (and never mind that 
> people might implement it /wrong/). If that method's purpose is not to 
> give classes control over their own substitutability test -- which 
> they /need!/ -- then I can't imagine a purpose for it at all. (And
> yes, those objects still do expose identity, but their equals() 
> implementation is consenting to have that identity "forgotten" at any 
> time just by round-tripping it through some collection etc.)
>
>
> -- 
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com  Wed Nov  3 17:23:20 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 10:23:20 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
Message-ID: <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>

On Wed, Nov 3, 2021 at 9:02 AM John Rose <john.r.rose at oracle.com> wrote:

> One way to thicken this thin argument is to say that Point is not really
> a class. It's a primitive.  Then it still has a value-set inclusion relation
> to Object, but it's not a sub-class of Object.  It is a value-set subtype.

I would spin it like this: `Point` absolutely is a class. But its instances
are *values* (like ints and references are, but compound), and values *are
still not objects*.

We've said at times we want to "make everything an object", but I think the
unification users really care about is everything being a *class instance*.

I think this fits neatly with the current design: `Point` has no
supertypes*, not even `Object`, but `Point.ref` does.

(*I mean "supertype" in the polymorphic sense, not the "has a conversion"
sense or the "can inherit" sense. I don't know what the word is really
supposed to mean. :-))


> - !(Point *is a* Object) & (Point *has a* Object box)
> - Point does not (cannot) inherit methods from Object
> - Point can *execute* methods from Object, but only after value-set mapping

I'm a little fuzzy on what these accomplish for us, can you spell it out a
bit?
It sounds like a special rule treating Object methods differently from
other supertype methods (?), which would be nice to not need.

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com  Wed Nov  3 17:58:29 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 3 Nov 2021 17:58:29 +0000
Subject: Consequences of null for flattenable representations
Message-ID: <B6714E72-5F95-4D31-AA01-7CD8D008A029@oracle.com>

As we just discussed in the EG, allowing null to co-exist
with flattenable representations is a challenge.  It is
one we have in the past tried to avoid, but the very
legitimate needs for (what we now call) reference
semantics for all of Bucket 2 and some of Bucket 3
require us to give null a place at the table, even while
continuing to aim at flattening nullable values,
when possible.

A good example of this is Optional, migrated from a
Bucket 1 *value-based class* to a proper Bucket 2
*reference-based primitive*.   (See that tricky change
in POV?)  Another example to keep in mind is the
reference projection of a Bucket 3 type such as
Complex.ref or Point.ref.

The simplest way to support null is just to do what
we do today, and buffer on the heap, with the option
of a null reference instead of a reference to a boxed value.

(We call such things "buffers" rather than "boxes" simply
because, unlike int/Integer, the type of thing that's in
the box might not be denotably different from the type
of the "box" itself.)

The next thing to do is inject a *pivot field* into the flattened
layout of the primitive object.  When this invisible field
contains all zero bits, the flattened object encodes a null.
All the other bits are either ignorable or must be zero,
depending on what you are trying to do.

This idea splits into two directions:  How to work with
"pivoted" non-null values, and how to represent the pivot
efficiently. Both lines of thought are more or less required
exercises, once you allow null its place at the table.

We know where null comes from (the null literal and
aconst_null).   Where do pivoted values come from?
You need an original source of them for the initial
value of "this" in the primitive constructor (a factory
method at the bytecode level).  Specifically, you need
that bit pattern which is almost but not quite all
zero bits; the pivot field is set to the "non-null"
state but all other field values are zero.  Then
the constructor can get to work.

This might be the job of an "initialvalue" bytecode,
which is a repackaging of the "defaultvalue" bytecode.
Given a suitable definition with suitable restrictions
for initialvalue, a constructor uses a mix of initialvalue
and withfield executions to get to its output state for "this".
None of the intermediate states would be confusable
with null.
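
A plain-Java model of the idea, assuming one byte of pivot and two `int` payload fields: the all-zero bit pattern (what `defaultvalue` produces) now means null, the proposed `initialvalue` yields pivot-set-but-otherwise-zero bits, and `withX`/`withY` stand in for `withfield`. None of this is real bytecode; `FlatBits`, its methods, and `PivotDemo` are hypothetical.

```java
/**
 * Models the flattened bits of a nullable Point-like value:
 * a pivot field plus payload. pivot == 0 means the whole value is null.
 */
record FlatBits(byte pivot, int x, int y) {
    /** Like defaultvalue: all zero bits, which now encodes null. */
    static FlatBits defaultValue() { return new FlatBits((byte) 0, 0, 0); }

    /** Like the proposed initialvalue: pivot set to non-null, payload all zero. */
    static FlatBits initialValue() { return new FlatBits((byte) 1, 0, 0); }

    /** Like withfield: a copy with one payload field replaced; the pivot is preserved. */
    FlatBits withX(int newX) { return new FlatBits(pivot, newX, y); }
    FlatBits withY(int newY) { return new FlatBits(pivot, x, newY); }

    boolean isNull() { return pivot == 0; }

    /** A constructor body: start from initialvalue, then apply withfield steps. */
    static FlatBits construct(int x, int y) {
        FlatBits self = initialValue();
        self = self.withX(x);
        self = self.withY(y);
        return self;
    }
}

class PivotDemo {
    public static void main(String[] args) {
        System.out.println(FlatBits.defaultValue().isNull());  // true
        System.out.println(FlatBits.construct(0, 0).isNull()); // false: zero payload, but not null
    }
}
```

Because every constructor starts from `initialValue()`, even a value whose payload is all zeros stays distinguishable from null.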

(We sometimes assumed, wrongly in hindsight, that
doing this simply requires assigning "this" to
null in the constructor and then special-casing
withfield and maybe getfield to allow a null input
and maybe a null output.  But this is a thicket of
tangles and irregularities, and it doesn't quite
get rid of the need for a separate operation to
actually set the pivot field.  Basically, once null
gets entrenched, defaultvalue has to turn into
initialvalue, or so it appears to me at this moment.)

Once the constructor returns a non-null set of
bits, all subsequent assignments continue to
separate null from non-null.  That's true even
for racy assignments, assuming that pivot field
states are individually atomic, even if they race
relative to other fields.

(Race control might be important for Bucket 3
references like Complex.ref, if we ever try to
flatten those.  I'm digressing; my focus is to
build out Bucket 2, which suppresses such races.)

To allow Bucket 2 constructors control over their
outputs, it follows that initialvalue (unlike its
earlier version defaultvalue) must be restricted
to those same contexts where withfield is allowed.
Either to constructors only (for the same class)
or to the capsule (nest) of that class.

OK, so how is the pivot field physically represented?
Again, we have discussed this in years past, but I'll
summarize some of the thinking:

1. It can be just a boolean, a byte or a packed bit
that is made free somehow.  A 65th bit to a 64-bit
payload perhaps.  This is sad, but also hard to get
around when every single bitwise encoding in the
existing layout already has a meaning.

But the payload of the primitive type might use a
field with "slack", aka unused bitwise encodings.
We can pounce on this and use bit-twiddling
to internally reserve the zero state, and declare
that when that field is zero, it is the pivot field
denoting null, and when it is non-zero it is
doing its normal job.

2. If the language tells us, "yes I promise not
to use the default value on this field", then maybe
the JVM can do something with that promise.
There are issues, but it's tempting for (say)
a Rational type where the denominator is
never zero.

3. More reliably, if the JVM knows that
a field has unused encodings, it can just swap
the all-zero state with some other state.
People will immediately think of unused bits
which can be flipped to true in the field
when it is pivoted to non-null.
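As a concrete picture of the slack idea in points 1 and 2, here is a minimal Java sketch. It assumes a hypothetical flattened array of nullable Rational values stored as two parallel long arrays. Because a valid Rational never has a zero denominator, the all-zero state can be reserved to mean null, so no extra boolean is needed. The class and method names are invented for illustration; in the design discussed here the JVM would do this invisibly.

```java
/**
 * Hypothetical sketch: a flattened array of nullable Rationals with no
 * separate null flag. A valid Rational never has denominator 0, so a zero
 * denominator (the array's default state) is reserved to encode null.
 */
final class NullableRationalArray {
    private final long[] num;
    private final long[] den; // 0 means "null"; any valid Rational has den != 0

    NullableRationalArray(int length) {
        num = new long[length]; // fresh arrays are all zero, i.e. every element is null
        den = new long[length];
    }

    void set(int i, long n, long d) {
        if (d == 0) throw new IllegalArgumentException("denominator must be non-zero");
        num[i] = n;
        den[i] = d;
    }

    void setNull(int i) {
        num[i] = 0;
        den[i] = 0;
    }

    boolean isNull(int i) {
        return den[i] == 0;
    }

    long numerator(int i) {
        if (isNull(i)) throw new NullPointerException("element " + i + " is null");
        return num[i];
    }

    long denominator(int i) {
        if (isNull(i)) throw new NullPointerException("element " + i + " is null");
        return den[i];
    }
}
```

The only cost is that "denominator is zero" can no longer be a legal payload, which is exactly the promise point 2 asks the language to make.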

It's better, IMO, to start out with the humble
increment operator (rather than the bit-set
operator) and work from there.  As long as
the encoding of all-one-bits is not taken,
for a given field (true for booleans and
managed pointers!) then the JVM can
simply perform an unsigned non-overflowing
increment when storing payload to the
pivot field (preserving the non-zero
invariant) and do a non-overflowing
unsigned decrement when loading.
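A minimal sketch of the increment hack, assuming a 64-bit payload that never uses the all-one-bits encoding (-1L). A long field stands in for the flattened slot, and zero means the pivot is null. PivotSlot and its methods are hypothetical names, not a JVM API.

```java
/**
 * Hypothetical sketch of the "increment hack": one 64-bit slot carries both
 * the payload and the pivot (null vs. non-null). Assumes the payload never
 * uses all-one-bits (-1L), so payload + 1 never wraps around to zero.
 */
final class PivotSlot {
    private static final long RESERVED_PAYLOAD = -1L; // all-one-bits, assumed never taken

    private long bits; // 0 == pivot is null (also the default for a fresh field)

    boolean isNull() {
        return bits == 0L;
    }

    void store(long payload) {
        if (payload == RESERVED_PAYLOAD)
            throw new IllegalArgumentException("all-one-bits encoding is reserved");
        bits = payload + 1; // non-overflowing unsigned increment: result is never 0
    }

    long load() {
        if (bits == 0L) throw new NullPointerException("pivot field is null");
        return bits - 1; // matching unsigned decrement recovers the payload
    }

    void storeNull() {
        bits = 0L;
    }
}
```

Since Java zero-initializes fields, a fresh PivotSlot already reads as null with no initialization work, which is the point of reserving the zero state.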

I can just hear the GC folks groaning in the
distance about such increments, on managed
pointers.  For them, a slightly less JIT-friendly
operation might be preferable, to perform
the increment (on store) only when the value
is null, and vice versa on load, decrement
only when 1.  Or use bit twiddling in the
low bits of the pointer.  Or use all-one-bits
as the "payload null" which is distinct
from the "pivot is zero" state.  I think the
JIT and GC folks can come to an agreement,
in any given JVM.  When the JIT people
groan back about weirdo encodings of
managed pointers, we can gently tell them,
"it's just another flavor of managed pointer
transcoding, a problem we solved when
we went to compressed oops."
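The GC-friendlier variant, sketched under the same caveats: only the colliding states are swapped. A null payload (0) is stored as 1 and mapped back on load, which assumes, as with aligned managed pointers, that 1 is never a legitimate payload. A long stands in for a raw pointer here, and the names are hypothetical.

```java
/**
 * Hypothetical sketch of the conditional variant: swap only the colliding
 * states. A null pointer (0) is stored as 1, which is assumed never to be a
 * valid (aligned) pointer; every other value is stored unchanged.
 * A slot value of 0 therefore always means "pivot is null".
 */
final class PointerPivotSlot {
    private static final long NULL_PAYLOAD = 0L;
    private static final long ENCODED_NULL_PAYLOAD = 1L; // assumed unused: pointers are aligned

    private long bits; // 0 == pivot is null

    boolean isNull() {
        return bits == 0L;
    }

    void store(long pointer) {
        if (pointer == ENCODED_NULL_PAYLOAD)
            throw new IllegalArgumentException("misaligned pointer");
        // "increment" only when the payload is null; other values pass through
        bits = (pointer == NULL_PAYLOAD) ? ENCODED_NULL_PAYLOAD : pointer;
    }

    long load() {
        if (bits == 0L) throw new NullPointerException("pivot field is null");
        // "decrement" only when the slot holds 1
        return (bits == ENCODED_NULL_PAYLOAD) ? NULL_PAYLOAD : bits;
    }
}
```

Compared with the unconditional increment, ordinary non-null pointers are stored bit-for-bit, which is presumably what keeps the GC side happy; the price is a compare on every load and store.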

(On balance, I think the GC should define
a small family of "quasi-null sentinel values"
which can be easily stored into any managed
pointer for ad hoc purposes like this and others. 
Others would be at least 1. an Optional::isEmpty
state for optionals *which are null-friendly*
and 2. a distinction between null and unbound,
for lazy variables which are also null-friendly.
Neither of these exist today, of course, and
none of these hypothetical sentinels would ever
be visible to normal Java code.)
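The null-versus-unbound distinction for lazy variables can be approximated in today's Java with an ordinary private sentinel object. This is only an analogy: the sentinels proposed above would live inside the JVM and never be visible to Java code. Lazy is a hypothetical class, and thread safety is deliberately ignored.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch: a null-friendly lazy variable. UNBOUND is a private
 * sentinel, distinct from null, so a computed null result is cached too.
 * Not thread-safe; this only illustrates the encoding.
 */
final class Lazy<T> {
    private static final Object UNBOUND = new Object();

    private final Supplier<? extends T> init;
    private Object value = UNBOUND; // UNBOUND: not yet computed; null: computed as null

    Lazy(Supplier<? extends T> init) {
        this.init = init;
    }

    boolean isBound() {
        return value != UNBOUND;
    }

    @SuppressWarnings("unchecked")
    T get() {
        if (value == UNBOUND) {
            value = init.get(); // may legitimately be null
        }
        return (T) value;
    }
}
```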

My point is that we don't have to just slap
a boolean on everything.  In particular,
when migrating ju.Optional to Bucket 2,
we can preserve its very attractive one-field
representation by invisibly assigning a
bad managed pointer value to encode
Optional::isEmpty.  No Java code changes
are needed (or desired) to pull this off,
just the increment hack sketched above,
or one of its variations.

Even Bucket 3 references could be encoded
in this way, if and when we desire to.  That
is, whatever JVM algorithm constructs a
pivot field and its logic could be pointed at
a Bucket 3 reference projection, if we think
this would be desirable.  One result would
be that Map.get, which returns T.ref, could
avoid buffering on the heap.  N.B. This assumes
stuff we don't have yet, to specialize Map::get
to a particular flattenable type.  I hope we
will get there.

-- John

From daniel.smith at oracle.com  Wed Nov  3 18:04:51 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 3 Nov 2021 18:04:51 +0000
Subject: Equality operator for identityless classes
In-Reply-To: <CAGKkBksG=qKJVJfjJjN-QXbzeXp2Ei2i6_pe9zMGKm2cyj8s=Q@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <F59896B2-A3DD-4FB0-BC41-A77C6A1D0788@oracle.com>
 <CAGKkBkvR_E_NfCzwpHKbsChF5UK9ayq+uOL_YuSpHqBH7o11qw@mail.gmail.com>
 <160ba720-56af-8657-5afb-b94c6da45088@oracle.com>
 <CAGKkBksG=qKJVJfjJjN-QXbzeXp2Ei2i6_pe9zMGKm2cyj8s=Q@mail.gmail.com>
Message-ID: <BC57F727-E2CC-46AC-983C-1FB97EA8D90F@oracle.com>

On Nov 3, 2021, at 9:58 AM, Kevin Bourrillion <kevinb at google.com> wrote:

> Today, things are pretty okay because developers can learn that `==` is a code smell. A responsible code reviewer has to think through each one like this:
>
> 1. Look up the type. Is it a builtin, or Class? Okay, we're fine.
> 2. Is it an enum? Okay, I resent having to go look it up when they could have just used switch, but fine.
> 3. Wait, is this weird code that actually cares about objects instead of what they represent? This needs a comment.
>
> The problem is that now we'll be introducing a whole class of ... classes ... for which `==` does something reasonable: only the ones that happen to contain no references, however deeply nested! These cannot at all be easily distinguished. This is giving bugs a really fantastic way to hide.

I'm not sure about this leap: while it's true that `==` is sometimes equivalent to `equals`, in general, you can't be sure without deep knowledge about the class. As a coding convention, seems reasonable to me to continue to expect clients to use `equals` rather than trying to develop a finer-grained distinction between different classes. I think it's perfectly fine advice for most code to continue to treat `==` as a smell, like they always have.


From john.r.rose at oracle.com  Wed Nov  3 18:07:55 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 3 Nov 2021 18:07:55 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
Message-ID: <F4BF7F0C-9FB4-463C-8A01-CF8E73988215@oracle.com>

On Nov 3, 2021, at 10:23 AM, Kevin Bourrillion <kevinb at google.com> wrote:

> I think this fits neatly with the current design: `Point` has no supertypes*, not even `Object`, but `Point.ref` does.
>
> (*I mean "supertype" in the polymorphic sense, not the "has a conversion" sense or the "can inherit" sense. I don't know what the word is really supposed to mean. :-))

Slippery terms.  "Type" is hopelessly broad, as is "super type".

For types as value sets, a super type is a value super set.
Again, int <: long in this view, and even in the JLS.

For types as in an object hierarchy, a super type is a parent+
type, an upper limit in the hierarchy lattice.  That view
centers on object polymorphism and virtual methods,
and is suspiciously bound up with pointer polymorphism.
So String <: Object in this view.

To heal the rift we are groping towards int <: Object, but
we don't fully know which kind of "<:" that is, and how
it breaks down into a value set super, an object hierarchy
super, or perhaps something further.  The best view we
have so far, IMO, is that int <: Object breaks apart into
int <: int.ref (value set) and int.ref <: Object (hierarchy).
In that view, the last link of int <: int.ref requires a
story of how methods "inherit" across value sets,
without the benefit of a pointer-polymorphic hierarchy
to inherit within.  It's doable, but we are running
into the sub-problems of this task.


From kevinb at google.com  Wed Nov  3 18:23:37 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 11:23:37 -0700
Subject: Equality operator for identityless classes
In-Reply-To: <BC57F727-E2CC-46AC-983C-1FB97EA8D90F@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <F59896B2-A3DD-4FB0-BC41-A77C6A1D0788@oracle.com>
 <CAGKkBkvR_E_NfCzwpHKbsChF5UK9ayq+uOL_YuSpHqBH7o11qw@mail.gmail.com>
 <160ba720-56af-8657-5afb-b94c6da45088@oracle.com>
 <CAGKkBksG=qKJVJfjJjN-QXbzeXp2Ei2i6_pe9zMGKm2cyj8s=Q@mail.gmail.com>
 <BC57F727-E2CC-46AC-983C-1FB97EA8D90F@oracle.com>
Message-ID: <CAGKkBkuwBR_d5vBcFwnq=f=8oYLEVzCg0g8XH=CjoLwaU6ze4w@mail.gmail.com>

On Wed, Nov 3, 2021 at 11:05 AM Dan Smith <daniel.smith at oracle.com> wrote:

I'm not sure about this leap: while it's true that `==` is sometimes
> equivalent to `equals`, in general, you can't be sure without deep
> knowledge about the class. As a coding convention, seems reasonable to me
> to continue to expect clients to use `equals` rather than trying to develop
> a finer-grained distinction between different classes. I think it's
> perfectly fine advice for most code to continue to treat `==` as a smell,
> like they always have.
>

That is the "hygienic" line that we've been sorta-holding inside Google
with modest success. And I think it's the direction of gravity among static
analysis tools and really good style guides etc.

But it's a pretty hard line to hold as it stands, because the visceral
appeal of `==` is just too strong, and `!=` much stronger yet. I think it
would get near-impossible once there are a proliferation of user-defined
identityless classes where `==` "happens to be safe". We'd plead the case
that "sure, they're safe now, but this kind of unsafety is viral, so it's
fragile", yadda yadda, but who knows.

(One thing that has maybe helped us hold the line is that the most common
cases are enums, and we get to say "eh, `switch` is better anyway". So I
guess it's worth noting that at least other types supporting pattern
matching will have this same escape valve. That said, switch with only one
arm is a tough pill for people to swallow (I don't recall if `instanceof`
does or means to support *all* such cases or not).)

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From daniel.smith at oracle.com  Wed Nov  3 18:24:19 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 3 Nov 2021 18:24:19 +0000
Subject: Consolidating the user model
In-Reply-To: <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
Message-ID: <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>

On Nov 3, 2021, at 11:23 AM, Kevin Bourrillion <kevinb at google.com> wrote:

> On Wed, Nov 3, 2021 at 9:02 AM John Rose <john.r.rose at oracle.com> wrote:
>
>> One way to thicken this thin argument is to say that Point is not really a class.
>> It's a primitive.  Then it still has a value-set inclusion relation to Object, but it's
>> not a sub-class of Object.  It is a value-set subtype.
>
> I would spin it like this: `Point` absolutely is a class. But its instances are values (like ints and references are, but compound), and values are still not objects.
>
> We've said at times we want to "make everything an object", but I think the unification users really care about is everything being a class instance.
>
> I think this fits neatly with the current design: `Point` has no supertypes*, not even `Object`, but `Point.ref` does.
>
> (*I mean "supertype" in the polymorphic sense, not the "has a conversion" sense or the "can inherit" sense. I don't know what the word is really supposed to mean. :-))

These sorts of explanations make me uncomfortable: the suggestion that a Point stored in a reference isn't really a Point anymore, but a "box" or something like that.

The problem is that you want to say that the Point gets converted to some other thing, yet that other thing:
- is == to the original
- provides the exact same API as the original
- has the exact same behaviors as the original
- works exactly like a class declared with original class's declaration

If you're telling people that when you assign a Point to type Object, they now have something other than a Point, they're going to want to *see* that somehow. And of course they can't, because the box is a fiction.

The reference vs. value story that we developed to address these problems (and problems that arise when you *do* let people "see" a real box) carries the right intuitions: you can handle a Point by value or by reference, but either way it's the exact same object, so of course everything you do with it will work the same.

From brian.goetz at oracle.com  Wed Nov  3 18:34:52 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 3 Nov 2021 14:34:52 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <F4BF7F0C-9FB4-463C-8A01-CF8E73988215@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <F4BF7F0C-9FB4-463C-8A01-CF8E73988215@oracle.com>
Message-ID: <b6ec2579-6144-ecc4-bcc1-2f74f0a408f8@oracle.com>

There's lots of great stuff on subtyping in chapters 15 and 16 of TAPL
(esp. 15.6, "Coercion semantics"), which might be helpful.  But as a
tl;dr, I would suggest treating subtyping strictly as an is-a relation
within our nominal type system.  By this interpretation, int <! long,
and int <! Object; these are both _conversions_.

Subtyping is a very strong condition, because it is transitive.
Conversions allow finer-grained, more ad hoc relationships.  So we view
int to long as a "primitive widening conversion" (JLS 5) and int to
Object today as a boxing conversion.
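In today's Java the distinction can be observed directly. A small sketch (the class name is hypothetical): assigning a String to Object keeps the very same object under a wider static type, while int to long is a primitive widening conversion and int to Object is a boxing conversion that produces an Integer.

```java
/** Today's Java: one subtyping relation and two conversions (JLS chapter 5). */
final class ConversionsDemo {
    private ConversionsDemo() {}

    static String describe() {
        String s = "hello";
        Object viaSubtyping = s;           // String <: Object: same object, wider static type
        boolean sameObject = (viaSubtyping == s);

        int i = 42;
        long widened = i;                  // primitive widening conversion to a 64-bit long
        Object boxed = i;                  // boxing conversion: produces an Integer
        Class<?> boxClass = boxed.getClass();

        return sameObject + " " + widened + " " + boxClass.getSimpleName();
    }
}
```

describe() returns "true 42 Integer": the subtyping step preserves identity, while the boxing step manufactures a new kind of thing with its own class.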

Note too that the JVM and language have different type systems; ideally 
the two form an embedding-projection pair, so that we can do our type 
checking in the rich type system and erase it down to the more limited 
type system of the VM without loss of correctness.


On 11/3/2021 2:07 PM, John Rose wrote:
> On Nov 3, 2021, at 10:23 AM, Kevin Bourrillion <kevinb at google.com> wrote:
>>
>> I think this fits neatly with the current design: `Point` has no 
>> supertypes*, not even `Object`, but `Point.ref` does.
>>
>> (*I mean "supertype" in the polymorphic sense, not the "has a 
>> conversion" sense or the "can inherit" sense. I don't know what the 
>> word is really supposed to mean. :-))
>
> Slippery terms.  "Type" is hopelessly broad, as is "super type".
>
> For types as value sets, a super type is a value super set.
> Again, int <: long in this view, and even in the JLS.
>
> For types as in an object hierarchy, a super type is a parent+
> type, an upper limit in the hierarchy lattice.  That view
> centers on object polymorphism and virtual methods,
> and is suspiciously bound up with pointer polymorphism.
> So String <: Object in this view.
>
> To heal the rift we are groping towards int <: Object, but
> we don't fully know which kind of "<:" that is, and how
> it breaks down into a value set super, an object hierarchy
> super, or perhaps something further.  The best view we
> have so far, IMO, is that int <: Object breaks apart into
> int <: int.ref (value set) and int.ref <: Object (hierarchy).
> In that view, the last link of int <: int.ref requires a
> story of how methods "inherit" across value sets,
> without the benefit of a pointer-polymorphic hierarchy
> to inherit within.  It's doable, but we are running
> into the sub-problems of this task.
>

From kevinb at google.com  Wed Nov  3 19:00:20 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 12:00:20 -0700
Subject: identityless objects and the type hierarchy
Message-ID: <CAGKkBktToL_GBtsw613K+p+B-=1j6T15cvW=DMPzoArH72OZ-Q@mail.gmail.com>

Okay, let's stick a pin in proper-value-types (i.e. try to leave them out
of this discussion) for a moment...

One question is whether the existing design for the bifurcated type
hierarchy will carry right over to this split instead. (My understanding of
that design is: every (non-Object) concrete class will implement exactly
one of two disjoint interfaces, explicitly or not.)

My first thoughts were that the situation is different here: exposed
identity seems to be strictly (maybe?) contractually stronger than no
exposed identity. So here, a class being "noncommittal" *ought to* look the
same as it being identityless. In theory, it should be harmless for an
identity class to extend an identityless class (while the opposite
direction is a problem).

So, first, is that even right?

Next, even if so, the Backward Default Problem strikes again. To make a
class identityless you would seem to need all your *supertypes* to be,
first! That's hard to pull off. And `Object` itself would seem to want to
be marked identityless, which is obviously weird/problematic.

So I think we are forced back to a tripartite model (somewhat like we are
having to do with nullness, but probably closer to what we'll have to do
after nullness for `@OkayToIgnoreReturnValue`).

"intentionally identityful" is-stronger-than "intentionally identityless"
is-stronger-than "unknown so will be *presumed* identityful unless
otherwise specified"

It's possible that would put us straight back to where this email started.

But this all smells rotten, like it demands we find a simpler way to think
about it (which you may already know, and I'm just missing it so far).


--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com  Wed Nov  3 19:24:13 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 3 Nov 2021 19:24:13 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <b6ec2579-6144-ecc4-bcc1-2f74f0a408f8@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <F4BF7F0C-9FB4-463C-8A01-CF8E73988215@oracle.com>
 <b6ec2579-6144-ecc4-bcc1-2f74f0a408f8@oracle.com>
Message-ID: <21F5CF86-2C4B-4ABD-97FC-AC607527EFF6@oracle.com>

On Nov 3, 2021, at 11:34 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> There's lots of great stuff on subtyping in chapters 15 and 16 of TAPL (esp 15.6, "Coercion semantics"), which might be helpful.  But as a tl;dr, I would suggest treating subtyping strictly as an is-a relation within our nominal type system.  By this interpretation, int <! long, and int <! Object; these are both _conversions_.  

Yes, that's good.  So when someone tries to say "int <: long" or "int <: Object", our response would be "sorry, you are talking about a different idea of types".  Something like "int <: Object" is a conversion the object can do, not two ways of viewing the whole object.  For us, types are about is-a, not is-a-member-of-larger-set (a disguised has-a) or can-do-a-conversion (another disguised has-a).

That does lead us to the next hard problem:  Which is that a value is not a box, it has a box.  And yet we want reflection (getClass specifically) not to make a distinction between those two distinct entities, but assign them the same class.  Which is fine, except that Classes have grabbed some of the jobs of types.

Brainstorming here:  We might be happier with a method called getRuntimeType which is allowed to return different values when applied to a box/ref of Point and a value of Point.  And then we notice that Point values don't have inheritance or super types, directly, so the method paradigm (getRuntimeType being a method) is overkill; these are all statically bound methods.

And yet, there is a strong constraint that such a method, in its statically bound form, should return the same value as the corresponding call when applied to a box (under the type Object, maybe).  I don't know how to untie this knot completely.

Brainstorming again:  getRuntimeType applied to a value can (and should?) be constant folded at compile time.  (Same point for getClass in fact.)  When applied to a ref it cannot (usually).  This makes getRuntimeType feel even less like an object method, and more like a __RuntimeTypeOf[ … ] syntax (no bikesheds were painted in the production of this statement).

I'm thinking that those few users who want to extract type mirrors from (non-null) witnesses will need to specify manually which type projection they are expecting, rather than hope that the result they want will pop out.

Not this:
  Class<Point> cp = point.getClass(); //Point
  Class<Point> ci = anint.getClass(); //Integer (aka int.ref)

but this:
  Class<Point> cp = point.getClass().valueType(); //Point
  Class<Point> ci = anint.getClass().valueType(); //int

or else this:
  Class<Point> cp = point.getClass().referenceType(); //Point.ref
  Class<Point> ci = anint.getClass().referenceType(); //Integer

In other words, if the rift between Integer and Point is not completely healed, users can probably work around the problems.
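For the one pair that exists today, int and Integer, the "specify the projection manually" approach can already be approximated in user code. A minimal sketch, assuming helper names valueType and referenceType that merely echo the brainstormed methods (no such methods exist on java.lang.Class). It relies on java.lang.invoke.MethodType's wrap() and unwrap() to convert between primitive and wrapper mirrors.

```java
import java.lang.invoke.MethodType;

/**
 * Hypothetical helpers approximating the brainstormed valueType()/referenceType()
 * for today's primitive/wrapper pairs only. Other reference types are
 * returned unchanged.
 */
final class Projections {
    private Projections() {}

    /** Integer.class -> int.class; int.class and String.class are unchanged. */
    static Class<?> valueType(Class<?> c) {
        return MethodType.methodType(c).unwrap().returnType();
    }

    /** int.class -> Integer.class; Integer.class and String.class are unchanged. */
    static Class<?> referenceType(Class<?> c) {
        return MethodType.methodType(c).wrap().returnType();
    }
}
```

For example, with Object boxed = 42, Projections.valueType(boxed.getClass()) yields int.class, and Projections.referenceType(int.class) yields Integer.class.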


From brian.goetz at oracle.com  Wed Nov  3 19:42:55 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 3 Nov 2021 15:42:55 -0400
Subject: identityless objects and the type hierarchy
In-Reply-To: <CAGKkBktToL_GBtsw613K+p+B-=1j6T15cvW=DMPzoArH72OZ-Q@mail.gmail.com>
References: <CAGKkBktToL_GBtsw613K+p+B-=1j6T15cvW=DMPzoArH72OZ-Q@mail.gmail.com>
Message-ID: <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com>


On 11/3/2021 3:00 PM, Kevin Bourrillion wrote:
> Okay, let's stick a pin in proper-value-types (i.e. try to leave them 
> out of this discussion) for a moment...
>
> One question is whether the existing design for the bifurcated type 
> hierarchy will carry right over to this split instead. (My 
> understanding of that design is: every (non-Object) concrete class 
> will implement exactly one of two disjoint interfaces, explicitly or not.)
>
> My first thoughts were that the situation is different here: exposed 
> identity seems to be strictly (maybe?) contractually stronger than no 
> exposed identity. So here, a class being "noncommittal" /ought to/ 
> look the same as it being identityless. In theory, it should be 
> harmless for an identity class to extend an identityless class (while 
> the opposite direction is a problem).
>
> So, first, is that even right?

We went back and forth on this a few times.  A useful lens is to ask:
how might we depend on reflecting identity-{ful,less}ness in the
hierarchy?  These include:

    if (x instanceof IdentityObject) { ... }

    void m(IdentityObject o) { ... }

    <T extends IdentityObject> void m(T t) { ... }

It is worth noting that the first is invertible (we can negate the
condition) but the latter two are not.  Which is another way to say
that, if anyone, anywhere, might want to write code that *requires* no
identity, then we should consider giving them a way to do it.

(Ideally, if you're planning on (say) synchronizing on a parameter, you 
should engage the type system to ensure that an identityful object is 
passed; this is a good use of the type system.)
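A minimal sketch of engaging the type system that way, assuming a locally declared IdentityObject stand-in, since no such interface ships in released Java. In the proposal every identity class would implement it implicitly. With the stand-in, only classes that opt in (here, a hypothetical Account) satisfy the bound.

```java
/** Stand-in for the proposed IdentityObject marker interface (hypothetical). */
interface IdentityObject {}

/** An ordinary identity class that opts in explicitly, for the sake of the sketch. */
final class Account implements IdentityObject {
    long balance;
}

final class Locks {
    private Locks() {}

    /** Static requirement: only identity objects can be passed as the lock. */
    static <T extends IdentityObject> void withLock(T lock, Runnable action) {
        synchronized (lock) { // a monitor needs identity
            action.run();
        }
    }

    /** Dynamic check: invertible, unlike the bound above. */
    static boolean hasDeclaredIdentity(Object x) {
        return x instanceof IdentityObject;
    }
}
```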

> Next, even if so, the Backward Default Problem strikes again. To make 
> a class identityless you would seem to need all your /supertypes/ to
> be, first! That's hard to pull off. And `Object` itself would seem to 
> want to be marked identityless, which is obviously weird/problematic.

The superclass chain is tricky, but we've spent a lot of time shaking
this box.  Some types are _identity-agnostic_.  These include interfaces
that do not extend PrimitiveObject, abstract classes that meet some set
of conditions, and Object.  The supertypes of a primitive class (and of
an identity-agnostic class) must be identity-agnostic.

This is powerful.  For example, an interface could extend
IdentityObject, which would effectively prohibit primitive classes from 
implementing it.? This is a way to signal "my (concrete) subtypes need 
identity."


From kevinb at google.com  Wed Nov  3 21:40:58 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 14:40:58 -0700
Subject: Consolidating the user model
In-Reply-To: <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
Message-ID: <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>

Certainly possible I'm just misunderstanding something. (I don't *think* I
am...)


On Wed, Nov 3, 2021 at 11:24 AM Dan Smith <daniel.smith at oracle.com> wrote:

> On Nov 3, 2021, at 11:23 AM, Kevin Bourrillion <kevinb at google.com> wrote:
>
> On Wed, Nov 3, 2021 at 9:02 AM John Rose <john.r.rose at oracle.com> wrote:
>
> > One way to thicken this thin argument is to say that Point is not really a class.
> > It's a primitive.  Then it still has a value-set inclusion relation to Object, but it's
> > not a sub-class of Object.  It is a value-set subtype.
>
> I would spin it like this: `Point` absolutely is a class. But its
> instances are *values* (like ints and references are, but compound), and
> values *are still not objects*.
>
> We've said at times we want to "make everything an object", but I think
> the unification users really care about is everything being a *class
> instance*.
>
> I think this fits neatly with the current design: `Point` has no
> supertypes*, not even `Object`, but `Point.ref` does.
>
> (*I mean "supertype" in the polymorphic sense, not the "has a conversion"
> sense or the "can inherit" sense. I don't know what the word is really
> supposed to mean. :-))
>
>
> These sorts of explanations make me uncomfortable: the suggestion that a Point stored in a
> reference isn't really a Point anymore, but a "box" or something like that.
>

Yes exactly. I will be talking about why I think it's probably *good* to
think of it as a box.


> The problem is that you want to say that the Point gets converted to some
> other thing, yet that other thing:
> - is == to the original
>

I would hope that's already true of int==Integer?


> - provides the exact same API as the original
> - has the exact same behaviors as the original
>

Agreed that Point and Point.ref are different types that have the same
members/features.

One-class-multiple-types is not entirely without precedent (though, sure,
List<A> and List<?> and List don't have *exactly* the same API).

Once you accept that they're different types, then the fact they have the
same API is just convenient.


> - works exactly like a class declared with original class's declaration
>

It's the same class. There's only one class.

(There are two java.lang.Classes, because what that type models is not
"class", it's something more like "an erased type or void" <handwave>.)


> If you're telling people that when you assign a Point to type Object, they
> now have something other than a Point, they're going to want to *see* that
> somehow. And of course they can't, because the box is a fiction.
>

What would they want to see? What is there to see about an object? Maybe
its header, its dynamic type -- and uh, those things must be there, right?
Because how could I use it polymorphically otherwise? I'm not sure what
else would be meant by "seeing" the thing.

Fictions are great things when they don't leak. I don't see the leak here
yet.

I'll attempt to flip this around on you. :-) You say that a *value* of type
Point is also already an "object". But then where is its header, its
dynamic type? Objects have that. For whatever reason this seemed like the
more conspicuous leak to me.


> The reference vs. value story that we developed to address these problems
> (and problems that arise when you *do* let people "see" a real box) carries
> the right intuitions: you can handle a Point by value or by reference, but
> either way it's the exact same object, so of course everything you do with
> it will work the same.
>

I'm claiming this picture makes explaining the feature harder,
unnecessarily. An unhoused value floating around somewhere that I can
somehow have a reference to strikes me as quite exotic. Tell me it's just
an object and I feel calmer.

But I'll write a more proper explanation of why I think this is the wrong
retcon for "object".

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From daniel.smith at oracle.com  Wed Nov  3 23:05:52 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 3 Nov 2021 23:05:52 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
Message-ID: <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>

On Nov 3, 2021, at 3:40 PM, Kevin Bourrillion <kevinb at google.com> wrote:

>> The problem is that you want to say that the Point gets converted to some other thing, yet that other thing:
>> - is == to the original
>
> I would hope that's already true of int==Integer?

Formally, you can't literally compare an int with an Integer. All comparisons between a boxed Integer and an int have to decide if they're primitive comparisons, reference comparisons, or illegal, based on some rather complex conversions and disambiguation rules. At runtime, if the types you use result in a reference comparison, the answer depends on quirks of the interning logic.

Informally, whatever path you take, where boxed Integers are involved, == is unreliable, because you may indeed be comparing two different objects that happen to have been derived from the same number.
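The unreliability is easy to reproduce in today's Java. A small sketch (the class name is hypothetical): JLS 5.1.7 guarantees that boxing an int constant in the range -128 to 127 yields the same cached Integer, so == happens to work there. Outside that range identity is unspecified, while a mixed int/Integer comparison unboxes and compares numerically.

```java
/** Today's boxes: == on Integer is reliable only by accident (JLS 5.1.7). */
final class BoxEquality {
    private BoxEquality() {}

    /** Boxing a small value twice: guaranteed to be the same cached object. */
    static boolean smallBoxesIdentical() {
        Integer a = 127, b = 127;
        return a == b; // reference comparison, true by the boxing-cache guarantee
    }

    /** Boxing a larger value twice: unspecified, may be true or false. */
    static boolean largeBoxesIdentical() {
        Integer a = 1000, b = 1000;
        return a == b; // reference comparison; depends on cache configuration
    }

    /** Mixed int/Integer: the Integer is unboxed, so this is a numeric comparison. */
    static boolean mixedComparison() {
        int a = 1000;
        Integer b = 1000;
        return a == b;
    }

    /** equals compares the payload, regardless of identity. */
    static boolean boxesEqual() {
        Integer a = 1000, b = 1000;
        return a.equals(b);
    }
}
```

Only smallBoxesIdentical, mixedComparison, and boxesEqual have answers fixed by the spec; largeBoxesIdentical is exactly the kind of accidental behavior described above.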

Now, if we kept `int` and `Integer` as distinct things, but turned `Integer` into an identity-free class, I suppose it's true that you wouldn't be able to tell whether two boxes were distinct or not, because == would always be true. (More properly, "are these distinct boxes with the same payload?" would be a malformed question to ask, because it presumes identity.)

So, okay: to be fair to these reimagined boxes, I'll stipulate that they are identity-free, and thus indistinguishable with ==.

>> - provides the exact same API as the original
>> - has the exact same behaviors as the original
>
> Agreed that Point and Point.ref are different types that have the same members/features.
>
> One-class-multiple-types is not entirely without precedent (though, sure, List<A> and List<?> and List don't have exactly the same API).
>
> Once you accept that they're different types, then the fact they have the same API is just convenient.
>
>
>> - works exactly like a class declared with original class's declaration
>
> It's the same class. There's only one class.
>
> (There are two java.lang.Classes, because what that type models is not "class", it's something more like "an erased type or void" <handwave>.)

Is your model that, where there are n possible Points, there are in fact 2n instances of class Point, where half of them are "values" and half of them are "boxes"?

I would find that pretty confusing, but I'm not sure it's what you mean. I would want to be able to somehow distinguish which subset an instance belonged to.

Or is it your model that, when you convert a value to a box, the two things are the same class instance, just manifested or encoded differently?

That's actually not that far from the model we've described, which is that it's the same instance, just *viewed* or *accessed* differently. Those are different verbs, and so the models might not be interchangeable, but they're close.

>> If you're telling people that when you assign a Point to type Object, they now have something other than a Point, they're going to want to *see* that somehow. And of course they can't, because the box is a fiction.
>
> What would they want to see? What is there to see about an object? Maybe its header, its dynamic type -- and uh, those things must be there, right? Because how could I use it polymorphically otherwise? I'm not sure what else would be meant by "seeing" the thing.

I think my intuitions about boxes tie heavily to 'getClass' behavior (or some analogous reflective operation). "What are you?" should give me different answers for a bare value and a box. A duck in a box is not the same thing as a duck.

The analogy here would be that Integer.getClass() returns Integer.class, while int.getClass(), if it existed, would return int.class.
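For comparison, here is what today's Java (no Valhalla features) does for the int/Integer half of that analogy. This is a minimal sketch; the class and method names are hypothetical, chosen only for illustration. `int` has no `getClass()` at all, so the only way to ask is to box first, and the box reports `Integer.class`. The primitive mirror `int.class` exists, but it is a distinct `Class` object.

```java
// Hypothetical demo class; runs on any recent JDK (no Valhalla features).
public class BoxingGetClassDemo {
    // iv.getClass() would not compile: primitives have no methods.
    // Assigning to Object autoboxes, and the box reports its own class.
    static Class<?> classOfBoxed(int iv) {
        Object boxed = iv;
        return boxed.getClass();
    }

    public static void main(String[] args) {
        System.out.println(classOfBoxed(42));            // class java.lang.Integer
        System.out.println(int.class);                   // int
        System.out.println(int.class == Integer.TYPE);   // true: same primitive mirror
        System.out.println(int.class == Integer.class);  // false: distinct mirrors
        System.out.println(int.class.isPrimitive());     // true
    }
}
```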

I might want to write code like:

<T extends Point.ref> void m(T arg) {
    if (arg.getClass() == Point.class) System.out.println("I'm a value!");
    else System.out.println("I'm a box!");
}

But this isn't the runtime behavior we would intend to support, because in fact at runtime there are no boxes to reflect.

I'll attempt to flip this around on you. :-) You say that a value of type Point is also already an "object". But then where is its header, its dynamic type? Objects have that. For whatever reason this seemed like the more conspicuous leak to me.

The value type/reference type model is that you can operate on an object directly, or by reference. It's the same object either way. Reference conversion just says "take this object and give me a reference to it". Nothing about the object itself changes.

The details of object encoding are deliberately left out of the model, but it's perfectly fine for you to imagine a header and a dynamic type carried around with the object always, both when accessed as a value and when accessed via a reference.

(It is, I suppose, part of the model that objects of a given class all have a finite, matching layout when accessed by value, even if the details of that layout are kept abstract. Which is why value types are monomorphic and you need reference types for polymorphism.)

The fact that the VM often discards object headers at runtime is a pure optimization.

I'm claiming this picture makes explaining the feature harder, unnecessarily. An unhoused value floating around somewhere that I can somehow have a reference to strikes me as quite exotic. Tell me it's just an object and I feel calmer.

Yes, it's just an object. :-)

But not quite how you mean. The new feature here is working with objects *directly*, without references. I think one thing you're struggling with is that your concept of "object" includes the reference, and if we take that away, it doesn't quite seem like an object anymore.


From kevinb at google.com  Thu Nov  4 00:19:04 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 17:19:04 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
Message-ID: <CAGKkBktgd0iyC3eCs-mj02nCz1twHsdzkVuJVBQycU_vTuo0Qg@mail.gmail.com>

(Note that shipping bucket-2 shouldn't require us to agree on any of this
stuff below.)


On Wed, Nov 3, 2021 at 4:06 PM Dan Smith <daniel.smith at oracle.com> wrote:

> - provides the exact same API as the original
>> - has the exact same behaviors as the original
>>
>
> Agreed that Point and Point.ref are different types that have the same
> members/features.
>
> One-class-multiple-types is not entirely without precedent (though, sure,
> List<A> and List<?> and List don't have *exactly* the same API).
>
> Once you accept that they're different types, then the fact they have the
> same API is just convenient.
>
>
> - works exactly like a class declared with original class's declaration
>>
>
> It's the same class. There's only one class.
>
> (There are two java.lang.Classes, because what that type models is not
> "class", it's something more like "an erased type or void" <handwave>.)
>
>
> Is your model that, where there are n possible Points, there are in fact
> 2n instances of class Point, where half of them are "values" and half of
> them are "boxes"?
>

... Yes? but it's an odd way to put it; I'll explain. The model I'm
speaking for says that values and objects are two different and disjoint
kinds of things. So there are n possible Point values (according to !=) and
there are n corresponding possible Point.ref objects (according to !=). But
I wouldn't have put the numbers together into one number "2n", because I
don't think there's anything a program could actually count that would turn
up that answer. (It's a biiiit like asking "how many continents or cardinal
directions are there?" and I just answer "11".) Sorry, I belabored that
point a bit overmuch.


> I would find that pretty confusing, but I'm not sure it's what you mean. I
> would want to be able to somehow distinguish which subset an instance
> belonged to.
>

I don't see what distinguishing there is to do? You always definitively
either have a value or you have a reference value pointing to some object.
You know that before there's even any question to ask... right?


> Or is it your model that, when you convert a value to a box, the two things
> are the same class instance, just manifested or encoded differently?
>
> That's actually not that far from the model we've described, which is that
> it's the same instance, just *viewed* or *accessed* differently. Those are
> different verbs, and so the models might not be interchangeable, but
> they're close.
>
> If you're telling people that when you assign a Point to type Object, they
>> now have something other than a Point, they're going to want to *see* that
>> somehow. And of course they can't, because the box is a fiction.
>>
>
> What would they want to see? What is there to see about an object? Maybe
> its header, its dynamic type -- and uh, those things must be there, right?
> Because how could I use it polymorphically otherwise? I'm not sure what
> else would be meant by "seeing" the thing.
>
>
> I think my intuitions about boxes tie heavily to 'getClass' behavior (or
> some analogous reflective operation). "What are you?" should give me
> different answers for a bare value and a box. A duck in a box is not the
> same thing as a duck.
>
> The analogy here would be that Integer.getClass() returns Integer.class,
> while int.getClass(), if it existed, would return int.class.
>

So far so good. If `int.getClass()` has to work at all, it might as well
produce `int.class`, though it serves no actual purpose and we would just
refactor it to `int.class` anyway. If `int.getClass()` won't even compile,
it would be no great loss at all. The method exists for finding the dynamic
type of an object; my model says "values are not objects and so have no
dynamic type", which I think is good.


> I might want to write code like:
>
> <T extends Point.ref> void m(T arg) {
>     if (arg.getClass() == Point.class) System.out.println("I'm a value!");
>     else System.out.println("I'm a box!");
> }
>

Someone might think this, but they can just ask themselves whether
`int/Integer` work like that. They don't, so this doesn't either. This is
one example of why users can *keep* almost everything they already know
about `int/Integer`.


> But this isn't the runtime behavior we would intend to support, because in
> fact at runtime there are no boxes to reflect.
>
> I'll attempt to flip this around on you. :-) You say that a *value* of
> type Point is also already an "object". But then where is its header, its
> dynamic type? Objects have that. For whatever reason this seemed like the
> more conspicuous leak to me.
>
> The value type/reference type model is that you can operate on an object
> directly, or by reference. It's the same object either way.
>

I will be writing out my argument for why this is nonsense. :-)  Not meant
to sound rude (I didn't know it to be nonsense myself a month ago).


> Reference conversion just says "take this object and give me a reference
> to it". Nothing about the object itself changes.
>
> The details of object encoding are deliberately left out of the model, but
> it's perfectly fine for you to imagine a header and a dynamic type carried
> around with the object always, both when accessed as a value and when
> accessed via a reference.
>

Huh. It seems to me very important to understand that when I use Point (not
Point.ref) there is no header involved. Values are not self-describing,
which is a big part of their appeal! This no-header fact is also what
explains to me why values have to be not just layout-monomorphic (as you
mention next) but entirely, strictly monomorphic.


(It is, I suppose, part of the model that objects of a given class all have
> a finite, matching layout when accessed by value, even if the details of
> that layout are kept abstract. Which is why value types are monomorphic and
> you need reference types for polymorphism.)
>
> The fact that the VM often discards object headers at runtime is a pure
> optimization.
>
> I'm claiming this picture makes explaining the feature harder,
> unnecessarily. An unhoused value floating around somewhere that I can
> somehow have a reference to strikes me as quite exotic. Tell me it's just
> an object and I feel calmer.
>
>
> Yes, it's just an object. :-)
>
> But not quite how you mean. The new feature here is working with objects
> *directly*, without references. I think one thing you're struggling with is
> that your concept of "object" includes the reference, and if we take that
> away, it doesn't quite seem like an object anymore.
>

Not struggling.

So there is a body of associations our users have with terms like "object",
"class", "primitive", and so on. Many of them are even right (so far). But
they can't all survive intact.

To best serve these users, we have to determine carefully which of those
associations we deem *essential* and which are merely circumstantial or
ancillary. Then we want the essential portion to change as little as
possible with the release of Valhalla. Users
want to feel lots of stable ground underneath them -- and if it takes a bit
of retraining to show them why it's stable ground, that's still a pretty
good outcome; that is still better than having to tell them "just change
what you think you know to this new thing."

You and I are just advocating for different places to make that cut, that's
all.  I think mine represents more "stable ground", but I accept the burden
of argument here, and will keep working on it.

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com  Thu Nov  4 01:34:52 2021
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 4 Nov 2021 01:34:52 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
Message-ID: <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>

On Nov 3, 2021, at 4:05 PM, Dan Smith <daniel.smith at oracle.com> wrote:

(It is, I suppose, part of the model that objects of a given class all have a finite, matching layout when accessed by value, even if the details of that layout are kept abstract. Which is why value types are monomorphic and you need reference types for polymorphism.)

The fact that the VM often discards object headers at runtime is a pure optimization.

Let's see what happens if we say that (a) bare values have headers and (b) Object::getClass allows the user to observe part of the header contents.

It follows then that the expression aPointVal.getClass() will show the contents of aPointVal's header, even if it is a compile-time constant.

Point pv = new Point(42,42);  // "class Point" is the definition of Point
assert pv.getClass() == Point.class;  // ok, that's certainly the class
assert pv.getClass() != Point.ref.class;  // and it's not a ref, so good

That is all fine.  There's a little hiccup when you "box" the point and get the same Class mirror even though the "header" is a very real-heap resident value now:

Point.ref pr = pv;  // same object... now it's on the heap, though, with a real live heap header
assert pr.getClass() == Point.class;  // same class, but...
assert pr.getClass() != Point.ref.class;  // we suppress any distinction the heap header might provide

There's a bigger hiccup when you compare all that with good old int:

int iv = 42;  // "class int" is NOT a thing, but "class Integer" is
assert iv.getClass() != int.class;  // because int is not a class
assert iv.getClass() == Integer.class;  // ah, there's the class!
assert iv.getClass() == int.ref.class;  // this works differently from Point
assert ((Object)iv).getClass() == pr.getClass();  // this should be true also, right?

And to finish out the combinations:

int.ref ir = iv;  // same object... now it's on the heap, though, with a real live heap header
assert ir.getClass() == Integer.class;  // same class
assert ir.getClass() == int.ref.class;  // and this time it?s a ref-class (only for classic primitives)
assert ir.getClass() != int.class;

All this has some odd irregularities when you compare what Point does and what int does.  And yet it's probably the least-bad thing we can do.

A bad response would be to follow the bad precedent of ir.getClass() == Integer.class off the cliff, and have pv.getClass() and pr.getClass() return Point.ref.class.  That way, getClass() only returns a ref.  Get it, see, getClass() can only return reference types.  The rejoinder (which Brian made to me when I aired it) is devastating:  Point.class is the class, not Point.ref.class, and the method is named "get-class".

Another approach would be to fiddle with the definitions of val.getClass(), so as to align iv.getClass() with pv.getClass() with their non-ref types.  But that still leaves pv.getClass() unaligned (in its non-ref-ness) with ir.getClass() (in its ref-ness).  We still expect Point.class as the answer from *both* pr.getClass() and pv.getClass().

Or we could try to make the problem go away by simply outlawing (statically) instances of expr.getClass() that expose inconvenient answers.  Such moves score high on the "Those Idiots" scorecard.  And they still don't align the ref-ness of pr.getClass() vs. ir.getClass().

Maybe we only earn partial Idiot Points if we outlaw iv.getClass() but allow pv.getClass()?  Same amount of seam, different shape of seam, IMO.

Another source of constraint is that we expect that up-casting anything to Object and then re-querying should not change the answer.  (This is another way of saying that the header should stay the same whether it is in the heap or not.)  It is one of the reasons that iv.getClass() should not return int.class.

assert ((Object)pv).getClass() == pv.getClass();  // this should be true also, right?
assert ((Object)pr).getClass() == pr.getClass();  // this should be true also, right?
assert ((Object)iv).getClass() == iv.getClass();  // this should be true also, right?
assert ((Object)ir).getClass() == ir.getClass();  // this should be true also, right?
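In today's Java only part of that table can even be written down, because `iv.getClass()` and `int.ref` do not exist. A minimal sketch of the part that does compile, assuming `String` as a stand-in for an ordinary identity class (the demo names are hypothetical): widening a reference to `Object` leaves `getClass()` unchanged, and for an `int` the cast to `Object` boxes, so the answer is `Integer.class` whether the box is reached through `Object` or through `Integer`.

```java
// Hypothetical demo class; current Java only (no Point, no int.ref).
public class UpcastGetClassDemo {
    // For an ordinary reference, widening the static type to Object
    // does not change what getClass() reports.
    static boolean upcastKeepsClass(String s) {
        Object o = s;
        Class<?> viaObject = o.getClass();
        Class<?> viaString = s.getClass();
        return viaObject == viaString;
    }

    // ((Object) iv).getClass(): the cast boxes iv into an Integer first.
    static Class<?> classViaObject(int iv) {
        return ((Object) iv).getClass();
    }

    // The same kind of box, reached through the static type Integer.
    static Class<?> classViaInteger(int iv) {
        Integer ir = iv;
        return ir.getClass();
    }

    public static void main(String[] args) {
        System.out.println(upcastKeepsClass("foo"));                    // true
        System.out.println(classViaObject(42));                         // class java.lang.Integer
        System.out.println(classViaObject(42) == classViaInteger(42));  // true
    }
}
```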

This is an over-constrained problem.  I don't know how to make it look more regular, and I think (after doing some more exhaustive analysis off-line) there aren't any other ideas we haven't examined.

(I'm saying that partly in a superstitious hope that, having said it, someone will of course prove me wrong.)


I'm claiming this picture makes explaining the feature harder, unnecessarily. An unhoused value floating around somewhere that I can somehow have a reference to strikes me as quite exotic. Tell me it's just an object and I feel calmer.

Yes, it's just an object. :-)

But not quite how you mean. The new feature here is working with objects *directly*, without references. I think one thing you're struggling with is that your concept of "object" includes the reference, and if we take that away, it doesn't quite seem like an object anymore.

The lack of "null" in the value set is a small but persistent hint that something has changed in the object representation.

We can double down on the model that a val-object has a header.  It's not in the heap; it has a statically defined value; it exists (if at all) to assist with Object::getClass and the other methods as needed.  It feeds getClass with the val-projection, not the ref-projection.

We are so sorry, Mr. int.  You don't really pass as a primitive class.  If an int has a header (on stack or on heap), it feeds getClass with the ref-projection Integer.class, not the val-projection int.class, because your class is Integer, a ref-type (one of 8 or 9 such types).  It's a seam.

BTW, here's another look at the difference between Mr. int and Mr. Point:

var pv = new Point(42,42);  // var infers Point (a val type)
assert new Point(42,42).getClass() == Point.class;  // OK

//var pr = new Point.ref(42,42);  // nope, Point.ref is not "class Point"
//assert new Point.ref(42,42).getClass() == Point.ref.class;  // cannot ask this question

var ir = new Integer(42);  // var infers Integer (a ref type)
assert new Integer(42).getClass() == Integer.class;  // OK, but I don't like Integer as much as Point

//var iv = new int(42);  // sorry, Mr. int, you don't get to play there
//assert new int(42).getClass() == int.class;  // cannot ask this question

Did I get all the details right, Dan and Brian?

-- John

From kevinb at google.com  Thu Nov  4 05:29:23 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 22:29:23 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
Message-ID: <CAGKkBktOj-cdQ4z+qzkKmic3O1Qy16bcKjZCuPk7C+J3nHw3qQ@mail.gmail.com>

On Tue, Nov 2, 2021 at 4:53 PM Brian Goetz <brian.goetz at oracle.com> wrote:

>> ## Reflection
>>
>> Earlier designs all included some non-intuitive behavior around
>> reflection.
>> What we'd like to do is align the user-visible types with reflection
>> literals
>> with descriptors, following the invariant that
>>
>>      new X().getClass() == X.class
>>
>
> Seems like part of the goal would be making it fit naturally with the
> current int/Integer relationship (of course, `42.getClass()` is uncommitted
> to any precedent).
>
>
> There's a nasty tension here.  On the one hand, for B3 classes, it makes
> sense for b3.getClass() to yield the val mirror, but int.getClass()
> historically corresponds to the ref mirror (Object o = 3; o.getClass() ==
> Integer.class.)
>

I'm confused at why there's any concern here. `anInt.getClass()` has never
existed, so it can do anything. The code snippet you show is obviously
boxing so of course the class is that of the box.


> You could argue that it doesn't make sense on the values, but surely it
> makes sense on their boxes.  But it's a thin argument, since classes extend
> Object, and we want to treat values as objects (without appealing to
> boxing) for purposes of invoking methods, accessing fields, etc.  So
> getClass() shouldn't be different.
>

Sorry for this: I'm not trying to push values-aren't-objects relentlessly
in multiple threads, but I think what you said is a little off, and worth
pushing back on no matter which model is the one we ship.

"Invoking methods, accessing fields" really seem a lot like things you can
do with *class instances*, and who cares if it's an "object" or not.
Consider two points: we call them "instance methods", not "object methods",
and where do they come from? From a class. This seems to me to be at the
heart of what classes and class instances are about. I don't find a reason
to utter the word "object" while talking about this. When do I, then?
Well....

You say "we want to treat values as objects for purposes of invoking
methods." But I'm not sure you really want that. :-) *Objects* have dynamic
dispatch, so it has to dereference and check the dynamic type and
re-resolve what method to actually call. Values have none of that junk,
just call the method. Right? I think that's significant. What I think you
mean (?) is that invoking methods needs to *work* for all kinds of class
instances. But it works a bit *differently* for values vs. objects (in my
parlance; in your current parlance, that's "it works differently for values
vs. objects-except-values").

(If you're preparing to respond that methods on a static type that's final don't
have dynamic dispatch either, meh, that's more simply understood as "of
course you don't actually query when you already know the answer; that's
just optimization". Whereas with the value type, the idea of this querying
at all isn't even a thing.)

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com  Thu Nov  4 06:54:56 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 23:54:56 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
Message-ID: <CAGKkBktREgT73EkrzhgNoRCTOaHHoZWoo5_2a99iqMTd3=z1DA@mail.gmail.com>

On Wed, Nov 3, 2021 at 6:35 PM John Rose <john.r.rose at oracle.com> wrote:

> Let's see what happens if we say that (a) bare values have headers
> and (b) Object::getClass allows the user to observe part of the header
> contents.
>

I'm asking specific questions below as best I can, but I must confess that
I don't really follow what this thought experiment is trying to demonstrate
overall.


> It follows then that the expression aPointVal.getClass() will show the
> contents of aPointVal's header, even if it is a compile-time constant.
>
> Point pv = new Point(42,42);  // "class Point" is the definition of Point
> assert pv.getClass() == Point.class;  // ok, that's certainly the class
>

(Header or not, that could just be a hardcoded synthetic method anyway?
Have I been missing something big when I keep pointing out that this method
would be patently useless? When static type is MyValueClass, result is
MyValueClass.class, always, no?)


> assert pv.getClass() != Point.ref.class;  // and it's not a ref, so good
>
> That is all fine.  There's a little hiccup when you "box" the point and
> get the same Class mirror even though the "header" is a very real-heap
> resident value now:
>
> Point.ref pr = pv;  // same object... now it's on the heap, though, with a
> real live heap header
> assert pr.getClass() == Point.class;  // same class, but...
>

Why would we even want this? It would be very surprising/puzzling to me.


> assert pr.getClass() != Point.ref.class;  // we suppress any distinction
> the heap header might provide
>
> There's a bigger hiccup when you compare all that with good old int:
>
> int iv = 42;  // "class int" is NOT a thing, but "class Integer" is
> assert iv.getClass() != int.class;  // because int is not a class
>

No matter: array types aren't classes either. (If they're treated as such
internally, hats off to you folks, because that bit of trivia basically
never leaks, except perhaps for the particular misnamed-in-retrospect
method/type/literal trio we're talking about here. And that's great. Either
way, array types aren't classes, and `getClass` means "get runtime type"
(for a reasonable definition thereof), and ergo, I'd expect that assertion
and the next one below to fail.)
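The point about arrays can be checked against today's reflection API. A minimal sketch (the demo class name is hypothetical): calling `getClass()` on an array returns a `Class` mirror for the array type even though no class declaration exists for it. The mirror answers `isArray()`, exposes its component type, and reports `Object` as its superclass.

```java
// Hypothetical demo class; current Java reflection on an int[] object.
public class ArrayGetClassDemo {
    static Class<?> classOfIntArray() {
        int[] a = new int[3];
        return a.getClass(); // runtime type of the array object
    }

    public static void main(String[] args) {
        Class<?> c = classOfIntArray();
        System.out.println(c == int[].class);                   // true
        System.out.println(c.isArray());                        // true
        System.out.println(c.getComponentType() == int.class);  // true
        System.out.println(c.getName());                        // [I
        System.out.println(c.getSuperclass() == Object.class);  // true
    }
}
```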

> assert iv.getClass() == Integer.class;  // ah, there's the class!
>
> assert iv.getClass() == int.ref.class;  // this works differently from Point
> assert ((Object)iv).getClass() == pr.getClass();  // this should be true
> also, right?
>

Not sure what that's meant to return, but surely casting to Object must do
nothing different from casting to Integer or int.ref.


> And to finish out the combinations:
>
> int.ref ir = iv;  // same object... now it's on the heap, though, with a
> real live heap header
> assert ir.getClass() == Integer.class;  // same class
> assert ir.getClass() == int.ref.class;  // and this time it?s a ref-class
> (only for classic primitives)
> assert ir.getClass() != int.class;
>
> All this has some odd irregularities when you compare what Point does and
> what int does.  And yet it's probably the least-bad thing we can do.
>
> A bad response would be to follow the bad precedent of ir.getClass() ==
> Integer.class off the cliff, and have pv.getClass() and pr.getClass()
> return Point.ref.class.  That way, getClass() only returns a ref.  Get it,
> see, getClass() can only return reference types.  The rejoinder (which
> Brian made to me when I aired it) is devastating:  Point.class is the
> class, not Point.ref.class, and the method is named "get-class".
>

If s/named/misnamed/ is it still devastating? :-)

With this my brain has given its last feeble gasp of the night.

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Thu Nov  4 14:56:27 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 4 Nov 2021 10:56:27 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <CAGKkBktREgT73EkrzhgNoRCTOaHHoZWoo5_2a99iqMTd3=z1DA@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
 <CAGKkBktREgT73EkrzhgNoRCTOaHHoZWoo5_2a99iqMTd3=z1DA@mail.gmail.com>
Message-ID: <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>


On 11/4/2021 2:54 AM, Kevin Bourrillion wrote:
>
>
>     Point.ref pr = pv;  // same object... now it's on the heap, though,
>     with a real live heap header
>     assert pr.getClass() == Point.class;  // same class, but...
>
>
> Why would we even want this? It would be very surprising/puzzling to me.

It's surprising because we're so used to "boxes" being a thing.  But 
let's look at this a bit.

    int n = 3;
    Object asObject = n;

People like that this compiles, and are not forced to say 
`Integer.valueOf(3)`, but autoboxing hides something bad about boxing: 
that the box type is something completely different.  If we weren't so 
steeped in the Culture of Boxing, we'd be surprised that a lowly 
assignment like this changes the type and representation so drastically 
(and as it turns out, unnecessarily.)

Let's compare with what happens with String.

    String aString = new String("foo");
    Object asObj = aString;

This code makes use of both String and Object, but in slightly different 
and overlapping ways.  String in this example is both a static and 
dynamic type.  When we make a new String with the String constructor, we 
are instantiating a new instance whose dynamic type (getClass) is 
String.  Then we assign it to a variable whose static type is String.  
When we then assign that variable to a new variable whose static type is 
Object, nothing is being converted into an Object; the thing in asObj is 
still a String; it's just held in a variable whose *static* type is a 
supertype of String.

But, this isn't a perfect example, because Object is also a dynamic 
type; I can create new Objects with an Object constructor.  So let's 
replace with Comparable:

    String aString = new String("foo");
    Comparable c = aString;

Now, String is both a static and dynamic type, but Comparable is *only a 
static type*.  There are no objects that report their type as 
Comparable; there are only objects whose type extends Comparable.
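The String/Comparable comparison runs as-is in today's Java. A minimal sketch (class and method names are hypothetical; `Comparable<String>` replaces the raw `Comparable` above only to avoid an unchecked warning): assigning to a variable of interface type copies the reference, not the object, and the object keeps reporting `String` as its dynamic type. `Comparable` itself is an interface, so no object is ever created with it as its class.

```java
// Hypothetical demo class illustrating static vs. dynamic type.
public class StaticVsDynamicTypeDemo {
    // Widening to an interface type: same object, different static type.
    static boolean sameInstance() {
        String aString = new String("foo");
        Comparable<String> c = aString;
        return c == aString; // reference comparison: no new object was created
    }

    // The dynamic type is still String, not Comparable.
    static Class<?> dynamicType() {
        String aString = new String("foo");
        Comparable<String> c = aString;
        return c.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());                  // true
        System.out.println(dynamicType());                   // class java.lang.String
        System.out.println(Comparable.class.isInterface());  // true
    }
}
```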

Now, let's go back to the integer example.  The assignment here (in the 
current language) takes the primitive value stored in n, and creates a 
whole new, accidental object whose type is different from int.  The only 
saving grace is that you cannot discern the type of int, since you can't 
ask it getClass, but we all know that boxing is a big seam in the 
language, runtime, and reflection.

In the new world, Point.ref is like Comparable; it exists as a static 
type for variables, but there are no objects that are *instances* of 
Point.ref, because it's not a concrete type.

    Point p = new Point(3, 4);
    Point.ref asRef = p;

This is like the String to Comparable example; the new variable refers 
to the same object, but through an alias that has a different static type.

Now, alias is a funny word to use here, but this connects back to the 
whole point of Valhalla -- the reason we disavow identity is that we are 
no longer constrained to track the fact that two references were 
initially assigned from the same object instance. In the absence of 
identity, we're free to copy the state rather than the reference, which 
admits powerful optimizations.  But in the Point example above, it 
really makes sense to think of the Point.ref as a reference to the *same 
instance* that is held by p.


From brian.goetz at oracle.com  Thu Nov  4 15:54:08 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 4 Nov 2021 11:54:08 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
 <CAGKkBktREgT73EkrzhgNoRCTOaHHoZWoo5_2a99iqMTd3=z1DA@mail.gmail.com>
 <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
Message-ID: <cc7c7452-9c59-9d2f-377e-a1044f2c4012@oracle.com>

To close the loop, in the initial "Eclair" discussion (which grew out of 
a conversation at the last JVMLS), a primitive was a pair of classes, 
where the companion class was actually an interface.  We haven't 
revisited "what is Point.ref" since then, but one possible way to do 
this is to say exactly this: that Point is a primitive class, and 
Point.ref is an interface it implements.  That makes it clear (a) 
why it is a reference type, (b) that it is no different from other 
superinterfaces, and (c) that no object is actually of type Point.ref.
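The interface reading of Point.ref can be approximated in today's Java, as an analogy only. Assumptions: `PointRef` is a hypothetical stand-in for `Point.ref`, and a `record` stands in for the primitive class `Point`. Unlike a Valhalla primitive class, the record still has identity and is always accessed by reference, so the sketch models only the static-type relationship: the interface is a supertype, widening to it yields the same instance, and `getClass()` never reports the interface.

```java
// Hypothetical analogy; requires Java 16+ for records.
public class EclairAnalogyDemo {
    // Stand-in for Point.ref: an interface, so it can be the static type
    // of a variable but never the dynamic type of an object.
    interface PointRef {
        int x();
        int y();
    }

    // Stand-in for the primitive class Point. Records are final with
    // final fields, but (unlike primitive classes) they have identity.
    record Point(int x, int y) implements PointRef {}

    static PointRef widen(Point p) {
        return p; // no conversion: same instance, wider static type
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        PointRef asRef = widen(p);
        System.out.println(asRef == p);                       // true
        System.out.println(asRef.getClass() == Point.class);  // true
        System.out.println(PointRef.class.isInterface());     // true
    }
}
```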

Q1: does this help?

Q2: Does this provide us a path to rehabilitating the user intuition 
around boxing, by saying "good news everyone, we still have boxes, but 
now they're interfaces, not concrete objects."  Does that balance the 
desire to lean on existing intuition, while breaking enough about the 
implementation assumptions to not carry all the existing baggage?


>
> In the new world, Point.ref is like Comparable; it exists as a static 
> type for variables, but there are no objects that are *instances* of 
> Point.ref, because it's not a concrete type.
>
>     Point p = new Point(3, 4);
>     Point.ref asRef = p;
>
> This is like the String to Comparable example; the new variable refers 
> to the same object, but through an alias that has a different static 
> type.
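The String-to-Comparable analogy can be checked in today's Java. Point and Point.ref need a Valhalla prototype, so this minimal sketch uses only String and Comparable (the class name `AliasDemo` is illustrative): assigning a String to a Comparable variable copies the reference, not the object, and the dynamic class seen through the interface-typed alias is still String.

```java
class AliasDemo {
    // Returns true when the interface-typed alias refers to the identical object.
    static boolean sameObject() {
        String s = "hello";
        Comparable<String> asComparable = s;  // widening reference conversion: no copy, no box
        return asComparable == s;             // reference comparison: same object
    }

    // The dynamic class is unchanged by viewing the object through its interface.
    static Class<?> dynamicClass() {
        Comparable<String> asComparable = "hello";
        return asComparable.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameObject());    // true
        System.out.println(dynamicClass());  // class java.lang.String
    }
}
```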


From kevinb at google.com  Thu Nov  4 16:08:45 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 4 Nov 2021 09:08:45 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
 <CAGKkBktREgT73EkrzhgNoRCTOaHHoZWoo5_2a99iqMTd3=z1DA@mail.gmail.com>
 <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
Message-ID: <CAGKkBkvok-MeiQqRtehro_ZAvP9H8C+h438owLYrZ7W1WvNHyg@mail.gmail.com>

On Thu, Nov 4, 2021 at 7:56 AM Brian Goetz <brian.goetz at oracle.com> wrote:

On 11/4/2021 2:54 AM, Kevin Bourrillion wrote:
>
> Point.ref pr = pv;  // same object? now it's on the heap, though, with a
>> real live heap header
>> assert pr.getClass() == Point.class;  // same class, but...
>>
>
> Why would we even want this? It would be very surprising/puzzling to me.
>
> It's surprising because we're so used to "boxes" being a thing.  But let's
> look at this a bit.
>

Okay, it's clear I have more work to do in understanding your whole
coherent model as it exists.

Summary of what kevinb has been on about the last 24 hours:

The model I've been speaking for over the past day has flowed from
following my own "I want to think it's as simple as...." intuitions. I
expected to sort of "hit a wall" with those naive assumptions and never
felt like I did (yet).

Your model is likely enough the best, and I'm simply "resisting" it, but in
that case I'm channeling some of the resistance other users will feel, and
we can hash out how to head it off. But also, occasionally I turn out to be
right about things so I'll prepare for that misfortune as well.

I think it's worth my understanding both models until I can explain them
well, and *then* we can make more progress. Let's just name the models.
Fair enough? (Feel free to inject "no need to name your model because I can
give the killer argument right now why it just can't work", I mean we
wouldn't name a woodland animal we found moments from death on the side of
the road, would we.)

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Thu Nov  4 16:18:21 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 4 Nov 2021 12:18:21 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <CAGKkBkvok-MeiQqRtehro_ZAvP9H8C+h438owLYrZ7W1WvNHyg@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
 <CAGKkBktREgT73EkrzhgNoRCTOaHHoZWoo5_2a99iqMTd3=z1DA@mail.gmail.com>
 <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
 <CAGKkBkvok-MeiQqRtehro_ZAvP9H8C+h438owLYrZ7W1WvNHyg@mail.gmail.com>
Message-ID: <1cce1692-84b3-608f-703e-3cdd01b8fafd@oracle.com>

I would summarize what you've been on about as "Hey, developers are used 
to primitives and boxes, is there mileage in working within that 
framework, rather than tossing it out the window because boxes seem 
dirty?" And I think there is something to that.

One way to frame this is that there's a model that makes sense from a 
specification perspective, and there's a mental model that makes sense 
to Java users, and these need not be the same. A prime (wildly 
unrelated) example of this is the "double brace idiom"; language weenies 
cringe because it's not actually a thing, it's just an interaction 
between an unfortunate syntax choice for instance initializers and 
otherwise empty anonymous classes, but what does it matter if users 
think it's a thing?
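For readers who haven't met it, the "double brace idiom" looks like one construct but is two: an anonymous subclass (outer braces) whose instance initializer (inner braces) runs at construction. A small sketch, with the illustrative class name `DoubleBraceDemo`, makes the hidden anonymous subclass visible through getClass():

```java
import java.util.ArrayList;
import java.util.List;

class DoubleBraceDemo {
    static List<String> build() {
        // Outer braces: body of an anonymous subclass of ArrayList<String>.
        // Inner braces: an instance initializer that runs when the object is constructed.
        return new ArrayList<String>() {{
            add("a");
            add("b");
        }};
    }

    public static void main(String[] args) {
        List<String> list = build();
        System.out.println(list);                                // [a, b]
        System.out.println(list.getClass() == ArrayList.class);  // false: it is a subclass
        System.out.println(list.getClass().isAnonymousClass());  // true
    }
}
```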

The spec already has opinions about what terms like "value", "object", 
"reference", etc. mean. Internally, we have to respect these (or pay the 
cost to refactor the terminology), but this only weakly constrains the 
user model. The language spec makes clear distinctions between classes 
and types, but most developers can slush the differences away and spend 
that mental bookkeeping budget elsewhere. The main risk of trying to 
present an alternate model is that it will invariably use terms (e.g., 
"object") that appear to have their meaning nailed down; perhaps we need 
a notational convention to distinguish between "what the spec currently 
calls object" and "what users understand objects to be", at least for 
sake of discussion?


On 11/4/2021 12:08 PM, Kevin Bourrillion wrote:
> On Thu, Nov 4, 2021 at 7:56 AM Brian Goetz <brian.goetz at oracle.com> wrote:
>
>     On 11/4/2021 2:54 AM, Kevin Bourrillion wrote:
>>
>>         Point.ref pr = pv;  // same object? now it's on the heap,
>>         though, with a real live heap header
>>         assert pr.getClass() == Point.class;  // same class, but...
>>
>>
>>     Why would we even want this? It would be very surprising/puzzling
>>     to me.
>     It's surprising because we're so used to "boxes" being a thing.
>     But let's look at this a bit.
>
>
> Okay, it's clear I have more work to do in understanding your whole 
> coherent model as it exists.
>
> Summary of what kevinb has been on about the last 24 hours:
>
> The model I've been speaking for over the past day has flowed from 
> following my own "I want to think it's as simple as...." intuitions. I 
> expected to sort of "hit a wall" with those naive assumptions and 
> never felt like I did (yet).
>
> Your model is likely enough the best, and I'm simply "resisting" it, 
> but in that case I'm channeling some of the resistance other users 
> will feel, and we can hash out how to head it off. But also, 
> occasionally I turn out to be right about things so I'll prepare for 
> that misfortune as well.
>
> I think it's worth my understanding both models until I can explain 
> them well, and /then/ we can make more progress. Let's just name the 
> models. Fair enough? (Feel free to inject "no need to name your model 
> because I can give the killer argument right now why it just can't 
> work", I mean we wouldn't name a woodland animal we found moments from 
> death on the side of the road, would we.)
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From daniel.smith at oracle.com  Thu Nov  4 16:28:13 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 4 Nov 2021 16:28:13 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <CAGKkBktgd0iyC3eCs-mj02nCz1twHsdzkVuJVBQycU_vTuo0Qg@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <CAGKkBktgd0iyC3eCs-mj02nCz1twHsdzkVuJVBQycU_vTuo0Qg@mail.gmail.com>
Message-ID: <71FADE80-08C7-45E5-AE55-0C644848F8E1@oracle.com>

On Nov 3, 2021, at 6:19 PM, Kevin Bourrillion <kevinb at google.com> wrote:

I think my intuitions about boxes tie heavily to 'getClass' behavior (or some analogous reflective operation). "What are you?" should give me different answers for a bare value and a box. A duck in a box is not the same thing as a duck.

The analogy here would be that Integer.getClass() returns Integer.class, while int.getClass(), if it existed, would return int.class.

So far so good. If `int.getClass()` has to work at all, it might as well produce `int.class`, though it serves no actual purpose and we would just refactor it to `int.class` anyway. If `int.getClass()` won't even compile, it would be no great loss at all. The method exists for finding the dynamic type of an object; my model says "values are not objects and so have no dynamic type", which I think is good.
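For comparison, this is how int and Integer behave in current Java (the names `PrimitiveClassDemo` and `classOfBoxed` are made up for this sketch). There is no way to call getClass() on an int; converting to Object boxes the value and the box reports Integer.class, while int.class exists only as a separate Class mirror whose isPrimitive() is true:

```java
class PrimitiveClassDemo {
    // Boxes an int the way assignment to Object does, then asks the box for its class.
    static Class<?> classOfBoxed(int value) {
        Object boxed = value;  // boxing conversion: int -> Integer
        return boxed.getClass();
    }

    public static void main(String[] args) {
        int iv = 42;
        // iv.getClass() would not compile: a primitive value has no members to invoke.
        System.out.println(classOfBoxed(iv));               // class java.lang.Integer
        System.out.println(int.class);                      // int
        System.out.println(int.class.isPrimitive());        // true
        System.out.println(Integer.TYPE == int.class);      // true
        System.out.println(classOfBoxed(iv) == int.class);  // false
    }
}
```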

But Point extends Object, and Object.getClass exists.

One thing the user model has to explain is how method inheritance works. You've been pointing out that inheritance != subtyping, which is true. But still, when I invoke a super method (a default method in a superinterface, say), it must be true that that method declaration knows how to execute on a value.

The ref/val model explains this by saying that method invocation will add/remove references to align with the expectations of the (dynamically-selected) method implementation. The object remains the same, so 'this' is the object that the caller started with.

I guess the value/object model would pretty much say the same thing, except it would say the value the caller started with might be boxed (or the object unboxed) to match the method's expectations. It's the same *value*, presented as an object.

Either way, if I can invoke 'getClass', its behavior is specified by the *class* not the value/object, so I would expect to get the same answer whether invoked via a value or a reference/box.

(Another thing you could say is that the super method is like a template, stamped out in specialized form for each primitive subclass as part of inheritance. We experimented with this way of thinking for a while before deciding, no, it really needs to be the case that invoking an inherited method means executing the method body in its original context.)

Now, all that said, we could say by fiat that `getClass` is special and value types aren't allowed to invoke it. YAGNI. Except...

I might want to write code like:

<T extends Point.ref> void m(T arg) {
    if (arg.getClass() == Point.class) System.out.println("I'm a value!");
    else System.out.println("I'm a box!");
}

Someone might think this, but they can just ask themselves whether `int/Integer` work like that. They don't, so this doesn't either.
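The current behavior behind "They don't" can be sketched with erased generics (the method name `whatAmI` is hypothetical). An int passed where a type variable is expected is boxed before the method body runs, so the body can only ever observe Integer.class, never a bare value:

```java
class GenericBoxingDemo {
    // Same shape as the m(T) example above, but with today's erased generics and int/Integer.
    static <T> String whatAmI(T arg) {
        // T ranges only over reference types, so an int argument has already been boxed here.
        return (arg.getClass() == Integer.class) ? "I'm a box!" : "I'm something else";
    }

    public static void main(String[] args) {
        int iv = 42;
        System.out.println(whatAmI(iv));  // I'm a box!  (T is inferred as Integer)
    }
}
```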

int/Integer are a starting point, but our goal is to offer something more.

In particular, we want universal generics: when I invoke m and pass it a Point, it must be the case that T=Point, not T=Point.ref. This is different than the status quo for int/Integer, where T=Integer.

The right way to interpret generic code is, roughly, to substitute [T:=Point] and figure out what the code would do. This is imprecise, because there are compile-time decisions that aren't allowed to change under different substitutions. (For example, we don't re-do overload resolution for different Ts, even if it would get different answers.) But, for our purposes, it should be the case that you can imagine 'arg' being a value, not a reference, and this code having intuitive behavior.
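The point that overload resolution is not redone per instantiation is already observable in current Java. In this sketch (the `pick` overloads are hypothetical), the call inside the generic method is resolved once against T, whose only bound is Object, so it selects pick(Object) even when T is inferred as Integer:

```java
class OverloadDemo {
    static String pick(Object o)  { return "pick(Object)"; }
    static String pick(Integer i) { return "pick(Integer)"; }

    // Resolved once at compile time against T (bound: Object); only pick(Object) applies.
    static <T> String viaGeneric(T t) {
        return pick(t);
    }

    public static void main(String[] args) {
        System.out.println(pick(42));        // pick(Integer): boxing, then the most specific overload
        System.out.println(viaGeneric(42));  // pick(Object): the choice was fixed inside viaGeneric
    }
}
```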

So the ref/val model says that 'arg' is an object (handled by value, not by reference) and its 'getClass' method returns the class of the object.

The value/object model says that 'arg' is a value and its 'getClass' method exists. And I guess it returns Point.class.

(If we really thought `getClass` was poison, I guess at this point we could say by fiat that type variable types aren't allowed to access `getClass`. But... `getClass` really is a useful thing to invoke in this context.)

An implication of universal generics is that there needs to be some common protocol that works on both vals and refs. In the val/ref model, that protocol is objects: both vals and refs are objects with members that can be accessed via '.'. In the value/object model, I'm not quite sure how you'd explain it. Maybe there's a third concept here, generalizing how values and objects behave.


From daniel.smith at oracle.com  Thu Nov  4 16:36:07 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 4 Nov 2021 16:36:07 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
Message-ID: <FACC7C70-364E-4CCD-970C-D57A1A25D44E@oracle.com>

On Nov 3, 2021, at 7:34 PM, John Rose <john.r.rose at oracle.com> wrote:

There's a bigger hiccup when you compare all that with good old int:

int iv = 42;  // "class int" is NOT a thing, but "class Integer" is
assert iv.getClass() != int.class;  // because int is not a class
assert iv.getClass() == Integer.class;  // ah, there's the class!
assert iv.getClass() == int.ref.class;  // this works differently from Point
assert ((Object)iv).getClass() == pr.getClass();  // this should be true also, right?

And to finish out the combinations:

int.ref ir = iv;  // same object? now it's on the heap, though, with a real live heap header
assert ir.getClass() == Integer.class;  // same class
assert ir.getClass() == int.ref.class;  // and this time it's a ref-class (only for classic primitives)
assert ir.getClass() != int.class;

All this has some odd irregularities when you compare what Point does and what int does.  And yet it's probably the least-bad thing we can do.

A bad response would be to follow the bad precedent of ir.getClass() == Integer.class off the cliff, and have pv.getClass() and pr.getClass() return Point.ref.class.  That way, getClass() only returns a ref.  Get it, see, getClass() can only return reference types.  The rejoinder (which Brian made to me when I aired it) is devastating:  Point.class is the class, not Point.ref.class, and the method is named "get-class".

I guess to rephrase this, I'll just say: yes, there are problems with int/Integer. But we shouldn't let that tail wag the dog when sorting out the language model. int/Integer is going to be a special case, no matter how we stack it. (On the other hand, we really like to look for analogies from int/Integer when sorting out the language model, and sometimes those are fruitful. But handle with care.)


From daniel.smith at oracle.com  Thu Nov  4 16:41:10 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 4 Nov 2021 16:41:10 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <CAGKkBkvok-MeiQqRtehro_ZAvP9H8C+h438owLYrZ7W1WvNHyg@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
 <CAGKkBktREgT73EkrzhgNoRCTOaHHoZWoo5_2a99iqMTd3=z1DA@mail.gmail.com>
 <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
 <CAGKkBkvok-MeiQqRtehro_ZAvP9H8C+h438owLYrZ7W1WvNHyg@mail.gmail.com>
Message-ID: <00AD191C-7257-486F-85D9-531C69489B1F@oracle.com>

On Nov 4, 2021, at 10:08 AM, Kevin Bourrillion <kevinb at google.com> wrote:

Your model is likely enough the best, and I'm simply "resisting" it, but in that case I'm channeling some of the resistance other users will feel, and we can hash out how to head it off. But also, occasionally I turn out to be right about things so I'll prepare for that misfortune as well.

Keep it up. It's a very useful exercise, and I haven't ruled out that you're onto something valuable here.

From kevinb at google.com  Thu Nov  4 16:51:14 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 4 Nov 2021 09:51:14 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <71FADE80-08C7-45E5-AE55-0C644848F8E1@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <CAGKkBktgd0iyC3eCs-mj02nCz1twHsdzkVuJVBQycU_vTuo0Qg@mail.gmail.com>
 <71FADE80-08C7-45E5-AE55-0C644848F8E1@oracle.com>
Message-ID: <CAGKkBkuQfkwAfXbpVVRJua6EMD4nxxf_=sqn=9owt_j+9QTArA@mail.gmail.com>

On Thu, Nov 4, 2021 at 9:28 AM Dan Smith <daniel.smith at oracle.com> wrote:

> On Nov 3, 2021, at 6:19 PM, Kevin Bourrillion <kevinb at google.com> wrote:
>
> I think my intuitions about boxes tie heavily to 'getClass' behavior (or
>> some analogous reflective operation). "What are you?" should give me
>> different answers for a bare value and a box. A duck in a box is not the
>> same thing as a duck.
>>
>> The analogy here would be that Integer.getClass() returns Integer.class,
>> while int.getClass(), if it existed, would return int.class.
>>
>
> So far so good. If `int.getClass()` has to work at all, it might as well
> produce `int.class`, though it serves no actual purpose and we would just
> refactor it to `int.class` anyway. If `int.getClass()` won't even compile,
> it would be no great loss at all. The method exists for finding the dynamic
> type of an object; my model says "values are not objects and so have no
> dynamic type", which I think is good.
>
>
> But Point extends Object, and Object.getClass exists.
>

As does `wait()`. :-)  But absolutely, this case is different; I'm trying
to be clear that it seems *pointless but harmless* for
`someValue.getClass()` to be callable, so long as it returns whatever is
the most sensible thing according to the model adopted. Keeps static
refactoring tools in business!


Now, all that said, we could say by fiat that `getClass` is special and
> value types aren't allowed to invoke it. YAGNI. Except...
>

> I might want to write code like:
>>
>> <T extends Point.ref> void m(T arg) {
>>     if (arg.getClass() == Point.class) System.out.println("I'm a value!");
>>     else System.out.println("I'm a box!");
>> }
>>
>
> Someone might think this, but they can just ask themselves whether
> `int/Integer` work like that. They don't, so this doesn't either.
>
> int/Integer are a starting point, but our goal is to offer something more.
>

Just to be clear, my intention was only to say that under the
sure-they're-boxes model it would both "not work" and "not be expected to
work", which is at least harmonious. :-)


An implication of universal generics is that there needs to be some common
> protocol that works on both vals and refs. In the val/ref model, that
> protocol is objects: both vals and refs are objects with members that can
> be accessed via '.'. In the value/object model, I'm not quite sure how
> you'd explain it. Maybe there's a third concept here, generalizing how
> values and objects behave.
>

This is on point. I quite honestly forgot that "oh yeah, I don't fully
understand universal generics yet", and I'll go work on that. It might be
death to the model I'm clinging to, but in that case I'll become pretty
good at explaining to people why that model fails, so cool.


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com  Thu Nov  4 16:56:15 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 4 Nov 2021 09:56:15 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <FACC7C70-364E-4CCD-970C-D57A1A25D44E@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
 <FACC7C70-364E-4CCD-970C-D57A1A25D44E@oracle.com>
Message-ID: <CAGKkBksA4ReGaBn9mfyFdh28vRx3pUZGZjfTK3zofjyQ1DRtjQ@mail.gmail.com>

On Thu, Nov 4, 2021 at 9:36 AM Dan Smith <daniel.smith at oracle.com> wrote:

I guess to rephrase this, I'll just say: yes, there are problems with
> int/Integer. But we shouldn't let that tail wag the dog when sorting out
> the language model. int/Integer is going to be a special case, no matter
> how we stack it. (On the other hand, we really like to look for analogies
> from int/Integer when sorting out the language model, and sometimes those
> are fruitful. But handle with care.)
>

Perfectly said, I think.

When someone says "so it's like int/Integer?" my hope of being able to
answer "yeah actually, that will serve well enough" is closely followed by
"we'd like to say yeah, but we *need* to ask you to think of it differently
now, but you'll understand why." You probably have that and I'm just
delayed in absorbing it.

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Thu Nov  4 18:35:33 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 4 Nov 2021 14:35:33 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <CAGKkBkuQfkwAfXbpVVRJua6EMD4nxxf_=sqn=9owt_j+9QTArA@mail.gmail.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <CAGKkBkvEOH88eo3eWZxAMyU2eKM+9HSVpasqTMCYXEvb2FiXzQ@mail.gmail.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <CAGKkBktgd0iyC3eCs-mj02nCz1twHsdzkVuJVBQycU_vTuo0Qg@mail.gmail.com>
 <71FADE80-08C7-45E5-AE55-0C644848F8E1@oracle.com>
 <CAGKkBkuQfkwAfXbpVVRJua6EMD4nxxf_=sqn=9owt_j+9QTArA@mail.gmail.com>
Message-ID: <5c017540-1863-4418-8dcd-10926e7239f8@oracle.com>


>
>     An implication of universal generics is that there needs to be
>     some common protocol that works on both vals and refs. In the
>     val/ref model, that protocol is objects: both vals and refs are
>     objects with members that can be accessed via '.'. In the
>     value/object model, I'm not quite sure how you'd explain it. Maybe
>     there's a third concept here, generalizing how values and objects
>     behave.
>
>
> This is on point. I quite honestly forgot that "oh yeah, I don't fully 
> understand universal generics yet", and I'll go work on that. It might 
> be death to the model I'm clinging to, but in that case I'll become 
> pretty good at explaining to people why that model fails, so cool.
>

Generics are often a clarifying lens through which to look at this 
problem. We've caught ourselves multiple times trying to locally 
optimize, only to find that it is an impediment to "generify over all the 
things." One of the arguments in favor of "everything is an object" (or 
a class, or whatever), aside from its natural uniformity, is that then 
generics have a more regular surface to quantify over; generifying over 
all types is easier when the types have more in common.

For example, one of the reasons to allow the locution "String.ref" as an 
alias for String, while useless, is that it strengthens the notion that 
".ref" is a total operator, so "T.ref" makes sense simply by appealing 
to substitution, rather than having to give it a more elaborate definition.

When considering universal (erased) generics, we had to totalize the 
semantics of all operations, even when some operations are not allowed 
under a strict-substitution interpretation. A quick tour (assume `t` is 
of type `T`, an unbounded type variable, which is instantiated to `Point`.)

 - Assignment to Object or interface (`Object o = t`). In the 
language, this is considered a primitive widening (née boxing) 
conversion, but in the VM, this is mere subtyping (QFoo is-a LFoo). This 
means that we can use the same `astore` or `putfield` operations to 
simply move the value without conversion.

 - Assignment to null (`T t = null`). Not all types under T are 
nullable, but T is still erased to Object. In this case, we assign a 
null and issue an unchecked warning; if that value bubbles out to 
non-generic code, the cast to `Point` will catch the null, and treat 
this as a form of heap pollution.

 - Array covariance (`Object[] os = ts`). The JVM has been upgraded to 
support array covariance for primitives, where `Point[] <: Point.ref[]` 
(and transitivity gets us to `Object[]`).

 - Synchronization (`synchronized(t)`). Warnings at compile time, 
IllegalMonitorStateException (IMSE) at runtime.

 - Equality (`o == t`). ACMP has been upgraded to understand 
primitives, so we can translate as always.
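The array-covariance item extends a rule that reference arrays already follow. The sketch below shows only the existing behavior in current Java (String[] is a subtype of Object[], and the VM checks each store at runtime); the upgraded `Point[] <: Point.ref[]` relation requires a Valhalla build and is not shown. `ArrayCovarianceDemo` is an illustrative name.

```java
class ArrayCovarianceDemo {
    // Stores an Integer through an Object[] view of a String[].
    static String storeWrongType() {
        String[] strings = new String[1];
        Object[] objects = strings;          // statically allowed: arrays are covariant
        try {
            objects[0] = Integer.valueOf(1); // runtime store check: the array is really a String[]
            return "stored";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }

    public static void main(String[] args) {
        System.out.println(storeWrongType()); // ArrayStoreException
    }
}
```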

I'm sure I missed a few, but what you see here is a bag of tricks for 
creating totality.? In some cases (equality, array covariance) we 
engineered actual totality into the bytecodes; in some cases 
(synchronization) we rely on compile time warnings and runtime errors; 
in others, we rely on erasure and lean on existing detection of heap 
pollution.
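The "existing detection of heap pollution" is the checkcast that javac inserts where an erased value meets a concrete static type. In current Java it can be demonstrated with a wrong-typed element rather than a null, because today's checkcast accepts null, whereas the null-assignment item above expects the cast to `Point` to reject it. The helper names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HeapPollutionDemo {
    // Pollutes a List<String> through a raw view; the compiler can only warn, not check.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static void pollute(List<String> strings) {
        List raw = strings; // raw type: the element type is no longer tracked
        raw.add(42);        // unchecked call: an Integer now sits in a List<String>
    }

    static String readBack() {
        List<String> strings = new ArrayList<>();
        pollute(strings);                 // no error here: after erasure the store is unchecked
        try {
            String s = strings.get(0);    // javac inserts a checkcast to String at this use site
            return "read " + s;
        } catch (ClassCastException e) {
            return "ClassCastException";  // the pollution is caught where it escapes
        }
    }

    public static void main(String[] args) {
        System.out.println(readBack());   // ClassCastException
    }
}
```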

When moving forward to specialized generics, the constraints get 
stiffer. We want a model where the _bytecode_ is invariant across 
specializations, all specialization operates on the constant pool, and 
specialization is strictly optional at runtime (meaning erasure is still 
a valid runtime strategy). This might mean that some total-seeming 
operations (e.g., T.default) are either outlawed or require complex 
translation through a reflective runtime.

All of this is to say, there may be some hidden indirect constraints 
that derive from the desire for a uniform but still specializable 
translation.


From kevinb at google.com  Thu Nov  4 21:34:54 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 4 Nov 2021 14:34:54 -0700
Subject: identityless objects and the type hierarchy
In-Reply-To: <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com>
References: <CAGKkBktToL_GBtsw613K+p+B-=1j6T15cvW=DMPzoArH72OZ-Q@mail.gmail.com>
 <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com>
Message-ID: <CAGKkBkv_=jgqQqkRTJr-=AJbNSbFefBgrehe0g2BJYxZHdhe4g@mail.gmail.com>

On Wed, Nov 3, 2021 at 12:43 PM Brian Goetz <brian.goetz at oracle.com> wrote:

>
> On 11/3/2021 3:00 PM, Kevin Bourrillion wrote:
>
> Okay, let's stick a pin in proper-value-types (i.e. try to leave them out
> of this discussion) for a moment...
>
> One question is whether the existing design for the bifurcated type
> hierarchy will carry right over to this split instead.
>
> Brian, your response reads like it is explaining/defending *that* design
to me. But I believe I already understood it and wasn't expressing any
problem with it.

Now we're talking about making a smaller split first, "identity objects vs.
identityless objects" (1 vs 2, not 1 vs 3), so I was inquiring into why
that class model does or does not also work exactly as-is for *this*
purpose.

(Note that I assume if bucket 3's arrival requires another such type in the
mix, there would be a second such bifurcation under IdentitylessObject.)

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From forax at univ-mlv.fr  Thu Nov  4 22:25:49 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 4 Nov 2021 23:25:49 +0100 (CET)
Subject: identityless objects and the type hierarchy
In-Reply-To: <CAGKkBkv_=jgqQqkRTJr-=AJbNSbFefBgrehe0g2BJYxZHdhe4g@mail.gmail.com>
References: <CAGKkBktToL_GBtsw613K+p+B-=1j6T15cvW=DMPzoArH72OZ-Q@mail.gmail.com>
 <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com>
 <CAGKkBkv_=jgqQqkRTJr-=AJbNSbFefBgrehe0g2BJYxZHdhe4g@mail.gmail.com>
Message-ID: <942577129.1474549.1636064749760.JavaMail.zimbra@u-pem.fr>

> From: "Kevin Bourrillion" <kevinb at google.com>
> To: "Brian Goetz" <brian.goetz at oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Thursday 4 November 2021 22:34:54
> Subject: Re: identityless objects and the type hierarchy

> On Wed, Nov 3, 2021 at 12:43 PM Brian Goetz <brian.goetz at oracle.com> wrote:

>> On 11/3/2021 3:00 PM, Kevin Bourrillion wrote:

>>> Okay, let's stick a pin in proper-value-types (i.e. try to leave them out of
>>> this discussion) for a moment...

>>> One question is whether the existing design for the bifurcated type hierarchy
>>> will carry right over to this split instead.

> Brian, your response reads like it is explaining/defending that design to me.
> But I believe I already understood it and wasn't expressing any problem with
> it.

> Now we're talking about making a smaller split first, "identity objects vs.
> identityless objects" (1 vs 2, not 1 vs 3), so I was inquiring into why that
> class model does or does not also work exactly as-is for this purpose.

> (Note that I assume if bucket 3's arrival requires another such type in the mix,
> there would be a second such bifurcation under IdentitylessObject.)

I don't think a second bifurcation is needed. 
At runtime bucket 2 and bucket 3 behave the same apart from null. 
Given that IdentitylessObject (or whatever name we choose) is an interface, it always accepts null, 
so if they are typed as that interface, B2 and B3 behave exactly the same. 
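This argument relies on the fact that every interface type is a reference type and therefore admits null. A minimal sketch, assuming a hypothetical marker interface named after the thread's placeholder `IdentitylessObject`, with two ordinary records standing in for a bucket-2 and a bucket-3 class. Both records are plain identity classes in today's Java, so this shows only the nullability of the interface-typed view, not bucket semantics.

```java
class InterfaceNullDemo {
    static String describe(IdentitylessObject o) {
        // An interface type is a reference type, so null is always a legal value here.
        return (o == null) ? "null" : o.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new B2Point(1, 2))); // B2Point
        System.out.println(describe(new B3Point(1, 2))); // B3Point
        System.out.println(describe(null));              // null
    }
}

// Hypothetical marker interface; not a JDK type.
interface IdentitylessObject { }

// Hypothetical stand-ins for a bucket-2 and a bucket-3 class (ordinary records today).
record B2Point(int x, int y) implements IdentitylessObject { }
record B3Point(int x, int y) implements IdentitylessObject { }
```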

Rémi 

From forax at univ-mlv.fr  Thu Nov  4 22:47:25 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 4 Nov 2021 23:47:25 +0100 (CET)
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
References: <bd7e0eea-7c61-467f-cddb-1f9916b86f2d@oracle.com>
 <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
 <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com>
 <CAGKkBkvE+osDje4tTcinU_rdgEFn3JHN-a68z50CkRZKT4S2nw@mail.gmail.com>
 <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>
 <CAGKkBkvQBOq+vWHpE0Td61QBwe03SE=1Z=6b385OPdod7WmUsg@mail.gmail.com>
 <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
 <C143E758-D72F-404B-B0EB-AC6BA52724F4@oracle.com>
Message-ID: <1202978501.1480795.1636066045213.JavaMail.zimbra@u-pem.fr>

> From: "John Rose" <john.r.rose at oracle.com>
> To: "daniel smith" <daniel.smith at oracle.com>
> Cc: "Kevin Bourrillion" <kevinb at google.com>, "Brian Goetz"
> <brian.goetz at oracle.com>, "valhalla-spec-experts"
> <valhalla-spec-experts at openjdk.java.net>
> Sent: Jeudi 4 Novembre 2021 02:34:52
> Subject: Re: [External] : Re: Consolidating the user model

> On Nov 3, 2021, at 4:05 PM, Dan Smith <daniel.smith at oracle.com> wrote:

>> (It is, I suppose, part of the model that objects of a given class all have a
>> finite, matching layout when accessed by value, even if the details of that
>> layout are kept abstract. Which is why value types are monomorphic and you need
>> reference types for polymorphism.)

>> The fact that the VM often discards object headers at runtime is a pure
>> optimization.

> Let's see what happens if we say that (a) bare values have headers and (b)
> Object::getClass allows the user to observe part of the header contents.

> It follows then that the expression aPointVal.getClass() will show the contents
> of aPointVal's header, even if it is a compile-time constant.

> Point pv = new Point(42,42); // "class Point" is the definition of Point
> assert pv.getClass() == Point.class; // ok, that's certainly the class
> assert pv.getClass() != Point.ref.class; // and it's not a ref, so good

> That is all fine. There?s a little hiccup when you ?box? the point and get the
> same Class mirror even though the ?header? is a very real-heap resident value
> now:

> Point.ref pr = pv; // same object; now it's on the heap, though, with a real live heap header
> assert pr.getClass() == Point.class; // same class, but...
> assert pr.getClass() != Point.ref.class; // we suppress any distinction the heap header might provide

> There's a bigger hiccup when you compare all that with good old int:

> int iv = 42; // "class int" is NOT a thing, but "class Integer" is
> assert iv.getClass() != int.class; // because int is not a class
> assert iv.getClass() == Integer.class; // ah, there's the class!
> assert iv.getClass() == int.ref.class; // this works differently from Point
> assert ((Object)iv).getClass() == pr.getClass(); // this should be true also, right?

How can you have int.class not be a class and, at the same time, have the notation int.ref? 

If you suppose that int is now a primitive class (B3 bucket), then iv.getClass() == int.class, 
because it's equivalent to new int(iv).getClass() == int.class 
so 
assert iv.getClass() != Integer.class; //because Integer is the reference projection 
assert iv.getClass() != int.ref.class; // because int.ref is equivalent to Integer 

If you suppose that Integer is in the B2 bucket (after all, all the other value-based classes are B2), then iv.getClass() == Integer.class 
because it's equivalent to Integer.valueOf(iv).getClass() == Integer.class 
so 
assert iv.getClass() != int.class; //because int.class is a fake type like void.class 
assert iv.getClass() != int.ref.class; // does not compile because Integer is B2 not B3 

I think I've missed something? 
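For comparison, this is how the existing language behaves, with no Valhalla changes at all: `int` has a `Class` mirror (`int.class`, also reachable as `Integer.TYPE`) that reports itself as primitive and is distinct from `Integer.class`, and `getClass()` can only be called after boxing, which yields `Integer.class`. The sketch below only shows today's behavior that both readings above would have to reconcile with; `IntMirrorDemo` is a made-up name.

```java
final class IntMirrorDemo {
    private IntMirrorDemo() {}

    // int has no getClass(); boxing to Object produces an Integer.
    static Class<?> classOfBoxed(int iv) {
        Object boxed = iv;           // autoboxing (Integer.valueOf)
        return boxed.getClass();
    }

    // int.class is the primitive mirror; Integer.class is the box class.
    static boolean primitiveMirrorIsDistinct() {
        return int.class != Integer.class
            && int.class == Integer.TYPE
            && int.class.isPrimitive()
            && !Integer.class.isPrimitive();
    }
}
```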

Rémi 

From daniel.smith at oracle.com  Wed Nov 17 15:39:35 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 17 Nov 2021 15:39:35 +0000
Subject: EG meeting, 2021-11-17
Message-ID: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>

EG Zoom meeting today at 5pm UTC (9am PST, 12pm EST).

Lots of traffic this time; we can have follow-up discussions wherever there's interest. Potential topics:

"Consolidating the user model": followup discussions homed in on how we model primitive values: whether they're reference-less objects or some other "value" entity, and how they interact with reference types

"Equality operator for identityless classes": Kevin is concerned that the new == operator is an attractive nuisance, because it's sometimes, but not always, equivalent to 'equals'

"identityless objects and the type hierarchy": discussed how the IdentityObject/PrimitiveObject interfaces are used in the "Consolidating the user model" world

"Consequences of null for flattenable representations": John described strategies for encoding nulls where object references are flattened


From jesper at selskabet.org  Wed Nov 17 16:10:15 2021
From: jesper at selskabet.org (=?utf-8?Q?Jesper_Steen_M=C3=B8ller?=)
Date: Wed, 17 Nov 2021 17:10:15 +0100
Subject: EG meeting, 2021-11-17
In-Reply-To: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
Message-ID: <398DCAC3-9B8E-410B-A931-E75B710454B8@selskabet.org>

Hi Srikanth,

I suppose this meeting will decide some of the work to be done?

-Jesper

> On 17 Nov 2021, at 16.40, Dan Smith <daniel.smith at oracle.com> wrote:
> 
> EG Zoom meeting today at 5pm UTC (9am PST, 12pm EST).
> 
> Lots of traffic this time; we can have follow-up discussions wherever there's interest. Potential topics:
> 
> "Consolidating the user model": followup discussions homed in on how we model primitive values: whether they're reference-less objects or some other "value" entity, and how they interact with reference types
> 
> "Equality operator for identityless classes": Kevin is concerned that the new == operator is an attractive nuisance, because it's sometimes, but not always, equivalent to 'equals'
> 
> "identityless objects and the type hierarchy": discussed how the IdentityObject/PrimitiveObject interfaces are used in the "Consolidating the user model" world
> 
> "Consequences of null for flattenable representations": John described strategies for encoding nulls where object references are flattened
> 

From kevinb at google.com  Wed Nov 17 17:41:52 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 17 Nov 2021 09:41:52 -0800
Subject: EG meeting, 2021-11-17
In-Reply-To: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
Message-ID: <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>

Derp, I slept in today


On Wed, Nov 17, 2021 at 7:39 AM Dan Smith <daniel.smith at oracle.com> wrote:

> "Consolidating the user model": followup discussions homed in on how we
> model primitive values: whether they're reference-less objects or some other
> "value" entity, and how they interact with reference types
>

I'm in progress writing up the two main models so far as I understand them.


> "Equality operator for identityless classes": Kevin is concerned that the
> new == operator is an attractive nuisance, because it's sometimes, but not
> always, equivalent to 'equals'
>

Summary: for reftypes `==` and `.equals()` ask two different questions, and
users almost never really mean `==`, but it's *sometimes* an okay
shorthand. That remains true, but exactly when the *"sometimes"* applies could
get much, much harder to observe now. Definitely concerned -- but perhaps
the question users *actually* mean most of the time is really the "pattern
matching" question, after all. (The biggest ergonomic problem of `.equals()`


"identityless objects and the type hierarchy": discussed how the
> IdentityObject/PrimitiveObject interfaces are used in the "Consolidating
> the user model" world
>

For the moment I think this does probably carry over to
WithIdentity/WithoutIdentity or whatever they are called. The question I
think is still open (to me) is whether there really are active contractual
implications of being identityless or if it's equivalent to being
uncommitted; i.e. should a clear-cut identityless class still be able to
have an identityful subclass, or does that clearly break something.


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com  Wed Nov 17 17:49:49 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 17 Nov 2021 09:49:49 -0800
Subject: EG meeting, 2021-11-17
In-Reply-To: <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
Message-ID: <CAGKkBkuw_Pkwd6CD0kdMQvYnb1dVKSv=fSv1Ynx55LC6fwzVkA@mail.gmail.com>

On Wed, Nov 17, 2021 at 9:41 AM Kevin Bourrillion <kevinb at google.com> wrote:

> (The biggest ergonomic problem of `.equals()`
>

... is that it's not negatable, sometimes forcing a `!` to be far away to
the left, and I'm not under the impression pattern-matching addresses that.)
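A small illustration of both points in today's Java, using `String` because it is a familiar identity class. `==` asks whether two references point to the same object, `equals` asks whether the contents match, and the two only sometimes agree. The negated form has to put the `!` at the far left of the call, even with the null-safe `Objects.equals`. The class and method names are illustrative only.

```java
import java.util.Objects;

final class EqualityDemo {
    private EqualityDemo() {}

    // Same contents, but new String(...) always creates a distinct object.
    static boolean sameObject() {
        String a = "valhalla";
        String b = new String("valhalla");
        return a == b;                // false: different identities
    }

    static boolean sameContents() {
        String a = "valhalla";
        String b = new String("valhalla");
        return a.equals(b);           // true: equal character sequences
    }

    // Negating equals puts the '!' far from the comparison it negates.
    static boolean differs(String a, String b) {
        return !Objects.equals(a, b); // null-safe, but still a leading '!'
    }
}
```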


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com  Thu Nov 18 02:04:19 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 17 Nov 2021 18:04:19 -0800
Subject: EG meeting, 2021-11-17
In-Reply-To: <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
Message-ID: <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>

On Wed, Nov 17, 2021 at 11:16 AM Dan Heidinga <heidinga at redhat.com> wrote:

> For the moment I think this does probably carry over to
> WithIdentity/WithoutIdentity or whatever they are called. The question I
> think is still open (to me) is whether there really are active contractual
> implications of being identityless or if it's equivalent to being
> uncommitted; i.e. should a clear-cut identityless class still be able to
> have an identityful subclass, or does that clearly break something.
>
> It breaks flattening.  If an identityless class is flattened - and we
> want to preserve the option to do this for bucket 2 values that are <=
> 64 bits - then we can't assign a subclass instance to a slot (field /
> array element) declared to be the superclass's type as we may have to
> truncate the subclass to have it fit.
>

Right. I guess I was figuring that the mere fact of the identityless class
being non-final would already destroy that?

Supposing this justifies requiring `final` for these classes, then my
question evaporates. I wasn't sure though. Even losing flattening entirely
doesn't leave you worse off than B1.
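Dan's truncation argument can be simulated by hand. This sketch is not how a JVM flattens anything: a hypothetical `FlatPointArray` stores each element as two `int`s in a backing `int[]`, roughly the way a flattened array of a two-field value might be laid out. Because `Point` here is deliberately not final, a `ColoredPoint` subclass can be stored, and its extra field is silently lost, since the slot only has room for `x` and `y`. All class names are illustrative.

```java
// Deliberately non-final, to show what subclassing does to a flat layout.
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class ColoredPoint extends Point {
    final int rgb;
    ColoredPoint(int x, int y, int rgb) { super(x, y); this.rgb = rgb; }
}

// Hand-rolled "flattened" array: each slot is exactly two ints wide.
final class FlatPointArray {
    private final int[] data;

    FlatPointArray(int length) { data = new int[length * 2]; }

    void set(int i, Point p) {
        // Only the fields Point declares fit in the slot; a subclass's
        // extra state (such as ColoredPoint.rgb) has nowhere to go.
        data[2 * i] = p.x;
        data[2 * i + 1] = p.y;
    }

    Point get(int i) {
        // Reading back can only reconstruct a plain Point.
        return new Point(data[2 * i], data[2 * i + 1]);
    }
}
```

Making `Point` final rules out the lossy store, which is one way to see why final (or abstract-only supertypes) keeps the slot size known.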


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com  Thu Nov 18 22:26:59 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 18 Nov 2021 14:26:59 -0800
Subject: EG meeting, 2021-11-17
In-Reply-To: <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
Message-ID: <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>

On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga <heidinga at redhat.com> wrote:

> Let me turn the question around: What do we gain by allowing
> subclassing of B2 classes?
>

I'm not claiming it's much. I'm just coming into this from a different
direction.

In my experience most immutable (or stateless) classes have no real
interest in exposing identity, but just get defaulted into it. Any
dependency on the distinction between one instance and another that
equals() it would be a probable bug.
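A sketch, in today's Java, of the kind of accidental identity dependence described here. Two equal records are interchangeable under `equals`, yet an `IdentityHashMap` still tells them apart, so code keyed on identity gives a different answer than code keyed on equality. `Money` is a hypothetical example type.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

record Money(String currency, long cents) {}

final class IdentityDependenceDemo {
    private IdentityDependenceDemo() {}

    // Counts distinct keys under equals()/hashCode().
    static int equalityKeys(Money a, Money b) {
        Map<Money, Boolean> m = new HashMap<>();
        m.put(a, true);
        m.put(b, true);
        return m.size();
    }

    // Counts distinct keys under ==, i.e. by object identity.
    static int identityKeys(Money a, Money b) {
        Map<Money, Boolean> m = new IdentityHashMap<>();
        m.put(a, true);
        m.put(b, true);
        return m.size();
    }
}
```

Code whose result depends on which of two equal `Money` instances it happens to hold is exactly the code that would change meaning if `Money` gave up its identity.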

When B2 exists I see myself advocating that a developer's first instinct
should be to make new classes in B2 except when they *need* something from
B1 like mutability (and perhaps subclassability belongs in this list too!).
As far as I can tell, this makes sense whether or not there are *any* performance
benefits at all, and the performance benefits just make it a lot more
*motivating* to do what is already probably technically best anyway.

Now, if subclassability legitimately belongs in that list of
B1-forcing-factors, that'll be fine, I just hadn't fully thought it through
and was implicitly treating it like an open question, which probably made
my initial question in this subthread confusing.


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Thu Nov 18 22:34:51 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 18 Nov 2021 22:34:51 +0000
Subject: EG meeting, 2021-11-17
In-Reply-To: <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
Message-ID: <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>

I think it is reasonable to consider allowing bucket two classes to be abstract.  They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic.

A similar argument works for records.

Sent from my iPad

On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion <kevinb at google.com> wrote:

On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga <heidinga at redhat.com> wrote:

Let me turn the question around: What do we gain by allowing
subclassing of B2 classes?

I'm not claiming it's much. I'm just coming into this from a different direction.

In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug.

When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they need something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even any performance benefits at all, and the performance benefits just make it a lot more motivating to do what is already probably technically best anyway.

Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing.


--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From forax at univ-mlv.fr  Thu Nov 18 22:58:07 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 18 Nov 2021 23:58:07 +0100 (CET)
Subject: EG meeting, 2021-11-17
In-Reply-To: <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
Message-ID: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Kevin Bourrillion" <kevinb at google.com>
> Cc: "Dan Heidinga" <heidinga at redhat.com>, "daniel smith"
> <daniel.smith at oracle.com>, "valhalla-spec-experts"
> <valhalla-spec-experts at openjdk.java.net>
> Sent: Jeudi 18 Novembre 2021 23:34:51
> Subject: Re: EG meeting, 2021-11-17

> I think it is reasonable to consider allowing bucket two classes to be abstract.
> They could be extended by other classes which would either be abstract or
> final. The intermediate types are polymorphic but the terminal type is
> monomorphic.

> A similar argument works for records.

I suppose you are talking about empty (no field) abstract classes. 
We need that for j.l.Object, j.l.Number or j.l.Record. 

From a user POV, it's not very different from an interface with default methods. 
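To illustrate Rémi's comparison in today's Java: a field-less abstract class and an interface with a default method let implementers share the same behavior, and the visible difference is mainly that a class can extend only one such abstract class. `AbstractShape`, `Shape`, and the two square types are hypothetical names.

```java
// A field-less abstract class: behavior only, no state.
abstract class AbstractShape {
    abstract double area();
    String describe() { return "area=" + area(); }
}

// The same contract expressed as an interface with a default method.
interface Shape {
    double area();
    default String describe() { return "area=" + area(); }
}

final class SquareA extends AbstractShape {
    private final double side;
    SquareA(double side) { this.side = side; }
    @Override double area() { return side * side; }
}

record SquareI(double side) implements Shape {
    @Override public double area() { return side * side; }
}
```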

Rémi 

> Sent from my iPad

>> On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion <kevinb at google.com> wrote:

>> On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga <heidinga at redhat.com> wrote:

>>> Let me turn the question around: What do we gain by allowing
>>> subclassing of B2 classes?

>> I'm not claiming it's much. I'm just coming into this from a different
>> direction.

>> In my experience most immutable (or stateless) classes have no real interest in
>> exposing identity, but just get defaulted into it. Any dependency on the
>> distinction between one instance and another that equals() it would be a
>> probable bug.

>> When B2 exists I see myself advocating that a developer's first instinct should
>> be to make new classes in B2 except when they need something from B1 like
>> mutability (and perhaps subclassability belongs in this list too!). As far as I
>> can tell, this makes sense whether there are even any performance benefits at
>> all, and the performance benefits just make it a lot more motivating to do what
>> is already probably technically best anyway.

>> Now, if subclassability legitimately belongs in that list of B1-forcing-factors,
>> that'll be fine, I just hadn't fully thought it through and was implicitly
>> treating it like an open question, which probably made my initial question in
>> this subthread confusing.

>> --
>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Thu Nov 18 23:06:29 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 18 Nov 2021 23:06:29 +0000
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
Message-ID: <4C229872-2809-4AE1-9E2E-D8CEC148080B@oracle.com>

No, I'm talking more broadly.

    abstract class A implements PureObject {
        int a;
    }

    abstract class B extends A {
        int b;
    }

    pure class C extends B {
        int c;
    }

Now C is a final, pure class with fields a, b, and c.  A and B are abstract superclasses of C.

There'd be details to work out, but this is not an impossible lift.  The question is whether the return on complexity is there or not.
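A rough approximation of this example in today's Java, since `PureObject` and `pure class` are not current syntax: `A` and `B` are abstract and each contributes one final field, while `C` is the only concrete type and is final, so every concrete instance has exactly the fields `a`, `b`, and `c`. In this identity-class translation the constructors of `A` and `B` do run through `super(...)`; whether a pure `C` could run them is the question Dan raises in his reply.

```java
// Abstract superclasses may contribute fields but cannot be instantiated.
abstract class A {
    final int a;
    A(int a) { this.a = a; }
}

abstract class B extends A {
    final int b;
    B(int a, int b) { super(a); this.b = b; }
}

// The terminal class is final, so its field set (a, b, c) is fixed.
final class C extends B {
    final int c;
    C(int a, int b, int c) { super(a, b); this.c = c; }

    int sum() { return a + b + c; }
}
```

`A` and `B` remain polymorphic reference types, while `C` is monomorphic: nothing can extend it, so its layout is known.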


On Nov 18, 2021, at 5:58 PM, Remi Forax <forax at univ-mlv.fr> wrote:


________________________________
From: "Brian Goetz" <brian.goetz at oracle.com>
To: "Kevin Bourrillion" <kevinb at google.com>
Cc: "Dan Heidinga" <heidinga at redhat.com>, "daniel smith" <daniel.smith at oracle.com>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
Sent: Jeudi 18 Novembre 2021 23:34:51
Subject: Re: EG meeting, 2021-11-17
I think it is reasonable to consider allowing bucket two classes to be abstract.  They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic.

A similar argument works for records.

I suppose you are talking about empty (no field) abstract classes.
We need that for j.l.Object, j.l.Number or j.l.Record.

From a user POV, it's not very different from an interface with default methods.

Rémi


Sent from my iPad

On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion <kevinb at google.com> wrote:

On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga <heidinga at redhat.com> wrote:

Let me turn the question around: What do we gain by allowing
subclassing of B2 classes?

I'm not claiming it's much. I'm just coming into this from a different direction.

In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug.

When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they need something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even any performance benefits at all, and the performance benefits just make it a lot more motivating to do what is already probably technically best anyway.

Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing.


--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com


From brian.goetz at oracle.com  Fri Nov 19 13:32:38 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 19 Nov 2021 13:32:38 +0000
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <CAJq4Gi6crdzYKoMkTTxmq1a57w=Hvktbu4g4g-G4u=ZMsmhCUw@mail.gmail.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
 <4C229872-2809-4AE1-9E2E-D8CEC148080B@oracle.com>
 <CAJq4Gi6crdzYKoMkTTxmq1a57w=Hvktbu4g4g-G4u=ZMsmhCUw@mail.gmail.com>
Message-ID: <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com>

The translation model I had in mind was more complicated, but my point was that the reason we disallow inheritance is that we're trying to disallow layout polymorphism for concrete types, so that we know exactly how big a "C" is.  And this is not inconsistent with abstract superclasses contributing fields. There's definitely translational complexity, but it's not insurmountable.  I raised it because Kevin seemed to be going somewhere with extension, and I wanted to get a better sense of what that was.  I have definitely wished for abstract records a few times before, and I could imagine Kevin has similar use cases in mind.


The model requires fields in value types to be final so each of those
fields should be marked as `final` to ensure they show the right
properties to users via reflection.  Additionally, that means that A &
B would need to have constructors to set those final fields to be
consistent with the rest of the language, but C will never run those
constructors.

Without a constructor, there's no place for A & B to set invariants on
their fields.  If they can't define the contract for those fields,
then they shouldn't define the fields.  This is similar to how
interfaces work: the interface can define an "int getX()" method that
implementers have to implement, but it can't define the "int x" field
directly.

If we relaxed the "must be final" field constraint, we'd need some
other rule to prevent A or B from defining a setter for their fields
as there is no single set of bytecode that can implement a setter for
both a value and an identity class:

void setA(int a) { putfield A.a }
vs
A setA(int a) { withfield A.a; areturn; }

Note in particular that the second *must* return a new A as values are
immutable.
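The `putfield`/`withfield` contrast has a rough source-level analogue in today's Java: an identity class can offer a `void` setter that mutates the receiver, while an immutable type can only offer a "wither" that returns a new instance, which the caller must capture. The signatures and call-site usage differ, which is why one inherited method body cannot serve both kinds of subclass. `MutableA` and `ImmutableA` are illustrative names, and the record below is compiled by javac as an ordinary object allocation, not as `withfield`.

```java
// Identity class: the setter mutates this object in place (javac emits putfield).
final class MutableA {
    private int a;
    MutableA(int a) { this.a = a; }
    int a() { return a; }
    void setA(int a) { this.a = a; }
}

// Immutable record: "setting" a field means building a new instance,
// and the caller must use the returned value.
record ImmutableA(int a) {
    ImmutableA withA(int a) { return new ImmutableA(a); }
}
```

A caller that writes `v.withA(2);` and ignores the result silently does nothing, which is the refactoring hazard described above.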

The details around this would be hard for users to keep straight and
would be easy to violate when refactoring as the authors of A & B
would need to know that their subclasses include value types.  And
this would be incredibly hard to keep straight across maintenance
boundaries.

--Dan


On Nov 18, 2021, at 5:58 PM, Remi Forax <forax at univ-mlv.fr> wrote:


________________________________

From: "Brian Goetz" <brian.goetz at oracle.com>
To: "Kevin Bourrillion" <kevinb at google.com>
Cc: "Dan Heidinga" <heidinga at redhat.com>, "daniel smith" <daniel.smith at oracle.com>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
Sent: Jeudi 18 Novembre 2021 23:34:51
Subject: Re: EG meeting, 2021-11-17

I think it is reasonable to consider allowing bucket two classes to be abstract.  They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic.

A similar argument works for records.


I suppose you are talking about empty (no field) abstract classes.
We need that for j.l.Object, j.l.Number or j.l.Record.

From a user POV, it's not very different from an interface with default methods.

Rémi


Sent from my iPad

On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion <kevinb at google.com> wrote:

On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga <heidinga at redhat.com> wrote:

Let me turn the question around: What do we gain by allowing
subclassing of B2 classes?


I'm not claiming it's much. I'm just coming into this from a different direction.

In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug.

When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they need something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even any performance benefits at all, and the performance benefits just make it a lot more motivating to do what is already probably technically best anyway.

Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing.


--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com


From forax at univ-mlv.fr  Fri Nov 19 14:23:46 2021
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 19 Nov 2021 15:23:46 +0100 (CET)
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
 <4C229872-2809-4AE1-9E2E-D8CEC148080B@oracle.com>
 <CAJq4Gi6crdzYKoMkTTxmq1a57w=Hvktbu4g4g-G4u=ZMsmhCUw@mail.gmail.com>
 <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com>
Message-ID: <1355298284.3039585.1637331826853.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Dan Heidinga" <heidinga at redhat.com>
> Cc: "Remi Forax" <forax at univ-mlv.fr>, "Kevin Bourrillion" <kevinb at google.com>,
> "daniel smith" <daniel.smith at oracle.com>, "valhalla-spec-experts"
> <valhalla-spec-experts at openjdk.java.net>
> Sent: Vendredi 19 Novembre 2021 14:32:38
> Subject: Re: [External] : Re: EG meeting, 2021-11-17

> The translation model I had in mind was more complicated, but my point was that
> the reason we disallow inheritance is that we're trying to disallow layout
> polymorphism for concrete types, so that we know exactly how big a "C" is. And
> this is not inconsistent with abstract superclasses contributing fields.
> There's definitely translational complexity, but it's not insurmountable. I
> raised it because Kevin seemed to be going somewhere with extension, and I
> wanted to get a better sense of what that was. I have definitely wished for
> abstract records a few times before, and I could imagine Kevin has similar use
> cases in mind.

For records, it's easy to avoid abstract inheritance because all their state is public, 
so instead of 
abstract class A { int a; } 
abstract class B extends A { int b; } 
record C() extends B { } // hypothetical: a record cannot extend a class today 

one can write 
interface A { int a(); } 
interface B extends A { int b(); } 
record C(int a, int b) implements B { } 

Rémi 

>> The model requires fields in value types to be final so each of those
>> fields should be marked as `final` to ensure they show the right
>> properties to users via reflection. Additionally, that means that A &
>> B would need to have constructors to set those final fields to be
>> consistent with the rest of the language, but C will never run those
>> constructors.

>> Without a constructor, there's no place for A & B to set invariants on
>> their fields. If they can't define the contract for those fields,
>> then they shouldn't define the fields. This is similar to how
>> interfaces work: the interface can define an "int getX()" method that
>> implementers have to implement, but it can't define the "int x" field
>> directly.

>> If we relaxed the "must be final" field constraint, we'd need some
>> other rule to prevent A or B from defining a setter for their fields
>> as there is no single set of bytecode that can implement a setter for
>> both a value and an identity class:

>> void setA(int a) { putfield A.a }
>> vs
>> A setA(int a) { withfield A.a; areturn; }

>> Note in particular that the second *must* return a new A as values are
>> immutable.

>> The details around this would be hard for users to keep straight and
>> would be easy to violate when refactoring as the authors of A & B
>> would need to know that their subclasses include value types. And
>> this would be incredibly hard to keep straight across maintenance
>> boundaries.

>> --Dan

>>> On Nov 18, 2021, at 5:58 PM, Remi Forax <forax at univ-mlv.fr> wrote:

>>> ________________________________

>>> From: "Brian Goetz" <brian.goetz at oracle.com>
>>> To: "Kevin Bourrillion" <kevinb at google.com>
>>> Cc: "Dan Heidinga" <heidinga at redhat.com>, "daniel smith" <daniel.smith at oracle.com>,
>>> "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>>> Sent: Jeudi 18 Novembre 2021 23:34:51
>>> Subject: Re: EG meeting, 2021-11-17

>>> I think it is reasonable to consider allowing bucket two classes to be abstract.
>>> They could be extended by other classes which would either be abstract or
>>> final. The intermediate types are polymorphic but the terminal type is
>>> monomorphic.

>>> A similar argument works for records.

>>> I suppose you are talking about empty (no field) abstract classes.
>>> We need that for j.l.Object, j.l.Number or j.l.Record.

>>> From a user POV, it's not very different from an interface with default methods.

>>> Rémi

>>> Sent from my iPad

>>> On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion <kevinb at google.com> wrote:

>>> On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga <heidinga at redhat.com> wrote:

>>>> Let me turn the question around: What do we gain by allowing
>>>> subclassing of B2 classes?

>>> I'm not claiming it's much. I'm just coming into this from a different
>>> direction.

>>> In my experience most immutable (or stateless) classes have no real interest in
>>> exposing identity, but just get defaulted into it. Any dependency on the
>>> distinction between one instance and another that equals() it would be a
>>> probable bug.

>>> When B2 exists I see myself advocating that a developer's first instinct should
>>> be to make new classes in B2 except when they need something from B1 like
>>> mutability (and perhaps subclassability belongs in this list too!). As far as I
>>> can tell, this makes sense whether there are even any performance benefits at
>>> all, and the performance benefits just make it a lot more motivating to do what
>>> is already probably technically best anyway.

>>> Now, if subclassability legitimately belongs in that list of B1-forcing-factors,
>>> that'll be fine, I just hadn't fully thought it through and was implicitly
>>> treating it like an open question, which probably made my initial question in
>>> this subthread confusing.

>>> --
>>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com  Mon Nov 22 05:05:13 2021
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 22 Nov 2021 05:05:13 +0000
Subject: EG meeting, 2021-11-17
In-Reply-To: <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
Message-ID: <F495FFC2-1F70-49E6-A025-E1419E5E2E6D@oracle.com>

Yes.  One way I like to think about the Old Bucket is
that it is characterized by *concrete* representations
which have somehow opted into object identity.

Confusingly, the Old Bucket also contains interfaces
which are non-concrete and also Object, which might
as well be non-concrete.  (I'm not saying "abstract"
because that's a keyword in the language, and you
can have semi-concrete classes which are abstract
but also commit to object identity and may even
have mutable fields or by-reference constructors,
like AbstractList.)

Those are the two interesting populations in the
Old Bucket:  Concrete classes that are entangled
with object identity (until they can be migrated,
or forever in many cases).  And, non-concrete
classes, which are necessarily polymorphic.

Those two kinds of types (in the Old Bucket)
interact with the New Buckets in distinct ways.

There's a middle case which is causing problems
here:  A class can be concrete *and* polymorphic,
meaning that subclasses can add more stuff.
(The parent class could be declared abstract
or not; that's not an important detail.)

A class that is concrete *and* polymorphic is
exactly one that plays the classic game of object
oriented subclasses, where data fields and methods
are refined in layers.

This classic game does not translate well into
the by-value world; it needs polymorphic pointers.
Just consult any C++ style guide to see what happens
if you unwarily try to mix by-value structs and
class inheritance:  You shouldn't, according to the
guides.

Is there a way to make that work in Java, so that
identity-free classes can inherit from each other?
Probably, in some limited way.  The simplest move
is the one Brian and I are liking here, where a
completely non-concrete class (one with no fields
and no commitment to object identity) can be
refined by a subclass.  But it should be marked
abstract, so as not to have cases where you have
a variable of the super-type and you don't know
whether it has the layout of the super (because
it was concrete, oops) or a subtype.
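A sketch of that shape in today's Java (all class names are hypothetical, and these are still ordinary identity classes): the supertypes declare no fields and are abstract, so a variable of type `Shape` or `Polygon` is always polymorphic, while the concrete leaves are final and therefore have exactly one layout each.

```java
// Non-concrete supertypes: no fields, no instances of their own.
abstract class Shape {
    abstract double area();
}

// An intermediate type may refine the contract but stays abstract.
abstract class Polygon extends Shape {
    abstract int sides();
}

// Concrete leaves are final: a Square variable always has Square's layout.
final class Square extends Polygon {
    private final double side;

    Square(double side) { this.side = side; }

    @Override double area() { return side * side; }

    @Override int sides() { return 4; }
}

final class Circle extends Shape {
    private final double r;

    Circle(double r) { this.r = r; }

    @Override double area() { return Math.PI * r * r; }
}
```

Because no supertype is concrete, the "is this the super's layout or a subtype's?" question never arises for a variable of a supertype.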

The division separating non-concrete types from
identity-object types in the Old Bucket may be
seen in this diagram, which I cobbled up this
weekend:

http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf

This comes from my attempts to make a more or
less comprehensive Venn-style diagram of the stuff
we are talking about.  I think it helps me better
visualize what we are trying to do; maybe it will
help others in some way.

I view this as my due diligence mapping the side of the
elephant I can make contact with.  Therefore I'm happy
to take corrections on it.

I'm also noodling on a whimsical Field Guide, which asks
you binary questions about a random Java type, and guides
you towards classifying it.  That helped me crystallize
the diagram, and may be useful in its own right,
or perhaps distilled into a flowchart.  Stay tuned.

-- John


On Nov 18, 2021, at 2:34 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

I think it is reasonable to consider allowing bucket two classes to be abstract.  They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic.

A similar argument works for records.

Sent from my iPad

On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion <kevinb at google.com> wrote:

On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga <heidinga at redhat.com> wrote:

Let me turn the question around: What do we gain by allowing
subclassing of B2 classes?

I'm not claiming it's much. I'm just coming into this from a different direction.

In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug.

When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they need something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even any performance benefits at all, and the performance benefits just make it a lot more motivating to do what is already probably technically best anyway.

Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing.


--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com


From john.r.rose at oracle.com  Mon Nov 22 05:10:29 2021
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 22 Nov 2021 05:10:29 +0000
Subject: EG meeting, 2021-11-17
In-Reply-To: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
Message-ID: <B2E1A867-67FD-482B-8BA7-1791BCCFF783@oracle.com>

On Nov 18, 2021, at 2:58 PM, Remi Forax <forax at univ-mlv.fr> wrote:

I suppose you are talking about empty (no field) abstract classes.
We need that for j.l.Object, j.l.Number or j.l.Record.

From a user POV, it's not very different from an interface with default methods.

Yes.  The key thing is that the abstract class in question
does not accidentally entangle itself with object identity.
There are three ways off the top of my head to do that:

 - have a constructor that needs to write fields through `this`
 - have a mutable instance field
 - have synchronization somewhere (a synch. method)
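For illustration, here is a hypothetical pair of abstract classes in today's Java: the first entangles itself with object identity in each of the three ways listed above, while the second stays clear of all three and, like an interface with default methods, contributes only behavior. Both compile today; the sketch only shows which features are the problematic ones.

```java
// Entangled with identity in each of the three listed ways.
abstract class EntangledBase {
    protected int counter;                  // a mutable instance field
    protected final String name;

    protected EntangledBase(String name) {
        this.name = name;                   // a constructor writing through `this`
    }

    synchronized void bump() { counter++; } // synchronization on `this`
}

// Stateless: no fields, an implicit empty constructor, no synchronization.
abstract class StatelessBase {
    abstract int size();

    boolean isEmpty() { return size() == 0; }
}
```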

We'll need to have a way for an abstract class (for Record,
for example) to stand clear of the object identity thicket.

I think we could allow such an abstract class to have final
fields, with suitable restrictions.  But it would require
a complex translation strategy and/or tricky JVM support.
The problem is that the fields in the super would have to
be replicated into each concrete subclass in a physically
separate manner.  Also the fields would have to have their
initialization declared by the superclass but defined by
the concrete subclass.  Also field access might need to be
virtualized, if each concrete subclass has its own idea
about where the field "lives" in its bundle of fields.
It's doable but messy.  I'd rather leave it for later; we
have so many more worthwhile things to do.
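Only the access side of "virtualized field access" can be approximated in today's Java: the superclass declares the field as an abstract accessor, and each concrete subclass decides where the value actually lives. In this hypothetical sketch (`Base`, `Plain`, and `Packed` are invented names), one subclass stores `x` directly and another packs it into a `long`. It does not model replicated initialization or flattening.

```java
// The superclass declares x only as an accessor and builds behavior on it.
abstract class Base {
    abstract int x();                  // stands in for a virtualized getfield

    int twiceX() { return 2 * x(); }   // superclass behavior that reads x
}

// One concrete subclass stores x as its own field...
final class Plain extends Base {
    private final int x;

    Plain(int x) { this.x = x; }

    @Override int x() { return x; }
}

// ...another packs x (high 32 bits) and y (low 32 bits) into one long,
// so x "lives" somewhere else in this subclass's bundle of fields.
final class Packed extends Base {
    private final long bits;

    Packed(int x, int y) { this.bits = ((long) x << 32) | (y & 0xFFFFFFFFL); }

    @Override int x() { return (int) (bits >>> 32); }

    int y() { return (int) bits; }
}
```

The cost John points at is visible here: every read of `x` from `Base` code goes through a virtual call rather than a fixed field offset.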


From john.r.rose at oracle.com  Mon Nov 22 05:25:15 2021
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 22 Nov 2021 05:25:15 +0000
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
 <4C229872-2809-4AE1-9E2E-D8CEC148080B@oracle.com>
 <CAJq4Gi6crdzYKoMkTTxmq1a57w=Hvktbu4g4g-G4u=ZMsmhCUw@mail.gmail.com>
 <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com>
Message-ID: <C01B53BC-6550-4C48-B55A-A474233532FE@oracle.com>

On Nov 19, 2021, at 5:32 AM, Brian Goetz <brian.goetz at oracle.com> wrote:

And this is not inconsistent with abstract superclasses contributing fields.

For me the poster child is Enum as much as Record.  I want pure
enums, some day, but in order to make this work we need a way for
the ordinal and name fields to (a) appear in the abstract class Enum
and (b) be suitably defined in the layout of each Enum subclass,
whether it is an identity subclass or a pure (B2) subclass.

Sketch of an example way forward (but still with the sense that we
have more important things to do):

 - Allow fields to be marked abstract, and mark Enum's fields that way.
 - Do not require (or allow) constructors to initialize abstract fields.
 - The JVM can support virtualized getfield, maybe, or just ask the T.S. to use access methods.
 - As with methods, require a concrete subclass to redeclare inherited abstracts.
 - The concrete subclass will naturally declare and initialize the now-concrete field.
 - Have Enum support both kinds of constructors: Old School (fully concrete) and empty.
 - Figure out some story for concretifying Enum's fields for Old School clients.

The trick would be to configure Enum so that it was a fully functional
super for both kinds of subclasses; it should behave one way to Old
School enum subclasses and another way to B2 enum subclasses.
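Java has no abstract fields, but the nearest approximation today is an abstract accessor that each concrete subclass backs with its own storage. The hypothetical `PureEnum` interface and `Suit` record below (a record standing in for a pure enum subclass, which is an assumption) show the shape of "declared abstractly in the super, redeclared concretely in the subclass". It does not address Old School clients, the constructor story, or the uniqueness guarantees real enum constants have.

```java
// name and ordinal are declared abstractly; behavior is built on them.
interface PureEnum<E extends PureEnum<E>> extends Comparable<E> {
    String name();
    int ordinal();

    @Override
    default int compareTo(E other) {
        return Integer.compare(ordinal(), other.ordinal());
    }
}

// The concrete subclass redeclares both "fields" as its own components.
record Suit(String name, int ordinal) implements PureEnum<Suit> {
    static final Suit CLUBS = new Suit("CLUBS", 0);
    static final Suit HEARTS = new Suit("HEARTS", 1);
}
```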

It's a research project.  I get the sense there's a path forward, but
not a simple one.

If you exclude fields, then it's not as hard as a research project IMO.
The abstract supers of a B2 are not themselves B2; they are polymorphic
types that (conventionally) live in the Old Bucket.


From john.r.rose at oracle.com  Mon Nov 22 05:36:30 2021
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 22 Nov 2021 05:36:30 +0000
Subject: identityless objects and the type hierarchy
In-Reply-To: <942577129.1474549.1636064749760.JavaMail.zimbra@u-pem.fr>
References: <CAGKkBktToL_GBtsw613K+p+B-=1j6T15cvW=DMPzoArH72OZ-Q@mail.gmail.com>
 <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com>
 <CAGKkBkv_=jgqQqkRTJr-=AJbNSbFefBgrehe0g2BJYxZHdhe4g@mail.gmail.com>
 <942577129.1474549.1636064749760.JavaMail.zimbra@u-pem.fr>
Message-ID: <59B8F839-277D-45AE-A350-4B07275F8722@oracle.com>

On Nov 4, 2021, at 3:25 PM, Remi Forax <forax at univ-mlv.fr> wrote:


I don't think a second bifurcation is needed.
At runtime bucket 2 and bucket 3 behave the same apart from null.
Given that IdentitylessObject (or whatever name we choose) is an interface, it always accepts null,
so if they are typed as that interface, B2 and B3 behave exactly the same.


Piling on:  The marker interfaces are useful for
testing and bounding *reference types*.  But
a primitive type is not a reference type, so it
cannot be (directly) tested or bounded as a
reference.
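A hypothetical sketch of what marker interfaces can and cannot do. `IdentityObject` and `ValueObject` are declared locally here as stand-ins; they are not part of the current JDK. Markers can bound type parameters and be tested with `instanceof`, but only on references: an `int` passed to `isValue` is autoboxed to `Integer` before it can be tested at all, and (as Remi notes) a null reference satisfies the static bound while failing `instanceof`.

```java
// Local stand-ins for the marker interfaces discussed in the thread.
interface IdentityObject {}
interface ValueObject {}

final class Account implements IdentityObject { int balance; }
record Money(long cents) implements ValueObject {}

final class Markers {
    // Bounding a type parameter by a marker interface (null still passes).
    static <T extends ValueObject> T requireValue(T t) { return t; }

    // Dynamic test of a reference against a marker interface.
    static boolean isValue(Object o) { return o instanceof ValueObject; }
}
```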

There *is* a difference between a reference
of the form B3.ref (B3.box, B3? whatever)
and B2.  But it's not an interesting difference,
because when you box a B3 primitive you
get something which has (as Brian says)
all the affordances of reference, but
without object identity.  That?s exactly
what a B2 type is.  The only difference
between a reference to a B3 type and a
B2 type is the syntax by which they were
declared and derived.

This looked pretty clear to me when I
did my diagram, where B3 types have
ref projections that bubble into the
B2 swath of types.  Once there, they
behave exactly like native B2 types.

The diagram has three swathes for
concrete types (PRIM, NOID, IDOSAUR),
plus a separate upper quadrant for
non-concrete reference types.
The PRIM swath has a little excrescence
into the NOID swath where the P.ref
types pop out.

http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf

All that suggests to me that we won't want
a marker interface to specially distinguish the
B3 excrescences.

It does also suggest that we are not done
bike-shedding terms:  What?s the collective
term for ?B2 refs + B3 boxes??  (I used NOID.)
Or, is a B3 box a ?pure object? like any B2
pure object, whose class happens to be a
primitive class?  I dunno.

It remains true (and I hope will continue to
be true) that a B3 class defines two types,
one reference and one non-reference, while
a B2 class defines one reference type.
But maybe those two reference types are
both to "pure objects"?  I'll bet Dan has
a take on this.


From brian.goetz at oracle.com  Mon Nov 22 15:22:11 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Nov 2021 10:22:11 -0500
Subject: EG meeting, 2021-11-17
In-Reply-To: <F495FFC2-1F70-49E6-A025-E1419E5E2E6D@oracle.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <F495FFC2-1F70-49E6-A025-E1419E5E2E6D@oracle.com>
Message-ID: <17113716-e837-1c0e-c31c-c4f388ce2260@oracle.com>


> Is there a way to make that work in Java, so that
> identity-free classes can inherit from each other?
> Probably, in some limited way.  The simplest move
> is the one Brian and I are liking here, where a
> completely non-concrete class (one with no fields
> and no commitment to object identity) can be
> refined by a subclass.  But it should be marked
> abstract, so as not to have cases where you have
> a variable of the super-type and you don't know
> whether it has the layout of the super (because
> it was concrete, oops) or a subtype.

This is the second turn of the crank (the first was "you can extend 
Object only"), but as this conversation hinted, there may be a further 
turn where abstract identity-agnostic classes can contribute to the 
layout and initialization of a concrete final by-value class without 
pulling it into the world of new-dup-init.  The following is an 
exploration, not a proposal, but might help find the next turn of the 
crank.  The exposition is translational-centric but that's not essential.

An abstract class can contribute fields, initialization of those fields, 
and behavior.  We can transform:

    abstract class C extends ValueObject {  // no identity children, please
        T t;

        C(T t) { ... t = e ... }
    }

into

    interface C {
        abstract <V extends C> protected V withT(V v, T t);
        abstract protected T t();

        static <V extends C> protected V init(V v, T t) {
            ... v = withT(v, e) ...
            return v;
        }
    }

and a subclass

    b2-class V extends C {
        U u;

        V(T t, U u) { super(t); u = e; }
    }

into

    b2-class V implements C {
        T t;   // pull down fields from super
        U u;

        V(T t, U u) {
            V this = initialvalue;
            this = C.init(this, t);
            this = this withfield[u] u;
        }
    }

The point of this exercise is to observe that the two components of C 
that are doing double-duty as both API points for clients of C and 
extension points for subclasses of C -- the constructor and the layout 
-- can be given new implementations for the by-value world that are 
consistent with the inheritance semantics the user expects.
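The translation above uses syntax that does not exist today (`b2-class`, `withfield`, assigning to `this`). A rough emulation in current Java might look like the following, assuming a record is an acceptable stand-in for an immutable by-value class and that `T` and `U` are concretely `String` and `int`. `Leaf` stands in for `V`, and the `strip()` call is a hypothetical placeholder for the elided initializer `e`.

```java
import java.util.Objects;

// C stands in for the abstract superclass: its field t is exposed only
// through an abstract accessor and an abstract "wither".
interface C<V extends C<V>> {
    String t();              // accessor for the pulled-down field t
    V withT(String t);       // emulates withfield on t: returns a new instance

    // The superclass constructor, recast as a function from value to value.
    static <W extends C<W>> W init(W v, String t) {
        Objects.requireNonNull(t, "t");
        return v.withT(t.strip());   // illustrative stand-in for "t = e"
    }
}

// Stands in for the b2-class V: final and immutable, it redeclares the
// superclass field t alongside its own field u.
record Leaf(String t, int u) implements C<Leaf> {
    @Override
    public Leaf withT(String t) { return new Leaf(t, u); }

    // Emulates the translated constructor body step by step.
    static Leaf of(String t, int u) {
        Leaf self = new Leaf(null, 0);   // stand-in for "initialvalue" (all fields default)
        self = C.init(self, t);          // run the superclass's initialization
        self = new Leaf(self.t(), u);    // stand-in for "this withfield[u] u"
        return self;
    }
}
```

The superclass never allocates anything itself; it only transforms a value handed to it, which is what lets its initialization logic avoid new-dup-init.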

Again, not making a proposal here, as much as probing at the bounds of a 
new object model.

(I think this is similar to what you sketched in your next mail.)

> The division separating non-concrete types from
> identity-object types in the Old Bucket may be
> seen in this diagram, which I cobbled up this
> weekend:
>
> http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf
>
> This comes from my attempts to make a more or
> less comprehensive Venn-style diagram of the stuff
> we are talking about. ?I think it helps me better
> visualize what we are trying to do; maybe it will
> help others in some way.
>
> I view this as my due diligence mapping the side of the
> elephant I can make contact with.  Therefore I'm happy
> to take corrections on it.
>
> I'm also noodling on a whimsical Field Guide, which asks
> you binary questions about a random Java type, and guides
> you towards classifying it. ?That helped me crystallize
> the diagram, and may be useful in its own right,
> or perhaps distilled into a flowchart.  Stay tuned.
>
> -- John
>
>
>> On Nov 18, 2021, at 2:34 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>>
>> I think it is reasonable to consider allowing bucket two classes to 
>> be abstract.  They could be extended by other classes which would 
>> either be abstract or final. The intermediate types are polymorphic 
>> but the terminal type is monomorphic.
>>
>> A similar argument works for records.
>>
>> Sent from my iPad
>>
>>> On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion <kevinb at google.com> 
>>> wrote:
>>>
>>> On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga <heidinga at redhat.com> 
>>> wrote:
>>>
>>>     Let me turn the question around: What do we gain by allowing
>>>     subclassing of B2 classes?
>>>
>>>
>>> I'm not claiming it's much. I'm just coming into this from a 
>>> different direction.
>>>
>>> In my experience most immutable (or stateless) classes have no real 
>>> interest in exposing identity, but just get defaulted into it. Any 
>>> dependency on the distinction between one instance and another that 
>>> equals() it would be a probable bug.
>>>
>>> When B2 exists I see myself advocating that a developer's first 
>>> instinct should be to make new classes in B2 except when they 
>>> /need/ something from B1 like mutability (and perhaps 
>>> subclassability belongs in this list too!). As far as I can tell, 
>>> this makes sense whether there are even /any/ performance benefits 
>>> at all, and the performance benefits just make it a lot more 
>>> /motivating/ to do what is already probably technically best anyway.
>>>
>>> Now, if subclassability legitimately belongs in that list of 
>>> B1-forcing-factors, that'll be fine, I just hadn't fully thought it 
>>> through and was implicitly treating it like an open question, which 
>>> probably made my initial question in this subthread confusing.
>>>
>>>
>>>
>>> -- 
>>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
>

From brian.goetz at oracle.com  Mon Nov 22 19:14:22 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Nov 2021 14:14:22 -0500
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <CAJq4Gi7fuuLf49o1_bD5wc5MJnJc+6pAee2NXs96fZKdAeBshQ@mail.gmail.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <F495FFC2-1F70-49E6-A025-E1419E5E2E6D@oracle.com>
 <17113716-e837-1c0e-c31c-c4f388ce2260@oracle.com>
 <CAJq4Gi7fuuLf49o1_bD5wc5MJnJc+6pAee2NXs96fZKdAeBshQ@mail.gmail.com>
Message-ID: <e6f6b719-9b65-887d-0fc3-6eb4267d3eaf@oracle.com>

I wouldn't say we flipped anything.  But we have made a lot of progress 
on the model; at first we thought abstract supers at all were a bridge 
too far, but we found the right set of constraints and it seems to fit 
naturally now.  So it makes sense to ask the question whether we're at 
the edge, or whether further crank-turns are worth exploring.

I was mostly reacting to Kevin's comments; he seemed to be going 
somewhere with the "could we get people to adopt B2 by default", and 
probing for where that might go, and what constraints we'd have to 
reexplore, to see if there was untapped value there.

On 11/22/2021 2:09 PM, Dan Heidinga wrote:
> I'm trying to understand what flipped the cost-benefit calculation
> here that makes it worthwhile to re-explore allowing values to inherit
> fields from abstract supers.

From kevinb at google.com  Mon Nov 22 21:07:55 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 22 Nov 2021 13:07:55 -0800
Subject: EG meeting, 2021-11-17
In-Reply-To: <CAJq4Gi51c2C9xOqJs8kZxZJEiJc47bY5EyHpioCyXqJT+JNGOw@mail.gmail.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
 <B2E1A867-67FD-482B-8BA7-1791BCCFF783@oracle.com>
 <CAJq4Gi51c2C9xOqJs8kZxZJEiJc47bY5EyHpioCyXqJT+JNGOw@mail.gmail.com>
Message-ID: <CAGKkBkvg7w-HBVbabpuK3KpvdEPENn6VPz0H4=eKnFHjh5Hp-g@mail.gmail.com>

On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga <heidinga at redhat.com> wrote:

> I'll echo Brian's comment that I'd like to understand Kevin's use
> cases better to see if there's something we're missing in the design /
> a major use case that isn't being addressed that will cause user
> confusion / pain.
>

Sorry if I threw another wrench here!

What I'm raising is only the wish that users can reasonably *default* to
B2-over-B1 unless their use case requires something on our list of "only B1
does this". And that list can be however long it needs to be, just
hopefully no longer. That's probably how we were looking at it already.

And sure, "need" sometimes can mean "it would have made translation *way
too* complex and clever". Even if all we can say is "in principle this
*could* be supported, but it just isn't and click here if you *really care
a lot* to know the reasons why", it works and I suspect most users wouldn't
even click.

Does that make perfect sense? Again, the thread just backed into the topic
sideways.

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Mon Nov 22 21:15:54 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Nov 2021 16:15:54 -0500
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <CAGKkBkvg7w-HBVbabpuK3KpvdEPENn6VPz0H4=eKnFHjh5Hp-g@mail.gmail.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
 <B2E1A867-67FD-482B-8BA7-1791BCCFF783@oracle.com>
 <CAJq4Gi51c2C9xOqJs8kZxZJEiJc47bY5EyHpioCyXqJT+JNGOw@mail.gmail.com>
 <CAGKkBkvg7w-HBVbabpuK3KpvdEPENn6VPz0H4=eKnFHjh5Hp-g@mail.gmail.com>
Message-ID: <abb5d350-9b84-f345-5a7a-9c93909ff8aa@oracle.com>

Or, to put it another way: success looks like yet another "got the 
defaults wrong", where people should default to B2 unless they need B1, 
and "pure" joins the ranks of "final" and "private" of "I shoulda been 
the default."

Right, that's what you're saying?

On 11/22/2021 4:07 PM, Kevin Bourrillion wrote:
> On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga <heidinga at redhat.com> wrote:
>
>     I'll echo Brian's comment that I'd like to understand Kevin's use
>     cases better to see if there's something we're missing in the design /
>     a major use case that isn't being addressed that will cause user
>     confusion / pain.
>
>
> Sorry if I threw another wrench here!
>
> What I'm raising is only the wish that users can reasonably /default/ 
> to B2-over-B1 unless their use case requires something on our list of 
> "only B1 does this". And that list can be however long it needs to be, 
> just hopefully no longer. That's probably how we were looking at it 
> already.
>
> And sure, "need" sometimes can mean "it would have made translation 
> /way too/ complex and clever". Even if all we can say is "in principle 
> this /could/ be supported, but it just isn't and click here if you 
> /really care a lot/ to know the reasons why", it works and I suspect 
> most users wouldn't even click.
>
> Does that make perfect sense? Again, the thread just backed into the 
> topic sideways.
>
> -- 
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com  Tue Nov 23 01:04:41 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 23 Nov 2021 01:04:41 +0000
Subject: EG meeting, 2021-11-17
In-Reply-To: <F495FFC2-1F70-49E6-A025-E1419E5E2E6D@oracle.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <F495FFC2-1F70-49E6-A025-E1419E5E2E6D@oracle.com>
Message-ID: <BA60AE84-7A30-4103-83B8-3E4DFF5F5A4B@oracle.com>

Thanks, Brian, for many useful suggestions about the diagram.

I have updated it in place.  Its message should be clearer now.

On Nov 21, 2021, at 9:05 PM, John Rose <john.r.rose at oracle.com> wrote:

http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf


From daniel.smith at oracle.com  Tue Nov 23 01:13:14 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 23 Nov 2021 01:13:14 +0000
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <CAGKkBkvg7w-HBVbabpuK3KpvdEPENn6VPz0H4=eKnFHjh5Hp-g@mail.gmail.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
 <B2E1A867-67FD-482B-8BA7-1791BCCFF783@oracle.com>
 <CAJq4Gi51c2C9xOqJs8kZxZJEiJc47bY5EyHpioCyXqJT+JNGOw@mail.gmail.com>
 <CAGKkBkvg7w-HBVbabpuK3KpvdEPENn6VPz0H4=eKnFHjh5Hp-g@mail.gmail.com>
Message-ID: <1E60D4D1-5E09-445A-8A80-3DB5B2EF389A@oracle.com>

> On Nov 22, 2021, at 2:07 PM, Kevin Bourrillion <kevinb at google.com> wrote:
> 
>> On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga <heidinga at redhat.com> wrote:
>> 
>> I'll echo Brian's comment that I'd like to understand Kevin's use
>> cases better to see if there's something we're missing in the design /
>> a major use case that isn't being addressed that will cause user
>> confusion / pain.
>> 
> Sorry if I threw another wrench here!
> 
> What I'm raising is only the wish that users can reasonably default to B2-over-B1 unless their use case requires something on our list of "only B1 does this". And that list can be however long it needs to be, just hopefully no longer. That's probably how we were looking at it already.

Here's the current list, FYI (derived from JEP 401):

	- Implicitly final class, cannot be extended.
	- All instance fields are implicitly final, so must be assigned exactly once by constructors or initializers, and cannot be assigned outside of a constructor or initializer.
	- The class does not implement (directly or indirectly) IdentityObject. This implies that the superclass is either Object or a stateless abstract class.
	- No constructor makes a super constructor call. Instance creation will occur without executing any superclass initialization code.
	- No instance methods are declared synchronized.
	- (Possibly) The class does not implement Cloneable or declare a clone() method.
	- (Possibly) The class does not declare a finalize() method.
	- (Possibly) The constructor does not make use of this except to set the fields in the constructor body, or perhaps after all fields are definitely assigned.

And elaborating on IdentityObject & stateless abstract classes:

An abstract class can be declared to implement either IdentityObject or ValueObject; or, if it declares a field, an instance initializer, a non-empty constructor, or a synchronized method, it implicitly implements IdentityObject (perhaps with a warning). Otherwise, the abstract class extends neither interface and can be extended by both kinds of concrete classes.

(Such a "both kinds" abstract class has its ACC_PRIM_SUPER flag (name to be changed) set in the class file, along with an <init> method for identity classes.)
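Several rules in the list can be checked mechanically on a compiled class. Below is a rough, hypothetical lint (the name `B2Lint` and its messages are invented for illustration) built only on standard `java.lang.reflect` calls. It cannot see constructor bodies, so the super-constructor and `this` rules are out of scope, and "abstract with no instance fields" only approximates "stateless abstract class" (it ignores initializers, non-empty constructors, and synchronized methods in the superclass).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lint: reports which reflectively visible rules a class breaks.
final class B2Lint {
    static List<String> violations(Class<?> c) {
        List<String> out = new ArrayList<>();
        // Rule: implicitly final class.
        if (!Modifier.isFinal(c.getModifiers())) {
            out.add("class is not final");
        }
        // Rule: all instance fields final (static fields are fine).
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (!Modifier.isStatic(m) && !Modifier.isFinal(m)) {
                out.add("instance field not final: " + f.getName());
            }
        }
        // Rule: no synchronized instance methods.
        for (Method meth : c.getDeclaredMethods()) {
            int m = meth.getModifiers();
            if (!Modifier.isStatic(m) && Modifier.isSynchronized(m)) {
                out.add("synchronized instance method: " + meth.getName());
            }
        }
        // Rule (approximated): every superclass below Object is abstract
        // and declares no instance fields.
        for (Class<?> s = c.getSuperclass(); s != null && s != Object.class; s = s.getSuperclass()) {
            if (!Modifier.isAbstract(s.getModifiers())) {
                out.add("superclass is not abstract: " + s.getName());
            }
            for (Field f : s.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    out.add("superclass has instance field: " + s.getName() + "." + f.getName());
                }
            }
        }
        // "(Possibly)" rule: does not implement Cloneable.
        if (Cloneable.class.isAssignableFrom(c)) {
            out.add("implements Cloneable");
        }
        return out;
    }
}

// Sample inputs: a record passes the checks above; Counter fails three.
record Meters(double value) {}

class Counter {
    int count;
    synchronized void increment() { count++; }
}
```

A record passes because records are already final with final fields and extend the field-free abstract class `java.lang.Record`.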

From john.r.rose at oracle.com  Wed Nov 24 06:48:32 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 24 Nov 2021 06:48:32 +0000
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <1E60D4D1-5E09-445A-8A80-3DB5B2EF389A@oracle.com>
References: <EFB12BE2-E527-45D2-B6EF-549286C91CFE@oracle.com>
 <CAGKkBktbJw1nHS_=VAVH0h-b14hZkJTAn8ACWSwRzPppv-Dx1A@mail.gmail.com>
 <CAJq4Gi4yPksX9nacV=jyuYxrf7ezuBgUixAKrtmZmiWrGDVVZA@mail.gmail.com>
 <CAGKkBkvOgxurxC5Da4Wwq4jvSmKNGWNHxQM90P3KxAbJteHxMA@mail.gmail.com>
 <CAJq4Gi7mE1Di9t5W3sdM+G6qWYkOd3EJPZFxAjAULxNnFMuykQ@mail.gmail.com>
 <CAGKkBkv4JtLWyZjZfrk9skC1j-My1uKuj=ZY13T=YXrK2hU7nA@mail.gmail.com>
 <DEE3FD33-EA30-4CDA-ADA1-375EB0CA0AD3@oracle.com>
 <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
 <B2E1A867-67FD-482B-8BA7-1791BCCFF783@oracle.com>
 <CAJq4Gi51c2C9xOqJs8kZxZJEiJc47bY5EyHpioCyXqJT+JNGOw@mail.gmail.com>
 <CAGKkBkvg7w-HBVbabpuK3KpvdEPENn6VPz0H4=eKnFHjh5Hp-g@mail.gmail.com>
 <1E60D4D1-5E09-445A-8A80-3DB5B2EF389A@oracle.com>
Message-ID: <54F7E409-E87F-4E7C-B9DA-A26869CF22AC@oracle.com>

On Nov 22, 2021, at 5:13 PM, Dan Smith <daniel.smith at oracle.com> wrote:
> 
>> On Nov 22, 2021, at 2:07 PM, Kevin Bourrillion <kevinb at google.com> wrote:
>> 
>>> On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga <heidinga at redhat.com> wrote:
>>> 
>>> I'll echo Brian's comment that I'd like to understand Kevin's use
>>> cases better to see if there's something we're missing in the design /
>>> a major use case that isn't being addressed that will cause user
>>> confusion / pain.
>>> 
>> Sorry if I threw another wrench here!
>> 
>> What I'm raising is only the wish that users can reasonably default to B2-over-B1 unless their use case requires something on our list of "only B1 does this". And that list can be however long it needs to be, just hopefully no longer. That's probably how we were looking at it already.
> 
> Here's the current list, FYI (derived from JEP 401):
> 
> 	• Implicitly final class, cannot be extended.

JVMS requires ACC_FINAL on class.

> 	• All instance fields are implicitly final, so must be assigned exactly once by constructors or initializers, and cannot be assigned outside of a constructor or initializer.

JVMS requires ACC_FINAL on every instance field.  (Static fields OK.)

> 	• The class does not implement (directly or indirectly) IdentityObject. This implies that the superclass is either Object or a stateless abstract class.

JVMS requires a check for this.

> 	• No constructor makes a super constructor call. Instance creation will occur without executing any superclass initialization code.

JVMS rules for invokespecial <init> must exclude this.

> 	• No instance methods are declared synchronized.

JVMS forbids ACC_SYNC. on all instance methods.  (Static methods OK.)

> 	• (Possibly) The class does not implement Cloneable or declare a clone() method.
> 	• (Possibly) The class does not declare a finalize() method.

A conservative move is to forbid these things, in language and JVMS.
Minor precedent:  record has similar special cases (for component names).

> 	• (Possibly) The constructor does not make use of this except to set the fields in the constructor body, or perhaps after all fields are definitely assigned.

JVMS doesn't care about this.

The private opcodes initialvalue and withfield work to set up "this"
as the constructor executes.  It's OK to sample the value at any time,
but maybe the language says, "don't do that".

I think there are use cases for private methods to work on partially
initialized stuff.  The theory is tricky.  OK to be conservative now
and more lenient later.
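To make the "sample the value at any time" point concrete: in today's Java a constructor may already call an instance method before its final fields are assigned, and that method observes default values. A minimal sketch (the classes PartialInit and Range are hypothetical) takes one sample before and one after the fields are assigned; the strict reading of the rule in the list would reject both calls, while the more lenient "after all fields are definitely assigned" reading would allow only the second.

```java
// Today's Java: a constructor may call an instance method before its final
// fields are assigned, and that method observes default (zero) field values.
public final class PartialInit {

    static final class Range {
        private final int lo;
        private final int hi;
        final String early;  // sampled before any of lo/hi is assigned
        final String late;   // sampled after lo and hi are definitely assigned

        Range(int lo, int hi) {
            this.early = describe();  // lo and hi still hold their default 0
            this.lo = lo;
            this.hi = hi;
            this.late = describe();   // lo and hi now hold the constructor arguments
        }

        String describe() {
            return "[" + lo + ", " + hi + "]";
        }
    }

    public static void main(String[] args) {
        Range r = new Range(3, 7);
        System.out.println(r.early); // [0, 0]
        System.out.println(r.late);  // [3, 7]
    }
}
```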

> 
> And elaborating on IdentityObject & stateless abstract classes:
> 
> An abstract class can be declared to implement either IdentityObject or ValueObject; or, if it declares a field, an instance initializer, a non-empty constructor, or a synchronized method, it implicitly implements IdentityObject (perhaps with a warning).

JVMS should enforce corresponding structural rules on loaded classfiles.
Neither a source class-or-interface nor a loaded classfile can ever
implement both IO and VO at the same time.

As a special feature in the JVM I want an explicit form for these
"empty constructors".  We've discussed this; I'm not sure which form
is best, but I don't want it to be a "not-really-empty" constructor which
has a super-call in it; that's what seemingly "empty" constructors look
like today to the JVM.

The JVM should both allow and require an empty constructor if
and only if the abstract class implements VO.  (Alternative:
The JVM implicitly injects VO if it sees an empty constructor,
and if it sees VO it looks for an empty constructor.)

IIRC maybe our last consensus was to add an attribute to an
<init> method of signature ()V that says, "whatever you think
you see in this method, Mr. VM, please also feel free to skip it."
That's a more hacky way to specify an empty constructor than
would be my preference (which is an ACC_ABSTRACT <init>()V
or even a zero-length class attribute).  If a VO-only abstract
has an <init>()V method, that's a smell, because it will never
be used!  OTOH, maybe just being a VO-only abstract class is
enough to tell the JVM that the constructor is empty, with
no further markings.  Anyway, there's a little corner of the
design space to consider here.

> Otherwise, the abstract class extends neither interface and can be extended by both kinds of concrete classes.

Such a class is very handy.  It needs *both kinds of constructors*.

Are you thinking that just mentioning the special VO super is
enough to trigger inclusion of an empty constructor?  That's
probably a good move.  Is this the *only* way to request an
empty constructor, or is there a way to make an explicit
empty constructor?  (I mean a really-empty one, not just
today's seemingly-empty ones.  Even Object's empty constructor
has a return instruction, so it's not really empty.)

> (Such a "both kinds" abstract class has its ACC_PRIM_SUPER (name to be changed) flag set in the class file, along with an <init> method for identity classes.)

Yes, that makes sense.  So maybe a VO-capable abstract class
is always assumed to have an implicit empty constructor,
even if there is no other marking than the PRIM_SUPER?
I guess that's OK for the JVM.  For the source language it
might be too magic.


From daniel.smith at oracle.com  Tue Nov 30 00:09:06 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 30 Nov 2021 00:09:06 +0000
Subject: JEP update: Value Objects
Message-ID: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>

I've been exploring possible terminology for "Bucket 2" classes, the ones that lack identity but require reference type semantics.

Proposal: *value classes*, instances of which are *value objects*

The term "value" is meant to suggest an entity that doesn't rely on mutation, uniqueness of instances, or other features that come with identity. A value object with certain field values is the same (per ==), now and always, as every "other" value object with those field values.

(A value object is *not* necessarily immutable all the way down, because its fields can refer to identity objects. If programmers want clean immutable semantics, they shouldn't write code (like 'equals') that depends on these identity objects' mutable state. But I think the "value" term is still reasonable.)

This feels like it may be an intuitive way to talk about identity without resorting to something verbose and negative like "non-identity".
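The definition above, where a value object is the same (per ==) as every other value object with the same field values, differs from how records behave today. A minimal sketch in current Java (16+) shows the gap: a record's equals is already field-based, but == still compares identity. The helper sameValue is hypothetical and only approximates the proposed == for a record whose fields are all primitives.

```java
// Current Java: records give field-based equals(), but == still compares identity.
public final class ValueEqualityDemo {

    record Point(int x, int y) {}

    // Hypothetical helper approximating the proposed value-object ==:
    // two instances are "the same" iff all their fields are the same.
    static boolean sameValue(Point a, Point b) {
        return a.x() == b.x() && a.y() == b.y();
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        System.out.println(p == q);          // false: two distinct identities today
        System.out.println(p.equals(q));     // true: record equals compares components
        System.out.println(sameValue(p, q)); // true: what == would report for value objects
    }
}
```

For a field of identity-class type, the comparison would be == on that field's reference, which is why a value object is not necessarily immutable all the way down.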

If you've been following along all this time, there's potential for confusion: a "value class" has little to do with a "primitive value type", as we've used the term in JEP 401. We're thinking the latter can just become "primitive type", leading to the following two-axis interpretation of the Valhalla features:

---------------------------------------------------------------------------------------------
Value class reference type (B2 & B3.ref)   | Identity class type (B1)
---------------------------------------------------------------------------------------------
Value class primitive type (B3)            |
---------------------------------------------------------------------------------------------

Columns: value class vs. identity class. Rows: reference type vs. primitive type. (Avoid "value type", which may not mean what you think it means.)

Fortunately, the renaming exercise is just a problem for those of us who have been closely involved in the project. Everybody else will approach this grid with fresh eyes.

(Another old term that I am still finding useful, perhaps in a slightly different way: "inline", describing any JVM implementation strategy that encodes value objects directly as a sequence of field values.)

Here's a new JEP draft that incorporates this terminology and sets us up to deliver Bucket 2 classes, potentially as a separate feature from Bucket 3:

https://bugs.openjdk.java.net/browse/JDK-8277163

Much of JEP 401 ends up here; a revised JEP 401 would just talk about primitive classes and types as a special kind of value class.


From john.r.rose at oracle.com  Tue Nov 30 06:53:56 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 30 Nov 2021 06:53:56 +0000
Subject: JEP update: Value Objects
In-Reply-To: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
Message-ID: <F744E563-907B-45B3-9601-4A7C8BB2777A@oracle.com>

Two points from me for the record:

1. I re-read the JEP draft now titled Value Objects, and liked everything I saw, including the new/old term "Value" replacing "Pure" and "Inline".

2. In your mail, and in the companion JEP draft titled Primitive Objects, you refer to "primitive classes" and their objects.  It would make our deliberations simpler, IMO, if we were to title this less prescriptively as "Primitives" or "Primitive Types" or "Primitive Types and Values", rather than "Primitive Classes", because (a) there's no logical need for the new things to be classes, and (b) it might actually be helpful for them *not* to be, in the end, after deliberation.  Putting the word "classes" in the title presupposes an answer to deliberations that have not yet been concluded.

People should note that the terms "class" and "object" are only loosely bound to the term "primitive" in most of our designs, since (of course) today no primitives at all are either defined by classes or have objects.  They have corresponding reference or box classes and objects, to be precise.  Today a primitive type "has a class" but it is not the case that it "is a class".  We could choose to preserve this state of affairs instead of fixing it by making "classes everywhere"; it makes some dependent choices easier to make.  As you know, one possible bridge to the future is, "Today all types are a disjoint union of primitives, classes, and interfaces, and tomorrow the same will be true, with all three possessing class-like declarations."

What about objects, shouldn't primitives at least be objects?  Well, interfaces don't directly have objects today; they have objects of implementing classes.  Likewise, primitives need never have objects directly, as long as they have objects which properly relate to them: their boxes.  Boxes-boxes-everywhere certainly has its downsides, including pedagogical downsides, but that doesn't make it a non-starter.

Instead, if we choose to use the terms "primitive class" and "primitive object" as exact counterparts to "reference class" and "reference object", as your chart suggests, Dan, we will have to account for the duplication and/or ad hoc division of various attributions of classes and objects between the "primitive class" and its corresponding "reference class" (e.g., int.ref, Point.ref).  I think a good leading question is, "if a primitive is a class, and its reference type is also a class, which of its methods are situated on the primitive class, and which are situated on the reference class?"  I would suggest that we be more sure we want to have two classes per primitive, or only-a-primitive-class per primitive, before we presuppose a decision by putting the word "Classes" in the title of JEP 402.



From john.r.rose at oracle.com  Tue Nov 30 07:05:22 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 30 Nov 2021 07:05:22 +0000
Subject: JEP update: Value Objects
In-Reply-To: <F744E563-907B-45B3-9601-4A7C8BB2777A@oracle.com>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
 <F744E563-907B-45B3-9601-4A7C8BB2777A@oracle.com>
Message-ID: <DBFCF622-1E7E-4CB1-A824-4420F6630127@oracle.com>

P.S. I'd like to emphasize that none of my pleas for caution apply to the
JEP draft titled Value Objects.

That very nice JEP draft merely links to the JEP draft titled Primitive Classes,
which is the JEP with the potential problem I'm taking pains to point out here.

Also, I'm not really demanding a title change here, Dan, but rather asking
everyone to be careful about any presupposition that "of course we will
heal the rift by making all primitives be classes".  Or even "all primitives
be objects."  Those are easy ideas to fall into by accident, and I don't want
us to get needlessly muddled about them as we sort them out.

(Having picked Value as the winner for the first JEP, replacing Primitive
Objects with Primitive Values in the second JEP is not exactly graceful,
is it?  Naming is hard.  If you were to change the title I suggest simply
"Primitives" as the working title, until we figure out exactly what we
want these Primitives to be, relative to other concepts.  Just a suggestion.)



From daniel.smith at oracle.com  Tue Nov 30 18:13:55 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 30 Nov 2021 18:13:55 +0000
Subject: JEP update: Value Objects
In-Reply-To: <DBFCF622-1E7E-4CB1-A824-4420F6630127@oracle.com>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
 <F744E563-907B-45B3-9601-4A7C8BB2777A@oracle.com>
 <DBFCF622-1E7E-4CB1-A824-4420F6630127@oracle.com>
Message-ID: <D4DB4E8A-5AE6-423D-969B-3CE0BB37DCD6@oracle.com>

On Nov 30, 2021, at 12:05 AM, John Rose <john.r.rose at oracle.com> wrote:

> Also, I'm not really demanding a title change here, Dan, but rather asking
> everyone to be careful about any presupposition that "of course we will
> heal the rift by making all primitives be classes".  Or even "all primitives
> be objects."  Those are easy ideas to fall into by accident, and I don't want
> us to get needlessly muddled about them as we sort them out.

+1

I've been defaulting in descriptions like my two-axis grid to the plan of record, until we settle on a revised plan. But it's quite possible that "class" is not the right word for the second row.

(As for JEP 401, it will need to be revised to build on the Value Objects JEP. What you're seeing right now is unchanged from a few months ago. An updated iteration to come...)