From brian.goetz at oracle.com Tue Nov 2 21:18:46 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 2 Nov 2021 17:18:46 -0400
Subject: Consolidating the user model
Message-ID:

We've been grinding away, and we think we have a reduced-complexity user model. This is all very rough, and there's lots we need to write up more carefully first, but I'm sharing this as a preview of how we can simplify past where JEPs 401 and 402 currently stand.

# Consolidating the user model

As the mechanics of primitive classes have taken shape, it is time to take another look at the user model.

Valhalla started with the goal of providing user-programmable classes which could be flat and dense in memory. Numerics are one of the motivating use cases, but adding new primitive types directly to the language has a very high barrier. As we learned from [Growing a Language][growing], there are infinitely many numeric types we might want to add to Java, but the proper way to do that is as libraries, not as language features.

In the Java language as we have it today, objects and primitives are different in almost every way: objects have identity, primitives do not; objects are referred to through references, primitives are not; object references can be null, primitives cannot; objects can have mutable state, primitives cannot; classes can be extended, primitive types cannot; loading and storing of object references is atomic, but loading and storing of large primitives is not. For obvious reasons, the design center has revolved around the characteristics of primitives, but the desire to have it both ways is strong; developers continue to ask for variants of primitive classes that have a little more in common with traditional classes in certain situations. These include:

- **Nullability.** By far the most common concern raised about primitive classes, which "code like a class", is the treatment of null; many developers want the benefits of flattening but want at least the option to have `null` as the default value, and to get an exception when an uninitialized instance is used.
- **Classes with no sensible default.** Prior to running the constructor, the JVM initializes all memory to zero. Since primitive classes are routinely stored directly rather than via reference, it is possible that users might be exposed to instances in this initial, all-zero state, without a constructor having run. For numeric classes such as complex numbers, zero is a fine default, and indeed a good default. But for some classes, not only is zero not the best default, but there _is no good default_. Storing dates as seconds-since-epoch would mean uninitialized dates are interpreted as Jan 1, 1970, which is more likely to be a bug than the desired behavior. Classes may try to reject bad values in their constructor, but if a class has no sensible default, then it would rather have a default that behaves more like null, where you get an error if you dereference it. And if the default is going to behave like null, it's probably best if the default _is_ null.
- **Migration.** Classes like `Optional` and `LocalDate` today are _value-based_, meaning they already disavow the use of object identity and therefore are good candidates for being primitive classes. However, since these classes exist today and are used in existing APIs and client code, they would have additional compatibility constraints. They would have to continue to be passed by object references to existing API points (otherwise the invocation would fail to link) and these types are already nullable.
- **Non-tearability.** 64-bit primitives (`long` and `double`) risk _tearing_ when accessed under race unless they are declared `volatile`. However, objects with final fields offer special initialization-safety guarantees under the JMM, even under race. So should primitive classes be more like primitives (risking being seen to be in impossible states), or more like classes (consistent views for immutable objects are guaranteed, even under race)? Tear-freedom has potentially significant costs, and tearing has significant risks, so it is unlikely one size fits all.
- **Direct control over flattening.** In some cases, flattening is counterproductive. For example, if we have a primitive class with many fields, sorting a flattened array may be more expensive than sorting an array of references; while we don't pay the indirection costs, we do pay for increased footprint, as well as increased memory movement when swapping elements. Similarly, if we want to permute an array with a side index, it may well be cheaper to maintain an array of references rather than copying all the data into a separate array.

These requests are all reasonable when taken individually; it's easy to construct use cases where one would want it both ways for any given characteristic. But having twelve knobs (and 2^12 possible settings) on primitive classes is not a realistic option, nor does it result in a user model that is easy to reason about.

In the current model, a primitive class is really like a primitive -- no nulls, no references, always flattened, tearable when large enough. Each primitive class `P` comes with a companion reference type (`P.ref`), which behaves much as boxes do today (except without identity). There is also, for migration, an option (`ref-default`) to invert the meaning of the unqualified name, so that by default `Optional` means `Optional.ref`, and flattening must be explicitly requested (which, in turn, is the sole motivation for the `P.val` denotation). We would like for the use of the `.ref` and `.val` qualifiers to be rare, but currently they are not rare enough for comfort.
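The side-index point under **Direct control over flattening** can be sketched in today's Java. This is an illustration under stated assumptions: `Wide` and `SideIndexSort` are hypothetical names, and a plain Java array of records is an array of references, so the sketch shows only the side-index technique, not any Valhalla layout. Sorting an index array moves small ints; with a flattened array of a many-field class, every swap would instead copy all of an element's fields.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical wide value: with many fields, each element would be costly to
// copy if an array of these were laid out flat.
record Wide(long key, long a, long b, long c, long d, long e, long f) {}

final class SideIndexSort {
    // Returns positions into data ordered by key; the Wide elements are never moved.
    static int[] sortedOrder(Wide[] data) {
        // Integer[] (boxed) only because Arrays.sort has no int[] + Comparator overload.
        Integer[] idx = new Integer[data.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingLong(i -> data[i].key()));
        return Arrays.stream(idx).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Wide[] data = {
            new Wide(30, 0, 0, 0, 0, 0, 0),
            new Wide(10, 0, 0, 0, 0, 0, 0),
            new Wide(20, 0, 0, 0, 0, 0, 0),
        };
        System.out.println(Arrays.toString(sortedOrder(data)));  // [1, 2, 0]
    }
}
```

Elements are then read in sorted order as `data[order[k]]`; the trade-off is one extra indirection per access in exchange for never copying the wide elements.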
Further, we've explored, but have not committed to, a means of declaring primitive classes that don't like their zero value (for primitive classes with no good default), so that dereferencing a zero value would result in some sort of exception. (The nullability question is really dominated by the initialization safety question.) This would be yet another variant of primitive class.

A serious challenge to this stacking is the proliferation of options; there are knobs for nullability, zero-hostility, migration, tear-resistance, etc. Explaining when to use which at the declaration site is already difficult, and there is also the challenge of when to use `ref` or `val` at the use site. The current model has done well at enumerating the requirements (and at helping us separate the real ones from the wannabes), so it is now time to consolidate.

## Finding the buckets

Intuitively, we sense that there are three buckets here: traditional identity classes in one bucket, traditional primitives (coded like classes) in another, and a middle bucket that offers some "works like an int" benefits but with some of the affordances (e.g., nullability, non-tearability) of the first.

Why have multiple buckets at all? Project Valhalla has two main goals: better performance (enabling more routine flattening and better density), and unifying the type system (healing the rift between primitives and objects). It's easy to talk about flattening, but there really are at least three categories of flattening, and different ones may be possible in different situations:

- **Heap flattening.** Inlining the layout of one object into another object (or array) layout; when class `C` has a field of type `D`, rather than indirecting to a `D`, we inline D's layout directly into C.
- **Calling convention flattening.** Shredding a primitive class into its fields in (out-of-line) method invocations on the call stack.
- **IR flattening.** When calling a method that allocates a new instance and returns it, eliding the allocation and shredding it into its fields instead. This only applies when we can inline through from the allocation to the consumption of its fields. (Escape analysis also allows this form of flattening, but only for provably non-escaping objects. If we know the object is identity-free, we can optimize in places where EA would fail.)

#### Nullability

Variables in the heap (fields and array elements) must have a default value; for all practical purposes it is a forced move that this default value is the all-zero-bits value. This zero-bits value is interpreted as `null` for references, zero for numerics, and `false` for booleans today.

If primitives are to "code like a class", the constructor surely must be able to reject bad proposed states. But what if the constructor thinks the default value is a bad state? The desire to make some primitive classes nullable stems from the reality that for some classes, we'd like a "safe" default -- one that throws if you try to use it before it is initialized.

But the "traditional" primitives are not nullable, and for good reason: zero is a fine default value, and the primitives we have today typically use all their bit patterns, meaning that arranging for a representation of null requires at least an extra bit. A nullable `long` would therefore take at least 65 bits, which in practice means 128 bits most of the time.

So we see nullability is a tradeoff; on the one hand, it gives us protection from uninitialized variables, but it also has costs -- extra footprint, extra checks. We experimented with a pair of modifiers, `null-default` and `zero-default`, which would determine how the zero value is interpreted. But this felt like solving the problem at the wrong level.
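As a rough sketch of the footprint cost described above, the hypothetical `NullableLongArray` below emulates a flat array of nullable `long` values with the most obvious encoding: a 64-bit payload plus one synthetic "present" bit per element. It assumes nothing about how a real VM would encode a flat null; it only models the semantics the text asks for, where a never-assigned slot reads back as `null` instead of a zero that, for a seconds-since-epoch date, would silently mean January 1, 1970.

```java
import java.time.Instant;
import java.util.BitSet;

// Hypothetical emulation of a flat array of nullable longs: 64 payload bits
// per element plus one synthetic "present" bit kept in a side BitSet.
final class NullableLongArray {
    private final long[] payload;  // zero-initialized, like any heap array
    private final BitSet present;  // all bits clear, i.e. every slot starts as "null"

    NullableLongArray(int length) {
        payload = new long[length];
        present = new BitSet(length);
    }

    void set(int i, Long value) {
        if (value == null) {
            present.clear(i);
            payload[i] = 0L;
        } else {
            payload[i] = value;
            present.set(i);
        }
    }

    // Returns null for never-assigned slots instead of a misleading zero.
    Long get(int i) {
        return present.get(i) ? payload[i] : null;
    }

    public static void main(String[] args) {
        long[] raw = new long[1];  // plain zero-default storage
        System.out.println(Instant.ofEpochSecond(raw[0]));  // 1970-01-01T00:00:00Z

        NullableLongArray dates = new NullableLongArray(2);
        dates.set(1, 1_000_000_000L);
        System.out.println(dates.get(0));  // null
        System.out.println(Instant.ofEpochSecond(dates.get(1)));  // 2001-09-09T01:46:40Z
    }
}
```

The extra bit per element is the "at least 65 bits" from the text; a real VM would aim to hide it in spare bits rather than pay for it separately.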
#### Tearing

The Java Memory Model includes special provisions for visibility of final fields, even when the reference to their container object is shared via a data race. These initialization safety guarantees are the bedrock of the Java security model; a String being seen to change its value -- or to not respect invariants established by its constructor -- would make it nearly impossible to reason about security.

On the other hand, longs and doubles permit tearing when shared via data races. This isn't great, but preventing tearing has a cost, and the whole reason we got primitives in 1995 was driven by expectations and tradeoffs around arithmetical performance. Preventing tearing is still quite expensive; above 64 bits, atomic instructions have a significant tax, and often the best way to manage tearing is via an indirection when stored in the heap (which is precisely what flattening is trying to avoid).

When we can code primitives "like a class", which should they be more like? It depends! Classes that are more like numerics may be willing to tolerate tearing for the sake of improved performance; classes that are more like "traditional classes" will want the initialization safety already afforded to immutable objects.

So we see tearability is a tradeoff; on the one hand, tear protection preserves invariants under data races, but it also has costs -- expensive atomic instructions, or reduced heap flattening. We experimented with a modifier that marks classes as non-tearable, but this would require users to keep track of which primitive classes are tearable and which aren't. This felt like solving the problem at the wrong level.

#### Migration

There are some classes -- such as `java.lang.Integer`, or `java.util.Optional` -- that meet all the requirements to be declared as (nullable) primitive classes, but which exist today as identity classes.
We would like to be able to migrate these to primitives to get the benefits of flattening, but are constrained in that (at least for non-private API points) they must be represented as `L` descriptors for reasons of binary compatibility. Our existing interpretation of `L` descriptors is that they represent references as pointers; this means that even if we could migrate these types, we'd still give up on some forms of flattening (heap and stack), and our migration would be less than ideal.

Worse, the above interpretation of migration suggests that sometimes a use of `P` is translated as `LP`, and sometimes as `QP`. To the degree that there is uncertainty in whether a given source type translates to an `L` or `Q` descriptor, this flows into either uncertainty in how to use reflection (users must guess whether a given API point using `P` was translated with `LP` or `QP`), or uncertainty on the part of reflection (the user calls `getMethod(P.class)`, and reflection must consider methods that accept both `LP` and `QP` descriptors).

## Restacking for simplicity

The various knobs on the user model (which may flow into translation and reflection) risk being death by 1000 cuts; they not only add complexity to the implementation, but they add complexity for users. This prompted a rethink of assumptions at every layer.

#### Nullable primitives

The first part of the restacking involved relaxing the assumption that primitive classes are inherently non-nullable. We shied away from this for a long time, knowing that there would be significant VM complexity down this road, but in the end concluded that the complexity is better spent here than elsewhere. These might be translated as `Q` descriptors, or might be translated as `L` descriptors with a side channel for preloading metadata -- stay tuned for a summary of this topic.

> Why Q?
> The reason we have `Q` descriptors at all is that we need to know things about classes earlier than we otherwise would, in order to make decisions that are hard to unmake later (such as layout and calling convention). Rather than interpreting `Q` as meaning "value type" (as the early prototypes did), `Q` acquired the interpretation "go and look." When the JVM encounters a field or method descriptor with a `Q` in it, rather than deferring classloading as long as possible (as is the case with `L` descriptors), we load the class eagerly, so we can learn all we need to know about it. From classloading, we might learn not only that it is a primitive class, but also whether it should be nullable or not. (Since primitive classes are monomorphic, carrying this information around on a per-linkage basis is cheap enough.)

So some primitive classes are marked as "pure" primitives, and others as supporting null; when the latter are used as receivers, `invokevirtual` does a null check prior to invocation (and throws NPE if the receiver is null). When moving values between the heap and the stack (`getfield`, `aastore`, etc.), these bytecodes must check for the "flat null" representation in the heap and a real null on the stack. The VM needs some help from the classfile to choose a bit pattern for the flat null; the most obvious strategy is to inject a synthetic boolean, but there are others that don't require additional footprint (e.g., flow analysis that proves a field is assigned a non-default value; using low-order bits in pointers; using spare bits in booleans; using pointer colors; etc.). The details are for another day, but we would like for this to not intrude on the user model.

#### L vs Q

The exploration into nullable primitives prompted a reevaluation of the meaning of L vs Q. Historically we had interpreted L vs Q as being "pointer vs flat" (though the VM always has the right to unflatten if it feels like it).
But over time we've been moving towards Q mostly being about earlier loading (so the VM can learn what it needs to know before making hard-to-reverse decisions, such as layout). So let's go there fully.

A `Q` descriptor means that the class must be loaded eagerly (Q for "quick") before resolving the descriptor; an `L` descriptor means it _must not be_ (L for "lazy"), consistent with current JVMS treatment. Since an `L` descriptor is lazily resolved, we have to assume conservatively that it is nullable; a `Q` descriptor might or might not be nullable (we'll know once we load the class, which we do eagerly).

We have wrested control of flatness away from the language and ceded it to the VM, where it belongs. The user/language expresses semantic requirements (e.g., nullability) and the VM chooses a representation. That's how we like it.

#### It's all about the references

The rethink of L vs Q enabled a critical restack of the user model. With this reinterpretation, Q descriptors can (based on what is in the classfile) still be reference types -- and these reference types can still be flattened; alternatively, with side channels for preload metadata on `L` descriptors, we may be able to get to flattened references under `L` descriptors.

Returning to the tempting user knobs of nullability and tearability, we can now put these where they belong: nullability is a property of _reference types_ -- and some primitive classes can be reference types. Similarly, the initialization safety of immutable objects derives from the fact that object references are loaded atomically (with respect to stores of the same reference). Non-tearability is also a property of reference types. (The same goes for layout circularity; references can break layout circularities.)
So rather than the user choosing nullability and non-tearability as ad-hoc choices, we treat them as affordances of references, and let users choose between reference-only primitive classes and the more traditional primitive classes that come in both reference and value flavors.

> This restack allows us to eliminate `ref-default` completely (we'll share more details later), which in turn allows us to eliminate `.val` completely. Further, the use cases for `.ref` become smaller.

#### The buckets

So, without further ado, let's meet the new user model. The names may change, but the concepts seem pretty sensible. We have identity classes, as before; let's call that the first bucket. These are unchanged; they are always translated with L descriptors, and there is only one usable `Class` literal for these.

The second bucket is the _identity-free reference classes_. They come with the restrictions on identity-free classes: no mutability and limited extensibility. Because they are reference types, they are nullable and receive tearing protection. They are flattenable (though, depending on layout size and hardware details, we may choose to get tearing protection by maintaining the indirection). These might be translated with Q descriptors, or with modified L descriptors, but there is no separate `.ref` form (they're already references) and there is only one usable `Class` literal for these.

The third bucket is the _true primitives_. These are also identity-free classes, but they further give rise to both value and reference types, and the value type is the default (we denote the reference type with the familiar `.ref`). Value types are non-nullable, and permit tearing just as existing primitives do. The `.ref` type has all the affordances of reference types -- nullability and tearing protection. The value type is translated with Q; the reference type is translated with L.
There are two mirrors (`P.class` and `P.ref.class`) to reflect the difference in translation and semantics.

A valuable aspect of this translation strategy is that there is a deterministic, 1:1 correspondence between source types and descriptors.

How we describe the buckets is open to discussion; there are several possible approaches. One possible framing is that the middle bucket gives up identity, and the third further gives up references (which can be clawed back with `.ref`), but there are plenty of ways we might express it. If these are expressed as modifiers, then they can be applied to records as well.

Another open question is whether we double down on, or abandon, the terminology of boxing. On the one hand, users are familiar with it, and the new semantics are the same as the old semantics; on the other, the metaphor of boxing is no longer accurate, and users surely have a lot of mental baggage that says "boxes are slow." We'd like for users to come to a better understanding of the difference between value and reference types.

#### Goodbye, direct control over flattening

In earlier explorations, we envisioned using `X.ref` as a way to explicitly ask for no flattening. But in the proposed model, flattening is entirely under the control of the VM -- where we think it belongs.

#### What's left for .ref?

A pleasing outcome here is that many of the use cases for `X.ref` are subsumed into more appropriate mechanisms, leaving a relatively small set of corner-ish cases. This is what we'd hoped `.ref` would be -- something that stays in the corner until summoned. The remaining reasons to use `X.ref` at the use site include:

- Boxing. Primitives have box objects; strict value-based classes need companion reference types for all the same situations as today's primitives do. It would be odd if the box were non-denotable.
- Null-adjunction. Some methods, like `Map::get`, return null to indicate no mapping was present.
  But if in `Map`, `V` is not nullable, then there is no way to express this method. We envision that such methods would return `V.ref`, so that strict value-based classes would be widened to their "box" on return, and null would indicate no mapping present.
- Cycle-breaking. Primitives that are self-referential (e.g., linked list node classes that have a next node field) would have layout circularities; using a reference rather than a value allows the circularity to be broken.

This list is (finally!) as short as we would like it to be, and devoid of low-level control over representation; users reach for `X.ref` when they need references (either for interop with reference types, or to require nullability). Our hope all along was that `.ref` was mostly "break glass in case of emergency"; I think we're finally there.

#### Migration

The topic of migration is a complex one, and I won't treat it fully here (the details are best left until we're fully agreed on the rest). Earlier treatments of migration were limited, in that even with all the complexity of `ref-default`, we still didn't get all the flattening we wanted, because the laziness of `L` descriptors kept us from knowing about potential flattenability until it was too late. Attempts to manage "preload lists" or "side preload channels" in previous rounds foundered due to complexity or corner cases, but the problem has gotten simpler, since we're only choosing representation rather than value sets now -- which means that the `L*` types might work out here. Stay tuned for more details.

## Reflection

Earlier designs all included some non-intuitive behavior around reflection. What we'd like to do is align the user-visible types, the reflection literals, and the descriptors, following the invariant that

    new X().getClass() == X.class

## TBD

Stay tuned for some details on managing null encoding and detection, reference types under either Q or modified L descriptors, and some thoughts on painting the bikeshed.
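For comparison, the reflection invariant and the `Map::get` null-adjunction case can both be observed with today's types. The plain-Java sketch below (class names are illustrative, and no proposed syntax such as `.ref` is used) checks that `getClass()` on a new instance returns the class literal, that `int.class` and `Integer.class` are already two distinct mirrors for one primitive (roughly the split that `P.class` and `P.ref.class` would make explicit), and that `Map::get` relies on a nullable result type to report a missing mapping.

```java
import java.util.HashMap;
import java.util.Map;

final class MirrorsToday {
    record Point(int x, int y) {}

    public static void main(String[] args) {
        // The invariant from the text, for an ordinary class.
        System.out.println(new Point(1, 2).getClass() == Point.class);  // true

        // Two mirrors for one primitive: the primitive type and its box.
        System.out.println(int.class == Integer.class);  // false
        System.out.println(int.class.isPrimitive());     // true
        System.out.println(Integer.TYPE == int.class);   // true

        // getClass() on a primitive value can only report the box,
        // because the value is boxed before the call.
        Object boxed = 42;
        System.out.println(boxed.getClass() == Integer.class);  // true

        // Null-adjunction: get() must return the nullable (boxed) type,
        // because null is how "no mapping" is signalled.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        Integer missing = counts.get("b");
        System.out.println(missing);  // null
    }
}
```

Note that `int.class` already has the static type `Class<Integer>`, which is part of why `getClass()` and class literals for primitives are a delicate spot in the design.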
[growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621

From john.r.rose at oracle.com Tue Nov 2 21:54:17 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 2 Nov 2021 21:54:17 +0000
Subject: Consolidating the user model
In-Reply-To:
References:
Message-ID:

+100; great summary

> On Nov 2, 2021, at 2:18 PM, Brian Goetz wrote:
>
> which means that the `L*` types might work out here. Stay tuned for more details.

A footnote, FTR, about L*-descriptors, in case it doesn't ring a bell. Brian is referring here to the thing we talked about several years ago, of loosely coupling a side-record with an occurrence of L-Foo that means "link like L-Foo, but load and adapt like Q-Foo". We went through some of these iterations even before we settled on Q-descriptors; they are back again, but in a far more tractable form, we think.

L* is not a new descriptor, it's just an L (so it links to plain L's) but some sort of star-like marking * (not really in the descriptor string but a side channel!) alerts the JVM to do extra loading and adapting.

So, one current vision of this side channel is a very limited early use of the "Type Restriction" mechanism, as mentioned in the Parametric VM proposal and elsewhere. The idea is that a type L*-Foo would be TR-ed to itself (Foo.class) and since TRs use eager loading (of the content of the TR, not of the type it applies to) the effect would be similar to a Q-Foo, but it would still be spelled L-Foo. To avoid implementation burdens, the JVM would not accept any more "interesting" TRs, until we need to build them out for specialized generics. Or we'd just have a one-shot, purpose-built side channel which smells like an infant sibling to an eventual real T.R. feature.

A T.R. that really restricts a type (instead of just asking the JVM to take a closer look a la Q-Foo) is a much deeper implementation challenge, since it creates possible failure points when restrictions are violated. An L* cannot violate itself since the value set is the same. This is why L* only works on the middle bucket.

L*-Foo (using TRs or any other side channel) is not a perfect substitute for Q-Foo, because the stars "rub off too easily" to ensure rigid correspondence between callers and callee. This means L*-based API linkage requires more speculation and runtime checking, compared to Q-based API linkage.

Although it may seem odd, there are a number of practical reasons to use L* in the middle bucket but Q in the left bucket. The left bucket needs two descriptors, so L/Q. The middle bucket has just one class mirror, so either Q or else a mix of L and L*, and it needs some story for migration for a few of its citizens, so L* looks good again (linking with legacy L with a dynamic mixup). As Brian says, we may elect to use Q uniformly for the middle bucket, and handle the migration problem another way. It would be good if we could decide Q vs. L* for the middle bucket without co-solving the migration problem.

Anyway, such smaller details are up in the air. The points in Brian's message are the high-order bits, and the stuff I've shared here is a footnote. Please do give the high-order bits your best attention. It's a really good write-up.

-- John

From brian.goetz at oracle.com Tue Nov 2 21:58:37 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 2 Nov 2021 17:58:37 -0400
Subject: Consolidating the user model
In-Reply-To:
References:
Message-ID: <4388956e-f5a0-13ad-750f-cf15ba74b630@oracle.com>

"Links like an L; works like a Q"

On 11/2/2021 5:54 PM, John Rose wrote:
> L* is not a new descriptor, it's just an L (so it links to plain L's)
> but some sort of star-like marking * (not really in the descriptor
> string but a side channel!) alerts the JVM to do extra loading
> and adapting.
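To make the "Q for quick, L for lazy" reading concrete, the sketch below scans the parameter list of a method descriptor and reports which class names would be loaded eagerly. It is hypothetical throughout: `DescriptorScan` is an illustrative name, `Q` descriptors exist only in the Valhalla prototypes and not in the current JVMS, and an `L*` marking travels in a side channel, so by construction it cannot show up in the descriptor string at all. The sketch assumes a well-formed descriptor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans the parameter list of a method descriptor and
// reports which class names would be loaded eagerly ('Q', as in the Valhalla
// prototypes) and which lazily ('L', as in the current JVMS). An L* marking
// is a side channel, so it is invisible in the descriptor string.
final class DescriptorScan {
    record Ref(String className, boolean eager) {}

    // Assumes a well-formed descriptor such as "(QPoint;I)V".
    static List<Ref> classRefs(String methodDescriptor) {
        List<Ref> refs = new ArrayList<>();
        int end = methodDescriptor.indexOf(')');
        int i = 1;  // skip the opening '('
        while (i < end) {
            char c = methodDescriptor.charAt(i);
            if (c == 'L' || c == 'Q') {
                int semi = methodDescriptor.indexOf(';', i);
                refs.add(new Ref(methodDescriptor.substring(i + 1, semi), c == 'Q'));
                i = semi + 1;  // jump past the whole class name
            } else {
                i++;  // primitive type code or '[' array prefix
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        // Only parameters are scanned; the return type after ')' is ignored.
        System.out.println(classRefs("(QPoint;ILjava/lang/String;[QPoint;)V"));
        // [Ref[className=Point, eager=true], Ref[className=java/lang/String, eager=false], Ref[className=Point, eager=true]]
    }
}
```

This is also why an L* type "links like an L": anything that inspects only the descriptor string sees a plain `L`.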
From kevinb at google.com Tue Nov 2 22:44:34 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Tue, 2 Nov 2021 15:44:34 -0700
Subject: Consolidating the user model
In-Reply-To:
References:
Message-ID:

Good stuff.

On Tue, Nov 2, 2021 at 2:19 PM Brian Goetz wrote:

> But, the "traditional" primitives are not nullable, and for good reason; zero is a fine default value,

Yes, it would have been impractical to do otherwise, but here's my stock reminder that zero being a "fine" default value has *still nonetheless* caused many thousands of bugs. Again, it had to be done. But I think it's notable that those bugs happen even for the types that have the *absolute most sensible* default values.

My concern is that the purest form of value types will be overused and misused for even less clear-cut cases. I would like to think that we can convince these users that they really want the next "bucket" over, which I think comes down to whether the added cost of `null` is worth it.

> Returning to the tempting user knobs of nullability and tearability, we can now put these where they belong: nullability is a property of _reference types_ --

Though I've argued loudly here for the notion that nullability is not *conceptually* intrinsic to references (and though I still think we should start saying "the null value" instead of "the null reference"), I nevertheless find this an acceptable compromise, because (a) I think nullable values was just introducing too much practical complexity (b) I hope most use cases really will just use the middle bucket and be fine.

Btw, am I right that for the middle bucket, `==` will fail (at compile-time when possible)?

> The third bucket are the _true primitives_. These are also identity-free classes, but further give rise to both value and reference types, and the value type is the default (we denote the reference type with the familiar `.ref`.) Value types are non-nullable, and permit tearing just as existing primitives do.
> The `.ref` type has all the affordances of reference types -- nullability and tearing protection.

In fact, if I'm looking at a middle-bucket class, and I'm looking at one of these `.ref` types of "primitive" class, as far as I can tell I should be able to think of these in exactly the same way as exactly the same things. (I'm aware you intend to define `==` differently for the two, but I'll get into my massive concerns about that later.) Basically, that's good.

> How we describe the buckets is open to discussion; there are several possible approaches. One possible framing is that the middle bucket gives up identity, and the third further gives up references (which can be clawed back with `.ref`), but there are plenty of ways we might express it.

We should address the conceptual-simplicity cost of this "clawing back" sometime.

> Another open question is whether we double down, or abandon, the terminology of boxing. On the one hand, users are familiar with it, and the new semantics are the same as the old semantics; on the other, the metaphor of boxing is no longer accurate, and users surely have a lot of mental baggage that says "boxes are slow." We'd like for users to come to a better understanding of the difference between value and reference types.

The key for me is that the new boxing takes over for everything the old boxing did, and more. So, it's better boxing. I see no value in fighting against that. If users are thinking of this by starting from what they know about int/Integer, that's actually *good*. They will just find out it's better, that's all.

> - Null-adjunction. Some methods, like `Map::get`, return null to indicate no mapping was present. But if in `Map`, `V` is not nullable, then there is no way to express this method. We envision that such methods would return `V.ref`, so that strict value-based classes would widened to their "box" on return, and null would indicate no mapping present.

Now just spell it `?` :-) (not serious. Also, not not serious)

> ## Reflection
>
> Earlier designs all included some non-intuitive behavior around reflection. What we'd like to do is align the user-visible types with reflection literals with descriptors, following the invariant that
>
>     new X().getClass() == X.class

Seems like part of the goal would be making it fit naturally with the current int/Integer relationship (of course, `42.getClass()` is uncommitted to any precedent).

It seems like `Complex.class` (as opposed to `Complex.ref.class`) would never be returned by `Object.getClass()` in any other condition than when you could have just written `Complex.class` anyway.

Actually, that makes me start to wonder if `getClass()` should be another method like `notify` that simply doesn't make sense to call on value types. (But we still need the two distinct Class instances per class anyway.)

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Tue Nov 2 22:58:58 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 2 Nov 2021 22:58:58 +0000
Subject: Consolidating the user model
In-Reply-To:
References:
Message-ID:

On Nov 2, 2021, at 3:44 PM, Kevin Bourrillion wrote:

> Btw, am I right that for the middle bucket, `==` will fail (at compile-time when possible)?

I don't see how middle bucket references, which behave very much like old-bucket references (id-classes), would tend to fail on ==/acmp any more than old-bucket references.

Example please?

If X is an old-bucket or middle-bucket type, then all of these are OK and lead to expected results:

    X x, x1;
    x == x
    x == x1
    x == null

If Y is a class which is statically disjoint from X, then these may fail, but not through any bucket-related effect:

    Y y;
    x == y  //error: incomparable types: X and Y

I think I'm missing your point?
From kevinb at google.com Tue Nov 2 23:07:30 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Tue, 2 Nov 2021 16:07:30 -0700
Subject: Consolidating the user model
In-Reply-To:
References:
Message-ID:

Hmm, I'd rather pretend I hadn't said it, if that will keep the focus on the main points. :-)

I haven't caught up on the plans for equality in a long time.

On Tue, Nov 2, 2021 at 3:59 PM John Rose wrote:
> On Nov 2, 2021, at 3:44 PM, Kevin Bourrillion wrote:
>
>> Btw, am I right that for the middle bucket, `==` will fail (at compile-time when possible)?
>
> I don't see how middle bucket references, which behave very much like old-bucket references (id-classes), would tend to fail on ==/acmp any more than old-bucket references.
>
> Example please?
>
> If X is an old-bucket or middle-bucket type, then all of these are OK and lead to expected results:
>
>     X x, x1;
>     x == x
>     x == x1
>     x == null
>
> If Y is a class which is statically disjoint from X, then these may fail, but not through any bucket-related effect:
>
>     Y y;
>     x == y  //error: incomparable types: X and Y
>
> I think I'm missing your point?

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Tue Nov 2 23:08:39 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 2 Nov 2021 23:08:39 +0000
Subject: Consolidating the user model
In-Reply-To:
References:
Message-ID:

On Nov 2, 2021, at 3:44 PM, Kevin Bourrillion wrote:

>>     new X().getClass() == X.class
>
> Seems like part of the goal would be making it fit naturally with the current int/Integer relationship (of course, `42.getClass()` is uncommitted to any precedent).
>
> It seems like `Complex.class` (as opposed to `Complex.ref.class`) would never be returned by `Object.getClass()` in any other condition than when you could have just written `Complex.class` anyway.
> Actually, that makes me start to wonder if `getClass()` should be another method like `notify` that simply doesn't make sense to call on value types. (But we still need the two distinct Class instances per class anyway.)

Yep, you hit on a tricky spot there.

One part of the problem is that getClass, specifically and uniquely, has a special relation to the primitive types which is coupled to the typing of class literals like int.class (which is Class<Integer>, not Class<int>).

Also, Integer is a class, and Complex is a class, but they have different "tilts": Integer is (kinda sorta) int.ref but Complex is not Complex.ref, and the mirrors reflect this difference.

Sorting this out seems to be an overconstrained problem. As you say, we have not yet applied ".getClass" to any non-ref type, but we will certainly do so, and that's when the fun begins. Also, trying to retype int.class as Class<int> is a related part of the fun.

In the end, however nicely we "heal the rift" between good old int and his new friend Complex, there will surely be some scars on good old int from his time marooned (with just a few friends) in primitive-land. (My current mental metaphor for the isolation of int is Gilligan, who had about the same number of unfortunate island-mates as int does.)

From brian.goetz at oracle.com Tue Nov 2 23:53:38 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 2 Nov 2021 19:53:38 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References:
Message-ID: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>

> My concern is that the purest form of value types will be overused and misused for even less clear-cut cases. I would like to think that we can convince these users that they really want the next "bucket" over, which I think comes down to whether the added cost of `null` is worth it.

I share this concern. Do you have any thoughts on how to make B2 more attractive?
For the record, we expect to see similar stack-based and IR-based flattening across buckets 2 and 3, but measurably less heap-based flattening for bucket 2. Plus the extra footprint.

> Btw, am I right that for the middle bucket, `==` will fail (at compile-time when possible)?

B2 gets state-based ==, just like B3. No difference there; if you have no identity, then equality is state-based.

>> The third bucket are the _true primitives_. These are also identity-free classes, but further give rise to both value and reference types, and the value type is the default (we denote the reference type with the familiar `.ref`.) Value types are non-nullable, and permit tearing just as existing primitives do. The `.ref` type has all the affordances of reference types -- nullability and tearing protection.
>
> In fact, if I'm looking at a middle-bucket class, and I'm looking at one of these `.ref` types of "primitive" class, as far as I can tell I should be able to think of these in exactly the same way as exactly the same things.

Yes. A B2, and a B3.ref, behave identically.

> (I'm aware you intend to define `==` differently for the two, but I'll get into my massive concerns about that later.)

Actually, B2, B3, and B3.ref all have the same interpretation of ==, which is state-based. (You can think of this as "box (or unbox) before comparing a B3 with a B3.ref.")

>> - Null-adjunction. Some methods, like `Map::get`, return null to indicate no mapping was present. But if in `Map<K,V>`, `V` is not nullable, then there is no way to express this method. We envision that such methods would return `V.ref`, so that strict value-based classes would be widened to their "box" on return, and null would indicate no mapping present.
>
> Now just spell it `?` :-)
> (not serious. Also, not not serious)

Yeah, maybe. If that were the only difference, I'd be more inclined.
But it drags in ref-ness, and all the reference affordances, so it feels more misleading than helpful at this point.

>> ## Reflection
>>
>> Earlier designs all included some non-intuitive behavior around reflection. What we'd like to do is align the user-visible types with reflection literals with descriptors, following the invariant that
>>
>>     new X().getClass() == X.class
>
> Seems like part of the goal would be making it fit naturally with the current int/Integer relationship (of course, `42.getClass()` is uncommitted to any precedent).

There's a nasty tension here. On the one hand, for B3 classes, it makes sense for b3.getClass() to yield the val mirror, but int.getClass() historically corresponds to the ref mirror (Object o = 3; o.getClass() == Integer.class.) To invert it, we would have to break a lot of reflection-using code that tests for Integer.class because that's how primitives are reflected. Work in progress.

> Actually, that makes me start to wonder if `getClass()` should be another method like `notify` that simply doesn't make sense to call on value types. (But we still need the two distinct Class instances per class anyway.)

You could argue that it doesn't make sense on the values, but surely it makes sense on their boxes. But it's a thin argument, since classes extend Object, and we want to treat values as objects (without appealing to boxing) for purposes of invoking methods, accessing fields, etc. So getClass() shouldn't be different.

From brian.goetz at oracle.com Wed Nov 3 14:05:21 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 3 Nov 2021 10:05:21 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References:
Message-ID: <160ba720-56af-8657-5afb-b94c6da45088@oracle.com>

> I haven't caught up on the plans for equality in a long time.

This is a good time to catch up on this.

Today, the JVM provides an equality operation on objects in the form of the `ACMP` instructions.
It also provides per-primitive equality operations (`ICMP`, `FCMP`, etc.) for the various primitive types. (The JVM mostly erases boolean, byte, char, and short to int, so some of these instructions are "missing".)

Today, the language translates the `==` operator to the appropriate ACMP / ICMP / etc. instruction, depending on the static type of the operands. (JLS Ch. 5 (Conversions and Contexts) does the heavy lifting of managing mismatches when we, say, compare an object to a primitive.) The important thing to take away here is that there really are multiple `==` operators; they are just spelled the same way, and disambiguated by static typing. Let's call them `id==`, `int==`, etc. if there's any ambiguity. Note that `float==` and `double==` are weird when it comes to `NaN`, so `==` on primitives is not necessarily just a straight bitwise comparison.

Object has an `equals` method; the default implementation is:

    boolean equals(Object other) {
        return this == other;
    }

So in the absence of code to the contrary, two objects are `equals` if they are the same object. Extrapolating, ACMP is a _substitutability test_; it says that substituting one for the other would have no detectable differences. Because all objects have a unique identity, comparing the identities is both necessary and sufficient for a substitutability test. This is the foundation on which we abstract `==` on the new classes.

If C is a class with no identity, that means an instance is the state, the whole state, and nothing but the state. So the natural way to ask "could I substitute instance c1 for instance c2" is to compare each of its fields with a substitutability test. Which is exactly what `ACMP` does on primitive objects. In keeping with the notion that each primitive type has its own `==`, we'll write `Point==` for the equality on `Point`.
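The field-by-field test described here can be pictured with the following sketch. It also covers the dynamic selection of `==` for `Object`-typed fields that comes next. This is a hypothetical model, not how `ACMP` is implemented. Java records (Java 16+) stand in for identity-free classes, and every non-record object is treated as an identity object compared with `id==`. Primitive fields are compared with `Float::equals`/`Double::equals`-style semantics.

```java
import java.lang.reflect.RecordComponent;

/**
 * Hypothetical model of the state-based substitutability test described above.
 * Records stand in for identity-free classes; every other object is treated as
 * an identity object and compared with id==.
 */
public final class Substitutability {

    public record Point(int x, int y) {}
    public record Box(Object contents) {}     // field whose static type is Object
    public record Reading(double value) {}    // exercises the float/double rule

    public static boolean substitutable(Object a, Object b) {
        if (a == null || b == null) {
            return a == b;                    // null is == only to null
        }
        boolean aIdentityFree = a.getClass().isRecord();
        boolean bIdentityFree = b.getClass().isRecord();
        if (!aIdentityFree || !bIdentityFree) {
            return a == b;                    // id==; an identity object never matches an identity-free one
        }
        if (a.getClass() != b.getClass()) {
            return false;                     // different classes are never substitutable
        }
        for (RecordComponent rc : a.getClass().getRecordComponents()) {
            Object fa = read(rc, a);
            Object fb = read(rc, b);
            if (rc.getType().isPrimitive()) {
                // Primitive fields arrive boxed; Float/Double.equals compare canonical bits,
                // so NaN matches NaN and 0.0 does not match -0.0.
                if (!fa.equals(fb)) {
                    return false;
                }
            } else if (!substitutable(fa, fb)) {
                // Reference-typed field: choose the comparison from the dynamic type.
                return false;
            }
        }
        return true;
    }

    private static Object read(RecordComponent rc, Object target) {
        try {
            return rc.getAccessor().invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, two `Box` instances holding equal `Point`s are substitutable. Two `Box` instances holding distinct `StringBuilder` objects with the same text are not, because the builders are compared by identity.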
For a simple `Point` primitive class, this is obvious, but it gets tricky when a primitive is hiding behind a broader static type like Object or an interface type. Consider:

    primitive class Box {
        Object contents;
    }

How do we compare two boxes? By comparing their contents. How do we compare contents? With a substitutability test. If we have identity objects in the box, then the box comparison is "are you both boxes, and are your contents `id==`". What if we have Points in the box? We need to compare them with `Point==`. How do we know we have Points in the box? By looking at their dynamic type. So the `==` operation on primitive objects not only recurses into fields, but for fields that could hold _either_ identity or primitive objects (these are `Object`, interfaces, and some abstract classes), we dynamically select the `==` operator to use on that field. (Edge cases: an id object is never `==` to a primitive object; null is always `==` to itself.)

Note that `.ref` is transparent here; in order to get a `Point` into the `Object` field, we (probably silently) converted it to `Point.ref`. But `Point.ref` uses the same `==` computation as `Point`. The same is true for the B2/B3 distinction; no difference. Objects without identity are equal when their state is equal, whether they're a B2, B3, or B3.ref.

Possibly surprisingly, this has been pushed all the way into `ACMP`. This means that existing code like the default implementation of `Object::equals` just works; if you give it primitive objects, it knows what to do, and performs the proper substitutability test. One rough edge is that we don't use `==` as the test for float and double fields, because it's not a proper substitutability test; we use the semantics of `Float::equals` and `Double::equals` instead. Historical wart.

The bottom line is that `==` is preserved as a substitutability test on instances of all primitive classes, whether they're "stored" by reference or value.
A corollary is that (finally) Integer instances provide reliable `==` semantics, rather than the old unreliable cache-based semantics. (One rift healed.)

From daniel.smith at oracle.com Wed Nov 3 14:45:59 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 3 Nov 2021 14:45:59 +0000
Subject: EG meeting, 2021-11-03
Message-ID:

EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT). Note that we're still on DST in the US; we won't shift to 5pm UTC until next time.

We'll discuss:

"Consolidating the user model": Brian described a user model centered on reference and value types. Sent just yesterday, so we'll probably spend most of the time just reviewing the main ideas.

From forax at univ-mlv.fr Wed Nov 3 14:50:46 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 3 Nov 2021 15:50:46 +0100 (CET)
Subject: Consolidating the user model
In-Reply-To:
References:
Message-ID: <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr>

I really like this; it's far better than how I was seeing Valhalla. Pushing .ref into a corner is a good move.

I still hope that moving from B1 to B2 can be almost backward compatible, if there is no direct access to the constructor, no synchronized, and only reasonable uses of ==.

My only concern now is the dual of Kevin's concern: what if people discover that they always want to use the identity-free reference types (B2), because they are better integrated with the rest of the Java world, and in the end the OG/pure primitive types (B3) are almost never used?

Rémi

> From: "Brian Goetz"
> To: "valhalla-spec-experts"
> Sent: Tuesday, November 2, 2021 22:18:46
> Subject: Consolidating the user model
>
> We've been grinding away, and we think we have a reduced-complexity user model. This is all very rough, and there's lots we need to write up more carefully first, but I'm sharing this as a preview of how we can simplify past where JEPs 401 and 402 currently stand.
>
> # Consolidating the user model
>
> As the mechanics of primitive classes have taken shape, it is time to take another look at the user model.
>
> Valhalla started with the goal of providing user-programmable classes which could be flat and dense in memory. Numerics are one of the motivating use cases, but adding new primitive types directly to the language has a very high barrier. As we learned from [Growing a Language][growing] there are infinitely many numeric types we might want to add to Java, but the proper way to do that is as libraries, not as language features.
>
> In the Java language as we have it today, objects and primitives are different in almost every way: objects have identity, primitives do not; objects are referred to through references, primitives are not; object references can be null, primitives cannot; objects can have mutable state, primitives cannot; classes can be extended, primitive types cannot; loading and storing of object references is atomic, but loading and storing of large primitives is not. For obvious reasons, the design center has revolved around the characteristics of primitives, but the desire to have it both ways is strong; developers continue to ask for variants of primitive classes that have a little more in common with traditional classes in certain situations. These include:
>
> - **Nullability.** By far the most common concern raised about primitive classes, which "code like a class", is the treatment of null; many developers want the benefits of flattening but want at least the option to have `null` as the default value, and to get an exception when an uninitialized instance is used.
> - **Classes with no sensible default.** Prior to running the constructor, the JVM initializes all memory to zero.
>   Since primitive classes are routinely stored directly rather than via reference, it is possible that users might be exposed to instances in this initial, all-zero state, without a constructor having run. For numeric classes such as complex numbers, zero is a fine default, and indeed a good default. But for some classes, not only is zero not the best default, but there _is no good default_. Storing dates as seconds-since-epoch would mean uninitialized dates are interpreted as Jan 1, 1970, which is more likely to be a bug than the desired behavior. Classes may try to reject bad values in their constructor, but if a class has no sensible default, then it would rather have a default that behaves more like null, where you get an error if you dereference it. And if the default is going to behave like null, it's probably best if the default _is_ null.
> - **Migration.** Classes like `Optional` and `LocalDate` today are _value-based_, meaning they already disavow the use of object identity and therefore are good candidates for being primitive classes. However, since these classes exist today and are used in existing APIs and client code, they would have additional compatibility constraints. They would have to continue to be passed by object references to existing API points (otherwise the invocation would fail to link) and these types are already nullable.
> - **Non-tearability.** 64-bit primitives (`long` and `double`) risk _tearing_ when accessed under race unless they are declared `volatile`. However, objects with final fields offer special initialization-safety guarantees under the JMM, even under race. So should primitive classes be more like primitives (risking being seen to be in impossible states), or more like classes (consistent views for immutable objects are guaranteed, even under race)? Tear-freedom has potentially significant costs, and tearing has significant risks, so it is unlikely one size fits all.
> - **Direct control over flattening.** In some cases, flattening is counterproductive. For example, if we have a primitive class with many fields, sorting a flattened array may be more expensive than sorting an array of references; while we don't pay the indirection costs, we do pay for increased footprint, as well as increased memory movement when swapping elements. Similarly, if we want to permute an array with a side index, it may well be cheaper to maintain an array of references rather than copying all the data into a separate array.
>
> These requests are all reasonable when taken individually; it's easy to construct use cases where one would want it both ways for any given characteristic. But having twelve knobs (and 2^12 possible settings) on primitive classes is not a realistic option, nor does it result in a user model that is easy to reason about.
>
> In the current model, a primitive class is really like a primitive -- no nulls, no references, always flattened, tearable when large enough. Each primitive class `P` comes with a companion reference type (`P.ref`), which behaves much as boxes do today (except without identity.) There is also, for migration, an option (`ref-default`) to invert the meaning of the unqualified name, so that by default `Optional` means `Optional.ref`, and flattening must be explicitly requested (which, in turn, is the sole motivation for the `P.val` denotation.) We would like for the use of the `.ref` and `.val` qualifiers to be rare, but currently they are not rare enough for comfort.
>
> Further, we've explored but have not committed to a means of declaring primitive classes which don't like their zero value, for primitive classes with no good default, so that dereferencing a zero value would result in some sort of exception. (The nullability question is really dominated by the initialization safety question.) This would be yet another variant of primitive class.
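The "no good default" problem raised in the proposal can be illustrated with a small sketch. Here a `long[]` stands in for a flattened array of a hypothetical seconds-since-epoch timestamp class. The "zero-hostile" reader shows one possible way to make the zero value behave like null. It is an assumption for illustration, not the proposal's mechanism.

```java
import java.time.Instant;

/**
 * The "no good default" problem, modeled with a long[] standing in for a flat
 * array of a hypothetical timestamp class that stores seconds-since-epoch.
 */
public final class ZeroDefault {

    /** Naive read: an element nobody wrote silently becomes 1970-01-01T00:00:00Z. */
    public static Instant readNaive(long[] epochSeconds, int i) {
        return Instant.ofEpochSecond(epochSeconds[i]);
    }

    /**
     * "Zero-hostile" read: treat the all-zero state like dereferencing null.
     * The price is that a genuine timestamp at the epoch can no longer be stored.
     */
    public static Instant readChecked(long[] epochSeconds, int i) {
        long s = epochSeconds[i];
        if (s == 0L) {
            throw new IllegalStateException("timestamp " + i + " was never initialized");
        }
        return Instant.ofEpochSecond(s);
    }

    public static void main(String[] args) {
        long[] timestamps = new long[3];      // zero-filled, as the JVM does for heap memory
        timestamps[0] = 1_600_000_000L;
        System.out.println(readNaive(timestamps, 0));
        System.out.println(readNaive(timestamps, 1));   // 1970-01-01T00:00:00Z
        try {
            readChecked(timestamps, 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The checked reader gives up one legitimate value (the epoch itself) as a sentinel. That is the same trade the proposal makes when it argues that a default that behaves like null should simply *be* null.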
> A serious challenge to this stacking is the proliferation of options; there are knobs for nullability, zero-hostility, migration, tear-resistance, etc. Explaining when to use which at the declaration site is already difficult, and there is also the challenge of when to use `ref` or `val` at the use site. The current model has done well at enumerating the requirements (and in helping us separate the real ones from the wannabes), so it is now time to consolidate.
>
> ## Finding the buckets
>
> Intuitively, we sense that there are three buckets here: traditional identity classes in one bucket, traditional primitives (coded like classes) in another, and a middle bucket that offers some "works like an int" benefits but with some of the affordances (e.g., nullability, non-tearability) of the first.
>
> Why have multiple buckets at all? Project Valhalla has two main goals: better performance (enabling more routine flattening and better density), and unifying the type system (healing the rift between primitives and objects.) It's easy to talk about flattening, but there really are at least three categories of flattening, and different ones may be possible in different situations:
>
> - **Heap flattening.** Inlining the layout of one object into another object (or array) layout; when class `C` has a field of type `D`, rather than indirecting to a `D`, we inline D's layout directly into C.
> - **Calling convention flattening.** Shredding a primitive class into its fields in (out-of-line) method invocations on the call stack.
> - **IR flattening.** When calling a method that allocates a new instance and returns it, eliding the allocation and shredding it into its fields instead. This only applies when we can inline through from the allocation to the consumption of its fields. (Escape analysis also allows this form of flattening, but only for provably non-escaping objects.
>   If we know the object is identity-free, we can optimize in places where EA would fail.)
>
> #### Nullability
>
> Variables in the heap (fields and array elements) must have a default value; for all practical purposes it is a forced move that this default value is the all-zero-bits value. This zero-bits value is interpreted as `null` for references, zero for numerics, and `false` for booleans today.
>
> If primitives are to "code like a class", the constructor surely must be able to reject bad proposed states. But what if the constructor thinks the default value is a bad state? The desire to make some primitive classes nullable stems from the reality that for some classes, we'd like a "safe" default -- one that throws if you try to use it before it is initialized.
>
> But the "traditional" primitives are not nullable, and for good reason; zero is a fine default value, and the primitives we have today typically use all their bit patterns, meaning that arranging for a representation of null requires at least an extra bit, which in reality means longs would take at least 65 bits (which in reality means 128 bits most of the time.)
>
> So we see nullability is a tradeoff; on the one hand, it gives us protection from uninitialized variables, but it also has costs -- extra footprint, extra checks. We experimented with a pair of modifiers `null-default` and `zero-default`, which would determine how the zero value is interpreted. But this felt like solving the problem at the wrong level.
>
> #### Tearing
>
> The Java Memory Model includes special provisions for visibility of final fields, even when the reference to their container object is shared via a data race. These initialization safety guarantees are the bedrock of the Java security model; a String being seen to change its value -- or to not respect invariants established by its constructor -- would make it nearly impossible to reason about security.
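Because tearing comes up repeatedly here, the following sketch shows deterministically what a torn 64-bit read means. It assumes the split permitted by JLS §17.7 for non-volatile `long` and `double` fields, where one write may be performed as two separate 32-bit halves. `tornRead` is a hypothetical helper that combines the halves a racing reader might observe.

```java
/**
 * What "tearing" means for a 64-bit value, simulated deterministically.
 * Assumption: a racy store is modeled as two independent 32-bit halves,
 * which JLS 17.7 permits for non-volatile long and double fields.
 */
public final class TearingDemo {

    /** A reader that sees the new high half but still the old low half. */
    public static long tornRead(long oldValue, long newValue) {
        long high = newValue & 0xFFFF_FFFF_0000_0000L;
        long low = oldValue & 0x0000_0000_FFFF_FFFFL;
        return high | low;
    }

    public static void main(String[] args) {
        long before = 0L;
        long after = -1L;                     // all 64 bits set
        long seen = tornRead(before, after);
        System.out.printf("%016x%n", seen);   // ffffffff00000000: a value nobody ever wrote
    }
}
```

For a flattened multi-field primitive class, the analogous failure is seeing some fields from one write and some from another. That is why non-tearability ends up tied to references in the proposal.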
> On the other hand, longs and doubles permit tearing when shared via data races. This isn't great, but preventing tearing has a cost, and the whole reason we got primitives in 1995 was driven by expectations and tradeoffs around arithmetical performance. Preventing tearing is still quite expensive; above 64 bits, atomic instructions have a significant tax, and often the best way to manage tearing is via an indirection when stored in the heap (which is precisely what flattening is trying to avoid.)
>
> When we can code primitives "like a class", which should they be more like? It depends! Classes that are more like numerics may be willing to tolerate tearing for the sake of improved performance; classes that are more like "traditional classes" will want the initialization safety already afforded to immutable objects.
>
> So we see tearability is a tradeoff; on the one hand, preventing tearing protects invariants from data races, but it also has costs -- expensive atomic instructions, or reduced heap flattening. We experimented with a modifier that marks classes as non-tearable, but this would require users to keep track of which primitive classes are tearable and which aren't. This felt like solving the problem at the wrong level.
>
> #### Migration
>
> There are some classes -- such as `java.lang.Integer`, or `java.util.Optional` -- that meet all the requirements to be declared as (nullable) primitive classes, but which exist today as identity classes. We would like to be able to migrate these to primitives to get the benefits of flattening, but are constrained that (at least for non-private API points) they must be represented as `L` descriptors for reasons of binary compatibility. Our existing interpretation of `L` descriptors is that they represent references as pointers; this means that even if we could migrate these types, we'd still give up on some forms of flattening (heap and stack), and our migration would be less than ideal.
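For orientation, this sketch prints the descriptors today's JDK uses. Every class or interface type, including `Integer` and `Optional`, is an `L` descriptor. `Q` descriptors as discussed in this proposal do not exist in any released JDK, so this shows only the baseline the migration discussion starts from. The helper names are illustrative, and `Class::descriptorString` requires Java 12 or later.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.Map;

/**
 * How today's JDK spells descriptors. Every class or interface type is an L
 * descriptor; there is no Q form in any released JDK.
 */
public final class Descriptors {

    /** Field descriptor of a type, e.g. "Ljava/lang/String;" or "I". */
    public static String fieldDescriptor(Class<?> type) {
        return type.descriptorString();
    }

    /** Method descriptor for Map.get, whose erased signature is Object get(Object). */
    public static String mapGetDescriptor() {
        try {
            // Reflection looks methods up by Class mirrors, not by descriptor strings.
            Method get = Map.class.getMethod("get", Object.class);
            return MethodType.methodType(get.getReturnType(), get.getParameterTypes())
                             .toMethodDescriptorString();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldDescriptor(String.class));   // Ljava/lang/String;
        System.out.println(fieldDescriptor(int.class));      // I
        System.out.println(fieldDescriptor(Integer.class));  // Ljava/lang/Integer;
        System.out.println(mapGetDescriptor());              // (Ljava/lang/Object;)Ljava/lang/Object;
    }
}
```

Because lookup goes through `Class` mirrors, any ambiguity about whether a source type became `LP` or `QP` would surface at exactly this `getMethod` step.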
> Worse, the above interpretation of migration suggests that sometimes a use of `P` is translated as `LP`, and sometimes as `QP`. To the degree that there is uncertainty in whether a given source type translates to an `L` or `Q` descriptor, this flows into either uncertainty of how to use reflection (users must guess as to whether a given API point using `P` was translated with `LP` or `QP`), or uncertainty on the part of reflection (the user calls `getMethod(P.class)`, and reflection must consider methods that accept both `LP` and `QP` descriptors.)
>
> ## Restacking for simplicity
>
> The various knobs on the user model (which may flow into translation and reflection) risk being death by 1000 cuts; they not only add complexity to the implementation, but they add complexity for users. This prompted a rethink of assumptions at every layer.
>
> #### Nullable primitives
>
> The first part of the restacking involved relaxing the assumption that primitive classes are inherently non-nullable. We shied away from this for a long time, knowing that there would be significant VM complexity down this road, but in the end concluded that the complexity is better spent here than elsewhere. These might be translated as `Q` descriptors, or might be translated as `L` descriptors with a side channel for preloading metadata -- stay tuned for a summary of this topic.
>
>> Why Q? The reason we have `Q` descriptors at all is that we need to know things about classes earlier than we otherwise would, in order to make decisions that are hard to unmake later (such as layout and calling convention.) Rather than interpreting `Q` as meaning "value type" (as the early prototypes did), `Q` acquired the interpretation "go and look."
>> When the JVM encounters a field or method descriptor with a `Q` in it, rather than deferring classloading as long as possible (as is the case with `L` descriptors), we load the class eagerly, so we can learn all we need to know about it. From classloading, we might not only learn that it is a primitive class, but whether it should be nullable or not. (Since primitive classes are monomorphic, carrying this information around on a per-linkage basis is cheap enough.)
>
> So some primitive classes are marked as "pure" primitives, and others as supporting null; when the latter are used as receivers, `invokevirtual` does a null check prior to invocation (and NPEs if the receiver is null). When moving values between the heap and the stack (`getfield`, `aastore`, etc.), these bytecodes must check for the "flat null" representation in the heap and a real null on the stack. The VM needs some help from the classfile to help choose a bit pattern for the flat null; the most obvious strategy is to inject a synthetic boolean, but there are others that don't require additional footprint (e.g., flow analysis that proves a field is assigned a non-default value; using low-order bits in pointers; using spare bits in booleans; using pointer colors; etc.) The details are for another day, but we would like for this to not intrude on the user model.
>
> #### L vs Q
>
> The exploration into nullable primitives prompted a reevaluation of the meaning of L vs Q. Historically we had interpreted L vs Q as being "pointer vs flat" (though the VM always has the right to unflatten if it feels like it.) But over time we've been moving towards Q mostly being about earlier loading (so the VM can learn what it needs to know before making hard-to-reverse decisions, such as layout.) So let's go there fully.
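The synthetic-boolean strategy for a "flat null" mentioned above can be sketched by hand in plain Java. This is a hypothetical model: `Point`, `Segment`, and the field names are made up, and in the proposal the VM, not the user, would choose the encoding and keep it hidden. Records require Java 16 or later.

```java
/**
 * Hand-written model of one "flat null" encoding: store a nullable value's
 * fields inline and add a synthetic boolean that is false (the zero default)
 * when the slot holds null. Names are hypothetical.
 */
public final class FlatNull {

    public record Point(int x, int y) {}

    /** A segment whose nullable 'end' Point is flattened into plain fields. */
    public static final class Segment {
        private int endX;
        private int endY;
        private boolean endPresent;   // synthetic bit: all-zero memory reads back as null

        /** Like a store from the stack: a real null becomes the flat-null pattern. */
        public void setEnd(Point p) {
            endPresent = p != null;
            endX = (p == null) ? 0 : p.x();
            endY = (p == null) ? 0 : p.y();
        }

        /** Like getfield: the flat-null pattern becomes a real null on the stack. */
        public Point getEnd() {
            return endPresent ? new Point(endX, endY) : null;
        }
    }

    /** Store then load through one freshly zeroed Segment. */
    public static Point roundTrip(Point p) {
        Segment s = new Segment();
        s.setEnd(p);
        return s.getEnd();
    }
}
```

The extra bit is what lets `Point(0, 0)` and null coexist in the same flat slot. It is also the footprint cost that the alternative encodings listed above try to avoid.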
> A `Q` descriptor means that the class must be loaded eagerly (Q for "quick") before resolving the descriptor; an `L` descriptor means it _must not be_ (L for "lazy"), consistent with current JVMS treatment. Since an `L` descriptor is lazily resolved, we have to assume conservatively that it is nullable; a Q descriptor might or might not be nullable (we'll know once we load the class, which we do eagerly.)
>
> What we've done is wrest control of flatness away from the language, and cede it to the VM, where it belongs. The user/language expresses semantic requirements (e.g., nullability) and the VM chooses a representation. That's how we like it.
>
> #### It's all about the references
>
> The rethink of L vs Q enabled a critical restack of the user model. With this reinterpretation, Q descriptors can (based on what is in the classfile) still be reference types -- and these reference types can still be flattened; alternately, with side-channels for preload metadata on `L` descriptors, we may be able to get to non-flat references under `L` descriptors.
>
> Returning to the tempting user knobs of nullability and tearability, we can now put these where they belong: nullability is a property of _reference types_ -- and some primitive classes can be reference types. Similarly, the initialization safety of immutable objects derives from the fact that object references are loaded atomically (with respect to stores of the same reference.) Non-tearability is also a property of reference types. (Similarly with layout circularity; references can break layout circularities.) So rather than the user choosing nullability and non-tearability as ad-hoc choices, we treat them as affordances of references, and let users choose between reference-only primitive classes, and the more traditional primitive classes, that come in both reference and value flavors.
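The remark that references can break layout circularities can be illustrated with a toy layout calculator. Its rules are assumptions made for illustration, not how any JVM lays out objects:

- record components are always inlined;
- primitives take their Java sizes;
- any other type costs an 8-byte reference;
- headers, padding, and null bits are ignored.

```java
import java.lang.reflect.RecordComponent;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

/**
 * Toy layout calculator (hypothetical rules, not a real JVM layout): record
 * components are inlined, primitives use their Java sizes, any other type is
 * an 8-byte reference; headers, padding and null bits are ignored.
 */
public final class LayoutCycles {

    public record Point(int x, int y) {}
    public record Segment(Point start, Point end) {}
    public record Node(int value, Node next) {}   // inlining 'next' would never terminate

    private static final Map<Class<?>, Integer> PRIMITIVE_SIZES = Map.of(
            boolean.class, 1, byte.class, 1, char.class, 2, short.class, 2,
            int.class, 4, float.class, 4, long.class, 8, double.class, 8);

    /** Size in bytes of a fully flattened instance; throws on a layout cycle. */
    public static int flatSize(Class<?> type) {
        return flatSize(type, new ArrayDeque<>());
    }

    private static int flatSize(Class<?> type, Deque<Class<?>> inProgress) {
        if (type.isPrimitive()) {
            return PRIMITIVE_SIZES.get(type);
        }
        if (!type.isRecord()) {
            return 8;                             // stored as a reference
        }
        if (inProgress.contains(type)) {
            throw new IllegalStateException("layout cycle through " + type.getSimpleName());
        }
        inProgress.push(type);
        int total = 0;
        for (RecordComponent rc : type.getRecordComponents()) {
            total += flatSize(rc.getType(), inProgress);
        }
        inProgress.pop();
        return total;
    }

    /** True if flattening every record component would recurse forever. */
    public static boolean hasLayoutCycle(Class<?> type) {
        try {
            flatSize(type);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

In today's Java, `record Node(int value, Node next)` compiles fine precisely because `next` is already a reference. The toy reports a cycle only because it pretends every record component is inlined, which is the situation a `.ref` field type is meant to escape.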
>> This restack allows us to eliminate `ref-default` completely (we'll share more details later), which in turn allows us to eliminate `.val` completely. Further, the use cases for `.ref` become smaller.
>
> #### The buckets
>
> So, without further ado, let's meet the new user model. The names may change, but the concepts seem pretty sensible. We have identity classes, as before; let's call that the first bucket. These are unchanged; they are always translated with L descriptors, and there is only one usable `Class` literal for these.
>
> The second bucket are _identity-free reference classes_. They come with the restrictions on identity-free classes: no mutability and limited extensibility. Because they are reference types, they are nullable and receive tearing protection. They are flattenable (though, depending on layout size and hardware details, we may choose to get tearing protection by maintaining the indirection.) These might be translated with Q descriptors, or with modified L descriptors, but there is no separate `.ref` form (they're already references) and there is only one usable `Class` literal for these.
>
> The third bucket are the _true primitives_. These are also identity-free classes, but further give rise to both value and reference types, and the value type is the default (we denote the reference type with the familiar `.ref`.) Value types are non-nullable, and permit tearing just as existing primitives do. The `.ref` type has all the affordances of reference types -- nullability and tearing protection. The value type is translated with Q; the reference type is translated with L. There are two mirrors (`P.class` and `P.ref.class`) to reflect the difference in translation and semantics.
>
> A valuable aspect of this translation strategy is that there is a deterministic, 1:1 correspondence between source types and descriptors.
>
> How we describe the buckets is open to discussion; there are several possible approaches.
One possible framing is that the middle bucket gives up identity, > and the third further gives up references (which can be clawed back with > `.ref`), but there are plenty of ways we might express it. If these are > expressed as modifiers, then they can be applied to records as well. > Another open question is whether we double down on, or abandon, the terminology of > boxing. On the one hand, users are familiar with it, and the new semantics are > the same as the old semantics; on the other, the metaphor of boxing is no longer > accurate, and users surely have a lot of mental baggage that says "boxes are > slow." We'd like for users to come to a better understanding of the difference > between value and reference types. > #### Goodbye, direct control over flattening > In earlier explorations, we envisioned using `X.ref` as a way to explicitly > ask for no flattening. But in the proposed model, flattening is entirely > under the control of the VM -- where we think it belongs. > #### What's left for .ref? > A pleasing outcome here is that many of the use cases for `X.ref` are subsumed > into more appropriate mechanisms, leaving a relatively small set of corner-ish > cases. This is what we'd hoped `.ref` would be -- something that stays in the > corner until summoned. The remaining reasons to use `X.ref` at the use site > include: > - Boxing. Primitives have box objects; strict value-based classes need > companion reference types for all the same situations as today's primitives > do. It would be odd if the box were non-denotable. > - Null-adjunction. Some methods, like `Map::get`, return null to indicate no > mapping was present. But if in `Map<K, V>`, `V` is not nullable, then there > is no way to express this method. We envision that such methods would return > `V.ref`, so that strict value-based classes would be widened to their "box" on > return, and null would indicate no mapping present. > - Cycle-breaking.
Primitives that are self-referential (e.g., linked list node > classes that have a next node field) would have layout circularities; using a > reference rather than a value allows the circularity to be broken. > This list is (finally!) as short as we would like it to be, and devoid of > low-level control over representation; users use `X.ref` when they need > references (either for interop with reference types, or to require nullability). > Our hope all along was that `.ref` was mostly "break glass in case of > emergency"; I think we're finally there. > #### Migration > The topic of migration is a complex one, and I won't treat it fully here (the > details are best left until we're fully agreed on the rest.) Earlier treatments > of migration were limited, in that even with all the complexity of > `ref-default`, we still didn't get all the flattening we wanted, because the > laziness of `L` descriptors kept us from knowing about potential flattenability > until it was too late. Attempts to manage "preload lists" or "side preload > channels" in previous rounds foundered due to complexity or corner cases, but > the problem has gotten simpler, since we're only choosing representation rather > than value sets now -- which means that the `L*` types might work out here. > Stay tuned for more details. > ## Reflection > Earlier designs all included some non-intuitive behavior around reflection. > What we'd like to do is align the user-visible types with reflection literals > with descriptors, following the invariant that > new X().getClass() == X.class > ## TBD > Stay tuned for some details on managing null encoding and detection, > reference types under either Q or modified L descriptors, and some > thoughts on painting the bikeshed. 
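The null-adjunction case from the `.ref` list can be illustrated with today's `int`/`Integer` pair, where `Integer` plays the role a `V.ref` would play: a primitive `int` has no spare value meaning "no mapping", so `Map::get` must return the box. This is a sketch in current Java, not proposed syntax; the class and helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class NullAdjunctionDemo {
    // Returns the boxed value, or null if the key is absent.
    // int cannot encode "absent", so the box (Integer) stands in
    // for what the proposal would spell V.ref.
    static Integer lookup(Map<String, Integer> m, String key) {
        return m.get(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("zero", 0);

        Integer present = lookup(counts, "zero");   // mapping present, value 0
        Integer absent = lookup(counts, "missing"); // no mapping: null

        System.out.println(present + " " + absent); // prints "0 null"
        // Unboxing 'absent' into an int here would throw NullPointerException.
    }
}
```

A present mapping to `0` and an absent key stay distinguishable only because the return type is a reference type.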
> [growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621 From kevinb at google.com Wed Nov 3 15:15:43 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 3 Nov 2021 08:15:43 -0700 Subject: Consolidating the user model In-Reply-To: <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr> References: <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr> Message-ID: On Wed, Nov 3, 2021 at 7:51 AM Remi Forax wrote: My only concern now is the dual of Kevin's concern, > what if people discover that they always want to use the identity-free > reference types (B2), because it is better integrated with the rest of the > Java world and that in the end, the OG/pure primitive types (B3) are almost > never used. > B2 is certainly the more basic feature, and could at least hypothetically release earlier than the rest. Regardless of timing, it does seem that the costs and benefits of B3 need to be interpreted *relative* to B2. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Wed Nov 3 15:54:25 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 3 Nov 2021 11:54:25 -0400 Subject: [External] : Re: Consolidating the user model In-Reply-To: <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr> References: <394055645.914485.1635951046089.JavaMail.zimbra@u-pem.fr> Message-ID: <278cd063-9240-47db-6cb7-956065a1fb60@oracle.com> > I really like this, it's far better than how I was seeing Valhalla, > pushing .ref into a corner is a good move. Yes, we always disliked how prevalent .ref was; it took several rounds of "shaking the box" to get it to stay in the corner. > I still hope that moving from B1 to B2 can be almost backward > compatible, if no direct access to the constructor, no synchronized > and reasonable uses of ==.
Yes, this works out better than we had hoped it might; as you say, if B1 is value-based, it should be an "almost compatible" move to convert to B2. Amazingly, it might even be "almost compatible" to go in the other direction too, something we'd almost given up on the possibility of. Codes like a class, indeed. The cost of this is, of course, that a B2 class gets less optimization than a B3 one (though more than a B1 one.) Less heap flattening, more footprint, more null checks. Though still substantial stack (calling convention) / IR (scalarization) flattening. How we guide people to this is the next challenge. > My only concern now is the dual of Kevin's concern, > what if people discover that they always want to use the > identity-free reference types (B2), because it is better integrated > with the rest of the Java world and that in the end, the OG/pure > primitive types (B3) are almost never used. In other words: having solved the almost-impossible technical problems, we now face the harder pedagogical problem :) I'm actually worried about the opposite, though! I think it's a bigger risk that people will use B3 over B2 "because performance", and put themselves in danger (e.g., tearing, unexpected zeroes) without realizing it. From kevinb at google.com Wed Nov 3 15:58:17 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 3 Nov 2021 08:58:17 -0700 Subject: Equality operator for identityless classes In-Reply-To: <160ba720-56af-8657-5afb-b94c6da45088@oracle.com> References: <160ba720-56af-8657-5afb-b94c6da45088@oracle.com> Message-ID: I imagine we might be constrained to this design by the need to support compatible migration. So there may be nothing we can do. But there is a pretty serious problem here. Background: code like IdentityHashMap, which cares about *objects per se* instead of what those objects *represent*, is unusual, special-case, egghead, lift-the-caution-tape code. It is not normal. It's surely more common in JDK code.
But I strongly suspect that the vast majority of `==` tests in the wild are not expressing questions of identity at all, but are abbreviations for `equals()` when the developer happens to believe it's safe. Many of those are of course bugs, and then there are plain accidental usages as well. Today, things are pretty okay because developers can learn that `==` is a code smell. A responsible code reviewer has to think through each one like this: 1. Look up the type. Is it a builtin, or Class? Okay, we're fine. 2. Is it an enum? Okay, I resent having to go look it up when they could have just used switch, but fine. 3. Wait, is this weird code that actually cares about objects instead of what they represent? This needs a comment. The problem is that now we'll be introducing a whole class of ... classes ... for which `==` does something reasonable: only the ones that happen to contain no references, however deeply nested! These cannot at all be easily distinguished. This is giving bugs a really fantastic way to hide. I think we'd better consider some heretical options, like introducing `===` and `!==` as sugar for Object.equals(). It seems tragic to imagine the entire world (except the special-case code) transitioning over to that, as it's quite ugly. But it would lead to more correct code. Maybe you have other ideas. On Wed, Nov 3, 2021 at 7:05 AM Brian Goetz wrote: Extrapolating, ACMP is a _substitutability test_; it says that > substituting one for the other would have no detectable differences. > Because all objects have a unique identity, comparing the identities is > both necessary and sufficient for a substitutability test. What you say here may be technically true, but people who override equals() are already trying their best to disavow identity in the only way they have. And that makes your statement here actually kinda *wrong*. 
Being a necessary and sufficient substitutability test is literally, exactly, what Object.equals() does (and never mind that people might implement it *wrong*). If that method's purpose is not to give classes control over their own substitutability test -- which they *need!* -- then I can't imagine a purpose for it at all. (And yes, those objects still do expose identity, but their equals() implementation is consenting to have that identity "forgotten" at any time just by round-tripping it through some collection etc.) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From john.r.rose at oracle.com Wed Nov 3 16:02:18 2021 From: john.r.rose at oracle.com (John Rose) Date: Wed, 3 Nov 2021 16:02:18 +0000 Subject: [External] : Re: Consolidating the user model In-Reply-To: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> Message-ID: <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> On Nov 2, 2021, at 4:53 PM, Brian Goetz wrote: Actually, that makes me start to wonder if `getClass()` should be another method like `notify` that simply doesn't make sense to call on value types. (But we still need the two distinct Class instances per class anyway.) You could argue that it doesn't make sense on the values, but surely it makes sense on their boxes. But it's a thin argument, since classes extend Object, and we want to treat values as objects (without appealing to boxing) for purposes of invoking methods, accessing fields, etc. So getClass() shouldn't be different. One way to thicken this thin argument is to say that Point is not really a class. It's a primitive. Then it still has a value-set inclusion relation to Object, but it's not a sub-class of Object. It is a value-set subtype. It's probably fruitless, but worth brainstorming as a heuristic for possible moves, so...
we could say that: - Point is not a class, it's a primitive with a value set - Point is not a subclass of Object, it's a subtype (with value set conversion, like int <: long) - !(Point *is a* Object) & (Point *has a* Object box) - Point does not (cannot) inherit methods from Object - Point can *execute* methods from Object, but only after value-set mapping From john.r.rose at oracle.com Wed Nov 3 17:10:49 2021 From: john.r.rose at oracle.com (John Rose) Date: Wed, 3 Nov 2021 17:10:49 +0000 Subject: [External] : Equality operator for identityless classes In-Reply-To: References: <160ba720-56af-8657-5afb-b94c6da45088@oracle.com> Message-ID: One of the long-standing fixtures in the ecosystem is the set of idioms for correct use of op==/acmp. Another is lots of articles and IDE checkers which detect other uses which are dubious. It's a problem that you cannot use op==/acmp by itself in most cases; you have to accompany it by a call to Object::equals. We might try to fix this problem, but it cannot be expunged from our billions of lines of pre-existing Java code. I like to call these equals-accompanying idioms L.I.F.E., or Legacy Idiom(s) For Equality. It shows up, canonically, in this method of ju.Objects: public static boolean equals(Object a, Object b) { return (a == b) || (a != null && a.equals(b)); } Thus, the defective character of op==/acmp is just (wait for it) a fact of L.I.F.E. and we cannot fight it too much without hurting ourselves. Turning that around, if L.I.F.E. is a dynamically common occurrence (as it is surely statically common) then we can expend JIT complexity budget to deal with it, and (maybe even) adjust JVM rules around the optimizations to make more edgy versions of the optimizations legal. Specifically, this JIT-time transform has the potential to radically reduce the frequency of op==/acmp: (a == b) || (a != null && a.equals(b)) => (a == null ?
b == null : a.equals(b)) This only works if all possible methods selected from a.equals permit the dropping of op==. The contract of Object::equals does indeed allow this, but it is not enforced; the JVMS allows the contract to be broken, and the transform will expose the breakage. And yet, there are things we can do here to unlock this transform. More generally, for other L.I.F.E.-forms, I am confident we can build JIT transforms that reduce reliance on acmp, which is suddenly more expensive than its coders (and the original designers of Java) expect. Programmers who override Object::equals to (as you nicely say) disavow identity-based substitutability will probably write, prompted by their IDE, in a ceremonial mood, that one occurrence of op==/acmp to short-circuit the rest of their Foo::equals method. Or they may erase it, in a purifying mood. In either case, the above transform requires the JIT to treat such methods as either actually or potentially starting with a short-circuiting op==/acmp. In any case, such an identity comparison will be monomorphic in the receiver type, not a polymorphic multi-way dispatch on Object references. So this is not just moving around costs that stay the same; you can de-virtualize op==/acmp by moving it into the prologue of all Object::equals methods. (Non-compliant ones can be handled by splitting the entry point.) Once the actual or potential op==/acmp is found at the start of Foo::equals, we can then inline and reorder the checks in the body of the equals method. At that point the cost of op== starts to go to zero. This is old news; we've discussed it in Burlington now these many years ago. But I thought I'd remind us of it. And this is really a more hopeful approach to L.I.F.E. That is, even if we don't do these JIT transforms in the first release, there is a path forward that eventually removes the unintentional costs of op==/acmp when L.I.F.E. throws them at us.
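The legacy idiom and the proposed rewrite can be compared directly in ordinary Java. This sketch assumes nothing beyond `Object.equals`; the method names are hypothetical, and in the proposal the rewrite would be done invisibly by the JIT rather than in source. The deliberately non-reflexive `NeverEqual` class shows how the rewrite exposes a broken `equals` contract.

```java
public class LifeIdiomDemo {
    // The legacy idiom (as in java.util.Objects.equals):
    // identity short-circuit first, then equals().
    static boolean legacyEquals(Object a, Object b) {
        return (a == b) || (a != null && a.equals(b));
    }

    // The transformed form: no identity test at all.
    static boolean transformedEquals(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    // Deliberately broken: equals() is not reflexive,
    // which violates the Object.equals contract.
    static final class NeverEqual {
        @Override public boolean equals(Object o) { return false; }
        @Override public int hashCode() { return 0; }
    }

    public static void main(String[] args) {
        // For contract-abiding equals(), both forms agree.
        System.out.println(legacyEquals("x", "x") == transformedEquals("x", "x"));   // true
        System.out.println(legacyEquals(null, null) == transformedEquals(null, null)); // true

        // For a non-reflexive equals(), the transform exposes the breakage.
        NeverEqual n = new NeverEqual();
        System.out.println(legacyEquals(n, n));      // true (short-circuited by ==)
        System.out.println(transformedEquals(n, n)); // false
    }
}
```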
All this can work without requiring a global move to a completely new operator (op===), surely an alien form of L.I.F.E. within our ecosystem. (Ba-DUM-ch!) From brian.goetz at oracle.com Wed Nov 3 17:21:18 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 3 Nov 2021 13:21:18 -0400 Subject: [External] : Equality operator for identityless classes In-Reply-To: References: <160ba720-56af-8657-5afb-b94c6da45088@oracle.com> Message-ID: A related concern is that many existing uses of == are optimizations intended to short-circuit evaluation of `equals`, under the assumption that == is "much faster" than equals. When the performance reality shifts, some of this code might get slower. (Though in most cases it probably makes no difference.) If you assume that most uses of == are accidents, many of them might get less wrong; for example, using == on Integer (outside of the box cache) is almost always wrong, but will get less wrong in the future (since it will compare what's in the box.) This is both better and worse, in that fewer bugs will manifest as problems, but then bugs may sit undetected for longer. (Don't get me started on the "primitives are good for numerics" -> "numerics will want operator overloading" -> "oh crap, == already means something" problem.) On 11/3/2021 11:58 AM, Kevin Bourrillion wrote: > I imagine we might be constrained to this design by the need to > support compatible migration. So there may be nothing we can do. > > But there is a pretty serious problem here. > > Background: code like IdentityHashMap, which cares about /objects per > se/ instead of what those objects /represent/, is unusual, > special-case, egghead, lift-the-caution-tape code. It is not normal. > It's surely more common in JDK code. But I strongly suspect that the > vast majority of `==` tests in the wild are not expressing questions > of identity at all, but are abbreviations for `equals()` when the > developer happens to believe it's safe.
Many of those are of course > bugs, and then there are plain accidental usages as well. > > Today, things are pretty okay because developers can learn that `==` > is a code smell. A responsible code reviewer has to think through each > one like this: > > 1. Look up the type. Is it a builtin, or Class? Okay, we're fine. > 2. Is it an enum? Okay, I resent having to go look it up when they > could have just used switch, but fine. > 3. Wait, is this weird code that actually cares about objects instead > of what they represent? This needs a comment. > > The problem is that now we'll be introducing a whole class of ... > classes ... for which `==` does something reasonable: only the ones > that happen to contain no references, however deeply nested! These > cannot at all be easily distinguished. This is giving bugs a really > fantastic way to hide. > > I think we'd better consider some heretical options, like introducing > `===` and `!==` as sugar for Object.equals(). It seems tragic to > imagine the entire world (except the special-case code) transitioning > over to that, as it's quite ugly. But it would lead to more > correct code. Maybe you have other ideas. > > > On Wed, Nov 3, 2021 at 7:05 AM Brian Goetz wrote: > > Extrapolating, ACMP is a _substitutability test_; it says that > substituting one for the other would have no detectable differences. > Because all objects have a unique identity, comparing the > identities is > both necessary and sufficient for a substitutability test. > > > What you say here may be technically true, but people who override > equals() are already trying their best to disavow identity in the only > way they have. And that makes your statement here actually kinda > /wrong/. Being a necessary and sufficient substitutability test is > literally, exactly, what Object.equals() does (and never mind that > people might implement it /wrong/).
If that method's purpose is not to > give classes control over their own substitutability test -- which > they /need!/ -- then I can't imagine a purpose for it at all. (And > yes, those objects still do expose identity, but their equals() > implementation is consenting to have that identity "forgotten" at any > time just by round-tripping it through some collection etc.) > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Wed Nov 3 17:23:20 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 3 Nov 2021 10:23:20 -0700 Subject: [External] : Re: Consolidating the user model In-Reply-To: <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> Message-ID: On Wed, Nov 3, 2021 at 9:02 AM John Rose wrote: > One way to thicken this thin argument is to say that Point is not really a class. > It's a primitive. Then it still has a value-set inclusion relation to Object, but it's > not a sub-class of Object. It is a value-set subtype. I would spin it like this: `Point` absolutely is a class. But its instances are *values* (like ints and references are, but compound), and values *are still not objects*. We've said at times we want to "make everything an object", but I think the unification users really care about is everything being a *class instance*. I think this fits neatly with the current design: `Point` has no supertypes*, not even `Object`, but `Point.ref` does. (*I mean "supertype" in the polymorphic sense, not the "has a conversion" sense or the "can inherit" sense. I don't know what the word is really supposed to mean. :-)) > - !(Point *is a* Object) & (Point *has a* Object box) > - Point does not (cannot) inherit methods from Object > - Point can *execute* methods from Object, but only after value-set mapping I'm a little fuzzy on what these accomplish for us, can you spell it out a bit?
It sounds like a special rule treating Object methods differently from other supertype methods (?), which would be nice to not need. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From john.r.rose at oracle.com Wed Nov 3 17:58:29 2021 From: john.r.rose at oracle.com (John Rose) Date: Wed, 3 Nov 2021 17:58:29 +0000 Subject: Consequences of null for flattenable representations Message-ID: As we just discussed in the EG, allowing null to co-exist with flattenable representations is a challenge. It is one we have in the past tried to avoid, but the very legitimate needs for (what we now call) reference semantics for all of Bucket 2 and some of Bucket 3 require us to give null a place at the table, even while continuing to aim at flattening nullable values, when possible. A good example of this is Optional, migrated from a Bucket 1 *value-based class* to a proper Bucket 2 *reference-based primitive*. (See that tricky change in POV?) Another example to keep in mind is the reference projection of a Bucket 3 type such as Complex.ref or Point.ref. The simplest way to support null is just to do what we do today, and buffer on the heap, with the option of a null reference instead of a reference to a boxed value. (We call such things "buffers" rather than "boxes" simply because, unlike int/Integer, the type of thing that's in the box might not be denotably different from the type of the "box" itself.) The next thing to do is inject a *pivot field* into the flattened layout of the primitive object. When this invisible field contains all zero bits, the flattened object encodes a null. All the other bits are either ignorable or must be zero, depending on what you are trying to do. This idea splits into two directions: How to work with "pivoted" non-null values, and how to represent the pivot efficiently. Both lines of thought are more or less required exercises, once you allow null its place at the table.
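To make the pivot-field idea concrete, here is an ordinary-Java simulation of one flattened, nullable `Point` slot. `PointSlot`, `pivot`, and the record `Point` are hypothetical names; a real JVM would hide the pivot inside the layout rather than expose it as a field, and this sketch takes the "other bits must be zero" option when storing null.

```java
public class PivotSlotDemo {
    record Point(int x, int y) {}

    // Simulates one flattened, nullable Point slot: the payload fields
    // plus an explicit pivot byte. All-zero bits (pivot == 0) encode null.
    static final class PointSlot {
        private int x, y;
        private byte pivot; // 0 = null, 1 = non-null ("pivoted") value

        void store(Point p) {
            if (p == null) {
                x = 0; y = 0; pivot = 0;  // zero everything: encodes null
            } else {
                x = p.x(); y = p.y(); pivot = 1;
            }
        }

        Point load() {
            return pivot == 0 ? null : new Point(x, y);
        }
    }

    public static void main(String[] args) {
        PointSlot slot = new PointSlot();  // fresh memory is all zeros
        System.out.println(slot.load());   // null
        slot.store(new Point(0, 0));       // a genuine all-zero point
        System.out.println(slot.load());   // Point[x=0, y=0]
        slot.store(null);
        System.out.println(slot.load());   // null
    }
}
```

The genuine `Point(0, 0)` stays distinguishable from null only because storing it sets the pivot, which is the role the "initialvalue" discussion assigns to constructors.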
We know where null comes from (the null literal and aconst_null). Where do pivoted values come from? You need an original source of them for the initial value of "this" in the primitive constructor (a factory method at the bytecode level). Specifically, you need that bit pattern which is almost but not quite all zero bits; the pivot field is set to the "non-null" state but all other field values are zero. Then the constructor can get to work. This might be the job of an "initialvalue" bytecode, which is a repackaging of the "defaultvalue" bytecode. Given a suitable definition with suitable restrictions for initialvalue, a constructor uses a mix of initialvalue and withfield executions to get to its output state for "this". None of the intermediate states would be confusable with null. (We sometimes assumed, wrongly in hindsight, that doing this simply requires assigning "this" to null in the constructor and then special-casing withfield and maybe getfield to allow a null input and maybe a null output. But this is a thicket of tangles and irregularities, and it doesn't quite get rid of the need for a separate operation to actually set the pivot field. Basically, once null gets entrenched, defaultvalue has to turn into initialvalue, or so it appears to me at this moment.) Once the constructor returns a non-null set of bits, all subsequent assignments continue to separate null from non-null. That's true even for racy assignments, assuming that pivot field states are individually atomic, even if they race relative to other fields. (Race control might be important for Bucket 3 references like Complex.ref, if we ever try to flatten those. I'm digressing; my focus is to build out Bucket 2, which suppresses such races.) To allow Bucket 2 constructors control over their outputs, it follows that initialvalue (unlike its earlier version defaultvalue) must be restricted to those same contexts where withfield is allowed.
Either to constructors only (for the same class) or to the capsule (nest) of that class. OK, so how is the pivot field physically represented? Again, we have discussed this in years past, but I'll summarize some of the thinking: 1. It can be just a boolean, a byte or a packed bit that is made free somehow. A 65th bit to a 64-bit payload perhaps. This is sad, but also hard to get around when every single bitwise encoding in the existing layout already has a meaning. But the payload of the primitive type might use a field with "slack", aka unused bitwise encodings. We can pounce on this and use bit-twiddling to internally reserve the zero state, and declare that when that field is zero, it is the pivot field denoting null, and when it is non-zero it is doing its normal job. 2. If the language tells us, "yes I promise not to use the default value on this field", then maybe the JVM can do something with that promise. There are issues, but it's tempting for (say) a Rational type where the denominator is never zero. 3. More reliably, if the JVM knows that a field has unused encodings, it can just swap the all-zero state with some other state. People will immediately think of unused bits which can be flipped to true in the field when it is pivoted to non-null. It's better, IMO, to start out with the humble increment operator (rather than the bit-set operator) and work from there. As long as the encoding of all-one-bits is not taken, for a given field (true for booleans and managed pointers!) then the JVM can simply perform an unsigned non-overflowing increment when storing payload to the pivot field (preserving the non-zero invariant) and do a non-overflowing unsigned decrement when loading. I can just hear the GC folks groaning in the distance about such increments, on managed pointers. For them, a slightly less JIT-friendly operation might be preferable, to perform the increment (on store) only when the value is null, and vice versa on load, decrement only when 1.
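The increment trick in point 3 can be sketched on a plain `int` field, under the stated assumption that the payload never uses the all-one-bits pattern (`0xFFFFFFFF`, i.e. `-1`). This only illustrates the arithmetic; a real JVM would apply it invisibly to managed pointers or other slack-bearing fields, and the class and method names are hypothetical.

```java
public class IncrementEncodingDemo {
    // Assumption: the payload never uses all-one-bits (0xFFFFFFFF == -1),
    // so payload + 1 can never wrap around to the reserved zero state.
    static final int UNUSED = -1;

    // Store: null becomes 0; any other payload is incremented by one.
    static int encode(Integer payload) {
        if (payload == null) return 0;
        if (payload == UNUSED) throw new IllegalArgumentException("reserved encoding");
        return payload + 1; // unsigned increment; never produces 0
    }

    // Load: 0 decodes to null; anything else is decremented by one.
    static Integer decode(int stored) {
        return stored == 0 ? null : stored - 1;
    }

    public static void main(String[] args) {
        System.out.println(encode(null));        // 0: the null encoding
        System.out.println(encode(0));           // 1
        System.out.println(decode(encode(0)));   // 0
        System.out.println(decode(0));           // null
        System.out.println(decode(encode(-2)));  // -2 (stored as all-one-bits)
    }
}
```

Because the payload's own zero is stored as `1`, an all-zero slot can only mean null, which is exactly the invariant the pivot needs.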
Or use bit twiddling in the low bits of the pointer. Or use all-one-bits as the "payload null" which is distinct from the "pivot is zero" state. I think the JIT and GC folks can come to an agreement, in any given JVM. When the JIT people groan back about weirdo encodings of managed pointers, we can gently tell them, "it's just another flavor of managed pointer transcoding, a problem we solved when we went to compressed oops." (On balance, I think the GC should define a small family of "quasi-null sentinel values" which can be easily stored into any managed pointer for ad hoc purposes like this and others. Others would be at least 1. an Optional::isEmpty state for optionals *which are null-friendly* and 2. a distinction between null and unbound, for lazy variables which are also null-friendly. Neither of these exists today, of course, and none of these hypothetical sentinels would ever be visible to normal Java code.) My point is that we don't have to just slap a boolean on everything. In particular, when migrating ju.Optional to Bucket 2, we can preserve its very attractive one-field representation by invisibly assigning a bad managed pointer value to encode Optional::isEmpty. No Java code changes are needed (or desired) to pull this off, just the increment hack sketched above, or one of its variations. Even Bucket 3 references could be encoded in this way, if and when we desire to. That is, whatever JVM algorithm constructs a pivot field and its logic could be pointed at a Bucket 3 reference projection, if we think this would be desirable. One result would be that Map.get, which returns T.ref, could avoid buffering on the heap. N.B. This assumes stuff we don't have yet, to specialize Map::get to a particular flattenable type. I hope we will get there.
John From daniel.smith at oracle.com Wed Nov 3 18:04:51 2021 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 3 Nov 2021 18:04:51 +0000 Subject: Equality operator for identityless classes In-Reply-To: References: <160ba720-56af-8657-5afb-b94c6da45088@oracle.com> Message-ID: On Nov 3, 2021, at 9:58 AM, Kevin Bourrillion wrote: Today, things are pretty okay because developers can learn that `==` is a code smell. A responsible code reviewer has to think through each one like this: 1. Look up the type. Is it a builtin, or Class? Okay, we're fine. 2. Is it an enum? Okay, I resent having to go look it up when they could have just used switch, but fine. 3. Wait, is this weird code that actually cares about objects instead of what they represent? This needs a comment. The problem is that now we'll be introducing a whole class of ... classes ... for which `==` does something reasonable: only the ones that happen to contain no references, however deeply nested! These cannot at all be easily distinguished. This is giving bugs a really fantastic way to hide. I'm not sure about this leap: while it's true that `==` is sometimes equivalent to `equals`, in general, you can't be sure without deep knowledge about the class. As a coding convention, seems reasonable to me to continue to expect clients to use `equals` rather than trying to develop a finer-grained distinction between different classes. I think it's perfectly fine advice for most code to continue to treat `==` as a smell, like they always have.
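The advice to keep treating `==` as a smell has a small-scale analog in today's boxed integers, where `==` sometimes "happens to work". The sketch below relies only on the documented guarantee that `Integer.valueOf` caches -128 to 127; whether two separately boxed 1000s are `==` is deliberately not promised, which is the kind of fragility a reviewer has to reason about. The helper names are hypothetical.

```java
public class EqualsSmellDemo {
    // Identity comparison on boxes: what `==` does on references today.
    static boolean sameIdentity(Integer a, Integer b) {
        return a == b;
    }

    // Value comparison: what such code usually means.
    static boolean sameValue(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Integer.valueOf always caches -128..127, so these boxes are identical.
        System.out.println(sameIdentity(Integer.valueOf(127), Integer.valueOf(127))); // true

        // Outside that range the result depends on the JVM's cache configuration,
        // so no particular output is promised for this line.
        System.out.println(sameIdentity(Integer.valueOf(1000), Integer.valueOf(1000)));

        // equals() compares what the boxes represent, regardless of caching.
        System.out.println(sameValue(Integer.valueOf(1000), Integer.valueOf(1000))); // true
    }
}
```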
From john.r.rose at oracle.com Wed Nov 3 18:07:55 2021 From: john.r.rose at oracle.com (John Rose) Date: Wed, 3 Nov 2021 18:07:55 +0000 Subject: [External] : Re: Consolidating the user model In-Reply-To: References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> Message-ID: On Nov 3, 2021, at 10:23 AM, Kevin Bourrillion wrote: I think this fits neatly with the current design: `Point` has no supertypes*, not even `Object`, but `Point.ref` does. (*I mean "supertype" in the polymorphic sense, not the "has a conversion" sense or the "can inherit" sense. I don't know what the word is really supposed to mean. :-)) Slippery terms. "Type" is hopelessly broad, as is "super type". For types as value sets, a super type is a value super set. Again, int <: long in this view, and even in the JLS. For types as in an object hierarchy, a super type is a parent+ type, an upper limit in the hierarchy lattice. That view centers on object polymorphism and virtual methods, and is suspiciously bound up with pointer polymorphism. So String <: Object in this view. To heal the rift we are groping towards int <: Object, but we don't fully know which kind of "<:" that is, and how it breaks down into a value set super, an object hierarchy super, or perhaps something further. The best view we have so far, IMO, is that int <: Object breaks apart into int <: int.ref (value set) and int.ref <: Object (hierarchy). In that view, the last link of int <: int.ref requires a story of how methods "inherit" across value sets, without the benefit of a pointer-polymorphic hierarchy to inherit within. It's doable, but we are running into the sub-problems of this task.
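Both readings of `<:` separated above are already visible in today's Java. In the sketch below, `int` to `long` is a widening primitive conversion (a value-set relationship), `String` to `Object` is a hierarchy relationship, and `int` to `Object` goes through the `Integer` box, the current analog of the proposed `int <: int.ref <: Object` split. `Class.isAssignableFrom` reports only the hierarchy kind. The helper names are hypothetical.

```java
public class SubtypeKindsDemo {
    // Value-set subtyping: widening primitive conversion int -> long.
    static long widen(int i) {
        return i;
    }

    // int -> Object today: boxing int -> Integer, then hierarchy Integer -> Object.
    static Object toObject(int i) {
        return i;
    }

    public static void main(String[] args) {
        System.out.println(widen(42));                                 // 42
        System.out.println(toObject(42).getClass() == Integer.class);  // true

        // Reflection models only the hierarchy kind of "<:".
        System.out.println(Object.class.isAssignableFrom(String.class)); // true
        System.out.println(long.class.isAssignableFrom(int.class));      // false
        System.out.println(Object.class.isAssignableFrom(int.class));    // false
    }
}
```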
From kevinb at google.com Wed Nov 3 18:23:37 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 3 Nov 2021 11:23:37 -0700 Subject: Equality operator for identityless classes In-Reply-To: References: <160ba720-56af-8657-5afb-b94c6da45088@oracle.com> Message-ID: On Wed, Nov 3, 2021 at 11:05 AM Dan Smith wrote: I'm not sure about this leap: while it's true that `==` is sometimes > equivalent to `equals`, in general, you can't be sure without deep > knowledge about the class. As a coding convention, seems reasonable to me > to continue to expect clients to use `equals` rather than trying to develop > a finer-grained distinction between different classes. I think it's > perfectly fine advice for most code to continue to treat `==` as a smell, > like they always have. > That is the "hygienic" line that we've been sorta-holding inside Google with modest success. And I think it's the direction of gravity among static analysis tools and really good style guides etc. But it's a pretty hard line to hold as it stands, because the visceral appeal of `==` is just too strong, and `!=` much stronger yet. I think it would get near-impossible once there is a proliferation of user-defined identityless classes where `==` "happens to be safe". We'd plead the case that "sure, they're safe now, but this kind of unsafety is viral, so it's fragile", yadda yadda, but who knows. (One thing that has maybe helped us hold the line is that the most common cases are enums, and we get to say "eh, `switch` is better anyway". So I guess it's worth noting that at least other types supporting pattern matching will have this same escape valve. That said, switch with only one arm is a tough pill for people to swallow (I don't recall if `instanceof` does or means to support *all* such cases or not).) -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From daniel.smith at oracle.com Wed Nov 3 18:24:19 2021 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 3 Nov 2021 18:24:19 +0000 Subject: Consolidating the user model In-Reply-To: References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> Message-ID: <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com>

On Nov 3, 2021, at 11:23 AM, Kevin Bourrillion wrote:

On Wed, Nov 3, 2021 at 9:02 AM John Rose wrote:

> One way to thicken this thin argument is to say that Point is not really a class.
> It's a primitive. Then it still has a value-set inclusion relation to Object, but it's
> not a sub-class of Object. It is a value-set subtype.

I would spin it like this: `Point` absolutely is a class. But its instances are values (like ints and references are, but compound), and values are still not objects.

We've said at times we want to "make everything an object", but I think the unification users really care about is everything being a class instance.

I think this fits neatly with the current design: `Point` has no supertypes*, not even `Object`, but `Point.ref` does.

(*I mean "supertype" in the polymorphic sense, not the "has a conversion" sense or the "can inherit" sense. I don't know what the word is really supposed to mean. :-))

These sorts of explanations make me uncomfortable: that a Point stored in a reference isn't really a Point anymore, but a "box" or something like that.

The problem is that you want to say that the Point gets converted to some other thing, yet that other thing:
- is == to the original
- provides the exact same API as the original
- has the exact same behaviors as the original
- works exactly like a class declared with original class's declaration

If you're telling people that when you assign a Point to type Object, they now have something other than a Point, they're going to want to *see* that somehow. And of course they can't, because the box is a fiction.
The reference vs. value story that we developed to address these problems (and problems that arise when you *do* let people "see" a real box) carries the right intuitions: you can handle a Point by value or by reference, but either way it's the exact same object, so of course everything you do with it will work the same.

From brian.goetz at oracle.com Wed Nov 3 18:34:52 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 3 Nov 2021 14:34:52 -0400 Subject: [External] : Re: Consolidating the user model In-Reply-To: References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> Message-ID:

There's lots of great stuff on subtyping in chapters 15 and 16 of TAPL (esp 15.6, "Coercion semantics"), which might be helpful. But as a tl;dr, I would suggest treating subtyping strictly as an is-a relation within our nominal type system. By this interpretation, int On Nov 3, 2021, at 10:23 AM, Kevin Bourrillion wrote:
>>
>> I think this fits neatly with the current design: `Point` has no
>> supertypes*, not even `Object`, but `Point.ref` does.
>>
>> (*I mean "supertype" in the polymorphic sense, not the "has a
>> conversion" sense or the "can inherit" sense. I don't know what the
>> word is really supposed to mean. :-))
>
> Slippery terms. "Type" is hopelessly broad as is "super type".
>
> For types as value sets, a super type is a value super set.
> Again, int <: long in this view, and even in the JLS.
>
> For types as in an object hierarchy, a super type is a parent+
> type, an upper limit in the hierarchy lattice. That view
> centers on object polymorphism and virtual methods,
> and is suspiciously bound up with pointer polymorphism.
> So String <: Object in this view.
>
> To heal the rift we are groping towards int <: Object, but
> we don't fully know which kind of "<:" that is, and how
> it breaks down into a value set super, an object hierarchy
> super, or perhaps something further.
> The best view we
> have so far, IMO, is that int <: Object breaks apart into
> int <: int.ref (value set) and int.ref <: Object (hierarchy).
> In that view, the last link of int <: int.ref requires a
> story of how methods "inherit" across value sets,
> without the benefit of a pointer-polymorphic hierarchy
> to inherit within. It's doable, but we are running
> into the sub-problems of this task.
>

From kevinb at google.com Wed Nov 3 19:00:20 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 3 Nov 2021 12:00:20 -0700 Subject: identityless objects and the type hierarchy Message-ID:

Okay, let's stick a pin in proper-value-types (i.e. try to leave them out of this discussion) for a moment...

One question is whether the existing design for the bifurcated type hierarchy will carry right over to this split instead. (My understanding of that design is: every (non-Object) concrete class will implement exactly one of two disjoint interfaces, explicitly or not.)

My first thoughts were that the situation is different here: exposed identity seems to be strictly (maybe?) contractually stronger than no exposed identity. So here, a class being "noncommittal" *ought to* look the same as it being identityless. In theory, it should be harmless for an identity class to extend an identityless class (while the opposite direction is a problem).

So, first, is that even right?

Next, even if so, the Backward Default Problem strikes again. To make a class identityless you would seem to need all your *supertypes* to be, first! That's hard to pull off. And `Object` itself would seem to want to be marked identityless, which is obviously weird/problematic.

So I think we are forced back to a tripartite model (somewhat like we are having to do with nullness, but probably closer to what we'll have to do after nullness for `@OkayToIgnoreReturnValue`).

    "intentionally identityful"
    is-stronger-than
    "intentionally identityless"
    is-stronger-than
    "unknown so will be *presumed* identityful unless otherwise specified"

It's possible that would put us straight back to where this email started. But this all smells rotten, like it demands we find a simpler way to think about it (which you may already know, and I'm just missing it so far).

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Wed Nov 3 19:24:13 2021 From: john.r.rose at oracle.com (John Rose) Date: Wed, 3 Nov 2021 19:24:13 +0000 Subject: [External] : Re: Consolidating the user model In-Reply-To: References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> Message-ID: <21F5CF86-2C4B-4ABD-97FC-AC607527EFF6@oracle.com>

On Nov 3, 2021, at 11:34 AM, Brian Goetz wrote:
>
> There's lots of great stuff on subtyping in chapters 15 and 16 of TAPL (esp 15.6, "Coercion semantics"), which might be helpful. But as a tl;dr, I would suggest treating subtyping strictly as an is-a relation within our nominal type system. By this interpretation, int cp = point.getClass(); //Point
    Class ci = anint.getClass(); //Integer (aka int.ref)

but this:

    Class cp = point.getClass().valueType(); //Point
    Class ci = anint.getClass().valueType(); //int

or else this:

    Class cp = point.getClass().referenceType(); //Point.ref
    Class ci = anint.getClass().referenceType(); //Integer

In other words, if the rift between Integer and Point is not completely healed, users can probably work around the problems.

From brian.goetz at oracle.com Wed Nov 3 19:42:55 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 3 Nov 2021 15:42:55 -0400 Subject: identityless objects and the type hierarchy In-Reply-To: References: Message-ID: <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com>

On 11/3/2021 3:00 PM, Kevin Bourrillion wrote:
> Okay, let's stick a pin in proper-value-types (i.e.
try to leave them
> out of this discussion) for a moment...
>
> One question is whether the existing design for the bifurcated type
> hierarchy will carry right over to this split instead. (My
> understanding of that design is: every (non-Object) concrete class
> will implement exactly one of two disjoint interfaces, explicitly or not.)
>
> My first thoughts were that the situation is different here: exposed
> identity seems to be strictly (maybe?) contractually stronger than no
> exposed identity. So here, a class being "noncommittal" /ought to/
> look the same as it being identityless. In theory, it should be
> harmless for an identity class to extend an identityless class (while
> the opposite direction is a problem).
>
> So, first, is that even right?

We went back and forth on this a few times. A useful lens is to ask: how might we depend on reflecting identity-{ful,less}ness in the hierarchy? These include:

    if (x instanceof IdentityObject) { ... }
    void m(IdentityObject o) { ... }
    m(T t) { ... }

It is worth noting that the first is invertible (we can negate the condition) but the latter two are not. Which is another way to say that, if anyone, anywhere, might want to write code that *requires* no identity, then we should consider giving them a way to do it. (Ideally, if you're planning on (say) synchronizing on a parameter, you should engage the type system to ensure that an identityful object is passed; this is a good use of the type system.)

> Next, even if so, the Backward Default Problem strikes again. To make
> a class identityless you would seem to need all your /supertypes/ to
> be, first! That's hard to pull off. And `Object` itself would seem to
> want to be marked identityless, which is obviously weird/problematic.

The superclass chain is tricky, but we've spent a lot of time shaking this box. Some types are _identity-agnostic_.
These include interfaces that do not extend PrimitiveObject, abstract classes that meet some set of conditions, and Object. The supertypes of a primitive class (and of an identity-agnostic class) must be identity-agnostic.

This is powerful. For example, an interface could extend IdentityObject, which would effectively prohibit primitive classes from implementing it. This is a way to signal "my (concrete) subtypes need identity."

From kevinb at google.com Wed Nov 3 21:40:58 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 3 Nov 2021 14:40:58 -0700 Subject: Consolidating the user model In-Reply-To: <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> Message-ID:

Certainly possible I'm just misunderstanding something. (I don't *think* I am...)

On Wed, Nov 3, 2021 at 11:24 AM Dan Smith wrote:
> On Nov 3, 2021, at 11:23 AM, Kevin Bourrillion wrote:
>
> On Wed, Nov 3, 2021 at 9:02 AM John Rose wrote:
>
> > One way to thicken this thin argument is to say that Point is not really
> a class.
> > It's a primitive. Then it still has a value-set inclusion relation to
> Object, but it's
> > not a sub-class of Object. It is a value-set subtype.
>
> I would spin it like this: `Point` absolutely is a class. But its
> instances are *values* (like ints and references are, but compound), and
> values *are still not objects*.
>
> We've said at times we want to "make everything an object", but I think
> the unification users really care about is everything being a *class
> instance*.
>
> I think this fits neatly with the current design: `Point` has no
> supertypes*, not even `Object`, but `Point.ref` does.
>
> (*I mean "supertype" in the polymorphic sense, not the "has a conversion"
> sense or the "can inherit" sense. I don't know what the word is really
> supposed to mean.
:-))
>
>
> These sorts of explanations make me uncomfortable: that a Point stored in a
> reference isn't really a Point anymore, but a "box" or something like that.
>

Yes exactly. I will be talking about why I think it's probably *good* to think of it as a box.

> The problem is that you want to say that the Point gets converted to some
> other thing, yet that other thing:
> - is == to the original
>

I would hope that's already true of int==Integer?

> - provides the exact same API as the original
> - has the exact same behaviors as the original
>

Agreed that Point and Point.ref are different types that have the same members/features.

One-class-multiple-types is not entirely without precedent (though, sure, List and List and List don't have *exactly* the same API).

Once you accept that they're different types, then the fact they have the same API is just convenient.

> - works exactly like a class declared with original class's declaration
>

It's the same class. There's only one class.

(There are two java.lang.Classes, because what that type models is not "class", it's something more like "an erased type or void".)

> If you're telling people that when you assign a Point to type Object, they
> now have something other than a Point, they're going to want to *see* that
> somehow. And of course they can't, because the box is a fiction.
>

What would they want to see? What is there to see about an object? Maybe its header, its dynamic type -- and uh, those things must be there, right? Because how could I use it polymorphically otherwise. I'm not sure what else would be meant by "seeing" the thing.

Fictions are great things when they don't leak. I don't see the leak here yet.

I'll attempt to flip this around on you. :-) You say that a *value* of type Point is also already an "object". But then where is its header, its dynamic type? Objects have that. For whatever reason this seemed like the more conspicuous leak to me.

> The reference vs.
value story that we developed to address these problems
> (and problems that arise when you *do* let people "see" a real box) carries
> the right intuitions: you can handle a Point by value or by reference, but
> either way it's the exact same object, so of course everything you do with
> it will work the same.
>

I'm claiming this picture makes explaining the feature harder, unnecessarily. An unhoused value floating around somewhere that I can somehow have a reference to strikes me as quite exotic. Tell me it's just an object and I feel calmer.

But I'll write a more proper explanation of why I think this is the wrong retcon for "object".

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From daniel.smith at oracle.com Wed Nov 3 23:05:52 2021 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 3 Nov 2021 23:05:52 +0000 Subject: [External] : Re: Consolidating the user model In-Reply-To: References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> Message-ID: <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>

On Nov 3, 2021, at 3:40 PM, Kevin Bourrillion wrote:

The problem is that you want to say that the Point gets converted to some other thing, yet that other thing:
- is == to the original

I would hope that's already true of int==Integer?

Formally, you can't literally compare an int with an Integer. All comparisons between a boxed Integer and an int have to decide if they're primitive comparisons, reference comparisons, or illegal, based on some rather complex conversions and disambiguation rules. At runtime, if the types you use result in a reference comparison, the answer depends on quirks of the interning logic.

Informally, whatever path you take, where boxed Integers are involved, == is unreliable, because you may indeed be comparing two different objects that happen to have been derived from the same number.
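Dan's point about `==` on boxed Integers can be seen in today's Java. This sketch (class and method names are illustrative only) shows three cases: small boxes that the JLS requires to be identical, larger boxes where the result depends on the JVM's box cache, and a mixed `int`/`Integer` comparison that unboxes and compares numerically.

```java
public class BoxedEquality {

    // Boxing an int in the range -128..127 is required (JLS 5.1.7) to yield
    // the same Integer object each time, so == happens to "work" here.
    static boolean smallBoxesIdentical() {
        Integer a = 127;
        Integer b = 127;
        return a == b;
    }

    // Outside that range the JLS promises nothing either way; the answer
    // depends on the JVM's box cache (for example -XX:AutoBoxCacheMax).
    static boolean largeBoxesIdentical() {
        Integer a = 1000;
        Integer b = 1000;
        return a == b;
    }

    // When one operand is a primitive int, == unboxes the Integer and
    // compares numerically, so this is reliably true.
    static boolean mixedComparison() {
        int i = 1000;
        Integer j = 1000;
        return i == j;
    }

    public static void main(String[] args) {
        System.out.println(smallBoxesIdentical());               // true
        System.out.println(largeBoxesIdentical());               // typically false, not guaranteed
        System.out.println(mixedComparison());                   // true
        System.out.println(Integer.valueOf(1000).equals(1000));  // true
    }
}
```

The "happens to be safe" behavior of the first case is exactly the kind of fragile result that makes style guides treat reference `==` on boxes as a smell.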
Now, if we kept `int` and `Integer` as distinct things, but turned `Integer` into an identity-free class, I suppose it's true that you wouldn't be able to tell whether two boxes were distinct or not, because == would always be true. (More properly, "are these distinct boxes with the same payload?" would be a malformed question to ask, because it presumes identity.) So, okay: to be fair to these reimagined boxes, I'll stipulate that they are identity-free, and thus indistinguishable with ==. - provides the exact same API as the original - has the exact same behaviors as the original Agreed that Point and Point.ref are different types that have the same members/features. One-class-multiple-types is not entirely without precedent (though, sure, List and List and List don't have exactly the same API). Once you accept that they're different types, then the fact they have the same API is just convenient. - works exactly like a class declared with original class's declaration It's the same class. There's only one class. (There are two java.lang.Classes, because what that type models is not "class", it's something more like "an erased type or void" .) Is your model that, where there are n possible Points, there are in fact 2n instances of class Point, where half of them are "values" and half of them are "boxes"? I would find that pretty confusing, but I'm not sure it's what you mean. I would want to be able to somehow distinguish which subset an instance belonged to. Or is it your model that, when you convert a value to a box, the two things are the same class instance, just manifested or encoded differently? That's actually not that far from the model we've described, which is that it's the same instance, just *viewed* or *accessed* differently. Those are different verbs, and so the models might not be interchangeable, but they're close. 
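Kevin's "one-class-multiple-types" precedent, and his remark that `java.lang.Class` models something closer to an erased type, can both be observed in current Java. In this sketch (names are illustrative), two parameterizations of `List` share a single runtime class, while `int` and `Integer` have two distinct `Class` objects. It says nothing about how Valhalla would represent `Point` and `Point.ref`.

```java
import java.util.ArrayList;
import java.util.List;

public class OneClassManyTypes {

    // Two different static types (List<String>, List<Integer>) share one
    // runtime class, because type arguments are erased.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // The primitive int and its box Integer, by contrast, have two distinct
    // Class objects; Integer.TYPE is the same object as int.class.
    static boolean primitiveAndBoxDistinct() {
        return int.class != Integer.class && Integer.TYPE == int.class;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());         // true
        System.out.println(primitiveAndBoxDistinct());  // true
        System.out.println(int.class.isPrimitive());    // true
    }
}
```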
If you're telling people that when you assign a Point to type Object, they now have something other than a Point, they're going to want to *see* that somehow. And of course they can't, because the box is a fiction.

What would they want to see? What is there to see about an object? Maybe its header, its dynamic type -- and uh, those things must be there, right? Because how could I use it polymorphically otherwise. I'm not sure what else would be meant by "seeing" the thing.

I think my intuitions about boxes tie heavily to 'getClass' behavior (or some analogous reflective operation). "What are you?" should give me different answers for a bare value and a box. A duck in a box is not the same thing as a duck.

The analogy here would be that Integer.getClass() returns Integer.class, while int.getClass(), if it existed, would return int.class.

I might want to write code like:

    void m(T arg) {
        if (arg.getClass() == Point.class) System.out.println("I'm a value!");
        else System.out.println("I'm a box!");
    }

But this isn't the runtime behavior we would intend to support, because in fact at runtime there are no boxes to reflect.

I'll attempt to flip this around on you. :-) You say that a value of type Point is also already an "object". But then where is its header, its dynamic type? Objects have that. For whatever reason this seemed like the more conspicuous leak to me.

The value type/reference type model is that you can operate on an object directly, or by reference. It's the same object either way. Reference conversion just says "take this object and give me a reference to it". Nothing about the object itself changes.

The details of object encoding are deliberately left out of the model, but it's perfectly fine for you to imagine a header and a dynamic type carried around with the object always, both when accessed as a value and when accessed via a reference.
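For comparison with Dan's hypothetical `m(T arg)`, here is what the same question gets answered with in current Java, using `int` in place of `Point`. This is a sketch with illustrative names: because a type variable ranges only over reference types, an `int` argument is boxed at the call site, so the "value" branch can never be taken today.

```java
public class WhatAreYou {

    // A type variable ranges over reference types only, so an int argument is
    // boxed at the call site and getClass() reports the box class.
    static <T> Class<?> runtimeClassOf(T arg) {
        return arg.getClass();
    }

    // Analogue of the m(T arg) sketch, with int standing in for Point.
    static <T> String describe(T arg) {
        return arg.getClass() == int.class ? "I'm a value!" : "I'm a box!";
    }

    public static void main(String[] args) {
        System.out.println(runtimeClassOf(42));  // class java.lang.Integer
        System.out.println(describe(42));        // I'm a box!
    }
}
```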
(It is, I suppose, part of the model that objects of a given class all have a finite, matching layout when accessed by value, even if the details of that layout are kept abstract. Which is why value types are monomorphic and you need reference types for polymorphism.) The fact that the VM often discards object headers at runtime is a pure optimization. I'm claiming this picture makes explaining the feature harder, unnecessarily. An unhoused value floating around somewhere that I can somehow have a reference to strikes me as quite exotic. Tell me it's just an object and I feel calmer. Yes, it's just an object. :-) But not quite how you mean. The new feature here is working with objects *directly*, without references. I think one thing you're struggling with is that your concept of "object" includes the reference, and if we take that away, it doesn't quite seem like an object anymore. From kevinb at google.com Thu Nov 4 00:19:04 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 3 Nov 2021 17:19:04 -0700 Subject: [External] : Re: Consolidating the user model In-Reply-To: <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> Message-ID: (Note that shipping bucket-2 shouldn't require us to agree on any of this stuff below.) On Wed, Nov 3, 2021 at 4:06 PM Dan Smith wrote: > - provides the exact same API as the original >> - has the exact same behaviors as the original >> > > Agreed that Point and Point.ref are different types that have the same > members/features. > > One-class-multiple-types is not entirely without precedent (though, sure, > List and List and List don't have *exactly* the same API). > > Once you accept that they're different types, then the fact they have the > same API is just convenient. 
>
>
> - works exactly like a class declared with original class's declaration
>>
>
> It's the same class. There's only one class.
>
> (There are two java.lang.Classes, because what that type models is not
> "class", it's something more like "an erased type or void".)
>
>
> Is your model that, where there are n possible Points, there are in fact
> 2n instances of class Point, where half of them are "values" and half of
> them are "boxes"?
>

... Yes... but it's an odd way to put it; I'll explain.

The model I'm speaking for says that values and objects are two different and disjoint kinds of things. So there are n possible Point values (according to !=) and there are n corresponding possible Point.ref objects (according to !=). But I wouldn't have put the numbers together into one number "2n", because I don't think there's anything a program could actually count that would turn up that answer. (It's a biiiit like asking "how many continents or cardinal directions are there?" and I just answer "11".)

Sorry, I belabored that point a bit overmuch.

> I would find that pretty confusing, but I'm not sure it's what you mean. I
> would want to be able to somehow distinguish which subset an instance
> belonged to.
>

I don't see what distinguishing there is to do? You always definitively either have a value or you have a reference value pointing to some object. You know that before there's even any question to ask... right?

> Or is it your model that, when you convert a value to a box, the two things
> are the same class instance, just manifested or encoded differently?
>
> That's actually not that far from the model we've described, which is that
> it's the same instance, just *viewed* or *accessed* differently. Those are
> different verbs, and so the models might not be interchangeable, but
> they're close.
>
> If you're telling people that when you assign a Point to type Object, they
>> now have something other than a Point, they're going to want to *see* that
>> somehow.
And of course they can't, because the box is a fiction. >> > > What would they want to see? What is there to see about an object? Maybe > its header, its dynamic type -- and uh, those things must be there, right?. > because how could I use it polymorphically otherwise. I'm not sure what > else would be meant by "seeing" the thing. > > > I think my intuitions about boxes tie heavily to 'getClass' behavior (or > some analogous reflective operation). "What are you?" should give me > different answers for a bare value and a box. A duck in a box is not the > same thing as a duck. > > The analogy here would be that Integer.getClass() returns Integer.class, > while int.getClass(), if it existed, would return int.class. > So far so good. If `int.getClass()` has to work at all, it might as well produce `int.class`, though it serves no actual purpose and we would just refactor it to `int.class` anyway. If `int.getClass()` won't even compile, it would be no great loss at all. The method exists for finding the dynamic type of an object; my model says "values are not objects and so have no dynamic type", which I think is good. I might want to write code like: > > void m(T arg) { > if (arg.getClass() == Point.class) System.out.println("I'm a value!"); > else System.out.println("I'm a box!"); > } > Someone might think this, but they can just ask themselves whether `int/Integer` work like that. They don't, so this doesn't either. This is one example of why users can *keep* almost everything they already know about `int/Integer`. But this isn't the runtime behavior we would intend to support, because in > fact at runtime there are no boxes to reflect. > > I'll attempt to flip this around on you. :-) You say that a *value* of > type Point is also already an "object". But then where is its header, its > dynamic type? Objects have that. For whatever reason this seemed like the > more conspicuous leak to me. 
> > The value type/reference type model is that you can operate on an object > directly, or by reference. It's the same object either way. > I will be writing out my argument for why this is nonsense. :-) Not meant to sound rude (I didn't know it to be nonsense myself a month ago). > Reference conversion just says "take this object and give me a reference > to it". Nothing about the object itself changes. > > The details of object encoding are deliberately left out of the model, but > it's perfectly fine for you to imagine a header and a dynamic type carried > around with the object always, both when accessed as a value and when > accessed via a reference. > Huh. It seems to me very important to understand that when I use Point (not Point.ref) there is no header involved. Values are not self-describing, which is a big part of their appeal! This no-header fact is also what explains to me why values have to be not just layout-monomorphic (as you mention next) but entirely, strictly monomorphic. (It is, I suppose, part of the model that objects of a given class all have > a finite, matching layout when accessed by value, even if the details of > that layout are kept abstract. Which is why value types are monomorphic and > you need reference types for polymorphism.) > > The fact that the VM often discards object headers at runtime is a pure > optimization. > > I'm claiming this picture makes explaining the feature harder, > unnecessarily. An unhoused value floating around somewhere that I can > somehow have a reference to strikes me as quite exotic. Tell me it's just > an object and I feel calmer. > > > Yes, it's just an object. :-) > > But not quite how you mean. The new feature here is working with objects > *directly*, without references. I think one thing you're struggling with is > that your concept of "object" includes the reference, and if we take that > away, it doesn't quite seem like an object anymore. > Not struggling. 
So there is a body of associations our users have with terms like "object", "class", "primitive", and so on. Many of them are even right (so far). But they can't all survive intact.

To best serve these users, we have to make a careful determination between which of those associations we deem are the *essential* ones and which are merely circumstantial or ancillary. And then we want for the essential portion to change as little as possible with the release of Valhalla. Users want to feel lots of stable ground underneath them -- and if it takes a bit of retraining to show them why it's stable ground, that's still a pretty good outcome; that is still better than having to tell them "just change what you think you know to this new thing."

You and I are just advocating for different places to make that cut, that's all. I think mine represents more "stable ground", but I accept the burden of argument here, and will keep working on it.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Thu Nov 4 01:34:52 2021 From: john.r.rose at oracle.com (John Rose) Date: Thu, 4 Nov 2021 01:34:52 +0000 Subject: [External] : Re: Consolidating the user model In-Reply-To: <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> Message-ID:

On Nov 3, 2021, at 4:05 PM, Dan Smith wrote:

(It is, I suppose, part of the model that objects of a given class all have a finite, matching layout when accessed by value, even if the details of that layout are kept abstract. Which is why value types are monomorphic and you need reference types for polymorphism.)

The fact that the VM often discards object headers at runtime is a pure optimization.
Let's see what happens if we say that (a) bare values have headers and (b) Object::getClass allows the user to observe part of the header contents. It follows then that the expression aPointVal.getClass() will show the contents of aPointVal's header, even if it is a compile-time constant.

    Point pv = new Point(42,42);  // "class Point" is the definition of Point
    assert pv.getClass() == Point.class;  // ok, that's certainly the class
    assert pv.getClass() != Point.ref.class;  // and it's not a ref, so good

That is all fine. There's a little hiccup when you "box" the point and get the same Class mirror even though the "header" is a very real-heap resident value now:

    Point.ref pr = pv;  // same object; now it's on the heap, though, with a real live heap header
    assert pr.getClass() == Point.class;  // same class, but...
    assert pr.getClass() != Point.ref.class;  // we suppress any distinction the heap header might provide

There's a bigger hiccup when you compare all that with good old int:

    int iv = 42;  // "class int" is NOT a thing, but "class Integer" is
    assert iv.getClass() != int.class;  // because int is not a class
    assert iv.getClass() == Integer.class;  // ah, there's the class!
    assert iv.getClass() == int.ref.class;  // this works differently from Point
    assert ((Object)iv).getClass() == pr.getClass();  // this should be true also, right?

And to finish out the combinations:

    int.ref ir = iv;  // same object; now it's on the heap, though, with a real live heap header
    assert ir.getClass() == Integer.class;  // same class
    assert ir.getClass() == int.ref.class;  // and this time it's a ref-class (only for classic primitives)
    assert ir.getClass() != int.class;

All this has some odd irregularities when you compare what Point does and what int does. And yet it's probably the least-bad thing we can do.

A bad response would be to follow the bad precedent of ir.getClass() == Integer.class off the cliff, and have pv.getClass() and pr.getClass() return Point.ref.class.
That way, getClass() only returns a ref. Get it, see, getClass() can only return reference types. The rejoinder (which Brian made to me when I aired it) is devastating: Point.class is the class, not Point.ref.class, and the method is named "get-class".

Another approach would be to fiddle with the definitions of val.getClass(), so as to align iv.getClass() with pv.getClass() with their non-ref types. But that still leaves pv.getClass() unaligned (in its non-ref-ness) with ir.getClass() (in its ref-ness). We still expect Point.class as the answer from *both* pr.getClass() and pv.getClass().

Or we could try to make the problem go away by simply outlawing (statically) instances of expr.getClass() that expose inconvenient answers. Such moves score high on the "Those Idiots" score card. And they still don't align the ref-ness of pr.getClass() vs. ir.getClass(). Maybe we only earn partial Idiot Points if we outlaw iv.getClass() but allow pv.getClass()? Same amount of seam, different shape of seam, IMO.

Another source of constraint is that we expect that up-casting anything to Object and then re-querying should not change the answer. (This is another way of saying that the header should stay the same whether it is in the heap or not.) It is one of the reasons that iv.getClass() should not return int.class.

    assert ((Object)pv).getClass() == pv.getClass();  // this should be true also, right?
    assert ((Object)pr).getClass() == pr.getClass();  // this should be true also, right?
    assert ((Object)iv).getClass() == iv.getClass();  // this should be true also, right?
    assert ((Object)ir).getClass() == ir.getClass();  // this should be true also, right?

This is an over-constrained problem. I don't know how to make it look more regular, and I think (after doing some more exhaustive analysis off-line) there aren't any other ideas we haven't examined. (I'm saying that partly in a superstitious hope that, having said it, someone will of course prove me wrong.)
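John's up-cast constraint can be checked against today's `int`/`Integer` pair. In current Java `iv.getClass()` does not compile, so the hypothetical answer `int.class` appears only as an argument below; names are illustrative. The sketch shows that once an `int` is up-cast to `Object` it already reports `Integer.class`, so defining `iv.getClass()` as `int.class` would break the invariant.

```java
public class UpcastInvariant {

    // The constraint: up-casting to Object and asking getClass() again
    // must give the same answer as asking the original expression.
    static boolean sameAfterUpcast(Object viaObject, Class<?> directAnswer) {
        return viaObject.getClass() == directAnswer;
    }

    public static void main(String[] args) {
        int iv = 42;
        Integer ir = iv;  // boxing; Integer is what the thread calls int.ref
        Object o = iv;    // boxing, then widening to Object

        System.out.println(o.getClass());                        // class java.lang.Integer
        System.out.println(sameAfterUpcast(ir, ir.getClass()));  // true
        // If iv.getClass() were defined to answer int.class, the same value
        // would report int.class directly but Integer.class once up-cast:
        System.out.println(sameAfterUpcast(o, int.class));       // false
    }
}
```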
> I'm claiming this picture makes explaining the feature harder, unnecessarily. An unhoused value floating around somewhere that I can somehow have a reference to strikes me as quite exotic. Tell me it's just an object and I feel calmer.

Yes, it's just an object. :-) But not quite how you mean. The new feature here is working with objects *directly*, without references. I think one thing you're struggling with is that your concept of "object" includes the reference, and if we take that away, it doesn't quite seem like an object anymore. The lack of "null" in the value set is a small but persistent hint that something has changed in the object representation.

We can double down on the model that a val-object has a header. It's not in the heap; it has a statically defined value; it exists (if at all) to assist with Object::getClass and the other methods as needed. It feeds getClass with the val-projection, not the ref-projection.

We are so sorry, Mr. int. You don't really pass as a primitive class. If an int has a header (on stack or on heap), it feeds getClass with the ref-projection Integer.class, not the val-projection int.class, because your class is Integer, a ref-type (one of 8 or 9 such types). It's a seam.

BTW, here's another look at the difference between Mr. int and Mr. Point:

var pv = new Point(42,42); // var infers Point (a val type)
assert new Point(42,42).getClass() == Point.class; // OK
//var pr = new Point.ref(42,42); // nope, Point.ref is not "class Point"
//assert new Point.ref(42,42).getClass() == Point.ref.class; // cannot ask this question
var ir = new Integer(42); // var infers Integer (a ref type)
assert new Integer(42).getClass() == Integer.class; // OK, but I don't like Integer as much as Point
//var iv = new int(42); // sorry, Mr. int, you don't get to play there
//assert new int(42).getClass() == int.class; // cannot ask this question

Did I get all the details right, Dan and Brian?
John

From kevinb at google.com Thu Nov 4 05:29:23 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 22:29:23 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com>
Message-ID:

On Tue, Nov 2, 2021 at 4:53 PM Brian Goetz wrote:

>> ## Reflection
>>
>> Earlier designs all included some non-intuitive behavior around reflection. What we'd like to do is align the user-visible types with reflection literals with descriptors, following the invariant that
>>
>> new X().getClass() == X.class
>
> Seems like part of the goal would be making it fit naturally with the current int/Integer relationship (of course, `42.getClass()` is uncommitted to any precedent).
>
> There's a nasty tension here. On the one hand, for B3 classes, it makes sense for b3.getClass() to yield the val mirror, but int.getClass() historically corresponds to the ref mirror (Object o = 3; o.getClass() == Integer.class.)

I'm confused at why there's any concern here. `anInt.getClass()` has never existed, so it can do anything. The code snippet you show is obviously boxing so of course the class is that of the box.

> You could argue that it doesn't make sense on the values, but surely it makes sense on their boxes. But it's a thin argument, since classes extend Object, and we want to treat values as objects (without appealing to boxing) for purposes of invoking methods, accessing fields, etc. So getClass() shouldn't be different.

Sorry for this: I'm not trying to push values-aren't-objects relentlessly in multiple threads, but I think what you said is a little off, and worth pushing back on no matter which model is the one we ship. "Invoking methods, accessing fields" really seem a lot like things you can do with *class instances*, and who cares if it's an "object" or not.
Consider two points: we call them "instance methods", not "object methods", and where do they come from? From a class. This seems to me to be at the heart of what classes and class instances are about. I don't find a reason to utter the word "object" while talking about this. When do I, then? Well....

You say "we want to treat values as objects for purposes of invoking methods." But I'm not sure you really want that. :-) *Objects* have dynamic dispatch, so a method call has to dereference and check the dynamic type and re-resolve what method to actually call. Values have none of that junk, just call the method. Right? I think that's significant.

What I think you mean (?) is that invoking methods needs to *work* for all kinds of class instances. But it works a bit *differently* for values vs. objects (in my parlance; in your current parlance, that's "it works differently for values vs. objects-except-values").

(If preparing to respond that methods on a static type that's final don't have dynamic dispatch either, meh, that's more simply understood as "of course you don't actually query when you already know the answer; that's just optimization". Whereas with the value type, the idea of this querying at all isn't even a thing.)

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com Thu Nov 4 06:54:56 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Nov 2021 23:54:56 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
Message-ID:

On Wed, Nov 3, 2021 at 6:35 PM John Rose wrote:

> Let's see what happens if we say that (a) bare values have headers and (b) Object::getClass allows the user to observe part of the header contents.
I'm asking specific questions below as best I can, but I must confess that I don't really follow what this thought experiment is trying to demonstrate overall.

> It follows then that the expression aPointVal.getClass() will show the contents of aPointVal's header, even if it is a compile-time constant.
>
> Point pv = new Point(42,42); // "class Point" is the definition of Point
> assert pv.getClass() == Point.class; // ok, that's certainly the class

(Header or not, that could just be a hardcoded synthetic method anyway? Have I been missing something big when I keep pointing out that this method would be patently useless? When static type is MyValueClass, result is MyValueClass.class, always, no?)

> assert pv.getClass() != Point.ref.class; // and it's not a ref, so good
>
> That is all fine. There's a little hiccup when you "box" the point and get the same Class mirror even though the "header" is a very real heap-resident value now:
>
> Point.ref pr = pv; // same object; now it's on the heap, though, with a real live heap header
> assert pr.getClass() == Point.class; // same class, but...

Why would we even want this? It would be very surprising/puzzling to me.

> assert pr.getClass() != Point.ref.class; // we suppress any distinction the heap header might provide
>
> There's a bigger hiccup when you compare all that with good old int:
>
> int iv = 42; // "class int" is NOT a thing, but "class Integer" is
> assert iv.getClass() != int.class; // because int is not a class

No matter: array types aren't classes either. (If they're treated as such internally, hats off to you folks, because that bit of trivia basically never leaks, except perhaps for the particular misnamed-in-retrospect method/type/literal trio we're talking about here. And that's great. Either way, array types aren't classes, and `getClass` means "get runtime type" (for a reasonable definition thereof), and ergo, I'd guess that assertion and the next below to fail.)
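Kevin's observation that array types are not classes, yet still have a runtime type that `getClass` reports, can be checked in current Java. A minimal sketch follows; `ArrayMirrors` and its methods are hypothetical names used only for illustration.

```java
// Array types have no class declaration, yet every array object has a
// runtime type with its own Class mirror and class literal.
final class ArrayMirrors {
    private ArrayMirrors() {}

    // Returns the dynamic (runtime) type of any object, including arrays.
    static Class<?> runtimeTypeOf(Object o) {
        return o.getClass();
    }

    // For a final class such as String, the static type already determines
    // the answer: this method can only ever return String.class.
    static Class<?> classOfString(String s) {
        return s.getClass();
    }
}
```

The second method mirrors the "When static type is MyValueClass, result is MyValueClass.class, always" point, using a final class that exists today.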
> assert iv.getClass() == Integer.class; // ah, there's the class!
> assert iv.getClass() == int.ref.class; // this works differently from Point
> assert ((Object)iv).getClass() == pr.getClass(); // this should be true also, right?

Not sure what that's meant to return, but surely casting to Object must do nothing different from casting to Integer or int.ref.

> And to finish out the combinations:
>
> int.ref ir = iv; // same object; now it's on the heap, though, with a real live heap header
> assert ir.getClass() == Integer.class; // same class
> assert ir.getClass() == int.ref.class; // and this time it's a ref-class (only for classic primitives)
> assert ir.getClass() != int.class;
>
> All this has some odd irregularities when you compare what Point does and what int does. And yet it's probably the least-bad thing we can do.
>
> A bad response would be to follow the bad precedent of ir.getClass() == Integer.class off the cliff, and have pv.getClass() and pr.getClass() return Point.ref.class. That way, getClass() only returns a ref. Get it, see, getClass() can only return reference types. The rejoinder (which Brian made to me when I aired it) is devastating: Point.class is the class, not Point.ref.class, and the method is named "get-class".

If s/named/misnamed/ is it still devastating? :-)

With this my brain has given its last feeble gasp of the night.

--
Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From brian.goetz at oracle.com Thu Nov 4 14:56:27 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 4 Nov 2021 10:56:27 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
Message-ID: <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>

On 11/4/2021 2:54 AM, Kevin Bourrillion wrote:

>> Point.ref pr = pv; // same object; now it's on the heap, though, with a real live heap header
>> assert pr.getClass() == Point.class; // same class, but...
>
> Why would we even want this? It would be very surprising/puzzling to me.

It's surprising because we're so used to "boxes" being a thing. But let's look at this a bit.

    int n = 3;
    Object asObject = n;

People like that this compiles, and are not forced to say `Integer.valueOf(3)`, but autoboxing hides something bad about boxing; that the box type is something completely different. If we weren't so steeped in the Culture of Boxing, we'd be surprised that a lowly assignment like this changes the type and representation so drastically (and as it turns out, unnecessarily.)

Let's compare with what happens with String.

    String aString = new String("foo");
    Object asObj = aString;

This code makes use of both String and Object, but in slightly different and overlapping ways. String in this example is both a static and dynamic type. When we make a new String with the String constructor, we are instantiating a new instance whose dynamic type (getClass) is String. Then we assign it to a variable whose static type is String.
When we then assign that variable to a new variable whose static type is Object, nothing is being converted into an Object; the thing in asObj is still a String; it's just held in a variable whose *static* type is a supertype of String.

But, this isn't a perfect example, because Object is also a dynamic type; I can create new Objects with an Object constructor. So let's replace with Comparable:

    String aString = new String("foo");
    Comparable c = aString;

Now, String is both a static and dynamic type, but Comparable is *only a static type*. There are no objects that report their type as Comparable; there are only objects whose type extends Comparable.

Now, let's go back to the integer example. The assignment here (in the current language) takes the primitive value stored in n, and creates a whole new, accidental object whose type is different from int. The only saving grace is that you cannot discern the type of int, since you can't ask it getClass, but we all know that boxing is a big seam in the language, runtime, and reflection.

In the new world, Point.ref is like Comparable; it exists as a static type for variables, but there are no objects that are *instances* of Point.ref, because it's not a concrete type.

    Point p = new Point(3, 4);
    Point.ref asRef = p;

This is like the String to Comparable example; the new variable refers to the same object, but through an alias that has a different static type.

Now, alias is a funny word to use here, but this connects back to the whole point of Valhalla -- the reason we disavow identity is that we are no longer constrained to track the fact that two references were initially assigned from the same object instance. In the absence of identity, we're free to copy the state rather than the reference, which admits powerful optimizations. But in the Point example above, it really makes sense to think of the Point.ref as a reference to the *same instance* that is held by p.
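The String/Comparable comparison can be checked directly in current Java. This minimal sketch (class and method names are hypothetical) shows that widening a String to Comparable keeps the same object and the same dynamic type, whereas assigning an int to Object produces an Integer.

```java
final class StaticVsDynamic {
    private StaticVsDynamic() {}

    // Widening to an interface changes only the static type of the variable;
    // the referenced object, and its dynamic class, are unchanged.
    static boolean wideningKeepsSameObject() {
        String aString = new String("foo");
        Comparable<String> c = aString;
        return c == aString && c.getClass() == String.class;
    }

    // Boxing is different: the int value is converted into an Integer object,
    // so the dynamic type is Integer, a type distinct from int.
    static Class<?> classAfterBoxing() {
        int n = 3;
        Object asObject = n;
        return asObject.getClass();
    }
}
```

`Comparable.class.isInterface()` is true, which is the sense in which Comparable is only a static type: no object's `getClass()` ever returns it.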
From brian.goetz at oracle.com Thu Nov 4 15:54:08 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 4 Nov 2021 11:54:08 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
Message-ID:

To close the loop, in the initial "Eclair" discussion (which grew out of a conversation at the last JVMLS), a primitive was a pair of classes, where the companion class was actually an interface. We haven't revisited "what is Point.ref" since then, but one possible way to do this is to say exactly this: that Point is a primitive class, and Point.ref is an interface it implements. That makes it clear (a) why it is a reference type, (b) that it is no different from other superinterfaces, and (c) that no object is actually of type Point.ref.

Q1: does this help?

Q2: Does this provide us a path to rehabilitating the user intuition around boxing, by saying "good news everyone, we still have boxes, but now they're interfaces, not concrete objects"? Does that balance the desire to lean on existing intuition, while breaking enough about the implementation assumptions to not carry all the existing baggage?

> In the new world, Point.ref is like Comparable; it exists as a static type for variables, but there are no objects that are *instances* of Point.ref, because it's not a concrete type.
>
>     Point p = new Point(3, 4);
>     Point.ref asRef = p;
>
> This is like the String to Comparable example; the new variable refers to the same object, but through an alias that has a different static type.
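One way to picture the "Point.ref is an interface" idea with today's Java is a final class (here a record) implementing an ordinary interface. This is only a sketch of the typing relationships under that assumption: `PointRef` is a hypothetical stand-in for `Point.ref`, and nothing here is flattened or identity-free.

```java
// PointRef is a hypothetical stand-in for Point.ref, modeled as a plain
// interface. The record Point is the only concrete type involved.
interface PointRef {
    int x();
    int y();
}

// Records are final with final fields, a rough current-Java analogue of a
// primitive class (unlike a primitive class, they still have identity).
record Point(int x, int y) implements PointRef {}

final class EclairSketch {
    private EclairSketch() {}

    // Widening to the interface is an alias with a different static type;
    // no conversion happens and the dynamic type is still Point.
    static Class<?> classViaInterface(Point p) {
        PointRef asRef = p;
        return asRef.getClass();
    }
}
```

Under this reading, (c) falls out for free: `PointRef.class.isInterface()` is true, so no object can report `PointRef` as its class.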
From kevinb at google.com Thu Nov 4 16:08:45 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 4 Nov 2021 09:08:45 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
Message-ID:

On Thu, Nov 4, 2021 at 7:56 AM Brian Goetz wrote:

> On 11/4/2021 2:54 AM, Kevin Bourrillion wrote:
>
>>> Point.ref pr = pv; // same object; now it's on the heap, though, with a real live heap header
>>> assert pr.getClass() == Point.class; // same class, but...
>>
>> Why would we even want this? It would be very surprising/puzzling to me.
>
> It's surprising because we're so used to "boxes" being a thing. But let's look at this a bit.

Okay, it's clear I have more work to do in understanding your whole coherent model as it exists.

Summary of what kevinb has been on about the last 24 hours:

The model I've been speaking for over the past day has flowed from following my own "I want to think it's as simple as...." intuitions. I expected to sort of "hit a wall" with those naive assumptions and never felt like I did (yet).

Your model is likely enough the best, and I'm simply "resisting" it, but in that case I'm channeling some of the resistance other users will feel, and we can hash out how to head it off. But also, occasionally I turn out to be right about things so I'll prepare for that misfortune as well.

I think it's worth my understanding both models until I can explain them well, and *then* we can make more progress. Let's just name the models. Fair enough?
(Feel free to inject "no need to name your model because I can give the killer argument right now why it just can't work", I mean we wouldn't name a woodland animal we found moments from death on the side of the road, would we.)

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Thu Nov 4 16:18:21 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 4 Nov 2021 12:18:21 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
Message-ID: <1cce1692-84b3-608f-703e-3cdd01b8fafd@oracle.com>

I would summarize what you've been on about as "Hey, developers are used to primitives and boxes, is there mileage in working within that framework, rather than tossing it out the window because boxes seem dirty?" And I think there is something to that.

One way to frame this is that there's a model that makes sense from a specification perspective, and there's a mental model that makes sense to Java users, and these need not be the same. A prime (wildly unrelated) example of this is the "double brace idiom"; language weenies cringe because it's not actually a thing, it's just an interaction between an unfortunate syntax choice for instance initializers and otherwise empty anonymous classes, but what does it matter if users think it's a thing?

The spec already has opinions about what terms like "value", "object", "reference", etc mean. Internally, we have to respect these (or pay the cost to refactor the terminology), but this only weakly constrains the user model. The language spec makes clear distinctions between classes and types, but most developers can slush the differences away and spend that mental bookkeeping budget elsewhere.
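To make the "double brace idiom" aside concrete, here is a minimal sketch showing that the outer braces declare an anonymous subclass and the inner braces are an instance initializer inside it. `DoubleBrace` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class DoubleBrace {
    private DoubleBrace() {}

    static List<String> names() {
        // Outer braces: body of an anonymous subclass of ArrayList<String>.
        // Inner braces: an instance initializer that runs after the
        // ArrayList constructor, calling the inherited add method.
        return new ArrayList<String>() {{
            add("a");
            add("b");
        }};
    }
}
```

The returned list's class is an anonymous class whose superclass is `ArrayList`, not `ArrayList` itself: the "idiom" exists only in users' mental model, exactly as described above.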
The main risk of trying to present an alternate model is that it will invariably use terms (e.g., "object") that appear to have their meaning nailed down; perhaps we need a notational convention to distinguish between "what the spec currently calls object" and "what users understand objects to be", at least for sake of discussion?

From daniel.smith at oracle.com Thu Nov 4 16:28:13 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 4 Nov 2021 16:28:13 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
Message-ID: <71FADE80-08C7-45E5-AE55-0C644848F8E1@oracle.com>

On Nov 3, 2021, at 6:19 PM, Kevin Bourrillion wrote:

>> I think my intuitions about boxes tie heavily to 'getClass' behavior (or some analogous reflective operation). "What are you?" should give me different answers for a bare value and a box. A duck in a box is not the same thing as a duck.
>>
>> The analogy here would be that Integer.getClass() returns Integer.class, while int.getClass(), if it existed, would return int.class.
>
> So far so good. If `int.getClass()` has to work at all, it might as well produce `int.class`, though it serves no actual purpose and we would just refactor it to `int.class` anyway. If `int.getClass()` won't even compile, it would be no great loss at all. The method exists for finding the dynamic type of an object; my model says "values are not objects and so have no dynamic type", which I think is good.

But Point extends Object, and Object.getClass exists.

One thing the user model has to explain is how method inheritance works. You've been pointing out that inheritance != subtyping, which is true. But still, when I invoke a super method (a default method in a superinterface, say), it must be true that that method declaration knows how to execute on a value. The ref/val model explains this by saying that method invocation will add/remove references to align with the expectations of the (dynamically-selected) method implementation.
The object remains the same, so 'this' is the object that the caller started with. I guess the value/object model would pretty much say the same thing, except it would say the value the caller started with might be boxed (or the object unboxed) to match the method's expectations. It's the same *value*, presented as an object.

Either way, if I can invoke 'getClass', its behavior is specified by the *class* not the value/object, so I would expect to get the same answer whether invoked via a value or a reference/box.

(Another thing you could say is that the super method is like a template, stamped out in specialized form for each primitive subclass as part of inheritance. We experimented with this way of thinking for a while before deciding, no, it really needs to be the case that invoking an inherited method means executing the method body in its original context.)

Now, all that said, we could say by fiat that `getClass` is special and value types aren't allowed to invoke it. YAGNI. Except...

>> I might want to write code like:
>>
>>     <T> void m(T arg) {
>>         if (arg.getClass() == Point.class) System.out.println("I'm a value!");
>>         else System.out.println("I'm a box!");
>>     }
>
> Someone might think this, but they can just ask themselves whether `int/Integer` work like that. They don't, so this doesn't either.

int/Integer are a starting point, but our goal is to offer something more. In particular, we want universal generics: when I invoke m and pass it a Point, it must be the case that T=Point, not T=Point.ref. This is different than the status quo for int/Integer, where T=Integer.

The right way to interpret generic code is, roughly, to substitute [T:=Point] and figure out what the code would do. This is imprecise, because there are compile-time decisions that aren't allowed to change under different substitutions. (For example, we don't re-do overload resolution for different Ts, even if it would get different answers.)
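These points can be compared against today's int/Integer behavior. The sketch below uses only current Java; all names (`GenericToday`, `Describable`, `Temperature`, `pick`, `callPick`) are hypothetical. It shows three things: a default method runs with `this` bound to the receiving object; passing an `int` to `<T> m(T)` infers `T = Integer`, so the code only ever sees a box; and overload resolution inside a generic method is fixed at compile time rather than redone for each `T`.

```java
final class GenericToday {
    private GenericToday() {}

    // An inherited (default) method body is declared once, in the interface,
    // and executes with 'this' bound to the object the caller started with.
    interface Describable {
        default Class<?> whoAmI() {
            return this.getClass();
        }
    }

    record Temperature(double celsius) implements Describable {}

    // Today T ranges only over reference types: m(42) infers T = Integer,
    // so arg is always a box and the test cannot tell "value" from "box".
    static <T> String m(T arg) {
        return arg.getClass() == Integer.class ? "I'm a box!" : "something else";
    }

    static String pick(Object o)  { return "Object overload"; }
    static String pick(Integer i) { return "Integer overload"; }

    // pick(arg) is resolved once, when this method is compiled, using T's
    // bound (Object); it is not re-resolved for each caller's choice of T.
    static <T> String callPick(T arg) {
        return pick(arg);
    }
}
```

`pick(42)` selects the Integer overload, but `callPick(42)` reports the Object overload even though `T` was inferred as Integer. That is the kind of compile-time decision that must not change under substitution.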
But, for our purposes, it should be the case that you can imagine 'arg' being a value, not a reference, and this code having intuitive behavior. So the ref/val model says that 'arg' is an object (handled by value, not by reference) and its 'getClass' method returns the class of the object. The value/object model says that 'arg' is a value and its 'getClass' method exists. And I guess it returns Point.class.

(If we really thought `getClass` was poison, I guess at this point we could say by fiat that type variable types aren't allowed to access `getClass`. But... `getClass` really is a useful thing to invoke in this context.)

An implication of universal generics is that there needs to be some common protocol that works on both vals and refs. In the val/ref model, that protocol is objects: both vals and refs are objects with members that can be accessed via '.'. In the value/object model, I'm not quite sure how you'd explain it. Maybe there's a third concept here, generalizing how values and objects behave.

From daniel.smith at oracle.com Thu Nov 4 16:36:07 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 4 Nov 2021 16:36:07 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
Message-ID:

On Nov 3, 2021, at 7:34 PM, John Rose wrote:

> There's a bigger hiccup when you compare all that with good old int:
>
> int iv = 42; // "class int" is NOT a thing, but "class Integer" is
> assert iv.getClass() != int.class; // because int is not a class
> assert iv.getClass() == Integer.class; // ah, there's the class!
> assert iv.getClass() == int.ref.class; // this works differently from Point
> assert ((Object)iv).getClass() == pr.getClass(); // this should be true also, right?
>
> And to finish out the combinations:
>
> int.ref ir = iv; // same object; now it's on the heap, though, with a real live heap header
> assert ir.getClass() == Integer.class; // same class
> assert ir.getClass() == int.ref.class; // and this time it's a ref-class (only for classic primitives)
> assert ir.getClass() != int.class;
>
> All this has some odd irregularities when you compare what Point does and what int does. And yet it's probably the least-bad thing we can do.
>
> A bad response would be to follow the bad precedent of ir.getClass() == Integer.class off the cliff, and have pv.getClass() and pr.getClass() return Point.ref.class. That way, getClass() only returns a ref. Get it, see, getClass() can only return reference types. The rejoinder (which Brian made to me when I aired it) is devastating: Point.class is the class, not Point.ref.class, and the method is named "get-class".

I guess to rephrase this, I'll just say: yes, there are problems with int/Integer. But we shouldn't let that tail wag the dog when sorting out the language model. int/Integer is going to be a special case, no matter how we stack it. (On the other hand, we really like to look for analogies from int/Integer when sorting out the language model, and sometimes those are fruitful. But handle with care.)

From daniel.smith at oracle.com Thu Nov 4 16:41:10 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 4 Nov 2021 16:41:10 +0000
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> <2cd20f3f-4e56-e283-cff5-02ae16ecb149@oracle.com>
Message-ID: <00AD191C-7257-486F-85D9-531C69489B1F@oracle.com>

On Nov 4, 2021, at 10:08 AM, Kevin Bourrillion wrote:

> Your model is likely enough the best, and I'm simply "resisting" it, but in that case I'm channeling some of the resistance other users will feel, and we can hash out how to head it off. But also, occasionally I turn out to be right about things so I'll prepare for that misfortune as well.

Keep it up. It's a very useful exercise, and I haven't ruled out that you're onto something valuable here.

From kevinb at google.com Thu Nov 4 16:51:14 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 4 Nov 2021 09:51:14 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To: <71FADE80-08C7-45E5-AE55-0C644848F8E1@oracle.com>
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> <71FADE80-08C7-45E5-AE55-0C644848F8E1@oracle.com>
Message-ID:

On Thu, Nov 4, 2021 at 9:28 AM Dan Smith wrote:

> On Nov 3, 2021, at 6:19 PM, Kevin Bourrillion wrote:
>
>>> I think my intuitions about boxes tie heavily to 'getClass' behavior (or some analogous reflective operation). "What are you?" should give me different answers for a bare value and a box. A duck in a box is not the same thing as a duck.
>>>
>>> The analogy here would be that Integer.getClass() returns Integer.class, while int.getClass(), if it existed, would return int.class.
>>
>> So far so good. If `int.getClass()` has to work at all, it might as well produce `int.class`, though it serves no actual purpose and we would just refactor it to `int.class` anyway. If `int.getClass()` won't even compile, it would be no great loss at all. The method exists for finding the dynamic type of an object; my model says "values are not objects and so have no dynamic type", which I think is good.
>
> But Point extends Object, and Object.getClass exists.

As does `wait()`. :-) But absolutely, this case is different; I'm trying to be clear that it seems *pointless but harmless* for `someValue.getClass()` to be callable, so long as it returns whatever is the most sensible thing according to the model adopted.
Keeps static refactoring tools in business!

> Now, all that said, we could say by fiat that `getClass` is special and value types aren't allowed to invoke it. YAGNI. Except...
>
>>> I might want to write code like:
>>>
>>>     <T> void m(T arg) {
>>>         if (arg.getClass() == Point.class) System.out.println("I'm a value!");
>>>         else System.out.println("I'm a box!");
>>>     }
>>
>> Someone might think this, but they can just ask themselves whether `int/Integer` work like that. They don't, so this doesn't either.
>
> int/Integer are a starting point, but our goal is to offer something more.

Just to be clear, my intention was only to say that under the sure-they're-boxes model it would both "not work" and "not be expected to work", which is at least harmonious. :-)

> An implication of universal generics is that there needs to be some common protocol that works on both vals and refs. In the val/ref model, that protocol is objects: both vals and refs are objects with members that can be accessed via '.'. In the value/object model, I'm not quite sure how you'd explain it. Maybe there's a third concept here, generalizing how values and objects behave.

This is on point. I quite honestly forgot that "oh yeah, I don't fully understand universal generics yet", and I'll go work on that. It might be death to the model I'm clinging to, but in that case I'll become pretty good at explaining to people why that model fails, so cool.

--
Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From kevinb at google.com Thu Nov 4 16:56:15 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 4 Nov 2021 09:56:15 -0700
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com>
Message-ID:

On Thu, Nov 4, 2021 at 9:36 AM Dan Smith wrote:

> I guess to rephrase this, I'll just say: yes, there are problems with int/Integer. But we shouldn't let that tail wag the dog when sorting out the language model. int/Integer is going to be a special case, no matter how we stack it. (On the other hand, we really like to look for analogies from int/Integer when sorting out the language model, and sometimes those are fruitful. But handle with care.)

Perfectly said, I think. When someone says "so it's like int/Integer?" my hope of being able to answer "yeah actually, that will serve well enough" is closely followed by "we'd like to say yeah, but we *need* to ask you to think of it differently now, but you'll understand why." You probably have that and I'm just delayed in absorbing it.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Thu Nov 4 18:35:33 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 4 Nov 2021 14:35:33 -0400
Subject: [External] : Re: Consolidating the user model
In-Reply-To:
References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> <71FADE80-08C7-45E5-AE55-0C644848F8E1@oracle.com>
Message-ID: <5c017540-1863-4418-8dcd-10926e7239f8@oracle.com>

>> An implication of universal generics is that there needs to be some common protocol that works on both vals and refs.
In the > val/ref model, that protocol is objects: both vals and refs are > objects with members that can be accessed via '.'. In the > value/object model, I'm not quite sure how you'd explain it. Maybe > there's a third concept here, generalizing how values and objects > behave. > > > This is on point. I quite honestly forgot that "oh yeah, I don't fully > understand universal generics yet", and I'll go work on that. It might > be death to the model I'm clinging to, but in that case I'll become > pretty good at explaining to people why that model fails, so cool. > Generics are often a clarifying lens through which to look at this problem. We've caught ourselves multiple times trying to locally optimize, only to find that is an impediment to "generify over all the things." One of the arguments in favor of "everything is an object" (or a class, or whatever), aside from its natural uniformity, is that then generics have a more regular surface to quantify over; generifying over all types is easier when the types have more in common. For example, one of the reasons to allow the locution "String.ref" as an alias for String, while useless, is that it strengthens the notion that ".ref" is a total operator, so "T.ref" makes sense simply by appealing to substitution, rather than having to give it a more elaborate definition. When considering universal (erased) generics, we had to totalize the semantics of all operations, even when some operations are not allowed under a strict-substitution interpretation. A quick tour (assume `t` is of type `T`, an unbounded type variable, which is instantiated to `Point`):

 - Assignment to Object or interface (`Object o = t`). In the language, this is considered a primitive widening (née boxing) conversion, but in the VM, this is mere subtyping (QFoo is-a LFoo). This means that we can use the same `astore` or `putfield` operations to simply move the value without conversion.
 - Assignment of null (`T t = null`). Not all types under T are nullable, but T is still erased to Object. In this case, we assign a null and issue an unchecked warning; if that value bubbles out to non-generic code, the cast to `Point` will catch the null, and we treat this as a form of heap pollution.
 - Array covariance (`Object[] os = ts`). The JVM has been upgraded to support array covariance for primitives, where `Point[] <: Point.ref[]` (and transitivity gets us to `Object[]`).
 - Synchronization (`synchronized(t)`). Warnings at compile time, IMSE (IllegalMonitorStateException) at runtime.
 - Equality (`o == t`). ACMP has been upgraded to understand primitives, so we can translate as always.

I'm sure I missed a few, but what you see here is a bag of tricks for creating totality. In some cases (equality, array covariance) we engineered actual totality into the bytecodes; in some cases (synchronization) we rely on compile-time warnings and runtime errors; in others, we rely on erasure and lean on existing detection of heap pollution.

When moving forward to specialized generics, the constraints get stiffer. We want a model where the _bytecode_ is invariant across specializations, all specialization operates on the constant pool, and specialization is strictly optional at runtime (meaning erasure is still a valid runtime strategy). This might mean that some total-seeming operations (e.g., T.default) are either outlawed or require complex translation through a reflective runtime. All of this is to say, there may be some hidden indirect constraints that derive from the desire for a uniform but still specializable translation.
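Primitive classes such as `Point` are not in any released JDK, so the following is only a sketch in today's Java, using `int`/`Integer` as a stand-in for `Point`/`Point.ref`. It illustrates two items from the tour above: a `null` produced by erased generic code is caught where it leaves generic code (here by unboxing, where the proposed model would use the cast to `Point`), and array covariance today reaches `Integer[]` but not `int[]`. The class and method names are hypothetical.

```java
class ErasedTotality {
    // T is erased to Object, so returning null type-checks for every T.
    // Today javac accepts this silently; under universal generics the
    // proposal is to issue an unchecked warning here instead.
    static <T> T defaultOf() {
        return null;
    }

    // The null "bubbles out" of generic code and is caught at the boundary:
    // unboxing a null Integer throws NullPointerException.
    static boolean nullCaughtAtBoundary() {
        try {
            int unboxed = ErasedTotality.<Integer>defaultOf();
            return false; // not reached: the unboxing above throws
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Reference arrays are covariant today: Integer[] <: Object[].
    static boolean integerArrayIsObjectArray() {
        Object arr = new Integer[] {1, 2, 3};
        return arr instanceof Object[];
    }

    // Primitive arrays are not: int[] is not an Object[].
    // (The Valhalla JVM change extends covariance to Point[] <: Object[].)
    static boolean intArrayIsObjectArray() {
        Object arr = new int[] {1, 2, 3};
        return arr instanceof Object[];
    }

    public static void main(String[] args) {
        System.out.println(nullCaughtAtBoundary());      // true
        System.out.println(integerArrayIsObjectArray()); // true
        System.out.println(intArrayIsObjectArray());     // false
    }
}
```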
From kevinb at google.com Thu Nov 4 21:34:54 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 4 Nov 2021 14:34:54 -0700 Subject: identityless objects and the type hierarchy In-Reply-To: <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com> References: <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com> Message-ID: On Wed, Nov 3, 2021 at 12:43 PM Brian Goetz wrote: > > On 11/3/2021 3:00 PM, Kevin Bourrillion wrote: > > Okay, let's stick a pin in proper-value-types (i.e. try to leave them out > of this discussion) for a moment... > > One question is whether the existing design for the bifurcated type > hierarchy will carry right over to this split instead. > > Brian, your response reads like it is explaining/defending *that* design to me. But I believe I already understood it and wasn't expressing any problem with it. Now we're talking about making a smaller split first, "identity objects vs. identityless objects" (1 vs 2, not 1 vs 3), so I was inquiring into why that class model does or does not also work exactly as-is for *this* purpose. (Note that I assume if bucket 3's arrival requires another such type in the mix, there would be a second such bifurcation under IdentitylessObject.) -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From forax at univ-mlv.fr Thu Nov 4 22:25:49 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 4 Nov 2021 23:25:49 +0100 (CET) Subject: identityless objects and the type hierarchy In-Reply-To: References: <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com> Message-ID: <942577129.1474549.1636064749760.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "Brian Goetz" > Cc: "valhalla-spec-experts" > Sent: Thursday, 4 November 2021 22:34:54 > Subject: Re: identityless objects and the type hierarchy > On Wed, Nov 3, 2021 at 12:43 PM Brian Goetz < [ mailto:brian.goetz at oracle.com | > brian.goetz at oracle.com ] > wrote: >> On 11/3/2021 3:00 PM, Kevin Bourrillion wrote: >>> Okay, let's stick a pin in proper-value-types (i.e. try to leave them out of >>> this discussion) for a moment... >>> One question is whether the existing design for the bifurcated type hierarchy >>> will carry right over to this split instead. > Brian, your response reads like it is explaining/defending that design to me. > But I believe I already understood it and wasn't expressing any problem with > it. > Now we're talking about making a smaller split first, "identity objects vs. > identityless objects" (1 vs 2, not 1 vs 3), so I was inquiring into why that > class model does or does not also work exactly as-is for this purpose. > (Note that I assume if bucket 3's arrival requires another such type in the mix, > there would be a second such bifurcation under IdentitylessObject.) I don't think a second bifurcation is needed. At runtime, bucket 2 and bucket 3 behave the same apart from null. Given that IdentitylessObject (or whatever name we choose) is an interface, it always accepts null, so if they are typed as that interface, B2 and B3 behave exactly the same.
Rémi From forax at univ-mlv.fr Thu Nov 4 22:47:25 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 4 Nov 2021 23:47:25 +0100 (CET) Subject: [External] : Re: Consolidating the user model In-Reply-To: References: <8efe3096-bb57-c0f2-19a7-084a1f987128@oracle.com> <6F3BCE64-30B2-4F32-A498-8031E5BA7D26@oracle.com> <97112C92-929D-4828-9A1B-D24D559F6708@oracle.com> <664073AD-DEFD-42DD-90AD-5743F71FA6C7@oracle.com> Message-ID: <1202978501.1480795.1636066045213.JavaMail.zimbra@u-pem.fr> > From: "John Rose" > To: "daniel smith" > Cc: "Kevin Bourrillion" , "Brian Goetz" > , "valhalla-spec-experts" > > Sent: Thursday, 4 November 2021 02:34:52 > Subject: Re: [External] : Re: Consolidating the user model > On Nov 3, 2021, at 4:05 PM, Dan Smith < [ mailto:daniel.smith at oracle.com | > daniel.smith at oracle.com ] > wrote: >> (It is, I suppose, part of the model that objects of a given class all have a >> finite, matching layout when accessed by value, even if the details of that >> layout are kept abstract. Which is why value types are monomorphic and you need >> reference types for polymorphism.) >> The fact that the VM often discards object headers at runtime is a pure >> optimization. > Let's see what happens if we say that (a) bare values have headers and (b) > Object::getClass allows the user to observe part of the header contents. > It follows then that the expression aPointVal.getClass() will show the contents > of aPointVal's header, even if it is a compile-time constant. > Point pv = new Point(42,42); // "class Point" is the definition of Point > assert pv.getClass() == Point.class; // ok, that's certainly the class > assert pv.getClass() != Point.ref.class; // and it's not a ref, so good > That is all fine. There's a little hiccup when you "box" the point and get the > same Class mirror even though the "header" is a very real-heap resident value > now: > Point.ref pr = pv; // same object?
now it's on the heap, though, with a real > live heap header > assert pr.getClass() == Point.class; // same class, but... > assert pr.getClass() != Point.ref.class; // we suppress any distinction the heap > header might provide > There's a bigger hiccup when you compare all that with good old int: > int iv = 42; // "class int" is NOT a thing, but "class Integer" is > assert iv.getClass() != int.class; // because int is not a class > assert iv.getClass() == Integer.class; // ah, there's the class! > assert iv.getClass() == int.ref.class; // this works differently from Point > assert ((Object)iv).getClass() == pr.getClass(); // this should be true also, > right?

How can you have int.class not being a class and at the same time have the notation int.ref??

If you suppose that int is now a primitive class (bucket B3), then iv.getClass() == int.class, because it's equivalent to new int(iv).getClass() == int.class, so
    assert iv.getClass() != Integer.class; // because Integer is the reference projection
    assert iv.getClass() != int.ref.class; // because int.ref is equivalent to Integer

If you suppose that Integer is in bucket B2 (after all, all the other value-based classes are B2), then iv.getClass() == Integer.class, because it's equivalent to Integer.valueOf(iv).getClass() == Integer.class, so
    assert iv.getClass() != int.class;     // because int.class is a fake type like void.class
    assert iv.getClass() != int.ref.class; // does not compile because Integer is B2, not B3

I think I've missed something?
Rémi

From daniel.smith at oracle.com Wed Nov 17 15:39:35 2021 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 17 Nov 2021 15:39:35 +0000 Subject: EG meeting, 2021-11-17 Message-ID: EG Zoom meeting today at 5pm UTC (9am PST, 12pm EST). Lots of traffic this time, we can have follow up discussions wherever there's interest.
Potential topics:

"Consolidating the user model": follow-up discussions homed in on how we model primitive values, whether they're reference-less objects or some other "value" entity, and how they interact with reference types

"Equality operator for identityless classes": Kevin is concerned that the new == operator is an attractive nuisance, because it's sometimes, but not always, equivalent to 'equals'

"identityless objects and the type hierarchy": discussed how the IdentityObject/PrimitiveObject interfaces are used in the "Consolidating the user model" world

"Consequences of null for flattenable representations": John described strategies for encoding nulls where object references are flattened

From jesper at selskabet.org Wed Nov 17 16:10:15 2021 From: jesper at selskabet.org (Jesper Steen Møller) Date: Wed, 17 Nov 2021 17:10:15 +0100 Subject: EG meeting, 2021-11-17 In-Reply-To: References: Message-ID: <398DCAC3-9B8E-410B-A931-E75B710454B8@selskabet.org> Hi Srikanth, I suppose this meeting will decide some of the work to be done? -Jesper > On 17 Nov 2021, at 16.40, Dan Smith wrote: > > EG Zoom meeting today at 5pm UTC (9am PST, 12pm EST). > > Lots of traffic this time, we can have follow up discussions wherever there's interest.
> Potential topics: > > "Consolidating the user model": follow-up discussions homed in on how we model primitive values, whether they're reference-less objects or some other "value" entity, and how they interact with reference types > > "Equality operator for identityless classes": Kevin is concerned that the new == operator is an attractive nuisance, because it's sometimes, but not always, equivalent to 'equals' > > "identityless objects and the type hierarchy": discussed how the IdentityObject/PrimitiveObject interfaces are used in the "Consolidating the user model" world > > "Consequences of null for flattenable representations": John described strategies for encoding nulls where object references are flattened > From kevinb at google.com Wed Nov 17 17:41:52 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 17 Nov 2021 09:41:52 -0800 Subject: EG meeting, 2021-11-17 In-Reply-To: References: Message-ID: Derp, I slept in today On Wed, Nov 17, 2021 at 7:39 AM Dan Smith wrote: "Consolidating the user model": follow-up discussions homed in on how we > model primitive values, whether they're reference-less objects or some other > "value" entity, and how they interact with reference types > I'm in progress writing up the two main models so far as I understand them. "Equality operator for identityless classes": Kevin is concerned that the > new == operator is an attractive nuisance, because it's sometimes, but not > always, equivalent to 'equals' > Summary: for reftypes `==` and `.equals()` ask two different questions, and users almost never really mean `==`, but it's *sometimes* an okay shorthand. That remains true, but when the *"sometimes"* is, exactly, could get much, much harder to observe now. Definitely concerned -- but perhaps the question users *actually* mean most of the time is really the "pattern matching" question, after all.
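The two questions Kevin distinguishes can be seen with today's reference types. This is a minimal sketch (the class name is hypothetical): `==` asks whether two references denote the same object, `equals()` asks the class-defined "same value?" question, and `==` only sometimes happens to give the same answer. The one case where it is guaranteed to agree here relies on the documented behavior that `Integer.valueOf` always caches values from -128 to 127.

```java
class EqualityQuestions {
    // Two distinct objects with equal contents: == asks "same object?"
    static boolean newStringsSameObject() {
        String a = new String("valhalla");
        String b = new String("valhalla");
        return a == b; // false: each `new` creates a fresh object
    }

    // equals() asks the value question defined by the class.
    static boolean newStringsEqual() {
        String a = new String("valhalla");
        String b = new String("valhalla");
        return a.equals(b); // true: same characters
    }

    // Integer.valueOf is specified to cache -128..127, so here == happens to
    // agree with equals(). Outside that range the JDK may or may not hand back
    // the same object, which is what makes == an unreliable shorthand.
    static boolean smallBoxesSameObject() {
        return Integer.valueOf(127) == Integer.valueOf(127); // reference comparison
    }

    public static void main(String[] args) {
        System.out.println(newStringsSameObject()); // false
        System.out.println(newStringsEqual());      // true
        System.out.println(smallBoxesSameObject()); // true
    }
}
```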
(The biggest ergonomic problem of `.equals()` "identityless objects and the type hierarchy": discussed how the > IdentityObject/PrimitiveObject interfaces are used in the "Consolidating > the user model" world > For the moment I think this does probably carry over to WithIdentity/WithoutIdentity or whatever they are called. The question I think is still open (to me) is whether there really are active contractual implications of being identityless or if it's equivalent to being uncommitted; i.e. should a clear-cut identityless class still be able to have an identityful subclass, or does that clearly break something. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Wed Nov 17 17:49:49 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 17 Nov 2021 09:49:49 -0800 Subject: EG meeting, 2021-11-17 In-Reply-To: References: Message-ID: On Wed, Nov 17, 2021 at 9:41 AM Kevin Bourrillion wrote: (The biggest ergonomic problem of `.equals()` > ... is that it's not negatable, sometimes forcing a `!` to be far away to the left, and I'm not under the impression pattern-matching addresses that.) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Thu Nov 18 02:04:19 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 17 Nov 2021 18:04:19 -0800 Subject: EG meeting, 2021-11-17 In-Reply-To: References: Message-ID: On Wed, Nov 17, 2021 at 11:16 AM Dan Heidinga wrote: > For the moment I think this does probably carry over to > WithIdentity/WithoutIdentity or whatever they are called. The question I > think is still open (to me) is whether there really are active contractual > implications of being identityless or if it's equivalent to being > uncommitted; i.e. should a clear-cut identityless class still be able to > have an identityful subclass, or does that clearly break something. > > It breaks flattening. 
If an identityless class is flattened - and we > want to preserve the option to do this for bucket 2 values that are <= > 64 bits - then we can't assign a subclass instance to a slot (field / > array element) declared to be the superclass's type as we may have to > truncate the subclass to have it fit. > Right. I guess I was figuring that the mere fact of the identityless class being non-final would already destroy that? Supposing this justifies requiring `final` for these classes, then my question evaporates. I wasn't sure though. Even losing flattening entirely doesn't leave you worse off than B1. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Thu Nov 18 22:26:59 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 18 Nov 2021 14:26:59 -0800 Subject: EG meeting, 2021-11-17 In-Reply-To: References: Message-ID: On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga wrote: Let me turn the question around: What do we gain by allowing > subclassing of B2 classes? > I'm not claiming it's much. I'm just coming into this from a different direction. In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug. When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they *need* something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even *any* performance benefits at all, and the performance benefits just make it a lot more *motivating* to do what is already probably technically best anyway.
Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu Nov 18 22:34:51 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 18 Nov 2021 22:34:51 +0000 Subject: EG meeting, 2021-11-17 In-Reply-To: References: Message-ID: I think it is reasonable to consider allowing bucket two classes to be abstract. They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic. A similar argument works for records. Sent from my iPad On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion wrote: On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga > wrote: Let me turn the question around: What do we gain by allowing subclassing of B2 classes? I'm not claiming it's much. I'm just coming into this from a different direction. In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug. When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they need something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even any performance benefits at all, and the performance benefits just make it a lot more motivating to do what is already probably technically best anyway.
Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Thu Nov 18 22:58:07 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 18 Nov 2021 23:58:07 +0100 (CET) Subject: EG meeting, 2021-11-17 In-Reply-To: References: Message-ID: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Kevin Bourrillion" > Cc: "Dan Heidinga" , "daniel smith" > , "valhalla-spec-experts" > > Sent: Thursday, 18 November 2021 23:34:51 > Subject: Re: EG meeting, 2021-11-17 > I think it is reasonable to consider allowing bucket two classes to be abstract. > They could be extended by other classes which would either be abstract or > final. The intermediate types are polymorphic but the terminal type is > monomorphic. > A similar argument works for records. I suppose you are talking about empty (no field) abstract classes. We need that for j.l.Object, j.l.Number or j.l.Record. From a user POV, it's not very different from an interface with default methods. Rémi > Sent from my iPad >> On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion wrote: >> On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga < [ mailto:heidinga at redhat.com | >> heidinga at redhat.com ] > wrote: >>> Let me turn the question around: What do we gain by allowing >>> subclassing of B2 classes? >> I'm not claiming it's much. I'm just coming into this from a different >> direction. >> In my experience most immutable (or stateless) classes have no real interest in >> exposing identity, but just get defaulted into it. Any dependency on the >> distinction between one instance and another that equals() it would be a >> probable bug.
>> When B2 exists I see myself advocating that a developer's first instinct should >> be to make new classes in B2 except when they need something from B1 like >> mutability (and perhaps subclassability belongs in this list too!). As far as I >> can tell, this makes sense whether there are even any performance benefits at >> all, and the performance benefits just make it a lot more motivating to do what >> is already probably technically best anyway. >> Now, if subclassability legitimately belongs in that list of B1-forcing-factors, >> that'll be fine, I just hadn't fully thought it through and was implicitly >> treating it like an open question, which probably made my initial question in >> this subthread confusing. >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. | [ mailto:kevinb at google.com | >> kevinb at google.com ] From brian.goetz at oracle.com Thu Nov 18 23:06:29 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 18 Nov 2021 23:06:29 +0000 Subject: [External] : Re: EG meeting, 2021-11-17 In-Reply-To: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr> References: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr> Message-ID: <4C229872-2809-4AE1-9E2E-D8CEC148080B@oracle.com> No, I'm talking more broadly.

    abstract class A implements PureObject { int a; }
    abstract class B extends A { int b; }
    pure class C extends B { int c; }

Now C is a final, pure class with fields a, b, and c. A and B are abstract superclasses of C. There'd be details to work out, but this is not an impossible lift. The question is whether the return on complexity is there or not. On Nov 18, 2021, at 5:58 PM, Remi Forax > wrote: ________________________________ From: "Brian Goetz" > To: "Kevin Bourrillion" > Cc: "Dan Heidinga" >, "daniel smith" >, "valhalla-spec-experts" > Sent: Thursday, 18 November 2021 23:34:51 Subject: Re: EG meeting, 2021-11-17 I think it is reasonable to consider allowing bucket two classes to be abstract.
They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic. A similar argument works for records. I suppose you are talking about empty (no field) abstract classes. We need that for j.l.Object, j.l.Number or j.l.Record. From a user POV, it's not very different from an interface with default methods. Rémi Sent from my iPad On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion > wrote: On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga > wrote: Let me turn the question around: What do we gain by allowing subclassing of B2 classes? I'm not claiming it's much. I'm just coming into this from a different direction. In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug. When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they need something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even any performance benefits at all, and the performance benefits just make it a lot more motivating to do what is already probably technically best anyway. Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing. -- Kevin Bourrillion | Java Librarian | Google, Inc.
|kevinb at google.com From brian.goetz at oracle.com Fri Nov 19 13:32:38 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 19 Nov 2021 13:32:38 +0000 Subject: [External] : Re: EG meeting, 2021-11-17 In-Reply-To: References: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr> <4C229872-2809-4AE1-9E2E-D8CEC148080B@oracle.com> Message-ID: <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com> The translation model I had in mind was more complicated, but my point was that the reason we disallow inheritance is because we're trying to disallow layout polymorphism for concrete types, so that we know exactly how big a "C" is. And this is not inconsistent with abstract superclasses contributing fields. There's definitely translational complexity, but it's not insurmountable. I raised it because Kevin seemed to be going somewhere with extension, and I wanted to get a better sense of what that was. I have definitely wished for abstract records a few times before, and I could imagine Kevin has similar use cases in mind. The model requires fields in value types to be final so each of those fields should be marked as `final` to ensure they show the right properties to users via reflection. Additionally, that means that A & B would need to have constructors to set those final fields to be consistent with the rest of the language, but C will never run those constructors. Without a constructor, there's no place for A & B to set invariants on their fields. If they can't define the contract for those fields, then they shouldn't define the fields. This is similar to how interfaces work: the interface can define an "int getX()" method that implementers have to implement, but it can't define the "int x" field directly.
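A minimal sketch of the interface analogy in today's Java, with hypothetical names `HasX` and `Length`: the interface can require an accessor and build default behavior on top of it, but the field itself, and the constructor that enforces its invariant, live in the concrete final class (a record here).

```java
// Hypothetical interface: it can require an accessor...
interface HasX {
    int x();

    // ...and share behavior built on that accessor, but it cannot declare an
    // instance field `x` or a constructor that validates it.
    default boolean isPositive() {
        return x() > 0;
    }
}

// Hypothetical final record: it owns the field and enforces its invariant
// in the compact constructor.
record Length(int x) implements HasX {
    Length {
        if (x < 0) {
            throw new IllegalArgumentException("negative length: " + x);
        }
    }
}
```

Because the record is implicitly final, the layout of a `Length` is fixed, which mirrors the "polymorphic interface, monomorphic terminal type" shape discussed in this thread.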
If we relaxed the "must be final" field constraint, we'd need some other rule to prevent A or B from defining a setter for their fields as there is no single set of bytecode that can implement a setter for both a value and an identity class:

    void setA(int a) { putfield A.a }

vs

    A setA(int a) { withfield A.a; areturn; }

Note in particular that the second *must* return a new A as values are immutable. The details around this would be hard for users to keep straight and would be easy to violate when refactoring as the authors of A & B would need to know that their subclasses include value types. And this would be incredibly hard to keep straight across maintenance boundaries. --Dan On Nov 18, 2021, at 5:58 PM, Remi Forax > wrote: ________________________________ From: "Brian Goetz" > To: "Kevin Bourrillion" > Cc: "Dan Heidinga" >, "daniel smith" >, "valhalla-spec-experts" > Sent: Thursday, 18 November 2021 23:34:51 Subject: Re: EG meeting, 2021-11-17 I think it is reasonable to consider allowing bucket two classes to be abstract. They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic. A similar argument works for records. I suppose you are talking about empty (no field) abstract classes. We need that for j.l.Object, j.l.Number or j.l.Record. From a user POV, it's not very different from an interface with default methods. Rémi Sent from my iPad On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion > wrote: On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga > wrote: Let me turn the question around: What do we gain by allowing subclassing of B2 classes? I'm not claiming it's much. I'm just coming into this from a different direction. In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug.
When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they need something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even any performance benefits at all, and the performance benefits just make it a lot more motivating to do what is already probably technically best anyway. Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing. -- Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com From forax at univ-mlv.fr Fri Nov 19 14:23:46 2021 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 19 Nov 2021 15:23:46 +0100 (CET) Subject: [External] : Re: EG meeting, 2021-11-17 In-Reply-To: <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com> References: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr> <4C229872-2809-4AE1-9E2E-D8CEC148080B@oracle.com> <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com> Message-ID: <1355298284.3039585.1637331826853.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Dan Heidinga" > Cc: "Remi Forax" , "Kevin Bourrillion" , > "daniel smith" , "valhalla-spec-experts" > > Sent: Friday, 19 November 2021 14:32:38 > Subject: Re: [External] : Re: EG meeting, 2021-11-17 > The translation model I had in mind was more complicated, but my point was that > the reason we disallow inheritance is because we're trying to disallow layout > polymorphism for concrete types, so that we know exactly how big a "C" is. And > this is not inconsistent with abstract superclasses contributing fields. > There's definitely translational complexity, but it's not insurmountable.
I > raised it because Kevin seemed to be going somewhere with extension, and I > wanted to get a better sense of what that was. I have definitely wished for > abstract records a few times before, and I could imagine Kevin has similar use > cases in mind. For records, it's easy to avoid abstract inheritance because all state is public, so instead of

    abstract class A { int a; }
    abstract class B extends A { int b; }
    record C() extends B { }

one can write

    interface A { int a(); }
    interface B extends A { int b(); }
    record C(int a, int b) implements B { }

Rémi >> The model requires fields in value types to be final so each of those >> fields should be marked as `final` to ensure they show the right >> properties to users via reflection. Additionally, that means that A & >> B would need to have constructors to set those final fields to be >> consistent with the rest of the language, but C will never run those >> constructors. >> Without a constructor, there's no place for A & B to set invariants on >> their fields. If they can't define the contract for those fields, >> then they shouldn't define the fields. This is similar to how >> interfaces work: the interface can define an "int getX()" method that >> implementers have to implement, but it can't define the "int x" field >> directly. >> If we relaxed the "must be final" field constraint, we'd need some >> other rule to prevent A or B from defining a setter for their fields >> as there is no single set of bytecode that can implement a setter for >> both a value and an identity class: >> void setA(int a) { putfield A.a } >> vs >> A setA(int a) { withfield A.a; areturn; } >> Note in particular that the second *must* return a new A as values are >> immutable. >> The details around this would be hard for users to keep straight and >> would be easy to violate when refactoring as the authors of A & B >> would need to know that their subclasses include value types.
>> And this would be incredibly hard to keep straight across maintenance boundaries.
>>
>> --Dan
>>
>>> On Nov 18, 2021, at 5:58 PM, Remi Forax <forax at univ-mlv.fr> wrote:
>>> ________________________________
>>> From: "Brian Goetz" <brian.goetz at oracle.com>
>>> To: "Kevin Bourrillion" <kevinb at google.com>
>>> Cc: "Dan Heidinga" <heidinga at redhat.com>, "daniel smith" <daniel.smith at oracle.com>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>>> Sent: Thursday, 18 November 2021 23:34:51
>>> Subject: Re: EG meeting, 2021-11-17
>>>
>>> I think it is reasonable to consider allowing bucket two classes to be abstract. They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic.
>>> A similar argument works for records.
>>>
>>> I suppose you are talking about empty (no field) abstract classes. We need that for j.l.Object, j.l.Number or j.l.Record. From a user POV, it's not very different from an interface with default methods.
>>>
>>> Rémi
>>>
>>> Sent from my iPad
>>>
>>> On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion <kevinb at google.com> wrote:
>>> On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga <heidinga at redhat.com> wrote:
>>>> Let me turn the question around: What do we gain by allowing subclassing of B2 classes?
>>>
>>> I'm not claiming it's much. I'm just coming into this from a different direction.
>>>
>>> In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it.
>>> Any dependency on the distinction between one instance and another that equals() it would be a probable bug.
>>>
>>> When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they need something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even any performance benefits at all, and the performance benefits just make it a lot more motivating to do what is already probably technically best anyway.
>>>
>>> Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing.
>>>
>>> --
>>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Mon Nov 22 05:05:13 2021
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 22 Nov 2021 05:05:13 +0000
Subject: EG meeting, 2021-11-17
In-Reply-To:
References:
Message-ID:

Yes. One way I like to think about the Old Bucket is that it is characterized by *concrete* representations which have somehow opted into object identity. Confusingly, the Old Bucket also contains interfaces which are non-concrete and also Object, which might as well be non-concrete. (I'm not saying "abstract" because that's a keyword in the language, and you can have semi-concrete classes which are abstract but also commit to object identity and may even have mutable fields or by-reference constructors, like AbstractList.)

Those are the two interesting populations in the Old Bucket: Concrete classes that are entangled with object identity (until they can be migrated, or forever in many cases). And, non-concrete classes, which are necessarily polymorphic.
Those two kinds of types (in the Old Bucket) interact with the New Buckets in distinct ways.

There's a middle case which is causing problems here: A class can be concrete *and* polymorphic, meaning that subclasses can add more stuff. (The parent class could be declared abstract or not; that's not an important detail.) A class that is concrete *and* polymorphic is exactly one that plays the classic game of object oriented subclasses, where data fields and methods are refined in layers. This classic game does not translate well into the by-value world; it needs polymorphic pointers. Just consult any C++ style guide to see what happens if you unwarily try to mix by-value structs and class inheritance: You shouldn't, according to the guides.

Is there a way to make that work in Java, so that identity-free classes can inherit from each other? Probably, in some limited way. The simplest move is the one Brian and I are liking here, where a completely non-concrete class (one with no fields and no commitment to object identity) can be refined by a subclass. But it should be marked abstract, so as not to have cases where you have a variable of the super-type and you don't know whether it has the layout of the super (because it was concrete, oops) or a subtype.

The division separating non-concrete types from identity-object types in the Old Bucket may be seen in this diagram, which I cobbled up this weekend:

http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf

This comes from my attempts to make a more or less comprehensive Venn-style diagram of the stuff we are talking about. I think it helps me better visualize what we are trying to do; maybe it will help others in some way.

I view this as my due diligence mapping the side of the elephant I can make contact with. Therefore I'm happy to take corrections on it.

I'm also noodling on a whimsical Field Guide, which asks you binary questions about a random Java type, and guides you towards classifying it.
That helped me crystallize the diagram, and may be useful in its own right, or perhaps distilled into a flowchart. Stay tuned.

-- John

On Nov 18, 2021, at 2:34 PM, Brian Goetz wrote:

> I think it is reasonable to consider allowing bucket two classes to be abstract. They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic.
>
> A similar argument works for records.
>
> Sent from my iPad
>
>> On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion wrote:
>>
>> On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga wrote:
>>
>>> Let me turn the question around: What do we gain by allowing subclassing of B2 classes?
>>
>> I'm not claiming it's much. I'm just coming into this from a different direction.
>>
>> In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug.
>>
>> When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they need something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even any performance benefits at all, and the performance benefits just make it a lot more motivating to do what is already probably technically best anyway.
>>
>> Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing.
>>
>> --
>> Kevin Bourrillion | Java Librarian | Google, Inc.
>> | kevinb at google.com

From john.r.rose at oracle.com Mon Nov 22 05:10:29 2021
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 22 Nov 2021 05:10:29 +0000
Subject: EG meeting, 2021-11-17
In-Reply-To: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
References: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Nov 18, 2021, at 2:58 PM, Remi Forax wrote:

> I suppose you are talking about empty (no field) abstract classes.
> We need that for j.l.Object, j.l.Number or j.l.Record.
> From a user POV, it's not very different from an interface with default methods.

Yes. The key thing is that the abstract class in question does not accidentally entangle itself with object identity. There are three ways off the top of my head to do that:

- have a constructor that needs to write fields through `this`
- have a mutable instance field
- have synchronization somewhere (a synch. method)

We'll need to have a way for an abstract class (for Record, for example) to stand clear of the object identity thicket.

I think we could allow such an abstract class to have final fields, with suitable restrictions. But it would require a complex translation strategy and/or tricky JVM support. The problem is that the fields in the super would have to be replicated into each concrete subclass in a physically separate manner. Also the fields would have to have their initialization declared by the superclass but defined by the concrete subclass. Also field access might need to be virtualized, if each concrete subclass has its own idea about where the field "lives" in its bundle of fields. It's doable but messy. I'd rather leave it for later; we have so many more worthwhile things to do.
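Two of the three entanglements John lists can be detected on an existing class with plain reflection; the third (a constructor writing fields through `this`) lives in bytecode and is invisible here. The following is a rough sketch, not a proposed rule: the class and method names are hypothetical, and it examines only the class's own declarations, not its supertypes. Applied to JDK classes, it flags `java.util.AbstractList` (which declares the mutable `modCount` field), while `java.lang.Number` and `java.lang.Record` declare no instance state and pass.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class IdentityEntanglementCheck {
    // Reports the entanglements that reflection can see: mutable instance
    // fields and synchronized instance methods. Constructor bodies are not
    // inspected, so "writes fields through this" goes undetected.
    static List<String> entanglements(Class<?> c) {
        List<String> found = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (!Modifier.isStatic(mods) && !Modifier.isFinal(mods)) {
                found.add("mutable instance field: " + f.getName());
            }
        }
        for (Method m : c.getDeclaredMethods()) {
            int mods = m.getModifiers();
            // Static synchronized methods lock the Class object, not an instance.
            if (!Modifier.isStatic(mods) && Modifier.isSynchronized(mods)) {
                found.add("synchronized instance method: " + m.getName());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("AbstractList: " + entanglements(java.util.AbstractList.class));
        System.out.println("Number: " + entanglements(Number.class));
        System.out.println("Record: " + entanglements(Record.class));
    }
}
```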
From john.r.rose at oracle.com Mon Nov 22 05:25:15 2021
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 22 Nov 2021 05:25:15 +0000
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com>
References: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr> <4C229872-2809-4AE1-9E2E-D8CEC148080B@oracle.com> <7CE4960C-1457-466C-8349-F89910ECA565@oracle.com>
Message-ID:

On Nov 19, 2021, at 5:32 AM, Brian Goetz wrote:

> And this is not inconsistent with abstract superclasses contributing fields.

For me the poster child is Enum as much as Record. I want pure enums, some day, but in order to make this work we need a way for the ordinal and name fields to (a) appear in the abstract class Enum and (b) be suitably defined in the layout of each Enum subclass, whether it is an identity subclass or a pure (B2) subclass.

Sketch of an example way forward (but still with the sense that we have more important things to do):

- Allow fields to be marked abstract, and mark Enum's fields that way.
- Do not require (or allow) constructors to initialize abstract fields.
- The JVM can support virtualized getfield, maybe, or just ask the translation strategy (T.S.) to use access methods.
- As with methods, require a concrete subclass to redeclare inherited abstracts.
- The concrete subclass will naturally declare and initialize the now-concrete field.
- Have Enum support both kinds of constructors: Old School (fully concrete) and empty.
- Figure out some story for concretifying Enum's fields for Old School clients.

The trick would be to configure Enum so that it was a fully functional super for both kinds of subclasses; it should behave one way to Old School enum subclasses and another way to B2 enum subclasses. It's a research project. I get the sense there's a path forward, but not a simple one.

If you exclude fields, then it's not as hard as a research project IMO.
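In today's Java, the "just ask the translation strategy to use access methods" option can be approximated by hand: the abstract super declares accessors in place of abstract fields, writes its shared behavior against them, and each concrete subclass declares and initializes the real fields. This is a hypothetical sketch (`EnumLike` and `Planet` are invented names, unrelated to `java.lang.Enum`). The super here is still an ordinary identity class with an implicit constructor, so it shows only the access-method half of the idea, not the "empty constructor" half.

```java
public class AbstractFieldSketch {
    // Stand-in for an abstract super with "abstract fields": it declares
    // accessors but stores nothing itself.
    abstract static class EnumLike {
        abstract String name();   // plays the role of an abstract field "name"
        abstract int ordinal();   // plays the role of an abstract field "ordinal"

        // Shared behavior is written only against the accessors, so it works
        // no matter where each subclass actually stores the data.
        final boolean precedes(EnumLike other) {
            return ordinal() < other.ordinal();
        }

        @Override
        public String toString() {
            return name();
        }
    }

    // The concrete subclass redeclares and initializes the now-concrete fields.
    static final class Planet extends EnumLike {
        private final String name;
        private final int ordinal;

        Planet(String name, int ordinal) {
            this.name = name;
            this.ordinal = ordinal;
        }

        @Override String name() { return name; }
        @Override int ordinal() { return ordinal; }
    }

    public static void main(String[] args) {
        EnumLike mercury = new Planet("MERCURY", 0);
        EnumLike venus = new Planet("VENUS", 1);
        System.out.println(mercury + " precedes " + venus + ": " + mercury.precedes(venus));
    }
}
```

The cost John alludes to is visible even here: every field read in the super becomes a virtual call, which is what a "virtualized getfield" in the JVM would aim to avoid.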
The abstract supers of a B2 are not themselves B2; they are polymorphic types that (conventionally) live in the Old Bucket.

From john.r.rose at oracle.com Mon Nov 22 05:36:30 2021
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 22 Nov 2021 05:36:30 +0000
Subject: identityless objects and the type hierarchy
In-Reply-To: <942577129.1474549.1636064749760.JavaMail.zimbra@u-pem.fr>
References: <0d1af369-b041-3fec-b713-3f59cb2cd12c@oracle.com> <942577129.1474549.1636064749760.JavaMail.zimbra@u-pem.fr>
Message-ID: <59B8F839-277D-45AE-A350-4B07275F8722@oracle.com>

On Nov 4, 2021, at 3:25 PM, Remi Forax wrote:

> I don't think a second bifurcation is needed. At runtime bucket 2 and bucket 3 behave the same apart from null. Given that IdentitylessObject (or whatever name we choose) is an interface, it always accepts null, so if they are typed as that interface, B2 and B3 behave exactly the same.

Piling on: The marker interfaces are useful for testing and bounding *reference types*. But a primitive type is not a reference type, so it cannot be (directly) tested or bounded as a reference.

There *is* a difference between a reference of the form B3.ref (B3.box, B3?, whatever) and B2. But it's not an interesting difference, because when you box a B3 primitive you get something which has (as Brian says) all the affordances of reference, but without object identity. That's exactly what a B2 type is. The only difference between a reference to a B3 type and a B2 type is the syntax by which they were declared and derived.

This looked pretty clear to me when I did my diagram, where B3 types have ref projections that bubble into the B2 swath of types. Once there, they behave exactly like native B2 types. The diagram has three swathes for concrete types (PRIM, NOID, IDOSAUR), plus a separate upper quadrant for non-concrete reference types. The PRIM swath has a little excrescence into the NOID swath where the P.ref types pop out.
http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf

All that suggests to me that we won't want a marker interface to specially distinguish the B3 excrescences.

It does also suggest that we are not done bike-shedding terms: What's the collective term for "B2 refs + B3 boxes"? (I used NOID.) Or, is a B3 box a "pure object" like any B2 pure object, whose class happens to be a primitive class? I dunno. It remains true (and I hope will continue to be true) that a B3 class defines two types, one reference and one non-reference, while a B2 class defines one reference type. But maybe those two reference types are both to "pure objects"? I'll bet Dan has a take on this.

From brian.goetz at oracle.com Mon Nov 22 15:22:11 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Nov 2021 10:22:11 -0500
Subject: EG meeting, 2021-11-17
In-Reply-To:
References:
Message-ID: <17113716-e837-1c0e-c31c-c4f388ce2260@oracle.com>

> Is there a way to make that work in Java, so that identity-free classes can inherit from each other? Probably, in some limited way. The simplest move is the one Brian and I are liking here, where a completely non-concrete class (one with no fields and no commitment to object identity) can be refined by a subclass. But it should be marked abstract, so as not to have cases where you have a variable of the super-type and you don't know whether it has the layout of the super (because it was concrete, oops) or a subtype.

This is the second turn of the crank (the first was "you can extend Object only"), but as this conversation hinted, there may be a further turn where abstract identity-agnostic classes can contribute to the layout and initialization of a concrete final by-value class without pulling it into the world of new-dup-init. The following is an exploration, not a proposal, but might help find the next turn of the crank. The exposition is translational-centric but that's not essential.
An abstract class can contribute fields, initialization of those fields, and behavior. We can transform:

    abstract class C implements ValueObject {  // no identity children, please
        T t;
        C(T t) { ... t = e ... }
    }

into

    interface C {
        abstract protected V withT(V v, T t);
        abstract protected T t();
        static protected V init(V v, T t) {
            ... v = withT(v, e) ...
            return v;
        }
    }

and a subclass

    b2-class V extends C {
        U u;
        V(T t, U u) { super(t); u = e; }
    }

into

    b2-class V implements C {
        T t;   // pull down fields from super
        U u;
        V(T t, U u) {
            V this = initialvalue;
            this = C.init(this, t);
            this = this withfield[u] u;
        }
    }

The point of this exercise is to observe that the two components of C that are doing double duty as both API points for clients of C and extension points for subclasses of C -- the constructor and the layout -- can be given new implementations for the by-value world that are consistent with the inheritance semantics the user expects.

Again, not making a proposal here, as much as probing at the bounds of a new object model. (I think this is similar to what you sketched in your next mail.)

> The division separating non-concrete types from identity-object types in the Old Bucket may be seen in this diagram, which I cobbled up this weekend:
>
> http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf
>
> This comes from my attempts to make a more or less comprehensive Venn-style diagram of the stuff we are talking about. I think it helps me better visualize what we are trying to do; maybe it will help others in some way.
>
> I view this as my due diligence mapping the side of the elephant I can make contact with. Therefore I'm happy to take corrections on it.
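Brian's translation can be approximated in current Java by letting a record play the by-value subclass. In this sketch, which is an illustration under stated assumptions rather than the proposed translation, the interface `C` carries the accessor, a `withT` wither standing in for `withfield`, and a static `init` holding the superclass constructor body; the record `V` pulls the field `t` down next to its own `u`, and `new V(null, 0)` stands in for `initialvalue`. The names `LoweringSketch`, `withU`, and `create`, and the `strip()` validation inside `init`, are hypothetical. Unlike the pseudocode, the methods are public, because Java interfaces cannot declare protected members.

```java
import java.util.Objects;

public class LoweringSketch {
    // Lowered form of a hypothetical
    //   abstract class C { String t; C(String t) { this.t = t.strip(); } }
    // The field becomes an accessor plus a wither; the constructor body
    // becomes a static method that threads the value being built.
    interface C<S extends C<S>> {
        String t();          // read the "inherited" field
        S withT(String t);   // stands in for withfield on the inherited field

        static <S extends C<S>> S init(S self, String t) {
            Objects.requireNonNull(t, "t");
            return self.withT(t.strip()); // the superclass initialization logic
        }
    }

    // Lowered form of a hypothetical
    //   b2-class V extends C { int u; V(String t, int u) { super(t); this.u = u; } }
    // The record holds the pulled-down field t alongside its own field u.
    record V(String t, int u) implements C<V> {
        @Override
        public V withT(String t) { return new V(t, u); }

        V withU(int u) { return new V(t, u); }

        static V create(String t, int u) {
            V self = new V(null, 0);  // stands in for "initialvalue" (all-default instance)
            self = C.init(self, t);   // run the superclass initialization
            return self.withU(u);     // stands in for "withfield u"
        }
    }

    public static void main(String[] args) {
        System.out.println(V.create("  hello ", 42));
    }
}
```

The design choice worth noticing is that `init` never needs to know the concrete layout of `V`: it only calls `withT`, which is what lets one superclass "constructor" serve any number of by-value subclasses.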
>
> I'm also noodling on a whimsical Field Guide, which asks you binary questions about a random Java type, and guides you towards classifying it. That helped me crystallize the diagram, and may be useful in its own right, or perhaps distilled into a flowchart. Stay tuned.
>
> -- John
>
>
>> On Nov 18, 2021, at 2:34 PM, Brian Goetz wrote:
>>
>> I think it is reasonable to consider allowing bucket two classes to be abstract. They could be extended by other classes which would either be abstract or final. The intermediate types are polymorphic but the terminal type is monomorphic.
>>
>> A similar argument works for records.
>>
>> Sent from my iPad
>>
>>> On Nov 18, 2021, at 5:27 PM, Kevin Bourrillion wrote:
>>>
>>> On Wed, Nov 17, 2021 at 7:05 PM Dan Heidinga wrote:
>>>
>>> Let me turn the question around: What do we gain by allowing subclassing of B2 classes?
>>>
>>>
>>> I'm not claiming it's much. I'm just coming into this from a different direction.
>>>
>>> In my experience most immutable (or stateless) classes have no real interest in exposing identity, but just get defaulted into it. Any dependency on the distinction between one instance and another that equals() it would be a probable bug.
>>>
>>> When B2 exists I see myself advocating that a developer's first instinct should be to make new classes in B2 except when they /need/ something from B1 like mutability (and perhaps subclassability belongs in this list too!). As far as I can tell, this makes sense whether there are even /any/ performance benefits at all, and the performance benefits just make it a lot more /motivating/ to do what is already probably technically best anyway.
>>>
>>> Now, if subclassability legitimately belongs in that list of B1-forcing-factors, that'll be fine, I just hadn't fully thought it through and was implicitly treating it like an open question, which probably made my initial question in this subthread confusing.
>>>
>>> --
>>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
>

From brian.goetz at oracle.com Mon Nov 22 19:14:22 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Nov 2021 14:14:22 -0500
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To:
References: <17113716-e837-1c0e-c31c-c4f388ce2260@oracle.com>
Message-ID:

I wouldn't say we flipped anything. But we have made a lot of progress on the model; at first we thought abstract supers at all were a bridge too far, but we found the right set of constraints and it seems to fit naturally now. So it makes sense to ask the question whether we're at the edge, or whether further crank-turns are worth exploring.

I was mostly reacting to Kevin's comments; he seemed to be going somewhere with the "could we get people to adopt B2 by default", and probing for where that might go, and what constraints we'd have to reexplore, to see if there was untapped value there.

On 11/22/2021 2:09 PM, Dan Heidinga wrote:
> I'm trying to understand what flipped the cost-benefit calculation here that makes it worthwhile to re-explore allowing values to inherit fields from abstract supers.
From kevinb at google.com Mon Nov 22 21:07:55 2021
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 22 Nov 2021 13:07:55 -0800
Subject: EG meeting, 2021-11-17
In-Reply-To:
References: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga wrote:

> I'll echo Brian's comment that I'd like to understand Kevin's use cases better to see if there's something we're missing in the design / a major use case that isn't being addressed that will cause user confusion / pain.

Sorry if I threw another wrench here!

What I'm raising is only the wish that users can reasonably *default* to B2-over-B1 unless their use case requires something on our list of "only B1 does this". And that list can be however long it needs to be, just hopefully no longer. That's probably how we were looking at it already.

And sure, "need" sometimes can mean "it would have made translation *way too* complex and clever". Even if all we can say is "in principle this *could* be supported, but it just isn't and click here if you *really care a lot* to know the reasons why", it works and I suspect most users wouldn't even click.

Does that make perfect sense? Again, the thread just backed into the topic sideways.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Mon Nov 22 21:15:54 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Nov 2021 16:15:54 -0500
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To:
References: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
Message-ID:

Or, to put it another way: success looks like yet another "got the defaults wrong", where people should default to B2 unless they need B1, and "pure" joins "final" and "private" in the ranks of "I shoulda been the default." Right, that's what you're saying?
On 11/22/2021 4:07 PM, Kevin Bourrillion wrote:
> On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga wrote:
>
>> I'll echo Brian's comment that I'd like to understand Kevin's use cases better to see if there's something we're missing in the design / a major use case that isn't being addressed that will cause user confusion / pain.
>
>
> Sorry if I threw another wrench here!
>
> What I'm raising is only the wish that users can reasonably /default/ to B2-over-B1 unless their use case requires something on our list of "only B1 does this". And that list can be however long it needs to be, just hopefully no longer. That's probably how we were looking at it already.
>
> And sure, "need" sometimes can mean "it would have made translation /way too/ complex and clever". Even if all we can say is "in principle this /could/ be supported, but it just isn't and click here if you /really care a lot/ to know the reasons why", it works and I suspect most users wouldn't even click.
>
> Does that make perfect sense? Again, the thread just backed into the topic sideways.
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Tue Nov 23 01:04:41 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 23 Nov 2021 01:04:41 +0000
Subject: EG meeting, 2021-11-17
In-Reply-To:
References:
Message-ID:

Thanks, Brian, for many useful suggestions about the diagram. I have updated it in place. Its message should be clearer now.
On Nov 21, 2021, at 9:05 PM, John Rose wrote:

> http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf

From daniel.smith at oracle.com Tue Nov 23 01:13:14 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 23 Nov 2021 01:13:14 +0000
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To:
References: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr>
Message-ID: <1E60D4D1-5E09-445A-8A80-3DB5B2EF389A@oracle.com>

> On Nov 22, 2021, at 2:07 PM, Kevin Bourrillion wrote:
>
>> On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga wrote:
>>
>> I'll echo Brian's comment that I'd like to understand Kevin's use cases better to see if there's something we're missing in the design / a major use case that isn't being addressed that will cause user confusion / pain.
>>
> Sorry if I threw another wrench here!
>
> What I'm raising is only the wish that users can reasonably default to B2-over-B1 unless their use case requires something on our list of "only B1 does this". And that list can be however long it needs to be, just hopefully no longer. That's probably how we were looking at it already.

Here's the current list, FYI (derived from JEP 401):

- Implicitly final class, cannot be extended.
- All instance fields are implicitly final, so must be assigned exactly once by constructors or initializers, and cannot be assigned outside of a constructor or initializer.
- The class does not implement, directly or indirectly, IdentityObject. This implies that the superclass is either Object or a stateless abstract class.
- No constructor makes a super constructor call. Instance creation will occur without executing any superclass initialization code.
- No instance methods are declared synchronized.
- (Possibly) The class does not implement Cloneable or declare a clone() method.
- (Possibly) The class does not declare a finalize() method.
- (Possibly) The constructor does not make use of `this` except to set the fields in the constructor body, or perhaps after all fields are definitely assigned.

And elaborating on IdentityObject & stateless abstract classes:

An abstract class can be declared to implement either IdentityObject or ValueObject; or, if it declares a field, an instance initializer, a non-empty constructor, or a synchronized method, it implicitly implements IdentityObject (perhaps with a warning). Otherwise, the abstract class implements neither interface and can be extended by both kinds of concrete classes. (Such a "both kinds" abstract class has its ACC_PRIM_SUPER (name to be changed) flag set in the class file, along with an <init> method for identity classes.)

From john.r.rose at oracle.com Wed Nov 24 06:48:32 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 24 Nov 2021 06:48:32 +0000
Subject: [External] : Re: EG meeting, 2021-11-17
In-Reply-To: <1E60D4D1-5E09-445A-8A80-3DB5B2EF389A@oracle.com>
References: <1987799963.2696985.1637276287602.JavaMail.zimbra@u-pem.fr> <1E60D4D1-5E09-445A-8A80-3DB5B2EF389A@oracle.com>
Message-ID: <54F7E409-E87F-4E7C-B9DA-A26869CF22AC@oracle.com>

On Nov 22, 2021, at 5:13 PM, Dan Smith wrote:
>
>> On Nov 22, 2021, at 2:07 PM, Kevin Bourrillion wrote:
>>
>>> On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga wrote:
>>>
>>> I'll echo Brian's comment that I'd like to understand Kevin's use cases better to see if there's something we're missing in the design / a major use case that isn't being addressed that will cause user confusion / pain.
>>>
>> Sorry if I threw another wrench here!
>>
>> What I'm raising is only the wish that users can reasonably default to B2-over-B1 unless their use case requires something on our list of "only B1 does this". And that list can be however long it needs to be, just hopefully no longer. That's probably how we were looking at it already.
>
> Here's the current list, FYI (derived from JEP 401):
>
> - Implicitly final class, cannot be extended.

JVMS requires ACC_FINAL on the class.

> - All instance fields are implicitly final, so must be assigned exactly once by constructors or initializers, and cannot be assigned outside of a constructor or initializer.

JVMS requires ACC_FINAL on every instance field. (Static fields OK.)

> - The class does not implement, directly or indirectly, IdentityObject. This implies that the superclass is either Object or a stateless abstract class.

JVMS requires a check for this.

> - No constructor makes a super constructor call. Instance creation will occur without executing any superclass initialization code.

JVMS rules for invokespecial must exclude this.

> - No instance methods are declared synchronized.

JVMS forbids ACC_SYNCHRONIZED on all instance methods. (Static methods OK.)

> - (Possibly) The class does not implement Cloneable or declare a clone() method.
> - (Possibly) The class does not declare a finalize() method.

A conservative move is to forbid these things, in language and JVMS. Minor precedent: record has similar special cases (for component names).

> - (Possibly) The constructor does not make use of `this` except to set the fields in the constructor body, or perhaps after all fields are definitely assigned.

JVMS doesn't care about this. The private opcodes initialvalue and withfield work to set up "this" as the constructor executes. It's OK to sample the value at any time, but maybe the language says, "don't do that". I think there are use cases for private methods to work on partially initialized stuff. The theory is tricky. OK to be conservative now and more lenient later.

>
> And elaborating on IdentityObject & stateless abstract classes:
>
> An abstract class can be declared to implement either IdentityObject or ValueObject; or, if it declares a field, an instance initializer, a non-empty constructor, or a synchronized method, it implicitly implements IdentityObject (perhaps with a warning).
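The mechanically checkable items from Dan's list (and the matching JVMS checks John describes) can be approximated against existing classes with reflection. This is a hypothetical lint, not part of any JDK or JVMS mechanism: it cannot see constructor bodies, so the "no super constructor call" and "no use of `this`" rules are skipped; it treats the "(Possibly)" items as violations; and its superclass test is shallow (one level, instance fields only).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class ValueClassLint {
    // Returns the constraint-list items that class c visibly violates.
    static List<String> violations(Class<?> c) {
        List<String> out = new ArrayList<>();
        if (!Modifier.isFinal(c.getModifiers())) {
            out.add("class is not final");
        }
        for (Field f : c.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (!Modifier.isStatic(mods) && !Modifier.isFinal(mods)) {
                out.add("non-final instance field: " + f.getName());
            }
        }
        for (Method m : c.getDeclaredMethods()) {
            int mods = m.getModifiers();
            if (!Modifier.isStatic(mods) && Modifier.isSynchronized(mods)) {
                out.add("synchronized instance method: " + m.getName());
            }
            if (m.getName().equals("finalize") && m.getParameterCount() == 0) {
                out.add("declares finalize()");   // a "(Possibly)" item
            }
            if (m.getName().equals("clone") && m.getParameterCount() == 0) {
                out.add("declares clone()");      // a "(Possibly)" item
            }
        }
        if (Cloneable.class.isAssignableFrom(c)) {
            out.add("implements Cloneable");      // a "(Possibly)" item
        }
        // Shallow check: the superclass must be Object or an abstract class
        // that declares no instance fields.
        Class<?> sup = c.getSuperclass();
        if (sup != null && sup != Object.class) {
            boolean stateless = Modifier.isAbstract(sup.getModifiers());
            for (Field f : sup.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    stateless = false;
                }
            }
            if (!stateless) {
                out.add("superclass " + sup.getName() + " is not a stateless abstract class");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Integer: " + violations(Integer.class));
        System.out.println("ArrayList: " + violations(java.util.ArrayList.class));
    }
}
```

Records are a convenient sanity check: they are final, their fields are final, and their superclass `java.lang.Record` is abstract and stateless, so a simple record should produce no findings.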
JVMS should enforce corresponding structural rules on loaded classfiles. Neither a source class-or-interface nor a loaded classfile can ever implement both IO and VO at the same time.

As a special feature in the JVM I want an explicit form for these "empty constructors". We've discussed this; I'm not sure which form is best, but I don't want it to be a "not-really-empty" constructor which has a super-call in it; that's what seemingly "empty" constructors look like today to the JVM. The JVM should both allow and require an empty constructor if and only if the abstract class implements VO. (Alternative: The JVM implicitly injects VO if it sees an empty constructor, and if it sees VO it looks for an empty constructor.)

IIRC maybe our last consensus was to add an attribute to an <init> method of signature ()V that says, "whatever you think you see in this method, Mr. VM, please also feel free to skip it." That's a more hacky way to specify an empty constructor than would be my preference (which is an ACC_ABSTRACT <init>()V or even a zero-length class attribute). If a VO-only abstract has an <init>()V method, that's a smell, because it will never be used! OTOH, maybe just being a VO-only abstract class is enough to tell the JVM that the constructor is empty, with no further markings. Anyway, there's a little corner of the design space to consider here.

> Otherwise, the abstract class implements neither interface and can be extended by both kinds of concrete classes.
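Dan's rule for abstract classes, together with John's requirement that nothing ever implements both IdentityObject and ValueObject, amounts to a small decision procedure. The sketch below models it over a hypothetical summary record of an abstract class's declarations (`AbstractDecl` and `Kind` are invented for illustration; nothing here reads real class files). Treating an explicit ValueObject declaration combined with an identity-implying feature as an error is an inference from the two rules, not something the thread states outright.

```java
public class AbstractClassKinds {
    enum Kind { IDENTITY_ONLY, VALUE_ONLY, BOTH_KINDS }

    // Hypothetical summary of what an abstract class declares.
    record AbstractDecl(boolean implementsIdentityObject,
                        boolean implementsValueObject,
                        boolean declaresField,
                        boolean hasInstanceInitializer,
                        boolean hasNonEmptyConstructor,
                        boolean hasSynchronizedMethod) {}

    static Kind classify(AbstractDecl d) {
        // Any of these features implicitly makes the class an IdentityObject.
        boolean impliesIdentity = d.declaresField() || d.hasInstanceInitializer()
                || d.hasNonEmptyConstructor() || d.hasSynchronizedMethod();
        boolean identity = d.implementsIdentityObject() || impliesIdentity;
        // A class may never implement both IdentityObject and ValueObject.
        if (identity && d.implementsValueObject()) {
            throw new IllegalArgumentException(
                    "class would implement both IdentityObject and ValueObject");
        }
        if (identity) {
            return Kind.IDENTITY_ONLY;
        }
        if (d.implementsValueObject()) {
            return Kind.VALUE_ONLY;
        }
        return Kind.BOTH_KINDS; // extendable by identity and value subclasses alike
    }

    public static void main(String[] args) {
        System.out.println(classify(new AbstractDecl(false, false, false, false, false, false)));
        System.out.println(classify(new AbstractDecl(false, false, true, false, false, false)));
        System.out.println(classify(new AbstractDecl(false, true, false, false, false, false)));
    }
}
```

Only the `BOTH_KINDS` outcome corresponds to the "both kinds" abstract class that needs an empty constructor for value subclasses as well as an `<init>` method for identity subclasses.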
> (Such a "both kinds" abstract class has its ACC_PRIM_SUPER?name to be changed?flag set in the class file, along with an method for identity classes.) Yes, that makes sense. So maybe a VO-capable abstract class is always assumed to have an implicit empty constructor, even if there is no other marking than the PRIM_SUPER? I guess that?s OK for the JVM. For the source language it might be too magic. From daniel.smith at oracle.com Tue Nov 30 00:09:06 2021 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 30 Nov 2021 00:09:06 +0000 Subject: JEP update: Value Objects Message-ID: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> I've been exploring possible terminology for "Bucket 2" classes, the ones that lack identity but require reference type semantics. Proposal: *value classes*, instances of which are *value objects* The term "value" is meant to suggest an entity that doesn't rely on mutation, uniqueness of instances, or other features that come with identity. A value object with certain field values is the same (per ==), now and always, as every "other" value object with those field values. (A value object is *not* necessarily immutable all the way down, because its fields can refer to identity objects. If programmers want clean immutable semantics, they shouldn't write code (like 'equals') that depends on these identity objects' mutable state. But I think the "value" term is still reasonable.) This feels like it may be an intuitive way to talk about identity without resorting to something verbose and negative like "non-identity". If you've been following along all this time, there's potential for confusion: a "value class" has little to do with a "primitive value type", as we've used the term in JEP 401. 
We're thinking the latter can just become "primitive type", leading to the following two-axis interpretation of the Valhalla features:

---------------------------------------------------------------------------
Value class reference type (B2 & B3.ref) | Identity class type (B1)
---------------------------------------------------------------------------
Value class primitive type (B3)          |
---------------------------------------------------------------------------

Columns: value class vs. identity class. Rows: reference type vs. primitive type. (Avoid "value type", which may not mean what you think it means.)

Fortunately, the renaming exercise is just a problem for those of us who have been closely involved in the project. Everybody else will approach this grid with fresh eyes.

(Another old term that I am still finding useful, perhaps in a slightly different way: "inline", describing any JVM implementation strategy that encodes value objects directly as a sequence of field values.)

Here's a new JEP draft that incorporates this terminology and sets us up to deliver Bucket 2 classes, potentially as a separate feature from Bucket 3:

https://bugs.openjdk.java.net/browse/JDK-8277163

Much of JEP 401 ends up here; a revised JEP 401 would just talk about primitive classes and types as a special kind of value class.

From john.r.rose at oracle.com Tue Nov 30 06:53:56 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 30 Nov 2021 06:53:56 +0000
Subject: JEP update: Value Objects
In-Reply-To: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
Message-ID:

Two points from me for the record:

1. I re-read the JEP draft now titled Value Objects, and liked everything I saw, including the new/old term "Value" replacing "Pure" and "Inline".

2. In your mail, and in the companion JEP draft titled Primitive Objects, you refer to "primitive classes" and their objects.
It would make our deliberations simpler, IMO, if we were to title this less prescriptively as "Primitives" or "Primitive Types" or "Primitive Types and Values", rather than "Primitive Classes", because (a) there's no logical need for the new things to be classes, and (b) it might actually be helpful for them *not* to be, in the end, after deliberation. Putting the word "classes" in the title presupposes an answer to deliberations that have not yet been concluded.

People should note that the terms "class" and "object" are only loosely bound to the term "primitive" in most of our designs, since (of course) today no primitives at all are either defined by classes or have objects. They have corresponding reference or box classes and objects, to be precise. Today a primitive type "has a class" but it is not the case that it "is a class". We could choose to preserve this state of affairs instead of fixing it by making "classes everywhere"; it makes some dependent choices easier to make. As you know, one possible bridge to the future is, "Today all types are a disjoint union of primitives, classes, and interfaces, and tomorrow the same will be true, with all three possessing class-like declarations."

What about objects, shouldn't primitives at least be objects? Well, interfaces don't directly have objects today; they have objects of implementing classes. Likewise, primitives need never have objects directly, as long as they have objects which properly relate to them: their boxes. Boxes-boxes-everywhere certainly has its downsides, including pedagogical downsides, but that doesn't make it a non-starter.

Instead, if we choose to use the terms "primitive class" and "primitive object" as exact counterparts to "reference class" and "reference object", as your chart suggests, Dan, we will have to account for the duplication and/or ad hoc division of various attributions of classes and objects between the "primitive class" and its corresponding "reference class"
(e.g., int.ref, Point.ref). I think a good leading question is, "if a primitive is a class, and its reference type is also a class, which of its methods are situated on the primitive class, and which are situated on the reference class?" I would suggest that we be more sure we want to have two classes per primitive, or only-a-primitive-class per primitive, before we presuppose a decision by putting the word "Classes" in the title of JEP 402.

From john.r.rose at oracle.com Tue Nov 30 07:05:22 2021
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 30 Nov 2021 07:05:22 +0000
Subject: JEP update: Value Objects
In-Reply-To:
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
Message-ID:

P.S. I'd like to emphasize that none of my pleas for caution apply to the JEP draft titled Value Objects. That very nice JEP draft merely links to the JEP draft titled Primitive Classes, which is the JEP with the potential problem I'm taking pains to point out here.
Also, I'm not really demanding a title change here, Dan, but rather asking everyone to be careful about any presupposition that "of course we will heal the rift by making all primitives be classes". Or even "all primitives be objects." Those are easy ideas to fall into by accident, and I don't want us to get needlessly muddled about them as we sort them out.

(Having picked Value as the winner for the first JEP, replacing Primitive Objects with Primitive Values in the second JEP is not exactly graceful, is it? Naming is hard. If you were to change the title I suggest simply "Primitives" as the working title, until we figure out exactly what we want these Primitives to be, relative to other concepts. Just a suggestion.)

From daniel.smith at oracle.com Tue Nov 30 18:13:55 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 30 Nov 2021 18:13:55 +0000
Subject: JEP update: Value Objects
In-Reply-To:
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
Message-ID:

On Nov 30, 2021, at 12:05 AM, John Rose wrote:

> Also, I'm not really demanding a title change here, Dan, but rather asking everyone to be careful about any presupposition that "of course we will heal the rift by making all primitives be classes". Or even "all primitives be objects." Those are easy ideas to fall into by accident, and I don't want us to get needlessly muddled about them as we sort them out.
+1

I've been defaulting in descriptions like my two-axis grid to the plan of record, until we settle on a revised plan. But it's quite possible that "class" is not the right word for the second row.

(As for JEP 401, it will need to be revised to build on the Value Objects JEP. What you're seeing right now is unchanged from a few months ago. An updated iteration to come...)
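Both axes of Dan's grid can be related to what current Java already does. The sketch below uses only standard Java 16+ features (records and `java.lang.Class` reflection); no Valhalla syntax or behavior is assumed. The helper `sameFields` is hypothetical: it approximates the proposed value-object `==` by comparing each field, primitive fields by value and reference fields with today's identity `==`, which is one reading of "the same (per ==) ... as every 'other' value object with those field values". The `Tagged` record shows why such a value need not be immutable all the way down, and the last lines show John's point that `int` today "has a class" (a `Class` mirror and a box class) without being declared by one.

```java
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

public class GridBaseline {
    // Records have field-based equals(), but on a current JDK their instances still have identity.
    record Point(int x, int y) {}

    // A record whose field refers to a mutable identity object (a list).
    record Tagged(List<String> tags) {}

    /**
     * Hypothetical helper approximating the proposed value-object ==:
     * same record class, and every component is the same, primitives by
     * value and references by today's identity ==.
     */
    static boolean sameFields(Record a, Record b) {
        if (a.getClass() != b.getClass()) return false;
        try {
            for (RecordComponent rc : a.getClass().getRecordComponents()) {
                Object va = rc.getAccessor().invoke(a);
                Object vb = rc.getAccessor().invoke(b);
                // Primitive components come back boxed, so compare them with equals().
                boolean same = rc.getType().isPrimitive() ? va.equals(vb) : va == vb;
                if (!same) return false;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        // Column axis: identity vs. field-based sameness.
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);
        System.out.println(p1 == p2);           // false: two distinct record instances
        System.out.println(sameFields(p1, p2)); // true: same field values

        List<String> shared = new ArrayList<>(List.of("a"));
        Tagged t1 = new Tagged(shared);
        Tagged t2 = new Tagged(shared);
        shared.add("b");                        // mutate the identity object both refer to
        System.out.println(sameFields(t1, t2)); // true: still the same reference in the field
        System.out.println(t1.tags());          // [a, b]: observable state changed anyway

        Tagged t3 = new Tagged(new ArrayList<>(List.of("a", "b")));
        System.out.println(t1.equals(t3));      // true: record equals() compares list contents
        System.out.println(sameFields(t1, t3)); // false: different list objects

        // Row axis: a primitive type has a Class mirror but no class declaration of its own.
        System.out.println(int.class.isPrimitive());                       // true
        System.out.println(int.class.getDeclaredMethods().length);         // 0
        System.out.println(Integer.TYPE == int.class);                     // true: the box points back to int
        System.out.println(Integer.class.getDeclaredMethods().length > 0); // true: methods live on the box
    }
}
```

Records are used here only because they have field-based `equals`; they still have identity, so `p1 == p2` stays false on a current JDK. Nothing in the sketch decides whether a future primitive type should be a class, which is exactly the question left open above.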