From daniel.smith at oracle.com Wed Dec 1 15:59:52 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 1 Dec 2021 15:59:52 +0000
Subject: EG meeting, 2021-12-01
Message-ID:

EG Zoom meeting today at 5pm UTC (9am PST, 12pm EST).

We can discuss "JEP update: Value Objects", the use of the term "value" here, and class file encodings.

From forax at univ-mlv.fr Wed Dec 1 16:32:00 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 1 Dec 2021 17:32:00 +0100 (CET)
Subject: JEP update: Value Objects
In-Reply-To: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
Message-ID: <1464628180.1879848.1638376320697.JavaMail.zimbra@u-pem.fr>

Hi Daniel,
this is really nice.

Here are my remarks.

"It generally requires that an object's data be located at a fixed memory location"
remove "fixed", since all OpenJDK GCs move objects.
Again later, remove "fixed" in "That is, a value object does not have a fixed memory address ...".

At the beginning of the section "Value class declarations", before the example, I think we also need a sentence saying that fields are implicitly final.

Class file and representation, about ACC_PERMITS_VALUE: what's the difference between "permits" and "allow" in English?

In section "Java language compilation",
"Each class file generated by javac includes a Preload attribute naming any value class that appears in one of the class file's field or method descriptors."
+ if a value class is the receiver of a method call/field access (the receiver is not part of the method descriptor in the bytecode).

In section "Performance model",
"... must ensure that fields and arrays storing value objects are updated atomically.":
not only stores, loads have to be done atomically too.

The part "Initially, developers can expect the following from the HotSpot JVM" is dangerous because it will be read as a promise that HotSpot will do that forever.
We have to be more vague here, "a Java VM may ..."
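Remi's atomicity point can be illustrated by analogy in today's Java. This is a sketch under stated assumptions, not how a JVM flattens value objects: a reference to an immutable object is always read and written as one unit (JLS 17.7), so a reader sees both fields from the same update. A flattened value field gives up that indirection, which is why both its reads and its writes must be atomic. The names `Range` and `countTornReads` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AtomicPairSketch {
    /** Immutable pair; a single reference to it is a consistent snapshot of both fields. */
    record Range(long lo, long hi) {
        Range {
            if (lo > hi) throw new IllegalArgumentException("lo > hi");
        }
    }

    // Reference reads and writes are atomic; volatile adds cross-thread visibility.
    static volatile Range current = new Range(0, 0);

    /** Counts reads where lo and hi come from different updates (always 0 here). */
    public static long countTornReads(int iterations) {
        AtomicBoolean done = new AtomicBoolean(false);
        Thread writer = new Thread(() -> {
            for (long i = 1; !done.get(); i++) {
                current = new Range(i, i); // publishes lo and hi together
            }
        });
        writer.start();
        long torn = 0;
        for (int i = 0; i < iterations; i++) {
            Range r = current; // one read yields both fields
            if (r.lo() != r.hi()) torn++;
        }
        done.set(true);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println(countTornReads(1_000_000));
    }
}
```

If `lo` and `hi` were instead two separate mutable fields assigned one after the other, a concurrent reader could pair an old `lo` with a new `hi`. That tearing is what the VM must rule out on both the load and the store side when it inlines a value object's fields.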
regards,
Rémi

----- Original Message -----
> From: "daniel smith"
> To: "valhalla-spec-experts"
> Sent: Tuesday, 30 November 2021 01:09:06
> Subject: JEP update: Value Objects

> I've been exploring possible terminology for "Bucket 2" classes, the ones that lack identity but require reference type semantics.
>
> Proposal: *value classes*, instances of which are *value objects*
>
> The term "value" is meant to suggest an entity that doesn't rely on mutation, uniqueness of instances, or other features that come with identity. A value object with certain field values is the same (per ==), now and always, as every "other" value object with those field values.
>
> (A value object is *not* necessarily immutable all the way down, because its fields can refer to identity objects. If programmers want clean immutable semantics, they shouldn't write code (like 'equals') that depends on these identity objects' mutable state. But I think the "value" term is still reasonable.)
>
> This feels like it may be an intuitive way to talk about identity without resorting to something verbose and negative like "non-identity".
>
> If you've been following along all this time, there's potential for confusion: a "value class" has little to do with a "primitive value type", as we've used the term in JEP 401. We're thinking the latter can just become "primitive type", leading to the following two-axis interpretation of the Valhalla features:
>
> ------------------------------------------------------------------------
> Value class reference type (B2 & B3.ref) | Identity class type (B1)
> ------------------------------------------------------------------------
> Value class primitive type (B3)          |
> ------------------------------------------------------------------------
>
> Columns: value class vs. identity class. Rows: reference type vs. primitive type.
> (Avoid "value type", which may not mean what you think it means.)
>
> Fortunately, the renaming exercise is just a problem for those of us who have been closely involved in the project. Everybody else will approach this grid with fresh eyes.
>
> (Another old term that I am still finding useful, perhaps in a slightly different way: "inline", describing any JVM implementation strategy that encodes value objects directly as a sequence of field values.)
>
> Here's a new JEP draft that incorporates this terminology and sets us up to deliver Bucket 2 classes, potentially as a separate feature from Bucket 3:
>
> https://bugs.openjdk.java.net/browse/JDK-8277163
>
> Much of JEP 401 ends up here; a revised JEP 401 would just talk about primitive classes and types as a special kind of value class.

From john.r.rose at oracle.com Wed Dec 1 20:34:35 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 1 Dec 2021 20:34:35 +0000
Subject: aconst_init
In-Reply-To:
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
Message-ID: <40F9B99B-7F37-47FE-BBC1-FBCEEBFC28FB@oracle.com>

On Dec 1, 2021, at 7:58 AM, Dan Heidinga wrote:
>
> Splitting a new thread off from Dan's email about the JEP draft to talk about the `aconst_init` bytecode:
>
>> aconst_init, with a CONSTANT_Class operand, produces an instance of the named value class, with all fields set to their default values. This operation always has private access: a linkage error occurs if anyone other than the value class or its nestmates attempts an aconst_init operation.
>
> Can you confirm if this is purely a rename of the previous defaultvalue / initialvalue bytecodes?

I can confirm this, with one important exception: The defaultvalue bytecode has no access restrictions, while the aconst_init/initialvalue bytecode does.

> I'm wondering how the name fits the eventual primitive values and their uses. Will they also use this bytecode or will they continue to use a defaultvalue version?
For this reason, aconst_init/initialvalue is not useful for B3 types. I think there is no need for yet another bytecode to cover the B3 types. Instead, Class::__InitialValue should return null for B1/B2 types (or any reference types: polys and arrays), and should return the (boxed) zero for primitives, starting with int.class.

    assert Integer.class.__InitialValue() == null;
    assert int.class.__InitialValue() == 0;
    assert Point.class.__InitialValue() == (new Point[1])[0];
    assert Point.ref.class.__InitialValue() == null;

(__InitialValue is not really the eventual method name.)

> The expected bytecode pattern for a "" factory method is something like:
>     aconst_init MyValue
>     iconst_1
>     withfield MyValue.x:I
>     areturn
> Correct?

Yes, although it's likely there are intervening astore_0 and aload_0 instructions, since 'this' is probably modeled by the compiler as local[0].

By the way, this raises the question of how vigorously the JVM should perform structural checks on the new features, to ensure they are only used in the ways we expect. I think in general such checks should be justified individually, rather than be applied by default.

Since is just a static factory method, I would prefer (though I understand reasons to the contrary) to have the JVMS be agnostic about where methods can occur. In other words, treat like a plain identifier; maybe require that it be marked ACC_STATIC but allow it to work like a nameless factory method in any context where a classfile generator might choose to make use of it. Taking an agnostic stance now would let us experiment with translation strategies (in the future) which replace uses of (which have problematic security characteristics, even recently) with uses of .

(Reflection might omit off-label uses of , just like it omits . But the "guts" of MH reflection can see today and would see all such s tomorrow, so exposing it becomes a library issue, not a JVMS decision.)
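The proposed `__InitialValue` behavior for today's types can be approximated with existing reflection, using the same "element 0 of a fresh one-element array" identity that the `Point` assertion relies on. This is a minimal sketch: `initialValue` is a hypothetical helper, not a JDK method, and the `Point`/`Point.ref` cases cannot be expressed on a current JDK.

```java
import java.lang.reflect.Array;

public class InitialValueSketch {
    /**
     * Hypothetical stand-in for the proposed Class::__InitialValue.
     * Reads element 0 of a freshly allocated one-element array, whose
     * storage the JVM has already zeroed: null for reference types,
     * a boxed zero (or false) for primitive types.
     */
    public static Object initialValue(Class<?> type) {
        if (type == void.class) {
            throw new IllegalArgumentException("void has no initial value");
        }
        Object array = Array.newInstance(type, 1);
        return Array.get(array, 0); // primitive elements come back boxed
    }

    public static void main(String[] args) {
        System.out.println(initialValue(Integer.class));  // reference type
        System.out.println(initialValue(int.class));      // primitive type
        System.out.println(initialValue(String[].class)); // array types are reference types
    }
}
```

Note that this only reports the zero default the JVM already guarantees; it says nothing about how a future primitive class would compute its own default.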
John

From daniel.smith at oracle.com Wed Dec 1 23:29:37 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 1 Dec 2021 23:29:37 +0000
Subject: [External] : Re: JEP update: Value Objects
In-Reply-To:
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com>
Message-ID: <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com>

> On Dec 1, 2021, at 8:48 AM, Dan Heidinga wrote:
>
>> class file representation & interpretation
>>
>> A value class is declared in a class file using the ACC_VALUE modifier (0x0100). At class load time, the class is considered to implement the interface ValueObject; an error occurs if a value class is not final, has a non-final instance field, or implements (directly or indirectly) IdentityObject.
>
> I'll reiterate my earlier pleas to have javac explicitly make them implement ValueObject. The VM can then check that they have both the bit and the interface.

So we went down the path of "maybe there's no need for a flag at all" in today's meeting, and it might be worth more consideration, but I convinced myself that the ACC_VALUE flag serves a useful purpose for validation and clarifying intent that can't be reproduced by a "directly/indirectly extends ValueObject" test.

As you suggest, though, we could mandate that ACC_VALUE implies 'implements ValueObject'. Some reasons not to require this:

- 'implements ValueObject' may be redundant if an ancestor implements ValueObject; but leaving it off risks a separate compilation error (e.g., ancestor used to implement ValueObject, doesn't anymore). So I think the proper compilation strategy would be to always implement it directly, even redundantly. There's an opportunity for a subtle compiler bug.

- It's extra ceremony in the class file.

- Inferring is consistent with what we do for at least some identity classes.
  Inferring everywhere is, in some ways, simpler.*

(*Tangent about the idea of inferring IdentityObject in old versions, but requiring IdentityObject in new versions: the trouble with gating off less-preferred behavior in old versions is that it's still there and still must be supported. JVMs end up with two strategies instead of one. A (great strategy + ok strategy) combination is arguably *worse* than just (ok strategy) everywhere.)

> It's a simpler model if the interface is always there for values as the VM won't have to track whether it was injected for a value class or explicitly declared. Why does that matter? For two reasons: JVMTI will need to be consistent in the classfile bytes it returns and not include the interface if it was injected (less tracking), and given earlier conversations about whether to "hide" the injected interface from Class::getInterfaces, always having it for values removes one more sharp edge.

The plan of record is to make no distinction between inferred and explicit superinterfaces in reflection. Is that not acceptable for JVMTI? If there's no need for a distinction, does that address your concern about inferred supers?

From john.r.rose at oracle.com Wed Dec 1 23:56:02 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 1 Dec 2021 23:56:02 +0000
Subject: [External] : Re: JEP update: Value Objects
In-Reply-To: <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com>
Message-ID: <8AD4B184-2937-4146-A763-612E31E64683@oracle.com>

On Dec 1, 2021, at 3:29 PM, Dan Smith wrote:

> So we went down the path of "maybe there's no need for a flag at all" in today's meeting, and it might be worth more consideration, but I convinced myself that the ACC_VALUE flag serves a useful purpose for validation and clarifying intent that can't be reproduced by a "directly/indirectly extends ValueObject" test.
> As you suggest, though, we could mandate that ACC_VALUE implies 'implements ValueObject'.

Assuming ACC_VALUE is part of the design, there are actually four things we can specify, for the case when a class file has ACC_VALUE set:

A. Inject ValueObject as a direct interface, whether or not it was already inherited.
B. Inject ValueObject as a direct interface, if it is not already inherited.
C. Require ValueObject to be present as a direct interface, whether or not it was already inherited.
D. Require ValueObject to be present as an interface, either direct or inherited.

A and B will look magic to reflection.
B is slightly more parsimonious and less predictable than A.

C and D are less magic to reflection, and require a bit more "ceremony" in the class file.
D is less ceremony than C.
Also, the D condition is a normal subtype condition, while the C condition is unusual to the JVM.

I guess I prefer C and D over A and B because of the reflection magic problem, and also because of Dan H's issue (IIUC) about "where do we look for the metadata, if not in somebody's constant pool?"

Since D and C have about equal practical effect, and D is both simpler to specify and less ceremony, I prefer D best of all.

I agree that ACC_VALUE is useful to prevent "action at a distance".

There is the converse problem that comes from the redundancy: What happens if the class directly implements or inherits ValueObject and ACC_VALUE is not set? I guess that is an error also.
John

From john.r.rose at oracle.com Thu Dec 2 00:04:56 2021
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 2 Dec 2021 00:04:56 +0000
Subject: [External] : Re: JEP update: Value Objects
In-Reply-To: <8AD4B184-2937-4146-A763-612E31E64683@oracle.com>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com> <8AD4B184-2937-4146-A763-612E31E64683@oracle.com>
Message-ID: <6776971B-F8B1-416D-8A4F-32EAE842AC03@oracle.com>

On Dec 1, 2021, at 3:56 PM, John Rose wrote:

> There is the converse problem that comes from the redundancy: What happens if the class directly implements or inherits ValueObject and ACC_VALUE is not set? I guess that is an error also.

I hit send too soon: That's probably true for concrete classes. For abstracts, ACC_VALUE must not be set (yes?) and ValueObject "just flows" along with all the other super types, with no particular notice. It all comes together when ACC_VALUE appears, and that must be on a final, concrete class.

I keep wondering what ACC_VALUE "should mean" for an abstract. Maybe it "should mean" that the abstract is thereby also forced to implement VO, so that all subtypes will be VOs.

The slightly different meaning of ACC_PERMITS_VALUE is "hold off on injecting IdentityObject at this point", because the type might allow subtypes that implement VO (whether abstract or concrete). At this point it also allows IdentityObject to be introduced in subtypes. Mmm... It could also have been spelled ACC_NOT_NECESSARILY_IDENTITY.

As we said in the meeting, it seems to need magic injection of IdObj, even if we can require non-magic explicit presence of VO. Dan H., will the metadata pointer of IdObj be a problem to access, if it is magically injected?
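The difference between options C and D, and the draft's load-time restrictions, can be modeled with plain reflection. This is a sketch under loud assumptions: `ValueObject` and `IdentityObject` below are local stand-in interfaces (not the proposed `java.lang` types), and reflection cannot see a class file flag like ACC_VALUE, so only the interface and field checks are modeled. `Circle`, `Square`, and `Counter` are hypothetical examples.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class ValueClassChecks {
    // Hypothetical stand-ins, declared locally so the sketch compiles on a current JDK.
    public interface ValueObject {}
    public interface IdentityObject {}

    public interface Shape extends ValueObject {}

    /** Inherits ValueObject only through Shape. */
    public static final class Circle implements Shape { final double r = 1; }

    /** Redundantly lists ValueObject directly, as a cautious compiler might. */
    public static final class Square implements Shape, ValueObject { final double side = 1; }

    /** Violates the draft rule against non-final instance fields. */
    public static final class Counter implements ValueObject { int count; }

    /** Option C: ValueObject must appear among the direct superinterfaces. */
    public static boolean satisfiesC(Class<?> c) {
        return Arrays.asList(c.getInterfaces()).contains(ValueObject.class);
    }

    /** Option D: an ordinary subtype test, so direct or inherited both count. */
    public static boolean satisfiesD(Class<?> c) {
        return ValueObject.class.isAssignableFrom(c);
    }

    /** Draft restrictions: final class, no non-final instance fields, not an IdentityObject. */
    public static boolean meetsDraftRestrictions(Class<?> c) {
        if (!Modifier.isFinal(c.getModifiers())) return false;
        if (IdentityObject.class.isAssignableFrom(c)) return false;
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (!Modifier.isStatic(m) && !Modifier.isFinal(m)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] { Circle.class, Square.class, Counter.class }) {
            System.out.printf("%s: C=%b D=%b restrictions=%b%n",
                c.getSimpleName(), satisfiesC(c), satisfiesD(c), meetsDraftRestrictions(c));
        }
    }
}
```

`Circle` passes D but not C, and its status under D depends entirely on `Shape` continuing to extend `ValueObject`: that is the separate-compilation concern raised in this thread. `Square` passes both, which is why redundantly listing the interface directly makes C and D equivalent for such classes.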
From daniel.smith at oracle.com Thu Dec 2 00:05:29 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 2 Dec 2021 00:05:29 +0000
Subject: JEP update: Value Objects
In-Reply-To: <1464628180.1879848.1638376320697.JavaMail.zimbra@u-pem.fr>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <1464628180.1879848.1638376320697.JavaMail.zimbra@u-pem.fr>
Message-ID: <97407E47-9296-4776-9B7B-22220931B785@oracle.com>

> On Dec 1, 2021, at 9:32 AM, Remi Forax wrote:
>
> Hi Daniel,
> this is really nice.
>
> Here are my remarks.
>
> "It generally requires that an object's data be located at a fixed memory location"
> remove "fixed", all OpenJDK GCs move objects.
> Again later, remove "fixed" in "That is, a value object does not have a fixed memory address ...".

Yeah, was hoping I could weasel my way out of that with "generally", but okay. Changed to "particular memory location".

> At the beginning of the section "Value class declarations", before the example, I think we also need a sentence saying that fields are implicitly final.

Eh, this is putting more detail in the introductory paragraph than I want. I think I'm happier going the other direction: putting the rules about 'final' and 'abstract' class modifiers in the "subject to the following restrictions" list after the example. Then the intro is just two sentences about the 'value' keyword.

> Class file and representation, about ACC_PERMITS_VALUE, what's the difference between "permits" and "allow" in English?

Very close synonyms, I'd say; I would use them interchangeably. The reason I chose "permits" is because we already have a PermittedSubclasses attribute that serves a similar purpose.

> In section "Java language compilation",
> "Each class file generated by javac includes a Preload attribute naming any value class that appears in one of the class file's field or method descriptors."
> + if a value class is the receiver of a method call/field access (the receiver is not part of the method descriptor in the bytecode).

The need here is to identify inlinable classes at the declaration site. Use sites don't need it. (And the type of 'this' at the declaration site is, of course, already loaded.)

> In section "Performance model"
> "... must ensure that fields and arrays storing value objects are updated atomically.",
> not only stores, loads have to be done atomically too.

"read and written atomically", then.

> The part "Initially, developers can expect the following from the HotSpot JVM" is dangerous because it will be read as HotSpot will do that forever.
> We have to be more vague here, "a Java VM may ..."

Yes, message received. I'll ask around about the best way to document our intentions for the targeted release (perhaps outside the JEP) without suggesting a constraint on the abstract feature.

From daniel.smith at oracle.com Thu Dec 2 00:25:08 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 2 Dec 2021 00:25:08 +0000
Subject: JEP update: Value Objects
In-Reply-To: <8AD4B184-2937-4146-A763-612E31E64683@oracle.com>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com> <8AD4B184-2937-4146-A763-612E31E64683@oracle.com>
Message-ID: <39BB24B3-8214-4D39-BF31-2F51E30F75FD@oracle.com>

> On Dec 1, 2021, at 4:56 PM, John Rose wrote:
>
> On Dec 1, 2021, at 3:29 PM, Dan Smith wrote:
>>
>> So we went down the path of "maybe there's no need for a flag at all" in today's meeting, and it might be worth more consideration, but I convinced myself that the ACC_VALUE flag serves a useful purpose for validation and clarifying intent that can't be reproduced by a "directly/indirectly extends ValueObject" test.
>>
>> As you suggest, though, we could mandate that ACC_VALUE implies 'implements ValueObject'.
>
> Assuming ACC_VALUE is part of the design, there are actually four things we can specify, for the case when a class file has ACC_VALUE set:
>
> A. Inject ValueObject as a direct interface, whether or not it was already inherited.
> B. Inject ValueObject as a direct interface, if it is not already inherited.
> C. Require ValueObject to be present as a direct interface, whether or not it was already inherited.
> D. Require ValueObject to be present as an interface, either direct or inherited.

I realize my last sentence there is ambiguous, so thanks for spelling these out. I meant that Dan has suggested (D), and we could consider doing so. (The JEP says to do either A or B; it's vague about what "considered to implement" means.)

> A and B will look magic to reflection.

This I'm unclear on. What's the magic? Are you imagining that certain superinterfaces would be suppressed by reflection? As I said, our intent is to *not* suppress anything.

> B is slightly more parsimonious and less predictable than A.

Yeah, I'm not sure what I prefer. The distinction only matters, I think, for reflection.

> C and D are less magic to reflection, and require a bit more "ceremony" in the class file.
> D is less ceremony than C.
> Also, the D condition is a normal subtype condition, while the C condition is unusual to the JVM.

The "normal subtype condition" is a big reason to prefer D over C.

> I guess I prefer C and D over A and B because of the reflection magic problem, and also because of Dan H's issue (IIUC) about "where do we look for the metadata, if not in somebody's constant pool?"

I'll reiterate this point:

>> the trouble with gating off less-preferred behavior in old versions is that it's still there and still must be supported. JVMs end up with two strategies instead of one. A (great strategy + ok strategy) combination is arguably *worse* than just (ok strategy) everywhere.

We haven't really eliminated these problems if we're still inferring IdentityObject elsewhere.
We've just (slightly) reduced their footprint, at the expense of living with two strategies instead of one.

> Since D and C have about equal practical effect, and D is both simpler to specify and less ceremony, I prefer D best of all.

I'm concerned about D's separate compilation problem: implementing ValueObject at compile time doesn't guarantee implementing ValueObject at runtime. That change is not, strictly speaking, a binary compatible change, but a superinterface author might think they could get away with it, and the resulting error message seems excessively punitive: "you can't load this class because some superinterface changed its mind about allowing identity class implementations". They wanted to allow more, and ended up allowing less.

Which means, to be safe, the compiler should always redundantly implement ValueObject in value classes, but then a compiler might forget to do so and introduce a subtle bug...

Tolerable, but it's a rough edge of D.

From forax at univ-mlv.fr Thu Dec 2 07:08:01 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 2 Dec 2021 08:08:01 +0100 (CET)
Subject: [External] : Re: JEP update: Value Objects
In-Reply-To: <8AD4B184-2937-4146-A763-612E31E64683@oracle.com>
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com> <8AD4B184-2937-4146-A763-612E31E64683@oracle.com>
Message-ID: <95379176.1986412.1638428881927.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "daniel smith"
> Cc: "Dan Heidinga", "valhalla-spec-experts"
> Sent: Thursday, 2 December 2021 00:56:02
> Subject: Re: [External] : Re: JEP update: Value Objects

> On Dec 1, 2021, at 3:29 PM, Dan Smith wrote:
>> So we went down the path of "maybe there's no need for a flag at all" in today's meeting, and it might be worth more consideration, but I convinced myself that the ACC_VALUE flag serves a useful purpose for validation and
>> clarifying intent that can't be reproduced by a "directly/indirectly extends ValueObject" test.
>> As you suggest, though, we could mandate that ACC_VALUE implies 'implements ValueObject'.
> Assuming ACC_VALUE is part of the design, there are actually four things we can specify, for the case when a class file has ACC_VALUE set:
> A. Inject ValueObject as a direct interface, whether or not it was already inherited.
> B. Inject ValueObject as a direct interface, if it is not already inherited.
> C. Require ValueObject to be present as a direct interface, whether or not it was already inherited.
> D. Require ValueObject to be present as an interface, either direct or inherited.
> A and B will look magic to reflection.
> B is slightly more parsimonious and less predictable than A.
> C and D are less magic to reflection, and require a bit more "ceremony" in the class file.
> D is less ceremony than C.
> Also, the D condition is a normal subtype condition, while the C condition is unusual to the JVM.
> I guess I prefer C and D over A and B because of the reflection magic problem, and also because of Dan H's issue (IIUC) about "where do we look for the metadata, if not in somebody's constant pool?"
> Since D and C have about equal practical effect, and D is both simpler to specify and less ceremony, I prefer D best of all.
> I agree that ACC_VALUE is useful to prevent "action at a distance".
> There is the converse problem that comes from the redundancy:
> What happens if the class directly implements or inherits ValueObject and ACC_VALUE is not set? I guess that is an error also.

As Daniel said during the meeting and in a following email, from the POV of javac, the compiler should add "implements ValueObject" on all concrete value classes even if ValueObject is already present in the hierarchy, to avoid action at a distance (for example, to detect when a supertype changes from implementing ValueObject to implementing IdentityObject).
With that requirement, for the VM, D and C are equivalent for all classes generated by javac. So D is OK.

Rémi

> John

From daniel.smith at oracle.com Thu Dec 2 15:04:59 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 2 Dec 2021 15:04:59 +0000
Subject: JEP update: Value Objects
In-Reply-To:
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com> <8AD4B184-2937-4146-A763-612E31E64683@oracle.com> <6776971B-F8B1-416D-8A4F-32EAE842AC03@oracle.com>
Message-ID: <82A9C5AA-F0F3-4FB7-BF36-B6557103080E@oracle.com>

On Dec 2, 2021, at 7:08 AM, Dan Heidinga wrote:

> When converting back from our internal form to a classfile for the JVMTI RetransformClasses agents, I need to either filter the interface out if we injected it or not if it was already there. JVMTI's GetImplementedInterfaces call has a similar issue with being consistent - and that's really the same issue as reflection.
>
> There are a lot of small places that can easily become inconsistent - and therefore a lot of places that need to be checked - to hide injected interfaces. The easiest solution to that is to avoid injecting interfaces in cases where javac can do it for us so the VM has a consistent view.

I think you may be envisioning extra complexity that isn't needed here. The plan of record is that we *won't* hide injected interfaces.

Our hope is that the implicit/explicit distinction is meaningless: that turning implicit into explicit via JVMTI would be a 100% equivalent change. I don't know JVMTI well, so I'm not sure if there's some reason to think that wouldn't be acceptable...
From daniel.smith at oracle.com Thu Dec 2 23:11:07 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 2 Dec 2021 23:11:07 +0000
Subject: JEP update: Value Objects
In-Reply-To:
References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com> <8AD4B184-2937-4146-A763-612E31E64683@oracle.com> <6776971B-F8B1-416D-8A4F-32EAE842AC03@oracle.com> <82A9C5AA-F0F3-4FB7-BF36-B6557103080E@oracle.com>
Message-ID:

> On Dec 2, 2021, at 1:04 PM, Dan Heidinga wrote:
>
> On Thu, Dec 2, 2021 at 10:05 AM Dan Smith wrote:
>>
>> On Dec 2, 2021, at 7:08 AM, Dan Heidinga wrote:
>>
>> When converting back from our internal form to a classfile for the JVMTI RetransformClasses agents, I need to either filter the interface out if we injected it or not if it was already there. JVMTI's GetImplementedInterfaces call has a similar issue with being consistent - and that's really the same issue as reflection.
>>
>> There are a lot of small places that can easily become inconsistent - and therefore a lot of places that need to be checked - to hide injected interfaces. The easiest solution to that is to avoid injecting interfaces in cases where javac can do it for us so the VM has a consistent view.
>>
>>
>> I think you may be envisioning extra complexity that isn't needed here. The plan of record is that we *won't* hide injected interfaces.
>
> +1. I'm 100% on board with this approach. It cleans up a lot of the potential corner cases.
>
>> Our hope is that the implicit/explicit distinction is meaningless: that turning implicit into explicit via JVMTI would be a 100% equivalent change. I don't know JVMTI well, so I'm not sure if there's some reason to think that wouldn't be acceptable...
>
> JVMTI's "GetImplementedInterfaces" spec will need some adaptation as it currently states "Return the direct super-interfaces of this class. For a class, this function returns the interfaces declared in its implements clause."
>
> The ClassFileLoadHook (CFLH) runs either with the original bytecodes as passed to the VM (the first time) or with "morally equivalent" bytecodes recreated by the VM from its internal classfile formats. The first time through the process the agent may see a value class that doesn't have the VO interface directly listed while after a call to {retransform,redefine}Classes, the VO interface may be directly listed. The same issues apply to the IO interface with legacy classfiles so with some minor spec updates, we can paper over that.
>
> Those are the only two places I can find in the JVMTI spec that would be affected: GetImplementedInterfaces, and CFLH with the related redefine/retransform functions. Some minor spec updates should be able to address both to ensure an inconsistency in the observed behaviour is treated as valid.

Useful details, thanks.

Would it be a problem if the ClassFileLoadHook gives different answers depending on the timing of the request (derived from original bytecodes vs. JVM-internal data)? If we need consistent answers, it may be that the "original bytecode" approach needs to reproduce the JVM's inference logic. If it's okay for the answers to change, there's less work to do.

To highlight your last point: we *will* need to work this out for inferred IdentityObject, whether we decide to infer ValueObject or not.

From brian.goetz at oracle.com Sun Dec 5 18:36:05 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 5 Dec 2021 13:36:05 -0500
Subject: Fwd: Proposal: Static/final constructors for bucket-3 primitive classes.
In-Reply-To:
References:
Message-ID: <6d0e4bd0-4dd2-9702-1a24-3c7ce5eedf00@oracle.com>

The following was received on valhalla-spec-comments.

Summary: Various syntax options for no-arg constructors of "bucket 3" primitives, to enable users to pick a default value other than zero.
Analysis: The suggestion is well-intentioned, but it is built on some significant misunderstandings of the problem we are facing.

It assumes that it is sensible to allow a non-zero default value of a primitive to be specified by the class declaration. While it is entirely understandable why one would want this, the problem is not that there isn't a good syntax for it (there obviously is), nor that running the constructor multiple times is the problem -- it is deeper than that. Numerous safety properties derive from the fact that newly allocated objects and arrays are bulk-initialized to zero; compromising this seems likely to lead to exploits.

-------- Forwarded Message --------
Subject: Proposal: Static/final constructors for bucket-3 primitive classes.
Date: Fri, 3 Dec 2021 21:15:50 -0600
From: Clement Cherlin
To: valhalla-spec-comments at openjdk.java.net

Motivation:

A concern with primitive classes (bucket 3) is that the all-zeroes default value may be inappropriate or even invalid in some cases. This proposal suggests a language enhancement to give primitive class authors control over the default value of their class without, in most cases, requiring a constructor call to create an instance.

Proposed language change:

Primitive classes can apply either the keyword "static" or the keyword "final", but not both, to their no-argument constructor.

A "final" no-arg constructor is evaluated once, at compile time. The constructed object is treated as a static final constant, and can be folded as a constant, or copied verbatim whenever a default value of that class is instantiated.

A "static" no-arg constructor is evaluated once, when the class is loaded. The constructed object is copied verbatim whenever a default value of that class is instantiated.
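The "static" no-arg constructor semantics proposed above can be emulated in today's Java with an identity record, a precomputed default, and an explicit fill. This is only a sketch of the proposal's observable behavior: `Fraction`, `DEFAULT`, and `newFractionArray` are hypothetical names, and real primitive classes would be flattened rather than referenced.

```java
import java.util.Arrays;

public class StaticDefaultSketch {
    /** Stand-in for a primitive class whose author wants 0/1, not 0/0, as the default. */
    public record Fraction(long num, long den) {
        public Fraction {
            if (den == 0) throw new ArithmeticException("zero denominator");
        }

        // Plays the role of the proposed "static" no-arg constructor:
        // evaluated once, when the class is initialized.
        public static final Fraction DEFAULT = new Fraction(0, 1);
    }

    /** Allocation under the proposal: every slot receives a copy of the precomputed default. */
    public static Fraction[] newFractionArray(int length) {
        Fraction[] a = new Fraction[length]; // today the slots start as null (all-zero bits)
        Arrays.fill(a, Fraction.DEFAULT);    // an extra O(length) pass over the array
        return a;
    }

    public static void main(String[] args) {
        Fraction[] fs = newFractionArray(3);
        System.out.println(fs[2].den());
    }
}
```

The extra `Arrays.fill` pass hints at one cost: allocation can no longer rely on memory that is already zeroed. Brian's analysis argues the deeper problem is the safety properties that depend on bulk zero-initialization, which this sketch does not capture.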
Justification:

Presuming that non-zero default values need to exist, and we're going to be constructing lots and lots of primitive objects and arrays of primitive objects, it behooves us to make initialization of default values as efficient as possible. Much of the time, there will be no need to call a constructor / factory method, just make a copy of a pre-existing default value (perhaps lazily).

Related work:

For classes without sensible default values, I have another proposal I am working on to make initializing arrays of primitive objects possible and efficient, without resorting to the all-zeroes default.

Cheers,
Clement Cherlin

From ccherlin at gmail.com Sun Dec 5 23:09:20 2021
From: ccherlin at gmail.com (Clement Cherlin)
Date: Sun, 5 Dec 2021 17:09:20 -0600
Subject: Proposal: Static/final constructors for bucket-3 primitive classes.
In-Reply-To: <6d0e4bd0-4dd2-9702-1a24-3c7ce5eedf00@oracle.com>
References: <6d0e4bd0-4dd2-9702-1a24-3c7ce5eedf00@oracle.com>
Message-ID:

On Sun, Dec 5, 2021 at 12:36 PM Brian Goetz wrote:
>
> The following was received on valhalla-spec-comments.
>
> Summary: Various syntax options for no-arg constructors of "bucket 3" primitives, to enable users to pick a default value other than zero.
>
> Analysis: The suggestion is well-intentioned, but it is built on some significant misunderstandings of the problem we are facing.
>
> It assumes that it is sensible to allow a non-zero default value of a primitive to be specified by the class declaration. While it is entirely understandable why one would want this, the problem is not that there isn't a good syntax for it (there obviously is), nor that running the constructor multiple times is the problem -- it is deeper than that. Numerous safety properties derive from the fact that newly allocated objects and arrays are bulk-initialized to zero; compromising this seems likely to lead to exploits.

Thank you for your feedback.
However, far from leading to new exploits, my suggestion is aimed at fixing the flaws inherent in the current design that make it extremely, unnecessarily difficult to use correctly as a primitive class author. It makes the assumption that the all-zeroes value can and should be the default value for every single primitive class. Initializing to zero is simple, unambiguous and efficient. It is perfectly reasonable to have all-zeroes as the "default default", so to speak. However, it is completely unacceptable to make the "default default" the one and only default, because it creates a value that was never constructed. Numerous safety properties of existing classes also derive from the fact that every instance was initialized by a constructor; compromising this will inevitably lead to the same kinds of exploits that serialization did.

Consider a very slowly growing, but not constant, set of values which ought to be expandable at runtime, such as, say, media type codes for a transcoding server that supports dynamic plugins. It's not constant, so it can't be an enum. We must validate any new instance against a canonical list of permitted values before allowing it to be constructed, lest invalid (possibly malicious) values sneak into the system.

    public primitive record MediaCode(byte b1, byte b2, byte b3, byte b4) {
        public MediaCode {
            if (!isValidMediaCode(b1, b2, b3, b4))
                throw new IllegalArgumentException();
        }
    }

An invalid MediaCode of 0,0,0,0 is now trivially constructible, perhaps accidentally, using

    MediaCode[] mediaCodes = new MediaCode[numMediaCodes];
    // time passes, mediaCodes is partially but not completely filled...
    MediaCode whoops = mediaCodes[numMediaCodes - 1];

That permits injecting "nul" bytes into, say, a byte stream that will be deserialized by C code expecting null-terminated strings, or recognizing as a "media file" something that is very much not one. Sounds like that could easily lead to an exploit to me.
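This hazard can be modelled in today's Java. Primitive classes are not available in a released JDK, so the sketch stands in for a flattened `MediaCode[]` with a zero-filled `byte[]` holding four bytes per element. That layout is an assumption, not HotSpot's actual representation. `VALID_CODES`, the packed "AVC1"/"OPUS" codes, and `FlatMediaCodes` are hypothetical.

```java
import java.util.Set;

// Models why an all-zero default sidesteps constructor validation.
public final class ZeroDefaultSketch {

    // Hypothetical canonical list standing in for isValidMediaCode: "AVC1" and "OPUS".
    static final Set<Integer> VALID_CODES = Set.of(0x41564331, 0x4F505553);

    static boolean isValidMediaCode(byte b1, byte b2, byte b3, byte b4) {
        int packed = (b1 & 0xFF) << 24 | (b2 & 0xFF) << 16 | (b3 & 0xFF) << 8 | (b4 & 0xFF);
        return VALID_CODES.contains(packed);
    }

    record MediaCode(byte b1, byte b2, byte b3, byte b4) {
        MediaCode {
            if (!isValidMediaCode(b1, b2, b3, b4)) {
                throw new IllegalArgumentException("invalid media code");
            }
        }
    }

    // Models a flattened MediaCode[]: four bytes per element, zero-filled at allocation.
    static final class FlatMediaCodes {
        private final byte[] bytes;

        FlatMediaCodes(int length) {
            bytes = new byte[4 * length]; // no MediaCode constructor runs here
        }

        void set(int i, MediaCode c) {
            bytes[4 * i] = c.b1();
            bytes[4 * i + 1] = c.b2();
            bytes[4 * i + 2] = c.b3();
            bytes[4 * i + 3] = c.b4();
        }

        // What a reader of element i actually sees: the stored bytes, validated or not.
        boolean holdsValidCode(int i) {
            return isValidMediaCode(bytes[4 * i], bytes[4 * i + 1], bytes[4 * i + 2], bytes[4 * i + 3]);
        }
    }

    public static void main(String[] args) {
        FlatMediaCodes codes = new FlatMediaCodes(3);
        codes.set(0, new MediaCode((byte) 'A', (byte) 'V', (byte) 'C', (byte) '1'));
        codes.set(1, new MediaCode((byte) 'O', (byte) 'P', (byte) 'U', (byte) 'S'));
        // Slot 2 was never written, so it still holds 0,0,0,0: a value the constructor rejects.
        for (int i = 0; i < 3; i++) {
            System.out.println("slot " + i + " valid: " + codes.holdsValidCode(i));
        }
    }
}
```

Slot 2 reports as invalid even though no `MediaCode` constructor ever accepted it. Its zero bytes came from allocation, not from validation.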
And class authors are helpless to prevent this easily foreseeable error. Even making the constructor private won't help, because the zero default cannot be suppressed, hidden or prevented in any way. I've seen the suggestion "Make the class private". If the only solution to the problem is to hide from it, that is a tacit admission that the current design is unworkable.

Now consider the problems caused by the unwanted but mandatory implicit initializers in this class:

    public primitive class LongRational {
        private long numerator = 0;
        private long denominator = 0;
        ...
    }

which I don't think I need to elaborate on. These are just two examples I thought of off the top of my head. I can invent dozens more plausible ways that the all-zeroes default will create exploitable bugs with very little effort, and you know that the, ahem, professional bug exploiters will have even less trouble.

The following excerpt is from "Towards Better Serialization" (Brian Goetz, June 2019), https://cr.openjdk.java.net/~briangoetz/amber/serialization.html

> In an object-oriented system, the role of the constructor is to initialize
> an object with its invariants established; this allows the rest of the
> system to assume a basic degree of object integrity. In theory, we
> should be able to reason about the possible states an object might be
> in by reading the code for its constructors and any methods that
> mutate the object's state. But because serialization constitutes a
> hidden public constructor, you have to also reason about the state
> that objects might be in based on previous versions of the code
> (whose source code might not even exist any more, to say nothing
> of maliciously constructed bytestreams). By bypassing constructors,
> serialization completely subverts the integrity of the object model.

Strong words. "The role of the constructor is to initialize an object with its invariants established."
"Serialization constitutes a hidden public constructor...", and "...bypassing constructors... completely subverts the integrity of the object model." I fully agree with all of those statements and sentiments. Unless authors waste up to 8 bytes of space in every instance by including an "isConstructed" boolean, or waste time revalidating the state of every instance in every method call, the integrity of the object model is subverted.

Is not introducing footguns an important goal? Is maintaining the integrity of the object model an important goal? There will be a compromise somewhere, but forcing all-zeroes on every primitive class is the *wrong* compromise.

How is the JVM bulk-initializing an array to an author-controlled default value via memcpy (or equivalent) likely to lead to exploits? Specifically, how is it any more likely to lead to exploits than the JVM initializing an array to an arbitrary, uncontrolled, possibly inherently invalid default value via calloc (or equivalent)? If static/final were required on primitive class constructors (or there was another way to initialize an array, more on that later) then there would be no possible way for an exception to be thrown mid-array-initialization. Is that not safe? If you really want belt-and-suspenders safety, the JVM can initialize to zero, then reinitialize with a constructed default. I don't see the need for it, but it's a possibility.

Really think about the LongRational case. If default-zero initialization can make a simple numeric type (one of the primary anticipated use cases for primitive classes) so unsafe that the *default instance* will throw ArithmeticException if one so much as looks at it, what are we doing? Decreeing that primitive classes cannot ever opt out of an unsanitized, unvalidated, all-zeroes value will render them completely unsuitable for some roles that they would otherwise be ideal for.
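The LongRational case can be modelled the same way. The sketch assumes a flattened layout of two `long`s per element, using a hypothetical `FlatRationals` stand-in rather than a real JDK type. Any slot that was never written holds 0/0, and the first arithmetic on it fails.

```java
// Models the LongRational hazard: an unwritten element of a flattened array is 0/0.
public final class ZeroRationalSketch {

    record LongRational(long numerator, long denominator) {
        LongRational {
            if (denominator == 0) {
                throw new IllegalArgumentException("zero denominator");
            }
        }
    }

    // Models a flattened LongRational[]: two longs per element, zero-filled at allocation.
    static final class FlatRationals {
        private final long[] words;

        FlatRationals(int length) {
            words = new long[2 * length];
        }

        void set(int i, LongRational r) {
            words[2 * i] = r.numerator();
            words[2 * i + 1] = r.denominator();
        }

        // Truncated integer part of element i, computed directly from the stored fields,
        // as a method on a primitive class would be.
        long integerPart(int i) {
            return words[2 * i] / words[2 * i + 1]; // ArithmeticException when the denominator is 0
        }
    }

    public static void main(String[] args) {
        FlatRationals rs = new FlatRationals(2);
        rs.set(0, new LongRational(7, 2));
        System.out.println(rs.integerPart(0)); // 3
        try {
            rs.integerPart(1); // never written: the default 0/0
        } catch (ArithmeticException e) {
            System.out.println("default element: " + e.getMessage());
        }
    }
}
```

Reading element 1 throws `ArithmeticException` from the integer division, even though the `LongRational` constructor forbids a zero denominator.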
At that point, we might as well drop Bucket 3 entirely and stick with nullable value classes, since those have a preexisting, if unfortunate, default. I do not want to see primitive class initialization become a foreseeable and preventable disaster like serialization was. Any mistakes in the design will be a lasting part of Java, for future developers to curse and future blackhat hackers to exploit. Cheers, Clement Cherlin

From john.r.rose at oracle.com Thu Dec 9 04:30:50 2021 From: john.r.rose at oracle.com (John Rose) Date: Wed, 08 Dec 2021 20:30:50 -0800 Subject: Proposal: Static/final constructors for bucket-3 primitive classes. In-Reply-To: <6d0e4bd0-4dd2-9702-1a24-3c7ce5eedf00@oracle.com> References: <6d0e4bd0-4dd2-9702-1a24-3c7ce5eedf00@oracle.com> Message-ID: <92B6DF83-478B-4D69-8E31-C2F25CB5DD08@oracle.com>

We have considered, at various points in the last six years or more, allowing user-defined primitive types to define (under user control) their own default values. The syntax is unimportant, but the concept is simple: Surely the user who defines a primitive type can also define default initializer expressions for each of the fields.

But this would be a trail of tears, which we have chosen to avoid, each time the suggestion comes up.

This feature is often visualized as a predefined bit pattern, which the JVM would keep handy, and just stamp down wherever a default initializer is needed. It can't really be that simple, but even such a bit pattern is problematic.

First of all is the problem of declaring the bit pattern. Java natively uses the side effects of `` to define constants using ad hoc bytecodes; it also defines (for some types but not others) a concept of constant expression. Neither of those fits well into a classfile that would define a primitive with a default bit pattern.
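The two mechanisms John contrasts can be observed in today's Java. A `static final` field initialized by a constant expression is a constant variable. javac records it in a `ConstantValue` attribute and inlines every use, so reading it does not initialize the declaring class. A field computed by ordinary code is assigned only when the class initialization method `<clinit>` runs. The holder classes below are illustrative.

```java
// Illustrative sketch: constant variables are inlined and do not trigger class
// initialization; values computed by ad hoc bytecode exist only after <clinit> runs.
public final class ConstantVsClinit {

    static boolean constantHolderInitialized = false;
    static boolean computedHolderInitialized = false;

    static final class ConstantHolder {
        // Constant expression: stored in a ConstantValue attribute; javac inlines every use.
        static final int VALUE = 42;

        static {
            constantHolderInitialized = true;
        }
    }

    static final class ComputedHolder {
        // Not a constant expression: assigned by bytecode in ComputedHolder's <clinit>.
        static final int VALUE = Integer.parseInt("42");

        static {
            computedHolderInitialized = true;
        }
    }

    public static void main(String[] args) {
        int a = ConstantHolder.VALUE; // inlined as 42; ConstantHolder is not initialized
        int b = ComputedHolder.VALUE; // triggers ComputedHolder's <clinit>
        System.out.println(a + " " + b);
        System.out.println("ConstantHolder initialized: " + constantHolderInitialized); // false
        System.out.println("ComputedHolder initialized: " + computedHolderInitialized); // true
    }
}
```

Reading `ConstantHolder.VALUE` leaves its static initializer unexecuted. Reading `ComputedHolder.VALUE` runs it.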
If the bit pattern is defined using ad hoc bytecode, it must be defined in a new pseudo-method (not ``), to execute not *during* the initialization of the newly-declared primitive class, but *before*. (Surely not! a reader might exclaim, but this is the sort of subtlety we have to deal with.) During initialization of a class C, all fields of its own type C must be initialized *before* the first bytecode of `` executes, so that the static initializer code has something to write on. So there must be a "default value definition" phase, call it ``, added after linking and before initialization of C, so C's `` method has something to work with. This `` is really the body of a no-argument constructor of C, or its twin. A no-argument constructor of C is not a problem, but having it execute before C's `` block is a huge irregularity, which the JVM spec is not organized to support, at present.

This would turn into both JVMS and JLS spec. complexity, and more odd corners (and odd states) in the Java user experience. Sure, a user will say, "but I promise not to do anything odd; I just want *this field* to be the value `(int)1`". Yes, but a spec. must define not only the expected usages, but all possible usages, with no poorly-defined states.

OK, so if `` is not the place to define this elusive bit pattern, what about something more declarative, like a `ConstantValue` attribute? Surely we could put a similarly structured `DefaultValue` attribute on every non-static field of a value type, and that would give the JVM enough information to synthesize the required bit pattern *before* it runs ``.

Consider the user model here: A primitive declaration would allow its fields to have non-zero default values, *but only drawn from the restricted set of constant expressions*, because those are the ones which fit in the `ConstantValue` attribute. (They are true bit patterns in the constant pool, plus `String` constants.)
There is no previous place in Java where we make such a restriction, except `case` labels. Can you hear the groans of users as we try to explain why only constant expressions are allowed in that context? That's the muzak of the trail of tears I mentioned above.

But we have condy to fix that (someone will surely say). But that's problematic, because the resolution of constant pool constants of a class C requires C to be at least linked, and if the condy expression makes a self-reference to C itself, that will trigger C's initialization, at an awkward moment. Have you ever debugged a tangled initialization circularity, marked by mysterious NPEs on variables you *know* you initialized? I have. It's a stop on the trail of tears I mentioned.

But if we really worked hard, and added a bunch of stuff to the JVMS and JLS, and persuaded users not to bother us about the odd restrictions (to constant expressions, or expressions which "don't touch the class itself"), we *could* define some sort of declarative default value initialization.

What then? Well, ask the JVM engineers how they initialize heap variables, because those are the affected paths. Those parts of the JVM are among the most performance-sensitive. Currently, when a new object or array is created, its whole body (except the header) is sprayed with a nice even coat of all-zero-bit machine words. This is pretty fast, and it's important to keep it fast. What if creating an array required painting some beautifully crafted arabesque of a bit pattern defined by a creative user? Well, it's doable, but much more complicated. You need to load the bit pattern into live registers and (if it's an array of C) keep them live while you paint the whole array. That's got to be more expensive than spraying zeroes. (There's even hardware that's good for spraying zeroes, on some machines.)
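One rough way to see the extra work is to model a flattened array of a hypothetical two-word primitive as a `long[]`. This sketches the work involved, not HotSpot's allocation path, and it measures nothing. The default pattern below is an arbitrary assumption. The zero default costs nothing beyond allocation. A user-chosen default must be written into every element, and both pattern words stay live across the loop.

```java
import java.util.Arrays;

// Rough model of allocation with a zero default versus a user-chosen default,
// for a hypothetical primitive whose flattened form is two 64-bit words.
public final class DefaultFillSketch {

    // Hypothetical user-chosen default, one value per word.
    static final long DEFAULT_WORD_0 = 1L;
    static final long DEFAULT_WORD_1 = 0x3FF0_0000_0000_0000L; // bit pattern of 1.0d

    // All-zero default: allocation alone produces the default for every element.
    static long[] allocateZeroDefault(int elements) {
        return new long[2 * elements];
    }

    // Non-zero default: zeroed by allocation, then every element is overwritten
    // with the two pattern words, which must be kept at hand throughout the loop.
    static long[] allocateStampedDefault(int elements) {
        long[] words = new long[2 * elements];
        for (int i = 0; i < words.length; i += 2) {
            words[i] = DEFAULT_WORD_0;
            words[i + 1] = DEFAULT_WORD_1;
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(allocateZeroDefault(2)));
        System.out.println(Arrays.toString(allocateStampedDefault(2)));
    }
}
```

The stamped version also corresponds to the "belt-and-suspenders" option Clement mentions, which zeroes first and then reinitializes. So it pays for both passes.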
Basically, if we generously allowed users even a limited set of pre-defined default primitive values, we would be inviting them to create mysterious performance problems *for their clients*.

Reflective creation of objects and arrays is also complicated by non-zero defaults, of course. When you reflectively create a heap node, today you compute its size, allocate its memory, store some metadata to its header, and paint the rest zero. That turns into something more complicated (see above about live registers) and metadata-driven, in the presence of non-zero defaults.

I haven't yet mentioned *reference* fields, but those are another can of worms. The JVM vigorously tracks references. Suppose your primitive had a String-valued field, and you were allowed to declare a non-null default value for it, say `"empty"`. If one of your customers creates an array of these things, suddenly there is a GC card mark (for many GCs) on *every element of the array*, and that is *before you do anything useful with it*.

References also support circularity, including indirect cycles from an instance of C back to C itself. Can you guarantee that the computation of some tricky reference for your default value of `C.foo` won't require linking of C itself, and a vicious circularity? No, you can't, and you won't like the feeling of debugging such a thing either. Trail of tears, again.

Finally, depending on which of the above flawed tactics is chosen for representing user-selected default values, there is the possibility that JVM code can observe a variable V of type C in its pre-initialization state, because (a) C's initialization specification is being loaded or evaluated somehow, and (b) the variable V has been allocated but is waiting for an initialization bit pattern. (V might be a static of C, or something in a related dependent class. Also it could be a multi-threading situation, where V is being observed via a race condition; those are very hard to keep straight.)
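Even without user-defined defaults, Java code can already read a static field in its pre-initialization, all-zero state when class initialization is circular. The sketch uses hypothetical classes `A` and `B`. Touching `A` first starts its initializer, which reads `B`. `B`'s initializer then reads `A.BASE` while `A` is still being initialized on the same thread, and sees 0.

```java
// Illustrative initialization cycle: B observes A.BASE before A has assigned it.
public final class InitCycleSketch {

    static final class A {
        static final int DOUBLED = B.BASE_PLUS_ONE * 2;  // runs first and touches B
        static final int BASE = Integer.parseInt("10");  // not a constant expression
    }

    static final class B {
        static final int BASE_PLUS_ONE = A.BASE + 1;     // reads A.BASE while A is mid-initialization
    }

    public static void main(String[] args) {
        // A is touched first, so B sees A.BASE in its default (zero) state.
        System.out.println("A.DOUBLED = " + A.DOUBLED);             // 2, not 22
        System.out.println("B.BASE_PLUS_ONE = " + B.BASE_PLUS_ONE); // 1, not 11
        System.out.println("A.BASE = " + A.BASE);                   // 10
    }
}
```

No error is reported. The program simply computes 2 instead of 22, which is the kind of silent misbehavior a variable observed in its pre-initialization state produces.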
During those moments, if V is loaded, then (voila!) it will have either garbage or those good old all-zero bits in it. And the abstraction we were laboring to secure will be subverted. This usually doesn't happen, but when it's an accident it's a very subtle bug, and when it's on purpose it turns into a security escalation.

It's best to keep the simple default all-zero conventions. They are robust and understandable and regular. When they are inconvenient, users will find workarounds.

I hope this helps.

-- John

From forax at univ-mlv.fr Thu Dec 9 07:12:17 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 9 Dec 2021 08:12:17 +0100 (CET) Subject: Proposal: Static/final constructors for bucket-3 primitive classes.
In-Reply-To: <92B6DF83-478B-4D69-8E31-C2F25CB5DD08@oracle.com> References: <6d0e4bd0-4dd2-9702-1a24-3c7ce5eedf00@oracle.com> <92B6DF83-478B-4D69-8E31-C2F25CB5DD08@oracle.com> Message-ID: <2057846228.132574.1639033937029.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Brian Goetz"
> Cc: "valhalla-spec-experts" , "clement
> cherlin"
> Sent: Thursday, December 9, 2021 5:30:50 AM
> Subject: Re: Proposal: Static/final constructors for bucket-3 primitive classes.

> We have considered, at various points in the last six years or more, allowing
> user-defined primitive types to define (under user control) their own default
> values. The syntax is unimportant, but the concept is simple: Surely the user
> who defines a primitive type can also define default initializer expressions
> for each of the fields.
> But this would be a trail of tears, which we have chosen to avoid, each time the
> suggestion comes up.
> This feature is often visualized as a predefined bit pattern, which the JVM
> would keep handy, and just stamp down wherever a default initializer is needed.
> It can't really be that simple, but even such a bit pattern is problematic.
> First of all is the problem of declaring the bit pattern. Java natively uses the
> side effects of to define constants using ad hoc bytecodes; it also
> defines (for some types but not others) a concept of constant expression.
> Neither of those fits well into a classfile that would define a primitive with
> a default bit pattern.
> If the bit pattern is defined using ad hoc bytecode, it must be defined in a new
> pseudo-method (not ), to execute not during the initialization of the
> newly-declared primitive class, but before. (Surely not! a reader might
> exclaim, but this is the sort of subtlety we have to deal with.) During
> initialization of a class C, all fields of its own type C must be initialized
> before the first bytecode of executes, so that the static initializer
> code has something to write on.
> So there must be a "default value definition"
> phase, call it , added after linking and before
> initialization of C, so C's method has something to work with. This
> is really the body of a no-argument constructor of C, or its
> twin. A no-argument constructor of C is not a problem, but having it execute
> before C's block is a huge irregularity, which the JVM spec is not
> organized to support, at present.
> This would turn into both JVMS and JLS spec. complexity, and more odd corners
> (and odd states) in the Java user experience. Sure, a user will say, "but I
> promise not to do anything odd; I just want this field to be the value (int)1".
> Yes, but a spec. must define not only the expected usages, but all possible
> usages, with no poorly-defined states.
> OK, so if is not the place to define this elusive
> bit pattern, what about something more declarative, like a ConstantValue
> attribute? Surely we could put a similarly structured DefaultValue attribute on
> every non-static field of a value type, and that would give the JVM enough
> information to synthesize the required bit pattern before it runs .
> Consider the user model here: A primitive declaration would allow its fields to
> have non-zero default values, but only drawn from the restricted set of
> constant expressions, because those are the ones which fit in the
> ConstantValue attribute. (They are true bit patterns in the constant pool, plus
> String constants.) There is no previous place in Java where we make such a
> restriction, except case labels. Can you hear the groans of users as we try to
> explain why only constant expressions are allowed in that context? That's the
> muzak of the trail of tears I mentioned above.
> But we have condy to fix that (someone will surely say).
you read my mind :)

> But that's problematic, because the resolution of constant pool constants of a
> class C requires C to be at least linked, and if the condy expression makes a
> self-reference to C itself, that will trigger C's initialization, at an awkward
> moment. Have you ever debugged a tangled initialization circularity, marked by
> mysterious NPEs on variables you know you initialized? I have. It's a stop on
> the trail of tears I mentioned.
> But if we really worked hard, and added a bunch of stuff to the JVMS and JLS,
> and persuaded users not to bother us about the odd restrictions (to constant
> expressions, or expressions which "don't touch the class itself"), we could
> define some sort of declarative default value initialization.
> What then? Well, ask the JVM engineers how they initialize heap variables,
> because those are the affected paths. Those parts of the JVM are among the most
> performance-sensitive. Currently, when a new object or array is created, its
> whole body (except the header) is sprayed with a nice even coat of all-zero-bit
> machine words. This is pretty fast, and it's important to keep it fast. What if
> creating an array required painting some beautifully crafted arabesque of a bit
> pattern defined by a creative user? Well, it's doable, but much more
> complicated. You need to load the bit pattern into live registers and (if it's
> an array of C) keep them live while you paint the whole array. That's got to be
> more expensive than spraying zeroes. (There's even hardware that's good for
> spraying zeroes, on some machines.) Basically, if we generously allowed users
> even a limited set of pre-defined default primitive values, we would be
> inviting them to create mysterious performance problems for their clients.
> Reflective creation of objects and arrays is also complicated by non-zero
> defaults, of course.
> When you reflectively create a heap node, today you
> compute its size, allocate its memory, store some metadata to its header, and
> paint the rest zero. That turns into something more complicated (see above
> about live registers) and metadata-driven, in the presence of non-zero
> defaults.
> I haven't yet mentioned reference fields, but those are another can of worms.
> The JVM vigorously tracks references. Suppose your primitive had a
> String-valued field, and you were allowed to declare a non-null default value
> for it, say "empty". If one of your customers creates an array of these
> things, suddenly there is a GC card mark (for many GCs) on every element of the
> array, and that is before you do anything useful with it.
> References also support circularity, including indirect cycles from an instance
> of C back to C itself. Can you guarantee that the computation of some tricky
> reference for your default value of C.foo won't require linking of C itself,
> and a vicious circularity? No, you can't, and you won't like the feeling of
> debugging such a thing either. Trail of tears, again.
> Finally, depending on which of the above flawed tactics is chosen for
> representing user-selected default values, there is the possibility that JVM
> code can observe a variable V of type C in its pre-initialization state,
> because (a) C's initialization specification is being loaded or evaluated
> somehow, and (b) the variable V has been allocated but is waiting for an
> initialization bit pattern. (V might be a static of C, or something in a
> related dependent class. Also it could be a multi-threading situation, where V
> is being observed via a race condition; those are very hard to keep straight.)
> During those moments, if V is loaded, then (voila!) it will have either garbage
> or those good old all-zero bits in it. And the abstraction we were laboring to
> secure will be subverted.
> This usually doesn't happen, but when it's an
> accident it's a very subtle bug, and when it's on purpose it turns into a
> security escalation.
> It's best to keep the simple default all-zero conventions. They are robust and
> understandable and regular. When they are inconvenient, users will find
> workarounds.
> I hope this helps.

I fully agree, i think it's better to do the opposite and force the fact that all primitive value classes (Bucket 3) must have a default constructor and that constructor has fixed bytecode instructions.
If a user does not provide a constructor without parameter, the compiler will provide one and the verifier will check that this constructor exists. If a user wants to provide that constructor to be able to add javadoc on it, it should have only one instruction, which is to call default() with no parameter, something like

    public primitive value class Complex {
        public Complex() {
            default();
        }
    }

From the VM POV, it's an initfactory with a defaultvalue (or whatever the name of that bytecode) + areturn, so this can be easily checked by the VM.
The idea of forcing such a constructor is to help users to think that whatever they do, people will still be able to create an empty B3.

> -- John

Rémi

From john.r.rose at oracle.com Thu Dec 9 08:45:11 2021 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 Dec 2021 08:45:11 +0000 Subject: [External] : Re: Proposal: Static/final constructors for bucket-3 primitive classes. In-Reply-To: <2057846228.132574.1639033937029.JavaMail.zimbra@u-pem.fr> References: <6d0e4bd0-4dd2-9702-1a24-3c7ce5eedf00@oracle.com> <92B6DF83-478B-4D69-8E31-C2F25CB5DD08@oracle.com> <2057846228.132574.1639033937029.JavaMail.zimbra@u-pem.fr> Message-ID:

On Dec 8, 2021, at 11:12 PM, Remi Forax wrote:
>
> I fully agree, i think it's better to do the opposite

I snapped a few neurons trying to read that the first time.

> and force the fact that all primitive value classes (Bucket 3) must have a default constructor and that constructor has fixed bytecode instructions.

Heavy on ceremony even for Java especially if you can't do anything valuable in the constructor body.

>
> If a user does not provide a constructor without parameter, the compiler will provide one and the verifier will check that this constructor exists.

That's JVM ceremony, to what end?

Maybe we should disallow no-arg constructors altogether and leave room for a possible future feature along the lines of the special init phase. That future feature would run ad hoc byte codes at class preparation time to build the default value and would throw an error if it touched the class. Kind of like superclass init actions; after those and before the proper clinit call.

It's possible but not a priority, because of the various expenses I sketched. So we could leave space for it to put in later if the costs were justified after all.
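Remi's proposed shape has no direct equivalent in today's Java, because `default()` is hypothetical syntax from this discussion. The closest analogue is a record whose public no-arg constructor can only produce the all-zero value. Then `new Complex()` and the value a zero-filled flattened array would hold coincide. The sketch below uses `Complex` as an illustrative type.

```java
// Today's-Java analogue of a mandatory no-arg constructor that can only yield the
// all-zero default; the proposed `default()` call is replaced by this(0.0, 0.0).
public final class DefaultConstructorSketch {

    record Complex(double re, double im) {
        // Stands in for the hypothetical `public Complex() { default(); }`.
        public Complex() {
            this(0.0, 0.0);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Complex());                               // Complex[re=0.0, im=0.0]
        System.out.println(new Complex().equals(new Complex(0.0, 0.0))); // true
    }
}
```

Here `new Complex().equals(new Complex(0.0, 0.0))` holds. That is the invariant the proposal wants to make visible to users: whatever the author does, an all-zero instance can always be created.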
From forax at univ-mlv.fr Thu Dec 9 15:25:50 2021 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 9 Dec 2021 16:25:50 +0100 (CET) Subject: [External] : Re: Proposal: Static/final constructors for bucket-3 primitive classes. In-Reply-To: References: <6d0e4bd0-4dd2-9702-1a24-3c7ce5eedf00@oracle.com> <92B6DF83-478B-4D69-8E31-C2F25CB5DD08@oracle.com> <2057846228.132574.1639033937029.JavaMail.zimbra@u-pem.fr> Message-ID: <1803938440.516237.1639063550354.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "Remi Forax"
> Cc: "Brian Goetz" , "valhalla-spec-experts" , "clement
> cherlin"
> Sent: Thursday, December 9, 2021 9:45:11 AM
> Subject: Re: [External] : Re: Proposal: Static/final constructors for bucket-3 primitive classes.

> On Dec 8, 2021, at 11:12 PM, Remi Forax wrote:
>>
>> I fully agree, i think it's better to do the opposite
>
> I snapped a few neurons trying to read that the first time.

hum, there is a missing 'but' after the comma...

>
>> and force the fact that all primitive value classes (Bucket 3) must have a
>> default constructor and that constructor has fixed bytecode instructions.
>
> Heavy on ceremony even for Java especially if you can't do anything valuable in
> the constructor body.
>>
>> If a user does not provide a constructor without parameter, the compiler will
>> provide one and the verifier will check that this constructor exists.
>
> That's JVM ceremony, to what end?

Users are used to constructors, but bucket 3 inherently has an escape hatch because you can create an instance bypassing the constructors. Bypassing the constructors is bad; we know that because this is what serialization does. So instead of letting people figure out out of the blue that they should use B2 instead of B3 for such classes, I think it's better to maintain the illusion that there is a default constructor with no parameter for all B3.
It makes the semantics of B3 very clear, by making a public default constructor mandatory. The JVM ceremony is not strictly necessary; I propose it so that if people use a bytecode generator other than javac, things are still nicely aligned between the JLS and the JVMS view of the world, but it's less important.

>
> Maybe we should disallow no-arg constructors altogether and leave room for a
> possible future feature along the lines of the special init phase. That future
> feature would run ad hoc byte codes at class preparation time to build the
> default value and would throw an error if it touched the class. Kind of like
> superclass init actions; after those and before the proper clinit call.
>
> It's possible but not a priority, because of the various expenses I sketched. So
> we could leave space for it to put in later if the costs were justified after
> all.

We may do something like that in a possible future, but I think it's more important to make the semantics of B3 visible front and center.

Rémi

From john.r.rose at oracle.com Thu Dec 9 18:15:06 2021 From: john.r.rose at oracle.com (John Rose) Date: Thu, 09 Dec 2021 10:15:06 -0800 Subject: [External] : Re: Proposal: Static/final constructors for bucket-3 primitive classes. In-Reply-To: <1803938440.516237.1639063550354.JavaMail.zimbra@u-pem.fr> References: <6d0e4bd0-4dd2-9702-1a24-3c7ce5eedf00@oracle.com> <92B6DF83-478B-4D69-8E31-C2F25CB5DD08@oracle.com> <2057846228.132574.1639033937029.JavaMail.zimbra@u-pem.fr> <1803938440.516237.1639063550354.JavaMail.zimbra@u-pem.fr> Message-ID: <3B4A412A-412D-4081-8BCA-1D1BF89C5564@oracle.com>

On 9 Dec 2021, at 7:25, forax at univ-mlv.fr wrote:

> We may do something like that in a possible future, but I think it's
> more important to make the semantics of B3 visible front and center.

If you can only say one thing in such an explicit no-arg constructor (true initially and maybe forever) then it surely is strange that the silly thing has a body.
So that leads to some un-bodied presentation like `class P { public default P(); }`, which could be made more expressive later (or never, probably). But that, in turn, hits near to one of the places where Java *already set the default* (rightly or wrongly). Java defines, under some circumstances, the no-arg constructor for a class implicitly. Arguably this precedent applies (though not exactly) to the current case of default construction of the default value. I think, in the end, making a new primitive (as opposed to a new value class) is going to be an activity for library experts, not end users. Maybe the IDEs (not the JLS) can help them avoid pitfalls, but primitives are inherently tricky things to define. This means either that (a) it's OK to force the experts to do the extra ceremony or (b) it's OK to assume they know the rules of that game, and the ceremony won't add anything. I incline towards (b). The vision I'm assuming here is that a _bare primitive_ is something inherently loosely assembled. It's really just a bundle of scalar values. If you want a class wrapped around that bundle, you should be declaring your value as a _primitive reference_ (assuming the option for the bare primitive must also be provided) or declaring your type as a true _value class_ (if the option for the bare primitive is not so important). P.S. A friend kindly helped me update my metaphor firmware. I meant to say that pushing the feature under discussion would lead us along a path of pain, with various experiences along the way. But obviously not existential Jacksonian pain. And that's all I want to say here about that. From kevinb at google.com Tue Dec 14 01:49:15 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 13 Dec 2021 17:49:15 -0800 Subject: basic conceptual model Message-ID: Hi, So I've been threatening for a long time that I've been hard at work writing up a coherent conceptual model for "how data looks/works inside a running Java program today".
I have a few purposes for it, but one is to form a basis for explaining "and now here's precisely what parts Valhalla will change and how". *Data in Java programs: a basic conceptual model* This model comes filtered through a particular set of perceptions and biases. It's just *a* documented model and isn't trying to be a *the*. As such, you don't have to agree with all of it, but it would still be very helpful to know if it is inconsistent or confusing or ill-founded, or if you just see a way it could be better. I'll gladly add comment access on request. The next document (whenever that is) will try to examine various options for adjusting that particular model to accommodate Valhalla. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From john.r.rose at oracle.com Tue Dec 14 02:40:58 2021 From: john.r.rose at oracle.com (John Rose) Date: Mon, 13 Dec 2021 18:40:58 -0800 Subject: basic conceptual model In-Reply-To: References: Message-ID: <72C8172B-A3B5-4825-890F-CFC2D40D4253@oracle.com> I have some comments. Since the doc invites directly stuck-on comments, I've requested edit permission, as that seems necessary for me to stick on a comment. Some free-floating notes: Good use of "freely copyable" as a concept. There's a tough case, happily not relevant to Java, of linear types (IIRC Rust has them) where a value is freely copyable, but only to the extent that the source forgets the value after the sink gets it. Accounting for that would stress your terminology. Another (more subtle) stress to your terminology is your assertion that a mutable variable "forgets" the previous value when a new value is stored. That isn't strictly correct in the case of race conditions. Only a volatile variable reliably "forgets" its previous value in the presence of races. You don't actually define the term "value" but just illustrate it and make claims about it. Maybe you have to do it that way? Actually, you say it's "unit of data". Referring to "data"
as a known term (for readers who are programmers) is OK. Saying "unit" is more mysterious. You certainly don't mean units of measure, or functional programming unit types. Are you meaning to imply that it has no subparts which might also be termed units? That's OK as long as you have today's primitives (which I like to call "scalar primitives") and of course references (which are also scalars). By "scalar" I mean an item of data that is not composed of further scalars. From john.r.rose at oracle.com Tue Dec 14 02:44:59 2021 From: john.r.rose at oracle.com (John Rose) Date: Mon, 13 Dec 2021 18:44:59 -0800 Subject: basic conceptual model In-Reply-To: <72C8172B-A3B5-4825-890F-CFC2D40D4253@oracle.com> References: <72C8172B-A3B5-4825-890F-CFC2D40D4253@oracle.com> Message-ID: <7146FD7D-4901-4F3C-B144-9B3A9E0722ED@oracle.com> Two more thoughts: You could get away with saying "indivisible unit"; I think that would convey much of what you mean. Also, a footnote drawing the reader's attention to native hardware types (long, byte, float, reference) would make it clear that a Java computation is meant to "bottom out" in operations on units of data familiar to assembly programmers. They are indivisible units, but even more important, their operations are natural to real computers. On 13 Dec 2021, at 18:40, John Rose wrote: > I have some comments. Since the doc invites directly stuck-on comments, I've requested edit permission, as that seems necessary for me to stick on a comment. > > Some free-floating notes: > > Good use of "freely copyable" as a concept. There's a tough case, happily not relevant to Java, of linear types (IIRC Rust has them) where a value is freely copyable, but only to the extent that the source forgets the value after the sink gets it. Accounting for that would stress your terminology. > > Another (more subtle) stress to your terminology is your assertion that a mutable variable "forgets" the previous value when a new value is stored.
That isn't strictly correct in the case of race conditions. Only a volatile variable reliably "forgets" its previous value in the presence of races. > > You don't actually define the term "value" but just illustrate it and make claims about it. Maybe you have to do it that way? Actually, you say it's "unit of data". Referring to "data" as a known term (for readers who are programmers) is OK. > > Saying "unit" is more mysterious. You certainly don't mean units of measure, or functional programming unit types. Are you meaning to imply that it has no subparts which might also be termed units? That's OK as long as you have today's primitives (which I like to call "scalar primitives") and of course references (which are also scalars). By "scalar" I mean an item of data that is not composed of further scalars. From kevinb at google.com Tue Dec 14 03:05:09 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 13 Dec 2021 19:05:09 -0800 Subject: basic conceptual model In-Reply-To: <7146FD7D-4901-4F3C-B144-9B3A9E0722ED@oracle.com> References: <72C8172B-A3B5-4825-890F-CFC2D40D4253@oracle.com> <7146FD7D-4901-4F3C-B144-9B3A9E0722ED@oracle.com> Message-ID: > > On 13 Dec 2021, at 18:40, John Rose wrote: > > > Another (more subtle) stress to your terminology is your assertion that > a mutable variable "forgets" the previous value when a new value is > stored. That isn't strictly correct in the case of race conditions. Only > a volatile variable reliably "forgets" its previous value in the presence > of races. > Indeed there was a revision where "(modulo race conditions)" was there and I'll put it back. > You don't actually define the term "value" but just illustrate it and > make claims about it. Maybe you have to do it that way? Actually, you say > it's "unit of data". Referring to "data" as a known term (for readers who > are programmers) is OK. > Yes, in general I am sure that I can't accomplish actual ground up non-cyclical definition-definitions here.
I think it should suffice to be descriptive enough for the reader to course-correct their previous notions in this direction (provided they want to). > Saying "unit" is more mysterious. You certainly don't mean units of > measure, or functional programming unit types. Are you meaning to imply > that it has no subparts which might also be termed units? Oh, I actually do not want to imply irreducibility at all. That all values have had that property in Java is a fact I would label as incidental-not-essential. Glob, gob, blob, hunk, chunk, piece, ..... > That's OK as long as you have today's primitives (which I like to call > "scalar primitives") and of course references (which are also scalars). By > "scalar" I mean an item of data that is not composed of further scalars. > A tangent, but there's enough math major still in me to object to this. :-) Scalars are scalar because they scale things! This would be more similar to a one-dimensional vector space than to a scalar.... imho the best adjective for today's primitives is "primitive" and I'll plead my case about that soon too. :-) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From john.r.rose at oracle.com Tue Dec 14 04:36:31 2021 From: john.r.rose at oracle.com (John Rose) Date: Mon, 13 Dec 2021 20:36:31 -0800 Subject: [External] : Re: basic conceptual model In-Reply-To: References: <72C8172B-A3B5-4825-890F-CFC2D40D4253@oracle.com> <7146FD7D-4901-4F3C-B144-9B3A9E0722ED@oracle.com> Message-ID: <6EC05409-60DD-4E25-A9FC-026BADB09F74@oracle.com> On 13 Dec 2021, at 19:05, Kevin Bourrillion wrote: > ... > Yes, in general I am sure that I can't accomplish actual ground up > non-cyclical definition-definitions here. I think it should suffice to be > descriptive enough for the reader to course-correct their previous notions > in this direction (provided they want to). Yup, I see that's how it's working in there. > > >> Saying "unit" is more mysterious.
You certainly don't mean units of >> measure, or functional programming unit types. Are you meaning to imply >> that it has no subparts which might also be termed units? > > > Oh, I actually do not want to imply irreducibility at all. That all values > have had that property in Java is a fact I would label as > incidental-not-essential. > > Glob, gob, blob, hunk, chunk, piece, ..... In that case I claim "unit" has the wrong connotation, since it does (often) come with an expectation of irreducibility. With that in mind I like the unassuming term "piece", or those other words. If you are still in thesaurus mode: https://www.thesaurus.com/browse/portion > > > >> That's OK as long as you have today's primitives (which I like to call >> "scalar primitives") and of course references (which are also scalars). By >> "scalar" I mean an item of data that is not composed of further scalars. >> > > A tangent, but there's enough math major still in me to object to this. :-) > Scalars are scalar because they scale things! This would be more similar to > a one-dimensional vector space than to a scalar.... imho the best > adjective for today's primitives is "primitive" and I'll plead my case > about that soon too. :-) Sure, that's a good position for math majors like you and me. And I'm sure you/they/we really squirm in the presence of discussions about "vector processing units" and "vector ISAs". But the squirm-worthy folks that define VPUs also use the term "scalar" to mean "the value that's in a vector lane", and they assuredly do not mean that "scalar" can be identified with "single-lane vector". From brian.goetz at oracle.com Tue Dec 14 19:48:42 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Dec 2021 14:48:42 -0500 Subject: Enhancing java.lang.constant for Valhalla Message-ID: The jl.constant API will have to be updated somewhat for Valhalla.
Since it was already on the drawing board when we designed jl.constant, this shouldn't be too bad, but there are a few subtleties. Now that the descriptors are largely settling down, we can take a stab at this. ClassDesc (the base abstraction) has several factories to get a CD from a String:
    of(String qualifiedName)
    of(String packageName, String unqualifiedClassName)
    ofDescriptor(String fieldDescriptor)
Obviously we can already represent extended primitives with the last of these, but doing nothing else would make them somewhat second-class. For reference, we also have combinators:
    nested(String unqualifiedNestedName): give me a ClassDesc for a class nested in this one
    arrayType(): give me a ClassDesc for the array with this component type
    componentType(): (partial) give me a ClassDesc for the component type of this one, assuming this one is an array type
With the addition of Q descriptors, this library reveals itself to be L-biased; ClassDesc.of("com.foo.Bar") gives us an L-Bar. ("You could look in the classfile", I hear some of you say. Not so fast; this is a symbolic API, not a reflective one, by design.) But this is OK; L is a reasonable default. The fully orthogonal version would involve adding:
    static ClassDesc ofValue(String qualifiedName)
    static ClassDesc ofValue(String packageName, String unqualifiedClassName)
    boolean isValue()
    ClassDesc valueType() // flip to Q
    ClassDesc refType()   // flip to L
But the first two are not really necessary, since they can be expressed either with ClassDesc.of(name).valueType() or with ClassDesc.ofDescriptor(desc), and I'm inclined to go that route -- the canonical constructor is ofDescriptor, the others are conveniences around that, and complex transforms are done by combinators. Separately, over in bytecode-API land, we have identified a desire for another overload of ClassDesc::of, which is one that takes an internal (slash-separated) name.
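The existing `java.lang.constant` factories and combinators listed above can be exercised directly (the package has been in the JDK since Java 12). The sketch below also includes a helper named `ofInternal`, which is hypothetical: it only approximates the overload being proposed, is written against the current API by wrapping an internal name in an L descriptor, and does not model Q descriptors. Array names are passed through unchanged because the class file already spells array types as descriptors.

```java
import java.lang.constant.ClassDesc;

public class ClassDescSketch {
    // Hypothetical helper in the spirit of the proposed ClassDesc::ofInternal.
    // "com/foo/Bar" becomes the descriptor "Lcom/foo/Bar;"; array types such as
    // "[Ljava/lang/String;" are already descriptors and are passed through.
    static ClassDesc ofInternal(String internalBinaryName) {
        return internalBinaryName.startsWith("[")
                ? ClassDesc.ofDescriptor(internalBinaryName)
                : ClassDesc.ofDescriptor("L" + internalBinaryName + ";");
    }

    public static void main(String[] args) {
        // Three existing factories that describe the same (L-typed) class.
        ClassDesc byName = ClassDesc.of("com.foo.Bar");
        ClassDesc byParts = ClassDesc.of("com.foo", "Bar");
        ClassDesc byDescriptor = ClassDesc.ofDescriptor("Lcom/foo/Bar;");
        System.out.println(byName.descriptorString());                              // Lcom/foo/Bar;
        System.out.println(byName.equals(byParts) && byName.equals(byDescriptor)); // true

        // Existing combinators.
        System.out.println(byName.nested("Baz").descriptorString()); // Lcom/foo/Bar$Baz;
        ClassDesc array = byName.arrayType();
        System.out.println(array.descriptorString());                // [Lcom/foo/Bar;
        System.out.println(array.componentType().equals(byName));    // true
        System.out.println(byName.componentType());                  // null: not an array type

        // The hypothetical internal-name factory.
        System.out.println(ofInternal("com/foo/Bar").equals(byName));              // true
        System.out.println(ofInternal("[Ljava/lang/String;").descriptorString()); // [Ljava/lang/String;
    }
}
```

Routing every case through `ofDescriptor` mirrors the suggestion that `ofDescriptor` be the canonical factory, with the name-based factories as conveniences around it.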
One of the horrors of classfile APIs is that the classfile format is woefully inconsistent about names. Sometimes it wants an internal binary name (foo/Bar), sometimes a descriptor (Lfoo/Bar;), and there are other exceptions (e.g., module and package names use dots, operand to `new` is sometimes an internal binary name, but a descriptor for arrays.) So accepting any sort of String immediately raises the question: "in what format?" A more strongly typed API would use ClassDesc in some places, but given that we might have an internal binary name, or external binary name, or descriptor in hand, we need a way to convert all of these to a ClassDesc. For the external binary name and descriptor, we have ClassDesc::of and ::ofDescriptor, but we're missing one for internal binary names. So I'm proposing:
    ClassDesc ofInternal(String internalBinaryName)
to round out the set. So, summarizing the new methods (modulo naming changes to reflect changes in Valhalla language syntax):
    ClassDesc ofInternal(String internalBinaryName)
    boolean isValue()
    ClassDesc valueType()
    ClassDesc refType()
Also, eventually, all the *Impl classes in this library can become B2 primitives, since identity doesn't matter. From daniel.smith at oracle.com Wed Dec 15 16:38:03 2021 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 15 Dec 2021 16:38:03 +0000 Subject: EG meeting, 2021-12-15 Message-ID: EG Zoom meeting today at 5pm UTC (9am PDT, 12pm EDT).
Possible topics: "JEP update: Value Objects": discussed some details about inferred superinterfaces, including impact on JVMTI; "basic conceptual model": Kevin shared his notes describing key Java programming model concepts, in anticipation of changes coming from primitive classes; "Enhancing java.lang.constant for Valhalla": Brian explored evolution of java.lang.constant. From kevinb at google.com Wed Dec 15 18:42:55 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 15 Dec 2021 10:42:55 -0800 Subject: We have to talk about "primitive". Message-ID: (Okay, so we're doing this) I think the rename to "primitive classes" happened during my outage last year. When I came back I made the decision to like it. Since then, I've found that in my explanatory model I'm fighting against it constantly. I think it may actually be fatally flawed. The points I raise here were surely already known at the time, and I know there were good reasons for overriding them. But I feel the need to come back and push harder on their importance. Background: the textbook definition of "primitive" is centered on their nature of being elements-not-molecules, and I see no dispute about it. Also, there's no disputing the fact that we're allowed to adopt a different meaning if we so choose. So that's not even the fatal flaw. The main problem I think we can't escape is that we'll still need some word that means only the eight predefined types. (For the sake of argument let's assume we can pick one and lean hard on it, whether that's "predefined", "built-in", "elemental", "leaf type", or whatever.) Definitely, our trying to minimize their specialness is virtuous. They should be like helium: yes, they are molecules when you want a molecule! But on any deeper look they will clearly be "actually" elements, and the distinction will matter often enough.
So we have to attempt to shift users' understanding of "primitive" while at the same time injecting a new term to mean exactly what primitive used to mean. That's the old Indiana Jones switch and I don't have to tell you how that turned out for him. It would be difficult to pull off even in a world where we were just pushing some new server and the whole world got the new model at once. But in this universe where every version of Java ever made has to coexist, it's looking to me like a guaranteed source of never-ending confusion. I also think it robs us of our ability to smoothly portray the real changes of Valhalla. We want to be able to say "elements are still elements! now we have molecules too". Pedagogically that is always preferable to "elements aren't really what you thought they were". Okay, the real comparison is a little more nuanced than that, but I'll get to that now. An alternative that seems to work fine, in my mental model at least, is: - Primitive types are examples of value types, and have always been. - Java never supported any other kinds of value types before, so we didn't distinguish the terms before. - Everything you associate with primitive types remains true. - But most of those traits really come from their value-type-ness. (I plan to make the above shifts to my model document already.) - Now we have user-defined value types too. - The way we user-define a type is with a class, so a value type is defined by a "value class" (sorry B2). - The primitive types will now each get a value class. - These 8 classes will look as much like user-defined types as Object does. - They, like Object, will have a "cheat" in their source code that no one else gets to use. (Object's is that there is no implied `extends Object` or `super();`; these need no fields because the data they store is magically handled by the VM. These feel like similar cheats.)
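Some of the special status Kevin describes for the eight built-in types, and Object's own "cheat", is already visible through today's `java.lang.Class` API. The sketch below (the class name `PrimitiveReflection` is hypothetical) prints what reflection reports for the eight built-in types, the `Integer` wrapper, `void`, and `Object`. One detail worth noting: `Class.isPrimitive()` also returns true for `void.class`, so the current API's notion of "primitive" covers nine `Class` objects rather than eight.

```java
import java.util.List;

public class PrimitiveReflection {
    // The eight built-in primitive types, as Class objects.
    static final List<Class<?>> BUILT_INS = List.of(
            boolean.class, byte.class, short.class, char.class,
            int.class, long.class, float.class, double.class);

    public static void main(String[] args) {
        for (Class<?> c : BUILT_INS) {
            // Built-in primitives expose no fields, no methods and no superclass.
            System.out.printf("%-7s primitive=%b fields=%d methods=%d superclass=%s%n",
                    c.getName(), c.isPrimitive(),
                    c.getDeclaredFields().length, c.getDeclaredMethods().length,
                    c.getSuperclass());
        }
        // The wrapper is an ordinary class; Integer.TYPE is the same object as int.class.
        System.out.println(Integer.class.isPrimitive());  // false
        System.out.println(Integer.TYPE == int.class);    // true
        // void.class also reports isPrimitive() == true.
        System.out.println(void.class.isPrimitive());     // true
        // Object's "cheat": it is the one class with no superclass.
        System.out.println(Object.class.getSuperclass()); // null
    }
}
```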
Then mopping up the rest: - Existing classes probably need a term like "reference classes" (in the model I'm going to circulate that doubles down on values-are-not-objects, then this wants to be "object classes", even though that feels weird at first). - I think the term for bucket 2 classes really ought to center on identitylessness, e.g. "noid", "noident", "idfree", or something. Anything else is getting away from the essential meaning of the bucket; plus, we want people to call bucket 1 classes "identity classes", don't we? Footnote: for a more concrete manifestation of this problem: I am sure we cannot possibly get away with Class.isPrimitive() being true for these classes. Right? Thoughts? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Wed Dec 15 19:17:25 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 15 Dec 2021 14:17:25 -0500 Subject: We have to talk about "primitive". In-Reply-To: References: Message-ID: > Background: the textbook definition of "primitive" is centered on > their nature of being elements-not-molecules, and I see no dispute > about it. Also, there's no disputing the fact that we're allowed to > adopt a different meaning if we so choose. So that's not even the > fatal flaw. Yes, that's definitely a point against -- these things are "not primitive" in the atomic sense. OTOH, they are *very much like* today's primitives in many other ways. So this is a choice between being strictly linguistically accurate and appealing to existing mental models. Tough choice. > The main problem I think we can't escape is that we'll still need some > word that means only the eight predefined types. (For the sake of > argument let's assume we can pick one and lean hard on it, whether > that's "predefined", "built-in", "elemental", "leaf type", or whatever.) I've been calling them the built-in primitives; we've test-driven other terms like "basic" primitives.
Assume we'll agree on a term. Also, no matter how we try, they will be different from the extended primitives in some ways, such as: - Their reference companions have weird names (e.g., Integer); - They permit a seemingly circular declaration (i.e., the declaration of "class int" will use "int" in its representation); - They will be translated differently, because the VM has built-in carriers for I/J/F/D, whereas extended primitives will use the L and Q carriers; - There will probably be some special treatment in reflection for these eight types; Most of these are things about which we can say "OK, fine, these are historical warts." There may be other asymmetries too, that derive from compatibility constraints. As you say, the game is minimization. > An alternative that seems to work fine, in my mental model at least, is: > > * Primitive types are examples of value types, and have always been. > * Java never supported any other kinds of value types before, so we > didn't distinguish the terms before. > * Everything you associate with primitive types remains true. > * But most of those traits really come from their value-type-ness. > FTR, there is one big difference, which has a few consequences. The big difference is reference-ness; value and primitive classes give rise to reference types, whereas primitive classes additionally give rise to a "primitive" type. That the "primitive" type gives us reference-ness means it gives up nullability and non-tearability. I think what you're saying here, at root, is to give the "value" name to extended primitives, and find another name to give to B2? From daniel.smith at oracle.com Wed Dec 15 20:10:44 2021 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 15 Dec 2021 20:10:44 +0000 Subject: We have to talk about "primitive".
In-Reply-To: References: Message-ID: <1FB47341-5AC1-4F5F-AC7B-F1A24F53D4D8@oracle.com> On Dec 15, 2021, at 12:17 PM, Brian Goetz > wrote: The main problem I think we can't escape is that we'll still need some word that means only the eight predefined types. (For the sake of argument let's assume we can pick one and lean hard on it, whether that's "predefined", "built-in", "elemental", "leaf type", or whatever.) I've been calling them the built-in primitives; we've test-driven other terms like "basic" primitives. Assume we'll agree on a term. Also, no matter how we try, they will be different from the extended primitives in some ways, such as: - Their reference companions have weird names (e.g., Integer); - They permit a seemingly circular declaration (i.e., the declaration of "class int" will use "int" in its representation); - They will be translated differently, because the VM has built-in carriers for I/J/F/D, whereas extended primitives will use the L and Q carriers; - There will probably be some special treatment in reflection for these eight types; Most of these are things about which we can say "OK, fine, these are historical warts." There may be other asymmetries too, that derive from compatibility constraints. As you say, the game is minimization. Yes, this is a good list. Add to it: - They are named with a lower-case keyword; - They exclusively get to use special operators (for now). My high-level response to "primitive=one of 8 types" is that it may be giving the good name to, and drawing attention to, something that doesn't matter much. Sure, we'll need to specify a distinction for the purpose of the things on the list, but I don't think most programmers should really care whether the value they're working with belongs to one of the 8 special types or not.
These especially don't matter: - Aliased reference type names: going forward, everybody should be saying `int.ref` instead; - Circular declarations: less than 100 people in the world need to care about this (maybe exaggerating); - Weird JVM features: yes, but the JVM has lots of quirks, and ergonomics are not the top priority. And the operator limitation is not fundamental; it certainly could be addressed in the future. So we're left with, for most Java programmers, a set of special types that get spelled with keywords and get some special behavior in the reflection API. My initial sense is that's not enough to put them in their own different-noun category. Meanwhile, if we can tell programmers "primitives have members/classes now, and libraries can define additional primitives", that can build on existing intuitions pretty well. For example, the primitive type/reference type duality still exists, and pretty much works the same. Asking them to do s/primitive type/value type/ in this context is its own Indiana Jones maneuver. From kevinb at google.com Wed Dec 15 22:18:48 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 15 Dec 2021 14:18:48 -0800 Subject: We have to talk about "primitive". In-Reply-To: References: Message-ID: On Wed, Dec 15, 2021 at 11:17 AM Brian Goetz wrote: > Background: the textbook definition of "primitive" is centered on their > nature of being elements-not-molecules, and I see no dispute about it. > Also, there's no disputing the fact that we're allowed to adopt a different > meaning if we so choose. So that's not even the fatal flaw. > > Yes, that's definitely a point against -- these things are "not primitive" > in the atomic sense. OTOH, they are *very much like* today's primitives in > many other ways. So this is a choice between being strictly linguistically > accurate and appealing to existing mental models. Tough choice.
> My way of trying to cut through tough choices like that is to ask: Which traits that we associate with primitives today can we assess as being *essential* to their meaning, and which are the ones that are *incidental*? - Their reference companions have weird names (e.g., Integer); > - They permit a seemingly circular declaration (i.e., the declaration of > "class int" will use "int" in its representation); > - They will be translated differently, because the VM has built-in > carriers for I/J/F/D, whereas extended primitives will use the L and Q > carriers; > - There will probably be some special treatment in reflection for these > eight types; > > Most of these are things about which we can say "OK, fine, these are > historical warts." > I think there's a deeper conceptual need as well. To understand something that can recursively contain things of its own kind, I think many people want to have a sense of "but where does that all stop?" What are the leaves in that tree? The answer is "(builtin-)primitives and references"; the buck stops there. The fact that 100% of all of your data is actually made up of those things alone (grouped into containers like objects) is significant, to me. So that's an eternal way that they're special that isn't a historical wart. An alternative that seems to work fine, in my mental model at least, is: > > - Primitive types are examples of value types, and have always been. > - Java never supported any other kinds of value types before, so we > didn't distinguish the terms before. > - Everything you associate with primitive types remains true. > - But most of those traits really come from their value-type-ness. > > FTR, there is one big difference, which has a few consequences. The big > difference is reference-ness; value and primitive classes give rise to > reference types, whereas primitive classes additionally give rise to a > "primitive" type.
That the "primitive" type gives us reference-ness means > it gives up nullability and non-tearability. > I'm not sure I understood this, but I do want to at least add a bullet to my list: - Now, every value type will come along with a corresponding reference type. (We didn't need that before because we could just hand-code 8 reference types and be done.) As for tearability: from *this* perspective 64-bit values are already technically tearable, so nothing new here. It's from a different perspective, that of writing a class, where expectations have to be weakened. I think what you're saying here, at root, is to give the "value" name to > extended primitives, and find another name to give to B2? > Yes, that's somewhere near the end of my message. I think B2 should stay centered on the concept of identitylessness. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Wed Dec 15 23:06:12 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 15 Dec 2021 18:06:12 -0500 Subject: [External] : Re: We have to talk about "primitive". In-Reply-To: References: Message-ID: <77745ddc-aded-b656-113f-6f687e7a1236@oracle.com> It took us a while to unravel this one, but I think we did. The JMM says that loads and stores of references, and of 32-bit-and-smaller primitive values, are atomic with respect to other loads and stores of the same variable. This means that you'll see a valid value, though it could be a stale one. For non-volatile 64-bit primitives (long and double), each access may be treated as two 32-bit accesses, which are individually atomic. The initialization safety guarantees -- that you see correct values of final fields even when loading a reference with a race -- rest on the atomicity properties above. What this says is that tearing/non-tearing is a property of reference-vs-primitive-ness; accessing a (fat) value through a reference gives you *more guarantees* than accessing it directly. (Correspondingly, this has more costs.)
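The extra guarantee that comes with going through a reference can be illustrated with plain Java today. In the sketch below (names are hypothetical), two longs are wrapped in an object with final fields and published through an ordinary non-volatile field. Under the final-field semantics of JLS 17.5, a reader that obtains the reference sees both halves exactly as the constructor wrote them, even though it may see a stale object. The same pair stored as two plain fields carries no such guarantee (nor does a single non-volatile `long`, per JLS 17.7); that case is only described in a comment, because tearing cannot be reproduced on demand.

```java
public class FinalFieldPublication {
    // Both halves are final, so a thread that obtains a reference to a Pair
    // sees the values its constructor wrote (JLS 17.5), even without locking.
    static final class Pair {
        final long a;
        final long b;

        Pair(long a, long b) {
            this.a = a;
            this.b = b;
        }
    }

    // Deliberately not volatile: a reader may see a stale Pair, never a half-written one.
    // By contrast, two plain long fields updated one after the other could be observed
    // with one half new and one half old; that race is not demonstrated here.
    static Pair shared = new Pair(0, 0);

    static long countMismatchedReads(int iterations) throws InterruptedException {
        long[] mismatches = {0};
        Thread writer = new Thread(() -> {
            for (long i = 1; i <= iterations; i++) {
                shared = new Pair(i, -i); // invariant: a == -b
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                Pair p = shared;          // a single reference read
                if (p.a != -p.b) {
                    mismatches[0]++;
                }
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();                    // join makes the reader's count visible here
        return mismatches[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countMismatchedReads(1_000_000)); // 0
    }
}
```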
All of this is to say, as I think you are saying: primitives of a certain size were always tearable, and they still are; references never were, and they are still not. > As for tearability: from /this/ perspective 64-bit values are already > technically tearable, so nothing new here. It's from a different > perspective, that of writing a class, where expectations have to be > weakened. From kevinb at google.com Wed Dec 15 23:14:21 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 15 Dec 2021 15:14:21 -0800 Subject: We have to talk about "primitive". In-Reply-To: <1FB47341-5AC1-4F5F-AC7B-F1A24F53D4D8@oracle.com> References: <1FB47341-5AC1-4F5F-AC7B-F1A24F53D4D8@oracle.com> Message-ID: On Wed, Dec 15, 2021 at 12:10 PM Dan Smith wrote: Yes, this is a good list. Add to it: > - They are named with a lower-case keyword > - They exclusively get to use special operators (for now) > (Well that parenthetical turns my blood cold....) Leaning away from that though: I'm most worried about ==/!= because they are overloaded across ALL types of all kinds, including types that will be migrating behind the scenes. All the combinations here seem like potential Puzzler-Whack-A-Mole. But the only thing for it is to sit down and look at the whole matrix... My high-level response to "primitive=one of 8 types" is that it may be > giving the good name to, and drawing attention to, something that doesn't > matter much. Sure, we'll need to specify a distinction for the purpose of > the things on the list, but I don't think most programmers should really > care whether the value they're working with belongs to one of the 8 special > types or not. > I'm not sure "primitive" IS the good name. Maybe "value" is the good name? Agreed that most programmers most of the time can interact with all "molecules" in the same consistent way, and that is very good. But I don't think the need for the concept ever fades *too* far into the background. A mental graph needs leaves. If the concept is still needed *sometimes*, then I think it's a problem if the term you always knew that concept by got taken away.
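Kevin's "a mental graph needs leaves" has a direct counterpart in reflective code that walks a type's fields and has to decide where to stop. The sketch below (all names hypothetical) collects the leaf types reachable through a class's instance fields, using `Class.isPrimitive()` as the stopping test; to stay short, it treats arrays and JDK classes as opaque references rather than descending into them.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class LeafTypes {
    // Returns the sorted names of the leaf types reachable through instance fields of root.
    static Set<String> leaves(Class<?> root) {
        Set<String> out = new TreeSet<>();
        walk(root, out, new HashSet<>());
        return out;
    }

    private static void walk(Class<?> type, Set<String> out, Set<Class<?>> seen) {
        if (type.isPrimitive()) {        // built-in primitive: the recursion bottoms out
            out.add(type.getName());
            return;
        }
        if (!seen.add(type)) {           // already visited; also guards against cycles
            return;
        }
        if (type.isArray() || type.getName().startsWith("java.")) {
            out.add(type.getName());     // simplification: treat as an opaque reference
            return;
        }
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                walk(f.getType(), out, seen);
            }
        }
    }

    // Hypothetical example types.
    static final class Point { int x; int y; }
    static final class Segment { Point start; Point end; String label; double weight; }

    public static void main(String[] args) {
        System.out.println(leaves(Segment.class)); // [double, int, java.lang.String]
    }
}
```

A walker written this way would stop at any class for which `isPrimitive()` returned true, instead of descending into its fields; that is one concrete reason the meaning of `Class.isPrimitive()` matters for user-defined primitive classes.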
If the concept is still needed *sometimes*, then I think it's a problem if the term you always knew that concept by got taken away. - Circular declarations: less than 100 people in the world need to care > about this (maybe exaggerating) > Oh, many more people will want to understand how we square that circle than have any absolute technical need to. They'll wake up one night thinking "wait, what the hell is an int made of then?" As they descend into that pit we want them to hit some simple workable explanation they can bounce off of and get back to work. Something like "the contents of a value are either (a) the other values they contain (see their fields), or (b) for primitives, the contents defined by the platform itself (see no fields)". (At least I believe an `int` class would not need or want a `value` field since it can just use `this` for that... right?) > So we're left with, for most Java programmers, a set of special types that > get spelled with keywords and get some special behavior in the reflection > API. My initial sense is that's not enough to put them in their own > different-noun category. > Example: many usages of Class.isPrimitive() are basically recursing an object graph and simply need to know where to stop. Did we hit bottom or not? It's a basic kind of question. > Meanwhile, if we can tell programmers "primitives have members/classes > now, and libraries can define additional primitives", that can build on > existing intuitions pretty well. For example, the primitive type/reference > type duality still exists, and pretty much works the same. Asking them to > do s/primitive type/value type/ in this context is its own Indiana Jones > maneuver. > Obviously I see it as meaningfully different from an Indiana Jones maneuver, and wouldn't have used the term if I didn't. On an island where only pear trees grow they wouldn't have a word for "fruit". A traveler comes, "here, these are apples". Well, they're going to make a word for fruit. 
Most times they used to say "pear", it was really the fruitness that mattered. They start saying "fruit" more than "pear". They still need the word "pear" sometimes. Contrast: traveler says "here, these are pears." "What?" "These are the apple kind of pears, and what you have are heritage pears." For a while no one knows what the hell "bring me a pear" means anymore. Also, for some reason everyone is being chased by a giant spherical boulder. One feels more destabilizing than the other. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From john.r.rose at oracle.com Thu Dec 16 01:49:34 2021 From: john.r.rose at oracle.com (John Rose) Date: Wed, 15 Dec 2021 17:49:34 -0800 Subject: [External] : Re: We have to talk about "primitive". In-Reply-To: <77745ddc-aded-b656-113f-6f687e7a1236@oracle.com> References: <77745ddc-aded-b656-113f-6f687e7a1236@oracle.com> Message-ID: On 15 Dec 2021, at 15:06, Brian Goetz wrote: > It took us a while to unravel this one, but I think we did. > ... What this says is that tearing/non-tearing is a property of > reference-vs-primitive-ness; accessing a (fat) value through a > reference gives you *more guarantees* than accessing it directly. > (Correspondingly, this has more costs.) > > All of this is to say, as I think you are saying: primitives of a > certain size were always tearable, and they still are; references > never were, and they are still not. Of course references don't tear, and more to the point, `final` fields reached by references also don't tear, because they are (a) safely published and (b) never mutated after publication. So, as Brian says, wrapping a reference around some chunks of state has a special benefit (as well as a special cost). The reference wrapper freezes those chunks in place, relative to each other. From john.r.rose at oracle.com Thu Dec 16 03:15:32 2021 From: john.r.rose at oracle.com (John Rose) Date: Wed, 15 Dec 2021 19:15:32 -0800 Subject: We have to talk about "primitive".
In-Reply-To: References: Message-ID: <842DC86C-7AA5-4D9F-BEA0-89BD195EA7A2@oracle.com> On 15 Dec 2021, at 10:42, Kevin Bourrillion wrote: > ... > The main problem I think we can't escape is that we'll still need some > word > that means only the eight predefined types. (For the sake of argument > let's > assume we can pick one and lean hard on it, whether that's > "predefined", > "built-in", "elemental", "leaf type", or whatever.) As others have said, we'll pick a term for this. The idea of calling out a "leaf" in a data graph is compelling to me. As you say, people are going to wonder what is the foundation of the whole scheme. (No, it's not objects all the way down, at least that's not what we are aiming for.) (But, spoiler alert: the division between leaf/scalar/basic type and composite/class type is *less important in daily practice* than the ad hoc mental models programmers make about which types they choose to view as composite and which are indivisible. Typical example: Most programmers choose to regard `String` as a sort of nullable primitive. I'll pick up that thread later.) I like the term "basic type", and (as we already discussed) I like "scalar" also, because "scalar" correctly suggests something about how it's processed in hardware. Here's a point I think is also important and has not been discussed much yet: A concept like "basic type" (or "scalar type") should include references as well as Java's eight current primitive types. Like an `int` or other basic primitive, a reference is copied by value, processed efficiently (probably in a hardware register), and is a "leaf item" with respect to a single object layout or method type signature. Also, like `int`, a reference has its own special operators in the language and special bytecodes in the JVM. Like `int`, it has a default value `null` (instead of `0`).
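John's comparison between references and `int` can be checked directly in today's Java. This sketch (the class name `ReferenceAsBasicType` is hypothetical) shows the shared traits, a default value and copy-by-value assignment, and the one key difference: a reference has a far end, so two copies of the same reference observe the same mutations.

```java
final class ReferenceAsBasicType {
    int count;      // an int field defaults to 0
    Object target;  // a reference field defaults to null

    /** Both kinds of basic type start from a default value. */
    static boolean defaultsMatch() {
        ReferenceAsBasicType fresh = new ReferenceAsBasicType();
        return fresh.count == 0 && fresh.target == null;
    }

    /** Copying an int copies the number; the two variables are then independent. */
    static boolean intCopyIsIndependent() {
        int original = 41;
        int copy = original;
        copy++;
        return original == 41 && copy == 42;
    }

    /** Copying a reference copies only the reference; both copies reach the same far end. */
    static boolean referenceCopySharesFarEnd() {
        StringBuilder original = new StringBuilder("far end");
        StringBuilder copy = original;  // copy-by-value of the reference itself
        copy.append('!');               // mutate the object through the copy...
        return copy == original && original.toString().equals("far end!"); // ...seen via the original
    }
}
```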
The main difference between a reference and an `int` is the fact that it has a far end: You can often (not always) find other values by indirecting the reference and loading a field or calling a method or querying a super type. (Because it has a far end, it also has a nominal subtype to classify what might be at the far end. But I'm speaking here about references per se, apart from their subtypes.) Despite their "far end", people treat some reference types, like `String`, as if they were leaves; you stop at the `String` and don't bother thinking about its fields. Users don't care that there's an array somewhere on the other end, unless they are engineering the string class itself. So a reference has a far end, unlike an `int`, but, like an `int`, a reference *often* is treated like an unstructured value, in code. Bottom line: There are a handful of built-in basic types. These are used to compose classes. They are the primitives and the references. When we consider a reference apart from its class (say, as `jl.Object`), it can be comfortably called a *basic type*, and then that handful of built-in basic types consists of the (basic) primitives and references. OK, that's enough on that. Whether "reference" is a basic type is less important than how we choose to extend (or not extend) the reach of the term "primitive". For historic reasons we use the word ~~fruit~~ *primitive* to mean a basic type other than a reference. Now that we have user-defined `int`-like things, we have to decide whether and how to connect the old word to the new things. Since user-defined `int`-like things are (we think) very like `int` in many ways, a term like "extended primitive" makes sense. This is how I get to the terms "basic primitive" and "extended primitive". Or "scalar primitive" and "extended primitive". As I read your messages, you would prefer to keep the term "primitive"
narrow, because of the possible confusion of telling users "hey, what you think of as primitives are now the ~~heirloom~~ basic primitives." Personally, I think users will say, to our unveiling "extended primitives", something like this: >> Well, that's not exactly what the dictionary says primitive means, >> if you can make new composite ones. But I do know that Java has >> non-reference types and calls them "primitive". And I also know >> it would be really cool to define new types that work like `int`, >> such as `UnsignedInt` or `HalfFloat` or the like. I get why they >> don't want to build all such types into the language; in fact maybe >> I'd like to try my hand someday at defining my own. So, >> "extended primitive". It's on: The Java primitives are now an >> open-ended set just like the Java objects. In other words, in saying "extended primitive" (and also "basic primitive") we lean away from the dictionary definition of "primitive" and into the Java definition. That feels like a non-confusing choice to me. > > Definitely, our trying to minimize their specialness is virtuous. Yep. We also call this "healing the rift", sometimes. > ... > So we have to attempt to shift users' understanding of "primitive" > while at > the same time injecting a new term to mean exactly what primitive used > to > mean. That's the old Indiana Jones switch and I don't have to tell you > how > that turned out for him. So, no, it's not the Indy switch, at all. Users know what ~~fruit~~ primitives are in Java, and they will have no problem with adding new ~~imported exotic apples~~ extended primitives to the familiar set of primitive types. And in exchange for this infusion of wonderful new types, they will learn a new term for the old types, which is ~~pears~~ basic primitives (or scalar primitives). > > It would be difficult to pull off in a world where we were just > pushing > some new server and the whole world gets the new model at once. But in
But in > this > universe where every version of Java ever made all have to coexist, > it's > looking to me like a guaranteed source of never-ending confusion. > > I also think it robs us of our ability to smoothly portray the real > changes > of Valhalla. We want to be able to say "elements are still elements! > now we > have molecules too". There are two kinds of users w.r.t. the question of ?what?s a primitive? and you can?t please both. You and I want to please different kinds. The user I want to please is one who thinks of ?Java primitive? as a kind of non-nullable scalar number (or boolean or char). The user you want to please thinks of ?Java primitive? as ?all leaves in the Big Graph?. The latter user will be disappointed if we say ?Java primitives? can be non-leaves. The former user will be delighted. The latter user sees a `String` and wants to crack out its underlying array, in a Gollum-like quest for the roots of the mountains. The latter user treats a `String` as a primitive. There are more of the former than the latter; we should cater to them. It?s the former who I was channeling above, concluding with ?The Java primitives are now an open-ended set just like the Java objects.? > Pedagogically that is always preferable to "elements > aren't really what you thought they were". Okay, the real comparison > is a > little more nuanced than that, but I'll get to that now. > > An alternative that seems to work fine, in my mental model at least, > is: > > - Primitive types are examples of value types, and have always > been. > - Java never supported any other kinds of value types before, so we > didn't distinguish the terms before. > - Everything you associate with primitive types remains true. > - But most of those traits really come from their value-type-ness. > > (I plan to make the above shifts to my model document already.) The term ?value? can be applied to composites in B3 alone, to composites in B2 alone, or to both. (Or neither.) 
All the basic types, including references, are values as well. This is a big choice, where to "spend" the term "value". Our choice will be informed and supported by our account of what *we mean* by the term "value". If the word value means "a primitive thing that can be stored in a register", then we can't extend it. So that won't fly. For us the word value means something like that but adjusted, "a thing that is freely copyable and can be stored in one or more registers". But look how that affects B2 and B3: B3 are values, obviously; there is no reference to confuse their free copying. (There is also no reference to help us adjoin `null` to the value set, and no reference to help us perform safe publication.) B2 are references to... well, values as well. They might be on the heap, or they might be elsewhere; we don't care because the freely copyable values are not also accompanied by object identity. Both B1 and B2 *references* (per se) are, confusingly, also values, since basic types (and/or references) are freely copyable. But a B2 reference is a value, which refers to another value. (Proof they are distinct values: One is possibly null, the other isn't.) And like a user using `String`, the value-ness of a B2 reference can be treated as a single, simple, atomic thing, without further reference to substructure. In particular, because it's not B1, there's no possibility of state under the B2 reference; there's just the value you care about. I think, because the term value applies in so many places (including B1 references), it will be tricky to use it as a classification (like "pear") instead of an assertion of use (like "fruit"). But given the choice between using the term "value" to classify types, distinguishing them from B1 types, I think the correct choice is to apply the term to B2, as "value object" vs. "identity object". The value-ness of B3 (as loose aggregates) and B1 (as references) is going to add a bit of confusion.
Dan did a round of naming where he used the term "pure object" as the opposite of "identity object"; now we are at "value object" vs. "identity object", I think. > > - Now we have user-defined value types too. > - The way we user-define a type is with a class, so a value type is > defined by a "value class" (sorry B2). > - The primitive types will now each get a value class. > - These 8 classes will look as much like user-defined types as > Object > does. > - They, like Object, will have a "cheat" in their source code that > no > one else gets to use. (Object's is that there is no implied > `extends > Object` or `super();`; these need no fields because the data they > store is > magically handled by the VM. These feel like similar cheats.) I don't disagree with any of the above, but I think the value classes live in B2, not in B3. The B3 types are derived from the B2 types, by "dumping out" the class fields. Note that every single B3 type (non-reference) has a unique companion B2 type (reference). The semantic difference between those types is like the semantic difference between `int` and `Integer`. Narrow but useful. Separate question: Does the declaring form for a B3/B2 type pair "look like" a B2-only declaration, but with an added mode switch? Or does it "look like" a B3-declaration, something that's not a full-on class-that-defines-objects? We could go either way on that. Either way, one declaration will define two related types. Suppose we have this B2-only class declaration syntax: ``` __ByValue class NamedInt { String name; int value; ... } ``` Then a B2-tilted syntax for a B3/B2 pair might look like: ``` __ByValue __AlsoPrimitive class Point { double x, y; ... } ``` And a B3-tilted syntax for the same pair might look like: ``` __ExtendedPrimitive Point { double x, y; ... } ``` (F.D.: I think the B3-tilted syntax is less likely to succeed.) Either way, you can draw out a B3 type from the first and a B2 type from the second.
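The `__ByValue`, `__AlsoPrimitive`, and `__ExtendedPrimitive` spellings above are placeholders, not real syntax. As a rough point of comparison only, the nearest declaration in current Java (16 or later) is a record: its fields are final and its `equals`/`hashCode` are state-based, but it is still an identity class, so `==` compares identity. The `NamedIntSketch`, `PointSketch`, and `RecordComparison` names are illustrative.

```java
// Ordinary identity records today; hypothetical stand-ins for the
// __ByValue NamedInt and Point declarations discussed above.
record NamedIntSketch(String name, int value) {}

record PointSketch(double x, double y) {}

final class RecordComparison {
    public static void main(String[] args) {
        PointSketch p = new PointSketch(1.0, 2.0);
        PointSketch q = new PointSketch(1.0, 2.0);
        System.out.println(p.equals(q)); // true: equality compares field values
        System.out.println(p == q);      // false: two `new` expressions, two identities
    }
}
```

Under the value-class proposal discussed in this thread, the `p == q` line is what would change: without identity, two instances with the same field values would be the same per `==`.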
As a sort of mental experiment, you can also imagine a "two-headed" declaration syntax that would provide independent specification of the names of both types: ``` __PrimitiveType int & /*int is B3*/ __PrimitiveBox class Integer /*int.ref=Integer is B2*/ extends Comparable { ... one body with two heads ... } ``` Why do that? Well, it makes it clear that a one-headed declaration could in principle start with either the B3 or the B2 end of the stick. Also it helps us think, a little, about retrofitting the very odd legacy wrapper names. > > Then mopping up the rest: > > - Existing classes probably need a term like "reference classes" > (in the > model I'm going to circulate that doubles down on > values-are-not-objects, > then this wants to be "object classes", even though that feels > weird at > first). > - I think the term for bucket 2 classes really ought to center on > identitylessness, e.g. "noid", "noident", "idfree", or something. > Anything > else is getting away from the essential meaning of the bucket; > plus, we > want people to call bucket 1 classes "identity classes", don't we? If we spend the good word "value" on B3, we must then find a word like "noid" for B2. But since I think "value-ness" is centered in B2 from the start, I'd rather find a one-off term for B3! (And that's "primitive" as argued above.) But let's grant, for a moment, that we don't want "value" for B2. What term characterizes B2 types? As you say, they are objects but they don't have identity, so "noid", etc. That's a true description. But it's not the main point of B2 types. The point of B2 types is not that we dislike object identity (we like it a lot in many cases!). The point of B2 types is they can be regarded as tidy bundles of field values, and/or tidy abstractions (like `String`) of simple values, without confounding state changes.
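The "tidy bundles of field values, without confounding state changes" idea can be illustrated with today's types. In this hypothetical sketch (Java 16 or later; all names illustrative), a mutable identity class is shared through two aliases, so a change made through one is visible through the other; a record-based counter instead yields a new bundle of values on each "update", leaving existing copies untouched.

```java
// An identity class with mutable state: every alias observes every change.
final class MutableCounter {
    int count;
}

// A value-like record: an "update" produces a new instance; existing copies never change.
record CounterValue(int count) {
    CounterValue increment() {
        return new CounterValue(count + 1);
    }
}

final class AliasingSketch {
    /** Returns what the original alias sees after an increment made through a second alias. */
    static int aliasedMutableAfterIncrement() {
        MutableCounter a = new MutableCounter();
        MutableCounter b = a;  // b aliases the same object
        b.count++;
        return a.count;        // 1: the change made through b shows up through a
    }

    /** Returns true if "incrementing" a record leaves the original bundle untouched. */
    static boolean recordCopyUnaffected() {
        CounterValue a = new CounterValue(0);
        CounterValue b = a.increment();  // a new bundle of field values
        return a.count() == 0 && b.count() == 1;
    }
}
```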
After looking at this from many angles, I prefer to say that, while B2 has the *negative* characteristic of being identity-free, it has the *positive* characteristic of being *freely copyable*. The "freely" is so free that copying often happens outside of the JVM heap. In fact, a B2 type is a value. Maybe there's a different way of characterizing the *positive* nature of B2, but I think it comes down to, "B2 types are plain values". Until I get an even better account for B2's special power (one that doesn't begin with the word "not" or "no" or "doesn't"), I'm going to be very happy to declare B2 types as "value classes" and work with their instances as "value objects". So, while I see why you want to avoid the paradox of "extended primitives", and your very correct identification of "values" in B3, I prefer to talk about B3 as primitives (primitive values) and B2 as value objects. BTW, I agree that B3 values should not be objects; maybe we can call them instances, although instance/class/object are terms that usually appear together. Obviously both B1 and B2 contain instances/classes/objects. BTW again, I updated my own Zoo of Field Types diagram here, and you might wish to give it a look, since it's relevant to this discussion: http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf (that's cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf if the URL police got the previous line) > Footnote: for a more concrete manifestation of this problem: I am sure > we > cannot possibly get away with Class.isPrimitive() being true for these > classes. Right? Yeah, `Class::isPrimitive` is a query on types, not classes. In other words, the `Class` mirror, for this call, is serving to reflect a type, for example one of `int.class` or `Integer.class`. If we apply the term "primitive" to classes, then we will need a not-so-good name, like `Class::isPrimitiveClass`.
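For reference, in today's Java `Class::isPrimitive` answers a question about the type a mirror reflects, and it returns true for exactly nine mirrors: the eight primitive types plus `void.class`. This sketch (the class name `PrimitiveMirrors` is hypothetical; assumes Java 9 or later for `List.of`) checks that, and shows that the wrapper class `Integer` is a separate mirror whose `TYPE` field refers to `int.class`.

```java
import java.util.List;

final class PrimitiveMirrors {
    // The nine Class mirrors for which Class::isPrimitive currently returns true.
    static final List<Class<?>> NINE = List.of(
            boolean.class, byte.class, char.class, short.class,
            int.class, long.class, float.class, double.class, void.class);

    static boolean allReportPrimitive() {
        return NINE.stream().allMatch(Class::isPrimitive);
    }

    public static void main(String[] args) {
        System.out.println(int.class.isPrimitive());     // true: reflects the type int
        System.out.println(Integer.class.isPrimitive()); // false: reflects the class Integer
        System.out.println(Integer.TYPE == int.class);   // true: the wrapper exposes the primitive mirror
    }
}
```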
However, if we choose to make extended primitives reflect very similarly to basic primitives, then we can choose to have `Class::isPrimitive` return true *for their non-reference types*. There is no reference type for which `Class::isPrimitive` is true. Despite my fondness for the concept of "basic types", there is no `Class::isBasicType`. There could be, in the future, though I don't think it pulls its weight. We could also have `Class::isBasicPrimitive`. Or we could choose to break less code by keeping `Class::isPrimitive` true only for nine mirrors, and define `Class::isReferenceType` and/or `Class::isNonReferenceType` to provide the query for ~~fruit~~ basic or extended primitive types. From brian.goetz at oracle.com Thu Dec 16 17:50:44 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 16 Dec 2021 12:50:44 -0500 Subject: [External] : Re: Enhancing java.lang.constant for Valhalla In-Reply-To: References: Message-ID: This reminds me of an earlier version of the jl.constant API, where we tried to track the varargs bit. In the end, we dropped this, because it washed off too easily in the API. We could have a preload() bit that travels with the ClassDesc, which would then have to be propagated into a bit mask in MethodTypeDesc, which would have to carry the bits around (and expose them) through combinators like dropArguments(). Seems possible, but also seems like there's gonna be some whack-a-mole handling the wash-off cases. Stepping back, the ClassDesc type was originally intended to model the C_Class constant pool entry. And there's no L* flavor of C_Class. But, it is also reasonable to use ClassDesc as a way of describing a field descriptor in a bytecode API (and similar for MethodTypeDesc.) Presumably the preload attribute is one attribute for the whole class. That means that a classfile reader would have to parse that attribute, and when dispensing ClassDesc to clients, would have to look in the table to see whether the class is there?
Also, how would this affect ClassDesc::equals? Would LFoo and L*Foo be equal? On 12/16/2021 12:26 PM, Dan Heidinga wrote: > The updated api looks pretty good for handling both L and Q descriptors. > > There's one case that isn't handled here though - L* descriptors. > With the bucket 2 & 3 design, we really have 3 kinds of descriptors: > L, Q, and L*. Over the years, we've spent a lot of time as an EG > talking about stars on descriptors and working through the issues > related to "stars washing off". I think this API is one of the cases > where stars may wash off and therefore needs some way to indicate that > a given L descriptor is actually an L* descriptor. > > Why are stars important? If we don't have stars - which are the L > form of Q's "go and look" preload contract - we lose out on calling > convention and layout optimizations for the bucket 2 classes with > their L descriptors. Without the "*" on the descriptor, we may miss > the preload signal for a given ClassDesc and lose our chance to > optimize. > > Tracking this extra boolean state in the ClassDesc seems like a > reasonable thing to do to let the stars flow through the system and > ensure they are available at classfile generation time when we want to > write the preload attribute. > >> So, summarizing the new methods (modulo naming changes to reflect changes in Valhalla language syntax): >> >> ClassDesc ofInternal(String internalBinaryName) >> boolean isValue() >> ClassDesc valueType() >> ClassDesc refType() > Having 3 forms of descriptors unfortunately causes some issues with > the "isValue()" and "valueType()" apis. Do both bucket 2 and 3 return > "true" for isValue()? There are also two possible results for > valueType() when called on an L - add a star or convert to a Q > descriptor. > > Figuring out the right methods to add - and naming them - overlaps > somewhat with the other thread about the meaning of "primitive".
So > borrowing John's terminology, I think we can get by with a single new > potential API after redefining the contract for some of the other > proposed apis: > > ClassDesc ofInternal(String internalBinaryName) > boolean isValue() // true for both L* & Q > ClassDesc valueType() // creates L* descriptor > ClassDesc extendedPrimitive() // creates Q descriptor > ClassDesc refType() > > Alternatively, a single 'ClassDesc valueType(boolean isQ)' could be > added but I think the multiple method approach is better as aligning > the names makes the intention clearer. > > --Dan > From kevinb at google.com Thu Dec 16 18:31:15 2021 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 16 Dec 2021 10:31:15 -0800 Subject: We have to talk about "primitive". In-Reply-To: <842DC86C-7AA5-4D9F-BEA0-89BD195EA7A2@oracle.com> References: <842DC86C-7AA5-4D9F-BEA0-89BD195EA7A2@oracle.com> Message-ID: Really appreciate the attention and insight here. I must respond on the installment plan. On Wed, Dec 15, 2021 at 7:15 PM John Rose wrote: > On 15 Dec 2021, at 10:42, Kevin Bourrillion wrote: > > ... > The main problem I think we can't escape is that we'll still need some > word > that means only the eight predefined types. (For the sake of argument > let's > assume we can pick one and lean hard on it, whether that's "predefined", > "built-in", "elemental", "leaf type", or whatever.) > > As others have said, we'll pick a term for this. The idea of calling out a > "leaf" in a data graph is compelling to me. As you say, people are going to > wonder what is the foundation of the whole scheme. (No, it's not objects all > the way down, at least that's not what we are aiming for.) > > (But, spoiler alert: the division between leaf/scalar/basic type and > composite/class type is *less important in daily practice* than the ad > hoc mental models programmers make about which types they choose to view as > composite and which are indivisible.
Typical example: Most programmers > choose to regard String as a sort of nullable primitive. I'll pick up > that thread later.) > Yes, I agree. (Because I hate to drop a metaphor) Physicists want to know that the proton is divisible, but they can do a hell of a lot without paying attention to that fact. > I like the term "basic type", and (as we already discussed) I like > "scalar" also, because "scalar" correctly suggests something about how it's > processed in hardware. > Note I'm stipulating that we'll find the most perfect term there is (next to "primitive"), and all my arguments remain. > Here's a point I think is also important and has not been discussed much > yet: A concept like "basic type" (or "scalar type") should include > references as well as Java's eight current primitive types. > I think I've said this somewhere in these threads as well, "I'm quite comfortable with the idea that references are the ninth primitive type", but I backpedaled from that by the time I'd finished the conceptual model. To suggest that "reference" is a type implies that each reference has *two* static types. Pros/cons: + Each of those static types always functions in the exact same way. Non-reference values just don't have the second one (or it equals the first). - It makes *three* types involved overall (1. it's a reference / 2. it has this constraint on the referent's dynamic type / 3. the referent has this dynamic type). - It means that what users see in their code might be the first *or* the second of those, which seems like losing ground on the opaqueness of references. Valhalla acts to strengthen the implementation-detailness of references, e.g. we want users to think, "the dot means member access, and Java dereferences first if necessary". (I see the main distinction between the "values-are-objects" and "values-ain't-objects" candidate models as being how *far* it goes down that line.) So my alternative is: just let *valueness* be what unifies them.
They are only special in that (a) the static type functions totally differently and (b) their opaqueness and everything that is done to provide that. Right now I feel like this gets the job done. As I read your messages, you would prefer to keep the term "primitive" > narrow, because of the possible confusion of telling users "hey, what you > think of as primitives are now the ~~heirloom~~ basic primitives." > Personally, I think users will say, to our unveiling "extended primitives", > something like this: > > Well, that's not exactly what the dictionary says primitive means, if you > can make new composite ones. But I do know that Java has non-reference > types and calls them "primitive". And I also know it would be really cool > to define new types that work like `int`, such as `UnsignedInt` or > `HalfFloat` or the like. I get why they don't want to build all such types > into the language; in fact maybe I'd like to try my hand someday at > defining my own. So, "extended primitive". It's on: The Java primitives are > now an open-ended set just like the Java objects. > > I have quibbles here and there but I definitely agree that everyone can find a map through this. But: > In other words, in saying "extended primitive" (and also "basic > primitive") we lean away from the dictionary definition of "primitive" and > into the Java definition. That feels like a non-confusing choice to me. > This might be okay except for my central point: that we simultaneously need a new term meaning exactly the dictionary definition. So we have to attempt to shift users' understanding of "primitive" while at > the same time injecting a new term to mean exactly what primitive used to > mean. That's the old Indiana Jones switch and I don't have to tell you how > that turned out for him. > > So, no, it's not the Indy switch, at all.
Users know what ~~fruit~~ > primitives are in Java, and they will have no problem with adding new ~~imported > exotic apples~~ extended primitives to the familiar set of primitive types. > And in exchange for this infusion of wonderful new types, they will learn a > new term for the old types, which is ~~pears~~ basic primitives (or scalar > primitives). > It would be a *worse* "Indiana Jones switch" if these were sibling concepts. But even if he was swapping the idol for a less detailed idol he'd better start runnin'. > It would be difficult to pull off in a world where we were just pushing > some new server and the whole world gets the new model at once. But in > this > universe where every version of Java ever made all have to coexist, it's > looking to me like a guaranteed source of never-ending confusion. > > I also think it robs us of our ability to smoothly portray the real > changes > of Valhalla. We want to be able to say "elements are still elements! now > we > have molecules too". > > There are two kinds of users w.r.t. the question of "what's a primitive" > and you can't please both. You and I want to please different kinds. The > user I want to please is one who thinks of "Java primitive" as a kind of > non-nullable scalar number (or boolean or char). The user you want to > please thinks of "Java primitive" as "all leaves in the Big Graph". The > latter user will be disappointed if we say "Java primitives" can be > non-leaves. The former user will be delighted. The latter user sees a > String and wants to crack out its underlying array, in a Gollum-like > quest for the roots of the mountains. The latter user treats a String as > a primitive. There are more of the former than the latter; we should cater > to them. It's the former who I was channeling above, concluding with "The > Java primitives are now an open-ended set just like the Java objects." > Here's where I suggest that we categorize our existing associations with "primitive" into essential vs. incidental.
And generally, a claim of essentialness for a meaning that's at odds with the generally accepted meaning should be subject to a form of Sagan's razor. But I suppose the question is what that generally accepted meaning is. For one thing, if any other language has user-defined compound types that it calls "primitives" already, that would be very useful to know. > The term "value" can be applied to composites in B3 alone, to composites > in B2 alone, or to both. (Or neither.) All the basic types, including > references, are values as well. > > This is a big choice, where to "spend" the term "value". > Just a reminder (esp. for others observing) that my conceptual model document shows at least one thorough viewpoint that "value" has a strong existing meaning, and one that Valhalla doesn't even have to shift at all. > B2 are references to... well, values as well. Just lacking identity already allows for substitutable copies; that isn't necessarily valueness, and if users don't have to think B2 instances are a whole new kind of thing they've never seen before, that is very (sorry) valuable. If I get to hang onto the meaning of "value" in my document, then I can use it to explain B2: "a B2 instance is an object, whose identity is either nonexistent or completely unobservable (no difference). That makes the VM free to substitute equal copies, or (whenever indistinguishable) to *represent* it as a compound value instead, or even to box that compound value up again as needed. In any case it is still, meaningfully, an object." -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From brian.goetz at oracle.com Thu Dec 16 20:57:41 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 16 Dec 2021 15:57:41 -0500 Subject: [External] : Re: Enhancing java.lang.constant for Valhalla In-Reply-To: References: Message-ID: <189870bb-c730-0c69-2119-3b71a00cd34a@oracle.com> > If the preload() bit is tied to the ClassDesc, do we need to worry > about a bit mask in MethodTypeDesc? Isn't the MethodTypeDesc composed > of ClassDesc returnType and ClassDesc[] of parameters? I feel like > I'm missing some complexity here... That's how the implementation happens to work, but conceptually, a MethodType has N+1 descriptors and N pre-load bits. (It's convenient that the implementation works in terms of ClassDesc.) > >> Also, how would this affect ClassDesc::equals? Would LFoo and L*Foo be equal? > I think they'd have to be equal as they represent the same descriptor. > It's only the presence of side channel info - the star - that makes > them different and the difference is only in the "go and look" > behaviour. The VM will treat them as identical when dealing with > descriptor strings. > Which makes me ask ... if it really is a side channel, does it really go *in* the ClassDesc? From daniel.smith at oracle.com Fri Dec 17 00:08:02 2021 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 17 Dec 2021 00:08:02 +0000 Subject: JEP update: Primitive Classes Message-ID: First, I've made some minor revisions to the Value Objects JEP in the last couple of weeks. You can see it here: https://openjdk.java.net/jeps/8277163 Second, I've put together a draft of a revised JEP 401, Primitive Classes. This removes content that became part of the Value Objects feature, and refines how we talk about the relationship between primitive types and reference types. Working outside of JBS for now, because I don't want to disrupt the already-Candidate JEP 401 artifact until we're at least ready to Submit the Value Objects piece.
A key idea is that primitive values and value objects are distinct entities, with different types, but they're both instances of the same class (thanks for the good ideas here, Kevin!).

(I'll acknowledge the ongoing discussion about whether "primitive" is the right term to use here. But for now, sticking with the status quo.)

Happy to hear your thoughts!

---

Summary
-------

Support new, developer-declared primitive types in Java. This is a [preview language and VM feature](http://openjdk.java.net/jeps/12).

Goals
-----

This JEP introduces primitive classes, special kinds of [value classes][jep-values] that define new primitive types.

The Java programming language will be enhanced to recognize primitive class declarations and support new primitive types in its type system.

The Java Virtual Machine will be enhanced with a new `Q` carrier type to encode declared primitive types.

Non-Goals
---------

This JEP is concerned with the core treatment of developer-declared primitives. Additional features to improve integration with the Java programming language are not covered here, but are expected to be developed in parallel. Specifically:

- [JEP 402][jep402] will enhance the basic primitives (`int`, `boolean`, etc.) by giving them primitive class declarations.

- [A separate JEP][jep-generics] will update Java's generics so that primitive types can be used as type arguments.

Other follow-up efforts may enhance existing APIs to take advantage of primitive classes, or introduce new language features and APIs built on top of primitive classes.

Motivation
----------

Java developers work with two kinds of values: primitives and objects.

Primitives offer better performance, because they are typically *inlined*: stored directly (without headers or pointers) in variables, on the computation stack, and, ultimately, in CPU registers.
Hence, memory reads do not have additional indirections, primitive arrays are stored densely and contiguously in memory, primitive-typed fields can be similarly compact, primitive values do not require garbage collection, and primitive operations are performed within the CPU.

Objects offer better abstractions, including fields, methods, constructors, access control, and nominal subtyping. But objects traditionally perform poorly in comparison to primitives, because they are primarily stored in heap-allocated memory and accessed by reference.

*Value objects*, introduced by [another JEP][jep-values], significantly improve object performance in many contexts, providing a good fusion of the better abstractions of objects with the better performance of primitives.

However, certain invariant properties of objects limit how much they can be optimized, particularly when stored in fields and arrays. Specifically:

- A variable of a reference type may be `null`, so the inlined layout of a value object typically requires some additional bits to encode `null`. For example, a variable storing an `int` can fit in 32 bits, but for a value class with a single `int` field, a variable of that class type could use up to 64 bits.

- A variable of a reference type must be modified atomically. This often makes it impractical to inline a value object, because its layout would be too large for efficient atomic modification. Large primitive types (currently, `double` and `long`) make no such atomicity guarantees, so variables of these types can be modified efficiently without indirect representations (concurrency is instead managed at a higher level).

Primitive classes give developers the capability to define new primitive types that aren't subject to these limitations. Programs can make use of class features without giving up any of the performance benefits of primitives.
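To make the 32-versus-64-bit arithmetic in the first limitation concrete, here is a minimal sketch in today's Java of one conceivable flattened encoding of a nullable reference to a value class that wraps a single `int`. The class `NullableIntSlot` and its bit layout (field in the low 32 bits, bit 32 as a "non-null" flag) are assumptions made for illustration; they are not how HotSpot or any other JVM is specified to encode value objects.

```java
/**
 * Hypothetical flattened slot for a nullable value object wrapping one int.
 * Assumed layout: low 32 bits hold the field, bit 32 is a "non-null" flag.
 * Thirty-three bits do not fit in an int, so the slot is a long.
 */
final class NullableIntSlot {
    /** Assumed flag bit marking a non-null value. */
    private static final long PRESENT = 1L << 32;

    private NullableIntSlot() {}

    /** Encodes null as all zeros; any int as the flag plus its 32 bits. */
    static long encode(Integer value) {
        if (value == null) {
            return 0L;
        }
        // Mask off sign extension so only the low 32 bits carry the field.
        return PRESENT | (value & 0xFFFF_FFFFL);
    }

    static boolean isNull(long slot) {
        return (slot & PRESENT) == 0;
    }

    /** Decodes a slot, throwing NullPointerException for the null encoding. */
    static int decode(long slot) {
        if (isNull(slot)) {
            throw new NullPointerException("slot encodes null");
        }
        return (int) slot; // narrowing keeps the low 32 bits
    }
}
```

Because `null` claims one bit pattern of its own, every one of the 2^32 `int` values needs the flag bit as well, so 33 bits are required; with the usual 8-, 16-, 32-, and 64-bit loads and stores, that in practice tends to round up to a 64-bit slot.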
Applications of developer-declared primitives include:

- Numbers of varieties not supported by the basic primitives, such as unsigned bytes, 128-bit integers, and half-precision floats;
- Points, complex numbers, colors, vectors, and other multi-dimensional numerics;
- Numbers with units: sizes, rates of change, currency, etc.;
- Bitmasks and other compressed encodings of data;
- Map entries and other data structure internals;
- Data-carrying tuples and multiple returns;
- Aggregations of other primitive types, potentially multiple layers deep.

Description
-----------

The features described below are preview features, enabled with the `--enable-preview` compile-time and runtime flags.

### Primitive classes

A *primitive class* is a special kind of value class that introduces a new primitive type.

As value classes, primitive classes have no identity. This allows their instances to be freely converted between value objects and simpler *primitive values*. A primitive value can be thought of as a bare sequence of field values, without any headers or extra pointers.

A primitive class is declared with the `primitive` contextual keyword.

```
primitive class Point implements Shape {
    private double x;
    private double y;

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public double x() { return x; }
    public double y() { return y; }

    public Point translate(double dx, double dy) {
        return new Point(x+dx, y+dy);
    }

    public boolean contains(Point p) {
        return equals(p);
    }
}

interface Shape {
    boolean contains(Point p);
}
```

(Alternatively, we might prefer the class to be declared as `primitive Point`.)

Primitive class declarations are subject to the [same restrictions][jep-values] as other value class declarations. For example, the instance fields of a primitive class are implicitly `final`, so cannot be assigned outside of a constructor or initializer.
In addition, no instance field of a primitive class declaration may have a primitive type that depends, directly or indirectly, on the declaring class. In other words, with the exception of reference-typed fields, the class must allow for flat, fixed-size layouts without cycles.

In most other ways, a primitive class declaration is just like any other class declaration. It can have superinterfaces, type parameters, enclosing instances, inner classes, overloaded constructors, `static` members, and the full range of access restrictions on its members.

### Primitive types

The name of a primitive class denotes that class's primitive type. Primitive types store instances of the named class as primitive values.

Instances can be created with normal class instance creation expressions.

```
Point p1 = new Point(1.0, -0.5);
```

Field access and method invocation are supported by primitive types. The members of a primitive type are the same as the members of the class.

```
assert p1.x() == 1.0;
Point p2 = p1.translate(0.0, 1.0);
System.out.println(p2.toString());
```

Primitive types support the `==` and `!=` operators when comparing two values of the same type. As is the case for value objects, the `==` comparison recursively compares the values' fields.

```
Point p3 = new Point(1.8, 3.6);
Point p4 = p3.translate(0.0, 0.0);
assert p3 == p4;
```

Like a value class reference type, an expression of a primitive type cannot be used as the operand of a `synchronized` statement.

*Unlike* other value classes, a `this` expression in the body of a primitive class has a primitive type.

### Default values and `null`

Like the basic primitive types (`int`, `boolean`, etc.), declared primitive types do not allow `null`.

Whenever a field or array component is created, the longstanding behavior is to set its initial value to the *default value* of its type. For reference types, this value is `null`, and for the basic primitive types, this value is 0 or `false`.
For a declared primitive type, the default value is the *initial instance* of the class: an instance whose fields are all set to their own default values.

```
Object[] os = new Object[5];
assert os[0] == null;

Point[] ps = new Point[5];
assert ps[0].x() == 0.0 && ps[0].y() == 0.0;
```

As shorthand, the default value of a primitive type can be expressed with the class name followed by the `default` keyword.

```
assert Point.default.x() == 0.0 && Point.default.y() == 0.0;
```

Note that the initial instance of a primitive class is created without invoking any constructors or instance initializers, and is available to anyone with access to the class (or its reflective `Class` object). Primitive classes are not able to specify an initial instance that sets fields to something other than their default values.

Methods of primitive classes should be designed to work on the initial instance. If this isn't feasible (for example, a reference-typed field is expected to be non-null), it may not be appropriate for the class to have a primitive type. Instead, it can be declared as a normal value class.

### Multi-threaded reads and writes

As with the basic primitive types `double` and `long`, when a field or array component has a declared primitive type, reads and writes might not be atomic. As a result, in a multi-threaded program, unexpected instances may be encountered.

```
Point[] ps = new Point[]{ new Point(0.0, 1.0) };
new Thread(() -> ps[0] = new Point(1.0, 0.0)).start(); // runs concurrently with the read below
Point p = ps[0]; // may be (1.0, 1.0), among other possibilities
```

Like initial instances, primitive class instances produced by non-atomic reads and writes are created without invoking any constructors or instance initializers. There is no opportunity for the class to ensure that the field values of the new object are compatible with each other (for example, a `start` index may end up being greater than an `end` index).
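The interleaving behind the `(1.0, 1.0)` comment can be replayed deterministically. The sketch below is written in today's Java and assumes, purely for illustration, that a flattened `Point` is two adjacent `double` slots that a writer updates one at a time; `TearingDemo` and its methods are hypothetical names, and real JVMs choose their own layouts and store widths.

```java
/**
 * Deterministic replay of a torn read. Assumption for illustration: a
 * flattened Point is two adjacent double slots, and a non-atomic store
 * writes them one at a time.
 */
final class TearingDemo {
    // Slot 0 holds x, slot 1 holds y; starts as Point(0.0, 1.0).
    private final double[] flat = {0.0, 1.0};

    // The two halves of a non-atomic store of (x, y).
    void writeX(double x) { flat[0] = x; }
    void writeY(double y) { flat[1] = y; }

    /** A reader copies both slots and returns {x, y}. */
    double[] read() {
        return new double[] {flat[0], flat[1]};
    }

    /** Writer stores Point(1.0, 0.0); the reader runs between the halves. */
    static double[] interleavedRead() {
        TearingDemo cell = new TearingDemo();
        cell.writeX(1.0);            // first half of the store
        double[] seen = cell.read(); // reader observes mixed state
        cell.writeY(0.0);            // second half completes the store
        return seen;                 // {1.0, 1.0}: neither old nor new Point
    }
}
```

A reader that runs between the two halves of the store sees the new `x` paired with the old `y`: an instance that no constructor produced. Declaring a primitive-typed field `volatile` is the JEP's mechanism for ruling this out for that particular field.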
To ensure that a particular primitive-typed field is always read from and written to atomically, the field can be declared `volatile`. But there is no mechanism for a primitive class to ensure that *all* fields and array components of its type are considered volatile.

A class with a complex integrity constraint in its constructor may not be a good candidate to be a primitive class. Instead, it can be declared as a normal value class.

### Reference types

Primitive values are *monomorphic*: they belong to a single type with a specific set of fields known at compile time and runtime. Values of different primitive types can't be mixed.

To participate in the *polymorphic* reference type hierarchy, primitive values are converted to value objects with a *value object conversion*. This occurs implicitly when assigning from a primitive type to a reference type. The result is an instance of the same class, just in a different form.

```
Shape s = p1; // value object conversion
assert s.getClass() == Point.class;
```

When invoking an inherited method of a primitive type, the receiver value undergoes value object conversion to have the type expected by the method declaration.

```
Point p = new Point(0.3, 7.2);
// toString is declared by Object
p.toString(); // value object conversion
```

It is sometimes useful to talk about the reference type of a primitive class. This type is expressed with the class name followed by the `ref` contextual keyword.

A variable with a primitive class reference type stores either a value object belonging to the named class or `null`.

```
Point.ref[] prs = new Point.ref[10];
prs[1] = new Point(1.0, 1.0);
prs[4] = new Point(4.0, 4.0);
for (Point.ref pr : prs) {
    if (pr != null) System.out.println(pr);
}
```

The `ref` type is useful when `null` is needed or when the runtime characteristics of reference types are preferred (for example, a large sparse array might be more efficiently encoded with references).
The relationship between the types `Point` and `Point.ref` is similar to the traditional relationship between the types `int` and `Integer`. However, `Point` and `Point.ref` both correspond to the same class declaration; the values of both types are instances of a single `Point` class. At run time, the conversion between a primitive value and a value object is more lightweight than traditional boxing conversion.

Value objects can be converted back to primitive values with a *primitive value conversion*. `null` cannot be converted to a primitive value, so attempts to convert it cause an exception.

```
Point p = prs[1]; // primitive value conversion
prs[1] = null;
p = prs[1]; // NullPointerException
```

When invoking a method overridden by a primitive class, the receiver object undergoes primitive value conversion to have the type expected by the method declaration.

```
Shape s = new Point(0.7, 3.2);
// 'contains' is declared by Point
s.contains(Point.default); // primitive value conversion
```

#### Overload resolution and type arguments

Value object conversion and primitive value conversion are allowed in *loose*, but not *strict*, invocation contexts. This follows the pattern of boxing and unboxing: a method overload that is applicable without applying the conversions takes priority over one that requires them.

```
void m(Point p, int i) { ... }
void m(Point.ref pr, Integer i) { ... }

void test(Point.ref pr, Integer i) {
    m(pr, i); // prefers the second declaration
    m(pr, 0); // ambiguous
}
```

For now, Java's generics only work with reference types. [Another JEP][jep-generics] will enhance generics to interoperate with primitive types. Thus, provisionally, type arguments must be inferred to be reference types. Type inference treats value object and primitive value conversions the same as boxing and unboxing: for example, a primitive value passed where an inferred type is expected will lead to a reference-typed inference constraint.
```
var list = List.of(new Point(1.0, 5.0)); // infers List<Point.ref>
```

#### Array subtyping

Traditionally, primitive array types are not related to reference array types: an `int[]` cannot be assigned to an `Object[]` variable.

Arrays of declared primitive types are more flexible: the type `Point[]` is a subtype of `Point.ref[]`, which is a subtype of `Object[]`. (Basic primitive array types like `int[]` will also gain this capability with [JEP 402][jep402].)

When a reference is stored in an array of static type `Object[]`, if the array's runtime component type is `Point` then the operation will perform both an array store check (checking that the object is an instance of class `Point`) and a primitive value conversion (converting the object to a primitive value). Similarly, reading from an array of static type `Object[]` will cause a value object conversion if the array stores primitive values.

```
Object replace(Object[] objs, int i, Object val) {
    Object result = objs[i]; // may perform value object conversion
    objs[i] = val; // may perform primitive value conversion
    return result;
}

Point[] ps = new Point[]{ new Point(3.0, -2.1) };
replace(ps, 0, new Point(-2.1, 3.0));
replace(ps, 0, null); // NPE from primitive value conversion
```

### `class` file representation & interpretation

A primitive class is declared in a `class` file using the `ACC_PRIMITIVE` modifier (`0x0800`). At class load time, an error occurs if a primitive class is not a value class (via `ACC_VALUE`, `0x0100`). At preparation time, an error occurs if a primitive class has a primitive type circularity in its instance fields.

A declared primitive type is represented with a new `Q` descriptor prefix (`QPoint;`). The class's reference type is represented using the usual `L` descriptor (`LPoint;`).

Primitive values with `Q` types are one-slot stack values, even though they may represent aggregates of much more than 32 or 64 bits. No particular encoding of primitive values is mandated.
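To show how the proposed `Q` prefix slots into the existing descriptor grammar, here is a small descriptor splitter in today's Java. It treats `Q...;` exactly like `L...;` when finding where each parameter ends, and maps a primitive type descriptor to its class's reference type descriptor (`QPoint;` to `LPoint;`). `Descriptors`, `parameters`, `returnType`, and `referenceTypeOf` are hypothetical names; this is not a JDK or ASM API, and it performs only minimal validation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical descriptor helper (not a JDK or ASM API). It accepts the
 * proposed Q prefix ("QPoint;") wherever the existing L prefix is allowed.
 */
final class Descriptors {
    private Descriptors() {}

    /** Parameter descriptors of a method descriptor, e.g. "(QPoint;D[LPoint;)V". */
    static List<String> parameters(String methodDescriptor) {
        if (!methodDescriptor.startsWith("(")) {
            throw new IllegalArgumentException("not a method descriptor: " + methodDescriptor);
        }
        List<String> params = new ArrayList<>();
        int i = 1;
        while (methodDescriptor.charAt(i) != ')') {
            int end = endOfField(methodDescriptor, i);
            params.add(methodDescriptor.substring(i, end));
            i = end;
        }
        return params;
    }

    /** Return descriptor: everything after the closing parenthesis. */
    static String returnType(String methodDescriptor) {
        return methodDescriptor.substring(methodDescriptor.indexOf(')') + 1);
    }

    /** Maps a primitive type descriptor such as "QPoint;" to "LPoint;". */
    static String referenceTypeOf(String fieldDescriptor) {
        return fieldDescriptor.startsWith("Q")
                ? "L" + fieldDescriptor.substring(1)
                : fieldDescriptor;
    }

    /** Index just past the field descriptor that starts at {@code start}. */
    private static int endOfField(String d, int start) {
        int i = start;
        while (d.charAt(i) == '[') {
            i++; // skip array dimensions
        }
        char c = d.charAt(i);
        if (c == 'L' || c == 'Q') {
            int semi = d.indexOf(';', i);
            if (semi < 0) {
                throw new IllegalArgumentException("unterminated class type in " + d);
            }
            return semi + 1; // L = reference type, Q = proposed primitive type
        }
        return i + 1; // one-letter basic type such as I, J, or D
    }
}
```

Since a `Q` descriptor names a class just as an `L` descriptor does, a tool that already skips over `L...;` needs only one more accepted prefix character; what differs is the meaning (primitive value versus reference), not the syntax.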
Verification treats a `Q` type as a subtype of the corresponding `L` type; e.g., `QPoint;` is a subtype of `LPoint;`. Conversions from primitive values to value objects occur implicitly, as needed.

The `this` parameter of a primitive class's instance method has a primitive type.

Classes mentioned by primitive types in field and method descriptors are loaded during linkage, before the first access of that field or method.

A `CONSTANT_Class` constant pool entry may refer to a primitive type using a `Q` descriptor as a "class name". A `CONSTANT_Class` using the plain name of a primitive class represents the class's reference type.

The `aconst_init` instruction may refer to either a primitive type or a reference type. This determines whether a primitive value or a value object is produced. Similarly, a `CONSTANT_Fieldref` or `CONSTANT_Methodref` may refer to a field or method as a member of a primitive type or a reference type. In the case of `withfield`, this determines the result type of the operation.

The `anewarray` and `multianewarray` instructions can be used to create arrays of declared primitive types. Array subtyping allows these arrays to be viewed as instances of reference array types.

The `checkcast`, `instanceof`, and `aastore` opcodes support primitive value types, performing primitive value conversions (including `null` checks) when necessary.

Primitive classes may be initialized for the same reasons as other classes (for example, before a static method is invoked). In addition, primitive class initialization is triggered by the `aconst_init` instruction, by each of the `anewarray` and `multianewarray` instructions when used with a primitive type, and (recursively) by initialization of another class that declares a primitive-typed field mentioning the primitive class.

### Core reflection

Every primitive class has a `java.lang.Class` object representing the class.
For both primitive values and value objects, the `getClass` method of the class's instances returns this object. A class literal, `Point.class`, can also be used to express this object.

Tentatively: this `Class` object returns `true` from the `isPrimitive` method, and `getModifiers` shows its `Modifier.PRIMITIVE` flag set.

For uses that need to model *types*, there is one `Class` object representing the primitive type, and another representing the reference type. Each of these has the same behavior as the `Class` object representing the class in most respects, except for methods to explicitly tell them apart and map from one to the other.

Tentatively: the `Class` object representing the class doubles as a representation of the primitive type. A separate `Class` object exists for the purpose of representing the reference type.

### Other APIs

The following APIs also gain new behaviors:

- `java.lang.constant` encodes `Q` types in `CONSTANT_Class` structures and field and method descriptors
- `java.lang.invoke` recognizes `Q` types and supports `L`-to-`Q` conversions
- `javax.lang.model` recognizes primitive class declarations

### Performance model

In typical usage, in heap storage and during fully-optimized code execution, declared primitive types should have a footprint and execution overhead comparable to the basic primitive types.

For example, a `Point`, as declared above, can be expected to directly occupy 128 bits in local variables, parameters, fields, and array components. A field access simply extracts the first or second 64 bits. There are no additional pointers or metadata fields.

Notably, a primitive class with a single instance field can be expected to have minimal overhead compared to operating on a value of the field's type directly.

However, JVMs are ultimately free to encode primitive values however they see fit. Some classes may be considered too large to represent inline.
Certain JVM components, in particular those that are less performance-tuned, may prefer to interact with primitive values as objects. A primitive value might carry with it a cached value object pointer to reduce the overhead of future conversions. Etc.

Value objects that are instances of primitive classes can be expected to behave much like instances of [other value classes][jep-values].

### HotSpot implementation

This section describes implementation details of this release of the HotSpot virtual machine, for the information of OpenJDK engineers. These details are subject to change in future releases and should not be assumed by users of HotSpot or other JVMs.

Values of `Q` types in HotSpot are encoded as follows:

- Primitive classes whose field layouts exceed a size threshold are always encoded as regular heap objects. Fields marked `volatile` always store regular heap objects.

- Otherwise, primitive values are encoded in fields and arrays as a flattened sequence of field values. Array components may be padded to achieve good alignment.

- In the interpreter and C1, primitive values on the stack are represented as value objects. Each read of a primitive-typed field or array allocates a heap object.

- In C2, primitive values on the stack are scalarized, effectively encoding each field as a separate variable. Methods with Q-typed parameters support both a pointer-based entry point (for interpreter and C1 calls) and a scalarized entry point (for C2-to-C2 calls).

Value objects are also scalarized when working with the primitive class's reference type. Heap allocations occur where any other supertype is used.

Default values are generally encoded as sequences of zeros, simplifying the task of field and array creation. However, in cases where a field or array encodes primitive values as heap pointers, the default value is a non-zero pointer. (Circularities may require this value to be `null` temporarily, but the `null` must be hidden from program code.)
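The flattened layout described under "Performance model" and in the HotSpot notes can be approximated by hand in today's Java. This sketch stores each `(x, y)` pair as two adjacent `double` components of one backing array, so element `i` occupies 128 bits with no per-element header or pointer, and zero-filled array creation yields `(0.0, 0.0)` for every element, mirroring a default value encoded as zeros. `FlatPointArray` is a hypothetical name; it illustrates the layout only, not how HotSpot actually implements flattening.

```java
/**
 * Hand-flattened array of (x, y) points, sketching a flattened layout in
 * today's Java. Element i occupies data[2*i] (x) and data[2*i + 1] (y):
 * 128 bits per element, with no per-element header or pointer.
 */
final class FlatPointArray {
    private final double[] data;

    FlatPointArray(int length) {
        // New Java arrays are zero-filled, so every element starts as
        // (0.0, 0.0): the all-zero bit pattern doubles as the default value.
        this.data = new double[2 * length];
    }

    int length() {
        return data.length / 2;
    }

    double x(int i) { return data[2 * i]; }     // first 64 bits of element i
    double y(int i) { return data[2 * i + 1]; } // second 64 bits of element i

    /** Two separate stores: this layout gives no atomicity by itself. */
    void set(int i, double x, double y) {
        data[2 * i] = x;
        data[2 * i + 1] = y;
    }
}
```

Because `set` performs two independent stores, this layout on its own has exactly the non-atomic behavior described under "Multi-threaded reads and writes"; a JVM that wants atomic updates must add something on top of it.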
Some array types, like `[Ljava/lang/Object;` and `[LPoint;`, allow for both pointer-based and flattened arrays. Reads and writes for these types dynamically check a flag and perform the necessary conversions when operating on flattened arrays.

Alternatives
------------

Making use of the basic primitive types, rather than declaring new primitives, will often produce a program with equivalent or slightly better performance. However, this approach gives up the valuable abstractions provided by classes. It's easy to, say, interpret a `double` with the wrong units, pass an out-of-range `int` to a library method, or fail to keep two `boolean` flags together in the right order.

Normal value classes provide many of the benefits of primitive classes, without the substantial disruptions to the language and JVM type systems. With additional innovation in JVM implementation techniques and hardware capabilities, the gap may close further. However, the limitations outlined in the "Motivation" section are pretty fundamental. For example, a value class type wrapping a single `long` field and supporting the full range of `long` values for that field can never be encoded in fewer than 65 bits. Primitive classes give programmers who need fine-grained control a more reliable performance model.

We considered many different approaches to boxing and polymorphism before settling on a model in which primitive values and value objects are two different representations, with two different types, of the same class instances. This strategy balances the traditional understanding of primitive types, with familiar semantics, performance expectations, and conversions to objects, with the simplicity of a single named class declaration for modeling data in both the primitive and reference spaces.

Strategies in which a primitive value *is an* object obscure some important differences between the types.
Strategies in which conversions occur between two different class-like entities introduce distracting complexity.

Risks and Assumptions
---------------------

There are security risks involved in allowing instance creation outside of constructors, via default instances and non-atomic reads and writes. Developers will need to understand the implications, and recognize when it would be unsafe to declare a class `primitive`.

This JEP does not address the interaction of primitive classes with the basic primitives or generics; these features will be addressed by other JEPs (see below). But, ultimately, all three JEPs will need to be completed to deliver a cohesive language design.

Dependencies
------------

This JEP depends on [Value Objects][jep-values], which establishes the semantics of primitives when treated as objects. Primitive classes are a special case of value classes.

In support of this JEP, there are separate efforts to improve the JVM Specification (in particular its treatment of `class` file validation) and the Java Language Specification (in particular its treatment of types). These changes address technical debt and facilitate the specification of these new features.

In [JEP 402][jep402] we propose to update the basic primitive types (`int`, `boolean`, etc.) to be represented by primitive classes, unifying the two kinds of primitive types. The existing wrapper classes will be repurposed to represent the corresponding types' primitive classes.

In another JEP we will propose modifying the generics model in Java to make type parameters *universal*: instantiable by all types, both reference and primitive.

In the future, JVM class and method specialization ([JEP 218][jep218], with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by primitive types.
[jep402]: https://openjdk.java.net/jeps/402
[jep218]: https://openjdk.java.net/jeps/218
[jep-values]: https://openjdk.java.net/jeps/8277163
[jep-generics]: https://openjdk.java.net/jeps/8261529

From brian.goetz at oracle.com Fri Dec 17 13:11:14 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 17 Dec 2021 08:11:14 -0500 Subject: [External] : Re: Enhancing java.lang.constant for Valhalla In-Reply-To: References: <189870bb-c730-0c69-2119-3b71a00cd34a@oracle.com> Message-ID:

Let's do an ASM thought experiment. The descriptors live in (a) {method,field}_info metadata, and (b) C_{Field,Method}Ref constants referred to by invoke/field access instructions. The stars, though, live somewhere completely different: the Preload attribute, which is not on the instruction, or the code attribute, or the method/field, but on the class. I would expect ClassVisitor to be enhanced with something like

    visitPreload(String clazz)

So, when reading a classfile, you get a bunch of preload "events", and then eventually, when you get to method/field metadata, or instructions, you get a bunch of events that have L descriptors in them, with no stars. ASM commits to delivering certain events before others, so when adapting, you might accumulate the visitPreload events into a List, and then if you are inserting new instructions that are supposed to use L*, if they're not in the list, you'd emit extra visitPreload calls. (Presumably also ASM would want to filter the Preload values to eliminate duplicates.)

Similarly, when writing a classfile, if you want to do a getstatic of a field known to be an L* field, you might do something like:

    b.visitPreload(internalName(C))
    b.visitFieldInsn(GETSTATIC, receiverClass, "foo", eLdescriptor(C))

Which is to say, one of the costs of this scheme is that the stars go far away from the descriptors they are attached to (not even in 1:1 correspondence), and classfile manglers will have to keep this mapping somewhere.
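The bookkeeping described here can be sketched as a class-level set of preload entries kept apart from the descriptors. The sketch uses the real `java.lang.constant.ClassDesc` (available since JDK 12) as the entry type; `PreloadTracker`, `visitPreload`, and `requirePreload` are hypothetical names, and `visitPreload` only mirrors the ClassVisitor method proposed in the message, which is not an existing ASM API.

```java
import java.lang.constant.ClassDesc;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical class-level Preload bookkeeping for a class file adapter,
 * kept apart from the descriptors that mention the classes.
 */
final class PreloadTracker {
    // Insertion-ordered and duplicate-free. ClassDesc equality compares
    // descriptor strings, so the same class recorded twice is kept once.
    private final Set<ClassDesc> preloads = new LinkedHashSet<>();

    /** Records a preload "event" read from an existing class file. */
    void visitPreload(ClassDesc clazz) {
        preloads.add(clazz);
    }

    /**
     * Call before emitting an instruction that should carry a "star" for
     * {@code clazz}; returns true if a new Preload entry had to be added.
     */
    boolean requirePreload(ClassDesc clazz) {
        return preloads.add(clazz);
    }

    /** Entries to write into the class's Preload attribute, in order. */
    Set<ClassDesc> entries() {
        return Collections.unmodifiableSet(preloads);
    }
}
```

Because `ClassDesc` equality is based on the descriptor string, the `LinkedHashSet` removes duplicates while keeping first-seen order; nothing in the `ClassDesc` itself records whether the class should be preloaded, which is the separation being debated in this thread.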
My first instinct is that putting the stars in the ClassDesc is putting the bookkeeping in the wrong place. Let's look at other uses of ClassDesc; one was the constant folding example. We want to be able to intrinsify LDC operations, including condy, and indy calls. Do any of them need preloading to work properly? (The *s are about preloading constraints.) LDC'ing a C_Class will already force loading of the class (and besides, C_Class has no use for a *.) Invoking an indy which returns an L*Foo might want Foo preloaded. I don't know enough about the timing of indy linkage to know whether all the classes in the type descriptor are loaded by the time the calling convention is set up, but I suspect it may already be? On 12/16/2021 4:24 PM, Dan Heidinga wrote: >> Which makes me ask ... if it really is a side channel, does it really go >> *in* the ClassDesc? > If it's not in the ClassDesc, then how do we communicate the side > channel to users - e.g. class file generators? > > I recently rewatched your JVMLS talk from 2018 [1] where javac > converted the jl.constant version of the descriptions into an `ldc`. > Now none of that is in the language yet but if we drop the stars from > descriptors now, the info won't be available when/if that vision comes > to fruition. > > --Dan > > [1]https://urldefense.com/v3/__https://www.youtube.com/watch?v=iSEjlLFCS3E&list=PLX8CzqL3ArzVnxC6PYxMlngEMv3W1pIkn&index=2&t=2s__;!!ACWV5N9M2RV99hQ!cSDE-Mpt9NsrFZwwEnoJ2sUm92OQqIlb9bDfBvr96zxWf-9NjmWc3mGzBCLje9p5EQ$ > From forax at univ-mlv.fr Sat Dec 18 11:53:07 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 18 Dec 2021 12:53:07 +0100 (CET) Subject: We have to talk about "primitive". In-Reply-To: References: Message-ID: <9352049.3304121.1639828387890.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "valhalla-spec-experts" > Sent: Mercredi 15 Décembre 2021 19:42:55 > Subject: We have to talk about "primitive".
> (Okay, so we're doing this) > I think the rename to "primitive classes" happened during my outage last year. > When I came back I made the decision to like it. > Since then, I've found that in my explanatory model I'm fighting against it > constantly. I think it may actually be fatally flawed. > The points I raise here were surely already known at the time, and I know there > were good reasons for overriding them. But I feel the need to come back and > push harder on their importance. > Background: the textbook definition of "primitive" is centered on their nature > of being elements-not-molecules, and I see no dispute about it. Also, there's > no disputing the fact that we're allowed to adopt a different meaning if we so > choose. So that's not even the fatal flaw. As already said by John, there are atoms in terms of user-defined types, but not at runtime: unless declared volatile, a long or a double is two 32-bit values. > The main problem I think we can't escape is that we'll still need some word that > means only the eight predefined types. (For the sake of argument let's assume > we can pick one and lean hard on it, whether that's "predefined", "built-in", > "elemental", "leaf type", or whatever.) I still hope that we can see a user-defined primitive and a builtin type the same way from a JLS point of view. Obviously, from the JVMS POV they are different, but I think one of our goals should be that the distinction between a builtin primitive and a user-defined primitive should not be visible in the JLS. > Definitely, our trying to minimize their specialness is virtuous. They should be > like helium: yes, they are molecules when you want a molecule! But on any > deeper look they will clearly be "actually" elements, and the distinction will > matter often enough. > So we have to attempt to shift users' understanding of "primitive" while at the > same time injecting a new term to mean exactly what primitive used to mean.
> That's the old Indiana Jones switch and I don't have to tell you how that > turned out for him. > It would be difficult to pull off in a world where we were just pushing some new > server and the whole world gets the new model at once. But in this universe > where every version of Java ever made all have to coexist, it's looking to me > like a guaranteed source of never-ending confusion. > I also think it robs us of our ability to smoothly portray the real changes of > Valhalla. We want to be able to say "elements are still elements! now we have > molecules too". Pedagogically that is always preferable to "elements aren't > really what you thought they were". Okay, the real comparison is a little more > nuanced than that, but I'll get to that now. I agree retconning is better pedagogically because a lot of people think in terms of analogy. > An alternative that seems to work fine, in my mental model at least, is: > * Primitive types are examples of value types, and have always been. > * Java never supported any other kinds of value types before, so we didn't > distinguish the terms before. > * Everything you associate with primitive types remains true. > * But most of those traits really come from their value-type-ness. > (I plan to make the above shifts to my model document already.) > * Now we have user-defined value types too. > * The way we user-define a type is with a class, so a value type is defined by a > "value class" (sorry B2). > * The primitive types will now each get a value class. > * These 8 classes will look as much like user-defined types as Object does. > * They, like Object, will have a "cheat" in their source code that no one else > gets to use. (Object's is that there is no implied `extends Object` or > `super();`; these need no fields because the data they store is magically > handled by the VM. These feel like similar cheats.)
> Then mopping up the rest: > * Existing classes probably need a term like "reference classes" (in the model > I'm going to circulate that doubles down on values-are-not-objects, then this > wants to be "object classes", even though that feels weird at first). > * I think the term for bucket 2 classes really ought to center on > identitylessness, e.g. "noid", "noident", "idfree", or something. Anything else > is getting away from the essential meaning of the bucket; plus, we want people > to call bucket 1 classes "identity classes", don't we? > Footnote: for a more concrete manifestation of this problem: I am sure we cannot > possibly get away with Class.isPrimitive() being true for these classes. Right? > Thoughts? I agree, but I don't think we should use "value type" as a term to encompass user-defined primitives and builtin primitives. BTW, I think it's very interesting to have this discussion now that we have scrambled the model by introducing the B1/B2/B3 model. This is how I see things: technically, we have 4 categories; B1: user-defined object with an identity, used by reference (nullable); B2: user-defined object with no identity, used by reference (nullable); B3: user-defined primitive with no identity, used as a direct value; B4: builtin primitive with no identity, used as a direct value. With the previous model of Valhalla, we had only the categories B1, B2 and B3, so the cut was between having an identity or not, to the point where we introduced IdentityObject/ValueObject in the type system. I believe that introducing B2 changes where we make the cut; we still hope that in the end we have only two categories, right? I believe we should piggyback on the difference between reference and direct value and make the cut there. After all, introducing B2 means that having identityless objects used by reference is useful.
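To make the two possible cuts concrete, here is a small sketch that encodes the four categories listed above as data and groups them either by identity or by reference vs. direct value. The enum `Bucket` and its flags are hypothetical illustrations of the taxonomy in this message, not part of any Java or Valhalla API.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Predicate;

public class ValhallaBuckets {
    // Hypothetical model of the four categories (B1..B4) described in the message;
    // the enum and its flags are illustrative only, not a Java API.
    enum Bucket {
        B1(true,  true,  true),   // user-defined object with identity, used by reference
        B2(false, true,  true),   // user-defined object without identity, used by reference
        B3(false, false, true),   // user-defined primitive, used as a direct value
        B4(false, false, false);  // builtin primitive (int, long, ...), used as a direct value

        final boolean hasIdentity;
        final boolean byReference; // nullable, accessed through a reference
        final boolean userDefined;

        Bucket(boolean hasIdentity, boolean byReference, boolean userDefined) {
            this.hasIdentity = hasIdentity;
            this.byReference = byReference;
            this.userDefined = userDefined;
        }
    }

    // Collects the buckets that satisfy a predicate, in declaration order.
    static Set<Bucket> where(Predicate<Bucket> p) {
        Set<Bucket> result = EnumSet.noneOf(Bucket.class);
        for (Bucket b : Bucket.values()) {
            if (p.test(b)) {
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Cut on identity (the earlier IdentityObject/ValueObject split).
        System.out.println("identityless: " + where(b -> !b.hasIdentity)); // [B2, B3, B4]
        // Cut on reference vs direct value (the grouping proposed in the message).
        System.out.println("by reference: " + where(b -> b.byReference));  // [B1, B2]
        // B3 and B4 differ only in being user-defined or builtin.
        System.out.println("builtin:      " + where(b -> !b.userDefined)); // [B4]
    }
}
```

The identity cut puts B2 together with the primitives, while the reference cut puts B2 together with B1, which is the regrouping argued for here.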
So, for me, it seems logical to group B1 and B2 together, to group B3 and B4 together, and to see B2 as a special kind of B1 and B4 as a special kind of B3. So we have objects or primitives; among the objects, we have the ones with identity and the identityless ones; among the primitives, we have the ones with a Ref box and the ones with a historical box (Integer, etc.). On the subject of boxes, I think we should go the other way, i.e. "you never really understood how boxes worked", because most people don't care about how a box works, and rightly so: once we get better generics, boxes will mostly disappear. I think we should not introduce the interfaces IdentityObject/ValueObject, because they no longer seem useful to explain the new model, they are not the center of the model anymore, and their usefulness in terms of typing is low. (We still need to treat empty abstract classes as special, but that is a detail for people wanting to play with primitive classes + inheritance, so it's fairly specific.) For a regular user of Java who does not care about the JVM details, - class/enum/record/lambda are handled by references; a class/enum/record can be declared identityless with a modifier; a lambda is identityless.
- primitives are handled by direct values (so not nullable); they are tearable, have a default value / a non-overridable default constructor, and are defined by the keyword primitive; the builtins (the ones written in lowercase) have a named box instead of Primitive.Box. Examples: String is a class, Optional is an identityless class, Complex is a primitive, int is a builtin primitive. Rémi From brian.goetz at oracle.com Mon Dec 20 17:54:01 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 20 Dec 2021 12:54:01 -0500 Subject: JEP update: Value Objects In-Reply-To: References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com> <8AD4B184-2937-4146-A763-612E31E64683@oracle.com> <6776971B-F8B1-416D-8A4F-32EAE842AC03@oracle.com> <82A9C5AA-F0F3-4FB7-BF36-B6557103080E@oracle.com> Message-ID: I was working on some docs and am not sure if we came to a conclusion on the rules about who may, may not, or must declare ValueObject or IdentityObject. Let me see if I can chart the boundaries of the design space. I'll start with IdentityObject since it is more constrained. - Clearly for legacy classes, the VM is going to have to infer and inject IdentityObject. - Since IdentityObject is an interface, it is inherited; if my super implements IO, so am I. - It seems desirable that a user be *allowed* to name IdentityObject as a superinterface of an interface or abstract class, which constrains what subclasses can do. (Alternately we could spell this "value interface" or "value abstract class"; this is a separate set of tradeoffs.) - There is value in having exactly one way to say certain things; it reduces the space of what has to be specified and tested. - I believe our goal is to know everything we need to know at class load time, and not to have to go back and do complex checks on a supertype when a subclass is loaded. The choice space seems to be: user { must, may, may not } specify IO on concrete classes
x compiler { must, may, may not } specify IO when ACC_VALUE present x VM (and reflection) { mops up } where "mopping up" minimally includes dealing with legacy classfiles. Asking the user to say "IdentityObject" on each identity class seems ridiculous, so we can drop that one. user { may, may not } specify IO on concrete classes x compiler { must, may, may not } specify IO when ACC_VALUE present x VM (and reflection) { mops up } From a user model perspective, it seems arbitrary to say the user may not explicitly say IO for concrete classes, but may so do for abstract classes. So the two consistent user choices are either: - User can say "implements IO" anywhere they like - User cannot say "implements IO" anywhere, and instead we have an "identity" modifier which is optional on concrete classes and acts as a constraint on abstract classes/interfaces. While having an "identity" modifier is nice from a completeness perspective, the fact that it is probably erased to "implements IdentityObject" creates complication for reflection (and another asymmetry between reflection and javax.lang.model). So it seems that just letting users say "implements IdentityObject" is reasonable. Given that the user has a choice, there is little value in "compiler may not inject", so the choice for the compiler here is "must" vs "may" inject. Which is really asking whether we want to draw the VM line at legacy vs new classfiles, or merely adding IO as a default when nothing else has been selected. Note that asking the compiler to inject based on ACC_VALUE is also asking pretty much everything that touches bytecode to do this too, and likely to generate more errors from bytecode manglers. The VM is doing inference either way, what we get to choose here is the axis. Let's put a pin in IO and come back to VO. The user is already saying "value", and we're stuck with the default being "identity".
Unless we want to have the user say "value interface" for a value-only interface (which moves some complexity into reflection, but is also a consistent model), I think we're stuck with letting the user specify either IO/VO on an abstract class / interface, which sort of drags us towards letting the user say it (redundantly) on concrete classes too. The compiler and VM will always type-check the consistency of the value keyword/bit and the implements clause. So the real question is where the inference/injection happens. And the VM will have to do injection for at least IO at least for legacy classes. So the choices for VM infer&inject seem to be: - Only inject IO for legacy concrete classes, based on classfile version, otherwise require everything to be explicit; - Inject IO for concrete classes when ACC_VALUE is not present, require VO to be explicit; - Inject IO for concrete classes when ACC_VALUE is not present; inject VO for concrete classes when ACC_VALUE is present. Is infer&inject measurably more costly than just ordinary classfile checking? It seems to me that if all things are equal, the simpler injection rule is preferable (the third), mostly on the basis of what it asks of humans who write code to manipulate bytecode, but if there's a real cost to the injection, then having the compiler help out is reasonable. (But in that case, it probably makes sense for the compiler to help out in all cases, not just VO.) On 12/2/2021 6:11 PM, Dan Smith wrote: >> On Dec 2, 2021, at 1:04 PM, Dan Heidinga wrote: >> >> On Thu, Dec 2, 2021 at 10:05 AM Dan Smith wrote: >>> On Dec 2, 2021, at 7:08 AM, Dan Heidinga wrote: >>> >>> When converting back from our internal form to a classfile for the >>> JVMTI RetransformClasses agents, I need to either filter the interface >>> out if we injected it or not if it was already there. JVMTI's >>> GetImplementedInterfaces call has a similar issue with being >>> consistent - and that's really the same issue as reflection.
>>> >>> There's a lot of small places that can easily become inconsistent - >>> and therefore a lot of places that need to be checked - to hide >>> injected interfaces. The easiest solution to that is to avoid >>> injecting interfaces in cases where javac can do it for us so the VM >>> has a consistent view. >>> >>> >>> I think you may be envisioning extra complexity that isn't needed here. The plan of record is that we *won't* hide injected interfaces. >> +1. I'm 100% on board with this approach. It cleans up a lot of the >> potential corner cases. >> >>> Our hope is that the implicit/explicit distinction is meaningless: that turning implicit into explicit via JVMTI would be a 100% equivalent change. I don't know JVMTI well, so I'm not sure if there's some reason to think that wouldn't be acceptable... >> JVMTI's "GetImplementedInterfaces" spec will need some adaptation as >> it currently states "Return the direct super-interfaces of this class. >> For a class, this function returns the interfaces declared in its >> implements clause." >> >> The ClassFileLoadHook (CFLH) runs either with the original bytecodes >> as passed to the VM (the first time) or with "morally equivalent" >> bytecodes recreated by the VM from its internal classfile formats. >> The first time through the process the agent may see a value class >> that doesn't have the VO interface directly listed while after a call >> to {retransform,redefine}Classes, the VO interface may be directly >> listed. The same issues apply to the IO interface with legacy >> classfiles so with some minor spec updates, we can paper over that. >> >> Those are the only two places: GetImplementedInterfaces & CFLH and >> related redefine/retransform functions, I can find in the JVMTI spec >> that would be affected. Some minor spec updates should be able to >> address both to ensure an inconsistency in the observed behaviour is >> treated as valid. > Useful details, thanks.
> > Would it be a problem if the ClassFileLoadHook gives different answers depending on the timing of the request (derived from original bytecodes vs. JVM-internal data)? If we need consistent answers, it may be that the "original bytecode" approach needs to reproduce the JVM's inference logic. If it's okay for the answers to change, there's less work to do. > > To highlight your last point: we *will* need to work this out for inferred IdentityObject, whether we decide to infer ValueObject or not. From forax at univ-mlv.fr Mon Dec 20 19:05:58 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 20 Dec 2021 20:05:58 +0100 (CET) Subject: JEP update: Value Objects In-Reply-To: References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <8AD4B184-2937-4146-A763-612E31E64683@oracle.com> <6776971B-F8B1-416D-8A4F-32EAE842AC03@oracle.com> <82A9C5AA-F0F3-4FB7-BF36-B6557103080E@oracle.com> Message-ID: <816087489.174195.1640027158110.JavaMail.zimbra@u-pem.fr> Brian, the last time we talked about IdentityObject and ValueObject, you said that you were aware that introducing those interfaces would break some existing code, but you wanted to know whether it was a lot of code or not. So I do not understand now why you want to mix IdentityObject/ValueObject with the runtime behavior; it seems risky, and if we need to back out the introduction of those interfaces, it will be more work than it should be. Decoupling the typing part from the runtime behavior seems a better solution. Moreover, the split between IdentityObject and ValueObject makes less sense now that we have 3 kinds of value objects: the identityless reference (B2), the primitive (B3) and the builtin primitive (B4). Why do we want these types to be visible in the type system, but not, for example, the set containing only B3 and B4?
Rémi > From: "Brian Goetz" > To: "daniel smith" , "Dan Heidinga" > > Cc: "John Rose" , "valhalla-spec-experts" > > Sent: Monday 20 December 2021 18:54:01 > Subject: Re: JEP update: Value Objects > I was working on some docs and am not sure if we came to a conclusion on the > rules about who may, may not, or must declare ValueObject or IdentityObject. > Let me see if I can chart the boundaries of the design space. I'll start with > IdentityObject since it is more constrained. > - Clearly for legacy classes, the VM is going to have to infer and inject > IdentityObject. > - Since IdentityObject is an interface, it is inherited; if my super implements > IO, so am I. > - It seems desirable that a user be *allowed* to name IdentityObject as a > superinterface of an interface or abstract class, which constrains what > subclasses can do. (Alternately we could spell this "value interface" or "value > abstract class"; this is a separate set of tradeoffs.) > - There is value in having exactly one way to say certain things; it reduces the > space of what has to be specified and tested. > - I believe our goal is to know everything we need to know at class load time, > and not to have to go back and do complex checks on a supertype when a subclass > is loaded. > The choice space seems to be > user { must, may, may not } specify IO on concrete classes > x compiler { must, may, may not } specify IO when ACC_VALUE present > x VM (and reflection) { mops up } > where "mopping up" minimally includes dealing with legacy classfiles. > Asking the user to say "IdentityObject" on each identity class seems ridiculous, > so we can drop that one. > user { may, may not } specify IO on concrete classes > x compiler { must, may, may not } specify IO when ACC_VALUE present > x VM (and reflection) { mops up } > From a user model perspective, it seems arbitrary to say the user may not > explicitly say IO for concrete classes, but may so do for abstract classes.
So > the two consistent user choices are either: > - User can say "implements IO" anywhere they like > - User cannot say "implements IO" anywhere, and instead we have an "identity" > modifier which is optional on concrete classes and acts as a constraint on > abstract classes/interfaces. > While having an "identity" modifier is nice from a completeness perspective, the > fact that it is probably erased to "implements IdentityObject" creates > complication for reflection (and another asymmetry between reflection and > javax.lang.model). So it seems that just letting users say "implements > IdentityObject" is reasonable. > Given that the user has a choice, there is little value in "compiler may not > inject", so the choice for the compiler here is "must" vs "may" inject. Which > is really asking whether we want to draw the VM line at legacy vs new > classfiles, or merely adding IO as a default when nothing else has been > selected. Note that asking the compiler to inject based on ACC_VALUE is also > asking pretty much everything that touches bytecode to do this too, and likely > to generate more errors from bytecode manglers. The VM is doing inference > either way, what we get to choose here is the axis. > Let's put a pin in IO and come back to VO. > The user is already saying "value", and we're stuck with the default being > "identity". Unless we want to have the user say "value interface" for a > value-only interface (which moves some complexity into reflection, but is also > a consistent model), I think we're stuck with letting the user specify either > IO/VO on an abstract class / interface, which sort of drags us towards letting > the user say it (redundantly) on concrete classes too. > The compiler and VM will always type-check the consistency of the value > keyword/bit and the implements clause. So the real question is where the > inference/injection happens. And the VM will have to do injection for at least > IO at least for legacy classes. 
> So the choices for VM infer&inject seem to be: > - Only inject IO for legacy concrete classes, based on classfile version, > otherwise require everything to be explicit; > - Inject IO for concrete classes when ACC_VALUE is not present, require VO to be > explicit; > - Inject IO for concrete classes when ACC_VALUE is not present; inject VO for > concrete classes when ACC_VALUE is present. > Is infer&inject measurably more costly than just ordinary classfile checking? It > seems to me that if all things are equal, the simpler injection rule is > preferable (the third), mostly on the basis of what it asks of humans who write > code to manipulate bytecode, but if there's a real cost to the injection, then > having the compiler help out is reasonable. (But in that case, it probably > makes sense for the compiler to help out in all cases, not just VO.) > On 12/2/2021 6:11 PM, Dan Smith wrote: >>> On Dec 2, 2021, at 1:04 PM, Dan Heidinga wrote: >>> On Thu, Dec 2, 2021 at 10:05 AM Dan Smith wrote: >>>> On Dec 2, 2021, at 7:08 AM, Dan Heidinga wrote: >>>> When converting back from our internal form to a classfile for the >>>> JVMTI RetransformClasses agents, I need to either filter the interface >>>> out if we injected it or not if it was already there. JVMTI's >>>> GetImplementedInterfaces call has a similar issue with being >>>> consistent - and that's really the same issue as reflection. >>>> There's a lot of small places that can easily become inconsistent - >>>> and therefore a lot of places that need to be checked - to hide >>>> injected interfaces. The easiest solution to that is to avoid >>>> injecting interfaces in cases where javac can do it for us so the VM >>>> has a consistent view. >>>> I think you may be envisioning extra complexity that isn't needed here. The plan >>>> of record is that we *won't* hide injected interfaces. >>> +1.
I'm 100% on board with this approach. It cleans up a lot of the >>> potential corner cases. >>>> Our hope is that the implicit/explicit distinction is meaningless: that turning >>>> implicit into explicit via JVMTI would be a 100% equivalent change. I don't >>>> know JVMTI well, so I'm not sure if there's some reason to think that wouldn't >>>> be acceptable... >>> JVMTI's "GetImplementedInterfaces" spec will need some adaptation as >>> it currently states "Return the direct super-interfaces of this class. >>> For a class, this function returns the interfaces declared in its >>> implements clause." >>> The ClassFileLoadHook (CFLH) runs either with the original bytecodes >>> as passed to the VM (the first time) or with "morally equivalent" >>> bytecodes recreated by the VM from its internal classfile formats. >>> The first time through the process the agent may see a value class >>> that doesn't have the VO interface directly listed while after a call >>> to {retransform,redefine}Classes, the VO interface may be directly >>> listed. The same issues apply to the IO interface with legacy >>> classfiles so with some minor spec updates, we can paper over that. >>> Those are the only two places: GetImplementedInterfaces & CFLH and >>> related redefine/retransform functions, I can find in the JVMTI spec >>> that would be affected. Some minor spec updates should be able to >>> address both to ensure an inconsistency in the observed behaviour is >>> treated as valid. >> Useful details, thanks. >> Would it be a problem if the ClassFileLoadHook gives different answers depending >> on the timing of the request (derived from original bytecodes vs. JVM-internal >> data)? If we need consistent answers, it may be that the "original bytecode" >> approach needs to reproduce the JVM's inference logic. If it's okay for the >> answers to change, there's less work to do.
>> To highlight your last point: we *will* need to work this out for inferred >> IdentityObject, whether we decide to infer ValueObject or not. From brian.goetz at oracle.com Mon Dec 20 19:26:01 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 20 Dec 2021 14:26:01 -0500 Subject: Do we even need IO/VO interfaces? (was: JEP update: Value Objects) In-Reply-To: <816087489.174195.1640027158110.JavaMail.zimbra@u-pem.fr> References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <8AD4B184-2937-4146-A763-612E31E64683@oracle.com> <6776971B-F8B1-416D-8A4F-32EAE842AC03@oracle.com> <82A9C5AA-F0F3-4FB7-BF36-B6557103080E@oracle.com> <816087489.174195.1640027158110.JavaMail.zimbra@u-pem.fr> Message-ID: I thought we were wrapping this up; I'm not sure how we got back to "do we even need these at all", but OK. Splitting off a separate (hopefully short) thread. These interfaces serve both a dynamic and static role. Statically, they allow us to constrain inputs, such as: void runWithLock(IdentityObject lock, Runnable task) and similar use in generic type bounds. Dynamically, they allow code to check before doing something partial: if (x instanceof IdentityObject) { synchronized(x) { ... } } rather than trying and dealing with IMSE. Introducing new interfaces that have no methods is clearly source- and binary compatible, so I am not particularly compelled by "some very brittle and badly written code might break." So far, no one has proposed any examples that would make us reconsider that. As to "value class" vs "primitive class" vs "built in primitive", I see no reason to add *additional* mechanisms by which to distinguish these in either the static or dynamic type systems; the salient difference is identity vs value. (Reflection will almost certainly give us means to ask questions about how the class was declared, though.)
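A minimal sketch of the static and dynamic roles described above. Since `IdentityObject` is only proposed, the code declares a hypothetical local marker interface with that name; `Account` and `Point` are illustrative classes, with the record standing in for a would-be value object that does not carry the marker.

```java
public class IdentityObjectRoles {
    // Hypothetical stand-in for the proposed java.lang.IdentityObject,
    // declared locally so the sketch compiles on today's JDK.
    interface IdentityObject {}

    // An ordinary identity class that opts in to the stand-in marker.
    static final class Account implements IdentityObject {
        int balance;
    }

    // Stand-in for a would-be value object: it does not implement the marker.
    record Point(int x, int y) {}

    // Static role: callers must pass something whose static type is IdentityObject.
    static void runWithLock(IdentityObject lock, Runnable task) {
        synchronized (lock) {
            task.run();
        }
    }

    // Dynamic role: check first, instead of attempting to lock and handling
    // an IllegalMonitorStateException afterwards.
    static boolean tryRunWithLock(Object x, Runnable task) {
        if (x instanceof IdentityObject lock) {
            runWithLock(lock, task);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Account acct = new Account();
        runWithLock(acct, () -> acct.balance += 10);
        boolean lockedPoint = tryRunWithLock(new Point(1, 2), () -> {});
        System.out.println(acct.balance + " " + lockedPoint); // 10 false
    }
}
```

On a Valhalla build the VM would supply the marker itself; in this sketch each identity class has to opt in explicitly.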
As to B3: instanceof operates on reference types, so (at least from a pure spec / model perspective), `x instanceof T` gets answered on value instances by lifting to the reference type, and answering the question there. So it would not even be a sensible question to ask "are you a primitive value vs primitive reference"; subtyping is a "reference affordance", and questions about subtyping are answered in the reference domain. And to B4: the goal is to make B3 and B4 as similar as possible; there are going to be obvious ways in which we can't do this, but this should not be relevant to either the static or dynamic type system. On 12/20/2021 2:05 PM, Remi Forax wrote: > Brian, > the last time we talked about IdentityObject and ValueObject, you said > that you were aware that introducing those interfaces would break some > existing code, > but you wanted to know whether it was a lot of code or not. > > So I do not understand now why you want to mix > IdentityObject/ValueObject with the runtime behavior; it seems risky, > and if we need to back out the introduction of those interfaces, it > will be more work than it should be. > Decoupling the typing part from the runtime behavior seems a better > solution. > > Moreover, the split between IdentityObject and ValueObject makes less > sense now that we have 3 kinds of value objects: the identityless > reference (B2), the primitive (B3) and the builtin primitive (B4). > Why do we want these types to be visible in the type system, but not, for > example, the set containing only B3 and B4? > > Rémi > > ------------------------------------------------------------------------ > > From: "Brian Goetz" > To: "daniel smith" , "Dan Heidinga" > > Cc: "John Rose" , > "valhalla-spec-experts" > Sent: Monday 20 December 2021 18:54:01 > Subject: Re: JEP update: Value Objects > > I was working on some docs and am not sure if we came to a > conclusion on the rules about who may, may not, or must declare > ValueObject or IdentityObject.
> > Let me see if I can chart the boundaries of the design space. > I'll start with IdentityObject since it is more constrained. > > - Clearly for legacy classes, the VM is going to have to infer > and inject IdentityObject. > - Since IdentityObject is an interface, it is inherited; if my > super implements IO, so am I. > - It seems desirable that a user be *allowed* to name > IdentityObject as a superinterface of an interface or abstract > class, which constrains what subclasses can do. (Alternately we > could spell this "value interface" or "value abstract class"; this > is a separate set of tradeoffs.) > - There is value in having exactly one way to say certain things; > it reduces the space of what has to be specified and tested. > - I believe our goal is to know everything we need to know at > class load time, and not to have to go back and do complex checks > on a supertype when a subclass is loaded. > > The choice space seems to be > user { must, may, may not } specify IO on concrete classes > x compiler { must, may, may not } specify IO when ACC_VALUE present > x VM (and reflection) { mops up } > > where "mopping up" minimally includes dealing with legacy classfiles. > > Asking the user to say "IdentityObject" on each identity class > seems ridiculous, so we can drop that one. > > user { may, may not } specify IO on concrete classes > x compiler { must, may, may not } specify IO when ACC_VALUE present > x VM (and reflection) { mops up } > > From a user model perspective, it seems arbitrary to say the user > may not explicitly say IO for concrete classes, but may so do for > abstract classes. So the two consistent user choices are either: > > - User can say "implements IO" anywhere they like > - User cannot say "implements IO" anywhere, and instead we have > an "identity" modifier which is optional on concrete classes and > acts as a constraint on abstract classes/interfaces.
> > While having an "identity" modifier is nice from a completeness > perspective, the fact that it is probably erased to "implements > IdentityObject" creates complication for reflection (and another > asymmetry between reflection and javax.lang.model). So it seems > that just letting users say "implements IdentityObject" is > reasonable. > > Given that the user has a choice, there is little value in > "compiler may not inject", so the choice for the compiler here is > "must" vs "may" inject. Which is really asking whether we want to > draw the VM line at legacy vs new classfiles, or merely adding IO > as a default when nothing else has been selected. Note that > asking the compiler to inject based on ACC_VALUE is also asking > pretty much everything that touches bytecode to do this too, and > likely to generate more errors from bytecode manglers. The VM is > doing inference either way, what we get to choose here is the axis. > > Let's put a pin in IO and come back to VO. > > The user is already saying "value", and we're stuck with the > default being "identity". Unless we want to have the user say > "value interface" for a value-only interface (which moves some > complexity into reflection, but is also a consistent model), I > think we're stuck with letting the user specify either IO/VO on an > abstract class / interface, which sort of drags us towards letting > the user say it (redundantly) on concrete classes too. > > The compiler and VM will always type-check the consistency of the > value keyword/bit and the implements clause. So the real question > is where the inference/injection happens. And the VM will have to > do injection for at least IO at least for legacy classes.
> > So the choices for VM infer&inject seem to be: > > - Only inject IO for legacy concrete classes, based on classfile > version, otherwise require everything to be explicit; > - Inject IO for concrete classes when ACC_VALUE is not present, > require VO to be explicit; > - Inject IO for concrete classes when ACC_VALUE is not present; > inject VO for concrete classes when ACC_VALUE is present. > > Is infer&inject measurably more costly than just ordinary > classfile checking? It seems to me that if all things are equal, > the simpler injection rule is preferable (the third), mostly on > the basis of what it asks of humans who write code to manipulate > bytecode, but if there's a real cost to the injection, then having > the compiler help out is reasonable. (But in that case, it > probably makes sense for the compiler to help out in all cases, > not just VO.) > > > > On 12/2/2021 6:11 PM, Dan Smith wrote: > > On Dec 2, 2021, at 1:04 PM, Dan Heidinga wrote: > > On Thu, Dec 2, 2021 at 10:05 AM Dan Smith wrote: > > On Dec 2, 2021, at 7:08 AM, Dan Heidinga wrote: > > When converting back from our internal form to a classfile for the > JVMTI RetransformClasses agents, I need to either filter the interface > out if we injected it or not if it was already there. JVMTI's > GetImplementedInterfaces call has a similar issue with being > consistent - and that's really the same issue as reflection. > > There's a lot of small places that can easily become inconsistent - > and therefore a lot of places that need to be checked - to hide > injected interfaces. The easiest solution to that is to avoid > injecting interfaces in cases where javac can do it for us so the VM > has a consistent view. > > > I think you may be envisioning extra complexity that isn't needed here. The plan of record is that we *won't* hide injected interfaces. > > +1. I'm 100% on board with this approach. It cleans up a lot of the > potential corner cases.
> > Our hope is that the implicit/explicit distinction is meaningless: that turning implicit into explicit via JVMTI would be a 100% equivalent change. I don't know JVMTI well, so I'm not sure if there's some reason to think that wouldn't be acceptable... > > JVMTI's "GetImplementedInterfaces" spec will need some adaptation as > it currently states "Return the direct super-interfaces of this class. > For a class, this function returns the interfaces declared in its > implements clause." > > The ClassFileLoadHook (CFLH) runs either with the original bytecodes > as passed to the VM (the first time) or with "morally equivalent" > bytecodes recreated by the VM from its internal classfile formats. > The first time through the process the agent may see a value class > that doesn't have the VO interface directly listed while after a call > to {retransform,redefine}Classes, the VO interface may be directly > listed. The same issues apply to the IO interface with legacy > classfiles so with some minor spec updates, we can paper over that. > > Those are the only two places: GetImplementedInterfaces & CFLH and > related redefine/retransform functions, I can find in the JVMTI spec > that would be affected. Some minor spec updates should be able to > address both to ensure an inconsistency in the observed behaviour is > treated as valid. > > Useful details, thanks. > > Would it be a problem if the ClassFileLoadHook gives different answers depending on the timing of the request (derived from original bytecodes vs. JVM-internal data)? If we need consistent answers, it may be that the "original bytecode" approach needs to reproduce the JVM's inference logic. If it's okay for the answers to change, there's less work to do. > > To highlight your last point: we *will* need to work this out for inferred IdentityObject, whether we decide to infer ValueObject or not.
From forax at univ-mlv.fr Tue Dec 21 00:00:36 2021 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 21 Dec 2021 01:00:36 +0100 (CET) Subject: Do we even need IO/VO interfaces? (was: JEP update: Value Objects) In-Reply-To: References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <82A9C5AA-F0F3-4FB7-BF36-B6557103080E@oracle.com> <816087489.174195.1640027158110.JavaMail.zimbra@u-pem.fr> Message-ID: <410768406.203519.1640044836675.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "daniel smith" , "Dan Heidinga" > , "John Rose" , > "valhalla-spec-experts" > Sent: Monday 20 December 2021 20:26:01 > Subject: Do we even need IO/VO interfaces? (was: JEP update: Value Objects) > I thought we were wrapping this up; I'm not sure how we got back to "do we even > need these at all", but OK. Splitting off a separate (hopefully short) thread. > These interfaces serve both a dynamic and static role. Statically, they allow us > to constrain inputs, such as: > void runWithLock(IdentityObject lock, Runnable task) > and similar use in generic type bounds. > Dynamically, they allow code to check before doing something partial: > if (x instanceof IdentityObject) { synchronized(x) { ... } } > rather than trying and dealing with IMSE. The static role is defeated by the existence of java.lang.Object, a supertype of both IdentityObject and ValueObject. java.io.Serializable is useless as a type: ObjectOutputStream.writeObject() takes an Object, not a Serializable, and Arrays.sort() takes an Object[], not an array of Comparable. Like Serializable or Comparable, IdentityObject as a type can easily be lost because Object exists. If the type IdentityObject can be lost, there is little point, as a designer, in having a method that takes an IdentityObject as a parameter, because it forces the user of the API to use a cast, trading a CCE for an IMSE.
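The static and dynamic roles quoted above, and the cast Rémi objects to, can be seen in a minimal sketch that runs on a current JDK. IdentityMarker is a hypothetical local interface standing in for the proposed IdentityObject; Lock, ValueLike and the method bodies are illustrative assumptions, not Valhalla APIs.

```java
import java.util.ArrayList;
import java.util.List;

class IdentityTyping {
    // Hypothetical stand-in for the proposed IdentityObject, declared locally
    // so the sketch compiles on a current JDK.
    interface IdentityMarker {}

    // An identity class that carries the marker.
    static final class Lock implements IdentityMarker {}

    // Stands in for a value object: it does not carry the marker.
    static final class ValueLike {}

    // Static role: the parameter type rejects unmarked arguments at compile time.
    static void runWithLock(IdentityMarker lock, Runnable task) {
        synchronized (lock) {
            task.run();
        }
    }

    // Dynamic role: check first instead of catching IllegalMonitorStateException.
    static boolean tryRunWithLock(Object x, Runnable task) {
        if (x instanceof IdentityMarker) {
            synchronized (x) {
                task.run();
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Object stored = new Lock(); // static type widened to Object
        // runWithLock(stored, ...) would not compile: the marker type was lost,
        // so the caller needs a cast, which can fail with ClassCastException.
        runWithLock((IdentityMarker) stored, () -> log.add("static"));
        tryRunWithLock(stored, () -> log.add("dynamic"));
        tryRunWithLock(new ValueLike(), () -> log.add("skipped"));
        System.out.println(log); // [static, dynamic]
    }
}
```

Once the reference has been widened to Object, only the cast (static role) or the instanceof test (dynamic role) recovers the marker, which is the trade-off being debated.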
For the dynamic role, x.getClass().isValue() does the same thing more efficiently (unless the VM has a special optimization for IdentityObject). Moreover, very few methods synchronize on a user-provided object, because doing so makes concurrent code hard to reason about. Adding a bit to the type system to support code that people should not write is not exactly a win. > Introducing new interfaces that have no methods is clearly source- and binary > compatible, so I am not particularly compelled by "some very brittle and badly > written code might break." So far, no one has proposed any examples that would > make us reconsider that. ??; you are forgetting inference; this code will fail to compile: class A {} class B {} var list = List.of(new A(), new B()); List<Object> list2 = list; > As to "value class" vs "primitive class" vs "built in primitive", I see no > reason to add *additional* mechanisms by which to distinguish these in either > the static or dynamic type systems; the salient difference is identity vs > value. (Reflection will almost certainly give us means to ask questions about > how the class was declared, though.) Primitive (builtin or not) allows tearing, so we should introduce two interfaces, TearableObject and NonTearableObject, because knowing whether something is tearable clearly changes the algorithm that can be used. > As to B3: instanceof operates on reference types, so (at least from a pure spec > / model perspective), `x instanceof T` gets answered on value instances by > lifting to the reference type, and answering the question there. So it would > not even be a sensible question to ask "are you a primitive value vs primitive > reference"; subtyping is a "reference affordance", and questions about > subtyping are answered in the reference domain.
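The inference concern can be reproduced today with any interface that two classes share. In this sketch, Marker is a hypothetical stand-in for an injected IdentityObject: once A and B both implement it, the least upper bound inferred for List.of changes from Object to Marker, so an assignment to List<Object> stops compiling.

```java
import java.util.List;

class InferenceDemo {
    // Hypothetical stand-in for an injected IdentityObject.
    interface Marker {}
    static final class A implements Marker {}
    static final class B implements Marker {}
    // Two classes with no common supertype other than Object.
    static final class C {}
    static final class D {}

    public static void main(String[] args) {
        // lub(C, D) is Object, so `plain` is a List<Object>.
        var plain = List.of(new C(), new D());
        List<Object> asObjects = plain;   // compiles

        // lub(A, B) is Marker, so `marked` is a List<Marker>.
        var marked = List.of(new A(), new B());
        List<Marker> asMarkers = marked;  // compiles
        // List<Object> broken = marked;  // would not compile: List<Marker> is not a List<Object>

        System.out.println(asObjects.size() + asMarkers.size()); // 4
    }
}
```

Leaving such interfaces out of inference, as suggested later in the thread, would amount to making the second case behave like the first.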
> And to B4: the goal is to make B3 and B4 as similar as possible; there are going > to be obvious ways in which we can't do this, but this should not be relevant > to either the static or dynamic type system. I agree that B3 and B4 should be as similar as possible, but we still need Class.isPrimitive() to return true only for builtin primitives, to stay backward compatible. Rémi > On 12/20/2021 2:05 PM, Remi Forax wrote: >> Brian, >> the last time we talked about IdentityObject and ValueObject, you said that you >> were aware that introducing those interfaces will break some existing code, >> but you wanted to know if it was a lot of code or not. >> So I do not understand now why you want to mix IdentityObject/ValueObject with >> the runtime behavior; it seems risky, and if we need to back out the introduction >> of those interfaces, it will be more work than it should be. >> Decoupling the typing part and the runtime behavior seems a better solution. >> Moreover, the split between IdentityObject and ValueObject makes less sense now >> that we have 3 kinds of value objects, the identityless reference (B2), the >> primitive (B3) and the builtin primitive (B4). >> Why do we want these types to be seen in the type system but not, for example, the >> set containing only B3 and B4? >> Rémi >>> From: "Brian Goetz" [ mailto:brian.goetz at oracle.com | ] >>> To: "daniel smith" [ mailto:daniel.smith at oracle.com | >>> ] , "Dan Heidinga" [ mailto:heidinga at redhat.com | ] >>> Cc: "John Rose" [ mailto:john.r.rose at oracle.com | ] , >>> "valhalla-spec-experts" [ mailto:valhalla-spec-experts at openjdk.java.net | >>> ] >>> Sent: Monday 20 December 2021 18:54:01 >>> Subject: Re: JEP update: Value Objects >>> I was working on some docs and am not sure if we came to a conclusion on the >>> rules about who may, may not, or must declare ValueObject or IdentityObject. >>> Let me see if I can chart the boundaries of the design space.
I'll start with >>> IdentityObject since it is more constrained. >>> - Clearly for legacy classes, the VM is going to have to infer and inject >>> IdentityObject. >>> - Since IdentityObject is an interface, it is inherited; if my super implements >>> IO, so am I. >>> - It seems desirable that a user be *allowed* to name IdentityObject as a >>> superinterface of an interface or abstract class, which constrains what >>> subclasses can do. (Alternately we could spell this "value interface" or "value >>> abstract class"; this is a separate set of tradeoffs.) >>> - There is value in having exactly one way to say certain things; it reduces the >>> space of what has to be specified and tested. >>> - I believe our goal is to know everything we need to know at class load time, >>> and not to have to go back and do complex checks on a supertype when a subclass >>> is loaded. >>> The choice space seems to be >>> user { must, may, may not } specify IO on concrete classes >>> x compiler { must, may, may not } specify IO when ACC_VALUE present >>> x VM (and reflection) { mops up } >>> where "mopping up" minimally includes dealing with legacy classfiles. >>> Asking the user to say "IdentityObject" on each identity class seems ridiculous, >>> so we can drop that one. >>> user { may, may not } specify IO on concrete classes >>> x compiler { must, may, may not } specify IO when ACC_VALUE present >>> x VM (and reflection) { mops up } >>> From a user model perspective, it seems arbitrary to say the user may not >>> explicitly say IO for concrete classes, but may so do for abstract classes. So >>> the two consistent user choices are either: >>> - User can say "implements IO" anywhere they like >>> - User cannot say "implements IO" anywhere, and instead we have an "identity" >>> modifier which is optional on concrete classes and acts as a constraint on >>> abstract classes/interfaces. 
>>> While having an "identity" modifier is nice from a completeness perspective, the >>> fact that it is probably erased to "implements IdentityObject" creates >>> complication for reflection (and another asymmetry between reflection and >>> javax.lang.model). So it seems that just letting users say "implements >>> IdentityObject" is reasonable. >>> Given that the user has a choice, there is little value in "compiler may not >>> inject", so the choice for the compiler here is "must" vs "may" inject. Which >>> is really asking whether we want to draw the VM line at legacy vs new >>> classfiles, or merely adding IO as a default when nothing else has been >>> selected. Note that asking the compiler to inject based on ACC_VALUE is also >>> asking pretty much everything that touches bytecode to do this too, and likely >>> to generate more errors from bytecode manglers. The VM is doing inference >>> either way, what we get to choose here is the axis. >>> Let's put a pin in IO and come back to VO. >>> The user is already saying "value", and we're stuck with the default being >>> "identity". Unless we want to have the user say "value interface" for a >>> value-only interface (which moves some complexity into reflection, but is also >>> a consistent model), I think we're stuck with letting the user specify either >>> IO/VO on an abstract class / interface, which sort of drags us towards letting >>> the user say it (redundantly) on concrete classes too. >>> The compiler and VM will always type-check the consistency of the value >>> keyword/bit and the implements clause. So the real question is where the >>> inference/injection happens. And the VM will have to do injection for at least >>> IO at least for legacy classes. 
>>> So the choices for VM infer&inject seem to be: >>> - Only inject IO for legacy concrete classes, based on classfile version, >>> otherwise require everything to be explicit; >>> - Inject IO for concrete classes when ACC_VALUE is not present, require VO to be >>> explicit; >>> - Inject IO for concrete classes when ACC_VALUE is not present; inject VO for >>> concrete classes when ACC_VALUE is present >>> Is infer&inject measurably more costly than just ordinary classfile checking? It >>> seems to me that if all things are equal, the simpler injection rule is >>> preferable (the third), mostly on the basis of what it asks of humans who write >>> code to manipulate bytecode, but if there's a real cost to the injection, then >>> having the compiler help out is reasonable. (But in that case, it probably >>> makes sense for the compiler to help out in all cases, not just VO.) >>> On 12/2/2021 6:11 PM, Dan Smith wrote: >>>>> On Dec 2, 2021, at 1:04 PM, Dan Heidinga [ mailto:heidinga at redhat.com | >>>>> ] wrote: >>>>> On Thu, Dec 2, 2021 at 10:05 AM Dan Smith [ mailto:daniel.smith at oracle.com | >>>>> ] wrote: >>>>>> On Dec 2, 2021, at 7:08 AM, Dan Heidinga [ mailto:heidinga at redhat.com | >>>>>> ] wrote: >>>>>> When converting back from our internal form to a classfile for the >>>>>> JVMTI RetransformClasses agents, I need to either filter the interface >>>>>> out if we injected it or not if it was already there. JVMTI's >>>>>> GetImplementedInterfaces call has a similar issue with being >>>>>> consistent - and that's really the same issue as reflection. >>>>>> There's a lot of small places that can easily become inconsistent - >>>>>> and therefore a lot of places that need to be checked - to hide >>>>>> injected interfaces. The easiest solution to that is to avoid >>>>>> injecting interfaces in cases where javac can do it for us so the VM >>>>>> has a consistent view. >>>>>> I think you may be envisioning extra complexity that isn't needed here. 
The plan >>>>>> of record is that we *won't* hide injected interfaces. >>>>> +1. I'm 100% on board with this approach. It cleans up a lot of the >>>>> potential corner cases. >>>>>> Our hope is that the implicit/explicit distinction is meaningless: that turning >>>>>> implicit into explicit via JVMTI would be a 100% equivalent change. I don't >>>>>> know JVMTI well, so I'm not sure if there's some reason to think that wouldn't >>>>>> be acceptable... >>>>> JVMTI's "GetImplementedInterfaces" spec will need some adaptation as >>>>> it currently states "Return the direct super-interfaces of this class. >>>>> For a class, this function returns the interfaces declared in its >>>>> implements clause." >>>>> The ClassFileLoadHook (CFLH) runs either with the original bytecodes >>>>> as passed to the VM (the first time) or with "morally equivalent" >>>>> bytecodes recreated by the VM from its internal classfile formats. >>>>> The first time through the process the agent may see a value class >>>>> that doesn't have the VO interface directly listed while after a call >>>>> to {retransform,redefine}Classes, the VO interface may be directly >>>>> listed. The same issues apply to the IO interface with legacy >>>>> classfiles so with some minor spec updates, we can paper over that. >>>>> Those are the only two places: GetImplementedInterfaces & CFLH and >>>>> related redefine/retransform functions, I can find in the JVMTI spec >>>>> that would be affected. Some minor spec updates should be able to >>>>> address both to ensure an inconsistency in the observed behaviour is >>>>> treated as valid. >>>> Useful details, thanks. >>>> Would it be a problem if the ClassFileLoadHook gives different answers depending >>>> on the timing of the request (derived from original bytecodes vs. JVM-internal >>>> data)? If we need consistent answers, it may be that the "original bytecode" >>>> approach needs to reproduce the JVM's inference logic.
If it's okay for the >>>> answers to change, there's less work to do. >>>> To highlight your last point: we *will* need to work this out for inferred >>>> IdentityObject, whether we decide to infer ValueObject or not. From brian.goetz at oracle.com Tue Dec 21 00:07:15 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 20 Dec 2021 19:07:15 -0500 Subject: [External] : Re: Do we even need IO/VO interfaces? (was: JEP update: Value Objects) In-Reply-To: <410768406.203519.1640044836675.JavaMail.zimbra@u-pem.fr> References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <82A9C5AA-F0F3-4FB7-BF36-B6557103080E@oracle.com> <816087489.174195.1640027158110.JavaMail.zimbra@u-pem.fr> <410768406.203519.1640044836675.JavaMail.zimbra@u-pem.fr> Message-ID: <9cc80909-6f9d-d604-26f7-bb387ccc677b@oracle.com> > > > Introducing new interfaces that have no methods is clearly source- > and binary compatible, so I am not particularly compelled by "some > very brittle and badly written code might break." So far, no one > has proposed any examples that would make us reconsider that. > > > ??; > you are forgetting inference, this code will fail to compile > class A {} > class B {} > var list = List.of(new A(), new B()); > List<Object> list2 = list; > Good catch. There is precedent for leaving certain interfaces out of inference, though; I suspect we will want to do this for these interfaces too. From forax at univ-mlv.fr Tue Dec 21 01:00:05 2021 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 21 Dec 2021 02:00:05 +0100 (CET) Subject: [External] : Re: Do we even need IO/VO interfaces?
(was: JEP update: Value Objects) In-Reply-To: <9cc80909-6f9d-d604-26f7-bb387ccc677b@oracle.com> References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <816087489.174195.1640027158110.JavaMail.zimbra@u-pem.fr> <410768406.203519.1640044836675.JavaMail.zimbra@u-pem.fr> <9cc80909-6f9d-d604-26f7-bb387ccc677b@oracle.com> Message-ID: <802119714.214685.1640048405163.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "daniel smith" , "Dan Heidinga" > , "John Rose" , > "valhalla-spec-experts" > Sent: Tuesday 21 December 2021 01:07:15 > Subject: Re: [External] : Re: Do we even need IO/VO interfaces? (was: JEP > update: Value Objects) >>> Introducing new interfaces that have no methods is clearly source- and binary >>> compatible, so I am not particularly compelled by "some very brittle and badly >>> written code might break." So far, no one has proposed any examples that would >>> make us reconsider that. >> ??; >> you are forgetting inference, this code will fail to compile >> class A {} >> class B {} >> var list = List.of(new A(), new B()); >> List<Object> list2 = list; > Good catch. There is precedent for leaving certain interfaces out of inference, > though; I suspect we will want to do this for these interfaces too. The problem is that these interfaces are only useful if they are propagated along the expression flow. But: - if something is typed Object or Object[], that information is lost; - if something is typed with an interface, that information is lost (only the concrete classes implement those interfaces); - and you are saying that in case of inference, they are removed from the flow too. It seems they are only useful once in a blue moon.
Rémi From daniel.smith at oracle.com Tue Dec 21 20:07:17 2021 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 21 Dec 2021 20:07:17 +0000 Subject: JEP update: Value Objects In-Reply-To: References: <68250ADC-90BB-43EC-A646-77127091D4BD@oracle.com> <117E6CD9-9D94-4110-BA40-3778FC207977@oracle.com> <8AD4B184-2937-4146-A763-612E31E64683@oracle.com> <6776971B-F8B1-416D-8A4F-32EAE842AC03@oracle.com> <82A9C5AA-F0F3-4FB7-BF36-B6557103080E@oracle.com> Message-ID: > On Dec 20, 2021, at 10:54 AM, Brian Goetz wrote: > > > So the choices for VM infer&inject seem to be: > > - Only inject IO for legacy concrete classes, based on classfile version, otherwise require everything to be explicit; > - Inject IO for concrete classes when ACC_VALUE is not present, require VO to be explicit; > - Inject IO for concrete classes when ACC_VALUE is not present; inject VO for concrete classes when ACC_VALUE is present > One more dimension to this is whether "inject" and "require" are talking about an element in the `interfaces` array of the declaration, or simply the presence of the interface via some combination of inheritance/declaration. The latter seems more natural. But in "require" cases, it leads to surprising binary incompatibilities (per some comments I made earlier in the thread): 1) declare `interface Foo extends ValueObject` and `value class Bar implements Foo` 2) compile; javac excludes ValueObject from Bar's `interfaces` 3) Modify Foo, removing `extends ValueObject` (turns out I was overly eager when I put in that constraint, and I actually wouldn't mind subclasses that are identity classes) 4) recompile Foo separately, which succeeds 5) Try running, and discover that class Bar refuses to load, with an error saying it doesn't implement ValueObject ("of course it does!" you say, "it's a value class") Inference is nice in that it will happily paper over these sorts of separate compilation mismatches.
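Dan's distinction between "an element in the `interfaces` array" and "presence via inheritance" corresponds to two reflective queries on a current JDK: Class.getInterfaces() returns only the directly declared interfaces, while Class.isAssignableFrom also accounts for inherited ones. ValueMarker, Foo and Bar below are hypothetical stand-ins for the scenario's ValueObject, Foo and Bar.

```java
import java.util.Arrays;

class DeclaredVsInherited {
    // Hypothetical stand-ins: ValueMarker plays the role of ValueObject.
    interface ValueMarker {}
    interface Foo extends ValueMarker {}
    static final class Bar implements Foo {}

    public static void main(String[] args) {
        // Bar's own `interfaces` array lists only Foo...
        System.out.println(Arrays.toString(Bar.class.getInterfaces()));
        // ...yet ValueMarker is still present through inheritance.
        System.out.println(ValueMarker.class.isAssignableFrom(Bar.class)); // true
    }
}
```

Under the inheritance-based reading, whether Bar passes a load-time check depends on Foo's current supertypes, which is why recompiling Foo alone can break Bar in step 5.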
From brian.goetz at oracle.com Thu Dec 23 17:14:43 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 23 Dec 2021 17:14:43 +0000 Subject: Updated State of Valhalla documents Message-ID: <09EED588-7A6E-4BB8-8DCA-08C29F4E3D73@oracle.com> Just in time for Christmas, the latest State of Valhalla is available! https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/01-background https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/02-object-model https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/03-vm-model The main focus for the last year has been finding the right way to expose the Valhalla features in the user model, in a way that is cleanly factored, intuitive, and clearly connects with where the platform has come from. I am very pleased with where this has landed. There are several more installments in the works, but these should give plenty to chew on for now! Simple corrections accepted as PRs against valhalla-docs. From forax at univ-mlv.fr Thu Dec 23 18:35:16 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 23 Dec 2021 19:35:16 +0100 (CET) Subject: Updated State of Valhalla documents In-Reply-To: <09EED588-7A6E-4BB8-8DCA-08C29F4E3D73@oracle.com> References: <09EED588-7A6E-4BB8-8DCA-08C29F4E3D73@oracle.com> Message-ID: <1471584940.1211017.1640284516318.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "valhalla-spec-experts" > Sent: Thursday, December 23, 2021 6:14:43 PM > Subject: Updated State of Valhalla documents > Just in time for Christmas, the latest State of Valhalla is available! 
> https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/01-background > https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/02-object-model > https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/03-vm-model > The main focus for the last year has been finding the right way to expose the > Valhalla features in the user model, in a way that is cleanly factored, > intuitive, and clearly connects with where the platform has come from. I am > very pleased with where this has landed. > There are several more installments in the works, but these should give plenty > to chew on for now! I've done a quick read. In the object-model document, "primitive class Point implements Serializable" should be "primitive Point implements Serializable": "value" is a modifier but "primitive" is a top-level type. The design in part 3 is cool because, if I'm not mistaken, you can implement value classes without support for Qtypes in the classfile. Rémi From john.r.rose at oracle.com Thu Dec 23 18:51:14 2021 From: john.r.rose at oracle.com (John Rose) Date: Thu, 23 Dec 2021 18:51:14 +0000 Subject: Updated State of Valhalla documents In-Reply-To: <1471584940.1211017.1640284516318.JavaMail.zimbra@u-pem.fr> References: <09EED588-7A6E-4BB8-8DCA-08C29F4E3D73@oracle.com> <1471584940.1211017.1640284516318.JavaMail.zimbra@u-pem.fr> Message-ID: <112398AD-D910-4AF4-8644-844A07DE6539@oracle.com> On Dec 23, 2021, at 10:35 AM, Remi Forax wrote:
________________________________ From: "Brian Goetz" To: "valhalla-spec-experts" Sent: Thursday, December 23, 2021 6:14:43 PM Subject: Updated State of Valhalla documents Just in time for Christmas, the latest State of Valhalla is available! https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/01-background https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/02-object-model https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/03-vm-model The main focus for the last year has been finding the right way to expose the Valhalla features in the user model, in a way that is cleanly factored, intuitive, and clearly connects with where the platform has come from. I am very pleased with where this has landed. There are several more installments in the works, but these should give plenty to chew on for now! I've done a rapid reading, in the object-model primitive class Point implements Serializable should be primitive Point implements Serializable "value" is a modifier but "primitive" is a top level type. I call bike shed on that! Since a primitive class file defines two types, we have a choice in how to convey that in the source notation. This may evolve further, of course, even to the place you suggest. The design in part 3 is cool, because if i'm not mistaken, you can implement value classes without the support of Qtype in the classfile. Thank you. That is correct! This is a big result of the refactoring work, and it leads to a lower total complexity.
Rémi From forax at univ-mlv.fr Thu Dec 23 19:26:08 2021 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 23 Dec 2021 20:26:08 +0100 (CET) Subject: Updated State of Valhalla documents In-Reply-To: <112398AD-D910-4AF4-8644-844A07DE6539@oracle.com> References: <09EED588-7A6E-4BB8-8DCA-08C29F4E3D73@oracle.com> <1471584940.1211017.1640284516318.JavaMail.zimbra@u-pem.fr> <112398AD-D910-4AF4-8644-844A07DE6539@oracle.com> Message-ID: <394037657.1216754.1640287568639.JavaMail.zimbra@u-pem.fr> > From: "John Rose" > To: "Remi Forax" > Cc: "Brian Goetz" , "valhalla-spec-experts" > > Sent: Thursday, December 23, 2021 7:51:14 PM > Subject: Re: Updated State of Valhalla documents >> On Dec 23, 2021, at 10:35 AM, Remi Forax wrote: >>> From: "Brian Goetz" >>> To: "valhalla-spec-experts" >>> Sent: Thursday, December 23, 2021 6:14:43 PM >>> Subject: Updated State of Valhalla documents >>> Just in time for Christmas, the latest State of Valhalla is available! >>> https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/01-background >>> https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/02-object-model >>> https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/03-vm-model >>> The main focus for the last year has been finding the right way to expose the >>> Valhalla features in the user model, in a way that is cleanly factored, >>> intuitive, and clearly connects with where the platform has come from. I am >>> very pleased with where this has landed. >>> There are several more installments in the works, but these should give plenty >>> to chew on for now!
>> I've done a rapid reading, >> in the object-model >> primitive class Point implements Serializable >> should be >> primitive Point implements Serializable >> "value" is a modifier but "primitive" is a top level type. > I call bike shed on that! Since a primitive class file defines two types we have > a choice in how to convey that in the source notation. This may evolve further > of course and even to the place you suggest. For "value", we know that we want value classes and value records, so it's more like a modifier. For primitive, do we want a primitive record? The VM supports it, but do we want to offer that possibility in Java? My gut feeling is that the answer is "No" because of what Kevin said earlier: we should drive users to use value classes instead of primitives. >> The design in part 3 is cool, because if i'm not mistaken, you can implement >> value classes without the support of Qtype in the classfile. > Thank you. That is correct! This is a big result of the refactoring work, and to > a lower total complexity. Yes! Rémi From john.r.rose at oracle.com Thu Dec 23 19:43:22 2021 From: john.r.rose at oracle.com (John Rose) Date: Thu, 23 Dec 2021 11:43:22 -0800 Subject: [External] : Re: Updated State of Valhalla documents In-Reply-To: <394037657.1216754.1640287568639.JavaMail.zimbra@u-pem.fr> References: <09EED588-7A6E-4BB8-8DCA-08C29F4E3D73@oracle.com> <1471584940.1211017.1640284516318.JavaMail.zimbra@u-pem.fr> <112398AD-D910-4AF4-8644-844A07DE6539@oracle.com> <394037657.1216754.1640287568639.JavaMail.zimbra@u-pem.fr> Message-ID: <056021F6-4BDF-43C8-A3DC-C9D2B064FC59@oracle.com> On 23 Dec 2021, at 11:26, forax at univ-mlv.fr wrote: > For "value", we know that we want value class and value record, so > it's more like a modifier. > For primitive, do we want a primitive record ? The VM supports it, but > do we want to offer that possibility in Java ?
> My gut feeling is that the answer is "No" because of what Kevin said > earlier, we should drive users to use value classes instead of > primitives. Good points, though not sure if they carry the decision completely the other way. The VM sees primitive as a classfile modifier. (The `ACC_PRIMITIVE` modifier flag!) You are raising the question of whether this is smart for the language as well. For further discussion and perhaps experimentation. From forax at univ-mlv.fr Thu Dec 23 19:58:03 2021 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 23 Dec 2021 20:58:03 +0100 (CET) Subject: [External] : Re: Updated State of Valhalla documents In-Reply-To: <056021F6-4BDF-43C8-A3DC-C9D2B064FC59@oracle.com> References: <09EED588-7A6E-4BB8-8DCA-08C29F4E3D73@oracle.com> <1471584940.1211017.1640284516318.JavaMail.zimbra@u-pem.fr> <112398AD-D910-4AF4-8644-844A07DE6539@oracle.com> <394037657.1216754.1640287568639.JavaMail.zimbra@u-pem.fr> <056021F6-4BDF-43C8-A3DC-C9D2B064FC59@oracle.com> Message-ID: <313972565.1229854.1640289483686.JavaMail.zimbra@u-pem.fr> > From: "John Rose" > To: "Remi Forax" > Cc: "Brian Goetz" , "valhalla-spec-experts" > > Sent: Thursday, December 23, 2021 8:43:22 PM > Subject: Re: [External] : Re: Updated State of Valhalla documents > On 23 Dec 2021, at 11:26, [ mailto:forax at univ-mlv.fr | forax at univ-mlv.fr ] > wrote: >> For "value", we know that we want value class and value record, so it's more >> like a modifier. >> For primitive, do we want a primitive record ? The VM supports it, but do we >> want to offer that possibility in Java ? >> My gut feeling is that the answer is "No" because of what Kevin said earlier, we >> should drive users to use value classes instead of primitives. > Good points, though not sure if they carry the decision completely the other > way. The VM sees primitive as a classfile modifier. (The ACC_PRIMITIVE modifier > flag!) You are raising the question of whether this is smart for the language > as well. 
For further discussion and perhaps experimentation. This rejoins the discussion about where to cut. From the VM POV, which is interested in the runtime characteristics, we have either classical classes or value types, and value types can be value classes or primitive classes. But for Java, I would argue that the model is rather: we have either reference objects or primitives, and among reference objects there are those with identity and those without, hence "primitive" being a top-level kind while "value" (or a better term) is a modifier. Rémi From ali.ebrahimi1781 at gmail.com Fri Dec 24 15:18:06 2021 From: ali.ebrahimi1781 at gmail.com (Ali Ebrahimi) Date: Fri, 24 Dec 2021 18:48:06 +0330 Subject: Updated State of Valhalla documents In-Reply-To: <09EED588-7A6E-4BB8-8DCA-08C29F4E3D73@oracle.com> References: <09EED588-7A6E-4BB8-8DCA-08C29F4E3D73@oracle.com> Message-ID: Hi Brian, Thanks for sharing the project's latest direction. In the "Identifying identity" part of the 02-object-model doc: Identifying identity To distinguish between *primitive* and identity classes at compile and run time, we introduce two restricted interfaces IdentityObject and ValueObject. *I think you mean value instead of primitive:* To distinguish between *value* and *identity* classes ............. On Thu, Dec 23, 2021 at 8:44 PM Brian Goetz wrote: > Just in time for Christmas, the latest State of Valhalla is available! > > > https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/01-background > > https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/02-object-model > > https://openjdk.java.net/projects/valhalla/design-notes/state-of-valhalla/03-vm-model > > The main focus for the last year has been finding the right way to expose > the Valhalla features in the user model, in a way that is cleanly factored, > intuitive, and clearly connects with where the platform has come from. I > am very pleased with where this has landed.
> > There are several more installments in the works, but these should give > plenty to chew on for now! > > Simple corrections accepted as PRs against valhalla-docs. > > > -- Best Regards, Ali Ebrahimi