From brian.goetz at oracle.com Sat Jul 1 01:38:19 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 30 Jun 2023 21:38:19 -0400
Subject: We don't need no stinkin' Q descriptors
In-Reply-To: <6367D637-CE41-4502-BD9B-3580FA5970DE@oracle.com>
References: <6367D637-CE41-4502-BD9B-3580FA5970DE@oracle.com>
Message-ID:

> "We won't necessarily keep the Q forever, but it will help us, during
> prototyping, to clearly map all of the places where value-ness needs
> to be tracked." I remember thinking, "OK, but we'll never get rid
> of it; it's too obviously correct."

Or, if not too obviously correct, too nuisancefully attractive. We all got comfortable with the Qs, and when it was time to give them up, we all resisted (for different reasons). For me, what sold me here is the degree to which we are aligning with the language model, and how it seems a shorter hop from here to enforced `String!` putfields and to species.

> Another result was yearly struggle sessions about how we were ever going
> to handle migration of Optional, LocalDate, etc. I'm surprised and glad
> that we have come to a place of maximum erasure, where (a) all the places
> where Q-ness needs mapping have been mapped, and yet (b) there is now no
> remaining migration problem (despite no investment in high-tech API
> bridges).

Indeed, I remember a time when we thought "we might be able to migrate Optional to B2, but never to B3", and Kevin thought I was crazy to even imagine that. The various moves have been like the rebalancing of an AVL tree; as you rotate left, then the left side is unbalanced, and you rotate the left subtree right, and then ...

> Along the way Dan S. started quietly talking about Type Restrictions,
> which seemed (at first) to be some high-tech ceremony for stuff that could
> just as easily be spelled with Q's. I'm glad he persisted, because now
> they seem to be the right-sized bucket in which to place the Q-signals,
> after Q's go away.

Yeah, sounded like magic pixie dust at first to me too. To me, the huge (implementation-driven) breakthrough was discovering that we could effectively scalarize L-value arguments and returns in the calling convention, at almost no extra cost over Q. This surprised me, and is what cracked the nut open that we could erase more aggressively. (Thanks Tobias!)

> I think one key principle here is to embrace erasure, and hide the
> presence of new refinement types from legacy code.

Indeed, the biggest temptation is the "but we can, and it might be useful." Leaving the level of Class (and descriptors) alone and working entirely at the next level down feels unnatural.

> Here is a complementary principle: In the VM, we should choose to support
> exactly and only those refinement types that support Valhalla's prime goals,
> which are data structure improvement (flattening). Since |String!| doesn't
> (yet) have a flattening story, |String!| should not be a (VM) representable
> type.

I'd agree, but only if I can add "yet". We should pathfind through the VM using the things we know we need now, but if we can harness value set restrictions later, this is a huge step towards aligning to the user model, where users will be beating down our door for `String!` and either living without it, or with a complex set of compiler-inserted, only-mostly-reliable checks.

> Why does |checkcast| get extra powers not enjoyed by the other two use
> cases? I think the answer is pretty simple: |checkcast| is the last
> resort for a Java compiler's T.S. (translation strategy); if some type
> cannot be represented on a VM container (and enforced by the verifier)
> then either it cannot be safely cast (leading to "unchecked" warnings)
> or else it must be dynamically checked (requiring a |checkcast|).

Compilers need checkcast when they use erasure; we're using a lot of erasure here, so we need the biggest checkcast we can get.
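
[Illustration, not part of the original message.] The link between erasure and checkcast is already visible in today's Java generics. The following is a minimal sketch, with a made-up class name, of how an unchecked conversion lets a wrongly typed element through and how the compiler-inserted checkcast at the use site is what finally catches it. It says nothing about how Valhalla would place its own checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
final class ErasureCheckcast {

    // Returns true if the compiler-inserted checkcast rejected the bad element.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean castFailsAtUse() {
        List raw = new ArrayList();
        raw.add(42);                 // an Integer sneaks in through the raw type
        List<String> strings = raw;  // unchecked conversion: nothing is checked here
        try {
            // After erasure, get() returns Object. javac emits a checkcast to
            // String before calling length(), and that is where this fails.
            return strings.get(0).length() < 0;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("checkcast caught the mismatch: " + castFailsAtUse());
    }
}
```

The point of the sketch is only that an erased representation moves type enforcement from the container to the cast, which is why a translation strategy built on erasure leans so heavily on checkcast.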

> Exactly where to put each |checkcast| (and where not to bother)
> is an interesting question; perhaps it's too much work to place
> them on every read of a field. (I think it's a good idea, because
> redundant checks are free in the VM and earlier checks are better
> than later ones.) But it seems very likely that at least field
> writes will benefit from checkcasts, for all types that are
> representable. And, note that type of |new B3![]| is representable.
> Its class will be |B3[].class|, but its representable type
> will be something like |NullRestrictedArray.of(B3.class)|.

Spoiler: I much prefer to check on write, and ultimately, push that check into the VM using the same value set restriction machinery as we do for B3!.

> I don't think there is a rift-healing move we could do with field
> declarations, since flat |int| fields are already fully supported.

We could be generous in linkage, treating "I" and "LInteger;" with a type restriction as the same type, but it doesn't seem worth it.

> Although it is technically an incompatibility, we might consider
> allowing legacy |int[]| arrays to interoperate with |Object[]|,
> so that more generic code can be written. That would be close to
> the spirit of allowing |B3![]| arrays be viewed transparently as
> possibly-null-tolerant |B3[]| arrays.

I think we concluded that this was probably a forced move at some point, but we should revisit that analysis.

> (There is no cause to ask that |int|, which isn't even a reference
> type, should somehow be made to look like a subtype of |Integer|.)

Yes, but its arrays may be a different story.

> I mean that the call |int.class.cast(x)| does not work, and lifting
> that non-behavior up to |RepresentableType| will make a new and
> unwelcome distinction between |B3!| and |int|: The mirror for |B3!|

(not actually a mirror)

From john.r.rose at oracle.com Sat Jul 1 04:05:19 2023
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 30 Jun 2023 21:05:19 -0700
Subject: We don't need no stinkin' Q descriptors
In-Reply-To:
References: <6367D637-CE41-4502-BD9B-3580FA5970DE@oracle.com>
Message-ID: <2771AAD4-B136-4889-8320-ACE9A293BB93@oracle.com>

On 30 Jun 2023, at 18:38, Brian Goetz wrote:

...

>> Here is a complementary principle: In the VM, we should choose to support
>> exactly and only those refinement types that support Valhalla's prime goals,
>> which are data structure improvement (flattening). Since |String!| doesn't
>> (yet) have a flattening story, |String!| should not be a (VM) representable
>> type.
>
> I'd agree, but only if I can add "yet".

Good! There was a "yet" in there, and I meant yet another "yet". (It was a silent "yet".)

...

> Spoiler: I much prefer to check on write, and ultimately, push that check into the VM using the same value set restriction machinery as we do for B3!.

(I teed that up for you. You might be right.)

>> I don't think there is a rift-healing move we could do with field
>> declarations, since flat |int| fields are already fully supported.
>
> We could be generous in linkage, treating "I" and "LInteger;" with a type restriction as the same type, but it doesn't seem worth it.

Yes!

>> Although it is technically an incompatibility, we might consider
>> allowing legacy |int[]| arrays to interoperate with |Object[]|,
>> so that more generic code can be written. That would be close to
>> the spirit of allowing |B3![]| arrays be viewed transparently as
>> possibly-null-tolerant |B3[]| arrays.
>
> I think we concluded that this was probably a forced move at some point, but we should revisit that analysis.
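
[Illustration, not part of the original message.] The int[] / Object[] rift in the quoted exchange can be seen in current Java. This is a minimal sketch with made-up names: an int[] is not an Object[], so a search routine written against Object[] has to be replicated by hand for int[].

```java
// Hypothetical demo class; the names are illustrative only.
final class IntArrayRift {

    static boolean isObjectArray(Object o) {
        return o instanceof Object[];
    }

    // Reference version: works for Integer[], String[], and any other Object[].
    static int indexOf(Object[] a, Object key) {
        for (int i = 0; i < a.length; i++) {
            if (key.equals(a[i])) {
                return i;
            }
        }
        return -1;
    }

    // Hand-replicated primitive version, needed because int[] is not an Object[].
    static int indexOf(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(isObjectArray(new Integer[] {1, 2, 3}));
        System.out.println(isObjectArray(new int[] {1, 2, 3}));
        System.out.println(indexOf(new Integer[] {1, 2, 3}, 2));
        System.out.println(indexOf(new int[] {1, 2, 3}, 2));
    }
}
```

The two indexOf bodies differ only in the element comparison, which is the kind of duplication that an |int[] <: Object[]| relationship would aim to remove.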

Looking at Arrays.java and ArrayList.java, and thinking about T=int in those algorithms and containers and others, made it clear the choice is between |int[] <: Object[]| or replicated bytecodes, either by hand (like we do in Arrays.java) or somehow else (circa Model-2).

>> I mean that the call |int.class.cast(x)| does not work, and lifting
>> that non-behavior up to |RepresentableType| will make a new and
>> unwelcome distinction between |B3!| and |int|: The mirror for |B3!|
>
> (not actually a mirror)

Sorta-mirror. Mirror of a different class. Mirror-oid.

From daniel.smith at oracle.com Wed Jul 12 15:01:20 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 12 Jul 2023 15:01:20 +0000
Subject: EG meeting 2023-07-12
Message-ID: <83937DB1-F69C-4B4F-AD5A-524AE58E57CB@oracle.com>

An EG meeting will be held today, July 12, at 4pm UTC (9am PDT, 12pm EDT).

Brian recently shared a summary about side-channel refinement types as a replacement for Q types. We can discuss this further, as interested.

From forax at univ-mlv.fr Thu Jul 13 07:39:23 2023
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 13 Jul 2023 09:39:23 +0200 (CEST)
Subject: The last miles
Message-ID: <1724794551.103459871.1689233963536.JavaMail.zimbra@univ-eiffel.fr>

Hi all,
if we take a step back and think about how value types are currently implemented, a good retconning is that value classes are classical classes that behave slightly differently at runtime when they are JIT optimized, and if an optimization has to be done before JIT time (class layout, for example) then we use side channels (Preload attribute and field attribute) so the information needed for the optimizations is available before JIT time.

There is one area where this is not true: class instantiation. For class instantiation, currently the Java compiler transforms the bytecode; this has two major disadvantages: we need two new bytecodes and a special factory method, and we do not support direct instantiation, e.g. new Integer().
I wonder if we cannot revisit that now and solve this last-miles issue.

I believe that the important property for value class instantiation is that "this" should not escape the constructor, and that instead of having the compiler rewrite the bytecode, the constructor can be a classical method: the verifier will check that "this" does not escape, the JIT cannot delay the initialization of the value type before its uses (this is already true because all fields are final), so at runtime no code can see a half-baked value class instance.

Obviously, I'm not a VM implementor and I barely know how things really work, so it might be a fantasy, but I think it's one that is worth trying to make a reality.

Rémi

From brian.goetz at oracle.com Thu Jul 13 14:24:11 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 13 Jul 2023 14:24:11 +0000
Subject: The last miles
In-Reply-To: <1724794551.103459871.1689233963536.JavaMail.zimbra@univ-eiffel.fr>
References: <1724794551.103459871.1689233963536.JavaMail.zimbra@univ-eiffel.fr>
Message-ID: <940032A1-D914-474B-8473-7DAE200ACF40@oracle.com>

This is a good thought; we split the initialization protocol and it's a fair question to ask whether we can go back to a lump.

In this case, I suspect John is about to say "Please let's not give the verifier any more jobs to do."

> On Jul 13, 2023, at 3:39 AM, Remi Forax wrote:
>
> Hi all,
> if we take a step back and think about how value types are currently implemented,
> a good retconning is that value classes are classical classes that behave slightly differently at runtime when they are JIT optimized,
> and if an optimization has to be done before JIT time (class layout, for example) then we use side channels (Preload attribute and field attribute) so the information needed for the optimizations is available before JIT time.
>
> There is one area where this is not true: class instantiation. For class instantiation, currently the Java compiler transforms the bytecode;
> this has two major disadvantages: we need two new bytecodes and a special factory method, and we do not support direct instantiation, e.g. new Integer().
> I wonder if we cannot revisit that now and solve this last-miles issue.
>
> I believe that the important property for value class instantiation is that "this" should not escape the constructor, and that instead of having the compiler rewrite the bytecode, the constructor can be a classical method: the verifier will check that "this" does not escape, the JIT cannot delay the initialization of the value type before its uses (this is already true because all fields are final), so at runtime no code can see a half-baked value class instance.
>
> Obviously, I'm not a VM implementor and I barely know how things really work, so it might be a fantasy, but I think it's one that is worth trying to make a reality.
>
> Rémi

From john.r.rose at oracle.com Thu Jul 13 20:52:38 2023
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 13 Jul 2023 13:52:38 -0700
Subject: The last miles
In-Reply-To: <940032A1-D914-474B-8473-7DAE200ACF40@oracle.com>
References: <1724794551.103459871.1689233963536.JavaMail.zimbra@univ-eiffel.fr> <940032A1-D914-474B-8473-7DAE200ACF40@oracle.com>
Message-ID: <939E976C-5088-4AE2-987E-D5EFBEF734C1@oracle.com>

On 13 Jul 2023, at 7:24, Brian Goetz wrote:

> This is a good thought; we split the initialization protocol and it's a fair question to ask whether we can go back to a lump.
>
> In this case, I suspect John is about to say "Please let's not give the verifier any more jobs to do."

It is that, and even worse. If you work the details, you'll quickly run into the fact that the protocol (for Java constructors) builds an object but does not return the new object, it takes the new object from the caller in a tabula rasa (blank) state, and pokes values into it.
Worse, the new object is supplied (by a new opcode) from an untrusted (even hostile) client. That means that the verifier needs complex rules (>10% of the total complexity) to track these untrusted-but-trusted blank objects and make sure they are handed to <init> before being used. That's bad. We have a steady bug stream from this very delicate machinery. Maybe it's done after a quarter century but I wouldn't bet the farm on that.

Worse still, for values, there is no architecturally defined state which corresponds to the "tabula rasa" state of the receiver of an <init> call. We know something of that state; it is called a "larval object", but the Valhalla JVMS does not define or rely on it. The proposed "unification" would require us to somehow simulate larval objects in terms of today's blank identity objects, and define how the larval-to-adult state transition works, or it would have to build new verifier rules for larval objects (mutable while <init> runs, then pure values after that). Either option seems much worse than what we have chosen to do so far.

What we have chosen to do so far is have a functionally clean model for value objects that does not require mutability, either temporary (larval-only) or permanent (I shudder at that thought). This functionally clean model uses withfield instead of putfield, and aconst_init instead of the "new" opcode. I think that is a great trade, because it lets us off the hook from defining mutability into values, at any stage of their lifetimes.

Yes, serialization smuggles larval mutability back in, but that's a private matter of optimization, between the VM and JDK. I really don't want to see that in the JVMS, because it would be just as hairy and complex and bug-prone as today's new/<init> dance. Yes, we should use the old mechanisms when we can, and we do!
But the new/<init> dance is, IMO, hopelessly entangled with a presupposition of object identity, and also hopelessly buggy; so I don't think it can help us, and I wouldn't touch to extend it even if I thought it might help.

How's that? :-)

From john.r.rose at oracle.com Thu Jul 13 20:55:42 2023
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 13 Jul 2023 13:55:42 -0700
Subject: The last miles
In-Reply-To: <939E976C-5088-4AE2-987E-D5EFBEF734C1@oracle.com>
References: <1724794551.103459871.1689233963536.JavaMail.zimbra@univ-eiffel.fr> <940032A1-D914-474B-8473-7DAE200ACF40@oracle.com> <939E976C-5088-4AE2-987E-D5EFBEF734C1@oracle.com>
Message-ID:

P.S. If the original designers of Java bytecode had allowed <init> to allocate its own object, and return it, we'd be having a different discussion. I wish it had been like that. I think it is a false economy to have "new X(...)" and "super(...)" call the same method symbol; that is the root of many evils.

On 13 Jul 2023, at 13:52, John Rose wrote:

> On 13 Jul 2023, at 7:24, Brian Goetz wrote:
>
>> This is a good thought; we split the initialization protocol and it's a fair question to ask whether we can go back to a lump.
>>
>> In this case, I suspect John is about to say "Please let's not give the verifier any more jobs to do."
>
> It is that, and even worse. If you work the details, you'll quickly run into the fact that the protocol (for Java constructors) builds an object but does not return the new object, it takes the new object from the caller in a tabula rasa (blank) state, and pokes values into it. Worse, the new object is supplied (by a new opcode) from an untrusted (even hostile) client. That means that the verifier needs complex rules (>10% of the total complexity) to track these untrusted-but-trusted blank objects and make sure they are handed to <init> before being used. That's bad. We have a steady bug stream from this very delicate machinery. Maybe it's done after a quarter century but I wouldn't bet the farm on that.
>
> Worse still, for values, there is no architecturally defined state which corresponds to the "tabula rasa" state of the receiver of an <init> call. We know something of that state; it is called a "larval object", but the Valhalla JVMS does not define or rely on it. The proposed "unification" would require us to somehow simulate larval objects in terms of today's blank identity objects, and define how the larval-to-adult state transition works, or it would have to build new verifier rules for larval objects (mutable while <init> runs, then pure values after that). Either option seems much worse than what we have chosen to do so far.
>
> What we have chosen to do so far is have a functionally clean model for value objects that does not require mutability, either temporary (larval-only) or permanent (I shudder at that thought). This functionally clean model uses withfield instead of putfield, and aconst_init instead of the "new" opcode. I think that is a great trade, because it lets us off the hook from defining mutability into values, at any stage of their lifetimes.
>
> Yes, serialization smuggles larval mutability back in, but that's a private matter of optimization, between the VM and JDK. I really don't want to see that in the JVMS, because it would be just as hairy and complex and bug-prone as today's new/<init> dance. Yes, we should use the old mechanisms when we can, and we do! But the new/<init> dance is, IMO, hopelessly entangled with a presupposition of object identity, and also hopelessly buggy; so I don't think it can help us, and I wouldn't touch to extend it even if I thought it might help.
>
> How's that?
:-)

From john.r.rose at oracle.com Thu Jul 13 21:03:46 2023
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 13 Jul 2023 21:03:46 +0000
Subject: The last miles
In-Reply-To: <939E976C-5088-4AE2-987E-D5EFBEF734C1@oracle.com>
References: <1724794551.103459871.1689233963536.JavaMail.zimbra@univ-eiffel.fr> <940032A1-D914-474B-8473-7DAE200ACF40@oracle.com> <939E976C-5088-4AE2-987E-D5EFBEF734C1@oracle.com>
Message-ID:

On Jul 13, 2023, at 1:52 PM, John Rose wrote:
>
> The proposed "unification" would require us to somehow simulate larval objects in terms of today's blank identity objects

P.P.S. That's almost possible if you declare that the new opcode makes a larval value, but closing it off is very hard. You need an explicit end-larval transition to adult. The verifier would have to enforce this. Nightmare.

From forax at univ-mlv.fr Thu Jul 13 22:05:09 2023
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 14 Jul 2023 00:05:09 +0200 (CEST)
Subject: The last miles
In-Reply-To: <939E976C-5088-4AE2-987E-D5EFBEF734C1@oracle.com>
References: <1724794551.103459871.1689233963536.JavaMail.zimbra@univ-eiffel.fr> <940032A1-D914-474B-8473-7DAE200ACF40@oracle.com> <939E976C-5088-4AE2-987E-D5EFBEF734C1@oracle.com>
Message-ID: <347077932.104195064.1689285909209.JavaMail.zimbra@univ-eiffel.fr>

----- Original Message -----
> From: "John Rose"
> To: "Brian Goetz"
> Cc: "Remi Forax" , "valhalla-spec-experts"
> Sent: Thursday, July 13, 2023 10:52:38 PM
> Subject: Re: The last miles

> On 13 Jul 2023, at 7:24, Brian Goetz wrote:
>
>> This is a good thought; we split the initialization protocol and it's a fair
>> question to ask whether we can go back to a lump.
>>
>> In this case, I suspect John is about to say "Please let's not give the verifier
>> any more jobs to do."
>
> It is that, and even worse.
If you work the details, you'll quickly run into
> the fact that the protocol (for Java constructors) builds an object but
> does not return the new object, it takes the new object from the caller in a
> tabula rasa (blank) state, and pokes values into it. Worse, the new object is
> supplied (by a new opcode) from an untrusted (even hostile) client. That means
> that the verifier needs complex rules (>10% of the total complexity) to track
> these untrusted-but-trusted blank objects and make sure they are handed to
> <init> before being used. That's bad. We have a steady bug stream from this
> very delicate machinery. Maybe it's done after a quarter century but I
> wouldn't bet the farm on that.
>
> Worse still, for values, there is no architecturally defined state
> which corresponds to the "tabula rasa" state of the receiver of an <init> call.
> We know something of that state; it is called a "larval object", but the
> Valhalla JVMS does not define or rely on it. The proposed "unification" would
> require us to somehow simulate larval objects in terms of today's blank
> identity objects, and define how the larval-to-adult state transition works, or
> it would have to build new verifier rules for larval objects (mutable while
> <init> runs, then pure values after that). Either option seems much worse than
> what we have chosen to do so far.
>
> What we have chosen to do so far is have a functionally clean model for value
> objects that does not require mutability, either temporary (larval-only) or
> permanent (I shudder at that thought). This functionally clean model uses
> withfield instead of putfield, and aconst_init instead of the "new" opcode. I
> think that is a great trade, because it lets us off the hook from defining
> mutability into values, at any stage of their lifetimes.
>
> Yes, serialization smuggles larval mutability back in, but that's a private
> matter of optimization, between the VM and JDK.
I really don't want to see
> that in the JVMS, because it would be just as hairy and complex and bug-prone
> as today's new/<init> dance. Yes, we should use the old mechanisms when we
> can, and we do! But the new/<init> dance is, IMO, hopelessly entangled with a
> presupposition of object identity, and also hopelessly buggy; so I don't think
> it can help us, and I wouldn't touch to extend it even if I thought it might
> help.
>
> How's that? :-)

Here, your analysis is based on the fact that neither the callsite nor the declaration site of <init> will change. We are less constrained than that: the callsite cannot be changed but the declaration of <init> can change, since recompiling the value class is something users will have to do anyway.

So the new + dup + invokespecial dance has to be the same, but not the semantics of each individual opcode, which can be adjusted to value classes (it's a lump move), and the content of <init> and even its descriptor can be different.

Here is what I propose:
- inside the value class, <init> should return the instance, so the descriptor should be (LComplex;)LComplex; instead of ()V for a constructor with no parameters.
  So inside the constructor, either "this" or the first parameter is ignored and withfield is used instead of putfield; the fully initialized instance is returned by <init>.
  The verifier is updated to understand the opcode "withfield".

- outside the value class, the semantics of the opcode "new" is changed to be the semantics of "aconst_init" if the class is a value class.
  The semantics of invokespecial Complex <init> ()V is changed if Complex is a value class, so it takes two instances plus the parameters from the stack and calls <init> (LComplex;)LComplex;

It's not beautiful, it's a hack, as Brian said it's a lump move.
But it's not as bad as you seem to think :)

Rémi

From daniel.smith at oracle.com Fri Jul 14 16:17:03 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 14 Jul 2023 16:17:03 +0000
Subject: The last miles
In-Reply-To: <347077932.104195064.1689285909209.JavaMail.zimbra@univ-eiffel.fr>
References: <1724794551.103459871.1689233963536.JavaMail.zimbra@univ-eiffel.fr> <940032A1-D914-474B-8473-7DAE200ACF40@oracle.com> <939E976C-5088-4AE2-987E-D5EFBEF734C1@oracle.com> <347077932.104195064.1689285909209.JavaMail.zimbra@univ-eiffel.fr>
Message-ID: <132DDB00-8313-4F7F-9781-EF856146FD81@oracle.com>

> On Jul 13, 2023, at 3:05 PM, forax at univ-mlv.fr wrote:
>
> - outside the value class, the semantics of the opcode "new" is changed to be the semantics of "aconst_init" if the class is a value class.
>   The semantics of invokespecial Complex <init> ()V is changed if Complex is a value class, so it takes two instances plus the parameters from the stack and calls <init> (LComplex;)LComplex;
>
> It's not beautiful, it's a hack, as Brian said it's a lump move. But it's not as bad as you seem to think :)

We sort of tried this when we looked at migrating existing 'new Integer(0)' calls in bytecode. There, we looked for a simple bytecode translation, turning this:

    new java/lang/Integer;
    dup;
    iconst_0;
    invokespecial java/lang/Integer.<init>:(I)V;

into this:

    aconst_init java/lang/Integer;
    dup;
    iconst_0;
    invokemagic java/lang/Integer.:(I)Ljava/lang/Integer;;

For full generality, the 'invokemagic' operation should:
- On entry, consume an Integer along with method args, as if it were invoking an instance method
- On return, replace every Integer on the stack/in locals that was produced by the same 'aconst_init' call as the "receiver" with the result

This seemed like a pretty significant new JVM behavior, so instead we tentatively proposed shifting the responsibility to an external compatibility tool, which could perform a bytecode rewrite and wasn't obligated to support all shapes.
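
[Illustration, not part of the original message.] The "replace every Integer on the stack/in locals" requirement comes from the copy that 'dup' leaves behind. The following is a source-level analogy only, with made-up class names, and not a model of actual JVM behavior: with an identity object, initialization mutates the receiver so every alias sees it; with an immutable value, initialization yields a new value and a stale alias keeps the default unless something replaces it.

```java
// Hypothetical demo class; a source-level analogy, not real bytecode semantics.
final class DupAliasing {

    // Identity-style object: init mutates the receiver in place.
    static final class IdentityBox {
        int value;

        void init(int v) {
            value = v;
        }
    }

    // Value-style object: immutable, so "init" must return a fresh instance.
    static final class ValueBox {
        final int value;

        ValueBox(int value) {
            this.value = value;
        }

        ValueBox init(int v) {
            return new ValueBox(v);
        }
    }

    // The alias plays the role of the copy made by 'dup' after 'new'.
    static int identityAliasAfterInit() {
        IdentityBox top = new IdentityBox();
        IdentityBox copy = top;
        top.init(7);
        return copy.value; // the alias observes the initialized state
    }

    // Here the alias is a copy of the default value, as after 'aconst_init; dup'.
    static int valueAliasAfterInit() {
        ValueBox top = new ValueBox(0);
        ValueBox copy = top;
        top.init(7);       // result discarded on purpose: nothing updates the alias
        return copy.value; // the alias still holds the default
    }

    static int valueResultAfterInit() {
        ValueBox top = new ValueBox(0);
        return top.init(7).value; // only the returned value is initialized
    }

    public static void main(String[] args) {
        System.out.println("identity alias: " + identityAliasAfterInit());
        System.out.println("value alias:    " + valueAliasAfterInit());
        System.out.println("value result:   " + valueResultAfterInit());
    }
}
```

In the analogy, an 'invokemagic' with full generality would have to make the stale alias behave like the returned result.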

(We should get data on this, but for presumably almost all constructor invocations, the "every Integer on the stack/in locals" reduces to "the variable currently on the top of the stack". Obviously a simpler problem, but not a constraint the JVM imposes.)

---

Another idea we've thought about is a magic '' method. It would look something like:

    iconst_0;
    invokestatic java/lang/Integer.:(I)Ljava/lang/Integer;

At the declaration site, it's *impossible to declare* a method. Instead, the behavior is implicit:
- If Integer is an identity class, the effect of '' is to do new/dup/<init>
- If Integer is a value class, the effect of '' is to call

(A sub-diversion we went on was to allow user-declared '' methods - factories, basically. But there are guarantees you get from direct 'new' calls that you lose if you let people write their own factories. We weren't comfortable with that.)

The magic '' method seems reasonable, and provides a path going forward to get rid of new/dup/<init> code. But it doesn't do anything for legacy bytecode. So the problem of needing a bytecode rewriting tool remains.

From john.r.rose at oracle.com Fri Jul 14 20:40:11 2023
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 14 Jul 2023 13:40:11 -0700
Subject: The last miles
In-Reply-To: <132DDB00-8313-4F7F-9781-EF856146FD81@oracle.com>
References: <1724794551.103459871.1689233963536.JavaMail.zimbra@univ-eiffel.fr> <940032A1-D914-474B-8473-7DAE200ACF40@oracle.com> <939E976C-5088-4AE2-987E-D5EFBEF734C1@oracle.com> <347077932.104195064.1689285909209.JavaMail.zimbra@univ-eiffel.fr> <132DDB00-8313-4F7F-9781-EF856146FD81@oracle.com>
Message-ID: <5BD44678-F230-4D71-BEC9-90820EC72BD2@oracle.com>

On 14 Jul 2023, at 9:17, Dan Smith wrote:

> ...
> The magic '' method seems reasonable, and provides a path going forward to get rid of new/dup/<init> code. But it doesn't do anything for legacy bytecode. So the problem of needing a bytecode rewriting tool remains.

If/when we decide to do that magic '', and newer classfiles avoid the new/dup/init dance, two good things will start to happen. First, migration to value types will become somewhat easier, avoiding recompilation in some cases (those cases where a client says "new SomeValue()"). Second, the "dance" will begin to disappear from classfiles, and we can think about disallowing it in newer classfiles, thus taking 10% off the top of verifier complexity and bugginess. I'd love to put "the dance" in our rear view mirror.

From daniel.smith at oracle.com Wed Jul 26 14:21:50 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 26 Jul 2023 14:21:50 +0000
Subject: EG meeting *canceled* 2023-07-26
Message-ID: <91590CC7-CEC5-4978-9B47-B1966A1837ED@oracle.com>

I'm on vacation today, no EG meeting.

August 9 will be during the JVM Language Summit, so we won't hold a meeting that day, either, but look forward to seeing many of you at the conference.

From heidinga at redhat.com Thu Jul 27 20:29:46 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Thu, 27 Jul 2023 16:29:46 -0400
Subject: We don't need no stinkin' Q descriptors
In-Reply-To:
References:
Message-ID:

Overall, this approach sounds reasonable to me and aligns with the direction we've been discussing during EG meetings. I do have some concerns about how it will be lowered to the classfile.

If I read this correctly, only two bytecodes will be impacted: checkcast and anewarray. Is that correct? Both bytecodes currently take a constant pool index to a CONSTANT_Class. One of the benefits of "Q"s was that they allowed us to "smuggle" the extra information we needed into the existing CONSTANT_Class without needing to introduce new constant pool forms. With the change to use RefinementTypes, I'm not clear on how we'll express the information in the constant pool without needing to make the bytecodes accept two different CP entry types, which is unfortunate (mostly for the interpreter).

One option is to introduce a new CONSTANT_RefinementType entry that expresses both the RefinementType ("rfType") and the base type ("baseType") being refined:

    CONSTANT_RefinementType_Info {
        u1 tag;           // CONSTANT_RefinementType 21
        u2 rfType_idx;    // CONSTANT_Class for the RefinementType {NullRestrictedClass, NullRestrictedArray}
        u2 baseType_idx;  // CONSTANT_Class for the baseType
    }

with some resolution rules that require "rfType" to be a subclass of RefinementType and "baseType" to agree with the constraints required by the rfType {implicitly constructable B3, or array of implicitly constructable B3}.

A naive interpreter implementation would need to check the "tag" in checkcast/anewarray before reading the CP data, but clever encodings should be able to avoid that overhead. And the JIT of course can figure it out cheaply at compile time, so no extra cost for the JIT.

This encoding won't necessarily create an instance of the RefinementType while resolving the CP entry, but it would have all the data necessary for the VM to enforce the requirements. It does push the knowledge of the restrictions for the refinement types into the VM (unfortunate) but there may be ways to pull those back up into the class library by using an upcall during resolution.

This is a rough sketch of a possible CP encoding to ensure there's a reasonable path forward here. Anyone have a better encoding or other thoughts on how this could be lowered to the classfile?

I was also wondering about how this will be expressed in MethodHandles if we don't differentiate Q from L any more.... It means we don't have a java.lang.Class for the "Q" flavour so we can't use MethodTypes to differentiate any more (and that includes losing asType / castArguments to do casts). Would we need to add a new MH combinator to handle RefinementType casts?

Sorry if this is jumping too far into the weeds; my thought process was to poke at challenging areas and ensure the bits hold together.
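
[Illustration, not part of the original message.] The CONSTANT_RefinementType_Info layout sketched in this message (one tag byte followed by two u2 constant pool indices) could be read as below. This is a minimal sketch under stated assumptions: the tag value 21 is the tentative number from the message and is not an assigned JVMS tag, and the class and method names are made up. It shows only the "check the tag, then read the indices" step that a naive interpreter would perform, with no resolution rules.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical reader for the proposed entry; not an existing JDK or JVMS structure.
final class RefinementTypeEntry {

    // Assumption: tentative tag value taken from the sketch in the message.
    static final int CONSTANT_REFINEMENT_TYPE = 21;

    final int rfTypeIndex;   // CP index of the CONSTANT_Class naming the refinement type
    final int baseTypeIndex; // CP index of the CONSTANT_Class naming the base type

    RefinementTypeEntry(int rfTypeIndex, int baseTypeIndex) {
        this.rfTypeIndex = rfTypeIndex;
        this.baseTypeIndex = baseTypeIndex;
    }

    // Reads u1 tag, u2 rfType_idx, u2 baseType_idx in classfile (big-endian) order.
    static RefinementTypeEntry read(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        if (tag != CONSTANT_REFINEMENT_TYPE) {
            throw new IOException("not a CONSTANT_RefinementType entry: tag " + tag);
        }
        int rfTypeIndex = in.readUnsignedShort();
        int baseTypeIndex = in.readUnsignedShort();
        return new RefinementTypeEntry(rfTypeIndex, baseTypeIndex);
    }

    // Convenience wrapper for in-memory bytes.
    static RefinementTypeEntry parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] entry = {21, 0, 5, 0, 9}; // tag 21, rfType_idx 5, baseType_idx 9
        RefinementTypeEntry parsed = parse(entry);
        System.out.println("rfType_idx=" + parsed.rfTypeIndex
                + ", baseType_idx=" + parsed.baseTypeIndex);
    }
}
```

The explicit tag test is the per-use overhead the message hopes a cleverer encoding could avoid.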
--Dan

On Fri, Jun 30, 2023 at 4:52 PM Brian Goetz wrote:

> This mail summarizes some discussions we've been having about eliminating Q descriptors from the VM design. Over time, we've been giving Q fewer and fewer jobs to do, to the point where (perhaps surprisingly) we can replace the remaining jobs with less intrusive mechanisms. Additionally, as the language model has simplified, the gap between the language and VM has increased, and the proposal herein offers a path to narrowing that gap.
>
> I'll be on vacation for a while, but Dan and John will be able to carry forward this discussion.
>
> Please bear in mind that this is a very rough draft of direction; we don't need to bikeshed anything right now, as much as agree that there is a better, simpler, more aligned direction than we had previously.
>
> We don't need no stinkin' Q types
>
> In the last six months, we made a significant breakthrough at the language/user level -- to decompose B3 with its value and reference companions, into two simpler concepts: implicit constructibility (a declaration-site property) and null restriction (a use-site property.) The .ref/.val distinction, and all its excess complexity, stemmed from the mistaken desire to model the int/Integer divide directly. By breaking B3-ness down into more "primitive" properties (some of which are shared with non-B3 classes), we arrived at a simpler model; no more ref/val projections, and more uniform treatment of X! (including for B1 and B2 classes).
>
> As we worked through the language and translation details, we continued to seek a lower energy state. We concluded that we can erase X! to LX; in a number of places (locals, method descriptors, verifier type system) while still meeting our performance objectives. Doing so eliminates a number of issues with method resolution and distinguishing overloads from overrides.
> In fact, we found ourselves using Q for fewer and fewer things, at which point we started to ask ourselves: do we need Q descriptors at all?
>
> In our VM, there is a (mostly) 1-1-1 correspondence between runtime types, descriptors, and class mirrors. In a world where QFoo and LFoo are separate runtime types, it makes sense for them to have their own descriptors and mirrors. But as Foo! and Foo? have come together in the language, mapping to a VM which sees them as separate runtime types starts to show gaps.
>
> The role of Q has historically been one of "other", rather than something on its own; any class which had a Q type, also had an L type, and Q was the "other flavor." The "two flavors" orientation made sense when we were modeling the int/Integer split; we needed two flavors for that in both language and VM. The language has since discovered that we can break down the int/Integer divide into two more primitive notions -- implicit constructibility (an int can be used without calling a constructor, an Integer cannot) and non-nullity (non-identity plus default constructibility plus non-nullity unlocks flattening.)
>
> If Q is a valid descriptor and there is always a Q mirror, we are in a stable place with respect to runtime types. But if we intend to allow m(Foo!) to override m(Foo?), to be tolerant of bang-mismatches in method resolution, and give Q fewer jobs, then we are moving to an unstable place. We've explored a number of "only use Q for certain things" positions, and have found many of them to be unstable in various ways. The other stable point is that there are no Q types, and no Q mirrors -- but then we need some new channel to encode the request to exclude null, and so give the VM the flattening hint that is needed.
>
> As it turns out, there are surprisingly few places that truly need such a new channel. We basically need the VM to take "Q-ness"
> into account in three places:
>
>    - Field layout -- a field of type Foo! (where Foo is implicitly constructible) needs a hint that this field is null-restricted, so we can lay it out flat.
>    - Array layout -- at the point of anewarray and friends, we need a hint when the component type is an implicitly-constructible, null-restricted type.
>    - Casting -- casts need to be able to express a value-set check for the restricted value set of Foo! as well as the unrestricted value set of Foo.
>
> We are convinced that these three are all that is truly required to get the flattening we want. So rather than invent new runtime types / mirrors / descriptors that are going to flow everywhere (into reflection, method handles, verification, etc), let's invent the minimal additional classfile surface and VM model to model that. At the same time, let's make sure that the new thing aligns with the new language model, where the star of the show is null-restricted types.
>
> What about species?
>
> In separate investigations, we have had a notion of "species" for a long time, which we know we're going to need when we get to specialization. Species form a partition of a class's instances; every instance of a class belongs to exactly one species, and different species may have different layouts and value set restrictions. And we struggled with species for a long time over the same runtime type affordances (mirrors and descriptors) -- what does a field descriptor for a field of type ArrayList look like? What does getClass return?
>
> In both cases, the constraints of compatibility have been pushing us towards more erasure in descriptors and reflection, with side channels to reconstruct information necessary for optimized heap layout, and with separate API points for getClass vs getSpecies.
> While specialization is considerably more complicated, nearly all the same considerations (descriptors, mirrors, reflection) are present for null-restriction types. We took an earlier swing at unifying the two under the rubric of "type restrictions", but I think our model wasn't quite clean enough at the time to admit this unification. But I think we are now (almost) there, and the payoff is big.
>
> What we concluded around species and specialization is that we would have to continue to erase descriptors (ArrayList as a method or field descriptor continues to erase to LArrayList;), that getClass returns the primary mirror (ArrayList), and that species information is pushed into a side channel. These are pretty much the exact same considerations as for null-restriction types.
>
> Species and bang types are *refinement types*
>
> A *refinement type* is a type whose value set is that of another type, plus a predicate restricting the value set. A "bang" type Point! is a refinement of Point, where we eliminate the value null. (Other well-known refinement types from PL history include C enums and Pascal ranges.) Refinement types are often erased to their base type, but some refinements enable better layout. Indeed, our interest in Q types is flattening, and for an implicitly constructible class, a variable holding a null-excluding type can be flattened. Similarly, for a sufficiently constrained generic type (e.g., Point[int,int]), the layout of such a variable can be flattened as well.
>
> What we previously called "type restrictions" in the Parametric VM document is in fact a refinement type. We claim that we can design the null-restriction channel in such a way that it can be extended, in some reasonable way, to support more general specialization.
>
> Both specialization, and null-restriction, are forms of refinement types.
> Given that we've already discovered that we need to erase these to their primary (L) type in a lot of places, let's stake out some general principles for representing refinements in the VM:
>
>    - Refinement types are erased to their base type in method and field descriptors.
>    - Refinement types do not have *class* mirrors.
>    - Object::getClass returns a class mirror.
>    - Reflection deals in class mirrors, so refinements are erased from base reflection.
>    - Method handles deal in class mirrors, so refinements are erased from method handles.
>
> That's a lot of erasure, so we have to bake refinement back in where it matters, but we want to be careful to limit the "blast radius" of the refinement information to where it does actually matter. The new channel that encodes a refinement type will appear only when needed to carry out the tasks listed above: field declaration, array creation, and casting.
>
>    - Fields are enhanced with some sort of "refinement" attribute, which (a) guards against stores of bad values (the field equivalent of ArrayStoreException) and (b) enables flatter layouts when the refinement permits.
>    - Array creation (anewarray / multianewarray) is enhanced to support creating arrays with refined component types, enabling the same benefits (storage safety / layout flattening.)
>    - Casting is enhanced to support refinements. This is needed mostly because of erasure -- we are erasing away refinement information and sometimes need to reassert it.
>    - When we get to specialization, new is enhanced to support refinements, and possibly method declarations (to enable calling convention optimization in the presence of highly specialized types like Point[int,int].)
>
> We had previously been assuming that [QPoint is somehow more of a "real"
> type than (specialized) Point[int,int], but I think we are better served seeing them both as refinements, where we continue to report a broad type but sort-of-secretly use refinement information to optimize layout.
>
> A strawman
>
> What follows is a strawman that eliminates Qs completely, replacing the few jobs Q has (field layout, array layout, and casts) with a single mechanism for refinement types which stays in the background until explicitly summoned. We believe the model outlined here can extend cleanly to species, as well as to B1! types like String!. Call this No-Q world. This should not be taken as a concrete proposal, as much as a sketch of the concepts and the players.
>
> We have come to believe that adding Q descriptors to the JVM specification, while perhaps the right move in a from-scratch VM design, would be overreach as an evolutionary step. For old APIs to adopt new descriptors will require many bridge methods with complex properties. To avoid such bridges, old APIs would be forbidden from mentioning the new types. For these reasons, new descriptors, and the mirrors that would accompany them, are quite literally a bridge too far. Accordingly, in No-Q world, descriptors reclaim their former role: describing primitives and classes. Field and method descriptors will use L descriptors, even when carrying a null-restricted value (or a species.) Similarly, class mirrors return to their former role: describing classfiles and non-refined VM-derived types (such as array types.)
>
> As a self-imposed rule of this essay, we will not appeal to runtime support, condy or indy. Everything will be done with bytecodes, descriptors, constant pool entries, and other classfile structures, and not via specially-known methods. As this is a strawman, we may indulge in some "wasteful" design, which can be transformed or lumped in later iterations.
> The new elements of the design are:
>
>    - A new reflective concept for RefinementType, which represents a refinement of an existing (class) type.
>    - A new reflective concept for RepresentableType, which is the common supertype between Class and RefinementType.
>    - New constant pool forms representing null-restriction of classes and of arrays.
>    - A new field attribute called FieldRefinement.
>    - Adjustments to various bytecodes to interact with the new constant pool forms.
>    - Additions to reflective APIs.
>
> Refined types
>
> A refined type is a combination of a type (called the base type) and a value set restriction for that type which excludes some values in the value set of the base type. Null-restricted types, arrays of null-restricted types, and eventually, species of generics are refined types.
>
> Refined types can be represented by a reflective object
>
>     sealed interface RefinementType<T> extends RepresentableType<T> {
>         RepresentableType<T> baseType();
>     }
>
> The type parameter T represents the base type.
>
> There are initially two implementations of RefinementType, which may be private, and are known to the VM:
>
>     private record NullRestrictedClass(Class baseType)
>         implements RefinementType { }
>
>     private record NullRestrictedArray(Class baseType)
>         implements RefinementType { }
>
> Constant pool entries
>
> The two jobs for null restriction must be representable in the constant pool: a null-restricted B3, and an array of a null-restricted B3. (These correspond to Constant_Class_info with a descriptor of QFoo; and [QFoo; in the traditional design.) In addition to being referenced by bytecodes and attributes, such constants should ideally be loadable, evaluating to a RefinementType or RepresentableType.
>
> The exact form of the constant pool entry (whether new bespoke constant pool entries, ad-hoc extensions to Constant_Class_info, or condy) can be bikeshod at the appropriate time; there are clearly tradeoffs here.
>
> Initially, null-restricted types must be implicitly constructible (B3), which would be checked when the constant is resolved. Eventually, we can relax null-restriction to support all class types. Similarly, we may initially restrict to one-dimensional flat arrays, and leave multianewarray to its old job.
>
> Representable types
>
> The new common superinterface between Class and RefinementType exists so that both classes and class refinements can be used as array components, type parameters for specializations, etc. Some operations from Class, such as casting, may be pulled up into this interface.
>
>     sealed interface RepresentableType<T> {
>         T cast(Object o) throws ClassCastException;
>         ...
>     }
>
> Refined fields
>
> Any field whose type is a null-restricted implicitly constructible class may be considered by the VM as a candidate for flattening. Rather than using field_info.descriptor_index to encode a null-restricted type, we continue to erase to the traditional L descriptor, but add a FieldRefinement attribute on the field. Similarly, Constant_FieldRef_info continues to link fields using the L descriptor.
>
>     FieldRefinement {
>         u2 name_index;        // "FieldRefinement"
>         u4 length;
>         u2 refinement_index;  // symbolic reference to a RefinementType
>     }
>
> The symbolic reference must be to a null-restricted, implicitly constructible class type, not an array type. We may relax this restriction later.
>
> Additionally, a field refinement may affect the behavior of putfield. For a null-restricted class, attempts to putfield a null will result in NullPointerException (or perhaps a more general FieldStoreException.)
>
> Looking ahead, for the null-restriction of a B1 or B2 class, there is no change to the layout but we could enforce the storage restriction on putfield. When we get to species, the refinement for a species may affect the layout, and attempting to store a value of the wrong species may result in an exception or in an automatic conversion.
>
> It is a free choice as to whether we want to translate a field of type Point![] using an array refinement or fully erase it to Point[].
>
> Refined casts
>
> The operand of a checkcast or instanceof may be a symbolic reference to a class or refinement. (Since instanceof is null-hostile, changing instanceof is not necessary now, but when we get to species, we will need to be able to test for species membership.) The cast operation may be pulled up from Class to RepresentableType so that casts can be done reflectively with either a Class or a refinement.
>
> Refined array creation
>
> An anewarray may make a symbolic reference to a class refinement type, as well as to a class, array, or interface type.
>
> For a refined array, a.getClass() continues to return the primary mirror for the array type, and Class::getComponentType on that array continues to return the primary mirror for the component type, but we may provide an additional API point akin to getComponentType that returns a RepresentableType which may be a RefinementType.
>
> Arrays of null-restricted values can be created reflectively; the existing Array::newInstance method will get an overload that takes RepresentableType. Arrays::copyOf when presented with a refined array type will create a refined array.
>
> Refinement information stays in the background until summoned
>
> The place where we need discipline is avoiding the temptation of "but someone might profitably use the information that this field holds a flat array." Yes, they might --
> but supporting that as a general-purpose runtime type (with descriptor and mirror) has costs.
>
> The model proposed here resists the temptation to redefine mirrors, descriptors, symbolic resolution, and reflection, instead leaning on erasure here for both null-restriction and specialization, and providing a secondary reflective channel (which almost no users will actually need) to get refinement information. (An example of code that needs to summon refinement information is Arrays::copyOf, which would need to fetch the refined component type and instantiate an array using the refined type; most other reflective code would not need to even be aware of it.)
>
> Bonus round: specialization
>
> The framework so far seems to accommodate specialization fairly well. There'll be a new subtype of RefinementType to represent a specialization, and a reflective method for creating such a specialization, such as:
>
>     static SpecializedType specialization(Class baseClass,
>                                           RepresentableType... arguments)
>
> and a new way to get such a type refinement in the constant pool (possibly just a condy whose bootstrap is the above method.) The `new` bytecode is extended to accept a specialization refinement. Field refinements would then be able to refer to specialization refinements.
>
> Conclusions
>
> In the current world we have a (mostly) 1:1:1 relationship between runtime types, descriptors, and mirrors; a model where species/refinements are not full runtime types preserves this. The surface area where refinement information leaks to users who are not prepared for it is dramatically smaller. Refinements are not full runtime types; they don't have full Class mirrors. We erase down to real runtime types in descriptors and in reflective API points like Object::getClass. This seems a powerful simplification, and one that aligns with the previous language simplification.
> To summarize:
>
>    - Yes, we should get rid of Q descriptors, but should do so in a more principled way by getting rid of Q as a runtime type entirely, replacing it with a refinement type which stays in the background until it is actually needed.
>    - We should erase Q from method and field descriptors and from the obvious mirrors, because refinement information is on a need-to-know basis.
>    - Refinement information primarily flows from source -> classfile -> VM, and mostly does not flow in the other direction. Specialized reflection might expose it, but we should do so not on general principles, but based on where it is actually needed by the programming model.
>    - Null restriction is more like specialization than not; they are both value set refinements that possibly enable layout optimization, and we should seek to treat them the same.
>    - While leaving the door open for additional kinds of species and type migration, we use our new powers, at first, only to define flattenable fields and flattenable one-dimensional arrays.

From brian.goetz at oracle.com Thu Jul 27 20:37:51 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 27 Jul 2023 16:37:51 -0400
Subject: We don't need no stinkin' Q descriptors
In-Reply-To:
References:
Message-ID: <499b3249-e439-de10-6316-f86aa6aff06f@oracle.com>

> I was also wondering about how this will be expressed in MethodHandles if we don't differentiate Q from L any more.... It means we don't have a java.lang.Class for the "Q" flavour so we can't use MethodTypes to differentiate any more (and that includes losing asType / castArguments to do casts). Would we need to add a new MH combinator to handle RefinementType casts?

I'll let John speak to the classfile-encoding-of-refinements. But yes, you have it right -- we'd add a MH combinator for refinement type casts / anewarray.
This aligns MHs with how bytecoded methods work; we don't have restrictions for method parameters/returns (yet) either. Our experiments suggested that the cost of the extra null channel for scalarizing LFoo vs QFoo was pretty much negligible (some extra register pressure).
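As a rough illustration of what a refinement-cast combinator might feel like, the behaviour can be approximated with today's java.lang.invoke API. This is only a sketch under assumptions: nullRestrictArgument is a hypothetical helper name, not a proposed API, and it stands in for a null-restricted cast by applying Objects.requireNonNull to one argument through MethodHandles.filterArguments. Note that the MethodType of the result is unchanged, which matches the idea that refinements are erased from method handle types.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Objects;

public final class NullRestrictedArgSketch {
    private NullRestrictedArgSketch() {}

    /** Sample target that tolerates null, so the effect of the filter is visible. */
    public static int length(String s) {
        return s == null ? -1 : s.length();
    }

    /**
     * Hypothetical combinator: returns a handle of the same type as target
     * that throws NullPointerException if the argument at pos is null.
     */
    public static MethodHandle nullRestrictArgument(MethodHandle target, int pos) {
        try {
            // Objects.requireNonNull(T) erases to (Object)Object.
            MethodHandle requireNonNull = MethodHandles.publicLookup().findStatic(
                Objects.class, "requireNonNull",
                MethodType.methodType(Object.class, Object.class));
            Class<?> argType = target.type().parameterType(pos);
            // Adapt the filter to (argType)argType so it fits the target's parameter.
            MethodHandle filter =
                requireNonNull.asType(MethodType.methodType(argType, argType));
            return MethodHandles.filterArguments(target, pos, filter);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle plain = MethodHandles.lookup().findStatic(
            NullRestrictedArgSketch.class, "length",
            MethodType.methodType(int.class, String.class));
        MethodHandle restricted = nullRestrictArgument(plain, 0);

        int a = (int) plain.invoke((String) null);   // erased handle accepts null
        int b = (int) restricted.invoke("abc");      // non-null passes through
        System.out.println(a + " " + b + " " + restricted.type());
        try {
            restricted.invoke((String) null);
        } catch (NullPointerException e) {
            System.out.println("null rejected by the filter");
        }
    }
}
```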