From heidinga at redhat.com Wed Mar 2 18:43:24 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Wed, 2 Mar 2022 13:43:24 -0500 Subject: [External] : Re: Primitive type patterns In-Reply-To: <82a8dd98-b8b2-e54e-4c2c-8a1c675614f2@oracle.com> References: <8d47550c-b9c7-8eea-bb71-043b282984da@oracle.com> <445101940.9418309.1646070225203.JavaMail.zimbra@u-pem.fr> <0e6b0a6c-5148-2dd6-5195-d4ea5e5ea89d@oracle.com> <1252930995.9426030.1646073289788.JavaMail.zimbra@u-pem.fr> <82a8dd98-b8b2-e54e-4c2c-8a1c675614f2@oracle.com> Message-ID: On Mon, Feb 28, 2022 at 1:53 PM Brian Goetz wrote: > > > Now, what if instead of Object, we start with Long? > > Long l = 0L > if (l instanceof byte b) { ... } > > First, applicability: does Long unbox to a primitive type that can be narrowed to byte? Yes! Long unboxes to long, and long can be narrowed to byte. > > Then: matching: if the RHS is not null, we unbox, and do a range check. (The rules in my previous mail probably didn't get this case perfectly right), but 0L will match, and 0xffff will not -- as we would expect. > > > This is totally alien to me, when you have x instanceof Foo (note: this is not the pattern version) with X the type of x, then if x is declared with a super type of X it works the exact same way, i.e i don't have to care to much about the type on what i'm doing an instanceof / switching over it. > > > Yes, I understand your discomfort. And I will admit, I don't love this particular corner-of-a-corner either. (But let's be clear: it is a corner. If you're seeking to throw out the whole scheme on the basis that corners exist, you'll find the judge to be unsympathetic.) > > So why have I proposed it this way? Because, unfortunately, of this existing line in JLS 5.2 (which I never liked): > > > an unboxing conversion followed by a widening primitive conversion > > This is what lets you say: > > long l = anInteger > > And, I never liked this rule, but we're stuck with it. The inverse, from which we would derive this rule, is that > > anInteger instanceof long l > > should be applicable, and in fact always match when the LHS is non-null. I would prefer to not allow this assignment conversion, and similarly not allow both unboxing and widening in one go in pattern matching, but I didn't get to write JLS 5.2. > > What's new here is going in the *other* direction: > > anInteger instanceof short s > > and I think what is making you uncomfortable is that you are processing two generalizations at once, and it's pushing your "OMG different! scary!" buttons: > > - that we're defining primitive type patterns in a way such that we can derive the existing assignment conversions; > - that primitive type patterns can have dynamic checks that primitive assignments cannot, so we're including the value-range check. > > Each individually is not quite as scary, but I can understand why the two together would seem scary. (And, as I mentioned, I don't like the unbox-and-widen conversions either, but I didn't invent those.) > > Making the pattern match compatible with assignment conversions makes sense to me and follows a similar rationale to that used with MethodHandle::asType following the JLS 5.3 invocation conversions. Though with MHs we had the ability to add additional conversions under MethodHandles::explicitCastArguments. With pattern matching, we don't have the same ability to make the "extra" behaviour opt-in / opt-out. We just get one chance to pick the right behaviour. 
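(To make that MH analogy concrete, here is a rough sketch of the two behaviours -- illustration only, names invented, not from any prototype: asType is limited to the JLS 5.3 invocation conversions, while explicitCastArguments opts in to the cast-like conversions -- and note that neither one gives you a range check.)

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;

    class AsTypeVsExplicitCast {
        public static void main(String[] args) throws Throwable {
            // asType: only the JLS 5.3 invocation conversions, e.g. int -> long widening
            MethodHandle idLong = MethodHandles.identity(long.class);
            MethodHandle widen = idLong.asType(MethodType.methodType(long.class, int.class));
            long w = (long) widen.invokeExact(42);              // 42L

            // asType will *not* narrow; this adaptation would throw WrongMethodTypeException:
            // idLong.asType(MethodType.methodType(int.class, long.class));

            // explicitCastArguments: the opt-in "extra" conversions, cast-like, no range check
            MethodHandle idInt = MethodHandles.identity(int.class);
            MethodHandle narrow = MethodHandles.explicitCastArguments(
                    idInt, MethodType.methodType(int.class, long.class));
            int n1 = (int) narrow.invokeExact(5L);              // 5
            int n2 = (int) narrow.invokeExact(Long.MAX_VALUE);  // silently truncates to -1
        }
    }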
Intuitively, the behaviour you propose is kind of what we want - all the possible byte cases end up in the byte case and we don't need to adapt the long case to handle those that would have fit in a byte. I'm slightly concerned that this changes Java's historical approach and may lead to surprises when refactoring existing code that treats unbox(Long) one way and unbox(Short) another. Will users be confused when the unbox(Long) in the short right range ends up in a case that was only intended for unbox(Short)? I'm having a hard time finding an example that would trip on this but my lack of imagination isn't definitive =) Something like following shouldn't be surprising given the existing rules around unbox + widening primitive conversion (though it may be when first encountered as I expect most users haven't really internalized the JLS 5.2 rules): Number n = ....; switch(n) { case long l -> ... case int i -> .... // dead code case byte b -> .... // dead code default -> .... } But this may be more surprising as I suggested above Number n = new Long(5); switch(n) { case byte b -> .... // matches here case int i -> .... // case long l -> ... default -> .... } Overall, I like the extra dynamic range check but would be fine with leaving it out if it complicates the spec given it feels like a pretty deep-in-the-weeds corner case. --Dan From forax at univ-mlv.fr Wed Mar 2 19:36:05 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 2 Mar 2022 20:36:05 +0100 (CET) Subject: [External] : Re: Primitive type patterns In-Reply-To: <51f07830-a1bf-b705-2c89-53e71ccb066e@oracle.com> References: <8d47550c-b9c7-8eea-bb71-043b282984da@oracle.com> <445101940.9418309.1646070225203.JavaMail.zimbra@u-pem.fr> <0e6b0a6c-5148-2dd6-5195-d4ea5e5ea89d@oracle.com> <1252930995.9426030.1646073289788.JavaMail.zimbra@u-pem.fr> <82a8dd98-b8b2-e54e-4c2c-8a1c675614f2@oracle.com> <1221659857.9463108.1646080532106.JavaMail.zimbra@u-pem.fr> <51f07830-a1bf-b705-2c89-53e71ccb066e@oracle.com> Message-ID: <693181479.10454030.1646249765675.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" > To: "Remi Forax" > Cc: "amber-spec-experts" > Sent: Monday, February 28, 2022 10:00:08 PM > Subject: Re: [External] : Re: Primitive type patterns > This is a valid generalized preference (and surely no one is going to > say "no, I prefer to play to our weaknesses.")? But at root, I think > what you are saying is that you would prefer that pattern matching > simply be a much smaller and less fundamental feature than what is being > discussed here.? And again, while I think that's a valid preference, I > think the basis of your preference is that it is "simpler", but I do not > think it actually delivers the simplicity dividend you are hoping for, > because there will be subtle mismatches that impede composition and > refactoring (e.g., new "null gates" and "box gates".) There are two ways to express "match non null Integer + unboxing", this one Integer value = ... switch(value) { case Integer(int i) -> ... } And we already agree that we want that syntax. You are proposing a new one Integer value = ... switch(value) { case int i -> ... } Obviously, your proposal makes things less simple because we new have to ways to say the same thing. 
Moreover, introducing assignment conversions introduces more corner cases; we have already discussed
- unboxing + widening being supported while widening + unboxing is not,
- pattern matching behaving differently from the visitor pattern / method overriding rules.

There are also issues IMO when you start mixing wrappers and primitive types. Just ask yourself: can you predict whether each of the following snippets compiles, and if it compiles, which case is selected depending on the value of value.

  Integer value = ...
  switch(value) {
    case double d -> ...
    case Integer i -> ...
  }

  int value = ...
  switch(value) {
    case float f -> ...
    case Integer i -> ...
  }

  int value = ...
  switch(value) {
    case Float f -> ...
    case Integer i -> ...
  }

I think we do not need assignment conversions *and* that introducing them makes the semantics harder to understand.

Rémi

From brian.goetz at oracle.com Wed Mar 2 20:11:58 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 2 Mar 2022 15:11:58 -0500
Subject: [External] : Re: Primitive type patterns
In-Reply-To: <693181479.10454030.1646249765675.JavaMail.zimbra@u-pem.fr>
References: <8d47550c-b9c7-8eea-bb71-043b282984da@oracle.com>
 <445101940.9418309.1646070225203.JavaMail.zimbra@u-pem.fr>
 <0e6b0a6c-5148-2dd6-5195-d4ea5e5ea89d@oracle.com>
 <1252930995.9426030.1646073289788.JavaMail.zimbra@u-pem.fr>
 <82a8dd98-b8b2-e54e-4c2c-8a1c675614f2@oracle.com>
 <1221659857.9463108.1646080532106.JavaMail.zimbra@u-pem.fr>
 <51f07830-a1bf-b705-2c89-53e71ccb066e@oracle.com>
 <693181479.10454030.1646249765675.JavaMail.zimbra@u-pem.fr>
Message-ID: <8bb18d39-426b-1e45-ba60-0ea462024b15@oracle.com>

On 3/2/2022 2:36 PM, forax at univ-mlv.fr wrote:
> There are two ways to express "match non null Integer + unboxing",
> this one
>   Integer value = ...
>   switch(value) {
>     case Integer(int i) -> ...
>   }
>
> And we already agree that we want that syntax.

Wait, what? The above is not yet on the table; we will have to wait for
deconstruction patterns on classes to be able to express that. When we get
there, we'll have a choice of whether we want to add a deconstructor to the
wrapper classes. (At which point, you might well say "we already have a way
to do that"...)

> You are proposing a new one
>   Integer value = ...
>   switch(value) {
>     case int i -> ...
>   }

But if it was on the table now, it would still not be particularly notable
as a "new way" of anything; this would be true for *every single nested
pattern*. In fact, that's a feature, not a bug, that you can unroll a
switch with a non-total nested pattern to a nested switch, and that is a
desirable refactoring to support.

> Obviously, your proposal makes things less simple because we new have to ways to say the same thing.

Not obvious at all. Java's simplicity does not derive from "exactly one way
to do each thing"; suggesting otherwise is muddying the terms for
rhetorical effect, which is not helpful.

> I think we do not need assignment conversions *and* that introducing them makes the semantics harder to understand.

The former is a valid opinion, and is noted!
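(To spell out the unrolling refactoring I mentioned above -- with a made-up record, since wrapper deconstructors are not on the table yet, and ignoring the null corner for the moment:)

    record Box(Object contents) { }

    void before(Object x) {
        switch (x) {
            case Box(String s) -> System.out.println("boxed string: " + s);  // non-total nested pattern
            default            -> System.out.println("something else");
        }
    }

    // the same switch, with the nested pattern unrolled into a nested switch
    void after(Object x) {
        switch (x) {
            case Box(var contents) -> {                  // var is total on the component
                switch (contents) {
                    case String s -> System.out.println("boxed string: " + s);
                    default       -> System.out.println("something else");
                }
            }
            default -> System.out.println("something else");
        }
    }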
From brian.goetz at oracle.com Wed Mar 2 20:13:30 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 2 Mar 2022 15:13:30 -0500
Subject: [External] : Re: Primitive type patterns
In-Reply-To:
References: <8d47550c-b9c7-8eea-bb71-043b282984da@oracle.com>
 <445101940.9418309.1646070225203.JavaMail.zimbra@u-pem.fr>
 <0e6b0a6c-5148-2dd6-5195-d4ea5e5ea89d@oracle.com>
 <1252930995.9426030.1646073289788.JavaMail.zimbra@u-pem.fr>
 <82a8dd98-b8b2-e54e-4c2c-8a1c675614f2@oracle.com>
Message-ID:

On 3/2/2022 1:43 PM, Dan Heidinga wrote:
>
> Making the pattern match compatible with assignment conversions makes
> sense to me and follows a similar rationale to that used with
> MethodHandle::asType following the JLS 5.3 invocation conversions.
> Though with MHs we had the ability to add additional conversions under
> MethodHandles::explicitCastArguments. With pattern matching, we don't
> have the same ability to make the "extra" behaviour opt-in / opt-out.
> We just get one chance to pick the right behaviour.

Indeed. And the thing that I am trying to avoid here is creating _yet
another_ new context in which a different bag of ad-hoc conversions are
possible. While it might be justifiable from a local perspective to say
"it's OK if `int x` does unboxing, but having it do range checking seems
new and different, so let's not do that", from a global perspective, that
means we add a new context ("pattern match context") to the existing
assignment, loose invocation, strict invocation, cast, and numeric
contexts. That is the kind of incremental complexity I'd like to avoid, if
there is a unifying move we can pull.

Conversions like unboxing or casting are burdened by the fact that they
have to be total, which means the "does it fit" / "if so, do it" / "if not,
do something else (truncate, throw, etc)" all have to be crammed into a
single operation. What pattern matching does is extract the "does it fit,
and if so, do it" into a more primitive operation, from which other
operations can be composed.

At some level, what I'm proposing is all spec-shuffling; we'll either say
"a widening primitive conversion is allowed in assignment context", or
we'll say that primitive `P p` matches any primitive type Q that can be
widened to P. We'll end up with a similar number of rules, but we might be
able to "shake the box" to make them settle to a lower energy state, and be
able to define (whether we explicitly do so or not) assignment context to
support "all the cases where the LHS, viewed as a type pattern, is
exhaustive on the RHS, potentially with remainder, and throws if remainder
is encountered." (That's what unboxing does; throws when remainder is
encountered.)

As to the range check, it has always bugged me that you see code that
looks like:

    if (i >= -128 && i <= 127) { byte b = (byte) i; ... }

because of the accidental specificity, and the attendant risk of error
(using <= instead of <, or using 127 instead of 128). Being able to say:

    if (i instanceof byte b) { ... }

is better not because it is more compact, but because you're actually
asking the right question -- "does this int value fit in a byte." I'm sad
we don't really have a way to ask this question today; it seems an
omission.

> Intuitively, the behaviour you propose is kind of what we want - all
> the possible byte cases end up in the byte case and we don't need to
> adapt the long case to handle those that would have fit in a byte.
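Coming back to the "remainder" point for a second: assignment already has a remainder case today, and it already handles it the same way -- by throwing. A two-line reminder, nothing more:

    Integer boxed = null;
    int i = boxed;   // unboxing hits the remainder: NullPointerException at runtime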
> I'm slightly concerned that this changes Java's historical approach > and may lead to surprises when refactoring existing code that treats > unbox(Long) one way and unbox(Short) another. Will users be confused > when the unbox(Long) in the short right range ends up in a case that > was only intended for unbox(Short)? I'm having a hard time finding an > example that would trip on this but my lack of imagination isn't > definitive =) I'm worried about this too.? We examined it briefly, and ran away, when we were thinking about constant patterns, specifically: ??? Object o = ... ??? switch (o) { ??????? case 0: ... ??????? default: ... ??? } What would this mean?? What I wouldn't want it to mean is "match Long 0, Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the line for "magic".? (Note that this is about defining what the _constant pattern_ means, not the primitive type pattern.) I think its probably reasonable to say this is a type error; 0 is applicable to primitive numerics and their boxes, but not to Number or Object.? I think that is consistent with what I'm suggesting about primitive type patterns, but I'd have to think about it more. > Something like following shouldn't be surprising given the existing > rules around unbox + widening primitive conversion (though it may be > when first encountered as I expect most users haven't really > internalized the JLS 5.2 rules): As Alex said to me yesterday: "JLS Ch 5 contains many more words than any prospective reader would expect to find on the subject, but once the reader gets over the overwhelm of how much there is to say, will find none of the words surprising."? There's a deeper truth to this statement: Java is not actually as simple a language as its mythology suggests, but we win by hiding the complexity in places users generally don't have to look, and if and when they do confront the complexity, they find it unsurprising, and go back to ignoring it. So in point of fact, *almost no one* has read JLS 5.2, but it still does "what users would likely find reasonable". > Number n = ....; > switch(n) { > case long l -> ... > case int i -> .... // dead code > case byte b -> .... // dead code > default -> .... > } Correct.? We have rules for pattern dominance, which are used to give compile errors on dead cases; we'd have to work through the details to confirm that `long l` dominates `int i`, but I'd hope this is the case. > But this may be more surprising as I suggested above > > Number n = new Long(5); > switch(n) { > case byte b -> .... // matches here > case int i -> .... // > case long l -> ... > default -> .... > } > > Overall, I like the extra dynamic range check but would be fine with > leaving it out if it complicates the spec given it feels like a pretty > deep-in-the-weeds corner case. It is probably not a forced move to support the richer interpretation of primitive patterns now.? But I think the consequence of doing so may be surprising: rather than "simplifying the language" (as one might hope that "leaving something out" would do), I think there's a risk that it makes things more complicated, because (a) it effectively creates yet another conversion context that is distinct from the too-many we have now, and (b) creates a sharp edge where refactoring from local variable initialization to let-bind doesn't work, because assignment would then be looser than let-bind. One reason this is especially undesirable is that one of the forms of let-bind is a let-bind *expression*: ??? let P = p, Q = q ??? 
in which is useful for pulling out subexpressions and binding them to a variable, but for which the scope of that variable is limited.? If refactoring from: ??? int x = stuff; ??? m(f(stuff)); to ??? m(let x = stuff in f(stuff)) ??? // x no longer in scope here was not possible because of a silly mismatch between the conversions in let context and the conversions in assignment context, then we're putting users in the position of having to choose between richer conversions and richer scoping. (Usual warning (Remi): I'm mentioning let-expressions because it gives a sense of where some of these constraints come from, but this is not a suitable time to design the let-expression feature.) From james.laskey at oracle.com Thu Mar 3 13:57:50 2022 From: james.laskey at oracle.com (Jim Laskey) Date: Thu, 3 Mar 2022 13:57:50 +0000 Subject: Proposal: java.lang.runtime.Carrier Message-ID: We propose to provide a runtime anonymous carrier class object generator; java.lang.runtime.Carrier. This generator class is designed to share anonymous classes when shapes are similar. For example, if several clients require objects containing two integer fields, then Carrier will ensure that each client generates carrier objects using the same underlying anonymous class. Providing this mechanism decouples the strategy for carrier class generation from the client facility. One could implement one class per shape; one class for all shapes (with an Object[]), or something in the middle; having this decision behind a bootstrap means that it can be evolved at runtime, and optimized differently for different situations. Motivation The String Templates JEP draft proposes the introduction of a TemplatedString object for the primary purpose of carrying the template and associated values derived from a template literal. To avoid value boxing, early prototypes described these carrierobjects using per-callsite anonymous classes shaped by value types, The use of distinct anonymous classes here is overkill, especially considering that many of these classes are similar; containing one or two object fields and/or one or two integral fields. Pattern matching has a similar issue when carrying the values for the holes of a pattern. With potentially hundreds (thousands?) of template literals or patterns per application, we need to find an alternate approach for these value carriers. Description In general terms, the Carrier class simply caches anonymous classes keyed on shape. To further increase similarity in shape, the ordering of value types is handled by the API and not in the underlying anonymous class. If one client requires an object with one object value and one integer value and a second client requires an object with one integer value and one object value, then both clients will use the same underlying anonymous class. Further, types are folded as either integer (byte, short, int, boolean, char, float), long (long, double) or object. [We've seen that performance hit by folding the long group into the integer group is significant, hence the separate group.] The Carrier API uses MethodType parameter types to describe the shape of a carrier. This incorporates with the primary use case where bootstrap methods need to capture indy non-static arguments. The API has three static methods; // Return a constructor MethodHandle for a carrier with components // aligning with the parameter types of the supplied methodType. static MethodHandle constructor(MethodType methodType) // Return a component getter MethodHandle for component i. 
static MethodHandle component(MethodType methodType, int i) // Return component getter MethodHandles for all the carrier's components. static MethodHandle[] components(MethodType methodType) Examples import java.lang.runtime.Carrier; ... // Define the carrier description. MethodType methodType = MethodType.methodType(Object.class, byte.class, short.class, char.class, int.class, long.class, float.class, double.class, boolean.class, String.class); // Fetch the carrier constructor. MethodHandle constructor = Carrier.constructor(methodType); // Create a carrier object. Object object = (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, 'C', 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, 1.0f / 3.0f, 1.0 / 3.0, true, "abcde"); // Get an array of accessors for the carrier object. MethodHandle[] components = Carrier.components(methodType); // Access fields. byte b = (byte)components[0].invokeExact(object); short s = (short)components[1].invokeExact(object); char c =(char)components[2].invokeExact(object); int i = (int)components[3].invokeExact(object); long l = (long)components[4].invokeExact(object); float f =(float)components[5].invokeExact(object); double d = (double)components[6].invokeExact(object); boolean tf (boolean)components[7].invokeExact(object); String s = (String)components[8].invokeExact(object)); // Access a specific field. MethodHandle component = Carrier.component(methodType, 3); int ii = (int)component.invokeExact(object); From heidinga at redhat.com Thu Mar 3 15:17:00 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Thu, 3 Mar 2022 10:17:00 -0500 Subject: [External] : Re: Primitive type patterns In-Reply-To: References: <8d47550c-b9c7-8eea-bb71-043b282984da@oracle.com> <445101940.9418309.1646070225203.JavaMail.zimbra@u-pem.fr> <0e6b0a6c-5148-2dd6-5195-d4ea5e5ea89d@oracle.com> <1252930995.9426030.1646073289788.JavaMail.zimbra@u-pem.fr> <82a8dd98-b8b2-e54e-4c2c-8a1c675614f2@oracle.com> Message-ID: On Wed, Mar 2, 2022 at 3:13 PM Brian Goetz wrote: > > > > On 3/2/2022 1:43 PM, Dan Heidinga wrote: > > > > Making the pattern match compatible with assignment conversions makes > > sense to me and follows a similar rationale to that used with > > MethodHandle::asType following the JLS 5.3 invocation conversions. > > Though with MHs we had the ability to add additional conversions under > > MethodHandles::explicitCastArguments. With pattern matching, we don't > > have the same ability to make the "extra" behaviour opt-in / opt-out. > > We just get one chance to pick the right behaviour. > > Indeed. And the thing that I am trying to avoid here is creating _yet > another_ new context in which a different bag of ad-hoc conversions are > possible. While it might be justifiable from a local perspective to say > "its OK if `int x` does unboxing, but having it do range checking seems > new and different, so let's not do that", from a global perspective, > that means we a new context ("pattern match context") to add to > assignment, loose invocation, strict invocation, cast, and numeric > contexts. That is the kind of incremental complexity I'd like to avoid, > if there is a unifying move we can pull. I'm in agreement on not adding new contexts but I had the opposite impression here. Doesn't "having it do range checking" require a new context as this is different from what assignment contexts allow today? Or is it the case that regular, non-match assignment must be total with no left over that allows them to use the same context despite not being able to do the dynamic range check? 
As this sentence shows, I'm confused on how dynamic range checking fits in the existing assignment context. Or are we suggesting that assignment allows: byte b = new Long(5); to succeed if we can unbox + meet the dynamic range check? I'm clearly confused here. > Conversions like unboxing or casting are burdened by the fact that they > have to be total, which means the "does it fit" / "if so, do it" / "if > not, do something else (truncate, throw, etc)" all have to be crammed > into a single operation. What pattern matching is extracts the "does it > fit, and if so do it" into a more primitive operation, from which other > operations can be composed. Is it accurate to say this is less reusing assignment context and more completely replacing it with a new pattern context from which assignment can be built on top of? > At some level, what I'm proposing is all spec-shuffling; we'll either > say "a widening primitive conversion is allowed in assignment context", > or we'll say that primitive `P p` matches any primitive type Q that can > be widened to P. We'll end up with a similar number of rules, but we > might be able to "shake the box" to make them settle to a lower energy > state, and be able to define (whether we explicitly do so or not) > assignment context to support "all the cases where the LHS, viewed as a > type pattern, are exhaustive on the RHS, potentially with remainder, and > throws if remainder is encountered." (That's what unboxing does; throws > when remainder is encountered.) Ok. So maybe I'm not confused. We'd allow the `byte b = new Long(5);` code to compile and throw not only on a failed unbox, but also on a dynamic range check failure. If we took this "dynamic hook" behaviour to the limit, what other new capabilities does it unlock? Is this the place to connect other user-supplied conversion operations as well? Maybe I'm running too far with this idea but it seems like this could be laying the groundwork for other interesting behaviours. Am I way off in the weeds here? > > As to the range check, it has always bugged me that you see code that > looks like: > > if (i >= -127 && i <= 128) { byte b = (byte) i; ... } > > because of the accidental specificity, and the attendant risk of error > (using <= instead of <, or using 127 instead of 128). Being able to say: > > if (i instanceof byte b) { ... } > > is better not because it is more compact, but because you're actually > asking the right question -- "does this int value fit in a byte." I'm > sad we don't really have a way to ask this question today; it seems an > omission. I had been thinking about this when I wrote my response and I like having the compiler generate the range check for me. As you say, way easier to avoid errors that way. > > > Intuitively, the behaviour you propose is kind of what we want - all > > the possible byte cases end up in the byte case and we don't need to > > adapt the long case to handle those that would have fit in a byte. > > I'm slightly concerned that this changes Java's historical approach > > and may lead to surprises when refactoring existing code that treats > > unbox(Long) one way and unbox(Short) another. Will users be confused > > when the unbox(Long) in the short right range ends up in a case that > > was only intended for unbox(Short)? I'm having a hard time finding an > > example that would trip on this but my lack of imagination isn't > > definitive =) > > I'm worried about this too. 
We examined it briefly, and ran away, when > we were thinking about constant patterns, specifically: > > Object o = ... > switch (o) { > case 0: ... > default: ... > } > > What would this mean? What I wouldn't want it to mean is "match Long 0, > Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the > line for "magic". (Note that this is about defining what the _constant > pattern_ means, not the primitive type pattern.) I think its probably > reasonable to say this is a type error; 0 is applicable to primitive > numerics and their boxes, but not to Number or Object. I think that is > consistent with what I'm suggesting about primitive type patterns, but > I'd have to think about it more. Object o =... switch(o) { case (long)0: ... // can we say this? Probably not case long l && l == 0: // otherwise this would become the way to catch most of the constant 0 cases default: .... } I'm starting to think the constant pattern will feel less like magic once the dynamic range checking becomes commonplace. > > Something like following shouldn't be surprising given the existing > > rules around unbox + widening primitive conversion (though it may be > > when first encountered as I expect most users haven't really > > internalized the JLS 5.2 rules): > > As Alex said to me yesterday: "JLS Ch 5 contains many more words than > any prospective reader would expect to find on the subject, but once the > reader gets over the overwhelm of how much there is to say, will find > none of the words surprising." There's a deeper truth to this > statement: Java is not actually as simple a language as its mythology > suggests, but we win by hiding the complexity in places users generally > don't have to look, and if and when they do confront the complexity, > they find it unsurprising, and go back to ignoring it. > > So in point of fact, *almost no one* has read JLS 5.2, but it still does > "what users would likely find reasonable". > > > Number n = ....; > > switch(n) { > > case long l -> ... > > case int i -> .... // dead code > > case byte b -> .... // dead code > > default -> .... > > } > > Correct. We have rules for pattern dominance, which are used to give > compile errors on dead cases; we'd have to work through the details to > confirm that `long l` dominates `int i`, but I'd hope this is the case. > > > But this may be more surprising as I suggested above > > > > Number n = new Long(5); > > switch(n) { > > case byte b -> .... // matches here > > case int i -> .... // > > case long l -> ... > > default -> .... > > } > > > > Overall, I like the extra dynamic range check but would be fine with > > leaving it out if it complicates the spec given it feels like a pretty > > deep-in-the-weeds corner case. > > It is probably not a forced move to support the richer interpretation of > primitive patterns now. But I think the consequence of doing so may be > surprising: rather than "simplifying the language" (as one might hope > that "leaving something out" would do), I think there's a risk that it > makes things more complicated, because (a) it effectively creates yet > another conversion context that is distinct from the too-many we have > now, and (b) creates a sharp edge where refactoring from local variable > initialization to let-bind doesn't work, because assignment would then > be looser than let-bind. Ok. You're saying that the dynamic range check is essential enough that it's worth a new context for if we can't adjust the meaning of assignment context. 
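(If I've understood the proposal correctly, the check the compiler would generate for `case byte b` against a boxed Long is roughly what I'd otherwise hand-roll today -- sketch only, spelled with the named constants to avoid the off-by-one traps:)

    Number n = Long.valueOf(5);
    if (n instanceof Long l
            && l >= Byte.MIN_VALUE && l <= Byte.MAX_VALUE) {
        byte b = (byte) l.longValue();   // already known to fit, so the narrowing is safe
        // ... body of the `case byte b` arm
    }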
> > One reason this is especially undesirable is that one of the forms of > let-bind is a let-bind *expression*: > > let P = p, Q = q > in > > which is useful for pulling out subexpressions and binding them to a > variable, but for which the scope of that variable is limited. If > refactoring from: > Possible typo in the example. Attempted to fix: > int x = stuff; > m(f(x)); > > to > > m(let x = stuff in f(x)) > // x no longer in scope here Not sure I follow this example. I'm not sure why introducing a new variable in this scope is useful. > > was not possible because of a silly mismatch between the conversions in > let context and the conversions in assignment context, then we're > putting users in the position of having to choose between richer > conversions and richer scoping. Ok. I think I see where this is going and while it may be clearer with a larger example, I agree with the principle that this refactoring should be possible. --Dan > > (Usual warning (Remi): I'm mentioning let-expressions because it gives a > sense of where some of these constraints come from, but this is not a > suitable time to design the let-expression feature.) > > From brian.goetz at oracle.com Thu Mar 3 15:29:51 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 3 Mar 2022 10:29:51 -0500 Subject: Proposal: java.lang.runtime.Carrier In-Reply-To: References: Message-ID: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com> Thanks Jim. As background, (some form of) this code originated in a prototype for pattern matching, where we needed a carrier for a tuple (T, U, V) to carry the results of a match from a deconstruction pattern (or other declared pattern) on the stack as a return value.? We didn't want to spin a custom class per pattern, and we didn't want to commit to the actual layout, because we wanted to preserve the ability to switch later to a value class.? So the idea is you describe the carrier you want as a MethodType, and there's a condy that gives you an MH that maps that shape of arguments to an opaque carrier (the constructor), and other condys that give you MHs that map from the carrier to the individual bindings.? So pattern matching will stick those MHs in CP slots. The carrier might be some bespoke thing (e.g., record anon(T t, U u, V v)), or something that holds an Object[], or something with three int fields and two ref fields, or whatever the runtime decides to serve up. The template mechanism wants almost exactly the same thing for bundling the parameters for uninterprted template strings. Think of it as a macro-box; instead of boxing primitives to Object and Objects to varargs, there's a single boxing operation from a tuple to an opaque type. On 3/3/2022 8:57 AM, Jim Laskey wrote: > > We propose to provide a runtime /anonymous carrier class object > generator/; *java.lang.runtime.Carrier*. This generator class is > designed to share /anonymous classes/?when shapes are similar. For > example, if several clients require objects containing two integer > fields, then *Carrier*?will ensure that each client generates carrier > objects using the same underlying anonymous class. > > Providing this mechanism decouples the strategy for carrier class > generation from the client facility. One could implement one class per > shape; one class for all shapes (with an Object[]), or something in > the middle; having this decision behind a bootstrap means that it can > be evolved at runtime, and optimized differently for different situations. 
> > > Motivation > > The String Templates JEP draft > ?proposes the > introduction of a /TemplatedString/?object for the primary purpose of > /carrying/?the /template/?and associated /values/?derived from a > /template literal/. To avoid value boxing, early prototypes described > these /carrier/objects using /per-callsite/?anonymous classes shaped > by value types, The use of distinct anonymous classes here is > overkill, especially considering that many of these classes are > similar; containing one or two object fields and/or one or two > integral fields. /Pattern matching/?has a similar issue when carrying > the values for the /holes/?of a pattern. With potentially hundreds > (thousands?) of template literals or patterns per application, we need > to find an alternate approach for these /value carriers/. > > > Description > > In general terms, the *Carrier*?class simply caches anonymous classes > keyed on shape. To further increase similarity in shape, the ordering > of value types is handled by the API and not in the underlying > anonymous class. If one client requires an object with one object > value and one integer value and a second client requires an object > with one integer value and one object value, then both clients will > use the same underlying anonymous class. Further, types are folded as > either integer (byte, short, int, boolean, char, float), long (long, > double) or object. [We've seen that performance hit by folding the > long group into the integer group is significant, hence the separate > group.] > > The *Carrier*?API uses MethodType parameter types to describe the > shape of a carrier. This incorporates with the primary use case where > bootstrap methods need to capture indy non-static arguments. The API > has three static methods; > > |// Return a constructor MethodHandle for a carrier with components // > aligning with the parameter types of the supplied methodType. static > MethodHandle constructor(MethodType methodType) // Return a component > getter MethodHandle for component i. static MethodHandle > component(MethodType methodType, int i) // Return component getter > MethodHandles for all the carrier's components. static MethodHandle[] > components(MethodType methodType)| > > > Examples > > |import java.lang.runtime.Carrier; ... // Define the carrier > description. MethodType methodType = > MethodType.methodType(Object.class, byte.class, short.class, > char.class, int.class, long.class, float.class, double.class, > boolean.class, String.class); // Fetch the carrier constructor. > MethodHandle constructor = Carrier.constructor(methodType); // Create > a carrier object. Object object = > (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, 'C', > 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, 1.0f / 3.0f, 1.0 / 3.0, true, > "abcde"); // Get an array of accessors for the carrier object. > MethodHandle[] components = Carrier.components(methodType); // Access > fields. byte b = (byte)components[0].invokeExact(object); short s = > (short)components[1].invokeExact(object); char c > =(char)components[2].invokeExact(object); int i = > (int)components[3].invokeExact(object); long l = > (long)components[4].invokeExact(object); float f > =(float)components[5].invokeExact(object); double d = > (double)components[6].invokeExact(object); boolean tf > (boolean)components[7].invokeExact(object); String s = > (String)components[8].invokeExact(object)); // Access a specific > field. 
MethodHandle component = Carrier.component(methodType, 3); int > ii = (int)component.invokeExact(object);| > From brian.goetz at oracle.com Thu Mar 3 16:09:29 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 3 Mar 2022 11:09:29 -0500 Subject: [External] : Re: Primitive type patterns In-Reply-To: References: <8d47550c-b9c7-8eea-bb71-043b282984da@oracle.com> <445101940.9418309.1646070225203.JavaMail.zimbra@u-pem.fr> <0e6b0a6c-5148-2dd6-5195-d4ea5e5ea89d@oracle.com> <1252930995.9426030.1646073289788.JavaMail.zimbra@u-pem.fr> <82a8dd98-b8b2-e54e-4c2c-8a1c675614f2@oracle.com> Message-ID: > I'm in agreement on not adding new contexts but I had the opposite > impression here. Doesn't "having it do range checking" require a new > context as this is different from what assignment contexts allow > today? Or is it the case that regular, non-match assignment must be > total with no left over that allows them to use the same context > despite not being able to do the dynamic range check? As this > sentence shows, I'm confused on how dynamic range checking fits in the > existing assignment context. > > Or are we suggesting that assignment allows: > > byte b = new Long(5); > > to succeed if we can unbox + meet the dynamic range check? I'm > clearly confused here. At a meta level, the alignment target is: ?? - given a target type `T` ?? - given an expression `e : E` then: ? - being able to statically determine whether `T t` matches `e` should be equivalent to whether the assignment `T t = e` is valid under the existing 5.2 rules. That is to say, the existing 5.2 rules may look like a bag of ad-hoc, two-for-one-on-tuesday rules, but really, they will be revealed to be the set of conversions that are consistent with statically determining whether `T t` matches `e : E`.? Most of these rules involve only T and E (e.g., widening primitive conversion), but one of them is about ranges, which we can only statically assess when `e` is a constant. > >> Conversions like unboxing or casting are burdened by the fact that they >> have to be total, which means the "does it fit" / "if so, do it" / "if >> not, do something else (truncate, throw, etc)" all have to be crammed >> into a single operation. What pattern matching is extracts the "does it >> fit, and if so do it" into a more primitive operation, from which other >> operations can be composed. > Is it accurate to say this is less reusing assignment context and more > completely replacing it with a new pattern context from which > assignment can be built on top of? Yes!? Ideally, this is one of those "jack up the house and provide a solid foundation" moves. > >> At some level, what I'm proposing is all spec-shuffling; we'll either >> say "a widening primitive conversion is allowed in assignment context", >> or we'll say that primitive `P p` matches any primitive type Q that can >> be widened to P. We'll end up with a similar number of rules, but we >> might be able to "shake the box" to make them settle to a lower energy >> state, and be able to define (whether we explicitly do so or not) >> assignment context to support "all the cases where the LHS, viewed as a >> type pattern, are exhaustive on the RHS, potentially with remainder, and >> throws if remainder is encountered." (That's what unboxing does; throws >> when remainder is encountered.) > Ok. So maybe I'm not confused. We'd allow the `byte b = new Long(5);` > code to compile and throw not only on a failed unbox, but also on a > dynamic range check failure. 
No ;) Today, we would disallow this assignment because it is not an unboxing followed by a primitive widening.? (The opposite, long l = new Byte(3), would be allowed today, except that we took away these constructors so you have to use valueOf.)? We would only allow a narrowing if the RHS were a constant, like "5", in which case the compiler would statically evaluate the range check and narrow 5 to byte. Tomorrow, the assignment would be the same; assignment works based on "statically determined to match", and we can only statically determine the range check if we know the target value, i.e., its a constant.? But, if you *asked*, then you can get a dynamic range check: ??? if (anInt matches byte b) // we get a range check here The reason we don't do that with assignment is we don't know what to do if it doesn't match.? But if its in a conditional context (if or switch), then the programmer is going to tell us what to do if it doesn't match. > If we took this "dynamic hook" behaviour to the limit, what other new > capabilities does it unlock? Is this the place to connect other > user-supplied conversion operations as well? Maybe I'm running too > far with this idea but it seems like this could be laying the > groundwork for other interesting behaviours. Am I way off in the > weeds here? Not entirely in the weeds.? The problem with assignment, casting, and all of those things is that they have to be total; when you say "x = y" then the guarantee is that *something* got assigned to x. Now, we are already cheating a bit, because `x = y` allows unboxing, and unboxing can throw.? (Sounds like remainder rejection!)?? Now, imagine we had an "assign or else" construct (with static types A and B): ??? a := (b, e) then this would mean ??? if (b matches A aa) ??????? a = aa ??? else ??????? a = e? // and maybe e is really a function of b In the case of unboxing conversions, our existing assignment works kind of like: ??? a := (b, throw new NPE) because we'd try to match, and if it fails, evaluate the second component, which throws. Obviously I'm not suggesting we tinker with assignment in this way, but the point is: pattern matching gives you a chance to stop and say: "don't do it yet, but if you did it, would it work?" > >> >>> Intuitively, the behaviour you propose is kind of what we want - all >>> the possible byte cases end up in the byte case and we don't need to >>> adapt the long case to handle those that would have fit in a byte. >>> I'm slightly concerned that this changes Java's historical approach >>> and may lead to surprises when refactoring existing code that treats >>> unbox(Long) one way and unbox(Short) another. Will users be confused >>> when the unbox(Long) in the short right range ends up in a case that >>> was only intended for unbox(Short)? I'm having a hard time finding an >>> example that would trip on this but my lack of imagination isn't >>> definitive =) >> I'm worried about this too. We examined it briefly, and ran away, when >> we were thinking about constant patterns, specifically: >> >> Object o = ... >> switch (o) { >> case 0: ... >> default: ... >> } >> >> What would this mean? What I wouldn't want it to mean is "match Long 0, >> Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the >> line for "magic". (Note that this is about defining what the _constant >> pattern_ means, not the primitive type pattern.) 
I think its probably >> reasonable to say this is a type error; 0 is applicable to primitive >> numerics and their boxes, but not to Number or Object. I think that is >> consistent with what I'm suggesting about primitive type patterns, but >> I'd have to think about it more. > Object o =... > switch(o) { > case (long)0: ... // can we say this? Probably not > case long l && l == 0: // otherwise this would become the way to > catch most of the constant 0 cases > default: .... > } > > I'm starting to think the constant pattern will feel less like magic > once the dynamic range checking becomes commonplace. Probably can't say `case (long) 0`, but you can say `case 0L`. Though we don't have suffixes for all the types. > >> One reason this is especially undesirable is that one of the forms of >> let-bind is a let-bind *expression*: >> >> let P = p, Q = q >> in >> >> which is useful for pulling out subexpressions and binding them to a >> variable, but for which the scope of that variable is limited. If >> refactoring from: >> > Possible typo in the example. Attempted to fix: > >> int x = stuff; >> m(f(x)); >> >> to >> >> m(let x = stuff in f(x)) >> // x no longer in scope here > Not sure I follow this example. I'm not sure why introducing a new > variable in this scope is useful. Two reasons: narrower scope for locals, and turning statements into expressions. A common expression with redundant subexpressions is "last 3 characters of string": ??? last3 = s.substring(s.length() - 3, s.length()) We can refactor to ??? int sLen = s.length(); ??? last3 = s.substring(sLen - 3, sLen); but some people dislike this because now the rest of the scope is "polluted" with a garbage variable.? A let expression narrows the scope of sLen: ??? last3 = let sLen = s.length() ??????????????? in s.substring(sLen - 3, sLen); This becomes more important when we want to use the result in, say, a method call; now we have to unroll the declaration of any helper statements (e.g., `int sLen = s.length()`) to outside the method call.? A similar thing happens when we want to create an object, mutate it, and return it; this often requires statements, but a let expression turns it back into an expression. From forax at univ-mlv.fr Thu Mar 3 18:20:05 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 3 Mar 2022 19:20:05 +0100 (CET) Subject: Proposal: java.lang.runtime.Carrier In-Reply-To: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com> References: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com> Message-ID: <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr> For the pattern matching, we also need a 'with' method, that return a method handle that takes a carrier and a value and return a new carrier with the component value updated. static MethodHandle withComponent(MethodType methodType, int i) // returns a mh (Carrier;T) -> Carrier with T the type of the component It can be built on top of constructor() + component() but i think that i should be part of the API instead of every user of the Carrier API trying to re-implement it. In term of spec, Jim, can you rename "component getter" to "component accessor" which is the term used by records. R?mi > From: "Brian Goetz" > To: "Jim Laskey" , "amber-spec-experts" > > Sent: Thursday, March 3, 2022 4:29:51 PM > Subject: Re: Proposal: java.lang.runtime.Carrier > Thanks Jim. 
> As background, (some form of) this code originated in a prototype for pattern > matching, where we needed a carrier for a tuple (T, U, V) to carry the results > of a match from a deconstruction pattern (or other declared pattern) on the > stack as a return value. We didn't want to spin a custom class per pattern, and > we didn't want to commit to the actual layout, because we wanted to preserve > the ability to switch later to a value class. So the idea is you describe the > carrier you want as a MethodType, and there's a condy that gives you an MH that > maps that shape of arguments to an opaque carrier (the constructor), and other > condys that give you MHs that map from the carrier to the individual bindings. > So pattern matching will stick those MHs in CP slots. > The carrier might be some bespoke thing (e.g., record anon(T t, U u, V v)), or > something that holds an Object[], or something with three int fields and two > ref fields, or whatever the runtime decides to serve up. > The template mechanism wants almost exactly the same thing for bundling the > parameters for uninterprted template strings. > Think of it as a macro-box; instead of boxing primitives to Object and Objects > to varargs, there's a single boxing operation from a tuple to an opaque type. > On 3/3/2022 8:57 AM, Jim Laskey wrote: >> We propose to provide a runtime anonymous carrier class object generator ; >> java.lang.runtime.Carrier . This generator class is designed to share anonymous >> classes when shapes are similar. For example, if several clients require >> objects containing two integer fields, then Carrier will ensure that each >> client generates carrier objects using the same underlying anonymous class. >> Providing this mechanism decouples the strategy for carrier class generation >> from the client facility. One could implement one class per shape; one class >> for all shapes (with an Object[]), or something in the middle; having this >> decision behind a bootstrap means that it can be evolved at runtime, and >> optimized differently for different situations. Motivation >> The [ https://bugs.openjdk.java.net/browse/JDK-8273943 | String Templates JEP >> draft ] proposes the introduction of a TemplatedString object for the primary >> purpose of carrying the template and associated values derived from a template >> literal . To avoid value boxing, early prototypes described these carrier >> objects using per-callsite anonymous classes shaped by value types, The use of >> distinct anonymous classes here is overkill, especially considering that many >> of these classes are similar; containing one or two object fields and/or one or >> two integral fields. Pattern matching has a similar issue when carrying the >> values for the holes of a pattern. With potentially hundreds (thousands?) of >> template literals or patterns per application, we need to find an alternate >> approach for these value carriers . Description >> In general terms, the Carrier class simply caches anonymous classes keyed on >> shape. To further increase similarity in shape, the ordering of value types is >> handled by the API and not in the underlying anonymous class. If one client >> requires an object with one object value and one integer value and a second >> client requires an object with one integer value and one object value, then >> both clients will use the same underlying anonymous class. Further, types are >> folded as either integer (byte, short, int, boolean, char, float), long (long, >> double) or object. 
[We've seen that performance hit by folding the long group >> into the integer group is significant, hence the separate group.] >> The Carrier API uses MethodType parameter types to describe the shape of a >> carrier. This incorporates with the primary use case where bootstrap methods >> need to capture indy non-static arguments. The API has three static methods; >> // Return a constructor MethodHandle for a carrier with components >> // aligning with the parameter types of the supplied methodType. >> static MethodHandle constructor(MethodType methodType) >> // Return a component getter MethodHandle for component i. >> static MethodHandle component(MethodType methodType, int i) >> // Return component getter MethodHandles for all the carrier's components. >> static MethodHandle[] components(MethodType methodType) >> Examples >> import java.lang.runtime.Carrier; >> ... >> // Define the carrier description. >> MethodType methodType = >> MethodType.methodType(Object.class, byte.class, short.class, >> char.class, int.class, long.class, >> float.class, double.class, >> boolean.class, String.class); >> // Fetch the carrier constructor. >> MethodHandle constructor = Carrier.constructor(methodType); >> // Create a carrier object. >> Object object = (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, >> 'C', 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, >> 1.0f / 3.0f, 1.0 / 3.0, >> true, "abcde"); >> // Get an array of accessors for the carrier object. >> MethodHandle[] components = Carrier.components(methodType); >> // Access fields. >> byte b = (byte)components[0].invokeExact(object); >> short s = (short)components[1].invokeExact(object); >> char c =(char)components[2].invokeExact(object); >> int i = (int)components[3].invokeExact(object); >> long l = (long)components[4].invokeExact(object); >> float f =(float)components[5].invokeExact(object); >> double d = (double)components[6].invokeExact(object); >> boolean tf (boolean)components[7].invokeExact(object); >> String s = (String)components[8].invokeExact(object)); >> // Access a specific field. >> MethodHandle component = Carrier.component(methodType, 3); >> int ii = (int)component.invokeExact(object); From brian.goetz at oracle.com Thu Mar 3 18:29:21 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 3 Mar 2022 13:29:21 -0500 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr> References: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com> <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr> Message-ID: > For the pattern matching, > we also need a 'with' method, that return a method handle that takes a > carrier and a value and return a new carrier with the component value > updated. It is not clear to me why we "need" this.? Rather than jumping right to "Here is the solution", can you instead try to shine some light on the problem you are trying to solve? > In term of spec, Jim, can you rename "component getter" to "component > accessor" which is the term used by records. "Accessor" is a perfectly OK term, but remember, these are not records, and "record component" means something.? I think its OK to use accessor here, not because its what records do, but because we don't need to give people the idea that this has something to do with beans. 
From kasperni at gmail.com Thu Mar 3 18:36:45 2022 From: kasperni at gmail.com (Kasper Nielsen) Date: Thu, 3 Mar 2022 18:36:45 +0000 Subject: Proposal: java.lang.runtime.Carrier In-Reply-To: References: Message-ID: Looks good. I don't know about the name though. Are there still plans for adding a java.util.concurrent.Carrier type? https://mail.openjdk.java.net/pipermail/loom-dev/2020-March/001122.html What about java.lang.runtime.ObjectShapes instead? Would fit in with ObjectMethods. /Kasper On Thu, 3 Mar 2022 at 13:59, Jim Laskey wrote: > We propose to provide a runtime anonymous carrier class object generator; > java.lang.runtime.Carrier. This generator class is designed to share > anonymous classes when shapes are similar. For example, if several clients > require objects containing two integer fields, then Carrier will ensure > that each client generates carrier objects using the same underlying > anonymous class. > From forax at univ-mlv.fr Thu Mar 3 18:42:01 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 3 Mar 2022 19:42:01 +0100 (CET) Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: References: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com> <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr> Message-ID: <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" , "Jim Laskey" > Cc: "amber-spec-experts" > Sent: Thursday, March 3, 2022 7:29:21 PM > Subject: Re: [External] : Re: Proposal: java.lang.runtime.Carrier >> For the pattern matching, >> we also need a 'with' method, that return a method handle that takes a carrier >> and a value and return a new carrier with the component value updated. > It is not clear to me why we "need" this. Rather than jumping right to "Here is > the solution", can you instead try to shine some light on the problem you are > trying to solve? When you have nested record patterns, each of these patterns contribute to introduce bindings, so when executing the code of the pattern matching, the code that match a nested pattern needs to add values into the carrier object. Given that the Carrier API is non mutable, we need the equivalent of a functional setter, a wither. R?mi From maurizio.cimadamore at oracle.com Thu Mar 3 18:49:43 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 3 Mar 2022 18:49:43 +0000 Subject: Proposal: java.lang.runtime.Carrier In-Reply-To: References: Message-ID: Seems sensible. As a possible "test", we could perhaps use this mechanism in the JDK implementation of LambdaForms? We do have places where we spin "species" classes: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/invoke/ClassSpecializer.java (that said, maybe species classes contain a bit more than just data, so perhaps that's a wrong fit - but anyway, worth talking a look for possible code duplication). Maurizio On 03/03/2022 13:57, Jim Laskey wrote: > > We propose to provide a runtime /anonymous carrier class object > generator/; *java.lang.runtime.Carrier*. This generator class is > designed to share /anonymous classes/?when shapes are similar. For > example, if several clients require objects containing two integer > fields, then *Carrier*?will ensure that each client generates carrier > objects using the same underlying anonymous class. > > Providing this mechanism decouples the strategy for carrier class > generation from the client facility. 
One could implement one class per > shape; one class for all shapes (with an Object[]), or something in > the middle; having this decision behind a bootstrap means that it can > be evolved at runtime, and optimized differently for different situations. > > > Motivation > > The String Templates JEP draft > ?proposes the > introduction of a /TemplatedString/?object for the primary purpose of > /carrying/?the /template/?and associated /values/?derived from a > /template literal/. To avoid value boxing, early prototypes described > these /carrier/objects using /per-callsite/?anonymous classes shaped > by value types, The use of distinct anonymous classes here is > overkill, especially considering that many of these classes are > similar; containing one or two object fields and/or one or two > integral fields. /Pattern matching/?has a similar issue when carrying > the values for the /holes/?of a pattern. With potentially hundreds > (thousands?) of template literals or patterns per application, we need > to find an alternate approach for these /value carriers/. > > > Description > > In general terms, the *Carrier*?class simply caches anonymous classes > keyed on shape. To further increase similarity in shape, the ordering > of value types is handled by the API and not in the underlying > anonymous class. If one client requires an object with one object > value and one integer value and a second client requires an object > with one integer value and one object value, then both clients will > use the same underlying anonymous class. Further, types are folded as > either integer (byte, short, int, boolean, char, float), long (long, > double) or object. [We've seen that performance hit by folding the > long group into the integer group is significant, hence the separate > group.] > > The *Carrier*?API uses MethodType parameter types to describe the > shape of a carrier. This incorporates with the primary use case where > bootstrap methods need to capture indy non-static arguments. The API > has three static methods; > > |// Return a constructor MethodHandle for a carrier with components // > aligning with the parameter types of the supplied methodType. static > MethodHandle constructor(MethodType methodType) // Return a component > getter MethodHandle for component i. static MethodHandle > component(MethodType methodType, int i) // Return component getter > MethodHandles for all the carrier's components. static MethodHandle[] > components(MethodType methodType)| > > > Examples > > |import java.lang.runtime.Carrier; ... // Define the carrier > description. MethodType methodType = > MethodType.methodType(Object.class, byte.class, short.class, > char.class, int.class, long.class, float.class, double.class, > boolean.class, String.class); // Fetch the carrier constructor. > MethodHandle constructor = Carrier.constructor(methodType); // Create > a carrier object. Object object = > (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, 'C', > 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, 1.0f / 3.0f, 1.0 / 3.0, true, > "abcde"); // Get an array of accessors for the carrier object. > MethodHandle[] components = Carrier.components(methodType); // Access > fields. 
byte b = (byte)components[0].invokeExact(object); short s = > (short)components[1].invokeExact(object); char c > =(char)components[2].invokeExact(object); int i = > (int)components[3].invokeExact(object); long l = > (long)components[4].invokeExact(object); float f > =(float)components[5].invokeExact(object); double d = > (double)components[6].invokeExact(object); boolean tf > (boolean)components[7].invokeExact(object); String s = > (String)components[8].invokeExact(object)); // Access a specific > field. MethodHandle component = Carrier.component(methodType, 3); int > ii = (int)component.invokeExact(object);| > From brian.goetz at oracle.com Thu Mar 3 18:59:01 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 3 Mar 2022 13:59:01 -0500 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr> References: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com> <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr> <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr> Message-ID: <0d8dd8fc-1f4a-eb01-8ff2-102de73d6355@oracle.com> > For the pattern matching, > we also need a 'with' method, that return a method handle that > takes a carrier and a value and return a new carrier with the > component value updated. > > > It is not clear to me why we "need" this.? Rather than jumping > right to "Here is the solution", can you instead try to shine some > light on the problem you are trying to solve? > > > When you have nested record patterns, each of these patterns > contribute to introduce bindings, so when executing the code of the > pattern matching, the code that match a nested pattern needs to add > values into the carrier object. Given that the Carrier API is non > mutable, we need the equivalent of a functional setter, a wither. > I don't think we need to do this. Recall what nested patterns means: if R(T t) is a record, and Q is a pattern that is not total, then ??? x matches R(Q) means ??? x matches R(T t) && t matches Q So if we have ??? record R(S s) { } ??? record S(int a, int b) { } then ??? case R(S(var a, var b)) operates by matching the target to R, deconstructing with the R deconstructor, which yields a carrier of shape (S).? Then we further match the first component of this carrier to the S deconstructor, which yields a carrier of shape (II).? No mutation needed. Note that this unrolling can happen in one of two ways: ?- The compiler just unrolls it doing plain vanilla compiler stuff ?- A pattern runtime has a nesting combinator that takes a pattern description for an outer and an inner pattern, which when evaluated with R and S, yields a carrier of shape (R;S;II), the compiler evaluates this nesting combinator with condy, and uses that to do the match. Either way, we don't need to mutate or replace carriers. From brian.goetz at oracle.com Thu Mar 3 20:42:25 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 3 Mar 2022 15:42:25 -0500 Subject: Telling the totality story Message-ID: Given the misconceptions about totality and whether "pattern matching means the same thing in all contexts", it is pretty clear that the story we're telling is not evoking the right concepts.? It is important not only to have the right answer, but also to have the story that helps people understand why its right, so let's take another whack at that. (The irony here is that it only makes a difference at one point -- null -- which is the least interesting part of the story.? 
So it is much sound and fury that signifies nothing.) None of what we say below represents a change in the language; it is just test-driving a new way to tell the same story.? (From a mathematical and specification perspective, it is a much worse story, since it is full of accidental detail about nulls (among other sins), but it might make some people happier, and maybe we can learn something from the reaction.) The key change here is to say that every pattern-matching construct needs special rules for null.? This is already true, so we're just doubling down on it.? None of this changes anything about how it works. #### Instanceof We define: ??? x instanceof P === (x != null) && x matches P This means that instanceof always says false on null; on non-null, we can ask the pattern.? This is consistent with instanceof today.? P will just never see a null in this case. We'll define "matches" soon. #### Switch We play the same game with switch.? A switch: ??? switch (x) { ??????? case P1: ??????? case P2: ??????? ... ??? } is defined to mean: ?- if x is null ?? - if one of P1..Pn is the special label "null", then we take that branch ?? - else we throw NPE ?- otherwise, we start trying to match against P1..Pn sequentially ?- if none match (its remainder), throw MatchException This is consistent with switch today, in that there are no null labels, so null always falls into the NPE bucket.? It is also easy to keep track of; if the switch does not say "case null", it does not match null.? None of the patterns P1..Pn will ever see a null. #### Covering patterns Let's define what it means for a pattern to cover a type T. This is just a new name for totality; we say that an "any" pattern covers all types T, and a type pattern `U u` covers T when `T <: U`.? (When we do primitive type patterns, there will be more terms in this list, but they'll still be type patterns.)? No other pattern is a covering pattern.? Covering patterns all have a single binding (`var x`, `String s`). #### Let This is where it gets ugly.? Let has no opinions about null, but the game here is to pretend it does.? So in a let statement: ??? let P = e ?- Evaluate e ?- If e is null ?? - If P is a covering pattern, its binding is bound to null; ?? - else we throws NPE ?- Otherwise, e is matched to P.? If it does not match (its remainder), a MatchException is thrown (or the else clause is taken, if one is present.) #### Nesting Given a nested pattern R(P), where R is `record R(T t) { }`, this means: ?- Match the target to record R, and note the resulting t ?? - if P is a covering pattern, we put the resulting `t` in the binding of P without further matching ?? - Otherwise, we match t to P In other words: when a "non trivial" (i.e., non covering) pattern P is nested in a record pattern, then: ??? x instanceof R(P) === x instanceof R(var a) && a instanceof P and a covering pattern is interpreted as "not really a pattern", and we just slam the result of the outer binding into the inner binding. #### Matching What we've done now is ensure that no pattern ever actually encounters a null, because the enclosing constructs always filter the nulls out.? This means we can say: ?- var patterns match everything ?- `T t` is gated by `instanceof T` ?- R(P) is gated by `instanceof R` ## Is this a better way to tell the story? 
This is clearly a much more complex way to tell the story; every construct must have a special case for null, and we must be prepared to treat simple patterns (`var x`, `String s`) not always as patterns, but sometimes as mere declarations whose bindings will be forcibly crammed from "outside". Note too this will not scale to declared total patterns that treat null as a valid target value, if we ever want them; it depends on being able to "peek inside" the pattern from the outside, and say "oh, you're a covering pattern, I can set your binding without asking you."? That won't work with declared total patterns, which are opaque and may have multiple bindings.? (Saying "this doesn't bother me" is saying "I'm fine with a new null gate for total declared patterns.") It also raises the profile of null in the story, which kind of sucks; while it is not new "null gates", it is new stories that is prominently parameterized by null.? (Which is a funny outcome for a system explicitly designed to *not* create new null gates!) #### Further editorializing So, having written it out, I dislike it even more than that I thought I would when I started writing this mail. Overall this seems like a lot of mental gymnastics to avoid acknowledging the reality that pattern matching has three parameters, not two -- the pattern, the target, and the *static type* of the target. I think the root cause here is that when we faced the lump-or-split decision of "do we reuse instanceof for pattern matching or not" back in Java 13, the choice to lump (which was a rational and defensible choice, but so was splitting) created an expectation that "no pattern ever matches null", because instanceof is the first place people saw patterns, and instanceof never matches null.? While this expectation is only consistent with the simplest of patterns (and is only problematic at null), there seems to be a desire to force "the semantics of matching pattern P is whatever instanceof says matches P." I suspect that had we chosen the split route for instanceof (which would not have been as silly as choosing the split route for switch, but still lumping seemed better), we would not be facing this situation.? But even if we decide, in hindsight, that lumping was a mistake, I do not want to compound this mistake by distorting the meaning of pattern matching to cater to it.? It might seem harmless from the outside, but it will dramatically limit how far we can push pattern matching in the future. ## Tweaking the current story So, I think the main thing we can control about the story is the terminology.? I think part of what people find confusing is the use of the term "total", since that's a math-y term, and also it collides with "exhaustive", which is similar but not entirely coincident. I won't try to find new terms here, but I think there are three relevant terms needed: ?- "Pattern set P* is exhaustive on type T."? This is a static property; it means that the pattern set P* covers all the "reasonable" values of T.? Null is only one "potentially unreasonable" value. ?- "Pattern P matches expression e".? This is a dynamic property. - "Pattern P is total on type T".? This is both a static and dynamic property; it means the pattern matches all values of T, and that we know this statically. What we've been calling "remainder" is the set of T values that are not matched by any of the patterns in a set P* that is exhaustive on T.? (Exhaustiveness is static; matching is dynamic; the difference is remainder.) 
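(To make the three terms concrete, a small illustration; the Shape types
here are hypothetical, and the switch uses the pattern-matching-for-switch
syntax under discussion.)

    // Hypothetical types, for illustrating exhaustive vs. total vs. remainder.
    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    static double area(Shape s) {
        // The pattern set {Circle c, Square q} is exhaustive on Shape, so the
        // switch compiles without a default.  Neither pattern is total on
        // Shape, though: null, and any Shape subtype introduced after
        // compilation, are in the remainder and surface only at runtime.
        return switch (s) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Square q -> q.side() * q.side();
        };
    }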
I think the name "exhaustive" is pretty good; we've been using the term "exhaustiveness checking" for switch for a while.? The problem could be "matches", as it has a dynamic sound to it; the problem could also be "total".? The above excursion calls totality "covering", but having both "exhaustive" and "covering" in the same scheme is likely to confuse people, so I won't suggest that.? Perhaps "statically total"?? This has the nice implication that matching is statically known to always succeed.? This would give us: ?- If P is statically total on T, it is exhaustive on T; ?- Any patterns (var x) are statically total on all T; ?- Type patterns `T t` are statically total on `U <: T` (and more, when we get to primitive type patterns) ?- Statically total patterns match null (because they match everything) ?- instanceof intercepts nulls: x instanceof P === x != null && x matches P ?- switch intercepts nulls: as above ?- Let does not have any new opinions about null ?- Nesting unrolls to instanceof only for non-statically-total nested patterns Also I think being more explicit about the switch/instanceof rules will help, as this filters out all top-level null reasoning. From brian.goetz at oracle.com Thu Mar 3 23:22:08 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 3 Mar 2022 18:22:08 -0500 Subject: [External] : Re: Primitive type patterns In-Reply-To: References: <8d47550c-b9c7-8eea-bb71-043b282984da@oracle.com> <445101940.9418309.1646070225203.JavaMail.zimbra@u-pem.fr> <0e6b0a6c-5148-2dd6-5195-d4ea5e5ea89d@oracle.com> <1252930995.9426030.1646073289788.JavaMail.zimbra@u-pem.fr> <82a8dd98-b8b2-e54e-4c2c-8a1c675614f2@oracle.com> Message-ID: I read JLS 5.2 more carefully and discovered that while assignment context supports primitive narrowing from int-and-smaller to smaller-than-that: ??? byte b = 0 it does not support primitive narrowing from long to int: ??? int x = 0L? // error My best guess at rationale is that because there is no suffix for int/short/byte, then int literals are like "poly expressions" but long literals are just long literals.? That's an irritating asymmetry (but, fixable.) > In addition, if the expression is a constant expression (?15.29) of > type byte, short, > char, or int: > ? A narrowing primitive conversion may be used if the variable is of > type byte, > short, or char, and the value of the constant expression is > representable in the > type of the variable. > ? A narrowing primitive conversion followed by a boxing conversion may > be used > if the variable is of type Byte, Short, or Character, and the value of > the constant > expression is representable in the type byte, short, or char respectively. On 3/3/2022 10:17 AM, Dan Heidinga wrote: > On Wed, Mar 2, 2022 at 3:13 PM Brian Goetz wrote: >> >> >> On 3/2/2022 1:43 PM, Dan Heidinga wrote: >>> Making the pattern match compatible with assignment conversions makes >>> sense to me and follows a similar rationale to that used with >>> MethodHandle::asType following the JLS 5.3 invocation conversions. >>> Though with MHs we had the ability to add additional conversions under >>> MethodHandles::explicitCastArguments. With pattern matching, we don't >>> have the same ability to make the "extra" behaviour opt-in / opt-out. >>> We just get one chance to pick the right behaviour. >> Indeed. And the thing that I am trying to avoid here is creating _yet >> another_ new context in which a different bag of ad-hoc conversions are >> possible. 
While it might be justifiable from a local perspective to say >> "its OK if `int x` does unboxing, but having it do range checking seems >> new and different, so let's not do that", from a global perspective, >> that means we a new context ("pattern match context") to add to >> assignment, loose invocation, strict invocation, cast, and numeric >> contexts. That is the kind of incremental complexity I'd like to avoid, >> if there is a unifying move we can pull. > I'm in agreement on not adding new contexts but I had the opposite > impression here. Doesn't "having it do range checking" require a new > context as this is different from what assignment contexts allow > today? Or is it the case that regular, non-match assignment must be > total with no left over that allows them to use the same context > despite not being able to do the dynamic range check? As this > sentence shows, I'm confused on how dynamic range checking fits in the > existing assignment context. > > Or are we suggesting that assignment allows: > > byte b = new Long(5); > > to succeed if we can unbox + meet the dynamic range check? I'm > clearly confused here. > >> Conversions like unboxing or casting are burdened by the fact that they >> have to be total, which means the "does it fit" / "if so, do it" / "if >> not, do something else (truncate, throw, etc)" all have to be crammed >> into a single operation. What pattern matching is extracts the "does it >> fit, and if so do it" into a more primitive operation, from which other >> operations can be composed. > Is it accurate to say this is less reusing assignment context and more > completely replacing it with a new pattern context from which > assignment can be built on top of? > >> At some level, what I'm proposing is all spec-shuffling; we'll either >> say "a widening primitive conversion is allowed in assignment context", >> or we'll say that primitive `P p` matches any primitive type Q that can >> be widened to P. We'll end up with a similar number of rules, but we >> might be able to "shake the box" to make them settle to a lower energy >> state, and be able to define (whether we explicitly do so or not) >> assignment context to support "all the cases where the LHS, viewed as a >> type pattern, are exhaustive on the RHS, potentially with remainder, and >> throws if remainder is encountered." (That's what unboxing does; throws >> when remainder is encountered.) > Ok. So maybe I'm not confused. We'd allow the `byte b = new Long(5);` > code to compile and throw not only on a failed unbox, but also on a > dynamic range check failure. > > If we took this "dynamic hook" behaviour to the limit, what other new > capabilities does it unlock? Is this the place to connect other > user-supplied conversion operations as well? Maybe I'm running too > far with this idea but it seems like this could be laying the > groundwork for other interesting behaviours. Am I way off in the > weeds here? > >> As to the range check, it has always bugged me that you see code that >> looks like: >> >> if (i >= -127 && i <= 128) { byte b = (byte) i; ... } >> >> because of the accidental specificity, and the attendant risk of error >> (using <= instead of <, or using 127 instead of 128). Being able to say: >> >> if (i instanceof byte b) { ... } >> >> is better not because it is more compact, but because you're actually >> asking the right question -- "does this int value fit in a byte." I'm >> sad we don't really have a way to ask this question today; it seems an >> omission. 
> I had been thinking about this when I wrote my response and I like > having the compiler generate the range check for me. As you say, way > easier to avoid errors that way. > >>> Intuitively, the behaviour you propose is kind of what we want - all >>> the possible byte cases end up in the byte case and we don't need to >>> adapt the long case to handle those that would have fit in a byte. >>> I'm slightly concerned that this changes Java's historical approach >>> and may lead to surprises when refactoring existing code that treats >>> unbox(Long) one way and unbox(Short) another. Will users be confused >>> when the unbox(Long) in the short right range ends up in a case that >>> was only intended for unbox(Short)? I'm having a hard time finding an >>> example that would trip on this but my lack of imagination isn't >>> definitive =) >> I'm worried about this too. We examined it briefly, and ran away, when >> we were thinking about constant patterns, specifically: >> >> Object o = ... >> switch (o) { >> case 0: ... >> default: ... >> } >> >> What would this mean? What I wouldn't want it to mean is "match Long 0, >> Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the >> line for "magic". (Note that this is about defining what the _constant >> pattern_ means, not the primitive type pattern.) I think its probably >> reasonable to say this is a type error; 0 is applicable to primitive >> numerics and their boxes, but not to Number or Object. I think that is >> consistent with what I'm suggesting about primitive type patterns, but >> I'd have to think about it more. > Object o =... > switch(o) { > case (long)0: ... // can we say this? Probably not > case long l && l == 0: // otherwise this would become the way to > catch most of the constant 0 cases > default: .... > } > > I'm starting to think the constant pattern will feel less like magic > once the dynamic range checking becomes commonplace. > > >>> Something like following shouldn't be surprising given the existing >>> rules around unbox + widening primitive conversion (though it may be >>> when first encountered as I expect most users haven't really >>> internalized the JLS 5.2 rules): >> As Alex said to me yesterday: "JLS Ch 5 contains many more words than >> any prospective reader would expect to find on the subject, but once the >> reader gets over the overwhelm of how much there is to say, will find >> none of the words surprising." There's a deeper truth to this >> statement: Java is not actually as simple a language as its mythology >> suggests, but we win by hiding the complexity in places users generally >> don't have to look, and if and when they do confront the complexity, >> they find it unsurprising, and go back to ignoring it. >> >> So in point of fact, *almost no one* has read JLS 5.2, but it still does >> "what users would likely find reasonable". >> >>> Number n = ....; >>> switch(n) { >>> case long l -> ... >>> case int i -> .... // dead code >>> case byte b -> .... // dead code >>> default -> .... >>> } >> Correct. We have rules for pattern dominance, which are used to give >> compile errors on dead cases; we'd have to work through the details to >> confirm that `long l` dominates `int i`, but I'd hope this is the case. >> >>> But this may be more surprising as I suggested above >>> >>> Number n = new Long(5); >>> switch(n) { >>> case byte b -> .... // matches here >>> case int i -> .... // >>> case long l -> ... >>> default -> .... 
>>> } >>> >>> Overall, I like the extra dynamic range check but would be fine with >>> leaving it out if it complicates the spec given it feels like a pretty >>> deep-in-the-weeds corner case. >> It is probably not a forced move to support the richer interpretation of >> primitive patterns now. But I think the consequence of doing so may be >> surprising: rather than "simplifying the language" (as one might hope >> that "leaving something out" would do), I think there's a risk that it >> makes things more complicated, because (a) it effectively creates yet >> another conversion context that is distinct from the too-many we have >> now, and (b) creates a sharp edge where refactoring from local variable >> initialization to let-bind doesn't work, because assignment would then >> be looser than let-bind. > Ok. You're saying that the dynamic range check is essential enough > that it's worth a new context for if we can't adjust the meaning of > assignment context. > >> One reason this is especially undesirable is that one of the forms of >> let-bind is a let-bind *expression*: >> >> let P = p, Q = q >> in >> >> which is useful for pulling out subexpressions and binding them to a >> variable, but for which the scope of that variable is limited. If >> refactoring from: >> > Possible typo in the example. Attempted to fix: > >> int x = stuff; >> m(f(x)); >> >> to >> >> m(let x = stuff in f(x)) >> // x no longer in scope here > Not sure I follow this example. I'm not sure why introducing a new > variable in this scope is useful. > >> was not possible because of a silly mismatch between the conversions in >> let context and the conversions in assignment context, then we're >> putting users in the position of having to choose between richer >> conversions and richer scoping. > Ok. I think I see where this is going and while it may be clearer > with a larger example, I agree with the principle that this > refactoring should be possible. > > --Dan > >> (Usual warning (Remi): I'm mentioning let-expressions because it gives a >> sense of where some of these constraints come from, but this is not a >> suitable time to design the let-expression feature.) >> >> From forax at univ-mlv.fr Fri Mar 4 00:39:00 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 4 Mar 2022 01:39:00 +0100 (CET) Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <0d8dd8fc-1f4a-eb01-8ff2-102de73d6355@oracle.com> References: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com> <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr> <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr> <0d8dd8fc-1f4a-eb01-8ff2-102de73d6355@oracle.com> Message-ID: <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "Jim Laskey" , "amber-spec-experts" > > Sent: Thursday, March 3, 2022 7:59:01 PM > Subject: Re: [External] : Re: Proposal: java.lang.runtime.Carrier >>>> For the pattern matching, >>>> we also need a 'with' method, that return a method handle that takes a carrier >>>> and a value and return a new carrier with the component value updated. >>> It is not clear to me why we "need" this. Rather than jumping right to "Here is >>> the solution", can you instead try to shine some light on the problem you are >>> trying to solve? 
>> When you have nested record patterns, each of these patterns contribute to >> introduce bindings, so when executing the code of the pattern matching, the >> code that match a nested pattern needs to add values into the carrier object. >> Given that the Carrier API is non mutable, we need the equivalent of a >> functional setter, a wither. > I don't think we need to do this. > Recall what nested patterns means: if R(T t) is a record, and Q is a pattern > that is not total, then > x matches R(Q) > means > x matches R(T t) && t matches Q > So if we have > record R(S s) { } > record S(int a, int b) { } > then > case R(S(var a, var b)) > operates by matching the target to R, deconstructing with the R deconstructor, > which yields a carrier of shape (S). Then we further match the first component > of this carrier to the S deconstructor, which yields a carrier of shape (II). > No mutation needed. > Note that this unrolling can happen in one of two ways: > - The compiler just unrolls it doing plain vanilla compiler stuff > - A pattern runtime has a nesting combinator that takes a pattern description > for an outer and an inner pattern, which when evaluated with R and S, yields a > carrier of shape (R;S;II), the compiler evaluates this nesting combinator with > condy, and uses that to do the match. > Either way, we don't need to mutate or replace carriers. You want the same carrier for the whole pattern matching: - if you have a logical OR between patterns (not something in the current Java spec but Python, C# or clojure core.match have it so we may want to add an OR in the future) - if different cases starts with the same prefix of patterns, so you don't have to re-execute the de-constructors/pattern methods of the prefix several times It's like when you create a value type, you start with the default value with all the field initialized with their default value, and each time there is a new binding, you do the equivalent of a withfield. R?mi From forax at univ-mlv.fr Fri Mar 4 00:41:06 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 4 Mar 2022 01:41:06 +0100 (CET) Subject: [External] : Re: Primitive type patterns In-Reply-To: <8bb18d39-426b-1e45-ba60-0ea462024b15@oracle.com> References: <0e6b0a6c-5148-2dd6-5195-d4ea5e5ea89d@oracle.com> <1252930995.9426030.1646073289788.JavaMail.zimbra@u-pem.fr> <82a8dd98-b8b2-e54e-4c2c-8a1c675614f2@oracle.com> <1221659857.9463108.1646080532106.JavaMail.zimbra@u-pem.fr> <51f07830-a1bf-b705-2c89-53e71ccb066e@oracle.com> <693181479.10454030.1646249765675.JavaMail.zimbra@u-pem.fr> <8bb18d39-426b-1e45-ba60-0ea462024b15@oracle.com> Message-ID: <1312650162.11148649.1646354466019.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" > To: "Remi Forax" > Cc: "amber-spec-experts" > Sent: Wednesday, March 2, 2022 9:11:58 PM > Subject: Re: [External] : Re: Primitive type patterns > On 3/2/2022 2:36 PM, forax at univ-mlv.fr wrote: > >> There are two ways to express "match non null Integer + unboxing", >> this one >> Integer value = ... >> switch(value) { >> case Integer(int i) -> ... >> } >> >> And we already agree that we want that syntax. > > Wait, what?? The above is not yet on the table; we will have to wait for > deconstruction patterns on classes to be able to express that. When we > get there, we'll have a choice of whether we want to add a deconstructor > to the wrapper classes.? (At which point, you might well say "we already > have a way to do that"...) 
It's a bad faith argument, we know since the beginning that we need deconstructors. R?mi From forax at univ-mlv.fr Fri Mar 4 00:50:13 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 4 Mar 2022 01:50:13 +0100 (CET) Subject: Telling the totality story In-Reply-To: References: Message-ID: <1803574989.11148833.1646355013391.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "amber-spec-experts" > Sent: Thursday, March 3, 2022 9:42:25 PM > Subject: Telling the totality story > Given the misconceptions about totality and whether "pattern matching means the > same thing in all contexts", it is pretty clear that the story we're telling is > not evoking the right concepts. It is important not only to have the right > answer, but also to have the story that helps people understand why its right, > so let's take another whack at that. > (The irony here is that it only makes a difference at one point -- null -- which > is the least interesting part of the story. So it is much sound and fury that > signifies nothing.) > None of what we say below represents a change in the language; it is just > test-driving a new way to tell the same story. (From a mathematical and > specification perspective, it is a much worse story, since it is full of > accidental detail about nulls (among other sins), but it might make some people > happier, and maybe we can learn something from the reaction.) > The key change here is to say that every pattern-matching construct needs > special rules for null. This is already true, so we're just doubling down on > it. None of this changes anything about how it works. [...] > #### Let > This is where it gets ugly. Let has no opinions about null, but the game here is > to pretend it does. So in a let statement: > let P = e > - Evaluate e > - If e is null > - If P is a covering pattern, its binding is bound to null; > - else we throws NPE > - Otherwise, e is matched to P. If it does not match (its remainder), a > MatchException is thrown (or the else clause is taken, if one is present.) It's not clear to me why a MatchException should be thrown instead of not compiling if not exhaustive. It's mean that there are remainders that does not lead to either a NPE or an error, do you have an example of such remainder ? R?mi From brian.goetz at oracle.com Fri Mar 4 02:11:44 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 3 Mar 2022 21:11:44 -0500 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr> References: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com> <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr> <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr> <0d8dd8fc-1f4a-eb01-8ff2-102de73d6355@oracle.com> <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr> Message-ID: > > Either way, we don't need to mutate or replace carriers. > > > You want the same carrier for the whole pattern matching: I think you're going about this backwards.? You seem to have a clear picture of how pattern matching "should" be translated.? If so, you should share!? Maybe your way is better.? But you keep making statements like "we need" and "we want" without explaining why. > - if you have a logical OR between patterns (not something in the > current Java spec but Python, C# or clojure core.match have it so we > may want to add an OR in the future) OR combinators are a good point, but they can be done without a with operation. 
> - if different cases starts with the same prefix of patterns, so you
> don't have to re-execute the de-constructors/pattern methods of the
> prefix several times

Agree that optimizing away multiple invocations is good, but again, I
don't see that as being coupled to the pseudo-mutability of the carrier.

Perhaps you should start with how you see translation working?

From brian.goetz at oracle.com Fri Mar 4 02:23:58 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 3 Mar 2022 21:23:58 -0500
Subject: [External] : Re: Telling the totality story
In-Reply-To: <1803574989.11148833.1646355013391.JavaMail.zimbra@u-pem.fr>
References: <1803574989.11148833.1646355013391.JavaMail.zimbra@u-pem.fr>
Message-ID: <5e2e4cc7-56ef-c1bc-a8ba-4db6002582cd@oracle.com>

>     #### Let
>
>     This is where it gets ugly.  Let has no opinions about null, but
>     the game here is to pretend it does.  So in a let statement:
>
>         let P = e
>
>      - Evaluate e
>      - If e is null
>        - If P is a covering pattern, its binding is bound to null;
>        - else we throws NPE
>      - Otherwise, e is matched to P.  If it does not match (its
>     remainder), a MatchException is thrown (or the else clause is
>     taken, if one is present.)
>
> It's not clear to me why a MatchException should be thrown instead of
> not compiling if not exhaustive.

You're confusing "exhaustive" and "total".  A let must be exhaustive, but
exhaustiveness != totality.

> It's mean that there are remainders that does not lead to either a NPE
> or an error, do you have an example of such remainder ?

Yes.  Suppose we have records Box<T>(T t) and Pair<T, U>(T t, U u), and A
is sealed to B|C.  Then if we're matching on a Pair<Box<A>, A>, then

    Pair(null, B)
    Pair(Box(B), D)  // D is a type from the future
    Pair(Box(D), B)
    Pair(Box(D), D)
    Pair(null, D)

are all in the remainder.  It is a big stretch to claim either NPE or ICCE
is right for any of these, and completely arbitrary to pick one for
Pair(null, D).  Also, it's much more expensive to try to distinguish
between these, and pick the "right" error for each, rather than insert a
default clause that throws MatchRemainderException.

From forax at univ-mlv.fr Fri Mar 4 10:37:08 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 4 Mar 2022 11:37:08 +0100 (CET)
Subject: [External] : Re: Telling the totality story
In-Reply-To: <5e2e4cc7-56ef-c1bc-a8ba-4db6002582cd@oracle.com>
References: <1803574989.11148833.1646355013391.JavaMail.zimbra@u-pem.fr>
 <5e2e4cc7-56ef-c1bc-a8ba-4db6002582cd@oracle.com>
Message-ID: <1420997074.11396225.1646390228184.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "amber-spec-experts"
> Sent: Friday, March 4, 2022 3:23:58 AM
> Subject: Re: [External] : Re: Telling the totality story

>>> #### Let
>>> This is where it gets ugly. Let has no opinions about null, but the game here is
>>> to pretend it does. So in a let statement:
>>> let P = e
>>> - Evaluate e
>>> - If e is null
>>> - If P is a covering pattern, its binding is bound to null;
>>> - else we throws NPE
>>> - Otherwise, e is matched to P. If it does not match (its remainder), a
>>> MatchException is thrown (or the else clause is taken, if one is present.)

>> It's not clear to me why a MatchException should be thrown instead of not
>> compiling if not exhaustive.

> You're confusing "exhaustive" and "total". A let must be exhaustive, but
> exhaustiveness != totality.
>> It's mean that there are remainders that does not lead to either a NPE or an >> error, do you have an example of such remainder ? > Yes. Suppose we have records Box(T t) and Pair(T t, U u), and A is > sealed to B|C. Then if we're matching on a Pair, A>, then > Pair(null, B) > Pair(Box(B), D) // D is a type from the future > Pair(Box(D), B) > Pair(Box(D), D) > Pair(null, D) > are all in the remainder. It is a big stretch to claim either NPE or ICCE is > right for any of these, and completely arbitrary to pick one for Pair(null, D). > Also, its much more expensive to try to distinguish between these, and pick the > "right" error for each, rather than insert a default clause that throws > MatchRemainderException. The exception are here because the view at runtime and the view at compile time are slightly different - an NPE is raised when a value is null but the pattern ask for a deconstruction - an ICCE if the compiler has use a sealed type during it's analysis and the sealed type has changed if we have only one case Pair<>(Box<>(A a1), A a2) and i suppose a default then Pair(null, B) and Pair(null, D) throw a NPE and Pair(Box(B), D), Pair(Box(D), B) and Pair(Box(D), D) all match Now, if we have sealed A permits B, C { } and a switch that relies on A when doing the exhaustiveness check By example, switch(pair) { case Pair<>(Box<>(A a1), B a2) -> ... case Pair<>(Box<>(A a1), C a2) -> ... case Pair<>(Object o) -> ... } in that case, the compiler registers that at runtime before the first execution of the switch, A should only permit B and C, if it's not the case an ICCE should be thrown. In term of implementation, the ICCE checks can be done by the bootstrap method of the invokedynamic corresponding to the switch, so once before the switch/invokedynamic is linked. One open question, i think first asked by Tagir, is what if Object (in the example above) is not a super-type of Box anymore, should we also checks that at runtime and throw an ICCE. R?mi From brian.goetz at oracle.com Fri Mar 4 13:00:10 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 4 Mar 2022 08:00:10 -0500 Subject: [External] : Re: Telling the totality story In-Reply-To: <1420997074.11396225.1646390228184.JavaMail.zimbra@u-pem.fr> References: <1803574989.11148833.1646355013391.JavaMail.zimbra@u-pem.fr> <5e2e4cc7-56ef-c1bc-a8ba-4db6002582cd@oracle.com> <1420997074.11396225.1646390228184.JavaMail.zimbra@u-pem.fr> Message-ID: On 3/4/2022 5:37 AM, forax at univ-mlv.fr wrote: > > > ------------------------------------------------------------------------ > > *From: *"Brian Goetz" > *To: *"Remi Forax" > *Cc: *"amber-spec-experts" > *Sent: *Friday, March 4, 2022 3:23:58 AM > *Subject: *Re: [External] : Re: Telling the totality story > > > > #### Let > > This is where it gets ugly.? Let has no opinions about > null, but the game here is to pretend it does.? So in a > let statement: > > ??? let P = e > > ?- Evaluate e > ?- If e is null > ?? - If P is a covering pattern, its binding is bound to null; > ?? - else we throws NPE > ?- Otherwise, e is matched to P.? If it does not match > (its remainder), a MatchException is thrown (or the else > clause is taken, if one is present.) > > > It's not clear to me why a MatchException should be thrown > instead of not compiling if not exhaustive. > > > You're confusing "exhaustive" and "total".? A let must be > exhaustive, but exhaustiveness != totality. 
> > It's mean that there are remainders that does not lead to > either a NPE or an error, do you have an example of such > remainder ? > > > Yes.? Suppose we have records Box(T t) and Pair(T t, U > u), and A is sealed to B|C.? Then if we're matching on a > Pair, A>, then > > ??? Pair(null, B) > ??? Pair(Box(B), D)? // D is a type from the future > ??? Pair(Box(D), B) > ??? Pair(Box(D), D) > ??? Pair(null, D) > > are all in the remainder.? It is a big stretch to claim either NPE > or ICCE is right for any of these, and completely arbitrary to > pick one for Pair(null, D).? Also, its much more expensive to try > to distinguish between these, and pick the "right" error for each, > rather than insert a default clause that throws > MatchRemainderException. > > > The exception are here because the view at runtime and the view at > compile time are slightly different > - an NPE is raised when a value is null but the pattern ask for a > deconstruction > - an ICCE if the compiler has use a sealed type during it's analysis > and the sealed type has changed This was a thread about pedagogy.? You asked a factual question (though tangential) for clarification, which is fine, and got an answer.? But then you used the answer to spin off into a "let me redesign a feature that is unrelated to the thread."? That's not so good. There actually was a thread about exceptions, where we could have discussed this (and we did, I think; these points here mostly seem repeated.)? By diverting this thread, it means we may not get back to the main point -- something I put time into because I wanted feedback.? So far, we're three replies-to-replies into the weeds, and no one has talked about pedagogy.? And likely, no one will, because once a thread has been diverted, it rarely comes back. From asviraspossible at gmail.com Fri Mar 4 15:43:11 2022 From: asviraspossible at gmail.com (Victor Nazarov) Date: Fri, 4 Mar 2022 16:43:11 +0100 Subject: Telling the totality story In-Reply-To: References: Message-ID: Hello Brian and experts, I've tried multiple times to come up with some constructive feedback, but it seems very hard in this case. After trying to analyse this thread along with the "Treatment of total patterns", I can come up with the following points: * I've used switch on Patterns a lot in Java 17 preview, and had no problems with treatment of null. "Treatment of total patterns" proposed changes based on "people are still uncomfortable with `case Object o` binding null to o", but I don't fully understand the exact problems that people see with. * It's hard for me to discuss didactic aspects of pattern matching, because I haven't fully recovered from the introduction of explicit null processing in switch. For me, the original model as introduced in Java 17 was a simpler one, both mathematically and didactically, and I can't fully justify to myself the switch. * If we are going to accept the complexity of additional null processing, then I have to (and this is surprising to me) say that the story presented here is more "natural" then the previous story. This story sounds to me like better following the single responsibility principle, in that there is a single null-handling responsibility. I think even when I had a previous story, mentally I've translated it into this story, even when I've tried not to admit it to myself. * The story itself is still hideous and I hate it too, but I think the problem is not the story, but the model itself. 
And the solution is not a better story, but a better underlying model, possibly a switch back to a previous model. If I can indulge myself and go back to think about other possible models, I can collect the following facts that seem important to the model: * There are statically total and exhaustive patterns and people seem to confuse those. * There are two sources of exhaustive pattern remainder: 1. new classes or constants from the future that are missing during the compilation, but appear during the execution 2. unexpected null, null expectation is determined by the totality of pattern, which is a static property. We can observe the following: * When new class or constant appears, then switch-without-default fails to compile, switch-with-default silently accepts new class or constant * When totality of pattern changes, then switch-without-default start to throw on null, switch-with-default start to throw on null The thing, that I think people may want, is to get some "fails to compile" * When totality of pattern changes, then switch-without-default fails to compile, switch-with-default accepts null We already can have desired switch-with-default behaviour, by replacing `default` with `case default, null`, but we lack the instruments to make switch-without-default fail to compile. The current model doesn't provide this behaviour, what it does is introduces another dimension for switch: switch[{with/without} null, {with/without} default], that only affects dynamic behaviour of the switch, but not compile-time behaviour. Java has a long tradition to avoid any "action at distance" and instead to encode static properties at the type level and make compilation fail, when static properties change, so maybe we need to do it here and to have a way to encode this static property, so that the example below will fail to compile. record B(String s); record A(B b) A a = new A(null); String result = switch (a) no-null-in-remainder { case A(B(String s)) -> s; }; // error: there exists a remainder values with null But the adjusted version will successfully compile: record B(String s); record A(B b) A a = new A(null); String result = switch (a) no-null-in-remainder { case null -> "null"; case A(B b) -> a.toString(); }; // ok Having the same example, but without `no-null-in-remainder` will succeed: record B(String s); record A(B b) A a = new A(null); String result = switch (a) { case A(B(String s)) -> s; }; // ok, throw MatchingException Possibly we should have compilation fail when switch is annotated with `no-null-in-remainder`, but have a remainder, and possibly compilation should fail when there is no remainder, but switch is annotated with `no-null-in-remainder`. This will mimic rules for methods annotated with throws. I think a proposal like this was disposed early, because we do not want to introduce new flags that people need to always set and will regret when they do not set them, but this argument doesn't work here, because here the compiler will ask the user to set the required flag or to remove it. `no-null-in-remainder` is a static property only, there is no code that distinguishes between null and unexpected class or unexpected constant in the runtime. In all cases the same MatchException is thrown. `no-null-in-remainder` is defined so that there is no value in the remainder containing null, this can be implemented because switch already does exhaustiveness check at the compile time. 
From a didactic point of view `no-null-in-remainder` is about
exhaustiveness; it's not about null handling or about the semantics of
pattern matching. Pattern matching is not affected at all, it's just an
additional static check, something akin to the previously proposed check
for instanceof that required the pattern not to be total.

`no-null-in-remainder` could be called `total`, but this is very confusing
given that switch is already expected to be exhaustive and it's the
presence of "default" that controls "totality". It could be called
`no-remainder`, but this requires us to think about the difference between
compile time and run-time: at run-time we can still get an exception from
a newly appeared, separately compiled class, so I think it's important to
say that this has something to do with "null". The syntactic form that
I've chosen mimics that of a method's "throws" clause. Here we try to
state that the switch doesn't throw a MatchException caused by a null
value, so we may probably use the word "throw", like
`no-throw-on-unmatched-null`.

Refactoring between nested switches is preserved only when the
no-null-in-remainder property is preserved:

switch (x) {
    case P(Q): B
    case P(T): C
}

is exactly the same as

switch (x) {
    case P(var alpha):
        switch (alpha) no-null-in-remainder {
            case Q: B
            case T: C
        }
}

Compilation will fail on missing or excessive `no-null-in-remainder`
annotations during the refactoring. I feel that `case null` in the current
proposal tried to do something like this, but failed to achieve it because
it got carried away with `case null` being just a case for the null value,
and not an annotation affecting exhaustiveness analysis.

The proposed model requires some good syntax for `no-null-in-remainder`;
this is another problem, but maybe we can adopt just the `null` keyword,
as illustrated with compilation errors below. Again, we can say that this
annotation is not about null but about remainder, but when we frame it
like this we should talk about static and dynamic behaviour separately. We
cannot prevent the introduction of new classes and constants by separate
compilation. But if we frame it as something about null only, then there
is no difference.

String result = switch (a) {
    case null -> "null";
    case A(B b) -> a.toString();
};
// compilation error: all null-containing values are matched,
// switch should be annotated with `null`

String result = switch (a) null {
    case null -> "null";
    case A(B b) -> a.toString();
};
// ok

class A = P | Q
A alpha = ...;
switch (alpha) null {
    case P p: B
    case Q q: C
};
// compilation error: not all null-containing values are matched,
// switch should not be annotated with `null`,
// possible null containing values: {null}

String result = switch (a) null {
    case null -> "null";
    case A(B(String s)) -> a.toString();
};
// compilation error: not all null-containing values are matched,
// switch should not be annotated with `null`,
// possible null containing values: {A(null)}

--
Victor Nazarov
> > What we've been calling "remainder" is the set of T values that are not > matched by any of the patterns in a set P* that is exhaustive on T. > (Exhaustiveness is static; matching is dynamic; the difference is > remainder.) > > I think the name "exhaustive" is pretty good; we've been using the term > "exhaustiveness checking" for switch for a while. The problem could be > "matches", as it has a dynamic sound to it; the problem could also be > "total". The above excursion calls totality "covering", but having both > "exhaustive" and "covering" in the same scheme is likely to confuse > people, so I won't suggest that. Perhaps "statically total"? This has > the nice implication that matching is statically known to always > succeed. This would give us: > > - If P is statically total on T, it is exhaustive on T; > - Any patterns (var x) are statically total on all T; > - Type patterns `T t` are statically total on `U <: T` (and more, when > we get to primitive type patterns) > - Statically total patterns match null (because they match everything) > - instanceof intercepts nulls: x instanceof P === x != null && x matches > P > - switch intercepts nulls: as above > - Let does not have any new opinions about null > - Nesting unrolls to instanceof only for non-statically-total nested > patterns > > Also I think being more explicit about the switch/instanceof rules will > help, as this filters out all top-level null reasoning. > > From brian.goetz at oracle.com Fri Mar 4 17:34:33 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 4 Mar 2022 12:34:33 -0500 Subject: Fwd: Yielding blocks In-Reply-To: References: Message-ID: <1abd667c-15b9-a0bd-5749-767fc18df9cb@oracle.com> The following was received on the -comments list. Summary: "Can we please have block expressions, you're almost there with the yielding block in switch expressions." My observations: There was some discussion around the time we did switch expressions about whether we wanted a general-purpose block expression; it was certainly clear that we were inventing a limited form of block expression, and the worst thing about what we did back then was open the door to block expressions a bit, raising the inevitable questions about "why not just throw open the door".? While I can imagine wanting to open up on this at some point in the future, and am sympathetic to the notion that we want to express some things as expressions that currently require statements, I'm still not particularly motivated to throw open this door at this time. -------- Forwarded Message -------- Subject: Yielding blocks Date: Fri, 4 Mar 2022 01:22:19 +0200 From: Dimitris Paltatzidis To: amber-spec-comments at openjdk.java.net Methods, ignoring their arguments, form 2 categories: 1. Those that do not return a value (void). 2. Those that do. Now, stripping away their signature, we get anonymous methods. Java supports category 1, "blocks". They can be local, instance initializers or even static. On the other end, category 2 is missing. Consider the examples below of trying to initialize a local variable, with a complex computation. A. (Verbose and garbage producing) Type a = ((Supplier) () -> { . . return .. ; }).get(); B. (The hack, hard to reason) Type a = switch (0) { default -> { . . yield .. ; } }; C. (Does not exist) Type a = { . . yield .. ; } All of them are equal in power, yet C is compact and readable. Of course, no one will resort to A or B, instead they will compute in the same block as the variable in question (a). 
Unfortunately, that has a few drawbacks: 1. Pollution of the block with temporary variables that won't be needed further down the line, and at the same time, prohibiting the usage of the same name, especially if they were final. 2. Hard to reason where statements and computations start and end, without resorting to comments. Now, with limited power we can have: D. (Legal) Type a; { . . a = .. ; } Actually D has fewer characters than C, for small enough variable names. It also solves the pollution problems stated above. But, as already mentioned, it lacks in power. We can't use it in places where only 1 statement is permitted, like computing the parameter of a method inside its parenthesis. With C, we can even initialize instance variables (or static) that require multiple statements, without resorting to an instance initializer block. Basically, computing a value ad-hoc that fits everywhere. The block right of the arrow -> of a switch case is basically it. It just needs a promotion to stand on its own. From amaembo at gmail.com Sat Mar 5 02:38:27 2022 From: amaembo at gmail.com (Tagir Valeev) Date: Sat, 5 Mar 2022 09:38:27 +0700 Subject: Yielding blocks In-Reply-To: <1abd667c-15b9-a0bd-5749-767fc18df9cb@oracle.com> References: <1abd667c-15b9-a0bd-5749-767fc18df9cb@oracle.com> Message-ID: For the record, I suggested a similar enhancement two years ago: https://mail.openjdk.java.net/pipermail/amber-spec-experts/2020-March/002046.html The difference is that I think that explicit prefix, like `do { ... }` would be better. In particular, it helps to disambiguate between array initializer and block expression.. With best regards, Tagir Valeev On Sat, Mar 5, 2022 at 12:36 AM Brian Goetz wrote: > > The following was received on the -comments list. > > Summary: "Can we please have block expressions, you're almost there with the yielding block in switch expressions." > > My observations: There was some discussion around the time we did switch expressions about whether we wanted a general-purpose block expression; it was certainly clear that we were inventing a limited form of block expression, and the worst thing about what we did back then was open the door to block expressions a bit, raising the inevitable questions about "why not just throw open the door". While I can imagine wanting to open up on this at some point in the future, and am sympathetic to the notion that we want to express some things as expressions that currently require statements, I'm still not particularly motivated to throw open this door at this time. > > > -------- Forwarded Message -------- > Subject: Yielding blocks > Date: Fri, 4 Mar 2022 01:22:19 +0200 > From: Dimitris Paltatzidis > To: amber-spec-comments at openjdk.java.net > > > Methods, ignoring their arguments, form 2 categories: > 1. Those that do not return a value (void). > 2. Those that do. > > Now, stripping away their signature, we get anonymous methods. Java supports > category 1, "blocks". They can be local, instance initializers or even > static. > On the other end, category 2 is missing. Consider the examples below of > trying > to initialize a local variable, with a complex computation. > > A. (Verbose and garbage producing) > Type a = ((Supplier) () -> { > . > . > return .. ; > }).get(); > > B. (The hack, hard to reason) > Type a = switch (0) { > default -> { > . > . > yield .. ; > } > }; > > C. (Does not exist) > Type a = { > . > . > yield .. ; > } > > All of them are equal in power, yet C is compact and readable. 
Of course, no > one will resort to A or B, instead they will compute in the same block as > the > variable in question (a). Unfortunately, that has a few drawbacks: > 1. Pollution of the block with temporary variables that won't be needed > further > down the line, and at the same time, prohibiting the usage of the same > name, > especially if they were final. > 2. Hard to reason where statements and computations start and end, without > resorting to comments. > > Now, with limited power we can have: > > D. (Legal) > Type a; { > . > . > a = .. ; > } > > Actually D has fewer characters than C, for small enough variable names. It > also > solves the pollution problems stated above. But, as already mentioned, it > lacks > in power. We can't use it in places where only 1 statement is permitted, > like > computing the parameter of a method inside its parenthesis. > > With C, we can even initialize instance variables (or static) that require > multiple statements, without resorting to an instance initializer block. > Basically, computing a value ad-hoc that fits everywhere. > > The block right of the arrow -> of a switch case is basically it. It just > needs > a promotion to stand on its own. From brian.goetz at oracle.com Sat Mar 5 21:11:02 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 5 Mar 2022 16:11:02 -0500 Subject: [External] : Re: Yielding blocks In-Reply-To: References: <1abd667c-15b9-a0bd-5749-767fc18df9cb@oracle.com> Message-ID: <4289df2c-c219-2a0a-e874-a3328aa5c924@oracle.com> I certainly agree that blocks would need something to prefix them.? I don't necessarily object to the concept, but we have other, higher-leverage activities I'd rather pursue instead right now. On 3/4/2022 9:38 PM, Tagir Valeev wrote: > For the record, I suggested a similar enhancement two years ago: > https://mail.openjdk.java.net/pipermail/amber-spec-experts/2020-March/002046.html > The difference is that I think that explicit prefix, like `do { ... }` > would be better. In particular, it helps to disambiguate between array > initializer and block expression.. > > With best regards, > Tagir Valeev > > On Sat, Mar 5, 2022 at 12:36 AM Brian Goetz wrote: >> The following was received on the -comments list. >> >> Summary: "Can we please have block expressions, you're almost there with the yielding block in switch expressions." >> >> My observations: There was some discussion around the time we did switch expressions about whether we wanted a general-purpose block expression; it was certainly clear that we were inventing a limited form of block expression, and the worst thing about what we did back then was open the door to block expressions a bit, raising the inevitable questions about "why not just throw open the door". While I can imagine wanting to open up on this at some point in the future, and am sympathetic to the notion that we want to express some things as expressions that currently require statements, I'm still not particularly motivated to throw open this door at this time. >> >> >> -------- Forwarded Message -------- >> Subject: Yielding blocks >> Date: Fri, 4 Mar 2022 01:22:19 +0200 >> From: Dimitris Paltatzidis >> To:amber-spec-comments at openjdk.java.net >> >> >> Methods, ignoring their arguments, form 2 categories: >> 1. Those that do not return a value (void). >> 2. Those that do. >> >> Now, stripping away their signature, we get anonymous methods. Java supports >> category 1, "blocks". They can be local, instance initializers or even >> static. 
>> On the other end, category 2 is missing. Consider the examples below of >> trying >> to initialize a local variable, with a complex computation. >> >> A. (Verbose and garbage producing) >> Type a = ((Supplier) () -> { >> . >> . >> return .. ; >> }).get(); >> >> B. (The hack, hard to reason) >> Type a = switch (0) { >> default -> { >> . >> . >> yield .. ; >> } >> }; >> >> C. (Does not exist) >> Type a = { >> . >> . >> yield .. ; >> } >> >> All of them are equal in power, yet C is compact and readable. Of course, no >> one will resort to A or B, instead they will compute in the same block as >> the >> variable in question (a). Unfortunately, that has a few drawbacks: >> 1. Pollution of the block with temporary variables that won't be needed >> further >> down the line, and at the same time, prohibiting the usage of the same >> name, >> especially if they were final. >> 2. Hard to reason where statements and computations start and end, without >> resorting to comments. >> >> Now, with limited power we can have: >> >> D. (Legal) >> Type a; { >> . >> . >> a = .. ; >> } >> >> Actually D has fewer characters than C, for small enough variable names. It >> also >> solves the pollution problems stated above. But, as already mentioned, it >> lacks >> in power. We can't use it in places where only 1 statement is permitted, >> like >> computing the parameter of a method inside its parenthesis. >> >> With C, we can even initialize instance variables (or static) that require >> multiple statements, without resorting to an instance initializer block. >> Basically, computing a value ad-hoc that fits everywhere. >> >> The block right of the arrow -> of a switch case is basically it. It just >> needs >> a promotion to stand on its own. From brian.goetz at oracle.com Sat Mar 5 21:33:55 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 5 Mar 2022 16:33:55 -0500 Subject: Telling the totality story In-Reply-To: References: Message-ID: <5cf3a204-da45-fbfa-d9b3-aee348488ec2@oracle.com> > > So, I think the main thing we can control about the story is the > terminology.? I think part of what people find confusing is the use of > the term "total", since that's a math-y term, and also it collides > with "exhaustive", which is similar but not entirely coincident. One concept we might want to appeal to here is the notion of a "pattern that isn't really asking a question".? This is still going to depend on the static type of the target, but rather than appealing to a mathematical term like "total" or "statically total", perhaps something like "a vacuous pattern" (much like the statement "all hairy hairless apes are green" is vacuously true), or a "self evident" pattern, or "an unconditional pattern" or an "effectively any" pattern or a "constant pattern" [1], or an "undiscerning pattern", or ... Recall that the total / effectively any / unconditional / vacuous patterns are: ?- `var x` ?- an any pattern `_` , if we ever have it ?- A type pattern `T t`, when applied to a target `U <: T`. All other patterns at least ask some question. (FWIW, what the spec currently does is: syntactic patterns undergo a resolution to a runtime pattern as part of type analysis.? So given a target type of Object, the pattern `Object o` *resolves* to an any pattern, whereas `String s` resolves to the type pattern.? May or may not be helpful.) Perhaps if we had a term for "a pattern that is not asking anything", this may help to frame the story better. 
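As a small sketch of the "asks a question" vs. "asks nothing" distinction (illustrative only; the method and variable names are made up for this example):

static void demo(Object obj, String str) {
    // `String s` applied to an Object-typed target asks a real question:
    // it compiles to a dynamic type test and may not match.
    if (obj instanceof String s) {
        System.out.println("a string of length " + s.length());
    }

    // `CharSequence cs` applied to a String-typed target asks nothing:
    // every String is a CharSequence, so for this target type the pattern
    // is unconditional ("effectively any", "vacuous") and resolves to an
    // any pattern, as described above.
    String msg = switch (str) {
        case CharSequence cs -> "always taken: " + cs;
    };
    System.out.println(msg);
}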
[1] Currently, we've been calling patterns that are just a literal (if we ever have them) "constant patterns", but we could also call those "literal patterns", if we wanted to reserve "constant" as a modifier to mean that the pattern is a constant.? Or not. From forax at univ-mlv.fr Sat Mar 5 22:54:14 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Sat, 5 Mar 2022 23:54:14 +0100 (CET) Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: References: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com> <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr> <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr> <0d8dd8fc-1f4a-eb01-8ff2-102de73d6355@oracle.com> <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr> Message-ID: <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "Jim Laskey" , "amber-spec-experts" > > Sent: Friday, March 4, 2022 3:11:44 AM > Subject: Re: [External] : Re: Proposal: java.lang.runtime.Carrier >>> Either way, we don't need to mutate or replace carriers. >> You want the same carrier for the whole pattern matching: > I think you're going about this backwards. You seem to have a clear picture of > how pattern matching "should" be translated. If so, you should share! Maybe > your way is better. But you keep making statements like "we need" and "we want" > without explaining why. >> - if you have a logical OR between patterns (not something in the current Java >> spec but Python, C# or clojure core.match have it so we may want to add an OR >> in the future) > OR combinators are a good point, but they can be done without a with operation. >> - if different cases starts with the same prefix of patterns, so you don't have >> to re-execute the de-constructors/pattern methods of the prefix several times > Agree that optimizing away multiple invocations is good, but again, I don't see > that as being coupled to the pseudo-mutability of the carrier. > Perhaps you should start with how you see translation working? Sure, the idea is that to execute the pattern matching at runtime, each step is decomposed into few higher order functions, things like testing, projecting a value (deconstructing), etc each higher order manipulate one kind of function that takes two values, the value we are actually matching and the carrier, and returns a carrier. Obviously, each simple function is a method handle, so there is no boxing in the middle and everything is inlined. Here is a possible decomposition - MH of(Object carrier, MH pattern) which is equivalent to o -> pattern.apply(o, carrier) - MH match(int index) which is equivalent to (o, carrier) -> with(index, carrier, 0), i.e. return a new carrier with the component 0 updated with index - MH do_not_match() which is equivalent to match(-1) - MH is_instance(Class type) which is equivalent to (o, carrier) -> type.isInstance(o) - MH is_null() which is equivalent to (o, carrier) -> o == null - MH throw_NPE(String message) which is equivalent to (o, carrier) -> throw new NPE(message) - MH project(MH project, MH pattern) which is equivalent to (o, carrier) -> pattern.apply(project.apply(o), carrier) - MH bind(int binding, MH pattern) which is equivalent to (o, carrier) -> pattern.apply(with(o, carrier, binding) - MH test(MH test, MH target, MH fallback) which is equivalent to (o, carrier) -> test.test(o, carrier)? 
target.apply(o, carrier) : fallback.apply(o, carrier)
- MH or(MH pattern1, MH pattern2)
  which is equivalent to
  (o, carrier) -> {
      var carrier2 = pattern1.apply(o, carrier);
      if (carrier2.accessor[0] == -1) {
          return carrier2;
      }
      return pattern2.apply(o, carrier2);
  }

For the carrier, the convention is that component 0 is an int: -1 means "no match", and a non-negative index means that the index-th case matched.

In detail, it's a little more complex because we sometimes need to pass the type of the first parameter to correctly type the returned MH, and we also need an object CarrierMetadata that keeps track of the types of the carrier components (and provides an empty carrier and the accessors/withers).

Here is a small example

record Point(int x, int y) {}
record Rectangle(Point p1, Point p2) {}

// Object o = ...
// switch(o) {
//   case Rectangle(Point p1, Point p2) -> ...
// }
var lookup = MethodHandles.lookup();
var carrierMetadata = new CarrierMetadata(methodType(Object.class, int.class, Point.class, Point.class));
var empty = carrierMetadata.empty();
var op = of(empty,
    test(is_instance(Object.class, Rectangle.class),
        cast(Object.class,
            or(carrierMetadata,
                project(record_accessor(lookup, Rectangle.class, 0),
                    test(is_null(Point.class),
                        do_not_match(Point.class, carrierMetadata),
                        bind(1, carrierMetadata))),
                project(record_accessor(lookup, Rectangle.class, 1),
                    test(is_null(Point.class),
                        do_not_match(Point.class, carrierMetadata),
                        bind(2, carrierMetadata,
                            match(Point.class, carrierMetadata, 0)))))),
        throw_NPE(Object.class, "o is null")));

// match: new Rectangle(new Point(1, 2), new Point(3, 4))
var rectangle1 = (Object) new Rectangle(new Point(1, 2), new Point(3, 4));
var carrier1 = op.invokeExact(rectangle1);
System.out.println("result: " + (int) carrierMetadata.accessor(0).invokeExact(carrier1));
System.out.println("binding 1 " + (Point) carrierMetadata.accessor(1).invokeExact(carrier1));
System.out.println("binding 2 " + (Point) carrierMetadata.accessor(2).invokeExact(carrier1));

// match: new Rectangle(new Point(1, 2), null)
var rectangle2 = (Object) new Rectangle(new Point(1, 2), null);
var carrier2 = op.invokeExact(rectangle2);
System.out.println("result: " + (int) carrierMetadata.accessor(0).invokeExact(carrier2));
System.out.println("binding 1 " + (Point) carrierMetadata.accessor(1).invokeExact(carrier2));
System.out.println("binding 2 " + (Point) carrierMetadata.accessor(2).invokeExact(carrier2));

The full code is available here:
https://github.com/forax/switch-carrier/blob/master/src/main/java/com/github/forax/carrier/java/lang/runtime/Patterns.java

I believe using a function with two parameters (the actual value we are switching upon, and the carrier that will gather the bindings) is better than using only a carrier as parameter, because in that case you need to use the carrier to store all the intermediary objects even if they are not kept as bindings.

Rémi

From manoj.palat at in.ibm.com Mon Mar 7 07:08:11 2022
From: manoj.palat at in.ibm.com (Manoj Palat)
Date: Mon, 7 Mar 2022 07:08:11 +0000
Subject: [18][guarded pattern] conditional-and query - spec clarification
Message-ID:

Hi,

Given,

public void bar(Object o) {
    int i = switch(o) {
        case String a && o != null ?
            true : false -> 1; // ecj flags syntax error here
        default -> 1;
    };
}

ECJ (the Eclipse compiler for Java) flags a syntax error on the guarded pattern; javac, however, accepts it. ECJ translates this into

    case ((String a) && (o != null)) ? true : false

and flags an error, instead of

    case ((String a) && ((o != null) ? true : false))

And I think ecj is correct in flagging the error, for the following reasons:

From https://docs.oracle.com/javase/tutorial/java/nutsandbolts/operators.html we see that the conditional-and operator '&&' has higher operator precedence than the conditional operator '?:'. From https://docs.oracle.com/javase/specs/jls/se17/html/jls-15.html#jls-15.23, we see that "The conditional-and operator is syntactically left-associative (it groups left-to-right)."

Also, I don't see any mention of the precedence changes in spec 420 [latest at https://cr.openjdk.java.net/~gbierman/jep420/latest]

A more detailed reasoning (from which the above is cut verbatim) is captured at https://bugs.eclipse.org/bugs/show_bug.cgi?id=578856#c1

Is there something I am missing in this reasoning?

Regards,
Manoj

From forax at univ-mlv.fr Mon Mar 7 14:50:36 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 7 Mar 2022 15:50:36 +0100 (CET)
Subject: [External] : Re: Proposal: java.lang.runtime.Carrier
In-Reply-To: <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr>
References: <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr>
 <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr>
 <0d8dd8fc-1f4a-eb01-8ff2-102de73d6355@oracle.com>
 <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr>
 <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr>
Message-ID: <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr>

> From: "Remi Forax"
> To: "Brian Goetz"
> Cc: "Jim Laskey" , "amber-spec-experts"
> Sent: Saturday, March 5, 2022 11:54:14 PM
> Subject: Re: [External] : Re: Proposal: java.lang.runtime.Carrier

>> From: "Brian Goetz"
>> To: "Remi Forax"
>> Cc: "Jim Laskey" , "amber-spec-experts"
>> Sent: Friday, March 4, 2022 3:11:44 AM
>> Subject: Re: [External] : Re: Proposal: java.lang.runtime.Carrier

>>>> Either way, we don't need to mutate or replace carriers.

>>> You want the same carrier for the whole pattern matching:

>> I think you're going about this backwards. You seem to have a clear picture of
>> how pattern matching "should" be translated. If so, you should share! Maybe
>> your way is better. But you keep making statements like "we need" and "we want"
>> without explaining why.

>>> - if you have a logical OR between patterns (not something in the current Java
>>> spec but Python, C# or clojure core.match have it so we may want to add an OR
>>> in the future)

>> OR combinators are a good point, but they can be done without a with operation.

>>> - if different cases starts with the same prefix of patterns, so you don't have
>>> to re-execute the de-constructors/pattern methods of the prefix several times

>> Agree that optimizing away multiple invocations is good, but again, I don't see
>> that as being coupled to the pseudo-mutability of the carrier.

>> Perhaps you should start with how you see translation working?

> Sure,
> the idea is that to execute the pattern matching at runtime, each step is
> decomposed into few higher order functions, things like testing, projecting a
> value (deconstructing), etc
> each higher order manipulate one kind of function that takes two values, the
> value we are actually matching and the carrier, and returns a carrier.
> Obviously, each simple function is a method handle, so there is no boxing in the > middle and everything is inlined. > Here is a possible decomposition > - MH of(Object carrier, MH pattern) > which is equivalent to o -> pattern.apply(o, carrier) > - MH match(int index) > which is equivalent to (o, carrier) -> with(index, carrier, 0), i.e. return a > new carrier with the component 0 updated with index > - MH do_not_match() > which is equivalent to match(-1) > - MH is_instance(Class type) > which is equivalent to (o, carrier) -> type.isInstance(o) > - MH is_null() > which is equivalent to (o, carrier) -> o == null > - MH throw_NPE(String message) > which is equivalent to (o, carrier) -> throw new NPE(message) > - MH project(MH project, MH pattern) > which is equivalent to (o, carrier) -> pattern.apply(project.apply(o), carrier) > - MH bind(int binding, MH pattern) > which is equivalent to (o, carrier) -> pattern.apply(with(o, carrier, binding) > - MH test(MH test, MH target, MH fallback) > which is equivalent to (o, carrier) -> test.test(o, carrier)? target.apply(o, > carrier): fallback.apply(o, carrier) > - MH or(MH pattern1, MH pattern2) > which is equivalent to > (o, carrier) -> { > var carrier2 = pattern1.apply(o, carrier); > if (carrier2.accessor[0] == -1) { > return carrier2; > } > return pattern2.apply(o, carrier2); > } > For the carrier, the convention is that the component 0 is an int, -1 means "not > match", and any positive index means the indexth case match. > In the detail, it's a little more complex because we sometimes need to pass the > type of the first parameter to correctly type the returned MH and we also need > an object CarrierMetadata that keep track of the type of the carrier components > (and provides an empty carrier and the accessors/withers). > Here is a small example > record Point( int x, int y) {} > record Rectangle(Point p1, Point p2) {} > // Object o = ... > //switch(o) { > // case Rectangle(Point p1, Point p2) -> ... > //} > var lookup = MethodHandles. lookup (); > var carrierMetadata = new CarrierMetadata( methodType (Object. class , int . > class , Point. class , Point. class )); > var empty = carrierMetadata.empty(); > var op = of (empty, > test ( is_instance (Object. class , Rectangle. class ), > cast (Object. class , > or (carrierMetadata, > project ( record_accessor (lookup, Rectangle. class , 0 ), > test ( is_null (Point. class ), > do_not_match (Point. class , carrierMetadata), > bind ( 1 , carrierMetadata))), > project ( record_accessor (lookup, Rectangle. class , 1 ), > test ( is_null (Point. class ), > do_not_match (Point. class , carrierMetadata), > bind ( 2 , carrierMetadata, > match (Point. class , carrierMetadata, 0 )))) > ) > ), > throw_NPE (Object. class , "o is null" ) > ) > ); > // match: new Rectangle(new Point(1, 2), new Point(3, 4)) > var rectangle1 = (Object) new Rectangle( new Point( 1 , 2 ), new Point( 3 , 4 > )); > var carrier1 = op.invokeExact(rectangle1); > System. out .println( "result: " + ( int ) carrierMetadata.accessor( 0 > ).invokeExact(carrier1)); > System. out .println( "binding 1 " + (Point) carrierMetadata.accessor( 1 > ).invokeExact(carrier1)); > System. out .println( "binding 2 " + (Point) carrierMetadata.accessor( 2 > ).invokeExact(carrier1)); > // match: new Rectangle(new Point(1, 2), null) > var rectangle2 = (Object) new Rectangle( new Point( 1 , 2 ), null ); > var carrier2 = op.invokeExact(rectangle2); > System. out .println( "result: " + ( int ) carrierMetadata.accessor( 0 > ).invokeExact(carrier2)); > System. 
out .println( "binding 1 " + (Point) carrierMetadata.accessor( 1 > ).invokeExact(carrier2)); > System. out .println( "binding 2 " + (Point) carrierMetadata.accessor( 2 > ).invokeExact(carrier2)); > The full code is available here: > [ > https://github.com/forax/switch-carrier/blob/master/src/main/java/com/github/forax/carrier/java/lang/runtime/Patterns.java > | > https://github.com/forax/switch-carrier/blob/master/src/main/java/com/github/forax/carrier/java/lang/runtime/Patterns.java > ] > I believe, using a function with two parameters, the actual value we are > switching upon and the carrier that will gather the bindings is better than > using only a carrier as parameter because in that case, you need to use the > carrier to store all the intermediary objects even if they are not kept as > bindings. Adding more information, we want the carrier to be a primitive type (to be able to optimize it away), which means that we can not use null to represent "do_not_match", we have to have a flag inside the carrier for that. For the runtime, they are 3 different contexts: switch, instanceof and assignment, - for a switch, the carrier contains an int (to be switched on) as component 0 and the values of the bindings - for an instanceof, the carrier contains a boolean as component 0 and the values of the bindings - for an assignment, only the values of the bindings are necessary. So for the switch, we can not use one carrier per case, because accessing to the component 0 will be polymorphic if we have multiple carrier objects. R?mi From brian.goetz at oracle.com Mon Mar 7 15:08:00 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 7 Mar 2022 10:08:00 -0500 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr> References: <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr> <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr> <0d8dd8fc-1f4a-eb01-8ff2-102de73d6355@oracle.com> <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr> <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr> <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr> Message-ID: <33e69c0e-5306-589f-29f2-2bdbd341e740@oracle.com> > Adding more information, > we want the carrier to be a primitive type (to be able to optimize it > away), which means that we can not use null to represent "do_not_match", > we have to have a flag inside the carrier for that. The alternate approach is to use a .ref class for partial patterns (using null for "no match") and a B3 class for total patterns (since it needs no failure channel.) I think its pretty important that the static name of the carrier class not appear in generated bytecode.? As a result, we will have to use a reference type (object or interface), which means we get the null channel "for free". 
From forax at univ-mlv.fr Tue Mar 8 00:07:19 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 8 Mar 2022 01:07:19 +0100 (CET)
Subject: [External] : Re: Proposal: java.lang.runtime.Carrier
In-Reply-To: <33e69c0e-5306-589f-29f2-2bdbd341e740@oracle.com>
References: <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr>
 <0d8dd8fc-1f4a-eb01-8ff2-102de73d6355@oracle.com>
 <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr>
 <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr>
 <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr>
 <33e69c0e-5306-589f-29f2-2bdbd341e740@oracle.com>
Message-ID: <186874812.12710567.1646698039608.JavaMail.zimbra@u-pem.fr>

From: "Brian Goetz"
To: "Remi Forax"
Cc: "Jim Laskey" , "amber-spec-experts"
Sent: Monday, March 7, 2022 4:08:00 PM
Subject: Re: [External] : Re: Proposal: java.lang.runtime.Carrier

>> Adding more information, we want the carrier to be a primitive type (to be
>> able to optimize it away), which means that we can not use null to represent
>> "do_not_match", we have to have a flag inside the carrier for that.

> The alternate approach is to use a .ref class for partial patterns (using
> null for "no match") and a B3 class for total patterns (since it needs no
> failure channel.)

yes, we still need an int to indicate which case of the switch match and using B3 in Valhalla already give us the performance we want.
Anyway changing from one representation to the other is not a big deal.

> I think its pretty important that the static name of the carrier class not
> appear in generated bytecode.

Yes, that's why i'm using java.lang.Object in the bytecode visible code.

> As a result, we will have to use a reference type (object or interface),
> which means we get the null channel "for free".

Not necessarily, while java.lang.Object appears in the bytecode a JIT like c2 propagates the real type of the constants (here empty() is typed with the type of the carrier not with java.lang.Object at runtime) so introducing null may introduce some stupid nullchecks.

Rémi

From john.r.rose at oracle.com Tue Mar 8 17:02:16 2022
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 08 Mar 2022 09:02:16 -0800
Subject: Proposal: java.lang.runtime.Carrier
In-Reply-To:
References:
Message-ID: <79C9CDF2-7A5A-49CA-BCC1-6D7F1D1E6084@oracle.com>

Yes, ClassSpecializer was consciously designed to support structures like carrier instances, as well as bound method handles (which are a kind of carrier++).

From james.laskey at oracle.com Tue Mar 8 19:31:04 2022
From: james.laskey at oracle.com (Jim Laskey)
Date: Tue, 8 Mar 2022 19:31:04 +0000
Subject: [External] : Re: Proposal: java.lang.runtime.Carrier
In-Reply-To: <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr>
References: <9f97a676-caad-ce84-d200-4a016a65fc20@oracle.com>
 <1747317127.11078549.1646331605873.JavaMail.zimbra@u-pem.fr>
Message-ID: <816EC5B6-012F-477B-A99E-19DE2AEB805D@oracle.com>

Brian asked me to talk a little about the String Templates use case for Carrier. (String Templates JEP for background.)

Going the example route, the following Java code constructs a TemplatedString from a templated string literal that contains three embedded expressions, x, y, and x + y;

int x = 10;
int y = 20;
TemplatedString ts = "Adding \{x} and \{y} equals \{x + y}.";
Naively, we could capture the interesting bits of the templated string literal at runtime using a generic GenericTS record; public record GenericTS(String template, List values) implements TemplatedString {} ... TemplatedString ts = new GenericTS("Adding \uFFFC and \uFFFC equals \uFFFC.", List.of(x, y, x + y)); // `\uFFFC` are placeholder characters There are several drawbacks using this generic record, but let's just focus on the List.of(x, y, x + y). Clearly, using a list as a carrier would force all the int expression values to be boxed. For an optimizing template policy, such as STR (concatenation), boxing would be a performance killer. We need a way to carry values without boxing. Another approach is to use an anonymous class. TemplatedString ts = new TemplatedString() { private final int exp$1 = x; private final int exp$2 = y; private final int exp$3 = x + y; public String template() { return "Adding \uFFFC and \uFFFC equals \uFFFC."; } public List values() { return List.of(exp$1, exp$2, exp$3); } ... }); Using the fields of an anonymous class allows the TemplatedString to be constructed without boxing and allow an optimizing TemplatePolicy to access values without boxing (via MethodHandles.) The anonymous class downside is that we end up with hundreds of these often very similar classes. For example; TemplatedString ts1 = "Adding \{x} and \{y} equals \{x + y}." TemplatedString ts2 = "Subtracting \{x} from \{y} equals \{y - x}." Even though the templates are different, the underlying carrier is still three int values. The compiler could fold similarly shaped anonymous classes at compile time, but that would only work for a single compilation unit. What is needed is a runtime solution. That's where java.lang.runtime.Carrier kicks in. Carrier provides an optimal carrier aligning to types of values that are to be carried, spinning up anonymous classes if needed and reusing anonymous classes when similar shape. On Mar 3, 2022, at 2:20 PM, Remi Forax > wrote: For the pattern matching, we also need a 'with' method, that return a method handle that takes a carrier and a value and return a new carrier with the component value updated. static MethodHandle withComponent(MethodType methodType, int i) // returns a mh (Carrier;T) -> Carrier with T the type of the component It can be built on top of constructor() + component() but i think that i should be part of the API instead of every user of the Carrier API trying to re-implement it. In term of spec, Jim, can you rename "component getter" to "component accessor" which is the term used by records. R?mi ________________________________ From: "Brian Goetz" > To: "Jim Laskey" >, "amber-spec-experts" > Sent: Thursday, March 3, 2022 4:29:51 PM Subject: Re: Proposal: java.lang.runtime.Carrier Thanks Jim. As background, (some form of) this code originated in a prototype for pattern matching, where we needed a carrier for a tuple (T, U, V) to carry the results of a match from a deconstruction pattern (or other declared pattern) on the stack as a return value. We didn't want to spin a custom class per pattern, and we didn't want to commit to the actual layout, because we wanted to preserve the ability to switch later to a value class. So the idea is you describe the carrier you want as a MethodType, and there's a condy that gives you an MH that maps that shape of arguments to an opaque carrier (the constructor), and other condys that give you MHs that map from the carrier to the individual bindings. So pattern matching will stick those MHs in CP slots. 
The carrier might be some bespoke thing (e.g., record anon(T t, U u, V v)), or something that holds an Object[], or something with three int fields and two ref fields, or whatever the runtime decides to serve up. The template mechanism wants almost exactly the same thing for bundling the parameters for uninterprted template strings. Think of it as a macro-box; instead of boxing primitives to Object and Objects to varargs, there's a single boxing operation from a tuple to an opaque type. On 3/3/2022 8:57 AM, Jim Laskey wrote: We propose to provide a runtime anonymous carrier class object generator; java.lang.runtime.Carrier. This generator class is designed to share anonymous classes when shapes are similar. For example, if several clients require objects containing two integer fields, then Carrier will ensure that each client generates carrier objects using the same underlying anonymous class. Providing this mechanism decouples the strategy for carrier class generation from the client facility. One could implement one class per shape; one class for all shapes (with an Object[]), or something in the middle; having this decision behind a bootstrap means that it can be evolved at runtime, and optimized differently for different situations. Motivation The String Templates JEP draft proposes the introduction of a TemplatedString object for the primary purpose of carrying the template and associated values derived from a template literal. To avoid value boxing, early prototypes described these carrierobjects using per-callsite anonymous classes shaped by value types, The use of distinct anonymous classes here is overkill, especially considering that many of these classes are similar; containing one or two object fields and/or one or two integral fields. Pattern matching has a similar issue when carrying the values for the holes of a pattern. With potentially hundreds (thousands?) of template literals or patterns per application, we need to find an alternate approach for these value carriers. Description In general terms, the Carrier class simply caches anonymous classes keyed on shape. To further increase similarity in shape, the ordering of value types is handled by the API and not in the underlying anonymous class. If one client requires an object with one object value and one integer value and a second client requires an object with one integer value and one object value, then both clients will use the same underlying anonymous class. Further, types are folded as either integer (byte, short, int, boolean, char, float), long (long, double) or object. [We've seen that performance hit by folding the long group into the integer group is significant, hence the separate group.] The Carrier API uses MethodType parameter types to describe the shape of a carrier. This incorporates with the primary use case where bootstrap methods need to capture indy non-static arguments. The API has three static methods; // Return a constructor MethodHandle for a carrier with components // aligning with the parameter types of the supplied methodType. static MethodHandle constructor(MethodType methodType) // Return a component getter MethodHandle for component i. static MethodHandle component(MethodType methodType, int i) // Return component getter MethodHandles for all the carrier's components. static MethodHandle[] components(MethodType methodType) Examples import java.lang.runtime.Carrier; ... // Define the carrier description. 
MethodType methodType = MethodType.methodType(Object.class, byte.class, short.class, char.class, int.class, long.class, float.class, double.class, boolean.class, String.class); // Fetch the carrier constructor. MethodHandle constructor = Carrier.constructor(methodType); // Create a carrier object. Object object = (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, 'C', 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, 1.0f / 3.0f, 1.0 / 3.0, true, "abcde"); // Get an array of accessors for the carrier object. MethodHandle[] components = Carrier.components(methodType); // Access fields. byte b = (byte)components[0].invokeExact(object); short s = (short)components[1].invokeExact(object); char c =(char)components[2].invokeExact(object); int i = (int)components[3].invokeExact(object); long l = (long)components[4].invokeExact(object); float f =(float)components[5].invokeExact(object); double d = (double)components[6].invokeExact(object); boolean tf (boolean)components[7].invokeExact(object); String s = (String)components[8].invokeExact(object)); // Access a specific field. MethodHandle component = Carrier.component(methodType, 3); int ii = (int)component.invokeExact(object); From john.r.rose at oracle.com Tue Mar 8 20:52:00 2022 From: john.r.rose at oracle.com (John Rose) Date: Tue, 08 Mar 2022 12:52:00 -0800 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <186874812.12710567.1646698039608.JavaMail.zimbra@u-pem.fr> References: <540692934.11084390.1646332921585.JavaMail.zimbra@u-pem.fr> <0d8dd8fc-1f4a-eb01-8ff2-102de73d6355@oracle.com> <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr> <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr> <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr> <33e69c0e-5306-589f-29f2-2bdbd341e740@oracle.com> <186874812.12710567.1646698039608.JavaMail.zimbra@u-pem.fr> Message-ID: On 7 Mar 2022, at 16:07, forax at univ-mlv.fr wrote: > Not necessarily, while java.lang.Object appears in the bytecode a JIT > like c2 propagates the real type of the constants (here empty() is > typed with the type of the carrier not with java.lang.Object at > runtime) so introducing null may introduce some stupid nullchecks. Null checks are usually cheap, because they are usually done implicitly in parallel with the first access that requires a non-null value. This works for true references, but not scalarized values. We want scalarized values, of course. So what I might worry more about here is forcing the JVM to buffer the value object on the heap because of possibility of null. (B3 primitives/bare values are designed to make that less likely.) I think even here null is OK, because *all* B2 types are *both* nullable and scalarizable (if not null), so it is very likely the JVM is going to do extra work to adjoin (the technical term is ?cram in?) a null representation into the scalarized representation, using an extra register if necessary. Bottom line: Nullability doesn?t concern me here, since I assume we are going to continue to work hard (because of B2 value types like ju.Optional) to support both nullability and scalarization. ? John From heidinga at redhat.com Tue Mar 8 21:25:10 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Tue, 8 Mar 2022 16:25:10 -0500 Subject: Proposal: java.lang.runtime.Carrier In-Reply-To: References: Message-ID: Hi Jim, Will Carrier::constructor(MethodType) require that the MT's return type is Object.class and thus return a MH that returns an Object? 
Or can other Classes / Interfaces be used as the return type? Likewise, will Carrier::component(MethodType, int) only accept Object as the input argument? The return type of the ::constructor generated MH will need to be the same as the input arg of the ::component accessors. And do you see this api primarily being used in invokedynamic bootstrap methods? I'm wondering how easy it will be to statically determine the set of MethodHandles required by uses of this API. Primarily for when we need to implement this in qbicc but also for other native image style projects. --Dan On Thu, Mar 3, 2022 at 8:58 AM Jim Laskey wrote: > > We propose to provide a runtime anonymous carrier class object generator; java.lang.runtime.Carrier. This generator class is designed to share anonymous classes when shapes are similar. For example, if several clients require objects containing two integer fields, then Carrier will ensure that each client generates carrier objects using the same underlying anonymous class. > > Providing this mechanism decouples the strategy for carrier class generation from the client facility. One could implement one class per shape; one class for all shapes (with an Object[]), or something in the middle; having this decision behind a bootstrap means that it can be evolved at runtime, and optimized differently for different situations. > > Motivation > > The String Templates JEP draft proposes the introduction of a TemplatedString object for the primary purpose of carrying the template and associated values derived from a template literal. To avoid value boxing, early prototypes described these carrierobjects using per-callsite anonymous classes shaped by value types, The use of distinct anonymous classes here is overkill, especially considering that many of these classes are similar; containing one or two object fields and/or one or two integral fields. Pattern matching has a similar issue when carrying the values for the holes of a pattern. With potentially hundreds (thousands?) of template literals or patterns per application, we need to find an alternate approach for these value carriers. > > Description > > In general terms, the Carrier class simply caches anonymous classes keyed on shape. To further increase similarity in shape, the ordering of value types is handled by the API and not in the underlying anonymous class. If one client requires an object with one object value and one integer value and a second client requires an object with one integer value and one object value, then both clients will use the same underlying anonymous class. Further, types are folded as either integer (byte, short, int, boolean, char, float), long (long, double) or object. [We've seen that performance hit by folding the long group into the integer group is significant, hence the separate group.] > > The Carrier API uses MethodType parameter types to describe the shape of a carrier. This incorporates with the primary use case where bootstrap methods need to capture indy non-static arguments. The API has three static methods; > > // Return a constructor MethodHandle for a carrier with components > // aligning with the parameter types of the supplied methodType. > static MethodHandle constructor(MethodType methodType) > > // Return a component getter MethodHandle for component i. > static MethodHandle component(MethodType methodType, int i) > > // Return component getter MethodHandles for all the carrier's components. 
> static MethodHandle[] components(MethodType methodType) > > Examples > > import java.lang.runtime.Carrier; > ... > > // Define the carrier description. > MethodType methodType = > MethodType.methodType(Object.class, byte.class, short.class, > char.class, int.class, long.class, > float.class, double.class, > boolean.class, String.class); > > // Fetch the carrier constructor. > MethodHandle constructor = Carrier.constructor(methodType); > > // Create a carrier object. > Object object = (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, > 'C', 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, > 1.0f / 3.0f, 1.0 / 3.0, > true, "abcde"); > > // Get an array of accessors for the carrier object. > MethodHandle[] components = Carrier.components(methodType); > > // Access fields. > byte b = (byte)components[0].invokeExact(object); > short s = (short)components[1].invokeExact(object); > char c =(char)components[2].invokeExact(object); > int i = (int)components[3].invokeExact(object); > long l = (long)components[4].invokeExact(object); > float f =(float)components[5].invokeExact(object); > double d = (double)components[6].invokeExact(object); > boolean tf (boolean)components[7].invokeExact(object); > String s = (String)components[8].invokeExact(object)); > > // Access a specific field. > MethodHandle component = Carrier.component(methodType, 3); > int ii = (int)component.invokeExact(object); > > From brian.goetz at oracle.com Tue Mar 8 21:28:55 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 8 Mar 2022 21:28:55 +0000 Subject: Proposal: java.lang.runtime.Carrier In-Reply-To: References: Message-ID: <2BF4871B-689F-431E-8086-32ADB19C2BFC@oracle.com> The minimal constraint is that the return type of the constructor MH is the same type as the argument type of the component MHs. It would seem to me that preserving stronger types here dynamically gives MH combinators more room to optimize? > On Mar 8, 2022, at 4:25 PM, Dan Heidinga wrote: > > Hi Jim, > > Will Carrier::constructor(MethodType) require that the MT's return > type is Object.class and thus return a MH that returns an Object? Or > can other Classes / Interfaces be used as the return type? Likewise, > will Carrier::component(MethodType, int) only accept Object as the > input argument? The return type of the ::constructor generated MH > will need to be the same as the input arg of the ::component > accessors. > > And do you see this api primarily being used in invokedynamic > bootstrap methods? I'm wondering how easy it will be to statically > determine the set of MethodHandles required by uses of this API. > Primarily for when we need to implement this in qbicc but also for > other native image style projects. > > --Dan > > On Thu, Mar 3, 2022 at 8:58 AM Jim Laskey wrote: >> >> We propose to provide a runtime anonymous carrier class object generator; java.lang.runtime.Carrier. This generator class is designed to share anonymous classes when shapes are similar. For example, if several clients require objects containing two integer fields, then Carrier will ensure that each client generates carrier objects using the same underlying anonymous class. >> >> Providing this mechanism decouples the strategy for carrier class generation from the client facility. One could implement one class per shape; one class for all shapes (with an Object[]), or something in the middle; having this decision behind a bootstrap means that it can be evolved at runtime, and optimized differently for different situations. 
>> >> Motivation >> >> The String Templates JEP draft proposes the introduction of a TemplatedString object for the primary purpose of carrying the template and associated values derived from a template literal. To avoid value boxing, early prototypes described these carrierobjects using per-callsite anonymous classes shaped by value types, The use of distinct anonymous classes here is overkill, especially considering that many of these classes are similar; containing one or two object fields and/or one or two integral fields. Pattern matching has a similar issue when carrying the values for the holes of a pattern. With potentially hundreds (thousands?) of template literals or patterns per application, we need to find an alternate approach for these value carriers. >> >> Description >> >> In general terms, the Carrier class simply caches anonymous classes keyed on shape. To further increase similarity in shape, the ordering of value types is handled by the API and not in the underlying anonymous class. If one client requires an object with one object value and one integer value and a second client requires an object with one integer value and one object value, then both clients will use the same underlying anonymous class. Further, types are folded as either integer (byte, short, int, boolean, char, float), long (long, double) or object. [We've seen that performance hit by folding the long group into the integer group is significant, hence the separate group.] >> >> The Carrier API uses MethodType parameter types to describe the shape of a carrier. This incorporates with the primary use case where bootstrap methods need to capture indy non-static arguments. The API has three static methods; >> >> // Return a constructor MethodHandle for a carrier with components >> // aligning with the parameter types of the supplied methodType. >> static MethodHandle constructor(MethodType methodType) >> >> // Return a component getter MethodHandle for component i. >> static MethodHandle component(MethodType methodType, int i) >> >> // Return component getter MethodHandles for all the carrier's components. >> static MethodHandle[] components(MethodType methodType) >> >> Examples >> >> import java.lang.runtime.Carrier; >> ... >> >> // Define the carrier description. >> MethodType methodType = >> MethodType.methodType(Object.class, byte.class, short.class, >> char.class, int.class, long.class, >> float.class, double.class, >> boolean.class, String.class); >> >> // Fetch the carrier constructor. >> MethodHandle constructor = Carrier.constructor(methodType); >> >> // Create a carrier object. >> Object object = (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, >> 'C', 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, >> 1.0f / 3.0f, 1.0 / 3.0, >> true, "abcde"); >> >> // Get an array of accessors for the carrier object. >> MethodHandle[] components = Carrier.components(methodType); >> >> // Access fields. >> byte b = (byte)components[0].invokeExact(object); >> short s = (short)components[1].invokeExact(object); >> char c =(char)components[2].invokeExact(object); >> int i = (int)components[3].invokeExact(object); >> long l = (long)components[4].invokeExact(object); >> float f =(float)components[5].invokeExact(object); >> double d = (double)components[6].invokeExact(object); >> boolean tf (boolean)components[7].invokeExact(object); >> String s = (String)components[8].invokeExact(object)); >> >> // Access a specific field. 
>> MethodHandle component = Carrier.component(methodType, 3); >> int ii = (int)component.invokeExact(object); >> >> > From forax at univ-mlv.fr Wed Mar 9 11:53:20 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 9 Mar 2022 12:53:20 +0100 (CET) Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: References: <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr> <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr> <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr> <33e69c0e-5306-589f-29f2-2bdbd341e740@oracle.com> <186874812.12710567.1646698039608.JavaMail.zimbra@u-pem.fr> Message-ID: <1767687407.13656581.1646826800400.JavaMail.zimbra@u-pem.fr> > From: "John Rose" > To: "Remi Forax" > Cc: "Brian Goetz" , "Jim Laskey" > , "amber-spec-experts" > > Sent: Tuesday, March 8, 2022 9:52:00 PM > Subject: Re: [External] : Re: Proposal: java.lang.runtime.Carrier > On 7 Mar 2022, at 16:07, [ mailto:forax at univ-mlv.fr | forax at univ-mlv.fr ] wrote: >> Not necessarily, while java.lang.Object appears in the bytecode a JIT like c2 >> propagates the real type of the constants (here empty() is typed with the type >> of the carrier not with java.lang.Object at runtime) so introducing null may >> introduce some stupid nullchecks. > Null checks are usually cheap, because they are usually done implicitly in > parallel with the first access that requires a non-null value. This works for > true references, but not scalarized values. > We want scalarized values, of course. So what I might worry more about here is > forcing the JVM to buffer the value object on the heap because of possibility > of null. (B3 primitives/bare values are designed to make that less likely.) I > think even here null is OK, because all B2 types are both nullable and > scalarizable (if not null), so it is very likely the JVM is going to do extra > work to adjoin (the technical term is ?cram in?) a null representation into the > scalarized representation, using an extra register if necessary. > Bottom line: Nullability doesn?t concern me here, since I assume we are going to > continue to work hard (because of B2 value types like ju.Optional) to support > both nullability and scalarization. What i was proposing is for switch to cram "not match" and the index of the matching case into one int because using -1 seems natural and it will work well with the tableswitch. Anyway, as i said earlier, i don't think it's a bid deal to use "null" to mean "do not match" because "do not match" need to be tested at two different stages, inside the matching pipeline here using null is easier and in the end to nourish the tableswitch/instanceof, in that case null need to be transformed to -1 / false. The other thing is that for the deconstructor call, here i believe we have to go to the B2 representation anyway (a nullable record) because it is part of the public API, not necessary in the language but from the binary compatibility POV, so at least users will be able to see it with a debugger. So it may make sense to always use 'null' to mean "do not match", in the matching pipeline and as result of a deconstructor call. To come back to the carrier API, does it means that the carrier class is always a nullable value type or does it mean that we need to knob to select between a primitive type or a value type ? 
R?mi From brian.goetz at oracle.com Wed Mar 9 13:31:33 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 9 Mar 2022 13:31:33 +0000 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <1767687407.13656581.1646826800400.JavaMail.zimbra@u-pem.fr> References: <1549382924.11148597.1646354340015.JavaMail.zimbra@u-pem.fr> <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr> <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr> <33e69c0e-5306-589f-29f2-2bdbd341e740@oracle.com> <186874812.12710567.1646698039608.JavaMail.zimbra@u-pem.fr> <1767687407.13656581.1646826800400.JavaMail.zimbra@u-pem.fr> Message-ID: <2A66AE78-5448-43BD-B427-A7B4B3C7E679@oracle.com> What i was proposing is for switch to cram "not match" and the index of the matching case into one int because using -1 seems natural and it will work well with the tableswitch. There?s two levels here, and I think part of the confusion with regard to pattern translation is we?re talking at different levels. The first level is: I?ve written a deconstructor for class Foo, and I want to match it with instanceof, case, whatever. I need a way to ?invoke? the pattern and let it conditionally ?return? multiple values. Carrier is a good tool for this job. The second level is: I want to use indy to choose which branch of a switch to take, *and* at the same time, carry all the values needed to that branch. Carrier could be applied to this as well. Somewhere in between, there?s the question of how we roll up the values in a compound pattern (e.g., Circle(Point(var x, var y) p, var r) c). This could involve flattening all the bindings (x, y, p, r, c) into a fresh carrier, or it could involve a ?carrier of carriers?. There are many degrees of freedom in the translation story. What Jim is proposing here is a runtime for bootstraps to make decomposable tuples that can be pass across boundaries that agree on a contract. This could be a simple return-to-caller, or it could rely on sharing the carrier in the heap between entities that have a shared static typing proof. To come back to the carrier API, does it means that the carrier class is always a nullable value type or does it mean that we need to knob to select between a primitive type or a value type ? Probably the carrier can always be a *primitive* class type, and the null can be handled separately by boxing from QCarrier$33 to LCarrier$33. All the Carrier API does is provide a constructor which takes N values and returns a carrier; at that point, you already know you want a non-null value. Consumers higher up the food chain can opt into nullity. From heidinga at redhat.com Wed Mar 9 13:42:36 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Wed, 9 Mar 2022 08:42:36 -0500 Subject: Proposal: java.lang.runtime.Carrier In-Reply-To: <2BF4871B-689F-431E-8086-32ADB19C2BFC@oracle.com> References: <2BF4871B-689F-431E-8086-32ADB19C2BFC@oracle.com> Message-ID: On Tue, Mar 8, 2022 at 4:29 PM Brian Goetz wrote: > > The minimal constraint is that the return type of the constructor MH is the same type as the argument type of the component MHs. Agreed. The types should match but they shouldn't be considered part of the api. I don't think (correct me if I'm wrong) that we want them to "escape" and be baked into classfiles. The implementation of the anonymous class holding the fields ("holder object") should remain as a hidden implementation detail. One way to do that is to enforce that the holder object is always hidden behind other public types like Object. 
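For what it's worth, doing the laundry inside the API can be a couple of asType calls; a minimal sketch (the method names here are made up, only the MethodHandle/MethodType calls are real API):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodType;

    // Erase the hidden carrier type from a constructor handle's return type.
    static MethodHandle launderConstructor(MethodHandle sharpConstructor) {
        MethodType laundered = sharpConstructor.type().changeReturnType(Object.class);
        return sharpConstructor.asType(laundered);   // widening reference conversion on the way out
    }

    // Erase the hidden carrier type from a component getter's receiver parameter.
    static MethodHandle launderComponent(MethodHandle sharpComponent) {
        MethodType laundered = sharpComponent.type().changeParameterType(0, Object.class);
        return sharpComponent.asType(laundered);     // checked cast on the way in
    }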
> It would seem to me that preserving stronger types here dynamically gives MH combinators more room to optimize? Only the outer edge of the MH chain would need to return (constructor) / take (component) Object. The implementation of the MHs can use a sharper type. I don't think we gain any optimization abilities here by exposing the sharper type - at worst there's an asType operation to check the type but that shouldn't be visible in the performance profile. --Dan > > > On Mar 8, 2022, at 4:25 PM, Dan Heidinga wrote: > > > > Hi Jim, > > > > Will Carrier::constructor(MethodType) require that the MT's return > > type is Object.class and thus return a MH that returns an Object? Or > > can other Classes / Interfaces be used as the return type? Likewise, > > will Carrier::component(MethodType, int) only accept Object as the > > input argument? The return type of the ::constructor generated MH > > will need to be the same as the input arg of the ::component > > accessors. > > > > And do you see this api primarily being used in invokedynamic > > bootstrap methods? I'm wondering how easy it will be to statically > > determine the set of MethodHandles required by uses of this API. > > Primarily for when we need to implement this in qbicc but also for > > other native image style projects. > > > > --Dan > > > > On Thu, Mar 3, 2022 at 8:58 AM Jim Laskey wrote: > >> > >> We propose to provide a runtime anonymous carrier class object generator; java.lang.runtime.Carrier. This generator class is designed to share anonymous classes when shapes are similar. For example, if several clients require objects containing two integer fields, then Carrier will ensure that each client generates carrier objects using the same underlying anonymous class. > >> > >> Providing this mechanism decouples the strategy for carrier class generation from the client facility. One could implement one class per shape; one class for all shapes (with an Object[]), or something in the middle; having this decision behind a bootstrap means that it can be evolved at runtime, and optimized differently for different situations. > >> > >> Motivation > >> > >> The String Templates JEP draft proposes the introduction of a TemplatedString object for the primary purpose of carrying the template and associated values derived from a template literal. To avoid value boxing, early prototypes described these carrierobjects using per-callsite anonymous classes shaped by value types, The use of distinct anonymous classes here is overkill, especially considering that many of these classes are similar; containing one or two object fields and/or one or two integral fields. Pattern matching has a similar issue when carrying the values for the holes of a pattern. With potentially hundreds (thousands?) of template literals or patterns per application, we need to find an alternate approach for these value carriers. > >> > >> Description > >> > >> In general terms, the Carrier class simply caches anonymous classes keyed on shape. To further increase similarity in shape, the ordering of value types is handled by the API and not in the underlying anonymous class. If one client requires an object with one object value and one integer value and a second client requires an object with one integer value and one object value, then both clients will use the same underlying anonymous class. Further, types are folded as either integer (byte, short, int, boolean, char, float), long (long, double) or object. 
[We've seen that performance hit by folding the long group into the integer group is significant, hence the separate group.] > >> > >> The Carrier API uses MethodType parameter types to describe the shape of a carrier. This incorporates with the primary use case where bootstrap methods need to capture indy non-static arguments. The API has three static methods; > >> > >> // Return a constructor MethodHandle for a carrier with components > >> // aligning with the parameter types of the supplied methodType. > >> static MethodHandle constructor(MethodType methodType) > >> > >> // Return a component getter MethodHandle for component i. > >> static MethodHandle component(MethodType methodType, int i) > >> > >> // Return component getter MethodHandles for all the carrier's components. > >> static MethodHandle[] components(MethodType methodType) > >> > >> Examples > >> > >> import java.lang.runtime.Carrier; > >> ... > >> > >> // Define the carrier description. > >> MethodType methodType = > >> MethodType.methodType(Object.class, byte.class, short.class, > >> char.class, int.class, long.class, > >> float.class, double.class, > >> boolean.class, String.class); > >> > >> // Fetch the carrier constructor. > >> MethodHandle constructor = Carrier.constructor(methodType); > >> > >> // Create a carrier object. > >> Object object = (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, > >> 'C', 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, > >> 1.0f / 3.0f, 1.0 / 3.0, > >> true, "abcde"); > >> > >> // Get an array of accessors for the carrier object. > >> MethodHandle[] components = Carrier.components(methodType); > >> > >> // Access fields. > >> byte b = (byte)components[0].invokeExact(object); > >> short s = (short)components[1].invokeExact(object); > >> char c =(char)components[2].invokeExact(object); > >> int i = (int)components[3].invokeExact(object); > >> long l = (long)components[4].invokeExact(object); > >> float f =(float)components[5].invokeExact(object); > >> double d = (double)components[6].invokeExact(object); > >> boolean tf (boolean)components[7].invokeExact(object); > >> String s = (String)components[8].invokeExact(object)); > >> > >> // Access a specific field. > >> MethodHandle component = Carrier.component(methodType, 3); > >> int ii = (int)component.invokeExact(object); > >> > >> > > > From forax at univ-mlv.fr Wed Mar 9 14:25:31 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 9 Mar 2022 15:25:31 +0100 (CET) Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <2A66AE78-5448-43BD-B427-A7B4B3C7E679@oracle.com> References: <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr> <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr> <33e69c0e-5306-589f-29f2-2bdbd341e740@oracle.com> <186874812.12710567.1646698039608.JavaMail.zimbra@u-pem.fr> <1767687407.13656581.1646826800400.JavaMail.zimbra@u-pem.fr> <2A66AE78-5448-43BD-B427-A7B4B3C7E679@oracle.com> Message-ID: <1126635624.13763388.1646835931181.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "John Rose" , "Jim Laskey" > , "amber-spec-experts" > > Sent: Wednesday, March 9, 2022 2:31:33 PM > Subject: Re: [External] : Re: Proposal: java.lang.runtime.Carrier >> What i was proposing is for switch to cram "not match" and the index of the >> matching case into one int because using -1 seems natural and it will work well >> with the tableswitch. > There?s two levels here, and I think part of the confusion with regard to > pattern translation is we?re talking at different levels. 
yes, > The first level is: I?ve written a deconstructor for class Foo, and I want to > match it with instanceof, case, whatever. I need a way to ?invoke? the pattern > and let it conditionally ?return? multiple values. Carrier is a good tool for > this job. > The second level is: I want to use indy to choose which branch of a switch to > take, *and* at the same time, carry all the values needed to that branch. > Carrier could be applied to this as well. > Somewhere in between, there?s the question of how we roll up the values in a > compound pattern (e.g., Circle(Point(var x, var y) p, var r) c). This could > involve flattening all the bindings (x, y, p, r, c) into a fresh carrier, or it > could involve a ?carrier of carriers?. There are many degrees of freedom in the > translation story. > What Jim is proposing here is a runtime for bootstraps to make decomposable > tuples that can be pass across boundaries that agree on a contract. This could > be a simple return-to-caller, or it could rely on sharing the carrier in the > heap between entities that have a shared static typing proof. >> To come back to the carrier API, does it means that the carrier class is always >> a nullable value type or does it mean that we need to knob to select between a >> primitive type or a value type ? > Probably the carrier can always be a *primitive* class type, and the null can be > handled separately by boxing from QCarrier$33 to LCarrier$33. All the Carrier > API does is provide a constructor which takes N values and returns a carrier; > at that point, you already know you want a non-null value. Consumers higher up > the food chain can opt into nullity. Also, i wonder if the external Carrier API should have a way to wrap an existing record class to see it as a Carrier, so the destructuring pattern will behave the same way with a record or with the result of a de-constructor. R?mi From brian.goetz at oracle.com Wed Mar 9 14:25:59 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 9 Mar 2022 14:25:59 +0000 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: References: <2BF4871B-689F-431E-8086-32ADB19C2BFC@oracle.com> Message-ID: >> >> The minimal constraint is that the return type of the constructor MH is the same type as the argument type of the component MHs. > > Agreed. The types should match but they shouldn't be considered part > of the api. I don't think (correct me if I'm wrong) that we want them > to "escape" and be baked into classfiles. The implementation of the > anonymous class holding the fields ("holder object") should remain as > a hidden implementation detail. One way to do that is to enforce that > the holder object is always hidden behind other public types like > Object. Yes. So my question is, who does the laundry? Is it the carrier API (who always says Object), or the caller who is going to take the return value of the carrier constructor and stick it in an Object? Does it make a difference? If I take the constructor MH, and compose it with the component MHs, will having an extraneous Object signature make it harder to expose the true type (which may be a Q type), or will that routinely and reliably come out in the wash anyway? >> It would seem to me that preserving stronger types here dynamically gives MH combinators more room to optimize? > > Only the outer edge of the MH chain would need to return (constructor) > / take (component) Object. The implementation of the MHs can use a > sharper type. 
I don't think we gain any optimization abilities here > by exposing the sharper type - at worst there's an asType operation to > check the type but that shouldn't be visible in the performance > profile. OK, so you?re saying it?s fine to slap an Object label on it, as it will come off easily when needed. From brian.goetz at oracle.com Wed Mar 9 14:28:48 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 9 Mar 2022 14:28:48 +0000 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <1126635624.13763388.1646835931181.JavaMail.zimbra@u-pem.fr> References: <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr> <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr> <33e69c0e-5306-589f-29f2-2bdbd341e740@oracle.com> <186874812.12710567.1646698039608.JavaMail.zimbra@u-pem.fr> <1767687407.13656581.1646826800400.JavaMail.zimbra@u-pem.fr> <2A66AE78-5448-43BD-B427-A7B4B3C7E679@oracle.com> <1126635624.13763388.1646835931181.JavaMail.zimbra@u-pem.fr> Message-ID: <06FD3039-4651-4476-9260-3055650809F1@oracle.com> Also, i wonder if the external Carrier API should have a way to wrap an existing record class to see it as a Carrier, so the destructuring pattern will behave the same way with a record or with the result of a de-constructor. Having records be their own carrier is an optimization we anticipate wanting to make, but I don?t think it is needed for Carrier to be involved in the deception. We?ll need a higher-level representation of a pattern (e.g., PatternHandle), which will exposes MHs for ?try to match, return a carrier if it matches? and MHs for deconstructing the carrier. A PH could use the record itself as the carrier, and the record?s accessor MHs as the component accessors, and not use Carrier at all, and the PH client can?t tell the difference. So Carrier here is intended to be the lowest level of the stack; a building block for aggregating tuples, nothing more. We can then build pattern matching atop that (which can use Carrier or not, as it sees fit) and switch dispatch atop that. From heidinga at redhat.com Wed Mar 9 15:03:39 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Wed, 9 Mar 2022 10:03:39 -0500 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: References: <2BF4871B-689F-431E-8086-32ADB19C2BFC@oracle.com> Message-ID: On Wed, Mar 9, 2022 at 9:26 AM Brian Goetz wrote: > > >> > >> The minimal constraint is that the return type of the constructor MH is the same type as the argument type of the component MHs. > > > > Agreed. The types should match but they shouldn't be considered part > > of the api. I don't think (correct me if I'm wrong) that we want them > > to "escape" and be baked into classfiles. The implementation of the > > anonymous class holding the fields ("holder object") should remain as > > a hidden implementation detail. One way to do that is to enforce that > > the holder object is always hidden behind other public types like > > Object. > > Yes. So my question is, who does the laundry? Is it the carrier API (who always says Object), or the caller who is going to take the return value of the carrier constructor and stick it in an Object? Does it make a difference? If I take the constructor MH, and compose it with the component MHs, will having an extraneous Object signature make it harder to expose the true type (which may be a Q type), or will that routinely and reliably come out in the wash anyway? 
Having the Carrier api do the laundry ensures that the implementation details don't leak into MH/MT constantpool entries. And encourages users not to try to be overly clever and rely on the implementation details. The more they do that the harder it will be to evolve the implementation in the future. The Q-ness should just flow through the APIs - I don't see an extra cost here. Either it's all used in the same JIT compilation unit and there's full type visibility or its not and the Q may have to hit the heap to flow along the callstack. The costs are the same whether the descriptors say Q or not. > > >> It would seem to me that preserving stronger types here dynamically gives MH combinators more room to optimize? > > > > Only the outer edge of the MH chain would need to return (constructor) > > / take (component) Object. The implementation of the MHs can use a > > sharper type. I don't think we gain any optimization abilities here > > by exposing the sharper type - at worst there's an asType operation to > > check the type but that shouldn't be visible in the performance > > profile. > > OK, so you?re saying it?s fine to slap an Object label on it, as it will come off easily when needed. > That's my assertion. From john.r.rose at oracle.com Wed Mar 9 17:15:52 2022 From: john.r.rose at oracle.com (John Rose) Date: Wed, 09 Mar 2022 09:15:52 -0800 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: References: <2BF4871B-689F-431E-8086-32ADB19C2BFC@oracle.com> Message-ID: <49656108-CF44-4F8B-AD53-3D5CCFBE2E32@oracle.com> FWIW I agree with Dan?s point. The Carrier API should throw IAE if Object does *not* appear in all the MethodType places where the eventual (mysterious encapsulated) carrier object will appear. If tomorrow we figure out a clever use for client-specified types there (such as Record or List), then mandating Object today will give us flexibility to extend tomorrow. Also, I don?t see any plausible use cases for a client-specified carrier-placeholder type other than Object. Also, mandating Object will make it slightly easier to deal with the problem of static prediction (for static-image applications that Dan referred to), since there?s one less degree of dynamic freedom. On 9 Mar 2022, at 7:03, Dan Heidinga wrote: > On Wed, Mar 9, 2022 at 9:26 AM Brian Goetz > wrote: >> >>>> >>>> The minimal constraint is that the return type of the constructor >>>> MH is the same type as the argument type of the component MHs. >>> >>> Agreed. The types should match but they shouldn't be considered >>> part >>> of the api. I don't think (correct me if I'm wrong) that we want >>> them >>> to "escape" and be baked into classfiles. The implementation of the >>> anonymous class holding the fields ("holder object") should remain >>> as >>> a hidden implementation detail. One way to do that is to enforce >>> that >>> the holder object is always hidden behind other public types like >>> Object. >> >> Yes. So my question is, who does the laundry? Is it the carrier API >> (who always says Object), or the caller who is going to take the >> return value of the carrier constructor and stick it in an Object? >> Does it make a difference? If I take the constructor MH, and compose >> it with the component MHs, will having an extraneous Object signature >> make it harder to expose the true type (which may be a Q type), or >> will that routinely and reliably come out in the wash anyway? 
> > Having the Carrier api do the laundry ensures that the implementation > details don't leak into MH/MT constantpool entries. And encourages > users not to try to be overly clever and rely on the implementation > details. The more they do that the harder it will be to evolve the > implementation in the future. > > The Q-ness should just flow through the APIs - I don't see an extra > cost here. Either it's all used in the same JIT compilation unit and > there's full type visibility or its not and the Q may have to hit the > heap to flow along the callstack. The costs are the same whether the > descriptors say Q or not. > >> >>>> It would seem to me that preserving stronger types here dynamically >>>> gives MH combinators more room to optimize? >>> >>> Only the outer edge of the MH chain would need to return >>> (constructor) >>> / take (component) Object. The implementation of the MHs can use a >>> sharper type. I don't think we gain any optimization abilities here >>> by exposing the sharper type - at worst there's an asType operation >>> to >>> check the type but that shouldn't be visible in the performance >>> profile. >> >> OK, so you?re saying it?s fine to slap an Object label on it, as >> it will come off easily when needed. >> > > That's my assertion. From john.r.rose at oracle.com Wed Mar 9 17:23:19 2022 From: john.r.rose at oracle.com (John Rose) Date: Wed, 09 Mar 2022 09:23:19 -0800 Subject: Proposal: java.lang.runtime.Carrier In-Reply-To: References: Message-ID: <5CF87079-F78F-49FA-B196-950D4E1D7E8D@oracle.com> ClassSpecializer is designed for cases beyond generating tuples, where some extra behavioral contract, and/or fixed field set, is required across all the generated classes. That said, ClassSpecializer should support tuple generation nicely, for Carrier. Maurizio?s point is a good one, although if I were Jim I?d hesitate to use something complicated to generate classes for just this one simple case. OTOH, our sense of what is ?simple? sometimes needs adjustment. In the end, the class file generation might be simple, but the infrastructure of generating and registering classes (and allowing them to be unloaded in some cases) is rather subtle, and maintainers will thank us for centralizing it. So, Jim, please do take a look at ClassSpecializer. It?s there for use cases like this one, even if in the end we don?t select it in this use case. On 3 Mar 2022, at 10:49, Maurizio Cimadamore wrote: > Seems sensible. > > As a possible "test", we could perhaps use this mechanism in the JDK > implementation of LambdaForms? We do have places where we spin > "species" classes: > > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/invoke/ClassSpecializer.java > > (that said, maybe species classes contain a bit more than just data, > so perhaps that's a wrong fit - but anyway, worth talking a look for > possible code duplication). > > Maurizio > > > On 03/03/2022 13:57, Jim Laskey wrote: >> >> We propose to provide a runtime /anonymous carrier class object >> generator/; *java.lang.runtime.Carrier*. This generator class is >> designed to share /anonymous classes/?when shapes are similar. For >> example, if several clients require objects containing two integer >> fields, then *Carrier*?will ensure that each client generates >> carrier objects using the same underlying anonymous class. >> >> Providing this mechanism decouples the strategy for carrier class >> generation from the client facility. 
One could implement one class >> per shape; one class for all shapes (with an Object[]), or something >> in the middle; having this decision behind a bootstrap means that it >> can be evolved at runtime, and optimized differently for different >> situations. >> >> >> Motivation >> >> The String Templates JEP draft >> ?proposes the >> introduction of a /TemplatedString/?object for the primary > purpose of /carrying/?the /template/?and associated > /values/?derived from a /template literal/. To avoid value boxing, > early prototypes described these /carrier/objects using > /per-callsite/?anonymous classes shaped by value types, The use of > distinct anonymous classes here is overkill, especially considering > that many of these classes are similar; containing one or two object > fields and/or one or two integral fields. /Pattern matching/?has a > similar issue when carrying the values for the /holes/?of a pattern. > With potentially hundreds (thousands?) of template literals or > patterns per application, we need to find an alternate approach for > these /value carriers/. >> >> >> Description >> >> In general terms, the *Carrier*?class simply caches anonymous >> classes keyed on shape. To further increase similarity in shape, the >> ordering of value types is handled by the API and not in the >> underlying anonymous class. If one client requires an object with one >> object value and one integer value and a second client requires an >> object with one integer value and one object value, then both clients >> will use the same underlying anonymous class. Further, types are >> folded as either integer (byte, short, int, boolean, char, float), >> long (long, double) or object. [We've seen that performance hit by >> folding the long group into the integer group is significant, hence >> the separate group.] >> >> The *Carrier*?API uses MethodType parameter types to describe the >> shape of a carrier. This incorporates with the primary use case where >> bootstrap methods need to capture indy non-static arguments. The API >> has three static methods; >> >> |// Return a constructor MethodHandle for a carrier with components >> // aligning with the parameter types of the supplied methodType. >> static MethodHandle constructor(MethodType methodType) // Return a >> component getter MethodHandle for component i. static MethodHandle >> component(MethodType methodType, int i) // Return component getter >> MethodHandles for all the carrier's components. static MethodHandle[] >> components(MethodType methodType)| >> >> >> Examples >> >> |import java.lang.runtime.Carrier; ... // Define the carrier >> description. MethodType methodType = >> MethodType.methodType(Object.class, byte.class, short.class, >> char.class, int.class, long.class, float.class, double.class, >> boolean.class, String.class); // Fetch the carrier constructor. >> MethodHandle constructor = Carrier.constructor(methodType); // Create >> a carrier object. Object object = >> (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, 'C', >> 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, 1.0f / 3.0f, 1.0 / 3.0, true, >> "abcde"); // Get an array of accessors for the carrier object. >> MethodHandle[] components = Carrier.components(methodType); // Access >> fields. 
byte b = (byte)components[0].invokeExact(object); short s = >> (short)components[1].invokeExact(object); char c >> =(char)components[2].invokeExact(object); int i = >> (int)components[3].invokeExact(object); long l = >> (long)components[4].invokeExact(object); float f >> =(float)components[5].invokeExact(object); double d = >> (double)components[6].invokeExact(object); boolean tf >> (boolean)components[7].invokeExact(object); String s = >> (String)components[8].invokeExact(object)); // Access a specific >> field. MethodHandle component = Carrier.component(methodType, 3); int >> ii = (int)component.invokeExact(object);| >> From brian.goetz at oracle.com Wed Mar 9 17:26:09 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 9 Mar 2022 17:26:09 +0000 Subject: Proposal: java.lang.runtime.Carrier In-Reply-To: <5CF87079-F78F-49FA-B196-950D4E1D7E8D@oracle.com> References: <5CF87079-F78F-49FA-B196-950D4E1D7E8D@oracle.com> Message-ID: <0E073102-745B-46AA-8A57-A926EA87F520@oracle.com> And in the future, when we have templated classes, some carriers may well become specializations of arity-indexed base classes (CarrierTuple1, CarrierTuple2, etc), where the VM takes responsibility for nasty things like when to unload specializations. On Mar 9, 2022, at 12:23 PM, John Rose > wrote: ClassSpecializer is designed for cases beyond generating tuples, where some extra behavioral contract, and/or fixed field set, is required across all the generated classes. That said, ClassSpecializer should support tuple generation nicely, for Carrier. Maurizio?s point is a good one, although if I were Jim I?d hesitate to use something complicated to generate classes for just this one simple case. OTOH, our sense of what is ?simple? sometimes needs adjustment. In the end, the class file generation might be simple, but the infrastructure of generating and registering classes (and allowing them to be unloaded in some cases) is rather subtle, and maintainers will thank us for centralizing it. So, Jim, please do take a look at ClassSpecializer. It?s there for use cases like this one, even if in the end we don?t select it in this use case. On 3 Mar 2022, at 10:49, Maurizio Cimadamore wrote: Seems sensible. As a possible "test", we could perhaps use this mechanism in the JDK implementation of LambdaForms? We do have places where we spin "species" classes: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/invoke/ClassSpecializer.java (that said, maybe species classes contain a bit more than just data, so perhaps that's a wrong fit - but anyway, worth talking a look for possible code duplication). Maurizio On 03/03/2022 13:57, Jim Laskey wrote: We propose to provide a runtime anonymous carrier class object generator; java.lang.runtime.Carrier. This generator class is designed to share anonymous classes when shapes are similar. For example, if several clients require objects containing two integer fields, then Carrier will ensure that each client generates carrier objects using the same underlying anonymous class. Providing this mechanism decouples the strategy for carrier class generation from the client facility. One could implement one class per shape; one class for all shapes (with an Object[]), or something in the middle; having this decision behind a bootstrap means that it can be evolved at runtime, and optimized differently for different situations. 
Motivation The String Templates JEP draft proposes the introduction of a TemplatedString object for the primary purpose of carrying the template and associated values derived from a template literal. To avoid value boxing, early prototypes described these carrierobjects using per-callsite anonymous classes shaped by value types, The use of distinct anonymous classes here is overkill, especially considering that many of these classes are similar; containing one or two object fields and/or one or two integral fields. Pattern matching has a similar issue when carrying the values for the holes of a pattern. With potentially hundreds (thousands?) of template literals or patterns per application, we need to find an alternate approach for these value carriers. Description In general terms, the Carrier class simply caches anonymous classes keyed on shape. To further increase similarity in shape, the ordering of value types is handled by the API and not in the underlying anonymous class. If one client requires an object with one object value and one integer value and a second client requires an object with one integer value and one object value, then both clients will use the same underlying anonymous class. Further, types are folded as either integer (byte, short, int, boolean, char, float), long (long, double) or object. [We've seen that performance hit by folding the long group into the integer group is significant, hence the separate group.] The Carrier API uses MethodType parameter types to describe the shape of a carrier. This incorporates with the primary use case where bootstrap methods need to capture indy non-static arguments. The API has three static methods; // Return a constructor MethodHandle for a carrier with components // aligning with the parameter types of the supplied methodType. static MethodHandle constructor(MethodType methodType) // Return a component getter MethodHandle for component i. static MethodHandle component(MethodType methodType, int i) // Return component getter MethodHandles for all the carrier's components. static MethodHandle[] components(MethodType methodType) Examples import java.lang.runtime.Carrier; ... // Define the carrier description. MethodType methodType = MethodType.methodType(Object.class, byte.class, short.class, char.class, int.class, long.class, float.class, double.class, boolean.class, String.class); // Fetch the carrier constructor. MethodHandle constructor = Carrier.constructor(methodType); // Create a carrier object. Object object = (Object)constructor.invokeExact((byte)0xFF, (short)0xFFFF, 'C', 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFFL, 1.0f / 3.0f, 1.0 / 3.0, true, "abcde"); // Get an array of accessors for the carrier object. MethodHandle[] components = Carrier.components(methodType); // Access fields. byte b = (byte)components[0].invokeExact(object); short s = (short)components[1].invokeExact(object); char c =(char)components[2].invokeExact(object); int i = (int)components[3].invokeExact(object); long l = (long)components[4].invokeExact(object); float f =(float)components[5].invokeExact(object); double d = (double)components[6].invokeExact(object); boolean tf (boolean)components[7].invokeExact(object); String s = (String)components[8].invokeExact(object)); // Access a specific field. 
MethodHandle component = Carrier.component(methodType, 3); int ii = (int)component.invokeExact(object); From john.r.rose at oracle.com Wed Mar 9 17:33:18 2022 From: john.r.rose at oracle.com (John Rose) Date: Wed, 09 Mar 2022 09:33:18 -0800 Subject: [External] : Re: Proposal: java.lang.runtime.Carrier In-Reply-To: <06FD3039-4651-4476-9260-3055650809F1@oracle.com> References: <346083073.11799245.1646520854772.JavaMail.zimbra@u-pem.fr> <1554760661.12502090.1646664636312.JavaMail.zimbra@u-pem.fr> <33e69c0e-5306-589f-29f2-2bdbd341e740@oracle.com> <186874812.12710567.1646698039608.JavaMail.zimbra@u-pem.fr> <1767687407.13656581.1646826800400.JavaMail.zimbra@u-pem.fr> <2A66AE78-5448-43BD-B427-A7B4B3C7E679@oracle.com> <1126635624.13763388.1646835931181.JavaMail.zimbra@u-pem.fr> <06FD3039-4651-4476-9260-3055650809F1@oracle.com> Message-ID: <123E5C92-8CB2-4B12-9EC1-E3341AFAF34E@oracle.com> On 9 Mar 2022, at 6:28, Brian Goetz wrote: >> Also, i wonder if the external Carrier API should have a way to wrap an existing record class to see it as a Carrier, so the destructuring pattern will behave the same way with a record or with the result of a de-constructor. > > Having records be their own carrier is an optimization we anticipate wanting to make, but I don?t think it is needed for Carrier to be involved in the deception. We?ll need a higher-level representation of a pattern (e.g., PatternHandle), which will exposes MHs for ?try to match, return a carrier if it matches? and MHs for deconstructing the carrier. A PH could use the record itself as the carrier, and the record?s accessor MHs as the component accessors, and not use Carrier at all, and the PH client can?t tell the difference. > > So Carrier here is intended to be the lowest level of the stack; a building block for aggregating tuples, nothing more. We can then build pattern matching atop that (which can use Carrier or not, as it sees fit) and switch dispatch atop that. I hereby dub this optimization (of using the original record instead of an anonymously-typed copy) as ?BYOC?, or Bring Your Own Carrier. Whether the Carrier API needs to know about it depends on one factor: Whether Carrier (or CarrierBindings or CarrierHandles) is a quasi-reflective object that helpfully combines the constructor and list-of-component-accessors. If so, the BYOC use case suggests an external constructor for that little record. In fact, it?s probably exactly and only a little record, and we are done. But if we manage to avoid introducing such a little record (of ctor+acsors) then we push that job upward (out of the Carrier API) and at that point BYOC happens in the context of BYOCHR (Bring Your Own Carrier Handles Record). I?m not super happy about BYOCHR; it seems like an invitation to duplicate effort. Perhaps there should be a type Carrier.Handles which is exactly that simple-as-can-be record. It?s what the carrier factory (now there?s just one static factory, keyed by MethodType?) returns. But as a Record, it can also be used (trivially) for BYOC, if some user has other handles that would be nicer. Something to consider. ? 
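A rough sketch of that little record, written against the proposed Carrier API described earlier in this thread (the name CarrierHandles and its factory are hypothetical, not part of the proposal):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodType;
    import java.lang.runtime.Carrier;   // the proposed API, not yet in the JDK

    // Bundles the constructor handle and the component accessors for one carrier shape.
    record CarrierHandles(MethodType shape, MethodHandle constructor, MethodHandle[] components) {
        static CarrierHandles of(MethodType shape) {
            return new CarrierHandles(shape, Carrier.constructor(shape), Carrier.components(shape));
        }
    }

For BYOC, the same record could instead be populated from an existing record class's canonical constructor and accessor handles; a client that only sees a CarrierHandles cannot tell the difference.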
John From gavin.bierman at oracle.com Thu Mar 10 20:21:06 2022 From: gavin.bierman at oracle.com (Gavin Bierman) Date: Thu, 10 Mar 2022 20:21:06 +0000 Subject: [18][guarded pattern] conditional-and query - spec clarification In-Reply-To: References: Message-ID: <203F978A-9F9A-435F-A959-1F5D195BFE5C@oracle.com> Hi Manoj, It?s a slightly moot point, given that we are likely to drop guarded patterns in the next preview but I think there has been some confusion here... On 7 Mar 2022, at 07:08, Manoj Palat > wrote: Hi, Given, public void bar(Object o) { int i = switch(o) { case String a && o != null ? true : false -> 1;//ecj flags syntax error here default -> 1; }; } ECJ(eclipse compiler for Java) flags a syntax error on the guarded pattern. However, javac accepts. Ecj translates this into: case ((String a) && (o != null)) ? true : false and flags an error instead of case ((String a) && ((o != null) ? true : false)) The idea of guarded patterns is that we give a secondary role to `&&` to serve as an operator for patterns. After the `case` we parse a pattern. One of the form of a pattern is a guarded pattern which is: GuardedPattern: PrimaryPattern && ConditionalAndExpression Given the grammar as per http://cr.openjdk.java.net/~gbierman/jep420/jep420-20211208/specs/patterns-switch-jls.html I think javac is parsing this correctly. I don?t know quite what ecj is doing here because the translation you give above seems to suggest that it was accepting an expression after the `case` which is not correct. Moreover, the inner expression (String a) && (o != null) is not an expression but a (guarded) pattern. And I think the ecj is correct in flagging the error due to: From https://docs.oracle.com/javase/tutorial/java/nutsandbolts/operators.html we see that Conditional-And Operator ?&&? has higher operator precedence than the Conditional Operator ??:? . From https://docs.oracle.com/javase/specs/jls/se17/html/jls-15.html#jls-15.23, we see that ?The conditional-and operator is syntactically left-associative (it groups left-to-right).? Also, I don't see any mention of the precedence changes in spec 420 [latest at https://cr.openjdk.java.net/~gbierman/jep420/latest] I don?t see the connection with precedence - we certainly didn?t make any changes. Am I understanding your issue correctly? Thanks, Gavin From forax at univ-mlv.fr Sun Mar 13 14:59:47 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 13 Mar 2022 15:59:47 +0100 (CET) Subject: Record pattern, the runtime side Message-ID: <2137135235.15766518.1647183587359.JavaMail.zimbra@u-pem.fr> Following the discussions we had, i've implemented a prototype of what can be the runtime part of the pattern matching that supports record pattern. It works in 3 steps: Step 1, at compile time, the compiler takes all the patterns and creates a tree of pattern from the list of patterns, pattern that starts with the same prefix are merged together. In the end, the tree of patterns is encoded in the bytecode as a tree of constant dynamic (each Pattern is created only from constant and patterns). sealed interface Pattern {} record NullPattern() implements Pattern {} record ConstantPattern(Object constant) implements Pattern {} record TypePattern(Class type) implements Pattern {} record RecordPattern(Class recordClass, Pattern... 
patterns) implements Pattern {}
record OrPattern(Pattern pattern1, Pattern pattern2) implements Pattern {}
record ResultPattern(int index, Pattern pattern) implements Pattern {}

The last two patterns are less obvious. The OrPattern builds the tree by saying that one pattern should be executed before another. Because the patterns are organized as a tree and not a list, we need a way to associate each branch of the tree with the pattern it came from, as seen by the user; the ResultPattern encodes the index of the matched pattern (the first pattern is 0, the second one is 1, etc.) into the first component of the carrier.

I've chosen not to represent the bindings (and their corresponding component indexes in the carrier) and instead to rely on the fact that binding slots are numbered in the order of the tree traversal. This is maybe too brittle, because the compiler and the runtime have to agree on the traversal order, but it avoids encoding too much information in the bytecode.

Step 2, at runtime: the first call to invokedynamic, with the pattern tree as arguments, creates a tree of method handles that will do the matching. During that phase the runtime environment can be checked to see whether the pattern tree is invalid with respect to the runtime classes; in that case a linkage error is thrown. In the prototype the method is called toMatcher and takes a Lookup (to find the accessors of the record of a RecordPattern), a receiverClass (the type of the value being switched over), the carrier type (a method type describing the tuple of binding types, in tree-traversal order), the index of the first binding (in the case of a switch the first component is the matching index, so the binding slots start at 1) and a method handle (the null matcher) to call if a null is found (the two possible semantics being "do not match"/return null, or throw a NPE).

    pattern.toMatcher(lookup, receiverClass, carrierType, firstBinding (0 or 1), nullMatcher);

Step 3, during execution:
- for an instanceof, a null carrier means no match; otherwise the carrier contains the values of the bindings,
- for a switch, a null carrier means "default"; otherwise component 0 of the carrier contains the index of the matched pattern, followed by the binding values,
- for an assignment, the carrier cannot be null because the nullMatcher throws a NPE earlier, so the carrier contains the binding values.

An example of instanceof
https://github.com/forax/switch-carrier/blob/master/src/main/java/com/github/forax/carrier/java/lang/runtime/InstanceOfExamples.java#L15
An example of switch
https://github.com/forax/switch-carrier/blob/master/src/main/java/com/github/forax/carrier/java/lang/runtime/SwitchExamples.java#L17
An example of assignment
https://github.com/forax/switch-carrier/blob/master/src/main/java/com/github/forax/carrier/java/lang/runtime/AssignmentExamples.java#L14

regards,
Rémi

From forax at univ-mlv.fr  Sun Mar 13 15:09:06 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 13 Mar 2022 16:09:06 +0100 (CET)
Subject: Record pattern: matching an empty record
Message-ID: <604077606.15770073.1647184146207.JavaMail.zimbra@u-pem.fr>

Hi all,
while writing the prototype of the runtime, I found a case I don't think we ever discussed: can we match an empty record?

record Empty() { }

switch(object) {
  case Empty() -> ... // no binding here

I think the answer is yes, because I don't see why we should make a special case for it, but I may be wrong.
Rémi

From brian.goetz at oracle.com  Sun Mar 13 15:21:05 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 13 Mar 2022 11:21:05 -0400
Subject: Record pattern: matching an empty record
In-Reply-To: <604077606.15770073.1647184146207.JavaMail.zimbra@u-pem.fr>
References: <604077606.15770073.1647184146207.JavaMail.zimbra@u-pem.fr>
Message-ID: <16964afd-0989-b598-4722-9bb126315f68@oracle.com>

Given a record R, and a record pattern R(P*), where P* is a list of nested patterns of the same arity as R's components, then

    x matches R(P*) iff
        x instanceof R
        && R(var alpha*)              // always true, just binds
        && \forall i  alpha_i matches P_i

If P* is empty, the last clause is vacuously true.

On 3/13/2022 11:09 AM, Remi Forax wrote:
> Hi all,
> while writing the prototype of the runtime, I found a case I don't think we ever discussed: can we match an empty record?
>
> record Empty() { }
>
> switch(object) {
>   case Empty() -> ... // no binding here
>
> I think the answer is yes, because I don't see why we should make a special case for it, but I may be wrong.
>
> Rémi

From brian.goetz at oracle.com  Wed Mar 16 16:41:49 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 16 Mar 2022 12:41:49 -0400
Subject: Record pattern, the runtime side
In-Reply-To: <2137135235.15766518.1647183587359.JavaMail.zimbra@u-pem.fr>
References: <2137135235.15766518.1647183587359.JavaMail.zimbra@u-pem.fr>
Message-ID: <3003e7a2-0679-984a-0513-ec3b1c9539ca@oracle.com>
> In the end, the tree of patterns is encoded in the bytecode as a tree of constant dynamic (each Pattern is created only from constant and patterns). With record patterns, we don't even need pattern descriptions, because we can translate it all down to instanceof tests and invoking record component accessors.? Of course, that ends when we have deconstruction patterns, which correspond to imperative code; then having a Pattern instantiation, and a way to get to its matching / binding-extraction MHs, is needed. From forax at univ-mlv.fr Wed Mar 16 20:34:36 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 16 Mar 2022 21:34:36 +0100 (CET) Subject: Record pattern, the runtime side In-Reply-To: <3003e7a2-0679-984a-0513-ec3b1c9539ca@oracle.com> References: <2137135235.15766518.1647183587359.JavaMail.zimbra@u-pem.fr> <3003e7a2-0679-984a-0513-ec3b1c9539ca@oracle.com> Message-ID: <1317237683.18149630.1647462876658.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" > To: "Remi Forax" , "amber-spec-experts" > Sent: Wednesday, March 16, 2022 5:41:49 PM > Subject: Re: Record pattern, the runtime side >> It works in 3 steps: >> Step 1, at compile time, the compiler takes all the patterns and creates a tree >> of pattern from the list of patterns, >> pattern that starts with the same prefix are merged together. > > We can "normalize" a complex pattern into a sequence of simpler > conditionals.? For example, matching the record pattern > > ??? case Circle(Point(var x, var y), var r) > > can be unrolled (and type inference applied) as > > ??? x matches Circle(Point(var x, var y), var r) > ??? === x matches Circle(Point p, int r) && p matches Point(int x, int y) > > Deconstruction patterns are known to have only an `instanceof` > precondition; the deconstructor body won't ever fail (unlike more > general static or instance patterns like Optional::of.) If you define "matches" in term of instanceof this transformation does not work in the context of an assignment, because you want Point(var x, var y) = null to throw a NPE. But it's a very valid transformation if the pattern is not total and "matches" means instanceof in the context of a switch or instanceof and requireNonNull + cast in the context of an assignment. Also from the runtime POV, a deconstructor and a pattern methods (static or instance) are identical, if we follow the idea of John to use null for not match. Obviously, it does not preclude us to differentiate between the two at the language level. >?So we can further rewrite this as: > > ??? x matches Circle(Point(var x, var y), var r) > ??? === x matches Circle(Point p, int r) && p matches Point(int x, int y) > ??? === x instanceof Circle c && c.deconstruct(Point p, int r) && p > instanceof Point && p.deconstruct(int x, int y) > > (where the "deconstruct" term invokes the deconstructor and binds the > relevant values.) With this rewrite, we are moving from higher world of patterns to the lower world of matcher (it's how i've called it), were the exact semantics of a pattern is decomposed into different method handles. It's at that stage that depending on if the pattern is total or not, an instanceof/type check is used or not. > > If we are disciplined about the order in which we unroll (e.g., always > depth-first and always left-to-right), with a modest amount of > normalization, your "same pattern prefix" turns into the simpler "common > prefix of normalized operations".? Yes ! 
> Record deconstructors can be further normalized, because the can be replaced with calling the accessors: > > ??? x matches Circle(Point(var x, var y), var r) > ??? === x matches Circle(Point p, int r) && p matches Point(int x, int y) > ??? === x instanceof Circle c && (Point p = c.center()) && (int r = > c.radius()) && p instanceof Point > && (int x = p.x()) && (int y = p.y()) I've unified the access to a record and the access to a carrier for that exact reason, so the translation for a deconstructor or a record is identical. A carrier being an anonymous record or it's dual, a record is a named carrier. > > Of course, this is all very implementation-centric; factoring matching > this way is somewhat unusual, since the arity and order of side-effects > might be surprising to Java developers.? (Yes, having side-effects in > pattern definitions is bad, but it may still happen.)? So the spec will > have to do some fast dancing to allow this. yes, this will require pedagogy to explain why a pattern method is called once and not as many time as present in the source code. But i believe we should not make the performance worst because few users may write pattern methods/deconstructors that side effect otherwise we will have people that rewrite switch to a cascade of if ... instanceof due to performance concern (as we have seen several examples of this in the past with people re-writing enhanced for loops to basic loops before escape analysis was added to the VM, or more recently with lambdas because before Java 17 each lambda was using its own metaspace). > >> In the end, the tree of patterns is encoded in the bytecode as a tree of >> constant dynamic (each Pattern is created only from constant and patterns). > > With record patterns, we don't even need pattern descriptions, because > we can translate it all down to instanceof tests and invoking record > component accessors.? Of course, that ends when we have deconstruction > patterns, which correspond to imperative code; then having a Pattern > instantiation, and a way to get to its matching / binding-extraction > MHs, is needed. Yes, record pattern is the last pattern we can implement in term of a cascade of if. I use that fact in the prototype, the runtime use the switch + type pattern because it does not use invokedynamic. For the future, i'm not sure we will want to use invokedynamic for all patterns, indy is still quite slow until c2 kicks in. R?mi From brian.goetz at oracle.com Thu Mar 17 13:59:40 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 17 Mar 2022 09:59:40 -0400 Subject: [External] : Re: Record pattern, the runtime side In-Reply-To: <1317237683.18149630.1647462876658.JavaMail.zimbra@u-pem.fr> References: <2137135235.15766518.1647183587359.JavaMail.zimbra@u-pem.fr> <3003e7a2-0679-984a-0513-ec3b1c9539ca@oracle.com> <1317237683.18149630.1647462876658.JavaMail.zimbra@u-pem.fr> Message-ID: On 3/16/2022 4:34 PM, forax at univ-mlv.fr wrote: > ----- Original Message ----- >> From: "Brian Goetz" >> To: "Remi Forax" , "amber-spec-experts" >> Sent: Wednesday, March 16, 2022 5:41:49 PM >> Subject: Re: Record pattern, the runtime side >>> It works in 3 steps: >>> Step 1, at compile time, the compiler takes all the patterns and creates a tree >>> of pattern from the list of patterns, >>> pattern that starts with the same prefix are merged together. >> We can "normalize" a complex pattern into a sequence of simpler >> conditionals.? For example, matching the record pattern >> >> ??? 
case Circle(Point(var x, var y), var r)
>>
>> can be unrolled (and type inference applied) as
>>
>>     x matches Circle(Point(var x, var y), var r)
>>     === x matches Circle(Point p, int r) && p matches Point(int x, int y)
>>
>> Deconstruction patterns are known to have only an `instanceof`
>> precondition; the deconstructor body won't ever fail (unlike more
>> general static or instance patterns like Optional::of.)
> If you define "matches" in terms of instanceof, this transformation does not work in the context of an assignment,
> because you want
>     Point(var x, var y) = null
> to throw a NPE.

Yes, but this is not what matches means.  Matches is the three-place predicate that takes the static type of the target into account.  It differs from instanceof only at null, which is why I wrote "matches" rather than "instanceof".  You'll see in later rounds of lowering how this gets turned back into instanceof, taking into account the static type information.

> But it's a very valid transformation if the pattern is not total and "matches" means instanceof in the context of a switch, or instanceof and requireNonNull + cast in the context of an assignment.

Correct:

    P(Q) === P(var a) && a matches Q      always, whereas
    P(Q) === P(var a) && a instanceof Q   **when Q is not unconditional on the target of P**

Updated terminology scorecard: a pattern P is *unconditional* on a type T if it matches all values of T; in other words, if it is not asking a question at all.  The only unconditional patterns are "any" patterns (_), "var" patterns, and total type patterns.  Deconstruction patterns are never unconditional, because they don't match on nulls.

On the other hand, a pattern P is *exhaustive* on a type T if it is considered "good enough" for purposes of static type checking.  Deconstruction patterns D(...) are exhaustive on types T <: D, even though they don't match null.  The difference is *remainder*.

> Also from the runtime POV, a deconstructor and a pattern method (static or instance) are identical, if we follow John's idea of using null for "does not match". Obviously, that does not preclude us from differentiating between the two at the language level.

With one difference; the language makes deconstructors always total (they can't fail to match if the target is of the right type), whereas pattern methods can fail to match.  So in the translations where I write "c.deconstruct(...)" we are assuming that the deconstructor is always "true".

>>> In the end, the tree of patterns is encoded in the bytecode as a tree of
>>> constant dynamic (each Pattern is created only from constants and patterns).
>> With record patterns, we don't even need pattern descriptions, because
>> we can translate it all down to instanceof tests and invoking record
>> component accessors.  Of course, that ends when we have deconstruction
>> patterns, which correspond to imperative code; then having a Pattern
>> instantiation, and a way to get to its matching / binding-extraction
>> MHs, is needed.
> Yes, the record pattern is the last pattern we can implement in terms of a cascade of ifs. I use that fact in the prototype: the runtime uses switch + type patterns because it does not use invokedynamic.

We can use a cascade of ifs either way, but record patterns are more "transparent" in that the compiler can lower the match criteria and extraction of bindings to primitives it already understands, whereas a method pattern is an opaque blob of code.
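To make the unconditional/exhaustive distinction concrete, a small example under the semantics being discussed here (the record is only for illustration):

    record Circle(int radius) { }

    static String size(Circle c) {
        // `case Circle x` would be unconditional on Circle: it asks no question at all,
        // and matches every value of Circle, including null.  The record pattern below
        // is merely *exhaustive* on Circle: it is enough for static type checking, but
        // it does not match null; null is remainder and is rejected at run time rather
        // than flowing into the case body.
        return switch (c) {
            case Circle(int r) -> "radius " + r;
        };
    }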
> For the future, I'm not sure we will want to use invokedynamic for all patterns; indy is still quite slow until C2 kicks in.

It is a difficult tradeoff to decide what code to emit for narrowly-branching trees. The indy approach means we can change plans later, but has a startup cost that may well not buy us very much; having a heuristic like "use if-else chains for fewer than five types" is brittle.

From manoj.palat at in.ibm.com Mon Mar 21 10:58:15 2022
From: manoj.palat at in.ibm.com (Manoj Palat)
Date: Mon, 21 Mar 2022 10:58:15 +0000
Subject: [18][guarded pattern] conditional-and query - spec clarification
Message-ID: <9A37253F-F1EF-476C-99C1-2DDCC8E0F1AD@in.ibm.com>

Hi Gavin,

Thanks for the reply - yes, I agree with you that precedence doesn't come into play here; my bad, I went down a wrong path in the grammar. That said, with further analysis, the code in question:

    case String a && o != null ? true : false -> 1; // ecj flags syntax error here

is still an error, because there seems to be no way to reduce it to a ConditionalAndExpression, as mentioned below. I.e.:

    case String a && o != null ? true : false ->      // Reject

Add parentheses and then we get a path (side note):

    case String a && (o != null ? true : false) ->    // Accept

because, without parentheses, the code

    case String a && o != null ? true : false ->

follows the path of:

    AssignmentExpression -> ConditionalExpression
    Expression ::= AssignmentExpression
    ConstantExpression -> Expression

and this cannot be reduced to a ConditionalAndExpression, and consequently not further to a GuardedPattern, while the one with parentheses can eventually be reduced to a GuardedPattern:

    case String a && (o != null ? true : false) ->
    AssignmentExpression -> ConditionalExpression
    Expression ::= AssignmentExpression
    PrimaryNoNewArray ::= '(' Expression_NotName ')'

and so on:

    Primary -> PrimaryNoNewArray
    PostfixExpression -> Primary
    ...
    InclusiveOrExpression -> ExclusiveOrExpression
    ConditionalAndExpression -> InclusiveOrExpression
    ...

with the combination of "String a &&" eventually leading to

    GuardedPattern ::= PrimaryPattern AND_AND ConditionalAndExpression

Regards,
Manoj

From: Gavin Bierman
Date: Friday, 11 March 2022 at 1:51 AM
To: Manoj Palat
Cc: "amber-spec-experts at openjdk.java.net"
Subject: [EXTERNAL] Re: [18][guarded pattern] conditional-and query - spec clarification

Hi Manoj,

It's a slightly moot point, given that we are likely to drop guarded patterns in the next preview, but I think there has been some confusion here...

On 7 Mar 2022, at 07:08, Manoj Palat wrote:

Hi,

Given,

    public void bar(Object o) {
        int i = switch(o) {
            case String a && o != null ? true : false -> 1; // ecj flags syntax error here
            default -> 1;
        };
    }

ECJ (the Eclipse compiler for Java) flags a syntax error on the guarded pattern. However, javac accepts. ECJ translates this into

    case ((String a) && (o != null)) ? true : false

and flags an error, instead of

    case ((String a) && ((o != null) ? true : false))

The idea of guarded patterns is that we give a secondary role to `&&` to serve as an operator for patterns. After the `case` we parse a pattern.
One of the forms of a pattern is a guarded pattern, which is:

    GuardedPattern:
        PrimaryPattern && ConditionalAndExpression

Given the grammar as per http://cr.openjdk.java.net/~gbierman/jep420/jep420-20211208/specs/patterns-switch-jls.html I think javac is parsing this correctly. I don't know quite what ECJ is doing here, because the translation you give above seems to suggest that it was accepting an expression after the `case`, which is not correct. Moreover, the inner expression (String a) && (o != null) is not an expression but a (guarded) pattern.

And I think the ecj is correct in flagging the error due to: From https://docs.oracle.com/javase/tutorial/java/nutsandbolts/operators.html we see that the conditional-and operator "&&" has higher operator precedence than the conditional operator "?:". From https://docs.oracle.com/javase/specs/jls/se17/html/jls-15.html#jls-15.23, we see that "The conditional-and operator is syntactically left-associative (it groups left-to-right)." Also, I don't see any mention of the precedence changes in spec 420 [latest at https://cr.openjdk.java.net/~gbierman/jep420/latest]

I don't see the connection with precedence - we certainly didn't make any changes. Am I understanding your issue correctly?

Thanks,
Gavin

From brian.goetz at oracle.com Thu Mar 24 17:39:21 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 24 Mar 2022 13:39:21 -0400
Subject: Pattern coverage
Message-ID: 

I've put a document at

    http://cr.openjdk.java.net/~briangoetz/eg-attachments/Coverage.pdf

which outlines a formal model for pattern coverage, including record patterns and the effects of sealing. This refines the work we did earlier. The document may be a bit rough so please let me know if you spot any errors. The approach here should be more amenable to specification than the previous approach.

From forax at univ-mlv.fr Thu Mar 24 17:56:52 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 24 Mar 2022 18:56:52 +0100 (CET)
Subject: Pattern coverage
In-Reply-To: 
References: 
Message-ID: <916111995.1289475.1648144612497.JavaMail.zimbra@u-pem.fr>

Thanks for sharing; in the text there are several mentions of the default pattern, but the default pattern is not defined.

Rémi

> From: "Brian Goetz"
> To: "amber-spec-experts"
> Sent: Thursday, March 24, 2022 6:39:21 PM
> Subject: Pattern coverage
> I've put a document at
> [ http://cr.openjdk.java.net/~briangoetz/eg-attachments/Coverage.pdf | http://cr.openjdk.java.net/~briangoetz/eg-attachments/Coverage.pdf ]
> which outlines a formal model for pattern coverage, including record patterns and the effects of sealing. This refines the work we did earlier. The document may be a bit rough so please let me know if you spot any errors. The approach here should be more amenable to specification than the previous approach.

From brian.goetz at oracle.com Thu Mar 24 18:49:44 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 24 Mar 2022 14:49:44 -0400
Subject: [External] : Re: Pattern coverage
In-Reply-To: <916111995.1289475.1648144612497.JavaMail.zimbra@u-pem.fr>
References: <916111995.1289475.1648144612497.JavaMail.zimbra@u-pem.fr>
Message-ID: <91fb4352-f261-b555-58c3-a20df011abaf@oracle.com>

Right, in this model "default" clauses map to "any" patterns. It doesn't (yet) deal with remainder, but that will come in a separate section. This is all about static type checking.
Also, the last two rules probably leave out some of the generics support, but that's not essential to the model; we're mostly trying to make sure we understand what exhaustiveness is, in a way that it can be specified. On 3/24/2022 1:56 PM, Remi Forax wrote: > Thanks for sharing, > in the text, they are several mentions of the default pattern but the > default pattern is not defined. > > R?mi > > ------------------------------------------------------------------------ > > *From: *"Brian Goetz" > *To: *"amber-spec-experts" > *Sent: *Thursday, March 24, 2022 6:39:21 PM > *Subject: *Pattern coverage > > I've put a document at > > http://cr.openjdk.java.net/~briangoetz/eg-attachments/Coverage.pdf > > which outlines a formal model for pattern coverage, including > record patterns and the effects of sealing. This refines the work > we did earlier.? The document may be a bit rough so please let me > know if you spot any errors.? The approach here should be more > amenable to specification than the previous approach. > > > From gavin.bierman at oracle.com Fri Mar 25 11:37:24 2022 From: gavin.bierman at oracle.com (Gavin Bierman) Date: Fri, 25 Mar 2022 11:37:24 +0000 Subject: Record pattern: matching an empty record In-Reply-To: <604077606.15770073.1647184146207.JavaMail.zimbra@u-pem.fr> References: <604077606.15770073.1647184146207.JavaMail.zimbra@u-pem.fr> Message-ID: <74C46ACD-4A68-463D-9527-AFB1F65D49E8@oracle.com> That?s right; with record patterns pattern matching can now initialise **zero or more** pattern variables (with type patterns it was always exactly one pattern variable to initialise). Gavin > On 13 Mar 2022, at 15:09, Remi Forax wrote: > > Hi all, > while writing the prototype of the runtime, > i found a case i think we never discuss, can we match an empty record ? > > record Empty() { } > > switch(object) { > case Empty() -> ... // no binding here > > I think the answer is yes because i don't see why we should do a special case for that, but i may be wrong. > > R?mi From brian.goetz at oracle.com Fri Mar 25 15:38:52 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 25 Mar 2022 11:38:52 -0400 Subject: Pattern assignment Message-ID: <7551e40d-558b-a4a3-b493-623a421e2012@oracle.com> We still have a lot of work to do on the current round of pattern matching (record patterns), but let's take a quick peek down the road.? Pattern assignment is a sensible next building block, not only because it is directly useful, but also because it will be required for _declaring_ deconstruction patterns in classes (that's how one pattern delegates to another.)? What follows is a rambling sketch of all the things we _could_ do with pattern assignment, though we need not do all of them initially, or even ever. # Pattern assignment So far, we've got two contexts in the language that can accommodate patterns -- `instanceof` and `switch`.? Both of these are conditional contexts, designed for dealing with partial patterns -- test whether a pattern matches, and if so, conditionally extract some state and act on it. There are cases, though, when we know a pattern will always match, in which case we'd like to spare ourselves the ceremony of asking.? If we have a 3d `Point`, asking if it is a `Point` is redundant and distracting: ``` Point p = ... if (p instanceof Point(var x, var y, var z)) { ??? // use x, y, z } ``` In this situation, we're asking a question to which we know the answer, and we're distorting the structure of our code to do it.? 
Further, we're depriving ourselves of the type checking the compiler would willingly do to validate that the pattern is total. Much better to have a way to _assert_ that the pattern matches.

## Let-bind statements

In such a case, where we want to assert that the pattern matches, and forcibly bind it, we'd rather say so directly. We've experimented with a few ways to express this, and the best approach seems to be some sort of `let` statement:

```
let Point(var x, var y, var z) p = ...;
// can use x, y, z, p
```

Other ways to surface this might be to call it `bind`:

```
bind Point(var x, var y, var z) p = ...;
```

or even use no keyword, and treat it as a generalization of assignment:

```
Point(var x, var y, var z) p = ...;
```

(Usual disclaimer: we discuss substance before syntax.)

A `let` statement takes a pattern and an expression, and we statically verify that the pattern is exhaustive on the type of the expression; if it is not, this is a type error at compile time. Any bindings that appear in the pattern are definitely assigned and in scope in the remainder of the block that encloses the `let` statement.

Let statements are also useful in _declaring_ patterns; just as a subclass constructor will delegate part of its job to a superclass constructor, a subclass deconstruction pattern will likely want to delegate part of its job to a superclass deconstruction pattern. Let statements are a natural way to invoke total patterns from other total patterns.

#### Remainder

Let statements require that the pattern be exhaustive on the type of the expression. For total patterns like type patterns, this means that every value is matched, including `null`:

```
let Object o = x;
```

Whatever the value of `x`, `o` will be bound to the value of `x` (even if `x` is null), because `Object o` is total on `Object`. Similarly, some patterns are clearly not total on some types:

```
Object o = ...
let String s = o;  // compile error
```

Here, `String s` is not total on `Object`, so the `let` statement is not valid. But as previously discussed, there is a middle ground -- patterns that are _total with remainder_ -- which are "total enough" to be allowed to be considered exhaustive, but which in fact do not match certain "weird" values. An example is the record pattern `Box(var x)`; it matches all box instances, even those containing null, but does not match a `null` value itself (because to deconstruct a `Box`, we effectively have to invoke an instance member on the box, and we cannot invoke instance members on null receivers.) Similarly, the pattern `Box(Bag(String s))` is total on `Box<Bag<String>>`, with remainder `null` and `Box(null)`.

Because a `let` statement guarantees that its bindings are definitely assigned after the `let` statement completes normally, the natural thing to do when presented with a remainder value is to complete abruptly by reason of exception. (This is what `switch` does as well.) So the following statement:

```
Box<Bag<String>> bbs = ...
let Box(Bag(String s)) = bbs;
```

would throw when encountering `null` or `Box(null)`, but not `Box(Bag(null))`, because that matches the pattern with `s=null`, just like a switch containing only this case would.

#### Conversions

JLS Chapter 5 ("Conversions and Contexts") outlines the conversions (widening, narrowing, boxing, unboxing, etc.) that are permitted in various contexts (assignment, loose method invocation, strict method invocation, cast, etc.)
We need to define the set of conversions we're willing to perform in the context of a `let` statement as well; which of the following do we want to support?

```
let int x = aShort;     // primitive widening
let byte b = 0;         // primitive narrowing
let Integer x = 0;      // boxing
let int x = anInteger;  // unboxing
```

The above examples -- all of which use type patterns -- look a lot like local variable declarations (especially if we choose to go without a keyword); this strongly suggests we should align the valid set of conversions in `let` statements with those permitted in assignment context. The one place where we have to exercise care is conversions that involve unboxing; a null in such circumstances feeds into the remainder of the pattern, rather than having matching throw (we're still likely to throw, but it affects the timing of how far we progress in a pattern switch before we do so.) So for example, the pattern `int x` is exhaustive on `Integer`, but with remainder `null`.

## Possible extensions

There are a number of ways we can extend `let` statements to make them more useful; these could be added at the same time, or at a later time.

#### What about partial patterns?

There are times when it may be more convenient to use a `let` even when we know the pattern is partial. In most cases, we'll still want to complete abruptly if the pattern doesn't match, but we may want to control what happens. For example:

```
let Optional.of(var contents) = optName
else throw new IllegalArgumentException("name is empty");
```

Having an `else` clause allows us to use a partial pattern, which receives control if the pattern does not match. The `else` clause could choose to throw, but could also choose to `break` or `return` to an enclosing context, or even recover by assigning the bindings.

#### What about recovery?

If we're supporting partial patterns, we might want to allow the `else` clause to provide defaults for the bindings, rather than throw. We can make the bindings of the pattern in the `let` statement be in scope, but definitely unassigned, in the `else` clause, which means the `else` clause could initialize them and continue:

```
let Optional.of(var contents) = optName
else contents = "Unnamed";
```

This allows us to continue, while preserving the invariant that when the `let` statement completes normally, all bindings are DA.

#### What about guards?

If we're supporting partial patterns, we also need to consider the case where the pattern matches but we still want to reject the content. This could of course be handled by testing and throwing after the `let` completes, but if we want to recover via the `else` clause, we might want to handle this directly. We've already introduced a means to do this for switch cases -- a `when` clause -- and this works equally well in `let`:

```
let Point(var x, var y) = aPoint
when x >= 0 && y >= 0
else { x = y = 0; }
```

#### What about expressions?

The name `let` conjures up the image of `let` expressions in functional languages, where we introduce a local binding for use in the scope of a single expression. This is not an accident! It is quite useful when the same expression is going to be used multiple times, or when we want to limit the scope of a local to a specific computation.

It is a short hop to `let` being usable as an expression, by providing an `in` clause:

```
String lastThree =
    let int len = s.length()
    in s.substring(len-3, len);
```

The scope of the binding `len` is the expression to the right of the `in`, nothing else. (As with `switch` expressions, the expression to the right of the `in` could be a block with a `yield` statement.)

It is a further short hop to permitting _multiple_ matches in a single `let` statement or expression:

```
int area = let Point(var x0, var y0) = lowerLeft,
               Point(var x1, var y1) = upperRight
           in (x1-x0) * (y1-y0);
```

#### What about parameter bindings?

Destructuring with total patterns is also useful for method and lambda parameters. For a lambda that accepts a `Point`, we could include the pattern in the lambda parameter list, and the bindings would automatically be in scope in the body. Instead of:

```
areaFn = (Point lowerLeft, Point upperRight)
         -> (upperRight.x() - lowerLeft.x()) * (upperRight.y() - lowerLeft.y());
```

we could do the destructuring in the lambda header:

```
areaFn = (let Point(var x0, var y0) lowerLeft,
          let Point(var x1, var y1) upperRight)
         -> (x1-x0) * (y1-y0);
```

This allows us to treat the derived values as "parameters" of the lambda. We would enforce totality at compile time, and dynamically reject remainder as we do with `switch` and `let` statements.

I think this one may be a bridge too far, though. The method header should probably be reserved for API declaration, and destructuring only serves the implementation. I think I'd prefer to move the `let` into the body of the method or lambda.

From forax at univ-mlv.fr Mon Mar 28 15:20:10 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 28 Mar 2022 17:20:10 +0200 (CEST)
Subject: Pattern assignment
In-Reply-To: <7551e40d-558b-a4a3-b493-623a421e2012@oracle.com>
References: <7551e40d-558b-a4a3-b493-623a421e2012@oracle.com>
Message-ID: <847647104.2746202.1648480810828.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "amber-spec-experts"
> Sent: Friday, March 25, 2022 4:38:52 PM
> Subject: Pattern assignment
> We still have a lot of work to do on the current round of pattern matching (record patterns), but let's take a quick peek down the road. Pattern assignment is a sensible next building block, not only because it is directly useful, but also because it will be required for _declaring_ deconstruction patterns in classes (that's how one pattern delegates to another.) What follows is a rambling sketch of all the things we _could_ do with pattern assignment, though we need not do all of them initially, or even ever.

And obviously (but let's state the obvious hidden behind "directly useful") we have introduced records as named tuples, and _declaring_ a deconstruction is a way to deconstruct that tuple.

Construction:

    var x = ...
    var y = ...
    var point = new Point(x, y);

De-construction:

    var point = ...
    __let__ Point(var x, var y) = point;

> # Pattern assignment
> So far, we've got two contexts in the language that can accommodate patterns -- `instanceof` and `switch`. Both of these are conditional contexts, designed for dealing with partial patterns -- test whether a pattern matches, and if so, conditionally extract some state and act on it.
> There are cases, though, when we know a pattern will always match, in which case we'd like to spare ourselves the ceremony of asking. If we have a 3d `Point`, asking if it is a `Point` is redundant and distracting:
> ```
> Point p = ...
> if (p instanceof Point(var x, var y, var z)) { > // use x, y, z > } > ``` > In this situation, we're asking a question to which we know the answer, and > we're distorting the structure of our code to do it. Further, we're depriving > ourselves of the type checking the compiler would willingly do to validate that > the pattern is total. Much better to have a way to _assert_ that the pattern > matches. > ## Let-bind statements > In such a case, where we want to assert that the pattern matches, and forcibly > bind it, we'd rather say so directly. We've experimented with a few ways to > express this, and the best approach seems to be some sort of `let` statement: > ``` > let Point(var x, var y, var z) p = ...; > // can use x, y, z, p > ``` > Other ways to surface this might be to call it `bind`: > ``` > bind Point(var x, var y, var z) p = ...; > ``` > or even use no keyword, and treat it as a generalization of assignment: > ``` > Point(var x, var y, var z) p = ...; > ``` > (Usual disclaimer: we discuss substance before syntax.) > A `let` statement takes a pattern and an expression, and we statically verify > that the pattern is exhaustive on the type of the expression; if it is not, this > is a > type error at compile time. Any bindings that appear in the pattern are > definitely assigned and in scope in the remainder of the block that encloses the > `let` statement. > Let statements are also useful in _declaring_ patterns; just as a subclass > constructor will delegate part of its job to a superclass constructor, a > subclass deconstruction pattern will likely want to delegate part of its job to > a superclass deconstruction pattern. Let statements are a natural way to invoke > total patterns from other total patterns. yes ! > #### Remainder > Let statements require that the pattern be exhaustive on the type of the > expression. > For total patterns like type patterns, this means that every value is matched, > including `null`: > ``` > let Object o = x; > ``` > Whatever the value of `x`, `o` will be assigned to `x` (even if `x` is null) > because `Object o` is total on `Object`. Similarly, some patterns are clearly > not total on some types: > ``` > Object o = ... > let String s = o; // compile error > ``` > Here, `String s` is not total on `Object`, so the `let` statement is not valid. > But as previously discussed, there is a middle ground -- patterns that are > _total with remainder_ -- which are "total enough" to be allowed to be > considered > exhaustive, but which in fact do not match on certain "weird" values. An > example is the record pattern `Box(var x)`; it matches all box instances, even > those containing null, but does not match a `null` value itself (because to > deconstruct a `Box`, we effectively have to invoke an instance member on the > box, and we cannot invoke instance members on null receivers.) Similarly, the > pattern `Box(Bag(String s))` is total on `Box>`, with remainder > `null` and `Box(null)`. > Because `let` statements guarantee that its bindings are definitely assigned > after the `let` statement completes normally, the natural thing to do when > presented with a remainder value is to complete abruptly by reason of exception. > (This is what `switch` does as well.) So the following statement: > ``` > Box> bbs = ... > let Box(Bag(String s)) = bbs; > ``` > would throw when encountering `null` or `Box(null)` (but not `Box(Bag(null))`, > because that matches the pattern, with `s=null`, just like a switch containing > only this case would. 
> #### Conversions > JLS Chapter 5 ("Conversions and Contexts") outlines the conversions (widening, > narrowing, boxing, unboxing, etc) that are permitted in various contexts > (assignment, loose method invocation, strict method invocation, cast, etc.) > We need to define the set of conversions we're willing to perform in the context > of a `let` statement as well; which of the following do we want to support? > ``` > let int x = aShort; // primitive widening > let byte b = 0; // primitive narrowing > let Integer x = 0; // boxing > let int x = anInteger; // unboxing > ``` > The above examples -- all of which use type patterns -- look a lot like local > variable declarations (especially if we choose to go without a keyword); this > strongly suggests we should align the valid set of conversions in `let` > statements with those permitted in assignment context. The one place where we > have to exercise care is conversions that involve unboxing; a null in such > circumstances feeds into the remainder of the pattern, rather than having > matching throw (we're still likely to throw, but it affects the timing of how > far we progress in a pattern switch before we do so.) So for example, the > the pattern `int x` is exhaustive on `Integer`, but with remainder `null`. There are another different between assignment and _let_, a _let_ creates new fresh local variables (binding) while assignment is able to reuse an existing local variable. In Java, the if statement is used a lot (too much IMO but i don't think we should fight to change that) so it may make sense to be able to reuse an existing local variables. Here is an example with a special keyword _ASSIGN_ indicating that the existing local variable is assigned. var box = ... var x = 0; if (...) { _let_ Box(_ASSIGN_ x) = box; } or with an if/else var minmax = ... int value; if (...) { _let_ MinMax(_ASSIGN_ value, _) = minmax; } else { _let_ MinMax(_, _ASSIGN_ value) = minmax; } > ## Possible extensions > There are a number of ways we can extend `let` statements to make it more > useful; these could be added at the same time, or at a later time. > #### What about partial patterns? > There are times when it may be more convenient to use a `let` even when we know > the pattern is partial. In most cases, we'll still want to complete abruptly if > the > pattern doesn't match, but we may want to control what happens. For example: > ``` > let Optional.of(var contents) = optName > else throw new IllegalArgumentException("name is empty"); > ``` > Having an `else` clause allows us to use a partial pattern, which receives > control if the pattern does not match. The `else` clause could choose to throw, > but could also choose to `break` or `return` to an enclosing context, or even > recover by assigning the bindings. I don't like that because in that case "let pattern else ..." is equivalent of "if instanceof pattern else ... " with the former being expression oriented and the later statement oriented. As i said earlier, i don't think we should fight the fact that Java is statement oriented by adding expression oriented variations of existing constructs. > #### What about recovery? > If we're supporting partial patterns, we might want to allow the `else` clause > to provide defaults for the bindings, rather than throw. 
We can make the > bindings of the > pattern in the `let` statement be in scope, but definitely unassigned, in the > `else` clause, which means the `else` clause could initialize them and continue: > ``` > let Optional.of(var contents) = optName > else contents = "Unnamed"; > ``` > This allows us to continue, while preserving the invariant that when the `let` > statement completes normally, all bindings are DA. It fails if the "then" part or the "else" part need more than one instruction. Again, it's statement vs expression. > #### What about guards > If we're supporting partial patterns, we also need to consider the case where > the pattern matches but we still want to reject the content. This could of > course be handled by testing and throwing after the `let` completes, but if we > want to recover via the `else` clause, we might want to handle this directly. > We've already introduced a means to do this for switch cases -- a `when` clause > -- and this works equally well in `let`: > ``` > let Point(var x, var y) = aPoint > when x >= 0 && y >= 0 > else { x = y = 0; } > ``` It can be re-written using an if instanceof, so i do not think we need a special syntax int x, y; if (!(aPoint instanceof Point(_ASSIGN_ x, _ASSIGN_ y) && x >= 0 && y >= 0)) { x = 0; y = 0; } > #### What about expressions? > The name `let` conjures up the image of `let` expressions in functional > languages, where we introduce a local binding for use in the scope of a single > expression. This is not an accident! It is quite useful when the same expression > is going to be used multiple times, or when we want to limit the scope of a > local > to a specific computation. > It is a short hop to `let` being usable as an expression, by providing an `in` > clause: > ``` > String lastThree = > let int len = s.length() > in s.substring(len-3, len); > ``` > The scope of the binding `len` is the expression to the right of the `in`, > nothing else. (As with `switch` expressions, the expression to the right > of the `in` could be a block with a `yield` statement.) > It is a further short hop to permitting _multiple_ matches in a single `let` > statement or expression: > ``` > int area = let Point(var x0, var y0) = lowerLeft, > Point(var x1, var y1) = upperRight > in (x1-x0) * (y1-y0); > ``` "Let ... in" is useful but i don't think it's related to the current proposal, for me it's orthogonal. We can introduce "let ... in" independently to the pattern assignment idea, and if the pattern assignment is already in the language, then "let ... in" will support it. > #### What about parameter bindings? > Destructuring with total patterns is also useful for method and lambda > parameters. For a lambda that accepts a `Point`, we could include the pattern > in the lambda parameter list, and the bindings would automatically be in scope > in the body. Instead of: > ``` > areaFn = (Point lowerLeft, Point upperRight) > -> (upperRight.x() - lowerLeft.x()) * (upperRight.y() - lowerLeft.y()); > ``` > we could do the destructuring in the lambda header: > ``` > areaFn = (let Point(var x0, var y0) lowerLeft, > let Point(var x1, var y1) upperRight) > -> (x1-x0) * (y1-y0); > ``` > This allows us to treat the derived values to be "parameters" of the lambda. We > would enforce totality at compile time, and dynamically reject remainder as we > do with `switch` and `let` statements. > I think this one may be a bridge too far, though. The method header should > probably be reserved for API declaration, and destructuring only serves the > implementation. 
I think I'd prefer to move the `let` into the body of the > method or lambda. Technically, a lambda is implementation not API because a lambda is using the functional interface as API, but as you said it feels like a bridge too far at least for now. When Valhalla will land people may use value records more and we may have to introduce this kind of syntax. R?mi From brian.goetz at oracle.com Mon Mar 28 15:41:10 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 28 Mar 2022 11:41:10 -0400 Subject: [External] : Re: Pattern assignment In-Reply-To: <847647104.2746202.1648480810828.JavaMail.zimbra@u-pem.fr> References: <7551e40d-558b-a4a3-b493-623a421e2012@oracle.com> <847647104.2746202.1648480810828.JavaMail.zimbra@u-pem.fr> Message-ID: > > There are another different between assignment and _let_, a _let_ > creates new fresh local variables (binding) while assignment is able > to reuse an existing local variable. Correct, the more precise analogy is not to _assignment_, but to _local variable declaration with initialization_ (whose semantics are derived from assignment.) > In Java, the if statement is used a lot (too much IMO but i don't > think we should fight to change that) so it may make sense to be able > to reuse an existing local variables. Yes, this has come up before.? I agree that there are cases where we might want this (there's one distinguished case where we almost cannot avoid this), but in general, I am pretty reluctant to go there -- I think this is incremental complexity (and encouragement of more mutability) with not enough commensurate benefit. > > > > ## Possible extensions > > There are a number of ways we can extend `let` statements to make > it more > useful; these could be added at the same time, or at a later time. > > #### What about partial patterns? > > There are times when it may be more convenient to use a `let` even > when we know > the pattern is partial.? In most cases, we'll still want to > complete abruptly if the > pattern doesn't match, but we may want to control what happens.? > For example: > > ``` > let Optional.of(var contents) = optName > else throw new IllegalArgumentException("name is empty"); > ``` > > Having an `else` clause allows us to use a partial pattern, which > receives > control if the pattern does not match.? The `else` clause could > choose to throw, > but could also choose to `break` or `return` to an enclosing > context, or even > recover by assigning the bindings. > > > I don't like that because in that case "let pattern else ..." is > equivalent of "if instanceof pattern else ... " with the former being > expression oriented and the later statement oriented. > As i said earlier, i don't think we should fight the fact that Java is > statement oriented by adding expression oriented variations of > existing constructs. We haven't talked about let expressions yet; this is still a statement. It's a fair point to say that the above example could be rewritten as an if-else, and when the else throws unconditionally, we still get the same scoping.? Or that it can be rewritten as ??? if (!(pattern match)) ??????? throw blah On the other hand, people don't particularly like having to invert the match like this just to get the scoping they want. In any case, the real value of the else block is where you want to continue (and merge the control flow) with default values of the bindings set in the else clause (next section).? Dropping "else" makes this extremely messy.? And once you have else, the rest comes for the ride. 
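To make the "messy" claim concrete, here is what that bind-or-default shape looks like in today's Java, with no new syntax (the Optional-based example is mine, purely for illustration): a conditional, a mutable local, and the merge done by hand, which is exactly the ceremony the `else` clause is meant to remove.

```
import java.util.Optional;

class Greeter {
    static String greet(Optional<String> optName) {
        // Hand-written equivalent of the proposed:
        //    let Optional.of(var contents) = optName
        //    else contents = "Unnamed";
        String contents;
        if (optName.isPresent()) {
            contents = optName.get();
        } else {
            contents = "Unnamed";      // the recovery path the else clause would express
        }
        return "Hello, " + contents;   // control flow merges here; contents is definitely assigned
    }
}
```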
> > > #### What about recovery? > > If we're supporting partial patterns, we might want to allow the > `else` clause > to provide defaults for the bindings, rather than throw.? We can > make the bindings of the > pattern in the `let` statement be in scope, but definitely > unassigned, in the > `else` clause, which means the `else` clause could initialize them > and continue: > > ``` > let Optional.of(var contents) = optName > else contents = "Unnamed"; > ``` > > This allows us to continue, while preserving the invariant that > when the `let` > statement completes normally, all bindings are DA. > > > It fails if the "then" part or the "else" part need more than one > instruction. > Again, it's statement vs expression. No, it's still a statement.? I don't know where you're getting this "statement vs expression" thing from? > > > #### What about guards > > If we're supporting partial patterns, we also need to consider the > case where > the pattern matches but we still want to reject the content.? This > could of > course be handled by testing and throwing after the `let` > completes, but if we > want to recover via the `else` clause, we might want to handle > this directly. > We've already introduced a means to do this for switch cases -- a > `when` clause > -- and this works equally well in `let`: > > ``` > let Point(var x, var y) = aPoint > when x >= 0 && y >= 0 > else { x = y = 0; } > ``` > > > It can be re-written using an if instanceof, so i do not think we need > a special syntax > > ? int x, y; > ? if (!(aPoint instanceof Point(_ASSIGN_ x, _ASSIGN_ y) && x >= 0 && y > >= 0)) { > ?? x = 0; > ?? y = 0; > ? } All let statements can be rewritten as instanceof.? Are you arguing that the whole idea is silly? > > "Let ... in" is useful but i don't think it's related to the current > proposal, for me it's orthogonal. We can introduce "let ... in" > independently to the pattern assignment idea, > and if the pattern assignment is already in the language, then "let > ... in" will support it. Yes and no.? You are correct that we could do either or both independently.? But it's not my job just to design each feature in a locally optimal way; it's my job to ensure that the features we design will fit together when they meet up in the future.?? The fact that the construct generalizes in this way is an important part of the design even if we don't plan to do this part now. From forax at univ-mlv.fr Mon Mar 28 22:09:07 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 29 Mar 2022 00:09:07 +0200 (CEST) Subject: [External] : Re: Pattern assignment In-Reply-To: References: <7551e40d-558b-a4a3-b493-623a421e2012@oracle.com> <847647104.2746202.1648480810828.JavaMail.zimbra@u-pem.fr> Message-ID: <1502568993.2904776.1648505347460.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "amber-spec-experts" > Sent: Monday, March 28, 2022 5:41:10 PM > Subject: Re: [External] : Re: Pattern assignment >> There are another different between assignment and _let_, a _let_ creates new >> fresh local variables (binding) while assignment is able to reuse an existing >> local variable. > Correct, the more precise analogy is not to _assignment_, but to _local variable > declaration with initialization_ (whose semantics are derived from assignment.) >> In Java, the if statement is used a lot (too much IMO but i don't think we >> should fight to change that) so it may make sense to be able to reuse an >> existing local variables. > Yes, this has come up before. 
I agree that there are cases where we might want > this (there's one distinguished case where we almost cannot avoid this), but in > general, I am pretty reluctant to go there -- I think this is incremental > complexity (and encouragement of more mutability) with not enough commensurate > benefit. My general point is that it's less complex to consider that the semantics should be an _assignment pattern_ instead a _local variable declarations with initialization semantics_ if most (not the let ... in) of semantics variants you are proposing can be express as combinations of assignments + if/else. And the "encouragement of more mutability" is a false dichotomy argument because you are conflating the mutation of objects with the mutation of local variables, mutation of objects are visible from the outside (from the user POV) which make those objects harder to debug, mutation of local variables are not visible from the outside, so those are very different beasts but you already know that. >>> ## Possible extensions >>> There are a number of ways we can extend `let` statements to make it more >>> useful; these could be added at the same time, or at a later time. >>> #### What about partial patterns? >>> There are times when it may be more convenient to use a `let` even when we know >>> the pattern is partial. In most cases, we'll still want to complete abruptly if >>> the >>> pattern doesn't match, but we may want to control what happens. For example: >>> ``` >>> let Optional.of(var contents) = optName >>> else throw new IllegalArgumentException("name is empty"); >>> ``` >>> Having an `else` clause allows us to use a partial pattern, which receives >>> control if the pattern does not match. The `else` clause could choose to throw, >>> but could also choose to `break` or `return` to an enclosing context, or even >>> recover by assigning the bindings. >> I don't like that because in that case "let pattern else ..." is equivalent of >> "if instanceof pattern else ... " with the former being expression oriented and >> the later statement oriented. >> As i said earlier, i don't think we should fight the fact that Java is statement >> oriented by adding expression oriented variations of existing constructs. > We haven't talked about let expressions yet; this is still a statement. Okay, i did not expect that. For me let was an expression because it's usually the "raison d'?tre" of let, being an assignment expression. Reusing 'let' here is really confusing. > It's a fair point to say that the above example could be rewritten as an > if-else, and when the else throws unconditionally, we still get the same > scoping. Or that it can be rewritten as > if (!(pattern match)) > throw blah > On the other hand, people don't particularly like having to invert the match > like this just to get the scoping they want. If you really want > In any case, the real value of the else block is where you want to continue (and > merge the control flow) with default values of the bindings set in the else > clause (next section). Dropping "else" makes this extremely messy. And once you > have else, the rest comes for the ride. But your proposal do the opposite, you are not dropping the "else" but you are dropping the "then" which is also makes thing messy if you want to assign + call a method. One advantage of the "if" is that you can easily add more instructions inside the then branch or the else branch. With a let ... else, users will have to jungle between if instanceof/ else and let ... 
else if they add/remove instruction in the then branch. >>> #### What about recovery? >>> If we're supporting partial patterns, we might want to allow the `else` clause >>> to provide defaults for the bindings, rather than throw. We can make the >>> bindings of the >>> pattern in the `let` statement be in scope, but definitely unassigned, in the >>> `else` clause, which means the `else` clause could initialize them and continue: >>> ``` >>> let Optional.of(var contents) = optName >>> else contents = "Unnamed"; >>> ``` >>> This allows us to continue, while preserving the invariant that when the `let` >>> statement completes normally, all bindings are DA. >> It fails if the "then" part or the "else" part need more than one instruction. >> Again, it's statement vs expression. > No, it's still a statement. I don't know where you're getting this "statement vs > expression" thing from? see above, mostly because i'm cursed with knowledge, there is already a 'let' node in the javac AST. >>> #### What about guards >>> If we're supporting partial patterns, we also need to consider the case where >>> the pattern matches but we still want to reject the content. This could of >>> course be handled by testing and throwing after the `let` completes, but if we >>> want to recover via the `else` clause, we might want to handle this directly. >>> We've already introduced a means to do this for switch cases -- a `when` clause >>> -- and this works equally well in `let`: >>> ``` >>> let Point(var x, var y) = aPoint >>> when x >= 0 && y >= 0 >>> else { x = y = 0; } >>> ``` >> It can be re-written using an if instanceof, so i do not think we need a special >> syntax >> int x, y; >> if (!(aPoint instanceof Point(_ASSIGN_ x, _ASSIGN_ y) && x >= 0 && y >= 0)) { >> x = 0; >> y = 0; >> } By the way, there is a simpler way to write the same thing, Point(var x, var y) = aPoint; if (x < 0 || y < 0) { x = y = 0; } > All let statements can be rewritten as instanceof. Are you arguing that the > whole idea is silly? I'm arguing that that the "if" in Java is a statement that is versatile enough so there is no need to introduce new constructs that are like a "if" but with no way to add another operation than the pattern in the then part. If really we have a issue with !instanceof, which i agree is not very readable, i would prefer to keep the "if" and introduce something like _not_match_. But it's too much syntactic sugar and not enough semantics for my taste. >> "Let ... in" is useful but i don't think it's related to the current proposal, >> for me it's orthogonal. We can introduce "let ... in" independently to the >> pattern assignment idea, >> and if the pattern assignment is already in the language, then "let ... in" will >> support it. > Yes and no. You are correct that we could do either or both independently. But > it's not my job just to design each feature in a locally optimal way; it's my > job to ensure that the features we design will fit together when they meet up > in the future. The fact that the construct generalizes in this way is an > important part of the design even if we don't plan to do this part now. The generalization part seems self prophetic to me. If we name both features 'let', then it's a generalization. And designing let ... in to be both useful by itself and easy composable with pattern assignment so we get synergy does not seem to be a problem here. 
Rémi

From brian.goetz at oracle.com Tue Mar 29 21:01:18 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 29 Mar 2022 17:01:18 -0400
Subject: Declared patterns -- translation and reflection
Message-ID: 

Time to take a peek ahead at _declared patterns_. Declared patterns come in three varieties -- deconstruction patterns, static patterns, and instance patterns (corresponding to constructors, static methods, and instance methods.) I'm going to start with deconstruction patterns, but the basic game is the same for all three.

Ignoring the trivial details, a deconstruction pattern looks like a "constructor in reverse":

```{.java}
class Point {
    int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    deconstructor(int x, int y) {
        x = this.x;
        y = this.y;
    }
}
```

Deconstruction patterns share the weird behaviors that constructors have, in that they are instance members, but are not inherited, and that, rather than having names, they are accessed via the class name.

Deconstruction patterns differ from static/instance patterns in that they are by definition total; they cannot fail to match. (This is a somewhat arbitrary simplification in the object model, but a reasonable one.) They also cannot have any input parameters, other than the receiver.

Patterns differ from their ctor/method counterparts in that they have what appear to be _two_ argument lists; a parameter list (like ctors and methods), and a _binding_ list. The parameter list is often empty (with the receiver as the match target). The binding list can be thought of as a "conditional multiple return". That they may return multiple values (and, for partial patterns, can return no values at all when they don't match) presents a challenge for translation to classfiles, and for the reflection model.

#### Translation to methods

Patterns contain imperative code, so surely we want to translate them to methods in some way. The pattern input parameters map cleanly to method parameters. The pattern bindings need to be tunneled, somehow, through the method return (or some other mechanism). For our deconstructor, we might translate as:

    PatternCarrier <name>()

(where the method applies the pattern, and PatternCarrier wraps and provides access to the bindings) or

    PatternObject <name>()

(where PatternObject provides indirection to behavior to invoke the pattern, which in turn returns the carrier.)

With either of these approaches, though, the pattern name is a problem, because patterns can be overloaded on their _bindings_, but both of these return types are insensitive to bindings.

It is useful to characterize the "shape" of a pattern with a MethodType, where the parameters of the MethodType are the binding types. (The return type is less constrained, but it is sometimes useful to use the return type of the MethodType for the required type of the pattern.) Call this the "descriptor" of the pattern.

If we do this, we can use some name mangling to encode the descriptor in the method name:

    PatternCarrier name$mangle()

The mangling has to be stable across compilations with respect to any source- and binary-compatible changes to the pattern declaration. One mangling that works quite well is to use the "symbolic-freedom encoding" of the erasure of the pattern descriptor. Because the erasure of the descriptor is exactly as stable as any other method signature derived from source declarations, it will have the desired binary compatibility properties, overriding will work as expected, etc.
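As a purely illustrative sketch (the mangling scheme is not fixed and the carrier type here is a placeholder, so none of these names are real), the Point deconstructor above might surface in a classfile as something shaped like this:

```{.java}
// Illustration only: a stand-in for whatever opaque carrier type is chosen.
record PatternCarrier(Object... bindings) { }

class Point {
    final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    // Sketch of the translated deconstruction pattern: the real method name
    // would be a stable mangling of the pattern descriptor (int, int); the
    // "deconstructor$II" name below is made up. The body runs the pattern code
    // and bundles the bindings, positionally, into the carrier.
    PatternCarrier deconstructor$II() {
        return new PatternCarrier(this.x, this.y);
    }
}
```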
#### Return value

In an earlier design, we used a pattern object (which was a bundle of method handles) as the return value of the pattern. This enabled clients to invoke these via condy and bind method handles into the constant pool for deconstruction and static patterns.

Either way, we make use of some sort of carrier object to carry the bindings from the pattern to the client; either we return the carrier from the pattern method, or there is a method on the pattern object that we invoke to get a carrier. We have a few preferences about the carrier; we'd like to be able to late-bind to the actual implementation (i.e., we don't want to freeze the name of a carrier class in the method descriptor), and at least for records, we'd like to let the record instance itself be the carrier (since it is immutable and we can just invoke the accessors to get the bindings.)

#### Carriers

As part of the work on template strings, Jim has put back some code that was originally written for the purpose of translating patterns, called "carriers". There are methods / bootstraps that take a MethodType and return method handles to (a) encode values of those types into an opaque carrier object and (b) pull individual values out of a carrier. This means that the choice of carrier object can be deferred to runtime, as long as both the bundling and unbundling method handles agree on the carrier form.

The choice of carrier is largely a footprint/specificity tradeoff. One could imagine a carrier class per shape, or a single carrier class that wraps an Object[], or caching some number of common shapes (three ints and two refs). This sort of tuning should be separate from the protocol encoded in the bytecode of the pattern method and its clients.

The pattern matching runtime will provide some condy bootstraps which wrap the Carriers behavior.

Since at least some patterns are conditional, we have to have a way to encode failure into the protocol. For a partial pattern, we can use a B2 carrier and use null to encode failure to match; for a total pattern, we can use a B3 carrier.

#### Proposed encoding

Earlier explorations did a lot of work to preserve the optimization that a match target can be its own carrier. But further analysis reveals that the cost of doing so for other than records is pretty substantial, and works against the model of a pattern declaration being an imperative body of code that runs at match time. So for record patterns, we can "inline" them by using `instanceof` as the applicability test and accessors for extraction, and for all other patterns, go through the carrier runtime.

This allows us to encode pattern methods as

    Object name$mangle(ARGS)

and have the pattern method do the match and return a carrier (or null), using the carrier object that the carrier runtime associates with the pattern descriptor. And clients can take apart the result again using the extraction logic that the carrier runtime associates with the pattern descriptor.

This also means that instance patterns "just work" because virtual dispatch selects the right implementation for us automatically, and all implementations that are overrides will also implicitly agree on the encoding.

Because patterns are methods, we can take advantage of all the affordances of methods. We can use access bits to control accessibility in the obvious way; we can use the attributes that carry annotations, method parameter metadata, and generics signatures to carry information about the pattern declaration and its parameters.
What's missing is a place to put metadata for the *bindings*, and to record the fact that this is a pattern implementation and not an ordinary method. So, we add the following attribute on pattern methods:

    Pattern {
        u2 attr_name;
        u4 attr_length;
        u2 patternFlags;  // bitmask
        u2 patternName;   // index of UTF8 constant
        u2 patternDescr;  // index of MethodType (or alternately UTF8) constant
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

This says that "this method is a pattern", reifies the name of the pattern (patternName), reifies the pattern descriptor (patternDescr) which encodes the types of the bindings as a method descriptor or MethodType, and has attributes which can carry annotations, parameter metadata, and signature metadata for the bindings. The existing attributes (e.g. Signature, ParameterNames, RVAA) can be reused as is, with the interpretation that this is the signature (or names, or annos) of the *bindings*, not the input parameters. Flags can carry things like "deconstructor pattern" or "partial pattern" as needed.

## Reflection

We already have a sensible base class in the reflection library for reflecting patterns: Executable. All of the methods on Executable make sense for patterns, including Object as the return type. If the pattern is reflectively invoked, it will return null (for no match) or an Object[]; this Object[] can be thought of as the boxing of the carrier. Since the method return type is Object, this is an entirely reasonable interpretation.

We need some additional methods to describe the bindings, so we would have a subtype of Executable for Pattern, with methods like getBindings(), getAnnotatedBindings(), getGenericBindings(), isDeconstructor(), isPartial(), etc.

## Summary

This design borrows from previous rounds, but makes a number of simplifications.

- The bindings of a pattern are captured in a MethodType, called the _pattern descriptor_. The parameters of the pattern descriptor are the types of the bindings; the return type is the minimal type that will match the pattern (but is not as important as the bindings.)
- Patterns are translated as methods whose names are derived, deterministically, from the name of the pattern and the erasure of the pattern descriptor. These are called pattern methods. Pattern methods take as parameters the input parameters of the pattern, and return Object.
- The returned object is an opaque carrier. Null means the pattern didn't match. A non-null value is the carrier type (from the carrier runtime) which is derived from the pattern descriptor.
- Pattern methods are not directly invocable from the source language; they are invoked indirectly through pattern matching, or reflection.
- Generated code invokes the pattern method and interprets the returned value according to the protocol, using MHs from the pattern runtime to access the bindings.
- Pattern methods have a Pattern attribute, which captures information about the pattern as a whole (is a total/partial, a deconstructor, etc) and parameter-related attributes which describe the bindings.
- Patterns are reflected through a new subtype of Executable, which exposes new methods to reflect over bindings.
- When invoking a pattern method reflectively, the carrier is boxed to an Object[].
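To illustrate the client side of this protocol (again just a sketch; the names below are invented stand-ins rather than the real carrier-runtime API, which hands out method handles instead of a concrete carrier class), a match against a deconstruction pattern could conceptually be lowered like this:

```{.java}
// Illustration only: invented stand-ins for the carrier and the pattern method.
record Carrier(Object... values) { }

class Rect {
    final int w, h;
    Rect(int w, int h) { this.w = w; this.h = h; }

    // stand-in for the translated (total) deconstruction pattern method
    Object deconstructor$II() { return new Carrier(w, h); }
}

class Client {
    // Conceptual lowering of:  if (o instanceof Rect(var w, var h)) { ... }
    static String describe(Object o) {
        if (o instanceof Rect r) {                    // applicability test
            Object result = r.deconstructor$II();     // run the pattern body
            if (result instanceof Carrier c) {        // null would signal "no match" for a partial pattern
                int w = (int) c.values()[0];          // extract the bindings positionally
                int h = (int) c.values()[1];          // (real generated code would use MHs from the carrier runtime)
                return "rect " + w + " x " + h;
            }
        }
        return "no match";
    }
}
```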
From forax at univ-mlv.fr Tue Mar 29 22:19:27 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 30 Mar 2022 00:19:27 +0200 (CEST)
Subject: Declared patterns -- translation and reflection
In-Reply-To: 
References: 
Message-ID: <104521199.3738715.1648592366308.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "amber-spec-experts"
> Sent: Tuesday, March 29, 2022 11:01:18 PM
> Subject: Declared patterns -- translation and reflection
> Time to take a peek ahead at _declared patterns_. Declared patterns come in three varieties -- deconstruction patterns, static patterns, and instance patterns (corresponding to constructors, static methods, and instance methods.) I'm going to start with deconstruction patterns, but the basic game is the same for all three.

I mostly agree with everything said, apart from the syntax of a deconstructor (see my next message about how small things can be improved).

I have several problems with the proposed syntax for a deconstructor. I can see the appeal of having code very similar to a constructor, but it's a trap: a constructor and a deconstructor do not have the same semantics. A constructor initializes fields (which have names), while a deconstructor (or a pattern method) initializes bindings, which do not have names at that point yet.

1/ Conceptually there is a mismatch: the syntax introduces names for the bindings, but they have no names at that point; bindings only have names AFTER the pattern matching succeeds.

2/ Sending the value of a binding by name is alien to Java. In Java, values are passed by position.

3/ The conceptual mismatch also exists at runtime: you need to permute the values of the bindings before creating the carrier, because a carrier takes the values of the bindings by position while the code takes the values of the bindings by name (you need the equivalent of MethodHandles.permuteArguments(); otherwise you will see the re-organisation of the code if there are side effects).

Let's try to come up with a syntax. As I said, bindings have no names at that point, so the deconstructor should declare the bindings as (int, int) and not (int x, int y), so a syntax like

    _deconstructor_ (int, int) {
        _send_bindings_(this.x, this.y);
    }

Here the syntax shows that the values of the bindings are assigned following the position of the expressions, as usual in Java. We can discuss whether _send_bindings_ should be "return" or another keyword, and whether the binding types should be declared before or after _deconstructor_.

For example, if you want to maintain a kind of symmetry with the constructor, we can reuse the name of the class instead of _deconstructor_ and move the binding types in front of the name of the class, to show that the bindings move from the class to the pattern matching in the same direction as the return type of a method. Something like this:

    (int, int) Point {
        _send_bindings_(this.x, this.y);
    }

To summarize, the proposed syntax does not convey the underlying semantics of the bindings' initialization and makes things more confusing than they should be.
> Ignoring the trivial details, a deconstruction pattern looks like a "constructor > in reverse": > ```{.java} > class Point { > int x, y; > Point(int x, int y) { > this.x = x; > this.y = y; > } > deconstructor(int x, int y) { > x = this.x; > y = this.y; > } > } > ``` R?mi From forax at univ-mlv.fr Tue Mar 29 22:36:53 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 30 Mar 2022 00:36:53 +0200 (CEST) Subject: Declared patterns -- translation and reflection In-Reply-To: References: Message-ID: <691219619.3739631.1648593412907.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "amber-spec-experts" > Sent: Tuesday, March 29, 2022 11:01:18 PM > Subject: Declared patterns -- translation and reflection > Time to take a peek ahead at _declared patterns_. Declared patterns come in > three varieties -- deconstruction patterns, static patterns, and instance > patterns (corresponding to constructors, static methods, and instance methods.) > I'm going to start with deconstruction patterns, but the basic game is the same > for all three. > Ignoring the trivial details, a deconstruction pattern looks like a "constructor > in reverse": > ```{.java} > class Point { > int x, y; > Point(int x, int y) { > this.x = x; > this.y = y; > } [....] > } > ``` > Deconstruction patterns share the weird behaviors that constructors have in that > they are instance members, but are not inherited, and that rather having names, > they are accessed via the class name. > Deconstruction patterns differ from static/instance patterns in that they are by > definition total; they cannot fail to match. (This is a somewhat arbitrary > simplification in the object model, but a reasonable one.) They also cannot > have any input parameters, other than the receiver. > Patterns differ from their ctor/method counterparts in that they have what > appear to be _two_ argument lists; a parameter list (like ctors and methods), > and a _binding_ list. The parameter list is often empty (with the receiver as > the match target). The binding list can be thought of as a "conditional > multiple return". That they may return multiple values (and, for partial > patterns, can return no values at all when they don't match) presents a > challenge for translation to classfiles, and for the reflection model. > #### Translation to methods > Patterns contain imperative code, so surely we want to translate them to methods > in some way. The pattern input parameters map cleanly to method parameters. > The pattern bindings need to tunneled, somehow, through the method return (or > some other mechanism). For our deconstructor, we might translate as: > PatternCarrier () > (where the method applies the pattern, and PatternCarrier wraps and provides > access to the bindings) or > PatternObject () > (where PatternObject provides indirection to behavior to invoke the pattern, > which in turn returns the carrier.) > With either of these approaches, though, the pattern name is a problem, because > patterns can be overloaded on their _bindings_, but both of these return types > are insensitive to bindings. > It is useful to characterize the "shape" of a pattern with a MethodType, where > the parameters of the MethodType are the binding types. (The return type is > less constrained, but it is sometimes useful to use the return type of the > MethodType for the required type of the pattern.) Call this the "descriptor" of > the pattern. 
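[Editor's note: a concrete illustration of the descriptor just described, as a sketch only, reusing the Point class from the quoted example: the binding types become the parameter types of a MethodType, and the required (match target) type can be carried in its return type.]

```java
import java.lang.invoke.MethodType;

class DescriptorSketch {
    // Pattern descriptor for the Point deconstructor: bindings (int, int),
    // required type Point. Assumes the Point class from the example above.
    static final MethodType POINT_DESCRIPTOR =
            MethodType.methodType(Point.class, int.class, int.class);
}
```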
> If we do this, we can use some name mangling to encode the descriptor in the > method name: > PatternCarrier name$mangle() > The mangling has to be stable across compilations with respect to any source- > and binary-compatible changes to the pattern declaration. One mangling that > works quite well is to use the "symbolic-freedom encoding" of the erasure of > the pattern descriptor. Because the erasure of the descriptor is exactly as > stable as any other method signature derived from source declarations, it will > have the desired binary compatibility properties, overriding will work as > expected, etc. I think we need a least to use a special name like the same way we have . I agree that we also need to encode the method type descriptor (the carrier type) into the name, so the name of the method in the classfile should be or (or perhaps ofr the pattern methods). > #### Return value > In an earlier design, we used a pattern object (which was a bundle of method > handles) as the return value of the pattern. This enabled clients to invoke > these via condy and bind method handles into the constant pool for > deconstruction and static patterns. > Either way, we make use of some sort of carrier object to carry the bindings > from the pattern to the client; either we return the carrier from the pattern > method, or there is a method on the pattern object that we invoke to get a > carrier. We have a few preferences about the carrier; we'd like to be able to > late-bind to the actual implementation (i.e., we don't want to freeze the name > of a carrier class in the method descriptor), and at least for records, we'd > like to let the record instance itself be the carrier (since it is immutable > and we can just invoke the accessors to get the bindings.) So the return type is either Object (too hide the type of the carrier) or a lambda that returns an Object (PatternObject or PatternCarrier acting like a glorified lambda). > #### Carriers > As part of the work on template strings, Jim has put back some code that was > originally written for the purpose of translating patterns, called "carriers". > There are methods / bootstraps that take a MethodType and return method handles > to (a) encode values of those types into an opaque carrier object and (b) pull > individual values out of a carrier. This means that the choice of carrier > object can be deferred to runtime, as long as both the bundling and unbundling > methods handles agree on the carrier form. > The choice of carrier is largely a footprint/specificity tradeoff. One could > imagine a carrier class per shape, or a single carrier class that wraps an > Object[], or caching some number of common shapes (three ints and two refs). > This sort of tuning should be separate from the protocol encoded in the > bytecode of the pattern method and its clients. > The pattern matching runtime will provide some condy bootstraps which wrap the > Carriers behavior. > Since at least some patterns are conditional, we have to have a way to encode > failure into the protocol. For a partial pattern, we can use a B2 carrier and > use null to encode failure to match; for a total pattern, we can use a B3 > carrier. > #### Proposed encoding > Earlier explorations did a lot of work to preserve the optimization that a match > target can be its own carrier. 
But further analysis reveals that the cost of > doing so for other than records is pretty substantial and works against the > model of a pattern declaration being an imperative body of code that runs at > match time. So for record patterns, we can "inline" them by using `instanceof` > as the applicability test and accessors for extraction, and for all other > patterns, go through the carrier runtime. > This allows us to encode pattern methods as > Object name$mangle(ARGS) > and have the pattern method do the match and return a carrier (or null), using > the carrier object that the carrier runtime associates with the pattern > descriptor. And clients can take apart the result again using the extraction > logic that the carrier runtime associates with the pattern descriptor. > This also means that instance patterns "just work" because virtual dispatch > selects the right implementation for us automatically, and all implementations > that can be overrides will also implicitly agree on the encoding. > Because patterns are methods, we can take advantage of all the affordances of > methods. We can use access bits to control accessibility in the obvious way; we > can use the attributes that carry annotations, method parameter metadata, and > generics signatures to carry information about the pattern declaration and its > parameters. What's missing is a place to put metadata for the *bindings*, and > to record the fact that this is a pattern implementation and not an ordinary > method. So, we add the following attribute on pattern methods: > Pattern { > u2 attr_name; > u4 attr_length; > u2 patternFlags; // bitmask > u2 patternName; // index of UTF8 constant > u2 patternDescr; // index of MethodType (or alternately UTF8) constant > u2 attributes_count; > attribute_info attributes[attributes_count]; > } > This says that "this method is a pattern", reifies the name of the pattern > (patternName), reifies the pattern descriptor (patternDescr) which encodes the > types of the bindings as a method descriptor or MethodType, and has attributes > which can carry annotations, parameter metadata, and signature metadata for the > bindings. The existing attributes (e.g. Signature, ParameterNames, RVAA) can be > reused as is, with the interpretation that this is the signature (or names, or > annos) of the *bindings*, not the input parameters. Flags can carry things like > "deconstructor pattern" or "partial pattern" as needed. >From the classfile POV, a constructor is a method with a funny name in between brackets, i think deconstructor and pattern methods should work the same way. Unlike a constructor, we need a way to attach the carrier type (and perhaps the pattern name) on the side, so an attribute on the pattern method seems the right choice. > ## Reflection > We already have a sensible base class in the reflection library for reflecting > patterns: Executable. All of the methods on Executable make sense for patterns, > including Object as the return type. If the pattern is reflectively invoked, it > will return null (for no match) or an Object[]; this Object[] can be thought of > as the boxing of the carrier. Since the method return type is Object, this is > an entirely reasonable interpretation. > We need some additional methods to describe the bindings, so we would have a > subtype of Executable for Pattern, with methods like getBindings(), > getAnnotatedBindings(), getGenericBindings(), isDeconstructor(), isPartial(), > etc. I agree if getBindings() return a Class[]. 
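[Editor's note: for illustration, a sketch of the reflective surface being discussed. Every name here is hypothetical; in the actual design this would be a JDK-provided subtype of java.lang.reflect.Executable, which user code cannot subclass, so the sketch is a standalone interface.]

```java
// Hypothetical sketch, not a real JDK API: the shape such a reflective type
// might have for declared patterns.
public interface PatternMirror {
    String getPatternName();          // declared name, or a stand-in for a deconstructor
    Class<?>[] getBindings();         // binding types, by position, e.g. {int.class, int.class}
    boolean isDeconstructor();
    boolean isPartial();
    Object[] match(Object receiver, Object... args);  // null = no match, else boxed bindings
}
```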
As i said, apart from the semantics implied by the proposed syntax, the rest of the design is great. R?mi From brian.goetz at oracle.com Wed Mar 30 00:40:46 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 29 Mar 2022 20:40:46 -0400 Subject: [External] : Re: Declared patterns -- translation and reflection In-Reply-To: <104521199.3738715.1648592366308.JavaMail.zimbra@u-pem.fr> References: <104521199.3738715.1648592366308.JavaMail.zimbra@u-pem.fr> Message-ID: <48b2ce50-6905-ec95-6e38-f23a68cbd683@oracle.com> I am disappointed that you took this as an invitation to digress into syntax here, when it should have been blindingly obvious that this was not the time for a syntax discussion.? (And when there is a syntax discussion, which this isn't, we need to cover all the different forms of declared patterns together; trying to design dtor patterns in a vacuum misses a number of considerations.) I'll respond to your other points separately. On 3/29/2022 6:19 PM, Remi Forax wrote: > > > ------------------------------------------------------------------------ > > *From: *"Brian Goetz" > *To: *"amber-spec-experts" > *Sent: *Tuesday, March 29, 2022 11:01:18 PM > *Subject: *Declared patterns -- translation and reflection > > Time to take a peek ahead at _declared patterns_.? Declared > patterns come in three varieties -- deconstruction patterns, > static patterns, and instance patterns (corresponding to > constructors, static methods, and instance methods.)? I'm going to > start with deconstruction patterns, but the basic game is the same > for all three. > > > I mostly agree with everything said apart from the syntax of a > deconstructor > (see my next message about how small things can be improved). > > I have several problems with the proposed syntax for a deconstructor. > ?I can see the appeal of having a code very similar to a constructor > but it's a trap, a constructor and a deconstructor do not have the > same semantics, a constructor initialize fields (which have a name) > while a deconstructor (or a pattern method) initialize bindings which > does not have a name at that point yet. > > 1/ conceptually there is a mismatch, the syntax introduce names for > the bindings, but they have no names at that point, bindings only have > names AFTER the pattern matching succeed. > 2/ sending the value of the binding by name is alien to Java. In Java, > sending values is by the position of the value. > 3/ the conceptual mismatch also exists at runtime, you need to permute > the value of bindings before creating the carrier because a carrier > takes the value of the binding by position while the code will takes > the value of the bindings by name (you need the equivalent of > MethodHandles.permuteArguments() otherwise you will see the > re-organisation of the code if they are side effects). > > Let's try to come with a syntax, > as i said, bindings have no names at that point so the deconstructor > should declare the bindings (int, int) and not (int x, int y), > so a syntax like > > ? _deconstructor_ (int, int) { > ?? _send_bindings_(this.x, this.y); > ? } > > Here the syntax shows that the value of the bindings are assigned > following the position of the expression like usual in Java. > > > We can discuss if _send_bindings_ should be "return" or another > keyword and if the binding types should be declared before or after > _deconstructor_. 
> > By example, if you wan to maintain a kind of symmetry with the > constructor, we can reuse the name of the class instead of > _deconstructor_ and move the binding types in front of the name of the > class to show that the bindings move from the class to the pattern > matching in the same direction like a return type of a method. > Something like this: > ? (int, int) Point { > ?? _send_bindings_(this.x, this.y); > ? } > > To summarize, the proposed syntax does the convey the underlying > semantics of the bindings initialization and make things more > confusing than it should. > > > > Ignoring the trivial details, a deconstruction pattern looks like > a "constructor in reverse": > > ```{.java} > class Point { > ??? int x, y; > > ??? Point(int x, int y) { > ??????? this.x = x; > ??????? this.y = y; > ??? } > > deconstructor(int x, int y) { > ??????? x = this.x; > ??????? y = this.y; > ??? } > } > ``` > > > R?mi > From brian.goetz at oracle.com Wed Mar 30 00:42:38 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 29 Mar 2022 20:42:38 -0400 Subject: [External] : Re: Declared patterns -- translation and reflection In-Reply-To: <104521199.3738715.1648592366308.JavaMail.zimbra@u-pem.fr> References: <104521199.3738715.1648592366308.JavaMail.zimbra@u-pem.fr> Message-ID: > > 1/ conceptually there is a mismatch, the syntax introduce names for > the bindings, but they have no names at that point, bindings only have > names AFTER the pattern matching succeed. I think you have missed the point here.? The names serve the implementation of the pattern, not the interface -- just as parameter names to methods do.?? As you see in the example, these are effectively blank final locals in the body of the pattern, which must be assigned to.? (I'd have pointed this out if this were actually a message on declaring deconstructors, but since the message is on translation and reflection I didn't want to digress.) > 2/ sending the value of the binding by name is alien to Java. In Java, > sending values is by the position of the value. It's not by name.? I don't know where you got this idea. > 3/ the conceptual mismatch also exists at runtime, you need to permute > the value of bindings before creating the carrier because a carrier > takes the value of the binding by position while the code will takes > the value of the bindings by name (you need the equivalent of > MethodHandles.permuteArguments() otherwise you will see the > re-organisation of the code if they are side effects). It's not by name.? I don't know where you got this idea. From brian.goetz at oracle.com Wed Mar 30 00:47:45 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 29 Mar 2022 20:47:45 -0400 Subject: [External] : Re: Declared patterns -- translation and reflection In-Reply-To: <691219619.3739631.1648593412907.JavaMail.zimbra@u-pem.fr> References: <691219619.3739631.1648593412907.JavaMail.zimbra@u-pem.fr> Message-ID: <83d3947d-09cb-1f6a-69b9-3c6ace0d3f18@oracle.com> > > The mangling has to be stable across compilations with respect to > any source- and binary-compatible changes to the pattern > declaration.? One mangling that works quite well is to use the > "symbolic-freedom encoding" of the erasure of the pattern > descriptor.? Because the erasure of the descriptor is exactly as > stable as any other method signature derived from source > declarations, it will have the desired binary compatibility > properties, overriding will work as expected, etc. 
> > > I think we need a least to use a special name like the > same way we have . Yes.? Instance/static patterns will have names, so for them, we'll use the name as declared in the source.? Dtors have no names, just like ctors, so we have to invent something to stand in for that. or similar is fine. > I agree that we also need to encode the method type descriptor (the > carrier type) into the name, so the name of the method in the > classfile should be or (or > perhaps ofr the pattern methods). The key constraint is that the mangled name be stable with respect to compatible changes in the declaration.? The rest is just "classfile syntax." > > > #### Return value > > In an earlier design, we used a pattern object (which was a bundle > of method handles) as the return value of the pattern.? This > enabled clients to invoke these via condy and bind method handles > into the constant pool for deconstruction and static patterns. > > Either way, we make use of some sort of carrier object to carry > the bindings from the pattern to the client; either we return the > carrier from the pattern method, or there is a method on the > pattern object that we invoke to get a carrier.? We have a few > preferences about the carrier; we'd like to be able to late-bind > to the actual implementation (i.e., we don't want to freeze the > name of a carrier class in the method descriptor), and at least > for records, we'd like to let the record instance itself be the > carrier (since it is immutable and we can just invoke the > accessors to get the bindings.) > > > So the return type is either Object (too hide the type of the carrier) > or a lambda that returns an Object (PatternObject or PatternCarrier > acting like a glorified lambda). If the pattern method actually runs the match, then I think Object is right.? If the method returns a constant bundle of method handles, then it can return something like PatternHandle or a matcher lambda.? But I am no longer seeing the benefit in this extra layer of indirection, given how the other translation work has played out. > > > ??? Pattern { > ??????? u2 attr_name; > ??????? u4 attr_length; > ??????? u2 patternFlags; // bitmask > ??????? u2 patternName;? // index of UTF8 constant > ??????? u2 patternDescr; // index of MethodType (or alternately > UTF8) constant > ??????? u2 attributes_count; > ??????? attribute_info attributes[attributes_count]; > ??? } > > This says that "this method is a pattern", reifies the name of the > pattern (patternName), reifies the pattern descriptor > (patternDescr) which encodes the types of the bindings as a method > descriptor or MethodType, and has attributes which can carry > annotations, parameter metadata, and signature metadata for the > bindings.?? The existing attributes (e.g. Signature, > ParameterNames, RVAA) can be reused as is, with the interpretation > that this is the signature (or names, or annos) of the *bindings*, > not the input parameters.? Flags can carry things like > "deconstructor pattern" or "partial pattern" as needed. > > > From the classfile POV, a constructor is a method with a funny name in > between brackets, i think deconstructor and pattern methods should > work the same way. Be careful of extrapolating from one data point.? Dtor are only one form of declared patterns; we also have to accomodate static and instance patterns. 
From forax at univ-mlv.fr Wed Mar 30 09:50:42 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 30 Mar 2022 11:50:42 +0200 (CEST) Subject: [External] : Re: Declared patterns -- translation and reflection In-Reply-To: References: <104521199.3738715.1648592366308.JavaMail.zimbra@u-pem.fr> Message-ID: <1877090243.3967946.1648633842653.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz" > To: "Remi Forax" > Cc: "amber-spec-experts" > Sent: Wednesday, March 30, 2022 2:42:38 AM > Subject: Re: [External] : Re: Declared patterns -- translation and reflection

>> 1/ conceptually there is a mismatch, the syntax introduce names for the >> bindings, but they have no names at that point, bindings only have names AFTER >> the pattern matching succeed.

> I think you have missed the point here. The names serve the implementation of > the pattern, not the interface -- just as parameter names to methods do. As you > see in the example, these are effectively blank final locals in the body of the > pattern, which must be assigned to. (I'd have pointed this out if this were > actually a message on declaring deconstructors, but since the message is on > translation and reflection I didn't want to digress.)

>> 2/ sending the value of the binding by name is alien to Java. In Java, sending >> values is by the position of the value.

> It's not by name. I don't know where you got this idea.

>> 3/ the conceptual mismatch also exists at runtime, you need to permute the value >> of bindings before creating the carrier because a carrier takes the value of >> the binding by position while the code will takes the value of the bindings by >> name (you need the equivalent of MethodHandles.permuteArguments() otherwise you >> will see the re-organisation of the code if they are side effects).

> It's not by name. I don't know where you got this idea.

I think I understand the underlying semantics of the syntax, but I'm not 100% confident. You know that it's not about the syntax per se but about what the syntax tries to communicate to the users.

The problem with the proposed syntax is that it invents a new kind of variable; until now, we had local variables and fields (and array cells, but those have no name). Your binding is a new kind of variable. It means that:
- as a user, I need to learn new rules: can I use the value of a binding to compute the value of another one, can I declare a binding final, can I capture a binding in a lambda/anonymous class? etc.
- the JLS needs to grow artificially to cover all these rules
- the JVMS needs to be updated, with more questions: do we need an attribute LocalBindingTable like we have a LocalVariableTable?
- tools need to be updated: how do debuggers reflect bindings, is it another table in the UI? Can the debugger change the value of a binding while a user steps into the code? etc.

All this pain, because you want to name bindings.

Rémi

From forax at univ-mlv.fr Wed Mar 30 09:59:20 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 30 Mar 2022 11:59:20 +0200 (CEST) Subject: Are binding types covariant ? Was: Declared patterns -- translation and reflection In-Reply-To: References: Message-ID: <586324091.3975257.1648634360229.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz" > To: "amber-spec-experts" > Sent: Tuesday, March 29, 2022 11:01:18 PM > Subject: Declared patterns -- translation and reflection

> Time to take a peek ahead at _declared patterns_.
Declared patterns come in > three varieties -- deconstruction patterns, static patterns, and instance > patterns (corresponding to constructors, static methods, and instance methods.) > I'm going to start with deconstruction patterns, but the basic game is the same > for all three. Once we have pattern methods, we can have an interface that defines a pattern method and a class that implement it, something like interface I { foo() (Object, int); // fake syntax: the first parenthesis are the parameters, the seconds are the binding types } class A implements I { foo() (String, int) { ... } } Do we agree that a binding type can be covariant ? (before saying no, think about generics that's the reason we have return type covariance in Java). In that case, are we are in trouble with the translation strategy ? R?mi From forax at univ-mlv.fr Wed Mar 30 10:11:16 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 30 Mar 2022 12:11:16 +0200 (CEST) Subject: [External] : Re: Declared patterns -- translation and reflection In-Reply-To: <83d3947d-09cb-1f6a-69b9-3c6ace0d3f18@oracle.com> References: <691219619.3739631.1648593412907.JavaMail.zimbra@u-pem.fr> <83d3947d-09cb-1f6a-69b9-3c6ace0d3f18@oracle.com> Message-ID: <1304811121.3989870.1648635076964.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "amber-spec-experts" > Sent: Wednesday, March 30, 2022 2:47:45 AM > Subject: Re: [External] : Re: Declared patterns -- translation and reflection >>> The mangling has to be stable across compilations with respect to any source- >>> and binary-compatible changes to the pattern declaration. One mangling that >>> works quite well is to use the "symbolic-freedom encoding" of the erasure of >>> the pattern descriptor. Because the erasure of the descriptor is exactly as >>> stable as any other method signature derived from source declarations, it will >>> have the desired binary compatibility properties, overriding will work as >>> expected, etc. >> I think we need a least to use a special name like the same way >> we have . > Yes. Instance/static patterns will have names, so for them, we'll use the name > as declared in the source. Dtors have no names, just like ctors, so we have to > invent something to stand in for that. or similar is fine. Pattern methods (static or not) does not have a real name, so '<' and '>' are here to signal that the name is in the Pattern attribute. We do not want people to unmangle the name of pattern methods that why the name is in the attribute, using '<' and '>' signal that idea. As a war story, most of the IDEs try to decode nested class name trying to making sense of the names in between the '$' and tend to throw exceptions when they encounters classes a different patterns that the one generated by javac. But we may not care given that not a lot of people read the bytecode directly. I think John can help us here :) >> I agree that we also need to encode the method type descriptor (the carrier >> type) into the name, so the name of the method in the classfile should be >> or (or perhaps ofr >> the pattern methods). > The key constraint is that the mangled name be stable with respect to compatible > changes in the declaration. The rest is just "classfile syntax." yes. >>> #### Return value >>> In an earlier design, we used a pattern object (which was a bundle of method >>> handles) as the return value of the pattern. 
This enabled clients to invoke >>> these via condy and bind method handles into the constant pool for >>> deconstruction and static patterns. >>> Either way, we make use of some sort of carrier object to carry the bindings >>> from the pattern to the client; either we return the carrier from the pattern >>> method, or there is a method on the pattern object that we invoke to get a >>> carrier. We have a few preferences about the carrier; we'd like to be able to >>> late-bind to the actual implementation (i.e., we don't want to freeze the name >>> of a carrier class in the method descriptor), and at least for records, we'd >>> like to let the record instance itself be the carrier (since it is immutable >>> and we can just invoke the accessors to get the bindings.) >> So the return type is either Object (too hide the type of the carrier) or a >> lambda that returns an Object (PatternObject or PatternCarrier acting like a >> glorified lambda). > If the pattern method actually runs the match, then I think Object is right. If > the method returns a constant bundle of method handles, then it can return > something like PatternHandle or a matcher lambda. But I am no longer seeing the > benefit in this extra layer of indirection, given how the other translation > work has played out. I agree, Object is enough. >>> Pattern { >>> u2 attr_name; >>> u4 attr_length; >>> u2 patternFlags; // bitmask >>> u2 patternName; // index of UTF8 constant >>> u2 patternDescr; // index of MethodType (or alternately UTF8) constant >>> u2 attributes_count; >>> attribute_info attributes[attributes_count]; >>> } >>> This says that "this method is a pattern", reifies the name of the pattern >>> (patternName), reifies the pattern descriptor (patternDescr) which encodes the >>> types of the bindings as a method descriptor or MethodType, and has attributes >>> which can carry annotations, parameter metadata, and signature metadata for the >>> bindings. The existing attributes (e.g. Signature, ParameterNames, RVAA) can be >>> reused as is, with the interpretation that this is the signature (or names, or >>> annos) of the *bindings*, not the input parameters. Flags can carry things like >>> "deconstructor pattern" or "partial pattern" as needed. >> From the classfile POV, a constructor is a method with a funny name in between >> brackets, i think deconstructor and pattern methods should work the same way. > Be careful of extrapolating from one data point. Dtor are only one form of > declared patterns; we also have to accomodate static and instance patterns. see above, it's about signaling that the name is mangled. R?mi From brian.goetz at oracle.com Wed Mar 30 13:32:46 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 30 Mar 2022 09:32:46 -0400 Subject: [External] : Re: Declared patterns -- translation and reflection In-Reply-To: <1877090243.3967946.1648633842653.JavaMail.zimbra@u-pem.fr> References: <104521199.3738715.1648592366308.JavaMail.zimbra@u-pem.fr> <1877090243.3967946.1648633842653.JavaMail.zimbra@u-pem.fr> Message-ID: <614c8f4c-dadf-c98e-d890-28ea9cad40dc@oracle.com> > It's not by name.? I don't know where you got this idea. > > > I think i understand the underlying semantics of the syntax, i'm not > 100% confident. It's always OK to ask questions if you are not 100% sure!?? In fact, its generally better to do so. > The problem with the proposed syntax is that you invent a new kind of > variable, until now, we had local variables and fields (and array > cells but those have no name). 
It's valid to have concerns, but (a) please try to understand the entire design space before declaring it a "problem" (questions are OK), and (b) please wait until we're actually having that discussion.? This is a big, complex design space, and there is a clean separation between the user model of how it is declared and how it is rendered in the classfile, so I'm trying to keep the conversation focused so we can make progerss.? Please work with me on this. > Once we have pattern methods, we can have an interface that defines a > pattern method and a class that implement it, > something like As the "Patterns in the Object Model" document says, yes, patterns make sense in interface.? The best example is probably Map.Entry: ??? for (Entry(var k, var v) : map.entrySet()) { ... } These would translate (this conversation is about translation) as pattern methods in the interface.? (I haven't thought much about whether default implementations make sense.) > Do we agree that a binding type can be covariant ? (before saying no, > think about generics that's the reason we have return type covariance > in Java). > In that case, are we are in trouble with the translation strategy ? Its a fair question about whether we want this.? When the bindings act as a "multiple return bundle", though "covariant return" becomes much more complicated; you'd probably need some kind of "meet" restriction which says that for any two overrides X and Y that are more specific than Z, there is a "meet" W that is more specific than either X and Y.? Not sure it is worth going there. It's also a fair question about how it works out in translation, I'll think about this. > Pattern methods (static or not) does not have a real name, so '<' and > '>' are here to signal that the name is in the Pattern attribute. > We do not want people to unmangle the name of pattern methods that why > the name is in the attribute, using '<' and '>' signal that idea. Yes and no.? Remember that the non-dtor patterns in the source file have names, and they can be overloaded: ??? class X { ??????? __pattern(bindings) p(args1) { ... } ??????? __pattern(bindings) p(args2) { ... } ??????? __pattern(bindings) q(args1) { ... } ??????? __pattern(bindings) q(args2) { ... } ???? } The mangled name must be unique for each of these, but the first two must be derived in part from "p" (so that when the file is recompiled, we come up with the same name).? Only dtors are "nameless" and need a standin name like (or the class name, or any arbitrary spelling rule we want to make.)?? So while the translated name not be exactly name$mangle, the name is important (it can just go into the mangled part if we like.) As with other synthetic members, like bridge methods, we can make it a compile-time error to try to override it as a method rather than as a pattern. From brian.goetz at oracle.com Wed Mar 30 14:40:28 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 30 Mar 2022 10:40:28 -0400 Subject: Remainder in pattern matching Message-ID: We should have wrapped this up a while ago, so I apologize for the late notice, but we really have to wrap up exceptions thrown from pattern contexts (today, switch) when an exhaustive context encounters a remainder.? I think there's really one one sane choice, and the only thing to discuss is the spelling, but let's go through it. In the beginning, nulls were special in switch.? The first thing is to evaluate the switch operand; if it is null, switch threw NPE.? 
(I don't think this was motivated by any overt null hostility, at least not at first; it came from unboxing, where we said "if it's a box, unbox it", and the unboxing throws NPE, and the same treatment was later added to enums (though that came out in the same version) and strings.)

We have since refined switch so that some switches accept null. But for those that don't, I see no other move besides "if the operand is null and there is no null handling case, throw NPE."  Null will always be a special remainder value (when it appears in the remainder.)

In Java 12, when we did switch expressions, we had to confront the issue of novel enum constants.  We considered a number of alternatives, and came up with throwing ICCE.  This was a reasonable choice, though as it turns out is not one that scales as well as we had hoped it would at the time.  The choice here is based on "the view of classfiles at compile time and run time has shifted in an incompatible way."  ICCE is, as Kevin pointed out, a reliable signal that your classpath is borked.

We now have two precedents from which to extrapolate, but as it turns out, neither is really very good for the general remainder case.

Recall that we have a definition of _exhaustiveness_, which is, at some level, deliberately not exhaustive.  We know that there are edge cases for which it is counterproductive to insist that the user explicitly cover them, often for two reasons: one is that it's annoying to the user (writing cases for things they believe should never happen), and the other that it undermines type checking (the most common way to do this is a default clause, which can sweep other errors under the rug.)

If we have an exhaustive set of patterns on a type, the set of possible values for that type that are not covered by some pattern in the set is called the _remainder_.  Computing the remainder exactly is hard, but computing an upper bound on the remainder is pretty easy.  I'll say "x may be in the remainder of P* on T" to indicate that we're defining the upper bound.

 - If P* contains a deconstruction pattern P(Q*), null may be in the remainder of P*.
 - If T is sealed, instances of a novel subtype of T may be in the remainder of P*.
 - If T is an enum, novel enum constants of T may be in the remainder of P*.
 - If R(X x, Y y) is a record, and x is in the remainder of Q* on X, then `R(x, any)` may be in the remainder of { R(q) : q in Q*} on R.

Examples:

    sealed interface X permits X1, X2 { }
    record X1(String s) implements X { }
    record X2(String s) implements X { }
    record R(X x1, X x2) { }

    switch (r) {
        case R(X1(String s), any):
        case R(X2(String s), X1(String s)):
        case R(X2(String s), X2(String s)):
    }

This switch is exhaustive.  Let N be a novel subtype of X.  So the remainder includes:

    null, R(N, _), R(_, N), R(null, _), R(X2, null)

It might be tempting to argue (in fact, someone has) that we should try to pick a "root cause" (null or novel) and throw that.  But I think this is both excessive and unworkable.

Excessive: This means that the compiler would have to enumerate the remainder set (it's a set of patterns, so this is doable) and insert an extra synthetic clause for each.  This is a lot of code footprint and complexity for a questionable benefit, and the sort of place where bugs hide.

Unworkable: Ultimately such code will have to make an arbitrary choice, because R(N, null) and R(null, N) are in the remainder set.  So which is the root cause?  Null or novel?
We'd have to make an arbitrary choice.

So what I propose is the following simple answer instead:

 - If the switch target is null and no case handles null, throw NPE.  (We know statically whether any case handles null, so this is easy and similar to what we do today.)
 - If the switch is an exhaustive enum switch, and no case handles the target, throw ICCE.  (Again, we know statically whether the switch is over an enum type.)
 - In any other case of an exhaustive switch for which no case handles the target, we throw a new exception type, java.lang.MatchException, with an error message indicating remainder.

The first two rules are basically dictated by compatibility.  In hindsight, we might have not chosen ICCE in 12, and gone with the general (third) rule instead, but that's water under the bridge.

We need to wrap this up in the next few days, so if you have concerns here, please get them on the record ASAP.

As a separate but not-separate exception problem, we have to deal with at least two additional sources of exceptions:

 - A dtor / record accessor may throw an arbitrary exception in the course of evaluating whether a case matches.
 - User code in the switch may throw an arbitrary exception.

For the latter, this has always been handled by having the switch terminate abruptly with the same exception, and we should continue to do this.

For the former, we surely do not want to swallow this exception (such an exception indicates a bug).  The choices here are to treat this the same way we do with user code, throwing it out of the switch, or to wrap with MatchException.

I prefer the latter -- wrapping with MatchException -- because the exception is thrown from synthetic code between the user code and the ultimate thrower, which means the pattern matching feature is mediating access to the thrower.  I think we should handle this as "if a pattern invoked from pattern matching completes abruptly by throwing X, pattern matching completes abruptly with MatchException", because the specific X is not a detail we want the user to bind to.  (We don't want them to bind to anything, but if they do, we want them to bind to the logical action, not the implementation details.)

From forax at univ-mlv.fr Wed Mar 30 16:08:02 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 30 Mar 2022 18:08:02 +0200 (CEST) Subject: Remainder in pattern matching In-Reply-To: References: Message-ID: <2031711733.4344207.1648656482967.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz" > To: "amber-spec-experts" > Sent: Wednesday, March 30, 2022 4:40:28 PM > Subject: Remainder in pattern matching

> We should have wrapped this up a while ago, so I apologize for the late notice, > but we really have to wrap up exceptions thrown from pattern contexts (today, > switch) when an exhaustive context encounters a remainder. I think there's > really only one sane choice, and the only thing to discuss is the spelling, but > let's go through it.

> In the beginning, nulls were special in switch. The first thing is to evaluate > the switch operand; if it is null, switch threw NPE. (I don't think this was > motivated by any overt null hostility, at least not at first; it came from > unboxing, where we said "if it's a box, unbox it", and the unboxing throws NPE, > and the same treatment was later added to enums (though that came out in the > same version) and strings.)

> We have since refined switch so that some switches accept null.
But for those > that don't, I see no other move besides "if the operand is null and there is no > null handling case, throw NPE." Null will always be a special remainder value > (when it appears in the remainder.) > In Java 12, when we did switch expressions, we had to confront the issue of > novel enum constants. We considered a number of alternatives, and came up with > throwing ICCE. This was a reasonable choice, though as it turns out is not one > that scales as well as we had hoped it would at the time. The choice here is > based on "the view of classfiles at compile time and run time has shifted in an > incompatible way." ICCE is, as Kevin pointed out, a reliable signal that your > classpath is borked. > We now have two precedents from which to extrapolate, but as it turns out, > neither is really very good for the general remainder case. > Recall that we have a definition of _exhaustiveness_, which is, at some level, > deliberately not exhaustive. We know that there are edge cases for which it is > counterproductive to insist that the user explicitly cover, often for two > reasons: one is that its annoying to the user (writing cases for things they > believe should never happen), and the other that it undermines type checking > (the most common way to do this is a default clause, which can sweep other > errors under the rug.) > If we have an exhaustive set of patterns on a type, the set of possible values > for that type that are not covered by some pattern in the set is called the > _remainder_. Computing the remainder exactly is hard, but computing an upper > bound on the remainder is pretty easy. I'll say "x may be in the remainder of > P* on T" to indicate that we're defining the upper bound. > - If P* contains a deconstruction pattern P(Q*), null may be in the remainder of > P*. > - If T is sealed, instances of a novel subtype of T may be in the remainder of > P*. > - If T is an enum, novel enum constants of T may be in the remainder of P*. > - If R(X x, Y y) is a record, and x is in the remainder of Q* on X, then `R(x, > any)` may be in the remainder of { R(q) : q in Q*} on R. > Examples: > sealed interface X permits X1, X2 { } > record X1(String s) implements X { } > record X2(String s) implements X { } > record R(X x1, X x2) { } > switch (r) { > case R(X1(String s), any): > case R(X2(String s), X1(String s)): > case R(X2(String s), X2(String s)): > } > This switch is exhaustive. Let N be a novel subtype of X. So the remainder > includes: > null, R(N, _), R(_, N), R(null, _), R(X2, null) > It might be tempting to argue (in fact, someone has) that we should try to pick > a "root cause" (null or novel) and throw that. But I think this is both > excessive and unworkable. [...] see below > So what I propose is the following simple answer instead: > - If the switch target is null and no case handles null, throw NPE. (We know > statically whether any case handles null, so this is easy and similar to what > we do today.) > - If the switch is an exhaustive enum switch, and no case handles the target, > throw ICCE. (Again, we know statically whether the switch is over an enum > type.) > - In any other case of an exhaustive switch for which no case handles the > target, we throw a new exception type, java.lang.MatchException, with an error > message indicating remainder. I agree for the first rule, if null is not handled, let throw a NPE. 
For when the static world and the dynamic world disagree, I think your analysis has missed an important question: switching on an enum throws an ICCE very late, when we discover an unknown value, but in the case of a sealed type we can decide to reject the switch much sooner.

There is a spectrum of choices here for where to throw an ICCE; it can be
- when the class is verified (when the method is verified for OpenJ9)
- the first time the switch is reached (with an indy that validates once that the different sealed types' "permits" sets have not changed)
- when we have exhausted all branches associated with a sealed type

About excessive and unworkable,

> Excessive: This means that the compiler would have to enumerate the remainder > set (its a set of patterns, so this is doable) and insert an extra synthetic > clause for each. This is a lot of code footprint and complexity for a > questionable benefit, and the sort of place where bugs hide.

Remainders are dangling elses in a cascade of if ... else, so yes, we have to take care of them. So yes, it may take a lot of bytecode if we choose to add all branches, but the benefit is not questionable; it's far better than the alternative, which is GoodLuckFigureByYourselfException.

> Unworkable: Ultimately such code will have to make an arbitrary choice, because > R(N, null) and R(null, N) are in the remainder set. So which is the root cause? > Null or novel? We'd have to make an arbitrary choice.

Yes! We have to make that choice, but it's not arbitrary; it's where we want to put the cursor. It's a trade-off between supporting the case where we are using a sealed type but not using that part of pattern matching, and an ICCE being thrown earlier.

Also, the rules you propose make the addition of patterns over enum values harder in the future.

Rémi

From brian.goetz at oracle.com Wed Mar 30 16:32:15 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 30 Mar 2022 12:32:15 -0400 Subject: [External] : Re: Remainder in pattern matching In-Reply-To: <2031711733.4344207.1648656482967.JavaMail.zimbra@u-pem.fr> References: <2031711733.4344207.1648656482967.JavaMail.zimbra@u-pem.fr> Message-ID:

> > For when the static world and the dynamic world disagree, i think your > analysis has miss an important question, switching on an enum throw an > ICCE very late when we discover an unknown value, but in the case of a > sealed type,

Actually, I thought about that quite a bit before proposing this.  And my conclusion is: using ICCE was mostly a (well intentioned) mistake here, and "doubling down" on that path is more trouble than it is worth.  So we are minimally consistent with the ICCE choice in the cases that were compilable in 12, but for anything else, we follow the general rule.

The thought experiment that I did was: what if we had not done switch expressions in 12.  Then the only precedent we have to deal with is the null case, which has a pretty obvious answer.  So what would we do?  Would we introduce 10s of catch-all cases solely for the purpose of diagnosing the source of remainder, or would we introduce a throwing default that throws MatchException on everything but null?  I concluded we would do the latter, so what is proposed here is basically that, but carving out the 12-compatibility case.

> Remainders are dangling else in a cascade of if ... else, so yes, we > have to care of them.

Yes, but we can care for all of them in one swoop with a synthetic default.
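[Editor's note: a sketch of that "synthetic default" idea, illustrative only; the desugaring is hand-written, and a plain RuntimeException stands in for the proposed java.lang.MatchException.]

```java
sealed interface Shape permits Circle, Square {}
record Circle(double radius) implements Shape {}
record Square(double side) implements Shape {}

class AreaSwitch {
    // Roughly what an exhaustive switch with no default could desugar to:
    // an explicit null check, one test per case, and one synthetic default that
    // covers the whole remainder (e.g. a Shape subtype added after compilation).
    static double area(Shape shape) {
        if (shape == null) throw new NullPointerException();
        if (shape instanceof Circle c) return Math.PI * c.radius() * c.radius();
        if (shape instanceof Square s) return s.side() * s.side();
        // synthetic default: under the proposal this would throw MatchException
        throw new RuntimeException("remainder: no case matched " + shape);
    }
}
```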
> So yes, it may a lot of bytecodes if we choose to add all branches but > the benefit is not questionable, it's far better than the alternative > which is GoodLuckFigureByYourselfException. Yes, when you get a dynamic error here in a complex switch, the range of what could have gone wrong is large.? (The same will be true outside of switches when we have more kinds of patterns (list patterns, map patterns, etc) and more ways to compose patterns into bigger patterns; if we have a big complex pattern that matches the JSON document with the keys we want, if it doesn't match because (say) some integer nested nine levels deep overflowed 32 bits, this is also going to be hard to diagnose.)? But you are proposing a new and significant language requirement -- that the language should mandate an arbitrarily complex explanation of why something didn't match.? I won't dispute that this has benefit -- but I am not convinced this is necessarily the place for this, or whether the cost is justified by the benefit. Also, note that the two are not inconsistent.? If the switch is required to throw MatchException on remainder, the compiler is *allowed* to try and diagnose the root cause (the ME can wrap something more specific), but not required to.?? Pattern failure diagnosis then becomes a quality of implementation choice, rather than having complex, brittle rules mandated by the spec.? There's nothing to stop us from doing the equivalent of the "helpful NPE" JEP in the future. From heidinga at redhat.com Wed Mar 30 18:12:21 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Wed, 30 Mar 2022 14:12:21 -0400 Subject: Remainder in pattern matching In-Reply-To: References: Message-ID: The rules regarding NPE, ICCE and MatchException look reasonable to me. > As a separate but not-separate exception problem, we have to deal with at least two additional sources of exceptions: > > - A dtor / record acessor may throw an arbitrary exception in the course of evaluating whether a case matches. > > - User code in the switch may throw an arbitrary exception. > > For the latter, this has always been handled by having the switch terminate abruptly with the same exception, and we should continue to do this. > > For the former, we surely do not want to swallow this exception (such an exception indicates a bug). The choices here are to treat this the same way we do with user code, throwing it out of the switch, or to wrap with MatchException. > > I prefer the latter -- wrapping with MatchException -- because the exception is thrown from synthetic code between the user code and the ultimate thrower, which means the pattern matching feature is mediating access to the thrower. I think we should handle this as "if a pattern invoked from pattern matching completes abruptly by throwing X, pattern matching completes abruptly with MatchException", because the specific X is not a detail we want the user to bind to. (We don't want them to bind to anything, but if they do, we want them to bind to the logical action, not the implementation details.) My intuition (and maybe I have the wrong mental model?) is that the pattern matching calling a user written dtor / record accessor is akin to calling a method. We don't wrap the exceptions thrown by methods apart from some very narrow cases (ie: reflection), and I thought part of reflection's behaviour was related to needing to ensure exceptions (particularly checked ones) were converted to something explicitly handled by the caller. 
If the dtor / record accessor can declare they throw checked exceptions, then I can kind of see the rationale for wrapping them. Otherwise, it seems clearer to me to let them be thrown without wrapping. I don't think we expect users to explicitly handle MatchException when using pattern matching so what does wrapping gain us here? --Dan From brian.goetz at oracle.com Wed Mar 30 18:26:53 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 30 Mar 2022 14:26:53 -0400 Subject: [External] : Re: Remainder in pattern matching In-Reply-To: References: Message-ID: <66413439-c5df-17be-6e32-622aa23fb7c1@oracle.com> It's a little like calling a method, but a little not like it too. For example, when you match on a record pattern: ??? case Point(var x, var y): ... what may happen is *either* you will invoke a user-written deconstructor pattern, *or* we will test if you are a Point with `instanceof`, and then invoke the accessor methods (which might be user-written or implicit.)? Similarly, if you match: ??? case Point(P, Q): ??? case Point(R, S): we may invoke the Point deconstructor once, or twice.? And there's no way to _directly_ invoke a pattern, only through switch, instanceof, and other contexts. All of this means that invocations of pattern methods is more indirect, and mediated by the language, than invoking a method. When you invoke a method, you are assenting to its contract about what it returns, what it throws, etc.? When you match a pattern, it feels more likely are assenting to the contract of _pattern matching_, which in turn hides implementation details of what pattern methods are invoked, when they are invoked, how often, etc. Dtors and record accessors cannot throw checked exceptions at all, and will be discouraged from throwing exceptions at all. One thing wrapping gains is that it gives us a place to centralize "something failed in pattern matching", which includes exhaustiveness failures as well as failures of invariants which PM assumes (e.g., dtors don't throw.)?? Another thing it gains is that it discourages people from thinking they can use exceptions in dtors; having these laundered through MatchException discourages using this as a side channel, though that's a more minor thing. Agree we do not expect users to explicitly handle ME, any more so than NPE. > My intuition (and maybe I have the wrong mental model?) is that the > pattern matching calling a user written dtor / record accessor is akin > to calling a method. We don't wrap the exceptions thrown by methods > apart from some very narrow cases (ie: reflection), and I thought part > of reflection's behaviour was related to needing to ensure exceptions > (particularly checked ones) were converted to something explicitly > handled by the caller. > > If the dtor / record accessor can declare they throw checked > exceptions, then I can kind of see the rationale for wrapping them. > Otherwise, it seems clearer to me to let them be thrown without > wrapping. > > I don't think we expect users to explicitly handle MatchException when > using pattern matching so what does wrapping gain us here? 
> > --Dan > From forax at univ-mlv.fr Wed Mar 30 18:31:34 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 30 Mar 2022 20:31:34 +0200 (CEST) Subject: [External] : Re: Remainder in pattern matching In-Reply-To: References: <2031711733.4344207.1648656482967.JavaMail.zimbra@u-pem.fr> Message-ID: <1371483343.4376231.1648665094230.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "amber-spec-experts" > Sent: Wednesday, March 30, 2022 6:32:15 PM > Subject: Re: [External] : Re: Remainder in pattern matching >> For when the static world and the dynamic world disagree, i think your analysis >> has miss an important question, switching on an enum throw an ICCE very late >> when we discover an unknown value, but in the case of a sealed type, > Actually, I thought about that quite a bit before proposing this. And my > conclusion is: using ICCE was mostly a (well intentioned) mistake here, and > "doubling down" on that path is more trouble than it is worth. So we are > minimally consistent with the ICCE choice in the cases that were compilable in > 12, but for anything else, we follow the general rule. > The thought experiment that I did was: what if we had not done switch > expressions in 12. Then the only precedent we have to deal with is the null > case, which has a pretty obvious answer. So what would we do? Would we > introduce 10s of catch-all cases solely for the purpose of diagnosing the > source of remainder, or would we introduce a throwing default that throws > MatchException on everything but null? I concluded we would do the latter, so > what is proposed here is basically that, but carving out the 12-compatibility > case. We are discussing about what to do if a sealed types has more permitted subtypes at runtime than the one seen when the code was compiled. It's a separate compilation issue, hence the ICCE. It seems that what you are saying is that you think an Exception is better than an Error. If we follow that path, it means that it may make sense to recover from a MatchException but i fail to see how, we can not ask a developer of a code to change it while that code is executed, separate compilation errors are not recoverable. >> Remainders are dangling else in a cascade of if ... else, so yes, we have to >> care of them. > Yes, but we can care for all of them in one swoop with a synthetic default. >> So yes, it may a lot of bytecodes if we choose to add all branches but the >> benefit is not questionable, it's far better than the alternative which is >> GoodLuckFigureByYourselfException. > Yes, when you get a dynamic error here in a complex switch, the range of what > could have gone wrong is large. (The same will be true outside of switches when > we have more kinds of patterns (list patterns, map patterns, etc) and more ways > to compose patterns into bigger patterns; if we have a big complex pattern that > matches the JSON document with the keys we want, if it doesn't match because > (say) some integer nested nine levels deep overflowed 32 bits, this is also > going to be hard to diagnose.) But you are proposing a new and significant > language requirement -- that the language should mandate an arbitrarily complex > explanation of why something didn't match. I won't dispute that this has > benefit -- but I am not convinced this is necessarily the place for this, or > whether the cost is justified by the benefit. The explanation is not complex, there is a sealed type has more subtypes now than at a time the code was compiled. 
Rémi

From brian.goetz at oracle.com  Wed Mar 30 18:35:17 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 30 Mar 2022 14:35:17 -0400
Subject: [External] : Re: Remainder in pattern matching
In-Reply-To: <1371483343.4376231.1648665094230.JavaMail.zimbra@u-pem.fr>
References: <2031711733.4344207.1648656482967.JavaMail.zimbra@u-pem.fr> <1371483343.4376231.1648665094230.JavaMail.zimbra@u-pem.fr>
Message-ID: <9fd41dbe-aa42-5ae6-c539-fc89f78e4a4d@oracle.com>

> It seems that what you are saying is that you think an Exception is better than an Error.

Not exactly; what I'm saying is that the attempt to separate stray nulls from separate compilation issues here seems like a heroic effort for low value, and I'd rather have one channel for "exhaustiveness failure" and let implementations decide how heroic they want to get in sorting out the possible causes.

From brian.goetz at oracle.com  Wed Mar 30 18:38:29 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 30 Mar 2022 14:38:29 -0400
Subject: [External] : Re: Remainder in pattern matching
In-Reply-To: 
References: 
Message-ID: <4db9c947-48c6-f51c-cf48-c805b17e5a50@oracle.com>

Another way to think about this is:

 - If any of the code that the user actually wrote (the RHS of case clauses, or guards on case labels) throws, then the switch throws that
 - If any of the machinery of the switch dispatch throws, it throws MatchException.

On 3/30/2022 2:12 PM, Dan Heidinga wrote:
> The rules regarding NPE, ICCE and MatchException look reasonable to me.
>
>> As a separate but not-separate exception problem, we have to deal with at least two additional sources of exceptions:
>>
>> - A dtor / record accessor may throw an arbitrary exception in the course of evaluating whether a case matches.
>>
>> - User code in the switch may throw an arbitrary exception.
>>
>> For the latter, this has always been handled by having the switch terminate abruptly with the same exception, and we should continue to do this.
>>
>> For the former, we surely do not want to swallow this exception (such an exception indicates a bug). The choices here are to treat this the same way we do with user code, throwing it out of the switch, or to wrap with MatchException.
>>
>> I prefer the latter -- wrapping with MatchException -- because the exception is thrown from synthetic code between the user code and the ultimate thrower, which means the pattern matching feature is mediating access to the thrower. I think we should handle this as "if a pattern invoked from pattern matching completes abruptly by throwing X, pattern matching completes abruptly with MatchException", because the specific X is not a detail we want the user to bind to. (We don't want them to bind to anything, but if they do, we want them to bind to the logical action, not the implementation details.)
>
> My intuition (and maybe I have the wrong mental model?) is that the pattern matching calling a user-written dtor / record accessor is akin to calling a method. We don't wrap the exceptions thrown by methods apart from some very narrow cases (ie: reflection), and I thought part of reflection's behaviour was related to needing to ensure exceptions (particularly checked ones) were converted to something explicitly handled by the caller.
>
> If the dtor / record accessor can declare they throw checked exceptions, then I can kind of see the rationale for wrapping them. Otherwise, it seems clearer to me to let them be thrown without wrapping.
> I don't think we expect users to explicitly handle MatchException when using pattern matching so what does wrapping gain us here?
>
> --Dan

From forax at univ-mlv.fr  Wed Mar 30 18:40:49 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 30 Mar 2022 20:40:49 +0200 (CEST)
Subject: [External] : Re: Remainder in pattern matching
In-Reply-To: <66413439-c5df-17be-6e32-622aa23fb7c1@oracle.com>
References: <66413439-c5df-17be-6e32-622aa23fb7c1@oracle.com>
Message-ID: <1473997540.4377334.1648665649969.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz" 
> To: "Dan Heidinga" 
> Cc: "amber-spec-experts" 
> Sent: Wednesday, March 30, 2022 8:26:53 PM
> Subject: Re: [External] : Re: Remainder in pattern matching

[...]

> One thing wrapping gains is that it gives us a place to centralize "something failed in pattern matching", which includes exhaustiveness failures as well as failures of invariants which PM assumes (e.g., dtors don't throw.)

but such centralization is a bad practice; that's the reason why catch(Exception) is considered a bad practice.

BTW, I hope that with Loom people will use virtual threads (they are cheap) to manage scenarios where you want to discard a computation if something fails, like in Erlang.

Rémi

From heidinga at redhat.com  Wed Mar 30 18:43:42 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Wed, 30 Mar 2022 14:43:42 -0400
Subject: [External] : Re: Remainder in pattern matching
In-Reply-To: <4db9c947-48c6-f51c-cf48-c805b17e5a50@oracle.com>
References: <4db9c947-48c6-f51c-cf48-c805b17e5a50@oracle.com>
Message-ID: 

On Wed, Mar 30, 2022 at 2:38 PM Brian Goetz wrote:
>
> Another way to think about this is:
>
>  - If any of the code that the user actually wrote (the RHS of case clauses, or guards on case labels) throws, then the switch throws that
>  - If any of the machinery of the switch dispatch throws, it throws MatchException.

That's a reasonable way to factor this and makes the difference between the machinery and the direct user code clear, even when looking at stacktraces.

And from your other response:

> Another thing it gains is that it discourages people from thinking they can use exceptions in dtors; having these laundered through MatchException discourages using this as a side channel, though that's a more minor thing.

This is a stronger argument than you give it credit for being. Wrapping the exception adds a bit of friction to doing the wrong thing which will pay off in helping guide users to the intended behaviour.

--Dan

> On 3/30/2022 2:12 PM, Dan Heidinga wrote:
>
> The rules regarding NPE, ICCE and MatchException look reasonable to me.
>
> As a separate but not-separate exception problem, we have to deal with at least two additional sources of exceptions:
>
> - A dtor / record accessor may throw an arbitrary exception in the course of evaluating whether a case matches.
>
> - User code in the switch may throw an arbitrary exception.
>
> For the latter, this has always been handled by having the switch terminate abruptly with the same exception, and we should continue to do this.
>
> For the former, we surely do not want to swallow this exception (such an exception indicates a bug). The choices here are to treat this the same way we do with user code, throwing it out of the switch, or to wrap with MatchException.
> I prefer the latter -- wrapping with MatchException -- because the exception is thrown from synthetic code between the user code and the ultimate thrower, which means the pattern matching feature is mediating access to the thrower. I think we should handle this as "if a pattern invoked from pattern matching completes abruptly by throwing X, pattern matching completes abruptly with MatchException", because the specific X is not a detail we want the user to bind to. (We don't want them to bind to anything, but if they do, we want them to bind to the logical action, not the implementation details.)
>
> My intuition (and maybe I have the wrong mental model?) is that the pattern matching calling a user-written dtor / record accessor is akin to calling a method. We don't wrap the exceptions thrown by methods apart from some very narrow cases (ie: reflection), and I thought part of reflection's behaviour was related to needing to ensure exceptions (particularly checked ones) were converted to something explicitly handled by the caller.
>
> If the dtor / record accessor can declare they throw checked exceptions, then I can kind of see the rationale for wrapping them. Otherwise, it seems clearer to me to let them be thrown without wrapping.
>
> I don't think we expect users to explicitly handle MatchException when using pattern matching so what does wrapping gain us here?
>
> --Dan

From forax at univ-mlv.fr  Wed Mar 30 18:46:03 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 30 Mar 2022 20:46:03 +0200 (CEST)
Subject: [External] : Re: Remainder in pattern matching
In-Reply-To: <9fd41dbe-aa42-5ae6-c539-fc89f78e4a4d@oracle.com>
References: <2031711733.4344207.1648656482967.JavaMail.zimbra@u-pem.fr> <1371483343.4376231.1648665094230.JavaMail.zimbra@u-pem.fr> <9fd41dbe-aa42-5ae6-c539-fc89f78e4a4d@oracle.com>
Message-ID: <1489361036.4378823.1648665963613.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz" 
> To: "Remi Forax" 
> Cc: "amber-spec-experts" 
> Sent: Wednesday, March 30, 2022 8:35:17 PM
> Subject: Re: [External] : Re: Remainder in pattern matching

>> It seems that what you are saying is that you think an Exception is better than an Error.

> Not exactly; what I'm saying is that the attempt to separate stray nulls from separate compilation issues here seems like a heroic effort for low value, and I'd rather have one channel for "exhaustiveness failure" and let implementations decide how heroic they want to get in sorting out the possible causes.

NPE is a developer issue, separate compilation failure/ICCE is a deployment issue; there is no point in having only one channel.

Rémi

From brian.goetz at oracle.com  Wed Mar 30 18:52:17 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 30 Mar 2022 14:52:17 -0400
Subject: [External] : Re: Remainder in pattern matching
In-Reply-To: 
References: <4db9c947-48c6-f51c-cf48-c805b17e5a50@oracle.com>
Message-ID: 

Yes, and this is a special case of a more general thing -- that while pattern declarations may have a lot in common with methods, they are not "just methods with multiple return" (e.g., they have a different set of characteristics at the declaration, they are intrinsically conditional, they are "invoked" differently.) While their bodies may look method-like, and ultimately they boil down to methods, thinking "they are just methods" is likely to drag you to the wrong place. Of course, it's a balance between how similar and HOW DIFFERENT they are, and that's what we're looking for.
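To make that distinction concrete, here is a small illustrative sketch (hypothetical names; the wrapping behaviour described in the comments is only what is being proposed in this thread, not behaviour that exists today):

    // A record whose accessor misbehaves.
    record Box(String label) {
        public String label() {
            throw new IllegalStateException("accessor failed");
        }
    }

    static int size(Object o) {
        return switch (o) {
            // Matching Box(var l) implicitly invokes label().  Under the
            // proposal above, the IllegalStateException would surface
            // wrapped in a MatchException, because it is thrown by the
            // switch's matching machinery rather than by code the user
            // wrote in a case body.
            case Box(var l) -> l.length();
            // An exception thrown here, in a case body, would propagate
            // unwrapped, exactly as it does today.
            default -> 0;
        };
    }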
>> Another thing it gains is that it discourages people from thinking they can use exceptions in dtors; having these laundered through MatchException discourages using this as a side channel, though that's a more minor thing.
>
> This is a stronger argument than you give it credit for being. Wrapping the exception adds a bit of friction to doing the wrong thing which will pay off in helping guide users to the intended behaviour.

From brian.goetz at oracle.com  Wed Mar 30 19:33:21 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 30 Mar 2022 15:33:21 -0400
Subject: Patterns and GADTs (and type checking and inference and overload selection)
Message-ID: <35905cda-2b1f-bbee-180a-5b89b10e9e66@oracle.com>

GADTs -- sealed families whose permitted subtypes specialize the type variables of the base class -- pose some interesting challenges for pattern matching.

(Remi: this is a big, complex area. Off-the-cuff "this is wrong" or "you should X instead" replies are not helpful. If in doubt, ask questions. One comprehensive reply is more useful than many small replies. Probably best to think about the whole thing for some time before responding.)

Here is an example of a GADT hierarchy:

    sealed interface Node<T> { }

    record IntNode(int i) implements Node<Integer> { }
    record BoolNode(boolean b) implements Node<Boolean> { }
    record PlusNode(Node<Integer> a, Node<Integer> b) implements Node<Integer> { }
    record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
    record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements Node<T> { }

Nodes can be parameterized, but some nodes are sharply typed, and some intermediate nodes (plus, or, if) have constraints on their components. This is enough to model expressions like:

    let
        a = true, b = false, x = 1, y = 2
        in
            if (a || b) then x + y else y;

Note that `if` nodes can work on both Node<Integer> and Node<Boolean>, and model a node of the right type.

## The Flow Problem

As mentioned earlier, pattern matching can recover constraints on type variables, but currently we do not act on these. For example, we might want to write the eval() like this:

    static <T> T eval(Node<T> n) {
        return switch (n) {
            case IntNode(var i) -> i;
            case BoolNode(var b) -> b;
            case PlusNode(var a, var b) -> eval(a) + eval(b);
            case OrNode(var a, var b) -> eval(a) || eval(b);
            case IfNode(var c, var a, var b) -> eval(c) ? eval(a) : eval(b);
        };
    }

But this doesn't work. The eval() method returns a T. In the first case, we've matched Node<T> to IntNode, so the compiler knows `i : int`. Humans know that T can only be Integer, but the compiler doesn't know that yet. As a result, the choice to return `i` will cause a type error; the compiler has no reason to believe that `i` is a `T`. The only choice the user has is an unchecked cast to `T`. This isn't great.

We've discussed, as a possible solution, flow typing for type variables; matching IntNode to Node<T> can generate a constraint T=Integer in the scope where the pattern matches. Pattern matching is already an explicitly conditional construct; whether a pattern matches already flows into scoping and control flow analysis. Refining type constraints on type variables is a reasonable thing to consider, and offers a greater type-safety payoff than ordinary flow typing (since most flow typing can be replaced with pattern matching.)

We have the same problem with the PlusNode and OrNode cases too; if we match PlusNode, then T can only be Integer, but the RHS will be int and assigning an int to a T will cause a problem. Only the last case will type check without gathering extra T constraints.
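For comparison, here is roughly the best a user can do today -- a sketch only, assuming the parameterized Node hierarchy above, and using plain type patterns plus accessors so as not to lean on record-pattern typing rules that are still under discussion. Every arm that "knows" what T must be has to launder its result through an unchecked cast:

    @SuppressWarnings("unchecked")
    static <T> T eval(Node<T> n) {
        return switch (n) {
            case IntNode in    -> (T) (Integer) in.i();                          // unchecked: we know T = Integer
            case BoolNode bn   -> (T) (Boolean) bn.b();                          // unchecked: we know T = Boolean
            case PlusNode pn   -> (T) (Integer) (eval(pn.a()) + eval(pn.b()));   // unchecked
            case OrNode on     -> (T) (Boolean) (eval(on.a()) || eval(on.b()));  // unchecked
            case IfNode<T> ifn -> eval(ifn.cond()) ? eval(ifn.a()) : eval(ifn.b());
            // Depending on how exhaustiveness treats the generic IfNode case
            // (see the next section), a catch-all like this may still be required:
            default            -> throw new IllegalStateException("unreachable: " + n);
        };
    }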
Only the last case will type check without gathering extra T constraints. ## The Exhaustiveness Problem Now suppose we have a Node.? Then it can only be an IntNode, a PlusNode, or an IfNode.? So the following switch should be exhaustive: static int eval(Node n) { ??? return switch (n) { ??????? case IntNode(var i) -> i; ??????? case PlusNode(var a, var b) -> eval(a) + eval(b); ??????? case IfNode(var c, var a, var b) -> eval(c) ? eval(a) : eval(b); ??? }; We need to be able to eliminate BoolNode and OrNode from the list of types that have to be covered by the switch. We're proposing changes in the current round (also covered in my Coverage doc) that refines the "you cover a sealed type if you cover all the permitted subtypes" rule to exclude those whose parameterization are impossible. ## The Typing Problem Even without worrying about the RHS, we have problems with cases like this: static T eval(Node n) { ??? return switch (n) { ??????? ... ??????? case IfNode(var c, IntNode a, IntNode b) -> eval(c) ? a.i() + b.i(); // optimization ??? }; We know that an IfNode must have the same node parameterization on both a and b.? We don't encourage raw IfNode here; there should be something in the .? The rule is that if a type / record pattern is generic, the parameterization must be statically consistent with the target type; there has to be a cast conversion without unchecked conversion.? (This can get refined if we get sufficiently useful constraints from somewhere else, but not relaxed.)? But without some inference, we can't yet conclude that Integer is a valid (i.e., won't require unchecked conversion) parameterization for Node.? But clearly, Integer is the only possibility here.? So we can't even write this -- we'd have to use a raw or wildcard case, which is not very good.? We need more inference here, so we have enough type information for better well-formedness checks. #### Putting it all together Here's a related example from the "Lower your Guards" paper which ties it all together.? In Haskell: data T a b where T1 :: T Int Bool T2 :: T Char Bool g1 :: T Int b -> b -> Int g1 T1 False = 0 g1 T1 True = 1 Translating to Java: sealed interface T { } record T1() implements T { } record T2() implements T { } record G1(T t, B b) { } B bb = switch (g) { case G1(T1 t, false) -> 0; case G1(T1 t, true) -> 1; } The above switch is exhaustive on G1, but a lot has to happen to type-check the above switch: ?- We need to gather the constraint of B=int in both cases, from the nested type pattern T1 t. ?- We need to flow B=int into the body, so that assignment of int to bb can proceed. ?- We need to type-check that the generics in the cases is compatible with the target ?- (bonus) infer T=bool if not specified explicitly ?- We need to conclude that these two cases are exhaustive on G, which involves observing that T2 is not allowable here, so T1 covers T, and therefore (T1, false) and (T1, true) cover T x bool. I hope these examples illustrate that these cases are not silly, made-up cases; the hierarchy above is a sensible way to model expressions.? We are not going to solve all these problems immediately, but we need a story for getting there; otherwise we leave users with some bad choices, most of which involve raw types or unchecked casts. ## A shopping list So, we have quite a shopping list for type checking patterns. Some show up immediately as soon as we have record patterns; others will come later as we add explicit dtors. 
#### Well formedness checking

Records can be generic, and we can nest patterns in record patterns. Record patterns need a well-formedness check to ensure that any nested patterns are sound. In the absence of inference, this is mostly a straightforward recursive application of the cast conversion test.

We currently interpret a type pattern:

    case ArrayList a:

as a raw ArrayList; I think this may be a mistake for type patterns, but it surely will be a mistake for record patterns. If we interpret:

    case Box(...)

as a raw box, we lose useful type information with which to gauge exhaustiveness. In any case, if we have generic type parameters:

    case Box<String>(...)

we can use the type parameters to refine the component types, which affects the recursive well-formedness check.

#### Inference

I think we should be using inference routinely on record/dtor patterns, and we should also consider doing the same on type patterns. So this means interpreting

    case Box(...)

as "implicit diamond". (We can talk about implicit vs explicit diamond, but that's mostly a syntax distraction, and you know the rules: no syntax discussions until there is consensus on semantics.)

Inference in this case is different from what we do for methods (like everything else: well formedness, overload selection) because we're doing everything in reverse. I'll sketch out an inference algorithm later, but a key point is that we have to solve nested generic patterns together with the outer pattern (as we do with methods.) This then feeds more type information into the well formedness check, which is how we'd type check the IfNode case or the G1 case.

Inference on type variables that flow in from the switch target can be flowed into the case bodies.

#### Exhaustiveness

I think what we've identified for exhaustiveness is enough.

#### Overload selection

When we add explicit dtor patterns, now we need to do overload selection. This is like method overload selection, but again key parts run in reverse.

When we add in primitive patterns with conversions, or varargs patterns, overload selection will need to reflect the same multi-phase applicability checking that we have for method overload selection. Details to come at the appropriate time.

From forax at univ-mlv.fr  Wed Mar 30 20:59:15 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 30 Mar 2022 22:59:15 +0200 (CEST)
Subject: Patterns and GADTs (and type checking and inference and overload selection)
In-Reply-To: <35905cda-2b1f-bbee-180a-5b89b10e9e66@oracle.com>
References: <35905cda-2b1f-bbee-180a-5b89b10e9e66@oracle.com>
Message-ID: <873025808.4391498.1648673955269.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz" 
> To: "amber-spec-experts" 
> Sent: Wednesday, March 30, 2022 9:33:21 PM
> Subject: Patterns and GADTs (and type checking and inference and overload selection)

> GADTs -- sealed families whose permitted subtypes specialize the type variables of the base class -- pose some interesting challenges for pattern matching.
>
> (Remi: this is a big, complex area. Off-the-cuff "this is wrong" or "you should X instead" replies are not helpful. If in doubt, ask questions. One comprehensive reply is more useful than many small replies. Probably best to think about the whole thing for some time before responding.)

No disagreement here, it's a nice summary of where we are and what the challenges ahead are.
Rémi

From forax at univ-mlv.fr  Thu Mar 31 00:50:50 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 31 Mar 2022 02:50:50 +0200 (CEST)
Subject: [External] : Re: Remainder in pattern matching
In-Reply-To: 
References: <4db9c947-48c6-f51c-cf48-c805b17e5a50@oracle.com>
Message-ID: <675774414.4428834.1648687850706.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Dan Heidinga" 
> To: "Brian Goetz" 
> Cc: "amber-spec-experts" 
> Sent: Wednesday, March 30, 2022 8:43:42 PM
> Subject: Re: [External] : Re: Remainder in pattern matching

> On Wed, Mar 30, 2022 at 2:38 PM Brian Goetz wrote:
>> [...]
>
> And from your other response:
>
>> Another thing it gains is that it discourages people from thinking they can use exceptions in dtors; having these laundered through MatchException discourages using this as a side channel, though that's a more minor thing.
>
> This is a stronger argument than you give it credit for being. Wrapping the exception adds a bit of friction to doing the wrong thing which will pay off in helping guide users to the intended behaviour.

Wrapping exceptions into a MatchException seems a very bad idea to me.

When you compute something on an AST, the pattern matching is recursive, so if an exception occurs, instead of having one exception with a long stacktrace, we will get a linked list of MatchExceptions, each of them with a long stacktrace.

> --Dan

Rémi

>> On 3/30/2022 2:12 PM, Dan Heidinga wrote:
>>
>> The rules regarding NPE, ICCE and MatchException look reasonable to me.
>>
>> As a separate but not-separate exception problem, we have to deal with at least two additional sources of exceptions:
>>
>> - A dtor / record accessor may throw an arbitrary exception in the course of evaluating whether a case matches.
>>
>> - User code in the switch may throw an arbitrary exception.
>>
>> For the latter, this has always been handled by having the switch terminate abruptly with the same exception, and we should continue to do this.
>>
>> For the former, we surely do not want to swallow this exception (such an exception indicates a bug). The choices here are to treat this the same way we do with user code, throwing it out of the switch, or to wrap with MatchException.
>>
>> I prefer the latter -- wrapping with MatchException -- because the exception is thrown from synthetic code between the user code and the ultimate thrower, which means the pattern matching feature is mediating access to the thrower. I think we should handle this as "if a pattern invoked from pattern matching completes abruptly by throwing X, pattern matching completes abruptly with MatchException", because the specific X is not a detail we want the user to bind to. (We don't want them to bind to anything, but if they do, we want them to bind to the logical action, not the implementation details.)
>>
>> My intuition (and maybe I have the wrong mental model?) is that the pattern matching calling a user-written dtor / record accessor is akin to calling a method. We don't wrap the exceptions thrown by methods apart from some very narrow cases (ie: reflection), and I thought part of reflection's behaviour was related to needing to ensure exceptions (particularly checked ones) were converted to something explicitly handled by the caller.
>>
>> If the dtor / record accessor can declare they throw checked exceptions, then I can kind of see the rationale for wrapping them.
>> Otherwise, it seems clearer to me to let them be thrown without wrapping.
>>
>> I don't think we expect users to explicitly handle MatchException when using pattern matching so what does wrapping gain us here?
>>
>> --Dan

From brian.goetz at oracle.com  Thu Mar 31 16:50:20 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 31 Mar 2022 12:50:20 -0400
Subject: [External] : Re: Remainder in pattern matching
In-Reply-To: 
References: <4db9c947-48c6-f51c-cf48-c805b17e5a50@oracle.com>
Message-ID: <2e5ceb1d-206d-c410-517d-b4d1db6ca713@oracle.com>

Here's some candidate spec text for MatchException:

Prototype spec for MatchException (a preview API class).

Thrown to indicate an unexpected failure in pattern matching.

MatchException may be thrown when an exhaustive pattern matching language construct (such as a switch expression) encounters a match target that does not match any of the provided patterns at runtime. This can arise from a number of cases:

 - Separate compilation anomalies, where a sealed interface has a different set of permitted subtypes at runtime than it had at compilation time, an enum has a different set of constants at runtime than it had at compilation time, or the type hierarchy has changed in incompatible ways between compile time and run time.

 - Null targets and sealed types. If an interface or abstract class `C` is sealed to permit `A` and `B`, then the set of record patterns `R(A a)` and `R(B b)` are exhaustive on a record `R` whose sole component is of type `C`, but neither of these patterns will match `new R(null)`.

 - Null targets and nested record patterns. Given a record type `R` whose sole component is `S`, which in turn is a record whose sole component is `String`, then the nested record pattern `R(S(String s))` will not match `new R(null)`.

Match failures arising from unexpected inputs will generally throw `MatchException` only after all patterns have been tried; even if `R(S(String s))` does not match `new R(null)`, a later pattern (such as `R r`) may still match the target.

MatchException may also be thrown when operations performed as part of pattern matching throw an unexpected exception. For example, pattern matching may cause methods such as record component accessors to be implicitly invoked in order to extract pattern bindings. If these methods throw an exception, execution of the pattern matching construct may fail with `MatchException`.

On 3/30/2022 2:43 PM, Dan Heidinga wrote:
> On Wed, Mar 30, 2022 at 2:38 PM Brian Goetz wrote:
>> Another way to think about this is:
>>
>>  - If any of the code that the user actually wrote (the RHS of case clauses, or guards on case labels) throws, then the switch throws that
>>  - If any of the machinery of the switch dispatch throws, it throws MatchException.
>
> That's a reasonable way to factor this and makes the difference between the machinery and the direct user code clear, even when looking at stacktraces.
>
> And from your other response:
>
>> Another thing it gains is that it discourages people from thinking they can use exceptions in dtors; having these laundered through MatchException discourages using this as a side channel, though that's a more minor thing.
>
> This is a stronger argument than you give it credit for being. Wrapping the exception adds a bit of friction to doing the wrong thing which will pay off in helping guide users to the intended behaviour.
> --Dan
>
>> On 3/30/2022 2:12 PM, Dan Heidinga wrote:
>>
>> The rules regarding NPE, ICCE and MatchException look reasonable to me.
>>
>> As a separate but not-separate exception problem, we have to deal with at least two additional sources of exceptions:
>>
>> - A dtor / record accessor may throw an arbitrary exception in the course of evaluating whether a case matches.
>>
>> - User code in the switch may throw an arbitrary exception.
>>
>> For the latter, this has always been handled by having the switch terminate abruptly with the same exception, and we should continue to do this.
>>
>> For the former, we surely do not want to swallow this exception (such an exception indicates a bug). The choices here are to treat this the same way we do with user code, throwing it out of the switch, or to wrap with MatchException.
>>
>> I prefer the latter -- wrapping with MatchException -- because the exception is thrown from synthetic code between the user code and the ultimate thrower, which means the pattern matching feature is mediating access to the thrower. I think we should handle this as "if a pattern invoked from pattern matching completes abruptly by throwing X, pattern matching completes abruptly with MatchException", because the specific X is not a detail we want the user to bind to. (We don't want them to bind to anything, but if they do, we want them to bind to the logical action, not the implementation details.)
>>
>> My intuition (and maybe I have the wrong mental model?) is that the pattern matching calling a user-written dtor / record accessor is akin to calling a method. We don't wrap the exceptions thrown by methods apart from some very narrow cases (ie: reflection), and I thought part of reflection's behaviour was related to needing to ensure exceptions (particularly checked ones) were converted to something explicitly handled by the caller.
>>
>> If the dtor / record accessor can declare they throw checked exceptions, then I can kind of see the rationale for wrapping them. Otherwise, it seems clearer to me to let them be thrown without wrapping.
>>
>> I don't think we expect users to explicitly handle MatchException when using pattern matching so what does wrapping gain us here?
>>
>> --Dan
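To make the "null targets and nested record patterns" bullet in the candidate spec text above concrete, here is a small illustrative sketch (the record names mirror the `R` and `S` used in the draft; the MatchException outcome is what the draft proposes, not behaviour that exists today):

    record S(String s) { }
    record R(S s) { }

    static int len(R r) {
        return switch (r) {
            // R(S(String s)) is exhaustive on R under the proposed rules,
            // but it does not match new R(null): matching would have to
            // deconstruct a null S.  Under the draft spec above, evaluating
            // len(new R(null)) would complete abruptly with MatchException.
            case R(S(String s)) -> s.length();
        };
    }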