String Interpolation

forax at univ-mlv.fr
Wed Oct 13 22:28:05 UTC 2021


> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>, "amber-spec-experts"
> <amber-spec-experts at openjdk.java.net>
> Sent: Wednesday, October 13, 2021 21:32:19
> Subject: Re: String Interpolation

> The ability to capture per-call-site computation so it could be done exactly
> once (including generating an MH to describe it) has been part of the goal all
> along. The JEP is deliberately cagey about this because we didn't want to
> descend down the translation rabbit hole before we'd achieved consensus on the
> broad strokes, any more than we wanted to descend down the syntax rabbit hole.

> (FWIW, all of these side-paths were ones we already traveled and rejected for
> various reasons :)

> As you correctly point out, without something like type classes, associating a
> static method like a bootstrap with a class requires committing some sort of
> sin, such as the "magic names" sins committed by serialization.
The current syntax is something like 
String."a text" 

There is no method name, so we basically have two choices. Either we make the syntax more like a method call, which is what Scala does,
String.method"a text" 

or we specify what I would call a protocol.
A protocol is like a method call, but enhanced with an ad hoc syntax and constraints.

For example, the Java constructor is a protocol: we give the method the same name as the class and do not specify a return type, and magically it becomes a constructor with its own set of rules.

Soon we will introduce user-defined pattern methods. This also needs a protocol; the current proposal for them is to use a special modifier like destructor or pattern.

If you prefer to use a special modifier, I'm fine with that; if you think it's better to change the use site to specify a method name, I'm fine with that too.

> We surely didn't want to do that either.

>> - we also want to be able to instantiate regex Pattern,
>>   and have a magic optimisation that creates the Pattern instance only once

>>   Pattern pattern = Pattern."foo|bar";

> You said the magic anti-word, which is "magic". We don't want this to be magic.
> (Examples like this are better treated as a form of optimistic constant
> folding, along the lines explored at my JVMLS talk a few years ago.)

> Summary: wait for constant folding.
I don't like constant folding for several reasons: it's one size fits all, you cannot specify in the code how you transform the format before it is constant folded, and the dumber the Java compiler is, the better.
Constant folding is the kind of feature that tends to interact with every other feature (recent example: case Foo foo vs case Foo foo && true vs case Foo foo && 2 == 2).

>> I think the simplest way to specify an interpolation method is to have a method
>> with a special name,
>> I will use __interpolate__ because I don't want to discuss the exact syntax
>> here.

> This is committing the same "magic name" sin as serialization. We deliberately
> avoided this in the design. When we have type classes, we'll be able to use
> that as a way to bridge from a type name to a witness to a particular class.
> Our design was crafted so that it could be gracefully extended to such a
> mechanism, when it is available (using a type name instead of an instance
> reference at the use site.)

> Summary: wait for type classes.
Adding type classes may solve how to specify a contract on a static method, but it does not solve the fact that you want the signature of the method (static or not) to be polymorphic.
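To make "polymorphic signature" concrete, here is a small sketch using the one signature-polymorphic API that Java already has, MethodHandle.invokeExact. The class name PolymorphicSignatureDemo and the method describe are made up for the illustration; they are not part of any proposal. The call-site descriptor is derived from the static types of the arguments, so the int is passed unboxed, and a boxed call is rejected at runtime.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

final class PolymorphicSignatureDemo {
  // Hypothetical target with a precise, unboxed signature: (String, int) -> String.
  static String describe(String name, int age) {
    return "name: " + name + " age: " + age;
  }

  private static MethodHandle target() throws ReflectiveOperationException {
    return MethodHandles.lookup().findStatic(PolymorphicSignatureDemo.class, "describe",
        MethodType.methodType(String.class, String.class, int.class));
  }

  // javac emits the descriptor (String,int)String for this call: no boxing, no varargs array.
  static String callExact(String name, int age) {
    try {
      return (String) target().invokeExact(name, age);
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }

  // The same call with a boxed argument has the descriptor (String,Object)String,
  // which does not match the target type, so invokeExact refuses it.
  static boolean boxedCallIsRejected() {
    try {
      Object boxed = 3;
      String unused = (String) target().invokeExact("Ana", boxed);
      return false;
    } catch (WrongMethodTypeException e) {
      return true;
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }

  public static void main(String[] args) {
    System.out.println(callExact("Ana", 3));      // prints "name: Ana age: 3"
    System.out.println(boxedCallIsRejected());    // prints "true"
  }
}
```

An interface method, with or without type classes, fixes one descriptor for all call sites; a signature-polymorphic call lets each call site carry its own.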

>> That's why the specification allows you to provide a second, more optimised
>> version of the interpolation method using a method __interpolate__bootstrap__.

> This is an obviously attractive goal, but the mechanism is way too ad-hoc -- and
> also too limited -- and also too advanced to be a language feature. Bootstraps
> are way too complicated to expose in the source language in this way,
> especially not this magically. And it's too ad-hoc, since it's specific to the
> interpolation feature, whereas one could imagine a number of other contexts
> where it is useful too. So this is a bad tradeoff in many ways. Jim's
> implementation very cleverly gets the equivalent of this using pure library
> implementation (which leans on MutableCallSite.)
Using a MutableCallSite as a way to devirtualize something you have arbitrarily specified as virtual is the tail wagging the dog.

I've written a library [1] that uses MutableCallSite where it should use ConstantCallSite, to work around the inability of javac to generate an invokedynamic.
But at least I always felt guilty about it.

Adding mutable call sites to the Java runtime is a mistake; the performance model is really tricky.
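For readers less familiar with the two kinds of call sites, a minimal sketch of the difference, using only java.lang.invoke (the class name CallSiteKinds is made up for the illustration). A ConstantCallSite is linked once and can never be retargeted; a MutableCallSite can change its target after linkage, which is exactly what makes its performance harder to reason about.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

final class CallSiteKinds {
  // A constant call site: the target is fixed at construction.
  static String constantSite() {
    try {
      CallSite site = new ConstantCallSite(MethodHandles.constant(String.class, "linked once"));
      return (String) site.dynamicInvoker().invokeExact();
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }

  // Retargeting a constant call site always fails.
  static boolean constantSiteRejectsRetarget() {
    CallSite site = new ConstantCallSite(MethodHandles.constant(String.class, "linked once"));
    try {
      site.setTarget(MethodHandles.constant(String.class, "other"));
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  // A mutable call site: the same invoker observes a new target after setTarget.
  static String[] mutableSite() {
    try {
      MutableCallSite site = new MutableCallSite(MethodHandles.constant(String.class, "first"));
      MethodHandle invoker = site.dynamicInvoker();
      String before = (String) invoker.invokeExact();
      site.setTarget(MethodHandles.constant(String.class, "second"));
      String after = (String) invoker.invokeExact();
      return new String[] { before, after };
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }

  public static void main(String[] args) {
    System.out.println(constantSite());                          // prints "linked once"
    System.out.println(String.join(" -> ", mutableSite()));      // prints "first -> second"
  }
}
```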

> While it is surely a desirable goal to be able to optimize formatter
> implementation, it is also super-easy to become obsessed with this, and give it
> a bigger place in the feature than it deserves. For some cases -- notably
> String::format -- there are huge savings to be had (from a number of sources,
> not least of which is that scanning the string at every invocation and choosing
> a strategy based on that is expensive.) But in other cases, it is almost
> irrelevant. For pure concatenation, it is already pretty fast; for SQL, the
> cost of constructing the query is a tiny part of the execution time, so it's not
> even worth optimizing. So this is a "nice to have" rather than the centerpiece
> of the feature.

> To be clear, the centerpiece is the gathering up of a template + parameters so
> that their combination can be handled by another entity, whether right now,
> later, or never. Optimizing the case where it is done right now, using a
> predictable choice of entity, is an optimization, but not the centerpiece.

> Let me sketch out how we're envisioning this. The API is something like:

> interface TemplatePolicy<T, E extends Exception> {
>     T apply(TemplatedString ts);

>     // returns MethodHandle (TemplatePolicy, TemplatedString) -> Object
>     default MethodHandle asMethodHandle(TemplatedString ts) {
>         return MH[TemplatePolicy::apply]
>     }
> }
Let's not talk about the bootstrap method for a second. 
This API fails to tell the compiler which parameter types are allowed when calling the template. For example, I may want to specify a query as a String but accept only expressions of type Expression as arguments.
This API forces the implementation to be ready to accept arguments of any type (and those have to be boxed).

And you are re-inventing a strawman way to implement JSR 292; your design is actually quite close to the early designs of Gilad Bracha.
At some point, you will want:
- to share the same TemplatePolicy without using inheritance: you will re-invent the Lookup object
- to avoid the unnecessary boxing: you will pass the MethodType as a parameter
- to avoid having a PIC (Polymorphic Inline Cache) for things as simple as always returning the same constant: you will make the API a function call, not a method call
- to avoid waiting until all arguments are on the stack, and to separate dynamic arguments from constant arguments: you will re-invent the bootstrap API

The reason you will gravitate toward the bootstrap API is that fundamentally, it's a way to specify a linker in Java code, which is what you want here.
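A minimal sketch of what "a linker in Java code" means, with the invokedynamic simulated by calling the bootstrap method by hand. The names LinkerSketch, render and the "age: " prefix are made up for the illustration. The bootstrap receives the Lookup, the MethodType of the call site and the constant arguments once, at link time; the dynamic arguments only show up at each invocation, with their exact types.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class LinkerSketch {
  static int linkCount = 0;

  // Hypothetical implementation: the constant part comes first, then the dynamic argument.
  static String render(String prefix, int value) {
    return prefix + value;
  }

  // Same shape as an invokedynamic bootstrap: Lookup, name, MethodType, then the constant arguments.
  static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type, String prefix)
      throws ReflectiveOperationException {
    linkCount++;  // runs once per call site, not once per call
    MethodHandle render = lookup.findStatic(LinkerSketch.class, "render",
        MethodType.methodType(String.class, String.class, int.class));
    // Bind the constant now; only the dynamic int remains as a parameter.
    return new ConstantCallSite(render.bindTo(prefix).asType(type));
  }

  static String run() {
    try {
      linkCount = 0;
      MethodType type = MethodType.methodType(String.class, int.class);
      // Stand-in for what the JVM does when it links an invokedynamic instruction.
      MethodHandle site = bootstrap(MethodHandles.lookup(), "interpolate", type, "age: ").dynamicInvoker();
      String a = (String) site.invokeExact(1);
      String b = (String) site.invokeExact(2);
      return a + ", " + b + ", links=" + linkCount;
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }

  public static void main(String[] args) {
    System.out.println(run());  // prints "age: 1, age: 2, links=1"
  }
}
```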

> The API specification has a number of constraints on the implementation of
> asMethodHandle, which I'll get to in a second. When the compiler encounters an
> immediate application P."...", it generates an indy, which uses a special
> bootstrap that returns a MutableCallSite. The MutableCallSite initially has as
> its target a special secondary bootstrap MH, which represents an interpolation
> site that has not yet seen an actual invocation. The secondary bootstrap MH has
> the shape of TemplatePolicy::apply (e.g., (TemplatePolicy, TemplatedString) ->
> Object), so on first invocation it receives the TP object and the TS. It then
> calls TP::asMethodHandle, and wraps this MH with a GWT which validates the
> invariants and proceeds to that MH if they hold -- which they will 99.x% of the
> time.

> The invariant is that the dynamic type of the per-instantiation TP be == to the
> dynamic type of the TP that was present at secondary linkage. That is, it be an
> instance of the same class, but not the same instance. By definition, the
> string will always be the same as will the types of the parameters, since this
> is specific to concrete P."..." sites. So the MH can take advantage of that.

> The constraint on TP::asMethodHandle is that it not undermine this invariant;
> that if it generates a MH that is dependent on TP state, it not bake that state
> into the resulting MH, but instead, treat the TP state as a parameter. Further,
> the MH must be behaviorally equivalent to calling apply.

> If the GWT fails, it means the user is doing something like:

> for (TP p : listOfProcessors) {
>     blah blah p."foo \{a}"
> }

> in which case the GWT falls back to the "just do an invokevirtual of TP::apply"
> strategy. (It could get fancier but I don't see any point.)

> This lets us rescue indy-based translation without exposing a magic indy-hook in
> the JLS. (Sorry, I know you wanted the magic indy hook.)
The issue is not about me asking you to add a magic hook; once you have an API that returns a MethodHandle used by an invokedynamic, you are providing a magic hook.
The issue is that in your attempt not to provide a magic hook, you are providing a Smalltalk-like magic hook, where all type information is lost (no type checking by the compiler, no way to get the type information at runtime to avoid boxing) and with a crappy performance model (boxing again, plus a PIC for devirtualizing something which should not be virtual).
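To make the discussion concrete, here is my reading of the scheme described above, reduced to a sketch: a MutableCallSite whose first call installs a guardWithTest on the class of the policy. Policy, Upper, Lower and all the method names are made-up stand-ins, not the API of the JEP, and APPLY stands in for whatever asMethodHandle would return. Note that the call site type is (Policy, String) -> Object whatever the policy does, which is the loss of type information I'm talking about.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class GuardedSiteSketch {
  // Hypothetical stand-in for TemplatePolicy, with a plain String as the template.
  interface Policy { Object apply(String template); }
  static final class Upper implements Policy {
    public Object apply(String template) { return template.toUpperCase(); }
  }
  static final class Lower implements Policy {
    public Object apply(String template) { return template.toLowerCase(); }
  }

  static int slowCalls = 0;
  static final MethodType SITE_TYPE = MethodType.methodType(Object.class, Policy.class, String.class);
  static final MethodHandle APPLY, SLOW_PATH, SAME_CLASS, FIRST_CALL;

  static {
    try {
      MethodHandles.Lookup lookup = MethodHandles.lookup();
      APPLY = lookup.findVirtual(Policy.class, "apply", MethodType.methodType(Object.class, String.class));
      SLOW_PATH = lookup.findStatic(GuardedSiteSketch.class, "slowPath", SITE_TYPE);
      SAME_CLASS = lookup.findStatic(GuardedSiteSketch.class, "sameClass",
          MethodType.methodType(boolean.class, Class.class, Policy.class));
      FIRST_CALL = lookup.findStatic(GuardedSiteSketch.class, "firstCall",
          SITE_TYPE.insertParameterTypes(0, MutableCallSite.class));
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // The invariant: same class as the policy seen at secondary linkage.
  static boolean sameClass(Class<?> expected, Policy policy) {
    return policy.getClass() == expected;
  }

  // Fallback when the guard fails: a plain virtual call.
  static Object slowPath(Policy policy, String template) {
    slowCalls++;
    return policy.apply(template);
  }

  // Secondary bootstrap: runs on the first invocation, then replaces itself with the guarded handle.
  static Object firstCall(MutableCallSite site, Policy policy, String template) throws Throwable {
    MethodHandle guard = SAME_CLASS.bindTo(policy.getClass());
    site.setTarget(MethodHandles.guardWithTest(guard, APPLY, SLOW_PATH));
    return APPLY.invokeExact(policy, template);
  }

  static String run() {
    try {
      slowCalls = 0;
      MutableCallSite site = new MutableCallSite(SITE_TYPE);
      site.setTarget(FIRST_CALL.bindTo(site));
      MethodHandle invoker = site.dynamicInvoker();
      Object a = invoker.invokeExact((Policy) new Upper(), "Foo");  // links the site for Upper
      Object b = invoker.invokeExact((Policy) new Upper(), "Bar");  // guard holds
      Object c = invoker.invokeExact((Policy) new Lower(), "Baz");  // guard fails, slow path
      return a + " " + b + " " + c + " slow=" + slowCalls;
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }

  public static void main(String[] args) {
    System.out.println(run());  // prints "FOO BAR baz slow=1"
  }
}
```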

Rémi 

[1] https://github.com/forax/exotic 

> On 10/13/2021 1:09 PM, Remi Forax wrote:

>> Hi everybody, I've spent some time thinking about how the String interpolation +
>> Policy should be specified and implemented.

>> The goal is to add a syntax specifying a user-defined method to "interpolate"
>> (for lack of a better word) a string with arguments.

>> Given that it's a method, the exact semantics of the interpolation, things like
>> how the arguments are escaped and how the formatted string is parsed, are written
>> in Java, which allows it to support a wide range of use cases.

>> This proposal does not differ from the original proposal of Brian and Jim in its
>> goal but in the way a user declares the interpolation method(s).

>> TL;DR: you can declare an interpolation method and optionally an interpolation
>> bootstrap method if you want more efficient code at the price of having to
>> play with the method handle API.

>> ---

>> The proposal of Brian and Jim uses an interface to define the policy, but in this
>> case, using an interface is not what we want.
>> I think there are two main reasons:
>> - the interpolation method can be an instance method but can also be a factory
>> method, a static method, and an interface cannot constrain a static method.
>> - we want the signature of the interpolation method to be free to use any number
>> of parameters of any types, something that cannot be specified with type
>> parameters in Java.

>> So let's take a step back and write some examples. As a user of the
>> interpolation method, we want to
>> - be able to specify string interpolation,
>>   you can notice that this is a static method.

>>   String name = ...
>>   int age = ...
>>   String s = String."name: \(name) age: \(age)";

>> - we also want to be able to instantiate regex Pattern,
>>   and have a magic optimisation that creates the Pattern instance only once

>>   Pattern pattern = Pattern."foo|bar";

>> - we also want to support instance methods, so the interpolation can escape the
>> arguments differently depending on the context,
>>   here, for example, escaping differently depending on the database driver.

>>   String username = ...
>>   Connection connection = ...
>>   connection."""
>>     SELECT * FROM users WHERE user = "\(username)"
>>     """;

>> I think the simplest way to specify an interpolation method is to have a method
>> with a special name,
>> I will use __interpolate__ because I don't want to discuss the exact syntax
>> here.

>> This method can be a static method or an instance method and has a restriction,
>> the first parameter has to be a String because the first argument is the
>> formatted string.

>> Here is an example of how the method __interpolate__ inside java.lang.String can
>> be written.
>> To spare everybody from re-implementing the parsing of the formatted string, the
>> class java.lang.runtime.InterpolateMetafactory provides a helper method
>> "formatIterator" that returns an iterator splitting the formatted string into
>> text and binding.

>>   package java.lang;

>>   public class String {
>>     ...
>>     public static String __interpolate__(String format, Object... args) {
>>       var i = 0;
>>       var builder = new StringBuilder();
>>       var iterator = InterpolateMetafactory.formatIterator(format);
>>       while(iterator.hasNext()) {
>>         switch(iterator.next()) {
>>           case Text(var text) -> builder.append(text);
>>           case Binding binding -> builder.append(args[i++]);
>>         }
>>       }
>>       return builder.toString();
>>     }
>>     ...
>>   }
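For anyone who wants to run the idea today, a sketch of the same method outside java.lang. InterpolateSketch, its nested types and the method tokens are made-up stand-ins for String, Token and formatIterator, and the regular expression is only an assumed parsing of the \(name) holes, good enough for the example. It uses instanceof patterns rather than a pattern switch so that it compiles on Java 17.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InterpolateSketch {
  // Stand-ins for the Token hierarchy of the proposal.
  sealed interface Token {}
  record Text(String text) implements Token {}
  record Binding(String name) implements Token {}

  // Matches a hole written as \(identifier) in the format.
  private static final Pattern HOLE = Pattern.compile("\\\\\\((\\w+)\\)");

  // Hypothetical equivalent of formatIterator: splits the format into text and bindings.
  static List<Token> tokens(String format) {
    List<Token> result = new ArrayList<>();
    Matcher matcher = HOLE.matcher(format);
    int last = 0;
    while (matcher.find()) {
      if (matcher.start() > last) {
        result.add(new Text(format.substring(last, matcher.start())));
      }
      result.add(new Binding(matcher.group(1)));
      last = matcher.end();
    }
    if (last < format.length()) {
      result.add(new Text(format.substring(last)));
    }
    return result;
  }

  // Text tokens are copied as is, each binding consumes the next argument.
  static String interpolate(String format, Object... args) {
    int i = 0;
    StringBuilder builder = new StringBuilder();
    for (Token token : tokens(format)) {
      if (token instanceof Text text) {
        builder.append(text.text());
      } else {
        builder.append(args[i++]);
      }
    }
    return builder.toString();
  }

  public static void main(String[] args) {
    // prints "name: Ana age: 3"
    System.out.println(interpolate("name: \\(name) age: \\(age)", "Ana", 3));
  }
}
```

Note that this version boxes the int into the Object[] and re-parses the format at every call, which is the cost the bootstrap variant is meant to remove.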

>> While this is nice, you may think that it's just syntactic sugar and it will not
>> be more performant than String.valueOf(), i.e. it will be slow.

>> That's why the specification allows you to provide a second, more optimised
>> version of the interpolation method using a method __interpolate__bootstrap__.
>> The method __interpolate__bootstrap__ is optional, but it cannot replace the
>> method __interpolate__: when __interpolate__bootstrap__ is declared,
>> __interpolate__ has to be present too. Adding a method
>> __interpolate__bootstrap__ after the fact is a backward compatible change;
>> there is no need to recompile the client code.

>> For that, the compiler translation relies on invokedynamic to call the method
>> bootstrap of the class InterpolateMetafactory, which at runtime decides
>> to trampoline either to the method __interpolate__bootstrap__ or to the method
>> __interpolate__ if no __interpolate__bootstrap__ exists.

>> Here is an example of how a call to the interpolation method of String is
>> generated by javac.
>> For the Java code

>>   String name = ...
>>   int age = ...
>>   String s = String."name: \(name) age: \(age)";

>> the equivalent bytecode is

>>   aload_1   // load name
>>   iload_2   // load age
>>   invokedynamic __interpolate__ (Ljava/lang/String;I)Ljava/lang/String;
>>    java.lang.runtime.InterpolateMetafactory.bootstrap(Lookup, String, MethodType,
>>     String, MethodHandle):CallSite
>>    [ "name: \(name) age: \(age)", String::__interpolate__(String, Object[]):String
>>     ]

>> From the perspective of the compiler, the method __interpolate__ works exactly
>> like a method with a polymorphic signature (a method annotated with
>> @PolymorphicSignature),
>> so the descriptor of invokedynamic is created by collecting the types of the
>> arguments. Here the interpolation method is called with a String and an int,
>> and the return type is String, so the descriptor is
>> (Ljava/lang/String;I)Ljava/lang/String;

>> Considering the interpolation method as a polymorphic method is important in
>> terms of performance because it means that no boxing will be done by the
>> compiler. If there is some boxing, it will be done by the runtime, so it is
>> optional if the __interpolate__bootstrap__ does not need to box arguments.

>> You can also notice that the formatted string is passed as a bootstrap constant,
>> so all the parsing of the format can be done once, outside of the hot path.
>> The invokedynamic also passes, as a second bootstrap argument, the method
>> handle to the method __interpolate__, so the implementation inside
>> InterpolateMetafactory.bootstrap can call this method if no method
>> __interpolate__bootstrap__ exists.

>> Here is a raw implementation of the class InterpolateMetafactory.
>> The method formatIterator() returns an Iterator of Token, which is a sealed
>> interface.
>> The method bootstrap() first looks up a method "__interpolate__bootstrap__" in
>> the lookup class that takes a Lookup, a String, a MethodType, the format and
>> the default implementation, and calls it if it exists. Otherwise it takes the
>> default implementation, binds the formatted String and adapts the arguments
>> using asType (asking for boxing, etc).

>>    package java.lang.runtime;

>>    public class InterpolateMetafactory {
>>      public sealed interface Token {
>>        public record Text(String text) implements Token {}
>>        public record Binding(String name) implements Token {}
>>      }

>>      public static Iterator<Token> formatIterator(String format) {
>>        ...
>>      }

>>     public static CallSite bootstrap(Lookup lookup, String name, MethodType
>>      methodType, String format, MethodHandle impl) throws Throwable {
>>        // check if there is a bootstrap method
>>        MethodHandle bootstrap;
>>        try {
>>          bootstrap = lookup.findStatic(lookup.lookupClass(),
>>          "__interpolate__bootstrap__", MethodType.methodType(CallSite.class,
>>          Lookup.class, String.class, MethodType.class, String.class,
>>           MethodHandle.class));
>>        } catch(NoSuchMethodException e) {
>>          // bind the default implementation
>>          return new ConstantCallSite(impl.bindTo(format).asType(methodType));
>>        }
>>        return (CallSite) bootstrap.invoke(lookup, name, methodType, format, impl);
>>      }
>>    }

>> Here is another example, showing how to declare the methods __interpolate__ and
>> __interpolate__bootstrap__ inside java.util.regex.Pattern.
>> The "default" implementation calls Pattern.compile() and the optimized one
>> always returns the result of Pattern.compile() as a constant.

>>   package java.util.regex;

>>   public class Pattern {
>>    public static Pattern __interpolate__(String format) { // the formatted string
>>     cannot have arguments
>>       return Pattern.compile(format);
>>     }

>>    private static CallSite __interpolate__bootstrap__(Lookup lookup, String name,
>>     MethodType methodType, String format, MethodHandle impl) {
>>      return new ConstantCallSite(MethodHandles.constant(Pattern.class,
>>       Pattern.compile(format)));
>>     }
>>   }

>> The method __interpolate__ provides, via its signature, the parameter types that
>> are verified by the compiler.
>> It also provides code that can be used by tools that do static analysis
>> on the bytecode. Those tools cannot see through the method handle
>> returned by a bootstrap method: given that it's a runtime construct, it's
>> usually not available at the time the static analysis is done. This should be
>> enough for tools like GraalVM native image to see through the
>> invokedynamic, in a similar way to how they see through the invokedynamic used
>> when creating a lambda.

>> The fact that every invokedynamic goes through the method
>> InterpolateMetafactory.bootstrap and trampolines from it means that adding or
>> removing the method __interpolate__bootstrap__ is a binary compatible change,
>> if __interpolate__bootstrap__ is declared private. So implementing
>> __interpolate__bootstrap__ can be an afterthought.
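The trampoline can be tried without compiler support by calling the metafactory by hand. In this sketch TrampolineSketch.metafactory plays the role of InterpolateMetafactory.bootstrap, and Slow and Fast are made-up holder classes standing in for a class without and with an __interpolate__bootstrap__ (declared package-private here, only to keep the lookups simple). The call site is the same in both cases; only the holder class changes.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.regex.Pattern;

final class TrampolineSketch {
  static final MethodType BOOTSTRAP_TYPE = MethodType.methodType(CallSite.class,
      MethodHandles.Lookup.class, String.class, MethodType.class, String.class, MethodHandle.class);

  // Delegates to __interpolate__bootstrap__ when the lookup class declares one,
  // otherwise binds the format to the default implementation.
  static CallSite metafactory(MethodHandles.Lookup lookup, String name, MethodType type,
      String format, MethodHandle impl) throws Throwable {
    MethodHandle bootstrap;
    try {
      bootstrap = lookup.findStatic(lookup.lookupClass(), "__interpolate__bootstrap__", BOOTSTRAP_TYPE);
    } catch (NoSuchMethodException e) {
      return new ConstantCallSite(impl.bindTo(format).asType(type));
    }
    return (CallSite) bootstrap.invokeExact(lookup, name, type, format, impl);
  }

  // Holder without a bootstrap: the pattern is compiled at every call.
  static final class Slow {
    static Pattern __interpolate__(String format) { return Pattern.compile(format); }
    static MethodHandles.Lookup lookup() { return MethodHandles.lookup(); }
  }

  // Holder with a bootstrap: the pattern is compiled once, at link time.
  static final class Fast {
    static Pattern __interpolate__(String format) { return Pattern.compile(format); }
    static CallSite __interpolate__bootstrap__(MethodHandles.Lookup lookup, String name,
        MethodType type, String format, MethodHandle impl) {
      return new ConstantCallSite(MethodHandles.constant(Pattern.class, Pattern.compile(format)));
    }
    static MethodHandles.Lookup lookup() { return MethodHandles.lookup(); }
  }

  // Links one call site for the holder behind the lookup and calls it twice.
  static boolean sameInstanceTwice(MethodHandles.Lookup lookup) {
    try {
      MethodHandle impl = lookup.findStatic(lookup.lookupClass(), "__interpolate__",
          MethodType.methodType(Pattern.class, String.class));
      MethodHandle site = metafactory(lookup, "__interpolate__",
          MethodType.methodType(Pattern.class), "foo|bar", impl).dynamicInvoker();
      Pattern first = (Pattern) site.invokeExact();
      Pattern second = (Pattern) site.invokeExact();
      return first == second;
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }

  public static void main(String[] args) {
    System.out.println(sameInstanceTwice(Slow.lookup()));  // prints "false"
    System.out.println(sameInstanceTwice(Fast.lookup()));  // prints "true"
  }
}
```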

>> regards,
>> Rémi