String Interpolation

forax at univ-mlv.fr forax at univ-mlv.fr
Fri Oct 15 17:09:57 UTC 2021


> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>, "amber-spec-experts"
> <amber-spec-experts at openjdk.java.net>
> Sent: Wednesday, October 13, 2021 21:32:19
> Subject: Re: String Interpolation

After grumbling a lot, let's restart 

[...] 

>> That's why the specification allows you to provide a second more optimised
>> version of the interpolation method using a method __interpolate__bootstrap__.

> This is an obviously attractive goal, but the mechanism is way too ad-hoc -- and
> also too limited -- and also too advanced to be a language feature. Bootstraps
> are way too complicated to expose in the source language in this way,
> especially not this magically. And it's too ad-hoc, since it's specific to the
> interpolation feature, whereas one could imagine a number of other contexts
> where it is useful too. So this is a bad tradeoff in many ways. Jim's
> implementation very cleverly gets the equivalent of this using pure library
> implementation (which leans on MutableCallSite.)

> While it is surely a desirable goal to be able to optimize formatter
> implementation, it is also super-easy to become obsessed with this, and give it
> a bigger place in the feature than it deserves. For some cases -- notably
> String::format -- there are huge savings to be had (from a number of sources,
> not least of which is that scanning the string at every invocation and choosing
> a strategy based on that is expensive.) But in other cases, it is almost
> irrelevant. For pure concatenation, it is already pretty fast; for SQL, the
> cost of constructing the query is a tiny part of the execution time, so it's not
> even worth optimizing. So this is a "nice to have" rather than the centerpiece
> of the feature.

> To be clear, the centerpiece is the gathering up of a template + parameters so
> that their combination can be handled by another entity, whether right now,
> later, or never. Optimizing the case where it is done right now, using a
> predictable choice of entity, is an optimization, but not the centerpiece.

> Let me sketch out how we're envisioning this. The API is something like:

> interface TemplatePolicy<T, E extends Exception> {
> T apply(TemplatedString ts);

> // returns MethodHandle (TemplatePolicy, TemplatedString) -> Object
> default MethodHandle asMethodHandle(TemplatedString ts) {
> return MH[TemplatePolicy::apply]
> }
> }
I don't understand where the arguments are passed. Isn't it more something like:

public interface TemplatePolicy<T, E extends Exception> {
  T apply(TemplatedString template, Object... args) throws E;

  // returns a MethodHandle with the signature T(TemplatePolicy, Object...)
  default MethodHandle asMethodHandle(TemplatedString template, MethodType type) {
    ...
  }
}

The second parameter of asMethodHandle is the descriptor of the invokedynamic call site. This ensures that there is no boxing on the fast path, and no class check either if the implementation of TemplatePolicy is a final class.
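
To illustrate why passing the call-site descriptor helps, here is a minimal sketch, not the API under discussion. TemplatedString is reduced to a hypothetical list of literal fragments, ConcatPolicy and demo() are names made up for the example, and the fast path delegates to StringConcatFactory (an existing JDK class) so that the int argument is never boxed. It assumes the fragments contain no \1 or \2 characters, which the recipe format reserves.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;
import java.util.List;

final class TypedPolicySketch {
  // Hypothetical template shape: the literal fragments around each hole.
  record TemplatedString(List<String> fragments) {}

  interface TemplatePolicy<T, E extends Exception> {
    T apply(TemplatedString template, Object... args) throws E;

    // 'type' is the call-site descriptor, e.g. (ConcatPolicy, String, int)String
    MethodHandle asMethodHandle(TemplatedString template, MethodType type);
  }

  static final class ConcatPolicy implements TemplatePolicy<String, RuntimeException> {
    @Override
    public String apply(TemplatedString template, Object... args) {
      // slow path: the arguments arrive boxed in an Object[]
      var builder = new StringBuilder(template.fragments().get(0));
      for (int i = 0; i < args.length; i++) {
        builder.append(args[i]).append(template.fragments().get(i + 1));
      }
      return builder.toString();
    }

    @Override
    public MethodHandle asMethodHandle(TemplatedString template, MethodType type) {
      // In a StringConcatFactory recipe, the character \1 marks an argument slot.
      String recipe = String.join("\1", template.fragments());
      MethodType concatType = type.dropParameterTypes(0, 1); // drop the policy
      try {
        MethodHandle concat = StringConcatFactory
            .makeConcatWithConstants(MethodHandles.lookup(), "concat", concatType, recipe)
            .dynamicInvoker();
        // The policy is still received as first argument, it is just ignored here.
        return MethodHandles.dropArguments(concat, 0, type.parameterType(0));
      } catch (StringConcatException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  // Simulates one call site with the descriptor (ConcatPolicy, String, int)String.
  static String demo() {
    var template = new TemplatedString(List.of("name: ", " age: ", ""));
    var policy = new ConcatPolicy();
    var type = MethodType.methodType(String.class, ConcatPolicy.class, String.class, int.class);
    try {
      return (String) policy.asMethodHandle(template, type).invoke(policy, "Ada", 36);
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }
}
```

Because the returned handle already has the exact type of the call site, invoking it needs no asType adaptation, which is where the boxing would otherwise come from.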

> The API specification has a number of constraints on the implementation of
> asMethodHandle, which I'll get to in a second. When the compiler encounters an
> immediate application P."...", it generates an indy, which uses a special
> bootstrap that returns a MutableCallSite. The MutableCallSite initially has as
> its target a special secondary bootstrap MH, which represents an interpolation
> site that has not yet seen an actual invocation. The secondary bootstrap MH has
> the shape of TemplatePolicy::apply (e.g., (TemplatePolicy, TemplatedString) ->
> Object), so on first invocation it receives the TP object and the TS. It then
> calls TP::asMethodHandle, and wraps this MH with a GWT which validates the
> invariants and proceeds to that MH if they hold -- which they will 99.x% of the
> time.

> The invariant is that the dynamic type of the per-instantiation TP be == to the
> dynamic type of the TP that was present at secondary linkage. That is, it be an
> instance of the same class, but not the same instance. By definition, the
> string will always be the same as will the types of the parameters, since this
> is specific to concrete P."..." sites. So the MH can take advantage of that.

> The constraint on TP::asMethodHandle is that it not undermine this invariant;
> that if it generates a MH that is dependent on TP state, it not bake that state
> into the resulting MH, but instead, treat the TP state as a parameter. Further,
> the MH must be behaviorally equivalent to calling apply.

> If the GWT fails, it means the user is doing something like:

> for (TP p : listOfProcessors) {
> blah blah p."foo \{a}"
> }

> in which case the GWT falls back to the "just do an invokevirtual of TP::apply"
> strategy. (It could get fancier but I don't see any point.)
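
The linkage scheme described above can be sketched roughly as follows. This is only an illustration under assumptions: TemplatedString and TemplatePolicy are hypothetical stand-ins, newCallSite() plays the role of the special bootstrap, link() plays the role of the secondary bootstrap, and call() is a helper added for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class GuardedLinkageSketch {
  // Hypothetical stand-ins for the types discussed in the thread.
  record TemplatedString(String template) {}

  interface TemplatePolicy {
    Object apply(TemplatedString ts);

    // (TemplatePolicy, TemplatedString) -> Object; the default just calls apply.
    default MethodHandle asMethodHandle(TemplatedString ts) {
      return APPLY;
    }
  }

  static final MethodHandle APPLY, LINK, SAME_CLASS;
  static {
    try {
      var lookup = MethodHandles.lookup();
      APPLY = lookup.findVirtual(TemplatePolicy.class, "apply",
          MethodType.methodType(Object.class, TemplatedString.class));
      LINK = lookup.findStatic(GuardedLinkageSketch.class, "link",
          MethodType.methodType(Object.class, MutableCallSite.class,
              TemplatePolicy.class, TemplatedString.class));
      SAME_CLASS = lookup.findStatic(GuardedLinkageSketch.class, "sameClass",
          MethodType.methodType(boolean.class, Class.class, TemplatePolicy.class));
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // The invariant: same dynamic class as at linkage, not the same instance.
  static boolean sameClass(Class<?> expected, TemplatePolicy policy) {
    return policy.getClass() == expected;
  }

  // Runs on the first invocation: installs guard + fast path, then proceeds.
  static Object link(MutableCallSite site, TemplatePolicy policy, TemplatedString ts)
      throws Throwable {
    MethodHandle fast = policy.asMethodHandle(ts).asType(site.type());
    MethodHandle test = SAME_CLASS.bindTo(policy.getClass());
    // If the guard fails, fall back to a plain virtual call of apply.
    site.setTarget(MethodHandles.guardWithTest(test, fast, APPLY));
    return fast.invoke(policy, ts);
  }

  // The call site starts with a target that links itself on first use.
  static MutableCallSite newCallSite() {
    var site = new MutableCallSite(
        MethodType.methodType(Object.class, TemplatePolicy.class, TemplatedString.class));
    site.setTarget(LINK.bindTo(site));
    return site;
  }

  // Helper for the example: invokes the site the way an indy instruction would.
  static Object call(MutableCallSite site, TemplatePolicy policy, TemplatedString ts) {
    try {
      return site.dynamicInvoker().invoke(policy, ts);
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }
}
```

A policy of a different class than the one seen at linkage fails the guard and goes through apply, which corresponds to the loop over listOfProcessors shown above.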

> This lets us rescue indy-based translation without exposing a magic indy-hook in
> the JLS. (Sorry, I know you wanted the magic indy hook.)
As i said, i don't care about having the exact bootstrap API, but i care about the unnecessary boxing / class check / etc that can occur. 
I believe that if asMethodHandle() takes a MethodType as second parameter, performance should be Ok. 

Is it something that can be negotiated ? 

I've implemented a prototype to convince myself that with a MethodType as parameter it was not actually that bad:
https://github.com/forax/java-interpolation

(I also suppose that the TemplatedString is created with a constant dynamic ?) 

Rémi 

> On 10/13/2021 1:09 PM, Remi Forax wrote:

>> Hi everybody, i've spent some time thinking about how the String interpolation +
>> Policy should be specified and implemented.

>> The goal is to add a syntax specifying a user defined method to "interpolate"
>> (for lack of a better word) a string with arguments.

>> Given that it's a method, the exact semantics of the interpolation, things like
>> how the arguments are escaped or how the formatted string is parsed, are written
>> in Java, which makes it possible to support a wide range of use cases.

>> This proposal does not differ from the original proposal of Brian and Jim in its
>> goal but in the way a user declares the interpolation method(s).

>> TLDR; you can declare an interpolation method and optionally an interpolation
>> bootstrap method if you want more efficient code at the price of having to
>> play with the method handle API.

>> ---

>> The proposal of Brian and Jim uses an interface to define the policy but in this
>> case, using an interface is not what we want.
>> I think there are two main reasons:
>> - the interpolation method can be an instance method but can also be a factory
>> method, a static method, and an interface cannot constrain a static method.
>> - we want the signature of the interpolation method to be free to use any number
>> of parameters of any types, something that can not be specified with type
>> parameters in Java.

>> So let's take a step back and write some examples, as a user of the
>> interpolation method, we want to
>> - be able to specify string interpolation,
>>   you can notice that this is a static method.

>>   String name = ...
>>   int age = ...
>>   String s = String."name: \(name) age: \(age)";

>> - we also want to be able to instantiate regex Pattern,
>>   and have a magic optimisation that creates the Pattern instance only once

>>   Pattern pattern = Pattern."foo|bar";

>> - we also want to support instance methods, so the interpolation can escape the
>> arguments differently depending on the context,
>>   here for example, escaping differently depending on the database driver.

>>   String username = ...
>>   Connection connection = ...
>>   connection."""
>>     SELECT * FROM users where user == "\(username)"
>>     """;

>> I think the simplest way to specify an interpolation method is to have a method
>> with a special name,
>> i will use __interpolate__ because i don't want to discuss the exact syntax
>> here.

>> This method can be a static method or an instance method and has a restriction,
>> the first parameter has to be a String because the first argument is the
>> formatted string.

>> Here is an example of how the method __interpolate__ inside java.lang.String can
>> be written.
>> To avoid everybody having to re-implement the parsing of the formatted string, the
>> class java.lang.runtime.InterpolateMetafactory provides a helper method
>> "formatIterator" that returns an iterator splitting the formatted string into
>> text and binding.

>>   package java.lang;

>>   public class String {
>>     ...
>>     public static String __interpolate__(String format, Object... args) {
>>       var i = 0;
>>       var builder = new StringBuilder();
>>       var iterator = InterpolateMetafactory.formatIterator(format);
>>       while(iterator.hasNext()) {
>>         switch(iterator.next()) {
>>           case Text(var text) -> builder.append(text);
>>           case Binding binding -> builder.append(args[i++]);
>>         }
>>       }
>>       return builder.toString();
>>     }
>>     ...
>>   }
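
Since formatIterator is left unspecified above, here is a minimal runnable sketch of the same loop. The tokenizer is a hypothetical stand-in: it returns a List instead of an Iterator, treats every \(...) as a hole and does no escaping. It uses instanceof patterns rather than a pattern-matching switch only to keep the Java version requirement low.

```java
import java.util.ArrayList;
import java.util.List;

final class FormatIteratorSketch {
  sealed interface Token permits Text, Binding {}
  record Text(String text) implements Token {}
  record Binding(String name) implements Token {}

  // Hypothetical stand-in for InterpolateMetafactory.formatIterator.
  static List<Token> tokens(String format) {
    var tokens = new ArrayList<Token>();
    int pos = 0;
    while (pos < format.length()) {
      int open = format.indexOf("\\(", pos);
      int close = open < 0 ? -1 : format.indexOf(')', open);
      if (close < 0) { // no further complete hole: the rest is plain text
        tokens.add(new Text(format.substring(pos)));
        break;
      }
      if (open > pos) {
        tokens.add(new Text(format.substring(pos, open)));
      }
      tokens.add(new Binding(format.substring(open + 2, close)));
      pos = close + 1;
    }
    return tokens;
  }

  // Same shape as the __interpolate__ example: text is copied, each binding
  // consumes the next argument in order.
  static String interpolate(String format, Object... args) {
    var builder = new StringBuilder();
    int i = 0;
    for (Token token : tokens(format)) {
      if (token instanceof Text text) {
        builder.append(text.text());
      } else {
        builder.append(args[i++]);
      }
    }
    return builder.toString();
  }
}
```

For instance, interpolate("name: \\(name) age: \\(age)", "Ada", 36) produces "name: Ada age: 36".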

>> While this is nice, you may think that it's just syntactic sugar and it will not
>> be more performant than String.valueOf(), i.e. it will be slow.

>> That's why the specification allows you to provide a second more optimised
>> version of the interpolation method using a method __interpolate__bootstrap__.
>> The method __interpolate__bootstrap__ is optional and can not replace the
>> method __interpolate__: when __interpolate__bootstrap__ is declared,
>> __interpolate__ has to be present too. Adding a method
>> __interpolate__bootstrap__ after the fact is a backward compatible change,
>> there is no need to recompile all the client code.

>> For that, the compiler translation relies on invokedynamic to call the method
>> bootstrap of the class InterpolateMetafactory, which at runtime decides
>> to trampoline either to the method __interpolate__bootstrap__ or to the method
>> __interpolate__ if no __interpolate__bootstrap__ exists.

>> Here is an example of how a call to the interpolation method of String is
>> generated by javac.
>> For the Java code

>>   String name = ...
>>   int age = ...
>>   String s = String."name: \(name) age: \(age)";

>> the equivalent bytecode is

>>   aload_1   // load name
>>   iload_2   // load age
>>   invokedynamic __interpolate__ (Ljava/lang/String;I)Ljava/lang/String;
>>    java.lang.runtime.InterpolateMetafactory.bootstrap(Lookup, String, MethodType,
>>     String, MethodHandle):CallSite
>>    [ "name: \(name) age: \(age)", String::__interpolate__(String, Object[]):String
>>     ]

>> From the perspective of the compiler, the method __interpolate__ works exactly
>> like a method with a polymorphic method signature (a method annotated with
>> @PolymorphicSignature),
>> so the descriptor of invokedynamic is created by collecting the types of the
>> arguments. Here the interpolation method is called with a String and an int,
>> and the return type is String, so the descriptor is
>> (Ljava/lang/String;I)Ljava/lang/String;

>> Considering the interpolation method as a polymorphic method is important in
>> terms of performance because it means that no boxing will be done by the
>> compiler. If some boxing is needed, it will be done by the runtime, so it is
>> optional if the __interpolate__bootstrap__ does not need to box arguments.

>> You can also notice that the formatted string is passed as a bootstrap constant
>> so all the parsing of the format can be done once outside of the hot path.
>> A call to invokedynamic also passes as a second bootstrap argument the method
>> handle to the method __interpolate__, so the implementation inside
>> InterpolateMetafactory.bootstrap can call this method if no method
>> __interpolate__bootstrap__ exists.

>> Here is a raw implementation of the class InterpolateMetafactory.
>> The method formatIterator() returns an Iterator of Token, which is a sealed
>> interface.
>> The method bootstrap() first looks up a method "__interpolate__bootstrap__" in
>> the lookup class that takes a Lookup, a String, a MethodType, the format and
>> the default implementation, and calls it if it exists; otherwise it takes the
>> default implementation, binds the formatted String and adapts the arguments
>> using asType (asking for boxing, etc).

>>    package java.lang.runtime;

>>    public class InterpolateMetafactory {
>>      public sealed interface Token {
>>        public record Text(String text) implements Token {}
>>        public record Binding(String name) implements Token {}
>>      }

>>      public static Iterator<Token> formatIterator(String format) {
>>        ...
>>      }

>>     public static CallSite bootstrap(Lookup lookup, String name, MethodType
>>      methodType, String format, MethodHandle impl) throws Throwable {
>>        // check if there is a bootstrap method
>>        MethodHandle bootstrap;
>>        try {
>>          bootstrap = lookup.findStatic(lookup.lookupClass(),
>>          "__interpolate__bootstrap__", MethodType.methodType(CallSite.class,
>>          Lookup.class, String.class, MethodType.class, String.class,
>>           MethodHandle.class));
>>        } catch(NoSuchMethodException e) {
>>          // bind the default implementation
>>          return new ConstantCallSite(impl.bindTo(format).asType(methodType));
>>        }
>>        return (CallSite) bootstrap.invoke(lookup, name, methodType, format, impl);
>>      }
>>    }
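
For readers who want to run the trampoline, here is a self-contained sketch of the same idea. It is an approximation with assumptions of its own: the classes Plain and Hooked and the helper link() are invented for the example, the call site is simulated by hand instead of being emitted by javac, and the default path collects the arguments explicitly with asCollector before calling asType.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.MethodType;
import java.util.Arrays;

final class TrampolineSketch {
  static final MethodType HOOK_TYPE = MethodType.methodType(CallSite.class,
      Lookup.class, String.class, MethodType.class, String.class, MethodHandle.class);
  static final MethodType IMPL_TYPE =
      MethodType.methodType(String.class, String.class, Object[].class);

  static CallSite bootstrap(Lookup lookup, String name, MethodType type,
                            String format, MethodHandle impl) throws Throwable {
    MethodHandle hook;
    try {
      hook = lookup.findStatic(lookup.lookupClass(), "__interpolate__bootstrap__", HOOK_TYPE);
    } catch (NoSuchMethodException e) {
      // No hook: bind the format, collect the arguments into an Object[],
      // and let asType insert the boxing conversions.
      MethodHandle bound = MethodHandles.insertArguments(impl, 0, format);
      return new ConstantCallSite(
          bound.asCollector(Object[].class, type.parameterCount()).asType(type));
    }
    return (CallSite) hook.invoke(lookup, name, type, format, impl);
  }

  static final class Plain { // hypothetical class declaring only __interpolate__
    static final Lookup LOOKUP = MethodHandles.lookup();

    static String __interpolate__(String format, Object... args) {
      return format + Arrays.toString(args);
    }
  }

  static final class Hooked { // hypothetical class that also declares the hook
    static final Lookup LOOKUP = MethodHandles.lookup();

    static String __interpolate__(String format, Object... args) {
      return format + Arrays.toString(args);
    }

    private static CallSite __interpolate__bootstrap__(Lookup lookup, String name,
        MethodType type, String format, MethodHandle impl) {
      // Ignores the arguments and always returns the same constant.
      MethodHandle constant = MethodHandles.constant(String.class, "hooked:" + format);
      return new ConstantCallSite(
          MethodHandles.dropArguments(constant, 0, type.parameterList()));
    }
  }

  // Simulates linking and calling one (String, int)String call site.
  static String link(Lookup lookup, String format) {
    try {
      MethodHandle impl = lookup.findStatic(lookup.lookupClass(), "__interpolate__", IMPL_TYPE);
      MethodType type = MethodType.methodType(String.class, String.class, int.class);
      CallSite site = bootstrap(lookup, "__interpolate__", type, format, impl);
      return (String) site.dynamicInvoker().invoke("Ada", 36);
    } catch (Throwable t) {
      throw new AssertionError(t);
    }
  }
}
```

With Plain the call goes through __interpolate__ with boxed arguments, with Hooked the bootstrap hook decides what the call site does.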

>> Here is another example, showing how to declare the methods __interpolate__ and
>> __interpolate__bootstrap__ inside java.util.regex.Pattern.
>> The "default" implementation calls Pattern.compile() and the optimized one
>> always returns the result of Pattern.compile() as a constant.

>>   package java.util.regex;

>>   public class Pattern {
>>    // the formatted string can not have arguments
>>    public static Pattern __interpolate__(String format) {
>>       return Pattern.compile(format);
>>     }

>>    private static CallSite __interpolate__bootstrap__(Lookup lookup, String name,
>>     MethodType methodType, String format, MethodHandle impl) {
>>      return new ConstantCallSite(MethodHandles.constant(Pattern.class,
>>       Pattern.compile(format)));
>>     }
>>   }

>> The method __interpolate__ provides via its signature, the parameter types that
>> are verified by the compiler.
>> It also provides code that can be used by tools that do static analysis
>> on the bytecode, because those tools can not see through the method handle
>> returned by a bootstrap method: given that it's a runtime construct, it's
>> usually not available at the time the static analysis is done. This should be
>> enough for tools like GraalVM native image to see through the
>> invokedynamic in a similar way they see through the invokedynamic used when
>> creating a lambda.

>> The fact that all invokedynamic calls go through the method
>> InterpolateMetafactory.bootstrap and trampoline from it means that adding or
>> removing the method __interpolate__bootstrap__ is a binary compatible change,
>> if __interpolate__bootstrap__ is declared private. So implementing
>> __interpolate__bootstrap__ can be an afterthought.

>> regards,
>> Rémi