String Interpolation
forax at univ-mlv.fr
Fri Oct 15 17:09:57 UTC 2021
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>, "amber-spec-experts"
> <amber-spec-experts at openjdk.java.net>
> Sent: Wednesday, October 13, 2021 21:32:19
> Subject: Re: String Interpolation
After grumbling a lot, let's restart
[...]
>> That's why the specification allows you to provide a second more optimised
>> version of the interpolation method using a method __interpolate__bootstrap__.
> This is an obviously attractive goal, but the mechanism is way too ad-hoc -- and
> also too limited -- and also too advanced to be a language feature. Bootstraps
> are way too complicated to expose in the source language in this way,
> especially not this magically. And it's too ad-hoc, since it's specific to the
> interpolation feature, whereas one could imagine a number of other contexts
> where it is useful too. So this is a bad tradeoff in many ways. Jim's
> implementation very cleverly gets the equivalent of this using pure library
> implementation (which leans on MutableCallSite.)
> While it is surely a desirable goal to be able to optimize formatter
> implementation, it is also super-easy to become obsessed with this, and give it
> a bigger place in the feature than it deserves. For some cases -- notably
> String::format -- there are huge savings to be had (from a number of sources,
> not least of which is that scanning the string at every invocation and choosing
> a strategy based on that is expensive.) But in other cases, it is almost
> irrelevant. For pure concatenation, it is already pretty fast; for SQL, the
> cost of constructing the query is a tiny part of the execution time, so it's not
> even worth optimizing. So this is a "nice to have" rather than the centerpiece
> of the feature.
> To be clear, the centerpiece is the gathering up of a template + parameters so
> that their combination can be handled by another entity, whether right now,
> later, or never. Optimizing the case where it is done right now, using a
> predictable choice of entity, is an optimization, but not the centerpiece.
> Let me sketch out how we're envisioning this. The API is something like:
> interface TemplatePolicy<T, E extends Exception> {
> T apply(TemplatedString ts);
> // returns MethodHandle (TemplatePolicy, TemplatedString) -> Object
> default MethodHandle asMethodHandle(TemplatedString ts) {
> return MH[TemplatePolicy::apply]
> }
> }
I don't understand where you pass the arguments, is it not more something like
public interface TemplatePolicy< T , E extends Exception> {
T apply(TemplatedString template, Object... args) throws E ;
// returns a MethodHandle with the signature T(TemplatePolicy, Object...)
default MethodHandle asMethodHandle(TemplatedString template, MethodType type) {
...
}
}
The second parameter of asMethodHandle is the descriptor of the invokedynamic call site; this ensures that there is no boxing on the fast path, and no class check either if the implementation of TemplatePolicy is a final class.
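A minimal sketch of what a default asMethodHandle taking the call-site type could look like, using only java.lang.invoke. The names Template, Policy and TypedPolicySketch are hypothetical stand-ins, the type parameters T and E are left out for brevity, and the site type is assumed to carry the policy as its leading parameter. It is not the API discussed in the thread, only an illustration of where the boxing ends up.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

public class TypedPolicySketch {
    // Hypothetical stand-in for the TemplatedString of the thread.
    record Template(String text) {}

    // Hypothetical variant of TemplatePolicy whose asMethodHandle receives the site type.
    interface Policy {
        Object apply(Template template, Object... args);

        // 'type' is assumed to be (Policy, A1, ..., An)R: the invokedynamic
        // descriptor with the policy as leading parameter.
        default MethodHandle asMethodHandle(Template template, MethodType type)
                throws ReflectiveOperationException {
            MethodHandle apply = MethodHandles.lookup().findVirtual(Policy.class, "apply",
                    MethodType.methodType(Object.class, Template.class, Object[].class));
            // (Policy, Template, Object[])Object -> (Policy, Object[])Object
            MethodHandle bound = MethodHandles.insertArguments(apply, 1, template);
            // Collect the n trailing arguments into an Object[]; asType then boxes
            // primitives inside the adapter instead of at the call site.
            return bound.asCollector(Object[].class, type.parameterCount() - 1).asType(type);
        }
    }

    public static Object demo() throws Throwable {
        Policy policy = (template, args) -> template.text() + Arrays.toString(args);
        MethodType siteType =
                MethodType.methodType(Object.class, Policy.class, String.class, int.class);
        MethodHandle handle = policy.asMethodHandle(new Template("user"), siteType);
        return handle.invoke(policy, "Ana", 42); // the int reaches the handle unboxed
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(demo()); // prints user[Ana, 42]
    }
}
```

A policy that wants to avoid boxing altogether would override asMethodHandle and build a handle that consumes the primitive parameters of the site type directly; the default above only guarantees that the caller never has to box.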
> The API specification has a number of constraints on the implementation of
> asMethodHandle, which I'll get to in a second. When the compiler encounters an
> immediate application P."...", it generates an indy, which uses a special
> bootstrap that returns a MutableCallSite. The MutableCallSite initially has as
> its target a special secondary bootstrap MH, which represents an interpolation
> site that has not yet seen an actual invocation. The secondary bootstrap MH has
> the shape of TemplatePolicy::apply (e.g., (TemplatePolicy, TemplatedString) ->
> Object), so on first invocation it receives the TP object and the TS. It then
> calls TP::asMethodHandle, and wraps this MH with a GWT which validates the
> invariants and proceeds to that MH if they hold -- which they will 99.x% of the
> time.
> The invariant is that the dynamic type of the per-instantiation TP be == to the
> dynamic type of the TP that was present at secondary linkage. That is, it be an
> instance of the same class, but not the same instance. By definition, the
> string will always be the same as will the types of the parameters, since this
> is specific to concrete P."..." sites. So the MH can take advantage of that.
> The constraint on TP::asMethodHandle is that it not undermine this invariant;
> that if it generates a MH that is dependent on TP state, it not bake that state
> into the resulting MH, but instead, treat the TP state as a parameter. Further,
> the MH must be behaviorally equivalent to calling apply.
> If the GWT fails, it means the user is doing something like:
> for (TP p : listOfProcessors) {
> blah blah p."foo \{a}"
> }
> in which case the GWT falls back to the "just do an invokevirtual of TP::apply"
> strategy. (It could get fancier but I don't see any point.)
> This lets us rescue indy-based translation without exposing a magic indy-hook in
> the JLS. (Sorry, I know you wanted the magic indy hook.)
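The linkage protocol described above can be sketched with plain java.lang.invoke calls. This is only an illustration under stated assumptions: Template, Policy, Site, Prefix and Upper are hypothetical names, and the real prototype may be structured differently. The site starts with a linker as target; the first call asks the policy for its handle and installs a guardWithTest that compares the class of the incoming policy with the class seen at linkage.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class GuardedSiteSketch {
    // Hypothetical stand-ins for TemplatedString and TemplatePolicy.
    record Template(String text) {}

    interface Policy {
        Object apply(Template template);

        // Default: a handle of type (Policy, Template)Object that just calls apply.
        default MethodHandle asMethodHandle(Template template)
                throws ReflectiveOperationException {
            return MethodHandles.lookup().findVirtual(Policy.class, "apply",
                    MethodType.methodType(Object.class, Template.class));
        }
    }

    record Prefix(String prefix) implements Policy {
        public Object apply(Template template) { return prefix + template.text(); }
    }

    record Upper() implements Policy {
        public Object apply(Template template) { return template.text().toUpperCase(); }
    }

    static final class Site extends MutableCallSite {
        static final MethodType TYPE =
                MethodType.methodType(Object.class, Policy.class, Template.class);
        int links; // number of linkages, kept for the demo only

        Site() throws ReflectiveOperationException {
            super(TYPE);
            // Initial target: the "not yet invoked" linker, bound to this site.
            setTarget(MethodHandles.lookup().findVirtual(Site.class, "link", TYPE).bindTo(this));
        }

        // Runs on the first invocation: fetch the policy handle and install the guard.
        Object link(Policy policy, Template template) throws Throwable {
            links++;
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle fast = policy.asMethodHandle(template).asType(TYPE);
            MethodHandle slow = lookup.findVirtual(Policy.class, "apply",
                    MethodType.methodType(Object.class, Template.class));
            MethodHandle sameClass = lookup.findStatic(Site.class, "hasClass",
                    MethodType.methodType(boolean.class, Class.class, Policy.class))
                    .bindTo(policy.getClass());
            // The policy stays a parameter of 'fast'; only its class is baked into the guard.
            setTarget(MethodHandles.guardWithTest(sameClass, fast, slow));
            return fast.invoke(policy, template);
        }

        static boolean hasClass(Class<?> expected, Policy policy) {
            return policy.getClass() == expected;
        }
    }

    public static String demo() throws Throwable {
        Site site = new Site();
        MethodHandle invoker = site.dynamicInvoker();
        Template template = new Template("foo");
        Policy first = new Prefix("a:");
        Policy second = new Prefix("b:"); // same class, other instance: guard holds
        Policy third = new Upper();       // other class: falls back to the virtual call
        Object a = invoker.invoke(first, template);
        Object b = invoker.invoke(second, template);
        Object c = invoker.invoke(third, template);
        return a + " " + b + " " + c + " links=" + site.links;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(demo()); // prints a:foo b:foo FOO links=1
    }
}
```

The second call shows why the policy state must remain a parameter: the handle obtained from the first instance is reused for another instance of the same class and still has to produce that instance's result.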
As i said, i don't care about having the exact bootstrap API, but i care about the unnecessary boxing / class check / etc that can occur.
I believe that if asMethodHandle() takes a MethodType as second parameter, performance should be Ok.
Is it something that can be negotiated ?
I've implemented a prototype to convince myself that with a MethodType as parameter it was not actually that bad.
https://github.com/forax/java-interpolation
(I also suppose that the TemplatedString is created with a constant dynamic ?)
Rémi
> On 10/13/2021 1:09 PM, Remi Forax wrote:
>> Hi everybody, I've spent some time thinking about how the String interpolation +
>> Policy should be specified and implemented.
>> The goal is to add a syntax specifying a user defined method to "interpolate"
>> (for lack of a better word) a string with arguments.
>> Given that it's a method, the exact semantics of the interpolation, things like
>> how the arguments are escaped, how the formatted string is parsed, is written
>> in Java, which allows supporting a wide range of use cases.
>> This proposal does not differ from the original proposal of Brian and Jim in its
>> goal but in the way a user declares the interpolation method(s).
>> TLDR; you can declare an interpolation method and optionally an interpolation
>> bootstrap method if you want a more efficient code at the price of having to
>> play with the method handle API.
>> ---
>> The proposal of Brian and Jim uses an interface to define the policy but in this
>> case, using an interface is not what we want.
>> I think there are two main reasons,
>> - the interpolation method can be an instance method but can also be a factory
>> method, a static method, and an interface cannot constrain a static method.
>> - we want the signature of the interpolation method to be free to use any number
>> of parameters of any types, something that can not be specified with type
>> parameters in Java.
>> So let's take a step back and write some examples, as a user of the
>> interpolation method, we want to
>> - be able to specify string interpolation,
>> you can notice that this is a static method.
>> String name = ...
>> int age = ...
>> String s = String."name: \(name) age: \(age)";
>> - we also want to be able to instantiate regex Pattern,
>> and have a magic optimisation that creates the Pattern instance only once
>> Pattern pattern = Pattern."foo|bar";
>> - we also want to support instance method, so the interpolation can escape the
>> arguments differently depending on the context,
>> here by example, escaping differently depending on the database driver.
>> String username = ...
>> Connection connection = ...
>> connection."""
>> SELECT * FROM users where user == "\(username)"
>> """;
>> I think the simplest way to specify an interpolation method is to have a method
>> with a special name,
>> i will use __interpolate__ because i don't want to discuss the exact syntax
>> here.
>> This method can be a static method or an instance method and has a restriction,
>> the first parameter has to be a String because the first argument is the
>> formatted string.
>> Here is an example of how the method __interpolate__ inside java.lang.String can
>> be written.
>> To avoid everybody having to re-implement the parsing of the formatted string, the
>> class java.lang.runtime.InterpolateMetafactory provides a helper method
>> "formatIterator" that returns an iterator splitting the formatted string into
>> text and binding.
>> package java.lang;
>> public class String {
>> ...
>> public static String __interpolate__(String format, Object... args) {
>> var i = 0;
>> var builder = new StringBuilder();
>> var iterator = InterpolateMetafactory.formatIterator(format);
>> while(iterator.hasNext()) {
>> switch(iterator.next()) {
>> case Text(var text) -> builder.append(text);
>> case Binding binding -> builder.append(args[i++]);
>> }
>> }
>> return builder.toString();
>> }
>> ...
>> }
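For readers who want to run the idea, here is a self-contained sketch of the same loop. The tokenizer is an assumption: the thread does not show how formatIterator is implemented, so a simple regex over `\(name)` holes is used here, and instanceof patterns replace the pattern switch so the code compiles on Java 17. InterpolateSketch and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterpolateSketch {
    sealed interface Token permits Text, Binding {}
    record Text(String text) implements Token {}
    record Binding(String name) implements Token {}

    // Matches a hole written as \(name) in the format string.
    private static final Pattern HOLE = Pattern.compile("\\\\\\((\\w+)\\)");

    // Assumed behaviour of formatIterator: split the format into text and bindings.
    static Iterator<Token> formatIterator(String format) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = HOLE.matcher(format);
        int last = 0;
        while (matcher.find()) {
            if (matcher.start() > last) {
                tokens.add(new Text(format.substring(last, matcher.start())));
            }
            tokens.add(new Binding(matcher.group(1)));
            last = matcher.end();
        }
        if (last < format.length()) {
            tokens.add(new Text(format.substring(last)));
        }
        return tokens.iterator();
    }

    // Plays the role of __interpolate__: bindings consume the arguments in order.
    public static String interpolate(String format, Object... args) {
        int i = 0;
        StringBuilder builder = new StringBuilder();
        Iterator<Token> iterator = formatIterator(format);
        while (iterator.hasNext()) {
            Token token = iterator.next();
            if (token instanceof Text text) {
                builder.append(text.text());
            } else if (token instanceof Binding) {
                builder.append(args[i++]);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // The backslash is doubled only because this is an ordinary Java literal.
        System.out.println(interpolate("name: \\(name) age: \\(age)", "Ana", 42));
        // prints name: Ana age: 42
    }
}
```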
>> While this is nice, you may think that it's just syntactic sugar and it will not
>> be more performant than String.valueOf(), i.e. it will be slow.
>> That's why the specification allows you to provide a second more optimised
>> version of the interpolation method using a method __interpolate__bootstrap__.
>> The method __interpolate__bootstrap__ is not required and cannot replace the
>> method __interpolate__: when __interpolate__bootstrap__ is declared, __interpolate__
>> has to be present too. It's a backward compatible change to add a method
>> __interpolate__bootstrap__ after the fact, there is no need to recompile
>> all the client code.
>> For that, the compiler translation relies on invokedynamic to call the method
>> bootstrap of the class InterpolateMetafactory, which at runtime decides
>> to trampoline either to the method __interpolate__bootstrap__ or to the method
>> __interpolate__ if no __interpolate__bootstrap__ exists.
>> Here is an example of how a call to the interpolation method of String is
>> generated by javac
>> For the Java code
>> String name = ...
>> int age = ...
>> String s = String."name: \(name) age: \(age)";
>> the equivalent bytecode is
>> aload_1 // load name
>> iload_2 // load age
>> invokedynamic __interpolate__ (Ljava/lang/String;I)Ljava/lang/String;
>> java.lang.runtime.InterpolateMetafactory.bootstrap(Lookup, String, MethodType,
>> String, MethodHandle):CallSite
>> [ "name: \(name) age: \(age)", String::__interpolate__(String, Object[]):String
>> ]
>> From the perspective of the compiler the method __interpolate__ works exactly
>> like a method with a polymorphic method signature (the method annotated with
>> @PolymorphicSignature),
>> so the descriptor of invokedynamic is created by collecting the types of the
>> arguments: here the interpolation method is called with a String and an int,
>> and the return type is String, so the descriptor is
>> (Ljava/lang/String;I)Ljava/lang/String;
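As a quick check of the descriptor syntax, the standard MethodType API can print the descriptor for a call taking a String and an int and returning a String. DescriptorSketch is a hypothetical class name; the API calls are the regular java.lang.invoke ones.

```java
import java.lang.invoke.MethodType;

public class DescriptorSketch {
    public static String descriptor() {
        // Return type first, then the parameter types in call order.
        MethodType type = MethodType.methodType(String.class, String.class, int.class);
        return type.toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor()); // prints (Ljava/lang/String;I)Ljava/lang/String;
    }
}
```

Each reference type is written as `L`, the internal class name, then `;`, while the primitive int is the single letter `I`, which is why no boxing is implied by the descriptor itself.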
>> Considering the interpolation method as a polymorphic method is important in
>> terms of performance because it means that no boxing will be done by the
>> compiler; if there is some boxing, it will be done by the runtime, so it is
>> optional if the __interpolate__bootstrap__ does not need to box arguments.
>> You can also notice that the formatted string is passed as a bootstrap constant
>> so all the parsing of the format can be done once outside of the hot path.
>> A call to invokedynamic also passes as a second bootstrap argument the method
>> handle to the method __interpolate__, so the implementation inside
>> InterpolateMetafactory.bootstrap can call this method if no method
>> __interpolate__bootstrap__ exists.
>> Here is a raw implementation of the class InterpolateMetafactory.
>> The method formatIterator() returns an Iterator of Token, which is a sealed interface.
>> The method bootstrap() first looks up a method "__interpolate__bootstrap__" in
>> the lookup class that takes a Lookup, a String, a MethodType, the format and
>> the default implementation, and calls it if it exists; otherwise it takes the default
>> implementation, binds the formatted String and adapts the arguments using asType
>> (asking for boxing, etc).
>> package java.lang.runtime;
>> public class InterpolateMetafactory {
>> public sealed interface Token {
>> public record Text(String text) implements Token {}
>> public record Binding(String name) implements Token {}
>> }
>> public static Iterator<Token> formatIterator(String format) {
>> ...
>> }
>> public static CallSite bootstrap(Lookup lookup, String name, MethodType
>> methodType, String format, MethodHandle impl) throws Throwable {
>> // check if there is a bootstrap method
>> MethodHandle bootstrap;
>> try {
>> bootstrap = lookup.findStatic(lookup.lookupClass(),
>> "__interpolate__bootstrap__", MethodType.methodType(CallSite.class,
>> Lookup.class, String.class, MethodType.class, String.class,
>> MethodHandle.class));
>> } catch(NoSuchMethodException e) {
>> // bind the default implementation
>> return new ConstantCallSite(impl.bindTo(format).asType(methodType));
>> }
>> return (CallSite) bootstrap.invoke(lookup, name, methodType, format, impl);
>> }
>> }
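The trampoline can be exercised without compiler support by calling a bootstrap-like method by hand. The sketch below is an assumption-laden illustration: Plain and Tuned are hypothetical owner classes, the lookups are obtained from inside each class, and the fallback assumes the default method has the shape (String, Object...). It collects the arguments explicitly with asCollector before asType, since a handle with bound or inserted arguments is fixed arity.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.MethodType;

public class TrampolineSketch {
    static final MethodType BSM_TYPE = MethodType.methodType(CallSite.class, Lookup.class,
            String.class, MethodType.class, String.class, MethodHandle.class);

    static CallSite bootstrap(Lookup lookup, String name, MethodType type,
                              String format, MethodHandle impl) throws Throwable {
        MethodHandle custom;
        try {
            custom = lookup.findStatic(lookup.lookupClass(),
                    "__interpolate__bootstrap__", BSM_TYPE);
        } catch (NoSuchMethodException e) {
            // Fallback, assuming impl has the shape (String, Object[])R:
            // fix the format, collect the call arguments into an array, then box.
            MethodHandle bound = MethodHandles.insertArguments(impl, 0, format);
            return new ConstantCallSite(
                    bound.asCollector(Object[].class, type.parameterCount()).asType(type));
        }
        return (CallSite) custom.invoke(lookup, name, type, format, impl);
    }

    // Hypothetical owner declaring only the default method.
    static class Plain {
        static final Lookup LOOKUP = MethodHandles.lookup();

        static String __interpolate__(String format, Object... args) {
            return format + " " + args.length;
        }
    }

    // Hypothetical owner that also declares the optimized hook.
    static class Tuned {
        static final Lookup LOOKUP = MethodHandles.lookup();

        static String __interpolate__(String format, Object... args) {
            return "slow:" + format;
        }

        private static CallSite __interpolate__bootstrap__(Lookup lookup, String name,
                MethodType type, String format, MethodHandle impl) {
            // Ignore the arguments and return a value computed once at link time.
            MethodHandle constant = MethodHandles.constant(String.class, "fast:" + format);
            return new ConstantCallSite(
                    MethodHandles.dropArguments(constant, 0, type.parameterList()));
        }
    }

    // Simulates one call site of type (String, int)String owned by the lookup class.
    public static String call(Lookup lookup) throws Throwable {
        MethodType type = MethodType.methodType(String.class, String.class, int.class);
        MethodHandle impl = lookup.findStatic(lookup.lookupClass(), "__interpolate__",
                MethodType.methodType(String.class, String.class, Object[].class));
        CallSite site = bootstrap(lookup, "__interpolate__", type, "fmt", impl);
        return (String) site.dynamicInvoker().invokeExact("Ana", 42);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(call(Plain.LOOKUP)); // prints fmt 2
        System.out.println(call(Tuned.LOOKUP)); // prints fast:fmt
    }
}
```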
>> Here is another example, showing how to declare the methods __interpolate__ and
>> __interpolate__bootstrap__ inside java.util.regex.Pattern.
>> The "default" implementation calls Pattern.compile() and the optimized one
>> always returns the result of Pattern.compile() as a constant.
>> package java.util.regex;
>> public class Pattern {
>> public static Pattern __interpolate__(String format) { // the formatted string
>> can not have arguments
>> return Pattern.compile(format);
>> }
>> private static CallSite __interpolate__bootstrap__(Lookup lookup, String name,
>> MethodType methodType, String format, MethodHandle impl) {
>> return new ConstantCallSite(MethodHandles.constant(Pattern.class,
>> Pattern.compile(format)));
>> }
>> }
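The effect of the constant call site can be observed directly with the standard API: every invocation of the site's target returns the very same Pattern instance, compiled once when the site is created. ConstantPatternSketch is a hypothetical name used only for this illustration.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.regex.Pattern;

public class ConstantPatternSketch {
    public static boolean sameInstanceTwice() throws Throwable {
        // The Pattern is compiled once, when the call site is created.
        CallSite site = new ConstantCallSite(
                MethodHandles.constant(Pattern.class, Pattern.compile("foo|bar")));
        MethodHandle invoker = site.dynamicInvoker();
        Pattern first = (Pattern) invoker.invokeExact();
        Pattern second = (Pattern) invoker.invokeExact();
        return first == second && first.matcher("bar").matches();
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(sameInstanceTwice()); // prints true
    }
}
```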
>> The method __interpolate__ provides via its signature, the parameter types that
>> are verified by the compiler.
>> It also provides a code that can be used by the tools that do static analysis
>> on the bytecode because those tools can not see through the method handle
>> returned by a bootstrap method given that it's a runtime construct, it's
>> usually not available at the time the static analysis is done. This should be
>> enough to have tools like Graal VM native image to see through the
>> invokedynamic in a similar way it sees through the invokedynamic used when
>> creating a lambda.
>> The fact that all invokedynamic goes through the method
>> InterpolateMetafactory.bootstrap and trampoline from it means that adding or
>> removing the method __interpolate__bootstrap__ is a binary compatible change,
>> if __interpolate__bootstrap__ is declared private. So implementing
>> __interpolate__bootstrap__ can be an afterthought.
>> regards,
>> Rémi