Update on String Templates (JEP 459)

Brian Goetz brian.goetz at oracle.com
Tue Mar 19 14:23:03 UTC 2024


Let's pull on this string some more.  Assuming we settled on disjoint 
types and syntaxes, with no magic conversions, what library support do 
we need directly for ST?  I am thinking (please, let's focus on the 
functionality before we nitpick the names):

     // on String
     static String join(StringTemplate)    // previously STR

     // on StringTemplate
     String join()                                 // STR, instance/suffix version
     static StringTemplate join(StringTemplate...) // + for string templates

This is a pleasantly short set; is anything missing?  (Not addressing 
the "which things were previously processors, but now need API points" 
right now -- that's a separate discussion.)
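
To make the shape of this set concrete, here is a minimal sketch. StringTemplate's final API is not settled, so the sketch uses a hypothetical stand-in record, Template, that holds n+1 literal fragments around n embedded values. The record, the fragments/values representation, and the Strings helper class are assumptions for illustration; only the three join operations come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StringTemplate: n+1 literal fragments around n embedded values.
record Template(List<String> fragments, List<Object> values) {
    Template {
        if (fragments.size() != values.size() + 1) {
            throw new IllegalArgumentException("need exactly one more fragment than values");
        }
    }

    // Instance/suffix form of interpolation (what STR used to do).
    String join() {
        StringBuilder sb = new StringBuilder(fragments.get(0));
        for (int i = 0; i < values.size(); i++) {
            sb.append(values.get(i)).append(fragments.get(i + 1));
        }
        return sb.toString();
    }

    // "+" for templates: the result is still a template, nothing is interpolated yet.
    static Template join(Template... parts) {
        List<String> frags = new ArrayList<>();
        List<Object> vals = new ArrayList<>();
        frags.add("");
        for (Template p : parts) {
            // The first fragment of each part merges into the current last fragment.
            int last = frags.size() - 1;
            frags.set(last, frags.get(last) + p.fragments().get(0));
            frags.addAll(p.fragments().subList(1, p.fragments().size()));
            vals.addAll(p.values());
        }
        return new Template(frags, vals);
    }
}

// Stand-in for the static method that would live on String.
final class Strings {
    static String join(Template t) {
        return t.join();
    }
}
```

Note that joining templates keeps the embedded values separate from the literal text, so a later consumer can still validate or escape them; only the two String-producing methods give that up.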

On 3/18/2024 9:38 AM, Brian Goetz wrote:
> I think this has been a good discussion, and it looks like we're 
> starting to see some convergence.
>
> I think we keep trying to exploit ambiguity / implicitness, and it 
> doesn't go well:
>
>  - Many users want STR to be the "implicit processor", but that isn't 
> good for security
>  - We tried reusing the String delimiters for string templates to 
> reduce the perception of how many different things there are here, but 
> that creates cognitive load (can't tell strings from templates without 
> parsing the entire contents), among other problems
>  - We tried making String a poly expression (and other tricks) to 
> reduce the number of explicit conversions, but that created problems too
>
> John's characterization captures the feeling and eventual conclusion 
> that I think many of us share:
>
>> I kind of like Guy’s offensive-to-everyone suggestion that $ is required to make a true ST.
>
> Indeed, my first reaction to the $ sigil was "please no", but I am 
> grudgingly coming to the conclusion that we should stop trying to 
> implicitly "just figure out what the user wants" and acknowledge the 
> reality: templates are not strings, strings are not templates, and 
> they can be converted to each other with ... methods, just like any 
> other relatable types.  So string literals are as they always were; 
> string templates are a new thing, whose syntax and type is disjoint 
> from that of strings, as Guy also seems to be converging on:
>
>> And now that I have that better understanding, I think I lean toward 
>> (a) abandoning string interpolation and (b) having a single, short, 
>> _non-optional_ prefix for templates (“$” would be a plausible 
>> choice), on the grounds that I think it makes code more readable if 
>> templates are always distinguished up front from strings—and this is 
>> especially helpful when the templates are rather long and any `\{` 
>> present might be far from the beginning. It has a minimal number of 
>> cases to explain:
>>
>> "…"      string literal, must not contain \{…}, type String
>> $"…"    template literal, may contain \{…}, type StringTemplate
>
> (concrete syntax TBB (to be bikeshod), along with the spellings of S 
> -> ST and ST -> S.)
>
> Some more useful observations:
>
>  - The toString behavior cannot be mere interpolation.  Besides the 
> principled objections and inevitable propping-open-the-security-door 
> that this would lead to, people will quickly learn to abuse "" + ST as 
> the "fewest characters required" way to get interpolation, which is 
> "clever" in the same way that John's "empty \{}" trick is clever, but 
> not good for clarity.
>  - We need a story to tell for how to write good overloads, which 
> seems to be more subtle than initially thought.
>  - If the only way to make a StringTemplate is the literal syntax, 
> then STs gain a valuable security property: all fragments in the ST 
> are strings that appeared literally in code, and therefore untainted.  
> This is probably too restrictive but we should be aware of what we are 
> giving up as we explore the API options.
>  - Processors should be encouraged to "flatten" embedded STs.
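
One way to read "flatten" in the last point: when an embedded value is itself a template, splice its fragments and values into the enclosing template instead of treating it as an opaque object. The sketch below assumes a hypothetical Template record (fragments plus values) standing in for StringTemplate; the flatten method name and its recursive behavior are illustrative, not a proposed API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StringTemplate: n+1 literal fragments around n embedded values.
record Template(List<String> fragments, List<Object> values) {

    // Returns an equivalent template in which no embedded value is itself a Template.
    Template flatten() {
        List<String> frags = new ArrayList<>();
        List<Object> vals = new ArrayList<>();
        frags.add(fragments.get(0));
        for (int i = 0; i < values.size(); i++) {
            Object v = values.get(i);
            if (v instanceof Template inner) {
                Template flat = inner.flatten();
                // The inner template's first fragment continues the current literal text.
                appendToLast(frags, flat.fragments().get(0));
                frags.addAll(flat.fragments().subList(1, flat.fragments().size()));
                vals.addAll(flat.values());
                // The outer fragment after the hole continues the inner template's last fragment.
                appendToLast(frags, fragments.get(i + 1));
            } else {
                vals.add(v);
                frags.add(fragments.get(i + 1));
            }
        }
        return new Template(frags, vals);
    }

    private static void appendToLast(List<String> frags, String text) {
        int last = frags.size() - 1;
        frags.set(last, frags.get(last) + text);
    }
}
```

After flattening, the inner template's literal text counts as literal text of the whole, and its values remain values, so downstream validation sees one uniform structure.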
>
> A few people have implied that only the tainted parts of an ST (the 
> embedded expressions) need special processing, but I'll point out that 
> the untainted parts may often require domain-specific validation.  For 
> example, a ST representing a SQL query wants balanced quotes, and 
> might want to require quotes around embedded expressions.
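
As a rough illustration of validating the untainted parts, the two checks named above could look like this. Template is a hypothetical stand-in for StringTemplate, and SqlChecks and both method names are made up for the example; a real SQL library would need a proper tokenizer, not quote counting.

```java
import java.util.List;

// Hypothetical stand-in for StringTemplate: n+1 literal fragments around n embedded values.
record Template(List<String> fragments, List<Object> values) {}

final class SqlChecks {
    // True if the literal fragments, taken together, contain an even number of single quotes.
    static boolean hasBalancedQuotes(Template t) {
        long quotes = 0;
        for (String f : t.fragments()) {
            quotes += f.chars().filter(c -> c == '\'').count();
        }
        return quotes % 2 == 0;
    }

    // True if every embedded value sits directly between two single quotes.
    static boolean holesAreQuoted(Template t) {
        List<String> f = t.fragments();
        for (int i = 0; i < t.values().size(); i++) {
            if (!f.get(i).endsWith("'") || !f.get(i + 1).startsWith("'")) {
                return false;
            }
        }
        return true;
    }
}
```

Both checks look only at the fragments and never at the embedded values, which is the point: the literal text can be wrong for the domain even when it is untainted.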
>
>
>
> On 3/8/2024 1:35 PM, Brian Goetz wrote:
>>
>> Time to check in with where we are with String Templates.  We’ve 
>> gone through two rounds of preview, and have received some feedback.
>>
>> As a reminder, the primary goal of gathering feedback is to learn 
>> things about the design or implementation that we don’t already know. 
>>  This could be bug reports, experience reports, code review, careful 
>> analysis, novel alternatives, etc.    And the best feedback usually 
>> comes from using the feature “in anger” — trying to actually write 
>> code with it. (“Some people would prefer a different syntax” or “some 
>> people would prefer we focused on string interpolation only” fall 
>> squarely in the “things we already knew” camp.)
>>
>> In the course of using this feature in the `jextract` project, we did 
>> learn quite a few things we didn’t already know, and this was 
>> conclusive enough that it has motivated us to adjust our approach in 
>> this feature.  Specifically, the role of processors is “outsized” to 
>> the value they offer, and, after further exploration, we now believe 
>> it is possible to achieve the goals of the feature without an 
>> explicit “processor” abstraction at all!  This is a very positive 
>> development.
>>
>> First, I want to affirm that the goals of the project have not 
>> changed.  From JEP 459:
>>
>> Goals
>>
>> • Simplify the writing of Java programs by making it easy to express 
>> strings that include values computed at run time.
>> • Enhance the readability of expressions that mix text and 
>> expressions, whether the text fits on a single source line (as with 
>> string literals) or spans several source lines (as with text blocks).
>> • Improve the security of Java programs that compose strings from 
>> user-provided values and pass them to other systems (e.g., building 
>> queries for databases) by supporting validation and transformation of 
>> both the template and the values of its embedded expressions.
>> • Retain flexibility by allowing Java libraries to define the 
>> formatting syntax used in string templates.
>> • Simplify the use of APIs that accept strings written in non-Java 
>> languages (e.g., SQL, XML, and JSON).
>> • Enable the creation of non-string values computed from literal text 
>> and embedded expressions without having to transit through an 
>> intermediate string representation.
>>
>> Non-Goals
>> • It is not a goal to introduce syntactic sugar for Java's string 
>> concatenation operator (+), since that would circumvent the goal of 
>> validation.
>> • It is not a goal to deprecate or remove the StringBuilder and 
>> StringBuffer classes, which have traditionally been used for complex 
>> or programmatic string composition.
>>
>> Another thing that has not changed is our view on the syntax for 
>> embedding expressions.  While many people did express the opinion of 
>> “why not ‘just’ do what Kotlin/Scala does”, this issue was more than 
>> fully explored during the initial design round.  (In fact, while 
>> syntax disagreements are often purely subjective, this one was far 
>> more clear — the $-syntax is objectively worse, and would be doubly 
>> so if injected into an existing language where there were already 
>> string literals in the wild.  This has all been more than adequately 
>> covered elsewhere, so I won’t rehash it here.)
>>
>>
>> Now, let’s talk about what we do think should change: the role of 
>> processors and the StringTemplate type.
>>
>> Processors were envisioned as a means to abstract the transformation 
>> of templates to their final form (whether string, or something else.) 
>>  However, Java already has a well established means of abstracting 
>> behavior: methods.   (In fact, a processor application can be viewed 
>> as merely a new syntax for a method call.)  Our experience using the 
>> feature highlighted the question: When converting a SQL query 
>> expressed as a template to the form required by the database (such as 
>> PreparedStatement), why do we need to say:
>>
>>   DB."… template …"
>>
>> When we could use an ordinary Java library:
>>
>>   Query q = Query.of("…template…")
>>
>> Indeed, one of the worst things about having processors in the 
>> language is that API designers are put in the difficult situation of 
>> not knowing whether to write a processor or an ordinary API, and 
>> often have to make that choice before the consequences are fully 
>> understood.  (To add to this, processors raise similar questions at 
>> the use site.) But the real criticism here is that template capture 
>> and processing are complected, when they should be separate, 
>> composable features.
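
A minimal sketch of the "ordinary Java library" route, assuming a hypothetical Template record in place of StringTemplate. Query and Query.of are the names used in the message; the record shape, the "?" placeholder strategy, and the parameters list are illustrative assumptions about what such a factory might do.

```java
import java.util.List;

// Hypothetical stand-in for StringTemplate: n+1 literal fragments around n embedded values.
record Template(List<String> fragments, List<Object> values) {}

// Hypothetical Query type; a real library might go on to build a java.sql.PreparedStatement.
record Query(String sql, List<Object> parameters) {

    // An ordinary factory method: capture happened at the call site, processing happens here.
    static Query of(Template t) {
        // Each embedded value becomes a "?" placeholder; the values travel separately.
        String sql = String.join("?", t.fragments());
        return new Query(sql, t.values());
    }
}
```

The caller builds the template first and hands it to Query.of afterwards, so capture and processing are two separate steps that can be composed like any other method calls.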
>>
>> This motivated us to revisit some of the reasons why processors were 
>> so central to the initial design in the first place.  And it turned 
>> out, this choice had been influenced — perhaps overly so — by early 
>> implementation experiments.  (One of the background design goals was 
>> to enable expensive operations like `String::format` to be (much) 
>> cheaper.  Without digressing too deeply on performance, 
>> String::format can be more than an order of magnitude worse than the 
>> equivalent concatenation operation, and this in turn sometimes 
>> motivates developers to use worse idioms for formatting.  The FMT 
>> processor brought that cost back in line with the equivalent 
>> concatenation.)  These early experiments biased the design towards 
>> needing to know the processor at the point of template capture, but 
>> upon reexamination we realized that there are other ways to achieve 
>> the desired performance goals without requiring processors to be 
>> known at capture time.  This, in turn, enabled us to revisit a point 
>> in the design space we had transited through earlier, where string 
>> templates were “just a new kind of literal” and the job performed by 
>> processors could instead be performed by ordinary APIs.
>>
>> At this point, a simpler design and implementation emerged that met 
>> the semantic, correctness, and performance goals: template literals 
>> (“Hello \{name}”) are simply the literal form of StringTemplate:
>>
>>   StringTemplate st = "Hello \{name}";
>>
>> String and StringTemplate remain unrelated types.  (We explored a 
>> number of ways to interconvert them, but they caused more trouble 
>> than they solved.)  Processing of string templates, including 
>> interpolation, is done by ordinary APIs that deal in StringTemplate, 
>> aided by some clever implementation tricks to ensure good performance.
>>
>> For APIs where interpolation is known to be safe in the domain, such 
>> as PrintWriter, APIs can make that choice on behalf of the domain, by 
>> providing overloads to embody this design choice:
>>
>>    void println(String) { … }
>>    void println(StringTemplate) { … interpolate and delegate to println(String) … }
>>
>> The upshot is that for interpolation-safe APIs like println, we can 
>> use a template directly without giving up any safety:
>>
>>    System.out.println("Hello \{name}");
>>
>> In this example, the string template evaluates to StringTemplate, not 
>> String (no implicit interpolation), and chooses the StringTemplate 
>> overload of println, which in turn chooses how to process the 
>> template. This stays true to the design principle that interpolation 
>> is dangerous enough that it should be an explicit choice in the code 
>> — but it allows that choice to be made by libraries when the library 
>> is comfortable doing so.
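
The overload pair can be sketched on a small wrapper class, since PrintWriter itself cannot be modified here. Template stands in for StringTemplate, and TemplateWriter and interpolate are hypothetical names; only the two println overloads and the "interpolate and delegate" behavior come from the text above.

```java
import java.io.PrintWriter;
import java.util.List;

// Hypothetical stand-in for StringTemplate: n+1 literal fragments around n embedded values.
record Template(List<String> fragments, List<Object> values) {
    String interpolate() {
        StringBuilder sb = new StringBuilder(fragments.get(0));
        for (int i = 0; i < values.size(); i++) {
            sb.append(values.get(i)).append(fragments.get(i + 1));
        }
        return sb.toString();
    }
}

// Hypothetical wrapper showing the overload pair an interpolation-safe API could offer.
final class TemplateWriter {
    private final PrintWriter out;

    TemplateWriter(PrintWriter out) {
        this.out = out;
    }

    void println(String s) {
        out.println(s);
    }

    // The library, not the caller, decides that plain interpolation is safe in this domain.
    void println(Template t) {
        println(t.interpolate());
    }
}
```

Overload resolution picks println(Template) purely from the static type of the argument, so the call site needs no extra ceremony, and an API that is not interpolation-safe simply does not offer the second overload.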
>>
>> Similarly, the FMT processor is replaced by an overload of 
>> String::format that interprets templates with embedded format 
>> specifiers (e.g., “%d”):
>>
>>   String format(String formatString, Object... parameters) { … same as today … }
>>   String format(StringTemplate template) {... equivalent of FMT ...}
>>
>> And users can call this as:
>>
>>   String s = String.format("Hello %12s\{name}");
>>
>> Here, the String::format API has chosen to interpret string templates 
>> according to the rules previously specified in the FMT processor (not 
>> ordinary interpolation), but that choice is embedded in the library 
>> semantics so no further explicit choice at the use site is required. 
>>  The user already chose to pass it to String::format; that’s all the 
>> processing selection that is needed.
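
A rough sketch of how such an overload could be built on the existing String.format, assuming a hypothetical Template record in place of StringTemplate. The Formats class, the regular expression, and the rule "a value with no specifier in front of it gets %s" are simplifications made for this example and are not the FMT specification.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical stand-in for StringTemplate: n+1 literal fragments around n embedded values.
record Template(List<String> fragments, List<Object> values) {}

final class Formats {
    // Rough match for a format specifier that ends a fragment, e.g. "%12s" or "%-8.2f".
    private static final Pattern TRAILING_SPEC =
            Pattern.compile("%[-#+ 0,(]*\\d*(\\.\\d+)?[a-zA-Z]$");

    // Assumes any literal percent sign in a fragment is already written as "%%".
    static String format(Template t) {
        StringBuilder fmt = new StringBuilder();
        List<String> f = t.fragments();
        for (int i = 0; i < t.values().size(); i++) {
            fmt.append(f.get(i));
            // A value with no specifier directly in front of it is formatted with %s.
            if (!TRAILING_SPEC.matcher(f.get(i)).find()) {
                fmt.append("%s");
            }
        }
        fmt.append(f.get(f.size() - 1));
        // Locale.ROOT keeps the sketch's output independent of the default locale.
        return String.format(Locale.ROOT, fmt.toString(), t.values().toArray());
    }
}
```

Because the specifiers already sit in the fragments right before each value, the fragments concatenate into an ordinary format string and the values become its arguments in order.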
>>
>> Where APIs do not express a choice of what template expansion means, 
>> users continue to be free to process them explicitly before passing 
>> them, using APIs that do (such as String::format or ordinary 
>> interpolation).
>>
>> The result is:
>>
>> - The need for use-site "goop" (previously, the processor name; now, 
>> static or instance methods to process a template) goes away entirely 
>> when dealing with libraries that are already template-friendly.
>> - Even with libraries that require use-site goop, it is no more 
>> intrusive than before, and can be reduced over time as APIs get with 
>> the program.
>> - StringTemplate is just another type that APIs can support if they 
>> want.  The "DB" processor becomes an ordinary factory method that 
>> accepts a string template or an ordinary builder API.
>> - APIs now can have _more_ control over the timing and meaning of 
>> template processing, because we are not biasing so strongly towards 
>> early processing.
>> - It becomes easier to abstract over template processing (i.e., 
>> combine or manipulate templates as templates before processing)
>> - Interpolation remains an explicit choice, but ST-aware libraries 
>> can make this choice on behalf of the user.
>> - The language feature and API surface get considerably smaller, 
>> which is good.  Core JDK APIs (e.g., println, format, exception 
>> constructors) get upgraded to work with string templates.
>>
>> The remaining question that everyone is probably asking is: “so how 
>> do we do interpolation.”  The answer there is “ordinary library 
>> methods”.  This might be a static method 
>> (String.join(StringTemplate)) or an instance method 
>> (template.join()), shed to be painted (but please, not right now).
>>
>> This is a sketch of direction, so feel free to pose 
>> questions/comments on the direction.  We’ll discuss the details as we 
>> go.
>>
>>
>


More information about the amber-spec-experts mailing list