Update on String Templates (JEP 459)

Clement Cherlin ccherlin at gmail.com
Thu Mar 14 19:04:02 UTC 2024


I think there are a few basic use cases which everyone wants to be
safe and ergonomic.

1. New APIs that accept StringTemplate, not String, and do processing
with the value above and beyond direct interpolation (SQL queries,
HTML/XML escaping, transforming to JSON, etc.).
2. Existing APIs that accept String or (String, Object...) that have
StringTemplate support added, such as PrintWriter::println or
String::format.
3. Old APIs that have not been (and may never be) updated to accept
StringTemplate, but to which we still want to pass interpolated strings.

# Problems

Use case #1:
Issues passing constant templates if there is no explicit syntactic
distinction between string and template literals.

Use case #2:
Complicated and potentially erroneous overload selection if there is
no explicit syntactic distinction between string and template
literals.

Use case #3:
Passing interpolated templates to APIs that only support String,
without excess ceremony.

# Proposed Solution

I believe a single solution (hopefully) addresses all three of these
problems.

Prefixing a template with an explicit processor was nice in one way,
because the processor made the semantics of the interpolation
explicit. However, processors were more trouble than they were worth.

What if instead of the extremes of a myriad of processors, or a single
template prefix, or no prefix and complex/confusing context rules, we
have exactly two prefixes? To avoid bikeshedding (obviously, the final
names would be much shorter), I will call them TEMPLATE and
INTERPOLATE. These are semantically identical to the old RAW and STR
processors respectively, but syntactically have no "." between them
and the leading quote.

TEMPLATE"hey \{name}" -> StringTemplate
INTERPOLATE"hey \{name}" -> String

Unlike processors, these two are the *only* valid prefixes.

This brings back the clarity of RAW and STR without the complexity of
processor classes. Processing of TEMPLATE literals is done by normal
methods that take StringTemplate. INTERPOLATE literals evaluate
directly to regular Strings.
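
As a rough illustration of the split, here is a minimal sketch in
plain Java. Neither prefix exists in any compiler, so the nested
Template record below is a hypothetical stand-in for StringTemplate
(fragments plus embedded values), and interpolate() is an assumed
helper that models what an INTERPOLATE literal would evaluate to.

```java
import java.util.List;

final class PrefixSketch {
    // Hypothetical stand-in for StringTemplate: literal fragments plus the
    // embedded values, with exactly one more fragment than values.
    record Template(List<String> fragments, List<Object> values) {
        Template {
            if (fragments.size() != values.size() + 1) {
                throw new IllegalArgumentException(
                        "need exactly one more fragment than values");
            }
        }

        // Assumed helper: plain left-to-right concatenation, which is what an
        // INTERPOLATE literal is described as producing.
        String interpolate() {
            StringBuilder sb = new StringBuilder(fragments.get(0));
            for (int i = 0; i < values.size(); i++) {
                sb.append(values.get(i)).append(fragments.get(i + 1));
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        String name = "Duke";

        // Models TEMPLATE"hey \{name}": the structure is still available.
        Template template = new Template(List.of("hey ", ""), List.of(name));
        System.out.println(template.fragments().size() + " fragments, "
                + template.values().size() + " value");

        // Models INTERPOLATE"hey \{name}": only the final String remains.
        String interpolated = template.interpolate();
        System.out.println(interpolated); // prints "hey Duke"
    }
}
```

The point of the sketch is only that the template form keeps the
fragments and values separate for a method to inspect, while the
interpolated form is an ordinary String.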

The two kinds of expressions can have different translation
strategies, like constant-ification of INTERPOLATE expressions with
constant values, as Guy suggests.
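
To make the constant-folding point concrete, here is a small sketch
using today's Java. It assumes the reading quoted below, in which an
interpolation literal is expanded into uses of "+"; the class and
field names are made up for illustration.

```java
final class ConstantFolding {
    static final String NAME = "world";

    // Models INTERPOLATE"Hello, \{NAME}." under the "expand to +" reading.
    // Every operand is a constant expression, so the whole thing is one too.
    static final String GREETING = "Hello, " + NAME + ".";

    static String classify(String s) {
        switch (s) {
            // A case label must be a constant expression, so this compiles
            // only because GREETING was folded at compile time.
            case GREETING:
                return "greeting";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("Hello, world.")); // prints "greeting"
        System.out.println(classify("Goodbye."));      // prints "other"
    }
}
```

If NAME were not a constant variable, GREETING would no longer be a
constant expression and the case label would be rejected by the
compiler.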

# Examples

Use case #1
// OK
generateQuery(TEMPLATE"update table \{tableName} set \{column} = \{value} where \{whereExp}");
// incompatible type error
generateQuery(INTERPOLATE"update table \{tableName} set \{column} = \{value} where \{whereExp}");
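
For illustration, here is one way a method like generateQuery might
use the template structure. This is only a sketch under assumptions:
the Template and Query records are hypothetical stand-ins, and every
embedded value is bound as a "?" parameter. A real implementation
would need to treat identifiers such as table and column names
differently, so the example uses values only.

```java
import java.util.List;

final class QuerySketch {
    // Hypothetical stand-in for StringTemplate (one more fragment than values).
    record Template(List<String> fragments, List<Object> values) {}

    // Hypothetical result type: SQL text with placeholders plus bound parameters.
    record Query(String sql, List<Object> params) {}

    // Emits a "?" wherever the template had an embedded value, so the values
    // never become part of the SQL text itself.
    static Query generateQuery(Template t) {
        StringBuilder sql = new StringBuilder(t.fragments().get(0));
        for (int i = 0; i < t.values().size(); i++) {
            sql.append('?').append(t.fragments().get(i + 1));
        }
        return new Query(sql.toString(), t.values());
    }

    public static void main(String[] args) {
        String hostile = "x'; drop table users; --";

        // Models TEMPLATE"update users set name = \{hostile} where id = \{42}"
        Template t = new Template(
                List.of("update users set name = ", " where id = ", ""),
                List.of(hostile, 42));

        Query q = generateQuery(t);
        System.out.println(q.sql());    // prints "update users set name = ? where id = ?"
        System.out.println(q.params().size() + " bound parameters");
    }
}
```

An already-interpolated String carries no such structure, which is why
rejecting it with a type error is the desirable outcome here.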

Use case #2
System.out.println(TEMPLATE"Hello, \{world}!"); // OK
// OK, and if 'world' is constant, it may be folded
System.out.println(INTERPOLATE"Hello, \{world}!");
String.format(TEMPLATE"I am %d\{age} years old"); // OK
// IDE warning and runtime exception, because the format string doesn't
// match the number of parameters
String.format(INTERPOLATE"I am %d\{age} years old");
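
The runtime failure in the last line can be reproduced with today's
Java by writing out the concatenation that the INTERPOLATE form would
evaluate to. A minimal sketch (the class and helper names are made up
for illustration):

```java
import java.util.MissingFormatArgumentException;

final class FormatPitfall {
    // Returns the format specifier that had no matching argument,
    // or null if formatting succeeded.
    static String missingSpecifier(String format, Object... args) {
        try {
            String.format(format, args);
            return null;
        } catch (MissingFormatArgumentException e) {
            return e.getFormatSpecifier();
        }
    }

    public static void main(String[] args) {
        int age = 30;

        // What INTERPOLATE"I am %d\{age} years old" would evaluate to:
        String interpolated = "I am %d" + age + " years old";
        System.out.println(interpolated);                   // prints "I am %d30 years old"

        // The value is already baked into the text, so %d has no argument.
        System.out.println(missingSpecifier(interpolated)); // prints "%d"

        // The conventional call, with the value passed separately, succeeds.
        System.out.println(String.format("I am %d years old", age));
    }
}
```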

Use case #3
// incompatible type error
someOldStringMethod(TEMPLATE"some runtime values go here: \{value1} and here: \{value2}");
// OK
someOldStringMethod(INTERPOLATE"some runtime values go here: \{value1} and here: \{value2}");

What do you think?

Cheers,
Clement Cherlin

On Thu, Mar 14, 2024 at 12:44 PM Maurizio Cimadamore
<maurizio.cimadamore at oracle.com> wrote:
>
> Not to pour too much cold water on the idea of having string interpolation literal, but I’d like to mention a few points here.
>
> First, it was a deliberate design goal of the string template feature to make interpolation an explicit act. Note that, if we had the syntax you describe, we actually achieve the opposite effect: string interpolation is now the default, and implicit, and actually cheaper (to type) than the safer template alternative. This is a bit of a red herring, I think.
>
> The second problem is that interpolation literals can sometimes be deceiving. Consider this example:
>
> String.format("Hello, my name is %s{name}"); // can you spot the bug?
>
> Where String::format has a new overload which accepts a StringTemplate.
>
> Basically, since here we forgot the leading “$” (or whatever char that is), the whole thing is just a big interpolation. Semantically equivalent to:
>
>  String.format("Hello, my name is %s" + name); // whoops!
>
> This will fail, as String::format will be waiting for an argument (a string), but none is provided. So:
>
> |  Exception java.util.MissingFormatArgumentException: Format specifier '%s'
> |        at Formatter.format (Formatter.java:2672)
> |        at Formatter.format (Formatter.java:2609)
> |        at String.format (String.java:2897)
> |        at (#2:1)
>
> This is a very odd (and new!) failure mode, that I’m sure is gonna surprise developers.
>
> Maurizio
>
> On 14/03/2024 15:08, Guy Steele wrote:
>
> Second thoughts about how to explain a string interpolation literal:
>
> On Mar 13, 2024, at 2:02 PM, Guy Steele <guy.steele at oracle.com> wrote:
> . . .
>
> —————————
> String is not a subtype of StringTemplate; they are disjoint types.
>
> $”foo”              is a (trivial) string template literal
> “foo”                is a string literal
>         $”Hello, \{x}”     is a (nontrivial) string template literal
>         “Hello, \{x}”      is a shorthand (expanded by the compiler) for `String.of($“Hello, \{x}”)`
> —————————
>
> Given that the intent is that String.of (or whatever we want to call it—possibly the `interpolation` instance method of class `StringTemplate` rather than a static method `String.of`) should just do standard string concatenation, we might be better off just saying that a string interpolation literal is expanded by the compiler into uses of “+”; for example,
>
>          “Hello, \{x}.”
>
> (I have added a period to the example to make the point clearer) is expanded into
>
>         “Hello, “ + x + “.”
>
> and in general
>
>         “c0\{e1}c1\{e2}c2…\{en}cn”
>
> (where each ck is a possibly empty sequence of string characters and each ek is an expression)  is expanded into
>
>         “c0” + (e1) + “c1” + (e2) + “c2” + … + (en) + “cn”
>
> The point is that, with this definition, “c0\{e1}c1\{e2}c2…\{en}cn” is a constant expression iff every ek is a constant expression. This is handy for interpolating constant variables into a string that is itself intended to be constant.
>
> —Guy
>


More information about the amber-spec-observers mailing list