Update on String Templates (JEP 459)

Mon Mar 11 18:24:31 UTC 2024

My thinking pretty much matched Brian’s analysis below, until I saw Archie’s examples and thought about them. Four points:

(1) I would add one more simplistic argument in favor of “different”: I see some value in a reader of code having fair warning, quite visible and up front, that what looks like a string may actually contain executable code (possibly having side effects). (Maybe this is related to what Brian meant by “The prefix sigil means no one has to ‘buffer’ when interpreting the code.”)

I think we do want such a warning, if present, to be concise but hard to overlook, and I think the choice of “$” fits that bill admirably. (Pro: The character “$” is associated with string interpolation in a number of other languages, including C#, Dart, Groovy, JavaScript, Julia, Kotlin, PHP, TCL, TypeScript, and Visual Basic. Con: Of the languages just listed, those that use “$” before the opening double quote are C# and Visual Basic, and the proposed Java syntax is not otherwise identical to the syntax of C# and Visual Basic, which enclose expressions in _unescaped_ braces.)

(2) Because “$” is an identifier in Java, it suggests that we can hold open a possible future where we allow other string-prefix sigils having the syntax of an identifier, but without really committing to that generality at this time.
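
As a small illustration of point (2), here is a minimal sketch showing that “$” already lexes as an ordinary identifier in today's Java. The class and method names are hypothetical and chosen only for this example. As far as I can tell, an identifier immediately followed by a string literal (as in $"...") matches no existing grammar production, which is what would leave that position free for a prefix sigil.

```java
public class DollarIdentifierDemo {
    // "$" is a legal (though discouraged) identifier, so it can name a method.
    static String $(String s) {
        return "[" + s + "]";
    }

    static int sum() {
        int $ = 40;   // "$" used as a local variable name
        int $$ = 2;   // "$$" is a legal identifier as well
        return $ + $$;
    }

    public static void main(String[] args) {
        // Today this is an ordinary method call, not a string template.
        System.out.println($("hello"));
        System.out.println(sum());
    }
}
```

Note that $("hello") with parentheses remains a plain method invocation, so a prefix form would be distinguished only by the missing parentheses.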

(3) Because “$” is a _discouraged_ identifier in Java (see JLS §3.8: “The dollar sign should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems.”), in practice all occurrences of dollar signs would in fact flag string templates.

(4) Archie's suggestion does not create an alternate syntax for the Plain Old String Literals we have had in Java since its inception.

For these reasons, I recommend that Archie’s suggestion (and perhaps also the C#/Visual Basic variation) be given careful (re-)consideration at this time.

On Mar 11, 2024, at 1:36 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

The overlap between string literals and string template literals is indeed a tricky one, and bears some review of the options.  Obviously string templates and strings have some things in common (it’s in the name!), but they are also different and evaluate to different types.  So how “same” or “different” should they look?

Simplistic arguments in favor of “different”:
 - Ambiguity is bad, clarity is good
 - String / string template literals can be both wide and tall; having to examine the entirety to know which it is could be confusing
 - Simpler for compiler and specification writers

Simplistic arguments in favor of “same”:
 - Will be perceived as “fussy” or distracting
 - Users are already grumpy that we’re not doing “string interpolation” and calling it a day
 - Most of the time, it is perfectly obvious which one it is
 - Have to make up yet another new and unfamiliar syntax to disambiguate, think of the bike shedding

There are probably others, but none of these seem like slam-dunks one way or the other.

There are a few choices here:

 - Keep the current syntax approach
 - Give STs a new syntax
 - Give both STs and string literals an _optional_ new syntax, such as I_IZ_STRING”…” and TEMPLATZ”…”, but allow the current approach when disambiguation is not needed

The last seeks a compromise between the current path and the desire for explicitness.  Suppose we allowed s”…” and t”…” literals, where the sigils were optional.  What then?

Obviously in the cases which are currently ambiguous-seeming, users could disambiguate explicitly.  The prefix sigil means no one has to “buffer” when interpreting the code.  That’s nice.  Having two ways to write classical string literals might confuse people who haven’t seen them before, or stimulate unproductive “style wars”.  That’s probably not too big a problem here.

Overall, though, I am not so enthused about creating yet another new lexical mechanism for having different kinds of stringy things.  The value is … meh, and it seems an attractive nuisance.  In other languages with multiple “flavors” of string, there is a tendency to proliferate more flavors.  (Raw strings, anyone?)

My take is that this is something that is bothering us a lot because it is new, but I’m skeptical that it carries its weight.

On Mar 11, 2024, at 9:07 AM, Archie Cobbs <archie.cobbs at gmail.com> wrote:

On Mon, Mar 11, 2024 at 9:37 AM Remi Forax <forax at univ-mlv.fr> wrote:
I vote for making string templates explicit.

Caveat: I've been following this discussion only loosely so I'm likely to say something stupid/ignorant/redundant; if so please ignore.

But I am tending to agree with Remi. The recent simplifications Brian described are a definite improvement, but now we're left with a new question:

What is the advantage of having the language literals for String and StringTemplate look so confusingly similar?

Reversing that question, I'm not seeing the big downside of having a simple prefix for literals like this:

    var s = "this is a string";
    var st1 = $"this is a (degenerate) template";
    var st2 = $"this is also a \{template}";
    var x = "this is a \{lexical_error}";
    myobj.someOverloadedMethod($"this is definitely a template");
    myobj.someOverloadedMethod("this is definitely a string!");     // no need to consult javadoc here

Seems like the trade-off is straightforward:

Cost: one character
Benefit: instant disambiguation clarity in the developer's mind

At least, it makes the whole API design/overload question straightforward.
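
To make the overload point concrete, here is a minimal sketch that compiles on current Java. It assumes a hypothetical record named Template as a stand-in for StringTemplate (which is a preview API), and it builds the template by hand where the proposed syntax would use a prefixed literal. It only shows that overload selection follows the static type of the argument, which an explicit prefix would make visible at the call site.

```java
import java.util.List;

public class OverloadSketch {
    // Hypothetical stand-in for StringTemplate: text fragments plus embedded values.
    record Template(List<String> fragments, List<Object> values) {}

    static String someOverloadedMethod(String s) {
        return "String overload";
    }

    static String someOverloadedMethod(Template t) {
        return "Template overload";
    }

    public static void main(String[] args) {
        String name = "Duke";
        // Built by hand; stands in for the proposed $"Hello \{name}!" literal.
        Template t = new Template(List.of("Hello ", "!"), List.of(name));

        System.out.println(someOverloadedMethod("Hello Duke!"));
        System.out.println(someOverloadedMethod(t));
    }
}
```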

Put another way, StringTemplates are a cool new language feature, and as such it seems like they deserve a "first-class" allotment in the syntax of the language.

-Archie
