Update on String Templates (JEP 459)

Gavin Bierman gavin.bierman at oracle.com
Sun Mar 17 17:14:10 UTC 2024


Hi Remi,

Yes, I think this is a good way to think about the design space. (It is a shame that the fact that this is NOT about string interpolation, but something much more general and focused on security - even though made explicit in the JEP - has been lost in some of the wider discussions.)

You can make the distinction even clearer. Reading from the spec, a template "\{x} + \{y}" can be thought of as sugar for the expression new $HiddenClassImplementsStringTemplate(List.of("", " + ", ""), List.of(x, y)). So, sure, it's an object that has the potential to be a string, but it's an object with a couple of lists in it. The fact that the embedded values are kept as a separate list, and so can be validated and dealt with using domain-specific logic, is the key to safety. You need to write code to transform template values into something else (perhaps a string). In the old model, that was the role of the processor (and the reason why processors came first: to remind you that the template needed processing to get a value); in the new model it will be a method. I agree with you that any design that makes it easy to conflate templates with strings is a road to another 30+ years of injection attacks.
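As a rough illustration of the "object with a couple of lists" view, here is a minimal sketch in plain Java. It does not use the preview StringTemplate API: the Template record, the HtmlSketch class, and the escape and toHtml methods are all hypothetical names invented for this example, and HTML escaping is just one possible domain-specific transformation.

```java
import java.util.List;

final class HtmlSketch {
    // Hypothetical stand-in for the hidden template class:
    // literal fragments and embedded values are kept in separate lists.
    record Template(List<String> fragments, List<Object> values) {
        Template {
            if (fragments.size() != values.size() + 1) {
                throw new IllegalArgumentException("need exactly one more fragment than values");
            }
        }
    }

    // Escape a single embedded value for use in HTML text content.
    static String escape(Object value) {
        return String.valueOf(value)
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Domain-specific transformation: fragments are copied as written,
    // embedded values are escaped before being joined in.
    static String toHtml(Template t) {
        StringBuilder sb = new StringBuilder(t.fragments().get(0));
        for (int i = 0; i < t.values().size(); i++) {
            sb.append(escape(t.values().get(i))).append(t.fragments().get(i + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String x = "<script>";
        int y = 2;
        // Hand-written equivalent of the template "\{x} + \{y}"
        Template t = new Template(List.of("", " + ", ""), List.of(x, y));
        System.out.println(toHtml(t)); // prints "&lt;script&gt; + 2"
    }
}
```

Because the values arrive as a separate list, the method can treat them differently from the literal fragments, which is not possible once everything has been flattened into one String.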

Gavin


On 16 Mar 2024, at 07:18, Remi Forax <forax at univ-mlv.fr> wrote:



________________________________
From: "Maurizio Cimadamore" <maurizio.cimadamore at oracle.com>
To: "Guy Steele" <guy.steele at oracle.com>
Cc: "amber-spec-experts" <amber-spec-experts at openjdk.org>
Sent: Friday, March 15, 2024 5:31:28 PM
Subject: Re: Update on String Templates (JEP 459)

Hi

On 15/03/2024 16:07, Guy Steele wrote:

Then again, now that I ponder the space of use cases, it may be that, despite my initial enthusiasm, having a separate string interpolation syntax may not carry its weight if its uses are relatively rare. We always have the option of using a string template and then applying an interpolation processor (which might be spelled `String.of(<template>)` or `(<template>).interpolate()` or some other way), and about all we lose from that approach is the ability to use string interpolation to specify a constant expression—for which we still have the old-fashioned alternative of using `+` concatenation. If we drop string interpolation, we can then drop the INTERPOLATION prefix, and we are back to a single-prefix model, and the remaining question is whether that prefix is optional, at least in some cases. Okay, I think I now have a better understanding of the relationships among the various proposals in the design space. Thanks for your patience.

I think the advantage of not having a string interpolation prefix is that interpolation is then “just another processor”, e.g. a static method somewhere that takes a string template and returns a String. Another String::format, in a way. So that leads to a rather uniform design.
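A minimal sketch of interpolation as an ordinary static method, assuming a hypothetical Template record in place of the real StringTemplate type (InterpolateSketch and interpolate are likewise invented names, not part of any proposal):

```java
import java.util.List;

final class InterpolateSketch {
    // Hypothetical stand-in for StringTemplate; assumes one more fragment than values.
    record Template(List<String> fragments, List<Object> values) {}

    // Plain interpolation: concatenates fragments and values in order,
    // with no validation or escaping of the values.
    static String interpolate(Template t) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < t.values().size(); i++) {
            sb.append(t.fragments().get(i)).append(t.values().get(i));
        }
        return sb.append(t.fragments().get(t.fragments().size() - 1)).toString();
    }

    public static void main(String[] args) {
        Template t = new Template(
                List.of("Hello, ", "! You have ", " messages."),
                List.of("Ana", 3));
        System.out.println(interpolate(t)); // prints "Hello, Ana! You have 3 messages."
    }
}
```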


And now that I have that better understanding, I think I lean toward (a) abandoning string interpolation and (b) having a single, short, _non-optional_ prefix for templates (“$” would be a plausible choice), on the grounds that I think it makes code more readable if templates are always distinguished up front from strings—and this is especially helpful when the templates are rather long and any `\{` present might be far from the beginning. It has a minimal number of cases to explain:

"…"     string literal, must not contain \{…}, type String
$"…"    template literal, may contain \{…}, type StringTemplate

Yep, I agree this is a very principled way to look at the problem.

[...]

This is how I like to explain the design space to myself.
We have two kinds of strings, tainted strings and untainted strings (this is not new, see [1]).
An untainted string is a string that can be escaped properly, in our case a StringTemplate. A tainted string is just a String.

We do not want a String to be a StringTemplate, because it means all untainted strings are tainted strings.
We do not want a StringTemplate to be a String, because it means that all tainted strings are untainted strings.
So both are different types, with neither a subtype relationship nor an automatic conversion between them.

For the literals, we need two different constructs; otherwise we would have a conversion between tainted and untainted strings.
We also need the literal that constructs an untainted string to be different and marked up front, so that an untainted string is easy to distinguish from a tainted string, so
- "..." constructs a String, a tainted string,
- TEMPLATE"..." constructs a StringTemplate, an untainted string.
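To make the "two unrelated types" point concrete, here is a small sketch of an API that accepts only the template type. Everything in it is hypothetical and chosen for illustration: the Template and Query records, the QuerySketch class, and the prepare method. It assumes the template carries one more fragment than values.

```java
import java.util.List;

final class QuerySketch {
    // Hypothetical stand-in for StringTemplate; deliberately unrelated to String.
    record Template(List<String> fragments, List<Object> values) {}

    // Hypothetical result type: SQL text with placeholders plus the bound parameters.
    record Query(String sql, List<Object> params) {}

    // Accepts only a Template. Passing a String does not compile, because
    // there is neither subtyping nor a conversion between the two types.
    static Query prepare(Template t) {
        // Fragments come from the literal; each embedded value becomes a bound parameter.
        return new Query(String.join("?", t.fragments()), List.copyOf(t.values()));
    }

    public static void main(String[] args) {
        String name = "Robert'); DROP TABLE students;--";
        Template t = new Template(
                List.of("SELECT * FROM students WHERE name = ", ""),
                List.of(name));
        Query q = prepare(t);
        System.out.println(q.sql());    // prints "SELECT * FROM students WHERE name = ?"
        System.out.println(q.params()); // the untrusted value stays a parameter
        // prepare("SELECT ... " + name); // would be rejected at compile time
    }
}
```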

As for string interpolation, it is another way to create a String and is not directly related to whether a string is tainted or not, so it's kind of orthogonal in terms of design.
It cannot be a prefix like INTERPOLATE, because it is different in nature from TEMPLATE: TEMPLATE creates another kind of String, whereas interpolation creates just a String.
Having a static method (a processor) that creates a String from a StringTemplate creates a common conduit to get a tainted string from any untainted string, which makes the distinction between untainted and tainted strings less relevant. So I would advise against going in that direction.


Maurizio

Rémi

[1] https://en.wikipedia.org/wiki/Taint_checking

