Feedback: String Templates (JEP 430)
forax at univ-mlv.fr
Sat Apr 1 06:43:40 UTC 2023
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Reinier Zwitserloot" <reinier at projectlombok.org>, "Brian Goetz"
> <brian.goetz at oracle.com>, "amber-dev" <amber-dev at openjdk.java.net>
> Sent: Friday, March 31, 2023 10:23:20 PM
> Subject: Re: Feedback: String Templates (JEP 430)
> On 31 Mar 2023, at 12:30, Remi Forax wrote:
>> …
>> I agree that interpolate() is too easy to misuse, but at the same time it's a
>> useful primitive.
> +1
>> I wonder if the solution is to add an escape function: a function that takes an
>> Object and returns an Object, used to escape each value before it is interpolated.
>> Something like
>> public String StringTemplate.interpolate(UnaryOperator<Object> escapeFunction) {
>> ... }
>> By asking for an escape function, we are making the API safer to use.
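Remi's proposal can be sketched as follows. `MiniTemplate` is a hypothetical stand-in for the template object (n + 1 fragments around n values), not the JEP 430 `StringTemplate` type; the sketch only shows that every value passes through the caller's escape function while the fragments are copied verbatim.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a string template: n + 1 fragments around n values.
record MiniTemplate(List<String> fragments, List<?> values) {

    MiniTemplate {
        if (fragments.size() != values.size() + 1) {
            throw new IllegalArgumentException("need exactly one more fragment than values");
        }
    }

    // Interpolates the template, passing every value through escapeFunction first.
    // Fragments are trusted and appended verbatim; only the values are escaped.
    String interpolate(UnaryOperator<Object> escapeFunction) {
        StringBuilder sb = new StringBuilder(fragments.get(0));
        for (int i = 0; i < values.size(); i++) {
            sb.append(escapeFunction.apply(values.get(i)));
            sb.append(fragments.get(i + 1));
        }
        return sb.toString();
    }
}
```

Passing the identity function, `interpolate(x -> x)`, gives back plain concatenation, which is the ceremony-only workaround John points out in his reply.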
> But the workaround is saying interpolate(x->x) and grumbling about ceremony.
> That workaround doesn’t get much closer to exposing the root problem. Also, the
> unary operator, if we were to do this functionally as suggested, needs to “see”
> more context about where each value is going to be interpolated.
> It seems to me that part of the problem here is the responsibility for escaping
> is on the wrong side of the fence, with interpolate as presently discussed.
> When offered a function whose contract knows only about string joining, the
> party producing stringy bits to join into a correct DSL statement is burdened
> with the responsibility of figuring out how and when to escape them in their
> various DSL-specific contexts.
> But surely that knowledge belongs more exactly to the ST processor, not to the
> supplier of interpolation values. Some languages have context-dependent quoting
> rules. Note that Remi’s suggested unary function doesn’t see the context. It
> could be given context as interpolate((x,c)->…), and that raises a very good
> question about the type of c. What I’m saying is that c is not something the
> supplier of interpolation values should be forced to worry about.
> For example, in SQL, values are quoted with a single quote but names are quoted
> with a different kind of quote, often vendor-specific; yuck. Fighting against
> code-injection might require getting the correct contextual flavor of quote in
> each case, if not for SQL then for more complex templated notations. It’s lucky
> for SQL that textually doubling ' to '' will cover most use cases, regardless
> of context, but that’s just luck. JSON has distinct kinds of values, which
> would require distinct tactics for validation and/or quoting; you need
> quote-escapes for string bodies and field names but not for numbers. If you
> were trying to do Java templates you’d want to know the contextual difference
> between char and string literals.
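To make the SQL case concrete, here is a minimal sketch assuming ANSI SQL rules: string literals are wrapped in single quotes with embedded single quotes doubled, and delimited identifiers are wrapped in double quotes with embedded double quotes doubled. `SqlQuoting` is a hypothetical helper; dialects differ (MySQL, for instance, quotes identifiers with backticks by default and treats backslash as an escape character in string literals unless configured otherwise).

```java
// Hypothetical helpers illustrating context-dependent quoting in ANSI SQL.
// Real dialects differ, and a real SQL processor should prefer bind
// parameters over quoting values at all.
final class SqlQuoting {
    private SqlQuoting() {}

    // Value context: a string literal, delimited by ' with embedded ' doubled.
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Name context: a delimited identifier, delimited by " with embedded " doubled.
    static String identifier(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }
}
```

The same input needs a different rule in each place: `literal` produces a string value, not a column reference, so using it in a name position silently changes the meaning of the query.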
> Generally speaking, getting the quoting right is not the direct responsibility
> of people supplying values to interpolate, but rather the
> responsibility of the party weaving together a (correct) template (SQL or JSON
> or …). Asking the value-supplier to shoulder the burden of correct quoting
> requires a mix of two kinds of expertise (business logic and query syntax),
> which is how bugs happen.
> I think I would prefer to see a formulation of interpolate which would require
> users to take apart the ST processor, lower it into a plain-string-cat template
> processor, and then run a natively string-cat-ing format operation on it; after
> that it can be lifted back to its DSL, with fingers crossed that we avoided
> bad injections. But I admit I haven’t figured out the details, so that’s just a
> vague suggestion…
> What I hope is clear is my point about separating concerns, between knowing how
> and when to escape a value in a particular place, and coming up with a set of
> interpolation values for those places. It’s rooted in the distinction between
> an envelope and its contents. Quoting (and validation) is something
> envelope-specific. Contents are usually specific to some completely unrelated
> domain of business logic. Unless API users are helped to separate those
> concerns, there will be confusion, exploitable in attacks.
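The envelope/contents split can be sketched for JSON. In this hypothetical `JsonTemplates.interpolate`, the supplier passes plain business values and the processor decides how each one is written: integers and booleans stay bare, and everything else is escaped as a JSON string (quote, backslash and control characters, per RFC 8259). The sketch assumes every hole sits in a value position, which is exactly the kind of context knowledge that belongs to the processor.

```java
import java.util.List;

// Hypothetical JSON "envelope": the processor, not the value supplier,
// decides how each value is written, based on the value's runtime type.
final class JsonTemplates {
    private JsonTemplates() {}

    // Assumes fragments.size() == values.size() + 1 and that every hole
    // sits in a JSON value position (not inside a string or a key).
    static String interpolate(List<String> fragments, List<?> values) {
        if (fragments.size() != values.size() + 1) {
            throw new IllegalArgumentException("need exactly one more fragment than values");
        }
        StringBuilder sb = new StringBuilder(fragments.get(0));
        for (int i = 0; i < values.size(); i++) {
            appendJsonValue(sb, values.get(i));
            sb.append(fragments.get(i + 1));
        }
        return sb.toString();
    }

    private static void appendJsonValue(StringBuilder sb, Object value) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof Boolean || value instanceof Integer || value instanceof Long) {
            sb.append(value); // written bare: no quotes needed
        } else {
            appendJsonString(sb, value.toString()); // everything else becomes a JSON string
        }
    }

    // Escapes the characters RFC 8259 requires: quote, backslash, control characters.
    private static void appendJsonString(StringBuilder sb, String s) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        sb.append('"');
    }
}
```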
Thanks John,
A string template processor takes a template (a List<String>), an argument descriptor (a MethodType) and the values (a List<Object>), and produces a result:
List<String> -> MethodType -> List<Object> -> result
The current implementation provides two API points: TS.Processor.Linkage groups the arguments this way
(List<String>, MethodType) -> List<Object> -> result
while TS.Processor groups the arguments into a TemplatedString, an intermediary object:
TemplatedString (List<String>, MethodType, List<Object>) -> result
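The two groupings can be written as Java functional types. The names below (`LinkageStyle`, `Templated`, `TemplatedStyle`, `ProcessorShapes`) are hypothetical, not the JEP 430 interfaces; the sketch only shows that the linkage shape lets work on the fragments happen once, while the TemplatedString shape hands over everything on every evaluation.

```java
import java.lang.invoke.MethodType;
import java.util.List;
import java.util.function.Function;

// Hypothetical shapes, not the JEP 430 interfaces.
final class ProcessorShapes {
    private ProcessorShapes() {}

    // Linkage style: (List<String>, MethodType) -> (List<Object> -> R).
    // The outer call can run once per call site; the returned function runs per evaluation.
    interface LinkageStyle<R> {
        Function<List<Object>, R> link(List<String> fragments, MethodType type);
    }

    // TemplatedString style: everything arrives together on every evaluation.
    record Templated(List<String> fragments, MethodType type, List<Object> values) {}

    interface TemplatedStyle<R> {
        R process(Templated template);
    }

    // Plain concatenation in the linkage style: the fragments are checked
    // and copied once, at link time; only the values vary afterwards.
    static final LinkageStyle<String> CONCAT_LINKAGE = (fragments, type) -> {
        if (fragments.size() != type.parameterCount() + 1) {
            throw new IllegalArgumentException("fragment/parameter count mismatch");
        }
        List<String> frozen = List.copyOf(fragments); // per-call-site work, done once
        return values -> {
            StringBuilder sb = new StringBuilder(frozen.get(0));
            for (int i = 0; i < values.size(); i++) {
                sb.append(values.get(i)).append(frozen.get(i + 1));
            }
            return sb.toString();
        };
    };

    // The same processor in the TemplatedString style: the per-call-site work
    // is redone on every call unless the implementer caches it by hand.
    static final TemplatedStyle<String> CONCAT_TEMPLATED =
        t -> CONCAT_LINKAGE.link(t.fragments(), t.type()).apply(t.values());
}
```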
The question is what primitive to provide to a TS.Processor that uses string interpolation internally (TS.interpolate()).
A SQL TS.Processor, I hope, will use a PreparedStatement internally and not rely on string interpolation, so it will not call TS.interpolate(); but anyway, your point about processing the fragments separately from the values holds.
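A minimal sketch of that approach, assuming JDBC: the fragments become the SQL text with one `?` placeholder per hole, and the values are bound with `PreparedStatement.setObject`, so they never become part of the SQL string. `SqlBinding` is hypothetical, and placeholders only cover value positions; table or column names would still need the context-aware quoting John describes.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: turns template fragments and values into a
// PreparedStatement, so values are never spliced into the SQL text.
final class SqlBinding {
    private SqlBinding() {}

    // Fragments become the SQL text, with one '?' placeholder per hole.
    static String placeholderSql(List<String> fragments) {
        return String.join("?", fragments);
    }

    // Values are sent separately as bind parameters (JDBC indices start at 1).
    static PreparedStatement prepare(Connection connection, List<String> fragments, List<?> values)
            throws SQLException {
        if (fragments.size() != values.size() + 1) {
            throw new IllegalArgumentException("need exactly one more fragment than values");
        }
        PreparedStatement statement = connection.prepareStatement(placeholderSql(fragments));
        try {
            for (int i = 0; i < values.size(); i++) {
                statement.setObject(i + 1, values.get(i));
            }
            return statement;
        } catch (SQLException | RuntimeException e) {
            statement.close(); // don't leak the statement if binding fails
            throw e;
        }
    }
}
```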
It's no secret that I do not like the interface TS.Processor, because the incentive is wrong. It's hard to say at the same time "beware, as the implementer of a processor you have to be careful about injections, so you have to handle the fragments and the values separately (to escape them correctly)" and provide an API that groups the fragments and the values into the same object.
It seems that a better API would be to not provide TS.interpolate as an instance method, and to change the static methods to take both the fragments and the values, hoping that template processor implementers use that method with care. I think it's better to provide a method that lets everyone mess with string concatenation, because at least you have an easy entry point for security auditing.
---
More about why I do not like the TS.Processor API: "a particular place", i.e. the call site, is something available through the TS.Processor.Linkage API but sadly not through the TS.Processor API, so every processor implementer will reinvent that notion using caching (see Reinier's part of the message about JOOQL + cache).
I think that making the API for defining a template processor too easy is a mistake. The sadistic part of me also thinks that the TS.Processor API entry point should not exist and that only TS.Processor.Linkage should exist, because TS.Processor.Linkage returns a MethodHandle, and if you are able to understand how MethodHandles work, there is also a good chance that you know how to deal with injections. I would prefer to live in a world with fewer template processors, because the bar to create one is higher, than to live in a world with a lot of template processors, half of them wrong.
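The caching that processor implementers would end up reinventing might look like this hypothetical `CachingSqlProcessor`, which memoizes per-template work keyed on the fragment list with `ConcurrentHashMap.computeIfAbsent`. The key is structural equality of the fragments rather than the identity of a call site, and the cache is unbounded; both are approximations of what linkage provides directly.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical TemplatedString-style processor that approximates
// per-call-site linkage with a cache keyed on the fragments.
final class CachingSqlProcessor {
    // Per-template work (here, building placeholder SQL), keyed by fragment list.
    private final Map<List<String>, String> sqlByFragments = new ConcurrentHashMap<>();
    // Counts how many times the per-template work actually ran.
    private final AtomicInteger builds = new AtomicInteger();

    String sqlFor(List<String> fragments) {
        return sqlByFragments.computeIfAbsent(List.copyOf(fragments), key -> {
            builds.incrementAndGet();
            return String.join("?", key);
        });
    }

    int buildCount() {
        return builds.get();
    }
}
```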
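To illustrate what a linkage-style processor returning a `MethodHandle` involves, here is a hypothetical `ConcatLinkage.link(fragments, type)` (not the JEP's `Linkage` signature). It binds the fragments once with `MethodHandles.insertArguments`, gathers the values with `asCollector`, and adapts the handle to the call site's `MethodType` with `asType`, which also boxes primitive arguments.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.List;

// Hypothetical linkage-style processor: given the fragments and the call site's
// MethodType, return a MethodHandle the call site can invoke with the values.
final class ConcatLinkage {
    private ConcatLinkage() {}

    private static final MethodHandle INTERPOLATE;
    static {
        try {
            INTERPOLATE = MethodHandles.lookup().findStatic(
                ConcatLinkage.class, "interpolate",
                MethodType.methodType(String.class, String[].class, Object[].class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Runs on every evaluation; the fragments are already bound.
    private static String interpolate(String[] fragments, Object[] values) {
        StringBuilder sb = new StringBuilder(fragments[0]);
        for (int i = 0; i < values.length; i++) {
            sb.append(values[i]).append(fragments[i + 1]);
        }
        return sb.toString();
    }

    // Runs once per call site: binds the fragments and adapts the handle to 'type'.
    static MethodHandle link(List<String> fragments, MethodType type) {
        int count = type.parameterCount();
        if (fragments.size() != count + 1 || type.returnType() != String.class) {
            throw new IllegalArgumentException("template does not match " + type);
        }
        // Cast to Object so the String[] is bound as one argument, not spread as varargs.
        MethodHandle bound = MethodHandles.insertArguments(
            INTERPOLATE, 0, (Object) fragments.toArray(new String[0]));
        // (Object[])String -> (Object, ..., Object)String -> the call site's exact type.
        return bound.asCollector(Object[].class, count).asType(type);
    }
}
```

The returned handle has exactly the call site's type, so for `(String, int)String` it can be invoked with `(String) mh.invokeExact(name, age)`.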
> — John
Rémi