Update on String Templates (JEP 459)
John Rose
john.r.rose at oracle.com
Thu Mar 14 01:22:53 UTC 2024
Thanks, Maurizio. I find your arguments helpful and persuasive. They
indicate that “autoboxing” is the wrong model, since it would lift
ad hoc strings into places that want only STs.
The poly-expression move, applied only to string literals, is not so
bad, since the only ad hoc strings liftable to STs are those right next
to the API points that demand STs.
But, if we are going to make ST-demanding APIs the lock and STs the
keys, it might be reasonable to demand that all STs look distinctive
(with that extra sigil), which is an argument even against the
poly-expression move.
Guy’s disruptive suggestion, of having both kinds of interpolation
expressions, would play out as two tiers of vetting and security. The
lower tier is inhabited by strings. You have to drive carefully on
those streets, where dodgy APIs accept all kinds of strings, and there
are no $ sigils to indicate vetted inputs. The higher tier would be API
points that demand STs (and do not welcome plain strings). To get into
that safer tier, you pay a cover charge, the $ sigils (or API points
which manufacture STs explicitly).
It might seem wrong to ask a cover charge for a tier we want users to
prefer, but the IDE will surely help pay it as needed. (The $ is
visible in the code, as a reminder the security is enabled. Like the
wrist band you get when you pay the cover?)
On the other hand, if we try to make everything be one tier (everything
potentially vettable, but with loopholes for raw strings), the security
guarantees get muddier. If everything is equally secure, and there are
loopholes (for string concat and the like) then everything is also
equally insecure, in some hand-wavy sense.
More hand-waving: Distinct tiers make for a more honest design, allowing for
better invariants within the higher tier, and relaxed behavior in the
lower tier. Also, maybe, having the distinct tiers be visibly connected
by syntax encourages folks muddling around with string-concat to lift
their code to work on STs instead of strings. Switch the APIs and add
the dollar signs.
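
A minimal sketch of the two tiers in plain Java. Since java.lang.StringTemplate is a preview API, the sketch uses a hypothetical stand-in record, Template, that keeps literal fragments and interpolated values apart. The names Template, Tiers, lowerTier and higherTier are all assumptions made for illustration and come from neither the JEP nor this thread.

```java
import java.util.List;

// Hypothetical stand-in for a string template (NOT the JDK type):
// literal fragments and interpolated values are carried separately.
// Null values are not supported in this sketch.
record Template(List<String> fragments, List<Object> values) {
    Template {
        if (fragments.size() != values.size() + 1) {
            throw new IllegalArgumentException("need exactly one more fragment than values");
        }
        fragments = List.copyOf(fragments);
        values = List.copyOf(values);
    }
}

final class Tiers {
    // Lower tier: accepts any string, so pasted-together input arrives unexamined.
    static String lowerTier(String sql) {
        return sql;
    }

    // Higher tier: demands a Template and never a bare String.
    // Each interpolated value is replaced by a '?' placeholder, so the
    // values never become part of the SQL text.
    static String higherTier(Template t) {
        StringBuilder sb = new StringBuilder(t.fragments().get(0));
        for (int i = 0; i < t.values().size(); i++) {
            sb.append('?').append(t.fragments().get(i + 1));
        }
        return sb.toString();
    }
}
```

With no String overload on the higher tier, a call such as Tiers.higherTier("SELECT " + x) does not compile, which is the "cover charge" in type-system form.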
OK, I’ll stop now. I’m past the point where I need to try the API
on some serious project, before I speculate more.
On 13 Mar 2024, at 16:47, Maurizio Cimadamore wrote:
> There is a problem/slippery slope with overloads, which I think should
> be discussed (and that discussion seems, at least to me, more
> important than the discussion on how we spell string literals).
>
> Consider the case of a /new/ API, that perhaps wants to build SQL
> queries (or any other kind of injection-sensitive factory):
>
> |Query makeQuery(???) |
>
> What should be the natural parameter type for this query? Well, we
> know that String is flawed here. Easy to reach for, but also too easy
> to abuse. StringTemplate is a much better type because it allows
> user-injectable values and constant parts to be carried in separate parts
> of the string template, so that the library has a chance at looking at
> what’s going on.
>
> Ok, so let’s say we write the factory as:
>
> |Query makeQuery(StringTemplate) |
>
> As that is clearly the safer option. This obviously works well /as
> long as clients are passing templates with arguments/.
>
> No-argument templates might be a corner case, but, sooner or later
> somebody might want to do this:
>
> |makeQuery("SELECT foo FROM bar WHERE foo = 42"); |
>
> Only to discover that this doesn’t compile. What then? There are a
> couple of alternatives I can think of. The first is to add a
> String-accepting overload:
>
> |Query makeQuery(StringTemplate)
> Query makeQuery(String) |
>
> The second is to use some use-site factory call to turn the string
> into a degenerate string template:
>
> |makeQuery(StringTemplate.fromString("SELECT foo FROM bar WHERE foo = 42")); |
>
> IMHO, both approaches have problems: they force the user to go from
> the safer StringTemplate world, to the more unsafe String world.
> It’s sort of like crossing the Rubicon: once you’re in
> String-land, it then becomes easier to introduce potentially very
> costly mistakes. If we have overloads:
>
> |makeQuery("SELECT " + foo + " FROM " + bar + " WHERE " + condition); |
>
> This would now compile just fine. Effectively, safety-wise we’d be
> back at square one. The factory case is only marginally better -
> because using the factory is more convoluted, so it would perhaps be
> easier to spot that something fishy is going on. That said, as the
> expression gets more complicated, it becomes easier for bugs to sneak in:
>
> |makeQuery(StringTemplate.fromString("SELECT " + foo + "FROM bar WHERE foo = 42")); |
>
> So, at least in my opinion, having a string template literal, or some
> kind of compiler-controlled promotion from string /constants/ to
> string templates, is not just something we need to type fewer
> characters (I honestly couldn’t care less about that, at least not
> at this stage). These things are needed to allow developers to remain
> in StringTemplate-land.
>
> That is, the best /overall/ outcome is for the library /not/ to have
> an overload, /and/ for the client to either say this:
>
> |makeQuery("SELECT foo FROM bar WHERE foo = 42");
> // works because of implicit promotion of constant String -> StringTemplate |
>
> or this:
>
> |makeQuery(<insert your favourite "I'M A TEMPLATE" char here>"SELECT foo FROM bar WHERE foo = 42");
> // works because it's a string template all along |
>
> Maurizio
>
> On 13/03/2024 22:37, John Rose wrote:
>
> On 13 Mar 2024, at 15:22, John Rose wrote:
>
> … OVERLOADS …
>
> I don’t see (maybe I missed it) a decisive objection to overloading
> across ST and String, at least for some processing APIs. Perhaps it
> is this: A language processor API that takes STs and never Strings is
> making it clear that all inputs should be properly vetted, nothing
> taken on trust as a bare string.
>
> Doing that MIGHT require a performance model which permits expensive
> vetting operations to be memoized on particular OCCURRENCES of inputs
> (not just the input strings viewed in and of themselves).
>
> If that’s true, then I guess that’s support for Guy’s proposal: That
> STs (even trivial ones) should never look identical to strings. Maybe
> they should always be preceded by a sigil $, or (per my suggestion)
> they should always have at least one occurrence of { inside, even if
> it’s a trivial nop.
>
> I kind of like Guy’s offensive-to-everyone suggestion that $ is
> required to make a true ST. Then it’s clear how the vetting APIs mate
> up with their vetted inputs. And if $ is not placed in front, we
> surrender to the string-pasters, but at least the resulting
> true-string expressions won’t be accepted by the vetting APIs.
>
>
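
The overload hazard in the quoted message can be sketched as follows. This is an illustration only: SqlTemplate is a hypothetical stand-in for StringTemplate, its fromString method mirrors the factory named in the quoted message, and Query and QueryFactory are assumed names for the library under discussion.

```java
import java.util.List;

// Hypothetical stand-in for StringTemplate (NOT the JDK type).
record SqlTemplate(List<String> fragments, List<Object> values) {
    // Analogue of the use-site factory from the quoted message:
    // a degenerate template with one fragment and no values.
    static SqlTemplate fromString(String s) {
        return new SqlTemplate(List.of(s), List.of());
    }
}

// Assumed result type: SQL text plus the values bound to its placeholders.
record Query(String sql, List<Object> params) {}

final class QueryFactory {
    // Template-accepting factory: values become '?' placeholders and
    // travel separately from the SQL text.
    static Query makeQuery(SqlTemplate t) {
        StringBuilder sql = new StringBuilder(t.fragments().get(0));
        for (int i = 0; i < t.values().size(); i++) {
            sql.append('?').append(t.fragments().get(i + 1));
        }
        return new Query(sql.toString(), List.copyOf(t.values()));
    }

    // Convenience overload: once it exists, ANY String expression
    // compiles, including concatenations of untrusted input.
    static Query makeQuery(String raw) {
        return new Query(raw, List.of());
    }
}
```

Given a String foo holding untrusted input, both QueryFactory.makeQuery("... WHERE foo = " + foo) and QueryFactory.makeQuery(SqlTemplate.fromString("... WHERE foo = " + foo)) compile, and in both the input ends up inside the SQL text. The factory spelling is only more conspicuous, not safer.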
More information about the amber-spec-experts
mailing list