Feedback: String Templates (JEP 430)

Reinier Zwitserloot reinier at projectlombok.org
Thu Mar 30 23:42:13 UTC 2023


This design is _awesome_ and exactly the kind of '... but we actually
thought through how such features would be used and how extensive the scope
could be' that makes the past ~5 years of new java features such a breath
of fresh air.


In particular, the notes on how IDEs are strongly encouraged to render
string templates quite differently from string literals further highlights
this. As a language feature, IDEs are __irrelevant__, javac isn't an IDE
and the spec has no business dictating IDE builders what to do. However, in
practice of course, 99% of all java code is written in an IDE so keeping
that in mind is precisely the kind of thing we love to see in language
design proposals. We assume folks are aware that the specification of the
string template construct could have been made much simpler by simply
redefining that a string template consists of __zero__ or more embedded
expressions. Javac would be just fine with this simplified model, but we
_want_ IDE makers to copy this BNF - they need to differentiate between
'has 1 or more embedded expressions' and 'has no embedded expressions,
i.e., is a literal' even if only for allowing for different colour coding.
So, please keep that part.


As one might expect on a mailing list for feedback on language design
proposals, most of this post will involve complaints, errors, and other
whining. The 2 paragraphs above should hopefully highlight how these are
mere nits on an otherwise excellent proposal.


## The JSON example


There are a few problems with it. As a very minor nit, the text block with
the sample JSON ends in a semi colon which is, as per JSON.org, not legal
JSON. Simply remove the semicolon. Java habits, we guess.


A more pernicious problem with this example is that it uses interpolate().
__This is security-wise completely broken__ and means unsafe injection is
waiting to happen. It's _precisely_ the problem that the proposal spent
multiple paragraphs laying out immediately prior to this example, so that
really made me physically wince as we was reading through it. Possibly the
aim of the example is simply to show how this works, but at the very least
the example should end in:


// NB: Please DO NOT USE INTERPOLATE THIS WAY - this is entirely unsafe and
will cause your box to be p0wned.


Which is.. kinda awkward to have to put in a spec. Perhaps just leave the
example syntax as is, then leave the implementation of the
StringTemplate.Processor as an exercise to the reader instead of trying to
shoehorn an dubious (insecure) call to interpolate() in.


Or, possibly, the interpolate() call wasn't an intentional sacrifice for
the sake of example and Jim really did believe this was fine, in which
case, we think the interpolate() method just has to go. The whole point of
having the construct be a `StringTemplate`, with `j.l.String` instances
going out of their way _NOT_ to be an implementation of that interface, is
to avoid precisely this problem. I'm not sure that's a good idea; examples
such as `INTER` do want interpolate, but, then again, they can just call
`STR.process(st)` instead of `st.interpolate()` if they want that. Given
that `interpolate()` is a bazooka programmed to blow your own feet off,
perhaps it's a better idea to force anybody who wants to fire it to figure
out that they can call `STR.process` instead of having `interpolate()`
entice you into its dangerous waters by appearing in auto-complete dialogs
on string templates. Our suggestion is to __remove__ `interpolate()`
entirely for all the reasons spelled out in the "String interpolation is
dangerous" section of Jim's proposal.


## Outdated statements about the nature of StringTemplate.Processor.of


The text goes out of its way to say that ST.Processor.of can only produce
String-returning, non-checked-exception throwing processors.


But the current state of the javadoc as hosted on jlaskey's cr.openjdk.org
site disagrees with this; given that the current (as per the javadoc)
signature of the `of` method is:


  static <R,E extends Throwable> StringTemplate.Processor<R,E>
of(Function<StringTemplate,R> process)


which means we can make a checked-exception throwing, non-String returning
processor just fine. This would work:


```

Processor<Json, JsonValidationException> p = StringTemplate.Processor.of(st
-> {

  .... - something that validates and throws JVEx ....

  return jsonObject;

};

```


... but while we're here, the entire of method seems pointless. After all,
Processor is, itself, a functional interface, so we can just delete the
`StringTemplate.Processor.of()` part of the above snippet and it would
exactly the same way!


We suggest the `of` method is removed. Jim's proposal makes some vague
overtures about how the existence of this method lets you dance around
javac's inference engine somehow but I'm not quite sure how that works, and
if indeed there is a point to it, then why isn't there an `of` method in
`java.util.function.Function` itself? What's the key difference between
StringTemplate.Processor and any other functional interface that means an
`of` method is warranted? Presumably it's because of this:


```

var PROC = StringTemplate.Processor.of(st -> new Foo());

// instead of

StringTemplate.Processor<Foo, RuntimeException> PROC = st -> new Foo();

```


However, we see some rather serious issues with this. It's trying to
piecemeal fix a fundamental problem with java-the-language itself.


Ordinarily, I'd say `Processor<Foo> PROC = st -> new Foo();` isn't so
unwieldy that it requires making this 'of' method. However, because
Processor has used the 'trick' of including an `<E extends Throwable>`, the
signature is now unwieldy.


But, this means the java core libraries are now schizophrenic: With
absolutely no rational basis, _some_ of its libraries _include_ the `<E
extends Throwable>` part, _and_ have an `of` method, and other parts (such
as _all_ of `java.util.function`!) do not.


This feels like a debate that should be held at a (much) higher level, up
to and including a debate that all of `java.util.function` is deprecated
and replaced with a carbon copy that adds `<E extends Throwable>` to all
relevant signatures. Which is, understandably, a rather ridiculous
proposal, but _given_ that the flagship repository of functional interfaces
(`java.util.function`) _does not_ do it, either:


* StringTemplate.Processor also shouldn't do it, or

* java-the-language needs to solve this problem itself, so that the
`java.util.function` package is no longer at design odds with the `<E
extends Throwable>` concept.. at which point StringTemplate.Processor
definitely shouldn't have it explicitly either.


Both lead to the same conclusion, which is, StringTemplate.Processor
shouldn't have the `<E extends Throwable>`, and with it, the `of()` method.
Unless that third option (deprecate `j.u.f` and make a new package with
throwable-typevars) is on the table.


We get that segueing this debate into this fundamental issue is ridiculous
scope creep; that's not our point. The point is: We don't think individual
features/libraries should decide to piecemeal 'solve' the original sin by
introducing the `<E extends Throwable>` and then _also_ putting out the
fires _that_ causes by tossing in an `of` method that seems, on initial
glance, as completely useless.


As a frequent answerer of java questions on stack overflow, the fundamental
issue of checked exceptions in lambdas definitely is a topic that requires
a serious conversation. But it's not fair to burden this feature proposal
with it.


Be that as it may, in our opinion, the `<E extends Throwable>` type var
shouldn't be part of this proposal, and if that goes, the `of()` method is
then no longer needed either.


## Can processors have methods?


We have some more pointed comments about the SQL examples, but, it also
raises an interesting use case that we think just works as is, though the
proposal doesn't explicitly spell it out:


Can StringTemplate.Processor instances also just be used to invoke methods
on? For example, if one has a 'database' object that can
stringtemplate-process SQL into ResultSet instances, it's _still_ generally
more convenient to have an explicit API for UPSERTs. This:


```

   DB.insert("tableName")

     .mergeOn("uniqueColumn1", foo)

     .mergeOn("uniqueColumn2", bar)

     .put("col3", baz)

     .exec();

```


is 'nicer', we think, vs:


```

   DB."""INSERT INTO tableName (uniqueColumn1, uniqueColumn2, col3) VALUES

     (\{foo}, \{bar}, \{baz}) ON CONFLICT (foo, bar) DO

     UPDATE SET col3 = EXCLUDED.col3""";

```


And before you disagree on style, note that UPSERT is a classic 'syntax
really differs between DB engines' - exactly the kind of thing you might
want to abstract away into a uniform API such as the first snippet.


Hence, we want to have some variable `DB` such that we can write:


```

  ResultSet a = DB."SELECT * FROM foo";

  DB.insert("tableName")....;

```


which, looking at the FormatProcessor, just 'works' right? If somehow it
doesn't work like that, we would strongly suggest the above should be
possible.


## Please explain that caching thing


There is a bit of a throwaway line in the proposal:


> Moreover, since the text block template is constant, a more advanced
template processor could compile the template into a JSONObject with
placeholder values, cache that result, and then at each evaluation inject
the field values into a fresh deep copy of that cached JSONObject. There
would be no intermediate String anywhere.


But.. how does that work? One major nit we have for this proposal is that
it spends multiple _pages_ on re-explaining obvious conclusions but skips
over the interesting bits, such as how this caching idea would work.


So, let's say we do write this `JSON` template processor. How _DO_ we cache
this? Should we shove the result of `template.fragments()` (i.e. the
`List<String>` returned by that method) in a `WeakHashMap` as key for this
cache I'm supposed to write?


That's a bit.. non-obvious, no?


The design correctly keeps the string fragments entirely separate from the
values, but we couldn't quite figure out how the language proposal intends
for me to cache on this.


Getting back to the SQL example, there's a more fundamental itch to the
stated example. The example turns:


```

DB."SELECT * from foo WHERE name = \{name}"

```


into a `java.sql.ResultSet` object. This is.. contrary to the spirit of
JDBC, which acknowledges that the step of turning a string literal into a
`PreparedStatement` is on its own such a (potentially!) expensive step,
that the PS is, itself, a __resource__ that needs to be separately closed
(unless you rely on the mechanism where closing a Connection closes all
resources that you spawned from it).


In that light, imagine this code:


```

var dates = new ArrayList<LocalDate>();

for (Person p : persons) {

  try (var rs = DB."SELECT birthdate FROM person WHERE id = \{p.id}") {

    dates.add(rs.getObject(1, LocalDate.class));

  }

}

```


We don't see how, but the best possible result here would be if this is
internally desugared into something like:


```

ArrayList<LocalDate> dates;

try (PreparedStatement ps = con.prepareStatement("SELECT birthdate FROM
person WHERE id = ?")) {

  dates = new ArrayList<LocalDate>();

  for (Person p : persons) {

    ps.setInt(1, p.id);

    try (var rs = ps.query()) {

      dates.add(rs.getObject(1, LocalDate.class));

    }

  }

}

```


... but we don't get how a StringTemplate.Processor could possibly make
this work.


and, unfortunately, if it is _not possible_ to make this work, then __the
example is just horrible__.

In the sense that if someone actually wrote a library that worked the way
Jim's proposal suggests, we would need to

get out our soapbox and tell everybody how this library is a bad idea
nobody should be using.


Surely it can't be so incredibly difficult to find examples for how string
templates are awesome that we have to 'stoop'

to _fundamentally_ incompatible-with-the-notion examples!


If it is not possible to adapt template processors to have this kind of
powers, we're proposing that the entire 'look, this will save you from SQL
injection!' example is either ditched entirely or rewritten, possibly by
showing off how it would work with one of the many JDBC-abstracting
libraries out there like JOOQ, JDBI, or jdbc-template. Or that the entire
section gets a big fat warning caveat that none of this is actually a good
idea for real use.


## Compile time checking


It's on our todo list to look into it, but Brian Goetz has repeatedly
mentioned that e.g. `LocalDate.of(2023, 4, 1)` is eventually going to be a
compile time constant. we have no idea how this works, but, separate from
whatever proposal is in the works for that, it would _awesome_ if at
compile time the string template processor gets to inform the compiler
about what's going on. It would be _fantastic_ if this:


```

Pattern p = REGEX."foo+(bar)";

Pattern q = REGEX."foo+(bar";

```


results in a _compile time_ error telling me that the first line is fine,
but the second line is not because in the regex, the parens aren't matched.


It's probably beyond scope, but whatever Brian was cooking up for pluggable
compile time constants e.g. to make `LocalDate.of` be constant, that it
interops with this feature proposal.


-- Reinier Zwitserloot and Roel Spilker
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-dev/attachments/20230330/4a17b061/attachment-0001.htm>


More information about the amber-dev mailing list