Strings and things

Reinier Zwitserloot reinier at zwitserloot.com
Wed Oct 13 15:15:52 UTC 2021


I love where this feature is going.

There are a few use cases and intended effects (at least, I assume they are
intended) that this feature would have which the document doesn't name. I'm
not sure if they're fully on the radar of Jim and Brian.

## Regexp literals

"Make the language support regexp literals natively" is a feature request
that (in my experience) comes up _a lot_, and it was more or less
determined to be part of the bathwater during the text-block discussion (in
that 'raw strings' were ejected from that proposal). That was mostly about
changing how java interprets backslashes in string literals, which this
feature will not solve. However, surely:

Pattern."^foo|bar$"

is all around an improvement over the current
`Pattern.compile("^foo|bar$")`. There are many things that native regexp
literals are supposed to accomplish, and this string interpolation proposal
doesn't cover all of it, but 'have the constant itself be of type Pattern'
is one of them, and this proposal can do that, which is awesome. The vast
majority of regexp strings out there are straight constants with no need
for string interpolation at all, and yet this feature improves matters
some; in that sense, this goes further than merely 'a nice way to add
string interpolation to java' - it's also a way to add "typed strings" to
java! Now to find a way to go back in time and eliminate String::replaceAll
from having ever existed. At least it can be overloaded with one that takes
a Pattern as first param, perhaps.

That leads to...

## Compile/Write-time identification (IDEs)

To an extent IDEs can already do this, but this feature would streamline
and simplify this idea: IDEs can now definitely identify the nature of a
string literal in your source code, by checking the type of the
TemplatePolicy.

If you were to type:

Pattern."^hello(there$"

in your IDE, an IDE should flag this, immediately (as you type, before
saving, before running any compilers or build tooling), as erroneous code;
that regexp has an unclosed paren. It should also syntax color the string
literal because the IDE 'knows' that it is a regexp. It could even offer a
regexp editor and in-line tester popups, if the IDE's authors want to go that
far. Piling
that kind of functionality on top of the above feels a little cleaner than
piling it on top of `Pattern.compile("string-literal-here")`. Without going
overboard and bringing compiler plugins into scope, the IDE needs to just
'hard code' Pattern and all that that implies for now, but it's a start.
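
As a rough illustration of the check an IDE could run on such a literal while you type, here is a minimal sketch using only the existing `java.util.regex` API. The class `RegexLiteralCheck` and its method `problemWith` are names made up for this example; they are not part of the proposal or of any IDE.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: mimics the validation an IDE could apply to a regexp literal.
final class RegexLiteralCheck {
    /** Returns a description of the syntax problem, or empty if the regexp compiles. */
    static Optional<String> problemWith(String regex) {
        try {
            Pattern.compile(regex);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of(e.getDescription() + " near index " + e.getIndex());
        }
    }

    public static void main(String[] args) {
        System.out.println(problemWith("^foo|bar$"));     // well-formed regexp
        System.out.println(problemWith("^hello(there$")); // unclosed paren
    }
}
```

Today this only fires when the code runs; the point above is that a typed literal gives the IDE enough information to run the same check at edit time.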

## This _improves_ security!

The same principle applies to SQL. The IDE can know the type of the
'receiver', and from there figure out that it is SQL, and thus apply syntax
checking and highlighting suitable for SQL strings. This then pushes people
__towards__ better security instead of asking them to trade it off, as
they'd want these IDE features and can't get them without using these
templated strings.

Something like:

con.prepareStatement("SELECT * FROM foo WHERE username = '" + username + "'");

is a disaster waiting to happen but isn't flagged, but:

con."SELECT * FROM foo WHERE username = '" + username + "'";

either doesn't compile at all, or if it does, will fail at runtime (I'm
with Stephen on this one; failing at compile time would be even better,
but, one step at a time): the TemplatePolicy applied by `con` on
templatestring `SELECT * FROM foo WHERE username = '` has the opportunity
to read through that, notice the unbalanced quote, and throw an exception.
I implore JDBC driver writers to do just that.

Thus, if you want the benefits of IDE supported SQL highlighting and
auto-complete and such, you must use this construct, and in doing so, you
mostly eliminate the SQL injection opportunity. Marvellous.
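
To make the 'notice the unbalanced quote' idea concrete, here is a minimal sketch of what such a policy might do with the literal fragments of a template. Everything here is an assumption for illustration: `SqlTemplateSketch` and `toParameterizedSql` are invented names, the fragments are passed as a plain list rather than whatever the proposal ends up using, and the quote counting is deliberately naive (it ignores SQL comments and dialect-specific quoting).

```java
import java.util.List;

// Hypothetical stand-in for the policy a JDBC driver might apply; not a real JDBC API.
final class SqlTemplateSketch {
    /** Joins the literal fragments with '?' placeholders, rejecting unbalanced quotes. */
    static String toParameterizedSql(List<String> fragments) {
        StringBuilder sql = new StringBuilder();
        for (int i = 0; i < fragments.size(); i++) {
            String fragment = fragments.get(i);
            // An odd number of single quotes means a hole sits inside a quoted literal.
            long quotes = fragment.chars().filter(c -> c == '\'').count();
            if (quotes % 2 != 0) {
                throw new IllegalArgumentException(
                        "Unbalanced quote in SQL fragment: " + fragment);
            }
            if (i > 0) {
                sql.append('?'); // one placeholder per hole between fragments
            }
            sql.append(fragment);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Fragments of a template with a single hole and no hand-written quotes.
        System.out.println(toParameterizedSql(
                List.of("SELECT * FROM foo WHERE username = ", "")));
    }
}
```

The resulting text could then be handed to the existing `Connection.prepareStatement`, with the hole values bound through `PreparedStatement.setObject`, so the values never become part of the SQL text.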

It's things like this that make me excited about this feature :)

## A small issue: The type of the formatted string.

Cay mentioned this as well (in the context of how other languages do it),
but right now the proposal uses `String templateString` (in TemplatedString
/ TemplatePolicy), but it's not obvious what this string would contain.
Presumably not just "Hello {name} you are {age} years old" - a policy would
have no way to differentiate a 'hole' like "{name}" from a literal open
brace character. I assume this is something to be worked on later? One way
out is to pass a string where the holes are gone entirely, plus an int
array with the positions where the holes are. Turn the above into:

new TemplateString("Hello  you are  years old", new int[] {6, 15}).

or possibly:

new String[] {"Hello ", " you are ", " years old"}.

and then hand _that_ to the TemplatePolicy instead of `String
templatedString`.
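
The two shapes carry the same information, which a small sketch can show. The class `TemplateShapes` and both method names are invented for this example, and the positions are assumed to be zero-based offsets into the hole-free text (under that assumption the holes in the example sentence land at 6 and 15).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper contrasting the two representations discussed above.
final class TemplateShapes {
    /** Offsets of each hole within the hole-free text; assumes at least one fragment. */
    static int[] holePositions(String[] fragments) {
        int[] positions = new int[fragments.length - 1];
        int offset = 0;
        for (int i = 0; i < positions.length; i++) {
            offset += fragments[i].length();
            positions[i] = offset;
        }
        return positions;
    }

    /** Interleaves fragments and values; expects exactly one value per hole. */
    static String interpolate(String[] fragments, List<Object> values) {
        if (values.size() != fragments.length - 1) {
            throw new IllegalArgumentException("expected one value per hole");
        }
        StringBuilder out = new StringBuilder(fragments[0]);
        for (int i = 0; i < values.size(); i++) {
            out.append(values.get(i)).append(fragments[i + 1]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] fragments = {"Hello ", " you are ", " years old"};
        System.out.println(Arrays.toString(holePositions(fragments)));
        System.out.println(interpolate(fragments, List.of("Duke", 25)));
    }
}
```

The fragment array has the mild advantage that a policy never has to do offset arithmetic itself.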

As long as that object is constructed only once in the runtime, that
doesn't seem like a costly move, performance-wise. But, it would be nice if
the code that actually runs to process that object + the `List<Object>` of
params gets to also rely on some pre-processing "only once" without
handrolling a lazy-initializing system. As Stephen indicated, some of the
use cases of this feature have non-trivial preprocessing needs (and usually
validation needs as well). Brian mentioned that a lot of the 'cost' in the
java ecosystem's use of `String.format` is the overhead of reparsing that
format string over and over again. This would cut that runtime cost down
to once-per-literal, which sounds like a worthwhile endeavour, no?
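
For comparison, this is roughly what a handrolled 'parse once' scheme looks like today. It is only a sketch: `ParseOnceCache` is an invented class, `{}` is an arbitrary hole marker chosen for the example (not the proposal's syntax), and the counter exists purely to show that the parsing step runs once per distinct template.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handrolled cache: parses each distinct template text only once.
final class ParseOnceCache {
    static final AtomicInteger parseCount = new AtomicInteger();
    private static final Map<String, String[]> CACHE = new ConcurrentHashMap<>();

    /** The "expensive" step: split the template on the made-up hole marker {}. */
    private static String[] parse(String template) {
        parseCount.incrementAndGet();
        return template.split("\\{\\}", -1);
    }

    /** Formats using cached fragments; expects one value per hole. */
    static String format(String template, Object... values) {
        String[] fragments = CACHE.computeIfAbsent(template, ParseOnceCache::parse);
        StringBuilder out = new StringBuilder(fragments[0]);
        for (int i = 1; i < fragments.length; i++) {
            out.append(values[i - 1]).append(fragments[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("Hello {} you are {} years old", "Duke", 25));
        System.out.println(format("Hello {} you are {} years old", "Brian", 40));
        System.out.println("parsed " + parseCount.get() + " time(s)");
    }
}
```

Note that every call still pays for a map lookup keyed on the template text; a per-literal hook run by the runtime would avoid even that.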

An abstract 'static method' in an interface (one that each implementing
type must supply; not a thing in java right now) seems like an answer here,
so that as part of constructing that TemplateString (which is done once,
presumably as part of loading in the class, similar to a `static {}` block
in a class), it is handed to some factory to pre-process
it. Some care needs to be taken here, as the 'receiver' needs to be
statically available (as it gets run during class load time, no instances
are available or can be resolved here), and yet the point of this feature
is that the receiver can be just about anything, such as `con."SELECT *
..."`. `con` is not available during class-init time, of course. Hence, it
needs to be a 'static method' on the compile-time type of `con`, and java
doesn't have a readily available mechanism to do such things. Perhaps an
annotation on the `java.sql.Connection` type pointing at a type for which
the classload mechanism can cook up a new instance via the no-args
constructor which can then do this processing? Or go the route of `main`
and `agentmain` and the serialization mechanism and employ structural
typing: Look for a method with a specific name (that's... not a part of the
java spec I'm particularly fond of).
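
The annotation route could look something like the sketch below. All of it is hypothetical: `TemplatePreprocessor`, `@PreprocessedBy`, `FakeConnection` (standing in for `java.sql.Connection`) and `PreprocessorLookup` are invented names, and the reflective lookup merely imitates what the class-load mechanism would have to do on the compile-time type of the receiver.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Locale;

/** Hypothetical contract: turns raw template text into a pre-processed form. */
interface TemplatePreprocessor {
    Object preprocess(String templateText);
}

/** Hypothetical annotation naming the preprocessor for a receiver type. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface PreprocessedBy {
    Class<? extends TemplatePreprocessor> value();
}

/** Toy preprocessor; a real one would parse and validate the template. */
final class UpperCasePreprocessor implements TemplatePreprocessor {
    @Override
    public Object preprocess(String templateText) {
        return templateText.toUpperCase(Locale.ROOT);
    }
}

/** Stand-in for the receiver's compile-time type, e.g. java.sql.Connection. */
@PreprocessedBy(UpperCasePreprocessor.class)
interface FakeConnection {
}

final class PreprocessorLookup {
    /** Resolves the preprocessor from the static type alone; no instance is needed. */
    static Object preprocessFor(Class<?> receiverType, String templateText) {
        PreprocessedBy marker = receiverType.getAnnotation(PreprocessedBy.class);
        if (marker == null) {
            return templateText; // no preprocessor declared: pass the text through
        }
        try {
            // Instantiate via the no-args constructor, as suggested above.
            TemplatePreprocessor preprocessor =
                    marker.value().getDeclaredConstructor().newInstance();
            return preprocessor.preprocess(templateText);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate preprocessor", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(preprocessFor(FakeConnection.class, "select 1"));
    }
}
```

The lookup needs only the `Class` object of the receiver's static type, which is what makes it usable at class-init time when no `con` exists yet.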

It goes some ways to address Stephen Colebourne's concerns, so perhaps it
should be in scope.

* Right now, writing `Pattern.compile("foo|bar")`, or even
`input.replaceAll("someregexp", "replacement")` anywhere in java code is a
performance problem: That means the regexp needs to be parsed every time
that code is executed. The only way out right now is to have e.g. a
`private static final Pattern p = Pattern.compile("PATTERN_GOES_HERE")` at
the top so that the regexp value itself is parsed, validated, and processed
only once.
That's annoying to do. That kind of trick isn't even available for
`String.format`. However, adding this pre-processing step would solve the
problem! You can write `Pattern."foo|bar"` anywhere, and during class load
and init time, that is first turned into a `new TemplateString("foo|bar",
new int[] {})`, handed to the 'RegexpValidator', which turns that into
whatever it wants (presumably some sort of ValidatedRegExpTree object), and
the actual code `Pattern."foo|bar"` is turned into an invoke that hands
this ValidatedRegExpTree together with the parameters (which, here, is a
zero-length list) to the TemplatePolicy. Analogous process for
`String.format`.

* The validation of these strings is now at least always done at class-init
time instead of first-execution. This is a very pale shadow of compile-time
checking of course, as Stephen indicates, but still better presumably. If
code has bugs, best that they occur sooner rather than later.

* With this system in place, meshing this feature together with
compiler-constant folding (see Brian's post and link to presentation) is a
little closer: If the path to the RegexpValidator instance which turns a
TemplateString into a ValidatedRegExpTree is entirely statically
determinable and ValidatedRegExpTree is serializable, one day the compiler
could run that step at compile time instead of at class-init time. Perhaps
the answer is to keep that out of scope for the first release of this
feature, but to keep it on the horizon as 'eventually we do want to get
there'.
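
For reference, the workaround mentioned in the first bullet looks like this in today's java. The class and method names are invented for the example; the library calls (`Pattern.compile`, `Matcher.replaceAll`, `String.replaceAll`) are the existing ones.

```java
import java.util.regex.Pattern;

// Illustrates hoisting a regexp into a constant so it is compiled only once.
final class HoistedPattern {
    // Compiled once, when this class is initialized.
    private static final Pattern FOO_OR_BAR = Pattern.compile("foo|bar");

    /** Compiles the regexp on every call: String.replaceAll takes the regexp as a String. */
    static String censorSlow(String input) {
        return input.replaceAll("foo|bar", "***");
    }

    /** Reuses the pre-compiled pattern; same result, no re-parsing. */
    static String censorFast(String input) {
        return FOO_OR_BAR.matcher(input).replaceAll("***");
    }

    public static void main(String[] args) {
        System.out.println(censorSlow("a foo and a bar"));
        System.out.println(censorFast("a foo and a bar"));
    }
}
```

With the pre-processing step described above, the inline form could get the behaviour of `censorFast` without the separate constant.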


 --Reinier Zwitserloot


On Thu, 16 Sept 2021 at 16:30, Jim Laskey <james.laskey at oracle.com> wrote:

> Dear amber-dev,
>
> Over on amber-spec-experts, I’ve posted a proposal for “string templates”. I
> hope people are as excited about this feature as we expect they will be.
>
> As a brief reminder, the amber-dev list is not, in general, for language
> design discussions; this list is primarily for discussion of implementation,
> bug reports, experience reports, etc.  We should let the experts do their
> job without having to follow parallel discussions on multiple lists.
>
> If people want to post brief, reasoned feedback on the approach,
> applicability, or usefulness (but *NOT* the syntax) of the proposal here,
> that would be fine. Please, let’s keep it constructive.
>
> Cheers,
>
> -- Jim