String reboot - (1a) incidental whitespace
Alex Buckley
alex.buckley at oracle.com
Sat Apr 20 00:16:31 UTC 2019
On 4/10/2019 8:22 AM, Jim Laskey wrote:
> Line terminators: When strings span lines, they do so using the line
> terminators present in the source file, which may vary depending on the
> operating system on which the file was authored. Should this be an aspect of
> multi-line-ness, or should we normalize these to a standard line
> terminator? It seems a little weird to treat string literals quite so
> literally; the choice of line terminator is surely an incidental one. I
> think we're all comfortable saying "these should be normalized", but it's
> worth bringing this up because it is merely one way in which incidental
> artifacts of how the string is embedded in the source program force us
> to interpret what the user meant.
No one has commented on this, but it's important because some libraries
are going to be surprised by the presence of line terminators, of any
kind, in strings denoted by multi-line string literals.
To be clear, I agree with normalizing line terminators. And, I
understand that any string could have contained line terminators thanks
to escape sequences in traditional string literals. But, it was not
common to see a \n except where multi-line-ness was expected or
harmless. Going forward, who can guarantee that refactoring the argument
of `prepareStatement` from a sequence of concatenations:
    try (PreparedStatement s = connection.prepareStatement(
        "SELECT * "
      + "FROM my_table "
      + "WHERE a = b "
    )) {
        ...
    }
to a multi-line string literal:
    try (PreparedStatement s = connection.prepareStatement(
        """SELECT *
           FROM my_table
           WHERE a = b"""
    )) {
        ...
    }
is behaviorally compatible for `prepareStatement`? It had no reason to
expect \n in its string argument before.
(Hat tip:
https://blog.jooq.org/2015/12/29/please-java-do-finally-support-multiline-strings/)
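To make the difference concrete, here is a minimal sketch comparing the two argument values. It assumes the multi-line literal ends up denoting a string with \n between lines and with the incidental indentation removed; that value is written with escape sequences so the sketch does not depend on whatever literal syntax is finally chosen. The class and method names are hypothetical.

```java
public class LineTerminatorDemo {
    // The concatenated form from the example above: a single line,
    // no line terminators anywhere in the string.
    static String concatenated() {
        return "SELECT * "
             + "FROM my_table "
             + "WHERE a = b ";
    }

    // Assumption: the value the multi-line literal would denote once line
    // terminators are normalized to \n and incidental whitespace is removed.
    static String multiLine() {
        return "SELECT *\nFROM my_table\nWHERE a = b";
    }

    // Counts the \n characters in a string.
    static long countLineFeeds(String s) {
        return s.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) {
        System.out.println(concatenated().equals(multiLine())); // false
        System.out.println(countLineFeeds(concatenated()));     // 0
        System.out.println(countLineFeeds(multiLine()));        // 2
    }
}
```

The callee sees two line feeds that were never there before, which is all the refactoring needs to change behavior in a library that assumed single-line input.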
Maybe `prepareStatement` will work fine. But someone somewhere is going
to take a program with a sequence of 2000 concatenations and turn them
into a huge multi-line string literal, and the inserted line terminators
are going to cause memory pressure, and GC is going to take a little
longer, and eventually this bug will be filed: "My system runs 5% slower
because the source code changed a teeny tiny bit."
In reality, a few libraries will need fixing, and that will happen
quickly because developers are very keen to use multi-line string
literals. But it's fair to point out that while everyone is worrying
about whitespace on the left of the literal, the line terminators to the
right are a novel artifact too.
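As an illustration of the kind of fix a library or caller might apply, here is a rough sketch that collapses line terminators back into spaces before the string is handed on. The helper name `flatten` is hypothetical, and the approach is deliberately naive: it would also rewrite line terminators inside quoted SQL constants, so it is only a sketch of the idea, not a recommendation.

```java
public class SingleLineSql {
    // Hypothetical helper: replaces every linebreak sequence matched by the
    // regex construct \R (including \r\n as a single unit, \r, and \n)
    // with one space.
    static String flatten(String s) {
        return s.replaceAll("\\R", " ");
    }

    public static void main(String[] args) {
        // Mixed terminators, as might appear before normalization.
        String sql = "SELECT *\r\nFROM my_table\nWHERE a = b";
        System.out.println(flatten(sql)); // SELECT * FROM my_table WHERE a = b
    }
}
```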
Alex