String reboot - (1a) incidental whitespace

Alex Buckley alex.buckley at oracle.com
Sat Apr 20 00:16:31 UTC 2019


On 4/10/2019 8:22 AM, Jim Laskey wrote:
> Line terminators:  When strings span lines, they do so using the line
> terminators present in the source file, which may vary depending on what
> operating system the file was authored on.  Should this be an aspect of
> multi-line-ness, or should we normalize these to a standard line
> terminator?  It seems a little weird to treat string literals quite so
> literally; the choice of line terminator is surely an incidental one.  I
> think we're all comfortable saying "these should be normalized", but it's
> worth bringing this up because it is merely one way in which incidental
> artifacts of how the string is embedded in the source program force us
> to interpret what the user meant.

No-one has commented on this, but it's important because some libraries 
are going to be surprised by the presence of line terminators, of any 
kind, in strings denoted by multi-line string literals.
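For concreteness, here is a minimal sketch of what "normalizing" could mean at the string level. It assumes the standard terminator is \n and that both CR LF and a lone CR are mapped to it. The class and method names are hypothetical, and this is only an illustration, not a description of how a compiler would do it.

```java
class LineTerminators {
    // Hypothetical helper: map CR LF and lone CR to LF.
    // CR LF is replaced first so it becomes one \n, not two.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        // The same two-line content as it might appear in files
        // authored on different operating systems.
        String authoredWithCrLf = "SELECT *\r\nFROM my_table";
        String authoredWithLf = "SELECT *\nFROM my_table";

        // After normalization, the authoring platform is no longer visible.
        System.out.println(
            normalize(authoredWithCrLf).equals(normalize(authoredWithLf)));
    }
}
```

With normalization along these lines, the string denoted by a literal would not depend on where the source file was edited. It would still contain line terminators, though.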

To be clear, I agree with normalizing line terminators. And, I 
understand that any string could have contained line terminators thanks 
to escape sequences in traditional string literals. But, it was not 
common to see a \n except where multi-line-ness was expected or 
harmless. Going forward, who can guarantee that refactoring the argument 
of `prepareStatement` from a sequence of concatenations:

   try (PreparedStatement s = connection.prepareStatement(
       "SELECT * "
     + "FROM my_table "
     + "WHERE a = b "
   )) {
       ...
   }

to a multi-line string literal:

   try (PreparedStatement s = connection.prepareStatement(
       """SELECT *
          FROM my_table
          WHERE a = b"""
   )) {
       ...
   }

is behaviorally compatible for `prepareStatement`? It had no reason to 
expect \n in its string argument before.
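To make the difference visible, the sketch below compares the two strings. It assumes the multi-line literal denotes its content with incidental indentation stripped and line terminators normalized to \n, and it spells that value out with explicit \n escapes so that it compiles today. The class and method names are hypothetical.

```java
class LiteralShapes {
    // The argument as written today: one line, pieces separated by spaces.
    static String concatenated() {
        return "SELECT * "
             + "FROM my_table "
             + "WHERE a = b ";
    }

    // Assumed value of the multi-line literal: indentation stripped and
    // terminators normalized to \n (an assumption, not settled semantics).
    static String multiLine() {
        return "SELECT *\nFROM my_table\nWHERE a = b";
    }

    public static void main(String[] args) {
        System.out.println(concatenated().contains("\n"));      // false
        System.out.println(multiLine().contains("\n"));         // true
        System.out.println(concatenated().equals(multiLine())); // false
    }
}
```

The two strings differ only in whitespace, but any consumer that compares, hashes, logs, or tokenizes its argument character by character can now observe the difference.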

(Hat tip: 
https://blog.jooq.org/2015/12/29/please-java-do-finally-support-multiline-strings/)

Maybe `prepareStatement` will work fine. But someone somewhere is going 
to take a program with a sequence of 2000 concatenations and turn them 
into a huge multi-line string literal, and the inserted line terminators 
are going to cause memory pressure, and GC is going to take a little 
longer, and eventually this bug will be filed: "My system runs 5% slower 
because the source code changed a teeny tiny bit."

In reality, a few libraries will need fixing, and that will happen 
quickly because developers are very keen to use multi-line string 
literals. But it's fair to point out that while everyone is worrying 
about whitespace on the left of the literal, the line terminators to the 
right are a novel artifact too.

Alex


More information about the amber-spec-experts mailing list