Enhancing Java String Literals Round 2

Thibault Kruse tibokruse at googlemail.com
Thu Jan 10 12:55:58 UTC 2019


(I could not register for any other amber-spec-* mailing list, the
final confirmation step after
https://mail.openjdk.java.net/mailman/confirm/amber-dev leads to an
error page)

I just wanted to drop my 5 cents on the Raw Literals discussion.

Some more examples of multiline Strings in Java codebases:
  * Text templates (such as for emails)
  * Long log messages
  * Textual console output (such as usage for --help)
  * Annotations for metadata, such as Swagger:
      @ApiOperation(value = "Finds Pets",
          notes = "Multiple status values can be provided with comma separated strings\n...")
  * Unit test expectations: assertEquals("""...""", stdout)

Other considerations, mentioned for brainstorming purposes:

* Strings in Java are often split/merged during the lifetime of code.
A prefix for a String like R"foo\nbar" has the disadvantage that the
semantics change on a split like R"foo" + "\nbar"; the same goes for
merging Strings. Using markers on both sides of the String helps with
that.
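
To illustrate the split problem, here is a minimal sketch in today's
Java. There is no R"..." syntax in Java, so the raw literal is only
emulated with a doubled backslash; the class and method names are made
up for this example.

```java
final class SplitMergeDemo {
    // Ordinary escaped literals: splitting does not change the value.
    static String whole() { return "foo\nbar"; }
    static String split() { return "foo" + "\nbar"; }

    // Hypothetical R"foo\nbar": the backslash and the 'n' stay verbatim.
    // Emulated here with an escaped backslash.
    static String rawWhole() { return "foo\\nbar"; }

    // Hypothetical R"foo" + "\nbar": after the split only the first half
    // is raw, so the second half now contains a real newline.
    static String rawSplit() { return "foo" + "\nbar"; }

    public static void main(String[] args) {
        System.out.println(whole().equals(split()));       // true
        System.out.println(rawWhole().equals(rawSplit())); // false
    }
}
```

With a closing marker as well, the second half of a split would have
to be marked explicitly, so the change in meaning would at least be
visible at the split point.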

* Annotations could be used like @Raw "x\ny" (disregarding my previous comment)

* In multiline Strings, some ambiguity about the newline characters
exists ("\n" or "\r" or "\r\n"), which can be painful e.g. in the case
where the multiline String is the expected value for an assertion.
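
As a sketch of how tests tend to work around this today, both sides of
an assertion can be normalized to "\n" before comparing. The helper
name below is made up; this is one possible workaround, not something
any proposal prescribes.

```java
final class LineEndings {
    // Map "\r\n" and a lone "\r" to "\n". The two-character sequence is
    // replaced first so that it does not turn into two newlines.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String windows = "Foo\r\nBar";
        String oldMac = "Foo\rBar";
        System.out.println(normalize(windows).equals(normalize(oldMac))); // true
    }
}
```

If the language itself fixed the line terminator inside multiline
literals, such a helper would only be needed for the actual output, not
for the expected value.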

* The first and last newline in multiline Strings are a common source
of confusion, as is the indentation. Consider the next four
examples, where _ means a blank character:

____"""
Foo
Bar
"""

____"""Foo
Bar"""

   """___
Foo
Bar
___"""

____"""___
____Foo
____Bar
____"""

A language can either allow all variants or insist e.g. on the first
one. When allowing all, a language can still declare all of them to
define the same String "Foo\nBar", or to define different Strings.
E.g. consider the definition for Swift
(https://docs.swift.org/swift-book/LanguageGuide/StringsAndCharacters.html):
'A multiline string literal includes all of the lines between its
opening and closing quotation marks. The string begins on the first
line after the opening quotation marks (""") and ends on the line
before the closing quotation marks'
Similarly for Nim (https://nim-lang.org/docs/manual.html#lexical-analysis-string-literals):
'String literals can also be delimited by three double quotes """ ...
""". Literals in this form may run for several lines, may contain "
and do not interpret any escape sequences. For convenience, when the
opening """ is followed by a newline (there may be whitespace between
the opening """ and the newline), the newline (and the preceding
whitespace) is not included in the string. The ending of the string
literal is defined by the pattern """[^"], so this:
""""long string within quotes"""" Produces: "long string within quotes"'

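To make the "all variants define the same String" option concrete,
here is a rough sketch of one possible normalization rule, applied to
the raw text between the delimiters: drop a blank first line, drop a
blank last line, then strip the common indentation. The rule, the class
name and the method names are assumptions for illustration; it is not
the Swift or Nim algorithm, and it only understands "\n" and spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MultilineNormalizer {
    // raw is the text between the opening and closing delimiters.
    static String normalize(String raw) {
        List<String> lines = new ArrayList<>(Arrays.asList(raw.split("\n", -1)));
        // Drop a whitespace-only line right after the opening delimiter.
        if (lines.size() > 1 && lines.get(0).trim().isEmpty()) {
            lines.remove(0);
        }
        // Drop a whitespace-only line right before the closing delimiter.
        if (lines.size() > 1 && lines.get(lines.size() - 1).trim().isEmpty()) {
            lines.remove(lines.size() - 1);
        }
        // Smallest number of leading spaces over all non-blank lines.
        int indent = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                indent = Math.min(indent, leadingSpaces(line));
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (i > 0) {
                out.append('\n');
            }
            // Lines shorter than the indentation can only be blank here.
            out.append(line.length() >= indent ? line.substring(indent) : line.trim());
        }
        return out.toString();
    }

    private static int leadingSpaces(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == ' ') {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The four variants from above, written as ordinary escaped literals.
        String[] variants = {
            "\nFoo\nBar\n",
            "Foo\nBar",
            "   \nFoo\nBar\n   ",
            "   \n    Foo\n    Bar\n    "
        };
        for (String v : variants) {
            System.out.println(normalize(v).equals("Foo\nBar")); // true
        }
    }
}
```

A stricter language would instead reject all variants but one, which
avoids having to explain such a rule to users at all.
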
* Other than looking at the syntax of other languages, I recommend
reading the design discussions other language communities had about
the same issue, e.g.:

  * https://github.com/apple/swift-evolution/blob/master/proposals/0168-multi-line-string-literals.md
  * https://forums.swift.org/t/pure-bikeshedding-raw-strings-why-yes-again/13866?page=4
  * https://github.com/rust-lang/rust/issues/9411
  * https://laravel-news.com/flexible-heredoc-and-nowdoc-coming-to-php-7-3

Hope any of this helps.

