multiline: Must we waste one of the final few 'free' symbols on this?

Reinier Zwitserloot reinier at zwitserloot.com
Wed Feb 28 06:06:20 UTC 2018


Some feedback on multiline string literals. Where 'proposal' is referenced,
it refers to: https://bugs.openjdk.java.net/browse/JDK-8196004

# Must we waste one of the final few 'free' symbols on this? #

If you look at all the easily accessible symbols on a keyboard, the only
ones that don't yet have a syntactic meaning in Java source files are the
backtick and the hash. Everything else either already has a meaning or is
defined to be an identifier part, which makes using it as a symbol somewhat
difficult (that'd be the underscore and the dollar, although the underscore
has already been torn out in a backwards-incompatible way; presumably the
dollar can be 'rescued' in the same fashion). Is THIS what we're going to
spend one of our final 2 to 3 symbols on?
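
The identifier-part half of that claim can be checked against the JDK itself. A minimal sketch follows; the class name FreeSymbols and its helper method are made up for illustration, and the check only covers identifier-part status, not whether a symbol is otherwise unused by the grammar:

```java
public class FreeSymbols {
    /** True if the character may appear inside a Java identifier. */
    static boolean isIdentifierPart(char c) {
        return Character.isJavaIdentifierPart(c);
    }

    public static void main(String[] args) {
        // Underscore and dollar are identifier parts; backtick and hash are not.
        for (char c : new char[] {'_', '$', '`', '#'}) {
            System.out.println("'" + c + "' identifier part: " + isIdentifierPart(c));
        }
    }
}
```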

One obvious alternative use for the backtick is for encoding identifiers; if
you want to name a method "while", which the JVM spec does allow you to do,
you could maybe one day use backticks. Some JVM-targeted languages already
do this. I'm not saying this is a good idea, but I am saying that
implementing the raw string literal proposal as written pretty much
prevents this notion from ever seeing the light of day. Perhaps it's worth
some debate before we just casually close that door in perpetuity.

Alternatives:

Is: R"This is a raw string" an option? An advantage of the 'R' concept is
that you can separate 'escapes aren't processed' ('raw') from 'feel free to
newline in these' ('multiline'): The R indicates raw, and hitting enter
immediately after the quote indicates multiline, which would be backwards
compatible, as currently it's always illegal Java if you newline in the
middle of a string literal. Thus:

String x = R"Escapes \t are not processed here; this contains raw backslash-t instead of a tab";
String multi = "
    This is
    multiline but \t DOES contain a tab";
String rawMulti = R"
    This is
    multi with \t backslash-t literally, not a tab";
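
For comparison, here is roughly what those three literals have to look like in Java 9 today. This is only a sketch: the class and constant names are made up, the text is shortened, and since nothing above says what a multiline literal would do with the newline after the opening quote or with the leading indentation, both are simply dropped here as an assumption.

```java
public class StatusQuoStrings {
    // 'Raw' content today: the backslash itself has to be escaped.
    static final String RAW = "Escapes \\t are not processed here";

    // 'Multiline' content today: explicit \n plus concatenation.
    // The \t in the second line is a real tab character.
    static final String MULTI =
            "This is\n"
            + "multiline but \t DOES contain a tab";

    // Both at once: explicit \n, and an escaped backslash before the t.
    static final String RAW_MULTI =
            "This is\n"
            + "multi with \\t backslash-t literally, not a tab";

    public static void main(String[] args) {
        System.out.println(RAW);
        System.out.println(MULTI);
        System.out.println(RAW_MULTI);
    }
}
```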

Another option would be to investigate the use of triple quotes. In Java 9
syntax, having 3 quotes in immediate succession cannot possibly be valid in
a source file unless in a comment. Therefore, it would seem possible to use
triple quotes as a delimiter without creating the ambiguity mentioned in
the 'Choice of delimiters' section. Example:

String regex = """Hey now I don't have to \w+ escape my backslashes!""";
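
To make the pain point concrete, this is a small sketch of what regex escaping costs in Java 9 as it stands. The class and constant names are hypothetical; java.util.regex.Pattern is the standard library class.

```java
import java.util.regex.Pattern;

public class RegexEscapes {
    // The regex \w+ needs its backslash doubled inside a string literal.
    static final Pattern WORD = Pattern.compile("\\w+");

    // Matching one literal backslash takes four backslashes in source:
    // the string literal halves them, then the regex engine halves them again.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    public static void main(String[] args) {
        System.out.println(WORD.pattern());
        System.out.println(BACKSLASH.pattern());
        System.out.println(WORD.matcher("backslashes").matches());
    }
}
```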

This syntax also has quite a lot of precedent (Kotlin, Swift, Groovy, and
Python). Note that the 'other languages' section misconstrues how Python
works; triple quotes are for multiline strings. For raw strings, you use
R"foo". Most Python programmers seem to think the R stands for regex, as
that's pretty much what raw strings are always used for. Nevertheless, it
stands for 'raw'. See:
https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

As for investigating simply allowing Java strings to contain newlines, the
'Choice of delimiters' section has this quote:

> Enabling such a feature would affect tools and tests that assume
> multi-line traditional string literals as an error.

This makes no sense. Any tool that hasn't been updated would also consider
use of a backtick an error. Either way, tools not aware of the new feature
would treat multiline string literals as a syntax error, whether you use
backtick, quote, or triple-quote. Unlike the introduction of very fancy
footwork to treat backslash-u escapes as raw inside these literals, the
addition of backtick (or triple quote, or single quote) as signifying raw
and/or multiline strings won't be particularly difficult for existing Java
parsers to implement. The quoted concern doesn't seem relevant as an
argument for or against any particular delimiter.

 --Reinier Zwitserloot


More information about the amber-dev mailing list