Enhanced String literal for Java: version 1.4 (final ?)

rssh at gradsoft.com.ua
Sat Mar 21 13:46:14 PDT 2009


Changes from previous release:
 - removed  windowsLf() and unixLf() String member functions.
 - library changes (i.e. String.platformLf() is implemented)
 - added reference to LongString annotation implementation by Bruce Chapman



AUTHOR(s): Ruslan Shevchenko, Reinier Zwitserloot (if agree)

version:1.4 snapshot.

OVERVIEW:

FEATURE SUMMARY:
New string literals in the Java language:
* multiline string literals.
* raw string literals without escape processing.

Note that this proposal says nothing about embedding expressions in
string literals: we believe this can be done at another level (i.e. at
the library level, by extending the family of format functions or reusing
the JSR 223 scripting interface, or at the metaprogramming level, by
parsing the finished string literal in an annotation processor or a
higher-level compiler pass).

MAJOR ADVANTAGE:
Strings from other languages can be written more elegantly, such as
SQL statements or inline XML (with multiline strings) or regular
expressions (with string literals without escape processing).

MAJOR DISADVANTAGE:
 A small increase in language complexity.

ALTERNATIVES:

For multiline strings, use concatenation operators or methods, such as:

 <pre>
  String bigString = "Multiline \n".concat("string ");
 </pre>
 or
 <pre>
  String bigString="First line\n"+
                   "second line";
 </pre>

For unescaped ('raw') strings, use escaping in an ordinary Java string.


EXAMPLES

SIMPLE EXAMPLE:

Multiline string:

 <pre>
  StringBuilder sb = new StringBuilder();
  sb.append("""select a from Area a, CountryCodes cc
                where
                   cc.isoCode='UA'
                  and
                   a.owner = cc.country
              """);
  if (question.getAreaName()!=null) {
     sb.append("""and
                  a.name like ?
               """);
     sqlParams.setString(++i,question.getAreaName());
  }
 </pre>

 instead of:
 <pre>
  StringBuilder sb = new StringBuilder();
  sb.append("select a from Area a, CountryCodes cc\n");
  sb.append("where cc.isoCode='UA'\n");
  sb.append("and a.owner=cc.country\n");
  if (question.getAreaName()!=null) {
     sb.append("and a.name like ?");
     sqlParams.setString(++i,question.getAreaName());
  }
 </pre>

 <pre>
 String platformDepended="""q
 """.platformLf();
 </pre>
 is 'q\n' if run on Unix and 'q\r\n' if run on Windows.

 <pre>
 String platformIndepended="""q
 q""";
 </pre>
 is always "q\nq".

Unescaped String:
 <pre>
 String myPattern=''..*\.*'';
 </pre>
 instead of
 <pre>
 String myPattern="..*\\.*";
 </pre>

 <pre>
 String fname=''C:\Program Files\My Program\Configuration'';
 </pre>
 instead of
 <pre>
 String fname="C:\\Program Files\\My Program\\Configuration";
 </pre>



ADVANCED EXAMPLE:


 <pre>
 String empty="""
 """;
 </pre>
 is empty.

 <pre>
 String foo = """
     bar
                 baz
       bla
     qux""";
 </pre>

is equal to: String foo = "bar\n            baz\n  bla\nqux";

and the following:
 <pre>
 String foo = """
    foo
 bar""";
 </pre>
 is a compile-time error.

 <pre>
 String manyQuotes="""\"""\"\"\"""";
 </pre> is """""

 <pre>
  String s = """I'm  long string in groovy stile wi\
  th \\ at end of line""";
 </pre>
 is:
 <pre>I'm  long string in groovy stile with \ at end of line</pre>

 <pre>
  String s = ''I'm  long string in groovy stile wi\
  th \\ at end of line'';
 </pre>
 is:
 <pre>
 I'm  long string in groovy stile wi
 th \\ at end of line
 </pre>


DETAILS:

Multiline strings are parts of the program text that begin and end with
three double quotes.

I.e. the grammar in section 3.10.5 of the JLS can be extended as:

<pre>
MultilineStringLiteral:
        """ MultilineStringCharacters/opt """

MultilineStringCharacters:
        MultilineStringCharacter
        MultilineStringCharacters  (MultilineStringCharacter but not ")
        (MultilineStringCharacters but not "") "

MultilineStringCharacter:
        InputCharacter but not \
        EscapeSequence
        LineTermination
        EolEscapeSequence

EolEscapeSequence: \LineTermination.

</pre>
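
As an illustration of how a lexer might locate the end of such a literal,
here is a minimal sketch. It assumes that the first unescaped run of three
double quotes closes the literal and that a backslash always escapes the
character that follows it. The class and method names are hypothetical and
are not taken from the prototype.

```java
/** Hypothetical helper; not part of the proposal or of the prototype compiler. */
final class ClosingQuoteSketch {

    /**
     * Returns the index of the first character of the closing triple quote,
     * scanning src from position 'from' (just after the opening quotes),
     * or -1 if the literal is not terminated.
     */
    static int closingQuoteIndex(String src, int from) {
        int i = from;
        while (i < src.length()) {
            if (src.charAt(i) == '\\') {
                i += 2;                      // skip the backslash and the escaped character
            } else if (src.startsWith("\"\"\"", i)) {
                return i;                    // three unescaped quotes close the literal
            } else {
                i++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "\"\"\"ab\\\"\"\"cd\"\"\";";
        System.out.println(closingQuoteIndex(src, 3));
    }
}
```

One or two quotes that are not followed by a third are treated as ordinary
content, which matches the intent of the MultilineStringCharacters rule.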


Unescaped strings are parts of the program text that begin and end with
two single quotes.


<pre>
 RawStringLiteral:
                   '' RawInputCharacters/opt ''

 RawInputCharacters:
                      ' (InputCharacter but not ')
                     |
                      (InputCharacter but not ') '
                     |
                      LineTermination
</pre>

 A method for replacing line termination sequences in a string with the
native format of the host platform must be added to the standard library.


COMPILATION:

Handling of multiline strings:

Text within """ brackets is processed in the following way:

1. It is split into a sequence of lines at line termination symbols.
2. Escape sequences in each line are processed exactly as in ordinary Java
strings.
3. A \LineTermination sequence at the end of a line is erased, and that
line is concatenated with the next line into one.
4. Leading whitespace is eliminated in the following way:
  - first, determine the sequence of whitespace symbols (excluding
LineTermination, i.e. SP, HT, FF) at the start of the first nonempty line
in the sequence. Let's call it the 'leading whitespace sequence'.
  - all following lines must start with the same leading whitespace
sequence, except when the last line consists only of the multiline string
closing quotes (""").
  - whitespace processing erases this leading sequence from the resulting
lines. If a line does not start with the leading whitespace sequence, then
a compilation warning is issued.
5. The lines remaining after erasing the leading whitespace sequence are
concatenated, with LF (i.e. '\n') line-termination sequences between
neighbouring lines, regardless of the host system.
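
The five steps above can be sketched in plain Java as follows. This is only
an illustration under stated assumptions, not the prototype's code: the class
and method names are hypothetical, octal escapes are left out, and the line
continuation of step 3 is handled before escape processing so that an escaped
backslash at the end of a line is not mistaken for a continuation. Two further
assumptions are inferred from the examples: an empty first line directly after
the opening quotes is dropped, and a last line that holds only whitespace
before the closing quotes contributes an empty line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; neither the proposal nor the JDK defines this class. */
final class MultilineLiteralSketch {

    /** body: source text between the triple quotes; warnings: receives step 4 diagnostics. */
    static String process(String body, List<String> warnings) {
        // Step 1: split on CR LF, CR or LF; the -1 limit keeps trailing empty lines.
        String[] raw = body.split("\r\n|\r|\n", -1);

        // Step 3 (done early, see lead-in): an odd run of trailing backslashes
        // means the last one is a continuation, so the line joins the next one.
        List<String> lines = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (int i = 0; i < raw.length; i++) {
            if (i < raw.length - 1 && trailingBackslashes(raw[i]) % 2 == 1) {
                pending.append(raw[i], 0, raw[i].length() - 1);
            } else {
                lines.add(pending.append(raw[i]).toString());
                pending.setLength(0);
            }
        }

        // Assumptions inferred from the examples (not stated in the numbered steps).
        int lastIdx = lines.size() - 1;
        String lastLine = lines.get(lastIdx);
        boolean closingOnly = lines.size() > 1 && leadingWhitespace(lastLine) == lastLine.length();
        if (closingOnly) lines.set(lastIdx, "");
        if (lines.size() > 1 && lines.get(0).isEmpty()) lines.remove(0);

        // Step 2: escape processing, line by line.
        for (int i = 0; i < lines.size(); i++) lines.set(i, unescape(lines.get(i)));

        // Step 4: the first nonempty line defines the leading whitespace sequence.
        String indent = null;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (indent == null) {
                if (line.isEmpty()) continue;
                indent = line.substring(0, leadingWhitespace(line));
            }
            boolean closingLine = closingOnly && i == lines.size() - 1;
            if (line.startsWith(indent)) {
                lines.set(i, line.substring(indent.length()));
            } else if (!closingLine) {
                warnings.add("processed line " + (i + 1) + " lacks the leading whitespace sequence");
            }
        }

        // Step 5: join with LF regardless of the host platform.
        return String.join("\n", lines);
    }

    private static int trailingBackslashes(String s) {
        int n = 0;
        while (n < s.length() && s.charAt(s.length() - 1 - n) == '\\') n++;
        return n;
    }

    /** Counts leading SP, HT and FF characters. */
    private static int leadingWhitespace(String s) {
        int n = 0;
        while (n < s.length() && " \t\f".indexOf(s.charAt(n)) >= 0) n++;
        return n;
    }

    /** Handles \b \t \n \f \r \" \' and \\ only; octal escapes are omitted in this sketch. */
    private static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') { out.append(c); continue; }
            int k = (i + 1 < s.length()) ? "btnfr\"'\\".indexOf(s.charAt(i + 1)) : -1;
            if (k < 0) throw new IllegalArgumentException("unsupported escape at index " + i);
            out.append("\b\t\n\f\r\"'\\".charAt(k));
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        System.out.println(process("\n    select a\n      from Area a\n    ", warnings));
    }
}
```

The sketch records a warning and keeps the offending line unchanged when a
line does not start with the leading whitespace sequence; a real compiler
would report the diagnostic through its own channel instead.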


Handling of raw strings:
Text within '' brackets is processed in the following way:
1. It is split into a sequence of lines at line termination symbols.
2. The lines are concatenated, with '\n' line-termination sequences
between neighbouring lines.

Neither escape processing nor leading whitespace elimination is performed
to obtain the resulting string value.
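
For contrast with the multiline case, processing of an unescaped ('raw')
string reduces to normalizing line terminators. A minimal sketch, assuming
the input is the text between the two pairs of single quotes (the class and
method names are hypothetical):

```java
/** Hypothetical helper; not part of the proposal or of the JDK. */
final class RawLiteralSketch {

    /** Splits on CR LF, CR or LF and rejoins with LF; nothing else is touched. */
    static String process(String body) {
        // The -1 limit keeps trailing empty lines so that no line terminator is lost.
        return String.join("\n", body.split("\r\n|\r|\n", -1));
    }

    public static void main(String[] args) {
        System.out.println(process("C:\\Program Files\\My Program\\Configuration"));
    }
}
```

Backslashes and leading whitespace pass through unchanged, so a backslash at
the end of a line does not join it with the next line.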

New string literals are created and used in .class files exactly as
ordinary strings.

TESTING:
 Add the new string literals to the test cases, covering all combinations
of finished and unfinished escape sequences and quotes.


LIBRARY SUPPORT:

 Add a method platformLf() to class String, which converts newlines to the
form suitable for the running platform.
 s.platformLf() - returns a string in which all line-termination
sequences in s are replaced by the value of the system property
'line.separator'.
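
Since String cannot be extended from user code, the behaviour described for
platformLf() can be approximated with a static helper. The following is a
sketch only: the class name and the two-argument overload (added so the
behaviour can be checked independently of the host platform) are assumptions
and are not part of the proposed library change.

```java
import java.util.regex.Matcher;

/** Hypothetical stand-in for the proposed String.platformLf() method. */
final class PlatformLfSketch {

    /** Replaces every line terminator in s with the 'line.separator' system property. */
    static String platformLf(String s) {
        return platformLf(s, System.getProperty("line.separator"));
    }

    /** Same as above, with an explicit separator. */
    static String platformLf(String s, String separator) {
        // CR LF is listed first so that it is replaced as one terminator, not two.
        return s.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(separator));
    }

    public static void main(String[] args) {
        System.out.println(platformLf("q\nq"));
    }
}
```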

REFLECTIVE APIS: None

OTHER CHANGES: None

MIGRATION: None

COMPATIBILITY
 None

REFERENCES

 http://bugs.sun.com/view_bug.do?bug_id=4165111
 http://bugs.sun.com/view_bug.do?bug_id=4472509
 http://docs.google.com/View?docid=d36kv8n_32g9zj7pdd -
proposal by Jacek Furmankiewicz
 http://blog.efftinge.de/2008/10/multi-line-string-literals-in-java.html -
library implementation by Sven Efftinge
 http://www.jroller.com/scolebourne/entry/java_7_multi_line_string -
proposal by Stephen Colebourne
 http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/000331.html  -
alternative joke proposal by Felix Frost
 https://rapt.dev.java.net/nonav/docs/api/net/java/dev/rapt/proposed/generators/LongString.html -
annotation-based implementation by Bruce Chapman

URL FOR PROTOTYPE:

 Compiler changes with a set of jtreg tests are available from the
Mercurial repository at
  http://datacenter.gradsoft.ua/mercurial/cgi/hgwebdir.cgi/jdk7/tl/langtools/

 Library changes with a simple test:
  http://datacenter.gradsoft.ua/mercurial/cgi/hgwebdir.cgi/jdk7/jdk/





