Enhanced String literal for Java: version 1.3 with implementation.
rssh at gradsoft.com.ua
Wed Mar 18 04:14:59 PDT 2009
Good day.
Here is the next version of the enhanced string literals proposal for Java.
Changes are:
- the compiler part of the implementation, with a set of jtreg tests, is added.
(Technically, I see that jdk and langtools are different projects on
java.net, so it looks like I can't write a test that works with the library,
and the library changes must technically be the next step?)
- refined the process of leading whitespace elimination, to allow fewer
whitespace characters in the last line of a multiline string when that line
consists only of the closing quotes.
- An incorrect common whitespace prefix in a multiline string now causes a
warning, not an error.
- Jeremy Manson removed himself from the list of authors. His 'raw' strings
remain in the proposal.
- Added references to
- proposal by Stephen Colebourne;
- very clever proposal and library 'implementation' by Sven
Efftinge;
- funny proposal by Felix Frost.
- Cosmetic changes in the text.
Here is the next version: 1.3
AUTHOR(s): Ruslan Shevchenko, Reinier Zwitserloot (if agree)
OVERVIEW:
FEATURE SUMMARY:
New string literals in the Java language:
* multiline string literals.
* string literals without escape processing.
MAJOR ADVANTAGE:
The ability to code strings from other languages more elegantly, such as
SQL constructions or inline XML (for multiline strings) or regular
expressions (for string literals without escape processing).
MAJOR DISADVANTAGE:
A small increase in language complexity.
ALTERNATIVES:
For multiline strings, use concatenation operators and methods, such as:
"Multiline \n".concat(
"string ");
or
String bigString="First line\n"+
"second line";
For unescaped ('raw') strings, use escaping in an ordinary Java string.
EXAMPLES
SIMPLE EXAMPLE:
Multiline string:
StringBuilder sb = new StringBuilder();
sb.append("""select a from Area a, CountryCodes cc
where
cc.isoCode='UA'
and
a.owner = cc.country
""");
if (question.getAreaName()!=null) {
sb.append("""and
a.name like ?
""");
sqlParams.setString(++i,question.getAreaName());
}
instead of:
StringBuilder sb = new StringBuilder();
sb.append("select a from Area a, CountryCodes cc\n");
sb.append("where cc.isoCode='UA'\n");
sb.append("and a.owner=cc.country\n");
if (question.getAreaName()!=null) {
sb.append("and a.name like ?");
sqlParams.setString(++i,question.getAreaName());
}
String platformDependent="""q
""".platformLf();
is "q\n" if run on Unix and "q\r\n" if run on Windows.
String platformIndependent="""q
""";
is always "q\n".
String platformIndependent="""q
""".unixLf();
is the same.
String platformIndependent="""
""".windowsLf();
is always "\r\n".
Unescaped string:
String myPattern=''..*\.*'';
instead of
String myPattern="..*\\.*";
String fname=''C:\Program Files\My Program\Configuration'';
instead of
String fname="C:\\Program Files\\My Program\\Configuration";
ADVANCED EXAMPLE:
String empty="""
""";
is empty.
String foo = """
    bar
     baz
     bla
    qux""";
is equal to: String foo = "bar\n baz\n bla\nqux";
and the following:
String foo = """
      foo
    bar""";
is compiled to "foo\nbar" with a compilation warning.
String manyQuotes="""\"""\"\"\"""";
is """""" (six double quotes: one from \", two unescaped, three from \"\"\")
String s = """I'm long string in groovy style wi\
th \\ at end of line""";
is:
I'm long string in groovy style with \ at end of line
String s = '' I'm long string in groovy style wi\
th \\ at end of line'';
is:
I'm long string in groovy style wi\
th \\ at end of line
DETAILS:
Multiline strings are parts of the program text that begin and end with three
double quotes.
I.e. the grammar in section 3.10.5 of the JLS can be extended as:
<pre>
MultilineStringLiteral:
    """ MultilineStringCharacters/opt """
MultilineStringCharacters:
    MultilineStringCharacter
    MultilineStringCharacters (MultilineStringCharacter but not ")
    (MultilineStringCharacters but not "") "
MultilineStringCharacter:
    InputCharacter but not \
    EscapeSequence
    LineTermination
    EolEscapeSequence
EolEscapeSequence:
    \LineTermination
</pre>
Unescaped strings are parts of the program text that begin and end with two
single quotes.
<pre>
RawStringLiteral:
    '' RawInputCharacters/opt ''
RawInputCharacters:
    ' (InputCharacter but not ')
    |
    (InputCharacter but not ') '
    |
    LineTermination
</pre>
Methods for replacing the line termination sequences in a string with the
native format of the host platform, and with the well-known Unix/Windows
formats, must be added to the standard library.
COMPILATION:
Handling of multiline strings:
Text within """ brackets is processed in the following way:
1. It is split into a sequence of lines by line termination symbols.
2. Escape sequences in each line are processed exactly as in ordinary Java
strings.
3. A \LineTermination sequence at the end of a line is erased, and that line
is concatenated with the next line into one.
4. Leading whitespace is eliminated in the following way:
- first, determine the sequence of whitespace symbols (excluding
LineTermination, i.e. SP, HT, FF) at the start of the first nonempty line in
the sequence. Let's call it the 'leading whitespace sequence'.
- all following lines must start with the same leading whitespace sequence,
except when the last line consists only of the multiline string close
quotes (""").
- whitespace processing erases this leading sequence from the resulting
lines. If a line does not start with the leading whitespace sequence, then a
compilation warning is issued.
5. The set of lines after erasing the leading whitespace sequence is
concatenated, with LF (i.e. '\n') line-termination sequences between
neighbouring lines, regardless of the host system.
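The whitespace rules in steps 4 and 5 can be sketched in plain Java roughly as
follows. This is only an illustration, not the code from the langtools patch:
the class and method names are invented here, and escape processing (steps 2
and 3) is left out. Two behaviours are assumptions read off the examples
rather than stated in the rules: an empty line directly after the opening
quotes is dropped, and a line with a wrong prefix loses only the part of the
prefix that it shares.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the proposed compiler patch.
class LeadingWhitespaceSketch {

    // Whitespace other than line terminators: SP, HT, FF.
    static boolean isBlank(char c) {
        return c == ' ' || c == '\t' || c == '\f';
    }

    // Length of the run of SP/HT/FF characters at the start of the line.
    static int blankRun(String line) {
        int n = 0;
        while (n < line.length() && isBlank(line.charAt(n))) n++;
        return n;
    }

    // body: the text between the opening and closing """.
    // warnings: receives one message per line with a wrong prefix.
    static String strip(String body, List<String> warnings) {
        // Step 1: split on CRLF, CR or LF; limit -1 keeps trailing empty lines.
        List<String> lines =
            new ArrayList<String>(Arrays.asList(body.split("\r\n|\r|\n", -1)));
        // Assumption: an empty line right after the opening quotes is dropped.
        if (lines.size() > 1 && lines.get(0).isEmpty()) lines.remove(0);

        // Step 4: the prefix is the leading whitespace of the first nonempty line.
        String prefix = "";
        for (String line : lines) {
            if (!line.isEmpty()) {
                prefix = line.substring(0, blankRun(line));
                break;
            }
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            boolean last = (i == lines.size() - 1);
            int cut;
            if (last && blankRun(line) == line.length()) {
                // Last line holds only the close quotes: any indentation is accepted.
                cut = line.length();
            } else if (line.startsWith(prefix)) {
                cut = prefix.length();
            } else {
                // Assumption: strip the shared part and warn (empty lines are exempt).
                cut = 0;
                while (cut < line.length() && cut < prefix.length()
                        && line.charAt(cut) == prefix.charAt(cut)) cut++;
                if (!line.isEmpty()) {
                    warnings.add("unexpected leading whitespace: " + line);
                }
            }
            // Step 5: join with LF regardless of the host platform.
            if (i > 0) out.append('\n');
            out.append(line.substring(cut));
        }
        return out.toString();
    }
}
```

Under these assumptions a body of "q" followed by one line terminator yields
"q\n", and a body whose second line is indented less than the first yields the
stripped lines plus one entry in the warnings list.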
Handling of raw strings:
Text within '' brackets is processed in the following way:
1. It is split into a sequence of lines by line termination symbols.
2. The lines are concatenated, with '\n' line-termination sequences between
neighbouring lines.
Neither escape processing nor leading whitespace elimination is performed when
computing the resulting string value.
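For comparison, the handling of '' literals reduces to line terminator
normalization only. A minimal Java sketch, assuming a hypothetical helper
class (the names are not taken from the patch):

```java
// Hypothetical helper, not part of the proposed compiler patch.
class RawLiteralSketch {

    // body: the text between the opening and closing ''.
    static String value(String body) {
        // Split on CRLF, CR or LF (limit -1 keeps empty lines), then rejoin with LF.
        String[] lines = body.split("\r\n|\r|\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) out.append('\n');
            // Copied verbatim: backslashes and indentation are left untouched.
            out.append(lines[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In today's Java source the backslashes of the body have to be doubled.
        System.out.println(value("C:\\Program Files\\My Program\\Configuration"));
    }
}
```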
New string literals are created and used in .class files exactly as ordinary
strings.
TESTING:
Add the new string literals to test cases for all combinations of finished
and unfinished escape sequences and quotes.
LIBRARY SUPPORT:
It would be good to add the following methods to String:
s.platformLf() - returns a string in which all line-termination
sequences in s are replaced by the value of the system property
'line.separator'
s.unixLf() - returns a string in which all line-termination sequences
in s are replaced by '\n'
s.windowsLf() - returns a string in which all line-termination
sequences in s are replaced by '\r\n'
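Since String itself cannot be extended from user code, the three methods can
be approximated today with static helpers. The sketch below is an assumption
about how they might behave, not the proposed library code; in particular it
treats CRLF, lone CR and lone LF all as line-termination sequences, and the
class name LineFeeds is invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the proposed String methods.
class LineFeeds {

    // CRLF is listed first so that a Windows terminator is replaced once, not twice.
    private static final Pattern EOL = Pattern.compile("\r\n|\r|\n");

    // Replaces every line-termination sequence in s by eol.
    static String replaceEol(String s, String eol) {
        return EOL.matcher(s).replaceAll(Matcher.quoteReplacement(eol));
    }

    static String unixLf(String s) {
        return replaceEol(s, "\n");
    }

    static String windowsLf(String s) {
        return replaceEol(s, "\r\n");
    }

    static String platformLf(String s) {
        return replaceEol(s, System.getProperty("line.separator"));
    }
}
```

With these helpers, LineFeeds.windowsLf("q\n") gives "q\r\n" on every
platform, while LineFeeds.platformLf("q\n") depends on the host.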
REFLECTIVE APIS: None
OTHER CHANGES: None
MIGRATION: None
COMPATIBILITY
None
REFERENCES
http://bugs.sun.com/view_bug.do?bug_id=4165111
http://bugs.sun.com/view_bug.do?bug_id=4472509
http://docs.google.com/View?docid=d36kv8n_32g9zj7pdd - by Jacek
Furmankiewicz
http://blog.efftinge.de/2008/10/multi-line-string-literals-in-java.html -
library implementation by Sven Efftinge
http://www.jroller.com/scolebourne/entry/java_7_multi_line_string -
proposal by Stephen Colebourne
http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/000331.html -
alternative joke proposal by Felix Frost
IMPLEMENTATION URL:
Compiler changes with a set of jtreg tests are available from the Mercurial
repository at
http://datacenter.gradsoft.ua/mercurial/cgi/hgwebdir.cgi/jdk7/tl/langtools/