String reboot
James Laskey
james.laskey at oracle.com
Sun Feb 10 16:30:26 UTC 2019
I should know better than format e-mails. Many a backslash eaten. The summary should be;
>> For instance; if we got up from the table with the ", """, \", \""" set of delimiters, we could still introduce structured delimiters in the future; either with repeated \ (see Swift) or repeated ". We could also follow a suggestion John made to use a pseudo nonce like \5" for \\\\\" or \""""".
>>
Sent from my iPhone
> On Feb 10, 2019, at 11:43 AM, Jim Laskey <james.laskey at oracle.com> wrote:
>
>
>>
>>
>> Focus
>>
>> Instead of ordering everything on the menu and immobilizing ourselves with excessive gluttony, let’s focus our attention on the appetizer. If we plan correctly, we'll have room for entrees and desserts later.
>>
>> The appetizer here is simplifying the injection of "foreign" language code into Java source. Think tapas. We may well be sated by the time we’re done.
>>
>> Goal
>>
>> Repurposing the Java String as a "foreign" code literal seems to be the most natural and least intrusive contrivance for Java support. In fact, this is already the case. Example;
>>
>> // <html>
>> // <body style="width: 100vw">
>> // <p>Hello World.</p>
>> // </body>
>> // <script>console.log("\nloaded")</script>
>> // </html>
>>
>> String html = "<html>\n" +
>> " <body style=\"width: 100vw\">\n" +
>> " <p>Hello World.</p>\n" +
>> " </body>\n" +
>> " <script>console.log(\"\\nloaded\")</script>\n" +
>> "</html>\n";
>> The primary reason we are having the string literal discussion is that the existing form has a few issues;
>>
>> The existing form is difficult to maintain without support from IDEs and is prone to error. The introduction and subsequent editing of foreign code requires additional delimiters, newlines, concatenations and escape sequences (DNCE).
>>
>> More to the point, the existing form is difficult to read. The additional DNCE obscure the underlying content of the string.
>>
>> Our aim is to come up with a DNCE lexicon that improves foreign code literal readability and maintainability without leaving developers in a confused state; with emphasis on reducing the E (escape sequences.)
>>
>> 50% solution
>>
>> Where we keep running into trouble is that a choice for one part of the lexicon spreads into the the other parts. That is, use of certain characters in the delimiter affect which characters require escaping and which characters can be used for escaping.
>>
>> So, let's pick off the lexicon easy bits first. Newlines, concatenations and in-between delimiters can be implicit if we just allow strings to span multiple lines (see Rust.)
>>
>> String html = "<html>
>> <body style=\"width: 100vw\">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log(\"\\nloaded\")</script>
>> </html>";
>> That's not so bad. If we did nothing else, we still would be better off than we were before.
>>
>> 75% solution, almost
>>
>> What problems are left?
>>
>> The foreign delimiters (quotes) have to be escaped.
>>
>> The foreign escape sequences also have to be escaped.
>>
>> And to a lesser degree, it's difficult to locate the closing delimiter.
>>
>> Fortunately, we don't have many choices for dealing with escapes;
>>
>> Backslash is Java's escape character.
>>
>> Either escaping is on or is off (raw), so we need a way to flag a string as being escaped. We could have an option to turn escaping on/off within a string, but it has been hard to come up with examples where this might be required.
>>
>> Even with escaping off, we still might have to escape delimiters. Repeated backslashes (or repeated delimiters) is the typical out.
>>
>> How about trying as the flag for escapes off;
>>
>> String html = \"<html>
>> <body style="width: 100vw">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log("\nloaded")</script>
>> </html>";
>> That doesn't work because it looks like the string ends at the first quote. Let's try symmetry, either " or " as the closing delimiter. " is preferable because then it doesn't look like an escape sequence (see Swift.)
>>
>> String html = \"<html>
>> <body style="width: 100vw">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log("\nloaded")</script>
>> </html>"\;
>> The only new string rule added is to allow multi-line strings.
>>
>> Adding backslash before and after the string indicates escaping off.
>>
>> But wait
>>
>> This looks like the 75% solution;
>>
>> Builds on our cred with existing strings.
>>
>> Escape processing is orthogonal to multi-line.
>>
>> Delimiter can easily be understood to mean “string with escapes."
>>
>> But wait. "" looks like it contains the end delimiter. Rats!!! Captain we need more sequences.
>>
>> And, this is the crux of all the debate around strings. Fixed delimiters imply a requirement for escape sequences, otherwise there is content you cannot express as a string.
>>
>> The inverse of this implication is that if you have escape sequences you don't need flexible delimiters. This can be reinterpreted as you only need flexible delimiters if you want to always avoid escape sequences.
>>
>> Wasn't avoiding escape sequences the goal?
>>
>> All this brings us to the central choice we have to make before we get into the rest of the meal. Do we go with fixed delimiter(s), structured delimiters or nonce delimiters.
>>
>> Fixed delimiter
>>
>> If we go with a fixed delimiter then we limit the content that can be expressed without escape sequences. This is not totally left field. There are floating point values we can not express in Java and types we can express but not denote, such as anonymous class types, intersection types or capture types.
>>
>> Everything is a degree of tradeoff. And, those tradeoffs are okay as long as we are explicit about it.
>>
>> We could get closer to the 85% mark if we had a way to have " in our content without escaping. Let's introduce a secondary delimiter, """.
>>
>> String html = """<html>
>> <body style="width: 100vw">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log("\\nloaded")</script>
>> </html>""";
>> The introduction of """ would allow " with the only restriction that we can not use """ in the content without escaping. We could say that """ also means escaping off, but then we would have no way to escape """ (\"""). Keeping escaping as an orthogonal issue allows the best of both worlds.
>>
>> String html = \"""<html>
>> <body style="width: 100vw">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log("\nloaded")</script>
>> </html>"""\;
>> Once you take away conflicts with the delimiter, most strings do not require escaping.
>>
>> Also at this point we should note that other combinations of quotes ('''. ```, "'") don't bring anything new to the table; Tomato/Tomato, Potato/Potato.
>>
>> Summary: All strings can be expressed with fixed plus escaping, but can not express strings containing the fixed delimiter (""") with escaping off.
>>
>> Jumping ahead: I think that stating that traditional " strings must be single-line will be a popular restriction, even if it not needed. Then they will think of """ as meaning multi-line.
>>
>> Structured delimiter
>>
>> A structured delimiter contains a repeating pattern that can be expanded to suit a scenario. We attempted to introduce this notion with the original backtick proposal, but that proposal was withdrawn because a) didn't want to burn the backtick, b) developers weren't comfortable with infinitely repeating delimiters, and c) non-expressible anomalies such as content with leading or trailing backticks.
>>
>> Using " instead of backtick addresses a).
>>
>> String html = """"""<html>
>> <body style="width: 100vw">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log("\nloaded")</script>
>> </html>"""""";
>> For b) is there a limit where developers would be comfortable? That is, what about a range of fixed delimiters; ", """, """", """"", """""". This is slightly different than fixed delimiters in that it increases the combinations of content containing delimiters. Example, """"" could allow ", """, """", ..., Nx" for N != 5.
>>
>> Structured delimiters also differ from fixed delimiters in the fact that there is pressure to have escaping off when N >= 3. You can always fall back to a single ".
>>
>> Summary: Can express all strings with and without escaping. If the delimiter length is limited the there there is still a (smaller) set of strings that can not be expressed.
>>
>> Nonce delimiter
>>
>> A nonce or custom delimiter allows developers to include a unique character sequence in the delimiter. This provides a flexible delimiter without fear of going too far. There is also the advantage/distraction of providing commentary.
>>
>> String html = \HTML"<html>
>> <body style="width: 100vw">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log("\nloaded")</script>
>> </html>"HTML\;
>> Summary: Can express all strings with and without escaping, but nonce can affect readability.
>>
>> Multi-line formatting
>>
>> I left this out of the main discussion, but I think we can all agree that formatting rules should separate the delimiters from the content. Other details can be refined after choice of delimiter(s).
>>
>> String html = \"""
>> <html>
>> <body style="width: 100vw">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log("\nloaded")</script>
>> </html>
>> """\;
>> String html = """"""
>> <html>
>> <body style="width: 100vw">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log("\nloaded")</script>
>> </html>
>> """""";
>> String html = \HTML"
>> <html>
>> <body style="width: 100vw">
>> <p>Hello World.</p>
>> </body>
>> <script>console.log("\nloaded")</script>
>> </html>
>> "HTML/;
>> Entrees and desserts
>>
>> If we make good choices now (stay away from the oysters) we can still move on to other courses later.
>>
>> For instance; if we got up from the table with the ", """, ", """ set of delimiters, we could still introduce structured delimiters in the future; either with repeated (see Swift) or repeated ". We could also follow a suggestion John made to use a pseudo nonce like " for \\" or """"".
>>
>> Point being, we can work with a 85% solution now that we can supplement later when we're not so hangry.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/amber-spec-experts/attachments/20190210/f638607a/attachment-0001.html>
More information about the amber-spec-experts
mailing list