String reboot (plain text)
elias vasylenko
eliasvasylenko at gmail.com
Mon Feb 11 11:05:09 UTC 2019
> Instead of ordering everything on the menu and immobilizing ourselves with
> excessive gluttony, let’s focus our attention on the appetizer. If we plan
> correctly, we'll have room for entrees and desserts later.
>
This is a nice philosophy, but unfortunately the choices made here may
intersect with, and thus restrict the delivery of, any future courses.
You're going to fill the table up, so to speak, and there will be no more
room. I think this is one of the few situations where the entire meal needs
to be served all in one go.
I agree with your goal statement, but there are assumptions about escaping
which I don't think bear out. And I realise I've said most of this before
on this list and I hate to sound like a broken record, but this choice can
only be made once.
> • Either escaping is on or is off (raw), so we need a way to flag a
> string as being escaped. We could have an option to turn escaping on/off
> within a string, but it has been hard to come up with examples where this
> might be required.
>
You reference Swift a few times, so you're probably aware that they do
still support escaping in raw strings. Escaping is not off; the escape
marker is just changed to avoid collisions. (Where by "escape marker" I
mean the prefix of an escape sequence, normally a single backslash.)
And yes, maybe this is more important in Swift, primarily because of the
value of supporting interpolation within raw strings, but that's exactly
the sort of feature you risk boxing yourself out of. As another use case,
consider that some people may be restricted to ASCII-only source code for
whatever reason, so what if they want to embed Unicode characters in
their otherwise-raw string literals? They may still wish to have the option
to use escapes.
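For reference, this is what such an escape buys an ASCII-only source file in
today's ordinary string literals. A minimal Java sketch (the class and method
names are made up for illustration); whether and how a raw form would offer
the same thing is exactly the open question:

```java
public class AsciiOnlySource {
    // The source file stays pure ASCII; the escape below denotes
    // U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
    static String cafe() {
        return "caf\u00e9";
    }

    public static void main(String[] args) {
        String s = cafe();
        System.out.println(s.length());         // prints 4
        System.out.println((int) s.charAt(3));  // prints 233 (0xE9)
    }
}
```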
If you don't choose some way to designate a variable escape marker now, then
you will forever be precluded from introducing escapes to raw strings. (At
least without also introducing *yet another* variation on the delimiter
with *yet another* set of rules to apply to it.)
Swift chooses to indicate the form of the escape marker at the opening
delimiter. So what's a sensible choice for the form of such an escape
marker? Swift chooses a backslash followed by a variable-length sequence of
hashes. Another option is simply a variable-length sequence of
backslashes, but this would not be possible to add to your proposal after
the fact for two reasons.
- The single backslash case would already have been assigned a different
meaning.
- The closing delimiter needs to be reshuffled to avoid potential
collisions.
To illustrate this, consider the case where we select \\" as the delimiter,
and we say that this implies \\ as the escape marker:
String html = \\"<html>
<body style="width: 100vw">
<p>Hello World.</p>
</body>
<script>console.log("\nloaded")</script>
</html>"\\;
That looks fine when there are no escapes involved, but suppose we want to
escape something at the start of the embedded JavaScript string, e.g. if
Java were to introduce interpolation like Swift's in the future. The escape
then becomes ambiguous with our closing delimiter:
String loadedMessage = loadedMessage();
String html = \\"<html>
<body style="width: 100vw">
<p>Hello World.</p>
</body>
<script>console.log("\\(loadedMessage)\n")</script>
</html>"\\;
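The collision is easy to reproduce mechanically. A minimal Java sketch,
assuming a naive lexer that ends the literal at the first occurrence of the
closing delimiter; the class and method names are hypothetical, and the
content is abbreviated to the last two lines of the example:

```java
public class DelimiterCollision {
    // Closing delimiter from the example above: a quote followed by two backslashes.
    static final String CLOSER = "\"\\\\";

    // Index at which a naive lexer would end the literal: the first
    // occurrence of the closing delimiter after the opening delimiter.
    static int naiveEnd(String afterOpener) {
        return afterOpener.indexOf(CLOSER);
    }

    public static void main(String[] args) {
        // Text after the opening delimiter, written here as an ordinary
        // (escaped) Java literal.
        String afterOpener =
            "<script>console.log(\"\\\\(loadedMessage)\\n\")</script>\n"
            + "</html>\"\\\\;";

        int intended = afterOpener.lastIndexOf(CLOSER);
        int actual = naiveEnd(afterOpener);

        // The naive end falls inside the console.log call, before the intended end.
        System.out.println(actual < intended);
        System.out.println(afterOpener.substring(0, actual));
    }
}
```

The literal is cut off right after `console.log(`, because the quote that
opens the JavaScript string is immediately followed by the two-backslash
escape marker.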
> That doesn't work because it looks like the string ends at the first
> quote. Let's try symmetry, either \" or "\ as the closing delimiter. "\ is
> preferable because then it doesn't look like an escape sequence (see Swift.)
>
But in a way it *is* an escape sequence in that context! One which endows
the quote with special meaning, saying that it should behave as a delimiter
instead of just another character in the string. If you embrace that notion,
it also avoids the collision in the previous example.
String loadedMessage = loadedMessage();
String html = \\"<html>
<body style="width: 100vw">
<p>Hello World.</p>
</body>
<script>console.log("\\(loadedMessage)\n")</script>
</html>\\";
This arrangement has a few useful properties:
- No strings *require* escaping. All possible content is representable with
no escapes, including strings which start or end with backslashes or quotes.
- All strings *allow* an escape sequence to occur in any location.
- Lexing and user model are both straightforward and captured by a small
number of rules.
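On the first point, one way to see why no content needs escaping is that the
marker can always be made longer than any backslash run in the content. A
minimal Java sketch, assuming that any run of n backslashes inside a literal
opened with n backslashes is taken as the escape marker; the names are
hypothetical, and content that ends in backslashes would need an extra lexing
rule that is not handled here:

```java
public class EscapeMarker {
    // Length of the longest run of consecutive backslashes in the content.
    static int longestBackslashRun(String content) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < content.length(); i++) {
            current = (content.charAt(i) == '\\') ? current + 1 : 0;
            longest = Math.max(longest, current);
        }
        return longest;
    }

    // Assumption: the marker must be longer than every backslash run in the
    // content, so that nothing in the content is mistaken for an escape.
    static int minimalMarkerLength(String content) {
        return longestBackslashRun(content) + 1;
    }

    public static void main(String[] args) {
        // The embedded JavaScript line, with a literal backslash followed by n.
        String js = "console.log(\"\\nloaded\")";
        System.out.println(minimalMarkerLength(js));
    }
}
```

For the embedded JavaScript above, whose longest run is a single backslash,
this gives 2, which matches the \\" delimiter used in the examples.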
My specific proposal/example with \\\" style delimiters isn't important,
maybe something closer to the Swift way is better. I just want to see
variable escape markers within raw strings be a part of the conversation.
Delivering an 85% solution without thinking about them is sure to restrict
the ways in which they could be introduced, and I hope to avoid seeing that
become a source of regret.
> String html = \"<html>
> <body style="width: 100vw">
> <p>Hello World.</p>
> </body>
> <script>console.log("\nloaded")</script>
> </html>"\;
>
> • The only new string rule added is to allow multi-line strings.
>
> • Adding backslash before and after the string indicates escaping
> off.
>
>
> But wait
> ========
>
> This looks like the 75% solution;
>
> • Builds on our cred with existing strings.
>
> • Escape processing is orthogonal to multi-line.
>
> • Delimiter can easily be understood to mean “string with escapes.”
>
> But wait. "\nloaded" looks like it contains the end delimiter. Rats!!!
> Captain we need more sequences.
>
> And, this is the crux of all the debate around strings. Fixed delimiters
> imply a requirement for escape sequences, otherwise there is content you
> cannot express as a string.
>
> The inverse of this implication is that if you have escape sequences you
> don't need flexible delimiters. This can be reinterpreted as you only need
> flexible delimiters if you want to always avoid escape sequences.
>
> Wasn't avoiding escape sequences the goal?
>
> All this brings us to the central choice we have to make before we get
> into the rest of the meal. Do we go with fixed delimiter(s), structured
> delimiters or nonce delimiters.
>
>
> Fixed delimiter
> ===============
>
> If we go with a fixed delimiter then we limit the content that can be
> expressed without escape sequences. This is not totally left field. There
> are floating point values we can not express in Java and types we can
> express but not denote, such as anonymous class types, intersection types
> or capture types.
>
> Everything is a degree of tradeoff. And, those tradeoffs are okay as long
> as we are explicit about it.
>
> We could get closer to the 85% mark if we had a way to have " in our
> content without escaping. Let's introduce a secondary delimiter, """.
>
> String html = """<html>
> <body style="width: 100vw">
> <p>Hello World.</p>
> </body>
> <script>console.log("\\nloaded")</script>
> </html>""";
>
> The introduction of """ would allow " with the only restriction that we
> can not use """ in the content without escaping. We could say that """ also
> means escaping off, but then we would have no way to escape """ (\""").
> Keeping escaping as an orthogonal issue allows the best of both worlds.
>
> String html = \"""<html>
> <body style="width: 100vw">
> <p>Hello World.</p>
> </body>
> <script>console.log("\nloaded")</script>
> </html>"""\;
>
> Once you take away conflicts with the delimiter, most strings do not
> require escaping.
>
> Also at this point we should note that other combinations of quotes (''',
> ```, "'") don't bring anything new to the table; Tomato/Tomato,
> Potato/Potato.
>
> Summary: All strings can be expressed with fixed plus escaping, but can
> not express strings containing the fixed delimiter (""") with escaping off.
>
> Jumping ahead: I think that stating that traditional " strings must be
> single-line will be a popular restriction, even if it is not needed. Then they
> will think of """ as meaning multi-line.
>
>
> Structured delimiter
> ====================
>
> A structured delimiter contains a repeating pattern that can be expanded
> to suit a scenario. We attempted to introduce this notion with the original
> backtick proposal, but that proposal was withdrawn because a) we didn't want
> to burn the backtick, b) developers weren't comfortable with infinitely
> repeating delimiters, and c) there were non-expressible anomalies such as
> content with leading or trailing backticks.
>
> Using " instead of backtick addresses a).
>
> String html = """"""<html>
> <body style="width: 100vw">
> <p>Hello World.</p>
> </body>
> <script>console.log("\nloaded")</script>
> </html>"""""";
>
> For b) is there a limit where developers would be comfortable? That is,
> what about a range of fixed delimiters; ", """, """", """"", """""". This
> is slightly different than fixed delimiters in that it increases the
> combinations of content containing delimiters. Example, """"" could allow
> ", """, """", ..., Nx" for N != 5.
>
> Structured delimiters also differ from fixed delimiters in the fact that
> there is pressure to have escaping off when N >= 3. You can always fall
> back to a single ".
>
> Summary: Can express all strings with and without escaping. If the
> delimiter length is limited then there is still a (smaller) set of
> strings that can not be expressed.
>
>
> Nonce delimiter
> ===============
>
> A nonce or custom delimiter allows developers to include a unique
> character sequence in the delimiter. This provides a flexible delimiter
> without fear of going too far. There is also the advantage/distraction of
> providing commentary.
>
> String html = \HTML"<html>
> <body style="width: 100vw">
> <p>Hello World.</p>
> </body>
> <script>console.log("\nloaded")</script>
> </html>"HTML\;
>
> Summary: Can express all strings with and without escaping, but nonce can
> affect readability.
>
>
> Multi-line formatting
> =====================
>
> I left this out of the main discussion, but I think we can all agree that
> formatting rules should separate the delimiters from the content. Other
> details can be refined after choice of delimiter(s).
>
> String html = \"""
> <html>
> <body style="width: 100vw">
> <p>Hello World.</p>
> </body>
> <script>console.log("\nloaded")</script>
> </html>
> """\;
>
> String html = """"""
> <html>
> <body style="width: 100vw">
> <p>Hello World.</p>
> </body>
> <script>console.log("\nloaded")</script>
> </html>
> """""";
>
> String html = \HTML"
> <html>
> <body style="width: 100vw">
> <p>Hello World.</p>
> </body>
> <script>console.log("\nloaded")</script>
> </html>
> "HTML\;
>
>
> Entrees and desserts
> ====================
>
> If we make good choices now (stay away from the oysters) we can still move
> on to other courses later.
>
> For instance; if we got up from the table with the ", """, \", \""" set of
> delimiters, we could still introduce structured delimiters in the future;
> either with repeated \ (see Swift) or repeated ". We could also follow a
> suggestion John made to use a pseudo nonce like \5" for \\\\\" or \""""".
>
> Point being, we can work with an 85% solution now that we can supplement
> later when we're not so hangry.
>
> > On Feb 10, 2019, at 12:30 PM, James Laskey <james.laskey at oracle.com> wrote:
> >
> > I should know better than to format e-mails. Many a backslash eaten. The
> > summary should be;
> >
> >>> For instance; if we got up from the table with the ", """, \", \"""
> >>> set of delimiters, we could still introduce structured delimiters in the
> >>> future; either with repeated \ (see Swift) or repeated ". We could also
> >>> follow a suggestion John made to use a pseudo nonce like \5" for \\\\\" or
> >>> \""""".
> >>>
> >
> >
> >
> > Sent from my iPhone
> >
> > On Feb 10, 2019, at 11:43 AM, Jim Laskey <james.laskey at oracle.com> wrote:
> >
> >>
> >>>
> >>>
> >>> Focus
> >>>
> >>> Instead of ordering everything on the menu and immobilizing ourselves
> >>> with excessive gluttony, let’s focus our attention on the appetizer. If we
> >>> plan correctly, we'll have room for entrees and desserts later.
> >>>
> >>> The appetizer here is simplifying the injection of "foreign" language
> >>> code into Java source. Think tapas. We may well be sated by the time we’re
> >>> done.
> >>>
> >>> Goal
> >>>
> >>> Repurposing the Java String as a "foreign" code literal seems to be
> >>> the most natural and least intrusive contrivance for Java support. In fact,
> >>> this is already the case. Example;
> >>>
> >>> // <html>
> >>> // <body style="width: 100vw">
> >>> // <p>Hello World.</p>
> >>> // </body>
> >>> // <script>console.log("\nloaded")</script>
> >>> // </html>
> >>>
> >>> String html = "<html>\n" +
> >>> " <body style=\"width: 100vw\">\n" +
> >>> " <p>Hello World.</p>\n" +
> >>> " </body>\n" +
> >>> " <script>console.log(\"\\nloaded\")</script>\n" +
> >>> "</html>\n";
> >>>
> >>> The primary reason we are having the string literal discussion is that
> >>> the existing form has a few issues;
> >>>
> >>> • The existing form is difficult to maintain without support from
> >>> IDEs and is prone to error. The introduction and subsequent editing of
> >>> foreign code requires additional delimiters, newlines, concatenations and
> >>> escape sequences (DNCE).
> >>>
> >>> • More to the point, the existing form is difficult to read. The
> >>> additional DNCE obscure the underlying content of the string.
> >>>
> >>> Our aim is to come up with a DNCE lexicon that improves foreign code
> >>> literal readability and maintainability without leaving developers in a
> >>> confused state; with emphasis on reducing the E (escape sequences.)
> >>>
> >>> 50% solution
> >>>
> >>> Where we keep running into trouble is that a choice for one part of
> >>> the lexicon spreads into the other parts. That is, use of certain
> >>> characters in the delimiter affects which characters require escaping and
> >>> which characters can be used for escaping.
> >>>
> >>> So, let's pick off the easy bits of the lexicon first. Newlines,
> >>> concatenations and in-between delimiters can be implicit if we just allow
> >>> strings to span multiple lines (see Rust.)
> >>>
> >>> String html = "<html>
> >>> <body style=\"width: 100vw\">
> >>> <p>Hello World.</p>
> >>> </body>
> >>> <script>console.log(\"\\nloaded\")</script>
> >>> </html>";
> >>>
> >>> That's not so bad. If we did nothing else, we still would be better
> >>> off than we were before.
> >>>
> >>> 75% solution, almost
> >>>
> >>> What problems are left?
> >>>
> >>> • The foreign delimiters (quotes) have to be escaped.
> >>>
> >>> • The foreign escape sequences also have to be escaped.
> >>>
> >>> • And to a lesser degree, it's difficult to locate the closing
> >>> delimiter.
> >>>
> >>> Fortunately, we don't have many choices for dealing with escapes;
> >>>
> >>> • Backslash is Java's escape character.
> >>>
> >>> • Either escaping is on or is off (raw), so we need a way to flag
> >>> a string as being escaped. We could have an option to turn escaping on/off
> >>> within a string, but it has been hard to come up with examples where this
> >>> might be required.
> >>>
> >>> • Even with escaping off, we still might have to escape
> >>> delimiters. Repeated backslashes (or repeated delimiters) is the typical
> >>> out.
> >>>
> >>> How about trying \ as the flag for escapes off;
> >>>
> >>> String html = \"<html>
> >>> <body style="width: 100vw">
> >>> <p>Hello World.</p>
> >>> </body>
> >>> <script>console.log("\nloaded")</script>
> >>> </html>";
> >>>
> >>> That doesn't work because it looks like the string ends at the first
> >>> quote. Let's try symmetry, either \" or "\ as the closing delimiter. "\ is
> >>> preferable because then it doesn't look like an escape sequence (see Swift.)
> >>>
> >>> String html = \"<html>
> >>> <body style="width: 100vw">
> >>> <p>Hello World.</p>
> >>> </body>
> >>> <script>console.log("\nloaded")</script>
> >>> </html>"\;
>
>
More information about the amber-dev
mailing list