String reboot (plain text)

Remi Forax forax at univ-mlv.fr
Sun Feb 10 23:09:03 UTC 2019


About the formatting rules,
we can reuse the doc comment trick to use a character to specify the alignment.

           String html = \"
                          " <html>
                          "   <body style="width: 100vw">
                          "     <p>Hello World.</p>
                          "   </body>
                          "   <script>console.log("loaded")</script>
                          " </html>
                          "\;

i think it makes the code more readable if the nonce is several characters

           String html = \"""
                          " <html>
                          "   <body style="width: 100vw">
                          "     <p>Hello World.</p>
                          "   </body>
                          "   <script>console.log("\nloaded")</script>
                          " </html>
                          """\;

Rémi

----- Mail original -----
> De: "Jim Laskey" <james.laskey at oracle.com>
> À: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Envoyé: Dimanche 10 Février 2019 19:10:03
> Objet: Re: String reboot (plain text)

> Focus
> =====
> 
> Instead of ordering everything on the menu and immobilizing ourselves with
> excessive gluttony, let’s focus our attention on the appetizer. If we plan
> correctly, we'll have room for entrees and desserts later.
> 
> The appetizer here is simplifying the injection of "foreign" language code into
> Java source. Think tapas. We may well be sated by the time we’re done.
> 
> 
> Goal
> ====
> 
> Repurposing the Java String as a "foreign" code literal seems to be the most
> natural and least intrusive contrivance for Java support. In fact, this is
> already the case. Example;
> 
>          // <html>
>          //   <body style="width: 100vw">
>          //       <p>Hello World.</p>
>          //   </body>
>          //   <script>console.log("\nloaded")</script>
>          // </html>
> 
>          String html = "<html>\n" +
>                        "  <body style=\"width: 100vw\">\n" +
>                        "        <p>Hello World.</p>\n" +
>                        "  </body>\n" +
>                        "  <script>console.log(\"\\nloaded\")</script>\n" +
>                        "</html>\n";
> 
> The primary reason we are having the string literal discussion is that the
> existing form has a few issues;
> 
>	• The existing form is difficult to maintain without support from IDEs and is
>	prone to error. The introduction and subsequent editing of foreign code
>	requires additional delimiters, newlines, concatenations and escape sequences
>	(DNCE).
> 
>	• More to the point, the existing form is difficult to read. The additional DNCE
>	obscure the underlying content of the string.
> 
> Our aim is to come up with a DNCE lexicon that improves foreign code literal
> readability and maintainability without leaving developers in a confused state;
> with emphasis on reducing the E (escape sequences.)
> 
> 
> 50% solution
> ============
> 
> Where we keep running into trouble is that a choice for one part of the lexicon
> spreads into the the other parts. That is, use of certain characters in the
> delimiter affect which characters require escaping and which characters can be
> used for escaping.
> 
> So, let's pick off the lexicon easy bits first. Newlines, concatenations and
> in-between delimiters can be implicit if we just allow strings to span multiple
> lines (see Rust.)
> 
>          String html = "<html>
>                           <body style=\"width: 100vw\">
>                                 <p>Hello World.</p>
>                          </body>
>                          <script>console.log(\"\\nloaded\")</script>
>                         </html>";
> 
> That's not so bad. If we did nothing else, we still would be better off than we
> were before.
> 
> 
> 75% solution, almost
> ====================
> 
> What problems are left?
> 
>	• The foreign delimiters (quotes) have to be escaped.
> 
>	• The foreign escape sequences also have to be escaped.
> 
>	• And to a lesser degree, it's difficult to locate the closing delimiter.
> 
> Fortunately, we don't have many choices for dealing with escapes;
> 
>	• Backslash is Java's escape character.
> 
>	• Either escaping is on or is off (raw), so we need a way to flag a string as
>	being escaped. We could have an option to turn escaping on/off within a string,
>	but it has been hard to come up with examples where this might be required.
> 
>	• Even with escaping off, we still might have to escape delimiters. Repeated
>	backslashes (or repeated delimiters) is the typical out.
> 
> How about trying \ as the flag for escapes off;
> 
>          String html = \"<html>
>                            <body style="width: 100vw">
>                                  <p>Hello World.</p>
>                            </body>
>                            <script>console.log("\nloaded")</script>
>                          </html>";
> 
> That doesn't work because it looks like the string ends at the first quote.
> Let's try symmetry, either \" or "\ as the closing delimiter. "\ is preferable
> because then it doesn't look like an escape sequence (see Swift.)
> 
>          String html = \"<html>
>                            <body style="width: 100vw">
>                                  <p>Hello World.</p>
>                            </body>
>                            <script>console.log("\nloaded")</script>
>                          </html>"\;
> 
>	• The only new string rule added is to allow multi-line strings.
> 
>	• Adding backslash before and after the string indicates escaping off.
> 
> 
> But wait
> ========
> 
> This looks like the 75% solution;
> 
>	• Builds on our cred with existing strings.
> 
>	• Escape processing is orthogonal to multi-line.
> 
>	• Delimiter can easily be understood to mean “string with escapes."
> 
> But wait. "\nloaded" looks like it contains the end delimiter. Rats!!! Captain
> we need more sequences.
> 
> And, this is the crux of all the debate around strings. Fixed delimiters imply a
> requirement for escape sequences, otherwise there is content you cannot express
> as a string.
> 
> The inverse of this implication is that if you have escape sequences you don't
> need flexible delimiters. This can be reinterpreted as you only need flexible
> delimiters if you want to always avoid escape sequences.
> 
> Wasn't avoiding escape sequences the goal?
> 
> All this brings us to the central choice we have to make before we get into the
> rest of the meal. Do we go with fixed delimiter(s), structured delimiters or
> nonce delimiters.
> 
> 
> Fixed delimiter
> ===============
> 
> If we go with a fixed delimiter then we limit the content that can be expressed
> without escape sequences. This is not totally left field. There are floating
> point values we can not express in Java and types we can express but not
> denote, such as anonymous class types, intersection types or capture types.
> 
> Everything is a degree of tradeoff. And, those tradeoffs are okay as long as we
> are explicit about it.
> 
> We could get closer to the 85% mark if we had a way to have " in our content
> without escaping. Let's introduce a secondary delimiter, """.
> 
>          String html = """<html>
>                             <body style="width: 100vw">
>                                   <p>Hello World.</p>
>                             </body>
>                             <script>console.log("\\nloaded")</script>
>                           </html>""";
> 
> The introduction of """ would allow " with the only restriction that we can not
> use """ in the content without escaping. We could say that """ also means
> escaping off, but then we would have no way to escape """ (\"""). Keeping
> escaping as an orthogonal issue allows the best of both worlds.
> 
>          String html = \"""<html>
>                              <body style="width: 100vw">
>                                    <p>Hello World.</p>
>                              </body>
>                              <script>console.log("\nloaded")</script>
>                            </html>"""\;
> 
> Once you take away conflicts with the delimiter, most strings do not require
> escaping.
> 
> Also at this point we should note that other combinations of quotes ('''. ```,
> "'") don't bring anything new to the table; Tomato/Tomato, Potato/Potato.
> 
> Summary: All strings can be expressed with fixed plus escaping, but can not
> express strings containing the fixed delimiter (""") with escaping off.
> 
> Jumping ahead: I think that stating that traditional " strings must be
> single-line will be a popular restriction, even if it not needed. Then they
> will think of """ as meaning multi-line.
> 
> 
> Structured delimiter
> ====================
> 
> A structured delimiter contains a repeating pattern that can be expanded to suit
> a scenario. We attempted to introduce this notion with the original backtick
> proposal, but that proposal was withdrawn because a) didn't want to burn the
> backtick, b) developers weren't comfortable with infinitely repeating
> delimiters, and c) non-expressible anomalies such as content with leading or
> trailing backticks.
> 
> Using " instead of backtick addresses a).
> 
>          String html = """"""<html>
>                                <body style="width: 100vw">
>                                  <p>Hello World.</p>
>                                </body>
>                                <script>console.log("\nloaded")</script>
>                              </html>"""""";
> 
> For b) is there a limit where developers would be comfortable? That is, what
> about a range of fixed delimiters; ", """, """", """"", """""". This is
> slightly different than fixed delimiters in that it increases the combinations
> of content containing delimiters. Example, """"" could allow ", """, """", ...,
> Nx" for N != 5.
> 
> Structured delimiters also differ from fixed delimiters in the fact that there
> is pressure to have escaping off when N >= 3. You can always fall back to a
> single ".
> 
> Summary: Can express all strings with and without escaping. If the delimiter
> length is limited the there there is still a (smaller) set of strings that can
> not be expressed.
> 
> 
> Nonce delimiter
> ===============
> 
> A nonce or custom delimiter allows developers to include a unique character
> sequence in the delimiter. This provides a flexible delimiter without fear of
> going too far. There is also the advantage/distraction of providing commentary.
> 
>          String html = \HTML"<html>
>                              <body style="width: 100vw">
>                                    <p>Hello World.</p>
>                              </body>
>                              <script>console.log("\nloaded")</script>
>                            </html>"HTML\;
> 
> Summary: Can express all strings with and without escaping, but nonce can affect
> readability.
> 
> 
> Multi-line formatting
> =====================
> 
> I left this out of the main discussion, but I think we can all agree that
> formatting rules should separate the delimiters from the content. Other details
> can be refined after choice of delimiter(s).
> 
>          String html = \"""
>                          <html>
>                            <body style="width: 100vw">
>                              <p>Hello World.</p>
>                            </body>
>                            <script>console.log("\nloaded")</script>
>                          </html>
>                        """\;
> 
>          String html = """"""
>                          <html>
>                            <body style="width: 100vw">
>                              <p>Hello World.</p>
>                            </body>
>                            <script>console.log("\nloaded")</script>
>                          </html>
>                        """""";
> 
>          String html = \HTML"
>                          <html>
>                            <body style="width: 100vw">
>                              <p>Hello World.</p>
>                            </body>
>                            <script>console.log("\nloaded")</script>
>                          </html>
>                        "HTML/;
> 
> 
> Entrees and desserts
> ====================
> 
> If we make good choices now (stay away from the oysters) we can still move on to
> other courses later.
> 
> For instance; if we got up from the table with the ", """, \", \""" set of
> delimiters, we could still introduce structured delimiters in the future;
> either with repeated \ (see Swift) or repeated ". We could also follow a
> suggestion John made to use a pseudo nonce like \5" for \\\\\" or \""""".
> 
> Point being, we can work with a 85% solution now that we can supplement later
> when we're not so hangry.
> 
> 
> 
> 
> 
> 
>> On Feb 10, 2019, at 12:30 PM, James Laskey <james.laskey at oracle.com> wrote:
>> 
>> I should know better than format e-mails. Many a backslash eaten. The summary
>> should be;
>> 
>>>> For instance; if we got up from the table with the ", """, \", \""" set of
>>>> delimiters, we could still introduce structured delimiters in the future;
>>>> either with repeated  \ (see Swift) or repeated ". We could also follow a
>>>> suggestion John made to use a pseudo nonce like \5" for \\\\\" or \""""".
>>>> 
>> 
>> 
>> 
>> Sent from my iPhone
>> 
>> On Feb 10, 2019, at 11:43 AM, Jim Laskey <james.laskey at oracle.com> wrote:
>> 
>>> 
>>>> 
>>>> 
>>>> Focus
>>>> 
>>>> Instead of ordering everything on the menu and immobilizing ourselves with
>>>> excessive gluttony, let’s focus our attention on the appetizer. If we plan
>>>> correctly, we'll have room for entrees and desserts later.
>>>> 
>>>> The appetizer here is simplifying the injection of "foreign" language code into
>>>> Java source. Think tapas. We may well be sated by the time we’re done.
>>>> 
>>>> Goal
>>>> 
>>>> Repurposing the Java String as a "foreign" code literal seems to be the most
>>>> natural and least intrusive contrivance for Java support. In fact, this is
>>>> already the case. Example;
>>>> 
>>>>           // <html>
>>>>           //   <body style="width: 100vw">
>>>>           //       <p>Hello World.</p>
>>>>           //   </body>
>>>>           //   <script>console.log("\nloaded")</script>
>>>>           // </html>
>>>> 
>>>>           String html = "<html>\n" +
>>>>                         "  <body style=\"width: 100vw\">\n" +
>>>>                         "        <p>Hello World.</p>\n" +
>>>>                         "  </body>\n" +
>>>>                         "  <script>console.log(\"\\nloaded\")</script>\n" +
>>>>                         "</html>\n";
>>>> 
>>>> The primary reason we are having the string literal discussion is that the
>>>> existing form has a few issues;
>>>> 
>>>> 	• The existing form is difficult to maintain without support from IDEs and is
>>>> 	prone to error. The introduction and subsequent editing of foreign code
>>>> 	requires additional delimiters, newlines, concatenations and escape sequences
>>>> 	(DNCE).
>>>> 
>>>> 	• More to the point, the existing form is difficult to read. The additional DNCE
>>>> 	obscure the underlying content of the string.
>>>> 
>>>> Our aim is to come up with a DNCE lexicon that improves foreign code literal
>>>> readability and maintainability without leaving developers in a confused state;
>>>> with emphasis on reducing the E (escape sequences.)
>>>> 
>>>> 50% solution
>>>> 
>>>> Where we keep running into trouble is that a choice for one part of the lexicon
>>>> spreads into the the other parts. That is, use of certain characters in the
>>>> delimiter affect which characters require escaping and which characters can be
>>>> used for escaping.
>>>> 
>>>> So, let's pick off the lexicon easy bits first. Newlines, concatenations and
>>>> in-between delimiters can be implicit if we just allow strings to span multiple
>>>> lines (see Rust.)
>>>> 
>>>>           String html = "<html>
>>>>                            <body style=\"width: 100vw\">
>>>>                                  <p>Hello World.</p>
>>>>                           </body>
>>>>                           <script>console.log(\"\\nloaded\")</script>
>>>>                          </html>";
>>>> 
>>>> That's not so bad. If we did nothing else, we still would be better off than we
>>>> were before.
>>>> 
>>>> 75% solution, almost
>>>> 
>>>> What problems are left?
>>>> 
>>>> 	• The foreign delimiters (quotes) have to be escaped.
>>>> 
>>>> 	• The foreign escape sequences also have to be escaped.
>>>> 
>>>> 	• And to a lesser degree, it's difficult to locate the closing delimiter.
>>>> 
>>>> Fortunately, we don't have many choices for dealing with escapes;
>>>> 
>>>> 	• Backslash is Java's escape character.
>>>> 
>>>> 	• Either escaping is on or is off (raw), so we need a way to flag a string as
>>>> 	being escaped. We could have an option to turn escaping on/off within a string,
>>>> 	but it has been hard to come up with examples where this might be required.
>>>> 
>>>> 	• Even with escaping off, we still might have to escape delimiters. Repeated
>>>> 	backslashes (or repeated delimiters) is the typical out.
>>>> 
>>>> How about trying  as the flag for escapes off;
>>>> 
>>>>           String html = \"<html>
>>>>                             <body style="width: 100vw">
>>>>                                   <p>Hello World.</p>
>>>>                             </body>
>>>>                             <script>console.log("\nloaded")</script>
>>>>                           </html>";
>>>> 
>>>> That doesn't work because it looks like the string ends at the first quote.
>>>> Let's try symmetry, either " or " as the closing delimiter. " is preferable
>>>> because then it doesn't look like an escape sequence (see Swift.)
>>>> 
>>>>           String html = \"<html>
>>>>                             <body style="width: 100vw">
>>>>                                   <p>Hello World.</p>
>>>>                             </body>
>>>>                             <script>console.log("\nloaded")</script>
>>>>                           </html>"\;
>>>> 
>>>> 	• The only new string rule added is to allow multi-line strings.
>>>> 
>>>> 	• Adding backslash before and after the string indicates escaping off.
>>>> 
>>>> But wait
>>>> 
>>>> This looks like the 75% solution;
>>>> 
>>>> 	• Builds on our cred with existing strings.
>>>> 
>>>> 	• Escape processing is orthogonal to multi-line.
>>>> 
>>>> 	• Delimiter can easily be understood to mean “string with escapes."
>>>> 
>>>> But wait. "" looks like it contains the end delimiter. Rats!!! Captain we need
>>>> more sequences.
>>>> 
>>>> And, this is the crux of all the debate around strings. Fixed delimiters imply a
>>>> requirement for escape sequences, otherwise there is content you cannot express
>>>> as a string.
>>>> 
>>>> The inverse of this implication is that if you have escape sequences you don't
>>>> need flexible delimiters. This can be reinterpreted as you only need flexible
>>>> delimiters if you want to always avoid escape sequences.
>>>> 
>>>> Wasn't avoiding escape sequences the goal?
>>>> 
>>>> All this brings us to the central choice we have to make before we get into the
>>>> rest of the meal. Do we go with fixed delimiter(s), structured delimiters or
>>>> nonce delimiters.
>>>> 
>>>> Fixed delimiter
>>>> 
>>>> If we go with a fixed delimiter then we limit the content that can be expressed
>>>> without escape sequences. This is not totally left field. There are floating
>>>> point values we can not express in Java and types we can express but not
>>>> denote, such as anonymous class types, intersection types or capture types.
>>>> 
>>>> Everything is a degree of tradeoff. And, those tradeoffs are okay as long as we
>>>> are explicit about it.
>>>> 
>>>> We could get closer to the 85% mark if we had a way to have " in our content
>>>> without escaping. Let's introduce a secondary delimiter, """.
>>>> 
>>>>           String html = """<html>
>>>>                              <body style="width: 100vw">
>>>>                                    <p>Hello World.</p>
>>>>                              </body>
>>>>                              <script>console.log("\\nloaded")</script>
>>>>                            </html>""";
>>>> 
>>>> The introduction of """ would allow " with the only restriction that we can not
>>>> use """ in the content without escaping. We could say that """ also means
>>>> escaping off, but then we would have no way to escape """ (\"""). Keeping
>>>> escaping as an orthogonal issue allows the best of both worlds.
>>>> 
>>>>           String html = \"""<html>
>>>>                               <body style="width: 100vw">
>>>>                                     <p>Hello World.</p>
>>>>                               </body>
>>>>                               <script>console.log("\nloaded")</script>
>>>>                             </html>"""\;
>>>> 
>>>> Once you take away conflicts with the delimiter, most strings do not require
>>>> escaping.
>>>> 
>>>> Also at this point we should note that other combinations of quotes ('''. ```,
>>>> "'") don't bring anything new to the table; Tomato/Tomato, Potato/Potato.
>>>> 
>>>> Summary: All strings can be expressed with fixed plus escaping, but can not
>>>> express strings containing the fixed delimiter (""") with escaping off.
>>>> 
>>>> Jumping ahead: I think that stating that traditional " strings must be
>>>> single-line will be a popular restriction, even if it not needed. Then they
>>>> will think of """ as meaning multi-line.
>>>> 
>>>> Structured delimiter
>>>> 
>>>> A structured delimiter contains a repeating pattern that can be expanded to suit
>>>> a scenario. We attempted to introduce this notion with the original backtick
>>>> proposal, but that proposal was withdrawn because a) didn't want to burn the
>>>> backtick, b) developers weren't comfortable with infinitely repeating
>>>> delimiters, and c) non-expressible anomalies such as content with leading or
>>>> trailing backticks.
>>>> 
>>>> Using " instead of backtick addresses a).
>>>> 
>>>>           String html = """"""<html>
>>>>                                 <body style="width: 100vw">
>>>>                                   <p>Hello World.</p>
>>>>                                 </body>
>>>>                                 <script>console.log("\nloaded")</script>
>>>>                               </html>"""""";
>>>> 
>>>> For b) is there a limit where developers would be comfortable? That is, what
>>>> about a range of fixed delimiters; ", """, """", """"", """""". This is
>>>> slightly different than fixed delimiters in that it increases the combinations
>>>> of content containing delimiters. Example, """"" could allow ", """, """", ...,
>>>> Nx" for N != 5.
>>>> 
>>>> Structured delimiters also differ from fixed delimiters in the fact that there
>>>> is pressure to have escaping off when N >= 3. You can always fall back to a
>>>> single ".
>>>> 
>>>> Summary: Can express all strings with and without escaping. If the delimiter
>>>> length is limited the there there is still a (smaller) set of strings that can
>>>> not be expressed.
>>>> 
>>>> Nonce delimiter
>>>> 
>>>> A nonce or custom delimiter allows developers to include a unique character
>>>> sequence in the delimiter. This provides a flexible delimiter without fear of
>>>> going too far. There is also the advantage/distraction of providing commentary.
>>>> 
>>>>           String html = \HTML"<html>
>>>>                               <body style="width: 100vw">
>>>>                                     <p>Hello World.</p>
>>>>                               </body>
>>>>                               <script>console.log("\nloaded")</script>
>>>>                             </html>"HTML\;
>>>> 
>>>> Summary: Can express all strings with and without escaping, but nonce can affect
>>>> readability.
>>>> 
>>>> Multi-line formatting
>>>> 
>>>> I left this out of the main discussion, but I think we can all agree that
>>>> formatting rules should separate the delimiters from the content. Other details
>>>> can be refined after choice of delimiter(s).
>>>> 
>>>>           String html = \"""
>>>>                           <html>
>>>>                             <body style="width: 100vw">
>>>>                               <p>Hello World.</p>
>>>>                             </body>
>>>>                             <script>console.log("\nloaded")</script>
>>>>                           </html>
>>>>                         """\;
>>>> 
>>>>           String html = """"""
>>>>                           <html>
>>>>                             <body style="width: 100vw">
>>>>                               <p>Hello World.</p>
>>>>                             </body>
>>>>                             <script>console.log("\nloaded")</script>
>>>>                           </html>
>>>>                         """""";
>>>> 
>>>>           String html = \HTML"
>>>>                           <html>
>>>>                             <body style="width: 100vw">
>>>>                               <p>Hello World.</p>
>>>>                             </body>
>>>>                             <script>console.log("\nloaded")</script>
>>>>                           </html>
>>>>                         "HTML/;
>>>> 
>>>> Entrees and desserts
>>>> 
>>>> If we make good choices now (stay away from the oysters) we can still move on to
>>>> other courses later.
>>>> 
>>>> For instance; if we got up from the table with the ", """, ", """ set of
>>>> delimiters, we could still introduce structured delimiters in the future;
>>>> either with repeated  (see Swift) or repeated ". We could also follow a
>>>> suggestion John made to use a pseudo nonce like " for \\" or """"".
>>>> 
>>>> Point being, we can work with a 85% solution now that we can supplement later
>>>> when we're not so hangry.


More information about the amber-spec-experts mailing list