Escape Sequences For Managing Whitespace (Preview)
John Rose
john.r.rose at oracle.com
Wed Aug 14 20:55:38 UTC 2019
On Aug 13, 2019, at 10:29 AM, John Rose <john.r.rose at oracle.com> wrote:
>
> So I suggest deleting all of <LWS* \ LT LWS*> where LWS is either space or tab, unescaped of course.
>
Just to be clear about my suggestion for stripping additional incidental whitespace before
and/or after <\ LT>, here’s an example of using a rule that expands the stripping:
String story = """
    "When I use a word," Humpty Dumpty said,         \
     in rather a scornful tone, "it means just what I \
     choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you      \
     can make words mean so many different things."
    "The question is," said Humpty Dumpty,           \
     "which is to be master - that's all."
    """;
Here, all the spaces *before* the backslashes are treated as incidental. They are useful because
they let the programmer lay out the backslashes separately from the content. The dots show
which spaces are incidental (under both the existing rules and my suggestion):
String story = """
...."When I use a word," Humpty Dumpty said,.........\
.... in rather a scornful tone, "it means just what I.\
.... choose it to mean - neither more nor less."
...."The question is," said Alice, "whether you......\
.... can make words mean so many different things."
...."The question is," said Humpty Dumpty,...........\
.... "which is to be master - that's all."
....""";
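The part of the rule that strips whitespace *before* <\ LT> can be approximated on an ordinary Java string. This is only a sketch under stated assumptions: the class and method names (`StripBeforeSketch`, `stripBeforeContinuation`) are hypothetical, LWS is taken to mean space or tab, and the input is assumed to be the text block body after the usual indentation stripping but before escape processing.

```java
import java.util.regex.Pattern;

class StripBeforeSketch {
    // Assumed reading of <LWS* \ LT>: any run of spaces or tabs,
    // then a literal backslash, then one line terminator.
    private static final Pattern BEFORE_CONTINUATION =
        Pattern.compile("[ \\t]*\\\\\\R");

    // Removes each match, joining the line with the one that follows it.
    static String stripBeforeContinuation(String body) {
        return BEFORE_CONTINUATION.matcher(body).replaceAll("");
    }

    public static void main(String[] args) {
        // One sentence from the example, with the padding before the backslash
        // and the single significant space that starts the continuation line.
        String body =
            "\"The question is,\" said Alice, \"whether you      \\\n"
            + " can make words mean so many different things.\"\n";
        System.out.print(stripBeforeContinuation(body));
    }
}
```

The leading space on the continuation line survives because this sketch only touches whitespace before the backslash.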
Here is the same content but without the extra incidental spaces, which conforms to the current
proposal by Jim:
String story = """
"When I use a word," Humpty Dumpty said, \
in rather a scornful tone, "it means just what I \
choose it to mean - neither more nor less."
"The question is," said Alice, "whether you \
can make words mean so many different things."
"The question is," said Humpty Dumpty, \
"which is to be master - that's all."
""";
Equivalently, and a bit more legibly, and also within the bounds of Jim's proposal:
String story = """
"When I use a word," Humpty Dumpty said,\
 in rather a scornful tone, "it means just what I\
 choose it to mean - neither more nor less."
"The question is," said Alice, "whether you\
 can make words mean so many different things."
"The question is," said Humpty Dumpty,\
 "which is to be master - that's all."
""";
The intended content is, of course:
String story = """
"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean - neither more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
"The question is," said Humpty Dumpty, "which is to be master - that's all."
""";
Incidental space could be stripped after <\ LT> as well as before. I think that aligning
the backslashes is the main benefit here, so stripping *before* <\ LT> is more important than after.
But stripping incidentals after would allow the programmer to lay out continuation lines at a different
left margin, which can also help readability, although the <\ s> sequences then needed to keep the
separating space undo the benefit:
String story = """
"When I use a word," Humpty Dumpty said, \
\sin rather a scornful tone, "it means just what I \
\schoose it to mean - neither more nor less."
"The question is," said Alice, "whether you \
\scan make words mean so many different things."
"The question is," said Humpty Dumpty, \
\s"which is to be master - that's all."
""";
So I’ll amend my suggestion to something a little trickier: <LWS* \ LT> is stripped
as incidental whitespace, but in the case of <LWS* \ LT LWS* LWS>, the final character
of linear whitespace (aka horizontal whitespace) is significant content, not incidental.
This allows the obscuring <\ s> to be removed in the example:
String story = """
    "When I use a word," Humpty Dumpty said,         \
         in rather a scornful tone, "it means just what I \
         choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you      \
         can make words mean so many different things."
    "The question is," said Humpty Dumpty,           \
         "which is to be master - that's all."
    """;
Here is the extra incidental space marked as dots:
String story = """
...."When I use a word," Humpty Dumpty said,.........\
........ in rather a scornful tone, "it means just what I.\
........ choose it to mean - neither more nor less."
...."The question is," said Alice, "whether you......\
........ can make words mean so many different things."
...."The question is," said Humpty Dumpty,...........\
........ "which is to be master - that's all."
....""";
The suggested rule allows the author to adjust the broken line endings for maximum readability,
and the elision of the incidental space flows with them.
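A minimal sketch of the amended rule, again simulated on an ordinary Java string rather than in a compiler. The names `AmendedRuleSketch` and `joinContinuations` are hypothetical, LWS is assumed to be space or tab, and the input is assumed to be the text block body before escape processing. The lookahead is one possible way to express "strip all the whitespace after <\ LT> except its final character".

```java
import java.util.regex.Pattern;

class AmendedRuleSketch {
    // [ \t]*          whitespace before the backslash (incidental)
    // \\ \R           the backslash and the line terminator
    // (?:[ \t]*(?=[ \t]))?
    //                 whitespace after the terminator, stopping one character
    //                 short so that the last space or tab stays as content
    private static final Pattern CONTINUATION =
        Pattern.compile("[ \\t]*\\\\\\R(?:[ \\t]*(?=[ \\t]))?");

    static String joinContinuations(String body) {
        return CONTINUATION.matcher(body).replaceAll("");
    }

    public static void main(String[] args) {
        String body =
            "\"The question is,\" said Humpty Dumpty,           \\\n"
            + "     \"which is to be master - that's all.\"\n";
        System.out.print(joinContinuations(body));
    }
}
```

When the continuation line starts with no whitespace at all, the optional group matches nothing and the two lines are joined with no separator.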
Meanwhile, if the content has no spaces (perhaps it’s some kind of binary resource) the user
cannot indent the continuation lines, but can still space out the backslash into a proudly separate
position from the content:
String hexData = """
000102030405060708090a0b0c0d0e0f101112131415 \
161718191a1b1c1d1e1f202122232425262728292a2b2c \
2d2e2f303132333435363738393a3b3c3d3e3f40 \
4142434445464748494a4b4c4d4e4f505152535455 \
""";
assert !hexData.contains(" ");
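The assertion can be checked against a simulation of the rule. This sketch assumes the same reading as above (spaces or tabs before a backslash and line terminator are dropped along with them); `HexContinuationSketch` and `joinContinuations` are hypothetical names, and the string literal stands in for the text block body.

```java
class HexContinuationSketch {
    // Drops <LWS* \ LT>: padding, backslash, and line terminator.
    static String joinContinuations(String body) {
        return body.replaceAll("[ \\t]*\\\\\\R", "");
    }

    public static void main(String[] args) {
        String body =
            "000102030405060708090a0b0c0d0e0f101112131415 \\\n"
            + "161718191a1b1c1d1e1f202122232425262728292a2b2c \\\n"
            + "2d2e2f303132333435363738393a3b3c3d3e3f40 \\\n"
            + "4142434445464748494a4b4c4d4e4f505152535455 \\\n";
        String hexData = joinContinuations(body);
        // Neither the padding nor the line breaks should remain.
        System.out.println(hexData.contains(" "));
        System.out.println(hexData.contains("\n"));
    }
}
```

Because the last line also ends in a backslash, the joined value has no trailing newline either.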
The basic motivation should be clear: Give users tools to lay out delimiters clearly separate
from content. Whether it’s worth adding the extra twist to the rule about <\ LT> I will leave for
others to decide.
HTH.
— John