String reboot - (1a) incidental whitespace
Stephen Colebourne
scolebourne at joda.org
Sat Apr 13 21:54:47 UTC 2019
This is an impressively long email to respond to. My TLDR is that
aligned strings are good and perfectly suitable to be in the language,
but I have a subtly harsher design centre for the feature, which is
fine as I also believe they *must* be opt-in.
On Wed, 10 Apr 2019 at 16:25, Jim Laskey <james.laskey at oracle.com> wrote:
> Line terminators & Whitespace
I agree with EOL normalization - no one wants to keep Windows line
endings. But I think the aligned strings feature should go a little
bit further. There is a unix principle that a text file consists of a
number of lines, each ending in \n, and that any other format is
binary. My POV is that alignment should only produce results that
match that principle. I think the benefits of this slightly harsher
definition are in the simplicity of the rules. It also means that
certain of the example cases become errors, which is absolutely fine
as the feature is opt-in.
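To make the unix principle concrete, here is a minimal Java sketch. The class and method names are hypothetical and not taken from any proposal. It normalizes line endings, then treats a string as "text" only if it is empty (zero lines) or ends with \n.

```java
/**
 * Hypothetical helpers illustrating the "unix text" principle:
 * a text is a sequence of lines, each terminated by '\n'.
 */
public final class UnixText {

    /** Converts Windows (\r\n) and old Mac (\r) line endings to \n. */
    public static String normalizeLineEndings(String s) {
        // Replace \r\n first so that it does not become two newlines.
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    /**
     * True if the (already normalized) string consists only of complete
     * lines: it is either empty (zero lines) or ends with '\n'.
     */
    public static boolean isUnixText(String s) {
        return s.isEmpty() || s.endsWith("\n");
    }

    public static void main(String[] args) {
        String windows = "line one\r\nline two\r\n";
        String normalized = normalizeLineEndings(windows);
        System.out.println(normalized.equals("line one\nline two\n")); // true
        System.out.println(isUnixText(normalized));                    // true
        System.out.println(isUnixText("no trailing newline"));         // false
    }
}
```

Under the stricter rules argued for here, every aligned string would satisfy `isUnixText`, while strings failing it would be rejected at compile time.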
I don't believe that all multi-line strings should be aligned as there
are highly likely to be cases where aligning is problematic, or the
rules are close but don't quite match. Opting in allows users to treat
normal multi-line strings as being completely without surprises, while
keeping the aligned strings available as confidence grows.
My rules for an opt-in aligned multi-line string:
- opening delimiter must be followed by newline (error if not)
- if the closing delimiter is on a line with non-whitespace content, a
newline is added to complete the line
- if the closing delimiter is at the end of a whitespace-only line,
that line is not part of the result
- thus, it makes no difference whether the closing delimiter is at the
end of the last content line or alone on the following line
- content is left justified while preserving relative indentation
- the closing delimiter has no effect on the left justification
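A minimal Java sketch of the rules above, under several assumptions: `AlignedString.align` is a hypothetical name; its input is the raw text between the delimiters; a thrown exception stands in for the compile error; blank lines are ignored when computing the common indentation; each leading space or tab counts as one column; and lines are right-stripped (which is only tentatively endorsed below).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the opt-in alignment rules. The input is the raw
 * text between the opening and closing delimiters. An exception stands in
 * for the compile error a real compiler would report.
 */
public final class AlignedString {

    public static String align(String raw) {
        // Normalize line endings first: \r\n and lone \r both become \n.
        String s = raw.replace("\r\n", "\n").replace('\r', '\n');

        // Rule: the opening delimiter must be followed by a newline. This
        // also rejects single-line strings.
        if (!s.startsWith("\n")) {
            throw new IllegalArgumentException("text after opening delimiter");
        }

        // Split the remaining text into lines; limit -1 keeps trailing
        // empty strings, so the closing delimiter's line is always present.
        List<String> lines = new ArrayList<>(Arrays.asList(s.substring(1).split("\n", -1)));

        // Rule: if the closing delimiter ends a whitespace-only line, that
        // line is dropped. If it ends a content line, the line is kept and
        // gets its newline in the loop below.
        if (lines.get(lines.size() - 1).isBlank()) {
            lines.remove(lines.size() - 1);
        }

        // Rule: left-justify while preserving relative indentation. The
        // closing delimiter line is already gone, so it has no effect.
        // Assumption: blank lines do not count towards the common indent.
        int minIndent = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isBlank()) {
                minIndent = Math.min(minIndent, leadingWhitespace(line));
            }
        }

        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            // Assumption: lines are right-stripped; blank lines become empty.
            String text = line.isBlank() ? "" : line.substring(minIndent).stripTrailing();
            // Every line, including the last, is terminated by '\n'.
            out.append(text).append('\n');
        }
        return out.toString();
    }

    /** Counts leading spaces and tabs; each counts as one column (simplification). */
    private static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Closing delimiter alone on the line after the content...
        String ownLine = "\n    +--------+\n    | text |\n    +--------+\n";
        // ...or directly after the last content line.
        String contentLine = "\n    +--------+\n    | text |\n    +--------+";
        String expected = "+--------+\n| text |\n+--------+\n";
        System.out.println(align(ownLine).equals(expected));     // true
        System.out.println(align(contentLine).equals(expected)); // true
    }
}
```

Both placements of the closing delimiter yield the same three newline-terminated lines, and leading and trailing blank lines inside the content are retained. There are deliberately no knobs: the only inputs are the raw text and the fixed rules.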
> Examples
Of the examples, a, b, c, d, e and g are identical. The position of the
closing delimiter has no effect on the content - it always ends in a
newline (as per the unix principle). Note in particular that g still
ends with a newline with my rules.
Examples h and i have text after the opening delimiter. With my rules,
these would be a compile error (since alignment is opt-in, this would
never be a real problem).
Example f has two leading blank lines and two trailing ones. These are
retained. The only subtlety with the closing delimiter is whether it
has non-whitespace text on the same line or not - if it has content, a
newline is added; otherwise the line is ignored. In this case, there is
no content on the final line, so the line of the closing delimiter is
ignored.
Example j has text after the opening delimiter in the second
multi-line string, so it would be a compile error. The first
multi-line string would also end in a newline, so the output would
fail to match the user's intentions. As has sort of been concluded a
number of times, multi-line strings do not mix well with string
concatenation. Example k shows the correct way to handle situations
like this.
> As we can see, there were a lot of cases where the user _probably_ wanted one thing, but _might have_ wanted another. What control knobs do we have, that we could assign meaning to, that would let the user choose either way?
No control knobs for the user. The alignment is opt-in and should have
a fixed set of rules guaranteed (with compile errors) to produce a
unix text file (all lines ending with a newline).
> - Do we allow content on the same lines as the delimiters?
> - Should we always add a final newline?
No at the start; yes at the end, with a newline added.
> - Should we strip blanks lines? Only on the first and last? All leading and trailing?
No
> - How do we interpret auto-alignment on single-line strings? Strip?
Compile error
> - Should we right strip lines?
Probably, yes.
> And some syntax choices (not to be discussed now):
> - How do we indicate opt-out?
Opt-in. Whatever is chosen, the rules will have edges. The simple """
delimiter must not be aligned; there are simply too many pitfalls to
that approach. I've mentioned my preferred syntax for orthogonally
opting-in to multiple features before, so won't repeat it here now.
> Examples narrative.
> From this, we recommend that multi-line string fat delimiters should follow the brace pattern used in array initialization, lambdas and other Java constructs. The open delimiter should end the current line. Content follows on separate lines, indented one level. The close delimiter starts a new line, back indented one level, followed by the continuation of enclosing expression.
> int[] ia = new int[] {
>     1,
>     2,
>     3
> };
In my experience it is also quite common for arrays to be formatted
with the closing brace on the same line as the last element:
int[] ia = new int[] {
    1,
    2,
    3};
The two ways to format the arrays are equivalent, and I think the same
should be true for aligned multi-line strings - they should be
absolutely equivalent:
String d = """
    +--------+
    | text |
    +--------+
""";
String d = """
    +--------+
    | text |
    +--------+""";
Both must produce the same three lines:
+--------+\n
| text |\n
+--------+\n
> Note that fat delimiters can be used on single lines. What are the semantics for auto-alignment in that case? The question of stripping whitespace and newlines is not really about alignment. It's about what are the rules for handling incidental characters in a fat delimiter string.
Alignment is a multi-line problem, so single line alignment is a
compile error. This is no problem when the feature is opt-in. (And
stripping whitespace from the ends of a single-line literal is a daft
use case anyway ;-)
> So the question here is: should it be possible to specify "extra" indentation through the positioning of quotes
No.
Finally, all of the above outlines what my view of opt-in aligned
multi-line strings is, but it doesn't discuss the default/normal
multi-line string case. IMO that is simply a multi-line variant of the
existing string literal, with the only extra rule being the
normalization of line endings; that is, everything is preserved exactly
as entered, apart from escapes and newlines.
thanks
Stephen