Wrapping up the first two courses
Brian Goetz
brian.goetz at oracle.com
Mon Apr 22 13:15:19 UTC 2019
> The main thing Brian is waiting for, though, is not lots of new ideas,
> but rather a consensus that (a) we can treat leading whitespace outside
> of a given rectangle as syntax-not-payload (thus stripped), and (b) that
> we should provide a way for programmers to opt out of the stripping
> (making all space into syntax-and-payload). It feels to me like we
> have arrived there and are driving around the parking lot, checking
> out all the parking spots, worrying that we will miss the best one.
Glad to hear it :)
So, I posit, we have consensus over the following things:
- Multi-line strings are a useful feature on their own
- Using “fat” delimiters for multi-line strings is practical and intuitive
- Multi-line string literals share the same escape language as single-line string literals
- Newlines in MLSLs should be normalized to \n
- There exists a reasonable alignment algorithm, which users can learn easily enough, and can be captured as a library method on String (some finer points to be hammered out)
- To the extent the language performs alignment, it should be consistent with what the library-based version does, so that users can opt out and opt back in again
- In the common case, a MLSL will be a combination of some intended and some incidental indentation, and it is reasonable for the default to be that the language attempts to normalize away the incidental indentation
- There needs to be an opt-out, for the cases where alignment is not the default the user wants
(A useful way to frame the discussion we had regarding linguistic alignment is: whether a string literal is “one dimensional” or “two dimensional.” The 1D interpretation says a string literal is just a sequence of characters between two delimiters; the 2D interpretation says that it has an inherent line structure that could be manipulated directly.)
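To make the "2D" reading concrete, here is a minimal sketch of what stripping incidental indentation could look like as a library-style helper. It is not the algorithm under discussion (those details are still to be written up); the class name `AlignSketch` and the method `stripIncidental` are hypothetical, and the rules it applies (normalize line terminators to \n, ignore blank lines when measuring, remove the smallest leading run of spaces/tabs from every non-blank line) are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the alignment algorithm from this thread.
class AlignSketch {

    static String stripIncidental(String s) {
        // Normalize \r\n and bare \r line terminators to \n.
        String normalized = s.replace("\r\n", "\n").replace('\r', '\n');
        String[] lines = normalized.split("\n", -1);

        // Find the smallest leading-whitespace count over non-blank lines.
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines do not influence the margin
            }
            int indent = 0;
            while (indent < line.length()
                    && (line.charAt(indent) == ' ' || line.charAt(indent) == '\t')) {
                indent++;
            }
            min = Math.min(min, indent);
        }
        if (min == Integer.MAX_VALUE) {
            min = 0; // every line was blank
        }

        // Remove that shared margin; blank lines become empty lines.
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.trim().isEmpty() ? "" : line.substring(min));
        }
        return String.join("\n", out);
    }
}
```

Under these assumptions, the indentation shared by all lines is treated as incidental and removed, while any indentation beyond that margin is treated as intended and kept.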
What I like about this proposal — much more than with the previous round — is that the two flavors of string literal (thin and fat) are clearly projections of the same feature, and their differences pertain solely to their essential difference — multi-line-ness.
I will leave it to Jim to summarize the current state of the alignment algorithm, and any open questions (e.g., closing delimiter influence, treatment of single-line strings, etc) that may still be lingering, but these are not blockers to placing our order for the first two courses.
I am still having a hard time getting comfortable with Guy’s proposal to use more “envelope” here — I think others have expressed similar discomfort. If I had to put my finger on it, it is that being able to cut and paste in and out is such a big part of what is currently missing, and there is insufficient trust that there would be ubiquitous IDE support in all the various ways that people edit Java code. But given that this is framed as “let’s carve out some extra envelope space”, we can keep discussing this even as we move forward.
We still need to make some decisions on syntax; the main one that is currently relevant being opt-out. (For any syntax issues, please create another thread.) Jim hinted at this earlier: use an escape sequence that is stripped out of the string but means “no alignment.” Something like:
String s = """\-
Leave me just the way
you found me"""
Obviously there is room to argue over the specific escape sequence, so let’s put this in the “open questions” bucket.
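As a rough sketch of how such an opt-out could interact with a library-based alignment method: the marker is removed from the string and the remaining characters are left untouched, and a user who opted out can still opt back in by calling the library method explicitly. Everything here is an assumption for illustration: the `\-` marker is only the "something like" spelling above, `OptOutSketch`, `interpret`, and `stripCommonIndent` are hypothetical names, and the indentation rule is a simplified stand-in for whatever the real algorithm turns out to be.

```java
// Hypothetical illustration only; the marker and method names are assumptions.
class OptOutSketch {

    // The two characters backslash and hyphen, as they would appear in the literal body.
    static final String OPT_OUT = "\\-";

    // What the language might do with the body of a multi-line literal.
    static String interpret(String body) {
        if (body.startsWith(OPT_OUT)) {
            // Opt-out: drop the marker, keep every other character as written.
            return body.substring(OPT_OUT.length());
        }
        // Default: apply the same alignment the library method performs.
        return stripCommonIndent(body);
    }

    // Simplified stand-in for the library method on String.
    static String stripCommonIndent(String s) {
        String[] lines = s.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines do not influence the margin
            }
            int indent = 0;
            while (indent < line.length()
                    && (line.charAt(indent) == ' ' || line.charAt(indent) == '\t')) {
                indent++;
            }
            min = Math.min(min, indent);
        }
        if (min == Integer.MAX_VALUE) {
            min = 0; // every line was blank
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            if (!lines[i].trim().isEmpty()) {
                out.append(lines[i].substring(min));
            }
        }
        return out.toString();
    }
}
```

In this sketch, opting out and then calling `stripCommonIndent` by hand gives the same result as the default path, which is the consistency property listed in the consensus points.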
There was another proposal, which was to use a prefix character:
String s = a"…" // opt into alignment
String s = r"…" // raw string
I’d like to put this one to bed quickly, because I see it as having a number of issues.
Having a set of prefix characters is one of those features that starts off weak and scales badly from there :). With only two prefixes, as suggested above, it has a feel of overgeneralization, but with a large number of candidate prefixes, it gets worse, because invariably as such a feature gets more complicated, there are interactions. One need look only at a Perl regex that uses multiple modifiers:
/foo*/egimosx
to realize that what started as a simple feature (I think initially just `g`) had grown out of control.
More importantly, of the two prefixes suggested, one doesn’t really make sense. And that is: while the notion of “raw” string is attractive, one of the things that tripped us up the first time around is the belief that “raw” is a binary thing. In reality, raw-ness comes in degrees: how hard you have to work to break out of the “string of uninterpreted characters” mode. (Note: please let’s not start a discussion on raw strings; we’re wrapping up our orders for the first courses now. I raise this only to put to bed a syntax choice predicated on the assumption that raw-ness is a binary characteristic.)
If we’re pursuing align-by-default, we should consider a different name for the align() method; the name was originally chosen as a compromise when there was no align-by-default, and most of the other names were too long to ask people to type routinely. If alignment is the default, the explicit name can be more descriptive.
So, next steps:
- Jim to write up current details of alignment algorithm, with current open issues;
- Remaining bike sheds on opt-out and naming of align()
Once 1/1a are in the pipe, we can consider whether we want to move ahead to raw strings.