String literals: some principles

Guy Steele guy.steele at oracle.com
Mon Apr 29 15:48:05 UTC 2019


> On Apr 28, 2019, at 4:32 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> . . .
> Looking ahead to the next round, we can build on this.  In the first round, we mistakenly thought that there was something that could reasonably be called a “raw” string, but this notion is a fantasy; no string literal is so raw that it can’t recognize its closing delimiter.  So “rawness” is really only a matter of degree.  

This is _almost_ true.  If a string is truly raw (that is, it can contain _anything_), then one absolutely cannot depend on recognizing the closing delimiter by examining what might be the raw content.

Put another way: one cannot determine how long the raw content is by examining it.  That’s a solid principle.
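To make the principle concrete, here is a small Java illustration (the class NaiveScan is a hypothetical name, not part of any proposal). A scanner that ends a literal at the first occurrence of its closing delimiter truncates any content that happens to contain that delimiter. The same trick defeats any fixed delimiter one might pick.

```java
/** Illustration only: a delimiter-scanned literal cannot hold arbitrary content. */
final class NaiveScan {
    /**
     * Returns the payload of a literal whose opening delimiter has already been
     * consumed, assuming it ends at the first occurrence of {@code closing};
     * returns null if no closing delimiter is found.
     */
    static String payload(String rest, String closing) {
        int end = rest.indexOf(closing);
        return end < 0 ? null : rest.substring(0, end);
    }

    public static void main(String[] args) {
        // The author meant the content to be: a"""b
        String intended = "a\"\"\"b";
        String source = intended + "\"\"\"";              // content, then the closing delimiter
        System.out.println(payload(source, "\"\"\""));    // prints a (content truncated)
    }
}
```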

But there are other ways of determining how long it is.  All have this property in common: you have to know how long the content is before you begin to scan it.  And this leads to an obvious solution: you need a count of bytes up front.

The design currently under consideration can easily accommodate this, now or in the future: a raw string is an opening delimiter, then a byte count (say, expressed as a decimal integer), then a LineTerminator, then as many bytes as the count indicated, then a LineTerminator, then a closing delimiter (the last two are not really needed, but they look nice, satisfy user expectations, and provide some redundancy to help make sure the byte count was correct).
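A minimal sketch of how a lexer might scan such a literal, assuming the source is available as UTF-8 bytes, the delimiters are three ASCII " characters, and LineTerminator means LF, CR, or CR LF as in the JLS. The names CountedRawLiteral, scan, and Token are hypothetical; this byte-counted form is a proposal, not part of the Java language. The payload step copies bytes without ever looking at them, and the trailing LineTerminator and delimiter act only as the redundancy check described above. The record needs Java 16 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical lexer fragment for a byte-counted raw string literal. */
final class CountedRawLiteral {
    /** The raw payload bytes and the index just past the closing delimiter. */
    record Token(byte[] payload, int end) {}

    /** Scans a literal starting at {@code pos}; throws IllegalArgumentException if malformed. */
    static Token scan(byte[] src, int pos) {
        int p = expectDelimiter(src, pos);
        // 1. Decimal byte count, ASCII digits only.
        int digitsStart = p;
        long count = 0;
        while (p < src.length && src[p] >= '0' && src[p] <= '9') {
            count = count * 10 + (src[p] - '0');
            if (count > Integer.MAX_VALUE) throw new IllegalArgumentException("byte count too large");
            p++;
        }
        if (p == digitsStart) throw new IllegalArgumentException("missing byte count at " + p);
        // 2. LineTerminator after the count.
        p = expectLineTerminator(src, p);
        // 3. Take exactly `count` bytes without inspecting them: the truly raw part.
        if (count > src.length - p) throw new IllegalArgumentException("input ends before payload does");
        byte[] payload = Arrays.copyOfRange(src, p, p + (int) count);
        p += (int) count;
        // 4. Redundant LineTerminator and closing delimiter; a wrong count usually fails here.
        p = expectLineTerminator(src, p);
        p = expectDelimiter(src, p);
        return new Token(payload, p);
    }

    private static int expectDelimiter(byte[] src, int p) {
        for (int i = 0; i < 3; i++) {
            if (p + i >= src.length || src[p + i] != '"') {
                throw new IllegalArgumentException("expected \"\"\" at " + p);
            }
        }
        return p + 3;
    }

    // JLS LineTerminator: LF, CR, or CR LF.
    private static int expectLineTerminator(byte[] src, int p) {
        if (p < src.length && src[p] == '\n') return p + 1;
        if (p < src.length && src[p] == '\r') {
            return (p + 1 < src.length && src[p + 1] == '\n') ? p + 2 : p + 1;
        }
        throw new IllegalArgumentException("expected line terminator at " + p);
    }

    public static void main(String[] args) {
        // The 5-byte payload a b " " " contains a lookalike of the closing delimiter.
        String text = "\"\"\"5\nab\"\"\"\n\"\"\";";
        Token t = scan(text.getBytes(StandardCharsets.US_ASCII), 0);
        System.out.println(new String(t.payload(), StandardCharsets.US_ASCII)); // ab"""
    }
}
```

A wrong count is caught rather than silently accepted: with 4 instead of 5 in the demo input, the scanner finds " where it expects a LineTerminator and throws.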

Examples:

	String PrintableAscii = """95
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
""";                // no need to worry about that embedded backslash

	String LotsaQuotes = """50
""""""""""""""""""""""""""""""""""""""""""""""""""
""";                // the payload cannot be confused with the closing delimiter

	String LineNoise = """16
	
""";                // I pasted in ^H^I^J^K^L^M^N^O^P^Q^R^S^T^U^V^W here; not sure how it will render in your mail reader
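The counts in these examples can be checked mechanically. The sketch below (class and method names are hypothetical) assumes UTF-8 source text and plain ASCII " characters in the LotsaQuotes payload; a typographic quote such as U+201C occupies 3 bytes in UTF-8, so a mail client that substitutes them would throw the count off. It also confirms that the LineNoise payload includes LF (^J) and CR (^M), i.e. raw line terminators sitting inside the content.

```java
import java.nio.charset.StandardCharsets;

/** Checks the byte counts used in the examples (hypothetical helper). */
final class ExampleCounts {
    // Printable ASCII: space (0x20) through tilde (0x7E).
    static String printableAscii() {
        StringBuilder sb = new StringBuilder();
        for (char c = ' '; c <= '~'; c++) sb.append(c);
        return sb.toString();
    }

    // Control characters ^H (0x08) through ^W (0x17).
    static String lineNoise() {
        StringBuilder sb = new StringBuilder();
        for (char c = 0x08; c <= 0x17; c++) sb.append(c);
        return sb.toString();
    }

    // The count that matters is encoded bytes, not chars.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(printableAscii()));  // 95
        System.out.println(utf8Length("\"".repeat(50)));   // 50
        System.out.println(utf8Length(lineNoise()));       // 16
        System.out.println(utf8Length("\u201c"));          // 3: a typographic quote is not one byte
        String noise = lineNoise();
        System.out.println(noise.indexOf('\n') >= 0 && noise.indexOf('\r') >= 0); // true
    }
}
```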

The syntax could be further adjusted in arbitrary ways for added clarity; for example:

	String PrintableAscii = """RAW DATA (95 bytes):
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
""";
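With a labeled header, the lexer only needs to pull the digits out of the label. A minimal sketch, assuming the header text is exactly the illustrative "RAW DATA (n bytes):" form; the class name and the regular expression are hypothetical, and the count is limited to 9 digits so it always fits in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the labeled header variant. */
final class LabeledHeader {
    // The label words are decoration; only the digits carry information.
    private static final Pattern HEADER = Pattern.compile("RAW DATA \\((\\d{1,9}) bytes\\):");

    /** Returns the byte count announced by the header, or -1 if the header does not match. */
    static int byteCount(String header) {
        Matcher m = HEADER.matcher(header);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(byteCount("RAW DATA (95 bytes):")); // 95
        System.out.println(byteCount("95"));                   // -1
    }
}
```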

Presumably an IDE could help you make sure the byte count is correct.
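As a sketch of the kind of help meant here (the class name CountedLiteralWriter is hypothetical), an editor command could wrap a pasted payload and compute the count from its encoded length. This assumes a UTF-8 source file with LF line terminators.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical IDE-style helper that wraps arbitrary text in a byte-counted literal. */
final class CountedLiteralWriter {
    /** Emits: opening delimiter, UTF-8 byte count, LF, payload, LF, closing delimiter. */
    static String wrap(String payload) {
        int bytes = payload.getBytes(StandardCharsets.UTF_8).length;
        return "\"\"\"" + bytes + "\n" + payload + "\n\"\"\"";
    }

    public static void main(String[] args) {
        System.out.println(wrap("a\"\"\"b"));                     // count is 5
        System.out.println(wrap("\u00e9").startsWith("\"\"\"2\n")); // true: one char, two UTF-8 bytes
    }
}
```

Counting chars instead of bytes would get any non-ASCII payload wrong, which is one reason to leave the arithmetic to a tool.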

—Guy



More information about the amber-spec-experts mailing list