[raw-strings] Indentation problem
Brian Goetz
brian.goetz at oracle.com
Mon Feb 5 15:53:41 UTC 2018
Sorry for the delay getting back to this.
> Hello!
>
> Every language which implements the multiline strings has problems
> with indentation.
Indeed. The fundamental problem here is that the indentation of
embedded snippets is serving two masters: the nesting of the surrounding
code, and the snippet itself. Sometimes the user cares about one,
sometimes the other, and no language has come up with a
one-size-fits-all set of rules that makes both camps happy.
Sometimes it doesn't really matter; a few extra spaces in an HTML
document or SQL query is often an acceptable price to pay for
clean-looking code. But sometimes it does matter. Which raises two
questions:
- What should programmers do?
- What should the language help them do?
> E.g. consider something like this:
So, in light of the above questions, let's ask: is this the right way to
generate an HTML document? It not only has "holes" to be filled in, but
it has entire sections whose presence or absence depends on state. I
think the mess of this example goes far deeper than indentation. (But
yes, people will write code like this, with whatever tools we give
them.) To the second question, what should the language do to help this
code? Some would say "of course, the problem is you don't support
interpolation." But as this example shows, interpolation only helps
with the trivial bits; it doesn't help with the conditional inclusion,
so it only gets you a small part of the way to this example. For that,
you either need something with more structure, or a templating engine,
or a builder, or one of the zillion other tools we've invented for this
sort of thing.
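
As a rough illustration of the "builder" option, here is a minimal sketch. The original example is not reproduced above, so the page, the class name PageBuilder, and the parameters user and isAdmin are all hypothetical. The point is only that conditional inclusion becomes ordinary control flow, and that the indentation of the output no longer depends on how deeply the Java code is nested.

```java
import java.util.StringJoiner;

// Hypothetical builder-style renderer; the names are illustrative only.
final class PageBuilder {
    private PageBuilder() {}

    // Builds a small page line by line. Each line carries its own HTML
    // indentation, independent of the nesting of the surrounding Java code.
    // Note: a real renderer would also HTML-escape the user value.
    static String render(String user, boolean isAdmin) {
        StringJoiner html = new StringJoiner("\n");
        html.add("<html>");
        html.add("  <body>");
        html.add("    <p>Hello, " + user + "</p>");
        if (isAdmin) {
            // Conditional section: plain control flow, not interpolation.
            html.add("    <p>Admin tools</p>");
        }
        html.add("  </body>");
        html.add("</html>");
        return html.toString();
    }
}
```

Adding another level of "if" or "switch" around the conditional section changes the Java indentation but not the generated HTML.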
So, without ignoring your fundamental question about indentation, I'll
just point out that this example is about way more than indentation, and
move on ...
> Now we have broken formatting in the generated HTML, which ruins the
> idea of multiline strings
I think "multiline strings" (or even "raw strings") are a bit of a
misleading name. What we're going for here is the ability to embed an
arbitrary snippet of a "program" (shell script, SQL query, JSON doc) in
a Java program, without having to mangle the embedded snippet. This
enhances readability (not mucked up with escapes and extra quotes) and
reduces errors (because you can just cut and paste that snippet of
script from the editor in which you've probably already written it,
without risking breaking it via syntactic mangling.) But, as you say,
there are issues with indentation, when it matters. (Surely it matters
for snippets of python.)
Secondarily, the design center for this feature is: _short_ snippets --
those for which putting them in a separate document would be
obfuscatory. To see this, we have to approach it from both sides. On
the short side, imagine Java didn't have string literals at all. Having
to read "yes" and "no" out of a file would be ridiculously obfuscatory;
eliminating this indirection makes code easier to read and less
error-prone. But on the long side, using raw strings to embed a
million-line snippet in a Java program is also ridiculous; it would be
far easier for maintainers of both the Java part and the embedded part
to have their own uniform artifacts to maintain. So the sweet spot for
this feature is somewhere in the middle -- snippets that are short
enough that indirecting to a file impairs readability, but not so long
that there's any question where the embedded snippet ends and the Java
code resumes. (Subjectively, I'd say that this sweet spot is in the
5-10 line range.)
> (why bother to generate \n in output HTML if
> it looks like a mess anyways?) Moreover, the structure of Java program
> now affects the output. E.g. if you add several more nested "if" or
> "switch" statement, you will need to indent <p> even more.
My answer to those people is: then don't do that ;) They're already
well outside the design center (as outlined above). They should be
using a templating mechanism, a builder, or something else to decouple
the static content from the dynamic content. Of course, they will write
it this way anyway, but I'm not sure bending over backwards to
accommodate them is the winning move.
> Many languages provide library methods to handle this.
Good, now we're back to indentation. All things being equal, it is
better to do things in libraries than in the language; it is cheaper,
more flexible, faster to market, less risky, and can support a broader
range of preferences (you can have different libraries for different
preferences.) So I like this direction.
> E.g.
> trimIndent() could be provided to remove leading spaces of every line,
> but this would kill the HTML indents at all. Another possibility is to
> provide a method like trimMargin() on Kotlin [1] which trims all
> spaces before a special character (pipe by default) including a
> special character itself.
Now that we're in library world, we can have _all_ of these. We can
trim indents to the first indent, or trim a specified number of spaces
off, or trim to a user-selected marker. And if the users don't like the
ones we include, they can write their own.
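
To make the three options concrete, here is a minimal sketch of what such library methods might look like. The class name Indents and the method names trimIndent, trimSpaces, and trimMargin are assumptions for illustration (the last borrowed from the Kotlin method mentioned above); none of this is a proposed JDK API, and ordinary string literals stand in for raw strings.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper class; not an existing or proposed JDK API.
final class Indents {
    private Indents() {}

    // Removes the smallest common run of leading spaces from every
    // non-blank line; blank lines become empty.
    static String trimIndent(String text) {
        String[] lines = text.split("\n", -1);
        int min = Arrays.stream(lines)
                .filter(l -> !l.trim().isEmpty())
                .mapToInt(Indents::leadingSpaces)
                .min()
                .orElse(0);
        return Arrays.stream(lines)
                .map(l -> l.trim().isEmpty() ? "" : l.substring(min))
                .collect(Collectors.joining("\n"));
    }

    // Removes up to n leading spaces from every line.
    static String trimSpaces(String text, int n) {
        return Arrays.stream(text.split("\n", -1))
                .map(l -> l.substring(Math.min(n, leadingSpaces(l))))
                .collect(Collectors.joining("\n"));
    }

    // Removes leading spaces plus a user-selected marker from every line
    // that starts with one; other lines are left untouched.
    static String trimMargin(String text, String marker) {
        return Arrays.stream(text.split("\n", -1))
                .map(l -> {
                    int i = leadingSpaces(l);
                    return l.startsWith(marker, i)
                            ? l.substring(i + marker.length())
                            : l;
                })
                .collect(Collectors.joining("\n"));
    }

    private static int leadingSpaces(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == ' ') {
            i++;
        }
        return i;
    }
}
```

For example, Indents.trimMargin("   |<p>\n   |  hi", "|") yields "<p>" followed by "  hi" on the next line, keeping the snippet's own indentation while dropping the Java-side margin.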
> This is almost nice. Even without syntax highlighting you can easily
> distinguish between Java code and injected HTML code, you can indent
> Java and HTML independently and HTML code does not clash with Java
> code structure.
Pushing this to a library gives users the option, but not the obligation
to do this. That's good.
> The only problem is the necessity to call the
> trimMargin() method.
For some meaning of "only" :) Like most syntactic conventions, some
users will say "this is great" and others will say "yuck". I prefer the
semantic transparency of calling a method that has a clear specification
-- especially when there are multiple possible options.
Remember that we're already in a corner case with respect to indentation
-- in many cases, the users don't care at all about the extra spaces,
they're just building up a SQL query that is going to be sent to a
database, and the database doesn't care either.
> This means that original line is preserved in the
> bytecode and during runtime and the trimming is processed every time
> the method is called causing performance and memory handicap. This
> problem could be minimized making trimMargin() a javac intrinsic.
There are multiple layers at which this can be optimized (the JIT may be
able to observe that this is a pure function applied to a constant), but
indeed, this is a great candidate for compile-time constant folding.
(You can even see experiments related to compile-time constant folding
going on in the condy-folding branch of the amber repo.) Note too that
we're now in corner-case-of-corner-case territory -- those who care
about the indentation and the cost of runtime string processing.
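
Independent of any JIT or javac optimization, a user in that corner can already avoid repeated trimming by hoisting the result into a static final field. The sketch below is an assumption-laden illustration, not part of the proposal: the class Queries, the helper stripFourSpaces, and the counter TRIM_CALLS are hypothetical names, and the counter exists only to show that the trimming runs once, at class initialization.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; names are illustrative only.
final class Queries {
    private Queries() {}

    // Counts how often the trimming helper runs (for demonstration only).
    // Declared before FIND_USER so it is initialized first.
    static final AtomicInteger TRIM_CALLS = new AtomicInteger();

    // Trimmed once, when the class is initialized, not on every call.
    private static final String FIND_USER = stripFourSpaces(
            "    SELECT name\n" +
            "      FROM users\n" +
            "     WHERE id = ?");

    static String findUser() {
        return FIND_USER;
    }

    // Removes exactly four leading spaces from each line that has them.
    static String stripFourSpaces(String text) {
        TRIM_CALLS.incrementAndGet();
        return text.replaceAll("(?m)^ {4}", "");
    }
}
```

The untrimmed literal is still present in the class file, so this addresses the repeated runtime cost but not the bytecode size; compile-time folding would address both.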
> However even in this case it would be hard to enforce usage of this
> method and I expect that tons of hard-to-read Java code will appear in
> the wild, despite I believe that Java is about readability.
Developers' ability to combine simple features to produce unreadable code
far outstrips the ability of language designers to do anything about it ...
> So I propose to enforce such (or similar) format on language level
> instead of adding a library method like "trimMargin()".
I think this would be a language design mistake. This is taking one
arbitrary convention and burning it into the language. That convention
might be fine for some situations, but terrible for others; not only
might it not be the most readable choice in all cases, but it could be
an actual conflict -- what if the | character is meaningful in the
embedded language, such as Markdown tables? Now we're back to escaping
-- which we were trying to avoid.
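
The conflict can be sketched with a small, hypothetical margin-trimming helper (MarginDemo and its methods are illustrative names, not a real API). A Markdown table pasted as-is already starts each line with "|", so a fixed pipe convention eats the table's own first delimiter, while a library method lets the user pick a marker that does not collide.

```java
// Hypothetical demo class; not an existing or proposed API.
final class MarginDemo {
    private MarginDemo() {}

    // Strips leading spaces plus the marker from each line that starts
    // with the marker; lines without it are left untouched.
    static String trimMargin(String text, String marker) {
        StringBuilder out = new StringBuilder();
        String[] lines = text.split("\n", -1);
        for (int n = 0; n < lines.length; n++) {
            String line = lines[n];
            int i = 0;
            while (i < line.length() && line.charAt(i) == ' ') {
                i++;
            }
            if (n > 0) {
                out.append('\n');
            }
            out.append(line.startsWith(marker, i)
                    ? line.substring(i + marker.length())
                    : line);
        }
        return out.toString();
    }

    // A Markdown table pasted verbatim, indented to match the Java code.
    static String pastedTable() {
        return String.join("\n",
                "    | name | value |",
                "    |------|-------|",
                "    | x    | 10    |");
    }
}
```

Here trimMargin(pastedTable(), "|") returns lines beginning " name | value |", which is no longer the same table. With a different marker, for example trimMargin("    %| name | value |", "%"), the row comes back intact as "| name | value |".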
The language shouldn't pick favorites here; it should provide a simple,
clear mechanism, which can be usefully composed with other mechanisms to
get the job done. Polluting the language to avoid the method call is a
bad trade.
> I see some advantages with such syntax:
> 1. You can comment (or comment out!) a part of multiline string
> without terminating it
Rather than framing this as a property of a proposed solution, let's
frame it as a question. What should be the interaction with comments in
a raw string? Should you be able to embed comments? Should you be able
to comment lines out? (Note that many embedded languages support
comments themselves, so it may be possible to do this by embedding a
comment in the snippet, rather than using Java-level commenting.)
While I can surely see the utility of
interaction with commenting, I also think that these "requirements" are
only in play when the string in question is too long in the first place.
> 2. Looking into code fragment out of context (e.g. diff log) you
> understand that you are inside a multiline literal.
> reviewing a diff like
>
> | x++;
> + | if (x == 10) break;
> | foo(x);
>
> Without pipes you could think that it's Java code without any further
> consideration.
This is true, but this is also true of large block comments; you can't
tell whether the added line is part of a commented out block or of
executable code.
Again, with raw strings, this is more of a problem when used with
too-long blocks.
So, there are two things I don't like about this proposal: it's too
"opinionated", and at the same time, it loses the fundamental goal we
were trying to get to -- not having to muck up an embedded block with
escaping. (Sure, IDEs could (and should) help on pasting here, but that
only helps writing, not reading.)
> The only disadvantage I see in forcing a pipe prefix is inability to
> just paste a big snippet from somewhere to the middle of Java program
> in a plain text editor.
As mentioned, we think this is most of the point, so this is a pretty
big disadvantage indeed.