stripIndent() behavior
Éamonn McManus
emcmanus at google.com
Tue Apr 10 21:48:07 UTC 2018
> > 3. If the input contains *any* tab characters at all (except any that are
> > part of the trailing whitespace), then this method cannot know that it
> > isn't jumbling the end result, and maybe it should just throw.
> I think there's a middle ground, where it strips any common whitespace
> prefix. So if every line starts with tab-tab-space-space, it can safely
> strip this.
I'm afraid that's not true. In practice if you are using tabs at all it is
very easy in many editors to end up with a mix of spaces and tabs. So you
could easily have (with 8-space tabs) one line that has 3 tabs at the start,
and another that has 2 tabs and 8 spaces. For example with Emacs you can
get this just by hitting delete after a tab and then hitting space. You
would nevertheless want stripIndent to remove the indentation from both
lines, since they look identical.
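
To make that concrete, here is a minimal sketch of the "strip the common
whitespace prefix" idea applied to exactly those two lines. The class and
method names are made up for illustration, and it skips details such as
ignoring blank lines.

```java
class CommonPrefixDemo {
    // Longest run of leading spaces/tabs shared, character for character, by every line.
    static String commonWhitespacePrefix(String... lines) {
        String prefix = null;
        for (String line : lines) {
            int end = 0;
            while (end < line.length()
                    && (line.charAt(end) == ' ' || line.charAt(end) == '\t')) {
                end++;
            }
            String ws = line.substring(0, end);
            if (prefix == null) {
                prefix = ws;
                continue;
            }
            int i = 0;
            while (i < prefix.length() && i < ws.length()
                    && prefix.charAt(i) == ws.charAt(i)) {
                i++;
            }
            prefix = prefix.substring(0, i);
        }
        return prefix == null ? "" : prefix;
    }

    public static void main(String[] args) {
        String a = "\t\t\tfoo";                    // 3 tabs
        String b = "\t\t" + "        " + "bar";    // 2 tabs + 8 spaces
        String p = commonWhitespacePrefix(a, b);
        // Only the two shared tabs count as common, so stripping the prefix
        // leaves a tab on one line and eight spaces on the other.
        System.out.println(a.substring(p.length()).equals("\tfoo"));
        System.out.println(b.substring(p.length()).startsWith(" "));
    }
}
```

With 8-column tabs both lines start in the same column, yet a
character-by-character prefix only removes the first two tabs and leaves
different residue on each line.
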
The situation is made worse by the fact that there are two common
conventions for tab width, 4 or 8.
I think the only way to avoid these problems is for stripIndent to throw if
its argument has any tabs, or at least any tabs in leading whitespace, and
provide a separate method `detab` whose argument says what the width of a
tab stop is. (Just to be sure: this method should arrange for tab stops to
be at positions 4N or 8N, where the first column is column 0. So a tab can
expand into 1 to 4 spaces, or 1 to 8 spaces.)
Then users operating in tab-free codebases can just write .stripIndent(),
and users in tab-infected codebases can write .detab(4).stripIndent().
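
A rough sketch of the tab-stop rule described above, written as a static
helper because no such method exists on String. The name Detab.detab and its
signature are assumptions for illustration. It counts one column per char,
which is a simplification (no surrogate pairs or wide characters).

```java
class Detab {
    // Expands each tab to the next tab stop, where stops sit at columns
    // 0, tabWidth, 2*tabWidth, ... and the first column is column 0.
    static String detab(String s, int tabWidth) {
        if (tabWidth <= 0) {
            throw new IllegalArgumentException("tabWidth must be positive: " + tabWidth);
        }
        StringBuilder out = new StringBuilder();
        int col = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\t') {
                // Always at least 1 space, at most tabWidth spaces.
                int n = tabWidth - (col % tabWidth);
                for (int k = 0; k < n; k++) {
                    out.append(' ');
                }
                col += n;
            } else if (c == '\n' || c == '\r') {
                out.append(c);
                col = 0; // a new line starts again at column 0
            } else {
                out.append(c);
                col++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String a = "\t\t\tx";                  // 3 tabs
        String b = "\t\t" + "        " + "x";  // 2 tabs + 8 spaces
        // With 8-column tabs the two lines become the same string.
        System.out.println(detab(a, 8).equals(detab(b, 8)));
    }
}
```

After this step the two look-alike lines really are identical, so a
space-only stripIndent can treat them the same.
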
The alternative is to expose novice users to many hours of exasperation.
Tabs are generally invisible, so you can imagine someone trying to figure
out why two lines that look exactly the same ended up treated differently.
Users may not even know there is such a thing as a tab character. (If
stripIndent throws, it should have a helpful message that suggests calling
detab(N) and says that N should probably be 4 or 8.)
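
For illustration, a strict variant along those lines might look like the
sketch below. Everything here is an assumption, not a proposed
implementation: the class name, the exact message, the choice to pass
whitespace-only lines through unchanged, and the restriction to "\n" line
endings.

```java
class StrictStripIndent {
    // Number of leading space/tab characters on a line.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    // Removes the smallest space-only indentation found on any non-blank line.
    // Throws if a non-blank line has a tab anywhere in its leading whitespace.
    static String stripIndent(String s) {
        String[] lines = s.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            int indent = leadingWhitespace(line);
            if (indent == line.length()) {
                continue; // blank or whitespace-only: does not affect the amount
            }
            if (line.substring(0, indent).indexOf('\t') >= 0) {
                throw new IllegalArgumentException(
                        "Leading whitespace contains a tab character, so the indentation "
                        + "is ambiguous. Call detab(N) first; N should probably be 4 or 8.");
            }
            min = Math.min(min, indent);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            boolean blank = leadingWhitespace(line) == line.length();
            out.append(blank ? line : line.substring(min));
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripIndent("    a\n      b\n\n    c"));
    }
}
```
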
> String asciiArtFTW =
> `````````
> ` BOO `
> `````````.trimMarkers("`", "`");
I'm not sure I get that. It doesn't correspond to anything I've ever wanted
to do, even in languages that already have multiline strings. At least,
could we have an overload that just takes the starting marker, for the
overwhelmingly commoner case where you only want to strip at the start?
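
A sketch of what a start-marker-only overload could look like, modeled
loosely on Scala's stripMargin. The name trimLeftMarker and the exact
semantics (lines without the marker are left alone) are assumptions, not
anything specified in this thread.

```java
class LeftMarker {
    // For each line, drops leading spaces/tabs plus the marker that follows them.
    // Lines that do not start with (whitespace +) marker are kept unchanged.
    static String trimLeftMarker(String s, String marker) {
        String[] lines = s.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int j = 0;
            while (j < line.length() && (line.charAt(j) == ' ' || line.charAt(j) == '\t')) {
                j++;
            }
            out.append(line.startsWith(marker, j) ? line.substring(j + marker.length()) : line);
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "    |SELECT *\n    |  FROM t\n";
        System.out.print(trimLeftMarker(text, "|"));
    }
}
```
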
--
Éamonn
On Tue, 10 Apr 2018 at 13:50, Brian Goetz <brian.goetz at oracle.com> wrote:
> > (now stripIndent)
> >
> > I've accumulated a few questions/comments on this.
> >
> > 1. When choosing the amount to trim, it ought to ignore blank lines and
> > only-whitespace lines, right?
> Seems right.
> > 2. Is it really appropriate to automatically remove trailing whitespace?
> I'm not sure about this either. The reason that RSLs will have "extra"
> whitespace that needs to be stripped is that we want to indent the RSL
> snippet relative to the Java code (and as you point out, the IDE may do
> that automatically for us.) But if there's trailing whitespace, it's
> because the user put it there, and who is it hurting? It might be
> significant.
> > 3. If the input contains *any* tab characters at all (except any that are
> > part of the trailing whitespace), then this method cannot know that it
> > isn't jumbling the end result, and maybe it should just throw.
> I think there's a middle ground, where it strips any common whitespace
> prefix. So if every line starts with tab-tab-space-space, it can safely
> strip this.
> > 5. If we do end up in a world where we have to call this for almost every
> > one of our tens of thousands of multi-line RSLs... is it strange that I
> > feel like I would prefer it was static? It seems like it would look a lot
> > more normal that way visually. Ugh...
> I think this is likely to vary subjectively a lot. Some people like
> that the instance method is mostly out of the way; others like the
> up-front shouting of the static method.
> The reason we can't have both is then we can't resolve the method
> reference String::strip as a Function<String,String>, which seems a
> useful thing to do.
> > On top of *that*, I have no idea what "right markers" are good for, nor
> > what customizing the marker choice is good for (other than creating more
> > needless variation between different pieces of code).
> >
> String asciiArtFTW =
> `````````
> ` BOO `
> `````````.trimMarkers("`", "`");