stripIndent() behavior

Brian Goetz brian.goetz at oracle.com
Tue Apr 10 22:27:38 UTC 2018


I think this is "throwing" the baby out with the bathwater.  It is 
punishing those who can use tabs responsibly for the sins of those who 
cannot.

You have to commit three sins before you have a problem:
  - using tabs at all;
  - using tabs inconsistently across the lines of a single expression;
  - using tabs after you've already used spaces on a line.

While I am sure that there are people who do so, it just seems 
unreasonable to me to throw in the presence of tabs because someone, 
somewhere, might commit these three sins together and *be confused by 
the result*.  (So, no, it's not the only option.)

Note that IDEs can also highlight code that would be inappropriately 
mangled as a result, so people learn not to commit all three of the sins 
listed.
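
A minimal sketch of the "strip the common whitespace prefix" idea, to make the behavior concrete. This is an illustration only, not the JDK implementation: the class and method names (CommonPrefixStrip, stripCommonIndent) are hypothetical. Tabs and spaces are compared character by character, with no notion of tab width, and blank or whitespace-only lines are ignored when choosing the prefix.

```java
// Hypothetical helper for illustration; NOT the JDK's stripIndent().
final class CommonPrefixStrip {

    // Returns the run of spaces and tabs at the start of a line.
    static String leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length()
                && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(0, i);
    }

    // Longest common prefix of two strings, compared char by char.
    static String commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    // Strips the whitespace prefix shared by every non-blank line.
    // A tab only matches a tab and a space only matches a space.
    static String stripCommonIndent(String text) {
        String[] lines = text.split("\n", -1);
        String prefix = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines do not influence the prefix
            }
            String ws = leadingWhitespace(line);
            prefix = (prefix == null) ? ws : commonPrefix(prefix, ws);
        }
        if (prefix == null) {
            prefix = ""; // input had no non-blank lines
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            String line = lines[i];
            // Blank lines shorter than the prefix are left untouched.
            out.append(line.startsWith(prefix)
                    ? line.substring(prefix.length())
                    : line);
        }
        return out.toString();
    }
}
```

With this sketch, two lines that both start with tab-tab-space-space lose exactly that prefix. A line starting with three tabs next to a line starting with two tabs and eight spaces loses only the two shared tabs, which is the case where the result can surprise someone who cannot see the tabs.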

On 4/10/2018 5:39 PM, Éamonn McManus wrote:
>>> 3. If the input contains *any* tab characters at all (except any that are
>>> part of the trailing whitespace), then this method cannot know that it
>>> isn't jumbling the end result, and maybe it should just throw.
>> I think there's a middle ground, where it strips any common whitespace
>> prefix.  So if every line starts with tab-tab-space-space, it can safely
>> strip this.
> I'm afraid that's not true. In practice if you are using tabs at all it is
> very easy in many editors to end up with a mix of spaces and tabs. So you
> could easily have (with 8-space tabs) one line that has 3 tabs at the start,
> and another that has 2 tabs and 8 spaces. For example with Emacs you can
> get this just by hitting delete after a tab and then hitting space. You
> would nevertheless want stripIndent to remove the indentation from both
> lines, since they look identical.
>
> The situation is made worse by the fact that there are two common
> conventions for tab width, 4 or 8.
>
> I think the only way to avoid these problems is for stripIndent to throw if
> its argument has any tabs, or at least any tabs in leading whitespace, and
> provide a separate method `detab` whose argument says what the width of a
> tab stop is. (Just to be sure: this method should arrange for tab stops to
> be at positions 4N or 8N, where the first column is column 0. So a tab can
> expand into 1 to 4 spaces, or 1 to 8 spaces.)
>
> Then users operating in tab-free codebases can just write .stripIndent(),
> and users in tab-infected codebases can write .detab(4).stripIndent().
>
> The alternative is to expose novice users to many hours of exasperation.
> Tabs are generally invisible, so you can imagine someone trying to figure
> out why two lines that look exactly the same ended up treated differently.
> Users may not even know there is such a thing as a tab character. (If
> stripIndent throws, it should have a helpful message that suggests calling
> detab(N) and that the value of N should probably be 4 or 8.)
>
>> String asciiArtFTW =
>> `````````
>>        `  BOO  `
>>        `````````.trimMarkers("`", "`");
> I'm not sure I get that. It doesn't correspond to anything I've ever wanted
> to do, even in languages that already have multiline strings. At least,
> could we have an overload that just takes the starting marker, for the
> overwhelmingly commoner case where you only want to strip at the start?
>
> On Tue, 10 Apr 2018 at 13:50, Brian Goetz <brian.goetz at oracle.com> wrote:
>
>
>
>>> (now stripIndent)
>>>
>>> I've accumulated a few questions/comments on this.
>>>
>>> 1. When choosing the amount to trim, it ought to ignore blank lines and
>>> only-whitespace lines, right?
>> Seems right.
>>> 2. Is it really appropriate to automatically remove trailing whitespace?
>> I'm not sure about this either.  The reason that RSLs will have "extra"
>> whitespace that needs to be stripped is that we want to indent the RSL
>> snippet relative to the Java code (and as you point out, the IDE may do
>> that automatically for us.)  But if there's trailing whitespace, it's
>> because the user put it there, and who is it hurting?  It might be
>> significant.
>>> 3. If the input contains *any* tab characters at all (except any that are
>>> part of the trailing whitespace), then this method cannot know that it
>>> isn't jumbling the end result, and maybe it should just throw.
>> I think there's a middle ground, where it strips any common whitespace
>> prefix.  So if every line starts with tab-tab-space-space, it can safely
>> strip this.
>>> 5. If we do end up in a world where we have to call this for almost every
>>> one of our tens of thousands of multi-line RSLs... is it strange that I
>>> feel like I would prefer it was static? It seems like it would look a lot
>>> more normal that way visually. Ugh...
>> I think this is likely to vary subjectively a lot.  Some people like
>> that the instance method is mostly out of the way; others like the
>> up-front shouting of the static method.
>> The reason we can't have both is then we can't resolve the method
>> reference String::strip as a Function<String,String>, which seems a
>> useful thing to do.
>>> On top of *that*, I have no idea what "right markers" are good for, nor
>>> what customizing the marker choice is good for (other than creating more
>>> needless variation between different pieces of code).
>>>
>> String asciiArtFTW =
>> `````````
>>        `  BOO  `
>>        `````````.trimMarkers("`", "`");
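
For reference, the detab(N) method proposed in the quoted message could look roughly like the sketch below. It is only an assumption about how such a method might behave, based on the description there (tab stops at multiples of N, first column is column 0); no such method is confirmed to exist, and the class name Detab is hypothetical. Columns are counted per char, so wide characters and surrogate pairs are not handled.

```java
// Hypothetical sketch of the proposed detab(N); not an existing JDK method.
final class Detab {

    // Expands each tab to spaces so that the next character lands on the
    // next multiple of tabWidth, counting the first column as column 0.
    // A tab therefore expands into 1..tabWidth spaces.
    static String detab(String text, int tabWidth) {
        if (tabWidth <= 0) {
            throw new IllegalArgumentException("tabWidth must be positive");
        }
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                int spaces = tabWidth - (column % tabWidth);
                for (int k = 0; k < spaces; k++) {
                    out.append(' ');
                }
                column += spaces;
            } else if (c == '\n') {
                out.append(c);
                column = 0; // a new line restarts the column count
            } else {
                out.append(c);
                column++; // simplification: one char is one column
            }
        }
        return out.toString();
    }
}
```

Under this sketch, with a tab width of 8, a line starting with three tabs and a line starting with two tabs followed by eight spaces expand to the same 24 leading spaces, so a later prefix strip would treat them identically.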