stripIndent() behavior

Kevin Bourrillion kevinb at google.com
Wed Apr 11 00:07:41 UTC 2018


I've also been trying to sort out just how much we should really weight
this risk. We can make a best-effort for the mixed-tabs case by making sure
stripIndent() only removes an absolutely *identical* prefix from each line.
After that, yes, they may still get a misaligned result at runtime (note
that that could happen *anyway* just by having a different tab stop in that
environment vs. the editor), and that's not good, but actually throwing to
force them to clean up their whitespace seems an order of magnitude too severe.
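The "identical prefix" rule above can be sketched roughly as follows. This is a minimal illustration, not any JDK method. It assumes indentation consists only of spaces and tabs and that lines are separated by `\n`, and the class and method names are hypothetical. Blank and whitespace-only lines are skipped when computing the prefix (point 1 further down the thread). Tabs are never equated with spaces, so in the mixed-tabs case discussed below, part of the indentation stays in place instead of being guessed at.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: remove only the longest run of leading whitespace that is
// character-for-character identical on every non-blank line. Tabs and spaces are
// never treated as equivalent, so nothing is assumed about tab width.
public class IdenticalPrefixStrip {

    // Leading run of spaces and tabs (the only indentation characters assumed here).
    static String leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(0, i);
    }

    static boolean isBlankLine(String line) {
        return leadingWhitespace(line).length() == line.length();
    }

    static String stripIdenticalPrefix(String text) {
        String[] lines = text.split("\n", -1);
        String prefix = null;
        for (String line : lines) {
            if (isBlankLine(line)) {
                continue; // blank and whitespace-only lines do not vote
            }
            String ws = leadingWhitespace(line);
            if (prefix == null) {
                prefix = ws;
            } else {
                int n = 0;
                while (n < prefix.length() && n < ws.length() && prefix.charAt(n) == ws.charAt(n)) {
                    n++;
                }
                prefix = prefix.substring(0, n);
            }
        }
        if (prefix == null || prefix.isEmpty()) {
            return text;
        }
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            // Non-blank lines always start with the prefix; blank ones may be shorter.
            out.add(line.startsWith(prefix) ? line.substring(prefix.length()) : line);
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        // Mixed case: three tabs vs. two tabs plus eight spaces.
        String mixed = "\t\t\tfoo\n\t\t        bar";
        System.out.println(stripIdenticalPrefix(mixed).replace("\t", "<TAB>"));
        // prints "<TAB>foo" then "        bar": only the shared two tabs were removed
    }
}
```

The leftover tab versus eight spaces is exactly the possible runtime misalignment conceded above. The method stays silent about it rather than throwing.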



On Tue, Apr 10, 2018 at 3:27 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> I think this is "throwing" the baby out with the bathwater.  It is
> punishing those who can use tabs responsibly for the sins of those who
> cannot.
>
> You have to commit three sins before you have a problem:
>  - using tabs at all;
>  - using tabs inconsistently across the lines of a single expression;
>  - using tabs after you've already used spaces on a line.
>
> While I am sure that there are people who do so, it just seems
> unreasonable to me to throw in the presence of tabs because someone,
> somewhere, might commit these three sins together and *be confused by the
> result*.  (So, no, it's not the only option.)
>
> Note that IDEs can also highlight code that would be inappropriately
> mangled as a result, so people learn not to commit all three of the sins
> listed.
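The kind of IDE highlighting Brian mentions could take many forms. The sketch below is one assumed heuristic, covering only the third "sin": it flags lines whose indentation contains a tab after a space. It would not catch every case, for example lines that differ only in how many tabs they use. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lint: report the (0-based) indices of lines whose leading
// whitespace contains a tab that follows a space.
public class MixedIndentLint {
    static List<Integer> suspiciousLines(String text) {
        List<Integer> result = new ArrayList<>();
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            boolean seenSpace = false;
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (c == ' ') {
                    seenSpace = true;
                } else if (c == '\t') {
                    if (seenSpace) {
                        result.add(i); // tab after space: rendering depends on tab width
                        break;
                    }
                } else {
                    break; // end of indentation
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(suspiciousLines("\tok\n  \tbad\n    fine")); // [1]
    }
}
```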
>
>
> On 4/10/2018 5:39 PM, Éamonn McManus wrote:
>
>>>> 3. If the input contains *any* tab characters at all (except any that
>>>> are part of the trailing whitespace), then this method cannot know that
>>>> it isn't jumbling the end result, and maybe it should just throw.
>>>>
>>> I think there's a middle ground, where it strips any common whitespace
>>> prefix.  So if every line starts with tab-tab-space-space, it can safely
>>> strip this.
>>>
>> I'm afraid that's not true. In practice if you are using tabs at all it is
>> very easy in many editors to end up with a mix of spaces and tabs. So you
>> could easily have (with 8-space tabs) one line that has 3 tabs at the start,
>> and another that has 2 tabs and 8 spaces. For example with Emacs you can
>> get this just by hitting delete after a tab and then hitting space. You
>> would nevertheless want stripIndent to remove the indentation from both
>> lines, since they look identical.
>>
>> The situation is made worse by the fact that there are two common
>> conventions for tab width, 4 or 8.
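To make Éamonn's example concrete, this sketch computes the display column reached by a line's leading whitespace. It assumes tab stops every `tabWidth` columns counted from column 0 and one column per space; the helper name is hypothetical. His two lines land on the same column with 8-wide tabs but not with 4-wide tabs, even though their literal prefixes differ in both cases.

```java
// Hypothetical helper: the display column reached by a line's leading
// whitespace, assuming tab stops every `tabWidth` columns from column 0
// and that each space occupies exactly one column.
public class VisualIndent {
    static int visualIndent(String line, int tabWidth) {
        int col = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ' ') {
                col++;
            } else if (c == '\t') {
                col += tabWidth - (col % tabWidth); // jump to the next tab stop
            } else {
                break; // end of indentation
            }
        }
        return col;
    }

    public static void main(String[] args) {
        String threeTabs = "\t\t\tx";
        String twoTabsEightSpaces = "\t\t        x";
        System.out.println(visualIndent(threeTabs, 8) + " " + visualIndent(twoTabsEightSpaces, 8)); // 24 24
        System.out.println(visualIndent(threeTabs, 4) + " " + visualIndent(twoTabsEightSpaces, 4)); // 12 16
    }
}
```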
>>
>> I think the only way to avoid these problems is for stripIndent to throw if
>> its argument has any tabs, or at least any tabs in leading whitespace, and
>> provide a separate method `detab` whose argument says what the width of a
>> tab stop is. (Just to be sure: this method should arrange for tab stops to
>> be at positions 4N or 8N, where the first column is column 0. So a tab can
>> expand into 1 to 4 spaces, or 1 to 8 spaces.)
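A minimal sketch of the proposed `detab` follows. It is written as a static helper because `String` cannot gain new instance methods outside the JDK, and the name and signature are hypothetical. It follows the tab-stop rule just described: columns count from 0, a newline resets the column, and each tab advances to the next multiple of the width, so it expands to between 1 and `tabWidth` spaces. It assumes every other character occupies exactly one column.

```java
// Hypothetical sketch of detab(tabWidth): expand each tab to spaces so the
// next character lands on a multiple of tabWidth (first column is column 0).
public class Detab {
    static String detab(String text, int tabWidth) {
        if (tabWidth <= 0) {
            throw new IllegalArgumentException("tabWidth must be positive: " + tabWidth);
        }
        StringBuilder sb = new StringBuilder(text.length());
        int col = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                int spaces = tabWidth - (col % tabWidth); // 1..tabWidth spaces
                for (int k = 0; k < spaces; k++) {
                    sb.append(' ');
                }
                col += spaces;
            } else if (c == '\n') {
                sb.append(c);
                col = 0; // tab stops restart on every line
            } else {
                sb.append(c);
                col++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(detab("ab\tc", 4).replace(' ', '.')); // ab..c
        System.out.println(detab("\tx", 8).replace(' ', '.'));   // ........x
    }
}
```

After this step the text contains no tabs, so any indentation stripper can work purely in spaces. That is the point of the `.detab(4).stripIndent()` chaining suggested next.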
>>
>> Then users operating in tab-free codebases can just write .stripIndent(),
>> and users in tab-infected codebases can write .detab(4).stripIndent().
>>
>> The alternative is to expose novice users to many hours of exasperation.
>> Tabs are generally invisible, so you can imagine someone trying to figure
>> out why two lines that look exactly the same ended up treated differently.
>> Users may not even know there is such a thing as a tab character. (If
>> stripIndent throws, it should have a helpful message that suggests calling
>> detab(N) and that the value of N should probably be 4 or 8.)
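The throwing behaviour Éamonn proposes might look like the sketch below. Everything in it is hypothetical: the method name, the choice of `IllegalArgumentException`, and the message text, which points at the proposed `detab(N)` rather than any existing JDK method. Once tabs are ruled out, indentation is assumed to be spaces only, and blank lines are ignored when measuring it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical strict variant: refuse to guess when leading whitespace has tabs.
public class StrictStripIndent {
    static String stripIndentStrict(String text) {
        String[] lines = text.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int n = 0;
            while (n < line.length() && (line.charAt(n) == ' ' || line.charAt(n) == '\t')) {
                if (line.charAt(n) == '\t') {
                    // Helpful message, as suggested; detab(N) is the proposed (not existing) method.
                    throw new IllegalArgumentException("Tab in leading whitespace of line " + (i + 1)
                            + "; call detab(N) first (N is usually 4 or 8)");
                }
                n++;
            }
            if (n < line.length()) {
                min = Math.min(min, n); // only non-blank lines count
            }
        }
        if (min == Integer.MAX_VALUE) {
            return text; // nothing but blank lines
        }
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.length() >= min ? line.substring(min) : "");
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        System.out.println(stripIndentStrict("    a\n      b"));
        try {
            stripIndentStrict("\ta\n\tb");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```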
>>
>>> String asciiArtFTW =
>>> `````````
>>>        `  BOO  `
>>>        `````````.trimMarkers("`", "`");
>>>
>> I'm not sure I get that. It doesn't correspond to anything I've ever wanted
>> to do, even in languages that already have multiline strings. At least,
>> could we have an overload that just takes the starting marker, for the
>> overwhelmingly commoner case where you only want to strip at the start?
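Here is one possible reading of the single-marker overload Éamonn asks for. It is similar in spirit to Scala's `stripMargin`: on each line, leading spaces and tabs followed by the marker are removed, and lines without the marker are left alone. The method name `trimLeftMarker` is hypothetical, and the thread does not pin down these semantics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-marker trim: on each line, drop leading blanks plus the
// marker that follows them; lines that lack the marker are kept unchanged.
public class TrimMarker {
    static String trimLeftMarker(String text, String marker) {
        List<String> out = new ArrayList<>();
        for (String line : text.split("\n", -1)) {
            int i = 0;
            while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
                i++;
            }
            out.add(line.startsWith(marker, i) ? line.substring(i + marker.length()) : line);
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String sql = "|SELECT *\n    |  FROM t\n    |WHERE x = 1";
        System.out.println(trimLeftMarker(sql, "|"));
    }
}
```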
>>
>> On Tue, 10 Apr 2018 at 13:50, Brian Goetz <brian.goetz at oracle.com> wrote:
>>
>>>> (now stripIndent)
>>>>
>>>> I've accumulated a few questions/comments on this.
>>>>
>>>> 1. When choosing the amount to trim, it ought to ignore blank lines and
>>>> only-whitespace lines, right?
>>>>
>>> Seems right.
>>>
>>>> 2. Is it really appropriate to automatically remove trailing whitespace?
>>>>
>>> I'm not sure about this either.  The reason that RSLs will have "extra"
>>> whitespace that needs to be stripped is that we want to indent the RSL
>>> snippet relative to the Java code (and as you point out, the IDE may do
>>> that automatically for us.)  But if there's trailing whitespace, its
>>> because the user put it there, and who is it hurting?  It might be
>>> significant.
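Markdown is one case where trailing whitespace does matter, as Brian suggests it might. In CommonMark, two or more spaces before a line ending produce a hard line break. The sketch below strips trailing spaces and tabs line by line, written by hand so it does not depend on Java 11's `String.stripTrailing()`, and shows that this changes such text. The helper name is hypothetical.

```java
// Hypothetical helper: remove trailing spaces and tabs from every line.
public class TrailingWhitespace {
    static String stripTrailingPerLine(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int end = line.length();
            while (end > 0 && (line.charAt(end - 1) == ' ' || line.charAt(end - 1) == '\t')) {
                end--;
            }
            sb.append(line, 0, end);
            if (i < lines.length - 1) {
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two trailing spaces: a Markdown hard line break after "red,".
        String markdown = "Roses are red,  \nViolets are blue.";
        String stripped = stripTrailingPerLine(markdown);
        System.out.println(markdown.equals(stripped)); // false: the hard break is gone
    }
}
```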
>>>
>>>> 3. If the input contains *any* tab characters at all (except any that
>>>> are part of the trailing whitespace), then this method cannot know that
>>>> it isn't jumbling the end result, and maybe it should just throw.
>>>>
>>> I think there's a middle ground, where it strips any common whitespace
>>> prefix.  So if every line starts with tab-tab-space-space, it can safely
>>> strip this.
>>>
>>>> 5. If we do end up in a world where we have to call this for almost
>>>> every one of our tens of thousands of multi-line RSLs... is it strange
>>>> that I feel like I would prefer it was static? It seems like it would
>>>> look a lot more normal that way visually. Ugh...
>>>>
>>> I think this is likely to vary subjectively a lot.  Some people like
>>> that the instance method is mostly out of the way; others like the
>>> up-front shouting of the static method.
>>> The reason we can't have both is that we then couldn't resolve the method
>>> reference String::strip as a Function<String,String>, which seems a
>>> useful thing to do.
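Brian's point about `String::strip` comes from how Java resolves method references. If a type has both a static method that takes the value as its argument and an instance method that takes no arguments, then `Type::name` targeting a `Function` matches both and fails to compile. `Integer::toString` is a known JDK example, because `Integer` has both a static `toString(int)` and an instance `toString()`. The sketch below uses a hypothetical `Text` class with only the instance form, so the reference resolves. A comment shows the overload that would break it.

```java
import java.util.function.Function;

public class MethodRefAmbiguity {
    // Hypothetical class with only an instance strip(): Text::strip is unambiguous.
    static final class Text {
        final String value;

        Text(String value) {
            this.value = value;
        }

        Text strip() {
            return new Text(value.trim());
        }

        // Adding a static overload such as
        //     static Text strip(Text t) { return t.strip(); }
        // would make `Text::strip` ambiguous as a Function<Text, Text>: both the
        // static method (taking the argument) and the instance method (invoked on
        // the argument) would match, and the reference would not compile, just as
        // `Function<Integer, String> f = Integer::toString;` does not.
    }

    public static void main(String[] args) {
        Function<Text, Text> f = Text::strip;
        System.out.println("[" + f.apply(new Text("  hi  ")).value + "]"); // [hi]
    }
}
```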
>>>
>>>> On top of *that*, I have no idea what "right markers" are good for, nor
>>>> what customizing the marker choice is good for (other than creating more
>>>> needless variation between different pieces of code).
>>>>
>>> String asciiArtFTW =
>>> `````````
>>>        `  BOO  `
>>>        `````````.trimMarkers("`", "`");
>>>
>>
>


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com


More information about the core-libs-dev mailing list