Propose to use clang-format to enforce hotspot codestyle
Jesper Wilhelmsson
jesper.wilhelmsson at oracle.com
Thu Mar 12 14:26:58 UTC 2020
> On 12 Mar 2020, at 10:30, Liu, Xin <xxinliu at amazon.com> wrote:
>
> Thanks for the feedbacks.
>
> For Andrew’s concerns, I think I got answers.
> Sometimes, hand-craft code is better indeed. Clang-format provides a directive which can temporarily turn it off.
> https://releases.llvm.org/6.0.1/tools/clang/docs/ClangFormatStyleOptions.html#disabling-formatting-on-a-piece-of-code <https://urldefense.com/v3/__https://releases.llvm.org/6.0.1/tools/clang/docs/ClangFormatStyleOptions.html*disabling-formatting-on-a-piece-of-code__;Iw!!GqivPVa7Brio!OOfqbHl7BR-FAqgxbdaX1tTUPQasjG5INyltu654bQzTLbZuu6vEMxNnTJl0tGTIubLcgZ0$>
> Developers can also define customized formats. clang-format takes precedence the local configure file over parent directory’s. Assume that GC guys would like to use an exotic code style, they can define it and place it in their own directory.
>
> I expect to have a huge change, but it’s a one-time thing. I don’t understand why we have to review it completely. A formatter should never alter semantics of user’s code. If hotspot can’t be built or spot any regression, there must be something wrong. We should be glad some bugs are caught by an automatic tool.
If you get a huge patch committed into the JDK without proper review there is something wrong in the review process. There are so many aspects to this and I'm actually surprised that you would expect the community to accept a huge change like this. Even if you disregard all the practical problems with conflicts with every single project repo and clone out there and all the issues that a change like that will cause for backports for years to come, even if we only look at mainline and think forward, there are huge issues with a huge patch.
First of all, if we assume that the change would contain only non-semantic cleanup of indentation, spaces, and line breaks, the developers who work on this code on a daily basis would not accept the change without them looking through their areas first. We have already concluded that there is a need to opt out from formatting in parts of the code. These parts would have to be found and cleaned up in your patch before you push it. So already here we have a requirement from basically all teams out there working on the JVM to review the patch.
The patch is quite likely to contain other changes as well though. For instance it will likely add { } for all if/for/while etc that are missing it. This in it self is a good change, but without a proper review this automatic cleanup risk hiding bugs rather than finding them. Suppose you have code like this:
if (expression)
foo();
bar();
other_code();
It's pretty obvious that something is fishy here, but is it the indentation that is wrong, or are we missing { }? The formatter can't know but is likely to clean up like this:
if (expression) {
foo();
}
bar();
other_code();
Now it's not at all obvious that something is fishy anymore. If the indentation was correct (we were missing { }) and this manifests as an intermittent bug we just made it a lot harder to find it. Actually this example shows the same point even without adding the { }. "Fixing" the indentation would also hide the bug. There are many more examples of similar issues with common cleanups made by formatters.
Then there is the whole security aspect. Why should the community trust a huge patch where the author waves his hand and says "Nah, you don't have to look at this"? From a security perspective that would be insane. Anything could be hiding in there.
Just for the record since you brought it up, I'm not worried about the tool finding bugs in the hotspot code - that would be a good thing. I'm more worried about the hotspot code finding bugs in the tool.
> Yes, clang-format is a formatter, but we can turn it into a validator very cheap by git/hg. If we run clang-format and generate any diff, we know the current code violates hotspot’s codestyle. In my proposal, I plan to do that in phase-2, eg. “make check-style”. Of course, it only works in a clean repo.
Yes, running the formatter on a copy of your modified file and diffing would tell if you have discrepancies, but that involves saving temporary files on disk, and it requires some additional formatting of the diff output to make it understandable what is wrong with your modified code. Doable, sure, but I'd prefer a tool that gives you clean warnings up front without an extra layer that must be kept up to date and free from bugs.
> How about we just forget about validator. I think we need a .clang-format first. It helps developers to format code automatically. To proof my idea, I made a draft .clang-format and run it against compiler/*.cpp. I think most of those changes make hotspot more readable.
> https://cr.openjdk.java.net/~xliu/8240834/webrev/ <https://cr.openjdk.java.net/~xliu/8240834/webrev/>
Looking at your webrev I see that it hides a lot of changes since it doesn't show all whitespace-only changes as diffs. So already here there is a major flaw that requires an actual review to tell that your patch is even larger than what your webrev says.
This patch removes all aligned '=' in declarations. I assume this is configurable(?) It's not an improvement in my book.
default: ; <---I think the removal of this space makes it harder to read.
How long lines does this formatter prefer? The change below creates a very long line that was broken into two for a reason.
- int filler_limit = align_instr() ? max_instr_size_in_bytes : ((instruction_size_in_bytes+abstract_instruction_bytes_per_block-1)/abstract_instruction_bytes_per_block)
- *abstract_instruction_bytes_per_block;
+ int filler_limit = align_instr() ? max_instr_size_in_bytes : ((instruction_size_in_bytes + abstract_instruction_bytes_per_block - 1) / abstract_instruction_bytes_per_block) * abstract_instruction_bytes_per_block;
I just looked at one file but already I can say that there is no way this can be pushed without the compiler team looking at it.
> I don’t realize that hotspot macros are special. It seems that clang-format has many macro-related styles too. Let me try them on globals.hpp.
There are some really unfriendly macros out there...
/Jesper
>
> Thanks,
> --lx
>
>
> From: Jesper Wilhelmsson <jesper.wilhelmsson at oracle.com <mailto:jesper.wilhelmsson at oracle.com>>
> Date: Wednesday, March 11, 2020 at 11:18 AM
> To: "Liu, Xin" <xxinliu at amazon.com <mailto:xxinliu at amazon.com>>
> Cc: Volker Simonis <volker.simonis at gmail.com <mailto:volker.simonis at gmail.com>>, HotSpot Open Source Developers <hotspot-dev at openjdk.java.net <mailto:hotspot-dev at openjdk.java.net>>
> Subject: RE: [EXTERNAL]Propose to use clang-format to enforce hotspot codestyle
>
> On 11 Mar 2020, at 13:06, Volker Simonis <volker.simonis at gmail.com <mailto:volker.simonis at gmail.com>> wrote:
>
> Liu, Xin <xxinliu at amazon.com <mailto:xxinliu at amazon.com>> schrieb am Mi., 11. März 2020, 09:50:
>
>
> Hi,
>
> I just filed a JBS issue JDK-8240834 about hotspot codestyle(
> https://wiki.openjdk.java.net/display/HotSpot/StyleGuide <https://wiki.openjdk.java.net/display/HotSpot/StyleGuide>). I found many
> inconsistent code in hotspot so I propose to use clang-format[1] to enforce
> it.
>
> I know you’re all very experienced developers, but sometimes wrong
> indentations or tailing whitespaces creep in. it's hard to catch them in
> webrev. Even though we can, new contributors still need to contact their
> sponsors for formatting changes. I think it's a good idea to have lint
> thing done at first place.
>
> Clang is powerful enough to parse modern C/C++ code and it's very handy to
> format code. clang-format should be available on all supporting platforms.
> It's still okay even if it is absent on some platforms because nightly
> build and submit repo will cover unsupported platforms.
>
> I plan to have a .clang-format and correcting styles in the 1st patch. We
> can bring auto clang-format check to hotspot's makefile in the second
> patch. Does it sound like a plan?
>
>
> Have you already checked how big this first change will be? I'm afraid it
> will be huge because we have a lot of manually formatted code which doesn't
> confirm to any specific style (e.g. macros). We also have consistent but
> different code styles in various parts of HS. So changing all the sources
> to conform to a single style will necessarily be quite invasive and I'm not
> sure if we want that?
>
>
> In theory I support using a formatting tool to maintain a single code style throughout the hotspot code, but in practice this is extremely difficult. Volker brings up one point here, the situation today is bad - really bad - from a formatter's perspective. The size of the initial change will be huge. To review that requires people from all affected areas to look in detail on their code to make sure nothing is broken. Andrew brings up a different point in his mail; parts of the code is manually formatted for a reason. No automatic formatter will be able to make sense of all code if it applies the same rules all over.
>
> I have looked into this problem for quite some time and I believe we have to take a very slow approach to this. Not only to be able to actually review the changes, but also to keep our developers around. First, any changes we make to clean up formatting must be automatically verified from that point forward to stop people from pushing new code that violates the rules we have managed to agree on. This requires a tool that can enable individual checks one by one. Secondly, the tool can't go in and modify people's code, it can only verify that it is correct and give a warning if there is a violation. This means that what we are looking for is a validator, not a formatter. There must also be the option to disable validation for parts of the code.
>
> I have not found any such tools out there, one that can accept the hotspot source code as is without any remarks if all checks are disabled, and then let you enable one check a time as we gradually clean up the code. The best I could find was uncrustify, but it still is a formatter, not a validator, and there are a few style rules that can't be disabled. I asked the developers of uncrustify if there was any chance of having the option for it to be a validator instead but that didn't seem to be in their interest.
>
> As I didn't find one I started to implement my own validator a few years ago. I do have a tool that swallows most of the hotspot code as is, but there are a few issues with macro usages. You see, that is another issue we have to deal with. It's not enough to parse C++. If we want to read and format the code we need to treat our macros as part of the language as well. Hotspot has a few macros that make for real challenges for a parser :-) I haven't yet included the option to disable verification for a block of code, coming to think of it, that could actually solve the most weird macro usages we have out there. I'll take another look at that at some point.
>
> Until we have a tool of this kind I'm afraid it will be extremely difficult to enforce any style rules in practice. But I do agree that this is a goal worth pursuing.
>
> Cheers,
> /Jesper
>
>
>
>
>
> Reference:
> [1] https://clang.llvm.org/docs/ClangFormat.html <https://urldefense.com/v3/__https://clang.llvm.org/docs/ClangFormat.html__;!!GqivPVa7Brio!OOfqbHl7BR-FAqgxbdaX1tTUPQasjG5INyltu654bQzTLbZuu6vEMxNnTJl0tGTIfZEEix4$>
> Thanks,
> --lx
More information about the hotspot-dev
mailing list