Propose to use clang-format to enforce hotspot codestyle

Fri Mar 13 08:30:25 UTC 2020

Hi,

First of all, I have to admit that using one huge patch to correct all inconsistencies is a bad idea, and I take it back. It would devastate the annotation information, which is more important than style.
Thanks to John for sharing his rationale and a bit of the history behind the blog. I really fell into the trap you left. The very long lines that Jesper criticized in my code came from my reading of “There is no hard line length limit” in the wiki: I thought it meant unlimited. Now I know it is deliberately ambiguous and left to each developer's discretion. It is also very hard to settle on a value for AlignConsecutiveAssignments.

I will step back and use a .clang-format file only to format my new code locally. I get it now: maintaining HotSpot code style is a distributed effort, not a matter of authoritarian rules. I really appreciate the input, which is very valuable to me.

Finally, I would like to share a trick I discovered with clang-format. I think you can use clang-format from Emacs as well.
I was told to keep include headers in alphabetical order. That is mechanical work, and I found it pretty handy to get it done with IncludeCategories.
https://cr.openjdk.java.net/~xliu/8240834/webrev/src/hotspot/.clang-format.html
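
As a rough illustration of the IncludeCategories approach, a .clang-format file could sort includes along the lines below. This is a hypothetical fragment, not the contents of the webrev above. It assumes a convention where "precompiled.hpp" comes first and all other headers follow in alphabetical order; the regexes and priority values are illustrative only.

```yaml
# Hypothetical .clang-format fragment (illustration only, not the webrev file).
# Assumption: "precompiled.hpp" must stay first; all other headers are
# simply kept in alphabetical order.
SortIncludes: true          # sort the #include lines of each block
IncludeBlocks: Preserve     # sort within existing blocks; do not merge or regroup them
IncludeCategories:
  # A negative priority sorts ahead of every other category.
  - Regex:    '^"precompiled\.hpp"'
    Priority: -1
  # Everything else shares one category, so it is ordered alphabetically.
  - Regex:    '.*'
    Priority: 1
```

With such a file in a parent directory of the source, something like `clang-format -style=file -i` on a newly written file should reorder its include blocks without touching the rest of the tree.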

thanks,
--lx

From: John Rose <john.r.rose at oracle.com>
Date: Thursday, March 12, 2020 at 8:28 PM
To: Andrew Haley <aph at redhat.com>
Cc: "Liu, Xin" <xxinliu at amazon.com>, Jesper Wilhelmsson <jesper.wilhelmsson at oracle.com>, HotSpot Open Source Developers <hotspot-dev at openjdk.java.net>
Subject: RE: [EXTERNAL]Propose to use clang-format to enforce hotspot codestyle

On Mar 12, 2020, at 2:36 AM, Andrew Haley <aph at redhat.com> wrote:

No, please do not do this.

It will mess up the diffs, and it will make auto-merging fail. It will
add considerable costs to the updates project for every backport. It
is in every way a bad idea.

+100

I think one influence in discussions like this one is that they
turn on the “prescriptive vs. descriptive” distinction discovered
by linguists, but applied (maybe surprisingly) to software notations.
It may take a while to realize it, but this distinction is relevant,
hard to resolve, and subtle.

Here’s a primer from the linguistic world:

https://www.ling.upenn.edu/courses/ling001/prescription.html

Short version:  Languages (and code), as used by humans, vary
over time and place in every possible way.  Variation must be
controlled to preserve mutual intelligibility. *Prescriptivists*
try to rescue the system by saying “hey everybody just follow
my rules and we’ll be good; I promise all the good users of
this language have already been following them”.  Meanwhile,
people correct each other’s pronunciation and grammar as
it occurs to them, and write dictionaries and grammar books
to make this easier.  These books are not usually ammo for
prescriptivists, but rather *descriptions* of how people
actually behave, at scale.  Prescription is brittle, since it doesn’t
take special cases into account, and requires a central ruling body,
an “Academy” to define, evolve, and adjudicate.  Description
doesn’t need an Academy because it cedes authority to the
speakers as a whole to correct and evolve, and to put up or
tear down standards (dictionaries, etc.).

So when someone comes along and says, “You forgot the
style rule book but luckily I have one for you”, you are
dealing with a prescriptivist.  If you say, “wait, that doesn’t
describe what we are doing”, perhaps from a descriptivist
viewpoint, you will be talking at cross purposes for a while.
The prescriptivist will say, “that’s OK, I’ll change my rules
for your practice”, but still assume that a central Academy
(of some sort) is the obvious right answer.  That’s what
seems to be going on here.  Liu clearly wants to be helpful
and to contribute something valuable to this project, but
in some of his proposals he seems to assume that a
mechanical solution will help, and such solutions have
well-known, deeply embedded drawbacks.  I think style has to be
managed by humans, not robots, and in a distributed manner, not
a centralized one.

You cannot easily change *any* rules in a large software project,
can you?  That goes for language choice, build tools, SCM, naming
conventions, and even whitespace use.  If we were to use a strong
prescriptive style guide, we would have to have made this decision
at the beginning of the project, or else make a decision to phase it
in very slowly, like we phase in tool chains, SCM (see how hard
that is right now?), language dialects (C++11 anyone?), and so on.
The various objections on this thread to reformatting the source
base are evidence of the sort of work we’d have to do, over months
and months, to guide the project into the purified style.  I think
the consensus will be, unless someone pulls an unprecedented
rabbit out of a hat, that it’s not worth the effort:  We can continue
to tolerate style noise, as long as we have some confidence that
we can catch the worst offenders at review time.  Prescriptivists
and descriptivists agree that there are folks out there whose
noise-to-signal ratio needs to be lowered; they disagree about
what to do with it, between the extremes of “have the Academy’s
robot enforcer refuse their PR”, or “trust reviewers to apply
social pressure, backed by existing descriptions of known-good
behavior”.  There are other points on the spectrum, such as
hyper-P “require certification before someone can talk” or
hyper-D “If I don’t understand it it must be my fault”, and
a middle ground (for us) “have a robot flag *possible* style
infractions in the PR, for reviewers to comment on”.  I think
that’s the most a robot can do (usefully), and only on future code,
not retroactively.

IMO, in the end, I’m a descriptivist (like most practitioners)
because I think we can get along better on balance without
an Academy, certainly in this matter.  (The Java and JVM specs.,
OTOH, are and have always been prescriptive.)  I’m not hyper-D
either, because, after all, I find style noise distracting and hope
folks will agree to reduce it… which is one reason I wrote the
HotSpot Style Guide.

— John

P.S. <memory-lane>

I was the original author of something that ended up
posted at https://wiki.openjdk.java.net/display/HotSpot/StyleGuide.
I took this task on because, at an early point in the project,
there were signs that styles would diverge greatly, making it
hard (yes, harder than today) to decide which example to
follow, to make one’s new code “fit in” with old code.

(Disclaimer:  I’m talking HotSpot only, not JDK.  I never
had anything to do with Java style rules.  In those days
they were separate projects.)

At that moment, most of HotSpot had been developed by
a few engineers in a room, with one of them asking the
team to make coherent choices about naming and formatting.
Their choices are still with us today.  Also at that moment,
the team was quickly growing, and the personal influence
of the original team was waning.  Some of the new coders
were determined to code in their own special style and
the code base was getting noisy (in style-space).

Linguists note that languages diverge at all scales, and
in the setting of expanding (or fragmenting) social relations.
The smaller groups can exert more control over their
common conventions, while larger ones find it more difficult
to exert control, and usually opt for less control (more
diversity), instead of expensive and brittle central controls
like Academies.  This happened as our project grew.

I wrote the style guide neither because (a) I agreed that
the original style was perfect and wanted to be the new
Style Policeman, nor because (b) I aspired to a better style
that everyone else would adopt as soon as I explained it to
them.  I just wanted (c) to take practical steps to reduce
style-noise.  And it worked, I suppose:  There have been
times when a reviewer said, “Hey that’s style-noisy, and
you can fix it by reading the online Style Guide.”

One of my sneakier goals in writing the style guide was
making it difficult for future would-be Style Police to use it
to browbeat the rest of us into taking a step such as is
being proposed today, to reformat the code base in the
name of “efficiency” or “consistency” or whatever.
I’m also averse to confrontation, so it was easy to hope
that even the style-noisy folks in our group (/opto/) would
on balance contribute better if we let them set their own
rules in their own area.  I think that was the case, but as
a result we now have parts of the system which have their
special uglinesses.

But a Style Guide With Teeth or Style Enforcer Robot would
have been almost useless in attacking the root causes of
teams having trouble working together.  The different
whitespace styles are the tip of an iceberg of differences
between design rules, algorithm formulation, workflow,
and more.  A Style-Police Robot would affect, within its purview
of style correction, about 5% of the cross-team dynamics that
swirl across a multi-group team, dynamics which must be dealt
with via ad hoc social negotiations; robots are useless, and only
the simplest rules (like “be kind”) are deeply useful.

Anyway, that’s why the HotSpot Style Guide says not so
much “here’s what you must always do” but more “here
are some good things to do”, and then “follow local
conventions if they exist”.

One more influence:  I love that George Orwell said,
in his style guide “Politics and the English Language”,
at the end of a short list of rules:  “Break any of these
rules sooner than say anything outright barbarous.”
By “barbarous” I think he means something that your
reader will have trouble understanding (root meaning
of the term “barbarous”), or that would somehow cause a
distraction.  I mentally tip my hat to him when I break
a small rule to make something look (as I think) more
intelligible.  I’m often wrong, and Coleen or David will
quickly correct me during review.  Meanwhile a robot
would just sit there counting whitespaces, oblivious.

</memory-lane>