Stop using precompiled headers for Linux?
Magnus Ihse Bursie
magnus.ihse.bursie at oracle.com
Fri Nov 2 11:14:17 UTC 2018
On 2018-11-02 11:39, Magnus Ihse Bursie wrote:
> On 2018-11-02 00:53, Ioi Lam wrote:
>> Maybe precompiled.hpp can be periodically (weekly?) updated by a
>> robot, which parses the dependency files generated by gcc and picks
>> the most popular N files?
> I think that's tricky to implement automatically. However, I've done
> more or less that, and I've got some wonderful results! :-)
Ok, I'm done running my tests.
TL;DR: I've managed to reduce wall-clock time from 2m 45s (with pch) or
2m 23s (without pch), to 1m 55s. The cpu time spent went from 52m 27s
(with pch) or 55m 30s (without pch) to 41m 10s. This is a huge gain for
our automated builds! And a clear improvement even for the ordinary
developer.
The list of included header files is reduced to just 37. The winning
combination was to include all header files that were included in at
least 130 different files, but to exclude all files named
"*.inline.hpp". Hopefully, a further gain of not pulling in the
*.inline.hpp files is that the risk of pch/non-pch failures will diminish.
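
For illustration, a selection like this could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for the numbers in this mail. It assumes gcc-style Makefile dependency (.d) files with paths that are already normalized and contain no spaces or colons, and it reads the cutoff as "on or above". The function names parse_dep_file and popular_headers are made up for this example.

```python
from collections import Counter


def parse_dep_file(text):
    """Return the set of prerequisite paths listed in one gcc-style .d file."""
    # Join backslash-continued lines so each rule sits on a single line.
    joined = text.replace("\\\n", " ")
    deps = set()
    for line in joined.splitlines():
        if ":" not in line:
            continue
        # Everything after the first colon is the prerequisite list.
        _, _, prereqs = line.partition(":")
        deps.update(prereqs.split())
    return deps


def popular_headers(dep_texts, cutoff, skip_inline=True):
    """Headers included by at least `cutoff` compilation units.

    Assumption: one .d file per compilation unit, and a header counts
    once per unit no matter how it was pulled in.
    """
    counts = Counter()
    for text in dep_texts:
        headers = {d for d in parse_dep_file(text) if d.endswith(".hpp")}
        counts.update(headers)
    return sorted(
        h for h, n in counts.items()
        if n >= cutoff and not (skip_inline and h.endswith(".inline.hpp"))
    )
```

Calling popular_headers(texts, 130) on the contents of all .d files from a build would give a candidate list in the spirit of "on or above 130, without inline". Passing skip_inline=False keeps the *.inline.hpp files in.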
However, these 37 files in turn pull in an additional 201 header files.
Of these, three are *.inline.hpp:
share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdBits.inline.hpp,
os_cpu/linux_x86/bytes_linux_x86.inline.hpp and
os_cpu/linux_x86/copy_linux_x86.inline.hpp. This looks like a problem
with the header files to me.
With some exceptions (mostly related to JFR), these additional 200 files
have "generic" looking names (like share/gc/g1/g1_globals.hpp), which
indicates to me that it is reasonable to have them in this list, just as
the original 37 tended to be quite general and high-level
includes. However, some files (like
share/jfr/instrumentation/jfrEventClassTransformer.hpp) may have leaked
in where they should not really be. It might be worth letting a hotspot
engineer spend some cycles checking these files to see if anything
can be improved.
Caveats: I have only run this on my local linux build with the default
server JVM configuration. Other machines will have different sweet
spots. Other JVM variants/feature combinations will have different sweet
spots. And, most importantly, I have not tested this at all on Windows.
Nevertheless, I'm almost prepared to suggest a patch that uses this
selection of files if running on gcc, just as is, because of the speed
improvements I measured.
And some data:
Here is my log from my runs. The "on or above" is the cutoff I used: the
minimum number of files that had to include a header for it to be
selected. As you can see, there is not much difference between cutoffs
in the 130-150 range, or (without the inline files) between 110 and 150.
(There were a lot of additional inline files in the positions below
130.) All else being equal, I'd prefer a solution with fewer files. That
is less likely to go bad.
Configuration                                     real        user         sys
hotspot with original pch                         2m45.623s   52m27.813s   5m27.176s
hotspot without pch                               2m23.837s   55m30.448s   3m39.739s
hotspot new pch on or above 250                   1m59.533s   42m50.019s   3m0.893s
hotspot new pch on or above 200                   1m58.937s   42m18.994s   3m0.245s
hotspot new pch on or above 170                   2m0.729s    42m16.636s   2m57.125s
hotspot new pch on or above 150                   1m58.064s   42m9.618s    2m57.635s
hotspot new pch on or above 130                   1m58.053s   42m9.796s    2m58.732s
hotspot new pch on or above 100                   2m3.364s    42m54.818s   3m2.737s
hotspot new pch on or above 70                    2m6.698s    44m30.434s   3m12.015s
hotspot new pch on or above 150 without inline    2m0.598s    41m17.810s   2m56.258s
hotspot new pch on or above 130 without inline    1m55.981s   41m10.076s   2m51.983s
hotspot new pch on or above 110 without inline    1m56.449s   41m10.667s   2m53.808s
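
To put the timings above in relative terms, the "time"-style stamps can be converted to seconds and compared against a baseline. The following is a small sketch of that arithmetic only. The helper names to_seconds and percent_saved are invented for this example, and it assumes every stamp has the "<minutes>m<seconds>s" shape seen in the log.

```python
import re


def to_seconds(stamp):
    """Convert a stamp such as '2m45.623s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_saved(baseline, candidate):
    """Percentage by which `candidate` is shorter than `baseline`."""
    base = to_seconds(baseline)
    return 100.0 * (base - to_seconds(candidate)) / base


# Wall-clock time of "on or above 130 without inline" against both baselines.
vs_original_pch = percent_saved("2m45.623s", "1m55.981s")
vs_no_pch = percent_saved("2m23.837s", "1m55.981s")
```

With the figures from the log, this works out to roughly 30% less wall-clock time than the original pch build and roughly 19% less than the build without pch.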
And here is the "winning" list (which I declared as "on or above 130,
without inline"). I encourage everyone to try this on their own system,
and report back the results!
#ifndef DONT_USE_PRECOMPILED_HEADER
# include "classfile/classLoaderData.hpp"
# include "classfile/javaClasses.hpp"
# include "classfile/systemDictionary.hpp"
# include "gc/shared/collectedHeap.hpp"
# include "gc/shared/gcCause.hpp"
# include "logging/log.hpp"
# include "memory/allocation.hpp"
# include "memory/iterator.hpp"
# include "memory/memRegion.hpp"
# include "memory/resourceArea.hpp"
# include "memory/universe.hpp"
# include "oops/instanceKlass.hpp"
# include "oops/klass.hpp"
# include "oops/method.hpp"
# include "oops/objArrayKlass.hpp"
# include "oops/objArrayOop.hpp"
# include "oops/oop.hpp"
# include "oops/oopsHierarchy.hpp"
# include "runtime/atomic.hpp"
# include "runtime/globals.hpp"
# include "runtime/handles.hpp"
# include "runtime/mutex.hpp"
# include "runtime/orderAccess.hpp"
# include "runtime/os.hpp"
# include "runtime/thread.hpp"
# include "runtime/timer.hpp"
# include "services/memTracker.hpp"
# include "utilities/align.hpp"
# include "utilities/bitMap.hpp"
# include "utilities/copy.hpp"
# include "utilities/debug.hpp"
# include "utilities/exceptions.hpp"
# include "utilities/globalDefinitions.hpp"
# include "utilities/growableArray.hpp"
# include "utilities/macros.hpp"
# include "utilities/ostream.hpp"
# include "utilities/ticks.hpp"
#endif // !DONT_USE_PRECOMPILED_HEADER
/Magnus
>
> I'd still like to run some more tests, but preliminary data indicates
> that there is much to be gained by having a more sensible list of
> files in the precompiled header.
>
> The fewer files we have on this list, the less likely it is to become
> (drastically) outdated. So I don't think we need to do this
> automatically, but perhaps manually every now and then when we feel
> build times are increasing.
>
> /Magnus
>
>>
>> - Ioi
>>
>>
>> On 11/1/18 4:38 PM, David Holmes wrote:
>>> It's not at all obvious to me that the way we use PCH is the
>>> right/best way to use it. We dump every header we think it would be
>>> good to precompile into precompiled.hpp and then only ask gcc to
>>> precompile it. That results in a ~250MB file that has to be read
>>> into and processed for every source file! That doesn't seem very
>>> efficient to me.
>>>
>>> Cheers,
>>> David
>>>
>>> On 2/11/2018 3:18 AM, Erik Joelsson wrote:
>>>> Hello,
>>>>
>>>> My point here, which wasn't very clear, is that Mac and Linux seem
>>>> to lose just as much real compile time. The big difference in these
>>>> tests was rather the number of cpus in the machine (32 threads in
>>>> the linux box vs 8 on the mac). The total amount of work done was
>>>> increased when PCH was disabled, that's the user time. Here is my
>>>> theory on why the real (wall clock) time was not consistent with
>>>> user time between these experiments:
>>>>
>>>> With pch the time line (simplified) looks like this:
>>>>
>>>> 1. Single thread creating PCH
>>>> 2. All cores compiling C++ files
>>>>
>>>> When disabling pch it's just:
>>>>
>>>> 1. All cores compiling C++ files
>>>>
>>>> To gain speed with PCH, the time spent in 1 must be less than the
>>>> time saved in 2. The potential time saved in 2 goes down as the
>>>> number of cpus goes up. I'm pretty sure that if I repeated the
>>>> experiment on Linux on a smaller box (typically one we use in CI),
>>>> the results would look similar to Macosx, and similarly, if I had
>>>> access to a much bigger mac, it would behave like the big Linux
>>>> box. This is why I'm saying this should be done for both or none of
>>>> these platforms.
>>>>
>>>> In addition to this, the experiment only built hotspot. If we
>>>> would instead build the whole JDK, then the time wasted in 1 in the
>>>> PCH case would be negated to a large extent by other build targets
>>>> running concurrently, so for a full build, PCH is still providing
>>>> value.
>>>>
>>>> The question here is that if the value of PCH isn't very big,
>>>> perhaps it's not worth it if it's also creating as much grief as
>>>> described here. There is no doubt that there is value however. And
>>>> given the examination done by Magnus, it seems this value could be
>>>> increased.
>>>>
>>>> The main reason why we haven't disabled PCH in CI before is this: we
>>>> really, really want to get CI builds fast. We don't have a ton of
>>>> over capacity to just throw at it. PCH made builds faster, so we
>>>> used them. My other reason is consistency between builds.
>>>> Supporting multiple different modes of building creates the
>>>> potential for inconsistencies. For that reason I would definitely
>>>> not support having PCH on by default, but turned off in our
>>>> CI/dev-submit. We pick one or the other as the official build
>>>> configuration, and we stick with the official build configuration
>>>> for all builds of any official capacity (which includes CI).
>>>>
>>>> In the current CI setup, we have a bunch of tiers that execute one
>>>> after the other. The jdk-submit currently only runs tier1. In tier2
>>>> I've put slowdebug builds with PCH disabled, just to help verify a
>>>> common developer configuration. These builds are not meant to be
>>>> used for testing or anything like that, they are just run for
>>>> verification, which is why this is ok. We could argue that it would
>>>> make sense to move the linux-x64-slowdebug without pch build to
>>>> tier1 so that it's included in dev-submit.
>>>>
>>>> /Erik
>>>>
>>>> On 2018-11-01 03:38, Magnus Ihse Bursie wrote:
>>>>>
>>>>>
>>>>> On 2018-10-31 00:54, Erik Joelsson wrote:
>>>>>> Below are the corresponding numbers from a Mac, (Mac Pro (Late
>>>>>> 2013), 3.7 GHz, Quad-Core Intel Xeon E5, 16 GB). To be clear, the
>>>>>> -npch is without precompiled headers. Here we see a slight
>>>>>> degradation when disabling on both user time and wall clock time.
>>>>>> My guess is that the user time increase is about the same, but
>>>>>> because of a lower cpu count, the extra load is not as easily
>>>>>> covered.
>>>>>>
>>>>>> These tests were run with just building hotspot. This means that
>>>>>> the precompiled header is generated alone on one core while
>>>>>> nothing else is happening, which would explain this degradation
>>>>>> in build speed. If we were instead building the whole product, we
>>>>>> would see a better correlation between user and real time.
>>>>>>
>>>>>> Given the very small benefit here, it could make sense to disable
>>>>>> precompiled headers by default for Linux and Mac, just as we did
>>>>>> with ccache.
>>>>>>
>>>>>> I do know that the benefit is huge on Windows though, so we
>>>>>> cannot remove the feature completely. Any other comments?
>>>>>
>>>>> Well, if you show that it is a loss in time on macosx to disable
>>>>> precompiled headers, and no-one (as far as I've seen) has
>>>>> complained about PCH on mac, then why not keep them on as default
>>>>> there? That the gain is small is no argument to lose it. (I
>>>>> remember a time when you were hunting seconds in the build time ;-))
>>>>>
>>>>> On linux, the story seems different, though. People experience PCH
>>>>> as a problem, and there is a net loss of time, at least on
>>>>> selected testing machines. It makes sense to turn it off as
>>>>> default, then.
>>>>>
>>>>> /Magnus
>>>>>
>>>>>>
>>>>>> /Erik
>>>>>>
>>>>>> macosx-x64
>>>>>> real 4m13.658s
>>>>>> user 27m17.595s
>>>>>> sys 2m11.306s
>>>>>>
>>>>>> macosx-x64-npch
>>>>>> real 4m27.823s
>>>>>> user 30m0.434s
>>>>>> sys 2m18.669s
>>>>>>
>>>>>> macosx-x64-debug
>>>>>> real 5m21.032s
>>>>>> user 35m57.347s
>>>>>> sys 2m20.588s
>>>>>>
>>>>>> macosx-x64-debug-npch
>>>>>> real 5m33.728s
>>>>>> user 38m10.311s
>>>>>> sys 2m27.587s
>>>>>>
>>>>>> macosx-x64-slowdebug
>>>>>> real 3m54.439s
>>>>>> user 25m32.197s
>>>>>> sys 2m8.750s
>>>>>>
>>>>>> macosx-x64-slowdebug-npch
>>>>>> real 4m11.987s
>>>>>> user 27m59.857s
>>>>>> sys 2m18.093s
>>>>>>
>>>>>>
>>>>>> On 2018-10-30 14:00, Erik Joelsson wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> On 2018-10-30 13:17, Aleksey Shipilev wrote:
>>>>>>>> On 10/30/2018 06:26 PM, Ioi Lam wrote:
>>>>>>>>> Is there any advantage of using precompiled headers on Linux?
>>>>>>>> I have measured it recently on shenandoah repositories, and
>>>>>>>> fastdebug/release build times have not
>>>>>>>> improved with or without PCH. Actually, it gets worse when you
>>>>>>>> touch a single header that is in PCH
>>>>>>>> list, and you end up recompiling the entire Hotspot. I would be
>>>>>>>> in favor of disabling it by default.
>>>>>>> I just did a measurement on my local workstation (2x8 cores x2
>>>>>>> ht Ubuntu 18.04 using Oracle devkit GCC 7.3.0). I ran "time make
>>>>>>> hotspot" with clean build directories.
>>>>>>>
>>>>>>> linux-x64:
>>>>>>> real 4m6.657s
>>>>>>> user 61m23.090s
>>>>>>> sys 6m24.477s
>>>>>>>
>>>>>>> linux-x64-npch
>>>>>>> real 3m41.130s
>>>>>>> user 66m11.824s
>>>>>>> sys 4m19.224s
>>>>>>>
>>>>>>> linux-x64-debug
>>>>>>> real 4m47.117s
>>>>>>> user 75m53.740s
>>>>>>> sys 8m21.408s
>>>>>>>
>>>>>>> linux-x64-debug-npch
>>>>>>> real 4m42.877s
>>>>>>> user 84m30.764s
>>>>>>> sys 4m54.666s
>>>>>>>
>>>>>>> linux-x64-slowdebug
>>>>>>> real 3m54.564s
>>>>>>> user 44m2.828s
>>>>>>> sys 6m22.785s
>>>>>>>
>>>>>>> linux-x64-slowdebug-npch
>>>>>>> real 3m23.092s
>>>>>>> user 55m3.142s
>>>>>>> sys 4m10.172s
>>>>>>>
>>>>>>> These numbers support your claim. Wall clock time is actually
>>>>>>> increased with PCH enabled, but total user time is decreased.
>>>>>>> Does not seem worth it to me.
>>>>>>>>> It's on by default and we keep having
>>>>>>>>> breakage where someone would forget to add #include. The
>>>>>>>>> latest instance is JDK-8213148.
>>>>>>>> Yes, we catch most of these breakages in CIs. Which tells me
>>>>>>>> adding it to jdk-submit would cover
>>>>>>>> most of the breakage during pre-integration testing.
>>>>>>> jdk-submit is currently running what we call "tier1". We do have
>>>>>>> builds of Linux slowdebug with precompiled headers disabled in
>>>>>>> tier2. We also build solaris-sparcv9 in tier1 which does not
>>>>>>> support precompiled headers at all, so to not be caught in
>>>>>>> jdk-submit you would have to be in Linux specific code. The
>>>>>>> example bug does not seem to be that. Mach5/jdk-submit was down
>>>>>>> over the weekend and yesterday so my suspicion is the offending
>>>>>>> code in this case was never tested.
>>>>>>>
>>>>>>> That said, given that we get practically no benefit from PCH on
>>>>>>> Linux/GCC, we should probably just turn it off by default for
>>>>>>> Linux and/or GCC. I think we need to investigate Macos as well
>>>>>>> here.
>>>>>>>
>>>>>>> /Erik
>>>>>>>> -Aleksey
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>