RFR: 8365053: Refresh hotspot precompiled.hpp with headers based on current frequency [v11]

Thu Aug 21 10:13:57 UTC 2025

On Tue, 12 Aug 2025 11:32:42 GMT, Francesco Andreuzzi <duke at openjdk.org> wrote:

>> In this PR I propose to refresh the included headers in hotspot `precompiled.hpp`. The current set of precompiled headers was refreshed in 2018, 7 years ago. I repeated the same operations and measurements after refreshing the set of precompiled headers according to the current usage frequency.
>> 
>> These are the results I observed. Depending on the platform, the improvement is between 10 and 20% in terms of total work (user+sys). The results are in seconds.
>> 
>> 
>> linux-x64 GCC
>> master      real 81.39 user 3352.15 sys 287.49
>> JDK-8365053 real 81.94 user 3030.24 sys 295.82
>> 
>> linux-x64 Clang
>> master      real 43.44 user 2082.93 sys 130.70
>> JDK-8365053 real 38.44 user 1723.80 sys 117.68
>> 
>> linux-aarch64 GCC
>> master      real 1188.08 user 2015.22 sys 175.53
>> JDK-8365053 real 1019.85 user 1667.45 sys 171.86
>> 
>> linux-aarch64 clang
>> master      real 981.77 user 1645.05 sys 118.60
>> JDK-8365053 real 791.96 user 1262.92 sys 101.50
>
> Francesco Andreuzzi has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - conditional includes
>  - variants

I think I found a more sensible approach to this problem. With clang's [`-ftime-trace`](https://clang.llvm.org/docs/analyzer/developer-docs/PerformanceInvestigation.html#performance-analysis-using-ftime-trace) we can get a report in [Trace Event format](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview?tab=t.0) for each compiled file, showing how much time was spent on each header. Here is an example of one such file: [shenandoahOldGC.json](https://github.com/user-attachments/files/21915502/shenandoahOldGC.json).
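To give an idea of what can be extracted from one of these files, here is a minimal Python sketch that sums the time attributed to each header. It assumes that header parsing is recorded as complete events (`"ph": "X"`) named `Source`, with the header path in `args.detail` and `dur` in microseconds; the exact event layout may differ between clang versions. The function name and the sample data are made up for illustration.

```python
import json
from collections import defaultdict

def header_times_ms(trace_text):
    """Sum the duration of "Source" events per header, in milliseconds."""
    trace = json.loads(trace_text)
    totals = defaultdict(float)
    for event in trace.get("traceEvents", []):
        # Assumption: one complete event ("ph": "X") named "Source" per
        # included header, path in args.detail, "dur" in microseconds.
        if event.get("ph") == "X" and event.get("name") == "Source":
            totals[event["args"]["detail"]] += event["dur"] / 1000.0
    return dict(totals)

# Hypothetical, hand-written trace with two header events and one other event.
sample = json.dumps({"traceEvents": [
    {"ph": "X", "name": "Source", "dur": 918000,
     "args": {"detail": "oops/access.inline.hpp"}},
    {"ph": "X", "name": "Source", "dur": 343000,
     "args": {"detail": "oops/oop.inline.hpp"}},
    {"ph": "X", "name": "Frontend", "dur": 2000000},
]})

print(header_times_ms(sample))
```

Running the same function over every trace file of a build and adding up the results would give per-header totals similar in spirit to what `ClangBuildAnalyzer` reports.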

These files can be processed (e.g. with [ClangBuildAnalyzer](https://github.com/aras-p/ClangBuildAnalyzer/tree/main)) to find out where time was spent during the build. Among the information `ClangBuildAnalyzer` provides, the most relevant part is the list of expensive headers:

**** Expensive headers:
597169 ms: /jdk/src/hotspot/share/oops/access.inline.hpp (included 650 times, avg 918 ms), included via:
  80x: oop.inline.hpp iterator.inline.hpp 
  70x: javaClasses.inline.hpp 
  40x: jfrEvents.hpp jfrEventClasses.hpp jfrEvent.hpp jfrNativeEventWriter.hpp jfrEventWriterHost.inline.hpp jfrEventWriterHost.hpp jfrWriterHost.inline.hpp jfrTraceId.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp 
  39x: shenandoahHeap.inline.hpp javaClasses.inline.hpp 
  32x: g1CollectedHeap.inline.hpp g1ConcurrentMark.inline.hpp g1ConcurrentMarkBitMap.inline.hpp markBitMap.inline.hpp oop.inline.hpp iterator.inline.hpp 
  30x: ciUtilities.inline.hpp interfaceSupport.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp 
  ...

425714 ms: /jdk/src/hotspot/share/memory/iterator.inline.hpp (included 646 times, avg 659 ms), included via:
  80x: oop.inline.hpp 
  70x: javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp 
  40x: jfrEvents.hpp jfrEventClasses.hpp jfrEvent.hpp jfrNativeEventWriter.hpp jfrEventWriterHost.inline.hpp jfrEventWriterHost.hpp jfrWriterHost.inline.hpp jfrTraceId.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp 
  39x: shenandoahHeap.inline.hpp javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp 
  32x: g1CollectedHeap.inline.hpp g1ConcurrentMark.inline.hpp g1ConcurrentMarkBitMap.inline.hpp markBitMap.inline.hpp oop.inline.hpp 
  30x: ciUtilities.inline.hpp interfaceSupport.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp 
  ...

400304 ms: /jdk/src/hotspot/share/oops/oop.inline.hpp (included 1165 times, avg 343 ms), included via:
  80x: <direct include>
  70x: javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp 
  66x: oop.inline.hpp iterator.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp 
  60x: javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp iterator.inline.hpp instanceKlass.inline.hpp klass.inline.hpp classLoaderData.inline.hpp 
  40x: jfrEvents.hpp jfrEventClasses.hpp jfrEvent.hpp jfrNativeEventWriter.hpp jfrEventWriterHost.inline.hpp jfrEventWriterHost.hpp jfrWriterHost.inline.hpp jfrTraceId.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp 
  39x: shenandoahHeap.inline.hpp javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp 
  ...

[...]

This should give us a clear understanding of which headers should go into `precompiled.hpp`. It uses all the information available from the compiler itself, as opposed to just counting the number of inclusions. For now, the improvements in build time are comparable with those of the initial approach I tried in this PR, but I think this approach will prove more accurate in the long term.
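As a small illustration of why the two criteria can disagree, the following Python sketch ranks the three headers from the `ClangBuildAnalyzer` output above, once by inclusion count and once by total time. The numbers are copied from that output; the variable names are only for illustration.

```python
# (header, inclusion count, total time in ms), copied from the
# "Expensive headers" section of the ClangBuildAnalyzer output above.
headers = [
    ("oops/access.inline.hpp", 650, 597169),
    ("memory/iterator.inline.hpp", 646, 425714),
    ("oops/oop.inline.hpp", 1165, 400304),
]

# Ranking by how often each header is included.
by_count = [h[0] for h in sorted(headers, key=lambda h: h[1], reverse=True)]

# Ranking by how much compile time each header accounts for.
by_time = [h[0] for h in sorted(headers, key=lambda h: h[2], reverse=True)]

print("by count:", by_count)
print("by time: ", by_time)
```

By count, `oop.inline.hpp` comes first, while by total time it comes last of the three and `access.inline.hpp` leads. Note that these totals probably overlap when one header includes another, as the include chains above suggest, so they should not simply be added together.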

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26681#issuecomment-3209879924