RFR: 8365306: Provide OS Process Size and Libc statistic metrics to JFR
Thomas Stuefe
stuefe at openjdk.org
Sat Aug 16 04:21:09 UTC 2025
This provides the following new metrics:
- `ProcessSize` event (new, periodic)
  - vsize (for analyzing address-space fragmentation issues)
  - RSS including subtypes (subtypes are useful for excluding atypical issues, e.g. kernel problems that cause large file buffer bloat)
  - peak RSS
  - process swap (if we swap we cannot trust the RSS values, plus it indicates bad sizing)
  - pte size (to quickly see if we run with a super-large working set but an unsuitably small page size)
- `LibcStatistics` (new, periodic)
  - outstanding malloc size (important counterpoint to whatever NMT tries to tell me, which alone is often misleading)
  - retained malloc size (super-important for the same reason)
  - number of libc trims HotSpot executed (needed to gauge the usefulness of the retained malloc size, and to see whether a customer employs native heap auto-trimming via `-XX:TrimNativeHeapInterval`)
- `NativeHeapTrim` (new, event-driven) (for both manual and automatic trims)
  - RSS before and RSS after
  - RSS recovered by this trim
  - whether it was an automatic or manual trim
  - duration
- `JavaThreadStatistic`
  - OS thread counter (new field) (useful to understand the behavior of third-party code in our process if threads are created that bypass the JVM. E.g. some custom launchers do that.)
  - nonJava thread counter (new field) (needed to interpret the OS thread counter)
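
For orientation, here is a rough sketch of where the `ProcessSize` numbers can be found on Linux today, without JFR. It is not taken from the patch: the mapping of the metrics above to `/proc/self/status` fields (`VmSize`, `VmRSS`, `RssAnon`/`RssFile`/`RssShmem`, `VmHWM`, `VmSwap`, `VmPTE`) is my reading of proc(5), and whether the patch reads this file at all is an assumption. The class and method names are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only; not part of the patch.
class ProcStatusSketch {

    // Parses "Key:   <number> kB" lines as found in /proc/<pid>/status.
    // Values are returned in KiB; lines without a "kB" unit are skipped.
    static Map<String, Long> parseKb(String statusText) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : statusText.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts.length == 2 && parts[1].equals("kB")) {
                try {
                    result.put(key, Long.parseLong(parts[0]));
                } catch (NumberFormatException e) {
                    // Value is not a plain number; ignore this line.
                }
            }
        }
        return result;
    }

    // Page-table size as a percentage of RSS. A high value hints at a
    // large working set mapped with small pages.
    static double ptePercentOfRss(Map<String, Long> m) {
        long rss = m.getOrDefault("VmRSS", 0L);
        long pte = m.getOrDefault("VmPTE", 0L);
        return rss == 0 ? 0.0 : 100.0 * pte / rss;
    }

    public static void main(String[] args) throws Exception {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status not available on this platform");
            return;
        }
        Map<String, Long> m = parseKb(new String(Files.readAllBytes(status)));
        // Assumed mapping: vsize, RSS and its subtypes, peak RSS, swap, pte size.
        String[] keys = {"VmSize", "VmRSS", "RssAnon", "RssFile", "RssShmem",
                         "VmHWM", "VmSwap", "VmPTE"};
        for (String key : keys) {
            System.out.println(key + " = " + m.get(key) + " KiB");
        }
        System.out.println("pte as % of RSS = " + ptePercentOfRss(m));
    }
}
```

Older kernels do not report every one of these fields, in which case the sketch prints `null` for the missing ones.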
Notes:
- We already have a `ResidentSetSize` event, and the new `ProcessSize` event is a superset of it. I don't know how such cases are usually handled. I'd prefer to throw the old event out, but JMC has a hard-coded chart for RSS, so I kept it in unless someone tells me to remove it.
- Obviously, the libc events are very platform-specific. Still, I argue that these metrics are highly useful. We want people to use JFR and JMC; people include developers that are dealing with performance problems that require platform-specific knowledge to understand. See my comment in the JBS issue.
I provided implementations, as far as possible, for Linux, macOS and Windows.
Testing:
- ran the new tests manually and as part of GHAs
-------------
Commit messages:
- copyrights
- Windows
- MacOS tests
- fix mac
- wip
- mac implementation
- typo
- Add nonjava thread count; add test for thread statistics
- tests
- Add test to workflow
- ... and 11 more: https://git.openjdk.org/jdk/compare/dbae90c9...39e282ae
Changes: https://git.openjdk.org/jdk/pull/26756/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26756&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8365306
Stats: 778 lines in 28 files changed: 643 ins; 27 del; 108 mod
Patch: https://git.openjdk.org/jdk/pull/26756.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26756/head:pull/26756
PR: https://git.openjdk.org/jdk/pull/26756