RFR: 7903722: JMH: Add xctrace-based perfnorm profiler for macOS
Filipp Zhinkin
fzhinkin at openjdk.org
Tue May 7 06:06:26 UTC 2024
Implementation of a perfnorm-like profiler for macOS based on the `xctrace` command-line tool bundled with Xcode.
While the profiler is tested and seems to work well, I consider it a preliminary version and am open to discussion on what it should measure and how.
Currently, the profiler only supports sampling PMU counters using the `CPU Counters` instrument provided by the Instruments app / xctrace.
Unfortunately, the `CPU Counters` instrument has no default settings, unlike the `Time Profiler` and `CPU Profiler` instruments used by the recently merged `xctraceasm` profiler.
To use `CPU Counters`, a user has to create a template in the Instruments UI, select PMU events, save the template, and then supply it to `xctracenorm` as an argument.
This workflow not only prevents using the profiler without preliminary manual configuration, but also becomes tedious when measuring multiple events, as xctrace, unlike perf_events, does not support event multiplexing.
Thankfully, command-line-based configuration and default parameters can be emulated by building a custom Instruments package that imports data from `CPU Counters` and also supplies all required parameters.
As you can guess, there's no way to get information about supported PMU events directly from xctrace, but it can be fetched from KPEP database files stored in `/usr/share/kpep`.
`xctracenorm` relies on that data to validate events specified by a user, if any, and also to print a help message that gives some insights into what could be sampled.
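To illustrate the validation idea, here is a minimal sketch. It assumes the set of supported event names has already been extracted from a KPEP file and that there is a single fixed limit on how many events can be sampled at once. `PmuEventValidator` and its members are hypothetical names, not the code from the patch, and the real constraints on which events can share counters may well be more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not the actual JMH implementation.
final class PmuEventValidator {
    private final Set<String> supportedEvents; // assumed: names parsed from a KPEP file
    private final int maxCounters;             // assumed: simple cap on simultaneous events

    PmuEventValidator(Set<String> supportedEvents, int maxCounters) {
        this.supportedEvents = supportedEvents;
        this.maxCounters = maxCounters;
    }

    /** Returns human-readable problems; an empty list means the selection looks fine. */
    List<String> validate(List<String> requested) {
        List<String> problems = new ArrayList<>();
        Set<String> unique = new LinkedHashSet<>(requested); // ignore duplicates
        for (String event : unique) {
            if (!supportedEvents.contains(event)) {
                problems.add("Unsupported event: " + event);
            }
        }
        // No multiplexing: everything requested has to fit at the same time.
        if (unique.size() > maxCounters) {
            problems.add("Too many events: " + unique.size()
                    + " requested, at most " + maxCounters + " can be sampled at once");
        }
        return problems;
    }

    public static void main(String[] args) {
        PmuEventValidator validator = new PmuEventValidator(
                Set.of("INST_ALL", "CORE_ACTIVE_CYCLE", "INST_BRANCH", "BRANCH_MISPRED_NONSPEC"), 4);
        System.out.println(validator.validate(List.of("INST_ALL", "NO_SUCH_EVENT")));
    }
}
```

Reporting all problems at once, rather than failing on the first one, keeps the feedback loop short when a user mistypes several event names.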
To sum up, a few things were implemented to make the `xctracenorm` profiler work:
- CPU model detection using `sysctl`;
- KPEP file parsing to extract information about the PMU and all supported events;
- validation of the selected performance events;
- Instruments package building (generate XML, call a builder tool), with packages cached in `~/Library/Caches/org.openjdk.jmh`;
- xctrace execution, extraction of the resulting samples, and aggregation;
- samples postprocessing to calculate some additional metrics, like CPI and branch misprediction ratio.
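As a rough illustration of the first step, the sketch below parses the `key: value` lines that `sysctl` prints and shows how the tool could be invoked from Java. The class name `SysctlCpuInfo` is hypothetical, the choice of the `hw.cputype`, `hw.cpusubtype` and `hw.cpufamily` keys is my assumption about what is useful for identifying a CPU, and the sample values in `main` are made up. How such values map to a particular KPEP file is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not the actual JMH implementation.
final class SysctlCpuInfo {
    /** Parses "key: value" lines as printed by `sysctl key1 key2 ...`. */
    static Map<String, String> parse(String output) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int sep = line.indexOf(':'); // split on the first colon only
            if (sep > 0) {
                values.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
            }
        }
        return values;
    }

    /** Runs sysctl for the given keys; only meaningful on macOS. */
    static Map<String, String> query(String... keys) throws IOException, InterruptedException {
        String[] command = new String[keys.length + 1];
        command[0] = "sysctl";
        System.arraycopy(keys, 0, command, 1, keys.length);
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        String output;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            output = reader.lines().collect(Collectors.joining("\n"));
        }
        if (process.waitFor() != 0) {
            throw new IOException("sysctl failed: " + output);
        }
        return parse(output);
    }

    public static void main(String[] args) {
        // Made-up sample output; on a Mac one would call
        // query("hw.cputype", "hw.cpusubtype", "hw.cpufamily") instead.
        String sample = "hw.cputype: 16777228\nhw.cpusubtype: 2\n";
        System.out.println(parse(sample));
    }
}
```

Keeping parsing separate from process execution makes the parsing part testable on machines where `sysctl` does not expose these keys.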
Currently, if the user doesn't specify any additional options, `xctracenorm` samples instructions, cycles, branches, and mispredicted branches.
These were selected as events that should be supported on all hardware macOS runs on; only 4 events were selected for the same reason.
Profiling results look like this on an M2-based MacBook:
java -jar ./benchmarks.jar -prof xctracenorm -f 1 JMHSample_35_Profilers.Atomic
...
Benchmark                                                                 Mode  Cnt   Score    Error  Units
JMHSample_35_Profilers.Atomic.test                                        avgt    5   4.055  ± 0.185  ns/op
JMHSample_35_Profilers.Atomic.test:BRANCH_MISPRED_NONSPEC                 avgt       ≈ 10⁻⁵           #/op
JMHSample_35_Profilers.Atomic.test:Branch miss ratio                      avgt       ≈ 10⁻⁶           BRANCH_MISPRED_NONSPEC/INST_BRANCH
JMHSample_35_Profilers.Atomic.test:CORE_ACTIVE_CYCLE                      avgt       10.541           #/op
JMHSample_35_Profilers.Atomic.test:CPI                                    avgt        0.351           CORE_ACTIVE_CYCLE/INST_ALL
JMHSample_35_Profilers.Atomic.test:INST_ALL                               avgt       30.031           #/op
JMHSample_35_Profilers.Atomic.test:INST_BRANCH                            avgt        3.850           #/op
JMHSample_35_Profilers.Atomic.test:INST_BRANCH density (of instructions)  avgt        0.128           INST_BRANCH/INST_ALL
JMHSample_35_Profilers.Atomic.test:IPC                                    avgt        2.849           INST_ALL/CORE_ACTIVE_CYCLE
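The derived rows (CPI, IPC, branch density) follow directly from the raw per-operation counters in the same output. A minimal sketch of that arithmetic, using the numbers printed above; the class and method names are illustrative and not taken from the patch:

```java
import java.util.Locale;

// Illustrative only: recomputes the derived metrics from the raw counters shown above.
final class DerivedMetrics {
    /** Plain ratio that yields NaN instead of dividing by zero. */
    static double ratio(double numerator, double denominator) {
        return denominator == 0.0 ? Double.NaN : numerator / denominator;
    }

    public static void main(String[] args) {
        double cycles = 10.541;       // CORE_ACTIVE_CYCLE, #/op
        double instructions = 30.031; // INST_ALL, #/op
        double branches = 3.850;      // INST_BRANCH, #/op

        System.out.println(String.format(Locale.ROOT, "CPI = %.3f", ratio(cycles, instructions)));
        System.out.println(String.format(Locale.ROOT, "IPC = %.3f", ratio(instructions, cycles)));
        System.out.println(String.format(Locale.ROOT, "INST_BRANCH density = %.3f",
                ratio(branches, instructions)));
    }
}
```

With these inputs the three ratios round to 0.351, 2.849 and 0.128, matching the reported rows.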
Here are some alternatives to the existing implementation (mostly of the default parameters mode):
- Create an Instruments template and bundle it with JMH instead of generating a package dynamically: templates have a proprietary and obscure format, and it's unclear whether a template created on one device will work on other devices, or whether it will keep working after an Xcode update. Also, dynamic package generation makes it possible to select PMU events on the command line when running benchmarks, which seems much more convenient than setting up templates in the UI.
- Don't parse KPEP files: data from these files is used for validation, and it also lets a user find out what can be sampled without going to a separate tool. Validation is necessary as xctrace simply crashes when something is wrong with the selected events.
Open questions and things that require some fixes:
* [ ] Currently, only PMU sampling is supported; however, the profiler could also sample some software events like context switches, virtual memory operations, and syscall statistics. It's unclear if all that should ever be supported.
* [ ] Some parts of the profiling process, namely KPEP parsing and Instruments package building, are covered by tests in the `jmh-core-it` module (`XCTraceSupportTest`). The tested functions are package-private in `jmh-core`, so to test them from the `it` module I had to call everything through reflection, which definitely doesn't look good. I'm not sure how to both keep the API private and test it in another module, so I'm looking forward to any advice on that.
* [ ] `CPU Counters` doesn't work inside VMs (at the very least because there are no KPEP files for VM CPU ids, so xctrace can't load those files to fetch info about PMU events; and even if a KPEP file were there, there's no way to sample PMU counters inside a VM), so all positive tests for the profiler are currently skipped on GH agents. I'm not sure what could be done here to improve testability.
* [x] When running tests locally, surefire forks used in `jmh-core-it` cause interference between `xctraceasm` and `xctracenorm` tests leading to test failures. I'm currently looking into how to overcome the issue and serialize these tests' execution. In the worst case, tests for both profilers could be placed in a single test class, I guess.
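For readers unfamiliar with the reflection workaround mentioned in the second item, this is roughly what it looks like. The sketch is generic: `KpepNames.normalize` is a made-up stand-in for a package-private function, not something from `jmh-core`, and `ReflectiveAccess` is a hypothetical helper.

```java
import java.lang.reflect.Method;
import java.util.Locale;

// Made-up stand-in for a class with a package-private static function.
final class KpepNames {
    static String normalize(String eventName) {
        return eventName.trim().toUpperCase(Locale.ROOT);
    }
}

// Hypothetical test-side helper that reaches non-public static methods.
final class ReflectiveAccess {
    static Object invokeStatic(Class<?> owner, String name, Class<?>[] parameterTypes, Object... args)
            throws ReflectiveOperationException {
        Method method = owner.getDeclaredMethod(name, parameterTypes);
        method.setAccessible(true);       // bypass the package-private access check
        return method.invoke(null, args); // null receiver because the method is static
    }

    public static void main(String[] args) throws Exception {
        Object result = invokeStatic(KpepNames.class, "normalize",
                new Class<?>[] {String.class}, " inst_all ");
        System.out.println(result);
    }
}
```

The drawback is visible even in this small form: method names and signatures become strings and class arrays, so a rename in the production code is only caught when the test runs.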
Speaking of testing, I manually ran basic scenarios (listing profilers, printing a help message, listing supported events, and, finally, profiling) on Intel-, M1- and M2-based MacBooks.
The results and the script I used to collect the data can be found here: https://gist.github.com/fzhinkin/f7c5db00f5e3417191d66994ed880818
-------------
Commit messages:
- 7903722: Add extra tests
- 7903722: Scan all possible KPEP file locations
- 7903722: Serialize xctrace tests execution
- 7903722: simplified code, added missing docs, supported branch events
- 7903722: Improve events preprocessing
- 7903722: Refactor KPEP database loading
- 7903722: compute AS Arm64 instructions density metrics
- 7903722: check if all listed events could be sampled simultaneously
- 7903722: Add xctracenorm profiler
- 7903722: Get rid of custom event aliases
- ... and 4 more: https://git.openjdk.org/jmh/compare/6d6ce631...ecba1544
Changes: https://git.openjdk.org/jmh/pull/131/files
Webrev: https://webrevs.openjdk.org/?repo=jmh&pr=131&range=00
Issue: https://bugs.openjdk.org/browse/CODETOOLS-7903722
Stats: 5800 lines in 16 files changed: 5779 ins; 0 del; 21 mod
Patch: https://git.openjdk.org/jmh/pull/131.diff
Fetch: git fetch https://git.openjdk.org/jmh.git pull/131/head:pull/131
PR: https://git.openjdk.org/jmh/pull/131