RFR: 7903722: JMH: Add xctrace-based perfnorm profiler for macOS

Fri Sep 20 18:22:49 UTC 2024

On Mon, 16 Sep 2024 15:23:36 GMT, Filipp Zhinkin <fzhinkin at openjdk.org> wrote:

>> Implementation of a perfnorm-alike profiler for macOS based on `xctrace` command line tool bundled with Xcode.
>> 
>> While the profiler is tested and seems to be working well, I consider it rather a preliminary version and open to a discussion on what and how it should measure.
>> 
>> Currently, the profiler only supports PMU counters sampling using `CPU Counters` instrument provided by the Instruments app / xctrace.
>> Unfortunately, `CPU Counters` instrument has no default settings, unlike `Time Profiler` and `CPU Profiler` instruments used by the recently merged `xctraceasm` profiler.
>> To use `CPU Counters`, a user has to create a template in the Instruments UI, select PMU events, save the template, and then supply it to `xctracenorm` as an argument.
>> 
>> This workflow not only prevents use of the profiler without preliminary manual configuration, but also tends to be annoying when it comes to measuring multiple events, as xctrace, unlike perf_events, does not support event multiplexing.
>> 
>> Thankfully, command-line-based configuration and default parameters could be emulated by building a custom Instruments package that imports data from `CPU Counters` and also supplies all required parameters.
>> As you can guess, there's no way to get information about supported PMU events directly from xctrace, but it can be fetched from the KPEP database files stored in `/usr/share/kpep`.
>> `xctracenorm` relies on that data to validate events specified by a user, if any, and also to print a help message that gives some insights into what could be sampled.
>> 
>> To sum up, there are a few things that were implemented to make the `xctracenorm` profiler work:
>> - CPU model detection using `sysctl`;
>> - KPEP file parsing to extract information about the PMU and all supported events;
>> - selected performance events validation;
>> - Instruments package building (generate XML, call a builder tool), packages are cached in `~/Library/Caches/org.openjdk.jmh`;
>> - xctrace execution, resulting samples extraction, and aggregation;
>> - samples postprocessing to calculate some additional metrics, like CPI and branch misprediction ratio.
>> 
>> Currently, if a user didn't specify any additional options, `xctracenorm` will sample instructions, cycles, branches and mispredicted branches events. 
>> These were selected as events that should be supported in all hardware macOS runs on; only 4 events were selected for the same reason.
>> 
>> Profiling results look like this on M2-based MacBook:
>> 
>> j...
>
> @shipilev hey! Does it make any sense to keep this PR open (and continue working on it)?
> I guess the immense amount of changes blocks this PR from being reviewed in a reasonable time frame. I'd love to throw away as much code as possible (see https://github.com/openjdk/jmh/pull/131#issuecomment-2131129163; also Summary's "Here are some alternatives to existing implementation (of default parameters mode, mostly)" portion), but to keep the profiler as user-friendly as, for instance, `perfnorm`, some portion of the logic gluing all parts together has to be preserved.
> 
> Please let me know what would work better (if anything).

Hi @fzhinkin! Sorry, the delays are my fault, I keep getting distracted by other things, and this is a huge PR.

I think we should indeed consider trimming down the things that we do here. This is an impressive piece of work, but I cannot help but think how clunky this macOS profiling interface really is. Building a package to configure a profiler, wow, what a hassle.

If we commit to doing an extended thing here, it would be harder to retract. If we start small, maybe we will not really need to extend it in the future. So I think we should do ourselves a favor here and only implement very basic support: no building packages, no selectable events, a preconfigured template (which we might later conditionalize on Xcode version?) with the most useful events, and an ability to substitute your own template (for power users). Yes, it would not be as great as `perfnorm`, but that's the tradeoff we make not to deal with the profiling interface clunkiness.
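
To make the reduced scope concrete, here is a rough sketch of what "a preconfigured template, optionally replaced by a user-supplied one" could look like when assembling the `xctrace record` command line. Everything here is an assumption for illustration: the class, the method, and the default template file name are hypothetical and not taken from the PR; only the `xctrace record` options `--template`, `--output`, and `--launch --` are existing tool options.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of JMH: picks a template and builds an xctrace command line.
final class XCTraceCommandSketch {
    // Assumed file name for a bundled default template; the real name is not decided in this thread.
    static final String DEFAULT_TEMPLATE = "default-cpu-counters.tracetemplate";

    // Builds "xctrace record --template T --output O --launch -- <benchmark command>".
    // A user-supplied template, if present, replaces the bundled default.
    static List<String> recordCommand(Optional<Path> userTemplate,
                                      Path output,
                                      List<String> benchmarkCommand) {
        String template = userTemplate.map(Path::toString).orElse(DEFAULT_TEMPLATE);
        List<String> cmd = new ArrayList<>();
        cmd.add("xctrace");
        cmd.add("record");
        cmd.add("--template");
        cmd.add(template);
        cmd.add("--output");
        cmd.add(output.toString());
        cmd.add("--launch");
        cmd.add("--");
        cmd.addAll(benchmarkCommand);
        return cmd;
    }

    public static void main(String[] args) {
        // Print the command that would be run with the default template.
        List<String> cmd = recordCommand(Optional.empty(), Path.of("run.trace"),
                List.of("java", "-jar", "benchmarks.jar"));
        System.out.println(String.join(" ", cmd));
    }
}
```

With this shape, the profiler would have no event-selection logic at all: power users who want different counters would save their own template in Instruments and pass its path.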

Does that work?

To hopefully answer the specific questions:

> Currently, only PMU sampling was supported, however, profiler may also sample some software events like context switches, virtual memory operations and syscall statistics. It's unclear if all that should ever be supported.

No need to make it even more complicated. Scope it out of this PR.

> Some parts of the profiling process, [...] The tested functions are package-private in jmh-core, so to test them in the it module I had to call everything through reflection, which definitely doesn't look good. I'm not sure how to both keep the API private and test in another module, so I'm looking forward to any advice on that.

Reflection is fine for now. If this becomes a real burden, we can rethink how tests are factored.
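
For reference, the reflection pattern under discussion can stay fairly contained if it goes through a single helper. The following is only a sketch under assumptions: `Internals` and `parseCount` are hypothetical stand-ins for a package-private jmh-core function, and `ReflectiveCallSketch` is not an existing JMH class.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a class with a package-private static function under test.
final class Internals {
    static int parseCount(String s) {
        return Integer.parseInt(s.trim());
    }
}

// Hypothetical test-side helper: looks up a non-public static method and invokes it.
final class ReflectiveCallSketch {
    static Object callStatic(String className, String methodName,
                             Class<?>[] parameterTypes, Object... args) {
        try {
            Class<?> cls = Class.forName(className);
            Method method = cls.getDeclaredMethod(methodName, parameterTypes);
            method.setAccessible(true); // required because the method is not public
            return method.invoke(null, args); // null receiver: the method is static
        } catch (ReflectiveOperationException e) {
            // Surface lookup and invocation failures as unchecked to keep test code short.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Object result = callStatic(Internals.class.getName(), "parseCount",
                new Class<?>[] {String.class}, " 42 ");
        System.out.println(result);
    }
}
```

Keeping all reflective lookups in one helper like this means a later refactoring of the test layout would only need to touch that helper.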

> CPU Counters doesn't work inside VMs [...], so all positive profiler's tests are currently skipped inside GH agents. I'm not sure what could be done here to improve testability.

Skipping tests in GHA if GHA runners cannot support them is fine.

-------------

PR Comment: https://git.openjdk.org/jmh/pull/131#issuecomment-2364284090