RFR: 7903740: JMH: Perf event validation not working with skid options [v4]
Ian Rogers
irogers at google.com
Thu Sep 5 16:12:39 UTC 2024
On Thu, Sep 5, 2024 at 8:54 AM Galder Zamarreño <galder at openjdk.org> wrote:
>
> On Thu, 22 Aug 2024 12:54:10 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>
> >> Well, I would have expected we parsed `perf record` error and error code directly?
> >>
> >>
> >> % perf record -e blah echo 1
> >> event syntax error: 'blah'
> >> ___ parser error
> >> Run 'perf list' for a list of valid events
> >>
> >> Usage: perf record [<options>] [<command>]
> >> or: perf record [<options>] -- <command> [<options>]
> >>
> >> -e, --event <event> event selector. use 'perf list' to list available events
> >>
> >> % echo $?
> >> 129
> >>
> >> % perf record -e cycles:p echo 1
> >> Error:
> >> cycles:p: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
> >>
> >> % echo $?
> >> 255
> >>
> >> % perf record -e cycles echo 1
> >> 1
> >> [ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 0.013 MB perf.data (16 samples) ]
> >>
> >> % echo $?
> >> 0
> >>
> >>
> >> Yes, I understand it actually does not test whether the counters _report_ non-zero values, but that's okay. We only do a light-weight validation that `perf record` even accepts these.
> >
> > Failing hard when some events cannot be supported is also proper, I'd think. So there is no need to verify them one by one. Keep it as simple as possible?
I'd think so :-) Fwiw, ARM have added a cycles event to their L3 cache
(in ARM terminology, the DSU), and so now "perf record -e cycles" fails
on Ampere CPUs because L3 cache events can't sample. A proposed
workaround is to make it so that such events don't cause a failure.
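The light-weight check described earlier in the thread (run `perf record` once with the requested events and treat a non-zero exit status, such as 129 for an event syntax error or 255 when the PMU cannot sample, as a failure) might look roughly like the following Java sketch. The class and method names are hypothetical and are not JMH's actual code. As Galder shows further down, an exit status of 0 does not guarantee that any samples were recorded.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JMH: probes whether `perf record` accepts the events.
final class PerfRecordProbe {
    private PerfRecordProbe() {}

    /** Builds "perf record --event e1 --event e2 ... --output <file> echo 1". */
    static List<String> command(List<String> events, Path output) {
        List<String> cmd = new ArrayList<>(List.of("perf", "record"));
        for (String event : events) {
            // One --event per requested event, so modifiers like ":P" stay attached.
            cmd.add("--event");
            cmd.add(event);
        }
        cmd.add("--output");
        cmd.add(output.toString());
        cmd.add("echo");
        cmd.add("1");
        return cmd;
    }

    /** Runs the probe and returns its combined output; throws if perf exits non-zero. */
    static String run(List<String> events, Path output) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(events, output))
                .redirectErrorStream(true) // perf prints its errors and warnings on stderr
                .start();
        String log = new String(p.getInputStream().readAllBytes());
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IllegalStateException("perf record exited with " + exit + ":\n" + log);
        }
        return log;
    }
}
```

Failing hard on the first non-zero exit keeps this simple, but it cannot tell which of several events was rejected, and it accepts runs where perf silently substituted an event.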
> The thing is that `perf record` does not always fail when it doesn't work. For example, in a container ([example](https://github.com/galderz/github-actions/actions/runs/10723692355/job/29737549503)):
>
>
> $ perf record --event cycles --output perf-record-validate.data echo 1
> WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
> check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.
There is a special fallback for cycles that tries to move it to a timer-based
(aka software) event, cpu-clock or task-clock.
> Samples in kernel functions may not be resolved if a suitable vmlinux
> file is not found in the buildid cache or in the vmlinux path.
>
> Samples in kernel modules won't be resolved at all.
>
> If some relocation was applied (e.g. kexec) symbols may be misresolved
> even with a suitable vmlinux or kallsyms file.
>
> Couldn't record kernel reference relocation symbol
> Symbol resolution may be skewed if relocation was used (e.g. kexec).
> Check /proc/kallsyms permission or run as root.
> 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.001 MB perf-record-validate.data (3 samples) ]
>
> $ echo $?
> 0
>
> $ perf report -q -i perf-record-validate.data --stdio | grep cycles | wc -l
> 0
There's a --stats option for perf report, for example:
```
$ sudo perf record -e cycles:P -o - true | perf report -i - --stats
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.058 MB - ]
Aggregated stats:
TOTAL events: 459
MMAP events: 295 (64.3%)
COMM events: 2 ( 0.4%)
EXIT events: 1 ( 0.2%)
SAMPLE events: 7 ( 1.5%)
MMAP2 events: 4 ( 0.9%)
KSYMBOL events: 86 (18.7%)
BPF_EVENT events: 36 ( 7.8%)
ATTR events: 1 ( 0.2%)
FINISHED_ROUND events: 1 ( 0.2%)
ID_INDEX events: 1 ( 0.2%)
THREAD_MAP events: 1 ( 0.2%)
CPU_MAP events: 1 ( 0.2%)
EVENT_UPDATE events: 1 ( 0.2%)
TIME_CONV events: 1 ( 0.2%)
FEATURE events: 20 ( 4.4%)
FINISHED_INIT events: 1 ( 0.2%)
cycles:ppp stats:
SAMPLE events: 7
```
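Counting per-event samples from this `--stats` output, rather than grepping `perf report` for the event name, could be sketched as below. It assumes the layout shown above (an `Aggregated stats:` section followed by one `<event> stats:` section per event) is stable across perf versions, which is an assumption; the class name is hypothetical. Note that the reported name can differ from the requested one: `cycles:P` shows up as `cycles:ppp`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for `perf report --stats` output; the format is assumed from the example above.
final class PerfStatsParser {
    // Matches "SAMPLE events: 7" as well as "SAMPLE events: 7 ( 1.5%)".
    private static final Pattern SAMPLE_LINE = Pattern.compile("^SAMPLE events:\\s*(\\d+).*$");
    private static final String SECTION_SUFFIX = " stats:";

    private PerfStatsParser() {}

    /** Maps each "<event> stats:" section to its SAMPLE count, skipping the aggregated section. */
    static Map<String, Long> sampleCounts(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        String current = null; // null while inside "Aggregated stats:" or before any section
        for (String raw : lines) {
            String line = raw.trim();
            if (line.equals("Aggregated stats:")) {
                current = null;
            } else if (line.endsWith(SECTION_SUFFIX)) {
                current = line.substring(0, line.length() - SECTION_SUFFIX.length());
            } else if (current != null) {
                Matcher m = SAMPLE_LINE.matcher(line);
                if (m.matches()) {
                    counts.put(current, Long.parseLong(m.group(1)));
                }
            }
        }
        return counts;
    }
}
```

The lines would come from running `perf report -i <file> --stats` (for example through `ProcessBuilder`) and splitting its standard output.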
> Many of the issues I was having were because I assumed that `perf record` would fail in a container, but it doesn't, and later on you get errors saying that no events were recorded.
>
> And in any case, if `perf record` fails, that's already handled by the current code when it checks `!failMsg.isEmpty()`.
>
> > Failing hard when some events cannot be supported is also proper, I'd think. So there is no need to verify them one by one. Keep it as simple as possible?
>
> I'll try to craft a command that checks all the events in one go and if any is missing fail hard.
Perhaps directly using task-clock:u (i.e. restricting events to user
space) can lessen the problems.
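The plan to check all events in one go and fail hard could compare the requested events against per-event sample counts, however those are obtained (for instance from `perf report --stats`). This is a minimal sketch with hypothetical names. Matching on the part before the first `:` is a heuristic assumption so that `cycles:P` matches `cycles:ppp`; it will not handle PMU-qualified syntax such as `cpu/cycles/`. Because it checks names, a `cycles` event that perf silently replaced with `cpu-clock` is reported as missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical all-at-once validation; not JMH's actual implementation.
final class PerfEventCheck {
    private PerfEventCheck() {}

    // Heuristic (assumption): ignore modifiers, so "cycles:P" and "cycles:ppp" compare equal.
    static String baseName(String event) {
        int colon = event.indexOf(':');
        return colon < 0 ? event : event.substring(0, colon);
    }

    /** Returns the requested events with no matching reported event that has samples. */
    static List<String> missingEvents(List<String> requested, Map<String, Long> reportedCounts) {
        List<String> missing = new ArrayList<>();
        for (String event : requested) {
            boolean sampled = false;
            for (Map.Entry<String, Long> e : reportedCounts.entrySet()) {
                if (baseName(e.getKey()).equals(baseName(event)) && e.getValue() > 0) {
                    sampled = true;
                    break;
                }
            }
            if (!sampled) {
                missing.add(event);
            }
        }
        return missing;
    }

    /** Fails hard if any requested event recorded no samples. */
    static void requireAll(List<String> requested, Map<String, Long> reportedCounts) {
        List<String> missing = missingEvents(requested, reportedCounts);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("perf recorded no samples for: " + String.join(", ", missing));
        }
    }
}
```

Requesting user-space-only events such as `task-clock:u`, as suggested above, would pass through the same check unchanged.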
Thanks,
Ian
> -------------
>
> PR Review Comment: https://git.openjdk.org/jmh/pull/132#discussion_r1745797969