RFR: CODETOOLS-7902799: perfasm still handles event modifiers incorrectly
Volker Simonis
simonis at openjdk.java.net
Wed Nov 25 19:22:13 UTC 2020
Using the perfasm profiler with an event specification like `-prof perfasm:events=cycles:ppp` will result in no output because JMH won't find any events:
Secondary result "io.simonis.jmh.Synchronization.synchronizedIncrementGlobal:·asm":
PrintAssembly processed: 142164 total address lines.
Perf output processed (skipped 7.859 seconds):
Column 1: cycles:ppp (0 events)
This is because since [CODETOOLS-7901905](https://bugs.openjdk.java.net/browse/CODETOOLS-7902799) the tags are removed from the events when the perf output is parsed. But the list of profiled events (which is kept in the `AbstractPerfAsmProfiler` base class of `LinuxPerfAsmProfiler`) still keeps the original event specification with the tag.
I've tried to normalize all event name usages in `LinuxPerfAsmProfiler` to use the event name without tags but doing so is not easy because in the end the base class will check the original events against the parsed ones in `processAssembly()/PerfEvents()` and fail (i.e. in the sense that it won't find any events).
So in the end I gave up and simple removed the part of [CODETOOLS-7901905](https://bugs.openjdk.java.net/browse/CODETOOLS-7902799) which chops the tags from the event name. I don't think that's necessary and everything seems to still work just fine with the full name plus tags (except for a tests in `PerfParseTest.java` which had to be removed because it specifically checked that `LinuxPerfAsmProfiler.parsePerfLine()` removes tags from event names which are parsed from the `perf` output).
This fix will make it possible to use the `:ppp` tag which prevents instruction skew in `perf` results (see http://www.brendangregg.com/perf.html#EventProfiling). So the following output:
PrintAssembly processed: 142764 total address lines.
Perf output processed (skipped 7.898 seconds):
Column 1: cycles (10247 events)
Hottest code regions (>10.00% "cycles" events):
....[Hottest Region 1]..............................................................................
c2, level 4, io.simonis.jmh.Synchronization::synchronizedIncrementGlobal, version 468 (221 bytes)
0.03% 0x00007fffe01e1f37: test $0x2,%rax
╭ 0x00007fffe01e1f3d: jne 0x00007fffe01e1f63
│ 0x00007fffe01e1f3f: or $0x1,%rax
│ 0x00007fffe01e1f43: mov %rax,(%rbx)
│ 0x00007fffe01e1f46: lock cmpxchg %rbx,(%rsi)
│╭ 0x00007fffe01e1f4b: je 0x00007fffe01e1f76
││ 0x00007fffe01e1f51: sub %rsp,%rax
││ 0x00007fffe01e1f54: and $0xfffffffffffff007,%rax
││ 0x00007fffe01e1f5b: mov %rax,(%rbx)
││╭ 0x00007fffe01e1f5e: jmpq 0x00007fffe01e1f76
0.32% ↘││ 0x00007fffe01e1f63: mov %rax,%r11
0.01% ││ 0x00007fffe01e1f66: xor %rax,%rax
0.07% ││ 0x00007fffe01e1f69: lock cmpxchg %r15,0x3e(%r11)
35.31% ││ 0x00007fffe01e1f6f: movq $0x3,(%rbx)
0.15% ↘↘╭ 0x00007fffe01e1f76: jne 0x00007fffe01e202b ;*synchronization entry
which obviously suffers from instruction skew because it attributes 35% of the cycles to the `movq` instruction after the `lock cmpxchg` will be fixed and look correctly like this:
PrintAssembly processed: 143768 total address lines.
Perf output processed (skipped 7.921 seconds):
Column 1: cycles:ppp (10074 events)
Hottest code regions (>10.00% "cycles:ppp" events):
....[Hottest Region 1]..............................................................................
c2, level 4, io.simonis.jmh.Synchronization::synchronizedIncrementGlobal, version 481 (137 bytes)
0x00007fffe01e3fb7: test $0x2,%rax
0.32% ╭ 0x00007fffe01e3fbd: jne 0x00007fffe01e3fe3
│ 0x00007fffe01e3fbf: or $0x1,%rax
│ 0x00007fffe01e3fc3: mov %rax,(%rbx)
│ 0x00007fffe01e3fc6: lock cmpxchg %rbx,(%rsi)
│╭ 0x00007fffe01e3fcb: je 0x00007fffe01e3ff6
││ 0x00007fffe01e3fd1: sub %rsp,%rax
││ 0x00007fffe01e3fd4: and $0xfffffffffffff007,%rax
││ 0x00007fffe01e3fdb: mov %rax,(%rbx)
││╭ 0x00007fffe01e3fde: jmpq 0x00007fffe01e3ff6
0.01% ↘││ 0x00007fffe01e3fe3: mov %rax,%r11
0.05% ││ 0x00007fffe01e3fe6: xor %rax,%rax
35.03% ││ 0x00007fffe01e3fe9: lock cmpxchg %r15,0x3e(%r11)
0.21% ││ 0x00007fffe01e3fef: movq $0x3,(%rbx)
0.26% ↘↘ 0x00007fffe01e3ff6: jne 0x00007fffe01e40ab ;*synchronization entry
-------------
Commit messages:
- CODETOOLS-7902799: perfasm still handles event modifiers incorrectly
Changes: https://git.openjdk.java.net/jmh/pull/9/files
Webrev: https://webrevs.openjdk.java.net/?repo=jmh&pr=9&range=00
Issue: https://bugs.openjdk.java.net/browse/CODETOOLS-7902799
Stats: 26 lines in 2 files changed: 0 ins; 26 del; 0 mod
Patch: https://git.openjdk.java.net/jmh/pull/9.diff
Fetch: git fetch https://git.openjdk.java.net/jmh pull/9/head:pull/9
PR: https://git.openjdk.java.net/jmh/pull/9
More information about the jmh-dev
mailing list