RFR: CODETOOLS-7902799: perfasm still handles event modifiers incorrectly [v4]
Aleksey Shipilev
shade at openjdk.java.net
Tue Dec 8 22:44:38 UTC 2020
On Tue, 8 Dec 2020 20:39:57 GMT, Volker Simonis <simonis at openjdk.org> wrote:
>> Using the perfasm profiler with an event specification like `-prof perfasm:events=cycles:ppp` will result in no output because JMH won't find any events:
>> Secondary result "io.simonis.jmh.Synchronization.synchronizedIncrementGlobal:·asm":
>> PrintAssembly processed: 142164 total address lines.
>> Perf output processed (skipped 7.859 seconds):
>> Column 1: cycles:ppp (0 events)
>> This is because since [CODETOOLS-7901905](https://bugs.openjdk.java.net/browse/CODETOOLS-7902799) the tags are removed from the events when the perf output is parsed. But the list of profiled events (which is kept in the `AbstractPerfAsmProfiler` base class of `LinuxPerfAsmProfiler`) still keeps the original event specification with the tag.
>>
>> I've tried to normalize all event name usages in `LinuxPerfAsmProfiler` to use the event name without tags but doing so is not easy because in the end the base class will check the original events against the parsed ones in `processAssembly()/PerfEvents()` and fail (i.e. in the sense that it won't find any events).
>>
>> So in the end I gave up and simple removed the part of [CODETOOLS-7901905](https://bugs.openjdk.java.net/browse/CODETOOLS-7902799) which chops the tags from the event name. I don't think that's necessary and everything seems to still work just fine with the full name plus tags (except for a tests in `PerfParseTest.java` which had to be removed because it specifically checked that `LinuxPerfAsmProfiler.parsePerfLine()` removes tags from event names which are parsed from the `perf` output).
>>
>>
>> This fix will make it possible to use the `:ppp` tag which prevents instruction skew in `perf` results (see http://www.brendangregg.com/perf.html#EventProfiling). So the following output:
>> PrintAssembly processed: 142764 total address lines.
>> Perf output processed (skipped 7.898 seconds):
>> Column 1: cycles (10247 events)
>>
>> Hottest code regions (>10.00% "cycles" events):
>>
>> ....[Hottest Region 1]..............................................................................
>> c2, level 4, io.simonis.jmh.Synchronization::synchronizedIncrementGlobal, version 468 (221 bytes)
>>
>> 0.03% 0x00007fffe01e1f37: test $0x2,%rax
>> ╭ 0x00007fffe01e1f3d: jne 0x00007fffe01e1f63
>> │ 0x00007fffe01e1f3f: or $0x1,%rax
>> │ 0x00007fffe01e1f43: mov %rax,(%rbx)
>> │ 0x00007fffe01e1f46: lock cmpxchg %rbx,(%rsi)
>> │╭ 0x00007fffe01e1f4b: je 0x00007fffe01e1f76
>> ││ 0x00007fffe01e1f51: sub %rsp,%rax
>> ││ 0x00007fffe01e1f54: and $0xfffffffffffff007,%rax
>> ││ 0x00007fffe01e1f5b: mov %rax,(%rbx)
>> ││╭ 0x00007fffe01e1f5e: jmpq 0x00007fffe01e1f76
>> 0.32% ↘││ 0x00007fffe01e1f63: mov %rax,%r11
>> 0.01% ││ 0x00007fffe01e1f66: xor %rax,%rax
>> 0.07% ││ 0x00007fffe01e1f69: lock cmpxchg %r15,0x3e(%r11)
>> 35.31% ││ 0x00007fffe01e1f6f: movq $0x3,(%rbx)
>> 0.15% ↘↘╭ 0x00007fffe01e1f76: jne 0x00007fffe01e202b ;*synchronization entry
>> which obviously suffers from instruction skew because it attributes 35% of the cycles to the `movq` instruction after the `lock cmpxchg` will be fixed and look correctly like this:
>> PrintAssembly processed: 143768 total address lines.
>> Perf output processed (skipped 7.921 seconds):
>> Column 1: cycles:ppp (10074 events)
>>
>> Hottest code regions (>10.00% "cycles:ppp" events):
>>
>> ....[Hottest Region 1]..............................................................................
>> c2, level 4, io.simonis.jmh.Synchronization::synchronizedIncrementGlobal, version 481 (137 bytes)
>>
>> 0x00007fffe01e3fb7: test $0x2,%rax
>> 0.32% ╭ 0x00007fffe01e3fbd: jne 0x00007fffe01e3fe3
>> │ 0x00007fffe01e3fbf: or $0x1,%rax
>> │ 0x00007fffe01e3fc3: mov %rax,(%rbx)
>> │ 0x00007fffe01e3fc6: lock cmpxchg %rbx,(%rsi)
>> │╭ 0x00007fffe01e3fcb: je 0x00007fffe01e3ff6
>> ││ 0x00007fffe01e3fd1: sub %rsp,%rax
>> ││ 0x00007fffe01e3fd4: and $0xfffffffffffff007,%rax
>> ││ 0x00007fffe01e3fdb: mov %rax,(%rbx)
>> ││╭ 0x00007fffe01e3fde: jmpq 0x00007fffe01e3ff6
>> 0.01% ↘││ 0x00007fffe01e3fe3: mov %rax,%r11
>> 0.05% ││ 0x00007fffe01e3fe6: xor %rax,%rax
>> 35.03% ││ 0x00007fffe01e3fe9: lock cmpxchg %r15,0x3e(%r11)
>> 0.21% ││ 0x00007fffe01e3fef: movq $0x3,(%rbx)
>> 0.26% ↘↘ 0x00007fffe01e3ff6: jne 0x00007fffe01e40ab ;*synchronization entry
>
> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - Merge branch 'master' into CODETOOLS-7902799
> - Added new version from Aleksey
> - Added back test for tag parsing
> - CODETOOLS-7902799: perfasm still handles event modifiers incorrectly
Marked as reviewed by shade (Committer).
-------------
PR: https://git.openjdk.java.net/jmh/pull/9
More information about the jmh-dev
mailing list