RFR: CODETOOLS-7902799: perfasm still handles event modifiers incorrectly [v2]

Thu Nov 26 10:32:17 UTC 2020

> Using the perfasm profiler with an event specification like `-prof perfasm:events=cycles:ppp` will result in no output because JMH won't find any events:
> Secondary result "io.simonis.jmh.Synchronization.synchronizedIncrementGlobal:·asm":
> PrintAssembly processed: 142164 total address lines.
> Perf output processed (skipped 7.859 seconds):
>  Column 1: cycles:ppp (0 events)
> This is because since [CODETOOLS-7901905](https://bugs.openjdk.java.net/browse/CODETOOLS-7902799) the tags are removed from the events when the perf output is parsed. But the list of profiled events (which is kept in the `AbstractPerfAsmProfiler` base class of `LinuxPerfAsmProfiler`) still keeps the original event specification with the tag.
> 
> I've tried to normalize all event name usages in `LinuxPerfAsmProfiler` to use the event name without tags but doing so is not easy because in the end the base class will check the original events against the parsed ones in `processAssembly()/PerfEvents()` and fail (i.e. in the sense that it won't find any events).
> 
> So in the end I gave up and simple removed the part of [CODETOOLS-7901905](https://bugs.openjdk.java.net/browse/CODETOOLS-7902799) which chops the tags from the event name. I don't think that's necessary and everything seems to still work just fine with the full name plus tags (except for a tests in `PerfParseTest.java` which had to be removed because it specifically checked that `LinuxPerfAsmProfiler.parsePerfLine()` removes tags from event names which are parsed from the `perf` output).
> 
> 
> This fix will make it possible to use the `:ppp` tag which prevents instruction skew in `perf` results (see http://www.brendangregg.com/perf.html#EventProfiling). So the following output:
> PrintAssembly processed: 142764 total address lines.
> Perf output processed (skipped 7.898 seconds):
>  Column 1: cycles (10247 events)
> 
> Hottest code regions (>10.00% "cycles" events):
> 
> ....[Hottest Region 1]..............................................................................
> c2, level 4, io.simonis.jmh.Synchronization::synchronizedIncrementGlobal, version 468 (221 bytes)
> 
>   0.03% 0x00007fffe01e1f37: test $0x2,%rax
>          ╭ 0x00007fffe01e1f3d: jne 0x00007fffe01e1f63
>          │ 0x00007fffe01e1f3f: or $0x1,%rax
>          │ 0x00007fffe01e1f43: mov %rax,(%rbx)
>          │ 0x00007fffe01e1f46: lock cmpxchg %rbx,(%rsi)
>          │╭ 0x00007fffe01e1f4b: je 0x00007fffe01e1f76
>          ││ 0x00007fffe01e1f51: sub %rsp,%rax
>          ││ 0x00007fffe01e1f54: and $0xfffffffffffff007,%rax
>          ││ 0x00007fffe01e1f5b: mov %rax,(%rbx)
>          ││╭ 0x00007fffe01e1f5e: jmpq 0x00007fffe01e1f76
>   0.32% ↘││ 0x00007fffe01e1f63: mov %rax,%r11
>   0.01% ││ 0x00007fffe01e1f66: xor %rax,%rax
>   0.07% ││ 0x00007fffe01e1f69: lock cmpxchg %r15,0x3e(%r11)
>  35.31% ││ 0x00007fffe01e1f6f: movq $0x3,(%rbx)
>   0.15% ↘↘╭ 0x00007fffe01e1f76: jne 0x00007fffe01e202b ;*synchronization entry
> which obviously suffers from instruction skew because it attributes 35% of the cycles to the `movq` instruction after the `lock cmpxchg` will be fixed and look correctly like this:
> PrintAssembly processed: 143768 total address lines.
> Perf output processed (skipped 7.921 seconds):
>  Column 1: cycles:ppp (10074 events)
> 
> Hottest code regions (>10.00% "cycles:ppp" events):
> 
> ....[Hottest Region 1]..............................................................................
> c2, level 4, io.simonis.jmh.Synchronization::synchronizedIncrementGlobal, version 481 (137 bytes)
> 
>                    0x00007fffe01e3fb7: test $0x2,%rax
>   0.32% ╭ 0x00007fffe01e3fbd: jne 0x00007fffe01e3fe3
>          │ 0x00007fffe01e3fbf: or $0x1,%rax
>          │ 0x00007fffe01e3fc3: mov %rax,(%rbx)
>          │ 0x00007fffe01e3fc6: lock cmpxchg %rbx,(%rsi)
>          │╭ 0x00007fffe01e3fcb: je 0x00007fffe01e3ff6
>          ││ 0x00007fffe01e3fd1: sub %rsp,%rax
>          ││ 0x00007fffe01e3fd4: and $0xfffffffffffff007,%rax
>          ││ 0x00007fffe01e3fdb: mov %rax,(%rbx)
>          ││╭ 0x00007fffe01e3fde: jmpq 0x00007fffe01e3ff6
>   0.01% ↘││ 0x00007fffe01e3fe3: mov %rax,%r11
>   0.05% ││ 0x00007fffe01e3fe6: xor %rax,%rax
>  35.03% ││ 0x00007fffe01e3fe9: lock cmpxchg %r15,0x3e(%r11)
>   0.21% ││ 0x00007fffe01e3fef: movq $0x3,(%rbx)
>   0.26% ↘↘ 0x00007fffe01e3ff6: jne 0x00007fffe01e40ab ;*synchronization entry

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Added back test for tag parsing

-------------

Changes:
  - all: https://git.openjdk.java.net/jmh/pull/9/files
  - new: https://git.openjdk.java.net/jmh/pull/9/files/49710367..18671528

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jmh&pr=9&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jmh&pr=9&range=00-01

  Stats: 22 lines in 1 file changed: 22 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jmh/pull/9.diff
  Fetch: git fetch https://git.openjdk.java.net/jmh pull/9/head:pull/9

PR: https://git.openjdk.java.net/jmh/pull/9