RFR: CODETOOLS-7902799: perfasm still handles event modifiers incorrectly

Volker Simonis simonis at openjdk.java.net
Wed Nov 25 19:22:13 UTC 2020


Using the perfasm profiler with an event specification like `-prof perfasm:events=cycles:ppp` will result in no output because JMH won't find any events:
Secondary result "io.simonis.jmh.Synchronization.synchronizedIncrementGlobal:·asm":
PrintAssembly processed: 142164 total address lines.
Perf output processed (skipped 7.859 seconds):
 Column 1: cycles:ppp (0 events)
This happens because, since [CODETOOLS-7901905](https://bugs.openjdk.java.net/browse/CODETOOLS-7902799), the tags are removed from the event names when the perf output is parsed, while the list of profiled events (which is kept in the `AbstractPerfAsmProfiler` base class of `LinuxPerfAsmProfiler`) still keeps the original event specification with the tag.

I've tried to normalize all event name usages in `LinuxPerfAsmProfiler` to use the event name without tags, but doing so is not easy: in the end the base class checks the original events against the parsed ones in `processAssembly()/PerfEvents()` and fails, in the sense that it won't find any events.

So in the end I gave up and simply removed the part of [CODETOOLS-7901905](https://bugs.openjdk.java.net/browse/CODETOOLS-7902799) which chops the tags from the event name. I don't think that's necessary, and everything still seems to work just fine with the full name plus tags (except for a test in `PerfParseTest.java` which had to be removed because it specifically checked that `LinuxPerfAsmProfiler.parsePerfLine()` removes tags from event names which are parsed from the `perf` output).
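
To make the mismatch concrete, here is a minimal, self-contained sketch of the failure mode. It is not the JMH code: the class `EventNameMismatchSketch` and its methods `stripModifier`, `countSamples` and `eventsFor` are hypothetical names made up for illustration. It only assumes what is described above, namely that the requested event list keeps the tag while the parsed names may have it chopped off.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: none of these names exist in JMH.
public class EventNameMismatchSketch {

    // Mimics chopping the tag from an event name ("cycles:ppp" -> "cycles").
    static String stripModifier(String event) {
        int idx = event.indexOf(':');
        return idx < 0 ? event : event.substring(0, idx);
    }

    // Counts samples per event name, optionally stripping the tag while parsing.
    static Map<String, Integer> countSamples(List<String> parsedNames, boolean strip) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : parsedNames) {
            String key = strip ? stripModifier(name) : name;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Looks up the count using the event name exactly as the user requested it.
    static int eventsFor(String requested, Map<String, Integer> counts) {
        return counts.getOrDefault(requested, 0);
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("cycles:ppp", "cycles:ppp", "cycles:ppp");
        String requested = "cycles:ppp";

        // Tags stripped while parsing: the lookup by the requested name finds nothing.
        System.out.println(eventsFor(requested, countSamples(samples, true)));   // prints 0

        // Tags kept while parsing: requested and parsed names agree again.
        System.out.println(eventsFor(requested, countSamples(samples, false)));  // prints 3
    }
}
```

The first lookup corresponds to the `(0 events)` symptom: the samples are counted under `cycles`, but the lookup key is still `cycles:ppp`. Keeping the full name on both sides, as this change does, removes the disagreement without having to normalize names anywhere else.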


This fix will make it possible to use the `:ppp` tag which prevents instruction skew in `perf` results (see http://www.brendangregg.com/perf.html#EventProfiling). So the following output:
PrintAssembly processed: 142764 total address lines.
Perf output processed (skipped 7.898 seconds):
 Column 1: cycles (10247 events)

Hottest code regions (>10.00% "cycles" events):

....[Hottest Region 1]..............................................................................
c2, level 4, io.simonis.jmh.Synchronization::synchronizedIncrementGlobal, version 468 (221 bytes)

  0.03% 0x00007fffe01e1f37: test $0x2,%rax
         ╭ 0x00007fffe01e1f3d: jne 0x00007fffe01e1f63
         │ 0x00007fffe01e1f3f: or $0x1,%rax
         │ 0x00007fffe01e1f43: mov %rax,(%rbx)
         │ 0x00007fffe01e1f46: lock cmpxchg %rbx,(%rsi)
         │╭ 0x00007fffe01e1f4b: je 0x00007fffe01e1f76
         ││ 0x00007fffe01e1f51: sub %rsp,%rax
         ││ 0x00007fffe01e1f54: and $0xfffffffffffff007,%rax
         ││ 0x00007fffe01e1f5b: mov %rax,(%rbx)
         ││╭ 0x00007fffe01e1f5e: jmpq 0x00007fffe01e1f76
  0.32% ↘││ 0x00007fffe01e1f63: mov %rax,%r11
  0.01% ││ 0x00007fffe01e1f66: xor %rax,%rax
  0.07% ││ 0x00007fffe01e1f69: lock cmpxchg %r15,0x3e(%r11)
 35.31% ││ 0x00007fffe01e1f6f: movq $0x3,(%rbx)
  0.15% ↘↘╭ 0x00007fffe01e1f76: jne 0x00007fffe01e202b ;*synchronization entry
which obviously suffers from instruction skew, because it attributes 35% of the cycles to the `movq` instruction after the `lock cmpxchg`, will be fixed and will correctly look like this:
PrintAssembly processed: 143768 total address lines.
Perf output processed (skipped 7.921 seconds):
 Column 1: cycles:ppp (10074 events)

Hottest code regions (>10.00% "cycles:ppp" events):

....[Hottest Region 1]..............................................................................
c2, level 4, io.simonis.jmh.Synchronization::synchronizedIncrementGlobal, version 481 (137 bytes)

                   0x00007fffe01e3fb7: test $0x2,%rax
  0.32% ╭ 0x00007fffe01e3fbd: jne 0x00007fffe01e3fe3
         │ 0x00007fffe01e3fbf: or $0x1,%rax
         │ 0x00007fffe01e3fc3: mov %rax,(%rbx)
         │ 0x00007fffe01e3fc6: lock cmpxchg %rbx,(%rsi)
         │╭ 0x00007fffe01e3fcb: je 0x00007fffe01e3ff6
         ││ 0x00007fffe01e3fd1: sub %rsp,%rax
         ││ 0x00007fffe01e3fd4: and $0xfffffffffffff007,%rax
         ││ 0x00007fffe01e3fdb: mov %rax,(%rbx)
         ││╭ 0x00007fffe01e3fde: jmpq 0x00007fffe01e3ff6
  0.01% ↘││ 0x00007fffe01e3fe3: mov %rax,%r11
  0.05% ││ 0x00007fffe01e3fe6: xor %rax,%rax
 35.03% ││ 0x00007fffe01e3fe9: lock cmpxchg %r15,0x3e(%r11)
  0.21% ││ 0x00007fffe01e3fef: movq $0x3,(%rbx)
  0.26% ↘↘ 0x00007fffe01e3ff6: jne 0x00007fffe01e40ab ;*synchronization entry

-------------

Commit messages:
 - CODETOOLS-7902799: perfasm still handles event modifiers incorrectly

Changes: https://git.openjdk.java.net/jmh/pull/9/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jmh&pr=9&range=00
  Issue: https://bugs.openjdk.java.net/browse/CODETOOLS-7902799
  Stats: 26 lines in 2 files changed: 0 ins; 26 del; 0 mod
  Patch: https://git.openjdk.java.net/jmh/pull/9.diff
  Fetch: git fetch https://git.openjdk.java.net/jmh pull/9/head:pull/9

PR: https://git.openjdk.java.net/jmh/pull/9

