The mix of native and java code in "-prof perfasm"

Sergey Bylokhov sergey.bylokhov at oracle.com
Thu Nov 12 16:31:07 UTC 2015


Hello Everybody.
Probably this was discussed already, but I fails to find an answer to my question.

I have a benchmark which spend most of the time in the native library.
If I run it with "-prof perfasm" but without "-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly" I got correct result [1].
But if I add -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly then information about the native code is missing [2]:

Is it a bug or I missed some additional options? Thanks.

The benchmark:

import org.openjdk.jmh.annotations.*;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 2, timeUnit = TimeUnit.SECONDS)
public class SoftwareBlitLoopsItself {

    @Param({"2"})
    public int dsttype;
    @Param({"5000"})
    public int size;
    @Param({"SrcOver"})
    public String comp;

    private BufferedImage dst;
    private Graphics2D g;

    @Setup
    public void setup() {
        dst = new BufferedImage(size, size, dsttype);
        g = dst.createGraphics();
        if (comp.equals("src")) {
            g.setComposite(AlphaComposite.Src);
        } else {
            g.setComposite(AlphaComposite.SrcOver);
        }
    }

    @Benchmark
    public void drawItself() {
        g.drawImage(dst, 0, 0, null);
    }
}

================================================
================================================
================================================

[1] Correct results:

PrintAssembly processed: 160704 total address lines.
Perf output processed (skipped 5,178 seconds):
 Column 1: cycles (6356 events)
 Column 2: instructions (6506 events)

Hottest code regions (>10,00% "cycles" events):
....[Hottest Region 1]..............................................................................
 [0x7f912876c920:0x7f912876c936] in IntArgbToIntArgbSrcOverMaskBlit (libawt.so)

 <no assembly is recorded, native region>
....................................................................................................
 58,02%   55,81%  <total for region 1>

....[Hottest Region 2]..............................................................................
 [0x7f912876ca2f:0x7f912876ca61] in IntArgbToIntArgbSrcOverMaskBlit (libawt.so)

 <no assembly is recorded, native region>
....................................................................................................
 38,51%   40,30%  <total for region 2>

....[Hottest Regions]...............................................................................
 58,02%   55,81%  [0x7f912876c920:0x7f912876c936] in IntArgbToIntArgbSrcOverMaskBlit (libawt.so)
 38,51%   40,30%  [0x7f912876ca2f:0x7f912876ca61] in IntArgbToIntArgbSrcOverMaskBlit (libawt.so)
  0,52%    0,49%  [0xffffffff8105b83a:0xffffffff8105b83a] in native_write_msr_safe ([kernel.kallsyms])
  0,11%    0,05%  [0xffffffff817b43f0:0xffffffff817b43f0] in mutex_unlock ([kernel.kallsyms])
  0,09%    0,05%  [0xffffffff810d31c1:0xffffffff810d31f8] in __srcu_read_lock; __srcu_read_unlock ([kernel.kallsyms])
  0,09%    0,03%  [0xffffffff817b5f7c:0xffffffff817b5f94] in _raw_spin_lock_irqsave ([kernel.kallsyms])
  0,08%           [0xffffffff813bceb2:0xffffffff813bceb2] in ioread32 ([kernel.kallsyms])
  0,08%    0,05%  [0xffffffff817b6610:0xffffffff817b661c] in system_call; system_call_after_swapgs ([kernel.kallsyms])
  0,06%           [0xffffffff811ee96c:0xffffffff811ee96c] in __sb_start_write ([kernel.kallsyms])
  0,05%           [0xffffffff817b60fe:0xffffffff817b60fe] in _raw_read_lock ([kernel.kallsyms])
  0,05%    0,09%  [0x7f91618c9519:0x7f917cca5000] in sun.misc.Unsafe::compareAndSwapInt
  0,05%    0,02%  [0x7f917c876e20:0x7f917c876e3f] in __pthread_enable_asynccancel (libpthread-2.19.so)
  0,05%           [0x7f917c876e97:0x7f917c876ea2] in __pthread_disable_asynccancel (libpthread-2.19.so)
  0,03%    0,06%  [0xffffffff81091150:0xffffffff81091181] in __task_pid_nr_ns ([kernel.kallsyms])
  0,03%           [0xffffffff811ec244:0xffffffff811ec244] in vfs_write ([kernel.kallsyms])
  0,03%    0,03%  [0xffffffff811f49a0:0xffffffff811f49b1] in pipe_write ([kernel.kallsyms])
  0,03%    0,06%  [0xffffffff81209441:0xffffffff8120947a] in __fget ([kernel.kallsyms])
  0,03%    0,02%  [0xffffffff813ccfbe:0xffffffff813ccfc4] in __percpu_counter_add ([kernel.kallsyms])
  0,03%           [0xffffffffc055190a:0xffffffffc055191b] in nvif_notify ([kernel.kallsyms])
  0,03%           [0x7f916186c060:0x7f916186c269] in <unresolved>
  2,00%    2,94%  <...other 228 warm regions...>
....................................................................................................
 99,98%   99,98%  <totals>

....[Hottest Methods (after inlining)]..............................................................
 96,54%   96,11%  IntArgbToIntArgbSrcOverMaskBlit (libawt.so)
  0,74%    0,95%  sun.misc.Unsafe::compareAndSwapInt
  0,52%    0,49%  native_write_msr_safe ([kernel.kallsyms])
  0,11%    0,05%  mutex_unlock ([kernel.kallsyms])
  0,09%    0,05%  __srcu_read_lock; __srcu_read_unlock ([kernel.kallsyms])
  0,09%    0,03%  _raw_spin_lock_irqsave ([kernel.kallsyms])
  0,08%    0,06%  pipe_write ([kernel.kallsyms])
  0,08%           ioread32 ([kernel.kallsyms])
  0,08%    0,05%  system_call; system_call_after_swapgs ([kernel.kallsyms])
  0,06%           __sb_start_write ([kernel.kallsyms])
  0,06%    0,17%  vfprintf (libc-2.19.so)
  0,05%           nvif_notify ([kernel.kallsyms])
  0,05%           __pthread_disable_asynccancel (libpthread-2.19.so)
  0,05%           _raw_read_lock ([kernel.kallsyms])
  0,05%    0,06%  _IO_fflush (libc-2.19.so)
  0,05%    0,02%  do_get_write_access ([kernel.kallsyms])
  0,05%    0,02%  __pthread_enable_asynccancel (libpthread-2.19.so)
  0,05%    0,06%  __task_pid_nr_ns ([kernel.kallsyms])
  0,03%    0,02%  fsnotify ([kernel.kallsyms])
  0,03%    0,03%  __percpu_counter_add ([kernel.kallsyms])
  1,13%    0,69%  <...other 64 warm methods...>
....................................................................................................
 99,98%   98,85%  <totals>

....[Distribution by Area]..........................................................................
 96,55%   96,11%  <native code in (libawt.so)>
  2,17%    2,06%  <native code in ([kernel.kallsyms])>
  0,87%    1,00%  <generated code>
  0,27%    0,77%  <native code in (libc-2.19.so)>
  0,13%    0,05%  <native code in (libpthread-2.19.so)>
....................................................................................................
 99,98%   99,98%  <totals>

# Run complete. Total time: 00:00:16

================================================
================================================
================================================

Incorrect results [2]:

# Processing profiler results: LinuxPerfAsmProfiler 

Result "drawItself":
  35,329 ops/s

Secondary result "·asm":
PrintAssembly processed: 159135 total address lines.
Perf output processed (skipped 5,124 seconds):
 Column 1: cycles (6485 events)
 Column 2: instructions (6600 events)

Hottest code regions (>10,00% "cycles" events):
....[Hottest Region 1]..............................................................................
 [0x7:0x7fda31b84000] in <unresolved>

 <region is too big to display, has 287783 lines, but threshold is 1000>
....................................................................................................
 57,52%   55,12%  <total for region 1>

....[Hottest Region 2]..............................................................................
 [0x7:0x7fda31b84000] in <unresolved>

 <region is too big to display, has 287783 lines, but threshold is 1000>
....................................................................................................
 38,61%   41,17%  <total for region 2>

....[Hottest Regions]...............................................................................
 57,52%   55,12%  [0x7:0x7fda31b84000] in <unresolved>
 38,61%   41,17%  [0x7:0x7fda31b84000] in <unresolved>
  0,39%    0,36%  [0xffffffff8105b83a:0xffffffff8105b83a] in native_write_msr_safe ([kernel.kallsyms])
  0,14%    0,02%  [0xffffffff817b6610:0xffffffff817b6636] in system_call; system_call_after_swapgs ([kernel.kallsyms])
  0,11%    0,06%  [0xffffffff817b43e0:0xffffffff817b43f0] in mutex_unlock ([kernel.kallsyms])
  0,08%    0,11%  [0xffffffff810d31cd:0xffffffff810d31ec] in __srcu_read_lock; __srcu_read_unlock ([kernel.kallsyms])
  0,08%           [0xffffffff813bceb2:0xffffffff813bceb2] in ioread32 ([kernel.kallsyms])
  0,08%    0,06%  [0x7:0x7fda31b84000] in sun.reflect.ClassFileAssembler::cpi
  0,06%    0,03%  [0xffffffff817b5f7c:0xffffffff817b5f94] in _raw_spin_lock_irqsave ([kernel.kallsyms])
  0,06%           [0x7:0x7fda31b84000] in sun.reflect.ClassFileAssembler::cpi
  0,05%    0,03%  [0xffffffff811ee8e8:0xffffffff811ee8f6] in __sb_end_write ([kernel.kallsyms])
  0,05%    0,02%  [0xffffffff81218357:0xffffffff81218357] in __mark_inode_dirty ([kernel.kallsyms])
  0,05%           [0xffffffff812b6ef3:0xffffffff812b6f00] in jbd2_journal_stop ([kernel.kallsyms])
  0,05%           [0x7fda1587d130:0x7fda1587d19a] in <stub: method entry point (kind = zerolocals)>
  0,03%    0,03%  [0xffffffff811ee96c:0xffffffff811ee96c] in __sb_start_write ([kernel.kallsyms])
  0,03%           [0xffffffff812c1884:0xffffffff812c18a0] in jbd2_journal_put_journal_head ([kernel.kallsyms])
  0,03%    0,03%  [0xffffffff813ccfa0:0xffffffff813ccfbe] in __percpu_counter_add ([kernel.kallsyms])
  0,03%           [0xffffffff817b60fe:0xffffffff817b60fe] in _raw_read_lock ([kernel.kallsyms])
  0,03%           [0x7:0x7fda31b84000] in sun.reflect.ClassFileAssembler::cpi
  0,03%           [0x7:0x7fda31b84000] in sun.reflect.ClassFileAssembler::cpi
  2,50%    2,97%  <...other 256 warm regions...>
....................................................................................................
 99,98%  100,00%  <totals>

....[Hottest Methods (after inlining)]..............................................................
 96,16%   96,29%  <unresolved>
  1,70%    1,77%  sun.reflect.ClassFileAssembler::cpi
  0,39%    0,36%  native_write_msr_safe ([kernel.kallsyms])
  0,14%    0,02%  system_call; system_call_after_swapgs ([kernel.kallsyms])
  0,11%    0,06%  mutex_unlock ([kernel.kallsyms])
  0,08%    0,11%  __srcu_read_lock; __srcu_read_unlock ([kernel.kallsyms])
  0,08%           ioread32 ([kernel.kallsyms])
  0,06%    0,03%  _raw_spin_lock_irqsave ([kernel.kallsyms])
  0,06%           <stub: method entry point (kind = zerolocals)>
  0,05%    0,03%  __percpu_counter_add ([kernel.kallsyms])
  0,05%    0,03%  __sb_end_write ([kernel.kallsyms])
  0,05%    0,02%  __mark_inode_dirty ([kernel.kallsyms])
  0,05%           jbd2_journal_stop ([kernel.kallsyms])
  0,03%           __block_commit_write.isra.20 ([kernel.kallsyms])
  0,03%    0,09%  pipe_write ([kernel.kallsyms])
  0,03%           _raw_read_lock ([kernel.kallsyms])
  0,03%           _raw_spin_lock ([kernel.kallsyms])
  0,03%    0,06%  __sb_start_write ([kernel.kallsyms])
  0,03%           ext4_mark_iloc_dirty ([kernel.kallsyms])
  0,03%           mutex_optimistic_spin ([kernel.kallsyms])
  0,82%    0,32%  <...other 52 warm methods...>
....................................................................................................
 99,98%   99,18%  <totals>

....[Distribution by Area]..........................................................................
 98,00%   98,08%  <generated code>
  1,97%    1,92%  <native code in ([kernel.kallsyms])>
  0,02%           <native code in (perf-21675.map)>
....................................................................................................
 99,98%  100,00%  <totals>


More information about the jmh-dev mailing list