DTrace asm profiler for Mac OS X

Henri Tremblay henri.tremblay at gmail.com
Thu Dec 28 19:35:48 UTC 2017


I am far from being an expert here, so I have quite possibly made some
silly mistake, but here goes.

The benchmark is in the JCTools repository:
https://github.com/JCTools/JCTools/tree/master/jctools-benchmarks

On Linux, I ran:
java -jar target/microbenchmarks.jar -f 1 --prof=perfasm org.jctools.maps.nhbm_test.jmh.ConcurrentMapThroughput

And got (yes, with an error on PrintAssembly):

ERROR: No address lines detected in assembly capture, make sure your JDK is PrintAssembly-enabled:
    https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly

Perf output processed (skipped 2.844 seconds):
 Column 1: cycles (12218 events)
 Column 2: instructions (12169 events)

Hottest code regions (>10.00% "cycles" events):

....[Hottest Region 1]..............................................................................
perf-52432.map, [unknown] (177 bytes)

 <no assembly is recorded, native region>
....................................................................................................
 19.81%   11.78%  <total for region 1>

....[Hottest Region 2]..............................................................................
perf-52432.map, [unknown] (381 bytes)

 <no assembly is recorded, native region>
....................................................................................................
 15.03%   12.21%  <total for region 2>

....[Hottest Region 3]..............................................................................
perf-52432.map, [unknown] (138 bytes)

 <no assembly is recorded, native region>
....................................................................................................
 10.38%    6.35%  <total for region 3>

....[Hottest Regions]...............................................................................
 19.81%   11.78%      perf-52432.map  [unknown] (177 bytes)
 15.03%   12.21%      perf-52432.map  [unknown] (381 bytes)
 10.38%    6.35%      perf-52432.map  [unknown] (138 bytes)
  9.82%   37.09%      perf-52432.map  [unknown] (447 bytes)
  8.22%    2.47%      perf-52432.map  [unknown] (72 bytes)
  7.89%    1.69%      perf-52432.map  [unknown] (28 bytes)
  7.65%    1.69%      perf-52432.map  [unknown] (33 bytes)
  5.20%    2.94%      perf-52432.map  [unknown] (173 bytes)
  1.98%    1.59%      perf-52432.map  [unknown] (287 bytes)
  1.85%    4.54%      perf-52432.map  [unknown] (59 bytes)
  1.81%    4.48%      perf-52432.map  [unknown] (55 bytes)
  1.51%    0.96%      perf-52432.map  [unknown] (116 bytes)
  1.47%    1.83%      perf-52432.map  [unknown] (71 bytes)
  1.26%    1.25%              kernel  [unknown] (2 bytes)
  1.15%    0.53%      perf-52432.map  [unknown] (95 bytes)
  0.89%    0.40%      perf-52432.map  [unknown] (75 bytes)
  0.56%    0.05%              kernel  [unknown] (0 bytes)
  0.53%    2.34%      perf-52432.map  [unknown] (92 bytes)
  0.45%    1.16%      perf-52432.map  [unknown] (8 bytes)
  0.44%    2.47%      perf-52432.map  [unknown] (8 bytes)
  2.11%    2.14%  <...other 199 warm regions...>
....................................................................................................
100.00%   99.99%  <totals>

....[Hottest Methods (after inlining)]..............................................................
 96.95%   97.51%      perf-52432.map  [unknown]
  2.76%    2.10%              kernel  [unknown]
  0.03%    0.07%           libjvm.so  fileStream::write
  0.02%    0.01%        libc-2.12.so  __strlen_sse42
  0.02%                 libc-2.12.so  _IO_file_xsputn@@GLIBC_2.2.5
  0.02%                 libc-2.12.so  __printf_fp
  0.01%                    libjvm.so  CompileBroker::set_last_compile
  0.01%                    libjvm.so  CodeCache::allocate
  0.01%           libpthread-2.12.so  pthread_mutex_unlock
  0.01%                    libjvm.so  os::set_priority
  0.01%                    libjvm.so  DebugInformationRecorder::find_sharable_decode_offset
  0.01%           libpthread-2.12.so  pthread_cond_wait@@GLIBC_2.3.2
  0.01%                    libjvm.so  CompileBroker::invoke_compiler_on_method
  0.01%                    libjvm.so  ciEnv::get_klass_by_index_impl
  0.01%    0.01%           libjvm.so  PhiResolverState::reset
  0.01%                    libjvm.so  CompilerOracle::should_exclude
  0.01%                    libjvm.so  CompilerOracle::has_option_string
  0.01%                    libjvm.so  LinearScan::compute_local_live_sets
  0.01%                    libjvm.so  OptoRuntime::new_instance_C
  0.01%                    libjvm.so  ChunkPool::allocate
  0.10%    0.02%  <...other 12 warm methods...>
....................................................................................................
100.00%   99.71%  <totals>

....[Distribution by Source]........................................................................
 96.95%   97.51%      perf-52432.map
  2.76%    2.10%              kernel
  0.22%    0.31%           libjvm.so
  0.05%    0.06%        libc-2.12.so
  0.02%           libpthread-2.12.so
....................................................................................................
100.00%   99.99%  <totals>

But on OSX, when I run

java -jar target/microbenchmarks.jar -f 1 --prof=dtraceasm org.jctools.maps.nhbm_test.jmh.ConcurrentMapThroughput

I get:

PrintAssembly processed: 193901 total address lines.
Perf output processed (skipped 6.097 seconds):
 Column 1: sampled_pc (0 events)

WARNING: No hottest code region above the threshold (10.00%) for disassembly.
Use "hotThreshold" profiler option to lower the filter threshold.

....[Hottest Regions]...............................................................................
....................................................................................................
         <totals>

....[Hottest Methods (after inlining)]..............................................................
....................................................................................................
         <totals>

....[Distribution by Source]........................................................................
....................................................................................................
         <totals>

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.

Which seems pretty empty.
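
For comparing saved logs, the number to look at is the event count on the
"Column" lines: 12218 and 12169 on Linux, 0 on OSX. Below is a minimal
Python sketch that pulls those counts out of a log. It assumes the
"Column N: name (K events)" format seen in the two outputs above; the
function name event_counts and the two sample strings are made up for
illustration and are not part of JMH.

```python
import re

# Matches header lines such as " Column 1: cycles (12218 events)".
# The line format is an assumption taken from the logs quoted in this message.
COLUMN_RE = re.compile(r"^\s*Column\s+\d+:\s+(\S+)\s+\((\d+)\s+events\)\s*$")


def event_counts(log_text):
    """Return {event_name: sample_count} for every 'Column N:' header line."""
    counts = {}
    for line in log_text.splitlines():
        match = COLUMN_RE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# Header lines copied from the two runs above.
linux_log = """Perf output processed (skipped 2.844 seconds):
 Column 1: cycles (12218 events)
 Column 2: instructions (12169 events)
"""
osx_log = """Perf output processed (skipped 6.097 seconds):
 Column 1: sampled_pc (0 events)
"""

for name, log in (("Linux", linux_log), ("OSX", osx_log)):
    counts = event_counts(log)
    # Events with a zero count mean the profiler collected no samples at all.
    empty = [event for event, n in counts.items() if n == 0]
    status = "no samples for: " + ", ".join(empty) if empty else "ok"
    print(name, counts, status)
```

With a count of 0 there are no regions to rank, so lowering hotThreshold
would presumably change nothing here; the sampling itself is what needs
checking.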

Henri

On 27 December 2017 at 09:56, Henri Tremblay <henri.tremblay at gmail.com>
wrote:

> No. One was Linux (perf), the other was OSX (dtrace). Let me put the
> benchmark out.
>
> On 26 December 2017 at 14:19, Vsevolod Tolstopyatov <qwwdfsad at gmail.com>
> wrote:
>
>> Hi, could you share your benchmark?
>> I've just re-applied my patch over clean repo and
>> run JMHSample_37_CacheAccess with dtrace-profiler, everything works as
>> expected, so maybe your hottest region lies in kernel code.
>>
>> >With perf, I would get some content. With dtrace, nothing.
>> Are you running both on Linux?
>>
>>
>>
>> --
>> Best regards,
>> Tolstopyatov Vsevolod
>>
>> On Wed, Dec 13, 2017 at 7:25 PM, Henri Tremblay <henri.tremblay at gmail.com> wrote:
>>
>>> A bit late but my only problem right now is that I don't get any hot
>>> section. Which is weird.
>>>
>>> With perf, I would get some content. With dtrace, nothing.
>>>
>>> However, I am not an expert in using either. So maybe some javac or java
>>> arguments are required to get nice results. Is that the case?
>>>
>>> Thanks,
>>> Henri
>>>
>>> On 23 November 2017 at 13:04, Aleksey Shipilev <shade at redhat.com> wrote:
>>>
>>>> On 11/23/2017 09:09 AM, Vsevolod Tolstopyatov wrote:
>>>> > Hello,
>>>> >
>>>> > Any news about this patch? Is it going into jmh?
>>>>
>>>> It will. Just let me figure out some Mac testing.
>>>>
>>>> -Aleksey
>>>>
>>>>
>>>
>>
>


More information about the jmh-dev mailing list