DTrace asm profiler for Mac OS X

Roman Leventov leventov.ru at gmail.com
Tue Jan 16 15:50:50 UTC 2018


Vsevolod, thanks for this contribution, it works like a charm.

On 14 January 2018 at 16:24, Vsevolod Tolstopyatov <qwwdfsad at gmail.com>
wrote:

> Hi, it took me a while to reproduce your problem.
>
> The problem is caused by the Mac OS X version (everything after El Capitan)
> and System Integrity Protection (SIP).
> Usually DTrace works as intended, but on newer OS versions it requires
> additional privileges. In such cases, if you run DTrace manually, you should
> see something like "dtrace cannot control executables signed with
> restricted entitlements" [1].
> The only possible solution is to disable SIP [2].
>
> I have limited access to different versions of Mac OS X, but it seems that
> in some minor updates DTrace works with SIP enabled.
> So as a solution I'd suggest checking SIP status on profiler start (via
> "csrutil status") and printing a warning if it's enabled, or just
> documenting it in the javadoc. It's up to Aleksey to decide which approach
> is preferable in JMH.
>
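The startup check suggested above could look roughly like the following. This is a minimal sketch, not the code that went into JMH: the class name SipCheck and its methods are hypothetical, and it assumes "csrutil status" prints a line of the form "System Integrity Protection status: enabled." when SIP is on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of JMH.
class SipCheck {

    // Assumption: csrutil reports "System Integrity Protection status: enabled."
    static boolean reportsSipEnabled(String csrutilOutput) {
        return csrutilOutput != null
                && csrutilOutput.toLowerCase(Locale.ROOT).contains("status: enabled");
    }

    // Runs "csrutil status" and returns everything it printed.
    static String runCsrutilStatus() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("csrutil", "status")
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        if (!os.contains("mac")) {
            System.out.println("Not Mac OS X, skipping SIP check.");
            return;
        }
        try {
            if (reportsSipEnabled(runCsrutilStatus())) {
                System.err.println("WARNING: System Integrity Protection is enabled; "
                        + "DTrace may not be able to collect samples.");
            }
        } catch (IOException | InterruptedException e) {
            // csrutil is missing or could not be started; do not fail the run.
            System.err.println("Could not run csrutil: " + e.getMessage());
        }
    }
}
```

The check only warns and never aborts, since DTrace reportedly works with SIP enabled on some minor OS updates.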
> [1] https://news.ycombinator.com/item?id=10790127
> [2] http://osxdaily.com/2015/10/05/disable-rootless-system-integrity-protection-mac-os-x/
>
> --
> Best regards,
> Tolstopyatov Vsevolod
>
> On Thu, Dec 28, 2017 at 10:35 PM, Henri Tremblay <henri.tremblay at gmail.com>
> wrote:
>
> > I am far far far from being an expert here so I'm pretty sure you will
> > throw some stupid mistake in my face but here it goes.
> >
> > You can use https://github.com/JCTools/JCTools/tree/master/jctools-benchmarks.
> >
> > I did on Linux:
> > java -jar target/microbenchmarks.jar -f 1 --prof=perfasm org.jctools.maps.nhbm_test.jmh.ConcurrentMapThroughput
> >
> > And got (yes, with an error on PrintAssembly):
> >
> > ERROR: No address lines detected in assembly capture, make sure your JDK is PrintAssembly-enabled:
> >     https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly
> >
> > Perf output processed (skipped 2.844 seconds):
> >  Column 1: cycles (12218 events)
> >  Column 2: instructions (12169 events)
> >
> > Hottest code regions (>10.00% "cycles" events):
> >
> > ....[Hottest Region 1]..............................................................................
> > perf-52432.map, [unknown] (177 bytes)
> >
> >  <no assembly is recorded, native region>
> > ....................................................................................................
> >  19.81%   11.78%  <total for region 1>
> >
> > ....[Hottest Region 2]..............................................................................
> > perf-52432.map, [unknown] (381 bytes)
> >
> >  <no assembly is recorded, native region>
> > ....................................................................................................
> >  15.03%   12.21%  <total for region 2>
> >
> > ....[Hottest Region 3]..............................................................................
> > perf-52432.map, [unknown] (138 bytes)
> >
> >  <no assembly is recorded, native region>
> > ....................................................................................................
> >  10.38%    6.35%  <total for region 3>
> >
> > ....[Hottest Regions]...............................................................................
> >  19.81%   11.78%      perf-52432.map  [unknown] (177 bytes)
> >  15.03%   12.21%      perf-52432.map  [unknown] (381 bytes)
> >  10.38%    6.35%      perf-52432.map  [unknown] (138 bytes)
> >   9.82%   37.09%      perf-52432.map  [unknown] (447 bytes)
> >   8.22%    2.47%      perf-52432.map  [unknown] (72 bytes)
> >   7.89%    1.69%      perf-52432.map  [unknown] (28 bytes)
> >   7.65%    1.69%      perf-52432.map  [unknown] (33 bytes)
> >   5.20%    2.94%      perf-52432.map  [unknown] (173 bytes)
> >   1.98%    1.59%      perf-52432.map  [unknown] (287 bytes)
> >   1.85%    4.54%      perf-52432.map  [unknown] (59 bytes)
> >   1.81%    4.48%      perf-52432.map  [unknown] (55 bytes)
> >   1.51%    0.96%      perf-52432.map  [unknown] (116 bytes)
> >   1.47%    1.83%      perf-52432.map  [unknown] (71 bytes)
> >   1.26%    1.25%              kernel  [unknown] (2 bytes)
> >   1.15%    0.53%      perf-52432.map  [unknown] (95 bytes)
> >   0.89%    0.40%      perf-52432.map  [unknown] (75 bytes)
> >   0.56%    0.05%              kernel  [unknown] (0 bytes)
> >   0.53%    2.34%      perf-52432.map  [unknown] (92 bytes)
> >   0.45%    1.16%      perf-52432.map  [unknown] (8 bytes)
> >   0.44%    2.47%      perf-52432.map  [unknown] (8 bytes)
> >   2.11%    2.14%  <...other 199 warm regions...>
> > ....................................................................................................
> > 100.00%   99.99%  <totals>
> >
> > ....[Hottest Methods (after inlining)]..............................................................
> >  96.95%   97.51%      perf-52432.map  [unknown]
> >   2.76%    2.10%              kernel  [unknown]
> >   0.03%    0.07%           libjvm.so  fileStream::write
> >   0.02%    0.01%        libc-2.12.so  __strlen_sse42
> >   0.02%                 libc-2.12.so  _IO_file_xsputn@@GLIBC_2.2.5
> >   0.02%                 libc-2.12.so  __printf_fp
> >   0.01%                    libjvm.so  CompileBroker::set_last_compile
> >   0.01%                    libjvm.so  CodeCache::allocate
> >   0.01%           libpthread-2.12.so  pthread_mutex_unlock
> >   0.01%                    libjvm.so  os::set_priority
> >   0.01%                    libjvm.so  DebugInformationRecorder::find_sharable_decode_offset
> >   0.01%           libpthread-2.12.so  pthread_cond_wait@@GLIBC_2.3.2
> >   0.01%                    libjvm.so  CompileBroker::invoke_compiler_on_method
> >   0.01%                    libjvm.so  ciEnv::get_klass_by_index_impl
> >   0.01%    0.01%           libjvm.so  PhiResolverState::reset
> >   0.01%                    libjvm.so  CompilerOracle::should_exclude
> >   0.01%                    libjvm.so  CompilerOracle::has_option_string
> >   0.01%                    libjvm.so  LinearScan::compute_local_live_sets
> >   0.01%                    libjvm.so  OptoRuntime::new_instance_C
> >   0.01%                    libjvm.so  ChunkPool::allocate
> >   0.10%    0.02%  <...other 12 warm methods...>
> > ....................................................................................................
> > 100.00%   99.71%  <totals>
> >
> > ....[Distribution by Source]........................................................................
> >  96.95%   97.51%      perf-52432.map
> >   2.76%    2.10%              kernel
> >   0.22%    0.31%           libjvm.so
> >   0.05%    0.06%        libc-2.12.so
> >   0.02%           libpthread-2.12.so
> > ....................................................................................................
> > 100.00%   99.99%  <totals>
> >
> > But on OSX when I do
> >
> > java -jar target/microbenchmarks.jar -f 1 --prof=dtraceasm org.jctools.maps.nhbm_test.jmh.ConcurrentMapThroughput
> >
> > I get:
> >
> > PrintAssembly processed: 193901 total address lines.
> > Perf output processed (skipped 6.097 seconds):
> >  Column 1: sampled_pc (0 events)
> >
> > WARNING: No hottest code region above the threshold (10.00%) for disassembly.
> > Use "hotThreshold" profiler option to lower the filter threshold.
> >
> > ....[Hottest Regions]...............................................................................
> > ....................................................................................................
> >          <totals>
> >
> > ....[Hottest Methods (after inlining)]..............................................................
> > ....................................................................................................
> >          <totals>
> >
> > ....[Distribution by Source]........................................................................
> > ....................................................................................................
> >          <totals>
> >
> > WARNING: The perf event count is suspiciously low (0). The performance data might be
> > inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
> >
> > Which seems pretty empty.
> >
> > Henri
> >
> > On 27 December 2017 at 09:56, Henri Tremblay <henri.tremblay at gmail.com>
> > wrote:
> >
> >> No. One was Linux (perf), the other was OSX (dtrace). Let me put the
> >> benchmark out.
> >>
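Since the two runs in this thread used different profilers on different operating systems, a launcher script can pick the profiler name from the platform. The following is a minimal sketch under stated assumptions: the class AsmProfilerChoice and its method are hypothetical, only the two profiler names mentioned in this thread (perfasm and dtraceasm) are covered, and any other platform is rejected.

```java
import java.util.Locale;

// Hypothetical helper, not part of JMH.
class AsmProfilerChoice {

    // Maps an "os.name" value to the asm profiler name used in this thread.
    static String profilerFor(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            return "dtraceasm"; // DTrace-based, Mac OS X
        }
        if (os.contains("linux")) {
            return "perfasm";   // perf-based, Linux
        }
        throw new IllegalArgumentException("No asm profiler known for: " + osName);
    }

    public static void main(String[] args) {
        String profiler = profilerFor(System.getProperty("os.name"));
        // Prints the command line in the same form as used earlier in the thread.
        System.out.println("java -jar target/microbenchmarks.jar -f 1 --prof=" + profiler);
    }
}
```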
> >> On 26 December 2017 at 14:19, Vsevolod Tolstopyatov <qwwdfsad at gmail.com>
> >> wrote:
> >>
> >>> Hi, could you share your benchmark?
> >>> I've just re-applied my patch over a clean repo and
> >>> ran JMHSample_37_CacheAccess with the dtrace profiler. Everything works
> >>> as expected, so maybe your hottest region lies in kernel code.
> >>>
> >>> >With perf, I would get some content. With dtrace, nothing.
> >>> Are you running both on Linux?
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Tolstopyatov Vsevolod
> >>>
> >>> On Wed, Dec 13, 2017 at 7:25 PM, Henri Tremblay <
> >>> henri.tremblay at gmail.com> wrote:
> >>>
> >>>> A bit late but my only problem right now is that I don't get any hot
> >>>> section. Which is weird.
> >>>>
> >>>> With perf, I would get some content. With dtrace, nothing.
> >>>>
> >>>> However, I am not an expert in using either. So maybe some javac or java
> >>>> arguments are required to get nice results. Is that the case?
> >>>>
> >>>> Thanks,
> >>>> Henri
> >>>>
> >>>> On 23 November 2017 at 13:04, Aleksey Shipilev <shade at redhat.com>
> >>>> wrote:
> >>>>
> >>>>> On 11/23/2017 09:09 AM, Vsevolod Tolstopyatov wrote:
> >>>>> > Hello,
> >>>>> >
> >>>>> > Any news about this patch? Is it going into jmh?
> >>>>>
> >>>>> It will. Just let me figure out some Mac testing.
> >>>>>
> >>>>> -Aleksey
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>


More information about the jmh-dev mailing list