From qwwdfsad at gmail.com Sun Jan 14 15:24:09 2018
From: qwwdfsad at gmail.com (Vsevolod Tolstopyatov)
Date: Sun, 14 Jan 2018 18:24:09 +0300
Subject: DTrace asm profiler for Mac OS X
In-Reply-To:
References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com>
Message-ID:

Hi, it took me a while to reproduce your problem.

The problem lies in the Mac OS X version (everything after El Capitan) and
System Integrity Protection (SIP).
Usually DTrace works as intended, but on newer OS versions it requires
additional privileges. In such cases, if you run DTrace manually you should
see something like "dtrace cannot control executables signed with
restricted entitlements" [1]
The only possible solution is to disable SIP [2]

I have limited access to different versions of Mac OS X, but it seems that
in some minor updates DTrace works with SIP enabled.
As a solution, I'd suggest checking the SIP status on profiler start (via
"csrutil status") and printing a warning if it's enabled, or just clarifying
it in the javadoc. It's up to Aleksey to decide which approach is preferable
in JMH.

[1] https://news.ycombinator.com/item?id=10790127
[2] http://osxdaily.com/2015/10/05/disable-rootless-system-integrity-protection-mac-os-x/

--
Best regards,
Tolstopyatov Vsevolod

On Thu, Dec 28, 2017 at 10:35 PM, Henri Tremblay wrote:

> I am far far far from being an expert here so I'm pretty sure you will
> throw some stupid mistake in my face but here it goes.
>
> You can use https://github.com/JCTools/JCTools/tree/master/jctools-benchmarks.
>
> I did on Linux:
> java -jar target/microbenchmarks.jar -f 1 --prof=perfasm
> org.jctools.maps.nhbm_test.jmh.ConcurrentMapThroughput
>
> And got (yes, with an error on PrintAssembly):
>
> ERROR: No address lines detected in assembly capture, make sure your JDK
> is PrintAssembly-enabled:
> https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly
>
> Perf output processed (skipped 2.844 seconds):
> Column 1: cycles (12218 events)
> Column 2: instructions (12169 events)
>
> Hottest code regions (>10.00% "cycles" events):
>
> ....[Hottest Region 1]............................
> ..................................................
> perf-52432.map, [unknown] (177 bytes)
>
>
> ............................................................
> ........................................
> 19.81% 11.78%
>
> ....[Hottest Region 2]............................
> ..................................................
> perf-52432.map, [unknown] (381 bytes)
>
>
> ............................................................
> ........................................
> 15.03% 12.21%
>
> ....[Hottest Region 3]............................
> ..................................................
> perf-52432.map, [unknown] (138 bytes)
>
>
> ............................................................
> ........................................
> 10.38% 6.35%
>
> ....[Hottest Regions]....................................................
> ...........................
> 19.81% 11.78% perf-52432.map [unknown] (177 bytes)
> 15.03% 12.21% perf-52432.map [unknown] (381 bytes)
> 10.38% 6.35% perf-52432.map [unknown] (138 bytes)
> 9.82% 37.09% perf-52432.map [unknown] (447 bytes)
> 8.22% 2.47% perf-52432.map [unknown] (72 bytes)
> 7.89% 1.69% perf-52432.map [unknown] (28 bytes)
> 7.65% 1.69% perf-52432.map [unknown] (33 bytes)
> 5.20% 2.94% perf-52432.map [unknown] (173 bytes)
> 1.98% 1.59% perf-52432.map [unknown] (287 bytes)
> 1.85% 4.54% perf-52432.map [unknown] (59 bytes)
> 1.81% 4.48% perf-52432.map [unknown] (55 bytes)
> 1.51% 0.96% perf-52432.map [unknown] (116 bytes)
> 1.47% 1.83% perf-52432.map [unknown] (71 bytes)
> 1.26% 1.25% kernel [unknown] (2 bytes)
> 1.15% 0.53% perf-52432.map [unknown] (95 bytes)
> 0.89% 0.40% perf-52432.map [unknown] (75 bytes)
> 0.56% 0.05% kernel [unknown] (0 bytes)
> 0.53% 2.34% perf-52432.map [unknown] (92 bytes)
> 0.45% 1.16% perf-52432.map [unknown] (8 bytes)
> 0.44% 2.47% perf-52432.map [unknown] (8 bytes)
> 2.11% 2.14% <...other 199 warm regions...>
> ............................................................
> ........................................
> 100.00% 99.99%
>
> ....[Hottest Methods (after inlining)]....................
> ..........................................
> 96.95% 97.51% perf-52432.map [unknown]
> 2.76% 2.10% kernel [unknown]
> 0.03% 0.07% libjvm.so fileStream::write
> 0.02% 0.01% libc-2.12.so __strlen_sse42
> 0.02% libc-2.12.so _IO_file_xsputn@@GLIBC_2.2.5
> 0.02% libc-2.12.so __printf_fp
> 0.01% libjvm.so CompileBroker::set_last_compile
> 0.01% libjvm.so CodeCache::allocate
> 0.01% libpthread-2.12.so pthread_mutex_unlock
> 0.01% libjvm.so os::set_priority
> 0.01% libjvm.so DebugInformationRecorder::find_sharable_decode_offset
> 0.01% libpthread-2.12.so pthread_cond_wait@@GLIBC_2.3.2
> 0.01% libjvm.so CompileBroker::invoke_compiler_on_method
> 0.01% libjvm.so ciEnv::get_klass_by_index_impl
> 0.01% 0.01% libjvm.so PhiResolverState::reset
> 0.01% libjvm.so CompilerOracle::should_exclude
> 0.01% libjvm.so CompilerOracle::has_option_string
> 0.01% libjvm.so LinearScan::compute_local_live_sets
> 0.01% libjvm.so OptoRuntime::new_instance_C
> 0.01% libjvm.so ChunkPool::allocate
> 0.10% 0.02% <...other 12 warm methods...>
> ............................................................
> ........................................
> 100.00% 99.71%
>
> ....[Distribution by Source].......................
> .................................................
> 96.95% 97.51% perf-52432.map
> 2.76% 2.10% kernel
> 0.22% 0.31% libjvm.so
> 0.05% 0.06% libc-2.12.so
> 0.02% libpthread-2.12.so
> ............................................................
> ........................................
> 100.00% 99.99%
>
> But on OSX when I do
>
> java -jar target/microbenchmarks.jar -f 1 --prof=dtraceasm
> org.jctools.maps.nhbm_test.jmh.ConcurrentMapThroughput
>
> I get:
>
> PrintAssembly processed: 193901 total address lines.
> Perf output processed (skipped 6.097 seconds):
> Column 1: sampled_pc (0 events)
>
> WARNING: No hottest code region above the threshold (10.00%) for
> disassembly.
> Use "hotThreshold" profiler option to lower the filter threshold.
>
> ....[Hottest Regions]....................................................
> ...........................
> ............................................................
> ........................................
>
>
> ....[Hottest Methods (after inlining)]....................
> ..........................................
> ............................................................
> ........................................
>
>
> ....[Distribution by Source].......................
> .................................................
> ............................................................
> ........................................
>
>
> WARNING: The perf event count is suspiciously low (0). The performance
> data might be
> inaccurate or misleading. Try to do the profiling again, or tune up the
> sampling frequency.
>
> Which seem pretty empty.
>
> Henri
>
> On 27 December 2017 at 09:56, Henri Tremblay
> wrote:
>
>> No. One was Linux (perf), the other was OSX (dtrace). Let me put the
>> benchmark out.
>>
>> On 26 December 2017 at 14:19, Vsevolod Tolstopyatov
>> wrote:
>>
>>> Hi, could you share your benchmark?
>>> I've just re-applied my patch over clean repo and
>>> run JMHSample_37_CacheAccess with dtrace-profiler, everything works as
>>> expected, so maybe your hottest region lies in kernel code.
>>>
>>> >With perf, I would get some content. With dtrace, nothing.
>>> Are you running both on Linux?
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Tolstopyatov Vsevolod
>>>
>>> On Wed, Dec 13, 2017 at 7:25 PM, Henri Tremblay <
>>> henri.tremblay at gmail.com> wrote:
>>>
>>>> A bit late but my only problem right now is that I don't get any hot
>>>> section. Which is weird.
>>>>
>>>> With perf, I would get some content. With dtrace, nothing.
>>>>
>>>> However, I am not an expert in using both. So maybe some javac or java
>>>> arguments are required to get nice results. Is it the case?
>>>>
>>>> Thanks,
>>>> Henri
>>>>
>>>> On 23 November 2017 at 13:04, Aleksey Shipilev
>>>> wrote:
>>>>
>>>>> On 11/23/2017 09:09 AM, Vsevolod Tolstopyatov wrote:
>>>>> > Hello,
>>>>> >
>>>>> > Any news about this patch? Is it going into jmh?
>>>>>
>>>>> It will. Just let me figure out some Mac testing.
>>>>>
>>>>> -Aleksey
>>>>>
>>>>>
>>>>
>>>
>>
>

From sergei.tsypanov at yandex.ru Sun Jan 14 18:24:19 2018
From: sergei.tsypanov at yandex.ru (Сергей Цыпанов)
Date: Sun, 14 Jan 2018 20:24:19 +0200
Subject: OOME in simple benchmark
Message-ID: <2991741515954259@web12j.yandex.ru>

I execute the following benchmark:

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class AddIntoArrayListBenchmark {

        @Benchmark
        public boolean add(Data data) {
            return data.list.add(data.integer);
        }

        @State(Scope.Thread)
        public static class Data {
            private Integer integer = 1;
            private ArrayList<Integer> list;

            @Setup
            public void setup() {
                list = new ArrayList<>();
            }
        }
    }

with this runner:

    public class BenchmarkRunner {

        public static void main(String[] args) throws RunnerException {
            Options opt = new OptionsBuilder()
                    .warmupIterations(10)
                    .measurementIterations(20)
                    .forks(1)
                    .shouldFailOnError(true)
                    .build();

            new Runner(opt).run();
        }
    }

and on the 2nd or 3rd iteration I constantly get an OOME.

    java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3688)
        at java.base/java.util.ArrayList.grow(ArrayList.java:236)
        at java.base/java.util.ArrayList.grow(ArrayList.java:241)
        at java.base/java.util.ArrayList.add(ArrayList.java:466)
        at java.base/java.util.ArrayList.add(ArrayList.java:479)
        at com.luxoft.logeek.benchmark.AddIntoArrayListBenchmark.add(AddIntoArrayListBenchmark.java:16)
        at com.luxoft.logeek.benchmark.generated.AddIntoArrayListBenchmark_add_jmhTest.add_avgt_jmhStub(AddIntoArrayListBenchmark_add_jmhTest.java:191)
        at com.luxoft.logeek.benchmark.generated.AddIntoArrayListBenchmark_add_jmhTest.add_AverageTime(AddIntoArrayListBenchmark_add_jmhTest.java:154)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Here is the run time output:

    # JMH version: 1.19
    # VM version: JDK 9, VM 9+181
    # VM invoker: G:\jdk\jdk-9\bin\java.exe
    # VM options: -Dvisualvm.id=3867492965950 -javaagent:G:\idea\IntelliJ IDEA 173.2463.16\lib\idea_rt.jar=56255:G:\idea\IntelliJ IDEA 173.2463.16\bin -Dfile.encoding=UTF-8
    # Warmup: 10 iterations, 1 s each
    # Measurement: 20 iterations, 1 s each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: com.luxoft.logeek.benchmark.AddIntoArrayListBenchmark.add

As I understand it, the benchmark method 'add' should be executed 30 times (10 for warmup and 20 for measurement), but for some reason it seems to be executed in an infinite loop, causing the error.

Using @Setup(Level.Iteration) solves the issue, but I wonder what's behind it?

I've looked into the code of the generated class AddIntoArrayListBenchmark_add_jmhTest and it seems the only difference is the usage of

    AddIntoArrayListBenchmark_Data_jmhType l_data1_1 = _jmh_tryInit_f_data1_1(control);

When plain @Setup is used, l_data1_1 is instantiated by calling setup() and then passed to the benchmark method.
For @Setup(Level.Iteration), l_data1_1.setup() is called after control.preSetup(), which seems logical. Since I have only one thread and one fork, l_data1_1.setup() is executed only once for both Level.Trial and Level.Iteration in my case.

I do not see any other differences in the code of AddIntoArrayListBenchmark_add_jmhTest, but the behaviour is different. What am I doing wrong?

From sitnikov.vladimir at gmail.com Sun Jan 14 18:41:29 2018
From: sitnikov.vladimir at gmail.com (Vladimir Sitnikov)
Date: Sun, 14 Jan 2018 18:41:29 +0000
Subject: OOME in simple benchmark
In-Reply-To: <2991741515954259@web12j.yandex.ru>
References: <2991741515954259@web12j.yandex.ru>
Message-ID:

Сергей> .measurementIterations(20)

Suppose you add .measurementTime(TimeValue.seconds(1)) for clarity.
Then measurementIterations(20) would mean JMH would perform 20 loops of
1-second-long tests.
JMH would perform as many executions of the benchmark method as it can
during those 1-second-long tests.

That is why you are getting OOM.

If you want to call the benchmark method 20 times only, then you need
something like @BenchmarkMode(Mode.SingleShotTime)

Does that make sense?

Vladimir

From ecki at zusammenkunft.net Sun Jan 14 19:41:01 2018
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Sun, 14 Jan 2018 20:41:01 +0100
Subject: OOME in simple benchmark
In-Reply-To: <2991741515954259@web12j.yandex.ru>
References: <2991741515954259@web12j.yandex.ru>
Message-ID: <5a5bb24d.12badf0a.57715.ace9@mx.google.com>

Iteration and Warmup are not "one shot"; each runs for as long as configured (30s in total, resulting in millions of executions). I don't think there would be any useful results if you measured such a fast activity only 30 times.

Maybe clear the list when it reaches a specific size?

Regards,
Bernd
--
http://bernd.eckenfels.net

From anuraaga at gmail.com Tue Jan 16 10:54:15 2018
From: anuraaga at gmail.com (Anuraag Agrawal)
Date: Tue, 16 Jan 2018 19:54:15 +0900
Subject: [PATCH] Support enum params that override Object#toString()
In-Reply-To:
References:
Message-ID:

Hello - my CLA has been processed so I guess the patch is ready for a look.
Can anyone help with this?

Thanks,
- Anuraag

On Tue, Dec 26, 2017 at 7:15 PM, Anuraag Agrawal wrote:

> Hi all,
>
> As discussed in http://mail.openjdk.java.net/pipermail/jmh-dev/2017-December/002678.html
> currently JMH does not handle enum params where the
> enum overrides toString when using the reflection or ASM generators. This
> is because the spec explicitly allows enums to override toString() to
> something that doesn't match the descriptor's name, but restoring the enum
> using valueOf can only happen with the descriptor's name.
>
> https://docs.oracle.com/javase/7/docs/api/java/lang/Enum.html#toString()
>
> This patch changes use of toString() to name(), which is always guaranteed
> to be the descriptor's name and round-trippable.
>
> https://docs.oracle.com/javase/7/docs/api/java/lang/Enum.html#name()
>
> The patch has been attached and inlined below, including a regression test
> which fails without the logic change.
>
> Note, I just signed and emailed the Oracle CLA, so it may take some time
> for that to get processed.
>
> Thanks. Patch follows
>
> # HG changeset patch
> # User Anuraag Agrawal
> # Date 1512560148 -32400
> # Wed Dec 06 20:35:48 2017 +0900
> # Node ID 4eefe5751280399fdbde2df655d2dcf43f2de557
> # Parent 1ddf31f810a3100b9433c3fedf24615e85b1d1a7
> Use Enum.name() instead of Enum.toString() for determining the string
> value of enum params. Only Enum.name() is guaranteed to round-trip through
> Enum.valueOf during execution, the specification explicitly allows
> overriding toString() to something that will not round-trip.
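
The round-trip problem in that changeset description can be seen without JMH at all. Here is a minimal sketch, assuming a hypothetical enum shaped like the SampleEnumA used in the regression test; EnumRoundTrip, SampleEnum and roundTrips are illustrative names only and are not part of JMH or of the patch:

```java
import java.util.Locale;

final class EnumRoundTrip {

    // Hypothetical enum that overrides toString(), as the spec allows.
    enum SampleEnum {
        VALUE_A, VALUE_B, VALUE_C;

        @Override
        public String toString() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    // Returns true if Enum.valueOf can restore a SampleEnum constant from the string.
    static boolean roundTrips(String s) {
        try {
            SampleEnum.valueOf(s);
            return true;
        } catch (IllegalArgumentException e) {
            // valueOf only accepts the exact declared constant name
            return false;
        }
    }

    public static void main(String[] args) {
        for (SampleEnum e : SampleEnum.values()) {
            System.out.println("name()=" + e.name() + " round-trips: " + roundTrips(e.name())
                    + "; toString()=" + e + " round-trips: " + roundTrips(e.toString()));
        }
    }
}
```

Under these assumptions, only the string taken from name() can be handed back to valueOf, which seems to be the reason the generators need name() when they record the possible values of an enum @Param.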
>
> diff --git a/jmh-core-it/src/test/java/org/openjdk/jmh/it/params/EnumParamSequenceTest.java b/jmh-core-it/src/test/java/org/openjdk/jmh/it/params/EnumParamToStringOverridingTest.java
> copy from jmh-core-it/src/test/java/org/openjdk/jmh/it/params/EnumParamSequenceTest.java
> copy to jmh-core-it/src/test/java/org/openjdk/jmh/it/params/EnumParamToStringOverridingTest.java
> --- a/jmh-core-it/src/test/java/org/openjdk/jmh/it/params/EnumParamSequenceTest.java
> +++ b/jmh-core-it/src/test/java/org/openjdk/jmh/it/params/EnumParamToStringOverridingTest.java
> @@ -45,56 +45,32 @@
>  @Warmup(iterations = 1, time = 100, timeUnit = TimeUnit.MICROSECONDS)
>  @Fork(1)
>  @State(Scope.Thread)
> -public class EnumParamSequenceTest {
> +public class EnumParamToStringOverridingTest {
>
>      @Param({"VALUE_A", "VALUE_B", "VALUE_C"})
>      public SampleEnumA a;
>
> -    @Param({"VALUE_A", "VALUE_B", "VALUE_C"})
> -    public SampleEnumB b;
> -
>      @Benchmark
>      public void test() {
>          Fixtures.work();
>      }
>
>      @Test
> -    public void full() throws RunnerException {
> +    public void normal() throws RunnerException {
>          Options opts = new OptionsBuilder()
>                  .include(Fixtures.getTestMask(this.getClass()))
>                  .shouldFailOnError(true)
>                  .build();
>
> -        Assert.assertEquals(3 * 3, new Runner(opts).run().size());
> -    }
> -
> -    @Test
> -    public void constrainedA() throws RunnerException {
> -        Options opts = new OptionsBuilder()
> -                .include(Fixtures.getTestMask(this.getClass()))
> -                .shouldFailOnError(true)
> -                .param("a", SampleEnumA.VALUE_A.name())
> -                .build();
> -
> -        Assert.assertEquals(1 * 3, new Runner(opts).run().size());
> -    }
> -
> -    @Test
> -    public void constrainedB() throws RunnerException {
> -        Options opts = new OptionsBuilder()
> -                .include(Fixtures.getTestMask(this.getClass()))
> -                .shouldFailOnError(true)
> -                .param("b", SampleEnumB.VALUE_A.name())
> -                .build();
> -
> -        Assert.assertEquals(1*3, new Runner(opts).run().size());
> +        Assert.assertEquals(3, new Runner(opts).run().size());
>      }
>
>      public enum SampleEnumA {
> -        VALUE_A, VALUE_B, VALUE_C
> -    }
> +        VALUE_A, VALUE_B, VALUE_C;
>
> -    public enum SampleEnumB {
> -        VALUE_A, VALUE_B, VALUE_C
> +        @Override
> +        public String toString() {
> +            return name().toLowerCase();
> +        }
>      }
>  }
> diff --git a/jmh-generator-asm/src/main/java/org/openjdk/jmh/generators/asm/ASMClassInfo.java b/jmh-generator-asm/src/main/java/org/openjdk/jmh/generators/asm/ASMClassInfo.java
> --- a/jmh-generator-asm/src/main/java/org/openjdk/jmh/generators/asm/ASMClassInfo.java
> +++ b/jmh-generator-asm/src/main/java/org/openjdk/jmh/generators/asm/ASMClassInfo.java
> @@ -221,7 +221,7 @@
>          try {
>              Collection res = new ArrayList<>();
>              for (Object cnst : Class.forName(origQualifiedName, false, Thread.currentThread().getContextClassLoader()).getEnumConstants()) {
> -                res.add(cnst.toString());
> +                res.add(((Enum) cnst).name());
>              }
>              return res;
>          } catch (ClassNotFoundException e) {
> diff --git a/jmh-generator-reflection/src/main/java/org/openjdk/jmh/generators/reflection/RFClassInfo.java b/jmh-generator-reflection/src/main/java/org/openjdk/jmh/generators/reflection/RFClassInfo.java
> --- a/jmh-generator-reflection/src/main/java/org/openjdk/jmh/generators/reflection/RFClassInfo.java
> +++ b/jmh-generator-reflection/src/main/java/org/openjdk/jmh/generators/reflection/RFClassInfo.java
> @@ -162,7 +162,7 @@
>      public Collection getEnumConstants() {
>          Collection res = new ArrayList<>();
>          for (Object cnst : klass.getEnumConstants()) {
> -            res.add(cnst.toString());
> +            res.add(((Enum) cnst).name());
>          }
>          return res;
>      }
>
> - Anuraag
>

From ashipile at redhat.com Tue Jan 16 11:20:17 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Tue, 16 Jan 2018 11:20:17 +0000
Subject: hg: code-tools/jmh: 7902095: Support enum params that override Object#toString()
Message-ID: <201801161120.w0GBKHIY024771@aojmv0008.oracle.com>

Changeset: a0c4f5e23278
Author: shade
Date:
2018-01-16 12:13 +0100
URL: http://hg.openjdk.java.net/code-tools/jmh/rev/a0c4f5e23278

7902095: Support enum params that override Object#toString()
Contributed-by: Anuraag Agrawal

+ jmh-core-it/src/test/java/org/openjdk/jmh/it/params/EnumParamToStringOverridingTest.java
! jmh-generator-asm/src/main/java/org/openjdk/jmh/generators/asm/ASMClassInfo.java
! jmh-generator-reflection/src/main/java/org/openjdk/jmh/generators/reflection/RFClassInfo.java

From shade at redhat.com Tue Jan 16 11:16:44 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 12:16:44 +0100
Subject: [PATCH] Support enum params that override Object#toString()
In-Reply-To:
References:
Message-ID: <1cf5a6f7-33bd-a599-df60-9fc6ca7627a5@redhat.com>

On 01/16/2018 11:54 AM, Anuraag Agrawal wrote:
> Hello - my CLA has been processed so I guess the patch is ready for a look. Can anyone help with this?

Pushed as:
https://bugs.openjdk.java.net/browse/CODETOOLS-7902095

Thank you,
-Aleksey

From leventov.ru at gmail.com Tue Jan 16 15:50:50 2018
From: leventov.ru at gmail.com (Roman Leventov)
Date: Tue, 16 Jan 2018 16:50:50 +0100
Subject: DTrace asm profiler for Mac OS X
In-Reply-To:
References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com>
Message-ID:

Vsevolod, thanks for this contribution, it works like a charm.

From alex.averbuch at neo4j.com Tue Jan 16 21:06:19 2018
From: alex.averbuch at neo4j.com (Alex Averbuch)
Date: Tue, 16 Jan 2018 21:06:19 -0000
Subject: beforeTrial/afterTrial not called for warmup forks
In-Reply-To: <0a8dac2b-5958-2d7c-90b6-925ff2aec97c@redhat.com>
References: <0a8dac2b-5958-2d7c-90b6-925ff2aec97c@redhat.com>
Message-ID:

Thanks, Aleksey.

On Thu, Dec 21, 2017 at 8:23 AM, Aleksey Shipilev wrote:

> Hi,
>
> On 12/15/2017 10:21 AM, Alex Averbuch wrote:
> > I've noticed that ExternalProfiler#addJVMInvokeOptions &
> > ExternalProfiler#addJVMOptions are invoked for warmup forks, but that
> > ExternalProfiler#beforeTrial & ExternalProfiler#afterTrial are not.
> >
> > On first impression it feels inconsistent.
> > Is this intended behavior?
>
> It is inconsistent. The intent was to ignore profiler results for warmup
> forks, and it was crudely
> implemented by doing the warmup forks separately, this is why
> {after,before}Trial is called only
> with measurement forks. But, JVM still forks with the added of
> ExternalProfiler JVM options during
> warmup forks anyway.
>
> It seems more consistent to run ExternalProfilers during warmup forks too,
> and just ignore their
> results. This is less surprising, and aligns better with what
> InternalProfilers are doing --
> internal profilers also run during warmups and their results ignored.
> > Fixed with: > https://bugs.openjdk.java.net/browse/CODETOOLS-7902088 > > Thanks, > -Aleksey > > > From osipov.av at gmail.com Wed Jan 17 03:21:16 2018 From: osipov.av at gmail.com (Alexei Osipov) Date: Wed, 17 Jan 2018 06:21:16 +0300 Subject: State setup/tearDown order guaranties Message-ID: <25269ef8-79c6-547e-f53f-0a46e02ed4ca@gmail.com> Hello, Are there any docs on execution order guaranties for "setup/tearDown" methods in @State objects with dependencies? I'm getting unexpected "setup" method execution order when I mix states with @State(Scope.Group) and @State(Scope.Thread) that use @Setup(Level.Iteration) on fixture methods. Test case: https://gist.github.com/alexei-osipov/1afd9c0482900b5f2b1248a3d99164d5 I'm getting a stable assertion error with it. The issue disappears when I change fixture levels to "Iteration". So I wonder if I'm doing something wrong or JMH just does not provide order guaranties for the case with @State(Scope.Thread) + @Setup(Level.Iteration). Any recommendations on what should I check? 
JMH: 1.19

Best regards,
Alexei Osipov

From sergei.tsypanov at yandex.ru Wed Jan 17 07:47:30 2018
From: sergei.tsypanov at yandex.ru (Сергей Цыпанов)
Date: Wed, 17 Jan 2018 09:47:30 +0200
Subject: Usage of Blackhole in a loop distorts benchmark results
Message-ID: <1432911516175250@web36o.yandex.ru>

Say I have this benchmark:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-XX:+UseParallelGC", "-Xms2g", "-Xmx2g"})
public class IteratorFromStreamBenchmark {

    @Benchmark
    public void iteratorFromStream(Data data, Blackhole bh) {
        Iterator<Integer> iterator = data.items.stream()
                .iterator();

        while (iterator.hasNext())
            bh.consume(iterator.next());
    }

    @Benchmark
    public void forEach(Data data, Blackhole bh) {
        data.items.stream().forEach(bh::consume);
    }

    @State(Scope.Thread)
    public static class Data {
        private Collection<Integer> items;

        private int size = 1000;

        @Setup
        public void init() {
            items = IntStream.range(0, size).boxed().collect(toList());
        }
    }
}

which on Java 9 (JDK 9, VM 9+181) yields this output:

Benchmark            Mode  Cnt     Score     Error  Units
forEach              avgt  100  6130,066 ± 308,597  ns/op
iteratorFromStream   avgt  100  4835,355 ±  57,886  ns/op

Here 'iteratorFromStream' appears to be faster than 'forEach'.

Then I change the behaviour to accumulate the result of iteration over elements and return it:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-XX:+UseParallelGC", "-Xms2g", "-Xmx2g"})
public class IteratorFromStreamBenchmark {

    @Benchmark
    public int iteratorFromStream(Data data) {
        int sum = 0;
        Iterator<Integer> iterator = data.list.stream()
                .iterator();

        while (iterator.hasNext())
            sum += iterator.next();
        return sum;
    }

    @Benchmark
    public int forEach(Data data) {
        int[] sum = {0};
        data.list.stream().forEach(integer -> sum[0] = sum[0] + integer);
        return sum[0];
    }

    @State(Scope.Thread)
    public static class Data {
        private List<Integer> list;

        private int size = 100;

        @Setup
        public void init() {
            list = IntStream.range(0, size).boxed().collect(toList());
        }
    }
}

Which yields:

Benchmark            Mode  Cnt    Score   Error  Units
forEach              avgt  100  133,118 ± 1,580  ns/op
iteratorFromStream   avgt  100  228,061 ± 5,491  ns/op

The question here is not only the huge difference in absolute values, but also the fact that 'forEach' now appears to be faster than 'iteratorFromStream'. The error is also lower in the value-returning benchmark.

Could anyone explain whether this is correct behaviour of Blackhole?

Best regards,
Sergei Tsypanov

From shade at redhat.com Wed Jan 17 09:21:33 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 10:21:33 +0100
Subject: State setup/tearDown order guaranties
In-Reply-To: <25269ef8-79c6-547e-f53f-0a46e02ed4ca@gmail.com>
References: <25269ef8-79c6-547e-f53f-0a46e02ed4ca@gmail.com>
Message-ID:

On 01/17/2018 04:21 AM, Alexei Osipov wrote:
> Are there any docs on execution order guaranties for "setup/tearDown" methods in @State objects with
> dependencies?
> [...]
> So I wonder if I'm doing something wrong or JMH just does not provide order guaranties for the case
> with @State(Scope.Thread) + @Setup(Level.Iteration). Any recommendations on what should I check?

You are not doing anything wrong: the @State dependencies should work as you would expect: all
states that are coming as arguments should have their helper methods executed. Except that there is
a JMH bug: it first runs all the helpers for Scope.Thread in DAG order, and then runs all the
helpers for Scope.{Benchmark|Iteration} in DAG order, which obviously breaks this assumption.

Let me handle this.

-Aleksey

From nitsanw at yahoo.com Wed Jan 17 10:40:12 2018
From: nitsanw at yahoo.com (Nitsan Wakart)
Date: Wed, 17 Jan 2018 10:40:12 +0000 (UTC)
Subject: Usage of Blackhole in a loop distorts benchmark results
In-Reply-To: <1432911516175250@web36o.yandex.ru>
References: <1432911516175250@web36o.yandex.ru>
Message-ID: <550179939.4914164.1516185612237@mail.yahoo.com>

See this post: http://psy-lob-saw.blogspot.com/2014/08/the-volatile-read-suprise.html
And the current BH code: http://hg.openjdk.java.net/code-tools/jmh/file/a0c4f5e23278/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l306
In summary, blackhole carries semantics of calling into a NOT-inlined method and a memory barrier. This is arguably heavy handed, but consider the formidable foe (the compiler) we are trying to fool with it. We are trying to prevent DCE due to unsunk values. It works, yay!
The benchmarks you compare with/out blackhole are therefore very different in meaning, and as such different results are not surprising.

On Wednesday, January 17, 2018 9:47 AM, Сергей Цыпанов wrote:

Say I have this benchmark:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-XX:+UseParallelGC", "-Xms2g", "-Xmx2g"})
public class IteratorFromStreamBenchmark {

    @Benchmark
    public void iteratorFromStream(Data data, Blackhole bh) {
        Iterator<Integer> iterator = data.items.stream()
                .iterator();

        while (iterator.hasNext())
            bh.consume(iterator.next());
    }

    @Benchmark
    public void forEach(Data data, Blackhole bh) {
        data.items.stream().forEach(bh::consume);
    }

    @State(Scope.Thread)
    public static class Data {
        private Collection<Integer> items;

        private int size = 1000;

        @Setup
        public void init() {
            items = IntStream.range(0, size).boxed().collect(toList());
        }
    }
}

which on Java 9 (JDK 9, VM 9+181) yields this output:

Benchmark            Mode  Cnt     Score     Error  Units
forEach              avgt  100  6130,066 ± 308,597  ns/op
iteratorFromStream   avgt  100  4835,355 ±  57,886  ns/op

Here 'iteratorFromStream' appears to be faster than 'forEach'.

Then I change the behaviour to accumulate the result of iteration over elements and return it:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-XX:+UseParallelGC", "-Xms2g", "-Xmx2g"})
public class IteratorFromStreamBenchmark {

    @Benchmark
    public int iteratorFromStream(Data data) {
        int sum = 0;
        Iterator<Integer> iterator = data.list.stream()
                .iterator();

        while (iterator.hasNext())
            sum += iterator.next();
        return sum;
    }

    @Benchmark
    public int forEach(Data data) {
        int[] sum = {0};
        data.list.stream().forEach(integer -> sum[0] = sum[0] + integer);
        return sum[0];
    }

    @State(Scope.Thread)
    public static class Data {
        private List<Integer> list;

        private int size = 100;

        @Setup
        public void init() {
            list = IntStream.range(0, size).boxed().collect(toList());
        }
    }
}

Which yields:

Benchmark            Mode  Cnt    Score   Error  Units
forEach              avgt  100  133,118 ± 1,580  ns/op
iteratorFromStream   avgt  100  228,061 ± 5,491  ns/op

The question here is not only the huge difference in absolute values, but also the fact that 'forEach' now appears to be faster than 'iteratorFromStream'. The error is also lower in the value-returning benchmark.

Could anyone explain whether this is correct behaviour of Blackhole?

Best regards,
Sergei Tsypanov

From shade at redhat.com Wed Jan 17 11:00:23 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 12:00:23 +0100
Subject: Usage of Blackhole in a loop distorts benchmark results
In-Reply-To: <550179939.4914164.1516185612237@mail.yahoo.com>
References: <1432911516175250@web36o.yandex.ru> <550179939.4914164.1516185612237@mail.yahoo.com>
Message-ID: <1e39f2be-f563-6e09-a007-759ba759411a@redhat.com>

On 01/17/2018 11:40 AM, Nitsan Wakart wrote:
> See this post: http://psy-lob-saw.blogspot.com/2014/08/the-volatile-read-suprise.html
> And the current BH code:
> http://hg.openjdk.java.net/code-tools/jmh/file/a0c4f5e23278/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l306
> In summary, blackhole carries semantics of calling into a NOT-inlined method and a memory barrier.
> This is arguably heavy handed, but consider the formidable foe (the compiler) we are trying to
> fool with it. We are trying to prevent DCE due to unsunk values. It works, yay!
> The benchmarks you compare with/out blackhole are therefore very different in meaning, and as such
> different results are not surprising.
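
The contrast in the quoted explanation can be modelled roughly in plain Java. This is only a sketch under loud assumptions: `Sink` is a hypothetical stand-in written for illustration and is not JMH's Blackhole code, and the class and method names (`SinkSketch`, `consumeEach`, `sumAndReturn`) are invented here. It shows the two loop shapes from the thread: one pays for a sink call with a volatile read on every element, the other only lets a single total leave the method.

```java
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SinkSketch {

    /** Hypothetical stand-in for a Blackhole-style sink; NOT the JMH implementation. */
    static final class Sink {
        private volatile int guard = 1; // read on every call, so each consume() pays for a volatile read
        private int lcg = 42;           // state of a cheap pseudo-random generator
        private Object escaped;         // occasionally stores the value so it stays reachable
        private long calls;

        void consume(Object value) {
            int g = guard;                    // volatile read
            lcg = lcg * 1664525 + 1013904223; // one generator step
            if ((lcg & g) == 0) {             // branch that lets the value escape
                escaped = value;
                guard = (g << 1) + 1;         // widen the mask so the branch gets rarer
            }
            calls++;
        }

        long calls() {
            return calls;
        }
    }

    /** Shape of the first benchmark: every element is handed to the sink. */
    static long consumeEach(List<Integer> items, Sink sink) {
        Iterator<Integer> it = items.iterator();
        while (it.hasNext()) {
            sink.consume(it.next());
        }
        return sink.calls();
    }

    /** Shape of the second benchmark: elements are summed and only the total leaves the method. */
    static int sumAndReturn(List<Integer> items) {
        int sum = 0;
        Iterator<Integer> it = items.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> items = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        System.out.println("sink calls: " + consumeEach(items, new Sink()));
        System.out.println("sum: " + sumAndReturn(items));
    }
}
```

The per-element version performs one sink call per item, while the summing version performs none inside the loop, so the two loops are not doing comparable work. The sketch says nothing about actual timings; those still have to be measured.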

What Nitsan said, and also this is a tangential topic for JMHSample_34_SafeLooping:
  http://hg.openjdk.java.net/code-tools/jmh/file/a0c4f5e23278/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_34_SafeLooping.java

Thanks,
-Aleksey

From ashipile at redhat.com Wed Jan 17 11:08:10 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 17 Jan 2018 11:08:10 +0000
Subject: hg: code-tools/jmh: 7902096: State order DAG does not work correctly with mixed Scope-d state objects
Message-ID: <201801171108.w0HB8A4w000589@aojmv0008.oracle.com>

Changeset: e953411d3d5e
Author:    shade
Date:      2018-01-17 11:53 +0100
URL:       http://hg.openjdk.java.net/code-tools/jmh/rev/e953411d3d5e

7902096: State order DAG does not work correctly with mixed Scope-d state objects

+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedBGTIterationDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedBGTTrialDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedBTGIterationDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedBTGTrialDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedGBTIterationDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedGBTTrialDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedGTBIterationDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedGTBTrialDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedTBGIterationDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedTBGTrialDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedTGBIterationDagOrderTest.java
+ jmh-core-it/src/test/java/org/openjdk/jmh/it/dagorder/MixedTGBTrialDagOrderTest.java
! jmh-core/src/main/java/org/openjdk/jmh/generators/core/StateObjectHandler.java

From shade at redhat.com Wed Jan 17 11:04:20 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 12:04:20 +0100
Subject: State setup/tearDown order guaranties
In-Reply-To:
References: <25269ef8-79c6-547e-f53f-0a46e02ed4ca@gmail.com>
Message-ID: <0e895ae1-9326-2305-2f9f-8804c3b8ffe5@redhat.com>

On 01/17/2018 10:21 AM, Aleksey Shipilev wrote:
> On 01/17/2018 04:21 AM, Alexei Osipov wrote:
>> Are there any docs on execution order guaranties for "setup/tearDown" methods in @State objects with
>> dependencies?
>> [...]
>> So I wonder if I'm doing something wrong or JMH just does not provide order guaranties for the case
>> with @State(Scope.Thread) + @Setup(Level.Iteration). Any recommendations on what should I check?
>
> You are not doing anything wrong: the @State dependencies should work as you would expect: all
> states that are coming as arguments should have their helper methods executed. Except that there is
> a JMH bug: it first runs all the helpers for Scope.Thread in DAG order, and then runs all the
> helpers for Scope.{Benchmark|Iteration} in DAG order, which obviously breaks this assumption.
>
> Let me handle this.

Fixed as:
  https://bugs.openjdk.java.net/browse/CODETOOLS-7902096

-Aleksey

From shade at redhat.com Wed Jan 17 12:20:47 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 13:20:47 +0100
Subject: DTrace asm profiler for Mac OS X
In-Reply-To:
References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com>
Message-ID: <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com>

On 01/14/2018 04:24 PM, Vsevolod Tolstopyatov wrote:
> I have limited access to different versions of Mac OS X, but it seems that in some minor updates
> DTrace works with SIP enabled.
> So as solution I'd suggest to check SIP status on profiler start (via "csrutil status") and print
> warning if it's enabled or just clarify it in javadoc.
> It's up to Alexey to decide what approach is
> preferable in JMH

So, perf* profilers print warning messages when the number of samples is suspiciously low. I think
the dtraceasm profiler should do the same, and clearly say what the user is supposed to do:

  WARNING: The perf event count is suspiciously low (" + sum + "). The performance data might be
  inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
  With some profilers on Mac OS X, System Integrity Protection (SIP) may prevent profiling.
  In such case, temporarily disabling SIP with 'csrutil disable' might help.

I'll add this to the patch myself.

I can push this without my own testing, hoping that external people validated this. OpenJDK rules
require the patch to be hosted on OpenJDK infra to get accepted, so, Vsevolod, can you please post
the recent version of the patch inline here?

Thanks
-Aleksey

From henri.tremblay at gmail.com Wed Jan 17 12:58:40 2018
From: henri.tremblay at gmail.com (Henri Tremblay)
Date: Wed, 17 Jan 2018 07:58:40 -0500
Subject: DTrace asm profiler for Mac OS X
In-Reply-To: <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com>
References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com>
Message-ID:

Well found! It seems much better.

However, it still finds no address lines. I thought JMH was adding the correct PrintAssembly flags when forking. But the issue seems to be similar to the warning I had for perfasm.

ERROR: No address lines detected in assembly capture, make sure your JDK is PrintAssembly-enabled:
    https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly

Perf output processed (skipped 2.531 seconds):
 Column 1: sampled_pc (11033 events)

Hottest code regions (>10.00% "sampled_pc" events):

....[Hottest Region 1]..............................................................................
[unknown], [unknown] (231 bytes)

....................................................................................................
 22.54%

....[Hottest Region 2]..............................................................................
[unknown], [unknown] (134 bytes)

....................................................................................................
 11.31%

....[Hottest Region 3]..............................................................................
[unknown], [unknown] (41 bytes)

....................................................................................................
 10.39%

....[Hottest Region 4]..............................................................................
[unknown], [unknown] (417 bytes)

....................................................................................................
 10.27%

....[Hottest Regions]...............................................................................
 22.54%  [unknown]  [unknown] (231 bytes)
 11.31%  [unknown]  [unknown] (134 bytes)
 10.39%  [unknown]  [unknown] (41 bytes)
 10.27%  [unknown]  [unknown] (417 bytes)
  8.45%  [unknown]  [unknown] (396 bytes)
  5.71%  [unknown]  [unknown] (43 bytes)
  5.66%  [unknown]  [unknown] (55 bytes)
  5.45%  [unknown]  [unknown] (84 bytes)
  4.17%  [unknown]  [unknown] (209 bytes)
  2.47%  [unknown]  [unknown] (71 bytes)
  2.12%  [unknown]  [unknown] (28 bytes)
  2.04%  [unknown]  [unknown] (28 bytes)
  1.63%  [unknown]  [unknown] (170 bytes)
  0.95%  [unknown]  [unknown] (86 bytes)
  0.89%  [unknown]  [unknown] (5 bytes)
  0.83%  [unknown]  [unknown] (127 bytes)
  0.81%  [unknown]  [unknown] (0 bytes)
  0.55%  [unknown]  [unknown] (46 bytes)
  0.44%  [unknown]  [unknown] (71 bytes)
  0.41%  [unknown]  [unknown] (257 bytes)
  2.91%  <...other 73 warm regions...>
....................................................................................................
 99.99%

....[Hottest Methods (after inlining)]..............................................................
 99.99%  [unknown]  [unknown]
....................................................................................................
 99.99%

....[Distribution by Source]........................................................................
 99.99%  [unknown]
....................................................................................................
 99.99%

# Run complete. Total time: 00:00:18

Benchmark                                        (implementation)    (readRatio)  (tableSize)   Mode  Cnt         Score         Error  Units
ConcurrentMapThroughput.randomGetPutRemove       NonBlockingHashMap           50       100000  thrpt    6  22470630.498 ± 3730613.043  ops/s
ConcurrentMapThroughput.randomGetPutRemove:·asm  NonBlockingHashMap           50       100000  thrpt                NaN                  ---
ConcurrentMapThroughput.randomGetPutRemove       ConcurrentHashMap            50       100000  thrpt    6  23411000.106 ± 1960758.197  ops/s
ConcurrentMapThroughput.randomGetPutRemove:·asm  ConcurrentHashMap            50       100000  thrpt                NaN                  ---

On 17 January 2018 at 07:20, Aleksey Shipilev wrote:

> On 01/14/2018 04:24 PM, Vsevolod Tolstopyatov wrote:
> > I have limited access to different versions of Mac OS X, but it seems that in some minor updates
> > DTrace works with SIP enabled.
> > So as solution I'd suggest to check SIP status on profiler start (via "csrutil status") and print
> > warning if it's enabled or just clarify it in javadoc. It's up to Alexey to decide what approach is
> > preferable in JMH
>
> So, perf* profilers print warning messages when the number of samples is suspiciously low. I think
> the dtraceasm profiler should do the same, and clearly say what the user is supposed to do:
>
>   WARNING: The perf event count is suspiciously low (" + sum + "). The performance data might be
>   inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
>   With some profilers on Mac OS X, System Integrity Protection (SIP) may prevent profiling.
>   In such case, temporarily disabling SIP with 'csrutil disable' might help.
>
> I'll add this to the patch myself.
>
> I can push this without my own testing, hoping that external people validated this. OpenJDK rules
> require the patch to be hosted on OpenJDK infra to get accepted, so, Vsevolod, can you please post
> the recent version of the patch inline here?
>
> Thanks
> -Aleksey
>
>

From sergei.tsypanov at yandex.ru Wed Jan 17 20:43:31 2018
From: sergei.tsypanov at yandex.ru (Сергей Цыпанов)
Date: Wed, 17 Jan 2018 22:43:31 +0200
Subject: Usage of Blackhole in a loop distorts benchmark results
In-Reply-To: <550179939.4914164.1516185612237@mail.yahoo.com>
References: <1432911516175250@web36o.yandex.ru> <550179939.4914164.1516185612237@mail.yahoo.com>
Message-ID: <526851516221811@web48j.yandex.ru>

@Nitsan

thank you for your explanation, I've read your article and found it quite helpful.

One thing, however, is odd to me: the different ratio between benchmarks. Assume volatile reads inside of Blackhole bring some steady impact onto benchmarks; then it should be the same for all iterations, right?
If this assumption is correct then the ratio must remain the same even though the absolute values differ. In my case the ratio is different.

@Alexey

1) JMH-generated code provides loops calling the @Benchmark-annotated method like this one:

-------
do {
    blackhole.consume(l_iteratorfromstreambenchmark0_0.iteratorFromCollectedList(l_data1_1));
    operations++;
} while(!control.isDone);
-------

Here we also have an instance of org.openjdk.jmh.infra.Blackhole swallowing the value returned from my method, and volatile-read related effects should take place for the preceding code just like for the 'goodOldLoopReturns' method in Nitsan's example, preventing DCE. But the results indicate the opposite.

2) As mentioned in the comment to JMHSample_34_SafeLooping.measureWrong_2 HotSpot does loop unrolling, so I pass -XX:LoopUnrollLimit=0 as an argument of @Fork.jvmArgsAppend over my "accumulating" benchmark. The result of 'iteratorFromStream' remains the same, but 'forEach' gets almost 3 times slower (740 -> 1971 ns). Both do looping, but unrolling affects only one of them. What's the matter?

Best regards!

From nitsanw at yahoo.com Wed Jan 17 21:07:01 2018
From: nitsanw at yahoo.com (Nitsan Wakart)
Date: Wed, 17 Jan 2018 23:07:01 +0200
Subject: Usage of Blackhole in a loop distorts benchmark results
In-Reply-To: <526851516221811@web48j.yandex.ru>
References: <1432911516175250@web36o.yandex.ru> <550179939.4914164.1516185612237@mail.yahoo.com> <526851516221811@web48j.yandex.ru>
Message-ID: <03B2F718-758C-440A-A8AA-1694521303A6@yahoo.com>

I appreciate your curiosity. Now, if you really want to know why things are the way they are, I suggest you do it by profiling A and B with -prof perfasm and keep digging ;-)
You have not run into an issue with JMH, but into the reality of the JVM.
Have fun!

> On 17 Jan 2018, at 22:43, Сергей Цыпанов wrote:
>
> @Nitsan
>
> thank you for your explanation, I've read your article and found it quite helpful.
>
> One thing, however, is odd to me: the different ratio between benchmarks. Assume volatile reads inside of Blackhole bring some steady impact onto benchmarks; then it should be the same for all iterations, right?
> If this assumption is correct then the ratio must remain the same even though the absolute values differ. In my case the ratio is different.
>
> @Alexey
>
> 1) JMH-generated code provides loops calling the @Benchmark-annotated method like this one:
>
> -------
> do {
>     blackhole.consume(l_iteratorfromstreambenchmark0_0.iteratorFromCollectedList(l_data1_1));
>     operations++;
> } while(!control.isDone);
> -------
>
> Here we also have an instance of org.openjdk.jmh.infra.Blackhole swallowing the value returned from my method, and volatile-read related effects should take place for the preceding code just like for the 'goodOldLoopReturns' method in Nitsan's example, preventing DCE. But the results indicate the opposite.
> > 2) As mentioned in the comment to JMHSample_34_SafeLooping.measureWrong_2 HotSpot does loop unrolling, so I pass -XX:LoopUnrollLimit=0 as an argument of @Fork.jvmAppendArgs over my "accumulating" benchmark. Result of 'iteratorFromStream' remains the same, but 'forEach' gets almost 3 times slower (740 -> 1971 ns). Both do looping, but unrolling affects only one of them. What's the matter? > > Best regards! > > > From qwwdfsad at gmail.com Sat Jan 20 19:52:41 2018 From: qwwdfsad at gmail.com (Vsevolod Tolstopyatov) Date: Sat, 20 Jan 2018 22:52:41 +0300 Subject: DTrace asm profiler for Mac OS X In-Reply-To: References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com> Message-ID: >Vsevolod, can you please post the recent version of the patch inline here? Here it is. Note that I've changed one line: "String[] splits = line.split(" ");" -> "String[] splits = line.split(" ", 5);" to properly handle native symbols (especially ones from libjvm.dylib). Retested again on JMH samples + VM-heavy benchmarks (String#intern, exceptions) diff -r 1ddf31f810a3 jmh-core/src/main/java/org/openjdk/jmh/profile/AbstractPerfAsmProfiler.java --- a/jmh-core/src/main/java/org/openjdk/jmh/profile/AbstractPerfAsmProfiler.java Fri Sep 22 18:11:47 2017 +0200 +++ b/jmh-core/src/main/java/org/openjdk/jmh/profile/AbstractPerfAsmProfiler.java Sat Jan 20 22:38:48 2018 +0300 @@ -270,7 +270,7 @@ @Override public Collection afterTrial(BenchmarkResult br, long pid, File stdOut, File stdErr) { - PerfResult result = processAssembly(br, stdOut, stdErr); + PerfResult result = processAssembly(br); // we know these are not needed anymore, proactively delete hsLog.delete(); @@ -311,7 +311,7 @@ */ protected abstract String perfBinaryExtension(); - private PerfResult processAssembly(BenchmarkResult br, File stdOut, File stdErr) { + private PerfResult processAssembly(BenchmarkResult br) { /** * 1. Parse binary events. 
*/ @@ -647,11 +647,11 @@ } } - void printDottedLine(PrintWriter pw) { + private void printDottedLine(PrintWriter pw) { printDottedLine(pw, null); } - void printDottedLine(PrintWriter pw, String header) { + private void printDottedLine(PrintWriter pw, String header) { final int HEADER_WIDTH = 100; pw.print("...."); @@ -668,7 +668,7 @@ pw.println(); } - List makeRegions(Assembly asms, PerfEvents events) { + private List makeRegions(Assembly asms, PerfEvents events) { List regions = new ArrayList<>(); SortedSet allAddrs = events.getAllAddresses(); @@ -728,7 +728,7 @@ return intervals; } - Collection> splitAssembly(File stdOut) { + private Collection> splitAssembly(File stdOut) { try (FileReader in = new FileReader(stdOut); BufferedReader br = new BufferedReader(in)) { Multimap writerToLines = new HashMultimap<>(); @@ -764,7 +764,7 @@ } } - Assembly readAssembly(File stdOut) { + private Assembly readAssembly(File stdOut) { List lines = new ArrayList<>(); SortedMap addressMap = new TreeMap<>(); @@ -919,11 +919,11 @@ static class PerfResultAggregator implements Aggregator { @Override public PerfResult aggregate(Collection results) { - String output = ""; + StringBuilder output = new StringBuilder(); for (PerfResult r : results) { - output += r.output; + output.append(r.output); } - return new PerfResult(output); + return new PerfResult(output.toString()); } } diff -r 1ddf31f810a3 jmh-core/src/main/java/org/openjdk/jmh/profile/DTraceAsmProfiler.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/jmh-core/src/main/java/org/openjdk/jmh/profile/DTraceAsmProfiler.java Sat Jan 20 22:38:48 2018 +0300 @@ -0,0 +1,194 @@ +package org.openjdk.jmh.profile; + +import joptsimple.OptionException; +import joptsimple.OptionParser; +import joptsimple.OptionSpec; +import org.openjdk.jmh.infra.BenchmarkParams; +import org.openjdk.jmh.results.BenchmarkResult; +import org.openjdk.jmh.results.Result; +import org.openjdk.jmh.util.*; + +import java.io.BufferedReader; +import java.io.File; 
+import java.io.FileReader; +import java.io.IOException; +import java.util.Collection; +import java.util.Collections; +import java.util.Map; +import java.util.TreeMap; +import java.util.concurrent.TimeUnit; + +/** + * Mac OS X perfasm profiler based on DTrace "profile-n" provider which samples program counter by timer interrupt. + * Due to DTrace limitations on Mac OS X target JVM cannot be run directly under DTrace control, so DTrace is run separately, + * all processes are sampled and irrelevant samples are filtered out in {@link #readEvents(double, double)} stage. + * Super user privileges are required in order to run DTrace. + *

+ * If you see a lot of "[unknown]" regions in profile then you are probably hitting kernel code, kernel sampling is not yet supported. + * + * @author Tolstopyatov Vsevolod + * @since 18/10/2017 + */ +public class DTraceAsmProfiler extends AbstractPerfAsmProfiler { + + private final long sampleFrequency; + private volatile String pid; + private volatile Process dtraceProcess; + private OptionSpec optFrequency; + + public DTraceAsmProfiler(String initLine) throws ProfilerException { + super(initLine, "sampled_pc"); + + // Check DTrace availability + Collection messages = Utils.tryWith("sudo", "dtrace", "-V"); + if (!messages.isEmpty()) { + throw new ProfilerException(messages.toString()); + } + + try { + sampleFrequency = set.valueOf(optFrequency); + } catch (OptionException e) { + throw new ProfilerException(e.getMessage()); + } + } + + @Override + public void beforeTrial(BenchmarkParams params) { + super.beforeTrial(params); + } + + @Override + public Collection afterTrial(BenchmarkResult br, long pid, File stdOut, File stdErr) { + if (pid == 0) { + throw new IllegalStateException("DTrace needs the forked VM PID, but it is not initialized"); + } + + Collection messages = Utils.destroy(dtraceProcess); + if (!messages.isEmpty()) { + throw new IllegalStateException(messages.toString()); + } + + this.pid = String.valueOf(pid); + return super.afterTrial(br, pid, stdOut, stdErr); + } + + @Override + public Collection addJVMInvokeOptions(BenchmarkParams params) { + dtraceProcess = Utils.runAsync("sudo", "dtrace", "-n", "profile-" + sampleFrequency + + " /arg1/ { printf(\"%d 0x%lx %d\", pid, arg1, timestamp); ufunc(arg1)}", "-o", + perfBinData.getAbsolutePath()); + return Collections.emptyList(); + } + + @Override + public String getDescription() { + return "DTrace profile provider + PrintAssembly Profiler"; + } + + @Override + protected void addMyOptions(OptionParser parser) { + optFrequency = parser.accepts("frequency", + "Sampling frequency. 
This is synonymous to profile-#")
+                .withRequiredArg().ofType(Long.class).describedAs("freq").defaultsTo(1001L);
+    }
+
+    @Override
+    protected void parseEvents() {
+        // Do nothing because DTrace writes text output anyway
+    }
+
+    @Override
+    protected PerfEvents readEvents(double skipMs, double lenMs) {
+        long start = (long) skipMs;
+        long end = (long) (skipMs + lenMs);
+
+        try (FileReader fr = new FileReader(perfBinData.file());
+             BufferedReader reader = new BufferedReader(fr)) {
+
+            Deduplicator<MethodDesc> dedup = new Deduplicator<>();
+            Multimap<MethodDesc, Long> methods = new HashMultimap<>();
+            Multiset<Long> events = new TreeMultiset<>();
+
+            long dtraceTimestampBase = 0L;
+            String line;
+            while ((line = reader.readLine()) != null) {
+
+                // Filter out DTrace misc
+                if (!line.contains(":profile")) {
+                    continue;
+                }
+
+                line = line.trim();
+                line = line.substring(line.indexOf(":profile"));
+                String[] splits = line.split(" ", 5);
+                String sampledPid = splits[1];
+
+                if (!sampledPid.equals(pid)) {
+                    continue;
+                }
+
+                // Sometimes DTrace ufunc fails and gives no information about symbols
+                if (splits.length < 4) {
+                    continue;
+                }
+
+                long timestamp = Long.valueOf(splits[3]);
+                if (dtraceTimestampBase == 0) {
+                    // Use first event timestamp as base for time comparison
+                    dtraceTimestampBase = timestamp;
+                    continue;
+                }
+
+                long elapsed = timestamp - dtraceTimestampBase;
+                long elapsedMs = TimeUnit.NANOSECONDS.toMillis(elapsed);
+
+                if (elapsedMs < start || elapsedMs > end) {
+                    continue;
+                }
+
+                long address = Long.decode(splits[2]);
+                events.add(address);
+
+                String methodLine = splits[4];
+                // JIT-compiled code has address instead of symbol information
+                if (methodLine.startsWith("0x")) {
+                    continue;
+                }
+
+                String symbol = "[unknown]";
+                String[] methodSplit = methodLine.split("`");
+                String library = methodSplit[0];
+                if ("".equals(library)) {
+                    library = "[unknown]";
+                }
+
+                if (methodSplit.length == 2) {
+                    symbol = methodSplit[1];
+                }
+
+                methods.put(dedup.dedup(MethodDesc.nativeMethod(symbol, library)), address);
+            }
+
+            IntervalMap<MethodDesc> methodMap = new IntervalMap<>();
+            for (MethodDesc md : methods.keys()) {
+                Collection<Long> longs = methods.get(md);
+                methodMap.add(md, Utils.min(longs), Utils.max(longs));
+            }
+
+            Map<String, Multiset<Long>> allEvents = new TreeMap<>();
+            assert this.events.size() == 1;
+            allEvents.put(this.events.get(0), events);
+            return new PerfEvents(this.events, allEvents, methodMap);
+
+        } catch (IOException e) {
+            return new PerfEvents(events);
+        }
+
+    }
+
+    @Override
+    protected String perfBinaryExtension() {
+        // DTrace produces human-readable txt
+        return ".txt";
+    }
+}
diff -r 1ddf31f810a3 jmh-core/src/main/java/org/openjdk/jmh/profile/ProfilerFactory.java
--- a/jmh-core/src/main/java/org/openjdk/jmh/profile/ProfilerFactory.java Fri Sep 22 18:11:47 2017 +0200
+++ b/jmh-core/src/main/java/org/openjdk/jmh/profile/ProfilerFactory.java Sat Jan 20 22:38:48 2018 +0300
@@ -27,7 +27,6 @@
 import org.openjdk.jmh.runner.options.ProfilerConfig;

 import java.io.PrintStream;
-import java.lang.reflect.Constructor;
 import java.lang.reflect.InvocationTargetException;
 import java.util.*;

@@ -178,6 +177,7 @@
         BUILT_IN.put("perfnorm", LinuxPerfNormProfiler.class);
         BUILT_IN.put("perfasm", LinuxPerfAsmProfiler.class);
         BUILT_IN.put("xperfasm", WinPerfAsmProfiler.class);
+        BUILT_IN.put("dtraceasm", DTraceAsmProfiler.class);
         BUILT_IN.put("pauses", PausesProfiler.class);
         BUILT_IN.put("safepoints", SafepointsProfiler.class);
     }
diff -r 1ddf31f810a3 jmh-core/src/main/java/org/openjdk/jmh/util/Utils.java
--- a/jmh-core/src/main/java/org/openjdk/jmh/util/Utils.java Fri Sep 22 18:11:47 2017 +0200
+++ b/jmh-core/src/main/java/org/openjdk/jmh/util/Utils.java Sat Jan 20 22:38:48 2018 +0300
@@ -446,6 +446,31 @@
         return messages;
     }

+    public static Process runAsync(String... cmd) {
+        try {
+            return new ProcessBuilder(cmd).start();
+        } catch (IOException ex) {
+            throw new IllegalStateException(ex);
+        }
+    }
+
+    public static Collection<String> destroy(Process process) {
+        Collection<String> messages = new ArrayList<>();
+        try {
+            ByteArrayOutputStream baos = new ByteArrayOutputStream();
+            process.destroy();
+            int exitCode = process.waitFor();
+            if (exitCode == 0) {
+                return Collections.emptyList();
+            }
+
+            messages.add(baos.toString());
+            return messages;
+        } catch (InterruptedException e) {
+            throw new IllegalStateException(e);
+        }
+    }
+
     public static Collection<String> runWith(List<String> cmd) {
         Collection<String> messages = new ArrayList<>();
         try {
diff -r 1ddf31f810a3 jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java
--- a/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java Fri Sep 22 18:11:47 2017 +0200
+++ b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java Sat Jan 20 22:38:48 2018 +0300
@@ -33,6 +33,7 @@
 import org.openjdk.jmh.annotations.*;
 import org.openjdk.jmh.infra.Blackhole;
 import org.openjdk.jmh.profile.ClassloaderProfiler;
+import org.openjdk.jmh.profile.DTraceAsmProfiler;
 import org.openjdk.jmh.profile.LinuxPerfProfiler;
 import org.openjdk.jmh.profile.StackProfiler;
 import org.openjdk.jmh.runner.Runner;
@@ -352,7 +353,7 @@
      * $ java -jar target/benchmarks.jar JMHSample_35.*Atomic -prof perfnorm -f 3 (Linux)
      * $ java -jar target/benchmarks.jar JMHSample_35.*Atomic -prof perfasm -f 1 (Linux)
      * $ java -jar target/benchmarks.jar JMHSample_35.*Atomic -prof xperfasm -f 1 (Windows)
-     *
+     * $ java -jar target/benchmarks.jar JMHSample_35.*Atomic -prof dtraceasm -f 1 (Mac OS X)
      * b) Via the Java API:
      * (see the JMH homepage for possible caveats when running from IDE:
      * http://openjdk.java.net/projects/code-tools/jmh/)
@@ -365,6 +366,7 @@
 //                .addProfiler(LinuxPerfNormProfiler.class)
 //                .addProfiler(LinuxPerfAsmProfiler.class)
 //                .addProfiler(WinPerfAsmProfiler.class)
+//
.addProfiler(DTraceAsmProfiler.class) .build(); new Runner(opt).run(); -- Best regards, Tolstopyatov Vsevolod On Wed, Jan 17, 2018 at 3:58 PM, Henri Tremblay wrote: > Well found! It seems much better. > > Then, it still not found lines. I thought JMH was adding the correct > PrintAssembly flags when forking. But the issue seems to be similar with > the warning I had for perfasm. > > ERROR: No address lines detected in assembly capture, make sure your JDK > is PrintAssembly-enabled: > https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly > > Perf output processed (skipped 2.531 seconds): > Column 1: sampled_pc (11033 events) > > Hottest code regions (>10.00% "sampled_pc" events): > > ....[Hottest Region 1]............................ > .................................................. > [unknown], [unknown] (231 bytes) > > > ............................................................ > ........................................ > 22.54% > > ....[Hottest Region 2]............................ > .................................................. > [unknown], [unknown] (134 bytes) > > > ............................................................ > ........................................ > 11.31% > > ....[Hottest Region 3]............................ > .................................................. > [unknown], [unknown] (41 bytes) > > > ............................................................ > ........................................ > 10.39% > > ....[Hottest Region 4]............................ > .................................................. > [unknown], [unknown] (417 bytes) > > > ............................................................ > ........................................ > 10.27% > > ....[Hottest Regions].................................................... > ........................... 
> 22.54% [unknown] [unknown] (231 bytes) > 11.31% [unknown] [unknown] (134 bytes) > 10.39% [unknown] [unknown] (41 bytes) > 10.27% [unknown] [unknown] (417 bytes) > 8.45% [unknown] [unknown] (396 bytes) > 5.71% [unknown] [unknown] (43 bytes) > 5.66% [unknown] [unknown] (55 bytes) > 5.45% [unknown] [unknown] (84 bytes) > 4.17% [unknown] [unknown] (209 bytes) > 2.47% [unknown] [unknown] (71 bytes) > 2.12% [unknown] [unknown] (28 bytes) > 2.04% [unknown] [unknown] (28 bytes) > 1.63% [unknown] [unknown] (170 bytes) > 0.95% [unknown] [unknown] (86 bytes) > 0.89% [unknown] [unknown] (5 bytes) > 0.83% [unknown] [unknown] (127 bytes) > 0.81% [unknown] [unknown] (0 bytes) > 0.55% [unknown] [unknown] (46 bytes) > 0.44% [unknown] [unknown] (71 bytes) > 0.41% [unknown] [unknown] (257 bytes) > 2.91% <...other 73 warm regions...> > ............................................................ > ........................................ > 99.99% > > ....[Hottest Methods (after inlining)].................... > .......................................... > 99.99% [unknown] [unknown] > ............................................................ > ........................................ > 99.99% > > ....[Distribution by Source]....................... > ................................................. > 99.99% [unknown] > ............................................................ > ........................................ > 99.99% > > > > # Run complete. Total time: 00:00:18 > > Benchmark (implementation) > (readRatio) (tableSize) Mode Cnt Score Error Units > ConcurrentMapThroughput.randomGetPutRemove NonBlockingHashMap > 50 100000 thrpt 6 22470630.498 ? 3730613.043 ops/s > ConcurrentMapThroughput.randomGetPutRemove:?asm NonBlockingHashMap > 50 100000 thrpt NaN --- > ConcurrentMapThroughput.randomGetPutRemove ConcurrentHashMap > 50 100000 thrpt 6 23411000.106 ? 
1960758.197 ops/s > ConcurrentMapThroughput.randomGetPutRemove:?asm ConcurrentHashMap > 50 100000 thrpt NaN --- > > On 17 January 2018 at 07:20, Aleksey Shipilev wrote: > >> On 01/14/2018 04:24 PM, Vsevolod Tolstopyatov wrote: >> > I have limited access to different versions of Mac OS X, but it seems >> that in some minor updates >> > DTrace works with SIP enabled. >> > So as solution I'd suggest to check SIP status on profiler start (via >> "csrutil status") and print >> > warning if it's enabled or just clarify it in javadoc. It's up to >> Alexey to decide what approach is >> > preferable in JMH >> >> So, perf* profiler print warning messages when then number of samples is >> suspiciously low. I think >> dtraceasm profiler should do the same, and clearly say what the user is >> supposed to do: >> >> WARNING: The perf event count is suspiciously low (" + sum + "). The >> performance data might be >> inaccurate or misleading. Try to do the profiling again, or tune up the >> sampling frequency. >> With some profilers on Mac OS X, System Integrity Protection (SIP) may >> prevent profiling. >> In such case, temporarily disabling SIP with 'csrutil disable' might >> help. >> >> I'll add this to the patch myself. >> >> I can push this without my own testing, hoping that external people >> validated this. OpenJDK rules >> require the patch to be hosted on OpenJDK infra to get accepted, so, >> Vsevolod, can you please post >> the recent version of the patch inline here? >> >> Thanks >> -Aleksey >> >> >> > From qwwdfsad at gmail.com Sat Jan 20 19:55:02 2018 From: qwwdfsad at gmail.com (Vsevolod Tolstopyatov) Date: Sat, 20 Jan 2018 22:55:02 +0300 Subject: DTrace asm profiler for Mac OS X In-Reply-To: References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com> Message-ID: >Then, it still not found lines. I thought JMH was adding the correct PrintAssembly flags when forking. 
But the issue seems to be similar with the warning I had for perfasm.

Yes, that's your PrintAssembly-related issues.

Just ran ConcurrentMapThroughput with dtraceasm on latest JCTools, everything seems fine

--
Best regards,
Tolstopyatov Vsevolod

From henri.tremblay at gmail.com Mon Jan 22 00:32:48 2018
From: henri.tremblay at gmail.com (Henri Tremblay)
Date: Sun, 21 Jan 2018 19:32:48 -0500
Subject: DTrace asm profiler for Mac OS X
In-Reply-To:
References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com>
Message-ID:

Do you have Java code matching memory addresses?
Because I now get

ERROR: No address lines detected in assembly capture, make sure your JDK is PrintAssembly-enabled:
https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly

Perf output processed (skipped 2.585 seconds):
Column 1: sampled_pc (10804 events)

Hottest code regions (>10.00% "sampled_pc" events):

....[Hottest Region 1]..............................................................................
0x110543f36, [unknown] (584 bytes)
....................................................................................................
22.08%

....[Hottest Region 2]..............................................................................
0x11055a27b, [unknown] (369 bytes)
....................................................................................................
21.54%

....[Hottest Region 3]..............................................................................
0x110569b7f, [unknown] (650 bytes)
....................................................................................................
12.44%

....[Hottest Region 4]..............................................................................
0x11055a140, [unknown] (64 bytes)
....................................................................................................
12.10%

....[Hottest Region 5]..............................................................................
0x110543e00, [unknown] (52 bytes)
....................................................................................................
12.01%

....[Hottest Regions]...............................................................................
22.08% 0x110543f36 [unknown] (584 bytes)
21.54% 0x11055a27b [unknown] (369 bytes)
12.44% 0x110569b7f [unknown] (650 bytes)
12.10% 0x11055a140 [unknown] (64 bytes)
12.01% 0x110543e00 [unknown] (52 bytes)
 6.30% 0x11055a540 [unknown] (217 bytes)
 5.89% 0x11055a666 [unknown] (361 bytes)
 3.46% 0x11055a805 [unknown] (85 bytes)
 1.99% 0x11053da80 [unknown] (71 bytes)
 1.49% 0x1105441d1 [unknown] (72 bytes)
 0.22% 0x110543ea3 [unknown] (7 bytes)
 0.12% 0x110569f09 [unknown] (16 bytes)
 0.11% 0x11055a1d8 [unknown] (2 bytes)
 0.08% 0x11055a63c [unknown] (9 bytes)
 0.06% 0x11055a9ee [unknown] (5 bytes)
 0.01% libjvm.dylib Assembler::locate_operand(unsigned char*, Assembler::WhichOperand) (0 bytes)
 0.01% libjvm.dylib LIR_OpVisitState::append(LIR_OprDesc*&, LIR_OpVisitState::OprMode) (0 bytes)
 0.01% libjvm.dylib GraphBuilder::push_scope(ciMethod*, BlockBegin*) (0 bytes)
 0.01% libjvm.dylib LIR_Assembler::reg2stack(LIR_OprDesc*, LIR_OprDesc*, BasicType, bool) (0 bytes)
 0.01% libjvm.dylib NullCheckEliminator::visit(Instruction**) (0 bytes)
 0.05% <...other 5 warm regions...>
....................................................................................................
99.99%

....[Hottest Methods (after inlining)]..............................................................
22.08% 0x110543f36 [unknown]
21.54% 0x11055a27b [unknown]
12.44% 0x110569b7f [unknown]
12.10% 0x11055a140 [unknown]
12.01% 0x110543e00 [unknown]
 6.30% 0x11055a540 [unknown]
 5.89% 0x11055a666 [unknown]
 3.46% 0x11055a805 [unknown]
 1.99% 0x11053da80 [unknown]
 1.49% 0x1105441d1 [unknown]
 0.22% 0x110543ea3 [unknown]
 0.12% 0x110569f09 [unknown]
 0.11% 0x11055a1d8 [unknown]
 0.08% 0x11055a63c [unknown]
 0.06% 0x11055a9ee [unknown]
 0.01% libjvm.dylib Assembler::locate_operand(unsigned char*, Assembler::WhichOperand)
 0.01% libjvm.dylib NullCheckEliminator::visit(Instruction**)
 0.01% libjvm.dylib LIR_OpVisitState::append(LIR_OprDesc*&, LIR_OpVisitState::OprMode)
 0.01% libjvm.dylib GrowableArray::append(Metadata* const&)
 0.01% 0x1102dd079 [unknown]
 0.05% <...other 5 warm methods...>
....................................................................................................
99.99%

....[Distribution by Source]........................................................................
22.08% 0x110543f36
21.54% 0x11055a27b
12.44% 0x110569b7f
12.10% 0x11055a140
12.01% 0x110543e00
 6.30% 0x11055a540
 5.89% 0x11055a666
 3.46% 0x11055a805
 1.99% 0x11053da80
 1.49% 0x1105441d1
 0.22% 0x110543ea3
 0.12% 0x110569f09
 0.11% 0x11055a1d8
 0.08% 0x11055a63c
 0.06% libjvm.dylib
 0.06% 0x11055a9ee
 0.01% 0x1102dd079
 0.01% libsystem_kernel.dylib
 0.01% libsystem_c.dylib
....................................................................................................
99.99%

From qwwdfsad at gmail.com Mon Jan 22 08:58:03 2018
From: qwwdfsad at gmail.com (Vsevolod Tolstopyatov)
Date: Mon, 22 Jan 2018 11:58:03 +0300
Subject: DTrace asm profiler for Mac OS X
In-Reply-To:
References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com>
Message-ID:

I do. Small checklist:
1) Have you installed hsdis properly?
(instruction: http://psy-lob-saw.blogspot.ru/2013/01/java-print-assembly.html) 2) Do you see any generated code if you run JMH-benchmark in verbose mode with dtraceasm? (option "-v EXTRA") -- Best regards, Tolstopyatov Vsevolod On Mon, Jan 22, 2018 at 3:32 AM, Henri Tremblay wrote: > Do you have Java code matching memory addresses? > > Because I know get > > ERROR: No address lines detected in assembly capture, make sure your JDK > is PrintAssembly-enabled: > https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly > > Perf output processed (skipped 2.585 seconds): > Column 1: sampled_pc (10804 events) > > Hottest code regions (>10.00% "sampled_pc" events): > > ....[Hottest Region 1]............................ > .................................................. > 0x110543f36, [unknown] (584 bytes) > > > ............................................................ > ........................................ > 22.08% > > ....[Hottest Region 2]............................ > .................................................. > 0x11055a27b, [unknown] (369 bytes) > > > ............................................................ > ........................................ > 21.54% > > ....[Hottest Region 3]............................ > .................................................. > 0x110569b7f, [unknown] (650 bytes) > > > ............................................................ > ........................................ > 12.44% > > ....[Hottest Region 4]............................ > .................................................. > 0x11055a140, [unknown] (64 bytes) > > > ............................................................ > ........................................ > 12.10% > > ....[Hottest Region 5]............................ > .................................................. > 0x110543e00, [unknown] (52 bytes) > > > ............................................................ > ........................................ 
> 12.01% > > ....[Hottest Regions].................................................... > ........................... > 22.08% 0x110543f36 [unknown] (584 bytes) > 21.54% 0x11055a27b [unknown] (369 bytes) > 12.44% 0x110569b7f [unknown] (650 bytes) > 12.10% 0x11055a140 [unknown] (64 bytes) > 12.01% 0x110543e00 [unknown] (52 bytes) > 6.30% 0x11055a540 [unknown] (217 bytes) > 5.89% 0x11055a666 [unknown] (361 bytes) > 3.46% 0x11055a805 [unknown] (85 bytes) > 1.99% 0x11053da80 [unknown] (71 bytes) > 1.49% 0x1105441d1 [unknown] (72 bytes) > 0.22% 0x110543ea3 [unknown] (7 bytes) > 0.12% 0x110569f09 [unknown] (16 bytes) > 0.11% 0x11055a1d8 [unknown] (2 bytes) > 0.08% 0x11055a63c [unknown] (9 bytes) > 0.06% 0x11055a9ee [unknown] (5 bytes) > 0.01% libjvm.dylib Assembler::locate_operand(unsigned > char*, Assembler::WhichOperand) (0 bytes) > 0.01% libjvm.dylib LIR_OpVisitState::append(LIR_OprDesc*&, > LIR_OpVisitState::OprMode) (0 bytes) > 0.01% libjvm.dylib GraphBuilder::push_scope(ciMethod*, > BlockBegin*) (0 bytes) > 0.01% libjvm.dylib LIR_Assembler::reg2stack(LIR_OprDesc*, > LIR_OprDesc*, BasicType, bool) (0 bytes) > 0.01% libjvm.dylib NullCheckEliminator::visit(Instruction**) > (0 bytes) > 0.05% <...other 5 warm regions...> > ............................................................ > ........................................ > 99.99% > > ....[Hottest Methods (after inlining)].................... > .......................................... 
> 22.08% 0x110543f36 [unknown] > 21.54% 0x11055a27b [unknown] > 12.44% 0x110569b7f [unknown] > 12.10% 0x11055a140 [unknown] > 12.01% 0x110543e00 [unknown] > 6.30% 0x11055a540 [unknown] > 5.89% 0x11055a666 [unknown] > 3.46% 0x11055a805 [unknown] > 1.99% 0x11053da80 [unknown] > 1.49% 0x1105441d1 [unknown] > 0.22% 0x110543ea3 [unknown] > 0.12% 0x110569f09 [unknown] > 0.11% 0x11055a1d8 [unknown] > 0.08% 0x11055a63c [unknown] > 0.06% 0x11055a9ee [unknown] > 0.01% libjvm.dylib Assembler::locate_operand(unsigned > char*, Assembler::WhichOperand) > 0.01% libjvm.dylib NullCheckEliminator::visit( > Instruction**) > 0.01% libjvm.dylib LIR_OpVisitState::append(LIR_OprDesc*&, > LIR_OpVisitState::OprMode) > 0.01% libjvm.dylib GrowableArray::append(Metadata* > const&) > 0.01% 0x1102dd079 [unknown] > 0.05% <...other 5 warm methods...> > ............................................................ > ........................................ > 99.99% > > ....[Distribution by Source]....................... > ................................................. > 22.08% 0x110543f36 > 21.54% 0x11055a27b > 12.44% 0x110569b7f > 12.10% 0x11055a140 > 12.01% 0x110543e00 > 6.30% 0x11055a540 > 5.89% 0x11055a666 > 3.46% 0x11055a805 > 1.99% 0x11053da80 > 1.49% 0x1105441d1 > 0.22% 0x110543ea3 > 0.12% 0x110569f09 > 0.11% 0x11055a1d8 > 0.08% 0x11055a63c > 0.06% libjvm.dylib > 0.06% 0x11055a9ee > 0.01% 0x1102dd079 > 0.01% libsystem_kernel.dylib > 0.01% libsystem_c.dylib > ............................................................ > ........................................ > 99.99% > > On 20 January 2018 at 14:55, Vsevolod Tolstopyatov > wrote: > >> >Then, it still not found lines. I thought JMH was adding the correct >> PrintAssembly flags when forking. But the issue seems to be similar with >> the warning I had for perfasm. >> Yes, that's your PrintAssembly-related issues. 
>> >> Just ran ConcurrentMapThroughput with dtraceasm on latest JCTools, >> everything seems fine >> >> -- >> Best regards, >> Tolstopyatov Vsevolod >> >> On Wed, Jan 17, 2018 at 3:58 PM, Henri Tremblay > > wrote: >> >>> Well found! It seems much better. >>> >>> Then, it still not found lines. I thought JMH was adding the correct >>> PrintAssembly flags when forking. But the issue seems to be similar with >>> the warning I had for perfasm. >>> >>> ERROR: No address lines detected in assembly capture, make sure your JDK >>> is PrintAssembly-enabled: >>> https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly >>> >>> Perf output processed (skipped 2.531 seconds): >>> Column 1: sampled_pc (11033 events) >>> >>> Hottest code regions (>10.00% "sampled_pc" events): >>> >>> ....[Hottest Region 1]............................ >>> .................................................. >>> [unknown], [unknown] (231 bytes) >>> >>> >>> ............................................................ >>> ........................................ >>> 22.54% >>> >>> ....[Hottest Region 2]............................ >>> .................................................. >>> [unknown], [unknown] (134 bytes) >>> >>> >>> ............................................................ >>> ........................................ >>> 11.31% >>> >>> ....[Hottest Region 3]............................ >>> .................................................. >>> [unknown], [unknown] (41 bytes) >>> >>> >>> ............................................................ >>> ........................................ >>> 10.39% >>> >>> ....[Hottest Region 4]............................ >>> .................................................. >>> [unknown], [unknown] (417 bytes) >>> >>> >>> ............................................................ >>> ........................................ >>> 10.27% >>> >>> ....[Hottest Regions]...................... 
>>> ......................................................... >>> 22.54% [unknown] [unknown] (231 bytes) >>> 11.31% [unknown] [unknown] (134 bytes) >>> 10.39% [unknown] [unknown] (41 bytes) >>> 10.27% [unknown] [unknown] (417 bytes) >>> 8.45% [unknown] [unknown] (396 bytes) >>> 5.71% [unknown] [unknown] (43 bytes) >>> 5.66% [unknown] [unknown] (55 bytes) >>> 5.45% [unknown] [unknown] (84 bytes) >>> 4.17% [unknown] [unknown] (209 bytes) >>> 2.47% [unknown] [unknown] (71 bytes) >>> 2.12% [unknown] [unknown] (28 bytes) >>> 2.04% [unknown] [unknown] (28 bytes) >>> 1.63% [unknown] [unknown] (170 bytes) >>> 0.95% [unknown] [unknown] (86 bytes) >>> 0.89% [unknown] [unknown] (5 bytes) >>> 0.83% [unknown] [unknown] (127 bytes) >>> 0.81% [unknown] [unknown] (0 bytes) >>> 0.55% [unknown] [unknown] (46 bytes) >>> 0.44% [unknown] [unknown] (71 bytes) >>> 0.41% [unknown] [unknown] (257 bytes) >>> 2.91% <...other 73 warm regions...> >>> ............................................................ >>> ........................................ >>> 99.99% >>> >>> ....[Hottest Methods (after inlining)].................... >>> .......................................... >>> 99.99% [unknown] [unknown] >>> ............................................................ >>> ........................................ >>> 99.99% >>> >>> ....[Distribution by Source]....................... >>> ................................................. >>> 99.99% [unknown] >>> ............................................................ >>> ........................................ >>> 99.99% >>> >>> >>> >>> # Run complete. Total time: 00:00:18 >>> >>> Benchmark (implementation) >>> (readRatio) (tableSize) Mode Cnt Score Error Units >>> ConcurrentMapThroughput.randomGetPutRemove NonBlockingHashMap >>> 50 100000 thrpt 6 22470630.498 ? 
3730613.043 ops/s >>> ConcurrentMapThroughput.randomGetPutRemove:·asm NonBlockingHashMap >>> 50 100000 thrpt NaN --- >>> ConcurrentMapThroughput.randomGetPutRemove ConcurrentHashMap >>> 50 100000 thrpt 6 23411000.106 ± 1960758.197 ops/s >>> ConcurrentMapThroughput.randomGetPutRemove:·asm ConcurrentHashMap >>> 50 100000 thrpt NaN --- >>> >>> On 17 January 2018 at 07:20, Aleksey Shipilev wrote: >>> >>>> On 01/14/2018 04:24 PM, Vsevolod Tolstopyatov wrote: >>>> > I have limited access to different versions of Mac OS X, but it seems >>>> that in some minor updates >>>> > DTrace works with SIP enabled. >>>> > So as solution I'd suggest to check SIP status on profiler start (via >>>> "csrutil status") and print >>>> > warning if it's enabled or just clarify it in javadoc. It's up to >>>> Alexey to decide what approach is >>>> > preferable in JMH >>>> >>>> So, perf* profilers print warning messages when the number of samples >>>> is suspiciously low. I think >>>> dtraceasm profiler should do the same, and clearly say what the user is >>>> supposed to do: >>>> >>>> WARNING: The perf event count is suspiciously low (" + sum + "). The >>>> performance data might be >>>> inaccurate or misleading. Try to do the profiling again, or tune up >>>> the sampling frequency. >>>> With some profilers on Mac OS X, System Integrity Protection (SIP) may >>>> prevent profiling. >>>> In such case, temporarily disabling SIP with 'csrutil disable' might >>>> help. >>>> >>>> I'll add this to the patch myself. >>>> >>>> I can push this without my own testing, hoping that external people >>>> validated this. OpenJDK rules >>>> require the patch to be hosted on OpenJDK infra to get accepted, so, >>>> Vsevolod, can you please post >>>> the recent version of the patch inline here?
>>>> >>>> Thanks >>>> -Aleksey >>>> >>>> >>>> >>> >> > From ashipile at redhat.com Mon Jan 22 16:59:57 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 22 Jan 2018 16:59:57 +0000 Subject: hg: code-tools/jmh: Linux perf should not assume cycles/instructions counters are always available Message-ID: <201801221659.w0MGxvsi004241@aojmv0008.oracle.com> Changeset: 6d51e8e924d0 Author: shade Date: 2018-01-22 17:50 +0100 URL: http://hg.openjdk.java.net/code-tools/jmh/rev/6d51e8e924d0 Linux perf should not assume cycles/instructions counters are always available ! jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfAsmProfiler.java ! jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfNormProfiler.java From ashipile at redhat.com Mon Jan 22 17:12:01 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 22 Jan 2018 17:12:01 +0000 Subject: hg: code-tools/jmh: 7902097: dtraceasm profiler for Mac OS X Message-ID: <201801221712.w0MHC10a009824@aojmv0008.oracle.com> Changeset: ae08c0b9db44 Author: shade Date: 2018-01-22 18:04 +0100 URL: http://hg.openjdk.java.net/code-tools/jmh/rev/ae08c0b9db44 7902097: dtraceasm profiler for Mac OS X Contributed-by: Vsevolod Tolstopyatov ! jmh-core/src/main/java/org/openjdk/jmh/profile/AbstractPerfAsmProfiler.java + jmh-core/src/main/java/org/openjdk/jmh/profile/DTraceAsmProfiler.java ! jmh-core/src/main/java/org/openjdk/jmh/profile/ProfilerFactory.java ! jmh-core/src/main/java/org/openjdk/jmh/util/Utils.java ! 
jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java From shade at redhat.com Mon Jan 22 17:08:33 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 22 Jan 2018 18:08:33 +0100 Subject: DTrace asm profiler for Mac OS X In-Reply-To: References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com> Message-ID: On 01/20/2018 08:52 PM, Vsevolod Tolstopyatov wrote: >>Vsevolod, can you please post the recent version of the patch inline here? > > Here it is. Note that I've changed one line: "String[] splits = line.split(" ");" -> "String[] > splits = line.split(" ", 5);" to properly handle native symbols (especially ones from libjvm.dylib). > Retested again on JMH samples + VM-heavy benchmarks (String#intern, exceptions) Pushed: https://bugs.openjdk.java.net/browse/CODETOOLS-7902097 Once you and Henri figure out what is wrong with his environment, please do a follow-up patch that would provide a helpful failure message or such? Thanks, -Aleksey From ashipile at redhat.com Mon Jan 22 17:17:48 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 22 Jan 2018 17:17:48 +0000 Subject: hg: code-tools/jmh: Amend 7902097 with better failure message about Mac OS X SIP. Message-ID: <201801221717.w0MHHm3B012941@aojmv0008.oracle.com> Changeset: a5079769b73b Author: shade Date: 2018-01-22 18:10 +0100 URL: http://hg.openjdk.java.net/code-tools/jmh/rev/a5079769b73b Amend 7902097 with better failure message about Mac OS X SIP. !
jmh-core/src/main/java/org/openjdk/jmh/profile/AbstractPerfAsmProfiler.java From shade at redhat.com Mon Jan 22 17:27:13 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 22 Jan 2018 18:27:13 +0100 Subject: [PATCH]: Forking benchmarks with long classpath on Windows (7) In-Reply-To: References: <372409f3-0a34-ded5-7290-bfa49f782a63@redhat.com> <5132e129-d2f6-7e25-8531-14c7367bbd1c@redhat.com> Message-ID: <2d50bec7-1da4-8142-e6b6-2699e3f01f90@redhat.com> On 12/22/2017 09:12 PM, Dávid Karnok wrote: > Unfortunately, the system property approach doesn't work. The JMH plugin (0.4.5) can only set the > `jvmArgs*` and not the -D param on the task that > runs the parent JMH process. (I.e., it would require a change here: > https://github.com/melix/jmh-gradle-plugin/blob/19ff7a4ed013e7980ef162534e0fefc91536d12c/src/main/groovy/me/champeau/gradle/JMHTask.java#L64) > > I don't have much hope the plugin would evolve in reasonable time to accommodate this property so I > suggest adding a workaround > for checking for the flag: > > String jvmargs = "" > + options.getJvmArgs().orElse(Collections.emptyList()) > + options.getJvmArgsPrepend().orElse(Collections.emptyList()) > + options.getJvmArgsAppend().orElse(Collections.emptyList()); > > > if (Boolean.getBoolean("jmh.separateClasspathJAR") > || jvmargs.contains("jmh.separateClasspathJAR=true")) { > > > With this change, my benchmarks executed under Gradle 4.3.1, JMH Plugin 0.4.5 and setting jvmArgs. All right, fine! Let's do this: http://cr.openjdk.java.net/~shade/jmh/long-classpath-2.patch If you can sanity-check it still works for you, I'll push.
Thanks, -Aleksey From akarnokd at gmail.com Mon Jan 22 17:55:44 2018 From: akarnokd at gmail.com (Dávid Karnok) Date: Mon, 22 Jan 2018 18:55:44 +0100 Subject: [PATCH]: Forking benchmarks with long classpath on Windows (7) In-Reply-To: <2d50bec7-1da4-8142-e6b6-2699e3f01f90@redhat.com> References: <372409f3-0a34-ded5-7290-bfa49f782a63@redhat.com> <5132e129-d2f6-7e25-8531-14c7367bbd1c@redhat.com> <2d50bec7-1da4-8142-e6b6-2699e3f01f90@redhat.com> Message-ID: Thanks Aleksey, the 1.20 patch works: # JMH version: 1.19 # VM version: JDK 1.8.0_162, VM 25.162-b12 # VM invoker: C:\Program Files\Java\jdk1.8.0_162\jre\bin\java.exe # VM options: # Warmup: 5 iterations, 1 s each # Measurement: 5 iterations, 1 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Throughput, ops/time # Benchmark: io.reactivex.FlattenJustPerf.flowable # Parameters: (times = 1) # Run progress: 0,00% complete, ETA 00:02:20 # Fork: 1 of 1 # JMH version: 1.20-SNAPSHOT # VM version: JDK 1.8.0_162, VM 25.162-b12 # VM invoker: C:\Program Files\Java\jdk1.8.0_162\jre\bin\java.exe # VM options: -Djmh.separateClasspathJAR=true # Warmup: 5 iterations, 1 s each # Measurement: 5 iterations, 1 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Throughput, ops/time # Benchmark: io.reactivex.FlattenJustPerf.flowable # Parameters: (times = 1) # Run progress: 0,00% complete, ETA 00:02:20 # Fork: 1 of 1 # Warmup Iteration 1: 22808968,264 ops/s # Warmup Iteration 2: 23924488,213 ops/s # Warmup Iteration 3: 25972212,907 ops/s # Warmup Iteration 4: 25621809,089 ops/s # Warmup Iteration 5: 26211772,303 ops/s Iteration 1: 25449588,158 ops/s Iteration 2: 25343894,842 ops/s Iteration 3: 25234472,179 ops/s Iteration 4: 25386835,113 ops/s Iteration 5: 25836381,567 ops/s 2018-01-22 18:27 GMT+01:00 Aleksey Shipilev : > On 12/22/2017 09:12 PM, Dávid Karnok wrote: > > Unfortunately, the system property
approach doesn't work. The JMH plugin > (0.4.5) can only set the > > `jvmArgs*` and not the -D param on the task that > > runs the parent JMH process. (I.e., it would require a change here: > > https://github.com/melix/jmh-gradle-plugin/blob/ > 19ff7a4ed013e7980ef162534e0fefc91536d12c/src/main/groovy/me/ > champeau/gradle/JMHTask.java#L64) > > > > I don't have much hope the plugin would evolve in reasonable time to > accomodate this property so I > > suggest adding a workaround > > for checking for the flag: > > > > String jvmargs = "" > > + options.getJvmArgs().orElse(Collections.emptyList()) > > + options.getJvmArgsPrepend().orElse(Collections. > emptyList()) > > + options.getJvmArgsAppend().orElse(Collections. > emptyList()); > > > > > > if (Boolean.getBoolean("jmh.separateClasspathJAR") > > || jvmargs.contains("jmh.separateClasspathJAR=true")) { > > > > > > With this change, my benchmarks executed under Gradle 4.3.1, JMH Plugin > 0.4.5 and setting jvmArgs. > > All right, fine! Let's do this: > http://cr.openjdk.java.net/~shade/jmh/long-classpath-2.patch > > If you can sanity-check it still works for you, I'll push. > > Thanks, > -Aleksey > > > -- Best regards, David Karnok From ashipile at redhat.com Mon Jan 22 18:06:25 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 22 Jan 2018 18:06:25 +0000 Subject: hg: code-tools/jmh: 7902106: jmh.separateClasspathJAR option to handle benchmarks with long classpath Message-ID: <201801221806.w0MI6PPa001877@aojmv0008.oracle.com> Changeset: f797116d4991 Author: shade Date: 2018-01-22 18:35 +0100 URL: http://hg.openjdk.java.net/code-tools/jmh/rev/f797116d4991 7902106: jmh.separateClasspathJAR option to handle benchmarks with long classpath Contributed-by: David Karnok + jmh-core-it/src/test/java/org/openjdk/jmh/it/fork/ForkSeparateClasspathJARTest.java ! 
jmh-core/src/main/java/org/openjdk/jmh/runner/Runner.java From shade at redhat.com Mon Jan 22 18:02:27 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 22 Jan 2018 19:02:27 +0100 Subject: [PATCH]: Forking benchmarks with long classpath on Windows (7) In-Reply-To: References: <372409f3-0a34-ded5-7290-bfa49f782a63@redhat.com> <5132e129-d2f6-7e25-8531-14c7367bbd1c@redhat.com> <2d50bec7-1da4-8142-e6b6-2699e3f01f90@redhat.com> Message-ID: <94d055ce-d964-c648-d3cf-8db0eb156cb8@redhat.com> On 01/22/2018 06:55 PM, Dávid Karnok wrote: > Thanks Aleksey, the 1.20 patch works: And pushed under: https://bugs.openjdk.java.net/browse/CODETOOLS-7902106 -Aleksey From henri.tremblay at gmail.com Mon Jan 22 18:43:34 2018 From: henri.tremblay at gmail.com (Henri Tremblay) Date: Mon, 22 Jan 2018 13:43:34 -0500 Subject: DTrace asm profiler for Mac OS X In-Reply-To: References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com> Message-ID: I'm on it. I think I've lost my hsdis configuration and haven't noticed. So I'm quite confident it should work after that. And yes, you can then improve the error message. On 22 January 2018 at 12:08, Aleksey Shipilev wrote: > On 01/20/2018 08:52 PM, Vsevolod Tolstopyatov wrote: > >>Vsevolod, can you please post the recent version of the patch inline > here? > > > > Here it is. Note that I've changed one line: "String[] splits = > line.split(" ");" -> "String[] > > splits = line.split(" ", 5);" to properly handle native symbols > (especially ones from libjvm.dylib). > > Retested again on JMH samples + VM-heavy benchmarks (String#intern, > exceptions) > > Pushed: > https://bugs.openjdk.java.net/browse/CODETOOLS-7902097 > > Once you and Henri figure out what is wrong with his environment, please > do a follow-up patch that > would provide a helpful failure message or such?
> > Thanks, > -Aleksey > > From henri.tremblay at gmail.com Mon Jan 22 18:59:01 2018 From: henri.tremblay at gmail.com (Henri Tremblay) Date: Mon, 22 Jan 2018 13:59:01 -0500 Subject: DTrace asm profiler for Mac OS X In-Reply-To: References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com> Message-ID: I confirm. It works perfectly. This is awesome! So, the only thing that I could complain about is the error message: ERROR: No address lines detected in assembly capture, make sure your JDK is PrintAssembly-enabled: https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly It's on the wiki but the way it's phrased, I was thinking +PrintAssembly. And nothing else. I will suggest this: ERROR: No address lines detected in assembly capture. Make sure your JDK is properly configured to print assembly. For details, see the link below https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly On 22 January 2018 at 13:43, Henri Tremblay wrote: > I'm on it. I think I've lost my hsdis configuration and haven't noticed. > So I'm quite confident it should work after that. And yes, you can then > improve the error message > > On 22 January 2018 at 12:08, Aleksey Shipilev wrote: > >> On 01/20/2018 08:52 PM, Vsevolod Tolstopyatov wrote: >> >>Vsevolod, can you please post the recent version of the patch inline >> here? >> > >> > Here it is. Note that I've changed one line: "String[] splits = >> line.split(" ");" -> "String[] >> > splits = line.split(" ", 5);" to properly handle native symbols >> (especially ones from libjvm.dylib). >> > Retested again on JMH samples + VM-heavy benchmarks (String#intern, >> exceptions) >> >> Pushed: >> https://bugs.openjdk.java.net/browse/CODETOOLS-7902097 >> >> Once you and Henry figure out what is wrong with his environment, please >> do a follow-up patch that >> would provide a helpful failure message or such? 
>> >> Thanks, >> -Aleksey >> >> > From shade at redhat.com Mon Jan 22 19:05:14 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 22 Jan 2018 20:05:14 +0100 Subject: DTrace asm profiler for Mac OS X In-Reply-To: References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com> Message-ID: On 01/22/2018 07:59 PM, Henri Tremblay wrote: > I confirm. It works perfectly. This is awesome! > > So, the only thing that I could complain about is the error message: > > ERROR: No address lines detected in assembly capture, make sure your JDK is PrintAssembly-enabled: > https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly > > > It's on the wiki but the way it's phrased, I was thinking +PrintAssembly. And nothing else. I will > suggest this: > > ERROR: No address lines detected in assembly capture. Make sure your JDK is properly configured to > print assembly. For details, see the link below > https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly > Not really actionable, I think. How about this one? ERROR: No address lines detected in assembly capture. Make sure your JDK is properly configured to print generated assembly. The most probable cause for this failure is that hsdis is not available, or resides at the wrong path within the JDK. Try to run the JDK in question with -XX:+PrintAssembly and simple non-JMH program. For details, see the link below: https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly -Aleksey From henri.tremblay at gmail.com Mon Jan 22 20:05:34 2018 From: henri.tremblay at gmail.com (Henri Tremblay) Date: Mon, 22 Jan 2018 15:05:34 -0500 Subject: DTrace asm profiler for Mac OS X In-Reply-To: References: <0b213005-161b-a2f8-c5c4-c2df4d61c3ed@redhat.com> <750bf3a7-f879-52e4-1a71-e7825ee4ee27@redhat.com> Message-ID: Even better :-) On 22 January 2018 at 14:05, Aleksey Shipilev wrote: > On 01/22/2018 07:59 PM, Henri Tremblay wrote: > > I confirm. It works perfectly.
This is awesome! > > > > So, the only thing that I could complain about is the error message: > > > > ERROR: No address lines detected in assembly capture, make sure your JDK > is PrintAssembly-enabled: > > https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly > > > > > > It's on the wiki but the way it's phrased, I was thinking > +PrintAssembly. And nothing else. I will > > suggest this: > > > > ERROR: No address lines detected in assembly capture. Make sure your JDK > is properly configured to > > print assembly. For details, see the link below > > https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly > > > > Not really actionable, I think. How about this one? > > ERROR: No address lines detected in assembly capture. Make sure your JDK > is properly configured to > print generated assembly. The most probable cause for this failure is that > hsdis is not available, > or resides at the wrong path within the JDK. Try to run the JDK in > question with -XX:+PrintAssembly > and simple non-JMH program. For details, see the link below: > https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly > > -Aleksey > > From ashipile at redhat.com Mon Jan 22 21:44:25 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 22 Jan 2018 21:44:25 +0000 Subject: hg: code-tools/jmh: 7902107: perfasm should provide better suggestion about PrintAssembly failures Message-ID: <201801222144.w0MLiPLH018474@aojmv0008.oracle.com> Changeset: ef50cc696984 Author: shade Date: 2018-01-22 22:39 +0100 URL: http://hg.openjdk.java.net/code-tools/jmh/rev/ef50cc696984 7902107: perfasm should provide better suggestion about PrintAssembly failures ! 
jmh-core/src/main/java/org/openjdk/jmh/profile/AbstractPerfAsmProfiler.java From shade at redhat.com Tue Jan 23 09:20:57 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 23 Jan 2018 10:20:57 +0100 Subject: JMH 1.20 Message-ID: <38cd8f2a-e1d8-5fb3-3f21-0c105e171e18@redhat.com> Hi, JMH 1.20 is released and available on Maven Central. This release comes with the following features and bugfixes: *) dtraceasm profiler for Mac OS X, contributed by Vsevolod Tolstopyatov. This allows perfasm-like functionality on Mac OS X. This requires functional dtrace, superuser privileges, and sometimes additional system configuration. Try it out, and report problems at jmh-dev@! https://bugs.openjdk.java.net/browse/CODETOOLS-7902097 *) jmh.separateClasspathJAR option to handle benchmarks with long classpath, base patch contributed by David Karnok. This optionally handles the case of operating systems/environments that have limit on the size of the command line. https://bugs.openjdk.java.net/browse/CODETOOLS-7902106 *) Support enum params that override Object#toString(), contributed by Anuraag Agrawal. This fixes a nasty overlook in @Param handling for enums, that has the potential to break the benchmarks. https://bugs.openjdk.java.net/browse/CODETOOLS-7902095 *) The invocation order of @Setup/@TearDown methods on @State objects of different Scopes was messed up, which may break benchmark assumptions. https://bugs.openjdk.java.net/browse/CODETOOLS-7902096 *) External profilers should be called with before/afterTrial during warmup forks, for consistency https://bugs.openjdk.java.net/browse/CODETOOLS-7902088 *) UX: Linux perf should not assume cycles/instructions counters are always available. This saves some fatal errors on platforms that do not provide them (e.g. VMs without hardware performance counters support). This also makes default perfasm profiler narrower, as we only report "cycles" by default. 
https://bugs.openjdk.java.net/browse/CODETOOLS-7902105 *) UX: Make sure only a single profiler of given type is used https://bugs.openjdk.java.net/browse/CODETOOLS-7901990 *) UX: perfasm provides better suggestions about PrintAssembly failures https://bugs.openjdk.java.net/browse/CODETOOLS-7902107 *) UX: JMH Core Benchmarks includes more advanced timing tests https://bugs.openjdk.java.net/browse/CODETOOLS-7902040 http://central.maven.org/maven2/org/openjdk/jmh/jmh-core-benchmarks/1.20/jmh-core-benchmarks-1.20-full.jar *) Make sure JMH builds and runs with JDK 9 GA -- mostly build time fixes, but some internal API changes, and fixes for regressions too: https://bugs.openjdk.java.net/browse/CODETOOLS-7902041 https://bugs.openjdk.java.net/browse/CODETOOLS-7902085 *) Typos and cleanups https://bugs.openjdk.java.net/browse/CODETOOLS-7901985 https://bugs.openjdk.java.net/browse/CODETOOLS-7901984 Enjoy! Thanks, -Aleksey From ashipile at redhat.com Tue Jan 23 09:25:30 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 23 Jan 2018 09:25:30 +0000 Subject: hg: code-tools/jmh: 3 new changesets Message-ID: <201801230925.w0N9PUQD023778@aojmv0008.oracle.com> Changeset: bd52958ab680 Author: shade Date: 2018-01-23 09:43 +0100 URL: http://hg.openjdk.java.net/code-tools/jmh/rev/bd52958ab680 JMH v1.20. ! jmh-archetypes/jmh-groovy-benchmark-archetype/pom.xml ! jmh-archetypes/jmh-java-benchmark-archetype/pom.xml ! jmh-archetypes/jmh-kotlin-benchmark-archetype/pom.xml ! jmh-archetypes/jmh-scala-benchmark-archetype/pom.xml ! jmh-archetypes/pom.xml ! jmh-core-benchmarks/pom.xml ! jmh-core-ct/pom.xml ! jmh-core-it/pom.xml ! jmh-core/pom.xml ! jmh-generator-annprocess/pom.xml ! jmh-generator-asm/pom.xml ! jmh-generator-bytecode/pom.xml ! jmh-generator-reflection/pom.xml ! jmh-samples/pom.xml ! pom.xml Changeset: 319b44de7f8e Author: shade Date: 2018-01-23 09:43 +0100 URL: http://hg.openjdk.java.net/code-tools/jmh/rev/319b44de7f8e Added tag 1.20 for changeset bd52958ab680 ! 
.hgtags

Changeset: 25d8b2695bac
Author: shade
Date: 2018-01-23 09:44 +0100
URL: http://hg.openjdk.java.net/code-tools/jmh/rev/25d8b2695bac

Continue in 1.21-SNAPSHOT

! jmh-archetypes/jmh-groovy-benchmark-archetype/pom.xml
! jmh-archetypes/jmh-java-benchmark-archetype/pom.xml
! jmh-archetypes/jmh-kotlin-benchmark-archetype/pom.xml
! jmh-archetypes/jmh-scala-benchmark-archetype/pom.xml
! jmh-archetypes/pom.xml
! jmh-core-benchmarks/pom.xml
! jmh-core-ct/pom.xml
! jmh-core-it/pom.xml
! jmh-core/pom.xml
! jmh-generator-annprocess/pom.xml
! jmh-generator-asm/pom.xml
! jmh-generator-bytecode/pom.xml
! jmh-generator-reflection/pom.xml
! jmh-samples/pom.xml
! pom.xml

From sergei.tsypanov at yandex.ru Tue Jan 23 20:50:11 2018
From: sergei.tsypanov at yandex.ru (Сергей Цыпанов)
Date: Tue, 23 Jan 2018 22:50:11 +0200
Subject: Usage of Blackhole in a loop distorts benchmark results
In-Reply-To: <03B2F718-758C-440A-A8AA-1694521303A6@yahoo.com>
References: <1432911516175250@web36o.yandex.ru> <550179939.4914164.1516185612237@mail.yahoo.com> <526851516221811@web48j.yandex.ru> <03B2F718-758C-440A-A8AA-1694521303A6@yahoo.com>
Message-ID: <632981516740611@web55g.yandex.ru>

All right, I've done the measurements with -prof perfasm; here is what the results show.

When I use Blackhole inside a loop (as shown in JMHSample_34_SafeLooping.measureRight_1), I get this output:

....[Hottest Regions]...............................................................................
 55,38% C2, level 4 com.luxoft.logeek.benchmark.iterator.generated.IteratorFromStreamBenchmark_iteratorFromStream_jmhTest::iteratorFromStream_avgt_jmhStub, version 647 (203 bytes)
 16,57% C2, level 4 org.openjdk.jmh.infra.Blackhole::consume, version 607 (51 bytes)
 9,30% (0 bytes)

For the wrong way (accumulating the iteration results in a local variable and returning it):

....[Hottest Regions]...............................................................................
 76,64% C2, level 4 com.luxoft.logeek.benchmark.iterator.generated.IteratorFromStreamBenchmark_iteratorFromStream_jmhTest::iteratorFromStream_avgt_jmhStub, version 621 (93 bytes)
 8,36% (0 bytes)
 1,01% ntdll.dll 0x00000000777bf287 (78 bytes)

I wonder what the unnamed (0 bytes) region is: some infrastructure cost? Is it possible to get a deeper look into the code?

P.S. Suddenly it all stopped working, and now it reports:

java.lang.IllegalStateException: Failed to start xperf: [xperf: error: NT Kernel Logger: Cannot create a file when that file already exists. (0xb7).

Has anyone run into this?
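
For readers following the thread, the two loop shapes being compared can be sketched in plain Java, without JMH on the classpath. This is only an illustration under stated assumptions: `Sink` is a hypothetical stand-in for org.openjdk.jmh.infra.Blackhole (not the real API), and `CountingSink`, `work`, and the input list are made up for the example. It shows where the extra Blackhole::consume region in the first profile comes from: the sink is invoked once per element, while the accumulating variant makes no per-element call and hands back a single value.

```java
import java.util.Arrays;
import java.util.List;

public class LoopShapes {
    /** Hypothetical stand-in for JMH's Blackhole; NOT the real JMH API. */
    interface Sink {
        void consume(int value);
    }

    /** Counts calls so the per-element overhead of sinking is visible. */
    static final class CountingSink implements Sink {
        int calls;
        volatile int last; // the volatile write keeps the value observably "used"

        @Override
        public void consume(int value) {
            calls++;
            last = value;
        }
    }

    /** Arbitrary per-element work, made up for the example. */
    static int work(int x) {
        return x * 31 + 7;
    }

    /** Shape 1: every result is handed to the sink inside the loop. */
    static void sinkInLoop(List<Integer> data, Sink sink) {
        for (int x : data) {
            sink.consume(work(x));
        }
    }

    /** Shape 2: results are accumulated locally and returned once. */
    static int accumulate(List<Integer> data) {
        int acc = 0;
        for (int x : data) {
            acc += work(x);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4);
        CountingSink sink = new CountingSink();
        sinkInLoop(data, sink);
        System.out.println("consume calls: " + sink.calls);
        System.out.println("accumulated:   " + accumulate(data));
    }
}
```

In a real benchmark the first shape would take a Blackhole parameter and the second would return the accumulator from the @Benchmark method; the sketch only makes the structural difference visible and says nothing about the actual timings reported above.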