Perfasm attributes 39% of cycles to JMH infrastructure in my benchmark
Chris Vest
mr.chrisvest at gmail.com
Thu Apr 16 12:10:43 UTC 2015
Hi,
I have this benchmark, built with JMH 1.8 (change the neo4j.version property to “2.2.1” when building):
https://github.com/chrisvest/traversal-benchmark
I run it like this (given that the nodes.db and rels.db files are 3 GB and 12 GB of random data, respectively):
$ java -version
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
$ cat props.conf
pageSize = 8k
nodeRecordSize = 32
relationshipRecordSize = 64
maxRam = 15G
sparseValue = 10
nodeStore = /mnt/pcie-ssd/nodes.db
relationshipStore = /mnt/pcie-ssd/rels.db
$ java -jar traversal-benchmark.jar -f 1 -wi 180 -i 30 -prof perfasm -t 64
The machine has 4 sockets and 32 physical cores, 64 hardware threads with hyper-threading. Perfasm tells me that the hottest instructions in the hottest code region are in a bit of JMH infrastructure code:
Hottest code regions (>10.00% "cycles" events):
....[Hottest Region 1]..............................................................................
[0x7facc96f5fd0:0x7facc96f6082] in org.neo4j.simulation.generated.PageCacheTraversal_traverse::traverse_Throughput
0x00007facc96f5faf: mov (%rsp),%r9
0x00007facc96f5fb3: mov 0x8(%rsp),%r11
0x00007facc96f5fb8: mov 0x10(%rsp),%r8
0x00007facc96f5fbd: mov 0x18(%rsp),%rbx
0x00007facc96f5fc2: jmp 0x00007facc96f6035
0x00007facc96f5fc4: nopl 0x0(%rax,%rax,1)
0x00007facc96f5fcc: data16 data16 xchg %ax,%ax ;*ifne
; - java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl::fullCheck at 8 (line 432)
; - java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl::compareAndSet at 24 (line 439)
; - org.neo4j.simulation.generated.PageCacheTraversal_traverse::traverse_Throughput at 159 (line 84)
0.02% 0.01% 0x00007facc96f5fd0: movabs $0x5a022cfc0,%r10 ; {oop(a 'java/util/concurrent/atomic/AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl')}
0.00% 0x00007facc96f5fda: mov 0x18(%r10),%ecx ;*getfield cclass
; - java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl::fullCheck at 20 (line 434)
; - java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl::compareAndSet at 24 (line 439)
; - org.neo4j.simulation.generated.PageCacheTraversal_traverse::traverse_Throughput at 159 (line 84)
0.00% 0.00% 0x00007facc96f5fde: test %ecx,%ecx
0x00007facc96f5fe0: jne 0x00007facc96f66e1
0.00% 0.00% 0x00007facc96f5fe6: mov 0x10(%r10),%rcx
0.02% 0.01% 0x00007facc96f5fea: xor %eax,%eax
0.00% 0.00% 0x00007facc96f5fec: mov $0x1,%r10d
0.00% 0x00007facc96f5ff2: lock cmpxchg %r10d,(%rbx,%rcx,1)
38.82% 32.84% 0x00007facc96f5ff8: sete %r10b
0.01% 0.00% 0x00007facc96f5ffc: movzbl %r10b,%r10d ;*invokevirtual compareAndSwapInt
; - java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl::compareAndSet at 37 (line 440)
; - org.neo4j.simulation.generated.PageCacheTraversal_traverse::traverse_Throughput at 159 (line 84)
Looking at the generated Java code, at lines 83-86:
if (control.isLastIteration()) {
    while (!PageCacheTraversal_jmh.tearTrialMutexUpdater.compareAndSet(l_pagecachetraversal0_G, 0, 1)) {
        if (Thread.interrupted()) throw new InterruptedException();
    }
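For context, here is a minimal, self-contained sketch of the same pattern outside JMH: all worker threads spinning on a single AtomicIntegerFieldUpdater CAS, which is what turns the lock cmpxchg above into a contended instruction when 64 threads hit the last iteration together. The class and field names here are made up for illustration; only the updater shape matches the generated code:

import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical stand-alone sketch of the CAS loop in the generated
// traverse_Throughput stub; names are illustrative, not JMH's.
public class TearMutexDemo {
    volatile int tearMutex; // plays the role of tearTrialMutex

    static final AtomicIntegerFieldUpdater<TearMutexDemo> UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(TearMutexDemo.class, "tearMutex");

    public static void main(String[] args) throws InterruptedException {
        TearMutexDemo shared = new TearMutexDemo();
        Thread[] workers = new Thread[64]; // same thread count as the -t 64 run
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                // Spin until this thread wins the 0 -> 1 CAS, the same
                // shape as the generated last-iteration teardown loop.
                while (!UPDATER.compareAndSet(shared, 0, 1)) {
                    if (Thread.interrupted()) return;
                }
                UPDATER.set(shared, 0); // release so the next thread can win
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}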
Apparently that “tearTrialMutexUpdater” is pretty popular. Does this influence my results, and if so, can I do anything to reduce the effect?
Cheers,
Chris