a situation when JMH 1.21 measurements are broken
Aleksey Shipilev
shade at redhat.com
Tue Feb 5 14:37:36 UTC 2019
On 2/5/19 1:20 PM, Valentin Kovalenko wrote:
> I encountered a simple situation where JMH 1.21 produces obviously
> incorrect results. The test (is also temporarily available at
> https://github.com/stIncMale/sandbox/blob/master/benchmarks/src/test/java/stincmale/sandbox/benchmarks/TmpTest.java
> ,
> you can run it with `mvn clean test -f benchmarks/pom.xml -Dtest=TmpTest`):
It is customary to make an MCVE that has no dependencies except JMH itself, like this:
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.StampedLock;

import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class TmpTest {

    private StampedLock lock;
    private AtomicLong counter;

    @Setup(Level.Trial)
    public final void setup() {
        lock = new StampedLock();
        counter = new AtomicLong();
    }

    @Benchmark
    public final long stampedLock() throws InterruptedException {
        long stamp = lock.readLock();
        try {
            if (counter.getAndIncrement() < 1000) {
                return counter.get();
            }
        } finally {
            lock.unlockRead(stamp);
        }
        stamp = lock.writeLock();
        try {
            Thread.sleep(10);
            counter.set(0);
        } finally {
            lock.unlockWrite(stamp);
        }
        return counter.get();
    }
}
> Obviously, there is no way the throughput for this benchmark is greater
> than 1000 ops / 10ms = 100 ops/ms. And JMH produces stable results less
> than 50 ops/ms for 4 threads for both reentrantRwLock and stampedLock.
> However, for 32 threads JMH produces sane results only for reentrantRwLock,
> but for stampedLock the run looks like this:
Every time "obviously" is written somewhere, we need to challenge that "obviously" first. I can see
how you can prove that _a single thread_ cannot do more than 100 ops/ms: indeed, it would have to
sleep 10 ms for every 1000 ops.
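The single-thread bound is just the arithmetic from the benchmark parameters; a quick sketch (class name is made up, numbers come from the MCVE above, nothing is measured):

```java
// Back-of-envelope bound for ONE thread: it pays a 10 ms sleep under the
// write lock for every 1000 getAndIncrement() calls it manages to do.
public class SingleThreadBound {
    public static void main(String[] args) {
        int opsPerCycle = 1000; // counter.getAndIncrement() < 1000 before the reset path
        int sleepMs = 10;       // Thread.sleep(10) under the write lock
        System.out.println(opsPerCycle / sleepMs + " ops/ms"); // prints "100 ops/ms"
    }
}
```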
But this does not extend to multiple threads, because other threads can "reset" the counter at
unfortunate times. Consider N threads: all N threads have failed the increment, and all of them queue
up for writeLock acquisition. One of those threads succeeds, enters, resets the counter, and proceeds
to do increments under readLock. But the other N-1 threads keep entering and resetting the counter
while the first thread is doing its own increments.
So, in this failure scenario, the worst case is the incrementer thread doing 999 increments, and the
N-1 other threads resetting the counter to zero "just then". Together, those threads deliver up to
N-1 resets each 10 ms. Therefore, the incrementer thread can perceive a throughput of 999 *
(N-1) / 10 ops/ms. Or, in your example of 32 threads, up to around 3100 ops/ms.
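Plugging the numbers in (again a hypothetical sketch, the class name and variables are made up, the formula is the one from the paragraph above):

```java
// Worst-case perceived throughput for the incrementer thread when the
// N-1 other threads each reset the counter at the worst possible moment.
public class MultiThreadBound {
    public static void main(String[] args) {
        int threads = 32;       // N, as in the reported 32-thread run
        int opsPerReset = 999;  // increments squeezed in per unfortunate reset
        int sleepMs = 10;       // each resetter sleeps 10 ms under the write lock
        double opsPerMs = (double) opsPerReset * (threads - 1) / sleepMs;
        System.out.println(opsPerMs + " ops/ms"); // prints "3096.9 ops/ms"
    }
}
```

That is where the "around 3100 ops/ms" figure comes from, which is well above the single-thread 100 ops/ms bound.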
This would be exacerbated by more threads actually stalling on writeLock acquisition, so more
threads produce more outliers. Pretty sure it would be less pronounced if pending writeLock
acquisitions starved the readLock acquisitions, which might explain why ReentrantRWLock "appears"
unaffected.
Pretty sure there are other failure modes in this benchmark, but this should give you a whiff of
how non-obvious this benchmark really is. The outliers do not point to a JMH bug, IMO.
-Aleksey