[PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

Thu Feb 8 16:54:15 UTC 2018

Hi Ben,

Thanks. I anticipated a performance hit, but not necessarily a 10x one. Without looking at the generated code of the benchmark method it is hard to be sure [*], but I believe the fence is interfering with loop unrolling and/or vectorization. The difference between the byte and int results may be related to vectorization (for byte there may be less or limited support for vectorization).
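
To make the shape of the problem concrete, here is a minimal sketch, not the actual patch. It assumes the fence sits at the end of every single-element accessor, so a loop like the one in the benchmark makes one fenced call per iteration. The names FenceSketch, FencedBuffer and fill are hypothetical, and a byte[] stands in for the native memory a direct buffer would really write to. Only Reference.reachabilityFence (Java 9+) is the real API.

```java
import java.lang.ref.Reference;

public class FenceSketch {

    // Hypothetical stand-in for a direct buffer. The real class writes to
    // native memory; a byte[] is used here only to keep the sketch runnable.
    public static final class FencedBuffer {
        private final byte[] memory;
        private int position;

        public FencedBuffer(int capacity) {
            memory = new byte[capacity];
        }

        // Every single-element write ends with a fence on the buffer, so the
        // buffer stays strongly reachable until the write has completed.
        public FencedBuffer put(byte b) {
            try {
                memory[position++] = b;
            } finally {
                Reference.reachabilityFence(this);
            }
            return this;
        }

        // Reads are fenced the same way as writes.
        public byte get(int index) {
            try {
                return memory[index];
            } finally {
                Reference.reachabilityFence(this);
            }
        }

        public int position() {
            return position;
        }
    }

    // Same loop shape as the put benchmark: one fenced call per iteration.
    public static FencedBuffer fill(int n) {
        FencedBuffer bb = new FencedBuffer(n);
        for (int i = 0; i < n; i++) {
            bb.put((byte) i);
        }
        return bb;
    }

    public static void main(String[] args) {
        FencedBuffer bb = fill(1000);
        System.out.println("position = " + bb.position());
    }
}
```

If the fence call inside put is not inlined, each iteration of the loop in fill carries an opaque call. That is my suspicion for why the loop optimizations are lost, but it needs confirming against the generated code.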

How about we now try another experiment: comment out the @DontInline on the fence method and re-run the benchmarks? From Peter’s observations and Vladimir’s analysis we should be able to remove it or even, contrary to what we initially expected when adding this feature, change it to @ForceInline!

Thanks,
Paul.

[*] If you are running on Linux you can use the excellent JMH perfasm feature to dump the hot parts of HotSpot’s generated code.

> On Feb 8, 2018, at 8:22 AM, Ben Walsh <ben_walsh at uk.ibm.com> wrote:
> 
> Hi Paul,
> 
> Following up with the requested loop and vectorization benchmarks ...
> 
> 
> (Do the vectorization benchmark results imply that the Hotspot compiler 
> has been unable to perform the vectorization optimisation due to the 
> presence of the reachabilityFence ?)
> 
> 
> -----------------------------------------------------------------------------------------------------------------------
> 
> 
> Loop Benchmarking
> ---- ------------
> 
> package org.sample;
> 
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Level;
> import org.openjdk.jmh.annotations.Param;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> 
> import java.nio.ByteBuffer;
> 
> @State(Scope.Benchmark)
> public class ByteBufferBenchmark {
> 
>    @Param({"1", "10", "100", "1000", "10000"})
>    public int L;
> 
>    @State(Scope.Benchmark)
>    public static class ByteBufferContainer {
> 
>        ByteBuffer bb;
> 
>        @Setup(Level.Invocation)
>        public void initByteBuffer() {
>            bb = ByteBuffer.allocateDirect(10000);
>        }
> 
>        ByteBuffer getByteBuffer() {
>            return bb;
>        }
>    }
> 
>    @Benchmark
>    public ByteBuffer benchmark_byte_buffer_put(ByteBufferContainer bbC) {
> 
>        ByteBuffer bb = bbC.getByteBuffer();
> 
>        for (int i = 0; i < L; i++) {
>            bb.put((byte)i);
>        }
> 
>        return bb;
>    }
> 
> }
> 
> 
> Without Changes
> 
> Benchmark                                        (L)   Mode  Cnt         Score        Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200  29303145.752 ± 635979.750  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200  24260859.017 ± 528891.303  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200   8512366.637 ± 136615.070  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200   1323756.037 ±  21485.369  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200    145965.305 ±   1301.469  ops/s
> 
> 
> With Changes
> 
> Benchmark                                        (L)   Mode  Cnt         Score        Error  Units    Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200  28893540.122 ± 754554.747  ops/s   -1.398%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200  15317696.355 ± 231621.608  ops/s  -36.863%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200   2546599.578 ±  32136.873  ops/s  -70.084%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200    288832.514 ±   3854.522  ops/s  -78.181%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200     29747.386 ±    214.831  ops/s  -79.620%
> 
> 
> -----------------------------------------------------------------------------------------------------------------------
> 
> 
> Vectorization Benchmarking
> ------------- ------------
> 
> package org.sample;
> 
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Level;
> import org.openjdk.jmh.annotations.Param;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> 
> import java.nio.ByteBuffer;
> 
> @State(Scope.Benchmark)
> public class ByteBufferBenchmark {
> 
>    @Param({"1", "10", "100", "1000", "10000"})
>    public int L;
> 
>    @State(Scope.Benchmark)
>    public static class ByteBufferContainer {
> 
>        ByteBuffer bb;
> 
>        @Setup(Level.Invocation)
>        public void initByteBuffer() {
>            bb = ByteBuffer.allocateDirect(4 * 10000);
> 
>            for (int i = 0; i < 10000; i++) {
>                bb.putInt(i);
>            }
>        }
> 
>        ByteBuffer getByteBuffer() {
>            return bb;
>        }
> 
>    }
> 
>    @Benchmark
>    public int benchmark_byte_buffer_put(ByteBufferContainer bbC) {
> 
>        ByteBuffer bb = bbC.getByteBuffer();
> 
>        bb.position(0);
> 
>        int sum = 0;
> 
>        for (int i = 0; i < L; i++) {
>            sum += bb.getInt();
>        }
> 
>        return sum;
> 
>    }
> 
> }
> 
> 
> Without Changes
> 
> Benchmark                                        (L)   Mode  Cnt         Score        Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200  29677205.748 ± 544721.142  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200  18219951.454 ± 320724.793  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200   7767650.826 ± 121798.910  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200   1646075.010 ±   9804.499  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200    183489.418 ±   1355.967  ops/s
> 
> 
> With Changes
> 
> Benchmark                                        (L)   Mode  Cnt         Score        Error  Units    Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200  15230086.695 ± 390174.190  ops/s  -48.681%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200   8126310.728 ± 123661.342  ops/s  -55.399%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200   1582699.233 ±   7278.744  ops/s  -79.624%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200    179726.465 ±    802.333  ops/s  -89.082%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200     18327.049 ±      9.506  ops/s  -90.012%
> 
> 
> 
> NB : For reference - for this and previous benchmarking results ...
> 
> "Without Changes" and "With Changes" - java -version ...
> 
> openjdk version "10-internal" 2018-03-20
> OpenJDK Runtime Environment (build 10-internal+0-adhoc.walshbp.jdk)
> OpenJDK 64-Bit Server VM (build 10-internal+0-adhoc.walshbp.jdk, mixed mode)
> 
> 
> -----------------------------------------------------------------------------------------------------------------------
> 
> 
> Regards,
> Ben Walsh
>