[PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup
Paul Sandoz
paul.sandoz at oracle.com
Thu Feb 8 16:54:15 UTC 2018
Hi Ben,
Thanks. I anticipated a performance hit, but not necessarily a 10x one. Without looking at the generated code of the benchmark method it is hard to be sure [*], but I believe the fence is interfering with loop unrolling and/or vectorization. The comparative differences between byte and int may be related to vectorization (for byte there may be less or limited support for vectorization).
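As a quick sanity check on the 10x figure, the slowdown factor and the "Impact" percentage can be recomputed from the L = 10000 rows of the results quoted below. This is only a small sketch: the class and method names are made up for illustration, and the constants are copied by hand from Ben's tables.

```java
public class SlowdownCheck {

    // Slowdown factor: baseline throughput divided by patched throughput.
    static double slowdown(double baseline, double patched) {
        return baseline / patched;
    }

    // Relative change in percent, computed the same way as the "Impact" column.
    static double impactPercent(double baseline, double patched) {
        return (patched - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // ops/s at L = 10000, copied from the quoted benchmark results.
        double byteBaseline = 145965.305, bytePatched = 29747.386; // put(byte) loop
        double intBaseline = 183489.418, intPatched = 18327.049;   // getInt() loop

        System.out.printf("byte: %.2fx slower (%.3f%%)%n",
                slowdown(byteBaseline, bytePatched),
                impactPercent(byteBaseline, bytePatched));
        System.out.printf("int:  %.2fx slower (%.3f%%)%n",
                slowdown(intBaseline, intPatched),
                impactPercent(intBaseline, intPatched));
    }
}
```

On these numbers the int loop slows down by roughly a factor of ten and the byte loop by roughly a factor of five, which is the byte/int difference referred to above.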
How about we now try another experiment: comment out the @DontInline on the fence method and re-run the benchmarks? From Peter’s observations and Vladimir’s analysis we should be able to remove it or even, contrary to what we initially expected when adding this feature, change it to @ForceInline!
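For readers who have not seen the pattern under discussion, here is a rough sketch of what a fenced accessor might look like. The patch itself is not reproduced in this message, so this is an assumption about its general shape, not the actual change: Holder is a hypothetical stand-in for a direct buffer, and a plain byte[] stands in for the off-heap memory. The only real API used is java.lang.ref.Reference.reachabilityFence (Java 9+).

```java
import java.lang.ref.Reference;

public class FenceSketch {

    // Hypothetical stand-in for a direct buffer; not the real ByteBuffer code.
    static final class Holder {
        private final byte[] storage; // stands in for off-heap backing memory
        private int position;

        Holder(int capacity) {
            storage = new byte[capacity];
        }

        void put(byte b) {
            try {
                storage[position++] = b;
            } finally {
                // Keeps 'this' strongly reachable until the access has completed,
                // so a cleaner cannot free the backing memory mid-access.
                Reference.reachabilityFence(this);
            }
        }
    }

    // Same loop shape as the quoted benchmark: one fenced put per iteration.
    static int fill(Holder h, int n) {
        for (int i = 0; i < n; i++) {
            h.put((byte) i);
        }
        return h.position;
    }

    public static void main(String[] args) {
        System.out.println(fill(new Holder(10000), 1000));
    }
}
```

If the fence is not inlined, every loop iteration contains a real call, which is one plausible reason the JIT would give up on unrolling or vectorizing the loop; whether that is what actually happens here can only be confirmed from the generated code.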
Thanks,
Paul.
[*] If you are running on Linux you can use the excellent JMH perfasm feature to dump the hot parts of HotSpot's generated code.
> On Feb 8, 2018, at 8:22 AM, Ben Walsh <ben_walsh at uk.ibm.com> wrote:
>
> Hi Paul,
>
> Following up with the requested loop and vectorization benchmarks ...
>
>
> (Do the vectorization benchmark results imply that the Hotspot compiler
> has been unable to perform the vectorization optimisation due to the
> presence of the reachabilityFence ?)
>
>
> -----------------------------------------------------------------------------------------------------------------------
>
>
> Loop Benchmarking
> ---- ------------
>
> package org.sample;
>
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Level;
> import org.openjdk.jmh.annotations.Param;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
>
> import java.nio.ByteBuffer;
>
> @State(Scope.Benchmark)
> public class ByteBufferBenchmark {
>
>     @Param({"1", "10", "100", "1000", "10000"})
>     public int L;
>
>     @State(Scope.Benchmark)
>     public static class ByteBufferContainer {
>
>         ByteBuffer bb;
>
>         @Setup(Level.Invocation)
>         public void initByteBuffer() {
>             bb = ByteBuffer.allocateDirect(10000);
>         }
>
>         ByteBuffer getByteBuffer() {
>             return bb;
>         }
>     }
>
>     @Benchmark
>     public ByteBuffer benchmark_byte_buffer_put(ByteBufferContainer bbC) {
>
>         ByteBuffer bb = bbC.getByteBuffer();
>
>         for (int i = 0; i < L; i++) {
>             bb.put((byte)i);
>         }
>
>         return bb;
>     }
>
> }
>
>
> Without Changes
>
> Benchmark                                        (L)   Mode  Cnt         Score        Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200  29303145.752 ± 635979.750  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200  24260859.017 ± 528891.303  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200   8512366.637 ± 136615.070  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200   1323756.037 ±  21485.369  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200    145965.305 ±   1301.469  ops/s
>
>
> With Changes
>
> Benchmark                                        (L)   Mode  Cnt         Score        Error  Units    Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200  28893540.122 ± 754554.747  ops/s   -1.398%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200  15317696.355 ± 231621.608  ops/s  -36.863%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200   2546599.578 ±  32136.873  ops/s  -70.084%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200    288832.514 ±   3854.522  ops/s  -78.181%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200     29747.386 ±    214.831  ops/s  -79.620%
>
>
> -----------------------------------------------------------------------------------------------------------------------
>
>
> Vectorization Benchmarking
> ------------- ------------
>
> package org.sample;
>
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Level;
> import org.openjdk.jmh.annotations.Param;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
>
> import java.nio.ByteBuffer;
>
> @State(Scope.Benchmark)
> public class ByteBufferBenchmark {
>
>     @Param({"1", "10", "100", "1000", "10000"})
>     public int L;
>
>     @State(Scope.Benchmark)
>     public static class ByteBufferContainer {
>
>         ByteBuffer bb;
>
>         @Setup(Level.Invocation)
>         public void initByteBuffer() {
>             bb = ByteBuffer.allocateDirect(4 * 10000);
>
>             for (int i = 0; i < 10000; i++) {
>                 bb.putInt(i);
>             }
>         }
>
>         ByteBuffer getByteBuffer() {
>             return bb;
>         }
>
>     }
>
>     @Benchmark
>     public int benchmark_byte_buffer_put(ByteBufferContainer bbC) {
>
>         ByteBuffer bb = bbC.getByteBuffer();
>
>         bb.position(0);
>
>         int sum = 0;
>
>         for (int i = 0; i < L; i++) {
>             sum += bb.getInt();
>         }
>
>         return sum;
>
>     }
>
> }
>
>
> Without Changes
>
> Benchmark                                        (L)   Mode  Cnt         Score        Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200  29677205.748 ± 544721.142  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200  18219951.454 ± 320724.793  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200   7767650.826 ± 121798.910  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200   1646075.010 ±   9804.499  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200    183489.418 ±   1355.967  ops/s
>
>
> With Changes
>
> Benchmark                                        (L)   Mode  Cnt         Score        Error  Units    Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200  15230086.695 ± 390174.190  ops/s  -48.681%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200   8126310.728 ± 123661.342  ops/s  -55.399%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200   1582699.233 ±   7278.744  ops/s  -79.624%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200    179726.465 ±    802.333  ops/s  -89.082%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200     18327.049 ±      9.506  ops/s  -90.012%
>
>
>
> NB : For reference - for this and previous benchmarking results ...
>
> "Without Changes" and "With Changes" - java -version ...
>
> openjdk version "10-internal" 2018-03-20
> OpenJDK Runtime Environment (build 10-internal+0-adhoc.walshbp.jdk)
> OpenJDK 64-Bit Server VM (build 10-internal+0-adhoc.walshbp.jdk, mixed mode)
>
>
> -----------------------------------------------------------------------------------------------------------------------
>
>
> Regards,
> Ben Walsh
>