[PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

Paul Sandoz paul.sandoz at oracle.com
Tue Feb 27 02:50:08 UTC 2018


Hi Ben,

Here is the webrev online:

  http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/index.html

(I don’t know if you have any colleagues with author or above roles in OpenJDK to upload for you, it might be faster.)

Reference.java
—

 423     @ForceInline
 424     public static void reachabilityFence(Object ref) {
 425         // Does nothing, because this method is annotated with @DontInline
 426         // HotSpot needs to retain the ref and not GC it before a call to this
 427         // method
 428     }

We need to update the comment, preferably using a summary of Vladimir’s analysis.


Direct-X-Buffer-bin.java.template
—

  34     private $type$ get$Type$(long a) {
  35         $memtype$ x = UNSAFE.get$Memtype$Unaligned(null, a, bigEndian);
  36         $type$ y = $fromBits$(x);
  37         Reference.reachabilityFence(this);
  38         return y;
  39     }

It’s overkill in the above case but for good practice reasons i recommend for all usages using a try/finally block as suggested in the JavaDoc for Reference.reachabilityFence.


Direct-X-Buffer.java.template
—

 260     public $type$ get() {
 261         return $fromBits$($swap$(UNSAFE.get$Swaptype$(ix(nextGetIndex()))));
 262     }
 263 
 264     public $type$ get(int i) {
 265         return $fromBits$($swap$(UNSAFE.get$Swaptype$(ix(checkIndex(i)))));
 266     }
 267 
 268 #if[streamableType]
 269     $type$ getUnchecked(int i) {
 270         return $fromBits$($swap$(UNSAFE.get$Swaptype$(ix(i))));
 271     }
 272 #end[streamableType]

Missing fences. We also need to look carefully at the bulk operations as well, you have a fence for the bulk put accepting a ByteBuffer although a fence is likely required on the src as well.


 506     byte _get(int i) {                          // package-private
 507         return UNSAFE.getByte(address + i);
 508     }

AFAICT the _get and _put methods are no longer used and the code could be deleted (left over from other refactoring to the view classes).

Thanks,
Paul.


 

> On Feb 19, 2018, at 8:37 AM, Ben Walsh <ben_walsh at uk.ibm.com> wrote:
> 
> As requested, here are the results with modifications to the annotations 
> on Reference.reachabilityFence. Much more promising ...
> 
> 
> * Benchmark 1 *
> 
> Test Code :
> 
> package org.sample;
> 
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Level;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> 
> import java.nio.ByteBuffer;
> 
> public class ByteBufferBenchmark {
> 
>    @State(Scope.Benchmark)
>    public static class ByteBufferContainer {
> 
>        ByteBuffer bb;
> 
>        @Setup(Level.Invocation)
>        public void initByteBuffer() {
>            bb = ByteBuffer.allocateDirect(1);
>        }
> 
>        ByteBuffer getByteBuffer() {
>            return bb;
>        }
>    }
> 
>    @Benchmark
>    public void benchmark_byte_buffer_put(ByteBufferContainer bbC) {
> 
>        bbC.getByteBuffer().put((byte)42);
>    }
> 
> }
> 
> Results :
> 
> - Unmodified Build -
> 
> Benchmark                                       Mode  Cnt         Score   
> Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put  thrpt  200  35604933.518 ± 
> 654975.515  ops/s
> 
> - Build With Reference.reachabilityFences Added -
> 
> Benchmark                                       Mode  Cnt         Score   
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put  thrpt  200  33100911.857 ± 
> 747461.951  ops/s  -7.033%
> 
> - Build With Reference.reachabilityFences Added And DontInline Replaced 
> With ForceInline -
> 
> Benchmark                                       Mode  Cnt         Score   
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put  thrpt  200  34836320.294 ± 
> 640188.408  ops/s  -2.159%
> 
> - Build With Reference.reachabilityFences Added And DontInline Removed -
> 
> Benchmark                                       Mode  Cnt         Score   
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put  thrpt  200  34740015.332 ± 
> 556578.542  ops/s  -2.429%
> 
> 
> * Benchmark 2 *
> 
> Test Code :
> 
> package org.sample;
> 
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Level;
> import org.openjdk.jmh.annotations.Param;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> 
> import java.nio.ByteBuffer;
> 
> @State(Scope.Benchmark)
> public class ByteBufferBenchmark {
> 
>    @Param({"1", "10", "100", "1000", "10000"})
>    public int L;
> 
>    @State(Scope.Benchmark)
>    public static class ByteBufferContainer {
> 
>        ByteBuffer bb;
> 
>        @Setup(Level.Invocation)
>        public void initByteBuffer() {
>            bb = ByteBuffer.allocateDirect(10000);
>        }
> 
>        ByteBuffer getByteBuffer() {
>            return bb;
>        }
>    }
> 
>    @Benchmark
>    public ByteBuffer benchmark_byte_buffer_put(ByteBufferContainer bbC) {
> 
>        ByteBuffer bb = bbC.getByteBuffer();
> 
>        for (int i = 0; i < L; i++) {
>            bb.put((byte)i);
>        }
> 
>        return bb;
>    }
> 
> }
> 
> Results :
> 
> - Unmodified Build -
> 
> Benchmark                                        (L)   Mode  Cnt Score  
> Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
> 29303145.752 ± 635979.750  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
> 24260859.017 ± 528891.303  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
> 8512366.637 ± 136615.070  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1323756.037 ±  21485.369  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
> 145965.305 ±   1301.469  ops/s
> 
> - Build With Reference.reachabilityFences Added -
> 
> Benchmark                                        (L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
> 28893540.122 ± 754554.747  ops/s  -1.398%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
> 15317696.355 ± 231621.608  ops/s  -36.863%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
> 2546599.578 ±  32136.873  ops/s  -70.084%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 288832.514 ±   3854.522  ops/s  -78.181%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 29747.386 
> ±    214.831  ops/s  -79.620%
> 
> - Build With Reference.reachabilityFences Added And DontInline Replaced 
> With ForceInline -
> 
> Benchmark                                        (L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
> 29372326.859 ± 525988.179  ops/s  +0.236%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
> 24326735.480 ± 484358.862  ops/s  +0.272%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
> 8492692.912 ± 120924.878  ops/s  -0.231%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1332131.417 ±  14981.587  ops/s  +0.633%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
> 144990.569 ±   1518.877  ops/s  -0.668%
> 
> - Build With Reference.reachabilityFences Added And DontInline Removed -
> 
> Benchmark                                        (L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
> 29842696.017 ± 462902.634  ops/s  +1.841%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
> 24842729.069 ± 436174.452  ops/s  +2.398%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
> 8518393.953 ± 129254.536  ops/s  +0.071%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1344772.370 ±  15916.867  ops/s  +1.588%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
> 145087.256 ±   1277.491  ops/s  -0.602%
> 
> 
> * Benchmark 3 *
> 
> Test Code :
> 
> package org.sample;
> 
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Level;
> import org.openjdk.jmh.annotations.Param;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> 
> import java.nio.ByteBuffer;
> 
> @State(Scope.Benchmark)
> public class ByteBufferBenchmark {
> 
>    @Param({"1", "10", "100", "1000", "10000"})
>    public int L;
> 
>    @State(Scope.Benchmark)
>    public static class ByteBufferContainer {
> 
>        ByteBuffer bb;
> 
>        @Setup(Level.Invocation)
>        public void initByteBuffer() {
>            bb = ByteBuffer.allocateDirect(4 * 10000);
> 
>            for (int i = 0; i < 10000; i++) {
>                bb.putInt(i);
>            }
>        }
> 
>        ByteBuffer getByteBuffer() {
>            return bb;
>        }
> 
>    }
> 
>    @Benchmark
>    public int benchmark_byte_buffer_put(ByteBufferContainer bbC) {
> 
>        ByteBuffer bb = bbC.getByteBuffer();
> 
>        bb.position(0);
> 
>        int sum = 0;
> 
>        for (int i = 0; i < L; i++) {
>            sum += bb.getInt();
>        }
> 
>        return sum;
> 
>    }
> 
> }
> 
> Results :
> 
> - Unmodified Build -
> 
> Benchmark                                        (L)   Mode  Cnt Score  
> Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
> 29677205.748 ± 544721.142  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
> 18219951.454 ± 320724.793  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
> 7767650.826 ± 121798.910  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1646075.010 ±   9804.499  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
> 183489.418 ±   1355.967  ops/s
> 
> - Build With Reference.reachabilityFences Added -
> 
> Benchmark                                        (L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
> 15230086.695 ± 390174.190  ops/s  -48.681%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
> 8126310.728 ± 123661.342  ops/s  -55.399%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
> 1582699.233 ±   7278.744  ops/s  -79.624%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 179726.465 ±    802.333  ops/s  -89.082%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 18327.049 
> ±      9.506  ops/s  -90.012%
> 
> - Build With Reference.reachabilityFences Added And DontInline Replaced 
> With ForceInline -
> 
> Benchmark                                        (L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
> 29839190.147 ± 576585.796  ops/s  +0.546%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
> 18397768.759 ± 338144.327  ops/s  +0.976%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
> 7746079.875 ± 101621.105  ops/s  -0.278%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1629413.444 ±  24163.399  ops/s  -1.012%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
> 182250.811 ±   2028.461  ops/s  -0.675%
> 
> - Build With Reference.reachabilityFences Added And DontInline Removed -
> 
> Benchmark                                        (L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
> 29442980.464 ± 556324.877  ops/s  -0.789%
> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
> 18401757.539 ± 419383.901  ops/s  +0.998%
> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
> 7816766.062 ± 100144.611  ops/s  +0.632%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1636811.564 ±  13811.447  ops/s  -0.563%
> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
> 183463.292 ±   2056.016  ops/s  -0.014%
> 
> 
> Regards,
> Ben
> 
> 
> 
> From:   Paul Sandoz <paul.sandoz at oracle.com>
> To:     Ben Walsh <ben_walsh at uk.ibm.com>
> Cc:     core-libs-dev <core-libs-dev at openjdk.java.net>
> Date:   08/02/2018 16:54
> Subject:        Re: [PATCH] Reduce Chance Of Mistakenly Early Backing 
> Memory Cleanup
> 
> 
> 
> Hi Ben,
> 
> Thanks. I anticipated a performance hit but not necessarily a 10x. Without 
> looking at the generated code of the benchmark method it is hard to be 
> sure [*], but i believe the fence is interfering with loop unrolling 
> and/or vectorization, the comparative differences between byte and int may 
> be related to vectorization (for byte there may be less or limited support 
> for vectorization).
> 
> How about we now try another experiment commenting out the @DontInline on 
> the fence method and re-run the benchmarks. From Peter’s observations and 
> Vladimir’s analysis we should be able to remove that, or even, contrary to 
> what we initial expected when adding this feature, change to @ForceInline!
> 
> Thanks,
> Paul.
> 
> [*] If you are running on linux you can use the excellent JMH perfasm 
> feature to dump the hot parts of HotSpots generated code.
> 
>> On Feb 8, 2018, at 8:22 AM, Ben Walsh <ben_walsh at uk.ibm.com> wrote:
>> 
>> Hi Paul,
>> 
>> Following up with the requested loop and vectorization benchmarks ...
>> 
>> 
>> (Do the vectorization benchmark results imply that the Hotspot compiler 
>> has been unable to perform the vectorization optimisation due to the 
>> presence of the reachabilityFence ?)
>> 
>> 
>> 
> -----------------------------------------------------------------------------------------------------------------------
>> 
>> 
>> Loop Benchmarking
>> ---- ------------
>> 
>> package org.sample;
>> 
>> import org.openjdk.jmh.annotations.Benchmark;
>> import org.openjdk.jmh.annotations.Level;
>> import org.openjdk.jmh.annotations.Param;
>> import org.openjdk.jmh.annotations.Scope;
>> import org.openjdk.jmh.annotations.Setup;
>> import org.openjdk.jmh.annotations.State;
>> 
>> import java.nio.ByteBuffer;
>> 
>> @State(Scope.Benchmark)
>> public class ByteBufferBenchmark {
>> 
>>   @Param({"1", "10", "100", "1000", "10000"})
>>   public int L;
>> 
>>   @State(Scope.Benchmark)
>>   public static class ByteBufferContainer {
>> 
>>       ByteBuffer bb;
>> 
>>       @Setup(Level.Invocation)
>>       public void initByteBuffer() {
>>           bb = ByteBuffer.allocateDirect(10000);
>>       }
>> 
>>       ByteBuffer getByteBuffer() {
>>           return bb;
>>       }
>>   }
>> 
>>   @Benchmark
>>   public ByteBuffer benchmark_byte_buffer_put(ByteBufferContainer bbC) 
> {
>> 
>>       ByteBuffer bb = bbC.getByteBuffer();
>> 
>>       for (int i = 0; i < L; i++) {
>>           bb.put((byte)i);
>>       }
>> 
>>       return bb;
>>   }
>> 
>> }
>> 
>> 
>> Without Changes
>> 
>> Benchmark                                        (L)   Mode  Cnt Score 
>> Error  Units
>> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
>> 29303145.752 ± 635979.750  ops/s
>> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
>> 24260859.017 ± 528891.303  ops/s
>> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
>> 8512366.637 ± 136615.070  ops/s
>> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
>> 1323756.037 ±  21485.369  ops/s
>> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
>> 145965.305 ±   1301.469  ops/s
>> 
>> 
>> With Changes
>> 
>> Benchmark                                        (L)   Mode  Cnt Score 
>> Error  Units  Impact
>> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
>> 28893540.122 ± 754554.747  ops/s  -1.398%
>> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
>> 15317696.355 ± 231621.608  ops/s  -36.863%
>> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
>> 2546599.578 ±  32136.873  ops/s  -70.084%
>> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
>> 288832.514 ±   3854.522  ops/s  -78.181%
>> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
> 29747.386 
>> ±    214.831  ops/s  -79.620%
>> 
>> 
>> 
> -----------------------------------------------------------------------------------------------------------------------
>> 
>> 
>> Vectorization Benchmarking
>> ------------- ------------
>> 
>> package org.sample;
>> 
>> import org.openjdk.jmh.annotations.Benchmark;
>> import org.openjdk.jmh.annotations.Level;
>> import org.openjdk.jmh.annotations.Param;
>> import org.openjdk.jmh.annotations.Scope;
>> import org.openjdk.jmh.annotations.Setup;
>> import org.openjdk.jmh.annotations.State;
>> 
>> import java.nio.ByteBuffer;
>> 
>> @State(Scope.Benchmark)
>> public class ByteBufferBenchmark {
>> 
>>   @Param({"1", "10", "100", "1000", "10000"})
>>   public int L;
>> 
>>   @State(Scope.Benchmark)
>>   public static class ByteBufferContainer {
>> 
>>       ByteBuffer bb;
>> 
>>       @Setup(Level.Invocation)
>>       public void initByteBuffer() {
>>           bb = ByteBuffer.allocateDirect(4 * 10000);
>> 
>>           for (int i = 0; i < 10000; i++) {
>>               bb.putInt(i);
>>           }
>>       }
>> 
>>       ByteBuffer getByteBuffer() {
>>           return bb;
>>       }
>> 
>>   }
>> 
>>   @Benchmark
>>   public int benchmark_byte_buffer_put(ByteBufferContainer bbC) {
>> 
>>       ByteBuffer bb = bbC.getByteBuffer();
>> 
>>       bb.position(0);
>> 
>>       int sum = 0;
>> 
>>       for (int i = 0; i < L; i++) {
>>           sum += bb.getInt();
>>       }
>> 
>>       return sum;
>> 
>>   }
>> 
>> }
>> 
>> 
>> Without Changes
>> 
>> Benchmark                                        (L)   Mode  Cnt Score 
>> Error  Units
>> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
>> 29677205.748 ± 544721.142  ops/s
>> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
>> 18219951.454 ± 320724.793  ops/s
>> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
>> 7767650.826 ± 121798.910  ops/s
>> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
>> 1646075.010 ±   9804.499  ops/s
>> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
>> 183489.418 ±   1355.967  ops/s
>> 
>> 
>> With Changes
>> 
>> Benchmark                                        (L)   Mode  Cnt Score 
>> Error  Units  Impact
>> ByteBufferBenchmark.benchmark_byte_buffer_put      1  thrpt  200 
>> 15230086.695 ± 390174.190  ops/s  -48.681%
>> ByteBufferBenchmark.benchmark_byte_buffer_put     10  thrpt  200 
>> 8126310.728 ± 123661.342  ops/s  -55.399%
>> ByteBufferBenchmark.benchmark_byte_buffer_put    100  thrpt  200 
>> 1582699.233 ±   7278.744  ops/s  -79.624%
>> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
>> 179726.465 ±    802.333  ops/s  -89.082%
>> ByteBufferBenchmark.benchmark_byte_buffer_put  10000  thrpt  200 
> 18327.049 
>> ±      9.506  ops/s  -90.012%
>> 
>> 
>> 
>> NB : For reference - for this and previous benchmarking results ...
>> 
>> "Without Changes" and "With Changes" - java -version ...
>> 
>> openjdk version "10-internal" 2018-03-20
>> OpenJDK Runtime Environment (build 10-internal+0-adhoc.walshbp.jdk)
>> OpenJDK 64-Bit Server VM (build 10-internal+0-adhoc.walshbp.jdk, mixed 
>> mode)
>> 
>> 
>> 
> -----------------------------------------------------------------------------------------------------------------------
>> 
>> 
>> Regards,
>> Ben Walsh
>> 
> 
> 
> 
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 
> 741598. 
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
> 



More information about the core-libs-dev mailing list