RFR: 8338967: Improve performance for MemorySegment::fill [v4]

Mon Aug 26 21:41:02 UTC 2024

On Mon, 26 Aug 2024 14:25:31 GMT, Per Minborg <pminborg at openjdk.org> wrote:

>> It is true that this is a compromise: we give up inline space and code-cache space, and add complexity, in exchange for the prospect of better small-size performance. Depending on the workload, this may or may not pay off. In the (presumably common) case where we allocate/fill small segments of constant sizes, this is likely a win. Writing a dynamic performance test sounds like a good idea.
>
> Here is a benchmark that fills segments of various random sizes:
> 
> 
> 
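> import java.lang.foreign.MemorySegment;
> import java.util.Random;
> import java.util.concurrent.TimeUnit;
> import java.util.stream.IntStream;
> 
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.BenchmarkMode;
> import org.openjdk.jmh.annotations.Fork;
> import org.openjdk.jmh.annotations.Measurement;
> import org.openjdk.jmh.annotations.Mode;
> import org.openjdk.jmh.annotations.OutputTimeUnit;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> import org.openjdk.jmh.annotations.Warmup;
> 
> @BenchmarkMode(Mode.AverageTime)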
> @Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
> @Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
> @State(Scope.Thread)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @Fork(value = 3)
> public class TestFill {
> 
>     private static final int SIZE = 16;
>     private static final int[] INDICES = new Random(42).ints(0, 8)
>             .limit(SIZE)
>             .toArray();
> 
> 
>     private MemorySegment[] segments;
> 
>     @Setup
>     public void setup() {
>         segments = IntStream.of(INDICES)
>                 .mapToObj(i -> MemorySegment.ofArray(new byte[i]))
>                 .toArray(MemorySegment[]::new);
>     }
> 
>     @Benchmark
>     public void heap_segment_fill() {
>         for (int i = 0; i < SIZE; i++) {
>             segments[i].fill((byte) 0);
>         }
>     }
> 
> }
> 
> 
> This produces the following on my Mac M1:
> 
> 
> Benchmark                   Mode  Cnt   Score   Error  Units
> TestFill.heap_segment_fill  avgt   30  59.054 ± 3.723  ns/op
> 
> 
> On average, a single fill takes 59/16 ≈ 3.7 ns (including loop overhead).
> 
> A test that uses the same segment size (ELEM_SIZE bytes) throughout each benchmark run looks like this on my machine:
> 
> 
> Benchmark                   (ELEM_SIZE)  Mode  Cnt  Score   Error  Units
> TestFill.heap_segment_fill            0  avgt   30  1.112 ± 0.027  ns/op
> TestFill.heap_segment_fill            1  avgt   30  1.602 ± 0.060  ns/op
> TestFill.heap_segment_fill            2  avgt   30  1.583 ± 0.004  ns/op
> TestFill.heap_segment_fill            3  avgt   30  1.909 ± 0.055  ns/op
> TestFill.heap_segment_fill            4  avgt   30  1.605 ± 0.059  ns/op
> TestFill.heap_segment_fill            5  avgt   30  1.900 ± 0.064  ns/op
> TestFill.heap_segment_fill            6  avgt   30  1.891 ± 0.038  ns/op
> TestFill.heap_segment_fill            7  avgt   30  2.237 ± 0.091  ns/op

As discussed offline, couldn't we use a stable array of functions (or something similar) that is populated lazily? That way the function you want is a single array access away, and all these helper methods could live somewhere else.
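
To make the idea concrete, here is a rough sketch of what such a lazily populated table could look like. It is not code from the PR: the names LazyFillTable, Filler and MAX_SPECIALIZED are made up for illustration, it operates on a plain byte[] rather than a MemorySegment, and it leaves out the JDK-internal @Stable annotation (which is not usable outside java.base), so it only shows the shape of the lookup, not the constant-folding benefit.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from the PR.
final class LazyFillTable {

    /** Hypothetical interface for a size-specialized fill routine. */
    @FunctionalInterface
    interface Filler {
        void fill(byte[] target, byte value);
    }

    // Sizes 0..7 get a dedicated entry, mirroring the sizes in the benchmark.
    private static final int MAX_SPECIALIZED = 8;

    // Lazily populated table. Inside the JDK this array could carry @Stable;
    // that annotation is internal, so it is deliberately omitted here.
    private static final Filler[] FILLERS = new Filler[MAX_SPECIALIZED];

    static void fill(byte[] target, byte value) {
        int len = target.length;
        if (len >= MAX_SPECIALIZED) {
            Arrays.fill(target, value); // bulk path for larger sizes
            return;
        }
        Filler f = FILLERS[len]; // single array access on the fast path
        if (f == null) {
            // Benign race: concurrent callers may each create an equivalent
            // filler for the same slot; whichever write lands last is kept.
            f = FILLERS[len] = create(len);
        }
        f.fill(target, value);
    }

    // Builds the filler for one size; called at most a few times per slot.
    private static Filler create(int len) {
        switch (len) {
            case 0:
                return (t, v) -> { };
            case 1:
                return (t, v) -> t[0] = v;
            case 2:
                return (t, v) -> { t[0] = v; t[1] = v; };
            default:
                return (t, v) -> {
                    for (int i = 0; i < len; i++) {
                        t[i] = v;
                    }
                };
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[3];
        fill(buf, (byte) 7);
        System.out.println(Arrays.toString(buf));
    }
}
```

The helper bodies live behind create(), so the hot method stays small, and each slot is only built the first time its size is seen. Whether the JIT can fold the array load the way it would with a real @Stable field is something the benchmark would have to show.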

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1731855496