Performance bug in and XXS patch for Parallel Old GC scanning large object arrays

Reingruber, Richard richard.reingruber at sap.com
Thu Mar 3 14:59:34 UTC 2016


Thanks, Jon.

I have created a webrev: http://cr.openjdk.java.net/~goetz/wr16/8151101_paroldgc_parallelization_of_large_obj_arrays/webrev/

Regards,
Richard.

From: hotspot-gc-dev [mailto:hotspot-gc-dev-bounces at openjdk.java.net] On Behalf Of Jon Masamitsu
Sent: Donnerstag, 3. März 2016 00:34
To: hotspot-gc-dev at openjdk.java.net
Subject: Re: Performance bug in and XXS patch for Parallel Old GC scanning large object arrays

Richard,

I've created

https://bugs.openjdk.java.net/browse/JDK-8151101

and attached your  mail to it.

Jon
On 03/02/2016 04:26 AM, Reingruber, Richard wrote:

Hi,



I would like to report a performance bug in the parallel old collector handling

large object arrays and contribute an XXS patch for the bug that improves gc pauses by

5x in a microbenchmark. The problem is that scanning of large object arrays is

not really parallelized.



When marking through large object arrays, the elements are scanned in chunks of

ObjArrayMarkingStride. This is done in



ObjArrayKlass::objarray_follow_contents(ParCompactionManager* cm, oop obj, int index)



In particular, the partially scanned array is pushed on the _objarray_stack only

*after* the elements are marked and pushed on the marking stack. It would help

parallelizing the work, if the array was made available to other threads (by

pushing it on _objarray_stack) *before* the current thread visits its stride of

elements.



diff -r 5c4f8192021e src/share/vm/gc/parallel/psCompactionManager.inline.hpp

--- a/src/share/vm/gc/parallel/psCompactionManager.inline.hpp  Tue Mar 01 09:42:19 2016 +0100

+++ b/src/share/vm/gc/parallel/psCompactionManager.inline.hpp  Wed Mar 02 12:17:07 2016 +0100

@@ -125,14 +125,14 @@

   T* const beg = base + beg_index;

   T* const end = base + end_index;



+  if (end_index < len) {

+    cm->push_objarray(obj, end_index); // Push the continuation.

+  }

+

   // Push the non-NULL elements of the next stride on the marking stack.

   for (T* e = beg; e < end; e++) {

     cm->mark_and_push<T>(e);

   }

-

-  if (end_index < len) {

-    cm->push_objarray(obj, end_index); // Push the continuation.

-  }

 }



 inline void ParCompactionManager::follow_contents(objArrayOop obj, int index) {



Below you can find a microbenchmark that demonstrates the performance bug and

the output of 2 runs without and with the patch on a linux x86_64 box with 32 hw

threads and 16 cores.



I'd suggest as well to reduce ObjArrayMarkingStride (512 by dflt) for the

parallel old gc for better parallelization. G1 and the parallel scavenger for

example use ParGCArrayScanChunk of 50). Both push the remainder *before*

iterating the array.



Regards,

Richard.



--

Richard Reingruber | SAP JVM | CORE PLATFORM,  SAP SE

Pflichtangaben/Mandatory Disclosure Statements: http://www.sap.com/company/legal/impressum.epx



###

### Benchmark Run 1 without the path

###





$ ./jdk9/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4

openjdk version "9-internal"

OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt)

OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)



Building large array with 100000 elements of small trees with depth 4

Heap stats: free:7769 MB (99.0 %)  max:7851 MB

done

Heap stats: free:6417 MB (81.7 %)  max:7851 MB

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 8759 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1856 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 3657 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 4280 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1823 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 2439 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1803 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1675 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1889 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1867 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1854 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1771 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1792 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1681 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1844 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1807 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1832 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1835 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1953 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1846 ms

d038402 at ld3357 [/net/usr.work/d038402/tmp] 17:55:04



###

### Benchmark Run 2 without the path

###



$ ./jdk9_mod/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4

openjdk version "9-internal"

OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt)

OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)



Building large array with 100000 elements of small trees with depth 4

Heap stats: free:7769 MB (99.0 %)  max:7851 MB

done

Heap stats: free:6376 MB (81.2 %)  max:7851 MB

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 3684 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 1073 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 425 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 369 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 436 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 334 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 352 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 420 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 391 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 393 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 356 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 352 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 348 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 354 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 340 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 370 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 363 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 370 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 355 ms

Calling System.gc()

Heap stats: free:6467 MB (82.4 %)  max:7851 MB

Full GC duration: 348 ms



###

### Benchmark source code

###



package test.gc;



public class TestLargeObjectArray {



    private static final long MB = 1 << 20;

    private int arraySize;

    private int treeDepth;

    private int iterations;

    private LittleTree[] arrayOfSmallTrees;



    public static void main(String[] args) {

        TestLargeObjectArray test = new TestLargeObjectArray();

        test.parseCmdLine(args);

        test.buildLargeArrayOfSmallTrees();

        test.run();

    }



    private void run() {

        while (iterations-- > 0) {

            long start = System.currentTimeMillis();

            message("Calling System.gc()");

            System.gc();

            long end = System.currentTimeMillis();

            heapStats();

            message("Full GC duration: " + (end - start) + " ms");

        }

    }



    public void buildLargeArrayOfSmallTrees() {

        try {

            message("Building large array with " + arraySize + " elements of small trees with depth " + treeDepth);

            heapStats();

            arrayOfSmallTrees = new LittleTree[arraySize];

            for (int i = 0; i < arraySize; i++) {

                arrayOfSmallTrees[i] = LittleTree.build(treeDepth);

            }

            message("done");

            heapStats();

        } catch (OutOfMemoryError oom) {

            arrayOfSmallTrees = null;

            message("OutOfMemory: please reduce array size or tree depth");

            System.exit(1);

        }

    }



    private void heapStats() {

        long free = Runtime.getRuntime().freeMemory();

        long max = Runtime.getRuntime().maxMemory();

        float freePercentage = round((float) free / (float) max * 100);

        message("Heap stats: free:" + free / MB + " MB (" + freePercentage + " %)" + "  max:" + max / MB + " MB");

    }



    private float round(float f) {

        return Math.round(f * 10) / 10f;

    }



    private void parseCmdLine(String[] args) {

        if (args.length != 3) {

            message();

            message("usage: java " + TestLargeObjectArray.class.getName()

                    + " <test iterations, e.g. 20> <large array size, e.g. 50000> <small tree depth, e.g. 4>");

            message();

            System.exit(1);

        }

        int i = 0;

        iterations = Integer.parseInt(args[i++]);

        arraySize = Integer.parseInt(args[i++]);

        treeDepth = Integer.parseInt(args[i++]);

    }



    private void message(String msg) {

        System.out.println(msg);

    }



    private void message() {

        System.out.println();

    }

}



class LittleTree {



    private static final int ARITY = 6;

    @SuppressWarnings("unused")

    private LittleTree[] subtrees;



    public LittleTree(LittleTree[] trees) {

        this.subtrees = trees;

    }



    public static LittleTree build(int treeDepth) {

        if (treeDepth == 0) {

            return null;

        }



        // recursion

        LittleTree[] subtrees = new LittleTree[ARITY];

        for (int i = 0; i < ARITY; i++) {

            subtrees[i] = build(treeDepth - 1);

        }

        return new LittleTree(subtrees);

    }

}

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20160303/0f109103/attachment.htm>


More information about the hotspot-gc-dev mailing list