Performance bug in and XXS patch for Parallel Old GC scanning large object arrays

Jon Masamitsu jon.masamitsu at oracle.com
Wed Mar 2 23:33:46 UTC 2016


Richard,

I've created

https://bugs.openjdk.java.net/browse/JDK-8151101

and attached your  mail to it.

Jon

On 03/02/2016 04:26 AM, Reingruber, Richard wrote:
> Hi,
>
> I would like to report a performance bug in the parallel old collector handling
> large object arrays and contribute an XXS patch for the bug that improves gc pauses by
> 5x in a microbenchmark. The problem is that scanning of large object arrays is
> not really parallelized.
>
> When marking through large object arrays, the elements are scanned in chunks of
> ObjArrayMarkingStride. This is done in
>
> ObjArrayKlass::objarray_follow_contents(ParCompactionManager* cm, oop obj, int index)
>
> In particular, the partially scanned array is pushed on the _objarray_stack only
> *after* the elements are marked and pushed on the marking stack. It would help
> parallelizing the work, if the array was made available to other threads (by
> pushing it on _objarray_stack) *before* the current thread visits its stride of
> elements.
>
> diff -r 5c4f8192021e src/share/vm/gc/parallel/psCompactionManager.inline.hpp
> --- a/src/share/vm/gc/parallel/psCompactionManager.inline.hpp	Tue Mar 01 09:42:19 2016 +0100
> +++ b/src/share/vm/gc/parallel/psCompactionManager.inline.hpp	Wed Mar 02 12:17:07 2016 +0100
> @@ -125,14 +125,14 @@
>     T* const beg = base + beg_index;
>     T* const end = base + end_index;
>   
> +  if (end_index < len) {
> +    cm->push_objarray(obj, end_index); // Push the continuation.
> +  }
> +
>     // Push the non-NULL elements of the next stride on the marking stack.
>     for (T* e = beg; e < end; e++) {
>       cm->mark_and_push<T>(e);
>     }
> -
> -  if (end_index < len) {
> -    cm->push_objarray(obj, end_index); // Push the continuation.
> -  }
>   }
>   
>   inline void ParCompactionManager::follow_contents(objArrayOop obj, int index) {
>
> Below you can find a microbenchmark that demonstrates the performance bug and
> the output of 2 runs without and with the patch on a linux x86_64 box with 32 hw
> threads and 16 cores.
>
> I'd suggest as well to reduce ObjArrayMarkingStride (512 by dflt) for the
> parallel old gc for better parallelization. G1 and the parallel scavenger for
> example use ParGCArrayScanChunk of 50). Both push the remainder *before*
> iterating the array.
>
> Regards,
> Richard.
>
> --
> Richard Reingruber | SAP JVM | CORE PLATFORM,  SAP SE
> Pflichtangaben/Mandatory Disclosure Statements: http://www.sap.com/company/legal/impressum.epx
>
> ###
> ### Benchmark Run 1 without the path
> ###
>
>
> $ ./jdk9/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4
> openjdk version "9-internal"
> OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt)
> OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)
>
> Building large array with 100000 elements of small trees with depth 4
> Heap stats: free:7769 MB (99.0 %)  max:7851 MB
> done
> Heap stats: free:6417 MB (81.7 %)  max:7851 MB
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 8759 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1856 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 3657 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 4280 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1823 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 2439 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1803 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1675 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1889 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1867 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1854 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1771 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1792 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1681 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1844 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1807 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1832 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1835 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1953 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1846 ms
> d038402 at ld3357 [/net/usr.work/d038402/tmp] 17:55:04
>
> ###
> ### Benchmark Run 2 without the path
> ###
>
> $ ./jdk9_mod/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4
> openjdk version "9-internal"
> OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt)
> OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)
>
> Building large array with 100000 elements of small trees with depth 4
> Heap stats: free:7769 MB (99.0 %)  max:7851 MB
> done
> Heap stats: free:6376 MB (81.2 %)  max:7851 MB
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 3684 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 1073 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 425 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 369 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 436 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 334 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 352 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 420 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 391 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 393 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 356 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 352 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 348 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 354 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 340 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 370 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 363 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 370 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 355 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %)  max:7851 MB
> Full GC duration: 348 ms
>
> ###
> ### Benchmark source code
> ###
>
> package test.gc;
>
> public class TestLargeObjectArray {
>
>      private static final long MB = 1 << 20;
>      private int arraySize;
>      private int treeDepth;
>      private int iterations;
>      private LittleTree[] arrayOfSmallTrees;
>
>      public static void main(String[] args) {
>          TestLargeObjectArray test = new TestLargeObjectArray();
>          test.parseCmdLine(args);
>          test.buildLargeArrayOfSmallTrees();
>          test.run();
>      }
>
>      private void run() {
>          while (iterations-- > 0) {
>              long start = System.currentTimeMillis();
>              message("Calling System.gc()");
>              System.gc();
>              long end = System.currentTimeMillis();
>              heapStats();
>              message("Full GC duration: " + (end - start) + " ms");
>          }
>      }
>
>      public void buildLargeArrayOfSmallTrees() {
>          try {
>              message("Building large array with " + arraySize + " elements of small trees with depth " + treeDepth);
>              heapStats();
>              arrayOfSmallTrees = new LittleTree[arraySize];
>              for (int i = 0; i < arraySize; i++) {
>                  arrayOfSmallTrees[i] = LittleTree.build(treeDepth);
>              }
>              message("done");
>              heapStats();
>          } catch (OutOfMemoryError oom) {
>              arrayOfSmallTrees = null;
>              message("OutOfMemory: please reduce array size or tree depth");
>              System.exit(1);
>          }
>      }
>
>      private void heapStats() {
>          long free = Runtime.getRuntime().freeMemory();
>          long max = Runtime.getRuntime().maxMemory();
>          float freePercentage = round((float) free / (float) max * 100);
>          message("Heap stats: free:" + free / MB + " MB (" + freePercentage + " %)" + "  max:" + max / MB + " MB");
>      }
>
>      private float round(float f) {
>          return Math.round(f * 10) / 10f;
>      }
>
>      private void parseCmdLine(String[] args) {
>          if (args.length != 3) {
>              message();
>              message("usage: java " + TestLargeObjectArray.class.getName()
>                      + " <test iterations, e.g. 20> <large array size, e.g. 50000> <small tree depth, e.g. 4>");
>              message();
>              System.exit(1);
>          }
>          int i = 0;
>          iterations = Integer.parseInt(args[i++]);
>          arraySize = Integer.parseInt(args[i++]);
>          treeDepth = Integer.parseInt(args[i++]);
>      }
>
>      private void message(String msg) {
>          System.out.println(msg);
>      }
>
>      private void message() {
>          System.out.println();
>      }
> }
>
> class LittleTree {
>
>      private static final int ARITY = 6;
>      @SuppressWarnings("unused")
>      private LittleTree[] subtrees;
>
>      public LittleTree(LittleTree[] trees) {
>          this.subtrees = trees;
>      }
>
>      public static LittleTree build(int treeDepth) {
>          if (treeDepth == 0) {
>              return null;
>          }
>
>          // recursion
>          LittleTree[] subtrees = new LittleTree[ARITY];
>          for (int i = 0; i < ARITY; i++) {
>              subtrees[i] = build(treeDepth - 1);
>          }
>          return new LittleTree(subtrees);
>      }
> }

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20160302/cbe7b2bf/attachment.htm>


More information about the hotspot-gc-dev mailing list