Performance bug in and XXS patch for Parallel Old GC scanning large object arrays
Jon Masamitsu
jon.masamitsu at oracle.com
Wed Mar 2 23:33:46 UTC 2016
Richard,
I've created
https://bugs.openjdk.java.net/browse/JDK-8151101
and attached your mail to it.
Jon
On 03/02/2016 04:26 AM, Reingruber, Richard wrote:
> Hi,
>
> I would like to report a performance bug in the parallel old collector handling
> large object arrays and contribute an XXS patch for the bug that improves gc pauses by
> 5x in a microbenchmark. The problem is that scanning of large object arrays is
> not really parallelized.
>
> When marking through large object arrays, the elements are scanned in chunks of
> ObjArrayMarkingStride. This is done in
>
> ObjArrayKlass::objarray_follow_contents(ParCompactionManager* cm, oop obj, int index)
>
> In particular, the partially scanned array is pushed on the _objarray_stack only
> *after* the elements are marked and pushed on the marking stack. It would help
> parallelizing the work, if the array was made available to other threads (by
> pushing it on _objarray_stack) *before* the current thread visits its stride of
> elements.
>
> diff -r 5c4f8192021e src/share/vm/gc/parallel/psCompactionManager.inline.hpp
> --- a/src/share/vm/gc/parallel/psCompactionManager.inline.hpp Tue Mar 01 09:42:19 2016 +0100
> +++ b/src/share/vm/gc/parallel/psCompactionManager.inline.hpp Wed Mar 02 12:17:07 2016 +0100
> @@ -125,14 +125,14 @@
> T* const beg = base + beg_index;
> T* const end = base + end_index;
>
> + if (end_index < len) {
> + cm->push_objarray(obj, end_index); // Push the continuation.
> + }
> +
> // Push the non-NULL elements of the next stride on the marking stack.
> for (T* e = beg; e < end; e++) {
> cm->mark_and_push<T>(e);
> }
> -
> - if (end_index < len) {
> - cm->push_objarray(obj, end_index); // Push the continuation.
> - }
> }
>
> inline void ParCompactionManager::follow_contents(objArrayOop obj, int index) {
>
> Below you can find a microbenchmark that demonstrates the performance bug and
> the output of 2 runs without and with the patch on a linux x86_64 box with 32 hw
> threads and 16 cores.
>
> I'd suggest as well to reduce ObjArrayMarkingStride (512 by dflt) for the
> parallel old gc for better parallelization. G1 and the parallel scavenger for
> example use ParGCArrayScanChunk of 50). Both push the remainder *before*
> iterating the array.
>
> Regards,
> Richard.
>
> --
> Richard Reingruber | SAP JVM | CORE PLATFORM, SAP SE
> Pflichtangaben/Mandatory Disclosure Statements: http://www.sap.com/company/legal/impressum.epx
>
> ###
> ### Benchmark Run 1 without the path
> ###
>
>
> $ ./jdk9/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4
> openjdk version "9-internal"
> OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt)
> OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)
>
> Building large array with 100000 elements of small trees with depth 4
> Heap stats: free:7769 MB (99.0 %) max:7851 MB
> done
> Heap stats: free:6417 MB (81.7 %) max:7851 MB
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 8759 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1856 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 3657 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 4280 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1823 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 2439 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1803 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1675 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1889 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1867 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1854 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1771 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1792 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1681 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1844 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1807 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1832 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1835 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1953 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1846 ms
> d038402 at ld3357 [/net/usr.work/d038402/tmp] 17:55:04
>
> ###
> ### Benchmark Run 2 without the path
> ###
>
> $ ./jdk9_mod/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4
> openjdk version "9-internal"
> OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt)
> OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)
>
> Building large array with 100000 elements of small trees with depth 4
> Heap stats: free:7769 MB (99.0 %) max:7851 MB
> done
> Heap stats: free:6376 MB (81.2 %) max:7851 MB
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 3684 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 1073 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 425 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 369 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 436 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 334 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 352 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 420 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 391 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 393 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 356 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 352 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 348 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 354 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 340 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 370 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 363 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 370 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 355 ms
> Calling System.gc()
> Heap stats: free:6467 MB (82.4 %) max:7851 MB
> Full GC duration: 348 ms
>
> ###
> ### Benchmark source code
> ###
>
> package test.gc;
>
> public class TestLargeObjectArray {
>
> private static final long MB = 1 << 20;
> private int arraySize;
> private int treeDepth;
> private int iterations;
> private LittleTree[] arrayOfSmallTrees;
>
> public static void main(String[] args) {
> TestLargeObjectArray test = new TestLargeObjectArray();
> test.parseCmdLine(args);
> test.buildLargeArrayOfSmallTrees();
> test.run();
> }
>
> private void run() {
> while (iterations-- > 0) {
> long start = System.currentTimeMillis();
> message("Calling System.gc()");
> System.gc();
> long end = System.currentTimeMillis();
> heapStats();
> message("Full GC duration: " + (end - start) + " ms");
> }
> }
>
> public void buildLargeArrayOfSmallTrees() {
> try {
> message("Building large array with " + arraySize + " elements of small trees with depth " + treeDepth);
> heapStats();
> arrayOfSmallTrees = new LittleTree[arraySize];
> for (int i = 0; i < arraySize; i++) {
> arrayOfSmallTrees[i] = LittleTree.build(treeDepth);
> }
> message("done");
> heapStats();
> } catch (OutOfMemoryError oom) {
> arrayOfSmallTrees = null;
> message("OutOfMemory: please reduce array size or tree depth");
> System.exit(1);
> }
> }
>
> private void heapStats() {
> long free = Runtime.getRuntime().freeMemory();
> long max = Runtime.getRuntime().maxMemory();
> float freePercentage = round((float) free / (float) max * 100);
> message("Heap stats: free:" + free / MB + " MB (" + freePercentage + " %)" + " max:" + max / MB + " MB");
> }
>
> private float round(float f) {
> return Math.round(f * 10) / 10f;
> }
>
> private void parseCmdLine(String[] args) {
> if (args.length != 3) {
> message();
> message("usage: java " + TestLargeObjectArray.class.getName()
> + " <test iterations, e.g. 20> <large array size, e.g. 50000> <small tree depth, e.g. 4>");
> message();
> System.exit(1);
> }
> int i = 0;
> iterations = Integer.parseInt(args[i++]);
> arraySize = Integer.parseInt(args[i++]);
> treeDepth = Integer.parseInt(args[i++]);
> }
>
> private void message(String msg) {
> System.out.println(msg);
> }
>
> private void message() {
> System.out.println();
> }
> }
>
> class LittleTree {
>
> private static final int ARITY = 6;
> @SuppressWarnings("unused")
> private LittleTree[] subtrees;
>
> public LittleTree(LittleTree[] trees) {
> this.subtrees = trees;
> }
>
> public static LittleTree build(int treeDepth) {
> if (treeDepth == 0) {
> return null;
> }
>
> // recursion
> LittleTree[] subtrees = new LittleTree[ARITY];
> for (int i = 0; i < ARITY; i++) {
> subtrees[i] = build(treeDepth - 1);
> }
> return new LittleTree(subtrees);
> }
> }
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20160302/cbe7b2bf/attachment.htm>
More information about the hotspot-gc-dev
mailing list