From rwestrel at redhat.com Tue Jan 3 09:41:07 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 03 Jan 2017 10:41:07 +0100 Subject: move in cset test from stub to c2 IR Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/incsettest/webrev.00/ With some code refactoring... Roland. From rkennke at redhat.com Tue Jan 3 10:15:05 2017 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 03 Jan 2017 11:15:05 +0100 Subject: move in cset test from stub to c2 IR In-Reply-To: References: Message-ID: <1483438505.5843.0.camel@redhat.com> Am Dienstag, den 03.01.2017, 10:41 +0100 schrieb Roland Westrelin: > http://cr.openjdk.java.net/~roland/shenandoah/incsettest/webrev.00/ > > With some code refactoring... Patch looks good to me! This warrants some serious testing using gcbench. Good that you made this possible to turn on and off :-) Roman From rwestrel at redhat.com Tue Jan 3 10:32:30 2017 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 03 Jan 2017 10:32:30 +0000 Subject: hg: shenandoah/jdk9/hotspot: in cset fast test in C2 IR Message-ID: <201701031032.v03AWUbG005700@aojmv0008.oracle.com> Changeset: 313a04b8b17d Author: roland Date: 2017-01-03 11:31 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/313a04b8b17d in cset fast test in C2 IR ! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp ! src/cpu/x86/vm/stubGenerator_x86_64.cpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/shenandoahSupport.cpp From rwestrel at redhat.com Tue Jan 3 12:59:02 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 03 Jan 2017 13:59:02 +0100 Subject: Attempt some loop opts after write barrier expansion Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/wbloopopts/webrev.00/ This attempts a few more rounds of loop opts after write barrier expansion. When there are 2 write barriers in a row, the evacuation in progress tests are merged: if (evac_in_progress) { slow_path_1 } else { fast_path_1 } if (evac_in_progress) { slow_path_2 } else { fast_path_2 } becomes: if (evac_in_progress) { slow_path_1 slow_path_2 } else { fast_path_1 fast_path_2 } Loops are unswitched when they contain an evacuation in progress test that can be moved out of the loop (i.e. no safepoint = -UseCountedLoopSafepoints). for (;;) { some_stuff if (evac_in_progress) { slow_path } else { fast_path } more_stuff } becomes if (evac_in_progress) { for (;;) { some_stuff slow_path more_stuff } } else { for (;;) { some_stuff fast_path more_stuff } } Roland. From rkennke at redhat.com Tue Jan 3 18:00:29 2017 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 3 Jan 2017 13:00:29 -0500 (EST) Subject: Attempt some loop opts after write barrier expansion Message-ID: <752096852.6449451.1483466429868.JavaMail.zimbra@zmail25.collab.prod.int.phx2.redhat.com> Looks good and sounds very useful. Not your fault, but somebody should change those flags to 1<<$shift :-S RomanAm 03.01.2017 1:59 nachm. schrieb Roland Westrelin : > > > http://cr.openjdk.java.net/~roland/shenandoah/wbloopopts/webrev.00/ > > This attempts a few more rounds of loop opts after write barrier > expansion. > > When there are 2 write barriers in a row, the evacuation in progress > tests are merged: > > if (evac_in_progress) { > ? slow_path_1 > } else { > ? fast_path_1 > } > if (evac_in_progress) { > ? slow_path_2 > } else { > ? fast_path_2 > } > > becomes: > > if (evac_in_progress) { > ? slow_path_1 > ? slow_path_2 > } else { > ? fast_path_1 > ? 
fast_path_2 > } > > Loops are unswitched when they contain an evacuation in progress test > that can be moved out of the loop (i.e. no safepoint = > -UseCountedLoopSafepoints). > > for (;;) { > ? some_stuff > ? if (evac_in_progress) { > ??? slow_path > ? } else { > ??? fast_path > ? } > ? more_stuff > } > > becomes > > if (evac_in_progress) { > ? for (;;) { > ??? some_stuff > ??? slow_path > ??? more_stuff > ? } > } else { > ? for (;;) { > ??? some_stuff > ??? fast_path > ??? more_stuff > ? } > } > > Roland. From roman at kennke.org Wed Jan 4 16:25:58 2017 From: roman at kennke.org (roman at kennke.org) Date: Wed, 04 Jan 2017 16:25:58 +0000 Subject: hg: shenandoah/jdk8u/hotspot: 19 new changesets Message-ID: <201701041625.v04GPwSI000389@aojmv0008.oracle.com> Changeset: 9fe66b8f9d19 Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/9fe66b8f9d19 Avoid evacuation if concurrent GC was cancelled. Make sure Full GC is able to recover. ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/runtime/thread.cpp Changeset: 1def7a9a30be Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/1def7a9a30be Fix TLAB flapping. Do not reply with MinTLABSize if we have no space left in current region, make allocator to ask for another region. ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.cpp Changeset: c07dbebf60f9 Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c07dbebf60f9 Fix object initialization in C2 ! src/share/vm/opto/macro.cpp Changeset: a4b8d20c15ef Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/a4b8d20c15ef C1 cleanup ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_LIRGenerator_aarch64.cpp ! src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! src/share/vm/c1/c1_LIR.cpp ! src/share/vm/c1/c1_LIR.hpp ! src/share/vm/c1/c1_LIRGenerator.cpp Changeset: 6fb2ed4e97b9 Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/6fb2ed4e97b9 Fix shutdown/cancelled races. ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp Changeset: 456fcbf22594 Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/456fcbf22594 Heap dump support ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp Changeset: 135b06fb56f5 Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/135b06fb56f5 Fix another Full GC trigger race ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! 
src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.hpp Changeset: 5d2b541157fa Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/5d2b541157fa Enable -XX:+HeapDump{Before|After}FullGC. ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp Changeset: 268d57171c9f Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/268d57171c9f Do more Full GC tries following the allocation failure ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp Changeset: dbad5da24efa Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/dbad5da24efa Add remaining unused free space to 'used' counter in free list. Makes heuristics more precise. ! src/share/vm/gc_implementation/shenandoah/shenandoahFreeSet.cpp Changeset: 963893176ea7 Author: rkennke Date: 2017-01-04 13:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/963893176ea7 Fix MXBean Full GC notifications. ! src/share/vm/services/memoryManager.cpp ! src/share/vm/services/memoryManager.hpp ! src/share/vm/services/memoryService.cpp Changeset: 6b50d518992e Author: rkennke Date: 2017-01-04 13:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/6b50d518992e JVMStat heap region counters ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp + src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionCounters.cpp + src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionCounters.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMonitoringSupport.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMonitoringSupport.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp ! src/share/vm/runtime/arguments.cpp Changeset: b991fdff1e7f Author: rkennke Date: 2017-01-04 13:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/b991fdff1e7f Locked allocation ! src/share/vm/gc_implementation/shenandoah/shenandoahFreeSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahFreeSet.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp Changeset: 352e7275a860 Author: rkennke Date: 2017-01-04 13:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/352e7275a860 Fix freeze when running OOM during write barrier ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp Changeset: 17e523dc476c Author: rkennke Date: 2017-01-04 13:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/17e523dc476c More efficient heap expansion ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp Changeset: 145137908d2f Author: rkennke Date: 2017-01-04 13:46 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/145137908d2f Degenerating concurrent marking ! 
src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp ! src/share/vm/utilities/taskqueue.cpp ! src/share/vm/utilities/taskqueue.hpp Changeset: 5fe3f645db28 Author: rkennke Date: 2017-01-04 13:46 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/5fe3f645db28 Enable UseCountedLoopSafepoints with Shenandoah. ! src/share/vm/runtime/arguments.cpp Changeset: 9e21fa63bbf8 Author: rkennke Date: 2017-01-04 14:36 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/9e21fa63bbf8 Improve AryEq instruction by avoiding false negatives with a Shenandoah cmp barrier ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! src/cpu/x86/vm/macroAssembler_x86.cpp Changeset: 714dea8cd74c Author: rkennke Date: 2017-01-04 14:36 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/714dea8cd74c Refactor concurrent mark to be more inlineable. ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.cpp From rkennke at redhat.com Wed Jan 4 16:28:29 2017 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 04 Jan 2017 17:28:29 +0100 Subject: FYI: backported recent changes from jdk9 to jdk8u (was hg: shenandoah/jdk8u/hotspot: 19 new changesets) In-Reply-To: <201701041625.v04GPwSI000389@aojmv0008.oracle.com> References: <201701041625.v04GPwSI000389@aojmv0008.oracle.com> Message-ID: <1483547309.2654.3.camel@redhat.com> I backported all relevent changes and fixes from jdk9 to jdk8u. Tested by running specjvm and jcstress (sanity and quick). jdk8u and jdk9 should be in sync now for the Shenandoah parts, some experimental c2 changes notwithstanding. Roman Am Mittwoch, den 04.01.2017, 16:25 +0000 schrieb roman at kennke.org: > Changeset: 9fe66b8f9d19 > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/9f > e66b8f9d19 > > Avoid evacuation if concurrent GC was cancelled. Make sure Full GC is > able to recover. > > ! > src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.c > pp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.h > pp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > cpp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > hpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp > ! 
src/share/vm/runtime/thread.cpp > > Changeset: 1def7a9a30be > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/1d > ef7a9a30be > > Fix TLAB flapping. Do not reply with MinTLABSize if we have no space > left in current region, make allocator to ask for another region. > > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.cpp > > Changeset: c07dbebf60f9 > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c0 > 7dbebf60f9 > > Fix object initialization in C2 > > ! src/share/vm/opto/macro.cpp > > Changeset: a4b8d20c15ef > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/a4 > b8d20c15ef > > C1 cleanup > > ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp > ! src/cpu/aarch64/vm/c1_LIRGenerator_aarch64.cpp > ! src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp > ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp > ! src/share/vm/c1/c1_LIR.cpp > ! src/share/vm/c1/c1_LIR.hpp > ! src/share/vm/c1/c1_LIRGenerator.cpp > > Changeset: 6fb2ed4e97b9 > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/6f > b2ed4e97b9 > > Fix shutdown/cancelled races. > > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > cpp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > hpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp > > Changeset: 456fcbf22594 > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/45 > 6fcbf22594 > > Heap dump support > > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp > > Changeset: 135b06fb56f5 > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/13 > 5b06fb56f5 > > Fix another Full GC trigger race > > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > cpp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > hpp > > Changeset: 5d2b541157fa > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/5d > 2b541157fa > > Enable -XX:+HeapDump{Before|After}FullGC. > > ! > src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.c > pp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.h > pp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp > > Changeset: 268d57171c9f > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/26 > 8d57171c9f > > Do more Full GC tries following the allocation failure > > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp > > Changeset: dbad5da24efa > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/db > ad5da24efa > > Add remaining unused free space to 'used' counter in free list. Makes > heuristics more precise. > > ! 
src/share/vm/gc_implementation/shenandoah/shenandoahFreeSet.cpp > > Changeset: 963893176ea7 > Author:????rkennke > Date:??????2017-01-04 13:09 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/96 > 3893176ea7 > > Fix MXBean Full GC notifications. > > ! src/share/vm/services/memoryManager.cpp > ! src/share/vm/services/memoryManager.hpp > ! src/share/vm/services/memoryService.cpp > > Changeset: 6b50d518992e > Author:????rkennke > Date:??????2017-01-04 13:26 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/6b > 50d518992e > > JVMStat heap region counters > > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > cpp > + > src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionCounter > s.cpp > + > src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionCounter > s.hpp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahMonitoringSupport > .cpp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahMonitoringSupport > .hpp > ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp > ! src/share/vm/runtime/arguments.cpp > > Changeset: b991fdff1e7f > Author:????rkennke > Date:??????2017-01-04 13:26 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/b9 > 91fdff1e7f > > Locked allocation > > ! src/share/vm/gc_implementation/shenandoah/shenandoahFreeSet.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahFreeSet.hpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.cpp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.hpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp > > Changeset: 352e7275a860 > Author:????rkennke > Date:??????2017-01-04 13:26 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/35 > 2e7275a860 > > Fix freeze when running OOM during write barrier > > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > cpp > > Changeset: 17e523dc476c > Author:????rkennke > Date:??????2017-01-04 13:26 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/17 > e523dc476c > > More efficient heap expansion > > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp > > Changeset: 145137908d2f > Author:????rkennke > Date:??????2017-01-04 13:46 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/14 > 5137908d2f > > Degenerating concurrent marking > > ! > src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.c > pp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.h > pp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.cp > p > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.inline.hpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.hpp > ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp > ! src/share/vm/utilities/taskqueue.cpp > ! 
src/share/vm/utilities/taskqueue.hpp > > Changeset: 5fe3f645db28 > Author:????rkennke > Date:??????2017-01-04 13:46 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/5f > e3f645db28 > > Enable UseCountedLoopSafepoints with Shenandoah. > > ! src/share/vm/runtime/arguments.cpp > > Changeset: 9e21fa63bbf8 > Author:????rkennke > Date:??????2017-01-04 14:36 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/9e > 21fa63bbf8 > > Improve AryEq instruction by avoiding false negatives with a > Shenandoah cmp barrier > > ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp > ! src/cpu/x86/vm/macroAssembler_x86.cpp > > Changeset: 714dea8cd74c > Author:????rkennke > Date:??????2017-01-04 14:36 +0100 > URL:???????http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/71 > 4dea8cd74c > > Refactor concurrent mark to be more inlineable. > > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.cp > p > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.hp > p > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.in > line.hpp > ! > src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread. > cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp > ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp > ! > src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.cp > p > From shade at redhat.com Wed Jan 4 22:23:28 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 4 Jan 2017 23:23:28 +0100 Subject: RFR (M): Thread-local buffers for liveness data Message-ID: <2a44819d-a1a8-6f4b-5c89-821e64f513e9@redhat.com> Hi, We know from mark-compact performance work that liveness computation takes a non-negligible part of marking time. If you look into profiles for the application with large dataset, then you can clearly see the atomic "lock xadd" from SHRegion::increase_live_data in hotspots. It is a hotspot for both plain latency and contention reasons, even on a moderately sized x86. Let's upgrade the one-slot cache into the full-blown thread-local buffers for liveness data: http://cr.openjdk.java.net/~shade/shenandoah/liveness-threadlocal/webrev.01/ Observations: a) One-slot cache gives ~20-40% cache hit rate on most workloads. Which means every second object does the atomic xadd. My attempts in doing smarter N-slot/history caching were not fruitful: the long tail flaps happily all over the place. b) size_t and jint are overkill for the table. Each thread would potentially touch ${regions}*${sizeof(element)}-sized local table. On my machine, 2K size_t adds up to 16KB, which is half of L1. With jushort, it is only 4KB. In reality, most threads would touch only a few elements, and touch the atomic add on rare overflows. c) Switching live_data from bytes to HeapWords helps to expand the buffering capacity. d) With 8 threads, we take up 4*8 = +32KB of additional space. I would expect that our region count to grow sub-linearly with thread counts, and so for 128 threads, it would be +512KB for all threads. e) Performance-wise, SPECjvm2008 is not affected (LDS is way too low); f) Mark tests that retain large object graphs benefit a lot. 
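Before the numbers below, a rough sketch of what the buffering in (b)-(d) amounts to. This is illustration only -- the names and standalone C++ types are assumptions, and the actual code is in the webrev above:

  #include <atomic>
  #include <cstdint>
  #include <cstddef>

  // Per-thread liveness buffer: one small counter per region, indexed by region
  // number. Common-case updates are plain stores into a ~4KB local table
  // (a jushort-sized slot per region); only on rare overflow, and once at the
  // end of marking, is the value flushed into the shared per-region counter
  // with an atomic add. The shared counter here models the atomically updated
  // per-region live data of the real code.
  class LivenessBufferSketch {
    uint16_t*            _local;      // per-region local counts, in words
    std::atomic<size_t>* _shared;     // shared per-region live counters
    size_t               _num_regions;

    void flush(size_t region, size_t words) {
      _shared[region].fetch_add(words, std::memory_order_relaxed); // the "lock xadd"
    }

  public:
    LivenessBufferSketch(std::atomic<size_t>* shared, size_t num_regions)
      : _local(new uint16_t[num_regions]()), _shared(shared), _num_regions(num_regions) {}

    ~LivenessBufferSketch() { delete[] _local; }

    // Called for every marked object: no atomics in the common case.
    void add_live(size_t region, size_t words) {
      size_t next = (size_t)_local[region] + words;
      if (next > UINT16_MAX) {        // rare overflow: flush and restart the local count
        flush(region, next);
        _local[region] = 0;
      } else {
        _local[region] = (uint16_t)next;
      }
    }

    // Called once per worker at the end of marking.
    void flush_all() {
      for (size_t r = 0; r < _num_regions; r++) {
        if (_local[r] != 0) {
          flush(r, _local[r]);
          _local[r] = 0;
        }
      }
    }
  };

Since most threads touch only a few region slots, the table stays cache-resident and the atomic add nearly disappears from the marking hot path.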
With "aggressive" heuristics, and large tree with 10M nodes: Baseline, conc mark times: 35.99 s (avg = 105.24 ms) (num = 342) 35.90 s (avg = 108.47 ms) (num = 331) 35.98 s (avg = 103.69 ms) (num = 347) 36.08 s (avg = 104.89 ms) (num = 344) 36.09 s (avg = 104.90 ms) (num = 344) Patched, conc mark times: 33.68 s (avg = 83.37 ms) (num = 404) 33.69 s (avg = 84.64 ms) (num = 398) 33.67 s (avg = 83.77 ms) (num = 402) 33.71 s (avg = 82.01 ms) (num = 411) 33.65 s (avg = 85.41 ms) (num = 394) (lower times => more frequent marks under "aggressive") Testing: hotspot_gc_shenandoah, SPECjvm2008, targeted benchmarks Thanks, -Aleksey From rkennke at redhat.com Thu Jan 5 10:44:33 2017 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 05 Jan 2017 11:44:33 +0100 Subject: RFR (M): Thread-local buffers for liveness data In-Reply-To: <2a44819d-a1a8-6f4b-5c89-821e64f513e9@redhat.com> References: <2a44819d-a1a8-6f4b-5c89-821e64f513e9@redhat.com> Message-ID: <1483613073.2654.6.camel@redhat.com> Good work, please push! Roman Am Mittwoch, den 04.01.2017, 23:23 +0100 schrieb Aleksey Shipilev: > Hi, > > We know from mark-compact performance work that liveness computation > takes a > non-negligible part of marking time. > > If you look into profiles for the application with large dataset, > then you can > clearly see the atomic "lock xadd" from SHRegion::increase_live_data > in > hotspots. It is a hotspot for both plain latency and contention > reasons, even on > a moderately sized x86. > > Let's upgrade the one-slot cache into the full-blown thread-local > buffers for > liveness data: > ? http://cr.openjdk.java.net/~shade/shenandoah/liveness-threadlocal/w > ebrev.01/ > > Observations: > > ?a) One-slot cache gives ~20-40% cache hit rate on most workloads. > Which means > every second object does the atomic xadd. My attempts in doing > smarter > N-slot/history caching were not fruitful: the long tail flaps happily > all over > the place. > > ?b) size_t and jint are overkill for the table. Each thread would > potentially > touch ${regions}*${sizeof(element)}-sized local table. On my machine, > 2K size_t > adds up to 16KB, which is half of L1. With jushort, it is only 4KB. > In reality, > most threads would touch only a few elements, and touch the atomic > add on rare > overflows. > > ?c) Switching live_data from bytes to HeapWords helps to expand the > buffering > capacity. > > ?d) With 8 threads, we take up 4*8 = +32KB of additional space. I > would expect > that our region count to grow sub-linearly with thread counts, and so > for 128 > threads, it would be +512KB for all threads. > > ?e) Performance-wise, SPECjvm2008 is not affected (LDS is way too > low); > > ?f) Mark tests that retain large object graphs benefit a lot. With > "aggressive" > heuristics, and large tree with 10M nodes: > > Baseline, conc mark times: > ? 35.99 s (avg =???105.24 ms)??(num =???342) > ? 35.90 s (avg =???108.47 ms)??(num =???331) > ? 35.98 s (avg =???103.69 ms)??(num =???347) > ? 36.08 s (avg =???104.89 ms)??(num =???344) > ? 36.09 s (avg =???104.90 ms)??(num =???344) > > Patched, conc mark times: > ? 33.68 s (avg =????83.37 ms)??(num =???404) > ? 33.69 s (avg =????84.64 ms)??(num =???398) > ? 33.67 s (avg =????83.77 ms)??(num =???402) > ? 33.71 s (avg =????82.01 ms)??(num =???411) > ? 
33.65 s (avg =????85.41 ms)??(num =???394) > > (lower times => more frequent marks under "aggressive") > > Testing: hotspot_gc_shenandoah, SPECjvm2008, targeted benchmarks > > Thanks, > -Aleksey > From ashipile at redhat.com Thu Jan 5 11:34:15 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Thu, 05 Jan 2017 11:34:15 +0000 Subject: hg: shenandoah/jdk9/hotspot: Thread-local buffers for liveness data. Message-ID: <201701051134.v05BYFVV001061@aojmv0008.oracle.com> Changeset: 44d762e94dfd Author: shade Date: 2017-01-05 12:33 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/44d762e94dfd Thread-local buffers for liveness data. ! src/share/vm/gc/shenandoah/shenandoahCollectionSet.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahFreeSet.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionCounters.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.hpp From shade at redhat.com Fri Jan 6 15:01:14 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 6 Jan 2017 16:01:14 +0100 Subject: RFR (S): Percentile levels in -Xlog:gc+stats Message-ID: <0c2f7384-c415-0658-0298-aa5e0a856177@redhat.com> Hi, The non-normality in phase times make average times in our gc+stats log confusing. For example, can you trust this line? Concurrent Marking Times = 18.18 s (avg = 142.02 ms) (num = 128, ... You can't, because there were two very different phases in workload lifetime: the initial burst of short concmarks when app is initializing, and then the steady state concmarks on stable LDS. To identify these cases in the stats, we are better off reporting the n-quantile levels to get the immediate "feel" of the distribution we are looking at. Webrev: http://cr.openjdk.java.net/~shade/shenandoah/stats-percentiles/webrev.01/ This is a full line in patched version: Concurrent Marking Times = 18.18 s (avg = 142018 us) (num = 128, lvls (10% step, us) = 787, 858, 960, 2660, 4440, 4830, 5830, 7880, 9600, 2533512) Notice the distribution skew in levels. This is the line that is more trustable: Concurrent Marking Times = 15.16 s (avg = 63693 us) (num = 238, lvls (10% step, us) = 291, 524, 615, 772, 1000, 1600, 186000, 197000, 199000, 228671) And this looks very solid: Concurrent Marking Times = 1.80 s (avg = 179735 us) (num = 10, lvls (10% step, us) = 174000, 176000, 176000, 176000, 177000, 180000, 180000, 181000, ... Switching to microseconds instead of milliseconds helps to get more fidelity in sub-ms pause times. Testing: hotspot_gc_shenandoah, selected benchmarks Thanks, -Aleksey From rkennke at redhat.com Mon Jan 9 10:48:08 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 09 Jan 2017 11:48:08 +0100 Subject: RFR (S): Percentile levels in -Xlog:gc+stats In-Reply-To: <0c2f7384-c415-0658-0298-aa5e0a856177@redhat.com> References: <0c2f7384-c415-0658-0298-aa5e0a856177@redhat.com> Message-ID: <1483958888.2647.3.camel@redhat.com> Sounds good in general. Maybe instead of multiplying by 1000, measure more precisely? Can't we have both max/SD and percentile stats? Or does it not make sense at all to see max/sd? 
Roman Am Freitag, den 06.01.2017, 16:01 +0100 schrieb Aleksey Shipilev: > Hi, > > The non-normality in phase times make average times in our gc+stats > log > confusing. For example, can you trust this line? > > ?Concurrent Marking Times??= 18.18 s (avg =???142.02 ms)??(num > =???128, ... > > You can't, because there were two very different phases in workload > lifetime: > the initial burst of short concmarks when app is initializing, and > then the > steady state concmarks on stable LDS. To identify these cases in the > stats, we > are better off reporting the n-quantile levels to get the immediate > "feel" of > the distribution we are looking at. > > Webrev: > ?http://cr.openjdk.java.net/~shade/shenandoah/stats-percentiles/webre > v.01/ > > This is a full line in patched version: > > ?Concurrent Marking Times??= 18.18 s (avg =???142018 us) > ? (num =???128, lvls (10% step, us) = > ??????787, 858, 960, 2660, 4440, 4830, 5830, 7880, 9600, 2533512) > > Notice the distribution skew in levels. > > This is the line that is more trustable: > > ? Concurrent Marking Times??= 15.16 s (avg =????63693 us) > ????(num =???238, lvls (10% step, us) = > ???????291, 524,??615, 772, 1000, 1600, 186000, 197000, 199000, > 228671) > > And this looks very solid: > > ? Concurrent Marking Times???= 1.80 s (avg =???179735 us) > ????(num =????10, lvls (10% step, us) = > ???????174000, 176000, 176000, 176000, 177000, 180000, 180000, > 181000, ... > > Switching to microseconds instead of milliseconds helps to get more > fidelity in > sub-ms pause times. > > Testing: hotspot_gc_shenandoah, selected benchmarks > > Thanks, > -Aleksey > From shade at redhat.com Mon Jan 9 11:56:51 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 9 Jan 2017 12:56:51 +0100 Subject: RFR (S): Percentile levels in -Xlog:gc+stats In-Reply-To: <1483958888.2647.3.camel@redhat.com> References: <0c2f7384-c415-0658-0298-aa5e0a856177@redhat.com> <1483958888.2647.3.camel@redhat.com> Message-ID: On 01/09/2017 11:48 AM, Roman Kennke wrote: > Maybe instead of multiplying by 1000, measure more precisely? We are measuring precisely: the counters reply in floating-point seconds, so multiplying to 1K or 1M gives you integral milliseconds and microseconds, respectively. HDR storage would coarsen, though, to get footprint advantages. > Can't we have both max/SD and percentile stats? Or does it not make > sense at all to see max/sd? SD is useless for non-normal distributions. Well, average is misleading too, unless levels say the distribution looks uniform/normal enough. The last level was supposed to be maximum, but it might be not evident from the logging. Changed it to explicit "max": Concurrent Marking Times = 15.24 s (avg = 64594 us) (num = 236, lvls (10% step, us) = 273, 463, 609, 863, 1055, 1895, 185547, 199219, 199219, max = 204852) Is this better? Ok to push? Thanks, -Aleksey From rwestrel at redhat.com Mon Jan 9 13:04:22 2017 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Mon, 09 Jan 2017 13:04:22 +0000 Subject: hg: shenandoah/jdk9/hotspot: loop opts of write barriers once expanded Message-ID: <201701091304.v09D4MiP005266@aojmv0008.oracle.com> Changeset: c46f0b378ff1 Author: roland Date: 2017-01-03 13:42 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/c46f0b378ff1 loop opts of write barriers once expanded ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/compile.hpp ! src/share/vm/opto/ifnode.cpp ! src/share/vm/opto/loopTransform.cpp ! 
src/share/vm/opto/loopUnswitch.cpp ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/loopopts.cpp ! src/share/vm/opto/shenandoahSupport.cpp ! src/share/vm/opto/shenandoahSupport.hpp

From rkennke at redhat.com Mon Jan 9 13:31:36 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 09 Jan 2017 14:31:36 +0100
Subject: RFR (S): Percentile levels in -Xlog:gc+stats
In-Reply-To: References: <0c2f7384-c415-0658-0298-aa5e0a856177@redhat.com> <1483958888.2647.3.camel@redhat.com>
Message-ID: <1483968696.2647.7.camel@redhat.com>

On Monday, 09.01.2017 at 12:56 +0100, Aleksey Shipilev wrote:
> On 01/09/2017 11:48 AM, Roman Kennke wrote:
> > Maybe instead of multiplying by 1000, measure more precisely?
>
> We are measuring precisely: the counters reply in floating-point seconds, so
> multiplying to 1K or 1M gives you integral milliseconds and microseconds,
> respectively. HDR storage would coarsen, though, to get footprint advantages.

I was suspicious about those numbers:

  Concurrent Marking Times = 1.80 s (avg = 179735 us)
    (num = 10, lvls (10% step, us) =
    174000, 176000, 176000, 176000, 177000, 180000, 180000, 181000, ...

> > Can't we have both max/SD and percentile stats? Or does it not make
> > sense at all to see max/sd?
>
> SD is useless for non-normal distributions. Well, average is misleading too,
> unless levels say the distribution looks uniform/normal enough. The last level
> was supposed to be maximum, but it might be not evident from the logging.
> Changed it to explicit "max":
>
>   Concurrent Marking Times = 15.24 s (avg = 64594 us)
>     (num = 236, lvls (10% step, us) =
>     273, 463, 609, 863, 1055, 1895, 185547, 199219, 199219, max = 204852)
>
> Is this better? Ok to push?

OK. Roman

From ashipile at redhat.com Mon Jan 9 13:39:15 2017
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 09 Jan 2017 13:39:15 +0000
Subject: hg: shenandoah/jdk9/hotspot: Percentile levels in -Xlog:gc+stats.
Message-ID: <201701091339.v09DdFvf021313@aojmv0008.oracle.com>

Changeset: 5bde5cc33911
Author: shade
Date: 2017-01-09 14:39 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/5bde5cc33911

Percentile levels in -Xlog:gc+stats.

! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp
! src/share/vm/utilities/numberSeq.cpp
! src/share/vm/utilities/numberSeq.hpp

From shade at redhat.com Tue Jan 10 09:12:46 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 10 Jan 2017 10:12:46 +0100
Subject: Reserved space in GC/TLABs
Message-ID: <77857ebf-343e-4b75-c47e-2ff5a9ca9040@redhat.com>

Hi,

So I've been chasing a weird bimodal behavior in our mark tests. The test retains the large immutable tree in the heap, and allocates objects around it. The intent is that GC would clean up new allocations, after marking the immutable tree.

However, with "aggressive" heuristics, there are two clear phases in workload life:
a) Choosing the regions with the immutable tree, and evacuating it. This takes a long time for a large tree;
b) Choosing a few (two, basically) regions to promote the current chunk of new roots -- probably the objects that happened to be temporarily live at safepoint.

The difference in performance between (a) and (b) is drastic. Looking closely at (a), we can notice that otherwise immutable regions are chosen because their used - live = 576 words:

[1.205s][info][gc] Choose region 33 with garbage = 576 and live = 4193728

"aggressive" chooses any region with garbage > 0.

After digging around, I realized those 576 words are the TLAB reserved space! (Our promotion goes through GCLAB, which is technically the same thing). It goes down once I tune -XX:AllocatePrefetchLines and friends. The comment in ThreadLocalAllocBuffer::startup_initialization() mentions this reserved space is needed to avoid faulting for going beyond the heap. (Why this is not in GC storage management code, but handled for each TLAB, is beyond me at this point).

But perhaps a more interesting question is why phase (b) is immune to this. The short answer is because mark-compact _ignores_ all those reserved space things, because it does not deal with GCLABs at all.

So, there are a few issues:

1) GC/TLAB reserved space means every region has "garbage", which has heuristics implications. We can check for "garbage > TLAB::alignment_reserve()" in "aggressive"?

2) More concerning: after mark-compact we can read beyond the heap, if we are alloc-prefetching at the last region, near its end. We probably want to reserve a region at the end of the heap to handle this?

Thanks,
-Aleksey

From rkennke at redhat.com Tue Jan 10 09:36:34 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 10 Jan 2017 10:36:34 +0100
Subject: Reserved space in GC/TLABs
In-Reply-To: <77857ebf-343e-4b75-c47e-2ff5a9ca9040@redhat.com>
References: <77857ebf-343e-4b75-c47e-2ff5a9ca9040@redhat.com>
Message-ID: <1484040994.2566.1.camel@redhat.com>

On Tuesday, 10.01.2017 at 10:12 +0100, Aleksey Shipilev wrote:
> Hi,
>
> So I've been chasing a weird bimodal behavior in our mark tests. The test
> retains the large immutable tree in the heap, and allocates objects around it.
> The intent is that GC would clean up new allocations, after marking the
> immutable tree.
>
> However, with "aggressive" heuristics, there are two clear phases in workload life:
> a) Choosing the regions with the immutable tree, and evacuating it. This takes
> a long time for a large tree;
> b) Choosing a few (two, basically) regions to promote the current chunk of new
> roots -- probably the objects that happened to be temporarily live at safepoint.
>
> The difference in performance between (a) and (b) is drastic. Looking closely
> at (a), we can notice that otherwise immutable regions are chosen because their
> used - live = 576 words:
>
> [1.205s][info][gc] Choose region 33 with garbage = 576 and live = 4193728
>
> "aggressive" chooses any region with garbage > 0.
>
> After digging around, I realized those 576 words are the TLAB reserved space!
> (Our promotion goes through GCLAB, which is technically the same thing). It goes
> down once I tune -XX:AllocatePrefetchLines and friends. The comment in
> ThreadLocalAllocBuffer::startup_initialization() mentions this reserved space is
> needed to avoid faulting for going beyond the heap. (Why this is not in GC
> storage management code, but handled for each TLAB, is beyond me at this point).
>
> But perhaps a more interesting question is why phase (b) is immune to this. The
> short answer is because mark-compact _ignores_ all those reserved space things,
> because it does not deal with GCLABs at all.
>
> So, there are a few issues:
>
> 1) GC/TLAB reserved space means every region has "garbage", which has
> heuristics implications. We can check for "garbage > TLAB::alignment_reserve()"
> in "aggressive"?
>
> 2) More concerning: after mark-compact we can read beyond the heap, if we are
> alloc-prefetching at the last region, near its end. We probably want to reserve
> a region at the end of the heap to handle this?

Is it only an issue with aggressive heuristics? I wouldn't be too concerned about that, aggressive is meant to be used for testing only, and explicitly made to give the GC a hard time (e.g. collect as much as possible, as much of the time as possible, and generally do random-ish things to exercise code paths that are otherwise rarely used)

Roman

From shade at redhat.com Wed Jan 11 09:17:21 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 11 Jan 2017 10:17:21 +0100
Subject: Reserved space in GC/TLABs
In-Reply-To: <1484040994.2566.1.camel@redhat.com>
References: <77857ebf-343e-4b75-c47e-2ff5a9ca9040@redhat.com> <1484040994.2566.1.camel@redhat.com>
Message-ID: <07d8c1a9-298d-4d1a-208c-643b46f23e45@redhat.com>

On 01/10/2017 10:36 AM, Roman Kennke wrote:
>> So, there are a few issues:
>>
>> 1) GC/TLAB reserved space means every region has "garbage", which has
>> heuristics implications. We can check for "garbage > TLAB::alignment_reserve()" in "aggressive"?
>>
>> 2) More concerning: after mark-compact we can read beyond the heap, if we
>> are alloc-prefetching at the last region, near its end. We probably want to
>> reserve a region at the end of the heap to handle this?
>
> Is it only an issue with aggressive heuristics? I wouldn't be too concerned
> about that, aggressive is meant to be used for testing only, and explicitly
> made to give the GC a hard time (e.g. collect as much as possible, as much of
> the time as possible, and generally do random-ish things to exercise code
> paths that are otherwise rarely used)

My major concern is that the "garbage() == 0" condition is not enough to disambiguate the regions with no garbage at all. I can see the heuristics that would not touch the completely live regions, even under drastic conditions (e.g. all other regions are only 99% full).

Either interpretation of "aggressive" is only half-way:

a) "aggressive" as testing strategy: in the workload example above, there are two distinct phases in workload, roughly "before mark-compact" and "after mark-compact". Before mark-compact we always evac "full" regions because garbage is not zero due to GCLAB allocation. After mark-compact we never evac the full regions because mark-compact plugged the garbage holes. If we want "aggressive" to be the testing strategy, then I would say we need to evac the full regions always. This amounts to changing "garbage() > 0" to, say, "garbage() > 0 || live() > 0".

b) "aggressive" as product strategy (I can see how that can be useful to work around late concmark starts at the expense of performance): in this mode, we can save quite a few cycles by not considering full regions for collection set. This amounts to changing "garbage() > 0" to handle the GCLAB waste.
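To make the two options concrete, the "aggressive" in-collection-set predicate would change roughly as below. This is a sketch only: the RegionSketch stand-in and function names are assumptions, garbage()/live() are the accessors quoted above, and the LAB allowance would come from ThreadLocalAllocBuffer::alignment_reserve() in the real code -- none of this is the actual patch.

  #include <cstddef>

  // Minimal stand-in for the region accessors referenced in this thread
  // (the real class is ShenandoahHeapRegion; values are in words).
  struct RegionSketch {
    size_t garbage_words;
    size_t live_words;
    size_t garbage() const { return garbage_words; }
    size_t live()    const { return live_words; }
  };

  // (a) "aggressive" as a pure testing strategy: also evacuate fully-live
  // regions, so they keep being exercised after mark-compact has plugged
  // the garbage holes.
  bool in_cset_testing(const RegionSketch* r) {
    return r->garbage() > 0 || r->live() > 0;
  }

  // (b) "aggressive" as a product-ish strategy: ignore garbage that is only
  // the TLAB/GCLAB alignment reserve, so effectively 100%-live regions stay
  // out of the collection set.
  bool in_cset_product(const RegionSketch* r, size_t lab_reserve_words) {
    return r->garbage() > lab_reserve_words;
  }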
Thanks, -Aleksey From rkennke at redhat.com Wed Jan 11 11:05:40 2017 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Jan 2017 12:05:40 +0100 Subject: Reserved space in GC/TLABs In-Reply-To: <07d8c1a9-298d-4d1a-208c-643b46f23e45@redhat.com> References: <77857ebf-343e-4b75-c47e-2ff5a9ca9040@redhat.com> <1484040994.2566.1.camel@redhat.com> <07d8c1a9-298d-4d1a-208c-643b46f23e45@redhat.com> Message-ID: <1484132740.2566.5.camel@redhat.com> Am Mittwoch, den 11.01.2017, 10:17 +0100 schrieb Aleksey Shipilev: > On 01/10/2017 10:36 AM, Roman Kennke wrote: > > > So, there are few issues: > > > > > > 1) GC/TLAB reserved space means every region has "garbage", which > > > has? > > > heuristics implications. We can check for "garbage >? > > > TLAB::alignment_reserve()" in "aggressive"? > > > > > > 2) More concerning: after mark-compact we can read beyound the > > > heap, if we > > > are alloc-prefetching at the last region, near its end. We > > > probably want to > > > reserve a region at the end of the heap to handle this? > > > > Is it only an issue with aggressive heuristics? I wouldn't be too > > concerned > > about that, aggressive is meant to be used for testing only, and > > explicitely > > made to give the GC a hard time (e.g. collect as much as possible, > > as much of > > the time as possible, and generally do random- ish things to > > exercise code > > paths that are otherwise rarely used) > > My major concern is that "garbage() == 0" condition is not enough to > disambiguate the regions with no garbage at all. Yes ok. I don't see what we can do when a TLAB has just been started but not used yet. At the end of marking, we 'finalize' all active tlabs, i.e. fill them with a filler object and close them. Maybe it's possible to recognize when a TLAB is still empty, and reclaim it instead? > I can see the heuristics that > would not touch the completely live regions, even under drastic > conditions (e.g. > all other regions are only 99% full). > > Either interpretation of "aggressive" is only half-way: > > ?a) "aggressive" as testing strategy: in the workload example above, > there are > two distinct phases in workload, roughly "before mark-compact" and > "after > mark-compact". Before mark-compact we always evac "full" regions > because garbage > is not zero due to GCLAB allocation. After mark-compact we never evac > the full > regions because mark-compact plunged the garbage holes. If we want > "aggressive" > to be the testing strategy, then I would say we need to evac the full > regions > always. This amounts to changing "garbage() > 0" to, say, "garbage() > > 0 || > live() > 0". > > ?b) "aggressive" as product strategy (I can see how that can be > useful to > workaround late concmark starts at the expense of performance): in > this mode, we > can save quite a few cycles by not considering full regions for > collection set. > This amounts to changing "garbage() > 0" to handle the GCLAB waste. Yes, I've been thinking about that too. 'Aggressive' might be a misnomer and should probably be named 'testing' or such? Roman From rkennke at redhat.com Wed Jan 11 11:35:40 2017 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Jan 2017 12:35:40 +0100 Subject: RFR: Fix another deadlock with oom_during_evacuation() Message-ID: <1484134540.2566.8.camel@redhat.com> I encounter another deadlock involving oom_during_evacuation(): 1. One (or more) Java threads get into oom_during_evacuation(), waiting for _evacuation_in_progress to become false. 2. 
Some other thread(s) tries to execute a VM task (non-Shenandoah). The safepoint-begin protocol acquires the Threads_lock. 3. The ShenandoahConcurrentThread tries to turn off evacuation, and this attempts to acquire the Threads_lock too, but can't because of 2. 2 is waiting for 1 to get to a safepoint. 1 cannot get there as long as 3 hasn't turned off evacuation. -> deadlock My solution is to set the _evacuation_in_progress flag to false without the Threads_lock. Threads_lock is only required when turning off the thread-local flag. This allows 1 to proceed and get to a safepoint, and thus resolve the deadlock. http://cr.openjdk.java.net/~rkennke/fixoomdeadlock/webrev.00/ Ok to push? I would also like to push this to jdk8 right away. Roman From shade at redhat.com Wed Jan 11 12:51:31 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Jan 2017 13:51:31 +0100 Subject: RFR: Fix another deadlock with oom_during_evacuation() In-Reply-To: <1484134540.2566.8.camel@redhat.com> References: <1484134540.2566.8.camel@redhat.com> Message-ID: On 01/11/2017 12:35 PM, Roman Kennke wrote: > I encounter another deadlock involving oom_during_evacuation(): > > 1. One (or more) Java threads get into oom_during_evacuation(), waiting > for _evacuation_in_progress to become false. > 2. Some other thread(s) tries to execute a VM task (non-Shenandoah). > The safepoint-begin protocol acquires the Threads_lock. > 3. The ShenandoahConcurrentThread tries to turn off evacuation, and > this attempts to acquire the Threads_lock too, but can't because of 2. > 2 is waiting for 1 to get to a safepoint. 1 cannot get there as long as > 3 hasn't turned off evacuation. -> deadlock > > My solution is to set the _evacuation_in_progress flag to false without > the Threads_lock. Threads_lock is only required when turning off the > thread-local flag. This allows 1 to proceed and get to a safepoint, and > thus resolve the deadlock. > > http://cr.openjdk.java.net/~rkennke/fixoomdeadlock/webrev.00/ > > Ok to push? I would also like to push this to jdk8 right away. Ok. Thanks, -Aleksey From roman at kennke.org Wed Jan 11 14:50:39 2017 From: roman at kennke.org (roman at kennke.org) Date: Wed, 11 Jan 2017 14:50:39 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix another deadlock with oom_during_evacuation() Message-ID: <201701111450.v0BEodvr019585@aojmv0008.oracle.com> Changeset: 199e8a7f598b Author: rkennke Date: 2017-01-11 15:50 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/199e8a7f598b Fix another deadlock with oom_during_evacuation() ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp From roman at kennke.org Wed Jan 11 14:52:39 2017 From: roman at kennke.org (roman at kennke.org) Date: Wed, 11 Jan 2017 14:52:39 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Fix another deadlock with oom_during_evacuation() Message-ID: <201701111452.v0BEqdNO020068@aojmv0008.oracle.com> Changeset: 835e79217215 Author: rkennke Date: 2017-01-11 15:52 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/835e79217215 Fix another deadlock with oom_during_evacuation() ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp ! 
src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.cpp From shade at redhat.com Wed Jan 11 16:19:20 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Jan 2017 17:19:20 +0100 Subject: RFR (XS): Avoid double-touching for array headers during mark Message-ID: <5dd51abe-3a3d-cabb-3f03-10335a083ead@redhat.com> Hi, There is a tiny micro-optimization in our marking code. We don't need to touch the array header before submitting the chunked array processing. G1 already does this trick [1], don't see why we should not do the same: http://cr.openjdk.java.net/~shade/shenandoah/arrays-double-header/webrev.01/ Testing: hotspot_gc_shenandoah If you create the large tree with Node[2] {left, right} arrays in each node, then the marking times are significantly improved: Before: Concurrent Marking Times = 23.55 s (avg = 1121586 us) (num = 21, lvls (10% step, us) = 1015625, 1132812, 1132812, 1132812, 1152344, 1152344, 1171875, 1171875, 1171875, max = 1231954) After: Concurrent Marking Times = 22.10 s (avg = 1004685 us) (num = 22, lvls (10% step, us) = 917969, 996094, 996094, 1015625, 1015625, 1015625, 1035156, 1035156, 1035156, max = 1096716) (This, BTW, tells a bad story about re-touching klasses for objects in heap, which may be another source of improvements) Thanks, -Aleksey [1] http://hg.openjdk.java.net/jdk9/dev/hotspot/file/31f1d26c60df/src/share/vm/gc/g1/g1ParScanThreadState.inline.hpp#l104 From rkennke at redhat.com Wed Jan 11 16:34:51 2017 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Jan 2017 17:34:51 +0100 Subject: RFR (XS): Avoid double-touching for array headers during mark In-Reply-To: <5dd51abe-3a3d-cabb-3f03-10335a083ead@redhat.com> References: <5dd51abe-3a3d-cabb-3f03-10335a083ead@redhat.com> Message-ID: <1484152491.2566.21.camel@redhat.com> Looks good to me. Roman Am Mittwoch, den 11.01.2017, 17:19 +0100 schrieb Aleksey Shipilev: > Hi, > > There is a tiny micro-optimization in our marking code. We don't need > to touch > the array header before submitting the chunked array processing. G1 > already does > this trick [1], don't see why we should not do the same: > ? http://cr.openjdk.java.net/~shade/shenandoah/arrays-double-header/w > ebrev.01/ > > Testing: hotspot_gc_shenandoah > > If you create the large tree with Node[2] {left, right} arrays in > each node, > then the marking times are significantly improved: > > Before: > ?Concurrent Marking Times =???23.55 s (avg =??1121586 us) > ????(num =????21, lvls (10% step, us) = > ??????1015625, 1132812, 1132812, 1132812, 1152344, > ??????1152344, 1171875, 1171875, 1171875, max =??1231954) > > After: > ?Concurrent Marking Times??=??22.10 s (avg =??1004685 us) > ????(num =????22, lvls (10% step, us) = > ???????917969,??996094, 996094,??1015625, 1015625, > ??????1015625, 1035156, 1035156, 1035156, max =??1096716) > > (This, BTW, tells a bad story about re-touching klasses for objects > in heap, > which may be another source of improvements) > > Thanks, > -Aleksey > > [1] > http://hg.openjdk.java.net/jdk9/dev/hotspot/file/31f1d26c60df/src/sha > re/vm/gc/g1/g1ParScanThreadState.inline.hpp#l104 > From ashipile at redhat.com Wed Jan 11 17:30:14 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 11 Jan 2017 17:30:14 +0000 Subject: hg: shenandoah/jdk9/hotspot: Avoid double-touching array headers during mark. 
Message-ID: <201701111730.v0BHUEAo000750@aojmv0008.oracle.com> Changeset: 87e70319cea2 Author: shade Date: 2017-01-11 18:30 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/87e70319cea2 Avoid double-touching array headers during mark. ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp From rkennke at redhat.com Wed Jan 11 17:35:08 2017 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Jan 2017 18:35:08 +0100 Subject: RFR: Print heap addresses in hs_err Message-ID: <1484156108.2566.26.camel@redhat.com> this change makes ShenandoahHeap::print_on() also print the start and end of the heap, plus the information from VirtualSpace. I adapted the latter from VirtualSpace::print_on() instead of calling it because it's #ifndef PRODUCT. It now prints this info: Heap Shenandoah Heap total = 4194304 K, used 212992 K??[0x00000006c0000000, 0x00000007c0000000) Region size = 2048K? Virtual space: ?- committed: 4294967296 ?- reserved:??4294967296 ?- [low, high]:?????[0x00000006c0000000, 0x00000007c0000000] ?- [low_b, high_b]: [0x00000006c0000000, 0x00000007c0000000] http://cr.openjdk.java.net/~rkennke/printheapaddrs/webrev.02/ Ok to push? From shade at redhat.com Wed Jan 11 17:37:02 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Jan 2017 18:37:02 +0100 Subject: RFR: Print heap addresses in hs_err In-Reply-To: <1484156108.2566.26.camel@redhat.com> References: <1484156108.2566.26.camel@redhat.com> Message-ID: On 01/11/2017 06:35 PM, Roman Kennke wrote: > this change makes ShenandoahHeap::print_on() also print the start and > end of the heap, plus the information from VirtualSpace. I adapted the > latter from VirtualSpace::print_on() instead of calling it because it's > #ifndef PRODUCT. It now prints this info: > > > Heap > Shenandoah Heap total = 4194304 K, used 212992 K [0x00000006c0000000, > 0x00000007c0000000) Region size = 2048K > Virtual space: > - committed: 4294967296 > - reserved: 4294967296 > - [low, high]: [0x00000006c0000000, 0x00000007c0000000] > - [low_b, high_b]: [0x00000006c0000000, 0x00000007c0000000] > > > http://cr.openjdk.java.net/~rkennke/printheapaddrs/webrev.02/ > > Ok to push? Ok. -Aleksey From roman at kennke.org Wed Jan 11 17:48:49 2017 From: roman at kennke.org (roman at kennke.org) Date: Wed, 11 Jan 2017 17:48:49 +0000 Subject: hg: shenandoah/jdk9/hotspot: Print heap start/end addresses in hs_err. Message-ID: <201701111748.v0BHmnOA005552@aojmv0008.oracle.com> Changeset: 6ba6eea573b7 Author: rkennke Date: 2017-01-11 18:48 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/6ba6eea573b7 Print heap start/end addresses in hs_err. ! 
src/share/vm/gc/shenandoah/shenandoahHeap.cpp From shade at redhat.com Wed Jan 11 18:40:18 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Jan 2017 19:40:18 +0100 Subject: RFR (S): Replace VirtualSpace-based pretouch with region-based one Message-ID: <944dcfe1-89d4-eafe-4a83-56d08ee9077d@redhat.com> Hi, This leverages our existing region infrastructure, and thus makes the whole thing less susceptible for errors: http://cr.openjdk.java.net/~shade/shenandoah/always-pretouch-per-region/webrev.01/ Testing: hotspot_gc_shenandoah, custom runs with release/fastdebug Thanks, -Aleksey From rkennke at redhat.com Wed Jan 11 18:56:40 2017 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Jan 2017 19:56:40 +0100 Subject: RFR (S): Replace VirtualSpace-based pretouch with region-based one In-Reply-To: <944dcfe1-89d4-eafe-4a83-56d08ee9077d@redhat.com> References: <944dcfe1-89d4-eafe-4a83-56d08ee9077d@redhat.com> Message-ID: <1484161000.2566.29.camel@redhat.com> Looks good to me. Roman Am Mittwoch, den 11.01.2017, 19:40 +0100 schrieb Aleksey Shipilev: > Hi, > > This leverages our existing region infrastructure, and thus makes the > whole > thing less susceptible for errors: > ? http://cr.openjdk.java.net/~shade/shenandoah/always-pretouch-per-re > gion/webrev.01/ > > Testing: hotspot_gc_shenandoah, custom runs with release/fastdebug > > Thanks, > -Aleksey > From ashipile at redhat.com Wed Jan 11 20:25:18 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 11 Jan 2017 20:25:18 +0000 Subject: hg: shenandoah/jdk9/hotspot: Replace VirtualSpace-based pretouch with region-based one. Message-ID: <201701112025.v0BKPIeZ015383@aojmv0008.oracle.com> Changeset: c813c2175488 Author: shade Date: 2017-01-11 21:25 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/c813c2175488 Replace VirtualSpace-based pretouch with region-based one. ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp From shade at redhat.com Wed Jan 11 22:20:15 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Jan 2017 23:20:15 +0100 Subject: RFR (S): ObjArrayFromToTask -> SCMTask Message-ID: Hi, I would like to properly alias ObjArrayFromToTask as SCMTask: http://cr.openjdk.java.net/~shade/shenandoah/mark-scmtask/webrev.01/ This also introduces a few other renames. Plus a micro-optimization for do_task that accepts SCMTask now, not the exploded fields. This matters since do_task is large enough to be denied inlining: passing three arguments instead of one adds up for many tasks to process. This patch is the per-requisite for the larger follow-up patch with ObjArrayFromToTask performance improvements. Testing: hotspot_gc_shenandoah, selected benchmarks Thanks, -Aleksey From rwestrel at redhat.com Thu Jan 12 08:31:00 2017 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Thu, 12 Jan 2017 08:31:00 +0000 Subject: hg: shenandoah/jdk8u/hotspot: 8161147: jvm crashes when -XX:+UseCountedLoopSafepoints is enabled Message-ID: <201701120831.v0C8V0Zv020844@aojmv0008.oracle.com> Changeset: 72a422e2fc2e Author: roland Date: 2016-07-25 14:31 -0700 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/72a422e2fc2e 8161147: jvm crashes when -XX:+UseCountedLoopSafepoints is enabled Summary: don't convert loop with safepoint on the backedge to Counted loop Reviewed-by: kvn ! 
src/share/vm/opto/loopnode.cpp + test/compiler/loopopts/TestCountedLoopSafepointBackedge.java From rkennke at redhat.com Thu Jan 12 09:31:23 2017 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Jan 2017 10:31:23 +0100 Subject: RFR (S): ObjArrayFromToTask -> SCMTask In-Reply-To: References: Message-ID: <1484213483.2566.30.camel@redhat.com> Am Mittwoch, den 11.01.2017, 23:20 +0100 schrieb Aleksey Shipilev: > Hi, > > I would like to properly alias ObjArrayFromToTask as SCMTask: > ?http://cr.openjdk.java.net/~shade/shenandoah/mark-scmtask/webrev.01/ > > This also introduces a few other renames. Plus a micro-optimization > for do_task > that accepts SCMTask now, not the exploded fields. This matters since > do_task is > large enough to be denied inlining: passing three arguments instead > of one adds > up for many tasks to process. This patch is the per-requisite for the > larger > follow-up patch with ObjArrayFromToTask performance improvements. > > Testing: hotspot_gc_shenandoah, selected benchmarks Ok! Roman From ashipile at redhat.com Thu Jan 12 09:33:59 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Thu, 12 Jan 2017 09:33:59 +0000 Subject: hg: shenandoah/jdk9/hotspot: Alias ObjArrayFromToTask -> SCMTask. Message-ID: <201701120933.v0C9XxFP004840@aojmv0008.oracle.com> Changeset: b72ae64a946b Author: shade Date: 2017-01-12 10:33 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/b72ae64a946b Alias ObjArrayFromToTask -> SCMTask. ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahTaskqueue.hpp From ashipile at redhat.com Fri Jan 13 15:52:22 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 13 Jan 2017 15:52:22 +0000 Subject: hg: shenandoah/jdk9/hotspot: Cherry-pick the ObjArrayMarkingStride change from JDK-8057003. Message-ID: <201701131552.v0DFqMrd013576@aojmv0008.oracle.com> Changeset: a8feb1bc2631 Author: shade Date: 2017-01-13 16:52 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a8feb1bc2631 Cherry-pick the ObjArrayMarkingStride change from JDK-8057003. ! src/share/vm/runtime/globals.hpp From shade at redhat.com Fri Jan 13 17:46:51 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 13 Jan 2017 18:46:51 +0100 Subject: RFR (M): Reformat gc+stats table Message-ID: <8db2eda1-3d81-b915-73df-1fab99f24e91@redhat.com> Hi, This cleans up and reformats GC stats table: http://cr.openjdk.java.net/~shade/shenandoah/stats-reformat/webrev.01/ Before/after: http://cr.openjdk.java.net/~shade/shenandoah/stats-reformat/before.txt http://cr.openjdk.java.net/~shade/shenandoah/stats-reformat/after.txt Thanks, -Aleksey From zgu at redhat.com Fri Jan 13 18:13:53 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 13 Jan 2017 13:13:53 -0500 Subject: RFR (M): Reformat gc+stats table In-Reply-To: <8db2eda1-3d81-b915-73df-1fab99f24e91@redhat.com> References: <8db2eda1-3d81-b915-73df-1fab99f24e91@redhat.com> Message-ID: Look good to me. 
Thanks, -Zhengyu On 01/13/2017 12:46 PM, Aleksey Shipilev wrote: > Hi, > > This cleans up and reformats GC stats table: > http://cr.openjdk.java.net/~shade/shenandoah/stats-reformat/webrev.01/ > > Before/after: > http://cr.openjdk.java.net/~shade/shenandoah/stats-reformat/before.txt > http://cr.openjdk.java.net/~shade/shenandoah/stats-reformat/after.txt > > Thanks, > -Aleksey > From ashipile at redhat.com Fri Jan 13 18:30:48 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 13 Jan 2017 18:30:48 +0000 Subject: hg: shenandoah/jdk9/hotspot: Reformat GC stats table. Message-ID: <201701131830.v0DIUmPU020371@aojmv0008.oracle.com> Changeset: 86a69f0208ca Author: shade Date: 2017-01-13 19:30 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/86a69f0208ca Reformat GC stats table. ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp From rwestrel at redhat.com Fri Jan 13 20:21:00 2017 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Fri, 13 Jan 2017 20:21:00 +0000 Subject: hg: shenandoah/jdk9/hotspot: PhaseCFG::replace_uses_with_shenandoah_barrier() causes incorrect execution on aarch64 Message-ID: <201701132021.v0DKL0is014837@aojmv0008.oracle.com> Changeset: bfa281f27d71 Author: roland Date: 2017-01-13 10:23 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/bfa281f27d71 PhaseCFG::replace_uses_with_shenandoah_barrier() causes incorrect execution on aarch64 ! src/share/vm/opto/shenandoahSupport.cpp From rkennke at redhat.com Sat Jan 14 10:36:07 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 14 Jan 2017 11:36:07 +0100 Subject: Fix (over) optimization for cmp-objects Message-ID: <1484390167.2566.49.camel@redhat.com> Back when I implemented the cmp-objects-barrier, I was thinking we could optimize it away if only *one* of the operands was coming from an allocation. The logic being that in this case we know it's in to-space already, and a false negative could not happen. However, working on i-u, I realized this is wrong. What can happen is that while that one operand comes from an allocation, it could be 'behind' a safepoint. This is not a problem in itself, because we ensure at safepoints to evacuate those in-flight operands. However, we cannot know that that the other operand is in to-space too. Let's say A has been allocated before the safepoint, and then written to a field. Then, at a safepoint, we target the region containing A for evacuation, and initially evacuate A to A'. We also update the in-flight variable a to point to A'. Then, after the safepoint, we load the field to b, which still points to A. If we now compare a and b, we *do* need the acmp barriers, because b still points to A. However, our current optimization would remove the acmp barriers. Notice that this problem was much more pronounced with my (not published yet) incremental-update work, because with SATB we consider all new object live, and thus would very likely never collect regions that have been allocation regions before the safepoint (because they are usually near 100% live). With i-u, even allocation regions are very likely to be targeted for evacuation, and thus much more likely to trigger this bug. However, while it's unlikely with SATB, it is not impossible. 
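To make the hazard concrete, here is a self-contained toy model of the scenario above (illustrative code only, not the actual VM or C2 implementation: Obj, fwd and resolve() are made-up stand-ins for oops, the Brooks forwarding pointer and the read barrier):

#include <cassert>
#include <cstddef>

struct Obj {
  Obj* fwd;                      // forwarding pointer; points to self unless evacuated
  Obj() : fwd(this) {}
};

static Obj* resolve(Obj* o) { return o == NULL ? NULL : o->fwd; }

// acmp with read barriers on *both* operands: no false negatives.
static bool acmp(Obj* a, Obj* b) { return resolve(a) == resolve(b); }

int main() {
  Obj A;                         // allocated before the safepoint, then stored to a field
  Obj* field = &A;

  Obj A_prime;                   // at the safepoint, A's region is evacuated: A -> A'
  A.fwd = &A_prime;

  Obj* a = &A_prime;             // the in-flight operand got updated to the to-space copy
  Obj* b = field;                // loaded after the safepoint, still points to the old A

  assert(a != b);                // raw pointer compare: false negative
  assert(acmp(a, b));            // compare with barriers on both operands: still equal
  return 0;
}

The point is that knowing one operand is a fresh to-space allocation says nothing about the other operand, so the comparison as a whole still needs the barrier.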
This fixes it: http://cr.openjdk.java.net/~rkennke/cmpalloc/webrev.00/ Notice that the operands are still subject to optimizing away, but not the whole acmp-barrier. I.e. in the above scenario, the 'a' operand might still get optimized away because it's coming from an allocation. Tested by running SPECjvm all night, especially compiler benchmarks, which tended to create problems. Ok to push? Roman From rwestrel at redhat.com Mon Jan 16 08:56:04 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 16 Jan 2017 09:56:04 +0100 Subject: Fix (over) optimization for cmp-objects In-Reply-To: <1484390167.2566.49.camel@redhat.com> References: <1484390167.2566.49.camel@redhat.com> Message-ID: > http://cr.openjdk.java.net/~rkennke/cmpalloc/webrev.00/ That looks good to me. Roland. From roman at kennke.org Mon Jan 16 09:34:18 2017 From: roman at kennke.org (roman at kennke.org) Date: Mon, 16 Jan 2017 09:34:18 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix (over) optimization for cmp-objects. Message-ID: <201701160934.v0G9YIuI020140@aojmv0008.oracle.com> Changeset: 25b5aa4868df Author: rkennke Date: 2017-01-16 10:33 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/25b5aa4868df Fix (over) optimization for cmp-objects. ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/subnode.cpp From rkennke at redhat.com Mon Jan 16 14:01:14 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Jan 2017 15:01:14 +0100 Subject: RFR: Combine store-val with satb-prebarrier Message-ID: <1484575274.2566.56.camel@redhat.com> This patch combines the storeval-(read-)barrier with the SATB pre-barrier. The usual pattern for object-stores is this: val = read_barrier_storeval(val); if (marking) { push_satb(pre_val); } store(addr, val); however, we only need the storeval-barrier when updating references, which currently only happens during marking. And since we already check for marking, we can just as well combine the two: if (marking) { val = read_barrier_storeval(val); push_satb(pre_val); } store(addr, val); There's a caveat though: storing only (likely) to-space objects into fields has the potential advantage to update references early and make cache misses for read-barriers on such fields less likely. And it possibly reduces work when actually updating references. Some benchmarks in SPECjvm seem to benefit from this change (e.g. serial, xml, derby) some are unaffected (e.g. scimark): https://paste.fedoraproject.org/528249/14845638/ I made this optimization optional and disabled by default. (- XX:+ShenandoahReduceStoreValBarriers turns it on). I suspect the effect of this optimization will be more pronounced with incremental-update (still working on this), because there we can also fold-up the null-check and avoid the loading of the pre-value. I needed to change the interface for GraphKit::pre_barrier() a little: it now returns a possibly modified newval. The Shenandoah implementation of the pre-barrier is based on the G1 version, with the inserted read-barrier in the if (marking) {.. } branch. It also uses the G1 version as fallback, in case no newval is passed (used for Reference.get() where we're not actually storing anything). http://cr.openjdk.java.net/~rkennke/reduce-storeval-barrier/webrev.00/ Ok to push? 
Roman From rwestrel at redhat.com Mon Jan 16 14:38:29 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 16 Jan 2017 15:38:29 +0100 Subject: RFR: Combine store-val with satb-prebarrier In-Reply-To: <1484575274.2566.56.camel@redhat.com> References: <1484575274.2566.56.camel@redhat.com> Message-ID: > http://cr.openjdk.java.net/~rkennke/reduce-storeval-barrier/webrev.00/ That looks ok. Roland. From shade at redhat.com Mon Jan 16 15:46:20 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 16 Jan 2017 16:46:20 +0100 Subject: RFR (M): Optimize object/array marking with bit-stealing task encoding Message-ID: <9cdac78c-bae8-4e58-58b5-c19330995df4@redhat.com> Hi, Our mark stack contains ObjArrayFromToTask instances, which is are the tuples . For arrays, from/to are describing the chunk to process. For objects, from is always -1, indicating no chunk is expected. Since HS taskqueue employs copying constructors to poll/push the tasks from/to the queue, this means we always copy from/to fields, and the queue footprint also always includes from/to fields. This is excessive for a prevailing case of regular oop marking. This is an attempt to improve the case for regular oops, without regressing parallel array processing: http://cr.openjdk.java.net/~shade/shenandoah/mark-objtask-regular/webrev.02/ This patch improves concurrent mark times significantly for regular oops: retain.Tree -p size=50000000: Baseline: Concurrent Marking = 99.17 s (a = 826446 us) (n = 120) (lvls, us = 806641, 826172, 839844, 841797, 887344) Patched: Concurrent Marking = 93.77 s (a = 774975 us) (n = 121) (lvls, us = 753906, 771484, 785156, 787109, 837818) ...and also ever-so-slightly improving for object arrays: retain.RefArray -p size=2000000000: Baseline: Concurrent Marking = 157.29 s (a = 741921 us) (n = 212) (lvls, us = 720703, 740234, 753906, 755859, 822552) Patched: Concurrent Marking = 158.64 s (a = 734448 us) (n = 216) (lvls, us = 720703, 734375, 744141, 746094, 764200) Less targeted workloads also improve concurrent mark times, e.g. Compiler.compiler: Baseline: Concurrent Marking = 3.87 s (a = 168337 us) (n = 23) (lvls, us = 93750, 103516, 154297, 232422, 439476) Patched: Concurrent Marking = 2.53 s (a = 120386 us) (n = 21) (lvls, us = 76953, 93164, 103516, 125000, 400385) Testing: hotspot_gc_shenandoah, jcstress tests-all. Thanks, -Aleksey From roman at kennke.org Mon Jan 16 16:02:53 2017 From: roman at kennke.org (roman at kennke.org) Date: Mon, 16 Jan 2017 16:02:53 +0000 Subject: hg: shenandoah/jdk9/hotspot: Combine store-val with satb-prebarrier. Message-ID: <201701161602.v0GG2s6v016581@aojmv0008.oracle.com> Changeset: be8ee3d7b4a2 Author: rkennke Date: 2017-01-16 17:02 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/be8ee3d7b4a2 Combine store-val with satb-prebarrier. ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/graphKit.hpp ! src/share/vm/opto/library_call.cpp ! src/share/vm/opto/parse2.cpp ! src/share/vm/opto/parse3.cpp From rkennke at redhat.com Mon Jan 16 16:06:55 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Jan 2017 17:06:55 +0100 Subject: RFR (M): Optimize object/array marking with bit-stealing task encoding In-Reply-To: <9cdac78c-bae8-4e58-58b5-c19330995df4@redhat.com> References: <9cdac78c-bae8-4e58-58b5-c19330995df4@redhat.com> Message-ID: <1484582815.2566.58.camel@redhat.com> Excellent! '512TB ought to be enough for anybody' ? ;-) Good to go. 
Needs revisiting in 100 years or so, if we are still stuck with 64bit addressing then ;-) Roman Am Montag, den 16.01.2017, 16:46 +0100 schrieb Aleksey Shipilev: > Hi, > > Our mark stack contains ObjArrayFromToTask instances, which is are > the tuples > . For arrays, from/to are describing the chunk to > process. For > objects, from is always -1, indicating no chunk is expected. > > Since HS taskqueue employs copying constructors to poll/push the > tasks from/to > the queue, this means we always copy from/to fields, and the queue > footprint > also always includes from/to fields. This is excessive for a > prevailing case of > regular oop marking. This is an attempt to improve the case for > regular oops, > without regressing parallel array processing: > ? http://cr.openjdk.java.net/~shade/shenandoah/mark-objtask-regular/w > ebrev.02/ > > This patch improves concurrent mark times significantly for regular > oops: > > retain.Tree -p size=50000000: > > ?Baseline: Concurrent Marking =????99.17 s (a =???826446 us) (n > =???120) > ?????????????(lvls, us > =???806641,???826172,???839844,???841797,???887344) > > ? Patched: Concurrent Marking =????93.77 s (a =???774975 us) (n > =???121) > ?????????????(lvls, us > =???753906,???771484,???785156,???787109,???837818) > > ...and also ever-so-slightly improving for object arrays: > > retain.RefArray -p size=2000000000: > > ?Baseline: Concurrent Marking =???157.29 s (a =???741921 us) (n > =???212) > ?????????????(lvls, us > =???720703,???740234,???753906,???755859,???822552) > > ? Patched: Concurrent Marking =???158.64 s (a =???734448 us) (n > =???216) > ?????????????(lvls, us > =???720703,???734375,???744141,???746094,???764200) > > Less targeted workloads also improve concurrent mark times, e.g. > Compiler.compiler: > > ?Baseline: Concurrent Marking =?????3.87 s (a =???168337 us) (n > =????23) > ?????????????(lvls, us > =????93750,???103516,???154297,???232422,???439476) > > ? Patched: Concurrent Marking =?????2.53 s (a =???120386 us) (n > =????21) > ?????????????(lvls, us > =????76953,????93164,???103516,???125000,???400385) > > Testing: hotspot_gc_shenandoah, jcstress tests-all. > > Thanks, > -Aleksey > From shade at redhat.com Mon Jan 16 16:18:08 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 16 Jan 2017 17:18:08 +0100 Subject: RFR (M): Optimize object/array marking with bit-stealing task encoding In-Reply-To: <1484582815.2566.58.camel@redhat.com> References: <9cdac78c-bae8-4e58-58b5-c19330995df4@redhat.com> <1484582815.2566.58.camel@redhat.com> Message-ID: On 01/16/2017 05:06 PM, Roman Kennke wrote: > Excellent! > > '512TB ought to be enough for anybody' ? ;-) Yes, was shooting for 1 PB first: https://twitter.com/shipilev/status/819279445613809664 ...but then I needed 5 bits for power, and at least 10 bits for chunks. Because I believe we will reach 1024 threads much faster than 512 TB :) With some effort, we could make chunks counted in ObjArrayMarkingStrides, and then enforcing OAMS > 2^16 would enable us to use 4 bits for power. But that complicates the code quite a bit, so I gave up. Thanks, -Aleksey From ashipile at redhat.com Mon Jan 16 16:41:59 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 16 Jan 2017 16:41:59 +0000 Subject: hg: shenandoah/jdk9/hotspot: Optimize object/array marking with bit-stealing task encoding. 
Message-ID: <201701161641.v0GGfxOT026443@aojmv0008.oracle.com> Changeset: 53c734d6690b Author: shade Date: 2017-01-16 17:31 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/53c734d6690b Optimize object/array marking with bit-stealing task encoding. ! src/share/vm/gc/shared/taskqueue.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahTaskqueue.hpp ! src/share/vm/runtime/arguments.cpp From rkennke at redhat.com Mon Jan 16 16:45:53 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Jan 2017 17:45:53 +0100 Subject: For experimentation: Incremental-update Message-ID: <1484585153.2566.60.camel@redhat.com> This patch replaces the current SATB based marking with Incremental- update. In short: instead of marking through the old graph and ensuring consistency at the beginning of marking (i.e. snapshot-at-the- beginning), incremental update is kindof the reverse: it marks through the new graph and ensures consistency at the end. I would like to throw out this patch for people to experiment with. Currently it does not match SATB in performance, but I may be overlooking something, or maybe heuristics need to be changed or something like that. The idea of the benefit of i-u is that we'd get less float and would collect more efficiently by not treating all new objects as implicitely live (which can easily amount to 1/3rd of the heap during one marking). I-u instead marks through new objects too, and should give us a more precise idea of liveness of the heap. Please give it a shot. Partial collection might require i-u. http://cr.openjdk.java.net/~rkennke/incremental-update/webrev.00/ Roman From ashipile at redhat.com Mon Jan 16 18:31:31 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 16 Jan 2017 18:31:31 +0000 Subject: hg: shenandoah/jdk9/hotspot: GC stats table should report minimum and median. Message-ID: <201701161831.v0GIVVig025779@aojmv0008.oracle.com> Changeset: f2bc7a51c9dd Author: shade Date: 2017-01-16 19:31 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/f2bc7a51c9dd GC stats table should report minimum and median. ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/utilities/numberSeq.cpp From rwestrel at redhat.com Tue Jan 17 12:48:35 2017 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 17 Jan 2017 12:48:35 +0000 Subject: hg: shenandoah/jdk9/hotspot: pre barrier for scalarized objects should be removed Message-ID: <201701171248.v0HCmZjn016402@aojmv0008.oracle.com> Changeset: b3026a0cd95e Author: roland Date: 2017-01-17 13:24 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/b3026a0cd95e pre barrier for scalarized objects should be removed ! src/share/vm/opto/callnode.cpp ! src/share/vm/opto/callnode.hpp ! src/share/vm/opto/cfgnode.hpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/ifnode.cpp ! src/share/vm/opto/macro.cpp ! src/share/vm/opto/macro.hpp ! src/share/vm/opto/memnode.cpp ! src/share/vm/opto/node.cpp ! src/share/vm/opto/node.hpp ! src/share/vm/opto/phaseX.cpp ! 
src/share/vm/opto/superword.cpp From rwestrel at redhat.com Mon Jan 23 09:57:53 2017 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Mon, 23 Jan 2017 09:57:53 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fixes to write barrier expansion Message-ID: <201701230957.v0N9vsPm018430@aojmv0008.oracle.com> Changeset: d4e949f715c1 Author: roland Date: 2017-01-20 22:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/d4e949f715c1 Fixes to write barrier expansion ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/shenandoahSupport.cpp From shade at redhat.com Mon Jan 23 23:15:30 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 24 Jan 2017 00:15:30 +0100 Subject: RFR (M): Avoid touching metadata if class unloading is not requested Message-ID: <23dac6fc-02ef-9d60-aa5b-b2a6fadf3172@redhat.com> Hi, This wild idea is due to Roman. In many GC cycles, we don't unload the classes, and therefore, we don't need to see which classes are alive. With that, we don't need to touch Klasses and CLDs in most cycles. Here's a patch that improves on this: http://cr.openjdk.java.net/~shade/shenandoah/concmark-no-metadata/webrev.01/ It also touches up a few places in concurrent mark code to get better inlining. Marking a large HashMap improves quite a bit: before: Concurrent Marking = 134.99 s (a = 1249911 us) (n = 108) (lvls, us = 869141, 1210938, 1250000, 1269531, 1326828) after: Concurrent Marking = 124.58 s (a = 1132579 us) (n = 110) (lvls, us = 787500, 1113281, 1152344, 1171875, 1215482) Testing: hotspot_gc_shenandoah Thanks, -Aleksey From rkennke at redhat.com Tue Jan 24 10:29:22 2017 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 24 Jan 2017 11:29:22 +0100 Subject: RFR (M): Avoid touching metadata if class unloading is not requested In-Reply-To: <23dac6fc-02ef-9d60-aa5b-b2a6fadf3172@redhat.com> References: <23dac6fc-02ef-9d60-aa5b-b2a6fadf3172@redhat.com> Message-ID: <1485253762.2566.72.camel@redhat.com> Cool! Yes! Roman Am Dienstag, den 24.01.2017, 00:15 +0100 schrieb Aleksey Shipilev: > Hi, > > This wild idea is due to Roman. In many GC cycles, we don't unload > the classes, > and therefore, we don't need to see which classes are alive. With > that, we don't > need to touch Klasses and CLDs in most cycles. Here's a patch that > improves on this: > ?http://cr.openjdk.java.net/~shade/shenandoah/concmark-no-metadata/we > brev.01/ > > It also touches up a few places in concurrent mark code to get better > inlining. > > Marking a large HashMap improves quite a bit: > > before: > ? Concurrent Marking = 134.99 s (a =??1249911 us) (n =???108) > ????(lvls, us =???869141,??1210938,??1250000,??1269531,??1326828) > > after: > ? Concurrent Marking = 124.58 s (a =??1132579 us) (n =???110) > ????(lvls, us =???787500,??1113281,??1152344,??1171875,??1215482) > > > Testing: hotspot_gc_shenandoah > > Thanks, > -Aleksey > From ashipile at redhat.com Tue Jan 24 11:13:32 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 24 Jan 2017 11:13:32 +0000 Subject: hg: shenandoah/jdk9/hotspot: Avoid touching metadata if class unloading is not requested. Message-ID: <201701241113.v0OBDWti020727@aojmv0008.oracle.com> Changeset: 221b8cade588 Author: shade Date: 2017-01-24 10:58 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/221b8cade588 Avoid touching metadata if class unloading is not requested. ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! 
src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc/shenandoah/shenandoah_specialized_oop_closures.hpp From shade at redhat.com Tue Jan 24 15:01:12 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 24 Jan 2017 16:01:12 +0100 Subject: RFR (XS): Bump the inlining limits for concurrent mark Message-ID: Hi, In the last few days, we have struggled with GCC inlining in concurrent mark code. We are very close to the default GCC inlining budget, and every recent patch had to rearrange code in some way to deal with that. The issue is compounded by lots of templated closures we have to inline to get decent performance. This repeated balancing act is making already hard performance work even harder. For example, I have wasted almost entire day yesterday trying to find the method split that made GCC happy, and that was not entirely enough. With that, I would like us to claim surrender, bow before the compiler, and burn it to ashes bump the inlining limits for one file: http://cr.openjdk.java.net/~shade/shenandoah/concmark-bump-inline/webrev.01/ This is not unprecedented in Hotspot codebase, the same file has the similar line for psPromotionManager.cpp. The effect is clearly visible in profiled disassembly, but here are sample performance improvements for model tests: *) 20M HashMap marking: before: 133.05 s (a = 1243493 us) (n = 107) (lvls, us = 568359, 1210938, 1230469, 1269531, 1390970) after: 117.95 s (a = 1082074 us) (n = 109) (lvls, us = 921875, 1054688, 1074219, 1093750, 1155972) *) 20M Tree marking: before: 82.91 s (a = 637769 us) (n = 130) (lvls, us = 587891, 615234, 626953, 632812, 726433) after: 59.86 s (a = 436915 us) (n = 137) (lvls, us = 296875, 425781, 431641, 437500, 482738) *) 20M Array marking: before: 22.06 s (a = 176497 us) (n = 125) (lvls, us = 169922, 171875, 173828, 177734, 188691) after: 16.47 s (a = 129720 us) (n = 127) (lvls, us = 123047, 125000, 126953, 132812, 149198) Static footprint increased a bit, for a 130K: before: 20.634.880 libjvm.so after: 20.761.304 libjvm.so Testing: hotspot_gc_shenandoah, targeted benchmarks Thanks, -Aleksey From rkennke at redhat.com Tue Jan 24 15:43:20 2017 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 24 Jan 2017 10:43:20 -0500 (EST) Subject: RFR (XS): Bump the inlining limits for concurrent mark Message-ID: <985123830.17328535.1485272600882.JavaMail.zimbra@zmail25.collab.prod.int.phx2.redhat.com> Yup. Am 24.01.2017 4:02 nachm. schrieb Aleksey Shipilev : > > Hi, > > In the last few days, we have struggled with GCC inlining in concurrent mark > code. We are very close to the default GCC inlining budget, and every recent > patch had to rearrange code in some way to deal with that. The issue is > compounded by lots of templated closures we have to inline to get decent > performance. > > This repeated balancing act is making already hard performance work even harder. > For example, I have wasted almost entire day yesterday trying to find the method > split that made GCC happy, and that was not entirely enough. > > With that, I would like us to claim surrender, bow before the compiler, and > burn it to ashes bump the inlining limits for one file: > ? http://cr.openjdk.java.net/~shade/shenandoah/concmark-bump-inline/webrev.01/ > > This is not unprecedented in Hotspot codebase, the same file has the similar > line for psPromotionManager.cpp. > > The effect is clearly visible in profiled disassembly, but here are sample > performance improvements for model tests: > > *) 20M HashMap marking: > > ? 
before: 133.05 s (a = 1243493 us) (n = 107) > (lvls, us = 568359, 1210938, 1230469, 1269531, 1390970) > > after: 117.95 s (a = 1082074 us) (n = 109) > (lvls, us = 921875, 1054688, 1074219, 1093750, 1155972) > > *) 20M Tree marking: > > before: 82.91 s (a = 637769 us) (n = 130) > (lvls, us = 587891, 615234, 626953, 632812, 726433) > > after: 59.86 s (a = 436915 us) (n = 137) > (lvls, us = 296875, 425781, 431641, 437500, 482738) > > *) 20M Array marking: > > before: 22.06 s (a = 176497 us) (n = 125) > (lvls, us = 169922, 171875, 173828, 177734, 188691) > > after: 16.47 s (a = 129720 us) (n = 127) > (lvls, us = 123047, 125000, 126953, 132812, 149198) > > Static footprint increased a bit, for a 130K: > before: 20.634.880 libjvm.so > after: 20.761.304 libjvm.so > > Testing: hotspot_gc_shenandoah, targeted benchmarks > > Thanks, > -Aleksey > > From roman at kennke.org Tue Jan 24 16:01:43 2017 From: roman at kennke.org (roman at kennke.org) Date: Tue, 24 Jan 2017 16:01:43 +0000 Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets Message-ID: <201701241601.v0OG1hIr004356@aojmv0008.oracle.com> Changeset: 9021c546b308 Author: rkennke Date: 2017-01-24 16:01 +0000 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9021c546b308 8170100: AArch64: Crash in C1-compiled code accessing References ! src/cpu/aarch64/vm/templateInterpreterGenerator_aarch64.cpp Changeset: f1f18b912d4a Author: rkennke Date: 2017-01-24 16:01 +0000 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/f1f18b912d4a Merge From ashipile at redhat.com Tue Jan 24 16:03:38 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 24 Jan 2017 16:03:38 +0000 Subject: hg: shenandoah/jdk9/hotspot: Bump the inlining limits for concurrent mark. Message-ID: <201701241603.v0OG3c5u004755@aojmv0008.oracle.com> Changeset: 9fef7865556f Author: shade Date: 2017-01-24 17:03 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9fef7865556f Bump the inlining limits for concurrent mark. ! make/lib/JvmOverrideFiles.gmk From shade at redhat.com Tue Jan 24 21:33:01 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 24 Jan 2017 22:33:01 +0100 Subject: RFR (S): Buffered TQ buffer breaks LIFO Message-ID: <4fbee8c1-31af-0725-ccea-34a8c1d470dd@redhat.com> Hi, When doing the single-entry buffer in TQ, I missed an obvious thing: if we bypass the buffer on queue push, then the queue stops being LIFO. In other words, the current code does:

template <class E, MEMFLAGS F, unsigned int N>
inline bool BufferedOverflowTaskQueue<E, F, N>::push(E t)
{
  if (_buf_empty) {
    _elem = t;
    _buf_empty = false;
    return true;
  } else {
    return taskqueue_t::push(t); // oops, jumping over the buf
  }
}

...which means that if we push and pop (1, 2, 3), we will get (1, 3, 2), not (3, 2, 1) as expected. Among other things, this has implications for work stealing. In case of divide-and-conquer array handling, we keep the largest task in the buffer, while we should have pushed it out into the tail.
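For illustration, here is a minimal sketch of a push/pop pair that keeps LIFO order by always holding the newest element in the one-slot buffer and spilling the older one into the underlying queue. This is simplified code (std::vector stands in for the real overflow task queue, and the class name is made up); it shows the intended ordering, not necessarily the exact shape of the actual patch:

#include <cassert>
#include <vector>

template <class E>
class BufferedLifoQueue {
public:
  BufferedLifoQueue() : _elem(), _buf_empty(true) {}

  bool push(E t) {
    if (!_buf_empty) {
      _inner.push_back(_elem);   // spill the older element into the queue tail
    }
    _elem = t;                   // newest element always lives in the buffer
    _buf_empty = false;
    return true;
  }

  bool pop(E& t) {
    if (!_buf_empty) { t = _elem; _buf_empty = true; return true; }
    if (_inner.empty()) return false;
    t = _inner.back(); _inner.pop_back();
    return true;
  }

private:
  std::vector<E> _inner;
  E _elem;
  bool _buf_empty;
};

int main() {
  BufferedLifoQueue<int> q;
  q.push(1); q.push(2); q.push(3);
  int v;
  q.pop(v); assert(v == 3);      // LIFO restored: 3, 2, 1
  q.pop(v); assert(v == 2);
  q.pop(v); assert(v == 1);
  return 0;
}

Whatever the actual fix looks like, the invariant to preserve is that pop always returns the most recently pushed task, so older (larger) tasks end up in the underlying queue where other workers can find them.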
Fix: http://cr.openjdk.java.net/~shade/shenandoah/taskqueue-fix-lifo/webrev.01/ This, of course, makes array marking much faster: 100M array: before: 121.96 s (a = 916964 us) (n = 133) (lvls, us = 707031, 853516, 923828, 982422, 1154833) after: 79.54 s (a = 580586 us) (n = 137) (lvls, us = 544922, 566406, 574219, 582031, 648559) ...while keeping Tree and HashMap traversals roughly the same: 20M Tree: before: 59.56 s (a = 437911 us) (n = 136) (lvls, us = 408203, 425781, 433594, 435547, 488014) after: 62.18 s (a = 457238 us) (n = 136) (lvls, us = 425781, 445312, 451172, 460938, 509219) 20M HashMap: before: 139.84 s (a = 1282913 us) (n = 109) (lvls, us = 1152344, 1250000, 1289062, 1308594, 1363561) after: 137.04 s (a = 1268875 us) (n = 108) (lvls, us = 1171875, 1230469, 1269531, 1289062, 1327203) Testing: hotspot_gc_shenandoah, targeted benchmarks Thanks, -Aleksey From rkennke at redhat.com Wed Jan 25 10:07:20 2017 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 25 Jan 2017 05:07:20 -0500 (EST) Subject: RFR (S): Buffered TQ buffer breaks LIFO Message-ID: <778906296.17754625.1485338840789.JavaMail.zimbra@zmail25.collab.prod.int.phx2.redhat.com> OK. Roman Am 24.01.2017 10:34 nachm. schrieb Aleksey Shipilev : > > Hi, > > When doing the single-entry buffer in TQ, I missed an obvious thing: if we > bypass buffer on queue push, then the queue stops being LIFO. In other words, > current code does: > > template > inline bool BufferedOverflowTaskQueue::push(E t) > { > ? if (_buf_empty) { > ??? _elem = t; > ??? _buf_empty = false; > ??? return true; > ? } else { > ??? return taskqueue_t::push(t); // oops, jumping over the buf > ? } > } > > ...which means that if we push and pop (1, 2, 3); we will get (1, 3, 2), not (3, > 2, 1), as expected. Among other things, this has implications for work stealing. > In case of divide-and-conquer array handling, we keep the largest task in the > buffer, while we should have pushed it out into the tail. > > Fix: > http://cr.openjdk.java.net/~shade/shenandoah/taskqueue-fix-lifo/webrev.01/ > > This, of course, makes array marking much faster: > > 100M array: > ? before: 121.96 s (a =?? 916964 us) (n =?? 133) > ??????????? (lvls, us =?? 707031,?? 853516,?? 923828,?? 982422,? 1154833) > > ?? after:? 79.54 s (a =?? 580586 us) (n =?? 137) > ??????????? (lvls, us =?? 544922,?? 566406,?? 574219,?? 582031,?? 648559) > > > ...while keeping Tree and HashMap traversals roughly the same: > > 20M Tree: > ? before:? 59.56 s (a =?? 437911 us) (n =?? 136) > ??????????? (lvls, us =?? 408203,?? 425781,?? 433594,?? 435547,?? 488014) > > ?? after:? 62.18 s (a =?? 457238 us) (n =?? 136) > ??????????? (lvls, us =?? 425781,?? 445312,?? 451172,?? 460938,?? 509219) > > 20M HashMap: > ? before: 139.84 s (a =? 1282913 us) (n =?? 109) > ??????????? (lvls, us =? 1152344,? 1250000,? 1289062,? 1308594,? 1363561) > > ?? after: 137.04 s (a =? 1268875 us) (n =?? 108) > ??????????? (lvls, us =? 1171875,? 1230469,? 1269531,? 1289062,? 1327203) > > Testing: hotspot_gc_shenandoah, targeted benchmarks > > Thanks, > -Aleksey > From ashipile at redhat.com Wed Jan 25 10:08:28 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 25 Jan 2017 10:08:28 +0000 Subject: hg: shenandoah/jdk9/hotspot: Buffered TQ buffer breaks LIFO. Message-ID: <201701251008.v0PA8Slm001863@aojmv0008.oracle.com> Changeset: 7d0d703891a0 Author: shade Date: 2017-01-25 11:06 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/7d0d703891a0 Buffered TQ buffer breaks LIFO. ! 
src/share/vm/gc/shared/taskqueue.inline.hpp From shade at redhat.com Thu Jan 26 17:39:34 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Jan 2017 18:39:34 +0100 Subject: RFR (S): Sorting the regions for collection set takes a while during pause Message-ID: Hi, Profiled the pause-intensive application for fun, and spotted an easy optimization target. In final mark pause, we select collection set, and for that, we sort the regions by garbage. This incurs (N log N) calls to comparator, which calls SHHR->garbage() and handles nulls, etc. Doing a simple trick: http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-better/webrev.01/ ...improves timings: before: Final Mark Pauses (gross) = 7.05 s (a = 1059 us) (n = 6659) (lvls, us = 717, 805, 830, 912, 9376) Final Mark Pauses (net) = 3.03 s (a = 454 us) (n = 6659) (lvls, us = 102, 211, 221, 270, 8728) Prepare Evacuation = 2.04 s (a = 306 us) (n = 6659) (lvls, us = 273, 293, 297, 301, 1490) after: Final Mark Pauses (gross) = 6.12 s (a = 851 us) (n = 7195) (lvls, us = 547, 605, 629, 689, 5335) Final Mark Pauses (net) = 3.15 s (a = 438 us) (n = 7195) (lvls, us = 98, 203, 211, 260, 4877) Prepare Evacuation = 0.75 s (a = 105 us) (n = 7195) (lvls, us = 82, 96, 105, 109, 187) 0.2 ms off the already low pause time. Thanks, -Aleksey From rkennke at redhat.com Thu Jan 26 17:43:01 2017 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Jan 2017 18:43:01 +0100 Subject: RFR (S): Sorting the regions for collection set takes a while during pause In-Reply-To: References: Message-ID: <1485452581.2566.82.camel@redhat.com> Or maybe not sort the list at all? Downside: we need to scan all regions and decide on their garbage, instead of stopping at the first region that exceeds the garbage threshold. Upside: no sorting necessary. May be worth a try. Roman Am Donnerstag, den 26.01.2017, 18:39 +0100 schrieb Aleksey Shipilev: > Hi, > > Profiled the pause-intensive application for fun, and spotted an easy > optimization target. In final mark pause, we select collection set, > and for > that, we sort the regions by garbage. This incurs (N log N) calls to > comparator, > which calls SHHR->garbage() and handles nulls, etc. > > Doing a simple trick: > ?http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-better/webre > v.01/ > > ...improves timings: > > before: > > ? Final Mark Pauses (gross) = 7.05 s (a = 1059 us) (n =??6659) > ????(lvls, us = 717, 805, 830, 912, 9376) > > ? Final Mark Pauses (net)???= 3.03 s (a =??454 us) (n =??6659) > ????(lvls, us = 102, 211, 221, 270, 8728) > > ? Prepare Evacuation = 2.04 s (a = 306 us) (n =??6659) > ????(lvls, us = 273, 293, 297, 301, 1490) > > after: > > ? Final Mark Pauses (gross) = 6.12 s (a = 851 us) (n =??7195) > ????(lvls, us = 547, 605, 629, 689, 5335) > > ? Final Mark Pauses (net)???= 3.15 s (a = 438 us) (n =??7195) > ????(lvls, us = 98, 203, 211, 260, 4877) > > ? Prepare Evacuation = 0.75 s (a = 105 us) (n =??7195) > ????(lvls, us = 82, 96, 105, 109, 187) > > 0.2 ms off the already low pause time. > > Thanks, > -Aleksey > > From shade at redhat.com Thu Jan 26 17:48:08 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Jan 2017 18:48:08 +0100 Subject: RFR (S): Sorting the regions for collection set takes a while during pause In-Reply-To: <1485452581.2566.82.camel@redhat.com> References: <1485452581.2566.82.camel@redhat.com> Message-ID: Maybe! Let's see... -Aleksey On 01/26/2017 06:43 PM, Roman Kennke wrote: > Or maybe not sort the list at all? 
Downside: we need to scan all > regions and decide on their garbage, instead of stopping at the first > region that exceeds the garbage threshold. Upside: no sorting > necessary. May be worth a try. > > Roman > > > Am Donnerstag, den 26.01.2017, 18:39 +0100 schrieb Aleksey Shipilev: >> Hi, >> >> Profiled the pause-intensive application for fun, and spotted an easy >> optimization target. In final mark pause, we select collection set, >> and for >> that, we sort the regions by garbage. This incurs (N log N) calls to >> comparator, >> which calls SHHR->garbage() and handles nulls, etc. >> >> Doing a simple trick: >> http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-better/webre >> v.01/ >> >> ...improves timings: >> >> before: >> >> Final Mark Pauses (gross) = 7.05 s (a = 1059 us) (n = 6659) >> (lvls, us = 717, 805, 830, 912, 9376) >> >> Final Mark Pauses (net) = 3.03 s (a = 454 us) (n = 6659) >> (lvls, us = 102, 211, 221, 270, 8728) >> >> Prepare Evacuation = 2.04 s (a = 306 us) (n = 6659) >> (lvls, us = 273, 293, 297, 301, 1490) >> >> after: >> >> Final Mark Pauses (gross) = 6.12 s (a = 851 us) (n = 7195) >> (lvls, us = 547, 605, 629, 689, 5335) >> >> Final Mark Pauses (net) = 3.15 s (a = 438 us) (n = 7195) >> (lvls, us = 98, 203, 211, 260, 4877) >> >> Prepare Evacuation = 0.75 s (a = 105 us) (n = 7195) >> (lvls, us = 82, 96, 105, 109, 187) >> >> 0.2 ms off the already low pause time. >> >> Thanks, >> -Aleksey >> >> From rkennke at redhat.com Thu Jan 26 17:53:17 2017 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Jan 2017 18:53:17 +0100 Subject: RFR (S): Sorting the regions for collection set takes a while during pause In-Reply-To: References: <1485452581.2566.82.camel@redhat.com> Message-ID: <1485453197.2566.84.camel@redhat.com> Duh. We don't even seem to stop at the first region that exceeds the threshold. This whole sorting seems not necessary, and lots of wasted space too (for the _sorted_regions list). Roman Am Donnerstag, den 26.01.2017, 18:48 +0100 schrieb Aleksey Shipilev: > Maybe! Let's see... > > -Aleksey > > On 01/26/2017 06:43 PM, Roman Kennke wrote: > > Or maybe not sort the list at all? Downside: we need to scan all > > regions and decide on their garbage, instead of stopping at the > > first > > region that exceeds the garbage threshold. Upside: no sorting > > necessary. May be worth a try. > > > > Roman > > > > > > Am Donnerstag, den 26.01.2017, 18:39 +0100 schrieb Aleksey > > Shipilev: > > > Hi, > > > > > > Profiled the pause-intensive application for fun, and spotted an > > > easy > > > optimization target. In final mark pause, we select collection > > > set, > > > and for > > > that, we sort the regions by garbage. This incurs (N log N) calls > > > to > > > comparator, > > > which calls SHHR->garbage() and handles nulls, etc. > > > > > > Doing a simple trick: > > > ?http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-better/w > > > ebre > > > v.01/ > > > > > > ...improves timings: > > > > > > before: > > > > > > ? Final Mark Pauses (gross) = 7.05 s (a = 1059 us) (n =??6659) > > > ????(lvls, us = 717, 805, 830, 912, 9376) > > > > > > ? Final Mark Pauses (net)???= 3.03 s (a =??454 us) (n =??6659) > > > ????(lvls, us = 102, 211, 221, 270, 8728) > > > > > > ? Prepare Evacuation = 2.04 s (a = 306 us) (n =??6659) > > > ????(lvls, us = 273, 293, 297, 301, 1490) > > > > > > after: > > > > > > ? Final Mark Pauses (gross) = 6.12 s (a = 851 us) (n =??7195) > > > ????(lvls, us = 547, 605, 629, 689, 5335) > > > > > > ? 
Final Mark Pauses (net)???= 3.15 s (a = 438 us) (n =??7195) > > > ????(lvls, us = 98, 203, 211, 260, 4877) > > > > > > ? Prepare Evacuation = 0.75 s (a = 105 us) (n =??7195) > > > ????(lvls, us = 82, 96, 105, 109, 187) > > > > > > 0.2 ms off the already low pause time. > > > > > > Thanks, > > > -Aleksey > > > > > > > > From shade at redhat.com Thu Jan 26 18:45:52 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Jan 2017 19:45:52 +0100 Subject: RFR (S): Sorting the regions for collection set takes a while during pause In-Reply-To: <1485453197.2566.84.camel@redhat.com> References: <1485452581.2566.82.camel@redhat.com> <1485453197.2566.84.camel@redhat.com> Message-ID: <9b3b077f-bd0c-406b-6cfb-5ec0427a7283@redhat.com> Okay, let's do this: 1) Do not even try to sort when heuristics is fine with unsorted (some need sorted anyway, and probably some in the future would). 2) Trim down the candidate list first, and then sort a hopefully smaller list. See: http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-better/webrev.02/ -Aleksey On 01/26/2017 06:53 PM, Roman Kennke wrote: > Duh. We don't even seem to stop at the first region that exceeds the > threshold. This whole sorting seems not necessary, and lots of wasted > space too (for the _sorted_regions list). > > Roman > > Am Donnerstag, den 26.01.2017, 18:48 +0100 schrieb Aleksey Shipilev: >> Maybe! Let's see... >> >> -Aleksey >> >> On 01/26/2017 06:43 PM, Roman Kennke wrote: >>> Or maybe not sort the list at all? Downside: we need to scan all >>> regions and decide on their garbage, instead of stopping at the >>> first >>> region that exceeds the garbage threshold. Upside: no sorting >>> necessary. May be worth a try. >>> >>> Roman >>> >>> >>> Am Donnerstag, den 26.01.2017, 18:39 +0100 schrieb Aleksey >>> Shipilev: >>>> Hi, >>>> >>>> Profiled the pause-intensive application for fun, and spotted an >>>> easy >>>> optimization target. In final mark pause, we select collection >>>> set, >>>> and for >>>> that, we sort the regions by garbage. This incurs (N log N) calls >>>> to >>>> comparator, >>>> which calls SHHR->garbage() and handles nulls, etc. >>>> >>>> Doing a simple trick: >>>> http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-better/w >>>> ebre >>>> v.01/ >>>> >>>> ...improves timings: >>>> >>>> before: >>>> >>>> Final Mark Pauses (gross) = 7.05 s (a = 1059 us) (n = 6659) >>>> (lvls, us = 717, 805, 830, 912, 9376) >>>> >>>> Final Mark Pauses (net) = 3.03 s (a = 454 us) (n = 6659) >>>> (lvls, us = 102, 211, 221, 270, 8728) >>>> >>>> Prepare Evacuation = 2.04 s (a = 306 us) (n = 6659) >>>> (lvls, us = 273, 293, 297, 301, 1490) >>>> >>>> after: >>>> >>>> Final Mark Pauses (gross) = 6.12 s (a = 851 us) (n = 7195) >>>> (lvls, us = 547, 605, 629, 689, 5335) >>>> >>>> Final Mark Pauses (net) = 3.15 s (a = 438 us) (n = 7195) >>>> (lvls, us = 98, 203, 211, 260, 4877) >>>> >>>> Prepare Evacuation = 0.75 s (a = 105 us) (n = 7195) >>>> (lvls, us = 82, 96, 105, 109, 187) >>>> >>>> 0.2 ms off the already low pause time. >>>> >>>> Thanks, >>>> -Aleksey >>>> >>>> >> >> From rkennke at redhat.com Thu Jan 26 18:50:33 2017 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Jan 2017 19:50:33 +0100 Subject: RFR (S): Sorting the regions for collection set takes a while during pause In-Reply-To: <9b3b077f-bd0c-406b-6cfb-5ec0427a7283@redhat.com> References: <1485452581.2566.82.camel@redhat.com> <1485453197.2566.84.camel@redhat.com> <9b3b077f-bd0c-406b-6cfb-5ec0427a7283@redhat.com> Message-ID: <1485456633.2566.85.camel@redhat.com> Ok. 
What's the profile saying? Roman Am Donnerstag, den 26.01.2017, 19:45 +0100 schrieb Aleksey Shipilev: > Okay, let's do this: > > 1) Do not even try to sort when heuristics is fine with unsorted > (some need > sorted anyway, and probably some in the future would). > > 2) Trim down the candidate list first, and then sort a hopefully > smaller list. > > See: > ?http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-better/webre > v.02/ > > -Aleksey > > On 01/26/2017 06:53 PM, Roman Kennke wrote: > > Duh. We don't even seem to stop at the first region that exceeds > > the > > threshold. This whole sorting seems not necessary, and lots of > > wasted > > space too (for the _sorted_regions list). > > > > Roman > > > > Am Donnerstag, den 26.01.2017, 18:48 +0100 schrieb Aleksey > > Shipilev: > > > Maybe! Let's see... > > > > > > -Aleksey > > > > > > On 01/26/2017 06:43 PM, Roman Kennke wrote: > > > > Or maybe not sort the list at all? Downside: we need to scan > > > > all > > > > regions and decide on their garbage, instead of stopping at the > > > > first > > > > region that exceeds the garbage threshold. Upside: no sorting > > > > necessary. May be worth a try. > > > > > > > > Roman > > > > > > > > > > > > Am Donnerstag, den 26.01.2017, 18:39 +0100 schrieb Aleksey > > > > Shipilev: > > > > > Hi, > > > > > > > > > > Profiled the pause-intensive application for fun, and spotted > > > > > an > > > > > easy > > > > > optimization target. In final mark pause, we select > > > > > collection > > > > > set, > > > > > and for > > > > > that, we sort the regions by garbage. This incurs (N log N) > > > > > calls > > > > > to > > > > > comparator, > > > > > which calls SHHR->garbage() and handles nulls, etc. > > > > > > > > > > Doing a simple trick: > > > > > ?http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-bett > > > > > er/w > > > > > ebre > > > > > v.01/ > > > > > > > > > > ...improves timings: > > > > > > > > > > before: > > > > > > > > > > ? Final Mark Pauses (gross) = 7.05 s (a = 1059 us) (n > > > > > =??6659) > > > > > ????(lvls, us = 717, 805, 830, 912, 9376) > > > > > > > > > > ? Final Mark Pauses (net)???= 3.03 s (a =??454 us) (n > > > > > =??6659) > > > > > ????(lvls, us = 102, 211, 221, 270, 8728) > > > > > > > > > > ? Prepare Evacuation = 2.04 s (a = 306 us) (n =??6659) > > > > > ????(lvls, us = 273, 293, 297, 301, 1490) > > > > > > > > > > after: > > > > > > > > > > ? Final Mark Pauses (gross) = 6.12 s (a = 851 us) (n =??7195) > > > > > ????(lvls, us = 547, 605, 629, 689, 5335) > > > > > > > > > > ? Final Mark Pauses (net)???= 3.15 s (a = 438 us) (n =??7195) > > > > > ????(lvls, us = 98, 203, 211, 260, 4877) > > > > > > > > > > ? Prepare Evacuation = 0.75 s (a = 105 us) (n =??7195) > > > > > ????(lvls, us = 82, 96, 105, 109, 187) > > > > > > > > > > 0.2 ms off the already low pause time. > > > > > > > > > > Thanks, > > > > > -Aleksey > > > > > > > > > > > > > > > > > > From shade at redhat.com Thu Jan 26 18:56:26 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Jan 2017 19:56:26 +0100 Subject: RFR (S): Sorting the regions for collection set takes a while during pause In-Reply-To: <1485456633.2566.85.camel@redhat.com> References: <1485452581.2566.82.camel@redhat.com> <1485453197.2566.84.camel@redhat.com> <9b3b077f-bd0c-406b-6cfb-5ec0427a7283@redhat.com> <1485456633.2566.85.camel@redhat.com> Message-ID: <00068ff5-f433-3d3e-cd87-0ca0712225a4@redhat.com> No sorting with default heuristics, the "prepare evac" times are slightly better than the previous patch. 
I would need to construct a proper torture test to say something about the pauses now :) Thanks, -Aleksey On 01/26/2017 07:50 PM, Roman Kennke wrote: > Ok. > > What's the profile saying? > > Roman > > > Am Donnerstag, den 26.01.2017, 19:45 +0100 schrieb Aleksey Shipilev: >> Okay, let's do this: >> >> 1) Do not even try to sort when heuristics is fine with unsorted >> (some need >> sorted anyway, and probably some in the future would). >> >> 2) Trim down the candidate list first, and then sort a hopefully >> smaller list. >> >> See: >> http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-better/webre >> v.02/ >> >> -Aleksey >> >> On 01/26/2017 06:53 PM, Roman Kennke wrote: >>> Duh. We don't even seem to stop at the first region that exceeds >>> the >>> threshold. This whole sorting seems not necessary, and lots of >>> wasted >>> space too (for the _sorted_regions list). >>> >>> Roman >>> >>> Am Donnerstag, den 26.01.2017, 18:48 +0100 schrieb Aleksey >>> Shipilev: >>>> Maybe! Let's see... >>>> >>>> -Aleksey >>>> >>>> On 01/26/2017 06:43 PM, Roman Kennke wrote: >>>>> Or maybe not sort the list at all? Downside: we need to scan >>>>> all >>>>> regions and decide on their garbage, instead of stopping at the >>>>> first >>>>> region that exceeds the garbage threshold. Upside: no sorting >>>>> necessary. May be worth a try. >>>>> >>>>> Roman >>>>> >>>>> >>>>> Am Donnerstag, den 26.01.2017, 18:39 +0100 schrieb Aleksey >>>>> Shipilev: >>>>>> Hi, >>>>>> >>>>>> Profiled the pause-intensive application for fun, and spotted >>>>>> an >>>>>> easy >>>>>> optimization target. In final mark pause, we select >>>>>> collection >>>>>> set, >>>>>> and for >>>>>> that, we sort the regions by garbage. This incurs (N log N) >>>>>> calls >>>>>> to >>>>>> comparator, >>>>>> which calls SHHR->garbage() and handles nulls, etc. >>>>>> >>>>>> Doing a simple trick: >>>>>> http://cr.openjdk.java.net/~shade/shenandoah/pause-sort-bett >>>>>> er/w >>>>>> ebre >>>>>> v.01/ >>>>>> >>>>>> ...improves timings: >>>>>> >>>>>> before: >>>>>> >>>>>> Final Mark Pauses (gross) = 7.05 s (a = 1059 us) (n >>>>>> = 6659) >>>>>> (lvls, us = 717, 805, 830, 912, 9376) >>>>>> >>>>>> Final Mark Pauses (net) = 3.03 s (a = 454 us) (n >>>>>> = 6659) >>>>>> (lvls, us = 102, 211, 221, 270, 8728) >>>>>> >>>>>> Prepare Evacuation = 2.04 s (a = 306 us) (n = 6659) >>>>>> (lvls, us = 273, 293, 297, 301, 1490) >>>>>> >>>>>> after: >>>>>> >>>>>> Final Mark Pauses (gross) = 6.12 s (a = 851 us) (n = 7195) >>>>>> (lvls, us = 547, 605, 629, 689, 5335) >>>>>> >>>>>> Final Mark Pauses (net) = 3.15 s (a = 438 us) (n = 7195) >>>>>> (lvls, us = 98, 203, 211, 260, 4877) >>>>>> >>>>>> Prepare Evacuation = 0.75 s (a = 105 us) (n = 7195) >>>>>> (lvls, us = 82, 96, 105, 109, 187) >>>>>> >>>>>> 0.2 ms off the already low pause time. >>>>>> >>>>>> Thanks, >>>>>> -Aleksey >>>>>> >>>>>> >>>> >>>> >> >> From ashipile at redhat.com Thu Jan 26 18:57:22 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Thu, 26 Jan 2017 18:57:22 +0000 Subject: hg: shenandoah/jdk9/hotspot: Sorting the regions for collection set takes a while during pause. Message-ID: <201701261857.v0QIvME8008730@aojmv0008.oracle.com> Changeset: 1a7cae11ca05 Author: shade Date: 2017-01-26 19:57 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/1a7cae11ca05 Sorting the regions for collection set takes a while during pause. ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! 
src/share/vm/gc/shenandoah/shenandoahHeap.hpp From shade at redhat.com Fri Jan 27 14:46:04 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 27 Jan 2017 15:46:04 +0100 Subject: RFR (XS): Interleave "process references" and "unload classes" to amortize the pause Message-ID: <19153ac3-f8cf-e0ed-672f-278113faf431@redhat.com> Hi, Obvious idea: make sure reference processing and class unloading do not happen on the *same* N-th cycle to amortize the costs. Otherwise we make the "outlier" pause even more outlier-ish. Webrev: http://cr.openjdk.java.net/~shade/shenandoah/heuristics-interleave-class-refs/webrev.01/ Testing: hotspot_gc_shenandoah -Aleksey From rkennke at redhat.com Fri Jan 27 14:47:41 2017 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 27 Jan 2017 15:47:41 +0100 Subject: RFR (XS): Interleave "process references" and "unload classes" to amortize the pause In-Reply-To: <19153ac3-f8cf-e0ed-672f-278113faf431@redhat.com> References: <19153ac3-f8cf-e0ed-672f-278113faf431@redhat.com> Message-ID: <1485528461.2566.89.camel@redhat.com> Ok Roman Am Freitag, den 27.01.2017, 15:46 +0100 schrieb Aleksey Shipilev: > Hi, > > Obvious idea: make sure reference processing and class unloading do > not happen > on the *same* N-th cycle to amortize the costs. Otherwise we make the > "outlier" > pause even more outlier-ish. > > Webrev: > ?http://cr.openjdk.java.net/~shade/shenandoah/heuristics-interleave-c > lass-refs/webrev.01/ > > Testing: hotspot_gc_shenandoah > > -Aleksey > > From ashipile at redhat.com Fri Jan 27 14:49:01 2017 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 27 Jan 2017 14:49:01 +0000 Subject: hg: shenandoah/jdk9/hotspot: Interleave "process references" and "unload classes" to amortize the pause. Message-ID: <201701271449.v0REn1LG028738@aojmv0008.oracle.com> Changeset: b3e9b952c288 Author: shade Date: 2017-01-27 15:48 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/b3e9b952c288 Interleave "process references" and "unload classes" to amortize the pause. ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp From shade at redhat.com Fri Jan 27 15:58:27 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 27 Jan 2017 16:58:27 +0100 Subject: RFR (S): Print GC cycle ID (+ VMOperations cleanup) Message-ID: <12ee2e7b-fd68-b829-25fb-fa2e8c85a9e9@redhat.com> Hi, Let's finally have GC cycle IDs in the log: http://cr.openjdk.java.net/~shade/shenandoah/gc-cycle-id/webrev.01/ Prints like G1: [5.241s][info][gc] GC(1) Pause Init-Mark 3.008ms [5.525s][info][gc] GC(1) Concurrent marking 3076M->3152M(4096M) 283.962ms [5.531s][info][gc] GC(1) Pause Final Mark 3152M->1148M(4096M) 4.860ms [5.539s][info][gc] GC(1) Concurrent evacuation 1148M->1192M(4096M) 8.572ms [7.499s][info][gc] GC(2) Pause Init-Mark 5.103ms [7.777s][info][gc] GC(2) Concurrent marking 3084M->3250M(4096M) 277.868ms [7.783s][info][gc] GC(2) Pause Final Mark 3250M->2460M(4096M) 5.320ms [7.807s][info][gc] GC(2) Concurrent evacuation 2460M->2534M(4096M) 24.227ms [8.352s][info][gc] GC(3) Pause Init-Mark 0.576ms Testing: hotspot_gc_shenandoah Thanks, -Aleksey From rkennke at redhat.com Fri Jan 27 16:22:26 2017 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 27 Jan 2017 17:22:26 +0100 Subject: RFR (S): Print GC cycle ID (+ VMOperations cleanup) In-Reply-To: <12ee2e7b-fd68-b829-25fb-fa2e8c85a9e9@redhat.com> References: <12ee2e7b-fd68-b829-25fb-fa2e8c85a9e9@redhat.com> Message-ID: <1485534146.2566.90.camel@redhat.com> Nice. Go! 
From shade at redhat.com Fri Jan 27 14:46:04 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 27 Jan 2017 15:46:04 +0100
Subject: RFR (XS): Interleave "process references" and "unload classes" to amortize the pause
Message-ID: <19153ac3-f8cf-e0ed-672f-278113faf431@redhat.com>

Hi,

Obvious idea: make sure reference processing and class unloading do not happen
on the *same* N-th cycle to amortize the costs. Otherwise we make the "outlier"
pause even more outlier-ish.

Webrev:
  http://cr.openjdk.java.net/~shade/shenandoah/heuristics-interleave-class-refs/webrev.01/

Testing: hotspot_gc_shenandoah

-Aleksey

From rkennke at redhat.com Fri Jan 27 14:47:41 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 27 Jan 2017 15:47:41 +0100
Subject: RFR (XS): Interleave "process references" and "unload classes" to amortize the pause
In-Reply-To: <19153ac3-f8cf-e0ed-672f-278113faf431@redhat.com>
References: <19153ac3-f8cf-e0ed-672f-278113faf431@redhat.com>
Message-ID: <1485528461.2566.89.camel@redhat.com>

Ok

Roman

Am Freitag, den 27.01.2017, 15:46 +0100 schrieb Aleksey Shipilev:
> Hi,
>
> Obvious idea: make sure reference processing and class unloading do not happen
> on the *same* N-th cycle to amortize the costs. Otherwise we make the "outlier"
> pause even more outlier-ish.
>
> Webrev:
>  http://cr.openjdk.java.net/~shade/shenandoah/heuristics-interleave-class-refs/webrev.01/
>
> Testing: hotspot_gc_shenandoah
>
> -Aleksey

From ashipile at redhat.com Fri Jan 27 14:49:01 2017
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Fri, 27 Jan 2017 14:49:01 +0000
Subject: hg: shenandoah/jdk9/hotspot: Interleave "process references" and "unload classes" to amortize the pause.
Message-ID: <201701271449.v0REn1LG028738@aojmv0008.oracle.com>

Changeset: b3e9b952c288
Author: shade
Date: 2017-01-27 15:48 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/b3e9b952c288

Interleave "process references" and "unload classes" to amortize the pause.

! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp
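
For illustration, a hedged sketch of the interleaving idea above: offset the two expensive phases so they never land on the same cycle. The knob names are hypothetical placeholders, not the actual Shenandoah flags:

  // Sketch: process references every refs_every_n cycles, and unload classes
  // on a schedule shifted by half a period, so the costs are amortized across
  // different cycles. Assumes the frequencies are greater than 1.
  #include <cstddef>

  bool should_process_refs(size_t cycle, size_t refs_every_n) {
    return refs_every_n != 0 && (cycle % refs_every_n) == 0;
  }

  bool should_unload_classes(size_t cycle, size_t unload_every_n) {
    if (unload_every_n == 0) return false;
    return ((cycle + unload_every_n / 2) % unload_every_n) == 0;
  }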
From shade at redhat.com Fri Jan 27 15:58:27 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 27 Jan 2017 16:58:27 +0100
Subject: RFR (S): Print GC cycle ID (+ VMOperations cleanup)
Message-ID: <12ee2e7b-fd68-b829-25fb-fa2e8c85a9e9@redhat.com>

Hi,

Let's finally have GC cycle IDs in the log:
  http://cr.openjdk.java.net/~shade/shenandoah/gc-cycle-id/webrev.01/

Prints like G1:

[5.241s][info][gc] GC(1) Pause Init-Mark 3.008ms
[5.525s][info][gc] GC(1) Concurrent marking 3076M->3152M(4096M) 283.962ms
[5.531s][info][gc] GC(1) Pause Final Mark 3152M->1148M(4096M) 4.860ms
[5.539s][info][gc] GC(1) Concurrent evacuation 1148M->1192M(4096M) 8.572ms
[7.499s][info][gc] GC(2) Pause Init-Mark 5.103ms
[7.777s][info][gc] GC(2) Concurrent marking 3084M->3250M(4096M) 277.868ms
[7.783s][info][gc] GC(2) Pause Final Mark 3250M->2460M(4096M) 5.320ms
[7.807s][info][gc] GC(2) Concurrent evacuation 2460M->2534M(4096M) 24.227ms
[8.352s][info][gc] GC(3) Pause Init-Mark 0.576ms

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey

From rkennke at redhat.com Fri Jan 27 16:22:26 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 27 Jan 2017 17:22:26 +0100
Subject: RFR (S): Print GC cycle ID (+ VMOperations cleanup)
In-Reply-To: <12ee2e7b-fd68-b829-25fb-fa2e8c85a9e9@redhat.com>
References: <12ee2e7b-fd68-b829-25fb-fa2e8c85a9e9@redhat.com>
Message-ID: <1485534146.2566.90.camel@redhat.com>

Nice. Go!

Roman

Am Freitag, den 27.01.2017, 16:58 +0100 schrieb Aleksey Shipilev:
> Hi,
>
> Let's finally have GC cycle IDs in the log:
>   http://cr.openjdk.java.net/~shade/shenandoah/gc-cycle-id/webrev.01/
>
> Prints like G1:
>
> [5.241s][info][gc] GC(1) Pause Init-Mark 3.008ms
> [5.525s][info][gc] GC(1) Concurrent marking 3076M->3152M(4096M) 283.962ms
> [5.531s][info][gc] GC(1) Pause Final Mark 3152M->1148M(4096M) 4.860ms
> [5.539s][info][gc] GC(1) Concurrent evacuation 1148M->1192M(4096M) 8.572ms
> [7.499s][info][gc] GC(2) Pause Init-Mark 5.103ms
> [7.777s][info][gc] GC(2) Concurrent marking 3084M->3250M(4096M) 277.868ms
> [7.783s][info][gc] GC(2) Pause Final Mark 3250M->2460M(4096M) 5.320ms
> [7.807s][info][gc] GC(2) Concurrent evacuation 2460M->2534M(4096M) 24.227ms
> [8.352s][info][gc] GC(3) Pause Init-Mark 0.576ms
>
> Testing: hotspot_gc_shenandoah
>
> Thanks,
> -Aleksey

From ashipile at redhat.com Fri Jan 27 16:23:12 2017
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Fri, 27 Jan 2017 16:23:12 +0000
Subject: hg: shenandoah/jdk9/hotspot: Print GC cycle ID, and clean up VMOperations.
Message-ID: <201701271623.v0RGNCFm026226@aojmv0008.oracle.com>

Changeset: dd1f7d788094
Author: shade
Date: 2017-01-27 17:21 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/dd1f7d788094

Print GC cycle ID, and clean up VMOperations.

! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp
! src/share/vm/gc/shenandoah/vm_operations_shenandoah.hpp

From rkennke at redhat.com Mon Jan 30 15:56:19 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 30 Jan 2017 16:56:19 +0100
Subject: RFR: Fix double-marking
Message-ID: <1485791779.2566.96.camel@redhat.com>

Aleksey observed last week that Shenandoah has a tendency to get into an
equilibrium where two GC cycles would go back-to-back. It goes like this:
GC cycle #1 marks through the heap, finds X garbage and evacuates that, but
cannot reclaim it right away, because it can only be reclaimed when all
references have been updated, which happens during the next cycle. This means
that memory is still low after evacuation, and hence we start another cycle
right away. That cycle updates all refs, and then reclaims memory from cycle
#1, but since it's so close to the previous cycle, it doesn't find that much
garbage, and leaves a larger gap until the next cycle. This would usually
start out with a small oscillation but tends to amplify itself after some
cycles. Another way to put it is that GC cycle #2 is degenerate and basically
only serves as an update-refs and reclamation cycle for #1.

The root cause is the dependency between cycles that lies in the floating
'dead' garbage that the previous cycle generated. This patch fixes the
problem by accounting for that floating garbage when starting the next cycle:
when a lot of memory is about to get reclaimed, we can start the cycle later;
if only little memory is reclaimed, we need to start earlier. This
counteracts the dynamics that lead to the observed equilibrium, and now we
get nice, evenly spaced GC cycles.

This also leads to improved performance. On moderate GC load, I measured a
baseline of:

[728,745s][info][gc,stats] Concurrent Marking = 84,93 s (a = 719732 us) (n = 118)
  (lvls, us = 427734, 644531, 712891, 787109, 1350562)

With the patch, this goes down to:

[724,359s][info][gc,stats] Concurrent Marking = 67,38 s (a = 701885 us) (n = 96)
  (lvls, us = 394531, 625000, 681641, 742188, 1311134)

With adaptive heuristics it looks even (slightly) better:

[722,760s][info ][gc,stats] Concurrent Marking = 64,67 s (a = 726670 us) (n = 89)
  (lvls, us = 380859, 662109, 718750, 783203, 1056381)

Notice that with adaptive heuristics, we'd get occasional full GCs before
that patch, which doesn't happen anymore either.

Under heavy load, the numbers look like this:

Baseline:
[868,046s][info][gc,stats] Concurrent Marking = 333,44 s (a = 1096839 us) (n = 304)
  (lvls, us = 349609, 1015625, 1093750, 1171875, 1382350)

Dynamic-patched:
[834,421s][info][gc,stats] Concurrent Marking = 273,95 s (a = 1070101 us) (n = 256)
  (lvls, us = 541016, 1015625, 1074219, 1113281, 1465707)

Adaptive-patched:
[825,412s][info ][gc,stats] Concurrent Marking = 257,65 s (a = 1073544 us) (n = 240)
  (lvls, us = 365234, 1015625, 1074219, 1132812, 1398973)

Again, no more full GCs with adaptive.

http://cr.openjdk.java.net/~rkennke/fixdoublemarks/webrev.00/

Ok to push?

Roman

From shade at redhat.com Mon Jan 30 16:14:14 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 30 Jan 2017 17:14:14 +0100
Subject: RFR: Fix double-marking
In-Reply-To: <1485791779.2566.96.camel@redhat.com>
References: <1485791779.2566.96.camel@redhat.com>
Message-ID: <45cd74b0-61ea-87bb-c697-e9d939811ac4@redhat.com>

On 01/30/2017 04:56 PM, Roman Kennke wrote:
> http://cr.openjdk.java.net/~rkennke/fixdoublemarks/webrev.00/

Looks good. Confirmed the double-marking is gone with my runs. The
descriptions in options like "Defaults to 25%" are not true anymore though.

Thanks,
-Aleksey

From roman at kennke.org Mon Jan 30 16:33:37 2017
From: roman at kennke.org (roman at kennke.org)
Date: Mon, 30 Jan 2017 16:33:37 +0000
Subject: hg: shenandoah/jdk9/hotspot: Fix double-marking.
Message-ID: <201701301633.v0UGXbVt017297@aojmv0008.oracle.com>

Changeset: 6854d3818395
Author: rkennke
Date: 2017-01-30 17:33 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/6854d3818395

Fix double-marking.

! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp
! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp
! src/share/vm/gc/shenandoah/shenandoah_globals.hpp
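
For illustration, a minimal sketch of the heuristic idea described in the double-marking thread above, with placeholder names rather than the actual collector-policy code:

  // Sketch: when deciding whether to start the next concurrent cycle, treat
  // memory the previous cycle has already condemned but not yet reclaimed
  // ("floating garbage") as if it were free. Much pending reclamation means
  // we can start later; little pending reclamation means we must start earlier.
  #include <cstddef>

  bool should_start_concurrent_gc(size_t free_bytes,
                                  size_t pending_reclaim_bytes,
                                  size_t capacity_bytes,
                                  double start_free_fraction /* e.g. 0.25 */) {
    size_t effective_free = free_bytes + pending_reclaim_bytes;
    return effective_free < (size_t)(capacity_bytes * start_free_fraction);
  }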
From ashipile at redhat.com Tue Jan 31 11:09:38 2017
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Tue, 31 Jan 2017 11:09:38 +0000
Subject: hg: shenandoah/jdk9/hotspot: Fix failing TestShenandoahArgumentRanges test.
Message-ID: <201701311109.v0VB9cJh002216@aojmv0008.oracle.com>

Changeset: e1c0e1ddd34e
Author: shade
Date: 2017-01-31 12:09 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e1c0e1ddd34e

Fix failing TestShenandoahArgumentRanges test.

! test/gc/shenandoah/TestShenandoahArgumentRanges.java

From shade at redhat.com Tue Jan 31 12:35:37 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 31 Jan 2017 13:35:37 +0100
Subject: RFR (S): Enable ShenandoahConcurrentCodeRoots
Message-ID: <8881abfa-07a6-59e2-3e4c-1c43d2f7af77@redhat.com>

Hi,

While investigating the cause for the bimodal behavior for Init-Mark:

 [14.162s][info][gc] GC(1) Pause Init-Mark 3.810ms
 [17.943s][info][gc] GC(2) Pause Init-Mark 3.327ms
 [20.599s][info][gc] GC(3) Pause Init-Mark 3.442ms
 [22.953s][info][gc] GC(4) Pause Init-Mark 0.819ms
 [25.357s][info][gc] GC(5) Pause Init-Mark 2.861ms
 [27.838s][info][gc] GC(6) Pause Init-Mark 3.013ms
 [30.602s][info][gc] GC(7) Pause Init-Mark 3.440ms
 [33.417s][info][gc] GC(8) Pause Init-Mark 3.113ms
 [36.279s][info][gc] GC(9) Pause Init-Mark 0.838ms
 [39.098s][info][gc] GC(10) Pause Init-Mark 3.264ms
 [41.880s][info][gc] GC(11) Pause Init-Mark 2.713ms
 [44.635s][info][gc] GC(12) Pause Init-Mark 2.832ms
 [47.135s][info][gc] GC(13) Pause Init-Mark 2.876ms
 [49.514s][info][gc] GC(14) Pause Init-Mark 0.856ms

...found that scanning code cache when class_unload=false takes a few
milliseconds. Then discovered ShenandoahConcurrentCodeRoots, which scans the
code roots in concurrent marking workers under the CodeCache_lock. Enabling it
seems to help with Init-Mark times!

 before: Initial Mark Pauses (net) = 0.51 s (a = 2648 us) (n = 193)
   (lvls, us = 686, 2734, 2969, 3125, 4058)

 after: Initial Mark Pauses (net) = 0.24 s (a = 1173 us) (n = 202)
   (lvls, us = 752, 1113, 1172, 1250, 3046)

I cannot see the theoretical problems with enabling it. Both jcstress and
hotspot_gc_shenandoah pass with the flag turned on by default too.
So, let's do it:
  http://cr.openjdk.java.net/~shade/shenandoah/codecache-scan/webrev.01/

Thanks,
-Aleksey

From rkennke at redhat.com Tue Jan 31 13:49:24 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 31 Jan 2017 14:49:24 +0100
Subject: RFR (S): Enable ShenandoahConcurrentCodeRoots
In-Reply-To: <8881abfa-07a6-59e2-3e4c-1c43d2f7af77@redhat.com>
References: <8881abfa-07a6-59e2-3e4c-1c43d2f7af77@redhat.com>
Message-ID: <1485870564.3269.3.camel@redhat.com>

Great. Go!

Roman

Am Dienstag, den 31.01.2017, 13:35 +0100 schrieb Aleksey Shipilev:
> Hi,
>
> While investigating the cause for the bimodal behavior for Init-Mark:
>
>  [14.162s][info][gc] GC(1) Pause Init-Mark 3.810ms
>  [17.943s][info][gc] GC(2) Pause Init-Mark 3.327ms
>  [20.599s][info][gc] GC(3) Pause Init-Mark 3.442ms
>  [22.953s][info][gc] GC(4) Pause Init-Mark 0.819ms
>  [25.357s][info][gc] GC(5) Pause Init-Mark 2.861ms
>  [27.838s][info][gc] GC(6) Pause Init-Mark 3.013ms
>  [30.602s][info][gc] GC(7) Pause Init-Mark 3.440ms
>  [33.417s][info][gc] GC(8) Pause Init-Mark 3.113ms
>  [36.279s][info][gc] GC(9) Pause Init-Mark 0.838ms
>  [39.098s][info][gc] GC(10) Pause Init-Mark 3.264ms
>  [41.880s][info][gc] GC(11) Pause Init-Mark 2.713ms
>  [44.635s][info][gc] GC(12) Pause Init-Mark 2.832ms
>  [47.135s][info][gc] GC(13) Pause Init-Mark 2.876ms
>  [49.514s][info][gc] GC(14) Pause Init-Mark 0.856ms
>
> ...found that scanning code cache when class_unload=false takes a few
> milliseconds. Then discovered ShenandoahConcurrentCodeRoots, which scans the
> code roots in concurrent marking workers under the CodeCache_lock. Enabling it
> seems to help with Init-Mark times!
>
>  before: Initial Mark Pauses (net) = 0.51 s (a = 2648 us) (n = 193)
>    (lvls, us = 686, 2734, 2969, 3125, 4058)
>
>  after: Initial Mark Pauses (net) = 0.24 s (a = 1173 us) (n = 202)
>    (lvls, us = 752, 1113, 1172, 1250, 3046)
>
> I cannot see the theoretical problems with enabling it. Both jcstress and
> hotspot_gc_shenandoah pass with the flag turned on by default too.
> So, let's do it:
>   http://cr.openjdk.java.net/~shade/shenandoah/codecache-scan/webrev.01/
>
> Thanks,
> -Aleksey

From ashipile at redhat.com Tue Jan 31 13:50:20 2017
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Tue, 31 Jan 2017 13:50:20 +0000
Subject: hg: shenandoah/jdk9/hotspot: Enable ShenandoahConcurrentCodeRoots.
Message-ID: <201701311350.v0VDoKD3016544@aojmv0008.oracle.com>

Changeset: 973b3b16e3b1
Author: shade
Date: 2017-01-31 14:50 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/973b3b16e3b1

Enable ShenandoahConcurrentCodeRoots.

! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp
! src/share/vm/gc/shenandoah/shenandoah_globals.hpp
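
For illustration only, a generic sketch of the "scan code roots from the concurrent marking workers instead of in the pause" idea. This is plain C++ with placeholder types, not the HotSpot implementation, which additionally coordinates with code cache changes via CodeCache_lock:

  #include <atomic>
  #include <cstddef>
  #include <utility>
  #include <vector>

  struct CodeRoot { /* stand-in for an nmethod's oop roots */ };

  class ConcurrentCodeRootScan {
    std::vector<CodeRoot*> _roots;
    std::atomic<size_t>    _claimed;
  public:
    explicit ConcurrentCodeRootScan(std::vector<CodeRoot*> roots)
      : _roots(std::move(roots)), _claimed(0) {}

    // Called by every concurrent marking worker: each root is claimed and
    // processed exactly once, and none of this work happens in the init-mark pause.
    template <typename MarkClosure>
    void work(MarkClosure mark_through) {
      for (;;) {
        size_t i = _claimed.fetch_add(1, std::memory_order_relaxed);
        if (i >= _roots.size()) return;
        mark_through(_roots[i]);
      }
    }
  };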
From shade at redhat.com Tue Jan 31 14:06:58 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 31 Jan 2017 15:06:58 +0100
Subject: RFR (S): Print the timings for conc bitmap cleaning
Message-ID: 

Hi,

In our large-machine runs, we have quite a few cycles taken by concurrent
bitmap cleanup. Closer look identified that we don't report this important
thing as part of collection cycle, while we actually should. Fixed:
  http://cr.openjdk.java.net/~shade/shenandoah/times-print/webrev.01/

(This also fixes a few nagging UX problems, like s/Init-Mark/Init Mark/, etc).

Thanks,
-Aleksey

From rkennke at redhat.com Tue Jan 31 14:10:14 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 31 Jan 2017 15:10:14 +0100
Subject: RFR (S): Print the timings for conc bitmap cleaning
In-Reply-To: 
References: 
Message-ID: <1485871814.3269.4.camel@redhat.com>

Good idea! Go!

Roman

Am Dienstag, den 31.01.2017, 15:06 +0100 schrieb Aleksey Shipilev:
> Hi,
>
> In our large-machine runs, we have quite a few cycles taken by concurrent
> bitmap cleanup. Closer look identified that we don't report this important
> thing as part of collection cycle, while we actually should. Fixed:
>   http://cr.openjdk.java.net/~shade/shenandoah/times-print/webrev.01/
>
> (This also fixes a few nagging UX problems, like s/Init-Mark/Init Mark/, etc).
>
> Thanks,
> -Aleksey

From ashipile at redhat.com Tue Jan 31 14:11:16 2017
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Tue, 31 Jan 2017 14:11:16 +0000
Subject: hg: shenandoah/jdk9/hotspot: Print the timings for conc bitmap cleaning.
Message-ID: <201701311411.v0VEBHw7020706@aojmv0008.oracle.com>

Changeset: 41f7a5c1a5a9
Author: shade
Date: 2017-01-31 15:11 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/41f7a5c1a5a9

Print the timings for conc bitmap cleaning.

! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp
From shade at redhat.com Tue Jan 31 18:57:08 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 31 Jan 2017 19:57:08 +0100
Subject: RFR (S): Ensure BitMaps clearing is done with memset
Message-ID: <17ed10b1-d793-cbe3-09cf-acd00c7e65e5@redhat.com>

Hi,

On our large lab machines, we sometimes see rather high bitmap cleaning times:

 Concurrent Marking = 353.84 s (a = 1259228 us) (n = 281)
   (lvls, us = 791016, 1132812, 1250000, 1328125, 3448034)
 Concurrent Evacuation = 192.62 s (a = 687937 us) (n = 280)
   (lvls, us = 398438, 609375, 687500, 744141, 1485181)
 Reset Bitmaps = 53.37 s (a = 190614 us) (n = 280)
   (lvls, us = 684, 166016, 191406, 218750, 328977)

Not the least reason for this is that we are using per-word cleanup, while we
should instead use memset. BitMap provides the methods for that, with _large
suffixes. This patch ensures it:

GCC 4.8 has -ftree-loop-distribute-patterns at -O3, which seems to fold
clear_range into memset. This explains why we haven't seen this on our
up-to-date development machines. Disabling that optimization demonstrates the
benefits of the patch.

baseline:
 Reset Bitmaps = 3.15 s (a = 11430 us) (n = 276)
   (lvls, us = 94, 8184, 9473, 13086, 35697)

patched:
 Reset Bitmaps = 1.55 s (a = 5394 us) (n = 287)
   (lvls, us = 90, 4238, 4707, 5332, 19467)

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey

From shade at redhat.com Tue Jan 31 19:00:45 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 31 Jan 2017 20:00:45 +0100
Subject: RFR (S): Ensure BitMaps clearing is done with memset
In-Reply-To: <17ed10b1-d793-cbe3-09cf-acd00c7e65e5@redhat.com>
References: <17ed10b1-d793-cbe3-09cf-acd00c7e65e5@redhat.com>
Message-ID: <19c9cc56-78ef-b4cb-ad71-31b602780d04@redhat.com>

On 01/31/2017 07:57 PM, Aleksey Shipilev wrote:
> Not the least reason for this is that we are using per-word cleanup, while we
> should instead use memset. BitMap provides the methods for that, with _large
> suffixes. This patch ensures it:

D'oh. Here it is:
  http://cr.openjdk.java.net/~shade/shenandoah/bitmaps-memset/webrev.01/

Thanks,
-Aleksey

From rkennke at redhat.com Tue Jan 31 19:04:35 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 31 Jan 2017 20:04:35 +0100
Subject: RFR (S): Ensure BitMaps clearing is done with memset
In-Reply-To: <19c9cc56-78ef-b4cb-ad71-31b602780d04@redhat.com>
References: <17ed10b1-d793-cbe3-09cf-acd00c7e65e5@redhat.com> <19c9cc56-78ef-b4cb-ad71-31b602780d04@redhat.com>
Message-ID: <1485889475.3269.5.camel@redhat.com>

Am Dienstag, den 31.01.2017, 20:00 +0100 schrieb Aleksey Shipilev:
> On 01/31/2017 07:57 PM, Aleksey Shipilev wrote:
> > Not the least reason for this is that we are using per-word cleanup, while we
> > should instead use memset. BitMap provides the methods for that, with _large
> > suffixes. This patch ensures it:
>
> D'oh. Here it is:
>  http://cr.openjdk.java.net/~shade/shenandoah/bitmaps-memset/webrev.01/

Very good! Do it!

Roman

From ashipile at redhat.com Tue Jan 31 19:07:37 2017
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Tue, 31 Jan 2017 19:07:37 +0000
Subject: hg: shenandoah/jdk9/hotspot: Ensure BitMaps clearing is done with memset.
Message-ID: <201701311907.v0VJ7b39011155@aojmv0008.oracle.com>

Changeset: a7bdd79f5a47
Author: shade
Date: 2017-01-31 20:07 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a7bdd79f5a47

Ensure BitMaps clearing is done with memset.

! src/share/vm/gc/shared/cmBitMap.cpp
! src/share/vm/gc/shared/cmBitMap.hpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
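
For illustration, a standalone sketch of the difference discussed in the thread above between per-word clearing and a single memset over the same range. This is not the HotSpot BitMap code; there, per the discussion, the _large-suffixed methods perform the memset-style clear:

  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  typedef uintptr_t bm_word_t;

  // Naive variant: clears one word per iteration. GCC can rewrite this loop
  // into memset, but only with -ftree-loop-distribute-patterns (on at -O3),
  // which is why the slowdown only showed up on some machines.
  void clear_words_loop(bm_word_t* map, size_t beg_word, size_t end_word) {
    for (size_t i = beg_word; i < end_word; i++) {
      map[i] = 0;
    }
  }

  // "Large" variant: one memset call over the whole word range, independent
  // of compiler flags.
  void clear_words_large(bm_word_t* map, size_t beg_word, size_t end_word) {
    memset(map + beg_word, 0, (end_word - beg_word) * sizeof(bm_word_t));
  }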