RFR (M): 8027295: Free CSet takes ~50% of young pause time
Thomas Schatzl
thomas.schatzl at oracle.com
Mon Feb 17 08:59:55 UTC 2014
Hi all,
at http://cr.openjdk.java.net/~tschatzl/8027295/webrev.1/ there is a
slightly updated webrev, undoing some last minute change: in particular
in G1CodeRootSet::find() I changed the chunk iteration order starting
originally from the head to the tail chunk. As ::add() adds new chunks
to the head, this would mean that find() actually starts searching from
the end of the list.
Here's the diff:
- G1CodeRootChunk* cur = _list.tail();
+ G1CodeRootChunk* cur = _list.head();
while (cur != NULL) {
if (cur->find(method)) {
return cur;
}
- cur = (G1CodeRootChunk*)cur->prev();
+ cur = (G1CodeRootChunk*)cur->next();
}
Thanks,
Thomas
On Thu, 2014-02-13 at 18:54 +0100, Thomas Schatzl wrote:
> Hi all,
>
> can I have reviews for the following change that improves the (serial)
> performance of freeing the collection set? On applications that have a
> high amount of collection set regions, freeing the CSet takes up a large
> part of the entire collection pause (e.g. 50% on 2GB heaps) and/or takes
> really long in absolute terms (500ms on 460GB heaps).
>
> This change tries to introduce several small changes across CSet freeing
> that improve the total serial performance by around ~33%.
>
> It consists of the following changes (please also have a look at the CR
> for some figures):
>
> - manage code cache roots as set of chunks of nmethods
> - improves performance for code cache roots reclamation
> - also improves removing/adding elements slightly (no need to
> reallocate and copy around the entire GrowableArray)
> - this change is also a prerequisite for better load balancing code
> cache root scanning
> - some chunk cache to avoid malloc()/free() calls that were the
> performance issue using the FreeList class. (It unfortunately adds some
> interface clutter but I _really_ did not want to add the 100th
> implementation of a linked list in the GC code. It seems good enough).
>
> - fast card cache changes
> - pad FCC rows to cache line size to avoid any false sharing (every
> row represents the card cache for a single worker thread)
> - fixed (the surprising) main performance problem in FCC clearing by
> simply factoring out the call to HeapRegionRemSet::num_par_rem_sets()
> from the clear loop
> - a future change will extract the FCC into a separate class as
> cleanup (JDK-8034868)
>
> - moved the mutex to protect the OtherRegionsTable up to the
> HeapRegionRemSet
> - fixes a (potential) bug that we do not protect code roots cleanup by
> a lock
> - it seems to be more fitting, as this lock is actually supposed to
> protect the entire RSet, not only the OtherRegionsTable part
>
> - some interface changes to avoid locking mutexes unnecessarily during
> cleanup (seems to give 3% Free CSet time on TOPLINK)
> - i.e. the "locked" parameter for G1CollectedHeap::free_region().
>
> - added new statistics output separating young/nonyoung free cset time
> when G1LogLevel is set to finest
>
> - other changes
> - minor cleanups
>
> - the remaining changes in this area are
> - clearing and counting the length of the sparse RSet; that would need
> some quite intrusive RSet changes and is TODO.
> - parallelization: moved parallelization efforts into a separate CR,
> JDK-8034842.
> - concurrent collection set freeing: to be considered in a follow-up
> CR (JDK-8034873) for when parallelization stops scaling (like in cases
> when cset freeing already takes only a few ms and adding another thread
> just decreases performance) or just to decrease pause time further.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8027295
>
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8027295/webrev/
>
> Testing:
> JPRT with this version, specjbb*, specjvm*, dacapo, PSR tests (Fuse, BPM
> stress, SalesServer, TOPLINK) with a slightly less cleaned up version.
>
> Thanks,
> Thomas
>
>
>
More information about the hotspot-gc-dev
mailing list