Optimizing card table scanning in CMS collector
Alexey Ragozin
alexey.ragozin at gmail.com
Wed Jul 6 11:13:02 UTC 2011
Hi,
I have done a few experiments to analyze the cost factors affecting the pause
duration of young GC.
Here are some interesting results:
It turns out that the ClearNoncleanCardWrapper::do_MemRegion method is a severe
bottleneck.
The current implementation of this method scans the card table byte by byte, which
takes too many CPU cycles. Normally the majority of cards are clean, so I have
added a fast path to this method that tests a whole row of 8 bytes at once. Tests
have shown a roughly 8-fold reduction in card table scan time from this
optimization on the serial collector.
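As a standalone illustration of the fast path idea (not the HotSpot code itself), here is a minimal sketch of a right-to-left scan that checks eight cards with one 64-bit comparison whenever the position is 8-byte aligned. The names last_non_clean, kCleanCard and kCleanRow are made up for this example, and it assumes a clean card is encoded as the byte 0xFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption for this sketch: 0xFF marks a clean card, any other value is non-clean.
constexpr std::uint8_t kCleanCard = 0xFF;
constexpr std::uint64_t kCleanRow = ~std::uint64_t{0};  // eight clean cards in a row

// Hypothetical helper: scans cards[0, n) from right to left and returns the
// index of the rightmost non-clean card, or -1 if every card is clean.
std::ptrdiff_t last_non_clean(const std::uint8_t* cards, std::size_t n) {
  std::size_t i = n;  // invariant: cards[i, n) are all clean
  while (i > 0) {
    // Fast path: at an 8-byte boundary with a full row to the left,
    // test eight cards with a single 64-bit comparison.
    if (i >= 8 && (reinterpret_cast<std::uintptr_t>(cards + i) & 7) == 0) {
      std::uint64_t row;
      std::memcpy(&row, cards + i - 8, sizeof(row));  // avoids aliasing problems
      if (row == kCleanRow) {
        i -= 8;
        continue;
      }
    }
    // Slow path: examine a single card.
    if (cards[i - 1] != kCleanCard) {
      return static_cast<std::ptrdiff_t>(i - 1);
    }
    --i;
  }
  return -1;
}
```

When a row contains a non-clean card, the loop falls back to the byte-by-byte path until it either finds that card or reaches the next aligned position.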
On the CMS ParNew collector I had to increase the stride size
(-XX:+UnlockDiagnosticVMOptions
-XX:ParGCCardsPerStrideChunk=4096) to see the effect.
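To put rough numbers on this, the following sketch works out the card table size and the heap range covered by one stride chunk. It assumes HotSpot's 512-byte cards (one card table byte per 512 bytes of heap). The function names are made up for the example, and the value 256 in the comments is only a comparison point, assumed here to be the default stride size.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one card-table byte covers 512 bytes of heap.
constexpr std::uint64_t kCardBytes = 512;

// Size of the card table, in bytes, for a heap of the given size.
constexpr std::uint64_t card_table_bytes(std::uint64_t heap_bytes) {
  return heap_bytes / kCardBytes;
}

// Heap range, in bytes, covered by one stride chunk of the given card count.
constexpr std::uint64_t heap_bytes_per_chunk(std::uint64_t cards_per_chunk) {
  return cards_per_chunk * kCardBytes;
}

// Upper bound on the number of 8-card rows the fast path can skip per chunk.
constexpr std::uint64_t rows_per_chunk(std::uint64_t cards_per_chunk) {
  return cards_per_chunk / 8;
}

// 32 GiB heap -> 64 MiB card table; 28 GiB heap -> 56 MiB card table.
// 4096 cards per chunk -> 2 MiB of heap and at most 512 rows per chunk.
// 256 cards per chunk  -> 128 KiB of heap and at most 32 rows per chunk.
```

With small chunks each worker covers only a few rows before moving to the next chunk, which is one plausible reason the row-wise fast path shows little benefit until the stride is increased.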
Modified code of the method (cardTableRS.cpp):
void ClearNoncleanCardWrapper::do_MemRegion(MemRegion mr) {
  assert(mr.word_size() > 0, "Error");
  assert(_ct->is_aligned(mr.start()), "mr.start() should be card aligned");
  // mr.end() may not necessarily be card aligned.
  jbyte* cur_entry = _ct->byte_for(mr.last());
  const jbyte* limit = _ct->byte_for(mr.start());
  HeapWord* end_of_non_clean = mr.end();
  HeapWord* start_of_non_clean = end_of_non_clean;
  while (cur_entry >= limit) {
    HeapWord* cur_hw = _ct->addr_for(cur_entry);
    if ((*cur_entry != CardTableRS::clean_card_val()) &&
        clear_card(cur_entry)) {
      // Continue the dirty range by opening the
      // dirty window one card to the left.
      start_of_non_clean = cur_hw;
      cur_entry--;
    } else {
      // We hit a "clean" card; process any non-empty
      // "dirty" range accumulated so far.
      if (start_of_non_clean < end_of_non_clean) {
        const MemRegion mrd(start_of_non_clean, end_of_non_clean);
        _dirty_card_closure->do_MemRegion(mrd);
      }
      // fast forward via continuous range of clean cards
      // hardcoded 64 bit version
      if ((((jlong)cur_entry) & 7) == 0) {
        jbyte* cur_row = cur_entry - 8;
        while (cur_row >= limit) {
          if (*((jlong*)cur_row) == ((jlong)-1) /* hardcoded row of 8 clean cards */) {
            cur_row -= 8;
          } else {
            break;
          }
        }
        cur_entry = cur_row + 7;
        HeapWord* last_hw = _ct->addr_for(cur_row + 8);
        end_of_non_clean = last_hw;
        start_of_non_clean = last_hw;
      } else {
        // Reset the dirty window, while continuing to look
        // for the next dirty card that will start a
        // new dirty window.
        end_of_non_clean = cur_hw;
        start_of_non_clean = cur_hw;
        cur_entry--;
      }
    }
    // Note that "cur_entry" leads "start_of_non_clean" in
    // its leftward excursion after this point
    // in the loop and, when we hit the left end of "mr",
    // will point off of the left end of the card-table
    // for "mr".
  }
  // If the first card of "mr" was dirty, we will have
  // been left with a dirty window, co-initial with "mr",
  // which we now process.
  if (start_of_non_clean < end_of_non_clean) {
    const MemRegion mrd(start_of_non_clean, end_of_non_clean);
    _dirty_card_closure->do_MemRegion(mrd);
  }
}
More information about the testing and the test results is available here:
http://aragozin.blogspot.com/2011/07/openjdk-patch-cutting-down-gc-pause.html
In my real application, the effect of this patch was a 2.5x reduction in average GC
pause duration for a 28 GiB heap. I really hope to see this kind of
improvement in the mainstream JDK soon.
Thank you
On Wed, Jun 15, 2011 at 12:03 PM, Alexey Ragozin
<alexey.ragozin at gmail.com> wrote:
> Hi,
>
> Recently I was analyzing CMS GC pause times on a JVM with a 32 GB heap
> (using an Oracle Coherence node as a sample application). It seems that young
> collection pause time is totally dominated by the time required to scan the card
> table (I suppose the size of the table should be 64 MB in this case). I believe the
> time to scan the card table could be cut significantly at the price of a slightly more
> complex write barrier. By introducing super-cards, the collector can avoid
> scanning whole ranges of the card table. I would like to implement a POC to prove
> the reduction of young collection pause (it should probably also reduce CMS
> remark pause time).
>
> I need advice on locating the right places for modification in the code base (I’m
> not familiar with it). I think I can ignore the JIT for the sake of the POC (running the
> JVM in interpreter mode). So I need to modify the write barrier used in the interpreter
> and the card table scanning procedure.
>
>
> Thank you for advice.
>
>
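The quoted message does not spell out how super-cards would work. One possible reading, sketched below purely as an assumption, is a second, coarser table in which each entry summarizes a fixed run of cards: the write barrier marks both levels, and the scanner skips every run whose super-card is clean. The type TwoLevelCardTable, the choice of 64 cards per super-card, and the 0/1 encoding are all hypothetical and unrelated to HotSpot's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical two-level card table; in this sketch 1 = dirty and 0 = clean.
struct TwoLevelCardTable {
  static constexpr std::size_t kCardShift = 9;   // 512-byte cards
  static constexpr std::size_t kSuperShift = 6;  // assumed: 64 cards per super-card

  std::vector<std::uint8_t> cards;
  std::vector<std::uint8_t> supers;

  explicit TwoLevelCardTable(std::size_t heap_bytes)
      : cards((heap_bytes + 511) >> kCardShift, 0),
        supers((cards.size() + 63) >> kSuperShift, 0) {}

  // Write barrier: one extra store compared with a single-level table.
  void write_barrier(std::size_t heap_offset) {
    const std::size_t card = heap_offset >> kCardShift;
    cards[card] = 1;
    supers[card >> kSuperShift] = 1;
  }

  // Appends the indices of dirty cards to dirty_out, clears both levels,
  // and returns how many card entries were actually examined.
  std::size_t scan(std::vector<std::size_t>& dirty_out) {
    std::size_t examined = 0;
    for (std::size_t s = 0; s < supers.size(); ++s) {
      if (supers[s] == 0) continue;  // skip a whole run of clean cards
      supers[s] = 0;
      const std::size_t begin = s << kSuperShift;
      const std::size_t end =
          std::min(begin + (std::size_t{1} << kSuperShift), cards.size());
      for (std::size_t c = begin; c < end; ++c) {
        ++examined;
        if (cards[c] != 0) {
          cards[c] = 0;
          dirty_out.push_back(c);
        }
      }
    }
    return examined;
  }
};
```

In this sketch the scan cost is proportional to the number of super-cards plus the cards under dirty super-cards, rather than to the full card table.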