Request for review (S): 7068625 Testing 8 bytes of card table entries at a time speeds up card-scanning

Tue Mar 13 14:49:35 UTC 2012

Hi,

Updated
http://cr.openjdk.java.net/~brutisso/7068625/webrev.02/

Regards,
Alexey

On Mar 13, 2012, at 2:08, John Coomes <John.Coomes at oracle.com> wrote:

> Alexey Ragozin (alexey.ragozin at gmail.com) wrote:
>> Hi,
>> 
>> Updated patch is here
>> http://cr.openjdk.java.net/~brutisso/7068625/webrev.01/
> 
> Hi Alexey,
> 
> Can you change back to the single decrement of cur_entry in
> ClearNoncleanCardWrapper::do_MemRegion()?  With the updated code,
> having the increment at the end of the loop (line 226 in the new
> version) keeps the preceding comment relevant.
> 
> In cardTableModRefBS.hpp, I think
> 
>  75   // a word worth row of clean card values
> 
> should be
> 
>  75   // a word's worth (row) of clean card values
> 
> Aside from those two nits, looks good.
> 
> -John
> 
>> From: John Coomes <John.Coomes at oracle.com>
>>> Subject: Re: Fwd: (resend) Request for review (S): 7068625 Testing 8
>>>       bytes of        card table entries at a time speeds up card-scanning
>>> To: Bengt Rutisson <bengt.rutisson at oracle.com>
>>> Cc: "hotspot-gc-dev at openjdk.java.net"
>>>       <hotspot-gc-dev at openjdk.java.net>
>>> Message-ID: <20300.16537.700270.474085 at oracle.com>
>>> Content-Type: text/plain; charset=us-ascii
>>> 
>>> Bengt Rutisson (bengt.rutisson at oracle.com) wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> Just pinging this review request. Does anybody have some time to look at
>>>> it? It is a fairly small and straight forward change...
>>> 
>>> Semantically looks ok.  Some style issues, though.
>>> 
>>> cardTableRS.cpp:
>>> ---------------
>>> 
>>> I'd rather see the loop on 207 rewritten as:
>>> 
>>> 207         while (cur_row >= limit &&
>>> 208                *((intptr_t*)cur_row) ==
>>> CardTableRS::clean_card_row()) {
>>> 209           cur_row -= BytesPerWord;
>>> 210         }
>>> 
>> 
>> done
>> 
>> as it's shorter and eliminates the 'break'.
>>> 
>>> There are also whitespace problems (inconsistent indentation, 4 spaces
>>> instead of 2, no space after while, etc.).
>>> 
>>> The comment that starts with 'Reset the dirty window' is now in only
>>> one of the places that does the reset.  And this comment:
>>> 
>>>   // Note that "cur_entry" leads "start_of_non_clean" in
>>>   // its leftward excursion after this point
>>>   // in the loop and, when we hit the left end of "mr",
>>>   // will point off of the left end of the card-table
>>>   // for "mr".
>>> }
>>> 
>>> is in an odd position (no code follows it) and the 'after' should be
>>> changed to 'at'.
>>> 
>>> But instead of changing the comments, the new behavior can be achieved
>>> with a single new hunk of code in do_MemRegion that keeps the comments
>>> relevant:
>>> 
>>> --- cardTableRS.cpp.org
>>> +++ cardTableRS.cpp
>>> @@ -17,10 +17,22 @@
>>>      // "dirty" range accumulated so far.
>>>      if (start_of_non_clean < end_of_non_clean) {
>>>        const MemRegion mrd(start_of_non_clean, end_of_non_clean);
>>>        _dirty_card_closure->do_MemRegion(mrd);
>>>      }
>>> +
>>> +      // fast forward through potential continuous range of clean cards
>>> +      if (is_word_aligned(cur_entry)) {
>>> +        jbyte* cur_row = cur_entry - BytesPerWord;
>>> +        while (cur_row >= limit &&
>>> +               *((intptr_t*)cur_row) == CardTableRS::clean_card_row()) {
>>> +          cur_row -= BytesPerWord;
>>> +        }
>>> +        cur_entry = cur_row + BytesPerWord;
>>> +        cur_hw = _ct->addr_for(cur_row + BytesPerWord);
>>> +      }
>>> +
>>>      // Reset the dirty window, while continuing to look
>>>      // for the next dirty card that will start a
>>>      // new dirty window.
>>>      end_of_non_clean = cur_hw;
>>>      start_of_non_clean = cur_hw;
>>> 
>>> I think it's easier to follow.
>>> 
>>> 
>> agreed
>> 
>> 
>>> cardTableRS.hpp:
>>> ---------------
>>> 
>>> 48   // making this constants into inline methods kills performance for
>>> some reason
>>> 49   static intptr_t clean_card_row() {
>>> 50     return CardTableModRefBS::clean_card_row;
>>> 51   }
>>> 
>>> The comment seems to warn against the very thing the code is doing.
>>> If the comment is accurate, it's fine to eliminate the method and use
>>> the enum directly.
>>> 
>> 
>> comment was included in patch by accident, removed
>> 
>> 
>>> 53   static bool  card_is_dirty_wrt_gen_iter(jbyte cv) {
>>> 54     return CardTableModRefBS::card_is_dirty_wrt_gen_iter(cv);
>>> 
>>> This whitespace change looks spurios.
>>> 
>> 
>> true
>> 
>> 
>>> 
>>> -John
>>> 
>>>> -------- Original Message --------
>>>> Subject:      Request for review (S): 7068625 Testing 8 bytes of card
>>> table
>>>> entries at a time speeds up card-scanning
>>>> Date:         Tue, 21 Feb 2012 12:03:50 +0400
>>>> From:         Alexey Ragozin <alexey.ragozin at gmail.com>
>>>> To:   hotspot-gc-dev at openjdk.java.net
>>>> CC:   Bengt Rutisson <bengt.rutisson at oracle.com>
>>>> 
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> I would like few volunteers to review changes for
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068625
>>>> WebRev: http://cr.openjdk.java.net/~brutisso/7068625/webrev.00/
>>>> <http://cr.openjdk.java.net/%7Ebrutisso/7068625/webrev.00/>
>>>> 
>>>> Change summary
>>>> For large heaps (I was focusing on 8GiB and above) it is common to have
>>>> long continuous ranges of clean cards.
>>>> Patch is introducing a short path for skipping ranges of clean cards
>>>> using word aligned memory access instead of byte aligned.
>>>> 
>>>> Patch affects serial and CMS collectors. For CMS collector stride size
>>>> should be increase to see any performance gains (I was using
>>>> -XX:+UnlockDiagnosticVMOptions
>>>> -XX:ParGCCardsPerStrideChunk=4096)
>>>> 
>>>> For testing I was mainly using synthetic benchmark randomly modifying
>>>> hash tables in heap, thus uniformly touching cards across heaps.
>>>> Average duration of young GC pause were used as KPI.
>>>> More details about testing can be found at
>>>> 
>>> http://blog.ragozin.info/2011/07/openjdk-patch-cutting-down-gc-pause.html
>>>> Though article is referring jdk6, my resent tests with trunk jdk7 show
>>>> no difference.
>>>> I was also tested patch with real application (Oracle Coherence storage
>>>> node).
>>>> With 16GiB of heap and CMS/ParNew GC, enabling patch have shortened GC
>>>> pauses roughly in 2 times.
>>>> 
>>>> Source code of benchmark used in test are available at
>>>> 
>>> https://gridkit.googlecode.com/svn/branches/aragozin-sandbox/young-gc-bench
>>>> Main class YoungGCPauseBenchmark
>>>> 
>>>> Regards,
>>>> Alexey
>>>> 
>>>