Request for review (S): 7068625 Testing 8 bytes of card table entries at a time speeds up card-scanning

Wed Mar 14 07:38:38 UTC 2012

Alexey Ragozin (alexey.ragozin at gmail.com) wrote:
> Hi,
> 
> Updated
> http://cr.openjdk.java.net/~brutisso/7068625/webrev.02/

Looks good to me.  Thanks!

-John

> 
> On Mar 13, 2012, at 2:08, John Coomes <John.Coomes at oracle.com> wrote:
> 
> > Alexey Ragozin (alexey.ragozin at gmail.com) wrote:
> >> Hi,
> >> 
> >> Updated patch is here
> >> http://cr.openjdk.java.net/~brutisso/7068625/webrev.01/
> > 
> > Hi Alexey,
> > 
> > Can you change back to the single decrement of cur_entry in
> > ClearNoncleanCardWrapper::do_MemRegion()?  With the updated code,
> > having the increment at the end of the loop (line 226 in the new
> > version) keeps the preceding comment relevant.
> > 
> > In cardTableModRefBS.hpp, I think
> > 
> >  75   // a word worth row of clean card values
> > 
> > should be
> > 
> >  75   // a word's worth (row) of clean card values
> > 
> > Aside from those two nits, looks good.
> > 
> > -John
> > 
> >> From: John Coomes <John.Coomes at oracle.com>
> >>> Subject: Re: Fwd: (resend) Request for review (S): 7068625 Testing 8
> >>>       bytes of        card table entries at a time speeds up card-scanning
> >>> To: Bengt Rutisson <bengt.rutisson at oracle.com>
> >>> Cc: "hotspot-gc-dev at openjdk.java.net"
> >>>       <hotspot-gc-dev at openjdk.java.net>
> >>> Message-ID: <20300.16537.700270.474085 at oracle.com>
> >>> Content-Type: text/plain; charset=us-ascii
> >>> 
> >>> Bengt Rutisson (bengt.rutisson at oracle.com) wrote:
> >>>> 
> >>>> Hi all,
> >>>> 
> >>>> Just pinging this review request. Does anybody have some time to look at
> >>>> it? It is a fairly small and straight forward change...
> >>> 
> >>> Semantically looks ok.  Some style issues, though.
> >>> 
> >>> cardTableRS.cpp:
> >>> ---------------
> >>> 
> >>> I'd rather see the loop on 207 rewritten as:
> >>> 
> >>> 207         while (cur_row >= limit &&
> >>> 208                *((intptr_t*)cur_row) ==
> >>> CardTableRS::clean_card_row()) {
> >>> 209           cur_row -= BytesPerWord;
> >>> 210         }
> >>> 
> >> 
> >> done
> >> 
> >> as it's shorter and eliminates the 'break'.
> >>> 
> >>> There are also whitespace problems (inconsistent indentation, 4 spaces
> >>> instead of 2, no space after while, etc.).
> >>> 
> >>> The comment that starts with 'Reset the dirty window' is now in only
> >>> one of the places that does the reset.  And this comment:
> >>> 
> >>>   // Note that "cur_entry" leads "start_of_non_clean" in
> >>>   // its leftward excursion after this point
> >>>   // in the loop and, when we hit the left end of "mr",
> >>>   // will point off of the left end of the card-table
> >>>   // for "mr".
> >>> }
> >>> 
> >>> is in an odd position (no code follows it) and the 'after' should be
> >>> changed to 'at'.
> >>> 
> >>> But instead of changing the comments, the new behavior can be achieved
> >>> with a single new hunk of code in do_MemRegion that keeps the comments
> >>> relevant:
> >>> 
> >>> --- cardTableRS.cpp.org
> >>> +++ cardTableRS.cpp
> >>> @@ -17,10 +17,22 @@
> >>>      // "dirty" range accumulated so far.
> >>>      if (start_of_non_clean < end_of_non_clean) {
> >>>        const MemRegion mrd(start_of_non_clean, end_of_non_clean);
> >>>        _dirty_card_closure->do_MemRegion(mrd);
> >>>      }
> >>> +
> >>> +      // fast forward through potential continuous range of clean cards
> >>> +      if (is_word_aligned(cur_entry)) {
> >>> +        jbyte* cur_row = cur_entry - BytesPerWord;
> >>> +        while (cur_row >= limit &&
> >>> +               *((intptr_t*)cur_row) == CardTableRS::clean_card_row()) {
> >>> +          cur_row -= BytesPerWord;
> >>> +        }
> >>> +        cur_entry = cur_row + BytesPerWord;
> >>> +        cur_hw = _ct->addr_for(cur_row + BytesPerWord);
> >>> +      }
> >>> +
> >>>      // Reset the dirty window, while continuing to look
> >>>      // for the next dirty card that will start a
> >>>      // new dirty window.
> >>>      end_of_non_clean = cur_hw;
> >>>      start_of_non_clean = cur_hw;
> >>> 
> >>> I think it's easier to follow.
> >>> 
> >>> 
> >> agreed
> >> 
> >> 
> >>> cardTableRS.hpp:
> >>> ---------------
> >>> 
> >>> 48   // making this constants into inline methods kills performance for
> >>> some reason
> >>> 49   static intptr_t clean_card_row() {
> >>> 50     return CardTableModRefBS::clean_card_row;
> >>> 51   }
> >>> 
> >>> The comment seems to warn against the very thing the code is doing.
> >>> If the comment is accurate, it's fine to eliminate the method and use
> >>> the enum directly.
> >>> 
> >> 
> >> comment was included in patch by accident, removed
> >> 
> >> 
> >>> 53   static bool  card_is_dirty_wrt_gen_iter(jbyte cv) {
> >>> 54     return CardTableModRefBS::card_is_dirty_wrt_gen_iter(cv);
> >>> 
> >>> This whitespace change looks spurios.
> >>> 
> >> 
> >> true
> >> 
> >> 
> >>> 
> >>> -John
> >>> 
> >>>> -------- Original Message --------
> >>>> Subject:      Request for review (S): 7068625 Testing 8 bytes of card
> >>> table
> >>>> entries at a time speeds up card-scanning
> >>>> Date:         Tue, 21 Feb 2012 12:03:50 +0400
> >>>> From:         Alexey Ragozin <alexey.ragozin at gmail.com>
> >>>> To:   hotspot-gc-dev at openjdk.java.net
> >>>> CC:   Bengt Rutisson <bengt.rutisson at oracle.com>
> >>>> 
> >>>> 
> >>>> 
> >>>> Hi,
> >>>> 
> >>>> I would like few volunteers to review changes for
> >>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068625
> >>>> WebRev: http://cr.openjdk.java.net/~brutisso/7068625/webrev.00/
> >>>> <http://cr.openjdk.java.net/%7Ebrutisso/7068625/webrev.00/>
> >>>> 
> >>>> Change summary
> >>>> For large heaps (I was focusing on 8GiB and above) it is common to have
> >>>> long continuous ranges of clean cards.
> >>>> Patch is introducing a short path for skipping ranges of clean cards
> >>>> using word aligned memory access instead of byte aligned.
> >>>> 
> >>>> Patch affects serial and CMS collectors. For CMS collector stride size
> >>>> should be increase to see any performance gains (I was using
> >>>> -XX:+UnlockDiagnosticVMOptions
> >>>> -XX:ParGCCardsPerStrideChunk=4096)
> >>>> 
> >>>> For testing I was mainly using synthetic benchmark randomly modifying
> >>>> hash tables in heap, thus uniformly touching cards across heaps.
> >>>> Average duration of young GC pause were used as KPI.
> >>>> More details about testing can be found at
> >>>> 
> >>> http://blog.ragozin.info/2011/07/openjdk-patch-cutting-down-gc-pause.html
> >>>> Though article is referring jdk6, my resent tests with trunk jdk7 show
> >>>> no difference.
> >>>> I was also tested patch with real application (Oracle Coherence storage
> >>>> node).
> >>>> With 16GiB of heap and CMS/ParNew GC, enabling patch have shortened GC
> >>>> pauses roughly in 2 times.
> >>>> 
> >>>> Source code of benchmark used in test are available at
> >>>> 
> >>> https://gridkit.googlecode.com/svn/branches/aragozin-sandbox/young-gc-bench
> >>>> Main class YoungGCPauseBenchmark
> >>>> 
> >>>> Regards,
> >>>> Alexey
> >>>> 
> >>>