RFR (M): 8077144: Concurrent mark initialization takes too long

Thomas Schatzl thomas.schatzl at oracle.com
Fri Mar 18 12:04:20 UTC 2016


Hi Mikael,

On Thu, 2016-03-17 at 14:12 +0100, Mikael Gerdin wrote:
> Hi Thomas,
> 
> On 2016-03-15 23:12, Thomas Schatzl wrote:
> > Hi Mikael,
> > 
> >    updated webrev at
> > http://cr.openjdk.java.net/~tschatzl/8077144/webrev.3/ (full)
> > http://cr.openjdk.java.net/~tschatzl/8077144/webrev.2_to_3/ (diff)
> 
> Overall I think this looks okay, given that you want to put further 
> work into this part of the code.

Thanks. I will send out the RFR for the cleanup shortly.

> For future cleanups, here's a couple of comments

I added these change requests to that changeset, and will answer to
them there individually. They seem to be minor, and adding them here
would just make me lots of additional work.

Thanks a lot,
  Thomas


> g1ConcurrentMark.cpp:
> 
> "inline void set_card_bitmap_range(BitMap* bm,"
> 
> Since the name already implies that it mutates the card bitmap,
> perhaps 
> remove the "bm" parameter"?
> The "inline" keyword is superfluous here, the method is already
> defined 
> inline in the class.
> 
> "  // Updates the live data count for the given heap region and
> returns 
> the number
>    // of bytes marked.
>    size_t create_live_data_count(HeapRegion* hr) {"
> 
> The comment and method name could be clearer with the fact that it's 
> re-creating the data for verification purposes.
> 
> "  void set_bit_for_region(HeapRegion* hr) {
>      BitMap::idx_t index = (BitMap::idx_t) hr->hrm_index();
>      _region_bm->par_at_put(index, true);
>    }"
> 
> I'm not convinced that the _region_bm even needs to be part of the
> "helper".
> Perhaps it would be cleaner if the helper class was only responsible 
> for the card bitmap and the modification of the region bit map was 
> moved elsewhere?
> 
> 
> /Mikael
> 
> 
> > 
> > which implements the suggested changes.
> > 
> > It makes the change a lot nicer, thanks for the suggestion. There
> > are
> > still possibilities for improvements. I would prefer to do larger
> > refactorings (particularly to improve testability/code quality and
> > add
> > test cases) after moving all this related code into a separate
> > class/file, which will be next in JDK-8151386 and maybe one or two
> > follow-ups.
> > With that in place, this liveness information can then also be used
> > with fully coarsened per-region tables in remset.
> > 
> > Testing:
> > jprt, vm.gc with -XX:+VerifyDuringGC, local checking of
> > marked_bytes
> > values.
> > 
> > Thanks,
> >    Thomas
> > 
> > On Mon, 2016-03-14 at 16:37 +0100, Mikael Gerdin wrote:
> > > Hi,
> > > 
> > > I had an IM discussion with Thomas around some issues with the
> > > design
> > > of
> > > the aggregation and verification code.
> > > 
> > > G1LiveDataClosureBase should become a utlility class instead of a
> > > base
> > > class. It's weird that the G1VerifyLiveDataHRClosure embeds
> > > another
> > > HRClosure and calls doHeapRegion on it, it would be better if the
> > > mark_marked_during_marking and mark_allocated_since_marking
> > > methods
> > > could be called right away.
> > > 
> > > Mark_marked_during_marking should return the marked bytes instead
> > > and
> > > let the caller take care of mutating the heap region and doing
> > > the
> > > yield
> > > check. There is currently a bug where the verification code calls
> > > add_to_marked_bytes on the HeapRegion. There is also an issue
> > > with
> > > how
> > > add_to_marked_bytes is called on HumongousContinues regions since
> > > marked_bytes is aggregated in each iteration and then the
> > > aggregate
> > > is
> > > added to the current hr.
> > > 
> > > It might be a good idea to hold off on reviews until the updated
> > > webrev
> > > is out.
> > > 
> > > /Mikael
> > > 
> > > On 2016-03-14 14:15, Thomas Schatzl wrote:
> > > > Hi all,
> > > > 
> > > >     could I have reviews for this from-scratch solution for the
> > > > problem
> > > > that G1 startup takes too long?
> > > > 
> > > > Current G1 uses per-mark thread liveness mark bitmaps that span
> > > > the
> > > > entire heap to be ultimately able to create information about
> > > > areas
> > > > in
> > > > the heap where there are any live objects on a card basis.
> > > > This information is needed for scrubbing remembered sets later.
> > > > 
> > > > Basically, in addition to updating the previous bitmap required
> > > > for
> > > > SATB, the marking threads also, for every live object, mark all
> > > > bits
> > > > corresponding to the area the object covers on a per thread
> > > > basis
> > > > on
> > > > these per-thread liveness mark bitmaps.
> > > > 
> > > > During the remark pause, this information is aggregated into
> > > > (two)
> > > > global bitmaps ("Liveness Count Data"), then in the cleanup
> > > > pause
> > > > augmented with some more liveness information, and then used
> > > > for
> > > > scrubbing the remembered sets.
> > > > 
> > > > The main problems with that solution:
> > > > 
> > > > - the per-mark thread data structures take up a lot of space.
> > > > E.g.
> > > > with
> > > > 64 mark threads, this data structure has the same size of the
> > > > Java
> > > > heap. Now, when you need to use 60 mark threads, the heap is
> > > > big.
> > > > And
> > > > at those heap sizes, needing that much more memory hurts a lot.
> > > > 
> > > > - management of these additional data structures is costly, it
> > > > takes a
> > > > long time to initialize, and regularly clear them. The
> > > > increased
> > > > startup time has actually been the cause for this issue.
> > > > 
> > > > - it takes a significant amount of time to aggregate this data
> > > > in
> > > > the
> > > > remark pause.
> > > > 
> > > > - it slows down marking, the combined bitmap update (the prev
> > > > bitmap
> > > > and these per-thread bitmaps) is slower than doing these phases
> > > > seperately.
> > > > 
> > > > This proposed solution removes the per-thread additional mark
> > > > bitmaps,
> > > > and recreates this information from the (complete) prev bitmap
> > > > in
> > > > an
> > > > extra concurrent phase after the Remark pause.
> > > > 
> > > > This can be done since the Prev bitmap does not change after
> > > > Remark
> > > > any
> > > > more.
> > > > 
> > > > In total, this separation of the tasks is faster (lowers
> > > > concurrent
> > > > cycle time) than doing this work at once for the following
> > > > reasons:
> > > > 
> > > >     - I did not observe any throughput regresssions with this
> > > > change:
> > > > actually, throughput of some large applications even increases
> > > > with
> > > > that change (not taking into account that you could increase
> > > > heap
> > > > size
> > > > now since not so much is taken up by these additional bitmaps).
> > > > 
> > > >     - the concurrent phase to prepare for the next marking is
> > > > much
> > > > shorter now, since we do not need to clear lots of memory any
> > > > more.
> > > > 
> > > >     - the remark pause can be much faster (I have measurements
> > > > of a
> > > > decrease in the order of a magnitude on large applications,
> > > > where
> > > > this
> > > > aggregation phase dominates the remark pause).
> > > > 
> > > >     - startup and footprint naturally decreases significantly,
> > > > particularly on large systems.
> > > > 
> > > > As a nice side-effect, the change in effect removes a
> > > > significant
> > > > amount of LOC.
> > > > 
> > > > There is a follow-up change to move (and later clean up) the
> > > > still
> > > > remaining data structures required for scrubbing into extra
> > > > classes,
> > > > since they will be used more cleverly in the future (JDK
> > > > -8151386).
> > > > 
> > > > There will be another follow-up change without CR yet to fix
> > > > the
> > > > use of
> > > > an excessive amount of parallel gc threads for clearing the
> > > > liveness
> > > > count data.
> > > > 
> > > > The change is based on JDK-8151614, JDK-8151126 (I do not think
> > > > it
> > > > conflicts with that actually), and JDK-8151534 (array allocator
> > > > refactoring).
> > > > 
> > > > CR:
> > > > https://bugs.openjdk.java.net/browse/JDK-8077144
> > > > Webrev:
> > > > http://cr.openjdk.java.net/~tschatzl/8077144/webrev.2/
> > > > Testing:
> > > > jprt, vm.gc, kitchensink, some perf benchmarks
> > > > 
> > > > Thanks,
> > > >     Thomas
> > > > 



More information about the hotspot-gc-dev mailing list