G1 question: concurrent cleaning of dirty cards
Thomas Schatzl
thomas.schatzl at oracle.com
Wed Jul 17 11:20:39 UTC 2013
Hi,
trying to revive that somewhat dying thread with some suggestions...
On Fri, 2013-06-28 at 16:02 -0700, Igor Veresov wrote:
> The mutator processing doesn't solve it. The card clearing event is
> still asynchronous with respect to possible mutations in other
> threads. While one mutator thread is processing buffers and clearing
> cards the other can sneak in and do the store to the same object that
> will go unnoticed. So I'm afraid it's either a store-load barrier, or
> we need to stop all mutator threads to prevent this race, or worse..
One option to reduce the overhead of the StoreLoad barrier is to
execute it only when it is actually needed; in practice a large part of
the stores go to the young gen.
These stores are filtered out by the existing mechanism anyway: young
gen cards are always dirty and never reset to clean.
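Roughly, the filtering in the current post-write barrier looks
something like the following sketch (simplified C++-like pseudo code;
the helper names are approximations, not the actual generated barrier):

  // Simplified sketch of the existing G1 post-write barrier filtering.
  void post_write_barrier(oop* field, oop new_val) {
    // same-region stores need no remembered set entry
    if ((((uintptr_t)field ^ (uintptr_t)new_val)
         >> HeapRegion::LogOfHRGrainBytes) == 0) return;
    if (new_val == NULL) return;
    volatile jbyte* card = byte_for(field);
    // stores into the young gen: cards are pre-dirtied and never
    // cleaned concurrently, so no card mark and no StoreLoad needed
    if (*card == g1_young_card_val) return;
    OrderAccess::storeload();            // the expensive part
    if (*card != dirty_card_val) {
      *card = dirty_card_val;
      enqueue(card);
    }
  }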
An auxiliary (e.g. per-region) table could be used that indicates
whether, for a particular region, we actually need the card mark and
the StoreLoad barrier.
Outside of safepoints, entries in that table are only ever marked
dirty, never reset to clean. This could be done without synchronization
I think, as in the worst case a thread will see from the card table
that the corresponding region's cards are dirty (i.e. the access will
be filtered anyway).
The impact of the additional check in the barrier might be offset to
some degree by the cache bandwidth saved by not accessing the card
table (and by avoiding the StoreLoad barrier for most accesses). The
per-region table should be small (a byte per region would be
sufficient).
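In barrier pseudo code the idea would look roughly like this (all names
invented, just to illustrate where the table check would go):

  // Hypothetical variant with a per-region "needs barrier" byte table.
  // Outside of safepoints the table is only ever set, never cleared, so
  // a racy read at worst sends a thread down the full path where the
  // card table filters the access anyway.
  void post_write_barrier(oop* field, oop new_val) {
    if (is_same_region(field, new_val) || new_val == NULL) return;
    if (!needs_refinement_barrier[region_index(field)]) {
      // e.g. young regions: always dirty, never cleaned concurrently
      return;
    }
    OrderAccess::storeload();             // only paid for flagged regions
    volatile jbyte* card = byte_for(field);
    if (*card != dirty_card_val) {
      *card = dirty_card_val;
      enqueue(card);
    }
  }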
One could actually run tests where the card table lookup is completely
disabled, always handling mutations in the areas not covered by this
table.
If that area is sufficiently small, this could be an option.
> On Jun 28, 2013, at 1:53 PM, John Cuthbertson
> <john.cuthbertson at oracle.com> wrote:
>
> > Hi Igor,
> >
> > Yeah G1 has that facility right now. In fact you added it. :) When
> > the number of completed buffers is below the green zone upper limit,
> > none of the refinement threads are refining buffers. That is, the
> > green zone upper limit is the number of buffers that we expect to be
> > able to process during the GC without it going over some percentage
> > of the pause time (I think the default is 10%). When the number of
> > buffers grows above the green zone upper limit, the refinement
> > threads start processing the buffers in a stepped manner.
> >
> > So during the safepoint we would process N - green-zone-upper-limit
> > completed buffers. In fact we could have a watcher task that
> > monitors the number of completed buffers and triggers a safepoint
> > when the number of completed buffers becomes sufficiently high - say
> > above the yellow-zone upper limit.
> >
> > That does away with the whole notion of concurrent refinement but
> > will remove a lot of the nasty complicated code that gets executed
> > by the mutators or refinement threads.
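Such a watcher could probably be fairly simple; a rough sketch
(invented names, not existing code):

  // Illustrative only: periodically check the number of completed
  // update buffers and force a safepoint above the yellow zone limit.
  void CompletedBufferWatcher::check() {
    size_t n = dirty_card_queue_set.completed_buffers_num();
    if (n > yellow_zone_upper_limit) {
      // the VM operation processes (or snapshots) the buffers above
      // the green zone upper limit while the mutators are stopped
      VMThread::execute(new VM_ProcessCompletedBuffers());
    }
  }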
I think it is possible to only reset the card table at the safepoint;
the buffers that were filled before taking the snapshot can still be
processed concurrently afterwards.
(That is also Igor's suggestion from the other email, I think.)
That may be somewhat expensive for very large heaps, but as you
mention, that effort could be limited by only cleaning the cards that
have a completed buffer entry.
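In pseudo code, the safepoint part of that could look about like this
(again, all names are made up for illustration):

  // Sketch: snapshot the completed buffers at the safepoint and clean
  // only the cards they reference; the snapshot is refined concurrently
  // afterwards. A later store to one of these cards finds it clean
  // again, re-dirties and re-enqueues it, so no update is lost and the
  // mutator/refinement race on card clearing disappears.
  void VM_ResetDirtyCards::doit() {       // runs inside the safepoint
    BufferList snapshot;
    dirty_card_queue_set.take_completed_buffers(&snapshot);
    for (CardBuffer* buf = snapshot.head(); buf != NULL; buf = buf->next()) {
      for (size_t i = 0; i < buf->length(); i++) {
        *buf->card_at(i) = clean_card_val; // only cards with buffer entries
      }
    }
    concurrent_refinement->enqueue(&snapshot); // processed after the safepoint
  }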
> > My main concern is that we would be potentially increasing the
> > number and duration of non-GC safepoints which cause issues with
> > latency sensitive apps. For those workloads that only care about 90%
> > of the transactions this approach would probably be fine.
> >
> > We would need to evaluate the performance of each approach.
Hth,
Thomas