RFR(M/L): 7176479: G1: JVM crashes on T5-8 system with 1.5 TB heap
Vitaly Davidovich
vitalyd at gmail.com
Wed Jan 16 00:57:07 UTC 2013
Hi John,
Wow, that's a giant heap! :)
I think G1ConcRSLogCacheSize needs to be validated to make sure it's <= 31;
otherwise, left shifting by it is undefined behavior.
I don't think you need _def_use_cache -- couldn't it be replaced with a
check that G1ConcRSLogCacheSize > 0?
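To illustrate the kind of check I mean (a hypothetical guard at flag
processing time -- the exact function and wording here are just my
sketch, not the actual argument-processing code):

    // Shifting a 32-bit value by >= 32 bits is undefined behavior in
    // C++, and 1 << 31 overflows a signed int, so validate the flag
    // before ever computing the cache size.
    if (G1ConcRSLogCacheSize > 31) {
      vm_exit_during_initialization("G1ConcRSLogCacheSize is too large");
    }
    size_t hot_cache_size = (size_t)1 << G1ConcRSLogCacheSize;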
I'm sure this is due to my lack of G1 knowledge, but the concurrency
control inside g1HotCardCache is a bit unclear to me. There's a CAS to
claim a region of cards, and a HotCache lock for inserting a card.
However, reset_hot_cache() does a naked write of a few fields. Are there
any visibility and ordering constraints that need to be enforced? Do some
of the stores need an OrderAccess barrier of some sort, depending on
what's required? Sorry if I'm just missing it ...
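For instance, if reset_hot_cache() can run while refinement threads are
still reading those fields, I would have expected something along these
lines (purely illustrative -- whether a storestore, a release, or a full
fence is needed depends on what the actual invariants are):

    // Sketch only: reset the fields, then order the stores so a
    // concurrent reader cannot observe them out of order with respect
    // to whatever store publishes the reset.
    _hot_cache_idx = 0;
    _n_hot = 0;
    OrderAccess::storestore();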
I didn't finish looking at the rest yet so that's all I have for the moment.
Thanks
Sent from my phone
On Jan 15, 2013 6:32 PM, "John Cuthbertson" <john.cuthbertson at oracle.com>
wrote:
> Hi Everyone,
>
> Can I have a couple of people look over the changes for this CR - the
> webrev can be found at: http://cr.openjdk.java.net/~johnc/7176479/webrev.0/
>
> Background:
> The issue here was that we were encoding the card index into the card
> counts table entries, along with the GC number, so that we could
> determine whether the count associated with a card was valid. We had a
> check to ensure that the maximum card index could be encoded in an int.
> With such a large heap size, the number of cards could not be encoded,
> so the check failed.
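> For concreteness: with the usual 512-byte cards, a 1 TB heap contains
> 2^40 / 2^9 = 2^31 cards, which already exceeds the largest card index
> representable in a 32-bit int (2^31 - 1). Hence the limit of just
> under 1 TB that this check enforced.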
>
> The previous mechanism was an attempt to solve the problem of one thread
> arriving late to the actual GC work. The thread in question was being held
> up zeroing the card counts table at the start of the GC. The card counts
> table is used to determine which cards are being refined frequently. Once a
> card has been refined frequently enough, further refinements of that card
> are delayed by placing the card into a fixed size evicting table - the hot
> card cache. The card would then be refined when it was evicted from the hot
> card cache or when the cache was drained during the next GC.
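> Conceptually, the eviction behaves like a fixed-size circular buffer
> (a minimal stand-alone sketch, not the actual G1 code; the names are
> illustrative, with jbyte* used for card pointers as in the card table
> code):
>
>     // Hypothetical evicting buffer of "hot" card pointers.
>     class HotCardBuffer {
>       jbyte** _buf;    // cached card pointers, NULL when empty
>       size_t  _size;   // capacity (a power of two)
>       size_t  _idx;    // next insertion slot
>     public:
>       HotCardBuffer(size_t size)
>         : _buf(new jbyte*[size]()), _size(size), _idx(0) { }
>       // Cache 'card'; return the card it displaced, or NULL.
>       // The caller refines the returned card, if any.
>       jbyte* insert(jbyte* card) {
>         jbyte* evicted = _buf[_idx];
>         _buf[_idx] = card;
>         _idx = (_idx + 1) & (_size - 1);
>         return evicted;
>       }
>     };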
>
> To solve the problem of zeroing, we added an epoch (GC number) to the
> entries in the counts table and, to eliminate the increase in footprint,
> we made the counts table into a cache that would expand if needed. This
> approach had some negatives: we might have to refine two cards during a
> single refinement operation, and hashing the card and performing CAS
> operations increased the overhead of concurrent refinement. Expanding
> the counts table during a GC also incurred a penalty.
>
> This approach also limited the heap size to just under 1TB - which the
> systems team ran into.
>
> The new approach effectively undoes the previous mechanism and
> re-simplifies the card counts table.
>
> Summary of Changes:
> The hot card cache and card counts table have been moved from the
> concurrent refinement code into their own files.
>
> The hot card cache can now exist independently of whether the counts table
> exists. In this case refining a card once adds it to the hot card cache,
> i.e. all cards are treated as 'hot'.
>
> The interface to the hot card cache has been simplified to a simple
> query and a simple drain routine. This simplifies the calling code in
> g1RemSet.cpp and results in at most one card being refined per call to
> "refine_card", instead of possibly two. This should reduce the overhead
> of concurrent refinement.
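> To sketch the resulting caller pattern (illustrative only, not the
> exact code):
>
>     // Either the incoming card is cached (nothing to refine now), or
>     // it displaces an older card, which is refined in its place --
>     // at most one card is refined per refine_card call.
>     jbyte* res = _hot_card_cache->insert(card_ptr);
>     if (res == NULL) {
>       return;         // card was cached; it gets refined on eviction
>     }
>     card_ptr = res;   // refine the evicted card instead
>     // ... proceed to refine card_ptr ...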
>
> The number of cards that the hot card cache can hold before cards start
> getting evicted is controlled by the flag G1ConcRSLogCacheSize, which is
> now a product flag. The default value is 10, giving a hot card cache
> that can hold 1K cards.
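> For example (a hypothetical command line, once the flag is a product
> flag):
>
>     java -XX:+UseG1GC -Xmx1536g -XX:G1ConcRSLogCacheSize=12 ...
>     # cache capacity = 2^12 = 4096 cards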
>
> The card counts table has been greatly simplified. It is now a simple
> array of counts of how many times each card has been refined. The space
> for the table is allocated from virtual memory instead of C heap. It is
> committed when the heap is initially committed and spans the committed
> size of the heap. When the committed size of the heap is expanded, the
> counts table is expanded to cover the newly committed heap. If we fail
> to commit the memory for the counts table, cards that map to the
> uncommitted space will be treated as cold, i.e. they will be refined
> immediately. Having a simpler counts table should also reduce the
> overhead of concurrent refinement (there is no need to hash the card
> index and there are no CAS operations). Having a simpler interface will
> also allow us to change the underlying data structure to an alternative
> that's perhaps more sparse in the future.
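> In other words, the per-card bookkeeping reduces to something like the
> following (a sketch under my own naming; G1ConcRSHotCardLimit is the
> existing hotness-threshold flag):
>
>     // Hypothetical: bump the count for a card; once it reaches the
>     // hot-card limit, the card is considered hot and gets cached
>     // instead of being refined immediately.
>     uint add_card_count(size_t card_index) {
>       if (card_index >= _committed_max_card_index) {
>         return 0;  // counts table not committed here: treat as cold
>       }
>       uint8_t count = _counts[card_index];
>       if (count < G1ConcRSHotCardLimit) {
>         _counts[card_index] = ++count;
>       }
>       return count;
>     }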
>
> During an incremental GC we no longer zero the entire counts table. We
> now zero the counts for the cards spanned by a region when that region
> is freed (i.e. when we free the collection set at the end of a GC and
> when we free regions at the end of a cleanup). If a card was "hot"
> before a GC, we will consider it hot after the GC, and the first
> refinement after the GC will insert the card into the hot card cache.
> Furthermore, since we don't refine cards in young regions, we only need
> to clear the counts associated with cards spanned by non-young regions.
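> Roughly (an illustrative sketch; ptr_2_card_index is a stand-in for
> whatever the address-to-card-index mapping is called):
>
>     // Hypothetical: when a non-young region is freed, zero only the
>     // counts spanned by that region, not the whole table.
>     void clear_region(HeapRegion* hr) {
>       size_t from = ptr_2_card_index(hr->bottom());
>       size_t to   = ptr_2_card_index(hr->end());
>       memset(&_counts[from], 0, (to - from) * sizeof(_counts[0]));
>     }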
>
> During a full GC we still discard the entries in the hot card cache and
> zero the counts for all the cards in the heap.
>
> Testing:
> GC Test suite with MaxTenuringThreshold=0 (to increase the amount of
> refinement) and a low IHOP value (to force cleanups).
> SPECjbb2005 with a 1.5TB heap size and 256GB young size,
> MaxTenuringThreshold=0 and a low IHOP value (1%). The systems team are
> continuing to test with very large heaps.
>
>