RFR(M/L): 7176479: G1: JVM crashes on T5-8 system with 1.5 TB heap
Vitaly Davidovich
vitalyd at gmail.com
Thu Jan 24 00:19:51 UTC 2013
Hi John,
Thanks for this explanation as well. I see what you're saying about the
concurrency control, but what I don't understand is when this is called:
void reset_hot_cache() {
  _hot_cache_idx = 0; _n_hot = 0;
}
Since these are plain stores, what exactly ensures that they're (promptly)
visible to other GC threads? Is there some dependency here, e.g. if you see
_n_hot = 0 then _hot_cache_idx must also be zero? I strongly suspect I
missed the details in your response that explain why this isn't a concern.
Is there only a particular type of thread that can call reset_hot_cache
and/or only at a certain point? It kind of sounds like that's the case, so I
don't know whether there's an assert that could be added to verify it.
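To make the question concrete, something like this is the kind of assert I have
in mind (just a sketch; the is_at_safepoint()/is_VM_thread() conditions are my
guess at the intended invariant, not something taken from the patch):

  // Hypothetical guard: only legal for the VM thread at a safepoint, so the
  // plain stores below cannot race with GC worker threads reading these fields.
  void reset_hot_cache() {
    assert(SafepointSynchronize::is_at_safepoint() && Thread::current()->is_VM_thread(),
           "reset_hot_cache should only be called by the VM thread at a safepoint");
    _hot_cache_idx = 0; _n_hot = 0;
  }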
Thanks
Sent from my phone
On Jan 23, 2013 5:51 PM, "John Cuthbertson" <john.cuthbertson at oracle.com>
wrote:
> Hi Vitaly,
>
> Thanks for looking over the code changes. I'll respond to your other
> comments in a separate email. Detailed responses inline....
>
> On 1/15/2013 4:57 PM, Vitaly Davidovich wrote:
>
>>
>> Hi John,
>>
>> Wow, that's a giant heap! :)
>>
>> I think G1ConcRSLogCacheSize needs to be validated to make sure it's <=
>> 31; otherwise, I think you get undefined behavior on left shifting with it.
>>
>>
> Good catch. Done.
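As a rough illustration of the kind of range check being discussed here (the
guarantee() wording and its exact placement are assumptions, not the actual
webrev change):

  // G1ConcRSLogCacheSize is used as a shift count when sizing the hot card
  // cache; shifting a 32-bit value by 32 or more is undefined behavior in
  // C++, hence the upper bound of 31.
  guarantee(G1ConcRSLogCacheSize <= 31,
            "G1ConcRSLogCacheSize must be <= 31 to keep the left shift defined");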
>
> I don't think you need _def_use_cache -- can be replaced with
>> G1ConcRSLogCacheSize > 0?
>>
>>
> Done. I've added a function that returns the result of the comparison and
> I use that in place of G1ConcRSLogCacheSize.
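A sketch of what such a predicate might look like (the name default_use_cache
is a guess, not necessarily what the webrev uses):

  // Replaces the _def_use_cache field: the hot card cache is in use iff the
  // G1ConcRSLogCacheSize flag is non-zero.
  static bool default_use_cache() {
    return G1ConcRSLogCacheSize > 0;
  }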
>
> I'm sure this is due to my lack of G1 knowledge, but the concurrency
>> control inside g1HotCardCache is a bit unclear. There's a CAS to claim the
>> region of cards, there's a HotCache lock for inserting a card. However,
>> reset_hot_cache() does a naked write of a few fields. Are there any
>> visibility and ordering constraints that need to be enforced? Do some of
>> the stores need an OrderAccess barrier of some sort, depending on what's
>> required? Sorry if I'm just missing it ...
>>
>>
> The drain routine is only called from within a GC pause but it is called
> by multiple GC worker threads. Each worker will claim a chunk of cards
> using the CAS and refine them. Resetting the boundaries (the values reset
> by reset_hot_cache()) in the drain routine would be a mistake since a
> worker thread could see the new boundary values and return early, potentially
> leaving some cards unrefined and some entries missing from remembered sets. I
> can only clear the fields when the last thread has finished draining the
> cache. The best place to do this is just before the VM thread re-enables
> the cache (we know the worker threads will have finished at this point).
> Since the "drain" doesn't actually drain, perhaps a better name might be
> refine_all()?
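A minimal sketch of the chunk-claiming scheme described above (the field and
function names, the chunk size, and the Atomic::cmpxchg usage are illustrative
assumptions, not the actual HotSpot code):

  // Assumed fields: volatile jint _claim_idx; jint _n_hot; jbyte** _hot_cache.
  // Each GC worker repeatedly CASes the claim index forward and refines the
  // cards in the chunk it won. Note that the boundaries are NOT reset here;
  // that happens later, just before the VM thread re-enables the cache.
  void drain_during_gc() {
    const jint chunk_size = 64;   // assumed claim granularity
    while (true) {
      jint start = _claim_idx;
      if (start >= _n_hot) {
        return;                   // every card has been claimed by some worker
      }
      jint end = MIN2(start + chunk_size, _n_hot);
      // Try to claim [start, end); retry if another worker advanced the index.
      if (Atomic::cmpxchg(end, &_claim_idx, start) != start) {
        continue;
      }
      for (jint i = start; i < end; i++) {
        refine_card(_hot_cache[i]); // hypothetical per-card refinement call
      }
    }
  }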
>
> The HotCache lock is used when adding entries to the cache. Entries are
> added by the refinement threads (and there will most likely be more than
> one). Since the act of adding an entry can also evict an entry, we need the
> lock to guard against hitting the ABA problem. Hitting it could result in
> skipping the refinement of a card, which would lead to missing remembered
> set entries, and those are not fun to track down.
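A minimal sketch of the locked insert-with-eviction path being described (the
lock name, field names, and circular-buffer layout are assumptions, not the
actual HotSpot code):

  // Insert a card under the hot cache lock. If the cache is full, the oldest
  // entry is evicted and returned to the caller so it is refined immediately
  // rather than being silently dropped.
  jbyte* insert(jbyte* card_ptr) {
    MutexLockerEx ml(HotCache_lock, Mutex::_no_safepoint_check_flag);
    if (_n_hot == _hot_cache_size) {
      jbyte* evicted = _hot_cache[_hot_cache_idx];
      _hot_cache[_hot_cache_idx] = card_ptr;
      _hot_cache_idx = (_hot_cache_idx + 1) % _hot_cache_size;
      return evicted;
    }
    _hot_cache[(_hot_cache_idx + _n_hot) % _hot_cache_size] = card_ptr;
    _n_hot++;
    return NULL;
  }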
>
> Draining during the GC is immune to the ABA problem because we're not
> actually removing entries from the cache. We would still be immune, however,
> even if we were removing entries, since we would not be adding entries at
> the same time.
>
> Thanks,
>
> JohnC
>