RFR(M/L): 7176479: G1: JVM crashes on T5-8 system with 1.5 TB heap

John Cuthbertson john.cuthbertson at oracle.com
Thu Jan 24 19:34:32 UTC 2013


Hi Vitaly.

I'm not sure it's an issue. I don't recall seeing any unflushed 
writes from the GC when the VM leaves the safepoint, even on an RMO 
architecture (Itanium).

JohnC

On 1/23/2013 4:41 PM, Vitaly Davidovich wrote:
>
> Got it now - thanks.  So then does exiting the safepoint guarantee 
> that these writes are flushed to memory, so that the next time the GC 
> threads run they see 0s? Or is that not important/enforced elsewhere?
>
> Sent from my phone
>
> On Jan 23, 2013 7:36 PM, "John Cuthbertson" 
> <john.cuthbertson at oracle.com> wrote:
>
>     Hi Vitaly,
>
>     On 1/23/2013 4:19 PM, Vitaly Davidovich wrote:
>>
>>     Hi John,
>>
>>     Thanks for this explanation as well.  I see what you're saying
>>     about the concurrency control, but what I don't understand is
>>     when this is called:
>>
>>     void reset_hot_cache() {
>>       _hot_cache_idx = 0; _n_hot = 0;
>>     }
>>
>>     Since these are plain stores, what exactly ensures that they're
>>     (promptly) visible to other GC threads? Is there some dependency
>>     here, e.g. if you see _n_hot = 0 then _hot_cache_idx must also be
>>     zero? I strongly suspect I missed the details in your response
>>     that explain why this isn't a concern.  Is there only a
>>     particular type of thread that can call reset_hot_cache and/or
>>     only at a certain point? It kind of sounds like it, so I don't
>>     know if there's an assert that can be added to verify that.
>>
>
>     At the point where this routine is called, the GC workers have
>     finished the parallel phases of the GC and are idle. The only
>     thread that is running is the VM thread. The rest of the VM is at
>     a safepoint, so we are, in effect, single-threaded. Yes, there is
>     an assert we can add here:
>
>     assert(SafepointSynchronize::is_at_safepoint() &&
>     Thread::current()->is_VM_thread(), "...");
>
>     JohnC
>
>>     Thanks
>>
>>     Sent from my phone
>>
>>     On Jan 23, 2013 5:51 PM, "John Cuthbertson"
>>     <john.cuthbertson at oracle.com> wrote:
>>
>>         Hi Vitaly,
>>
>>         Thanks for looking over the code changes. I'll respond to
>>         your other comments in a separate email. Detailed responses
>>         inline....
>>
>>         On 1/15/2013 4:57 PM, Vitaly Davidovich wrote:
>>
>>
>>             Hi John,
>>
>>             Wow, that's a giant heap! :)
>>
>>             I think G1ConcRSLogCacheSize needs to be validated to
>>             make sure it's <= 31; otherwise, I think you get
>>             undefined behavior when left-shifting by it.
>>
>>
>>         Good catch. Done.
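>>
>>         Roughly the kind of check meant here, as a standalone sketch
>>         (the flag name is real, but the helper and the place it would
>>         be called from are illustrative, not the actual change):
>>
>>             #include <cstdio>
>>             #include <cstddef>
>>
>>             // Stand-in for the G1ConcRSLogCacheSize command-line flag.
>>             static size_t G1ConcRSLogCacheSize = 10;
>>
>>             static bool verify_log_cache_size() {
>>               // Shifting a 32-bit value by 32 or more bits is
>>               // undefined behavior, so cap the flag at 31 before
>>               // computing (size_t)1 << G1ConcRSLogCacheSize below.
>>               if (G1ConcRSLogCacheSize > 31) {
>>                 fprintf(stderr, "G1ConcRSLogCacheSize must be <= 31\n");
>>                 return false;
>>               }
>>               return true;
>>             }
>>
>>             int main() {
>>               if (!verify_log_cache_size()) return 1;
>>               size_t cache_size = (size_t)1 << G1ConcRSLogCacheSize;
>>               printf("hot card cache size: %zu entries\n", cache_size);
>>               return 0;
>>             }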
>>
>>             I don't think you need _def_use_cache -- couldn't it be
>>             replaced with G1ConcRSLogCacheSize > 0?
>>
>>
>>         Done. I've added a function that returns the result of the
>>         comparison and I use that in place of _def_use_cache.
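>>
>>         Roughly (reusing the flag stand-in from the sketch above; the
>>         helper's actual name in the patch may differ):
>>
>>             // Helper replacing the _def_use_cache field.
>>             static bool default_use_cache() {
>>               return G1ConcRSLogCacheSize > 0;
>>             }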
>>
>>             I'm sure this is due to my lack of G1 knowledge, but the
>>             concurrency control inside g1HotCardCache is a bit
>>             unclear. There's a CAS to claim the region of cards,
>>             there's a HotCache lock for inserting a card.  However,
>>             reset_hot_cache() does a naked write of a few fields.
>>              Are there any visibility and ordering constraints that
>>             need to be enforced? Do some of the stores need an
>>             OrderAccess barrier of some sort, depending on what's
>>             required? Sorry if I'm just missing it ...
>>
>>
>>         The drain routine is only called from within a GC pause but
>>         it is called by multiple GC worker threads. Each worker will
>>         claim a chunk of cards using the CAS and refine them.
>>         Resetting the boundaries (the values reset by
>>         reset_hot_cache()) in the drain routine would be a mistake
>>         since a worker thread could see the new boundary values and
>>         return early, potentially leaving some cards unrefined and
>>         some entries missing from remembered sets. I can only clear the
>>         fields when the last thread has finished draining the cache.
>>         The best place to do this is just before the VM thread
>>         re-enables the cache (we know the worker threads will have
>>         finished at this point). Since the "drain" doesn't actually
>>         drain, perhaps a better name might be refine_all()?
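>>
>>         As a rough standalone model of the claiming scheme (the names
>>         and chunk size are illustrative, and HotSpot uses its own
>>         Atomic::cmpxchg rather than std::atomic):
>>
>>             #include <atomic>
>>             #include <cstddef>
>>
>>             struct HotCacheDrainModel {
>>               static const size_t ChunkSize = 64;
>>               void**              _hot_cache;  // buffered card pointers
>>               size_t              _n_hot;      // number of valid entries
>>               std::atomic<size_t> _claim_idx;  // next unclaimed index
>>
>>               // Called by each GC worker; workers atomically claim
>>               // disjoint chunks of [0, _n_hot) and refine those cards.
>>               void drain(void (*refine_card)(void*)) {
>>                 for (;;) {
>>                   size_t start = _claim_idx.fetch_add(ChunkSize);
>>                   if (start >= _n_hot) return;  // nothing left to claim
>>                   size_t end = (start + ChunkSize < _n_hot)
>>                                    ? start + ChunkSize : _n_hot;
>>                   for (size_t i = start; i < end; ++i) {
>>                     refine_card(_hot_cache[i]);  // refine; don't remove
>>                   }
>>                 }
>>               }
>>               // Deliberately no reset of _claim_idx or _n_hot here:
>>               // reset_hot_cache() runs later, single-threaded, just
>>               // before the VM thread re-enables the cache.
>>             };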
>>
>>         The HotCache lock is used when adding entries to the cache.
>>         Entries are added by the refinement threads (and there will
>>         most likely be more than one). Since the act of adding an
>>         entry can also evict an entry, we need the lock to guard
>>         against hitting the ABA problem. Hitting it could result in
>>         skipping the refinement of a card, which would lead to
>>         missing remembered set entries, and those are not fun to
>>         track down.
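>>
>>         To make the eviction hazard concrete, a minimal locked-insert
>>         sketch (the ring-buffer layout is an approximation, and the
>>         real code uses a HotSpot Mutex rather than std::mutex):
>>
>>             #include <mutex>
>>             #include <cstddef>
>>
>>             struct HotCacheInsertModel {
>>               std::mutex _hot_cache_lock;  // the "HotCache" lock
>>               void**     _hot_cache;
>>               size_t     _capacity;        // 1 << G1ConcRSLogCacheSize
>>               size_t     _hot_cache_idx;   // next slot to write (wraps)
>>               size_t     _n_hot;           // current number of entries
>>
>>               // Returns the evicted card, which the caller must still
>>               // refine, or null if the cache was not yet full.
>>               void* insert(void* card) {
>>                 std::lock_guard<std::mutex> guard(_hot_cache_lock);
>>                 void* evicted =
>>                     (_n_hot == _capacity) ? _hot_cache[_hot_cache_idx]
>>                                           : nullptr;
>>                 _hot_cache[_hot_cache_idx] = card;  // may evict oldest
>>                 _hot_cache_idx = (_hot_cache_idx + 1) % _capacity;
>>                 if (_n_hot < _capacity) _n_hot++;
>>                 return evicted;  // dropping it would lose a refinement
>>               }
>>             };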
>>
>>         Draining during the GC is immune to the ABA problem because
>>         we're not actually removing entries from the cache. We would
>>         still be immune, however, even if we were removing entries,
>>         since we would not be adding entries at the same time.
>>
>>         Thanks,
>>
>>         JohnC
>>
>
