RFR(M/L): 7176479: G1: JVM crashes on T5-8 system with 1.5 TB heap
John Cuthbertson
john.cuthbertson at oracle.com
Thu Jan 24 19:34:32 UTC 2013
Hi Vitaly.
I'm not sure it's an issue. I don't recall ever seeing unflushed writes
from the GC when the VM leaves the safepoint, even on an RMO
architecture (Itanium).
JohnC
On 1/23/2013 4:41 PM, Vitaly Davidovich wrote:
>
> Got it now - thanks. So then does exiting the safepoint guarantee
> that these writes are flushed to memory so next time GC threads run
> they see 0s? Or is that not important/enforced elsewhere?
>
> Sent from my phone
>
> On Jan 23, 2013 7:36 PM, "John Cuthbertson"
> <john.cuthbertson at oracle.com <mailto:john.cuthbertson at oracle.com>> wrote:
>
> Hi Vitaly,
>
> On 1/23/2013 4:19 PM, Vitaly Davidovich wrote:
>>
>> Hi John,
>>
>> Thanks for this explanation as well. I see what you're saying
>> about the concurrency control, but what I don't understand is
>> when this is called:
>>
>> void reset_hot_cache() {
>>   _hot_cache_idx = 0; _n_hot = 0;
>> }
>>
>> Since these are plain stores, what exactly ensures that they're
>> (promptly) visible to other GC threads? Is there some dependency
>> here, e.g. if you see _n_hot = 0 then _hot_cache_idx must also be
>> zero? I strongly suspect I missed the details in your response
>> that explain why this isn't a concern. Is there only a
>> particular type of thread that can call reset_hot_cache and/or
>> only at a certain point? It kind of sounds like it, so I don't
>> know if there's an assert that can be added to verify that.
>>
>
> At the point where this routine is called the GC workers have
> finished the parallel phases of the GC and are idle. The thread
> that is running is the VM thread. The rest of the VM is at a
> safepoint so we are, in effect, single threaded. Yes there is an
> assert we can add here:
>
> assert(SafepointSynchronize::is_at_safepoint() &&
> Thread::current()->is_VM_thread(), "...");
>
> JohnC
>
>> Thanks
>>
>> Sent from my phone
>>
>> On Jan 23, 2013 5:51 PM, "John Cuthbertson"
>> <john.cuthbertson at oracle.com
>> <mailto:john.cuthbertson at oracle.com>> wrote:
>>
>> Hi Vitaly,
>>
>> Thanks for looking over the code changes. I'll respond to
>> your other comments in a separate email. Detailed responses
>> inline....
>>
>> On 1/15/2013 4:57 PM, Vitaly Davidovich wrote:
>>
>>
>> Hi John,
>>
>> Wow, that's a giant heap! :)
>>
>> I think G1ConcRSLogCacheSize needs to be validated to
>> make sure it's <= 31; otherwise, I think you get
>> undefined behavior on left shifting with it.
>>
>>
>> Good catch. Done.
>>
>> I don't think you need _def_use_cache -- can be replaced
>> with G1ConcRSLogCacheSize > 0?
>>
>>
>> Done. I've added a function that returns the result of the
>> comparison and I use that in place of G1ConcRSLogCacheSize.
>>
>> I'm sure this is due to my lack of G1 knowledge, but the
>> concurrency control inside g1HotCardCache is a bit
>> unclear. There's a CAS to claim the region of cards,
>> there's a HotCache lock for inserting a card. However,
>> reset_hot_cache() does a naked write of a few fields.
>> Are there any visibility and ordering constraints that
>> need to be enforced? Do some of the stores need an
>> OrderAccess barrier of some sort, depending on what's
>> required? Sorry if I'm just missing it ...
>>
>>
>> The drain routine is only called from within a GC pause but
>> it is called by multiple GC worker threads. Each worker will
>> claim a chunk of cards using the CAS and refine them.
>> Resetting the boundaries (the values reset by
>> reset_hot_cache()) in the drain routine would be a mistake
>> since a worker thread could see the new boundary values and
>> return early, potentially leaving some cards unrefined and
>> some remembered set entries missing. I can only clear the
>> fields when the last thread has finished draining the cache.
>> The best place to do this is just before the VM thread
>> re-enables the cache (we know the worker threads will have
>> finished at this point). Since the "drain" doesn't actually
>> drain, perhaps a better name might be refine_all()?
>>
>> The HotCache lock is used when adding entries to the cache.
>> Entries are added by the refinement threads (and there will
>> most likely be more than one). Since the act of adding an
>> entry can also evict an entry we need the lock to guard
>> against the ABA problem. Hitting it could result in skipping
>> the refinement of a card, which would lead to missing
>> remembered set entries; those are not fun to track down.
>>
>> Draining during the GC is immune from the ABA problem because
>> we're not actually removing entries from the cache. We would
>> still be immune, however, if we were removing entries since
>> we would not be adding entries at the same time.
>>
>> Thanks,
>>
>> JohnC
>>
>
More information about the hotspot-gc-dev mailing list