RFR(S): 8010463: G1: Crashes with -UseTLAB and heap verification

John Cuthbertson john.cuthbertson at oracle.com
Fri Mar 22 19:02:23 UTC 2013


Hi Thomas,

Thanks for looking over the changes. Replies inline....

On 3/22/2013 3:15 AM, Thomas Schatzl wrote:
> Hi,
>
> On Fri, 2013-03-22 at 07:48 +0100, Bengt Rutisson wrote:
>> Hi John,
>> Your changes look good to me.
>>
>> I think your motivation for removing the verification from
>> universe2_init() and init_globals() is fine. Actually I wonder why
>> they were there in the first place, but they do seem intentionally put
>> in there. However, I'm fine with removing them.
> These verifications seem to make the intervals between verification
> smaller to be able to pinpoint any problem location quicker.
> Threads::create_vm() does a lot after all. Removal is fine with me as
> well.

That was my guess as well but it looks like only CMS and Serial would do 
some verification. PS looks like it skips checking it's generations 
until just before the first real proper GC.

> Comparing to the comments on the CR (great analysis btw!), you opted for
> the approach to just skip verification of roots, heap regions and remset
> until a safepoint. From your analysis, the other option would have been
> to allow a NULL VM thread in some places.
>
> Allowing a NULL VMThread would allow checking heap consistency right
> after Universe::genesis(). It is probably not expected that there are
> already issues about the heap at this stage, but since we're already
> verifying, I'd tend to prefer the option that verifies as much as
> possible.
> (Did not really check if it would really make a difference)
>
> Maybe you could add a guarantee() or such in Threads::oops_do() and
> Threads::possibly_parallel_oops_do() that only allowed a NULL VM Thread
> during initialization?

I wouldn't say "opted". I would say "made the -UseTLABS and +UseTLABS 
cases consistent". :)

I actually tried to get enable the verification but after several 
different crash/assertion errors decided it wasn't worth it. To do it we 
would need to do it at a safepoint - hence the move of the verify to 
after the creation of the VM thread - so we have something to execute 
the VM op.

It wasn't just a NULL VM thread. That was only the first error. Here's 
where I went:

Crash #1: Null VMThread. Changed Threads::oops_do() and 
Threads::possibly_parallel_oops_do() to allow for a NULL VM thread.
This kind of makes sense. Afterall if we don't have a VM thread, there's 
no need to scan it for oops.

Assertion #2: Assertion failure in ObjectSynchronizer::oops_do():

> void ObjectSynchronizer::oops_do(OopClosure* f) {
>   assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint");
>   for (ObjectMonitor* block = gBlockList; block != NULL; block = 
> next(block)) {
>     assert(block->object() == CHAINMARKER, "must be a block header");
>     for (int i = 1; i < _BLOCKSIZE; i++) {
>       ObjectMonitor* mid = &block[i];
>       if (mid->object() != NULL) {
>         f->do_oop((oop*)mid->object_addr());
>       }
>     }
>   }
> }
>

Changed this one to allow for execution during VM startup. Afterall 
gBlockList was null so executing the routine before any Java code was 
executed was safe.

Assertion #3: Assertion failure in G1CollectedHeap::verify_region_sets():

> void G1CollectedHeap::verify_region_sets() {
>   assert_heap_locked_or_at_safepoint(true /* should_be_vm_thread */);

Tried two approaches. The first was to lock the Heap_lock. That led to

Assertion #4: Assertion failure in CodeCache::blobs_do:

> void CodeCache::blobs_do(CodeBlobClosure* f) {
>   assert_locked_or_safepoint(CodeCache_lock);
>   FOR_ALL_ALIVE_BLOBS(cb) {
>     f->do_code_blob(cb);
>
> #ifdef ASSERT
>     if (cb->is_nmethod())
>       ((nmethod*)cb)->verify_scavenge_root_oops();
> #endif //ASSERT
>   }
> }

Why this did not fail when the Heap_lock was not locked - I do not know. 
But locking the CodeCache_lock was out of the question. Regardless of 
the locking over between the CodeCache_lock and Heap_lock, you get the 
deadlock detection assertion messages - which was assertion #5. I then 
removed locking the Heap_lock and allowed 
G1CollectedHeap::verify_region_sets() to be executed during startup. I 
didn't even look at failure #6.

Following this approach is not worth it. If we want to do then we will 
need to do it in a VMOperation as a separate changeset.

>> One question that is not strictly related to your change:
>>
>> The code to do the verification in Threads::create_vm() is:
>>
>> 3449   if (VerifyBeforeGC &&
>> 3450       Universe::heap()->total_collections() >= VerifyGCStartAt) {
>> 3451     Universe::heap()->prepare_for_verify();
>> 3452     Universe::verify();   // make sure we're starting with a
>> clean slate
>> 3453   }
>>
>> This is what it looked like before your change as well. But to me this
>> looks kind of odd. First, we re-use the flag VerifyBeforeGC even
>> though we are not about to do a GC. I can live with that, but it is
> You mean because of the name of the flag? That's a point. Otoh at VM
> start you're technically before the next gc (whenever this happens ;)

>
>>   kind of strange. Then we have the check against VerifyGCStartAt. By
>> default this is 0 so we will do the verification. But why do we do
>> this check? There is no chance that we have been able to do any GC
>> yet, right? So, checking against Universe::heap()->total_collections()
> Some values of some command line options change VerifyGCStartAt to
> eventually disable verification at VM startup (beginning
> arguments.cpp:2004).
>
>>   seems unnecessary. We should either check VerifyGCStartAt == 0 or not
> I would not mind either way; but you still need the VerifyBeforeGC (or
> another flag) because otherwise the verification would practically be
> unconditional as the default value of VerifyGCStartAt is zero.

In my response to Bengt - I said I think we have a few choices:

* Add a new flag to verify the heap during VM startup.
* Rename and extend the existing VerifyOnExit(??) flag to cover 
verifying during startup.
* change the code to "VerifyBeforeGC && VerifyGCStartAt == 0"
* Leave it as we, in the GC team, are familiar with that particular 
pattern. :)

Thanks,

JohnC



More information about the hotspot-gc-dev mailing list