RFR(S): 7192128: G1: Extend fix for 6948537 to G1's BOT

Bengt Rutisson bengt.rutisson at oracle.com
Fri Aug 17 06:18:59 UTC 2012


Hi John,

Great that you found this issue and fixed it so quickly!

I think the change looks good.

An optional change that you can do if you would like to is:

The G1CollectedHeap constructor now has this code:

#ifdef SPARC
   // Issue a stern warning, but allow use for experimentation and 
debugging.
   if (VM_Version::is_sun4v() && UseMemSetInBOT) {
     assert(!FLAG_IS_DEFAULT(UseMemSetInBOT), "Error");
     warning("Experimental flag -XX:+UseMemSetInBOT is known to cause 
instability"
             " on sun4v; please understand that you are using at your 
own risk!");
   }
#endif

Which is duplicated code from the CMSCollector constructor. Could we 
move it to a common place? Maybe even include it in vm_version_sparc.cpp 
-> VM_Version::initialize() where we have the other special treatment of 
UseMemSetInBOT for CMS and G1? Something like:

   if (is_niagara()) {
     ...
     if (UseMemSetInBOT && (UseConcMarkSweepGC || UseG1GC)) {
       if (FLAG_IS_DEFAULT(UseMemSetInBOT) {
         FLAG_SET_DEFAULT(UseMemSetInBOT, false);
       } else {
         warning("Experimental flag -XX:+UseMemSetInBOT is known to 
cause instability"
                 " on sun4v; please understand that you are using at 
your own risk!");
       }
     }
     ...
   }

It would also have been nice to be able have a wrapper method for the 4 
places where we now have this type of code:

     if (UseMemSetInBOT) {
       memset(&_offset_array[index_for(left)], offset, num_cards);
     } else {
       size_t i = index_for(left);
       const size_t end = i + num_cards;
       for (; i < end; i++) {
         _offset_array[i] = offset;
       }
     }

But I'm fine with leaving it as it is for now.

Since I know this change is kind of urgent and my comments are optional 
I'm fine with you pushing the change as it is.

Cheers,
Bengt



On 2012-08-17 01:09, John Cuthbertson wrote:
> Hi Everyone,
>
> Can I have a couple of volunteers to review the fix for this CR? The 
> webrev can be found at: 
> http://cr.openjdk.java.net/~johnc/7192128/webrev.0/
>
> Summary:
> A while back, Ramki discovered an issue on Niagara systems where 
> concurrent readers of the block offset table could see spurious zero 
> entries as a result of the implementation of memset using the SPARC 
> BIS instruction. At the time it was thought that G1 was not affected 
> by this issue.
>
> During testing of the perm-gen removal changes, the development 
> engineers started to see assertion failures and crashes in G1's block 
> offset table. The assertions were about erroneous contents of the 
> offset array. As a result the values used in the assertion were 
> printed and the value being displayed should not have failed the 
> assertion:
>
>> --- a/src/share/vm/gc_implementation/g1/g1BlockOffsetTable.cpp
>> +++ b/src/share/vm/gc_implementation/g1/g1BlockOffsetTable.cpp
>> @@ -546,7 +546,10 @@
>>      assert(_array->offset_array(j) > 0 &&
>>             _array->offset_array(j) <=
>>               (u_char) (N_words+BlockOffsetArray::N_powers-1),
>> -           "offset array should have been set");
>> +           err_msg("offset array should have been set "
>> +           SIZE_FORMAT " not > 0 OR " SIZE_FORMAT " not <= "
>> +           SIZE_FORMAT, _array->offset_array(j), 
>> _array->offset_array(j),
>> +           (N_words+BlockOffsetArray::N_powers-1)));
>>    }
>>  #endif
>>  }
>
> # To suppress the following error report, specify this argument
> # after -XX: or in .hotspotrc:  
> SuppressErrorAt=/g1BlockOffsetTable.cpp:552
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error 
> (/tmp/jprt/P1/173804.cphillim/s/src/share/vm/gc_implementation/g1/g1BlockOffsetTable.cpp:552), 
> pid=5210, tid=27
> #  assert(_array->offset_array(j) > 0 && _array->offset_array(j) <= 
> (u_char) (N_words+BlockOffsetArray::N_powers-1)) failed: offset array 
> should have been set 65 not > 0 OR 65 not <= 77
> #
>
> So we had a value which failed the assertion check that, when it was 
> re-read for the error message, had a value which should have passed 
> the assertion check.
>
> It seems that G1 is not immune to the problem seen in 6948537 and I 
> believe that the recent PLAB resizing change has increased the 
> likelihood of hitting the issue as it can increase the amount of 
> concurrent refinement of BlockOffsetTable entries.
>
> Testing: jprt runs with the perm-gen removal changes, command line 
> tests on sparc and x86 systems, GCOld (which was the failing test) on 
> SPARC systems.
>
> Thanks,
>
> JohnC




More information about the hotspot-gc-dev mailing list