RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

Ioi Lam ioi.lam at oracle.com
Thu Sep 6 18:11:58 UTC 2018



On 9/6/18 10:28 AM, Jiangli Zhou wrote:
> Hi Ioi,
>
>
> On 9/6/18 12:04 AM, Ioi Lam wrote:
>>
>>
>> On 9/5/18 10:10 PM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>> Thanks for the quick review.
>>>
>>> On 9/5/18 8:46 PM, Ioi Lam wrote:
>>>> Hi Jiangli,
>>>>
>>>> Thanks for fixing this! I like the simple approach that minimizes 
>>>> code change.
>>>>
>>>> The following comment is misleading. It suggest that mirror objects 
>>>> cannot be restored at this stage, but doesn't explain why.
>>>>
>>>> Also, I don't think we should single out constant pools (the crash 
>>>> call stack had nothing to do with constant pool.)
>>>>
>>>> 2039     // No mirror object is access/restored during
>>>> 2040     // 
>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
>>>> 2041     // Mirrors are restored after java.lang.Class is loaded.
>>>> 2042     //
>>>> 2043     // Object_klass()->constants()->restore_unshareable_info() 
>>>> restores the
>>>> 2044     // constant pool resolved_references array for Object 
>>>> klass. It must be
>>>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
>>> All constant pool resolved_references arrays for shared classes are 
>>> archived (like the mirrors). Restoration of the resolved_references 
>>> (of the Object klass) array object is done as part of the 
>>> Object_klass()->constants()->restore_unshareable_info(). Although it 
>>> wasn't shown in the backtrace of this specific crash, it can be an 
>>> issue as long it is done before the mapped archive regions are fixed 
>>> up. That's why MetaspaceShared::fixup_mapped_heap_regions() needs to 
>>> be called before 
>>> Object_klass()->constants()->restore_unshareable_info(). Any 
>>> operation to the archived java objects can be potentially unsafe if 
>>> it is done before MetaspaceShared::fixup_mapped_heap_regions(). So 
>>> we want to place the MetaspaceShared::fixup_mapped_heap_regions() 
>>> call as early as possible but after SystemDictionary::Object_klass() 
>>> is set up. Please let me know if adding some of these explanations 
>>> would help make the comments more clear.
>>>
>> We should have concise comments that clearly describes the issues, 
>> and avoid repeating things that's already in the code. So sentences 
>> like "Fixup the mapped archive heap regions after resolving 
>> Object_klass" doesn't seem that useful.
>>
>> How about this:
>>
>>     resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), 
>> scan, CHECK);
>>     // It's unsafe to access the archived heap regions before they
>>     // are fixed up, so we must do the fixup as early as possible before
>>     // the archived java objects are accessedby function such as
>>     // java_lang_Class::restore_archived_mirror
>>     // and ConstantPool::restore_unshareable_info.
>>     //
>>     // However, MetaspaceShared::fixup_mapped_heap_regions() fills the
>>     // leading empty spaces in the archived regions and may use
>>     // SystemDictionary::Object_klass(), so we can do this only after
>>     // Object_klass is resolved.
>>     //
>>     // Note: no mirror objects are accessed/restored in the above call.
>>     // Mirrors are restored after java.lang.Class is loaded.
>>     MetaspaceShared::fixup_mapped_heap_regions();
> This looks ok. I'll make adjustment as suggested. To prevent 
> Object_klass()->constants()->restore_unshareable_info() accidentally 
> being move above MetaspaceShared::fixup_mapped_heap_regions() in the 
> future, the following comment should be included explicitly.
>
> 2043     // Object_klass()->constants()->restore_unshareable_info() 
> restores the
> 2044     // constant pool resolved_references array for Object klass. 
> It must be
> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().

I think it's already mentioned above with "before .... 
ConstantPool::restore_unshareable_info. "

I don't see any need for this extra verbosity. The longer the comment, 
the less likely people are going to read it. But if you insist, then go 
ahead and add it.

>>
>>
>> BTW, while you're fixing this area, I think we need an assert here:
>>
>> void ConstantPool::restore_unshareable_info(TRAPS) {
>>   ..
>>   if (SystemDictionary::Object_klass_loaded()) {
>>     ...
>>   } else {
>> +  assert(pool_holder()->name() == vmSymbols::java_lang_Object(), 
>> "must be");
>>    }
> Ok.
>>
>> That's because you have special handling for the Object klass, but no 
>> other klasses. However, the check in 
>> ConstantPool::restore_unshareable_info currently doesn't explicitly 
>> match this.
>>
>>     // Initialize the constant pool for the Object_class
>> Object_klass()->constants()->restore_unshareable_info(CHECK);
>>
>>
>>> Just to give some background/history information to help understand. 
>>> The mapping/fixing-up of the archive heap region was designed and 
>>> implemented as part of the shared String support. No archived String 
>>> objects were accessed before the region fixup operation. So this was 
>>> not an issue. Later on, open archive heap region was introduced and 
>>> archiving supports for resolved_reference arrays and mirrors were 
>>> implemented. That introduced the hidden issue because some of the 
>>> archived mirrors and resolved_references arrays were accessed early 
>>> (before region fixup). The issue was undiscovered because the mapped 
>>> region start usually was the same as the start of the GC region that 
>>> contains the mapped data. There was no gap at the top of the mapped 
>>> GC regions in normal cases. We recently started to stress testing 
>>> this area heavily with relocations for the default CDS archive and 
>>> the issue then surfaced.
>>>
>>>>
>>>> The real reason we have to do this is:
>>>>
>>>> + Before MetaspaceShared::fixup_mapped_heap_regions() is called, 
>>>> the heap is in an inconsistent state. At some G1 regions, the 
>>>> bottom part contains all zeros. However, some GC operations 
>>>> (debug-build verifications only?) would need to walk G1 regions 
>>>> starting from the bottom, and will run into asserts when we get an 
>>>> uninitialized block.
>>> Just want to make it clear, only the mapped archived heap regions 
>>> may have this issue when the mapped region start is different from 
>>> the containing GC region's start. None of the regular GC regions 
>>> have such issue.
>>>
>>>>
>>>> + One of such operation is the barrier operations, as shown in the 
>>>> crash call stack in the bug report. (Maybe there are others??)
>>> Any operation to the archived java objects could have issue if it is 
>>> done before archive region fixup. That's why we want to do the 
>>> archive region fixup as early as possible.
>>>>
>>>> For safety, we should put in asserts such that -- before 
>>>> MetaspaceShared::fixup_mapped_heap_regions() is done, we must not 
>>>> do anything that would trigger barrier operations. For example, we 
>>>> must not call oopDesc::obj_field_put, which triggered the barrier 
>>>> operations in the crash stack.
>>>>
>>>> The crash is timing related and happened only when 
>>>> G1UpdateBufferSize was set to one. I think we need asserts so we 
>>>> can reliably catch any future problems. For example, if someone 
>>>> adds new code to modify obj fields in the archived heap before 
>>>> fixup_mapped_heap_regions() is called, we should get an assert 
>>>> immediately, instead of waiting for some future point that may or 
>>>> may not happen.
>>>>
>>>> I would suggest working with the GC folks for a suitable assert.
>>> Not sure if there is a central place within GC code can be used for 
>>> the assertion, as any operation could be an issue. We can check with 
>>> the GC experts.
>>>
>>> If we want to be extra cautious, we could map the archive heap 
>>> regions without access permission (PROT_NONE) initially. The archive 
>>> region fixup code can then reset memory protection and makes it 
>>> readable/writable. With that, any operations to the archived java 
>>> objects before fixup can trigger signal. This probably should be 
>>> done for debugging only build and should be handled separately.
>>>
>>
>> Making the memory read-only initially seems a very good idea. I don't 
>> think we should put it off in a future RFE. This bug happened because 
>> we don't have a safeguard mechanism, so the bug fix should include 
>> such a mechanism.
> I briefly looked into it already. I think it requires some code 
> restructuring. The scope is larger and beyond the current bug fix.
>
> There is no object restoring before 
> MetaspaceShared::fixup_mapped_heap_regions() with the fix. To put a 
> safeguard, I can place an assert in the CDS code where it materializes 
> an archived object. That's the entry point to GC code.
>
That sounds good.

Thanks
- Ioi
> I'll make changes and update the webrev.
>
> Thanks,
> Jiangli
>
>>
>> I spend 2 days trying to debug this issue. I would hate to have 
>> someone go through this again in the future just because we have 
>> filed an RFE but never get to doing it.
>>
>> Thanks
>> - Ioi
>>
>>
>>> Thanks,
>>> Jiangli
>>>
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>> On 9/5/18 7:58 PM, Jiangli Zhou wrote:
>>>>> Please review the following fix for JDK-8209971, 
>>>>> TestOptionsWithRanges.java crashes in CDS mode with 
>>>>> G1UpdateBufferSize=1.
>>>>>
>>>>>     webrev: http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>>>>>
>>>>>     bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>>>>>
>>>>> This is a pre-existing issue that's exposed when archived heap 
>>>>> object relocation is enabled. At CDS runtime, the archived heap 
>>>>> regions are mapped back to the runtime java heap. As the mapped 
>>>>> heap data may not occupy the entire runtime GC region(s), the 
>>>>> empty spaces within the regions are filled with dummy objects, 
>>>>> which is done by a fixup process after mapping the data. The fixup 
>>>>> was done after SystemDictionary::initialize(), because fixup needs 
>>>>> to access SystemDictionary::Object_klass() (in some cases). 
>>>>> However that is too late as archived java objects are accessed 
>>>>> when SystemDictionary::initialize() resolves well-known classes.
>>>>>
>>>>> The solution is to call 
>>>>> MetaspaceShared::fixup_mapped_heap_regions() at an earlier time. 
>>>>> In the webre, the call is moved into 
>>>>> SystemDictionary::resolve_preloaded_classes(). It's called after 
>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), 
>>>>> but before any of the archived java objects is accessed. No mirror 
>>>>> object is access/restored during 
>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...). 
>>>>> Archived mirrors are restored after java.lang.Class is loaded.
>>>>>
>>>>> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - 
>>>>> tier5 normal runs and tier1 - tier4 default CDS runs are in progress.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jiangli
>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-runtime-dev mailing list