RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

Jiangli Zhou jiangli.zhou at oracle.com
Fri Sep 7 02:05:06 UTC 2018


Here is the updated webrev that addresses Ioi's comments: 
http://cr.openjdk.java.net/~jiangli/8209971/webrev.01/.

- Updated the comment above the 
MetaspaceShared::fixup_mapped_heap_regions() call as suggested. I made 
some minor adjustments.

- Added the _archive_heap_region_fixed flag, which is set to true when 
the archive heap regions are fixed. Added assert in 
MetaspaceShared::materialize_archived_object() to make sure the mapped 
archive regions are in good state when installing an archived java object.

Thanks,

Jiangli


On 9/6/18 11:11 AM, Ioi Lam wrote:
>
>
> On 9/6/18 10:28 AM, Jiangli Zhou wrote:
>> Hi Ioi,
>>
>>
>> On 9/6/18 12:04 AM, Ioi Lam wrote:
>>>
>>>
>>> On 9/5/18 10:10 PM, Jiangli Zhou wrote:
>>>> Hi Ioi,
>>>>
>>>> Thanks for the quick review.
>>>>
>>>> On 9/5/18 8:46 PM, Ioi Lam wrote:
>>>>> Hi Jiangli,
>>>>>
>>>>> Thanks for fixing this! I like the simple approach that minimizes 
>>>>> code change.
>>>>>
>>>>> The following comment is misleading. It suggest that mirror 
>>>>> objects cannot be restored at this stage, but doesn't explain why.
>>>>>
>>>>> Also, I don't think we should single out constant pools (the crash 
>>>>> call stack had nothing to do with constant pool.)
>>>>>
>>>>> 2039     // No mirror object is access/restored during
>>>>> 2040     // 
>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
>>>>> 2041     // Mirrors are restored after java.lang.Class is loaded.
>>>>> 2042     //
>>>>> 2043     // 
>>>>> Object_klass()->constants()->restore_unshareable_info() restores the
>>>>> 2044     // constant pool resolved_references array for Object 
>>>>> klass. It must be
>>>>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
>>>> All constant pool resolved_references arrays for shared classes are 
>>>> archived (like the mirrors). Restoration of the resolved_references 
>>>> (of the Object klass) array object is done as part of the 
>>>> Object_klass()->constants()->restore_unshareable_info(). Although 
>>>> it wasn't shown in the backtrace of this specific crash, it can be 
>>>> an issue as long it is done before the mapped archive regions are 
>>>> fixed up. That's why MetaspaceShared::fixup_mapped_heap_regions() 
>>>> needs to be called before 
>>>> Object_klass()->constants()->restore_unshareable_info(). Any 
>>>> operation to the archived java objects can be potentially unsafe if 
>>>> it is done before MetaspaceShared::fixup_mapped_heap_regions(). So 
>>>> we want to place the MetaspaceShared::fixup_mapped_heap_regions() 
>>>> call as early as possible but after 
>>>> SystemDictionary::Object_klass() is set up. Please let me know if 
>>>> adding some of these explanations would help make the comments more 
>>>> clear.
>>>>
>>> We should have concise comments that clearly describes the issues, 
>>> and avoid repeating things that's already in the code. So sentences 
>>> like "Fixup the mapped archive heap regions after resolving 
>>> Object_klass" doesn't seem that useful.
>>>
>>> How about this:
>>>
>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), scan, 
>>> CHECK);
>>>     // It's unsafe to access the archived heap regions before they
>>>     // are fixed up, so we must do the fixup as early as possible 
>>> before
>>>     // the archived java objects are accessedby function such as
>>>     // java_lang_Class::restore_archived_mirror
>>>     // and ConstantPool::restore_unshareable_info.
>>>     //
>>>     // However, MetaspaceShared::fixup_mapped_heap_regions() fills the
>>>     // leading empty spaces in the archived regions and may use
>>>     // SystemDictionary::Object_klass(), so we can do this only after
>>>     // Object_klass is resolved.
>>>     //
>>>     // Note: no mirror objects are accessed/restored in the above call.
>>>     // Mirrors are restored after java.lang.Class is loaded.
>>>     MetaspaceShared::fixup_mapped_heap_regions();
>> This looks ok. I'll make adjustment as suggested. To prevent 
>> Object_klass()->constants()->restore_unshareable_info() accidentally 
>> being move above MetaspaceShared::fixup_mapped_heap_regions() in the 
>> future, the following comment should be included explicitly.
>>
>> 2043     // Object_klass()->constants()->restore_unshareable_info() 
>> restores the
>> 2044     // constant pool resolved_references array for Object klass. 
>> It must be
>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
>
> I think it's already mentioned above with "before .... 
> ConstantPool::restore_unshareable_info. "
>
> I don't see any need for this extra verbosity. The longer the comment, 
> the less likely people are going to read it. But if you insist, then 
> go ahead and add it.
>
>>>
>>>
>>> BTW, while you're fixing this area, I think we need an assert here:
>>>
>>> void ConstantPool::restore_unshareable_info(TRAPS) {
>>>   ..
>>>   if (SystemDictionary::Object_klass_loaded()) {
>>>     ...
>>>   } else {
>>> +  assert(pool_holder()->name() == vmSymbols::java_lang_Object(), 
>>> "must be");
>>>    }
>> Ok.
>>>
>>> That's because you have special handling for the Object klass, but 
>>> no other klasses. However, the check in 
>>> ConstantPool::restore_unshareable_info currently doesn't explicitly 
>>> match this.
>>>
>>>     // Initialize the constant pool for the Object_class
>>> Object_klass()->constants()->restore_unshareable_info(CHECK);
>>>
>>>
>>>> Just to give some background/history information to help 
>>>> understand. The mapping/fixing-up of the archive heap region was 
>>>> designed and implemented as part of the shared String support. No 
>>>> archived String objects were accessed before the region fixup 
>>>> operation. So this was not an issue. Later on, open archive heap 
>>>> region was introduced and archiving supports for resolved_reference 
>>>> arrays and mirrors were implemented. That introduced the hidden 
>>>> issue because some of the archived mirrors and resolved_references 
>>>> arrays were accessed early (before region fixup). The issue was 
>>>> undiscovered because the mapped region start usually was the same 
>>>> as the start of the GC region that contains the mapped data. There 
>>>> was no gap at the top of the mapped GC regions in normal cases. We 
>>>> recently started to stress testing this area heavily with 
>>>> relocations for the default CDS archive and the issue then surfaced.
>>>>
>>>>>
>>>>> The real reason we have to do this is:
>>>>>
>>>>> + Before MetaspaceShared::fixup_mapped_heap_regions() is called, 
>>>>> the heap is in an inconsistent state. At some G1 regions, the 
>>>>> bottom part contains all zeros. However, some GC operations 
>>>>> (debug-build verifications only?) would need to walk G1 regions 
>>>>> starting from the bottom, and will run into asserts when we get an 
>>>>> uninitialized block.
>>>> Just want to make it clear, only the mapped archived heap regions 
>>>> may have this issue when the mapped region start is different from 
>>>> the containing GC region's start. None of the regular GC regions 
>>>> have such issue.
>>>>
>>>>>
>>>>> + One of such operation is the barrier operations, as shown in the 
>>>>> crash call stack in the bug report. (Maybe there are others??)
>>>> Any operation to the archived java objects could have issue if it 
>>>> is done before archive region fixup. That's why we want to do the 
>>>> archive region fixup as early as possible.
>>>>>
>>>>> For safety, we should put in asserts such that -- before 
>>>>> MetaspaceShared::fixup_mapped_heap_regions() is done, we must not 
>>>>> do anything that would trigger barrier operations. For example, we 
>>>>> must not call oopDesc::obj_field_put, which triggered the barrier 
>>>>> operations in the crash stack.
>>>>>
>>>>> The crash is timing related and happened only when 
>>>>> G1UpdateBufferSize was set to one. I think we need asserts so we 
>>>>> can reliably catch any future problems. For example, if someone 
>>>>> adds new code to modify obj fields in the archived heap before 
>>>>> fixup_mapped_heap_regions() is called, we should get an assert 
>>>>> immediately, instead of waiting for some future point that may or 
>>>>> may not happen.
>>>>>
>>>>> I would suggest working with the GC folks for a suitable assert.
>>>> Not sure if there is a central place within GC code can be used for 
>>>> the assertion, as any operation could be an issue. We can check 
>>>> with the GC experts.
>>>>
>>>> If we want to be extra cautious, we could map the archive heap 
>>>> regions without access permission (PROT_NONE) initially. The 
>>>> archive region fixup code can then reset memory protection and 
>>>> makes it readable/writable. With that, any operations to the 
>>>> archived java objects before fixup can trigger signal. This 
>>>> probably should be done for debugging only build and should be 
>>>> handled separately.
>>>>
>>>
>>> Making the memory read-only initially seems a very good idea. I 
>>> don't think we should put it off in a future RFE. This bug happened 
>>> because we don't have a safeguard mechanism, so the bug fix should 
>>> include such a mechanism.
>> I briefly looked into it already. I think it requires some code 
>> restructuring. The scope is larger and beyond the current bug fix.
>>
>> There is no object restoring before 
>> MetaspaceShared::fixup_mapped_heap_regions() with the fix. To put a 
>> safeguard, I can place an assert in the CDS code where it 
>> materializes an archived object. That's the entry point to GC code.
>>
> That sounds good.
>
> Thanks
> - Ioi
>> I'll make changes and update the webrev.
>>
>> Thanks,
>> Jiangli
>>
>>>
>>> I spend 2 days trying to debug this issue. I would hate to have 
>>> someone go through this again in the future just because we have 
>>> filed an RFE but never get to doing it.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>>> Thanks,
>>>> Jiangli
>>>>
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>>> On 9/5/18 7:58 PM, Jiangli Zhou wrote:
>>>>>> Please review the following fix for JDK-8209971, 
>>>>>> TestOptionsWithRanges.java crashes in CDS mode with 
>>>>>> G1UpdateBufferSize=1.
>>>>>>
>>>>>>     webrev: http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>>>>>>
>>>>>>     bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>>>>>>
>>>>>> This is a pre-existing issue that's exposed when archived heap 
>>>>>> object relocation is enabled. At CDS runtime, the archived heap 
>>>>>> regions are mapped back to the runtime java heap. As the mapped 
>>>>>> heap data may not occupy the entire runtime GC region(s), the 
>>>>>> empty spaces within the regions are filled with dummy objects, 
>>>>>> which is done by a fixup process after mapping the data. The 
>>>>>> fixup was done after SystemDictionary::initialize(), because 
>>>>>> fixup needs to access SystemDictionary::Object_klass() (in some 
>>>>>> cases). However that is too late as archived java objects are 
>>>>>> accessed when SystemDictionary::initialize() resolves well-known 
>>>>>> classes.
>>>>>>
>>>>>> The solution is to call 
>>>>>> MetaspaceShared::fixup_mapped_heap_regions() at an earlier time. 
>>>>>> In the webre, the call is moved into 
>>>>>> SystemDictionary::resolve_preloaded_classes(). It's called after 
>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), 
>>>>>> but before any of the archived java objects is accessed. No 
>>>>>> mirror object is access/restored during 
>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...). 
>>>>>> Archived mirrors are restored after java.lang.Class is loaded.
>>>>>>
>>>>>> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - 
>>>>>> tier5 normal runs and tier1 - tier4 default CDS runs are in 
>>>>>> progress.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jiangli
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-runtime-dev mailing list