RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

Jiangli Zhou jiangli.zhou at oracle.com
Fri Sep 7 16:35:18 UTC 2018


Thanks, Ioi!

Jiangli 

> On Sep 7, 2018, at 7:03 AM, Ioi Lam <ioi.lam at oracle.com> wrote:
> 
> Hi Jiangli,
> 
> This looks great! Thanks for coming up with a simple and clean fix.
> 
> - Ioi
> 
> 
>> On 9/6/18 7:05 PM, Jiangli Zhou wrote:
>> Here is the updated webrev that addresses Ioi's comments: http://cr.openjdk.java.net/~jiangli/8209971/webrev.01/.
>> 
>> - Updated the comment above the MetaspaceShared::fixup_mapped_heap_regions() call as suggested. I made some minor adjustments.
>> 
>> - Added the _archive_heap_region_fixed flag, which is set to true when the archive heap regions are fixed. Added assert in MetaspaceShared::materialize_archived_object() to make sure the mapped archive regions are in good state when installing an archived java object.
>> 
>> Thanks,
>> 
>> Jiangli
>> 
>> 
>>> On 9/6/18 11:11 AM, Ioi Lam wrote:
>>> 
>>> 
>>>> On 9/6/18 10:28 AM, Jiangli Zhou wrote:
>>>> Hi Ioi,
>>>> 
>>>> 
>>>>> On 9/6/18 12:04 AM, Ioi Lam wrote:
>>>>> 
>>>>> 
>>>>>> On 9/5/18 10:10 PM, Jiangli Zhou wrote:
>>>>>> Hi Ioi,
>>>>>> 
>>>>>> Thanks for the quick review.
>>>>>> 
>>>>>>> On 9/5/18 8:46 PM, Ioi Lam wrote:
>>>>>>> Hi Jiangli,
>>>>>>> 
>>>>>>> Thanks for fixing this! I like the simple approach that minimizes code change.
>>>>>>> 
>>>>>>> The following comment is misleading. It suggest that mirror objects cannot be restored at this stage, but doesn't explain why.
>>>>>>> 
>>>>>>> Also, I don't think we should single out constant pools (the crash call stack had nothing to do with constant pool.)
>>>>>>> 
>>>>>>> 2039     // No mirror object is access/restored during
>>>>>>> 2040     // resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
>>>>>>> 2041     // Mirrors are restored after java.lang.Class is loaded.
>>>>>>> 2042     //
>>>>>>> 2043     // Object_klass()->constants()->restore_unshareable_info() restores the
>>>>>>> 2044     // constant pool resolved_references array for Object klass. It must be
>>>>>>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
>>>>>> All constant pool resolved_references arrays for shared classes are archived (like the mirrors). Restoration of the resolved_references (of the Object klass) array object is done as part of the Object_klass()->constants()->restore_unshareable_info(). Although it wasn't shown in the backtrace of this specific crash, it can be an issue as long it is done before the mapped archive regions are fixed up. That's why MetaspaceShared::fixup_mapped_heap_regions() needs to be called before Object_klass()->constants()->restore_unshareable_info(). Any operation to the archived java objects can be potentially unsafe if it is done before MetaspaceShared::fixup_mapped_heap_regions(). So we want to place the MetaspaceShared::fixup_mapped_heap_regions() call as early as possible but after SystemDictionary::Object_klass() is set up. Please let me know if adding some of these explanations would help make the comments more clear.
>>>>>> 
>>>>> We should have concise comments that clearly describes the issues, and avoid repeating things that's already in the code. So sentences like "Fixup the mapped archive heap regions after resolving Object_klass" doesn't seem that useful.
>>>>> 
>>>>> How about this:
>>>>> 
>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), scan, CHECK);
>>>>>     // It's unsafe to access the archived heap regions before they
>>>>>     // are fixed up, so we must do the fixup as early as possible before
>>>>>     // the archived java objects are accessedby function such as
>>>>>     // java_lang_Class::restore_archived_mirror
>>>>>     // and ConstantPool::restore_unshareable_info.
>>>>>     //
>>>>>     // However, MetaspaceShared::fixup_mapped_heap_regions() fills the
>>>>>     // leading empty spaces in the archived regions and may use
>>>>>     // SystemDictionary::Object_klass(), so we can do this only after
>>>>>     // Object_klass is resolved.
>>>>>     //
>>>>>     // Note: no mirror objects are accessed/restored in the above call.
>>>>>     // Mirrors are restored after java.lang.Class is loaded.
>>>>>     MetaspaceShared::fixup_mapped_heap_regions();
>>>> This looks ok. I'll make adjustment as suggested. To prevent Object_klass()->constants()->restore_unshareable_info() accidentally being move above MetaspaceShared::fixup_mapped_heap_regions() in the future, the following comment should be included explicitly.
>>>> 
>>>> 2043     // Object_klass()->constants()->restore_unshareable_info() restores the
>>>> 2044     // constant pool resolved_references array for Object klass. It must be
>>>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
>>> 
>>> I think it's already mentioned above with "before .... ConstantPool::restore_unshareable_info. "
>>> 
>>> I don't see any need for this extra verbosity. The longer the comment, the less likely people are going to read it. But if you insist, then go ahead and add it.
>>> 
>>>>> 
>>>>> 
>>>>> BTW, while you're fixing this area, I think we need an assert here:
>>>>> 
>>>>> void ConstantPool::restore_unshareable_info(TRAPS) {
>>>>>   ..
>>>>>   if (SystemDictionary::Object_klass_loaded()) {
>>>>>     ...
>>>>>   } else {
>>>>> +  assert(pool_holder()->name() == vmSymbols::java_lang_Object(), "must be");
>>>>>    }
>>>> Ok.
>>>>> 
>>>>> That's because you have special handling for the Object klass, but no other klasses. However, the check in ConstantPool::restore_unshareable_info currently doesn't explicitly match this.
>>>>> 
>>>>>     // Initialize the constant pool for the Object_class
>>>>> Object_klass()->constants()->restore_unshareable_info(CHECK);
>>>>> 
>>>>> 
>>>>>> Just to give some background/history information to help understand. The mapping/fixing-up of the archive heap region was designed and implemented as part of the shared String support. No archived String objects were accessed before the region fixup operation. So this was not an issue. Later on, open archive heap region was introduced and archiving supports for resolved_reference arrays and mirrors were implemented. That introduced the hidden issue because some of the archived mirrors and resolved_references arrays were accessed early (before region fixup). The issue was undiscovered because the mapped region start usually was the same as the start of the GC region that contains the mapped data. There was no gap at the top of the mapped GC regions in normal cases. We recently started to stress testing this area heavily with relocations for the default CDS archive and the issue then surfaced.
>>>>>> 
>>>>>>> 
>>>>>>> The real reason we have to do this is:
>>>>>>> 
>>>>>>> + Before MetaspaceShared::fixup_mapped_heap_regions() is called, the heap is in an inconsistent state. At some G1 regions, the bottom part contains all zeros. However, some GC operations (debug-build verifications only?) would need to walk G1 regions starting from the bottom, and will run into asserts when we get an uninitialized block.
>>>>>> Just want to make it clear, only the mapped archived heap regions may have this issue when the mapped region start is different from the containing GC region's start. None of the regular GC regions have such issue.
>>>>>> 
>>>>>>> 
>>>>>>> + One of such operation is the barrier operations, as shown in the crash call stack in the bug report. (Maybe there are others??)
>>>>>> Any operation to the archived java objects could have issue if it is done before archive region fixup. That's why we want to do the archive region fixup as early as possible.
>>>>>>> 
>>>>>>> For safety, we should put in asserts such that -- before MetaspaceShared::fixup_mapped_heap_regions() is done, we must not do anything that would trigger barrier operations. For example, we must not call oopDesc::obj_field_put, which triggered the barrier operations in the crash stack.
>>>>>>> 
>>>>>>> The crash is timing related and happened only when G1UpdateBufferSize was set to one. I think we need asserts so we can reliably catch any future problems. For example, if someone adds new code to modify obj fields in the archived heap before fixup_mapped_heap_regions() is called, we should get an assert immediately, instead of waiting for some future point that may or may not happen.
>>>>>>> 
>>>>>>> I would suggest working with the GC folks for a suitable assert.
>>>>>> Not sure if there is a central place within GC code can be used for the assertion, as any operation could be an issue. We can check with the GC experts.
>>>>>> 
>>>>>> If we want to be extra cautious, we could map the archive heap regions without access permission (PROT_NONE) initially. The archive region fixup code can then reset memory protection and makes it readable/writable. With that, any operations to the archived java objects before fixup can trigger signal. This probably should be done for debugging only build and should be handled separately.
>>>>>> 
>>>>> 
>>>>> Making the memory read-only initially seems a very good idea. I don't think we should put it off in a future RFE. This bug happened because we don't have a safeguard mechanism, so the bug fix should include such a mechanism.
>>>> I briefly looked into it already. I think it requires some code restructuring. The scope is larger and beyond the current bug fix.
>>>> 
>>>> There is no object restoring before MetaspaceShared::fixup_mapped_heap_regions() with the fix. To put a safeguard, I can place an assert in the CDS code where it materializes an archived object. That's the entry point to GC code.
>>>> 
>>> That sounds good.
>>> 
>>> Thanks
>>> - Ioi
>>>> I'll make changes and update the webrev.
>>>> 
>>>> Thanks,
>>>> Jiangli
>>>> 
>>>>> 
>>>>> I spend 2 days trying to debug this issue. I would hate to have someone go through this again in the future just because we have filed an RFE but never get to doing it.
>>>>> 
>>>>> Thanks
>>>>> - Ioi
>>>>> 
>>>>> 
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>> 
>>>>>>> 
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>> 
>>>>>>> 
>>>>>>>> On 9/5/18 7:58 PM, Jiangli Zhou wrote:
>>>>>>>> Please review the following fix for JDK-8209971, TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1.
>>>>>>>> 
>>>>>>>>     webrev: http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>>>>>>>> 
>>>>>>>>     bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>>>>>>>> 
>>>>>>>> This is a pre-existing issue that's exposed when archived heap object relocation is enabled. At CDS runtime, the archived heap regions are mapped back to the runtime java heap. As the mapped heap data may not occupy the entire runtime GC region(s), the empty spaces within the regions are filled with dummy objects, which is done by a fixup process after mapping the data. The fixup was done after SystemDictionary::initialize(), because fixup needs to access SystemDictionary::Object_klass() (in some cases). However that is too late as archived java objects are accessed when SystemDictionary::initialize() resolves well-known classes.
>>>>>>>> 
>>>>>>>> The solution is to call MetaspaceShared::fixup_mapped_heap_regions() at an earlier time. In the webre, the call is moved into SystemDictionary::resolve_preloaded_classes(). It's called after resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), but before any of the archived java objects is accessed. No mirror object is access/restored during resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...). Archived mirrors are restored after java.lang.Class is loaded.
>>>>>>>> 
>>>>>>>> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - tier5 normal runs and tier1 - tier4 default CDS runs are in progress.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Jiangli
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 



More information about the hotspot-runtime-dev mailing list