RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

Jiangli Zhou jiangli.zhou at oracle.com
Fri Sep 7 17:18:16 UTC 2018


Thanks, Calvin!

Jiangli


On 9/7/18 10:13 AM, Calvin Cheung wrote:
> Hi Jiangli,
>
> Fix looks good to me as well.
>
> thanks,
> Calvin
>
> On 9/7/18, 7:03 AM, Ioi Lam wrote:
>> Hi Jiangli,
>>
>> This looks great! Thanks for coming up with a simple and clean fix.
>>
>> - Ioi
>>
>>
>> On 9/6/18 7:05 PM, Jiangli Zhou wrote:
>>> Here is the updated webrev that addresses Ioi's comments: 
>>> http://cr.openjdk.java.net/~jiangli/8209971/webrev.01/.
>>>
>>> - Updated the comment above the 
>>> MetaspaceShared::fixup_mapped_heap_regions() call as suggested. I 
>>> made some minor adjustments.
>>>
>>> - Added the _archive_heap_region_fixed flag, which is set to true 
>>> when the archive heap regions are fixed. Added assert in 
>>> MetaspaceShared::materialize_archived_object() to make sure the 
>>> mapped archive regions are in good state when installing an archived 
>>> java object.
>>>
>>> Thanks,
>>>
>>> Jiangli
>>>
>>>
>>> On 9/6/18 11:11 AM, Ioi Lam wrote:
>>>>
>>>>
>>>> On 9/6/18 10:28 AM, Jiangli Zhou wrote:
>>>>> Hi Ioi,
>>>>>
>>>>>
>>>>> On 9/6/18 12:04 AM, Ioi Lam wrote:
>>>>>>
>>>>>>
>>>>>> On 9/5/18 10:10 PM, Jiangli Zhou wrote:
>>>>>>> Hi Ioi,
>>>>>>>
>>>>>>> Thanks for the quick review.
>>>>>>>
>>>>>>> On 9/5/18 8:46 PM, Ioi Lam wrote:
>>>>>>>> Hi Jiangli,
>>>>>>>>
>>>>>>>> Thanks for fixing this! I like the simple approach that 
>>>>>>>> minimizes code change.
>>>>>>>>
>>>>>>>> The following comment is misleading. It suggest that mirror 
>>>>>>>> objects cannot be restored at this stage, but doesn't explain why.
>>>>>>>>
>>>>>>>> Also, I don't think we should single out constant pools (the 
>>>>>>>> crash call stack had nothing to do with constant pool.)
>>>>>>>>
>>>>>>>> 2039     // No mirror object is access/restored during
>>>>>>>> 2040     // 
>>>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
>>>>>>>> 2041     // Mirrors are restored after java.lang.Class is loaded.
>>>>>>>> 2042     //
>>>>>>>> 2043     // 
>>>>>>>> Object_klass()->constants()->restore_unshareable_info() 
>>>>>>>> restores the
>>>>>>>> 2044     // constant pool resolved_references array for Object 
>>>>>>>> klass. It must be
>>>>>>>> 2045     // done after 
>>>>>>>> MetaspaceShared::fixup_mapped_heap_regions().
>>>>>>> All constant pool resolved_references arrays for shared classes 
>>>>>>> are archived (like the mirrors). Restoration of the 
>>>>>>> resolved_references (of the Object klass) array object is done 
>>>>>>> as part of the 
>>>>>>> Object_klass()->constants()->restore_unshareable_info(). 
>>>>>>> Although it wasn't shown in the backtrace of this specific 
>>>>>>> crash, it can be an issue as long it is done before the mapped 
>>>>>>> archive regions are fixed up. That's why 
>>>>>>> MetaspaceShared::fixup_mapped_heap_regions() needs to be called 
>>>>>>> before Object_klass()->constants()->restore_unshareable_info(). 
>>>>>>> Any operation to the archived java objects can be potentially 
>>>>>>> unsafe if it is done before 
>>>>>>> MetaspaceShared::fixup_mapped_heap_regions(). So we want to 
>>>>>>> place the MetaspaceShared::fixup_mapped_heap_regions() call as 
>>>>>>> early as possible but after SystemDictionary::Object_klass() is 
>>>>>>> set up. Please let me know if adding some of these explanations 
>>>>>>> would help make the comments more clear.
>>>>>>>
>>>>>> We should have concise comments that clearly describes the 
>>>>>> issues, and avoid repeating things that's already in the code. So 
>>>>>> sentences like "Fixup the mapped archive heap regions after 
>>>>>> resolving Object_klass" doesn't seem that useful.
>>>>>>
>>>>>> How about this:
>>>>>>
>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), 
>>>>>> scan, CHECK);
>>>>>>     // It's unsafe to access the archived heap regions before they
>>>>>>     // are fixed up, so we must do the fixup as early as possible 
>>>>>> before
>>>>>>     // the archived java objects are accessedby function such as
>>>>>>     // java_lang_Class::restore_archived_mirror
>>>>>>     // and ConstantPool::restore_unshareable_info.
>>>>>>     //
>>>>>>     // However, MetaspaceShared::fixup_mapped_heap_regions() 
>>>>>> fills the
>>>>>>     // leading empty spaces in the archived regions and may use
>>>>>>     // SystemDictionary::Object_klass(), so we can do this only 
>>>>>> after
>>>>>>     // Object_klass is resolved.
>>>>>>     //
>>>>>>     // Note: no mirror objects are accessed/restored in the above 
>>>>>> call.
>>>>>>     // Mirrors are restored after java.lang.Class is loaded.
>>>>>>     MetaspaceShared::fixup_mapped_heap_regions();
>>>>> This looks ok. I'll make adjustment as suggested. To prevent 
>>>>> Object_klass()->constants()->restore_unshareable_info() 
>>>>> accidentally being move above 
>>>>> MetaspaceShared::fixup_mapped_heap_regions() in the future, the 
>>>>> following comment should be included explicitly.
>>>>>
>>>>> 2043     // 
>>>>> Object_klass()->constants()->restore_unshareable_info() restores the
>>>>> 2044     // constant pool resolved_references array for Object 
>>>>> klass. It must be
>>>>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
>>>>
>>>> I think it's already mentioned above with "before .... 
>>>> ConstantPool::restore_unshareable_info. "
>>>>
>>>> I don't see any need for this extra verbosity. The longer the 
>>>> comment, the less likely people are going to read it. But if you 
>>>> insist, then go ahead and add it.
>>>>
>>>>>>
>>>>>>
>>>>>> BTW, while you're fixing this area, I think we need an assert here:
>>>>>>
>>>>>> void ConstantPool::restore_unshareable_info(TRAPS) {
>>>>>>   ..
>>>>>>   if (SystemDictionary::Object_klass_loaded()) {
>>>>>>     ...
>>>>>>   } else {
>>>>>> +  assert(pool_holder()->name() == vmSymbols::java_lang_Object(), 
>>>>>> "must be");
>>>>>>    }
>>>>> Ok.
>>>>>>
>>>>>> That's because you have special handling for the Object klass, 
>>>>>> but no other klasses. However, the check in 
>>>>>> ConstantPool::restore_unshareable_info currently doesn't 
>>>>>> explicitly match this.
>>>>>>
>>>>>>     // Initialize the constant pool for the Object_class
>>>>>> Object_klass()->constants()->restore_unshareable_info(CHECK);
>>>>>>
>>>>>>
>>>>>>> Just to give some background/history information to help 
>>>>>>> understand. The mapping/fixing-up of the archive heap region was 
>>>>>>> designed and implemented as part of the shared String support. 
>>>>>>> No archived String objects were accessed before the region fixup 
>>>>>>> operation. So this was not an issue. Later on, open archive heap 
>>>>>>> region was introduced and archiving supports for 
>>>>>>> resolved_reference arrays and mirrors were implemented. That 
>>>>>>> introduced the hidden issue because some of the archived mirrors 
>>>>>>> and resolved_references arrays were accessed early (before 
>>>>>>> region fixup). The issue was undiscovered because the mapped 
>>>>>>> region start usually was the same as the start of the GC region 
>>>>>>> that contains the mapped data. There was no gap at the top of 
>>>>>>> the mapped GC regions in normal cases. We recently started to 
>>>>>>> stress testing this area heavily with relocations for the 
>>>>>>> default CDS archive and the issue then surfaced.
>>>>>>>
>>>>>>>>
>>>>>>>> The real reason we have to do this is:
>>>>>>>>
>>>>>>>> + Before MetaspaceShared::fixup_mapped_heap_regions() is 
>>>>>>>> called, the heap is in an inconsistent state. At some G1 
>>>>>>>> regions, the bottom part contains all zeros. However, some GC 
>>>>>>>> operations (debug-build verifications only?) would need to walk 
>>>>>>>> G1 regions starting from the bottom, and will run into asserts 
>>>>>>>> when we get an uninitialized block.
>>>>>>> Just want to make it clear, only the mapped archived heap 
>>>>>>> regions may have this issue when the mapped region start is 
>>>>>>> different from the containing GC region's start. None of the 
>>>>>>> regular GC regions have such issue.
>>>>>>>
>>>>>>>>
>>>>>>>> + One of such operation is the barrier operations, as shown in 
>>>>>>>> the crash call stack in the bug report. (Maybe there are others??)
>>>>>>> Any operation to the archived java objects could have issue if 
>>>>>>> it is done before archive region fixup. That's why we want to do 
>>>>>>> the archive region fixup as early as possible.
>>>>>>>>
>>>>>>>> For safety, we should put in asserts such that -- before 
>>>>>>>> MetaspaceShared::fixup_mapped_heap_regions() is done, we must 
>>>>>>>> not do anything that would trigger barrier operations. For 
>>>>>>>> example, we must not call oopDesc::obj_field_put, which 
>>>>>>>> triggered the barrier operations in the crash stack.
>>>>>>>>
>>>>>>>> The crash is timing related and happened only when 
>>>>>>>> G1UpdateBufferSize was set to one. I think we need asserts so 
>>>>>>>> we can reliably catch any future problems. For example, if 
>>>>>>>> someone adds new code to modify obj fields in the archived heap 
>>>>>>>> before fixup_mapped_heap_regions() is called, we should get an 
>>>>>>>> assert immediately, instead of waiting for some future point 
>>>>>>>> that may or may not happen.
>>>>>>>>
>>>>>>>> I would suggest working with the GC folks for a suitable assert.
>>>>>>> Not sure if there is a central place within GC code can be used 
>>>>>>> for the assertion, as any operation could be an issue. We can 
>>>>>>> check with the GC experts.
>>>>>>>
>>>>>>> If we want to be extra cautious, we could map the archive heap 
>>>>>>> regions without access permission (PROT_NONE) initially. The 
>>>>>>> archive region fixup code can then reset memory protection and 
>>>>>>> makes it readable/writable. With that, any operations to the 
>>>>>>> archived java objects before fixup can trigger signal. This 
>>>>>>> probably should be done for debugging only build and should be 
>>>>>>> handled separately.
>>>>>>>
>>>>>>
>>>>>> Making the memory read-only initially seems a very good idea. I 
>>>>>> don't think we should put it off in a future RFE. This bug 
>>>>>> happened because we don't have a safeguard mechanism, so the bug 
>>>>>> fix should include such a mechanism.
>>>>> I briefly looked into it already. I think it requires some code 
>>>>> restructuring. The scope is larger and beyond the current bug fix.
>>>>>
>>>>> There is no object restoring before 
>>>>> MetaspaceShared::fixup_mapped_heap_regions() with the fix. To put 
>>>>> a safeguard, I can place an assert in the CDS code where it 
>>>>> materializes an archived object. That's the entry point to GC code.
>>>>>
>>>> That sounds good.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>> I'll make changes and update the webrev.
>>>>>
>>>>> Thanks,
>>>>> Jiangli
>>>>>
>>>>>>
>>>>>> I spend 2 days trying to debug this issue. I would hate to have 
>>>>>> someone go through this again in the future just because we have 
>>>>>> filed an RFE but never get to doing it.
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiangli
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/5/18 7:58 PM, Jiangli Zhou wrote:
>>>>>>>>> Please review the following fix for JDK-8209971, 
>>>>>>>>> TestOptionsWithRanges.java crashes in CDS mode with 
>>>>>>>>> G1UpdateBufferSize=1.
>>>>>>>>>
>>>>>>>>>     webrev: 
>>>>>>>>> http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>>>>>>>>>
>>>>>>>>>     bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>>>>>>>>>
>>>>>>>>> This is a pre-existing issue that's exposed when archived heap 
>>>>>>>>> object relocation is enabled. At CDS runtime, the archived 
>>>>>>>>> heap regions are mapped back to the runtime java heap. As the 
>>>>>>>>> mapped heap data may not occupy the entire runtime GC 
>>>>>>>>> region(s), the empty spaces within the regions are filled with 
>>>>>>>>> dummy objects, which is done by a fixup process after mapping 
>>>>>>>>> the data. The fixup was done after 
>>>>>>>>> SystemDictionary::initialize(), because fixup needs to access 
>>>>>>>>> SystemDictionary::Object_klass() (in some cases). However that 
>>>>>>>>> is too late as archived java objects are accessed when 
>>>>>>>>> SystemDictionary::initialize() resolves well-known classes.
>>>>>>>>>
>>>>>>>>> The solution is to call 
>>>>>>>>> MetaspaceShared::fixup_mapped_heap_regions() at an earlier 
>>>>>>>>> time. In the webre, the call is moved into 
>>>>>>>>> SystemDictionary::resolve_preloaded_classes(). It's called 
>>>>>>>>> after 
>>>>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), 
>>>>>>>>> but before any of the archived java objects is accessed. No 
>>>>>>>>> mirror object is access/restored during 
>>>>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...). 
>>>>>>>>> Archived mirrors are restored after java.lang.Class is loaded.
>>>>>>>>>
>>>>>>>>> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - 
>>>>>>>>> tier5 normal runs and tier1 - tier4 default CDS runs are in 
>>>>>>>>> progress.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Jiangli
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>



More information about the hotspot-runtime-dev mailing list