RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

Calvin Cheung calvin.cheung at oracle.com
Fri Sep 7 17:13:23 UTC 2018


Hi Jiangli,

Fix looks good to me as well.

thanks,
Calvin

On 9/7/18, 7:03 AM, Ioi Lam wrote:
> Hi Jiangli,
>
> This looks great! Thanks for coming up with a simple and clean fix.
>
> - Ioi
>
>
> On 9/6/18 7:05 PM, Jiangli Zhou wrote:
>> Here is the updated webrev that addresses Ioi's comments: 
>> http://cr.openjdk.java.net/~jiangli/8209971/webrev.01/.
>>
>> - Updated the comment above the 
>> MetaspaceShared::fixup_mapped_heap_regions() call as suggested. I 
>> made some minor adjustments.
>>
>> - Added the _archive_heap_region_fixed flag, which is set to true 
>> when the archive heap regions are fixed. Added assert in 
>> MetaspaceShared::materialize_archived_object() to make sure the 
>> mapped archive regions are in good state when installing an archived 
>> java object.
>>
>> Thanks,
>>
>> Jiangli
>>
>>
>> On 9/6/18 11:11 AM, Ioi Lam wrote:
>>>
>>>
>>> On 9/6/18 10:28 AM, Jiangli Zhou wrote:
>>>> Hi Ioi,
>>>>
>>>>
>>>> On 9/6/18 12:04 AM, Ioi Lam wrote:
>>>>>
>>>>>
>>>>> On 9/5/18 10:10 PM, Jiangli Zhou wrote:
>>>>>> Hi Ioi,
>>>>>>
>>>>>> Thanks for the quick review.
>>>>>>
>>>>>> On 9/5/18 8:46 PM, Ioi Lam wrote:
>>>>>>> Hi Jiangli,
>>>>>>>
>>>>>>> Thanks for fixing this! I like the simple approach that 
>>>>>>> minimizes code change.
>>>>>>>
>>>>>>> The following comment is misleading. It suggest that mirror 
>>>>>>> objects cannot be restored at this stage, but doesn't explain why.
>>>>>>>
>>>>>>> Also, I don't think we should single out constant pools (the 
>>>>>>> crash call stack had nothing to do with constant pool.)
>>>>>>>
>>>>>>> 2039     // No mirror object is access/restored during
>>>>>>> 2040     // 
>>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
>>>>>>> 2041     // Mirrors are restored after java.lang.Class is loaded.
>>>>>>> 2042     //
>>>>>>> 2043     // 
>>>>>>> Object_klass()->constants()->restore_unshareable_info() restores 
>>>>>>> the
>>>>>>> 2044     // constant pool resolved_references array for Object 
>>>>>>> klass. It must be
>>>>>>> 2045     // done after 
>>>>>>> MetaspaceShared::fixup_mapped_heap_regions().
>>>>>> All constant pool resolved_references arrays for shared classes 
>>>>>> are archived (like the mirrors). Restoration of the 
>>>>>> resolved_references (of the Object klass) array object is done as 
>>>>>> part of the 
>>>>>> Object_klass()->constants()->restore_unshareable_info(). Although 
>>>>>> it wasn't shown in the backtrace of this specific crash, it can 
>>>>>> be an issue as long it is done before the mapped archive regions 
>>>>>> are fixed up. That's why 
>>>>>> MetaspaceShared::fixup_mapped_heap_regions() needs to be called 
>>>>>> before Object_klass()->constants()->restore_unshareable_info(). 
>>>>>> Any operation to the archived java objects can be potentially 
>>>>>> unsafe if it is done before 
>>>>>> MetaspaceShared::fixup_mapped_heap_regions(). So we want to place 
>>>>>> the MetaspaceShared::fixup_mapped_heap_regions() call as early as 
>>>>>> possible but after SystemDictionary::Object_klass() is set up. 
>>>>>> Please let me know if adding some of these explanations would 
>>>>>> help make the comments more clear.
>>>>>>
>>>>> We should have concise comments that clearly describes the issues, 
>>>>> and avoid repeating things that's already in the code. So 
>>>>> sentences like "Fixup the mapped archive heap regions after 
>>>>> resolving Object_klass" doesn't seem that useful.
>>>>>
>>>>> How about this:
>>>>>
>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), scan, 
>>>>> CHECK);
>>>>>     // It's unsafe to access the archived heap regions before they
>>>>>     // are fixed up, so we must do the fixup as early as possible 
>>>>> before
>>>>>     // the archived java objects are accessedby function such as
>>>>>     // java_lang_Class::restore_archived_mirror
>>>>>     // and ConstantPool::restore_unshareable_info.
>>>>>     //
>>>>>     // However, MetaspaceShared::fixup_mapped_heap_regions() fills 
>>>>> the
>>>>>     // leading empty spaces in the archived regions and may use
>>>>>     // SystemDictionary::Object_klass(), so we can do this only after
>>>>>     // Object_klass is resolved.
>>>>>     //
>>>>>     // Note: no mirror objects are accessed/restored in the above 
>>>>> call.
>>>>>     // Mirrors are restored after java.lang.Class is loaded.
>>>>>     MetaspaceShared::fixup_mapped_heap_regions();
>>>> This looks ok. I'll make adjustment as suggested. To prevent 
>>>> Object_klass()->constants()->restore_unshareable_info() 
>>>> accidentally being move above 
>>>> MetaspaceShared::fixup_mapped_heap_regions() in the future, the 
>>>> following comment should be included explicitly.
>>>>
>>>> 2043     // Object_klass()->constants()->restore_unshareable_info() 
>>>> restores the
>>>> 2044     // constant pool resolved_references array for Object 
>>>> klass. It must be
>>>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
>>>
>>> I think it's already mentioned above with "before .... 
>>> ConstantPool::restore_unshareable_info. "
>>>
>>> I don't see any need for this extra verbosity. The longer the 
>>> comment, the less likely people are going to read it. But if you 
>>> insist, then go ahead and add it.
>>>
>>>>>
>>>>>
>>>>> BTW, while you're fixing this area, I think we need an assert here:
>>>>>
>>>>> void ConstantPool::restore_unshareable_info(TRAPS) {
>>>>>   ..
>>>>>   if (SystemDictionary::Object_klass_loaded()) {
>>>>>     ...
>>>>>   } else {
>>>>> +  assert(pool_holder()->name() == vmSymbols::java_lang_Object(), 
>>>>> "must be");
>>>>>    }
>>>> Ok.
>>>>>
>>>>> That's because you have special handling for the Object klass, but 
>>>>> no other klasses. However, the check in 
>>>>> ConstantPool::restore_unshareable_info currently doesn't 
>>>>> explicitly match this.
>>>>>
>>>>>     // Initialize the constant pool for the Object_class
>>>>> Object_klass()->constants()->restore_unshareable_info(CHECK);
>>>>>
>>>>>
>>>>>> Just to give some background/history information to help 
>>>>>> understand. The mapping/fixing-up of the archive heap region was 
>>>>>> designed and implemented as part of the shared String support. No 
>>>>>> archived String objects were accessed before the region fixup 
>>>>>> operation. So this was not an issue. Later on, open archive heap 
>>>>>> region was introduced and archiving supports for 
>>>>>> resolved_reference arrays and mirrors were implemented. That 
>>>>>> introduced the hidden issue because some of the archived mirrors 
>>>>>> and resolved_references arrays were accessed early (before region 
>>>>>> fixup). The issue was undiscovered because the mapped region 
>>>>>> start usually was the same as the start of the GC region that 
>>>>>> contains the mapped data. There was no gap at the top of the 
>>>>>> mapped GC regions in normal cases. We recently started to stress 
>>>>>> testing this area heavily with relocations for the default CDS 
>>>>>> archive and the issue then surfaced.
>>>>>>
>>>>>>>
>>>>>>> The real reason we have to do this is:
>>>>>>>
>>>>>>> + Before MetaspaceShared::fixup_mapped_heap_regions() is called, 
>>>>>>> the heap is in an inconsistent state. At some G1 regions, the 
>>>>>>> bottom part contains all zeros. However, some GC operations 
>>>>>>> (debug-build verifications only?) would need to walk G1 regions 
>>>>>>> starting from the bottom, and will run into asserts when we get 
>>>>>>> an uninitialized block.
>>>>>> Just want to make it clear, only the mapped archived heap regions 
>>>>>> may have this issue when the mapped region start is different 
>>>>>> from the containing GC region's start. None of the regular GC 
>>>>>> regions have such issue.
>>>>>>
>>>>>>>
>>>>>>> + One of such operation is the barrier operations, as shown in 
>>>>>>> the crash call stack in the bug report. (Maybe there are others??)
>>>>>> Any operation to the archived java objects could have issue if it 
>>>>>> is done before archive region fixup. That's why we want to do the 
>>>>>> archive region fixup as early as possible.
>>>>>>>
>>>>>>> For safety, we should put in asserts such that -- before 
>>>>>>> MetaspaceShared::fixup_mapped_heap_regions() is done, we must 
>>>>>>> not do anything that would trigger barrier operations. For 
>>>>>>> example, we must not call oopDesc::obj_field_put, which 
>>>>>>> triggered the barrier operations in the crash stack.
>>>>>>>
>>>>>>> The crash is timing related and happened only when 
>>>>>>> G1UpdateBufferSize was set to one. I think we need asserts so we 
>>>>>>> can reliably catch any future problems. For example, if someone 
>>>>>>> adds new code to modify obj fields in the archived heap before 
>>>>>>> fixup_mapped_heap_regions() is called, we should get an assert 
>>>>>>> immediately, instead of waiting for some future point that may 
>>>>>>> or may not happen.
>>>>>>>
>>>>>>> I would suggest working with the GC folks for a suitable assert.
>>>>>> Not sure if there is a central place within GC code can be used 
>>>>>> for the assertion, as any operation could be an issue. We can 
>>>>>> check with the GC experts.
>>>>>>
>>>>>> If we want to be extra cautious, we could map the archive heap 
>>>>>> regions without access permission (PROT_NONE) initially. The 
>>>>>> archive region fixup code can then reset memory protection and 
>>>>>> makes it readable/writable. With that, any operations to the 
>>>>>> archived java objects before fixup can trigger signal. This 
>>>>>> probably should be done for debugging only build and should be 
>>>>>> handled separately.
>>>>>>
>>>>>
>>>>> Making the memory read-only initially seems a very good idea. I 
>>>>> don't think we should put it off in a future RFE. This bug 
>>>>> happened because we don't have a safeguard mechanism, so the bug 
>>>>> fix should include such a mechanism.
>>>> I briefly looked into it already. I think it requires some code 
>>>> restructuring. The scope is larger and beyond the current bug fix.
>>>>
>>>> There is no object restoring before 
>>>> MetaspaceShared::fixup_mapped_heap_regions() with the fix. To put a 
>>>> safeguard, I can place an assert in the CDS code where it 
>>>> materializes an archived object. That's the entry point to GC code.
>>>>
>>> That sounds good.
>>>
>>> Thanks
>>> - Ioi
>>>> I'll make changes and update the webrev.
>>>>
>>>> Thanks,
>>>> Jiangli
>>>>
>>>>>
>>>>> I spend 2 days trying to debug this issue. I would hate to have 
>>>>> someone go through this again in the future just because we have 
>>>>> filed an RFE but never get to doing it.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>> On 9/5/18 7:58 PM, Jiangli Zhou wrote:
>>>>>>>> Please review the following fix for JDK-8209971, 
>>>>>>>> TestOptionsWithRanges.java crashes in CDS mode with 
>>>>>>>> G1UpdateBufferSize=1.
>>>>>>>>
>>>>>>>>     webrev: http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>>>>>>>>
>>>>>>>>     bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>>>>>>>>
>>>>>>>> This is a pre-existing issue that's exposed when archived heap 
>>>>>>>> object relocation is enabled. At CDS runtime, the archived heap 
>>>>>>>> regions are mapped back to the runtime java heap. As the mapped 
>>>>>>>> heap data may not occupy the entire runtime GC region(s), the 
>>>>>>>> empty spaces within the regions are filled with dummy objects, 
>>>>>>>> which is done by a fixup process after mapping the data. The 
>>>>>>>> fixup was done after SystemDictionary::initialize(), because 
>>>>>>>> fixup needs to access SystemDictionary::Object_klass() (in some 
>>>>>>>> cases). However that is too late as archived java objects are 
>>>>>>>> accessed when SystemDictionary::initialize() resolves 
>>>>>>>> well-known classes.
>>>>>>>>
>>>>>>>> The solution is to call 
>>>>>>>> MetaspaceShared::fixup_mapped_heap_regions() at an earlier 
>>>>>>>> time. In the webre, the call is moved into 
>>>>>>>> SystemDictionary::resolve_preloaded_classes(). It's called 
>>>>>>>> after 
>>>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), but 
>>>>>>>> before any of the archived java objects is accessed. No mirror 
>>>>>>>> object is access/restored during 
>>>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...). Archived 
>>>>>>>> mirrors are restored after java.lang.Class is loaded.
>>>>>>>>
>>>>>>>> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - 
>>>>>>>> tier5 normal runs and tier1 - tier4 default CDS runs are in 
>>>>>>>> progress.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Jiangli
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


More information about the hotspot-runtime-dev mailing list