RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

Ioi Lam ioi.lam at oracle.com
Fri Sep 7 14:03:43 UTC 2018


Hi Jiangli,

This looks great! Thanks for coming up with a simple and clean fix.

- Ioi


On 9/6/18 7:05 PM, Jiangli Zhou wrote:
> Here is the updated webrev that addresses Ioi's comments: 
> http://cr.openjdk.java.net/~jiangli/8209971/webrev.01/.
>
> - Updated the comment above the 
> MetaspaceShared::fixup_mapped_heap_regions() call as suggested. I made 
> some minor adjustments.
>
> - Added the _archive_heap_region_fixed flag, which is set to true when 
> the archive heap regions are fixed. Added assert in 
> MetaspaceShared::materialize_archived_object() to make sure the mapped 
> archive regions are in good state when installing an archived java 
> object.
>
> Thanks,
>
> Jiangli
>
>
> On 9/6/18 11:11 AM, Ioi Lam wrote:
>>
>>
>> On 9/6/18 10:28 AM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>>
>>> On 9/6/18 12:04 AM, Ioi Lam wrote:
>>>>
>>>>
>>>> On 9/5/18 10:10 PM, Jiangli Zhou wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> Thanks for the quick review.
>>>>>
>>>>> On 9/5/18 8:46 PM, Ioi Lam wrote:
>>>>>> Hi Jiangli,
>>>>>>
>>>>>> Thanks for fixing this! I like the simple approach that minimizes 
>>>>>> code change.
>>>>>>
>>>>>> The following comment is misleading. It suggest that mirror 
>>>>>> objects cannot be restored at this stage, but doesn't explain why.
>>>>>>
>>>>>> Also, I don't think we should single out constant pools (the 
>>>>>> crash call stack had nothing to do with constant pool.)
>>>>>>
>>>>>> 2039     // No mirror object is access/restored during
>>>>>> 2040     // 
>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
>>>>>> 2041     // Mirrors are restored after java.lang.Class is loaded.
>>>>>> 2042     //
>>>>>> 2043     // 
>>>>>> Object_klass()->constants()->restore_unshareable_info() restores the
>>>>>> 2044     // constant pool resolved_references array for Object 
>>>>>> klass. It must be
>>>>>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
>>>>> All constant pool resolved_references arrays for shared classes 
>>>>> are archived (like the mirrors). Restoration of the 
>>>>> resolved_references (of the Object klass) array object is done as 
>>>>> part of the 
>>>>> Object_klass()->constants()->restore_unshareable_info(). Although 
>>>>> it wasn't shown in the backtrace of this specific crash, it can be 
>>>>> an issue as long it is done before the mapped archive regions are 
>>>>> fixed up. That's why MetaspaceShared::fixup_mapped_heap_regions() 
>>>>> needs to be called before 
>>>>> Object_klass()->constants()->restore_unshareable_info(). Any 
>>>>> operation to the archived java objects can be potentially unsafe 
>>>>> if it is done before MetaspaceShared::fixup_mapped_heap_regions(). 
>>>>> So we want to place the 
>>>>> MetaspaceShared::fixup_mapped_heap_regions() call as early as 
>>>>> possible but after SystemDictionary::Object_klass() is set up. 
>>>>> Please let me know if adding some of these explanations would help 
>>>>> make the comments more clear.
>>>>>
>>>> We should have concise comments that clearly describes the issues, 
>>>> and avoid repeating things that's already in the code. So sentences 
>>>> like "Fixup the mapped archive heap regions after resolving 
>>>> Object_klass" doesn't seem that useful.
>>>>
>>>> How about this:
>>>>
>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), scan, 
>>>> CHECK);
>>>>     // It's unsafe to access the archived heap regions before they
>>>>     // are fixed up, so we must do the fixup as early as possible 
>>>> before
>>>>     // the archived java objects are accessedby function such as
>>>>     // java_lang_Class::restore_archived_mirror
>>>>     // and ConstantPool::restore_unshareable_info.
>>>>     //
>>>>     // However, MetaspaceShared::fixup_mapped_heap_regions() fills the
>>>>     // leading empty spaces in the archived regions and may use
>>>>     // SystemDictionary::Object_klass(), so we can do this only after
>>>>     // Object_klass is resolved.
>>>>     //
>>>>     // Note: no mirror objects are accessed/restored in the above 
>>>> call.
>>>>     // Mirrors are restored after java.lang.Class is loaded.
>>>>     MetaspaceShared::fixup_mapped_heap_regions();
>>> This looks ok. I'll make adjustment as suggested. To prevent 
>>> Object_klass()->constants()->restore_unshareable_info() accidentally 
>>> being move above MetaspaceShared::fixup_mapped_heap_regions() in the 
>>> future, the following comment should be included explicitly.
>>>
>>> 2043     // Object_klass()->constants()->restore_unshareable_info() 
>>> restores the
>>> 2044     // constant pool resolved_references array for Object 
>>> klass. It must be
>>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
>>
>> I think it's already mentioned above with "before .... 
>> ConstantPool::restore_unshareable_info. "
>>
>> I don't see any need for this extra verbosity. The longer the 
>> comment, the less likely people are going to read it. But if you 
>> insist, then go ahead and add it.
>>
>>>>
>>>>
>>>> BTW, while you're fixing this area, I think we need an assert here:
>>>>
>>>> void ConstantPool::restore_unshareable_info(TRAPS) {
>>>>   ..
>>>>   if (SystemDictionary::Object_klass_loaded()) {
>>>>     ...
>>>>   } else {
>>>> +  assert(pool_holder()->name() == vmSymbols::java_lang_Object(), 
>>>> "must be");
>>>>    }
>>> Ok.
>>>>
>>>> That's because you have special handling for the Object klass, but 
>>>> no other klasses. However, the check in 
>>>> ConstantPool::restore_unshareable_info currently doesn't explicitly 
>>>> match this.
>>>>
>>>>     // Initialize the constant pool for the Object_class
>>>> Object_klass()->constants()->restore_unshareable_info(CHECK);
>>>>
>>>>
>>>>> Just to give some background/history information to help 
>>>>> understand. The mapping/fixing-up of the archive heap region was 
>>>>> designed and implemented as part of the shared String support. No 
>>>>> archived String objects were accessed before the region fixup 
>>>>> operation. So this was not an issue. Later on, open archive heap 
>>>>> region was introduced and archiving supports for 
>>>>> resolved_reference arrays and mirrors were implemented. That 
>>>>> introduced the hidden issue because some of the archived mirrors 
>>>>> and resolved_references arrays were accessed early (before region 
>>>>> fixup). The issue was undiscovered because the mapped region start 
>>>>> usually was the same as the start of the GC region that contains 
>>>>> the mapped data. There was no gap at the top of the mapped GC 
>>>>> regions in normal cases. We recently started to stress testing 
>>>>> this area heavily with relocations for the default CDS archive and 
>>>>> the issue then surfaced.
>>>>>
>>>>>>
>>>>>> The real reason we have to do this is:
>>>>>>
>>>>>> + Before MetaspaceShared::fixup_mapped_heap_regions() is called, 
>>>>>> the heap is in an inconsistent state. At some G1 regions, the 
>>>>>> bottom part contains all zeros. However, some GC operations 
>>>>>> (debug-build verifications only?) would need to walk G1 regions 
>>>>>> starting from the bottom, and will run into asserts when we get 
>>>>>> an uninitialized block.
>>>>> Just want to make it clear, only the mapped archived heap regions 
>>>>> may have this issue when the mapped region start is different from 
>>>>> the containing GC region's start. None of the regular GC regions 
>>>>> have such issue.
>>>>>
>>>>>>
>>>>>> + One of such operation is the barrier operations, as shown in 
>>>>>> the crash call stack in the bug report. (Maybe there are others??)
>>>>> Any operation to the archived java objects could have issue if it 
>>>>> is done before archive region fixup. That's why we want to do the 
>>>>> archive region fixup as early as possible.
>>>>>>
>>>>>> For safety, we should put in asserts such that -- before 
>>>>>> MetaspaceShared::fixup_mapped_heap_regions() is done, we must not 
>>>>>> do anything that would trigger barrier operations. For example, 
>>>>>> we must not call oopDesc::obj_field_put, which triggered the 
>>>>>> barrier operations in the crash stack.
>>>>>>
>>>>>> The crash is timing related and happened only when 
>>>>>> G1UpdateBufferSize was set to one. I think we need asserts so we 
>>>>>> can reliably catch any future problems. For example, if someone 
>>>>>> adds new code to modify obj fields in the archived heap before 
>>>>>> fixup_mapped_heap_regions() is called, we should get an assert 
>>>>>> immediately, instead of waiting for some future point that may or 
>>>>>> may not happen.
>>>>>>
>>>>>> I would suggest working with the GC folks for a suitable assert.
>>>>> Not sure if there is a central place within GC code can be used 
>>>>> for the assertion, as any operation could be an issue. We can 
>>>>> check with the GC experts.
>>>>>
>>>>> If we want to be extra cautious, we could map the archive heap 
>>>>> regions without access permission (PROT_NONE) initially. The 
>>>>> archive region fixup code can then reset memory protection and 
>>>>> makes it readable/writable. With that, any operations to the 
>>>>> archived java objects before fixup can trigger signal. This 
>>>>> probably should be done for debugging only build and should be 
>>>>> handled separately.
>>>>>
>>>>
>>>> Making the memory read-only initially seems a very good idea. I 
>>>> don't think we should put it off in a future RFE. This bug happened 
>>>> because we don't have a safeguard mechanism, so the bug fix should 
>>>> include such a mechanism.
>>> I briefly looked into it already. I think it requires some code 
>>> restructuring. The scope is larger and beyond the current bug fix.
>>>
>>> There is no object restoring before 
>>> MetaspaceShared::fixup_mapped_heap_regions() with the fix. To put a 
>>> safeguard, I can place an assert in the CDS code where it 
>>> materializes an archived object. That's the entry point to GC code.
>>>
>> That sounds good.
>>
>> Thanks
>> - Ioi
>>> I'll make changes and update the webrev.
>>>
>>> Thanks,
>>> Jiangli
>>>
>>>>
>>>> I spend 2 days trying to debug this issue. I would hate to have 
>>>> someone go through this again in the future just because we have 
>>>> filed an RFE but never get to doing it.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>>> Thanks,
>>>>> Jiangli
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>> On 9/5/18 7:58 PM, Jiangli Zhou wrote:
>>>>>>> Please review the following fix for JDK-8209971, 
>>>>>>> TestOptionsWithRanges.java crashes in CDS mode with 
>>>>>>> G1UpdateBufferSize=1.
>>>>>>>
>>>>>>>     webrev: http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>>>>>>>
>>>>>>>     bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>>>>>>>
>>>>>>> This is a pre-existing issue that's exposed when archived heap 
>>>>>>> object relocation is enabled. At CDS runtime, the archived heap 
>>>>>>> regions are mapped back to the runtime java heap. As the mapped 
>>>>>>> heap data may not occupy the entire runtime GC region(s), the 
>>>>>>> empty spaces within the regions are filled with dummy objects, 
>>>>>>> which is done by a fixup process after mapping the data. The 
>>>>>>> fixup was done after SystemDictionary::initialize(), because 
>>>>>>> fixup needs to access SystemDictionary::Object_klass() (in some 
>>>>>>> cases). However that is too late as archived java objects are 
>>>>>>> accessed when SystemDictionary::initialize() resolves well-known 
>>>>>>> classes.
>>>>>>>
>>>>>>> The solution is to call 
>>>>>>> MetaspaceShared::fixup_mapped_heap_regions() at an earlier time. 
>>>>>>> In the webre, the call is moved into 
>>>>>>> SystemDictionary::resolve_preloaded_classes(). It's called after 
>>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), 
>>>>>>> but before any of the archived java objects is accessed. No 
>>>>>>> mirror object is access/restored during 
>>>>>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...). 
>>>>>>> Archived mirrors are restored after java.lang.Class is loaded.
>>>>>>>
>>>>>>> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - 
>>>>>>> tier5 normal runs and tier1 - tier4 default CDS runs are in 
>>>>>>> progress.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jiangli
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-runtime-dev mailing list