RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

Jiangli Zhou jiangli.zhou at oracle.com
Thu Sep 6 05:10:54 UTC 2018


Hi Ioi,

Thanks for the quick review.

On 9/5/18 8:46 PM, Ioi Lam wrote:
> Hi Jiangli,
>
> Thanks for fixing this! I like the simple approach that minimizes code 
> change.
>
> The following comment is misleading. It suggest that mirror objects 
> cannot be restored at this stage, but doesn't explain why.
>
> Also, I don't think we should single out constant pools (the crash 
> call stack had nothing to do with constant pool.)
>
> 2039     // No mirror object is access/restored during
> 2040     // 
> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
> 2041     // Mirrors are restored after java.lang.Class is loaded.
> 2042     //
> 2043     // Object_klass()->constants()->restore_unshareable_info() 
> restores the
> 2044     // constant pool resolved_references array for Object klass. 
> It must be
> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
All constant pool resolved_references arrays for shared classes are 
archived (like the mirrors). Restoration of the resolved_references (of 
the Object klass) array object is done as part of the 
Object_klass()->constants()->restore_unshareable_info(). Although it 
wasn't shown in the backtrace of this specific crash, it can be an issue 
as long it is done before the mapped archive regions are fixed up. 
That's why MetaspaceShared::fixup_mapped_heap_regions() needs to be 
called before Object_klass()->constants()->restore_unshareable_info(). 
Any operation to the archived java objects can be potentially unsafe if 
it is done before MetaspaceShared::fixup_mapped_heap_regions(). So we 
want to place the MetaspaceShared::fixup_mapped_heap_regions() call as 
early as possible but after SystemDictionary::Object_klass() is set up. 
Please let me know if adding some of these explanations would help make 
the comments more clear.

Just to give some background/history information to help understand. The 
mapping/fixing-up of the archive heap region was designed and 
implemented as part of the shared String support. No archived String 
objects were accessed before the region fixup operation. So this was not 
an issue. Later on, open archive heap region was introduced and 
archiving supports for resolved_reference arrays and mirrors were 
implemented. That introduced the hidden issue because some of the 
archived mirrors and resolved_references arrays were accessed early 
(before region fixup). The issue was undiscovered because the mapped 
region start usually was the same as the start of the GC region that 
contains the mapped data. There was no gap at the top of the mapped GC 
regions in normal cases. We recently started to stress testing this area 
heavily with relocations for the default CDS archive and the issue then 
surfaced.

>
> The real reason we have to do this is:
>
> + Before MetaspaceShared::fixup_mapped_heap_regions() is called, the 
> heap is in an inconsistent state. At some G1 regions, the bottom part 
> contains all zeros. However, some GC operations (debug-build 
> verifications only?) would need to walk G1 regions starting from the 
> bottom, and will run into asserts when we get an uninitialized block.
Just want to make it clear, only the mapped archived heap regions may 
have this issue when the mapped region start is different from the 
containing GC region's start. None of the regular GC regions have such 
issue.

>
> + One of such operation is the barrier operations, as shown in the 
> crash call stack in the bug report. (Maybe there are others??)
Any operation to the archived java objects could have issue if it is 
done before archive region fixup. That's why we want to do the archive 
region fixup as early as possible.
>
> For safety, we should put in asserts such that -- before 
> MetaspaceShared::fixup_mapped_heap_regions() is done, we must not do 
> anything that would trigger barrier operations. For example, we must 
> not call oopDesc::obj_field_put, which triggered the barrier 
> operations in the crash stack.
>
> The crash is timing related and happened only when G1UpdateBufferSize 
> was set to one. I think we need asserts so we can reliably catch any 
> future problems. For example, if someone adds new code to modify obj 
> fields in the archived heap before fixup_mapped_heap_regions() is 
> called, we should get an assert immediately, instead of waiting for 
> some future point that may or may not happen.
>
> I would suggest working with the GC folks for a suitable assert.
Not sure if there is a central place within GC code can be used for the 
assertion, as any operation could be an issue. We can check with the GC 
experts.

If we want to be extra cautious, we could map the archive heap regions 
without access permission (PROT_NONE) initially. The archive region 
fixup code can then reset memory protection and makes it 
readable/writable. With that, any operations to the archived java 
objects before fixup can trigger signal. This probably should be done 
for debugging only build and should be handled separately.

Thanks,
Jiangli

>
> Thanks
> - Ioi
>
>
> On 9/5/18 7:58 PM, Jiangli Zhou wrote:
>> Please review the following fix for JDK-8209971, 
>> TestOptionsWithRanges.java crashes in CDS mode with 
>> G1UpdateBufferSize=1.
>>
>>     webrev: http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>>
>>     bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>>
>> This is a pre-existing issue that's exposed when archived heap object 
>> relocation is enabled. At CDS runtime, the archived heap regions are 
>> mapped back to the runtime java heap. As the mapped heap data may not 
>> occupy the entire runtime GC region(s), the empty spaces within the 
>> regions are filled with dummy objects, which is done by a fixup 
>> process after mapping the data. The fixup was done after 
>> SystemDictionary::initialize(), because fixup needs to access 
>> SystemDictionary::Object_klass() (in some cases). However that is too 
>> late as archived java objects are accessed when 
>> SystemDictionary::initialize() resolves well-known classes.
>>
>> The solution is to call MetaspaceShared::fixup_mapped_heap_regions() 
>> at an earlier time. In the webre, the call is moved into 
>> SystemDictionary::resolve_preloaded_classes(). It's called after 
>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), but 
>> before any of the archived java objects is accessed. No mirror object 
>> is access/restored during 
>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...). 
>> Archived mirrors are restored after java.lang.Class is loaded.
>>
>> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - tier5 
>> normal runs and tier1 - tier4 default CDS runs are in progress.
>>
>> Thanks,
>>
>> Jiangli
>>
>>
>



More information about the hotspot-runtime-dev mailing list