RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

Thu Sep 6 03:46:25 UTC 2018

Hi Jiangli,

Thanks for fixing this! I like the simple approach that minimizes code 
change.

The following comment is misleading. It suggest that mirror objects 
cannot be restored at this stage, but doesn't explain why.

Also, I don't think we should single out constant pools (the crash call 
stack had nothing to do with constant pool.)

2039     // No mirror object is access/restored during
2040     // resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
2041     // Mirrors are restored after java.lang.Class is loaded.
2042     //
2043     // Object_klass()->constants()->restore_unshareable_info() 
restores the
2044     // constant pool resolved_references array for Object klass. It 
must be
2045     // done after MetaspaceShared::fixup_mapped_heap_regions().

The real reason we have to do this is:

+ Before MetaspaceShared::fixup_mapped_heap_regions() is called, the 
heap is in an inconsistent state. At some G1 regions, the bottom part 
contains all zeros. However, some GC operations (debug-build 
verifications only?) would need to walk G1 regions starting from the 
bottom, and will run into asserts when we get an uninitialized block.

+ One of such operation is the barrier operations, as shown in the crash 
call stack in the bug report. (Maybe there are others??)

For safety, we should put in asserts such that -- before 
MetaspaceShared::fixup_mapped_heap_regions() is done, we must not do 
anything that would trigger barrier operations. For example, we must not 
call oopDesc::obj_field_put, which triggered the barrier operations in 
the crash stack.

The crash is timing related and happened only when G1UpdateBufferSize 
was set to one. I think we need asserts so we can reliably catch any 
future problems. For example, if someone adds new code to modify obj 
fields in the archived heap before fixup_mapped_heap_regions() is 
called, we should get an assert immediately, instead of waiting for some 
future point that may or may not happen.

I would suggest working with the GC folks for a suitable assert.

Thanks
- Ioi

On 9/5/18 7:58 PM, Jiangli Zhou wrote:
> Please review the following fix for JDK-8209971, 
> TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1.
>
>     webrev: http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>
>     bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>
> This is a pre-existing issue that's exposed when archived heap object 
> relocation is enabled. At CDS runtime, the archived heap regions are 
> mapped back to the runtime java heap. As the mapped heap data may not 
> occupy the entire runtime GC region(s), the empty spaces within the 
> regions are filled with dummy objects, which is done by a fixup 
> process after mapping the data. The fixup was done after 
> SystemDictionary::initialize(), because fixup needs to access 
> SystemDictionary::Object_klass() (in some cases). However that is too 
> late as archived java objects are accessed when 
> SystemDictionary::initialize() resolves well-known classes.
>
> The solution is to call MetaspaceShared::fixup_mapped_heap_regions() 
> at an earlier time. In the webre, the call is moved into 
> SystemDictionary::resolve_preloaded_classes(). It's called after 
> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), but 
> before any of the archived java objects is accessed. No mirror object 
> is access/restored during 
> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...). 
> Archived mirrors are restored after java.lang.Class is loaded.
>
> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - tier5 
> normal runs and tier1 - tier4 default CDS runs are in progress.
>
> Thanks,
>
> Jiangli
>
>