RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1
Jiangli Zhou
jiangli.zhou at oracle.com
Thu Sep 6 05:10:54 UTC 2018
Hi Ioi,
Thanks for the quick review.
On 9/5/18 8:46 PM, Ioi Lam wrote:
> Hi Jiangli,
>
> Thanks for fixing this! I like the simple approach that minimizes code
> change.
>
> The following comment is misleading. It suggest that mirror objects
> cannot be restored at this stage, but doesn't explain why.
>
> Also, I don't think we should single out constant pools (the crash
> call stack had nothing to do with constant pool.)
>
> 2039 // No mirror object is access/restored during
> 2040 //
> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
> 2041 // Mirrors are restored after java.lang.Class is loaded.
> 2042 //
> 2043 // Object_klass()->constants()->restore_unshareable_info()
> restores the
> 2044 // constant pool resolved_references array for Object klass.
> It must be
> 2045 // done after MetaspaceShared::fixup_mapped_heap_regions().
All constant pool resolved_references arrays for shared classes are
archived (like the mirrors). Restoration of the resolved_references (of
the Object klass) array object is done as part of the
Object_klass()->constants()->restore_unshareable_info(). Although it
wasn't shown in the backtrace of this specific crash, it can be an issue
as long it is done before the mapped archive regions are fixed up.
That's why MetaspaceShared::fixup_mapped_heap_regions() needs to be
called before Object_klass()->constants()->restore_unshareable_info().
Any operation to the archived java objects can be potentially unsafe if
it is done before MetaspaceShared::fixup_mapped_heap_regions(). So we
want to place the MetaspaceShared::fixup_mapped_heap_regions() call as
early as possible but after SystemDictionary::Object_klass() is set up.
Please let me know if adding some of these explanations would help make
the comments more clear.
Just to give some background/history information to help understand. The
mapping/fixing-up of the archive heap region was designed and
implemented as part of the shared String support. No archived String
objects were accessed before the region fixup operation. So this was not
an issue. Later on, open archive heap region was introduced and
archiving supports for resolved_reference arrays and mirrors were
implemented. That introduced the hidden issue because some of the
archived mirrors and resolved_references arrays were accessed early
(before region fixup). The issue was undiscovered because the mapped
region start usually was the same as the start of the GC region that
contains the mapped data. There was no gap at the top of the mapped GC
regions in normal cases. We recently started to stress testing this area
heavily with relocations for the default CDS archive and the issue then
surfaced.
>
> The real reason we have to do this is:
>
> + Before MetaspaceShared::fixup_mapped_heap_regions() is called, the
> heap is in an inconsistent state. At some G1 regions, the bottom part
> contains all zeros. However, some GC operations (debug-build
> verifications only?) would need to walk G1 regions starting from the
> bottom, and will run into asserts when we get an uninitialized block.
Just want to make it clear, only the mapped archived heap regions may
have this issue when the mapped region start is different from the
containing GC region's start. None of the regular GC regions have such
issue.
>
> + One of such operation is the barrier operations, as shown in the
> crash call stack in the bug report. (Maybe there are others??)
Any operation to the archived java objects could have issue if it is
done before archive region fixup. That's why we want to do the archive
region fixup as early as possible.
>
> For safety, we should put in asserts such that -- before
> MetaspaceShared::fixup_mapped_heap_regions() is done, we must not do
> anything that would trigger barrier operations. For example, we must
> not call oopDesc::obj_field_put, which triggered the barrier
> operations in the crash stack.
>
> The crash is timing related and happened only when G1UpdateBufferSize
> was set to one. I think we need asserts so we can reliably catch any
> future problems. For example, if someone adds new code to modify obj
> fields in the archived heap before fixup_mapped_heap_regions() is
> called, we should get an assert immediately, instead of waiting for
> some future point that may or may not happen.
>
> I would suggest working with the GC folks for a suitable assert.
Not sure if there is a central place within GC code can be used for the
assertion, as any operation could be an issue. We can check with the GC
experts.
If we want to be extra cautious, we could map the archive heap regions
without access permission (PROT_NONE) initially. The archive region
fixup code can then reset memory protection and makes it
readable/writable. With that, any operations to the archived java
objects before fixup can trigger signal. This probably should be done
for debugging only build and should be handled separately.
Thanks,
Jiangli
>
> Thanks
> - Ioi
>
>
> On 9/5/18 7:58 PM, Jiangli Zhou wrote:
>> Please review the following fix for JDK-8209971,
>> TestOptionsWithRanges.java crashes in CDS mode with
>> G1UpdateBufferSize=1.
>>
>> webrev: http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>>
>> This is a pre-existing issue that's exposed when archived heap object
>> relocation is enabled. At CDS runtime, the archived heap regions are
>> mapped back to the runtime java heap. As the mapped heap data may not
>> occupy the entire runtime GC region(s), the empty spaces within the
>> regions are filled with dummy objects, which is done by a fixup
>> process after mapping the data. The fixup was done after
>> SystemDictionary::initialize(), because fixup needs to access
>> SystemDictionary::Object_klass() (in some cases). However that is too
>> late as archived java objects are accessed when
>> SystemDictionary::initialize() resolves well-known classes.
>>
>> The solution is to call MetaspaceShared::fixup_mapped_heap_regions()
>> at an earlier time. In the webre, the call is moved into
>> SystemDictionary::resolve_preloaded_classes(). It's called after
>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), but
>> before any of the archived java objects is accessed. No mirror object
>> is access/restored during
>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
>> Archived mirrors are restored after java.lang.Class is loaded.
>>
>> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - tier5
>> normal runs and tier1 - tier4 default CDS runs are in progress.
>>
>> Thanks,
>>
>> Jiangli
>>
>>
>
More information about the hotspot-runtime-dev
mailing list