Need help to fix a potential G1 crash in jdk11

Volker Simonis volker.simonis at gmail.com
Wed Jun 3 17:14:38 UTC 2020


Hi Erik,

thanks a lot for the quick response and the hint with ClassUnloading. I've
just started several runs of the test program with "-XX:-ClassUnloading".
I'll report back instantly once I have some results.

Best regards,
Volker

On Wed, Jun 3, 2020 at 5:26 PM Erik Österlund <erik.osterlund at oracle.com>
wrote:

> Hi Volker,
>
> In JDK 12, I changed quite a bit how G1 performs class unloading, to a
> new model.
> Since the verification runs just after class unloading, I guess it could
> be interesting
> to check if the error happens with -XX:-ClassUnloading as well. If not,
> then perhaps
> some of my class unloading changes for G1 in JDK 12 fixed the problem.
>
> Just a gut feeling...
>
> Thanks,
> /Erik
>
> On 2020-06-03 17:02, Volker Simonis wrote:
> > Hi,
> >
> > I would appreciate some help/advice for debugging a potential G1 crash in
> > jdk 11. The crash usually occurs when running a proprietary jar file for
> > about 20-30 minutes and it happens in various parts of the VM (C1- or
> > C2-compiled code, interpreter, GC). Because the crash locations are so
> > different and because the customer which reported the issue claimed that
> it
> > doesn't happen with Parallel GC, I thought it might be a G1 issue. I
> > couldn't reproduce the crash with jdk 12 and 14 (but with jdk 11 and
> > 11.0.7, OpenJDK and Oracle JDK). When looking at the G1 changes in jdk
> 12 I
> > couldn't find any apparent bug fix which potentially solves this problem
> > but it may have been solved by one of the many G1 changes which happened
> in
> > jdk 12.
> >
> > I did run the reproducer with "-XX:+UnlockDiagnosticVMOptions
> > -XX:+VerifyBeforeGC -XX:+VerifyAfterGC -XX:+VerifyDuringGC
> > -XX:+CheckJNICalls -XX:+G1VerifyRSetsDuringFullGC
> > -XX:+G1VerifyHeapRegionCodeRoots" and I indeed got verification errors
> (see
> > [1] for a complete hs_err file). Sometimes it's just a few fields
> pointing
> > to dead objects:
> >
> > [1035.782s][error][gc,verify         ] ----------
> > [1035.782s][error][gc,verify         ] Field 0x00000000fb509148 of live
> obj
> > 0x00000000fb509130 in region [0x00000000fb500000, 0x00000000fb600000)
> > [1035.782s][error][gc,verify         ] class name
> > org.antlr.v4.runtime.atn.ATNConfig
> > [1035.782s][error][gc,verify         ] points to dead obj
> > 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000)
> > [1035.782s][error][gc,verify         ] class name
> > org.antlr.v4.runtime.atn.SingletonPredictionContext
> > [1035.782s][error][gc,verify         ] ----------
> > [1035.783s][error][gc,verify         ] Field 0x00000000fb509168 of live
> obj
> > 0x00000000fb509150 in region [0x00000000fb500000, 0x00000000fb600000)
> > [1035.783s][error][gc,verify         ] class name
> > org.antlr.v4.runtime.atn.ATNConfig
> > [1035.783s][error][gc,verify         ] points to dead obj
> > 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000)
> > [1035.783s][error][gc,verify         ] class name
> > org.antlr.v4.runtime.atn.SingletonPredictionContext
> > [1035.783s][error][gc,verify         ] ----------
> > ...
> > [1043.928s][error][gc,verify         ] Heap Regions: E=young(eden),
> > S=young(survivor), O=old, HS=humongous(starts), HC=humongous(continues),
> > CS=collection set, F=free, A=archive, TAMS=top-at-mark-start (previous,
> > next)
> > ...
> > [1043.929s][error][gc,verify         ] |  79|0x00000000f9b00000,
> > 0x00000000f9bfffe8, 0x00000000f9c00000| 99%| O|  |TAMS
> 0x00000000f9bfffe8,
> > 0x00000000f9b00000| Updating
> > ...
> > [1043.971s][error][gc,verify         ] | 105|0x00000000fb500000,
> > 0x00000000fb54fc08, 0x00000000fb600000| 31%| S|CS|TAMS
> 0x00000000fb500000,
> > 0x00000000fb500000| Complete
> >
> > but I also got verification errors with more than 30000 fields of
> distinct
> > objects pointing to more than 1000 dead objects. How can that happen? Is
> > the verification always accurate or can this also be a problem with the
> > verification itself and I'm hunting the wrong problem?
> >
> > Sometimes I also saw verification errors where fields point to objects in
> > regions with "Untracked remset":
> >
> > [673.762s][error][gc,verify] ----------
> > [673.762s][error][gc,verify] Field 0x00000000fca49298 of live obj
> > 0x00000000fca49280 in region [0x00000000fca0000
> > 0, 0x00000000fcb00000)
> > [673.762s][error][gc,verify] class name
> org.antlr.v4.runtime.atn.ATNConfig
> > [673.762s][error][gc,verify] points to obj 0x00000000f9d5a9a0 in region
> > 81:(F)[0x00000000f9d00000,0x00000000f9d00000,0x00000000f9e00000] remset
> > Untracked
> > [673.762s][error][gc,verify] ----------
> >
> > But they are by far not that common like the pointers to dead objects.
> Once
> > I even saw a "Root location" pointing to a dead object:
> >
> > [369.808s][error][gc,verify] Root location 0x00007f35bb33f1f8 points to
> > dead obj 0x00000000f87fa200
> > [369.808s][error][gc,verify]
> org.antlr.v4.runtime.atn.PredictionContextCache
> > [369.808s][error][gc,verify] {0x00000000f87fa200} - klass:
> > 'org/antlr/v4/runtime/atn/PredictionContextCache'
> > [369.850s][error][gc,verify] ----------
> > [369.850s][error][gc,verify] Field 0x00000000fbc60900 of live obj
> > 0x00000000fbc608f0 in region [0x00000000fbc00000, 0x00000000fbd00000)
> > [369.850s][error][gc,verify] class name
> > org.antlr.v4.runtime.atn.ParserATNSimulator
> > [369.850s][error][gc,verify] points to dead obj 0x00000000f87fa200 in
> > region [0x00000000f8700000, 0x00000000f8800000)
> > [369.850s][error][gc,verify] class name
> > org.antlr.v4.runtime.atn.PredictionContextCache
> > [369.850s][error][gc,verify] ----------
> >
> > All these verification errors occur after the Remark phase in
> > G1ConcurrentMark::remark() at:
> >
> > verify_during_pause(G1HeapVerifier::G1VerifyRemark,
> > VerifyOption_G1UsePrevMarking, "Remark after");
> >
> > V  [libjvm.so+0x6ca186]  report_vm_error(char const*, int, char const*,
> > char const*, ...)+0x106
> > V  [libjvm.so+0x7d4a99]  G1HeapVerifier::verify(VerifyOption)+0x399
> > V  [libjvm.so+0xe128bb]  Universe::verify(VerifyOption, char
> const*)+0x16b
> > V  [libjvm.so+0x7d44ee]
> >   G1HeapVerifier::verify(G1HeapVerifier::G1VerifyType, VerifyOption, char
> > const*)+0x9e
> > V  [libjvm.so+0x7addcf]
> >   G1ConcurrentMark::verify_during_pause(G1HeapVerifier::G1VerifyType,
> > VerifyOption, char const*)+0x9f
> > V  [libjvm.so+0x7b172e]  G1ConcurrentMark::remark()+0x3be
> > V  [libjvm.so+0xe6a5e1]  VM_CGC_Operation::doit()+0x211
> > V  [libjvm.so+0xe69908]  VM_Operation::evaluate()+0xd8
> > V  [libjvm.so+0xe6713f]  VMThread::evaluate_operation(VM_Operation*)
> [clone
> > .constprop.54]+0xff
> > V  [libjvm.so+0xe6764e]  VMThread::loop()+0x3be
> > V  [libjvm.so+0xe67a7b]  VMThread::run()+0x7b
> >
> > The GC log output looks as follows:
> > ...
> > [1035.775s][info ][gc,verify,start   ] Verifying During GC (Remark after)
> > [1035.775s][debug][gc,verify         ] Threads
> > [1035.776s][debug][gc,verify         ] Heap
> > [1035.776s][debug][gc,verify         ] Roots
> > [1035.782s][debug][gc,verify         ] HeapRegionSets
> > [1035.782s][debug][gc,verify         ] HeapRegions
> > [1035.782s][error][gc,verify         ] ----------
> > ...
> > A more complete GC log can be found here [2].
> >
> > For the field 0x00000000fb509148 of live obj 0x00000000fb509130 which
> > points to the dead object 0x00000000f9ba39b0 I get the following
> > information if I inspect them with clhsdb:
> >
> > hsdb> inspect 0x00000000fb509130
> > instance of Oop for org/antlr/v4/runtime/atn/ATNConfig @
> 0x00000000fb509130
> > @ 0x00000000fb509130 (size = 32)
> > _mark: 13
> > _metadata._compressed_klass: InstanceKlass for
> > org/antlr/v4/runtime/atn/ATNConfig
> > state: Oop for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8
> Oop
> > for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8
> > alt: 1
> > context: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
> > 0x00000000f9ba39b0 Oop for
> > org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba39b0
> > reachesIntoOuterContext: 8
> > semanticContext: Oop for
> org/antlr/v4/runtime/atn/SemanticContext$Predicate
> > @ 0x00000000f83d57c0 Oop for
> > org/antlr/v4/runtime/atn/SemanticContext$Predicate @ 0x00000000f83d57c0
> >
> > hsdb> inspect 0x00000000f9ba39b0
> > instance of Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
> > 0x00000000f9ba39b0 @ 0x00000000f9ba39b0 (size = 32)
> > _mark: 41551306041
> > _metadata._compressed_klass: InstanceKlass for
> > org/antlr/v4/runtime/atn/SingletonPredictionContext
> > id: 100635259
> > cachedHashCode: 2005943142
> > parent: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
> > 0x00000000f9ba01b0 Oop for
> > org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba01b0
> > returnState: 18228
> >
> > I could also reproduce the verification errors with a fast debug build of
> > 11.0.7 which I did run with "-XX:+CheckCompressedOops -XX:+VerifyOops
> > -XX:+G1VerifyCTCleanup -XX:+G1VerifyBitmaps" in addition to the options
> > mentioned before, but unfortunaltey the run didn't trigger neither an
> > assertion nor a different verification error.
> >
> > So to summarize, my basic questions are:
> >   - has somebody else encountered similar crashes?
> >   - is someone aware of specific changes in jdk12 which might solve this
> > problem?
> >   - are the verification errors I'm seeing accurate or is it possible to
> get
> > false positives when running with -XX:Verify{Before,During,After}GC ?
> >
> > Thanks for your patience,
> > Volker
> >
> > [1]
> >
> http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/hs_err_pid28294.log
> > [2]
> >
> http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/verify-error.log
>
>



More information about the hotspot-gc-dev mailing list