Tracking potential GC bugs
Jarkko Miettinen
jarkko.miettinen at relex.fi
Tue Nov 14 13:08:07 UTC 2017
Any hints on better forums for soliciting advice?
On 20/09/2017 11.12, Jarkko Miettinen wrote:
> Hello all,
>
> I am not sure if this is the best forum for soliciting advice on how
> to hunt potential GC bugs, but this was the best I could come up with.
>
> Ideas about better forums are welcome.
>
> This post is about bugs
> https://bugs.openjdk.java.net/browse/JDK-8172756 and
> https://bugs.openjdk.java.net/browse/JDK-8143310
> both of which we're seeing when using G1 GC. We've seen this problem
> on update releases 92, 112, 131 and 141 of JDK 8.
>
> Currently, we're able to reproduce the crash maybe once in three days
> by running the whole application and mimicking actual usage with
> scripts, with no hope in sight for any shorter / simpler reproduction.
>
> As the crash was in oopDesc::size(), we tried back-porting JDK-8168914
> even though our crash was elsewhere. We also added a memory fence to
> the reads and writes of the class pointer, and then tried to detect
> whether the pointed-to class was invalid (with
> Metaspace::contains(obj->klass())).
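To make the idea of that check concrete, here is a minimal standalone C++ sketch. It is not HotSpot code and not the contents of the changeset: FakeKlass, FakeObject, MetadataRange, make_range, publish_klass and checked_klass are all hypothetical names, and the single contiguous range is an assumption standing in for whatever a real "contains" query covers. It only shows the pattern of loading the class pointer with acquire ordering and range-checking it before anything dereferences it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdio>

// All names here are hypothetical stand-ins, not HotSpot types.
struct FakeKlass {
    int size_in_words;
};

struct FakeObject {
    std::atomic<FakeKlass*> klass{nullptr};  // models the object's class pointer
};

// Models one contiguous metadata area that can answer "do you contain p?".
struct MetadataRange {
    std::uintptr_t start;
    std::uintptr_t end;  // exclusive

    bool contains(const void* p) const {
        const auto a = reinterpret_cast<std::uintptr_t>(p);
        return a >= start && a < end;
    }
};

// Build a range from the first element and the one-past-the-end pointer.
MetadataRange make_range(const FakeKlass* first, const FakeKlass* past_last) {
    assert(first <= past_last);
    return MetadataRange{reinterpret_cast<std::uintptr_t>(first),
                         reinterpret_cast<std::uintptr_t>(past_last)};
}

// Writer side: publish the class pointer with release ordering.
void publish_klass(FakeObject& obj, FakeKlass* k) {
    obj.klass.store(k, std::memory_order_release);
}

// Reader side: acquire-load the class pointer and validate it before use.
// Returns nullptr (and logs) instead of handing back a pointer that lies
// outside the known metadata range.
FakeKlass* checked_klass(const FakeObject& obj, const MetadataRange& meta) {
    FakeKlass* k = obj.klass.load(std::memory_order_acquire);
    if (!meta.contains(k)) {
        std::fprintf(stderr, "suspicious class pointer %p\n",
                     static_cast<void*>(k));
        return nullptr;
    }
    return k;
}
```

A check like this can only flag pointers that fall outside the range. A garbled pointer that still happens to land inside it would pass, so a clean run does not prove the pointer was valid.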
>
> These changes can be seen in this changeset:
> https://gist.github.com/jmiettinen/3ae14b2cfa509a0f17efb35e5503c17b
>
> If I've understood the JDK code correctly, the oops for which the
> size() call crashes come from situations where GC goes through some
> set of objects (let's call them BadObjects), marking everything they
> refer to grey / copying it to survivor space.
>
> So we'll end up with something like this:
>
> class BadObject {
>     char* ptr;  // reference field that GC follows
> };
>
> where bad_object.ptr points to some garbled value.
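As a toy illustration of that picture, the following standalone C++ sketch scans a set of holder objects and reports the ones whose reference field cannot be a valid heap reference, i.e. it identifies the BadObject itself and not the garbage it points to. ToyObject, ToyHeap, heap_of, plausible_ref and find_bad_holders are hypothetical names, and the "heap" is assumed to be one contiguous array. None of this reflects G1's real data structures.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model; not HotSpot's object layout.
struct ToyObject {
    ToyObject* ref;  // the single reference field, like BadObject.ptr
};

struct ToyHeap {
    std::uintptr_t start;
    std::uintptr_t end;  // exclusive
};

// Treat the vector's backing storage as the whole heap.
ToyHeap heap_of(const std::vector<ToyObject>& objects) {
    const auto start = reinterpret_cast<std::uintptr_t>(objects.data());
    return ToyHeap{start, start + objects.size() * sizeof(ToyObject)};
}

// A reference is plausible if it is null, or properly aligned and
// inside the heap bounds.
bool plausible_ref(const ToyHeap& heap, const ToyObject* p) {
    if (p == nullptr) return true;
    const auto a = reinterpret_cast<std::uintptr_t>(p);
    return a >= heap.start && a < heap.end && a % alignof(ToyObject) == 0;
}

// Return the indices of holders whose reference field looks garbled.
std::vector<std::size_t> find_bad_holders(const ToyHeap& heap,
                                          const std::vector<ToyObject>& holders) {
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < holders.size(); ++i) {
        if (!plausible_ref(heap, holders[i].ref)) bad.push_back(i);
    }
    return bad;
}
```

Knowing which holder carries the bad field (its class, its age, which region it lives in) is what would help separate the hypotheses, since the garbled target by itself says little about who wrote it.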
>
> This raises at least the following hypotheses:
>
> 1. Some stage of garbage collection misses updating references in a
> BadObject. I don't know if G1 does that kind of pointer updating.
>
> 2. Some part of the software (native code, anything using Unsafe,
> miscompiled Java code) garbles the pointer.
>
> For the first hypothesis, we've so far tried turning on
> _hrm.verify_optional() and verify_region_sets_optional() in
> G1CollectedHeap::do_collection_pause_at_safepoint in production,
> but they have not caught any irregularities.
>
> Could there be other causes? Are there any suggestions for next steps
> given how hard the reproduction is?
>
> We're unable to move to JDK9 and try reproduction there as we're
> running JRuby and it's not working at the moment with JDK9.
>
> The JVM parameters used are:
>
> -Xms3000G -Xmx3000G -XX:MaxPermSize=512m
> -XX:ReservedCodeCacheSize=512m -XX:+UseCodeCacheFlushing
> -XX:MaxDirectMemorySize=20G -XX:AutoBoxCacheMax=8192
> -XX:MetaspaceSize=512M -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions
> -XX:G1NewSizePercent=1 -XX:G1MaxNewSizePercent=80
> -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1HeapWastePercent=5
> -XX:G1MixedGCCountTarget=4 -XX:MaxGCPauseMillis=3000 -verbose:gc
> -XX:-PrintGCTimeStamps -XX:+PrintGCDateStamps
> -XX:+PrintTenuringDistribution -XX:G1ReservePercent=20
> -XX:SurvivorRatio=1 -XX:+UseGCOverheadLimit
> -XX:SoftRefLRUPolicyMSPerMB=10
> -Xloggc:/opt/apps/customer/shared/log/gc.log
> -XX:-HeapDumpOnOutOfMemoryError -Djruby.compile.invokedynamic=false
> -Djruby.ji.objectProxyCache=false
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use