Tracking potential GC bugs

Jarkko Miettinen jarkko.miettinen at relex.fi
Tue Nov 14 13:08:07 UTC 2017


Any hints about better forums in which to solicit advice?

On 20/09/2017 11.12, Jarkko Miettinen wrote:
> Hello all,
>
> I am not sure if this is the best forum for soliciting advice on how 
> to hunt potential GC bugs, but this was the best I could come up with.
>
> Ideas about better forums are welcome.
>
> This post is about bugs
> https://bugs.openjdk.java.net/browse/JDK-8172756 and
> https://bugs.openjdk.java.net/browse/JDK-8143310,
> both of which we are seeing when using G1 GC. We have seen this problem
> on JDK 8 update releases 92, 112, 131 and 141.
>
> Currently, we are usually able to reproduce the crash only about once
> every three days, by running the whole application and mimicking
> actual usage with scripts, with no hope in sight of a shorter or
> simpler reproduction.
>
> As the crash was in oopDesc::size(), we tried back-porting JDK-8168914
> even though our crash site was elsewhere, added memory fences around
> reading/writing the class, and then tried to identify whether the
> actual pointed-to class was invalid (with
> Metaspace::contains(obj->klass())).
>
> These changes can be seen in this changeset: 
> https://gist.github.com/jmiettinen/3ae14b2cfa509a0f17efb35e5503c17b
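A minimal sketch of the fence-plus-validity-check idea, written as standalone C++ rather than HotSpot code. `Klass`, `MetaspaceRange`, `ObjHeader` and `checked_size` are hypothetical stand-ins, and `std::atomic` release/acquire ordering is used instead of whatever primitives the changeset itself uses; it only illustrates the shape of the check, not the actual patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a class descriptor (HotSpot calls it Klass).
struct Klass {
  int instance_size_words;
};

// Hypothetical address range holding class metadata; plays the role
// Metaspace::contains() plays in the mail.
struct MetaspaceRange {
  std::uintptr_t begin;
  std::uintptr_t end;  // exclusive

  bool contains(const void* p) const {
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return a >= begin && a < end;
  }
};

// Hypothetical object header. The class pointer is stored with release
// and loaded with acquire ordering, so a reader that observes the pointer
// also observes every write made before it was published.
struct ObjHeader {
  std::atomic<const Klass*> klass{nullptr};

  void set_klass(const Klass* k) { klass.store(k, std::memory_order_release); }
  const Klass* load_klass() const { return klass.load(std::memory_order_acquire); }
};

// Size in words, or -1 when the class pointer is null or points outside
// the metadata range; reports the problem instead of dereferencing it.
int checked_size(const ObjHeader& obj, const MetaspaceRange& meta) {
  const Klass* k = obj.load_klass();
  if (k == nullptr || !meta.contains(k)) {
    return -1;
  }
  return k->instance_size_words;
}
```

The point of returning -1 instead of crashing is that a diagnostic build can log the offending object and its neighbours at the moment the bad class pointer is first seen.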
>
> If I have understood the JDK code correctly, the OOPs for which the
> size() call crashes come from situations where the GC goes through
> some set of objects (let's call them BadObjects), marking everything
> they refer to grey / copying it to survivor space.
>
> So we'll end up with something like this:
>
> class BadObject {
>     char* ptr;
> };
>
> where bad_object.ptr points to some garbled value.
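To make the BadObject scenario concrete, here is a toy marking loop, a sketch under assumed names (`Obj`, `ToyHeap` and `mark_from` are hypothetical, not G1 code). Before following a reference it checks that the target is the start of an object inside the heap, and it records the referring object instead of dereferencing a garbled pointer the way a size() call would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy object with one reference field (like BadObject::ptr) and a mark bit.
struct Obj {
  Obj* ref = nullptr;
  bool marked = false;
};

// Hypothetical heap: one contiguous array of equally sized objects.
struct ToyHeap {
  std::vector<Obj> objs;

  explicit ToyHeap(std::size_t n) : objs(n) {}

  // True only if p points at the start of some object inside the heap.
  bool is_object_start(const Obj* p) const {
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t b = reinterpret_cast<std::uintptr_t>(objs.data());
    std::uintptr_t e = b + objs.size() * sizeof(Obj);
    return a >= b && a < e && (a - b) % sizeof(Obj) == 0;
  }

  std::size_t index_of(const Obj* p) const {
    return static_cast<std::size_t>(p - objs.data());
  }
};

// Marks everything reachable from `roots` using an explicit grey stack.
// A reference failing is_object_start() is not followed; the index of the
// object holding it is returned instead.
std::vector<std::size_t> mark_from(ToyHeap& heap, const std::vector<std::size_t>& roots) {
  std::vector<std::size_t> bad_referrers;
  std::vector<Obj*> grey;
  for (std::size_t r : roots) {
    Obj* o = &heap.objs[r];
    if (!o->marked) {
      o->marked = true;
      grey.push_back(o);
    }
  }
  while (!grey.empty()) {
    Obj* o = grey.back();
    grey.pop_back();
    if (o->ref == nullptr) continue;
    if (!heap.is_object_start(o->ref)) {
      bad_referrers.push_back(heap.index_of(o));
      continue;
    }
    if (!o->ref->marked) {
      o->ref->marked = true;
      grey.push_back(o->ref);
    }
  }
  return bad_referrers;
}
```

Knowing which object held the bad pointer (rather than only where the crash happened) is what distinguishes the two hypotheses below: its class and allocation site hint at whether GC or mutator code wrote the value.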
>
> This raises at least the following hypotheses:
>
> 1. Some stage of garbage collection fails to update the references in
> a BadObject. I don't know whether G1 does that kind of pointer updating.
>
> 2. Some part of the software (native code, anything using Unsafe,
> miscompiled Java code) garbles the pointer.
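Hypothesis 1 can be illustrated with a toy two-space copying collector. Each copied object gets a forwarding index, references are fixed up during a Cheney-style scan, and a post-evacuation pass flags any to-space object that still refers into from-space. The `missed` parameter deliberately skips one fix-up to simulate the hypothesized bug. All names are hypothetical, and G1's region-based evacuation is far more involved than this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Space { None, From, To };

// Reference tagged with the space it points into.
struct Ref {
  Space space = Space::None;
  std::size_t idx = 0;
};

struct Cell {
  Ref ref;              // single reference field, like BadObject::ptr
  long forwardee = -1;  // to-space index once copied, -1 if not copied yet
};

struct Spaces {
  std::vector<Cell> from;
  std::vector<Cell> to;
};

// Copies from-space object i to to-space at most once; returns its new index.
std::size_t copy_once(Spaces& s, std::size_t i) {
  if (s.from[i].forwardee < 0) {
    s.from[i].forwardee = static_cast<long>(s.to.size());
    Cell copy = s.from[i];
    copy.forwardee = -1;   // to-space copies are not themselves forwarded
    s.to.push_back(copy);  // its ref still points into from-space for now
  }
  return static_cast<std::size_t>(s.from[i].forwardee);
}

// Cheney-style evacuation of everything reachable from `roots`. If `missed`
// is a to-space index, that object's reference is deliberately left
// unfixed (simulated bug). Returns the roots' new indices.
std::vector<std::size_t> evacuate(Spaces& s, const std::vector<std::size_t>& roots, long missed) {
  std::vector<std::size_t> new_roots;
  for (std::size_t r : roots) new_roots.push_back(copy_once(s, r));
  for (std::size_t scan = 0; scan < s.to.size(); ++scan) {
    if (static_cast<long>(scan) == missed) continue;
    if (s.to[scan].ref.space == Space::From) {
      // copy_once may grow s.to, so index again after the call.
      std::size_t target = copy_once(s, s.to[scan].ref.idx);
      s.to[scan].ref = Ref{Space::To, target};
    }
  }
  return new_roots;
}

// Post-evacuation verification: no live (to-space) object may still refer
// into from-space. Returns the indices of offending to-space objects.
std::vector<std::size_t> stale_refs(const Spaces& s) {
  std::vector<std::size_t> bad;
  for (std::size_t i = 0; i < s.to.size(); ++i) {
    if (s.to[i].ref.space == Space::From) bad.push_back(i);
  }
  return bad;
}
```

Once from-space is reused, a stale reference like the one `stale_refs` reports would point at whatever was allocated there later, which is one way a BadObject could end up holding a garbled pointer.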
>
> For the first hypothesis, we have so far tried enabling
> _hrm.verify_optional() and verify_region_sets_optional() in
> G1CollectedHeap::do_collection_pause_at_safepoint in production,
> but they have not caught any irregularities.
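For a sense of what region-set checks look at, here is a hypothetical sketch of that kind of invariant (`RegionType`, `RegionSets` and `verify_region_sets` are invented for illustration and are not HotSpot's implementation): every region is listed exactly once, in the set that matches its type. Note that checks of this shape only validate heap bookkeeping; they never look inside objects, so they would not notice a single garbled reference field.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class RegionType { Free, Young, Old };

struct Region {
  RegionType type;
};

// Hypothetical bookkeeping: explicit lists of region indices per kind.
struct RegionSets {
  std::vector<std::size_t> free_list;
  std::vector<std::size_t> young_set;
  std::vector<std::size_t> old_set;
};

// Returns "" if every region is listed exactly once, in the set matching its
// type; otherwise a description of the first inconsistency found.
std::string verify_region_sets(const std::vector<Region>& regions, const RegionSets& sets) {
  std::vector<int> seen(regions.size(), 0);
  auto check = [&](const std::vector<std::size_t>& set, RegionType expected,
                   const std::string& name) -> std::string {
    for (std::size_t idx : set) {
      if (idx >= regions.size()) return name + ": index " + std::to_string(idx) + " out of range";
      if (regions[idx].type != expected) return name + ": wrong type for region " + std::to_string(idx);
      if (++seen[idx] > 1) return "region " + std::to_string(idx) + " listed more than once";
    }
    return "";
  };
  std::string err = check(sets.free_list, RegionType::Free, "free_list");
  if (err.empty()) err = check(sets.young_set, RegionType::Young, "young_set");
  if (err.empty()) err = check(sets.old_set, RegionType::Old, "old_set");
  if (!err.empty()) return err;
  for (std::size_t i = 0; i < regions.size(); ++i) {
    if (seen[i] == 0) return "region " + std::to_string(i) + " is in no set";
  }
  return "";
}
```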
>
> Could there be other causes? Are there any suggestions for next steps,
> given how hard the crash is to reproduce?
>
> We cannot move to JDK 9 and try to reproduce the crash there, as we
> run JRuby, which does not currently work with JDK 9.
>
> The JVM parameters we use are:
>
> -Xms3000G -Xmx3000G -XX:MaxPermSize=512m 
> -XX:ReservedCodeCacheSize=512m -XX:+UseCodeCacheFlushing 
> -XX:MaxDirectMemorySize=20G -XX:AutoBoxCacheMax=8192 
> -XX:MetaspaceSize=512M -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions 
> -XX:G1NewSizePercent=1 -XX:G1MaxNewSizePercent=80 
> -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1HeapWastePercent=5 
> -XX:G1MixedGCCountTarget=4 -XX:MaxGCPauseMillis=3000 -verbose:gc 
> -XX:-PrintGCTimeStamps -XX:+PrintGCDateStamps 
> -XX:+PrintTenuringDistribution -XX:G1ReservePercent=20 
> -XX:SurvivorRatio=1 -XX:+UseGCOverheadLimit 
> -XX:SoftRefLRUPolicyMSPerMB=10 
> -Xloggc:/opt/apps/customer/shared/log/gc.log 
> -XX:-HeapDumpOnOutOfMemoryError -Djruby.compile.invokedynamic=false 
> -Djruby.ji.objectProxyCache=false
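For scale, a back-of-the-envelope sketch of what the percent-of-heap flags above mean for this heap. It assumes each percentage is taken of the full heap size (which is fixed here, since -Xms equals -Xmx) and ignores rounding to region boundaries; G1 still picks the actual young generation size at run time within these bounds.

```cpp
#include <cassert>
#include <cstdint>

// Heap size from -Xms3000G -Xmx3000G, in GiB.
constexpr std::uint64_t kHeapGiB = 3000;

// Assumes each percent flag applies to the whole heap (no region rounding).
constexpr std::uint64_t percent_of_heap(std::uint64_t percent) {
  return kHeapGiB * percent / 100;
}

constexpr std::uint64_t kMinYoungGiB = percent_of_heap(1);   // -XX:G1NewSizePercent=1
constexpr std::uint64_t kMaxYoungGiB = percent_of_heap(80);  // -XX:G1MaxNewSizePercent=80
constexpr std::uint64_t kReserveGiB = percent_of_heap(20);   // -XX:G1ReservePercent=20

static_assert(kMinYoungGiB == 30, "young generation lower bound: 30G");
static_assert(kMaxYoungGiB == 2400, "young generation upper bound: 2400G");
static_assert(kReserveGiB == 600, "reserve against to-space overflow: 600G");
```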
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use



