Application failure with traversal

Sat Feb 10 12:11:35 UTC 2018

Okidoki. This is yet another problem. I think this patch should fix it:

http://cr.openjdk.java.net/~rkennke/humongous-live-data.patch

Other than that, we need to debug your original NPE. I tried to
reproduce it by throwing the passive+barrier flags at a bunch of
programs, but failed.

As a first step, I'd like to see the actual NPE stack trace. Maybe one of
the intrinsics sticks out from there? And then a -XX:+PrintAssembly
dump would be most useful.

Have a nice weekend!
Roman

On Sat, Feb 10, 2018 at 12:17 AM, Lennart Börjeson
<lennart.borjeson at cinnober.com> wrote:
> The system initially ran OK with this patch, but eventually crashed when I applied some load. See the comments below, and the crash log at the end.
>
> I'm off for the weekend now, let's reconnect on Monday.
>
> /Lennart
>
>> On 9 Feb 2018, at 22:40, Roman Kennke <rkennke at redhat.com> wrote:
>>
>> Alright, here's the catch: We cannot forgo the barriers on constants,
>> because we'd have to scan the whole code cache at init-traversal,
>> which precludes class unloading. Which means, we need to figure out
>> why your code trips on an NPE. I suspect we really need a reproducer
>> for that, or else an -XX:+PrintAssembly dump with the hsdis-amd64.so
>> in LD_LIBRARY_PATH.
>>
>
> I know how to build and run with hsdis, if we need to go that way.
>
>
>> If we are generating barriers on constants anyway, we might as well
>> also implement concurrent code cache scanning in traversal mode. This
>> should reduce latency/pause times significantly.
>>
>> In the long run we may even do this without barriers on constants:
>> we'd need nmethod entry barriers (I believe we talked about this a
>> short while ago) that are activated whenever some code enters a
>> compiled nmethod, which would scan+evacuate all constants of that
>> nmethod.
>>
>> Also, I forgot to actually post the patch in my last email :-)
>>
>> http://cr.openjdk.java.net/~rkennke/traversal-no-const-barriers.patch
>>
>> Roman
>>
>> On Fri, Feb 9, 2018 at 10:21 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>> Ok, another attempt. :-)
>>> 1. Are you on the latest code from shenandoah/jdk10? If yes, I am
>>> wondering why my patchlet did not apply... Also, there may have been
>>> some changes that address the 'does not converge' assert lately.
>>> 2. Can you try the following patch and pass the additional VM option:
>>> -XX:ShenandoahUnloadClassesFrequency=0
>
> I'm on the tip. This time I patched with curl + patch -p1; last time I cut and pasted from your mail, so some formatting probably changed.
>
>
>>>
>>> I haven't yet managed to trip the assert that you mentioned with this.
>>>
>>> Thanks, Roman
>>>>
>
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (/home/lennartb/shenandoah-jdk10/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp:371), pid=15181, tid=15215
> #  assert(used() >= get_live_data_bytes()) failed: Live Data must be a subset of used() live: 2502704 used: 1048576
> #
> # JRE version: OpenJDK Runtime Environment (10.0) (fastdebug build 10-internal+0-adhoc.lennartb.shenandoah-jdk10)
> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 10-internal+0-adhoc.lennartb.shenandoah-jdk10, mixed mode, tiered, compressed oops, Shenandoah gc, linux-amd64)
> # Core dump will be written. Default location: /home/tetest/TE/system/cd1/core.15181
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
>
> ---------------  S U M M A R Y ------------
>
> Command Line: -Xms2400M -Xmx4800M -XX:+PrintFlagsFinal -Xlog:gc*=info,safepoint*=info,vmoperation*=trace:stdout:uptime,uptimenanos,timenanos,level,tags -XX:+UnlockExperimentalVMOptions -XX:SyncKnobs=Verbose=1 -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=1 --add-modules=java.xml.bind --add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-exports=java.base/sun.nio.ch=ALL-UNNAMED -XX:+UseShenandoahGC -XX:ConcGCThreads=8 -XX:ParallelGCThreads=4 -XX:MonitorBound=20000 -XX:-UseBiasedLocking -XX:+DoEscapeAnalysis -XX:+UseNUMA -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCHeuristics=traversal -DRoundRobinPrio=0 -DHibernate3=true -DdumpConfig=VALUE -Djava.net.preferIPv4Stack=true -Djava.util.prefs.systemRoot=/home/tetest com.cinnober.framework.server.impl.FwStart --stdouttolog --stderrtolog -s CD1 -r http://frank-10g.cinnober.com:22780 -i TE -v com.cinnober.common.version.TeVersion
>
> Host: frank.cinnober.com, Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 48 cores, 503G, CentOS Linux release 7.3.1611 (Core)
> Time: Sat Feb 10 00:10:57 2018 CET elapsed time: 600 seconds (0d 0h 10m 0s)
>
> ---------------  T H R E A D  ---------------
>
> Current thread (0x00007f3ed43698b0):  VMThread "VM Thread" [stack: 0x00007f3eba2f7000,0x00007f3eba3f7000] [id=15215]
>
> Stack: [0x00007f3eba2f7000,0x00007f3eba3f7000],  sp=0x00007f3eba3f5470,  free space=1017k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0x18afa0f]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x15f
> V  [libjvm.so+0x18b084a]  VMError::report_and_die(Thread*, char const*, int, char const*, char const*, __va_list_tag*)+0x4a
> V  [libjvm.so+0xb15bca]  report_vm_error(char const*, int, char const*, char const*, ...)+0xea
> V  [libjvm.so+0x16add89]  ShenandoahHeapRegion::garbage() const+0x99
> V  [libjvm.so+0x1502333]  ShenandoahTraversalHeuristics::choose_collection_set(ShenandoahCollectionSet*)+0x93
> V  [libjvm.so+0x16fd673]  ShenandoahTraversalGC::prepare()+0x93
> V  [libjvm.so+0x16fe2be]  ShenandoahTraversalGC::init_traversal_collection()+0x14e
> V  [libjvm.so+0x16973e7]  ShenandoahHeap::entry_init_traversal()+0x87
> V  [libjvm.so+0x18edeba]  VM_ShenandoahInitTraversalGC::doit()+0x2a
> V  [libjvm.so+0x18eaf89]  VM_Operation::evaluate()+0x159
>
>
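A note on the numbers in the assert above: "used: 1048576" is exactly 1 MiB, while "live: 2502704" is roughly 2.4 times that. One way this could happen is that a humongous object spanning several regions has all of its live bytes credited to a single region. That is an assumption: it is not confirmed anywhere in this thread, and it is not a description of what humongous-live-data.patch actually does. The Java sketch below models only that arithmetic. The class and method names are hypothetical, the 1 MiB region size is inferred from the "used" value, and none of this is Shenandoah code.

```java
import java.util.Arrays;

// Hypothetical model of per-region live-data accounting. Not Shenandoah code.
class HumongousLiveData {
    // Assumption: region size equals the "used: 1048576" value from the assert.
    static final long REGION_SIZE = 1_048_576L;

    // Number of regions that an object of the given positive size spans.
    static int regionsSpanned(long objectBytes) {
        if (objectBytes <= 0) {
            throw new IllegalArgumentException("object size must be positive");
        }
        return (int) ((objectBytes + REGION_SIZE - 1) / REGION_SIZE);
    }

    // Accounting variant A: credit every live byte to the first region.
    static long[] creditStartRegionOnly(long objectBytes) {
        long[] live = new long[regionsSpanned(objectBytes)];
        live[0] = objectBytes;
        return live;
    }

    // Accounting variant B: split live bytes across the spanned regions,
    // never crediting more than one region's worth to any single region.
    static long[] creditPerRegion(long objectBytes) {
        long[] live = new long[regionsSpanned(objectBytes)];
        long remaining = objectBytes;
        for (int i = 0; i < live.length; i++) {
            live[i] = Math.min(remaining, REGION_SIZE);
            remaining -= live[i];
        }
        return live;
    }

    // Models assert(used() >= get_live_data_bytes()) for each region,
    // taking the region size as the upper bound on used().
    static boolean invariantHolds(long[] livePerRegion) {
        for (long live : livePerRegion) {
            if (live > REGION_SIZE) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long objectBytes = 2_502_704L; // the "live" value from the assert
        long[] single = creditStartRegionOnly(objectBytes);
        long[] split = creditPerRegion(objectBytes);
        System.out.println(Arrays.toString(single) + " invariant holds: " + invariantHolds(single));
        System.out.println(Arrays.toString(split) + " invariant holds: " + invariantHolds(split));
    }
}
```

With these inputs the object spans three regions. Crediting everything to the first region breaks the modelled invariant, while the per-region split yields 1048576, 1048576 and 405552 bytes and keeps it. Whether the real failure follows this pattern would have to be checked against the core dump.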