Disastrous bug when running jinfo and jmap

tobe tobeg3oogle at gmail.com
Tue Sep 2 13:37:56 UTC 2014


I now suspect ptrace. Our kernel version is 2.6.32-279, and maybe it
doesn't resume the threads correctly. Is it related to
http://kernel.opensuse.org/cgit/kernel/commit/?h=openSUSE-13.1&id=d1f26676dad578a65c94782f0c2bd00b7aa68f1b
?
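
One way to check this on an affected host is to look at the State and
TracerPid fields in /proc/<pid>/status: a process that is stopped with
TracerPid 0 no longer has a debugger attached and was simply never resumed.
The sketch below is only an illustration and was not run in this thread. The
function names parse_proc_status and describe are hypothetical, and the hint
that SIGCONT (kill -CONT <pid>) may resume the process is an assumption that
has not been verified on 2.6.32-279.

```python
def parse_proc_status(text):
    """Return (state_letter, tracer_pid) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # "State" looks like "T (stopped)"; keep only the leading letter.
    state = fields.get("State", "?")[:1]
    tracer = int(fields.get("TracerPid", "0"))
    return state, tracer


def describe(state, tracer):
    """Summarise whether the process is stopped and whether a tracer remains."""
    if tracer != 0:
        return "still traced by pid %d" % tracer
    if state in ("T", "t"):
        # Assumption: with no tracer left, SIGCONT may be enough to resume.
        return "stopped with no tracer attached; SIGCONT may resume it"
    return "not stopped"


# Hypothetical sample input, shaped like the first lines of a status file.
SAMPLE = "Name:\tjava\nState:\tT (stopped)\nTracerPid:\t0\n"
print(describe(*parse_proc_status(SAMPLE)))
```

On a real host the input would come from reading /proc/36663/status instead
of the SAMPLE string.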


On Tue, Sep 2, 2014 at 8:03 PM, tobe <tobeg3oogle at gmail.com> wrote:

> As @mikael said, running jstack -F shows the same behaviour while plain
> jstack doesn't. But our processes have been suspended for several days,
> which is quite abnormal. I think something is preventing the processes
> from resuming. Is it related to our running environment or to
> jdk1.6?
>
>
> On Tue, Sep 2, 2014 at 6:05 PM, tobe <tobeg3oogle at gmail.com> wrote:
>
>> Hi @martijn. Do you mean you can run jmap and jinfo on a Java process
>> which has been running for over 25 days? Have you checked the status of
>> that process? Our 1.6 JVMs were suspended but had not exited.
>>
>> If it's an issue in 1.6, can anyone help track it down and patch it?
>>
>>
>> On Tue, Sep 2, 2014 at 5:38 PM, tobe <tobeg3oogle at gmail.com> wrote:
>>
>>> Thanks @mikael for replying. But I can see the complete message "Server
>>> compiler detected" and expect the JVM to continue. It's weird that this
>>> doesn't happen when running jinfo on new processes.
>>>
>>>
>>>
>>> On Tue, Sep 2, 2014 at 5:28 PM, Staffan Larsen <
>>> staffan.larsen at oracle.com> wrote:
>>>
>>>>
>>>> On 2 sep 2014, at 11:15, Mikael Gerdin <mikael.gerdin at oracle.com>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > This is the expected behavior for jmap and jinfo. If you call jstack
>>>> > with the "-F" flag you will see the same behavior.
>>>> >
>>>> > The reason for this is that jmap, jinfo and jstack -F all attach to
>>>> > your target JVM as a debugger and read the memory from the process. That
>>>> > needs to be done when the target process is in a frozen state.
>>>>
>>>> But when jinfo/jmap/jstack is done with the process it should continue
>>>> execution.
>>>>
>>>> Is this reproducible with JDK 8?
>>>>
>>>> /Staffan
>>>>
>>>>
>>>> >
>>>> > /Mikael
>>>> >
>>>> > On 2014-09-02 11:08, tobe wrote:
>>>> >> When I run jinfo or jmap against any Java process, it will "suspend" the
>>>> >> Java process. It's 100% reproducible for long-running processes.
>>>> >>
>>>> >> Here are the detailed steps:
>>>> >>
>>>> >> 1. Pick a Java process which has been running for over 25 days (it's
>>>> >> weird because this doesn't reproduce for new processes).
>>>> >> 2. Run ps to check the state of the process; it should be "Sl", which is
>>>> >> expected.
>>>> >> 3. Run jinfo or jmap on this process (BTW, jstack doesn't have this
>>>> >> issue).
>>>> >> 4. Run ps to check the state of the process. This time it changes to
>>>> >> "Tl", which means STOPPED, and the process doesn't respond to any requests.
>>>> >>
>>>> >> Here's the output of our process:
>>>> >>
>>>> >> [work at hadoop ~]$ ps aux |grep "qktst" |grep "RegionServer"
>>>> >> work     36663  0.1  1.7 24157828 1150820 ?    Sl   Aug06  72:54
>>>> >> /opt/soft/jdk/bin/java -cp
>>>> >> /home/work/app/hbase/qktst-qk/regionserver/:/home/work/app/hbase/qktst-qk/regionserver/package//:/home/work/app/hbase/qktst-qk/regionserver/package//lib/*:/home/work/app/hbase/qktst-qk/regionserver/package//*
>>>> >> -Djava.library.path=:/home/work/app/hbase/qktst-qk/regionserver/package/lib/native/:/home/work/app/hbase/qktst-qk/regionserver/package/lib/native/Linux-amd64-64
>>>> >> -Xbootclasspath/p:/home/work/app/hbase/qktst-qk/regionserver/package/lib/hadoop-security-2.0.0-mdh1.1.0.jar
>>>> >> -Xmx10240m -Xms10240m -Xmn1024m -XX:MaxDirectMemorySize=1024m
>>>> >> -XX:MaxPermSize=512m
>>>> >> -Xloggc:/home/work/app/hbase/qktst-qk/regionserver/stdout/regionserver_gc_20140806-211157.log
>>>> >> -Xss256k -XX:PermSize=64m -XX:+HeapDumpOnOutOfMemoryError
>>>> >> -XX:HeapDumpPath=/home/work/app/hbase/qktst-qk/regionserver/log
>>>> >> -XX:+PrintGCApplicationStoppedTime -XX:+UseConcMarkSweepGC -verbose:gc
>>>> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:SurvivorRatio=6
>>>> >> -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=75
>>>> >> -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled
>>>> >> -XX:+UseNUMA -XX:+CMSClassUnloadingEnabled
>>>> >> -XX:CMSMaxAbortablePrecleanTime=10000 -XX:TargetSurvivorRatio=80
>>>> >> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m
>>>> >> -XX:CMSWaitDuration=2000 -XX:+CMSScavengeBeforeRemark
>>>> >> -XX:+PrintPromotionFailure -XX:ConcGCThreads=16 -XX:ParallelGCThreads=16
>>>> >> -XX:PretenureSizeThreshold=2097088 -XX:+CMSConcurrentMTEnabled
>>>> >> -XX:+ExplicitGCInvokesConcurrent -XX:+SafepointTimeout
>>>> >> -XX:MonitorBound=16384 -XX:-UseBiasedLocking -XX:MaxTenuringThreshold=3
>>>> >> -Dproc_regionserver
>>>> >> -Djava.security.auth.login.config=/home/work/app/hbase/qktst-qk/regionserver/jaas.conf
>>>> >> -Djava.net.preferIPv4Stack=true
>>>> >> -Dhbase.log.dir=/home/work/app/hbase/qktst-qk/regionserver/log
>>>> >> -Dhbase.pid=36663 -Dhbase.cluster=qktst-qk -Dhbase.log.level=debug
>>>> >> -Dhbase.policy.file=hbase-policy.xml
>>>> >> -Dhbase.home.dir=/home/work/app/hbase/qktst-qk/regionserver/package
>>>> >> -Djava.security.krb5.conf=/home/work/app/hbase/qktst-qk/regionserver/krb5.conf
>>>> >> -Dhbase.id.str=work org.apache.hadoop.hbase.regionserver.HRegionServer start
>>>> >> [work at hadoop ~]$ jinfo 36663 > tobe.jinfo
>>>> >> Attaching to process ID 36663, please wait...
>>>> >> Debugger attached successfully.
>>>> >> Server compiler detected.
>>>> >> JVM version is 20.12-b01
>>>> >> [work at hadoop ~]$ ps aux |grep "qktst" |grep "RegionServer"
>>>> >> work     36663  0.1  1.7 24157828 1151008 ?    Tl   Aug06  72:54
>>>> >> /opt/soft/jdk/bin/java -cp
>>>> >> /home/work/app/hbase/qktst-qk/regionserver/:/home/work/app/hbase/qktst-qk/regionserver/package//:/home/work/app/hbase/qktst-qk/regionserver/package//lib/*:/home/work/app/hbase/qktst-qk/regionserver/package//*
>>>> >> -Djava.library.path=:/home/work/app/hbase/qktst-qk/regionserver/package/lib/native/:/home/work/app/hbase/qktst-qk/regionserver/package/lib/native/Linux-amd64-64
>>>> >> -Xbootclasspath/p:/home/work/app/hbase/qktst-qk/regionserver/package/lib/hadoop-security-2.0.0-mdh1.1.0.jar
>>>> >> -Xmx10240m -Xms10240m -Xmn1024m -XX:MaxDirectMemorySize=1024m
>>>> >> -XX:MaxPermSize=512m
>>>> >> -Xloggc:/home/work/app/hbase/qktst-qk/regionserver/stdout/regionserver_gc_20140806-211157.log
>>>> >> -Xss256k -XX:PermSize=64m -XX:+HeapDumpOnOutOfMemoryError
>>>> >> -XX:HeapDumpPath=/home/work/app/hbase/qktst-qk/regionserver/log
>>>> >> -XX:+PrintGCApplicationStoppedTime -XX:+UseConcMarkSweepGC -verbose:gc
>>>> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:SurvivorRatio=6
>>>> >> -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=75
>>>> >> -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled
>>>> >> -XX:+UseNUMA -XX:+CMSClassUnloadingEnabled
>>>> >> -XX:CMSMaxAbortablePrecleanTime=10000 -XX:TargetSurvivorRatio=80
>>>> >> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m
>>>> >> -XX:CMSWaitDuration=2000 -XX:+CMSScavengeBeforeRemark
>>>> >> -XX:+PrintPromotionFailure -XX:ConcGCThreads=16 -XX:ParallelGCThreads=16
>>>> >> -XX:PretenureSizeThreshold=2097088 -XX:+CMSConcurrentMTEnabled
>>>> >> -XX:+ExplicitGCInvokesConcurrent -XX:+SafepointTimeout
>>>> >> -XX:MonitorBound=16384 -XX:-UseBiasedLocking -XX:MaxTenuringThreshold=3
>>>> >> -Dproc_regionserver
>>>> >> -Djava.security.auth.login.config=/home/work/app/hbase/qktst-qk/regionserver/jaas.conf
>>>> >> -Djava.net.preferIPv4Stack=true
>>>> >> -Dhbase.log.dir=/home/work/app/hbase/qktst-qk/regionserver/log
>>>> >> -Dhbase.pid=36663 -Dhbase.cluster=qktst-qk -Dhbase.log.level=debug
>>>> >> -Dhbase.policy.file=hbase-policy.xml
>>>> >> -Dhbase.home.dir=/home/work/app/hbase/qktst-qk/regionserver/package
>>>> >> -Djava.security.krb5.conf=/home/work/app/hbase/qktst-qk/regionserver/krb5.conf
>>>> >> -Dhbase.id.str=work org.apache.hadoop.hbase.regionserver.HRegionServer start
>>>> >>
>>>> >>
>>>> >> I hope some JVM experts here could help.
>>>> >>
>>>> >> $ java -version
>>>> >> java version "1.6.0_37"
>>>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>>>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>>>> >>
>>>>
>>>>
>>>
>>
>