Fwd: JVM crash HS machine

Wed Sep 7 09:15:55 PDT 2011

My bad. I hit "reply" instead of "reply all" on that older thread, so my
follow-ups didn't show up on the list. I'm including the original mail
below. Anyway, it wasn't fixed here, but we can't reproduce it any more
(on both 6u23 and 6u25, 64-bit Server VM), so we're just letting it slip
through. One possibility is that we're switching more and more to CMS, and
the problem occurred in ParallelScavenge.

The original mail:

---------- Forwarded message ----------
From: Y. Srinivas Ramakrishna <y.s.ramakrishna at oracle.com>
Date: Mon, Apr 18, 2011 at 11:31 PM
Subject: Re: Crash log when do GC...
To: Krystal Mok <rednaxelafx at gmail.com>

i wonder if it's an issue with array copy stubs which leave random
junk in some locations of the array, or if there's a race that causes
some locations to transiently have bad data. Seems unlikely, but the
involvement of object arrays raises some suspicions. I'll see if any
array copying bugs have surfaced or been fixed recently although none
comes readily to mind...

PS: if it's production runs, you won't be able to use heap verification,
but if you have a test load that reproduces the problem, may be
heap verification might give us some clues (although given the nature of
the problem, I am not hopeful). If you have a support contract,
I'd suggest filing an official ticket and sending in a couple of core
files, if you have any sitting around. That may be the only way to
make progress on this kind of issue.

-- ramki

On 4/18/2011 8:16 AM, Krystal Mok wrote:

> Hi,
>
> I wasn't able to make a minimal repro for this problem, because it seems to
> happen pretty randomly, running fine for 9 to 15 hours before suddenly
> crashing with a segfault.
> It's already running JDK6u23, and there doesn't seem to be a lot of changes
> to HotSpot that got into JDK6u24, so I doubt there would be any progress
> upgrading to this version. Might try JDK6u25b03 and see if there's any
> luck.
>
> Attached with this email is another crash log on the same issue. The program
> had a lot of threads, and crashes with this stack trace:
>
> Stack: [0x0000000000000000,0x0000000000000000],  sp=0x0000000041f8a810,  free space=1080874k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0x3e62c3]<void ParScanClosure::do_oop_work<unsigned int>(unsigned int*, bool, bool)+0x63>
> V  [libjvm.so+0x60bc83]<objArrayKlass::oop_oop_iterate_nv(oopDesc*, ParScanWithoutBarrierClosure*)+0xf3>
> V  [libjvm.so+0x6318d4]<ParScanThreadState::trim_queues(int)+0x124>
> V  [libjvm.so+0x3e61c5]<void ParScanClosure::do_oop_work<oopDesc*>(oopDesc**, bool, bool)+0x105>
> V  [libjvm.so+0x632260]<ParRootScanWithoutBarrierClosure::do_oop(oopDesc**)+0x10>
> V  [libjvm.so+0x3702b1]<InterpreterFrameClosure::offset_do(int)+0x31>
> V  [libjvm.so+0x619776]<InterpreterOopMap::iterate_oop(OffsetClosure*)+0x86>
> V  [libjvm.so+0x36efd8]<frame::oops_interpreted_do(OopClosure*, RegisterMap const*, bool)+0x188>
> V  [libjvm.so+0x36fd71]<frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0xb1>
> V  [libjvm.so+0x728fb3]<JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x1d3>
> V  [libjvm.so+0x72bc9e]<Threads::possibly_parallel_oops_do(OopClosure*, CodeBlobClosure*)+0xbe>
> V  [libjvm.so+0x69572e]<SharedHeap::process_strong_roots(bool, bool, SharedHeap::ScanningOption, OopClosure*, CodeBlobClosure*, OopsInGenClosure*)+0x8e>
> V  [libjvm.so+0x39d75d]<GenCollectedHeap::gen_process_strong_roots(int, bool, bool, bool, SharedHeap::ScanningOption, OopsInGenClosure*, bool, OopsInGenClosure*)+0x7d>
> V  [libjvm.so+0x6325f6]<ParNewGenTask::work(int)+0xd6>
> V  [libjvm.so+0x78018d]<GangWorker::loop()+0xaa>
> V  [libjvm.so+0x7800a4]<GangWorker::run()+0x24>
> V  [libjvm.so+0x623e1f]<java_start(Thread*)+0x13f>
>
> JavaThread 0x00002aaab7692800 (nid = 8559) was being processed
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  java.lang.reflect.Array.set(Ljava/lang/Object;ILjava/lang/Object;)V+0
> J  com.taobao.top.core.DefaultBlackBoxEngine.callHsf(Ljava/lang/String;Ljava/lang/String;Ljava/lang/Long;Lcom/taobao/hsf/app/spring/util/SuperHSFSpringConsumerBeanTop;[Ljava/lang/String;[Ljava/lang/Object;Lcom/taobao/top/core/framework/TopPipeResult;)Ljava/lang/Object;
> J  com.taobao.top.core.DefaultApiExecutor.execute(Lcom/taobao/top/core/framework/TopPipeInput;Lcom/taobao/top/core/framework/TopPipeResult;)V
> J  com.taobao.top.core.framework.TopPipeTask.run()V
> J  java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;
> J  java.util.concurrent.FutureTask.run()V
> J  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V
> J  java.util.concurrent.ThreadPoolExecutor$Worker.run()V
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
>
> What's weird about it is that this program would repeatedly crash in the
> same function in ParNew GC, and that the JavaThread it's working on was in
> an invocation to java.lang.reflect.Array.set(). In this case it's trying to
> dereference off a bad pointer decompressed from a narrowOop, but it's hard
> to trace just where things went wrong at the beginning.
>
> We'll see if it's affordable to turn on heap verification to trace it down.
>
> Sincerely,
> Kris Mok
>
> On Mon, Apr 18, 2011 at 10:58 PM, Y. Srinivas Ramakrishna<
> y.s.ramakrishna at oracle.com>  wrote:
>
>> Hi, i have heard a couple of other reports of this sort recently.
>> But i don't think we have found or fixed any issue recently that
>> might address this. You might want to try a more recent
>> JVM/JDK to confirm if the crash still occurs (which i think
>> it probably will, going by other such reports). Do you have
>> a test case? If so, please file a bug through support or send
>> us your test case off-line. You can also enable heap verification
>> at some considerable GC performance cost and see if that gets us
>> closer to the root cause. (From looking at the stack trace it appears
>> as though GC finds a bad reference from an object array while copying
>> live objects from the young generation during a scavenge.)
>>
>> -- ramki
>>
>>
>>
>> On 4/18/2011 6:48 AM, BlueDavy Lin wrote:
>>
>>> hi!
>>>
>>>       Recently two of our apps often crash during GC. The crash log is
>>> attached. Can someone give me some advice? Thanks.
>>>
>>>       PS: I tried setting -XX:-UseCompressedOops, but it still crashes, and
>>> the log is the same.
>>>
>>>
>>>
>>
>
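
The quoted report above mentions a dereference of a bad pointer decompressed
from a narrowOop. As a rough illustration of why a corrupted array slot can
turn into a segfault during a scavenge, here is a minimal sketch of
compressed-oop decoding (full address = heap base + (narrow value << shift)).
It is not HotSpot's code: the class name, HEAP_BASE, HEAP_SIZE, and the
looksValid check are all hypothetical and chosen only for illustration. The
original report also notes the crash persisted with -XX:-UseCompressedOops,
so this shows the failure mode only, not a root cause.

```java
public final class NarrowOopSketch {
    // Hypothetical heap layout, picked only for this illustration.
    static final long HEAP_BASE = 0x0000000800000000L;
    static final long HEAP_SIZE = 4L << 30;   // 4 GiB
    static final int SHIFT = 3;               // log2 of 8-byte object alignment

    // Decode a 32-bit compressed reference into a full 64-bit address.
    static long decode(int narrow) {
        if (narrow == 0) {
            return 0L;                        // a zero narrow value means null
        }
        // Treat the int as unsigned before shifting.
        return HEAP_BASE + ((narrow & 0xFFFFFFFFL) << SHIFT);
    }

    // A plausibility check: inside the assumed heap range and 8-byte aligned.
    static boolean looksValid(long address) {
        return address >= HEAP_BASE
                && address < HEAP_BASE + HEAP_SIZE
                && (address & 7L) == 0L;
    }

    public static void main(String[] args) {
        int good = 0x00001000;                // plausible narrow value
        int junk = 0xDEADBEEF;                // stand-in for random junk in a slot
        for (int narrow : new int[] {good, junk}) {
            long address = decode(narrow);
            System.out.printf("narrow=0x%08x -> 0x%016x valid=%b%n",
                    narrow, address, looksValid(address));
        }
    }
}
```

The point of the sketch is that the collector has no cheap way to tell a junk
slot from a real reference: any 32-bit pattern decodes to some address, and the
fault only shows up when that address is followed.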

On Thu, Sep 8, 2011 at 12:06 AM, Ramki Ramakrishna <
y.s.ramakrishna at oracle.com> wrote:

> I didn't see any follow-up on the issue reported at:
>
> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2011-April/002537.html
>
> so I do not know if that issue ever got satisfactorily resolved. I don't think
> there are any open bugs in our database for that issue. If there's a test-case
> we can take a look.
>
> thanks.
> -- ramki
>
>
> On 9/7/2011 4:36 AM, Krystal Mok wrote:
>
> CC'ing hotspot-gc-dev for the first stack trace
>
> ---------- Forwarded message ----------
> From: Krystal Mok <rednaxelafx at gmail.com>
> Date: Wed, Sep 7, 2011 at 7:35 PM
> Subject: Re: JVM crash HS machine
> To: yogesh <ydhaked at amdocs.com>
>
>
> Hi,
>
>  I don't think the two stack traces shown here are of the same issue. The
> first one (the one in quotes) seems to be the same as one mentioned before:
> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2011-April/002537.html ,
> but no solutions yet (to my knowledge).
>
>  The second stack trace is missing some very important stuff. It's
> important to know the caller of the operator new, which means a deeper stack
> trace log would help; without that it's quite hard to infer any context out
> of the stack trace. It'd also be helpful to know what signal it was.
>
>  Regards,
> Kris Mok
>
>
> On Wed, Sep 7, 2011 at 7:06 PM, yogesh <ydhaked at amdocs.com> wrote:
>
>> Igor Shprukh <igor.shprukh at ...> writes:
>>
>> >
>> > I have attached the hs log file.
>> > The JVM continuously crashes every two hours.
>> > Thank You!
>> > -----Original Message-----
>> > From: Dmitry Samersoff [mailto:Dmitry.Samersoff <at> oracle.com]
>> > Sent: Sunday, April 17, 2011 4:53 PM
>> > To: Igor Shprukh
>> > Cc: hotspot-runtime-dev <at> openjdk.java.net
>> > Subject: Re: JVM crash HS machine
>> >
>> > Igor,
>> >
>> > Please, send across full hs_err_*.log
>> >
>> > -Dmitry
>> >
>> > On 2011-04-17 17:23, Igor Shprukh wrote:
>> > > *Hi all, I have the following error after running the JVM for about
>> > > 5 hrs.*
>> > >
>> > > *This is linux – amd 64bit machine with 16 processors.*
>> > >
>> > > *The crash is at the GC, do you have any ideas on the cause ?*
>> > >
>> > > *Thank You !*
>> > >
>> > > Program terminated with signal 6, Aborted.
>> > > #0 0x00000035b2430265 in raise () from /lib64/libc.so.6
>> > > (gdb) bt
>> > > #0 0x00000035b2430265 in raise () from /lib64/libc.so.6
>> > > #1 0x00000035b2431d10 in abort () from /lib64/libc.so.6
>> > > #2 0x00002aed9f0a8fd7 in os::abort(bool) ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #3 0x00002aed9f1fc05d in VMError::report_and_die() ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #4 0x00002aed9f0af655 in JVM_handle_linux_signal ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #5 0x00002aed9f0abbae in signalHandler(int, siginfo*, void*) ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #6 <signal handler called>
>> > > #7 0x00002aed9ee64703 in void ParScanClosure::do_oop_work<unsigned int>(unsigned int*, bool, bool) ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #8 0x00002aed9f095d43 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, ParScanWithoutBarrierClosure*) ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #9 0x00002aed9f0bc0e4 in ParScanThreadState::trim_queues(int) ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #10 0x00002aed9f0bcbde in ParEvacuateFollowersClosure::do_void() ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #11 0x00002aed9f0bce36 in ParNewGenTask::work(int) ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #12 0x00002aed9f21245d in GangWorker::loop() ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #13 0x00002aed9f212374 in GangWorker::run() ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #14 0x00002aed9f0ae14f in java_start(Thread*) ()
>> > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>> > > #15 0x00000035b2c0673d in start_thread () from /lib64/libpthread.so.0
>> > > #16 0x00000035b24d3d1d in clone () from /lib64/libc.so.6
>> > > (gdb)
>> > >
>> >
>> >
>>
>>
>>
>>
>> I have the same problem with Linux and jdk1.6.0_24.
>>
>> If anybody has a solution, please let me know.
>> Below is part of the gdb stack trace:
>>
>> Thread 1 (Thread 1996):
>> #0  0xffffe410 in __kernel_vsyscall ()
>> No symbol table info available.
>> #1  0x00b0ddf0 in raise () from /lib/libc.so.6
>> No symbol table info available.
>> #2  0x00b0f701 in abort () from /lib/libc.so.6
>> No symbol table info available.
>> #3  0xf78d823f in os::abort(bool) ()
>> from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>> No symbol table info available.
>> #4  0xf7a1f431 in VMError::report_and_die() ()
>> from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>> No symbol table info available.
>> #5  0xf78df1dc in JVM_handle_linux_signal ()
>> from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>> No symbol table info available.
>> #6  0xf78db124 in signalHandler(int, siginfo*, void*) ()
>> from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>> No symbol table info available.
>> #7  <signal handler called>
>> No symbol table info available.
>> #8  0x00b4ef5f in _int_malloc () from /lib/libc.so.6
>> No symbol table info available.
>> #9  0x00b50fb7 in malloc () from /lib/libc.so.6
>> No symbol table info available.
>> #10 0x4c242af7 in operator new(unsigned int) () from
>> /usr/lib/libstdc++.so.6
>>
>> Thanks
>> /Y
>>
>>
>>
>>
>
>