Fwd: JVM crash HS machine

Wed Sep 7 18:01:57 UTC 2011

Oops, my memory fell short there, too...
I thought I recalled it was PS, without actually reading the original mail.
Sorry, it's ParNew+CMS.

-- Kris

On Thu, Sep 8, 2011 at 1:00 AM, Ramki Ramakrishna <
y.s.ramakrishna at oracle.com> wrote:

> Kris, Thanks for the reminder. As you can tell my memory is short (and
> fading :-)
>
> Anyway, the crashes below (and in your emails and in Yogesh's) all seem to
> be with ParNew (which is the young gen scavenger that typically goes with
> CMS when you run on an MP platform), not with ParallelScavenge.
>
> In case it resurfaces with more recent JVMs, we should follow up to see if
> something can be done ...
> If it's with older JVMs, please follow up with the appropriate support org.
>
> thanks!
> -- ramki
>
>
> On 9/7/2011 9:15 AM, Krystal Mok wrote:
>
> My bad. I hit "reply" instead of "reply all" on that older thread so my
> follow-ups didn't show up in the list. I'm including the original mail
> below. Anyway, it wasn't fixed here, but we can't reproduce it any more
> (on both 6u23 and 6u25, 64-bit Server VM), so we're just letting it slip
> through. One possibility is that we're switching more and more to CMS, and
> the problem occurred in ParallelScavenge.
>
>  The original mail:
>
>
>
> ---------- Forwarded message ----------
> From: Y. Srinivas Ramakrishna <y.s.ramakrishna at oracle.com>
> Date: Mon, Apr 18, 2011 at 11:31 PM
> Subject: Re: Crash log when do GC...
> To: Krystal Mok <rednaxelafx at gmail.com>
>
>
> i wonder if it's an issue with array copy stubs which leave random
> junk in some locations of the array, or if there's a race that causes
> some locations to transiently have bad data. Seems unlikely, but the
> involvement of object arrays raises some suspicions. I'll see if any
> array copying bugs have surfaced or been fixed recently although none
> comes readily to mind...
>
> PS: if it's production runs, you won't be able to use heap verification,
> but if you have a test load that reproduces the problem, maybe
> heap verification would give us some clues (although given the nature of
> the problem, I am not hopeful). If you have a support contract,
> I'd suggest filing an official ticket and sending in a couple of core
> files, if you have any sitting around. That may be the only way to
> make progress on this kind of issue.
>
> -- ramki
>
>
> On 4/18/2011 8:16 AM, Krystal Mok wrote:
>
>> Hi,
>>
>> I wasn't able to make a minimal repro for this problem, because it seems
>> to happen pretty randomly, running fine for 9 to 15 hours before suddenly
>> crashing with a segfault.
>> It's already running JDK6u23, and there don't seem to be a lot of changes
>> to HotSpot that got into JDK6u24, so I doubt there would be any progress
>> from upgrading to that version. Might try JDK6u25b03 and see if there's
>> any luck.
>>
>> Attached with this email is another crash log on the same issue. The
>> program had a lot of threads, and crashes with this stack trace:
>>
>> Stack: [0x0000000000000000,0x0000000000000000],  sp=0x0000000041f8a810,  free space=1080874k
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x3e62c3]<void ParScanClosure::do_oop_work<unsigned int>(unsigned int*, bool, bool)+0x63>
>> V  [libjvm.so+0x60bc83]<objArrayKlass::oop_oop_iterate_nv(oopDesc*, ParScanWithoutBarrierClosure*)+0xf3>
>> V  [libjvm.so+0x6318d4]<ParScanThreadState::trim_queues(int)+0x124>
>> V  [libjvm.so+0x3e61c5]<void ParScanClosure::do_oop_work<oopDesc*>(oopDesc**, bool, bool)+0x105>
>> V  [libjvm.so+0x632260]<ParRootScanWithoutBarrierClosure::do_oop(oopDesc**)+0x10>
>> V  [libjvm.so+0x3702b1]<InterpreterFrameClosure::offset_do(int)+0x31>
>> V  [libjvm.so+0x619776]<InterpreterOopMap::iterate_oop(OffsetClosure*)+0x86>
>> V  [libjvm.so+0x36efd8]<frame::oops_interpreted_do(OopClosure*, RegisterMap const*, bool)+0x188>
>> V  [libjvm.so+0x36fd71]<frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0xb1>
>> V  [libjvm.so+0x728fb3]<JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x1d3>
>> V  [libjvm.so+0x72bc9e]<Threads::possibly_parallel_oops_do(OopClosure*, CodeBlobClosure*)+0xbe>
>> V  [libjvm.so+0x69572e]<SharedHeap::process_strong_roots(bool, bool, SharedHeap::ScanningOption, OopClosure*, CodeBlobClosure*, OopsInGenClosure*)+0x8e>
>> V  [libjvm.so+0x39d75d]<GenCollectedHeap::gen_process_strong_roots(int, bool, bool, bool, SharedHeap::ScanningOption, OopsInGenClosure*, bool, OopsInGenClosure*)+0x7d>
>> V  [libjvm.so+0x6325f6]<ParNewGenTask::work(int)+0xd6>
>> V  [libjvm.so+0x78018d]<GangWorker::loop()+0xaa>
>> V  [libjvm.so+0x7800a4]<GangWorker::run()+0x24>
>> V  [libjvm.so+0x623e1f]<java_start(Thread*)+0x13f>
>>
>> JavaThread 0x00002aaab7692800 (nid = 8559) was being processed
>> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
>> j  java.lang.reflect.Array.set(Ljava/lang/Object;ILjava/lang/Object;)V+0
>> J  com.taobao.top.core.DefaultBlackBoxEngine.callHsf(Ljava/lang/String;Ljava/lang/String;Ljava/lang/Long;Lcom/taobao/hsf/app/spring/util/SuperHSFSpringConsumerBeanTop;[Ljava/lang/String;[Ljava/lang/Object;Lcom/taobao/top/core/framework/TopPipeResult;)Ljava/lang/Object;
>> J  com.taobao.top.core.DefaultApiExecutor.execute(Lcom/taobao/top/core/framework/TopPipeInput;Lcom/taobao/top/core/framework/TopPipeResult;)V
>> J  com.taobao.top.core.framework.TopPipeTask.run()V
>> J  java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;
>> J  java.util.concurrent.FutureTask.run()V
>> J  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V
>> J  java.util.concurrent.ThreadPoolExecutor$Worker.run()V
>> j  java.lang.Thread.run()V+11
>> v  ~StubRoutines::call_stub
>>
>> What's weird about it is that this program would repeatedly crash in the
>> same function in ParNew GC, and that the JavaThread it's working on was in
>> an invocation to java.lang.reflect.Array.set(). In this case it's trying to
>> dereference a bad pointer decompressed from a narrowOop, but it's hard
>> to trace just where things went wrong at the beginning.
>>
>> We'll see if it's affordable to turn on heap verification to trace it
>> down.
>>
>> Sincerely,
>> Kris Mok
>>
>> On Mon, Apr 18, 2011 at 10:58 PM, Y. Srinivas Ramakrishna<
>> y.s.ramakrishna at oracle.com>  wrote:
>>
>>  Hi, i have heard a couple of other reports of this sort recently.
>>> But i don't think we have found or fixed any issue recently that
>>> might address this. You might want to try a more recent
>>> JVM/JDK to confirm if the crash still occurs (which i think
>>> it probably will, going by other such reports). Do you have
>>> a test case? If so, please file a bug through support or send
>>> us your test case off-line. You can also enable heap verification
>>> at some considerable GC performance cost and see if that gets us
>>> closer to the root cause. (From looking at the stack trace it appears
>>> as though GC finds a bad reference from an object array while copying
>>> live objects from the young generation during a scavenge.)
>>>
>>> -- ramki
>>>
>>>
>>>
>>> On 4/18/2011 6:48 AM, BlueDavy Lin wrote:
>>>
>>>  hi!
>>>>
>>>>       Recently two of our apps often crash during GC. The crash log is
>>>> attached. Can someone give me some advice? Thanks.
>>>>
>>>>       PS: I tried setting -XX:-UseCompressedOops, but it still crashes,
>>>> and the log is the same.
>>>>
>>>>
>>>>
>>>
>>
>
> On Thu, Sep 8, 2011 at 12:06 AM, Ramki Ramakrishna <
> y.s.ramakrishna at oracle.com> wrote:
>
>>  I didn't see any follow-up on the issue reported at:-
>>
>>
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2011-April/002537.html
>>
>>  so I do not know if that issue ever got satisfactorily resolved. I don't
>> think there are any open bugs in our database for that issue. If there's a
>> test-case we can take a look.
>>
>> thanks.
>>  -- ramki
>>
>>
>> On 9/7/2011 4:36 AM, Krystal Mok wrote:
>>
>> CC'ing hotspot-gc-dev for the first stack trace
>>
>> ---------- Forwarded message ----------
>> From: Krystal Mok <rednaxelafx at gmail.com>
>> Date: Wed, Sep 7, 2011 at 7:35 PM
>> Subject: Re: JVM crash HS machine
>> To: yogesh <ydhaked at amdocs.com>
>>
>>
>> Hi,
>>
>>  I don't think the two stack traces shown here are of the same issue. The
>> first one (the one in quotes) seems to be the same as the one mentioned before:
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2011-April/002537.html ,
>> but there is no solution yet (to my knowledge).
>>
>>  The second stack trace is missing some very important stuff. It's
>> important to know the caller of the operator new, which means a deeper stack
>> trace log would help; without that it's quite hard to infer any context out
>> of the stack trace. It'd also be helpful to know what signal it was.
>>
>>  Regards,
>> Kris Mok
>>
>>
>> On Wed, Sep 7, 2011 at 7:06 PM, yogesh <ydhaked at amdocs.com> wrote:
>>
>>> Igor Shprukh <igor.shprukh at ...> <igor.shprukh at ...> writes:
>>>
>>> >
>>> > I have attached the hs log file.
>>> > The JVM continuously crashes every two hours.
>>> > Thank You!
>>> > -----Original Message-----
>>> > From: Dmitry Samersoff [mailto:Dmitry.Samersoff <at> oracle.com]
>>> > Sent: Sunday, April 17, 2011 4:53 PM
>>> > To: Igor Shprukh
>>> > Cc: hotspot-runtime-dev <at> openjdk.java.net
>>> > Subject: Re: JVM crash HS machine
>>> >
>>> > Igor,
>>> >
>>> > Please, send across full hs_err_*.log
>>> >
>>> > -Dmitry
>>> >
>>> > On 2011-04-17 17:23, Igor Shprukh wrote:
>>> > > *Hi all, I have the following error after running the JVM for about
>>> > > 5 hrs.*
>>> > >
>>> > > *This is a Linux AMD 64-bit machine with 16 processors.*
>>> > >
>>> > > *The crash is in the GC; do you have any ideas on the cause?*
>>> > >
>>> > > *Thank You!*
>>> > >
>>> > > Program terminated with signal 6, Aborted.
>>> > > #0 0x00000035b2430265 in raise () from /lib64/libc.so.6
>>> > > (gdb) bt
>>> > > #0 0x00000035b2430265 in raise () from /lib64/libc.so.6
>>> > > #1 0x00000035b2431d10 in abort () from /lib64/libc.so.6
>>> > > #2 0x00002aed9f0a8fd7 in os::abort(bool) () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #3 0x00002aed9f1fc05d in VMError::report_and_die() () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #4 0x00002aed9f0af655 in JVM_handle_linux_signal () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #5 0x00002aed9f0abbae in signalHandler(int, siginfo*, void*) () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #6 <signal handler called>
>>> > > #7 0x00002aed9ee64703 in void ParScanClosure::do_oop_work<unsigned int>(unsigned int*, bool, bool) () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #8 0x00002aed9f095d43 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, ParScanWithoutBarrierClosure*) () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #9 0x00002aed9f0bc0e4 in ParScanThreadState::trim_queues(int) () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #10 0x00002aed9f0bcbde in ParEvacuateFollowersClosure::do_void() () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #11 0x00002aed9f0bce36 in ParNewGenTask::work(int) () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #12 0x00002aed9f21245d in GangWorker::loop() () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #13 0x00002aed9f212374 in GangWorker::run() () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #14 0x00002aed9f0ae14f in java_start(Thread*) () from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>> > > #15 0x00000035b2c0673d in start_thread () from /lib64/libpthread.so.0
>>> > > #16 0x00000035b24d3d1d in clone () from /lib64/libc.so.6
>>> > > (gdb)
>>> > >
>>> >
>>> >
>>>
>>>
>>>
>>>
>>> I have the same problem with Linux and jdk1.6.0_24.
>>>
>>> If anybody has a solution, please let me know.
>>> Below is part of the gdb stack trace:
>>>
>>> Thread 1 (Thread 1996):
>>> #0  0xffffe410 in __kernel_vsyscall ()
>>> No symbol table info available.
>>> #1  0x00b0ddf0 in raise () from /lib/libc.so.6
>>> No symbol table info available.
>>> #2  0x00b0f701 in abort () from /lib/libc.so.6
>>> No symbol table info available.
>>> #3  0xf78d823f in os::abort(bool) ()
>>> from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>>> No symbol table info available.
>>> #4  0xf7a1f431 in VMError::report_and_die() ()
>>> from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>>> No symbol table info available.
>>> #5  0xf78df1dc in JVM_handle_linux_signal ()
>>> from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>>> No symbol table info available.
>>> #6  0xf78db124 in signalHandler(int, siginfo*, void*) ()
>>> from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>>> No symbol table info available.
>>> #7  <signal handler called>
>>> No symbol table info available.
>>> #8  0x00b4ef5f in _int_malloc () from /lib/libc.so.6
>>> No symbol table info available.
>>> #9  0x00b50fb7 in malloc () from /lib/libc.so.6
>>> No symbol table info available.
>>> #10 0x4c242af7 in operator new(unsigned int) () from /usr/lib/libstdc++.so.6
>>>
>>> Thanks
>>> /Y
>>>
>>>
>>>
>>>
>>
>>
>