Re: valgrind reveal bug of openjdk during pressure test

Fri Oct 15 01:10:19 PDT 2010

Feng,

Valgrind can't be used with the JVM for many reasons (e.g. the JVM makes
heavy use of UNIX signals for its own purposes, manages its own memory,
has dynamically generated code, etc.).

If you really want to use valgrind, you have to carefully insert
VALGRIND_* macros into the JVM code and rebuild hotspot,
but it's a huge amount of work.
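
To give an idea of what that kind of annotation looks like, here is a
minimal sketch in C. It is not HotSpot code: the Arena type and the
arena_* functions are hypothetical and only stand in for a runtime that
manages its own memory. The only real Valgrind pieces are the Memcheck
client requests VALGRIND_MAKE_MEM_NOACCESS and VALGRIND_MAKE_MEM_UNDEFINED
from valgrind/memcheck.h. If that header is not installed, the sketch
falls back to no-op definitions so it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Use the real Memcheck client requests when the header is available;
 * otherwise define no-op stand-ins (an assumption made for portability). */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_VALGRIND_H 1
#  endif
#endif
#ifndef HAVE_VALGRIND_H
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void)0)
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)0)
#endif

/* Hypothetical bump-pointer arena: one malloc'ed block carved up by hand,
 * so Memcheck cannot see the individual allocations without help. */
typedef struct {
    char  *base;
    size_t capacity;
    size_t used;
} Arena;

int arena_init(Arena *a, size_t capacity) {
    assert(a != NULL);
    a->base = malloc(capacity);
    if (a->base == NULL) return -1;
    a->capacity = capacity;
    a->used = 0;
    /* Nothing has been handed out yet: any access is a bug. */
    (void)VALGRIND_MAKE_MEM_NOACCESS(a->base, capacity);
    return 0;
}

void *arena_alloc(Arena *a, size_t size) {
    assert(a != NULL);
    if (size > SIZE_MAX - 7) return NULL;
    size = (size + 7) & ~(size_t)7;            /* round up to 8 bytes */
    if (size > a->capacity - a->used) return NULL;
    void *p = a->base + a->used;
    a->used += size;
    /* Handed out: writable, but reading before writing is still an error. */
    (void)VALGRIND_MAKE_MEM_UNDEFINED(p, size);
    return p;
}

void arena_reset(Arena *a) {
    assert(a != NULL);
    a->used = 0;
    /* Everything is "freed" at once; stale pointers become invalid. */
    (void)VALGRIND_MAKE_MEM_NOACCESS(a->base, a->capacity);
}

void arena_destroy(Arena *a) {
    assert(a != NULL);
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

With annotations like these, a read through a pointer that survived
arena_reset() would be reported by Memcheck as an invalid read instead of
going unnoticed. The JVM would need this for every heap, code cache and
arena it manages, which is why it is so much work.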

-Dmitry

On 2010-10-15 06:39, Feng.Da at zxelec.com wrote:
>
>   The program is mainly Apache MINA with native JNI code. JDK 5.0 runs OK under the same conditions, so it should not be a problem in my program. I should be able to reproduce the problem; it just takes longer to crash. By the way,
> MINA can produce lots of leaked sockets under heavy GC, which show up as invalid file handles when lsof -p displays them.
>
> -----Original Message-----
> From: David Holmes [mailto:David.Holmes at oracle.com]
> Sent: October 15, 2010 10:35
> To: Feng.Da at zxelec.com
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: valgrind reveal bug of openjdk during pressure test
>
> Feng.Da at zxelec.com said the following on 10/15/10 11:08:
>> I’m doing a pressure test and found that openjdk thrashed suse10 and crashed.
>> The following is the valgrind report and hs_error file.
>
> Ok I belatedly found the 7z command to access the original rar archive.
>
> The hs_err log shows:
>
> ---------------  T H R E A D  ---------------
>
> Current thread (0x06ea4000):  GCTaskThread [stack:
> 0x0bf9b000,0x0c01c000] [id=31719]
>
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR),
> si_addr=0xfffff032;;
>
> Registers:
> EAX=0x06f5c1e8, EBX=0x16840fc0, ECX=0x06f5c1e8, EDX=0xfffff032
> ESP=0x00000000, EBP=0x0c01b088, ESI=0x0c01b0b0, EDI=0x00000001
> EIP=0x077febc1, CR2=0xfffff032, EFLAGS=0x00200000
>
> Top of Stack: (sp=0x00000000)
> 0x00000000:
> [error occurred during error reporting (printing registers, top of
> stack, instructions near pc), id 0xb]
>
> Stack: [0x0bf9b000,0x0c01c000]
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
> C=native code)
> V  [libjvm.so+0x576bc1];;
> _ZN10PSScavenge26copy_and_push_safe_barrierIP7oopDescEEvP18PSPromotionManagerPT_+0x11
> V  [libjvm.so+0x526bfb];;
> _ZN9OopMapSet6all_doEPK5framePK11RegisterMapP10OopClosurePFvPP7oopDescSA_ES7_+0x17b
> V  [libjvm.so+0x526a72];;
> _ZN9OopMapSet7oops_doEPK5framePK11RegisterMapP10OopClosure+0x32
> V  [libjvm.so+0x2ea3e1];;
> _ZN5frame17oops_code_blob_doEP10OopClosurePK11RegisterMap+0x31
> V  [libjvm.so+0x2eab1f];;
> _ZN5frame16oops_do_internalEP10OopClosureP11RegisterMapb+0x7f
> V  [libjvm.so+0x6131aa];;  _ZN10JavaThread7oops_doEP10OopClosure+0xea
> V  [libjvm.so+0x578d0d];;  _ZN15ThreadRootsTask5do_itEP13GCTaskManagerj+0x8d
> V  [libjvm.so+0x30e0eb];;  _ZN12GCTaskThread3runEv+0x12b
> V  [libjvm.so+0x531cce];;  _Z10java_startP6Thread+0x14e
> C  [libpthread.so.0+0x52ab]
>
>
> The valgrind log shows the original error as:
>
> ==31714== Thread 6:
> ==31714== Invalid read of size 4
> ==31714==    at 0x77FEBC1: void
> PSScavenge::copy_and_push_safe_barrier<oopDesc*>(PSPromotionManager*,
> oopDesc**) (in /opt/java/jdk1.6.0_17/jre/lib/i386/server/libjvm.so)
>
> and the secondary error during error reporting as:
>
> ==31714== Invalid read of size 4
> ==31714==    at 0x77B1D06: os::print_hex_dump(outputStream*, unsigned
> char*, unsigned char*, int) (in
> /opt/java/jdk1.6.0_17/jre/lib/i386/server/libjvm.so)
> ==31714==    by 0x77BBE35: os::print_context(outputStream*, void*) (in
> /opt/java/jdk1.6.0_17/jre/lib/i386/server/libjvm.so)
> ==31714==    by 0x78D72ED: VMError::report(outputStream*) (in
> /opt/java/jdk1.6.0_17/jre/lib/i386/server/libjvm.so)
>
> which I suspect is caused by the fact that sp = 0 (which obviously should
> not be!)
>
> What program were you running? Can you reproduce the crash outside of
> valgrind?
>
> David Holmes

-- 
Dmitry Samersoff
J2SE Sustaining team, SPB04
* Give Rabbit time and he'll always get the answer ...