need help on bug 6894652
Coleen Phillimore - Sun Microsystems
Coleen.Phillimore at Sun.COM
Mon Dec 7 07:48:15 PST 2009
Hi Yumin,
Sorry, I just saw this now that I have SWAN available. To put a webrev
on openjdk, do this (substituting your name and webrev directory):
echo "scp -r /home/coleenp/public_html/webrev/$1
coleenp at cr.openjdk.java.net:"
echo open webrev at http://cr.openjdk.java.net/~coleenp/$1/
The openjdk alias is: hotspot-runtime-dev at openjdk.java.net
I was looking at the webrev again: do you ever reset this to zero
after the thread stack walking is done for all the threads?
+ ((NamedThread *)thread)->set_java_thread_gc(this);
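
For what it's worth, the pattern I'd expect is a minimal sketch like the
following (the setter is from your webrev; the clearing call, the walker
variable, and the is_Named_thread() guard are my assumptions about how
the patch is wired up):

  // Sketch: in JavaThread::oops_do (or wherever the walk begins),
  // record the JavaThread being walked on the walking worker, and
  // clear the record once the walk completes, so a crash in a later
  // phase doesn't report a stale thread.
  Thread* walker = Thread::current();
  if (walker->is_Named_thread()) {
    ((NamedThread*)walker)->set_java_thread_gc(this);   // record
  }
  // ... walk all frames of this JavaThread ...
  if (walker->is_Named_thread()) {
    ((NamedThread*)walker)->set_java_thread_gc(NULL);   // reset when done
  }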
Thanks,
Coleen
On 12/04/09 20:45, Yumin Qi wrote:
> Hi, Coleen
>
> I have prepared the webrev according to your suggestions. But I am
> not sure which one is the open alias; could you send me the right
> alias name?
>
>
> Thanks
> Yumin
>
> On 12/03/09 16:31, Coleen Phillimore wrote:
>>
>> Hi Yumin,
>> Sorry, this got buried in my mailbox yesterday under Ramki's Intel
>> bug mail and other messages, so I didn't see it.
>> This change looks fantastic! I wouldn't have expected a performance
>> degradation, but it's great that you checked. Also, I like the idea
>> of having a -XX:+CrashGCForDumpingJavaThread flag for verifying that
>> the code doesn't bit-rot between customers hitting it, but it'd still
>> require someone to run a test with the flag, which probably won't
>> happen often.
>>
>> I think this should be backported to older versions of the jvm too.
>>
>> When you send it out for review, send it to the open alias. They
>> will like this code too.
>>
>> Thanks!!
>> Coleen
>>
>> Yumin Qi wrote:
>>> Coleen,
>>>
>>> Please have a look at the code change via link:
>>> http://javaweb.sfbay/~yq123930/webrev/6361589
>>>
>>> I have run tests using volano to check for performance
>>> degradation. The results look OK:
>>>
>>> result_sparc_fcs:
>>> /java/re/jdk/7/promoted/latest/binaries/solaris-sparc
>>> result_sparc_new: hg.openjdk.java.net/jdk7/hotspot-rt/hotspot +
>>> this change.
>>>
>>> $ sh compare volano_result_sparc_new volano_result_sparc_fcs
>>> ============================================================================
>>> volano_result_sparc_new
>>> Benchmark   Samples      Mean    Stdev  Geomean  Weight
>>> volano           20  43971.86  6081.44
>>> ============================================================================
>>> volano_result_sparc_fcs
>>> Benchmark   Samples      Mean    Stdev  %Diff      P  Significant
>>> volano           20  46286.67  7288.64   5.26  0.283  *
>>> ============================================================================
>>>
>>> * - Not Significant: The %Diff for the mean could be noise. If
>>> there is
>>> an actual difference, more samples would be needed to detect it.
>>>
>>> Also, I had set up some error code to trigger a SEGV manually in
>>> frame::oops_do to verify the output. The output is as expected;
>>> after verification, the inserted code was removed. If you think this
>>> version is okay, I will submit a code review for integration.
>>>
>>> (To verify this code later, the SEGV-ing code could be wrapped in a
>>> debug flag such as -XX:+CrashGCForDumpingJavaThread: in a debug
>>> build, when the flag is set, trigger a SEGV in frame::oops_do so
>>> the output is handled in VMError::report(). But I don't think we
>>> need this extra code to verify it.)
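>>>
>>> A minimal sketch of what I mean, assuming a develop flag named
>>> CrashGCForDumpingJavaThread declared in globals.hpp, placed in
>>> frame::oops_do:
>>>
>>>   // Debug-only: deliberately SEGV while walking a Java frame, to
>>>   // exercise the hs_err reporting path in VMError::report().
>>>   #ifdef ASSERT
>>>   if (CrashGCForDumpingJavaThread) {
>>>     *(volatile int*)0 = 0;   // forced SIGSEGV
>>>   }
>>>   #endif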
>>>
>>>
>>> Thanks
>>> Yumin
>>>
>>> Coleen Phillimore wrote:
>>>>
>>>> Hi Yumin,
>>>>
>>>> I was thinking exactly the second way, but I couldn't figure out
>>>> where to put the extra data member to point to the thread that it's
>>>> walking. I didn't look at it for very long, though. It sounds like
>>>> you have figured that out. Performance-wise, I think the extra data
>>>> member, or even setting it, won't matter, and the GC people would so
>>>> appreciate the extra information that it would be worth it.
>>>> The first way seems like it might be more code and more likely to
>>>> crash during error reporting, which wouldn't be good, so I
>>>> completely agree with your idea.
>>>> This is great; thanks for taking this!!
>>>>
>>>> Coleen
>>>>
>>>> Yumin Qi wrote:
>>>>> Coleen,
>>>>>
>>>>> I checked the code and thought there might be two ways to output
>>>>> the related JavaThread information (if one is involved when a GC
>>>>> thread crashes).
>>>>> When the process crashes, we call report in VMError, but the failed
>>>>> GC thread keeps no record of the JavaThread, so we have to find out
>>>>> whether a JavaThread is involved.
>>>>>
>>>>> 1) When printing out the current (failed) thread's stack, check
>>>>> each datum in the current frame; if it belongs to the stack of a
>>>>> JavaThread, we know that JavaThread "could" be involved. We need to
>>>>> check every datum in the frame against the stack limits of all
>>>>> JavaThreads, so iterating over all the data in one frame, and then
>>>>> over all the frames, costs a lot of time. If found, record that
>>>>> JavaThread and move on to the next.
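>>>>>
>>>>> A rough sketch of the per-datum check I mean (the helper name and
>>>>> exact bounds test are hypothetical):
>>>>>
>>>>>   // Does address p fall inside some JavaThread's stack range?
>>>>>   // Stacks grow down, so the range is (base - size, base].
>>>>>   JavaThread* find_stack_owner(address p) {
>>>>>     for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
>>>>>       if (p <= t->stack_base() && p > t->stack_base() - t->stack_size()) {
>>>>>         return t;
>>>>>       }
>>>>>     }
>>>>>     return NULL;
>>>>>   }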
>>>>>
>>>>> 2) JavaThread::oops_do is called by GC threads, the VM thread, and
>>>>> ConcurrentGCThreads. We can add an extra data member to Thread (or
>>>>> to VMThread, GCTaskThread, GangWorker, ConcurrentGCThread) to
>>>>> record the JavaThread being walked. After a crash we will be in the
>>>>> signal handler, which calls VMError::report(...). If the failed
>>>>> thread is one of the above, check whether its JavaThread data
>>>>> member holds a valid value, and if so proceed.
>>>>>
>>>>> Either way, once we have the JavaThread we can print out its
>>>>> stack, Java methods, and other information.
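>>>>>
>>>>> For the second way, the reporting side would be a small sketch
>>>>> like this (the accessor name mirrors the setter; both are
>>>>> hypothetical until the webrev settles):
>>>>>
>>>>>   // In VMError::report(): if the crashing thread is a named
>>>>>   // GC/VM worker, see whether it recorded a JavaThread.
>>>>>   if (_thread != NULL && _thread->is_Named_thread()) {
>>>>>     JavaThread* jt = ((NamedThread*)_thread)->java_thread_gc();
>>>>>     if (jt != NULL) {
>>>>>       st->print_cr("JavaThread being processed during GC:");
>>>>>       jt->print_on(st);   // state, stack bounds, Java frames
>>>>>     }
>>>>>   }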
>>>>>
>>>>> I prefer the second way; it is simple, but it needs an extra data
>>>>> member, which may not make others happy, plus extra code in GC to
>>>>> record its value. I can test whether there is much degradation
>>>>> between the unchanged and changed builds.
>>>>>
>>>>> The first way costs much more time (though that may not matter:
>>>>> the process has already died, so who cares about taking more time
>>>>> to print out more information?), since we need to determine whether
>>>>> a value lies within a JavaThread's valid stack frames. With
>>>>> thousands of threads, it would simply take too long.
>>>>>
>>>>> Do you have a different idea of how to implement this
>>>>> functionality? Please let me know.
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>>
>>>>> Coleen Phillimore - Sun Microsystems wrote:
>>>>>>
>>>>>> The RFE is 6361589. You'd be a good person to add this support
>>>>>> because you do this all the time in the debugger!
>>>>>>
>>>>>> Thanks!!
>>>>>> Coleen
>>>>>>
>>>>>> On 11/12/09 12:44, Yumin Qi wrote:
>>>>>>> Coleen,
>>>>>>>
>>>>>>> Do you have the bug id for the RFE? I would like to work on it.
>>>>>>> BTW, the customer got fastdebug to run, so hopefully we can get a
>>>>>>> closer look at the failure, if any.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>> Coleen Phillimore wrote:
>>>>>>>> It would be good to know the bytecodes of the method at bci 39
>>>>>>>> and see if local (or stack) [5] is supposed to be an oop.
>>>>>>>> CBOE is usually good about running the fastdebug vm; can they
>>>>>>>> get this to crash with that? We'd have more information.
>>>>>>>> I don't know the answer to your last question about the frame
>>>>>>>> containing the pc not being a real frame.
>>>>>>>> I have an RFE to print, in the hs_err file, the thread stack of
>>>>>>>> the thread that GC is working on when it crashes. That would
>>>>>>>> have saved a lot of debugging (and we'd know the method to run
>>>>>>>> javap on). It was lowered in priority because we hadn't seen
>>>>>>>> this sort of crash for a while. Yumin, since you do this
>>>>>>>> manually all the time, maybe you can add code to do it.
>>>>>>>> Also, you might be able to figure out the method manually from
>>>>>>>> the address; then we can have CBOE run javap on it and see the
>>>>>>>> bytecodes around 39. Otherwise, see if they can reproduce this
>>>>>>>> with fastdebug.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>> Yumin Qi wrote:
>>>>>>>>> Hi, Tom and Coleen
>>>>>>>>>
>>>>>>>>> I am working on a bug filed by CBOE; it occurs on solaris-i586
>>>>>>>>> (32-bit) on Solaris 10 with JDK 6u04.
>>>>>>>>>
>>>>>>>>> The location for core file is:
>>>>>>>>> /net/cores/cores/dir23/71840156/102109_case1
>>>>>>>>> (use dbx -s dbxrc to read the core file)
>>>>>>>>> When it crashed, a GC thread was iterating over a JavaThread,
>>>>>>>>> and an oop on that thread's stack was not valid.
>>>>>>>>> The GC thread is t@6:
>>>>>>>>> current thread: t@6
>>>>>>>>> [1] __lwp_kill(0x6, 0x6), at 0xfef254d7
>>>>>>>>> [2] _thr_kill(0x6, 0x6), at 0xfef22c86
>>>>>>>>> [3] raise(0x6), at 0xfeed13f3
>>>>>>>>> [4] abort(0xfecb8000, 0xfe24a4f8, 0xfe24a4f8, 0xfeb7d838,
>>>>>>>>> 0x1, 0x807f9d0), at 0xfeeb1709
>>>>>>>>> [5] os::abort(0x1), at 0xfead7440
>>>>>>>>> [6] VMError::report_and_die(0xfe24a690), at 0xfeb7d838
>>>>>>>>> [7] JVM_handle_solaris_signal(0xb, 0xfe24a9b4, 0xfe24a7b4,
>>>>>>>>> 0x1), at 0xfe71bf0b
>>>>>>>>> [8] signalHandler(0xb, 0xfe24a9b4, 0xfe24a7b4), at 0xfe71b7c2
>>>>>>>>> [9] __sighndlr(0xb, 0xfe24a9b4, 0xfe24a7b4, 0xfe71b79c), at
>>>>>>>>> 0xfef2489f
>>>>>>>>> [10] call_user_handler(0xb, 0xfe24a9b4, 0xfe24a7b4), at
>>>>>>>>> 0xfef1a75e
>>>>>>>>> [11] sigacthandler(0xb, 0xfe24a9b4, 0xfe24a7b4, 0xf, 0x0,
>>>>>>>>> 0x4), at 0xfef1a8de
>>>>>>>>> ---- called from signal handler with signal 11 (SIGSEGV) ------
>>>>>>>>> [12]
>>>>>>>>> ParNewGeneration::copy_to_survivor_space_avoiding_promotion_undo(0x8081b70,
>>>>>>>>> 0xe322738, 0x8d8505f8, 0x3b88c022, 0x0), at 0xfeae2862
>>>>>>>>> [13] ParScanClosure::do_oop_work(0xe3227ec, 0x6dbce8dc, 0x0,
>>>>>>>>> 0x1), at 0xfeae3454
>>>>>>>>> [14] ParRootScanWithoutBarrierClosure::do_oop(0xe3227ec,
>>>>>>>>> 0x6dbce8dc), at 0xfeae3343
>>>>>>>>> [15] InterpreterFrameClosure::offset_do(0xfe24ab74, 0x5), at
>>>>>>>>> 0xfe6cb893
>>>>>>>>> [16] InterpreterOopMap::iterate_oop(0xfe24ab88, 0xfe24ab74),
>>>>>>>>> at 0xfe6db740
>>>>>>>>> =>[17] frame::oops_interpreted_do(0xfe24ac34, 0xe3227ec,
>>>>>>>>> 0xfe24ac4c, 0x1), at 0xfe94e11f
>>>>>>>>> [18] frame::oops_do_internal(0xfe24ac34, 0xe3227ec,
>>>>>>>>> 0xfe24ac4c, 0x1), at 0xfe94e77a
>>>>>>>>> [19] JavaThread::oops_do(0x9dc1e20, 0xe3227ec), at 0xfe6e4a03
>>>>>>>>> [20] Threads::possibly_parallel_oops_do(0xe3227ec), at
>>>>>>>>> 0xfeb4758f
>>>>>>>>> [21] SharedHeap::process_strong_roots(0x807cfe8, 0x0, 0x1,
>>>>>>>>> 0xe3227ec, 0xe322810), at 0xfeb0a85f
>>>>>>>>> [22] GenCollectedHeap::gen_process_strong_roots(0x807cfe8,
>>>>>>>>> 0x0, 0x1, 0x0, 0x1, 0xe322810, 0xe3227ec), at 0xfe954a49
>>>>>>>>> [23] ParNewGenTask::work(0x759feb18, 0x4), at 0xfeae1a63
>>>>>>>>> [24] GangWorker::loop(0x807f9d0), at 0xfeb82272
>>>>>>>>> [25] GangWorker::run(0x807f9d0), at 0xfeb8215b
>>>>>>>>> [26] java_start(0x807f9d0), at 0xfead6d62
>>>>>>>>> [27] _thr_setup(0xfe4b2200), at 0xfef244b7
>>>>>>>>> [28] _lwp_start(0x6, 0x6, 0xfef4e000, 0xfe24a3fc, 0xfeed13f3,
>>>>>>>>> 0x6), at 0xfef247a0
>>>>>>>>>
>>>>>>>>> The JavaThread it was iterating over is t@18098. Its frame
>>>>>>>>> info:
>>>>>>>>> *((class frame *) 0xfe24ac34U) = {
>>>>>>>>> frame::_sp = 0x6dbce8d8
>>>>>>>>> frame::_pc = 0xfb2126ed "\xe9L"
>>>>>>>>> frame::_cb = 0xfb202508
>>>>>>>>> frame::_deopt_state = not_deoptimized
>>>>>>>>> frame::_fp = 0x6dbce908
>>>>>>>>> frame::_unextended_sp = 0x6dbce8d8
>>>>>>>>>
>>>>>>>>> *(InterpreterOopMap*)0xfe24ab88
>>>>>>>>> *((class InterpreterOopMap *) 0xfe24ab88U) = {
>>>>>>>>> InterpreterOopMap::_method = 0xf468c368
>>>>>>>>> InterpreterOopMap::_bci = 39U
>>>>>>>>> InterpreterOopMap::_mask_size = 6
>>>>>>>>> InterpreterOopMap::_expression_stack_size = 1
>>>>>>>>> InterpreterOopMap::_bit_mask = (47, 0)
>>>>>>>>>
>>>>>>>>> The offset in the interpreter frame is 5.
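>>>>>>>>>
>>>>>>>>> With _mask_size = 6 and _expression_stack_size = 1, slots 0-4
>>>>>>>>> would be locals and slot 5 the single expression stack entry,
>>>>>>>>> so offset 5 looks like the expression stack slot (assuming one
>>>>>>>>> mask bit per slot). As a sketch, assuming the usual
>>>>>>>>> oopMapCache API, the slots the oop map claims hold oops can be
>>>>>>>>> dumped with:
>>>>>>>>>
>>>>>>>>>   InterpreterOopMap mask;
>>>>>>>>>   method->mask_for(39, &mask);            // oop map at bci 39
>>>>>>>>>   for (int i = 0; i < mask.number_of_entries(); i++) {
>>>>>>>>>     if (mask.is_oop(i)) tty->print_cr("slot %d is an oop", i);
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> which would tell us whether slot 5 is really expected to hold
>>>>>>>>> an oop at this bci.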
>>>>>>>>>
>>>>>>>>> The JavaThread is t@18098:
>>>>>>>>> current thread: t@18098
>>>>>>>>> [1] ___lwp_mutex_timedlock(0x80790d8, 0x0), at 0xfef25547
>>>>>>>>> [2] _private_lwp_mutex_lock(0x80790d8), at 0xfef1183c
>>>>>>>>> [3] Mutex::lock_without_safepoint_check(0x8079080), at
>>>>>>>>> 0xfe6c22d3
>>>>>>>>> [4] SafepointSynchronize::block(0x9dc1e20), at 0xfe7081c1
>>>>>>>>> [5] InterpreterRuntime::monitorexit(0x9dc1e20, 0x6dbce8e0,
>>>>>>>>> 0xfb2126ed, 0xa8f05060, 0x8d8505f8, 0x3), at 0xfe6f8164
>>>>>>>>> =>[6] 0xfb212713(0x0, 0xa8f05060, 0xceea7ad8, 0xaa3a70e8,
>>>>>>>>> 0x90f36230, 0x9dc1e20), at 0xfb212713
>>>>>>>>> [7] 0xfb20028f(0x6dbce970, 0x6dbcebc8, 0xa, 0xf468c368,
>>>>>>>>> 0xfb208d00, 0x6dbcead8, 0x1, 0x9dc1e20), at 0xfb20028f
>>>>>>>>> [8] JavaCalls::call_helper(0x6dbcebc4, 0x6dbcea48,
>>>>>>>>> 0x6dbcead4, 0x9dc1e20), at 0xfe6f476b
>>>>>>>>> [9] os::os_exception_wrapper(0xfe6f45c8, 0x6dbcebc4,
>>>>>>>>> 0x6dbcea48, 0x6dbcead4, 0x9dc1e20), at 0xfe6f4a13
>>>>>>>>> [10] JavaCalls::call(0x6dbcebc4, 0x960eb48, 0x6dbcead4,
>>>>>>>>> 0x9dc1e20), at 0xfe6f4a4b
>>>>>>>>> [11] JavaCalls::call_virtual(0x6dbcebc4, 0x960eb3c,
>>>>>>>>> 0xfed0daf4, 0xfed0dd58, 0x6dbcead4, 0x9dc1e20), at 0xfe76b069
>>>>>>>>> [12] JavaCalls::call_virtual(0x6dbcebc4, 0x960eb38,
>>>>>>>>> 0x960eb3c, 0xfed0daf4, 0xfed0dd58, 0x9dc1e20), at 0xfe773532
>>>>>>>>> [13] thread_entry(0x9dc1e20, 0x9dc1e20), at 0xfe784ca6
>>>>>>>>> [14] JavaThread::thread_main_inner(0x9dc1e20), at 0xfe7811d8
>>>>>>>>> [15] JavaThread::run(0x9dc1e20), at 0xfe781182
>>>>>>>>> [16] java_start(0x9dc1e20), at 0xfead6d62
>>>>>>>>> [17] _thr_setup(0x6e560a00), at 0xfef244b7
>>>>>>>>> [18] _lwp_start(0x80790d8, 0x0, 0xfecb8000, 0x6dbce858,
>>>>>>>>> 0xfe6c22d3, 0x80790d8), at 0xfef247a0
>>>>>>>>>
>>>>>>>>> Frame 6 is the Java frame that was being examined by t@6. The
>>>>>>>>> frame's code:
>>>>>>>>> 0xfb2126d3: testl %ebx,%ebx
>>>>>>>>> 0xfb2126d5: je 0xfb21273e [ 0xfb21273e, .+0x69 ]
>>>>>>>>> 0xfb2126db: lock cmpxchgl %ebx,(%ecx)
>>>>>>>>> 0xfb2126df: je 0xfb21273e [ 0xfb21273e, .+0x5f ]
>>>>>>>>> 0xfb2126e5: movl %ecx,0x00000004(%edx)
>>>>>>>>> 0xfb2126e8: call 0xfb2126f2 [ 0xfb2126f2, .+0xa ]   ///// this call targets an instruction in this frame
>>>>>>>>> 0xfb2126ed: jmp 0xfb21273e [ 0xfb21273e, .+0x51 ]   ///// this address, pushed by the call at 0xfb2126e8, is the pc in the frame
>>>>>>>>> 0xfb2126f2: pushl %edx                              //// this is the called address
>>>>>>>>> 0xfb2126f3: leal 0x00000008(%esp),%eax
>>>>>>>>> 0xfb2126f7: movl %esi,0xffffffe4(%ebp)
>>>>>>>>> 0xfb2126fa: movl %gs:0x000000e0 [ 0xe0 ],%edi
>>>>>>>>> 0xfb212701: pushl %edi
>>>>>>>>> 0xfb212702: movl %ebp,0x000000ec(%edi)
>>>>>>>>> 0xfb212708: movl %eax,0x000000e4(%edi)
>>>>>>>>> 0xfb21270e: call monitorexit [ 0xfe6f802c, .+0x34e591e ]   //// next frame
>>>>>>>>> 0xfb212713: movl $0x00000000,0x000000e4(%edi)              //// real frame pc is here!!!!!
>>>>>>>>> 0xfb21271d: movl $0x00000000,0x000000ec(%edi)
>>>>>>>>> 0xfb212727: addl $0x00000008,%esp
>>>>>>>>> 0xfb21272a: cmpl $0x00000000,0x00000004(%edi)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is no deoptimization involved; the pc in the thread's
>>>>>>>>> JavaFrameAnchor is null, so the pc was calculated as sp[-1],
>>>>>>>>> which is the value we see here:
>>>>>>>>> 0x6dbce8d4: 0xfb2126ed
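>>>>>>>>>
>>>>>>>>> The recovery logic, roughly sketched rather than the exact
>>>>>>>>> code, is:
>>>>>>>>>
>>>>>>>>>   // If the JavaFrameAnchor carries no saved pc, fall back to
>>>>>>>>>   // the return address the call pushed just below the sp.
>>>>>>>>>   address pc = thread->frame_anchor()->last_Java_pc();
>>>>>>>>>   if (pc == NULL) {
>>>>>>>>>     pc = (address) sp[-1];   // here: 0xfb2126ed
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> which is how we end up with 0xfb2126ed as the frame pc.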
>>>>>>>>>
>>>>>>>>> But the frame containing this pc is not a real frame, I think
>>>>>>>>> (maybe I am wrong?). The code just calls address 0xfb2126f2,
>>>>>>>>> which pushes the address of the following instruction onto the
>>>>>>>>> stack. I don't know how the frame construction code takes this
>>>>>>>>> as a frame. If it is not a real frame, the OopMap calculation
>>>>>>>>> based on it could be wrong, and the value in the stack slot
>>>>>>>>> could be invalid. If there is no problem with the frame info
>>>>>>>>> here, then the value on the stack may be stale from the last GC.
>>>>>>>>>
>>>>>>>>> I searched for the invalid oop 0x8d8505f8 in the Java heap; it
>>>>>>>>> does not exist there.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Yumin
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>