[PATCH RFC 0/2] Add linux/ppc64 support for Hotspot serviceability agent to read core files
Volker Simonis
volker.simonis at gmail.com
Tue Nov 25 18:40:48 UTC 2014
Hi Maynard,
here, finally, is my version of your change. I've tested 'jstack'
without flags and with the '-F' and '-m' flags, on PIDs and core files
on big-endian Linux/PPC64, and it now seems to work quite stably. I've
also done some basic tests with 'jmap' (without flags and with '-histo'
and '-F') and with 'jinfo'. Altogether I think this change is now ready
for integration. You can find a webrev with all the changes combined
into one patch at:
http://cr.openjdk.java.net/~simonis/webrevs/8049716/
Besides some minor fixes, I've completely removed the LR register from
the PPC64 frame class because I think it is not needed (we also
don't have it in the corresponding HotSpot code). I've also added
parsing and use of the ELF function descriptor section (i.e. the
'.opd' section) for resolving native function names. I think this
should not affect little-endian PPC64, because those binaries shouldn't
have an '.opd' section, but of course it would be good if you could
test the new version on ppc64le.
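As a rough illustration of the '.opd' handling (a hypothetical Java
sketch, not the code from the webrev; 'OpdSymbolResolver' and its
methods are made-up names): on big-endian PPC64 a function symbol's
value is the address of a function descriptor in '.opd', and the first
doubleword of that descriptor holds the code entry address. Resolving a
PC to a name therefore needs one extra indirection before the usual
nearest-symbol lookup.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: maps code addresses to function names when the
// symbol table points at function descriptors instead of code.
final class OpdSymbolResolver {
    private final long opdStart;   // virtual address of the '.opd' section (assumed known)
    private final ByteBuffer opd;  // raw bytes of the '.opd' section
    private final TreeMap<Long, String> byEntry = new TreeMap<>();

    OpdSymbolResolver(long opdStart, byte[] opdBytes) {
        this.opdStart = opdStart;
        this.opd = ByteBuffer.wrap(opdBytes).order(ByteOrder.BIG_ENDIAN);
    }

    // symbolValue is the value reported by the symbol table. If it falls
    // inside '.opd', the real entry address is the first 8 bytes of the
    // descriptor; otherwise it is used as the entry address directly.
    void addFunctionSymbol(String name, long symbolValue) {
        long offset = symbolValue - opdStart;
        boolean inOpd = offset >= 0 && offset + 8 <= opd.capacity();
        long entry = inOpd ? opd.getLong((int) offset) : symbolValue;
        byEntry.put(entry, name);
    }

    // Returns "name+0x<offset>" for the closest symbol at or below pc,
    // or null if no symbol starts at or below pc.
    String lookup(long pc) {
        Map.Entry<Long, String> e = byEntry.floorEntry(pc);
        return e == null ? null
                         : e.getValue() + "+0x" + Long.toHexString(pc - e.getKey());
    }
}
```

Symbols whose value lies outside '.opd' are taken as plain code
addresses, which is why a binary without an '.opd' section would go
through the same path unchanged in this sketch.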
If you have any ideas for improvements please add them to the current
patch and create a new webrev (if you have problems with that just
send me the changes and I'll do it for you). Finally, please send out
a new RFR to ppc-aix-port-dev at openjdk.java.net and
serviceability-dev at openjdk.java.net so that we can finally get this
reviewed and integrated (again, I can help out uploading the webrev if
you can't do that).
Thanks a lot for your contribution and best regards,
Volker
On Thu, Nov 20, 2014 at 7:05 PM, Maynard Johnson <maynardj at us.ibm.com> wrote:
> On 11/20/2014 05:03 AM, Volker Simonis wrote:
>> Hi Maynard,
>>
>> I've just realized that in your patch the two directory patterns
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/linux_ppc64/*.java and
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/ppc64/*.java are absent from
>> "make/sa.files". This of course breaks incremental builds.
> Nice, thank you. I was wondering why incremental build didn't work.
> I had resorted to using a build script that did the following:
> rm build/linux-ppc64-normal-server-release/hotspot/linux_ppc64_compiler2/product/../generated/sa-jdi.jar
> make all JOBS=24
> make install
>
> -Maynard
>
>>
>> Regards,
>> Volker
>>
>>
>> diff -r 6f35dca1949c make/sa.files
>> --- a/make/sa.files Mon Nov 17 14:47:41 2014 +0100
>> +++ b/make/sa.files Thu Nov 20 11:58:52 2014 +0100
>> @@ -94,12 +94,14 @@
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/linux_amd64/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/linux_x86/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/linux_sparc/*.java \
>> +$(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/linux_ppc64/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/posix/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/solaris_amd64/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/solaris_sparc/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/solaris_x86/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/sparc/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/x86/*.java \
>> +$(AGENT_SRC_DIR)/sun/jvm/hotspot/runtime/ppc64/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/tools/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/tools/jcore/*.java \
>> $(AGENT_SRC_DIR)/sun/jvm/hotspot/tools/soql/*.java \
>>
>> On Wed, Nov 19, 2014 at 7:33 PM, Volker Simonis
>> <volker.simonis at gmail.com> wrote:
>>> Hi Maynard,
>>>
>>> I just wanted to let you know that I'm still working on fixing the
>>> bogus entries in the stack trace. I'm pretty sure they are related to
>>> inlining. If you run your test program with "-XX:+PrintCompilation
>>> -XX:+PrintInlining -XX:CICompilerCount=1" you'll get the following
>>> output:
>>>
>>> ..
>>> 10954 5 % test::run_test @ 59 (99 bytes)
>>> @ 75 java.lang.String::getChars (62 bytes) inline (hot)
>>> @ 58 java.lang.System::arraycopy (0 bytes) (intrinsic)
>>> @ 87 test::get_my_chars (86 bytes) inline (hot)
>>> @ 6 java.lang.String::getChars (62 bytes) inline (hot)
>>> @ 58 java.lang.System::arraycopy (0 bytes) (intrinsic)
>>> @ 38 java.lang.String::<init> (62 bytes) inline (hot)
>>> @ 1 java.lang.Object::<init> (1 bytes) inline (hot)
>>> @ 55 java.util.Arrays::copyOfRange (63 bytes) too big
>>> @ 47 java.lang.StringBuilder::<init> (7 bytes) inline (hot)
>>> @ 52 java.lang.StringBuilder::append (8 bytes) executed < MinInliningThreshold times
>>> @ 57 java.lang.StringBuilder::append (8 bytes) executed < MinInliningThreshold times
>>> @ 62 java.lang.StringBuilder::append (8 bytes) executed < MinInliningThreshold times
>>> @ 65 java.lang.StringBuilder::toString (17 bytes) executed < MinInliningThreshold times
>>> @ 79 java.lang.String::length (6 bytes) inline (hot)
>>> @ 82 java.io.OutputStreamWriter::write (11 bytes) executed < MinInliningThreshold times
>>> ..
>>>
>>> The stack trace I get from jstack looks as follows:
>>>
>>> Thread 4448: (state = IN_JAVA)
>>> - test.get_my_chars(int, int, char[], java.lang.String, long) @bci=43, line=15 (Compiled frame; information may be imprecise)
>>> - test.run_test() @bci=87, line=35 (Compiled frame)
>>> - java.lang.String.getChars(int, int, char[], int) @bci=58, line=814 (Compiled frame)
>>> - test.run_test() @bci=75, line=34 (Compiled frame)
>>>
>>> From a system perspective 'test::run_test' is one native frame,
>>> because 'test::run_test' inlines all the other functions reported
>>> above. HotSpot has special functionality to detect and walk these
>>> inlined methods (so-called "virtual frames" or "vframe"s). For some
>>> reason this vframe walking doesn't seem to work in the agent. In gdb,
>>> when I call "ps()" at the same point where I created the above core
>>> file, I get the following stack trace:
>>>
>>> (gdb) call ps()
>>>
>>> "Executing ps"
>>> for thread: "main" #1 prio=5 os_prio=0 tid=0x00003fffb0010800 nid=0x1160 runnable [0x0000000000000000]
>>> java.lang.Thread.State: RUNNABLE
>>> JavaThread state: _thread_in_Java
>>> Thread: 0x00003fffb0010800 [0x1160] State: _running _has_called_back 0 _at_poll_safepoint 0
>>> JavaThread state: _thread_in_Java
>>>
>>> (guessing starting frame id=0x3fffb66ddc60 based on current fp)
>>> C frame (sp=0x00003fffb66ddad0 unextended sp=0x00003fffb66ddad0, fp=0x00003fffb66ddc60, real_fp=0x00003fffb66ddc60, pc=0x000000001000067c)
>>> 1 - frame( sp=0x00003fffb66ddc60, unextended_sp=0x00003fffb66ddc60, fp=0x00003fffb66ddd60, pc=0x00003fffa0159d58)
>>> test.get_my_chars(test.java:16)
>>> 2 - frame( sp=0x00003fffb66ddc60, unextended_sp=0x00003fffb66ddc60, fp=0x00003fffb66ddd60, pc=0x00003fffa0159d58)
>>> test.run_test(test.java:35)
>>> 3 - frame( sp=0x00003fffb66ddd60, unextended_sp=0x00003fffb66ddd60, fp=0x00003fffb66dde60, pc=0x00003fffa000f518)
>>> test.main(test.java:56)
>>>
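To make the vframe idea concrete, here is a hypothetical Java sketch
(not SA or HotSpot code; 'VFrameSketch' and 'Scope' are made-up names).
It assumes that the debug information for a PC in compiled code yields
the innermost inlined scope, and that each scope links to the scope it
was inlined into. One physical frame then expands into several virtual
frames, which matches frames 1 and 2 in the gdb output sharing the same
sp and pc.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of expanding one compiled (physical) frame into
// its virtual frames by following a chain of inlining scopes.
final class VFrameSketch {
    // Stand-in for a scope descriptor: a (method, bci) pair plus the
    // scope it was inlined into (null for the outermost method).
    static final class Scope {
        final String method;
        final int bci;
        final Scope caller;

        Scope(String method, int bci, Scope caller) {
            this.method = method;
            this.bci = bci;
            this.caller = caller;
        }
    }

    // Returns one entry per virtual frame, innermost first.
    static List<String> virtualFrames(Scope innermost) {
        List<String> frames = new ArrayList<>();
        for (Scope s = innermost; s != null; s = s.caller) {
            frames.add(s.method + " @bci=" + s.bci);
        }
        return frames;
    }
}
```

With a chain of 'test.get_my_chars' at bci 43 inlined into
'test.run_test' at bci 87, the sketch yields two entries for a single
physical frame. A walker that steps to the sender by sp alone, without
following this chain, would lose the inlined callers.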
>>> I'll keep you informed once I have fixed the problem (I'll also look
>>> into the .opd issue afterwards).
>>>
>>> Regards,
>>> Volker
>>>
>>>
>>> On Tue, Nov 18, 2014 at 12:20 AM, Maynard Johnson <maynardj at us.ibm.com> wrote:
>>>> On 11/17/2014 01:21 PM, Volker Simonis wrote:
>>>>> On Mon, Nov 17, 2014 at 6:59 PM, Maynard Johnson <maynardj at us.ibm.com> wrote:
>>>>>> On 11/17/2014 10:20 AM, Volker Simonis wrote:
>>>>>>> Hi Maynard,
>>>>>>>
>>>>>>> I'm currently looking at your changes. At first glance they look good.
>>>>>>>
>>>>>>> I could open a simple core file which contained both interpreted and
>>>>>>> compiled frames:
>>>>>>>
>>>>>>> $ jstack ./images/j2sdk-image/bin/java core.7034
>>>>>>> ...
>>>>>>> Thread 7035: (state = IN_VM)
>>>>>>> - sun.misc.Unsafe.putAddress(long, long) @bci=0 (Interpreted frame)
>>>>>>> - Crash.crashIt(sun.misc.Unsafe, int) @bci=10, line=8 (Interpreted frame)
>>>>>>> - Crash.doIt() @bci=45, line=23 (Compiled frame)
>>>>>>> - sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=0 (Interpreted frame)
>>>>>>> - sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=100, line=62 (Interpreted frame)
>>>>>>> - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Interpreted frame)
>>>>>>> - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Interpreted frame)
>>>>>>> - Crash.main(java.lang.String[]) @bci=32, line=31 (Interpreted frame)
>>>>>>>
>>>>>>> The one thing that doesn't currently work is "jstack -m" (i.e. "mixed
>>>>>>> mode" for java and native frames). Are you aware of this?
>>>>>> Hi, Volker,
>>>>>> Yeah, I knew about this problem and forgot to mention it in my patch posting. I started
>>>>>> looking at it this morning, and so far, I have at least fixed the UnmappedAddressException.
>>>>>> But now I'm getting different results on little endian vs big endian ppc64 systems.
>>>>>> On BE, I either get no symbol names (i.e., "?????") or wrong symbol names. On LE,
>>>>>> I seem to get correct symbol names for the first symbol (either __pthread_cond_wait
>>>>>> or __pthread_cond_timedwait) and the last symbol (start_thread) of each stack, but
>>>>>> everything in between is "?????".
>>>>>>
>>>>>
>>>>> Maybe this is related to the fact that we have function descriptors on
>>>>> BE and simple function pointers on LE. You may have a look at the
>>>>> elf-decoder for ppc64 to find some more information.
>>>>
>>>> Yes, indeed. With the following patch, the mixed mode option works fine on ppc64 little endian,
>>>> but not on big endian:
>>>>
>>>> Index: jdk9-dev/hotspot/agent/src/share/classes/sun/jvm/hotspot/debugger/linux/ppc64/LinuxPPC64CFrame.java
>>>> ===================================================================
>>>> --- jdk9-dev.orig/hotspot/agent/src/share/classes/sun/jvm/hotspot/debugger/linux/ppc64/LinuxPPC64CFrame.java
>>>> +++ jdk9-dev/hotspot/agent/src/share/classes/sun/jvm/hotspot/debugger/linux/ppc64/LinuxPPC64CFrame.java
>>>> @@ -60,14 +60,15 @@ final public class LinuxPPC64CFrame exte
>>>> return null;
>>>> }
>>>>
>>>> - Address nextSP = sp.getAddressAt( PPC64ThreadContext.SP * address_size + PPC64_STACK_BIAS);
>>>> + Address nextSP = sp.getAddressAt(0);
>>>> if (nextSP == null) {
>>>> return null;
>>>> }
>>>> - Address nextPC = sp.getAddressAt(PPC64ThreadContext.PC * address_size + PPC64_STACK_BIAS);
>>>> + Address nextPC = sp.getAddressAt(2 * address_size);
>>>> if (nextPC == null) {
>>>> return null;
>>>> }
>>>> +
>>>> return new LinuxPPC64CFrame(dbg, nextSP, nextPC,address_size);
>>>> }
>>>>
>>>> -------------------------------------------------
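A minimal sketch of the frame step used in the patch above, assuming
8-byte addresses; 'BackChainWalk', 'Frame' and the map standing in for
core-file memory are hypothetical. In the 64-bit PowerPC ELF ABI the
doubleword at offset 0 from the stack pointer is the back chain to the
previous frame, and the LR save doubleword sits at offset 16, which is
the 2 * address_size slot the patch reads.

```java
import java.util.Map;

// Hypothetical sketch of stepping from one native frame to its sender
// by following the back chain, mirroring the offsets in the patch.
final class BackChainWalk {
    static final int ADDRESS_SIZE = 8; // assumption: 64-bit process

    static final class Frame {
        final long sp;
        final long pc;

        Frame(long sp, long pc) {
            this.sp = sp;
            this.pc = pc;
        }
    }

    // memory maps an address to the 64-bit word stored there; it stands
    // in for reads from the core file. Returns null when the chain ends
    // or a word cannot be read.
    static Frame sender(Map<Long, Long> memory, Frame f) {
        Long nextSp = memory.get(f.sp);                    // back chain at offset 0
        Long nextPc = memory.get(f.sp + 2 * ADDRESS_SIZE); // LR save slot at offset 16
        if (nextSp == null || nextSp == 0L || nextPc == null) {
            return null;
        }
        return new Frame(nextSp, nextPc);
    }
}
```

Calling sender() repeatedly until it returns null walks the whole
native stack. Neither read depends on a register index from the thread
context, which is the difference from the two lines the patch removes.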
>>>>
>>>> I see that ppc64 fixups were made in the hotspot utilities (by you) about a year ago
>>>> (http://cr.openjdk.java.net/~simonis/webrevs/8019929.v3/). We obviously need something
>>>> similar in the hotspot agent native code that implements the JNI call 'lookupByAddress0'.
>>>> I hacked the build_symtab_internal() function in hotspot/agent/src/os/linux/symtab.c and
>>>> see that the symbol "offset" we're getting is really the address of the symbol's opd.
>>>> I'm not sure where to start to fix this, so if you have any suggestions, I'm all ears. :-)
>>>>
>>>> Thanks.
>>>> -Maynard
>>>>>
>>>>>>>
>>>>>>> Regarding your "test.java" example - how do you use it?
>>>>>>>
>>>>>>> If I just attach with jstack to the Java process which runs
>>>>>>> "test.java", I get the correct stack trace of all threads. But I think
>>>>>>> that's actually not SA functionality but a VM feature (the same one that
>>>>>>> can be triggered by sending kill -SIGQUIT to the Java process).
>>>>>>>
>>>>>>> If I attach with "jstack -F" I see the problems you mentioned. At first I
>>>>>>> didn't see any frames at all, which confused me, but then I also saw the
>>>>>>> two cases you mentioned. I'll need to take a closer look at what
>>>>>>> happens.
>>>>>>
>>>>>> I was just running the 'test' java app and, in another session, killing it with SIGSEGV.
>>>>>> To be honest, I wasn't aware of the 'jstack -F' option.
>>>>>>
>>>>>
>>>>> Another possibility I've just found is to create a core file from gdb
>>>>> with the 'generate-core-file' command. You can then still inspect the
>>>>> original program in gdb while debugging how jstack works on the
>>>>> core file.
>>>>>
>>>>>> -Maynard
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Volker
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Nov 14, 2014 at 7:09 PM, Maynard Johnson <maynardj at us.ibm.com> wrote:
>>>>>>>> When Hotspot SA tools jmap, jstack, and jsadebugd are run against a core file, they fail with the following runtime exception:
>>>>>>>>
>>>>>>>> OS/CPU combination linux/ppc64 not yet supported
>>>>>>>>
>>>>>>>> I will post a patch set that adds this support. The patch set consists of the following patches:
>>>>>>>>
>>>>>>>> PATCH 1/2: Updates to non-Java files to support linux/ppc64 Hotspot SA with core files
>>>>>>>>
>>>>>>>> PATCH 2/2: New PPC64 class files (and updates to generic files) to support linux/ppc64 Hotspot SA with core files
>>>>>>>>
>>>>>>>> These two patches apply cleanly to a November 13 pull of the jdk9-dev upstream sources.
>>>>>>>>
>>>>>>>> ------------
>>>>>>>> Open issues:
>>>>>>>> ------------
>>>>>>>> 1) The jstack tool does not print a stack entry for the 'main()' method of the Java
>>>>>>>> workload (attached) under test. For example:
>>>>>>>>
>>>>>>>> (Note: Addresses and method signatures elided for brevity.)
>>>>>>>>
>>>>>>>> Thread 24358: (state = IN_JAVA, current Java SP = null
>>>>>>>> )
>>>>>>>> - java.lang.String.getChars(...) @bci=58, line=814, pc=..., Method*=... (Compiled frame; ... imprecise)
>>>>>>>> - test.run_test() @bci=80, line=33, pc=..., Method*=... (Compiled frame)
>>>>>>>> ==> (Expect an entry for test.main() here)
>>>>>>>>
>>>>>>>> 2) The jstack tool sometimes prints what appears to be two complete stacks for the Java workload. For example:
>>>>>>>>
>>>>>>>> Thread 24779: (state = IN_JAVA, current Java SP = null
>>>>>>>> )
>>>>>>>> - java.lang.String.getChars(...) @bci=58, line=814, pc=..., Method*=... (Compiled frame; ... imprecise)
>>>>>>>> - test.run_test() @bci=80, line=33, pc=..., Method*=... (Compiled frame)
>>>>>>>> - test.get_my_chars(...) @bci=39, line=15, pc=..., Method*=... (Compiled frame)
>>>>>>>> - test.run_test() @bci=92, line=34, pc=..., Method*=... (Compiled frame)
>>>>>>>>
>>>>>>>> Again, the 'test.main' method is missing, but there's also the anomaly of the
>>>>>>>> 'test.run_test' method showing up twice in the stack, implying that it is called
>>>>>>>> by 'test.get_my_chars' at line 15. But that is not accurate. In fact, run_test
>>>>>>>> does call String.getChars at line 33 *and* it calls test.get_my_chars at line 34 --
>>>>>>>> but these are totally distinct call graphs. Somehow, we are seeing these two distinct
>>>>>>>> stacks in the core file, which seems impossible.
>>>>>>>>
>>>>>>>> ---------
>>>>>>>>
>>>>>>>> Any help offered on these two open issues would be greatly appreciated.
>>>>>>>>
>>>>>>>> -Maynard
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>