RFR(M): 8247272: SA ELF file support has never worked for 64-bit causing address to symbol name mapping to fail
Kevin Walls
kevin.walls at oracle.com
Wed Jul 8 10:47:30 UTC 2020
Hi Chris --
This is a great story/history lesson.
You could, if you like, edit those comments in ElfFileParser.java that say
"Elf32_Addr", as those fields will contain either an Elf64_Addr or an
Elf32_Addr; similarly for Elf64_Off. The other Elf64 fields are the same
as the 32-bit ones.
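
For illustration only, here is a minimal sketch of the width difference behind that comment change. It is not the ElfFileParser.java code: the class name ElfWidthSketch and its methods are hypothetical, and it relies only on the standard ELF header layout (e_ident, e_type, e_machine, e_version, then e_entry, e_phoff, e_shoff).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch, not the SA implementation.
class ElfWidthSketch {
    static final int EI_CLASS = 4, EI_DATA = 5;      // indices into e_ident
    static final int ELFCLASS64 = 2, ELFDATA2MSB = 2;

    /** Reads an Elf32_Addr/Elf32_Off (4 bytes) or an Elf64_Addr/Elf64_Off (8 bytes). */
    static long readAddrOrOff(ByteBuffer buf, boolean is64) {
        return is64 ? buf.getLong() : (buf.getInt() & 0xFFFFFFFFL);
    }

    /** Returns { e_entry, e_phoff, e_shoff } from the start of an ELF file. */
    static long[] readEntryAndOffsets(byte[] header) {
        if (header[0] != 0x7f || header[1] != 'E' || header[2] != 'L' || header[3] != 'F') {
            throw new IllegalArgumentException("not an ELF file");
        }
        boolean is64 = header[EI_CLASS] == ELFCLASS64;
        ByteBuffer buf = ByteBuffer.wrap(header);
        buf.order(header[EI_DATA] == ELFDATA2MSB ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        // Skip e_ident (16), e_type (2), e_machine (2), e_version (4):
        // these have the same size in both ELF classes.
        buf.position(24);
        return new long[] {
            readAddrOrOff(buf, is64),   // e_entry
            readAddrOrOff(buf, is64),   // e_phoff
            readAddrOrOff(buf, is64)    // e_shoff
        };
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ElfWidthSketch <elf-file>");
            return;
        }
        long[] r = readEntryAndOffsets(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("e_entry=0x" + Long.toHexString(r[0])
            + " e_phoff=" + r[1] + " e_shoff=" + r[2]);
    }
}
```

Reading these three fields as 4-byte values from a 64-bit file shifts every later field, which is one way to end up with garbage header values.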
Yes, the symbol fields are ordered differently.
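
To make the ordering difference concrete, here is a hedged sketch of reading one symbol table entry for either ELF class. The class ElfSymSketch and its Sym holder are made up for this example and are not taken from the webrev; the field order follows the standard Elf32_Sym and Elf64_Sym definitions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not the SA implementation.
class ElfSymSketch {
    /** Hypothetical holder for the fields shared by Elf32_Sym and Elf64_Sym. */
    static final class Sym {
        long name, value, size;   // st_name, st_value, st_size
        int info, other, shndx;   // st_info, st_other, st_shndx
    }

    /** Reads one symbol table entry at the buffer's current position. */
    static Sym readSym(ByteBuffer buf, boolean is64) {
        Sym s = new Sym();
        s.name = buf.getInt() & 0xFFFFFFFFL;   // 4-byte Word in both classes
        if (is64) {
            // Elf64_Sym (24 bytes): name, info, other, shndx, value, size
            s.info = buf.get() & 0xFF;
            s.other = buf.get() & 0xFF;
            s.shndx = buf.getShort() & 0xFFFF;
            s.value = buf.getLong();
            s.size = buf.getLong();
        } else {
            // Elf32_Sym (16 bytes): name, value, size, info, other, shndx
            s.value = buf.getInt() & 0xFFFFFFFFL;
            s.size = buf.getInt() & 0xFFFFFFFFL;
            s.info = buf.get() & 0xFF;
            s.other = buf.get() & 0xFF;
            s.shndx = buf.getShort() & 0xFFFF;
        }
        return s;
    }

    public static void main(String[] args) {
        // Synthetic little-endian Elf64_Sym, with values made up for illustration.
        ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(1).put((byte) 0x12).put((byte) 0).putShort((short) 14)
           .putLong(0x401000L).putLong(32L);
        buf.flip();
        Sym s = readSym(buf, true);
        System.out.println("st_value=0x" + Long.toHexString(s.value) + " st_size=" + s.size);
    }
}
```

So a parser that only widens st_value and st_size without also moving st_info, st_other and st_shndx ahead of them would still misread 64-bit entries.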
So all looks good to me!
Thanks
Kevin
On 08/07/2020 07:20, Chris Plummer wrote:
> Hello,
>
> Please help review the following:
>
> http://cr.openjdk.java.net/~cjplummer/8247272/webrev.00/index.html
> https://bugs.openjdk.java.net/browse/JDK-8247272
>
> The short story is that SA address to native symbol name
> mapping/lookup has never worked on 64-bit, and this is due to the java
> level ELF file support only supporting 32-bit. This CR fixes that, and
> I believe also maintains 32-bit compatibility, although I have no way
> of testing that.
>
> There is more to the story however on how we got here. Before going
> into the gory detail below, I just want to point out that currently
> nothing is using this support, and therefore it is technically not
> fixing anything, although I did verify that the fixes work (see
> details below). Also, I intend to remove all the java level ELF file
> support as part of JDK-8247516 [1]. The only reason I want to push
> these changes first is because I already did the work to get it
> working with 64-bit, and would like to get it archived before removing
> it in case for some reason it is revived in the future.
>
> Now for the ugly details on how we got here (and you really don't need
> to read this unless you have any concerns with what I stated above).
> It starts with the clhsdb "whatis" command, which was the only
> (indirect) user of this java level ELF file support. Its
> implementation is in JavaScript, so we have not had access to it ever
> since JDK9 module support broke the SA JavaScript support (and
> JavaScript support is now removed). I started the process of
> converting "whatis" to java. It is basically the same as the clhsdb
> "findpc" command, except it also checks for native symbols, which it
> does with the following code:
>
> var dso = loadObjectContainingPC(addr);
> var sym = dso.closestSymbolToPC(addr);
> return sym.name + '+' + sym.offset;
>
> Converting this to java was trivial. I just stuck support for it in
> the PointerFinder class, which is what findpc relies on. However, it
> always failed to look up a symbol. I found that
> DSO.closestSymbolToPC() called into the java level ELF support, and
> that was failing badly. After some debugging I noticed that the values
> read in for various ELF headers were mostly garbage. It then occurred
> to me that it was reading in 32-bit values that probably needed to be
> 64-bit. Sure enough, this code was never converted to 64-bit support.
> I then went and tried "whatis" on JDK8, the last version where it was
> available, and it failed there also with 64-bit binaries. So this is
> why I initially fixed it to work with 64-bit, and also how I tested it
> (using the modified findpc on a native symbol). But the story
> continues...
>
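
As an aside on the "whatis" snippet quoted above, which returns a symbol name plus an offset, here is a rough sketch of what a closest-symbol lookup can look like once the symbol table has been read correctly. It assumes "closest" means the nearest symbol at or below the address. The class ClosestSymbolSketch and its methods are hypothetical and are not the DSO or libsaproc code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the SA implementation.
class ClosestSymbolSketch {
    // Symbols keyed by start address; addresses are compared as unsigned 64-bit values.
    private final TreeMap<Long, String> symbolsByAddress = new TreeMap<>(Long::compareUnsigned);

    void add(long address, String name) {
        symbolsByAddress.put(address, name);
    }

    /** Returns "name+offset" for the nearest symbol at or below pc, or null if there is none. */
    String closestSymbolToPC(long pc) {
        Map.Entry<Long, String> e = symbolsByAddress.floorEntry(pc);
        if (e == null) {
            return null;
        }
        return e.getValue() + "+" + (pc - e.getKey());
    }

    public static void main(String[] args) {
        ClosestSymbolSketch syms = new ClosestSymbolSketch();
        // Addresses and names are made up for illustration.
        syms.add(0x1000, "foo");
        syms.add(0x2000, "bar");
        System.out.println(syms.closestSymbolToPC(0x1010));
    }
}
```

A lookup like this only gives sensible answers if st_value was read with the right width, which is why garbage symbol values make every lookup fail.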
> DSO.java, and as a consequence the java ELF file support, is used by
> all our posix ports to do address to symbol lookups. So I figured that
> after fixing the java level ELF file support for 64-bit, my improved
> findpc would start working on OSX also. No such luck, and for obvious
> reasons. OSX uses mach-o files. This ELF code should never have been
> used for it, and of course has never worked.
>
> So I was left trying to figure out how to do OSX address to native
> symbol lookups. I then recalled that there was a
> CFrame.closestSymbolToPC() API that did address to native symbol
> lookups for native stack traces, and wondered how it was ever working
> (even on linux with the broken ELF 64-bit support). It turns out this
> takes a very different path to do the lookups, ending up in native
> code in libsaproc, where we also have ELF file support. I then
> converted DSO.closestSymbolToPC(addr) to use this libsaproc code
> instead, and it worked fine. So now there was no need for the java
> level ELF file support since its only user was
> DSO.closestSymbolToPC(addr). I should also add that this is the
> approach that has always been used on windows, with both
> CFrame.closestSymbolToPC() and DSO.closestSymbolToPC(addr) using the
> same libsaproc support.
>
> There is still a bit more to the story. After diverting
> DSO.closestSymbolToPC(addr) to the libsaproc lookup code, it still
> didn't work for OSX. I thought it would just work since the native
> BsdDebuggerLocal.lookupByName0() is implemented, and it seems to
> trickle down to the proper lower level APIs to find the symbol, but
> there were two issues. The first is that for processes there is no
> support for looking up all the libraries and populating the list of
> ps_prochandle structures that are used to do the symbol lookups. This
> was just never implemented (which is also why PMap does not work for OSX
> processes). For core files the ps_prochandle structs are there, but
> the lookup code was badly broken. That has now been fixed by
> JDK-8247515 [2], currently out for review. So the end result is we'll
> have address to native symbol lookup for everything but OSX processes.
>
> If you're still here, thanks for listening!
>
> Chris
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8247516
> [2] https://bugs.openjdk.java.net/browse/JDK-8247515
>
>
>
>