RFR(M): 8247272: SA ELF file support has never worked for 64-bit causing address to symbol name mapping to fail
Kevin Walls
kevin.walls at oracle.com
Wed Jul 8 21:07:24 UTC 2020
Thanks Chris, it's a bit of clutter, but truthful clutter. 8-)
On 08/07/2020 20:26, Chris Plummer wrote:
> Webrev has been updated with the suggested comment changes. Note to
> new reviewers: look in webrev.00 first since it doesn't have the
> clutter of the comment changes, making it easier to see which lines
> actually have code changes.
>
> http://cr.openjdk.java.net/~cjplummer/8247272/webrev.01/index.html
>
> thanks,
>
> Chris
>
> On 7/8/20 11:04 AM, Chris Plummer wrote:
>> Hi Kevin,
>>
>> Thanks for the review. I'll add the additional Elf64_Addr and
>> Elf64_Off comments. Probably the others should be updated too.
>> Although they are the same size, they do have different names. For
>> example:
>>
>> /* Type for a 16-bit quantity. */
>> typedef uint16_t Elf32_Half;
>> typedef uint16_t Elf64_Half;
>>
>> thanks,
>>
>> Chris
>>
>> On 7/8/20 3:47 AM, Kevin Walls wrote:
>>> Hi Chris --
>>>
>>> This is a great story/history lesson.
>>>
>>> You could, if you like, edit the comments in ElfFileParser.java that
>>> say "Elf32_Addr" to read "Elf64_Addr or Elf32_Addr", since the fields
>>> can contain either; similarly for Elf64_Off. The other Elf64 fields
>>> are the same size as the 32-bit ones.
>>>
>>> Yes, the symbol fields are ordered differently.
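For readers who have not compared the two layouts, here is a minimal sketch of the reordering. It is not the ElfFileParser.java code: the class and method names (ElfSymLayout, readSym32, readSym64) are made up for illustration, and only the field names and sizes come from the ELF specification. It assumes the buffer is already positioned at the start of a symbol entry with the right byte order set.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: shows how Elf32_Sym and Elf64_Sym order their fields. */
final class ElfSymLayout {
    /** Decoded symbol table entry; field names follow the ELF specification. */
    static final class Sym {
        final long name;   // st_name: index into the string table
        final long value;  // st_value: symbol address
        final long size;   // st_size: symbol size in bytes
        final int info;    // st_info: type and binding
        final int other;   // st_other: visibility
        final int shndx;   // st_shndx: section index

        Sym(long name, long value, long size, int info, int other, int shndx) {
            this.name = name;
            this.value = value;
            this.size = size;
            this.info = info;
            this.other = other;
            this.shndx = shndx;
        }
    }

    /** Elf32_Sym (16 bytes): name, value, size, info, other, shndx. */
    static Sym readSym32(ByteBuffer b) {
        long name = b.getInt() & 0xFFFFFFFFL;
        long value = b.getInt() & 0xFFFFFFFFL;
        long size = b.getInt() & 0xFFFFFFFFL;
        int info = b.get() & 0xFF;
        int other = b.get() & 0xFF;
        int shndx = b.getShort() & 0xFFFF;
        return new Sym(name, value, size, info, other, shndx);
    }

    /** Elf64_Sym (24 bytes): name, info, other, shndx, value, size. */
    static Sym readSym64(ByteBuffer b) {
        long name = b.getInt() & 0xFFFFFFFFL;
        int info = b.get() & 0xFF;
        int other = b.get() & 0xFF;
        int shndx = b.getShort() & 0xFFFF;
        long value = b.getLong();
        long size = b.getLong();
        return new Sym(name, value, size, info, other, shndx);
    }
}
```

Reading a 64-bit entry with the 32-bit method would pick up st_info, st_other and st_shndx as part of the "address", which is one way to end up with garbage symbol values.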
>>>
>>> So all looks good to me!
>>>
>>> Thanks
>>> Kevin
>>>
>>>
>>>
>>> On 08/07/2020 07:20, Chris Plummer wrote:
>>>> Hello,
>>>>
>>>> Please help review the following:
>>>>
>>>> http://cr.openjdk.java.net/~cjplummer/8247272/webrev.00/index.html
>>>> https://bugs.openjdk.java.net/browse/JDK-8247272
>>>>
>>>> The short story is that SA address to native symbol name
>>>> mapping/lookup has never worked on 64-bit, and this is due to the
>>>> java level ELF file support only supporting 32-bit. This CR fixes
>>>> that, and I believe also maintains 32-bit compatibility, although I
>>>> have no way of testing that.
>>>>
>>>> There is more to the story however on how we got here. Before going
>>>> into the gory detail below, I just want to point out that currently
>>>> nothing is using this support, and therefore it is technically not
>>>> fixing anything, although I did verify that the fixes work (see
>>>> details below). Also, I intend to remove all the java level ELF
>>>> file support as part of JDK-8247516 [1]. The only reason I want to
>>>> push these changes first is because I already did the work to get
>>>> it working with 64-bit, and would like to get it archived before
>>>> removing it in case for some reason it is revived in the future.
>>>>
>>>> Now for the ugly details on how we got here (and you really don't
>>>> need to read this unless you have any concerns with what I stated
>>>> above). It starts with the clhsdb "whatis" command, which was the
>>>> only (indirect) user of this java level ELF file support. Its
>>>> implementation is in JavaScript, so we have not had access to it
>>>> ever since JDK9 module support broke the SA JavaScript support (and
>>>> JavaScript support is now removed). I started the process of
>>>> converting "whatis" to java. It is basically the same as the clhsdb
>>>> "findpc" command, except it also checks for native symbols, which
>>>> it does with the following code:
>>>>
>>>> var dso = loadObjectContainingPC(addr);
>>>> var sym = dso.closestSymbolToPC(addr);
>>>> return sym.name + '+' + sym.offset;
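The "closest symbol" step in that snippet amounts to finding the highest symbol address at or below the pc and reporting the distance from it. Below is a rough Java sketch of that idea only. ClosestSymbolSketch and its methods are hypothetical stand-ins, not the SA's DSO or PointerFinder API, and the offset is printed in decimal as an arbitrary choice.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a symbol table; not part of the SA API. */
final class ClosestSymbolSketch {
    // Addresses are compared as unsigned so that high 64-bit addresses sort correctly.
    private final TreeMap<Long, String> symbolsByAddress =
            new TreeMap<>(Long::compareUnsigned);

    void add(long address, String name) {
        symbolsByAddress.put(address, name);
    }

    /** Returns "name+offset" for the nearest symbol at or below pc, or null if none. */
    String describe(long pc) {
        Map.Entry<Long, String> entry = symbolsByAddress.floorEntry(pc);
        if (entry == null) {
            return null; // pc is below every known symbol
        }
        long offset = pc - entry.getKey();
        return entry.getValue() + "+" + Long.toUnsignedString(offset);
    }
}
```

A lookup like this only works if the symbol addresses were read correctly in the first place, which is where the ELF parsing comes in.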
>>>>
>>>> Converting this to java was trivial. I just stuck support for it in
>>>> the PointerFinder class, which is what findpc relies on. However,
>>>> it always failed to look up a symbol. I found that
>>>> DSO.closestSymbolToPC() called into the java level ELF support, and
>>>> that was failing badly. After some debugging I noticed that the
>>>> values read in for various ELF headers were mostly garbage. It then
>>>> occurred to me that it was reading in 32-bit values that probably
>>>> needed to be 64-bit. Sure enough, this code was never converted to
>>>> 64-bit support. I then went and tried "whatis" on JDK8, the last
>>>> version where it was available, and it failed there also with
>>>> 64-bit binaries. So this is why I initially fixed it to work with
>>>> 64-bit, and also how I tested it (using the modified findpc on a
>>>> native symbol). But the story continues...
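To make the 32-bit versus 64-bit mismatch concrete, here is a minimal sketch of a header reader that picks the field width from e_ident[EI_CLASS]. It is an illustration under stated assumptions, not the webrev's code: ElfHeaderSketch, parse and readWord are invented names, while the constants and field order are those of the ELF specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: reads a few ELF header fields for either ELF class. */
final class ElfHeaderSketch {
    static final int EI_CLASS = 4;      // index of the class byte in e_ident
    static final int EI_DATA = 5;       // index of the byte-order byte in e_ident
    static final int ELFCLASS32 = 1;
    static final int ELFCLASS64 = 2;
    static final int ELFDATA2MSB = 2;   // big-endian marker

    final boolean is64;
    final long entry;      // e_entry
    final long phoff;      // e_phoff
    final long shoff;      // e_shoff
    final int shentsize;   // e_shentsize
    final int shnum;       // e_shnum
    final int shstrndx;    // e_shstrndx

    private ElfHeaderSketch(boolean is64, long entry, long phoff, long shoff,
                            int shentsize, int shnum, int shstrndx) {
        this.is64 = is64;
        this.entry = entry;
        this.phoff = phoff;
        this.shoff = shoff;
        this.shentsize = shentsize;
        this.shnum = shnum;
        this.shstrndx = shstrndx;
    }

    static ElfHeaderSketch parse(byte[] image) {
        if (image.length < 16 || image[0] != 0x7f || image[1] != 'E'
                || image[2] != 'L' || image[3] != 'F') {
            throw new IllegalArgumentException("not an ELF image");
        }
        int elfClass = image[EI_CLASS];
        if (elfClass != ELFCLASS32 && elfClass != ELFCLASS64) {
            throw new IllegalArgumentException("unknown ELF class " + elfClass);
        }
        boolean is64 = elfClass == ELFCLASS64;
        ByteBuffer b = ByteBuffer.wrap(image).order(
                image[EI_DATA] == ELFDATA2MSB ? ByteOrder.BIG_ENDIAN
                                              : ByteOrder.LITTLE_ENDIAN);
        b.position(16 + 2 + 2 + 4);       // skip e_ident, e_type, e_machine, e_version
        long entry = readWord(b, is64);   // Elf32_Addr or Elf64_Addr
        long phoff = readWord(b, is64);   // Elf32_Off or Elf64_Off
        long shoff = readWord(b, is64);   // Elf32_Off or Elf64_Off
        b.getInt();                       // e_flags
        b.getShort();                     // e_ehsize
        b.getShort();                     // e_phentsize
        b.getShort();                     // e_phnum
        int shentsize = b.getShort() & 0xFFFF;
        int shnum = b.getShort() & 0xFFFF;
        int shstrndx = b.getShort() & 0xFFFF;
        return new ElfHeaderSketch(is64, entry, phoff, shoff, shentsize, shnum, shstrndx);
    }

    /** Address and offset fields are 4 bytes in ELFCLASS32 and 8 bytes in ELFCLASS64. */
    private static long readWord(ByteBuffer b, boolean is64) {
        return is64 ? b.getLong() : (b.getInt() & 0xFFFFFFFFL);
    }
}
```

A reader that always takes 4 bytes for e_entry, e_phoff and e_shoff ends up 12 bytes short by the time it reaches e_shentsize on a 64-bit file, so every later field is misread.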
>>>>
>>>> DSO.java, and as a consequence the java ELF file support, is used
>>>> by all our posix ports to do address to symbol lookups. So I
>>>> figured that after fixing the java level ELF file support for
>>>> 64-bit, my improved findpc would start working on OSX also. No such
>>>> luck, and for obvious reasons. OSX uses mach-o files. This ELF code
>>>> should never have been used for it, and of course has never worked.
>>>>
>>>> So I was left trying to figure out how to do OSX address to native
>>>> symbol lookups. I then recalled that there was a
>>>> CFrame.closestSymbolToPC() API that did address to native symbol
>>>> lookups for native stack traces, and wondered how it was ever
>>>> working (even on linux with the broken ELF 64-bit support). It
>>>> turns out this takes a very different path to do the lookups,
>>>> ending up in native code in libsaproc, where we also have ELF file
>>>> support. I then converted DSO.closestSymbolToPC(addr) to use this
>>>> libsaproc code instead, and it worked fine. So now there was no
>>>> need for the java level ELF file support since its only user was
>>>> DSO.closestSymbolToPC(addr). I should also add that this is the
>>>> approach that has always been used on windows, with both
>>>> CFrame.closestSymbolToPC() and DSO.closestSymbolToPC(addr) using
>>>> the same libsaproc support.
>>>>
>>>> There is still a bit more to the story. After diverting
>>>> DSO.closestSymbolToPC(addr) to the libsaproc lookup code, it still
>>>> didn't work for OSX. I thought it would just work since the native
>>>> BsdDebuggerLocal.lookupByName0() is implemented, and it seems to
>>>> trickle down to the proper lower level APIs to find the symbol, but
>>>> there were two issues. The first is that for processes there is no
>>>> support for looking up all the libraries and populating the list of
>>>> ps_prochandle structures that are used to do the symbol lookups.
>>>> This was just never implemented (which is also why PMap does not work for
>>>> OSX processes). For core files the ps_prochandle structs are there,
>>>> but the lookup code was badly broken. That has now been fixed by
>>>> JDK-8247515 [2], currently out for review. So the end result is
>>>> we'll have address to native symbol lookup for everything but OSX
>>>> processes.
>>>>
>>>> If you're still here, thanks for listening!
>>>>
>>>> Chris
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8247516
>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8247515
>>>>
>>>>
>>>>
>>>>
>>
>>
>
>