RFR(M): 8247272: SA ELF file support has never worked for 64-bit causing address to symbol name mapping to fail

Chris Plummer chris.plummer at oracle.com
Tue Jul 14 19:01:24 UTC 2020


Hello,

Can I get a second reviewer, please? As noted below, it's easier to look 
at webrev.00 first to just see the coding changes. webrev.01 just adds 
some updated comments.

thanks,

Chris

On 7/8/20 2:07 PM, Kevin Walls wrote:
> Thanks Chris, it's a bit of clutter, but truthful clutter. 8-)
>
>
> On 08/07/2020 20:26, Chris Plummer wrote:
>> Webrev has been updated with the suggested comment changes. Note to 
>> new reviewers, look in webrev.00 first since it doesn't have the 
>> clutter of the comment changes, making it easier to see which lines 
>> actually have code changes.
>>
>> http://cr.openjdk.java.net/~cjplummer/8247272/webrev.01/index.html
>>
>> thanks,
>>
>> Chris
>>
>> On 7/8/20 11:04 AM, Chris Plummer wrote:
>>> Hi Kevin,
>>>
>>> Thanks for the review. I'll add the additional Elf64_Addr and 
>>> Elf64_Off comments. Probably the others should be updated too. 
>>> Although they are the same size, they do have different names. For 
>>> example:
>>>
>>> /* Type for a 16-bit quantity.  */
>>> typedef uint16_t Elf32_Half;
>>> typedef uint16_t Elf64_Half;
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 7/8/20 3:47 AM, Kevin Walls wrote:
>>>> Hi Chris --
>>>>
>>>> This is a great story/history lesson.
>>>>
>>>> You could, if you like, edit those comments in ElfFileParser.java 
>>>> that say "Elf32_Addr", as they will contain either "Elf64_Addr or 
>>>> Elf32_Addr"; similarly for Elf64_Off.  The other Elf64 fields are 
>>>> the same as the 32-bit ones.
>>>>
>>>> Yes, the symbol fields are ordered differently.
>>>>
>>>> So all looks good to me!
>>>>
>>>> Thanks
>>>> Kevin
>>>>
>>>>
>>>>
>>>> On 08/07/2020 07:20, Chris Plummer wrote:
>>>>> Hello,
>>>>>
>>>>> Please help review the following:
>>>>>
>>>>> http://cr.openjdk.java.net/~cjplummer/8247272/webrev.00/index.html
>>>>> https://bugs.openjdk.java.net/browse/JDK-8247272
>>>>>
>>>>> The short story is that SA address to native symbol name 
>>>>> mapping/lookup has never worked on 64-bit, and this is due to the 
>>>>> java level ELF file support only supporting 32-bit. This CR fixes 
>>>>> that, and I believe also maintains 32-bit compatibility, although 
>>>>> I have no way of testing that.
>>>>>
>>>>> There is more to the story however on how we got here. Before 
>>>>> going into the gory detail below, I just want to point out that 
>>>>> currently nothing is using this support, and therefore it is 
>>>>> technically not fixing anything, although I did verify that the 
>>>>> fixes work (see details below). Also, I intend to remove all the 
>>>>> java level ELF file support as part of JDK-8247516 [1]. The only 
>>>>> reason I want to push these changes first is because I already did 
>>>>> the work to get it working with 64-bit, and would like to get it 
>>>>> archived before removing it in case for some reason it is revived 
>>>>> in the future.
>>>>>
>>>>> Now for the ugly details on how we got here (and you really don't 
>>>>> need to read this unless you have any concerns with what I stated 
>>>>> above). It starts with the clhsdb "whatis" command, which was the 
>>>>> only (indirect) user of this java level ELF file support. Its 
>>>>> implementation is in javascript, so we have not had access to it 
>>>>> ever since JDK9 module support broke the SA javascript support 
>>>>> (and javascript support is now removed). I started the process of 
>>>>> converting "whatis" to java. It is basically the same as the 
>>>>> clhsdb "findpc" command, except it also checks for native symbols, 
>>>>> which it does with the following code:
>>>>>
>>>>>   var dso = loadObjectContainingPC(addr);
>>>>>   var sym = dso.closestSymbolToPC(addr);
>>>>>   return sym.name + '+' + sym.offset;
>>>>>
>>>>> Converting this to java was trivial. I just stuck support for it 
>>>>> in the PointerFinder class, which is what findpc relies on. 
>>>>> However, it always failed to look up a symbol. I found 
>>>>> that DSO.closestSymbolToPC() called into the java level ELF 
>>>>> support, and that was failing badly. After some debugging I 
>>>>> noticed that the values read in for various ELF headers were 
>>>>> mostly garbage. It then occurred to me that it was reading in 
>>>>> 32-bit values that probably needed to be 64-bit. Sure enough, this 
>>>>> code was never converted to 64-bit support. I then went and tried 
>>>>> "whatis" on JDK8, the last version where it was available, and it 
>>>>> failed there also with 64-bit binaries. So this is why I initially 
>>>>> fixed it to work with 64-bit, and also how I tested it (using the 
>>>>> modified findpc on a native symbol). But the story continues...
>>>>>
>>>>> DSO.java, and as a consequence the java ELF file support, is used 
>>>>> by all our posix ports to do address to symbol lookups. So I 
>>>>> figured that after fixing the java level ELF file support for 
>>>>> 64-bit, my improved findpc would start working on OSX also. No 
>>>>> such luck, and for obvious reasons. OSX uses mach-o files. This 
>>>>> ELF code should never have been used for it, and of course has 
>>>>> never worked.
>>>>>
>>>>> So I was left trying to figure out how to do OSX address to native 
>>>>> symbol lookups. I then recalled that there was a 
>>>>> CFrame.closestSymbolToPC() API that did address to native symbol 
>>>>> lookups for native stack traces, and wondered how it was ever 
>>>>> working (even on linux with the broken ELF 64-bit support). It 
>>>>> turns out this takes a very different path to do the lookups, 
>>>>> ending up in native code in libsaproc, where we also have ELF file 
>>>>> support. I then converted DSO.closestSymbolToPC(addr) to use this 
>>>>> libsaproc code instead, and it worked fine. So now there was no 
>>>>> need for the java level ELF file support since its only user was 
>>>>> DSO.closestSymbolToPC(addr). I should also add that this is the 
>>>>> approach that has always been used on windows, with both 
>>>>> CFrame.closestSymbolToPC() and DSO.closestSymbolToPC(addr) using 
>>>>> the same libsaproc support.
>>>>>
>>>>> There is still a bit more to the story. After diverting 
>>>>> DSO.closestSymbolToPC(addr) to the libsaproc lookup code, it still 
>>>>> didn't work for OSX. I thought it would just work since the native 
>>>>> BsdDebuggerLocal.lookupByName0() is implemented, and it seems to 
>>>>> trickle down to the proper lower level APIs to find the symbol, 
>>>>> but there were two issues. The first is that for processes there 
>>>>> is no support for looking up all the libraries and populating the 
>>>>> list of ps_prochandle structures that are used to do the symbol 
>>>>> lookups. This was just never implemented (which is also why PMap 
>>>>> does not work for OSX processes). For core files the ps_prochandle 
>>>>> structs are there, but the lookup code was badly broken. That has 
>>>>> now been fixed by JDK-8247515 [2], currently out for review. So 
>>>>> the end result is we'll have address to native symbol lookup for 
>>>>> everything but OSX processes.
>>>>>
>>>>> If you're still here, thanks for listening!
>>>>>
>>>>> Chris
>>>>>
>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8247516
>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8247515
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>
>>

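The layout difference discussed in this thread (Elf64_Addr and Elf64_Off 
widen to 64 bits, and the symbol fields are ordered differently) can be 
sketched roughly as follows. This is only an illustration based on the 
standard Elf32_Sym and Elf64_Sym layouts, not the code in the webrev. The 
class ElfSymSketch and its read() method are hypothetical names, and the 
caller is assumed to have already set the buffer's byte order from 
e_ident[EI_DATA].

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the SA ElfFileParser implementation.
final class ElfSymSketch {
    static final int ELFCLASS32 = 1;  // e_ident[EI_CLASS] for 32-bit objects
    static final int ELFCLASS64 = 2;  // e_ident[EI_CLASS] for 64-bit objects

    final long nameIndex;    // st_name: offset into the string table
    final int info;          // st_info: binding and type
    final int other;         // st_other
    final int sectionIndex;  // st_shndx
    final long value;        // st_value: Elf32_Addr or Elf64_Addr
    final long size;         // st_size: Elf32_Word or Elf64_Xword

    private ElfSymSketch(long nameIndex, int info, int other,
                         int sectionIndex, long value, long size) {
        this.nameIndex = nameIndex;
        this.info = info;
        this.other = other;
        this.sectionIndex = sectionIndex;
        this.value = value;
        this.size = size;
    }

    // Reads one symbol table entry at the buffer's current position.
    static ElfSymSketch read(ByteBuffer buf, int elfClass) {
        if (elfClass == ELFCLASS64) {
            // Elf64_Sym (24 bytes): name, info, other, shndx, value, size
            long name = buf.getInt() & 0xFFFFFFFFL;
            int info = buf.get() & 0xFF;
            int other = buf.get() & 0xFF;
            int shndx = buf.getShort() & 0xFFFF;
            long value = buf.getLong();
            long size = buf.getLong();
            return new ElfSymSketch(name, info, other, shndx, value, size);
        }
        if (elfClass == ELFCLASS32) {
            // Elf32_Sym (16 bytes): name, value, size, info, other, shndx
            long name = buf.getInt() & 0xFFFFFFFFL;
            long value = buf.getInt() & 0xFFFFFFFFL;
            long size = buf.getInt() & 0xFFFFFFFFL;
            int info = buf.get() & 0xFF;
            int other = buf.get() & 0xFF;
            int shndx = buf.getShort() & 0xFFFF;
            return new ElfSymSketch(name, info, other, shndx, value, size);
        }
        throw new IllegalArgumentException("unknown ELF class: " + elfClass);
    }
}
```

Reading a 64-bit entry with the 32-bit branch would pick up st_info, 
st_other and st_shndx as the low bits of the "address", which matches the 
kind of garbage values described in the original message.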


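The javascript snippet quoted above reports the closest symbol as 
name+offset. A minimal Java sketch of that idea is below, assuming the 
symbols have already been loaded into a sorted map. ClosestSymbolSketch, 
add() and describe() are made-up names, and this is not how 
DSO.closestSymbolToPC() or libsaproc actually implement the lookup. It 
also ignores symbol sizes, so an address far past the last symbol still 
resolves to that symbol.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of an address to "name+offset" lookup.
final class ClosestSymbolSketch {
    // Keys are symbol start addresses, ordered as unsigned 64-bit values.
    private final TreeMap<Long, String> symbolsByAddress =
        new TreeMap<>(Long::compareUnsigned);

    void add(String name, long address) {
        symbolsByAddress.put(address, name);
    }

    // Returns "name+offset" for the nearest symbol at or below pc,
    // or null if pc is below every known symbol.
    String describe(long pc) {
        Map.Entry<Long, String> e = symbolsByAddress.floorEntry(pc);
        if (e == null) {
            return null;
        }
        return e.getValue() + "+" + (pc - e.getKey());
    }
}
```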
