gdb and OpenJDK

Thu Feb 19 16:47:13 UTC 2015

I spent a great deal of time with native object file debugging formats.
But the size of the Dwarf .debuginfo section of a .so file should not be used as a measure of the debug info size for a single object file.
Back when I worked with the older Stabs debug format (a less structured format), the Dwarf format was generally smaller in a single object file (.o file) than Stabs.
The Stabs format benefited from a link time compression that made the resulting shared object files smaller by merging together common type information coming from the .o files. I'm pretty sure that the equivalent compression for Dwarf at link time never materialized, but I may be wrong.
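
As a rough illustration of that kind of link-time merging, here is a toy sketch. It is not how any real linker processes Stabs; the record layout (a name plus a size) and the sample records are made up for the example, and it assumes identical records can simply be compared for equality.

```python
def link_types(object_files):
    """Merge per-object type records, keeping one copy of each distinct record."""
    merged = []
    seen = set()
    for records in object_files:
        for rec in records:
            if rec not in seen:  # emit a shared type only once
                seen.add(rec)
                merged.append(rec)
    return merged


# Hypothetical type records from two .o files: (type name, size in bytes).
a_o = [("int", 4), ("struct Klass", 64), ("struct Method", 88)]
b_o = [("int", 4), ("struct Klass", 64), ("struct oopDesc", 16)]

unmerged_count = len(a_o) + len(b_o)       # records if simply concatenated
merged_count = len(link_types([a_o, b_o]))  # records after merging
```

With these made-up inputs the two objects carry six records between them, and the merged output keeps four. That is why the per-object size and the shared-object size are not comparable measures.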

In any case, using a standard format for debug info, like Dwarf, would probably yield benefits. The Dwarf format is also customizable: you can create your own tags and attributes, which any Dwarf parser will know how to parse and can then either interpret or ignore.
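
A minimal sketch of why that works: each attribute is stored together with a form that says how its value is encoded, so a consumer can step over an attribute it does not recognize. The constants below are the ones from the Dwarf specification. Everything else is an assumption for illustration: the attribute DW_AT_EXAMPLE_jit_method_id and its number (an arbitrary value in the DW_AT_lo_user..DW_AT_hi_user range that may collide with real vendor extensions), the function names, and the already-decoded abbreviation list. Only three forms are handled, and little-endian data is assumed.

```python
import struct

# Constants from the Dwarf specification.
DW_AT_name = 0x03
DW_AT_byte_size = 0x0B
DW_AT_lo_user = 0x2000
DW_AT_hi_user = 0x3FFF
DW_FORM_data4 = 0x06
DW_FORM_string = 0x08
DW_FORM_data1 = 0x0B

# Hypothetical vendor attribute, for illustration only.
DW_AT_EXAMPLE_jit_method_id = 0x3D01


def read_form(buf, pos, form):
    """Decode one value according to its form; return (value, next position)."""
    if form == DW_FORM_data1:
        return buf[pos], pos + 1
    if form == DW_FORM_data4:
        return struct.unpack_from("<I", buf, pos)[0], pos + 4
    if form == DW_FORM_string:
        end = buf.index(b"\0", pos)
        return buf[pos:end].decode("ascii"), end + 1
    raise ValueError("form not handled in this sketch: %#x" % form)


def parse_die(buf, abbrev, known):
    """Walk one entry's attributes, keeping only those listed in `known`."""
    pos = 0
    values = {}
    for attr, form in abbrev:
        value, pos = read_form(buf, pos, form)
        if attr in known:
            values[attr] = value
        # Unknown attributes are decoded only to advance past them.
    return values, pos


# One entry: a name, the custom attribute, and a byte size.
abbrev = [
    (DW_AT_name, DW_FORM_string),
    (DW_AT_EXAMPLE_jit_method_id, DW_FORM_data4),
    (DW_AT_byte_size, DW_FORM_data1),
]
die = b"Foo\0" + struct.pack("<I", 42) + bytes([8])

generic, _ = parse_die(die, abbrev, {DW_AT_name, DW_AT_byte_size})
aware, _ = parse_die(
    die, abbrev, {DW_AT_name, DW_AT_byte_size, DW_AT_EXAMPLE_jit_method_id}
)
```

Here `generic` ends up with only the name and byte size, while `aware` also sees the custom value. Both consumers read the same bytes without getting out of step.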

Defining yet another debugging format might sound like a fun project, but I think the Dwarf standard was created by quite a few smart debugging experts, and I would not dismiss it too quickly. Some kind of customized Dwarf format could provide benefits to any debugger that is capable of parsing Dwarf already.

Just my 2 cents.

-kto

On Feb 17, 2015, at 8:15 PM, Alexander Smundak <asmundak at google.com> wrote:

> I considered and even implemented something similar to some of the proposals
> discussed (see https://sourceware.org/ml/gdb-patches/2013-12/msg00964.html
> for the attempt based on JIT reader; the discussion lasted until next June and
> patch was eventually rejected).
> It is definitely doable to generate DWARF unwind info, although it will cost you
> memory (about 7% of the emitted code), and DWARF symbol info, which is going
> to cost you a lot more memory (for the estimate, look at the size of the
> .debuginfo for libjvm.so).
> Note also that although Python has a number of drawbacks, verbosity is not one
> of them, so the implementation in C++ will require at least as much code. And
> this code not being essential for running HotSpot, it will eventually receive
> about as much love as SA; IMHO a better way to address this problem is to have
> unit tests, and this requires the same effort for Python as for SA.
> I will get back with the answer about license required tomorrow.
> 
> On Mon, Feb 16, 2015 at 4:48 AM, Erik Helin <erik.helin at oracle.com> wrote:
>> On 2015-02-16, Andrew Haley wrote:
>>> On 02/16/2015 12:06 PM, Erik Helin wrote:
>>>> On 2015-02-16, Andrew Haley wrote:
>>>>> On 02/16/2015 10:43 AM, Volker Simonis wrote:
>>>>>> Now if we replicate this SA code one more time in a Python library for
>>>>>> GDB, you'll probably agree that it can't work more reliably than the
>>>>>> original SA code. This may be good enough for some use cases, but it
>>>>>> won't be perfect. I'm not a gdb/DWARF expert but I think what we
>>>>>> really need is to generate debug information for all the generated
>>>>>> code. We need to know for every single PC of generated code the
>>>>>> corresponding frame information and how to get to the previous frame.
>>>>> 
>>>>> It would be nice.  We don't actually need it, given that we've done
>>>>> without for years, and generating e.g. full DWARF unwinder data for
>>>>> every instruction is something that even GCC doesn't always attempt to
>>>>> do.  (And, of course, there's a lot of hand-written assembly code in
>>>>> HotSpot.  Annotating this is a significant effort.)
>>>> 
>>>> Do we really need to use DWARF though? The gdbjit interface seems to
>>>> support a custom debug format if you also implement a reader for
>>>> your custom debug format. I've never done this, so I can't say if
>>>> there is something missing from the gdbjit API that HotSpot requires.
>>> 
>>> Well, it would have to be able to convey the same information as DWARF
>>> unwinder data; the GDB people tell me that generating some DWARF is
>>> the right way to do it.  But of course I'm not wedded to any
>>> particular format.
>> 
>> I agree that DWARF would be a very nice thing to have, it would (most
>> likely) allow us to print names of variables, arguments etc in a frame.
>> However, as you mentioned, making HotSpot output DWARF in-memory for the
>> assembly it produces would be a massive effort.
>> 
>> I guess what I wonder is, how little debug information can we get away
>> with if we only want to traverse the stack and print the name of each
>> frame? This is why I was interested in the support from gdbjit for a
>> custom debug format.
>> 
>> An alternative to using gdbjit, as mentioned earlier in this thread,
>> would be to generate data structures (structs) at a well-known
>> symbol/address that can easily be consumed from various plugins/tools.
>> The reason for using such an approach is to try to keep the maintenance
>> work for each plugin/tool as low as possible.
>> 
>> Thanks,
>> Erik