Project to improve hs_err files
John Rose
john.r.rose at oracle.com
Wed Feb 12 11:15:07 PST 2014
The hs_err file has grown to include lots of handy information. I agree that it would be reasonable to add more, and I'm really glad that you are thinking about it in this level of detail. This is especially good as you are an experienced consumer of these files.
The typical size of such a file is currently about 40 KB. As long as the most useful information is kept near the top, there is (IMO) room for this file to grow 2x or more in typical size.
Some of the configuration information you mention may be present at the top of the hotspot.log file, before the big <tty> element. It might be fruitful to ensure that such preamble information is always captured at startup, and dumped into the log file.
I don't think dump-time disassembly is practical, since we don't have an engine bundled in the JVM, but we should make it possible with post-processing to get a good disassembly from hex dumps in the error dump file. This has been done before; perhaps it needs reviving or refinement.
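As a rough illustration of the post-processing step, here is a minimal Python sketch that recovers raw code bytes from hex dump lines so that they can be handed to an external disassembler. The line shape it expects ("0x<address>:" followed by space-separated hex bytes) and the name parse_code_dump are assumptions made for this example, not a specification of the hs_err format.

```python
import re

# Assumed line shape (hypothetical): "0x<address>:   <hex byte> <hex byte> ..."
_DUMP_LINE = re.compile(r"^\s*0x([0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{2}\s*)+)$")


def parse_code_dump(lines):
    """Return (start_address, code_bytes) from contiguous hex dump lines.

    Lines that do not match the assumed shape are skipped. A gap between
    consecutive dump lines raises ValueError, because a disassembler needs
    one contiguous byte range.
    """
    start = None
    code = bytearray()
    for line in lines:
        match = _DUMP_LINE.match(line)
        if match is None:
            continue
        address = int(match.group(1), 16)
        # Drop all whitespace before decoding the hex digits.
        chunk = bytes.fromhex("".join(match.group(2).split()))
        if start is None:
            start = address
        elif address != start + len(code):
            raise ValueError(f"non-contiguous dump line at {address:#x}")
        code += chunk
    return start, bytes(code)
```

The returned start address matters as much as the bytes: a disassembler needs it to print branch targets that line up with the addresses elsewhere in the crash file.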
Here's another thought, along the lines of symbolic disassembly, but for data rather than code:
One thing I would like to see more of is memory contents, along with a way to interpret their meaning. The memory blocks around the current PC and SP are supplied. It might be worthwhile to dump additional memory blocks one or two indirections away from the (apparent) pointers in those initial memory blocks. I often wonder, "is that the object I care about?" when looking at those memory dumps. I am guessing that there is a cheap, robust way to put more clues into the dump, without getting entangled in object parsing (which as David points out could cause further crashing). Perhaps there is a way to classify data words in a post-processing tool, much as we can pull out disassembled code. At least, we can observe whether an apparent pointer refers into a live part of the heap (assuming we have the right few words of heap boundary info).
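The heap-range check could be sketched in a post-processing tool roughly as follows. This is a Python sketch under stated assumptions: the heap is treated as a single [heap_start, heap_end) range taken from the crash file, objects are assumed to be 8-byte aligned, and the function names and labels are hypothetical.

```python
def classify_word(word, heap_start, heap_end, alignment=8):
    """Label a dumped data word by what it appears to be.

    Assumes a single contiguous heap range [heap_start, heap_end) and
    objects aligned to `alignment` bytes; both are simplifications.
    """
    if word == 0:
        return "null"
    if heap_start <= word < heap_end:
        if word % alignment == 0:
            return "heap-aligned"    # plausible object pointer
        return "heap-unaligned"      # inside the heap, unlikely an object start
    return "other"                   # not a heap address: integer, code, stack, ...


def annotate(words, heap_start, heap_end):
    """Pair each dumped word with its label, preserving dump order."""
    return [(w, classify_word(w, heap_start, heap_end)) for w in words]
```

Words labeled "heap-aligned" are the natural candidates for the extra one or two levels of indirection. Nothing here parses objects, so the check cannot itself trigger a further crash.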
We could also (maybe) identify Klass pointers in the headers of objects and output a little bit of data in the crash log to make it possible to identify the (apparent) classes of (apparent) object pointers in the regions dumped. At least the values of well-known classes (in SystemDictionary::_something), if they occur as the (apparent) classes of hex dump addresses, could be supplied as an extra hint. Clearly this could scale beyond the reasonable size of a crash dump, so some sort of size limit would need to be applied. (The size limit could be set to zero, or the log file section removed, if customers are nervous about memory dumps.) I think there is scope for tasteful engineering here, especially if we push fancy formatting work into a post-pass tool.
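A size-limited version of the well-known-class hint might look like the Python sketch below. Everything in it is an assumption for illustration: dumped memory is modeled as a dict from address to word, the klass word is assumed to sit uncompressed at offset 8 from the object start, and the table of well-known klass addresses is assumed to have been written into the crash file.

```python
def klass_hints(memory, candidates, well_known, klass_offset=8, max_hints=16):
    """Return up to max_hints (pointer, class_name) pairs.

    memory      dict: address -> dumped word (only what the crash file holds)
    candidates  apparent object pointers found in the dumped blocks
    well_known  dict: klass address -> class name (hypothetical table)
    max_hints   size limit; 0 disables the section entirely
    """
    hints = []
    for pointer in candidates:
        if len(hints) >= max_hints:
            break
        # The header word may not have been dumped; .get() then yields None,
        # which is never a key in well_known, so the pointer is skipped.
        klass_word = memory.get(pointer + klass_offset)
        name = well_known.get(klass_word)
        if name is not None:
            hints.append((pointer, name))
    return hints
```

Because the lookup only consults words already present in the dump, the result is a hint rather than a verified class, which matches the "(apparent)" qualifiers above.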
Perhaps there is a way to join hands with the SA (serviceability agent) infrastructure, and run a tiny SA instance out of a relatively limited supply of hex dump from the crash file, instead of out of the full picture supplied by the core file or a live process. It's at least an interesting thought experiment.
Please keep up this good work!
— John
On Sep 9, 2013, at 10:38 AM, Mattis Castegren <mattis.castegren at oracle.com> wrote:
> Hi. I sent this email to serviceability and runtime, but I got a request to forward the mail to all of hotspot dev as hs_err files affect all areas. Please let me know if you have any feedback. Don't worry about whether the suggestions are feasible or not, that will come in a second step.